Re: [O] [OT] Scanning for archiving


From: Pieter Praet
Subject: Re: [O] [OT] Scanning for archiving
Date: Sat, 05 Nov 2011 23:36:22 +0100
User-agent: Notmuch/0.9+33~gadde72d (http://notmuchmail.org) Emacs/23.3.1 (x86_64-unknown-linux-gnu)

On Sat, 5 Nov 2011 14:03:24 -0600, Marcelo de Moraes Serpa <address@hidden> wrote:
> Hi list,
> 
> I just bought a scanner and started to scan important documents as a
> backup, archiving them with meaningful metadata in orgmode files. Then
> a question came to mind: what dpi should I use? I'm not really savvy when
> it comes to scanning or printing, and I want a dpi that lets me reprint
> the document at an acceptable quality later if necessary, but that also
> doesn't take up too much space (600dpi PDFs take around 5MB).
> 
> Any insights welcome,
> 
> Thanks,
> 
> Marcelo.

Using PDF for scanned documents tends to result in *huge* files with
seriously disappointing image quality.  Consider storing your scans in DjVu
format [1], which was developed specifically for this purpose.

I scan all docs @ 600dpi, predominantly gray-scale (only in colour when
it's *really* necessary) and store in DjVu format, all using gscan2pdf [2].

Even at that seemingly overkill resolution, single-page documents are
generally (if they aren't too "grainy") only a few hundred KiB in size.
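
For a sense of scale, here is a back-of-the-envelope sketch of what a page
weighs *before* any compression.  It assumes an A4 page (210 x 297 mm) and
8 bits per pixel for gray-scale, 24 for colour; the function name is made up
for illustration.

```python
MM_PER_INCH = 25.4

def raw_scan_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scan of the given page."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8

A4 = (210, 297)  # assumed page size in millimetres
for dpi in (150, 300, 600):
    gray = raw_scan_bytes(*A4, dpi, 8)
    colour = raw_scan_bytes(*A4, dpi, 24)
    print(f"{dpi} dpi: gray {gray / 2**20:.1f} MiB, colour {colour / 2**20:.1f} MiB")
```

At 600dpi gray-scale that works out to roughly 33 MiB of raw pixels per A4
page, so ending up with a few hundred KiB on disk is mostly down to the
encoder.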

gscan2pdf also supports a number of OCR utils, but the UI for this is
clumsy (aren't they all...), so you're better off using the CLI tools
directly.  I'd recommend Tesseract.
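
As a minimal sketch of driving the CLI directly, the snippet below wraps the
usual `tesseract IMAGE OUTPUTBASE -l LANG` invocation in a small batch
helper.  The directory layout, the `*.tif` pattern, the `eng` language and
both function names are assumptions for illustration, not anything
gscan2pdf prescribes.

```python
import shutil
import subprocess
from pathlib import Path

def tesseract_command(image, lang="eng"):
    """Build the argv for: tesseract IMAGE OUTPUTBASE -l LANG"""
    image = Path(image)
    output_base = image.with_suffix("")  # tesseract appends .txt itself
    return ["tesseract", str(image), str(output_base), "-l", lang]

def ocr_directory(directory, pattern="*.tif", lang="eng"):
    """Run tesseract on every matching image; return the text files written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    written = []
    for image in sorted(Path(directory).glob(pattern)):
        # check=True raises if tesseract exits with an error
        subprocess.run(tesseract_command(image, lang), check=True)
        written.append(image.with_suffix(".txt"))
    return written
```

Calling `ocr_directory("scans")` would then leave one `.txt` file next to
each scanned image.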


I've used this approach to "convert" piles upon piles of old bank
statements to Ledger format, with very little effort.
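
To give an idea of the last step, here is a rough sketch of turning one
already-parsed statement line into a Ledger transaction.  It assumes the OCR
text has been split into date, payee and amount beforehand; the function,
the account names and the currency are all hypothetical.

```python
from datetime import date
from decimal import Decimal

def ledger_entry(when, payee, amount, account,
                 checking="Assets:Checking", currency="EUR"):
    """Format one statement line as a two-posting Ledger transaction."""
    lines = [
        f"{when:%Y/%m/%d} {payee}",
        # Ledger wants at least two spaces between account and amount
        f"    {account}  {amount:.2f} {currency}",
        # amount left out here; Ledger balances the last posting itself
        f"    {checking}",
    ]
    return "\n".join(lines)

print(ledger_entry(date(2011, 11, 5), "Grocery store",
                   Decimal("42.50"), "Expenses:Food"))
```

Amounts are kept as `Decimal` rather than float so that cents survive the
round trip unchanged.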

NOTE: When attempting something like this, a fast scanner with a *reliable*
automatic document feeder will help prevent premature hair loss ;)


Peace

-- 
Pieter

[1] http://djvu.org/resources/whatisdjvu.php
[2] http://gscan2pdf.sourceforge.net/


