Re: [Gnumed-devel] Re: Scanning Xsane, gscan2pdf, Simple Scan, Tesseract

gnumed-devel

From:	Karsten Hilbert
Subject:	Re: [Gnumed-devel] Re: Scanning Xsane, gscan2pdf, Simple Scan, Tesseract OCR
Date:	Tue, 26 Jan 2010 16:20:20 +0100
User-agent:	Mutt/1.5.20 (2009-06-14)

On Mon, Jan 25, 2010 at 11:41:03PM +0100, Karsten Hilbert wrote:

> > For GNUmed to be able to access such a layer in within-patient searches,
> > would it be necessary for such PDFs to have been imported twice, and/or to
> > use some additional tool to "split" the document into two parts (one an
> > image part, and one the text part)?
> 
> It would be possible to implement the access to the text part inside
> GNUmed. Actually using that in a search would, however, presently
> require exporting each and every document and trying to search it.
> 
> That could, indeed, only be mitigated by splitting the text part
> into a separate for-search table upon import.
> 
> Except that GNUmed already has that table: blobs.doc_desc, of which
> there can be any number per document. In fact, we should probably
> extend the per-patient and across-patients search to look at those !

Which we apparently already do, of course :-)

One concept of the GNUmed document archive is that it tries
hard *not* to concern itself with the particulars of the
document part file types. It delegates that as much as
possible. Hence splitting / appropriately importing PDF
parts is up to the environment.
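
To illustrate the idea of a separate for-search table holding any
number of descriptions per document, here is a minimal sketch in
Python. It uses an in-memory SQLite table as a stand-in. The table
layout, the column names (doc_id, text) and the function names are
assumptions made up for this example; they are not the actual
blobs.doc_desc schema or any GNUmed API. Extracting the text layer
from the PDF is assumed to have happened outside, in the environment.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for a per-document description table.

    Hypothetical layout: one row per description, many rows per document.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc_desc (doc_id INTEGER, text TEXT)")
    return conn


def add_description(conn, doc_id, text):
    """Attach one more description (e.g. externally extracted OCR text)."""
    conn.execute(
        "INSERT INTO doc_desc (doc_id, text) VALUES (?, ?)",
        (doc_id, text),
    )


def search_descriptions(conn, term):
    """Return the ids of documents with at least one matching description.

    Plain substring match via LIKE; '%' and '_' in term act as wildcards.
    """
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM doc_desc WHERE text LIKE ? ORDER BY doc_id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
add_description(conn, 1, "Discharge letter, cardiology")
add_description(conn, 1, "OCR text: patient admitted with chest pain")
add_description(conn, 2, "Lab report: lipid panel")
print(search_descriptions(conn, "chest pain"))
```

The point of the sketch is only that the search never has to open
the document files themselves: whatever produced the text part
stores it as one more description row at import time.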

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
