gnumed-devel

Re: [Gnumed-devel] Re: Scanning Xsane, gscan2pdf, Simple Scan, Tesseract


From: Karsten Hilbert
Subject: Re: [Gnumed-devel] Re: Scanning Xsane, gscan2pdf, Simple Scan, Tesseract OCR
Date: Tue, 26 Jan 2010 16:20:20 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

On Mon, Jan 25, 2010 at 11:41:03PM +0100, Karsten Hilbert wrote:

> > For GNUmed to be able to access such a layer in within-patient searches,
> > would it be necessary for such PDFs to have been imported twice, and/or to
> > use some additional tool to "split" the document into two parts (one an
> > image part, and one the text part)?
> 
> It would be possible to implement the access to the text part inside
> GNUmed. Actually using that in a search would, however, presently
> require exporting each and every document and trying to search it.
> 
> That could, indeed, only be mitigated by splitting the text part
> into a separate for-search table upon import.
> 
> Except that GNUmed already has that table: blobs.doc_desc, of which
> there can be any number per document. In fact, we should probably
> extend the per-patient and across-patients search to look at those !

Which we apparently already do, of course :-)

One concept of the GNUmed document archive is that it tries
hard *not* to concern itself with the particulars of the
document part file types. It delegates that as far as
possible. Hence splitting / appropriately importing PDF
parts is up to the environment.
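
To illustrate what "up to the environment" could look like, here is a
minimal sketch, not GNUmed code. It assumes poppler's pdftotext is
installed to pull the text layer out of a PDF, and it uses an in-memory
SQLite table as a stand-in for blobs.doc_desc. The column names fk_doc
and text, and all function names, are assumptions made for the example.

```python
import sqlite3
import subprocess


def extract_text_layer(pdf_path):
    """Return the text layer of a PDF via poppler's pdftotext CLI.

    Assumes pdftotext is on PATH; "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def make_demo_db():
    """Create a stand-in for blobs.doc_desc (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_desc (fk_doc INTEGER NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def add_description(conn, doc_id, text):
    """Attach one more description to a document; any number is allowed."""
    conn.execute(
        "INSERT INTO doc_desc (fk_doc, text) VALUES (?, ?)",
        (doc_id, text),
    )


def search_descriptions(conn, term):
    """Return IDs of documents with a description containing term."""
    rows = conn.execute(
        "SELECT DISTINCT fk_doc FROM doc_desc WHERE text LIKE ? ORDER BY fk_doc",
        ("%" + term + "%",),
    )
    return [row[0] for row in rows]
```

An import script outside GNUmed could then call something like
add_description(conn, doc_id, extract_text_layer(path)) once per PDF,
so that the search only ever has to look at the description table.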

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346



