emacs-orgmode

[O] orgmode and pdf


From: x . piter
Subject: [O] orgmode and pdf
Date: Tue, 24 Jul 2012 10:40:02 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Hi list.
I am trying to set up a workflow for mining data from PDFs into Org mode.
I prefer to read in Emacs, since I have fast dictionary lookup there,
among many other things.
There are two tools I think are useful for converting PDFs to text:
cuneiform to extract the text, and pdfimages to extract the images.
Cuneiform handles two-column PDFs better than the other text extractors
I have tried.
The PDF is split into pages and each page is processed separately.
Using these two programs and some scripting, I believe it is possible to
convert a PDF into an Org file. However, there are two issues I would like
to solve.
1) Is there any way to extract figure captions from a PDF?
2) I have no solution for formulas and Greek letters. The only way to handle
them would be to consult an image of the page.
Any suggestions? Has anybody tried something similar?
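
For reference, the page-by-page workflow described above could be sketched
roughly as follows. This is only an illustration under several assumptions:
cuneiform reads images rather than PDFs, so each page is first rendered with
pdftoppm, and the page count is read with pdfinfo (both from poppler-utils,
neither mentioned above). The -png option of pdfimages needs a reasonably
recent poppler. All function names and the file naming scheme are made up
for the example.

```python
import subprocess
from pathlib import Path


def parse_page_count(info_text):
    """Read the 'Pages:' field from the output of pdfinfo."""
    for line in info_text.splitlines():
        if line.startswith("Pages:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no 'Pages:' line found in pdfinfo output")


def rasterize_cmd(pdf, page, out_root, dpi=300):
    """pdftoppm command that renders one page to <out_root>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", "-singlefile",
            "-f", str(page), "-l", str(page), str(pdf), str(out_root)]


def ocr_cmd(image, out_txt, lang="eng"):
    """cuneiform command that writes plain text for one page image."""
    return ["cuneiform", "-l", lang, "-f", "text",
            "-o", str(out_txt), str(image)]


def images_cmd(pdf, page, out_root):
    """pdfimages command that extracts the embedded images of one page."""
    return ["pdfimages", "-f", str(page), "-l", str(page), "-png",
            str(pdf), str(out_root)]


def page_to_org(page, text, images):
    """One Org headline per page: the OCR text, then links to its images."""
    lines = [f"* Page {page}", "", text.strip()]
    for img in images:
        lines += ["", f"[[file:{img}]]"]
    return "\n".join(lines) + "\n"


def convert(pdf, workdir, org_file):
    """Process the PDF page by page and write a single Org file."""
    pdf, workdir = Path(pdf), Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    info = subprocess.run(["pdfinfo", str(pdf)], capture_output=True,
                          text=True, check=True).stdout
    parts = []
    for page in range(1, parse_page_count(info) + 1):
        root = workdir / f"page{page:04d}"      # hypothetical naming scheme
        txt = root.with_suffix(".txt")
        subprocess.run(rasterize_cmd(pdf, page, root), check=True)
        subprocess.run(ocr_cmd(root.with_suffix(".png"), txt), check=True)
        subprocess.run(images_cmd(pdf, page, workdir / f"fig{page:04d}"),
                       check=True)
        images = sorted(workdir.glob(f"fig{page:04d}-*"))
        parts.append(page_to_org(
            page, txt.read_text(encoding="utf-8", errors="replace"), images))
    Path(org_file).write_text("\n".join(parts), encoding="utf-8")
```

Something like convert("paper.pdf", "work", "paper.org") would then produce
one headline per page. Note that the sketch does nothing about OCR lines that
happen to start with "*", which Org would read as headlines, and it does not
address the captions or the formulas.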
Thanks.
Petro.





