emacs-orgmode

From: Karl Voit
Subject: [O] Parsing Org-mode in Python (was: Implementing Org-mode tools in languages other than ELISP)
Date: Mon, 6 Jan 2014 11:44:40 +0100
User-agent: slrn/0.9.9 (Linux)

Hi!

* Daniel Clemente <address@hidden> wrote:
>> 
>> I dream of having a general Python parser for Org mode files, knowing
>> every bit about the current syntax for Org files, surrounded by enough
>> Python machinery to make it useful.

Oh, this would be great, since there are way more Python coders out
there than ELISP coders.

> Try PyOrgMode (https://github.com/bjonnh/PyOrgMode), it works for
> some files (but still needs corrections: it crashes with date
> formats, with bold markers, etc.).

For the blogging system I am implementing [4], I did some research
on the existing Org-mode parsers in Python.

My notes about PyOrgMode (2013-05) were that there is not much
documentation on how to use it properly, and that its list of open
todos contains items too basic for me to consider it mature enough.

So far, I consider my own Python parser[1] to be the most advanced
Python parser for Org-mode (unfortunately). However, I am completely
aware of its downsides:

- it's a very primitive line-by-line parser that does not use any
  classical parsing tool at all (works for me so far!)
- it's currently limited to a few Org-mode elements so that I can
  continue to develop my blogging system
  - more Org-mode elements (not all!) will be added when my blogging
    system gets stable enough to add Org-mode syntax features such
    as tables.
- it's not written with the goal of being a stand-alone Org-mode
  parser since I only need it for my blogging system
  - feel free to use it and modify it to be a stand-alone parser
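
To illustrate what "primitive line-by-line parser" means here, the
following is a minimal sketch and not the actual lazyblorg code. It
assumes that only headings are recognized and that TODO/DONE are the
only keywords; the names HEADING_RE and parse_lines are made up for
this example.

```python
import re

# Assumption: only headings are recognized, and TODO/DONE are the only keywords.
HEADING_RE = re.compile(
    r"^(?P<stars>\*+)\s+"                      # one or more stars, then whitespace
    r"(?:(?P<keyword>TODO|DONE)\s+)?"          # optional TODO keyword
    r"(?P<title>.*?)"                          # heading title (non-greedy)
    r"(?:\s+(?P<tags>:(?:[\w@#%]+:)+))?\s*$"   # optional :tag1:tag2: at line end
)


def parse_lines(lines):
    """Walk the input line by line and return a list of heading dicts.

    Every line that is not a heading is appended unchanged to the body
    of the most recent heading.
    """
    headings = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADING_RE.match(line)
        if match:
            tags = match.group("tags")
            current = {
                "level": len(match.group("stars")),
                "keyword": match.group("keyword"),
                "title": match.group("title"),
                "tags": tags.strip(":").split(":") if tags else [],
                "body": [],
            }
            headings.append(current)
        elif current is not None:
            current["body"].append(line)
    return headings
```

The only state carried from one line to the next is the current
heading, which is what keeps this approach simple and also what
limits it once nested structures such as tables or blocks come in.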

I do think that for a more general approach, somebody should develop
an Org-mode Python parser with a classical parsing engine. I do have
some experience with ply[2]. Unfortunately, I have to say that using
ply feels a bit awkward in Python. I did not get the impression that
this parsing engine is designed the Python way. A lot of things are
done by convention (naming and so on), which imposes certain
limitations in the details. And AFAIR there were more things that
puzzled me. However, it got my (simple) job [3] done back then.

> You don't need a Lisp interpreter written in Python, only Python
> code that understands org syntax without getting confused.

I am no expert in this. I do feel that if you are going to use an
ELISP interpreter to parse Org-mode syntax for Python, it should
completely re-use the original Org-parser and nothing else. I have
no idea if this is possible or not.

If you have to implement a parser on your own, you probably should
stick to Python-only.

In order to avoid confusion, your own Python parser should implement
only a very well defined and documented sub-set of Org-mode syntax
and should accept/parse everything else as ordinary text (content).
IMHO.
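
A minimal sketch of that idea, under assumptions of my own choosing:
the supported sub-set (headings, "-"/"+" list items, #+BEGIN_/#+END_
block delimiters) and the names RULES, classify and parse are
hypothetical and only serve as illustration. The point is the final
fallback: any line outside the documented sub-set comes back as
plain text instead of confusing the parser.

```python
import re

# Hypothetical, deliberately small sub-set of Org-mode syntax.
# Each entry is (element name, pattern that identifies such a line).
RULES = [
    ("heading", re.compile(r"^\*+\s+\S")),
    ("list_item", re.compile(r"^\s*[-+]\s+\S")),
    ("block_begin", re.compile(r"^\s*#\+BEGIN_\w+", re.IGNORECASE)),
    ("block_end", re.compile(r"^\s*#\+END_\w+", re.IGNORECASE)),
]


def classify(line):
    """Return the element name for a line, or "text" if it is not in the sub-set."""
    for kind, pattern in RULES:
        if pattern.match(line):
            return kind
    # Everything the sub-set does not cover (tables, timestamps,
    # drawers, ...) is treated as ordinary content.
    return "text"


def parse(text):
    """Classify every line of an Org-mode string as (kind, line)."""
    return [(classify(line), line) for line in text.splitlines()]
```

Note that this sketch does not track whether a line sits inside a
block, so a "- foo" line within a source block is still reported as
a list item. A real parser would have to document such edge cases
as part of its sub-set.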

HTH.

  1. https://github.com/novoid/lazyblorg/blob/master/lib/orgparser.py
  2. http://www.dabeaz.com/ply/
  3. https://github.com/novoid/2011-04-tagstore-formal-experiment/tree/master/analysis_and_derived_data/scripts
  4. https://github.com/novoid/lazyblorg
-- 
mail|git|SVN|photos|postings|SMS|phonecalls|RSS|CSV|XML to Org-mode:
       > get Memacs from https://github.com/novoid/Memacs <

https://github.com/novoid/extract_pdf_annotations_to_orgmode + more on github



