emacs-orgmode

Re: [O] Extract document structure from Org file


From: John Kitchin
Subject: Re: [O] Extract document structure from Org file
Date: Sat, 04 Jul 2015 11:54:39 -0400

I worked out a new version of the swish-e org indexer that indexes a
custom XML representation of the org file. You may find it interesting
for your project.

http://kitchingroup.cheme.cmu.edu/blog/2015/07/04/An-xml-representation-of-an-org-document-for-indexing-with-swish-e/

It enables a search like this:

swish-e -f index-org2xml.swish-e -w src-block.language=python -w src-block=diffusion

to find org files with a python source block containing the word
diffusion.
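
If you want to script searches like this, here is a minimal Python sketch
that assembles the same command line from property/word pairs. It is not
part of the blog post: build_swish_query is a hypothetical helper name, and
it assumes only the -f and -w flags used in the command above.

```python
import shlex


def build_swish_query(index_file, criteria):
    """Return the argv list for a swish-e search (hypothetical helper).

    criteria maps a property name, e.g. "src-block.language", to the
    word that property should contain.
    """
    argv = ["swish-e", "-f", index_file]
    for prop, word in criteria.items():
        # One -w per criterion, mirroring the command shown above.
        argv += ["-w", f"{prop}={word}"]
    return argv


argv = build_swish_query(
    "index-org2xml.swish-e",
    {"src-block.language": "python", "src-block": "diffusion"},
)
command_line = shlex.join(argv)
print(command_line)
```

With swish-e installed and the index built, the argv list could be handed
to subprocess.run(argv, capture_output=True, text=True) to collect the
matching files.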

I think swish-e supports ranking
(http://swish-e.org/docs/swish-faq.html#how_is_ranking_calculated_) too,
but I have not tried it.

It is pretty interesting overall!



Oleg Sivokon writes:

> John Kitchin <address@hidden> writes:
>
>> You would use org-element.  Try org-element-parse-buffer and
>> org-element-map and maybe org-element-interpret-data.  There's also a
>> bunch of regexp for identifying/finding particular types of elements.
>
> Thanks! I'm already looking into it.
>
>> That sounds really cool. I recently hacked together a swish-e index of
>> my org files (there might have been 3000+!):
>> http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/
>>
>> I just updated it to index the html version of an org-file so that I
>> can take advantage of the structure in the search:
>> http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/
>> It would be cool to have more granular searching though.
>>
>> Is your info project visible anywhere? I can imagine a close-file
>> hook function that updates the database automatically.
>
> Whoa, that's a lot of Org files :) What I wrote so far is on Github, but
> it's in a very early stage, so it's not something you could just drop
> into your Emacs directory and start using right away.
> https://github.com/wvxvw/sphinx-mode
> I also looked into Swish some time ago, and I thought about using
> Nepomuk but, in the latter case, I have to admit, I didn't make it
> through the documentation.
>
> The difference in using Sphinx is that it has ranking, and it has a
> relatively terse way of specifying searching criteria.  For example, you
> could ask to search for "some words in this phrase"/3 and it would look
> for occurrences of 3 of the 5 words given between the quotes.  Or, you could
> ask it to search for @node "R" @contents "printf" "format", and this
> would search for node titles mentioning "R" and having contents with
> words "printf" and "format".
> I've to admit I didn't master it fully (there are far more options and
> settings) but it does something that seems reasonable (if I compare it
> to M-x info-apropos).
>
> I'm also still trying to learn what the best way to do indexing is, so
> the project is still very raw, but I'll get there one day :)
>
> The ultimate goal is also to write a more human-friendly interface to
> Sphinx, where one could ask questions in a subset of natural language :)
> (but that's a very long way into the future!)
>
> PS. I see that many posts on this list are titled with [O].  What does
> it mean? Should I do that too?
>
> Best.
>
> Oleg

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu


