[O] Re: unnumbered subsections in latex export

emacs-orgmode

From:	Nicolas
Subject:	[O] Re: unnumbered subsections in latex export
Date:	Thu, 31 Mar 2011 23:58:11 +0200
Hello,

Bastien <address@hidden> writes:

> 2. exporters use various methods to export the file (e.g. the HTML
>    exporter goes line by line, the LaTeX exporter parses the file and
>    renders each section);
>
>    *Example*: users often ask why the LaTeX exporter cannot export a
>    headline of level 3 right after a headline of level 1: they ask that
>    because the HTML exporter can do this, while the LaTeX one cannot.
>    And the LaTeX one cannot because parsing an ill-structured Org buffer
>    is tricky for it.
>
> 3. exporters are maintained by various people: I know the HTML exporter
>    and the LaTeX one, others know the other exporters, etc.
>
> I need your help to deal with these issues.
>
> The first thing to do is to have a list of annoying inconsistencies that
> need to be addressed in priority.

I have been thinking about exporters for a while now, and I'd like to
share my point of view. Be warned, I will be a bit verbose.

Honestly, I wouldn't talk about just "annoying inconsistencies". I think
we may be running into a serious problem with exporters if some work
isn't done on them. Indeed, it seems to me that it is far too difficult
to create new exporters, and managing them could soon become unwieldy.
I have my opinion on how we could anticipate and solve that.

At the moment, the export process is done in two parts. First, the
buffer is parsed and changed into a quite complex, and insufficiently
documented, format: this is the job of org-exp.el. It is complex because
the new format mixes new string markers ("ORG-CENTER-END\n") and text
properties (original-indentation). It is insufficiently documented
because some of those properties are not exactly defined, and their
meaning, or their differences, aren't always explicit (org-protected,
org-example, org-verbatim-emph come to mind).

It isn't a problem per se: after all, Org is also rich and complex, and
a simpler way to handle this may not be sufficient. But any person
planning to create a new exporter these days has to know all of those
subtleties, and pay attention to both visible and invisible markers when
parsing the new format.

The second part of the export process is backend specific. I'm talking
about org-latex.el, org-html.el, etc. As Bastien pointed out, they often
parse the buffer their own way (line-wise or section-wise), adding one
layer of complexity for anyone trying to understand them, and creating
inconsistencies at the same time.

This is why I think exporting should take a slightly different approach.
In essence, org-exp.el should itself parse the format it creates and
call functions from backend-specific exporters for each environment or
object it encounters during parsing. In other words, specific
exporters should consist only of a set of independent functions, named
uniformly (org-html-export-list, org-latex-export-center), and acting
recursively on parts of the buffer, in a precisely documented format.

Thus, Org documentation should provide an exhaustive list of
environments and objects it offers with their associated format during
export. Then, creating an exporter should be as simple as providing
functions to change every one of them into meaningful strings, which
would then be collected by org-exp.el. The immediate benefit is that
only those among us patching org-exp.el will have to know the
intermediate format it creates, and those creating or patching backends
will work on a well-defined format.
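
As a rough illustration only, here is a minimal sketch of that division
of labour, written in Python rather than Emacs Lisp just to keep it
short and self-contained. Nothing here is Org code: the export function,
the backend tables and the html_export_* / latex_export_* names are all
hypothetical, and a node is assumed to be either a plain string or a
tuple (type, child, child, ...).

```python
# Illustrative sketch only; none of these names exist in Org.
# A node is either a plain string or a tuple (type, child, child, ...).

def export(node, backend):
    """Recursively turn NODE into a string using the BACKEND functions."""
    if isinstance(node, str):
        return node
    node_type, *children = node
    contents = "".join(export(child, backend) for child in children)
    # Dispatch on the node type: the backend never walks the tree itself.
    return backend[node_type](contents)


# One independent, uniformly named function per environment and backend.
def html_export_paragraph(contents):
    return "<p>" + contents + "</p>"


def html_export_center(contents):
    return '<div class="center">' + contents + "</div>"


def latex_export_paragraph(contents):
    return contents + "\n\n"


def latex_export_center(contents):
    return "\\begin{center}\n" + contents + "\\end{center}\n"


html_backend = {"paragraph": html_export_paragraph,
                "center": html_export_center}
latex_backend = {"paragraph": latex_export_paragraph,
                 "center": latex_export_center}

tree = ("center", ("paragraph", "Text"))
print(export(tree, html_backend))
print(export(tree, latex_backend))
```

The point of the design is that adding a backend means adding a table of
small functions, while the tree walk stays in one place.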

I'll show two examples to illustrate my point: lists and tables. Taken
from a docstring, 

1. first item
   + sub-item one
   + [X] sub-item two
   more text in first item
2. [@3] last item

will be parsed as:

(ordered (nil "first item"
              (unordered (nil "sub-item one")
                         (nil "[CBON] sub-item two"))
              "more text in first item")
         (3 "last item"))

This makes it easy to transform an Org list into many different formats
(see org-list-to-latex, org-list-to-html, org-list-to-texinfo, and so
on). Alas, it cannot be used in org-html.el and org-docbook.el, as
those, again, parse the buffer line-wise.
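
To show how little a converter needs once the list is in that shape,
here is a hedged sketch in Python (not the actual org-list-to-html, whose
code is not reproduced here). It assumes the parsed list is mirrored as
nested tuples, with None standing for nil; list_to_html and TAGS are
hypothetical names, and no HTML escaping or checkbox handling is done.

```python
# Illustrative sketch only: a parsed list is (type, item, item, ...),
# and each item is (counter, part, part, ...), where a part is either a
# string or a nested parsed list.

TAGS = {"ordered": "ol", "unordered": "ul"}


def list_to_html(parsed):
    """Convert a parsed list into an HTML string, recursing on sub-lists."""
    list_type, *items = parsed
    tag = TAGS[list_type]
    out = ["<" + tag + ">"]
    for counter, *parts in items:
        # A non-None counter forces the item number (3 for the last item).
        attr = "" if counter is None else ' value="%d"' % counter
        body = "".join(
            part if isinstance(part, str) else list_to_html(part)
            for part in parts
        )
        out.append("<li" + attr + ">" + body + "</li>")
    out.append("</" + tag + ">")
    return "".join(out)


parsed = ("ordered",
          (None, "first item",
           ("unordered",
            (None, "sub-item one"),
            (None, "[CBON] sub-item two")),
           "more text in first item"),
          (3, "last item"))
print(list_to_html(parsed))
```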

The same could be said about tables:

| Row 1 | 1 | 2 |
|-------+---+---|
| Row 2 | 3 | 4 |

can be parsed as:

(("Row 1" "1" "2")
 hline
 ("Row 2" "3" "4"))

and from that, functions such as orgtbl-to-html or orgtbl-to-latex were
easy to create.
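
A comparable sketch for tables, again in Python and again only an
illustration (this is not how orgtbl-to-latex is implemented). The
parsed table is assumed to be a list of rows, with the string "hline"
standing in for the hline marker; table_to_latex is a hypothetical name
and every column is left-aligned for simplicity.

```python
# Illustrative sketch only: rows are lists of cell strings, and the
# string "hline" marks a horizontal rule.

def table_to_latex(rows):
    """Convert a parsed table into a LaTeX tabular environment."""
    ncols = max(len(row) for row in rows if row != "hline")
    lines = ["\\begin{tabular}{" + "l" * ncols + "}"]
    for row in rows:
        if row == "hline":
            lines.append("\\hline")
        else:
            # Cells are joined with "&" and the row ends with "\\".
            lines.append(" & ".join(row) + " \\\\")
    lines.append("\\end{tabular}")
    return "\n".join(lines)


rows = [["Row 1", "1", "2"],
        "hline",
        ["Row 2", "3", "4"]]
print(table_to_latex(rows))
```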

So, basically, what I suggest here is:

1. list all possible environments and objects offered by the Org format
   (table, lists, inlinetasks, center, verbatim, paragraph, headlines,
   time-stamps, LaTeX snippets, footnotes, links, source);
2. define an explicit export format for each of them;
3. determine which options should be known by org-exp, and which by the
   backend;
4. create a parser, in org-exp, that will output Org buffer in the
   chosen format;
5. create (many are readily available) functions for each backend to
   interpret them.


Now about that explicit format. Taking this buffer,

--8<---------------cut here---------------start------------->8---
#+title: Example buffer

Some text before first headline.

* First section

  First paragraph $\alpha = 1$.

  Second paragraph.

  - item 1
  - item 2
    #+begin_center
    Text
    #+end_center

  | Row 1 | 1 | 2 |
  | Row 2 | 3 | 4 |

* Second section

  Text with footnote[fn:1].
*************** Inline task
                Some text and
                a [[http://www.gnu.org/software/emacs/][link]]
                :DRAWER:
                - I like
                - lists.
                :END:
*************** END

* Footnotes
[fn:1] Footnote definition.
--8<---------------cut here---------------end--------------->8---

It could be parsed as the following:

'((:title "Example buffer")
  (paragraph "Some text before first headline.")
  (headline "First section" 
            (paragraph "First paragraph " 
                       (latex "$\alpha = 1$")
                       ".")
            (paragraph "Second paragraph.")
            (list unordered (nil "item 1")
                            (nil "item 2"
                                 (center (paragraph "Text"))))
            (table ("Row 1" "1" "2")
                   ("Row 2" "3" "4")))
  (headline "Second section"
            (paragraph "Text with footnote"
                       (footnote "Footnote definition")
                       ".")
            (inlinetask "Inline task"
                        (paragraph "Some text and\na "
                                   (link "link" "http://www.gnu.org/software/emacs/"))
                        (drawer (list unordered (nil "I like")
                                                (nil "lists."))))))

Note that such parsing will need a decent forward-paragraph function.
It's also a very simplified example: headlines would need more than the
title string (todo keyword, priority, tags) before starting the body.
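
For instance, the extra headline fields could be split off the headline
line before the body is parsed. The following Python sketch is only an
illustration: parse_headline and HEADLINE are hypothetical names, and it
assumes the TODO keywords are exactly TODO and DONE, whereas in Org they
are user-configurable.

```python
import re

# Illustrative sketch only. Assumes the keywords TODO/DONE, a priority
# cookie of the form [#A], and trailing tags of the form :tag1:tag2:.
HEADLINE = re.compile(
    r"^(?P<stars>\*+)\s+"
    r"(?:(?P<todo>TODO|DONE)\s+)?"
    r"(?:\[#(?P<priority>[A-Z])\]\s+)?"
    r"(?P<title>.*?)"
    r"(?:\s+(?P<tags>:[\w@:]+:))?\s*$"
)


def parse_headline(line):
    """Split a headline line into level, todo, priority, title and tags."""
    match = HEADLINE.match(line)
    if match is None:
        return None
    tags = match.group("tags")
    return {
        "level": len(match.group("stars")),
        "todo": match.group("todo"),
        "priority": match.group("priority"),
        "title": match.group("title"),
        "tags": tags.strip(":").split(":") if tags else [],
    }


print(parse_headline("** TODO [#A] Write parser :export:dev:"))
print(parse_headline("* First section"))
```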

I have no code to offer at the moment, and, as we all know, the devil is
in the details. But if the output from org-exp.el is clear, exporters
will be more coherent. It could even provide tools to help exporters do
their task (a function to extract footnotes from the output, for
example).
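
Such a helper would be a few lines once the output is a tree. A hedged
Python sketch, assuming nodes are tuples of the form (type, child, ...)
with plain strings as leaves; collect_footnotes is a hypothetical name,
not an existing Org function.

```python
# Illustrative sketch only: walk the parsed tree and gather every
# (footnote ...) node in document order.

def collect_footnotes(node, found=None):
    """Return the list of footnote nodes found under NODE."""
    if found is None:
        found = []
    if isinstance(node, tuple):
        node_type, *children = node
        if node_type == "footnote":
            found.append(node)
        for child in children:
            collect_footnotes(child, found)
    return found


section = ("headline", "Second section",
           ("paragraph", "Text with footnote",
            ("footnote", "Footnote definition"),
            "."))
print(collect_footnotes(section))
```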

Again, it may be a big task to undertake, but I think it will be
necessary at some point.

Regards,

-- 
Nicolas