Re: Some commentary on the Org Syntax document


From: Tom Gillespie
Subject: Re: Some commentary on the Org Syntax document
Date: Fri, 3 Dec 2021 21:26:23 -0800

Hi Timothy,
   Replies in line. Some things might seem a bit out of order
because I responded from bottom to top. Best,
Tom

> from heading to bed, so to quote Pascal "I have only made this letter
> longer because I have not had the time to make it shorter".

Likewise, and I've heard it as Mark Twain :D

> I think a big problem is the mix of implicit and explicit information.
> Some components are rigorously specified in terms of the characters they
> may contain, elements and objects that are recognised inside them, and
> even the order in which different parts of the pattern are parsed.

I agree completely.

> As mentioned originally, the current Dynamic Blocks description doesn't
> even mention the CONTENTS part of the pattern, and relies on the reader
> inferring that it operates similarly to the CONTENTS part of Drawers.

Indeed this should be fixed.

> Forcing the reader to start making inferences like this is a treacherous
> path, and I think I can blame it for some of the other issues I've
> experienced. Take for instance the "surely X can't contain a newline?"
> comments I've made. In the Node Properties and Entities descriptions you
> have statements along the lines of "X can contain any character [...]
> except a newline". In my mind this then sets up the reader to interpret
> a similar statement without the "except a newline" clause to mean that
> newlines are permitted.

I agree completely and had almost the exact same experience as you
when I was working on it. As I mention below, my responses were to
illustrate why the explicit information is missing, not to suggest that it
should be left out. We should definitely work to make everything more
explicit so that future readers don't have to go through the same issues
we have.

> I'm also thinking that the term "element" is overworked in the document.
> It's basically pulling triple duty: you have Elements, Greater
> Elements, and elements which are Elements and/or Greater Elements 😓.

In extreme agreement.

> 3. Section

Technically this isn't part of the syntax; rather, it is part of
elisp Org mode's internal representation. I'm not sure I would
even mention sections at all, because they have to do with
the interpretation of the syntax. In a section on the internal
representation of Org, sections definitely belong, but they
are incidental to the syntax itself. That said, I suspect we will
find that they are useful for talking about the behavior of the file
under transformation, e.g. "headings are not reordered when pressing
M-up or M-down, sections are reordered". This lets us talk about
an Org implementation that has commands that allow one to switch
the headings without moving their associated sections.
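To make the heading/section distinction concrete, here is a minimal
sketch. It assumes a toy document model (a flat list of heading plus
section pairs); the names Entry, move_up and swap_headings_only are
hypothetical and are not part of org-element or laundry.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    heading: str
    section: str  # body text that follows the heading


def move_up(entries, i):
    """Move entry i up together with its section (M-up-like behavior).

    Assumes 1 <= i < len(entries).
    """
    out = list(entries)
    out[i - 1], out[i] = out[i], out[i - 1]
    return out


def swap_headings_only(entries, i):
    """Hypothetical command: swap headings i-1 and i, sections stay put."""
    out = list(entries)
    out[i - 1] = Entry(entries[i].heading, entries[i - 1].section)
    out[i] = Entry(entries[i - 1].heading, entries[i].section)
    return out


doc = [Entry("A", "a body"), Entry("B", "b body")]
moved = move_up(doc, 1)
swapped = swap_headings_only(doc, 1)
```

Only the second command needs the word "section" to describe what it
leaves in place, which is the reason the term stays useful even if it
is not part of the surface syntax.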

> 5. (Greater Element / Element)

There are issues here with forms that are part of the syntax vs
forms that are part of the intermediate representation. A line
based parser for Org syntax that assembles greater blocks
after the fact and a parser that uses arbitrary lookahead to
truncate on headings won't have the exact same surface
syntax; however, they will both have an equivalent in their
intermediate representation that corresponds to a greater
block. Again, very deep in implementation details here,
but trying to force things like sections into the syntax
hierarchy seems confusing to me.

> 7. Object

Paragraph element maybe? Might seem odd for heading titles
to have paragraph scope, but on the other hand it certainly
simplifies the explanation of the grammar. And you can put
an inline footnote in a heading title.

> 8. Pattern / Form

Don't know what to make of this one. Like "Term" these are
incredibly generic.

> 9. Term

Use of "Term" is super confusing to me.

> We could call (1) Components, (7) Units, (6) Objects, (5) Element or
> Object (why not spell it out to avoid telling people to remember
> something).

I'm not sure we are ready to specify this. One way that we
might try to manage this would be to create a taxonomy of
element types, e.g. top-level elements, paragraph elements,
etc. This would be consistent with the fact that the elisp
implementation of org-element has all of these as an instance
of element.

> I could have put more thought into this, but it should do for
> illustrating my line of thinking. Let me know if you have any good
> ideas.

Let's leave the terminology as is right now. I'm expecting that there
will be quite a few new terms that we will want to introduce and we
will want to separate syntax and intermediate representation.

With progress on using org-element for fontification and on laundry
we should be able to come up with language that can be used to
distinguish between concepts that are needed for syntax (tokens,
parser) and for intermediate representations. Things like basic syntax
highlighting need only the language for syntax to be specified, but more
complex syntax such as babel font-locking either requires a more
advanced tokenizer or it requires that we talk about it at the level
of the intermediate representation. Other things such as behavior
in response to commands (e.g. M-up and M-down mentioned
above) require the language of the intermediate representation.

> A separate improvement could be using more formatting to distinguish
> when terms are used in a particular way.

I think it will be clearer to come up with distinct terms. There are
times where this stuff has to be talked about in spoken language
and it is hard to speak /*_markup_*/.

> I've sort of covered this before, but I think the document would benefit
> from being more explicit in general.

Yes. The reason I brought this up was to indicate the reason why
an explicit account was not present, not to suggest that we shouldn't
add one. Overall the more explicit we can be the better the document.
I have some stashed changes in worg from the time I was reading this
syntax document deeply. I'll see if any of them are relevant for the pass
you are doing now.

> Specifically regarding newlines, perhaps we could add something like
> this to the start of the Objects section?
>
> "Furthermore, while many objects may contain newlines, an empty line
> (i.e. a double newline) often terminates the element that the object is
> a part of, such as a paragraph."

Good idea.
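A rough illustration of the proposed wording, assuming a heavily
simplified bold pattern (real Org emphasis has additional border
rules that are ignored here). The function bold_spans is hypothetical.
The text is split into paragraphs on empty lines first, so an object
may span a single newline but never an empty line.

```python
import re

# Simplified bold object: '*', non-space, anything, non-space, '*'.
# DOTALL lets the object contain single newlines.
BOLD = re.compile(r"\*(\S(?:.*?\S)?)\*", re.DOTALL)


def bold_spans(text):
    """Return the contents of bold objects, paragraph by paragraph."""
    # An empty line ends the paragraph, and with it any open object.
    paragraphs = re.split(r"\n[ \t]*\n", text)
    return [m.group(1) for p in paragraphs for m in BOLD.finditer(p)]


within_paragraph = bold_spans("a *bold\ntext* b")
across_empty_line = bold_spans("a *bold\n\ntext* b")
```

In the first call the markers sit in one paragraph and the object is
recognised; in the second the empty line separates them, so nothing
matches.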

> On this, I'm cautiously optimistic about the discussion about using
> org-element for fontification.

Likewise. Though I expect there will be some growing pains
based on the divergent behaviors I have seen while developing
the laundry test cases.

> I must thank you and Ihor for pointing me to
> org-element-object-restrictions! I wasn't aware of that till now, and
> it's most helpful. Should all the information given by it be included in
> the Syntax document? I lean towards saying yes.

I'm not entirely sure. I think this may be one area where we don't want
to over-specify. I consider it an implementation detail. For example,
when we were discussing valid scopes for org-cite syntax a few
months ago 
https://lists.gnu.org/archive/html/emacs-orgmode/2021-09/msg00128.html
I suggested that the [cite:] syntax could appear in property drawers.
Nicolas corrected me on that. However, there is no reason why a
parser should be prevented from recognizing [cite:] syntax wherever
it wants --- so long as it does not immediately expand that syntax
and execute it to add/include such a citation in the exported file.

For example, in laundry I would parse it and have it expand to a no-op
when exporting, but still have it expand for user interaction so that they
could jump to the citation reference by clicking in the buffer. Similar thing
for syntax in comment blocks, where I frequently abuse the fact that it is
possible to jump to Org links that are in comment blocks to make it easier
to navigate files.
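A minimal sketch of that split between recognition and expansion. This
is not how laundry or org-cite is implemented; the regexes are
simplified, and the context names and the functions find_citations and
export are assumptions made up for illustration.

```python
import re

# Simplified [cite:...] / [cite/style:...] recognizer.
CITE = re.compile(r"\[cite(?:/[^\]:]*)?:([^\]]*)\]")
KEY = re.compile(r"@([A-Za-z0-9_:-]+)")

# Hypothetical context names where export treats citations as a no-op.
NO_EXPORT_CONTEXTS = {"property-drawer", "comment-block"}


def find_citations(text):
    """Recognize citations in any context; return (start, end, keys)."""
    return [(m.start(), m.end(), KEY.findall(m.group(1)))
            for m in CITE.finditer(text)]


def export(text, context):
    """Expand citations only where the exporter allows it."""
    if context in NO_EXPORT_CONTEXTS:
        return text  # no-op: recognized by the parser, left untouched
    return CITE.sub(
        lambda m: "(" + "; ".join(KEY.findall(m.group(1))) + ")", text)


line = "see [cite:@doe2020]"
keys_found = [c[2] for c in find_citations(line)]
in_paragraph = export(line, "paragraph")
in_drawer = export(line, "property-drawer")
```

The parser side (find_citations) is context-free, so an editor could
still offer jump-to-reference everywhere; the restriction lives only
in the export step.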

In short, elisp Org mode doesn't have a single intermediate representation
atm, so syntactic restrictions listed by org-element-object-restrictions
are overly narrow and should not be included in the spec for the syntax
because they can be controlled at other levels of the implementation in
cases where there is a unified intermediate representation.

> I'm not sure this element = Element / Greater Element "shorthand" is
> doing us any favours, but I've discussed that already...

Agree. (see response above, I responded from bottom to top)
The object/element/greater-element/org-element/org-object
terminology is supremely confusing. We got the name for heading
updated (or are in the process of doing so?), but at some point I
think we should see if we can make this a bit less confusing. Too
many collisions when dropping a single qualifier.

> Is it? Perhaps I'm not doing it right but it didn't seem bad to me when
> implementing my parser (though I need to add the element support).

For a ... fun? time see the test case I cooked up for plain lists (linked
below) and then consider how to deal with cases where someone has
put a source block at some indent level. IIRC the suggested behavior
is to truncate leading whitespace to the #+end_src level. Tracking
the indentation level is required to correctly reassemble the nesting
of the lists, and this cannot be done during tokenization or during
parsing. As a result, indentation level must be retained for _all_
paragraphs because they might be preceded by a plain list line. Not
hard to implement, just a lot of things to keep track of, thus complex.

https://github.com/tgbugs/laundry/blame/c90700bd1c15d7b04e5ead44ac10005d8d2ada50/laundry/test.org#L70-L91
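The bookkeeping described above can be sketched as a pass that runs
after tokenization. This is a simplified illustration under stated
assumptions, not the laundry implementation: it only handles "-", "+"
and numbered bullets, skips blank lines, and ignores source blocks.
The names tokenize and nest are hypothetical.

```python
import re

# Bullet: optional indent, then "-", "+", "1." or "1)", then text.
ITEM = re.compile(r"^( *)([-+]|\d+[.)])\s+(.*)$")


def tokenize(line):
    """Return (indent, kind, text); indent is kept for every line."""
    m = ITEM.match(line)
    if m:
        return (len(m.group(1)), "item", m.group(3))
    stripped = line.lstrip(" ")
    return (len(line) - len(stripped), "para", stripped)


def nest(lines):
    """Assign each line a list depth (0 means outside any list)."""
    stack = []  # indents of the currently open list items
    out = []
    for line in lines:
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        indent, kind, text = tokenize(line)
        # Close every open item that is not strictly less indented.
        while stack and stack[-1] >= indent:
            stack.pop()
        if kind == "item":
            stack.append(indent)
        out.append((len(stack), kind, text))
    return out


sample = ["- a", "  - b", "    text in b", "  text in a", "text outside"]
result = nest(sample)
```

The paragraph lines only land at the right depth because their indent
survived tokenization; had it been stripped, "text in b" and
"text outside" would be indistinguishable.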