emacs-orgmode

Feedback on Org syntax document


From: Ihor Radchenko
Subject: Feedback on Org syntax document
Date: Sun, 06 Nov 2022 07:28:05 +0000

Hi,

It has been a while since I looked into the Org syntax document at
https://orgmode.org/worg/dev/org-syntax.html. So, I am now ready again
to look at it and comment on things with a fresh eye.

Below, I am listing my further feedback on the document.
I encourage other Org users to look into it as well.
This is one of the big projects that need to be finished as a
prerequisite for registering Org format in RFC:

Submit an IETF RFC to register Org as a MIME type
https://list.orgmode.org/87r1qt9cf0.fsf@gnu.org/

I also answered all the notes. I think that we should remove them for
good and instead spin off email threads from here, if necessary.

-------------

1. Introduction
> Should markdown be mentioned at all?
I see no problem with it.

> 2.2. The minimal and standard sets of objects

> excluding citation references and table cells.

"citation references" could be a link to the relevant 4.6 section.

> 2.4. Indentation

We can mention that the Org parser discards common indentation.
For example

   This paragraph will not contain
   a long sequence of spaces before "a".

Also,

   This paragraph does not have leading spaces according to the parser.

This feature is also important in src blocks for whitespace-sensitive
programming languages.
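
As a rough illustration of what discarding common indentation means,
here is a sketch using Python's textwrap.dedent. It only mimics the
idea; it is not how org-element implements it, and the function name
is made up for this example.

```python
import textwrap


def strip_common_indentation(text: str) -> str:
    """Remove the leading whitespace shared by all non-blank lines."""
    return textwrap.dedent(text)


paragraph = (
    "   This paragraph will not contain\n"
    '   a long sequence of spaces before "a".\n'
)
print(strip_common_indentation(paragraph), end="")

# Indentation beyond the common prefix is kept, which is what matters
# for whitespace-sensitive code in src blocks.
block = "    def f():\n        return 1\n"
print(strip_common_indentation(block), end="")
```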

> 3.1.1. Headings
> TITLE (optional)
> A series of objects from the standard set, excluding line break
> objects. It is matched after every other part.

"after every other part" is a bit confusing. I would name what exactly
it is matched after (KEYWORD and PRIORITY).

Also, at least one space is mandatory after STARS.
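
The mandatory space can be illustrated with a small sketch. The regexp
and the function name below are assumptions made for this example, not
the pattern the Org parser actually uses.

```python
import re

# STARS at the beginning of the line, followed by at least one space.
HEADING_RE = re.compile(r"^(\*+) ")


def heading_level(line: str):
    """Return the number of stars if LINE looks like a heading, else None."""
    match = HEADING_RE.match(line)
    return len(match.group(1)) if match else None


print(heading_level("* Heading"))       # 1
print(heading_level("** Sub heading"))  # 2
print(heading_level("*bold* text"))     # None: no space after the star
```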

> If the TITLE of a heading is exactly the value of org-footnote-section
> (Footnotes by default)

I think that we can clarify a bit about using lisp variables across the
document in 2. Terminology and conventions.

> All content following a heading — up to either the next heading, or
> the end of the document, forms a section contained by the heading.
> This is optional, as the next heading may occur immediately in which
> case no section is formed.

Note that blank lines between a heading and the first element that
follows it are not included in the section.

In particular, this means that

* Heading without section, but with blank lines


* Another heading with section

This is a section. It includes everything from "This is" down to "Last
heading", including the trailing blank lines.

* Last heading
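
The rule illustrated by the headings above can be sketched as follows.
This is a simplified model under my own assumptions (headings are
recognized only by "stars plus space", and the function names are
hypothetical); it is not the real parser. Text before the first
heading is handled the same way and is reported with heading None.

```python
import re

HEADING_RE = re.compile(r"^\*+ ")


def _section(body):
    """Drop leading blank lines; return None when nothing is left."""
    start = 0
    while start < len(body) and not body[start].strip():
        start += 1
    return body[start:] or None


def split_sections(lines):
    """Return a list of (heading, section) pairs for LINES."""
    result = []
    heading, body = None, []
    for line in lines:
        if HEADING_RE.match(line):
            section = _section(body)
            if heading is not None or section is not None:
                result.append((heading, section))
            heading, body = line, []
        else:
            body.append(line)
    section = _section(body)
    if heading is not None or section is not None:
        result.append((heading, section))
    return result


doc = ["* A", "", "", "* B", "", "This is a section.", "", "* C"]
for heading, section in split_sections(doc):
    print(heading, section)
```

In this model heading "* A" gets no section, while the section of
"* B" keeps its trailing blank line.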


Top-section follows the same rule:

---------

Paragraph after blank lines after bob. The parent section starts at
"Paragraph".

> Since sections are usually thought of as a larger group that includes
> nested content (e.g. “section 3”), and this isn’t what Org sections
> are, maybe this should be called something slightly different?

The Org manual indeed uses "section" for a larger group, in contrast to
the parser.

However, it is difficult to rename section element in Org code for
backwards compatibility reasons. I do not see any easy way to rename
sections, unfortunately.

Thus, I am inclined to keep "section" in the Org parser's sense within
the syntax document, possibly adding a visible disclaimer to the
document.

> 3.2.1. Greater Blocks
> A collection of zero or more elements, subject to two conditions:
>   No line may start with #+end_NAME.

There is just one condition.

> 3.2.4. Footnote Definitions
> [fn:LABEL] CONTENTS
> LABEL
> Either a number or an instance of the pattern fn:WORD, where WORD
> represents a string consisting of word-constituent characters, hyphens
> and underscores (-_).

This is a bit misleading. I am reading this as "LABEL .. an instance of
... fn:WORD ...", which implies [fn:fn:WORD].
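
To make the intended reading explicit, here is a sketch where LABEL is
only the part after "fn:". The regexp is my assumption for this
example; in particular, Python's \w only approximates "word-constituent
characters", which in Emacs depend on the syntax table.

```python
import re

# [fn:LABEL] at the beginning of the line; LABEL is a number or a WORD
# made of word characters, hyphens and underscores.
FOOTNOTE_DEF_RE = re.compile(r"^\[fn:([\w-]+)\]")


def footnote_label(line: str):
    """Return LABEL of a footnote definition line, or None."""
    match = FOOTNOTE_DEF_RE.match(line)
    return match.group(1) if match else None


print(footnote_label("[fn:1] A numbered footnote."))   # 1
print(footnote_label("[fn:my-note] A named one."))     # my-note
print(footnote_label("[fn:fn:word] Not a definition")) # None
```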

> 3.2.5. Inlinetasks
> Urgh, this syntax is ugly. — Tom G, Timothy

Oh, well. This note is not very useful. Let's remove it. You'd better
open a feature request.

> 3.2.6. Items
> TAG (optional)
> ... does not contain the substring "\nbsp{}::\nbsp{}"

What is \nbsp? Something is likely wrong with Org source formatting.

> 3.2.7. Plain Lists
> At a glance it may appear as though nested lists are not possible.
> They are, as items may themselves contain lists.

I am stumbling upon this wording. Maybe

  Note that item elements can contain nested plain list elements.

> if both types are present consecutively then they parse as separate
> lists.

"are parsed"?

> (ordered-plain-list
> (item)
> (item
>  (descriptive-plain-list
>   (item))))

This is wrong. It should be

(ordered-plain-list
 (item
   (paragraph))
 (item
  (paragraph)
  (descriptive-plain-list
   (item
     (paragraph)))))

> The failure mode for malformed contents needs to be determined more
> clearly here. We don’t want property draws to suddenly become plain
> drawers just because a user has a malformed line, that could be
> disastrous if certain settings in the property drawer mask settings
> from further up the tree. In short, malformed contents should not
> poison the whole property drawer. — Tom G

Yet, malformed property drawers do become ordinary drawers. If we want
to do something about this, let's discuss in a separate thread.

> Example

> :PROPERTIES:
> :CUSTOM_ID: someid
> :END:

This example does not include a heading, which might be misleading.

Also, it is a good idea to mention top-level property drawer and provide
examples.

> Maybe drop table.el from the spec?

No.

> Can we drop switch support? This seems like a fairly good idea. The 
> functionality can simply be shifted to ARGUMENTS with the well-established 
> :key val forms.
> “For the love of all that is sane” — Tom G

I believe that it is a good idea to drop switches _from syntax document_.
For the Org parser, we should first deprecate them.

> 3.3.4. Planning

> Tom G has requested adding a OPENED keyword to track task
> creation/registration.

Let's discuss it in a separate thread.

> 3.3.8. Keywords

> Perhaps this should be changed to be #+KEY[OPT]: VAL? It would make
> the syntax more regular, considering affiliated keywords. I can’t see
> any backwards compatibility concerns.
> This was suggested by Tom G, but I’m a fan — Timothy.

I think that it is a good idea. Also, see my comment on affiliated
keywords below.

> 3.3.8. Keywords
> Note that while instances of this pattern are preferentially parsed as
> affiliated keywords

Affiliated keywords are described later, making this paragraph hard to
digest. Maybe we can restructure this section to describe special
keywords (affiliated and call) first?

> Should this be distinguished from other keywords at the AST
> interpretation stage, instead of the base syntax? — Tom G

I am not sure if I understand the issue. The AST has a special
babel-call element.

> Repeating an affiliated keyword before an element will usually result
> in the prior VALUEs being overwritten by the last instance of KEY. The
> sole exception to this is #+header: keywords, where in the case of
> multiple :opt val declarations the last declaration on the first line
> it occurs on has priority.

This is not accurate.
We may instead follow `org-element--collect-affiliated-keywords' and describe
`org-element-keyword-translation-alist', `org-element-parsed-keywords',
`org-element-dual-keywords', and `org-element-multiple-keywords'.

However, I feel like this level of detail is probably too much for
syntax description. If we describe these details, we will restrict
ourselves from possible future syntax extensions. Moreover, merging
certain keywords in AST means that `org-element-interpret-data' simply
cannot recover the original document structure.

I will create a separate thread detailing some ideas on what we may
change in this area.

> 3.3.12. Table Rows
> Table rows can only exist in tables.

Only in tables of the Org type (not in table.el tables).

> 4.1. Entities
> It’s been raised that “{}” is really part of the entity, and so probably 
> shouldn’t be considered part of POST — Timothy.

Yes. Please, fix the entity syntax description.

> 4.2. LaTeX Fragments
> It would introduce incompatibilities with previous Org versions, but
> support for $...$ (and for symmetry, $$...$$) constructs ought to be
> removed.

Let's discuss in a separate thread.

> 4.5. Citations

>  and it does not contain any semicolons (;) or subsequence that
>  matches @KEY.

I think that we need to add \semicolon and \at entities to allow
escaping.

> 4.9. Line Breaks
> SPACE
> Zero or more tab and space characters.

Note that pretty much every object includes trailing whitespace. We
should probably mention that.

Also, \\\ is not a line break, so we need to provide a PRE\\SPACE
pattern in the description, with PRE being any character other than "\".
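
A sketch of the PRE\\SPACE pattern as a Python regexp. The regexp and
the function name are assumptions for this example; PRE is modelled
with a negative lookbehind, so a line consisting only of \\ also
matches.

```python
import re

# Two backslashes not preceded by a backslash (PRE), followed by
# optional spaces/tabs (SPACE) and the end of the line.
LINE_BREAK_RE = re.compile(r"(?<!\\)\\\\[ \t]*$")


def ends_with_line_break(line: str) -> bool:
    """Return True if LINE ends with a line break object."""
    return LINE_BREAK_RE.search(line) is not None


two = "\\" * 2
three = "\\" * 3
print(ends_with_line_break("text " + two))         # True
print(ends_with_line_break("text " + two + "  "))  # True
print(ends_with_line_break("text " + three))       # False
```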

> 4.10.1. Radio Links
> Is the raw (unparsed) text or the parsed structure matched with radio links?

Unparsed text cannot contain objects.

> 4.13. Statistics Cookies
> A number.

Positive integer.

> 4.15. Table Cells
> The final vertical bar (|) may be omitted in the last cell of a table
> row.

I think that it will be clearer to define cells as

CONTENTS SPACES|
CONTENTS SPACES EOL
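
The two forms can be sketched with one regexp. This is an illustration
under my own assumptions (the names are hypothetical, the row is
expected to start with "|", and surrounding whitespace is dropped from
CONTENTS); it is not the parser's code.

```python
import re

# CONTENTS SPACES followed by either "|" or the end of the line.
CELL_RE = re.compile(r"[ \t]*(.*?)[ \t]*(?:\||$)")


def table_cells(row: str):
    """Split an Org table ROW into the contents of its cells."""
    row = row.strip()
    if not row.startswith("|"):
        raise ValueError("not a table row")
    rest = row[1:]
    cells = []
    pos = 0
    while pos < len(rest):
        match = CELL_RE.match(rest, pos)
        cells.append(match.group(1))
        pos = match.end()
    return cells


print(table_cells("| a | b |"))  # ['a', 'b']
print(table_cells("| a | b"))    # ['a', 'b']: final "|" omitted
```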

> 4.16. Timestamps
> TIME (optional)
> An instance of the pattern H:MM where H represents a one to two digit number 
> (and can start with 0), and M represents a single digit.
> Tom G has some syntax extensions he’d like to suggest for historical /
> far-future dates, timezone offsets, and second/sub-second times.

I agree. We can allow non-whitespace after H:MM for forward-compatibility.

In particular, I have seen 9:34.05+000 to define seconds and time zone.
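
One possible reading of "allow non-whitespace after H:MM", sketched as
a regexp. The pattern and the names are assumptions for this example
only. Note that it would also swallow the end of a time range such as
9:00-10:00, so a real grammar would need to handle ranges first.

```python
import re

# H:MM followed by an optional, uninterpreted non-whitespace suffix.
TIME_RE = re.compile(r"(\d{1,2}):(\d{2})(\S*)")


def parse_time(token: str):
    """Return (hour, minute, suffix) for TOKEN, or None if it is not H:MM."""
    match = TIME_RE.fullmatch(token)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


print(parse_time("9:34.05+000"))  # (9, 34, '.05+000')
print(parse_time("09:05"))        # (9, 5, '')
print(parse_time("9:3"))          # None
```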

> Summary of changes compared to the current org-syntax document

Should probably be removed at this point.

>  Org Entities

> Idot
&idot; ??

> shy
Empty HTML representation

> WHITESPACE

That does not look helpful in HTML.

> zwnj
same

> =_ =
???

Also, footnote 12 looks wrong.


-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


