emacs-orgmode

Re: Thoughts on the standardization of Org


From: Asa Zeren
Subject: Re: Thoughts on the standardization of Org
Date: Sun, 1 Nov 2020 11:03:03 -0500

Thanks for the comments.

Both of you have raised some very good points, but I think that there has been
some confusion as to a number of my arguments. I hope to clarify some things
below.

On Sun Nov. 1, 2020, at 1:20AM Tom Gillespie <tgbugs@gmail.com> wrote:
> My general take is that any active work toward standardization
> would be premature. At the very least a full implementation outside
> of Emacs would need to exist. In the absence of that there is little
> point to standardization. There is ample existing documentation to
> build a compliant parser (pandoc exists as well ...) and any effort
> toward standardization right now would be better spent improving
> the existing implementation or fixing broken ones (e.g. org-ruby).

This could very well be the case. When to create a formal standard is a very
hard question, and there are lots of reasons for it to be too early.

One point I do think needs to be clarified is the extent of a "full
implementation". I don't think that a full editing environment like the one that
exists in Emacs today needs to exist, only a fully functional export
framework. This would require it to understand the full org syntax and
semantics.

Also, part of the reason I wrote my original thoughts is that I observed some
motivation toward standardization as part of the MIME type effort.

> From your comments, I would suggest reading through
> https://orgmode.org/worg/dev/org-syntax.html if you have not done so
> already. Much of what you mention is already there.

I did read it, and I have just read it again. While I confess that I made some
terminology mistakes, most of my points still stand after that second reading.

> If something like standardization is still desired, I would suggest that the
> proper framing for any such activities would be as improvement and
> clarification in the documentation, and potentially as formalization of some
> of the existing behaviors of the system. Org is a fairly stable system, and as
> others have said, explicitly leaving things open and unspecified would be
> vital.  There are also parts of org (e.g. babel) where the behavior needs to
> be regularized and made consistent. At the moment those areas need
> contributors, not standardization.

I do agree that this is the right method of creating the standard. Org-mode is a
very large beast to standardize, and it can only be done incrementally, or it is
doomed to fail.

> On Sat, Oct 31, 2020 at 8:22 PM Asa Zeren <asaizeren@gmail.com> wrote:
> > this is impossible. If org catches on before it is standardized, we end up
> > in the situation of Markdown, with many competing standards and
> > non-standards. Hence, standardization is essential.
> The situation for Org is not comparable to markdown. There is a single
> reference implementation for org at the moment. The codebase is massive. There
> are many existing parsers for org files. Many are obviously broken since they
> do not match the reference implementation's behavior. The obviousness is a
> sign that there is not a need for standardization at this time. Further, there
> is little risk that another impl will be created without interoperating with
> the elisp implementation. For example, consider Mauro's use case: being able
> to get colleagues who do not use Emacs to use Org. I suspect most of the
> people who would be working on other implementations would be starting from
> Emacs and would be unlikely to leave. Also unlike markdown, html export is
> just one tiny part of Org, whereas markdown was implemented repeatedly to
> allow text input on web pages where people needed to implement parts of html
> that had not already been specified in markdown.

I agree that this is the current situation. However, there is a real danger
here. People are continually trying to create org implementations (myself
included), and if one of these becomes successful before an Org standard exists,
and it differs from the original elisp implementation in ways that are
non-trivial to fix, then we have a problem. Perhaps this will not come to pass,
and other implementations should strive for parity, but it is still a danger.

> Standardizing org is much harder than standardizing something like Markdown,
> but I think that breaking it down as follows will maximize the portability of
> org while not compromising on development of org. See some of my other
> recent emails. In the short term this is impossible due to the deep
> dependence on Emacs Lisp. Any outside implementation that is created today
> would have to implement elisp. Few have been able to do this in over 30
> years. Moving beyond elisp requires additional machinery to be added to
> org to be able to specify other top level languages. This is not something
> that is remotely ready for standardization because no one even has a single
> working implementation yet!

I definitely agree that a deep dependence on Emacs Lisp should not be
standardized, and thus there are certain parts of the current org-mode
implementation that cannot currently be specified. However, there are still
areas of org that /can/ be specified without elisp, and we should not hold back
standardizing those just because other parts cannot be specified yet.

> > I see three areas of standardization, which I think should be standardized
> > separately:
> > - Org DOM
> No. This is an implementation detail (see below for more).
> ...
> Depending on exactly what you mean by DOM this does not need to be
> standardized.  There are a couple of points that need to be clarified
> regarding how to treeify the flat list of elements that come out of a parse in
> order to tie things like associated keywords to the correct elements, but
> these are quite minimal. The potential rat's nest of trying to standardize
> a DOM when it is an implementation detail means that I would strongly
> discourage even thinking about Org in that way. I would even discourage
> putting too much emphasis on the org-element api which, while extremely useful
> inside Emacs, is not something that should be standardized because it is a
> detail peculiar to the elisp implementation.

I think that my use of the word DOM has been very confusing here. I definitely
agree that we should not standardize the org-element API, nor the particular way
syntax nodes are represented in elisp. However, what I do think we should
standardize is the abstract tree representation of an org document. For
example, elements vs. objects, the idea of nested headlines, etc. would be
specified in the DOM, separately from how to write them. To clarify, this is
the kind of thing I mean, as it would be specified for HTML (though Org would
necessarily be a bit more complicated):

    A document is a node. Nodes are either text nodes, which contain only text,
    or they are normal nodes (what's the real name for them?). If they are
    normal nodes, they have a tag, which is text, and a number of attributes
    that have a text key and optionally a text value. Each normal node contains
    an ordered list of children nodes.

Note also that the Worg syntax document already begins to specify this; my
point is to move it into a separate document.

> To the extent that an element tree could be useful, I think it would be as a
> concept in an implementation guide, not as something formally specified.

You may be right about this. Perhaps a formal standard is unnecessary for this.

> > - Org Standard Environments
> Read https://orgmode.org/worg/dev/org-syntax.html. It will get you up to speed
> with the existing terminology that is used in the community.
> ...
> > Org Standard Environments:
> > This is how I would specify elements such as #+begin_src..#+end_src: as
> > standardized elements of the environment. This would be structured as a
> > number of individual standard environments, such as "Source Blocks" or
> > "Standard Header Properties" (specifying #+title, #+author, etc.)
> These are well specified already in the
> worg syntax draft. There are a couple of special cases such as src and example
> blocks that could be included explicitly in the syntax to facilitate
> interoperability with parsers for org babel languages. Beyond that, the
> community already has vocabulary that covers what you describe here, as
> mentioned above.

I think I was unclear. I am discussing /how/ they are specified, not their
content, which, as you say, is already specified in the Worg document. Perhaps
the best way to illustrate my idea is with an example.

Worg said:
> Affiliated Keywords
>
> With the exception of comment, clocks, headlines, inlinetasks, items, node
> properties, planning, property drawers, sections, and table rows, every other
> element type can be assigned attributes.
>
> This is done by adding specific keywords, named “affiliated keywords”, just
> above the element considered, no blank line allowed.
>
> Affiliated keywords are built upon one of the following patterns:
>
> #+KEY: VALUE
> #+KEY[OPTIONAL]: VALUE
> #+ATTR_BACKEND: VALUE
>
> KEY is either “CAPTION”, “HEADER”, “NAME”, “PLOT” or “RESULTS” string.
>
> BACKEND is a string constituted of alpha-numeric characters, hyphens or
> underscores.
>
> OPTIONAL and VALUE can contain any character but a new line. Only “CAPTION”
> and “RESULTS” keywords can have an optional value.
>
> An affiliated keyword can appear more than once if KEY is either “CAPTION” or
> “HEADER” or if its pattern is “#+ATTR_BACKEND: VALUE”.
>
> “CAPTION”, “AUTHOR”, “DATE” and “TITLE” keywords can contain objects in their
> value and their optional value, if applicable.

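The three patterns in the Worg text above are regular enough to recognize line by line. The following is a minimal Python sketch of that, not the parser Org itself uses: all names are hypothetical, keys are assumed to match case-insensitively (as Org does in practice), and it deliberately ignores the repetition rule, the parsing of objects inside values, and the "directly above the element, no blank line" attachment rule.

```python
import re
from typing import NamedTuple, Optional

# KEY values listed in the Worg draft quoted above.
AFFILIATED_KEYS = {"CAPTION", "HEADER", "NAME", "PLOT", "RESULTS"}
# Only these keys may carry an [OPTIONAL] part.
DUAL_KEYS = {"CAPTION", "RESULTS"}

# Matches #+KEY: VALUE, #+KEY[OPTIONAL]: VALUE and #+ATTR_BACKEND: VALUE.
# Assumption: keys are case-insensitive.
AFFILIATED_RE = re.compile(
    r"#\+(?:ATTR_(?P<backend>[A-Za-z0-9_-]+)"
    r"|(?P<key>[A-Za-z]+)(?:\[(?P<optional>[^\n]*)\])?)"
    r":[ \t]*(?P<value>[^\n]*)",
    re.IGNORECASE,
)


class AffiliatedKeyword(NamedTuple):
    key: str
    optional: Optional[str]
    value: str


def parse_affiliated(line: str) -> Optional[AffiliatedKeyword]:
    """Return the parsed keyword, or None if `line` is not an affiliated keyword."""
    m = AFFILIATED_RE.fullmatch(line)
    if m is None:
        return None
    if m.group("backend") is not None:
        # #+ATTR_BACKEND: VALUE never takes an OPTIONAL part.
        return AffiliatedKeyword("ATTR_" + m.group("backend"), None, m.group("value"))
    key = m.group("key").upper()
    optional = m.group("optional")
    if key not in AFFILIATED_KEYS:
        return None  # e.g. #+TITLE is a keyword, but not an affiliated one
    if optional is not None and key not in DUAL_KEYS:
        return None  # only CAPTION and RESULTS may have [OPTIONAL]
    return AffiliatedKeyword(key, optional, m.group("value"))


print(parse_affiliated("#+CAPTION[Short]: A long caption"))
# AffiliatedKeyword(key='CAPTION', optional='Short', value='A long caption')
```

Note how the list of legal keys and the "dual" keys are hard-coded here; that is exactly the part the proposal below would move out into separate environments.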
Here is how I envision that section written in a standardized form:
> Affiliated Keywords
>
> With the exception of comment, clocks, headlines, inlinetasks, items, node
> properties, planning, property drawers, sections, and table rows, every other
> element type can be assigned attributes.
>
> This is done by adding specific keywords, named “affiliated keywords”, just
> above the element considered, no blank line allowed.
>
> Affiliated keywords are built upon one of the following patterns:
>
> #+KEY: VALUE
> #+KEY[OPTIONAL]: VALUE
>
> OPTIONAL and VALUE can contain any character but a new line.
>
> An environment specifies a number of legal KEYs, and for each one must
> specify the following:
> - the structure of VALUE and OPTIONAL
> - whether OPTIONAL is permitted
> - whether the keyword can be repeated multiple times on a single element
>
> ...
>
> Org Standard Environment #42: Backend Attributes
>
> Affiliated keywords whose KEY begins with =ATTR_=, followed by a string
> BACKEND, which must consist of alphanumeric characters, hyphens, or
> underscores, are defined. OPTIONAL is not permitted. Multiple occurrences of
> the keyword are permitted. The structure of VALUE is determined by the export
> backend specified by BACKEND.
>
> These should be used to give additional information to an export backend
> identified by BACKEND.
>
> Org Standard Environment #314: Captioning
>
> The affiliated keywords "CAPTION", "AUTHOR", "DATE", and "TITLE" are defined.
> OPTIONAL is permitted in CAPTION. CAPTION may appear multiple times on a
> single element.
> OPTIONAL is not permitted in AUTHOR, DATE, or TITLE. These may not appear
> multiple times on a single element.
>
> For CAPTION, AUTHOR, DATE, and TITLE, objects may appear in VALUE and
> OPTIONAL (if applicable).

I hope that this example explains what I mean better.
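To show that the split is mechanical, here is a small Python sketch in which each environment answers, for a KEY, the three questions the generic section requires: the structure of VALUE/OPTIONAL, whether OPTIONAL is permitted, and whether the key may repeat. Everything here is hypothetical: the environment numbers come from the example above, and `KeywordSpec`, `check_element`, and the other names are invented for illustration only.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class KeywordSpec:
    """What an environment must state about each KEY it defines."""
    value_structure: str    # how VALUE (and OPTIONAL) are structured
    optional_allowed: bool  # is #+KEY[OPTIONAL]: VALUE permitted?
    repeatable: bool        # may KEY appear more than once on one element?


# An environment maps a KEY to its spec, or to None if it does not define KEY.
Environment = Callable[[str], Optional[KeywordSpec]]

# Hypothetical "Environment #314: Captioning", transcribed from the example.
_CAPTIONING = {
    "CAPTION": KeywordSpec("objects", optional_allowed=True, repeatable=True),
    "AUTHOR": KeywordSpec("objects", optional_allowed=False, repeatable=False),
    "DATE": KeywordSpec("objects", optional_allowed=False, repeatable=False),
    "TITLE": KeywordSpec("objects", optional_allowed=False, repeatable=False),
}
captioning: Environment = _CAPTIONING.get


def backend_attributes(key: str) -> Optional[KeywordSpec]:
    """Hypothetical "Environment #42: Backend Attributes" (any ATTR_<BACKEND>)."""
    if re.fullmatch(r"ATTR_[A-Za-z0-9_-]+", key):
        return KeywordSpec("defined by BACKEND", optional_allowed=False, repeatable=True)
    return None


def lookup(key: str, environments: List[Environment]) -> Optional[KeywordSpec]:
    """Return the spec from the first enabled environment that defines KEY."""
    for env in environments:
        spec = env(key)
        if spec is not None:
            return spec
    return None


def check_element(keywords: List[Tuple[str, Optional[str], str]],
                  environments: List[Environment]) -> List[str]:
    """Check the (KEY, OPTIONAL, VALUE) affiliated keywords of one element."""
    problems: List[str] = []
    counts: Counter = Counter()
    for key, optional, _value in keywords:
        spec = lookup(key, environments)
        if spec is None:
            problems.append(f"{key}: not defined by any enabled environment")
            continue
        if optional is not None and not spec.optional_allowed:
            problems.append(f"{key}: OPTIONAL not permitted")
        counts[key] += 1
        if counts[key] == 2 and not spec.repeatable:
            problems.append(f"{key}: may not be repeated")
    return problems


envs = [captioning, backend_attributes]
print(check_element([("CAPTION", "Short", "Long"), ("ATTR_HTML", None, ":width 50%")], envs))
# []
```

Because environments are just lookups, an implementation could claim support for some of them (say #42 but not #314) and still agree with every other implementation on the generic rules.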

Dr. Arne Babenhauserheide said:
> I would like to add that this is pretty easy to do, and also to make
> independent of the user's Emacs environment. Here is an example that
> uses the whole orgmode-babel-latex-html machinery to create derived
> documents from source-of-truth org-mode files which get exported to a
> book:

Yes. Emacs can definitely be used in this way. However, I do not believe that it
should be the only tool that can be used in this way, even if no other tool
exists at present.



I hope I have clarified some of the confusions surrounding my argument.

Thanks, Asa


