emacs-devel

Re: Tree-sitter api


From: Yuan Fu
Subject: Re: Tree-sitter api
Date: Thu, 16 Sep 2021 23:56:20 -0700

>> 
>> My point is, major mode writers need to read the source of the tree-sitter 
>> language definition to do anything useful with tree-sitter
> 
> If this is so, then why do we bother documenting the Lisp APIs for
> TS-related features?  If Lisp programmers need to read the TS sources
> to do anything useful in Emacs, let them read the sources, including
> the Lisp and C sources you are working on?
> 
> That was somewhat sarcastic, but my point is that this is NOT how we
> do this kind of stuff in Emacs.  We should have Lisp-level facilities
> that reflect the TS features, and those Lisp-level facilities should
> be documented and should be the ONLY thing a Lisp programmer needs to
> read to adapt his/her major mode to TS.  We should NOT assume that
> Lisp programmers read the TS source code, exactly like we don't assume
> that for other libraries, like GnuTLS, librsvg, or libgccjit.  Under
> that modus operandi, the way to glean the <lang> part from the major
> mode's language name is something that should be part of the
> facilities we provide.

Thank you for your patience. I certainly believe in documentation and put
considerable effort into it, and if it were possible to document things as you
described, I would do it. We have documentation for all the tree-sitter
features provided by Emacs and a bit more, but I don’t think it is possible to
document the language definitions. We can think of a language definition as the
BNF grammar for that language. How do you document that? Say, for the language
definition for Scheme below, how do we document it?

<token> --> <identifier> | <boolean> | <number>
     | <character> | <string>
     | ( | ) | #( | ' | ` | , | ,@ | .
<delimiter> --> <whitespace> | ( | ) | " | ;
<whitespace> --> <space or newline>
<comment> --> ;  <all subsequent characters up to a
                 line break>
...
<number> --> <num 2> | <num 8>
     | <num 10> | <num 16>
…

The language definition source of a tree-sitter language is basically that,
with some superfluous JavaScript syntax. A language definition is not a
mechanism but rather data: you can document a mechanism, but not really data.
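
To make the distinction concrete, here is a minimal sketch in Python (chosen only for brevity). It holds a fragment of the Scheme grammar above as a plain table and puts one generic function next to it. The names SCHEME_GRAMMAR and alternatives are hypothetical and are not part of Emacs or tree-sitter. The function can be documented once, while the table's contents differ for every language.

```python
# Hypothetical illustration: a fragment of the Scheme grammar above as data.
# Keys are nonterminals; each value lists that rule's alternatives.
SCHEME_GRAMMAR = {
    "token": ["identifier", "boolean", "number", "character", "string",
              "(", ")", "#(", "'", "`", ",", ",@", "."],
    "delimiter": ["whitespace", "(", ")", '"', ";"],
    "number": ["num 2", "num 8", "num 10", "num 16"],
}


def alternatives(rule):
    """Generic mechanism: return the alternatives of any rule in the table."""
    return SCHEME_GRAMMAR[rule]
```

Documenting alternatives tells a reader nothing about which rule names exist for a given language; that knowledge lives only in the table.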

I also want to point out that, as Emacs core developers, we can’t possibly
provide a good translation from conventional language names to their
tree-sitter names (C# -> c-sharp). Maybe we can do a half-decent job, but 1)
that won’t cover all available languages, and 2) if there is a new language, we
would need to wait for the next release to update our translation. It is better
for the major mode writers to provide the information on how to translate
names because, as I said earlier, they already know it.
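
As a rough sketch of that arrangement (in Python for brevity; every name here is hypothetical and none of it is an Emacs or tree-sitter API), the core could offer only a registry, and each major mode would supply its own translation. The lowercase fallback is an assumption made for this illustration.

```python
# Hypothetical registry; not an Emacs or tree-sitter API.
_LANGUAGE_NAMES = {}


def register_language_name(conventional, ts_name):
    """Called by a major mode to record its language's tree-sitter name."""
    _LANGUAGE_NAMES[conventional] = ts_name


def tree_sitter_name(conventional):
    """Return the registered name, else fall back to the lowercased name.

    The lowercase fallback is an assumption made for this sketch.
    """
    return _LANGUAGE_NAMES.get(conventional, conventional.lower())


# A C# major mode would register its own mapping.
register_language_name("C#", "c-sharp")
```

With this shape, a mode for a new language registers its name itself, and nothing in the core has to change or wait for a release.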

Yuan

