emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Using incremental parsing in Emacs


From: Dmitry Gutov
Subject: Re: Using incremental parsing in Emacs
Date: Fri, 10 Jan 2020 00:56:38 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 04.01.2020 16:46, arthur miller wrote:

There is a very good presentation of tree-sitter on YT by its author:

https://www.youtube.com/watch?v=Jes3bD6P0To

Looks much better then what I got a picture by just reading on the
website:

It was a good watch.

Some takeaways from me:

It implements a GLR parser. One that can update the existing AST quickly for an arbitrary edit in the middle of a file. (*)

But it parses a new file quickly as well: a 20000 lines JS file in 54ms.

To be able to reach that speed, they went the traditional compiler-writer route of having a separate (grammar-to-C-code) compilation step from a grammar to a parser program (which relies on a shared runtime). (**)

Some of it seems to be by necessity. Every run returns a full AST, not just an "AST up to this position". I suppose the author didn't want the problems that come with unfinished parse trees when code relies on that returned value. (***)

The generated parser, in addition to being incremental, is error-tolerant, which is a necessity for use in editors.

As a result, they have features like fast semantic syntax highlighting, as well code folding that accurately detects where function body begins and ends (previously, Atom and other editors used guessing based on indentation levels, apparently). And a "extend selection" command based on AST as well (****)

Tree-Sitter is also used inside GitHub for various features, including their Semantic library (which implements code navigation on the web).

In the meantime, our current answer to all of the above is syntax-ppss plus local regexp-based parsing around the visible part of the buffer.

To compare:

(*) syntax-ppss is also fully incremental, although the returned value is a very simplistic substitute for an AST. But we've been using it for a while and have done solid things with it.

(**) Which means that if we try to use Tree-Sitter as-is, our current practice of defining the language grammar in Lisp would go our of the window. https://github.com/ubolonton/emacs-tree-sitter demonstrates this as well: language grammars have to be compiled into a shared library (or libraries). We would have lots of grammars supplied by the third party, which is kind of good, but we would lose the ease of experimenting with them that we have now, or being able to write support for a new up-and-coming language very quickly. Which a certain fraction of our users enjoys, AFAIK.

(***) Whereas syntax-ppss stops at a requested position, thus saving on CPU cycles this way. Similarly, if a new system we'll transition to someday also does this, its absolute performance/throughput would be less important if it only usually has to parse a screen-worth of file at a time.

(****) We've been managing surprisingly well with syntax-ppss, forward-sexp, etc. So code folding works quite well in Emacs already, and the easy-kill package in GNU ELPA does the "expand selection" thing very successfully as well. But we could use some improvement in having some more complex syntax supported or handled more easily, in certain languages. Having a "proper AST" available is nothing to sneeze at either, and would likely help a lot in indentation code.

My personal takeaway is that we could really benefit from a lispier version of this technology, and Someone(tm) should start working on that.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]