Re: Reliable after-change-functions (via: Using incremental parsing in E

emacs-devel

From:	Stefan Monnier
Subject:	Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date:	Tue, 31 Mar 2020 11:11:22 -0400
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

>> IIUC, tree-sitter starts by parsing the whole buffer anyway, and then
>> keeps the parse tree up-to-date in response to buffer changes.
> Why does it need the entire buffer up front?

Because as a general rule you cannot parse a region without looking at
all the preceding text.  That's why when we fontify START..END we need
to begin by computing the `syntax-ppss` at START, which involves passing
the whole text from `point-min` to START through `parse-partial-sexp`.
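
To make the dependency on preceding text concrete, here is a minimal
sketch in Python.  It is a toy model, not how `parse-partial-sexp` is
implemented: the hypothetical `scan_state` function tracks only one bit
of state (whether we are inside a double-quoted string), and the sample
text is made up for illustration.

```python
def scan_state(text, start, end, in_string=False):
    """Scan text[start:end] and return True if END is inside a string.

    `in_string` is the state assumed at START; getting it right
    requires having scanned everything before START.
    """
    i = start
    while i < end:
        ch = text[i]
        if in_string and ch == "\\":
            i += 2          # skip the escaped character
            continue
        if ch == '"':
            in_string = not in_string
        i += 1
    return in_string


# A string literal that spans two lines.
text = 'a = "one\ntwo"\nb = 3\n'
start = text.index("two")   # START falls in the middle of the string

# Correct: compute the state at START by scanning from the beginning.
state_at_start = scan_state(text, 0, start)

# Naive: start at START and assume we are outside any string.
naive_end = scan_state(text, start, len(text))

# Correct: continue from START with the state computed above.
correct_end = scan_state(text, start, len(text), in_string=state_at_start)
```

The naive scan sees only the closing quote, takes it for an opening
one, and ends up believing the rest of the text is inside a string.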

> that sounds like a potential performance killer.

Indeed.  And so does this `syntax-ppss` call we have.
It's OK as long as the parsing is fast enough and you don't use it in
too large buffers.

E.g. I expect that most programming major modes currently exhibit
significant delays when you jump to the end of a multi-GB buffer because
of that `syntax-ppss` call.

> Fontifying a small part of a buffer doesn't need its entire text.

Sadly, it does.  In specific cases you may be able to speed things up,
but that's only applicable to some cases.

I'm sure there could be other approaches that focus on trying to parse as
little of the buffer text as possible (e.g. SMIE follows this kind of
idea), but it's difficult to make them work with a "normal" grammar,
providing a full parse tree and giving a reliable result (and without
it degenerating to parsing the whole buffer anyway in most cases).

> In any case, I hope that passing the buffer to tree-sitter doesn't
> involve marshalling the entire buffer text via a function call as a
> huge string, or some such.

These are internal implementation details that can be tweaked later on.
I do expect that the code currently needs to call `buffer-string` or its
moral equivalent.  But if the resources this requires are significant
enough to worry about, then that's great news: it means the parsing
itself is very fast.

> We should instead request that tree-sitter exposes an API through
> which we could give it direct access to buffer text as 2 parts, before
> and after the gap, like we do with regex code.  Otherwise this will be
> a bottleneck in the long run, not unlike the problem we have with LSP.

I'm not sure exactly which problem with LSP you're thinking about, but
I doubt `buffer-string` is a significant component of a performance
problem with LSP: the time to pass that string to the server via a pipe
should dwarf it.
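
For reference, the two-part access mentioned in the quoted text (before
and after the gap) can be sketched as follows.  This is a toy Python
model of a gap buffer, not Emacs's actual data structure: the
`GapBuffer` class and the `count_byte` consumer are hypothetical names
used only for illustration.

```python
class GapBuffer:
    """Toy gap buffer: the text is data[:gap_start] + data[gap_end:]."""

    def __init__(self, text: bytes, gap_pos: int, gap_size: int = 8):
        self.data = (bytearray(text[:gap_pos])
                     + bytearray(gap_size)       # unused gap bytes
                     + bytearray(text[gap_pos:]))
        self.gap_start = gap_pos
        self.gap_end = gap_pos + gap_size

    def parts(self):
        """Return views of the two contiguous spans, without copying."""
        view = memoryview(self.data)
        return view[:self.gap_start], view[self.gap_end:]

    def to_bytes(self) -> bytes:
        """Copy the whole text into one new object (the costly path)."""
        before, after = self.parts()
        return bytes(before) + bytes(after)


def count_byte(buf: GapBuffer, byte: int) -> int:
    """A consumer that reads both parts in place instead of a copy."""
    return sum(1 for part in buf.parts() for b in part if b == byte)


buf = GapBuffer(b"hello world", gap_pos=5)
before, after = buf.parts()
```

A consumer written like `count_byte` never needs the single large copy
that `to_bytes` makes.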

> I still don't see why it would need the entire buffer for this class
> of applications.  Did anyone try the alternatives, in particular on
> very large buffers?

What alternatives?
How large is "very large" here?


        Stefan
