emacs-devel

Re: Tree-sitter integration on feature/tree-sitter


From: Yuan Fu
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Tue, 10 May 2022 10:54:53 -0700


> On May 10, 2022, at 8:43 AM, Yoav Marco <yoavm448@gmail.com> wrote:
> 
> I benchmarked query compilation reuse:
> 
> |   | Benchmark                            | no reuse (now) | reuse |
> |---+--------------------------------------+----------------+-------|
> | 1 | Fontify xdisp.c all at once          |          0.01s | 0.01s |
> | 2 | Fontify 60 next lines of xdisp.c ×10 |          0.10s | 0.00s |
> | 3 | Fontify 60 next lines till the end   |          6.06s | 0.01s |
> 
> 
> The patch to reuse the query is pretty dumb: if the char* for the query
> string didn't change from last time, it reuses the TSQuery object from
> last time instead of calling ts_new_query again. The patch is attached.
> 
> The elisp code for the benchmarks is also attached, but I'll give a
> summary here:
> 
> The queries are tree-sitter-langs' highlights.scm for C.
> 
> Benchmark 1 runs treesit-font-lock-fontify-region once on the entire
> buffer, meaning the query is compiled only once in both cases
> 
> Benchmark 2 runs treesit-font-lock-fontify-region on blocks of 60 lines,
> meaning the no reuse version has to compile the query 10 times even
> though nothing changes in the buffer or query.
> 
> Benchmark 3 is just 2 done all the way. xdisp.c has 36k lines, so the
> 6.06s is consistent
> (600 lines = 0.10s, multiply by 60 ⇒ 36k lines ~= 6.00s).
> 

I had a look and it’s a pretty sensible benchmark, and it makes sense that 
creating the query object takes a lot of time. But could you maybe run the 
benchmark under gprof and see what you get? Just curious.
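
For reference, the reuse idea described in the quoted text can be pictured 
roughly like this. This is only a sketch and not the attached patch: 
compiled_query, compile_query and get_query are made-up names, and the 
"compilation" is a stand-in that just counts calls instead of building a real 
TSQuery.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled query.  In Emacs this would be
   the TSQuery object produced by tree-sitter.  */
typedef struct
{
  const char *source;
} compiled_query;

/* Counts how many times the expensive step actually ran.  */
static int compile_count = 0;

/* Stand-in for the expensive compilation step (assumption: the real
   one parses the query source and builds a TSQuery).  */
static compiled_query *
compile_query (const char *source)
{
  static compiled_query slot;
  slot.source = source;
  compile_count++;
  return &slot;
}

/* Single-slot reuse: if the char* is the same one as last time, hand
   back the previous object.  Only the pointer is compared, not the
   string contents.  */
static compiled_query *
get_query (const char *source)
{
  static const char *last_source = NULL;
  static compiled_query *last_query = NULL;

  assert (source != NULL);
  if (last_query == NULL || source != last_source)
    {
      last_query = compile_query (source);
      last_source = source;
    }
  return last_query;
}
```

Calling get_query ten times with the same pointer compiles once, but as soon as 
two different queries alternate, every call compiles again.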

> So, is caching worth it? I don't know. It definitely is if it's possible
> to do it internally without introducing a new object type. But I don't
> think that's possible without making a hash map or a complicated cache
> like the one for compiled regexps that compile_pattern uses in
> search.c.

Yeah, using a single-entry cache would probably result in a lot of misses, 
since Emacs doesn’t fontify the whole buffer at once. We don’t necessarily need 
to use a hash map. I had a look at search.c and IIUC it uses an Emacs-wide 
array of 20 regex caches and links them into a linked list sorted by 
most recently used, which doesn’t seem too bad? I think I can do something 
similar to that. Tho we might also want to allow users to pin some “persistent” 
caches, for example for major mode font-locking and indent queries, as they are 
guaranteed to be reused a lot and are generally large (i.e., slow to create). 
Maybe that’s unnecessary tho. And I wonder if there is a cheap & easy way to do 
caching buffer-locally…
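
Something along these lines is what I have in mind for the most-recently-used 
list. It is a rough sketch only, loosely modeled on my reading of the regexp 
cache in search.c: the names (query_cache, cached_query, compile_query_slow) 
are made up, the compiled query is represented by an int instead of a TSQuery, 
and the cache stores the caller's string pointer without copying it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUERY_CACHE_SIZE 20  /* same count as the regex caches above */

struct query_cache
{
  struct query_cache *next;
  const char *source;  /* NULL while the slot is unused; not copied,
                          so the caller must keep the string alive */
  int compiled_id;     /* stand-in for a compiled TSQuery */
};

static struct query_cache cache_slots[QUERY_CACHE_SIZE];
static struct query_cache *cache_head = NULL;
static int mru_compile_count = 0;

/* Stand-in for the slow compilation step; it only counts calls.  */
static int
compile_query_slow (const char *source)
{
  (void) source;
  return ++mru_compile_count;
}

/* Chain the fixed array of slots into one singly linked list.  */
static void
init_cache (void)
{
  for (int i = 0; i < QUERY_CACHE_SIZE - 1; i++)
    cache_slots[i].next = &cache_slots[i + 1];
  cache_slots[QUERY_CACHE_SIZE - 1].next = NULL;
  cache_head = &cache_slots[0];
}

/* Return the compiled query for SOURCE, compiling only on a miss.  */
static int
cached_query (const char *source)
{
  struct query_cache *cp, **cpp;

  assert (source != NULL);
  if (cache_head == NULL)
    init_cache ();

  for (cpp = &cache_head; ; cpp = &cp->next)
    {
      cp = *cpp;
      if (cp->source != NULL && strcmp (cp->source, source) == 0)
        break;  /* hit */
      if (cp->source == NULL || cp->next == NULL)
        {
          /* Miss: fill the first unused slot, or overwrite the tail,
             which is the least recently used entry.  */
          cp->source = source;
          cp->compiled_id = compile_query_slow (source);
          break;
        }
    }

  /* Unlink the entry and move it to the front of the list.  */
  *cpp = cp->next;
  cp->next = cache_head;
  cache_head = cp;
  return cp->compiled_id;
}
```

Lookup is a linear scan over at most 20 entries, which should be negligible 
next to compiling a large query. Pinned “persistent” entries are not covered 
by this sketch.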

Or maybe add an argument to query-capture that allows the user to specify 
whether they want the query to be cached, or assume the user wants the query to 
be cached if the query is in string form rather than in sexp form.

Yuan

