Status update of tree-sitter features


From: Yuan Fu
Subject: Status update of tree-sitter features
Date: Wed, 28 Dec 2022 01:44:32 -0800

Hi,

With the complete feature freeze approaching, this is probably the last set of 
features added to Emacs 29. I stuffed them in just in time ;-)

1. There is a new predicate in the query language, #pred. It’s like #equal and 
#match. Basically it allows you to filter the captured node with an arbitrary 
function. Right now there are some queries in the font-lock settings that 
match a little more than what we actually want. For example, for the property 
feature, we only want the “bb” in “aa.bb”, but not in “aa.bb(cc)”, because the 
latter is a method, not a property. The query usually matches both. With this 
new predicate we can use a function to filter out the methods.
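
To make the property example concrete, here is a rough sketch of what such a 
rule could look like in the s-expression query syntax, where the predicate is 
spelled :pred. It assumes the JavaScript grammar's node and field names 
(member_expression, call_expression, property, function); the names 
my-js--property-p and my-js--property-rules are made up for illustration, and 
the face name may differ between Emacs versions.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.
(defun my-js--property-p (node)
  "Return non-nil if NODE is a property access rather than a method call.
NODE is the captured property_identifier, i.e. the \"bb\" in \"aa.bb\"."
  (let* ((member (treesit-node-parent node))
         (call (and member (treesit-node-parent member))))
    ;; Reject the capture when the member expression is the callee of
    ;; a call expression, as in \"aa.bb(cc)\".
    (not (and call
              (equal (treesit-node-type call) "call_expression")
              (treesit-node-eq
               member
               (treesit-node-child-by-field-name call "function"))))))

;; Hypothetical rule set; a mode would put this in
;; `treesit-font-lock-settings'.
(defvar my-js--property-rules
  (treesit-font-lock-rules
   :language 'javascript
   :feature 'property
   '(((member_expression
       property: (property_identifier) @font-lock-property-use-face)
      (:pred my-js--property-p @font-lock-property-use-face))))
  "Font-lock rule that highlights properties but not method names.")
```

The pattern and its predicate are grouped in one list, so the predicate only 
applies to captures from that pattern.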

If we can ensure that every query only captures the intended nodes, the 
font-lock queries can be reused for context extraction: using the query for the 
variable feature, I can find all the variables in a given region, etc.

2. We’ve had treesit-defun-type-regexp for a while; I recently generalized the 
idea into “things”. Now you can use treesit--things-around, 
treesit--navigate-thing, and treesit--thing-at-point to find and navigate 
arbitrary “things”. A “thing” is defined by a regexp that matches the node 
types, plus (optionally) a filter function.
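
To illustrate the idea of a “thing” (a node-type regexp plus an optional 
filter), here is a minimal sketch written only with public node functions. It 
is not the implementation of the internal functions named above, whose 
signatures may change; my-thing-at is a hypothetical helper.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.
(defun my-thing-at (pos regexp &optional pred)
  "Return the smallest \"thing\" covering POS, or nil if there is none.
A node counts as a thing if its type matches REGEXP and, when PRED
is non-nil, calling PRED with the node returns non-nil."
  (let ((node (treesit-node-at pos)))
    ;; Walk up from the leaf at POS until a node qualifies.
    (while (and node
                (not (and (string-match-p regexp (treesit-node-type node))
                          (or (null pred) (funcall pred node)))))
      (setq node (treesit-node-parent node)))
    node))
```

For example, in a buffer with a C parser, 
(my-thing-at (point) "function_definition") would return the enclosing 
function definition, if any.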

3. Now there is imenu support. Major modes don’t need to define their own imenu 
functions anymore; they just need to set treesit-simple-imenu-settings. They 
also need to set treesit-defun-name-function, which is a function that finds 
out the name of a defun node. It is used by both imenu and add-log-entry.
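
As a rough sketch, a major mode's setup could look like the following. Each 
entry in treesit-simple-imenu-settings has the form (CATEGORY REGEXP PRED 
NAME-FN). The node types, the "name" field, and the my-lang-- names are 
assumptions for an imaginary grammar, not taken from any real mode.

```elisp
(require 'treesit)

;; Hypothetical name function for an imaginary grammar in which defun
;; nodes carry their identifier in a field called "name".
(defun my-lang--defun-name (node)
  "Return the name of the defun NODE, or nil if it has none."
  (let ((name (treesit-node-child-by-field-name node "name")))
    (and name (treesit-node-text name t))))

;; Hypothetical setup function, meant to be called from the body of
;; the major mode after the parser has been created.
(defun my-lang--setup-imenu ()
  "Set up tree-sitter based imenu for the current buffer."
  (setq-local treesit-defun-name-function #'my-lang--defun-name)
  (setq-local treesit-simple-imenu-settings
              ;; PRED and NAME-FN are nil: no extra filtering, and
              ;; names come from `treesit-defun-name-function'.
              '(("Function" "\\`function_definition\\'" nil nil)
                ("Class" "\\`class_definition\\'" nil nil)))
  ;; Installs the imenu index function based on the settings above.
  (treesit-major-mode-setup))
```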

4. C-like modes now have adequate indentation and filling for block comments.

Lastly, I want to remind everyone to update the font-lock settings for your 
major mode to be more compliant with the standard list of features we decided 
on. This is not a hard requirement and major modes are free to extend upon it, 
but it’s nice to be consistent, especially among built-in modes.

Here is the list, for your reference. Among all the features, I think 
assignment is “nice to have”; it’s fine to leave it out if there isn’t enough 
time. Same goes for key: it may or may not apply to a language.

Basic tokens:

delimiter       ,.;      (delimit things)
operator        == != || (produces a value)
bracket         []{}()
misc-punctuation

constant        true, false, null
number
keyword
comment         (includes doc-comments)
string          (includes chars and docstrings)
string-interpolation    f"text {variable}"
escape-sequence         "\n\t\\"
function                every function identifier
variable                every variable identifier
type                    every type identifier
property                a.b  <--- highlight b
key                     { a: b, c: d } <--- highlight a, c
error                   highlight parse error

Abstract features:

assignment: the LHS of an assignment (thing being assigned to), e.g.:

a = b    <--- highlight a
a.b = c  <--- highlight b
a[1] = d <--- highlight a

definition: the thing being defined, e.g.:

int a(int b) { <--- highlight a
 return 0
}

int a;  <--- highlight a

struct a { <--- highlight a
 int b;   <--- highlight b
}
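
For the C snippets above, a definition feature might be sketched roughly as 
follows. This assumes the node and field names of the tree-sitter C grammar; 
my-c--definition-rules is a hypothetical name, the face choices are only 
illustrative, and a real mode would need more patterns (pointers, arrays, 
initializers, and so on).

```elisp
(require 'treesit)

;; Hypothetical rule set, not the rules used by any built-in mode.
(defvar my-c--definition-rules
  (treesit-font-lock-rules
   :language 'c
   :feature 'definition
   '(;; int a(int b) { ... }   highlight a
     (function_declarator
      declarator: (identifier) @font-lock-function-name-face)
     ;; int a;                 highlight a
     (declaration
      declarator: (identifier) @font-lock-variable-name-face)
     ;; struct a { ... }       highlight a
     (struct_specifier
      name: (type_identifier) @font-lock-type-face)
     ;; int b; in a struct     highlight b
     (field_declaration
      declarator: (field_identifier) @font-lock-variable-name-face)))
  "Sketch of font-lock rules for the definition feature in C.")
```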

As for decoration levels, this is my suggestion:

'(( comment definition)
  ( keyword string type)
  ( assignment builtin constant decorator
    escape-sequence key number property string-interpolation)
  ( bracket delimiter function misc-punctuation operator variable))
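
In a major mode, these levels would go into treesit-font-lock-feature-list, 
roughly as in the sketch below. The function name my-lang--setup-font-lock is 
hypothetical, and the mode is assumed to have already set 
treesit-font-lock-settings with rules for these features.

```elisp
(require 'treesit)

;; Hypothetical setup function, meant to be called from the body of
;; the major mode.
(defun my-lang--setup-font-lock ()
  "Declare the decoration levels for the current buffer."
  (setq-local treesit-font-lock-feature-list
              '((comment definition)
                (keyword string type)
                (assignment builtin constant decorator
                 escape-sequence key number property string-interpolation)
                (bracket delimiter function misc-punctuation
                 operator variable)))
  ;; Activates the features up to the user's `treesit-font-lock-level'.
  (treesit-major-mode-setup))
```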

Yuan

