emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Comments to the new tree sitter implementation


From: Theodor Thornhill
Subject: Comments to the new tree sitter implementation
Date: Sat, 23 Apr 2022 14:33:04 +0200

Hi there!

I think I finally understand most of the important bits of the tree
sitter implementation as seen from a major mode maintainer, and I want
to share my thoughts so that we can have something concrete to work on
when the feature branch hits.

1. New parent-beginning-of-line preset

For typescript-mode more often that not you want to find some close
parent, then go to beginning of line to calculate the offset.  A
suggested implementation of this is:

```elisp
(defun parent-beginning-of-line ()
  (lambda (node parent bol &rest _)
    (when-let ((parent (tree-sitter-node-parent node)))
      (save-excursion
        (goto-char (tree-sitter-node-start parent))
        (back-to-indentation)
        (tree-sitter-node-start
         (tree-sitter-node-at (point) (point) 'tree-sitter-tsx))))))
```

This works as following:

```typescript
function foo() {
  bar(() => ({
    baz
  }))
}
```

baz will know where to indent because it finds the closest {, then goes
to the start of 'bar' and returns that nodes position.  The same applies
to how 'bar' finds its closest '{', then goes to the first letter of
'function'.

To do all this you need a rule such as
```elisp
((node-is "}" (parent-beginning-of-line) 0)
```
and

```elisp
((parent-is "object_or_some_other_parent" (parent-beginning-of-line) 2)
```

inside of the 'simple-indent-rules'

2. Performance of query preset
The performance of the query preset is not good.  I finally understood
how the indentation engine worked by using that preset, but only after 4
or 5 queries indenting a file took several seconds.  Either it should be
marked with a caveat, or improved in some way.  I didn't get a
profiler-report before removing them.  Removing them and supplying
equivalent rules by using the 'parent-beginning-of-line' function made
indenting huge files instantaneous.

3. Performance of font-locking

The font-locking also can be flaky, and I'm not sure what happens here
either.  It may be the current implementation of typescript-mode, it
could be in tree-sitter.  Does the complexity increase proportionally to
the granularity of the queries, maybe?

Steps to reproduce:
  1. paste in some file, for example this:
    
https://raw.githubusercontent.com/tree-sitter/tree-sitter-javascript/master/grammar.js
  2. start scrolling using 'C-n'
  3. observe enourmous lags when scrolling.

This time I got a profiler report:
```
       15588  97% - command-execute
       15588  97%  - call-interactively
       15417  96%   - funcall-interactively
       15417  96%    - previous-line
       15417  96%     - line-move
       13677  85%      - line-move-visual
       13655  85%       - vertical-motion
       13655  85%        - jit-lock-function
       13651  85%         - jit-lock-fontify-now
       13651  85%          - jit-lock--run-functions
       13651  85%           - run-hook-wrapped
       13651  85%            - #<compiled -0x156eafb94cdd4083>
       13651  85%             - font-lock-fontify-region
       13647  85%              - tree-sitter-font-lock-fontify-region
         762   4%               + tree-sitter-query-capture
         112   0%               + font-lock-default-fontify-region
           4   0%               + font-lock-unfontify-region
        1716  10%      - line-pixel-height
        1716  10%       - jit-lock-function
        1716  10%        - jit-lock-fontify-now
        1716  10%         - jit-lock--run-functions
        1716  10%          - run-hook-wrapped
        1716  10%           - #<compiled -0x156ecffbbac1d883>
        1716  10%            - font-lock-fontify-region
        1716  10%             - tree-sitter-font-lock-fontify-region
         108   0%              - tree-sitter-query-capture
         104   0%               - tree-sitter-expand-pattern
          36   0%                - tree-sitter-expand-pattern
           8   0%                   tree-sitter-expand-pattern
          12   0%              - font-lock-default-fontify-region
          12   0%               - font-lock-fontify-syntactically-region
          12   0%                  syntax-ppss
         171   1%   + byte-code
         156   0% + redisplay_internal (C function)
          89   0% + jsonrpc--process-filter
          63   0% + eldoc-pre-command-refresh-echo-area
          17   0% + timer-event-handler
           4   0%   internal-timer-start-idle
           0   0% + ...
```
As you can see most of the time is spent inside of
'tree-sitter-font-lock-fontify-region'.  I'm not sure how to get even
more granular information than this.

4. How do I make sure that the null node indents without the catch all
I want to be able to type this:

```typescript
const foo = () => {
|                       // This is the indentation I get
}
```

```typescript
const foo = () => {
  |                     // This is the indentation I want
}
```

In some cases the parser returns null, which I can match on using the
provided preset.  However, usually tree-sitter-tsx returns something
like:

```elisp
(some_parent (object))
```

where object is not a null node.  I don't want to match on this
specifically, for two reasons

  1. I have to make rules for every case where this happens
  2. I need to be mindful of the precedence of rules inside of the 
simple-indent-rules

This is because when I start to type something, the ast changes to:

```elisp
(some_parent (object (some_child)))
```

It is the 'some_child' I want to indent, not the object itself.  As such
a rule such as:

```elisp
((parent-is "object") parent 2)
```

will not work for this problem.  It will indent the 'some_child', but
not the cursor inside of the squigglies.

Is there a way to do this with a simple preset?

5. Bugs in error-handling during font-locking

When typing it seems tree-sitter-tsx returns ERROR (which is expected
during typing), but then emacs shifts around the font-locking and
returning broken font-locking.  Sometimes it recovers, sometimes it does
not.  Maybe we shouldn't font-lock when there are errors, or at least
don't change the font-locking from outside of the node where there is an
error?  Now it looks like the whole file errors out, but it is really
only one node, or maybe it bleeds into its sibling.

I attach the screenshots I've already sent to make documentation a
little easier.


Lastly I want to say that I'm really impressed by this, Yuan! Thank you,
and I'm looking forward to getting it mainlined.

I'll also add the link to the typescript-mode I'm making atm, so that
the context of what I'm talking about may be a little clearer:

https://github.com/emacs-typescript/typescript.el/blob/feature/tsx-support/typescript-ts.el

All the best,
Theodor

Attachment: correct.png
Description: PNG image

Attachment: wrong.png
Description: PNG image


reply via email to

[Prev in Thread] Current Thread [Next in Thread]