Re: treesit: how to get it to parse multiple languages
From: Yuan Fu
Subject: Re: treesit: how to get it to parse multiple languages
Date: Mon, 4 Nov 2024 22:46:50 -0800

> On Nov 4, 2024, at 4:02 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Andrew De Angelis <bobodeangelis@gmail.com>
>> Date: Sun, 3 Nov 2024 13:28:57 -0500
>>
>> I'm trying to get a better understanding of treesit.el, and I've stumbled on
>> a couple of things that make me
>> think the manual is either outdated/faulty, or just not entirely clear and
>> I'm missing something.
>>
>> The latter is most likely, but I'd appreciate any help in figuring out what
>> exactly is wrong in my
>> approach/setup. I would be happy to contribute to the manual, if needed, to
>> ensure it is clearer.
>>
>> This is the relevant section of the manual:
>> https://www.gnu.org/software/emacs/manual/html_node/elisp/Multiple-Languages.html
>> I've started out with simply trying to recreate the setup described in the
>> manual, but I've run into some
>> issues.
>> Here's what I've done so far:
>> - I've defined a very simple `html-ts-mode`, using the elisp functions from
>> the manual:
>> https://github.com/andrewdea/poc-html-ts-mode/blob/main/html-ts-mode.el
>> - I activate this mode when visiting the example.html file (which is also
>> copied from the manual):
>> https://github.com/andrewdea/poc-html-ts-mode/blob/main/example.html
>> - the queries seem to be working as expected: when I'm in a buffer visiting
>> example.html, evaluating
>> `(treesit-query-capture 'html css-query)` and `(treesit-query-capture 'html
>> js-query)` return the expected
>> nodes
>> - ISSUE: `treesit-update-ranges` doesn't seem to be working as expected:
>> even if I call it multiple times, the
>> parser for the whole buffer seems to still be 'html. `(treesit-language-at
>> (point))` always returns 'html, even
>> when I'm inside the nodes captured by the css-query or js-query.
>>
>> Some additional context: the reason I'm looking into tree-sitter (and its
>> functionalities to support multiple
>> languages) is to potentially use it to fontify markdown code blocks and to
>> improve emacs support for python
>> notebooks. For markdown, I was trying a similar approach to the HTML one
>> described in the manual, but ran
>> into other similar issues:
>> https://www.reddit.com/r/emacs/comments/1gcrv8k/syntaxhighlighting_codeblocks_in_markdown/.
>> I'm just including this as context.
>>
>> Let me know if any of this is not clear.
>>
>> Thanks in advance for all your help!
>
> Yuan, can you help Andrew?
Ah yes, thanks for the ping. Andrew, I take it that your problem is with
treesit-language-at, right? Specifically, it doesn’t return the expected results.
That’s because for treesit-language-at to work, the major mode needs to define
treesit-language-at-point-function.
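
One way to tell the two problems apart (ranges not being set versus treesit-language-at not being told how to answer) is to look at the parsers directly. The helper below is only a debugging sketch: the name my/treesit-language-report is hypothetical and not part of treesit.el, and it assumes an Emacs built with tree-sitter support. It uses treesit-language-at, treesit-parser-list, treesit-parser-language and treesit-parser-included-ranges.

```elisp
(require 'treesit)

(defun my/treesit-language-report (&optional pos)
  "Return a plist describing the tree-sitter state at POS (default: point).
Hypothetical debugging helper, not part of treesit.el."
  (let ((pos (or pos (point))))
    (list
     ;; What `treesit-language-at' reports for POS.
     :language-at (treesit-language-at pos)
     ;; Non-nil only if the major mode installed a function.
     :language-at-point-function-set
     (and treesit-language-at-point-function t)
     ;; Each parser's language paired with the ranges it is confined to.
     ;; A nil range list means no range restriction has been set.
     :parsers
     (mapcar (lambda (parser)
               (cons (treesit-parser-language parser)
                     (treesit-parser-included-ranges parser)))
             (treesit-parser-list)))))
```

If the diagnosis above applies, then after calling treesit-update-ranges one would expect the css and javascript entries under :parsers to carry non-nil ranges while :language-at still reads html and :language-at-point-function-set is nil.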

This confusion has come up a couple of times now; evidently treesit-language-at is
not very intuitive. Hopefully it’ll be fixed by our updated manual for Emacs
30. In Emacs 30, we define treesit-language-at-point-function in the example code:

Emacs automates this process in ‘treesit-update-ranges’. A
multi-language major mode should set ‘treesit-range-settings’ so that
‘treesit-update-ranges’ knows how to perform this process automatically.
Major modes should use the helper function ‘treesit-range-rules’ to
generate a value that can be assigned to ‘treesit-range-settings’. The
settings in the following example directly translate into operations
shown above.

(setq treesit-range-settings
      (treesit-range-rules
       :embed 'javascript
       :host 'html
       '((script_element (raw_text) @capture))

       :embed 'css
       :host 'html
       '((style_element (raw_text) @capture))))

;; Major modes with multiple languages should always set
;; `treesit-language-at-point-function' (which see).
(setq treesit-language-at-point-function
      (lambda (pos)
        (let* ((node (treesit-node-at pos 'html))
               (parent (treesit-node-parent node)))
          (cond
           ((and node parent
                 (equal (treesit-node-type node) "raw_text")
                 (equal (treesit-node-type parent) "script_element"))
            'javascript)
           ((and node parent
                 (equal (treesit-node-type node) "raw_text")
                 (equal (treesit-node-type parent) "style_element"))
            'css)
           (t 'html)))))

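
For a proof-of-concept mode like the one linked in the question, the two settings above would typically be made buffer-local inside the mode body. What follows is a minimal sketch under stated assumptions, not the manual's code: the names my/html-demo-ts-mode, my/html-demo--embedded-language and my/html-demo--language-at-point are hypothetical, the html, css and javascript grammars are assumed to be installed, and creating the host parser last assumes that the most recently created parser ends up first in treesit-parser-list.

```elisp
(require 'treesit)

(defun my/html-demo--embedded-language (node-type parent-type)
  "Map NODE-TYPE and PARENT-TYPE (strings or nil) to a language symbol.
Hypothetical helper; it encodes the same decision as the lambda above."
  (if (equal node-type "raw_text")
      (pcase parent-type
        ("script_element" 'javascript)
        ("style_element" 'css)
        (_ 'html))
    'html))

(defun my/html-demo--language-at-point (pos)
  "Return the language at POS, judged from the host (html) parse tree."
  (let* ((node (treesit-node-at pos 'html))
         (parent (and node (treesit-node-parent node))))
    (my/html-demo--embedded-language
     (and node (treesit-node-type node))
     (and parent (treesit-node-type parent)))))

(define-derived-mode my/html-demo-ts-mode prog-mode "HTML-demo"
  "Hypothetical proof-of-concept mode for HTML with embedded CSS and JS."
  (when (and (treesit-ready-p 'css)
             (treesit-ready-p 'javascript)
             (treesit-ready-p 'html))
    ;; Embedded parsers first, host parser last (see the assumption
    ;; about parser-list ordering in the text above).
    (treesit-parser-create 'css)
    (treesit-parser-create 'javascript)
    (treesit-parser-create 'html)
    ;; Tell `treesit-update-ranges' where the embedded code lives.
    (setq-local treesit-range-settings
                (treesit-range-rules
                 :embed 'javascript
                 :host 'html
                 '((script_element (raw_text) @capture))
                 :embed 'css
                 :host 'html
                 '((style_element (raw_text) @capture))))
    ;; Tell `treesit-language-at' how to answer.
    (setq-local treesit-language-at-point-function
                #'my/html-demo--language-at-point)
    (treesit-major-mode-setup)))
```

With this mode enabled in a buffer visiting example.html, evaluating (treesit-language-at (point)) inside a style element would be expected to return css rather than html, since the answer now comes from the function and not from the first parser in the buffer.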
And FYI, in Emacs 30 we added local parsers, which might make implementing
code/markdown blocks in a notebook easier.
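
As a rough illustration of what local parsers could look like for Markdown code blocks, here is a sketch that relies on the :local keyword of treesit-range-rules in Emacs 30. Several things in it are assumptions rather than facts from this thread: the function name my/markdown-demo-setup-local-ranges is hypothetical, the node names fenced_code_block and code_fence_content are assumed to match the installed Markdown grammar, and every fenced block is treated as Python for simplicity (picking the language from the info string is not attempted here).

```elisp
(require 'treesit)

(defun my/markdown-demo-setup-local-ranges ()
  "Set range rules so each fenced code block gets its own local parser.
Hypothetical sketch; node names and the fixed `python' language are
assumptions, see the accompanying text."
  (setq-local treesit-range-settings
              (treesit-range-rules
               :embed 'python
               :host 'markdown
               ;; With :local t, each captured block is given a separate
               ;; parser instead of sharing one parser across all ranges.
               :local t
               '((fenced_code_block (code_fence_content) @capture)))))
```

A mode would call this from its body after creating the markdown parser. Once the ranges have been updated, (treesit-local-parsers-at (point)) inside a code block should list the local parser covering that block.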
Yuan