Re: feature/tree-sitter: Where to Put C/C++ Stuff


From: Theodor Thornhill
Subject: Re: feature/tree-sitter: Where to Put C/C++ Stuff
Date: Tue, 01 Nov 2022 12:53:11 +0100

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: emacs-devel@gnu.org, dev@rjt.dev, emacs-devel@gnu.org
>> Date: Tue, 01 Nov 2022 08:55:44 +0100
>> 
>> Yes, well, partially.  I think that we are too likely to create unwanted
>> issues by merging the two too closely.
>
> Then we should merge them "not too closely", I guess.  The challenge
> is to merge them so that we gain the most and lose the least.
>

That is reasonable.  It's just the sentiment that we should do a full-on
merge between tree-sitter and cc mode that I don't like.  If we can find
a way to blend them and still keep them distinct, we are on the right
path.  I don't have a clear solution, I'm afraid.  Personally, I like
how I did it in ts-mode, where we fall back to cc mode if we cannot
enable tree-sitter.  That's not as easy an option for, e.g., Java,
because java-mode already exists.  So some code has to end up in
cc-mode, unless we make separate modes.

>> 1: Use CC mode for one thing and tree-sitter for the rest
>> While first implementing tree-sitter in c-sharp mode we tried just
>> applying font-locking, and use cc mode for indentation and the rest.
>> What happened was that we immediately inherited the performance issues
>> from cc mode straight into our code.
>
> If those same performance issues exist today, then we don't lose
> anything, do we?  We just gain less than we could.  But the amount of
> work required for rewriting the other parts of CC Mode is huge, and we
> don't want to leave users of CC Mode in a dilemma whether to switch to
> a new mode and lose everything else for a significant amount of time,
> or give up tree-sitter and stay with CC Mode.  Not something I'd agree
> to.
>

That is also reasonable.


> I also have a hard time believing that you can reimplement those slow
> parts of CC Mode to be much faster, but if you have code to show which
> does that, I'm sure I'd be interested to look at it and consider
> improving CC Mode using that code.
>

You'd be surprised.

- https://github.com/emacs-csharp/csharp-mode/pull/251
- https://github.com/emacs-csharp/csharp-mode/issues/207
- https://github.com/emacs-csharp/csharp-mode/issues/164
- https://debbugs.gnu.org/db/43/43631.html
- https://github.com/emacs-csharp/csharp-mode/issues/151
- https://github.com/emacs-csharp/csharp-mode/issues/200

All of these are solved with [0]; no implementation was needed for
anything (apart from the generic tree-sitter machinery, of course).


>> Specifically, when typing in a
>> file with too many (from cc mode's perspective) strings, typing lag rose
>> to several seconds per press.  I filed several bug reports on this both
>> here and to Alan.  After some time and much heroics we got some
>> improvement on this from Alan, but c-sharp already had moved on.
>
> I don't know what c-sharp mode does besides fontification and
> indentation, but CC Mode does a lot more, see below.  If you
> disregarded a significant part of that, or if it is not relevant for
> editing C# code, then your particular experience is not very
> educational for the purposes of this discussion, and could lead us to
> wrong conclusions.
>
> It is trivially correct that a new mode can move much faster and make
> breaking changes, but this is unacceptable for a mode that comes with
> Emacs.  We respect our users much more than 3rd-party packages out
> there do, and we do that for good reasons.
>

I don't believe I disregarded much here.  Yes, it is trivially correct,
but I've spent a lot of time improving the c#-cc-mode support, for the
same reasons you mention.

>> 2: Using separate names for modes.
>> The great advantage here is easy to understand.  You have no inheritance
>> issues, and are free to develop features without regards to legacy.  A
>> disadvantage is that some users depend on that major mode name for other
>> stuff.
>
> That's a _huge_ disadvantage, in my book.
>

Yes, I agree.

>> 3: Confusion with where to file bugs
>
> Not relevant in our case: the bugs should be filed with Emacs.
>

Well, are you sure?  Diagnosing a bug and its origin is as important as
actually writing the code.  Trying to make that diagnosing step easier
isn't worthless.  Even though all bugs end up in Emacs, the likelihood
that some casual reader of this list submits some queries and a function
to tree-sitter is _much_ bigger than the likelihood of almost anyone on
this list trying to grok cc.

>> 4: How do we know what to disable?
>> If there's a problem somewhere in the tree-sitter variant of the cc mode
>> derived new mode, and we see some issue - who makes the fix?
>
> Also not relevant: the answer is "we the Emacs project make the fix".
>

Sure, but we want as many people as possible to be able to fix them, no?


>> 5: While tree-sitter is only an engine, it provides a lot more goodies
>> We have a huge opportunity to create real new frameworks for emacs now,
>> but limiting us to merge the features/modes suggests that we cannot
>> reliably do overarching advancements such as we see now in the
>> feature/tree-sitter branch.
>
> Yes.  And trying to make breaking changes in important Emacs features
> such as CC Mode is really a non-starter.  It isn't going to happen.
>

Ok.  Let me be clear.  I'm not suggesting breaking changes.  I'm only
saying that CC mode should go.  I agree with you here.  I'm trying to be
mindful of how, and offering some real, hard-won experience in this
exact tree-sitter/cc-mode gap.  It is trivially easy to say that we
should just add it to cc mode, but much harder to know what some of the
hidden issues are.

>> 6: What are the goodies that we really need from CC mode?
>> CC mode provides indentation and font locking.  What else does it
>> provide that isn't replaceable pretty quickly?  I mean this not as a
>> contrarian, but out of real curiosity.
>
> CC Mode has a full-blown manual, where this question is answered.
> Here's a partial list of features outside of the fontification and
> indentation area, which I collected just by looking at the top-level
> menus of that manual:
>
>  . filling and breaking text in comments and strings
>  . automatic insertion of newlines after braces, colons, commas, semi-colons
>  . whitespace cleanups
>  . minor modes: electric, hungry-delete, comment-style
>  . c-offsets-alist and interactive indentation customization (related
>    to indentation, but still extremely important, and not directly in
>    tree-sitter)
>

Yes, I've read the manual many times.  Filling is one nice thing,
agreed.  Electric and hungry-delete are just sitting there waiting for
us to create a framework using tree-sitter that would benefit _all_
languages supported by tree-sitter, not just cc.

>> My guess is that we can get to feature parity and well beyond that
>> in a very short amount of time, if we're not hindered by merging
>> everything.
>
> As they say, "show me the code".  If you can write up a C/C++ mode
> from scratch which supports most everything in the CC Mode manual, do
> it better/cleaner than CC Mode does, and do it before the emacs-29
> branch is cut, in a month or so, I might change my mind.
>

Challenge accepted.  Can I create it for Java, which is a language I'm
writing a lot in these days?  It would be simpler for me to test with
stuff I use daily, and it is still very much related to CC mode
functionality.  I can branch out from feature/tree-sitter and create
progmodes/java-ts-mode.el in scratch/tree-sitter/java, then we can
decide if some variant of it should be merged into tree-sitter before
the branch is cut.  What do you think?  If so, it would be nice to be
able to commit myself, to simplify rebasing/merging with
feature/tree-sitter and to avoid littering Yuan with reviews.

>> Sorry for the long mail, but I think we are missing the point by viewing
>> tree-sitter simply as an engine to plop in aside cc mode for
>> convenience, and not the real infrastructure change it is.
>
> Who said we view tree-sitter that way?
>
> What actually happens is that we gradually introduce tree-sitter as an
> engine for replacing the implementation of Emacs features where it is
> faster and/or better.  That is the plan.  There's no limit to these
> replacements, except what tree-sitter can do and how we can use that.
> But one thing we will NOT do is throw away existing important features
> before we have equivalent replacements and before users tell us the
> replacements are indeed better.
>

Yes, I don't disagree and never said we should.  If I did, then I
misspoke.

>> There is no need to sunset cc mode, but equally there is no need to
>> limit tree-sitter.
>
> There's no limits.  The fact that we use tree-sitter for what we use
> it now is just because _we_ decided to do that initially, in order to
> have it in Emacs 29 as a useful infrastructure that users can take
> advantage of.  I don't believe in releasing Emacs with infrastructure
> that has no user-level features built on it.
>

Which is why I am trying to create some actual, useful modes for us for
the merge.

>> > Tree-sitter doesn't (and cannot) replace everything a major mode does
>> > for a programming language.  So a completely new mode means we throw
>> > the baby out with the bathwater.
>> 
>> I don't agree, but I'm very curious as to what else, _apart_ from
>> indentation feature parity with cc mode, would take a significant effort.
>
> See above: just read the CC Mode manual, and see for yourself.

I have, many times :-)


-- 
Theo


[0]: https://github.com/emacs-csharp/csharp-mode/blob/master/csharp-tree-sitter.el#L69-L78


