bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful


From: Stefan Monnier
Subject: bug#58158: 29.0.50; [overlay] Interval tree iteration considered harmful
Date: Thu, 29 Sep 2022 12:48:34 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

>> > I may be missing something, but it looks like the sole purpose of the
>> > iter_start/iter_finish dance is to ensure only one iteration per tree
>> > is running at any given time, and that's because the iteration uses
>> > some state variable(s) of which there's only one instance per tree.
>> >
>> > Stefan, am I missing something?
>> 
>> One reason is that traversing a binary tree usually requires something
>> like recursion, but that wouldn't fit very conveniently with the current
>> code (nor with C in general since you can't make a local recursive
>> closure which accesses local variables from the surrounding function).
>
> I'm not sure I understand how recursion is related.  Are you saying
> that recursion is replaced with iteration?  But then, if _start and
> _finish are called by the same caller, we don't really need the
> protection, since no one can start another iteration while the first
> one runs.  Right?

Typically, current code will look something like:

    int x;
    Lisp_Object y;

    buffer_overlay_iter_start (current_buffer, prev, pos, ITREE_DESCENDING);
    while ((node = buffer_overlay_iter_next (current_buffer)))
      {
        ... do something that updates x and y ...
      }
    buffer_overlay_iter_finish (current_buffer);

If we were to use recursion, then we'd need to define a new (recursive)
function which does what's currently done in the `while` loop, but this
function can't access `x` and `y`, so it would need to take them as
argument, or a reference to them...

The use of an iterator is definitely convenient (and is a standard
approach in many languages).

>> Another is the need to update the begin/end fields (these need updating
>> because of insertions/deletions but they're updated lazily while
>> traversing the tree to avoid an O(N) complexity during the
>> insertions/deletions).  Hiding that behind 'some kind of "next node"
>> function keeps the code more readable.
>
> Where in the code do you see iteration that adds or deletes nodes?

No, I meant insertion/deletion of text in the buffer, thus requiring
updates to `begin/end` fields.

> Btw, couldn't we handle this by having a flag in the node that tells
> us whether the begin/end fields can be trusted?  Then the first caller
> who need them would run the update and reset the flags, and we still
> have lazy update, albeit somewhat less lazy, but without the need for
> guarding the iteration with start/finish calls.  Would that work?

Yes, it would, but it's still O(N).
The current approach is inspired by the approach used in `intervals.c`
which also updates those fields lazily/ondemand so as to avoid the O(N)
performance impact.

>> For now, I pushed a simple fix to traverse the tree "by hand" in the GC
>> rather than via the iterator.
> So this removes the restriction of not having GC during iteration?

Yes.

> Also, I take it that you don't consider the current code is as
> "harmful" as Gerd thinks it is?

I don't like the global state it uses, but I think we can fix this
aspect without too much trouble.

> IOW, you don't share his opinion that this implementation is
> a "no-go"?

No, indeed, I don't.


        Stefan






reply via email to

[Prev in Thread] Current Thread [Next in Thread]