bug#58342: 29.0.50; noverlay branch is O(N) for important calls

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#58342: 29.0.50; noverlay branch is O(N) for important calls

From:	Matt Armstrong
Subject:	bug#58342: 29.0.50; noverlay branch is O(N) for important calls
Date:	Thu, 06 Oct 2022 16:25:48 -0700

First let me preface:

A) I hope I'm wrong here, but after careful thought I've failed to
convince myself that I am, so I am either right or need help from others
to spot my flawed logic.

B) Even if I'm right the problem may not be judged serious enough to
address.

I believe that the feature/noverlay branch uses an O(N) algorithm for
next-overlay-change and previous-overlay-change.  Or, more precisely,
O(MIN(N, (buffer-size))), where N is the overlay count.

Now, in "normal" cases the real world achieved performance will be fine.
If overlays form mostly disjoint intervals, without too many overlapping
overlays, without too many overlays that span large portions of the
buffer, the algorithm used will find the next/previous change quickly.

However, consider the new next_overlay_change:

     1  ptrdiff_t
     2  next_overlay_change (ptrdiff_t pos)
     3  {
     4    ptrdiff_t next = ZV;
     5    struct interval_node *node;
     6    ITREE_FOREACH (node, current_buffer->overlays, pos, next, ASCENDING)
     7      {
     8        if (node->begin > pos)
     9          {
    10            /* If we reach this branch, node->begin must be the least 
upper bound
    11               of pos, because the search is limited to [pos,next) . */
    12            eassert (node->begin < next);
    13            next = node->begin;
    14            ITREE_FOREACH_ABORT ();
    15            break;
    16          }
    17        else if (node->begin < node->end && node->end < next)
    18          {
    19            next = node->end;
    20            ITREE_FOREACH_NARROW (pos, next);
    21          }
    22      }
    23    return next;
    24  }

Here we traverse overlays in ASCENDING order of BEG positions.  The best
we can say is that this loop executes in O(K*log(N)) time, where K is
the MIN of number of overlays that overlap POS and the number of valid
positions in the buffer after POS.  It is always going to be possible to
construct pathalogical cases where lines 19-20 are taken for as many
positions as there are in the buffer after POS, since the tree is not
ordered by END.

I've attached a patch that can go in to itree.c (which, if I'm right, I
think would be good to add, especially if we decide to not improve the
data structure).  I've quoted the text:

   ====
   FIXME: this data structure is O(N) for important operations -matt
   ====

   The amortized costs of Emacs' previous-overlay-change and
   next-overlay-change functions are O(N) with this data structure.
   The root problem is that we only have an order for the BEG field,
   but not the END.  The previous/next overlay change operations need
   to find the nearest point where there is *either* an interval BEG
   or END point, but there is no efficient way to narrow the search
   space over END postions.

   Consider the case where next-overlay-change is called at POS, all
   interval BEG positions are less than pos POS and all interval END
   posistions are after.  These END positions have no order, and so
   *every* interval must be examined.  This is at least O(N).  The
   previous-overlay-change case is similar.  The root issue is that
   the iterative "narrowing" approach is not guaranteed to reduce the
   search space in logarithmic time, since END is not ordered in the
   tree.

   One might argue that the LIMIT value will do this narrowing, but
   this narrowing is O(K*log(N)) where K is the size of the result
   set.  If we are interested in finding the node in a range with the
   smallest END, we might have to examine all K nodes in that range.
   In the case of the *-overlay-channge functions, K may well be equal
   to N.

   Ideally, a tree based data structure for overlays would have
   O(log(N)) performance for previous-overlay-change and
   next-overlay-change, as these are called in performance sensitive
   situations such as redisplay.  The only way I can think of
   achieving this is by keeping one ordering by BEG and a separate
   ordering by END, and then performing logic quite similar to the
   current Emacs overlays-before and overlays-after lists.

>From 6ca364a6ad45b1cfdcf080c492fde536975f6e09 Mon Sep 17 00:00:00 2001
From: Matt Armstrong <matt@rfc20.org>
Date: Thu, 6 Oct 2022 15:47:20 -0700
Subject: [PATCH 5/5] ; * src/itree.c: Add comment describing when noverlay is
 O(N)

---
 src/itree.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/src/itree.c b/src/itree.c
index 79e39d6e2a..6de84ae0b8 100644
--- a/src/itree.c
+++ b/src/itree.c
@@ -62,6 +62,42 @@ Copyright (C) 2017-2022  Free Software Foundation, Inc.
    complexity of O(K*log(N)) for this operation, where K is the size
    of the result set and N the size of the tree.
 
+   ====
+   FIXME: this data structure is O(N) for important operations -matt
+   ====
+
+   The amortized costs of Emacs' previous-overlay-change and
+   next-overlay-change functions are O(N) with this data structure.
+   The root problem is that we only have an order for the BEG field,
+   but not the END.  The previous/next overlay change operations need
+   to find the nearest point where there is *either* an interval BEG
+   or END point, but there is no efficient way to narrow the search
+   space over END postions.
+
+   Consider the case where next-overlay-change is called at POS, all
+   interval BEG positions are less than pos POS and all interval END
+   posistions are after.  These END positions have no order, and so
+   *every* interval must be examined.  This is at least O(N).  The
+   previous-overlay-change case is similar.  The root issue is that
+   the iterative "narrowing" approach is not guaranteed to reduce the
+   search space in logarithmic time, since END is not ordered in the
+   tree.
+
+   One might argue that the LIMIT value will do this narrowing, but
+   this narrowing is O(K*log(N)) where K is the size of the result
+   set.  If we are interested in finding the node in a range with the
+   smallest END, we might have to examine all K nodes in that range.
+   In the case of the *-overlay-channge functions, K may well be equal
+   to N.
+
+   Ideally, a tree based data structure for overlays would have
+   O(log(N)) performance for previous-overlay-change and
+   next-overlay-change, as these are called in performance sensitive
+   situations such as redisplay.  The only way I can think of
+   achieving this is by keeping one ordering by BEG and a separate
+   ordering by END, and then performing logic quite similar to the
+   current Emacs overlays-before and overlays-after lists.
+
    ==== Adjusting intervals ====
 
    Since this data-structure will be used for overlays in an Emacs
-- 
2.35.1

[Prev in Thread]

Current Thread

[Next in Thread]

bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Matt Armstrong <=
- bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Stefan Monnier, 2022/10/06
  - bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Eli Zaretskii, 2022/10/07
    - bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Stefan Monnier, 2022/10/07
    - bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Eli Zaretskii, 2022/10/07
    - bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Stefan Monnier, 2022/10/07
  - bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Matt Armstrong, 2022/10/07
    - bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Stefan Monnier, 2022/10/07
    - bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Matt Armstrong, 2022/10/07
    - bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Drew Adams, 2022/10/07
    - bug#58342: 29.0.50; noverlay branch is O(N) for important calls, Drew Adams, 2022/10/07

Prev by Date: bug#58302: 29.0.50; browse-url-emacs is extremely slow (and I think always has been?)
Next by Date: bug#58343: 29.0.50; ELisp code run in "inconsitent" selected-window state
Previous by thread: bug#56359: seccomp test failures on RHEL 9.0
Next by thread: bug#58342: 29.0.50; noverlay branch is O(N) for important calls
Index(es):
- Date
- Thread