bug#54532: [PATCH] sorting

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#54532: [PATCH] sorting

From:	Andrew Cohen
Subject:	bug#54532: [PATCH] sorting
Date:	Fri, 01 Apr 2022 07:47:27 +0800
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:

[...]

    EZ> Yes, I think as soon as Andrew comes up with an updated patch,
    EZ> we can install this.

Attached is the patch for the items that Eli identified earlier
(improvements in comments and breaking one long line). I'll wait for any
further changes/comments. Then I think its best to squash this down to 2
commits: one for the GC marking, and all the rest related to
sorting. (I'm happy to use more commits if you think the predicate
resolution or the unit tests deserve their own patches).

Best,
Andy

>From 551352052021c92bf75001b07ad714454aad7706 Mon Sep 17 00:00:00 2001
From: Andrew G Cohen <cohen@andy.bu.edu>
Date: Fri, 1 Apr 2022 07:38:48 +0800
Subject: [PATCH] ; * src/sort.c: Improve comments.

---
 src/sort.c | 155 +++++++++++++++++++++++++++--------------------------
 1 file changed, 78 insertions(+), 77 deletions(-)

diff --git a/src/sort.c b/src/sort.c
index 48e106e92d..694d826853 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -117,10 +117,10 @@ inorder (const Lisp_Object predicate, const Lisp_Object 
a, const Lisp_Object b)
 }
 
 
-/* BINARYSORT() is a stable binary insertion sort used for sorting the
-   list starting at LO and ending at HI.  On entry, LO <= START <= HI,
-   and [LO, START) is already sorted (pass START == LO if you don't
-   know!).  Even in case of error, the output slice will be some
+/* Sort the list starting at LO and ending at HI using a stable binary
+   insertion sort algorithm. On entry the sublist [LO, START) (with
+   START between LO and HIGH) is known to be sorted (pass START == LO
+   if you are unsure).  Even in case of error, the output will be some
    permutation of the input (nothing is lost or duplicated).  */
 
 static void
@@ -154,17 +154,18 @@ binarysort (merge_state *ms, Lisp_Object *lo, const 
Lisp_Object *hi,
 }
 
 
-/*  COUNT_RUN() returns the length of the run beginning at LO, in the
-    slice [LO, HI) with LO < HI.  A "run" is the longest
+/*  Find and return the length of the "run" (the longest
     non-decreasing sequence or the longest strictly decreasing
     sequence, with the Boolean *DESCENDING set to 0 in the former
-    case, or to 1 in the latter.  The strictness of the definition of
-    "descending" is needed so that the caller can safely reverse a
-    descending sequence without violating stability (strict > ensures
-    there are no equal elements to get out of order).  */
+    case, or to 1 in the latter) beginning at LO, in the slice [LO,
+    HI) with LO < HI.  The strictness of the definition of
+    "descending" ensures there are no equal elements to get out of
+    order so the caller can safely reverse a descending sequence
+    without violating stability.  */
 
 static ptrdiff_t
-count_run (merge_state *ms, Lisp_Object *lo, const Lisp_Object *hi, bool 
*descending)
+count_run (merge_state *ms, Lisp_Object *lo, const Lisp_Object *hi,
+          bool *descending)
 {
   Lisp_Object pred = ms->predicate;
 
@@ -198,23 +199,24 @@ count_run (merge_state *ms, Lisp_Object *lo, const 
Lisp_Object *hi, bool *descen
 }
 
 
-/*  GALLOP_LEFT() locates the proper position of KEY in a sorted
+/*  Locate and return the proper insertion position of KEY in a sorted
     vector: if the vector contains an element equal to KEY, return the
     position immediately to the left of the leftmost equal element.
-    [GALLOP_RIGHT() does the same except returns the position to the
+    [GALLOP_RIGHT does the same except it returns the position to the
     right of the rightmost equal element (if any).]
 
-    'A' is a sorted vector with N elements, starting at A[0].  N must be > 0.
+    'A' is a sorted vector of N elements. N must be > 0.
 
-    HINT is an index at which to begin the search, 0 <= HINT < N.  The closer
-    HINT is to the final result, the faster this runs.
+    Elements preceding HINT, a non-negative index less than N, are
+    skipped.  The closer HINT is to the final result, the faster this
+    runs.
 
     The return value is the int k in [0, N] such that
 
     A[k-1] < KEY <= a[k]
 
-    pretending that *(A-1) is minus infinity and A[N] is plus infinity.  IOW,
-    KEY belongs at index k; or, IOW, the first k elements of A should precede
+    pretending that *(A-1) precedes all values and *(A+N) succeeds all
+    values.  In other words, the first k elements of A should precede
     KEY, and the last N-k should follow KEY.  */
 
 static ptrdiff_t
@@ -254,7 +256,7 @@ gallop_left (merge_state *ms, const Lisp_Object key, 
Lisp_Object *a,
     {
       /* When key <= a[hint], gallop left, until
         a[hint - ofs] < key <= a[hint - lastofs].  */
-      const ptrdiff_t maxofs = hint + 1;             /* Here &a[0] is lowest.  
*/
+      const ptrdiff_t maxofs = hint + 1;        /* Here &a[0] is lowest.  */
       while (ofs < maxofs)
        {
          if (inorder (pred, a[-ofs], key))
@@ -283,18 +285,19 @@ gallop_left (merge_state *ms, const Lisp_Object key, 
Lisp_Object *a,
       ptrdiff_t m = lastofs + ((ofs - lastofs) >> 1);
 
       if (inorder (pred, a[m], key))
-       lastofs = m + 1;              /* Here a[m] < key.  */
+       lastofs = m + 1;            /* Here a[m] < key.  */
       else
        ofs = m;                    /* Here key <= a[m].  */
     }
-  eassume (lastofs == ofs);             /* Then a[ofs-1] < key <= a[ofs].  */
+  eassume (lastofs == ofs);         /* Then a[ofs-1] < key <= a[ofs].  */
   return ofs;
 }
 
 
-/*  GALLOP_RIGHT() is exactly like GALLOP_LEFT(), except that if KEY
-    already exists in A[0:N], it finds the position immediately to the
-    right of the rightmost equal value.
+/*  Locate and return the proper position of KEY in a sorted vector
+    exactly like GALLOP_LEFT, except that if KEY already exists in
+    A[0:N] find the position immediately to the right of the rightmost
+    equal value.
 
     The return value is the int k in [0, N] such that
 
@@ -315,7 +318,7 @@ gallop_right (merge_state *ms, const Lisp_Object key, 
Lisp_Object *a,
     {
       /* When key < a[hint], gallop left until
         a[hint - ofs] <= key < a[hint - lastofs].  */
-      const ptrdiff_t maxofs = hint + 1;             /* Here &a[0] is lowest.  
*/
+      const ptrdiff_t maxofs = hint + 1;        /* Here &a[0] is lowest.  */
       while (ofs < maxofs)
        {
          if (inorder (pred, key, a[-ofs]))
@@ -338,7 +341,7 @@ gallop_right (merge_state *ms, const Lisp_Object key, 
Lisp_Object *a,
     {
       /* When a[hint] <= key, gallop right, until
         a[hint + lastofs] <= key < a[hint + ofs].  */
-      const ptrdiff_t maxofs = n - hint;             /* Here &a[n-1] is 
highest.  */
+      const ptrdiff_t maxofs = n - hint;        /* Here &a[n-1] is highest.  */
       while (ofs < maxofs)
        {
          if (inorder (pred, key, a[ofs]))
@@ -368,9 +371,9 @@ gallop_right (merge_state *ms, const Lisp_Object key, 
Lisp_Object *a,
       if (inorder (pred, key, a[m]))
        ofs = m;                    /* Here key < a[m].  */
       else
-       lastofs = m + 1;              /* Here a[m] <= key.  */
+       lastofs = m + 1;            /* Here a[m] <= key.  */
     }
-  eassume (lastofs == ofs);             /* Now  a[ofs-1] <= key < a[ofs].  */
+  eassume (lastofs == ofs);         /* Now  a[ofs-1] <= key < a[ofs].  */
   return ofs;
 }
 
@@ -411,9 +414,9 @@ merge_markmem (void *arg)
 }
 
 
-/* CLEANUP_MEM frees all temp storage.  If an exception occurs while
-   merging it will first relocate any lisp elements in temp storage
-   back to the original array.  */
+/* Free all temp storage.  If an exception occurs while merging,
+   relocate any lisp elements in temp storage back to the original
+   array.  */
 
 static void
 cleanup_mem (void *arg)
@@ -440,9 +443,9 @@ cleanup_mem (void *arg)
 }
 
 
-/* MERGE_GETMEM() ensures availability of enough temp memory for NEED
-   array slots.  Any previously allocated memory is first freed, and a
-   cleanup routine is registered to free memory at the very end, or on
+/* Allocate enough temp memory for NEED array slots.  Any previously
+   allocated memory is first freed, and a cleanup routine is
+   registered to free memory at the very end of the sort, or on
    exception.  */
 
 static void
@@ -453,7 +456,7 @@ merge_getmem (merge_state *ms, const ptrdiff_t need)
   if (ms->a == ms->temparray)
     {
       /* We only get here if alloc is needed and this is the first
-        time, so we set up the unwind.  */
+        time, so we set up the unwind protection.  */
       specpdl_ref count = SPECPDL_INDEX ();
       record_unwind_protect_ptr_mark (cleanup_mem, ms, merge_markmem);
       ms->count = count;
@@ -479,10 +482,10 @@ needmem (merge_state *ms, ptrdiff_t na)
 }
 
 
-/* MERGE_LO() stably merges the NA elements starting at SSA with the
-   NB elements starting at SSB = SSA + NA, in-place.  NA and NB must
-   be positive.  We also require that SSA[NA-1] belongs at the end of
-   the merge, and should have NA <= NB.  */
+/* Stably merge (in-place) the NA elements starting at SSA with the NB
+   elements starting at SSB = SSA + NA.  NA and NB must be positive.
+   Require that SSA[NA-1] belongs at the end of the merge, and NA <=
+   NB.  */
 
 static void
 merge_lo (merge_state *ms, Lisp_Object *ssa, ptrdiff_t na, Lisp_Object *ssb,
@@ -509,9 +512,9 @@ merge_lo (merge_state *ms, Lisp_Object *ssa, ptrdiff_t na, 
Lisp_Object *ssb,
   ptrdiff_t min_gallop = ms->min_gallop;
   for (;;)
     {
-      ptrdiff_t acount = 0;  /* This holds the # of consecutive times A won.  
*/
+      ptrdiff_t acount = 0;   /* The # of consecutive times A won.  */
 
-      ptrdiff_t bcount = 0;  /* This holds the # of consecutive times B won.  
*/
+      ptrdiff_t bcount = 0;   /* The # of consecutive times B won.  */
 
       for (;;)
        {
@@ -540,9 +543,10 @@ merge_lo (merge_state *ms, Lisp_Object *ssa, ptrdiff_t na, 
Lisp_Object *ssb,
            }
        }
 
-      /* One run is winning so consistently that galloping may be a huge
-        win.  We try that, and continue galloping until (if ever)
-        neither run appears to be winning consistently anymore.  */
+      /* One run is winning so consistently that galloping may be a
+        huge speedup.  We try that, and continue galloping until (if
+        ever) neither run appears to be winning consistently
+        anymore.  */
       ++min_gallop;
       do {
        eassume (na > 1 && nb > 0);
@@ -558,8 +562,8 @@ merge_lo (merge_state *ms, Lisp_Object *ssa, ptrdiff_t na, 
Lisp_Object *ssb,
            na -= k;
            if (na == 1)
              goto CopyB;
-           /* While na==0 is impossible now if the comparison function is
-              consistent, we shouldn't assume that it is.  */
+           /* While na==0 is impossible for a consistent comparison
+              function, we shouldn't assume that it is.  */
            if (na == 0)
              goto Succeed;
          }
@@ -603,10 +607,10 @@ merge_lo (merge_state *ms, Lisp_Object *ssa, ptrdiff_t 
na, Lisp_Object *ssb,
 }
 
 
-/* MERGE_HI() stably merges the NA elements starting at SSA with the
-   NB elements starting at SSB = SSA + NA, in-place.  NA and NB must
-   be positive.  We also require that SSA[NA-1] belongs at the end of
-   the merge, and should have NA >= NB.  */
+/* Stably merge (in-place) the NA elements starting at SSA with the NB
+   elements starting at SSB = SSA + NA.  NA and NB must be positive.
+   Require that SSA[NA-1] belongs at the end of the merge, and NA >=
+   NB.  */
 
 static void
 merge_hi (merge_state *ms, Lisp_Object *ssa, ptrdiff_t na,
@@ -636,8 +640,8 @@ merge_hi (merge_state *ms, Lisp_Object *ssa, ptrdiff_t na,
 
   ptrdiff_t min_gallop = ms->min_gallop;
   for (;;) {
-    ptrdiff_t acount = 0;   /* This holds the # of consecutive times A won.  */
-    ptrdiff_t bcount = 0;   /* This holds the # of consecutive times B won.  */
+    ptrdiff_t acount = 0;   /* The # of consecutive times A won.  */
+    ptrdiff_t bcount = 0;   /* The # of consecutive times B won.  */
 
     for (;;) {
       eassume (na > 0 && nb > 1);
@@ -666,8 +670,8 @@ merge_hi (merge_state *ms, Lisp_Object *ssa, ptrdiff_t na,
     }
 
     /* One run is winning so consistently that galloping may be a huge
-       win.  Try that, and continue galloping until (if ever) neither
-       run appears to be winning consistently anymore.  */
+       speedup.  Try that, and continue galloping until (if ever)
+       neither run appears to be winning consistently anymore.  */
     ++min_gallop;
     do {
       eassume (na > 0 && nb > 1);
@@ -701,8 +705,8 @@ merge_hi (merge_state *ms, Lisp_Object *ssa, ptrdiff_t na,
          nb -= k;
          if (nb == 1)
            goto CopyA;
-         /* While nb==0 is impossible now if the comparison function
-            is consistent, we shouldn't assume that it is.  */
+         /* While nb==0 is impossible for a consistent comparison
+             function we shouldn't assume that it is.  */
          if (nb == 0)
            goto Succeed;
        }
@@ -730,7 +734,7 @@ merge_hi (merge_state *ms, Lisp_Object *ssa, ptrdiff_t na,
 }
 
 
-/* MERGE_AT() merges the two runs at stack indices I and I+1.  */
+/* Merge the two runs at stack indices I and I+1.  */
 
 static void
 merge_at (merge_state *ms, const ptrdiff_t i)
@@ -747,9 +751,9 @@ merge_at (merge_state *ms, const ptrdiff_t i)
   eassume (na > 0 && nb > 0);
   eassume (ssa + na == ssb);
 
-  /* Record the length of the combined runs; if i is the 3rd-last run
-     now, also slide over the last run (which isn't involved in this
-     merge).  The current run i+1 goes away in any case.  */
+  /* Record the length of the combined runs. The current run i+1 goes
+     away after the merge.  If i is the 3rd-last run now, slide the
+     last run (which isn't involved in this merge) over to i+1.  */
   ms->pending[i].len = na + nb;
   if (i == ms->n - 3)
     ms->pending[i + 1] = ms->pending[i + 2];
@@ -779,10 +783,9 @@ merge_at (merge_state *ms, const ptrdiff_t i)
 }
 
 
-/* POWERLOOP() computes the "power" of the first of two adjacent runs
-   begining at index S1, with the first having length N1 and the
-   second (starting at index S1+N1) having length N2.  The list has
-   total length N.  */
+/* Compute the "power" of the first of two adjacent runs begining at
+   index S1, with the first having length N1 and the second (starting
+   at index S1+N1) having length N2.  The run has total length N.  */
 
 static int
 powerloop (const ptrdiff_t s1, const ptrdiff_t n1, const ptrdiff_t n2,
@@ -824,11 +827,11 @@ powerloop (const ptrdiff_t s1, const ptrdiff_t n1, const 
ptrdiff_t n2,
 }
 
 
-/* FOUND_NEW_RUN() updates the state when a run of length N2 has been
-   identified.  If there's already a stretch on the stack, apply the
-   "powersort" merge strategy: compute the topmost stretch's "power"
-   (depth in a conceptual binary merge tree) and merge adjacent runs
-   on the stack with greater power.  */
+/* Update the state upon identifying a run of length N2.  If there's
+   already a stretch on the stack, apply the "powersort" merge
+   strategy: compute the topmost stretch's "power" (depth in a
+   conceptual binary merge tree) and merge adjacent runs on the stack
+   with greater power.  */
 
 static void
 found_new_run (merge_state *ms, const ptrdiff_t n2)
@@ -851,9 +854,8 @@ found_new_run (merge_state *ms, const ptrdiff_t n2)
 }
 
 
-/* MERGE_FORCE_COLLAPSE() unconditionally merges all stretches on the
-   stack until only one remains, and returns 0 on success.  This is
-   used at the end of the mergesort.  */
+/* Unconditionally merge all stretches on the stack until only one
+   remains.  */
 
 static void
 merge_force_collapse (merge_state *ms)
@@ -871,9 +873,8 @@ merge_force_collapse (merge_state *ms)
 }
 
 
-/* MERGE_COMPUTE_MINRUN() computes a good value for the minimum run
-   length; natural runs shorter than this are boosted artificially via
-   binary insertion.
+/* Compute a good value for the minimum run length; natural runs
+   shorter than this are boosted artificially via binary insertion.
 
    If N < 64, return N (it's too small to bother with fancy stuff).
    Otherwise if N is an exact power of 2, return 32.  Finally, return
@@ -907,8 +908,8 @@ reverse_vector (Lisp_Object *s, const ptrdiff_t n)
     }
 }
 
-/* TIM_SORT sorts the array SEQ with LENGTH elements in the order
-   determined by PREDICATE.  */
+/* Sort the array SEQ with LENGTH elements in the order determined by
+   PREDICATE.  */
 
 void
 tim_sort (Lisp_Object predicate, Lisp_Object *seq, const ptrdiff_t length)
-- 
2.34.1.575.g55b058a8bb



-- 
Andrew Cohen

[Prev in Thread]

Current Thread

[Next in Thread]

bug#54532: [PATCH] sorting, (continued)
- bug#54532: [PATCH] sorting, Mattias Engdegård, 2022/03/23
  - bug#54532: [PATCH] sorting, Eli Zaretskii, 2022/03/24
    - bug#54532: [PATCH] sorting, Andrew Cohen, 2022/03/24
    - bug#54532: [PATCH] sorting, Eli Zaretskii, 2022/03/24
    - bug#54532: [PATCH] sorting, Andrew Cohen, 2022/03/24
    - bug#54532: [PATCH] sorting, Mattias Engdegård, 2022/03/24
    - bug#54532: [PATCH] sorting, Mattias Engdegård, 2022/03/24
- bug#54532: [PATCH] sorting, Lars Ingebrigtsen, 2022/03/31
  - bug#54532: [PATCH] sorting, Eli Zaretskii, 2022/03/31
    - bug#54532: [PATCH] sorting, Andrew Cohen <=

Prev by Date: bug#54603: 29.0.50; [PATCH] Eshell's external pipe module interferes with other argument parsing hooks
Next by Date: bug#54411: German 'ß' is treated as subword start (a capital letter?) in `subword-mode'
Previous by thread: bug#54532: [PATCH] sorting
Next by thread: bug#41124: 26.3; highlight regexp not working properly - not updating as you type
Index(es):
- Date
- Thread