emacs-devel

Re: sorting in C


From: Andrew Cohen
Subject: Re: sorting in C
Date: Wed, 23 Feb 2022 12:14:52 +0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

Sorry for the long post. 

I wanted to give a benchmarking update on the patch I posted to the
list earlier. These benchmarks are far from comprehensive, but I hope
they are a useful starting point.

Background: these benchmarks compare the current implementation
("oldsort") with the modified one ("newsort"). The modification is
described in points 1 and 2 of my earlier email: replacing the current
list sort (a recursive merge sort) with an iterative in-place insertion
sort for short (n < 64) lists; and, for longer lists, converting to a
vector, passing off to the current vector sorting routine, and
converting back to a list.
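
As a rough standalone illustration (this is not the patch itself: the
real code operates on Lisp_Object lists and calls Emacs's own vector
sort, while the names, the tail-pointer detail, and the qsort stand-in
below are mine), the strategy looks something like:

```c
#include <stdlib.h>

/* Minimal singly linked list of ints; only meant to show the control
   flow of the hybrid strategy, not the actual Emacs data structures.  */
struct node { int val; struct node *next; };

#define INSERTION_THRESHOLD 64

/* Iterative in-place insertion sort for short lists.  Checking the
   tail first keeps already-sorted input cheap; this detail is a guess
   at why ordered input benchmarks well, not taken from the patch.  */
static struct node *
insertion_sort (struct node *list)
{
  if (!list)
    return NULL;
  struct node *head = list, *tail = list;
  list = list->next;
  tail->next = NULL;
  while (list)
    {
      struct node *n = list;
      list = list->next;
      if (n->val >= tail->val)          /* in order: append at the tail */
        {
          tail->next = n;
          n->next = NULL;
          tail = n;
        }
      else                              /* otherwise scan from the head */
        {
          struct node **p = &head;
          while ((*p)->val <= n->val)   /* <= keeps the sort stable */
            p = &(*p)->next;
          n->next = *p;
          *p = n;
        }
    }
  return head;
}

static int
cmp_int (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Dispatch: insertion sort below the threshold; otherwise copy into a
   vector, sort that (qsort stands in for the Emacs vector sort here),
   and copy the values back.  */
static struct node *
list_sort (struct node *list, size_t len)
{
  if (len < INSERTION_THRESHOLD)
    return insertion_sort (list);

  int *vec = malloc (len * sizeof *vec);
  struct node *p = list;
  for (size_t i = 0; i < len; i++, p = p->next)
    vec[i] = p->val;
  qsort (vec, len, sizeof *vec, cmp_int);
  p = list;
  for (size_t i = 0; i < len; i++, p = p->next)
    p->val = vec[i];
  free (vec);
  return list;
}
```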

To generate them I compile a version of Emacs that has BOTH routines,
run =emacs -Q=, and use =benchmark-call= to time the results over
multiple repetitions. I compute the time as =(total time - GC time)=
averaged over the repetitions (the results I am reporting have no GC,
so this is just the average time). If I increase the repetitions I do
get GC, but I don't see much difference between the two routines in how
often GC happens.

There are really two cases: length < 64, which uses the insertion sort;
and longer lengths, which use the vector sorting. I compare similarly
sized lists (63 elements to exercise the insertion sort, 65 to exercise
the vector sort). I use 4 lists for each case: sorted (s63, s65),
reverse sorted (r63, r65), random (rand63, rand65), and "worst-case"
(worst63, worst65: a sawtooth, which maximizes the number of
comparisons for insertion sort), all with 1000 repetitions. Finally I
try many random lists with 100K elements (only one is included in the
table---the others are all similar). Times are in milliseconds.

|            |   old |   new |
|------------+-------+-------|
| s63        | 1.485 |  .376 |
| s65        | 1.211 |  .990 |
| r63        | 1.323 |  .609 |
| r65        | 1.422 | 1.091 |
| rand63     | 1.236 | 2.581 |
| rand65     | 1.627 | 1.238 |
| worst63    | 1.109 | 6.019 |
| worst65    | 1.359 | 1.067 |
| random100K |   160 |   101 |
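
For concreteness, a sawtooth input of the kind used for worst63/worst65
can be generated along these lines (the period here is my own choice;
the post doesn't give the exact parameters):

```c
#include <stddef.h>

/* Fill buf[0..len-1] with a sawtooth 0, 1, ..., period-1, 0, 1, ...
   Short ascending runs that repeatedly reset to a small value force
   the list insertion sort to walk far into the sorted prefix at every
   reset, which is what the post describes as its worst case.  */
static void
make_sawtooth (int *buf, size_t len, int period)
{
  for (size_t i = 0; i < len; i++)
    buf[i] = (int) (i % period);
}
```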


Observations: It all looks as expected. It's consistent with everything
I have seen in similar comparisons you can find in the ether.

Insertion sort: great for ordered lists (a factor of 2 to 4 better);
poor for random data (a factor of 2 worse); the worst case is 5.4 times
worse (theoretically it should be about 32/6 = 5.333 :) ).
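
For what it's worth, the 32/6 figure is consistent with comparing the
two sorts' approximate worst-case comparison counts at the threshold;
this derivation is my own guess at where the number comes from, not
something spelled out in the post:

```c
/* Insertion sort's worst case costs about n*n/2 comparisons, the
   recursive merge sort about n*log2(n).  At the threshold n = 64
   (log2 = 6), the ratio is (n*n/2) / (n*log2n) = n / (2*log2n)
   = 64/12 = 32/6 = 5.333.  */
static double
worst_case_ratio (int n, int log2n)
{
  double insertion = (double) n * n / 2.0;  /* ~ insertion sort comparisons */
  double merge = (double) n * log2n;        /* ~ merge sort comparisons */
  return insertion / merge;
}
```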

Vector sort: it's typically 30% faster, and does even better for longer
lists.

Is the insertion sort for small lists worth it compared to the vector
sort? For partially ordered lists it's between 2 and 4 times faster than
the vector sort. For random data it's a factor of 2 slower. The worst
case is a factor of 6. I think it's worth it (my prejudice is that the
data is more likely to be partially ordered than not). Also, the
advantage of the insertion sort shown here is really its worst
showing---it's even more advantageous for shorter lists, where the
overhead of converting to and from a vector becomes more important. We
could lower the threshold to a list size less than 64, but the TIMSORT
testing suggests 64 is the sweet spot.

And implementing TIMSORT will be even better---similar or better
speedups in the best cases, and only marginal degradation in the worst.
But I think even this small change is a win.

-- 
Andrew Cohen



