bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#58168: string-lessp glitches and inconsistencies


From: Eli Zaretskii
Subject: bug#58168: string-lessp glitches and inconsistencies
Date: Sat, 08 Oct 2022 10:35:05 +0300

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Fri, 7 Oct 2022 16:23:26 +0200
> Cc: 58168@debbugs.gnu.org
> 
> 6 okt. 2022 kl. 13.06 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > Cf. NaN comparisons with numerical values.
> 
> Emacs strings are completely different from floats and NaNs in just about 
> every respect; no meaningful parallels can be drawn. (And do believe me when 
> I say that we should be thankful for that.)

I'm totally aware that NaNs and unibyte strings are completely
different beasts, believe me.  I was just pointing out another
widespread case where comparison results are surprising and order is
not defined.  My point is that it isn't an unimaginable situation.

> > You missed me here.  Why are you suddenly talking about mismatches?
> > And if only mismatches matter here, why is it a problem to use memchr
> > in the first place?
> 
> Any lexicographic comparison is a matter of finding the first point of 
> difference, then interpreting the difference at that point. `memchr` does not 
> help with that, nor does `memcmp` unless we are doing a bytewise string 
> comparison.

Wed are miscommunicating, because you remove too much of previous
context.  I suggested to use memchr to find whether a string has any
C0 or C1 bytes, _before_ doing the actual comparison, to find out
whether a multibyte string includes any raw bytes, which would then
require slower comparisons.  If there are no C0/C1 bytes, you could
use memcmp, which is always faster than hand-made word-wise comparison
we have there now.

I also suggested to try memmem as yet another possibility -- not sure
up front whether it can be faster in cases that matter.

> Similar improvements could be made to the comparison between unibyte and 
> non-ASCII multibyte strings. These are less common and not quite as slow; I 
> haven't made up my mind about whether it's worth the trouble.

I don't think it's worth the trouble.

> In any case, the situation is now better than it was before the bug was 
> opened: string< is faster and the remaining problems have at least been 
> chartered, whether or not an agreement to remedy them can be reached. Let's 
> be happy about this!

This is me being happy.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]