bug#58168: string-lessp glitches and inconsistencies

bug-gnu-emacs



From:	Eli Zaretskii
Subject:	bug#58168: string-lessp glitches and inconsistencies
Date:	Sat, 08 Oct 2022 10:35:05 +0300

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Fri, 7 Oct 2022 16:23:26 +0200
> Cc: 58168@debbugs.gnu.org
> 
> 6 okt. 2022 kl. 13.06 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > Cf. NaN comparisons with numerical values.
> 
> Emacs strings are completely different from floats and NaNs in just about 
> every respect; no meaningful parallels can be drawn. (And do believe me when 
> I say that we should be thankful for that.)

I'm totally aware that NaNs and unibyte strings are completely
different beasts, believe me.  I was just pointing out another
widespread case where comparison results are surprising and order is
not defined.  My point is that it isn't an unimaginable situation.
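
For readers who want the NaN parallel spelled out, here is a small C sketch of what "order is not defined" means for IEEE 754 doubles. The helper name `unordered_pair` is hypothetical and only serves the illustration.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true when none of X < Y, X > Y, X == Y holds.
   For IEEE 754 doubles this happens exactly when X or Y is a NaN. */
static bool
unordered_pair (double x, double y)
{
  return !(x < y) && !(x > y) && !(x == y);
}
```

All three comparisons yield false as soon as a NaN is involved, so a sort that relies only on `<` has no well-defined place for such values. C99's `isunordered` macro from `<math.h>` performs the same test.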

> > You missed me here.  Why are you suddenly talking about mismatches?
> > And if only mismatches matter here, why is it a problem to use memchr
> > in the first place?
> 
> Any lexicographic comparison is a matter of finding the first point of 
> difference, then interpreting the difference at that point. `memchr` does not 
> help with that, nor does `memcmp` unless we are doing a bytewise string 
> comparison.

We are miscommunicating, because you removed too much of the previous
context.  I suggested using memchr to find whether a string has any
C0 or C1 bytes, _before_ doing the actual comparison, to find out
whether a multibyte string includes any raw bytes, which would then
require slower comparisons.  If there are no C0/C1 bytes, you could
use memcmp, which is always faster than the hand-made word-wise
comparison we have there now.
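
A minimal C sketch of that pre-check follows. It assumes Emacs's internal multibyte encoding, in which raw bytes 0x80..0xFF are stored as two-byte sequences whose lead byte is 0xC0 or 0xC1, while every other character uses UTF-8-style bytes whose memcmp order matches code-point order. The names `has_raw_byte_lead` and `fast_multibyte_compare`, and the return value 2 meaning "fall back to the slow path", are hypothetical and do not come from the Emacs sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if BUF contains a 0xC0 or 0xC1 byte.
   Such bytes are never UTF-8 continuation bytes (0x80..0xBF), so in
   the assumed encoding they can only start a raw-byte character.  */
static bool
has_raw_byte_lead (const unsigned char *buf, size_t len)
{
  return memchr (buf, 0xC0, len) != NULL || memchr (buf, 0xC1, len) != NULL;
}

/* Hypothetical fast path for comparing two multibyte strings.
   Returns -1 if A < B, 0 if equal, 1 if A > B, or 2 if either string
   contains raw bytes and the caller must use the slower comparison.  */
static int
fast_multibyte_compare (const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen)
{
  if (has_raw_byte_lead (a, alen) || has_raw_byte_lead (b, blen))
    return 2;

  /* Without raw bytes, bytewise order equals code-point order.  */
  size_t n = alen < blen ? alen : blen;
  int c = memcmp (a, b, n);
  if (c != 0)
    return c < 0 ? -1 : 1;

  /* Equal common prefix: the shorter string sorts first.  */
  return alen < blen ? -1 : alen > blen ? 1 : 0;
}
```

The raw-byte check costs two linear scans per string, so whether it pays off depends on how fast memchr and memcmp are relative to the existing loop on typical inputs.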

I also suggested trying memmem as yet another possibility -- not sure
up front whether it can be faster in the cases that matter.

> Similar improvements could be made to the comparison between unibyte and 
> non-ASCII multibyte strings. These are less common and not quite as slow; I 
> haven't made up my mind about whether it's worth the trouble.

I don't think it's worth the trouble.

> In any case, the situation is now better than it was before the bug was 
> opened: string< is faster and the remaining problems have at least been 
> charted, whether or not an agreement to remedy them can be reached. Let's 
> be happy about this!

This is me being happy.
