bug#58168: string-lessp glitches and inconsistencies
From: Eli Zaretskii
Subject: bug#58168: string-lessp glitches and inconsistencies
Date: Sat, 08 Oct 2022 10:35:05 +0300
> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Fri, 7 Oct 2022 16:23:26 +0200
> Cc: 58168@debbugs.gnu.org
>
> On 6 Oct 2022, at 13:06, Eli Zaretskii <eliz@gnu.org> wrote:
>
> > Cf. NaN comparisons with numerical values.
>
> Emacs strings are completely different from floats and NaNs in just about
> every respect; no meaningful parallels can be drawn. (And do believe me when
> I say that we should be thankful for that.)
I'm totally aware that NaNs and unibyte strings are completely
different beasts, believe me. I was just pointing out another
widespread case where comparison results are surprising and order is
not defined. My point is that it isn't an unimaginable situation.
> > You missed me here. Why are you suddenly talking about mismatches?
> > And if only mismatches matter here, why is it a problem to use memchr
> > in the first place?
>
> Any lexicographic comparison is a matter of finding the first point of
> difference, then interpreting the difference at that point. `memchr` does not
> help with that, nor does `memcmp` unless we are doing a bytewise string
> comparison.
We are miscommunicating, because you remove too much of the previous
context. I suggested using memchr to find whether a string has any
C0 or C1 bytes _before_ doing the actual comparison, to find out
whether a multibyte string includes any raw bytes, which would then
require slower comparisons. If there are no C0/C1 bytes, you could
use memcmp, which is always faster than the hand-made word-wise
comparison we have there now.
I also suggested to try memmem as yet another possibility -- not sure
up front whether it can be faster in cases that matter.
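The memchr-then-memcmp idea above can be sketched roughly as follows.
This is only an illustration, not the code in Emacs: the function names
are hypothetical, and it assumes that raw bytes in a multibyte string
are stored as two-byte sequences whose first byte is 0xC0 or 0xC1, and
that no other character's encoding contains those byte values. The
slow path is deliberately left out; the sketch just reports when it
would be needed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the N bytes at S contain 0xC0 or 0xC1.
   Assumption: in the multibyte representation these bytes appear
   only as the leading byte of an encoded raw byte.  */
static bool
has_raw_byte_lead (const unsigned char *s, size_t n)
{
  return memchr (s, 0xC0, n) != NULL || memchr (s, 0xC1, n) != NULL;
}

/* Hypothetical fast path for a "less than" test on two multibyte
   strings given as byte buffers.
   Return 1 if A sorts before B, 0 if it does not, and -1 if either
   string contains raw bytes, in which case the caller has to fall
   back to a slower character-by-character comparison.  */
static int
fast_string_lessp (const unsigned char *a, size_t alen,
                   const unsigned char *b, size_t blen)
{
  if (has_raw_byte_lead (a, alen) || has_raw_byte_lead (b, blen))
    return -1;

  /* Without raw bytes, byte order matches character order, so the
     common prefix can be compared with a single memcmp call.  */
  size_t n = alen < blen ? alen : blen;
  int c = memcmp (a, b, n);
  if (c != 0)
    return c < 0;

  /* Equal up to the shorter length: the shorter string sorts first.  */
  return alen < blen;
}
```

Note that the pre-scan costs up to two extra passes over each string,
so whether this wins overall would have to be measured.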
> Similar improvements could be made to the comparison between unibyte and
> non-ASCII multibyte strings. These are less common and not quite as slow; I
> haven't made up my mind about whether it's worth the trouble.
I don't think it's worth the trouble.
> In any case, the situation is now better than it was before the bug was
> opened: string< is faster and the remaining problems have at least been
> charted, whether or not an agreement to remedy them can be reached. Let's
> be happy about this!
This is me being happy.