bug#58168: string-lessp glitches and inconsistencies
From: Mattias Engdegård
Subject: bug#58168: string-lessp glitches and inconsistencies
Date: Sat, 1 Oct 2022 21:57:45 +0200
On 1 Oct 2022, at 07:22, Eli Zaretskii <eliz@gnu.org> wrote:
> It depends on the use case, but in general I see no problem with
> signaling errors when we cannot produce reasonably correct results.
> For example, string-to-unibyte does signal an error in some cases.
That's fine because that function is documented to do so and always has, but
making previously possible comparisons raise errors shouldn't be done lightly.
Comparison between objects is not only useful when someone cares about their
order, as in presenting a sorted list to the user. Often what is important is
an ability to impose an order, preferably total, for use in building and
searching data structures. I came across this bug when implementing a string
set.
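
To illustrate why a total order matters here even when nobody cares about the
order itself, the following is a minimal sketch of a sorted-list string set. It
is written in Python rather than Emacs Lisp, the class name `SortedStringSet`
is hypothetical, and it is not the implementation mentioned above. The binary
search is only correct if the comparison is a consistent total order and never
signals an error.

```python
import bisect


class SortedStringSet:
    """Hypothetical string set kept as a sorted list.

    Lookup and insertion use binary search, so they rely on `<`
    being a total order over every pair of stored strings.
    """

    def __init__(self):
        self._items = []

    def add(self, s):
        # Find the leftmost position where s could be inserted.
        i = bisect.bisect_left(self._items, s)
        # Insert only if s is not already present at that position.
        if i == len(self._items) or self._items[i] != s:
            self._items.insert(i, s)

    def __contains__(self, s):
        i = bisect.bisect_left(self._items, s)
        return i < len(self._items) and self._items[i] == s

    def __len__(self):
        return len(self._items)
```

If the comparison raised an error for some pairs, or gave inconsistent answers,
both `add` and the membership test could fail or silently miss elements.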
>> It's also a matter of performance -- string< has been improved recently but
>> currently we compare text in Latin and Swahili much faster than French and
>> Arabic; it would be nice to close that gap. UTF-8 is designed so that
>> comparing strings by scalar values can be done byte-wise, but the way we
>> encode raw bytes makes them sort right between ASCII and Latin-1. Given that
>> the specific order doesn't matter much, we could just run with that.
>
> I see no reason to make comparison of unibyte and multibyte strings
> perform better.
Actually I was talking about multibyte-multibyte comparisons.
You were probably thinking about comparisons between unibyte strings that
contain raw bytes and multibyte strings, and those are indeed not very
performance-sensitive. However, there is no way to detect whether a unibyte
string contains non-ASCII chars without looking at every byte, and comparing
unibyte ASCII with multibyte is definitely of interest. Strings are still
unibyte by default.
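
As a rough illustration of the two points above, here is a small Python sketch
(not Emacs code; the function names are hypothetical). It checks that, for
standard UTF-8, comparing the encoded bytes gives the same order as comparing
by code point, and it shows that finding out whether a byte string is pure
ASCII means scanning it. It covers standard UTF-8 only and does not model the
Emacs internal representation of raw bytes.

```python
def less_by_code_point(a: str, b: str) -> bool:
    """Compare two strings lexicographically by Unicode code point."""
    return [ord(c) for c in a] < [ord(c) for c in b]


def less_by_utf8_bytes(a: str, b: str) -> bool:
    """Compare two strings byte-wise on their UTF-8 encodings."""
    return a.encode("utf-8") < b.encode("utf-8")


def has_non_ascii(data: bytes) -> bool:
    """Return True if any byte is outside the ASCII range.

    In the worst case (an all-ASCII string) every byte is inspected.
    """
    for byte in data:
        if byte >= 0x80:
            return True
    return False
```

For any pair of strings without surrogate code points, the two comparison
functions should agree, which is what makes a byte-wise fast path attractive
for multibyte-multibyte comparisons.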