bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#58168: string-lessp glitches and inconsistencies


From: Eli Zaretskii
Subject: bug#58168: string-lessp glitches and inconsistencies
Date: Tue, 04 Oct 2022 19:24:52 +0300

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Tue, 4 Oct 2022 16:44:17 +0200
> Cc: larsi@gnus.org,
>  58168@debbugs.gnu.org
> 
> 4 okt. 2022 kl. 13.37 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > First I needed to fix fallout from making STRING_CHAR intolerant of
> > unibyte text, because redisplay-testsuite caused assertion violations
> > in string_char_and_length.
> 
> Good catch! Just to satisfy my curiosity:
> 
> >             error ("Invalid format operation %%%c",
> > -                  STRING_CHAR ((unsigned char *) format - 1));
> > +                  multibyte_format
> > +                  ? STRING_CHAR ((unsigned char *) format - 1)
> > +                  : *((unsigned char *) format - 1));
> 
> This treats unibyte format strings as if they were Latin-1 for the purpose of 
> the error message.

No, it doesn't.  It shows the problematic characters as raw bytes, as
in "%\200" (where \200 is a single character).  If you see something
different, please show the recipe.

> Not very important, of course, but maybe there should be a UNIBYTE_TO_CHAR in 
> the alternative branch?

No, that would show the multibyte codepoint, and will confuse users,
because the result would look very different from the problematic
format spec in this case.

> >  (Doesn't it abort for you? or do you not
> > build Emacs with --enable-checking?)
> 
> Oh I certainly do that occasionally, but it's mostly when I've changed 
> something at the C level or have reason to believe that something is broken 
> there.

Please _always_ test changes related to encoding/decoding and
character representation conversions in a --enable-checking build.  We
should have discovered these bugs in time for Emacs 28.2 to be devoid
of them.

> > I could understand why you'd want to _add_ the larger values, but why
> > replace?
> 
> Because it seemed pretty clear that the old code intended to use #x3ffffc for 
> testing display of raw bytes but a typo turned it into #x3fffc instead which 
> isn't a raw byte but a multibyte character. That it's an easy mistake to make 
> (done so several times myself).

Who said anything about #x3fffc?  The original code had #xfc, the
unibyte code for #x3ffffc.  I don't see why we shouldn't test both.
In the other problematic hunk you replaced \777774 with \374 -- why?

> I've now pushed the patch; the code can be improved further if necessary.

I've reverted it.  Please stop this madness of rushing into installing
changes that are still under controversy.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]