bug-gnu-emacs

bug#12291: [rev 109796] wrong UTF-8 handling


From: Lars Ingebrigtsen
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Thu, 27 Jan 2022 17:32:53 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

Werner LEMBERG <wl@gnu.org> writes:

> Have a look at the attached file, containing a single character.
> (It's transmitted as binary to avoid e-mail encoding issues).  It
> contains a single, four-byte UTF-8 encoded character (0xF4 0xB5 0x87
> 0x9E, which would map to the non-existent Unicode character code
> U+1351DE).  If I load this file as UTF-8 encoded, Emacs gives this as
> the output of `C-u C-x =':
>
>                position: 1 of 2 (0%), column: 0
>               character: 二 (displayed as 二) (codepoint 20108, #o47214, #x4e8c)
>       preferred charset: unicode (Unicode (ISO10646))

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

This has changed at some point between when this was reported and now:

             position: 1 of 2 (0%), column: 0
            character:  (displayed as ) (codepoint 1266142, #o4650736, #x1351de)
              charset: emacs (Full Emacs charset (excluding eight bit chars))
code point in charset: 0x1351DE
               syntax: w        which means: word
             category: L:Strong L2R
             to input: type "C-x 8 RET 1351de"

So Emacs now displays more accurate information about the UTF-8
sequence.

It was pointed out that this sequence is outside the Unicode range,
which only extends up to U+10FFFF, and that Emacs should perhaps display
this as a number of raw bytes instead.  Is that something we still want
to pursue, or is Emacs behaving the way we want here?  Eli?
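
For reference, the arithmetic behind the code point above can be
sketched in Python.  This is only an illustration of the four-byte UTF-8
bit layout, not how Emacs decodes internally; the helper name
`decode_utf8_4byte' is made up for this example.  It also shows that
Python's strict UTF-8 decoder rejects the sequence, which is roughly the
"raw bytes" behaviour being discussed.

```python
def decode_utf8_4byte(seq: bytes) -> int:
    """Apply the 4-byte UTF-8 bit layout with no range check (hypothetical helper)."""
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("not a 4-byte UTF-8 sequence")
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    # 3 payload bits from the lead byte, 6 from each continuation byte.
    return (
        ((seq[0] & 0x07) << 18)
        | ((seq[1] & 0x3F) << 12)
        | ((seq[2] & 0x3F) << 6)
        | (seq[3] & 0x3F)
    )


raw = bytes([0xF4, 0xB5, 0x87, 0x9E])  # the bytes from the original report
cp = decode_utf8_4byte(raw)
beyond_unicode = cp > 0x10FFFF  # Unicode ends at U+10FFFF

# A strict decoder refuses the sequence instead of producing a character.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(hex(cp), beyond_unicode, strict_ok)
```

The computed value matches the #x1351de that Emacs reports, and it lies
above U+10FFFF.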

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




