emacs-devel

Re: ucs-normalize and diacritics


From: Robert Pluim
Subject: Re: ucs-normalize and diacritics
Date: Thu, 26 Jul 2018 10:40:45 +0200

Eli Zaretskii <address@hidden> writes:

>> From: Robert Pluim <address@hidden>
>> Date: Wed, 25 Jul 2018 16:45:03 +0200
>> 
>>        As a special case, if the character lies in the range 128 (0200
>>     octal) through 159 (0237 octal), it stands for a raw byte that does not
>>     correspond to any specific displayable character.  Such a character lies
>>     within the eight-bit-control character set, and is displayed as an
>>     escaped octal character code.  In this case, C-x = shows part of
>>     display ... instead of file.
>
> This text is obsolete and inaccurate, it should be replaced/rewritten.
>

How about something like:

  As a special case, if the character lies in the range #x3fff80
through #x3fff9f (128 through 159 decimal, with prefix #x3fff), it
stands for a raw byte that does not correspond to any specific
displayable character.  Such a character lies within the
@code{eight-bit-control} character set, and is displayed as an escaped
octal character code (0200 through 0237), or as an escaped hex
character code (x80 through x9f) if @code{display-raw-bytes-as-hex} is
non-@code{nil}.
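
As a sanity check on the numbers in that paragraph, here is a small
sketch of the arithmetic. It is written in Python rather than Emacs
Lisp purely for illustration. The helper name raw_byte_to_char is made
up, and the offset #x3fff00 is inferred from #x3fff80 standing for
byte #x80, so treat it as an assumption rather than a quote from the
Emacs sources.

```python
# Offset between a raw byte and the character code that stands for it.
# Assumption: inferred from #x3fff80 corresponding to byte #x80.
RAW_BYTE_OFFSET = 0x3FFF00


def raw_byte_to_char(byte):
    """Return the character code standing for a raw 8-bit byte (hypothetical helper)."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("raw bytes are in the range 0x80..0xFF")
    return RAW_BYTE_OFFSET + byte


# Print both ends of the range discussed above in decimal, octal and hex.
for byte in (128, 159):
    print(f"{byte} = #o{byte:o} = #x{byte:x} -> char #x{raw_byte_to_char(byte):x}")
```

This prints #o200 / #x80 / #x3fff80 for 128 and #o237 / #x9f / #x3fff9f
for 159, so the octal and hex ends of the range can be compared
directly.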

I'm not sure the 'eight-bit-control' part is true, given the reference
to 'tis620-2533' in the what-cursor-position output.

>> emacs -Q
>> C-x C-f /tmp/bin.txt
>> C-x 8 RET 80
>> C-b
>> C-x =
>> 
>> which gives
>> 
>> Char: \200 (128, #o200, #x80, file ...) point=1 of 1 (0%) column=0
>
> Try
>
>   C-x 8 RET 3fff80 RET

Yes, that's better. So C-x 8 RET 80 results in Emacs writing 2 bytes to
disk, but 3fff80 results in only one. The joys of multibyte :-)
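
The byte counts can be reproduced outside Emacs. A minimal sketch in
Python, assuming the buffer is saved as UTF-8 (the coding system is not
stated above, so that part is a guess): the character U+0080 encodes to
two bytes, while a raw byte 0x80 is just itself.

```python
# C-x 8 RET 80 inserts the character U+0080; assuming a UTF-8 save,
# it is written as a two-byte sequence.
as_character = "\u0080".encode("utf-8")

# C-x 8 RET 3fff80 inserts the raw byte 0x80, written out unchanged.
as_raw_byte = bytes([0x80])

print(as_character.hex(), len(as_character))  # c280 2
print(as_raw_byte.hex(), len(as_raw_byte))    # 80 1
```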

Robert


