bug#53236: 26.1; encode-coding-string does not encode the string as expected

bug-gnu-emacs

From:	Philipp Stephani
Subject:	bug#53236: 26.1; encode-coding-string does not encode the string as expected
Date:	Thu, 13 Jan 2022 21:23:33 +0100

On Thu, 13 Jan 2022 at 21:14, Markus Triska <triska@metalevel.at> wrote:
>
> Dear all,
>
> please consider the UTF-8 encoding of the Unicode codepoint 0x80, which
> is formed by two bytes. In hexadecimal notation, they are: 0xC2 0x80.
>
> We can use decode-coding-string to verify that this byte sequence is
> decoded to 0x80 when specifying utf-8, which works exactly as expected:
>
>     (decode-coding-string "\xC2\x80" 'utf-8)
>
> This yields "\200", which is the same as "\x80", as verified via:
>
>     (string= "\200" "\x80") --> t

There are two possible interpretations of "\200":
1. The unibyte string containing the byte #x80
2. The multibyte string containing the Unicode character U+0080
The string literal "\200" gives you the former, while
(decode-coding-string "\xC2\x80" 'utf-8) gives you the latter. In fact,

    (string= (decode-coding-string "\xC2\x80" 'utf-8) "\200") ⇒ nil

but

    (string= (decode-coding-string "\xC2\x80" 'utf-8) "\u0080") ⇒ t
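To check which of the two representations a given string has, a sketch along these lines may help. It only uses the built-ins multibyte-string-p and string-bytes; the values in the comments are what I would expect from evaluating each form on its own.

```elisp
;; Unibyte: the octal escape yields a single raw byte #x80.
(multibyte-string-p "\200")        ; ⇒ nil
(string-bytes "\200")              ; ⇒ 1

;; Multibyte: the \u escape yields the character U+0080, which
;; occupies two bytes in Emacs's internal representation.
(multibyte-string-p "\u0080")      ; ⇒ t
(string-bytes "\u0080")            ; ⇒ 2

;; Decoding the UTF-8 bytes C2 80 produces a character,
;; so the result here is a multibyte string.
(multibyte-string-p (decode-coding-string "\xC2\x80" 'utf-8)) ; ⇒ t
```

Both strings have length 1, which is why they are easy to confuse when printed.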

>
> Correspondingly, I expect (encode-coding-string "\200" 'utf-8) to yield
> a string equivalent to "\xC2\x80", but that seems not to be the case. I get:
>
>     (encode-coding-string "\200" 'utf-8) --> "\200"

Here "\200" gives you the unibyte string that contains the byte #x80.
That can't be encoded as UTF-8 (since UTF-8 encodes Unicode scalar
values, not raw bytes), so encode-coding-string returns it unchanged.
However, if you pass the multibyte string containing U+0080, you get
the two bytes you expected:

    (encode-coding-string "\u0080" 'utf-8) ⇒ "\302\200"
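As a rough illustration of the round trip, the following sketch encodes the character U+0080 and decodes the resulting bytes again. The variable names chars and bytes are only for illustration.

```elisp
(let* ((chars "\u0080")                            ; multibyte, one character
       (bytes (encode-coding-string chars 'utf-8))) ; unibyte, two bytes
  (list (multibyte-string-p bytes)                 ; encoded text is unibyte
        (string= bytes "\xC2\x80")                 ; the expected UTF-8 bytes
        (string= (decode-coding-string bytes 'utf-8) chars)))
;; ⇒ (nil t t)

;; If the byte #x80 in a unibyte string was really meant as the
;; character U+0080 (an assumption about intent, not something Emacs
;; can know), one option is to decode it as latin-1 first and then
;; encode the resulting characters as UTF-8:
(encode-coding-string (decode-coding-string "\200" 'latin-1) 'utf-8)
```

The point is that encode-coding-string expects characters as input and decode-coding-string expects bytes, so each direction needs the matching representation.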

There's some background in the section "Text Representations" of the
ELisp manual (in the chapter "Non-ASCII Characters").
HTH
