bug-gnu-emacs

bug#27270: display-raw-bytes-as-hex generates ambiguous output for Emacs strings


From: Vasilij Schneidermann
Subject: bug#27270: display-raw-bytes-as-hex generates ambiguous output for Emacs strings
Date: Sun, 24 Apr 2022 12:51:58 +0200

> You need to use a wide string:
>
>       wcslen(L"\x1234")
>
> >     std::string("\x1234").length() // C++: compilation error
>
> Likewise:
>
>       std::wstring(L"\x1234").length()

Thank you for pointing this out. This gives us three camps:

- Languages where "\x1234" is always one character (Emacs Lisp)
- Languages where "\x1234" is an error, but may become one character
  when opting into this with wide literals (C, C++)
- Languages where "\x1234" is always multiple characters (everything
  else under the sun)
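
For illustration, a minimal C++ sketch of camp 2. The function names
are hypothetical and not taken from any library. The first function
opts into the one-character reading with a wide literal. The second
shows how the multiple-character reading has to be spelled out, by
ending the escape early with adjacent string literals.

```cpp
#include <cstddef>
#include <string>

// Opt-in: in a wide literal, \x1234 denotes a single wchar_t.
std::size_t wide_escape_length() {
    return std::wstring(L"\x1234").length();
}

// Escape sequences are processed before adjacent literals are
// concatenated, so this is the byte 0x12 followed by '3' and '4'.
std::size_t split_escape_length() {
    return std::string("\x12" "34").length();
}
```

Here wide_escape_length() returns 1 and split_escape_length() returns
3, which is the length camp 3 languages give for a plain "\x1234".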

I propose that Emacs Lisp move into camp 3 (there is little point in
moving to camp 2, as that would require new syntax for a rarely used
feature). As the bug report shows, this is a footgun waiting to
happen. We already have syntax for specifying a value greater than
#xFF: Unicode names/values. The change would require an amendment to
`(info "(elisp) General Escape Syntax")`, point 3. As with old-style
backquotes, a warning could be emitted if a hex value greater than
#xFF is used in a string.

I've checked the Emacs sources for uses of such hex escapes and found
only org-entities.el, which represents a non-breaking space (nbsp)
this way, so breakage should be limited.

If there is interest, I could extend the survey to include whether
character syntax is/should be affected the same way and/or include
more languages.




