From: Stefan Monnier
Subject: distinguishing multibyte/unibyte ASCII (was: [PATCH] url: Wrap cookie headers in url-http--encode-string.)
Date: Fri, 09 Sep 2016 16:01:57 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

> If you just generate an ASCII string from ASCII characters, it will
> usually be unibyte.  If you take it as a substring from a multibyte
> buffer, it will usually be multibyte.

And it's arguably a wart in Emacs's handling of chars-vs-bytes.
But it's kind of hard to fix now.

At some point I tried to change this handling (not exactly fix it) by
treating multibyte ASCII strings specially (they're easy to recognize by
checking that the char length is equal to the byte length, and both are
readily available in the "struct Lisp_String" object).  Then, when we
read an ASCII string, instead of making it unibyte, I'd keep it as
multibyte.  And then I'd change things like "concat" so that those "ASCII
multibyte" strings don't force the result to be multibyte.
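A toy model of that idea, as a sketch only: this is Python, not Emacs's C code. `SketchString` is a hypothetical stand-in loosely modeled on "struct Lisp_String" (which keeps both a char count and a byte count), the UTF-8 byte length is an assumed approximation of Emacs's internal multibyte encoding for ordinary characters, and all function names are illustrative. It contrasts the usual "concat" rule with the modified one described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchString:
    """Toy, hypothetical stand-in for Emacs's struct Lisp_String."""
    size: int                 # length in characters
    size_byte: Optional[int]  # length in bytes; None marks a unibyte string

    @classmethod
    def unibyte(cls, data: bytes) -> "SketchString":
        # In a unibyte string every byte is one char, so only size is kept.
        return cls(size=len(data), size_byte=None)

    @classmethod
    def multibyte(cls, text: str) -> "SketchString":
        # Assumption: UTF-8 byte length stands in for Emacs's internal
        # multibyte encoding (they agree for ordinary Unicode chars).
        return cls(size=len(text), size_byte=len(text.encode("utf-8")))

    def is_multibyte(self) -> bool:
        return self.size_byte is not None

    def is_ascii_multibyte(self) -> bool:
        # Every non-ASCII char needs at least 2 bytes, so a multibyte
        # string with as many bytes as chars must be pure ASCII.
        return self.is_multibyte() and self.size == self.size_byte


def concat_is_multibyte_usual(*args: SketchString) -> bool:
    """Usual rule: any multibyte argument makes the result multibyte."""
    return any(a.is_multibyte() for a in args)


def concat_is_multibyte_modified(*args: SketchString) -> bool:
    """Modified rule: "ASCII multibyte" arguments don't force multibyte."""
    return any(a.is_multibyte() and not a.is_ascii_multibyte() for a in args)


foo = SketchString.unibyte(b"foo")          # ASCII string built from ASCII chars
bar = SketchString.multibyte("bar")         # ASCII substring of a multibyte buffer
cafe = SketchString.multibyte("caf\u00e9")  # "café": 4 chars, 5 bytes

print(concat_is_multibyte_usual(foo, bar))      # True
print(concat_is_multibyte_modified(foo, bar))   # False
print(concat_is_multibyte_modified(foo, cafe))  # True
```

The check costs nothing because both lengths are already stored; the subtle part is that code which used to see a multibyte result for `foo` + `bar` now sees a unibyte one.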

My local Emacs still runs with those changes, but in the end I don't
think the result is really better (or sufficiently better to justify
the subtle incompatibilities it introduces).

[ Also, I wouldn't be surprised to hear that such a change causes real
  problems with utf-7 or EBCDIC, or other systems where decoding/encoding
  a string of bytes/chars all <128 is not a no-op.  ]
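To make the aside concrete, a small sketch using Python's standard codecs (not Emacs's coding systems, which may differ in detail): for pure-ASCII text UTF-8 encoding is the identity, but UTF-7 escapes `+` as `+-`, and EBCDIC (here code page 037, Python codec `cp037`) maps even plain letters to different byte values. `encoding_is_identity` is a hypothetical helper name.

```python
def encoding_is_identity(text: str, coding: str) -> bool:
    """True if encoding pure-ASCII TEXT with CODING gives its ASCII bytes."""
    if not text.isascii():  # str.isascii() needs Python 3.7+
        raise ValueError("expected pure-ASCII text")
    return text.encode(coding) == text.encode("ascii")


print(encoding_is_identity("a+b", "utf-8"))  # True
print(encoding_is_identity("a+b", "utf-7"))  # False: '+' becomes b"+-"
print(encoding_is_identity("abc", "cp037"))  # False: EBCDIC 'a' is 0x81
```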

