Hi :)
On Mon 27 Feb 2017 17:07, Eli Zaretskii <address@hidden> writes:
>> From: Andy Wingo <address@hidden>
>> Date: Sun, 26 Feb 2017 22:20:31 +0100
>>
>> In Scheme, strings are sequences of characters. Encoding and decoding
>> is only needed when going to and from bytes. Guile supports a finite
>> number of encodings, so in general some encoding/decoding will always be
>> needed. The specific encoding may change over time.
>
> The lesson of Emacs development is that there's a need for
> "characters" that represent raw bytes which cannot be decoded into the
> internal representation, for whatever reasons. These special
> "characters" need to be representable in strings, among "normal"
> recognizable characters (and thus distinguishable from the latter
> kind), and they need to be converted back to their single-byte form
> when the string is output to the external world. An implementation of
> text that doesn't include these features will always fail to support
> some important use cases.

Thanks for this note (and upthread). I didn't know Emacs settled on
this strategy. It could fit in as a new "conversion strategy" (see
Encoding in the manual).
I think this feature will probably slip for 2.2.0 for lack of time,
though. When someone does go to look at it, this thread is a useful
resource, or parts of it anyway :) I especially appreciated the
tradeoffs between surrogates and strange UTF-8 hacks.
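To make the surrogate option a bit more concrete, here is a minimal sketch in Python. It is neither Guile nor Emacs; it uses Python's built-in "surrogateescape" error handler (PEP 383). It meets the three requirements Eli lists. Undecodable bytes survive as characters inside the string. They are distinguishable from ordinary characters, because they land in U+DC80..U+DCFF, which valid UTF-8 never decodes to. They are turned back into their original single bytes on output. The helper names below are hypothetical.

```python
from __future__ import annotations


def decode_lossless(data: bytes) -> str:
    # Bytes that are not valid UTF-8 become lone surrogates U+DC80..U+DCFF
    # (byte 0xNN -> U+DCNN) instead of raising an error.
    return data.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    # Lone surrogates U+DC80..U+DCFF are written back out as the single
    # byte they stand for; everything else is encoded as normal UTF-8.
    return text.encode("utf-8", errors="surrogateescape")


def raw_bytes_in(text: str) -> list[int]:
    # The escaped "raw byte characters" can be told apart from ordinary ones.
    return [ord(c) - 0xDC00 for c in text if 0xDC80 <= ord(c) <= 0xDCFF]


if __name__ == "__main__":
    data = b"caf\xc3\xa9 \xff\xfe end"  # valid UTF-8 plus two stray bytes
    text = decode_lossless(data)
    print(raw_bytes_in(text))  # [255, 254]
    assert encode_lossless(text) == data  # round-trips byte for byte
```

One tradeoff of this design: a lone surrogate in that range that reaches a string by some other route is also written out as a raw byte, so the escape is not entirely unambiguous.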
Andy