guile-user


Re: guile can't find a chinese named file


From: Eli Zaretskii
Subject: Re: guile can't find a chinese named file
Date: Fri, 17 Feb 2017 08:44:23 +0200

> From: Marko Rauhamaa <address@hidden>
> Cc: Eli Zaretskii <address@hidden>,  address@hidden
> Date: Thu, 16 Feb 2017 23:13:35 +0200
> 
> Python uses the surrogate hole in the middle of the Unicode range to
> represent such stray bytes, but only when naming files.

IMO, it makes no sense to limit this to file names, because (a) you
don't always know at every level of the code which string is a file
name or a part thereof; and (b) situations where non-ASCII bytes
cannot be properly decoded into Unicode also happen with text that is
not a file name, and users still expect Emacs to silently produce the
same byte stream on round-trip operations, e.g., when copying text
from one file to another.
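
To make the round-trip point concrete, here is a minimal Python sketch of
the surrogate mechanism mentioned in the quoted text.  It assumes the
standard "surrogateescape" error handler (PEP 383), which Python applies
by default to file names but which can also be requested explicitly for
any byte string.  The sample bytes are made up for illustration.

```python
# A byte string that is not valid UTF-8: 0xE9 is "é" in Latin-1,
# but a stray lead byte in UTF-8.
raw = b"caf\xe9.txt"

# Undecodable bytes are mapped to lone surrogates U+DC80..U+DCFF
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
back = text.encode("utf-8", errors="surrogateescape")

print(repr(text))
print(back == raw)
```

Nothing in the handler is specific to file names; the same call works on
the contents of a file, which is the case described in (b) above.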

> Internally, CPython (the principal implementation) has Latin-1, UCS-2
> and UCS-4 strings to optimize memory use while maintaining fixed-width
> character representation.
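
The fixed-width representations described in the quoted text can be
observed from Python itself.  The sketch below is CPython-specific (it
assumes version 3.3 or later, where PEP 393 applies) and relies on
sys.getsizeof reporting the size of the string object; the helper name
bytes_per_char is made up for this example.

```python
import sys

def bytes_per_char(ch: str) -> int:
    """Estimate storage per character for a string made only of `ch`.

    Comparing two lengths cancels out the fixed object header.
    """
    small = sys.getsizeof(ch * 1000)
    large = sys.getsizeof(ch * 2000)
    return (large - small) // 1000

latin1_width = bytes_per_char("a")           # fits in Latin-1
ucs2_width = bytes_per_char("\u4e2d")        # needs 16 bits
ucs4_width = bytes_per_char("\U0001F600")    # outside the BMP

print(latin1_width, ucs2_width, ucs4_width)
```

The width is chosen per string from its widest character, so a single
non-BMP character makes the whole string use four bytes per character.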

Emacs uses a superset of UTF-8 internally.  We have found that the
variable-length encoding doesn't slow down Emacs enough to worry
about, because the need to go back in a string or buffer text is rare.
It wasn't worth the complication of maintaining different
representations, with the corresponding risk of bugs (because it is
very easy in Emacs to gain access to the internal representation of
text).
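
As an illustration of why going backward is the more awkward direction
in a variable-length encoding, here is a small Python sketch that finds
the start of the previous character in plain UTF-8.  It is not Emacs
code and does not cover the Emacs extensions to UTF-8; the function name
prev_char_start is hypothetical.

```python
def prev_char_start(buf: bytes, pos: int) -> int:
    """Return the index where the character ending just before `pos` starts.

    Assumes `buf` is valid UTF-8 and `pos` lies on a character boundary.
    """
    i = pos - 1
    # Continuation bytes have the bit pattern 10xxxxxx; skip back over them
    # until a lead byte (or an ASCII byte) is reached.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i

data = "a\u00e9\u4e2d".encode("utf-8")  # 1-byte, 2-byte and 3-byte characters

pos = len(data)
starts = []
while pos > 0:
    pos = prev_char_start(data, pos)
    starts.append(pos)

print(starts)
```

Each backward step inspects at most a few bytes, which fits the
observation above that the cost is small when such steps are rare.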


