Re: guile can't find a chinese named file


From: David Kastrup
Subject: Re: guile can't find a chinese named file
Date: Fri, 17 Feb 2017 10:04:29 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)

Marko Rauhamaa <address@hidden> writes:

> Eli Zaretskii <address@hidden>:
>>> From: Marko Rauhamaa <address@hidden>
>>> Python uses the surrogate hole in the middle of the Unicode range to
>>> represent such stray bytes, but only when naming files.
>>
>> IMO, it makes no sense to limit this to file names, because (a) you
>> don't always know on all levels of the code which string is a file
>> name or a part thereof; and (b) because situations where non-ASCII
>> bytes cannot be properly decoded into Unicode happen with text that is
>> not file names, and users still expect Emacs to silently produce the
>> same byte stream on round-trip operations, e.g., when copying text
>> from one file to another.
>
> Python just barfs:
>
>    $ python3 -c "import sys; print(sys.stdin.read(30))" <<<$'\xdd'
>    Traceback (most recent call last):
>      File "<string>", line 1, in <module>
>      File "/usr/lib64/python3.5/codecs.py", line 321, in decode
>        (result, consumed) = self._buffer_decode(data, self.errors, final)
>    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdd in position \
>    0: invalid continuation byte
>
> The situation is a bit difficult to recover from.

You can load an executable into an Emacs buffer, do a search-and-replace
on UTF-8 strings, and save it again.  Provided that the replacement
string has the same length as the original and that the string does not
appear as part of symbols for the linker, the executable will likely
work fine afterwards.
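
As a rough illustration of the same idea outside Emacs, here is a
minimal Python sketch that works on raw bytes.  The helper name
`replace_same_length` and the sample data are made up for this example;
the point is only that a replacement of equal encoded length leaves
every other byte, valid UTF-8 or not, where it was.

```python
def replace_same_length(data: bytes, old: str, new: str) -> bytes:
    """Replace a UTF-8 string inside arbitrary binary data.

    Hypothetical helper: refuses replacements whose encoded length
    differs, since that would shift every offset after the match.
    """
    old_b = old.encode("utf-8")
    new_b = new.encode("utf-8")
    if len(old_b) != len(new_b):
        raise ValueError("replacement must have the same encoded length")
    return data.replace(old_b, new_b)


# Made-up "binary" containing bytes that are not valid UTF-8.
blob = b"\x7fELF\xdd\x00hello world\x00\xff\xfe"
patched = replace_same_length(blob, "hello", "HELLO")
```

Note that this operates on bytes, not strings, which is exactly the
detour the rest of this message argues a language should not force.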

I don't think that XEmacs (another Emacs implementation that migrated a
lot more leisurely to multibyte encodings) would stand up to the same
sort of abuse.  And probably quite a few text editors would throw in the
towel as well.  But once you view Emacs as a text processing platform,
it's a reasonable conclusion that failure is not a good option.

For a general-purpose programming language like Python or Guile, I
should think it at least as important that strings can represent input
accurately without having to digress outside of string processing and
use stuff like byte arrays.
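
For what it's worth, the surrogate mechanism mentioned above for file
names can be requested explicitly in Python through the
`surrogateescape` error handler.  The following is only a sketch with
made-up sample bytes, showing that a stray byte can survive a
decode/edit/encode round trip while staying inside string processing.

```python
# Made-up input: valid UTF-8 ("café") followed by a stray byte 0xdd.
raw = b"caf\xc3\xa9 \xdd stray"

# Undecodable bytes are mapped to lone surrogates (0xdd -> U+DCDD)
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Ordinary string operations work on the result.
edited = text.replace("stray", "STRAY")

# Encoding with the same handler turns the surrogate back into 0xdd.
out = edited.encode("utf-8", errors="surrogateescape")
```

The handler has to be asked for on both the decoding and the encoding
side; encoding `edited` with the default strict handler would fail on
the lone surrogate.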

-- 
David Kastrup
