guile-user

Re: guile can't find a chinese named file


From: Marko Rauhamaa
Subject: Re: guile can't find a chinese named file
Date: Thu, 16 Feb 2017 23:13:35 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

David Kastrup <address@hidden>:
> Eli Zaretskii <address@hidden> writes:
>> Yes, to be viable in real-life situations, Guile needs to support
>> character strings with occasional embedded raw bytes that cannot be
>> interpreted as characters.
>
> They can be interpreted as "characters", just not inside the _Unicode_
> character range. Raw bytes 0x00 to 0xff could be assigned character
> codes -256 to -1 (when decoding UTF-8, only "raw bytes" 0x80 to 0xff
> will occur since 0x00 to 0x7f is always represented as its own Unicode
> code point). That would make it easy to do a blanket check for invalid
> sequences.

Python uses the surrogate hole in the middle of the Unicode range (the
lone surrogates U+DC80 to U+DCFF) to represent such stray bytes, but
only at operating-system interfaces such as file names. Unlike Guile,
Python character strings permit surrogate code points for arbitrary
purposes.
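
To make the surrogate trick concrete, here is a minimal sketch using the
"surrogateescape" error handler, which is what Python applies to file
names (os.fsdecode and os.fsencode use it on POSIX systems). The byte
string and the variable names are made up for illustration.

```python
# A name that is not valid UTF-8: "a", a stray 0xE4 byte, "b".
raw_name = b"a\xe4b"

# Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
name = raw_name.decode("utf-8", errors="surrogateescape")
code_points = [hex(ord(c)) for c in name]

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate instead of emitting it.
try:
    name.encode("utf-8")
    strict_encode_fails = False
except UnicodeEncodeError:
    strict_encode_fails = True

# Surrogates are also accepted in ordinary strings, outside any file
# name handling.
arbitrary = "x" + chr(0xD800) + "y"
arbitrary_length = len(arbitrary)
```

Here name compares equal to "a\udce4b", and the round trip gives back
raw_name unchanged, so the stray byte survives a pass through a
character string.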

Internally, CPython (the principal implementation) has Latin-1, UCS-2
and UCS-4 strings to optimize memory use while maintaining fixed-width
character representation.
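
The storage widths can be observed indirectly from Python. The sketch
below assumes CPython 3.3 or later; the helper name bytes_per_char is
hypothetical, and sys.getsizeof results are an implementation detail
that other Python implementations need not reproduce.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate per-character storage for strings made only of ch.

    Comparing two strings of the same kind cancels the fixed object
    header, leaving n times the per-character width.
    """
    assert len(ch) == 1
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) // n

widths = {
    "ascii": bytes_per_char("a"),             # U+0061
    "latin-1": bytes_per_char("\xe9"),        # U+00E9
    "bmp": bytes_per_char("\u20ac"),          # U+20AC
    "astral": bytes_per_char("\U0001F600"),   # U+1F600
}
```

On CPython this is expected to report 1, 1, 2 and 4 bytes per character
respectively: the widest code point in a string selects the
representation used for the whole string.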


Marko


