guile-user

Re: Running script from directory with UTF-8 characters


From: Marko Rauhamaa
Subject: Re: Running script from directory with UTF-8 characters
Date: Thu, 24 Dec 2015 00:20:55 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

David Kastrup <address@hidden>:

> That's more economical than Python's method which uses the encodings
> of surrogate words not allowed in properly encoded UTF-8, taking
> 3 bytes rather than the 2 Emacs makes do with. Using high codepoints
> above the Unicode space would even take 4 bytes.

Actually, CPython represents strings internally even less
"economically": it uses one byte per code point if it can (Latin-1). If
it can't, it uses two bytes per code point (UCS-2). If it can't do even
that, it uses four bytes per code point (UCS-4). Thus, even a single
code point above 65535 will cause the whole string to be stored as
4-byte integers.
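
A rough way to observe this, assuming CPython 3.3 or later (other
Python implementations may store strings differently), is to compare
sys.getsizeof() for strings of different lengths. The helper name
bytes_per_code_point is made up for this illustration:

```python
import sys

def bytes_per_code_point(ch: str) -> int:
    """Estimate how many bytes CPython spends per code point on a
    string consisting only of repetitions of ch (hypothetical helper)."""
    # Subtracting the sizes of two strings of different lengths cancels
    # the fixed object header, leaving only the per-code-point cost.
    short = sys.getsizeof(ch * 1)
    longer = sys.getsizeof(ch * 1001)
    return (longer - short) // 1000

samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\u00e9",
    "BMP U+20AC": "\u20ac",
    "astral U+1F600": "\U0001F600",
}

for name, ch in samples.items():
    print(name, bytes_per_code_point(ch))

# One code point above 65535 widens every element of the string.
ascii_only = "a" * 1000
widened = ascii_only + "\U0001F600"
print(sys.getsizeof(ascii_only), sys.getsizeof(widened))
```

On such an interpreter I would expect the four samples to report 1, 1,
2 and 4 bytes respectively, and the widened string to be roughly four
times the size of the ASCII-only one. The exact totals depend on the
interpreter version, since the object header size varies.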


Marko


