Re: Running script from directory with UTF-8 characters

From: David Kastrup
Subject: Re: Running script from directory with UTF-8 characters
Date: Wed, 23 Dec 2015 23:25:15 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

Marko Rauhamaa <address@hidden> writes:

> David Kastrup <address@hidden>:
>> That's more economical than Python's method which uses the encodings
>> of surrogate words not allowed in properly encoded UTF-8, taking
>> 3 bytes rather than the 2 Emacs makes do with. Using high codepoints
>> above the Unicode space would even take 4 bytes.
> Actually, CPython represents strings internally even less
> "economically:" it uses single-byte strings if it can (Latin-1). If it
> can't, it uses all-two-byte strings (UCS-2). If it can't do even that,
> it uses all-four-byte strings (UCS-4). Thus, even a single code point
> above 65535 will cause the whole string to consist of 4-byte integers.
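
The storage scheme described in the quote above can be observed from Python itself. What follows is only a rough sketch, assuming CPython 3.3 or later (where the flexible string representation applies); the helper name per_char_width and the sample characters are chosen for illustration and are not from the thread. It measures how many bytes one additional character costs, depending on the widest code point in the string.

```python
import sys

def per_char_width(ch: str) -> int:
    """Bytes CPython spends per character in a string made only of `ch`.

    Measured as the size difference between two strings that differ
    by exactly one character. Relies on CPython's sys.getsizeof
    accounting, so this is implementation-specific.
    """
    return sys.getsizeof(ch * 1001) - sys.getsizeof(ch * 1000)

ascii_width = per_char_width("a")            # code point < 128
latin1_width = per_char_width("\xe9")        # code point < 256
bmp_width = per_char_width("\u20ac")         # code point < 65536
astral_width = per_char_width("\U0001F600")  # code point > 65535

# A single code point above 65535 widens every character in the string.
plain = "a" * 1000
widened_by = sys.getsizeof(plain + "\U0001F600") - sys.getsizeof(plain)

print(ascii_width, latin1_width, bmp_width, astral_width, widened_by)
```

The per-character widths should come out as 1, 1, 2 and 4 bytes, and appending one character above 65535 to a 1000-character ASCII string should grow it by roughly 3000 bytes rather than by 4.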

Maybe I confused Python and Perl here.  No idea.  But I'm pretty sure
about Emacs.
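
For reference, the 3-byte figure can be checked against Python's "surrogateescape" error handler, which maps each undecodable byte to a lone surrogate code point. This is a minimal sketch, assuming Python 3; the sample bytes are made up for illustration, and "surrogatepass" is used here only to make the UTF-8 form of the lone surrogate visible.

```python
# Bytes that are not valid UTF-8: 0xE9 is Latin-1 "e acute" with no
# continuation bytes following it.
raw = b"caf\xe9"

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")
escaped_char = text[-1]

# Strict UTF-8 refuses lone surrogates; "surrogatepass" emits the
# 3-byte sequence that would otherwise be disallowed.
escaped_bytes = escaped_char.encode("utf-8", errors="surrogatepass")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

print(hex(ord(escaped_char)), escaped_bytes, len(escaped_bytes), restored == raw)
```

The escaped byte occupies 3 bytes in that UTF-8-style form, and the round trip gives back the original byte string unchanged.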

David Kastrup
