emacs-devel

Re: Unibyte characters, strings, and buffers


From: David Kastrup
Subject: Re: Unibyte characters, strings, and buffers
Date: Fri, 28 Mar 2014 12:34:56 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)

Andreas Schwab <address@hidden> writes:

> David Kastrup <address@hidden> writes:
>
>> "Stephen J. Turnbull" <address@hidden> writes:
>>
>>> I agree that having a way to represent "undecodable bytes" in a string
>>> or buffer is extremely convenient.  XEmacs's lack of this capability
>>> is surely a deficiency (Hi, David K!)
>>
>> Doing this in an utf-8 based internal coding is somewhat doable by
>> employing non-utf-8 sequences.  Either using code points above the
>> Unicode code range (2^20 + something, requiring 4 bytes), or by using
>> non-minimal encodings (since the minimal ones are two bytes, requiring 3
>> bytes).  Either way, the size increases significantly.
>
> Emacs uses U3fff80-U3fffff for raw 8-bit bytes, internally represented
> by 2 bytes.

Well, I forgot the non-minimal encodings of 0x00-0x7f, namely two-byte
sequences whose first byte is 0xc0 or 0xc1 and whose second byte lies in
0x80-0xbf.

Those would still fit the representation invariants.  Are those the
two-byte encodings used for "raw 0x80 to 0xff"?
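
To make the arithmetic concrete, here is a minimal sketch in Python.  A
lead byte of 0xc0 or 0xc1 carries one payload bit and the trailing byte
carries six, so there are exactly 128 such sequences, which is just
enough room for the raw bytes 0x80-0xff.  The function names and the
"payload = byte - 0x80" mapping are assumptions made for illustration;
nothing in this thread confirms that Emacs uses this particular
assignment.

```python
def overlong_two_byte(payload):
    """Encode a 7-bit value as a non-minimal two-byte UTF-8-style sequence."""
    if not 0 <= payload <= 0x7F:
        raise ValueError("payload must fit in 7 bits")
    lead = 0xC0 | (payload >> 6)     # 0xC0 or 0xC1: the top payload bit
    trail = 0x80 | (payload & 0x3F)  # 0x80..0xBF: the low six payload bits
    return bytes([lead, trail])


def encode_raw_byte(b):
    """Hypothetical mapping: raw byte 0x80..0xFF -> overlong two-byte form."""
    if not 0x80 <= b <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF need this escape")
    return overlong_two_byte(b - 0x80)


def decode_raw_byte(seq):
    """Inverse of encode_raw_byte under the same assumed mapping."""
    lead, trail = seq
    if lead not in (0xC0, 0xC1) or not 0x80 <= trail <= 0xBF:
        raise ValueError("not an overlong two-byte sequence")
    return 0x80 + (((lead & 0x01) << 6) | (trail & 0x3F))


def is_strict_utf8(seq):
    """True if a strict UTF-8 decoder accepts the byte sequence."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Under this assumed mapping, 0x80 becomes c0 80 and 0xff becomes c1 bf.
A strict UTF-8 decoder rejects every one of these sequences, so they
cannot collide with the encoding of a real character.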

-- 
David Kastrup



