emacs-devel

Re: Unibyte characters, strings, and buffers


From: Eli Zaretskii
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 12:25:43 +0300

> From: David Kastrup <address@hidden>
> Cc: address@hidden
> Date: Sat, 29 Mar 2014 09:40:03 +0100
> 
> >> It means a buffer where each _character_ has the same value that the
> >> no-longer-available unibyte buffer would have in its bytes/characters.
> >
> > This doesn't seem to be a complete description of what is suggested.
> > E.g., just by looking at the values of characters, it is impossible to
> > distinguish between Latin characters below 256 and raw bytes.  In a
> > unibyte buffer, we know how to make that distinction,
> 
> Uh, what?  The point of a unibyte buffer is that it does not make the
> distinction.

Yes, it does: it treats every character as a raw byte.  So the dilemma
is resolved there by definition.  How to do that without unibyte
buffers remains to be defined; until it is, plans to remove unibyte
buffers are impractical.

> > but if there are no unibyte buffers, something else is needed for
> > doing that.
> 
> >> You can do that whether or not the conceptual array of 0..255 characters
> >> is internally encoded in unibyte or multibyte encodings.
> >
> > What do you mean by "multibyte encodings" in this context?  Are you
> > suggesting to store the bytes 128..255 as Latin-1 characters,
> > i.e. using the 2-byte UTF-8 sequences of the corresponding Latin
> > characters?
> 
> That would make the most sense, yes.

Then the above distinction is impossible, and all kinds of subtly
incorrect behaviors creep in.
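
To illustrate why the distinction is lost, here is a minimal sketch in
Python, used only as a stand-in: it is not Emacs code, and the variable
names are hypothetical.  It assumes that "storing a byte as a Latin-1
character" means mapping byte value N to code point N.

```python
# Illustration only: Python stands in for the buffer representation here.
raw_byte = bytes([0xE9])   # a raw byte taken from binary data
latin_char = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, entered as text

# Assumption: a raw byte N is stored as the character with code point N.
stored_raw = raw_byte.decode("latin-1")

# Both values now share one code point and one 2-byte UTF-8 sequence.
encoded_raw = stored_raw.encode("utf-8")
encoded_char = latin_char.encode("utf-8")

print(stored_raw == latin_char)
print(encoded_raw, encoded_char)
```

Once both are stored this way, nothing in the data itself says which
one started out as a raw byte.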

> > Or are you suggesting something else?
> 
> You could also use the "raw byte" character encodings we use for not
> losing information when reading not properly formed utf-8 files into a
> multibyte buffer, but that seems less practical when working with the
> character codes.

Why less practical?
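
For comparison, a rough analogy in Python (again not Emacs code): its
"surrogateescape" error handler maps undecodable bytes to reserved code
points, much as Emacs reserves separate character codes for raw 8-bit
bytes.  The helper is_raw_byte is hypothetical.  The sketch shows that
the distinction survives, at the cost of character codes that are no
longer in 0..255; it does not settle the question above.

```python
# Illustration only: surrogateescape is Python's mechanism, used as an analogy.
# The data holds a lone byte 0xE9 (invalid UTF-8), then valid UTF-8 for U+00E9.
data = b"caf\xe9 \xc3\xa9"

# Undecodable bytes become code points U+DC80..U+DCFF instead of being lost.
text = data.decode("utf-8", errors="surrogateescape")

def is_raw_byte(ch):
    """Hypothetical helper: True if ch stands for an escaped raw byte."""
    return 0xDC80 <= ord(ch) <= 0xDCFF

flags = [is_raw_byte(ch) for ch in text]

# Encoding with the same handler restores the original bytes exactly.
round_trip = text.encode("utf-8", errors="surrogateescape")

print(flags)
print(round_trip == data)
```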


