gnu-arch-users

Re: [Gnu-arch-users] [OT] Unicode meets Scheme strings draft


From: Florian Weimer
Subject: Re: [Gnu-arch-users] [OT] Unicode meets Scheme strings draft
Date: Sun, 25 Jan 2004 00:00:58 +0100
User-agent: Mutt/1.5.5.1+cvs20040105i

Tom Lord wrote:

>     > Let's take one step back a bit.  What is a "character" in the
>     > context of this thread (i.e. Pika)?
> 
> A unicode codepoint, plus buckybits.   

I don't think these buckybits are a good idea.  On almost any system,
the relationship between "byte", "Unicode codepoint", and "key sequence"
is non-trivial.  Gluing them together into a single type invites
confusion now and causes problems later.

Just decide how many ISO 10646 planes you want to support, and use the
appropriate number of bits (21 is fine: that covers all 17 planes,
U+0000 through U+10FFFF).  Use one additional bit to squeeze in 256
code positions that can represent the bytes of invalid UTF-8 input
data (so you have round-trip capability even for binary files
accidentally interpreted as UTF-8).
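
To make the round-trip idea concrete, here is a minimal sketch in
Python.  It is only an illustration under stated assumptions, not
anything from Pika: the names ESCAPE_BIT, decode_lossless and
encode_lossless are hypothetical, and placing the extra bit at
position 21 (directly above the 21-bit code point range) is just one
possible layout.  Valid UTF-8 sequences become ordinary code points;
every byte that cannot be decoded becomes ESCAPE_BIT | byte, so
encoding restores the original input exactly.

```python
# Assumption: the "additional bit" is bit 21, the first bit above the
# 21-bit code point range U+0000..U+10FFFF.
ESCAPE_BIT = 1 << 21


def decode_lossless(data):
    """Turn bytes into a list of integers: code points for valid UTF-8,
    ESCAPE_BIT | byte for every byte that is not part of a valid sequence."""
    units = []
    i = 0
    while i < len(data):
        # Try the shortest sequence first; the first length that decodes
        # strictly yields exactly one character.
        for length in (1, 2, 3, 4):
            chunk = data[i:i + length]
            try:
                text = chunk.decode("utf-8")
            except UnicodeDecodeError:
                continue
            units.append(ord(text))
            i += len(chunk)
            break
        else:
            # No valid sequence starts here: keep the raw byte in one of
            # the 256 extra code positions and move on by one byte.
            units.append(ESCAPE_BIT | data[i])
            i += 1
    return units


def encode_lossless(units):
    """Inverse of decode_lossless: escaped bytes are written back as-is,
    everything else is encoded as UTF-8."""
    out = bytearray()
    for unit in units:
        if unit & ESCAPE_BIT:
            out.append(unit & 0xFF)
        else:
            out += chr(unit).encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    # Valid text, two stray bytes, and a truncated three-byte sequence.
    raw = b"caf\xc3\xa9 \xff\xfe \xe2\x82"
    units = decode_lossless(raw)
    assert encode_lossless(units) == raw
    assert all(u < (1 << 22) for u in units)
```

Every value fits in 22 bits, and an escaped byte can never be confused
with a real code point, because no code point has bit 21 set.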



