gnu-arch-users

Re: [Gnu-arch-users] [OT] Unicode meets Scheme strings draft


From: David Allouche
Subject: Re: [Gnu-arch-users] [OT] Unicode meets Scheme strings draft
Date: Thu, 22 Jan 2004 16:21:16 +0100
User-agent: Mutt/1.5.5.1+cvs20040105i

Wow! I could not make it to the end of the document, but I have seen
something which I believe is a slight problem.

I apologize if my remarks and suggestions are addressed in a later part
of the document that I have not read.


On Wed, Jan 21, 2004 at 09:12:26PM -0800, Tom Lord wrote:
>   /=== Pika Design:
> 
>    Pika is of the "approximately 2^21 characters" variety.
> 
>    Specifically, the Pika CHAR? type will in effect be a _superset_ of
>    the set of Unicode codepoints.   Each 21-bit codepoint will
>    correspond to a Pika character.   For each such character, there 
>    will be (2^4-1) (15) additional related characters representing the
>    basic code point modified by a combination of any of four
>    "buckybits".

Each Pika char is 21 bits long and 4 bits are used for bucky bits, which
leaves 17 bits for the Unicode characters.

That is enough to represent all characters in the Unicode BMP (U+0000
to U+FFFF) and SMP (U+10000 to U+1FFFF), but that leaves out:

 * Supplementary Ideographic Plane (SIP): U+20000 -- U+2FFFF

 * Supplementary Special-purpose Plane (SSP): U+E0000 -- U+EFFFF

 * Whatever characters the Unicode people will want to stuff in the
   range from U+30000 to U+FFFFF in the future.

The reference is "Changes from Unicode Version 3.0 to Version 3.1" in
this document: http://www.unicode.org/versions/Unicode4.0.0/appD.pdf
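
As a rough check of the arithmetic above, here is a minimal Python sketch
(not part of Pika). The plane table covers only the four planes named in
this message, and the helper names `fits` and `planes_left_out` are made
up for illustration.

```python
# Illustrative only: the planes named in this message, with their first and
# last codepoints. The private-use planes 15 and 16 are deliberately omitted.
PLANES = {
    "BMP": (0x0000, 0xFFFF),
    "SMP": (0x10000, 0x1FFFF),
    "SIP": (0x20000, 0x2FFFF),
    "SSP": (0xE0000, 0xEFFFF),
}


def fits(last_codepoint, bits):
    """Return True if every value up to last_codepoint fits in `bits` bits."""
    return last_codepoint < (1 << bits)


def planes_left_out(bits):
    """List the planes from PLANES whose last codepoint needs more than `bits` bits."""
    return [name for name, (_, last) in PLANES.items() if not fits(last, bits)]


if __name__ == "__main__":
    print(planes_left_out(17))
```

With 17 bits, the planes reported as left out are the SIP and the SSP,
matching the list above.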

Why not use 20-bit codepoints? This is large enough for Unicode today
and in the foreseeable future.

>    and by applying buckybits (shift, meta, alt, hyper) an additional
>   15 characters can be formed giving the total set of 16 "A"
>   characters:

Also, the set of bucky bits seems a bit limited to me.

In TeXmacs, we represent keypress sequences as character strings, like
"C-x C-c", "M-<", or "S-return". I see no compelling reason why
"control" could not be a bucky bit. Though there is not (yet) any support
for the "super" modifier, it is a modifier someone used once, so
someone may want to use it again (that could be a use for the Menu key of
international keyboards).

Why not use control, shift, meta, alt, super, hyper bucky bits?

That would add up to 26 bits per character (20 bits for the codepoint
plus 6 modifier bits).


Of course, while we are at it, why not just use 24-bit codepoints and 8
modifiers? Even if the extra codepoints and bits are not given any
special significance in Pika, they could come in handy for private use
by applications. Anyway, 21-bit chars or 26-bit chars are probably
going to be represented internally as 32-bit values.
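
To make the 24 + 8 layout concrete, here is a minimal Python sketch of
one way to pack a codepoint and its modifiers into a single 32-bit word.
The bit assignment (codepoint in the low 24 bits, modifiers above it, in
the order listed) and the names `pack` and `unpack` are assumptions for
illustration only; nothing here reflects how Pika actually stores chars.

```python
CODEPOINT_BITS = 24
CODEPOINT_MASK = (1 << CODEPOINT_BITS) - 1

# Hypothetical assignment: one bit per modifier, starting just above the
# codepoint field. Two of the eight modifier bits stay unassigned.
MODIFIERS = ("control", "shift", "meta", "alt", "super", "hyper")
MOD_BIT = {name: 1 << (CODEPOINT_BITS + i) for i, name in enumerate(MODIFIERS)}


def pack(codepoint, mods=()):
    """Combine a codepoint and an iterable of modifier names into one integer."""
    if not 0 <= codepoint <= CODEPOINT_MASK:
        raise ValueError("codepoint does not fit in 24 bits")
    word = codepoint
    for name in mods:
        word |= MOD_BIT[name]
    return word


def unpack(word):
    """Split a packed word back into (codepoint, tuple of modifier names)."""
    codepoint = word & CODEPOINT_MASK
    mods = tuple(name for name in MODIFIERS if word & MOD_BIT[name])
    return codepoint, mods


if __name__ == "__main__":
    c_x = pack(ord("x"), ["control"])  # the "C-x" keypress
    print(hex(c_x), unpack(c_x))
```

Any packed value stays below 2^32, so it fits the 32-bit internal
representation mentioned above whichever modifier set is chosen.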

-- 
                                                            -- ddaa



