
Re: generation of gnu/java/locale/*.uni


From: Eric Blake
Subject: Re: generation of gnu/java/locale/*.uni
Date: Sun, 17 Feb 2002 00:46:19 -0700

Brian Jones wrote:
> 
> As I recall Unicode now requires more bits than a Java 'char' allows.
> I don't know that helps at all?  I don't really know what Sun's
> solution is.  It looks like we did update to unicode data 3.0, but I
> know our implementation fails many Mauve tests related to Character.

Unicode 3.1 introduced several characters that can only be reached
through the surrogate space, and the upcoming 3.2 adds even more.  These
characters require two 16-bit fields to represent them (the first in
\ud800 - \udbff, the second in \udc00 - \udfff).  Java does ignore these
for now: the 4-byte sequences of UTF-8 are illegal in class files (you
have to use a 6-byte sequence instead, one 3-byte sequence per
surrogate), and Java identifiers may not include surrogate characters.
Sun would need to add more methods to the API to use them, because the
point of surrogates is that two chars together have semantic meaning,
while one alone is an error.  For example, it is impossible to tell
whether \ud820 in isolation is part of a letter, number, or punctuation.
So for now, Sun's "solution" is to stall.  I did verify today that
JDK 1.4 is still on Unicode 3.0.0.
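
A minimal sketch of the surrogate arithmetic and the 6-byte class-file
encoding described above.  The class and method names (SurrogateSketch,
highSurrogate, lowSurrogate, combine, classFileUtf8) are hypothetical and
not part of Classpath or the JDK; the only library behaviour relied on is
that DataOutputStream.writeUTF emits the same modified UTF-8 that class
files use.  U+10400 is used as the sample because it was added in
Unicode 3.1.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SurrogateSketch {
    // First char of the pair: 0xD800 plus the top 10 bits of (codePoint - 0x10000).
    static char highSurrogate(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Second char of the pair: 0xDC00 plus the low 10 bits of (codePoint - 0x10000).
    static char lowSurrogate(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    // Inverse operation: rebuild the code point from a well-formed pair.
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Encode a string the way class files do (modified UTF-8).
    // writeUTF prefixes a 2-byte length, which is dropped here.
    static byte[] classFileUtf8(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
            byte[] all = bytes.toByteArray();
            byte[] body = new byte[all.length - 2];
            System.arraycopy(all, 2, body, 0, body.length);
            return body;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int codePoint = 0x10400; // sample character outside the 16-bit range
        char high = highSurrogate(codePoint);
        char low = lowSurrogate(codePoint);
        System.out.println("high = " + Integer.toHexString(high));
        System.out.println("low  = " + Integer.toHexString(low));
        System.out.println("back = " + Integer.toHexString(combine(high, low)));

        // Each surrogate becomes its own 3-byte sequence, 6 bytes in total.
        byte[] encoded = classFileUtf8("" + high + low);
        System.out.println("class-file bytes = " + encoded.length);

        // A lone surrogate carries no letter/number/punctuation category.
        System.out.println(Character.getType('\ud820') == Character.SURROGATE);
    }
}
```

The split into separate high and low helpers mirrors the point made
above: neither char means anything until it is combined with its partner.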

The implementation of Character that I just checked in to Classpath is
identical in behavior to Sun's (fortunately, testing every method on all
64k chars is not terribly time-consuming).  However, I could not run it
through Mauve, as I still have been unable to compile a free VM on
cygwin, and Sun's VM doesn't like me replacing core classes like
Character.  So if Character fails any tests in Mauve now, I would
suspect that the bugs are in Mauve.
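
To illustrate why an exhaustive pass over all 64k chars is cheap, here is
a rough sketch of such a loop.  It is not the comparison harness used
against Sun's implementation; the class name ExhaustiveCharCheck and both
methods are hypothetical, and the check shown is only an internal
consistency test (isLetterOrDigit against isLetter and isDigit) that
needs no second implementation to run.

```java
public class ExhaustiveCharCheck {
    // Count chars where isLetterOrDigit disagrees with isLetter || isDigit.
    // The loop variable is an int: a char counter would wrap around at
    // 0xFFFF and never terminate.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            boolean expected = Character.isLetter(c) || Character.isDigit(c);
            if (Character.isLetterOrDigit(c) != expected) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Count chars whose general category is SURROGATE (\ud800 - \udfff).
    static int countSurrogates() {
        int surrogates = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            if (Character.getType((char) i) == Character.SURROGATE) {
                surrogates++;
            }
        }
        return surrogates;
    }

    public static void main(String[] args) {
        System.out.println("mismatches = " + countMismatches());
        System.out.println("surrogates = " + countSurrogates());
    }
}
```

A real comparison would run the same loop once per method under each
library and diff the recorded results.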

-- 
This signature intentionally left boring.

Eric Blake             address@hidden
  BYU student, free software programmer
