gnu-arch-users

Re: [Gnu-arch-users] [semi-OT] Unicode / han unification (was Re: Spaces


From: David Brown
Subject: Re: [Gnu-arch-users] [semi-OT] Unicode / han unification (was Re: Spaces ...)
Date: Wed, 21 Jan 2004 17:35:35 -0800
User-agent: Mutt/1.5.4i

On Wed, Jan 21, 2004 at 05:20:29PM -0800, Tom Lord wrote:

> My understanding is that there are certain characters (in one sense of
> the word) which are common to Chinese, Japanese, and Korean.   There
> are, broadly speaking, four different styles of rendering these
> characters as glyphs -- two for Chinese (traditional and simplified),
> and one each for Japanese and Korean.   That is to say, there are four
> different ways of drawing these characters.
> 
> A single font can render each of these characters in a way such that all
> users will be able to recognize and read them.  Linguists would (so I
> hear) generally agree that, though they may be written in four
> different styles, these are each a single character.

Korean is a bit more "annoying", since Unicode provides several
different ways to encode a single glyph.  There are two encodings in
Unicode that use a sequence of several code points to represent a
single glyph.  So, for example, 'Han' could be three code points,
representing 'H', then 'A', then 'N'.  Then, following a complex set of
rules, this sequence can be boiled down to a single glyph for the
syllable 'Han'.  There is also a single code point just for the
syllable 'Han'.

All this means is that, especially for Korean, determining whether two
strings are equal is quite complex.
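
To make the 'Han' case concrete, here is a minimal sketch in Python.
It assumes the standard unicodedata module and compares strings after
normalizing both to NFC.  The helper name same_text and the choice of
NFC are mine, for illustration only.  It covers the conjoining-jamo
sequence and the precomposed syllable; other ways of writing the same
letters may not compare equal under NFC.

```python
import unicodedata

# The syllable 'Han' as one precomposed code point (U+D55C).
precomposed = "\ud55c"

# The same syllable as three conjoining jamo: 'H' (U+1112),
# 'A' (U+1161), 'N' (U+11AB).
decomposed = "\u1112\u1161\u11ab"


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code-point comparison treats the two spellings as different.
naive_equal = precomposed == decomposed

# After normalization they compare equal.
normalized_equal = same_text(precomposed, decomposed)
```

Here naive_equal is False and normalized_equal is True, so any code
that compares such strings has to agree on a normalization form first.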

Dave



