Re: [Gnu-arch-users] Re: [semi-OT] Unicode / han unification (was Re: Sp

gnu-arch-users

From:	Tom Lord
Subject:	Re: [Gnu-arch-users] Re: [semi-OT] Unicode / han unification (was Re: Spaces ...)
Date:	Wed, 21 Jan 2004 19:29:05 -0800 (PST)



    > From: Andrew Suffield <address@hidden>

    > On Wed, Jan 21, 2004 at 05:20:29PM -0800, Tom Lord wrote:

    >> My understanding is that there are certain characters (in one
    >> sense of the word) which are common to Chinese, Japanese, and
    >> Korean.  There are, broadly speaking, four different styles of
    >> rendering these characters as glyphs -- two for Chinese
    >> (traditional and simplified), and one each for Japanese and
    >> Korean.  That is to say, there are four different ways of
    >> drawing these characters.

    > It's not quite that simple - there are multiple, similar-looking
    > ways of writing the same character within the same language in
    > some cases. Usually it doesn't matter, but for some things it
    > does - names are a good example. For a person's given name,
    > writing the character differently is akin to spelling it
    > differently in English, and Han unification is akin to declaring
    > that from now on, all people with names like "Tom", "Thom", or
    > other derivatives of "Thomas" will henceforth be called "Thom".

I'm having trouble seeing the analogy.  As far as I can tell you are
comparing related words (names in this case), spelled in a phonetic
alphabet, all in one language -- to related words, written in an
ideographic script, from different languages or from different regions
where (roughly) the same language but different typography is used.
Can you try to explain it more precisely?

It is true, as far as I know, that Unicode does not include all
ideographs used for personal names but I don't think that that's the
issue you are talking about.


    > The Eastern countries are pretty serious about etiquette, and using
    > the wrong writing for somebody's name could easily tip the balance
    > between a contract going to you, or to your next competitor.

With due respect, and acknowledging that I take your meaning and that
you meant no harm, I think that that statement at least borders on
harmful cultural stereotyping.  We have no shortage of complex, sometimes
codified, and sometimes quite irrational rules of etiquette here in
the "Western countries".



    >> A single font can render each of these characters in a way such that all
    >> users will be able to recognize and read them.  Linguists would (so I
    >> hear) generally agree that, though they may be written in four
    >> different styles, these are each a single character.

    >> No single font can render each of these characters in a way that will
    >> seem "natural" to all users -- a single font can only make them
    >> legible.  For "natural" rendering, you would want to use one font for
    >> Japanese text, another for simplified Chinese, and so forth.

    > If you don't code them as the same character, then having a font that
    > uses the proper writing for them all is easy. Mozilla under X, for
    > example, does it pretty well so long as you don't use unicode and have
    > enough fonts installed - it'll pick a font that matches the character
    > set of the web page.

    > If you use unicode, there is no way to tell which font is the right
    > one to use. Sometimes the application is going to pick the wrong one,
    > and the result is an awful ugly mess. FroM an aEsthEtic pErspEctive, a
    > docuMEnt whErE soME of thE charactErs usE the ChinEse style and the
    > rEst usE the JapanEsE is fairly siMilar to a docuMEnt where randoM
    > charactErs have had their casE flippEd. You can parse it, but you
    > don't *want* to.

Why isn't that a problem to be solved with markup?


    > The unicode "solution" to this is for Chinese users to use Chinese
    > fonts, Japanese users to use Japanese fonts, and neither to interact
    > with the other, which quite neatly defeats the point of unicode.

I thought that the solution was to use a sub-optimal but readable font
where markup is unavailable (miles' "README test") and to use things
like markup elsewhere.

If I write about math or programming in ASCII, to be clear either I
use typographical conventions like `variable' or I rely on a markup
system to set "variable" in a distinctive font.

Why is CJK different?   (It's an open-minded question, not a
rhetorical one.)
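
To make the markup idea concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: the language tags,
the font names, and the helpers pick_font and layout are all
hypothetical, not anything taken from Unicode or from a real renderer.
It shows one unified code point being assigned a different font
depending on a language tag carried by markup, with a legible fallback
font when no tag is available (the "README test" case).

```python
import unicodedata

# Hypothetical font table: language tag -> font name. The names are
# placeholders for illustration, not real font recommendations.
FONT_BY_LANG = {
    "ja": "ExampleMincho",
    "zh-Hans": "ExampleSong",
    "zh-Hant": "ExampleMing",
    "ko": "ExampleBatang",
}

# Used when no markup is available: legible, but not "natural" to everyone.
FALLBACK_FONT = "ExampleUnifiedCJK"


def pick_font(lang=None):
    """Return the font for a language tag, or the fallback if untagged."""
    return FONT_BY_LANG.get(lang, FALLBACK_FONT)


def layout(runs):
    """Map (lang, text) runs to (font, text) pairs."""
    return [(pick_font(lang), text) for lang, text in runs]


# U+76F4 is a single unified code point, whatever language it appears in.
ch = "\u76f4"
print(unicodedata.name(ch))

# The same code point in three runs: two tagged by markup, one untagged.
tagged = [("ja", ch), ("zh-Hans", ch), (None, ch)]
for font, text in layout(tagged):
    print(font, hex(ord(text)))
```

The encoded text is identical in all three runs; only the tag that
travels with it changes which font is chosen.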


    >>     >> It isn't perfect and it certainly is not complete when you
    >>     >> consider all forms of writing humans have ever used, but it is
    >>     >> maintained, it works at least as well as anything else out there.

    >>     > Doesn't do that either, if you happen to be Chinese, Japanese, or
    >>     > Korean.

    >> As nearly as I can tell, opinions vary about that.  That is to say
    >> that there are some Chinese, Japanese, Koreans, and certainly plenty
    >> of others who would disagree with asuffield, here.

    > I don't think any rational CJK users would agree that unicode does
    > everything that the native character sets do. Some are willing to make
    > this tradeoff, and some are not (which is not the same thing). There
    > is a not insignificant amount of "I'm willing to put up with this, but
    > there is *NO WAY* my boss is going to accept it".


I'm not sure what you mean by that.

-t
