[Gnu-arch-users] [OT] Unicode vs. legacy character sets


From: Tom Lord
Subject: [Gnu-arch-users] [OT] Unicode vs. legacy character sets
Date: Tue, 3 Feb 2004 10:16:28 -0800 (PST)




    > From: Tom Lord <address@hidden>

    > I think we can assume with an escape clause.  Most people should
    > agree that Unicode is ample.  Many people will want to use a
    > character set with a lossy roundtrip to Unicode.  But as long
    > as arch is tolerant of private-use codepoints, this issue can
    > reasonably be "not our problem".

Just to be clear, by "not our problem" I only mean that the bulk of
the hacking work would not fall on the arch project -- though some of
it would.

It seems clear to me that Unicode is the future and that, for sanity's
sake, it makes the greatest sense for me (and most others) to focus
on writing clean Unicode software rather than writing "character set
neutral" software.

But controversy over Unicode seems to persist, mostly around the "Han
Unification" issue.

For application writers -- that issue should mostly be a non-issue.
In principle, Unicode could be extended in an upward-compatible way
that "undoes" the unification.   It might very well be a dumb idea to do
so -- it might very well not be.   But either way -- for an application
programmer writing anything but the most linguistically sensitive code
-- it's a non-issue.   Both answers fit equally well within the
"logical structure" of Unicode.   Very little good Unicode software
should care one way or the other.

For libhackerlab, Pika Scheme, and now it seems arch -- I'm interested
in writing good Unicode software.

So what I am (tentatively) willing to do is this: if there are enough
programmers who both (a) want to help with my software and (b) are
against unification -- I'm willing to have libhackerlab (hence Pika
and arch) use an _extended_ Unicode.  That is, those libraries and
programs would standardize, by convention, on assigning some
private-use codepoints to un-unified characters.
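
To make the idea concrete, here is a rough sketch (in Python, for
brevity) of what a by-convention assignment might look like.  The
private-use ranges are the ones the Unicode Standard reserves; everything
else is an assumption made up for illustration: the table UNUNIFIED, the
"ja"/"zh" tags, the choice of U+76F4 as the example character, and the
particular private-use values are hypothetical, not an existing
libhackerlab or arch convention.

```python
# Private Use Area ranges reserved by the Unicode Standard.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(cp):
    """Return True if the integer code point lies in a private-use range."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


# HYPOTHETICAL convention table, for illustration only:
# (unified code point, variant tag) -> private-use code point.
UNUNIFIED = {
    (0x76F4, "ja"): 0xF0000,
    (0x76F4, "zh"): 0xF0001,
}

# Reverse table: private-use code point -> standard unified code point.
REUNIFY = {pua: unified for (unified, _tag), pua in UNUNIFIED.items()}


def to_extended(text, tag):
    """Replace unified characters with their un-unified private-use form."""
    return "".join(chr(UNUNIFIED.get((ord(ch), tag), ord(ch))) for ch in text)


def to_standard(text):
    """Fold un-unified private-use characters back to standard Unicode."""
    return "".join(chr(REUNIFY.get(ord(ch), ord(ch))) for ch in text)
```

Under these assumptions, text in the extended form is still a valid
sequence of Unicode code points, so it survives an ordinary UTF-8
round trip, and to_standard() gives a (lossy) fallback for software that
knows nothing about the convention.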

That wouldn't provide interoperability with everything in the world --
far from it.   For example, it would be (at best) a long time before
browsers would recognize the non-standard characters.  

But for many applications (such as arch) -- it would allow the tools
we write to be used in environments favoring non-unified character
sets.

Beyond that -- it could provide a practical demonstration (or
refutation) of the benefits of undoing the unification in Unicode.

-t



