Re: console translator set without encoding

From: Thomas Bushnell BSG
Subject: Re: console translator set without encoding
Date: 21 Jan 2005 19:31:13 -0800
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Marcus Brinkmann <marcus.brinkmann@ruhr-uni-bochum.de> writes:

> UTF-8 is an insanely complex standard, if you start to look down its
> depths.  

UTF-8 is a complex standard.  It is not insanely so.  It is complex
because it is representing a very complex problem.  

It is a standard computer programmer's disease to start talking about
how much easier the world would be if every language written in the
Latin alphabet had the same rules for capitalizing I, but they don't,
and it's the job of the computer to make both Turkish and French work.
Complaining that this is hard is crazy; good grief, it's certainly no
harder than a fancy VM system.

[The reference is that Turkish has two letters I, one with a dot and
one without, each in both upper and lower case.  In French, lowercase
I-with-dot capitalizes to capital I-without-dot.  But in Turkish,
lowercase I-with-dot maps to capital I-with-dot, and lowercase
I-without-dot maps to capital I-without-dot.]
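The bracketed point can be made concrete in Python, whose built-in
`str.upper()` and `str.lower()` apply Unicode's default,
language-independent case mappings and so follow the French-style rule.
The sketch below adds two hypothetical helpers, `turkish_upper` and
`turkish_lower` (names invented for illustration), that special-case
only the dotted/dotless I pair.  It is a minimal illustration under
that narrow assumption, not a complete Turkish casing implementation.

```python
# Hypothetical helpers illustrating Turkish-specific case mapping.
# They handle ONLY the dotted/dotless I pair; everything else falls
# through to Python's default (language-independent) Unicode mapping.

DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I


def turkish_upper(s: str) -> str:
    """Uppercase s using Turkish rules for the letter I."""
    # Map dotted lowercase i to dotted capital I before the default
    # mapping can turn it into a plain capital I.  Dotless lowercase i
    # already uppercases to plain capital I under the default mapping.
    return s.replace("i", DOTTED_CAPITAL_I).upper()


def turkish_lower(s: str) -> str:
    """Lowercase s using Turkish rules for the letter I."""
    # Plain capital I becomes dotless lowercase i, and dotted capital I
    # becomes plain lowercase i; both substitutions run before lower().
    return s.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i").lower()


if __name__ == "__main__":
    word = "istanbul"
    print(word.upper())               # default mapping: ISTANBUL
    print(turkish_upper(word))        # Turkish mapping: İSTANBUL
    print(turkish_lower("ISPARTA"))   # Turkish mapping: ısparta
```

The substitution has to happen before the generic call: once `lower()`
has turned a capital I into i, that i can no longer be told apart from
one that was lowercase to begin with.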

So faced with a long history of computer programmers doing just
enough to get by, pretending that language writing systems were
simpler than they really are, the Unicode designers laudably set the
goal of adapting to the world, rather than forcing the world to adapt
to the damn computer.


