Re: console translator set without encoding

bug-hurd

From:	Danilo Segan
Subject:	Re: console translator set without encoding
Date:	Sun, 23 Jan 2005 01:44:25 +0100
User-agent:	Gnus/5.11 (Gnus v5.11) Emacs/21.3.50 (gnu/linux)

Hi Marcus,

Yesterday at 5:56, Marcus Brinkmann wrote:

> At 21 Jan 2005 19:31:13 -0800,
> Thomas Bushnell BSG wrote:
>> 
>> Marcus Brinkmann <marcus.brinkmann@ruhr-uni-bochum.de> writes:
>> 
>> > UTF-8 is an insanely complex standard, if you start to look down its
>> > depths.  
>> 
>> UTF-8 is a complex standard.  It is not insanely so.  It is complex
>> because it is representing a very complex problem.  

Now, UTF-8 is an extremely simple standard, but Unicode is not so :)
Proper UTF-8 transformation functions usually take no more than a
couple of dozen lines, and that's including error checking :)
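
To give a sense of scale, here is a minimal Python sketch of the two
transformation functions, following the RFC 3629 rules.  The names
utf8_encode and utf8_decode are made up for illustration, and this is
not code from the Hurd console; it only shows that encoding, decoding
and the usual error checks (bad lead bytes, truncation, overlong
forms, surrogates) fit in a few dozen lines.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_decode(data: bytes, i: int = 0):
    """Decode one code point starting at data[i].

    Returns (code_point, index_of_next_byte); raises ValueError on
    malformed input.
    """
    b0 = data[i]
    if b0 < 0x80:
        return b0, i + 1
    # n = number of continuation bytes, lo = smallest value that
    # legitimately needs this sequence length (rejects overlong forms).
    if 0xC2 <= b0 <= 0xDF:
        n, cp, lo = 1, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        n, cp, lo = 2, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:
        n, cp, lo = 3, b0 & 0x07, 0x10000
    else:
        raise ValueError("invalid lead byte")
    if i + n >= len(data):
        raise ValueError("truncated sequence")
    for b in data[i + 1:i + 1 + n]:
        if (b & 0xC0) != 0x80:
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    if cp < lo or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("overlong, out of range, or surrogate")
    return cp, i + 1 + n
```

For example, utf8_encode(0x20AC) yields the three bytes E2 82 AC for
the euro sign, and utf8_decode turns them back into 0x20AC.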

Or I may be missing which UTF-8 standard you're talking about (RFC
something :).

> Oh, sure.  The insanity starts if you talk about using "UTF-8" for
> things like filenames without being very exact in what you mean by
> that.  The implications of putting the complex system UTF-8 into a
> POSIX-like operating systems as they exist today are not well
> understood, and the resulting lose ends, conflicts, etc are not
> resolved as of today.

POSIX has never defined "equivalences" between characters (such as
case-insensitivity), so I don't see what's so different about using
UTF-8 instead of ISO-8859-1 for filenames: after all, one can treat a
UTF-8 filename as a string of ISO-8859-1 bytes without any problem at
all, so from the POSIX point of view it all works; it just displays
as gibberish :)
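
A small Python sketch of that point, with a sample name picked purely
for illustration and a hypothetical helper bytes_are_posix_safe: in
UTF-8 every byte of a non-ASCII character has the high bit set, so a
multi-byte sequence can never contain NUL or '/', the only two bytes
POSIX treats specially in a filename component.  Reading the same
bytes as ISO-8859-1 loses nothing; it only looks wrong.

```python
def bytes_are_posix_safe(raw: bytes) -> bool:
    """True if raw contains neither NUL nor '/' (hypothetical helper)."""
    return b"\0" not in raw and b"/" not in raw


name = "музика"                 # sample non-ASCII directory name
raw = name.encode("utf-8")      # the bytes a filesystem would store

# Every byte belongs to a multi-byte sequence, so all have the high bit set.
high_bit_everywhere = all(b >= 0x80 for b in raw)

# An ISO-8859-1 reader sees gibberish, but the bytes survive unchanged.
as_latin1 = raw.decode("iso-8859-1")
round_trips = as_latin1.encode("iso-8859-1") == raw

print(bytes_are_posix_safe(raw), high_bit_everywhere, round_trips)
print(as_latin1)
```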

Using normalized forms would then simply be up to the writer and
reader, just as it is up to the writer and reader today to check for
all of "Music", "music", "mUSIC" and similar when a user actually
searches for his music directory.  Of course, going a step further and
doing this in libdiskfs or wherever is nice as well.
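
As a rough illustration of what "up to the writer and reader" could
mean, the Python sketch below compares two spellings of the same name
after normalizing both.  The helper same_name is hypothetical and is
not something libdiskfs provides; unicodedata.normalize is the
standard Python library call.

```python
import unicodedata


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "caf\u00e9"      # 'e with acute' as one code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# As raw UTF-8 bytes these are two different filenames...
different_bytes = composed.encode("utf-8") != decomposed.encode("utf-8")

# ...but they compare equal once both sides are normalized.
equal_when_normalized = same_name(composed, decomposed)

# Normalization does not fold case: that remains a separate decision.
case_still_differs = not same_name("Music", "music")
```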

Users expect to be able to use their own characters.  The character
set is only an implementation detail, and whoever cares about it is
not a regular user, but a technical computer user (most commonly a
programmer).  UTF-8 in that sense simplifies the implementation
instead of complicating it (as you seem to be suggesting), and it
also improves portability.

Of course, UTF-8 is no hammer for every nail, as you put it, but it's 
clearly an improvement over any 8-bit character set in the POSIX world.

Well, this is just my opinion at least :)

Cheers,
Danilo
