
Re: term, utf-8 and cooked mode, combining characters


From: Marcus Brinkmann
Subject: Re: term, utf-8 and cooked mode, combining characters
Date: Wed, 18 Sep 2002 15:37:14 +0200
User-agent: Mutt/1.4i

On Wed, Sep 18, 2002 at 03:25:41PM +0200, Niels Möller wrote:
> 
> > Nobody will use UTF-32 as their local encoding for the foreseeable
> > future, right?
> 
> I really don't know. Right now, utf8 seems almost as impractical as
> utf-32 to me, and I don't know how that will change when more programs
> pick up support for larger character sets.

I cannot imagine how you would use something on a Unix-like terminal that is
not backwards compatible with 7-bit ASCII.  A lot of interfaces would have to
change before that is remotely possible, not to mention all the applications:
the shell, the filesystem, etc.  A lot of things use \0 to mark the end of a
string.  Having three of those zero bytes in every ASCII-range character, as
UTF-32 does, is somewhat inconvenient.
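
To make the \0 point concrete, here is a tiny illustration (my own sketch,
not anything from term or the console code): in UTF-32LE the ASCII letter 'A'
is four bytes wide, three of them zero, so any NUL-terminated string handling
gives up almost immediately.

#include <stdio.h>
#include <string.h>

int
main (void)
{
  /* "AB" in UTF-32LE: each code point occupies four bytes.  */
  unsigned char utf32le[] = { 0x41, 0x00, 0x00, 0x00,   /* 'A' */
                              0x42, 0x00, 0x00, 0x00 }; /* 'B' */

  /* strlen stops at the first zero byte, i.e. after one byte.  */
  printf ("strlen sees %zu byte(s) of an 8-byte string\n",
          strlen ((const char *) utf32le));
  return 0;
}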

> > There is no advantage whatsoever in using the same encoding in the input
> > and output halves.  The two are completely separate.
> 
> They're the same program and the same binary, so at least it's less
> code bloat to add unicode support to the second half than to the
> first.
> 
> Using unicode somewhere in the input path seems necessary, and if you
> follow Roland's idea of putting more of term into the console server,
> then the console server seems to be the right place. If you don't do
> that, then I agree that the console need not know about it and can just
> pass the UTF-8 stream on to term.

I have no idea what you are talking about.  It seems to be related to the
thread, but you have to be much more precise.  What is this Unicode support
you are talking about, that my second half seems to be missing?

Here is the input half one more time, just to make sure that there isn't a
simple misunderstanding:

The console clients write UTF-8 encoded Unicode to the console server via
the console/NR/input node.  The console server converts this input stream to
the local encoding (the same one in which it receives characters for output).
This can be ISO-8859-1, UTF-8, or whatever, even UTF-32 if you want.  The
local encoding is taken from the --encoding option to the console server,
which also determines the output conversion.

The console server provides the locally encoded strings to the term server.
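
For illustration only, here is a rough sketch of that conversion step using
iconv.  The target encoding, the sample input and the buffer sizes are my own
assumptions; this is not the console server's actual code.

#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  /* UTF-8 input as a client would write it to the input node.  */
  char input[] = "Gr\xc3\xbc\xc3\x9f Gott";    /* "Grüß Gott" */
  char output[64];

  char *inptr = input;
  char *outptr = output;
  size_t inleft = strlen (input);
  size_t outleft = sizeof output - 1;

  /* The target encoding would come from the --encoding option.  */
  iconv_t cd = iconv_open ("ISO-8859-1", "UTF-8");
  if (cd == (iconv_t) -1)
    {
      perror ("iconv_open");
      return 1;
    }

  if (iconv (cd, &inptr, &inleft, &outptr, &outleft) == (size_t) -1)
    perror ("iconv");           /* e.g. EILSEQ for unconvertible input */

  *outptr = '\0';
  printf ("converted to %zu bytes\n", (size_t) (outptr - output));

  iconv_close (cd);
  return 0;
}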

> > I think all this legacy chinese/japanese/korean stuff, bu . Of
> > course we could just hard code UTF-8 support in term (that's what a
> > patch for the Linux kernel does), but that is kinda cheap. ;)
> 
> Special casing utf8 is a reasonable thing to do in almost all cases.

Maybe even in this case.  I guess we will find out.  But actually I consider
it simpler to just use the mb* functions than to roll my own.
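
As an illustration of what "just use the mb* functions" could look like,
here is a small sketch of mine (not term's code), assuming a UTF-8 locale
and, as on GNU systems, that wchar_t holds Unicode code points:

#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int
main (void)
{
  setlocale (LC_CTYPE, "");     /* honour the user's locale */

  const char *buf = "na\xc3\xafve";   /* "naïve" if the locale is UTF-8 */
  size_t len = strlen (buf);
  mbstate_t state;
  memset (&state, 0, sizeof state);

  /* Decode the locale-encoded byte stream one character at a time,
     without hard-coding any knowledge of UTF-8.  */
  while (len > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, buf, len, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        break;                  /* invalid or incomplete sequence */
      if (n == 0)
        n = 1;                  /* embedded NUL */
      printf ("U+%04lX\n", (unsigned long) wc);
      buf += n;
      len -= n;
    }
  return 0;
}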

Thanks,
Marcus

-- 
`Rhubarb is no Egyptian god.' GNU      http://www.gnu.org    marcus@gnu.org
Marcus Brinkmann              The Hurd http://www.gnu.org/software/hurd/
Marcus.Brinkmann@ruhr-uni-bochum.de
http://www.marcus-brinkmann.de/



