bug-hurd
Re: term, utf-8 and cooked mode, combining characters


From: Marcus Brinkmann
Subject: Re: term, utf-8 and cooked mode, combining characters
Date: Wed, 18 Sep 2002 15:04:14 +0200
User-agent: Mutt/1.4i

On Wed, Sep 18, 2002 at 02:46:07PM +0200, Niels Möller wrote:
> Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de> writes:
> 
> > > For the multibyte issue, console already knows all about the characters.
> > > So it can naturally dtrt if the term functionality is built in via
> > > libtermserver.  That seems like the righter thing.
> > 
> > The console does not know about single characters written to term.  It
> > just converts the UTF-8 stream from the client via an iconv into the local
> > encoding, and outputs that.
> 
> Why is it utf8 from the client? I would have expected unicode to be
> the "native" language of the console (you already understand it for
> output)?

Because UTF-8 is trivially free of endian problems, and more likely what
the console is forwarding to term anyway.  Nobody will use UTF-32 as their
local encoding for the foreseeable future, right?
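To illustrate the endianness point, here is a small sketch (not taken from the console or term sources) that encodes one character three ways.  The choice of character is arbitrary.  UTF-8 has a single byte order, while UTF-32 exists in two byte-swapped variants:

```python
# Encode one non-ASCII character and compare the resulting bytes.
ch = "\u00f6"  # LATIN SMALL LETTER O WITH DIAERESIS, as in "Möller"

utf8 = ch.encode("utf-8")         # a byte sequence, no byte-order variants
utf32_le = ch.encode("utf-32-le")  # little-endian 32-bit code unit
utf32_be = ch.encode("utf-32-be")  # big-endian 32-bit code unit

print(utf8.hex())      # c3b6
print(utf32_le.hex())  # f6000000
print(utf32_be.hex())  # 000000f6
```

A reader of the UTF-32 stream has to know (or guess) which of the two layouts it is getting; a reader of the UTF-8 stream does not.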

We use UTF-32 in the output part, because that is organized as shared memory
with a screen matrix, so the fact that all characters have the same length
is an advantage (it's also what ncursesw uses and expects, for example, and
of course it is the internal wide character format of glibc).
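A rough sketch of why fixed-width cells help with a screen matrix.  This is only an illustration: the 80x25 size, the flat buffer, and the names `cell_offset`, `put_char` and `get_char` are assumptions made here, and the real console's shared-memory layout is not described in this mail.  With one 32-bit code point per cell, the position of any cell is plain arithmetic:

```python
import struct

WIDTH, HEIGHT = 80, 25   # assumed screen size, for illustration only
CELL = 4                 # bytes per UTF-32 code unit

# Hypothetical flat buffer standing in for the shared-memory screen matrix.
screen = bytearray(WIDTH * HEIGHT * CELL)

def cell_offset(row, col):
    """Byte offset of a cell; constant time because every cell is CELL bytes."""
    return (row * WIDTH + col) * CELL

def put_char(row, col, ch):
    """Store one character as a native-endian 32-bit code point."""
    struct.pack_into("=I", screen, cell_offset(row, col), ord(ch))

def get_char(row, col):
    """Read back the character stored in a cell."""
    (code_point,) = struct.unpack_from("=I", screen, cell_offset(row, col))
    return chr(code_point)

put_char(3, 10, "\u00f6")   # 2 bytes in UTF-8
put_char(3, 11, "\u20ac")   # 3 bytes in UTF-8
```

With a variable-length encoding such as UTF-8 in the matrix, finding column 11 would mean scanning the row from its start, and overwriting a cell could change the length of the row.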

There is no advantage whatsoever in using the same encoding in the input and
output halves.  The two are completely separate.
 
> How many different multibyte charsets are there? I think it is a
> reasonable rule to assume that any random eightbit charset is a
> unibyte charset, i.e. one byte = one character = one glyph (modulo
> control characters). And then make utf8 an exception to that rule, and
> perhaps one could handle one or two additional charsets as
> exceptional.

I think all this legacy Chinese/Japanese/Korean stuff is multibyte, but [...].  Of course we could
just hard code UTF-8 support in term (that's what a patch for the Linux
kernel does), but that is kinda cheap. ;)  In particular as everything else
will support any encoding transparently.
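For concreteness, a minimal sketch of what hard-coded UTF-8 support could mean for the erase key in cooked mode.  The function name `erase_last_char_utf8` and the use of a byte buffer for the pending input line are assumptions for illustration; this is not the code from term or from the Linux patch, and it ignores combining characters and malformed input:

```python
def erase_last_char_utf8(line: bytearray) -> None:
    """Remove the last UTF-8 encoded character from the line, in place."""
    # Continuation bytes have the bit pattern 10xxxxxx; drop all trailing ones.
    while line and (line[-1] & 0xC0) == 0x80:
        line.pop()
    # Then drop the lead byte (or the single ASCII byte), if any is left.
    if line:
        line.pop()

buf = bytearray("a\u00f6\u20ac".encode("utf-8"))  # 1 + 2 + 3 bytes
erase_last_char_utf8(buf)   # removes the 3-byte character
erase_last_char_utf8(buf)   # removes the 2-byte character
```

The rule only works because UTF-8 marks continuation bytes.  A legacy multibyte charset needs different logic for the same operation, which is why a hard-coded approach does not extend to other encodings.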

Thanks,
Marcus

-- 
`Rhubarb is no Egyptian god.' GNU      http://www.gnu.org    marcus@gnu.org
Marcus Brinkmann              The Hurd http://www.gnu.org/software/hurd/
Marcus.Brinkmann@ruhr-uni-bochum.de
http://www.marcus-brinkmann.de/



