Re: improved console interface (was: Re: some console code checked in

From: Gaute B Strokkenes
Subject: Re: improved console interface (was: Re: some console code checked in
Date: Tue, 23 Apr 2002 05:27:57 -0400
User-agent: Gnus/5.090006 (Oort Gnus v0.06) Emacs/21.1 (i586-pc-linux-gnu)

On Tue, 23 Apr 2002, Marcus.Brinkmann@ruhr-uni-bochum.de wrote:
> On Mon, Apr 22, 2002 at 11:13:47PM -0400, Gaute B Strokkenes wrote:
>> Any Unicode character will fit in 21 bits, so you have plenty of
>> bits left over for attributes of various sorts.  Furthermore, the
>> Unicode standard guarantees that this will always be so.
> Is this also true for UCS-4?  Because that is what wchar_t is on GNU
> systems.  It would be nice if you could verify that.

Absolutely.  See, for instance,
<http://www.unicode.org/reports/tr19/>.  You will have to wade through
a lot of poorly worded standardese in order to convince yourself of
that, though.

It's also worth pointing out that UCS _is_ Unicode.  (Actually, UCS is
a term from ISO 10646, but there is no technical distinction between
that and Unicode; the distinction is merely bureaucratic nonsense.)

In versions of Unicode up to 3.0, the 16-bit UTF-16 was the One True
Unicode, but the situation has changed since then, so that now UTF-8,
UTF-16 and UTF-32 are coequal concrete representations of strings of
Unicode scalar values.  UCS-4 is ISO-ese for UTF-32, and UCS-2 is
similar to UTF-16, except that surrogates (the hack that allows you to
squeeze 21-bit characters into the 16-bit encoding UTF-16) are not
allowed, so that only BMP characters (that is, characters with USV <
0x10000) can be used.
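
To make the surrogate mechanism concrete, here is a minimal sketch in
Python of how a scalar value above the BMP is split into two 16-bit
units, and why UCS-2 cannot represent it.  The function names are
made up for illustration; only the arithmetic comes from the UTF-16
definition.

```python
# Sketch only: the helper names below are hypothetical.

def to_utf16_units(usv):
    """Return the UTF-16 code units (as ints) for one Unicode scalar value."""
    if not 0 <= usv <= 0x10FFFF or 0xD800 <= usv <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % usv)
    if usv < 0x10000:
        # BMP character: a single 16-bit unit, identical in UCS-2 and UTF-16.
        return [usv]
    # Supplementary character: subtract 0x10000 to get a 20-bit value,
    # then split it into a high and a low surrogate of 10 bits each.
    v = usv - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

def from_surrogates(high, low):
    """Recombine a surrogate pair into the scalar value it encodes."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def fits_ucs2(usv):
    """True if the scalar value can be stored in UCS-2 (BMP only)."""
    return usv < 0x10000 and not 0xD800 <= usv <= 0xDFFF
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) becomes the pair 0xD834
0xDD1E in UTF-16, while fits_ucs2(0x1D11E) is false.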

> I feel a bit uncomfortable with overloading the wchar this way,
> because it does not allow you to run iconv_t on the mapped memory,
> you have to copy it first.  But it might be an option.

Yes.  It is up to you to eventually decide whether it is easier to
keep the extra information together with or separate from the
character number.  I'm just pointing out that it is possible.  I
suppose it might be easiest to add an option to iconv to ignore the
extra bits.
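
As a rough sketch of the "keep it together" variant (in Python for
brevity; the real thing would be C on wchar_t or uint32_t cells): the
code point sits in the low 21 bits of a 32-bit cell and the remaining
11 bits carry attributes.  The layout, the names and the 11-bit
attribute field are assumptions of mine, not anything the console code
defines.  Stripping the attributes before conversion is just a mask.

```python
# Sketch only: the cell layout and all names here are hypothetical.

CODEPOINT_BITS = 21
CODEPOINT_MASK = (1 << CODEPOINT_BITS) - 1   # 0x1FFFFF
MAX_CODEPOINT = 0x10FFFF                     # largest Unicode code point
ATTR_BITS = 32 - CODEPOINT_BITS              # 11 bits left in a 32-bit cell

def pack_cell(codepoint, attrs):
    """Combine a code point and attribute bits into one 32-bit cell."""
    if not 0 <= codepoint <= MAX_CODEPOINT:
        raise ValueError("code point out of range")
    if not 0 <= attrs < (1 << ATTR_BITS):
        raise ValueError("attributes do not fit in %d bits" % ATTR_BITS)
    return (attrs << CODEPOINT_BITS) | codepoint

def unpack_cell(cell):
    """Split a cell back into (code point, attribute bits)."""
    return cell & CODEPOINT_MASK, cell >> CODEPOINT_BITS

def strip_attrs(cells):
    """Return plain code points, i.e. what a converter should be fed."""
    return [cell & CODEPOINT_MASK for cell in cells]
```

The cost Marcus mentions is visible in strip_attrs(): the masked copy
is a separate buffer, so the mapped memory cannot be handed to the
converter directly unless the converter itself ignores the high bits.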

If you want to design a console/terminal emulator in The Right
Way(tm), then you might want to have a look at the linux-utf8 and
i18n@xfree86.org mailing lists, where the topic of next-generation
terminal emulation and consoles is discussed from time to time.  It
would be good to be compatible with UTF-8 xterm, for instance.

Gaute Strokkenes                        http://www.srcf.ucam.org/~gs234/
Now KEN is having a MENTAL CRISIS because his "R.V." PAYMENTS are
