bug-hurd

Re: console update


From: Niels Möller
Subject: Re: console update
Date: 17 Jun 2002 12:58:12 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de> writes:

> I don't think curses has a choice here.  It needs to send those characters
> down to the terminal.

That's true, I guess, at least as long as the terminal uses unicode or
utf8. If the terminal only speaks latin1 (e.g.), one will lose one way
or the other no matter what curses does.

> As far as our console goes, it doesn't have a choice for similar
> reasons (the characters are passed down to a display client).

I'm thinking of clients that can display more or less arbitrary
bitmaps, like the vga client, X clients, and framebuffer clients.
I think those are the only clients that have a fighting chance of doing
anything resembling the Right Thing with unicode, even if you allow
only two or three combining characters.

> In fact, in our implementation we even have less of a choice, as we don't
> pass the data down in a stream, but provide a matrix of characters.

The idea that came to my mind was this: Each cell in the matrix contains
a precomposed unicode character (one single word) and attributes.

For each combination of base+combining characters that occurs and
that has no precomposed form in unicode, allocate one code word in
the private use area starting at 0xe000. Of course, one can't fit all
combinations, but one could use a cache of the 1000-4000 most recently
used combinations. I think it is good enough to have that cache be
global for the console server, but one could have one for each screen,
if really needed. I suspect that every language that is supported at
all in unicode can be represented with precomposed characters and a
small number of combinations, so a fairly small cache should work
fine.
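
To make the allocation scheme a bit more concrete, here is a minimal
sketch in Python. It is only an illustration under my own assumptions:
the class name CombinationCache, its methods and the free-list layout
are all hypothetical, and it uses NFC normalization from Python's
unicodedata module to decide whether a precomposed form exists.

```python
import unicodedata
from collections import OrderedDict

PUA_START = 0xE000  # first code point of the BMP private use area


class CombinationCache:
    """Hypothetical LRU table: combining sequence <-> private-use code."""

    def __init__(self, capacity=4000):
        self.by_seq = OrderedDict()  # sequence -> code, least recent first
        self.by_code = {}            # code -> sequence
        # Free codes, arranged so that pop() hands out PUA_START first.
        self.free = list(range(PUA_START + capacity - 1, PUA_START - 1, -1))

    def encode(self, seq):
        """Return one code word for a base + combining sequence."""
        composed = unicodedata.normalize("NFC", seq)
        if len(composed) == 1:
            return ord(composed)  # unicode already has a precomposed form
        if composed in self.by_seq:
            self.by_seq.move_to_end(composed)  # mark as recently used
            return self.by_seq[composed]
        if not self.free:
            # Cache is full: recycle the least recently used code.
            # Cells that still hold the old code must be fixed up by
            # the caller before the code is reused.
            _, old_code = self.by_seq.popitem(last=False)
            del self.by_code[old_code]
            self.free.append(old_code)
        code = self.free.pop()
        self.by_seq[composed] = code
        self.by_code[code] = composed
        return code

    def decode(self, code):
        """Return the unicode sequence that a code word stands for."""
        return self.by_code.get(code, chr(code))
```

With this sketch, "e" + U+0301 comes back as the ordinary code 0xe9,
while "q" + U+0301, which has no precomposed form, gets the first
free private-use code.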

(And if one thinks that the private use area is too small, or that it
shouldn't be abused in this way, the range 2^31 to 2^32-2 is totally
free. Only -1 (for EOF) to 2^31-1 is needed by ISO-10646).

A client needs to be able to look up what such a cooked-up code means.
It needs to get (i) the corresponding unicode character sequence
[needed if it's written to a utf8-capable terminal, or for
copy&paste], and (ii) a bitmap that can be used to draw the glyph. The
latter needs some cooperation between the console server and any font
server, I guess. Note that the extra indirection is used *only* for
glyphs that can't be represented as a single Unicode character. So it
sure adds complexity, but I don't think it will hurt performance, at
least not in the common case. And the table will change a lot less
often than the matrix.
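
As a rough illustration of the lookup on the client side, here is a
small Python sketch. The Synthetic record, the table layout and the
function row_to_text are hypothetical names of mine, and the bitmap
is just a tuple of row values standing in for whatever a font server
would really deliver.

```python
from dataclasses import dataclass

PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF  # BMP private use area


@dataclass
class Synthetic:
    """Hypothetical table entry for one cooked-up code."""
    sequence: str  # (i) the unicode character sequence
    bitmap: tuple  # (ii) glyph rows, one integer per scan line


def row_to_text(codes, table):
    """Turn one row of matrix code words back into unicode text."""
    out = []
    for code in codes:
        if PUA_FIRST <= code <= PUA_LAST and code in table:
            out.append(table[code].sequence)  # extra indirection
        else:
            out.append(chr(code))  # common case: no table lookup
    return "".join(out)
```

A client writing to a utf8 terminal, or filling a copy&paste buffer,
would then just encode the result, e.g. row_to_text(row,
table).encode("utf-8").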

If/when the cache overflows, occurrences of codes that must be garbage
collected can either be replaced with the unicode replacement
character, or (trickier) expanded to their component characters.
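
Both alternatives could look roughly like this in Python. This is a
sketch only: the function names are made up, the matrix is assumed to
be a list of rows of integer code words, and the expanding variant
simply drops whatever gets pushed past the right edge of the row,
which is one possible policy among several.

```python
REPLACEMENT = 0xFFFD  # unicode replacement character


def purge_code(matrix, dead_code):
    """Overwrite every cell holding dead_code with U+FFFD, in place.

    Returns the number of cells that were changed.
    """
    changed = 0
    for row in matrix:
        for i, code in enumerate(row):
            if code == dead_code:
                row[i] = REPLACEMENT
                changed += 1
    return changed


def expand_code(row, dead_code, sequence):
    """Return a new row with dead_code expanded to its components.

    The row keeps its width, so cells shifted past the right edge
    are lost (an assumption of this sketch).
    """
    out = []
    for code in row:
        if code == dead_code:
            out.extend(ord(ch) for ch in sequence)
        else:
            out.append(code)
    return out[:len(row)]
```

The second function shows why expansion is the trickier choice: every
cell to the right of the expanded one moves.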

IIRC, you've already been thinking along those lines for the vga
client; I think you mentioned putting glyph bitmaps into the vga card
dynamically, depending on which glyphs were actually used on the screen.

I think one would also want an option to *not* combine characters at
all, and give each unicode character, base or combining, its own
individual cell in the matrix. One might also want to have
configurable glyph boundaries, as described in section 5.15 of the
Unicode Standard version 3, "Locating Text Element Boundaries",
"Grapheme boundaries".

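The combine/don't-combine option could be sketched like this in
Python. The function split_cells and its combine flag are hypothetical,
and the boundary rule is a crude approximation (attach every character
with a non-zero canonical combining class to the preceding cell), not
the full grapheme boundary rules from the Unicode Standard.

```python
import unicodedata


def split_cells(text, combine=True):
    """Split text into the strings that would each occupy one cell."""
    cells = []
    for ch in text:
        if combine and cells and unicodedata.combining(ch) != 0:
            cells[-1] += ch  # attach the mark to the preceding base
        else:
            cells.append(ch)  # start a new cell
    return cells
```

With combine=True, "e" + U+0301 + "a" gives two cells; with
combine=False it gives three, one per unicode character.
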
Regards,
/Niels


