gnu-arch-users

Re: [Gnu-arch-users] [OT] Unicode meets Scheme strings draft


From: David Allouche
Subject: Re: [Gnu-arch-users] [OT] Unicode meets Scheme strings draft
Date: Fri, 23 Jan 2004 16:30:58 +0100
User-agent: Mutt/1.5.5.1+cvs20040105i

On Thu, Jan 22, 2004 at 09:39:46PM -0800, Junio C Hamano wrote:
> TL> Smiley acknowledged but, you know -- that's a pretty good reason to
> TL> have more buckybits than you expect to use on a normal
> TL> keyboard.
> 
> Let's take one step back a bit.  What is a "character" in the
> context of this thread (i.e. Pika)?

My guess (informed by the dont-hurt-your-fingers joke command) is that
this character model comes from elisp:

http://www.zvon.org/other/elisp/Output/SEC20.html#SEC26

The reason for such a complicated character model is that experience
shows it is generally convenient to be able to represent sequences of
keypresses as sequences of characters.
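
To make the idea concrete, here is a minimal Python sketch of an
Emacs-Lisp-style character: a code point with modifier ("bucky") bits
OR-ed into the bits above the code point range, so that a key sequence
is just a list of integers. The bit values follow the ones the Emacs
Lisp manual gives for keyboard events; nothing here is taken from Pika,
and the helper names (make_char, base_of, modifiers_of) are made up for
illustration.

```python
# Sketch of an Emacs-Lisp-style character: a code point plus modifier bits.
# Bit values mirror the Emacs Lisp convention; Pika may well differ.
ALT = 1 << 22
SUPER = 1 << 23
HYPER = 1 << 24
SHIFT = 1 << 25
CONTROL = 1 << 26
META = 1 << 27

MODIFIER_MASK = ALT | SUPER | HYPER | SHIFT | CONTROL | META


def make_char(base: str, *modifiers: int) -> int:
    """Combine a single base character with zero or more modifier bits."""
    code = ord(base)
    for m in modifiers:
        code |= m
    return code


def base_of(ch: int) -> str:
    """Strip the modifier bits and return the base character."""
    return chr(ch & ~MODIFIER_MASK)


def modifiers_of(ch: int) -> int:
    """Return only the modifier bits of a character."""
    return ch & MODIFIER_MASK


# A sequence of keypresses is then simply a sequence of such characters.
keys = [make_char("x", CONTROL), make_char("s", CONTROL)]  # C-x C-s
```

Note that S-a and A come out as different integers, which is exactly
why the model can tell them apart.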

For example, in texmacs, we use a surprisingly complex "keyboard
wildcard" system (a very bad name, though) to make sequences like "esc
esc esc a", "F5 a" and "H-a" equivalent.
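
A sketch of that kind of equivalence, assuming keys arrive as symbolic
names: a hypothetical prefix table (PREFIX_MODIFIERS) rewrites "esc esc
esc" or "F5" followed by a key into the same hyper-modified key. This is
not how texmacs actually implements its keyboard wildcards; the table
contents and the normalize function are invented for illustration.

```python
# Hypothetical prefix table: each key sequence acts as a one-shot modifier
# for the key that follows it. Not texmacs's real configuration.
PREFIX_MODIFIERS = {
    ("esc", "esc", "esc"): "H",
    ("F5",): "H",
}


def normalize(keys: list[str]) -> list[str]:
    """Rewrite prefix sequences into modifier-prefixed keys, e.g. F5 a -> H-a."""
    out: list[str] = []
    i = 0
    # Try longer prefixes first so "esc esc esc" is matched as a whole.
    prefixes = sorted(PREFIX_MODIFIERS, key=len, reverse=True)
    while i < len(keys):
        for prefix in prefixes:
            n = len(prefix)
            # Only rewrite when a key actually follows the prefix.
            if tuple(keys[i:i + n]) == prefix and i + n < len(keys):
                out.append(f"{PREFIX_MODIFIERS[prefix]}-{keys[i + n]}")
                i += n + 1
                break
        else:
            # No prefix matched here: pass the key through unchanged.
            out.append(keys[i])
            i += 1
    return out
```

With this table, normalize(["esc", "esc", "esc", "a"]),
normalize(["F5", "a"]) and normalize(["H-a"]) all yield ["H-a"].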

> I do not understand why you would want *any* bucky bit.  For
> example, what does the bucky bit "Shift" really mean?  How would
> a character #\S-A be different from #\S-a?  Is #\S-a really the
> character "a" with bucky bit "Shift" on?  Or is it simply a
> character "A", which is different from the character "a"?

This is a thorny issue, and I think the answer you are looking for does
not lie in Pika itself but in the way applications handle keypresses.
My experience is mostly with GUI programming, so I might be missing
part of the problem (keyboard handling in console applications is a
very different matter), but I think the keyboard handling process can
generally be represented like this:

  input      system      application --> text input
  device --> input  -->  input layer
             layer                  `--> command input
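
The application input layer in the diagram can be sketched as a routing
function. Key and route are hypothetical names, and the rule used here
(a bare printable character is text input, everything else is a command)
is only one plausible policy, not something Pika prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """A symbolic key as delivered by the system input layer (hypothetical)."""
    name: str                                # e.g. "a", "A", "Return", "F5"
    modifiers: frozenset[str] = frozenset()  # e.g. {"C", "M"}


def route(key: Key) -> tuple[str, str]:
    """Decide whether a key goes to text input or to command input."""
    # A bare printable character is inserted as text.
    if not key.modifiers and len(key.name) == 1 and key.name.isprintable():
        return ("text", key.name)
    # Anything else is looked up as a command, named in "C-M-x" style.
    prefix = "".join(f"{m}-" for m in sorted(key.modifiers))
    return ("command", prefix + key.name)
```

For instance, route(Key("a")) goes to text input, while
route(Key("x", frozenset({"M", "C"}))) becomes the command "C-M-x".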


The system input layer can convert input device data, typically
containing a keycode and a set of active modifiers, to symbolic keys.
This mapping is affected by the system settings (keymap) and
application mappings.

Somewhere in the chain it is decided whether "key 42 with shift and alt"
represents "S-A-a", "A-A", or "AE". The specifics are a bit unclear in
my memory, but the important thing to remember is that some
communication with the system (with Xlib, that's XLookupString) and some
"reasonable expectations" (like considering Shift as a modifier only for
control keys like ENTER) make it possible to find out a "base
character" and which modifiers were consumed in its production. One
could call this process "keypress canonicalization".
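
A minimal sketch of keypress canonicalization, assuming a toy keymap in
place of the real system keymap that XLookupString would consult.
Keycode 42 reuses the example above; KEYMAP, CONTROL_KEYS and
canonicalize are hypothetical. The lookup picks the keymap entry that
consumes the most active modifiers, and whatever is left over becomes
the bucky bits of the resulting character.

```python
# Toy keymap standing in for the system keymap:
# (keycode, modifiers consumed by the lookup) -> symbol. Entries are made up.
KEYMAP = {
    (42, frozenset()): "a",
    (42, frozenset({"Shift"})): "A",
    (36, frozenset()): "Return",
}

# Keys on which an unconsumed Shift still counts as a modifier.
CONTROL_KEYS = {"Return", "Tab", "BackSpace"}


def canonicalize(keycode: int, active: set[str]) -> tuple[str, frozenset[str]]:
    """Map a raw (keycode, modifiers) event to (base symbol, leftover modifiers)."""
    active_set = frozenset(active)
    # Keymap entries for this keycode whose modifiers are all active.
    matches = [(mods, sym) for (code, mods), sym in KEYMAP.items()
               if code == keycode and mods <= active_set]
    if not matches:
        raise KeyError(f"no keymap entry for keycode {keycode}")
    # Prefer the entry that consumes the most modifiers.
    consumed, base = max(matches, key=lambda m: len(m[0]))
    leftover = active_set - consumed
    # "Reasonable expectation": Shift only survives on control keys.
    if base not in CONTROL_KEYS:
        leftover -= {"Shift"}
    return base, leftover
```

Under this toy keymap, key 42 with Shift and Alt comes out as "A" with
Alt left over, that is "A-A", while Shift+Return keeps its Shift. A
keymap with an extra Shift+Alt entry for key 42 would change the answer,
which is the point: the decision belongs to the input layer.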

Pika characters only come into action _after_ this conversion.

> Which one of the following is true: (eq? #\S-a #\A), (eqv? #\S-a #\A)
> or (equal? #\S-a #\A)?

None. They are different characters entirely.

> For that matter, what is the difference between #\C-a and #\C-A?
> How about #\U+0001?

Same thing: they are different characters. Input canonicalization is
done in the input layer. In texmacs, #\C-A is the canonical form of
#\C-S-a, and #\U+0001 is probably not something you can type on your
keyboard. In a console application, #\U+0001 may be the canonical form
of #\C-a, and #\C-A may or may not be inputtable.
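
To illustrate that canonicalization is an application policy rather
than a property of the characters, here are two hypothetical policies
side by side. Characters are modeled as (base, modifiers) pairs, so
#\C-a, #\C-A and #\U+0001 are three unequal values; the function names
and the exact rules are assumptions made for this sketch.

```python
# A character is a (base, modifiers) pair; equality compares both parts.
Char = tuple[str, frozenset[str]]


def texmacs_like(base: str, mods: set[str]) -> Char:
    """GUI-style policy (assumed): fold Shift into the letter, C-S-a -> C-A."""
    mods = set(mods)
    if "S" in mods and base.isalpha():
        mods.discard("S")
        base = base.upper()
    return (base, frozenset(mods))


def console_like(base: str, mods: set[str]) -> Char:
    """Terminal-style policy (assumed): Control on a letter gives a control code."""
    mods = set(mods)
    if "C" in mods and base.isalpha() and base.isascii():
        mods.discard("C")
        # Classic terminals usually send the same byte for C-a and C-S-a.
        mods.discard("S")
        base = chr(ord(base.upper()) - 0x40)  # "A" (0x41) -> U+0001
    return (base, frozenset(mods))
```

Under the first policy C-S-a becomes ("A", {"C"}), i.e. #\C-A; under
the second, both C-a and C-S-a become U+0001 and #\C-A cannot be
produced at all.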

-- 
                                                            -- ddaa