speechd-discuss

Speakup and UTF-8


From: Hynek Hanke
Subject: Speakup and UTF-8
Date: Mon Sep 4 09:59:45 2006

On Mon, Feb 09, 2004 at 12:33:40PM -0500, Kirk Reiser wrote:
> I apologize about the spk_chartab[] misunderstanding.  I was thinking
> about the character array so I misunderstood what you were referring
> to.  I would like to see that changed to reflect characters above 128
> correctly.  

Hi Kirk,

I don't think there is a common range across all the character sets.
Maybe there are ranges that we could say represent alphanumeric
characters, but we definitely can't distinguish small letters,
capital letters and numbers implicitly. In some character sets,
they are mixed together. But I really think there will be no need
for this once Speakup uses UTF-8 internally.

> As to your question about utf8 or unicode, speakup could handle that
> by modifying the call to speakup_con_write() in console.c line 2006 to
> pass the int instead of recasting to a char. 

Yes. I've checked this and it seems to be quite a simple way to input
data to Speakup in full UTF-8. Then all variables in Speakup that
represent characters would have to be changed from char to int, and
arrays from char* to int*. But this shouldn't be hard for anyone
who is familiar with the code.
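
To illustrate what "characters as int" means in practice, here is a
minimal sketch of turning a UTF-8 byte sequence into one int code
point. The name utf8_decode() is my own invention for this example
and is not an existing Speakup or kernel function. It is only a
sketch: it does not reject overlong encodings or surrogates.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of Speakup.
 * Decode one UTF-8 sequence starting at s, with at most n bytes available.
 * On success, store the code point in *cp and return the number of bytes
 * consumed (1 to 4). Return 0 on truncated or malformed input.
 * Simplification: overlong forms and surrogates are not rejected.
 */
static size_t utf8_decode(const unsigned char *s, size_t n, int *cp)
{
    size_t len, i;
    int c;

    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                  /* plain ASCII, one byte */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) { /* 110xxxxx: two bytes */
        len = 2;
        c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) { /* 1110xxxx: three bytes */
        len = 3;
        c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) { /* 11110xxx: four bytes */
        len = 4;
        c = s[0] & 0x07;
    } else {
        return 0;                       /* stray continuation or invalid lead byte */
    }

    if (len > n)
        return 0;                       /* sequence is cut off */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)      /* continuation bytes must be 10xxxxxx */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    *cp = c;
    return len;
}
```

With something like this, a buffer of int holds one element per
character regardless of how many bytes the character took in UTF-8,
which is why the char to int change matters.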

Also, the punctuation processing wouldn't work for characters above
standard ASCII, but I don't think this is a problem, since it should
only be a last resort for when the synthesizer doesn't support
punctuation itself. It's definitely not an obstacle for Festival
and other software synthesizers.

> The only thing I am
> reluctant about there: is there some way of handling it in speakup.c so
> as not to need to define an array of 65536 elements because of the
> space requirement?

I don't like this table at all. Why do you need it? There are some
control characters and special characters in some range of standard
ASCII. The others are just different letters and numbers. Why do you
need to know which letter is capital in Speakup? I think it would
be better to just pass the synthesizers commands like "capital
recognition on" and "capital recognition off". The information whether
the letter is capital or not is carried in the letter itself.

If there are synthesizers you want to support and they don't have
capital recognition, it would be better to emulate it in their
drivers instead. It should be fairly simple, but it will be restricted
only to some parts of UTF-8 that you have tables for. But this doesn't
matter, since the rest of the synthesizers will not be affected by this
limitation.
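
A rough sketch of what such driver-side emulation could look like,
assuming characters are already int code points. All names here
(cap_ranges, drv_is_capital, drv_mark_capitals) and the marker value
are hypothetical, and the table covers only ASCII and part of
Latin-1 on purpose, to show the limitation I mean.

```c
#include <stddef.h>

/* Inclusive range of code points that this driver treats as capitals. */
struct cap_range {
    int first;
    int last;
};

/* Deliberately small table: anything outside it is not recognized. */
static const struct cap_range cap_ranges[] = {
    { 0x0041, 0x005A },   /* A to Z */
    { 0x00C0, 0x00D6 },   /* Latin-1 capitals before the multiplication sign */
    { 0x00D8, 0x00DE },   /* Latin-1 capitals after the multiplication sign */
};

/* Return 1 if cp is a capital letter according to the driver's table. */
static int drv_is_capital(int cp)
{
    size_t i;

    for (i = 0; i < sizeof(cap_ranges) / sizeof(cap_ranges[0]); i++) {
        if (cp >= cap_ranges[i].first && cp <= cap_ranges[i].last)
            return 1;
    }
    return 0;
}

/*
 * Copy n code points from in to out, inserting marker before every
 * capital the table knows about. Stops early if out (outmax elements)
 * cannot hold the next character together with its marker.
 * Returns the number of elements written to out.
 */
static size_t drv_mark_capitals(const int *in, size_t n,
                                int *out, size_t outmax, int marker)
{
    size_t i, o = 0;

    for (i = 0; i < n; i++) {
        int cap = drv_is_capital(in[i]);
        size_t need = cap ? 2 : 1;

        if (o + need > outmax)
            break;
        if (cap)
            out[o++] = marker;
        out[o++] = in[i];
    }
    return o;
}
```

A capital outside the table (for example U+010C) simply passes
through unmarked, so only this one driver is limited and a
synthesizer that recognizes capitals itself never sees this code.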

Speakup is a screen reader, I think it should only pass the appropriate
text and set the right parameters on the synthesizers, not repeat
the functionality of the synthesizers themselves. I'm definitely not
against providing this option for the devices that don't have the
capability of handling the more advanced things themselves, but let's
not limit the better devices with that.

If you take the thing as: ``all devices are the same, all of them
are stupid'', then you will always get only the ``highest common
divisor''. I think Speakup could do better!

So I propose not to distinguish between characters on a general level,
and maybe to provide this capability in the drivers of the devices that
don't support it themselves (of course using some functions defined in
one place, not repeating the code). You can consider whether it's
worth the effort (I know little about hardware devices).

> Maybe we could define a two dimensional array with
> null lookups for sets which didn't exist or weren't necessary.  It'd be
> very trivial to pass in the entire 16 bit argument though.  It is
> after the conversion so the character is correct at that point and I
> strip off the high byte.

I'm sorry, but I don't quite understand the last two sentences here.
Could you please explain them in more detail?

Have a nice day,
Hynek

