speechd-discuss

[BUG] speechd-up fails to detect one-character messages


From: Alexander E . Patrakov
Subject: [BUG] speechd-up fails to detect one-character messages
Date: Mon, 14 Jan 2008 18:30:05 +0500

2008/1/14, Hynek Hanke <hanke at brailcom.org>:
> You are right. I've inverted the isprint test to !isspace() and set
> locale to "C".

The setlocale(LC_CTYPE, "C") call is completely unneeded, because this
is already the default until the application explicitly sets another
locale. In other words, the locale-related environment variables are
ignored until the application calls setlocale( LC_CTYPE, "").
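A minimal sketch of how one could check this: passing NULL as the second argument makes setlocale() report the current locale without changing it. The helper name ctype_locale_is_c is made up for illustration and is not part of speechd-up.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper (not from speechd-up): returns 1 if the current
 * LC_CTYPE locale is "C". setlocale() with a NULL locale argument only
 * queries the current setting; it does not modify it. */
int ctype_locale_is_c(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    return name != NULL && strcmp(name, "C") == 0;
}
```

Called before any setlocale(LC_CTYPE, "") call, this should return 1 regardless of what LANG or LC_ALL contain.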

> The control characters should already be parsed out at that place.

isspace() is still wrong, see below. And, does the kernel really
generate whitespace characters (according to the "C" locale) that are
not spaces, i.e., 0x09 (tab), 0x0a (newline), 0x0b (vertical tab),
0x0c (form feed), 0x0d (carriage return)? Where can I read the
complete speakup kernel-to-userspace protocol description?
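For reference, a small sketch that enumerates the ASCII bytes isspace() accepts. It assumes the program is still in the default "C" locale; the function name ascii_whitespace is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: stores every byte below 0x80 for which isspace()
 * returns true into out[] (which must hold at least 128 entries) and
 * returns how many were stored. Assumes the default "C" locale. */
size_t ascii_whitespace(unsigned char *out)
{
    size_t n = 0;
    for (int c = 0; c < 0x80; c++) {
        if (isspace(c))
            out[n++] = (unsigned char)c;
    }
    return n;
}
```

In the "C" locale this yields six bytes: 0x09 through 0x0d, plus 0x20.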

And still, single non-ASCII characters are not detected correctly.
Here is what happens.

1) speechd-up receives (from the kernel) some text in the encoding
specified with the "-c" option.

2) speechd-up converts the text to UTF-8, thus causing non-ASCII
characters to be represented by multibyte sequences.

3) speechd-up counts bytes for which isspace() returns false. However,
since at this stage non-ASCII characters are represented by multibyte
sequences, the count becomes greater than 1 (because each byte is
counted separately). Since at this stage you know that the text is in
UTF-8, you should count bytes less than 0x80 that are not spaces, and
also bytes in the 0xc2..0xf4 range (inclusive). This excludes bytes in
the 0x80..0xbf range, which are always "continuation" bytes (i.e., are
part of the same character as the previous byte in the stream).
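The counting rule from step 3 could be sketched roughly as follows. This is not the actual speechd-up code; the function name count_utf8_nonspace is invented, and the input is assumed to be a NUL-terminated string that has already been converted to UTF-8. For bytes below 0x80 it keeps the isspace() test; comparing against ' ' alone would also work if the kernel never sends other whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from speechd-up): counts the non-whitespace
 * characters in a NUL-terminated UTF-8 string by counting only the bytes
 * that can start a character. */
size_t count_utf8_nonspace(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        unsigned char b = (unsigned char)*s;

        if (b < 0x80) {
            /* ASCII: one byte is one character; skip whitespace. */
            if (!isspace(b))
                n++;
        } else if (b >= 0xc2 && b <= 0xf4) {
            /* Lead byte of a multibyte sequence: count it once. */
            n++;
        }
        /* 0x80..0xbf are continuation bytes and are not counted. */
    }
    return n;
}
```

With this rule, a single Cyrillic letter such as "\xd0\xb6" counts as one character instead of two, so the one-character check would see the value it expects.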

-- 
Alexander E. Patrakov

