speechd-discuss

[BUG] speechd-up fails to detect one-character messages


From: Alexander E. Patrakov
Subject: [BUG] speechd-up fails to detect one-character messages
Date: Mon, 14 Jan 2008 18:39:18 +0500

2008/1/14, Alexander E. Patrakov <patrakov at gmail.com>:
> 3) speechd-up counts bytes for which isspace() returns false. However,
> since at this stage non-ASCII characters are represented by multibyte
> sequences, a single non-ASCII character yields a count greater than 1
> (because each byte is counted separately). Since at this stage you
> know that the text is in
> UTF-8, you should count bytes less than 0x80 that are not spaces, and
> also bytes in the 0xc2..0xf4 range (inclusive). This excludes bytes in
> the 0x80..0xbf range, which are always "continuation" bytes (i.e., are
> part of the same character as the previous byte in the stream).
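A minimal C sketch of the counting rule in point 3, assuming the message is available as a byte buffer with a known length. The function name count_nonspace_chars is hypothetical and is not part of speechd-up:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from speechd-up): count the characters in a
 * UTF-8 buffer that are not whitespace. An ASCII byte (< 0x80) counts
 * when isspace() is false; a byte in 0xC2..0xF4 starts a multibyte
 * character and counts once. Continuation bytes (0x80..0xBF) and bytes
 * that never appear in valid UTF-8 (0xC0, 0xC1, 0xF5..0xFF) are not
 * counted. */
static size_t count_nonspace_chars(const char *buf, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];

        if (c < 0x80) {
            if (!isspace(c))
                count++;
        } else if (c >= 0xC2 && c <= 0xF4) {
            count++;
        }
    }
    return count;
}
```

With this rule, a message consisting of the single Cyrillic letter "ж" (bytes 0xD0 0xB6) yields a count of 1, whereas counting every byte separately yields 2.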

I also forgot to say that using "%c" in printf is _always_ wrong with
UTF-8 text, exactly because of multibyte characters. Instead,
speechd-up should send the complete multibyte character (i.e., all of
its bytes in one CHAR command) by making a string out of it (see below
for how to determine the number of bytes to copy) and printing it with
"%s".

First byte    Number of bytes to copy
0x00-0x7F     1 byte
0xC2-0xDF     2 bytes
0xE0-0xEF     3 bytes
0xF0-0xF4     4 bytes
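The table above translates directly into a lookup on the first byte. The following is a minimal C sketch under that assumption; the names utf8_char_len, utf8_take_char and print_chars are hypothetical, not existing speechd-up functions, and the sketch prints to stdout with "%s" rather than building a real CHAR command. Like the table, it only inspects the first byte and does not verify that the following bytes really are continuation bytes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of the UTF-8 character whose first byte
 * is `first`, following the table. Returns 0 for bytes that cannot
 * start a character (continuation bytes 0x80..0xBF and the invalid
 * 0xC0, 0xC1, 0xF5..0xFF). */
static size_t utf8_char_len(unsigned char first)
{
    if (first <= 0x7F) return 1;
    if (first >= 0xC2 && first <= 0xDF) return 2;
    if (first >= 0xE0 && first <= 0xEF) return 3;
    if (first >= 0xF0 && first <= 0xF4) return 4;
    return 0;
}

/* Hypothetical helper: copy the whole character starting at buf[0]
 * into out (room for 4 bytes plus NUL) as a NUL-terminated string.
 * Returns the number of bytes consumed, or 0 if the first byte is not
 * a valid lead byte or the buffer ends before the character does. */
static size_t utf8_take_char(const char *buf, size_t len, char out[5])
{
    size_t n;

    if (len == 0)
        return 0;
    n = utf8_char_len((unsigned char)buf[0]);
    if (n == 0 || n > len)
        return 0;
    memcpy(out, buf, n);
    out[n] = '\0';
    return n;
}

/* Print each character on its own line using "%s" instead of "%c",
 * so that a multibyte character is always emitted whole. A byte that
 * does not start a complete character is skipped. */
static void print_chars(const char *buf, size_t len)
{
    char ch[5];
    size_t i = 0;

    while (i < len) {
        size_t n = utf8_take_char(buf + i, len - i, ch);

        if (n == 0) {
            i++;  /* skip one byte and resynchronize */
            continue;
        }
        printf("%s\n", ch);
        i += n;
    }
}
```

In an actual fix, the string in ch would presumably be what goes into the single CHAR command, in place of a "%c" argument.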

-- 
Alexander E. Patrakov

