Re: uc_width and wcwidth optimization
From: Bruno Haible
Subject: Re: uc_width and wcwidth optimization
Date: Tue, 13 Dec 2011 11:32:53 +0100
User-agent: KMail/1.13.6 (Linux/2.6.37.6-0.5-desktop; KDE/4.6.0; x86_64; ; )
Hello,
Alexander V. Lukyanov wrote:
> Attached is the patch to optimize performance of wcwidth, uc_width and
> uc{8,16,32}_width functions.
>
> The optimization is caching of is_cjk_encoding() and using
> nl_langinfo(CODESET) before the complex locale_charset() to check if the
> charset has changed.
Thanks for the patch, but I cannot use it like this:
1) The uc_width change modifies the public API of libunistring.
You can introduce a new API in <uniwidth.h>, but changing the signature
of an existing function is impossible.
2) The wcwidth change is a good idea, but unfortunately it is not
multithread-safe. Different threads can have different locales, so a global
variable used as a cache will not always give correct results.
I'm attaching the benchmark program I'm experimenting with. So far, it seems
that locale_charset() is really slow, whereas the is_cjk stuff is not a big
speed problem.
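The attached benchmark is not reproduced here. As a rough illustration only, a timing harness for this kind of measurement could look like the sketch below. The names call_nl_langinfo and time_calls are hypothetical; to time locale_charset() instead, one would pass a similar wrapper around that function.

```c
#include <assert.h>
#include <langinfo.h>
#include <time.h>

/* Hypothetical wrapper so that different lookups share one signature. */
static const char *
call_nl_langinfo (void)
{
  return nl_langinfo (CODESET);
}

/* Returns the CPU time, in seconds, spent on `iterations` calls of fn. */
static double
time_calls (const char *(*fn) (void), long iterations)
{
  assert (fn != NULL && iterations > 0);
  /* The volatile sink keeps the compiler from discarding the results. */
  const char *volatile sink;
  clock_t start = clock ();
  for (long i = 0; i < iterations; i++)
    sink = fn ();
  clock_t end = clock ();
  (void) sink;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling time_calls (call_nl_langinfo, 10000000) and comparing against the same loop over another lookup gives a relative cost; absolute numbers depend on the libc and the machine.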
I would love to have locale_charset be either faster or use some thread-safe
cache. Do you have an idea how to implement this?
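One possible direction, sketched here under assumptions and not taken from the patch or from gnulib: keep the cache in thread-local storage and revalidate it with the cheap nl_langinfo(CODESET) check on every call. The struct charset_cache, slow_is_utf8 and cached_is_utf8 names are hypothetical, slow_is_utf8 merely stands in for the expensive lookup, and _Thread_local assumes a C11 compiler.

```c
#include <assert.h>
#include <langinfo.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-thread cache; not part of gnulib or libunistring. */
struct charset_cache
{
  char codeset[64];   /* codeset name seen on the last slow-path run */
  bool valid;         /* false until the first lookup in this thread */
  bool is_utf8;       /* cached answer for `codeset` */
};

/* One instance per thread, so threads with different locales
   cannot overwrite each other's cached answer. */
static _Thread_local struct charset_cache cache;

/* Stand-in for the expensive lookup (assumption: real code would
   go through locale_charset() here). */
static bool
slow_is_utf8 (const char *codeset)
{
  return strcmp (codeset, "UTF-8") == 0;
}

static bool
cached_is_utf8 (void)
{
  const char *codeset = nl_langinfo (CODESET);
  assert (codeset != NULL);
  size_t len = strlen (codeset);
  if (len >= sizeof cache.codeset)
    return slow_is_utf8 (codeset);   /* name too long to cache */
  if (!cache.valid || strcmp (cache.codeset, codeset) != 0)
    {
      /* First call in this thread, or the codeset changed: refresh. */
      memcpy (cache.codeset, codeset, len + 1);
      cache.is_utf8 = slow_is_utf8 (codeset);
      cache.valid = true;
    }
  return cache.is_utf8;
}
```

This still pays for one nl_langinfo call and one strcmp per lookup, and it relies on nl_langinfo reflecting the calling thread's locale. Codeset spellings also differ between platforms, which the plain strcmp against "UTF-8" does not account for.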
> Besides, uc_width is used in wcwidth for cjk encodings as designed.
- if (STREQ (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0 ,0))
+ if (cached_is_utf8_encoding || cached_is_cjk_encoding)
{
/* We assume that in a UTF-8 locale, a wide character is the same as a
Unicode character. */
- return uc_width (wc, encoding);
+ return uc_width (wc, cached_is_cjk_encoding);
}
This won't work portably: the comment says that only in UTF-8 locales do we
know that a wchar_t represents a Unicode character. In locales with encodings
such as EUC-JP or GB18030 you cannot assume anything about how libc has
defined the wchar_t values.
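As an illustration of the portability point, and not something proposed in this thread, a guard could rely on the standard C macro __STDC_ISO_10646__, which an implementation defines when wchar_t values are ISO 10646 code points in every locale. The helper name wchar_is_unicode is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: may a wchar_t be passed to a function that
   expects a Unicode code point, given the locale's encoding name?  */
static bool
wchar_is_unicode (const char *encoding)
{
  assert (encoding != NULL);
#ifdef __STDC_ISO_10646__
  /* The implementation promises Unicode wchar_t in all locales. */
  return true;
#else
  /* No such promise: only trust the UTF-8 case, as in the existing
     comment in the wcwidth code. */
  return strcmp (encoding, "UTF-8") == 0;
#endif
}
```

On a libc that does not define the macro, wchar_is_unicode ("EUC-JP") returns false, so the uc_width shortcut would be skipped for that locale.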
Bruno
--
In memoriam The victims of the Massacre of Margarita Belén
<http://en.wikipedia.org/wiki/Massacre_of_Margarita_Belén>
bench-wcwidth.c
Description: Text Data