bug-gnulib

Re: uc_width and wcwidth optimization


From: Bruno Haible
Subject: Re: uc_width and wcwidth optimization
Date: Tue, 13 Dec 2011 11:32:53 +0100
User-agent: KMail/1.13.6 (Linux/2.6.37.6-0.5-desktop; KDE/4.6.0; x86_64; ; )

Hello,

Alexander V. Lukyanov wrote:
> Attached is the patch to optimize performance of wcwidth, uc_width and
> uc{8,16,32}_width functions.
> 
> The optimization is caching of is_cjk_encoding() and using
> nl_langinfo(CODESET) before the complex locale_charset() to check if the
> charset has changed.

Thanks for the patch, but I cannot use it like this:
  1) The uc_width change modifies the public API of libunistring.
     You can introduce new API in <uniwidth.h>, but changing the signature
     of an existing function is impossible.
  2) The wcwidth change is a good idea, but unfortunately it is not
     multithread-safe. Different threads can have different locales, so a
     global variable used as a cache will not always yield correct results.

I'm attaching the benchmark program I'm experimenting with. So far, it seems
that locale_charset() is really slow, whereas the is_cjk stuff is not a big
speed problem.

I would love to have locale_charset be either faster or use some thread-safe
cache. Do you have an idea how to realize this?
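
To make the question concrete, here is a minimal sketch of one possible
direction, not a tested proposal. It assumes C11 _Thread_local is available
and that nl_langinfo (CODESET) reflects the calling thread's locale; whether
nl_langinfo itself may be called concurrently is platform-dependent. The
helper name cached_is_utf8_codeset is hypothetical, and the strcmp against
"UTF-8" merely stands in for the expensive classification that
locale_charset() would feed.

```c
#include <langinfo.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib: reports whether the calling
   thread's locale uses UTF-8, caching the answer per thread.  */
static int
cached_is_utf8_codeset (void)
{
  /* One cache slot per thread, so that threads which selected different
     locales do not overwrite each other's cached result.  */
  static _Thread_local char cached_codeset[64];
  static _Thread_local int cached_result;
  static _Thread_local int cache_valid;

  const char *codeset = nl_langinfo (CODESET);

  /* Codeset names too long for the slot bypass the cache entirely.  */
  if (strlen (codeset) >= sizeof cached_codeset)
    return strcmp (codeset, "UTF-8") == 0;

  /* Revalidate on every call: only a changed codeset name triggers the
     (here trivial, in reality expensive) classification again.  */
  if (!cache_valid || strcmp (codeset, cached_codeset) != 0)
    {
      strcpy (cached_codeset, codeset);
      cached_result = (strcmp (codeset, "UTF-8") == 0);
      cache_valid = 1;
    }
  return cached_result;
}
```

The cost per call is then one nl_langinfo lookup plus one string comparison,
and no state is shared between threads.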

> Besides, uc_width is used in wcwidth for cjk encodings as designed.

-  if (STREQ (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0 ,0))
+  if (cached_is_utf8_encoding || cached_is_cjk_encoding)
     {
       /* We assume that in a UTF-8 locale, a wide character is the same as a
          Unicode character.  */
-      return uc_width (wc, encoding);
+      return uc_width (wc, cached_is_cjk_encoding);
     }

This won't work portably: as the comment says, only in UTF-8 locales do we
know that a wchar_t represents a Unicode character. In locales with encodings
such as EUC-JP or GB18030 you cannot assume anything about how libc has
defined the wchar_t values.
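
As an illustration of that constraint, here is a small sketch of the decision
that has to stay in place. The enum and the function choose_width_source are
hypothetical names used only for this example; they are not gnulib or
libunistring API.

```c
#include <string.h>

/* Hypothetical names, for illustration only.  */
enum width_source
{
  USE_UNICODE_TABLE,   /* wchar_t can be treated as a Unicode character */
  USE_LIBC_WCWIDTH     /* wchar_t values are opaque; defer to libc */
};

/* Decide how to compute a width, given the locale's codeset name.
   Only UTF-8 lets us assume that a wchar_t is a Unicode character;
   for EUC-JP, GB18030 and the like, nothing can be assumed.  */
static enum width_source
choose_width_source (const char *codeset)
{
  if (strcmp (codeset, "UTF-8") == 0)
    return USE_UNICODE_TABLE;
  return USE_LIBC_WCWIDTH;
}
```

A caller would pass the current codeset name, for example
choose_width_source (nl_langinfo (CODESET)), and use a Unicode width table
only in the first case.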

Bruno
-- 
In memoriam The victims of the Massacre of Margarita Belén 
<http://en.wikipedia.org/wiki/Massacre_of_Margarita_Belén>

Attachment: bench-wcwidth.c
Description: Text Data

