Re: [PATCH] mbrtowc: work around glibc bug#19932

From: Paul Eggert
Subject: Re: [PATCH] mbrtowc: work around glibc bug#19932
Date: Sat, 9 Apr 2016 10:28:58 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

Bruno Haible wrote:
> The function hard_locale is quite slow, as it calls setlocale(),
> strdup(), and similar functions. rpl_mbrtowc is supposed to be fast,
> as it's called once on every character in a string. Can't you get
> away without the call to hard_locale?

I am concerned about performance there too. On glibc it is not so bad, since hard_locale calls only setlocale there. On my platform (x86-64, GCC 5.3.1, Fedora 23) calling hard_locale takes about 40 instructions total, including the setlocale call. This cost is paid only for encoding errors; still, it'd be nicer to get it down.

On platforms like Solaris the cost is zero, since Solaris already conforms to future POSIX and this is tested at compile-time.

I don't know about other C libraries, such as FreeBSD. It's possible that this implementation could be quite slow there, as you say.

If this turns into a problem with GNU grep, I plan to fix it by having grep cache the results of mbrtowc in unibyte locales. GNU grep is already doing that for other reasons in its DFA engine, and I would merely need to have it do that in all places where performance is important. So this Gnulib performance problem need not be addressed for 'grep'; only for other programs that use Gnulib mbrtowc.

Perhaps we could add to the mbrtowc and/or hard-locale module a way to do the hard-locale test once after calling setlocale, so that the mbrtowc workaround can simply reference a boolean variable (either a global variable with setlocale, or a thread-local variable with uselocale). That would bring the 40 instructions down to 1 on glibc. Sounds like a bit of a hassle, but it should be doable.
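A rough sketch of that caching idea, covering only the global-locale case (not uselocale). The names setlocale_and_cache, hard_locale_cached and cached_hard_locale are made up for illustration and are not part of Gnulib; the sketch also assumes that programs would call the wrapper instead of calling setlocale directly.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical cache: true if the current LC_CTYPE locale is "hard",
   i.e., neither "C" nor "POSIX".  Starts out false because a program
   starts in the "C" locale.  */
static bool cached_hard_locale;

/* Hypothetical wrapper around setlocale that refreshes the cache.
   Return true if setlocale succeeded.  The pointer returned by
   setlocale is not passed on, since the second call below may
   overwrite the string it points to.  */
static bool
setlocale_and_cache (int category, char const *locale)
{
  bool ok = setlocale (category, locale) != NULL;

  /* Ask for the name of the LC_CTYPE locale now in effect.  */
  char const *name = setlocale (LC_CTYPE, NULL);
  cached_hard_locale = (name != NULL
                        && strcmp (name, "C") != 0
                        && strcmp (name, "POSIX") != 0);
  return ok;
}

/* What the mbrtowc workaround would test on its error path:
   a single variable read instead of a setlocale call.  */
static inline bool
hard_locale_cached (void)
{
  return cached_hard_locale;
}
```

The hassle is visible here: any setlocale call that bypasses the wrapper leaves the cache stale, and a thread-local variant would be needed for programs that use uselocale.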

Other performance hacks that I considered were to make hard_locale an inline function, and to have it test MB_CUR_MAX > 1 before going to the bother of calling setlocale and strcmp. Undoubtedly there could be benchmarks where this sort of thing would be a win, though the converse might also be true.
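For concreteness, the MB_CUR_MAX hack might look something like the following. This is only a sketch: hard_locale_fast is a made-up name, it looks only at LC_CTYPE, and it assumes that the "C" and "POSIX" locales are single-byte, so that MB_CUR_MAX > 1 implies a hard locale. That holds on glibc but is an assumption elsewhere.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical inline variant of hard_locale for LC_CTYPE only.
   Return true if the current LC_CTYPE locale is neither "C" nor
   "POSIX".  */
static inline bool
hard_locale_fast (void)
{
  /* Cheap test first.  Assumption: the "C" and "POSIX" locales are
     single-byte, so a multibyte locale must be a hard locale.  */
  if (MB_CUR_MAX > 1)
    return true;

  /* Unibyte locale: it could be "C" or something like an ISO-8859-1
     locale, so fall back on comparing the locale name.  */
  char const *name = setlocale (LC_CTYPE, NULL);
  return (name != NULL
          && strcmp (name, "C") != 0
          && strcmp (name, "POSIX") != 0);
}
```

This helps only in multibyte locales; in unibyte locales it adds the MB_CUR_MAX test on top of the setlocale call, which is one way the converse could turn out to be true.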
