bug-gnulib

optimizing localcharset


From: Bruno Haible
Subject: optimizing localcharset
Date: Sun, 20 May 2018 13:38:06 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-119-generic; KDE/5.18.0; x86_64; ; )

Hi,

Since wcwidth() has reportedly become a bottleneck [1][2], and part of the
time spent in the gnulib wcwidth() replacement goes to localcharset(), let me
optimize localcharset().

Patch 1 removes support for Linux libc5 (obsolete since ca. 2001), glibc 2.0.x
(last used in Red Hat Linux 5.2, obsolete since ca. 2003 [3][4]) and
Mac OS X 10.2 (obsolete since 2003-2005 [5]).

Patch 2 adds a simple manual test, so that I can verify that the results are
as expected when making changes to the code.

Patch 3 removes the ability to specify the platform-dependent mapping in an
external file. This ability was useful up until ca. 2007. config.charset
has not changed for Unix platforms since 2010, so it is safe to assume
that the current mappings are nearly correct, i.e. not many people will need
to adjust them, and those who do can report it here or change the source code
locally.

At the same time, this patch introduces a binary search for the mapping
lookup.
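
For illustration, a binary search over a sorted, compiled-in mapping table
could look roughly like the sketch below. This is not the code from the
patch: the struct, the function name lookup_canonical() and the table entries
are hypothetical and only serve to show the technique.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a platform-specific codeset name and the
   canonical name it maps to.  */
struct alias_entry
{
  const char *alias;
  const char *canonical;
};

/* Illustrative entries only (not the real table).  The entries must be
   sorted by 'alias' in strcmp() order for the binary search to work.  */
static const struct alias_entry alias_table[] =
{
  { "646",        "ASCII" },
  { "ISO8859-1",  "ISO-8859-1" },
  { "ISO8859-15", "ISO-8859-15" },
  { "eucJP",      "EUC-JP" },
  { "utf8",       "UTF-8" }
};

/* Return the canonical name for CODESET, or CODESET itself if the table
   has no mapping for it.  */
static const char *
lookup_canonical (const char *codeset)
{
  size_t lo = 0;
  size_t hi = sizeof alias_table / sizeof alias_table[0];

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      int cmp = strcmp (alias_table[mid].alias, codeset);

      if (cmp == 0)
        return alias_table[mid].canonical;
      if (cmp < 0)
        lo = mid + 1;   /* the match, if any, lies above mid */
      else
        hi = mid;       /* the match, if any, lies below mid */
    }
  return codeset;
}
```

With this sketch, lookup_canonical ("utf8") yields "UTF-8", and a name that is
not in the table, such as "KOI8-R", is returned unchanged.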

The ultimate optimization of the tables would be through gperf, but this
comes with the cost of several extra files in the source code tree, and is
mostly relevant for old platforms only.

Patch 4 adds missing mappings that I found while testing on various platforms.

Patch 5 is a micro-optimization.

I also attempted to replace the binary search that works with strcmp()
with one that progresses one character at a time. This has the same overall
asymptotic complexity and uses ca. 20% fewer memory accesses, but it
is slower by a factor of 2. 'perf annotate' told me that this is because
strcmp() apparently has a fast implementation in glibc, whereas the
"one character at a time" algorithm uses plain x86_64 instructions
throughout. And apparently the cost of calling the function strcmp()
is negligible.
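
One possible reading of the "one character at a time" idea is sketched below.
It is my own reconstruction for illustration, not the code that was measured:
the names char_entry, char_table and lookup_by_char() and the table entries
are hypothetical. The search keeps a range of entries that agree with the key
on the first POS characters and narrows that range by looking only at the
character at position POS.

```c
#include <stddef.h>

/* Hypothetical table entry, as in a strcmp()-based table.  */
struct char_entry
{
  const char *alias;
  const char *canonical;
};

/* Illustrative entries only, sorted by 'alias' in strcmp() order.  */
static const struct char_entry char_table[] =
{
  { "646",        "ASCII" },
  { "ISO8859-1",  "ISO-8859-1" },
  { "ISO8859-15", "ISO-8859-15" },
  { "eucJP",      "EUC-JP" },
  { "utf8",       "UTF-8" }
};

/* Return the canonical name for KEY, or KEY itself if there is no mapping.
   Invariant: all entries in [lo, hi) agree with KEY on the first POS
   characters, so their characters at index POS are in ascending order.  */
static const char *
lookup_by_char (const char *key)
{
  size_t lo = 0;
  size_t hi = sizeof char_table / sizeof char_table[0];
  size_t pos;

  for (pos = 0; lo < hi; pos++)
    {
      unsigned char c = (unsigned char) key[pos];
      size_t a = lo;
      size_t b = hi;
      size_t first;

      /* Find the first entry whose character at POS is >= c.  */
      while (a < b)
        {
          size_t m = a + (b - a) / 2;
          if ((unsigned char) char_table[m].alias[pos] < c)
            a = m + 1;
          else
            b = m;
        }
      first = a;

      /* Find the first entry whose character at POS is > c.  */
      b = hi;
      while (a < b)
        {
          size_t m = a + (b - a) / 2;
          if ((unsigned char) char_table[m].alias[pos] <= c)
            a = m + 1;
          else
            b = m;
        }
      lo = first;
      hi = a;

      /* Once the terminating NUL has been matched, any remaining entry
         is an exact match.  */
      if (c == '\0')
        break;
    }
  return lo < hi ? char_table[lo].canonical : key;
}
```

Every comparison in this sketch is a single byte load and compare in plain C,
which is the kind of code that would not benefit from an optimized strcmp().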

Bruno

[1] https://lists.gnu.org/archive/html/bug-gnulib/2018-04/msg00059.html
[2] https://lists.gnu.org/archive/html/coreutils/2018-05/msg00013.html
[3] https://distrowatch.com/table.php?distribution=redhat
[4] https://en.wikipedia.org/wiki/Red_Hat_Linux
[5] https://en.wikipedia.org/wiki/Darwin_(operating_system)#Release_history

Attachment: 0001-localcharset-Remove-support-for-obsolete-platforms.patch
Description: Text Data

Attachment: 0002-localcharset-Add-a-manual-test.patch
Description: Text Data

Attachment: 0003-localcharset-Move-mapping-tables-into-the-code.patch
Description: Text Data

Attachment: 0004-localcharset-Map-the-locale-encodings-found-in-newer.patch
Description: Text Data

Attachment: 0005-localcharset-Optimize.patch
Description: Text Data

