bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

strstr, async-safety, and worst-case performance


From: Eric Blake
Subject: strstr, async-safety, and worst-case performance
Date: Fri, 04 Jan 2008 18:56:43 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

A recent bug report[1] against glibc complains that its current
implementation of strstr is O(m*n), with m the length of the needle, n the
length of the haystack (the report is actually against the
analagous, but non-standardized, memmem, but the issue applies to both
functions; additionally, other implementations suffer from the same
complexity problems as glibc).

[1] http://sourceware.org/bugzilla/show_bug.cgi?id=5514

In the evaluation of the bug, several issues were raised.  First, POSIX
does not require any of the mem* and str* functions to be async-signal
safe (obviously, strdup cannot be async-signal safe since it mallocs; but
most others return results computed without the use of any global memory,
and can easily be written with reentrancy in mind).  In the common case,
it would be nice to rely on seemingly simple functions like strlen to be
async-signal safe.  This would mean an update to the table in XSH 2.4.3.

Second, it seems a shame that strstr is so frequently implemented with
O(m*n) worst-case performance, even when O(m+n) algorithms such as
Knuth-Morris-Pratt or turbo-Boyer-Moore have been documented for over 30
years.  These two algorithms require O(m) memory allocation to achieve
their algorithmic speedups, and hence are harder to make async-safe for
arbitrary needle length.  But more recently, the Two-Way algorithm
exploits the additional constraint of an ordered alphabet to provide an
algorithm with an O(m) preparation, O(n) scan, and O(1) space, meaning
that it should be possible to implement strstr using this algorithm while
still remaining async-safe.
http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260

Should POSIX require linear (rather than quadratic) worst-case performance
in strstr?  Or maybe we just leave this improvement as a
quality-of-implementation issue, but perhaps document in the application
notes document that many implementations use less-than-stellar algorithms?

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHfuPa84KuGfSFAYARAlpxAJ9BajGdIWa8iLOBv79MfAh+hzpAUACfd2AK
YLqcAHsd75hBqt4MXKOYDdA=
=Oi4n
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]