bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: memmem speedup


From: Bruno Haible
Subject: Re: memmem speedup
Date: Tue, 8 Jan 2008 01:54:55 +0100
User-agent: KMail/1.5.4

Eric Blake wrote:
> I've finished my implementation of a Two-Way plus Boyer-Moore hybrid
> string search algorithm for the memmmem module.  This patch has passed
> everything I've thrown at it so far, and it has the nice properties of
> avoiding alloca/malloc (hence it is async-safe), as well as allowing
> sub-linear complexity for long needles (the added test in test-memmem.c
> fails with the KMP algorithm but passes with my Two-Way+BM hybrid).  It
> guarantees fewer than 2N comparisons, and can determine misses as quickly
> as N/M.  I chose a threshold of 128 bytes in the needle before attempting
> the BM bad-character shift table, since constructing a 256-entry table is
> expensive up front with little gain for short needles.

Marvellous!!

Thanks for the comments; they are essential for understanding the code.
I also find your comments clearer than the text in .../~lecroq/...

Just two typos: sed -e s/acheive/achieve/ -e s/Peform/Perform/

And LONG_NEEDLE_THRESHOLD should better be (1U << (CHAR_BIT - 1))
rather than (1 << (CHAR_BIT - 1)), for platforms where CHAR_BIT = 32.
On such platforms, the function two_way_long_needle will not be used.
Therefore it's also appropriate to wrap the added test (tests/test-memmem.c
lines 155..180) in an
  if (CHAR_BIT < 10) { ... }.

Bruno





reply via email to

[Prev in Thread] Current Thread [Next in Thread]