[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: faster fnmatch

From: Bruno Haible
Subject: Re: faster fnmatch
Date: Fri, 17 Apr 2009 13:02:57 +0200
User-agent: KMail/1.9.9

Hello Ondrej,

> > Hello. I am writing partial fnmatch to speed up locate et al.

Cool! We know for some time already that this is a bottleneck [1].
I find it also interesting that you go for a two-step approach,
preprocess the pattern once and use it for matching often - the same
approach that we considered useful in [2].

> > FOLD causes worst performance slowdown.
> > From what I tried is best convert in buffer to uppercase when needed.
> This seems like an attractive option but I'm a little concerned about
> whether this will produce the correct result with characters like ß or
> Ï and ï.
> ... Bruno has implemented a
> large amount of Unicode support in gnulib, and this may in fact help

Yes, implementing a case-insensitive matching by doing an uppercasing
pass on both sides is an old technique that ignores the properties of
a couple of languages. For good support of all languages, it's necessary
to use a "casefolding" pass, such as the one described by the Unicode
standard and implemented in gnulib (and soon, libunistring), in
"unicase.h". In particular the function ulc_u8_casefold from
lib/unicase/ulc-casecmp.c looks like what's needed as preprocessing pass
for an arbitrary file name in locale encoding.

Note that this Unicode aware casefolding provides correctness; it will,
however, likely decrease the speed of the matching in 'find' (whereas in
'locate', the file name list can be processed ahead of time).


[1] http://lists.gnu.org/archive/html/bug-gnulib/2007-12/msg00040.html
[2] http://lists.gnu.org/archive/html/bug-gnulib/2007-12/msg00071.html

reply via email to

[Prev in Thread] Current Thread [Next in Thread]