[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: faster fnmatch

From: James Youngman
Subject: Re: faster fnmatch
Date: Fri, 17 Apr 2009 10:47:20 +0100

[ CC += bug-findutils, bug-gnulib; bug-coreutils moved to BCC ]

On Thu, Apr 16, 2009 at 8:09 PM, Ondrej Bilka <address@hidden> wrote:
> Hello. I am writing partial fnmatch to speed up locate et al.

Please note that locate is part of GNU findutils, not GNU coreutils.
I have CC'ed this email to the correct list.

Regular expression matching with "locate -r" is also faster than the
default, globbing, match.

> Idea is save state state to match common prefix only once.
> fnmatch source is too complex so I wrote simplified version.
> My code is 2-4 times faster than fnmatch.
> Here is list what provided speedup and can be applied to original source
> FOLD causes worst performance slowdown.
> From what I tried is best convert in buffer to uppercase when needed.

This seems like an attractive option but I'm a little concerned about
whether this will produce the correct result with characters like ß or
Ï and ï.

> Other optimalizations are based on preprocessing pattern
> 1. For each * we compute minimal size of rest and we have smaller for cycles.
> 2. We can replace *? by ?*
> 3. If * is followed by letter we check it at for cycle of *
> 4. If pattern contains four consecutive letters we compare them as int

I'm, not certain I have fully understood your points, but if you can
speed up gnulib's fnmatch implementation, that would be good news.   I
looked at preprocessing the glob pattern to convert it into a regular
expression, but I think there were some problems around collating
symbols and character equivalence classes (i.e. that it''s not
possible for the application to extract information about these from
the C library).

Since I last looked at this issue though, Bruno has implemented a
large amount of Unicode support in gnulib, and this may in fact help
solve the problem too.    I've CC'ed the message to bug-gnulib since
that's where the folks who maintain the implementations of that
library and the fnmatch replacement hang out.

> I also tried optimalizations which didnt worked.
> use mask to match four letters and ?
> strlen trick to find first letter after * faster.
> Boyer-moore skiping didnt work either.
> --
> 50% of the manual is in .pdf readme files
> _______________________________________________
> Bug-coreutils mailing list
> address@hidden
> http://lists.gnu.org/mailman/listinfo/bug-coreutils

reply via email to

[Prev in Thread] Current Thread [Next in Thread]