[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Exclude optimization
From: |
Bruno Haible |
Subject: |
Re: [PATCH] Exclude optimization |
Date: |
Mon, 10 Aug 2009 01:16:23 +0200 |
User-agent: |
KMail/1.9.9 |
Hello Sergey,
> The proposed patch considerably speed-ups the exclude module
> for large exclusion lists of non-wildcard patterns.
Great idea!
'is_fnmatch_pattern' is probably a misnomer, because its argument is
by definition already an fnmatch pattern. What the function is testing is
whether it contains wildcard characters [1]ΒΈ in other words, whether it
matches a single string (modulo case, optionally) or more. Possibly better
names are:
fnmatch_pattern_has_wildcards
is_fnmatch_pattern_constant (for the opposite sense)
Btw, this function does not handle multiple adjacent backslashes correctly,
i.e. patterns such as 'ab\\[cd]'.
Also, I'm surprised that this function does not take an 'options' argument.
For example, the string 'ab\[cd\]' is constant if FNM_NOESCAPE is off, but
non-constant if FNM_NOESCAPE is on.
I would like to see some unit tests committed to this module before the
big optimization.
Thinking more into the future, there are now multiple optimizations of
regex / fnmatch from various sides. (Recall that fnmatch patterns can
be transformed into regular expressions.)
- grep and awk use a wrapper around regex, composed of 'kwset' and
'dfa', for speeding up regex. 'kwset' optimizes fgrep-like uses.
- fnmatch suffers from the API that provides no distinct "compile"
and "execute" passes. Ondrej Bilka was working on this in May 2009.
- Now for the 'exclude' module we have a particular optimization
for the case '^\(word1\|word2\|...\|more complex stuff\)$.
Also, the regex API has a few other shortcomings:
- It uses the current locale, does not have an explicit locale_t argument,
- It does not work on Unicode strings or with Unicode character predicates.
I hope there will be a better, unified, optimized regex in GNU some day.
Bruno
[1] http://en.wikipedia.org/wiki/Wildcard_character