[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#43527: [PATCH] grep: avoid unneeded compilation of regex
From: |
Jim Meyering |
Subject: |
bug#43527: [PATCH] grep: avoid unneeded compilation of regex |
Date: |
Sun, 20 Sep 2020 18:34:02 -0700 |
On Sun, Sep 20, 2020 at 12:17 AM Norihiro Tanaka <noritnk@kcn.ne.jp> wrote:
> Hi,
> Performace for as following case is fixed in bug#43040.
>
> $ yes 0 | head -100000 | sed '$s/././' >pat
> $ grep -vf pat /dev/null
>
> However, still slow and a lot of memory wasted for the following cases.
>
> $ grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words
>
> This bug is introduced in commit abb7f4f2325f26f930ff59b702fe42568a8e81e7.
> Though it's an optimization for patterns with backreferences, it seems
> to cause performance degradation in many cases due to regex
> implementation issues.
>
> grep needs regex engine when patterns is not supported by DFA engine,
> and when either given only matching (-o) or color option (--color) is
> given.
>
> In other words, if none of them are met, grep only uses regex to check
> the syntax. grep avoids compilation of regex not to check syntax by this
> patch.
Yikes. Thank you!
That exposes (and fixes in this common case) a problem that makes grep
require memory that is quadratic in the number of regular expressions.
To illustrate, I ran some timings.
With only 80,000 lines of /usr/share/dict/linux.words, the following
would use 100GB of RSS and take 3 minutes. With the fix, it used less
than 400MB and took less than one second.
head -$N /usr/share/dict/linux.words > w; grep -vf w w
N Mem(k): Old New
20000 6341188 (2.4s) 103168
40000 25241288 (9.29s) 199188 (0.31s)
80000 100547432 (180s) 392872 (0.66s)
I've just pushed the gnulib-adjusting patch and will push the other soon.
I'll also add a test and a NEWS item in a separate patch.
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Norihiro Tanaka, 2020/09/20
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex,
Jim Meyering <=
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Jim Meyering, 2020/09/21
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Norihiro Tanaka, 2020/09/22
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Jim Meyering, 2020/09/22
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Norihiro Tanaka, 2020/09/22
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Jim Meyering, 2020/09/22
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Norihiro Tanaka, 2020/09/22
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Jim Meyering, 2020/09/23
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Paul Eggert, 2020/09/23
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Jim Meyering, 2020/09/23
- bug#43527: [PATCH] grep: avoid unneeded compilation of regex, Jim Meyering, 2020/09/26