bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug#22357: grep -f not only huge memory usage, but also huge time co


From: Norihiro Tanaka
Subject: Re: bug#22357: grep -f not only huge memory usage, but also huge time cost
Date: Sun, 11 Dec 2016 11:22:11 +0900

On Fri, 9 Dec 2016 01:24:19 -0600
Trevor Cordes <address@hidden> wrote:

> I bisected this bug to commits:
> 662b19f2d0edc0bf07f1a2c1421080252df4a37c
> 468d5217ed6ec1679512dec208c7f30fb8612957
> (can't narrow it down because the latter doesn't compile for me)

The changes switch used algorithm.  They convert grep -w -F to grep -w.

Try following test case and before and after the changes, please.

$ yes $(printf %040d 0) | head -10000000 >k
$ time -p env LC_ALL=C grep -w -F 0 k

It did not finish in 40s on my machine without the changes, but finished
in 0.54s after the changes, and following case finished in 1.35s
regardless whether the changes applied or not.

$ time -p env LC_ALL=C grep -w 0 k

It may be an extreme example, but if a lot of people assume that grep -F
is faster than grep, I think that it is a bug.

> grep -w -f /usr/share/dict/words /tmp/greptest
> (good older version: 2 seconds to complete, minimal memory)
> (any version after the above commits: 10 or more minutes, never waited for 
> it to finish, 1.2GB RAM usage and 100% cpu)

I believe that it can not happen, as the changes work for grep -F only.
Is it grep -F -w, correctly?  grep -F -w is not good at long pattern
after the changes.

I think that the real issue of bug#21763, bug#22239 and bug#22357 is
that dfa uses a lot of memory for a long pattern, but We have no idea to
improve it currently.

Following case is poor performance and uses a lot of memory regardless
whether the changes applied or not.

$ env LC_ALL=C grep -w -f /usr/share/dict/words /dev/null

However, the poor performance will be improved partly.  Following case
is returned in 2.4s after patched on my machine.

$ env LC_ALL=C grep -F -w -f /usr/share/dict/words /dev/null

Thanks,
Norihiro

Attachment: 0001-dfa-performance-improvement-for-removal-of-epsilon-c.patch
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]