bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug#22357: grep -f not only huge memory usage, but also huge time co


From: Norihiro Tanaka
Subject: Re: bug#22357: grep -f not only huge memory usage, but also huge time cost
Date: Tue, 20 Dec 2016 07:38:34 +0900

On Sun, 18 Dec 2016 23:48:10 -0800
Paul Eggert <address@hidden> wrote:

> >> 'delete' is
> >> O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2).
> >> 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3).
> >> I haven't looked into how likely the worst-case performance is, though.
> 
> > Yes.  It is same both before and after with my proposed patch, but It
> > seems that memset() for VISITED causes slowdown in before code.
> 
> I looked into it some more. Your patch looks like a win so I
> installed it; thanks. Also, I improved the worst-case performance for
> 'replace' from O(N*(N + log N)) to O(N log N), by installing the
> attached patch. I also bumped grep's gnulib version so that it gets
> this patch.
> 
> This doesn't address the main problem in this area, as 'grep' is
> still pretty slow with lots of multiple patterns. However, one step
> at a time.

Thanks.

BTW, What case do you the worst?  One more I think previous 'replace' is
not O(N*(N + log N)) but O(N + N log N) i.e. O(N log N) .

> This doesn't address the main problem in this area, as 'grep' is still
> pretty slow with lots of multiple patterns. However, one step at a time.

Yes, I think slowness with lots of multiple patterns is caused by two
reasons mainly.

 1. dfa is not optimize parsed patterns.

    For example, dfa does not replace pattern 'a \|a \|a \| ... \|a \|b'
    to 'a \|b'.

    I tried it several times, have not succeeded it yet.

 2. grep matcher compiles regex always to check syntax of a pattern,
    even when it is not used in searching.

    I also tried it, have not succeeded it yet.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]