[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: compgen is slow for large numbers of options

From: Chet Ramey
Subject: Re: compgen is slow for large numbers of options
Date: Mon, 19 Mar 2012 16:18:33 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

On 3/14/12 1:44 PM, Richard Neill wrote:
> Dear All,
> I don't know for certain if this is a bug per se, but I think
> "compgen -W" is much slower than it "should" be in the case of a large
> (10000+) number of options.
> For example (on a fast i7 2700 CPU), I measure:
> compgen -W "`seq 1 50000`" 1794       #3.83 s
> compgen -W "`seq 1 50000 | grep 1794`"   #0.019 s
> In these examples, I'm using `seq` as a trivial way to generate some data,
> and picking 1794 as a totally arbitrary match.
> In the first example, compgen is doing the filtering, whereas in the 2nd, I
> obtain the same result very much faster with grep.
> If I increase the upper number by a factor of 10, to 500000, these times
> become,  436 s (yes, really, 7 minutes!) and 0.20 s respectively.  This
> suggests that the algorithm used by compgen is O(n^2) whereas the algorithm
> used by grep is 0(1).

You've identified a problem, but your analysis is incomplete.  The search
algorithm isn't the problem -- it's simply a linked list traversal that
does an initial substring match.

The function (pcomplete.c:gen_wordlist_matches()) spends more than 60% of
its time generating that linked list.  It splits the string argument to -W
into words at $IFS characters, but it differs from standard shell word
splitting in that it splits everything (not just the results of expansion),
understands embedded quoted strings, and understands embedded shell
expansions.  Though word splitting itself is costly when you're splitting a
300K character string, this function is slower.

I will look at optimizing that function, but it's always going to take time
to plow through 300K when you have to split it into words.  (There's not
actually any word splitting of consequence happening with your second
example using the pipeline.)

I've attached a script I used to identify the costs associated with the
various parts of the shell word expansions and compgen's behavior.  The
rest I had to build a profiling version of bash to analyze.

``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    address@hidden    http://cnswww.cns.cwru.edu/~chet/

Attachment: x1
Description: Text document

reply via email to

[Prev in Thread] Current Thread [Next in Thread]