bug-bash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug: bash 4.2.20 impossibly slow


From: Chet Ramey
Subject: Re: bug: bash 4.2.20 impossibly slow
Date: Fri, 16 Mar 2012 10:01:38 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

On 3/14/12 2:14 PM, Somchai Smythe wrote:
> Hello,
> 
> I am reporting a problem with performance, not correctness.
> 
> While preparing some examples for a course lecture where I code the
> same algorithm in many languages to compare languages, I ran some code
> and while it was  reasonably quick with ksh, it would just apparently
> hang at 100% cpu in bash.  I finally let it run overnight and it does
> complete correctly in bash, but what takes ksh less than a minute
> takes bash 6 1/2 hours to complete (and keeping one core at 100% the
> entire 6.5 hours) on the same hardware.  I suspect there may be some
> special way to compile bash that I don't know about that maybe works
> with arrays differently, so I reporting this.  I am not subscribed, so
> please cc: me.  I cannot use bashbug since my university blocks
> outgoing mail.  I used exactly the same file unmodified for the tests
> in ksh and bash.  My hope is that bash would be at least 'competitive'
> and complete it without being more than 10x slower.  As it is, I
> cannot use bash in the lecture (for this) since it is only 3 hours
> long and the program won't complete in that amount of time.
> 
> BLS2 $bash --version
> GNU bash, version 4.2.20(2)-release (x86_64-unknown-linux-gnu)
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

        [...]

> The program run was a simple prime number sieve program using an array
> with only 500000 elements:
> 
> #! /bin/bash
> 
> if ((${#}==1))
> then
>   n=${1}
> else
>   n=500000
> fi
> for ((i=1;i<=n;i++))
> do
>   ((a[i]=1))
> done
> for ((i=2;i<=(n/2);i++))
> do
>   for ((j=2;j<=(n/i);j++))
>   do
>      ((k=i*j))
>      ((a[k]=0))
>   done
> done
> for ((i=1;i<=n;i++))
> do
>   if ((a[i]!=0))
>   then
>     printf "%d\n" ${i}
>   fi
> done
> exit 0
> 
> When run like:
> time bash ./sieve.sh
> 
> .....
> 499717
> 499729
> 499739
> 499747
> 499781
> 499787
> 499801
> 499819
> 499853
> 499879
> 499883
> 499897
> 499903
> 499927
> 499943
> 499957
> 499969
> 499973
> 499979
> 
> real    396m9.884s
> user    395m43.102s
> sys     0m8.913s
> 
        [...]

> Computer is a Core2 Duo with 3GB of ram running a pure64bit linux
> distribution with kernel 3.2.9 with gcc 4.5.3 and glibc 2.14.1.
> 
> Both programs get the same answer, so this is not a correctness issue,
> but instead a performance issue.
> 
        [...]

> Experimenting a bit shows me that at 9999 elements bash is still
> reasonably fast, but at 29999 elements it takes:
> 
> real    0m39.077s
> user    0m38.807s
> sys     0m0.150s
> 
> For that, ksh takes:
> 
> real    0m1.631s
> user    0m1.560s
> sys     0m0.007s
> 
> Perhaps that shorter total time still shows the problem dramatically
> enough that runs that size can be used to track down the problem
> without having to wait hours for the test runs.

As I said in my earlier reply, the bash array implementation uses sparse
doubly-linked lists.  Inserts that don't append a value to the end of
the array take O(N) instead of O(1), since you have to search the list
to find the right spot to insert the new element.

There is a lot that can be done to accommodate sequential insertion
patterns, which fits the sieve algorithm pretty well.  It would be
pretty simple, since the array code already maintains a pointer to the
last reference; starting the search for the spot to insert at the last
reference should reduce the search time considerably.

That makes a pretty big difference.  Starting at 9999 elements, but
removing the echos to minimize output:

(bash)
real    0m1.833s
user    0m1.704s
sys     0m0.118s

(bash-4.2.24)
real    0m2.728s
user    0m2.603s
sys     0m0.114s

With 29999 elements:
(bash)
real    0m6.957s
user    0m6.437s
sys     0m0.377s

(bash-4.2.24)
real    0m16.520s
user    0m16.113s
sys     0m0.387s

You start seeing real differences at 99999 elements (for what it's worth,
loading up the array initially takes about 1.2s of that total):

(bash)
real    0m32.329s
user    0m30.845s
sys     0m1.376s

(bash-4.2.24)
real    2m46.972s
user    2m45.095s
sys     0m1.590s

At this point I quit testing bash-4.2.24; I don't have *that* much time.
The rest of these are just with the modified development sources, and not
in a really rigorous way, since I was using the machine for other things
at the time.

299999 elements:
real    3m24.318s
user    3m19.463s
sys     0m4.768s

399999 elements:
real    5m27.585s
user    5m20.870s
sys     0m6.511s

499999 elements:
real    8m9.997s
user    8m0.963s
sys     0m8.553s

It's never going to be O(1) unless I rewrite the whole array module, and
I'm not saying that the bash builtin malloc's memory allocation patterns
don't have an effect, but small changes (which I've attached) can make a
big difference.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/

Attachment: array-lastref.patch
Description: Source code patch


reply via email to

[Prev in Thread] Current Thread [Next in Thread]