bug-bash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug: bash 4.2.20 impossibly slow


From: Somchai Smythe
Subject: Re: bug: bash 4.2.20 impossibly slow
Date: Mon, 19 Mar 2012 03:52:48 +0700

On 3/16/12, Chet Ramey <chet.ramey@case.edu> wrote:
> On 3/14/12 2:14 PM, Somchai Smythe wrote:
>> Hello,
>>
>> I am reporting a problem with performance, not correctness.
>>
>> While preparing some examples for a course lecture where I code the
>> same algorithm in many languages to compare languages, I ran some code
>> and while it was  reasonably quick with ksh, it would just apparently
>> hang at 100% cpu in bash.  I finally let it run overnight and it does
>> complete correctly in bash, but what takes ksh less than a minute
>> takes bash 6 1/2 hours to complete (and keeping one core at 100% the
>> entire 6.5 hours) on the same hardware.  I suspect there may be some
>> special way to compile bash that I don't know about that maybe works
>> with arrays differently, so I reporting this.  I am not subscribed, so
>> please cc: me.  I cannot use bashbug since my university blocks
>> outgoing mail.  I used exactly the same file unmodified for the tests
>> in ksh and bash.  My hope is that bash would be at least 'competitive'
>> and complete it without being more than 10x slower.  As it is, I
>> cannot use bash in the lecture (for this) since it is only 3 hours
>> long and the program won't complete in that amount of time.
>>
>> BLS2 $bash --version
>> GNU bash, version 4.2.20(2)-release (x86_64-unknown-linux-gnu)
>> Copyright (C) 2011 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> <http://gnu.org/licenses/gpl.html>
>
>       [...]
>
>> The program run was a simple prime number sieve program using an array
>> with only 500000 elements:
>>
>> #! /bin/bash
>>
>> if ((${#}==1))
>> then
>>   n=${1}
>> else
>>   n=500000
>> fi
>> for ((i=1;i<=n;i++))
>> do
>>   ((a[i]=1))
>> done
>> for ((i=2;i<=(n/2);i++))
>> do
>>   for ((j=2;j<=(n/i);j++))
>>   do
>>      ((k=i*j))
>>      ((a[k]=0))
>>   done
>> done
>> for ((i=1;i<=n;i++))
>> do
>>   if ((a[i]!=0))
>>   then
>>     printf "%d\n" ${i}
>>   fi
>> done
>> exit 0
>>
>> When run like:
>> time bash ./sieve.sh
>>
>> .....
>> 499717
>> 499729
>> 499739
>> 499747
>> 499781
>> 499787
>> 499801
>> 499819
>> 499853
>> 499879
>> 499883
>> 499897
>> 499903
>> 499927
>> 499943
>> 499957
>> 499969
>> 499973
>> 499979
>>
>> real    396m9.884s
>> user    395m43.102s
>> sys     0m8.913s
>>
>       [...]
>
>> Computer is a Core2 Duo with 3GB of ram running a pure64bit linux
>> distribution with kernel 3.2.9 with gcc 4.5.3 and glibc 2.14.1.
>>
>> Both programs get the same answer, so this is not a correctness issue,
>> but instead a performance issue.
>>
>       [...]
>
>> Experimenting a bit shows me that at 9999 elements bash is still
>> reasonably fast, but at 29999 elements it takes:
>>
>> real    0m39.077s
>> user    0m38.807s
>> sys     0m0.150s
>>
>> For that, ksh takes:
>>
>> real    0m1.631s
>> user    0m1.560s
>> sys     0m0.007s
>>
>> Perhaps that shorter total time still shows the problem dramatically
>> enough that runs that size can be used to track down the problem
>> without having to wait hours for the test runs.
>
> As I said in my earlier reply, the bash array implementation uses sparse
> doubly-linked lists.  Inserts that don't append a value to the end of
> the array take O(N) instead of O(1), since you have to search the list
> to find the right spot to insert the new element.
>
> There is a lot that can be done to accommodate sequential insertion
> patterns, which fits the sieve algorithm pretty well.  It would be
> pretty simple, since the array code already maintains a pointer to the
> last reference; starting the search for the spot to insert at the last
> reference should reduce the search time considerably.
>
> That makes a pretty big difference.  Starting at 9999 elements, but
> removing the echos to minimize output:
>
> (bash)
> real  0m1.833s
> user  0m1.704s
> sys   0m0.118s
>
> (bash-4.2.24)
> real  0m2.728s
> user  0m2.603s
> sys   0m0.114s
>
> With 29999 elements:
> (bash)
> real  0m6.957s
> user  0m6.437s
> sys   0m0.377s
>
> (bash-4.2.24)
> real  0m16.520s
> user  0m16.113s
> sys   0m0.387s
>
> You start seeing real differences at 99999 elements (for what it's worth,
> loading up the array initially takes about 1.2s of that total):
>
> (bash)
> real  0m32.329s
> user  0m30.845s
> sys   0m1.376s
>
> (bash-4.2.24)
> real  2m46.972s
> user  2m45.095s
> sys   0m1.590s
>
> At this point I quit testing bash-4.2.24; I don't have *that* much time.
> The rest of these are just with the modified development sources, and not
> in a really rigorous way, since I was using the machine for other things
> at the time.
>
> 299999 elements:
> real  3m24.318s
> user  3m19.463s
> sys   0m4.768s
>
> 399999 elements:
> real  5m27.585s
> user  5m20.870s
> sys   0m6.511s
>
> 499999 elements:
> real  8m9.997s
> user  8m0.963s
> sys   0m8.553s
>
> It's never going to be O(1) unless I rewrite the whole array module, and
> I'm not saying that the bash builtin malloc's memory allocation patterns
> don't have an effect, but small changes (which I've attached) can make a
> big difference.
>
> Chet

Thank you very much for the patch; it makes a very dramatic difference
for this use case.  In my testing, it went from 6.5 hours to just over
20 minutes.

I guess this wasn't the default since it may not be great for random
access, but I hope that it is at least available as a compile-time
option in future official bash releases.  For now, I'll apply this
array access speedup patch to our packages.

John Gatewood Ham
Lecturer
Burapha University
Bang Saen, Chonburi
Thailand



reply via email to

[Prev in Thread] Current Thread [Next in Thread]