Re: using mapfile is extremely slow compared to old-fashioned ways to read files
From: Lennart Schultz
Subject: Re: using mapfile is extremely slow compared to old-fashioned ways to read files
Date: Fri, 27 Mar 2009 12:54:29 +0100

Chris,
I agree that one should use the right tool for the job, and mapfile does not
seem to be the right tool for my problem, but here are some facts from my
observations.
Using a fast tool like egrep just to find a simple string in my data file
gives the following times:
time egrep '<pro' >/dev/null < dr.xml
real 0m54.628s
user 0m27.310s
sys 0m0.036s
My original bash script:
time xml2e2-loadepg
real 1m53.264s
user 1m22.145s
sys 0m30.674s
Since the discussion keeps coming back to spawning subshells and their cost,
I have checked my script. It calls only one external command, date, which is
called a little less than 20000 times in total. Just for this test, I changed
the call to date into an assignment of a constant, and now the timing looks
like this:
time xml2e2-loadepg
real 1m3.826s
user 1m2.700s
sys 0m1.004s
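
To illustrate the kind of change described above, here is a minimal sketch, not the actual xml2e2-loadepg code. It assumes the date value does not have to differ per record, so date can be forked once before the loop. The names records, today and out are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: "records" stands in for the data extracted from the XML file.
records=(alpha beta gamma)

# Fork date(1) once, before the loop, instead of once per record.
today=$(date +%Y%m%d)

out=()
for rec in "${records[@]}"; do
    # Reuse the cached value; no external command runs inside the loop.
    out+=("$today $rec")
done

printf '%s\n' "${out[@]}"
```

If every record needs its own timestamp conversion, this shortcut does not apply as written, and the per-record call has to be replaced some other way.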
I also made the same change to the version of the program using mapfile, and
changed line=$(echo $i) to
line=${i##+([[:space:]])}
so the main loop spawns absolutely no subshells:
time xml2e2-loadepg.new
real 65m2.378s
user 63m16.717s
sys 0m1.124s
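
For reference, a small self-contained sketch of that expansion. The sample lines are made up, and the array name lines stands in for MAPFILE. It only shows that the whitespace is stripped without a subshell; it says nothing about why the loop over the array is so slow.

```shell
#!/usr/bin/env bash
# extglob must be enabled for +(...) to be treated as a pattern.
shopt -s extglob

# Made-up sample data; mapfile without -t keeps the trailing newline.
lines=($'   <programme start="1">\n' $'\t<title>News</title>\n' $'plain\n')

stripped=()
for i in "${lines[@]}"; do
    line=${i##+([[:space:]])}   # remove the longest leading run of whitespace
    line=${line%$'\n'}          # drop the trailing newline, if any
    stripped+=("$line")
done

printf '[%s]\n' "${stripped[@]}"
```

Unlike line=$(echo $i), this does not collapse runs of blanks inside the line or glob-expand its contents, so the two forms are not strictly equivalent.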
Lennart
2009/3/26 Chris F.A. Johnson <cfaj@freeshell.org>
> On Thu, 26 Mar 2009, Lennart Schultz wrote:
>
>> I have a bash script which reads about 250000 lines of xml code generating
>> about 850 files with information extracted from the xml file.
>> It uses the construct:
>>
>> while read line
>> do
>> case "$line" in
>> ....
>> done < file
>>
>> and this takes a little less than 2 minutes
>>
>> Trying to use mapfile I changed the above construct to:
>>
>> mapfile < file
>> for i in "${MAPFILE[@]}"
>> do
>> line=$(echo $i) # strip leading blanks
>> case "$line" in
>> ....
>> done
>>
>> With this change the job now takes more than 48 minutes. :(
>>
>
> As has already been suggested, the time is almost certainly taken
> up by the command substitution which you perform on every line.
>
> If you want to remove leading spaces, it would be better to use a
> single command to do that before reading with mapfile, e.g.:
>
> mapfile < <(sed 's/^ *//' file)
>
> If you want to remove trailing spaces as well:
>
> mapfile < <(sed -e 's/^ *//' -e 's/ *$//' file)
>
> Chet, how about an option to mapfile that strips leading and/or
> trailing spaces?
>
> Another useful option would be to remove newlines.
>
> --
> Chris F.A. Johnson, webmaster <http://woodbine-gerrard.com>
> ========= Do not reply to the From: address; use Reply-To: ========
> Author:
> Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
>