bug-bash

Re: repeated extended pattern substitution incredibly slow w/large varia


From: Piotr Grzybowski
Subject: Re: repeated extended pattern substitution incredibly slow w/large variables
Date: Sun, 18 Sep 2016 17:21:33 +0200

Hi,

 Maybe I do not fully follow your example, but instead of:

          time D="${C//\[+([0-9])\]=}"             # rm '[<subscr>]='

wouldn't you want:

          time D="${C//\[[0-9]*\]=}"               # rm '[<subscr>]='

 Your example copies a lot to D, and that's what takes the time, I guess.
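
 One caveat, as far as I can tell: in a shell pattern, [0-9]* means one
 digit followed by any string, so the second form is probably not a drop-in
 replacement for +([0-9]). A minimal sketch, assuming a made-up
 three-element sample (the contents of C below are hypothetical, not your
 data):

```bash
shopt -s extglob   # needed for the +( ... ) form

# Hypothetical sample in the same format as C in the report.
C='([0]="a.psf" [1]="b.psf" [10]="c.psf")'

# +([0-9]) matches one or more digits, so only the '[<subscr>]=' parts go.
ext=${C//\[+([0-9])\]=}

# [0-9]* is a single digit followed by "anything"; the longest match
# runs from the first '[' to the last ']='.
star=${C//\[[0-9]*\]=}

printf '%s\n' "$ext"    # ("a.psf" "b.psf" "c.psf")
printf '%s\n' "$star"   # ("c.psf")
```

 Since // always replaces the longest match, the second expansion swallows
 everything between the first subscript and the last one, and only the
 final element survives.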

cheers,
pg


On 18 Sep 2016, at 11:32, xaoxx@t-online.de wrote:

> 
> Configuration Information [Automatically generated, do not change]:
> Machine: i686
> OS: linux-gnu
> Compiler: gcc
> Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i686' 
> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu' 
> -DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL 
> -DHAVE_CONFIG_H -DDEBUG -DMALLOC_DEBUG -I.  -I. -I./include -I./lib   -g -O2 
> -Wno-parentheses -Wno-format-security
> uname output: Linux Xaox 4.4.0-tm3 #2 Mon Feb 22 13:26:44 CET 2016 i686 
> GNU/Linux
> Machine Type: i686-pc-linux-gnu
> 
> Bash Version: 4.4
> Patch Level: 0
> Release Status: rc2 / release
> 
> Description:
>       The tests below were performed with 4.4.0-rc2. However, the problem is
>       still present in 4.4.0-release; execution times there are even higher,
>       by about 20%.
> 
>       Repeated pattern substitution (here: removal) using an extended pattern
>       on variables of considerable size is incredibly time- and CPU-consuming.
>       The command that revealed the problem was:
> 
>                D=${C//\[+([0-9])\]=}
> 
>       The variable C contains the output of 'declare -p A', where A is an
>       array with 510 file names and C contains 510 matches. But as can be
>       seen below, also commands like
> 
>               D=${C//u+([a-z])}   or  D=${C//@(usr)}
> 
>       trigger the problem, but _not_ commands like
> 
>               D=${C//usr}         or  D=${C//u[a-z][a-z]}
> 
>       See the test case and statistics below.
> 
>       Of course, the problem is easily solved by a mini sed(1) script, but
>       every now and then I try commands like the above, because I think that
>       simple tasks should be doable without the aid of external programs.
>       But in many such cases I must sadly accept that using external programs,
>       especially sed(1), is the quicker method.
>       Additionally, I will have to revise my script (a ~100kb font editor)
>       and possibly replace other constructs that use extended pattern matching.
> 
> Repeat-By:
>       -----------------------------------------------------------------------
>       declare -a B A=( /usr/share/consolefonts/* ) # column 2: here 510 files
> 
>       # A=( "${A[@]##*/}" )                        # column 3: pure filenames
>       # A=( "${A[@]/*/a}" )                        # column 4: "a"
>       # A=( "${A[@]/*}" )                          # column 5: "" (empty)
> 
>       for matches in {10..500..10}; do
>         B=( "${A[@]:0:matches}" )                # reduce array
>         C=`declare -p B | sed -r "s/^[^=]+=?//"` # rm 'declare -<attr> <name>='
>         time D="${C//\[+([0-9])\]=}"             # rm '[<subscr>]='
>       done
>       ------------------------------------------------------------------------
> 
>       results (all with >99% cpu):
> 
>       number of |  contents of array elements
>       matches   |  size=${#C}  path/file |   file  |  "a"   |  empty
>       ---------------------------------------------------------------
>         10:     |   369 bytes   0.099s   |  0.014s | 0.007s |  0.005s
>         20:     |   900         1.261s   |  0.315s | 0.048s |  0.036s
>         30:     |  1453         5.274s   |  1.538s | 0.168s |  0.134s
>         40:     |  2070        15.030s   |  4.868s | 0.406s |  0.324s
>         50:     |  2655        31.830s   | 10.694s | 0.814s |  0.644s
>         60:     |  3240        56.831s   | 19.203s | 1.423s |  1.130s
>         70:     |  3837        94.022s   | 32.356s | 2.299s |  1.829s
>         80:     |  4384       139.000s   | 47.079s | 3.473s |  2.751s
>         90:     |  4998       204.683s   |         | 4.955s |  3.932s
>        100:     |  5567       283.118s   |         | 6.871s |  5.452s
>        110:     |  6135                  |         | 9.495s |  7.547s
>        120:     |  6664                  |         |        | 10.164s
>        200:     | 15554                  |         |        | 55.529s
> 
>       I was too impatient to wait for the run with the complete array of 510
>       elements to finish.
> 
>       The following test results all belong in column 1 + 2.
> 
>       the command:    time D=`sed -r "s/\[[0-9]+\]=//g"<<<"$C"`
> 
>        510:     | 27137 bytes,  R:0.020 U:0.007 S:0.007 67.66%   ok!
> 
> 
>       other commands:
> 
>               size=${#C}   D=${C//usr}   D=${C//u[a-z][a-z]}
>       --------------------------------------------------------
>        100:    5567 bytes  0.004s             0.004s             ok!
>        200:   11167        0.012s             0.012s
>        300:   16712        0.024s             0.024s
>        400:   21818        0.038s             0.040s
>        500:   26647        0.056s             0.057s
> 
> 
>       but:           D=${C//u+([a-z])}        D=${C//@(usr)}
> 
>         10:                0.136s             0.112s         >99% cpu
>         20:                1.647s             1.078s
>         30:                6.467s             4.014s
>         40:               17.912s            10.886s
>         50:               38.178s            22.391s
> 
>       which seems to indicate that extended pattern matching causes the
>       problem.
> 
>       Please CC answers to me as I am not subscribed to the list.
> 
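
 The report notes that bracket-only patterns such as u[a-z][a-z] stay fast
 while +( ... ) does not. One possible workaround, sketched here under the
 assumption that subscripts have at most three digits, is to replace the
 single extended pattern with one plain pass per subscript width. The
 sparse array B and its file names are invented for illustration; sed -E is
 used as the portable spelling of the report's sed -r.

```bash
shopt -s extglob

# Hypothetical sparse array so that 1-, 2- and 3-digit subscripts all occur.
B=()
B[5]="a.psf"
B[42]="b.psf"
B[107]="c.psf"

# Same shape as C in the report: the text after 'declare -a B='.
C=$(declare -p B)
C=${C#*=}

# 1) Extended pattern from the report (the slow case on large input).
D_ext=${C//\[+([0-9])\]=}

# 2) Bracket-only patterns: one pass each for 1, 2 and 3 digits.
D_plain=$C
pat='[0-9]'
for width in 1 2 3; do
  D_plain=${D_plain//\[$pat\]=}
  pat+='[0-9]'
done

# 3) sed, as in the report.
D_sed=$(sed -E 's/\[[0-9]+\]=//g' <<<"$C")

# All three should agree on this input.
[[ $D_ext == "$D_plain" && $D_ext == "$D_sed" ]] && echo "results match"
```

 Whether the multi-pass form is actually faster on the full 510-element
 input would still need to be timed; the sketch only checks that the three
 approaches produce the same string.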



