bug#20006: Bash-specific performance by avoiding sed
From: Pavel Raiskup
Subject: bug#20006: Bash-specific performance by avoiding sed
Date: Wed, 07 Oct 2015 14:28:30 +0200
User-agent: KMail/4.14.10 (Linux/4.2.2-300.fc23.x86_64+debug; KDE/4.14.11; x86_64; ; )
On Monday 05 of October 2015 15:28:56 Pavel Raiskup wrote:
> On Monday 05 of October 2015 09:47:05 Pavel Raiskup wrote:
> > On Monday 05 of October 2015 01:25:24 Pavel Raiskup wrote:
> > > On Monday 05 of October 2015 00:45:50 Pavel Raiskup wrote:
> > > > > should we test the size of the string first ? i've written such raw shell
> > > > > string parsing functions before, and once you hit a certain size (like 1k+
> > > > > iirc), forking out to sed is way faster, especially when running in multibyte
> > > > > locales (like UTF8) which most people are doing nowadays.
> > > > > -mike
> > > >
> > > > Well, that optimization would require a (fast) strlen()-like construct.
> > > > Anyway, the vast majority of calls to the func_quote () function will have
> > > > a short ARG, and its complexity is still "just" linear. We could optimize
> > > > later if that turned out to be a real issue.
> > > >
> > > > I would like to propose a solution based on Eric's, without using the
> > > > '${VAR%.}' and '${VAR#.}' constructs -- it sounds like this could be even
> > > > more portable while keeping almost the same speed (if we can use +=, it's
> > > > even faster).
> > > >
> > > > I have yet another patch that tries to minimize the option-parser overhead
> > > > (it is focused on Richard's POV, but it needs to be cleaned up a bit;
> > > > I'll post it hopefully tomorrow).
> > > >
> > > > Any comment is welcome!
> > >
> > > Re-attached (fixes for 'make syntax-check' and fixed one comment).
> >
> > Hmm, one possible problem with this (caught by the testsuite): when you
> > have:
> >
> > $ cat build-aux/test-quoting
> > . `echo "$0" |${SED-sed} 's|[^/]*$||'`/funclib.sh
> > # source this for "GNU m4" detection methods
> > . `echo "$0" |${SED-sed} 's|[^/]*$||'`/extract-trace
> >
> > func_quote_for_eval "$@"
> > echo "$func_quote_for_eval_result"
> >
> > Then:
> >
> > $ ./build-aux/test-quoting '"a b"' # fine
> > "\"a b\""
> >
> > $ ./build-aux/test-quoting '"*tool"' # broken
> > ./build-aux/test-quoting '"*tool"'
> > \"libtool\"
> >
> > We would like the output to be \"*tool\". I'm not aware of a portable way
> > to disable wildcard expansion in the shell, and the autoconf 'Shellology'
> > section hasn't helped me. In particular, the problem is here:
> >
> > x='a"[a-z]*"c'
> > IFS='"'
> > for i in $x; do # Here we want to disable wildcard expansion
> >   echo $i
> > done
> >
> > Any idea other than falling back to $sed_quote_subst when '*' or '['
> > appears in ARG?
>
> Attaching two (yet to be cleaned) patches doing the optimization. Is
> anybody able to test/comment on this particular solution? That would be
> really appreciated.
The cleaned-up patches are attached. I would like to push them very soon,
probably before the weekend. If you see any issue worth holding this change
for, please let me know soon, thanks!
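On the wildcard problem quoted above, here is a minimal sketch of one possible
workaround. It assumes a POSIX shell where 'set -f' (noglob) is available;
whether that is portable enough for every shell libtool targets is an open
assumption. The helper name split_on_dquote and its variables are hypothetical
and are not taken from the attached patches.

```shell
#!/bin/sh
# Hypothetical helper: split $1 on double quotes and print each piece on
# its own line, without letting the shell expand '*' or '[' in the pieces.
# Assumes IFS is set (not unset) when the function is called.
split_on_dquote ()
{
  # Remember whether noglob was already active, so it can be restored.
  case $- in
    *f*) split_restore_glob=false ;;
    *)   split_restore_glob=: ;;
  esac
  split_save_IFS=$IFS

  set -f        # POSIX: disable pathname expansion
  IFS='"'
  # $1 is deliberately unquoted: we want field splitting, but no globbing.
  for split_piece in $1; do
    printf '%s\n' "$split_piece"
  done

  IFS=$split_save_IFS
  if $split_restore_glob; then set +f; fi
}

split_on_dquote 'a"[a-z]*"c'
```

With noglob active during the loop, the middle field should come out as the
literal '[a-z]*' regardless of which files exist in the current directory.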

FWIW, some numbers (systemd.git build time, right after 'make clean'):

The old libtool v2.4.2:

  $ time make -j5
  real 2m3.163s
  user 3m54.849s
  sys 3m28.684s

The latest released libtool v2.4.6:

  $ time make -j5 LIBTOOL=/usr/bin/libtool
  real 8m24.604s
  user 9m56.977s
  sys 19m45.620s

The patched git libtool:

  $ time make -j5 LIBTOOL=~/rh/projects/libtool/libtool
  real 2m34.682s
  user 6m37.158s
  sys 2m21.123s

.. so it is (2.4.2 vs. 2.4.7~dev, user+sys) 7m23.5s vs. 8m58.3s, against
29m42.6s for v2.4.6. It's not completely back yet, but it's much better than
v2.4.6.
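
For anyone who wants to re-check the user+sys totals, a small sketch follows.
It assumes the 'MmS.SSSs' format printed by the bash 'time' builtin and a
POSIX awk; the function names to_seconds and cpu_seconds are made up for this
illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a 'time' value such as 3m54.849s to seconds.
to_seconds ()
{
  # Splitting on 'm' and 's' leaves minutes in $1 and seconds in $2.
  printf '%s\n' "$1" |
    LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: print user+sys CPU time in seconds, one decimal.
# Usage: cpu_seconds USER SYS
cpu_seconds ()
{
  LC_ALL=C awk -v u="`to_seconds "$1"`" -v s="`to_seconds "$2"`" \
    'BEGIN { printf "%.1f\n", u + s }'
}

cpu_seconds 3m54.849s 3m28.684s     # v2.4.2
cpu_seconds 9m56.977s 19m45.620s    # v2.4.6
cpu_seconds 6m37.158s 2m21.123s     # patched git libtool
```

The three results are in seconds; dividing by 60 gives the minute figures
used in the comparison.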
Pavel
0001-libtool-mitigate-the-sed_quote_subst-slowdown.patch
Description: Text Data
0002-libtool-optimizing-options-parser-hooks.patch
Description: Text Data