bug#20006: Bash-specific performance by avoiding sed

bug-libtool

From:	Pavel Raiskup
Subject:	bug#20006: Bash-specific performance by avoiding sed
Date:	Mon, 05 Oct 2015 00:45:50 +0200
User-agent:	KMail/4.14.10 (Linux/4.2.1-300.fc23.x86_64+debug; KDE/4.14.11; x86_64; ; )

forcemerge 20006 20005
thanks

On Monday 09 of March 2015 18:04:34 Mike Frysinger wrote:
> On 09 Mar 2015 14:48, Eric Blake wrote:
> > On 03/09/2015 01:50 PM, Bob Friesenhahn wrote:
> > > On Mon, 9 Mar 2015, Mike Gran wrote:
> > >> I don't know if y'all saw this blogpost where a guy pushed
> > >> the sed regular expression handling into bash-specific
> > >> regular expressions when bash was available.  He claims
> > >> there's a significant performance improvement because of
> > >> reduced forking.
> > >>
> > >> http://harald.hoyer.xyz/2015/03/05/libtool-getting-rid-of-180000-sed-forks/
> > > 
> > > There is an issue in the libtool bug tracker regarding this.
> > > 
> > > This solution only works with GNU bash.  It would be good if volunteers
> > > could research to see if there are similar solutions which can work with
> > > other common shells (e.g. dash, ksh, zsh).
> > 
> > For context, we're trying to speed up:
> > 
> > sed_quote_subst='s|\([`"$\\]\)|\\\1|g'
> > _G_unquoted_arg=`printf '%s\n' "$1" |$SED "$sed_quote_subst"`
> > 
> > How about this, which should be completely portable to XSI shells (alas,
> > it still uses ${a#b} and ${a%b} at the end, so it is not portable to
> > ancient Solaris /bin/sh):
> > 
> > # func_quote STRING
> > # Escapes all \`"$ in STRING with another \, and stores that in $quoted
> > func_quote () {
> >   case $1 in
> >     *[\\\`\"\$]*)
> >       save_IFS=$IFS pre=.$1.
> >       for char in '\' '`' '"' '$'; do
> >         post= IFS=$char
> >         for part in $pre; do
> >           post=${post:+$post\\$char}$part
> >         done
> >         pre=$post
> >       done
> 
> should we test the size of the string first ?  i've written such raw shell 
> string parsing functions before, and once you hit a certain size (like 1k+ 
> iirc), forking out to sed is way faster, especially when running in multibyte 
> locales (like UTF8) which most people are doing nowadays.
> -mike

Well, that optimization would require a (fast) strlen()-like construct.
Anyway, the vast majority of calls to func_quote () will have a short
ARG, and its complexity is still "just" linear.  We could optimize
later if that turns out to be a real issue.
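
For illustration only, a size check of the kind discussed above could be sketched as follows. The names choose_quote_method, quote_method and quote_size_limit are hypothetical, and the 1024 limit is only the rough "1k+" figure quoted above, not a measured value. The sketch assumes a POSIX shell: ${#1} counts characters rather than bytes in multibyte locales, and pre-POSIX Bourne shells do not support it at all.

```shell
# Hypothetical threshold, taken from the rough "1k+" figure quoted above.
quote_size_limit=1024

# choose_quote_method STRING
# Sets $quote_method to "sed" for long strings and to "shell" otherwise.
# Assumes a POSIX shell that supports the ${#parameter} length expansion.
choose_quote_method ()
{
  if test "${#1}" -gt "$quote_size_limit"; then
    quote_method=sed
  else
    quote_method=shell
  fi
}
```

A caller would then dispatch on $quote_method to either the sed pipeline or the pure-shell loop.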

I would like to propose a solution based on Eric's, without using the
'${VAR%.}' and '${VAR#.}' constructs; this should be even more
portable while keeping almost the same speed (if we can use +=, it's
even faster).

I have yet another patch that tries to minimize option-parser overhead
(it is focused on Richard's point of view, but it needs to be cleaned up
a bit; I'll hopefully post it tomorrow).

Any comment is welcome!
Pavel

0001-libtool-mitigate-the-sed_quote_subst-slowdown.patch
Description: Text Data
