libtool
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Bash-specific performance by avoiding sed


From: Pavel Raiskup
Subject: Re: Bash-specific performance by avoiding sed
Date: Fri, 09 Oct 2015 17:16:03 +0200
User-agent: KMail/4.14.10 (Linux/4.2.2-300.fc23.x86_64+debug; KDE/4.14.11; x86_64; ; )

On Wednesday 07 of October 2015 08:03:01 Eric Blake wrote:
> On 10/07/2015 06:28 AM, Pavel Raiskup wrote:
> 
> > .. so it is (2.4.6 vs. 2.4.7~dev, user+sys) 7m23.5s vs 8m58.3s.  It's not
> > completely back yet but it's much better than v2.4.6.
> 
> Thanks again for working on this.

Thanks for perfect review, Eric.

> > +        case $func_quote_result in
> > +          *'*'*|*'['*)
> 
> Problem: shouldn't this also include *'?'* to avoid globbing a single
> character?

Oh, sorry for this - and thanks, I'll fix that!

> Could probably be written *[\[\*\?]*)
> 
> The string itself contains globs, so...

Ok.

> > +            ;;
> > +        esac
> > +
> > +        func_quote_old_IFS=$IFS
> > +        for _G_char in '\' '`' '"' '$'
> > +        do
> > +          # STATE($1) PREV($2) SEPARATOR($3)
> > +          set start "" ""
> > +          func_quote_result=dummy"$_G_char$func_quote_result$_G_char"dummy
> > +          IFS=$_G_char
> > +          for _G_part in $func_quote_result
> > +          do
> > +            case $1 in
> > +            quote)
> > +              func_append func_quote_result "$3$2"
> > +              set quote "$_G_part" "\\$_G_char"
> > +              ;;
> > +            start)
> > +              set first "" ""
> > +              func_quote_result=
> > +              ;;
> > +            first)
> > +              set quote "$_G_part" ""
> > +              ;;
> > +            esac
> > +          done
> > +          IFS=$func_quote_old_IFS
> > +        done
> > +        ;;
> 
> The string does not contain globs, so do four linear passes that escape
> each problem character.

To be honest, in shells without '+=' operator, these are four O(N^2)
passes..  But still very acceptable parabola IMO.

Firstly I tried to do this without this ugly state machine -- using $@
array to store strings.  Very roughly:

  for _G_char in '\' '`' ...
  do
    IFS=$_G_char
    set dummy
    for _G_part $func_quote_result
    do
      set "$@" "\\$_G_char$_G_part"
      ...
    done
    shift
    # _now join $@ into one string_
  done

... but that ended in quadratic complexity _everywhere_.  I was curious
why and it was because of the 'set "$@"' command being longer and longer
for strings like '"""""""""""""""""""...'.

> converts 'a"b' into 'a\"b', then caller adds outer quotes to become
> '"a\"b"'.
> 
> > +      *) ;;
> > +    esac
> > +}
> 
> Looks correct.  However, is it also worth attempting a shell-specific
> speedup?  After all, func_append uses shell-specific speedups (it is NOT
> as fast on shells that don't support +=).

Yes.  We could do that (at least in future if not in this thread).

> That is, the above function appears to be portable to all shells, but if
> we detect that a shell supports printf -v %q, can we use that instead
> for some additional speed?  Of course, printf -v %q in bash prefers
> output that does NOT embed inside "" (it quotes ALL shell metacharacters
> using backslash), so we'd have to rework the contract of the function.

Well, I was thinking about 'printf %q' firstly, but I rejected that idea
because I missed the point that bash's printf has '-v' option (thus we can
avoid forking!).  Thanks for this point.

> That is, instead of the current function which returns a ready-escaped
> string but no outer "", we would instead be returning a string that
> already includes all necessary quoting.  Which would mean rewriting all
> callers :(  For example, we'd have to figure out what to do for
> func_quote_for_eval, which returns two values -
> func_quote_for_eval_result being fully quoted is easy with printf -v %q,
> and func_quote_for_eval_unquoted_result which is not.

Yes, rewriting all callers would raise a chance of my mistake.  So I'm not
really in favour of this dangerous action :) ..  it would be just Bash
speedup anyway (even zsh does not have 'printf -v').

Also, we would have to be careful where this optimized function would be
used -- e.g. 'sed_quote_subst' has been historically used to generate
'*.la' files.  We should keep that format as is -- so we would have to
have two versions of func_quote (one possibly with printf %q, one
producing the old output).

> Here's a (lightly-tested) idea of what it would look like, where we'd
> have to audit every caller to deal with the result already including
> full quoting:
> 
> if test yes = `(x=; printf -v x %q yes; echo $x) 2>/dev/null`; then
>   func_quote()
>   {
>     printf -v func_quote_result %q "$1"
>   }
> else
>   func_quote()
>   {
>     portable version, except add:
>     func_quote_result="\"$func_quote_result\""
>   }
> fi
>
> Note that with this variant, the portable version converts 'a  *"b' into
> '"a  *\"b"', while the bash version converts it into 'a\ \ \*\"b'.

Thanks!  I've done a simple testing too .. and it seems to be equivalent
to '$SED $sed_quote_subst' (in usecases like: string -> quote -> eval).

Well -- I'm not 100% sure I want to hack this in and risk some issues.  On
the other hand, I'm able to review the patches if somebody wanted to give
it a try (the small bash slowdown from v2.4.2 to v2.4.6 is not good
motivation for me).

Much more interesting (than fighting this quotation issue) would be an
attempt to standardize some shell builtin which would have the same
effects as 'func_quote_for_eval' (because returning "$@" array unchanged
up to caller function is not an easy task).  If that was available, we
could easily (conditionally) reuse that...  (some shell hackers could
implement binary simulating this builtin..., etc.)...

... while I'm on it -- we could avoid a lot of calls to func_quote and
func_quote_for_eval if we were able to _return arrays_ (or set global
array variables).  Because some shells can do this, maybe we could detect
this feature and return $<HOOK_NAME>_result_array instead of
quoted-for-eval $<HOOK_NAME>_result?

> > @@ -8596,8 +8597,8 @@ EOF
> >         relink_command="$var=$func_quote_for_eval_result; export $var; 
> > $relink_command"
> >       fi
> >     done
> > -   relink_command="(cd `pwd`; $relink_command)"
> > -   relink_command=`$ECHO "$relink_command" | $SED "$sed_quote_subst"`
> > +   func_quote "(cd `pwd`; $relink_command)"
> > +   relink_command=$func_quote_result
> 
> Unrelated to your patch, but doesn't this fail if pwd is called in an
> absolute directory containing spaces?

Yes, good catch.  This should be fixed.

Pavel




reply via email to

[Prev in Thread] Current Thread [Next in Thread]