libtool
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Bash-specific performance by avoiding sed


From: Eric Blake
Subject: Re: Bash-specific performance by avoiding sed
Date: Wed, 7 Oct 2015 08:03:01 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 10/07/2015 06:28 AM, Pavel Raiskup wrote:

> .. so it is (2.4.6 vs. 2.4.7~dev, user+sys) 7m23.5s vs 8m58.3s.  It's not
> completely back yet but it's much better than v2.4.6.

Thanks again for working on this.

>  
> +# func_quote ARG
> +# --------------
> +# Aesthetically quote one ARG, store the result into $func_quote_result.  
> Note
> +# that we keep attention to performance here (so far O(N) complexity as long 
> as
> +# func_append is O(1)).
> +func_quote ()
> +{
> +    $debug_cmd
> +
> +    func_quote_result=$1

Common case - nothing needs escaping.  Converts 'abc' to 'abc', as well
as 'a  b' to 'a  b'.  The caller can still blindly add outer "" for a
printed form '"abc"' (unnecessary but harmless ""), or for '"a  b"'
(necessary in that case).


> +
> +    case $func_quote_result in
> +      *[\\\`\"\$]*)

Something needs escaping before being placed in double quotes.

> +        case $func_quote_result in
> +          *'*'*|*'['*)

Problem: shouldn't this also include *'?'* to avoid globbing a single
character?

Could probably be written *[\[\*\?]*)

The string itself contains globs, so...

> +            func_quote_result=`$ECHO "$func_quote_result" | $SED 
> "$sed_quote_subst"`
> +            return 0

...rather than worry about set +f, we just use sed in this rare case.
Converts 'a*b' into 'a*b', converts 'a*"b' into 'a*\"b'.  The caller
then supplies outer "", for '"a*b"' or '"a*\"b"' (outer quotes necessary
in both cases).

> +            ;;
> +        esac
> +
> +        func_quote_old_IFS=$IFS
> +        for _G_char in '\' '`' '"' '$'
> +        do
> +          # STATE($1) PREV($2) SEPARATOR($3)
> +          set start "" ""
> +          func_quote_result=dummy"$_G_char$func_quote_result$_G_char"dummy
> +          IFS=$_G_char
> +          for _G_part in $func_quote_result
> +          do
> +            case $1 in
> +            quote)
> +              func_append func_quote_result "$3$2"
> +              set quote "$_G_part" "\\$_G_char"
> +              ;;
> +            start)
> +              set first "" ""
> +              func_quote_result=
> +              ;;
> +            first)
> +              set quote "$_G_part" ""
> +              ;;
> +            esac
> +          done
> +          IFS=$func_quote_old_IFS
> +        done
> +        ;;

The string does not contain globs, so do four linear passes that escape
each problem character.

converts 'a"b' into 'a\"b', then caller adds outer quotes to become
'"a\"b"'.

> +      *) ;;
> +    esac
> +}

Looks correct.  However, is it also worth attempting a shell-specific
speedup?  After all, func_append uses shell-specific speedups (it is NOT
as fast on shells that don't support +=).

That is, the above function appears to be portable to all shells, but if
we detect that a shell supports printf -v %q, can we use that instead
for some additional speed?  Of course, printf -v %q in bash prefers
output that does NOT embed inside "" (it quotes ALL shell metacharacters
using backslash), so we'd have to rework the contract of the function.

That is, instead of the current function which returns a ready-escaped
string but no outer "", we would instead be returning a string that
already includes all necessary quoting.  Which would mean rewriting all
callers :(  For example, we'd have to figure out what to do for
func_quote_for_eval, which returns two values -
func_quote_for_eval_result being fully quoted is easy with printf -v %q,
and func_quote_for_eval_unquoted_result which is not.

Here's a (lightly-tested) idea of what it would look like, where we'd
have to audit every caller to deal with the result already including
full quoting:

if test yes = `(x=; printf -v x %q yes; echo $x) 2>/dev/null`; then
  func_quote()
  {
    printf -v func_quote_result %q "$1"
  }
else
  func_quote()
  {
    portable version, except add:
    func_quote_result="\"$func_quote_result\""
  }
fi

Note that with this variant, the portable version converts 'a  *"b' into
'"a  *\"b"', while the bash version converts it into 'a\ \ \*\"b'.

> @@ -8596,8 +8597,8 @@ EOF
>           relink_command="$var=$func_quote_for_eval_result; export $var; 
> $relink_command"
>         fi
>       done
> -     relink_command="(cd `pwd`; $relink_command)"
> -     relink_command=`$ECHO "$relink_command" | $SED "$sed_quote_subst"`
> +     func_quote "(cd `pwd`; $relink_command)"
> +     relink_command=$func_quote_result

Unrelated to your patch, but doesn't this fail if pwd is called in an
absolute directory containing spaces?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]