Re: devel: Questions about quoting in the new replacement ${var/pat/&}

From: Chet Ramey
Subject: Re: devel: Questions about quoting in the new replacement ${var/pat/&}
Date: Mon, 11 Oct 2021 12:08:04 -0400
On 10/5/21 4:41 AM, Koichi Murase wrote:
> I have questions on the new feature ${var/pat/&} in the devel branch.
>> commit f188aa6a013e89d421e39354086eed513652b492 (upstream/devel)
>> Author: Chet Ramey <chet.ramey@case.edu>
>> Date:   Mon Oct 4 15:30:21 2021 -0400
>>     enable support for using `&' in the pattern substitution replacement string
>> Any unquoted instances of & in STRING are replaced with the matching
>> portion of PATTERN.  Backslash is used to quote & in STRING; the
>> backslash is removed in order to permit a literal & in the
>> replacement string.  Users should take care if STRING is
>> double-quoted to avoid unwanted interactions between the backslash
>> and double-quoting.  Pattern substitution performs the check for &
>> after expanding STRING; shell programmers should quote backslashes
>> intended to escape the & and inhibit replacement so they survive any
>> quote removal performed by the expansion of STRING.
> I would very much like this change introduced in the latest commit
> f188aa6a in devel as it would enable many more string manipulations
> with a simple construct, but I feel the current treatment of quoting
> has problems:
> 1. There is no way to specify an arbitrary string in replacement in a
>   way that is compatible with both bash 5.1 and 5.2.

It's a change that assigns meaning to a character that was previously
valid, not an error. It's probably going to require a shell option.
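
As a hedged illustration of what using such an option could look like:
released bash 5.2 documents a shopt option named patsub_replacement. That
name comes from the 5.2 documentation, not from this message, so treat it
as an assumption here. A minimal sketch that asks for the old literal
treatment of `&' and stays harmless on shells that lack the option:

```bash
#!/usr/bin/env bash
# Assumption: the option is named patsub_replacement (per bash 5.2 docs).
# On shells without it, shopt fails quietly and & is literal anyway.
shopt -u patsub_replacement 2>/dev/null || true

str='X&Y&Z' pat='Y' rep='A&B'
result=${str/$pat/$rep}      # replacement taken literally
printf '%s\n' "$result"
```

With the option off, or absent as in bash 5.1, result should hold X&A&B&Z.
A devel build that has the feature but not yet the option would behave
differently.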

> 2. There is no way to insert a backslash before the matched part
>   (which I'd think would be one of the typical usages of &).

This is quite reasonable, and a minor change. If the replacement function
treats backslash specially by allowing it to quote `&', it should also
allow it to escape a backslash.

> I below describe the details of each, followed by my suggestion or
> discussion on an alternative design.
> ----------------------------------------------------------------------
> 1. How to specify an arbitrary string in replacement compatibly with
> both bash 5.1 and 5.2?
> Currently any & in the replacement is replaced by the matched part
> regardless of whether & is quoted in the parameter-expansion context
> or not.  Even the result of the parameter expansions and other
> substitutions are subject to the special treatment of &, which makes
> it non-trivial to specify an arbitrary string to the replacement
> ${var/pat/rep}.

The documentation goes into this in some detail, including specifying the
expansions that REP undergoes.

>   $ str='X&Y&Z' pat='Y' rep='A&B'
>   $ echo ${str/$pat/XXXX}
>   X&A&B&Z
> where XXXX is some string that represents the literal "$rep" (i.e.,
> 'A&B').  A naive quoting of "$rep" does not work:
>   $ echo "1:${str/$pat/"$rep"}"
>   1:X&AYB&Z

Wouldn't it be better to treat it in the standard way a double-quoted
parameter expansion would be treated? The double-quoted expansion is
already well-specified. People know how to get a backslash through
double quoting, even in a context, like this one, where quote removal
is performed.

> I would have expected it to work because $pat will lose special
> meaning and be treated literally when it is quoted as "$pat".  For
> example, the glob patterns *?[ etc. and anchors # and % in $pat will
> lose their special meaning when it is quoted:
>   $ v='A' p='?'; echo "${v/$p/B}"; echo "${v/"$p"/B}"
>   B
>   A
>   $ v='A' p='#'; echo "${v/$p/B}"; echo "${v/"$p"/B}"
>   BA
>   A
>   $ v='A' p='%'; echo "${v/$p/B}"; echo "${v/"$p"/B}"
>   AB
>   A
> Of course, if $rep is not quoted, & in $rep is replaced by the matched
> part.
>   $ echo "2:${str/$pat/$rep}"
>   2:X&AYB&Z
> * To properly specify an arbitrary string in the replacement, one
>   needs to replace all the characters.
>   $ echo "${str/$pat/${rep//&/\\\\&}}"
> * When the replacement is not stored in a variable, one needs to
>   create a variable for the replacement, i.e.,
>   $ echo "${str/$pat/$(something)}"
>   in Bash 5.1 needs to be converted to
>   $ tmp=$(something)
>   $ echo "${str/$pat/${tmp//&/\\\\&}}"
>   in Bash 5.2.
> * Also, there is no way of writing it so that it works in both Bash
>   5.1 and 5.2.  To make it work, one needs to switch the code
>   depending on the bash version as:
>   if ((BASH_VERSINFO[0]*10000+BASH_VERSINFO[1]*100>=50200)); then
>     echo "${str/$pat/${rep//&/\\\\&}}"
>   else
>     echo "${str/$pat/$rep}"
>   fi
>   [ Note: this does not work for the devel branch because the devel
>   branch still has the version 5.1. ]
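
One possible way around the version check, sketched here as an assumption
and not as anything proposed in this thread, is to probe the running shell
for the behavior itself. The names amp_is_special and literal_replace are
hypothetical. The backslash-escaping step assumes `\\' is honored as an
escaped backslash, which is the change agreed to in this message.

```bash
#!/usr/bin/env bash
# Probe: if & is special, replacing "x" by & gives back "x";
# otherwise the result is a literal "&".
_probe=x
_probe_out=${_probe/x/&}
if [[ $_probe_out == x ]]; then
  amp_is_special=1
else
  amp_is_special=0
fi

# Hypothetical helper: literal_replace STR PAT REP
# Replaces the first match of PAT in STR with REP taken literally.
literal_replace() {
  local str=$1 pat=$2 rep=$3
  if ((amp_is_special)); then
    rep=${rep//\\/\\\\}   # escape backslashes first (assumes \\ is honored)
    rep=${rep//&/\\&}     # then escape every &
  fi
  printf '%s\n' "${str/$pat/$rep}"
}

out=$(literal_replace 'X&Y&Z' 'Y' 'A&B')
printf '%s\n' "$out"
```

Because the probe tests behavior and not BASH_VERSINFO, it does not depend
on the devel branch reporting a new version number.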
> ----------------------------------------------------------------------
> 2. How to insert a literal backslash before the matched part?
> Another problem is that one cannot put a literal backslash just before
> & without affecting the meaning of &.  Currently if there is any
> backslash before &, & will lose the special meaning and the two
> characters '\&' become '&' after the replacement.

I agree that just as \& allows a literal `&', \\ should be a literal `\'.

> ----------------------------------------------------------------------
> Suggestion / Discussion
> I suggest that '&' has the meaning of the matched part only when it is
> not quoted in the parameter-expansion context ${...} [ Note that
> currently, '&' has the meaning of the matched part when it is not
> quoted by backslash in *the expanded result* ].  I expect the
> following interpretations with this suggestion:

The quoting outside the ${...} doesn't affect whether REP is quoted. This
is consistent with how POSIX specifies the pattern removal expansions, and
how bash has worked since bash-4.3.

So both of these, for instance, will expand to `&' *because of how bash
already works*, regardless of whether or not we attach meaning to `&' in
the replacement string.

> $ echo "${var/$pat/&}"    # & represents the matched part
> $ echo "${var/$pat/\&}"   # & is treated as a literal ampersand
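
The rule that quotes around the whole ${...} do not quote the word inside
it can be checked with plain pattern removal, independent of the new `&'
feature. A small sketch (the variable names are only illustrative):

```bash
#!/usr/bin/env bash
v='abc' p='?'
outer_only="${v%$p}"       # $p is still a pattern: ? matches the "c"
inner_quoted="${v%"$p"}"   # "$p" is literal: no trailing "?" to remove
printf '%s\n' "$outer_only" "$inner_quoted"
```

Only the quotes inside the braces change how the word is treated: the
first expansion gives ab, the second leaves abc untouched.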

This next one will expand to `\&' again due to existing behavior,
regardless of what we do with it, due to how quote removal works.
And so on.

> $ echo "${var/$pat/\\&}"  # A literal backslash plus the matched part

> $ echo "${var/$pat/'\'&}" # A literal backslash plus the matched part
> $ rep='A&B'
> $ echo "${var/$pat/$rep}"   # 'A' plus the matched part plus 'B'
> $ echo "${var/$pat/"$rep"}" # Literal 'A&B'

Rather than dance around behind the scenes trying to invisibly quote &,
but only in certain contexts where it would not otherwise be escaped by
double quoting, I would be more in favor of adding an option to enable the
feature and allowing the normal rules of double quoted strings to apply.

> Here is the rationale:
> * It is consistent with the treatment of the glob special characters
>   and anchors # and % in $pat of ${var/$pat}.

Yeah, doing that was probably a mistake, but we have to live with it now.
Those are really part of the pattern operator itself, not properties of
the pattern. But nevertheless.

> * One can intuitively quote & to make it a literal ampersand.  The
>   distinction of the special & in ${var/$pat/&} and the literal
>   ampersand in ${var/$pat/\&} is more intuitive than ${var/$pat/&} vs
>   ${var/$pat/\\&}.

Not if you take into account the word expansions the replacement string
undergoes. For example, if you use ${var/$pat/\&} in bash-5.1, you're going
to get a `&' in the output, not `\&'. Now you invite the questions of why
bash expands things differently whether or not there is a `&' in the
replacement string, and since the non-special bash-5.1 expanded that to
`&', why should bash-5.2 not treat it as a replacement?

I guess the question is why not let the normal shell word expansion rules
apply, and work with the result.

> ----------------------------------------------------------------------
> Bash version of devel branch?
> By the way, when would the BASH_VERSINFO be updated?  The devel
> version still has the Bash version 5.1.  I would like to reference the
> version information to switch the implementation.  In particular,
> since some incompatible changes are introduced in the devel branch
> (which are supposed to be released as Bash 5.2), I need to switch the
> implementation.

That's what I do when I need to.

``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

