bug-bash
From: Stahlman Family
Subject: Re: Unexpected behavior observed when using address@hidden/pattern/string} construct
Date: Sat, 6 Jan 2007 08:26:49 -0600


----- Original Message ----- From: "Stahlman Family" <address@hidden>
To: <address@hidden>
Sent: Sunday, December 31, 2006 4:18 PM
Subject: Unexpected behavior observed when using address@hidden/pattern/string} 
construct


While writing a script that uses the address@hidden/pattern/string} construct,
I encountered behaviors I cannot reconcile with the text in the bash manual.
I've pasted snippets from a command line session below (with comments
interspersed) that show the apparent issues.

Perhaps some of the confusion would be cleared up by an explicit description
of the handling of quoted strings within the <string> portion of the parameter
expansion construct; specifically, will a "nested" string be parsed as though
it were at the top level? Case 2 appears to indicate yes, while case 3
indicates no...

Case 1 is included because it shows an (apparent) issue with respect to word
splitting; specifically, that it is performed on the results of the expansion,
even though IFS is set to null.

Note that the goal in the examples below is to prepend "-iname '" (portion
within double quotes only) to each of the 2 elements in the original array,
without changing the number of words. i.e., the new array should contain the
following 2 words:
-iname 'abc
-iname 'def
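
For reference, the stated goal can be reached without relying on any of the quoting behavior discussed in this message. The following is only a sketch of one workaround, not the construct under discussion: it builds the new array element by element, so no word splitting or nested quoting is involved. The array names match the session below; the loop variable el is illustrative.

```bash
# Workaround sketch: prepend "-iname '" to each element with an explicit loop.
a=(abc def)

a2=()
for el in "${a[@]}"; do
    # Inside double quotes the single quote is an ordinary character.
    a2+=("-iname '${el}")
done

echo "${#a2[@]}"            # number of elements in the new array
printf '%s\n' "${a2[@]}"    # one element per line
```

Because each element is appended as a single quoted word, the element count of a2 always equals that of a, whatever IFS is set to.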

$ a=(abc def)

# Prevent word splitting
$ IFS=

# ***
# *** Case 1 *** (Doesn't work because of undesired word splitting)
# ***
$ a2=(${a[@]/#/"-iname '"})

$ echo "${#a2[@]}"
4

$ for el in "${a2[@]}"; do echo "$el"; done
-iname
'abc
-iname
'def

# Conclusion: The resulting list (after parameter pattern substitution is
# applied to each element in the original array) undergoes word splitting,
# even though IFS is set to null! Why?

After looking through the code, I believe I now know why. It doesn't appear to
be by design. Note that the word splitting problem affects only array
expansions that are not within double quotes, and occurs only when IFS is
null. Here's what appears to be happening...

When expand_word_internal encounters an unquoted ${ } construct, it calls
param_expand with the 'quoted' argument set to FALSE to perform the expansion.
When it encounters a left brace, param_expand invokes parameter_brace_expand,
once again passing quoted == FALSE. If the operation is pattern substitution,
parameter_brace_patsub is invoked with quoted == FALSE. If the substitution is
being performed on an array, array_patsub is called to perform it.
Array_patsub copies the array, performs the substitution on the copy, and
then, (*if and only if* the quoted argument is TRUE) quotes each element
within the newly-substituted array (via array_quote); i.e., the element
quoting is performed if and only if the parameter expansion occurred within
double quotes. So far, this makes sense. The problem is in how array_patsub
converts the array to a string for return to parameter_brace_patsub.
Specifically, since array_to_string inserts literal spaces to denote word
breaks, if array_quote is not called first, there is no way for the caller to
distinguish between spaces that were embedded in a word and those that were
added merely to separate words. Of course, when the expansion itself is within
double quotes, there is no problem, since literal embedded spaces have already
been quoted by array_quote. Similarly, for expansions that are not within
double quotes, but for which IFS contains a space, there is no problem.
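
The loss of element boundaries described above can be imitated at the shell level. This is an analogy only, not a trace of the bash internals: joining with "${a[*]}" stands in for array_to_string, and read -a stands in for the later split. It assumes the default IFS, and the names joined and parts are illustrative.

```bash
# Analogy only: a shell-level join and re-split, not the C code path.
IFS=$' \t\n'                 # assume the default IFS for this sketch

a=("x y" "z")                # 2 elements; the first contains a space

joined="${a[*]}"             # join the elements with a space
read -r -a parts <<< "$joined"   # split the flat string again on IFS

echo "original:         ${#a[@]} elements"
echo "after round trip: ${#parts[@]} elements"
```

Once the elements are flattened into one string, the space inside "x y" and the space added as a separator look the same, so the round trip yields more elements than it started with.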

However, for the case of IFS='' and a non-double-quoted array expansion, when
expand_word_internal finally calls list_string to perform word splitting, it
will pass <space> as ifs since has_dollar_at is set and ifs is null.  Since
list_string has no way of distinguishing the separator spaces from the ones
embedded within the expanded fields, it will split on both. In fact, since
expand_word_internal splits the entire word with the same call to list_string,
and the word may consist of more than just the array expansion, the following
word

address@hidden

will be split at spaces within $var_containing_spaces, even though it is not
an array, and even though ifs is null!

It should be noted that string_list_dollar_at is never called in the case
described above. For constructs for which string_list_dollar_at is called
(e.g., address@hidden), there is the opposite problem. Specifically,
string_list_dollar_at intentionally quotes array elements when ifs is null,
even when the expansion does not occur within double quotes. The problem with
this is that this quoting, while it prevents incorrect word splitting, also
has the effect of incorrectly inhibiting pathname expansion! Consider the
following example...

$ ls *.*
junk.txt  junk2.txt

$ pat=('*.*')

# The following doesn't work since quote_list has quoted the filename
# pattern...
$ ls ${pat[@]}
ls: *.*: No such file or directory

# However, if we return IFS to default...
$ unset IFS

# Now it works...
$ ls ${pat[@]}
junk.txt  junk2.txt
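
The link between quoting and pathname expansion can be seen in isolation with the default IFS. This sketch does not reproduce the null-IFS case from the session; it only contrasts an unquoted and a double-quoted expansion of the same array. The temporary directory and the names unquoted and quoted are illustrative.

```bash
# Sketch with the default IFS; directory and file names are illustrative.
IFS=$' \t\n'
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch junk.txt junk2.txt

pat=('*.*')
unquoted=(${pat[@]})      # unquoted: the pattern undergoes pathname expansion
quoted=("${pat[@]}")      # quoted: the pattern is kept as a literal word

echo "unquoted: ${#unquoted[@]} words"
echo "quoted:   ${#quoted[@]} word (${quoted[0]})"

cd / && rm -rf "$tmp"     # clean up the temporary directory
```

A quoted element is protected from word splitting, but it is protected from globbing as well, which is the trade-off the session above runs into.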

It seems to me that there needs to be some mechanism other than literal spaces
to separate array elements prior to word splitting. Perhaps some sort of
CTLESC CTL<XXX> sequence? Am I misunderstanding what's happening here? Note
that this apparent issue is not limited to the address@hidden/pattern/string}
form. It also affects address@hidden, address@hidden, and
perhaps other forms as well. (I haven't checked all of them.) In the case of #
and %, the specific mechanism is different (i.e., list_remove_pattern instead
of array_patsub) but the basic problem is the same...

[snip]
.
.



# ***
# *** Case 2 *** (Doesn't work because of unterminated single quote)
# ***
$ a2=("${a[@]/#/-iname '}")
'


Conclusion: This attempt needed to be Ctrl-C'd because bash first considered
that the single quote was unterminated, then apparently, considered the curly
brace to be unterminated; i.e., the single quote is not considered to be
"within" the double quotes that surround the entire parameter. In and of
itself, this would not seem strange to me, but it appears to be inconsistent
with the following case, in which the double quotes in <string> are aware that
they are themselves within double quotes.

# ***
# *** Case 3 *** (Doesn't work because double quotes within <string> are kept)
# ***
$ a2=("${a[@]/#/"-iname '"}")

$ echo "${#a2[@]}"
2

$ for el in "${a2[@]}"; do
echo "$el"
done
"-iname '"abc
"-iname '"def

Conclusion: Unlike case 2, the quote characters within <string> (double quotes
in this case, single quotes in case 2) are *not* considered by bash to be
string delimiters, but are simply included literally in the expanded string.
Again, there is nothing inherently wrong with this, but if nested strings are
not possible, why was the single quote in case 2 considered to be
unterminated?

Thanks,
Brett S.


Since my original post, I have read the section in the posix specification on
"double-quotes". (I should have done that first, I know...) Also, I have
looked at the bash source. I believe I understand the mechanism that is
responsible for the differing treatment of single and double quotes within the
parameter replacement string (rhs), but there are still some things that need
clarification...

The posix section on "double-quotes" requires that both single and double
quotes be balanced within the rhs of the parameter construct. This at least
implies that something akin to single and double quoted strings are permitted
within the parameter replacement. However, both the examples I have tried and
inspection of param_brace_expand_rhs indicate that if the entire parameter is
within double quotes, nested quotes don't really begin a nested string. The
rhs is parsed as a double quoted string in which unquoted double quotes are
simply discarded, and single quotes are retained literally. The only reason I
can see for using double quotes within the replacement is to allow an
unbalanced single quote to appear in the rhs. Single quotes themselves are not
treated specially at all. This raises the question: If single quotes are not
special within the rhs of a double quoted parameter construct, why are they
required to be balanced? Perhaps it is to allow the same rule to be applied to
the case of double-quoted and non-double-quoted parameter constructs?

Perhaps I just missed it, but I don't see anything in the posix section on
"double-quotes" that indicates that balanced and unquoted double quotes in the
parameter replacement should be discarded, while single quotes should be
retained. Actually, I don't really see that Posix specifies how these "nested
pseudo-strings" should be interpreted; only that the quote characters should be
balanced. Actually though, now that I'm thinking through it, I can see a
rationale for the implementation as it stands. Perhaps you can confirm or deny
that the following rationale is correct?

Within a parameter construct enclosed in double quotes, the replacement is
effectively double quoted; however, since Posix requires that all quote chars
within the replacement be balanced, and since backslash cannot be used to
escape `'' within double quotes, ordinary double quoted string processing would
give us no way to specify a single, unbalanced single-quote char within the
replacement. The implementation as it stands allows us to include a single
quote by wrapping it in balanced double quotes, which are discarded by
param_brace_expand_rhs. The previously posed question remains, however: "Why
does a single quote need to be balanced within a double-quoted parameter
replacement construct?"
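
The difference between the two quote characters can be checked with the simpler ${parameter-word} form. This is a small sketch, assuming dbg is unset; the names in_dq, no_dq and nested are illustrative, and the comments state what the rules described in this message lead one to expect.

```bash
# Sketch: how quotes in the word part are treated, with dbg unset.
unset dbg

in_dq="${dbg-'hey'}"      # enclosed in double quotes: single quotes stay literal
no_dq=${dbg-'hey'}        # not enclosed: single quotes are removed as usual
nested="${dbg-"hey"}"     # nested double quotes are discarded

printf '%s\n' "$in_dq" "$no_dq" "$nested"
```

Only the first result keeps its quote characters, which matches the observation that single quotes are not special inside a double-quoted parameter construct while nested double quotes are dropped.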

Note that I'm not claiming there's anything wrong with the implementation as
it stands here, just trying to understand the rationale. A few of the examples
below show behavior that seems strange and a bit inconsistent to me, but
perhaps if I understood the rationale behind the nested string processing
mechanism, the behavior would make sense...

*** Examples ***
$ a1=(a b c)

$ a2=("${a1[@]/#/"-iname '"}")

# The following shows that the double quotes were not discarded in the above
# array assignment. By contrast, they are discarded in the non-array examples
# below...
$ echo "${a2[@]}"
"-iname '"a "-iname '"b "-iname '"c

$ echo "${dbg-"'hey'"}"
'hey'

$ echo "${dbg-"hey"}"
hey

# Note that single quotes have no effect on discarding of double quotes.
$ echo "${dbg-'"hey"'}"
'hey'

# Unbalanced double quote causes problem
$ echo "${dbg-"hey}"
"
"


# But no problem when the double quote is "quoted" by literally-inserted
# single quotes! I guess this seems weird to me for 2 reasons: 1) because the
# single quotes are obviously not special, yet are able to quote an unbalanced
# double quote; 2) because the unbalanced and "quoted?" double quote is
# discarded.
$ echo "${dbg-'"'hey}"
''hey

# Note that $ echo "${dbg-'"hey'}"
'hey'

Thanks,
Brett S.




