bug-bash

Re: About ARITH_CMD

From:	Eric Blake
Subject:	Re: About ARITH_CMD
Date:	Thu, 14 Feb 2019 16:57:09 -0600
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

On 2/14/19 4:03 PM, Peng Yu wrote:
> Hi,
> 
> yylex() still gives the token ARITH_CMD for the following command. The
> error seems to be raised at the parsing stage. Shouldn't the error be
> caught in the lexical analysis stage?

Changing it now may break scripts that depend on the existing behavior.

> 
> $ ((x = 10 + 5; ++x; echo $x))
> bash: ((: x = 10 + 5; ++x: syntax error: invalid arithmetic operator
> (error token is "; ++x")
> 
> Why is the arithmetic expression parsed during lexical analysis?
> Why not introduce the tokens `((` and `))` and handle the arithmetic
> expression in the bison parsing code?

Because there are other situations where bash has chosen to let '(('
represent the start of nested subshells; blindly always treating two
consecutive (( as a single token would make the treatment of nested
subshells harder to write (even if such code is not portable according
to POSIX).

> 
> Also, I don't find that POSIX specifies `((`. (Let me know if I miss
> anything.)

POSIX does not specify it, other than mentioning it in passing:
http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html

"The "((" and "))" symbols are control operators in the KornShell, used
for an alternative syntax of an arithmetic expression command. A
conforming application cannot use "((" as a single token (with the
exception of the "$((" form for shell arithmetic).

On some implementations, the symbol "((" is a control operator; its use
produces unspecified results. Applications that wish to have nested
subshells, such as:

((echo Hello);(echo World))

must separate the "((" characters into two tokens by including white
space between them. Some systems may treat these as invalid arithmetic
expressions instead of subshells."
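
As a small sketch of the portable spelling described in that passage (the
variable name `nested` is only illustrative), keeping white space after
each opening parenthesis that starts a subshell avoids the "((" token
entirely:

```sh
# White space after the first '(' guarantees two separate tokens, so the
# shell parses a subshell that contains two more subshells.
( (echo Hello); (echo World) )

# The same applies inside a command substitution: "$( (" rather than "$((".
nested=$( ( (echo Hello); (echo World) ) )
printf '%s\n' "$nested"
```

Without the white space, a shell that treats "((" as a control operator
may try to read the text as an arithmetic expression first.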

POSIX also states:

"Arithmetic expansions have precedence over command substitutions. That
is, if the shell can parse an expansion beginning with "$((" as an
arithmetic expansion then it will do so. It will only parse the
expansion as a command substitution (that starts with a subshell) if it
determines that it cannot parse the expansion as an arithmetic
expansion. If the syntax is valid for neither type of expansion, then it
is unspecified what kind of syntax error the shell reports.

How well the shell performs this determination is a quality of
implementation issue. Current shell implementations use heuristics. In
particular, the shell need not evaluate nested expansions when
determining whether it can parse an expansion beginning with "$((" as an
arithmetic expansion. For example:

$((a $op b))

is always an arithmetic expansion if "$op" expands to, say, '+', but if
"$op" expands to '(' then the shell might still parse the expansion as
an arithmetic expansion (resulting in a syntax error due to unbalanced
parentheses) or it might perform a command substitution.

This standard requires that conforming applications always separate the
"$(" and '(' with white space when a command substitution starts with a
subshell. This is because implementations may support extensions in
arithmetic expressions which could result in the shell parsing the input
as an arithmetic expansion even though a minimally conforming shell
would not. For example, many shells support arrays with the array index
(which can be an expression) in square brackets. Therefore, the presence
of "myfile[0-9]" within an expansion beginning "$((" is no guarantee
that it will be parsed as a command substitution.

The ambiguity is not restricted to the simple case of a single subshell.
More complicated ambiguous cases are possible (even with just the
standard shell syntax), such as:

$(( cat <<EOH
+ ( (
EOH
) && ( cat <<EOH
) ) + 1 +
EOH
))
"
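
To make the `$((a $op b))` case from the quote concrete, here is a minimal
sketch; the values of `a` and `b` and the names `sum` and `captured` are
assumptions chosen for illustration:

```sh
a=6
b=7
op='+'

# "$op" is expanded before the expression is evaluated, so the shell
# ends up evaluating "a + b".
sum=$((a $op b))
echo "$sum"    # prints 13

# A command substitution that starts with a subshell: keep "$(" and "("
# apart so that it can never be mistaken for an arithmetic expansion.
captured=$( (echo from-subshell) )
echo "$captured"
```

The second half follows the requirement quoted above that conforming
applications separate "$(" and '(' with white space.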

In short, (()) is an extension that bash shares with ksh rather than a
POSIX feature, so portable code should probably use POSIX $(()) instead.
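
For instance, the arithmetic from the failing command earlier in this
thread could be written with only POSIX $(()), roughly like this (a
sketch, not the only way to do it):

```sh
# One arithmetic expansion per step, instead of a (( )) command.
x=$((10 + 5))
x=$((x + 1))
echo "$x"    # prints 16

# Where (( )) would be used as a test, compare the expansion's value:
# a relational expression expands to 1 when true and 0 when false.
if [ "$((x > 10))" -eq 1 ]; then
    echo "x is greater than 10"
fi
```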

> If `((` is a bash-specific thing, why not allow it to
> handle multiple arithmetic expressions instead of just one? Thanks.

Historical practice, for one.  Also, (()) and $(()) currently share
code, and we can't change how $(()) operates, so it does not make sense
to change how (()) operates.  The above quotes from POSIX demonstrate
ambiguous situations where lexical analysis, rather than parsing alone,
is needed to decide between arithmetic and command substitution; since
we are already relegated to a lexical decision, complicating the parser
isn't going to buy us any benefit.
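
That said, a single arithmetic expression can already hold several
sub-expressions, because bash arithmetic accepts the C-style comma
operator.  A rough sketch follows; it calls bash explicitly through
`bash -c` so that it behaves the same when pasted into a non-bash sh, and
the name `result` is only illustrative:

```sh
# Comma-separated sub-expressions are evaluated left to right inside one
# (( )) command; ';' is not an arithmetic operator, hence the original error.
result=$(bash -c '((x = 10 + 5, ++x)); echo "$x"')
echo "$result"    # prints 16
```

Running shell commands such as echo inside (( )) is still not possible;
only arithmetic is evaluated there.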

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
