help-bash


From: Koichi Murase
Subject: Re: Is it really necessary to allow operators and whole right hand sides to be substituted in (())?
Date: Tue, 18 May 2021 11:33:38 +0900

On Tue, May 18, 2021 at 10:32, Peng Yu <pengyu.ut@gmail.com> wrote:
> On 5/17/21, Chet Ramey <chet.ramey@case.edu> wrote:
> > On 5/17/21 5:07 PM, Peng Yu wrote:
> >> $ plus=+; ((x = 1 $plus 2)); declare -p x
> >> declare -x x="3"
> >>
> >> I see that the above code works. I think that allowing operators to be
> >> substituted is counterintuitive.

I guess you are seriously thinking of implementing a parser, and
yes, this is exactly the problem one runs into when trying to
implement a *single-pass* parser for shell syntax that includes
arithmetic expressions. Many AltShell authors mention this. Unix
shells parse arithmetic expressions (and other constructs) in
multiple passes: the first pass just extracts the arithmetic
expression as the string "x = 1 $plus 2", that string is then
expanded to "x = 1 + 2", and a second pass finally parses and
evaluates it as arithmetic. Unlike in most other programming
languages, multi-pass parsing is quite common in shell syntax.
History expansion, alias expansion, brace expansion, tilde expansion,
and the other shell expansions (${}, $(), $(()), etc.) are all
implemented as multi-pass processing of strings.
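A minimal sketch of that first pass at work (bash assumed; the
variable names are made up for illustration). The text inside (( ))
and $(( )) is expanded first, and only the resulting string is parsed
as arithmetic, so a variable can supply an operator or even an
incomplete fragment:

```shell
#!/usr/bin/env bash
# The text inside (( )) is expanded as if in double quotes (no word
# splitting or globbing), then the resulting string is parsed.
plus=+
times='*'
((x = 2 $times 3 $plus 1))   # expanded to "x = 2 * 3 + 1" before parsing
echo "$x"                    # prints 7

frag='1 +'                   # not a valid expression on its own
echo "$(( $frag 2 ))"        # expanded to "1 + 2"; prints 3
```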

> >> So it would be better not to allow it when this syntax first appeared?
> >
> > Why do you continue to relitigate 30-year-old decisions? If you don't want
> > to put it in your rewrite, leave it out.
>
> I would like to know if this decision is based on any logic reasoning.
> But it seems to be more based on convenience of implementation as when
> it is implemented in this way, the implementation for math can be
> partly shared with other parts.

I suspect the latter, i.e., it was chosen for convenience rather than
derived from a logical principle. Perhaps this is related to the Unix
philosophy of "Do One Thing And Do It Well"? Each feature of the shell
is implemented independently as a plain string transformation, and the
transformations are simply combined afterwards.

> The reason that I want to understand the rationale is that things like
> these seem to make shell code inherently hard to optimize for speed,
> because there is no way to know what the expression expands to until
> the code actually runs. If speed were a concern for the math
> operations, then this feature seems better be stripped off or be
> disabled by a shopt.

Even if this feature were disabled, the shell would still need to
detect the "statically unparsable class" of arithmetic expressions in
order to report an error for it, which assumes that this class can be
detected at all. But if it can be detected, it is equally possible to
optimize only the "statically parsable class" while keeping the
current dynamic parsing for the "statically unparsable class". In that
case, a user who wants the optimization can simply avoid parameter
expansions in arithmetic expressions, e.g., write ((a = b + c))
instead of ((a = $b + $c)). This is already the recommended way of
writing arithmetic expressions, so I don't think we need to strip the
feature off; we can just optimize when possible.
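The two spellings differ not only for an optimizer; they can also give
different results. In this sketch (hypothetical values, bash assumed),
$b splices text into the expression before it is parsed, while a bare
b is looked up during evaluation and its value is evaluated as a
complete subexpression. Only the second form has a structure that is
fixed before the code runs:

```shell
#!/usr/bin/env bash
# "$b" is substituted as text before parsing; a bare "b" is evaluated
# on its own as a complete subexpression when it is referenced.
b='1 + 1'
c=3
((a1 = $b * c))   # parsed as "a1 = 1 + 1 * c", so 1 + 3 = 4
((a2 = b * c))    # b evaluates to 2 on its own, so 2 * 3 = 6
echo "$a1 $a2"    # prints "4 6"
```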

> >> a=4
> >> b=a
> >> c=b
> >> ((d = c * 2))
> >> echo "$d"    # output: 8
>
> How does the above code multi-level de-reference work below the
> surface? Without (()), there is no such multi-level de-reference. So I
> want to understand how it works.
>
> Could anybody help explain?

Arithmetic evaluation is invoked recursively when an expression
references variables or array elements. This is documented:

https://www.gnu.org/software/bash/manual/bash.html#Shell-Arithmetic
> [...] The value of a variable is evaluated as an arithmetic
> expression when it is referenced, or when a variable which has
> been given the integer attribute using ‘declare -i’ is
> assigned a value.  [...]

and it is implemented by `subexpr' (expr.c L451 in the devel branch),
which is called from `expr_streval' (expr.c L1224 in the devel branch).
So one can actually store whole expressions in a variable:

$ a[0]='x=1234'
$ a[1]='y=4321'
$ b=0
$ ((a[b++])); echo "$x,$y"
1234,
$ ((a[b++])); echo "$x,$y"
1234,4321
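Both the multi-level dereference from the question and the declare -i
case from the manual quote follow from the same recursive evaluation.
A short sketch (bash assumed; the names m and n are made up):

```shell
#!/usr/bin/env bash
# A variable's value is itself evaluated as an arithmetic expression
# when the variable is referenced, so references chain.
a=4
b=a             # the string "a"
c=b             # the string "b"
((d = c * 2))   # c -> "b" -> b -> "a" -> a -> 4, so d = 8
echo "$d"       # prints 8

# The same happens when assigning to a variable that has the integer
# attribute: the assigned string is evaluated arithmetically.
declare -i n
m='2 * 3'
n=m+1           # m evaluates to 6, so n becomes 7
echo "$n"       # prints 7
```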

> Also, is there a real situation such multi-level reference is really
> useful.

In fact, I find it very useful. For example, I use it to efficiently
decode UTF-8 data streams:

https://github.com/akinomyoga/ble.sh/blob/98835b5a6328606a0e0c14eee5a5bd2e29f547fe/src/decode.sh#L3961-L3992
> _ble_encoding_utf8_decode_mode=0
> _ble_encoding_utf8_decode_code=0
> _ble_encoding_utf8_decode_table=(
>   'M&&E,A[i++]='{0..127}
>   'C=C<<6|'{0..63}',--M==0&&(A[i++]=C)'
>   'M&&E,C='{0..31}',M=1'
>   'M&&E,C='{0..15}',M=2'
>   'M&&E,C='{0..7}',M=3'
>   'M&&E,C='{0..3}',M=4'
>   'M&&E,C='{0..1}',M=5'
>   'M&&E,A[i++]=_ble_decode_Erro|'{254,255}
> )
> function ble/encoding:UTF-8/decode {
>   local C=$_ble_encoding_utf8_decode_code
>   local M=$_ble_encoding_utf8_decode_mode
>   local E='M=0,A[i++]=_ble_decode_Erro|C'
>   local -a A=()
>   local i=0 b
>   for b; do
>     ((_ble_encoding_utf8_decode_table[b&255]))
>   done
>   _ble_encoding_utf8_decode_code=$C
>   _ble_encoding_utf8_decode_mode=$M
>   ((i)) && ble-decode-char "${A[@]}"
> }

I also use the recursive evaluation in many other places.
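The decoder quoted above dispatches on each byte by evaluating a table
entry instead of branching. A much smaller, hypothetical sketch of the
same idea follows; the names classify_table and classify are made up,
and it only counts ASCII character classes rather than decoding UTF-8:

```shell
#!/usr/bin/env bash
# Hypothetical table-driven dispatch: each entry is an arithmetic
# expression stored as a string, selected by the input byte value.
classify_table=()
for ((k = 0; k < 256; k++)); do classify_table[k]='++other'; done
for ((k = 48; k <= 57; k++)); do classify_table[k]='++digit'; done   # '0'-'9'
for ((k = 65; k <= 90; k++)); do classify_table[k]='++alpha'; done   # 'A'-'Z'
for ((k = 97; k <= 122; k++)); do classify_table[k]='++alpha'; done  # 'a'-'z'

# classify BYTE...: count digits, ASCII letters, and other bytes.
classify() {
  local digit=0 alpha=0 other=0 b
  for b; do
    # The selected string (e.g. '++digit') is evaluated recursively and
    # updates the local counters; no case/if dispatch is needed.
    ((classify_table[b & 255]))
  done
  echo "digit=$digit alpha=$alpha other=$other"
}

# Convert a string to its byte values (ASCII input assumed).
bytes=()
s='ab12!'
for ((k = 0; k < ${#s}; k++)); do
  printf -v code '%d' "'${s:k:1}"
  bytes+=("$code")
done
classify "${bytes[@]}"   # prints "digit=2 alpha=2 other=1"
```

Using the pre-increment (++digit) rather than digit++ keeps every (( ))
result non-zero, so each arithmetic command exits with status 0; this
only matters if the script runs under set -e.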

--
Koichi


