Re: Incorrect alias expansion within command substitution

From: Robert Elz
Subject: Re: Incorrect alias expansion within command substitution
Date: Wed, 02 Feb 2022 05:31:14 +0700

    Date:        Tue, 1 Feb 2022 15:39:06 -0500
    From:        Chet Ramey <chet.ramey@case.edu>
    Message-ID:  <2816cf78-d7be-b9e1-733d-12427b04cb49@case.edu>

  | When you say "just parsed," when are aliases expanded?

During lexical analysis, right between when the input is read (if it
is read, and isn't from some internal string) and when it is handed to
the grammar.

  | Are they expanded while scanning the command substitution to find the
  | closing `)' but not part of the text that results?

For the former half, yes.   And I think you're aware that we have no
"text that results" - which is why, among other things, <<$(anything)
doesn't work for us, as the $(anything) is (kind of) a pointer to
a piece of parse tree, which never ends up matching anything even remotely

Keeping the original text is something on my todo list, but it is complicated
by our memory management methods - I could just malloc(biggish) and then
realloc(2 * biggish) if biggish isn't enough, but I really would prefer not
to do that if I can avoid it (this is a temporary string, it doesn't need
semi-permanent storage, which is what malloc() is used for in our shell).

  | What form are the `results' kept in (I
  | assume a parse tree similar to a shell function)?

Very similar - shell functions start out identical, but are then converted
(slightly) into a more condensed form, as they hang around for a long time,
unlike command substitutions which last just as long as it takes to prepare
the current command.

  | Do they include expanded aliases or is that deferred?

Aliases (as evil as they are) need to be expanded to be able to generate
the correct parse tree - they cannot be deferred.

Consider the different tree you'd get for

        cmd1 arg1; cmd2 arg2; cmd3

if that was parsed exactly as written, compared to what it would
look like if we had

        alias cmd1=if
        alias cmd2=then
        alias cmd3=fi
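
A minimal sketch of that difference, assuming a POSIX-like sh or bash. The `shopt` line is only needed by bash in non-interactive mode, and `eval` is used here only so that the command text is parsed after the alias definitions have been executed. The variable names are made up for the illustration.

```shell
# bash only expands aliases in scripts when expand_aliases is set;
# other shells have no shopt, so its failure is ignored.
shopt -s expand_aliases 2>/dev/null || true

alias cmd1=if
alias cmd2=then
alias cmd3=fi

# With the aliases defined, the same text parses as one if/then/fi
# compound command. eval parses its argument at the time it runs.
with_aliases=$(eval 'cmd1 true; cmd2 echo "parsed as if/then/fi"; cmd3')
echo "$with_aliases"

unalias cmd1 cmd2 cmd3

# Without the aliases, the text is a list of three simple commands,
# none of which exists, so the list ends with status 127.
status_without=0
eval 'cmd1 true; cmd2 echo "never printed"; cmd3' 2>/dev/null ||
    status_without=$?
echo "status without aliases: $status_without"
```

The parser cannot know which of the two trees to build until it has looked up each command word in the alias table, which is why the expansion cannot be put off until later.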

  | That seems like the crux of the issue. If the command substitution is part
  | of a shell function definition, you only want to expand aliases the `first
  | time' -- at the time you parse the shell function.

Yes, aliases are always lexical.   The function is a tree from the parse,
there's nothing in the tree, ever, that has anything whatever to do with
aliases (aside from the "alias" and "unalias" commands themselves of course,
but those are just ordinary commands, and treated that way).

  | You can execute arbitrary commands, including alias definitions,
  | between the time the shell function is defined and the time it's executed.

Yes.   Too late for anything in the function (anything at all).

  | POSIX requires that aliases in command substitutions be expanded when
  | the function definition is parsed, not when the command substitution
  | is finally executed,

Yes, that's what we do - it is a consequence of the "parse everything when
it is first seen, and never again" philosophy.
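
A small sketch of the "parse once" behaviour, assuming a POSIX-like sh or bash (the `greet` alias and the function names `f` and `g` are hypothetical). The plain function body behaves the same everywhere I would expect it to run. The command substitution variant is the case under discussion in this thread, so its result is printed without any claim about what a given shell will produce.

```shell
# bash only expands aliases in scripts when expand_aliases is set.
shopt -s expand_aliases 2>/dev/null || true

alias greet='echo hello'

# eval forces the definitions to be parsed now, with greet='echo hello'.
eval 'f() { greet; }'
eval 'g() { echo "$(greet)"; }'

# Changing the alias afterwards is too late for anything already parsed.
alias greet='echo goodbye'

body_result=$(f)
echo "function body:        $body_result"    # hello

# Depends on whether the shell expanded the alias inside $( ) when g was
# parsed (hello) or defers it until g is executed (goodbye).
comsub_result=$(g)
echo "command substitution: $comsub_result"
```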

  | but bash has not traditionally done it that way. That's where the backwards
  | compatibility issues come in.

Oh.  I see.   Does anyone really care though?   It is kind of hard to imagine
anyone being perverse enough to use an alias in a function anywhere, let
alone depend upon it changing between executions of a function when the alias
definition has been altered.   Anyone doing anything like that deserves to
have their code break IMO.

  | It's not true that `no commands can be executed'. The alias can be altered
  | by commands between a shell function definition and its execution.

Yes, I had forgotten that case.

  | I considered keeping both the original text and the parse tree from the
  | parsed command substitution (well, a chain of them since you can have an
  | arbitrary number of command substitutions in a word).

Yes, that's what we do.

  | It's difficult, given bash's internal structure, to preserve the
  | original text

Same for us, probably for different reasons though.

  | -- as opposed to the reconstituted text --

How accurately can you reconstitute?   That is, can you maintain the
difference between $(a b) and $( a b ) for example ?   How about $(a  b) ?

  | Other shells (bosh, mksh) also recreate the text of a command
  | substitution from the parsed commands.

Interesting, though I am not surprised in a way.   Actually needing the
command sub in textual form (aside from perhaps showing in output from
jobs or something, though I doubt it is ever needed even there - and for
that kind of thing, a good approximation is just fine in any case) is
very very rare in sh - the end word of a here-doc redirection operator
might be the only case.

  | > It seems to me that the way you're doing it now would also break:
  | > 
  | >   alias x=y
  | >   cat <<$(x)
  | >   whatever
  | >   $(x)

  | This works in bash default mode because aliases aren't expanded while the
  | command is parsed.

I don't know whether I was in "default mode" or not.   All I did was run
"bash-5.2", which is the name under which I stored the development binary.
That is still the last one I built while running tests; I haven't built the
5.2alpha version yet, so what I am running is about a month and a half old,
I think.

I simply ran it, and then typed (pasted into an xterm actually, but that
makes no difference) the commands above, and it didn't work (it did in
bash 5.1.16, or whatever the current released one is).

  | The delimiter ends up being `$(x)'.

In the version I tested (5.2 development version) it ended up being $(y)

  | Since you're required to check the line read for the terminating
  | delimiter before doing anything else, the delimiter has to be $(x)
  | to make it work.

Yes, I'm aware of that.

  | It works with ksh93 as well, but every other shell produces an error of
  | some sort (including the NetBSD sh, unless you've changed something in the
  | couple of months since I last built it).

No, it definitely does not work in our shell.   That's what I said a little
later in my message.  When we process the redirect, the word has (more or less
in this case) just a pointer to a parse tree (or as you mentioned above, to
the head of a chain of parse trees, in this case, a chain of length 1).

  | I can't see it working in any shell's posix mode if posix requires aliases
  | to be expanded while reading the WORD containing the command substitution
  | that is the here-doc delimiter.

I have no idea, I doubt that until this issue came up, anyone has ever
considered this.   Aliases are a total botch, and should be eliminated
from the standard completely, then implementors who need to keep them can
make them work however they like (like arrays).

  | If that's the case, you have an alias
  | expansion mismatch, since I don't believe you're permitted to perform
  | alias expansion on the lines of the here-document as you read them,

You are correct.  Aliases are only ever expanded in the command word
position, or immediately after the expansion of a preceding alias whose
definition ended in a space (one of the most bizarre syntax rules I have
ever seen, anywhere).   Here doc text is never a command word, it is only
ever input for some file descriptor, so is never eligible for alias expansion.
Nor is anything that is quoted, and here doc text is always quoted (double
when the redirect operator end-word is not quoted, or single if it is).
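
The three rules in that paragraph can be sketched as follows, assuming a POSIX-like sh or bash. All the `demo_*` alias names are invented for the example, and `eval` is used only so each line is parsed after the aliases exist.

```shell
# bash only expands aliases in scripts when expand_aliases is set.
shopt -s expand_aliases 2>/dev/null || true

alias demo_say='echo '      # definition ends in a space
alias demo_plain='echo'     # no trailing space
alias demo_word='EXPANDED'

# The word after demo_say is also checked for aliases.
after_space=$(eval 'demo_say demo_word')
echo "$after_space"         # EXPANDED

# The word after demo_plain is an ordinary argument.
after_plain=$(eval 'demo_plain demo_word')
echo "$after_plain"         # demo_word

# Here doc text is input for a file descriptor, never a command word,
# so the alias names pass through untouched.
heredoc_text=$(eval 'cat <<EOF
demo_say demo_word
EOF')
echo "$heredoc_text"        # demo_say demo_word
```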

  | and the resolution to bug 1036 makes it clear -- to me, at least -- that you
  | check for the delimiter before doing anything to the line.

Yes.   Though (while not at all relevant right now, in this discussion) the
processing of \newline in double-quoted heredocs (unquoted word on redirect
operator) is a bit murky.

Of course in reality, none of this truly matters.   Only an idiot would
ever actually do something like

        alias x=y
        cat <<$(x)

which I guess is why it was me who suggested that...   For that matter,
only an idiot would actually ever use (what looks like, but really isn't)
a command substitution in or as a here doc end delimiter word - or, for
that matter, even include any $ in one of those (except perhaps as \$
or inside single quotes, causing the here doc to be of the single quoted

None of this really matters to anyone except those of us who dream up
torture tests, or try and nail down every word in the standard.
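
For completeness, a sketch of the single-quoted variant mentioned above, assuming a POSIX-like sh or bash. Quoting the end word means nothing in it is parsed as a command substitution, the delimiter after quote removal is the literal string $(x), and the body is not expanded. The function name `show_quoted` is hypothetical.

```shell
# The quoted end word makes the delimiter the literal text $(x),
# and makes the here doc body behave as if single quoted.
show_quoted() {
cat <<'$(x)'
whatever $HOME
$(x)
}

quoted_result=$(show_quoted)
echo "$quoted_result"       # whatever $HOME
```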


ps: I haven't seen my (or your) message via bug-bash yet, though mine did
get to (was accepted by) whatever is the MX for gnu.org several hours ago
now.   Is the list perhaps not working?   (I did recently see one message
arrive from it (twice) though - but that one seems to have been held up
at the list for about 4 hours; neither mine nor yours is that old yet.)

Received: from localhost ([::1]:34998 helo=lists1p.gnu.org)
        by lists.gnu.org with esmtp (Exim 4.90_1)
        (envelope-from <bug-bash-bounces+kre=munnari.oz.au@gnu.org>)
        id 1nF1Ld-0002jI-DI
        for kre@munnari.oz.au; Tue, 01 Feb 2022 17:10:01 -0500
Received: from eggs.gnu.org ([]:45358)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <fxmbsw7@gmail.com>) id 1nExqR-0000tL-OZ
 for bug-bash@gnu.org; Tue, 01 Feb 2022 13:25:38 -0500
Received: from [2607:f8b0:4864:20::42a] (port=40901
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <fxmbsw7@gmail.com>) id 1nExqK-0001HX-NF
 for bug-bash@gnu.org; Tue, 01 Feb 2022 13:25:30 -0500
