bug-bash

Re: Unclosed quotes on heredoc mode


From: Chet Ramey
Subject: Re: Unclosed quotes on heredoc mode
Date: Wed, 8 Dec 2021 09:56:50 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.2.1

On 11/28/21 2:29 PM, Robert Elz wrote:


   | So the ultimate question is whether or not the act of reading a command
   | substitution should reset this requirement. That's where we disagree.
   | The grammar is, at that point, reading a different command.

"command" is a loaded word in sh terminology; it is used for all kinds of
things, but in general it is not at all unusual for here document text to
appear while a command other than the one with the redirection operator is
being processed (no command substitutions necessarily involved).   What the
grammar is doing after a here doc redirection operator has been processed,
until the next newline (token) is encountered is irrelevant - the spec
imposes no requirements upon that at all.

We agree on this.

The real question is whether you read a command substitution as a single
WORD, so that the lexer cannot return "the next newline token" until the
command substitution has been completed.

Command substitutions don't appear in the grammar at all, just like here-
documents. They're just words, and like other words, the characters they
contain don't affect other constructs.

I suppose it's precedence parsing: the command substitution has higher
precedence than here-documents.

Sure, but that's not what I meant.   I treat heredoc data as much the same
as a \newline - something that the lexer deals with, and the grammar never
knows happened.   Heredoc data doesn't appear at all in the sh grammar,
as nothing in the grammar cares in the slightest about them (once they're
queued).  What I meant was that from that perspective, whether a sh script
(or sh script fragment) is valid or not, is determined by the grammar, and
given that here doc data does not appear there, it cannot have any impact
upon the decision whether some particular part of the sh input is valid or
not.

Here-documents are simply quoted strings with some peculiar properties,
read a line at a time.

So, if one does

        $( cmd <<END )

there's nothing invalid about that, unless EOF follows that ')' before
a newline token appears.   And if that happens, it isn't the grammar that
complains, but something beyond that.   The syntax "word redirect" is
perfectly valid, and "<< word" is a perfectly valid redirect.

Now put the text between $( and ) into a file and run it as a shell script.
Is it valid?


You seem to be hung up on the way you have chosen to implement $( )
(which of itself is OK, but it is not required to be done that way)
where (it seems) you parse the command inside the $() as if there was no
world at all outside it.   As far as getting the grammar correct that's
fine, but it doesn't work with here doc data.

Exactly. In the same way, you prioritize here documents over command
substitution, which gets back to how you have chosen to implement their
intersection.

You're sure of your implementation's correctness. We don't agree.



   | >    | The netbsd shell appears to be the outlier here. The parser reads the
   | >    | command substitution so it can parse the entire and-or list before trying
   | >    | to gather any here-documents.
   | >
   | > You cannot possibly really mean that I hope.   That is, in
   | >
   | >       cmd1 <<EOF &&
   | >       data
   | >       EOF
   | >               cmd2
   | >
   | > you do agree that "data" is stdin to cmd1, that is, the heredoc data
   | > appears splat in the middle of the and-or list.   That's certainly the
   | > way it appears to work (in bash) to me.
   |
   | There is no command substitution in this example.

I know.   But go back and read the quote from you (still here, above, in
this message) again: "The parser reads the command substitution so it can
parse the entire and-or list before trying to gather any here-documents"

The command substitution is a single word. There isn't any newline token
returned to the grammar until it's complete, and there isn't any reason
to read the here-document until it is. That's what this all comes down to:
"all characters following the open parenthesis to the matching closing
parenthesis constitute the command."


** parse the entire and-or list before trying to gather any here documents **

I don't believe that you really meant that, it isn't the way bash behaves
(unless this is something different in the devel version, but I doubt that)
and I was just pointing out that poor phraseology.

Ok.
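
For reference, the uncontested case quoted above (a here-document attached to
the first command of an and-or list, with no command substitution involved)
can be checked with a small sketch. Here `cat` and `echo` stand in for cmd1
and cmd2, and the function name `andor_demo` is made up for illustration.

```shell
# Stand-ins: cat plays cmd1, echo plays cmd2; andor_demo is a hypothetical name.
andor_demo() {
    cat <<EOF &&
data
EOF
        echo "cmd2 ran"
}

# The here-document body sits in the middle of the and-or list,
# yet it is read as the stdin of the first command only.
andor_out=$(andor_demo)
printf '%s\n' "$andor_out"
```

The second command still runs after the here-document body, which is the
point both sides of the thread accept.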


   | So, again, the question is whether or not input data that is logically
   | part of the command substitution (it appears between the opening and
   | closing parentheses) should affect the `outer' command. That's the
   | question. We have different answers.

We do, because I don't view here doc data as affecting anything except the
command for which it is input.

OK, then we can stop here. We're not going to agree on this.


But one can also do

        printf "%s\n" 'data' >/tmp/hidden.data.$$
        .... $( cmd </tmp/hidden.data.$$ ) ...
        rm /tmp/hidden.data.$$

and that would also work everywhere, right?   That is, the data for the
command in the command substitution is created (and removed, but that bit
of it is generally irrelevant here) outside the command substitution.

This is the rough equivalent of

        ... $( cmd << \END ) ...
        data
        END

Oh, stop. The two constructs might have the same functional effect, but
explicitly referring to an existing file within the command substitution
doesn't have anything to do with parsing or lexical analysis.
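
A minimal sketch of two forms that are not in dispute, for comparison: the
temporary-file version from the example above, and a here-document whose body
lies entirely inside the command substitution. `cat` stands in for cmd, and
the variable names are illustrative only. The contested form, with the body
after the closing parenthesis, is deliberately left out because its behavior
differs between shells.

```shell
# Temporary-file form: the data is created outside the command substitution.
tmpfile=${TMPDIR:-/tmp}/hidden.data.$$
printf '%s\n' 'data' >"$tmpfile"
from_file=$(cat <"$tmpfile")
rm -f "$tmpfile"

# Here-document form with the body inside the parentheses.
from_heredoc=$(cat <<\END
data
END
)

printf 'file: %s\nheredoc: %s\n' "$from_file" "$from_heredoc"
```

Both variables end up holding the same text; the disagreement in this thread
is only about where the here-document body may appear in the input.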



And then once you allow that to work (which you're apparently now doing
in the devel version),

As I said in a previous message, that's an inconsistent choice, sort of
like permitting EOF to terminate a here document but not a quoted string.
But you have to do something, and, when presented with that ambiguity, it
seems like a reasonable thing to allow as an exception. Behavior varies
widely, so there's no consensus on what is "right."
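
The inconsistency mentioned here can be observed with a rough sketch that
feeds two tiny scripts to a shell: one that ends inside a quoted string and
one that ends inside a here-document. The scratch directory, the file names,
and the use of `sh` are assumptions made for illustration. The unclosed quote
is expected to be rejected as a syntax error; what happens with the
unterminated here-document varies by shell, so the sketch only reports the
exit status.

```shell
# Hypothetical scratch directory for the two test scripts.
dir=${TMPDIR:-/tmp}/eof-demo.$$
mkdir -p "$dir"

# Script 1: input ends inside a single-quoted string.
printf '%s\n' "echo 'unclosed" >"$dir/quote.sh"
# Script 2: input ends before the here-document delimiter END is seen.
printf '%s\n' 'cat <<END' 'data' >"$dir/heredoc.sh"

quote_status=0
sh "$dir/quote.sh" >/dev/null 2>&1 || quote_status=$?

heredoc_status=0
sh "$dir/heredoc.sh" >/dev/null 2>&1 || heredoc_status=$?

rm -rf "$dir"
printf 'unclosed quote: %s\nunterminated here-document: %s\n' \
    "$quote_status" "$heredoc_status"
```
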

there cannot really be any objection to

        cmd <<END $( cmd1 &&
        data
        END
                        cmd2 )

Ha, no. That surely is not consistent with the POSIX "all characters"
language.


as that's really just the same principle being applied in the other
direction.   Furthermore that means that in

        cmd <<END1 $( cmd1 <<END2 &&

(with a newline after the "&&") the data that follows is

        data1
        END1
        data2
        END2

keeping left-to-right order across the input line, which is the order
in which the standard requires here document data to appear.

So the implementation is something like "read a line, then parse some of
it, then go back and read more lines from the input stream if we see a
here-document operator, then go back to where you left off with the
original line and continue parsing from there." I can see that. It's as
much a consequence of implementation choices as other shells' behavior.
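
The left-to-right ordering itself is not in dispute when no command
substitution is involved. A small sketch, with `cat` standing in for both
commands and `order_demo` as a made-up function name:

```shell
# Two here-document operators on one line; the bodies follow in the
# same left-to-right order as the operators.
order_demo() {
    cat <<END1 && cat <<END2
data1
END1
data2
END2
}

order_out=$(order_demo)
printf '%s\n' "$order_out"
```

The open question in the thread is only whether that same ordering applies
when one of the operators sits inside `$( )`.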


Here "input line" is really a logical line, rather than a physical
one, as we have already agreed that here docs don't appear in the
middle of quoted strings, nor do they appear after elided newlines
(\newline pairs), which are removed; neither of those generates a newline
token.   But it is "line", not "command" or anything else related to the
grammar, which is specified:

Except when you're reading a command substitution, when it's "characters."
So it comes back around to precedence.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/


