bug-bash

## Re: Unclosed quotes on heredoc mode

 From: Alex fxmbsw7 Ratchev Subject: Re: Unclosed quotes on heredoc mode Date: Sun, 28 Nov 2021 20:51:33 +0100

```
A small comment on that "/bin in PATH" code: it is invalid. The pattern
also needs to match when /bin is the first entry (no leading ':') or the
last entry (no trailing ':').
case :$PATH: would fix it.

On Sun, Nov 28, 2021, 20:31 Robert Elz <kre@munnari.oz.au> wrote:

>     Date:        Sat, 27 Nov 2021 13:57:57 -0500
>     From:        Chet Ramey <chet.ramey@case.edu>
>     Message-ID:  <5217c48e-c989-a163-5673-38995e35a14b@case.edu>
>
> Warning: long message follows, give yourself time to digest it.
>
>   | OK, if you do end up building the devel branch, I'd be interested
>   | in these results.
>
> Assuming that happens, I shall certainly let you know.
>
>   | > Once, of course ... why would I ever build it again?
>   |
>   | Patches exist. There are vendors who take the original release, apply
>   | their own special-sauce patches, then apply the patches I release as
>   | they come out, as part of their own distribution release process.
>
> Of course, NetBSD pkgsrc (used on other systems as well) does that too.
> But your patches appear about every 5-6 months, so I end up doing one
> build every 5-6 months.   Keeping the object files (even the unpacked
> sources) sitting around waiting for the next patches, in order to save
> perhaps 2-3 minutes of build time isn't worth the bother.   Once built
> and installed it all gets trashed.
>         [I have also contemplated doing builds in an MFS (or tmpfs)
>         which would vanish on a reboot (or just umount) and I do tend
>         to reboot more often than bash patches are released ... but I've
>         yet to actually do that, for bash, the build time saved wouldn't
>         be worth the bother - for some other apps, it might be].
>
> pkgsrc doesn't encourage attempting to retain anything in any case - it
> probably isn't a problem for bash (at least I've never seen it, not that
> I ever looked either) but other applications have a habit of deleting files
> from their distributions - and unless one starts from an empty directory,
> unpacking a tarball doesn't cause those files to be removed ... further,
> some build systems don't pay attention to what is supposed to be there,
> and manage to link all the .o files they can find.
>
> It is easier, and more reliable, to simply start clean every time.
>
> But of course that doesn't apply when you're developing and building
> several times a day (or sometimes, dozens of times an hour).   That just
> doesn't apply to me with bash.
>
>   | Usually, that's ok. In this instance, where we're discussing a feature
>   | whose implementation is substantially different between the released
>   | and development versions, it's more relevant.
>
> Sure, though I didn't know this part was changed so much in the
> devel version until you told me just recently (I do not watch what happens
> there).
>
>   | So the ultimate question is whether or not the act of reading a command
>   | substitution should reset this requirement. That's where we disagree.
>   | The grammar is, at that point, reading a different command.
>
> "command" is a loaded word in sh terminology, it is used for all kinds of
> things, but in general it is not at all unusual for here document text to
> appear while a command other than the one with the redirection operator is
> being processed (no command substitutions necessarily involved).   What the
> grammar is doing after a here doc redirection operator has been processed,
> until the next newline (token) is encountered is irrelevant - the spec
> imposes no requirements upon that at all.
>
>
>   | > Then we get to whether heredoc data is part of a valid shell script
>   | > in that sense - when there is yet to be a newline token to introduce
>   | > it.
>   |
>   | What does this mean? In all cases, the here-documents are not read
>   | until after a newline token. That's not the issue.
>
> Sure, but that's not what I meant.   I treat heredoc data as much the same
> as a \newline - something that the lexer deals with, and the grammar never
> knows happened.   Heredoc data doesn't appear at all in the sh grammar,
> as nothing in the grammar cares in the slightest about them (once they're
> queued).  What I meant was that from that perspective, whether a sh script
> (or sh script fragment) is valid or not, is determined by the grammar, and
> given that here doc data does not appear there, it cannot have any impact
> upon the decision whether some particular part of the sh input is valid or
> not.   Of course, if the script ends (completely) without a newline token
> after the last redirect operator then that's an error - but of a subtly
> different kind (more like an unterminated string (mismatched quotes) or
> here doc data without its required terminating word -- all lexical
> constructs).
>
> So, if one does
>
>         $( cmd <<END )
>
> there's nothing invalid about that, unless EOF follows that ')' before
> a newline token appears.   And if that happens, it isn't the grammar that
> complains, but something beyond that.   The syntax "word redirect" is
> perfectly valid, and "<< word" is a perfectly valid redirect.   The data
> doesn't need to appear there, if no newline has yet appeared, any more
> than it does in
>
>         cmd << EOF ; ...
>
> where the data doesn't need to appear there, when a newline has not yet
> appeared.
>
> You seem to be hung up on the way you have chosen to implement $( )
> (which of itself is OK, but it is not required to be done that way)
> where (it seems) you parse the command inside the $() as if there was no
> world at all outside it.   As far as getting the grammar correct that's
> fine, but it doesn't work with here doc data.
>
>
>   | >    | The netbsd shell appears to be the outlier here. The parser reads
>   | >    | the command substitution so it can parse the entire and-or list
>   | >    | before trying to gather any here-documents.
>   | >
>   | > You cannot possibly really mean that I hope.   That is, in
>   | >
>   | >   cmd1 <<EOF &&
>   | >   data
>   | >   EOF
>   | >           cmd2
>   | >
>   | > you do agree that "data" is stdin to cmd1, that is, the heredoc data
>   | > appears splat in the middle of the and-or list.   That's certainly
>   | > the way it appears to work (in bash) to me.
>   |
>   | There is no command substitution in this example.
>
> I know.   But go back and read the quote from you (still here, above, in
> this message) again: "The parser reads the command substitution so it can
> parse the entire and-or list before trying to gather any here-documents"
>
> ** parse the entire and-or list before trying to gather any here documents **
>
> I don't believe that you really meant that, it isn't the way bash behaves
> (unless this is something different in the devel version, but I doubt that)
> and I was just pointing out that poor phraseology.
>
>   | So, again, the question is whether or not input data that is logically
>   | part of the command substitution (it appears between the opening and
>   | closing parentheses) should affect the `outer' command. That's the
>   | question. We have different answers.
>
> We do, because I don't view here doc data as affecting anything except the
> command for which it is input.   As far as the script goes, it is just a
> rather weird method (kind of like the original implementation) of creating
> an anonymous file and then passing that file as input (usually stdin, but
> not required to be) to a command.
>
> Consider this alternative, which is (one possibility for) what would be
> needed if here-docs did not exist:
>
>         printf '%s\n' 'data' >/tmp/hidden.data.$$
>         cmd </tmp/hidden.data.$$
>         rm /tmp/hidden.data.$$
>
> whereas with here-docs, we do instead
>
>         cmd <<'END'
>         data
>         END
>
> That's all fine, and either of those would (more or less) work
> with any shell.
>
> Now consider instead that cmd is to be run in a command substitution.
>
> One can certainly do
>
>         ... $(
>                 printf "%s\n" 'data' >/tmp/hidden.data.$$
>                 cmd </tmp/hidden.data.$$
>                 rm /tmp/hidden.data.$$
>         ) ...
>
> which is the rough equivalent of
>
>         ... $( cmd <<END
>         data
>         END
>         ) ...
>
> and that should work.  No question.
>
> But one can also do
>
>         printf "%s\n" 'data' >/tmp/hidden.data.$$
>         .... $( cmd </tmp/hidden.data.$$ ) ...
>         rm /tmp/hidden.data.$$
>
> and that would also work everywhere, right?   That is, the data for the
> command in the command substitution is created (and removed, but that bit
> of it is generally irrelevant here) outside the command substitution.
>
> This is the rough equivalent of
>
>         ... $( cmd << \END ) ...
>         data
>         END
>
> And then once you allow that to work (which you're apparently now doing
> in the devel version), there cannot really be any objection to
>
>         cmd <<END $( cmd1 &&
>         data
>         END
>                         cmd2 )
>
> as that's really just the same principle being applied in the other
> direction.   Furthermore that means that in
>
>         cmd <<END1 $( cmd1 <<END2 &&
>
> (with a newline after the "&&") the data that follows is
>
>         data1
>         END1
>         data2
>         END2
>
> keeping to left-to-right order across the input line, which is the order
> in which the standard requires here document data to appear.
>
> Here "input line" is really a logical line, rather than a physical
> one, as we have already agreed that here docs don't appear in the
> middle of quoted strings, and nor do they appear after elided newlines
> (\newline pairs) which are removed, neither of which generates a newline
> token.   But it is "line" not "command", or anything else related to the
> grammar which is specified:
>
>         The redirection operators "<<" and "<<-" both allow redirection
>         of subsequent lines
>
> "subsequent lines" ie: "lines after the current line"
>
>         If more than one "<<" or "<<-" operator is specified on a line,
>         the here-document associated with the first operator shall be
>         supplied first by the application and shall be read first by the
>         shell.
>
> Note: "line", not grammatical command, or script, or and-or list, or
> anything related to the grammar at all.   (The grammar generally ignores
> lines, a newline token is almost just a ';' - except we're allowed as
> many newlines as we like, where just one ';' (sometimes none) is
> permitted).
>
> Another example (no cmdsubs again) that is kind of weird, and unlikely,
> but should be permitted, and should work:
>
> cat << END; case $PATH
> data
> END
> in
>         *:/bin:*) echo /bin is in PATH! ;;
> esac
>
> Bash (5.1.xx) allows that, so does everything else (aside from some old,
> and not even all that old, ash derived shells which had a bug not relevant
> here).   The heredoc data for cat appears splat in the middle of the
> unrelated case statement.   No problems, it all works, as it should - but
> probably would not if here-doc data was something known to the grammar.
> But it isn't, the lexer removes it, as far as the grammar & its parser are
> concerned the "data" and "END" lines are not there at all.
>
> kre
>
>
>

```
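
The reply at the top of this message suggests wrapping the value in colons (`case :$PATH:`) so that the `*:/bin:*` pattern also matches a first or last entry. A minimal sketch of that idea follows, assuming a POSIX `sh`. The function names `path_contains` and `naive_contains` are hypothetical and do not appear in the thread; `naive_contains` mirrors the unwrapped form from the quoted example purely for comparison.

```shell
#!/bin/sh
# path_contains DIR LIST  (hypothetical helper, not from the thread)
# Succeeds when DIR is one of the colon-separated entries of LIST.
path_contains() {
    # Wrapping LIST in colons puts a ':' on both sides of every entry,
    # including the first and the last, so one pattern covers all cases.
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# The unwrapped form, as in the quoted example: it only matches an
# entry that has a ':' on both sides, i.e. one in the middle of LIST.
naive_contains() {
    case "$2" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Compare both forms on a few fixed lists (not the real $PATH).
for list in /bin:/usr/bin /usr/bin:/bin /usr/bin:/bin:/sbin /usr/bin:/sbin; do
    if path_contains /bin "$list"; then fixed=yes; else fixed=no; fi
    if naive_contains /bin "$list"; then naive=yes; else naive=no; fi
    printf '%s fixed=%s naive=%s\n' "$list" "$fixed" "$naive"
done
```

In real use the second argument would be `"$PATH"`, for example `path_contains /bin "$PATH" && echo "/bin is in PATH"`. The pattern is quoted so that any glob characters in the directory name are taken literally.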