help-bash

Re: What does `echo xxx 1>&2xxx` do?


From: Peng Yu
Subject: Re: What does `echo xxx 1>&2xxx` do?
Date: Sat, 8 May 2021 19:01:11 -0500

    for(int c=0;c<256;++c) {
        if(shellquote(c)) printf("0x%02x:%c\n", c, c);
    }

0x22:"
0x27:'
0x60:`

    for(int c=0;c<256;++c) {
        if(PATTERN_CHAR(c)) printf("0x%02x:%c\n", c, c);
    }

0x21:!
0x2a:*
0x2b:+
0x3f:?
0x40:@

    for(int c=0;c<256;++c) {
        if(GLOB_CHAR(c)) printf("0x%02x:%c\n", c, c);
    }

0x2a:*
0x3f:?
0x5b:[
0x5d:]
0x5e:^

    for(int c=0;c<256;++c) {
        if(shellxquote(c)) printf("0x%02x:%c\n", c, c);
    }

0x22:"
0x27:'
0x5c:\
0x60:`

    for(int c=0;c<256;++c) {
        if(shellbreak(c)) printf("0x%02x:%c\n", c, c);
    }

0x09:
0x0a:

0x20:
0x26:&
0x28:(
0x29:)
0x3b:;
0x3c:<
0x3e:>
0x7c:|

    for(int c=0;c<256;++c) {
        if(shellblank(c)) printf("0x%02x:%c\n", c, c);
    }

0x09:
0x20:


    for(int c=0;c<256;++c) {
        if(shellexp(c)) printf("0x%02x:%c\n", c, c);
    }

0x24:$
0x3c:<
0x3e:>


On Sat, May 8, 2021 at 4:50 PM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 5/8/21 12:22 PM, Peng Yu wrote:
> > https://git.savannah.gnu.org/cgit/bash.git/tree/parse.y#n346
> >
> > How does yylex() know "1" should be treated as a <number> as in "1>&2"
> > and "1" should be treated as a <word> as in "1 >&2"? Could anybody
> > explain how this context-dependency is resolved by yylex() in detail?
>
> The short answer is that tokens are delimited by metacharacters. Space and
> `>' are both metacharacters that delimit the token "1".

To document what I learned from the source code: the following 7
characters, and only these 7, are the metacharacters according to
shellmeta().

    for(int c=0;c<256;++c) {
        if(shellmeta(c)) printf("0x%02x:%c\n", c, c);
    }

0x26:&
0x28:(
0x29:)
0x3b:;
0x3c:<
0x3e:>
0x7c:|

> You can get a long way by simply reading the POSIX grammar and rules for
> recognizing and classifying tokens. These are the two relevant rules from

I don't see a definition of metacharacters in the POSIX document. So
is this definition from the nomenclature used in bash?

> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03
> :

"The shell breaks the input into tokens: words and operators; see
Token Recognition."

So the manual defines how to tell words and operators apart. It says
how to recognize operators, but the definition seems to be recursive,
which makes it hard to tell directly what the operators are.

Are there only a limited number of operators? If so, is there a way to
print all the operators supported by bash?

Are `${`, `$(` and `$((` operators?

> 6. If the current character is not quoted and can be used as the first
>     character of a new operator, the current token (if any) shall be
>     delimited. The current character shall be used as the beginning of the
>     next (operator) token.

So this means that a character is always preferred to be associated
with the following characters, rather than with the preceding
characters, to form a token. Could you show some examples for this
case?

Does this mean that one will always look ahead to determine what is a
token? What is the number of characters to look ahead?

> 7. If the current character is an unquoted <blank>, any token containing
>     the previous character is delimited and the current character shall be
>     discarded.
>
> The `>' delimited token follows this POSIX rule from
>
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10_01
> :
>
> 2. If the string consists solely of digits and the delimiter character is
>     one of '<' or '>', the token identifier IO_NUMBER shall be returned.
>
> If you want to see where that happens, look at where read_token_word()
> returns NUMBER.
>
> The space-delimited token is a TOKEN and follows this rule:
>
> 3. Otherwise, the token identifier TOKEN results.
>
> The TOKEN is further classified as the grammar requires. POSIX puts it like
> this:
>
> "Further distinction on TOKEN is context-dependent. It may be that the same
> TOKEN yields WORD, a NAME, an ASSIGNMENT_WORD, or one of the reserved words
> below, dependent upon the context."
>
> In this case, it's a WORD.

Could you give examples for the NAME and ASSIGNMENT_WORD cases?

So because of this context-dependency, is it not possible to determine
the token type by lexical analysis alone? What specific syntactic
information is fed back to the lexer in order to determine whether it
is a WORD, a NAME, or an ASSIGNMENT_WORD?

> >>> How the parsing is done?
> >>
> >> If we literally interpret the manual, the distinction between
> >> filenames and file descriptors is not processed at the parsing level.
>
> Correct, for the most part. Read
>
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_05
>
> for instance; its description of `word' is basically identical to the bash
> manual text.
>
> The bash tokenizer takes a shortcut and skips the expansion for a `word'
> consisting entirely of digits and immediately classifies it as a NUMBER
> if it's in the right place in a redirection operator.

-- 
Regards,
Peng


