bug-bash

Re: IFS field splitting doesn't conform with POSIX


From: Felipe Contreras
Subject: Re: IFS field splitting doesn't conform with POSIX
Date: Thu, 30 Mar 2023 07:51:59 -0600

On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge <greg@wooledge.org> wrote:
>
> On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> >     IFS=,
> >     str='foo,bar,,roo,'
> >     printf '"%s"\n' $str
> >
> > There is a discrepancy between how this is interpreted between bash
> > and zsh: in bash the last comma doesn't generate a field and is
> > ignored,
>
> ... which is correct according to POSIX (but not sensible).
>
> > in zsh a last empty field is generated. Initially I was going
> > to report the bug in zsh, until I read what the POSIX specification
> > says about field splitting [1].
>
> You seem to have misinterpreted whatever you read.
>
> https://mywiki.wooledge.org/BashPitfalls#pf47
>
>     Unbelievable as it may seem, POSIX requires the treatment of IFS as
>     a field terminator, rather than a field separator. What this means
>     in our example is that if there's an empty field at the end of the
>     input line, it will be discarded:
>
>     $ IFS=, read -ra fields <<< "a,b,"
>     $ declare -p fields
>     declare -a fields='([0]="a" [1]="b")'
>
>     Where did the empty field go? It was eaten for historical reasons
>     ("because it's always been that way"). This behavior is not unique
>     to bash; all conformant shells do it.

If you think in terms of terminators instead of separators, then the
above code makes sense because if you add ',' at the end of each field
(terminate it), you get the original string:

    printf '%s,' "${fields[@]}"

But you can't replicate 'a,b' that way, because 'b' does not have a
terminator. Obviously we'll want 'b' as a field, so one has to
assume that either 1) the end of the string is an implicit
terminator, or 2) the terminator in the last field is optional.
Neither of these two things is specified in POSIX.
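
To make the round trip concrete, here is a minimal sketch that uses only
POSIX shell features. The helper name `rebuild` is made up for this
example. It splits its argument on ',' through unquoted expansion and
then prints every resulting field followed by a ',':

```shell
# Hypothetical helper: split "$1" on ',' and re-terminate each field.
# The body runs in a subshell so IFS and 'set -f' do not leak out.
rebuild() (
    IFS=,
    set -f                      # no pathname expansion on the fields
    set -- $1                   # unquoted on purpose: field splitting
    printf '%d fields: ' "$#"
    printf '%s,' "$@"
    printf '\n'
)

rebuild 'a,b,'    # prints: 2 fields: a,b,
rebuild 'a,b'     # prints: 2 fields: a,b,
```

Both inputs split into the same two fields, so the rebuilt string is
'a,b,' either way and the unterminated 'a,b' cannot be recovered.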

If we take 1), that the end of the string is an implicit terminator,
then 'a' contains one valid field, but 'a,' contains *two* fields: 'a',
terminated by the comma, and an empty field, terminated by the end of
the string. That makes these terminators indistinguishable from
separators.
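
A rough way to see the mismatch is to count fields under reading 1) and
compare that with what the shell actually produces. This is only a
sketch: `count_implicit_terminator` and `count_shell` are names invented
here, and the first one models my reading of 1), not anything taken from
POSIX.

```shell
# Hypothetical model of reading 1): every ',' ends a field, and the end
# of the string ends one more, so fields = number of commas + 1.
count_implicit_terminator() {
    commas=$(printf '%s' "$1" | tr -cd ',')   # keep only the commas
    echo $(( ${#commas} + 1 ))
}

# What the shell itself does with IFS=, (run in a subshell).
count_shell() (
    IFS=,
    set -f
    set -- $1                   # unquoted on purpose: field splitting
    echo "$#"
)

for s in 'a' 'a,'; do
    printf '%s: implicit=%s shell=%s\n' "$s" \
        "$(count_implicit_terminator "$s")" "$(count_shell "$s")"
done
# prints:
# a: implicit=1 shell=1
# a,: implicit=2 shell=1
```

The two counts agree for 'a' but diverge for 'a,', which is the case
where reading 1) behaves like a separator.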

We can go for 2) of course, but that is not specified anywhere in
POSIX either; it's just common practice.
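
For comparison, reading 2) can be sketched as "drop one trailing ',' if
present, then treat every remaining ',' as a separator". The function
names below are invented for this example, and the model only covers a
single non-whitespace IFS character:

```shell
# Hypothetical model of reading 2): the last terminator is optional.
count_optional_terminator() {
    s=$1
    if [ -z "$s" ]; then
        echo 0                  # empty input: no fields at all
        return
    fi
    s=${s%,}                    # drop one trailing ',' if there is one
    n=1
    while :; do
        case $s in
            *,*) s=${s#*,}; n=$((n + 1)) ;;   # each remaining ',' separates
            *)   break ;;
        esac
    done
    echo "$n"
}

# What the shell itself does with IFS=, (run in a subshell).
count_shell() (
    IFS=,
    set -f
    set -- $1                   # unquoted on purpose: field splitting
    echo "$#"
)

for s in 'a' 'a,' 'a,b' 'foo,bar,,roo,'; do
    printf '%s: optional=%s shell=%s\n' "$s" \
        "$(count_optional_terminator "$s")" "$(count_shell "$s")"
done
# prints:
# a: optional=1 shell=1
# a,: optional=1 shell=1
# a,b: optional=2 shell=2
# foo,bar,,roo,: optional=4 shell=4
```

On these inputs the model matches what bash does, which is consistent
with 2) being the behavior in practice.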

-- 
Felipe Contreras


