bug-bash

Re: IFS field splitting doesn't conform with POSIX


From: Felipe Contreras
Subject: Re: IFS field splitting doesn't conform with POSIX
Date: Thu, 30 Mar 2023 11:52:06 -0600

On Thu, Mar 30, 2023 at 11:22 AM Kerin Millar <kfm@plushkava.net> wrote:
>
> On Thu, 30 Mar 2023 07:51:59 -0600
> Felipe Contreras <felipe.contreras@gmail.com> wrote:
>
> > On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge <greg@wooledge.org> wrote:
> > >
> > > On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > > >     IFS=,
> > > >     str='foo,bar,,roo,'
> > > >     printf '"%s"\n' $str
> > > >
> > > > There is a discrepancy between how this is interpreted between bash
> > > > and zsh: in bash the last comma doesn't generate a field and is
> > > > ignored,
> > >
> > > ... which is correct according to POSIX (but not sensible).
> > >
> > > > in zsh a last empty field is generated. Initially I was going
> > > > to report the bug in zsh, until I read what the POSIX specification
> > > > says about field splitting [1].
> > >
> > > You seem to have misinterpreted whatever you read.
> > >
> > > https://mywiki.wooledge.org/BashPitfalls#pf47
> > >
> > >     Unbelievable as it may seem, POSIX requires the treatment of IFS as
> > >     a field terminator, rather than a field separator. What this means
> > >     in our example is that if there's an empty field at the end of the
> > >     input line, it will be discarded:
> > >
> > >     $ IFS=, read -ra fields <<< "a,b,"
> > >     $ declare -p fields
> > >     declare -a fields='([0]="a" [1]="b")'
> > >
> > >     Where did the empty field go? It was eaten for historical reasons
> > >     ("because it's always been that way"). This behavior is not unique
> > >     to bash; all conformant shells do it.
> >
> > If you think in terms of terminators instead of separators, then the
> > above code makes sense because if you add ',' at the end of each field
> > (terminate it), you get the original string:
> >
> >     printf '%s,' ${fields[@]}
> >
> > But you can't replicate 'a,b' that way, because b does not have a
> > terminator. Obviously we'll want 'b' as a field, therefore one has to
> > assume either 1) the end of the string is considered an implicit
> > terminator, or 2) the terminator in the last field is optional.
> > Neither of these two things is specified in POSIX.
> >
> > If we consider 1) the end of the string is considered an implicit
> > terminator, then 'a' contains a valid field, but then 'a,' contains
> > *two* fields. Making these terminators indistinguishable from
> > separators.
> >
> > We can go for 2) of course, but this is not specified anywhere in
> > POSIX, that's just common practice.
>
> You may find these interesting; the second link in particular.

Indeed.

> - https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00033.html
> - https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00035.html

This says precisely what I said in 1):

Chet wrote:
> Alternately, you can think of the NUL at the end of the string as an
> additional field terminator,

Except if you do that, then 'a,' has two fields since the end of the
string is an additional field terminator, as I explained.
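To make the counts concrete, here is a minimal sketch. The helper
count_fields is a hypothetical name introduced only for illustration; it
relies on nothing beyond unquoted expansion with IFS set to a comma, and
the counts in the comments are what bash produces, consistent with the
behaviour described earlier in this thread:

```shell
# count_fields (hypothetical helper): print how many fields the shell
# produces when splitting its argument on commas.
# The body runs in a subshell so the IFS and set -f changes do not leak.
count_fields() (
    IFS=,
    set -f          # disable globbing so only field splitting is at work
    set -- $1       # unquoted on purpose: this is where splitting happens
    echo "$#"
)

count_fields 'a'              # 1
count_fields 'a,'             # 1 (not 2: the trailing comma adds no field)
count_fields 'a,,'            # 2 (the second comma terminates an empty field)
count_fields 'foo,bar,,roo,'  # 4
```

If the end of the string were an ordinary terminator, 'a,' would give 2
here rather than 1.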

> but one that follows the adjacency rules and doesn't create any empty
> fields.

So it's a *very special* field terminator that is mentioned nowhere in
the POSIX specification.
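
One way to see how special it is: split a string, then put a ',' back
after every field. The sketch below assumes bash-like splitting, and
reterminate is a hypothetical helper name, not anything taken from POSIX
or from the linked messages:

```shell
# reterminate (hypothetical helper): split the argument on commas, then
# print every resulting field followed by an explicit ',' terminator.
reterminate() (
    IFS=,
    set -f          # no globbing, only field splitting
    set -- $1       # unquoted on purpose
    for field in "$@"; do
        printf '%s,' "$field"
    done
)

reterminate 'a,b,'   # prints a,b,  (round-trips to the original string)
reterminate 'a,b'    # prints a,b,  (same output: the implicit terminator is lost)
```

Both inputs split into the same two fields, so the implicit terminator at
the end of 'a,b' cannot be told apart from an explicit one.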

> - http://std.dkuug.dk/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html
>
> Though I was aware of these behaviours, I do find the POSIX wording to be 
> unclear as concerns the observations made by the second link, to say the 
> least.

So I'm not the only one who thinks it's unclear.

Not to mention the small detail that the Internal Field Separator is
not a *separator*, but a terminator (with certain exceptions).

-- 
Felipe Contreras


