bug-bash

Re: IFS field splitting doesn't conform with POSIX


From: Kerin Millar
Subject: Re: IFS field splitting doesn't conform with POSIX
Date: Thu, 30 Mar 2023 18:22:51 +0100

On Thu, 30 Mar 2023 07:51:59 -0600
Felipe Contreras <felipe.contreras@gmail.com> wrote:

> On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge <greg@wooledge.org> wrote:
> >
> > On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > >     IFS=,
> > >     str='foo,bar,,roo,'
> > >     printf '"%s"\n' $str
> > >
> > > There is a discrepancy between how bash and zsh interpret this:
> > > in bash the last comma doesn't generate a field and is
> > > ignored,
> >
> > ... which is correct according to POSIX (but not sensible).
> >
> > > in zsh a last empty field is generated. Initially I was going
> > > to report the bug in zsh, until I read what the POSIX specification
> > > says about field splitting [1].
> >
> > You seem to have misinterpreted whatever you read.
> >
> > https://mywiki.wooledge.org/BashPitfalls#pf47
> >
> >     Unbelievable as it may seem, POSIX requires the treatment of IFS as
> >     a field terminator, rather than a field separator. What this means
> >     in our example is that if there's an empty field at the end of the
> >     input line, it will be discarded:
> >
> >     $ IFS=, read -ra fields <<< "a,b,"
> >     $ declare -p fields
> >     declare -a fields='([0]="a" [1]="b")'
> >
> >     Where did the empty field go? It was eaten for historical reasons
> >     ("because it's always been that way"). This behavior is not unique
> >     to bash; all conformant shells do it.
> 
> If you think in terms of terminators instead of separators, then the
> above code makes sense because if you add ',' at the end of each field
> (terminate it), you get the original string:
> 
>     printf '%s,' "${fields[@]}"
> 
> But you can't replicate 'a,b' that way, because b does not have a
> terminator. Obviously we'll want 'b' as a field, therefore one has to
> assume either 1) the end of the string is considered an implicit
> terminator, or 2) the terminator in the last field is optional.
> Neither of these two things is specified in POSIX.
> 
> If we consider 1), that the end of the string is an implicit
> terminator, then 'a' contains a valid field, but then 'a,' contains
> *two* fields, making these terminators indistinguishable from
> separators.
> 
> We can go for 2) of course, but this is not specified anywhere in
> POSIX; that's just common practice.
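The terminator reading discussed above can be checked directly in bash. The sketch below is an illustration added here, not part of the original exchange, and the function names `count_fields` and `rejoin_as_terminated` are hypothetical. It splits with `read`, as in the quoted wiki example, and then appends ',' to every field:

```bash
# Hypothetical helper: split $1 on ',' using read (POSIX field splitting,
# where a trailing comma terminates the last field rather than starting an
# empty one) and print the number of fields produced.
count_fields() {
    local -a fields
    IFS=, read -ra fields <<< "$1"
    printf '%d\n' "${#fields[@]}"
}

# Hypothetical helper: split $1 the same way, then rejoin by appending ','
# to every field, i.e. treating ',' strictly as a terminator.
rejoin_as_terminated() {
    local -a fields
    IFS=, read -ra fields <<< "$1"
    printf '%s,' "${fields[@]}"
    printf '\n'
}

count_fields 'a,b,'           # 2: the trailing comma terminates "b"
count_fields 'a,b'            # 2: the end of input also ends "b"
rejoin_as_terminated 'a,b,'   # a,b,  (the original string)
rejoin_as_terminated 'a,b'    # a,b,  (not the original 'a,b')
```

Both inputs yield two fields, and rejoining restores 'a,b,' but turns 'a,b' into 'a,b,', which is the asymmetry described above.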

You may find these interesting; the second link in particular.

- https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00033.html
- https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00035.html
- http://std.dkuug.dk/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html

Though I was aware of these behaviours, I find the POSIX wording unclear, to say
the least, with respect to the observations made in the second link. I would add
that it is possible to have it both ways, so to speak (that is, to obtain
separator semantics, in which a trailing comma yields a final empty field),
though the means of going about it are no less confusing than the topic at large.

$ IFS=,
$ str="a,b"
$ arr=($str""); declare -p arr
declare -a arr=([0]="a" [1]="b")
$ str="a,b,"
$ arr=($str""); declare -p arr # duly coercing an empty field that some may expect or wish for
declare -a arr=([0]="a" [1]="b" [2]="")
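The `$str""` trick above can be wrapped in a reusable function. The following is a minimal sketch, assuming bash; the names `split_keep_trailing`, `split_keep_trailing_read` and the global array `SPLIT_RESULT` are hypothetical. Because the expansion is unquoted, it also disables pathname expansion so that a field such as `*` is not globbed. For comparison it includes a `read`-based variant that appends one extra comma to the input, a common workaround that is not proposed in this thread:

```bash
# Hypothetical helpers: split $1 on ',' with separator semantics (a trailing
# comma yields a final empty field) into the global array SPLIT_RESULT.

# 1) The $str"" trick: the quoted empty string adjoining the unquoted
#    expansion makes bash keep the final empty field.
split_keep_trailing() {
    local IFS=,                      # only affects splitting in this function
    local glob_was_on=0
    case $- in
        *f*) ;;                      # pathname expansion already disabled
        *) glob_was_on=1; set -f ;;  # keep fields such as '*' from globbing
    esac
    SPLIT_RESULT=($1"")
    if (( glob_was_on )); then set +f; fi
}

# 2) A read-based alternative: appending one ',' gives every field an
#    explicit terminator. Assumes $1 contains no newline, since read
#    consumes only the first line of the here-string.
split_keep_trailing_read() {
    IFS=, read -ra SPLIT_RESULT <<< "$1,"
}

split_keep_trailing 'a,b,';      declare -p SPLIT_RESULT  # three elements: a, b, ""
split_keep_trailing 'a,b';       declare -p SPLIT_RESULT  # two elements: a, b
split_keep_trailing_read 'a,b,'; declare -p SPLIT_RESULT  # three elements: a, b, ""
```

The `read` variant never performs pathname expansion but handles only single-line input; the array-assignment variant copes with embedded newlines but has to guard against globbing itself.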

-- 
Kerin Millar


