bug-bash
field splitting with IFS non-whitespace


From: Greg Wooledge
Subject: field splitting with IFS non-whitespace
Date: Tue, 11 Jan 2011 15:36:11 -0500
User-agent: Mutt/1.4.2.3i

POSIX 2.6.5 Field Splitting [1] says, in part,

1. If IFS is <space><tab><newline> or unset, ...
2. If IFS is null, ...
3. Otherwise, ...
  b. Each occurrence in the input of an IFS character that is not IFS
     white space, along with any adjacent IFS white space, shall delimit
     a field, as described previously.

I'm attempting to understand what exactly "delimit a field" means.
Specifically, consider this case:

$ (x=bar, IFS=,; set -f; a=($x); printf "<%s> " "${a[@]}"; echo)
<bar> 

With an input string ending in a non-whitespace IFS character, bash
apparently drops that final delimiter altogether, rather than creating
an empty second field.  Bash 2.05 through 4.2-beta all do this, as do
ksh88 and ksh93.

Is that the correct behavior?  Does "delimit a field" mean "end a field,
and possibly start a new one if there's something after it", or does it
always mean "start a new field"?  (It seems bash and ksh use the former
definition.)

I expected to see two fields resulting, largely because:

$ (x=,bar IFS=,; set -f; a=($x); printf "<%s> " "${a[@]}"; echo)
<> <bar> 

An IFS delimiter at the start of the string is not "ignored" the way
an IFS delimiter at the end appears to be.
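
To put the leading and trailing cases side by side, here is a small
sketch that prints the field count along with the fields.  The helper
name count_fields is made up for this illustration, and it uses
"set --" instead of an array so the count is easy to see; the outputs
noted in the comments are what bash gives under the "delimiter ends a
field" reading described above.

```bash
#!/usr/bin/env bash
# count_fields is a hypothetical helper, not part of any shell.
# The function body is a subshell, so IFS and set -f do not leak out.
count_fields() (
    IFS=,
    set -f                  # no globbing on the unquoted expansion
    set -- $1               # unquoted on purpose: field splitting happens here
    printf '%d:' "$#"
    [ "$#" -gt 0 ] && printf '<%s>' "$@"
    echo
)

count_fields 'bar,'     # 1:<bar>      trailing delimiter ends the only field
count_fields ',bar'     # 2:<><bar>    leading delimiter ends an empty field
count_fields 'bar,,'    # 2:<bar><>    second delimiter ends an empty field
```

So only the very last delimiter is consumed without producing a field
of its own; every other delimiter terminates a (possibly empty) field.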

The question gets slightly more interesting when we look at read:

$ (IFS=, read -r a <<< "bar,"; echo "<$a>")
<bar>

Normally I would expect read with a single argument variable to put
the entire input line, minus leading/trailing IFS *whitespace*, into
that variable.

But apparently that's not what it does in bash or ksh, much to my
surprise.  A *single* trailing IFS non-whitespace delimiter gets eaten.
But multiple trailing IFS non-whitespace delimiters do not:

$ (IFS=, read -r a <<< "bar,,"; echo "<$a>")
<bar,,>

I can understand the behavior here, actually, due to the "If there are
fewer vars than fields" clause of POSIX's definition of read. [2]  It's
just the single-delimiter case that's got me mixed up.
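
The read cases can be collected into one sketch as well.  The helpers
read_one and read_two are invented names for this illustration, and a
here-document stands in for the here-string used above; the first two
outputs are the bash results already shown, and the third is the
two-variable case for comparison.

```bash
#!/usr/bin/env bash
# read_one and read_two are hypothetical helpers for illustration only.
# Each body is a subshell so the variables do not leak into the caller.
read_one() (
    IFS=, read -r a <<EOF
$1
EOF
    printf '<%s>\n' "$a"
)

read_two() (
    IFS=, read -r a b <<EOF
$1
EOF
    printf '<%s> <%s>\n' "$a" "$b"
)

read_one 'bar,'     # <bar>       single trailing delimiter is dropped
read_one 'bar,,'    # <bar,,>     more fields than vars: remainder kept as-is
read_two 'bar,'     # <bar> <>    second variable is simply empty
```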


[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

[2] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html