bug-datamash

Re: Different delimiter for 'collapse'


From: Erik Auerswald
Subject: Re: Different delimiter for 'collapse'
Date: Sat, 13 Feb 2021 22:00:14 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

Hi,

On 13.02.21 18:14, Eric Powell wrote:
> Datamash is such a wonderful piece of software and I am so happy to have
> discovered it.
> One feature that I wish was available is to change the delimiter for the
> collapse operation.  My data has commas in it already so I cannot
> distinguish between those and the commas produced by collapse.  It would be
> great if there was a command-line flag allowing the user to choose the
> delimiter used by collapse.

I second this.  It is often useful and convenient to be able to
choose input and/or output separators.  Many programs allow
specifying those via options.  It would be nice for datamash to
have such options, too.

It already provides the option "-t, --field-separator=X", but
that does not affect the separator for "collapse":

    $ printf '1:a\n2:b\n3:c\n' | datamash -t: collapse 1,2
    1,2,3:a,b,c

The new option could be another (optional) argument for collapse,
e.g., "collapse FIELD_LIST [COMMA]", where COMMA is the character
to use as the 'comma' between collapsed values.  Or something
like "collapse FIELD_LIST[:COMMA]" to simplify parsing.

That said, you may be able to use other GNU tools to pre- and
post-process the data.  For example, you could replace the commas in
the input data with some other character, e.g. a semicolon (';'),
process the data with datamash, and then turn every semicolon in
the output back into a comma while turning every comma that datamash
added into something else, e.g. a pipe ('|'):

    $ tr ',' ';' < INPUT_DATA | datamash collapse 1 | tr ';,' ',|'

The semicolon and pipe are just examples; you can use whatever
characters are convenient, as long as they do not already occur
in the data.
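
To make the individual steps easier to follow, here is a small sketch
of the same workaround on made-up sample data.  The function names
protect_commas and restore_commas are hypothetical helpers invented
for this illustration (only tr and datamash are real tools), and the
sketch assumes that the data contains neither semicolons nor pipes:

```shell
#!/bin/sh
# Hypothetical helper names, made up for this example.

# Step 1: hide the commas that belong to the data behind a placeholder.
protect_commas() { tr ',' ';'; }

# Step 3: restore the data's own commas (';' -> ',') and turn the
# commas added by collapse into pipes (',' -> '|') in a single pass.
restore_commas() { tr ';,' ',|'; }

# Step 3 in isolation, on a line shaped like collapse output:
printf 'x;y,z;w\n' | restore_commas    # prints x,y|z,w

# Full pipeline (step 2 is datamash itself); only run if it is installed.
# For this sample input it should print the same line as above.
if command -v datamash >/dev/null 2>&1; then
    printf 'x,y\nz,w\n' | protect_commas | datamash collapse 1 | restore_commas
fi
```

Doing both replacements in one tr call matters: two separate calls
would turn the restored commas into pipes as well (or the other way
around, depending on the order).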

Thanks,
Erik


