bug-datamash
Re: quoted fields and decimal separators


From: code
Subject: Re: quoted fields and decimal separators
Date: Tue, 04 Jan 2022 09:14:17 +0100
User-agent: Roundcube Webmail/1.2.3

Thanks for your answer.

I agree the third example you gave is trickier, but for the first two,
I got it to work by using the "-t" flag to set the delimiter and making
sure I have a German locale set. E.g., if you put the first set of data
into /tmp/ex1.txt, this works for me:

$ sed 's/"//g' /tmp/ex1.txt | LC_ALL=de_DE.UTF-8 datamash -t';' -H \
    groupby 4 sum 9
GroupBy(Buchungstext);sum(Betrag)
BARGELDAUSZAHLUNG;-50
GUTSCHR. UEBERWEISUNG;10
KARTENZAHLUNG;-21,98
ENTGELTABSCHLUSS;-4,9
GUTSCHR. UEBERWEISUNG;10
DAUERAUFTRAG;-17,5
ENTGELTABSCHLUSS;-4,9

Thank you.
As mentioned in the other mail: this works, but I still wish I could override the separator, as I come across differently formatted data quite frequently. Just removing the quotation marks works here, but the quotes might be there for a reason, and a single quotation mark somewhere in the text might change the result, possibly without the user noticing.
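
To make that concern concrete, here is a minimal Python sketch. The sample line is made up for illustration (it is not taken from the data in this thread); it only assumes a quoted text field that happens to contain the ";" delimiter. It compares blind quote removal, as the sed command does, with parsing by the standard csv module.

```python
import csv
import io

# Hypothetical sample line: the quoted text field contains the delimiter.
line = '"2022-01-03";"KARTENZAHLUNG; SUPERMARKT";"-21,98"\n'

# Naive approach: drop every quote, then split on ';'
# (roughly what sed 's/"//g' followed by a ';'-delimited tool sees).
naive = line.strip().replace('"', '').split(';')

# csv-aware parsing keeps the quoted field in one piece.
parsed = next(csv.reader(io.StringIO(line), delimiter=';', quotechar='"'))

print(len(naive), naive)    # the text field is split in two
print(len(parsed), parsed)  # the text field stays intact
```

With the naive approach the amount silently moves from the third to the fourth column, which is the kind of change a user might not notice.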

I didn't know about the locale setting.
Could we please include it in the manual?
I didn't see any mention of decimal separators or locales there.


For pathological data, you may need a csv-aware utility, e.g. I once
wrote a little program using libcsv which could easily change the
delimiters when piping this kind of data around.
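
As a rough idea of what such a csv-aware delimiter changer could look like, here is a sketch using Python's standard csv module instead of libcsv. The function name convert_delimiter and the sample data are assumptions made for illustration, not the program mentioned above.

```python
import csv
import io


def convert_delimiter(text, in_delim=';', out_delim='\t'):
    """Re-emit delimited text using a different field delimiter.

    Quotes are interpreted on input and only written on output when a
    field would otherwise be ambiguous (QUOTE_MINIMAL).
    """
    reader = csv.reader(io.StringIO(text), delimiter=in_delim, quotechar='"')
    out = io.StringIO()
    writer = csv.writer(out, delimiter=out_delim,
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical input: quoted fields, ';' as delimiter, one embedded ';'.
sample = ('"Buchungstext";"Betrag"\n'
          '"KARTENZAHLUNG; SUPERMARKT";"-21,98"\n')

converted = convert_delimiter(sample)
print(converted, end='')
```

The tab-separated result has no quotes left and no ambiguous delimiters, so it can be piped to a tool that splits on tabs. The decimal comma is left untouched, so the locale setting is still needed for the arithmetic.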

Perhaps this is what you want: https://miller.readthedocs.io/en/latest/

Actually this is an amazing tool, thanks for that.
I just tried it and it does a very good job.
It has a ton of options, but they do seem well organized (and they seem necessary to parse the ton of different inputs it is designed to handle).

I think with miller and datamash I can tackle most of my problems.

Thanks a lot,
Johannes


