Re: [Bug-datamash] Control the decimal separator


From: Assaf Gordon
Subject: Re: [Bug-datamash] Control the decimal separator
Date: Tue, 17 Oct 2017 17:35:34 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

Hello,

On 2017-10-16 08:21 AM, Magnus Göransson wrote:
> I used the LC_NUMERIC environment variable to control the decimal separator, for example:
>
> export LC_NUMERIC=sv_SE.UTF-8
>
> Setting this in my bash script solved the problem that my decimals (",") were incorrectly interpreted when the script was run from crontab, where the environment is very limited. The error from datamash was "invalid numeric value in line", due to the wrong interpretation.

Thank you for the report.

To ensure I understand the problem, can you confirm the following:

1.
In the sv_SE locale, thousands are separated by a space and decimals (fractions) by a comma.
I see the following on my computer:

  $ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 300.3 2000.2 1000.1
  300,3
  2 000,2
  1 000,1

  $ env LC_NUMERIC=en_CA.UTF-8 printf "%'.1f\n" 300.3 2000.2 1000.1
  300.3
  2,000.2
  1,000.1



2.
When dealing only with fractions (no thousands separators),
I see that "datamash" does work correctly if one sets the LC_NUMERIC locale, which is consistent with other GNU programs (e.g. "sort"):

  $ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1.5 1.1 1.7
  1,5
  1,1
  1,7

  $ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1.5 1.1 1.7 \
       |LC_NUMERIC=sv_SE.UTF-8 sort -k1g,1 -s
  1,1
  1,5
  1,7

With correct locale:

  $ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1.5 1.1 1.7 \
       | LC_NUMERIC=sv_SE.UTF-8 datamash sum 1
  4,3

With incorrect locale:

  $ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1.5 1.1 1.7 \
      | LC_NUMERIC=en_CA.UTF-8 datamash sum 1
  datamash: invalid numeric value in line 1 field 1: '1,5'


Is that what you are experiencing?
Or do you get datamash errors even in the correct locale?




3.
When the thousands-separator space character is included, I see that
datamash does have problems parsing the values:

  $ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1000.1 300.3 2000.2
  1 000,1
  300,3
  2 000,2

  $ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1000.1 300.3 2000.2 \
          | LC_ALL=sv_SE.UTF-8 datamash sum 1
  datamash: invalid numeric value in line 1 field 1: '1 000,1'


However, I see that other GNU programs also do not parse these numbers:

  $ env LC_NUMERIC=sv_SE.UTF-8 printf "%'.1f\n" 1000.1 300.3 2000.2 \
          | LC_ALL=sv_SE.UTF-8 sort -k1g,1 -s --debug
  sort: using ‘sv_SE.UTF-8’ sorting rules
  1 000,1
  _
  2 000,2
  _
  300,3
  _____


---

You mentioned only "decimal separator" issues. These should be solved
by setting LC_NUMERIC=sv_SE.UTF-8 before executing 'datamash'.
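
As a minimal sketch of that for a script started from cron (assuming a POSIX shell; the helper name locale_available is hypothetical and not part of datamash), the locale can be exported at the top of the script. The optional check is there because a locale that is not installed is usually ignored quietly, leaving the program in the "C" locale:

```shell
#!/bin/sh
# cron runs jobs with a minimal environment, so set the numeric locale explicitly.
# Note: if LC_ALL is set, it takes precedence over LC_NUMERIC.
LC_NUMERIC=sv_SE.UTF-8
export LC_NUMERIC

# Hypothetical helper: succeed if the named locale appears in `locale -a`.
# Names are compared case-insensitively and without '-', because `locale -a`
# often prints "sv_SE.utf8" rather than "sv_SE.UTF-8".
locale_available() {
    want=$(printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-' | grep -Fxq -- "$want"
}

if ! locale_available "$LC_NUMERIC"; then
    echo "warning: locale $LC_NUMERIC is not installed" >&2
fi
```

The rest of the script can then call datamash as usual.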

As for the thousands separator: the current code does not support a space as a separator, in line with other GNU programs.
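
If the input does contain such separators, one possible workaround is to strip them before the data reaches datamash. The following is only a sketch: strip_thousands_sep is a hypothetical helper, and it assumes that fields are separated by tabs (the datamash default), so that every space in a line can safely be removed. Depending on the system, the separator may be a plain space or a no-break space (U+00A0 or U+202F), so all three are handled:

```shell
# Hypothetical helper: delete space-like thousands separators from stdin.
# Assumes spaces never act as field separators or as meaningful data.
strip_thousands_sep() {
    nbsp=$(printf '\302\240')        # U+00A0 NO-BREAK SPACE as UTF-8 bytes
    nnbsp=$(printf '\342\200\257')   # U+202F NARROW NO-BREAK SPACE as UTF-8 bytes
    # LC_ALL=C makes sed match the separators byte by byte.
    LC_ALL=C sed -e 's/ //g' -e "s/$nbsp//g" -e "s/$nnbsp//g"
}
```

It could then be placed in the pipeline just before datamash, for example "... | strip_thousands_sep | LC_NUMERIC=sv_SE.UTF-8 datamash sum 1".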

Is the above sufficient to work around the issue, or do you experience other issues?

regards,
 - assaf






