Re: [BUG] fractional bin sizes do not work in some locales (e.g., de_DE.
From: Erik Auerswald
Subject: Re: [BUG] fractional bin sizes do not work in some locales (e.g., de_DE.UTF-8)
Date: Sat, 25 Jun 2022 00:36:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1
Hi Tim,
On 24.06.22 23:36, Tim Rice wrote:
Hey Erik,
while looking at the binning issues reported by Andreas Schamanek[0] I
noticed that providing floating point numbers as bin sizes does not work
when using a locale where comma (',') is used as decimal separator:
$ echo $LC_NUMERIC
de_DE.UTF-8
...
$ echo 1,15 | datamash bin:0,1 1
datamash: missing field for operation ‘bin’
I was having a play around with this, and (plot twist!), things work as
expected when using LC_ALL instead of LC_NUMERIC:
```
$ datamash sum 1 <<< 1,1
datamash: invalid numeric value in line 1 field 1: '1,1'
$ LC_ALL=de_DE.utf8 datamash sum 1 <<< 1,1
1,1
```
I cannot reproduce that; both LC_NUMERIC and LC_ALL work for me.
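For reference when comparing the two setups: POSIX resolves the value of a category such as LC_NUMERIC from LC_ALL first, then from the category variable itself, then from LANG, and treats empty values as unset. A minimal Python sketch of that lookup order follows; the helper name effective_numeric_locale is made up for illustration, and the "C" fallback is an assumption (the default is implementation-defined).

```python
def effective_numeric_locale(env):
    """Return the locale name that governs LC_NUMERIC for the given environment.

    `env` is a plain dict standing in for the process environment.
    """
    # POSIX precedence: LC_ALL overrides LC_NUMERIC, which overrides LANG.
    # Empty strings count as unset.
    for name in ("LC_ALL", "LC_NUMERIC", "LANG"):
        value = env.get(name)
        if value:
            return value
    # Assumed fallback; the real default is implementation-defined.
    return "C"


print(effective_numeric_locale({"LC_NUMERIC": "de_DE.UTF-8"}))                 # de_DE.UTF-8
print(effective_numeric_locale({"LC_ALL": "C", "LC_NUMERIC": "de_DE.UTF-8"}))  # C
```

So an LC_ALL that is already set in the environment would hide any LC_NUMERIC setting, and a variable that is set in the shell but not exported is not seen by the child process at all.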
Reading numbers in de_DE.UTF-8 format works:
$ printf '%s\n' 1,1 2,2 | ./datamash sum 1
3,3
They can be binned into buckets, too:
$ printf '%s\n' 1,1 2,2 | ./datamash --full bin:1 1
1,1 1
2,2 2
But the bucket size cannot be a floating point number:
$ printf '%s\n' 1,1 2,2 | ./datamash --full bin:1,1 1
./datamash: missing field for operation ‘bin’
$ printf '%s\n' 1,1 2,2 | ./datamash --full bin:1.1 1
./datamash: invalid operand ‘.1 1’
$ printf '%s\n' 1,1 2,2 | ./datamash --full bin:1\\,1 1
./datamash: invalid operation ‘1’
But with a locale using '.' as decimal separator, the bucket size
can be floating point:
$ printf '%s\n' 1.1 2.2 \
> | env LC_NUMERIC=en_US.UTF-8 ./datamash --full bin:1.1 1
1.1 1.1
2.2 2.2
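The outputs above are consistent with binning that maps each value to the lower edge of its bucket. A minimal Python sketch, assuming that semantics (floor(value / size) * size); this is not taken from the datamash sources, and bin_value is a hypothetical name:

```python
import math


def bin_value(value, bucket_size):
    """Return the lower edge of the bucket that contains `value`.

    Assumed semantics: floor(value / bucket_size) * bucket_size.
    """
    return math.floor(value / bucket_size) * bucket_size


for size in (1, 1.1):
    for value in (1.1, 2.2):
        print(value, bin_value(value, size))
```

For the inputs used above this yields 1 and 2 with a bucket size of 1, and 1.1 and 2.2 with a bucket size of 1.1. With other values, binary floating point can put a value that sits exactly on a bucket edge into the neighboring bucket.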
I agree it should also work with LC_NUMERIC. So far, it is mysterious to
me why it doesn't. I tried explicitly using `setlocale(LC_NUMERIC,"")`
in the main function (where LC_ALL is set), but nothing seems to "stick".
That is because the problem is not in reading locale-specific input,
but in parsing an operation specification that contains a floating-point
number using ',' as the decimal separator. The comma has a special
meaning in operation parsing.
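To illustrate the ambiguity, here is a toy tokenizer in Python. It is only a sketch under the assumption that ':' and ',' are punctuation tokens and '.' is the only decimal separator; it is not datamash's actual operation parser, and the token rules are invented for this example.

```python
import re


def tokenize(spec):
    """Split an operation spec into words, numbers, and ':' / ',' tokens.

    Toy rules: numbers may contain '.', while ',' is always a separator.
    """
    return re.findall(r"[A-Za-z_]+|\d+(?:\.\d+)?|[:,]", spec)


print(tokenize("bin:1.1"))  # ['bin', ':', '1.1']
print(tokenize("bin:1,1"))  # ['bin', ':', '1', ',', '1']
```

Once the comma is a separator at the token level, "1,1" can no longer be read as the single number 1.1, whatever the locale says.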
Do you have any insight about what the problem might be?
Not yet. I suppose the operation parser does not take the locale
setting into account.
I tried checking what other GNU projects do. I thought GNU Awk or GNU bc
might point me in the right direction. In fact, it seems like they don't
even respect LC_ALL:
Yes, they just use '.' as decimal separator and do not honor the
locale setting. I think that is fine.
```
$ awk '{printf "%f %f\n", $1, $2}' <<< "1,1 1.1"
1.000000 1.100000
$ LC_ALL=de_DE.utf8 awk '{printf "%f %f\n", $1, $2}' <<< "1,1 1.1"
1.000000 1.100000
$ LC_ALL=de_DE.utf8 bc <<< '1,1+1,1'
(standard_in) 1: syntax error
(standard_in) 1: syntax error
$ LC_ALL=de_DE.utf8 bc <<< '1.1+1.1'
2.2
```
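The two approaches can be contrasted in a small Python sketch. Python's built-in float() always expects '.', much like bc here, but it rejects "1,1" instead of stopping at the comma as awk does. The helper parse_number and its decimal_point parameter are hypothetical, written only to show what honoring a locale's separator would involve.

```python
def parse_number(text, decimal_point="."):
    """Parse `text` as a float using the given decimal separator.

    Hypothetical helper: with the default '.', this is plain float().
    """
    if decimal_point != ".":
        if "." in text:
            raise ValueError(f"unexpected '.' in {text!r}")
        text = text.replace(decimal_point, ".")
    return float(text)


print(parse_number("1.1"))       # 1.1
print(parse_number("1,1", ","))  # 1.1
try:
    parse_number("1,1")          # '.'-only parsing rejects the comma
except ValueError as err:
    print("rejected:", err)
```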
So if we can figure this out for GNU Datamash, we may need to raise some
bugs and submit some patches to other GNU projects too :)
I do not think so. I actually prefer the behavior of GNU Awk or bc.
But GNU Datamash has used the locale setting for a long time, so IMHO
we should look into making it work better.
Thanks,
Erik