bug-datamash

2 issues with binning


From: Andreas Schamanek
Subject: 2 issues with binning
Date: Sun, 19 Jun 2022 23:54:48 +0200 (CEST)


Hi everyone,

Recently, I ran into some issues when I tried to calculate histograms using an Awk script of my own. Eventually, I found that I needed to fix my script. As I was digging deeper, I started to look for alternative ways to do the binning. That's how I found datamash, which I wish I had known about all along, as it does so many things I frequently need. So, thanks for this great tool.

Comparing the output of my script with that of datamash, it seems I hit 2 possible "bugs" in datamash. (Disclaimer: I am not a programmer, apart from some scripting skills.)

## Possible issue due to binary floating point arithmetic:

$ printf '%s\n' 4.19 4.2 4.21 | datamash --full bin:0.1 1
4.19    4.1
4.2     4.1
4.21    4.2

Of course, 4.2 should bin to 4.2, unless I am mistaken.
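
The effect can be illustrated outside datamash. The following is a minimal Python sketch, not datamash's actual code: datamash may use a different floating point width internally, and the helper name `bin_decimal` is made up for this example. It shows that the binary values stored for 4.2 and 0.1 have an exact quotient slightly below 42, so taking the floor yields bin 41 (i.e. 4.1), and that decimal arithmetic on the input strings avoids the problem.

```python
from decimal import Decimal, ROUND_FLOOR
from fractions import Fraction
import math


def bin_decimal(value: str, width: str) -> Decimal:
    """Hypothetical helper: bin a decimal string using exact decimal arithmetic."""
    v, w = Decimal(value), Decimal(width)
    # Floor the quotient, then scale back to the lower edge of the bin.
    return (v / w).to_integral_value(rounding=ROUND_FLOOR) * w


# Fraction(float) gives the exact value of the stored binary double.
exact_quotient = Fraction(4.2) / Fraction(0.1)
print(exact_quotient < 42)         # True: the stored values divide to just under 42
print(math.floor(exact_quotient))  # 41, which corresponds to bin 4.1

for s in ("4.19", "4.2", "4.21"):
    print(s, bin_decimal(s, "0.1"))
# 4.19 4.1
# 4.2 4.2
# 4.21 4.2
```

Working on the decimal strings is only one possible remedy; rounding the quotient to a small tolerance before flooring would be another, with different trade-offs.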

## Possible issue with binning negative numbers:

$ printf '%s\n' 0 1 2  | datamash --full bin:2 1
0       0
1       0
2       2

For positive numbers, the bins are inclusive ("[") on the lower end, exclusive on the upper end (")"), i.e. here they are [0,2) and [2,4).

I expected this type of binning to continue for negative numbers, i.e. that the bin left of [0,2) is [-2,0) and the next one would be [-4,-2). However:

$ printf '%s\n' -2 -1 0  | datamash --full bin:2 1
-2      -4
-1      -2
0       0

I was expecting -2 to map to -2. Maybe, bin:1 shows my concerns even better:

$ printf '%s\n' -2 -1 0 | datamash --full bin:1 1
-2      -3
-1      -2
0       0
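
The difference between the expected and the observed behaviour can be sketched in Python. `bin_floor` implements the half-open bins described above, [k*w, (k+1)*w), for all signs. `bin_trunc_then_decrement` is only a guess at a rule that happens to reproduce the outputs shown (truncate the quotient toward zero, then subtract one for negative inputs); it is an assumption for illustration and has not been checked against datamash's source. Both function names are made up for this example.

```python
import math


def bin_floor(x: float, width: float) -> float:
    """Lower edge of the half-open bin [k*width, (k+1)*width) containing x."""
    return math.floor(x / width) * width


def bin_trunc_then_decrement(x: float, width: float) -> float:
    """Assumed rule: truncate toward zero, then step down one bin if x < 0."""
    q = math.trunc(x / width)
    if x < 0:
        q -= 1  # shifts exact negative multiples one bin too far
    return q * width


for width in (2, 1):
    for x in (-2, -1, 0):
        print(width, x, bin_floor(x, width), bin_trunc_then_decrement(x, width))
# 2 -2 -2 -4
# 2 -1 -2 -2
# 2 0 0 0
# 1 -2 -2 -3
# 1 -1 -1 -2
# 0 maps to 0 for width 1 under both rules
```

Under the floor rule, -2 maps to -2 for both bin widths, and with width 1 every integer is its own bin, which matches the expectation stated above.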

Curious what you think!

--
-- Andreas

     :-)



