bug-datamash

Re: [Bug-datamash] Feature request: percentiles


From: Barry Nisly
Subject: Re: [Bug-datamash] Feature request: percentiles
Date: Tue, 14 Mar 2017 22:28:33 -0700

Hey Assaf,

Attached is a patch that implements percentiles using the parsing and processing you recommended. Check it out and let me know if you have feedback. 

Thanks,
Barry

On Sun, Mar 12, 2017 at 6:56 PM, Assaf Gordon <address@hidden> wrote:
Hello Barry,

Sorry for the delayed response.

> On Mar 6, 2017, at 02:57, Barry Nisly <address@hidden> wrote:
>
> I just found out about datamash and I want to thank you for creating such a useful tool.

Thank you for your kind words.

> My request is to add percentile in addition to the quartile calculations.
>
> I typically deal with latencies and am interested in the 90th, 95th, or 99th percentiles. Arbitrary percentiles would be great but, in looking at the code, it doesn’t seem easy to implement. Creating hardcoded percentile calculations (e.g., 90, 95, 99) would be simple (adding the opcodes and connecting them to percentile_value() in src/utils.c).
>
> Ideally, I could specify an arbitrary percentile, e.g., ‘percentile_93’ and have the parser parse out the percentile and pass it along with the ‘percentile’ opcode.
>
> I may take a crack at implementing this as time permits and if there is any interest in the feature.

I like this idea very much.

If I may suggest:
There are already two operations that accept a parameter: 'bin' and 'strbin'.
In their case the optional parameter determines the bucket size.
e.g. default bucket size of 100:
   seq 1 500 | datamash --full bin 1
vs bucket size of 10:
   seq 1 500 | datamash --full bin:10 1

The parser (in op-parser.c) already takes the value after a ':' and uses it as a parameter.
The function op-parser.c:set_op_params() checks if the parameter can be used with the requested operation.
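As a rough illustration of the "value after a ':'" idea, here is a minimal sketch in C. It is not datamash's actual parser code: the names split_op_param() and valid_percentile(), the default value passed by the caller, and the accepted range of 0 to 100 are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of datamash).
   Splits "name:param" in place: on success, spec holds only the
   operation name and *param holds the numeric parameter.
   If there is no ':', *param is set to default_param.
   Returns 0 on success, -1 if the parameter is missing or not a number. */
int
split_op_param (char *spec, long default_param, long *param)
{
  assert (spec != NULL && param != NULL);

  char *colon = strchr (spec, ':');
  if (colon == NULL)
    {
      *param = default_param;
      return 0;
    }

  char *end = NULL;
  errno = 0;
  long v = strtol (colon + 1, &end, 10);
  /* Reject overflow, an empty parameter, and trailing junk. */
  if (errno != 0 || end == colon + 1 || *end != '\0')
    return -1;

  *colon = '\0';   /* spec now ends right after the operation name */
  *param = v;
  return 0;
}

/* Hypothetical range check for a percentile parameter
   (assumed valid range: 0 to 100 inclusive). */
int
valid_percentile (long p)
{
  return p >= 0 && p <= 100;
}
```

With this sketch, "percentile:93" would yield the name "percentile" and the parameter 93, while a bare "percentile" would fall back to whatever default the caller supplies.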

I would try to implement a 'percentile' operation exactly in that way (in terms of parsing).

In terms of processing, it should probably be a case very similar to OP_QUARTILE_1/3/IQR/MEDIAN in 'fields-ops.c'.
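To make the link between quartiles, the median, and an arbitrary percentile concrete, here is a minimal sketch in C. It uses one common definition (linear interpolation between the two nearest ranks of the sorted data); whether that matches what percentile_value() in src/utils.c does is not something this thread confirms. The names percentile_sketch() and cmp_double() are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort(): ascending order of doubles. */
static int
cmp_double (const void *a, const void *b)
{
  double x = *(const double *) a;
  double y = *(const double *) b;
  return (x > y) - (x < y);
}

/* Hypothetical sketch (not datamash code).
   Sorts 'values' in place and returns the p-th percentile (0 <= p <= 100),
   interpolating linearly between the two nearest ranks. */
double
percentile_sketch (double *values, size_t n, double p)
{
  assert (values != NULL && n > 0);
  assert (p >= 0.0 && p <= 100.0);

  qsort (values, n, sizeof values[0], cmp_double);

  /* Fractional zero-based position of the percentile in the sorted data. */
  double h = (p / 100.0) * (double) (n - 1);
  size_t lo = (size_t) h;                    /* h >= 0, so this truncates to floor */
  size_t hi = (lo + 1 < n) ? lo + 1 : lo;    /* clamp at the last element */
  double frac = h - (double) lo;

  return values[lo] + frac * (values[hi] - values[lo]);
}
```

Under this definition, p = 25, 50, and 75 play the roles of the first quartile, the median, and the third quartile, which is why a single parameterized case could sit alongside the existing ones.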

Please do try your hand at it, and I'm happy to help make it work. Also feel free to send partial patches and we'll discuss and improve them.
I apologize in advance if my replies are a bit delayed; things are a bit hectic at work at the moment.

regards,
 - assaf






Attachment: 0001-percentiles.patch.gz
Description: GNU Zip compressed data

