help-bash

Re: Sort by bash loop command and/or sort


From: Seth David Schoen
Subject: Re: Sort by bash loop command and/or sort
Date: Sat, 15 May 2021 19:44:33 -0700

Budi writes:

> How to sort the numbers such the exponent form numeric values or the
> numbers such these?
> 
> TABLE OF E.G
> 
> A     2.8 x 10-8
> B     3.9 x 10-7
> C      1.3 x 10-6
> D      0.6-0.9 x 10-7
> E      6 x 10-8
> 
> Thought trying it simple:
> 
> $ cat Table_of_eg  | sort -k2.1h
> 
>     TABLE OF E.G
> 
> D      0.6-0.9 x 10-7
> C      1.3 x 10-6
> A     2.8 x 10-8
> B     3.9 x 10-7
> E      6 x 10-8
> But failed !
> Should've given numbers with less (greater negative if any) exponent
> first, if such is same then the less number followed by the next and
> so on ordered in such rule
> Help solve it. Thanks before

Hi Budi, I think it might be more correct and robust to represent these
numbers in a floating-point format in a programming language with a
floating-point data type.

For example, in Python, which has such a type natively, I can do

>>> [0.6e-7, 1.3e-6, 2.8e-8, 3.9e-7, 6e-8]
[6e-08, 1.3e-06, 2.8e-08, 3.9e-07, 6e-08]
>>> sorted([0.6e-7, 1.3e-6, 2.8e-8, 3.9e-7, 6e-8])
[2.8e-08, 6e-08, 6e-08, 3.9e-07, 1.3e-06]

and then the values are sorted by numeric value.  (This doesn't carry
over exactly to your example, because one of the values is a range
rather than a single number; above I used only its lower bound, 0.6e-7.)

Although it seems error-prone, you could get results more like what you
expected with something like

sort -t '-' -k2 -n

telling the sort command to use the minus sign as a delimiter (!!!).
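
To see why this is error-prone, here is a small sketch that runs that
command on the sample rows.  The sample_rows helper is a hypothetical
name of mine, and I'm assuming GNU sort in the C locale; other sort
implementations may break ties differently.

```shell
# Hypothetical helper: prints the sample rows from the original message.
sample_rows() {
    printf '%s\n' \
        'A     2.8 x 10-8' \
        'B     3.9 x 10-7' \
        'C      1.3 x 10-6' \
        'D      0.6-0.9 x 10-7' \
        'E      6 x 10-8'
}

# With '-' as the delimiter, the key (field 2 onward) is "8" for row A
# but "0.9 x 10-7" for row D, because the range adds an extra minus sign.
sample_rows | LC_ALL=C sort -t '-' -k2 -n
```

On these rows I would expect the order D, C, B, A, E: the exponents are
compared as the positive numbers 6, 7 and 8, and row D is compared as
0.9.  Adding -r gets closer to the order you wanted, but the range row
still ends up in the wrong place.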

I think a solution of intermediate robustness is to store each kind of value
in its own field, systematically separated with a consistent delimiter.

So instead of

> A     2.8 x 10-8
> B     3.9 x 10-7
> C      1.3 x 10-6
> D      0.6-0.9 x 10-7
> E      6 x 10-8

you might use the tab-delimited data

A       2.8     2.8     -8
B       3.9     3.9     -7
C       1.3     1.3     -6
D       0.6     0.9     -7
E       6       6       -8

where the fields are "item code", "lowest value", "highest value", and
"exponent".  In that case you could use the sort command more according
to its design, to sort first by column 4 and then by column 3, or
something.  The " x 10" doesn't work that well from the sort command's
point of view as a field separator...
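
As a minimal sketch of that, assuming the tab-delimited layout above
and a sort that accepts per-key options (GNU and BSD sort both do), the
following sorts numerically by exponent and then by highest value.  The
helper names sample_table and sort_table are hypothetical.

```shell
# Hypothetical helper: prints the tab-delimited sample table
# (item code, lowest value, highest value, exponent).
sample_table() {
    printf '%s\t%s\t%s\t%s\n' \
        A 2.8 2.8 -8 \
        B 3.9 3.9 -7 \
        C 1.3 1.3 -6 \
        D 0.6 0.9 -7 \
        E 6 6 -8
}

# Sort stdin numerically by field 4 (exponent), then by field 3 (highest
# value).  LC_ALL=C keeps "." as the decimal point regardless of locale.
sort_table() {
    tab=$(printf '\t')
    LC_ALL=C sort -t "$tab" -k4,4n -k3,3n
}

sample_table | sort_table
```

For the sample data this should list the rows in the order A, E, D, B,
C.  Note that it compares the mantissas as written, so it only gives a
true numeric order if they are all normalized the same way.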

If your data is relatively structured and you want to make it more so,
you can experiment with using tr and/or sed to transform it first.  In
all of these cases, it's important to keep in mind what will happen if
you provide somewhat unexpected data or a somewhat unexpected format.
That's something that some people don't like about the very ease of
doing data processing with text files at a bash command line: it's so
easy to construct pipelines that produce good, desired output for a
sample case, but they may not be robust or have good failure modes if
the input data's format can vary more than the examples that you used in
constructing the pipelines.
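
For instance, a sed/tr transformation for the sample format might look
like the sketch below.  It assumes a sed that supports -E (GNU and BSD
sed do) and assumes that every data line looks like "CODE VALUE x 10EXP"
or "CODE LOW-HIGH x 10EXP"; normalize_table is a hypothetical name.

```shell
# Hypothetical helper: turn lines such as "A     2.8 x 10-8" or
# "D      0.6-0.9 x 10-7" into tab-delimited "code low high exponent".
# Lines matching neither pattern (titles, blank lines, typos) are
# silently dropped, which is itself a failure mode to think about.
normalize_table() {
    sed -n -E \
        -e 's/^([^ ]+) +([0-9.]+)-([0-9.]+) x 10(-?[0-9]+)$/\1 \2 \3 \4/p' \
        -e 's/^([^ ]+) +([0-9.]+) x 10(-?[0-9]+)$/\1 \2 \2 \3/p' |
        tr ' ' '\t'
}

# Example: the title line is dropped, the two data lines are converted.
printf '%s\n' 'TABLE OF E.G' 'A     2.8 x 10-8' 'D      0.6-0.9 x 10-7' |
    normalize_table
```

A line with trailing whitespace, a positive exponent written with "+",
or a differently spelled " x 10" would be dropped here without any
warning, which is exactly the kind of unexpected input I mean.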

This is actually directly related to something I was just reading about,
from essentially the opposite extreme:

https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

although this is a rather advanced and abstract presentation of this
idea.  But part of the argument here is that if you just build up a data
processing pipeline based on things that work well with examples, there
are probably other cases out there that have ambiguous or undefined
meanings from the point of view of that pipeline, and that will produce
undefined or unexpected behavior.

Since this is help-bash and not lets-all-switch-to-expressive-static-typing,
I'll stop this point there. :-)

-- 
Seth David Schoen <schoen@loyalty.org>      |  Qué empresa fácil no pensar
     http://www.loyalty.org/~schoen/        |  en un tigre, reflexioné.
                                            |        -- Borges, "El Zahir"


