[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: FYI: nth_element
From: |
Jaroslav Hajek |
Subject: |
Re: FYI: nth_element |
Date: |
Thu, 15 Oct 2009 14:31:22 +0200 |
On Thu, Oct 15, 2009 at 2:22 PM, Shai Ayal <address@hidden> wrote:
> On Thu, Oct 15, 2009 at 12:36 PM, Jaroslav Hajek <address@hidden> wrote:
>> On Thu, Oct 15, 2009 at 9:56 AM, Shai Ayal <address@hidden> wrote:
>>> On Thu, Oct 15, 2009 at 9:15 AM, Jaroslav Hajek <address@hidden> wrote:
>>>> hi all,
>>>>
>>>> Octave has just got a new function: nth_element:
>>>> http://hg.savannah.gnu.org/hgweb/octave/rev/aea3a3a950e1
>>>>
>>>> the docstring:
>>>>
>>>> -- Built-in Function: nth_element (X, N)
>>>> -- Built-in Function: nth_element (X, N, DIM)
>>>> Select the n-th smallest element of a vector, using the ordering
>>>> defined by `sort'. In other words, the result is equivalent to
>>>> `sort(X)(N)'. N can also be a contiguous range, either ascending
>>>> `l:u' or descending `u:-1:l', in which case a range of elements is
>>>> returned. If X is an array, `nth_element' operates along the
>>>> dimension defined by DIM, or the first non-singleton dimension if
>>>> DIM is not given.
>>>>
>>>> nth_element encapsulates the C++ STL algorithms nth_element and
>>>> partial_sort. On average, the complexity of the operation is
>>>> O(M*log(K)), where `M = size(X, DIM)' and `K = length (N)'. This
>>>> function is intended for cases where the ratio K/M is small;
>>>> otherwise, it may be better to use `sort'.
>>>>
>>>> See also: sort, min, max
>>>>
>>>> In short, it allows extracting a small portion of sort(X) without
>>>> actually doing the full sort.
>>>> This is sometimes useful for statistics when computing quantiles and
>>>> the like... statisticians can surely tell better.
>>>>
>>>> Maybe it can be used to boost some existing functions. As an example,
>>>> it can be readily used to speed up the "median" function:
>>>> http://hg.savannah.gnu.org/hgweb/octave/rev/b7b89061bd0e
>>>>
>>> It can also help hist quite a lot.
>>>
>>
>> I don't see how. Can you clarify?
>
> You are right -- I was confused. hist needs the number of elements
> smaller than a given value (and uses sort to find out), while
> nth_element gives the value of the nth element.
> Writing hist in c++ is not hard and would really help since it uses
> (M+K)log(M+K) sort in order to avoid a M*K loop just because looping
> is slow in the scripting language. One of these days I'll get down to
> doing that ...
>
> Shai
>
Why not just use a binary search, via `lookup'? That would be
O(M*log(K) + K) (M the number of data and K the number of bins).
Look what histc does... maybe hist can even simply call histc.
hth
--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
Re: FYI: nth_element, Jordi Gutiérrez Hermoso, 2009/10/17