[Octave-bug-tracker] [bug #36372] Improved ranks.m included:

octave-bug-tracker

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #36372] Improved ranks.m included:

From:	anonymous
Subject:	[Octave-bug-tracker] [bug #36372] Improved ranks.m included:
Date:	Wed, 02 May 2012 16:50:37 +0000
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20120421 Firefox/3.5.16

URL:
  <http://savannah.gnu.org/bugs/?36372>

                 Summary: Improved ranks.m included:
                 Project: GNU Octave
            Submitted by: None
            Submitted on: Wed 02 May 2012 04:50:36 PM UTC
                Category: None
                Severity: 3 - Normal
                Priority: 5 - Normal
              Item Group: Performance
                  Status: None
             Assigned to: None
         Originator Name: Dave Goel
        Originator Email: address@hidden
             Open/Closed: Open
         Discussion Lock: Any
                 Release: dev
        Operating System: GNU/Linux

    _______________________________________________________

Details:

Hi.

I believe that ranks.m (Though I originally tested from version 3.2.4,
the dev. version's algo is basically unchanged) is very inefficient
except when the elements of the input matrix are distinct.

For example,  for this input, a=round(10*rand(100123,100));

my alternative finished computation in 1.4s while the current function
hasn't yet returned an answer in the last 15-20 minutes.

For smaller inputs, I checked carefully, and the outputs do agree.

---
Additionally, the alternative also implements multiple ranking schemes:
fractional (default), standard competition ranking, modified competition
ranking, and finally, ordinal ranking; examples of all of which are seen
here: http://en.wikipedia.org/wiki/Ranking


----

Though I used a completely new algorithm, I managed to pen the
alternative as a derivative of the current function. Thus, attaching
(a) diff; (b) as the complete file.

The "diff -u" is included below. Below that follows the entire file. Thanks
very much.

----


1a2
> ## Copyright (C) 2012 Dave Goel
6,7c7,8
< ## under the terms of the GNU General Public License as published by
< ## the Free Software Foundation; either version 3 of the License, or (at
---
> ## under the terms of the GNU General Public License as published by the
> ## Free Software Foundation; either version 3 of the License, or (at
10,13c11,14
< ## Octave is distributed in the hope that it will be useful, but
< ## WITHOUT ANY WARRANTY; without even the implied warranty of
< ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
< ## General Public License for more details.
---
> ## Octave is distributed in the hope that it will be useful, but WITHOUT
> ## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> ## FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> ## for more details.
18c19
< 
---
> ##
20,27c21,40
< ## @deftypefn {Function File} {} ranks (@var{x}, @var{dim})
< ## Return the ranks of @var{x} along the first non-singleton dimension
< ## adjusted for ties.  If the optional argument @var{dim} is
< ## given, operate along this dimension.
< ## @seealso{spearman, kendall}
< ## @end deftypefn
< 
< ## Author: KH <address@hidden>
---
> ##
> ## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the
> ## ranks of @var{x} along the first non-singleton dimension adjust for
> ## ties. If the optional argument @var{dim} is given, operate along this
> ## dimension. Third argument @var{rtype} donates the type of ranking: 0
> ## or "fractional" for fractional, which is the default, 1 or
> ## "competition" for competiton ranking, 2 or "modified" for modified
> ## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or
> ## "dense" for dense ranking. When using the third argument, you can
> ## supply us '' for @var{dim} to have it auto-computed. @end deftypefn
> ##
> ## This function returns a result identical to GNU Octave's built-in
> ## ranks.m but is much faster (For example, 1.4s vs. ''no result in
> ## 33mins'' when the input was a=round(10*rand(100123,100));
> ## Additionally, we also handle several ranking schemes. This function
> ## has been submitted to octave's mailing list and may (or may not)
> ## become part of GNU Octave.
> ##
> ##
> ## Authors: KH <address@hidden>, Dave Goel <address@hidden>
30,32d42
< ## This code was rather ugly, since it didn't use sort due to the
< ## fact of how to deal with ties. Now it does use sort and its
< ## even uglier!!! At least it handles NDArrays..
34c44
< function y = ranks (x, dim)
---
> function y = ranks (x, dim, rtype)
36c46
<   if (nargin != 1 && nargin != 2)
---
>   if (nargin < 1);
40,41c50,51
<   if (! (isnumeric (x) || islogical (x)))
<     error ("ranks: X must be a numeric vector or matrix");
---
>   if nargin<3||isempty(rtype);
>     rtype=0;
43c53
< 
---
>   
46c56
<   if (nargin != 2)
---
>   if (nargin<2)||isempty(dim);
48c58,64
<     (dim = find (sz > 1, 1)) || (dim = 1);
---
>     dim  = 1;
>     while (dim < nd + 1 && sz(dim) == 1)
>       dim = dim + 1;
>     endwhile
>     if (dim > nd)
>       dim = 1;
>     endif
50,52c66,69
<     if (!(isscalar (dim) && dim == fix (dim))
<         || !(1 <= dim && dim <= nd))
<       error ("ranks: DIM must be an integer and a valid dimension");
---
>     if (! (isscalar (dim) && dim == round (dim))
>       && dim > 0
>       && dim < (nd + 1))
>       error ("ranks: dim must be an integer and valid dimension");
66,80c83,105
<     sz = size (x);
<     infvec = -Inf ([1, sz(2 : end)]);
<     [xs, xi] = sort (x);
<     eq_el = find (diff ([xs; infvec]) == 0);
<     if (isempty (eq_el))
<       [eq_el, y] = sort (xi);
<     else
<       runs = setdiff (eq_el, eq_el+1);
<       len = diff (find (diff ([Inf; eq_el; -Inf]) != 1)) + 1;
<       [eq_el, y] = sort (xi);
<       for i = 1 : length(runs)
<         y (xi (runs (i) + [0:(len(i)-1)]) + floor (runs (i) ./ sz(1))
<            * sz(1)) = eq_el(runs(i)) + (len(i) - 1) / 2;
<       endfor
<     endif
---
>     sz=size(x);
>     [sx ids]=sort(x); ## sx is sorted x.
>     
>     lin=cumsum(ones(size(x)),1);  ## A linearly increasing array.
> 
>     switch rtype;
>       case {0,"fractional"};
>         lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2;
>       case {1,"competition"};
>         lin=_competition(lin,sx,sz);
>       case {2,"modified"};
>         lin=_modified(lin,sx,sz);
>       case {3,"ordinal"};
>         ## lin=lin;
>       otherwise 
>         rtype
>         error("Illegal value of rtype specified.");
>     endswitch
>     y=NaN(size(lin));
>     ## If input was a vector, we could have simply done this: 
>     ## y(ids)=lin;
>     y(_ids(ids,sz))=lin;
> 
84a110
> endfunction
85a112,123
> function idf=_ids(ids,sz);
>   oo=ones(sz);
>   allids=[{ids-1}];
>   nn=numel(sz);
>   for ii=2:nn;
>     allids=[allids,{cumsum(oo,ii)-1}];
>   endfor
>   idf=allids{end};
>   for jj=(nn-1):-1:1;
>     idf=idf*sz(jj)+allids{jj};
>   endfor
>   idf+=1;
88a127,148
> function linnew=_competition (lin,sx,sz)
>   ## Stop increasing lin when sx does not increase. Same as before
>   ## otherwise.
>   infvec = -Inf * ones ([1, sz(2 : end)]);
>   fnewp= find(diff([infvec;sx]));
>   linnew=zeros(size(lin));
>   linnew(fnewp)=lin(fnewp);
>   linnew=cummax(linnew);
> endfunction
> 
> 
> function linnew=_modified (lin,sx,sz)
>   ## Traverse lin backwards. Stop decreasing it when sx doesn't
>   ## decrease.
>   infvec = Inf * ones ([1, sz(2 : end)]);
>   fnewp= find(diff([sx;infvec]));
>   linnew=Inf(size(lin));
>   linnew(fnewp)=lin(fnewp);
>   linnew=flipdim(cummin(flipdim(linnew,1)),1);
> endfunction 
> 
> 
104d163
< 


====================================================

## Copyright (C) 1995-2012 Kurt Hornik
## Copyright (C) 2012 Dave Goel
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by the
## Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
## for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING.  If not, see
## <http://www.gnu.org/licenses/>.
##
## -*- texinfo -*-
##
## @deftypefn {Function File} {} ranks (@var{x}, @var{dim}) Return the
## ranks of @var{x} along the first non-singleton dimension adjust for
## ties. If the optional argument @var{dim} is given, operate along this
## dimension. Third argument @var{rtype} donates the type of ranking: 0
## or "fractional" for fractional, which is the default, 1 or
## "competition" for competiton ranking, 2 or "modified" for modified
## competition ranking, 3 or "ordinal" for ordinal ranking and 4 or
## "dense" for dense ranking. When using the third argument, you can
## supply us '' for @var{dim} to have it auto-computed. @end deftypefn
##
## This function returns a result identical to GNU Octave's built-in
## ranks.m but is much faster (For example, 1.4s vs. ''no result in
## 33mins'' when the input was a=round(10*rand(100123,100));
## Additionally, we also handle several ranking schemes. This function
## has been submitted to octave's mailing list and may (or may not)
## become part of GNU Octave.
##
##
## Authors: KH <address@hidden>, Dave Goel <address@hidden>
## Description: Compute ranks


function y = ranks (x, dim, rtype)

  if (nargin < 1);
    print_usage ();
  endif

  if nargin<3||isempty(rtype);
    rtype=0;
  endif
  
  nd = ndims (x);
  sz = size (x);
  if (nargin<2)||isempty(dim);
    ## Find the first non-singleton dimension.
    dim  = 1;
    while (dim < nd + 1 && sz(dim) == 1)
      dim = dim + 1;
    endwhile
    if (dim > nd)
      dim = 1;
    endif
  else
    if (! (isscalar (dim) && dim == round (dim))
        && dim > 0
        && dim < (nd + 1))
      error ("ranks: dim must be an integer and valid dimension");
    endif
  endif

  if (sz(dim) == 1)
    y = ones(sz);
  else
    ## The algorithm works only on dim = 1, so permute if necesary.
    if (dim != 1)
      perm = [1 : nd];
      perm(1) = dim;
      perm(dim) = 1;
      x = permute (x, perm);
    endif
    sz=size(x);
    [sx ids]=sort(x); ## sx is sorted x.
    
    lin=cumsum(ones(size(x)),1);  ## A linearly increasing array.

    switch rtype;
      case {0,"fractional"};
        lin=(_competition(lin,sx,sz)+_modified(lin,sx,sz))/2;
      case {1,"competition"};
        lin=_competition(lin,sx,sz);
      case {2,"modified"};
        lin=_modified(lin,sx,sz);
      case {3,"ordinal"};
        ## lin=lin;
      otherwise 
        rtype
        error("Illegal value of rtype specified.");
    endswitch
    y=NaN(size(lin));
    ## If input was a vector, we could have simply done this: 
    ## y(ids)=lin;
    y(_ids(ids,sz))=lin;

    if (dim != 1)
      y = permute (y, perm);
    endif
  endif
endfunction

function idf=_ids(ids,sz);
  oo=ones(sz);
  allids=[{ids-1}];
  nn=numel(sz);
  for ii=2:nn;
    allids=[allids,{cumsum(oo,ii)-1}];
  endfor
  idf=allids{end};
  for jj=(nn-1):-1:1;
    idf=idf*sz(jj)+allids{jj};
  endfor
  idf+=1;
endfunction


function linnew=_competition (lin,sx,sz)
  ## Stop increasing lin when sx does not increase. Same as before
  ## otherwise.
  infvec = -Inf * ones ([1, sz(2 : end)]);
  fnewp= find(diff([infvec;sx]));
  linnew=zeros(size(lin));
  linnew(fnewp)=lin(fnewp);
  linnew=cummax(linnew);
endfunction


function linnew=_modified (lin,sx,sz)
  ## Traverse lin backwards. Stop decreasing it when sx doesn't
  ## decrease.
  infvec = Inf * ones ([1, sz(2 : end)]);
  fnewp= find(diff([sx;infvec]));
  linnew=Inf(size(lin));
  linnew(fnewp)=lin(fnewp);
  linnew=flipdim(cummin(flipdim(linnew,1)),1);
endfunction 


%!assert (ranks (1:2:10), 1:5)
%!assert (ranks (10:-2:1), 5:-1:1)
%!assert (ranks ([2, 1, 2, 4]), [2.5, 1, 2.5, 4])
%!assert (ranks (ones (1, 5)), 3*ones (1, 5))
%!assert (ranks (1e6*ones (1, 5)), 3*ones (1, 5))
%!assert (ranks (rand (1, 5), 1), ones (1, 5))

%% Test input validation
%!error ranks ()
%!error ranks (1, 2, 3)
%!error ranks ({1, 2})
%!error ranks (['A'; 'B'])
%!error ranks (1, 1.5)
%!error ranks (1, 0)
%!error ranks (1, 3)






    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?36372>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/

[Prev in Thread]

Current Thread

[Next in Thread]

[Octave-bug-tracker] [bug #36372] Improved ranks.m included:, anonymous <=
- [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Jordi Gutiérrez Hermoso, 2012/05/02
  - [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Jordi Gutiérrez Hermoso, 2012/05/02
    - Re: [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Dave Goel, 2012/05/03
    - Re: [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Dave Goel, 2012/05/03
    - Re: [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Jordi Gutiérrez Hermoso, 2012/05/03
    - Re: [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Dave Goel, 2012/05/03
    - [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Jordi Gutiérrez Hermoso, 2012/05/08
    - [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, anonymous, 2012/05/08
    - [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, Jordi Gutiérrez Hermoso, 2012/05/08
    - [Octave-bug-tracker] [bug #36372] Improved ranks.m included:, anonymous, 2012/05/08

Prev by Date: [Octave-bug-tracker] [bug #36368] matrix operation incorrect
Next by Date: [Octave-bug-tracker] [bug #36372] Improved ranks.m included:
Previous by thread: [Octave-bug-tracker] [bug #36368] matrix operation incorrect
Next by thread: [Octave-bug-tracker] [bug #36372] Improved ranks.m included:
Index(es):
- Date
- Thread