octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed,


From: anonymous
Subject: [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested
Date: Sat, 9 Jan 2021 20:31:30 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0

Follow-up Comment #4, bug #59840 (project octave):

After a significant amount of additional time on this, I have addressed each
comment below and provided a new implementation that is generally faster than
the existing method with all of the features of the existing method. I found
for large inputs the existing method is faster than the kron method but the
kron method is faster for small inputs. For both repmat, repelem I copied the
existing code and modified the relevant section. I put a row of comment signs
above and below the section I modified. The modifications are a few changes to
the checks in repmat and repelem with the inclusion of kron into repelem.

The loop solution is still the fastest for my particular problem but appears
to be slower if I put it into a function so I think the speedup is related to
avoiding copying memory in and out of a function. It is still faster for a few
hundred iterations and as the rows were variables for repeated single variable
integration I will never have large numbers of rows. The reason for the
difference in output in my original submission was a mistake introduced after
my copy and paste, there should have been a +1 after size on the first line. I
clearly deleted it after copying my original code as it was present where I
copied it from and I did not check the output before submitting the bug.

I worked out a simple function for were to switch between the kron method and
the existing method by creating an objective function with inputs of the size
of the matrix to be repeated and the number of times to be repeated in each
direction. The objective function then evaluated the time using both methods
and returned the absolute difference in time. I minimized this function using
fminsearch (natural number inputs so the function is not smooth) with a
uniform random guess between 1 and 100 for each variable, 100 times (if you
feel like spending more time you can do more). Using this output I decided if
the sum of the inputs (matrix size plus the number of times repeated in each
direction) is less than 175 use the kron method otherwise use the existing
method. I chose this as the mean sum of inputs where the evaluation time was
the same was around 175, the standard deviation was around 50.

My code is below: it contains the optimization code which takes a few minutes
to run, the original tests which can now easily be extended to more cases, and
the proposed replacement functions repmatnew and repelemnew. repmatcmpspeed
and repelemcmpspeed are identical excluding the sum of input checks which is
set so the kron method always runs to perform the optimization.

Another thought with sparse matrixes and repelem perhaps the kron method
should always be used to keep it consistent with repmat. I have not checked
the speed difference in this case.

I have also attached plots of the input sizes where the new and old functions
have the same evaluation time so the slow first section of the code can be
skipped.

++
function repmatrepelemfaster

a=(1:100);
xx=nan(4,length(a));
for k=a
  k
  [x,r,ex]=fminsearch(@repmatswitchmeth,100*rand(4,1));
  if ex==1
    xx(:,k)=abs(round(x));
  end
end
close all;
figure;
plot(a,xx(1,:),'o',a,xx(2,:),'s',a,xx(3,:),'p',a,xx(4,:),'d',a,sum(xx,1));
legend('m','n','a','b','sum')
title('repmat');
inputsum=sum(xx,1);
inputsum(isnan(inputsum))=[];
mean(inputsum)
std(inputsum)

a=(1:100);
xx=nan(4,length(a));
for k=a
  k
  [x,r,ex]=fminsearch(@repelemswitchmeth,100*rand(4,1));
  if ex==1
    xx(:,k)=abs(round(x));
  end
end
figure;
plot(a,xx(1,:),'o',a,xx(2,:),'s',a,xx(3,:),'p',a,xx(4,:),'d',a,sum(xx,1));
legend('m','n','a','b','sum')
title('repelem');
inputsum=sum(xx,1);
inputsum(isnan(inputsum))=[];
mean(inputsum)
std(inputsum)







q=reshape(1:6,[],2)

repmat(q,3,1)
repmatnew(q,3,1)

repelem(q,3,2)
repelemnew(q,3,2)

ab=[1 50;
50 1;
50 50];
qq=cell(3,1);
qq{1}=1:10;
qq{2}=qq{1}';
qq{3}=reshape(1:100,10,[]);

for i=1:length(qq)
for j=1:size(ab,1)
  a=ab(j,1);
  b=ab(j,2);
  q=qq{i};
  fprintf('b=%d, c=%d, size q=%d,%d\n',a,b,size(q))
  disp('inbuilt repmat');
  tic
  for k=1:1000
    repmat(q,a,b);
  end
  toc
  disp('repmat using kron')
  tic
  for k=1:1000
    repmatnew(q,a,b);
  end
  toc
  disp('inbuilt repelem')
  tic
  for k=1:1000
    repelem(q,a,b);
  end
  toc
  disp('repelem using kron')
  tic
  for k=1:1000
    repelemnew(q,a,b);
  end
  toc
end
end
fprintf('What I was doing when I observed the issue:\nInsert a row into a
column vector repeting the elements of the vector\nThis was used to plot a
slice through a multivariable function.\nSpeed became an issue when I tried to
intergrate the function.\n');

np=150;
xi=rand(1,np);
xo=rand(20,1);
i=2;
disp('Origional attempt')
tic
for l=1:10000
  x=[repmat(xo(1:i-1),1,np);xi;repmat(xo(i:end),1,np)];
end
toc
disp('with faster repmat')
tic
for l=1:10000
  x=[repmatnew(xo(1:i-1),1,np);xi;repmatnew(xo(i:end),1,np)];
end
toc
disp('single repmat call')
tic
for l=1:10000
  x=zeros(size(xo,1)+1,size(xi,2));
  x([1:i-1 i+1:end],:)=repmat(xo,1,np);
  x(i,:)=xi;
end
toc
disp('with faster repmat')
tic
for l=1:10000
  x=zeros(size(xo,1)+1,size(xi,2));
  x([1:i-1 i+1:end],:)=repmatnew(xo,1,np);
  x(i,:)=xi;
end
toc
disp('loop solution')
tic
for l=1:1
  nv=size(xo,1)+1;
  x=zeros(size(xo,1)+1,size(xi,2));
  x(i,:)=xi;
  for k=[1:i-1]
    x(k,:)=xo(k);
  end
  for k=i+1:nv
    x(k,:)=xo(k-1);
  end
end
toc

end
function r=repmatswitchmeth(x)
q=rand(max(abs(round(x(1))),1),max(abs(round(x(2))),1));
a=max(abs(round(x(3))),1);
b=max(abs(round(x(4))),1);
tic;
repmat(q,a,b);
t1=toc;
tic;
repmatcmpspeed(q,a,b);
t2=toc;
r=abs(t2-t1);
end
function r=repelemswitchmeth(x)
q=rand(max(abs(round(x(1))),1),max(abs(round(x(2))),1));
a=max(abs(round(x(3))),1);
b=max(abs(round(x(4))),1);
tic;
repelem(q,a,b);
t1=toc;
tic;
repelemcmpspeed(q,a,b);
t2=toc;
r=abs(t2-t1);
end



function x = repmatcmpspeed(A, m, varargin)

  if (nargin < 2)
    print_usage ();
  endif

  if (nargin == 3)
    n = varargin{1};
    if (! isempty (m) && isempty (n))
      m = m(:).';
      n = 1;
    elseif (isempty (m) && ! isempty (n))
      m = n(:).';
      n = 1;
    elseif (isempty (m) && isempty (n))
      m = n = 1;
    else
      if (all (size (m) > 1))
        m = m(:,1);
        if (numel (m) < 3)
          n = n(end);
        else
          n = [];
        endif
      endif
      if (all (size (n) > 1))
        n = n(:,1);
      endif
      m = m(:).';
      n = n(:).';
    endif
  else
    if (nargin > 3)
      ## input check for m and varargin
      if (isscalar (m) && all (cellfun ("numel", varargin) == 1))
        m = [m varargin{:}];
        n = [];
      else
        error ("repmat: all input arguments must be scalar");
      endif
    elseif (isempty (m))
      m = n = 1;
    elseif (isscalar (m))
      n = m;
    elseif (ndims (m) > 2)
      error ("repmat: M has more than 2 dimensions");
    elseif (all (size (m) > 1))
      m = m(:,1).';
      n = [];
    else
      m = m(:).';
      n = [];
    endif
  endif
  idx = [m, n];

  if (all (idx < 0))
    error ("repmat: invalid dimensions");
  else
    idx = max (idx, 0);
  endif

  if (numel (A) == 1)
    ## optimize the scalar fill case.
    if (any (idx == 0))
      x = resize (A, idx);
    else
      x(1:prod (idx)) = A;
      x = reshape (x, idx);
    endif
  elseif (ndims (A) == 2 && length (idx) < 3)
################################################################################
    m=rows(A);
    n=columns(A);
##    Change the 1000 to 175
    if (issparse(A)||((m+n+sum(idx)<1000)&&(isnumeric(A)||islogical(A))))
      x=kron(ones(idx),A);
    else
      ## indexing is now faster, so we use it rather than kron.
      p = idx(1); q = idx(2);
      x = reshape (A, m, 1, n, 1);
      x = x(:, ones (1, p), :, ones (1, q));
      x = reshape (x, m*p, n*q);
    endif
################################################################################
  else
    aidx = size (A);
    ## ensure matching size
    idx(end+1:length (aidx)) = 1;
    aidx(end+1:length (idx)) = 1;
    ## create subscript array
    cidx = cell (2, length (aidx));
    for i = 1:length (aidx)
      cidx{1,i} = ':';
      cidx{2,i} = ones (1, idx (i));
    endfor
    aaidx = aidx;
    ## add singleton dims
    aaidx(2,:) = 1;
    A = reshape (A, aaidx(:));
    x = reshape (A (cidx{:}), idx .* aidx);
  endif

endfunction

function retval = repelemcmpspeed(x, varargin)

  if (nargin <= 1)
    print_usage ();

  elseif (nargin == 2)

    R = varargin{1};

    if (isscalar (R))

      if (! isvector (x))
        error (["repelem: %dD Array requires %d or more input " ...
                "arguments, but only %d given"], ...
               ndims (x), ndims (x) + 1, nargin);
      endif

      if (iscolumn (x))
        ## element values repeated R times in a col vector
        retval = x.'(ones (R, 1), :)(:);
      else
        ## element values repeated R times in a row vector
        retval = x(ones (R, 1), :)(:).';
      endif

    elseif (isvector (x) && isvector (R))

      ## vector x with vector repeat.
      if (numel (R) != numel (x))
        error (["repelem: R1 must either be scalar or have the same " ...
                "number of elements as the vector to be replicated"]);
      endif

      ## Basic run-length decoding in function prepareIdx returns
      ## idx2 as a row vector of element indices in the right positions.
      idx2 = prepareIdx (R);
      ## Fill with element values, direction matches element.
      retval = x(idx2);

    else # catch any arrays passed to x or varargin with nargin==2
      error (["repelem: when called with only two inputs they must be " ...
              "either scalars or vectors, not %s and %s."],
             typeinfo (x), typeinfo (R));
    endif

  elseif (nargin == 3)  # special optimized case for 2-D (matrices)

    ## Input Validation
    xsz = size (x);
    vector_r = ! (cellfun (@numel, varargin) == 1);

    ## 1. Check that all varargin are either scalars or vectors, not arrays.
    ##    isvector gives true for scalars.
    if (! (isvector (varargin{1}) && (isvector (varargin{2}))))
      error ("repelem: R1 and R2 must be scalars or vectors");

    ## 2. check that any repeat vectors have the right length.
    elseif (any (cellfun (@numel, varargin(vector_r)) != xsz(vector_r)))
      error (["repelem: R_j vectors must have the same number of elements "
...
              "as the size of dimension j of X"]);
    endif

    ## Create index arrays to pass to element.
    ## (It is no slower passing to prepareIdx
    ## than checking and doing scalars directly.)
################################################################################
## See bug 59840 for reasons for different methods
    if (isnumeric(x)||islogical(x))&&ismatrix(x)&&(issparse(x)||sum([xsz
varargin{1} varargin{2}])<1000)%Change to 175, 1sd 50
      retval=kron(x,ones(varargin{1},varargin{2}));
    else
      idx1 = prepareIdx (varargin{1}, xsz(1));
      idx2 = prepareIdx (varargin{2}, xsz(2));

    ## The ":" at the end takes care of any x dimensions > 2.
      if issparse()
        retval = x(idx1, idx2);
      else
        retval = x(idx1, idx2, :);
      endif
    end
##################################################################################
  else  # (nargin > 3)

    ## Input Validation
    xsz = size (x);
    n_xdims = numel (xsz);
    vector_r = ! (cellfun (@numel, varargin) == 1);

    ## 1. Check that all repeats are scalars or vectors
    ##    (isvector gives true for scalars);
    if (! all (cellfun (@isvector, varargin(vector_r))))
      error ("repelem: R_j must all be scalars or vectors");

    ## 2. Catch any vectors thrown at trailing singletons,
    ##    which should only have scalars;
    elseif (find (vector_r, 1, "last") > n_xdims)
      error ("repelem: R_j for trailing singleton dimensions must be
scalar");

    ## 3. Check that the ones that are vectors have the right length.
    elseif (any (cellfun (@numel, varargin(vector_r)) != xsz(vector_r)))
      error (["repelem: R_j vectors must have the same number of elements "
...
              "as the size of dimension j of X"]);

    endif

    n_rpts = nargin - 1;
    dims_with_vectors_and_scalars = min (n_xdims, n_rpts);

    ## Preallocate idx which will contain index array to be put into element.
    idx = cell (1, n_rpts);

    ## Use prepareIdx() to fill indices for dimensions that could be
    ## a scalar or a vector.
    for i = 1 : dims_with_vectors_and_scalars
      idx(i) = prepareIdx (varargin{i}, xsz(i));
    endfor

    ## If there are more varargin inputs than x dimensions, then input tests
    ## have verified that they are just scalars, so add [1 1 1 1 1 ... 1] to
    ## those dims to perform concatenation along those dims.
    if (n_rpts > n_xdims)
      for i = n_xdims + (1 : (n_rpts - n_xdims))
        idx(i) = ones (1, varargin{i});
      endfor
    endif

    ## Use completed idx to specify repetition of x values in all dimensions.
    ## The trailing ":" will take care of cases where n_xdims > n_rpts.
    retval = x(idx{:}, :);

  endif

endfunction



function x = repmatnew(A, m, varargin)

  if (nargin < 2)
    print_usage ();
  endif

  if (nargin == 3)
    n = varargin{1};
    if (! isempty (m) && isempty (n))
      m = m(:).';
      n = 1;
    elseif (isempty (m) && ! isempty (n))
      m = n(:).';
      n = 1;
    elseif (isempty (m) && isempty (n))
      m = n = 1;
    else
      if (all (size (m) > 1))
        m = m(:,1);
        if (numel (m) < 3)
          n = n(end);
        else
          n = [];
        endif
      endif
      if (all (size (n) > 1))
        n = n(:,1);
      endif
      m = m(:).';
      n = n(:).';
    endif
  else
    if (nargin > 3)
      ## input check for m and varargin
      if (isscalar (m) && all (cellfun ("numel", varargin) == 1))
        m = [m varargin{:}];
        n = [];
      else
        error ("repmat: all input arguments must be scalar");
      endif
    elseif (isempty (m))
      m = n = 1;
    elseif (isscalar (m))
      n = m;
    elseif (ndims (m) > 2)
      error ("repmat: M has more than 2 dimensions");
    elseif (all (size (m) > 1))
      m = m(:,1).';
      n = [];
    else
      m = m(:).';
      n = [];
    endif
  endif
  idx = [m, n];

  if (all (idx < 0))
    error ("repmat: invalid dimensions");
  else
    idx = max (idx, 0);
  endif

  if (numel (A) == 1)
    ## optimize the scalar fill case.
    if (any (idx == 0))
      x = resize (A, idx);
    else
      x(1:prod (idx)) = A;
      x = reshape (x, idx);
    endif
  elseif (ndims (A) == 2 && length (idx) < 3)
################################################################################
## See bug 59840 for reasons for different methods
    m=rows(A);
    n=columns(A);
    if (issparse(A)||((m+n+sum(idx)<175)&&(isnumeric(A)||islogical(A))))
      x=kron(ones(idx),A);
    else
      ## indexing is now faster, so we use it rather than kron.
      p = idx(1); q = idx(2);
      x = reshape (A, m, 1, n, 1);
      x = x(:, ones (1, p), :, ones (1, q));
      x = reshape (x, m*p, n*q);
    endif
################################################################################
  else
    aidx = size (A);
    ## ensure matching size
    idx(end+1:length (aidx)) = 1;
    aidx(end+1:length (idx)) = 1;
    ## create subscript array
    cidx = cell (2, length (aidx));
    for i = 1:length (aidx)
      cidx{1,i} = ':';
      cidx{2,i} = ones (1, idx (i));
    endfor
    aaidx = aidx;
    ## add singleton dims
    aaidx(2,:) = 1;
    A = reshape (A, aaidx(:));
    x = reshape (A (cidx{:}), idx .* aidx);
  endif

endfunction

function retval = repelemnew(x, varargin)

  if (nargin <= 1)
    print_usage ();

  elseif (nargin == 2)

    R = varargin{1};

    if (isscalar (R))

      if (! isvector (x))
        error (["repelem: %dD Array requires %d or more input " ...
                "arguments, but only %d given"], ...
               ndims (x), ndims (x) + 1, nargin);
      endif

      if (iscolumn (x))
        ## element values repeated R times in a col vector
        retval = x.'(ones (R, 1), :)(:);
      else
        ## element values repeated R times in a row vector
        retval = x(ones (R, 1), :)(:).';
      endif

    elseif (isvector (x) && isvector (R))

      ## vector x with vector repeat.
      if (numel (R) != numel (x))
        error (["repelem: R1 must either be scalar or have the same " ...
                "number of elements as the vector to be replicated"]);
      endif

      ## Basic run-length decoding in function prepareIdx returns
      ## idx2 as a row vector of element indices in the right positions.
      idx2 = prepareIdx (R);
      ## Fill with element values, direction matches element.
      retval = x(idx2);

    else # catch any arrays passed to x or varargin with nargin==2
      error (["repelem: when called with only two inputs they must be " ...
              "either scalars or vectors, not %s and %s."],
             typeinfo (x), typeinfo (R));
    endif

  elseif (nargin == 3)  # special optimized case for 2-D (matrices)

    ## Input Validation
    xsz = size (x);
    vector_r = ! (cellfun (@numel, varargin) == 1);

    ## 1. Check that all varargin are either scalars or vectors, not arrays.
    ##    isvector gives true for scalars.
    if (! (isvector (varargin{1}) && (isvector (varargin{2}))))
      error ("repelem: R1 and R2 must be scalars or vectors");

    ## 2. check that any repeat vectors have the right length.
    elseif (any (cellfun (@numel, varargin(vector_r)) != xsz(vector_r)))
      error (["repelem: R_j vectors must have the same number of elements "
...
              "as the size of dimension j of X"]);
    endif

    ## Create index arrays to pass to element.
    ## (It is no slower passing to prepareIdx
    ## than checking and doing scalars directly.)
################################################################################
## See bug 59840 for reasons for different methods
    if (isnumeric(x)||islogical(x))&&ismatrix(x)&&(issparse(x)||sum([xsz
varargin{1} varargin{2}])<175)
      retval=kron(x,ones(varargin{1},varargin{2}));
    else
      idx1 = prepareIdx (varargin{1}, xsz(1));
      idx2 = prepareIdx (varargin{2}, xsz(2));

    ## The ":" at the end takes care of any x dimensions > 2.
      if issparse()
        retval = x(idx1, idx2);
      else
        retval = x(idx1, idx2, :);
      endif
    end
##################################################################################
  else  # (nargin > 3)

    ## Input Validation
    xsz = size (x);
    n_xdims = numel (xsz);
    vector_r = ! (cellfun (@numel, varargin) == 1);

    ## 1. Check that all repeats are scalars or vectors
    ##    (isvector gives true for scalars);
    if (! all (cellfun (@isvector, varargin(vector_r))))
      error ("repelem: R_j must all be scalars or vectors");

    ## 2. Catch any vectors thrown at trailing singletons,
    ##    which should only have scalars;
    elseif (find (vector_r, 1, "last") > n_xdims)
      error ("repelem: R_j for trailing singleton dimensions must be
scalar");

    ## 3. Check that the ones that are vectors have the right length.
    elseif (any (cellfun (@numel, varargin(vector_r)) != xsz(vector_r)))
      error (["repelem: R_j vectors must have the same number of elements "
...
              "as the size of dimension j of X"]);

    endif

    n_rpts = nargin - 1;
    dims_with_vectors_and_scalars = min (n_xdims, n_rpts);

    ## Preallocate idx which will contain index array to be put into element.
    idx = cell (1, n_rpts);

    ## Use prepareIdx() to fill indices for dimensions that could be
    ## a scalar or a vector.
    for i = 1 : dims_with_vectors_and_scalars
      idx(i) = prepareIdx (varargin{i}, xsz(i));
    endfor

    ## If there are more varargin inputs than x dimensions, then input tests
    ## have verified that they are just scalars, so add [1 1 1 1 1 ... 1] to
    ## those dims to perform concatenation along those dims.
    if (n_rpts > n_xdims)
      for i = n_xdims + (1 : (n_rpts - n_xdims))
        idx(i) = ones (1, varargin{i});
      endfor
    endif

    ## Use completed idx to specify repetition of x values in all dimensions.
    ## The trailing ":" will take care of cases where n_xdims > n_rpts.
    retval = x(idx{:}, :);

  endif

endfunction

## Return a row vector of indices prepared for replicating.
function idx = prepareIdx (v, n)

  if (isscalar (v))
    ## will always return row vector
    idx = [1:n](ones (v, 1), :)(:).';

  else
    ## This works for a row or column vector.

    ## Get ending position for each element item.
    idx_temp = cumsum (v);

    ## Set starting position of each element to 1.
    idx(idx_temp + 1) = 1;

    ## Set starting position of each element to 1.
    idx(1) = 1;

    ## Row vector with proper length for output
    idx = idx(1:idx_temp(end));

    ## with prepared index
    idx = (find (v != 0))(cumsum (idx));

  endif

endfunction
--

(file #50689, file #50690, file #50691, file #50692)
    _______________________________________________________

Additional Item Attachment:

File name: repelemoldnewsametime.eps      Size:220 KB
   
<https://file.savannah.gnu.org/file/repelemoldnewsametime.eps?file_id=50689>

File name: repelemoldnewsametime2.eps     Size:220 KB
   
<https://file.savannah.gnu.org/file/repelemoldnewsametime2.eps?file_id=50690>

File name: repmatoldnewsametime.eps       Size:222 KB
   
<https://file.savannah.gnu.org/file/repmatoldnewsametime.eps?file_id=50691>

File name: repmatoldnewsametime2.eps      Size:220 KB
   
<https://file.savannah.gnu.org/file/repmatoldnewsametime2.eps?file_id=50692>



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59840>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]