[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed,
From: |
anonymous |
Subject: |
[Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested |
Date: |
Sat, 9 Jan 2021 20:31:30 -0500 (EST) |
User-agent: |
Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0 |
Follow-up Comment #4, bug #59840 (project octave):
After a significant amount of additional time on this, I have addressed each
comment below and provided a new implementation that is generally faster than
the existing method with all of the features of the existing method. I found
for large inputs the existing method is faster than the kron method but the
kron method is faster for small inputs. For both repmat, repelem I copied the
existing code and modified the relevant section. I put a row of comment signs
above and below the section I modified. The modifications are a few changes to
the checks in repmat and repelem with the inclusion of kron into repelem.
The loop solution is still the fastest for my particular problem but appears
to be slower if I put it into a function so I think the speedup is related to
avoiding copying memory in and out of a function. It is still faster for a few
hundred iterations and as the rows were variables for repeated single variable
integration I will never have large numbers of rows. The reason for the
difference in output in my original submission was a mistake introduced after
my copy and paste, there should have been a +1 after size on the first line. I
clearly deleted it after copying my original code as it was present where I
copied it from and I did not check the output before submitting the bug.
I worked out a simple function for were to switch between the kron method and
the existing method by creating an objective function with inputs of the size
of the matrix to be repeated and the number of times to be repeated in each
direction. The objective function then evaluated the time using both methods
and returned the absolute difference in time. I minimized this function using
fminsearch (natural number inputs so the function is not smooth) with a
uniform random guess between 1 and 100 for each variable, 100 times (if you
feel like spending more time you can do more). Using this output I decided if
the sum of the inputs (matrix size plus the number of times repeated in each
direction) is less than 175 use the kron method otherwise use the existing
method. I chose this as the mean sum of inputs where the evaluation time was
the same was around 175, the standard deviation was around 50.
My code is below: it contains the optimization code which takes a few minutes
to run, the original tests which can now easily be extended to more cases, and
the proposed replacement functions repmatnew and repelemnew. repmatcmpspeed
and repelemcmpspeed are identical excluding the sum of input checks which is
set so the kron method always runs to perform the optimization.
Another thought with sparse matrixes and repelem perhaps the kron method
should always be used to keep it consistent with repmat. I have not checked
the speed difference in this case.
I have also attached plots of the input sizes where the new and old functions
have the same evaluation time so the slow first section of the code can be
skipped.
++
function repmatrepelemfaster
a=(1:100);
xx=nan(4,length(a));
for k=a
k
[x,r,ex]=fminsearch(@repmatswitchmeth,100*rand(4,1));
if ex==1
xx(:,k)=abs(round(x));
end
end
close all;
figure;
plot(a,xx(1,:),'o',a,xx(2,:),'s',a,xx(3,:),'p',a,xx(4,:),'d',a,sum(xx,1));
legend('m','n','a','b','sum')
title('repmat');
inputsum=sum(xx,1);
inputsum(isnan(inputsum))=[];
mean(inputsum)
std(inputsum)
a=(1:100);
xx=nan(4,length(a));
for k=a
k
[x,r,ex]=fminsearch(@repelemswitchmeth,100*rand(4,1));
if ex==1
xx(:,k)=abs(round(x));
end
end
figure;
plot(a,xx(1,:),'o',a,xx(2,:),'s',a,xx(3,:),'p',a,xx(4,:),'d',a,sum(xx,1));
legend('m','n','a','b','sum')
title('repelem');
inputsum=sum(xx,1);
inputsum(isnan(inputsum))=[];
mean(inputsum)
std(inputsum)
q=reshape(1:6,[],2)
repmat(q,3,1)
repmatnew(q,3,1)
repelem(q,3,2)
repelemnew(q,3,2)
ab=[1 50;
50 1;
50 50];
qq=cell(3,1);
qq{1}=1:10;
qq{2}=qq{1}';
qq{3}=reshape(1:100,10,[]);
for i=1:length(qq)
for j=1:size(ab,1)
a=ab(j,1);
b=ab(j,2);
q=qq{i};
fprintf('b=%d, c=%d, size q=%d,%d\n',a,b,size(q))
disp('inbuilt repmat');
tic
for k=1:1000
repmat(q,a,b);
end
toc
disp('repmat using kron')
tic
for k=1:1000
repmatnew(q,a,b);
end
toc
disp('inbuilt repelem')
tic
for k=1:1000
repelem(q,a,b);
end
toc
disp('repelem using kron')
tic
for k=1:1000
repelemnew(q,a,b);
end
toc
end
end
fprintf('What I was doing when I observed the issue:\nInsert a row into a
column vector repeting the elements of the vector\nThis was used to plot a
slice through a multivariable function.\nSpeed became an issue when I tried to
intergrate the function.\n');
np=150;
xi=rand(1,np);
xo=rand(20,1);
i=2;
disp('Origional attempt')
tic
for l=1:10000
x=[repmat(xo(1:i-1),1,np);xi;repmat(xo(i:end),1,np)];
end
toc
disp('with faster repmat')
tic
for l=1:10000
x=[repmatnew(xo(1:i-1),1,np);xi;repmatnew(xo(i:end),1,np)];
end
toc
disp('single repmat call')
tic
for l=1:10000
x=zeros(size(xo,1)+1,size(xi,2));
x([1:i-1 i+1:end],:)=repmat(xo,1,np);
x(i,:)=xi;
end
toc
disp('with faster repmat')
tic
for l=1:10000
x=zeros(size(xo,1)+1,size(xi,2));
x([1:i-1 i+1:end],:)=repmatnew(xo,1,np);
x(i,:)=xi;
end
toc
disp('loop solution')
tic
for l=1:1
nv=size(xo,1)+1;
x=zeros(size(xo,1)+1,size(xi,2));
x(i,:)=xi;
for k=[1:i-1]
x(k,:)=xo(k);
end
for k=i+1:nv
x(k,:)=xo(k-1);
end
end
toc
end
function r=repmatswitchmeth(x)
q=rand(max(abs(round(x(1))),1),max(abs(round(x(2))),1));
a=max(abs(round(x(3))),1);
b=max(abs(round(x(4))),1);
tic;
repmat(q,a,b);
t1=toc;
tic;
repmatcmpspeed(q,a,b);
t2=toc;
r=abs(t2-t1);
end
function r=repelemswitchmeth(x)
q=rand(max(abs(round(x(1))),1),max(abs(round(x(2))),1));
a=max(abs(round(x(3))),1);
b=max(abs(round(x(4))),1);
tic;
repelem(q,a,b);
t1=toc;
tic;
repelemcmpspeed(q,a,b);
t2=toc;
r=abs(t2-t1);
end
function x = repmatcmpspeed(A, m, varargin)
if (nargin < 2)
print_usage ();
endif
if (nargin == 3)
n = varargin{1};
if (! isempty (m) && isempty (n))
m = m(:).';
n = 1;
elseif (isempty (m) && ! isempty (n))
m = n(:).';
n = 1;
elseif (isempty (m) && isempty (n))
m = n = 1;
else
if (all (size (m) > 1))
m = m(:,1);
if (numel (m) < 3)
n = n(end);
else
n = [];
endif
endif
if (all (size (n) > 1))
n = n(:,1);
endif
m = m(:).';
n = n(:).';
endif
else
if (nargin > 3)
## input check for m and varargin
if (isscalar (m) && all (cellfun ("numel", varargin) == 1))
m = [m varargin{:}];
n = [];
else
error ("repmat: all input arguments must be scalar");
endif
elseif (isempty (m))
m = n = 1;
elseif (isscalar (m))
n = m;
elseif (ndims (m) > 2)
error ("repmat: M has more than 2 dimensions");
elseif (all (size (m) > 1))
m = m(:,1).';
n = [];
else
m = m(:).';
n = [];
endif
endif
idx = [m, n];
if (all (idx < 0))
error ("repmat: invalid dimensions");
else
idx = max (idx, 0);
endif
if (numel (A) == 1)
## optimize the scalar fill case.
if (any (idx == 0))
x = resize (A, idx);
else
x(1:prod (idx)) = A;
x = reshape (x, idx);
endif
elseif (ndims (A) == 2 && length (idx) < 3)
################################################################################
m=rows(A);
n=columns(A);
## Change the 1000 to 175
if (issparse(A)||((m+n+sum(idx)<1000)&&(isnumeric(A)||islogical(A))))
x=kron(ones(idx),A);
else
## indexing is now faster, so we use it rather than kron.
p = idx(1); q = idx(2);
x = reshape (A, m, 1, n, 1);
x = x(:, ones (1, p), :, ones (1, q));
x = reshape (x, m*p, n*q);
endif
################################################################################
else
aidx = size (A);
## ensure matching size
idx(end+1:length (aidx)) = 1;
aidx(end+1:length (idx)) = 1;
## create subscript array
cidx = cell (2, length (aidx));
for i = 1:length (aidx)
cidx{1,i} = ':';
cidx{2,i} = ones (1, idx (i));
endfor
aaidx = aidx;
## add singleton dims
aaidx(2,:) = 1;
A = reshape (A, aaidx(:));
x = reshape (A (cidx{:}), idx .* aidx);
endif
endfunction
function retval = repelemcmpspeed(x, varargin)
if (nargin <= 1)
print_usage ();
elseif (nargin == 2)
R = varargin{1};
if (isscalar (R))
if (! isvector (x))
error (["repelem: %dD Array requires %d or more input " ...
"arguments, but only %d given"], ...
ndims (x), ndims (x) + 1, nargin);
endif
if (iscolumn (x))
## element values repeated R times in a col vector
retval = x.'(ones (R, 1), :)(:);
else
## element values repeated R times in a row vector
retval = x(ones (R, 1), :)(:).';
endif
elseif (isvector (x) && isvector (R))
## vector x with vector repeat.
if (numel (R) != numel (x))
error (["repelem: R1 must either be scalar or have the same " ...
"number of elements as the vector to be replicated"]);
endif
## Basic run-length decoding in function prepareIdx returns
## idx2 as a row vector of element indices in the right positions.
idx2 = prepareIdx (R);
## Fill with element values, direction matches element.
retval = x(idx2);
else # catch any arrays passed to x or varargin with nargin==2
error (["repelem: when called with only two inputs they must be " ...
"either scalars or vectors, not %s and %s."],
typeinfo (x), typeinfo (R));
endif
elseif (nargin == 3) # special optimized case for 2-D (matrices)
## Input Validation
xsz = size (x);
vector_r = ! (cellfun (@numel, varargin) == 1);
## 1. Check that all varargin are either scalars or vectors, not arrays.
## isvector gives true for scalars.
if (! (isvector (varargin{1}) && (isvector (varargin{2}))))
error ("repelem: R1 and R2 must be scalars or vectors");
## 2. check that any repeat vectors have the right length.
elseif (any (cellfun (@numel, varargin(vector_r)) != xsz(vector_r)))
error (["repelem: R_j vectors must have the same number of elements "
...
"as the size of dimension j of X"]);
endif
## Create index arrays to pass to element.
## (It is no slower passing to prepareIdx
## than checking and doing scalars directly.)
################################################################################
## See bug 59840 for reasons for different methods
if (isnumeric(x)||islogical(x))&&ismatrix(x)&&(issparse(x)||sum([xsz
varargin{1} varargin{2}])<1000)%Change to 175, 1sd 50
retval=kron(x,ones(varargin{1},varargin{2}));
else
idx1 = prepareIdx (varargin{1}, xsz(1));
idx2 = prepareIdx (varargin{2}, xsz(2));
## The ":" at the end takes care of any x dimensions > 2.
if issparse()
retval = x(idx1, idx2);
else
retval = x(idx1, idx2, :);
endif
end
##################################################################################
else # (nargin > 3)
## Input Validation
xsz = size (x);
n_xdims = numel (xsz);
vector_r = ! (cellfun (@numel, varargin) == 1);
## 1. Check that all repeats are scalars or vectors
## (isvector gives true for scalars);
if (! all (cellfun (@isvector, varargin(vector_r))))
error ("repelem: R_j must all be scalars or vectors");
## 2. Catch any vectors thrown at trailing singletons,
## which should only have scalars;
elseif (find (vector_r, 1, "last") > n_xdims)
error ("repelem: R_j for trailing singleton dimensions must be
scalar");
## 3. Check that the ones that are vectors have the right length.
elseif (any (cellfun (@numel, varargin(vector_r)) != xsz(vector_r)))
error (["repelem: R_j vectors must have the same number of elements "
...
"as the size of dimension j of X"]);
endif
n_rpts = nargin - 1;
dims_with_vectors_and_scalars = min (n_xdims, n_rpts);
## Preallocate idx which will contain index array to be put into element.
idx = cell (1, n_rpts);
## Use prepareIdx() to fill indices for dimensions that could be
## a scalar or a vector.
for i = 1 : dims_with_vectors_and_scalars
idx(i) = prepareIdx (varargin{i}, xsz(i));
endfor
## If there are more varargin inputs than x dimensions, then input tests
## have verified that they are just scalars, so add [1 1 1 1 1 ... 1] to
## those dims to perform concatenation along those dims.
if (n_rpts > n_xdims)
for i = n_xdims + (1 : (n_rpts - n_xdims))
idx(i) = ones (1, varargin{i});
endfor
endif
## Use completed idx to specify repetition of x values in all dimensions.
## The trailing ":" will take care of cases where n_xdims > n_rpts.
retval = x(idx{:}, :);
endif
endfunction
function x = repmatnew(A, m, varargin)
if (nargin < 2)
print_usage ();
endif
if (nargin == 3)
n = varargin{1};
if (! isempty (m) && isempty (n))
m = m(:).';
n = 1;
elseif (isempty (m) && ! isempty (n))
m = n(:).';
n = 1;
elseif (isempty (m) && isempty (n))
m = n = 1;
else
if (all (size (m) > 1))
m = m(:,1);
if (numel (m) < 3)
n = n(end);
else
n = [];
endif
endif
if (all (size (n) > 1))
n = n(:,1);
endif
m = m(:).';
n = n(:).';
endif
else
if (nargin > 3)
## input check for m and varargin
if (isscalar (m) && all (cellfun ("numel", varargin) == 1))
m = [m varargin{:}];
n = [];
else
error ("repmat: all input arguments must be scalar");
endif
elseif (isempty (m))
m = n = 1;
elseif (isscalar (m))
n = m;
elseif (ndims (m) > 2)
error ("repmat: M has more than 2 dimensions");
elseif (all (size (m) > 1))
m = m(:,1).';
n = [];
else
m = m(:).';
n = [];
endif
endif
idx = [m, n];
if (all (idx < 0))
error ("repmat: invalid dimensions");
else
idx = max (idx, 0);
endif
if (numel (A) == 1)
## optimize the scalar fill case.
if (any (idx == 0))
x = resize (A, idx);
else
x(1:prod (idx)) = A;
x = reshape (x, idx);
endif
elseif (ndims (A) == 2 && length (idx) < 3)
################################################################################
## See bug 59840 for reasons for different methods
m=rows(A);
n=columns(A);
if (issparse(A)||((m+n+sum(idx)<175)&&(isnumeric(A)||islogical(A))))
x=kron(ones(idx),A);
else
## indexing is now faster, so we use it rather than kron.
p = idx(1); q = idx(2);
x = reshape (A, m, 1, n, 1);
x = x(:, ones (1, p), :, ones (1, q));
x = reshape (x, m*p, n*q);
endif
################################################################################
else
aidx = size (A);
## ensure matching size
idx(end+1:length (aidx)) = 1;
aidx(end+1:length (idx)) = 1;
## create subscript array
cidx = cell (2, length (aidx));
for i = 1:length (aidx)
cidx{1,i} = ':';
cidx{2,i} = ones (1, idx (i));
endfor
aaidx = aidx;
## add singleton dims
aaidx(2,:) = 1;
A = reshape (A, aaidx(:));
x = reshape (A (cidx{:}), idx .* aidx);
endif
endfunction
function retval = repelemnew(x, varargin)
if (nargin <= 1)
print_usage ();
elseif (nargin == 2)
R = varargin{1};
if (isscalar (R))
if (! isvector (x))
error (["repelem: %dD Array requires %d or more input " ...
"arguments, but only %d given"], ...
ndims (x), ndims (x) + 1, nargin);
endif
if (iscolumn (x))
## element values repeated R times in a col vector
retval = x.'(ones (R, 1), :)(:);
else
## element values repeated R times in a row vector
retval = x(ones (R, 1), :)(:).';
endif
elseif (isvector (x) && isvector (R))
## vector x with vector repeat.
if (numel (R) != numel (x))
error (["repelem: R1 must either be scalar or have the same " ...
"number of elements as the vector to be replicated"]);
endif
## Basic run-length decoding in function prepareIdx returns
## idx2 as a row vector of element indices in the right positions.
idx2 = prepareIdx (R);
## Fill with element values, direction matches element.
retval = x(idx2);
else # catch any arrays passed to x or varargin with nargin==2
error (["repelem: when called with only two inputs they must be " ...
"either scalars or vectors, not %s and %s."],
typeinfo (x), typeinfo (R));
endif
elseif (nargin == 3) # special optimized case for 2-D (matrices)
## Input Validation
xsz = size (x);
vector_r = ! (cellfun (@numel, varargin) == 1);
## 1. Check that all varargin are either scalars or vectors, not arrays.
## isvector gives true for scalars.
if (! (isvector (varargin{1}) && (isvector (varargin{2}))))
error ("repelem: R1 and R2 must be scalars or vectors");
## 2. check that any repeat vectors have the right length.
elseif (any (cellfun (@numel, varargin(vector_r)) != xsz(vector_r)))
error (["repelem: R_j vectors must have the same number of elements "
...
"as the size of dimension j of X"]);
endif
## Create index arrays to pass to element.
## (It is no slower passing to prepareIdx
## than checking and doing scalars directly.)
################################################################################
## See bug 59840 for reasons for different methods
if (isnumeric(x)||islogical(x))&&ismatrix(x)&&(issparse(x)||sum([xsz
varargin{1} varargin{2}])<175)
retval=kron(x,ones(varargin{1},varargin{2}));
else
idx1 = prepareIdx (varargin{1}, xsz(1));
idx2 = prepareIdx (varargin{2}, xsz(2));
## The ":" at the end takes care of any x dimensions > 2.
if issparse()
retval = x(idx1, idx2);
else
retval = x(idx1, idx2, :);
endif
end
##################################################################################
else # (nargin > 3)
## Input Validation
xsz = size (x);
n_xdims = numel (xsz);
vector_r = ! (cellfun (@numel, varargin) == 1);
## 1. Check that all repeats are scalars or vectors
## (isvector gives true for scalars);
if (! all (cellfun (@isvector, varargin(vector_r))))
error ("repelem: R_j must all be scalars or vectors");
## 2. Catch any vectors thrown at trailing singletons,
## which should only have scalars;
elseif (find (vector_r, 1, "last") > n_xdims)
error ("repelem: R_j for trailing singleton dimensions must be
scalar");
## 3. Check that the ones that are vectors have the right length.
elseif (any (cellfun (@numel, varargin(vector_r)) != xsz(vector_r)))
error (["repelem: R_j vectors must have the same number of elements "
...
"as the size of dimension j of X"]);
endif
n_rpts = nargin - 1;
dims_with_vectors_and_scalars = min (n_xdims, n_rpts);
## Preallocate idx which will contain index array to be put into element.
idx = cell (1, n_rpts);
## Use prepareIdx() to fill indices for dimensions that could be
## a scalar or a vector.
for i = 1 : dims_with_vectors_and_scalars
idx(i) = prepareIdx (varargin{i}, xsz(i));
endfor
## If there are more varargin inputs than x dimensions, then input tests
## have verified that they are just scalars, so add [1 1 1 1 1 ... 1] to
## those dims to perform concatenation along those dims.
if (n_rpts > n_xdims)
for i = n_xdims + (1 : (n_rpts - n_xdims))
idx(i) = ones (1, varargin{i});
endfor
endif
## Use completed idx to specify repetition of x values in all dimensions.
## The trailing ":" will take care of cases where n_xdims > n_rpts.
retval = x(idx{:}, :);
endif
endfunction
## Return a row vector of indices prepared for replicating.
function idx = prepareIdx (v, n)
if (isscalar (v))
## will always return row vector
idx = [1:n](ones (v, 1), :)(:).';
else
## This works for a row or column vector.
## Get ending position for each element item.
idx_temp = cumsum (v);
## Set starting position of each element to 1.
idx(idx_temp + 1) = 1;
## Set starting position of each element to 1.
idx(1) = 1;
## Row vector with proper length for output
idx = idx(1:idx_temp(end));
## with prepared index
idx = (find (v != 0))(cumsum (idx));
endif
endfunction
--
(file #50689, file #50690, file #50691, file #50692)
_______________________________________________________
Additional Item Attachment:
File name: repelemoldnewsametime.eps Size:220 KB
<https://file.savannah.gnu.org/file/repelemoldnewsametime.eps?file_id=50689>
File name: repelemoldnewsametime2.eps Size:220 KB
<https://file.savannah.gnu.org/file/repelemoldnewsametime2.eps?file_id=50690>
File name: repmatoldnewsametime.eps Size:222 KB
<https://file.savannah.gnu.org/file/repmatoldnewsametime.eps?file_id=50691>
File name: repmatoldnewsametime2.eps Size:220 KB
<https://file.savannah.gnu.org/file/repmatoldnewsametime2.eps?file_id=50692>
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?59840>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested, anonymous, 2021/01/07
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested, anonymous, 2021/01/08
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested, Rik, 2021/01/08
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested, Rik, 2021/01/08
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested,
anonymous <=
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested, anonymous, 2021/01/09
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested, Rik, 2021/01/16
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested, anonymous, 2021/01/18
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested, anonymous, 2021/01/18
- [Octave-bug-tracker] [bug #59840] repmat and repelem slower than needed, faster implimentation suggested, Rik, 2021/01/19