[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Improved triu.m and tril.m
From: |
Jaroslav Hajek |
Subject: |
Re: Improved triu.m and tril.m |
Date: |
Thu, 22 Oct 2009 21:54:18 +0200 |
2009/10/22 Marco Caliari <address@hidden>:
> Dear maintainers,
>
> although triu and tril are built-in functions in Matlab and then much
> faster, I discovered that it is possible to improve the m files in Octave,
> by eliminating the min/max evaluations inside the loops and selecting
> whether to replace numbers with zeros or zeros wih numbers.
> The results, for my Octave 3.2.3, are:
>
>> A=rand(100);
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i = -31
> Elapsed time is 0.0221 seconds.
> Elapsed time is 0.00397 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i = -52
> Elapsed time is 0.0224 seconds.
> Elapsed time is 0.00114 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i = 22
> Elapsed time is 0.0174 seconds.
> Elapsed time is 0.00495 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i = 63
> Elapsed time is 0.00873 seconds.
> Elapsed time is 0.00261 seconds.
> ans = 0
>>
>>
>> i=randperm(200)(1)-100,tic,U=triu(A,i);,toc,tic,myU=mytriu(A,i);,toc,norm(U-myU)
>
> i = 36
> Elapsed time is 0.0146 seconds.
> Elapsed time is 0.00413 seconds.
> ans = 0
>
> For me the speed-up is > 3. Enclosed, the patches wrt 3.2.3.
>
> Best regards,
>
> Marco
Interesting. For comparison, here is my benchmark:
n = 500;
A = rand (n);
k = 1;
tril (eye (2)); # to avoid startup penalty
for iter = 1:4
tic; tril (A, k); toc
endfor
my tip gives:
Elapsed time is 0.0236731 seconds.
Elapsed time is 0.0239391 seconds.
Elapsed time is 0.0221 seconds.
Elapsed time is 0.022387 seconds.
the relevant code snippet is
retval = resize (resize (x, 0), nr, nc);
for j = 1 : min (nc, nr+k)
nr_limit = max (1, j-k);
retval (nr_limit:nr, j) = x (nr_limit:nr, j);
endfor
one can save a lot by doing a vectorized max once and eliminating the
repeated range:
retval = zeros (nr, nc, class (x));
jj = 1:min (nc, nr+k);
ii = max (1, jj - k);
for j = jj
i = ii(j):nr;
retval (i, j) = x (i, j);
endfor
this gives:
Elapsed time is 0.0148931 seconds.
Elapsed time is 0.0148301 seconds.
Elapsed time is 0.0152121 seconds.
Elapsed time is 0.014981 seconds.
using cellslices + vertcat + reshape is faster still:
m = min (nc, nr+k);
ii = max (1, (1:m) - k);
sl(2,:) = cellslices (x(:), ii, nr*ones (1, m));
sl(1,:) = cellslices (zeros (nr, 1), ones (1, m), ii - 1);
retval = reshape (vertcat (sl{:}), nr, nc);
gives
Elapsed time is 0.00466204 seconds.
Elapsed time is 0.00521612 seconds.
Elapsed time is 0.00525379 seconds.
Elapsed time is 0.00514317 seconds.
for comparison, your patch:
Elapsed time is 0.00818515 seconds.
Elapsed time is 0.00864601 seconds.
Elapsed time is 0.00886297 seconds.
Elapsed time is 0.00826406 seconds.
(quite good, compared to the loop-free code). Winner is number 3.
triu can be handled similarly. Neither of these approaches works
efficiently for sparse functions, though, so it should get a special
code (probably using find). But anyway, I think implementing
triu/tril/vech as compiled functions should be trivial and likely the
fastest.
Out of interest, do you depend on triu/tril efficiency in an application?
regards
--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz