octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: FYI: bsxfun optimized


From: Jaroslav Hajek
Subject: Re: FYI: bsxfun optimized
Date: Tue, 20 Oct 2009 12:45:56 +0200

On Tue, Oct 20, 2009 at 11:13 AM, Jaroslav Hajek <address@hidden> wrote:
> hi all,
> I optimized bsxfun (binary singleton expansion function applier) to be
> faster when some common built-in functions are given. The bsxfun
> operations are also available from liboctave as bsxfun_add (NDArray,
> NDArray) etc.
> http://hg.savannah.gnu.org/hgweb/octave/rev/26abff55f6fe
>
> The following benchmark illustrates the speed-up (set n to suitable value)
>
> n = 200;
> i = ones (1, n);
> a = rand (n, n, 1);
> b = rand (n, n, n);
> tic; bsxfun (@plus, a, b); toc
> a = rand (n, 1, 1);
> tic; bsxfun (@minus, a, b); toc
> a = rand (1, n, 1);
> tic; bsxfun (@times, a, b); toc
>
> a = rand (n, n, 1);
> b = rand (1, n, n);
> tic; bsxfun (@rdivide, a, b); toc
>
> b = rand (1, 1, n);
> tic; bsxfun (@lt, a, b); toc
>
> a = rand (2, n, n/2, n); # heavily multi-dim
> b = rand (1, 1, n/2, n);
> tic; bsxfun (@gt, a, b); toc
>
> on a Core 2 Duo @ 2.83 GHz, g++ -O3 -march=native,
> with a recent tip, I get:
>
> Elapsed time is 0.697962 seconds.
> Elapsed time is 0.645682 seconds.
> Elapsed time is 0.709098 seconds.
> Elapsed time is 0.720162 seconds.
> Elapsed time is 0.608029 seconds.
> Elapsed time is 24.2397 seconds.
>
> with the new patch, I got
>
> Elapsed time is 0.0434361 seconds.
> Elapsed time is 0.048116 seconds.
> Elapsed time is 0.0503559 seconds.
> Elapsed time is 0.0644929 seconds.
> Elapsed time is 0.0164821 seconds.
> Elapsed time is 0.098119 seconds.
>
> an interesting third option is a naive bsxfun replacement, which just
> spreads the arrays to a common size and then applies the operator:
>
> function r = bsxfun (op, x, y)
>  nd = max (ndims (x), ndims (y));
>  xidx = yidx = repmat({':'}, 1, nd);
>  for i = 1:nd
>    sx = size (x, i);
>    sy = size (y, i);
>    if (sx == 1)
>      xidx{i} = ones (1, sy);
>    endif
>    if (sy == 1)
>      yidx{i} = ones (1, sx);
>    endif
>  endfor
>
>  r = op (x(xidx{:}), y(yidx{:}));
> endfunction
>
> with this replacement, I get:
>
> Elapsed time is 0.0920911 seconds.
> Elapsed time is 0.098748 seconds.
> Elapsed time is 0.097363 seconds.
> Elapsed time is 0.137549 seconds.
> Elapsed time is 0.109806 seconds.
> Elapsed time is 0.18157 seconds.
>
> of course, the memory consumption will be up to 3x as big, but in
> general these are closer to their optimized counterparts than the
> original code.
>
> Currently, only some common operations are covered, such as the basic
> operations +,-,*,/, relations =,!=,<,>,<=,>= and pairwise min, max are
> covered, and the optimizations only work if the operation is
> homogeneous. More can be added if there is interest.
>
> One further question that remains is whether to retain the old code,
> or replace it with the spread-and-apply approach above.
> The old code is more memory-efficient (it creates the result array and
> then updates it), but apparently slower, sometimes significantly.
> Opinions?
>
> enjoy
>
> --
> RNDr. Jaroslav Hajek
> computing expert & GNU Octave developer
> Aeronautical Research and Test Institute (VZLU)
> Prague, Czech Republic
> url: www.highegg.matfyz.cz
>

Btw, as an example of usage, I optimized the "center" function - which
removes the means from an array along a specified dimension, and is a
typical example of a bsxfun-like operation...
http://hg.savannah.gnu.org/hgweb/octave/rev/fb3543975ed9

benchmark:

n = 4000;
a = rand (n);
tic; center (a); toc
tic; center (a(:)); toc

old:

Elapsed time is 0.192539 seconds.
Elapsed time is 0.210404 seconds.
Elapsed time is 0.114803 seconds.

new:
Elapsed time is 0.102689 seconds.
Elapsed time is 0.116499 seconds.
Elapsed time is 0.113716 seconds.

the vector case was already optimized before. In the matrix cases, the
expected factor ~2 speed-up is seen (and it saves one temporary
array).

enjoy

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz



reply via email to

[Prev in Thread] Current Thread [Next in Thread]