octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

FYI: smarter indexing by logical masks


From: Jaroslav Hajek
Subject: FYI: smarter indexing by logical masks
Date: Fri, 27 Nov 2009 21:44:20 +0100

hi all,

the attached changeset implements further optimizations to the indexing machinery when dealing with logical masks:
http://hg.savannah.gnu.org/hgweb/octave/rev/034677ab6865

Rationale:
Up to now, Octave always converted logical masks to index vectors for indexing, via the octave_value::index_vector method. This allows efficient random access and is generally beneficial for masks where most elements are false.
However, when the mask is nearly full, the storage for index array is up to 4x or 8x (for 64-bit indexing) larger, hence incurring a significant penalty for memory traffic, as well as the penalty for the conversion itself.

With the new change, octave will convert the mask to index array only if at most 1/8 (or 1/16 for 64-bit indexing) of elements are true; i.e. if the index array takes at most half the memory of the mask.
The innermost loops are specialized for the mask case, and contiguous subrange cases are also detected.

This is beneficial for expressions like x(x != 0) in which you expect the condition to be true for most or all elements;

The following benchmark illustrates the improvements:
for n = [1e5 1e6 1e7 4e7]
  printf ("n = %d\n", n);
  a = rand (n, 1);
  mask = a > 0.9;
  tic; a(mask); t1 = toc;
  tic; a(mask); t2 = toc;
  fprintf ("sparse mask: 1st: %f, 2nd: %f\n", t1, t2);
  mask = a > 0.1;
  tic; a(mask); t1 = toc;
  tic; a(mask); t2 = toc;
  fprintf ("dense mask: 1st: %f, 2nd: %f\n", t1, t2);
  mask = a > 0.0;
  tic; a(mask); t1 = toc;
  tic; a(mask); t2 = toc;
  fprintf ("full mask: 1st: %f, 2nd: %f\n", t1, t2);
endfor

on a Core 2 Duo @ 1.83 GHz, I get with a recent tip:

n = 100000
sparse mask: 1st: 0.000928, 2nd: 0.000157
dense mask: 1st: 0.001803, 2nd: 0.001093
full mask: 1st: 0.001564, 2nd: 0.001318
n = 1000000
sparse mask: 1st: 0.008774, 2nd: 0.002222
dense mask: 1st: 0.020419, 2nd: 0.011310
full mask: 1st: 0.020780, 2nd: 0.012612
n = 10000000
sparse mask: 1st: 0.086547, 2nd: 0.022133
dense mask: 1st: 0.196409, 2nd: 0.107571
full mask: 1st: 0.203453, 2nd: 0.118685
n = 40000000
sparse mask: 1st: 0.366365, 2nd: 0.093962
dense mask: 1st: 0.790059, 2nd: 0.421204
full mask: 1st: 0.805457, 2nd: 0.466647

and with the new change:

n = 100000
sparse mask: 1st: 0.003601, 2nd: 0.000167
dense mask: 1st: 0.001102, 2nd: 0.001247
full mask: 1st: 0.000347, 2nd: 0.000013
n = 1000000
sparse mask: 1st: 0.007920, 2nd: 0.002377
dense mask: 1st: 0.013383, 2nd: 0.011627
full mask: 1st: 0.003406, 2nd: 0.000019
n = 10000000
sparse mask: 1st: 0.078718, 2nd: 0.023621
dense mask: 1st: 0.115947, 2nd: 0.108459
full mask: 1st: 0.031419, 2nd: 0.000039
n = 40000000
sparse mask: 1st: 0.325938, 2nd: 0.093050
dense mask: 1st: 0.462140, 2nd: 0.424088
full mask: 1st: 0.119612, 2nd: 0.000044

apparently, there is up to 70% speedup for the first indexing with dense masks, no penalty for subsequent ones.
for full masks, the speed-up is more than 6x (570%) because Octave detects that a shallow copy can be used.

any comments?

enjoy
--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

reply via email to

[Prev in Thread] Current Thread [Next in Thread]