On Sat, Nov 28, 2009 at 6:26 AM, Judd Storrs
<address@hidden> wrote:
This is pretty cool!
Is it possible to somehow pre-compute/convert the mask? I've often
used find() to pre-convert a mask into index form because I think that
improves performance when the mask is used multiple times (but it may
have been a habit that I picked up when I was working primarily in IDL
where this is part of the collective folk wisdom). e.g. things along
these lines:
mask = find(a > 0) ;
a(mask) = b(mask) ;
Not really. Let me explain how this works (not in 3.2.x though). When a matrix is first used in index _expression_, the internal conversion to index_vector is, if successful, cached for subsequent uses. This optimizes usages like
c(mask) = a(mask) + b(mask)
because the conversion of mask has only to be done once. But beware that this is still evaluated by pieces, i.e.
aa = a(mask);
bb = b(mask);
cc = aa + bb;
c(mask) = cc;
This explains why find is *apparently* faster in your benchmark, because find creates the result with the index vector *already cached*. Try putting the first tic; prior to
mask = , and you will see. Also, in the last (full) section you've missed an assignment to mask:
"a > 0.0;" should be "mask = a > 0.0;"
Here's what I get with a corrected benchmark (the 4e7 case is too much for my home laptop):
n = 100000
sparse mask: 1st: 0.001931, 2nd: 0.000562
sparse mask(find): 1st: 0.001640, 2nd: 0.000611
sparse mask(where): 1st: 0.002379, 2nd: 0.000562
dense mask: 1st: 0.004646, 2nd: 0.003817
dense mask(find): 1st: 0.005676, 2nd: 0.003681
dense mask(where): 1st: 0.005042, 2nd: 0.003642
full mask: 1st: 0.002669, 2nd: 0.001587
full mask(find): 1st: 0.006236, 2nd: 0.004371
full mask(where): 1st: 0.005657, 2nd: 0.003588
n = 1000000
sparse mask: 1st: 0.017749, 2nd: 0.009287
sparse mask(find): 1st: 0.018799, 2nd: 0.008243
sparse mask(where): 1st: 0.016918, 2nd: 0.007967
dense mask: 1st: 0.044704, 2nd: 0.038776
dense mask(find): 1st: 0.058628, 2nd: 0.039231
dense mask(where): 1st: 0.054648, 2nd: 0.035887
full mask: 1st: 0.028402, 2nd: 0.017976
full mask(find): 1st: 0.057978, 2nd: 0.042992
full mask(where): 1st: 0.057311, 2nd: 0.038398
n = 10000000
sparse mask: 1st: 0.178092, 2nd: 0.092343
sparse mask(find): 1st: 0.182628, 2nd: 0.079860
sparse mask(where): 1st: 0.165752, 2nd: 0.079588
dense mask: 1st: 0.442568, 2nd: 0.392207
dense mask(find): 1st: 0.587378, 2nd: 0.398332
dense mask(where): 1st: 0.601536, 2nd: 0.397812
full mask: 1st: 0.256911, 2nd: 0.192204
full mask(find): 1st: 0.625493, 2nd: 0.435221
full mask(where): 1st: 0.639314, 2nd: 0.434564
so you can see find is actually slower. Not by much because the time is drowned in the other work; there's the mask creation, two index expressions, one addition and one indexed assignment. It's mainly visible in the dense and full mask cases. In fact, "find" not only *always* converts to index vector, it also creates a *double* array corresponding to the result (to form a valid octave_value). Hence, in development version, using raw logical masks is more efficient in (almost?) all cases.
I've been playing with the new code a bit and I think the find() trick
defeats the new optimization? I tried to create a simple where.oct
that converts its input into the new smart mask but I think I've
missed something
#include <octave/oct.h>
DEFUN_DLD(where, args, ,
"Convert mask to index")
{
idx_vector idx = args(0).index_vector() ;
return octave_value(idx);
}
Spoiler alert: something's wrong here because 'where' is slower than
using the raw boolean matrix. Here's the modified benchmark code I've
been using:
Yeah. Regarding this, I'm not yet fully sure. My rationale was that when you assign an index_vector to octave_value,
you expect to get a numeric array, not a logical one. So, if the idx_vector is actually a mask, it is converted to an index array. Hence, your where function actually simulates "find" (its simplest case). Giving it a second thought, maybe it should be possible to return a logical array with cached index; still, I believe one should never create a numeric array with a cached logical mask. That would be a bit weird. Hence, "find" will need to do the conversion.
here's the corrected benchmark: