Rationale: Up to now, Octave always converted logical masks to index vectors for indexing, via the octave_value::index_vector method. This allows efficient random access and is generally beneficial for masks where most elements are false.
However, when the mask is nearly full, the index array needs up to 4x (or 8x with 64-bit indexing) the storage of the mask, which incurs a significant penalty in memory traffic on top of the cost of the conversion itself.
With the new change, Octave will convert the mask to an index array only if at most 1/8 (or 1/16 for 64-bit indexing) of the elements are true, i.e. if the index array takes at most half the memory of the mask. Otherwise the mask is used directly: the innermost loops are specialized for the mask case, and contiguous subrange cases are also detected.
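To make the threshold concrete, here is a minimal sketch of the size comparison. It is not taken from the Octave sources: the names count_true and prefer_index_vector are hypothetical, and it assumes the mask stores one byte per element while an index takes 4 bytes (32-bit indexing) or 8 bytes (64-bit indexing).

```cpp
#include <cstddef>
#include <vector>

// Illustrative only; these names are hypothetical, not Octave's API.
// Assumes the mask stores one byte per element.
std::size_t count_true (const std::vector<unsigned char>& mask)
{
  std::size_t n = 0;
  for (unsigned char m : mask)
    n += (m != 0);
  return n;
}

// Returns true when the index array would occupy at most half the memory
// of the mask, i.e. count_true (mask) * index_bytes <= mask.size () / 2.
// index_bytes is 4 for 32-bit indexing and 8 for 64-bit indexing.
// The inequality is multiplied by 2 to avoid integer division.
bool prefer_index_vector (const std::vector<unsigned char>& mask,
                          std::size_t index_bytes)
{
  return 2 * index_bytes * count_true (mask) <= mask.size ();
}
```

With index_bytes = 4 this reduces to "at most 1/8 of the elements are true", and with index_bytes = 8 to "at most 1/16".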
This is beneficial for expressions like x(x != 0) in which you expect the condition to be true for most or all elements.
Apparently, there is up to a 70% speed-up for the first indexing with dense masks, and no penalty for subsequent ones. For full masks, the speed-up is more than 6x (570%) because Octave detects that a shallow copy can be used.
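The special cases mentioned in this message (full masks, contiguous subranges, and the mask-specialized loop) can be sketched roughly as follows. Again, this is only an illustration under the assumption of a one-byte-per-element mask; MaskKind, classify_mask and select_by_mask are made-up names, and the real implementation may differ substantially.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only; names are hypothetical, not Octave's API.
enum class MaskKind { Empty, Full, Contiguous, General };

// Classify a mask so that cheap special cases can be taken:
//   Full       -> the result equals the source, so a shallow copy suffices
//   Contiguous -> the selection is a single subrange [first, last)
//   General    -> fall back to the element-by-element mask loop
MaskKind classify_mask (const std::vector<unsigned char>& mask)
{
  const std::size_t n = mask.size ();

  std::size_t first = 0;
  while (first < n && ! mask[first])
    ++first;
  if (first == n)
    return MaskKind::Empty;          // no true element at all

  std::size_t last = n;
  while (! mask[last - 1])           // terminates: mask[first] is true
    --last;

  for (std::size_t i = first; i < last; ++i)
    if (! mask[i])
      return MaskKind::General;      // a hole inside [first, last)

  return (first == 0 && last == n) ? MaskKind::Full : MaskKind::Contiguous;
}

// Mask-specialized inner loop: copy x[i] wherever mask[i] is true,
// without building an intermediate index array.
std::vector<double> select_by_mask (const std::vector<double>& x,
                                    const std::vector<unsigned char>& mask)
{
  std::vector<double> out;
  for (std::size_t i = 0; i < x.size () && i < mask.size (); ++i)
    if (mask[i])
      out.push_back (x[i]);
  return out;
}
```

For x(x != 0) with no zeros in x, classify_mask would report Full, which corresponds to the case where the whole array can be shared instead of copied element by element.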
any comments?
enjoy --
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz