octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: rewrite of structs - advice sought


From: Jaroslav Hajek
Subject: Re: rewrite of structs - advice sought
Date: Thu, 1 Jul 2010 11:06:21 +0200

On Wed, Jun 23, 2010 at 3:07 PM, John W. Eaton <address@hidden> wrote:

> If the new code is working and tests are passing, then I'd say check
> it in when you are ready.  I can help convert uses of Octave_map in
> Octave to use the new classes.
>
> Thanks,
>
> jwe
>

Almost there now. Implementation is mostly finished, the test suite is
passing now.

Attached is a benchmark script giving an idea of the new optimization
possibilities. I encourage everyone
to clone http://hg.tw-math.de/highegg and try.

My machine is a Core 2 Duo @ 2.83 GHz. Octave is compiled with -O3
-march=native in both cases.

With a recent tip of main archive, I get:

make big struct matrix with three fields
Elapsed time is 0.0149531 seconds.
splitting it into scalar structs using num2cell
Elapsed time is 1.61135 seconds.
concatenating first 10 columns horizontally
Elapsed time is 0.914758 seconds.
concatenating independently created structs
Elapsed time is 1.00762 seconds.
for loop over elements using indices
Elapsed time is 1.9454 seconds.
loop directly over elements
Elapsed time is 1.07845 seconds.
a smaller struct array with many fields
Elapsed time is 0.000844955 seconds.
for loop over elements using indices
Elapsed time is 0.504879 seconds.
loop directly over elements
Elapsed time is 0.50414 seconds.
stat all files in /usr/bin/*, gather as cell
Elapsed time is 0.063081 seconds.
stat all files in /usr/bin/*, gather as struct array
Elapsed time is 1.10384 seconds.


with my archive's tip, I get this:

make big struct matrix with three fields
Elapsed time is 0.011476 seconds.
splitting it into scalar structs using num2cell
Elapsed time is 0.0581141 seconds.
concatenating first 10 columns horizontally
Elapsed time is 0.00216007 seconds.
concatenating independently created structs
Elapsed time is 0.010855 seconds.
for loop over elements using indices
Elapsed time is 1.11567 seconds.
loop directly over elements
Elapsed time is 0.359794 seconds.
a smaller struct array with many fields
Elapsed time is 0.00538397 seconds.
for loop over elements using indices
Elapsed time is 0.0776331 seconds.
loop directly over elements
Elapsed time is 0.0750151 seconds.
stat all files in /usr/bin/*, gather as cell
Elapsed time is 0.0416861 seconds.
stat all files in /usr/bin/*, gather as struct array
Elapsed time is 0.052321 seconds.

discussion:

splitting a struct array into scalar structs using num2cell is about
25x faster, even for small numbers of fields. In fact, it is also
asymptotically more optimal, being O(nelem * nfields) instead of
O(nelem * nfields * log(nfields)) as before.

The next timing show how the shared field structure is exploited by
concatenation, making it more than 400x faster. If the structs are
created independently (next timing), the operation is 5x slower, yet
still 100x faster than it used to be.

The next two timings show a simple for-looping over elements of a
struct array, either using an index or using the struct directly. The
speed-ups are moderate, 74% and somewhat better 200% in the latter
case (less interpreter overhead).

However, switching from an nxn struct array with 3 fields to a 1xn
struct array with 10*n fields in the next 3 timings, shows the
speed-up factor rising to 6.5x. This demonstrates that indexing is now
asymptotically faster, because instead of constructing a std::map
element by element which takes superlinear time, a refcount is
increased and a linear copy of data is performed.

The final two timings are the original cellfun example I gave. The
extra work for gathering results into a struct is cut down from 1.04s
to 0.01 s, making it 100x faster. Notice also how the stat() function
itself got a bit faster by merely using octave_scalar_map instead of
Octave_map.

The old Octave_map is still used in a lot of places, but thanks to the
automatic conversions the transition can be made gradually.

The most important question remaining open is what to do with classes.
These are still based on Octave_map. One possible approach is to just
convert to octave_map. This will be the simplest approach, but it
won't allow classes to benefit from the optimizations much. Creating
an extra scalar container also for scalar classes seems like a big
waste of code.

IMO, it would be best if octave_class used storage like this:

octave_value this_fields; // can be a scalar struct or struct array
octave_scalar_map ancestors; // ancestors stored as octave_values

besides enabling quick construction of new classes (almost all could
be shallow copied), and faster struct-like indexing operations, this
could also help us to make Octave's inheritance actually work with
non-scalar classes (I think currently these fail already on
construction and there is no simple remedy).

However, since I no longer have access to Matlab, I feel unqualified
to do such a significant rewrite functionality, because I'm far from
being sure exactly how everything about inheritance + class arrays
works. I believe it should be done by someone who can have a Matlab
session open while hacking. So, unless any such person steps boldly
forward, I'll just go with the simple solution.

One more thing I want to do is to extend the horzcat/vertcat/cat
optimizations to the [] operator. After that, I think the code is
ready to be merged upstream. Please share your opinions, comments etc.

enjoy!

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Attachment: ttnc.m
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]