Parallel access to data created with Octave
From: Jarno Rajahalme
Subject: Parallel access to data created with Octave
Date: Fri, 30 Apr 2010 20:25:22 -0700
I have wanted for quite some time now to parallelize some OCT-file C++ code
using OpenMP. My code uses some global Octave data (Cells of Cells of vectors
or scalars). I had naively assumed that even though Octave is not thread-safe,
"just reading" the global variables would be safe :-)
It took some debugging, but I found out that, in order to avoid any dynamic
memory allocation while traversing the data, I must:
- use data(), get_rep(), and mex_get_data() to get pointers to the data
(otherwise temporaries are dynamically allocated and freed, which causes bad breakage)
- in Cells, index manually using .data()[idx].get_rep() to avoid temporary
references
- NOT use dims(), columns(), rows(), length(), or numel() on scalars. They are
safe on matrices, but NOT on scalars. Instead, inspect the data with
is_scalar_type(); if it is a scalar, the length is 1, otherwise use numel() to
read the vector length
- gather results via a raw pointer; parallel assignments to e.g. a
RowVector seem to fail even if the space is preallocated.
The data I used was all uint16 matrices and uint16 scalars.
The numel() on a scalar type was a hard one to track down. There is a static dv
initialization within the scalar dims() function, which was a real pain. At
first I added dummy length(), numel(), and capacity() functions (all returning
1) to ov-base-scalar.h, but as I wanted my code to be portable, I deleted those
changes and used is_scalar_type() instead.
For example, this crashes when parallel (pCellStore is a pointer to a global
Cell, arg1 is an Octave index):
const Cell & su = pCellStore->xelem(arg1-1).cell_value();
const octave_idx_type smax = su.columns();
This is a bit faster, but still crashes when parallel:
const Cell su(pCellStore->xelem(arg1-1).cell_value());
const octave_idx_type smax = su.columns();
But this does not crash:
const octave_cell * sup = static_cast<const octave_cell *>(&(pCellStore->data()[arg1-1].get_rep()));
const octave_value * suovp = static_cast<const octave_value *>(sup->mex_get_data());
const octave_idx_type smax = sup->columns();
const octave_idx_type srows = sup->rows();
And this is simple, but slow, and crashes when parallel (si is a loop variable):
const uint16NDArray & sn = su.xelem(0,si).uint16_array_value();
This is faster, but still crashes when parallel:
const octave_value snv(su.xelem(0,si));
const uint16NDArray sna(snv.uint16_array_value());
const octave_idx_type snl = sna.nelem();
const octave_uint16 * snp = sna.fortran_vec();
But this does not crash:
const octave_base_value * snvp(suovp[srows*si].internal_rep()); // either octave_uint16_matrix or octave_uint16_scalar
const octave_idx_type snl(snvp->is_scalar_type() ? 1 : snvp->numel());
const octave_uint16 * snp(static_cast<const octave_uint16 *>(snvp->mex_get_data())); // works for both
It gets a bit messy, but it is a lot faster, and it also works with Octave
3.2.3 when compiling the OCT-file with OpenMP :-)
Overall, with these changes the code became about 1.8x as fast (1 core) and
3.4x as fast (2 cores) as my original code, in which I had followed the
examples in the Octave manual as best I could. So there is a lot of overhead in
"just accessing" data in OCT files, if one is not careful!
My original C++ itself was more than 145x faster than the equivalent, optimized
.m version. Now the speed difference is a whopping 500x. This must be close to
the worst case for Octave, as there are no matrix operations, and
there are 6 levels of for loops involved...
Regards,
Jarno
P.S. I just remembered that there is also a free OpenMP implementation for
MSVC; it comes with one of the SDKs... I used it last year.