Re: Using OpenMP in Octave


From: Jaroslav Hajek
Subject: Re: Using OpenMP in Octave
Date: Mon, 29 Mar 2010 11:50:21 +0200

On Mon, Mar 29, 2010 at 7:56 AM, Søren Hauberg <address@hidden> wrote:
> Mon, 2010-03-29 at 01:37 +0200, David Bateman wrote:
>> I've had a short discussion with Jaroslav and John off list about
>> implementing OpenMP multi-threading in Octave and want to bring it back
>> to the list. The use of OpenMP in Octave 3.4 will still be too
>> experimental, so if we include the code now I propose to make it off by
>> default. The changeset I committed on Saturday adds the autoconf code
>> to probe for OpenMP support, but only if the "--enable-openmp"
>> configure option is used. Currently it only probes for OpenMP support
>> with gcc and MSVC (though the MSVC code is untested).
>
> Interesting.
>
>> However, the value 1000 is arbitrary and a little benchmarking is
>> needed. I attach a first experimental changeset for those who want to
>> experiment. Configured with "--enable-openmp" on a recent tip, this
>> code successfully runs through "make check", but I don't know if the
>> choice of array size at which to switch between single- and
>> multi-threaded code is optimal.
>>
>> A couple of interesting tests might be
>>
>> n  = 300; a = ones(n,n,n);
>> tic; sum(a,1); toc
>> tic; sum(a,2); toc
>> tic; sum(a,3); toc
>> n = 999; a = (1+1i)*ones (n,n); tic; a = real(a); toc
>> n = 1001; a = (1+1i)*ones (n,n); tic; a = real(a); toc
>>
>> before and after the change. Unfortunately I'm developing on an Atom,
>> so I won't personally see much gain from this multi-threading.
>
> I tried your changeset and ran the above test with no noticeable
> difference in speed (0.0114329 seconds for n = 999 and 0.013583 seconds
> for n = 1001) on a dual-core laptop.
>
> I tried to increase n to 10000 and still saw no noticeable difference
> between using my ordinary 3.3.51+ installation and the OpenMP version.
> Do I need to do anything to activate OpenMP when running Octave?
>
> Søren
>
>

Hi Søren, try this one instead. I started from David's patch,
simplified some things, and encapsulated the tuning constants (maximum
number of threads and minimum size limit) into get/set functions. I
removed the parallelization of reductions for the time being (it was
not quite correct).
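
The get/set encapsulation looks roughly like this (a minimal sketch
with illustrative names, not the actual patch):

// Sketch only: the tuning constants live behind get/set functions,
// so they can be queried and changed at run time.
#include <omp.h>
#include <algorithm>
#include <cstddef>

static int mt_max_threads = omp_get_max_threads ();
static std::ptrdiff_t mt_min_size = 1000;  // smaller arrays stay serial

int get_mt_max_threads (void) { return mt_max_threads; }
void set_mt_max_threads (int n) { mt_max_threads = std::max (1, n); }

std::ptrdiff_t get_mt_min_size (void) { return mt_min_size; }
void set_mt_min_size (std::ptrdiff_t n) { mt_min_size = n; }

An element-wise loop then goes parallel only when the array has at
least mt_min_size elements, using at most mt_max_threads threads.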

Here's a benchmark:

n = 5e6;
a = rand (n, 1);
b = rand (n, 1);
disp ("operations");
tic; for i = 1:10, -a ; endfor; toc
tic; for i = 1:10, a+b ; endfor; toc
tic; for i = 1:10, a-b ; endfor; toc
tic; for i = 1:10, a.*b ; endfor; toc
tic; for i = 1:10, a./b ; endfor; toc
tic; for i = 1:10, a+=b ; endfor; toc
tic; for i = 1:10, a.*=b ; endfor; toc

disp ("mappers");
tic; for i = 1:10, exp(a); endfor; toc
tic; for i = 1:10, sin(a); endfor; toc
tic; for i = 1:10, erf(a); endfor; toc
tic; for i = 1:10, erfinv(a); endfor; toc

On my Core 2 Duo, built with g++ -O3 -march=native on a recent tip, I get:

address@hidden:~/devel/octave/main> octave -q ttmt.m
operations
Elapsed time is 0.279794 seconds.
Elapsed time is 0.348097 seconds.
Elapsed time is 0.352154 seconds.
Elapsed time is 0.348534 seconds.
Elapsed time is 0.371341 seconds.
Elapsed time is 0.222977 seconds.
Elapsed time is 0.221841 seconds.
mappers
Elapsed time is 1.40801 seconds.
Elapsed time is 1.34872 seconds.
Elapsed time is 1.50267 seconds.
Elapsed time is 3.00263 seconds.


with the new patch, I get:

address@hidden:~/devel/octave/main> ./run-octave -q ttmt.m
operations
Elapsed time is 0.222325 seconds.
Elapsed time is 0.292047 seconds.
Elapsed time is 0.293674 seconds.
Elapsed time is 0.291939 seconds.
Elapsed time is 0.287067 seconds.
Elapsed time is 0.227352 seconds.
Elapsed time is 0.215154 seconds.
mappers
Elapsed time is 0.774295 seconds.
Elapsed time is 0.736404 seconds.
Elapsed time is 0.817736 seconds.
Elapsed time is 1.62565 seconds.

Unfortunately, it confirms what I anticipated: the elementary
operations scale poorly. Memory bandwidth is probably the real limit
here. The mappers involve more work per element and hence scale much
better.
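
The difference shows in the loops themselves. A sketch of the two
kinds of OpenMP loops (assumed shapes and names, not the literal
patch):

#include <cmath>
#include <cstddef>

// Mapper: many cycles of arithmetic per element; compute-bound,
// so extra threads genuinely help.
void map_exp (const double *x, double *y, std::ptrdiff_t n)
{
#pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < n; i++)
    y[i] = std::exp (x[i]);
}

// Elementary operation: one addition per 24 bytes of memory traffic
// (two reads, one write); bandwidth-bound, so extra threads mostly
// just wait on memory.
void vec_add (const double *a, const double *b, double *y, std::ptrdiff_t n)
{
#pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < n; i++)
    y[i] = a[i] + b[i];
}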

This is why I think we should not hurry with multithreading the
elementary operations and reductions like sum(). I know Matlab does
it, but I think it's just fancy stuff, there to convince customers
that new versions add significant value.
Elementary operations are seldom a bottleneck; add Amdahl's law to
their poor scaling and the result is going to be very little music for
lots of money.

When I read about Matlab parallelizing stuff like sum(), I was a
little surprised. 50 million numbers get summed in 0.07 seconds on
my computer; generating them in some non-trivial way typically takes
at least 50 times that long, often much more. In that case, a
multithreaded sum is absolutely marginal, even if it scaled perfectly.
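
The proportions are easy to check with something like this (timings
vary by machine, and rand is already about the cheapest possible way
to produce the data, so the gap only widens for anything non-trivial):

n = 5e7;
tic; a = rand (n, 1); toc   # producing the data
tic; s = sum (a); toc       # summing it: a small fraction of the above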

One area where multithreading really helps is the complicated mappers,
as shown by the second part of the benchmark.
Still, I think we should carefully consider how best to provide parallelism.
For instance, I would be happy with explicit parallelism, something
like pararrayfun from the OctaveForge package, so that I could write:

pararrayfun (3, @erf, x, "ChunksPerProc", 100);
# parallelize on 3 threads, splitting the array into 300 chunks

Note that if I were about to parallelize a larger section of code that
uses erf, I could do

erf = @(x) pararrayfun (3, @erf, x, "ChunksPerProc", 100);
# use the parallel erf for the rest of the code

If we really insist that the builtin functions must support
parallelism, then I say it must fulfill at least the following:

1. an easy way of temporarily disabling it must exist (for high-level
parallel constructs like parcellfun, this should happen automatically;
a possible pattern is sketched below);
2. the tuning constants should be customizable.

For instance, I can imagine something like

mt_size_limit ("sin", 1000);   # parallelize sin for arrays with > 1000 elements
mt_size_limit ("erfinv", 500); # parallelize erfinv for arrays with > 500 elements

We have no chance of determining the best constants for all machines,
so I think users should be allowed to find out their own.
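
For the temporary disabling in point 1, something along these lines
would do (a sketch, assuming a hypothetical mt_max_threads get/set
function that returns the previous value):

old = mt_max_threads (1);   # hypothetical setter: force serial execution
unwind_protect
  ## ... code that must not spawn extra threads ...
unwind_protect_cleanup
  mt_max_threads (old);     # restore the previous setting
end_unwind_protect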

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

Attachment: mt-start.diff

