Re: CPU usage by call of C++ code through system() on Linux

octave-maintainers

From:	Andreas Stahel
Subject:	Re: CPU usage by call of C++ code through system() on Linux
Date:	Fri, 7 Aug 2020 09:55:11 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0



On 6/29/20 4:37 PM, Andreas Stahel wrote:



On 29.06.20 09:49, Kai Torben Ohlhus wrote:

On 6/26/20 3:58 PM, Andreas Stahel wrote:

On 6/26/20 6:28 AM, Kai Torben Ohlhus wrote:

On 6/26/20 1:17 AM, Andreas Stahel wrote:

Dear Octave Users

Maybe one of you can give me a hint on how to make my Octave code run
faster.
Within a good size program (run time 40 sec) the command system() is
used to call a C++ code.
The C++ code uses pthreads.
While the code is running, htop shows approximately 40% of the load on
each CPU attributed to the kernel and 60% "normal" (user space?).

When running the same code in Matlab only the "normal" load shows, with
very little kernel load on the CPUs.
The computation time in Matlab is also only 60% of the time consumed by
Octave (5.2.0).
The system is Ubuntu 20.04 on an AMD Ryzen 3950X.

Any hints on what is slowing Octave down?

With best regards

Andreas


Dear Andreas,

Maybe I do not understand your setup correctly.  You have a C++ code
using threads compiled to, e.g. "code.exe" (the suffix does not matter),
and an Octave script "benchmark.m" with somewhere the code line

     system ("code.exe")

First question: do "benchmark.m" and "code.exe" interact with each
other?  That is, does "code.exe" compute something that "benchmark.m"
processes further by importing results?  What is the purpose of Octave
calling "code.exe"?  Benchmarking with tic-toc?

Second question, does "code.exe" (standalone, without Octave or Matlab)
or "benchmark.m" (called from Octave or Matlab) have a run time of 40
seconds?

Now to your observation.  When running "benchmark.m" in Octave and
Matlab you observe Octave is slower.  I do not understand how this is
related to the CPU "kernel" and "normal" usage?  What is the runtime of
"benchmark.m" in Matlab and Octave, respectively?  Do you complain not
all CPU cores are used?

Maybe it is best to give us (some) code to better understand the
situation.

Kai

Dear Kai

Thank you for the quick reply and attempt to locate the problem.
The code in "benchmark.m" is a loop with 600 iterations.
In each iteration a C++ code is called through system().
The C++ code is heavily threaded and uses FFTW extensively. FFTW is
used as a single-threaded library.
Thus the multithreading is "hand coded".
I have two options set up
  NumIter = 0, no   FFT computations
  NumIter = 2, many FFT computations
In addition I called the binary with a loop in bash.
These are the observed wall times, averaged for one call of the binary

– Octave NumIter=2 : 59.6 ms, NumIter=0 : 16.3 ms
– MATLAB NumIter=2 : 38.3 ms, NumIter=0 : 20.1 ms
– bash   NumIter=2 : 37.9 ms, NumIter=0 : 19.2 ms
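
A bash-side figure like the one in the list above could be gathered with a loop along these lines. This is only a sketch: avg_us is a hypothetical helper name, "sleep 0.01" is a stand-in for the real binary, and "date +%s%N" assumes GNU date as found on Linux.

```shell
# avg_us N CMD [ARGS...]: run CMD N times and print the average
# wall time per call in microseconds.
# avg_us is a hypothetical helper; date +%s%N requires GNU date.
avg_us() (
    n=$1
    shift
    i=0
    start=$(date +%s%N)          # nanoseconds since the epoch
    while [ "$i" -lt "$n" ]; do
        "$@" > /dev/null         # one call of the program under test
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / n / 1000 ))
)

# Stand-in workload; replace with the real binary,
# e.g. ./RunMultipleTH_z_Neumann2 and 600 iterations.
avg_us 5 sleep 0.01
```

The measurement includes the cost of starting each process, which is also part of what tic()/toc() around system() sees.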

This puzzles me thoroughly!

Andreas

PS. on nabble these messages show up in the wrong thread!


Dear Andreas,

The maintainers list was not in the CC.  Sorry for the late reply.

I am still not really convinced that I understand your setup and the
purpose of your computation.

Is there any output or synchronization between "code.exe" and
"benchmark.m"?  The Octave interpreter interpreting a for-loop alone
already consumes "lots of time" compared to your fast overall
computation time.

    a = 0; tic; for i = 1:600, a = a + i; end; toc

    Octave 1.53995 ms.
    Matlab 0.025   ms.

So maybe you just measure "slow" code interpretation when the body of
the for-loop is "heavier" than the one shown above?

Do you measure your wall time inside "code.exe" or in "benchmark.m" by
tic-toc, like in my example?  Maybe you find no differences, if you use
a more precise C/C++ library to measure the wall time and return it for
further processing by Octave or Matlab?

Kai

Dear Kai
Thank you for your effort.
Here is an attempt to clear up the situation.
The loop runs over 600 frames; the timings are given as averages per frame.

In the code "benchmark.m" the time per frame is measured by a tic()/toc() pair.

    tic();
    system(command);  %% this is where the computations are performed
    systemtime = toc();
    display(sprintf('time = %f',systemtime))  % to get an impression while it is running
    systemtimetotal = systemtimetotal+systemtime;

Based on your suggestion I added two calls to gettimeofday() in the C code.
The observed timing is consistent with the tic()/toc() result, i.e. the
tic()/toc() value is slightly higher.

The C code was compiled with

    gcc -O3 -Wall RunMultipleTH_z_Neumann2.c -lpthread -lm -lfftw3 -o RunMultipleTH_z_Neumann2

"benchmark.m" and "code.exe" exchange some information through files.
I timed those file reads and writes; they use very little time.

On a host with a Ryzen 3950X CPU:
* running "code.exe" in a bash loop leads to 33 ms per frame
   htop has almost all of the CPU load assigned to the user
* running the code in Octave leads to  59 ms per frame
   htop has a sizable part of the CPU load assigned to kernel
* running the code in Matlab leads to 37 ms per frame
   htop has almost all of the CPU load assigned to the user

If I reduce the FFTW computations within "code.exe", Octave is faster than
bash or Matlab, but by very little. The multiple threads are still launched
within the C code, but no 2D FFT operations are applied.


On a host with an Intel Xeon E5-1650 CPU a similar effect occurs, not quite as
drastic:

            bash      80 ms
            Matlab    99 ms
            Octave   127 ms

I have no idea what could cause this surprising effect.

Enjoy the day

Andreas


Question answered: it is an effect caused by using OpenBLAS. If the
environment variable is set with "export OPENBLAS_NUM_THREADS=1" before
starting Octave, then the speed is similar to Matlab or bash.
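
As an alternative to exporting the variable for the whole shell session, it can be set for a single command only. The following is a minimal sketch: run_single_threaded is a hypothetical helper name, and a small "sh -c" child stands in for Octave to show that the child process inherits the setting.

```shell
# run_single_threaded CMD [ARGS...]: run CMD with OPENBLAS_NUM_THREADS=1
# in its environment. The helper name is hypothetical, not part of
# Octave or OpenBLAS.
run_single_threaded() {
    OPENBLAS_NUM_THREADS=1 "$@"
}

# Stand-in child process that prints the value it inherited.
run_single_threaded sh -c 'echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"'

# Assumed real use, with the script name used in this thread:
#   run_single_threaded octave benchmark.m
```

Either way, the variable has to be in the environment before Octave starts.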

Enjoy the day

Andreas
--
Andreas Stahel
Mathematics, BFH-TI  E-Mail: Andreas.Stahel@[ANTI-SPAM]bfh.ch
Quellgasse 21        HuCE, Institute for Human Centered Engineering
CH-2502 Biel         WWW:   https://web.sha1.bfh.science
Switzerland          Phone: ++41 +32 32 16 258
