Re: fibers,questions about thread id and mutation of vectors
From: Damien Mattei
Subject: Re: fibers,questions about thread id and mutation of vectors
Date: Tue, 17 Jan 2023 10:42:33 +0100
Hello Maxime,
it runs fastest with your idea:
as you said, scm_init_guile() is only needed once per thread.
On Fri, Jan 13, 2023 at 1:23 PM Maxime Devos <maximedevos@telenet.be> wrote:
> > for (i=start; i<=stop; i++) { /* i is private by default */
> >
> > scm_init_guile();
> > scm_call_1( func , scm_from_int(i) );
>
> IIUC, you are calling scm_init_guile once per index, whereas calling it
>
Yes, OpenMP slices a 1-to-N for loop into number_of_cpus segments of about
N/number_of_cpus iterations each. Every segment is a normal C for loop, and
one segment runs per CPU, so if you run 'top' on a C OpenMP program you will
see a load of number_of_cpus*100%.
For example, with 12 CPUs top will display a load of 1200% for your program.
Furthermore, if you hit the 1 key, top shows the load of each CPU (100%
each). The same option does not exist in the 'top' of BSD-like systems such
as Mac OS.
OpenMP partitions the N iterations and runs each part on exactly one thread,
each thread on a different CPU or core. I think it is the only library that
can do that; OpenMP is implemented very close to the compiler (and LLVM).
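
To make the slicing concrete, here is a minimal sketch of how N iterations can be cut into one contiguous block per thread. The helpers chunk_bounds() and print_partition() are hypothetical names of mine, not part of OpenMP, and the exact split a real OpenMP runtime chooses depends on the schedule and the implementation; this only illustrates the "nearly equal blocks" idea.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not an OpenMP function): computes the half-open
   range [*lo, *hi) of iterations that thread `tid` would own when `n`
   iterations are split into `nthreads` contiguous, nearly equal blocks. */
static void chunk_bounds(long n, int nthreads, int tid, long *lo, long *hi)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    long base  = n / nthreads;   /* minimum number of iterations per thread */
    long extra = n % nthreads;   /* the first `extra` threads get one more  */
    *lo = tid * base + (tid < extra ? tid : extra);
    *hi = *lo + base + (tid < extra ? 1 : 0);
}

/* Prints the block owned by each thread, one line per thread. */
static void print_partition(long n, int nthreads)
{
    for (int t = 0; t < nthreads; t++) {
        long lo, hi;
        chunk_bounds(n, nthreads, t, &lo, &hi);
        printf("thread %2d: iterations %ld..%ld\n", t, lo, hi - 1);
    }
}
```

Calling print_partition(100, 12) would show the first four threads owning 9 iterations each and the remaining eight owning 8 each.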
In general there is a master thread and worker threads, and you can run
special code on only one thread (the master, or the first one to reach that
point). On Friday I unfortunately tried the single pragma:
https://www.openmp.org/spec-html/5.0/openmpsu38.html
but that cannot help, because it runs the block on only one thread of the
team, not once in each thread.
A solution to the problem could be this one:
Executing Code Once Per Thread in an OpenMP Loop
<https://ofekshilon.com/2014/06/10/executing-code-once-per-thread-in-an-openmp-loop/>
but it is written for Visual C++ and would not be compatible with g++.
So I use a basic C solution with a static array that remembers whether
scm_init_guile() has already been called for the thread the code is
currently running on.
I also store omp_get_max_threads() in a static variable, as openmp() is
called many times in my code and the number of available hardware CPUs
never changes.
the code is here:
https://github.com/damien-mattei/library-FunctProg/blob/master/guile-openMP.c
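
For readers who do not want to follow the link, here is a minimal sketch of the idea, not the code from the repository. init_thread_runtime() is a hypothetical stand-in for scm_init_guile(), and parallel_for(), add_index() and MAX_THREADS are names invented for the illustration. It assumes that an OpenMP thread number maps to the same OS thread from one call to the next, which is common with a thread pool but is not something I can guarantee.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the sketch also builds without -fopenmp (single thread). */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

#define MAX_THREADS 256      /* assumed upper bound, hypothetical */

static int  init_calls = 0;  /* counts initialisations, for demonstration */
static long sum = 0;         /* accumulator used by the sample loop body  */

/* Hypothetical stand-in for scm_init_guile(). */
static void init_thread_runtime(void)
{
    #pragma omp atomic
    init_calls++;
}

/* Sample loop body: adds the index to a shared sum. */
static void add_index(int i)
{
    #pragma omp atomic
    sum += i;
}

/* Runs body(i) for i in [start, stop] and initialises each thread once.
   Returns the cached maximum number of threads. */
static int parallel_for(int start, int stop, void (*body)(int))
{
    static int initialised[MAX_THREADS]; /* zeroed; one flag per thread  */
    static int max_threads = 0;          /* cached across calls          */
    int i;

    if (max_threads == 0)
        max_threads = omp_get_max_threads();
    assert(max_threads <= MAX_THREADS);

    #pragma omp parallel for
    for (i = start; i <= stop; i++) {
        int tid = omp_get_thread_num();
        if (!initialised[tid]) {  /* each thread reads/writes its own slot */
            init_thread_runtime();
            initialised[tid] = 1;
        }
        body(i);
    }
    return max_threads;
}
```

Because the flags are static, a second call to parallel_for() does not initialise the same thread numbers again.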
Unfortunately I find no real speedup from OpenMP. I understood that the only
reason for the speedup was that the C 'for' loop is much faster than the
Scheme 'for' ones.
To reach this conclusion I compared Scheme, C with OpenMP, and C without
OpenMP, and in C I got almost exactly the same timings:
Scheme:
... [output cut]
Chrono START number: 165 minterms-vector-length = 10944. chrono STOP :
elapsedTime = 36.219 ms.totalComputationTime =485311.94
Chrono START number: 166 minterms-vector-length = 12008. chrono STOP :
elapsedTime = 39.82 ms.totalComputationTime =485351.76
Chrono START number: 167 minterms-vector-length = 342. chrono STOP :
elapsedTime = 1.215 ms.totalComputationTime =485352.97500000003
Scheme with OpenMP call:
...[output cut]
Chrono START number: 165 minterms-vector-length = 10944. chrono STOP :
elapsedTime = 35.039 ms.Open MP totalComputationTime =385444.1410000001
Chrono START number: 166 minterms-vector-length = 12008. chrono STOP :
elapsedTime = 37.792 ms.Open MP totalComputationTime =385481.93300000014
Chrono START number: 167 minterms-vector-length = 342. chrono STOP :
elapsedTime = 1.163 ms.Open MP totalComputationTime =385483.09600000014
Scheme with C 'for loop call:
...[output cut]
Chrono START number: 165 minterms-vector-length = 10944. chrono STOP :
elapsedTime = 33.104 ms.For Funct totalComputationTime =385543.4700000001
Chrono START number: 166 minterms-vector-length = 12008. chrono STOP :
elapsedTime = 35.938 ms.For Funct totalComputationTime =385579.4080000001
Chrono START number: 167 minterms-vector-length = 342. chrono STOP :
elapsedTime = 1.165 ms.For Funct totalComputationTime =385580.5730000001
On the C codes (parallel OpenMP and sequential for) the result
is almost the same:
totalComputationTime =385580.5730000001 ms
totalComputationTime =385483.09600000014 ms
=385 s
I suppose OpenMP works well at slicing the loop across many processors, but
the
scm_call_1( func , scm_from_int(i) );
calls all run on the same thread, the one that hosts the Guile interpreter.
A solution would be to have many Guile interpreters running, but I do not
know how to do that from the C code with OpenMP.
Damien
Note: I measured time both in C and in Scheme with gettimeofday, to
compare the 100% Scheme code with the mixed one:
https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3500
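
On the C side the measurement can be sketched roughly as below. This is a simplified illustration, not the code from the repository: elapsed_ms() and time_call_ms() are hypothetical helper names. gettimeofday gives wall-clock time, which is the relevant quantity when checking for a parallel speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Difference end - start in milliseconds. */
static double elapsed_ms(const struct timeval *start,
                         const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (end->tv_sec - start->tv_sec) * 1000.0
         + (end->tv_usec - start->tv_usec) / 1000.0;
}

/* Times one call of work() and returns the elapsed wall-clock time in ms.
   `work` is a hypothetical callback standing for the code being measured. */
static double time_call_ms(void (*work)(void))
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    work();
    gettimeofday(&t1, NULL);
    return elapsed_ms(&t0, &t1);
}
```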