guile-user
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: map-par slower than map


From: Damien Mattei
Subject: Re: map-par slower than map
Date: Thu, 13 Oct 2022 09:40:28 +0200

ok , i think the problem comes both from my code and from guile parmap so.
Obviously parmap could be slower on other codes because of the nature of
list i think, it is hard to split a list in sublist and send them to thread
and after redo a single list, i better use vector.
As mentioned and forget by me, i apologize, i use an hash table which is a
global variable and mutex can be set on it , no dead lock , here but it
slow down the code than it is dead for speed , but not locked.

The examples given are good but i do not want to write long and specific
code for //, for // must a ssimple as OpenMP directives (on arrays) not be
a pain in the ass like things i already did at work with GPUs in C or
Fortran,that's nightmare and i do not want to have this in functional
programming.

ok i will try to modify the code but i do not think there is easy solution
to replace the hash table, any structure i would use would be global and
the problem (this function is not pure, it does a side-effect),at the end i
do not think this code could be // but i'm not a specialist of //. I think
this is a parallelization problem, how can we deal with a global variable
such as an hash table?

On Wed, Oct 12, 2022 at 11:55 PM Olivier Dion <olivier.dion@polymtl.ca>
wrote:

> On Wed, 12 Oct 2022, Damien Mattei <damien.mattei@gmail.com> wrote:
> > Hello,
> > all is in the title, i test on a approximately 30000 element list , i got
> > 9s with map and 3min 30s with par-map on exactly the same piece of
> > code!?
>
> I can only speculate here.  But trying with a very simple example here:
> --8<---------------cut here---------------start------------->8---
> (use-modules (statprof))
> (statprof (lambda () (par-map 1+ (iota 300000))))
> --8<---------------cut here---------------end--------------->8---
>
> Performance are terrible.  I don't know how par-map is implemented, but
> if it does 1 element static scheduling -- which it probably does because
> you pass a linked list and not a vector -- then yeah you can assure that
> thing will be very slow.
>
> You're probably better off with dynamic scheduling with vectors.  Here's
> a quick snippet I made for static scheduling but with vectors.  Feel
> free to roll your own.
>
> --8<---------------cut here---------------start------------->8---
> (use-modules
>  (srfi srfi-1)
>  (ice-9 threads))
>
> (define* (par-map-vector proc input
>                          #:optional
>                          (max-thread (current-processor-count)))
>
>   (let* ((block-size (quotient (vector-length input) max-thread))
>          (rest (remainder (vector-length input) max-thread))
>          (output (make-vector (vector-length input) #f)))
>     (when (not (zero? block-size))
>       (let ((mtx (make-mutex))
>             (cnd (make-condition-variable))
>             (n 0))
>         (fold
>          (lambda (scale output)
>            (begin-thread
>             (let lp ((i 0))
>               (when (< i block-size)
>                 (let ((i (+ i (* scale block-size))))
>                   (vector-set! output i (proc (vector-ref input i))))
>                 (lp (1+ i))))
>             (with-mutex mtx
>               (set! n (1+ n))
>               (signal-condition-variable cnd)))
>            output)
>          output
>          (iota max-thread))
>         (with-mutex mtx
>           (while (not (< n max-thread))
>             (wait-condition-variable cnd mtx))))
>     (let ((base (- (vector-length input) rest)))
>       (let lp ((i 0))
>         (when (< i rest)
>           (let ((i (+ i base)))
>             (vector-set! output i (proc (vector-ref input i))))
>           (lp (1+ i)))))
>     output))
> --8<---------------cut here---------------end--------------->8---
>
> --
> Olivier Dion
> oldiob.dev
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]