guile-user
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: map-par slower than map


From: Damien Mattei
Subject: Re: map-par slower than map
Date: Thu, 13 Oct 2022 10:20:19 +0200

ok 'parallelization code and hash table' in google show no real solution, i
have another idea,instead of function-unify-two-minterms-and-tag ,just
unify the minterms and tag them (which use hash table) later out of the //
region, after if it still sucks i will drop list for vectors, making it
step by step will show where is the bottleneck, let me just a few time and
i will go back to the mailing lists with interesting results i hope....

On Thu, Oct 13, 2022 at 9:40 AM Damien Mattei <damien.mattei@gmail.com>
wrote:

> ok , i think the problem comes both from my code and from guile parmap so.
> Obviously parmap could be slower on other codes because of the nature of
> list i think, it is hard to split a list in sublist and send them to thread
> and after redo a single list, i better use vector.
> As mentioned and forget by me, i apologize, i use an hash table which is a
> global variable and mutex can be set on it , no dead lock , here but it
> slow down the code than it is dead for speed , but not locked.
>
> The examples given are good but i do not want to write long and specific
> code for //, for // must a ssimple as OpenMP directives (on arrays) not be
> a pain in the ass like things i already did at work with GPUs in C or
> Fortran,that's nightmare and i do not want to have this in functional
> programming.
>
> ok i will try to modify the code but i do not think there is easy solution
> to replace the hash table, any structure i would use would be global and
> the problem (this function is not pure, it does a side-effect),at the end i
> do not think this code could be // but i'm not a specialist of //. I think
> this is a parallelization problem, how can we deal with a global variable
> such as an hash table?
>
> On Wed, Oct 12, 2022 at 11:55 PM Olivier Dion <olivier.dion@polymtl.ca>
> wrote:
>
>> On Wed, 12 Oct 2022, Damien Mattei <damien.mattei@gmail.com> wrote:
>> > Hello,
>> > all is in the title, i test on a approximately 30000 element list , i
>> got
>> > 9s with map and 3min 30s with par-map on exactly the same piece of
>> > code!?
>>
>> I can only speculate here.  But trying with a very simple example here:
>> --8<---------------cut here---------------start------------->8---
>> (use-modules (statprof))
>> (statprof (lambda () (par-map 1+ (iota 300000))))
>> --8<---------------cut here---------------end--------------->8---
>>
>> Performance are terrible.  I don't know how par-map is implemented, but
>> if it does 1 element static scheduling -- which it probably does because
>> you pass a linked list and not a vector -- then yeah you can assure that
>> thing will be very slow.
>>
>> You're probably better off with dynamic scheduling with vectors.  Here's
>> a quick snippet I made for static scheduling but with vectors.  Feel
>> free to roll your own.
>>
>> --8<---------------cut here---------------start------------->8---
>> (use-modules
>>  (srfi srfi-1)
>>  (ice-9 threads))
>>
>> (define* (par-map-vector proc input
>>                          #:optional
>>                          (max-thread (current-processor-count)))
>>
>>   (let* ((block-size (quotient (vector-length input) max-thread))
>>          (rest (remainder (vector-length input) max-thread))
>>          (output (make-vector (vector-length input) #f)))
>>     (when (not (zero? block-size))
>>       (let ((mtx (make-mutex))
>>             (cnd (make-condition-variable))
>>             (n 0))
>>         (fold
>>          (lambda (scale output)
>>            (begin-thread
>>             (let lp ((i 0))
>>               (when (< i block-size)
>>                 (let ((i (+ i (* scale block-size))))
>>                   (vector-set! output i (proc (vector-ref input i))))
>>                 (lp (1+ i))))
>>             (with-mutex mtx
>>               (set! n (1+ n))
>>               (signal-condition-variable cnd)))
>>            output)
>>          output
>>          (iota max-thread))
>>         (with-mutex mtx
>>           (while (not (< n max-thread))
>>             (wait-condition-variable cnd mtx))))
>>     (let ((base (- (vector-length input) rest)))
>>       (let lp ((i 0))
>>         (when (< i rest)
>>           (let ((i (+ i base)))
>>             (vector-set! output i (proc (vector-ref input i))))
>>           (lp (1+ i)))))
>>     output))
>> --8<---------------cut here---------------end--------------->8---
>>
>> --
>> Olivier Dion
>> oldiob.dev
>>
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]