freepooma-devel
Re: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators


From: Jeffrey D. Oldham
Subject: Re: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators
Date: Thu, 15 Jan 2004 18:58:21 -0800
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624

Richard Guenther wrote:
On Tue, 13 Jan 2004, Jeffrey D. Oldham wrote:


Richard Guenther wrote:

Hi!

The following patch is necessary to avoid deadlocks with the MPI
implementation and multi-patch setups where one context does not
participate in the reduction.

Fixes the failure of array_test_.. (I don't remember which) with MPI.

Basically the scenario is that the collective synchronous MPI_Gather is
called from ReduceOverContexts<> on the non-participating (and thus
not receiving) contexts while the SendIterates are still in the
scheduler's queue.  The contexts participating in the calculation will
then wait forever on the CSem for the ReceiveIterates and patch
reductions to complete.
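
To make the ordering concrete, here is a minimal stand-alone two-rank
MPI program (a sketch of mine, not POOMA code) that models the
interlock.  MPI_Barrier stands in for the synchronous collective so the
hang is deterministic; the blocking receive plays the role of waiting
on the CSem for the ReceiveIterate.  Build with mpicxx, run with
mpirun -np 2; it is expected to hang:

// deadlock_sketch.cc -- models the ordering above; NOT POOMA code.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  int patch = 1 + rank;   // this context's local contribution

  if (rank == 1) {
    // Send-only context: its SendIterate is still in the scheduler
    // queue (the send below is issued too late), yet it already
    // enters the synchronous collective from ReduceOverContexts<>.
    MPI_Barrier(MPI_COMM_WORLD);                        // blocks forever
    MPI_Send(&patch, 1, MPI_INT, 0, 42, MPI_COMM_WORLD);
  } else {
    // Computing context: the blocking receive models waiting on the
    // CSem for the ReceiveIterate, which can never fire because rank 1
    // sits in the collective and never issues its send.
    int remote = 0;
    MPI_Recv(&remote, 1, MPI_INT, 1, 42, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    MPI_Barrier(MPI_COMM_WORLD);                        // never reached
    std::printf("reduction = %d\n", patch + remote);
  }

  MPI_Finalize();
  return 0;
}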

So the fix is to make the non-participating contexts wait on the CSem,
too, by using a fake write iterate, queued after the send iterates,
which triggers as soon as the send iterates complete.
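
A self-contained model of the fix follows (my own illustration; the
CSem and iterate names are stand-ins for POOMA's, not its real API).
A worker thread plays the scheduler draining the iterate queue in
order, and the fake write iterate raises the semaphore only after the
send iterate has run, so the context blocks before the collective
instead of racing into it:

// fake_write_sketch.cc -- models the fix; names are illustrative only.
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Stand-in for POOMA's counting semaphore: wait() blocks until the
// count reaches the height given at construction.
class CSem {
public:
  explicit CSem(int height) : height_(height) {}
  void incr()
  {
    { std::lock_guard<std::mutex> lock(mutex_); ++count_; }
    cv_.notify_one();
  }
  void wait()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return count_ >= height_; });
  }
private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int count_ = 0;
  int height_;
};

int main()
{
  std::queue<std::function<void()>> schedulerQueue;

  // Height 1: this send-only context has no receives to wait for, only
  // the fake write iterate.  Before the patch the height was 0 here,
  // so wait() fell straight through.
  CSem csem(1);

  // The SendIterate is queued first...
  schedulerQueue.push([] { std::puts("SendIterate: sends issued"); });
  // ...and the fake write iterate behind it raises the CSem.
  schedulerQueue.push([&] { std::puts("fake write iterate"); csem.incr(); });

  // A worker thread plays the scheduler, running iterates in order.
  std::thread scheduler([&] {
    while (!schedulerQueue.empty()) {
      schedulerQueue.front()();
      schedulerQueue.pop();
    }
  });

  csem.wait();   // block until the sends are really out
  std::puts("safe to enter the synchronous MPI_Gather");
  scheduler.join();
  return 0;
}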

Instead of adding a fake write iterate, can we adjust the MPI_Gather so
that non-participating contexts do not participate?


The problem is not easy to tackle in MPI_Gather, as collective
communication primitives involve all contexts; this can be overcome
only by creating a new MPI communicator, which is costly.  Also, I'm
not sure that this would solve the problem at all.
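
For reference, the communicator-based alternative would look roughly
like this (a sketch of the idea, not proposed code; MPI_Reduce stands
in for the gather plus final reduction).  Note that MPI_Comm_split is
itself collective over all contexts, and the communicator would have
to be created and freed per reduction unless cached -- that is the
cost I mean:

// comm_split_sketch.cc -- participants-only communicator; sketch only.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Pretend odd ranks hold no patches of the reduction.
  bool participates = (rank % 2 == 0);

  // Collective over *all* contexts: split off a communicator that
  // contains only the participants (the rest get MPI_COMM_NULL).
  MPI_Comm reduceComm;
  MPI_Comm_split(MPI_COMM_WORLD, participates ? 0 : MPI_UNDEFINED,
                 rank, &reduceComm);

  if (participates) {
    int local = rank + 1, global = 0;
    // Only participants enter the reduction; the others never touch it
    // and so cannot be caught inside it with sends still queued.
    MPI_Reduce(&local, &global, 1, MPI_INT, MPI_SUM, 0, reduceComm);

    int reduceRank;
    MPI_Comm_rank(reduceComm, &reduceRank);
    if (reduceRank == 0)
      std::printf("sum over participants = %d\n", global);

    MPI_Comm_free(&reduceComm);
  }

  MPI_Finalize();
  return 0;
}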

The problem is that contexts participating only by sending their data to
a remote context (i.e. participating, but not computing) don't have the
counting semaphore to block on (its height is zero for them).  So after
queuing the send iterates they go straight to the final reduction, which
is not done via an extra iterate, and block there, never firing off the
send iterates in the first place.  Ugh.  The same of course holds for
completely non-participating contexts, and even this may be a problem
because of old unrun iterates.

At first I thought of creating a DataObject to hold the reduction
result, so that we could do the usual data-flow evaluation on it rather
than ignoring dependencies on it, as we do now.  But this turned out to
be more invasive, and I didn't have time to complete it.

So the fake write iterate solves the problem for me (only partly,
because I could imagine the problem is still there for completely
non-participating contexts).

Anyway, I'm not pushing this very hard now, but without it reductions
with MPI are guaranteed to deadlock for me (so there is a race even in
the case of all-participating contexts, or the intersector is doing
something strange).

Richard.

I appreciate your finding the difficulty and taking the time to explain the problem. I am reluctant to add code that is known to be broken in some situations. Is there a way to mark the code so that 1) the known brokenness is documented and 2) the program acts sensibly when the brokenness is encountered?

--
Jeffrey D. Oldham
address@hidden
