freepooma-devel

Re: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators


From: Richard Guenther
Subject: Re: [pooma-dev] Re: [PATCH] Fix deadlocks in MPI reduction evaluators
Date: Tue, 13 Jan 2004 20:43:46 +0100 (CET)

On Tue, 13 Jan 2004, Jeffrey D. Oldham wrote:

> Richard Guenther wrote:
> > Hi!
> >
> > The following patch is necessary to avoid deadlocks with the MPI
> > implementation and multi-patch setups where one context does not
> > participate in the reduction.
> >
> > It fixes the failure of array_test_.. - I don't remember which - with MPI.
> >
> > Basically, the scenario is that the collective, synchronous MPI_Gather is
> > called from ReduceOverContexts<> on the non-participating (and thus
> > not receiving) contexts while the SendIterates are still in the
> > scheduler's queue.  The contexts participating in the calculation will
> > then wait forever on the CSem for the ReceiveIterates and patch
> > reductions to complete.
> >
> > So the fix is to make the non-participating contexts wait on the CSem,
> > too, by using a fake write iterate queued after the send iterates, which
> > triggers as soon as the send iterates complete.
>
> Instead of adding a fake write iterate, can we adjust the MPI_Gather so
> that non-participating contexts do not participate?

The problem is not easy to tackle at the MPI_Gather level, as collective
communication primitives involve all contexts of the communicator.  That can
be avoided only by creating a new MPI communicator, which is costly.  Also,
I'm not sure it would solve the problem at all.
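
Just to illustrate the cost (the helper below is hypothetical, not POOMA
code): restricting the gather to the participating contexts would mean
agreeing on a sub-communicator first, e.g. via MPI_Comm_split(), and the
split itself is a collective over the full communicator.

#include <mpi.h>

// Hypothetical helper: every context must take part in the split before
// the participants can gather among themselves.  Non-participating
// contexts get MPI_COMM_NULL back.
MPI_Comm participantComm(bool participates)
{
  MPI_Comm sub;
  MPI_Comm_split(MPI_COMM_WORLD,
                 participates ? 0 : MPI_UNDEFINED,
                 0 /* key: keep the original rank order */,
                 &sub);
  return sub;
}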

The real problem is that contexts which participate only by sending their
data to a remote context (i.e. they participate, but do not compute) have no
counting semaphore to block on (its height is zero for them).  So after
queuing the send iterates they go straight to the final reduction, which is
not done via an extra iterate, and block there without ever firing off the
send iterates.  Ugh.  The same of course holds for completely
non-participating contexts, and even that may be a problem because of old,
not-yet-run iterates.
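
In sketch form (made-up names, not the actual POOMA classes), the semaphore
the evaluator blocks on behaves roughly like this; with a height of zero,
wait() returns immediately, so a send-only context reaches the collective
gather before the scheduler has ever run its SendIterates.

#include <condition_variable>
#include <mutex>

class CSemSketch
{
public:
  explicit CSemSketch(int height) : height_(height), count_(0) {}

  // Raised once per completed local patch reduction.
  void incr()
  {
    std::lock_guard<std::mutex> lock(m_);
    if (++count_ >= height_)
      cv_.notify_all();
  }

  // Blocks until 'height' increments have arrived; a height of zero means
  // it returns at once, which is exactly the problem on send-only contexts.
  void wait()
  {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return count_ >= height_; });
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  int height_;
  int count_;
};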

At first I thought of creating a DataObject to hold the reduction result, so
that we could do the usual data-flow evaluation on it instead of ignoring
dependencies on it, as we do now.  But this turned out to be more invasive,
and I didn't have time to complete it.

So the fake write iterate solves the problem for me (though only partly,
because I could imagine the problem is still there for completely
non-participating contexts).
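
The shape of that fix, again only as a sketch with made-up names (the real
patch hooks into the POOMA scheduler and ReduceOverContexts<>): queue an
iterate that requests write access to the same data the SendIterates read,
so it can only run after them, and whose only job is to raise the CSem
(reusing the CSemSketch from above) that the send-only context then blocks
on with height one.

// Made-up names, only to show the idea, not the actual patch.
class FakeWriteIterateSketch
{
public:
  explicit FakeWriteIterateSketch(CSemSketch& csem) : csem_(csem) {}

  // The scheduler runs this only after the preceding SendIterates have
  // completed, because it (pretends to) write the data they read.
  void run()
  {
    csem_.incr();   // lets the blocked wait() in the evaluator proceed
  }

private:
  CSemSketch& csem_;
};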

Anyway, I'm not pushing this very hard right now, but without it reductions
are guaranteed to deadlock for me with MPI (so either there is a race even
in the all-participating case, or the intersector is doing something
strange).

Richard.
