Re: race condition destroying condition variables

On Tue, Dec 19, 2017 at 3:25 AM, Samuel Thibault <samuel.thibault@gnu.org> wrote:

Brent W. Baccala, on mar. 19 déc. 2017 00:08:44 -0500, wrote:
> Looks like there's a race condition when we destroy a condition variable. My
> understanding of the expected behavior is that once all the threads have been
> signaled (i.e, pthread_cond_broadcast is called), the condition variable can be
> safely destroyed with pthread_cond_destroy.

Err, I don't think that POSIX allows to assume that. The fact that
pthread_cond_broadcast has returned doesn't mean that other threads have
finished with pthread_cond_wait.

POSIX seems a little unclear on that, but C++ seems to require it.

POSIX:

"It shall be safe to destroy an initialized condition variable upon which no threads are currently blocked." [1]

and

"The pthread_cond_broadcast() function shall unblock all threads currently blocked on the specified condition variable cond." [2]

A little further down on [1], in an "informative" section, is the following snippet of code:

(A) pthread_cond_broadcast(&ep->notbusy);
    pthread_mutex_unlock(&lp->lm);
(B) pthread_cond_destroy(&rp->notbusy);

...accompanied by the following comment:

"In this example, the condition variable and its list element may be freed (line B) immediately after all threads waiting for it are awakened (line A)..."

which is what makes sense to me. Of course, our pthread_cond_destroy() does nothing, but assuming that the condvar is now freed in a step (C), this code won't work reliably in our current implementation.

I found a discussion of this on stackexchange [3], where the top answer observes that 'The standard only says "shall unblock" and not "on successful return from this call they will be unblocked".'

The C++ standard, however, is more explicit, in section 30.5.1:

"Only the notification to unblock the wait must happen before destruction".

cppreference.com [4] says:

"It is only safe to invoke the destructor if all threads have been notified. It is not required that they have exited their respective wait functions: some threads may still be waiting to reacquire the associated lock, or may be waiting to be scheduled to run after reacquiring it."

That's the behavior I was counting on. I'm using C++, and have found that I can't just notify_all() a condition variable, then destroy it.

And, yes, I've got everything locked so that another thread can't jump in there with a wait() between those two events.

On Tue, Dec 19, 2017 at 4:17 AM, Richard Braun <rbraun@sceen.net> wrote:

Besides, the threads should also all go through reacquiring the associated
mutex, usually sitting right next to the condition variable, and usually
both embedded in a parent object. What you're normally really interested
in is releasing this parent object, including destroying the mutex, which
means you also have to wait for all threads to unlock it. One common way
to deal with this is reference counters on the parent object.

In my case, I've got a lot more condition variables than mutexes, because I don't want threads waking up for events other than the one they're waiting for. One mutex for the entire pager, and a condition variable for every outstanding lock and write. So, for example, if a thread is waiting for a particular write to finish, it waits on that write's condition variable, which gets signaled and destroyed when the write finishes, while the pager object and the mutex continue to exist.

I was thinking about wrapping such a counter as you suggest into the condition variable structure, to ensure that the POSIX informative behavior and the C++ behavior work reliably.

Increment the counter (atomically) when we're about to block on a condition variable, and decrement it when we're done the post-block processing. Then pthread_cond_destroy() can first check to make sure that the wait queue is empty (return EBUSY if not), then spin wait for the counter to be zero otherwise. I think that should ensure that we can free() the condition variable after pthread_cond_destroy() has returned, and that, in turn, will ensure that the C++ destructor works reliably, too.

agape

brent

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_destroy.html

[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_broadcast.html

[3] https://stackoverflow.com/questions/7598457/when-can-a-cond-var-be-used-to-synchronize-its-own-destruction-unmapping

[4] http://en.cppreference.com/w/cpp/thread/condition_variable/~condition_variable

From:	Brent W. Baccala
Subject:	Re: race condition destroying condition variables
Date:	Tue, 19 Dec 2017 14:20:40 -0500