qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

recursive locks (in general)


From: Christian Schoenebeck
Subject: recursive locks (in general)
Date: Fri, 21 Aug 2020 13:12:50 +0200

On Donnerstag, 20. August 2020 12:54:49 CEST Paolo Bonzini wrote:
> More seriously: programming with concurrency is first and foremost about
> understanding invariants[1].  Locks are relatively simple to reason
> about because they enforce invariants at the points where they are
> acquired or released.

As a side note, independent of the actual QBL issue discussed, while I agree 
with you that nested locks should be avoided as much as possible, especially 
on heterogenous large scale projects like QEMU; please let me correct some of 
the things you said about recursive locks in general:

> However, recursive locks are pure evil. :)

That's a common overgeneralization of the potential issues when dealing with 
recursive locks. Especially this claim ...

> but callers have no clue about what invariants are provided around calls
> to do_it().  So, understanding complex code that uses recursive mutexes
> is effectively impossible.

... is incorrect.

It is correct that you can run into deadlocks if you don't know how to deal 
with nested recursive locks appropriately. It is incorrect though to say they 
were not managable.

There is a golden rule with recursive locks: You always have to preserve a 
clear hierarchy. Say you have the following recursive mutexes:

RecursiveMutex mutex0;
RecursiveMutex mutex1;
RecursiveMutex mutex2;
...
RecursiveMutex mutexN;

where the suffix shall identify the hierarchy, i.e. h(mutex0) = 0,
h(mutex1) = 1, ... h(mutexN) = N. Then the golden rule is that in any call 
stack the nested locks must always preserve the same transitive hierarchy, 
e.g.:

        h(lock1) <= h(lock2) <= ... <= h(lockK)

Example (using lock-guard notation), let's say ascending transitivity is 
chosen, then the following is Ok, as it does not violate chosen transitivity:

        synchronized(mutex0) {
                synchronized(mutex1) {
                        ...
                        synchronized(mutexN) {
                        }
                }
        }

Likewise, the following is Ok as well:

        synchronized(mutex3) {
                synchronized(mutex8) {
                        ...
                        synchronized(mutexN) {
                        }
                }
        }

whereas this would be illegal:

        synchronized(mutex3) {
                synchronized(mutex2) { // < violates chosen transitivity
                        ...
                        synchronized(mutexN) {
                        }
                }
        }

Now let's say one day somebody accidentally broke that rule and ran into a 
deadlock. What you can do to resolve the situation is capturing the call stack 
of each mutex's [last] lock. And when you triggered the deadlock, you know 
that at least one of the threads violated the lock hierarchy. So you look at 
the individual call stacks and correct the program flow appropriately to 
restore the hierarchy. And the latter is BTW independent of (i.e. any side 
effects) of other threads, so it is really just about looking at what exactly 
ONE thread is doing.

And for the latter reason, there are also more systematic approaches to ensure 
correctness: for instance a built-in hierarchy check in the individual Mutex 
implementation which would then e.g. raise an assertaion fault on early 
testing whenever a developer broke the hierarchy in a nested lock sequence.

Another solution would be integrating this hierarchy check into a (e.g. 
static) sanatizer, as this information can already be derived directly from 
the AST in most cases. But unfortunatley this does not exist yet in any 
sanatizer yet AFAIK.

For me, a non-recursive mutex makes sense for one use case: if the intention 
is to lock the mutex on one thread while allowing to unlock it on another 
thread. For all other use cases I would (personally) prefer a recursive type, 
as it guards a clear ownership relation and hence allows to guard and prevent 
many mistakes.

Best regards,
Christian Schoenebeck





reply via email to

[Prev in Thread] Current Thread [Next in Thread]