On 3/8/23 17:47, Richard Henderson wrote:
The case that I was imagining for smp_mb__before_rmw() is something like this:
wake_me = true;
smp_mb__before_rmw();
if (qatomic_xchg(&can_sleep, true)) { ... }
where you really need a full barrier.
What is different about this that doesn't apply to the remove-head case we've been
talking about?
For remove-head, nothing is going to change the BH_PENDING flag while the code runs. This
would be an okay transformation of the code, at both the compiler and the processor level:
// first part of atomic_fetch_and
old_flags = LDAXR of bh->flags
// QSLIST_REMOVE_HEAD ends up between load and store of
// atomic_fetch_and
all the loads and stores for remove-head
// second part of atomic_fetch_and
new_flags = old_flags & ~(BH_PENDING|BH_SCHEDULED|BH_IDLE);
(successful) STLXR of new_flags into bh->flags
Instead in the case above, I was thinking you would get a missed wakeup without the
barriers. Here is the full pseudocode:
// this store can move below the load of can_sleep
qatomic_set(&wake_me, true);
if (qatomic_xchg(&can_sleep, true)) sleep;
// this store can move below the load of wake_me
qatomic_set(&can_sleep, false);
if (qatomic_xchg(&wake_me, false)) wake them;
The buggy case is where the threads observe can_sleep = true, wake_me = false, i.e. the
original value of the variables.