qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 8/9] async: update documentation of the memory barriers


From: Paolo Bonzini
Subject: Re: [PATCH v2 8/9] async: update documentation of the memory barriers
Date: Wed, 8 Mar 2023 19:08:05 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1

On 3/8/23 17:47, Richard Henderson wrote:
The case that I was imagining for smp_mb__before_rmw() is something like this:

     wake_me = true;
     smp_mb__before_rmw();
     if (qatomic_xchg(&can_sleep, true)) { ... }

where you really need a full barrier.

What is different about this that doesn't apply to the remove-head case we've been talking about?

For remove-head, nothing is going to change the BH_PENDING flag while the code runs. This would be an okay transformation of the code, at both the compiler and the processor level:

  // first part of atomic_fetch_and
  old_flags = LDAXR of bh->flags

  // QSLIST_REMOVE_HEAD ends up between load and store of
  // atomic_fetch_and
  all the loads and stores for remove-head

  // second part of atomic_fetch_and
  new_flags = old_flags & ~(BH_PENDING|BH_SCHEDULED|BH_IDLE);
  (successful) STLXR of new_flags into bh->flags


Instead in the case above, I was thinking you would get a missed wakeup without the barriers. Here is the full pseudocode:

  // this store can move below the load of can_sleep
  qatomic_set(&wake_me, true);
  if (qatomic_xchg(&can_sleep, true)) sleep;

  // this store can move below the load of wake_me
  qatomic_set(&can_sleep, false);
  if (qatomic_xchg(&wake_me, false)) wake them;

The buggy case is where the threads observe can_sleep = true, wake_me = false, i.e. the original value of the variables. Let's look at it with CPPMEM.

Like before, the CPPMEM test must use CAS and .readsvalue() to hack around the lack of "if"s. Two .readsvalue() clauses represent the buggy case, so we are all good if there is *no* consistent executions.

Unfortunately, it fails:

  // 2 consistent (i.e. buggy) executions
  int main() {
    atomic_int w = 0;
    atomic_int s = 1;
    {{{
      { w.store(1, mo_relaxed);
        // the buggy case is the one in which the threads read the
        // original value of w and s.  So here the CAS writes a
        // dummy value different from both 0 and 1
        cas_strong_explicit(&s, 0, 99, mo_seq_cst, mo_seq_cst);
        s.load(mo_relaxed).readsvalue(1); }
    |||
      { s.store(0, mo_relaxed);
        // same as above
        cas_strong_explicit(&w, 1, 99, mo_seq_cst, mo_seq_cst);
        w.load(mo_relaxed).readsvalue(0); }
    }}}
  }

It works with barriers, which models the extra smp_mb__before_rmw():

  // no consistent executions (i.e. it works)
  int main() {
    atomic_int w = 0;
    atomic_int s = 1;
    {{{
      { w.store(1, mo_relaxed);
        atomic_thread_fence(mo_seq_cst);
        cas_strong_explicit(&s, 0, 99, mo_relaxed, mo_relaxed);
        s.load(mo_relaxed).readsvalue(1); }
    |||
      { s.store(0, mo_relaxed);
        atomic_thread_fence(mo_seq_cst);
        cas_strong_explicit(&w, 1, 99, mo_relaxed, mo_relaxed);
        w.load(mo_relaxed).readsvalue(0); }
    }}}
  }

I think I agree with you that _in practice_ it's going to work at the processor level; the pseudo-assembly would be

  r1 = LDAXR(can_sleep);
                                   r2 = LDAXR(wake_me);
                                   STR(can_sleep, false);
                                   STLXR(wake_me, false); // successful
  STR(wake_me, true);
  STLXR(can_sleep, true); // successful (?)
  if r1 == 0 { ... }
                                   if r2 != 0 { ... }

and I can't think of a way in which both store-exclusives (or xchg, or cmpxchg) would succeed. So perhaps smp_mb__before_rmw() could indeed be a signal_fence(). But that's quite scary even for the standards of this discussion...

Paolo




reply via email to

[Prev in Thread] Current Thread [Next in Thread]