lightning

Re: Atomic operations


From: Marc Nieper-Wißkirchen
Subject: Re: Atomic operations
Date: Fri, 12 Aug 2022 18:39:21 +0200




On Fri, 12 Aug 2022 at 18:27, Paulo César Pereira de Andrade <paulo.cesar.pereira.de.andrade@gmail.com> wrote:
On Fri, 12 Aug 2022 at 11:45, Marc Nieper-Wißkirchen
<marc.nieper+gnu@gmail.com> wrote:
>
> I agree that casr/tasr (maybe also casi/tasi) are the most important instructions.  As long as they can be implemented on the supported set of CPUs, I don't see why we shouldn't have _c, _s, _i, _l variants alongside the word-size variant.
>
> The only problem I see is that an atomic store in one thread, paired with an atomic load of the same memory location in another thread, becomes costly if release-acquire semantics are needed.  This can be emulated with casr/tasr (I assume that their semantics are sequentially consistent), but that is an expensive operation, whereas on many important CPUs like x86, release-acquire semantics need no special instructions at all.

  At first, my idea would be to implement tas (test and set). During
this exercise, I would
learn in more detail what the different backends provide.
  Then, implement cas (compare and swap), which also requires a new pattern for
4-argument instructions.

> It is probably enough to provide a global release and a global acquire instruction (which do the equivalent of C11's atomic_thread_fence (memory_order_release) and atomic_thread_fence (memory_order_acquire), respectively).  (These would be no-ops on x86, for example.)  And maybe also an operation corresponding to atomic_thread_fence (memory_order_seq_cst).
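
For reference, the C11 fences mentioned in the quoted paragraph are typically used as in the following minimal sketch. It is plain C11, not lightning code; the names payload, ready, publish and try_consume are hypothetical and chosen only for illustration.

```c
#include <stdatomic.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_int ready;   /* flag, only accessed with relaxed atomics */

/* Writer side: the release fence orders the payload write
   before the relaxed store that sets the flag. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: once the flag has been observed, the acquire fence
   orders the flag load before the read of the payload.
   Returns 1 and fills *out if data was published, else 0. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

On x86, compilers generally emit no machine instruction for these two fences and only restrict compile-time reordering, which is what the "no-ops on x86" remark refers to.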

  And later, consider any kind of transactional memory support. That will depend
on how much is required on software fallbacks.

  This has very low priority for me, so, if you want to start, I can help :)

You are right that we should start with tas and cas first.  It will already cover a lot of use cases (and everything else can be emulated).
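
To illustrate the "everything else can be emulated" point, here is a minimal sketch in plain C11 (not lightning code, and not a proposal for the actual instruction semantics). All function and type names are hypothetical: a spinlock built from test-and-set alone, and fetch-and-add and an atomic load built from compare-and-swap.

```c
#include <stdatomic.h>

/* Spinlock that needs nothing but test-and-set. */
typedef struct { atomic_flag flag; } spinlock;

void spin_lock(spinlock *l)
{
    /* atomic_flag_test_and_set returns the previous value, so we
       loop until we are the thread that changed it from clear to set. */
    while (atomic_flag_test_and_set(&l->flag))
        ;
}

void spin_unlock(spinlock *l)
{
    atomic_flag_clear(&l->flag);
}

/* Fetch-and-add emulated with a compare-and-swap retry loop.
   Returns the value seen before the addition. */
long fetch_add_via_cas(_Atomic long *p, long delta)
{
    long old = atomic_load(p);
    /* On failure the current value is written back into old,
       so the next attempt uses fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old + delta))
        ;
    return old;
}

/* Atomic load emulated with compare-and-swap: guess a value and
   "swap in" that same value. On success memory is unchanged and the
   guess was right; on failure expected receives the current value. */
long load_via_cas(_Atomic long *p)
{
    long expected = 0;
    atomic_compare_exchange_strong(p, &expected, expected);
    return expected;
}
```

The load emulation shows why this route is costly: even a read becomes a read-modify-write operation on the memory location.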

At some point, I will have a look at how to implement it.  (First, I have to dive a bit more into your code, though.)
However, an instruction that can be used to embed data is of higher priority for me.  Would this be implemented with the various is, ic, ... macros in each jit-XXX-cpu.c?  After such an instruction, would the alignment have to be adjusted, or is it done by lightning automatically on those ports where instructions have to be aligned?

[...]

