lightning
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Atomic operations


From: Paulo César Pereira de Andrade
Subject: Re: Atomic operations
Date: Fri, 12 Aug 2022 13:26:52 -0300

Em sex., 12 de ago. de 2022 às 11:45, Marc Nieper-Wißkirchen
<marc.nieper+gnu@gmail.com> escreveu:
>
> I agree that casr/tasr (maybe also casi/tasi) are the most important 
> instructions.  As long as it can be implemented on the supported set of CPUs, 
> I don't see why we shouldn't have _c, _s, _i, _l variants alongside the 
> word-size variant.
>
> The only problem I see is that an atomic store in one thread together with an 
> atomic load of the same memory location in another thread will become costly 
> if release-acquire semantics are needed.  This can be emulated with casr/tasr 
> (I assume that their semantics is sequentially consistent) but this is a 
> costly operation while on many important CPUs like x86 all release-acquire 
> semantics actually do not need special instructions.

  At first,  my idea would be to implement tas (test and set). During
this exercise,
learn more detailed about what the different backends provide.
  Then, implement cas (compare and swap), that also requires a new pattern for
4 argument instructions.

> It is probably enough to provide a global release and an acquire instruction 
> (which does the equivalent of C11's atomic_thread_fence 
> (memory_order_release) and atomic_thread_fence (memory_order_acquire), 
> respectively).  (These would be no-ops on x86, for example.)  And maybe also 
> an operation corresponding to atomic_thread_fence (memory_order_seq_cst).

  And later, consider any kind of transactional memory support. That will depend
on how much is required on software fallbacks.

  This has very low priority for me, so, if you want to start, I can help :)

> Am Fr., 12. Aug. 2022 um 13:16 Uhr schrieb Paulo César Pereira de Andrade 
> <paulo.cesar.pereira.de.andrade@gmail.com>:
>>
>> Em qui., 11 de ago. de 2022 às 17:52, Marc Nieper-Wißkirchen
>> <marc.nieper+gnu@gmail.com> escreveu:
>>
>>   Hi Marc,
>>
>> > PS: This document may be helpful as well: 
>> > https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.
>>
>>   Right now, my idea of what should be done would be to have a
>> jit_casr(bool_return_in_register, address_in_register,
>> new_value_in_register, old_value_in_register);
>>
>>   It would need a new pattern of 4 registers instruction, as only
>> the first one is modified. The current qmul* and qdiv* ones put
>> result in the first two register arguments.
>>
>>   And while at that, also have a:
>>
>> jit_tasr(old_value_in_register, address_in_register, new_value_in_register);
>>
>>   Likely can have the _c, _s, _i, and _l modifiers, or have only a 32
>> bit variant.
>>
>>   Very few cpus should not have some construct for it. For those, we just add
>> a software fallback with some kind of spin lock. In either case,  we "cheat" 
>> and
>> look at what assembly gcc generates or the kernel uses for the
>> equivalent construct.
>>
>>   About the ABA problem, we ignore it. Just have some note, describe the 
>> issue
>> and how to avoid it. I would suggest using it with the address_in_register 
>> value
>> as a special lock, not the actual variable.
>>
>> > Am Do., 11. Aug. 2022 um 21:35 Uhr schrieb Marc Nieper-Wißkirchen 
>> > <marc.nieper+gnu@gmail.com>:
>> >>
>> >> Hi Paulo,
>> >>
>> >> Am Di., 9. Aug. 2022 um 12:40 Uhr schrieb Paulo César Pereira de Andrade 
>> >> <paulo.cesar.pereira.de.andrade@gmail.com>:
>> >>>
>> >>> > Here is a minimal API, albeit written for Scheme: 
>> >>> > https://srfi.schemers.org/srfi-230/srfi-230.html.  What is an atomic 
>> >>> > (fixnum) box there should be word-sized memory location in GNU 
>> >>> > lightning.  Atomic pairs (two words) are important for some 
>> >>> > algorithms.  If they are not easily implementable on a particular 
>> >>> > architecture, GNU lightning should report this so that the user can 
>> >>> > call C library routines (from stdatomic) or GCC builtins themselves.
>> >>> >
>> >>> > As for GNU lightning instructions, we would probably at least need the 
>> >>> > following instructions (for word-sized integers):
>> >>> >
>> >>> > - loads and stores with relaxed memory order (if I have understood 
>> >>> > correctly, we can use the usual GNU lightning load/store instructions)
>> >>> > - loads with acquire memory order
>> >>> > - stores with release memory order
>> >>> > - swap (load and store) with relaxed memory order
>> >>> > - swap (load and store) with acquire-release memory order
>> >>> > - compare-and-swap with relaxed memory order
>> >>> > - compare-and-swap with acquire-release memory order
>> >>>
>> >>>   If lightning were to provide such primites, I believe it should
>> >>> only "make a contract" of supporting strong compare-and-swap,
>> >>> not on shared memory (a different process might die with the
>> >>> lock held), to allow some kind of mutex implementation, what
>> >>> could be expensive if there are too many waiters spinning.
>> >>
>> >>
>> >> I am not sure whether I have understood your "contract".
>> >>
>> >> In any case, if a mutex is needed we could just call the GCC-provided 
>> >> software implementation in libatomic ([1]) (after checking that 
>> >> libatomic's ABI is supposed to be stable and works with different 
>> >> compilers as well).  Alternatively, we can roll out our own hash table of 
>> >> mutexes where the hash is calculated from the memory address that is to 
>> >> be accessed atomically in software.
>> >>
>> >>>
>> >>>   Still not trivial to get it on all supported ports, at least with the 
>> >>> same
>> >>> semantics, because if need to implement in an external function call,
>> >>> it would need to save/restore all JT_R* and JIT_F* registers in the
>> >>> worst case. Most times could just inline what gcc generates.
>> >>
>> >>
>> >> As atomic operations are usually quite costly, the overhead of saving and 
>> >> restoring registers is probably not too bad.
>> >>
>> >>>
>> >>> > And the same, if supported, for double-word-sized memory operands.
>> >>> >
>> >>> > And then the following arithmetic operations (in relaxed and 
>> >>> > acquire-release semantics):
>> >>> >
>> >>> > - fetch-add
>> >>> > - fetch-sub
>> >>> > - fetch-or
>> >>> > - fetch-xor
>> >>> > - fetch-and
>> >>> >
>> >>> > And then an instruction to emit a memory order (release, acquire, 
>> >>> > acquire-release, sequential consistency) as atomic_thread_fence in 
>> >>> > stdatomic.h.
>> >>> > To simplify the interface, it may make sense to offer all operations 
>> >>> > (but the thread fence instruction) only with relaxed semantics so that 
>> >>> > the programmer has to emit thread fence instructions explicitly.
>> >>>
>> >>>   The simplest way to implement it is to have it have some PIC code
>> >>> implementing it, and use two jit_jmpr to/from the code, but lightning
>> >>> would still treat the jit_jmpr as function calls, that is, invalidate 
>> >>> non callee
>> >>> save registers.
>> >>
>> >>
>> >> If internally, lightning uses an instruction different from jit_jmpr to 
>> >> "call" the atomic code, it can have more detailed knowledge about the 
>> >> registers that have to be saved.
>> >>
>> >>>
>> >>>
>> >>>   As long as using only jmpr, and not modifying registers, should be
>> >>> enough to call jit_live() once "returning" for any non callee save 
>> >>> register
>> >>> used in the construct, or that must be alive for other use.
>> >>>
>> >>> Thanks!
>> >>> Paulo
>> >>
>> >>
>> >> --
>> >>
>> >> [1] 
>> >> https://gcc.gnu.org/git/?p=gcc.git;a=tree;f=libatomic;h=7e61d96034e3b2f3c697d30e9d88ef9482f047e5;hb=HEAD
>> >>

Thanks,
Paulo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]