qemu-devel

Re: [PATCH 0/5] Add LoongArch v1.1 instructions


From: gaosong
Subject: Re: [PATCH 0/5] Add LoongArch v1.1 instructions
Date: Thu, 26 Oct 2023 14:54:27 +0800

On 2023/10/26 at 9:38 AM, Jiajie Chen wrote:

On 2023/10/26 03:04, Richard Henderson wrote:
On 10/25/23 10:13, Jiajie Chen wrote:
On 2023/10/24 07:26, Richard Henderson wrote:
See target/arm/tcg/translate-a64.c, gen_store_exclusive, TCGv_i128 block.
See target/ppc/translate.c, gen_stqcx_.

The situation here is slightly different: aarch64 and ppc64 both have 128-bit ll and sc; LoongArch v1.1, however, has only a 64-bit ll paired with a 128-bit sc.

Ah, that does complicate things.

Possibly use the combination of ll.d and ld.d:


ll.d lo, base, 0
ld.d hi, base, 8

# do some computation

sc.q lo, hi, base

# try again if sc failed

Then a possible implementation of gen_ll() would be: align base to a 128-bit boundary, read 128 bits from memory, save the requested 64-bit part to rd, and record the whole 128-bit value in llval. gen_sc_q() would then use a 128-bit cmpxchg against llval.
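
The proposal above can be sketched as a small host-side C model. This is only an illustration under stated assumptions, not QEMU code: the `Reservation` struct, `model_ll_d()` and `model_sc_q()` are hypothetical names, guest memory is modelled as an array of 64-bit words, and the compare-and-store is done with plain loads and stores where a real implementation would need an atomic 128-bit cmpxchg.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reservation state. The snapshot plays the role of the
 * thread's "llval", but this is not QEMU's actual CPU state layout. */
typedef struct {
    bool     valid;
    uint64_t addr;      /* 16-byte-aligned reservation address */
    uint64_t lo, hi;    /* snapshot of the whole 128-bit quantity */
} Reservation;

/* ll.d model: guest memory is an array of 64-bit words (index = addr / 8). */
static uint64_t model_ll_d(Reservation *r, const uint64_t *mem, uint64_t addr)
{
    uint64_t base = addr & ~(uint64_t)15;   /* align down to 128 bits */

    r->valid = true;
    r->addr  = base;
    r->lo    = mem[base / 8];
    r->hi    = mem[base / 8 + 1];
    return mem[addr / 8];                   /* the 64-bit part that goes to rd */
}

/* sc.q model: succeed only if all 128 bits still match the snapshot.
 * Single-threaded stand-in for an atomic 128-bit compare-and-swap. */
static bool model_sc_q(Reservation *r, uint64_t *mem, uint64_t addr,
                       uint64_t new_lo, uint64_t new_hi)
{
    bool ok = r->valid && r->addr == addr
              && mem[addr / 8] == r->lo
              && mem[addr / 8 + 1] == r->hi;

    if (ok) {
        mem[addr / 8]     = new_lo;
        mem[addr / 8 + 1] = new_hi;
    }
    r->valid = false;   /* the reservation is consumed either way */
    return ok;
}
```

In this model a plain store to the high half between the ll.d and the sc.q makes the sc.q fail, because the comparison covers the full 128 bits rather than only the half that ll.d returned.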


But what about the reversed instruction pattern: ll.d hi, base, 8; ld.d lo, base, 0?

It would be worth asking your hardware engineers about the bounds of legal behaviour. Ideally there would be some very explicit language, similar to


I'm a community developer not affiliated with Loongson. Song Gao, could you provide some detail from Loongson Inc.?



ll.d    r1, base, 0
dbar    0x700          ==> see 2.2.8.1
ld.d    r2, base, 8
...
sc.q    r1, r2, base


For this series,
I think we need to set the new config bits on the 'max' CPU, and change 'any' to 'max' in linux-user/target_elf.h, so that these new instructions can be used in linux-user mode.

Thanks
Song Gao

https://developer.arm.com/documentation/ddi0487/latest/
B2.9.5 Load-Exclusive and Store-Exclusive instruction usage restrictions

But you could do the same thing, aligning and recording the entire 128-bit quantity, then extract the ll.d result based on address bit 3.  This would complicate the implementation of sc.d as well, but would perhaps bring us "close enough" to the actual architecture.
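
A rough C sketch of that idea follows. It handles both instruction orders by always recording the full aligned 128 bits and picking the 64-bit half from the address. All names here (`WideReservation`, `half_index()`, `wide_ll_d()`, `wide_sc_d()`) are hypothetical, and having sc.d compare only the half it stores to is an assumption of this sketch, not something the thread or the architecture manual confirms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit-wide reservation shared by ll.d, sc.d and sc.q. */
typedef struct {
    bool     valid;
    uint64_t addr;       /* 16-byte-aligned base of the recorded block */
    uint64_t half[2];    /* half[0] = bytes 0..7, half[1] = bytes 8..15 */
} WideReservation;

/* Select the 64-bit half inside a 16-byte block: the address bit of value 8. */
static unsigned half_index(uint64_t addr)
{
    return (unsigned)((addr >> 3) & 1);
}

/* ll.d model: record the whole block, return only the addressed half.
 * Guest memory is an array of 64-bit words (index = addr / 8). */
static uint64_t wide_ll_d(WideReservation *r, const uint64_t *mem, uint64_t addr)
{
    uint64_t base = addr & ~(uint64_t)15;

    r->valid   = true;
    r->addr    = base;
    r->half[0] = mem[base / 8];
    r->half[1] = mem[base / 8 + 1];
    return r->half[half_index(addr)];
}

/* sc.d model (assumption): check only the half being stored, even though
 * 128 bits were recorded. Plain loads/stores stand in for an atomic op. */
static bool wide_sc_d(WideReservation *r, uint64_t *mem, uint64_t addr,
                      uint64_t val)
{
    bool ok = r->valid
              && r->addr == (addr & ~(uint64_t)15)
              && mem[addr / 8] == r->half[half_index(addr)];

    if (ok) {
        mem[addr / 8] = val;
    }
    r->valid = false;   /* consumed whether or not the store happened */
    return ok;
}
```

With the reservation widened this way, an sc.q can compare against both recorded halves regardless of which half the earlier ll.d targeted, while sc.d keeps its 64-bit behaviour at the cost of the extra half selection.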

Note that our Arm store-exclusive implementation isn't quite in spec either.  There is quite a large comment within gen_store_exclusive() in translate-a64.c about the ways in which things are not quite right.  But it seems to be close enough for actual usage to succeed.


r~



