qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH] tcg/ppc: Enable direct branching tcg_out_goto_tb with TC


From: Jordan Niethe
Subject: Re: [RFC PATCH] tcg/ppc: Enable direct branching tcg_out_goto_tb with TCG_REG_TB
Date: Wed, 16 Aug 2023 12:59:21 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1



On 16/8/23 2:07 am, Richard Henderson wrote:
On 8/14/23 22:01, Jordan Niethe wrote:
Direct branch patching was disabled when using TCG_REG_TB in commit
736a1588c1 ("tcg/ppc: Fix race in goto_tb implementation"). Commit
7502f89c74 ("tcg/ppc: Use prefixed instructions for tcg_out_goto_tb")
used the support for pc relative ISAv3.1 instructions to re-enable
direct branch patching on POWER10.

The issue with direct branch patching with TCG_REG_TB is the lack of
synchronization between the new TCG_REG_TB being established and the
direct branch being patched in.

If each translation block is responsible for establishing its own
TCG_REG_TB then there can be no synchronization issue.

That's a good idea, and can be used for other things...

It also begs the question of whether power10 should continue to use TCG_REG_TB, loading the address with PADDI.  Or whether power9 should, like power10, disable USE_REG_TB and use ADDPCIS throughout.

I imagine it depends on usage frequency, whether use of TCG_REG_TB allows 1 insn, where addpcis requires 2 insns and prefixed insns require 2 or 3 insn slots (depending on alignment).

Yes, I agree. Your v3 series looks good, I'll try and get some performance numbers with it so we can make a decision about which way to go on power10 and power9.



+        tcg_out32(s, MFSPR | RT(TCG_REG_TMP1) | LR);
+        /* bcl 20,31,$+4 (Preferred form for getting nia.) */
+        tcg_out32(s, BC | BO_ALWAYS | BI(7, CR_SO) | 0x4 | LK);
+        tcg_out32(s, MFSPR | RT(TCG_REG_TB) | LR);
+        tcg_out32(s, ADDI | TAI(TCG_REG_TB, TCG_REG_TB, -8));
+        tcg_out32(s, MTSPR | RS(TCG_REG_TMP1) | LR);

Don't need to save/restore LR.  It is saved in the prologue and may be clobbered within the tb itself (as we do for calls >
@@ -2678,6 +2693,12 @@ static void tcg_out_goto_tb(TCGContext *s, int which)
      tcg_out32(s, MTSPR | RS(TCG_REG_TMP1) | CTR);
      tcg_out32(s, BCCTR | BO_ALWAYS);
      set_jmp_reset_offset(s, which);
+
+    /* For the unlinked case, need to reset TCG_REG_TB.  */
+    if (USE_REG_TB) {
+        tcg_out_movi_int(s, TCG_TYPE_I64, TCG_REG_TB,
+                         (tcg_target_long)s->code_buf, true);
+    }
  }

Actually, we don't.  The only time we arrive here is when an unlinked TB branches to itself.  TCG_REG_TB is still valid.

Ok, I was not sure how that was meant to work.


diff --git a/tcg/tcg.c b/tcg/tcg.c
index ddfe9a96cb..20698131c2 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -6010,6 +6010,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start)
          tcg_malloc(sizeof(uint64_t) * s->gen_tb->icount * start_words);
      num_insns = -1;
+#ifdef TCG_TARGET_NEED_ENTER_TB
+    tcg_out_enter_tb(s);
+#endif

Better would be to not have the ifdef, and add this symbol as an empty function in all other tcg backends.

I might play around with this a bit.

Thank you for that, adding pcrel support on POWER9 too is really cool.



r~




reply via email to

[Prev in Thread] Current Thread [Next in Thread]