[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 3/3] target/riscv/translate.c: set vstart_eq_zero in mark_
From: |
Richard Henderson |
Subject: |
Re: [PATCH v2 3/3] target/riscv/translate.c: set vstart_eq_zero in mark_vs_dirty() |
Date: |
Fri, 16 Feb 2024 13:41:21 -1000 |
User-agent: |
Mozilla Thunderbird |
On 2/16/24 12:40, Daniel Henrique Barboza wrote:
After reading the reviews of patches 1 and 3 what I'm considering here is:
1 - drop patch 1;
Ok.
2 - there's a patch from Ivan Klokov sent 2 months ago:
"[PATCH 1/1] target/riscv: Clear vstart_qe_zero flag"
https://lore.kernel.org/qemu-riscv/20231214111851.142532-1-ivan.klokov@syntacore.com/
His patch is closer to what you suggested than mine. He already renamed
mark_vs_dirty()
to finalize_rvv_inst() and made it set start_eq_zero unconditionally. It needs a
little work (i.e. remove the ifds from the function) that I'll do myself.
3 - I'll keep patch 2 to reduce the redundant calls to the now
finalize_rvv_inst();
Ok.
4 - Add another patch to through all "gen_set_label(over)" cond branches and set
vstart = 0 && vstart_eq_zero manually when we're doing the jump.
In fact, shouldn't we return earlier if we're not taking the branch? Otherwise
we'll set vstart twice in case we didn't get the branch. E.g:
TCGLabel *over = gen_new_label();
tcg_gen_brcond_tl(TCG_COND_GEU, cpu_vstart, cpu_vl, over);
(...)
finalize_rvv_insn();
return true;
gen_set_label(over);
/* some TCG ops to set cpu_vstart to zero. Perhaps a helper? */
s->vstart_eq_zero = true;
return true;
That will break, of course, because you wouldn't emit 'over'.
You really need to get translation-time and run-time separate in your head.
That said, I think these brcond(vstart >= vl) are a mistake.
The loops within the helpers are generally of the form
for (i = env->vstart; i < evl; i++, env->vstart++) {
which will operate just fine with vstart >= vl, iterating zero times.
We will then fall through to the post-insn cleanup,
env->vstart = 0;
vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
or whatever.
I would expect the condition vstart >= vl to never happen in practice. I believe the only
way to induce it is an explicit write to vstart. Therefore I think you should not attempt
to "optimize away" the call to the helper.
Of course you will want to double-check all of the loop iterations in the associated
helpers when removing the branches.
r~
PS: I believe a better form for the ldst loops is
for (i = env->vstart; i < evl; env->vstart = ++i)
to avoid re-reading from vstart each iteration.
[PATCH v2 2/3] trans_rvv.c.inc: remove redundant mark_vs_dirty() calls, Daniel Henrique Barboza, 2024/02/16