[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/7] target/ppc: Add infrastructure for prefixed instructions
From: |
David Gibson |
Subject: |
Re: [PATCH 1/7] target/ppc: Add infrastructure for prefixed instructions |
Date: |
Thu, 17 Dec 2020 14:47:33 +1100 |
On Wed, Dec 16, 2020 at 05:01:29AM -0300, Gustavo Romero wrote:
> Hi David,
>
> Thanks a lot for the review. Please find my comments inline.
>
> On 10/8/20 9:43 PM, David Gibson wrote:
> > On Mon, Oct 05, 2020 at 01:03:13AM -0300, Gustavo Romero wrote:
> > > From: Michael Roth <mdroth@linux.vnet.ibm.com>
> >
> > Probably a good idea to CC future spins to Richard Henderson
> > <rth@twiddle.net> - by knowledge of how TCG works is only middling.
>
> OK.
Well, I said that at the time, but rth@twiddle.net seems to have been
bouncing for a while now. I'm not sure what Richard's current
preferred address is.
> > > Some prefixed instructions (Type 0 and 1, e.g. 8-byte Load/Store or 8LS),
> > > have a completely seperate namespace for their primary opcodes.
> > >
> > > Other prefixed instructions (Type 2 and 3, e.g. Modified Load/Store or
> > > MLS)
> > > actually re-use existing opcodes to provide a modified variant. We could
> > > handle these by extending the existing opcode handlers to check for the
> > > prefix and handle accordingly, but in some cases it is cleaner to define
> > > seperate handlers for these in their own table entry, and avoids the need
> > > to add error-handling to existing handlers for cases where they are called
> > > for a prefixed Type 2/3 instruction but don't implement the handling for
> > > them. In the future we can re-work things to support both approaches if
> > > cases arise where that seems warranted.
> > >
> > > Then we have the normal non-prefixed instructions.
> > >
> > > To handle all 3 of these cases we extend the table size 3x, with normal
> > > instructions indexed directly by their primary opcodes, Type 0/1 indexed
> > > by
> > > primary opcode + PPC_CPU_PREFIXED_OPCODE_OFFSET, and Type 2/3 indexed by
> > > primary opcode + PPC_CPU_PREFIXED_MODIFIED_OPCODE_OFFSET.
> >
> > Is there an advantage to having one table with magic offsets, rather
> > than 3 separately declared tables?
>
> Not that I can see in terms of size or performance. However since we
> are dealing here with the PO (primary opcode) for the prefixed insns
> as well I think it makes sense to have it related to the table that
> already handles the PO for the normal instruction. So the offset are
> not really magic, but 64 (PPC_CPU_PREFIXED_OPCODE_OFFSET) and 128
> (PPC_CPU_PREFIXED_MODIFIED_OPCODE_OFFSET), because the slots for the
> normal opcodes is triplicated.
Sure, I see why the offsets are what they are - it's basically just 3
64 entry tables appended together. I still see no point in appending
them together, rather than naming them separately.
> In addition to Michael's comment above I've add the following comment
> in the code for v2:
>
> 982 /*
> 983 * The primary op-code field extracted from the instruction (PO) is used
> as the
> 984 * index in the top-level QEMU opcode table. That table is then further
> used to
> 985 * find the proper handler, or a pointer to another table (OPC1, OPC2
> etc). For
> 986 * the normal opcodes (PO field of normal insns) the 64 first entries
> are used.
> 987 *
> 988 * Prefixed instructions can be divided into two major groups of
> instructions:
> 989 * the first group is formed by type 0 and 1, and the second group is by
> type 2
> 990 * and 3. Opcodes (PO) related to prefixed type 0/1 can have the same
> opcodes
> 991 * as the normal instructions but don't have any semantics related to
> them. But
> 992 * since it's an primary opcode anyway it was decided to simply extend
> the
> 993 * top-level table that handles the primary opcodes for the normal
> instructions.
> 994 * On the other handle, prefixed instructions type 2/3 work like
> modifiers for
> 995 * existing normal instructions. In that case it would be possible to
> reuse the
> 996 * handlers for the normal instructions, but that would require changing
> the
> 997 * normal instruction handler in a way it would have to check, for
> instance, on
> 998 * which ISA it has to take care of prefixes insns, and that is already
> done by
> 999 * GEN_HANDLER_* helpers when registering a new instruction, where there
> is a
> 1000 * a field to inform on which ISA the instruction exists. Hence, it was
> decided
> 1001 * to add another 64 entries for specific handler for the prefixed
> instructions
> 1002 * of type 2/3. Thus the size of top-level PO lookup table is 3*64 =
> 192, hence
> 1003 * representing three namespaces for decoding PO: for normal insns, for
> type 0/1
> 1004 * and for type 2/3.
> 1005 */
>
>
> > > Various exception/SRR handling changes related to prefixed instructions
> > > are
> > > yet to be implemented. For instance, no alignment interrupt is generated
> > > if
> > > a prefixed instruction crosses the 64-byte boundary; no SRR bit related to
> > > prefixed instructions is set on any interrupt, like for FP unavailable
> > > interrupt, data storage interrupt, etc.
> > >
> > > For details, please see PowerISA v3.1, section 1.6.3 and chapter 7,
> > > particularly the new changes regarding the prefixed instructions.
> > >
> > > Signed-off-by: Michael Roth <mroth@lamentation.net>
> > > [ gromero: - comments clean up and updates
> > > - additional comments in the commit log ]
> > > Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
> > > ---
> > > target/ppc/cpu.h | 4 +-
> > > target/ppc/internal.h | 14 +++
> > > target/ppc/translate.c | 167 +++++++++++++++++++++++++++++++-
> > > target/ppc/translate_init.c.inc | 11 ++-
> > > 4 files changed, 190 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > > index 766e9c5c26..c5de33572c 100644
> > > --- a/target/ppc/cpu.h
> > > +++ b/target/ppc/cpu.h
> > > @@ -979,8 +979,10 @@ struct ppc_radix_page_info {
> > > #define PPC_TLB_EPID_LOAD 8
> > > #define PPC_TLB_EPID_STORE 9
> > > -#define PPC_CPU_OPCODES_LEN 0x40
> > > +#define PPC_CPU_OPCODES_LEN 0xc0
> > > #define PPC_CPU_INDIRECT_OPCODES_LEN 0x20
> > > +#define PPC_CPU_PREFIXED_OPCODE_OFFSET 0x40
> > > +#define PPC_CPU_PREFIXED_MODIFIED_OPCODE_OFFSET 0x80
> >
> > Might be worth including some of that description from the commit
> > message as a comment here (or on the table declaration itself).
>
> Yes. Fixed for v2.
>
> > > struct CPUPPCState {
> > > /* Most commonly used resources during translated code execution
> > > first */
> > > diff --git a/target/ppc/internal.h b/target/ppc/internal.h
> > > index 15d655b356..d03d691a44 100644
> > > --- a/target/ppc/internal.h
> > > +++ b/target/ppc/internal.h
> > > @@ -129,6 +129,7 @@ EXTRACT_SHELPER(SIMM5, 16, 5);
> > > EXTRACT_HELPER(UIMM5, 16, 5);
> > > /* 4 bits unsigned immediate value */
> > > EXTRACT_HELPER(UIMM4, 16, 4);
> > > +
> > > /* Bit count */
> > > EXTRACT_HELPER(NB, 11, 5);
> > > /* Shift count */
> > > @@ -146,6 +147,19 @@ EXTRACT_HELPER(TO, 21, 5);
> > > EXTRACT_HELPER(CRM, 12, 8);
> > > +/*
> > > + * Instruction prefix fields
> > > + *
> > > + * as per PowerISA 3.1 1.6.3
> > > + */
> > > +
> > > +/* prefixed instruction type */
> > > +EXTRACT_HELPER(PREFIX_TYPE, 24, 2);
> > > +/* 1-bit sub-type */
> > > +EXTRACT_HELPER(PREFIX_ST1, 23, 1);
> > > +/* 4-bit sub-type */
> > > +EXTRACT_HELPER(PREFIX_ST4, 20, 4);
> > > +
> > > #ifndef CONFIG_USER_ONLY
> > > EXTRACT_HELPER(SR, 16, 4);
> > > #endif
> > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > > index fedb9b2271..96c2997d3f 100644
> > > --- a/target/ppc/translate.c
> > > +++ b/target/ppc/translate.c
> > > @@ -180,6 +180,8 @@ struct DisasContext {
> > > uint32_t flags;
> > > uint64_t insns_flags;
> > > uint64_t insns_flags2;
> > > + uint32_t prefix;
> > > + uint32_t prefix_subtype;
> > > };
> > > /* Return true iff byteswap is needed in a scalar memop */
> > > @@ -376,6 +378,12 @@ GEN_OPCODE(name, opc1, opc2, opc3, inval, type,
> > > PPC_NONE)
> > > #define GEN_HANDLER_E(name, opc1, opc2, opc3, inval, type, type2)
> > > \
> > > GEN_OPCODE(name, opc1, opc2, opc3, inval, type, type2)
> > > +#define GEN_HANDLER_E_PREFIXED(name, opc1, opc2, opc3, inval, type,
> > > type2) \
> > > +GEN_OPCODE_PREFIXED(name, opc1, opc2, opc3, inval, type, type2, false)
> > > +
> > > +#define GEN_HANDLER_E_PREFIXED_M(name, opc1, opc2, opc3, inval, type,
> > > type2) \
> > > +GEN_OPCODE_PREFIXED(name, opc1, opc2, opc3, inval, type, type2, true)
> > > +
> > > #define GEN_HANDLER2(name, onam, opc1, opc2, opc3, inval, type)
> > > \
> > > GEN_OPCODE2(name, onam, opc1, opc2, opc3, inval, type, PPC_NONE)
> > > @@ -395,6 +403,8 @@ typedef struct opcode_t {
> > > #endif
> > > opc_handler_t handler;
> > > const char *oname;
> > > + bool prefixed;
> > > + bool modified;
> > > } opcode_t;
> > > /* Helpers for priv. check */
> > > @@ -449,6 +459,23 @@ typedef struct opcode_t {
> > > },
> > > \
> > > .oname = stringify(name),
> > > \
> > > }
> > > +#define GEN_OPCODE_PREFIXED(name, op1, op2, op3, invl, _typ, _typ2,
> > > _modified)\
> > > +{
> > > \
> > > + .opc1 = op1,
> > > \
> > > + .opc2 = op2,
> > > \
> > > + .opc3 = op3,
> > > \
> > > + .opc4 = 0xff,
> > > \
> > > + .handler = {
> > > \
> > > + .inval1 = invl,
> > > \
> > > + .type = _typ,
> > > \
> > > + .type2 = _typ2,
> > > \
> > > + .handler = &gen_##name,
> > > \
> > > + .oname = stringify(name),
> > > \
> > > + },
> > > \
> > > + .oname = stringify(name),
> > > \
> > > + .prefixed = true,
> > > \
> > > + .modified = _modified,
> > > \
> > > +}
> > > #define GEN_OPCODE_DUAL(name, op1, op2, op3, invl1, invl2, _typ, _typ2)
> > > \
> > > {
> > > \
> > > .opc1 = op1,
> > > \
> > > @@ -525,6 +552,22 @@ typedef struct opcode_t {
> > > },
> > > \
> > > .oname = stringify(name),
> > > \
> > > }
> > > +#define GEN_OPCODE_PREFIXED(name, op1, op2, op3, invl, _typ, _typ2,
> > > _modified)\
> > > +{
> > > \
> > > + .opc1 = op1,
> > > \
> > > + .opc2 = op2,
> > > \
> > > + .opc3 = op3,
> > > \
> > > + .opc4 = 0xff,
> > > \
> > > + .handler = {
> > > \
> > > + .inval1 = invl,
> > > \
> > > + .type = _typ,
> > > \
> > > + .type2 = _typ2,
> > > \
> > > + .handler = &gen_##name,
> > > \
> > > + },
> > > \
> > > + .oname = stringify(name),
> > > \
> > > + .prefixed = true,
> > > \
> > > + .modified = _modified,
> > > \
> > > +}
> > > #define GEN_OPCODE_DUAL(name, op1, op2, op3, invl1, invl2, _typ, _typ2)
> > > \
> > > {
> > > \
> > > .opc1 = op1,
> > > \
> > > @@ -7991,6 +8034,98 @@ static bool
> > > ppc_tr_breakpoint_check(DisasContextBase *dcbase, CPUState *cs,
> > > return true;
> > > }
> > > +
> > > +
> > > +/*
> > > + * Prefixed instruction sub-types
> > > + *
> > > + * as per PowerISA 3.1 1.6.3
> > > + */
> > > +enum {
> > > + /* non-prefixed instruction */
> >
> > Two things are labelled as "non-prefixed instruction" here..
>
> Fixed for v2.
>
>
> > > + PREFIX_ST_INVALID = -1,
> > > + /* non-prefixed instruction */
> > > + PREFIX_ST_NONE = 0,
> > > + /* Type 00 prefixed insn sub-types */
> >
> > I'm guessing the types here are in binary, maybe list as 0b00 0b01 etc
> > for clarity.
>
> ISA actually uses like that, but I think it's good to clarify
> here it's a binary. Fixed for v2.
>
> > > + PREFIX_ST_8LS, /* 8-byte load/store */
> > > + PREFIX_ST_8MLS, /* 8-byte masked load/store */
> > > + /* Type 01 prefixed insn sub-types */
> > > + PREFIX_ST_8RR, /* 8-byte reg-to-reg */
> > > + PREFIX_ST_8MRR, /* 8-byte masked reg-to-reg */
> > > + /* Type 10 prefixed insn sub-types */
> > > + PREFIX_ST_MLS, /* modified load/store */
> > > + PREFIX_ST_MMLS, /* modified masked load/store */
> > > + /* Type 11 prefixed insn sub-types */
> > > + PREFIX_ST_MRR, /* modified reg-to-reg */
> > > + PREFIX_ST_MMRR, /* modified mask reg-to-reg */
> > > + PREFIX_ST_MMIRR, /* modified masked immediate reg-to-reg */
> > > +};
> > > +
> > > +static int32_t parse_prefix_subtype(uint32_t prefix)
> > > +{
> > > + int32_t prefix_subtype = PREFIX_ST_INVALID;
> > > +
> > > + /* primary opcode 1 is reserved for instruction prefixes */
> > > + if (opc1(prefix) != 1) {
> > > + return PREFIX_ST_NONE;
> > > + }
> > > +
> > > + switch (PREFIX_TYPE(prefix)) {
> > > + case 0:
> > > + if (PREFIX_ST1(prefix) == 0) {
> > > + prefix_subtype = PREFIX_ST_8LS;
> > > + } else if (PREFIX_ST1(prefix) == 1) {
> > > + prefix_subtype = PREFIX_ST_8MLS;
> > > + }
> > > + break;
> > > + case 1:
> > > + if (PREFIX_ST4(prefix) == 0) {
> > > + prefix_subtype = PREFIX_ST_8RR;
> > > + } else if (PREFIX_ST4(prefix) == 8) {
> > > + prefix_subtype = PREFIX_ST_8MRR;
> > > + }
> >
> > Is this fallthrough intentional? If so it should be commented.
>
> No. Fixed for v2.
>
> > > + case 2:
> > > + if (PREFIX_ST1(prefix) == 0) {
> > > + prefix_subtype = PREFIX_ST_MLS;
> > > + } else if (PREFIX_ST1(prefix) == 1) {
> > > + prefix_subtype = PREFIX_ST_MMLS;
> > > + }
> > > + break;
> > > + case 3:
> > > + if (PREFIX_ST4(prefix) == 0) {
> > > + prefix_subtype = PREFIX_ST_MRR;
> > > + } else if (PREFIX_ST4(prefix) == 8) {
> > > + prefix_subtype = PREFIX_ST_MMRR;
> > > + } else if (PREFIX_ST4(prefix) == 9) {
> > > + prefix_subtype = PREFIX_ST_MMIRR;
> > > + }
> >
> > Fallthrough here doesn't matter, but probably better to have a break;
> > for consistency.
>
> Right. Fixed in opc1_idx() as well for v2.
>
> > > + }
> > > +
> > > + return prefix_subtype;
> > > +}
> > > +
> > > +static uint32_t opc1_idx(DisasContext *ctx)
> > > +{
> > > + uint32_t table_offset = 0;
> > > +
> > > + switch (ctx->prefix_subtype) {
> > > + case PREFIX_ST_8LS:
> > > + case PREFIX_ST_8MLS:
> > > + case PREFIX_ST_8RR:
> > > + case PREFIX_ST_8MRR:
> > > + table_offset = PPC_CPU_PREFIXED_OPCODE_OFFSET;
> > > + break;
> > > + case PREFIX_ST_MLS:
> > > + case PREFIX_ST_MMLS:
> > > + case PREFIX_ST_MRR:
> > > + case PREFIX_ST_MMRR:
> > > + case PREFIX_ST_MMIRR:
> > > + table_offset = PPC_CPU_PREFIXED_MODIFIED_OPCODE_OFFSET;
> > > + }
> > > +
> > > + return table_offset + opc1(ctx->opcode);
> > > +}
> > > +
> > > static void ppc_tr_translate_insn(DisasContextBase *dcbase, CPUState
> > > *cs)
> > > {
> > > DisasContext *ctx = container_of(dcbase, DisasContext, base);
> > > @@ -8004,14 +8139,40 @@ static void
> > > ppc_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
> > > ctx->opcode = translator_ldl_swap(env, ctx->base.pc_next,
> > > need_byteswap(ctx));
> > > + /* check for prefix */
> > > + ctx->prefix_subtype = parse_prefix_subtype(ctx->opcode);
> > > + if (ctx->prefix_subtype == PREFIX_ST_INVALID) {
> > > + qemu_log_mask(LOG_GUEST_ERROR, "invalid/unsupported prefix: "
> > > + "%08x " TARGET_FMT_lx "\n",
> > > + ctx->prefix, ctx->base.pc_next);
> >
> > We should generate a 0x700 here, shouldn't we?
>
> I don't know for sure. I could not find any document saying
> explicitly what we should do when subtype is invalid. Nor
> that an invalid subtype means an invalid instruction. In the
> past for some TM instructions the User Manual would specify
> how to treat certain invalid fields in the instructions... like
> just ignoring it or even treating like certain valid form (!).
Hm, ok. Unless we have specific information saying the architecture,
or hw implementations do something different, I think we should
generate a 0x700.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature