[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events
From: |
Jonathan Cameron |
Subject: |
Re: [RFC PATCH 3/6] hw/cxl/cxl-events: Add CXL mock events |
Date: |
Mon, 17 Oct 2022 16:57:46 +0100 |
On Thu, 13 Oct 2022 17:21:56 -0700
Ira Weiny <ira.weiny@intel.com> wrote:
> On Tue, Oct 11, 2022 at 11:07:59AM +0100, Jonathan Cameron wrote:
> > On Mon, 10 Oct 2022 15:29:41 -0700
> > ira.weiny@intel.com wrote:
> >
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > To facilitate testing of guest software add mock events and code to
> > > support iterating through the event logs.
> > >
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >
> > Various comments inline, but biggest one is I'd like to see
> > a much more flexible injection interface. Happy to help code one
> > up if that is useful.
>
> Quick response to this.
>
> I thought about holding off and doing something like that but this got the irq
> testing in the kernel off the ground.
>
> I think it would be cool to use QMP to submit events as json. That would be
> much more flexible. But would have taken a lot more time.
The qapi code gen infrastructure makes this fairly simple (subject to fighting
with meson - with hindsight the same fight I had with it for other stubs...)
I've moved the poison injection patches over to this and will hopefully send
a RFC v2 of those out tomorrow.
For reference injection of poison now looks like
{ "execute": "cxl-inject-poison",
"arguments": {
"path": "/machine/peripheral/cxl-pmem0",
"start": 2048,
"length": 256
}
}
defined via a new cxl.json that other than comments just contains
{ 'command': 'cxl-inject-poison',
'data' : { 'path': 'str, 'start': 'uint64', 'length': uint64 }}
from that all the json parsing infrastructure is generated and you just
need to provide an implementation of
void qmp_cxl_inject_poison(const char *path, uint64_t start, uint64_t length,
Error *errp)
Something similar for these events will be very straight forward.
Jonathan
>
> What I did below duplicated the test code cxl-test has. It was pretty quick
> to
> do that.
>
> The biggest issue with is parsing the various events from the json to binary
> blobs.
>
> I'll clean up this patch and see what I can do. But I think having a set of
> statically defined blobs which can be injected would make testing a bit
> easier.
> Less framework to format json input to QMP.
>
> More to come...
>
> Ira
>
> >
> > Jonathan
> >
> >
> > > ---
> > > hw/cxl/cxl-events.c | 248 ++++++++++++++++++++++++++++++++++++
> > > hw/cxl/meson.build | 1 +
> > > include/hw/cxl/cxl_device.h | 19 +++
> > > include/hw/cxl/cxl_events.h | 173 +++++++++++++++++++++++++
> > > 4 files changed, 441 insertions(+)
> > > create mode 100644 hw/cxl/cxl-events.c
> > > create mode 100644 include/hw/cxl/cxl_events.h
> > >
> > > diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
> > > new file mode 100644
> > > index 000000000000..c275280bcb64
> > > --- /dev/null
> > > +++ b/hw/cxl/cxl-events.c
> > > @@ -0,0 +1,248 @@
> > > +/*
> > > + * CXL Event processing
> > > + *
> > > + * Copyright(C) 2022 Intel Corporation.
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2. See
> > > the
> > > + * COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qemu/bswap.h"
> > > +#include "qemu/typedefs.h"
> > > +#include "hw/cxl/cxl.h"
> > > +#include "hw/cxl/cxl_events.h"
> > > +
> > > +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int log_type)
> > > +{
> > > + if (log_type >= CXL_EVENT_TYPE_MAX) {
> > > + return NULL;
> > > + }
> > > + return &cxlds->event_logs[log_type];
> > > +}
> > > +
> > > +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log)
> > > +{
> > > + return log->events[log->cur_event];
> > > +}
> > > +
> > > +uint16_t get_cur_event_handle(struct cxl_event_log *log)
> > > +{
> > > + return cpu_to_le16(log->cur_event);
> > > +}
> > > +
> > > +bool log_empty(struct cxl_event_log *log)
> > > +{
> > > + return log->cur_event == log->nr_events;
> > > +}
> > > +
> > > +int log_rec_left(struct cxl_event_log *log)
> > > +{
> > > + return log->nr_events - log->cur_event;
> > > +}
> > > +
> > > +static void event_store_add_event(CXLDeviceState *cxlds,
> > > + enum cxl_event_log_type log_type,
> > > + struct cxl_event_record_raw *event)
> > > +{
> > > + struct cxl_event_log *log;
> > > +
> > > + assert(log_type < CXL_EVENT_TYPE_MAX);
> > > +
> > > + log = &cxlds->event_logs[log_type];
> > > + assert(log->nr_events < CXL_TEST_EVENT_CNT_MAX);
> > > +
> > > + log->events[log->nr_events] = event;
> > > + log->nr_events++;
> > > +}
> > > +
> > > +uint16_t log_overflow(struct cxl_event_log *log)
> > > +{
> > > + int cnt = log_rec_left(log) - 5;
> >
> > Why -5? Can't we make it actually overflow and drop records
> > if that happens?
> >
> > > +
> > > + if (cnt < 0) {
> > > + return 0;
> > > + }
> > > + return cnt;
> > > +}
> > > +
> > > +#define CXL_EVENT_RECORD_FLAG_PERMANENT BIT(2)
> > > +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED BIT(3)
> > > +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED BIT(4)
> > > +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE BIT(5)
> > > +
> > > +struct cxl_event_record_raw maint_needed = {
> > > + .hdr = {
> > > + .id.data = UUID(0xDEADBEEF, 0xCAFE, 0xBABE,
> > > + 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > > + .length = sizeof(struct cxl_event_record_raw),
> > > + .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
> > > + /* .handle = Set dynamically */
> > > + .related_handle = const_le16(0xa5b6),
> > > + },
> > > + .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > > +};
> > > +
> > > +struct cxl_event_record_raw hardware_replace = {
> > > + .hdr = {
> > > + .id.data = UUID(0xBABECAFE, 0xBEEF, 0xDEAD,
> > > + 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> > > + .length = sizeof(struct cxl_event_record_raw),
> > > + .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
> > > + /* .handle = Set dynamically */
> > > + .related_handle = const_le16(0xb6a5),
> > > + },
> > > + .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> > > +};
> > > +
> > > +#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT BIT(0)
> > > +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT BIT(1)
> > > +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW BIT(2)
> > > +
> > > +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR 0x00
> > > +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR 0x01
> > > +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR 0x02
> > > +
> > > +#define CXL_GMER_TRANS_UNKNOWN 0x00
> > > +#define CXL_GMER_TRANS_HOST_READ 0x01
> > > +#define CXL_GMER_TRANS_HOST_WRITE 0x02
> > > +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA 0x03
> > > +#define CXL_GMER_TRANS_HOST_INJECT_POISON 0x04
> > > +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB 0x05
> > > +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT 0x06
> > > +
> > > +#define CXL_GMER_VALID_CHANNEL BIT(0)
> > > +#define CXL_GMER_VALID_RANK BIT(1)
> > > +#define CXL_GMER_VALID_DEVICE BIT(2)
> > > +#define CXL_GMER_VALID_COMPONENT BIT(3)
> > > +
> > > +struct cxl_event_gen_media gen_media = {
> > > + .hdr = {
> > > + .id.data = UUID(0xfbcd0a77, 0xc260, 0x417f,
> > > + 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> > > + .length = sizeof(struct cxl_event_gen_media),
> > > + .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
> > > + /* .handle = Set dynamically */
> > > + .related_handle = const_le16(0),
> > > + },
> > > + .phys_addr = const_le64(0x2000),
> > > + .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> > > + .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> > > + .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> > > + .validity_flags = { CXL_GMER_VALID_CHANNEL |
> > > + CXL_GMER_VALID_RANK, 0 },
> > > + .channel = 1,
> > > + .rank = 30
> > > +};
> > > +
> > > +#define CXL_DER_VALID_CHANNEL BIT(0)
> > > +#define CXL_DER_VALID_RANK BIT(1)
> > > +#define CXL_DER_VALID_NIBBLE BIT(2)
> > > +#define CXL_DER_VALID_BANK_GROUP BIT(3)
> > > +#define CXL_DER_VALID_BANK BIT(4)
> > > +#define CXL_DER_VALID_ROW BIT(5)
> > > +#define CXL_DER_VALID_COLUMN BIT(6)
> > > +#define CXL_DER_VALID_CORRECTION_MASK BIT(7)
> > > +
> > > +struct cxl_event_dram dram = {
> > > + .hdr = {
> > > + .id.data = UUID(0x601dcbb3, 0x9c06, 0x4eab,
> > > + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> > > + .length = sizeof(struct cxl_event_dram),
> > > + .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
> > > + /* .handle = Set dynamically */
> > > + .related_handle = const_le16(0),
> > > + },
> > > + .phys_addr = const_le64(0x8000),
> > > + .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> > > + .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> > > + .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> > > + .validity_flags = { CXL_DER_VALID_CHANNEL |
> > > + CXL_DER_VALID_BANK_GROUP |
> > > + CXL_DER_VALID_BANK |
> > > + CXL_DER_VALID_COLUMN, 0 },
> > > + .channel = 1,
> > > + .bank_group = 5,
> > > + .bank = 2,
> > > + .column = { 0xDE, 0xAD},
> > > +};
> > > +
> > > +#define CXL_MMER_HEALTH_STATUS_CHANGE 0x00
> > > +#define CXL_MMER_MEDIA_STATUS_CHANGE 0x01
> > > +#define CXL_MMER_LIFE_USED_CHANGE 0x02
> > > +#define CXL_MMER_TEMP_CHANGE 0x03
> > > +#define CXL_MMER_DATA_PATH_ERROR 0x04
> > > +#define CXL_MMER_LAS_ERROR 0x05
> >
> > Ah this explains why I didn't find these alongside the structures.
> > I'd keep them together. If we need to put the structures in a header
> > then put the defines there as well. Puts all the spec related
> > stuff in one place.
> >
> > > +
> > > +#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
> > > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
> > > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
> > > +
> > > +#define CXL_DHI_MS_NORMAL 0x00
> > > +#define CXL_DHI_MS_NOT_READY 0x01
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST 0x02
> > > +#define CXL_DHI_MS_ALL_DATA_LOST 0x03
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS 0x04
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN 0x05
> > > +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT 0x06
> > > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS 0x07
> > > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN 0x08
> > > +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT 0x09
> > > +
> > > +#define CXL_DHI_AS_NORMAL 0x0
> > > +#define CXL_DHI_AS_WARNING 0x1
> > > +#define CXL_DHI_AS_CRITICAL 0x2
> > > +
> > > +#define CXL_DHI_AS_LIFE_USED(as) (as & 0x3)
> > > +#define CXL_DHI_AS_DEV_TEMP(as) ((as & 0xC) >> 2)
> > > +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as) ((as & 0x10) >> 4)
> > > +#define CXL_DHI_AS_COR_PER_ERR_CNT(as) ((as & 0x20) >> 5)
> > > +
> > > +struct cxl_event_mem_module mem_module = {
> > > + .hdr = {
> > > + .id.data = UUID(0xfe927475, 0xdd59, 0x4339,
> > > + 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
> > >
> >
> > As mentioned, below a UUID define for each type in the header
> > probably makes more sense than having this huge thing inline.
> >
> > > + .length = sizeof(struct cxl_event_mem_module),
> > > + /* .handle = Set dynamically */
> > > + .related_handle = const_le16(0),
> > > + },
> > > + .event_type = CXL_MMER_TEMP_CHANGE,
> > > + .info = {
> > > + .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> > > + .media_status = CXL_DHI_MS_ALL_DATA_LOST,
> > > + .add_status = (CXL_DHI_AS_CRITICAL << 2) |
> > > + (CXL_DHI_AS_WARNING << 4) |
> > > + (CXL_DHI_AS_WARNING << 5),
> > > + .device_temp = { 0xDE, 0xAD},
> >
> > odd spacing
> >
> > > + .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
> > > + .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> >
> > Could make a reasonable number up rather than deadbeef ;)
> >
> > > + .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> > > + }
> > > +};
> > > +
> > > +void cxl_mock_add_event_logs(CXLDeviceState *cxlds)
> > > +{
> >
> > This is fine for initial testing, but I Think we want to be more
> > sophisticated with the injection interface and allow injecting
> > individual events so we can move the requirement for 'coverage'
> > testing from having a representative list here to an external script
> > that hits all the corners.
> >
> > I can build something on top of this that lets us doing that if you like.
> > I have ancient code doing the equivalent for CCIX devices that I never
> > upstreamed. Would probably do it a bit differently today but principle
> > is the same. Using QMP directly rather than qmp-shell lets you do it
> > as json which ends up more readable than complex command lines for this
> > sort of structure command.
> >
> >
> >
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO, &maint_needed);
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> > > + (struct cxl_event_record_raw *)&gen_media);
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_INFO,
> > > + (struct cxl_event_record_raw *)&mem_module);
> > > +
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &maint_needed);
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > + (struct cxl_event_record_raw *)&dram);
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > + (struct cxl_event_record_raw *)&gen_media);
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > + (struct cxl_event_record_raw *)&mem_module);
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_FAIL,
> > > + (struct cxl_event_record_raw *)&dram);
> > > +
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL,
> > > &hardware_replace);
> > > + event_store_add_event(cxlds, CXL_EVENT_TYPE_FATAL,
> > > + (struct cxl_event_record_raw *)&dram);
> > > +}
> >
> >
> > > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > > index 7b4cff569347..46c50c1c13a6 100644
> > > --- a/include/hw/cxl/cxl_device.h
> > > +++ b/include/hw/cxl/cxl_device.h
> > > @@ -11,6 +11,7 @@
> > > #define CXL_DEVICE_H
> > >
> > > #include "hw/register.h"
> > > +#include "hw/cxl/cxl_events.h"
> > >
> > > /*
> > > * The following is how a CXL device's Memory Device registers are laid
> > > out.
> > > @@ -80,6 +81,14 @@
> > > (CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH + \
> > > CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
> > >
> > > +#define CXL_TEST_EVENT_CNT_MAX 15
> >
> > Where did 15 come from?
> >
> > > +
> > > +struct cxl_event_log {
> > > + int cur_event;
> > > + int nr_events;
> > > + struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
> > > +};
> > > +
> > > typedef struct cxl_device_state {
> > > MemoryRegion device_registers;
> > >
> > > @@ -119,6 +128,8 @@ typedef struct cxl_device_state {
> > >
> > > /* memory region for persistent memory, HDM */
> > > uint64_t pmem_size;
> > > +
> > > + struct cxl_event_log event_logs[CXL_EVENT_TYPE_MAX];
> > > } CXLDeviceState;
> > >
> > > /* Initialize the register block for a device */
> > > @@ -272,4 +283,12 @@ MemTxResult cxl_type3_read(PCIDevice *d, hwaddr
> > > host_addr, uint64_t *data,
> > > MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t
> > > data,
> > > unsigned size, MemTxAttrs attrs);
> > >
> > > +void cxl_mock_add_event_logs(CXLDeviceState *cxlds);
> > > +struct cxl_event_log *find_event_log(CXLDeviceState *cxlds, int
> > > log_type);
> > > +struct cxl_event_record_raw *get_cur_event(struct cxl_event_log *log);
> > > +uint16_t get_cur_event_handle(struct cxl_event_log *log);
> > > +bool log_empty(struct cxl_event_log *log);
> > > +int log_rec_left(struct cxl_event_log *log);
> > > +uint16_t log_overflow(struct cxl_event_log *log);
> > > +
> > > #endif
> > > diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> > > new file mode 100644
> > > index 000000000000..255111f3dcfb
> > > --- /dev/null
> > > +++ b/include/hw/cxl/cxl_events.h
> > > @@ -0,0 +1,173 @@
> > > +/*
> > > + * QEMU CXL Events
> > > + *
> > > + * Copyright (c) 2022 Intel
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2. See
> > > the
> > > + * COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#ifndef CXL_EVENTS_H
> > > +#define CXL_EVENTS_H
> > > +
> > > +#include "qemu/uuid.h"
> > > +#include "hw/cxl/cxl.h"
> > > +
> > > +/*
> > > + * Common Event Record Format
> > > + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> > > + */
> > > +#define CXL_EVENT_REC_HDR_RES_LEN 0xf
> >
> > I don't see an advantage in this define vs just
> > putting the value in directly below.
> > Same with similar cases - the define must makes them
> > a tiny bit harder to compare with the specification when
> > reviewing.
> >
> > > +struct cxl_event_record_hdr {
> > > + QemuUUID id;
> > > + uint8_t length;
> > > + uint8_t flags[3];
> > > + uint16_t handle;
> > > + uint16_t related_handle;
> > > + uint64_t timestamp;
> > > + uint8_t maint_op_class;
> > > + uint8_t reserved[CXL_EVENT_REC_HDR_RES_LEN];
> > > +} QEMU_PACKED;
> > > +
> > > +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> > > +struct cxl_event_record_raw {
> > > + struct cxl_event_record_hdr hdr;
> > > + uint8_t data[CXL_EVENT_RECORD_DATA_LENGTH];
> > > +} QEMU_PACKED;
> >
> > Hmm. I wonder if we should instead define this as a union of
> > the known event types? I haven't checked if it would work
> > everywhere yet though.
> >
> > > +
> > > +/*
> > > + * Get Event Records output payload
> > > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> > > + *
> > > + * Space given for 1 record
> > > + */
> > > +#define CXL_GET_EVENT_FLAG_OVERFLOW BIT(0)
> > > +#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
> > > +struct cxl_get_event_payload {
> > > + uint8_t flags;
> > > + uint8_t reserved1;
> > > + uint16_t overflow_err_count;
> > > + uint64_t first_overflow_timestamp;
> > > + uint64_t last_overflow_timestamp;
> > > + uint16_t record_count;
> > > + uint8_t reserved2[0xa];
> > > + struct cxl_event_record_raw record;
> >
> > This last element should be a [] array and then move
> > the handling of different record counts to the places it
> > is used.
> >
> > Spec unfortunately says that we should return as many
> > as we can fit, so we can't rely on the users of this interface
> > only sending a request for one record (as I think your Linux
> > kernel code currently does). See below for more on this...
> >
> >
> > > +} QEMU_PACKED;
> > > +
> > > +/*
> > > + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> > > + */
> > > +enum cxl_event_log_type {
> > > + CXL_EVENT_TYPE_INFO = 0x00,
> > > + CXL_EVENT_TYPE_WARN,
> > > + CXL_EVENT_TYPE_FAIL,
> > > + CXL_EVENT_TYPE_FATAL,
> > > + CXL_EVENT_TYPE_DYNAMIC_CAP,
> > > + CXL_EVENT_TYPE_MAX
> > > +};
> > > +
> > > +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type
> > > type)
> > > +{
> > > + switch (type) {
> > > + case CXL_EVENT_TYPE_INFO:
> > > + return "Informational";
> > > + case CXL_EVENT_TYPE_WARN:
> > > + return "Warning";
> > > + case CXL_EVENT_TYPE_FAIL:
> > > + return "Failure";
> > > + case CXL_EVENT_TYPE_FATAL:
> > > + return "Fatal";
> > > + case CXL_EVENT_TYPE_DYNAMIC_CAP:
> > > + return "Dynamic Capacity";
> > > + default:
> > > + break;
> > > + }
> > > + return "<unknown>";
> > > +}
> > > +
> > > +/*
> > > + * Clear Event Records input payload
> > > + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> > > + *
> > > + * Space given for 1 record
> >
> > I'd rather this was defined to have a trailing variable length
> > array of handles and allocations and then wherever it was used
> > we deal with the length.
> >
> > I'm also nervous about limiting the qemu emulation to handling only
> > one record.. Spec wise I don't think you are allowed to say
> > no to larger clears. I understand the fact we can't test this today
> > with the kernel code but maybe we can hack together enough to
> > verify the emulation of larger gets and clears...
> >
> >
> > > + */
> > > +struct cxl_mbox_clear_event_payload {
> > > + uint8_t event_log; /* enum cxl_event_log_type */
> > > + uint8_t clear_flags;
> > > + uint8_t nr_recs; /* 1 for this struct */
> > > + uint8_t reserved[3];
> > > + uint16_t handle;
> > > +};
> > > +
> > > +/*
> > > + * General Media Event Record
> > > + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> > > + */
> >
> > In interests of keeping everything that needs checking against
> > a chunk of the spec together, perhaps it's worth adding appropriate
> > defines for the UUIDs?
> >
> > > +#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
> > > +#define CXL_EVENT_GEN_MED_RES_SIZE 0x2e
> >
> > As above, I'd rather see these inline.
> >
> > > +struct cxl_event_gen_media {
> > > + struct cxl_event_record_hdr hdr;
> > > + uint64_t phys_addr;
> > Defines for the mask + that we have a few things hiding in
> > the bottom bits?
> >
> > > + uint8_t descriptor;
> > Defines for the various fields in here?
> >
> > > + uint8_t type;
> > Same for the others that follow.
> >
> > > + uint8_t transaction_type;
> >
> > > + uint8_t validity_flags[2];
> >
> > uint16_t probably makes sense as we can do that for this one (unlike the
> > helpful le24 flags fields
> > in other structures).
> >
> > > + uint8_t channel;
> > > + uint8_t rank;
> > > + uint8_t device[3];
> > > + uint8_t component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
> > > + uint8_t reserved[CXL_EVENT_GEN_MED_RES_SIZE];
> > > +} QEMU_PACKED;
> > Would be nice to add a build time check that these structures have the
> > correct
> > overall size. Ben did a bunch of these in the other CXL emulation and they
> > are
> > a handy way to reassure reviewers that it adds up right!
> >
> > > +
> > > +/*
> > > + * DRAM Event Record - DER
> > > + * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
> > > + */
> > > +#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
> > > +#define CXL_EVENT_DER_RES_SIZE 0x17
> > Same as above.
> >
> > > +struct cxl_event_dram {
> > > + struct cxl_event_record_hdr hdr;
> > > + uint64_t phys_addr;
> > As before I'd like defines for the sub fields and masks.
> >
> > > + uint8_t descriptor;
> > > + uint8_t type;
> > > + uint8_t transaction_type;
> > > + uint8_t validity_flags[2];
> > uint16_t and same in similar cases.
> >
> > > + uint8_t channel;
> > > + uint8_t rank;
> > > + uint8_t nibble_mask[3];
> > > + uint8_t bank_group;
> > > + uint8_t bank;
> > > + uint8_t row[3];
> > > + uint8_t column[2];
> > > + uint8_t correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
> > > + uint8_t reserved[CXL_EVENT_DER_RES_SIZE];
> > > +} QEMU_PACKED;
> > > +
> > > +/*
> > > + * Get Health Info Record
> > > + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> > > + */
> > > +struct cxl_get_health_info {
> > Same stuff as for earlier structures.
> >
> > > + uint8_t health_status;
> > > + uint8_t media_status;
> > > + uint8_t add_status;
> > > + uint8_t life_used;
> > > + uint8_t device_temp[2];
> > > + uint8_t dirty_shutdown_cnt[4];
> > > + uint8_t cor_vol_err_cnt[4];
> > > + uint8_t cor_per_err_cnt[4];
> > > +} QEMU_PACKED;
> > > +
> > > +/*
> > > + * Memory Module Event Record
> > > + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> > > + */
> > > +#define CXL_EVENT_MEM_MOD_RES_SIZE 0x3d
> > > +struct cxl_event_mem_module {
> > > + struct cxl_event_record_hdr hdr;
> > > + uint8_t event_type;
> > > + struct cxl_get_health_info info;
> > > + uint8_t reserved[CXL_EVENT_MEM_MOD_RES_SIZE];
> > > +} QEMU_PACKED;
> > > +
> > > +#endif /* CXL_EVENTS_H */
> >
>