qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_byte


From: Christian A. Ehrhardt
Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()
Date: Tue, 30 Aug 2022 21:45:37 +0200

Hi,

Shameer: Thanks for bringing this to my attention.

Some comments inline.

On Tue, Aug 30, 2022 at 06:43:56AM +0000, Shameerali Kolothum Thodi wrote:
> 
> 
> > -----Original Message-----
> > From: Shameerali Kolothum Thodi
> > Sent: 26 August 2022 13:15
> > To: 'Laszlo Ersek' <lersek@redhat.com>; qemu-devel@nongnu.org;
> > qemu-arm@nongnu.org
> > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm
> > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>; Ard
> > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann
> > <kraxel@redhat.com>
> > Subject: RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> > fw_cfg_modify_bytes_read()
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Laszlo Ersek [mailto:lersek@redhat.com]
> > > Sent: 26 August 2022 13:07
> > > To: Shameerali Kolothum Thodi
> > <shameerali.kolothum.thodi@huawei.com>;
> > > qemu-devel@nongnu.org; qemu-arm@nongnu.org
> > > Cc: imammedo@redhat.com; peter.maydell@linaro.org; Linuxarm
> > > <linuxarm@huawei.com>; chenxiang (M) <chenxiang66@hisilicon.com>;
> > Ard
> > > Biesheuvel (kernel.org address) <ardb@kernel.org>; Gerd Hoffmann
> > > <kraxel@redhat.com>
> > > Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> > > fw_cfg_modify_bytes_read()
> > >
> > > +Ard +Gerd, one pointer at the bottom
> > >
> > > On 08/26/22 13:59, Laszlo Ersek wrote:
> > > > On 08/25/22 18:18, Shameer Kolothum wrote:
> > > >> Hi
> > > >>
> > > >> On arm/virt platform, Chen Xiang reported a Guest crash while
> > > >> attempting the below steps,
> > > >>
> > > >> 1. Launch the Guest with nvdimm=on
> > > >> 2. Hot-add a NVDIMM dev
> > > >> 3. Reboot
> > > >> 4. Guest boots fine.
> > > >> 5. Reboot again.
> > > >> 6. Guest boot fails.
> > > >>
> > > >> QEMU_EFI reports the below error:
> > > >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> > > >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > > >>
> > > >> Debugging shows that on first reboot(after hot-adding NVDIMM),
> > > >> Qemu updates the etc/table-loader len,
> > > >>
> > > >> qemu_ram_resize()
> > > >>   fw_cfg_modify_file()
> > > >>      fw_cfg_modify_bytes_read()
> > > >>
> > > >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> > > >> the "key" entry to NULL. Because of this, on the second reboot,
> > > >> virt_acpi_build_update() is called with a NULL "build_state" and
> > > >> returns without updating the ACPI tables. This seems to be
> > > >> upsetting the firmware.
> > > >>
> > > >> To fix this, don't change the callback_opaque in
> > > fw_cfg_modify_bytes_read().
> > > >>
> > > >> Reported-by: chenxiang <chenxiang66@hisilicon.com>
> > > >> Signed-off-by: Shameer Kolothum
> > > <shameerali.kolothum.thodi@huawei.com>
> > > >> ---
> > > >> I am still not very convinced this is the root cause of the issue.
> > > >> Though it looks like setting callback_opaque to NULL while updating
> > > >> the file size is wrong, what puzzles me is that on the second reboot
> > > >> we don't have any ACPI table size changes and ideally firmware should
> > > >> see the updated tables from the first reboot itself.
> > > >>
> > > >> Please take a look and let me know.
> > > >>
> > > >> Thanks,
> > > >> Shameer
> > > >>
> > > >> ---
> > > >>  hw/nvram/fw_cfg.c | 1 -
> > > >>  1 file changed, 1 deletion(-)
> > > >>
> > > >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > > >> index d605f3f45a..dfe8404c01 100644
> > > >> --- a/hw/nvram/fw_cfg.c
> > > >> +++ b/hw/nvram/fw_cfg.c
> > > >> @@ -728,7 +728,6 @@ static void
> > > *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
> > > >>      ptr = s->entries[arch][key].data;
> > > >>      s->entries[arch][key].data = data;
> > > >>      s->entries[arch][key].len = len;
> > > >> -    s->entries[arch][key].callback_opaque = NULL;
> > > >>      s->entries[arch][key].allow_write = false;
> > > >>
> > > >>      return ptr;
> > > >>


The code as it stands clears callback_opaque (the data pointer
of the callbacks) while leaving the actual callbacks in place.
I think it is obvious that this cannot be correct.

IMHO the change to allow_write is wrong for similar reasons but
I don't think that this matters in practice.

If this path is hit for the table-loader file the ACPI tables in the
guest will be corrupt.


> > > > I vaguely recall seeing the same issue report years ago (also in
> > > > relation to hot-adding NVDIMM). However, I have no capacity to
> > > > participate in the discussion. Making this remark just for clarity.
> > >
> > > The earlier report I've had in mind was from Shameer as well:
> > >
> > >
> > http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F
> > > B328@lhreml524-mbs.china.huawei.com
> > 
> > Right. That was a slightly different issue though. It was basically ACPI 
> > table
> > size not
> > getting updated on the first reboot of Guest after we hot-add NVDIMM dev.
> > The error
> > from firmware was different in that case,
> > 
> > ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
> > OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > 
> > And it was fixed with this series here,
> > https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.3
> > 0664-1-shameerali.kolothum.thodi@huawei.com/
> > 
> > The current issue only happens on the second reboot of the Guest as
> > described in
> > the steps above.
> > 
> 
> [+Christian]
> 
> I just found that a similar issue was reported here sometime back on 
> Q35/Windows
> setup,
> https://patchew.org/QEMU/YldFMTbFLUcdFIfa@cae.in-ulm.de/
> 
> But there are no further discussions on that thread.

I convinced myself that this cannot happen upstream as the number of
entries in the table-loader is always small. However, it does happen
for us and the suggested patch fixes the issue for us.

Given that Shameer independantly came to the same conclusion
the patch should by considered for inclusion.

    thanks   Christian





reply via email to

[Prev in Thread] Current Thread [Next in Thread]