Re: [PATCH v6 1/4] memory: prevent dma-reentracy issues
From: Alexander Bulekov
Date: Fri, 10 Mar 2023 07:31:17 -0500

On 230310 0723, Alexander Bulekov wrote:
> On 230310 1214, Fiona Ebner wrote:
> > Am 05.02.23 um 05:07 schrieb Alexander Bulekov:
> > > Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
> > > This flag is set/checked prior to calling a device's MemoryRegion
> > > handlers, and set when device code initiates DMA. The purpose of this
> > > flag is to prevent two types of DMA-based reentrancy issues:
> > >
> > > 1.) mmio -> dma -> mmio case
> > > 2.) bh -> dma write -> mmio case
> > >
> > > These issues have led to problems such as stack-exhaustion and
> > > use-after-frees.
> > >
> > > Summary of the problem from Peter Maydell:
> > > https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com
> > >
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
> > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
> > >
> > > Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
> > > Acked-by: Peter Xu <peterx@redhat.com>
> > > ---
> > > include/hw/qdev-core.h | 7 +++++++
> > > softmmu/memory.c | 17 +++++++++++++++++
> > > softmmu/trace-events | 1 +
> > > 3 files changed, 25 insertions(+)
> > >
> > Hi,
> > there seems to be an issue with this patch or existing issue exposed by
> > this patch in combination with the LSI SCSI controller.
> >
> > After applying this patch on current master (i.e.
> > ee59483267de29056b5b2ee2421ef3844e5c9932), a Debian 11 with the LSI
> > controller would not boot properly anymore:
> > > [ 7.540907] sym0: <895a> rev 0x0 at pci 0000:00:05.0 irq 10
> > > [ 7.546028] sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
> > > [ 7.559724] sym0: SCSI BUS has been reset.
> > > [ 7.560820] sym0: interrupted SCRIPT address not found.
> > > [ 7.563802] scsi host2: sym-2.2.3
> > > [ 7.881111] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> > > [ 7.881998] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
> > > [ 7.925902] e1000 0000:00:03.0 ens3: renamed from eth0
> > > [ 32.654811] scsi 2:0:0:0: tag#192 ABORT operation started
> > > [ 37.764283] scsi 2:0:0:0: ABORT operation timed-out.
> > > [ 37.774974] scsi 2:0:0:0: tag#192 DEVICE RESET operation started
> > > [ 42.882488] scsi 2:0:0:0: DEVICE RESET operation timed-out.
> > > [ 42.883606] scsi 2:0:0:0: tag#192 BUS RESET operation started
> > > [ 48.002437] scsi 2:0:0:0: BUS RESET operation timed-out.
> > > [ 48.003030] scsi 2:0:0:0: tag#192 HOST RESET operation started
> > > [ 48.010226] sym0: SCSI BUS has been reset.
> > > [ 53.122472] scsi 2:0:0:0: HOST RESET operation timed-out.
> > > [ 53.123030] scsi 2:0:0:0: Device offlined - not ready after error recovery
> >
> > The commandline I used is:
> > ./qemu-system-x86_64 \
> > -cpu 'kvm64' \
> > -m 4096 \
> > -serial 'stdio' \
> > -device 'lsi,id=scsihw0,bus=pci.0,addr=0x5' \
> > -drive 'file=/dev/zvol/myzpool/vm-9006-disk-0,if=none,id=drive-scsi0,format=raw' \
> > -device 'scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
> > -machine 'pc'
> >
> > Happy to provide any more information if necessary!
> >
> > CC-ing Fam Zheng (reviewer:SCSI)
> >
> > Originally reported by one of our community members:
> > https://forum.proxmox.com/threads/123843/
> >
> > Best Regards,
> > Fiona
> >
>
> Thanks, I confirmed this by booting up a livecd iso with an lsi device
> attached. I will do some digging.
>
> Stack-trace:
>
> #0 trace_memory_region_reentrant_io (cpu_index=<optimized out>,
> mr=<optimized out>, offset=<optimized out>, size=<optimized out>) at
> trace/trace-softmmu.h:337
> #1 0x000055555815ce67 in access_with_adjusted_size (addr=addr@entry=0x1000,
> value=0x7ffef01fb980, size=size@entry=0x4, access_size_min=0x1,
> access_size_min@entry=0x0, access_size_max=0x4, access_fn=0x555558181370
> <memory_region_read_accessor>, mr=0x627000000c50, attrs=...
> ) at ../softmmu/memory.c:552
> #2 0x000055555815aec7 in memory_region_dispatch_read1 (mr=0x627000000c50,
> addr=0x1000, pval=<optimized out>, size=0x4, attrs=...) at
> ../softmmu/memory.c:1448
This MR seems to be "lsi-ram".
From hw/scsi/lsi53c895a.c:
memory_region_init_io(&s->ram_io, OBJECT(s), &lsi_ram_ops, s,
"lsi-ram", 0x2000);
So the LSI device is reading an LSI "Script" from its own IO region, and
the new reentrancy check rejects that read. In this particular case, I
think there was no reason to use memory_region_init_io rather than
memory_region_init_ram, but this makes me worried that there are other
devices that do something similar.
-Alex