[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
From: |
Stefan Hajnoczi |
Subject: |
Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread |
Date: |
Thu, 25 Jan 2024 08:25:07 -0500 |
On Thu, Jan 25, 2024 at 10:06:51AM +0100, Hanna Czenczek wrote:
> On 24.01.24 22:53, Stefan Hajnoczi wrote:
> > On Wed, Jan 24, 2024 at 01:12:47PM +0100, Hanna Czenczek wrote:
> > > On 23.01.24 18:10, Kevin Wolf wrote:
> > > > Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
> > > > > On 21.12.23 22:23, Kevin Wolf wrote:
> > > > > > From: Stefan Hajnoczi<stefanha@redhat.com>
> > > > > >
> > > > > > Stop depending on the AioContext lock and instead access
> > > > > > SCSIDevice->requests from only one thread at a time:
> > > > > > - When the VM is running only the BlockBackend's AioContext may
> > > > > > access
> > > > > > the requests list.
> > > > > > - When the VM is stopped only the main loop may access the requests
> > > > > > list.
> > > > > >
> > > > > > These constraints protect the requests list without the need for
> > > > > > locking
> > > > > > in the I/O code path.
> > > > > >
> > > > > > Note that multiple IOThreads are not supported yet because the code
> > > > > > assumes all SCSIRequests are executed from a single AioContext.
> > > > > > Leave
> > > > > > that as future work.
> > > > > >
> > > > > > Signed-off-by: Stefan Hajnoczi<stefanha@redhat.com>
> > > > > > Reviewed-by: Eric Blake<eblake@redhat.com>
> > > > > > Message-ID:<20231204164259.1515217-2-stefanha@redhat.com>
> > > > > > Signed-off-by: Kevin Wolf<kwolf@redhat.com>
> > > > > > ---
> > > > > > include/hw/scsi/scsi.h | 7 +-
> > > > > > hw/scsi/scsi-bus.c | 181
> > > > > > ++++++++++++++++++++++++++++-------------
> > > > > > 2 files changed, 131 insertions(+), 57 deletions(-)
> > > > > My reproducer forhttps://issues.redhat.com/browse/RHEL-3934 now
> > > > > breaks more
> > > > > often because of this commit than because of the original bug, i.e.
> > > > > when
> > > > > repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd
> > > > > device,
> > > > > this tends to happen when unplugging the scsi-hd:
> > > > >
> > > > > {"execute":"device_del","arguments":{"id":"stg0"}}
> > > > > {"return": {}}
> > > > > qemu-system-x86_64: ../block/block-backend.c:2429:
> > > > > blk_get_aio_context:
> > > > > Assertion `ctx == blk->ctx' failed.
> > > [...]
> > >
> > > > I don't know anything about the problem either, but since you already
> > > > speculated about the cause, let me speculate about the solution:
> > > > Can we hold the graph writer lock for the tran_commit() call in
> > > > bdrv_try_change_aio_context()? And of course take the reader lock for
> > > > blk_get_aio_context(), but that should be completely unproblematic.
> > > I tried this, and it’s not easy taking the lock just for tran_commit(),
> > > because some callers of bdrv_try_change_aio_context() already hold the
> > > write
> > > lock (specifically bdrv_attach_child_common(),
> > > bdrv_attach_child_common_abort(), and bdrv_root_unref_child()[1]), and
> > > qmp_x_blockdev_set_iothread() holds the read lock. Other callers don’t
> > > hold
> > > any lock[2].
> > >
> > > So I’m not sure whether we should mark all of
> > > bdrv_try_change_aio_context()
> > > as GRAPH_WRLOCK and then make all callers take the lock, or really only
> > > take
> > > it for tran_commit(), and have callers release the lock around
> > > bdrv_try_change_aio_context(). Former sounds better to naïve me.
> > >
> > > (In any case, FWIW, having blk_set_aio_context() take the write lock, and
> > > scsi_device_for_each_req_async_bh() take the read lock[3], does fix the
> > > assertion failure.)
> > I wonder if a simpler solution is blk_inc_in_flight() in
> > scsi_device_for_each_req_async() and blk_dec_in_flight() in
> > scsi_device_for_each_req_async_bh() so that drain
> > waits for the BH.
> >
> > There is a drain around the AioContext change, so as long as
> > scsi_device_for_each_req_async() was called before blk_set_aio_context()
> > and not _during_ aio_poll(), we would prevent inconsistent BB vs BDS
> > aio_contexts.
>
> Actually, Kevin has suggested on IRC that we drop the whole drain. :)
>
> Dropping the write lock in our outside of bdrv_try_change_aio_context() for
> callers that have already taken it seems unsafe, so the only option would be
> to make the whole function write-lock-able. The drained section can cause
> problems with that if it ends up wanting to reorganize the graph, so AFAIU
> drain should never be done while under a write lock. This is already a
> problem now because there are three callers that do call
> bdrv_try_change_aio_context() while under a write lock, so it seems like we
> shouldn’t keep the drain as-is.
>
> So, Kevin suggested just dropping that drain, because I/O requests are no
> longer supposed to care about a BDS’s native AioContext anymore anyway, so
> it seems like the need for the drain has gone away with multiqueue. Then we
> could make the whole function GRAPH_WRLOCK.
Okay.
Stefan
signature.asc
Description: PGP signature
Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread, Hanna Czenczek, 2024/01/29
Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread, Hanna Czenczek, 2024/01/23