Re: Deadlock with SATA CD I/O and eject


From: Hanna Czenczek
Subject: Re: Deadlock with SATA CD I/O and eject
Date: Tue, 19 Sep 2023 13:23:21 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 19.09.23 12:54, Kevin Wolf wrote:
On 18.09.2023 at 19:28, John Levon wrote:
Observed with a base of qemu 6.2.0, but from code inspection it looks to me like
the problem is still current in upstream master. Apologies if I have missed a fix
in this area.

Symptom: run a UEFI-booted SATA CD Windows installer. When it hits the "Loading
files..." screen, run an eject, e.g.:

virsh qemu-monitor-command 64c6e190-ea7f-49e2-b2d5-6ba1814b00ae '{"execute":"eject", "arguments": { "id": "sata0-0-0" } }'

qemu will get stuck like so:

(gdb) bt
#0  0x00007f8ba4b16036 in ppoll () from /lib64/libc.so.6
#1  0x0000561813c48ed5 in ppoll (__ss=0x0, __timeout=0x7ffcbd981a70, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:62
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=999896128) at ../util/qemu-timer.c:348
#3  0x0000561813c29be9 in fdmon_poll_wait (ctx=0x56181516e070, ready_list=0x7ffcbd981af0, timeout=999896128) at ../util/fdmon-poll.c:80
#4  0x0000561813c297e1 in aio_poll (ctx=ctx@entry=0x56181516e070, blocking=blocking@entry=true) at ../util/aio-posix.c:607
#5  0x0000561813ae2fad in bdrv_do_drained_begin (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x56181533fcc0) at ../block/io.c:483
#6  bdrv_do_drained_begin (bs=0x56181533fcc0, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at ../block/io.c:446
#7  0x0000561813ad9982 in blk_drain (blk=0x5618161c1f10) at ../block/block-backend.c:1741
#8  0x0000561813ad9b8c in blk_remove_bs (blk=blk@entry=0x5618161c1f10) at ../block/block-backend.c:852
#9  0x000056181382b8ab in blockdev_remove_medium (has_device=<optimized out>, device=<optimized out>, has_id=<optimized out>, id=<optimized out>, errp=0x7ffcbd981c78) at ../block/qapi-sysemu.c:232
#10 0x000056181382bfb1 in qmp_eject (has_device=<optimized out>, device=0x0, has_id=<optimized out>, id=0x561815e6efe0 "sata0-0-0", has_force=<optimized out>, force=<optimized out>, errp=0x7ffcbd981c78) at ../block/qapi-sysemu.c:45

We are stuck forever here:

  351 static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
  352                                   bool poll)
...
  380     if (poll) {
  381         BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
  382     }

Because the blk root's ->in_flight is 1, as tested by the blk_root_drained_poll()
condition.
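
For reference, that condition looks roughly like this (paraphrasing
block/block-backend.c from memory, so possibly not verbatim for any given
version):

static bool blk_root_drained_poll(BdrvChild *child)
{
    BlockBackend *blk = child->opaque;
    bool busy = false;

    assert(blk->quiesce_counter);
    if (blk->dev_ops && blk->dev_ops->drained_poll) {
        busy = blk->dev_ops->drained_poll(blk->dev_opaque);
    }
    /* the drain keeps polling for as long as blk->in_flight is non-zero */
    return busy || !!blk->in_flight;
}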


Our blk->in_flight user is stuck here:

1298 static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
...
1310         blk_dec_in_flight(blk);
1311         qemu_co_queue_wait(&blk->queued_requests, &blk->queued_requests_lock);
1312         blk_inc_in_flight(blk);

Note that before entering this stanza, blk->in_flight was 2. This turns out to
be due to the ide atapi code. In particular, the guest VM is generating lots of
read I/O. The "first IO" arrives into blk via:

cd_read_sector()->ide_buffered_readv()->blk_aio_preadv()
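
For context, blk_aio_preadv() takes an in_flight reference on the way in
(paraphrasing block/block-backend.c from memory), and that reference is only
dropped again in blk_aio_complete(), shown below:

blk_aio_preadv()
  -> blk_aio_prwv()
         blk_inc_in_flight(blk);   /* dropped again only in blk_aio_complete() */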

This initial IO completes:

blk_aio_read_entry()->blk_aio_complete()

1560 static void blk_aio_complete(BlkAioEmAIOCB *acb)
1561 {
1562     if (acb->has_returned) {
1563         acb->common.cb(acb->common.opaque, acb->rwco.ret);
1564         blk_dec_in_flight(acb->rwco.blk);
1565         qemu_aio_unref(acb);
1566     }
1567 }

Line 1564 is what we need to bring blk->in_flight down to zero, but it is never
reached! This is because of what happens at line 1563:

acb->common.cb()->cd_read_sector_cb()->ide_atapi_cmd_reply_end()->cd_read_sector_sync()->blk_pread()

That is, the IO callback in the atapi code itself triggers more (synchronous)
IO.

In the meantime, we start processing the blk_drain() code, so by the time this
blk_pread() actually gets handled, quiesce is set, and we get stuck in the
blk_wait_while_drained().
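
To spell out the cycle: the drain polls until blk->in_flight reaches zero; the
outstanding reference is only dropped once the AIO callback returns (line 1564
above); the callback is blocked in the synchronous blk_pread(); and that read
is queued until the drained section ends. A deliberately simplified,
single-threaded toy model of that shape (not QEMU code, all names made up):

#include <stdbool.h>
#include <stdio.h>

static int in_flight;       /* models blk->in_flight           */
static bool quiesced;       /* models blk->quiesce_counter > 0 */
static bool nested_parked;  /* models the queued blk_pread()   */

/* models the synchronous blk_pread() issued from the completion path */
static bool nested_sync_read(void)
{
    in_flight++;
    if (quiesced) {
        /* blk_wait_while_drained(): drop the count and park until the
         * drained section ends, which in this scenario never happens */
        in_flight--;
        nested_parked = true;
        return false;       /* "blocked": the read cannot complete */
    }
    in_flight--;
    return true;
}

/* models cd_read_sector_cb() -> ... -> cd_read_sector_sync() */
static bool completion_callback(void)
{
    /* the callback cannot return until the nested read has finished */
    return nested_sync_read();
}

int main(void)
{
    in_flight = 1;          /* the original buffered ATAPI read */
    quiesced = true;        /* blk_drain() has already begun    */

    /* blk_aio_complete(): run the callback first, decrement afterwards */
    if (completion_callback()) {
        in_flight--;        /* the decrement at line 1564 above */
    }

    /* BDRV_POLL_WHILE() would now spin on in_flight != 0 forever */
    printf("in_flight=%d, nested read parked=%s => drain %s\n",
           in_flight, nested_parked ? "yes" : "no",
           in_flight == 0 ? "completes" : "never completes");
    return 0;
}

Running it prints "in_flight=1, nested read parked=yes => drain never
completes", which is exactly the state the backtrace above is stuck in.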

I don't know the qemu block stack well enough to propose an actual fix.

Experimentally, waiting for ->in_flight to drop to zero *before* we quiesce in
blk_remove_bs() via an AIO_WAIT_WHILE() avoids the symptom, but I'm pretty sure
that's just a band-aid rather than a proper fix for the underlying deadlock.
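
Concretely, the experiment looked roughly like this, at the top of
blk_remove_bs() before anything gets quiesced (placement and details are
approximate, this is not a proposed patch):

    /* let our own in-flight requests finish before request queuing kicks in */
    AIO_WAIT_WHILE(blk_get_aio_context(blk),
                   qatomic_read(&blk->in_flight) > 0);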

Any suggestions/clues/thoughts?
Related discussion: 
https://lists.gnu.org/archive/html/qemu-block/2023-03/msg00284.html

Actually, it seems we never fixed that problem either?

We didn’t, from my POV mainly because everyone had different suggestions and none of them looked clearly nicer than the others, so it was never obvious which one to pursue.

I.e., there are also https://lists.gnu.org/archive/html/qemu-block/2023-03/msg00709.html and https://lists.gnu.org/archive/html/qemu-block/2023-04/msg00118.html .

Back then I suggested that blk_drain*() should disable request queuing
because its callers don't want to quiesce the BlockBackend, but rather
get their own requests completed. I think this approach would solve this
one as well.
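
I.e. something along these lines inside blk_drain() (just a sketch; how the
flag interacts with nesting and with devices that already set it would need
more thought):

void blk_drain(BlockBackend *blk)
{
    /* Callers of blk_drain() want their own requests to complete rather
     * than have them parked on blk->queued_requests, so turn request
     * queuing off for the duration of the drain. */
    blk_set_disable_request_queuing(blk, true);

    ... existing drain logic (bdrv_drained_begin()/AIO_WAIT_WHILE()/...) ...

    /* sketch only: this should restore the previous value instead of
     * unconditionally clearing it */
    blk_set_disable_request_queuing(blk, false);
}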

Your experiment with moving the queuing later is basically what Paolo
suggested, though I'd still argue it's not the right thing to do: while it
seems to mostly work for both use cases of drain (give me exclusive access
vs. wait for my requests to complete), it's not optimal for either one.

Kevin




