qemu-devel

Re: Deadlock between bdrv_drain_all_begin and prepare_mmio_access


From: Liang Yan
Subject: Re: Deadlock between bdrv_drain_all_begin and prepare_mmio_access
Date: Mon, 8 Aug 2022 18:34:10 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0


On 8/2/22 08:35, Kevin Wolf wrote:
On 24.07.2022 at 23:41, Liang Yan wrote:
Hello All,

I am facing a lock situation between main-loop thread 1 and vcpu thread 4
when taking a QMP snapshot. QEMU is running 6.0.x; I checked the upstream
code and did not see any significant change in this area since then. The
guest is a Windows 10 VM. Unfortunately, I could not get into the Windows
VM or reproduce the issue myself. No iothread is used here, only native AIO.

From the code,

-> AIO_WAIT_WHILE(NULL, bdrv_drain_all_poll());

--> aio_poll(qemu_get_aio_context(), true);

The main-loop mutex is locked in thread 1 when the snapshot starts; the
vcpu thread released the lock for address_space_rw and is trying to
reacquire it in prepare_mmio_access.

It seems the main-loop thread is stuck in a blocking aio_poll(), but I
cannot figure out what addr=4275044592 belongs to in the mmio read.

I do not quite understand what really happens here: either block jobs were
never drained out, or perhaps a block I/O read from the vcpu caused a
deadlock? I hope the domain experts here can help figure out the root
cause. Thanks in advance, and let me know if you need any further
information.
This does not look like a deadlock to me: Thread 4 is indeed waiting for
thread 1 to release the lock, but I don't think thread 1 is waiting in
any way for thread 4.

In thread 1, bdrv_drain_all_begin() waits for all in-flight I/O requests
to complete. So it looks a bit like some I/O request got stuck. If you
want to debug this a bit further, try to check what it is that makes
bdrv_drain_poll() still return true.

Thanks for the reply.

I agree it is not a pure deadlock; thread 1 seems to bear more of the responsibility here.

Do you know if there is a way to check the in-flight I/O requests here? Is it possible that the in-flight request is the mmio read from thread 4?

I can only see addr=4275044592, but cannot identify which address space it belongs to.


I am also pretty curious why bdrv_drain_poll() always returns true. Any chance that it is blocked in aio_poll(qemu_get_aio_context(), true)?

while ((cond)) {                               \
    if (ctx_) {                                \
        aio_context_release(ctx_);             \
    }                                          \
    aio_poll(qemu_get_aio_context(), true);    \
    if (ctx_) {                                \
        aio_context_acquire(ctx_);             \
    }                                          \
}                                              \


As mentioned, I only have a dump file and could not reproduce the issue in my local environment.

In the meantime, I have been working on a log patch to print all the fd/aio-handlers that the main loop dispatches.


Please also add the QEMU command line you're using, especially the
configuration of the block device backends (for example, does this use
Linux AIO, the thread pool or io_uring?).

It uses native Linux AIO, and no extra iothread is assigned here.

-blockdev {"driver":"file","filename":"****.raw","aio":"native","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}

-device virtio-blk-pci,bus=pci.0,addr=0x6,drive=libvirt-2-format,id=virtio-disk0,bootindex=1,write-cache=on


Let me know if you need more information, and thanks for looking into this issue.

~Liang

Kevin


