Re: [PATCH 4/6] qemu, vhost-user: Extend protocol to start/stop/flush sl

qemu-devel

From:	Stefan Hajnoczi
Subject:	Re: [PATCH 4/6] qemu, vhost-user: Extend protocol to start/stop/flush slave channel
Date:	Thu, 28 Jan 2021 16:52:34 +0000

On Mon, Jan 25, 2021 at 01:01:13PM -0500, Vivek Goyal wrote:
> Currently we don't have a mechanism to flush the slave channel while shutting
> down a vhost-user device, and that can result in a deadlock. Consider the
> following scenario.
> 
> 1. Slave gets a request from guest on virtqueue (say index 1, vq1), to map
>    a portion of file in qemu address space.
> 
> 2. Thread serving vq1 receives this request and sends a message to qemu on
>    slave channel/fd and gets blocked in waiting for a response from qemu.
> 
> 3. In the meantime, the user does "echo b > /proc/sysrq-trigger" in the guest. This
>    leads to qemu reset and ultimately in main thread we end up in
>    vhost_dev_stop() which is trying to shutdown virtqueues for the device.
> 
> 4. Slave gets VHOST_USER_GET_VRING_BASE message to shutdown a virtqueue on
>    unix socket being used for communication.
> 
> 5. Slave tries to shutdown the thread serving vq1 and waits for it to
>    terminate. But vq1 thread can't terminate because it is waiting for
>    a response from qemu on slave_fd. And qemu is not processing slave_fd
>    anymore as qemu is resetting (and not running main event loop anymore)
>    and is waiting for a response to VHOST_USER_GET_VRING_BASE.
> 
> So we have a deadlock. Qemu is waiting on slave to respond to
> VHOST_USER_GET_VRING_BASE message and slave is waiting on qemu to
> respond to request it sent on slave_fd.
> 
> I can reliably reproduce this race with virtio-fs.
> 
> Hence here is the proposal to solve this problem. Enhance vhost-user
> protocol to properly shutdown slave_fd channel. And if there are pending
> requests, flush the channel completely before sending the request to
> shutdown virtqueues.
> 
> The new workflow to shut down the slave channel is:
> 
> - Qemu sends VHOST_USER_STOP_SLAVE_CHANNEL request to slave. It waits
>   for a reply from the slave.
> 
> - Then qemu sits in a tight loop waiting for
>   VHOST_USER_SLAVE_STOP_CHANNEL_COMPLETE message from slave on slave_fd.
>   And while waiting for this message, qemu continues to process requests
>   on slave_fd to flush any pending requests. This will unblock threads
>   in slave and allow slave to shutdown slave channel.
> 
> - Once qemu gets VHOST_USER_SLAVE_STOP_CHANNEL_COMPLETE message, it knows
>   no more requests will come on slave_fd and it can continue to shutdown
>   virtqueues now.
> 
> - Once device starts again, qemu will send VHOST_USER_START_SLAVE_CHANNEL
>   message to slave to open the slave channel and receive requests.
> 
> IOW, this allows for proper shutdown of slave channel, making sure
> no threads in slave are blocked on sending/receiving message. And
> this in-turn allows for shutting down of virtqueues, hence resolving
> the deadlock.
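
To make sure I am reading the proposed flush step correctly, here is a
minimal sketch of the QEMU-side loop as I understand it. This is not the
code from the patch: the message IDs, the 4-byte wire format, and the
names flush_slave_channel(), slave_req_fn and count_request() are all
hypothetical, chosen only to keep the example self-contained.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical message IDs; the real values would come from the patch. */
enum {
    SLAVE_REQ_OTHER = 1,                 /* any ordinary slave request */
    SLAVE_REQ_STOP_CHANNEL_COMPLETE = 2, /* slave is done with the channel */
};

/* Hypothetical callback that services one ordinary slave request. */
typedef int (*slave_req_fn)(uint32_t req, void *opaque);

/*
 * Keep servicing requests on slave_fd until the slave announces that the
 * channel is stopped. Returns 0 on a clean stop, -1 on error or EOF.
 */
static int flush_slave_channel(int slave_fd, slave_req_fn handle, void *opaque)
{
    for (;;) {
        uint32_t req;
        ssize_t n = read(slave_fd, &req, sizeof(req));

        if (n < 0 && errno == EINTR) {
            continue;                    /* interrupted, retry the read */
        }
        if (n != (ssize_t)sizeof(req)) {
            return -1;                   /* error, EOF or short read */
        }
        if (req == SLAVE_REQ_STOP_CHANNEL_COMPLETE) {
            return 0;                    /* no more requests will arrive */
        }
        if (handle(req, opaque) < 0) {   /* unblocks a waiting slave thread */
            return -1;
        }
    }
}

/* Example handler (hypothetical): just counts the flushed requests. */
static int count_request(uint32_t req, void *opaque)
{
    (void)req;
    ++*(int *)opaque;
    return 0;
}
```

The point of the loop is that every pending request is answered before
the completion message is accepted, so no slave thread is left blocked
when the virtqueues are shut down.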

Is the new message necessary? How about letting QEMU handle slave fd
activity while waiting for virtqueues to stop instead?

In other words, QEMU should monitor both the UNIX domain socket and the
slave fd after it has sent VHOST_USER_GET_VRING_BASE and is awaiting the
response.
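
Roughly what I have in mind, as a sketch only. It uses plain poll(2)
rather than QEMU's own event loop or chardev APIs, and the names
wait_reply_servicing_slave(), slave_req_handler and drain_one_byte() are
made up for illustration:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical callback that handles one pending request on the slave fd. */
typedef int (*slave_req_handler)(int slave_fd, void *opaque);

/*
 * Wait until a reply is readable on sock_fd, servicing slave_fd in the
 * meantime. Returns 0 when sock_fd is readable, -1 on error.
 */
static int wait_reply_servicing_slave(int sock_fd, int slave_fd,
                                      slave_req_handler handle, void *opaque)
{
    struct pollfd fds[2] = {
        { .fd = sock_fd,  .events = POLLIN },
        { .fd = slave_fd, .events = POLLIN },
    };

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR) {
                continue;                /* interrupted, poll again */
            }
            return -1;
        }

        /* Service the slave fd first so a blocked vq thread can finish. */
        if (fds[1].revents & POLLIN) {
            if (handle(slave_fd, opaque) < 0) {
                return -1;
            }
        } else if (fds[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
            fds[1].fd = -1;              /* poll() ignores negative fds */
        }

        if (fds[0].revents & POLLIN) {
            return 0;                    /* reply is ready to be read */
        }
        if (fds[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
            return -1;
        }
    }
}

/* Example handler (hypothetical): consume one byte and count the call. */
static int drain_one_byte(int fd, void *opaque)
{
    char c;

    if (read(fd, &c, 1) != 1) {
        return -1;
    }
    ++*(int *)opaque;
    return 0;
}
```

With something along these lines the thread serving vq1 gets its answer
on the slave fd, terminates, and the backend can then reply to
VHOST_USER_GET_VRING_BASE without any new protocol message.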

Stefan
