qemu-devel

RE: About restoring the state in vhost-vdpa device


From: Gautam Dawar
Subject: RE: About restoring the state in vhost-vdpa device
Date: Fri, 13 May 2022 17:48:28 +0000

-----Original Message-----
From: Parav Pandit <parav@nvidia.com> 
Sent: Friday, May 13, 2022 8:39 PM
To: Eugenio Perez Martin <eperezma@redhat.com>; virtualization 
<virtualization@lists.linux-foundation.org>; qemu-level 
<qemu-devel@nongnu.org>; Jason Wang <jasowang@redhat.com>; Cindy Lu 
<lulu@redhat.com>; Gautam Dawar <gdawar@xilinx.com>; 
virtio-networking@redhat.com; Eli Cohen <elic@nvidia.com>; Laurent Vivier 
<lvivier@redhat.com>; Stefano Garzarella <sgarzare@redhat.com>
Subject: RE: About restoring the state in vhost-vdpa device


> From: Eugenio Perez Martin <eperezma@redhat.com>
> Sent: Wednesday, May 11, 2022 3:44 PM
> 
> This is a proposal to restore the state of the vhost-vdpa device at 
> the destination after a live migration. It uses as many features already 
> available in both the device and qemu as possible, so we keep the 
> communication simple and speed up the merging process.
> 
> # Initializing a vhost-vdpa device.
> 
> Without the context of live migration, the steps to initialize the 
> device from vhost-vdpa at qemu starting are:
> 1) [vhost] Open the vdpa device, simply using open()
> 2) [vhost+virtio] Get device features. These are expected not to 
> change in the device's lifetime, so we can save them. Qemu issues a 
> VHOST_GET_FEATURES ioctl and vdpa forwards to the backend driver using
> get_device_features() callback.
> 3) [vhost+virtio] Get its max_queue_pairs if _F_MQ and _F_CTRL_VQ.
This should soon be replaced with a more generic num_vq interface, as 
max_queue_pairs doesn't work beyond net.
There is no need to carry an ancient interface over into the newly built vdpa 
stack.

> These are obtained using VHOST_VDPA_GET_CONFIG, and that request is 
> forwarded to the device using get_config. QEMU expects the device to 
> not change it in its lifetime.
> 4) [vhost] Vdpa set status (_S_ACKNOWLEDGE, _S_DRIVER). Still no 
> FEATURES_OK or DRIVER_OK. The ioctl is VHOST_VDPA_SET_STATUS, and the 
> vdpa backend driver callback is set_status.
> 
> These are the steps used to initialize the device in qemu terminology, 
> taking away some redundancies to make it simpler.
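
For illustration, steps 1-4 boil down to roughly the following against the
vhost-vdpa character device (the device node name is only an example; error
handling and little-endian conversion of the config fields are omitted):

    #include <fcntl.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>
    #include <linux/virtio_config.h>
    #include <linux/virtio_net.h>

    int main(void)
    {
        /* 1) Open the vdpa device. */
        int fd = open("/dev/vhost-vdpa-0", O_RDWR);

        /* 2) Device features: expected not to change over the device's
         *    lifetime, so qemu can cache them. */
        uint64_t features;
        ioctl(fd, VHOST_GET_FEATURES, &features);

        /* 3) max_virtqueue_pairs from the config space (only meaningful
         *    when _F_MQ and _F_CTRL_VQ are offered). */
        struct {
            struct vhost_vdpa_config hdr;
            uint16_t max_vq_pairs;
        } cfg = {
            .hdr.off = offsetof(struct virtio_net_config, max_virtqueue_pairs),
            .hdr.len = sizeof(uint16_t),
        };
        ioctl(fd, VHOST_VDPA_GET_CONFIG, &cfg);

        /* 4) _S_ACKNOWLEDGE | _S_DRIVER, still no FEATURES_OK/DRIVER_OK. */
        uint8_t status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
        ioctl(fd, VHOST_VDPA_SET_STATUS, &status);

        printf("features=0x%llx max_queue_pairs=%u\n",
               (unsigned long long)features, cfg.max_vq_pairs);
        return 0;
    }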
> 
> Now the driver sends the FEATURES_OK and the DRIVER_OK, and qemu 
> detects it, so it *starts* the device.
> 
> # Starting a vhost-vdpa device
> 
> At virtio_net_vhost_status we have two important variables here:
> int cvq = _F_CTRL_VQ ? 1 : 0;
> int queue_pairs = _F_CTRL_VQ && _F_MQ ? (max_queue_pairs of step 3) : 0;
> 
> Now identification of the cvq index. Qemu *knows* that the device will 
> expose it at the last queue (max_queue_pairs*2) if _F_MQ has been 
> acknowledged by the guest's driver or 2 if not. It cannot depend on 
> any data sent to the device via cvq, because we couldn't get its 
> command status on a change.
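
Written as a small (hypothetical) helper, the rule is simply:

    /* cvq index as qemu computes it: after all data vqs when _F_MQ was
     * acknowledged by the guest driver, otherwise vq 2. */
    static unsigned int cvq_index(int mq_acked, unsigned int max_queue_pairs)
    {
        return mq_acked ? max_queue_pairs * 2 : 2;
    }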
> 
> Now we start the vhost device. The workflow is currently:
> 
> 5) [virtio+vhost] The first step is to send the acknowledgement of the 
> Virtio features and vhost/vdpa backend features to the device, so it 
> knows how to configure itself. This is done using the same calls as 
> step 4 with these feature bits added.
> 6) [virtio] Set the size, base, addr, kick and call fd for each queue 
> (SET_VRING_ADDR, SET_VRING_NUM, ...; and forwarded with 
> set_vq_address, set_vq_state, ...)
> 7) [vdpa] Send host notifiers and *send SET_VRING_ENABLE = 1* for each 
> queue. This is done using ioctl VHOST_VDPA_SET_VRING_ENABLE, and 
> forwarded to the vdpa backend using set_vq_ready callback.
> 8) [virtio + vdpa] Send memory translations & set DRIVER_OK.
> 
So for MQ, all VQ setup should be done before step 8.
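
For reference, steps 5-8 map onto the uAPI roughly as below (same headers as
the initialization sketch above; one queue shown, the SET_VRING_KICK/CALL
eventfds and the IOTLB updates of step 8 are left out, and all names other
than the ioctls belong to this sketch, not to qemu):

    /* Steps 6 + 7 for one queue: size, base, ring addresses, then enable.
     * In the kernel these reach set_vq_num/set_vq_address/set_vq_ready. */
    static void setup_and_enable_vq(int fd, unsigned int vq, unsigned int size,
                                    uint64_t desc, uint64_t avail, uint64_t used)
    {
        struct vhost_vring_state num    = { .index = vq, .num = size };
        struct vhost_vring_state base   = { .index = vq, .num = 0 };
        struct vhost_vring_state enable = { .index = vq, .num = 1 };
        struct vhost_vring_addr  addr   = {
            .index = vq,
            .desc_user_addr  = desc,
            .avail_user_addr = avail,
            .used_user_addr  = used,
        };

        ioctl(fd, VHOST_SET_VRING_NUM,  &num);
        ioctl(fd, VHOST_SET_VRING_BASE, &base);
        ioctl(fd, VHOST_SET_VRING_ADDR, &addr);
        ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &enable);   /* step 7 */
    }

    /* Step 5: acknowledge virtio features (vhost/vdpa backend features go
     * through VHOST_SET_BACKEND_FEATURES) and add FEATURES_OK. */
    static void ack_features(int fd, uint64_t acked_features)
    {
        uint8_t status;

        ioctl(fd, VHOST_SET_FEATURES, &acked_features);
        ioctl(fd, VHOST_VDPA_GET_STATUS, &status);
        status |= VIRTIO_CONFIG_S_FEATURES_OK;
        ioctl(fd, VHOST_VDPA_SET_STATUS, &status);
    }

    /* Step 8: once the memory translations are in place, DRIVER_OK. */
    static void driver_ok(int fd)
    {
        uint8_t status;

        ioctl(fd, VHOST_VDPA_GET_STATUS, &status);
        status |= VIRTIO_CONFIG_S_DRIVER_OK;
        ioctl(fd, VHOST_VDPA_SET_STATUS, &status);
    }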

> If we follow the current workflow, the device is allowed now to start 
> receiving only on vq pair 0, since we've still not set the multi queue 
> pair. This could cause the guest to receive packets in unexpected 
> queues, breaking RSS.
> 
> # Proposal
> 
> Our proposal diverges at step 7: Instead of enabling *all* the 
> virtqueues, only enable the CVQ.
Just to double check, VQs 0 and 1 of the net device are also not enabled, correct?
[GD>>] Yes, that's my understanding as well.

> After that, send the DRIVER_OK and queue all the control commands to 
> restore the device status (MQ, RSS, ...). Once all of them have been 
> acknowledged (the "device", or the emulated cvq in the host vdpa backend 
> driver, has used all cvq buffers), enable (SET_VRING_ENABLE, set_vq_ready) 
> all other queues.
> 
What is special about doing DRIVER_OK and enqueuing the control commands?
Why can't other configuration be applied before DRIVER_OK?
[GD>>] For the device to be live (and for any queue to be able to pass traffic), 
DRIVER_OK is a must. So, any configuration can be passed over the CVQ only 
after it is started (vring configuration + DRIVER_OK). For an emulated queue, 
if the order is reversed, I think the enqueued commands will remain buffered 
and the device should be able to service them when it goes live.
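
Putting the proposed ordering together in one place (reusing driver_ok() from
the sketch above; cvq_restore_state() is a hypothetical stand-in for queuing
the MQ/RSS/filter commands on the cvq and waiting until the device, or the
emulated cvq in the backend, has used the buffers):

    /* Hypothetical: replay MQ, RSS and MAC/VLAN filter state through the
     * cvq and wait for every command to be acknowledged. */
    void cvq_restore_state(int fd, unsigned int curr_queue_pairs);

    static void proposed_start(int fd, unsigned int max_queue_pairs,
                               unsigned int curr_queue_pairs)
    {
        /* All vqs already have num/base/addr/kick/call configured (step 6);
         * before DRIVER_OK only the cvq is enabled. */
        unsigned int cvq = max_queue_pairs * 2;   /* _F_MQ acknowledged */
        struct vhost_vring_state enable = { .index = cvq, .num = 1 };
        unsigned int i;

        ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &enable);
        driver_ok(fd);

        cvq_restore_state(fd, curr_queue_pairs);

        /* Only once the state is restored, open up the data queues. */
        for (i = 0; i < curr_queue_pairs * 2; i++) {
            enable.index = i;
            enable.num = 1;
            ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &enable);
        }
    }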

In other words,
Step 7 already sets up the necessary VQ-related fields.

Before DRIVER_OK, what is needed is to set up any other device fields and 
features.
For net this includes RSS, VLAN and MAC filters.
So, a new vdpa ioctl() should be able to set these values.
This is the ioctl() between user and kernel.
After this ioctl(), DRIVER_OK should be done, resuming the device.

Device has full view of config now.

This node-local device setup change should not require a migration protocol 
change.

This scheme will also work for non-net virtio devices.
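
Purely as a strawman for the ioctl described here (nothing with these names
exists in today's vDPA uAPI), the shape could be along the lines of:

    /* Hypothetical uAPI: push device state (MQ, RSS, VLAN/MAC filters, ...)
     * from userspace into the vdpa device before DRIVER_OK. The names and
     * the ioctl number are invented for illustration only. */
    struct vhost_vdpa_dev_state {
            __u32 type;     /* which piece of state data[] carries */
            __u32 len;      /* length of data[] in bytes */
            __u8  data[];
    };
    #define VHOST_VDPA_SET_DEV_STATE \
            _IOW(VHOST_VIRTIO, 0x7f, struct vhost_vdpa_dev_state)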

> Everything needed for this is already implemented in the kernel as far 
> as I see; only a small modification in qemu is needed. Thus we restore 
> the device state without creating a maintenance burden.
> 
> A lot of optimizations can be applied on top without the need to add 
> stuff to the migration protocol or vDPA uAPI, like the pre-warming of 
> the vdpa queues or adding more capabilities to the emulated CVQ.
The above ioctl() will enable the vdpa subsystem to apply these settings one 
or more times in the pre-warming stage before DRIVER_OK.

> 
> Other optimizations like applying the state out of band can also be 
> added so they can run in parallel with the migration, but that 
> requires a bigger change in qemu migration protocol making us lose 
> focus on achieving at least the basic device migration in my opinion.
> 
Let's strive to apply this in-band as much as possible. Applying out of band 
opens issues unrelated to migration (authentication and more).

