qemu-devel

Re: About restoring the state in vhost-vdpa device


From: Jason Wang
Subject: Re: About restoring the state in vhost-vdpa device
Date: Tue, 17 May 2022 11:05:12 +0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.9.0


On 2022/5/17 04:29, Parav Pandit wrote:

From: Eugenio Perez Martin <eperezma@redhat.com>
Sent: Monday, May 16, 2022 4:51 AM

On Fri, May 13, 2022 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
Hi Gautam,

Please fix your email client to use the right response format.
Otherwise, it will be confusing for the rest of us to follow the
conversation.
More below.

From: Gautam Dawar <gdawar@xilinx.com>
Sent: Friday, May 13, 2022 1:48 PM
Our proposal diverges in step 7: instead of enabling *all* the
virtqueues, only enable the CVQ.
Just to double check, VQ 0 and 1 of the net are also not enabled, correct?
[GD>>] Yes, that's my understanding as well.

That's correct. We can say that for a queue to be enabled, three things must
happen:
* DRIVER_OK (still not sent)
* VHOST_VDPA_SET_VRING_ENABLE sent for that queue (only sent for CVQ)
* If the queue is not in the first data queue pair and is not the cvq: send
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET with a queue pair count that includes it.

These if conditions, especially the last one, are not good, as they require
device-type knowledge, which in most cases is not needed, especially for new
code.

After that, send the DRIVER_OK and queue all the control commands
to restore the device status (MQ, RSS, ...). Once all of them have
been acknowledged (the "device", or the emulated cvq in the host vdpa
backend driver, has used all the cvq buffers), enable (SET_VRING_ENABLE,
set_vq_ready) all the other queues.
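
(Illustrative only: a minimal userspace sketch of that ordering against a
vhost-vdpa fd, not the actual QEMU code. The fd and queue counts are
placeholders, the CVQ descriptor handling and endianness conversion of the
MQ payload are omitted.)

#include <linux/vhost.h>         /* VHOST_VDPA_SET_VRING_ENABLE, VHOST_VDPA_SET_STATUS */
#include <linux/virtio_config.h> /* VIRTIO_CONFIG_S_* status bits */
#include <linux/virtio_net.h>    /* struct virtio_net_ctrl_mq, VIRTIO_NET_CTRL_MQ_* */
#include <sys/ioctl.h>

/* Placeholders: vdpa_fd is the vhost-vdpa fd, cvq_index/data_vq_count come
 * from the device setup. */
static int restore_over_cvq(int vdpa_fd, unsigned int cvq_index,
                            unsigned int data_vq_count)
{
    /* 1. Enable only the control virtqueue. */
    struct vhost_vring_state cvq = { .index = cvq_index, .num = 1 };
    if (ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &cvq))
        return -1;

    /* 2. Send DRIVER_OK (the earlier status bits are assumed already set). */
    __u8 status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
                  VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
    if (ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status))
        return -1;

    /* 3. Queue the restore commands (MQ, RSS, ...) on the CVQ and wait until
     *    the device has used those buffers.  Only the MQ payload is shown;
     *    building the descriptors and kicking the queue is omitted. */
    struct virtio_net_ctrl_mq mq = { .virtqueue_pairs = data_vq_count / 2 };
    (void)mq; /* class VIRTIO_NET_CTRL_MQ, cmd VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET */

    /* 4. Only then enable the data virtqueues. */
    for (unsigned int i = 0; i < data_vq_count; i++) {
        struct vhost_vring_state dq = { .index = i, .num = 1 };
        if (ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &dq))
            return -1;
    }
    return 0;
}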

What is special about doing DRIVER_OK and then enqueuing the control
commands?
Why can't other configuration be applied before DRIVER_OK?
There is nothing special beyond "they have a method to be set that way, so
reusing it avoids having to maintain many ways to set the same things,
simplifying implementations".

I'm not saying "it has been like that forever so we cannot change it":
I'm very open to the change but priority-wise we should first achieve a
working LM with packed, in_order, or even indirect.

[GD>>] For the device to be live (and any queue to be able to pass
traffic), DRIVER_OK is a must.
This applies only to a vdpa device implemented over a virtio device.
For such a use case/implementation, a proper virtio spec extension is
needed anyway for it to be efficient.
And when that happens, this scheme will still work.

Although it's a longer route, I'd very much prefer an in-band virtio way to
perform it rather than a Linux/vdpa-specific one. It's one of the reasons I
prefer the CVQ behavior over a vdpa-specific ioctl.

What is the in-band method to set last_avail_idx?
An in-band virtio method doesn't exist.


Right, but it's part of the vhost API, which has been there for more than 10 years. This should be supported by all the vDPA vendors.
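
(For reference, a minimal sketch of that existing vhost call; the fd, index
and saved value below are placeholders for what the migration stream
provides.)

#include <linux/vhost.h>
#include <sys/ioctl.h>

/* Placeholders: vdpa_fd is the vhost-vdpa fd, index identifies the vring, and
 * saved_last_avail_idx comes from the source side of the migration. */
static int set_last_avail_idx(int vdpa_fd, unsigned int index,
                              __u16 saved_last_avail_idx)
{
    struct vhost_vring_state base = {
        .index = index,
        .num   = saved_last_avail_idx, /* last_avail_idx to resume from */
    };
    return ioctl(vdpa_fd, VHOST_SET_VRING_BASE, &base);
}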



Other vdpa devices don't have to live with this limitation at this moment.

So, any configuration can be passed over the CVQ only after it is
started (vring configuration + DRIVER_OK). For an emulated queue, if
the order is reversed, I think the enqueued commands will remain
buffered and the device should be able to service them when it goes live.
I likely didn't understand what you described above.

The VQ avail index, etc., is programmed before doing DRIVER_OK anyway.

The sequence is very straightforward at the destination, from user to kernel:
1. Set config space fields (such as virtio_net_config/virtio_blk_config).
2. Set other device attributes (max vqs, current num vqs).
3. Set net-specific config (RSS, vlan, mac and more filters).
4. Set VQ fields (enable, msix, addr, avail index).
5. Set DRIVER_OK; the device resumes from where it left off.

Steps #1 to #4 can be done multiple times in a pre-warm device bring-up case
in the future.
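
(As a rough sketch, steps #1, #4 and #5 map onto the existing
vhost/vhost-vdpa uAPI roughly as below; steps #2 and #3 are the part with no
pre-DRIVER_OK ioctl equivalent today, which is the gap discussed next. All
variable names are placeholders and error/endianness handling is simplified.)

#include <linux/vhost.h>         /* VHOST_VDPA_SET_CONFIG, VHOST_SET_VRING_*, ... */
#include <linux/virtio_config.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Placeholders: cfg/cfg_len is the config-space image from the source,
 * addrs[i] (with addrs[i].index == i) and avail_idx[i] are the saved
 * per-VQ fields. */
static int restore_pre_driver_ok(int vdpa_fd, const void *cfg, __u32 cfg_len,
                                 const struct vhost_vring_addr *addrs,
                                 const __u16 *avail_idx, unsigned int nvqs)
{
    /* Step 1: config space fields (virtio_net_config, ...). */
    struct vhost_vdpa_config *c = malloc(sizeof(*c) + cfg_len);
    if (!c)
        return -1;
    c->off = 0;
    c->len = cfg_len;
    memcpy(c->buf, cfg, cfg_len);
    int r = ioctl(vdpa_fd, VHOST_VDPA_SET_CONFIG, c);
    free(c);
    if (r)
        return -1;

    /* Step 4: per-VQ fields (addresses, avail index, enable). */
    for (unsigned int i = 0; i < nvqs; i++) {
        struct vhost_vring_state base = { .index = i, .num = avail_idx[i] };
        struct vhost_vring_state en = { .index = i, .num = 1 };
        if (ioctl(vdpa_fd, VHOST_SET_VRING_ADDR, &addrs[i]) ||
            ioctl(vdpa_fd, VHOST_SET_VRING_BASE, &base) ||
            ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &en))
            return -1;
    }

    /* Step 5: DRIVER_OK; the device resumes from where it left off. */
    __u8 status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
                  VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
    return ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status);
}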

That requires creating a way to set all the parameters enumerated there on
vdpa devices. Each time a new feature is added to virtio-net that modifies
the guest-visible frontend device, we would need to update it.
For any guest-visible feature exposed by the vdpa device on the source side, a
migration agent needs to verify it and make the destination device capable of
supporting it anyway. Without that, a device may be migrated, but it won't
perform at the same level as the source.

And all the
layers of the stack need to maintain more state.
Mostly not. A complete virtio device state arriving from the source vdpa device
can be given to the destination vdpa device without anyone else looking at it
in the middle, if this format is known/well defined.


That's fine, and it seems the virtio spec is a better place for this, so we won't duplicate efforts?



From the guest's point of view, enabling all the queues with
VHOST_VDPA_SET_VRING_ENABLE and not sending DRIVER_OK is the same
as sending DRIVER_OK and not enabling any data queue with
VHOST_VDPA_SET_VRING_ENABLE.
Enabling queues with SET_VRING_ENABLE after DRIVER_OK breaks two basic things.


It looks to me that in the spec:

1) For PCI, it doesn't forbid the driver from setting queue_enable to 1 after DRIVER_OK.
2) For MMIO, it even allows the driver to disable a queue after DRIVER_OK.


1. The supplied RSS config and VQ config are not honored for several tens to
hundreds of milliseconds.
It depends purely on how/when these ioctls are made.
Due to this behavior, a packet supposed to arrive on VQ X arrives on VQ Y.


I don't get why we end up with this situation.

1) enable cvq
2) set driver_ok
3) set RSS
4) enable TX/RX

vs

1) set RSS
2) enable cvq
3) enable TX/RX
4) set driver_ok

Is the latter faster?



2. Enabling each VQ one at a time requires a constant steering update for the
VQ, while this information is something already known. Trying to reuse the
callback results in this inefficiency.
So it is better to start with more reusable APIs that fit the LM flow.


I agree, but the method proposed in the mail seems to be the only way that can work with all the major vDPA vendors.

E.g. the new API requires the device to have the ability to receive device state other than via the control virtqueue, which might not be supported by the hardware. (The device might expect a trap-and-emulate model rather than save-and-restore.)

From QEMU's point of view, it might need to support both models.

If the device can't do save and restore:

1.1) enable cvq
1.2) set driver_ok
1.3) set device state (MQ, RSS) via control vq
1.4) enable TX/RX

If the device can do save and restore:

2.1) set device state (new API for setting MQ,RSS)
2.2) enable cvq
2.3) enable TX/RX
2.4) set driver_ok

We can start from 1 since it works for all devices, and then add support for 2?

Thanks




