Re: About restoring the state in vhost-vdpa device


From: Eugenio Perez Martin
Subject: Re: About restoring the state in vhost-vdpa device
Date: Tue, 17 May 2022 10:12:00 +0200

On Mon, May 16, 2022 at 10:29 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Eugenio Perez Martin <eperezma@redhat.com>
> > Sent: Monday, May 16, 2022 4:51 AM
> >
> > On Fri, May 13, 2022 at 8:25 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > > Hi Gautam,
> > >
> > > Please fix your email client to have the right response format.
> > > Otherwise, it will be confusing for the rest of us to follow the
> > > conversation.
> > >
> > > More below.
> > >
> > > > From: Gautam Dawar <gdawar@xilinx.com>
> > > > Sent: Friday, May 13, 2022 1:48 PM
> > >
> > > > > Our proposal diverge in step 7: Instead of enabling *all* the
> > > > > virtqueues, only enable the CVQ.
> > > > Just to double check, VQ 0 and 1 of the net are also not enabled, 
> > > > correct?
> > > > [GD>>] Yes, that's my understanding as well.
> > > >
> >
> > That's correct. We can say that for a queue to be enabled three things must
> > happen:
> > * DRIVER_OK (still not sent)
> > * VHOST_VDPA_SET_VRING_ENABLE sent for that queue (only sent for
> > CVQ)
> > * If the queue is not in the first data queue pair and not the CVQ: send
> > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET with a number of queue pairs that includes it.
> >
> These if conditions, especially the last one, are not good as they require
> device type knowledge, which in most cases is not needed.
> Especially for new code.
>

DRIVER_OK and SET_VRING_ENABLE are the current way to do so, and all
kinds of vdpa devices are running this way. For the last conditional I
meant only -net; -console has VIRTIO_CONSOLE_F_MULTIPORT, and so on.

But the point was that this is already in the standard and integrated
in the devices: this is not part of the proposal, but the introduction
explaining how the devices work at this moment. We can try to optimize
this flow for sure, but it's a different discussion.
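
To make that flow concrete, the following is a rough, untested sketch of
what "enable only the CVQ, then DRIVER_OK" looks like from userspace
against the uapi headers; vdpa_fd, cvq_index and the helper name are just
placeholders, not part of any proposal:

    #include <errno.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>          /* VHOST_VDPA_SET_STATUS, VHOST_VDPA_SET_VRING_ENABLE */
    #include <linux/vhost_types.h>    /* struct vhost_vring_state */
    #include <linux/virtio_config.h>  /* VIRTIO_CONFIG_S_DRIVER_OK */

    /* Enable only the control VQ, then set DRIVER_OK; data queues stay
     * disabled until the restore commands have been used by the device. */
    static int enable_cvq_then_driver_ok(int vdpa_fd, unsigned int cvq_index,
                                         uint8_t prev_status)
    {
        struct vhost_vring_state s = { .index = cvq_index, .num = 1 };
        uint8_t status = prev_status | VIRTIO_CONFIG_S_DRIVER_OK;

        if (ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &s) < 0)
            return -errno;
        return ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status) < 0 ? -errno : 0;
    }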

> > > > > After that, send the DRIVER_OK and queue all the control commands
> > > > > to restore the device status (MQ, RSS, ...). Once all of them have
> > > > > been acknowledged ("device", or emulated cvq in host vdpa backend
> > > > > driver, has used all cvq buffers, enable (SET_VRING_ENABLE,
> > > > > set_vq_ready) all other queues.
> > > > >
> > > > What is special about doing DRIVER_OK and enqueuing the control
> > > > commands?
> > > > Why can't other configuration be applied before DRIVER_OK?
> >
> > There is nothing special beyond "they have a method to be set that way, so
> > reusing it avoids having to maintain many ways to set the same things,
> > simplifying implementations".
> >
> > I'm not saying "it has been like that forever so we cannot change it":
> > I'm very open to the change but priority-wise we should first achieve a
> > working LM with packed, in_order, or even indirect.
> >
> > > > [GD>>] For the device to be live (and any queue to be able to pass
> > > > traffic), DRIVER_OK is a must.
> > > This applies only to the vdpa device implemented over a virtio device.
> > > For such a use case/implementation, a proper virtio spec extension is
> > > needed anyway for it to be efficient.
> > > And when that happens, this scheme will still work.
> > >
> >
> > Although it's a longer route, I'd very much prefer an in-band virtio way to
> > perform it rather than a Linux/vdpa-specific one. It's one of the reasons I
> > prefer the CVQ behavior over a vdpa-specific ioctl.
> >
> What is the in-band method to set last_avail_idx?
> An in-band virtio method doesn't exist.
>

There isn't; I was acknowledging your point about "a proper virtio
spec extension is needed for it to be efficient".

The intended method is VHOST_SET_VRING_BASE, supported by vhost device
types, as Jason pointed out in his mail. It has been supported for a
long time. In my opinion we should make it an in-band virtio operation
too, since being vhost-only is a limitation for some devices.
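
For completeness, this is the shape of that call from userspace (untested
sketch; the fd, index and helper name are placeholders):

    #include <errno.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>        /* VHOST_SET_VRING_BASE */
    #include <linux/vhost_types.h>  /* struct vhost_vring_state */

    /* Restore the avail index that was saved on the source device. */
    static int restore_vring_base(int vdpa_fd, unsigned int vq_index,
                                  unsigned int last_avail_idx)
    {
        struct vhost_vring_state base = {
            .index = vq_index,
            .num = last_avail_idx,
        };

        return ioctl(vdpa_fd, VHOST_SET_VRING_BASE, &base) < 0 ? -errno : 0;
    }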

> > > Other vdpa devices don't have to live with this limitation at this
> > > moment.
> > >
> > > > So, any configuration can be passed over the CVQ only after it is
> > > > started (vring configuration + DRIVER_OK). For an emulated queue, if
> > > > the order is reversed, I think the enqueued commands will remain
> > > > buffered and the device should be able to service them when it goes live.
> > > I likely didn’t understand what you describe above.
> > >
> > > Vq avail index etc is programmed before doing DRIVER_OK anyway.
> > >
> > > The sequence is very straightforward at the destination, from user to kernel:
> > > 1. Set config space fields (such as virtio_net_config/virtio_blk_config).
> > > 2. Set other device attributes (max vqs, current num vqs).
> > > 3. Set net-specific config (RSS, vlan, mac and more filters).
> > > 4. Set VQ fields (enable, msix, addr, avail idx).
> > > 5. Set DRIVER_OK; device resumes from where it left off.
> > >
> > > Steps #1 to #4 can be done multiple times in a pre-warm device bring-up
> > > case in the future.
> >
> > That requires creating a way to set all the parameters enumerated there on
> > vdpa devices. Each time a new feature is added to virtio-net that modifies
> > the guest-visible frontend device, we would need to update it.
> Any guest-visible feature exposed by the vdpa device on the source side has
> to be verified by a migration agent, which must make the destination device
> capable of supporting it anyway. Without that, a device may be migrated, but
> it won't perform at the same level as the source.
>

We can discuss how to reach that point, but that's not where things
stand at this moment. And I doubt we can reach it before the next
kernel merge window.

> > And all the
> > layers of the stack need to maintain more state.
> Mostly not. A complete virtio device state arriving from the source vdpa
> device can be given to the destination vdpa device without anyone else
> looking at it in the middle, if this format is known/well defined.
>

I'm not sure I follow this. If you mean that qemu defines a format for
migration data, that is not 100% true: It changes between versions,
and you cannot migrate between two versions of qemu that are too far
apart without going through an intermediate version, if I'm not
wrong. Migration bugs are solved by changing that format, since qemu
does not need interoperability with other VMMs at the moment.

The boundaries between qemu and devices need more restrictions than
that for interoperability.

I agree we could set a format for the virtio device state, but I
think the right place is the virtio standard, not the vDPA layer. If
we do it at the vDPA layer, we are "repeating the mistake" of
VHOST_SET_VRING_BASE: We need to maintain two ways to perform the same
action.

> >
> > From the guest point of view, enabling all the queues with
> > VHOST_VDPA_SET_VRING_ENABLE and not sending DRIVER_OK is the same
> > as sending DRIVER_OK and not enabling any data queue with
> > VHOST_VDPA_SET_VRING_ENABLE.
>
> Sending SET_VRING_ENABLE after DRIVER_OK has two basic things broken.
> 1. The supplied RSS config and VQ config are not honored for several tens to
> hundreds of milliseconds.
> It depends purely on how/when these ioctls are made.

We can optimize this without adding burden to the API.

> Due to this behavior, a packet supposed to arrive on VQ X arrives on VQ Y.
>

I'm not sure why that happens.

By the standard:
After the driver transmitted a packet of a flow on transmitqX, the
device SHOULD cause incoming packets for that flow to be steered to
receiveqX. For uni-directional protocols, or where no packets have
been transmitted yet, the device MAY steer a packet to a random queue
out of the specified receiveq1…receiveqn.

The device MUST NOT queue packets on receive queues greater than
virtqueue_pairs once it has placed the VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
command in a used buffer.

It doesn't say the requests must be migrated from one queue to another
beyond the interpretation of the SHOULD. Maybe we need to restrict the
standard more to reduce the differences between the devices?
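
For reference, the command that the quoted MUST NOT applies to is tiny;
this is a sketch of the guest-visible layout from linux/virtio_net.h,
with only the wrapper struct name made up:

    #include <linux/virtio_net.h>   /* virtio_net_ctrl_hdr, virtio_net_ctrl_mq */

    /* Once the device marks a buffer with this content as used, it must stop
     * queueing packets on receiveqN for N > virtqueue_pairs. */
    struct mq_vq_pairs_set_cmd {
        struct virtio_net_ctrl_hdr hdr;  /* .class = VIRTIO_NET_CTRL_MQ,
                                          * .cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET */
        struct virtio_net_ctrl_mq mq;    /* .virtqueue_pairs, little-endian */
    };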

> 2. Enabling each VQ one at a time requires a constant steering update for
> the VQ, while this information is already known up front.
> Trying to reuse it brings a callback that results in this inefficiency.
> So it is better to start with more reusable APIs that fit the LM flow.

We can change to that model later. Since the model we propose does
not add any burden, we can discard it down the road if something
better arises. The proposed behavior should already work for all
devices: It comes for free as far as kernel / vdpa code is concerned.

I think that doing it at the vhost/vDPA level is going to cause the
same problem as VHOST_SET_VRING_BASE: We will need to maintain two
ways of performing the same thing, and the code will need to keep
them in sync. I'm not *against* adding it by itself, I'm just
considering it an optimization that needs to be balanced against what
already enables the device to perform state restoring.

Thanks!



