
Re: About restoring the state in vhost-vdpa device


From: Jason Wang
Subject: Re: About restoring the state in vhost-vdpa device
Date: Mon, 23 May 2022 11:12:58 +0800

On Wed, May 18, 2022 at 8:44 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Monday, May 16, 2022 11:05 PM
> > >> Although it's a longer route, I'd very much prefer an in-band virtio
> > >> way to perform it rather than a linux/vdpa-specific one. It's one of
> > >> the reasons I prefer the CVQ behavior over a vdpa-specific ioctl.
> > >>
> > > What is the in-band method to set last_avail_idx?
> > > An in-band virtio method doesn't exist.
> >
> >
> > Right, but it's part of the vhost API which has been there for more than
> > 10 years. This should be supported by all the vDPA vendors.
> Sure. My point to Eugenio was that vdpa doesn't have to be limited by the
> virtio spec.

Yes, but the APIs only consist of the ones that are already supported by
either virtio or vhost.
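
(For reference, the existing vhost API already covers the last_avail_idx case.
A minimal userspace sketch of restoring the index with VHOST_SET_VRING_BASE on
the vhost-vdpa char device could look like the following; the fd and index
values are placeholders and error handling is omitted.)

#include <sys/ioctl.h>
#include <linux/vhost.h>   /* VHOST_SET_VRING_BASE, struct vhost_vring_state */

/* Sketch: restore the last_avail_idx saved on the source side for one
 * virtqueue. vdpa_fd is an open /dev/vhost-vdpa-N descriptor (placeholder). */
static int restore_vring_base(int vdpa_fd, unsigned int vq_index,
                              unsigned int last_avail_idx)
{
    struct vhost_vring_state state = {
        .index = vq_index,
        .num   = last_avail_idx,
    };

    return ioctl(vdpa_fd, VHOST_SET_VRING_BASE, &state);
}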

> Plumbing exists to make vdpa work without the virtio spec.
> And hence, an additional ioctl can be ok.
>
> > >> layers of the stack need to maintain more state.
> > > Mostly not. A complete virtio device state that arrives from the source
> > > vdpa device can be given to the destination vdpa device without anyone
> > > else looking at it in the middle, if this format is known/well defined.
> >
> >
> > That's fine, and it seems the virtio spec is a better place for this, so
> > we won't duplicate efforts?
> >
> Yes. For the vDPA kernel, setting parameters doesn't need a virtio spec update.

We already have a spec patch for this.

> It is similar to avail index setting.

Yes, but we don't want to diverge too much from the virtio spec unless we
have a very strong reason. Otherwise it would be challenging to offer
forward compatibility with future spec support for device state restoring.
That's why we tend to reuse the existing APIs so far.

>
> >
> > >
> > >> From the guest point of view, to enable all the queues with
> > >> VHOST_VDPA_SET_VRING_ENABLE and not send DRIVER_OK is the same
> > >> as to send DRIVER_OK and not enable any data queue with
> > >> VHOST_VDPA_SET_VRING_ENABLE.
> > > Enabling a queue with SET_VRING_ENABLE after DRIVER_OK breaks two
> > > basic things.
> >
> >
> > It looks to me the spec:
> >
> > 1) For PCI, it doesn't forbid the driver from setting queue_enable to 1
> > after DRIVER_OK.
> The device init sequence sort of hints that VQ setup should be done before
> DRIVER_OK, in the snippet below.
>
> "Perform device-specific setup, including discovery of virtqueues for the 
> device, optional per-bus setup,
> reading and possibly writing the device’s virtio configuration space, and 
> population of virtqueues."
>
> Even if we assume for a moment that a queue can be enabled after driver_ok,
> packets end up going to the incorrect queue.

For RSS, the device can choose to drop the packet if the destination
queue is not enabled; we can clarify this in the spec. Actually, there's
already a patch that clarifies the packet should be dropped if the queue
is reset:

https://lists.oasis-open.org/archives/virtio-dev/202204/msg00063.html

We need to do something similar for queue_enable in this case. Then we
are all fine.

And the overhead of the incremental ioctls is not something that can't
be addressed; we can introduce e.g. a new command to disable the
datapath by setting num_queue_pairs to 0.
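
(To make the idea concrete, a hypothetical sketch of such a command, reusing
the existing virtio-net MQ control structures; note that virtqueue_pairs == 0
is below the minimum the current spec allows, so this only illustrates the
proposal, it is not an existing interface.)

#include <linux/virtio_net.h>   /* struct virtio_net_ctrl_hdr, struct virtio_net_ctrl_mq */

/* Hypothetical "disable datapath" request: reuse VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
 * but with virtqueue_pairs == 0, which today is below the spec minimum (1),
 * so it would need a spec extension. */
static void build_disable_datapath_cmd(struct virtio_net_ctrl_hdr *hdr,
                                       struct virtio_net_ctrl_mq *mq)
{
    hdr->class = VIRTIO_NET_CTRL_MQ;
    hdr->cmd   = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
    mq->virtqueue_pairs = 0;    /* 0 == stop steering to any data queue (proposed) */
}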

> Because the queue where it is supposed to go is not enabled and its RSS is
> not set up.

So queue_enable and num_queue_pairs work at different levels.
queue_enable works at the general virtio level, but num_queue_pairs
works for networking only.

The spec allows the following setup:

1) enable the 1st queue pair
2) set driver_ok
3) set 4 queue pairs

The device is expected to deal with this setup anyhow.
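
On the vhost-vdpa side, that setup roughly corresponds to the sketch below
(fd and queue indexes are placeholders, error handling is omitted, and step 3
happens on the CVQ from the driver side):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>          /* VHOST_VDPA_SET_VRING_ENABLE, VHOST_VDPA_SET_STATUS */
#include <linux/virtio_config.h>  /* VIRTIO_CONFIG_S_DRIVER_OK */

static void setup_one_pair_then_driver_ok(int vdpa_fd, uint8_t status)
{
    struct vhost_vring_state s = { .num = 1 };   /* 1 == enabled */

    s.index = 0;                                 /* 1) enable rx0 ... */
    ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &s);
    s.index = 1;                                 /* ... and tx0 */
    ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &s);

    status |= VIRTIO_CONFIG_S_DRIVER_OK;         /* 2) set driver_ok */
    ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status);

    /* 3) the driver then sets 4 queue pairs via the CVQ
     * (VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET). */
}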

>
> So in the restore flow, it is desired to set the needed config before doing driver_ok.
>
> > 2) For MMIO, it even allows the driver to disable a queue after DRIVER_OK
> >
> >
> > > 1. The supplied RSS config and VQ config are not honored for several
> > > tens to hundreds of milliseconds.
> > > It will be purely dependent on how/when these ioctls are made.
> > > Due to this behavior, a packet supposed to arrive on VQ X arrives on VQ Y.
> >
> >
> > I don't get why we end up with this situation.
> >
> > 1) enable cvq
> > 2) set driver_ok
> > 3) set RSS
> > 4) enable TX/RX
> >
> > vs
> >
> > 1) set RSS
> > 2) enable cvq
> > 3) enable TX/RX
> > 4) set driver_ok
> >
> > Is the latter faster?
> >
> Yes, because the latter sequence has the ability to set up the steering
> config once. As opposed to that, the first sequence needs to incrementally
> update the RSS setting on every new queue addition in step #4.

So as I understand it, the device RX flow is something like:

Incoming packet -> steering[1] -> queue_enable[2] -> start DMA

The steering[1] is expected to be configured once, when either MQ or RSS is set.
The queue_enable[2] check is an independent check after steering.

We should only start the DMA after both 1 and 2. So the only
incremental thing is queue_enable[2].
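
In other words, conceptually something like the sketch below (the structures
and helpers are made-up stand-ins for illustration, not any vendor's real
datapath):

#include <stdbool.h>
#include <stddef.h>

struct rx_queue { bool enabled; };   /* minimal stand-in for illustration */

/* Assumed helpers: the steering/RSS lookup configured once via MQ/RSS,
 * and the actual DMA into a receive queue. */
unsigned int steering_lookup(const void *pkt, size_t len);
void start_dma(struct rx_queue *q, const void *pkt, size_t len);

static bool deliver_rx_packet(struct rx_queue *queues, const void *pkt, size_t len)
{
    unsigned int qidx = steering_lookup(pkt, len);   /* [1] steering */

    if (!queues[qidx].enabled)                       /* [2] queue_enable check */
        return false;                                /* drop: queue not enabled */

    start_dma(&queues[qidx], pkt, len);              /* DMA only after 1 and 2 */
    return true;
}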

I would try Eugenio's proposal and see how it works; anyhow, it's the
cheapest way so far (without introducing new ioctls, etc.). My
understanding is that it should work (probably with some overhead in
the queue_enable step) for all vendors so far. If it doesn't, we can do
new ioctls.

>
> >
> > >
> > > 2. Enabling each VQ one at a time requires a constant steering update
> > > for the VQ, while this information is something already known.
> > > Trying to reuse brings a callback, resulting in this inefficiency.
> > > So it is better to start with more reusable APIs that fit the LM flow.
> >
> >
> > I agree, but the method proposed in the mail seems to be the only way
> > that can work with all the major vDPA vendors.
> >
> > E.g. the new API requires that the device has the ability to receive
> > device state other than via the control virtqueue, which might not be
> > supported by the hardware. (The device might expect a trap-and-emulate
> > model rather than save-and-restore.)
> >
> How a given vendor returns the values is up to the vendor-specific vdpa
> driver, just like avail_index, which does not come through the CVQ.

The problem is:

1) at the virtqueue level, we know the index (and the inflight buffers)
are the only state
2) at the device level, we know there is a lot of other state; inventing
new ioctls might require very careful design, which may take time

>
> > From qemu's point of view, it might need to support both models.
> >
> > If the device can't do save and restore:
> >
> > 1.1) enable cvq
> > 1.2) set driver_ok
> > 1.3) set device state (MQ, RSS) via control vq
> > 1.4) enable TX/RX
> >
> > If the device can do save and restore:
> >
> > 2.1) set device state (new API for setting MQ,RSS)
> > 2.2) enable cvq
> > 2.3) enable TX/RX
> > 2.4) set driver_ok
> >
> > We can start from 1 since it works for all devices, and then add
> > support for 2?
> >
>
> How about:
> 3.1) create cvq for the supported device
> The CVQ is not exposed to user space, it stays in the kernel. The vdpa
> driver creates it.

If it's a hardware CVQ, we may still need to set DRIVER_OK before
setting the states.

>
> 3.2) set device state (MQ, RSS) via a user->kernel ioctl()
> The vdpa driver internally decides whether to use the CVQ or something else
> (like the avail index).
>

I think this is similar to method 2. The challenge is that an ioctl()
is not as flexible as a queue, e.g. we want a device-agnostic API (and
if we had one, it could be defined in the virtio spec). At the API
level, it doesn't differ too much whether it's an ioctl or a queue. If
we decide to go this way, we can simply insist that the CVQ can work
before DRIVER_OK (with a backend feature anyhow; see the sketch after
the list below).

3.1) enable cvq
3.2) set device state (RSS,MQ) via CVQ
3.3) enable tx/rx
3.4) set driver_ok
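
For completeness, a rough sketch of what "with a backend feature" could mean
from userspace. VHOST_GET_BACKEND_FEATURES/VHOST_SET_BACKEND_FEATURES exist
today, but the feature bit below is purely hypothetical and only illustrates
the negotiation:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>   /* VHOST_GET_BACKEND_FEATURES, VHOST_SET_BACKEND_FEATURES */

/* Hypothetical bit meaning "the CVQ may be used before DRIVER_OK";
 * not defined in any kernel today, the value is arbitrary. */
#define VHOST_BACKEND_F_CVQ_BEFORE_DRIVER_OK  (1ULL << 62)

static int negotiate_early_cvq(int vdpa_fd)
{
    uint64_t features = 0;

    if (ioctl(vdpa_fd, VHOST_GET_BACKEND_FEATURES, &features))
        return -1;
    if (!(features & VHOST_BACKEND_F_CVQ_BEFORE_DRIVER_OK))
        return -1;          /* device can't accept state via CVQ before DRIVER_OK */

    features = VHOST_BACKEND_F_CVQ_BEFORE_DRIVER_OK;
    return ioctl(vdpa_fd, VHOST_SET_BACKEND_FEATURES, &features);
}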

Thanks


> 3.3) enable tx/rx
> 3.4) set driver_ok



