Re: [Virtio-fs] (no subject)

qemu-devel

From:	Hanna Czenczek
Subject:	Re: [Virtio-fs] (no subject)
Date:	Mon, 9 Oct 2023 10:21:51 +0200

On 07.10.23 04:22, Yajun Wu wrote:

On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote:

On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote:
On 06.10.23 11:26, Michael S. Tsirkin wrote:
On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote:
On 06.10.23 10:45, Michael S. Tsirkin wrote:
On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote:
On 05.10.23 19:15, Michael S. Tsirkin wrote:
On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote:
On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote:
There is no clearly defined purpose for the virtio status byte in
vhost-user: For resetting, we already have RESET_DEVICE; and for virtio
feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK
protocol extension, it is possible for SET_FEATURES to return errors
(SET_PROTOCOL_FEATURES may be called before SET_FEATURES).

As for implementations, SET_STATUS is not widely implemented. dpdk does
implement it, but only uses it to signal feature negotiation failure.
While it does log reset requests (SET_STATUS 0) as such, it effectively
ignores them, in contrast to RESET_OWNER (which is deprecated, and today
means the same thing as RESET_DEVICE).

While qemu superficially has support for [GS]ET_STATUS, it does not
forward the guest-set status byte, but instead just makes it up
internally, and actually completely ignores what the back-end returns,
only using it as the template for a subsequent SET_STATUS to add single
bits to it. Notably, after setting FEATURES_OK, it never reads it back
to see whether the flag is still set, which is the only way in which
dpdk uses the status byte.

As-is, no front-end or back-end can rely on the other side handling this
field in a useful manner, and it also provides no practical use over
other mechanisms the vhost-user protocol has, which are more clearly
defined.  Deprecate it.
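
The read-back check mentioned in the commit message (set FEATURES_OK, then confirm the flag is still set) can be sketched roughly as follows. This is a toy Python model, not qemu or dpdk code: the status bit values are those of the virtio specification, while `ToyBackend`, `set_status`, `get_status`, and `negotiate_features_ok` are hypothetical names used only for illustration.

```python
# virtio device status bits (values as defined in the virtio specification)
ACKNOWLEDGE = 1
DRIVER = 2
DRIVER_OK = 4
FEATURES_OK = 8


class ToyBackend:
    """Hypothetical back-end that may refuse the negotiated feature set."""

    def __init__(self, accepts_features):
        self.accepts_features = accepts_features
        self.status = 0

    def set_status(self, status):
        # A back-end that rejects the features clears FEATURES_OK again.
        if status & FEATURES_OK and not self.accepts_features:
            status &= ~FEATURES_OK
        self.status = status

    def get_status(self):
        return self.status


def negotiate_features_ok(backend):
    """Set FEATURES_OK, then read the status back to see whether it stuck."""
    backend.set_status(ACKNOWLEDGE | DRIVER | FEATURES_OK)
    return bool(backend.get_status() & FEATURES_OK)
```

In this model, a front-end that skips the final `get_status()` call cannot tell an accepting back-end from a refusing one, which is the gap the commit message describes.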

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 docs/interop/vhost-user.rst | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK.
The fact current backends never check errors does not mean they never
will. So no, not applying this.
Can this not be done with REPLY_ACK? I.e., with the following message
order:
1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is
   present
2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK
3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK
4. SET_FEATURES with need_reply
If not, the problem is that qemu has sent SET_STATUS 0 for a while when
the vCPUs are stopped, which generally seems to request a device reset.
If we don’t state at least that SET_STATUS 0 is to be ignored, back-ends
that will implement SET_STATUS later may break with at least these qemu
versions. But documenting that a particular use of the status byte is to
be ignored would be really strange.
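
The four-step message order above can be sketched as a toy Python model. The two bit numbers are the ones the vhost-user specification assigns to VHOST_USER_F_PROTOCOL_FEATURES and VHOST_USER_PROTOCOL_F_REPLY_ACK; everything else (`ToyBackend`, its methods, `negotiate`) is a hypothetical stand-in for real message passing, and the reply convention (zero for success, non-zero for failure) is modelled as a plain return value.

```python
VHOST_USER_F_PROTOCOL_FEATURES = 30   # virtio feature bit used by vhost-user
VHOST_USER_PROTOCOL_F_REPLY_ACK = 3   # vhost-user protocol feature bit


class ToyBackend:
    """Hypothetical back-end; methods stand in for vhost-user messages."""

    def __init__(self, features, protocol_features, supported):
        self.features = features
        self.protocol_features = protocol_features
        self.supported = supported      # feature bits SET_FEATURES accepts
        self.acked_protocol_features = 0

    def get_features(self):
        return self.features

    def get_protocol_features(self):
        return self.protocol_features

    def set_protocol_features(self, value):
        self.acked_protocol_features = value

    def set_features(self, value, need_reply):
        ok = (value & ~self.supported) == 0
        reply_ack = self.acked_protocol_features & (
            1 << VHOST_USER_PROTOCOL_F_REPLY_ACK)
        if need_reply and reply_ack:
            return 0 if ok else 1       # zero: success, non-zero: failure
        return None                     # no reply without REPLY_ACK


def negotiate(backend, wanted):
    """Return True/False if SET_FEATURES was acked, None if it cannot be."""
    # 1. GET_FEATURES: is VHOST_USER_F_PROTOCOL_FEATURES present?
    if not backend.get_features() & (1 << VHOST_USER_F_PROTOCOL_FEATURES):
        return None
    # 2. GET_PROTOCOL_FEATURES: is REPLY_ACK offered?
    reply_ack = 1 << VHOST_USER_PROTOCOL_F_REPLY_ACK
    if not backend.get_protocol_features() & reply_ack:
        return None
    # 3. SET_PROTOCOL_FEATURES to enable REPLY_ACK
    backend.set_protocol_features(reply_ack)
    # 4. SET_FEATURES with need_reply
    return backend.set_features(wanted, need_reply=True) == 0
```

A `None` result here marks the case where the back-end does not offer REPLY_ACK, so the front-end has no way of learning about a feature negotiation failure through this path.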

Hanna
Hmm I guess. Though just following virtio spec seems cleaner to me...
vhost-user reconfigures the state fully on start.
Not the internal device state, though. virtiofsd has internal state, and
other devices like vhost-gpu back-ends would probably, too.

Stefan has recently sent a series
(https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html)
to put the reset (RESET_DEVICE) into virtio_reset() (when we really need
a reset).
I really don’t like our current approach with the status byte. Following
the virtio specification to me would mean that the guest directly
controls this byte, which it does not. qemu makes up values as it deems
appropriate, and this includes sending a SET_STATUS 0 when the guest is
just paused, i.e. when the guest really doesn’t want a device reset.
That means that qemu does not treat this as a virtio device field
(because that would mean exposing it to the guest driver), but instead
treats it as part of the vhost(-user) protocol. It doesn’t feel right to
me that we use a virtio-defined feature for communication on the vhost
level, i.e. between front-end and back-end, and not between guest driver
and device. I think all vhost-level protocol features should be fully
defined in the vhost-user specification, which REPLY_ACK is.
Hmm that makes sense. Maybe we should have done what stefan's patch
is doing.

Do look at the original commit that introduced it to understand why
it was added.
I don’t understand why this was added to the stop/cont code, though. If
it is time consuming to make these changes, why are they done every time
the VM is paused and resumed?  It makes sense that this would be done
for the initial configuration (where a reset also wouldn’t hurt), but
here it seems wrong.
(To be clear, a reset in the stop/cont code is wrong, because it breaks
stateful devices.)

Also, note the newer commits 6f8be29ec17 and c3716f260bf.  The reset as
originally introduced was wrong even for non-stateful devices, because
it occurred before we fetched the state (vring indices) so we could
restore it later. I don’t know how 923b8921d21 was tested, but if the
back-end used for testing implemented SET_STATUS 0 as a reset, it could
not have survived either migration or a stop/cont in general, because
the vring indices would have been reset to 0.
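
The ordering problem described above can be illustrated with a small toy model. It assumes a back-end that implements SET_STATUS 0 as a full reset, as the virtio specification would suggest; `ToyBackend`, `process_requests`, `stop_reset_first`, and `stop_fetch_first` are hypothetical names, and the single `last_avail_idx` counter stands in for the vring indices returned by GET_VRING_BASE.

```python
class ToyBackend:
    """Hypothetical back-end with one vring and a spec-style reset."""

    def __init__(self):
        self.last_avail_idx = 0

    def process_requests(self, count):
        # Each processed request advances the vring index.
        self.last_avail_idx += count

    def set_status(self, status):
        # SET_STATUS 0 implemented as a device reset.
        if status == 0:
            self.last_avail_idx = 0

    def get_vring_base(self):
        return self.last_avail_idx


def stop_reset_first(backend):
    """Order as originally introduced: reset, then fetch the index."""
    backend.set_status(0)
    return backend.get_vring_base()


def stop_fetch_first(backend):
    """Fetch the index first, then reset."""
    base = backend.get_vring_base()
    backend.set_status(0)
    return base
```

With the reset first, the saved index is always 0, so nothing useful can be restored after a stop/cont or migration. Fetching first preserves the index in this model, but any other internal state the device holds would still be lost to the reset, which is the remaining concern for stateful devices.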
What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke
all devices that would implement them as per virtio spec, and even today
it’s broken for stateful devices.  The mentioned performance issue is
likely real, but we can’t address it by making up SET_STATUS calls that
are wrong.
I concede that I didn’t think about DRIVER_OK. Personally, I would do
all final configuration that would happen upon a DRIVER_OK once the
first vring is started (i.e. receives a kick).  That has the added
benefit of being asynchronous because it doesn’t block any vhost-user
messages (which are synchronous, and thus block downtime).
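
The idea of deferring final configuration until the first kick could look roughly like the sketch below. It is a toy Python model under stated assumptions, not an existing back-end: `ToyBackend`, `queue_config`, `on_kick`, and `apply_config` are hypothetical names, and "configuration" is reduced to a list of recorded steps.

```python
class ToyBackend:
    """Hypothetical back-end that finalizes configuration on first kick."""

    def __init__(self):
        self.pending_config = []   # steps recorded from vhost-user messages
        self.applied_config = []   # steps actually applied
        self.configured = False
        self.started_vrings = set()

    def queue_config(self, step):
        # Called while handling a vhost-user message: only record the
        # step, so the synchronous message handler returns quickly.
        self.pending_config.append(step)

    def apply_config(self):
        # Apply everything recorded so far in one go.
        self.applied_config.extend(self.pending_config)
        self.pending_config.clear()
        self.configured = True

    def on_kick(self, vring_index):
        # The first kick on any vring triggers the final configuration.
        if not self.configured:
            self.apply_config()
        self.started_vrings.add(vring_index)
```

In this model the message handlers never wait for the expensive step; it runs once, when the first vring is kicked, and later kicks only mark further vrings as started.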

Hanna
For better or worse kick is per ring. It's out of spec to start rings
that were not kicked but I guess you could do configuration ...
Seems somewhat asymmetrical though.

Let's wait until next week, hopefully Yajun Wu will answer.
The main motivation of adding VHOST_USER_SET_STATUS is to let the DPDK
backend know when the DRIVER_OK bit is valid. It's an indication that all
VQ configuration has been sent; otherwise DPDK has to rely on the first
queue pair being ready, then receiving/applying VQ configuration one by
one.

During live migration, configuring VQ one by one is very time consuming.

One question I have here is why it wasn’t then introduced in the live
migration code, but in the general VM stop/cont code instead. It does
seem time-consuming to do this every time the VM is paused and resumed.

For VIRTIO net vDPA, HW needs to know how many VQs are enabled to set
RSS (Receive-Side Scaling).
If you don’t want SET_STATUS message, backend can remove protocol
feature bit VHOST_USER_PROTOCOL_F_STATUS.

The problem isn’t back-ends that don’t want the message, the problem is
that qemu uses the message wrongly, which prevents well-behaving
back-ends from implementing the message.

DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device
close/reset.

So the right thing to do for back-ends is to announce STATUS support and
then not implement it correctly?

GET_VRING_BASE should not close or reset the device, by the way. It
should stop that one vring, not more. We have a RESET_DEVICE command for
resetting.

I'm not involved in the discussion about adding SET_STATUS to the vhost
protocol. This feature is essential for vDPA (same as vhost-vdpa
implements VHOST_VDPA_SET_STATUS).

So what I gather from your response is that there is only a single use
for SET_STATUS, which is the DRIVER_OK bit. If so, documenting that all
other bits are to be ignored by both back-end and front-end would be
fine by me.

I’m not fully serious about that suggestion, but I hear the strong
implication that nothing but DRIVER_OK was of any concern, and this is
really important to note when we talk about the status of the STATUS
feature in vhost today. It seems to me now that it was not intended to
be the virtio-level status byte, but just a DRIVER_OK signalling path
from front-end to back-end. That makes it a vhost-level protocol
feature to me.


Hanna

Thanks,
Yajun
Now, we could hand full control of the status byte to the guest, and
that would make me content. But I feel like that doesn’t really work,
because qemu needs to intercept the status byte anyway (it needs to know
when there is a reset, probably wants to know when the device is
configured, etc.), so I don’t think having the status byte in vhost-user
really gains us much when qemu could translate status byte changes
to/from other vhost-user commands.
Hanna
well it intercepts it but I think it could pass it on unchanged.
I guess symmetry was the point. So I don't see why SET_STATUS 0 has to
be ignored.


SET_STATUS was introduced by:

commit 923b8921d210763359e96246a58658ac0db6c645
Author: Yajun Wu <yajunw@nvidia.com>
Date:   Mon Oct 17 14:44:52 2022 +0800

       vhost-user: Support vhost_dev_start

CC the author.
