qemu-devel
Re: [PATCH] vhost-vsock: fix migration issue when seqpacket is supported


From: Stefano Garzarella
Subject: Re: [PATCH] vhost-vsock: fix migration issue when seqpacket is supported
Date: Wed, 8 Sep 2021 15:41:35 +0200

On Tue, Sep 07, 2021 at 03:47:56PM +0200, Stefano Garzarella wrote:
On Tue, Sep 07, 2021 at 02:22:24PM +0100, Daniel P. Berrangé wrote:
On Tue, Sep 07, 2021 at 02:49:35PM +0200, Stefano Garzarella wrote:
Commit 1e08fd0a46 ("vhost-vsock: SOCK_SEQPACKET feature bit support")
enabled the SEQPACKET feature bit.
This commit was released in QEMU 6.1, so if we try to migrate a VM where
the host kernel supports SEQPACKET but the machine type version is older
than 6.1, we get the following errors:

   Features 0x130000002 unsupported. Allowed features: 0x179000000
   Failed to load virtio-vhost_vsock:virtio
   error while loading state for instance 0x0 of device '0000:00:05.0/virtio-vhost_vsock'
   load of migration failed: Operation not permitted

Let's disable the feature bit for machine types < 6.1, adding a
`features` field to VHostVSock to simplify the handling of upcoming
features we will support.
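
To make the error above easier to read, here is a minimal sketch that recovers the offending bit from the two masks in the message and shows the effect of clearing it for older machine types. It assumes the bit number of VIRTIO_VSOCK_F_SEQPACKET is 1, as in the virtio vsock feature definitions; the helper names are hypothetical and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit number of the SEQPACKET feature. */
#define VIRTIO_VSOCK_F_SEQPACKET 1

/* Hypothetical helper: turn a feature bit number into a 64-bit mask. */
static uint64_t feature_bit(unsigned int bit)
{
    assert(bit < 64);
    return UINT64_C(1) << bit;
}

/* Bits carried by the migration stream that the destination does not allow. */
static uint64_t unsupported_bits(uint64_t incoming, uint64_t allowed)
{
    return incoming & ~allowed;
}

/* What a machine type older than 6.1 would offer: SEQPACKET cleared. */
static uint64_t mask_for_old_machine(uint64_t features)
{
    return features & ~feature_bit(VIRTIO_VSOCK_F_SEQPACKET);
}
```

With the values from the log, unsupported_bits(0x130000002, 0x179000000) is 0x2, which is exactly the SEQPACKET bit, and applying mask_for_old_machine() to the incoming mask first leaves nothing unsupported.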

IIUC, this will still leave migration broken for anyone migrating
a >= 6.1 machine type between a kernel that supports SEQPACKET and
a kernel lacking that, or vice versa.

This should be true for migrating from a kernel that supports SEQPACKET to a kernel that lacks it.

For vice-versa I'm not sure, since vhost_get_features() will disable that feature if the host kernel doesn't support it, and the guest will not have acked it.

I did some testing and the migration is only broken in the case of
kernel 5.14+ (SEQPACKET supported) -> kernel 5.13 (SEQPACKET not supported).

Vice-versa works well because the feature is not acked.
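
A simplified model of why only one direction fails, as a sketch rather than the actual QEMU code path: the offered features are limited to what the host kernel reports, the guest can only ack what was offered, and the destination rejects any acked bit it cannot offer. All function names are hypothetical, and the bit number for SEQPACKET is assumed to be 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit number of the SEQPACKET feature. */
#define VIRTIO_VSOCK_F_SEQPACKET 1
#define SEQPACKET_BIT (UINT64_C(1) << VIRTIO_VSOCK_F_SEQPACKET)

/* Features offered to the guest: what QEMU enables, limited to what the
 * host kernel's vhost backend reports as supported. */
static uint64_t offered_features(uint64_t qemu_features, uint64_t host_features)
{
    return qemu_features & host_features;
}

/* The guest can only acknowledge bits that were offered to it. */
static uint64_t acked_features(uint64_t offered, uint64_t guest_wants)
{
    return offered & guest_wants;
}

/* The destination accepts the stream only if it can offer every acked bit. */
static bool migration_ok(uint64_t acked_on_src, uint64_t offered_on_dst)
{
    return (acked_on_src & ~offered_on_dst) == 0;
}
```

In this model, a guest started on 5.14+ acks SEQPACKET and migration_ok() is false against a 5.13 destination, while a guest started on 5.13 never acks the bit, so moving it to 5.14+ passes.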



If a feature is dependent on a host kernel feature we can't turn
that on automatically as part of the machine type, as we need
ABI stability across migration independent of kernel version.


How do we typically handle this?

I wrongly assumed it was expected behavior that migrating a guest that uses a vhost device from a newer kernel to an older one can fail if the older kernel does not support all the features.

I need to take a look at the other vhost devices.

I took a look at vhost-net and vhost-scsi and we don't seem to handle this case. Maybe I'm missing something...

So, following your advice, the best thing would be to have this feature disabled by default and require the user to enable it explicitly, so we are sure it is needed. At that point, a failed migration to a kernel that doesn't support it is the correct outcome.
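
A rough sketch of what I mean by opt-in, assuming a boolean user setting and an explicit failure when the host kernel lacks the feature. Both the function and the failure behavior are hypothetical here, not an existing QEMU interface; the bit number for SEQPACKET is assumed to be 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit number of the SEQPACKET feature. */
#define VIRTIO_VSOCK_F_SEQPACKET 1
#define SEQPACKET_BIT (UINT64_C(1) << VIRTIO_VSOCK_F_SEQPACKET)

/*
 * Hypothetical helper: compute the features to offer to the guest.
 * SEQPACKET is never offered unless the user asked for it, so the result
 * does not silently change with the host kernel version.
 * Returns 0 on success, or -1 if the user requested SEQPACKET but the
 * host kernel does not support it.
 */
static int resolve_offered(bool seqpacket_requested, uint64_t base_features,
                           uint64_t host_features, uint64_t *offered)
{
    uint64_t wanted = base_features & ~SEQPACKET_BIT;

    if (seqpacket_requested) {
        if (!(host_features & SEQPACKET_BIT)) {
            return -1; /* explicit request cannot be satisfied */
        }
        wanted |= SEQPACKET_BIT;
    }

    *offered = wanted & host_features;
    return 0;
}
```

With the setting off, the offered mask is the same on 5.13 and 5.14+ hosts as far as SEQPACKET is concerned; with it on, an old destination kernel produces a clear error instead of a mismatch discovered while loading the device state.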

Or is there something better we can do?

@Michael @Jason any thoughts?

Thanks,
Stefano



