qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: virtio-vsock requires 'disable-legacy=on' in QEMU 5.1


From: Stefano Garzarella
Subject: Re: virtio-vsock requires 'disable-legacy=on' in QEMU 5.1
Date: Tue, 18 Aug 2020 16:01:20 +0200

On Tue, Aug 18, 2020 at 02:44:50PM +0200, Cornelia Huck wrote:
> On Mon, 17 Aug 2020 15:11:28 +0200
> Stefano Garzarella <sgarzare@redhat.com> wrote:
> 
> > On Mon, Aug 17, 2020 at 12:27:46PM +0200, Cornelia Huck wrote:
> > > On Thu, 13 Aug 2020 14:04:15 +0200
> > > Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >   
> > > > On Thu, Aug 13, 2020 at 12:37:37PM +0200, Cornelia Huck wrote:  
> > > > > On Thu, 13 Aug 2020 12:24:30 +0200
> > > > > Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > >     
> > > > > > On Thu, Aug 13, 2020 at 11:28:20AM +0200, Cornelia Huck wrote:    
> > > > > > > We basically have three possible ways to deal with this:
> > > > > > > 
> > > > > > > - Force it to modern (i.e., what you have been doing; would need 
> > > > > > > the
> > > > > > >   equivalent changes in ccw as well.)      
> > > > > > 
> > > > > > Oo, thanks for pointing out ccw!
> > > > > > I don't know ccw well, in this case should we set dev->max_rev to 1 
> > > > > > or 2
> > > > > > to force to modern?    
> > > > > 
> > > > > No, ->max_rev is the wrong side of the limit :) You want    
> > > > 
> > > > Well :-) Thanks!
> > > >   
> > > > > 
> > > > >     ccw_dev->force_revision_1 = true;
> > > > > 
> > > > > in _instance_init() (see e.g. virtio-ccw-gpu.c).
> > > > >     
> > > > > >     
> > > > > > >   Pro: looks like the cleanest approach.
> > > > > > >   Con: not sure if we would need backwards compatibility support,
> > > > > > >   which looks hairy.      
> > > > > > 
> > > > > > Not sure too.    
> > > > > 
> > > > > Yes, I'm not sure at all how to handle user-specified values for
> > > > > legacy/modern.  
> > > 
> > > Thinking a bit more about it, I'm not sure whether we even *can*
> > > provide backwards compatibility: we have different autoconfigurations
> > > for PCI based upon where it is plugged, and ccw does not have a way to
> > > turn legacy on/off, except from within the code.  
> > 
> > Yes, I discovered today for example that the PCIe bus set auto-legacy
> > mode to off.
> 
> And vhost-vsock actually really seems to be modern-only, see below.
> 
> > 
> > >   
> > > > >     
> > > > > >     
> > > > > > > - Add vsock to the list of devices with legacy support.
> > > > > > >   Pro: Existing setups continue to work.
> > > > > > >   Con: If vsock is really virtio-1-only, we still carry around
> > > > > > >   possibly broken legacy support.      
> > > > > > 
> > > > > > I'm not sure it is virtio-1-only, but virtio-vsock was introduced in
> > > > > > 2016, so I supposed it is modern-only.    
> > > > > 
> > > > > Yes, I would guess so as well.
> > > > >     
> > > > > > 
> > > > > > How can I verify that? Maybe forcing legacy mode and run some 
> > > > > > tests.    
> > > > > 
> > > > > Probably yes. The likeliest area with issues is probably endianness, 
> > > > > so
> > > > > maybe with something big endian in the mix?
> > > > >     
> > > > 
> > > > Yeah, I'll try this setup!
> 
> Ok, I tried this now with an x86 host and an s390x guest. Reverted the
> checking commit, tried both with a -ccw and a -pci device and your ncat
> example.
> - When using virtio-1, both devices work fine.
> - When using the -pci device with disable-modern=yes, I get "reset by
>   peer".
> - When using the -ccw device with max_revision=0, I get an instant
>   timeout.

Great, thanks for testing!

> 
> Smells like endianness problems (aka weird things are happening).

Yeah.

> 
> Also noticed that vhost-vsock-ccw does not have an immediate problem,
> even with the commit: The code only checks whether the device has been
> forced to legacy, not whether legacy is allowed (which cannot be
> controlled by the user anyway). Probably best to address after we've
> dealt with the vhost-vsock issue and made sure that there are no other
> problems.

Make sense!

> 
> > > >   
> > > > > >     
> > > > > > > - Do nothing, have users force legacy off. Bad idea, as ccw has 
> > > > > > > no way
> > > > > > >   to do that on the command line.
> > > > > > > 
> > > > > > > The first option is probably best.  
> > > 
> > > The first option is now "force modern, but with no backwards
> > > compatibility", which is not that great; but "allow legacy, even though
> > > it should not exist" is not particularly appealing, either... what a
> > > mess :(  
> > 
> > Yeah, it's a mess :-( anyway I still prefer option 1, it seems a little
> > bit more correct to me.
> 
> It seems to me that the status before this was "works by accident, but
> only if we're not negotiating to legacy, or the guest/host are both
> little endian". IOW, no visible breakage for most people (or we'd
> probably have heard of it already). Now we have a setup that's correct,
> but forces users to adapt their QEMU command lines. Option 1 would
> eliminate the need to do that, but would cause possibly
> not-really-fixable migration issues (you can probably deal with that
> manually, detaching and re-attaching the device as a last resort.)
> 
> So, force modern, probably also remove the -transitional device type,
> and put a prominent explanation into the change log?
> 

I completely agree with your analysis and solution.

So, for now we need to patch vhost-vsock-pci and vhost-user-vsock-pci,
and queue the patches in stable.

Do you prefer to send them? Otherwise I can do that.

Thanks again for the help and the test with s390x guest!
Stefano




reply via email to

[Prev in Thread] Current Thread [Next in Thread]