qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode


From: Igor Mammedov
Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode
Date: Thu, 27 Aug 2020 23:19:07 +0200

On Wed, 26 Aug 2020 15:10:46 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > On Tue, 25 Aug 2020 16:25:21 +0100
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > 
> > > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > > On Tue, 25 Aug 2020 09:15:04 +0100
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > >   
> > > > > * Babu Moger (babu.moger@amd.com) wrote:  
> > > > > > Hi Dave,
> > > > > > 
> > > > > > On 8/24/20 1:41 PM, Dr. David Alan Gilbert wrote:    
> > > > > > > * Babu Moger (babu.moger@amd.com) wrote:    
> > > > > > >> To support some of the complex topology, we introduced EPYC mode 
> > > > > > >> apicid decode.
> > > > > > >> But, EPYC mode decode is running into problems. Also it can 
> > > > > > >> become quite a
> > > > > > >> maintenance problem in the future. So, it was decided to remove 
> > > > > > >> that code and
> > > > > > >> use the generic decode which works for majority of the topology. 
> > > > > > >> Most of the
> > > > > > >> SPECed configuration would work just fine. With some non-SPECed 
> > > > > > >> user inputs,
> > > > > > >> it will create some sub-optimal configuration.
> > > > > > >> Here is the discussion thread.
> > > > > > >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fqemu-devel%2Fc0bcc1a6-1d84-a6e7-e468-d5b437c1b254%40amd.com%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C74d90724af9c4adcc75008d8485d4d16%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637338912853492167&amp;sdata=GTsMKcpeYXAA0CvpLTirPHKdNSdlJE3RuPjCtSyWtGQ%3D&amp;reserved=0
> > > > > > >>
> > > > > > >> This series removes all the EPYC mode specific apicid changes 
> > > > > > >> and use the generic
> > > > > > >> apicid decode.    
> > > > > > > 
> > > > > > > Hi Babu,
> > > > > > >   This does simplify things a lot!
> > > > > > > One worry, what happens about a live migration of a VM from an 
> > > > > > > old qemu
> > > > > > > that was using the node-id to a qemu with this new scheme?    
> > > > > > 
> > > > > > The node_id which we introduced was only used internally. This 
> > > > > > wasn't
> > > > > > exposed outside. I don't think live migration will be an issue.    
> > > > > 
> > > > > Didn't it become part of the APIC ID visible to the guest?  
> > > > 
> > > > Daniel asked similar question wrt hard error on start up,
> > > > when CLI is not sufficient to create EPYC cpu.
> > > > 
> > > > https://www.mail-archive.com/qemu-devel@nongnu.org/msg728536.html
> > > > 
> > > > Migration might fall into the same category.
> > > > Also looking at the history, 5.0 commit 
> > > >   247b18c593ec29 target/i386: Enable new apic id encoding for EPYC 
> > > > based cpus models
> > > > silently broke APIC ID (without versioning), for all EPYC models 
> > > > (that's were 1 new and 1 old one).
> > > > 
> > > > (I'm not aware of somebody complaining about it)
> > > > 
> > > > Another commit ed78467a21459, changed CPUID_8000_001E without 
> > > > versioning as well.
> > > > 
> > > > 
> > > > With current EPYC apicid code, if all starts align (no numa or 1 numa 
> > > > node only on
> > > > CLI and no -smp dies=) it might produce a valid CPU 
> > > > (apicid+CPUID_8000_001E).
> > > > No numa is gray area, since EPYC spec implies that it has to be numa 
> > > > machine in case of real EPYC cpus.
> > > > Multi-node configs would be correct only if user assigns cpus to numa 
> > > > nodes
> > > > by duplicating internal node_id algorithm that this series removes.
> > > > 
> > > > There might be other broken cases that I don't recall anymore
> > > > (should be mentioned in previous versions of this series)
> > > > 
> > > > 
> > > > To summarize from migration pov (ignoring ed78467a21459 change):
> > > > 
> > > >  1) old qemu pre-5.0 ==>  qemu 5.0, 5.1 - broken migration  
> > > 
> > > Oh ....
> > > 
> > > >  2) with this series (lets call it qemu 5.2)
> > > >      pre-5.0 ==> qemu 5.2 - should work as series basically rollbacks 
> > > > current code to pre-5.0
> > > >      qemu 5.0, 5.1 ==> qemu 5.2 - broken
> > > > 
> > > > It's all about picking which poison to choose,
> > > > I'd preffer 2nd case as it lets drop a lot of complicated code that
> > > > doesn't work as expected.  
> > > 
> > > I think that would make our lives easier for other reasons; so I'm happy
> > > to go with that.
> > 
> > to make things less painful for users, me wonders if there is a way
> > to block migration if epyc and specific QEMU versions are used?
> 
> We have no way to block based on version - and that's a pretty painful
> thing to do; we can block based on machine type.
> 
> But before we get there; can we understand in which combinations that
> things break and why exactly - would it break on a 1 or 2 vCPU guest -
> or would it only break when we get to the point the upper bits start
> being used for example?  Why exaclty would it break - i.e. is it going
> to change the name of sections in the migration stream - or are the
> values we need actually going to migrate OK?

it's values of APIC ID, where 4.2 and 5.0 QEMU use different values
if numa is enabled.
I'd expect guest to be very confused in when this happens.

here is an example:
qemu-4.2 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3 -numa 
node,cpus=4-7

(QEMU) qom-get path=/machine/unattached/device[8] property=apic-id
{
    "return": 7
}

vs

qemu-5.1 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3 -numa 
node,cpus=4-7
(QEMU) qom-get path=/machine/unattached/device[8] property=apic-id
{
    "return": 15
}

we probably can't do anything based on machine type versions, as
4.2 and older versions on qemu-5.0 and newer use different algorithm to 
calculate apic-id.

Hence was suggestion to leave 5.0/5.1 with broken apic id and revert back to
4.2 algorithm, which should encode APIC ID correctly when '-smp dies' is used. 


> Dave
> 
> 
> > > > PS:
> > > >  I didn't review it yet, but with this series we aren't
> > > >  making up internal node_ids that should match user provided numa node 
> > > > ids somehow.
> > > >  It seems series lost the patch that would enforce numa in case -smp 
> > > > dies>1,
> > > >  but otherwise it heads in the right direction.  
> > > 
> > > Dave
> > > 
> > > > > 
> > > > > Dave
> > > > >   
> > > >   
> > 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]