qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EP


From: Babu Moger
Subject: RE: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EPYC
Date: Thu, 13 Aug 2020 16:10:39 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0


> -----Original Message-----
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Thursday, August 13, 2020 8:56 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; ehabkost@redhat.com;
> mst@redhat.com; qemu-devel@nongnu.org; pbonzini@redhat.com;
> rth@twiddle.net
> Subject: Re: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for
> EPYC
> 
> On Tue, 11 Aug 2020 16:03:58 -0500
> Babu Moger <babu.moger@amd.com> wrote:
> 
> > On 8/7/20 2:11 PM, Igor Mammedov wrote:
> > > On Fri, 7 Aug 2020 17:52:22 +0100
> > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > >> On Fri, Aug 07, 2020 at 11:32:51AM -0500, Babu Moger wrote:
> > >>> Adding a new check to warn the users to configure 'dies' when
> > >>> topology is numa configured. It makes it easy to build the
> > >>> topology for EPYC models.
> > >>
> > >> This says you're adding a warning....
> > >>
> > >>>
> > >>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> > >>> ---
> > >>>  hw/i386/x86.c |    7 +++++++
> > >>>  1 file changed, 7 insertions(+)
> > >>>
> > >>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c index
> > >>> 67bee1bcb8..2a6ce56ef1 100644
> > >>> --- a/hw/i386/x86.c
> > >>> +++ b/hw/i386/x86.c
> > >>> @@ -138,6 +138,13 @@ void x86_cpus_init(X86MachineState *x86ms,
> > >>> int default_cpu_version)
> > >>>
> > >>>      /* Check for apicid encoding */
> > >>>      if (cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type)) {
> > >>> +        if ((ms->numa_state->num_nodes > 0) &&
> > >>> +            ms->numa_state->num_nodes != (ms->smp.sockets * x86ms-
> >smp_dies)) {
> > >>> +            error_setg(&error_fatal, "Numa configuration requires smp 
> > >>> 'dies' "
> > >>> +                       "parameter. Configure the cpu topology properly 
> > >>> with "
> > >>> +                       "max_cpus = sockets * dies * cores * threads");
> > >>
> > >> ...but you're actually making this a fatal error, not a warning.
> > >>
> > >> I'm not sure this is really OK. Wouldn't this mean that existing
> > >> VMs deployed today, risk triggering this fatal error next time they
> > >> are booted, or live migrated.  If it is possible someone is using
> > >> such a config today, I don't think we can break it.
> > >
> > > to begin with, users shouldn't have used 'dies' with initial impl. at all.
> > > (it was Intel introduced option and EPYC's added very similar
> > > internal node_id (removed by the next patch)).
> > > Now we are trying to consolidate this mess and reuse dies for EPYC.
> > >
> > > EPYC was out in the since with 5.0 (though broken), users could
> > > start a VM with such config but that would not be correct EPYC from apicid
> and cpuid point of view.
> > > Guest OS might run if it doesn't know about EPYCs or behave wierdly
> > > (sub optimal|crash|whatever) on seeing unexpected values.
> > >
> > > If we are hell bound on keeping bugs of initial impl, then we should
> > > keep it to 5.1<= machine version and do the right thing for newer ones.
> > > Though I'm not sure we should keep broken variant around (all we
> > > would get from it is bug reports*/complains from users with end result of
> their config anyways).
> > > I'd rather error out with clear error message so user could fix their 
> > > broken
> config.
> > >
> > > *) there is at least one thread/bz on qemu-devel where users are
> > > trying to run with EPYC and pick up options combination so it would 
> > > produce
> sensible topology.
> >
> >
> > I am still not sure what is the right approach here.  I can think of
> > couple of options.
> > 1. If smp_dies != num_nodes then go ahead create the configuration
> > with as  many smp_dies and warn(but not error out) users about the mis-
> configuration.
> warning is a bad idea, that usually leads to troubles down the road.
> 
> Provided that code is relatively new and produces misconfigured CPUs and if
> nobody insists on keeping bug around, I'd try to go for erroring out.
> Yes that would break misconfigured configs but that could be fixed by
> reconfiguring on user side.

Ok. I will refresh the patches if there are no other comments. thanks

> 
> > 2. Introduce it as a fix based on  machine version(5.1 >) like Igor
> > mentioned. I am not sure how to achieve that. I can look into that.
> That's a headache for maintaing point of view, so again if nobody insist I'd
> rather avoid it.
> 
> >
> > Thanks
> > Babu
> >
> > >
> > >
> > >> Regards,
> > >> Daniel
> > >
> >




reply via email to

[Prev in Thread] Current Thread [Next in Thread]