qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode


From: Igor Mammedov
Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic decode
Date: Thu, 27 Aug 2020 22:21:10 +0200

On Wed, 26 Aug 2020 13:45:51 -0500
Babu Moger <babu.moger@amd.com> wrote:

> 
> 
> > -----Original Message-----
> > From: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Sent: Wednesday, August 26, 2020 1:34 PM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: Igor Mammedov <imammedo@redhat.com>; Daniel P. Berrangé
> > <berrange@redhat.com>; ehabkost@redhat.com; mst@redhat.com; Michal
> > Privoznik <mprivozn@redhat.com>; qemu-devel@nongnu.org;
> > pbonzini@redhat.com; rth@twiddle.net
> > Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use generic
> > decode
> > 
> > * Babu Moger (babu.moger@amd.com) wrote:
> > >
> > > > -----Original Message-----
> > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > Sent: Wednesday, August 26, 2020 8:31 AM
> > > > To: Daniel P. Berrangé <berrange@redhat.com>
> > > > Cc: Moger, Babu <Babu.Moger@amd.com>; pbonzini@redhat.com;
> > > > rth@twiddle.net; ehabkost@redhat.com; qemu-devel@nongnu.org;
> > > > mst@redhat.com; Michal Privoznik <mprivozn@redhat.com>
> > > > Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use
> > > > generic decode
> > > >
> > > > On Wed, 26 Aug 2020 13:50:59 +0100
> > > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > >
> > > > > On Wed, Aug 26, 2020 at 02:38:49PM +0200, Igor Mammedov wrote:
> > > > > > On Fri, 21 Aug 2020 17:12:19 -0500 Babu Moger
> > > > > > <babu.moger@amd.com> wrote:
> > > > > >
> > > > > > > To support some of the complex topology, we introduced EPYC
> > > > > > > mode
> > > > apicid decode.
> > > > > > > But, EPYC mode decode is running into problems. Also it can
> > > > > > > become quite a maintenance problem in the future. So, it was
> > > > > > > decided to remove that code and use the generic decode which
> > > > > > > works for majority of the topology. Most of the SPECed
> > > > > > > configuration would work just fine. With some non-SPECed user
> > > > > > > inputs, it will create some sub-
> > > > optimal configuration.
> > > > > > > Here is the discussion thread.
> > > > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > > > F%2F
> > > > > > > lore.kernel.org%2Fqemu-devel%2Fc0bcc1a6-1d84-a6e7-e468-
> > > > d5b437c1b25
> > > > > > >
> > > >
> > 4%40amd.com%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C8a5c
> > > > 52f92
> > > > > > >
> > > >
> > 3f04082a40808d849c43d49%7C3dd8961fe4884e608e11a82d994e183d%7C0%7
> > > > C0
> > > > > > >
> > > >
> > %7C637340454473508873&amp;sdata=VnW28H1v4XwK3GaNGFxu%2BhwiMeA
> > > > YO%2B
> > > > > > > 3WAzo3DeY5Ha8%3D&amp;reserved=0
> > > > > > >
> > > > > > > This series removes all the EPYC mode specific apicid changes
> > > > > > > and use the generic apicid decode.
> > > > > >
> > > > > > the main difference between EPYC and all other CPUs is that it
> > > > > > requires numa configuration (it's not optional) so we need an
> > > > > > extra
> > > No, That is not true. Because of that assumption we made all these
> > > apicid changes. And here we are now.
> > >
> > > AMD supports varies mixed configurations. In case of EPYC-Rome, we
> > > have NPS1, NPS2 and NPS4(Numa Nodes per socket). In case of NPS1,
> > > basically we have all the cores in a socket under one numa node. This
> > > is non-numa configuration.
> > > Looking at the various configurations and also discussing internally,
> > > it is not advisable to have (epyc && !numa) check.
> > 
> > Indeed on real hardware, I don't think we always see NUMA; my single socket,
> > 16 core/32 thread 7302P Dell box, shows the kernel printing 'No NUMA
> > configuration found...Faking a node.'
looks like firmware bug or maybe it's feature and there is a knob in fw
to turn it on/off in case used OS doesn't like it for some reason.


> > So if real hardware hasn't got a NUMA node, what's the real problem?
> 
> I don't see any problem once we revert all these changes(patch 1-7).
> We don't need if (epyc && !numa) error check or auto_enable_numa=true
> unconditionally.

We need revert to unbreak migration from QEMU < 5.0,
everything else (fixes for CPUID_Fn8000001E) could go on top.

So what's on top (because old code also wasn't correct when
CPUID_Fn8000001E is taken in account, tha's why we are at this point),

When starting QEMU without -numa
Indeed we can skip "if (epyc && !numa) error check or auto_enable_numa=true",
in case where there is 1 die (NPS1).
(1) User however may set core/threads number bigger than possible by spec,
    in which case CPUID_Fn8000001E_EBX/CPUID_Fn8000001E_ECX will not be
    valid spec vise and could trigger OPPs in guest kernel.
    Given we allow go out of spec, perhaps we should add a warning at
    realize time saying that used -smp config is not supported since it
    doesn't match AMD EPYC spec and might not work.

(2) Earlier we agreed that we can reuse existing die_id instead of internal
    (topo_ids.node_id in current code)
    (It's is called DIE_ID and NODE ID in spec interchangeably)
    Same as (1) add a warning when '-smp dies' goes beyond spec limits.
    
(3) "-smp dies>1" ''if'' we allow to run it without -numa,
    then system wide NUMA node id in CPUID_Fn8000001E_ECX probably doesn't 
matter.
    could be something like in spec but taking in account die offset, to produce
    unique id.

    Same, add a warning that there are more than 1 dies but numa is not enabled,
    suggest to enable numa.

    With current code it produces invalid APIC ID for valid '-smp' combination,
    however if we revert it and switch to die_id than it should produce
    valid APIC ID once again (as in 4.2).
    Given it produces invalid APIC id, maybe we should just ditch the case and
    fold it in (4) (i.e. require -numa if "-smp dies>1")

(4) -numa is used (RHBZ1728166)
    we need to ensure that socket*dies == ms->numa_state->num_nodes
     and make sure that CPUID_Fn8000001E_ECX consistent with
    cpu mapping provided with "-numa cpu=" option.

Warnings won't help a lot, but at least they will point out at
possible problem when someone complains.

> > 
> > Dave
> > 
> > > > > > patch on top of this series to enfoce that, i.e:
> > > > > >
> > > > > >  if (epyc && !numa)
> > > > > >     error("EPYC cpu requires numa to be configured")
> > > > >
> > > > > Please no. This will break 90% of current usage of the EPYC CPU in
> > > > > real world QEMU deployments. That is way too user hostile to
> > > > > introduce as a requirement.
> > > > >
> > > > > Why do we need to force this ?  People have been successfuly using
> > > > > EPYC CPUs without NUMA in QEMU for years now.
> > > > >
> > > > > It might not match behaviour of bare metal silicon, but that
> > > > > hasn't obviously caused the world to come crashing down.
> > > > So far it produces warning in linux kernel (RHBZ1728166), (resulting
> > > > performance might be suboptimal), but I haven't seen anyone reporting
> > crashes yet.
> > > >
> > > >
> > > > What other options do we have?
> > > > Perhaps we can turn on strict check for new machine types only, so
> > > > old configs can keep broken topology (CPUID), while new ones would
> > > > require -numa and produce correct topology.
> > > >
> > > >
> > > > >
> > > > > Regards,
> > > > > Daniel
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]