qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt


From: Andrew Jones
Subject: Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
Date: Fri, 30 Apr 2021 12:48:43 +0200

On Fri, Apr 30, 2021 at 04:59:36PM +0800, wangyanan (Y) wrote:
> 
> On 2021/4/30 14:41, Andrew Jones wrote:
> > On Fri, Apr 30, 2021 at 01:09:00PM +0800, wangyanan (Y) wrote:
> > > Hi Drew,
> > > 
> > > On 2021/4/29 19:02, Andrew Jones wrote:
> > > > On Thu, Apr 29, 2021 at 04:56:06PM +0800, wangyanan (Y) wrote:
> > > > > On 2021/4/29 15:16, Andrew Jones wrote:
> > > > > > On Thu, Apr 29, 2021 at 10:14:37AM +0800, wangyanan (Y) wrote:
> > > > > > > On 2021/4/28 18:31, Andrew Jones wrote:
> > > > > > > > On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
> > > > > > > > >              } else if (sockets == 0) {
> > > > > > > > >                  threads = threads > 0 ? threads : 1;
> > > > > > > > > -            sockets = cpus / (cores * threads);
> > > > > > > > > +            sockets = cpus / (clusters * cores * threads);
> > > > > > > > >                  sockets = sockets > 0 ? sockets : 1;
> > > > > > > > If we initialize clusters to zero instead of one and add lines 
> > > > > > > > in
> > > > > > > > 'cpus == 0 || cores == 0' and 'sockets == 0' like
> > > > > > > > 'clusters = clusters > 0 ? clusters : 1' as needed, then I 
> > > > > > > > think we can
> > > > > > > > add
> > > > > > > > 
> > > > > > > >      } else if (clusters == 0) {
> > > > > > > >          threads = threads > 0 ? threads : 1;
> > > > > > > >          clusters = cpus / (sockets * cores * thread);
> > > > > > > >          clusters = clusters > 0 ? clusters : 1;
> > > > > > > >      }
> > > > > > > > 
> > > > > > > > here.
> > > > > > > I have thought about this kind of format before, but there is a 
> > > > > > > little bit
> > > > > > > difference between these two ways. Let's chose the better and more
> > > > > > > reasonable one of the two.
> > > > > > > 
> > > > > > > Way A currently in this patch:
> > > > > > > If value of clusters is not explicitly specified in -smp command 
> > > > > > > line, we
> > > > > > > assume
> > > > > > > that users don't want to support clusters, for compatibility we 
> > > > > > > initialized
> > > > > > > the
> > > > > > > value to 1. So that with cmdline "-smp cpus=24, sockets=2, 
> > > > > > > cores=6", we will
> > > > > > > parse out the topology description like below:
> > > > > > > cpus=24, sockets=2, clusters=1, cores=6, threads=2
> > > > > > > 
> > > > > > > Way B that you suggested for this patch:
> > > > > > > Whether value of clusters is explicitly specified in -smp command 
> > > > > > > line or
> > > > > > > not,
> > > > > > > we assume that clusters are supported and calculate the value. So 
> > > > > > > that with
> > > > > > > cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the 
> > > > > > > topology
> > > > > > > description like below:
> > > > > > > cpus =24, sockets=2, clusters=2, cores=6, threads=1
> > > > > > > 
> > > > > > > But I think maybe we should not assume too much about what users 
> > > > > > > think
> > > > > > > through the -smp command line. We should just assume that all 
> > > > > > > levels of
> > > > > > > cpu topology are supported and calculate them, and users should 
> > > > > > > be more
> > > > > > > careful if they want to get the expected results with not so 
> > > > > > > complete
> > > > > > > cmdline.
> > > > > > > If I'm right, then Way B should be better. :)
> > > > > > > 
> > > > > > Hi Yanan,
> > > > > > 
> > > > > > We're already assuming the user wants to describe clusters to the 
> > > > > > guest
> > > > > > because we require at least one per socket. If we want the user to 
> > > > > > have a
> > > > > > choice between using clusters or not, then I guess we need to 
> > > > > > change the
> > > > > > logic in the PPTT and the cpu-map to only generate the cluster 
> > > > > > level when
> > > > > > the number of clusters is not zero. And then change this parser to 
> > > > > > not
> > > > > > require clusters at all.
> > > > > Hi Drew,
> > > > > 
> > > > > I think this kind of change will introduce more complexity and 
> > > > > actually is
> > > > > not necessary.
> > > > > Not generating cluster level at all and generating cluster level (one 
> > > > > per
> > > > > socket) are same
> > > > > to kernel. Without cluster description provided, kernel will 
> > > > > initialize all
> > > > > cores in the same
> > > > > cluster which also means one cluster per socket.
> > > > Which kernel? All kernels? One without the cluster support patches [1]?
> > > > 
> > > > [1] 
> > > > https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
> > > I'm sorry, I didn't make it clear. :)
> > > I actually mean the ARM64 kernel, with or without [1].
> > > 
> > > Without [1]: Kernel doesn't care about cluster. When populating cpu
> > > topology, it directly
> > > finds the hierarchy node with "physical package flag" as package when
> > > parsing ACPI, and
> > > finds the core node's parent as package when parsing DT. So even we 
> > > provide
> > > cluster level
> > > description (one per socket), the parsing results will be the same as not
> > > providing at all.
> > > 
> > > With [1]: Kernel finds the core hierarchy node's parent as cluster when
> > > parsing ACPI. So if
> > > we don't provide cluster level description, package will be taken as
> > > cluster. And if we provide
> > > the description (one per socket), the parsing result will also be the 
> > > same.
> > > 
> > > That's why I said that we just need to provide description of cluster (one
> > > per socket) if we
> > > don't want to make use of it in VMs.
> > OK, that sounds good.
> > 
> > > [1] 
> > > https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
> > > > > So we should only ensure value of clusters per socket is one if we 
> > > > > don't
> > > > > want to use clusters,
> > > > > and don't need to care about whether or not to generate description 
> > > > > in PPTT
> > > > > and cpu-map.
> > > > > Is this right?
> > > > Depends on your answer to my 'which kernel' questions.
> > > > 
> > > > > > I'm not a big fan of these auto-calculated values either, but the
> > > > > > documentation says that it'll do that, and it's been done that way
> > > > > > forever, so I think we're stuck with it for the -smp option. Hmm, I 
> > > > > > was
> > > > > > just about to say that x86 computes all its values, but I see the 
> > > > > > most
> > > > > > recently added one, 'dies', is implemented the way you're proposing 
> > > > > > we
> > > > > > implement 'clusters', i.e. default to one and don't calculate it 
> > > > > > when it's
> > > > > > missing. I actually consider that either a documentation bug or an 
> > > > > > smp
> > > > > > parsing bug, though.
> > > > > My propose originally came from implementation of x86.
> > > > > > Another possible option, for Arm, because only the cpus and maxcpus
> > > > > > parameters of -smp have ever worked, is to document, for Arm, that 
> > > > > > if even
> > > > > > one parameter other than cpus or maxcpus is provided, then all 
> > > > > > parameters
> > > > > > must be provided. We can still decide if clusters=0 is valid, but 
> > > > > > we'll
> > > > > > enforce that everything is explicit and that the product (with or 
> > > > > > without
> > > > > > clusters) matches maxcpus.
> > > > > Requiring every parameter explicitly will be most stable but indeed 
> > > > > strict.
> > > > > 
> > > > > Currently all the parsers use way B to calculate value of thread if 
> > > > > it is
> > > > > not provided explicitly.
> > > > > So users should ensure the -smp cmdline they provided can result in 
> > > > > that
> > > > > parsed threads will
> > > > > be 1 if they don't want to support multiple threads in one core.
> > > > > 
> > > > > Very similar to thread, users should also ensure the provided cmdline 
> > > > > can
> > > > > result in that parsed
> > > > > clusters will be 1 if they don't want to support multiple clusters in 
> > > > > one
> > > > > socket.
> > > > > 
> > > > > So I'm wondering if we can just add some commit in the documentation 
> > > > > to tell
> > > > > users that they
> > > > > should ensure this if they don't want support it. And as for 
> > > > > calculation of
> > > > > clusters, we follow
> > > > > the logic of other parameters as you suggested in way B.
> > > > > 
> > > > > Thanks,
> > > > > Yanan
> > > > > > Requiring every parameter might be stricter than necessary, though, 
> > > > > > I
> > > > > > think we're mostly concerned with cpus/maxcpus, sockets, and cores.
> > > > > > clusters can default to one or zero (whatever we choose and 
> > > > > > document),
> > > > > > threads can default to one, and cpus can default to maxcpus or 
> > > > > > maxcpus can
> > > > > > default to cpus, but at least one of those must be provided. And, if
> > > > > > sockets are provided, then cores must be provided and vice versa. If
> > > > > > neither sockets nor cores are provided, then nothing else besides 
> > > > > > cpus and
> > > > > > maxcpus may be provided, and that would mean to not generate any 
> > > > > > topology
> > > > > > descriptions for the guest.
> > > > I still don't know. I think I prefer making -smp more strict (even for
> > > > other architectures, but that's more difficult to do than for Arm.) 
> > > > What I
> > > > wrote above isn't that bad. We only require one of cpus or maxcpus 
> > > > (pretty
> > > > much everybody already does that anyway), and then, if given sockets
> > > > or cores, the other will also be required. I assume anybody who bothers 
> > > > to
> > > > specify one or the other would already specify both anyway.
> > > I agree to make -smp more strict. We want to expose the cpu topology
> > > information
> > > to guest kernel, so that it can take advantage of the information for 
> > > better
> > > scheduling.
> > >  From this point of view, we hope the topology descriptions are accurately
> > > specified
> > > but not automatically populated.
> > > 
> > > But I think the requirement for ARM "if even one parameter other than cpus
> > > or maxcpus
> > > is provided then all parameters must be provided" will be better. This can
> > > ensure the
> > > whole accurate users-specified topology. As you mentioned, if anybody who
> > > bothers
> > > to specify one, why not also specify the others.
> > > 
> > > I can add the requirement for ARM in the documentation, and also check the
> > > parameters
> > > in virt_smp_parse. Will this be fine?
> > We sort of have to support command lines that are missing 'maxcpus' and
> > 'clusters', unless we work together with libvirt to make the change.
> > Currently libvirt will generate '-smp 16,sockets=16,cores=1,threads=1'
> > from '<vcpu placement='static'>16</vcpu>'.
> I see. And libvirt currently doesn't support cluster in xml, which means
> we can not generate complete cmdlines with cluster in it through
> <topology ...> specification in xml.
> > That's sufficient for our
> > stricter, but not completely strict requirements. And, I still think
> > 'threads' could be optional, because there's a good chance the user
> > doesn't want to describe them, so a default of 1 is good enough.
> So the parsing logic can be repeated like this:
> We require at least one of cpus and maxcpus specified explicitly, and
> default
> cluster/thread to 1 if not explicitly specified. And require both of sockets
> and
> cores if one of them is specified.
> 
> This is consistent with what you mentioned yesterday.
>

Yup, I think this should work for both compatibility concerns, concerns
with bad assumptions, and with supporting users which would like more
terse command lines.

Thanks,
drew




reply via email to

[Prev in Thread] Current Thread [Next in Thread]