qemu-devel

Re: [PATCH v6 3/6] spapr: introduce spapr_numa_associativity_reset()


From: Daniel Henrique Barboza
Subject: Re: [PATCH v6 3/6] spapr: introduce spapr_numa_associativity_reset()
Date: Wed, 15 Sep 2021 22:32:13 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.1.0

Greg,


On 9/14/21 16:58, Daniel Henrique Barboza wrote:


On 9/14/21 08:55, Greg Kurz wrote:
On Fri, 10 Sep 2021 16:55:36 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:


[...]


      }
@@ -167,6 +167,11 @@ static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
      int nb_numa_nodes = machine->numa_state->num_nodes;
      int i, j, max_nodes_with_gpus;
+    /* init FORM1_assoc_array */
+    for (i = 0; i < MAX_NODES + NVGPU_MAX_NUM; i++) {
+        spapr->FORM1_assoc_array[i] = g_new0(uint32_t, NUMA_ASSOC_SIZE);

Why dynamic allocation since you have fixed size ?

If I use static allocation the C compiler complains that I can't assign a
uint32_t** pointer to a uint32_t[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE]
array type.

And given that the FORM2 array is a [MAX_NODES + NVGPU_MAX_NUM][2] array, the
way I worked around it here is to use dynamic allocation. Then C considers it
valid to use numa_assoc_array as a uint32_t** pointer for both the FORM1 and
FORM2 2D arrays. I'm fairly certain there might be a way of doing static
allocation while keeping the uint32_t** pointer as is, but I didn't find one.
Tips welcome :D
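
Roughly, the clash is this (illustrative snippet, not the patch code):

    uint32_t form1[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];
    uint32_t form2[MAX_NODES + NVGPU_MAX_NUM][2];
    uint32_t **current;

    current = form1;    /* rejected: form1 decays to
                           uint32_t (*)[NUMA_ASSOC_SIZE], not uint32_t ** */
    current = form2;    /* rejected too, and with a different row size */

    /* With per-row allocation both tables become arrays of uint32_t *,
     * so a single uint32_t ** can point at either of them: */
    uint32_t *form1_dyn[MAX_NODES + NVGPU_MAX_NUM];
    for (int i = 0; i < MAX_NODES + NVGPU_MAX_NUM; i++) {
        form1_dyn[i] = g_new0(uint32_t, NUMA_ASSOC_SIZE);
    }
    current = form1_dyn;    /* fine */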

An alternative that I considered, without the need for this dynamic allocation
hack, is to make both the FORM1 and FORM2 data structures the same size (i.e.
a [MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE] uint32_t array); then
numa_assoc_array can be a pointer of the same array type for both. Since we're
controlling the FORM1 and FORM2 sizes separately inside the functions, this
would work. The downside is that the FORM2 2D array would be bigger than
necessary.

I took a look and didn't find any neat way of doing a pointer switch between
two static arrays without either allocating dynamic memory for the pointer and
then g_memdup'ing it, or making all arrays the same size (i.e. numa_assoc_array
would also be a statically allocated array with the FORM1 size) and then
memcpy()'ing the chosen mode.
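
A rough sketch of that second option, with all three tables padded to FORM1
size (hypothetical field names, not actual patch code):

    /* hypothetical: all tables FORM1-sized in SpaprMachineState */
    uint32_t FORM1_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];
    uint32_t FORM2_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];
    uint32_t numa_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];

    /* the reset then becomes a memcpy of the chosen mode, e.g. FORM1: */
    memcpy(spapr->numa_assoc_array, spapr->FORM1_assoc_array,
           sizeof(spapr->numa_assoc_array));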

I just posted a new version in which I'm not relying on 'numa_assoc_array'.
Instead, the DT functions will use a get_associativity() to retrieve
the current associativity array based on CAS choice, in a similar
manner like it is already done with the array sizes. This also allowed
us to get rid of associativity_reset().
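
Something along these lines (sketch only, the new series may differ;
OV5_FORM2_AFFINITY stands for whatever CAS bit ends up selecting FORM2):

    static const uint32_t *get_associativity(SpaprMachineState *spapr,
                                             int node_id)
    {
        /* pick the table matching what the guest negotiated at CAS */
        if (spapr_ovec_test(spapr->ov5_cas, OV5_FORM2_AFFINITY)) {
            return spapr->FORM2_assoc_array[node_id];
        }
        return spapr->FORM1_assoc_array[node_id];
    }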


Thanks,


Daniel





I don't have strong opinions about which way to do it, so I'm all ears.


Thanks,


Daniel




+    }
+
      /*
       * For all associativity arrays: first position is the size,
       * position MAX_DISTANCE_REF_POINTS is always the numa_id,
@@ -177,8 +182,8 @@ static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
       * 'i' will be a valid node_id set by the user.
       */
      for (i = 0; i < nb_numa_nodes; i++) {
-        spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
-        spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
+        spapr->FORM1_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
+        spapr->FORM1_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
      }
      /*
@@ -192,15 +197,15 @@ static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
      max_nodes_with_gpus = nb_numa_nodes + NVGPU_MAX_NUM;
      for (i = nb_numa_nodes; i < max_nodes_with_gpus; i++) {
-        spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
+        spapr->FORM1_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
          for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
              uint32_t gpu_assoc = smc->pre_5_1_assoc_refpoints ?
                                   SPAPR_GPU_NUMA_ID : cpu_to_be32(i);
-            spapr->numa_assoc_array[i][j] = gpu_assoc;
+            spapr->FORM1_assoc_array[i][j] = gpu_assoc;
          }
-        spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
+        spapr->FORM1_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
      }
      /*
@@ -227,14 +232,33 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
                                     MachineState *machine)
  {
      spapr_numa_FORM1_affinity_init(spapr, machine);
+
+    /*
+     * Default to FORM1 affinity until CAS. We'll call affinity_reset()
+     * during CAS when we're sure about which NUMA affinity the guest
+     * is going to use.
+     *
+     * This step is a failsafe - guests in the wild were able to read
+     * FORM1 affinity info before CAS for a long time. Since affinity_reset()
+     * is just a pointer switch between data that was already populated
+     * here, this is an acceptable overhead to be on the safer side.
+     */
+    spapr->numa_assoc_array = spapr->FORM1_assoc_array;

The right way to do that is to call spapr_numa_associativity_reset() from
spapr_machine_reset() because we want to revert to FORM1 each time the
guest is rebooted.
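
i.e. something like this in spapr_machine_reset() (sketch, exact placement
among the other reset steps is up to you):

    static void spapr_machine_reset(MachineState *machine)
    {
        SpaprMachineState *spapr = SPAPR_MACHINE(machine);

        /* ... existing reset steps ... */

        /* go back to FORM1 on every reboot; CAS will switch the guest
         * to FORM2 later if it negotiates it */
        spapr_numa_associativity_reset(spapr);
    }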

+}
+
+void spapr_numa_associativity_reset(SpaprMachineState *spapr)
+{
+    /* No FORM2 affinity implemented yet */

This seems quite obvious at this point, not sure the comment helps.

+    spapr->numa_assoc_array = spapr->FORM1_assoc_array;
  }
  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
                                         int offset, int nodeid)
  {
+    /* Hardcode the size of FORM1 associativity array for now */
      _FDT((fdt_setprop(fdt, offset, "ibm,associativity",
                        spapr->numa_assoc_array[nodeid],
-                      sizeof(spapr->numa_assoc_array[nodeid]))));
+                      NUMA_ASSOC_SIZE * sizeof(uint32_t))));

Rather than doing this temporary change that gets undone in
a later patch, I suggest you introduce get_numa_assoc_size()
in a preliminary patch and use it here already :

-                      NUMA_ASSOC_SIZE * sizeof(uint32_t))));
+                      get_numa_assoc_size(spapr) * sizeof(uint32_t))));

This will simplify the reviewing.
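
i.e. add something like (sketch of the suggested helper):

    /* size of the current associativity arrays; with only FORM1
     * implemented at this point, this is simply NUMA_ASSOC_SIZE */
    static int get_numa_assoc_size(SpaprMachineState *spapr)
    {
        return NUMA_ASSOC_SIZE;
    }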

  }
  static uint32_t *spapr_numa_get_vcpu_assoc(SpaprMachineState *spapr,
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 637652ad16..8a9490f0bf 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -249,7 +249,8 @@ struct SpaprMachineState {
      unsigned gpu_numa_id;
      SpaprTpmProxy *tpm_proxy;
-    uint32_t numa_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];
+    uint32_t *FORM1_assoc_array[MAX_NODES + NVGPU_MAX_NUM];

As said above, I really don't see the point in not having :

     uint32_t FORM1_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];

+    uint32_t **numa_assoc_array;
      Error *fwnmi_migration_blocker;
  };
diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
index 6f9f02d3de..ccf3e4eae8 100644
--- a/include/hw/ppc/spapr_numa.h
+++ b/include/hw/ppc/spapr_numa.h
@@ -24,6 +24,7 @@
   */
  void spapr_numa_associativity_init(SpaprMachineState *spapr,
                                     MachineState *machine);
+void spapr_numa_associativity_reset(SpaprMachineState *spapr);
  void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
                                         int offset, int nodeid);



