qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO r


From: Gavin Shan
Subject: Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions
Date: Fri, 5 Aug 2022 18:36:19 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.0

Hi Eric,

On 8/4/22 5:19 PM, Eric Auger wrote:
On 8/4/22 04:47, Gavin Shan wrote:
On 8/3/22 10:52 PM, Eric Auger wrote:
On 8/3/22 15:02, Gavin Shan wrote:
On 8/3/22 5:01 PM, Marc Zyngier wrote:
On Wed, 03 Aug 2022 04:01:04 +0100,
Gavin Shan <gshan@redhat.com> wrote:
On 8/2/22 7:41 PM, Eric Auger wrote:
On 8/2/22 08:45, Gavin Shan wrote:
There are 3 highmem IO regions as below. They can be disabled in
two situations: (a) The specific region is disabled by user. (b)
The specific region doesn't fit in the PA space. However, the base
address and highest_gpa are still updated no matter if the region
is enabled or disabled. It's incorrectly incurring waste in the PA
space.
If I am not wrong highmem_redists and highmem_mmio are not user
selectable

Only highmem ecam depends on machine type & ACPI setup. But I would
say
that in server use case it is always set. So is that optimization
really
needed?

There are two other cases you missed.

- highmem_ecam is enabled after virt-2.12, meaning it stays disabled
     before that.

I don't get this. The current behaviour is to disable highmem_ecam if
it doesn't fit in the PA space. I can't see anything that enables it
if it was disabled the first place.


There are several places or conditions where vms->highmem_ecam can be
disabled:

- virt_instance_init() where vms->highmem_ecam is inherited from
    !vmc->no_highmem_ecam. The option is set to true after virt-2.12
    in virt_machine_2_12_options().

- machvirt_init() where vms->highmem_ecam can be disable if we have
    32-bits vCPUs and failure on loading firmware.

- Another place is where we're talking about. It's address assignment
    to fit the PA space.


- The high memory region can be disabled if user is asking large
     (normal) memory space through 'maxmem=' option. When the
requested
     memory by 'maxmem=' is large enough, the high memory regions are
     disabled. It means the normal memory has higher priority than
those
     high memory regions. This is the case I provided in (b) of the
     commit log.

Why is that a problem? It matches the expected behaviour, as the
highmem IO region is floating and is pushed up by the memory region.


Eric thought that VIRT_HIGH_GIC_REDIST2 and VIRT_HIGH_PCIE_MMIO regions
aren't user selectable. I tended to explain why it's not true.
'maxmem='
can affect the outcome. When 'maxmem=' value is big enough, there
will be
no free area in the PA space to hold those two regions.


In the commit log, I was supposed to say something like below for
(a):

- The specific high memory region can be disabled through changing
     the code by user or developer. For example, 'vms->highmem_mmio'
     is changed from true to false in virt_instance_init().

Huh. By this principle, the user can change anything. Why is it
important?


Still like above. I was explaining the possible cases where those
3 switches can be turned on/off by users or developers. Our code
needs to be consistent and comprehensive.

    vms->highmem_redists
    vms->highmem_ecam
    vms->mmio



Improve address assignment for highmem IO regions to avoid the
waste
in the PA space by putting the logic into virt_memmap_fits().

I guess that this is what I understand the least. What do you mean by
"wasted PA space"? Either the regions fit in the PA space, and
computing their addresses in relevant, or they fall outside of it and
what we stick in memap[index].base is completely irrelevant.


It's possible that we run into the following combination. we should
have enough PA space to enable VIRT_HIGH_PCIE_MMIO region. However,
the region is disabled in the original implementation because
VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} regions consumed 1GB, which is
unecessary and waste in the PA space.
each region's base is aligned on its size.

Yes.


      static MemMapEntry extended_memmap[] = {
          [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
          [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
          [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
so anyway MMIO is at least at 512GB. Having a 1TB IPA space does not
imply any amount of RAM. This depends on the address space.
I    };

Yes. Prior to the start of system memory, there is 1GB used by
various regions either.
yes


      IPA_LIMIT           = (1UL << 40)
      '-maxmem'           = 511GB              /* Memory starts from
1GB */
      '-slots'            = 0
      vms->highmem_rdist2 = false
How can this happen? the only reason for highmem_redists to be reset is
if it does not fit into map_ipa. So if mmio fits, highmem_redists does
too. What do I miss?

The example is having "vms->highmem_rdist2 = flase" BEFORE the address
assignment, it's possible that developer changes the code to disable
it intentionally. The point is the original implementation isn't
comprehensive
because it has the wrong assumption that vms->highmem_{rdist2, ecam,
mmio} all
true before the address assignment. With the wrong assumption, the
base address
is always increased, even the previous region is disabled, during the
address assignment in virt_set_memmap().
Yes we currently always provision space for those functionalities,
independently on whether they are used. But that's true for many other
regions in the address map (although smaller) ;-)

Yep :)



      vms->highmem_ecam   = false
      vms->highmem_mmio   = true
I am still skeptic about the relevance of such optim. Isn't it sensible
to expect the host to support large IPA if you want to use such amount
of memory.
I think we should rather better document the memory map from a user
point of view.
Besides if you change the memory map, you need to introduce yet another
option to avoid breaking migration, no?


These 3 high memory regions consumes 513GB with alignment considered when
all of them are enabled. The point is those 3 high memory regions, or
part
of them can be squeezed or smashed to accommodate '-maxmem' by design. I
think there are two options I can think of. I personally prefer to keep
the capability. With this, users gain broader range for their '-maxmem'.
Please let me know your preference.

- Revert the capability of squeezing or smashing those high memory
regions
   to accommodate '-maxmem'. It means vms->high_{redist2, ecam, mmio}
can't
   be disable on conflicts to the PA space.

- Keep the capability, with this optimization applied to make the
implementation
   comprehensive.

I think it's worthy to add something about this limitation in
doc/system/arm/virt.rst.
indeed that's worth in any case.

I don't think I need introduce another option to avoid migration
breakage.
Could you explain why I need the extra option?
I think you do. Before and after this patch the QEMU memory regions
associated to those devices won't be a the same location in the memory
so if you migrate from an old version to a newer one, the guest won't be
able to access them

OK I have given my own opinion about those potential changes. Let's wait
for others' ;-)


Thank you for your comments. Yeah, I would hold to post v2 to collect
more comments :)

I'm still don't understand how it's affecting migration. If I understand
correct, the changed address based doesn't affect migration, as explained
like below. It took me while to looking the source code to figure out
how VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} is linked to GIC and PCIe host and
migration.

For VIRTH_HIGH_GIC_REDIST2 region, it's used by TYPE_ARM_GICV3 or 
TYPE_KVM_ARM_GICV3.
TYPE_KVM_ARM_GICV3. For both cases, we do NOT migrate the region directly. 
Instead,
the registers are saved to GICv3CPUState. GICv3CPUState is migrate and the 
registers
are restored from the instance.

For VIRT_HIGH_PCIE_ECAM, the registers are saved to PCIDevice::config. The
buffer is migrated and PCI's config space is restored from it. In 
hw/net/e1000e.c,
e1000e_vmstate has something like below embedded:

    VMSTATE_PCI_DEVICE(parent_obj, E1000EState),

---

I also did one experiment by having different address bases on the source and
destination. Migration succeeds.

address regions from source
---------------------------
virt_memmap_fits: HIGH_GIC_REDIST2: [0000004000000000  0000000004000000]
virt_memmap_fits: HIGH_GIC_ECAM:    [0000004010000000  0000000010000000]
virt_memmap_fits: HIGH_GIC_MMIO:    [0000008000000000  0000008000000000]

address regions from destination
--------------------------------
virt_memmap_fits: HIGH_GIC_REDIST2: [0000004040000000  0000000004000000]
virt_memmap_fits: HIGH_GIC_ECAM:    [0000004050000000  0000000010000000]
virt_memmap_fits: HIGH_GIC_MMIO:    [0000008000000000  0000008000000000]





Signed-off-by: Gavin Shan <gshan@redhat.com>
---
     hw/arm/virt.c | 54
+++++++++++++++++++++++++++++----------------------
     1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9633f822f3..bc0cd218f9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1688,6 +1688,34 @@ static uint64_t
virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
         return arm_cpu_mp_affinity(idx, clustersz);
     }
     +static void virt_memmap_fits(VirtMachineState *vms, int index,
+                             bool *enabled, hwaddr *base, int
pa_bits)
+{
+    hwaddr size = extended_memmap[index].size;
+
+    /* The region will be disabled if its size isn't given */
+    if (!*enabled || !size) {
In which case do you have !size?

Yeah, we don't have !size and the condition should be removed.

+        *enabled = false;
+        vms->memmap[index].base = 0;
+        vms->memmap[index].size = 0;
It looks dangerous to me to reset the region's base and size like
that.
for instance fdt_add_gic_node() will add dummy data in the dt.

I would guess you missed that the high memory regions won't be
exported
through device-tree if they have been disabled. We have a check
there,
which is "if (nb_redist_regions == 1)"

+        return;
+    }
+
+    /*
+     * Check if the memory region fits in the PA space. The
memory map
+     * and highest_gpa are updated if it fits. Otherwise, it's
disabled.
+     */
+    *enabled = (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits));
using a 'fits' local variable would make the code more obvious I
think

Lets confirm if you're suggesting something like below?

           bool fits;

           fits = (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits));

           if (fits) {
              /* update *base, memory mapping, highest_gpa */
           } else {
              *enabled = false;
           }

I guess we can simply do

           if (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits)) {
              /* update *base, memory mapping, highest_gpa */
           } else {
              *enabled = false;
           }

Please let me know which one looks best to you.

Why should the 'enabled' flag be updated by this function, instead of
returning the value and keeping it as an assignment in the caller
function? It is purely stylistic though.


The idea to put the logic, address assignment for those 3 high memory
regions or updating the flags (vms->high_redist2/ecam/mmio), to the
newly introduced function, to make virt_set_memmap() a bit simplified.
Eric tends to agree the changes with minor adjustments. So I'm going
to keep it as of being, which doesn't mean the stylistic thought is
a bad one :)

Lastly, I need to rewrite the comit log completely because it seems
causing confusions to Eric and you.



Thanks,
Gavin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]