qemu-devel

Re: [PATCH] mem/x86: add processor address space check for VM memory


From: David Hildenbrand
Subject: Re: [PATCH] mem/x86: add processor address space check for VM memory
Date: Tue, 12 Sep 2023 17:34:42 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

[...]

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 54838c0c41..d187890675 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -908,9 +908,12 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
{
     X86CPU *cpu = X86_CPU(first_cpu);

-    /* 32-bit systems don't have hole64 thus return max CPU address */
-    if (cpu->phys_bits <= 32) {
-        return ((hwaddr)1 << cpu->phys_bits) - 1;
+    /*
+     * 32-bit systems don't have hole64, but we might have a region for
+     * memory hotplug.
+     */
+    if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) {
+        return pc_pci_hole64_start() - 1;

Ok this is very confusing! I am looking at pc_pci_hole64_start() function. I 
have a few questions …
(a) pc_get_device_memory_range() returns the size of the device memory as the 
difference between ram_size and maxram_size. But from what I understand, 
ram_size is the actual size of the ram present and maxram_size is the max size 
of ram *after* hot plugging additional memory. How can we assume that the 
additional available space is already occupied by hot plugged memory?

Let's take a look at an example:

$ ./build/qemu-system-x86_64 -m 8g,maxmem=16g,slots=1 \
  -object memory-backend-ram,id=mem0,size=1g \
  -device pc-dimm,memdev=mem0 \
  -nodefaults -nographic -S -monitor stdio

(qemu) info mtree
...
memory-region: system
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-00000000bfffffff (prio 0, ram): alias ram-below-4g @pc.ram 0000000000000000-00000000bfffffff
    0000000000000000-ffffffffffffffff (prio -1, i/o): pci
      00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
      00000000000e0000-00000000000fffff (prio 1, rom): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
      00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
    00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff
    00000000fec00000-00000000fec00fff (prio 0, i/o): ioapic
    00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
    00000000fee00000-00000000feefffff (prio 4096, i/o): apic-msi
    0000000100000000-000000023fffffff (prio 0, ram): alias ram-above-4g @pc.ram 00000000c0000000-00000001ffffffff
    0000000240000000-000000047fffffff (prio 0, i/o): device-memory
      0000000240000000-000000027fffffff (prio 0, ram): mem0


We requested 8G of boot memory, which is split between "<4G" memory and ">=4G" 
memory.

We only place exactly 3G (0x0->0xbfffffff) under 4G, starting at address 0.

We leave the remainder (1G) of the <4G addresses available for I/O devices 
(32bit PCI hole).

So we end up with 5G (0x100000000->0x23fffffff) of memory starting exactly at 
address 4G.
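
To make the split concrete, here is a rough sketch of how the 8G of boot memory ends up as 3G + 5G. The fixed 3 GiB cap and the names ram_split / split_boot_ram are assumptions for illustration only; real QEMU picks the cap depending on the machine type and RAM size.

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/* Hypothetical helper type, not a QEMU structure. */
struct ram_split {
    uint64_t below_4g;   /* bytes of boot RAM mapped starting at address 0 */
    uint64_t above_4g;   /* bytes of boot RAM mapped starting at 4 GiB */
};

/*
 * Assumption: at most 3 GiB of boot RAM goes below 4 GiB, as in the
 * example above. Everything beyond that is placed at address 4 GiB.
 */
struct ram_split split_boot_ram(uint64_t ram_size)
{
    const uint64_t lowmem_cap = 3 * GiB;
    struct ram_split s;

    if (ram_size > lowmem_cap) {
        s.below_4g = lowmem_cap;
        s.above_4g = ram_size - lowmem_cap;
    } else {
        s.below_4g = ram_size;
        s.above_4g = 0;
    }
    return s;
}
```

With ram_size = 8 GiB this gives 3 GiB below 4G and 5 GiB above, so the last byte of boot RAM sits at 4 GiB + 5 GiB - 1 = 0x23fffffff, matching the "ram-above-4g" line in the mtree output.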

"maxram_size - ram_size" = 8G is the maximum amount of memory you can hotplug.
We use it to size the "device-memory" region:

0x47fffffff - 0x240000000+1 = 0x240000000
-> 9 GiB

We requested to hotplug a maximum of "8 GiB", and sized the area slightly
larger to allow for some flexibility when it comes to placing DIMMs in that
"device-memory" area.

We place that area for memory devices after the RAM. So it starts after the 5G of 
">=4G" boot memory.
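
The placement and sizing of that area can be sketched roughly as below. The 1 GiB start alignment and the 1 GiB of slack per slot are assumptions that happen to reproduce the 9 GiB in this example; devmem_region, devmem_layout and round_up_pow2 are hypothetical names, not QEMU functions.

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/* Round x up to the next multiple of align (align must be a power of two). */
uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical helper type, not a QEMU structure. */
struct devmem_region {
    uint64_t start;  /* first guest physical address of the region */
    uint64_t size;   /* size of the region in bytes */
};

/*
 * above_4g_end is the first address after the ">=4G" boot memory.
 * Assumption: the region starts 1 GiB aligned and gets 1 GiB of extra
 * room per slot on top of "maxram_size - ram_size".
 */
struct devmem_region devmem_layout(uint64_t above_4g_end, uint64_t ram_size,
                                   uint64_t maxram_size, unsigned slots)
{
    struct devmem_region r;

    r.start = round_up_pow2(above_4g_end, GiB);
    r.size = (maxram_size - ram_size) + (uint64_t)slots * GiB;
    return r;
}
```

For "-m 8g,maxmem=16g,slots=1" and boot RAM ending at 0x240000000, this yields start = 0x240000000 and size = 0x240000000 (9 GiB), i.e. the region ends at 0x47fffffff as shown by "info mtree".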


Long story short, based on the initial RAM size and the maximum RAM size, you
can construct the layout above and know exactly
a) How much memory is below 4G, starting at address 0 -> leaving 1G for the
32bit PCI hole
b) How much memory is above 4G, starting at address 4G.
c) Where the region for memory devices starts (aligned, after b)) and how big
it is.
d) Where the 64bit PCI hole is (after c))

(b) Another question is, in pc_pci_hole64_start(), why are we adding this size 
to the start address?

    } else if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) {
        pc_get_device_memory_range(pcms, &hole64_start, &size);
        if (!pcmc->broken_reserved_end) {
            hole64_start += size;

The 64bit PCI hole starts after "device-memory" above.

Apparently, we have to take care of some layout issues from before QEMU 2.5.
You can assume that nowadays, "pcmc->broken_reserved_end" is never set. So the
PCI64 hole is always after the device-memory region.
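
As a standalone sketch of just that branch (not the full pc_pci_hole64_start()): the function name hole64_start_sketch and the final 1 GiB round-up are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/*
 * Start of the 64bit PCI hole when a device-memory region exists.
 * With broken_reserved_end unset, the hole begins right after the
 * device-memory region; with it set, the region size is not added.
 * Assumption: the result is rounded up to a 1 GiB boundary.
 */
uint64_t hole64_start_sketch(uint64_t devmem_start, uint64_t devmem_size,
                             bool broken_reserved_end)
{
    uint64_t start = devmem_start;

    if (!broken_reserved_end) {
        start += devmem_size;
    }
    return (start + GiB - 1) & ~(GiB - 1);
}
```

For the example layout (device-memory at 0x240000000, 9 GiB in size), the hole would start at 0x480000000, directly after the last device-memory address 0x47fffffff.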


I think this is trying to put the hole after the device memory. But if the ram 
size is <=maxram_size then the hole is after the above_4G memory? Why?

I didn't quite get what the concern is, can you elaborate?


(c) in your above change, what does long mode have anything to do with all of 
this?

According to my understanding, 32bit (i386) doesn't have a 64bit hole. And
32bit vs. 64bit (i386 vs. x86_64) is decided based on LM, not on the address
bits (as we learned: PSE36 and PAE).

But really, I just did what x86_cpu_realizefn() does to decide 32bit vs. 64bit ;)

    /* For 64bit systems think about the number of physical bits to present.
     * ideally this should be the same as the host; anything other than matching
     * the host can cause incorrect guest behaviour.
     * QEMU used to pick the magic value of 40 bits that corresponds to
     * consumer AMD devices but nothing else.
     *
     * Note that this code assumes features expansion has already been done
     * (as it checks for CPUID_EXT2_LM), and also assumes that potential
     * phys_bits adjustments to match the host have been already done in
     * accel-specific code in cpu_exec_realizefn.
     */
    if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
    ...
    } else {
        /* For 32 bit systems don't use the user set value, but keep
         * phys_bits consistent with what we tell the guest.
         */
    ...
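
To illustrate why the two checks differ, here is a minimal sketch outside of QEMU. LM_BIT mirrors the Long Mode flag (CPUID leaf 0x80000001, EDX bit 29); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Long Mode flag: CPUID leaf 0x80000001, EDX bit 29. */
#define LM_BIT (1u << 29)

/* Classification by the LM feature bit, as in the proposed change. */
bool guest_is_64bit(uint32_t cpuid_8000_0001_edx)
{
    return (cpuid_8000_0001_edx & LM_BIT) != 0;
}

/* Classification by physical address width, as in the old check. */
bool guest_is_64bit_by_phys_bits(unsigned phys_bits)
{
    return phys_bits > 32;
}
```

A 32bit CPU with PAE or PSE36 reports 36 physical address bits but has no LM bit, so the two functions disagree for it: the phys_bits check treats it as 64bit, the LM check does not.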


But that was just my quick attempt at fixing pc_max_used_gpa().

*Maybe* there is a 64bit PCI hole on 32bit i386 with 36bit addresses?

I'm the wrong person to ask, but I kind-of doubt it. :)


--
Cheers,

David / dhildenb



