From: David Hildenbrand
Subject: Re: [PATCH][RESEND v3 1/3] hapvdimm: add a virtual DIMM device for memory hot-add protocols
Date: Thu, 2 Mar 2023 10:28:44 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0

On 01.03.23 23:08, Maciej S. Szmigiero wrote:
On 1.03.2023 18:24, David Hildenbrand wrote:
(...)
With virtio-mem one can simply have per-node virtio-mem devices.

2) I'm not sure what's the overhead of having, let's say, 1 TiB backing
memory device mostly marked madvise(MADV_DONTNEED).
Like, how much memory + swap this setup would actually consume - that's
something I would need to measure.
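
(For illustration, a rough standalone sketch of that kind of setup -- a large,
mostly-discarded anonymous mapping -- whose actual consumption could then be
checked via /proc/<pid>/status or /proc/<pid>/smaps_rollup. This is not code
from the patch series, just an assumed minimal test:)

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    const size_t backing_size = 1ULL << 40;  /* 1 TiB backing region */
    const size_t populated    = 1ULL << 30;  /* what the guest actually uses */

    char *p = mmap(NULL, backing_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    memset(p, 0xaa, populated);  /* populate a small slice */
    /* mark the rest as discarded, like ballooned-out / unplugged memory */
    madvise(p + populated, backing_size - populated, MADV_DONTNEED);

    getchar();  /* pause here and inspect RSS/swap from outside */
    return 0;
}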

There are some WIP items to improve that (QEMU metadata (e.g., bitmaps), KVM
metadata (e.g., per-memslot), Linux metadata (e.g., page tables)).
Memory overcommit handling also has to be tackled.

So it would be a "shared" problem with virtio-mem and will be sorted out 
eventually :)


Yes, but this might take a bit of time, especially if kernel-side changes
are involved - that's why I will check how this setup works in practice
in its current shape.

Yes, let me know if you have any questions. I invested a lot of time to figure out all of the details and possible workarounds/approaches in the past.

Hyper-V actually "cleans up" the guest memory map on reboot - if the
guest was effectively resized up then on reboot the guest boot memory is
resized up to match that last size.
Similarly, if the guest was ballooned out - that amount of memory is
removed from the boot memory on reboot.

Yes, it cleans up, but as I said last time I checked there was this concept of 
startup vs. minimum vs. maximum, at least for dynamic memory:

https://www.fastvue.co/tmgreporter/blog/understanding-hyper-v-dynamic-memory-dynamic-ram/

Startup RAM would be whatever you specify for "-m xG". If you go below min, you 
remove memory via deflation once the guest is up.


That article was from 2014, so I guess it pertained to Windows 2012 R2.

I remember seeing the same interface when I played with that a couple of years ago, but I don't recall which Windows version I was using.


The memory settings page in more recent Hyper-V versions looks like on
the screenshot at [1].

It no longer calls that main memory amount value "Startup RAM", now it's
just "RAM".

Despite what one might think, the "Enable Dynamic Memory" checkbox does
*not* control the Dynamic Memory protocol availability or usage - the
protocol is always available/exported to the guest.

What the "Enable Dynamic Memory" checkbox controls is some host-side
heuristics that automatically resize the guest within chosen bounds
based on some metrics.

Even if the "Enable Dynamic Memory" checkbox is *not* enabled, the guest
can still be online-resized via the Dynamic Memory protocol by simply
changing the value in the "RAM" field and clicking "Apply".

At least that's how it works on Windows 2019 with a Linux guest.

Right, I recall that that's a feature that was separately announced as explicit VM resizing, not HV dynamic memory. It uses the same underlying mechanism, yes, which is why the feature is always exposed to the VMs.

That's most probably when they performed the "Startup RAM" -> "RAM" rename, to let both features co-exist and be easier to configure.



So it's not exactly doing a hot-add after the guest boots.

I recall BUG reports in Linux where we got hv-balloon hot-add requests ~1
minute after Linux booted up, because of the startup-memory behavior described
above [in these BUG reports, memory onlining was disabled and the VM would run
out of memory because we hotplugged too much memory]. That's why I remember
that this approach was once used.

Maybe there are multiple implementations nowadays. At least in QEMU you could
choose whatever makes most sense for QEMU.


Right, it seems that the Hyper-V behavior evolved with time, too.

Yes. One could think of a split approach, that is, we never resize the initial RAM size (-m XG) from inside QEMU. Instead, we could have the following models:

(1) Basic "Startup RAM" model: always (re)boot Linux with "-m XG" on
    reboot. Once the VM comes up, we either add memory or request to
    inflate the balloon, to reach the previous guest size. Whenever the
    VM reboots, we first defrag all hv-balloon-provided memory ("one
    contiguous chunk") and then "add" that memory to the VM. If the
    logical VM size <= requested, this hv-balloon memory size would be
    "0". Essentially resembling the "old" HV dynamic memory approach.

(2) Extended "Startup RAM" mode: Same as (1), but instead of hot-adding
    the RAM after the guest has come up, we simply defrag the
    hv-balloon RAM during reboot ("one contiguous chunk") and expose it
    via e820/SRAT to the guest. Going "below" startup RAM will still
    require inflation once the guest is up.

(3) External "Resize" mode: On reboot, simply shut down the VM and notify
    libvirt. Libvirt will restart the VM with an adjusted "Startup RAM".

It's fairly straightforward to extend (1) to achieve (2). That could be a sane default for QEMU. Whoever wants (3) can simply let libvirt handle it on top without any special handling.

An internal resize mode is tricky, especially regarding migration. With sufficient motivation and problem solving, one might be able to turn (1) or (2) into such a mode (4). It would just be an implementation detail.
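
To make (1)/(2) a bit more concrete, a rough sketch of the reboot flow they
imply is below. All of the types and helpers here are hypothetical -- none of
this exists in QEMU -- it just restates the description above as code:

/* Hypothetical sketch only; not actual QEMU code or API. */
#include <stdint.h>

typedef struct HvBalloonSketch {
    uint64_t startup_ram_size;      /* the initial "-m XG" */
    uint64_t last_logical_vm_size;  /* guest size before the reboot */
    uint64_t pending_hotadd;        /* memory to (re)add once the guest is up */
    uint64_t pending_inflate;       /* memory to remove via inflation */
} HvBalloonSketch;

static void sketch_handle_vm_reset(HvBalloonSketch *s)
{
    /* The guest always comes back up with the plain "-m XG" size. */
    uint64_t target = s->last_logical_vm_size;

    /*
     * First, consolidate previously provided memory into one contiguous
     * chunk. In mode (2) that chunk would instead be exposed via
     * e820/SRAT right away, so no hot-add after boot is needed.
     */

    if (target > s->startup_ram_size) {
        s->pending_hotadd = target - s->startup_ram_size;
        s->pending_inflate = 0;
    } else {
        /* Going below startup RAM still requires inflation after boot. */
        s->pending_hotadd = 0;
        s->pending_inflate = s->startup_ram_size - target;
    }
}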


Note that I never considered the "go below initial RAM" and "resize initial RAM" cases really relevant for virtio-mem. Instead, you choose the startup size to be reasonably small (e.g., 4 GiB) and expose memory via the virtio-mem devices right at QEMU startup ("requested-size=XG"). The same approach could be applied to the hv-balloon model.
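
For reference, that kind of virtio-mem setup looks something like the following
on the QEMU command line (illustrative sizes, single NUMA node assumed):

qemu-system-x86_64 ... \
  -m 4G,maxmem=36G \
  -object memory-backend-ram,id=mem0,size=32G \
  -device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=16G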

One main reason to decide against resizing significantly below 4G was, for example, that you'll end up losing valuable DMA/DMA32 memory the lower you go -- that no hotplugged memory will provide. So using inflation for everything < 4G does not sound too crazy to me, and could avoid mode (3) altogether. But again, just my thoughts.

--
Thanks,

David / dhildenb



