From: David Hildenbrand
Subject: Re: [PATCH][RESEND v3 1/3] hapvdimm: add a virtual DIMM device for memory hot-add protocols
Date: Wed, 1 Mar 2023 18:24:28 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0


> The idea would seem reasonable, but (there's always some "but"):
> 1) Once we implement NUMA support we'd probably need multiple
> TYPE_MEMORY_DEVICEs anyway, since it seems one memdev can sit on only
> one NUMA node.


Not necessarily. You could extend the hv-balloon device to have one
memslot for each NUMA node. Of course, once again, you have to plan
ahead how to distribute memory across NUMA nodes (same as with
virtio-mem).

Having said that, last time I checked, HV dynamic memory was
force-disabled when vNUMA was enabled under HV, simply because balloon
inflation is not NUMA-aware.

With virtio-mem one can simply have per-node virtio-mem devices.
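As a rough command-line sketch (topology, IDs, and sizes are made up
for illustration):

  qemu-system-x86_64 \
    -smp 4 \
    -m 4G,maxmem=68G \
    -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 \
    -object memory-backend-ram,id=mem0,size=32G \
    -device virtio-mem-pci,id=vmem0,memdev=mem0,node=0,requested-size=0 \
    -object memory-backend-ram,id=mem1,size=32G \
    -device virtio-mem-pci,id=vmem1,memdev=mem1,node=1,requested-size=0

Each device can then be grown/shrunk independently at runtime via its
requested-size property (e.g., "qom-set vmem0 requested-size 16G" in
the monitor), so only the per-node maximum has to be planned ahead.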

> 2) I'm not sure what the overhead of having, let's say, a 1 TiB
> backing memory device mostly marked madvise(MADV_DONTNEED) would be.
> Like, how much memory + swap this setup would actually consume -
> that's something I would need to measure.

There are some WIP items to improve that (QEMU metadata (e.g.,
bitmaps), KVM metadata (e.g., per-memslot), Linux metadata (e.g., page
tables)). Memory overcommit handling also has to be tackled.

So it would be a "shared" problem with virtio-mem and will be sorted
out eventually :)
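Regarding the measuring part, a quick sketch of how to check what such
a setup actually consumes (assumes Linux with /proc/<pid>/smaps_rollup,
i.e., kernel 4.14+; the pgrep pattern is just an example):

  # VmSize will include the full 1 TiB mapping; Rss/Swap show what is
  # actually consumed after madvise(MADV_DONTNEED).
  pid=$(pgrep -f qemu-system-x86_64 | head -n1)
  grep -E '^(Rss|Swap):' /proc/$pid/smaps_rollup
  grep -E '^(VmSize|VmRSS|VmSwap):' /proc/$pid/status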


> 3) In a public cloud environment malicious guests are a possibility.
> Currently (without things like resizable memslots) the best idea I
> tried was to place the whole QEMU process into a memory-limited
> cgroup (limited to the guest target size).

Yes. Protection of unplugged memory is on my TODO list for virtio-mem as well, to avoid having to rely on cgroups.


> There are still some issues with it: one needs to reserve swap space
> up to the guest maximum size so the QEMU process doesn't get
> OOM-killed if the guest touches that memory, and the cgroup memory
> controller for some reason seems to start swapping even before
> reaching its limit (why that happens is still under investigation).

Yes, putting a memory cap on Linux was always tricky.
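For reference, the cgroup approach described above would look roughly
like this in cgroup-v2 terms (a sketch; the cgroup name, the 8G target
size, and the QEMU arguments are placeholders):

  # Cap the whole QEMU process at the guest target size.
  mkdir /sys/fs/cgroup/vm1
  echo 8G > /sys/fs/cgroup/vm1/memory.max
  echo $$ > /sys/fs/cgroup/vm1/cgroup.procs  # move this shell in first
  exec qemu-system-x86_64 -m 8G ...

memory.high would be the softer alternative (reclaim/throttling before
the hard limit is hit), which might be relevant to the early-swapping
behavior you describe.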


>> Reboot? Logically unplug all memory, and re-add the memory after the
>> guest has booted up.
>>
>> The only thing we can't do is the following: when going below 4G, we
>> cannot resize boot memory.


>> But I recall that that's *exactly* how the HV version I played with
>> ~2 years ago worked: always start up with some initial memory
>> ("startup memory"). After the VM is up for some seconds, we either
>> add more memory (requested > startup) or request the VM to inflate
>> the balloon (requested < startup).

> Hyper-V actually "cleans up" the guest memory map on reboot - if the
> guest was effectively resized up, then on reboot the guest boot
> memory is resized up to match that last size.
> Similarly, if the guest was ballooned out, that amount of memory is
> removed from the boot memory on reboot.

Yes, it cleans up, but as I said, last time I checked there was this
concept of startup vs. minimum vs. maximum, at least for dynamic
memory:

https://www.fastvue.co/tmgreporter/blog/understanding-hyper-v-dynamic-memory-dynamic-ram/

Startup RAM would be whatever you specify for "-m xG". If the
requested size goes below startup, you remove memory via balloon
inflation once the guest is up.


> So it's not exactly doing a hot-add after the guest boots.

I recall BUG reports in Linux where we got hv-balloon hot-add requests
~1 minute after Linux booted up, because of the startup-memory
behavior above [in these BUG reports, memory onlining was disabled and
the VM would run out of memory because we hotplugged too much memory].
That's why I remember that this approach was used at some point.

Maybe there are multiple implementations nowadays. At least in QEMU
you could choose whatever makes the most sense for QEMU.


> This approach (of resizing the boot memory) also avoids problems if
> the guest loses hot-add / ballooning capability after a reboot - for
> example, rebooting from Windows with hv-balloon into a Linux guest.

TBH, I wouldn't be too concerned about that scenario ("hotplugged
memory to a guest, guest reboots into a weird OS, weird OS isn't able
to use hotplugged memory"). For virtio-mem, the important part was
that you always "know" how much memory the VM is aware of. If you
always start with startup memory and hot-add later (only if you
detected guest support after bootup), you can handle that scenario.


> But unfortunately such resizing of the guest boot memory seems not
> trivial to implement in QEMU.

Yes, avoiding changes to the memory layout, in order to keep memory
migration feasible, was another thing I considered when designing
virtio-mem.


Anyhow, I'm just throwing out ideas here on how to eventually handle it differently.

--
Thanks,

David / dhildenb



