qemu-devel



From: David Hildenbrand
Subject: Re: [PATCH] hw/misc: Add a virtual pci device to dynamically attach memory to QEMU
Date: Thu, 30 Sep 2021 12:33:30 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 30.09.21 11:40, david.dai wrote:
On Wed, Sep 29, 2021 at 11:30:53AM +0200, David Hildenbrand (david@redhat.com) wrote:

On 27.09.21 14:28, david.dai wrote:
On Mon, Sep 27, 2021 at 11:07:43AM +0200, David Hildenbrand (david@redhat.com) wrote:



On 27.09.21 10:27, Stefan Hajnoczi wrote:
On Sun, Sep 26, 2021 at 10:16:14AM +0800, David Dai wrote:
Add a virtual PCI device to QEMU. The device is used to dynamically attach
memory to a VM, so a driver in the guest can request host memory on the fly
without help from virtualization management software such as
libvirt/manager. The attached memory is

We do have virtio-mem to dynamically attach memory to a VM. It could be
extended by a mechanism for the VM to request more/less memory, that's
already a planned feature. But yeah, virtio-mem memory is exposed as
ordinary system RAM, rather than only via a BAR where it would be managed
mostly by user space.

There is a virtio-pmem spec proposal to expose the memory region via a PCI
BAR. We could do something similar for virtio-mem, however, we would have to
wire that new model up differently in QEMU (it's no longer a "memory device"
like a DIMM then).



I wish virtio-mem could solve our problem, but it is a dynamic allocation
mechanism for system RAM in virtualization. In heterogeneous computing
environments, the attached memory usually comes from a computing device and
should be managed separately. We don't want the Linux MM to control it.

If that heterogeneous memory has a dedicated node (which usually is the
case IIRC), and you let the Linux kernel manage it (dax/kmem), you can bind
the memory backend of virtio-mem to that special NUMA node. So all memory
managed by that virtio-mem device would come from that heterogeneous
memory.


Yes, CXL type 2 and type 3 devices expose memory to the host as a dedicated
node. The node is marked as soft_reserved_memory, and dax/kmem can take over
the node to create a dax device. This dax device can be regarded as the
memory backend of virtio-mem.

I'm not sure whether a dax device can be opened by multiple VMs or host
applications.

virtio-mem currently relies on having a single sparse memory region (anon mmap, mmaped file, mmaped huge pages, mmaped shmem) per VM. Although we can share memory with other processes, sharing with other VMs is not intended. Instead of actually mmaping parts dynamically (which can be quite expensive), virtio-mem relies on punching holes into the backend and dynamically allocating memory/file blocks/... on access.
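
A minimal, Linux-only sketch of the "punch holes, reallocate on access" idea, using Python's mmap module on an anonymous private mapping. This is not QEMU code: the function name `discard_demo` and the choice of MADV_DONTNEED as the discard primitive are assumptions for illustration (file-backed regions would punch holes in the file instead).

```python
import mmap

PAGE = mmap.PAGESIZE

def discard_demo(pages=4):
    """Populate two pages of a sparse region, discard the first, read both back."""
    # One sparse, anonymous, private region; nothing is allocated until touched.
    region = mmap.mmap(-1, pages * PAGE,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        # Touching the pages makes the kernel allocate backing memory.
        region[0:PAGE] = b"\xaa" * PAGE
        region[PAGE:2 * PAGE] = b"\xbb" * PAGE
        # "Punch a hole": drop the backing memory of the first page only.
        region.madvise(mmap.MADV_DONTNEED, 0, PAGE)
        # The next access to the discarded page gets a fresh zero page;
        # the neighbouring page is untouched.
        return region[0], region[PAGE]
    finally:
        region.close()
```

The mapping itself never changes size or address; only the backing pages come and go, which is why no dynamic mmap/munmap of sub-ranges is needed.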

So the easy way to make it work is:

a) Expose the CXL memory to the buddy via dax/kmem, resulting in device memory getting managed by the buddy on a separate NUMA node.
b) (optional) Allocate huge pages on that separate NUMA node.
c) Use ordinary memory-backend-ram or memory-backend-memfd (for huge pages), *binding* the memory backend to that special NUMA node.

This will dynamically allocate memory from that special NUMA node, resulting in the virtio-mem device completely being backed by that device memory, being able to dynamically resize the memory allocation.
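
Step c) can be sketched as a small helper that assembles the relevant QEMU arguments. This is only an illustration under assumptions: the helper name `virtio_mem_args`, the ids `vmem0`/`vmem0-backend`, and the sizes are made up here, and the host NUMA node number has to match whatever node dax/kmem created on the host.

```python
def virtio_mem_args(host_node, max_size="16G", requested_size="4G",
                    use_hugepages=False):
    """Return QEMU arguments for a virtio-mem device bound to one host node."""
    # memory-backend-memfd is used for huge pages, memory-backend-ram otherwise.
    backend = "memory-backend-memfd" if use_hugepages else "memory-backend-ram"
    # policy=bind plus host-nodes restricts allocations to the special node.
    obj = (f"{backend},id=vmem0-backend,size={max_size},"
           f"host-nodes={host_node},policy=bind")
    if use_hugepages:
        obj += ",hugetlb=on"
    # The virtio-mem device exposes the backend to the guest; requested-size
    # is the amount the guest is currently asked to plug.
    dev = (f"virtio-mem-pci,id=vmem0,memdev=vmem0-backend,"
           f"requested-size={requested_size}")
    return ["-object", obj, "-device", dev]
```

For example, `virtio_mem_args(2)` yields an `-object`/`-device` pair that would be appended to the rest of the QEMU command line; the machine also needs enough `maxmem` headroom in `-m` to cover the device's maximum size.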


Exposing an actual devdax, shared by multiple VMs, to the virtio-mem device isn't really what we want and won't work without major design changes. Also, I'm not so sure it's a very clean design: exposing memory belonging to other VMs to unrelated QEMU processes. This sounds like a serious security hole: if you manage to escalate to the QEMU process from inside the VM, you can access unrelated VM memory quite happily. You want an abstraction in between that makes sure each VM/QEMU process only sees private memory: for example, the buddy via dax/kmem.

--
Thanks,

David / dhildenb



