Re: [PATCH] hw/misc: Add a virtual pci device to dynamically attach memory to QEMU


From: david.dai
Subject: Re: [PATCH] hw/misc: Add a virtual pci device to dynamically attach memory to QEMU
Date: Thu, 30 Sep 2021 17:40:07 +0800

On Wed, Sep 29, 2021 at 11:30:53AM +0200, David Hildenbrand (david@redhat.com) 
wrote: 
> 
> On 27.09.21 14:28, david.dai wrote:
> > On Mon, Sep 27, 2021 at 11:07:43AM +0200, David Hildenbrand 
> > (david@redhat.com) wrote:
> > > 
> > > On 27.09.21 10:27, Stefan Hajnoczi wrote:
> > > > On Sun, Sep 26, 2021 at 10:16:14AM +0800, David Dai wrote:
> > > > > Add a virtual pci to QEMU, the pci device is used to dynamically
> > > > > attach memory to VM, so driver in guest can apply host memory in fly
> > > > > without virtualization management software's help, such as
> > > > > libvirt/manager. The attached memory is
> > > 
> > > We do have virtio-mem to dynamically attach memory to a VM. It could be
> > > extended by a mechanism for the VM to request more/less memory, that's
> > > already a planned feature. But yeah, virtio-mem memory is exposed as
> > > ordinary system RAM, not only via a BAR to mostly be managed by user space
> > > completely.
> 
> There is a virtio-pmem spec proposal to expose the memory region via a PCI
> BAR. We could do something similar for virtio-mem, however, we would have to
> wire that new model up differently in QEMU (it's no longer a "memory device"
> like a DIMM then).
> 
> > > 
> > 
> > I wish virtio-mem could solve our problem, but it is a dynamic allocation
> > mechanism for system RAM in virtualization. In heterogeneous computing
> > environments, the attached memory usually comes from a computing device and
> > should be managed separately; we don't want Linux MM to control it.
> 
> If that heterogeneous memory has a dedicated node (which usually is the case
> IIRC), and you let it be managed by the Linux kernel (dax/kmem), you can bind
> the memory backend of virtio-mem to that special NUMA node. So all memory
> managed by that virtio-mem device would come from that heterogeneous memory.
> 

Yes, CXL type 2/3 devices expose memory to the host as a dedicated NUMA node.
The node is marked as soft-reserved memory, and dax/kmem can take over the node
to create a dax device. This dax device can then be regarded as the memory
backend of virtio-mem.

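For example, if dax/kmem hands us /dev/dax0.0 for that node, I guess the QEMU
wiring would look roughly like this (untested sketch; the dax path, sizes, ids
and guest node number are only placeholders):

-object memory-backend-file,id=mem0,mem-path=/dev/dax0.0,size=32G,share=on \
-device virtio-mem-pci,id=vmem0,memdev=mem0,node=1,requested-size=0
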
I am not sure whether a dax device can be opened by multiple VMs or host
applications at the same time.

> You could then further use a separate NUMA node for that virtio-mem device
> inside the VM. But to the VM it would look like System memory with different
> performance characteristics. That would work for some use cases I guess, but
> I am not sure for which it would not (I assume you can tell :) ).
> 

If the NUMA node in the guest can be dynamically expanded by virtio-mem, that
may work well for us, because we will develop our own memory management driver
to manage the device memory.
   
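If I understand it correctly, the virtio-mem device would then be assigned to a
dedicated guest NUMA node (node=1 in the sketch above), and the amount of
memory exposed there could be changed at runtime through its requested-size
property, e.g. from the monitor (vmem0 being the placeholder id from above):

(qemu) qom-set vmem0 requested-size 8G
(qemu) qom-set vmem0 requested-size 2G
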
> We could even write an alternative virtio-mem mode, where the device-managed
> memory isn't exposed to the buddy but made available to user space in some
> different way.
> 
> > > > > isolated from System RAM, it can be used in heterogeneous memory
> > > > > management for virtualization. Multiple VMs dynamically share same
> > > > > computing device memory without memory overcommit.
> > > 
> > > This sounds a lot like MemExpand/MemLego ... am I right that this is the
> > > original design? I recall that VMs share a memory region and dynamically
> > > agree upon which part of the memory region a VM uses. I further recall
> > > that there were malloc() hooks that would dynamically allocate such
> > > memory in user space from the shared memory region.
> > > 
> > 
> > Thank you for telling me about MemExpand/MemLego, I have carefully read the
> > paper. Some of its ideas are the same as in this patch, such as the software
> > model and stack, but it may have a security risk in that the whole shared
> > memory is visible to all VMs.
> 
> How will you make sure that not all shared memory can be accessed by the
> other VMs? IOW, emulate !shared memory on shared memory?
> 
> > -----------------------
> >       application
> > -----------------------
> > memory management driver
> > -----------------------
> >       pci driver
> > -----------------------
> >     virtual pci device
> > -----------------------
> > 
> > > I can see some use cases for it, although the shared memory design isn't
> > > what you typically want in most VM environments.
> > > 
> > 
> > The original design for this patch is to share a computing device among
> > multiple VMs. Each VM runs a computing application (for example, an OpenCL
> > application), and our computing device can support a few applications in
> > parallel. In addition, it supports SVM (shared virtual memory) via
> > IOMMU/ATS/PASID/PRI. The device exposes its memory to the host via a PCIe
> > BAR or CXL.mem, the host constructs a memory pool to uniformly manage the
> > device memory, and then device memory is attached to the VM via a virtual
> > PCI device.
> 
> How exactly is that memory pool created/managed? Simply dax/kmem and
> handling it via the buddy in a special NUMA node.
>

We developed an MM driver in the host and guest to manage the reserved memory
(the NUMA node you mentioned). The MM driver is similar to the buddy system: it
also uses its own page structures to manage physical memory, and it offers
mmap() to host applications or to the VM. The device driver adds its memory
region to the MM driver.
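
As a rough illustration of the host-application side (the /dev/cdm0 device name
and the mapping size are only placeholders, not our real interface), an
application would consume the memory through the MM driver's mmap() like this:

/* Illustrative sketch only: assumes the MM driver exposes a character
 * device that implements mmap() over the reserved device memory. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 2UL << 20;             /* map 2 MiB of device memory */
    int fd = open("/dev/cdm0", O_RDWR);       /* hypothetical MM-driver node */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* The driver's mmap() handler is expected to back this mapping with
     * pages taken from the reserved device-memory pool. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    memset(p, 0, len);                        /* touch the device memory */

    munmap(p, len);
    close(fd);
    return 0;
}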

We don't use dax/kmem because we need to control the key software modules to
reduce risk, and we may add new features to the driver over time.

> > but we don't know how much memory should be assigned when creating the VM,
> > so we hope memory can be attached to the VM on demand. The driver in the
> > guest triggers memory attaching, not outside virtualization management
> > software. So the original requirements are:
> > 1> The managed memory comes from the device; it should be isolated from
> >    system RAM
> > 2> The memory can be dynamically attached to the VM on the fly
> > 3> The attached memory supports SVM and DMA operation with the IOMMU
> > 
> > Thank you very much.
> 
> Thanks for the info. If virtio-mem is not applicable and cannot be modified
> for this use case, would it make sense to create a new virtio device type?
> 

We already have the MM driver in the host and guest; now we need a way to
dynamically attach memory to the guest and join both ends together. This patch
adds a self-contained device, so it doesn't impact QEMU's stability.

A new virtio device type is a good idea to me. I need some time to understand
the virtio spec and virtio-mem, then I may send another proposal, such as:
[RFC] hw/virtio: Add virtio-memdev to dynamically attach memory to QEMU


Thanks,
David Dai

> 
> -- 
> Thanks,
> 
> David / dhildenb
> 
> 




