qemu-devel



From: Gregory Price
Subject: Re: [External] Re: [QEMU-devel][RFC PATCH 1/1] backends/hostmem: qapi/qom: Add an ObjectOption for memory-backend-* called HostMemType and its arg 'cxlram'
Date: Mon, 8 Jan 2024 12:15:06 -0500

On Fri, Jan 05, 2024 at 09:59:19PM -0800, Hao Xiang wrote:
> On Wed, Jan 3, 2024 at 1:56 PM Gregory Price <gregory.price@memverge.com> 
> wrote:
> >
> > For a variety of performance reasons, this will not work the way you
> > want it to.  You are essentially telling QEMU to map the vmem0 into a
> > virtual cxl device, and now any memory accesses to that memory region
> > will end up going through the cxl-type3 device logic - which is an IO
> > path from the perspective of QEMU.
> 
> I didn't understand exactly how the virtual cxl-type3 device works. I
> thought it would go with the same "guest virtual address ->  guest
> physical address -> host physical address" translation totally done by
> CPU. But if it is going through an emulation path handled by virtual
> cxl-type3, I agree the performance would be bad. Do you know why
> accessing memory on a virtual cxl-type3 device can't go with the
> nested page table translation?
>

Because a byte access to CXL memory can carry checks that must be
emulated by the virtual device, and because there are caching
implications that have to be emulated as well.

The CXL device you are using is an emulated CXL device - not a
virtualization interface.  Nuanced difference: the emulated device has
to emulate *everything* that a CXL device does.

What you want is passthrough / managed access to a real device -
virtualization.  This is not the way to accomplish that.  A better way
is to simply pass the memory through as a static NUMA node, as I
described.
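For illustration, the static-NUMA-node approach might look something like
the command line below.  The sizes, node ids, host node number, and the
latency/bandwidth figures are placeholders for the sake of the sketch, not
values from your setup:

```shell
# Sketch: expose host NUMA node 1 (e.g. CXL-backed RAM) to the guest as a
# plain second NUMA node - no emulated CXL type3 device in the access path.
qemu-system-x86_64 \
  -machine q35,hmat=on \
  -m 8G -smp 4 \
  -object memory-backend-ram,id=mem0,size=4G \
  -object memory-backend-ram,id=mem1,size=4G,host-nodes=1,policy=bind \
  -numa node,nodeid=0,cpus=0-3,memdev=mem0 \
  -numa node,nodeid=1,memdev=mem1,initiator=0 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=100 \
  -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=50G \
  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=300 \
  -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=10G
```

The `host-nodes=1,policy=bind` pair pins the backend's pages to the host's
CXL node, while the `hmat-lb` entries hand the guest an ACPI HMAT that
advertises node 1 as the slow tier.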

> 
> When we had a discussion with Intel, they told us to not use the KVM
> option in QEMU while using virtual cxl type3 device. That's probably
> related to the issue you described here? We enabled KVM though but
> haven't seen the crash yet.
>

The crash really only happens, IIRC, if code ends up hosted in that
memory.  I forget the exact scenario, but the working theory is it has
to do with the way instruction caches are managed with KVM and this
device.

> >
> > You're better off just using the `host-nodes` field of host-memory
> > and passing bandwidth/latency attributes though via `-numa hmat-lb`
> 
> We tried this but it doesn't work from end to end right now. I
> described the issue in another fork of this thread.
>
> >
> > In that scenario, the guest software doesn't even need to know CXL
> > exists at all, it can just read the attributes of the numa node
> > that QEMU created for it.
> 
> We thought about this before. But the current kernel implementation
> requires a devdax device to be probed and recognized as a slow tier
> (by reading the memory attributes). I don't think this can be done via
> the path you described. Have you tried this before?
>

Right, because the memory tiering component lumps the nodes together.

Better idea:  Fix the memory tiering component

I cc'd you on another patch thread that is discussing something relevant
to this.

https://lore.kernel.org/linux-mm/87fs00njft.fsf@yhuang6-desk2.ccr.corp.intel.com/T/#m32d58f8cc607aec942995994a41b17ff711519c8

The point is: there's no need for this to be a dax device at all.  There
is no need for the guest to even know what is providing the memory, or
for the guest to have any management access to it.  The guest just wants
the memory and the ability to tier it.

So we should fix the memory tiering component to work with this
workflow.
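As a sketch of what the guest sees in that workflow: once the HMAT is
populated, the generic node sysfs attributes expose the performance data
with no devdax device or CXL driver involved.  The paths below assume a
memory-only node 1 whose nearest initiator is node 0, and a kernel new
enough to have the memory_tiering sysfs interface:

```shell
# Inside the guest: HMAT-derived attributes via the generic NUMA view.
cat /sys/devices/system/node/node1/access0/initiators/read_latency
cat /sys/devices/system/node/node1/access0/initiators/read_bandwidth

# How the memory tiering component grouped the nodes (kernel 6.1+):
grep . /sys/devices/virtual/memory_tiering/memory_tier*/nodelist
```

If the tiering component placed node 1 in its own lower tier based on
those attributes, demotion works without the guest ever touching a dax
or CXL object - which is exactly the fix being argued for here.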

~Gregory


