qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V1 18/32] osdep: import MADV_DOEXEC


From: Alex Williamson
Subject: Re: [PATCH V1 18/32] osdep: import MADV_DOEXEC
Date: Mon, 17 Aug 2020 14:48:56 -0600

On Mon, 17 Aug 2020 14:30:51 -0400
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 7/30/2020 11:14 AM, Steve Sistare wrote:
> > Anonymous memory segments used by the guest are preserved across a re-exec
> > of qemu, mapped at the same VA, via a proposed madvise(MADV_DOEXEC) option
> > in the Linux kernel. For the madvise patches, see:
> > 
> > https://lore.kernel.org/lkml/1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com/
> > 
> > Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > ---
> >  include/qemu/osdep.h | 7 +++++++
> >  1 file changed, 7 insertions(+)  
> 
> Hi Alex,
>   The MADV_DOEXEC functionality, which is a pre-requisite for the entire qemu 
> live update series, is getting a chilly reception on lkml.  We could instead 
> create guest memory using memfd_create and preserve the fd across exec.  
> However, 
> the subsequent mmap(fd) will return a different VA than was used previously, 
> which  is a problem for memory that was registered with vfio, as the original 
> VA 
> is remembered in the kernel struct vfio_dma and used in various kernel 
> functions, 
> such as vfio_iommu_replay.
> 
> To fix, we could provide a VFIO_IOMMU_REMAP_DMA ioctl taking iova, size, and
> new_vaddr.  The implementation finds an exact match for (iova, size) and 
> replaces 
> vaddr with new_vaddr.  Flags cannot be changed.
> 
> memfd_create plus VFIO_IOMMU_REMAP_DMA would replace MADV_DOEXEC.
> vfio on any form of shared memory (shm, dax, etc) could also be preserved 
> across
> exec with shmat/mmap plus VFIO_IOMMU_REMAP_DMA.
> 
> What do you think

Your new REMAP ioctl would have parameters identical to the MAP_DMA
ioctl, so I think we should just use one of the flag bits on the
existing MAP_DMA ioctl for this variant.

Reading through the discussion on the kernel side there seems to be
some confusion around why vfio needs the vaddr beyond the user call to
MAP_DMA though.  Originally this was used to test for virtually
contiguous mappings for merging and splitting purposes.  This is
defunct in the v2 interface, however the vaddr is now used largely for
mdev devices.  If an mdev device is not backed by an IOMMU device and
does not share a container with an IOMMU device, then a user MAP_DMA
ioctl essentially just registers the translation within the vfio
container.  The mdev vendor driver can then later either request pages
to be pinned for device DMA or can perform copy_to/from_user() to
simulate DMA via the CPU.

Therefore I don't see that there's a simple re-architecture of the vfio
IOMMU backend that could drop vaddr use.  I'm a bit concerned this new
remap proposal also raises the question of how do we prevent userspace
remapping vaddrs racing with asynchronous kernel use of the previous
vaddrs.  Are we expecting guest drivers/agents to quiesce the device,
or maybe relying on clearing bus-master, for PCI devices, to halt DMA?
The vfio migration interface we've developed does have a mechanism to
stop a device, would we need to use this here?  If we do have a
mechanism to quiesce the device, is the only reason we're not unmapping
everything and remapping it into the new address space the latency in
performing that operation?  Thanks,

Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]