Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
From: Quentin Perret
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
Date: Tue, 3 May 2022 11:12:18 +0000
On Thursday 28 Apr 2022 at 20:29:52 (+0800), Chao Peng wrote:
>
> + Michael in case he has comment from SEV side.
>
> On Mon, Apr 25, 2022 at 07:52:38AM -0700, Andy Lutomirski wrote:
> >
> >
> > On Mon, Apr 25, 2022, at 6:40 AM, Chao Peng wrote:
> > > On Sun, Apr 24, 2022 at 09:59:37AM -0700, Andy Lutomirski wrote:
> > >>
> >
> > >>
> > >> 2. Bind the memfile to a VM (or at least to a VM technology). Now it's
> > >> in the initial state appropriate for that VM.
> > >>
> > >> For TDX, this completely bypasses the cases where the data is
> > >> prepopulated and TDX can't handle it cleanly. For SEV, it bypasses a
> > >> situation in which data might be written to the memory before we find
> > >> out whether that data will be unreclaimable or unmovable.
> > >
> > > This sounds like a stricter rule to avoid unclear semantics.
> > >
> > > So userspace needs to know what exactly happens for a 'bind' operation.
> > > This differs across technologies. E.g. for SEV it may imply that after
> > > this call the memfile can be accessed (through mmap or whatever) from
> > > userspace, while for current TDX this should not be allowed.
> >
> > I think this is actually a good thing. While SEV, TDX, pKVM, etc achieve
> > similar goals and have broadly similar ways of achieving them, they really
> > are different, and having userspace be aware of the differences seems okay
> > to me.
> >
> > (Although I don't think that allowing userspace to mmap SEV shared pages is
> > particularly wise -- it will result in faults or cache incoherence
> > depending on the variant of SEV in use.)
> >
> > >
> > > And I feel we still need a third flow/operation to indicate the
> > > completion of the initialization on the memfile before the guest's
> > > first launch. SEV needs to check that previously mmap-ed areas have
> > > been munmap-ed, and to prevent future userspace access. After this
> > > point, the memfile becomes a truly private fd.
> >
> > Even that is technology-dependent. For TDX, this operation doesn't really
> > exist. For SEV, I'm not sure (I haven't read the specs in nearly enough
> > detail). For pKVM, I guess it does exist and isn't quite the same as a
> > shared->private conversion.
> >
> > Maybe this could be generalized a bit as an operation "measure and make
> > private" that would be supported by the technologies for which it's useful.
>
> Then I think we need a callback instead of a static flag field. The
> backing store implements this callback, and consumers change the flags
> dynamically through it. This implements a kind of state-machine flow.
>
> >
> >
> > >
> > >>
> > >>
> > >> ----------------------------------------------
> > >>
> > >> Now I have a question, since I don't think anyone has really answered
> > >> it: how does this all work with SEV- or pKVM-like technologies in which
> > >> private and shared pages share the same address space? It sounds like
> > >> you're proposing to have a big memfile that contains private and shared
> > >> pages and to use that same memfile as pages are converted back and
> > >> forth. IO and even real physical DMA could be done on that memfile. Am
> > >> I understanding correctly?
> > >
> > > For the TDX case, and probably SEV as well, this memfile contains
> > > private memory only. But this design at least makes it possible for
> > > usage cases like pKVM, which wants both private and shared memory in
> > > the same memfile and relies on other ways like mmap/munmap or
> > > mprotect to toggle private/shared instead of fallocate/hole punching.
> >
> > Hmm. Then we still need some way to get KVM to generate the correct SEV
> > pagetables. For TDX, there are private memslots and shared memslots, and
> > they can overlap. If they overlap and both contain valid pages at the same
> > address, then the results may not be what the guest-side ABI expects, but
> > everything will work. So, when a single logical guest page transitions
> > between shared and private, no change to the memslots is needed. For SEV,
> > this is not the case: everything is in one set of pagetables, and there
> > isn't a natural way to resolve overlaps.
>
> I don't see a problem for SEV. Note that in all cases, both private and
> shared memory are in the same memslot. For a given GPA, if there is no
> private page, then the shared page will be used to establish the KVM
> page tables, so this guarantees there is no overlap.
>
> >
> > If the memslot code becomes efficient enough, then the memslots could be
> > fragmented. Or the memfile could support private and shared data in the
> > same memslot. And if pKVM does this, I don't see why SEV couldn't also do
> > it and hopefully reuse the same code.
>
> For pKVM, that might be the case. For SEV, I don't think we require
> private/shared data in the same memfile. The same model that works for
> TDX should also work for SEV. Or maybe I misunderstood something here?
>
> >
> > >
> > >>
> > >> If so, I think this makes sense, but I'm wondering if the actual memslot
> > >> setup should be different. For TDX, private memory lives in a logically
> > >> separate memslot space. For SEV and pKVM, it doesn't. I assume the API
> > >> can reflect this straightforwardly.
> > >
> > > I believe so. The flow should be similar, but we do need to pass
> > > different flags during the 'bind' to the backing store for different
> > > usages. That should be some new flags for pKVM, but the callbacks
> > > (the API here) between memfile_notifier and its consumers can be
> > > reused.
> >
> > And also some different flag in the operation that installs the fd as a
> > memslot?
> >
> > >
> > >>
> > >> And the corresponding TDX question: is the intent still that shared
> > >> pages aren't allowed at all in a TDX memfile? If so, that would be the
> > >> most direct mapping to what the hardware actually does.
> > >
> > > Exactly. TDX will still use fallocate/hole punching to turn the
> > > private page on and off. Once off, the traditional shared page
> > > becomes effective in KVM.
> >
> > Works for me.
> >
> > For what it's worth, I still think it should be fine to land all the TDX
> > memfile bits upstream as long as we're confident that SEV, pKVM, etc can be
> > added on without issues.
> >
> > I think we can increase confidence in this by either getting one other
> > technology's maintainers to get far enough along in the design to be
> > confident
>
> AFAICS, SEV shouldn't have any problem, but I would like to see AMD
> people comment. For pKVM, we definitely need more work, but it isn't
> totally undoable. It would also be good if pKVM people could comment.
Merging things incrementally sounds good to me if we can indeed get some
time to make sure it'll be a workable solution for other technologies.
I'm happy to prototype a pKVM extension to the proposed series to see if
there are any major blockers.
Thanks,
Quentin