Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
From: Quentin Perret
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
Date: Tue, 3 May 2022 11:12:18 +0000
On Thursday 28 Apr 2022 at 20:29:52 (+0800), Chao Peng wrote:
>
> + Michael in case he has comment from SEV side.
>
> On Mon, Apr 25, 2022 at 07:52:38AM -0700, Andy Lutomirski wrote:
> >
> >
> > On Mon, Apr 25, 2022, at 6:40 AM, Chao Peng wrote:
> > > On Sun, Apr 24, 2022 at 09:59:37AM -0700, Andy Lutomirski wrote:
> > >>
> >
> > >>
> > >> 2. Bind the memfile to a VM (or at least to a VM technology). Now it's
> > >> in the initial state appropriate for that VM.
> > >>
> > >> For TDX, this completely bypasses the cases where the data is
> > >> prepopulated and TDX can't handle it cleanly. For SEV, it bypasses a
> > >> situation in which data might be written to the memory before we find
> > >> out whether that data will be unreclaimable or unmovable.
> > >
> > > This sounds like a stricter rule to avoid unclear semantics.
> > >
> > > So userspace needs to know what exactly happens for a 'bind' operation.
> > > This differs across technologies. E.g. for SEV it may imply that after
> > > this call the memfile can be accessed (through mmap or whatever) from
> > > userspace, while for current TDX this should not be allowed.
> >
> > I think this is actually a good thing. While SEV, TDX, pKVM, etc achieve
> > similar goals and have broadly similar ways of achieving them, they really
> > are different, and having userspace be aware of the differences seems okay
> > to me.
> >
> > (Although I don't think that allowing userspace to mmap SEV shared pages is
> > particularly wise -- it will result in faults or cache incoherence
> > depending on the variant of SEV in use.)
> >
> > >
> > > And I feel we still need a third flow/operation to indicate the
> > > completion of the initialization on the memfile before the guest's
> > > first launch. SEV needs to check that previously mmap-ed areas have
> > > been munmap-ed, and to prevent future userspace access. After this
> > > point, the memfile becomes a truly private fd.
> >
> > Even that is technology-dependent. For TDX, this operation doesn't really
> > exist. For SEV, I'm not sure (I haven't read the specs in nearly enough
> > detail). For pKVM, I guess it does exist and isn't quite the same as a
> > shared->private conversion.
> >
> > Maybe this could be generalized a bit as an operation "measure and make
> > private" that would be supported by the technologies for which it's useful.
>
> Then I think we need a callback instead of a static flag field. The
> backing store implements this callback, and consumers change the flags
> dynamically through it. This implements a kind of state-machine flow.
>
> >
> >
> > >
> > >>
> > >>
> > >> ----------------------------------------------
> > >>
> > >> Now I have a question, since I don't think anyone has really answered
> > >> it: how does this all work with SEV- or pKVM-like technologies in which
> > >> private and shared pages share the same address space? It sounds like
> > >> you're proposing to have a big memfile that contains private and shared
> > >> pages and to use that same memfile as pages are converted back and
> > >> forth. IO and even real physical DMA could be done on that memfile. Am
> > >> I understanding correctly?
> > >
> > > For the TDX case, and probably SEV as well, this memfile contains
> > > private memory only. But this design at least makes it possible for
> > > usage cases like pKVM, which wants both private and shared memory in
> > > the same memfile and relies on other ways like mmap/munmap or
> > > mprotect to toggle private/shared instead of fallocate/hole punching.
> >
> > Hmm. Then we still need some way to get KVM to generate the correct SEV
> > pagetables. For TDX, there are private memslots and shared memslots, and
> > they can overlap. If they overlap and both contain valid pages at the same
> > address, then the results may not be what the guest-side ABI expects, but
> > everything will work. So, when a single logical guest page transitions
> > between shared and private, no change to the memslots is needed. For SEV,
> > this is not the case: everything is in one set of pagetables, and there
> > isn't a natural way to resolve overlaps.
>
> I don't see a problem for SEV. Note that in all cases, both private and
> shared memory are in the same memslot. For a given GPA, if there is no
> private page, then the shared page will be used to establish the KVM
> page tables, so this guarantees there is no overlap.
>
> >
> > If the memslot code becomes efficient enough, then the memslots could be
> > fragmented. Or the memfile could support private and shared data in the
> > same memslot. And if pKVM does this, I don't see why SEV couldn't also do
> > it and hopefully reuse the same code.
>
> For pKVM, that might be the case. For SEV, I don't think we require
> private/shared data in the same memfile. The same model that works for
> TDX should also work for SEV. Or maybe I misunderstood something here?
>
> >
> > >
> > >>
> > >> If so, I think this makes sense, but I'm wondering if the actual memslot
> > >> setup should be different. For TDX, private memory lives in a logically
> > >> separate memslot space. For SEV and pKVM, it doesn't. I assume the API
> > >> can reflect this straightforwardly.
> > >
> > > I believe so. The flow should be similar, but we do need to pass
> > > different flags during the 'bind' to the backing store for different
> > > usages. That should be some new flags for pKVM, but the callbacks
> > > (the API here) between memfile_notifier and its consumers can be
> > > reused.
> >
> > And also some different flag in the operation that installs the fd as a
> > memslot?
> >
> > >
> > >>
> > >> And the corresponding TDX question: is the intent still that shared
> > >> pages aren't allowed at all in a TDX memfile? If so, that would be the
> > >> most direct mapping to what the hardware actually does.
> > >
> > > Exactly. TDX will still use fallocate/hole punching to turn the
> > > private page on and off. Once off, the traditional shared page
> > > becomes effective in KVM.
> >
> > Works for me.
> >
> > For what it's worth, I still think it should be fine to land all the TDX
> > memfile bits upstream as long as we're confident that SEV, pKVM, etc can be
> > added on without issues.
> >
> > I think we can increase confidence in this by either getting one other
> > technology's maintainers to get far enough along in the design to be
> > confident
>
> AFAICS, SEV shouldn't have any problem, but I would like to see AMD
> people comment. For pKVM, we definitely need more work, but it isn't
> totally undoable. It would also be good if pKVM people could comment.
Merging things incrementally sounds good to me if we can indeed get some
time to make sure it'll be a workable solution for other technologies.
I'm happy to prototype a pKVM extension to the proposed series to see if
there are any major blockers.
Thanks,
Quentin