Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset

From: Sean Christopherson
Subject: Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset
Date: Fri, 10 Sep 2021 17:34:29 +0000
On Fri, Sep 10, 2021, Paolo Bonzini wrote:
> On 10/09/21 17:34, Sean Christopherson wrote:
> > > Yang explained to me (offlist) that this is needed because Windows fails
> > > to reboot without it.  We would need a way to ask Linux to reinitialize
> > > the vEPC, that doesn't involve munmap/mmap; this could be for example
> > > fallocate(FALLOC_FL_ZERO_RANGE).
> > >
> > > What do you all think?
> >
> > Mechanically, FALLOC_FL_ZERO_RANGE could work, but it's definitely a
> > stretch of the semantics as the underlying memory would not actually be
> > zeroed.
>
> The contents are not visible to anyone, so they might well be zero
> (not entirely serious, but also not entirely unserious).
Yeah, it wouldn't be a sticking point, just odd.
> > The only other option that comes to mind is a dedicated ioctl().
>
> If it is not too restrictive to do it for the whole mapping at once,
> that would be fine.
Oooh, rats.  That reminds me of a complication.  If QEMU creates multiple EPC
sections, e.g. for a vNUMA setup, resetting each section individually will fail
if the guest did an unclean RESET and a given enclave has EPC pages from
multiple sections.  E.g. an SECS in vEPC[X] can have children in vEPC[0..N-1],
and all those children need to be removed before the SECS can be removed.
Yay SGX!
> If it makes sense to do it for a range, however,
> the effort of defining a ioctl() API is probably not worth it when
> fallocate() is available.
>
> Either way, I suppose it would be just something like
>
> 	/* for fallocate; otherwise just use xa_for_each */
> 	size = min_t(unsigned long, size, -start);
> 	end = (start + size - 1) >> PAGE_SHIFT;
> 	start >>= PAGE_SHIFT;
>
> 	/*
> 	 * Removing in two passes lets us remove SECS pages as well,
> 	 * since they can only be EREMOVE'd after all their child pages.
> 	 * An EREMOVE failure on the second pass means that the SECS
> 	 * page still has children on another instance.  Since it is
> 	 * still in the xarray, bury our head in the sand and ignore
> 	 * it; sgx_vepc_release() will deal with it.
> 	 */
> 	LIST_HEAD(secs_pages);
> 	xa_for_each_range(&vepc->page_array, index, entry, start, end) {
> 		struct sgx_epc_page *epc_page = entry;
>
> 		/* A failure here is an SECS page with live children. */
> 		if (sgx_vepc_free_page(epc_page))
> 			list_add_tail(&epc_page->list, &secs_pages);
> 	}
>
> 	list_for_each_entry_safe(epc_page, tmp, &secs_pages, list) {
> 		list_del(&epc_page->list);
> 		sgx_vepc_free_page(epc_page);
> 	}
> So another advantage of the ioctl would be e.g. being able to return the
> number of successfully EREMOVEd pages. But since QEMU would not do
> anything with the return value and _also_ bury its head in the sand,
> that would only be useful if you guys have other uses in mind.
There are two options: 1) QEMU has to handle "failure", or 2) the kernel
provides an ioctl() that takes multiple vEPC fds and handles the SECS
dependencies.  #1 is probably the least awful option.  For #2, in addition to
the kernel having to deal with multiple fds, it would also need a second
list_head object in each page so that it could track which pages failed to be
removed.  Using the existing list_head would work for now, but it won't work
if/when an EPC cgroup is added.
Note, for #1, QEMU would have to potentially do three passes.
1. Remove child pages for a given vEPC.
2. Remove SECS for a given vEPC that were pinned by children in the same vEPC.
3. Remove SECS for all vEPC that were pinned by children in different vEPC.
Follow-up: Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset, Jarkko Sakkinen, 2021/09/13