qemu-devel

From: Sean Christopherson
Subject: Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset
Date: Fri, 10 Sep 2021 17:34:29 +0000

On Fri, Sep 10, 2021, Paolo Bonzini wrote:
> On 10/09/21 17:34, Sean Christopherson wrote:
> > > Yang explained to me (offlist) that this is needed because Windows fails to
> > > reboot without it.  We would need a way to ask Linux to reinitialize the
> > > vEPC, that doesn't involve munmap/mmap; this could be for example
> > > fallocate(FALLOC_FL_ZERO_RANGE).
> > > 
> > > What do you all think?
> > 
> > Mechanically, FALLOC_FL_ZERO_RANGE could work, but it's definitely a stretch
> > of the semantics as the underlying memory would not actually be zeroed.
> 
> The contents are not visible to anyone, so they might well be zero
> (not entirely serious, but also not entirely unserious).

Yeah, it wouldn't be a sticking point, just odd.
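
For reference, the userspace side of the fallocate() idea would be tiny; below
is a minimal sketch, assuming the kernel were taught to treat
FALLOC_FL_ZERO_RANGE on a vEPC fd as "EREMOVE everything in the range" (which
is only a proposal in this thread, not an existing ABI).

  /*
   * Hypothetical userspace sketch: reset a vEPC region without
   * munmap()/mmap(), assuming FALLOC_FL_ZERO_RANGE on a vEPC fd were
   * wired up to EREMOVE (proposal only, not current kernel behavior).
   */
  #define _GNU_SOURCE
  #include <errno.h>
  #include <fcntl.h>      /* fallocate(), FALLOC_FL_ZERO_RANGE */
  #include <stdio.h>
  #include <string.h>

  static int vepc_reset(int vepc_fd, off_t vepc_size)
  {
          /* Zero-range the whole backing object, i.e. the whole section. */
          if (fallocate(vepc_fd, FALLOC_FL_ZERO_RANGE, 0, vepc_size)) {
                  fprintf(stderr, "vEPC reset failed: %s\n", strerror(errno));
                  return -1;
          }
          return 0;
  }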

> > The only other option that comes to mind is a dedicated ioctl().
> 
> If it is not too restrictive to do it for the whole mapping at once,
> that would be fine.

Oooh, rats.  That reminds me of a complication.  If QEMU creates multiple EPC
sections, e.g. for a vNUMA setup, resetting each section individually will fail
if the guest did an unclean RESET and a given enclave has EPC pages from
multiple sections.  E.g. an SECS in vEPC[X] can have children in vEPC[0..N-1],
and all those children need to be removed before the SECS can be removed.  Yay
SGX!

> If it makes sense to do it for a range, however,
> the effort of defining a ioctl() API is probably not worth it when
> fallocate() is available.
> 
> Either way, I suppose it would be just something like
> 
>         struct sgx_epc_page *entry, *tmp;
>         LIST_HEAD(secs_pages);
>         unsigned long index;
> 
>         /* for fallocate; otherwise just use xa_for_each */
>         size = min_t(unsigned long, size, -start);
>         end = (start + size - 1) >> PAGE_SHIFT;
>         start >>= PAGE_SHIFT;
> 
>         /*
>          * Removing in two passes lets us remove SECS pages as well,
>          * since they can only be EREMOVE'd after all their child pages.
>          * An EREMOVE failure on the second pass means that the SECS
>          * page still has children on another instance.  Since it is
>          * still in the xarray, bury our head in the sand and ignore
>          * it; sgx_vepc_release() will deal with it.
>          */
>         xa_for_each_range(&vepc->page_array, index, entry, start, end) {
>                 /* Pass 1: pages that fail EREMOVE are SECS with children. */
>                 if (sgx_vepc_free_page(entry))
>                         list_add_tail(&entry->list, &secs_pages);
>         }
> 
>         /* Pass 2: retry the SECS pages now that their children are gone. */
>         list_for_each_entry_safe(entry, tmp, &secs_pages, list) {
>                 list_del(&entry->list);
>                 sgx_vepc_free_page(entry);
>         }
> 
> So another advantage of the ioctl would be e.g. being able to return the
> number of successfully EREMOVEd pages.  But since QEMU would not do
> anything with the return value and _also_ bury its head in the sand,
> that would only be useful if you guys have other uses in mind.

There are two options: 1) QEMU has to handle "failure", or 2) the kernel
provides an ioctl() that takes multiple vEPC fds and handles the SECS
dependencies.  #1 is probably the least awful option.  For #2, in addition to
the kernel having to deal with multiple fds, it would also need a second
list_head object in each page so that it could track which pages failed to be
removed.  Using the existing list_head would work for now, but it won't work
if/when an EPC cgroup is added.
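
To make the #2 comparison concrete, the uapi for a "takes multiple vEPC fds"
ioctl would have to look something like the sketch below; every name and
number in it is made up for illustration, nothing like it exists today.

  /*
   * Purely hypothetical uapi sketch for option #2: a single ioctl that is
   * handed every vEPC fd so the kernel can resolve cross-section SECS
   * dependencies itself.  None of these names or numbers exist in the
   * kernel; they are placeholders for illustration only.
   */
  #include <linux/ioctl.h>
  #include <linux/types.h>

  struct sgx_vepc_remove_all_arg {
          __u64 fds;        /* userspace pointer to an array of vEPC fds */
          __u32 nr_fds;     /* number of entries in the array */
          __u32 reserved;   /* must be zero */
  };

  #define SGX_IOC_VEPC_REMOVE_ALL_MULTI \
          _IOW(SGX_MAGIC, 0x7f, struct sgx_vepc_remove_all_arg)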

Note, for #1, QEMU would have to potentially do three passes (rough sketch
after the list).

  1. Remove child pages for a given vEPC.
  2. Remove SECS for a given vEPC that were pinned by children in the same vEPC.
  3. Remove SECS for all vEPC that were pinned by children in different vEPC.
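
Roughly, on the QEMU side that could look like the sketch below, assuming a
per-fd "remove everything you can" ioctl along the lines of the two-pass
kernel snippet above; SGX_IOC_VEPC_REMOVE_ALL is an assumed name here, not an
existing ABI.

  /*
   * Sketch of option #1 in QEMU.  SGX_IOC_VEPC_REMOVE_ALL is an assumed
   * per-fd ioctl that does the two-pass removal sketched earlier and
   * returns non-zero if SECS pages whose children live in *other*
   * sections are left behind.
   */
  #include <stdio.h>
  #include <sys/ioctl.h>

  static void sgx_vepc_reset_all(int *vepc_fds, int nr_vepc)
  {
          int i;

          /*
           * Passes 1 and 2: each call removes the section's child pages
           * and any SECS pages whose children lived in the same section.
           */
          for (i = 0; i < nr_vepc; i++)
                  ioctl(vepc_fds[i], SGX_IOC_VEPC_REMOVE_ALL);

          /*
           * Pass 3: with every section's children gone, retry so that
           * SECS pages pinned by children in other sections go too.
           */
          for (i = 0; i < nr_vepc; i++) {
                  if (ioctl(vepc_fds[i], SGX_IOC_VEPC_REMOVE_ALL))
                          fprintf(stderr, "vEPC section %d not fully reset\n", i);
          }
  }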


