From: Sean Christopherson
Subject: Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset
Date: Fri, 10 Sep 2021 20:21:00 +0000

On Fri, Sep 10, 2021, Paolo Bonzini wrote:
> On 10/09/21 19:34, Sean Christopherson wrote:
> > On Fri, Sep 10, 2021, Paolo Bonzini wrote:
> > > On 10/09/21 17:34, Sean Christopherson wrote:
> > > > The only other option that comes to mind is a dedicated ioctl().
> > > 
> > > If it is not too restrictive to do it for the whole mapping at once,
> > > that would be fine.
> > 
> > Oooh, rats.  That reminds me of a complication.  If QEMU creates multiple EPC
> > sections, e.g. for a vNUMA setup, resetting each section individually will fail
> > if the guest did an unclean RESET and a given enclave has EPC pages from multiple
> > sections.  E.g. an SECS in vEPC[X] can have children in vEPC[0..N-1], and all
> > those children need to be removed before the SECS can be removed.  Yay SGX!
> > 
> > There are two options: 1) QEMU has to handle "failure", or 2) the kernel provides
> > an ioctl() that takes multiple vEPC fds and handles the SECS dependencies.  #1 is
> > probably the least awful option.  For #2, in addition to the kernel having to deal
> > with multiple fds, it would also need a second list_head object in each page so
> > that it could track which pages failed to be removed.  Using the existing list_head
> > would work for now, but it won't work if/when an EPC cgroup is added.
> > 
> > Note, for #1, QEMU would have to potentially do three passes.
> > 
> >    1. Remove child pages for a given vEPC.
> >    2. Remove SECS for a given vEPC that were pinned by children in the same vEPC.
> >    3. Remove SECS for all vEPC that were pinned by children in different vEPC.
> 
> It's also possible that QEMU handles failure, but the kernel does two
> passes; then QEMU can just do two passes.  The kernel will overall do four
> passes, but:
> 
> 1) the second (SECS pinned by children in the same vEPC) would be cheaper
> than a full second pass

The problem is that this would require a list_head (or temp allocations) to track
the SECS pages that failed the first time 'round.  For vEPC destruction, the kernel
can use sgx_epc_page.list because it can take the pages off the active/allocated
list, but that's not an option in this case because the presumably-upcoming EPC
cgroup needs to keep pages on the list to handle OOM.

The kernel's ioctl/syscall/whatever could return the number of pages that were
not freed, or maybe just -EAGAIN, and userspace could use that to know it needs
to do another reset to free everything.
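
[Editor's note: for illustration only, a rough kernel-side sketch of that return
convention, i.e. walk the vEPC's pages, try to EREMOVE each one, and report how
many could not be removed so userspace knows to retry.  The sgx_vepc_reset()
name, the per-vEPC page list, and the sgx_vepc_page_to_addr() helper are
assumptions invented for this sketch, not anything proposed in this thread.]

        /*
         * Hypothetical sketch: EREMOVE every page backing this vEPC.  Pages
         * that fail (e.g. an SECS that still has children, possibly in another
         * vEPC section) are left in place and merely counted, so nothing has
         * to be taken off its existing list.
         */
        static long sgx_vepc_reset(struct sgx_vepc *vepc)
        {
                struct sgx_epc_page *page;
                long failures = 0;

                list_for_each_entry(page, &vepc->page_list, list) {
                        /* sgx_vepc_page_to_addr() is assumed for this sketch. */
                        if (__eremove(sgx_vepc_page_to_addr(page)))
                                failures++;
                }

                /* Nonzero tells userspace another reset pass is needed. */
                return failures;
        }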

My thought for QEMU was to do (bad pseudocode):

        /* Retry to EREMOVE pinned SECS pages if necessary. */
        ret = ioctl(SGX_VEPC_RESET, ...);
        if (ret)
                ret = ioctl(SGX_VEPC_RESET, ...);

        /*
         * Tag the VM as needing yet another round of resets to EREMOVE SECS pages
         * that were pinned across vEPC sections.
         */
        vm->sgx_epc_final_reset_needed = !!ret;
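
[Editor's note: to make pass 3 from the list above concrete, once every section
has had its two per-vEPC resets, the flag set here would be consumed by one more
sweep over all sections to EREMOVE the SECS pages that were pinned by children
in a different vEPC.  A rough sketch, reusing the hypothetical SGX_VEPC_RESET
from the pseudocode above; the fd array and helper name are likewise made up.]

        /*
         * Sketch only: final cross-section pass.  By now all child pages are
         * gone, so any SECS that previously failed because its children lived
         * in another vEPC section should EREMOVE cleanly on this sweep.
         */
        static int sgx_epc_final_reset(int *vepc_fds, int nr_sections)
        {
                int i, failed = 0;

                for (i = 0; i < nr_sections; i++) {
                        /* Nonzero return: some pages still could not be removed. */
                        if (ioctl(vepc_fds[i], SGX_VEPC_RESET))
                                failed++;
                }

                /* Anything that still fails here is unexpected (see point 2 below). */
                return failed;
        }

[So on the next reset, QEMU would do roughly: if (vm->sgx_epc_final_reset_needed)
sgx_epc_final_reset(vepc_fds, nr_sections);]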

> 2) the fourth would actually do nothing, because there would be no pages
> failing the EREMOV'al.
> 
> A hypothetical other SGX client that only uses one vEPC will do the right
> thing with a single pass.


