> It's also possible that QEMU handles failure, but the kernel does two
> passes; then QEMU can just do two passes. The kernel will overall do four
> passes, but:
>
> 1) the second (SECS pinned by children in the same vEPC) would be cheaper
> than a full second pass
The problem is that this would require a list_head (or temp allocations) to track
the SECS pages that failed the first time 'round. For vEPC destruction, the kernel
can use sgx_epc_page.list because it can take the pages off the active/allocated
list, but that's not an option in this case because the presumably-upcoming EPC
cgroup needs to keep pages on the list to handle OOM.
Good point, so yeah: let's go for an ioctl that does full removal, returning the
number of failures. I will try to cobble up a patch unless Kai beats me to it.
Thanks for the quick discussion!
Paolo
The kernel's ioctl/syscall/whatever could return the number of pages that were
not freed, or maybe just -EAGAIN, and userspace could use that to know it needs
to do another reset to free everything.
My thought for QEMU was to do (bad pseudocode):
/* Retry the EREMOVE of pinned SECS pages if necessary. */
ret = ioctl(SGX_VEPC_RESET, ...);
if (ret)
        ret = ioctl(SGX_VEPC_RESET, ...);

/*
 * Tag the VM as needing yet another round of resets to EREMOVE SECS
 * pages that were pinned across vEPC sections.
 */
vm->sgx_epc_final_reset_needed = !!ret;
> 2) the fourth would actually do nothing, because there would be no pages
> failing the EREMOV'al.
>
> A hypothetical other SGX client that only uses one vEPC will do the right
> thing with a single pass.