From: | Xiaoyao Li |
Subject: | Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object |
Date: | Thu, 25 Aug 2022 22:42:34 +0800 |
User-agent: | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 Thunderbird/91.12.0 |
On 8/25/2022 7:36 PM, Gerd Hoffmann wrote:
> On Tue, Aug 02, 2022 at 03:47:25PM +0800, Xiaoyao Li wrote:
>> Bit 28, named SEPT_VE_DISABLE, disables EPT violation conversion to #VE on guest TD access of PENDING pages when set to 1. Some guest OS (e.g., Linux TD guest) may require this bit set as 1. Otherwise refuse to boot.
>
> --verbose please. That somehow doesn't make sense to me. A guest is either TDX-aware (which should be the case for linux 5.19+), or it is not. My expectation would be that guests which are not TDX-aware will be disturbed by any #VE exception, not only the ones triggered by EPT violations. So I'm wondering what this config bit actually is useful for ...
This bit, like the other properties of the tdx-guest object, is meant to be configured for a TD only. At VM creation time, the user decides whether the VM is a TD (TDX VM) or a non-TD (an ordinary VM) by attaching a tdx-guest object or not.
If the VM is created as a TD but the guest kernel is not TDX-capable/-aware, it is doomed to fail to boot.
A TD guest kernel has its own reasons to want SEPT_VE on or off. E.g., the Linux TD guest requires SEPT_VE to be disabled to avoid a #VE in the syscall gap [1]. Frankly speaking, this bit would be better configured by the TD guest kernel itself; however, the current TDX architecture leaves it to the VMM to configure.
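To make the bit handling concrete, here is a minimal C sketch of how a VMM might fold the user's sept-ve-disable choice into the TD attributes, and how a guest could test it. Only the bit position (28) comes from the patch description; the macro and function names are made up for illustration and are not taken from QEMU or the Linux kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 28 of the TD attributes, per the patch description.
 * The macro name is illustrative, not an actual QEMU/Linux identifier. */
#define TD_ATTR_SEPT_VE_DISABLE (UINT64_C(1) << 28)

/* Compile-time sanity check that the mask really is bit 28. */
static_assert(TD_ATTR_SEPT_VE_DISABLE == UINT64_C(0x10000000),
              "SEPT_VE_DISABLE must be bit 28");

/* Hypothetical VMM-side helper: apply the user's choice to the
 * attributes value before the TD is created. */
static uint64_t td_attr_set_sept_ve_disable(uint64_t attributes, bool disable)
{
    if (disable)
        return attributes | TD_ATTR_SEPT_VE_DISABLE;
    return attributes & ~TD_ATTR_SEPT_VE_DISABLE;
}

/* Hypothetical guest-side helper: a guest that requires the bit
 * (as the Linux TD guest does) could refuse to boot when this is false. */
static bool td_attr_sept_ve_disabled(uint64_t attributes)
{
    return (attributes & TD_ATTR_SEPT_VE_DISABLE) != 0;
}
```

With these helpers, td_attr_set_sept_ve_disable(0, true) yields 0x10000000, and td_attr_sept_ve_disabled() on that value returns true. How the guest actually reads its attributes is left out here on purpose.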
[1]: TD pages that are not accepted cause a #VE exception. It is possible for a hypervisor to take away a guest page and thus trigger a #VE the next time it is accessed. Normally the guest would just panic in such a case, but for that it first needs to execute the #VE handler reliably. This can cause problems with the "system call gap": a malicious hypervisor might trigger a #VE, for example, on the system call entry code, so that when a user process does a system call it would trigger a #VE. Since SYSCALL relies on the kernel code to switch to the kernel stack, this would lead to kernel code running on the ring 3 stack. This could be exploited by a combination of a malicious host and a malicious ring 3 program to attack the kernel.
> take care,
>   Gerd