qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-gues


From: Xiaoyao Li
Subject: Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
Date: Thu, 25 Aug 2022 22:42:34 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 Thunderbird/91.12.0

On 8/25/2022 7:36 PM, Gerd Hoffmann wrote:
On Tue, Aug 02, 2022 at 03:47:25PM +0800, Xiaoyao Li wrote:
Bit 28, named SEPT_VE_DISABLE, disables EPT violation conversion to #VE
on guest TD access of PENDING pages when set to 1. Some guest OS (e.g.,
Linux TD guest) may require this bit set as 1. Otherwise refuse to boot.

--verbose please.  That somehow doesn't make sense to me.

A guest is either TDX-aware (which should be the case for linux 5.19+),
or it is not.  My expectation would be that guests which are not
TDX-aware will be disturbed by any #VE exception, not only the ones
triggered by EPT violations.  So I'm wondering what this config bit
actually is useful for ...

This bit, including other properties of tdx-guest object, are supposed to be configured for TD only. On VM creation phase, user needs to decide if it's a TD (TDX VM) or non-TD (previous normal VM) by attaching tdx-guest object or not.

If it's a TD when VM creation, but the guest kernel is not TDX-capable/-aware, it's doomed to fail booting.

For TD guest kernel, it has its own reason to turn SEPT_VE on or off. E.g., linux TD guest requires SEPT_VE to be disabled to avoid #VE on syscall gap [1]. Frankly speaking, this bit is better to be configured by TD guest kernel, however current TDX architecture makes the design to let VMM configure.


[1]: TD pages that are not accepted cause a #VE exception.
It is possible for a hypervisor to take away a guest page
and thus trigger a #VE the next time it is accessed.
Normally the guest would just panic in such a case, but
for that it first needs to execute the #VE handler
reliably.

This can cause problems with the "system call gap": a malicious
hypervisor might trigger a #VE for example on the system call entry
code, and when a user process does a system call it would trigger a
and SYSCALL relies on the kernel code to switch to the kernel stack,
this would lead to kernel code running on the ring 3 stack.  This could
be exploited by a combination of malicious host and malicious ring 3
program to attack the kernel.


take care,
   Gerd





reply via email to

[Prev in Thread] Current Thread [Next in Thread]