From: Eiichi Tsukata
Subject: Re: [PATCH] target/i386/kvm: call kvm_put_vcpu_events() before kvm_put_nested_state()
Date: Wed, 1 Nov 2023 02:09:40 +0000

FYI: The EINVAL in vmx_set_nested_state() is caused by the following condition:
* vcpu->arch.hflags == 0
* kvm_state->hdr.vmx.smm.flags == KVM_STATE_NESTED_SMM_VMXON
Please feel free to ask me any more data points you need.
Thanks,
Eiichi
> On Oct 26, 2023, at 17:52, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Cc'ing Max :-) At first glance the condition in vmx_set_nested_state()
> is correct so I guess we either have a stale
> KVM_STATE_NESTED_RUN_PENDING when in SMM or stale smm.flags when outside
> of it...
>
> Philippe Mathieu-Daudé <philmd@linaro.org> writes:
>
>> Cc'ing Vitaly.
>>
>> On 26/10/23 07:49, Eiichi Tsukata wrote:
>>> Hi all,
>>>
>>> Here is additional details on the issue.
>>>
>>> We found this issue while testing Windows Virtual Secure Mode (VSM) VMs.
>>> We sometimes saw live migration failures of VSM-enabled VMs. It turned
>>> out that the failure happens during live migration when the VM changes
>>> boot-related EFI variables (e.g. BootOrder, Boot0001).
>>> After some debugging, I've found the race I mentioned in the commit message.
>>>
>>> Symptom
>>> =======
>>>
>>> When it happens with the latest QEMU, which includes commit
>>> https://github.com/qemu/qemu/commit/7191f24c7fcfbc1216d09,
>>> QEMU shows the following error message on the destination:
>>>
>>> qemu-system-x86_64: Failed to put registers after init: Invalid argument
>>>
>>> If it happens with an older QEMU which doesn't have that commit, we
>>> instead see a CPU dump like this:
>>>
>>> KVM internal error. Suberror: 3
>>> extra data[0]: 0x0000000080000b0e
>>> extra data[1]: 0x0000000000000031
>>> extra data[2]: 0x0000000000000683
>>> extra data[3]: 0x000000007f809000
>>> extra data[4]: 0x0000000000000026
>>> RAX=0000000000000000 RBX=0000000000000000 RCX=0000000000000000
>>> RDX=0000000000000f61
>>> RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000
>>> RSP=0000000000000000
>>> R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000
>>> R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000
>>> R15=0000000000000000
>>> RIP=000000000000fff0 RFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> CS =0038 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
>>> SS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> DS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> FS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> GS =0020 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA]
>>> LDT=0000 0000000000000000 ffffffff 00c00000
>>> TR =0040 000000007f7df050 00068fff 00808b00 DPL=0 TSS64-busy
>>> GDT= 000000007f7df000 0000004f
>>> IDT= 000000007f836000 000001ff
>>> CR0=80010033 CR2=000000000000fff0 CR3=000000007f809000 CR4=00000668
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
>>> DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000d00
>>> Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ??
>>> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
>>> ?? ?? ??
>>>
>>> In the above dump, CR3 is pointing into the SMRAM region even though SMM=0.
>>>
>>> Repro
>>> =====
>>>
>>> The repro steps are pretty simple:
>>>
>>> * Run an SMM-enabled Linux guest with Secure Boot enabled OVMF.
>>> * Run the following script in the guest:
>>>
>>> /usr/libexec/qemu-kvm &
>>> while true
>>> do
>>> efibootmgr -n 1
>>> done
>>>
>>> * Do live migration
>>>
>>> On my environment, live migration fails roughly 20% of the time.
>>>
>>> VMX specific
>>> ============
>>>
>>> This issue is VMX specific; SVM is not affected because the validation
>>> in svm_set_nested_state() differs from the VMX one.
>>>
>>> VMX:
>>>
>>> static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
>>>                                 struct kvm_nested_state __user *user_kvm_nested_state,
>>>                                 struct kvm_nested_state *kvm_state)
>>> {
>>>     ...
>>>     /*
>>>      * SMM temporarily disables VMX, so we cannot be in guest mode,
>>>      * nor can VMLAUNCH/VMRESUME be pending.  Outside SMM, SMM flags
>>>      * must be zero.
>>>      */
>>>     if (is_smm(vcpu) ?
>>>         (kvm_state->flags &
>>>          (KVM_STATE_NESTED_GUEST_MODE |
>>>           KVM_STATE_NESTED_RUN_PENDING))
>>>         : kvm_state->hdr.vmx.smm.flags)
>>>         return -EINVAL;
>>>     ...
>>>
>>> SVM:
>>>
>>> static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>>>                                 struct kvm_nested_state __user *user_kvm_nested_state,
>>>                                 struct kvm_nested_state *kvm_state)
>>> {
>>>     ...
>>>     /* SMM temporarily disables SVM, so we cannot be in guest mode. */
>>>     if (is_smm(vcpu) &&
>>>         (kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE))
>>>         return -EINVAL;
>>>     ...
>>>
>>> Thanks,
>>>
>>> Eiichi
>>>
>>>> On Oct 26, 2023, at 14:42, Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>>>> wrote:
>>>>
>>>> kvm_put_vcpu_events() needs to be called before kvm_put_nested_state()
>>>> because the vCPU's hflags are consulted by KVM's vmx_set_nested_state()
>>>> validation. Otherwise kvm_put_nested_state() can fail with -EINVAL when
>>>> a vCPU is in VMX operation and enters SMM. This leads to live
>>>> migration failure.
>>>>
>>>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
>>>> ---
>>>> target/i386/kvm/kvm.c | 13 +++++++++----
>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>> index e7c054cc16..cd635c9142 100644
>>>> --- a/target/i386/kvm/kvm.c
>>>> +++ b/target/i386/kvm/kvm.c
>>>> @@ -4741,6 +4741,15 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
>>>> return ret;
>>>> }
>>>>
>>>> + /*
>>>> + * must be before kvm_put_nested_state so that HF_SMM_MASK is set during
>>>> + * SMM.
>>>> + */
>>>> + ret = kvm_put_vcpu_events(x86_cpu, level);
>>>> + if (ret < 0) {
>>>> + return ret;
>>>> + }
>>>> +
>>>> if (level >= KVM_PUT_RESET_STATE) {
>>>> ret = kvm_put_nested_state(x86_cpu);
>>>> if (ret < 0) {
>>>> @@ -4787,10 +4796,6 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
>>>> if (ret < 0) {
>>>> return ret;
>>>> }
>>>> - ret = kvm_put_vcpu_events(x86_cpu, level);
>>>> - if (ret < 0) {
>>>> - return ret;
>>>> - }
>>>> if (level >= KVM_PUT_RESET_STATE) {
>>>> ret = kvm_put_mp_state(x86_cpu);
>>>> if (ret < 0) {
>>>> --
>>>> 2.41.0
>>>>
>>>
>>>
>>
>
> --
> Vitaly