[Qemu-devel] Divide error in kvm_unlock_kick()


From: Chris Webb
Subject: [Qemu-devel] Divide error in kvm_unlock_kick()
Date: Mon, 2 Jun 2014 19:11:11 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

Running a 3.14.4 x86-64 SMP guest kernel on qemu-2.0, with kvm enabled and
-cpu host on a 3.14.4 AMD Opteron host, I'm seeing a reliable kernel panic from
the guest shortly after boot. I think it is happening in kvm_unlock_kick() in
the guest kernel paravirt_ops code:

divide error: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1013 Comm: mkdir Not tainted 3.14.4-guest #21
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
task: ffff88007c8cf400 ti: ffff88007c7c6000 task.ti: ffff88007c7c6000
RIP: 0010:[<ffffffff8102ea86>]  [<ffffffff8102ea86>] kvm_unlock_kick+0x69/0x73
RSP: 0000:ffff88007fc83ca8  EFLAGS: 00010046
RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000002
RDX: 0000000000000002 RSI: ffff88007fd11d40 RDI: ffffffff8198f840
RBP: ffff88007fc83cc0 R08: 0000000000000000 R09: ffffffff8198f840
R10: 000000000000b5e0 R11: 0000000000000005 R12: ffff88007fd11d40
R13: 000000000000cec0 R14: ffff88007d382b80 R15: 0000000000000002
FS:  00007f4c6e265700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4c6dc9a080 CR3: 000000007c62e000 CR4: 00000000000406e0
Stack:
 0000000000011d40 ffff88007fd11d40 0000000000000002 ffff88007fc83cd0
 ffffffff815852d0 ffff88007fc83d20 ffffffff810dd694 ffff88007fd00000
 0000000000000046 ffff88007d383172 ffff88007d3abe68 0000000000000003
Call Trace:
 <IRQ>
 [<ffffffff815852d0>] _raw_spin_unlock+0x36/0x5b
 [<ffffffff810dd694>] try_to_wake_up+0x1f4/0x217
 [<ffffffff810dd6f6>] default_wake_function+0xd/0xf
 [<ffffffff810e99f0>] autoremove_wake_function+0xd/0x2f
 [<ffffffff810e944f>] __wake_up_common+0x50/0x7c
 [<ffffffff810e962f>] __wake_up+0x34/0x46
 [<ffffffff810f3b45>] rsp_wakeup+0x1c/0x1e
 [<ffffffff81112e31>] irq_work_run+0x77/0x9b
 [<ffffffff810063e2>] smp_irq_work_interrupt+0x2a/0x31
 [<ffffffff8158739d>] irq_work_interrupt+0x6d/0x80
 [<ffffffff81585336>] ? _raw_spin_unlock_irqrestore+0x41/0x6a
 [<ffffffff810f5402>] rcu_process_callbacks+0x162/0x486
 [<ffffffff810c4140>] ? run_timer_softirq+0x19f/0x1c0
 [<ffffffff810be612>] __do_softirq+0xe1/0x1e9
 [<ffffffff810be8b7>] irq_exit+0x40/0x87
 [<ffffffff810283f1>] smp_apic_timer_interrupt+0x3f/0x4b
 [<ffffffff81586e9d>] apic_timer_interrupt+0x6d/0x80
 <EOI>
Code: c5 40 50 87 81 49 8d 44 0d 00 48 8b 30 4c 39 e6 75 c9 8a 40 08 38 d8 75 
c2 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 <0f> 01 c1 5b 41 5c 41 
5d 5d c3 4c 8d 54 24 08 48 83 e4 f0 b9 0a
RIP  [<ffffffff8102ea86>] kvm_unlock_kick+0x69/0x73
 RSP <ffff88007fc83ca8>
---[ end trace ed563ea2dedc59b5 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 
0xffffffff80000000-0xffffffff9fffffff)
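
As a reading aid for the Code: line in the trace above, here is a minimal Python
sketch that splits the byte dump at the <..> marker (the byte at RIP) and looks
at the instruction on either side. The helper name split_at_rip and the small
KNOWN table are illustrative assumptions. The two encodings in the table are
the standard ones: 0f 01 c1 is vmcall and 0f 01 d9 is vmmcall. Treating the five
bytes before RIP as a "mov eax, imm32" is a heuristic, not a real disassembly.

```python
# Byte dump copied from the "Code:" line of the oops above.
CODE = (
    "c5 40 50 87 81 49 8d 44 0d 00 48 8b 30 4c 39 e6 75 c9 8a 40 08 38 d8 75 "
    "c2 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 <0f> 01 c1 5b 41 5c 41 "
    "5d 5d c3 4c 8d 54 24 08 48 83 e4 f0 b9 0a"
)

# Standard x86 encodings of the two hypercall instructions.
KNOWN = {
    bytes([0x0F, 0x01, 0xC1]): "vmcall",
    bytes([0x0F, 0x01, 0xD9]): "vmmcall",
}


def split_at_rip(code_line):
    """Split an oops Code: dump into (bytes before RIP, bytes from RIP on)."""
    tokens = code_line.split()
    # The kernel marks the byte at RIP by wrapping it in angle brackets.
    idx = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    before = bytes(int(tok, 16) for tok in tokens[:idx])
    after = bytes(int(tok.strip("<>"), 16) for tok in tokens[idx:])
    return before, after


before, after = split_at_rip(CODE)

# Instruction at RIP: compare the first three bytes against the known table.
faulting = KNOWN.get(after[:3], "unknown")

# Heuristic: if the five bytes before RIP start with b8, read them as
# "mov eax, imm32" with a little-endian immediate.
prev = before[-5:]
eax_imm = int.from_bytes(prev[1:], "little") if prev[0] == 0xB8 else None

print(faulting, eax_imm)
```

With the dump above this prints "vmcall 5". The value 5 matches RAX in the
register dump and is the number the kernel headers assign to KVM_HC_KICK_CPU.
Whether the choice of vmcall rather than vmmcall is related to the panic is not
established by this sketch.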

My host kernel config is http://cdw.me.uk/tmp/host-config.txt and the guest
config is http://cdw.me.uk/tmp/guest-config.txt. The qemu command line is:

 qemu-system-x86_64 -enable-kvm -cpu host -machine q35 -m 2048 -name $1 \
   -smp sockets=1,cores=4 -pidfile /run/$1.pid -runas nobody \
   -serial stdio -vga none -vnc none -kernel /boot/vmlinuz-guest \
   -append "console=ttyS0 root=/dev/vda" \
   -drive file=/dev/guest/$1,cache=none,format=raw,if=virtio \
   -device virtio-net-pci,netdev=nic,mac=$(< /sys/class/net/$1/address) \
   -netdev tap,id=nic,fd=3 3<>/dev/tap$(< /sys/class/net/$1/ifindex)

I can stop this crash by disabling CONFIG_PARAVIRT_SPINLOCKS in my guest
kernel, running with -cpu qemu64 instead of -cpu host, or running with -smp 1
instead of -smp 4. (Removing/changing the -machine q35 makes no difference.)

I tried enabling CONFIG_PARAVIRT_DEBUG, but no extra information was reported.

My CPU flags inside the crashing guest look like this:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl
extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave
avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw xop fma4 tbm arat npt nrip_save tsc_adjust bmi1

whereas in a (working) -cpu qemu64 guest, they look like this:

fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx
fxsr sse sse2 ht syscall nx lm nopl pni cx16 x2apic popcnt hypervisor lahf_lm
cmp_legacy svm abm sse4a

Full /proc/cpuinfo output from the host and guest is at

  http://cdw.me.uk/tmp/host-cpuinfo.txt
  http://cdw.me.uk/tmp/guest-cpuinfo.txt

I thought I'd try to bisect on processor flags to see which were implicated.
The extra flags from -cpu host compared to -cpu qemu64 are:

3dnowprefetch aes arat avx bmi1 cr8_legacy extd_apicid f16c fma fma4
fxsr_opt misalignsse mmxext npt nrip_save osvw pclmulqdq pdpe1gb rep_good
sse4_1 sse4_2 ssse3 tbm tsc_adjust vme xop xsave
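
The list of extra flags can be reproduced mechanically as a set difference. A
minimal Python sketch, using the two flag lists quoted above (the helper name
extra_flags is an illustrative assumption):

```python
# Flags seen inside the crashing -cpu host guest (from /proc/cpuinfo above).
HOST_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl
extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave
avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw xop fma4 tbm arat npt nrip_save tsc_adjust bmi1
""".split()

# Flags seen inside the working -cpu qemu64 guest.
QEMU64_FLAGS = """
fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx
fxsr sse sse2 ht syscall nx lm nopl pni cx16 x2apic popcnt hypervisor lahf_lm
cmp_legacy svm abm sse4a
""".split()


def extra_flags(full, base):
    """Return the flags present in `full` but absent from `base`, sorted."""
    return sorted(set(full) - set(base))


extras = extra_flags(HOST_FLAGS, QEMU64_FLAGS)
print(len(extras))
print(" ".join(extras))
```

For these two lists the difference has 27 entries, and no flag appears only in
the qemu64 guest.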

I can add all of these to -cpu qemu64 with the +FLAG,... syntax, but qemu
doesn't recognise a handful of them:

CPU feature tsc_adjust not found
CPU feature arat not found
CPU feature cr8_legacy not found
CPU feature extd_apicid not found
CPU feature rep_good not found
CPU feature tsc_adjust not found
Failed to access perfctr msr (MSR c0010001 is ffffffffffffffff)
[...]

Doing this results in a working, non-crashing guest, which suggests the
behaviour is triggered by one of tsc_adjust, arat, cr8_legacy, extd_apicid
or rep_good. However, because qemu doesn't recognise the flags, I can't run
with -cpu host,-tsc_adjust,-arat,... to investigate further. :(
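
To make the two experiments concrete, here is a minimal Python sketch that
builds the -cpu arguments from the flag lists above. The helper name cpu_arg is
an illustrative assumption. The +FLAG and -FLAG forms are the ones mentioned in
this mail, and the second string is the one qemu 2.0 rejects here because it
does not know the flag names.

```python
# Extra flags that -cpu host exposes compared to -cpu qemu64 (listed above).
EXTRA = """
3dnowprefetch aes arat avx bmi1 cr8_legacy extd_apicid f16c fma fma4
fxsr_opt misalignsse mmxext npt nrip_save osvw pclmulqdq pdpe1gb rep_good
sse4_1 sse4_2 ssse3 tbm tsc_adjust vme xop xsave
""".split()

# Flags that qemu reported as "CPU feature ... not found".
UNRECOGNISED = {"tsc_adjust", "arat", "cr8_legacy", "extd_apicid", "rep_good"}


def cpu_arg(base, add=(), remove=()):
    """Build a -cpu argument such as 'qemu64,+aes,+avx' or 'host,-arat'."""
    parts = [base]
    parts += ["+" + flag for flag in add]
    parts += ["-" + flag for flag in remove]
    return ",".join(parts)


# Experiment that works: qemu64 plus every extra flag qemu does recognise.
recognised = [flag for flag in EXTRA if flag not in UNRECOGNISED]
print("-cpu", cpu_arg("qemu64", add=recognised))

# Experiment that cannot currently be run: host minus the suspect flags.
print("-cpu", cpu_arg("host", remove=sorted(UNRECOGNISED)))
```

The first string covers 22 of the 27 extra flags, which leaves the five
unrecognised ones as the untested remainder.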

Very happy to do any testing at my end which might help track down what's going
on here.

Best wishes,

Chris.


