qemu-discuss

The linux guest rebooted automatically running after each 2~3 days


From: lixiaofeng li
Subject: The linux guest rebooted automatically running after each 2~3 days
Date: Mon, 26 Dec 2022 19:11:43 -0800

Hello,
   I am running Linux on qemu-kvm. Every 2~3 days the guest reboots on its own, without any error output on the console.
The host CPU is an 'AMD EPYC 7702P 64-Core Processor'.
The emulated CPU info inside the guest is:
root@GPON-8R2:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : QEMU Virtual CPU version 2.5+
stepping        : 3
microcode       : 0x1000065
cpu MHz         : 1999.951
cache size      : 512 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm nopl cpuid pni cx16 x2apic hypervisor lahf_lm 3dnowprefetch vmmcall
bugs            : fxsave_leak sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 3999.90
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

After 2~3 days without any activity, the guest system rebooted. The console logs are below.
1) First reboot after 2~3 days:
root@GPON-8R2:/FLASH/persist/logs# [    0.000000] ACPI BIOS Error (bug): A valid RSDP was not found (20170728/tbxfroot-244)
[    0.109000] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010000 is 530076)
[    0.109852] CALIX: CLEAR mate_int_out on kernel fault
[    0.111061] CALIX: CLEAR mate_int_out on kernel fault
[    0.859274]  Begin Initial RAM boot
udev: ###### TODO Add vfat fsck check #####
e2fsck 1.43.8 (1-Jan-2018)
flash: recovering journal

2) A kernel panic occurred during the reboot above, which triggered another reboot:

created directory: '/etc/schema'
[   38.003278] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[   38.003278] Do you have a strange power saving mode enabled?
[   38.003279] CALX: reset some HW before panic
[   38.003305] Kernel panic - not syncing: NMI: Not continuing
[   38.003307] CPU: 0 PID: 2852 Comm: xmllint Tainted: G           O    4.14.67-yocto-standard #1
[   38.003308] Hardware name: Kontron COMe-bBD6/COMe-bBD6, BIOS 2018.07-00152-g966b4aa 10/12/2018
[   38.003308] Call Trace:
[   38.003318]  dump_stack+0x4d/0x71
[   38.003320]  panic+0xde/0x238
[   38.003322]  ? vprintk_func+0x2e/0x60
[   38.003323]  nmi_panic+0x39/0x40
[   38.003324]  unknown_nmi_error+0x77/0x90
[   38.003325]  default_do_nmi+0xdd/0x100
[   38.003326]  do_nmi+0xe0/0x130
[   38.003328]  nmi+0x8b/0xd4
[   38.003329] RIP: 0033:0x36ab2eec74
[   38.003330] RSP: 002b:00007ffcbca1cd20 EFLAGS: 00000216
[   38.003331] RAX: 000000000005729d RBX: 0000000001841020 RCX: 0000000000c3455e
[   38.003331] RDX: 0000000000005e60 RSI: 0000000000000015 RDI: 0000000000c3455e
[   38.003332] RBP: 00000000008106e0 R08: 0000000000000003 R09: 0000000000000009
[   38.003332] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000620f0cf3
[   38.003333] R13: 0000000000810a50 R14: 0000000000000015 R15: 00007ffcbca1edae
[   38.005740] Kernel Offset: 0x37200000 from 0xffffffff80200000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
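
For reference, the "reason 21" in the panic above is printed in hex and is the value read from I/O port 0x61. The following is a minimal sketch of how the x86 NMI path classifies that value, assuming the SERR (0x80) and IOCHK (0x40) masks from the kernel's asm/mach_traps.h. The function name classify_nmi_reason is illustrative only and is not a kernel API.

```python
# Masks assumed from the Linux x86 header asm/mach_traps.h.
NMI_REASON_SERR = 0x80   # PCI system error
NMI_REASON_IOCHK = 0x40  # I/O channel check


def classify_nmi_reason(reason: int) -> str:
    """Mirror the kernel's ordering: SERR first, then IOCHK, else unknown."""
    if reason & NMI_REASON_SERR:
        return "serr"
    if reason & NMI_REASON_IOCHK:
        return "iochk"
    return "unknown"


# The log prints the reason in hexadecimal ("unknown reason 21").
reason = int("21", 16)
print(hex(reason), classify_nmi_reason(reason))
```

Neither of the two known bits is set in 0x21, which is consistent with the kernel taking the unknown_nmi_error path shown in the call trace.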

The qemu cpu_reset dump is below:
[root@e7-gpon8r2_11 qemu]# cat qemu-log.txt
CPU Reset (CPU 0)
RAX=00000000fffffffe RBX=0000000000010000 RCX=000c77334b86101c RDX=00000000000186b4
RSI=0000000000000000 RDI=00000000000186aa RBP=ffffb2420022be30 RSP=ffffb2420022be00
R8 =0000000000000000 R9 =0000000000000018 R10=000000000000007b R11=2e2e73646e6f6365
R12=000000000000000a R13=00000000fffffffe R14=ffffffffb8209f80 R15=0000000000000000
RIP=ffffffffb743ca3d RFL=00000016 [----AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 00000000 00000000
DS =0000 0000000000000000 00000000 00000000
FS =0000 00007f909023a740 00000000 00000000
GS =0000 ffff9d4a37c00000 00000000 00000000
LDT=0000 fffffe0000000000 00000000 00000000
TR =0040 fffffe0000003000 0000206f 00008b00 DPL=0 TSS64-busy
GDT=     fffffe0000001000 0000007f
IDT=     fffffe0000000000 00000fff
CR0=80050033 CR2=0000000001841018 CR3=00000000329b0000 CR4=000006f0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000000 CCD=0000000000000000 CCO=DYNAMIC
EFER=0000000000000d01
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=c748f27200000000 401d FPR7=f000000000000000 4002
XMM00=00000000000000000000000000000000 XMM01=0000000000000000ffffffffffffffff
XMM02=72637374696e6900656c626168637461 XMM03=00637374696e6900656c626168637461
XMM04=706d695f747261747365720074706972 XMM05=767265746e695f747261747365725f78
XMM06=2020202020202020202020200a3e6c61 XMM07=6e69733c202020202020202020202020
XMM08=2020202020202020202020200a3e6576 XMM09=00000000000000000000000000000000
XMM10=00000000000000000000000000000000 XMM11=00000000000000000000000000000000
XMM12=00000000000000000000000000000000 XMM13=00000000000000000000000000000000
XMM14=00000000000000000000000000000000 XMM15=00000000000000000000000000000000
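
Several of the XMM registers in this dump look like ASCII text. A small sketch for decoding them, assuming QEMU prints each 128-bit register most significant byte first, so the bytes must be reversed to get memory order. The helper name xmm_to_ascii is made up for this example.

```python
def xmm_to_ascii(hex_value: str) -> str:
    """Decode one XMMnn=<32 hex digits> value into printable ASCII.

    Assumes the dump prints the most significant byte first, so the
    bytes are reversed to recover the order they had in memory.
    Non-printable bytes are shown as '.'.
    """
    raw = bytes.fromhex(hex_value)[::-1]
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Values copied from the CPU reset dump above.
for name, value in [
    ("XMM02", "72637374696e6900656c626168637461"),
    ("XMM04", "706d695f747261747365720074706972"),
    ("XMM05", "767265746e695f747261747365725f78"),
]:
    print(name, repr(xmm_to_ascii(value)))
```

If the decoded strings match data handled by a known process, that can help tie the reset dump to a particular moment in the console log.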

I have only just started working with QEMU virtualization, so my questions may be basic. I would appreciate it if someone could answer them.
1. How can I debug the first reboot, given that nothing is printed before it happens?
2. For the error on the second reboot, I found the fix below, but it does not seem to work:
https://www.spinics.net/lists/kvm/msg297192.html
3. In the QEMU CPU dump, what does 'RIP=ffffffffb743ca3d' mean? Is it an address in the guest OS? How can I find the code location that this address corresponds to?
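
One possible way to approach the RIP question, sketched under assumptions: if the CPU reset dump comes from the same boot as the panic log, the "Kernel Offset: 0x37200000" line gives the KASLR shift, and subtracting it from RIP yields the link-time address that can be looked up in the guest kernel's System.map (or passed to addr2line with a vmlinux that has debug info). The symbol names and addresses in the sample map below are hypothetical placeholders, apart from the 0xffffffff80200000 base taken from the log; the helper names are made up for this example.

```python
import bisect


def unrelocate(runtime_addr: int, kaslr_offset: int) -> int:
    """Convert a runtime kernel address back to its link-time address."""
    return runtime_addr - kaslr_offset


def parse_system_map(text: str):
    """Parse System.map-style lines ('<hex addr> <type> <name>'), sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    return sorted(symbols)


def lookup(symbols, addr: int):
    """Return (symbol name, offset into symbol) for the nearest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


# RIP from the CPU reset dump, offset from the "Kernel Offset" log line.
link_addr = unrelocate(0xFFFFFFFFB743CA3D, 0x37200000)
print(hex(link_addr))

# Hypothetical System.map excerpt, for illustration only.
sample_map = """\
ffffffff80200000 T _text
ffffffff8023c000 T hypothetical_func_a
ffffffff8023d000 T hypothetical_func_b
"""
print(lookup(parse_system_map(sample_map), link_addr))
```

With the real System.map from the guest's kernel build in place of sample_map, the same lookup would name the function containing the address, provided the offset assumption holds.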




