[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] spapr: Migrate CAS reboot flag
From: |
Greg Kurz |
Subject: |
Re: [PATCH] spapr: Migrate CAS reboot flag |
Date: |
Fri, 17 Jan 2020 16:49:14 +0100 |
On Fri, 17 Jan 2020 13:10:35 +0100
Laurent Vivier <address@hidden> wrote:
> On 17/01/2020 12:49, Greg Kurz wrote:
> > On Wed, 15 Jan 2020 19:26:18 +0100
> > Laurent Vivier <address@hidden> wrote:
> >
> >> On 15/01/2020 19:10, Laurent Vivier wrote:
> >>> Hi,
> >>>
> >>> On 15/01/2020 18:48, Greg Kurz wrote:
> >>>> Migration can potentially race with CAS reboot. If the migration thread
> >>>> completes migration after CAS has set spapr->cas_reboot but before the
> >>>> mainloop could pick up the reset request and reset the machine, the
> >>>> guest is migrated unrebooted and the destination doesn't reboot it
> >>>> either because it isn't aware a CAS reboot was needed (eg, because a
> >>>> device was added before CAS). This likely result in a broken or hung
> >>>> guest.
> >>>>
> >>>> Even if it is small, the window between CAS and CAS reboot is enough to
> >>>> re-qualify spapr->cas_reboot as state that we should migrate. Add a new
> >>>> subsection for that and always send it when a CAS reboot is pending.
> >>>> This may cause migration to older QEMUs to fail but it is still better
> >>>> than end up with a broken guest.
> >>>>
> >>>> The destination cannot honour the CAS reboot request from a post load
> >>>> handler because this must be done after the guest is fully restored.
> >>>> It is thus done from a VM change state handler.
> >>>>
> >>>> Reported-by: Lukáš Doktor <address@hidden>
> >>>> Signed-off-by: Greg Kurz <address@hidden>
> >>>> ---
> >>>>
> >>>
> >>> I'm wondering if the problem can be related with the fact that
> >>> main_loop_should_exit() could release qemu_global_mutex in
> >>> pause_all_vcpus() in the reset case?
> >>>
> >>> 1602 static bool main_loop_should_exit(void)
> >>> 1603 {
> >>> ...
> >>> 1633 request = qemu_reset_requested();
> >>> 1634 if (request) {
> >>> 1635 pause_all_vcpus();
> >>> 1636 qemu_system_reset(request);
> >>> 1637 resume_all_vcpus();
> >>> 1638 if (!runstate_check(RUN_STATE_RUNNING) &&
> >>> 1639 !runstate_check(RUN_STATE_INMIGRATE)) {
> >>> 1640 runstate_set(RUN_STATE_PRELAUNCH);
> >>> 1641 }
> >>> 1642 }
> >>> ...
> >>>
> >>> I already sent a patch for this kind of problem (in current Juan pull
> >>> request):
> >>>
> >>> "runstate: ignore finishmigrate -> prelaunch transition"
> >>>
> >>> but I don't know if it could fix this one.
> >>
> >> I think it should be interesting to have the state transition on source
> >> and destination when the problem occurs (with something like "-trace
> >> runstate_set").
> >>
> >
> > With "-serial mon:stdio -trace runstate_set -trace -trace guest_cpu_reset" :
> >
> > OF stdout device is: /vdevice/vty@71000000
> > Preparing to boot Linux version 4.18.0-80.el8.ppc64le (address@hidden) (gcc
> > version 8.2.1 20180905 (Red Hat 8.2.1-3) (GCC)) #1 SMP Wed Mar 13 11:26:21
> > UTC 2019
> > Detected machine type: 0000000000000101
> > command line: BOOT_IMAGE=/boot/vmlinuz-4.18.0-80.el8.ppc64le
> > root=UUID=012b83a5-2594-48ac-b936-12fec7cdbb9a ro console=ttyS0
> > console=ttyS0,115200n8 no_timer_check net.ifnames=0 crashkernel=auto
> > Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
> > Calling ibm,client-architecture-support.
> >
> > Migration starts here.
> >
> > ..qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable:
> > IRQ_XIVE capability must be present for KVM
> > Falling back to kernel-irqchip=off
> >
> > This ^^ indicates that CAS was called and switched to XIVE, for which
> > we lack proper KVM support on GA boston machines.
> >
> > 23348@1579260982.315795:runstate_set current_run_state 9 (running)
> > new_state 7 (finish-migrate)
> > 23348@1579260982.360821:runstate_set current_run_state 7 (finish-migrate)
> > new_state 5 (postmigrate)
> >
> > The migration thread is holding the global QEMU mutex at this point. It
> > has stopped all CPUs. It now streams the full state to the destination
> > before releasing the mutex.
> >
> > 23340@1579260982.797279:guest_cpu_reset cpu=0xf9dbb48a5e0
> > 23340@1579260982.797319:guest_cpu_reset cpu=0xf9dbb4d56a0
> >
> > The main loop regained control and could process the CAS reboot request
> > but it is too late...
> >
> > 23340@1579260982.866071:runstate_set current_run_state 5 (postmigrate)
> > new_state 6 (prelaunch)
>
> Thank you Greg.
>
> So I think the best we can do is to migrate cas_reboot.
>
> To delay the H_CAS call would be cleaner but I don't know if H_BUSY is a
> valid return state and this forces to update SLOF too.
>
Since SLOF is currently the only user of H_CAS, I guess we have some
flexibility with the valid return codes... but anyway, David doesn't
like the idea :)
> Reviewed-by: Laurent Vivier <address@hidden>
>
> Thanks,
> Laurent
>
- [PATCH] spapr: Migrate CAS reboot flag, Greg Kurz, 2020/01/15
- Re: [PATCH] spapr: Migrate CAS reboot flag, Greg Kurz, 2020/01/16
- Re: [PATCH] spapr: Migrate CAS reboot flag, Laurent Vivier, 2020/01/16
- Re: [PATCH] spapr: Migrate CAS reboot flag, Greg Kurz, 2020/01/16
- Re: [PATCH] spapr: Migrate CAS reboot flag, Greg Kurz, 2020/01/16
- Re: [PATCH] spapr: Migrate CAS reboot flag, David Gibson, 2020/01/17
- Re: [PATCH] spapr: Migrate CAS reboot flag, Greg Kurz, 2020/01/17
- Re: [PATCH] spapr: Migrate CAS reboot flag, Greg Kurz, 2020/01/20
- Re: [PATCH] spapr: Migrate CAS reboot flag, David Gibson, 2020/01/20
- Re: [PATCH] spapr: Migrate CAS reboot flag, Greg Kurz, 2020/01/21
- Re: [PATCH] spapr: Migrate CAS reboot flag, David Gibson, 2020/01/22