qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PULL 16/45] vl: Add option to avoid stopping VM upon guest panic


From: Daniel P . Berrangé
Subject: Re: [PULL 16/45] vl: Add option to avoid stopping VM upon guest panic
Date: Wed, 20 Jan 2021 13:54:59 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

On Wed, Jan 20, 2021 at 12:28:14AM -0500, Alejandro Jimenez wrote:
> 
> 
> On 1/19/2021 4:34 PM, Peter Maydell wrote:
> > On Tue, 15 Dec 2020 at 18:11, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> > > 
> > > The current default action of pausing a guest after a panic event
> > > is received leaves the responsibility to resume guest execution to the
> > > management layer. The reasons for this behavior are discussed here:
> > > https://lore.kernel.org/qemu-devel/52148F88.5000509@redhat.com/
> > > 
> > > However, in instances like the case of older guests (Linux and
> > > Windows) using a pvpanic device but missing support for the
> > > PVPANIC_CRASHLOADED event, and Windows guests using the hv-crash
> > > enlightenment, it is desirable to allow the guests to continue
> > > running after sending a PVPANIC_PANICKED event. This allows such
> > > guests to proceed to capture a crash dump and automatically reboot
> > > without intervention of a management layer.
> > > 
> > > Add an option to avoid stopping a VM after a panic event is received,
> > > by passing:
> > > 
> > > -action panic=none
> > > 
> > > in the command line arguments, or during runtime by using an upcoming
> > > QMP command.
> > Hi. This commit message doesn't say it's changing the default
> > action, but the change does:
> > 
> > > @@ -3899,6 +3899,8 @@ DEF("action", HAS_ARG, QEMU_OPTION_action,
> > >       "                   action when guest reboots [default=none]\n"
> > >       "-action shutdown=poweroff|pause\n"
> > >       "                   action when guest shuts down 
> > > [default=poweroff]\n"
> > > +    "-action panic=poweroff|pause|none\n"
> > > +    "                   action when guest panics [default=poweroff]\n"
> > >       "-action 
> > > watchdog=reset|shutdown|poweroff|inject-nmi|pause|debug|none\n"
> > >       "                   action when watchdog fires [default=reset]\n",
> > >       QEMU_ARCH_ALL)
> > >   RebootAction reboot_action = REBOOT_ACTION_NONE;
> > >   ShutdownAction shutdown_action = SHUTDOWN_ACTION_POWEROFF;
> > > +PanicAction panic_action = PANIC_ACTION_POWEROFF;
> > We used to default to 'pause' and now we default to 'poweroff'.
> Hi Peter.
> 
> My rationale for setting the panic action to 'poweroff' was to keep the
> default behavior of QEMU when '-no-shutdown' is not specified, and a panic
> occurs. I believe that in order to accomplish that, the default panic action
> should still be 'poweroff', but as you point out there is an instance where
> the behavior changes. Specifically, when '-no-shutdown' is not used there is
> now one fewer QMP event issued when a guest panic is detected, before
> stopping the VM and powering off.
> 
> I tried to account for this scenario in the original patches, but I failed
> to catch the problem after the rebase when the changes were merged. I'll
> test and send a fix for this issue in the next few days.
> 
> > 
> > We noticed this because it broke an in-flight test case for
> > the pvpanic-pci device from Mihai (which was expecting to see
> > the device in 'pause' state and found it was now in 'poweroff').
> The test is just checking for the arrival of the QMP event, and not actually
> expecting the VM to be paused, correct? I see that if a test/management app
> is expecting to receive a GUEST_PANICKED event with the specific 'pause'
> action, then it might be confused. But any such tests would only be able to
> check for the arrival of the QMP event, and not actually expect to issue any
> commands to a paused VM, since the next block of code in QEMU immediately
> powers off and shutdowns when '-no-shutdown' is not requested. This was the
> typical behavior before the patches.
> 
> > Test cases aren't very exciting, but was it really intentional
> > to change the default behaviour?
> My intention was to preserve the default behavior. Perhaps Paolo wanted to
> reduce the number of GUEST_PANICKED events by removing the one with 'pause'
> action? You could consider it superfluous since it is immediately followed
> by another indicating the 'poweroff' action... Unless I hear otherwise from
> either of you, I'll work on a fix to keep the same number and type of events
> sent.

Why would pause be immediately followed by poweroff ? These are independant
actions, and the mgmt app should be deciding what todo next after the
pause action. It may wish to capture a guest crash image before poweroff.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]