Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux


From: Markus Armbruster
Date: Tue, 30 Aug 2022 08:32:22 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Please excuse my late reply; I was on vacation.

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Aug 09, 2022 at 08:40:24AM +0200, Claudio Imbrenda wrote:
>> This patch adds support for asynchronously tearing down a VM on Linux.
>> 
>> When qemu terminates, either naturally or because of a fatal signal,
>> the VM is torn down. If the VM is huge, it can take a considerable
>> amount of time for it to be cleaned up. In case of a protected VM, it
>> might take even longer than a non-protected VM (this is the case on
>> s390x, for example).
>> 
>> Some users might want to shut down a VM and restart it immediately,
>> without having to wait. This is especially true if management
>> infrastructure like libvirt is used.
>> 
>> This patch implements a simple trick on Linux to allow qemu to return
>> immediately, with the teardown of the VM being performed
>> asynchronously.
>> 
>> If the new commandline option -async-teardown is used, a new process is
>> spawned from qemu at startup, using the clone syscall, in such a way
>> that it will share its address space with qemu.
>> 
>> The new process will have the name "cleanup/<QEMU_PID>". It will wait
>> until qemu terminates, and then it will exit itself.
>> 
>> This allows qemu to terminate quickly, without having to wait for the
>> whole address space to be torn down. The teardown process will exit
>> after qemu, so it will be the last user of the address space, and
>> therefore it will take care of the actual teardown.
>> 
>> The teardown process will share the same cgroups as qemu, so both
>> memory usage and cpu time will be accounted properly.
>> 
>> This feature can already be used with libvirt by adding the following
>> to the XML domain definition to pass the parameter to qemu directly:
>> 
>>   <commandline xmlns="http://libvirt.org/schemas/domain/qemu/1.0">
>>   <arg value='-async-teardown'/>
>>   </commandline>
>> 
>> More advanced interfaces like pidfd or close_range have intentionally
>> been avoided in order to be more compatible with older kernels.
>> 
>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
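
To spell out the mechanism described above: here is a minimal sketch of
the trick as the commit message describes it, not the code from the
patch.  The names spawn_teardown_helper(), teardown_helper(), main_pid,
helper_name and HELPER_STACK_SIZE are made up for illustration, and
polling getppid() is only one simple way to notice that the main process
is gone.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stack size for the helper; it does very little work. */
#define HELPER_STACK_SIZE (64 * 1024)

/* Filled in by the main process before clone(); the helper only reads them. */
static pid_t main_pid;
static char helper_name[16];    /* PR_SET_NAME uses at most 16 bytes */

/*
 * Body of the helper.  It shares memory with the main process, so it
 * restricts itself to thin system call wrappers.
 */
static int teardown_helper(void *opaque)
{
    const struct timespec poll_interval = {
        .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000
    };
    sigset_t all;

    (void)opaque;

    /* Block signals aimed at the whole process group, e.g. Ctrl-C. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, NULL);

    /* Shows up as "cleanup/<PID of main process>" in ps and top. */
    prctl(PR_SET_NAME, (unsigned long)helper_name);

    /*
     * Once the main process is gone, the helper is reparented and
     * getppid() stops returning main_pid.
     */
    while (getppid() == main_pid) {
        nanosleep(&poll_interval, NULL);
    }

    /* Last user of the address space: exiting here tears it down. */
    _exit(0);
}

/* Returns the helper's PID, or -1 on failure. */
pid_t spawn_teardown_helper(void)
{
    char *stack = mmap(NULL, HELPER_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) {
        return -1;
    }

    main_pid = getpid();
    snprintf(helper_name, sizeof(helper_name), "cleanup/%d", (int)main_pid);

    /*
     * CLONE_VM: share the address space.  No CLONE_THREAD, so the helper
     * is a separate process that outlives the main process.  The stack
     * grows down on most architectures, so pass its top.
     */
    return clone(teardown_helper, stack + HELPER_STACK_SIZE, CLONE_VM, NULL);
}
```

The main process would call spawn_teardown_helper() once at startup and
then carry on as usual.  The sketch leaves the helper's copy of the file
descriptor table open; a real implementation would presumably want to
close those descriptors so the helper does not keep files and sockets
alive.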

[...]

>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 3f23a42fa8..d434353159 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -4743,6 +4743,23 @@ HXCOMM Internal use
>>  DEF("qtest", HAS_ARG, QEMU_OPTION_qtest, "", QEMU_ARCH_ALL)
>>  DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL)
>>  
>> +#ifdef __linux__
>> +DEF("async-teardown", 0, QEMU_OPTION_asyncteardown,
>> +    "-async-teardown enable asynchronous teardown\n",
>> +    QEMU_ARCH_ALL)
>> +#endif
>> +SRST
>> +``-async-teardown``
>> +    Enable asynchronous teardown. A new teardown process will be
>> +    created at startup, using clone. The teardown process will share
>> +    the address space of the main qemu process, and wait for the main
>> +    process to terminate. At that point, the teardown process will
>> +    also exit. This allows qemu to terminate quickly if the guest was
>> +    huge, leaving the teardown of the address space to the teardown
>> +    process. Since the teardown process shares the same cgroups as the
>> +    main qemu process, accounting is performed correctly.
>> +ERST
>> +
>>  DEF("msg", HAS_ARG, QEMU_OPTION_msg,
>>      "-msg [timestamp[=on|off]][,guest-name=[on|off]]\n"
>>      "                control error message format\n"
>
> It occurs to me that we've got a general goal of getting away from
> adding new top level command line arguments. Most of the time there's
> an obvious existing place to put them, but I'm really not sure
> where this particular option would fit ?
>
> It isn't tied to any aspect of the VM backend configuration nor
> hardware frontends.
>
> The closest match is the lifecycle action options (such as
> -no-shutdown), which were merged into a -action arg, but even that's
> a bit of a stretch.

If I understand the proposed new option correctly, it modifies how QEMU
terminates, independent of why it terminates.  Could be guest reboot
with -action reboot=shutdown, monitor command quit, SIGTERM, ...

I agree putting it under -action would be a bit of a stretch, as so far
-action is entirely about configuring the reaction to certain guest
actions:

    -action reboot=reset|shutdown
                       action when guest reboots [default=reset]
    -action shutdown=poweroff|pause
                       action when guest shuts down [default=poweroff]
    -action panic=pause|shutdown|exit-failure|none
                       action when guest panics [default=shutdown]
    -action watchdog=reset|shutdown|poweroff|inject-nmi|pause|debug|none
                       action when watchdog fires [default=reset]

A different stretch: -daemonize, -runas, -chroot.  These modify how QEMU
starts.  They too are "top-level".

> Markus/Paolo:  do you have suggestions ?

Ramblings^WThoughts, not actionable suggestions, I'm afraid.

[...]



