qemu-devel

Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux


From: Murilo Opsfelder Araújo
Subject: Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
Date: Fri, 12 Aug 2022 08:38:59 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0 Thunderbird/91.12.0

On 8/12/22 04:26, Claudio Imbrenda wrote:
On Thu, 11 Aug 2022 23:05:52 -0300
Murilo Opsfelder Araújo <muriloo@linux.ibm.com> wrote:

On 8/11/22 11:02, Daniel P. Berrangé wrote:
[...]
Hmm, I was hoping you could just use SIGKILL to guarantee that this
gets killed off.  Is SIGKILL delivered too soon to allow for the
main QEMU process to have exited quickly ?

yes, I tried. qemu has not finished exiting when the signal is
delivered, so the cleanup process dies before qemu, which defeats the
purpose

Ok, too bad.

If so I wonder what happens when systemd just delivers SIGKILL to
all processes in the cgroup - I'm not sure there's a guarantee it
will SIGKILL the main qemu before it SIGKILLs this helper

I'm afraid in that case there is no guarantee.

for what it's worth, both virsh shutdown and destroy seem to do things
properly.

Hmm, probably because libvirt tells QEMU to exit before systemd comes
along and tells everything in the cgroup to die with SIGKILL.

It seems Libvirt sends SIGKILL if the qemu process doesn't terminate within 10
seconds of Libvirt sending SIGTERM:

https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375
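For illustration, the SIGTERM-then-SIGKILL escalation described above can be sketched roughly as follows. This is not Libvirt's code: the function name terminate_with_grace, the 100 ms polling step and the return convention are assumptions made up for this example, and the grace period is a parameter rather than a fixed 10 seconds.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: send SIGTERM, poll for up to grace_ms milliseconds,
 * then fall back to SIGKILL.
 * Returns 0 if the process went away after SIGTERM, 1 if SIGKILL had to be
 * sent, -1 on any other error.
 */
static int terminate_with_grace(pid_t pid, int grace_ms)
{
    const struct timespec step = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    int waited_ms = 0;

    if (kill(pid, SIGTERM) < 0) {
        return errno == ESRCH ? 0 : -1;
    }

    while (waited_ms < grace_ms) {
        /* Signal 0 only probes whether the pid still exists. */
        if (kill(pid, 0) < 0 && errno == ESRCH) {
            return 0;
        }
        nanosleep(&step, NULL);
        waited_ms += 100;
    }

    /* Grace period expired: SIGKILL cannot be caught or ignored. */
    if (kill(pid, SIGKILL) < 0) {
        return errno == ESRCH ? 0 : -1;
    }
    return 1;
}
```

Note that probing with signal 0 still succeeds for a zombie, so this sketch only fits a caller that is not the parent of the target (or that has children reaped automatically).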

but this is fine.

with asynchronous teardown, qemu will exit almost immediately when
receiving SIGTERM, and the cleanup process will start cleaning up.

Under normal and orderly conditions, yes.

So I guess this patch happened to work with Libvirt because the main qemu
process terminated before the timeout and before SIGKILL was delivered.

it seems so


The cleanup process is trying to solve the problem where the main qemu process
takes too long to terminate. However, if the cleanup process itself takes too
long, SIGKILL will be sent by Libvirt anyway.

but that is not a problem, the sole purpose of the cleanup process is
to terminate _after_ qemu. it doesn't matter what happens after qemu
has terminated. if you look at the patch, after going to great lengths
to ensure that qemu has terminated, all the child process does is
_exit(0).


Perhaps we can describe this situation in the parameter help, e.g.: if the
management layer decides to send SIGKILL (e.g. due to a timeout or a deliberate
decision), the cleanup process can exit before the main process, defeating its
purpose.

if the management layer (or the user) decides to send SIGKILL
immediately to the whole cgroup without sending SIGTERM first, then
this whole asynchronous teardown mechanism is defeated, yes.

This is the situation we likely want to describe in the parameter help. I don't
want to give users the false impression that this option will *always* behave
the way we expect it to behave *most* of the time.

--
Murilo


