qemu-devel

Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux


From: Claudio Imbrenda
Subject: Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
Date: Fri, 12 Aug 2022 13:45:27 +0200

On Fri, 12 Aug 2022 08:38:59 -0300
Murilo Opsfelder Araújo <muriloo@linux.ibm.com> wrote:

> On 8/12/22 04:26, Claudio Imbrenda wrote:
> > On Thu, 11 Aug 2022 23:05:52 -0300
> > Murilo Opsfelder Araújo <muriloo@linux.ibm.com> wrote:
> >  
> >> On 8/11/22 11:02, Daniel P. Berrangé wrote:
> >> [...]  
> >>>>> Hmm, I was hoping you could just use SIGKILL to guarantee that this
> >>>>> gets killed off.  Is SIGKILL delivered too soon to allow for the
> >>>>> main QEMU process to have exited quickly ?  
> >>>>
> >>>> yes, I tried. qemu has not finished exiting when the signal is
> >>>> delivered, the cleanup process dies before qemu, which defeats the
> >>>> purpose  
> >>>
> >>> Ok, too bad.
> >>>  
> >>>>> If so I wonder what happens when systemd just delivers SIGKILL to
> >>>>> all processes in the cgroup - I'm not sure there's a guarantee it
> >>>>> will SIGKILL the main qemu before it SIGKILLs this helper  
> >>>>
> >>>> I'm afraid in that case there is no guarantee.
> >>>>
> >>>> for what it's worth, both virsh shutdown and destroy seem to do things
> >>>> properly.  
> >>>
> >>> Hmm, probably because libvirt tells QEMU to exit before systemd comes
> >>> along and tells everything in the cgroup to die with SIGKILL.  
> >>
> >> It seems Libvirt sends SIGKILL if qemu process doesn't terminate within 10
> >> seconds after Libvirt sent SIGTERM:
> >>
> >> https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375
> >>   
> >
> > but this is fine.
> >
> > with asynchronous teardown, qemu will exit almost immediately when
> > receiving SIGTERM, and the cleanup process will start cleaning up.  
> 
> Under normal and orderly conditions, yes.
> 
> >> So I guess this patch happened to work with Libvirt because the main qemu
> >> process terminated before the timeout and before SIGKILL was delivered.  
> >
> > it seems so
> >  
> >>
> >> The cleanup process is trying to solve the problem where the main qemu
> >> process takes too long to terminate. However, if the cleanup process
> >> itself takes too long, SIGKILL will be sent by Libvirt anyway.
> >
> > but that is not a problem, the sole purpose of the cleanup process is
> > to terminate _after_ qemu. it doesn't matter what happens after qemu
> > has terminated. if you look at the patch, after going to great lengths
> > to assure that qemu has terminated, all the child process does is
> > _exit(0).
> >  
> >>
> >> Perhaps we can describe this situation in the parameter help, e.g.: If
> >> the management layer decides to send SIGKILL (e.g.: due to timeout or
> >> deliberate decision), the cleanup process can exit before the main
> >> process, defeating its purpose.
> >
> > if the management layer (or the user) decides to send SIGKILL
> > immediately to the whole cgroup without sending SIGTERM first, then
> > this whole asynchronous teardown mechanism is defeated, yes.  
> 
> This situation is what we likely want to describe in the parameter help. I
> don't want to give users the false impression that this option will *always*
> behave in the manner we expect it to *most* of the time.

fair enough, I'll improve the documentation
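
for illustration only, the SIGTERM-then-SIGKILL escalation discussed above
can be sketched roughly as below. this is a generic sketch of the pattern,
not libvirt's or qemu's actual code: the function name
terminate_with_escalation and the grace_seconds parameter are made up for
the example, and it targets a single process rather than a whole cgroup.

```python
import signal
import subprocess


def terminate_with_escalation(proc, grace_seconds):
    """Ask a child process to exit, then force it if it does not.

    Hypothetical helper: sends SIGTERM, waits up to grace_seconds for the
    process to exit, and falls back to SIGKILL on timeout. Returns the
    name of the last signal that was sent.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        # Orderly case: the process exits within the grace period.
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The process did not exit in time; SIGKILL cannot be caught,
        # blocked, or ignored, so no cleanup code runs in the target.
        proc.kill()
        proc.wait()
        return "SIGKILL"
```

in the orderly case the process exits within the grace period and SIGKILL
is never sent. the problematic case from this thread is the one where
SIGKILL goes to everything in the cgroup at once, with no SIGTERM first.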

> 
> --
> Murilo



