qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] coroutine-sigaltstack: Keep SIGUSR2 handler up


From: Laszlo Ersek
Subject: Re: [PATCH] coroutine-sigaltstack: Keep SIGUSR2 handler up
Date: Mon, 25 Jan 2021 22:13:05 +0100

On 01/23/21 23:13, Paolo Bonzini wrote:
> On 22/01/21 22:26, Laszlo Ersek wrote:
>> That seems bogus, per POSIX, given that all signals except SIGUSR2 are
>> included in the mask passed to sigsuspend().
> 
> What happens if you get a SIGSTOP at exactly the wrong time?  (Yeah I
> know how incredibly unlikely that would be).

Nothing; it is impossible to ignore or catch SIGSTOP:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

And therefore it is also impossible to block it (but an attempt to block
it will not cause issues); see e.g.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigsuspend.html

("It is not possible to block signals that cannot be ignored. This is
enforced by the system without causing an error to be indicated.")

So the process will simply stop executing.


The more interesting question is SIGCONT. SIGCONT will always behave
like it should:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

The interesting part is that it's possible to install a handler for
SIGCONT, which will act *in addition* to the normal SIGCONT effect.

Such a handler is actually useful for (e.g.) restarting timers that may
have been thrown off while the process was not scheduled for a while
(due to SIGSTOP).

If there is no handler installed for SIGCONT, nothing particular happens
when it is delivered (the process doesn't notice).

QEMU does seem to install a SIGCONT handler, in "chardev/char-stdio.c".
But that's not a problem (at least with the patch I had attached)
because the mask that sigsuspend() puts in place, temporarily, for
unblocking SIGUSR2, also blocks SIGCONT (together with everything else).
If a STOP/CONT occurs exactly then, then SIGCONT will resume the
process, but the handler for SIGCONT will run precisely *after*
sigsuspend() -- including the SIGUSR2 handling -- returns, and SIGCONT
becomes unblocked by virtue of sigsuspend() restoring the original mask.

(Side comment: I'm not convinced the existent SIGCONT handling in
"chardev/char-stdio.c" is safe, regardless of the topic at hand --
SIGCONT does not seem to be masked for all threads except one, and so
any thread could execute term_stdio_handler(). While the only called
function in that is tcsetattr(), which is async signal safe, I don't
know whether accessing "stdio_echo_state" and "stdio_allow_signal" is
also safe. It may be.)


BTW, the signal(7) Linux manual documents spurious wakeups for
sigtimedwait(2) and sigwaitinfo(2), indeed in connection with SIGCONT.
However, sigsuspend() is not listed in that section (I had checked):

https://man7.org/linux/man-pages/man7/signal.7.html

   Interruption of system calls and library functions by stop signals
       On Linux, even in the absence of signal handlers, certain
       blocking interfaces can fail with the error EINTR after the
       process is stopped by one of the stop signals and then resumed
       via SIGCONT.  This behavior is not sanctioned by POSIX.1, and
       doesn't occur on other systems.

       The Linux interfaces that display this behavior are:

       * "Input" socket interfaces, when a timeout (SO_RCVTIMEO) has
         been set on the socket using setsockopt(2): accept(2), recv(2),
         recvfrom(2), recvmmsg(2) (also with a non-NULL timeout
         argument), and recvmsg(2).

       * "Output" socket interfaces, when a timeout (SO_RCVTIMEO) has
         been set on the socket using setsockopt(2): connect(2),
         send(2), sendto(2), and sendmsg(2), if a send timeout
         (SO_SNDTIMEO) has been set.

       * epoll_wait(2), epoll_pwait(2).

       * semop(2), semtimedop(2).

       * sigtimedwait(2), sigwaitinfo(2).

       * Linux 3.7 and earlier: read(2) from an inotify(7) file
         descriptor

       * Linux 2.6.21 and earlier: futex(2) FUTEX_WAIT,
         sem_timedwait(3), sem_wait(3).

       * Linux 2.6.8 and earlier: msgrcv(2), msgsnd(2).

       * Linux 2.4 and earlier: nanosleep(2).

If other host OSes that QEMU supports have similar non-conformance
notes, and sigsuspend() is affected on them, then removing the loop
around sigsuspend() is unsafe.

I only know where to check POSIX and the Linux manuals.


> BTW if we are in a mood for cleanup, there's no reason to use
> pthread_key_t instead of __thread + qemu_thread_atexit_add (adding a
> Notifier to struct CoroutineThreadState).  That would fix the issue with
> async-signal safety of pthread_getspecific.
> 
> (It makes sense for the function not to be async-signal safe since it
> can in principle allocate memory for the data.  In practice it's most
> likely okay if the function has been called before on this thread).

This is a good idea, but with the patch I had attached, it's not urgent
-- pthread_getspecific is only called from a safe context. Per POSIX,
there is a problem only when an unsafe function is executed while the
signal interrupts an unsafe function:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03

    In the presence of signals, all functions defined by this volume of
    POSIX.1-2017 shall behave as defined when called from or interrupted
    by a signal-catching function, with the exception that when a signal
    interrupts an unsafe function or equivalent (such as the processing
    equivalent to exit() performed after a return from the initial call
    to main()) and the signal-catching function calls an unsafe
    function, the behavior is undefined.

And with the proposed patch, the only function that can be interrupted
by SIGUSR2 is sigsuspend(), and sigsuspend() is listed in the table of
safe functions.

Thanks
Laszlo




reply via email to

[Prev in Thread] Current Thread [Next in Thread]