qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] coroutine-sigaltstack: Keep SIGUSR2 handler up


From: Max Reitz
Subject: Re: [PATCH] coroutine-sigaltstack: Keep SIGUSR2 handler up
Date: Fri, 22 Jan 2021 19:05:12 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0

On 22.01.21 18:09, Laszlo Ersek wrote:
On 01/22/21 11:20, Max Reitz wrote:
Modifying signal handlers is a process-global operation.  When two
threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently,
they may interfere with each other: One of them may revert the SIGUSR2
handler back to the default between the other thread setting up
coroutine_trampoline() as the handler and raising SIGUSR2.  That SIGUSR2
will then lead to the process exiting.

Outside of coroutine-sigaltstack, qemu does not use SIGUSR2.  We can
thus keep the signal handler installed all the time.
CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its
stack is set up so a new coroutine is to be launched (i.e., it should
invoke sigsetjmp()), or not (i.e., the signal came from an external
source and we should just perform the default action, which is to exit
the process).

Note that in user-mode emulation, the guest can register signal handlers
for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2
handler, sigaltstack coroutines will break from then on.  However, we do
not use coroutines for user-mode emulation, so that is fine.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
  util/coroutine-sigaltstack.c | 56 +++++++++++++++++++-----------------
  1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index aade82afb8..2d32afc322 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -59,6 +59,8 @@ typedef struct {
static pthread_key_t thread_state_key; +static void coroutine_trampoline(int signal);
+
  static CoroutineThreadState *coroutine_get_thread_state(void)
  {
      CoroutineThreadState *s = pthread_getspecific(thread_state_key);
@@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void *opaque)
static void __attribute__((constructor)) coroutine_init(void)
  {
+    struct sigaction sa;
      int ret;
ret = pthread_key_create(&thread_state_key, qemu_coroutine_thread_cleanup);
@@ -87,6 +90,20 @@ static void __attribute__((constructor)) coroutine_init(void)
          fprintf(stderr, "unable to create leader key: %s\n", strerror(errno));
          abort();
      }
+
+    /*
+     * Establish the SIGUSR2 signal handler.  This is a process-wide
+     * operation, and so will apply to all threads from here on.
+     */
+    sa = (struct sigaction) {
+        .sa_handler = coroutine_trampoline,
+        .sa_flags   = SA_ONSTACK,
+    };
+
+    if (sigaction(SIGUSR2, &sa, NULL) != 0) {
+        perror("Unable to install SIGUSR2 handler");
+        abort();
+    }
  }
/* "boot" function
@@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal)
      /* Get the thread specific information */
      coTS = coroutine_get_thread_state();
      self = coTS->tr_handler;
+
+    if (!self) {
+        /*
+         * This SIGUSR2 came from an external source, not from
+         * qemu_coroutine_new(), so perform the default action.
+         */
+        exit(0);
+    }
+
      coTS->tr_called = 1;
+    coTS->tr_handler = NULL;
      co = &self->base;
/*

(8) There's a further complication here, assuming we really want to
recognize the case when the handler is executing unexpectedly:

- pthread_getspecific() is not necessarily async-signal-safe, according
to POSIX, so calling coroutine_get_thread_state() in the "unexpected"
case (e.g. in response to an asynchronously generated SIGUSR2) is
problematic in its own right,

That’s a shame.

- if the SIGUSR2 is delivered to a thread that has never called
coroutine_get_thread_state() before, then we'll reach g_malloc0() inside
coroutine_get_thread_state(), in signal handler context, which is very bad.

Could be solved with a coroutine_try_get_thread_state() that will never malloc, but return NULL then.

You'd have to block SIGUSR2 for the entire process (all threads) at all
times, and only temporarily unblock it for a particular coroutine
thread, with the sigsuspend(). The above check would suffice, that way.

Yes, that’s what I was originally afraid of. I feel like that may be the complexity drop that pushes this change too far out of my comfort zone. (And as evidenced by your review, it already was pretty much outside as it was.)

Such blocking is possible by calling pthread_sigmask() from the main
thread, before any other thread is created (the signal mask is inherited
across pthread_create()). I guess it could be done in coroutine_init() too.

And *then* the pthread_sigmask() calls should indeed be removed from
qemu_coroutine_new().

OTOH, that does sound rather simple...

(Apologies if my feedback is difficult to understand, it's my fault. I
could propose a patch, if (and only if) you want that.)

I can’t say I wouldn’t be happy with a patch for this code that doesn’t bear my S-o-b. ;)

I feel conflicted. I can send a v2 that addresses this (probably consisting of multiple patches then, e.g. I’d split the SIGUSR2 blocking off the main patch), but to me, this bug is really more of a nuisance that just blocks me from sending a pull request for my block branch... So I’d rather not drag it out forever. OTOH, sending a quick and bad fix just because I can’t wait is just bad.

I suppose I’ll have to decide over the weekend. Though if you’re itching to write a patch yourself, I’d definitely be grateful.

Max




reply via email to

[Prev in Thread] Current Thread [Next in Thread]