qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 0/6] migration/multifd: Fix channel creation vs. cleanup r


From: Peter Xu
Subject: Re: [PATCH v3 0/6] migration/multifd: Fix channel creation vs. cleanup races
Date: Wed, 7 Feb 2024 10:29:43 +0800

On Tue, Feb 06, 2024 at 06:51:12PM -0300, Fabiano Rosas wrote:
> Based-on: 20240202102857.110210-1-peterx@redhat.com
> [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups
> 20240202102857.110210-1-peterx@redhat.com">https://lore.kernel.org/r/20240202102857.110210-1-peterx@redhat.com
> 
> Hi,
> 
> For v3 I fixed the refcounting issue spotted by Avihai. The situation
> there is a bit clunky due to historical reasons. The gist is that we
> have an assumption that channel creation never fails after p->c has
> been set, so when 'p->c == NULL' we have to unref and when 'p->c !=
> NULL' the cleanup code will do the unref.

Yes, this looks good to me.  That's a good catch.

I'll leave at least one more day for Avihai and/or Dan to have another
look.  My r-b persist as of now on patch 5.

Actually I think the conditional unref is slightly tricky, but it's not its
own fault, IMHO, OTOH it's more about a1af605bd5ad where p->c is slightly
abused.  My understanding is we can avoid that conditional unref with below
patch 1 as a cleanup (on top of this series).  Then patch 2 comes all
alongside.

We don't need to rush on these, though, we should fix the thread race first
because multiple of us hit it, and all cleanups can be done later.

=====
>From 0830819d86e08c5175d6669505aa712a0a09717f Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 7 Feb 2024 10:08:35 +0800
Subject: [PATCH 1/2] migration/multifd: Cleanup TLS iochannel referencing

Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to
blocking handshake") introduced a thread for TLS channels, which will
resolve the issue on blocking the main thread.  However in the same commit
p->c is slightly abused just to be able to pass over the pointer "p" into
the thread.

That's the major reason we'll need to conditionally free the io channel in
the fault paths.

To clean it up, using a separate structure to pass over both "p" and "tioc"
in the tls handshake thread.  Then we can make it a rule that p->c will
never be set until the channel is completely setup.  With that, we can drop
the tricky conditional unref of the io channel in the error path.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/multifd.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index adfe8c9a0a..4a85a6b7b3 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -873,16 +873,22 @@ out:
 
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque);
 
+typedef struct {
+    MultiFDSendParams *p;
+    QIOChannelTLS *tioc;
+} MultiFDTLSThreadArgs;
+
 static void *multifd_tls_handshake_thread(void *opaque)
 {
-    MultiFDSendParams *p = opaque;
-    QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c);
+    MultiFDTLSThreadArgs *args = opaque;
 
-    qio_channel_tls_handshake(tioc,
+    qio_channel_tls_handshake(args->tioc,
                               multifd_new_send_channel_async,
-                              p,
+                              args->p,
                               NULL,
                               NULL);
+    g_free(args);
+
     return NULL;
 }
 
@@ -892,6 +898,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams 
*p,
 {
     MigrationState *s = migrate_get_current();
     const char *hostname = s->hostname;
+    MultiFDTLSThreadArgs *args;
     QIOChannelTLS *tioc;
 
     tioc = migration_tls_client_create(ioc, hostname, errp);
@@ -906,11 +913,14 @@ static bool multifd_tls_channel_connect(MultiFDSendParams 
*p,
     object_unref(OBJECT(ioc));
     trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname);
     qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing");
-    p->c = QIO_CHANNEL(tioc);
+
+    args = g_new0(MultiFDTLSThreadArgs, 1);
+    args->tioc = tioc;
+    args->p = p;
 
     p->tls_thread_created = true;
     qemu_thread_create(&p->tls_thread, "multifd-tls-handshake-worker",
-                       multifd_tls_handshake_thread, p,
+                       multifd_tls_handshake_thread, args,
                        QEMU_THREAD_JOINABLE);
     return true;
 }
@@ -923,6 +933,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 
     migration_ioc_register_yank(ioc);
     p->registered_yank = true;
+    /* Setup p->c only if the channel is completely setup */
     p->c = ioc;
 
     p->thread_created = true;
@@ -976,14 +987,12 @@ out:
 
     trace_multifd_new_send_channel_async_error(p->id, local_err);
     multifd_send_set_error(local_err);
-    if (!p->c) {
-        /*
-         * If no channel has been created, drop the initial
-         * reference. Otherwise cleanup happens at
-         * multifd_send_channel_destroy()
-         */
-        object_unref(OBJECT(ioc));
-    }
+    /*
+     * For error cases (TLS or non-TLS), IO channel is always freed here
+     * rather than when cleanup multifd: since p->c is not set, multifd
+     * cleanup code doesn't even know its existance.
+     */
+    object_unref(OBJECT(ioc));
     error_free(local_err);
 }
 
-- 
2.43.0
=====
>From 9e574c3216f6459e3a808096d905e2554d962cad Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Wed, 7 Feb 2024 10:24:39 +0800
Subject: [PATCH 2/2] migration/multifd: Drop registered_yank

With a clear definition of p->c protocol, where we only set it up if the
channel is fully established (TLS or non-TLS), registered_yank boolean will
have equal meaning of "p->c != NULL".

Drop registered_yank by checking p->c instead.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/multifd.h | 2 --
 migration/multifd.c | 7 +++----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 8a1cad0996..b3fe27ae93 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -78,8 +78,6 @@ typedef struct {
     bool tls_thread_created;
     /* communication channel */
     QIOChannel *c;
-    /* is the yank function registered */
-    bool registered_yank;
     /* packet allocated len */
     uint32_t packet_len;
     /* guest page size */
diff --git a/migration/multifd.c b/migration/multifd.c
index 4a85a6b7b3..278453cf84 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -648,11 +648,11 @@ static int multifd_send_channel_destroy(QIOChannel *send)
 
 static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
 {
-    if (p->registered_yank) {
+    if (p->c) {
         migration_ioc_unregister_yank(p->c);
+        multifd_send_channel_destroy(p->c);
+        p->c = NULL;
     }
-    multifd_send_channel_destroy(p->c);
-    p->c = NULL;
     qemu_sem_destroy(&p->sem);
     qemu_sem_destroy(&p->sem_sync);
     g_free(p->name);
@@ -932,7 +932,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
     qio_channel_set_delay(ioc, false);
 
     migration_ioc_register_yank(ioc);
-    p->registered_yank = true;
     /* Setup p->c only if the channel is completely setup */
     p->c = ioc;
 
-- 
2.43.0
====

Thanks,

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]