qemu-devel




From: Peter Xu
Subject: Re: [PATCH] tests/qtest/migration-test: Disable migration/multifd/tcp/plain/cancel
Date: Tue, 14 Mar 2023 12:46:34 -0400

On Tue, Mar 14, 2023 at 10:11:53AM +0000, Dr. David Alan Gilbert wrote:
> OK, I think I kind of see what's happening here, one for Peter Xu.
> If I'm right it's a race something like:
>   a) The test harness tells the source it wants to enter postcopy
>   b) The harness then waits for the source to stop
>   c) ... and the dest to start 
> 
>   It's blocked on one of b&c but can't tell which
> 
>   d) The main thread in the dest is waiting for the postcopy recovery fd
>     to be opened
>   e) But I think the source is still trying to send normal precopy RAM
>     and perhaps hasn't got around yet to opening that socket yet????
>   f) But I think the dest isn't reading from the main channel at that
>     point because of (d)

I think this analysis is spot on.  Thanks Dave!

The source QEMU does things in the following order:

        1. setup preempt channel
        1.1. connect()  --> this is done in another thread
        1.2. sem_wait(postcopy_qemufile_src_sem) --> make sure it's created
        2. prepare postcopy package (LISTEN, non-iterable states, ping-3, RUN)
        3. send the package

So logically the ordering guarantees that by the time the LISTEN command is
processed on the dest, the source has already called connect().
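A minimal Python model of the source-side ordering, for illustration only: it is not QEMU's C code, and the function name is hypothetical. A helper thread stands in for the connect() and releases a threading.Semaphore that plays the role of postcopy_qemufile_src_sem; the main thread builds and sends the package only after acquiring it, so LISTEN cannot go out before the connect().

```python
import threading

def source_start_postcopy():
    """Hypothetical model of the source-side postcopy start ordering."""
    events = []                                 # order in which the steps happen
    channel_created = threading.Semaphore(0)    # stands in for postcopy_qemufile_src_sem

    def connect_preempt_channel():
        # Step 1.1: the connect() runs in another thread
        events.append("connect")
        channel_created.release()               # analogous to sem_post()

    t = threading.Thread(target=connect_preempt_channel)
    t.start()
    channel_created.acquire()                   # Step 1.2: analogous to sem_wait()
    # Step 2: prepare the postcopy package
    package = ["LISTEN", "non-iterable states", "ping-3", "RUN"]
    # Step 3: send it; this can only happen after the connect() above
    events.append("send " + ", ".join(package))
    t.join()
    return events

print(source_start_postcopy())
# ['connect', 'send LISTEN, non-iterable states, ping-3, RUN']
```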

But I think there's one thing missing on the dest side.  The accept() on
the dest node has to run in the main thread, and the LISTEN command is
also processed in the main thread.  So even when the listening socket
tries to kick the main thread to do the accept() (the connection itself is
already established), the main thread can never get to that final
accept(), because it is blocked waiting on the semaphore.  That causes a
deadlock.
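The deadlock can be illustrated with a small Python model, assuming a single-threaded work queue as a stand-in for QEMU's main loop; none of these names come from QEMU. The LISTEN handler waits on the semaphore from the main thread, but the only code that could release it is the accept handler queued behind it on that same thread. The model adds a timeout (the real wait has none) so it reports the hang instead of hanging.

```python
import queue
import threading

def dest_main_loop_blocking(timeout=0.5):
    """Hypothetical model: the LISTEN handler waits on the main thread."""
    main_loop = queue.Queue()                   # work items only the main thread runs
    channel_ready = threading.Semaphore(0)
    log = []

    def handle_listen():
        log.append("LISTEN")
        # Blocks the only thread that could run handle_accept() below.
        if channel_ready.acquire(timeout=timeout):
            log.append("channel ready")
        else:
            log.append("deadlock")              # the real code would hang here

    def handle_accept():
        # The preempt channel's accept() is dispatched by the main thread
        log.append("accept")
        channel_ready.release()

    # LISTEN is already being processed when the listening socket
    # kicks the main thread to do the accept().
    main_loop.put(handle_listen)
    main_loop.put(handle_accept)
    while not main_loop.empty():
        main_loop.get()()
    return log

print(dest_main_loop_blocking())  # ['LISTEN', 'deadlock', 'accept']
```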

A simple fix I can think of is to move the wait-for-channel operation out
of the main thread, e.g. into the preempt thread.
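The same Python model with the wait moved into a separate thread sketches the idea of the fix. This is an assumption-laden illustration, not the attached patch: the queue again stands in for QEMU's main loop, and preempt_thread is a hypothetical stand-in for the real preempt thread. Because the LISTEN handler returns immediately, the main thread is free to run the accept handler, which then unblocks the waiting thread.

```python
import queue
import threading

def dest_main_loop_with_preempt_thread():
    """Hypothetical model: the channel wait runs in a separate thread."""
    main_loop = queue.Queue()                   # work items only the main thread runs
    channel_ready = threading.Semaphore(0)
    log = []
    workers = []

    def preempt_thread():
        # Waiting here no longer blocks the main thread
        channel_ready.acquire()
        log.append("preempt channel ready")

    def handle_listen():
        log.append("LISTEN")
        t = threading.Thread(target=preempt_thread)
        t.start()
        workers.append(t)

    def handle_accept():
        log.append("accept")
        channel_ready.release()                 # wakes preempt_thread()

    main_loop.put(handle_listen)
    main_loop.put(handle_accept)
    while not main_loop.empty():
        main_loop.get()()
    for t in workers:
        t.join()
    return log

print(dest_main_loop_with_preempt_thread())
# ['LISTEN', 'accept', 'preempt channel ready']
```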

I've attached that simple fix.  Peter, is it easy for you to verify it?  I'm
not sure how reproducible the issue is; I'm also fine with just disabling
the preempt tests for the 8.0 release if that's easier.

Thanks,

-- 
Peter Xu

Attachment: 0001-migration-Wait-on-preempt-channel-in-preempt-thread.patch
Description: Text document

