qemu-devel


From: Dr. David Alan Gilbert
Subject: Re: [PATCH] tests/qtest/migration-test: Disable migration/multifd/tcp/plain/cancel
Date: Mon, 6 Mar 2023 13:44:38 +0000
User-agent: Mutt/2.2.9 (2022-11-12)

* Thomas Huth (thuth@redhat.com) wrote:
> On 03/03/2023 13.05, Peter Maydell wrote:
> > On Fri, 3 Mar 2023 at 11:29, Thomas Huth <thuth@redhat.com> wrote:
> > > 
> > > On 03/03/2023 12.18, Peter Maydell wrote:
> > > > On Fri, 3 Mar 2023 at 09:10, Juan Quintela <quintela@redhat.com> wrote:
> > > > > 
> > > > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > > On Thu, Mar 02, 2023 at 05:22:11PM +0000, Peter Maydell wrote:
> > > > > > > migration-test has been flaky for a long time, both in CI and
> > > > > > > otherwise:
> > > > > > > 
> > > > > > > https://gitlab.com/qemu-project/qemu/-/jobs/3806090216
> > > > > > > (a FreeBSD job)
> > > > > > >     32/648 
> > > > > > > ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status:
> > > > > > >  assertion failed: (g_test_timer_elapsed() < 
> > > > > > > MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
> > > > > > > 
> > > > > > > on a local macos x86 box:
> > > > 
> > > > 
> > > > 
> > > > > What is really weird with this failure is that:
> > > > > - it only happens on non-x86
> > > > 
> > > > No, I have seen it on x86 macos, and x86 OpenBSD
> > > > 
> > > > > - on code that is not arch dependent
> > > > > - on cancel, what we really do there is close fd's for the multifd
> > > > >     channel threads to get out of the recv, i.e. again, nothing that
> > > > >     should be arch dependent.
> > > > 
> > > > I'm pretty sure that it tends to happen when the machine that's
> > > > running the test is heavily loaded. You probably have a race condition.
> > > 
> > > I think I can second that. IIRC I've seen it a couple of times on my x86
> > > laptop when running "make check -j$(nproc) SPEED=slow" here.
> > 
> > And another on-x86 failure case, just now, on the FreeBSD x86 CI job:
> > https://gitlab.com/qemu-project/qemu/-/jobs/3870165180
> 
> And FWIW, I just saw this while doing "make vm-build-netbsd J=4":
> 
> ▶  31/645 
> ERROR:../src/tests/qtest/migration-test.c:1841:test_migrate_auto_converge: 
> 'got_stop' should be FALSE ERROR

That one is kind of interesting; this is an auto-converge test, so it
tries to set up the migration so that it won't finish, to check that
autoconverge kicks in.  Except in this case the migration *did* finish
without the autoconverge (significantly) kicking in.

So I guess any of:
  a) The CPU thread never got much CPU time, so not much dirtying
happened.
  b) The bandwidth calculations might be bad enough/coarse enough
that it's passing the (very low) bandwidth limit due to a bad
approximation of the bandwidth needed.
  c) The autoconverge jump happens fast enough that the migration
completes, and got_stop is set, within a single iteration of the
test's polling loop.
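
To make (a) and (b) a bit more concrete, here is a toy model of precopy
convergence.  It is only a sketch and not QEMU code: the function name,
its parameters and the numbers in the note below are all made up for
illustration.  It assumes each pass sends the currently dirty data at a
fixed bandwidth while the guest dirties memory at a fixed rate, and that
migration can complete once the remainder fits in the downtime budget.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical, not taken from QEMU): returns true if the
 * migration would complete within max_passes without any throttling.
 * All rates are in bytes per second, sizes in bytes.
 */
bool toy_migration_converges(double initial_bytes, double dirty_rate,
                             double bandwidth, double downtime_s,
                             int max_passes)
{
    assert(bandwidth > 0.0);

    double remaining = initial_bytes;

    for (int pass = 0; pass < max_passes; pass++) {
        /* Small enough to send during the allowed downtime: done. */
        if (remaining <= bandwidth * downtime_s) {
            return true;
        }
        /* Time needed to send what is dirty right now. */
        double pass_time = remaining / bandwidth;
        /* Memory the guest dirties while that pass is in flight. */
        remaining = dirty_rate * pass_time;
    }
    return false;
}
```

With 100e6 bytes to send, a 3e6 bytes/s limit and a 0.3s downtime budget,
a guest dirtying 30e6 bytes/s never converges in this model, which is
what the test wants.  If the CPU thread is starved and only dirties
0.3e6 bytes/s, the same setup converges after a few passes, which is
case (a).  An effective bandwidth higher than the configured limit has
the same effect, which is case (b).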

I guess we could:
  i) Reduce the usleep in test_migrate_auto_converge
    (So it is more likely to correctly drop out of that loop
    as soon as autoconverge kicks in)
  ii) Reduce inc_pct so that autoconverge kicks in slower
  iii) Reduce max-bandwidth in migrate_ensure_non_converge
     even further.
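
A rough sketch of why (i) might help, again as a toy model and not the
real test code.  It assumes the wait loop reads the throttle percentage,
sleeps, and then asserts that got_stop is still false; the actual loop in
test_migrate_auto_converge may be ordered differently.  The function name
and the times used in the note below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical): times are in microseconds since the loop
 * started.  Returns true if a poll observes a non-zero throttle before
 * the "got_stop must be false" check can fire.
 */
bool toy_poll_sees_throttle_first(long poll_interval_us,
                                  long throttle_at_us, long stop_at_us)
{
    assert(poll_interval_us > 0);
    assert(throttle_at_us >= 0);

    for (long t = 0; ; t += poll_interval_us) {
        /* Read of the throttle percentage at time t. */
        if (t >= throttle_at_us) {
            return true;
        }
        /* Sleep for one interval, then check got_stop. */
        if (stop_at_us <= t + poll_interval_us) {
            return false;
        }
    }
}
```

If throttling starts at 250us and the stop arrives at 280us, polling
every 100us loses the race in this model, while polling every 10us
sees the throttle first.  A shorter sleep narrows the window but does not
remove it, so (ii) and (iii) are still worth considering.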

Dave

>  31/645 qemu:qtest+qtest-i386 / qtest-i386/migration-test                     
>              ERROR          25.21s   killed by signal 6 SIGABRT
> > > > QTEST_QEMU_BINARY=./qemu-system-i386 MALLOC_PERTURB_=35 
> > > > G_TEST_DBUS_DAEMON=/home/qemu/qemu-test.fYHKFz/src/tests/dbus-vmstate-daemon.sh
> > > >  QTEST_QEMU_IMG=./qemu-img 
> > > > /home/qemu/qemu-test.fYHKFz/build/tests/qtest/migration-test --tap -k
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> stderr:
> qemu: thread naming not supported on this host
> qemu: thread naming not supported on this host
> qemu: thread naming not supported on this host
> qemu: thread naming not supported on this host
> qemu: thread naming not supported on this host
> qemu: thread naming not supported on this host
> **
> ERROR:../src/tests/qtest/migration-test.c:1841:test_migrate_auto_converge: 
> 'got_stop' should be FALSE
> 
> (test program exited with status code -6)
> 
>  Thomas
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



