qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures


From: Kevin Wolf
Subject: Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures
Date: Wed, 7 Apr 2021 13:32:59 +0200

Am 07.04.2021 um 13:03 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 07.04.2021 13:58, Kevin Wolf wrote:
> > Am 29.03.2021 um 17:43 hat Max Reitz geschrieben:
> > > On 29.03.21 17:05, Stefan Hajnoczi wrote:
> > > > Hi Kevin,
> > > > Peter hit an s390 qemu-iotests failure. I was able to reproduce it
> > > > easily. I haven't checked if it reproduce on other platforms too, but it
> > > > seems likely. Here is what I found.
> > > > 
> > > > qsd-jobs has race conditions:
> > > > 
> > > >     # Just make sure that this doesn't crash
> > > >     $QSD --chardev stdio,id=stdio --monitor chardev=stdio \
> > > >         --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
> > > >         --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF | 
> > > > _filter_qmp
> > > >     {"execute":"qmp_capabilities"}
> > > >     {"execute": "block-commit", "arguments": {"device": "fmt0", 
> > > > "job-id": "job0"}}
> > > >     {"execute": "quit"}
> > > >     EOF
> > > > 
> > > > The intent is for "quit" to run while job0 still exists. This proves
> > > > that the program does not abort with an assertion failure when "quit" is
> > > > called while there are block jobs. bdrv_close_all() expects there to be
> > > > no jobs.
> > > > 
> > > > Here is the failure that reproduces 1 out of 3 times:
> > > > 
> > > >     # build/tests/qemu-iotests/check -qcow2 qsd-jobs
> > > >     QEMU          -- 
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-system-x86_64" 
> > > > -nodefaults -display none -accel qtest
> > > >     QEMU_IMG      -- 
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-img"
> > > >     QEMU_IO       -- 
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-io" --cache writeback 
> > > > --aio threads -f qcow2
> > > >     QEMU_NBD      -- 
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-nbd"
> > > >     IMGFMT        -- qcow2
> > > >     IMGPROTO      -- file
> > > >     PLATFORM      -- Linux/s390x
> > > >     TEST_DIR      -- /root/qemu/scratch
> > > >     SOCK_DIR      -- /tmp/tmpapj5dydm
> > > >     SOCKET_SCM_HELPER -- 
> > > > /root/qemu/build/tests/qemu-iotests/socket_scm_helper
> > > > 
> > > >     qsd-jobs   fail       [10:42:23] [10:42:23]   0.1s   (last: 0.1s)  
> > > > output mismatch (see qsd-jobs.out.bad)
> > > >     --- /root/qemu/tests/qemu-iotests/tests/qsd-jobs.out
> > > >     +++ qsd-jobs.out.bad
> > > >     @@ -9,11 +9,11 @@
> > > >      {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": 
> > > > "job0"}}
> > > >      {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": 
> > > > "job0"}}
> > > >      {"return": {}}
> > > >     +{"return": {}}
> > > >     +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": 
> > > > "job0"}}
> > > >     +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": 
> > > > "job0"}}
> > > >      {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > > >      {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, 
> > > > "offset": 0, "speed": 0, "type": "commit"}}
> > > >     -{"return": {}}
> > > >     -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": 
> > > > "job0"}}
> > > >     -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > > >      {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": 
> > > > "job0"}}
> > > >      {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": 
> > > > "job0"}}
> > > >      {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0, 
> > > > "offset": 0, "speed": 0, "type": "commit"}}
> > > >     Failures: qsd-jobs
> > > >     Failed 1 of 1 iotests
> > > > 
> > > > The timing of the QMP events seems to be non-deterministic relative to
> > > > the QMP command completions.
> > > > 
> > > > It's more complicated than simply lines moving up/down in the expected
> > > > output: the job state transitions are slightly different (running ->
> > > > ready vs running -> paused).
> > > > 
> > > > Should the test case filter out all JOB_* and BLOCK_JOB_* events?
> > > > 
> > > > Is there a way to make this test deterministic?
> > > 
> > > Would writing data to the overlay and setting a very restrictive speed 
> > > help,
> > > so it doesn’t immediately go into the 'ready' state?  (See attachment.)
> > 
> > I haven't tested the patch yet, but this is the approach we normally use
> > in tests, so I agree. Do you want to send it as a proper patch?
> > 
> 
> Max already sent two variants:
> 
> [PATCH] iotests/qsd-jobs: Filter events in the first test
> 
> and
> 
> [PATCH 0/2] iotests/qsd-jobs: Use common.qemu for the QSD

Ah, with different subject lines. Thanks!

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]