[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures
From: |
Kevin Wolf |
Subject: |
Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures |
Date: |
Wed, 7 Apr 2021 12:58:45 +0200 |
Am 29.03.2021 um 17:43 hat Max Reitz geschrieben:
> On 29.03.21 17:05, Stefan Hajnoczi wrote:
> > Hi Kevin,
> > Peter hit an s390 qemu-iotests failure. I was able to reproduce it
> > easily. I haven't checked if it reproduce on other platforms too, but it
> > seems likely. Here is what I found.
> >
> > qsd-jobs has race conditions:
> >
> > # Just make sure that this doesn't crash
> > $QSD --chardev stdio,id=stdio --monitor chardev=stdio \
> > --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
> > --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF | _filter_qmp
> > {"execute":"qmp_capabilities"}
> > {"execute": "block-commit", "arguments": {"device": "fmt0", "job-id":
> > "job0"}}
> > {"execute": "quit"}
> > EOF
> >
> > The intent is for "quit" to run while job0 still exists. This proves
> > that the program does not abort with an assertion failure when "quit" is
> > called while there are block jobs. bdrv_close_all() expects there to be
> > no jobs.
> >
> > Here is the failure that reproduces 1 out of 3 times:
> >
> > # build/tests/qemu-iotests/check -qcow2 qsd-jobs
> > QEMU --
> > "/root/qemu/build/tests/qemu-iotests/../../qemu-system-x86_64" -nodefaults
> > -display none -accel qtest
> > QEMU_IMG -- "/root/qemu/build/tests/qemu-iotests/../../qemu-img"
> > QEMU_IO -- "/root/qemu/build/tests/qemu-iotests/../../qemu-io"
> > --cache writeback --aio threads -f qcow2
> > QEMU_NBD -- "/root/qemu/build/tests/qemu-iotests/../../qemu-nbd"
> > IMGFMT -- qcow2
> > IMGPROTO -- file
> > PLATFORM -- Linux/s390x
> > TEST_DIR -- /root/qemu/scratch
> > SOCK_DIR -- /tmp/tmpapj5dydm
> > SOCKET_SCM_HELPER --
> > /root/qemu/build/tests/qemu-iotests/socket_scm_helper
> >
> > qsd-jobs fail [10:42:23] [10:42:23] 0.1s (last: 0.1s)
> > output mismatch (see qsd-jobs.out.bad)
> > --- /root/qemu/tests/qemu-iotests/tests/qsd-jobs.out
> > +++ qsd-jobs.out.bad
> > @@ -9,11 +9,11 @@
> > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
> > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> > {"return": {}}
> > +{"return": {}}
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "job0"}}
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset":
> > 0, "speed": 0, "type": "commit"}}
> > -{"return": {}}
> > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "job0"}}
> > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
> > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job0"}}
> > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0,
> > "offset": 0, "speed": 0, "type": "commit"}}
> > Failures: qsd-jobs
> > Failed 1 of 1 iotests
> >
> > The timing of the QMP events seems to be non-deterministic relative to
> > the QMP command completions.
> >
> > It's more complicated than simply lines moving up/down in the expected
> > output: the job state transitions are slightly different (running ->
> > ready vs running -> paused).
> >
> > Should the test case filter out all JOB_* and BLOCK_JOB_* events?
> >
> > Is there a way to make this test deterministic?
>
> Would writing data to the overlay and setting a very restrictive speed help,
> so it doesn’t immediately go into the 'ready' state? (See attachment.)
I haven't tested the patch yet, but this is the approach we normally use
in tests, so I agree. Do you want to send it as a proper patch?
Kevin
- Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures,
Kevin Wolf <=