[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures
From: |
Kevin Wolf |
Subject: |
Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures |
Date: |
Wed, 7 Apr 2021 13:32:59 +0200 |
Am 07.04.2021 um 13:03 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 07.04.2021 13:58, Kevin Wolf wrote:
> > Am 29.03.2021 um 17:43 hat Max Reitz geschrieben:
> > > On 29.03.21 17:05, Stefan Hajnoczi wrote:
> > > > Hi Kevin,
> > > > Peter hit an s390 qemu-iotests failure. I was able to reproduce it
> > > > easily. I haven't checked if it reproduce on other platforms too, but it
> > > > seems likely. Here is what I found.
> > > >
> > > > qsd-jobs has race conditions:
> > > >
> > > > # Just make sure that this doesn't crash
> > > > $QSD --chardev stdio,id=stdio --monitor chardev=stdio \
> > > > --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
> > > > --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF |
> > > > _filter_qmp
> > > > {"execute":"qmp_capabilities"}
> > > > {"execute": "block-commit", "arguments": {"device": "fmt0",
> > > > "job-id": "job0"}}
> > > > {"execute": "quit"}
> > > > EOF
> > > >
> > > > The intent is for "quit" to run while job0 still exists. This proves
> > > > that the program does not abort with an assertion failure when "quit" is
> > > > called while there are block jobs. bdrv_close_all() expects there to be
> > > > no jobs.
> > > >
> > > > Here is the failure that reproduces 1 out of 3 times:
> > > >
> > > > # build/tests/qemu-iotests/check -qcow2 qsd-jobs
> > > > QEMU --
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-system-x86_64"
> > > > -nodefaults -display none -accel qtest
> > > > QEMU_IMG --
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-img"
> > > > QEMU_IO --
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-io" --cache writeback
> > > > --aio threads -f qcow2
> > > > QEMU_NBD --
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-nbd"
> > > > IMGFMT -- qcow2
> > > > IMGPROTO -- file
> > > > PLATFORM -- Linux/s390x
> > > > TEST_DIR -- /root/qemu/scratch
> > > > SOCK_DIR -- /tmp/tmpapj5dydm
> > > > SOCKET_SCM_HELPER --
> > > > /root/qemu/build/tests/qemu-iotests/socket_scm_helper
> > > >
> > > > qsd-jobs fail [10:42:23] [10:42:23] 0.1s (last: 0.1s)
> > > > output mismatch (see qsd-jobs.out.bad)
> > > > --- /root/qemu/tests/qemu-iotests/tests/qsd-jobs.out
> > > > +++ qsd-jobs.out.bad
> > > > @@ -9,11 +9,11 @@
> > > > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id":
> > > > "job0"}}
> > > > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id":
> > > > "job0"}}
> > > > {"return": {}}
> > > > +{"return": {}}
> > > > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id":
> > > > "job0"}}
> > > > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id":
> > > > "job0"}}
> > > > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > > > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0,
> > > > "offset": 0, "speed": 0, "type": "commit"}}
> > > > -{"return": {}}
> > > > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id":
> > > > "job0"}}
> > > > -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > > > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id":
> > > > "job0"}}
> > > > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id":
> > > > "job0"}}
> > > > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > > > "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0,
> > > > "offset": 0, "speed": 0, "type": "commit"}}
> > > > Failures: qsd-jobs
> > > > Failed 1 of 1 iotests
> > > >
> > > > The timing of the QMP events seems to be non-deterministic relative to
> > > > the QMP command completions.
> > > >
> > > > It's more complicated than simply lines moving up/down in the expected
> > > > output: the job state transitions are slightly different (running ->
> > > > ready vs running -> paused).
> > > >
> > > > Should the test case filter out all JOB_* and BLOCK_JOB_* events?
> > > >
> > > > Is there a way to make this test deterministic?
> > >
> > > Would writing data to the overlay and setting a very restrictive speed
> > > help,
> > > so it doesn’t immediately go into the 'ready' state? (See attachment.)
> >
> > I haven't tested the patch yet, but this is the approach we normally use
> > in tests, so I agree. Do you want to send it as a proper patch?
> >
>
> Max already sent two variants:
>
> [PATCH] iotests/qsd-jobs: Filter events in the first test
>
> and
>
> [PATCH 0/2] iotests/qsd-jobs: Use common.qemu for the QSD
Ah, with different subject lines. Thanks!
Kevin