qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures


From: Kevin Wolf
Subject: Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures
Date: Wed, 7 Apr 2021 12:58:45 +0200

Am 29.03.2021 um 17:43 hat Max Reitz geschrieben:
> On 29.03.21 17:05, Stefan Hajnoczi wrote:
> > Hi Kevin,
> > Peter hit an s390 qemu-iotests failure. I was able to reproduce it
> > easily. I haven't checked if it reproduce on other platforms too, but it
> > seems likely. Here is what I found.
> > 
> > qsd-jobs has race conditions:
> > 
> >    # Just make sure that this doesn't crash
> >    $QSD --chardev stdio,id=stdio --monitor chardev=stdio \
> >        --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
> >        --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF | _filter_qmp
> >    {"execute":"qmp_capabilities"}
> >    {"execute": "block-commit", "arguments": {"device": "fmt0", "job-id": 
> > "job0"}}
> >    {"execute": "quit"}
> >    EOF
> > 
> > The intent is for "quit" to run while job0 still exists. This proves
> > that the program does not abort with an assertion failure when "quit" is
> > called while there are block jobs. bdrv_close_all() expects there to be
> > no jobs.
> > 
> > Here is the failure that reproduces 1 out of 3 times:
> > 
> >    # build/tests/qemu-iotests/check -qcow2 qsd-jobs
> >    QEMU          -- 
> > "/root/qemu/build/tests/qemu-iotests/../../qemu-system-x86_64" -nodefaults 
> > -display none -accel qtest
> >    QEMU_IMG      -- "/root/qemu/build/tests/qemu-iotests/../../qemu-img"
> >    QEMU_IO       -- "/root/qemu/build/tests/qemu-iotests/../../qemu-io" 
> > --cache writeback --aio threads -f qcow2
> >    QEMU_NBD      -- "/root/qemu/build/tests/qemu-iotests/../../qemu-nbd"
> >    IMGFMT        -- qcow2
> >    IMGPROTO      -- file
> >    PLATFORM      -- Linux/s390x
> >    TEST_DIR      -- /root/qemu/scratch
> >    SOCK_DIR      -- /tmp/tmpapj5dydm
> >    SOCKET_SCM_HELPER -- 
> > /root/qemu/build/tests/qemu-iotests/socket_scm_helper
> > 
> >    qsd-jobs   fail       [10:42:23] [10:42:23]   0.1s   (last: 0.1s)  
> > output mismatch (see qsd-jobs.out.bad)
> >    --- /root/qemu/tests/qemu-iotests/tests/qsd-jobs.out
> >    +++ qsd-jobs.out.bad
> >    @@ -9,11 +9,11 @@
> >     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
> >     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> >     {"return": {}}
> >    +{"return": {}}
> >    +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "job0"}}
> >    +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> >     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> >     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 
> > 0, "speed": 0, "type": "commit"}}
> >    -{"return": {}}
> >    -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "job0"}}
> >    -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> >     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
> >     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job0"}}
> >     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0, 
> > "offset": 0, "speed": 0, "type": "commit"}}
> >    Failures: qsd-jobs
> >    Failed 1 of 1 iotests
> > 
> > The timing of the QMP events seems to be non-deterministic relative to
> > the QMP command completions.
> > 
> > It's more complicated than simply lines moving up/down in the expected
> > output: the job state transitions are slightly different (running ->
> > ready vs running -> paused).
> > 
> > Should the test case filter out all JOB_* and BLOCK_JOB_* events?
> > 
> > Is there a way to make this test deterministic?
> 
> Would writing data to the overlay and setting a very restrictive speed help,
> so it doesn’t immediately go into the 'ready' state?  (See attachment.)

I haven't tested the patch yet, but this is the approach we normally use
in tests, so I agree. Do you want to send it as a proper patch?

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]