qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures
Date: Wed, 7 Apr 2021 14:03:34 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.9.0

07.04.2021 13:58, Kevin Wolf wrote:
Am 29.03.2021 um 17:43 hat Max Reitz geschrieben:
On 29.03.21 17:05, Stefan Hajnoczi wrote:
Hi Kevin,
Peter hit an s390 qemu-iotests failure. I was able to reproduce it
easily. I haven't checked if it reproduce on other platforms too, but it
seems likely. Here is what I found.

qsd-jobs has race conditions:

    # Just make sure that this doesn't crash
    $QSD --chardev stdio,id=stdio --monitor chardev=stdio \
        --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
        --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF | _filter_qmp
    {"execute":"qmp_capabilities"}
    {"execute": "block-commit", "arguments": {"device": "fmt0", "job-id": 
"job0"}}
    {"execute": "quit"}
    EOF

The intent is for "quit" to run while job0 still exists. This proves
that the program does not abort with an assertion failure when "quit" is
called while there are block jobs. bdrv_close_all() expects there to be
no jobs.

Here is the failure that reproduces 1 out of 3 times:

    # build/tests/qemu-iotests/check -qcow2 qsd-jobs
    QEMU          -- 
"/root/qemu/build/tests/qemu-iotests/../../qemu-system-x86_64" -nodefaults 
-display none -accel qtest
    QEMU_IMG      -- "/root/qemu/build/tests/qemu-iotests/../../qemu-img"
    QEMU_IO       -- "/root/qemu/build/tests/qemu-iotests/../../qemu-io" 
--cache writeback --aio threads -f qcow2
    QEMU_NBD      -- "/root/qemu/build/tests/qemu-iotests/../../qemu-nbd"
    IMGFMT        -- qcow2
    IMGPROTO      -- file
    PLATFORM      -- Linux/s390x
    TEST_DIR      -- /root/qemu/scratch
    SOCK_DIR      -- /tmp/tmpapj5dydm
    SOCKET_SCM_HELPER -- /root/qemu/build/tests/qemu-iotests/socket_scm_helper

    qsd-jobs   fail       [10:42:23] [10:42:23]   0.1s   (last: 0.1s)  output 
mismatch (see qsd-jobs.out.bad)
    --- /root/qemu/tests/qemu-iotests/tests/qsd-jobs.out
    +++ qsd-jobs.out.bad
    @@ -9,11 +9,11 @@
     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "created", "id": "job0"}}
     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "running", "id": "job0"}}
     {"return": {}}
    +{"return": {}}
    +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "paused", "id": "job0"}}
    +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "running", "id": "job0"}}
     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "ready", "id": "job0"}}
     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", 
"len": 0, "offset": 0, "speed": 0, "type": "commit"}}
    -{"return": {}}
    -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "standby", "id": "job0"}}
    -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "ready", "id": "job0"}}
     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "waiting", "id": "job0"}}
     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "pending", "id": "job0"}}
     {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": 
"job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
    Failures: qsd-jobs
    Failed 1 of 1 iotests

The timing of the QMP events seems to be non-deterministic relative to
the QMP command completions.

It's more complicated than simply lines moving up/down in the expected
output: the job state transitions are slightly different (running ->
ready vs running -> paused).

Should the test case filter out all JOB_* and BLOCK_JOB_* events?

Is there a way to make this test deterministic?

Would writing data to the overlay and setting a very restrictive speed help,
so it doesn’t immediately go into the 'ready' state?  (See attachment.)

I haven't tested the patch yet, but this is the approach we normally use
in tests, so I agree. Do you want to send it as a proper patch?


Max already sent two variants:

[PATCH] iotests/qsd-jobs: Filter events in the first test

and

[PATCH 0/2] iotests/qsd-jobs: Use common.qemu for the QSD

--
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]