[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-commits] [qemu/qemu] 212046: queue: fix QSLIST_INSERT_HEAD_ATOMIC
From: |
GitHub |
Subject: |
[Qemu-commits] [qemu/qemu] 212046: queue: fix QSLIST_INSERT_HEAD_ATOMIC race |
Date: |
Fri, 13 Mar 2015 04:00:08 -0700 |
Branch: refs/heads/master
Home: https://github.com/qemu/qemu
Commit: 2120465fbb87a69bd60283ec4034a0963766a7ef
https://github.com/qemu/qemu/commit/2120465fbb87a69bd60283ec4034a0963766a7ef
Author: Paolo Bonzini <address@hidden>
Date: 2015-03-12 (Thu, 12 Mar 2015)
Changed paths:
M include/qemu/queue.h
Log Message:
-----------
queue: fix QSLIST_INSERT_HEAD_ATOMIC race
There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.
Because atomic_cmpxchg returns the old value instead of a success flag,
QSLIST_INSERT_HEAD_ATOMIC was checking for success by comparing against
the second argument to atomic_cmpxchg. Unfortunately, this only works
if the second argument is a local or thread-local variable.
If it is in memory, it can be subject to common subexpression elimination
(and then everything's fine) or reloaded after the atomic_cmpxchg,
depending on the compiler's whims. If the latter happens, the race can
happen. A thread can sneak in, doing something on elm->field.sle_next
after the atomic_cmpxchg and before the comparison. This causes a wrong
failure, and then two threads are using "elm" at the same time. In the
case discovered by Christian, the sequence was likely something like this:
thread 1 | thread 2
QSLIST_INSERT_HEAD_ATOMIC |
atomic_cmpxchg succeeds |
elm added to list |
| steal release_pool
| QSLIST_REMOVE_HEAD
| elm removed from list
| ...
| QSLIST_INSERT_HEAD_ATOMIC
| (overwrites sle_next)
spurious failure |
atomic_cmpxchg succeeds |
elm added to list again |
|
steal release_pool |
QSLIST_REMOVE_HEAD |
elm removed again |
The last three steps could be done by a third thread as well.
A reproducer that failed in a matter of seconds is as follows:
- the guest has 32 VCPUs on a 28 core host (hyperthreading was enabled),
memory was 16G just to err on the safe side (the host has 64G, but hey
at least you need no s390)
- the guest has 24 null-aio virtio-blk devices using dataplane
(-object iothread,id=ioN -drive if=none,id=blkN,driver=null-aio,size=500G
-device virtio-blk-pci,iothread=ioN,drive=blkN)
- the guest also has a single network interface. It's only doing loopback
tests so slirp vs. tap and the model doesn't matter.
- the guest is running fio with the following script:
[global]
rw=randread
blocksize=16k
ioengine=libaio
runtime=10m
buffered=0
fallocate=none
time_based
iodepth=32
[virtio1a]
filename=/dev/block/252\:16
[virtio1b]
filename=/dev/block/252\:16
...
[virtio24a]
filename=/dev/block/252\:384
[virtio24b]
filename=/dev/block/252\:384
[listen1]
protocol=tcp
ioengine=net
port=12345
listen
rw=read
bs=4k
size=1000g
[connect1]
protocol=tcp
hostname=localhost
ioengine=net
port=12345
protocol=tcp
rw=write
startdelay=1
size=1000g
...
[listen8]
protocol=tcp
ioengine=net
port=12352
listen
rw=read
bs=4k
size=1000g
[connect8]
protocol=tcp
hostname=localhost
ioengine=net
port=12352
rw=write
startdelay=1
size=1000g
Moral of the story: I should refrain from writing more clever stuff.
At least it looks like it is not too clever to be undebuggable.
Reported-by: Christian Borntraeger <address@hidden>
Tested-by: Christian Borntraeger <address@hidden>
Signed-off-by: Paolo Bonzini <address@hidden>
Message-id: address@hidden
Fixes: c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54
Signed-off-by: Paolo Bonzini <address@hidden>
Signed-off-by: Stefan Hajnoczi <address@hidden>
Commit: 87b86e7ef29771a7fa06e3e8e88fa95bbc13a39c
https://github.com/qemu/qemu/commit/87b86e7ef29771a7fa06e3e8e88fa95bbc13a39c
Author: Wen Congyang <address@hidden>
Date: 2015-03-12 (Thu, 12 Mar 2015)
Changed paths:
M block/qcow2-snapshot.c
M block/qcow2.c
Log Message:
-----------
qcow2: fix the macro QCOW_MAX_L1_SIZE's use
QCOW_MAX_L1_SIZE's unit is byte, and l1_size's unit
is l1 table entry size(8 bytes).
Signed-off-by: Wen Congyang <address@hidden>
Message-id: address@hidden
Signed-off-by: Stefan Hajnoczi <address@hidden>
Commit: f9f141b7475acfed1b6a28809687109702295be3
https://github.com/qemu/qemu/commit/f9f141b7475acfed1b6a28809687109702295be3
Author: Peter Maydell <address@hidden>
Date: 2015-03-13 (Fri, 13 Mar 2015)
Changed paths:
M block/qcow2-snapshot.c
M block/qcow2.c
M include/qemu/queue.h
Log Message:
-----------
Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into
staging
# gpg: Signature made Thu Mar 12 19:09:26 2015 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <address@hidden>"
# gpg: aka "Stefan Hajnoczi <address@hidden>"
* remotes/stefanha/tags/block-pull-request:
qcow2: fix the macro QCOW_MAX_L1_SIZE's use
queue: fix QSLIST_INSERT_HEAD_ATOMIC race
Signed-off-by: Peter Maydell <address@hidden>
Compare: https://github.com/qemu/qemu/compare/2a5b58e2405e...f9f141b7475a
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Qemu-commits] [qemu/qemu] 212046: queue: fix QSLIST_INSERT_HEAD_ATOMIC race,
GitHub <=