Subject: Re: Fio regression caused by f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94
From: Lukáš Doktor
Date: Fri, 6 May 2022 06:30:37 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0
Hello all,

thank you for the responses. I ran 3 runs per commit, each using 5 iterations
of fio-nbd, for the following commits:

    f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94
    f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94 + Stefan's commit
    d7482ffe9756919531307330fd1c6dbec66e8c32

Using the regressed f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94 as the baseline,
the relative percentage results per run were:
    f9f    |  0.0 | -2.8 |  0.6
    stefan | -3.1 | -1.2 | -2.2
    d74    |  7.2 |  9.1 |  8.2
I am not sure whether Stefan's commit was supposed to be applied on top of
the f9fc8932 commit, but at least for fio-nbd 4k writes it slightly worsened
the situation.
Do you want me to try fio inside the guest as well, or is this fio-nbd check
sufficient for now?
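For clarity, the percentages above are relative deltas against the f9fc8932
baseline throughput, i.e. (value - baseline) / baseline * 100. A minimal
sketch of that computation (the base/val numbers are made up, purely to
illustrate the formula):

    # relative delta in percent; base/val are hypothetical throughput values
    awk 'BEGIN { base = 1000; val = 1072; printf "%+.1f\n", (val - base) / base * 100 }'
    # prints +7.2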
Also, let me briefly share the details of the execution:
---
mkdir -p /var/lib/runperf/runperf-nbd/
truncate -s 256M /var/lib/runperf/runperf-nbd/disk.img
nohup qemu-nbd -t -k /var/lib/runperf/runperf-nbd/socket -f raw \
    /var/lib/runperf/runperf-nbd/disk.img \
    &> $(mktemp /var/lib/runperf/runperf-nbd/qemu_nbd_XXXX.log) \
    & echo $! >> /var/lib/runperf/runperf-nbd/kill_pids
for PID in $(cat /var/lib/runperf/runperf-nbd/kill_pids); do disown -h $PID; done
export TERM=xterm-256color
true
mkdir -p /var/lib/runperf/runperf-nbd/
cat > /var/lib/runperf/runperf-nbd/nbd.fio << \Gr1UaS
# To use fio to test nbdkit:
#
# nbdkit -U - memory size=256M --run 'export unixsocket; fio examples/nbd.fio'
#
# To use fio to test qemu-nbd:
#
# rm -f /tmp/disk.img /tmp/socket
# truncate -s 256M /tmp/disk.img
# export target=/tmp/socket
# qemu-nbd -t -k $target -f raw /tmp/disk.img &
# fio examples/nbd.fio
# killall qemu-nbd
[global]
bs = $@
runtime = 30
ioengine = nbd
iodepth = 32
direct = 1
sync = 0
time_based = 1
clocksource = gettimeofday
ramp_time = 5
write_bw_log = fio
write_iops_log = fio
write_lat_log = fio
log_avg_msec = 1000
write_hist_log = fio
log_hist_msec = 10000
# log_hist_coarseness = 4 # 76 bins
rw = $@
uri=nbd+unix:///?socket=/var/lib/runperf/runperf-nbd/socket
# Starting from nbdkit 1.14 the following will work:
#uri=${uri}
[job0]
offset=0
[job1]
offset=64m
[job2]
offset=128m
[job3]
offset=192m
Gr1UaS
benchmark_bin=/usr/local/bin/fio pbench-fio --block-sizes=4 \
    --job-file=/var/lib/runperf/runperf-nbd/nbd.fio --numjobs=4 --runtime=60 \
    --samples=5 --test-types=write --clients=$WORKER_IP
---
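Before running fio you can also sanity-check that the export is reachable.
This step was not part of the recorded execution, just a suggestion, assuming
qemu-img from the same QEMU build is in $PATH:

    # query the NBD export over the unix socket
    qemu-img info "nbd+unix:///?socket=/var/lib/runperf/runperf-nbd/socket"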
I am using pbench to drive the execution, but you can simply replace the "$@"
placeholders in the produced "/var/lib/runperf/runperf-nbd/nbd.fio" and run it
directly with fio.
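For example (a sketch only; 4k and write mirror the --block-sizes=4 and
--test-types=write arguments above, and /tmp/nbd-4k-write.fio is an arbitrary
name):

    # substitute the two "$@" placeholders (bs and rw) and run fio directly
    sed -e 's|^bs = \$@$|bs = 4k|' -e 's|^rw = \$@$|rw = write|' \
        /var/lib/runperf/runperf-nbd/nbd.fio > /tmp/nbd-4k-write.fio
    fio /tmp/nbd-4k-write.fio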
Regards,
Lukáš
On 05. 05. 22 at 15:27, Paolo Bonzini wrote:
> On 5/5/22 14:44, Daniel P. Berrangé wrote:
>>> util/thread-pool.c uses qemu_sem_*() to notify worker threads when work
>>> becomes available. It makes sense that this operation is
>>> performance-critical and that's why the benchmark regressed.
>>
>> Doh, I questioned whether the change would have a performance impact,
>> and it wasn't thought to be used in perf-critical places.
>
> The expectation was that there would be no contention and thus no overhead
> because of the pool->lock that exists anyway, but that was optimistic.
>
> Lukáš, can you run a benchmark with this condvar implementation that was
> suggested by Stefan:
>
> 20220505131346.823941-1-pbonzini@redhat.com/raw">https://lore.kernel.org/qemu-devel/20220505131346.823941-1-pbonzini@redhat.com/raw
>
> ?
>
> If it still regresses, we can either revert the patch or look at a different
> implementation (even getting rid of the global queue is an option).
>
> Thanks,
>
> Paolo
>