From: 王洪浩
Subject: Re: [External] Re: [PATCH 2/2] coroutine: take exactly one batch from global pool at a time
Date: Wed, 26 Aug 2020 14:06:19 +0800

The purpose of this patch is to improve performance without increasing
memory consumption.

My test case:
QEMU command line arguments:
-drive file=/dev/nvme2n1p1,format=raw,if=none,id=local0,cache=none,aio=native \
    -device virtio-blk,id=blk0,drive=local0,iothread=iothread0,num-queues=4 \
-drive file=/dev/nvme3n1p1,format=raw,if=none,id=local1,cache=none,aio=native \
    -device virtio-blk,id=blk1,drive=local1,iothread=iothread1,num-queues=4 \

Run these two fio jobs at the same time:
[job-vda]
filename=/dev/vda
iodepth=64
ioengine=libaio
rw=randrw
bs=4k
size=300G
rwmixread=80
direct=1
numjobs=2
runtime=60

[job-vdb]
filename=/dev/vdb
iodepth=64
ioengine=libaio
rw=randrw
bs=4k
size=300G
rwmixread=90
direct=1
numjobs=2
loops=1
runtime=60

Without this patch (3 runs):
total iops: 278548.1, 312374.1, 276638.2
With this patch (3 runs):
total iops: 368370.9, 335693.2, 327693.1

An 18.9% improvement on average.

In addition, we also use a distributed block storage backend whose I/O
latency is much higher than that of local NVMe devices because of the
network overhead, so it needs a higher iodepth (>= 256) to reach its
maximum throughput.
Without this patch, there is a more than 5% chance of calling
`qemu_coroutine_new` and the IOPS stays below 100K; with this patch,
the IOPS is about 260K.

On the other hand, a simpler way to reduce or eliminate the cost of
`qemu_coroutine_new` would be to increase POOL_BATCH_SIZE, but that
would also bring much more memory consumption, which we don't want.
That is the point of this patch.
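
To make the intended structure concrete, below is a minimal, self-contained
sketch of the idea in C11 atomics (not the actual QEMU patch): a global
lock-free stack whose elements are whole batches of POOL_BATCH_SIZE
coroutines, plus a thread-local batch so the common allocation/release paths
touch no atomics. The names coroutine_t, batch_t, co_alloc, co_release and
the helper coroutine_new are placeholders invented for this sketch, and a
production version would additionally have to guard the pop path against the
ABA problem.

/*
 * Sketch only: global lock-free (Treiber) stack of fixed-size coroutine
 * batches, with a per-thread batch for atomic-free fast paths.
 */
#include <stdatomic.h>
#include <stdlib.h>

#define POOL_BATCH_SIZE 16

typedef struct coroutine { struct coroutine *next; } coroutine_t;

typedef struct batch {
    struct batch *next;                      /* link in the global stack */
    unsigned count;                          /* coroutines currently held */
    coroutine_t *items[POOL_BATCH_SIZE];
} batch_t;

static _Atomic(batch_t *) global_stack;      /* stack of *full* batches */
static __thread batch_t *local_batch;        /* thread-local pool, no atomics */

static coroutine_t *coroutine_new(void)      /* stand-in for qemu_coroutine_new */
{
    return calloc(1, sizeof(coroutine_t));
}

/* Slow release path: hand a full batch back to the global stack. */
static void global_push(batch_t *b)
{
    batch_t *head = atomic_load_explicit(&global_stack, memory_order_relaxed);
    do {
        b->next = head;
    } while (!atomic_compare_exchange_weak_explicit(&global_stack, &head, b,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Slow allocation path: take exactly one whole batch from the global stack. */
static batch_t *global_pop(void)
{
    batch_t *head = atomic_load_explicit(&global_stack, memory_order_acquire);
    while (head &&
           !atomic_compare_exchange_weak_explicit(&global_stack, &head,
                                                  head->next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
        /* CAS failed; retry with the updated head */
    }
    return head;
}

/* Fast path: serve from the thread-local batch; refill only when empty. */
coroutine_t *co_alloc(void)
{
    if (!local_batch || local_batch->count == 0) {
        batch_t *b = global_pop();
        if (!b) {
            return coroutine_new();          /* pools exhausted */
        }
        free(local_batch);
        local_batch = b;
    }
    return local_batch->items[--local_batch->count];
}

/* Fast path: return to the thread-local batch; push only when it is full. */
void co_release(coroutine_t *co)
{
    if (!local_batch) {
        local_batch = calloc(1, sizeof(batch_t));
    }
    if (local_batch->count == POOL_BATCH_SIZE) {
        global_push(local_batch);
        local_batch = calloc(1, sizeof(batch_t));
    }
    local_batch->items[local_batch->count++] = co;
}

The real patch of course builds on QEMU's existing coroutine pool code rather
than on standalone structures like these; the sketch only illustrates why the
fast paths need no atomics and why memory stays bounded at roughly
POOL_BATCH_SIZE coroutines per thread plus whatever sits in the global stack.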

Stefan Hajnoczi <stefanha@redhat.com> wrote on Tue, Aug 25, 2020 at 10:52 PM:
>
> On Mon, Aug 24, 2020 at 12:31:21PM +0800, wanghonghao wrote:
> > This patch replaces the global coroutine queue with a lock-free stack whose
> > elements are coroutine queues. Threads can push coroutine queues onto the
> > stack or take queues from it, and each coroutine queue holds exactly
> > POOL_BATCH_SIZE coroutines. Note that the stack is not strictly LIFO, but
> > that is enough for a buffer pool.
> >
> > Coroutines are put into thread-local pools first on release. Now the fast
> > paths of both allocation and release are atomic-free, and there won't be
> > too many coroutines remaining in a single thread since POOL_BATCH_SIZE has
> > been reduced to 16.
> >
> > In practice, I've run a VM with two block devices bound to two different
> > iothreads, and run fio with iodepth 128 on each device. Without this patch
> > it maintains around 400 coroutines and has about a 1% chance of calling
> > `qemu_coroutine_new`. With this patch, it maintains no more than 273
> > coroutines and doesn't call `qemu_coroutine_new` after the initial
> > allocations.
>
> Does throughput or IOPS change?
>
> Is the main purpose of this patch to reduce memory consumption?
>
> Stefan


