Re: [Qemu-devel] [regression] dataplane: throughput -40% by commit 580b6


From: Ming Lei
Subject: Re: [Qemu-devel] [regression] dataplane: throughput -40% by commit 580b6b2aa2
Date: Mon, 30 Jun 2014 16:27:56 +0800

On Mon, Jun 30, 2014 at 4:08 PM, Stefan Hajnoczi <address@hidden> wrote:
> On Sat, Jun 28, 2014 at 05:58:58PM +0800, Ming Lei wrote:
>> On Sat, Jun 28, 2014 at 5:51 AM, Paolo Bonzini <address@hidden> wrote:
>> > Il 27/06/2014 20:01, Ming Lei ha scritto:
>> >
>> >> I just implemented plug&unplug based batching, and it is working now.
>> >> But throughput still shows no obvious improvement.
>> >>
>> >> The load in the iothread looks a bit low, so I am wondering if there is
>> >> a blocking point caused by the QEMU block layer.
>> >
>> >
>> > What does perf say?  Also, you can try using the QEMU trace subsystem and
>> > see where the latency goes.
>>
>> Here are some test results against 8589744aaf07b62 of
>> upstream qemu; the tests were done on my 2-core (4-thread)
>> laptop:
>>
>> 1, with my draft batch patches[1] (only linux-aio supported now)
>> - throughput: +16% compared to qemu upstream
>> - average time spent by handle_notify(): 310us
>> - average time between two handle_notify(): 1591us
>> (this time reflects latency of handling host_notifier)
>
> 16% is still a worthwhile improvement.  I guess batching only benefits
> aio=native since the threadpool ought to do better when it receives
> requests as soon as possible.

The 16% was obtained with the 'simple' trace backend enabled; the
actual figure with the 'nop' backend looks quite a bit better than 16%,
but it is still not as good as the 2.0.0 release.

>
> A patch or an RFC would be welcome.

Yes, I will post it soon.

>> 2, same tests on the 2.0.0 release (using the custom Linux AIO)
>> - average time spent by handle_notify(): 68us
>> - average time between calling two handle_notify(): 269us
>> (this time reflects latency of handling host_notifier)
>>
>> From the above tests, the root cause looks like late handling of the
>> notify, and the qemu block layer has become 4 times slower than the
>> custom linux-aio code previously used by dataplane.

The above data was also obtained with the 'simple' trace backend enabled;
I need to find another way to test without the extra trace I/O.
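For reference, the two averages quoted above (time spent inside handle_notify() and the interval between successive calls) can be computed from entry/exit timestamps with a small script. This is only a sketch under assumptions: the tuple input format is hypothetical, since real 'simple' trace files are binary and would be decoded with QEMU's scripts/simpletrace.py first.

```python
# Sketch: compute the average handle_notify() duration and the average
# interval between successive calls, from (entry_us, exit_us) timestamp
# pairs in call order. Input format is hypothetical; real 'simple'
# trace files are binary and need QEMU's scripts/simpletrace.py.

def notify_stats(events):
    """events: list of (entry_us, exit_us) tuples, in call order."""
    durations = [exit_ts - entry_ts for entry_ts, exit_ts in events]
    # interval = start of one call to start of the next
    intervals = [events[i + 1][0] - events[i][0]
                 for i in range(len(events) - 1)]
    avg_duration = sum(durations) / len(durations)
    avg_interval = sum(intervals) / len(intervals) if intervals else 0.0
    return avg_duration, avg_interval

if __name__ == "__main__":
    # synthetic sample mirroring the numbers reported above
    sample = [(0, 310), (1591, 1901), (3182, 3492)]
    dur, gap = notify_stats(sample)
    print(f"avg handle_notify: {dur:.0f}us, avg interval: {gap:.0f}us")
```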

> Try:
> $ perf record -e syscalls:* --tid <iothread-tid>
> ^C
> $ perf script # shows the trace log
>
> The difference between syscalls in QEMU 2.0 and qemu.git/master could
> reveal the problem.
> Using perf you can also trace ioeventfd signalling in the host kernel
> and compare against the QEMU handle_notify entry/return.  It may be
> easiest to use the ftrace trace_marker tracing backend in QEMU so the trace is
> unified with the host kernel trace (./configure
> --enable-trace-backend=ftrace and see the ftrace section in QEMU
> docs/tracing.txt).
>
> This way you can see whether the ioeventfd signal -> handle_notify()
> entry increased or something else is going on.

These look like good ideas; I will try them.

I have tried ftrace, but it looks like some trace records may be
dropped, and my current script can't handle that well.
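One way to tolerate dropped records when post-processing is to pair entry/exit events defensively and discard unmatched ones. A minimal sketch, with a hypothetical (timestamp, kind) event format rather than the real ftrace text layout:

```python
# Sketch: pair handle_notify entry/exit events while tolerating drops.
# Event format is hypothetical: (timestamp_us, "entry" | "exit").
# An entry whose exit was dropped, or an exit whose entry was dropped,
# is simply discarded instead of corrupting later pairs.

def pair_events(stream):
    pairs = []
    pending = None  # timestamp of an as-yet-unmatched entry
    for ts, kind in stream:
        if kind == "entry":
            pending = ts  # a second entry overwrites a stale one (lost exit)
        elif kind == "exit" and pending is not None:
            pairs.append((pending, ts))
            pending = None
        # an exit with no pending entry means its entry was dropped: skip
    return pairs
```

The resulting pairs can then feed an averaging step; drops only shrink the sample instead of skewing it.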


Thanks,
-- 
Ming Lei


