qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/4] kvm: Dirty ring autoconverge optmization for kvm_cpu_


From: Peter Xu
Subject: Re: [PATCH v2 2/4] kvm: Dirty ring autoconverge optmization for kvm_cpu_synchronize_kick_all
Date: Fri, 13 May 2022 12:41:40 -0400

On Sat, Apr 02, 2022 at 03:22:21PM +0800, Hyman Huang wrote:
> 在 2022/3/28 9:32, wucy11@chinatelecom.cn 写道:
> > @@ -879,7 +862,9 @@ static void kvm_dirty_ring_flush(void)
> >        * First make sure to flush the hardware buffers by kicking all
> >        * vcpus out in a synchronous way.
> >        */
> > -    kvm_cpu_synchronize_kick_all();
> > +    if (!cpu_throttle_get_percentage()) {
> > +        qemu_kvm_cpu_synchronize_kick_all();
> > +    }
> For the code logic itself, kvm_dirty_ring_flush aims to flush all existing
> dirty pages, same as i reviewed in v1's first commit "kvm,memory: Optimize
> dirty page collection for dirty ring", we shouldn't consider the migation
> logic in this path.

I agree with Yong's analysis.  I'm afraid this patch is breaking the api
here.

Say, memory_region_sync_dirty_bitmap() should be the memory api to flush
all dirty memory, we can't simply ignore the flushing just because it's
slow when we're throttling the cpus.  Things will already start to break,
e.g., the dirty rate calculation based on dirty ring relies on all dirty
pages be accounted after the sync() call and this patch will make the
reported value smaller than the reality.  It's not a major deal breaker in
dirty rate measurements but it's just one exmample that we're potentially
breaking the memory api.

AFAIU this is probably another reason why I don't really like the
auto-converge old solution because it's kind of brutal in this case by
putting the thread into a long sleep.

One fix could be that we do periodically sleep in auto-converge code so
that the vcpu threads can still handle any cpu sync requests, though I
didn't really check the cpu code path.

It's also racy in that cpu_throttle_get_percentage() is racy itself:
imagine a case where cpu_throttle_get_percentage() always return >0 _but_
it starts to return 0 when migration_complete() - we'll not call
qemu_kvm_cpu_synchronize_kick_all() even for completion and it can crash
the VM.

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]