From: Peter Xu
Subject: Re: [PATCH] migration: Allow user to specify migration available bandwidth
Date: Tue, 25 Jul 2023 11:54:52 -0400

On Tue, Jul 25, 2023 at 10:16:52AM +0100, Daniel P. Berrangé wrote:
> On Mon, Jul 24, 2023 at 03:47:50PM -0400, Peter Xu wrote:
> > On Mon, Jul 24, 2023 at 07:04:29PM +0100, Daniel P. Berrangé wrote:
> > > On Mon, Jul 24, 2023 at 01:07:55PM -0400, Peter Xu wrote:
> > > > Migration bandwidth is a very important value for live migration.
> > > > It is one of the major factors we use to decide when to switch over
> > > > to the destination in a precopy process.
> > > 
> > > To elaborate on this for those reading along...
> > > 
> > > QEMU takes the maximum downtime limit and multiplies it by its
> > > estimate of the bandwidth. This gives a figure for the amount of data
> > > QEMU thinks it can transfer within the downtime period.
> > > 
> > > QEMU compares this figure to the amount of data that is still pending
> > > at the end of an iteration.
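
As a minimal sketch of that decision (names here are illustrative; the
real QEMU logic tracks more state than this):

  #include <stdbool.h>
  #include <stdint.h>

  /*
   * Illustrative switchover check: the bandwidth estimate (bytes/sec)
   * multiplied by the downtime limit (seconds) gives the number of
   * bytes QEMU believes it can transfer within the downtime window.
   * Switchover is allowed once the still-pending data fits in that
   * budget.
   */
  static bool can_switchover(uint64_t bandwidth_bytes_per_sec,
                             double downtime_limit_sec,
                             uint64_t pending_bytes)
  {
      uint64_t threshold = bandwidth_bytes_per_sec * downtime_limit_sec;
      return pending_bytes <= threshold;
  }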
> > > 
> > > > This value is currently estimated by QEMU during the whole live
> > > > migration process by monitoring how fast we are sending the data.
> > > > This would be the most accurate bandwidth in an ideal world, where
> > > > we are always feeding unlimited data to the migration channel, so
> > > > the send rate is limited only by the bandwidth that is available.
> > > 
> > > The QEMU estimate for available bandwidth will definitely be wrong,
> > > potentially by orders of magnitude, if QEMU has a max bandwidth limit
> > > set, as in that case it is never trying to push the peak rates available
> > > from the NICs/network fabric.
> > > 
> > > > The issue is that QEMU itself may not be able to avoid those
> > > > uncertainties when measuring the real "available migration
> > > > bandwidth".  At least not in any way I can think of so far.
> > > 
> > > IIUC, you can query the NIC properties to find the hardware transfer
> > > rate of the NICs. That doesn't imply apps can actually reach that
> > > rate in practice - it has a decent chance of being an over-estimate
> > > of the bandwidth, possibly very much over.
> > > 
> > > Is such an over-estimate better or worse than QEMU's current
> > > under-estimate?  It depends on the POV.
> > > 
> > > From the POV of QEMU, over-estimating means it'll not be
> > > throttling as much as it should. That's not a downside for
> > > migration - it makes it more likely for migration to complete :-)
> > 
> > Heh. :)
> > 
> > > 
> > > From the POV of non-QEMU apps though, if QEMU over-estimates,
> > > it'll mean other apps get starved of network bandwidth.
> > > 
> > > Overall I agree, there's no obvious way QEMU can ever come up
> > > with a reliable estimate for bandwidth available.
> > > 
> > > > One way to fix this is that when the user is fully aware of the
> > > > available bandwidth, we can allow the user to help by providing an
> > > > accurate value.
> > > >
> > > > For example, if the user has a dedicated channel of 10Gbps for
> > > > migration for this specific VM, the user can specify this bandwidth
> > > > so QEMU can always do the calculation based on this fact, trusting
> > > > the user as long as it is specified.
> > > 
> > > I can see that in theory, but when considering non-trivial
> > > deployments of QEMU, I wonder if the user can really have any
> > > such certainty of what is truly available. It would need
> > > global awareness of the whole network of hosts & workloads.
> > 
> > Indeed, it may not always be easy.
> > 
> > The good thing about this parameter is that we always use the old
> > estimation if the user doesn't specify anything valid, so this is
> > always optional, never required.
> > 
> > It solves the cases where the user can still specify the bandwidth
> > accurately - our QE team has already verified that it worked for us in
> > GPU tests, where migration used to be impossible with any sane downtime
> > specified.  I should have attached a Tested-by from Zhiyi, but since
> > this is not exactly the patch he was using, I didn't.
> > 
> > > 
> > > > When the user wants migration to use only 5Gbps out of that 10Gbps,
> > > > one can set max-bandwidth to 5Gbps, along with available-bandwidth
> > > > set to 5Gbps, so it'll never use over 5Gbps either (and the user can
> > > > keep the remaining 5Gbps for other things).  So it can be useful
> > > > even if the network is not dedicated, as long as the user knows a
> > > > solid value.
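
To make the arithmetic concrete (illustrative numbers, not taken from the
patch itself): 5 Gbps is 625 MB/s, so with QEMU's default 300 ms downtime
limit the switchover threshold would be 625 MB/s * 0.3 s = 187.5 MB, and
switchover is allowed once the pending data fits below that.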
> > > > 
> > > > A new parameter "available-bandwidth" is introduced just for this.
> > > > So when the user specifies this parameter, instead of trusting the
> > > > estimated value from QEMU itself (based on the QEMUFile send speed),
> > > > let's trust the user more.
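
A sketch of how that override might plug into the switchover calculation
(field and function names here are hypothetical, not the actual patch):

  #include <stdint.h>

  /*
   * Hypothetical helper: prefer the user-supplied available-bandwidth
   * over the estimate derived from the QEMUFile send rate.  A value
   * of 0 means "not set by the user", falling back to the estimate.
   */
  static uint64_t switchover_bandwidth(uint64_t user_available_bw,
                                       uint64_t estimated_bw)
  {
      return user_available_bw ? user_available_bw : estimated_bw;
  }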
> > > 
> > > I feel like rather than "available-bandwidth", we should call
> > > it "max-convergence-bandwidth".
> > > 
> > > To me that name would better reflect the fact that this isn't
> > > really required to be a measure of how much NIC bandwidth is
> > > available. It is merely an expression of a different bandwidth
> > > limit to apply during switch over.
> > > 
> > > IOW
> > > 
> > > * max-bandwidth: limit during pre-copy main transfer
> > > * max-convergence-bandwidth: limit during pre-copy switch-over
> > > * max-postcopy-bandwidth: limit during post-copy phase
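
Expressed as a sketch (hypothetical names - only max-bandwidth and
max-postcopy-bandwidth exist today; the middle one is the proposal):

  #include <stdint.h>

  /* Hypothetical parameter set under this naming scheme */
  typedef struct {
      uint64_t max_bandwidth;             /* pre-copy main transfer */
      uint64_t max_convergence_bandwidth; /* pre-copy switch-over */
      uint64_t max_postcopy_bandwidth;    /* post-copy phase */
  } MigrationBandwidthLimits;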
> > 
> > I worry the new name suggested is not straightforward enough at first
> > glance, even to me as a developer.
> > 
> > "available-bandwidth" doesn't even bind that value to "convergence" at all,
> > even though it was for solving this specific problem here. I wanted to make
> > this parameter sololy for the admin to answer the question "how much
> > bandwidth is available to QEMU migration in general?"  That's pretty much
> > straightforward IMHO.  With that, it's pretty sane to consider using all we
> > have during switchover (aka, unlimited bandwidth, as fast as possible).
> > 
> > Maybe at some point we can even leverage this information for purposes
> > other than making the migration converge.
> 
> The flipside is that the semantics & limits we want for convergence
> are already known to be different from what we wanted for pre-copy
> and post-copy. With that existing practice, it is probably more
> likely that we would not want to re-use the same setting for different
> cases, which makes me think a specifically targeted parameter is
> better.

We can make the semantics specific, no strong opinion here.  I wished it
could be as generic / easy as possible, but maybe I went too far.

Though, is there anything else we can choose from besides
"max-convergence-bandwidth"?  Or am I the only one who thinks it's hard to
understand when putting "max" and "convergence" together?

When I take one step back to look at the whole set of "bandwidth"
parameters, I am not sure why we'd even need separate "convergence" and
"postcopy" bandwidths.  With my current understanding of migration, we may
actually need:

  - One bandwidth at which we may want to run the background migration,
    aka precopy migration, where we don't rush to push data.

  - One bandwidth that is the maximum we can have; for a dedicated NIC
    that's the line speed.  We should always use this full speed for
    important things.  I'd say postcopy falls into this, and the
    "convergence" calculation should also rely on it.

So another way to do this is to leverage the existing "postcopy-bandwidth"
for the calculation when it is set; that may help us shrink the bandwidth
values down to two, though I'm not sure whether the name would be confusing
as well.  A sketch of that scheme is below.
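
This is only an illustration of the two-bandwidth idea (names are made up,
and 0 stands for "unset"):

  #include <stdint.h>

  /*
   * Illustrative only: max-bandwidth throttles the background precopy
   * stream, while postcopy-bandwidth doubles as "full line speed" for
   * anything urgent - postcopy page requests and the convergence
   * (switchover) calculation alike.
   */
  static uint64_t urgent_bandwidth(uint64_t postcopy_bandwidth,
                                   uint64_t estimated_bandwidth)
  {
      /* Use the known line speed when set, else fall back to the estimate */
      return postcopy_bandwidth ? postcopy_bandwidth : estimated_bandwidth;
  }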

Thanks,

-- 
Peter Xu



