Re: [PATCH v4 13/14] migration: for SEV live migration bump downtime limit to 1s.


From: Daniel P. Berrangé
Subject: Re: [PATCH v4 13/14] migration: for SEV live migration bump downtime limit to 1s.
Date: Fri, 10 Sep 2021 10:43:50 +0100
User-agent: Mutt/2.0.7 (2021-05-04)

On Wed, Aug 04, 2021 at 11:59:47AM +0000, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Now, qemu has a default expected downtime of 300 ms and
> SEV Live migration has a page-per-second bandwidth of 350-450 pages
> ( SEV Live migration being generally slow due to guest RAM pages
> being migrated after encryption using the security processor ).
> With this expected downtime of 300ms and 350-450 pps bandwidth,
> the threshold size is roughly 1/3 of the per-second page bandwidth, i.e. ~100 pages.
> 
> Now, this threshold size is the maximum pages/bytes that can be
> sent in the final completion phase of Live migration
> (where the source VM is stopped) with the expected downtime.
> Therefore, with the threshold size computed above,
> the migration completion phase which halts the source VM
> and then transfers the leftover dirty pages,
> is only reached in the SEV live migration case when the number of dirty pages is ~100.
> 
> The dirty-pages-rate with larger guest RAM configuration like 4G, 8G, etc.
> is much higher, typically in the range of 300-400+ pages, hence,
> we always remain in the "dirty-sync" phase of migration and never
> reach the migration completion phase with above guest RAM configs.
> 
> To summarize, with larger guest RAM configs,
> the dirty-pages-rate > threshold_size (with the default qemu expected 
> downtime of 300ms).
> 
> So, the fix is to increase qemu's expected downtime.
> 
> This is a tweakable parameter which can be set using "migrate_set_downtime".
> 
> With a downtime of 1 second, we get a threshold size of ~350-450 pages,
> which will handle the "dirty-pages-rate" of 300+ pages and complete
> the migration process, so we bump the default downtime to 1s in case
> of SEV live migration being active.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  migration/migration.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index daea3ecd04..c9bc33fb10 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3568,6 +3568,10 @@ static void migration_update_counters(MigrationState *s,
>      transferred = current_bytes - s->iteration_initial_bytes;
>      time_spent = current_time - s->iteration_start_time;
>      bandwidth = (double)transferred / time_spent;
> +    if (memcrypt_enabled() &&
> +        s->parameters.downtime_limit < 1000) {
> +        s->parameters.downtime_limit = 1000;
> +    }
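
(For context: the line immediately below this hunk in migration_update_counters()
computes s->threshold_size = bandwidth * s->parameters.downtime_limit, which is
where the page arithmetic in the commit message comes from. A minimal standalone
sketch of that arithmetic, not QEMU code, assuming 400 pages/s as a mid-range of
the quoted 350-450 pps:

#include <stdio.h>

int main(void)
{
    double pages_per_second = 400.0;                /* assumed SEV migration bandwidth */
    double downtime_limit_ms[] = { 300.0, 1000.0 }; /* default vs. proposed limit */

    for (int i = 0; i < 2; i++) {
        /* threshold ~ bandwidth * downtime, mirroring
         * s->threshold_size = bandwidth * s->parameters.downtime_limit */
        double threshold_pages = pages_per_second * downtime_limit_ms[i] / 1000.0;
        printf("downtime %4.0f ms -> threshold ~%.0f pages\n",
               downtime_limit_ms[i], threshold_pages);
    }
    return 0;
}

This prints ~120 pages for the 300 ms default and ~400 pages for a 1 s limit,
matching the numbers quoted above.)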

I don't think we can silently change a value set by the mgmt
app. If the app requests 300 ms downtime, then we *must* honour
that, because it is driven by the SLA they need to provide to the
guest user's workload. If it means the migration won't complete,
it is up to the app to deal with that in some manner.

At most I think this is a documentation task to give guidance to
mgmt apps about what special SEV-only things to consider when tuning
live migration.
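
For instance, a mgmt app that knows the guest is SEV-encrypted could raise
the limit itself before starting the migration, via the existing QMP
parameter (value in milliseconds; 1000 here only mirrors the 1 s the
commit message suggests):

  { "execute": "migrate-set-parameters",
    "arguments": { "downtime-limit": 1000 } }

or "migrate_set_downtime 1" on the HMP monitor, leaving the default
untouched for guests without memory encryption.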


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



