qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/2] dump-guest-memory: Add blocker for migration


From: Markus Armbruster
Subject: Re: [PATCH 0/2] dump-guest-memory: Add blocker for migration
Date: Wed, 01 Sep 2021 13:35:18 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Peter Xu <peterx@redhat.com> writes:

> Markus,
>
> On Wed, Aug 25, 2021 at 09:54:12AM +0200, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Both dump-guest-memory and live migration have vm state cached internally.
>> > Allowing them to happen together means the vm state can be messed up.  
>> > Simply
>> > block live migration for dump-guest-memory.
>> >
>> > One trivial thing to mention is we should still allow dump-guest-memory 
>> > even if
>> > -only-migratable is specified, because that flag should majorly be used to
>> > guarantee not adding devices that will block migration by accident.  Dump 
>> > guest
>> > memory is not like that - it'll only block for the seconds when it's 
>> > dumping.
>> 
>> I recently ran into a similarly unusual use of migration blockers:
>> 
>>     Subject: -only-migrate and the two different uses of migration blockers
>>      (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?)
>>     Date: Mon, 19 Jul 2021 13:00:20 +0200 (5 weeks, 1 day, 20 hours ago)
>>     Message-ID: <87sg0amuuz.fsf_-_@dusky.pond.sub.org>
>> 
>>     We appear to use migration blockers in two ways:
>> 
>>     (1) Prevent migration for an indefinite time, typically due to use of
>>     some feature that isn't compatible with migration.
>> 
>>     (2) Delay migration for a short time.
>> 
>>     Option -only-migrate is designed for (1).  It interferes with (2).
>> 
>>     Example for (1): device "x-pci-proxy-dev" doesn't support migration.  It
>>     adds a migration blocker on realize, and deletes it on unrealize.  With
>>     -only-migrate, device realize fails.  Works as designed.
>> 
>>     Example for (2): spapr_mce_req_event() makes an effort to prevent
>>     migration degrate the reporting of FWNMIs.  It adds a migration blocker
>>     when it receives one, and deletes it when it's done handling it.  This
>>     is a best effort; if migration is already in progress by the time FWNMI
>>     is received, we simply carry on, and that's okay.  However, option
>>     -only-migrate sabotages the best effort entirely.
>> 
>>     While this isn't exactly terrible, it may be a weakness in our thinking
>>     and our infrastructure.  I'm bringing it up so the people in charge are
>>     aware :)
>> 
>> https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg04723.html
>> 
>> Downthread there, Dave Gilbert opined
>> 
>>     It almost feels like they need a way to temporarily hold off
>>     'completion' of migratio - i.e. the phase where we stop the CPU and
>>     write the device data;  mind you you'd also probably want it to stop
>>     cold-migrates/snapshots?
>
> Yeah, maybe spapr_mce_req_event() can be another candidate of the internal
> version of migration_add_blocker().
>
> I can add a patch to replace it if anyone likes me to.
>
> Both cold and live snapshot should have checked migration blockers, I think.
> E.g., cold snapshot has:
>
> bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
>                   bool has_devices, strList *devices, Error **errp)
> {
>     [...]
>     if (migration_is_blocked(errp)) {
>         return false;
>     }
>     [...]
> }
>
> While the live snapshot shares similar code in migrate_prepare().
>
> So looks safe that nothing wrong should happen within add/del pair of 
> blockers.
>
> However I do see that it's possible we'll allow the add_blocker to succeed 
> even
> if during cold snapshot, because migration_is_idle() in 
> migration_add_blocker()
> only checks migration state, not RUN_STATE_SAVE_VM.  So I'm wondering whether
> we'd like one more patch to cover that too, like:
>
> ---8<---
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 41429680c2..9c602a4ac1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2055,15 +2055,16 @@ void migrate_init(MigrationState *s)
>  
>  int migrate_add_blocker_internal(Error *reason, Error **errp)
>  {
> -    if (migration_is_idle()) {
> -        migration_blockers = g_slist_prepend(migration_blockers, reason);
> -        return 0;
> +    /* Snapshots are similar to migrations, so check RUN_STATE_SAVE_VM too. 
> */
> +    if (runstate_check(RUN_STATE_SAVE_VM) || !migration_is_idle()) {
> +        error_propagate_prepend(errp, error_copy(reason),
> +                                "disallowing migration blocker "
> +                                "(migration in progress) for: ");
> +        return -EBUSY;
>      }
>  
> -    error_propagate_prepend(errp, error_copy(reason),
> -                            "disallowing migration blocker "
> -                            "(migration in progress) for: ");
> -    return -EBUSY;
> +    migration_blockers = g_slist_prepend(migration_blockers, reason);
> +    return 0;
>  }
>  
>  int migrate_add_blocker(Error *reason, Error **errp)
> ---8<---
>
> It'll by accident also cover guest dump which also sets RUN_STATE_SAVE_VM, but
> I think that's ok.
>
> Thanks,

I delayed answering this in the hope of finding the time to think.  No
luck.  This is a quick answer, hopefully better than nothing and not too
confused.

"Snapshot should honor migration blockers" feels like an independent
issue.  I can't comment on it, because I haven't done my thinking there.

I'm not sure you fully got Dave's suggestion to "temporarily hold off
'completion'".  Let me explain (Dave, please correct misunderstandings,
if any).

Migration blockers and migration are mutually exclusive: you can't
migrate while such blockers are in effect (trying to immediately fails),
and you can't add such blockers while migration is in progress.

The temporary blockers Dave suggested do not block migration, only its
*completion* phase.  This makes sense *only* for short-lived blockers.
If such temporary blockers are in effect when migration is ready to
enter its completion phase, that entry is delayed until the temporary
blockers are gone.  If you try to add a temporary blocker while
migration is in its completion phase, the add blocks until migration
exists its completion phase.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]