qemu-devel
From: Juan Quintela
Subject: Re: [PATCH 4/9] vfio/migration: Skip pre-copy if dirty page tracking is not supported
Date: Wed, 18 May 2022 13:39:31 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Jason Gunthorpe <jgg@nvidia.com> wrote:
> On Tue, May 17, 2022 at 10:00:45AM -0600, Alex Williamson wrote:
>
>> > This is really intended to be a NOP from where things are now, as if
>> > you use mlx5 live migration without a patch like this then it causes a
>> > botched pre-copy since everything just ends up permanently dirty.
>> > 
>> > If it makes more sense we could abort the pre-copy too - in the end
>> > there will be dirty tracking so I don't know if I'd invest in a big
>> > adventure to fully define non-dirty tracking migration.
>> 
>> How is pre-copy currently "botched" without a patch like this?  If it's
>> simply that the pre-copy doesn't converge and the downtime constraints
>> don't allow the VM to enter stop-and-copy, that's the expected behavior
>> AIUI, and supports backwards compatibility with existing SLAs.
>
> It means it always fails - that certainly isn't working live
> migration. There is no point in trying to converge something that we
> already know will never converge.

Fully agree with you here.

But I don't agree with how this is being done.  I think we need a way to
convince the migration code that it shouldn't even try to migrate RAM.
That would fix the current use case, and your use case.

>> I'm assuming that by setting this new skip_precopy flag that we're
>> forcing the VM to move to stop-and-copy, regardless of any other SLA
>> constraints placed on the migration.  
>
> That does seem like a defect in this patch, any SLA constraints should
> still all be checked under the assumption all ram is dirty.

And how are we going to:
- detect the network link speed
- be sure that we are inside the downtime limit

I think that it is not possible, so basically we are skipping the precopy
stage and praying that the other bits are going to be OK.
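
To make the concern concrete, here is a minimal sketch (plain Python, not
QEMU code) of what such a check would have to compute under the "all RAM is
dirty" assumption.  The function names and the example numbers are
hypothetical, and the bandwidth has to be handed in by the caller, which is
precisely the value in question above.

```python
def estimated_downtime_ms(ram_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to send all guest RAM in one go, in milliseconds.

    Assumes every page is dirty, so the whole of RAM must be
    transferred while the guest is stopped.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return ram_bytes / bandwidth_bytes_per_s * 1000.0


def fits_downtime_limit(ram_bytes: int,
                        bandwidth_bytes_per_s: float,
                        downtime_limit_ms: float) -> bool:
    """True if a stop-and-copy of all RAM would stay within the limit."""
    return estimated_downtime_ms(ram_bytes, bandwidth_bytes_per_s) <= downtime_limit_ms


if __name__ == "__main__":
    ram = 16 * 2**30      # illustrative 16 GiB guest
    link = 1.25e9         # illustrative 10 Gbit/s link, in bytes per second
    limit = 300.0         # illustrative downtime limit, in milliseconds
    print(round(estimated_downtime_ms(ram, link), 1), "ms")
    print("within limit:", fits_downtime_limit(ram, link, limit))
```

With these illustrative numbers the estimate is on the order of seconds, far
above the limit, and the result is only as good as the bandwidth figure that
was passed in.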

>> It seems like a better solution would be to expose to management
>> tools that the VM contains a device that does not support the
>> pre-copy phase so that downtime expectations can be adjusted.
>
> I don't expect this to be a real use case though..
>
> Remember, you asked for this patch when you wanted qemu to have good
> behavior when kernel support for legacy dirty tracking is removed
> before we merge v2 support.

I am ignorant on the subject.  Can I ask how dirty memory is
tracked in this v2?

Thanks, Juan.



