qemu-devel

Re: [PATCH] vfio/migrate: Move switch of dirty tracking into vfio_memory_listener


From: Keqian Zhu
Subject: Re: [PATCH] vfio/migrate: Move switch of dirty tracking into vfio_memory_listener
Date: Thu, 28 Jan 2021 23:24:13 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1

Hi Paolo and Kirti,

Many thanks for your reply. I am busy today and will reply to you tomorrow. Thanks.

Keqian.

On 2021/1/28 5:03, Kirti Wankhede wrote:
> 
> 
> On 1/11/2021 1:04 PM, Keqian Zhu wrote:
>> For now the switch of vfio dirty page tracking is integrated into
>> the vfio_save_handler, which causes some problems [1].
>>
> 
> Sorry, I missed the [1] mail; somehow it didn't land in my inbox.
> 
>> The object of dirty tracking is guest memory, but the object of
>> the vfio_save_handler is device state. This mixed logic produces
>> unnecessary coupling and conflicts:
>>
>> 1. Coupling: Their saving granularity is different (per-VM vs per-device).
>>     vfio will enable dirty_page_tracking for each device, but actually
>>     once is enough.
> 
> That's correct, enabling dirty page tracking once is enough. But log_start 
> and log_stop get called on address space update transactions, i.e. 
> region_add() or region_del(), and at that point migration may not be 
> active. We don't want to allocate bitmap memory in the kernel for the 
> lifetime of the VM without knowing whether migration will happen or not. 
> The vfio_iommu_type1 module should allocate bitmap memory only while 
> migration is active.
> 
> Paolo's suggestion to use the log_global_start and log_global_stop 
> callbacks seems correct here. But at that point the vfio device state is 
> not yet changed to _SAVING, as you had identified in [1]. Maybe we can 
> start tracking the bitmap in the iommu_type1 module while the device is 
> not yet _SAVING, but getting the dirty bitmap while the device is not yet 
> in the _SAVING|_RUNNING state doesn't seem like an optimal solution.
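> 
> For illustration only, here is a rough, untested sketch of what wiring 
> those global hooks into the existing vfio listener might look like (the 
> vfio_listener_log_global_* names are made up for this sketch; the helper 
> and the listener fields come from the patch below):
> 
> static void vfio_listener_log_global_start(MemoryListener *listener)
> {
>     VFIOContainer *container = container_of(listener, VFIOContainer,
>                                             listener);
> 
>     /* One container-wide switch when RAM dirty logging begins. */
>     vfio_set_dirty_page_tracking(container, true);
> }
> 
> static void vfio_listener_log_global_stop(MemoryListener *listener)
> {
>     VFIOContainer *container = container_of(listener, VFIOContainer,
>                                             listener);
> 
>     vfio_set_dirty_page_tracking(container, false);
> }
> 
> /* ... and in vfio_memory_listener: */
>     .log_global_start = vfio_listener_log_global_start,
>     .log_global_stop  = vfio_listener_log_global_stop,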
> 
> Pasting here your question from [1]
> 
>> Before starting dirty tracking, we will check and ensure that the device
>>  is in the _SAVING state and return an error otherwise.  But the question
>>  is, what is the rationale?  Why does the VFIO_IOMMU_DIRTY_PAGES
>> ioctl have something to do with the device state?
> 
> Let's walk through the types of devices we are supporting:
> 1. mdev devices without an IOMMU backed device
>     The vendor driver pins pages as and when required during runtime. We 
> can say that the vendor driver is smart, as it identifies the pages to 
> pin. We are good here.
> 
> 2. mdev device with an IOMMU backed device
>     This is similar to vfio-pci, a directly assigned device, where all 
> pages are pinned at VM bootup. The vendor driver is not smart, so the 
> bitmap query will always report all pages dirty. If --auto-converge is 
> not set, the VM gets stuck indefinitely in the pre-copy phase. This is 
> known to us.
> 
> 3. mdev device with an IOMMU backed device and a smart vendor driver
>     In this case as well, all pages are pinned at VM bootup, but the 
> vendor driver is smart enough to identify the pages and pin them 
> explicitly.
> Pages can be pinned at any time, i.e. during normal VM runtime, on 
> setting the _SAVING flag (entering the pre-copy phase), or while in the 
> iterative pre-copy phase. There is no restriction based on these phases 
> for calling vfio_pin_pages(). The vendor driver can start pinning pages 
> based on its device state when the _SAVING flag is set. In that case, if 
> the dirty bitmap is queried before that, it will report all of sysmem as 
> dirty, resulting in an unnecessary copy of sysmem.
> As an optimal solution, I think it's better to query the bitmap only 
> after all vfio devices are in the pre-copy phase, i.e. after the _SAVING 
> flag is set.
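> 
> As a rough sketch of that gating (untested; it is essentially the check 
> that the existing vfio_devices_all_saving() helper in the patch below 
> already performs, with the list-walk details assumed to mirror it), the 
> per-region bitmap query in log_sync would simply be skipped until every 
> device reports _SAVING:
> 
> static bool vfio_devices_all_saving(VFIOContainer *container)
> {
>     VFIOGroup *group;
>     VFIODevice *vbasedev;
> 
>     QLIST_FOREACH(group, &container->group_list, container_next) {
>         QLIST_FOREACH(vbasedev, &group->device_list, next) {
>             VFIOMigration *migration = vbasedev->migration;
> 
>             /* No migration support, or not yet in pre-copy: bail out. */
>             if (!migration ||
>                 !(migration->device_state & VFIO_DEVICE_STATE_SAVING)) {
>                 return false;
>             }
>         }
>     }
>     return true;
> }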
> 
>> 2. Conflicts: The ram_save_setup() traverses all memory_listeners
>>     to execute their log_start() and log_sync() hooks to get the
>>     first-round dirty bitmap, which is used by the bulk stage of
>>     ram saving. However, it can't get the dirty bitmap from vfio, as
>>     @savevm_ram_handlers is registered before @vfio_save_handler.
>>
> Right, but it can get the dirty bitmap from the vfio device in its
> iterative callback:
> ram_save_pending ->
>     migration_bitmap_sync_precopy() .. ->
>          vfio_listerner_log_sync
> 
> Thanks,
> Kirti
> 
>> Moving the switch of vfio dirty_page_tracking into vfio_memory_listener
>> can solve the above problems. Besides, do not require devices to be in
>> SAVING state for vfio_sync_dirty_bitmap().
>>
>> [1] https://www.spinics.net/lists/kvm/msg229967.html
>>
>> Reported-by: Zenghui Yu <yuzenghui@huawei.com>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>   hw/vfio/common.c    | 53 +++++++++++++++++++++++++++++++++++++--------
>>   hw/vfio/migration.c | 35 ------------------------------
>>   2 files changed, 44 insertions(+), 44 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 6ff1daa763..9128cd7ee1 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -311,7 +311,7 @@ bool vfio_mig_active(void)
>>       return true;
>>   }
>>   -static bool vfio_devices_all_saving(VFIOContainer *container)
>> +static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>>   {
>>       VFIOGroup *group;
>>       VFIODevice *vbasedev;
>> @@ -329,13 +329,8 @@ static bool vfio_devices_all_saving(VFIOContainer *container)
>>                   return false;
>>               }
>>   -            if (migration->device_state & VFIO_DEVICE_STATE_SAVING) {
>> -                if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
>> -                    && (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
>> -                        return false;
>> -                }
>> -                continue;
>> -            } else {
>> +            if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
>> +                && (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
>>                   return false;
>>               }
>>           }
>> @@ -987,6 +982,44 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>       }
>>   }
>>   +static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>> +{
>> +    int ret;
>> +    struct vfio_iommu_type1_dirty_bitmap dirty = {
>> +        .argsz = sizeof(dirty),
>> +    };
>> +
>> +    if (start) {
>> +        dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
>> +    } else {
>> +        dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
>> +    }
>> +
>> +    ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
>> +    if (ret) {
>> +        error_report("Failed to set dirty tracking flag 0x%x errno: %d",
>> +                     dirty.flags, errno);
>> +    }
>> +}
>> +
>> +static void vfio_listener_log_start(MemoryListener *listener,
>> +                                    MemoryRegionSection *section,
>> +                                    int old, int new)
>> +{
>> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> +
>> +    vfio_set_dirty_page_tracking(container, true);
>> +}
>> +
>> +static void vfio_listener_log_stop(MemoryListener *listener,
>> +                                   MemoryRegionSection *section,
>> +                                   int old, int new)
>> +{
>> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>> +
>> +    vfio_set_dirty_page_tracking(container, false);
>> +}
>> +
>>   static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>>                                    uint64_t size, ram_addr_t ram_addr)
>>   {
>> @@ -1128,7 +1161,7 @@ static void vfio_listerner_log_sync(MemoryListener *listener,
>>           return;
>>       }
>>   -    if (vfio_devices_all_saving(container)) {
>> +    if (vfio_devices_all_dirty_tracking(container)) {
>>           vfio_sync_dirty_bitmap(container, section);
>>       }
>>   }
>> @@ -1136,6 +1169,8 @@ static void vfio_listerner_log_sync(MemoryListener *listener,
>>   static const MemoryListener vfio_memory_listener = {
>>       .region_add = vfio_listener_region_add,
>>       .region_del = vfio_listener_region_del,
>> +    .log_start = vfio_listener_log_start,
>> +    .log_stop = vfio_listener_log_stop,
>>       .log_sync = vfio_listerner_log_sync,
>>   };
>>   diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 00daa50ed8..c0f646823a 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -395,40 +395,10 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
>>       return qemu_file_get_error(f);
>>   }
>>   -static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start)
>> -{
>> -    int ret;
>> -    VFIOMigration *migration = vbasedev->migration;
>> -    VFIOContainer *container = vbasedev->group->container;
>> -    struct vfio_iommu_type1_dirty_bitmap dirty = {
>> -        .argsz = sizeof(dirty),
>> -    };
>> -
>> -    if (start) {
>> -        if (migration->device_state & VFIO_DEVICE_STATE_SAVING) {
>> -            dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
>> -        } else {
>> -            return -EINVAL;
>> -        }
>> -    } else {
>> -            dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
>> -    }
>> -
>> -    ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
>> -    if (ret) {
>> -        error_report("Failed to set dirty tracking flag 0x%x errno: %d",
>> -                     dirty.flags, errno);
>> -        return -errno;
>> -    }
>> -    return ret;
>> -}
>> -
>>   static void vfio_migration_cleanup(VFIODevice *vbasedev)
>>   {
>>       VFIOMigration *migration = vbasedev->migration;
>>   -    vfio_set_dirty_page_tracking(vbasedev, false);
>> -
>>       if (migration->region.mmaps) {
>>           vfio_region_unmap(&migration->region);
>>       }
>> @@ -469,11 +439,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>>           return ret;
>>       }
>>   -    ret = vfio_set_dirty_page_tracking(vbasedev, true);
>> -    if (ret) {
>> -        return ret;
>> -    }
>> -
>>       qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>>         ret = qemu_file_get_error(f);
>>


