Re: [PATCH] vfio/migrate: Move switch of dirty tracking into vfio_memory_listener

From: Keqian Zhu
Subject: Re: [PATCH] vfio/migrate: Move switch of dirty tracking into vfio_memory_listener
Date: Mon, 1 Mar 2021 10:04:09 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1
Hi Kirti,
What's your opinion about this? Thanks.
Keqian
On 2021/1/30 14:30, Keqian Zhu wrote:
> Hi Kirti,
>
> On 2021/1/28 5:03, Kirti Wankhede wrote:
>>
>>
>> On 1/11/2021 1:04 PM, Keqian Zhu wrote:
>>> For now the switch of vfio dirty page tracking is integrated into
>>> the vfio_save_handler, it causes some problems [1].
>>>
>>
>> Sorry, I missed the [1] mail; somehow it didn't land in my inbox.
>>
>>> The object of dirty tracking is guest memory, but the object of
>>> the vfio_save_handler is device state. This mixed logic produces
>>> unnecessary coupling and conflicts:
>>>
>>> 1. Coupling: Their saving granularity differs (per-VM vs per-device).
>>> vfio enables dirty_page_tracking for each device, but enabling it
>>> once is enough.
>>
>> That's correct, enabling dirty page tracking once is enough. But log_start
>> and log_stop get called on address space update transactions, region_add()
>> or region_del(), and at that point migration may not be active. We don't
>> want to allocate bitmap memory in the kernel for the lifetime of the VM
>> without knowing whether migration will happen. The vfio_iommu_type1 module
>> should allocate bitmap memory only while migration is active.
>>
> Yeah, we can use global start/stop callbacks as suggested by Paolo, which
> solves this problem.
>
>> Paolo's suggestion to use the log_global_start and log_global_stop
>> callbacks seems correct here. But at that point the vfio device state is
>> not yet changed to |_SAVING, as you identified in [1]. Maybe we can start
>> tracking the bitmap in the iommu_type1 module while the device is not yet
>> _SAVING, but getting the dirty bitmap while the device is not yet in the
>> _SAVING|_RUNNING state doesn't seem like an optimal solution.
>>
>> Pasting here your question from [1]
>>
>>> Before start dirty tracking, we will check and ensure that the device
>>> is at _SAVING state and return error otherwise. But the question is
>>> that what is the rationale? Why does the VFIO_IOMMU_DIRTY_PAGES
>>> ioctl have something to do with the device state?
>>
>> Let's walk through the types of devices we are supporting:
>> 1. mdev devices without an IOMMU backed device
>> The vendor driver pins pages as and when required during runtime. We can
>> say the vendor driver is smart, in that it identifies the pages to pin. We
>> are good here.
>>
>> 2. mdev device with an IOMMU backed device
>> This is similar to vfio-pci, a directly assigned device, where all pages
>> are pinned at VM bootup. The vendor driver is not smart, so a bitmap query
>> will always report all pages dirty. If --auto-converge is not set, the VM
>> is stuck indefinitely in the pre-copy phase. This is known to us.
>>
> A little question here ;-). Why does auto-converge (slowing down vCPUs) help
> ease the all-dirty case?
>
>> 3. mdev device with an IOMMU backed device and a smart vendor driver
>> In this case as well, all pages are pinned at VM bootup, but the vendor
>> driver is smart enough to identify the pages and pin them explicitly.
>> Pages can be pinned at any time, i.e. during normal VM runtime, on setting
>> the _SAVING flag (entering the pre-copy phase), or while in the iterative
>> pre-copy phase. There is no restriction based on these phases for calling
>> vfio_pin_pages(). The vendor driver can start pinning pages based on its
>> device state when the _SAVING flag is set. In that case, if the dirty
>> bitmap is queried before that, it will report all of sysmem as dirty,
>> causing an unnecessary copy of sysmem.
>> As an optimal solution, I think it's better to query the bitmap only after
>> all vfio devices are in the pre-copy phase, i.e. after the _SAVING flag is
>> set.
> OK, I get your idea. But QEMU assumes all pages are dirty initially, so this
> seems not to be a problem.
> Let's assume we have a device of type 3, and this device starts to pin pages
> on setting the _SAVING flag.
>
> Before this patch, the work flow is:
> {
> ram_save_setup()
>   memory_global_dirty_log_start(): starts dirty tracking, excluding the
>   vfio part.
>   migration_bitmap_sync_precopy(): tries to sync the dirty log from kvm,
>   vhost, etc., including vfio (since not all devices are saving, getting
>   the log from vfio fails). The result is that the ramblock bitmap is all
>   dirty.
>
> vfio_save_setup() of this device
>   vfio_migration_set_state(): adds the SAVING state to this device, and
>   vfio starts to log dirty pages of this device.
>
> first round (i.e. bulk stage) of ram saving: only handles the dirty log
> collected by the first call of migration_bitmap_sync_precopy().
>
> iterative stage of ram saving: when the remaining dirty log is less than a
> threshold, calls migration_bitmap_sync_precopy() again. At this stage, all
> devices are saving, so getting the log from vfio succeeds.
> }
>
> With this patch, the work flow is:
> {
> ram_save_setup()
>   memory_global_dirty_log_start(): starts dirty tracking, including the
>   vfio part.
>   migration_bitmap_sync_precopy(): tries to sync the dirty log from kvm,
>   vhost, etc., including vfio (since the all-devices-saving condition is no
>   longer checked, getting the full dirty log from vfio succeeds). The
>   result is that the ramblock bitmap is all dirty.
>
> vfio_save_setup() of this device
>   vfio_migration_set_state(): adds the SAVING state to this device, and
>   vfio starts to log dirty pages of this device.
>
> first round of ram saving: only handles the dirty log collected by the
> first call of migration_bitmap_sync_precopy().
>
> iterative stage of ram saving: when the remaining dirty log is less than a
> threshold, calls migration_bitmap_sync_precopy() again. At this stage, all
> devices are saving, so getting the log from vfio succeeds.
> }
>
> We can see that there is no unnecessary copy of guest memory with this patch.
>
>>
>>> 2. Conflicts: The ram_save_setup() traverses all memory_listeners
>>> to execute their log_start() and log_sync() hooks to get the
>>> first round dirty bitmap, which is used by the bulk stage of
>>> ram saving. However, it can't get dirty bitmap from vfio, as
>>> @savevm_ram_handlers is registered before @vfio_save_handler.
>>>
>> Right, but it can get the dirty bitmap from the vfio device in its
>> iterative callback:
>> ram_save_pending ->
>> migration_bitmap_sync_precopy() .. ->
>> vfio_listerner_log_sync
>>
> Yeah, this work flow is OK. But we are not able to handle the vfio dirty log
> in the first round of ram saving.
>
> We plan to add a new interface named manual_log_clear in vfio [1]. If this
> interface is enabled, kernel vfio doesn't automatically clear and repopulate
> the dirty bitmap when reporting the dirty log to QEMU. These actions are
> instead triggered by QEMU, which invokes manual_log_clear
> (memory_region_clear_dirty_bitmap()) before handling the dirty log.
>
> Under the following conditions:
> (1) QEMU can handle the vfio dirty log in the first round of ram saving.
> (2) The device is of type 1.
> (3) manual_log_clear is enabled.
> (4) The device driver unpins some ranges during the first round of ram
>     saving (before QEMU actually handles those ranges).
>
> Then these unpinned ranges are not reported to QEMU in the iterative stage
> (manual_log_clear clears them, and the repopulation does not contain them),
> which avoids an unnecessary copy of guest memory.
>
> Thanks,
> Keqian
>
> [1]https://lore.kernel.org/kvmarm/20210128151742.18840-1-zhukeqian1@huawei.com/
>
>
>> Thanks,
>> Kirti
>>
>>> Moving the switch of vfio dirty_page_tracking into vfio_memory_listener
>>> solves the above problems. Besides, do not require devices to be in the
>>> SAVING state for vfio_sync_dirty_bitmap().
>>>
>>> [1] https://www.spinics.net/lists/kvm/msg229967.html
>>>
>>> Reported-by: Zenghui Yu <yuzenghui@huawei.com>
>>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>>> ---
>>> hw/vfio/common.c | 53 +++++++++++++++++++++++++++++++++++++--------
>>> hw/vfio/migration.c | 35 ------------------------------
>>> 2 files changed, 44 insertions(+), 44 deletions(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 6ff1daa763..9128cd7ee1 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -311,7 +311,7 @@ bool vfio_mig_active(void)
>>> return true;
>>> }
>>> -static bool vfio_devices_all_saving(VFIOContainer *container)
>>> +static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>>> {
>>> VFIOGroup *group;
>>> VFIODevice *vbasedev;
>>> @@ -329,13 +329,8 @@ static bool vfio_devices_all_saving(VFIOContainer *container)
>>> return false;
>>> }
>>> - if (migration->device_state & VFIO_DEVICE_STATE_SAVING) {
>>> -        if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
>>> -            && (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
>>> - return false;
>>> - }
>>> - continue;
>>> - } else {
>>> + if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
>>> + && (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
>>> return false;
>>> }
>>> }
>>> @@ -987,6 +982,44 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>> }
>>> }
>>> +static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>>> +{
>>> + int ret;
>>> + struct vfio_iommu_type1_dirty_bitmap dirty = {
>>> + .argsz = sizeof(dirty),
>>> + };
>>> +
>>> + if (start) {
>>> + dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
>>> + } else {
>>> + dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
>>> + }
>>> +
>>> + ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
>>> + if (ret) {
>>> + error_report("Failed to set dirty tracking flag 0x%x errno: %d",
>>> + dirty.flags, errno);
>>> + }
>>> +}
>>> +
>>> +static void vfio_listener_log_start(MemoryListener *listener,
>>> + MemoryRegionSection *section,
>>> + int old, int new)
>>> +{
>>> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>> +
>>> + vfio_set_dirty_page_tracking(container, true);
>>> +}
>>> +
>>> +static void vfio_listener_log_stop(MemoryListener *listener,
>>> + MemoryRegionSection *section,
>>> + int old, int new)
>>> +{
>>> +    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
>>> +
>>> + vfio_set_dirty_page_tracking(container, false);
>>> +}
>>> +
>>> static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>>> uint64_t size, ram_addr_t ram_addr)
>>> {
>>> @@ -1128,7 +1161,7 @@ static void vfio_listerner_log_sync(MemoryListener *listener,
>>> return;
>>> }
>>> - if (vfio_devices_all_saving(container)) {
>>> + if (vfio_devices_all_dirty_tracking(container)) {
>>> vfio_sync_dirty_bitmap(container, section);
>>> }
>>> }
>>> @@ -1136,6 +1169,8 @@ static void vfio_listerner_log_sync(MemoryListener *listener,
>>> static const MemoryListener vfio_memory_listener = {
>>> .region_add = vfio_listener_region_add,
>>> .region_del = vfio_listener_region_del,
>>> + .log_start = vfio_listener_log_start,
>>> + .log_stop = vfio_listener_log_stop,
>>> .log_sync = vfio_listerner_log_sync,
>>> };
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 00daa50ed8..c0f646823a 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -395,40 +395,10 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
>>> return qemu_file_get_error(f);
>>> }
>>> -static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start)
>>> -{
>>> - int ret;
>>> - VFIOMigration *migration = vbasedev->migration;
>>> - VFIOContainer *container = vbasedev->group->container;
>>> - struct vfio_iommu_type1_dirty_bitmap dirty = {
>>> - .argsz = sizeof(dirty),
>>> - };
>>> -
>>> - if (start) {
>>> - if (migration->device_state & VFIO_DEVICE_STATE_SAVING) {
>>> - dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
>>> - } else {
>>> - return -EINVAL;
>>> - }
>>> - } else {
>>> - dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
>>> - }
>>> -
>>> - ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
>>> - if (ret) {
>>> - error_report("Failed to set dirty tracking flag 0x%x errno: %d",
>>> - dirty.flags, errno);
>>> - return -errno;
>>> - }
>>> - return ret;
>>> -}
>>> -
>>> static void vfio_migration_cleanup(VFIODevice *vbasedev)
>>> {
>>> VFIOMigration *migration = vbasedev->migration;
>>> - vfio_set_dirty_page_tracking(vbasedev, false);
>>> -
>>> if (migration->region.mmaps) {
>>> vfio_region_unmap(&migration->region);
>>> }
>>> @@ -469,11 +439,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>>> return ret;
>>> }
>>> - ret = vfio_set_dirty_page_tracking(vbasedev, true);
>>> - if (ret) {
>>> - return ret;
>>> - }
>>> -
>>> qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>>> ret = qemu_file_get_error(f);
>>>