qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[question] VFIO Device Migration: The vCPU may be paused during vfio dev


From: Kunkun Jiang
Subject: [question] VFIO Device Migration: The vCPU may be paused during vfio device DMA in iommu nested stage mode && vSVA
Date: Fri, 24 Sep 2021 14:18:58 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

Hi all,

I encountered a problem in vfio device migration test. The
vCPU may be paused during vfio-pci DMA in iommu nested
stage mode && vSVA. This may lead to migration fail and
other problems related to device hardware and driver
implementation.

It may be a bit early to discuss this issue, after all, the iommu
nested stage mode and vSVA are not yet mature. But judging
from the current implementation, we will definitely encounter
this problem in the future.

This is the current process of vSVA processing translation fault
in iommu nested stage mode (take SMMU as an example):

guest os            4.handle translation fault 5.send CMD_RESUME to vSMMU


qemu                3.inject fault into guest os 6.deliver response to host os
(vfio/vsmmu)


host os              2.notify the qemu 7.send CMD_RESUME to SMMU
(vfio/smmu)


SMMU              1.address translation fault              8.retry or terminate

The order is 1--->8.

Currently, qemu may pause vCPU at any step. It is possible to
pause vCPU at step 1-5, that is, in a DMA. This may lead to
migration fail and other problems related to device hardware
and driver implementation. For example, the device status
cannot be changed from RUNNING && SAVING to SAVING,
because the device DMA is not over.

As far as i can see, vCPU should not be paused during a device
IO process, such as DMA. However, currently live migration
does not pay attention to the state of vfio device when pausing
the vCPU. And if the vCPU is not paused, the vfio device is
always running. This looks like a *deadlock*.

Do you have any ideas to solve this problem?
Looking forward to your replay.

Thanks,
Kunkun Jiang






reply via email to

[Prev in Thread] Current Thread [Next in Thread]