qemu-devel

From: Alex Williamson
Subject: Re: [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmapping in live migration
Date: Fri, 10 Sep 2021 16:24:35 -0600

On Fri, 10 Sep 2021 16:33:12 +0800
Kunkun Jiang <jiangkunkun@huawei.com> wrote:

> Hi Alex,
> 
> On 2021/9/9 4:45, Alex Williamson wrote:
> > On Fri, 3 Sep 2021 17:36:10 +0800
> > Kunkun Jiang <jiangkunkun@huawei.com> wrote:
> >  
> >> We expand the MemoryRegions of vfio-pci sub-page MMIO BARs
> >> in vfio_pci_write_config to improve IO performance.
> >> The MemoryRegions of the destination VM will not be expanded
> >> successfully in live migration, because their addresses have
> >> already been updated in vmstate_load_state (vfio_pci_load_config).
> > What's the call path through vfio_pci_write_config() that you're
> > relying on to get triggered to enable this and why wouldn't we just
> > walk all sub-page BARs in vfio_pci_load_config() to resolve the issue
> > then?  It's my understanding that we do this update in write-config
> > because it's required that the VM sizes the BAR before using it, which
> > is not the case when we resume from migration.  Thanks,  
> Let's take an example:
> 
> AArch64
> host page granularity: 64KB
> PCI device: *Bar2 size 32KB* [mem 0x8000200000-0x800020ffff 64bit pref]
> 
> When enable Command register bit 1(Memory Space), the code flow is
> as follows:
> 
> vfio_pci_write_config (addr: 4 val: 2 len: 2)
>      // record the old address of each bar, 0xffffffffffffffff
>      old_addr[bar] = pdev->io_regions[bar].addr;
>      pci_default_write_config
>          pci_update_mappings
>              new_addr = pci_bar_address    // 0x8000200000
>              r->addr = new_addr;
>              memory_region_add_subregion_overlap
>                  ...
>                  *vfio_listener_region_add*
>                      alignment check of the ram section address and size fails, return
>                  *kvm_region_add*
>                      kvm_set_phys_mem
>                          alignment check of the ram section address and size fails, return
> 
>      // old_addr[bar] != pdev->io_regions[bar].addr &&
>      // 0 < vdev->bars[bar].region.size < qemu_real_host_page_size
>      vfio_sub_page_bar_update_mapping
>          *bar size = qemu_real_host_page_size*
>          ...
>          vfio_listener_region_add
>              map succeeds
>          kvm_region_add
>              kvm_set_phys_mem
>                  map succeeds
> 
> In live migration, only the PCI config data is sent to the destination VM.
> Therefore, we need to update the bar's size before the destination VM
> uses it.
> 
> In vfio_pci_load_config, the code flow is as follows:
> 
> vfio_pci_load_config
>      vmstate_load_state
>          *get_pci_config_device*
>              pci_update_mappings
>                  ...
>                  // bar's addr is updated (0x8000200000), but bar's size
>                  // is still 32KB, so the map fails
>      vfio_pci_write_config
>          // bar's addr will not be changed, so
>          // vfio_sub_page_bar_update_mapping won't be called
> 
> My idea is that removing the check 'old_addr[bar] !=
> pdev->io_regions[bar].addr' doesn't affect the previous process.
> There's also a bar size check, and vfio_sub_page_bar_update_mapping
> itself checks whether the bar is mapped and page aligned.
> 1) If the bar's addr is 0xffffffffffffffff, it will not pass the
> check in vfio_sub_page_bar_update_mapping.
> 2) If the bar's size has already been updated, it will not pass the
> bar size check in vfio_pci_write_config.

The bar size check in vfio_pci_write_config() only tests whether the vfio
region is >0 and <qemu_real_host_page_size.  Afaict our sub-page support
affects the size of the MemoryRegions mapping the vfio region, but we
never change the vfio region size itself.  So if you're expecting
(vdev->bars[bar].region.size == qemu_real_host_page_size) once we set up
the sub-page support, I'm not convinced that's true.

So yes, sub-page-update can reject invalid addresses and we already
rely on it to do so, but the code being removed ensures that redundant
writes to the BAR won't trigger redundant MemoryRegion manipulation.
Maybe those are harmless, but that's not your argument for allowing it.

OTOH, why wouldn't vfio_pci_load_config() iterate sub-page BARs and try
to update them at that time?  Thanks,

Alex


> >     
> >> Remove the restriction on base address change in
> >> vfio_pci_write_config so that sub-page MMIO BARs are
> >> mmapped correctly. According to my analysis, the remaining
> >> parameter verification is enough.
> >>
> >> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
> >> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
> >> Reported-by: Qixin Gan <ganqixin@huawei.com>
> >> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> >> ---
> >>   hw/vfio/pci.c | 8 +-------
> >>   1 file changed, 1 insertion(+), 7 deletions(-)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index e1ea1d8a23..891b211ddf 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -1189,18 +1189,12 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >>           }
> >>       } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
> >>           range_covers_byte(addr, len, PCI_COMMAND)) {
> >> -        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
> >>           int bar;
> >>   
> >> -        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> >> -            old_addr[bar] = pdev->io_regions[bar].addr;
> >> -        }
> >> -
> >>           pci_default_write_config(pdev, addr, val, len);
> >>   
> >>           for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> >> -            if (old_addr[bar] != pdev->io_regions[bar].addr &&
> >> -                vdev->bars[bar].region.size > 0 &&
> >> +            if (vdev->bars[bar].region.size > 0 &&
> >>                   vdev->bars[bar].region.size < qemu_real_host_page_size) {
> >>                   vfio_sub_page_bar_update_mapping(pdev, bar);
> >>               }  
> > .  
> 
> 



