aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/vfio (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-10-28vfio/type1: handle DMA map/unmap up to the addressable limitAlex Mastro1-35/+42
Before this commit, it was possible to create end of address space mappings, but unmapping them via VFIO_IOMMU_UNMAP_DMA, replaying them for newly added iommu domains, and querying their dirty pages via VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP was broken due to bugs caused by comparisons against (iova + size) expressions, which overflow to zero. Additionally, there appears to be a page pinning leak in the vfio_iommu_type1_release() path, since vfio_unmap_unpin()'s loop body where unmap_unpin_*() are called will never be entered due to overflow of (iova + size) to zero. This commit handles DMA map/unmap operations up to the addressable limit by comparing against inclusive end-of-range limits, and changing iteration to perform relative traversals across range sizes, rather than absolute traversals across addresses. vfio_link_dma() inserts a zero-sized vfio_dma into the rb-tree, and is only used for that purpose, so discard the size from consideration for the insertion point. Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251028-fix-unmap-v6-3-2542b96bcc8e@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-10-28vfio/type1: move iova increment to unmap_unpin_*() callerAlex Mastro1-10/+10
Move incrementing iova to the caller of these functions as part of preparing to handle end of address space map/unmap. Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251028-fix-unmap-v6-2-2542b96bcc8e@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-10-28vfio/type1: sanitize for overflow using check_*_overflow()Alex Mastro1-23/+63
Adopt check_*_overflow() functions to clearly express overflow check intent. Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251028-fix-unmap-v6-1-2542b96bcc8e@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-10-06vfio: Dump migration features under debugfsCédric Le Goater1-0/+19
A debugfs directory was recently added for VFIO devices. Add a new "features" file under the migration sub-directory to expose which features the device supports. Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20250918121928.1921871-1-clg@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-10-06vfio/type1: optimize vfio_unpin_pages_remote()Li Zhe1-4/+16
When vfio_unpin_pages_remote() is called with a range of addresses that includes large folios, the function currently performs individual put_pfn() operations for each page. This can lead to significant performance overheads, especially when dealing with large ranges of pages. It would be very rare for reserved PFNs and non reserved will to be mixed within the same range. So this patch utilizes the has_rsvd variable introduced in the previous patch to determine whether batch put_pfn() operations can be performed. Moreover, compared to put_pfn(), unpin_user_page_range_dirty_lock() is capable of handling large folio scenarios more efficiently. The performance test results for completing the 16G VFIO IOMMU DMA unmapping are as follows. Base(v6.16): ------- AVERAGE (MADV_HUGEPAGE) -------- VFIO UNMAP DMA in 0.141 s (113.7 GB/s) ------- AVERAGE (MAP_POPULATE) -------- VFIO UNMAP DMA in 0.307 s (52.2 GB/s) ------- AVERAGE (HUGETLBFS) -------- VFIO UNMAP DMA in 0.135 s (118.6 GB/s) With this patchset: ------- AVERAGE (MADV_HUGEPAGE) -------- VFIO UNMAP DMA in 0.044 s (363.2 GB/s) ------- AVERAGE (MAP_POPULATE) -------- VFIO UNMAP DMA in 0.289 s (55.3 GB/s) ------- AVERAGE (HUGETLBFS) -------- VFIO UNMAP DMA in 0.044 s (361.3 GB/s) For large folio, we achieve an over 67% performance improvement in the VFIO UNMAP DMA item. For small folios, the performance test results appear to show a slight improvement. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250814064714.56485-6-lizhe.67@bytedance.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-10-06vfio/type1: introduce a new member has_rsvd for struct vfio_dmaLi Zhe1-0/+2
Introduce a new member has_rsvd for struct vfio_dma. This member is used to indicate whether there are any reserved or invalid pfns in the region represented by this vfio_dma. If it is true, it indicates that there is at least one pfn in this region that is either reserved or invalid. Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250814064714.56485-5-lizhe.67@bytedance.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-10-06vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()Li Zhe1-7/+3
The function vpfn_pages() can help us determine the number of vpfn nodes on the vpfn rb tree within a specified range. This allows us to avoid searching for each vpfn individually in the function vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn() calls in function vfio_unpin_pages_remote(). Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Link: https://lore.kernel.org/r/20250814064714.56485-4-lizhe.67@bytedance.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-10-06vfio/type1: optimize vfio_pin_pages_remote()Li Zhe1-12/+72
When vfio_pin_pages_remote() is called with a range of addresses that includes large folios, the function currently performs individual statistics counting operations for each page. This can lead to significant performance overheads, especially when dealing with large ranges of pages. Batch processing of statistical counting operations can effectively enhance performance. In addition, the pages obtained through longterm GUP are neither invalid nor reserved. Therefore, we can reduce the overhead associated with some calls to function is_invalid_reserved_pfn(). The performance test results for completing the 16G VFIO IOMMU DMA mapping are as follows. Base(v6.16): ------- AVERAGE (MADV_HUGEPAGE) -------- VFIO MAP DMA in 0.049 s (328.5 GB/s) ------- AVERAGE (MAP_POPULATE) -------- VFIO MAP DMA in 0.268 s (59.6 GB/s) ------- AVERAGE (HUGETLBFS) -------- VFIO MAP DMA in 0.051 s (310.9 GB/s) With this patch: ------- AVERAGE (MADV_HUGEPAGE) -------- VFIO MAP DMA in 0.025 s (629.8 GB/s) ------- AVERAGE (MAP_POPULATE) -------- VFIO MAP DMA in 0.253 s (63.1 GB/s) ------- AVERAGE (HUGETLBFS) -------- VFIO MAP DMA in 0.030 s (530.5 GB/s) For large folio, we achieve an over 40% performance improvement. For small folios, the performance test results indicate a slight improvement. Signed-off-by: Li Zhe <lizhe.67@bytedance.com> Co-developed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20250814064714.56485-3-lizhe.67@bytedance.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-10-04Merge tag 'vfio-v6.18-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds15-11/+78
Pull VFIO updates from Alex Williamson: - Use fdinfo to expose the sysfs path of a device represented by a vfio device file (Alex Mastro) - Mark vfio-fsl-mc, vfio-amba, and the reset functions for vfio-platform for removal as these are either orphaned or believed to be unused (Alex Williamson) - Add reviewers for vfio-platform to save it from also being marked for removal (Mostafa Saleh, Pranjal Shrivastava) - VFIO selftests, including basic sanity testing and minimal userspace drivers for testing against real hardware. This is also expected to provide integration with KVM selftests for KVM-VFIO interfaces (David Matlack, Josh Hilke) - Fix drivers/cdx and vfio/cdx to build without CONFIG_GENERIC_MSI_IRQ (Nipun Gupta) - Fix reference leak in hisi_acc (Miaoqian Lin) - Use consistent return for unsupported device feature (Alex Mastro) - Unwind using the correct memory free callback in vfio/pds (Zilin Guan) - Use IRQ_DISABLE_LAZY flag to improve handling of pre-PCI2.3 INTx and resolve stalled interrupt on ppc64 (Timothy Pearson) - Enable GB300 in nvgrace-gpu vfio-pci variant driver (Tushar Dave) - Misc: - Drop unnecessary ternary conversion in vfio/pci (Xichao Zhao) - Grammatical fix in nvgrace-gpu (Morduan Zang) - Update Shameer's email address (Shameer Kolothum) - Fix document build warning (Alex Williamson) * tag 'vfio-v6.18-rc1' of https://github.com/awilliam/linux-vfio: (48 commits) vfio/nvgrace-gpu: Add GB300 SKU to the devid table vfio/pci: Fix INTx handling on legacy non-PCI 2.3 devices vfio/pds: replace bitmap_free with vfree vfio: return -ENOTTY for unsupported device feature hisi_acc_vfio_pci: Fix reference leak in hisi_acc_vfio_debug_init vfio/platform: Mark reset drivers for removal vfio/amba: Mark for removal MAINTAINERS: Add myself as VFIO-platform reviewer MAINTAINERS: Add myself as VFIO-platform reviewer docs: proc.rst: Fix VFIO Device title formatting vfio: selftests: Fix .gitignore for already tracked files vfio/cdx: update driver to build without CONFIG_GENERIC_MSI_IRQ cdx: don't select CONFIG_GENERIC_MSI_IRQ MAINTAINERS: Update Shameer Kolothum's email address vfio: selftests: Add a script to help with running VFIO selftests vfio: selftests: Make iommufd the default iommu_mode vfio: selftests: Add iommufd mode vfio: selftests: Add iommufd_compat_type1{,v2} modes vfio: selftests: Add vfio_type1v2_mode vfio: selftests: Replicate tests across all iommu_modes ...
2025-09-26vfio/nvgrace-gpu: Add GB300 SKU to the devid tableTushar Dave1-0/+2
GB300 is NVIDIA's Grace Blackwell Ultra Superchip. Add the GB300 SKU device-id to nvgrace_gpu_vfio_pci_table. Signed-off-by: Tushar Dave <tdave@nvidia.com> Reviewed-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20250925170935.121587-1-tdave@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-09-26vfio/pci: Fix INTx handling on legacy non-PCI 2.3 devicesTimothy Pearson1-0/+7
PCI devices prior to PCI 2.3 both use level interrupts and do not support interrupt masking, leading to a failure when passed through to a KVM guest on at least the ppc64 platform. This failure manifests as receiving and acknowledging a single interrupt in the guest, while the device continues to assert the level interrupt indicating a need for further servicing. When lazy IRQ masking is used on DisINTx- (non-PCI 2.3) hardware, the following sequence occurs: * Level IRQ assertion on device * IRQ marked disabled in kernel * Host interrupt handler exits without clearing the interrupt on the device * Eventfd is delivered to userspace * Guest processes IRQ and clears device interrupt * Device de-asserts INTx, then re-asserts INTx while the interrupt is masked * Newly asserted interrupt acknowledged by kernel VMM without being handled * Software mask removed by VFIO driver * Device INTx still asserted, host controller does not see new edge after EOI The behavior is now platform-dependent. Some platforms (amd64) will continue to spew IRQs for as long as the INTX line remains asserted, therefore the IRQ will be handled by the host as soon as the mask is dropped. Others (ppc64) will only send the one request, and if it is not handled no further interrupts will be sent. The former behavior theoretically leaves the system vulnerable to interrupt storm, and the latter will result in the device stalling after receiving exactly one interrupt in the guest. Work around this by disabling lazy IRQ masking for DisINTx- INTx devices. Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com> Link: https://lore.kernel.org/r/333803015.1744464.1758647073336.JavaMail.zimbra@raptorengineeringinc.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-09-21vfio/pci: drop nth_page() usage within SG entryDavid Hildenbrand2-4/+2
It's no longer required to use nth_page() when iterating pages within a single SG entry, so let's drop the nth_page() usage. Link: https://lkml.kernel.org/r/20250901150359.867252-33-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Yishai Hadas <yishaih@nvidia.com> Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-19vfio/pds: replace bitmap_free with vfreeZilin Guan1-1/+1
host_seq_bmp is allocated with vzalloc but is currently freed with bitmap_free, which uses kfree internally. This mismach prevents the resource from being released properly and may result in memory leaks or other issues. Fix this by freeing host_seq_bmp with vfree to match the vzalloc allocation. Fixes: f232836a9152 ("vfio/pds: Add support for dirty page tracking") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Link: https://lore.kernel.org/r/20250913153154.1028835-1-zilin@seu.edu.cn Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-09-19vfio: return -ENOTTY for unsupported device featureAlex Mastro1-1/+1
The two implementers of vfio_device_ops.device_feature, vfio_cdx_ioctl_feature and vfio_pci_core_ioctl_feature, return -ENOTTY in the fallthrough case when the feature is unsupported. For consistency, the base case, vfio_ioctl_device_feature, should do the same when device_feature == NULL, indicating an implementation has no feature extensions. Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20250908-vfio-enotty-v1-1-4428e1539e2e@fb.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-09-19hisi_acc_vfio_pci: Fix reference leak in hisi_acc_vfio_debug_initMiaoqian Lin1-1/+5
The debugfs_lookup() function returns a dentry with an increased reference count that must be released by calling dput(). Fixes: b398f91779b8 ("hisi_acc_vfio_pci: register debugfs for hisilicon migration driver") Cc: stable@vger.kernel.org Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20250901081809.2286649-1-linmq006@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-09-19vfio/platform: Mark reset drivers for removalAlex Williamson4-3/+9
While vfio-platform itself is on a reprieve from being removed[1], these reset drivers don't support any current hardware, are not being tested, and suggest a level of support that doesn't really exist. Mark them for removal to surface any remaining user such that we can potentially drop them and simplify the code if none appear. Link: https://lore.kernel.org/all/20250806170314.3768750-3-alex.williamson@redhat.com [1] Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20250825175807.3264083-3-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-09-19vfio/amba: Mark for removalAlex Williamson2-1/+6
vfio-amba has only been touched to keep up with the rest of the code base for the past 10 years. We have no basis to believe that it's currently tested or used. Mark it for deprecation. Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20250825175807.3264083-2-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-27vfio/cdx: update driver to build without CONFIG_GENERIC_MSI_IRQNipun Gupta2-1/+19
Define dummy MSI related APIs in VFIO CDX driver to build the driver without enabling CONFIG_GENERIC_MSI_IRQ flag. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508070308.opy5dIFX-lkp@intel.com/ Reviewed-by: Nikhil Agarwal <nikhil.agarwal@amd.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com> Link: https://lore.kernel.org/r/20250826043852.2206008-2-nipun.gupta@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-27vfio/nvgrace-gpu: fix grammatical errorMorduan Zang1-1/+1
The word "as" in the comment should be replaced with "is", and there is an extra space in the comment. Signed-off-by: Morduan Zang <zhangdandan@uniontech.com> Reviewed-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/54E1ED6C5A2682C8+20250814110358.285412-1-zhangdandan@uniontech.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-27vfio/pci: drop redundant conversion to boolXichao Zhao1-1/+1
The result of integer comparison already evaluates to bool. No need for explicit conversion. No functional impact. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Link: https://lore.kernel.org/r/20250818085201.510206-1-zhao.xichao@vivo.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-27vfio/fsl-mc: Mark for removalAlex Williamson2-1/+6
The driver has been orphaned for more than a year, mark it for removal. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20250806170314.3768750-2-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-25vfio/pci: print vfio-device syspath to fdinfoAlex Mastro1-0/+20
Print the PCI device syspath to a vfio device's fdinfo. This enables tools to query which device is associated with a given vfio device fd. This results in output like below: $ cat /proc/"$SOME_PID"/fdinfo/"$VFIO_FD" | grep vfio vfio-device-syspath: /sys/devices/pci0000:e0/0000:e0:01.1/0000:e1:00.0/0000:e2:05.0/0000:e8:00.0 Signed-off-by: Alex Mastro <amastro@fb.com> Reviewed-by: Amit Machhiwal <amachhiw@linux.ibm.com> Tested-by: Amit Machhiwal <amachhiw@linux.ibm.com> Link: https://lore.kernel.org/r/20250804-show-fdinfo-v4-1-96b14c5691b3@fb.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-07Merge tag 'vfio-v6.17-rc1-v2' of https://github.com/awilliam/linux-vfioLinus Torvalds14-20/+82
Pull VFIO updates from Alex Williamson: - Fix imbalance where the no-iommu/cdev device path skips too much on open, failing to increment a reference, but still decrements the reference on close. Add bounds checking to prevent such underflows (Jacob Pan) - Fill missing detach_ioas op for pds_vfio_pci, fixing probe failure when used with IOMMUFD (Brett Creeley) - Split SR-IOV VFs to separate dev_set, avoiding unnecessary serialization between VFs that appear on the same bus (Alex Williamson) - Fix a theoretical integer overflow is the mlx5-vfio-pci variant driver (Artem Sadovnikov) - Implement missing VF token checking support via vfio cdev/IOMMUFD interface (Jason Gunthorpe) - Update QAT vfio-pci variant driver to claim latest VF devices (Małgorzata Mielnik) - Add a cond_resched() call to avoid holding the CPU too long during DMA mapping operations (Keith Busch) * tag 'vfio-v6.17-rc1-v2' of https://github.com/awilliam/linux-vfio: vfio/type1: conditional rescheduling while pinning vfio/qat: add support for intel QAT 6xxx virtual functions vfio/qat: Remove myself from VFIO QAT PCI driver maintainers vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD vfio/mlx5: fix possible overflow in tracking max message size vfio/pci: Separate SR-IOV VF dev_set vfio/pds: Fix missing detach_ioas op vfio: Prevent open_count decrement to negative vfio: Fix unbalanced vfio_df_close call in no-iommu mode
2025-08-05vfio/type1: conditional rescheduling while pinningKeith Busch1-0/+7
A large DMA mapping request can loop through dma address pinning for many pages. In cases where THP can not be used, the repeated vmf_insert_pfn can be costly, so let the task reschedule as need to prevent CPU stalls. Failure to do so has potential harmful side effects, like increased memory pressure as unrelated rcu tasks are unable to make their reclaim callbacks and result in OOM conditions. rcu: INFO: rcu_sched self-detected stall on CPU rcu: 36-....: (20999 ticks this GP) idle=b01c/1/0x4000000000000000 softirq=35839/35839 fqs=3538 rcu: hardirqs softirqs csw/system rcu: number: 0 107 0 rcu: cputime: 50 0 10446 ==> 10556(ms) rcu: (t=21075 jiffies g=377761 q=204059 ncpus=384) ... <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? walk_system_ram_range+0x63/0x120 ? walk_system_ram_range+0x46/0x120 ? pgprot_writethrough+0x20/0x20 lookup_memtype+0x67/0xf0 track_pfn_insert+0x20/0x40 vmf_insert_pfn_prot+0x88/0x140 vfio_pci_mmap_huge_fault+0xf9/0x1b0 [vfio_pci_core] __do_fault+0x28/0x1b0 handle_mm_fault+0xef1/0x2560 fixup_user_fault+0xf5/0x270 vaddr_get_pfns+0x169/0x2f0 [vfio_iommu_type1] vfio_pin_pages_remote+0x162/0x8e0 [vfio_iommu_type1] vfio_iommu_type1_ioctl+0x1121/0x1810 [vfio_iommu_type1] ? futex_wake+0x1c1/0x260 x64_sys_call+0x234/0x17a0 do_syscall_64+0x63/0x130 ? exc_page_fault+0x63/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20250715184622.3561598-1-kbusch@meta.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-05vfio/qat: add support for intel QAT 6xxx virtual functionsMałgorzata Mielnik1-1/+3
Extend the qat_vfio_pci variant driver to support QAT 6xxx Virtual Functions (VFs). Add the relevant QAT 6xxx VF device IDs to the driver's probe table, enabling proper detection and initialization of these devices. Update the module description to reflect that the driver now supports all QAT generations. Signed-off-by: Małgorzata Mielnik <malgorzata.mielnik@intel.com> Signed-off-by: Suman Kumar Chakraborty <suman.kumar.chakraborty@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Link: https://lore.kernel.org/r/20250715081150.1244466-1-suman.kumar.chakraborty@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-05vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFDJason Gunthorpe9-11/+59
This was missed during the initial implementation. The VFIO PCI encodes the vf_token inside the device name when opening the device from the group FD, something like: "0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3" This is used to control access to a VF unless there is co-ordination with the owner of the PF. Since we no longer have a device name in the cdev path, pass the token directly through VFIO_DEVICE_BIND_IOMMUFD using an optional field indicated by VFIO_DEVICE_BIND_FLAG_TOKEN. Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-bdd8716e85fe+3978a-vfio_token_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-08-01Merge tag 'pci-v6.17-changes' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Allow built-in drivers, not just modular drivers, to use async initial probing (Lukas Wunner) - Support Immediate Readiness even on devices with no PM Capability (Sean Christopherson) - Consolidate definition of PCIE_RESET_CONFIG_WAIT_MS (100ms), the required delay between a reset and sending config requests to a device (Niklas Cassel) - Add pci_is_display() to check for "Display" base class and use it in ALSA hda, vfio, vga_switcheroo, vt-d (Mario Limonciello) - Allow 'isolated PCI functions' (multi-function devices without a function 0) for LoongArch, similar to s390 and jailhouse (Huacai Chen) Power control: - Add ability to enable optional slot clock for cases where the PCIe host controller and the slot are supplied by different clocks (Marek Vasut) PCIe native device hotplug: - Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by misinterpreting a config read failure after a device has been removed (Lukas Wunner) - Avoid creating a useless PCIe port service device for pciehp if the slot is handled by the ACPI hotplug driver (Lukas Wunner) - Ignore ACPI hotplug slots when calculating depth of pciehp hotplug ports (Lukas Wunner) Virtualization: - Save VF resizable BAR state and restore it after reset (Michał Winiarski) - Allow IOV resources (VF BARs) to be resized (Michał Winiarski) - Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size (Michał Winiarski) Endpoint framework: - Add RC-to-EP doorbell support using platform MSI controller, including a test case (Frank Li) - Allow BAR assignment via configfs so platforms have flexibility in determining BAR usage (Jerome Brunet) Native PCIe controller drivers: - Convert amazon,al-alpine-v[23]-pcie, apm,xgene-pcie, axis,artpec6-pcie, marvell,armada-3700-pcie, st,spear1340-pcie to DT schema format (Rob Herring) - Use dev_fwnode() instead of of_fwnode_handle() to remove OF dependency in altera (fixes an unused variable), designware-host, mediatek, mediatek-gen3, mobiveil, plda, xilinx, xilinx-dma, xilinx-nwl (Jiri Slaby, Arnd Bergmann) - Convert aardvark, altera, brcmstb, designware-host, iproc, mediatek, mediatek-gen3, mobiveil, plda, rcar-host, vmd, xilinx, xilinx-dma, xilinx-nwl from using pci_msi_create_irq_domain() to using msi_create_parent_irq_domain() instead; this makes the interrupt controller per-PCI device, allows dynamic allocation of vectors after initialization, and allows support of IMS (Nam Cao) APM X-Gene PCIe controller driver: - Rewrite MSI handling to MSI CPU affinity, drop useless CPU hotplug bits, use device-managed memory allocations, and clean things up (Marc Zyngier) - Probe xgene-msi as a standard platform driver rather than a subsys_initcall (Marc Zyngier) Broadcom STB PCIe controller driver: - Add optional DT 'num-lanes' property and if present, use it to override the Maximum Link Width advertised in Link Capabilities (Jim Quinlan) Cadence PCIe controller driver: - Use PCIe Message routing types from the PCI core rather than defining private ones (Hans Zhang) Freescale i.MX6 PCIe controller driver: - Add IMX8MQ_EP third 64-bit BAR in epc_features (Richard Zhu) - Add IMX8MM_EP and IMX8MP_EP fixed 256-byte BAR 4 in epc_features (Richard Zhu) - Configure LUT for MSI/IOMMU in Endpoint mode so Root Complex can trigger doorbel on Endpoint (Frank Li) - Remove apps_reset (LTSSM_EN) from imx_pcie_{assert,deassert}_core_reset(), which fixes a hotplug regression on i.MX8MM (Richard Zhu) - Delay Endpoint link start until configfs 'start' written (Richard Zhu) Intel VMD host bridge driver: - Add Intel Panther Lake (PTL)-H/P/U Vendor ID (George D Sworo) Qualcomm PCIe controller driver: - Add DT binding and driver support for SA8255p, which supports ECAM for Configuration Space access (Mayank Rana) - Update DT binding and driver to describe PHYs and per-Root Port resets in a Root Port stanza and deprecate describing them in the host bridge; this makes it possible to support multiple Root Ports in the future (Krishna Chaitanya Chundru) - Add Qualcomm QCS615 to SM8150 DT binding (Ziyue Zhang) - Add Qualcomm QCS8300 to SA8775p DT binding (Ziyue Zhang) - Drop TBU and ref clocks from Qualcomm SM8150 and SC8180x DT bindings (Konrad Dybcio) - Document 'link_down' reset in Qualcomm SA8775P DT binding (Ziyue Zhang) - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ (Niklas Cassel) Rockchip PCIe controller driver: - Drop unused PCIe Message routing and code definitions (Hans Zhang) - Remove several unused header includes (Hans Zhang) - Use standard PCIe config register definitions instead of rockchip-specific redefinitions (Geraldo Nascimento) - Set Target Link Speed to 5.0 GT/s before retraining so we have a chance to train at a higher speed (Geraldo Nascimento) Rockchip DesignWare PCIe controller driver: - Prevent race between link training and register update via DBI by inhibiting link training after hot reset and link down (Wilfred Mallawa) - Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ (Niklas Cassel) Sophgo PCIe controller driver: - Add DT binding and driver for Sophgo SG2044 PCIe controller driver in Root Complex mode (Inochi Amaoto) Synopsys DesignWare PCIe controller driver: - Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on Ports that support > 5.0 GT/s. Slower Ports still rely on the not-quite-correct PCIE_LINK_WAIT_SLEEP_MS 90ms default delay while waiting for the Link (Niklas Cassel)" * tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits) dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset dt-bindings: PCI: Remove 83xx-512x-pci.txt dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema dt-bindings: PCI: Convert apm,xgene-pcie to DT schema dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema dt-bindings: PCI: Convert st,spear1340-pcie to DT schema PCI: Move is_pciehp check out of pciehp_is_native() PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports selftests: pci_endpoint: Add doorbell test case misc: pci_endpoint_test: Add doorbell test case PCI: endpoint: pci-epf-test: Add doorbell test support PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller PCI: dwc: Add Sophgo SG2044 PCIe controller driver in Root Complex mode PCI: vmd: Switch to msi_create_parent_irq_domain() PCI: vmd: Convert to lock guards ...
2025-07-31Merge tag 'mm-stable-2025-07-30-15-25' of ↵Linus Torvalds1-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
2025-07-17vfio/pci: Use pci_is_display()Mario Limonciello1-2/+1
The inline pci_is_display() helper does the same thing. Use it. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Daniel Dadap <ddadap@nvidia.com> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://patch.msgid.link/20250717173812.3633478-3-superm1@kernel.org
2025-07-15vfio/mlx5: fix possible overflow in tracking max message sizeArtem Sadovnikov1-2/+2
MLX cap pg_track_log_max_msg_size consists of 5 bits, value of which is used as power of 2 for max_msg_size. This can lead to multiplication overflow between max_msg_size (u32) and integer constant, and afterwards incorrect value is being written to rq_size. Fix this issue by extending integer constant to u64 type. Found by Linux Verification Center (linuxtesting.org) with SVACE. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20250701144017.2410-2-a.sadovnikov@ispras.ru Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-07-11vfio/pci: Separate SR-IOV VF dev_setAlex Williamson1-1/+1
In the below noted Fixes commit we introduced a reflck mutex to allow better scaling between devices for open and close. The reflck was based on the hot reset granularity, device level for root bus devices which cannot support hot reset or bus/slot reset otherwise. Overlooked in this were SR-IOV VFs, where there's also no bus reset option, but the default for a non-root-bus, non-slot-based device is bus level reflck granularity. The reflck mutex has since become the dev_set mutex (via commit 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure")) and is our defacto serialization for various operations and ioctls. It still seems to be the case though that sets of vfio-pci devices really only need serialization relative to hot resets affecting the entire set, which is not relevant to SR-IOV VFs. As described in the Closes link below, this serialization contributes to startup latency when multiple VFs sharing the same "bus" are opened concurrently. Mark the device itself as the basis of the dev_set for SR-IOV VFs. Reported-by: Aaron Lewis <aaronlewis@google.com> Closes: https://lore.kernel.org/all/20250626180424.632628-1-aaronlewis@google.com Tested-by: Aaron Lewis <aaronlewis@google.com> Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release") Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250626225623.1180952-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-07-11vfio/pds: Fix missing detach_ioas opBrett Creeley1-0/+1
When CONFIG_IOMMUFD is enabled and a device is bound to the pds_vfio_pci driver, the following WARN_ON() trace is seen and probe fails: WARNING: CPU: 0 PID: 5040 at drivers/vfio/vfio_main.c:317 __vfio_register_dev+0x130/0x140 [vfio] <...> pds_vfio_pci 0000:08:00.1: probe with driver pds_vfio_pci failed with error -22 This is because the driver's vfio_device_ops.detach_ioas isn't set. Fix this by using the generic vfio_iommufd_physical_detach_ioas function. Fixes: 38fe3975b4c2 ("vfio/pds: Initial support for pds VFIO driver") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250702163744.69767-1-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-07-11vfio: Prevent open_count decrement to negativeJacob Pan1-1/+2
When vfio_df_close() is called with open_count=0, it triggers a warning in vfio_assert_device_open() but still decrements open_count to -1. This allows a subsequent open to incorrectly pass the open_count == 0 check, leading to unintended behavior, such as setting df->access_granted = true. For example, running an IOMMUFD compat no-IOMMU device with VFIO tests (https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c) results in a warning and a failed VFIO_GROUP_GET_DEVICE_FD ioctl on the first run, but the second run succeeds incorrectly. Add checks to avoid decrementing open_count below zero. Fixes: 05f37e1c03b6 ("vfio: Pass struct vfio_device_file * to vfio_device_open/close()") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> Link: https://lore.kernel.org/r/20250618234618.1910456-2-jacob.pan@linux.microsoft.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-07-11vfio: Fix unbalanced vfio_df_close call in no-iommu modeJacob Pan2-4/+7
For devices with no-iommu enabled in IOMMUFD VFIO compat mode, the group open path skips vfio_df_open(), leaving open_count at 0. This causes a warning in vfio_assert_device_open(device) when vfio_df_close() is called during group close. The correct behavior is to skip only the IOMMUFD bind in the device open path for no-iommu devices. Commit 6086efe73498 omitted vfio_df_open(), which was too broad. This patch restores the previous behavior, ensuring the vfio_df_open is called in the group open path. Fixes: 6086efe73498 ("vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()") Suggested-by: Alex Williamson <alex.williamson@redhat.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250618234618.1910456-1-jacob.pan@linux.microsoft.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-07-09mm: remove callers of pfn_t functionalityAlistair Popple1-3/+2
All PFN_* pfn_t flags have been removed. Therefore there is no longer a need for the pfn_t type and all uses can be replaced with normal pfns. Link: https://lkml.kernel.org/r/bbedfa576c9822f8032494efbe43544628698b1f.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Björn Töpel <bjorn@rivosinc.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09mm: remove remaining uses of PFN_DEVAlistair Popple1-4/+2
PFN_DEV was used by callers of dax_direct_access() to figure out if the returned PFN is associated with a page using pfn_t_has_page() or not. However all DAX PFNs now require an assoicated ZONE_DEVICE page so can assume a page exists. Other users of PFN_DEV were setting it before calling vmf_insert_mixed(). This is unnecessary as it is no longer checked, instead relying on pfn_valid() to determine if there is an associated page or not. Link: https://lkml.kernel.org/r/74b293aebc21b941090bc3e7aeafa91b57c821a5.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Björn Töpel <bjorn@rivosinc.com> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-20irqbypass: Require producers to pass in Linux IRQ number during registrationSean Christopherson1-2/+1
Pass in the Linux IRQ associated with an IRQ bypass producer instead of relying on the caller to set the field prior to registration, as there's no benefit to relying on callers to do the right thing. Take care to set producer->irq before __connect(), as KVM expects the IRQ to be valid as soon as a connection is possible. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20250516230734.2564775-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-20irqbypass: Take ownership of producer/consumer token trackingSean Christopherson1-6/+3
Move ownership of IRQ bypass token tracking into irqbypass.ko, and explicitly require callers to pass an eventfd_ctx structure instead of a completely opaque token. Relying on producers and consumers to set the token appropriately is error prone, and hiding the fact that the token must be an eventfd_ctx pointer (for all intents and purposes) unnecessarily obfuscates the code and makes it more brittle. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20250516230734.2564775-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-06-04Merge tag 'pci-v6.16-changes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Print the actual delay time in pci_bridge_wait_for_secondary_bus() instead of assuming it was 1000ms (Wilfred Mallawa) - Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices', which broke resume from system sleep on AMD platforms and has been fixed by other commits (Lukas Wunner) Resource management: - Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and unnecessary (Philipp Stanner) - Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and related flags since all uses have been removed (Philipp Stanner) - Rework devres 'request' functions so they are no longer 'hybrid', i.e., their behavior no longer depends on whether pcim_enable_device or pci_enable_device() was used, and remove related code (Philipp Stanner) - Warn (not BUG()) about failure to assign optional resources (Ilpo Järvinen) Error handling: - Log the DPC Error Source ID only when it's actually valid (when ERR_FATAL or ERR_NONFATAL was received from a downstream device) and decode into bus/device/function (Bjorn Helgaas) - Determine AER log level once and save it so all related messages use the same level (Karolina Stolarek) - Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable Errors (Karolina Stolarek) - Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs controls on interval and burst count, to avoid flooding logs and RCU stall warnings (Jon Pan-Doh) Power management: - Increment PM usage counter when probing reset methods so we don't try to read config space of a powered-off device (Alex Williamson) - Set all devices to D0 during enumeration to ensure ACPI opregion is connected via _REG (Mario Limonciello) Power control: - Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match the filename paths. Retain old deprecated symbols for compatibility, except for the pwrctrl slot driver (PCI_PWRCTRL_SLOT) (Johan Hovold) - When unregistering pwrctrl, cancel outstanding rescan work before cleaning up data structures to avoid use-after-free issues (Brian Norris) Bandwidth control: - Simplify link bandwidth controller by replacing the count of Link Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag (Ilpo Järvinen) - Update the Link Speed after retraining, since the Link Speed may have changed (Ilpo Järvinen) PCIe native device hotplug: - Ignore Presence Detect Changed caused by DPC. pciehp already ignores Link Down/Up events caused by DPC, but on slots using in-band presence detect, DPC causes a spurious Presence Detect Changed event (Lukas Wunner) - Ignore Link Down/Up caused by Secondary Bus Reset. On hotplug ports using in-band presence detect, the reset causes a Presence Detect Changed event, which mistakenly caused teardown and re-enumeration of the device. Drivers may need to annotate code that resets their device (Lukas Wunner) Virtualization: - Add an ACS quirk for Loongson Root Ports that don't advertise ACS but don't allow peer-to-peer transactions between Root Ports; the quirk allows each Root Port to be in a separate IOMMU group (Huacai Chen) Endpoint framework: - For fixed-size BARs, retain both the actual size and the possibly larger size allocated to accommodate iATU alignment requirements (Jerome Brunet) - Simplify ctrl/SPAD space allocation and avoid allocating more space than needed (Jerome Brunet) - Correct MSI-X PBA offset calculations for DesignWare and Cadence endpoint controllers (Niklas Cassel) - Align the return value (number of interrupts) encoding for pci_epc_get_msi()/pci_epc_ops::get_msi() and pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel) - Align the nr_irqs parameter encoding for pci_epc_set_msi()/pci_epc_ops::set_msi() and pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel) Common host controller library: - Convert pci-host-common to a library so platforms that don't need native host controller drivers don't need to include these helper functions (Manivannan Sadhasivam) Apple PCIe controller driver: - Extract ECAM bridge creation helper from pci_host_common_probe() to separate driver-specific things like MSI from PCI things (Marc Zyngier) - Dynamically allocate RID-to_SID bitmap to prepare for SoCs with varying capabilities (Marc Zyngier) - Skip ports disabled in DT when setting up ports (Janne Grunau) - Add t6020 compatible string (Alyssa Rosenzweig) - Add T602x PCIe support (Hector Martin) - Directly set/clear INTx mask bits because T602x dropped the accessors that could do this without locking (Marc Zyngier) - Move port PHY registers to their own reg items to accommodate T602x, which moves them around; retain default offsets for existing DTs that lack phy%d entries with the reg offsets (Hector Martin) - Stop polling for core refclk, which doesn't work on T602x and the bootloader has already done anyway (Hector Martin) - Use gpiod_set_value_cansleep() when asserting PERST# in probe because we're allowed to sleep there (Hector Martin) Cadence PCIe controller driver: - Drop a runtime PM 'put' to resolve a runtime atomic count underflow (Hans Zhang) - Make the cadence core buildable as a module (Kishon Vijay Abraham I) - Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by loadable drivers when they are removed (Siddharth Vadapalli) Freescale i.MX6 PCIe controller driver: - Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP (Richard Zhu) - Remove redundant dw_pcie_wait_for_link() from imx_pcie_start_link(); since the DWC core does this, imx6 only needs it when retraining for a faster link speed (Richard Zhu) - Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu) - Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in some cases, the controller can't exit 'L23 Ready' through Beacon or PERST# deassertion (Richard Zhu) - Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum: controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8 GT/s, causing timeouts in L1 (Richard Zhu) - Wait for i.MX95 PLL lock before enabling controller (Richard Zhu) - Save/restore i.MX95 LUT for suspend/resume (Richard Zhu) Mobiveil PCIe controller driver: - Return bool (not int) for link-up check in mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans Zhang) NVIDIA Tegra194 PCIe controller driver: - Create debugfs directory for 'aspm_state_cnt' only when CONFIG_PCIEASPM is enabled, since there are no other entries (Hans Zhang) Qualcomm PCIe controller driver: - Add OF support for parsing DT 'eq-presets-<N>gts' property for lane equalization presets (Krishna Chaitanya Chundru) - Read Maximum Link Width from the Link Capabilities register if DT lacks 'num-lanes' property (Krishna Chaitanya Chundru) - Add Physical Layer 64 GT/s Capability ID and register offsets for 8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya Chundru) - Add generic dwc support for configuring lane equalization presets (Krishna Chaitanya Chundru) - Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar) Renesas R-Car PCIe controller driver: - Describe endpoint BAR 4 as being fixed size (Jerome Brunet) - Document how to obtain R-Car V4H (r8a779g0) controller firmware (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Reorder rockchip_pci_core_rsts because reset_control_bulk_deassert() deasserts in reverse order, to fix a link training regression (Jensen Huang) - Mark RK3399 as being capable of raising INTx interrupts (Niklas Cassel) Rockchip DesignWare PCIe controller driver: - Check only PCIE_LINKUP, not LTSSM status, to determine whether the link is up (Shawn Lin) - Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s for Root Complex and Endpoint modes (Shawn Lin) - Hide the broken ATS Capability in rockchip_pcie_ep_init() instead of rockchip_pcie_ep_pre_init() so it stays hidden after PERST# resets non-sticky registers (Shawn Lin) - Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit() (Diederik de Haas) Synopsys DesignWare PCIe controller driver: - Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training more robust; this will not affect the intended link width if all lanes are functional (Wenbin Yao) - Return bool (not int) for link-up check in dw_pcie_ops.link_up() and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay, keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx, tegra194, uniphier, visconti (Hans Zhang) - Add debugfs support for exposing DWC device-specific PTM context (Manivannan Sadhasivam) TI J721E PCIe driver: - Make j721e buildable as a loadable and removable module (Siddharth Vadapalli) - Fix j721e host/endpoint dependencies that result in link failures in some configs (Arnd Bergmann) Device tree bindings: - Add qcom DT binding for 'global' interrupt (PCIe controller and link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p, sc7280, sc8180x sdm845, sm8150, sm8250, sm8350 (Manivannan Sadhasivam) - Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074, ipq8074-gen3, ipq6018 (Manivannan Sadhasivam) - Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang) - Correct indentation and style of examples in brcm,stb-pcie, cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie, microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm (Krzysztof Kozlowski) - Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and armada8k from text to schema DT bindings (Rob Herring) - Remove obsolete .txt DT bindings for content that has been moved to schemas (Rob Herring) - Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074 and IPQ9574 (Varadarajan Narayanan) - Convert v3,v360epc-pci from text to DT schema binding (Rob Herring) - Change microchip,pcie-host DT binding to be 'dma-noncoherent' since PolarFire may be configured that way (Conor Dooley) Miscellaneous: - Drop 'pci' suffix from intel_mid_pci.c filename to match similar files (Andy Shevchenko) - All platforms with PCI have an MMU, so add PCI Kconfig dependency on MMU to simplify build testing and avoid inadvertent build regressions (Arnd Bergmann) - Update Krzysztof Wilczyński's email address in MAINTAINERS (Krzysztof Wilczyński) - Update Manivannan Sadhasivam's email address in MAINTAINERS (Manivannan Sadhasivam)" * tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits) MAINTAINERS: Update Manivannan Sadhasivam email address PCI: j721e: Fix host/endpoint dependencies PCI: j721e: Add support to build as a loadable module PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup PCI: cadence: Add support to build pcie-cadence library as a kernel module MAINTAINERS: Update Krzysztof Wilczyński email address PCI: Remove unnecessary linesplit in __pci_setup_bridge() PCI: WARN (not BUG()) when we fail to assign optional resources PCI: Remove unused pci_printk() PCI: qcom: Replace PERST# sleep time with proper macro PCI: dw-rockchip: Replace PERST# sleep time with proper macro PCI: host-common: Convert to library for host controller drivers PCI/ERR: Remove misleading TODO regarding kernel panic PCI: cadence: Remove duplicate message code definitions PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding PCI: cadence-ep: Correct PBA offset in .set_msix() callback ...
2025-05-22vfio/type1: Fix error unwind in migration dirty bitmap allocationLi RongQing1-1/+1
When setting up dirty page tracking at the vfio IOMMU backend for device migration, if an error is encountered allocating a tracking bitmap, the unwind loop fails to free previously allocated tracking bitmaps. This occurs because the wrong loop index is used to generate the tracking object. This results in unintended memory usage for the life of the current DMA mappings where bitmaps were successfully allocated. Use the correct loop index to derive the tracking object for freeing during unwind. Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking") Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/r/20250521034647.2877-1-lirongqing@baidu.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-20vfio/mlx5: Enable the DMA link APILeon Romanovsky3-203/+143
Remove intermediate scatter-gather table completely and enable new DMA link API. Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/f71638d50c9c79a462f2e0423501b1de77617656.1747747694.git.leon@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-20vfio/mlx5: Rewrite create mkey flow to allow better code reuseLeon Romanovsky2-70/+91
Change the creation of mkey to be performed in multiple steps: data allocation, DMA setup and actual call to HW to create that mkey. In this new flow, the whole input to MKEY command is saved to eliminate the need to keep array of pointers for DMA addresses for receive list and in the future patches for send list too. In addition to memory size reduce and elimination of unnecessary data movements to set MKEY input, the code is prepared for future reuse. Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/d4ad0384fbd1e23a607cbbe9e5756748f3a761d9.1747747694.git.leon@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-20vfio/mlx5: Explicitly use number of pages instead of allocated lengthLeon Romanovsky3-41/+57
allocated_length is a multiple of page size and number of pages, so let's change the functions to accept number of pages. This improves code readability, simplifies buffer handling, and enables combining DMA send/receive operations, as will be introduced in the next patches. Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/76f39993d2ca0311b3bcfe56038a669d03926815.1747747694.git.leon@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-19hisi_acc_vfio_pci: update function return values.Longfang Liu1-13/+16
In this driver file, many functions call sub-functions and use ret to store the error code of the sub-functions. However, instead of directly returning ret to the caller, they use a converted error code, which prevents the end-user from clearly understanding the root cause of the error. Therefore, the code needs to be modified to directly return the error code from the sub-functions. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-7-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-19hisi_acc_vfio_pci: bugfix live migration function without VF device driverLongfang Liu1-7/+15
If the VF device driver is not loaded in the Guest OS and we attempt to perform device data migration, the address of the migrated data will be NULL. The live migration recovery operation on the destination side will access a null address value, which will cause access errors. Therefore, live migration of VMs without added VF device drivers does not require device data migration. In addition, when the queue address data obtained by the destination is empty, device queue recovery processing will not be performed. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-6-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-19hisi_acc_vfio_pci: bugfix the problem of uninstalling driverLongfang Liu1-0/+1
In a live migration scenario. If the number of VFs at the destination is greater than the source, the recovery operation will fail and qemu will not be able to complete the process and exit after shutting down the device FD. This will cause the driver to be unable to be unloaded normally due to abnormal reference counting of the live migration driver caused by the abnormal closing operation of fd. Therefore, make sure the migration file descriptor references are always released when the device is closed. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-5-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-19hisi_acc_vfio_pci: bugfix cache write-back issueLongfang Liu1-7/+7
At present, cache write-back is placed in the device data copy stage after stopping the device operation. Writing back to the cache at this stage will cause the data obtained by the cache to be written back to be empty. In order to ensure that the cache data is written back successfully, the data needs to be written back into the stop device stage. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-4-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-19hisi_acc_vfio_pci: add eq and aeq interruption restoreLongfang Liu1-0/+16
In order to ensure that the task packets of the accelerator device are not lost during the migration process, it is necessary to send an EQ and AEQ command to the device after the live migration is completed and to update the completion position of the task queue. Let the device recheck the completed tasks data and if there are uncollected packets, device resend a task completion interrupt to the software. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-3-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-19hisi_acc_vfio_pci: fix XQE dma address errorLongfang Liu2-8/+47
The dma addresses of EQE and AEQE are wrong after migration and results in guest kernel-mode encryption services failure. Comparing the definition of hardware registers, we found that there was an error when the data read from the register was combined into an address. Therefore, the address combination sequence needs to be corrected. Even after fixing the above problem, we still have an issue where the Guest from an old kernel can get migrated to new kernel and may result in wrong data. In order to ensure that the address is correct after migration, if an old magic number is detected, the dma address needs to be updated. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20250510081155.55840-2-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-19vfio/type1: Remove Fine Grained Superpages detectionJason Gunthorpe1-48/+1
VFIO is looking to enable an optimization where it can rely on a fast unmap operation that returned the size of a larger IOPTE. Due to how the test was constructed this would only ever succeed on the AMDv1 page table that supported an 8k contiguous size. Nothing else supports this. Alex says the performance win was fairly minor, so lets remove this code. Always use iommu_iova_to_phys() to extent contiguous pages. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v2-97fa1da8d983+412-vfio_fgsp_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>