summaryrefslogtreecommitdiffstats
path: root/drivers/iommu
AgeCommit message (Collapse)AuthorLines
2026-04-17Merge tag 'dma-mapping-7.1-2026-04-16' of ↵Linus Torvalds-8/+27
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - added support for batched cache sync, what improves performance of dma_map/unmap_sg() operations on ARM64 architecture (Barry Song) - introduced DMA_ATTR_CC_SHARED attribute for explicitly shared memory used in confidential computing (Jiri Pirko) - refactored spaghetti-like code in drivers/of/of_reserved_mem.c and its clients (Marek Szyprowski, shared branch with device-tree updates to avoid merge conflicts) - prepared Contiguous Memory Allocator related code for making dma-buf drivers modularized (Maxime Ripard) - added support for benchmarking dma_map_sg() calls to tools/dma utility (Qinxin Xia) * tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: (24 commits) dma-buf: heaps: system: document system_cc_shared heap dma-buf: heaps: system: add system_cc_shared heap for explicitly shared memory dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory mm: cma: Export cma_alloc(), cma_release() and cma_get_name() dma: contiguous: Export dev_get_cma_area() dma: contiguous: Make dma_contiguous_default_area static dma: contiguous: Make dev_get_cma_area() a proper function dma: contiguous: Turn heap registration logic around of: reserved_mem: rework fdt_init_reserved_mem_node() of: reserved_mem: clarify fdt_scan_reserved_mem*() functions of: reserved_mem: rearrange code a bit of: reserved_mem: replace CMA quirks by generic methods of: reserved_mem: switch to ops based OF_DECLARE() of: reserved_mem: use -ENODEV instead of -ENOENT of: reserved_mem: remove fdt node from the structure dma-mapping: fix false kernel-doc comment marker dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg dma-mapping: Separate DMA sync issuing and completion waiting arm64: Provide dcache_inval_poc_nosync helper arm64: Provide dcache_clean_poc_nosync helper ...
2026-04-16Merge tag 'for-linus-iommufd' of ↵Linus Torvalds-39/+18
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Several fixes: - Add missing static const - Correct type 1 emulation for VFIO_CHECK_EXTENSION when no-iommu is turned on - Fix selftest memory leak and syzkaller splat - Fix missed -EFAULT in fault reporting write() fops - Fix a race where map/unmap with the internal IOVA allocator can unmap things it should not" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix a race with concurrent allocation and unmap iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 format iommufd: Fix return value of iommufd_fault_fops_write() iommufd: update outdated comment for renamed iommufd_hw_pagetable_alloc() iommufd/selftest: Fix page leaks in mock_viommu_{init,destroy} iommufd: vfio compatibility extension check for noiommu mode iommufd: Constify struct dma_buf_attach_ops
2026-04-15Merge tag 'iommu-updates-v7.1' of ↵Linus Torvalds-783/+2023
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core: - Support for RISC-V IO-page-table format in generic iommupt code ARM-SMMU Updates: - Introduction of an "invalidation array" for SMMUv3, which enables future scalability work and optimisations for devices with a large number of SMMUv3 instances - Update the conditions under which the SMMUv3 driver works around hardware errata for invalidation on MMU-700 implementations - Fix broken command filtering for the host view of NVIDIA's "cmdqv" SMMUv3 extension - MMU-500 device-tree binding additions for Qualcomm Eliza & Hawi SoCs Intel VT-d: - Support for dirty tracking on domains attached to PASID - Removal of unnecessary read*()/write*() wrappers - Improvements to the invalidation paths AMD Vi: - Race-condition fixed in debugfs code - Make log buffer allocation NUMA aware RISC-V: - IO-TLB flushing improvements - Minor fixes" * tag 'iommu-updates-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (48 commits) iommu/vt-d: Restore IOMMU_CAP_CACHE_COHERENCY dt-bindings: arm-smmu: qcom: Add compatible for Hawi SoC iommu/amd: Invalidate IRT cache for DMA aliases iommu/riscv: Remove overflows on the invalidation path iommu/amd: Fix clone_alias() to use the original device's devid iommu/vt-d: Remove the remaining pages along the invalidation path iommu/vt-d: Pass size_order to qi_desc_piotlb() not npages iommu/vt-d: Split piotlb invalidation into range and all iommu/vt-d: Remove dmar_writel() and dmar_writeq() iommu/vt-d: Remove dmar_readl() and dmar_readq() iommufd/selftest: Test dirty tracking on PASID iommu/vt-d: Support dirty tracking on PASID iommu/vt-d: Rename device_set_dirty_tracking() and pass dmar_domain pointer iommu/vt-d: Block PASID attachment to nested domain with dirty tracking iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domain iommu/riscv: Fix signedness bug iommu/amd: Fix illegal cap/mmio access in IOMMU debugfs iommu/amd: Fix illegal device-id access in IOMMU debugfs iommu/tegra241-cmdqv: Update uAPI to clarify HYP_OWN requirement iommu/tegra241-cmdqv: Set supports_cmd op in tegra241_vcmdq_hw_init() ...
2026-04-15Merge tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds-3/+10
Pull drm updates from Dave Airlie: "Highlights: - new DRM RAS infrastructure using netlink - amdgpu: enable DC on CIK APUs, and more IP enablement, and more user queue work - xe: purgeable BO support, and new hw enablement - dma-buf : add revocable operations Full summary: mm: - two-pass MMU interval notifiers - add gpu active/reclaim per-node stat counters math: - provide __KERNEL_DIV_ROUND_CLOSEST() in UAPI - implement DIV_ROUND_CLOSEST() with __KERNEL_DIV_ROUND_CLOSEST() rust: - shared tag with driver-core: register macro and io infra - core: rework DMA coherent API - core: add interop::list to interop with C linked lists - core: add more num::Bounded operations - core: enable generic_arg_infer and add EMSGSIZE - workqueue: add ARef<T> support for work and delayed work - add GPU buddy allocator abstraction - add DRM shmem GEM helper abstraction - allow drm:::Device to dispatch work and delayed work items to driver private data - add dma_resv_lock helper and raw accessors core: - introduce DRM RAS infrastructure over netlink - add connector panel_type property - fourcc: add ARM interleaved 64k modifier - colorop: add destroy helper - suballoc: split into alloc and init helpers - mode: provide DRM_ARGB_GET*() macros for reading color components edid: - provide drm_output_color_Format dma-buf: - provide revoke mechanism for shared buffers - rename move_notify to invalidate_mappings - always enable move_notify - protect dma_fence_ops with RCU and improve locking - clean pages with helpers atomic: - allocate drm_private_state via callback - helper: use system_percpu_wq buddy: - make buddy allocator available to gpu level - add kernel-doc for buddy allocator - improve aligned allocation ttm: - fix fence signalling - improve tests and docs - improve handling of gfp_retry_mayfail - use per-node stat counters to track memory allocations - port pool to use list_lru - drop NUMA specific pools - make pool shrinker numa aware - track allocated pages per numa node coreboot: - cleanup coreboot framebuffer support sched: - fix race condition in drm_sched_fini pagemap: - enable THP support - pass pagemap_addr by reference gem-shmem: - Track page accessed/dirty status across mmap/vmap gpusvm: - reenable device to device migration - fix unbalanced unclock bridge: - anx7625: Support USB-C plus DT bindings - connector: Fix EDID detection - dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve others - fsl-ldb: Fix visual artifacts plus related DT property 'enable-termination-resistor' - imx8qxp-pixel-link: Improve bridge reference handling - lt9611: Support Port-B-only input plus DT bindings - tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up - Support TH1520 HDMI plus DT bindings - waveshare-dsi: Fix register and attach; Support 1..4 DSI lanes plus DT bindings - anx7625: Fix USB Type-C handling - cdns-mhdp8546-core: Handle HDCP state in bridge atomic_check - Support Lontium LT8713SX DP MST bridge plus DT bindings - analogix_dp: Use DP helpers for link training panel: - panel-jdi-lt070me05000: Use mipi-dsi multi functions - panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN N116BCL-EAK (C2); Support FriendlyELEC plus DT changes - panel-edp: Fix timings for BOE NV140WUM-N64 - ilitek-ili9882t: Allow GPIO calls to sleep - jadard: Support TAIGUAN XTI05101-01A - lxd: Support LXD M9189A plus DT bindings - mantix: Fix pixel clock; Clean up - motorola: Support Motorola Atrix 4G and Droid X2 plus DT bindings - novatek: Support Novatek/Tianma NT37700F plus DT bindings - simple: Support EDT ET057023UDBA plus DT bindings; Support Powertip PH800480T032-ZHC19 plus DT bindings; Support Waveshare 13.3" - novatek-nt36672a: Use mipi_dsi_*_multi() functions - panel-edp: Support BOE NV153WUM-N42, CMN N153JCA-ELK, CSW MNF307QS3-2 - support Himax HX83121A plus DT bindings - support JuTouch JT070TM041 plus DT bindings - support Samsung S6E8FC0 plus DT bindings - himax-hx83102c: support Samsung S6E8FC0 plus DT bindings; support backlight - ili9806e: support Rocktech RK050HR345-CT106A plus DT bindings - simple: support Tianma TM050RDH03 plus DT bindings amdgpu: - enable DC by default on CIK APUs - userq fence ioctl param size fixes - set panel_type to OLED for eDP - refactor DC i2c code - FAMS2 update - rework ttm handling to allow multiple engines - DC DCE 6.x cleanup - DC support for NUTMEG/TRAVIS DP bridge - DCN 4.2 support - GC12 idle power fix for compute - use struct drm_edid in non-DC code - enable NV12/P010 support on primary planes - support newer IP discovery tables - VCN/JPEG 5.0.2 support - GC/MES 12.1 updates - USERQ fixes - add DC idle state manager - eDP DSC seamless boot amdkfd: - GC 12.1 updates - non 4K page fixes xe: - basic Xe3p_LPG and NVL-P enabling patches - allow VM_BIND decompress support - add purgeable buffer object support - add xe_vm_get_property_ioctl - restrict multi-lrc to VCS/VECS engines - allow disabling VM overcommit in fault mode - dGPU memory optimizations - Workaround cleanups and simplification - Allow VFs VRAM quote changes using sysfs - convert GT stats to per-cpu counters - pagefault refactors - enable multi-queue on xe3p_xpc - disable DCC on PTL - make MMIO communication more robust - disable D3Cold for BMG on specific platforms - vfio: improve FLR sync for Xe VFIO i915/display: - C10/C20/LT PHY PLL divider verification - use trans push mechanism to generate PSR frame change on LNL+ - refactor DP DSC slice config - VGA decode refactoring - refactor DPT, gen2-4 overlay, masked field register macro helpers - refactor stolen memory allocation decisions - prepare for UHBR DP tunnels - refactor LT PHY PLL to use DPLL framework - implement register polling/waiting in display code - add shared stepping header between i915 and display i915: - fix potential overflow of shmem scatterlist length nouveau: - provide Z cull info to userspace - initial GA100 support - shutdown on PCI device shutdown nova-core: - harden GSP command queue - add support for large RPCs - simplify GSP sequencer and message handling - refactor falcon firmware handling - convert to new register macro - conver to new DMA coherent API - use checked arithmetic - add debugfs support for gsp-rm log buffers - fix aux device registration for multi-GPU msm: - CI: - Uprev mesa - Restore CI jobs for Qualcomm APQ8016 and APQ8096 devices - Core: - Switched to of_get_available_child_by_name() - DPU: - Fixes for DSC panels - Fixed brownout because of the frequency / OPP mismatch - Quad pipe preparation (not enabled yet) - Switched to virtual planes by default - Dropped VBIF_NRT support - Added support for Eliza platform - Reworked alpha handling - Switched to correct CWB definitions on Eliza - Dropped dummy INTF_0 on MSM8953 - Corrected INTFs related to DP-MST - DP: - Removed debug prints looking into PHY internals - DSI: - Fixes for DSC panels - RGB101010 support - Support for SC8280XP - Moved PHY bindings from display/ to phy/ - GPU: - Preemption support for x2-85 and a840 - IFPC support for a840 - SKU detection support for x2-85 and a840 - Expose AQE support (VK ray-pipeline) - Avoid locking in VM_BIND fence signaling path - Fix to avoid reclaim in GPU snapshot path - Disallow foreign mapping of _NO_SHARE BOs - HDMI: - Fixed infoframes programming - MDP5: - Dropped support for MSM8974v1 - Dropped now unused code for MSM8974 v1 and SDM660 / MSM8998 panthor: - add tracepoints for power and IRQs - fix fence handling - extend timestamp query with flags - support various sources for timestamp queries tyr: - fix names and model/versions rockchip: - vop2: use drm logging function - rk3576 displayport support - support CRTC background color atmel-hlcdc: - support sana5d65 LCD controller tilcdc: - use DT bindings schema - use managed DRM interfaces - support DRM_BRIDGE_ATTACH_NO_CONNECTOR verisilicon: - support DC8200 + DT bindings virtgpu: - support PRIME import with 3D enabled komeda: - fix integer overflow in AFBC checks mcde: - improve bridge handling gma500: - use drm client buffer for fbdev framebuffer amdxdna: - add sensors ioctls - provide NPU power estimate - support column utilization sensor - allow forcing DMA through IOMMU IOVA - support per-BO mem usage queries - refactor GEM implementation ivpu: - update boot API to v3.29.4 - limit per-user number of doorbells/contexts - perform engine reset on TDR error loongson: - replace custom code with drm_gem_ttm_dumb_map_offset() imx: - support planes behind the primary plane - fix bus-format selection vkms: - support CRTC background color v3d: - improve handling of struct v3d_stats komeda: - support Arm China Linlon D6 plus DT bindings imagination: - improve power-off sequence - support context-reset notification from firmware mediatek: - mtk_dsi: enable hs clock during pre-enable - Remove all conflicting aperture devices during probe - Add support for mt8167 display blocks" * tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernel: (1735 commits) drm/ttm/tests: Remove checks from ttm_pool_free_no_dma_alloc drm/ttm/tests: fix lru_count ASSERT drm/vram: remove DRM_VRAM_MM_FILE_OPERATIONS from docs drm/fb-helper: Fix a locking bug in an error path dma-fence: correct kernel-doc function parameter @flags ttm/pool: track allocated_pages per numa node. ttm/pool: make pool shrinker NUMA aware (v2) ttm/pool: drop numa specific pools ttm/pool: port to list_lru. (v2) drm/ttm: use gpu mm stats to track gpu memory allocations. (v4) mm: add gpu active/reclaim per-node stat counters (v2) gpu: nova-core: fix missing colon in SEC2 boot debug message gpu: nova-core: vbios: use from_le_bytes() for PCI ROM header parsing gpu: nova-core: bitfield: fix broken Default implementation gpu: nova-core: falcon: pad firmware DMA object size to required block alignment gpu: nova-core: gsp: fix undefined behavior in command queue code drm/shmem_helper: Make sure PMD entries get the writeable upgrade accel/ivpu: Trigger recovery on TDR with OS scheduling drm/msm: Use of_get_available_child_by_name() dt-bindings: display/msm: move DSI PHY bindings to phy/ subdir ...
2026-04-11iommufd: Fix a race with concurrent allocation and unmapSina Hassani-0/+10
iopt_unmap_iova_range() releases the lock on iova_rwsem inside the loop body when getting to the more expensive unmap operations. This is fine on its own, except the loop condition is based on the first area that matches the unmap address range. If a concurrent call to map picks an area that was unmapped in previous iterations, the loop mistakenly tries to unmap it. This is reproducible by having one userspace thread map buffers and pass them to another thread that unmaps them. The problem manifests as EBUSY errors with single page mappings. Fix this by advancing the start pointer after unmapping an area. This ensures each iteration only examines the IOVA range that remains mapped, which is guaranteed not to have overlaps. Cc: stable@vger.kernel.org Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping") Link: https://patch.msgid.link/r/CAAJpGJSR4r_ds1JOjmkqHtsBPyxu8GntoeW08Sk5RNQPmgi+tg@mail.gmail.com Signed-off-by: Sina Hassani <sina@openai.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-09Merge tag 'iommu-fixes-v7.0-rc7' of ↵Linus Torvalds-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull IOMMU fix from Will Deacon: - Fix regression introduced by the empty MMU gather fix in -rc7, where the ->iotlb_sync() callback can be elided incorrectly, resulting in boot failures (hangs), crashes and potential memory corruption. * tag 'iommu-fixes-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu: Ensure .iotlb_sync is called correctly
2026-04-09Merge branches 'fixes', 'arm/smmu/updates', 'arm/smmu/bindings', 'riscv', ↵Will Deacon-785/+2031
'intel/vt-d', 'amd/amd-vi' and 'core' into next
2026-04-09iommu: Ensure .iotlb_sync is called correctlyRobin Murphy-0/+6
Many drivers have no reason to use the iotlb_gather mechanism, but do still depend on .iotlb_sync being called to properly complete an unmap. Since the core code is now relying on the gather to detect when there is legitimately something to sync, it should also take care of encoding a successful unmap when the driver does not touch the gather itself. Fixes: 90c5def10bea ("iommu: Do not call drivers for empty gathers") Reported-by: Jon Hunter <jonathanh@nvidia.com> Closes: https://lore.kernel.org/r/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Will Deacon <will@kernel.org>
2026-04-09iommu/vt-d: Restore IOMMU_CAP_CACHE_COHERENCYAlex Williamson-0/+1
In removing IOMMU_CAP_DEFERRED_FLUSH, the below referenced commit was over-eager in removing the return, resulting in the test for IOMMU_CAP_CACHE_COHERENCY falling through to an irrelevant option. Restore dropped return. Fixes: 1c18a1212c77 ("iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domain") Signed-off-by: Alex Williamson <alex.williamson@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-04-07Merge v7.0-rc7 into drm-nextSimona Vetter-2/+2
Thomas Zimmermann needs 2f42c1a61616 ("drm/ast: dp501: Fix initialization of SCU2C") for drm-misc-next. Conflicts: - drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c Just between e927b36ae18b ("drm/amd/display: Fix NULL pointer dereference in dcn401_init_hw()") and it's cherry-pick that confused git. - drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c Deleted in 6b0a6116286e ("drm/amd/pm: Unify version check in SMUv11") but some cherry-picks confused git. Same for v12/v14. Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2026-04-02Merge tag 'iommu-fixes-v7.0-rc6' of ↵Linus Torvalds-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - IOMMU-PT related compile breakage in for AMD driver - IOTLB flushing behavior when unmapped region is larger than requested due to page-sizes - Fix IOTLB flush behavior with empty gathers * tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline iommupt: Fix short gather if the unmap goes into a large mapping iommu: Do not call drivers for empty gathers
2026-04-02iommu/amd: Invalidate IRT cache for DMA aliasesMagnus Kalland-5/+23
DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared between multiple device IDs. See commit 3c124435e8dd ("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more information on this. However, the AMD IOMMU driver currently invalidates IRTE cache entries on a per-device basis whenever an IRTE is updated, not for each alias. This approach leaves stale IRTE cache entries when an IRTE is cached under one DMA alias but later updated and invalidated through a different alias. In such cases, the original device ID is never invalidated, since it is programmed via aliasing. This incoherency bug has been observed when IRTEs are cached for one Non-Transparent Bridge (NTB) DMA alias, later updated via another. Fix this by invalidating the interrupt remapping table cache for all DMA aliases when updating an IRTE. Co-developed-by: Lars B. Kristiansen <larsk@dolphinics.com> Signed-off-by: Lars B. Kristiansen <larsk@dolphinics.com> Co-developed-by: Jonas Markussen <jonas@dolphinics.com> Signed-off-by: Jonas Markussen <jonas@dolphinics.com> Co-developed-by: Tore H. Larsen <torel@simula.no> Signed-off-by: Tore H. Larsen <torel@simula.no> Signed-off-by: Magnus Kalland <magnus@dolphinics.com> Link: https://lore.kernel.org/linux-iommu/9204da81-f821-4034-b8ad-501e43383b56@amd.com/ Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/riscv: Remove overflows on the invalidation pathJason Gunthorpe-5/+6
Since RISC-V supports a sign extended page table it should support a gather->end of ULONG_MAX, but if this happens it will infinite loop because of the overflow. Also avoid overflow computing the length by moving the +1 to the other side of the < Fixes: 488ffbf18171 ("iommu/riscv: Paging domain support") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/amd: Fix clone_alias() to use the original device's devidVasant Hegde-3/+4
Currently clone_alias() assumes first argument (pdev) is always the original device pointer. This function is called by pci_for_each_dma_alias() which based on topology decides to send original or alias device details in first argument. This meant that the source devid used to look up and copy the DTE may be incorrect, leading to wrong or stale DTE entries being propagated to alias device. Fix this by passing the original pdev as the opaque data argument to both the direct clone_alias() call and pci_for_each_dma_alias(). Inside clone_alias(), retrieve the original device from data and compute devid from it. Fixes: 3332364e4ebc ("iommu/amd: Support multiple PCI DMA aliases in device table") Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Remove the remaining pages along the invalidation pathJason Gunthorpe-26/+19
This was only being used to signal that a flush all should be used. Use mask/size_order >= 52 to signal this instead. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v1-f175e27af136+11647-iommupt_inv_vtd_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Pass size_order to qi_desc_piotlb() not npagesJason Gunthorpe-14/+9
It doesn't make sense for the caller to compute mask, throw it away and then have qi_desc_piotlb() compute it again. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v1-f175e27af136+11647-iommupt_inv_vtd_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Split piotlb invalidation into range and allJason Gunthorpe-49/+41
Currently these call chains are muddled up by using npages=-1, but only one caller has the possibility to do both options. Simplify qi_flush_piotlb() to qi_flush_piotlb_all() since all callers pass npages=-1. Split qi_batch_add_piotlb() into qi_batch_add_piotlb_all() and related helpers. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v1-f175e27af136+11647-iommupt_inv_vtd_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Remove dmar_writel() and dmar_writeq()Bjorn Helgaas-30/+27
dmar_writel() and dmar_writeq() do nothing other than expand to the generic writel() and writeq(), and the dmar_write*() wrappers are used inconsistently. Remove the dmar_write*() wrappers and use writel() and writeq() directly. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Link: https://lore.kernel.org/r/20260217214438.3395039-3-bhelgaas@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Remove dmar_readl() and dmar_readq()Bjorn Helgaas-48/+46
dmar_readl() and dmar_readq() do nothing other than expand to the generic readl() and readq(), and the dmar_read*() wrappers are used inconsistently. Remove the dmar_read*() wrappers and use readl() and readq() directly. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Link: https://lore.kernel.org/r/20260217214438.3395039-2-bhelgaas@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Support dirty tracking on PASIDZhenzhong Duan-8/+10
In order to support passthrough device with PASID capability in QEMU, e.g., DSA device, kernel needs to support attaching PASID to a domain. But attaching is not allowed if the domain is a second stage domain or nested domain with dirty tracking. The reason is kernel lacking support for dirty tracking on such domain attached to PASID. By adding dirty tracking on PASID, the check can be removed. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260330101108.12594-4-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Rename device_set_dirty_tracking() and pass dmar_domain pointerZhenzhong Duan-12/+9
device_set_dirty_tracking() sets dirty tracking on all devices attached to a domain, also on all PASIDs attached to same domain in subsequent patch. So rename it as domain_set_dirty_tracking() and pass dmar_domain pointer to better align to what it does. No functional changes intended. Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260330101108.12594-3-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Block PASID attachment to nested domain with dirty trackingZhenzhong Duan-1/+5
Kernel lacks dirty tracking support on nested domain attached to PASID, fails the attachment early if nesting parent domain is dirty tracking configured, otherwise dirty pages would be lost. Cc: stable@vger.kernel.org Fixes: 67f6f56b5912 ("iommu/vt-d: Add set_dev_pasid callback for nested domain") Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20260330101108.12594-2-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Fixes: 67f6f56b5912 ("iommu/vt-d: Add set_dev_pasid callback for nested domain") Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-01iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domainJason Gunthorpe-5/+12
iommupt always supports the semantics required for DMA-FQ, when drivers are converted to use it they automatically get support. Detect iommpt directly instead of using IOMMU_CAP_DEFERRED_FLUSH and remove IOMMU_CAP_DEFERRED_FLUSH from converted drivers. This will also enable DMA-FQ on RISC-V. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-31iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 formatPranjal Shrivastava-32/+0
syzbot found that allocating a mock domain with AMDV1 format could cause a WARN_ON because the selftest enabled DYNAMIC_TOP without providing the required driver_ops. The AMDV1 format in the selftest was a placeholder and was not actually used by any of the existing selftests. Instead of adding dummy driver_ops to satisfy the requirements of a format we don't currently test, remove the AMDV1 format option from the selftest. The MOCK_IOMMUPT_DEFAULT and MOCK_IOMMUPT_HUGE formats are unaffected as they use the amdv1_mock variant which does not enable DYNAMIC_TOP. Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op") Link: https://patch.msgid.link/r/20260330092609.2659235-1-praan@google.com Reported-by: syzbot+453eb7add07c3767adab@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69c1d50b.a70a0220.3cae05.0001.GAE@google.com/ Signed-off-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-31iommufd: Fix return value of iommufd_fault_fops_write()Zhenzhong Duan-2/+3
copy_from_user() may return number of bytes failed to copy, we should not pass over this number to user space to cheat that write() succeed. Instead, -EFAULT should be returned. Link: https://patch.msgid.link/r/20260330030755.12856-1-zhenzhong.duan@intel.com Cc: stable@vger.kernel.org Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-31BackMerge tag 'v7.0-rc6' into drm-nextDave Airlie-18/+51
Linux 7.0-rc6 Requested by a few people on irc to resolve conflicts in other tress. Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-03-27iommu/riscv: Fix signedness bugEthan Tidmore-2/+5
The function platform_irq_count() returns negative error codes and iommu->irqs_count is an unsigned integer, so the check (iommu->irqs_count <= 0) is always impossible. Make the return value of platform_irq_count() be assigned to ret, check for error, and then assign iommu->irqs_count to ret. Detected by Smatch: drivers/iommu/riscv/iommu-platform.c:119 riscv_iommu_platform_probe() warn: 'iommu->irqs_count' unsigned <= 0 Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com> Fixes: 5c0ebbd3c6c6 ("iommu/riscv: Add RISC-V IOMMU platform device driver") Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-27iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inlineSherry Yang-1/+1
After enabling CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL, following build failure is observed under GCC 14.2.1: In function 'amdv1pt_install_leaf_entry', inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:650:3, inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1, inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9, inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:657:10, inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1: ././include/linux/compiler_types.h:706:45: error: call to '__compiletime_assert_71' declared with attribute error: FIELD_PREP: value too large for the field 706 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ...... drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP' 220 | FIELD_PREP(AMDV1PT_FMT_OA, | ^~~~~~~~~~ In the path '__do_map_single_page()', level 0 always invokes 'pt_install_leaf_entry(&pts, map->oa, PAGE_SHIFT, …)'. At runtime that lands in the 'if (oasz_lg2 == isz_lg2)' arm of 'amdv1pt_install_leaf_entry()'; the contiguous-only 'else' block is unreachable for 4 KiB pages. With CONFIG_GCOV_KERNEL + CONFIG_GCOV_PROFILE_ALL, the extra instrumentation changes GCC's inlining so that the "dead" 'else' branch still gets instantiated. The compiler constant-folds the contiguous OA expression, runs the 'FIELD_PREP()' compile-time check, and produces: FIELD_PREP: value too large for the field gcov-enabled builds therefore fail even though the code path never executes. Fix this by marking amdv1pt_install_leaf_entry as __always_inline. Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op") Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sherry Yang <sherry.yang@oracle.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-27iommu/amd: Fix illegal cap/mmio access in IOMMU debugfsGuanghui Feng-23/+19
In the current AMD IOMMU debugfs, when multiple processes simultaneously access the IOMMU mmio/cap registers using the IOMMU debugfs, illegal access issues can occur in the following execution flow: 1. CPU1: Sets a valid access address using iommu_mmio/capability_write, and verifies the access address's validity in iommu_mmio/capability_show 2. CPU2: Sets an invalid address using iommu_mmio/capability_write 3. CPU1: accesses the IOMMU mmio/cap registers based on the invalid address, resulting in an illegal access. This patch modifies the execution process to first verify the address's validity and then access it based on the same address, ensuring correctness and robustness. Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-27iommu/amd: Fix illegal device-id access in IOMMU debugfsGuanghui Feng-9/+12
In the current AMD IOMMU debugFS, when multiple processes use the IOMMU debugFS process simultaneously, illegal access issues can occur in the following execution flow: 1. CPU1: Sets a valid sbdf via devid_write, then checks the sbdf's validity in execution flows such as devid_show, iommu_devtbl_show, and iommu_irqtbl_show. 2. CPU2: Sets an invalid sbdf via devid_write, at which point the sbdf value is -1. 3. CPU1: accesses the IOMMU device table, IRQ table, based on the invalid SBDF value of -1, resulting in illegal access. This is especially problematic in monitoring scripts, where multiple scripts may access debugFS simultaneously, and some scripts may unexpectedly set invalid values, which triggers illegal access in debugfs. This patch modifies the execution flow of devid_show, iommu_devtbl_show, and iommu_irqtbl_show to ensure that these processes determine the validity and access based on the same device-id, thus guaranteeing correctness and robustness. Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-27iommupt: Fix short gather if the unmap goes into a large mappingJason Gunthorpe-1/+1
unmap has the odd behavior that it can unmap more than requested if the ending point lands within the middle of a large or contiguous IOPTE. In this case the gather should flush everything unmapped which can be larger than what was requested to be unmapped. The gather was only flushing the range requested to be unmapped, not extending to the extra range, resulting in a short invalidation if the caller hits this special condition. This was found by the new invalidation/gather test I am adding in preparation for ARMv8. Claude deduced the root cause. As far as I remember nothing relies on unmapping a large entry, so this is likely not a triggerable bug. Cc: stable@vger.kernel.org Fixes: 7c53f4238aa8 ("iommupt: Add unmap_pages op") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-26Merge tag 'dma-mapping-7.0-2026-03-25' of ↵Linus Torvalds-4/+17
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "A set of fixes for DMA-mapping subsystem, which resolve false- positive warnings from KMSAN and DMA-API debug (Shigeru Yoshida and Leon Romanovsky) as well as a simple build fix (Miguel Ojeda)" * tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: add missing `inline` for `dma_free_attrs` mm/hmm: Indicate that HMM requires DMA coherency RDMA/umem: Tell DMA mapping that UMEM requires coherency iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attribute dma-direct: prevent SWIOTLB path when DMA_ATTR_REQUIRE_COHERENT is set dma-mapping: Introduce DMA require coherency attribute dma-mapping: Clarify valid conditions for CPU cache line overlap dma-mapping: handle DMA_ATTR_CPU_CACHE_CLEAN in trace output dma-debug: Allow multiple invocations of overlapping entries dma: swiotlb: add KMSAN annotations to swiotlb_bounce()
2026-03-25Merge tag 'dma-mapping-7.0-2026-03-25' into dma-mapping-for-nextMarek Szyprowski-4/+17
dma-mapping fixes for Linux 7.0 A set of fixes for DMA-mapping subsystem, which resolve false-positive warnings from KMSAN and DMA-API debug (Shigeru Yoshida and Leon Romanovsky) as well as a simple build fix (Miguel Ojeda). Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2026-03-25iommufd: update outdated comment for renamed iommufd_hw_pagetable_alloc()Kexin Sun-1/+1
The function iommufd_hw_pagetable_alloc() was renamed to iommufd_hwpt_paging_alloc() by commit 89db31635c87 ("iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable"). Update the stale reference in iommufd_device_auto_get_domain(). Link: https://patch.msgid.link/r/20260321105759.6832-1-kexinsun@smail.nju.edu.cn Assisted-by: unnamed:deepseek-v3.2 coccinelle Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-24iommu/tegra241-cmdqv: Set supports_cmd op in tegra241_vcmdq_hw_init()Nicolin Chen-3/+4
vintf->hyp_own is finalized in tegra241_vintf_hw_init(). On the other hand, tegra241_vcmdq_alloc_smmu_cmdq() is called via an init_structures callback, which is earlier than tegra241_vintf_hw_init(). This results in the supports_cmd op always being set to the guest function, although this doesn't break any functionality nor have some noticeable perf impact since non-invalidation commands are not issued in the perf sensitive context. Fix this by moving supports_cmd to tegra241_vcmdq_hw_init(). After this change, - For a guest kernel, this will be a status quo - For a host kernel, non-invalidation commands will be issued to VCMDQ(s) Fixes: a9d40285bdef ("iommu/tegra241-cmdqv: Limit CMDs for VCMDQs of a guest owned VINTF") Reported-by: Eric Auger <eric.auger@redhat.com> Reported-by: Shameer Kolothum <skolothumtho@nvidia.com> Closes: https://lore.kernel.org/qemu-devel/CH3PR12MB754836BEE54E39B30C7210C0AB44A@CH3PR12MB7548.namprd12.prod.outlook.com/ Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-24iommu/arm-smmu-v3: Update Arm errataRobin Murphy-4/+14
MMU-700 r1p1 has subsequently fixed some of the errata for which we've been applying the workarounds unconditionally, so we can now make those conditional. However, there have also been some more new cases identified where we must rely on range invalidation commands, and thus still nominally avoid DVM being inadvertently enabled. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-24iommu/arm-smmu-v3: Fix typos introduced by arm_smmu_invsNicolin Chen-4/+4
These are introduced by separate commits, so not submitting with a "Fixes" line, since they aren't critical. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-24iommu/arm-smmu-v3: Do not continue in __arm_smmu_domain_inv_range()Nicolin Chen-2/+2
The loop in the __arm_smmu_domain_inv_range() is a while loop, not a for loop. So, using "continue" is wrong that would fail to move the needle. Meanwhile, though the current command is skipped, the batch still has to go through arm_smmu_invs_end_batch() to be issued accordingly. Thus, use "break" to fix the issue. Fixes: 587bb3e56a2c ("iommu/arm-smmu-v3: Add arm_smmu_invs based arm_smmu_domain_inv_range()") Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-20iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attributeLeon Romanovsky-4/+17
Add support for the DMA_ATTR_REQUIRE_COHERENT attribute to the exported functions. This attribute indicates that the SWIOTLB path must not be used and that no sync operations should be performed. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260316-dma-debug-overlap-v3-6-1dde90a7f08b@nvidia.com
2026-03-19iommu/arm-smmu-v3: Perform per-domain invalidations using arm_smmu_invsNicolin Chen-195/+24
Replace the old invalidation functions with arm_smmu_domain_inv_range() in all the existing invalidation routines. And deprecate the old functions. The new arm_smmu_domain_inv_range() handles the CMDQ_MAX_TLBI_OPS as well, so drop it in the SVA function. Since arm_smmu_cmdq_batch_add_range() has only one caller now, and it must be given a valid size, add a WARN_ON_ONCE to catch any missed case. Also update the comments in arm_smmu_tlb_inv_context() to clarify things with the new invalidation functions. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-19iommu/arm-smmu-v3: Add arm_smmu_invs based arm_smmu_domain_inv_range()Nicolin Chen-13/+221
Each smmu_domain now has an arm_smmu_invs that specifies the invalidation steps to perform after any change the IOPTEs. This includes supports for basic ASID/VMID, the special case for nesting, and ATC invalidations. Introduce a new arm_smmu_domain_inv helper iterating smmu_domain->invs to convert the invalidation array to commands. Any invalidation request with no size specified means an entire flush over a range based one. Take advantage of the sorted array to compatible batch operations together to the same SMMU. For instance, ATC invaliations for multiple SIDs can be pushed as a batch. ATC invalidations must be completed before the driver disables ATS. Or the device is permitted to ignore any racing invalidation that would cause an SMMU timeout. The sequencing is done with a rwlock where holding the write side of the rwlock means that there are no outstanding ATC invalidations. If ATS is not used the rwlock is ignored, similar to the existing code. Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-19iommu/arm-smmu-v3: Populate smmu_domain->invs when attaching mastersNicolin Chen-1/+278
Update the invs array with the invalidations required by each domain type during attachment operations. Only an SVA domain or a paging domain will have an invs array: a. SVA domain will add an INV_TYPE_S1_ASID per SMMU and an INV_TYPE_ATS per SID b. Non-nesting-parent paging domain with no ATS-enabled master will add a single INV_TYPE_S1_ASID or INV_TYPE_S2_VMID per SMMU c. Non-nesting-parent paging domain with ATS-enabled master(s) will do (b) and add an INV_TYPE_ATS per SID d. Nesting-parent paging domain will add an INV_TYPE_S2_VMID followed by an INV_TYPE_S2_VMID_S1_CLEAR per vSMMU. For an ATS-enabled master, it will add an INV_TYPE_ATS_FULL per SID Note that case #d prepares for a future implementation of VMID allocation which requires a followup series for S2 domain sharing. So when a nesting parent domain is attached through a vSMMU instance using a nested domain. VMID will be allocated per vSMMU instance v.s. currectly per S2 domain. The per-domain invalidation is not needed until the domain is attached to a master (when it starts to possibly use TLB). This will make it possible to attach the domain to multiple SMMUs and avoid unnecessary invalidation overhead during teardown if no STEs/CDs refer to the domain. It also means that when the last device is detached, the old domain must flush its ASID or VMID, since any new iommu_unmap() call would not trigger invalidations given an empty domain->invs array. Introduce some arm_smmu_invs helper functions for building scratch arrays, preparing and installing old/new domain's invalidation arrays. Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-19iommu/arm-smmu-v3: Pre-allocate a per-master invalidation arrayNicolin Chen-4/+45
When a master is attached from an old domain to a new domain, it needs to build an invalidation array to delete and add the array entries from/onto the invalidation arrays of those two domains, passed via the to_merge and to_unref arguments into arm_smmu_invs_merge/unref() respectively. Since the master->num_streams might differ across masters, a memory would have to be allocated when building an to_merge/to_unref array which might fail with -ENOMEM. On the other hand, an attachment to arm_smmu_blocked_domain must not fail so it's the best to avoid any memory allocation in that path. Pre-allocate a fixed size invalidation array for every master. This array will be used as a scratch to fill dynamically when building a to_merge or to_unref invs array. Sort fwspec->ids in an ascending order to fit to the arm_smmu_invs_merge() function. Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-19iommu/arm-smmu-v3: Introduce a per-domain arm_smmu_invs arrayJason Gunthorpe-0/+502
Create a new data structure to hold an array of invalidations that need to be performed for the domain based on what masters are attached, to replace the single smmu pointer and linked list of masters in the current design. Each array entry holds one of the invalidation actions - S1_ASID, S2_VMID, ATS or their variant with information to feed invalidation commands to HW. It is structured so that multiple SMMUs can participate in the same array, removing one key limitation of the current system. To maximize performance, a sorted array is used as the data structure. It allows grouping SYNCs together to parallelize invalidations. For instance, it will group all the ATS entries after the ASID/VMID entry, so they will all be pushed to the PCI devices in parallel with one SYNC. To minimize the locking cost on the invalidation fast path (reader of the invalidation array), the array is managed with RCU. Provide a set of APIs to add/delete entries to/from an array, which cover cannot-fail attach cases, e.g. attaching to arm_smmu_blocked_domain. Also add kunit coverage for those APIs. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-19iommu/arm-smmu-v3: Add an inline arm_smmu_domain_free()Nicolin Chen-4/+10
There will be a bit more things to free than smmu_domain itself. So keep a simple inline function in the header to share aross files. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Balbir Singh <balbirs@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-19iommu/arm-smmu-v3: Explicitly set smmu_domain->stage for SVANicolin Chen-0/+5
Both the ARM_SMMU_DOMAIN_S1 case and the SVA case use ASID, requiring ASID based invalidation commands to flush the TLB. Define an ARM_SMMU_DOMAIN_SVA to make the SVA case clear to share the same path with the ARM_SMMU_DOMAIN_S1 case, which will be a part of the routine to build a new per-domain invalidation array. There is no function change. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Balbir Singh <balbirs@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-19iommu/arm-smmu-v3: Add a missing dma_wmb() for hitless STE updateNicolin Chen-0/+7
When writing a new (previously invalid) valid IOPTE to a page table, then installing the page table into an STE hitlesslessly (e.g. in S2TTB field), there is a window before an STE invalidation, where the page-table may be accessed by SMMU but the new IOPTE is still siting in the CPU cache. This could occur when we allocate an iommu_domain and immediately install it hitlessly, while there would be no dma_wmb() for the page table memory prior to the earliest point of HW reading the STE. Fix it by adding a dma_wmb() prior to updating the STE. Fixes: 56e1a4cc2588 ("iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry") Cc: stable@vger.kernel.org Reported-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/linux-iommu/aXdlnLLFUBwjT0V5@willie-the-truck/ Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-17iommufd: Report ATS not supported status via IOMMU_GET_HW_INFOShameer Kolothum-0/+4
If the IOMMU driver reports that ATS is not supported for a device, set the IOMMU_HW_CAP_PCI_ATS_NOT_SUPPORTED flag in the returned hardware capabilities. This uses a negative flag for UAPI compatibility. Existing userspace assumes ATS is supported if no flag is present. This also ensures that new userspace works correctly on both old and new kernels, where a zero value implies ATS support. When this flag is set, ATS cannot be used for the device. When it is clear, ATS may be enabled when an appropriate HWPT is attached. Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-17iommu: Add device ATS supported capabilityShameer Kolothum-0/+11
PCIe ATS may be disabled by platform firmware, root complex limitations, or kernel policy even when a device advertises the ATS capability in its PCI configuration space. Add a new IOMMU_CAP_PCI_ATS_SUPPORTED capability to allow IOMMU drivers to report the effective ATS decision for a device. When this capability is true for a device, ATS may be enabled for that device, but it does not imply that ATS is currently enabled. A subsequent patch will extend iommufd to expose the effective ATS status to userspace. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-17iommu/amd: Block identity domain when SNP enabledJoe Damato-1/+14
Previously, commit 8388f7df936b ("iommu/amd: Do not support IOMMU_DOMAIN_IDENTITY after SNP is enabled") prevented users from changing the IOMMU domain to identity if SNP was enabled. This resulted in an error when writing to sysfs: # echo "identity" > /sys/kernel/iommu_groups/50/type -bash: echo: write error: Cannot allocate memory However, commit 4402f2627d30 ("iommu/amd: Implement global identity domain") changed the flow of the code, skipping the SNP guard and allowing users to change the IOMMU domain to identity after a machine has booted. Once the user does that, they will probably try to bind and the device/driver will start to do DMA which will trigger errors: iommu ivhd3: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:43:00.0 pasid=0x00000 address=0x3737b01000 flags=0x0020] iommu ivhd3: AMD-Vi: Control Reg : 0xc22000142148d AMD-Vi: DTE[0]: 6000000000000003 AMD-Vi: DTE[1]: 0000000000000001 AMD-Vi: DTE[2]: 2000003088b3e013 AMD-Vi: DTE[3]: 0000000000000000 bnxt_en 0000:43:00.0 (unnamed net_device) (uninitialized): Error (timeout: 500015) msg {0x0 0x0} len:0 iommu ivhd3: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:43:00.0 pasid=0x00000 address=0x3737b01000 flags=0x0020] iommu ivhd3: AMD-Vi: Control Reg : 0xc22000142148d AMD-Vi: DTE[0]: 6000000000000003 AMD-Vi: DTE[1]: 0000000000000001 AMD-Vi: DTE[2]: 2000003088b3e013 AMD-Vi: DTE[3]: 0000000000000000 bnxt_en 0000:43:00.0: probe with driver bnxt_en failed with error -16 To prevent this from happening, create an attach wrapper for identity_domain_ops which returns EINVAL if amd_iommu_snp_en is true. With this commit applied: # echo "identity" > /sys/kernel/iommu_groups/62/type -bash: echo: write error: Invalid argument Fixes: 4402f2627d30 ("iommu/amd: Implement global identity domain") Signed-off-by: Joe Damato <joe@dama.to> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>