|
KMSAN reports an uninitialized value issue in dma_map_phys()[1]. This
is a false positive caused by the way the virtual address is handled
in kmsan_handle_dma(). Fix it by translating the physical address to
a virtual address using phys_to_virt().
[1]
BUG: KMSAN: uninit-value in dma_map_phys+0xdc5/0x1060
dma_map_phys+0xdc5/0x1060
dma_map_page_attrs+0xcf/0x130
e1000_xmit_frame+0x3c51/0x78f0
dev_hard_start_xmit+0x22f/0xa30
sch_direct_xmit+0x3b2/0xcf0
__dev_queue_xmit+0x3588/0x5e60
neigh_resolve_output+0x9c5/0xaf0
ip6_finish_output2+0x24e0/0x2d30
ip6_finish_output+0x903/0x10d0
ip6_output+0x331/0x600
mld_sendpack+0xb4a/0x1770
mld_ifc_work+0x1328/0x19b0
process_scheduled_works+0xb91/0x1d80
worker_thread+0xedf/0x1590
kthread+0xd5c/0xf00
ret_from_fork+0x1f5/0x4c0
ret_from_fork_asm+0x1a/0x30
Uninit was created at:
__kmalloc_cache_noprof+0x8f5/0x16b0
syslog_print+0x9a/0xef0
do_syslog+0x849/0xfe0
__x64_sys_syslog+0x97/0x100
x64_sys_call+0x3cf8/0x3e30
do_syscall_64+0xd9/0xfa0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Bytes 0-89 of 90 are uninitialized
Memory access of size 90 starts at ffff8880367ed000
CPU: 1 UID: 0 PID: 1552 Comm: kworker/1:2 Not tainted 6.17.0-next-20250929 #26 PREEMPT(none)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
Workqueue: mld mld_ifc_work
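A minimal sketch of the fix, assuming the converted three-argument form of
the handler; the per-page loop and the actual KMSAN check/unpoison calls
are elided:
  /* Rough sketch: derive a kernel virtual address from the physical
   * address instead of letting the physical address leak into the
   * VA-based shadow checks.
   */
  void kmsan_handle_dma(phys_addr_t phys, size_t size,
                        enum dma_data_direction dir)
  {
          void *addr = phys_to_virt(phys);

          /* check/unpoison 'addr' for 'size' bytes as the real helper does */
  }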
Fixes: 6eb1e769b2c1 ("kmsan: convert kmsan_handle_dma to use physical addresses")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20251002051024.3096061-1-syoshida@redhat.com
|
|
kmsan_handle_dma_sg() contains a call to kmsan_handle_dma() that was
missed during the conversion to physical addresses. Update that caller
too and fix the following compilation error:
mm/kmsan/hooks.c:372:6: error: too many arguments to function call, expected 3, have 4
371 | kmsan_handle_dma(sg_page(item), item->offset, item->length,
| ~~~~~~~~~~~~~~~~
372 | dir);
| ^~~
mm/kmsan/hooks.c:362:19: note: 'kmsan_handle_dma' declared here
362 | EXPORT_SYMBOL_GPL(kmsan_handle_dma);
Fixes: 6eb1e769b2c1 ("kmsan: convert kmsan_handle_dma to use physical addresses")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509170638.AMGNCMEE-lkp@intel.com/
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/4b2d7d0175b30177733bbbd42bf979d77eb73c29.1758090947.git.leon@kernel.org
|
|
When a peer-to-peer transaction traverses the host bridge, the IOMMU
mapping needs the IOMMU_MMIO flag, together with skipping the CPU cache
sync.
The latter was handled by the DMA_ATTR_SKIP_CPU_SYNC flag provided by
the caller, but the IOMMU flag was missed due to the assumption that
such memory can be treated as regular memory.
Reuse the newly introduced DMA attribute to properly take the MMIO path.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/998251caf3f9d1a3f6f8205f1f494c707fb4d8fa.1757423202.git.leonro@nvidia.com
|
|
Convert HMM DMA operations from the legacy page-based API to the new
physical address-based dma_map_phys() and dma_unmap_phys() functions.
This demonstrates the preferred approach for new code that should use
physical addresses directly rather than page+offset parameters.
The change replaces dma_map_page() and dma_unmap_page() calls with
dma_map_phys() and dma_unmap_phys() respectively, using the physical
address that was already available in the code. This eliminates the
redundant page-to-physical address conversion and aligns with the
DMA subsystem's move toward physical address-centric interfaces.
This serves as an example of how new code should be written to leverage
the more efficient physical address API, which provides cleaner interfaces
for drivers that already have access to physical addresses.
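A rough before/after sketch of the call-site change; the variable names
(dev, page, offset, paddr, size, dir) are placeholders rather than the
exact HMM identifiers:
  /* Before: page-based mapping, even though paddr was already known */
  dma_addr = dma_map_page(dev, page, offset, size, dir);

  /* After: map the physical address directly */
  dma_addr = dma_map_phys(dev, paddr, size, dir, 0);
  if (dma_mapping_error(dev, dma_addr))
          goto error;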
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/d45207f195b8f77d23cc2d571c83197328a86b04.1757423202.git.leonro@nvidia.com
|
|
Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
that operate directly on physical addresses instead of page+offset
parameters. This provides a more efficient interface for drivers that
already have physical addresses available.
The new functions are implemented as the primary mapping layer, with
the existing dma_map_page_attrs()/dma_map_resource() and
dma_unmap_page_attrs()/dma_unmap_resource() functions converted to simple
wrappers around the phys-based implementations.
In the case of dma_map_page_attrs(), the struct page is converted to a
physical address with the page_to_phys() function, while
dma_map_resource() passes the physical address through as-is and adds
the DMA_ATTR_MMIO attribute.
The old page-based API is preserved in mapping.c to ensure that existing
code won't be affected by the change of EXPORT_SYMBOL to the
EXPORT_SYMBOL_GPL variant for dma_*map_phys().
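A minimal sketch of the wrapper arrangement, assuming dma_map_phys()
takes (dev, phys, size, dir, attrs); the real wrappers keep their
existing checks:
  dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
                                size_t offset, size_t size,
                                enum dma_data_direction dir,
                                unsigned long attrs)
  {
          return dma_map_phys(dev, page_to_phys(page) + offset, size, dir, attrs);
  }

  dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
                              size_t size, enum dma_data_direction dir,
                              unsigned long attrs)
  {
          return dma_map_phys(dev, phys_addr, size, dir, attrs | DMA_ATTR_MMIO);
  }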
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/54cc52af91777906bbe4a386113437ba0bcfba9c.1757423202.git.leonro@nvidia.com
|
|
The generic dma_direct_map_resource() is going to be removed in the
next patch, so simply open-code it in the Xen driver.
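As a rough sketch, the open-coded equivalent boils down to a
reachability check; the wrapper name below is hypothetical and the
actual Xen code may carry extra handling:
  static dma_addr_t xen_map_resource(struct device *dev, phys_addr_t paddr,
                                     size_t size)
  {
          dma_addr_t dma_addr = paddr;

          if (unlikely(!dma_capable(dev, dma_addr, size, false)))
                  return DMA_MAPPING_ERROR;

          return dma_addr;
  }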
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/e9c66a92e818f416875441b6711963f9782dbbeb.1757423202.git.leonro@nvidia.com
|
|
Make dma_map_page_attrs() and dma_unmap_page_attrs() respect
DMA_ATTR_MMIO.
DMA_ATTR_MMIO makes the functions behave the same as
dma_(un)map_resource():
- No swiotlb is possible
- Legacy dma_ops arches use ops->map_resource()
- No kmsan
- No arch_dma_map_phys_direct()
The prior patches have made the internal functions called here
support DMA_ATTR_MMIO.
This is also preparation for turning dma_map_resource() into an inline
calling dma_map_phys(DMA_ATTR_MMIO) to consolidate the flows.
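A rough sketch of the branching this adds on the dma_ops path; the
surrounding function body is elided and the exact layout may differ:
  if (attrs & DMA_ATTR_MMIO) {
          /* no swiotlb, no kmsan, no arch_dma_map_phys_direct() */
          addr = ops->map_resource(dev, phys, size, dir, attrs);
  } else {
          addr = ops->map_page(dev, pfn_to_page(PHYS_PFN(phys)),
                               offset_in_page(phys), size, dir, attrs);
          kmsan_handle_dma(phys, size, dir);
  }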
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/3660e2c78ea409d6c483a215858fb3af52cd0ed3.1757423202.git.leonro@nvidia.com
|
|
Convert the KMSAN DMA handling function from page-based to physical
address-based interface.
The refactoring changes the kmsan_handle_dma() parameters from
(struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
size_t size). The existing semantics, where callers are expected to
provide only kmap memory, are preserved.
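The prototype change, sketched with the direction argument spelled out
(it is unchanged and therefore omitted from the parameter list above):
  /* Before */
  void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
                        enum dma_data_direction dir);

  /* After */
  void kmsan_handle_dma(phys_addr_t phys, size_t size,
                        enum dma_data_direction dir);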
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/3557cbaf66e935bc794f37d2b891ef75cbf2c80c.1757423202.git.leonro@nvidia.com
|
|
Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.
The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).
Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().
The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks
to allow integration with dma_direct_map_resource(), and
dma_direct_map_phys() is extended to support the MMIO path as well.
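A rough sketch of the renamed helper; the regular kmap path
(phys_to_dma(), swiotlb bouncing, cache maintenance) is elided and only
the new MMIO branch is shown:
  static inline dma_addr_t dma_direct_map_phys(struct device *dev,
                                               phys_addr_t phys, size_t size,
                                               enum dma_data_direction dir,
                                               unsigned long attrs)
  {
          if (attrs & DMA_ATTR_MMIO) {
                  /* MMIO: no swiotlb bounce, no cache maintenance */
                  if (unlikely(!dma_capable(dev, phys, size, false)))
                          return DMA_MAPPING_ERROR;
                  return phys;
          }

          /* kmap memory path continues as before */
          return DMA_MAPPING_ERROR; /* placeholder for the elided path */
  }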
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/bb15a22f76dc2e26683333ff54e789606cfbfcf0.1757423202.git.leonro@nvidia.com
|
|
Make iommu_dma_map_phys() and iommu_dma_unmap_phys() respect
DMA_ATTR_MMIO.
DMA_ATTR_MMIO makes the functions behave the same as
iommu_dma_(un)map_resource():
- No swiotlb is possible
- No cache flushing is done (ATTR_MMIO should not be cached memory)
- prot for iommu_map() has IOMMU_MMIO not IOMMU_CACHE
This is preparation for replacing iommu_dma_map_resource() callers
with iommu_dma_map_phys(DMA_ATTR_MMIO) and removing
iommu_dma_(un)map_resource().
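A rough sketch of the resulting decisions inside the map path;
dma_info_to_prot() is the existing dma-iommu.c helper and the exact
placement of the DMA_ATTR_MMIO handling may differ:
  int prot = dma_info_to_prot(dir, coherent, attrs);

  if (attrs & DMA_ATTR_MMIO) {
          prot &= ~IOMMU_CACHE;
          prot |= IOMMU_MMIO;
          /* also: no swiotlb bounce, no arch_sync_dma_for_device() */
  }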
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/acc255bee358fec9c7da6b2a5904ee50abcd09f1.1757423202.git.leonro@nvidia.com
|
|
Rename the IOMMU DMA mapping functions to better reflect their actual
calling convention. The functions iommu_dma_map_page() and
iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and
iommu_dma_unmap_phys() respectively, as they already operate on physical
addresses rather than page structures.
The calling convention changes from accepting (struct page *page,
unsigned long offset) to (phys_addr_t phys), which eliminates the need
for page-to-physical address conversion within the functions. This
renaming prepares for the broader DMA API conversion from page-based
to physical address-based mapping throughout the kernel.
All callers are updated to pass physical addresses directly, including
dma_map_page_attrs(), scatterlist mapping functions, and DMA page
allocation helpers. The change simplifies the code by removing the
page_to_phys() + offset calculation that was previously done inside
the IOMMU functions.
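A sketch of the call-site shape after the rename; argument names are
illustrative:
  /* Before */
  dma_addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);

  /* After */
  dma_addr = iommu_dma_map_phys(dev, page_to_phys(page) + offset,
                                size, dir, attrs);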
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/ed172f95f8f57782beae04f782813366894e98df.1757423202.git.leonro@nvidia.com
|
|
As a preparation for following map_page -> map_phys API conversion,
let's rename trace_dma_*map_page() to be trace_dma_*map_phys().
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/c0c02d7d8bd4a148072d283353ba227516a76682.1757423202.git.leonro@nvidia.com
|
|
Convert the DMA debug infrastructure from page-based to physical address-based
mapping as a preparation to rely on physical address for DMA mapping routines.
The refactoring renames debug_dma_map_page() to debug_dma_map_phys() and
changes its signature to accept a phys_addr_t parameter instead of struct page
and offset. Similarly, debug_dma_unmap_page() becomes debug_dma_unmap_phys().
A new dma_debug_phy type is introduced to distinguish physical address mappings
from other debug entry types. All callers throughout the codebase are updated
to pass physical addresses directly, eliminating the need for page-to-physical
conversion in the debug layer.
This refactoring eliminates the need to convert between page pointers and
physical addresses in the debug layer, making the code more efficient and
consistent with the DMA mapping API's physical address focus.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
[mszyprow: added a fixup]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/56d1a6769b68dfcbf8b26a75a7329aeb8e3c3b6a.1757423202.git.leonro@nvidia.com
Link: https://lore.kernel.org/all/20250910052618.GH341237@unreal/
|
|
This will replace the hacky use of DMA_ATTR_SKIP_CPU_SYNC to avoid
touching possibly non-KVA MMIO memory.
Also correct the caching attribute for the IOMMU: MMIO memory should
not be cacheable inside the IOMMU mapping, or it can possibly create
system problems. Set IOMMU_MMIO for DMA_ATTR_MMIO.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/17ba63991aeaf8a80d5aca9ba5d028f1daa58f62.1757423202.git.leonro@nvidia.com
|
|
This patch introduces the DMA_ATTR_MMIO attribute to mark DMA buffers
that reside in memory-mapped I/O (MMIO) regions, such as device BARs
exposed through the host bridge, which are accessible for peer-to-peer
(P2P) DMA.
This attribute is especially useful for exporting device memory to other
devices for DMA without CPU involvement, and avoids unnecessary or
potentially detrimental CPU cache maintenance calls.
DMA_ATTR_MMIO is intended to provide dma_map_resource() functionality
without requiring callers to invoke a special function and branch when
processing generic containers such as bio_vec.
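A rough usage sketch: bar_phys and len stand in for a device BAR range
the caller already knows about; the point is that the generic
phys-based mapping call can take the resource path via the attribute:
  dma_addr_t addr = dma_map_phys(dev, bar_phys, len, DMA_BIDIRECTIONAL,
                                 DMA_ATTR_MMIO);
  if (dma_mapping_error(dev, addr))
          return -EIO;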
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/6f058ec395c5348014860dbc2eed348c17975843.1757423202.git.leonro@nvidia.com
|
|
As discussed in [1], there is no need to enforce the DMA mapping check
on noncoherent allocations; a simple test on the returned CPU address
is good enough.
Add a new pair of debug helpers and use them for noncoherent alloc/free
to fix this issue.
Fixes: efa70f2fdc84 ("dma-mapping: add a new dma_alloc_pages API")
Link: https://lore.kernel.org/all/ff6c1fe6-820f-4e58-8395-df06aa91706c@oss.qualcomm.com # 1
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250828-dma-debug-fix-noncoherent-dma-check-v1-1-76e9be0dd7fc@oss.qualcomm.com
|
|
When CONFIG_DMA_DIRECT_REMAP is enabled, atomic pool pages are
remapped via dma_common_contiguous_remap() using the supplied
pgprot. Currently, the mapping uses
pgprot_dmacoherent(PAGE_KERNEL), which leaves the memory encrypted
on systems with memory encryption enabled (e.g., ARM CCA Realms).
This can cause the DMA layer to fail or crash when accessing the
memory, as the underlying physical pages are not configured as
expected.
Fix this by requesting a decrypted mapping in the vmap() call:
pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL))
This ensures that atomic pool memory is consistently mapped
unencrypted.
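A rough sketch of the changed remap call in the atomic pool setup path;
the surrounding atomic_pool_expand() context is approximated and may
differ slightly:
  addr = dma_common_contiguous_remap(page, pool_size,
                  pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL)),
                  __builtin_return_address(0));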
Cc: stable@vger.kernel.org
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250811181759.998805-1-sdonthineni@nvidia.com
|
|
Restructure the call site for dma_contiguous_early_fixup() to
where the reserved_mem nodes are being parsed from the DT so that
dma_mmu_remap[] is populated before dma_contiguous_remap() is called.
Fixes: 8a6e02d0c00e ("of: reserved_mem: Restructure how the reserved memory regions are processed")
Signed-off-by: Oreoluwa Babatunde <oreoluwa.babatunde@oss.qualcomm.com>
Tested-by: William Zhang <william.zhang@broadcom.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250806172421.2748302-1-oreoluwa.babatunde@oss.qualcomm.com
|
|
Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT")
made GFP_NOWAIT implicitly include __GFP_NOWARN.
Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT
(e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean
up these redundant flags across subsystems.
No functional changes.
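The cleanup pattern, sketched on a generic allocation call:
  /* Before: __GFP_NOWARN is now implied by GFP_NOWAIT */
  ptr = kmalloc(size, GFP_NOWAIT | __GFP_NOWARN);

  /* After */
  ptr = kmalloc(size, GFP_NOWAIT);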
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250805023222.332920-1-rongqianfeng@vivo.com
|
|
Convert a goto-based loop to a while() loop. To allow the simplification,
return early when allocation from CMA is successful. As a bonus, this early
return avoids a repeated dma_coherent_ok() check.
No functional change.
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250710083829.1853466-1-ptesarik@suse.com
|
|
|
|
Probe and display L3 Cache topology
Add ability to average an added counter
(useful for pre-integrated "counters", such as Watts)
Break the limit of 64 built-in counters.
Assorted bug fixes and minor feature tweaks
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/
may be readable by all, but
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz
may be readable only by root.
Non-root turbostat users see complaints in this scenario.
Fail probe of the interface if we can't read current_freq_khz.
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Original-patch-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Use a macro for PER_THREAD_PARAMS to make adding one later clearer.
No functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Together with the RAPL MSRs, there are more MSRs gone on DMR, including
PLR (Perf Limit Reasons), and IRTL (Package cstate Interrupt Response
Time Limit) MSRs. The configurable TDP info should also be retrieved
from the TPMI-based Intel Speed Select Technology feature.
Remove access to these MSRs for DMR. Improve the DMR platform
feature table to make it more readable at the same time.
Fixes: 83075bd59de2 ("tools/power turbostat: Add initial support for DMR")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
External attributes with format "raw" are not printed in summary lines
for nodes/packages (or with option -S). The new format "average"
behaves like "raw" but also adds the summary data.
Signed-off-by: Michael Hebenstreit <michael.hebenstreit@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
pkg_base[pkg_id] is a simple array of structure pointers,
let the compiler treat it that way.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
We have outgrown the ability to use a 64-bit memory location
to inventory every possible built-in counter.
Leverage the CPU_SET(3) macros to break this barrier.
Also, break the Joules & Watts counters into two,
since we can no longer 'or' them together...
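A rough illustration of the approach, reusing the glibc CPU_SET(3)
dynamic bit-set macros for counters (requires _GNU_SOURCE and
<sched.h>); num_counters and counter_id are placeholders, not turbostat
identifiers:
  cpu_set_t *present = CPU_ALLOC(num_counters);
  size_t setsize = CPU_ALLOC_SIZE(num_counters);

  CPU_ZERO_S(setsize, present);
  CPU_SET_S(counter_id, setsize, present);      /* no 64-counter ceiling */
  if (CPU_ISSET_S(counter_id, setsize, present))
          printf("counter %d enabled\n", counter_id);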
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Explain the meaning of the Totl%C0, Any%C0, GFX%C0, CPUGFX% columns.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Similar to delta_cpu(), delta_platform() is called in the turbostat
main loop. This ensures accurate SysWatt readings in periodic
monitoring mode:
$ sudo turbostat -S -q --show power -i 1
CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt
60 61 6.21 1.13 0.16 0.00 0.00 0.00 13.07
58 61 6.00 1.07 0.18 0.00 0.00 0.00 12.75
58 61 5.74 1.05 0.17 0.00 0.00 0.00 12.22
58 60 6.27 1.11 0.24 0.00 0.00 0.00 13.55
However, delta_platform() is missing for the forked-program case and
causes bogus SysWatt reporting:
$ sudo turbostat -S -q --show power sleep 1
1.004736 sec
CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt
57 58 6.05 1.02 0.16 0.00 0.00 0.00 0.03
Add the missing delta_platform() call for the forked-program case.
Fixes: e5f687b89bc2 ("tools/power turbostat: Add RAPL psys as a built-in counter")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Kernels configured with CONFIG_MULTIUSER=n have no cap_get_proc().
Check for ENOSYS to recognize this case, and continue on to
attempt to access the requested MSRs (such as temperature).
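A rough sketch of the check (libcap's cap_get_proc() returns NULL and
sets errno on failure); the surrounding turbostat error handling is
approximated:
  cap_t caps = cap_get_proc();

  if (caps == NULL) {
          if (errno == ENOSYS)
                  return;         /* CONFIG_MULTIUSER=n: try the MSRs anyway */
          err(-6, "cap_get_proc");
  }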
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
turbostat.c: In function 'parse_int_file':
turbostat.c:5567:19: error: 'PATH_MAX' undeclared (first use in this function)
5567 | char path[PATH_MAX];
| ^~~~~~~~
turbostat.c: In function 'probe_graphics':
turbostat.c:6787:19: error: 'PATH_MAX' undeclared (first use in this function)
6787 | char path[PATH_MAX];
| ^~~~~~~~
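The likely shape of the fix is simply pulling in a header that defines
PATH_MAX; which header the patch actually chose is not shown here:
  #include <limits.h>   /* PATH_MAX */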
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
$ sudo turbostat --quiet --show junk
turbostat: Counter 'junk' can not be added.
Previously, invalid arguments to --show and --hide were silently ignored
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
If the allocated size exceeds UINT_MAX, then it's necessary to cast
the mr->nr_pages value to size_t to prevent it from overflowing. In
practice this isn't much of a concern as the required memory size will
have been validated upfront, and accounted to the user. And > 4GB sizes
will be necessary to make the lack of a cast a problem, which greatly
exceeds normal user locked_vm settings that are generally in the kb to
mb range. However, if root is used, then accounting isn't done, and
then it's possible to hit this issue.
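A rough sketch of the widening; the exact expression in memmap.c may
differ:
  /* widen before the shift so a 32-bit nr_pages cannot overflow */
  size_t size = (size_t)mr->nr_pages << PAGE_SHIFT;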
Link: https://lore.kernel.org/all/6895b298.050a0220.7f033.0059.GAE@google.com/
Cc: stable@vger.kernel.org
Reported-by: syzbot+23727438116feb13df15@syzkaller.appspotmail.com
Fixes: 087f997870a9 ("io_uring/memmap: implement mmap for regions")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Define a new, optional, callback that allows the driver to
specify how the return data buffer is allocated. If that callback
is set, mailbox/pcc.c is now responsible for reading from and
writing to the PCC shared buffer.
This also allows for proper checks of the command-complete flag
between the PCC sender and receiver.
For Type 4 channels, initialize the command complete flag prior
to accepting messages.
Since the mailbox does not know what memory allocation scheme
to use for response messages, the client now has an optional
callback that allows it to allocate the buffer for a response
message.
When an outbound message is written to the buffer, the mailbox
checks for the flag indicating that the client wants a tx-complete
notification via IRQ. Upon receipt of the interrupt, it will pair it
with the outgoing message. The expected use is to
free the kernel memory buffer for the previous outgoing message.
Signed-off-by: Adam Young <admiyo@os.amperecomputing.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
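An illustrative sketch of the change (not the exact call site):
  /* Before: the size was the source length, so only "__" plus NUL fit */
  strscpy(extension, "___", strlen("___"));

  /* After: size the copy by the destination buffer */
  strscpy(extension, "___", sizeof(extension));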
Cc: stable@vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Repeated connections from clients with the same IP address may exhaust
the maximum number of connections and prevent other normal client
connections. Limit repeated connections from clients with the same IP.
Reported-by: tianshuo han <hantianshuo233@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There's no need for separate conn_wait and disconn_wait queues.
This will simplify the move to common code; the server code already
uses a single wait_queue for this.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
_smbd_get_connection
It is already called long before we may hit this cleanup code path.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This matches the timeout for tcp connections.
Cc: Steve French <smfrench@gmail.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
vmd_msi_alloc() allocates struct vmd_irq and stashes it into
irq_data->chip_data associated with the VMD's interrupt domain.
vmd_msi_free() extracts the pointer by calling irq_get_chip_data() and
frees it.
irq_get_chip_data() returns the chip_data associated with the top interrupt
domain. This worked in the past because VMD's interrupt domain was the top
domain.
But d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()")
changed the interrupt domain hierarchy so VMD's interrupt domain is not the
top domain anymore. irq_get_chip_data() now returns the chip_data at the
MSI devices' interrupt domains. It is therefore broken for vmd_msi_free()
to kfree() this chip_data.
Fix by extracting the chip_data associated with the VMD's interrupt domain.
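A rough sketch of the fix; list unlinking and locking in vmd_msi_free()
are elided:
  static void vmd_msi_free(struct irq_domain *domain,
                           struct msi_domain_info *info, unsigned int virq)
  {
          /* was: irq_get_chip_data(virq), which now returns the top
           * (MSI device) domain's chip_data, not VMD's */
          struct irq_data *irqd = irq_domain_get_irq_data(domain, virq);
          struct vmd_irq *vmdirq = irqd->chip_data;

          kfree(vmdirq);
  }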
Fixes: d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()")
Reported-by: Kenneth Crudup <kenny@panix.com>
Closes: https://lore.kernel.org/linux-pci/dfa40e48-8840-4e61-9fda-25cdb3ad81c1@panix.com/
Reported-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Closes: https://lore.kernel.org/linux-pci/ed53280ed15d1140700b96cca2734bf327ee92539e5eb68e80f5bbbf0f01@linux.gnuweeb.org/
Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Tested-by: Kenneth Crudup <kenny@panix.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250807081051.2253962-1-namcao@linutronix.de
|
|
On s390, and, in general, on all platforms where the respective event
supports auxiliary data gathering, the command:
# ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data ]
# ./perf report --stats | grep SAMPLE
#
does not generate samples in the perf.data file. On x86 the command:
# sudo perf record -e intel_pt// -u 0 ls
is broken too.
Looking at the sequence of calls in 'perf record' reveals this
behavior:
1. The event 'cycles' is created and enabled:
record__open()
+-> evlist__apply_filters()
+-> perf_bpf_filter__prepare()
+-> bpf_program.attach_perf_event()
+-> bpf_program.attach_perf_event_opts()
+-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...)
The event 'cycles' is enabled and active now. However the event's
ring-buffer to store the samples generated by hardware is not
allocated yet.
2. The event's fd is mmap()ed to create the ring buffer:
record__open()
+-> record__mmap()
+-> record__mmap_evlist()
+-> evlist__mmap_ex()
+-> perf_evlist__mmap_ops()
+-> mmap_per_cpu()
+-> mmap_per_evsel()
+-> mmap__mmap()
+-> perf_mmap__mmap()
+-> mmap()
This allocates the ring buffer for the event 'cycles'. With mmap()
the kernel creates the ring buffer:
perf_mmap(): kernel function to create the event's ring
| buffer to save the sampled data.
|
+-> ring_buffer_attach(): Allocates memory for ring buffer.
| The PMU has auxiliary data setup function. The
| has_aux(event) condition is true and the PMU's
| stop() is called to stop sampling. It is not
| restarted:
|
| if (has_aux(event))
| perf_event_stop(event, 0);
|
+-> cpumsf_pmu_stop():
Hardware sampling is stopped. No samples are generated and saved
anymore.
3. After the event 'cycles' has been mapped, the event is enabled a
second time in:
__cmd_record()
+-> evlist__enable()
+-> __evlist__enable()
+-> evsel__enable_cpu()
+-> perf_evsel__enable_cpu()
+-> perf_evsel__run_ioctl()
+-> perf_evsel__ioctl()
+-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .)
The second
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
is just a NOP in this case. The first invocation in (1.) sets the
event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions
perf_ioctl()
+-> _perf_ioctl()
+-> _perf_event_enable()
+-> __perf_event_enable()
return immediately because event::state is already set to
PERF_EVENT_STATE_ACTIVE.
This happens on s390, because the event 'cycles' offers the possibility
to save auxiliary data. The PMU callbacks setup_aux() and free_aux() are
defined. Without both callback functions, cpumsf_pmu_stop() is not
invoked and sampling continues.
To remedy this, remove the first invocation of
ioctl(..., PERF_EVENT_IOC_ENABLE, ...)
in step (1.): create the event in step (1.) and enable it in step (3.),
after the ring buffer has been mapped.
Output after:
# ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.876 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 16200 (99.5%)
SAMPLE events: 16200
#
The software event succeeded both before and after the patch:
# ./perf record -e cpu-clock -aB --synth=no -u 0 -- \
./perf test -w thloop 2
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 2.870 MB perf.data ]
# ./perf report --stats | grep SAMPLE
SAMPLE events: 53506 (99.8%)
SAMPLE events: 53506
#
Fixes: b4c658d4d63d61 ("perf target: Remove uid from target")
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Automatically enabling a perf event after attaching a BPF prog to it is
not always desirable.
Add a new "dont_enable" field to struct bpf_perf_event_opts. While
introducing "enable" instead would be nicer in that it would avoid
a double negation in the implementation, it would make
DECLARE_LIBBPF_OPTS() less efficient.
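A rough usage sketch of the new option; perf_fd setup and error
handling are elided:
  LIBBPF_OPTS(bpf_perf_event_opts, opts,
          .dont_enable = true,    /* attach without PERF_EVENT_IOC_ENABLE */
  );

  link = bpf_program__attach_perf_event_opts(prog, perf_fd, &opts);
  if (!link)
          return -errno;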
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Co-developed-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250806162417.19666-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
I accidentally added a bug in pptp_xmit() that syzbot caught for us.
Only call ip_rt_put() if a route has been allocated.
BUG: unable to handle page fault for address: ffffffffffffffdb
PGD df3b067 P4D df3b067 PUD df3d067 PMD 0
Oops: Oops: 0002 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 6346 Comm: syz.0.336 Not tainted 6.16.0-next-20250804-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:arch_atomic_add_return arch/x86/include/asm/atomic.h:85 [inline]
RIP: 0010:raw_atomic_sub_return_release include/linux/atomic/atomic-arch-fallback.h:846 [inline]
RIP: 0010:atomic_sub_return_release include/linux/atomic/atomic-instrumented.h:327 [inline]
RIP: 0010:__rcuref_put include/linux/rcuref.h:109 [inline]
RIP: 0010:rcuref_put+0x172/0x210 include/linux/rcuref.h:173
Call Trace:
<TASK>
dst_release+0x24/0x1b0 net/core/dst.c:167
ip_rt_put include/net/route.h:285 [inline]
pptp_xmit+0x14b/0x1a90 drivers/net/ppp/pptp.c:267
__ppp_channel_push+0xf2/0x1c0 drivers/net/ppp/ppp_generic.c:2166
ppp_channel_push+0x123/0x660 drivers/net/ppp/ppp_generic.c:2198
ppp_write+0x2b0/0x400 drivers/net/ppp/ppp_generic.c:544
vfs_write+0x27b/0xb30 fs/read_write.c:684
ksys_write+0x145/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
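One rough way to express the fix on the error path (the actual patch
may instead branch to a label placed after the ip_rt_put() call):
  tx_error:
          if (rt)                 /* only drop a route we actually hold */
                  ip_rt_put(rt);
          kfree_skb(skb);
          return 1;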
Fixes: de9c4861fb42 ("pptp: ensure minimal skb length in pptp_xmit()")
Reported-by: syzbot+27d7cfbc93457e472e00@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/689095a5.050a0220.1fc43d.0009.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250807142146.2877060-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Because it's only used in sbitmap.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250807032413.1469456-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently, elevators record an internal 'async_depth' to throttle
asynchronous requests, and they calculate the shallow depth based on
sb->shift, on the assumption that sb->shift is the number of available
tags in one word.
However, sb->shift is not the number of available tags in the last
word; see __map_depth():
  if (index == sb->map_nr - 1)
          return sb->depth - (index << sb->shift);
As a consequence, if the last word is used, more tags can be obtained
than expected. For example, assume nr_requests=256 with four words; in
the worst case, if the user sets nr_requests=32, the first word is also
the last word, and using the bits-per-word value of 64 to calculate
async_depth is wrong.
On the other hand, due to cgroup QoS, bfq may allow only one request to
be allocated, yet setting shallow_depth=1 will still allow as many
requests as there are words to be allocated.
Fix these problems by applying shallow_depth to the whole sbitmap
instead of per word, and change kyber, mq-deadline and bfq to follow
this. A new helper, __map_depth_with_shallow(), is introduced to
calculate the available bits in each word.
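A rough sketch of the idea behind the new helper: treat shallow_depth
as a budget for the whole sbitmap and hand each word only its share;
the real __map_depth_with_shallow() may differ in detail:
  static unsigned int map_depth_with_shallow(const struct sbitmap *sb,
                                             int index,
                                             unsigned int shallow_depth)
  {
          unsigned int word_depth = __map_depth(sb, index);
          unsigned int used_before = index << sb->shift;

          if (shallow_depth <= used_before)
                  return 0;

          return min(word_depth, shallow_depth - used_before);
  }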
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20250807032413.1469456-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 528589947c180 ("nvmet: initialize discovery subsys after debugfs
is initialized") changed nvmet_init() to initialize nvme discovery after
"nvmet" debugfs directory is initialized. The change broke nvmet_exit()
because the discovery subsystem now depends on debugfs. Debugfs should
be destroyed after the discovery subsystem. Fix nvmet_exit() to do that.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/all/CAHj4cs96AfFQpyDKF_MdfJsnOEo=2V7dQgqjFv+k3t7H-=yGhA@mail.gmail.com/
Fixes: 528589947c180 ("nvmet: initialize discovery subsys after debugfs is initialized")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Link: https://lore.kernel.org/r/20250807053507.2794335-1-mkhalfella@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The conversion of all GPIO drivers to using the .set_rv() and
.set_multiple_rv() callbacks from struct gpio_chip (which - unlike their
predecessors - return an integer and allow the controller drivers to
indicate failures to users) is now complete and the legacy ones have
been removed. Rename the new callbacks back to their original names in
one sweeping change.
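A rough sketch of a driver setter under the restored name, keeping the
int return type so errors can propagate; the function name is
illustrative:
  static int my_gpio_set(struct gpio_chip *gc, unsigned int offset, int value)
  {
          /* program the hardware; return 0 or a negative errno */
          return 0;
  }

  /* registration: gc->set = my_gpio_set; (was .set_rv during the transition) */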
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|