summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorLines
2026-02-06tcp: inline tcp_filter()Eric Dumazet-1/+7
This helper is already (auto)inlined from IPv4 TCP stack. Make it an inline function to benefit IPv6 as well. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/0 up/down: 30/-49 (-19) Function old new delta tcp_v6_rcv 3448 3478 +30 __pfx_tcp_filter 16 - -16 tcp_filter 33 - -33 Total: Before=24891904, After=24891885, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260205164329.3401481-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06netns: optimize netns cleaning by batching unhash_nsid callsQiliang Yuan-0/+1
Currently, unhash_nsid() scans the entire system for each netns being killed, leading to O(L_dying_net * M_alive_net * N_id) complexity, as __peernet2id() also performs a linear search in the IDR. Optimize this to O(M_alive_net * N_id) by batching unhash operations. Move unhash_nsid() out of the per-netns loop in cleanup_net() to perform a single-pass traversal over survivor namespaces. Identify dying peers by an 'is_dying' flag, which is set under net_rwsem write lock after the netns is removed from the global list. This batches the unhashing work and eliminates the O(L_dying_net) multiplier. To minimize the impact on struct net size, 'is_dying' is placed in an existing hole after 'hash_mix' in struct net. Use a restartable idr_get_next() loop for iteration. This avoids the unsafe modification issue inherent to idr_for_each() callbacks and allows dropping the nsid_lock to safely call sleepy rtnl_net_notifyid(). Clean up redundant nsid_lock and simplify the destruction loop now that unhashing is centralized. Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204074854.3506916-1-realwujing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to ↵Qi Zheng-1/+1
CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config For architectures that define __HAVE_ARCH_TLB_REMOVE_TABLE, the page tables at the pmd/pud level are generally not of struct ptdesc type, and do not have pt_rcu_head member, thus these architectures cannot support PT_RECLAIM. In preparation for enabling PT_RECLAIM on more architectures, convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config, so that we can make conditional judgments in Kconfig. Link: https://lkml.kernel.org/r/5ebfa3d4b56e63c6906bda5eccaa9f7194d3a86b.1769515122.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Tested-by: Andreas Larsson <andreas@gaisler.com> [sparc, UP&SMP] Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-06Merge branch 'pci/misc'Bjorn Helgaas-0/+2
- Fix documentation typos (Shawn Lin) - Add struct p2pdma_provider kernel doc (Leon Romanovsky) - Remove useless devres WARN_ON() (Philipp Stanner) * pci/misc: PCI: Remove useless WARN_ON() from devres PCI/P2PDMA: Add missing struct p2pdma_provider documentation Documentation: PCI: Fix typos in msi-howto.rst
2026-02-06Merge branch 'pci/controller/dwc'Bjorn Helgaas-0/+33
- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a pointer to the preceding Capability (Qiang Yu) - Add dw_pcie_remove_capability() and dw_pcie_remove_ext_capability() to remove Capabilities that are advertised but not fully implemented (Qiang Yu) - Remove MSI and MSI-X Capabilities for DWC controllers in platforms that can't support them, so we automatically fall back to INTx (Qiang Yu) - Remove MSI-X and DPC Capabilities for Qualcomm platforms that advertise but don't support them (Qiang Yu) - Remove duplicate dw_pcie_ep_hide_ext_capability() function and replace with dw_pcie_remove_ext_capability() (Qiang Yu) - Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status for drivers that support this (Shawn Lin) - Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link is not up to avoid an unnecessary timeout (Manivannan Sadhasivam) - Revert dw-rockchip, qcom, and DWC core changes that used link-up IRQs to trigger enumeration instead of waiting for link to be up because the PCI core doesn't allocate bus number space for hierarchies that might be attached (Niklas Cassel) - Make endpoint iATU entry for MSI permanent instead of programming it dynamically, which is slow and racy with respect to other concurrent traffic, e.g., eDMA (Koichiro Den) - Use iMSI-RX MSI target address when possible to fix endpoints using 32-bit MSI (Shawn Lin) - Make dw_pcie_ltssm_status_string() available and use it for logging errors in dw_pcie_wait_for_link() (Manivannan Sadhasivam) - Return -ENODEV when dw_pcie_wait_for_link() finds no devices, -EIO for device present but inactive, -ETIMEDOUT for other failures, so callers can handle these cases differently (Manivannan Sadhasivam) - Allow DWC host controller driver probe to continue if device is not found or found but inactive; only fail when there's an error with the link (Manivannan Sadhasivam) - For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers are not accessible after PME_Turn_Off, simply wait 10ms instead of polling for L2/L3 Ready (Richard Zhu) - Use multiple iATU entries to map large bridge windows and DMA ranges when necessary instead of failing (Samuel Holland) - Rename struct dw_pcie_rp.has_msi_ctrl to .use_imsi_rx for clarity (Qiang Yu) - Add EPC dynamic_inbound_mapping feature bit for Endpoint Controllers that can update BAR inbound address translation without requiring EPF driver to clear/reset the BAR first, and advertise it for DWC-based Endpoints (Koichiro Den) - Add EPC subrange_mapping feature bit for Endpoint Controllers that can map multiple independent inbound regions in a single BAR, implement subrange mapping, advertise it for DWC-based Endpoints, and add Endpoint selftests for it (Koichiro Den) - Allow overriding default BAR sizes for pci-epf-test (Niklas Cassel) - Make resizable BARs work for Endpoint multi-PF configurations; previously it only worked for PF 0 (Aksh Garg) - Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings, and Address Match Mode (Aksh Garg) - Fix issues with outbound iATU index assignment that caused iATU index to be out of bounds (Niklas Cassel) - Clean up iATU index tracking to be consistent (Niklas Cassel) - Set up iATU when ECAM is enabled; previously IO and MEM outbound windows weren't programmed, and ECAM-related iATU entries weren't restored after suspend/resume, so config accesses failed (Krishna Chaitanya Chundru) * pci/controller/dwc: PCI: dwc: Fix missing iATU setup when ECAM is enabled PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup() PCI: dwc: Fix msg_atu_index assignment PCI: dwc: ep: Add comment explaining controller level PTM access in multi PF setup PCI: dwc: ep: Add per-PF BAR and inbound ATU mapping support PCI: dwc: ep: Fix resizable BAR support for multi-PF configurations PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes selftests: pci_endpoint: Add BAR subrange mapping test case misc: pci_endpoint_test: Add BAR subrange mapping test case PCI: endpoint: pci-epf-test: Add BAR subrange mapping test support Documentation: PCI: endpoint: Clarify pci_epc_set_bar() usage PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU PCI: dwc: Advertise dynamic inbound mapping support PCI: endpoint: Add BAR subrange mapping support PCI: endpoint: Add dynamic_inbound_mapping EPC feature PCI: dwc: Rename dw_pcie_rp::has_msi_ctrl to dw_pcie_rp::use_imsi_rx for clarity PCI: dwc: Fix grammar and formatting for comment in dw_pcie_remove_ext_capability() PCI: dwc: Use multiple iATU windows for mapping large bridge windows and DMA ranges PCI: dwc: Remove duplicate dw_pcie_ep_hide_ext_capability() function PCI: dwc: Skip waiting for L2/L3 Ready if dw_pcie_rp::skip_l23_wait is true PCI: dwc: Fail dw_pcie_host_init() if dw_pcie_wait_for_link() returns -ETIMEDOUT PCI: dwc: Rework the error print of dw_pcie_wait_for_link() PCI: dwc: Rename and move ltssm_status_string() to pcie-designware.c PCI: dwc: Return -EIO from dw_pcie_wait_for_link() if device is not active PCI: dwc: Return -ENODEV from dw_pcie_wait_for_link() if device is not found PCI: dwc: Use cfg0_base as iMSI-RX target address to support 32-bit MSI devices PCI: dwc: ep: Cache MSI outbound iATU mapping Revert "PCI: dwc: Don't wait for link up if driver can detect Link Up event" Revert "PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt" Revert "PCI: qcom: Enable MSI interrupts together with Link up if 'Global IRQ' is supported" Revert "PCI: qcom: Don't wait for link if we can detect Link Up" Revert "PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ" Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up" PCI: dwc: Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link is not up PCI: dw-rockchip: Change get_ltssm() to provide L1 Substates info PCI: dwc: Add L1 Substates context to ltssm_status of debugfs PCI: qcom: Remove DPC Extended Capability PCI: qcom: Remove MSI-X Capability for Root Ports PCI: dwc: Remove MSI/MSIX capability for Root Port if iMSI-RX is used as MSI controller PCI: dwc: Add new APIs to remove standard and extended Capability PCI: Add preceding capability position support in PCI_FIND_NEXT_*_CAP macros
2026-02-06Merge branch 'pci/virtualization'Bjorn Helgaas-0/+1
- Mark ASM1164 SATA controller to avoid bus reset since it fails to train the Link after reset (Alex Williamson) - Mark Nvidia GB10 Root Ports to avoid bus reset since they may fail to retrain the link after reset (Johnny-CC Chang) - Add lockdep and other lock assertions (Ilpo Järvinen) - Add ACS quirk for Qualcomm Hamoa & Glymur, which provides ACS-like features but doesn't advertise an ACS Capability (Krishna Chaitanya Chundru) - Add ACS quirk for Pericom PI7C9X2G404 switches, which fail under load when P2P Redirect Request is enabled (Nicolas Cavallari) - Remove an incorrect unlock in pci_slot_trylock() error handling (Jinhui Guo) - Lock the bridge device for slot reset (Keith Busch) - Enable ACS after IOMMU configuration on OF platforms so ACS is enabled an all devices; previously the first device enumeration (typically a Root Port) was omitted (Manivannan Sadhasivam) - Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to work around hardware erratum; previously ACS SV was temporarily disabled, which worked for enumeration but not after reset (Manivannan Sadhasivam) * pci/virtualization: PCI: Disable ACS SV for IDT 0x8090 switch PCI: Disable ACS SV for IDT 0x80b5 switch PCI: Cache ACS Capabilities register PCI: Enable ACS after configuring IOMMU for OF platforms PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404] PCI: Add ACS quirk for Qualcomm Hamoa & Glymur PCI: Use device_lock_assert() to verify device lock is held PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held PCI: Fix pci_slot_lock () device locking PCI: Fix pci_slot_trylock() error handling PCI: Mark Nvidia GB10 to avoid bus reset PCI: Mark ASM1164 SATA controller to avoid bus reset
2026-02-06Merge branch 'pci/trace'Bjorn Helgaas-0/+136
- Add generic RAS tracepoint for hotplug events (Shuai Xue) - Add RAS tracepoint for link speed changes (Shuai Xue) * pci/trace: Documentation: tracing: Add PCI tracepoint documentation PCI: trace: Add RAS tracepoint to monitor link speed changes PCI: trace: Add generic RAS tracepoint for hotplug event
2026-02-06Merge branch 'pci/resource'Bjorn Helgaas-2/+11
- Build zero-sized resources when a BAR is larger than 4G but pci_bus_addr_t or resource_size_t can't represent 64-bit addresses (Ilpo Järvinen) - Fix bridge window alignment with optional resources, where we previously lost the additional alignment requirement (Ilpo Järvinen) - Stop over-estimating bridge window size since we now assign them without any gaps between them (Ilpo Järvinen) - Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening for nested bridges and endpoints (Ilpo Järvinen) - Remove old_size limit from bridge window sizing (Ilpo Järvinen) - Push realloc check into pbus_size_mem() to simplify callers (Ilpo Järvinen) - Pass bridge window resource to pbus_size_mem() to avoid looking it up again (Ilpo Järvinen) - Use res_to_dev_res() instead of open-coding the same search (Ilpo Järvinen) - Add pci_resource_is_bridge_win() helper (Ilpo Järvinen) - Add more logging of resource assignment (Ilpo Järvinen) - Add pbus_mem_size_optional() to handle sizes of optional resources (SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen) - Move CardBus code to setup-cardbus.c and only build it when CONFIG_CARDBUS is set (Ilpo Järvinen) - Use scnprintf() instead of sprintf() (Ilpo Järvinen) - Add pbus_validate_busn() for Bus Number validation (Ilpo Järvinen) - Don't claim disabled bridge windows to avoid spurious claim failures (Ilpo Järvinen) * pci/resource: PCI: Don't claim disabled bridge windows PCI: Move CardBus bridge scanning to setup-cardbus.c PCI: Add pbus_validate_busn() for Bus Number validation PCI: Add dword #defines for Bus Number + Secondary Latency Timer PCI: Use scnprintf() instead of sprintf() PCI: Handle CardBus-specific params in setup-cardbus.c PCI: Separate CardBus setup & build it only with CONFIG_CARDBUS PCI: Add 'pci' prefix to struct pci_dev_resource handling functions PCI: Use resource_assigned() in setup-bus.c algorithm resource: Mark res given to resource_assigned() as const PCI: Add pbus_mem_size_optional() to handle optional sizes PCI: Check invalid align earlier in pbus_size_mem() PCI: Log reset and restore of resources PCI: Add pci_resource_is_bridge_win() PCI: Fetch dev_res to local var in __assign_resources_sorted() PCI: Use res_to_dev_res() in reassign_resources_sorted() PCI: Pass bridge window resource to pbus_size_mem() PCI: Push realloc check into pbus_size_mem() PCI: Remove old_size limit from bridge window sizing resource: Increase MAX_IORES_LEVEL to 8 PCI: Stop over-estimating bridge window size PCI: Rewrite bridge window head alignment function PCI: Fix bridge window alignment with optional resources PCI: Use resource_set_range() that correctly sets ->end
2026-02-06Merge branch 'pci/pwrctrl'Bjorn Helgaas-2/+15
- Rename pwrseq, tc9563, and slot driver structs, variables, and functions for consistency (Bjorn Helgaas) - Add power_on/off callbacks with generic signature to pwrseq, tc9563, and slot drivers so they can be used by pwrctrl core (Manivannan Sadhasivam) - Add interfaces to create and destroy pwrctrl devices (Krishna Chaitanya Chundru) - Add interfaces to power devices on and off (Manivannan Sadhasivam) - Switch to pwrctrl interfaces to create, destroy, and power on/off devices, calling them from host controller drivers instead of the PCI core (Manivannan Sadhasivam) - Drop qcom .assert_perst() callbacks since this is now done by the controller driver instead of the pwrctrl driver (Manivannan Sadhasivam) - Add PCIe M.2 connector support to the slot pwrctrl driver (Manivannan Sadhasivam) - Create pwrctrl devices for devicetree PCIe M.2 connector nodes (Manivannan Sadhasivam) * pci/pwrctrl: PCI/pwrctrl: Create pwrctrl device if graph port is found PCI/pwrctrl: Add PCIe M.2 connector support PCI: Drop the assert_perst() callback PCI: qcom: Drop the assert_perst() callbacks PCI/pwrctrl: Switch to pwrctrl create, power on/off, destroy APIs PCI/pwrctrl: Add APIs to power on/off pwrctrl devices PCI/pwrctrl: Add APIs to create, destroy pwrctrl devices PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}' callbacks PCI/pwrctrl: pwrseq: Factor out power on/off code to helpers PCI/pwrctrl: slot: Factor out power on/off code to helpers PCI/pwrctrl: tc9563: Rename private struct and pointers for consistency PCI/pwrctrl: tc9563: Add local variables to reduce repetition PCI/pwrctrl: tc9563: Clean up whitespace PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter() PCI/pwrctrl: slot: Rename private struct and pointers for consistency PCI/pwrctrl: pwrseq: Rename private struct and pointers for consistency # Conflicts: # drivers/pci/bus.c
2026-02-06Merge branch 'pci/iommu'Bjorn Helgaas-0/+7
- Add PCI_BRIDGE_NO_ALIAS quirk for ASPEED AST1150, where VGA and USB are behind a PCIe-to-PCI bridge and share the same StreamID (Nirmoy Das) * pci/iommu: PCI: Add PCI_BRIDGE_NO_ALIAS quirk for ASPEED AST1150 PCI: Add ASPEED vendor ID to pci_ids.h
2026-02-06PCI/bwctrl: Disable BW controller on Intel P45 using a quirkIlpo Järvinen-0/+1
The commit 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") was found to lead to a boot hang on a Intel P45 system. Testing without setting Link Bandwidth Management Interrupt Enable (LBMIE) and Link Autonomous Bandwidth Interrupt Enable (LABIE) (PCIe r7.0, sec 7.5.3.7) in bwctrl allowed system to come up. P45 is a very old chipset and supports only up to gen2 PCIe, so not having bwctrl does not seem a huge deficiency. Add no_bw_notif in struct pci_dev and quirk Intel P45 Root Port with it. Reported-by: Adam Stylinski <kungfujesus06@gmail.com> Link: https://lore.kernel.org/linux-pci/aUCt1tHhm_-XIVvi@eggsbenedict/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Adam Stylinski <kungfujesus06@gmail.com> Link: https://patch.msgid.link/20260116131513.2359-1-ilpo.jarvinen@linux.intel.com
2026-02-06PCI: Cache ACS Capabilities registerManivannan Sadhasivam-0/+1
The ACS Capability register is read-only. Cache it to allow quirks to override it and to avoid re-reading it. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://patch.msgid.link/20260102-pci_acs-v3-2-72280b94d288@oss.qualcomm.com
2026-02-06bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}Amery Hung-2/+2
Take care of rqspinlock error in bpf_local_storage_{map_free, destroy}() properly by switching to bpf_selem_unlink_nofail(). Both functions iterate their own RCU-protected list of selems and call bpf_selem_unlink_nofail(). In map_free(), to prevent infinite loop when both map_free() and destroy() fail to remove a selem from b->list (extremely unlikely), switch to hlist_for_each_entry_rcu(). In destroy(), also switch to hlist_for_each_entry_rcu() since we no longer iterate local_storage->list under local_storage->lock. bpf_selem_unlink() now becomes dedicated to helpers and syscalls paths so reuse_now should always be false. Remove it from the argument and hardcode it. Acked-by: Alexei Starovoitov <ast@kernel.org> Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-12-ameryhung@gmail.com
2026-02-06bpf: Support lockless unlink when freeing map or local storageAmery Hung-1/+8
Introduce bpf_selem_unlink_nofail() to properly handle errors returned from rqspinlock in bpf_local_storage_map_free() and bpf_local_storage_destroy() where the operation must succeeds. The idea of bpf_selem_unlink_nofail() is to allow an selem to be partially linked and use atomic operation on a bit field, selem->state, to determine when and who can free the selem if any unlink under lock fails. An selem initially is fully linked to a map and a local storage. Under normal circumstances, bpf_selem_unlink_nofail() will be able to grab locks and unlink a selem from map and local storage in sequeunce, just like bpf_selem_unlink(), and then free it after an RCU grace period. However, if any of the lock attempts fails, it will only clear SDATA(selem)->smap or selem->local_storage depending on the caller and set SELEM_MAP_UNLINKED or SELEM_STORAGE_UNLINKED according to the caller. Then, after both map_free() and destroy() see the selem and the state becomes SELEM_UNLINKED, one of two racing caller can succeed in cmpxchg the state from SELEM_UNLINKED to SELEM_TOFREE, ensuring no double free or memory leak. To make sure bpf_obj_free_fields() is done only once and when map is still present, it is called when unlinking an selem from b->list under b->lock. To make sure uncharging memory is done only when the owner is still present in map_free(), block destroy() from returning until there is no pending map_free(). Since smap may not be valid in destroy(), bpf_selem_unlink_nofail() skips bpf_selem_unlink_storage_nolock_misc() when called from destroy(). This is okay as bpf_local_storage_destroy() will return the remaining amount of memory charge tracked by mem_charge to the owner to uncharge. It is also safe to skip clearing local_storage->owner and owner_storage as the owner is being freed and no users or bpf programs should be able to reference the owner and using local_storage. Finally, access of selem, SDATA(selem)->smap and selem->local_storage are racy. Callers will protect these fields with RCU. Acked-by: Alexei Starovoitov <ast@kernel.org> Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-11-ameryhung@gmail.com
2026-02-06bpf: Prepare for bpf_selem_unlink_nofail()Amery Hung-2/+3
The next patch will introduce bpf_selem_unlink_nofail() to handle rqspinlock errors. bpf_selem_unlink_nofail() will allow an selem to be partially unlinked from map or local storage. Save memory allocation method in selem so that later an selem can be correctly freed even when SDATA(selem)->smap is init to NULL. In addition, keep track of memory charge to the owner in local storage so that later bpf_selem_unlink_nofail() can return the correct memory charge to the owner. Updating local_storage->mem_charge is protected by local_storage->lock. Finally, extract miscellaneous tasks performed when unlinking an selem from local_storage into bpf_selem_unlink_storage_nolock_misc(). It will be reused by bpf_selem_unlink_nofail(). This patch also takes the chance to remove local_storage->smap, which is no longer used since commit f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage"). Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-10-ameryhung@gmail.com
2026-02-06bpf: Remove unused percpu counter from bpf_local_storage_map_freeAmery Hung-2/+1
Percpu locks have been removed from cgroup and task local storage. Now that all local storage no longer use percpu variables as locks preventing recursion, there is no need to pass them to bpf_local_storage_map_free(). Remove the argument from the function. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-9-ameryhung@gmail.com
2026-02-06bpf: Change local_storage->lock and b->lock to rqspinlockAmery Hung-2/+3
Change bpf_local_storage::lock and bpf_local_storage_map_bucket::lock from raw_spin_lock to rqspinlock. Finally, propagate errors from raw_res_spin_lock_irqsave() to syscall return or BPF helper return. In bpf_local_storage_destroy(), ignore return from raw_res_spin_lock_irqsave() for now. A later patch will correctly handle errors correctly in bpf_local_storage_destroy() so that it can unlink selems even when failing to acquire locks. For __bpf_local_storage_map_cache(), instead of handling the error, skip updating the cache. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-6-ameryhung@gmail.com
2026-02-06bpf: Convert bpf_selem_unlink to failableAmery Hung-1/+1
To prepare changing both bpf_local_storage_map_bucket::lock and bpf_local_storage::lock to rqspinlock, convert bpf_selem_unlink() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Open code bpf_selem_unlink_storage() in the only caller, bpf_selem_unlink(), since unlink_map and unlink_storage must be done together after all the necessary locks are acquired. For bpf_local_storage_map_free(), ignore the return from bpf_selem_unlink() for now. A later patch will allow it to unlink selems even when failing to acquire locks. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-5-ameryhung@gmail.com
2026-02-06bpf: Convert bpf_selem_link_map to failableAmery Hung-3/+3
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock, convert bpf_selem_link_map() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-4-ameryhung@gmail.com
2026-02-06bpf: Select bpf_local_storage_map_bucket based on bpf_local_storageAmery Hung-0/+1
A later bpf_local_storage refactor will acquire all locks before performing any update. To simplified the number of locks needed to take in bpf_local_storage_map_update(), determine the bucket based on the local_storage an selem belongs to instead of the selem pointer. Currently, when a new selem needs to be created to replace the old selem in bpf_local_storage_map_update(), locks of both buckets need to be acquired to prevent racing. This can be simplified if the two selem belongs to the same bucket so that only one bucket needs to be locked. Therefore, instead of hashing selem, hashing the local_storage pointer the selem belongs. Performance wise, this is slightly better as update now requires locking one bucket. It should not change the level of contention on one bucket as the pointers to local storages of selems in a map are just as unique as pointers to selems. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-2-ameryhung@gmail.com
2026-02-06Merge tag 'mm-hotfixes-stable-2026-02-06-12-37' of ↵Linus Torvalds-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "A couple of late-breaking MM fixes. One against a new-in-this-cycle patch and the other addresses a locking issue which has been there for over a year" * tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/memory-failure: reject unsupported non-folio compound page procfs: avoid fetching build ID while holding VMA lock
2026-02-06Merge tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-clientLinus Torvalds-0/+6
Pull ceph fixes from Ilya Dryomov: "One RBD and two CephFS fixes which address potential oopses. The RBD thing is more of a rare edge case that pops up in our CI, while the two CephFS scenarios are regressions that were reported by users and can be triggered trivially in normal operation. All marked for stable" * tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client: ceph: fix NULL pointer dereference in ceph_mds_auth_match() ceph: fix oops due to invalid pointer for kfree() in parse_longname() rbd: check for EOD after exclusive lock is ensured to be held
2026-02-06Merge tag 'dma-mapping-6.19-2026-02-06' of ↵Linus Torvalds-6/+19
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "Two minor fixes for the DMA-mapping subsystem: - check for the rare case of the allocation failure of the global CMA pool (Shanker Donthineni) - avoid perf buffer overflow when tracing large scatter-gather lists (Deepanshu Kartikey)" * tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma: contiguous: Check return value of dma_contiguous_reserve_area() tracing/dma: Cap dma_map_sg tracepoint arrays to prevent buffer overflow
2026-02-06landlock: Minor reword of docs for TCP access rightsMatthieu Buffet-8/+9
- Move ABI requirement next to each access right to prepare adding more access rights; - Mention the possibility to remove the random component of a socket's ephemeral port choice within the netns-wide ephemeral port range, since it allows choosing the "random" ephemeral port. Signed-off-by: Matthieu Buffet <matthieu@buffet.re> Link: https://lore.kernel.org/r/20251212163704.142301-2-matthieu@buffet.re Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-02-06landlock: Multithreading support for landlock_restrict_self()Günther Noack-0/+13
Introduce the LANDLOCK_RESTRICT_SELF_TSYNC flag. With this flag, a given Landlock ruleset is applied to all threads of the calling process, instead of only the current one. Without this flag, multithreaded userspace programs currently resort to using the nptl(7)/libpsx hack for multithreaded policy enforcement, which is also used by libcap and for setuid(2). Using this userspace-based scheme, the threads of a process enforce the same Landlock policy, but the resulting Landlock domains are still separate. The domains being separate causes multiple problems: * When using Landlock's "scoped" access rights, the domain identity is used to determine whether an operation is permitted. As a result, when using LANLDOCK_SCOPE_SIGNAL, signaling between sibling threads stops working. This is a problem for programming languages and frameworks which are inherently multithreaded (e.g. Go). * In audit logging, the domains of separate threads in a process will get logged with different domain IDs, even when they are based on the same ruleset FD, which might confuse users. Cc: Andrew G. Morgan <morgan@kernel.org> Cc: John Johansen <john.johansen@canonical.com> Cc: Paul Moore <paul@paul-moore.com> Suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20251127115136.3064948-2-gnoack@google.com [mic: Fix restrict_self_flags test, clean up Makefile, allign comments, reduce local variable scope, add missing includes] Closes: https://github.com/landlock-lsm/linux/issues/2 Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-02-06Revert "revocable: Revocable resource management"Johan Hovold-89/+0
This reverts commit 62eb557580eb2177cf16c3fd2b6efadff297b29a. The revocable implementation uses two separate abstractions, struct revocable_provider and struct revocable, in order to store the SRCU read lock index which must be passed unaltered to srcu_read_unlock() in the same context when a resource is no longer needed. With the merged revocable API, multiple threads could however share the same struct revocable and therefore potentially overwrite the SRCU index of another thread which can cause the SRCU synchronisation in revocable_provider_revoke() to never complete. [1] An example revocable conversion of the gpiolib code also turned out to be fundamentally flawed and could lead to use-after-free. [2] An attempt to address both issues was quickly put together and merged, but revocable is still fundamentally broken. [3] Specifically, the latest design relies on RCU for storing a pointer to the revocable provider, but since the resource can be shared by value (e.g. as in the now reverted selftests) this does not work at all and can also lead to use-after-free: static void revocable_provider_release(struct kref *kref) { struct revocable_provider *rp = container_of(kref, struct revocable_provider, kref); cleanup_srcu_struct(&rp->srcu); kfree_rcu(rp, rcu); } void revocable_provider_revoke(struct revocable_provider __rcu **rp_ptr) { struct revocable_provider *rp; rp = rcu_replace_pointer(*rp_ptr, NULL, 1); ... kref_put(&rp->kref, revocable_provider_release); } int revocable_init(struct revocable_provider __rcu *_rp, struct revocable *rev) { struct revocable_provider *rp; ... scoped_guard(rcu) { rp = rcu_dereference(_rp); if (!rp) return -ENODEV; if (!kref_get_unless_zero(&rp->kref)) return -ENODEV; } ... } producer: priv->rp = revocable_provider_alloc(&priv->res); // pass priv->rp by value to consumer revocable_provider_revoke(&priv->rp); consumer: struct revocable_provider __rcu *rp = filp->private_data; struct revocable *rev; revocable_init(rp, &rev); as _rp would still be non-NULL in revocable_init() regardless of whether the producer has revoked the resource and set its pointer to NULL. Essentially revocable still relies on having a pointer to reference counted driver data which holds the revocable provider, which makes all the RCU protection unnecessary along with most of the current revocable design and implementation. As the above shows, and as has been pointed out repeatedly elsewhere, these kind of issues are not something that should be addressed incrementally. [4] Revert the revocable implementation until a redesign has been proposed and evaluated properly. Link: https://lore.kernel.org/all/20260124170535.11756-4-johan@kernel.org/ [1] Link: https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/ [2] Link: https://lore.kernel.org/all/20260129143733.45618-1-tzungbi@kernel.org/ [3] Link: https://lore.kernel.org/all/aXobzoeooJqxMkEj@hovoldconsulting.com/ [4] Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20260204142849.22055-4-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06io_uring: allow registration of per-task restrictionsJens Axboe-0/+9
Currently io_uring supports restricting operations on a per-ring basis. To use those, the ring must be setup in a disabled state by setting IORING_SETUP_R_DISABLED. Then restrictions can be set for the ring, and the ring can then be enabled. This commit adds support for IORING_REGISTER_RESTRICTIONS with ring_fd == -1, like the other "blind" register opcodes which work on the task rather than a specific ring. This allows registration of the same kind of restrictions as can been done on a specific ring, but with the task itself. Once done, any ring created will inherit these restrictions. If a restriction filter is registered with a task, then it's inherited on fork for its children. Children may only further restrict operations, not extend them. Inheriting restrictions include both the classic IORING_REGISTER_RESTRICTIONS based restrictions, as well as the BPF filters that have been registered with the task via IORING_REGISTER_BPF_FILTER. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-06io_uring: add task fork hookJens Axboe-1/+14
Called when copy_process() is called to copy state to a new child. Right now this is just a stub, but will be used shortly to properly handle fork'ing of task based io_uring restrictions. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-06irqchip/riscv-imsic: Adjust the number of available guest irq filesXu Lu-0/+3
Currently, KVM assumes the minimum of implemented HGEIE bits and "BIT(gc->guest_index_bits) - 1" as the number of guest files available across all CPUs. This will not work when CPUs have different number of guest files because KVM may incorrectly allocate a guest file on a CPU with fewer guest files. To address above, during initialization, calculate the number of available guest interrupt files according to MMIO resources and constrain the number of guest interrupt files that can be allocated by KVM. Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Thomas Gleixner <tglx@kernel.org> Link: https://lore.kernel.org/r/20260104133457.57742-1-luxu.kernel@bytedance.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-02-06netfilter: nft_set_rbtree: validate open interval overlapPablo Neira Ayuso-0/+4
Open intervals do not have an end element, in particular an open interval at the end of the set is hard to validate because of it is lacking the end element, and interval validation relies on such end element to perform the checks. This patch adds a new flag field to struct nft_set_elem, this is not an issue because this is a temporary object that is allocated in the stack from the insert/deactivate path. This flag field is used to specify that this is the last element in this add/delete command. The last flag is used, in combination with the start element cookie, to check if there is a partial overlap, eg. Already exists: 255.255.255.0-255.255.255.254 Add interval: 255.255.255.0-255.255.255.255 ~~~~~~~~~~~~~ start element overlap Basically, the idea is to check for an existing end element in the set if there is an overlap with an existing start element. However, the last open interval can come in any position in the add command, the corner case can get a bit more complicated: Already exists: 255.255.255.0-255.255.255.254 Add intervals: 255.255.255.0-255.255.255.255,255.255.255.0-255.255.255.254 ~~~~~~~~~~~~~ start element overlap To catch this overlap, annotate that the new start element is a possible overlap, then report the overlap if the next element is another start element that confirms that previous element in an open interval at the end of the set. For deletions, do not update the start cookie when deleting an open interval, otherwise this can trigger spurious EEXIST when adding new elements. Unfortunately, there is no NFT_SET_ELEM_INTERVAL_OPEN flag which would make easier to detect open interval overlaps. Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06netfilter: nft_counter: fix reset of counters on 32bit archsAnders Grahn-0/+10
nft_counter_reset() calls u64_stats_add() with a negative value to reset the counter. This will work on 64bit archs, hence the negative value added will wrap as a 64bit value which then can wrap the stat counter as well. On 32bit archs, the added negative value will wrap as a 32bit value and _not_ wrapping the stat counter properly. In most cases, this would just lead to a very large 32bit value being added to the stat counter. Fix by introducing u64_stats_sub(). Fixes: 4a1d3acd6ea8 ("netfilter: nft_counter: Use u64_stats_t for statistic.") Signed-off-by: Anders Grahn <anders.grahn@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentationFlorian Westphal-0/+1
Ulrich reports a regression with nfqueue: If an application did not set the 'F_GSO' capability flag and a gso packet with an unconfirmed nf_conn entry is received all packets are now dropped instead of queued, because the check happens after skb_gso_segment(). In that case, we did have exclusive ownership of the skb and its associated conntrack entry. The elevated use count is due to skb_clone happening via skb_gso_segment(). Move the check so that its peformed vs. the aggregated packet. Then, annotate the individual segments except the first one so we can do a 2nd check at reinject time. For the normal case, where userspace does in-order reinjects, this avoids packet drops: first reinjected segment continues traversal and confirms entry, remaining segments observe the confirmed entry. While at it, simplify nf_ct_drop_unconfirmed(): We only care about unconfirmed entries with a refcnt > 1, there is no need to special-case dying entries. This only happens with UDP. With TCP, the only unconfirmed packet will be the TCP SYN, those aren't aggregated by GRO. Next patch adds a udpgro test case to cover this scenario. Reported-by: Ulrich Weber <ulrich.weber@gmail.com> Fixes: 7d8dc1c7be8d ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks") Signed-off-by: Florian Westphal <fw@strlen.de>
2026-02-06pinctrl: generic: move function to amlogic-am4 driverConor Dooley-5/+0
pinconf_generic_dt_node_to_map_pinmux() is not actually a generic function, and really belongs in the amlogic-am4 driver. There are three reasons why. First, and least, of the reasons is that this function behaves differently to the other dt_node_to_map functions in a way that is not obvious from a first glance. This difference stems for the devicetree properties that the function is intended for use with, and how they are typically used. The other generic dt_node_to_map functions support platforms where the pins, groups and functions are described statically in the driver and require a function that will produce a mapping from dt nodes to these pre-established descriptions. No other code in the driver is require to be executed at runtime. pinconf_generic_dt_node_to_map_pinmux() on the other hand is intended for use with the pinmux property, where groups and functions are determined entirely from the devicetree. As a result, there are no statically defined groups and functions in the driver for this function to perform a mapping to. Other drivers that use the pinmux property (e.g. the k1) their dt_node_to_map function creates the groups and functions as the devicetree is parsed. Instead of that, pinconf_generic_dt_node_to_map_pinmux() requires that the devicetree is parsed twice, once by it and once at probe, so that the driver dynamically creates the groups and functions before the dt_node_to_map callback is executed. I don't believe this double parsing requirement is how developers would expect this to work and is not necessary given there are drivers that do not have this behaviour. Secondly and thirdly, the function bakes in some assumptions that only really match the amlogic platform about how the devicetree is constructed. These, to me, are problematic for something that claims to be generic. The other dt_node_to_map implementations accept a being called for either a node containing pin configuration properties or a node containing child nodes that each contain the configuration properties. IOW, they support the following two devicetree configurations: | cfg { | label: group { | pinmux = <asjhdasjhlajskd>; | config-item1; | }; | }; | label: cfg { | group1 { | pinmux = <dsjhlfka>; | config-item2; | }; | group2 { | pinmux = <lsdjhaf>; | config-item1; | }; | }; pinconf_generic_dt_node_to_map_pinmux() only supports the latter. The other assumption about devicetree configuration that the function makes is that the labeled node's parent is a "function node". The amlogic driver uses these "function nodes" to create the functions at probe time, and pinconf_generic_dt_node_to_map_pinmux() finds the parent of the node it is operating on's name as part of the mapping. IOW, it requires that the devicetree look like: | pinctrl@bla { | | func-foo { | label: group-default { | pinmuxes = <lskdf>; | }; | }; | }; and couldn't be used if the nodes containing the pinmux and configuration properties are children of the pinctrl node itself: | pinctrl@bla { | | label: group-default { | pinmuxes = <lskdf>; | }; | }; These final two reasons are mainly why I believe this is not suitable as a generic function, and should be moved into the driver that is the sole user and originator of the "generic" function. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Linus Walleij <linusw@kernel.org>
2026-02-06hwrng: core - use RCU and work_struct to fix race conditionLianjie Wang-0/+2
Currently, hwrng_fill is not cleared until the hwrng_fillfn() thread exits. Since hwrng_unregister() reads hwrng_fill outside the rng_mutex lock, a concurrent hwrng_unregister() may call kthread_stop() again on the same task. Additionally, if hwrng_unregister() is called immediately after hwrng_register(), the stopped thread may have never been executed. Thus, hwrng_fill remains dirty even after hwrng_unregister() returns. In this case, subsequent calls to hwrng_register() will fail to start new threads, and hwrng_unregister() will call kthread_stop() on the same freed task. In both cases, a use-after-free occurs: refcount_t: addition on 0; use-after-free. WARNING: ... at lib/refcount.c:25 refcount_warn_saturate+0xec/0x1c0 Call Trace: kthread_stop+0x181/0x360 hwrng_unregister+0x288/0x380 virtrng_remove+0xe3/0x200 This patch fixes the race by protecting the global hwrng_fill pointer inside the rng_mutex lock, so that hwrng_fillfn() thread is stopped only once, and calls to kthread_run() and kthread_stop() are serialized with the lock held. To avoid deadlock in hwrng_fillfn() while being stopped with the lock held, we convert current_rng to RCU, so that get_current_rng() can read current_rng without holding the lock. To remove the lock from put_rng(), we also delay the actual cleanup into a work_struct. Since get_current_rng() no longer returns ERR_PTR values, the IS_ERR() checks are removed from its callers. With hwrng_fill protected by the rng_mutex lock, hwrng_fillfn() can no longer clear hwrng_fill itself. Therefore, if hwrng_fillfn() returns directly after current_rng is dropped, kthread_stop() would be called on a freed task_struct later. To fix this, hwrng_fillfn() calls schedule() now to keep the task alive until being stopped. The kthread_stop() call is also moved from hwrng_unregister() to drop_current_rng(), ensuring kthread_stop() is called on all possible paths where current_rng becomes NULL, so that the thread would not wait forever. Fixes: be4000bc4644 ("hwrng: create filler thread") Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Lianjie Wang <karin0.zst@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2026-02-06KVM: s390: Add explicit padding to struct kvm_s390_keyopArnd Bergmann-0/+1
The newly added structure causes a warning about implied padding: ./usr/include/linux/kvm.h:1247:1: error: padding struct size to alignment boundary with 6 bytes [-Werror=padded] The padding can lead to leaking kernel data and ABI incompatibilies when used on x86. Neither of these is a problem in this specific patch, but it's best to avoid it and use explicit padding fields in general. Fixes: 0ee4ddc1647b ("KVM: s390: Storage key manipulation IOCTL") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-06Merge branches 'fixes', 'arm/smmu/updates', 'intel/vt-d', 'amd/amd-vi' and ↵Joerg Roedel-0/+100
'core' into next
2026-02-06vsnprintf: drop __printf() attributes on binary printing functionsArnd Bergmann-3/+2
The printf() format attributes are applied inconsistently for the binary printf helpers, which causes warnings for the bpf_trace code using them from functions that pass down format strings: kernel/trace/bpf_trace.c: In function '____bpf_trace_printk': kernel/trace/bpf_trace.c:377:9: error: function '____bpf_trace_printk' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format] 377 | ret = bstr_printf(data.buf, MAX_BPRINTF_BUF, fmt, data.bin_args); | ^~~ This can be addressed either by annotating all five callers in bpf code, or by removing the annotations on the callees that were added by Andy Shevchenko last year. As Alexei Starovoitov points out, there are no callers in C code that would benefit from the __printf attributes, the only users are in BPF code or in the do_trace_printk() helper that already checks the arguments. Drop all three of these annotations, reverting the earlierl commits that added these, in order to get a clean build with -Wsuggest-attribute=format. Fixes: 6b2c1e30ad68 ("seq_file: Mark binary printing functions with __printf() attribute") Fixes: 7bf819aa992f ("vsnprintf: Mark binary printing functions with __printf() attribute") Link: https://lore.kernel.org/all/CAADnVQK3eZp3yp35OUx8j1UBsQFhgsn5-4VReqAJ=68PaaKYmg@mail.gmail.com/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512061640.9hKTnB8p-lkp@intel.com/ Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20260204132643.1302967-1-arnd@kernel.org Signed-off-by: Petr Mladek <pmladek@suse.com>
2026-02-05net/mlx5: Fix 1600G link mode enum namingYael Chemla-1/+1
Rename TAUI/TBASE to GAUI/GBASE in 1600G link mode identifier and its usage in ethtool and link-info tables. Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://patch.msgid.link/20260204194324.1723534-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05procfs: avoid fetching build ID while holding VMA lockAndrii Nakryiko-0/+3
Fix PROCMAP_QUERY to fetch optional build ID only after dropping mmap_lock or per-VMA lock, whichever was used to lock VMA under question, to avoid deadlock reported by syzbot: -> #1 (&mm->mmap_lock){++++}-{4:4}: __might_fault+0xed/0x170 _copy_to_iter+0x118/0x1720 copy_page_to_iter+0x12d/0x1e0 filemap_read+0x720/0x10a0 blkdev_read_iter+0x2b5/0x4e0 vfs_read+0x7f4/0xae0 ksys_read+0x12a/0x250 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: __lock_acquire+0x1509/0x26d0 lock_acquire+0x185/0x340 down_read+0x98/0x490 blkdev_read_iter+0x2a7/0x4e0 __kernel_read+0x39a/0xa90 freader_fetch+0x1d5/0xa80 __build_id_parse.isra.0+0xea/0x6a0 do_procmap_query+0xd75/0x1050 procfs_procmap_ioctl+0x7a/0xb0 __x64_sys_ioctl+0x18e/0x210 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&mm->mmap_lock); lock(&sb->s_type->i_mutex_key#8); lock(&mm->mmap_lock); rlock(&sb->s_type->i_mutex_key#8); *** DEADLOCK *** This seems to be exacerbated (as we haven't seen these syzbot reports before that) by the recent: 777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context") To make this safe, we need to grab file refcount while VMA is still locked, but other than that everything is pretty straightforward. Internal build_id_parse() API assumes VMA is passed, but it only needs the underlying file reference, so just add another variant build_id_parse_file() that expects file passed directly. [akpm@linux-foundation.org: fix up kerneldoc] Link: https://lkml.kernel.org/r/20260129215340.3742283-1-andrii@kernel.org Fixes: ed5d583a88a9 ("fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reported-by: <syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski-9/+53
Cross-merge networking fixes after downstream PR (net-6.19-rc9). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c 3125fc1701694 ("net: spacemit: k1-emac: fix jumbo frame support") f66086798f91f ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aYIysFIE9ooavWia@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05Merge tag 'net-6.19-rc9' of ↵Linus Torvalds-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and Netfilter. Previous releases - regressions: - eth: stmmac: fix stm32 (and potentially others) resume regression - nf_tables: fix inverted genmask check in nft_map_catchall_activate() - usb: r8152: fix resume reset deadlock - fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for RSS contexts Previous releases - always broken: - sched: cls_u32: use skb_header_pointer_careful() to avoid OOB reads with malicious u32 rules - eth: ice: timestamping related fixes" * tag 'net-6.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate() net: cpsw: Execute ndo_set_rx_mode callback in a work queue net: cpsw_new: Execute ndo_set_rx_mode callback in a work queue gve: Correct ethtool rx_dropped calculation gve: Fix stats report corruption on queue count change selftest: net: add a test-case for encap segmentation after GRO net: gro: fix outer network offset net: add proper RCU protection to /proc/net/ptype net: ethernet: adi: adin1110: Check return value of devm_gpiod_get_optional() in adin1110_check_spi() wifi: iwlwifi: mvm: pause TCM on fast resume wifi: iwlwifi: mld: cancel mlo_scan_start_wk net: spacemit: k1-emac: fix jumbo frame support net: enetc: Convert 16-bit register reads to 32-bit for ENETC v4 net: enetc: Convert 16-bit register writes to 32-bit for ENETC v4 net: enetc: Remove CBDR cacheability AXI settings for ENETC v4 net: enetc: Remove SI/BDR cacheability AXI settings for ENETC v4 tipc: use kfree_sensitive() for session key material net: stmmac: fix stm32 (and potentially others) resume regression net: rss: fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for contexts ...
2026-02-05net/sched: don't use dynamic lockdep keys with clsact/ingress/noqueueDavide Caratti-0/+24
Currently we are registering one dynamic lockdep key for each allocated qdisc, to avoid false deadlock reports when mirred (or TC eBPF) redirects packets to another device while the root lock is acquired [1]. Since dynamic keys are a limited resource, we can save them at least for qdiscs that are not meant to acquire the root lock in the traffic path, or to carry traffic at all, like: - clsact - ingress - noqueue Don't register dynamic keys for the above schedulers, so that we hit MAX_LOCKDEP_KEYS later in our tests. [1] https://github.com/multipath-tcp/mptcp_net-next/issues/451 Changes in v2: - change ordering of spin_lock_init() vs. lockdep_register_key() (Jakub Kicinski) Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://patch.msgid.link/94448f7fa7c4f52d2ce416a4895ec87d456d7417.1770220576.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05tcp: move __reqsk_free() out of lineEric Dumazet-8/+1
Inlining __reqsk_free() is overkill, let's reclaim 2 Kbytes of text. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 2/4 grow/shrink: 2/14 up/down: 225/-2338 (-2113) Function old new delta __reqsk_free - 114 +114 sock_edemux 18 82 +64 inet_csk_listen_start 233 264 +31 __pfx___reqsk_free - 16 +16 __pfx_reqsk_queue_alloc 16 - -16 __pfx_reqsk_free 16 - -16 reqsk_queue_alloc 46 - -46 tcp_req_err 272 177 -95 reqsk_fastopen_remove 348 253 -95 cookie_bpf_check 157 62 -95 cookie_tcp_reqsk_alloc 387 290 -97 cookie_v4_check 1568 1465 -103 reqsk_free 105 - -105 cookie_v6_check 1519 1412 -107 sock_gen_put 187 78 -109 sock_pfree 212 82 -130 tcp_try_fastopen 1818 1683 -135 tcp_v4_rcv 3478 3294 -184 reqsk_put 306 90 -216 tcp_get_cookie_sock 551 318 -233 tcp_v6_rcv 3404 3141 -263 tcp_conn_request 2677 2384 -293 Total: Before=24887415, After=24885302, chg -0.01% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05inet: move reqsk_queue_alloc() to net/ipv4/inet_connection_sock.cEric Dumazet-2/+0
Only called once from inet_csk_listen_start(), it can be static. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204055147.1682705-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05ASoC: cs35l56: Support for reading speaker ID from on-chip GPIOsRichard Fitzgerald-0/+37
Add support for using the state of pins on the amplifier to indicate the type of speaker fitted. Previously, where there were alternate speaker vendors, this was indicated using host CPU GPIOs. Some new Dell models use spare pins on the CS35L63 as GPIOs for the speaker ID detection. Cirrus-specific SDCA Disco properties provide a list of the pins to be used, and pull-up/down settings for the pads. This list is ordered, MSbit to LSbit. The code to set the firmware filename has been modified to check for using chip pins for speaker ID. The entire block of code to set firmware name has been moved out of cs35l56_component_probe() into its own function to make it easier to KUnit test. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20260205164838.1611295-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2026-02-05usb: phy: tegra: parametrize PORTSC1 register offsetSvyatoslav Ryhel-0/+2
The PORTSC1 register has a different offset in Tegra20 compared to Tegra30+, yet they share a crucial set of registers required for HSIC functionality. Reflect this register offset change in the SoC config. Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Link: https://patch.msgid.link/20260202080526.23487-5-clamor95@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-05usb: phy: tegra: parametrize HSIC PTS valueSvyatoslav Ryhel-0/+2
The parallel transceiver select used in HSIC mode differs on Tegra20, where it uses the UTMI value (0), whereas Tegra30+ uses a dedicated HSIC value. Reflect this in the SoC config. Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Link: https://patch.msgid.link/20260202080526.23487-4-clamor95@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-05usb: phy: tegra: cosmetic fixesSvyatoslav Ryhel-2/+2
Change TEGRA_USB_HOSTPC1_DEVLC_PTS_HSIC to its literal value instead of using the BIT macro, as it is an enumeration. Correct the spelling in the comment and rename uhsic_registers_shift to uhsic_registers_offset. These changes are cosmetic and do not affect code behavior. Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Link: https://patch.msgid.link/20260202080526.23487-2-clamor95@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-05livepatch: Fix having __klp_objects relics in non-livepatch modulesPetr Pavlu-0/+3
The linker script scripts/module.lds.S specifies that all input __klp_objects sections should be consolidated into an output section of the same name, and start/stop symbols should be created to enable scripts/livepatch/init.c to locate this data. This start/stop pattern is not ideal for modules because the symbols are created even if no __klp_objects input sections are present. Consequently, a dummy __klp_objects section also appears in the resulting module. This unnecessarily pollutes non-livepatch modules. Instead, since modules are relocatable files, the usual method for locating consolidated data in a module is to read its section table. This approach avoids the aforementioned problem. The klp_modinfo already stores a copy of the entire section table with the final addresses. Introduce a helper function that scripts/livepatch/init.c can call to obtain the location of the __klp_objects section from this data. Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Link: https://patch.msgid.link/20260123102825.3521961-2-petr.pavlu@suse.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2026-02-05net: add vlan_get_protocol_offset_inline() helperEric Dumazet-29/+22
skb_protocol() is bloated, and forces slow stack canaries in many fast paths. Add vlan_get_protocol_offset_inline() which deals with the non-vlan common cases. __vlan_get_protocol_offset() is now out of line. It returns a vlan_type_depth struct to avoid stack canaries in callers. struct vlan_type_depth { __be16 type; u16 depth; }; $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 0/22 up/down: 0/-6320 (-6320) Function old new delta vlan_get_protocol_dgram 61 59 -2 __pfx_skb_protocol 16 - -16 __vlan_get_protocol_offset 307 273 -34 tap_get_user 1374 1207 -167 ip_md_tunnel_xmit 1625 1452 -173 tap_sendmsg 940 753 -187 netif_skb_features 1079 866 -213 netem_enqueue 3017 2800 -217 vlan_parse_protocol 271 50 -221 tso_start 567 344 -223 fq_dequeue 1908 1685 -223 skb_network_protocol 434 205 -229 ip6_tnl_xmit 2639 2409 -230 br_dev_queue_push_xmit 474 236 -238 skb_protocol 258 - -258 packet_parse_headers 621 357 -264 __ip6_tnl_rcv 1306 1039 -267 skb_csum_hwoffload_help 515 224 -291 ip_tunnel_xmit 2635 2339 -296 sch_frag_xmit_hook 1582 1233 -349 bpf_skb_ecn_set_ce 868 457 -411 IP6_ECN_decapsulate 1297 768 -529 ip_tunnel_rcv 2121 1489 -632 ipip6_rcv 2572 1922 -650 Total: Before=24892803, After=24886483, chg -0.03% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260204053023.1622775-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>