summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorLines
2026-04-10bpf: Use kmalloc_nolock() universally in local storageAmery Hung-6/+2
Switch to kmalloc_nolock() universally in local storage. Socket local storage didn't move to kmalloc_nolock() when BPF memory allocator was replaced by it for performance reasons. Now that kfree_rcu() supports freeing memory allocated by kmalloc_nolock(), we can move the remaining local storages to use kmalloc_nolock() and cleanup the cluttered free paths. Use kfree() instead of kfree_nolock() in bpf_selem_free_trace_rcu() and bpf_local_storage_free_trace_rcu(). Both callbacks run in process context where spinning is allowed, so kfree_nolock() is unnecessary. Benchmark: ./bench -p 1 local-storage-create --storage-type socket \ --batch-size {16,32,64} The benchmark is a microbenchmark stress-testing how fast local storage can be created. There is no measurable throughput change for socket local storage after switching from kzalloc() to kmalloc_nolock(). Socket local storage batch creation speed diff --------------- ---- ------------------ ---- Baseline 16 433.9 ± 0.6 k/s 32 434.3 ± 1.4 k/s 64 434.2 ± 0.7 k/s After 16 439.0 ± 1.9 k/s +1.2% 32 437.3 ± 2.0 k/s +0.7% 64 435.8 ± 2.5k/s +0.4% Also worth noting that the baseline got a 5% throughput boost when sheaf replaces percpu partial slab recently [0]. [0] https://lore.kernel.org/bpf/20260123-sheaves-for-all-v4-0-041323d506f7@suse.cz/ Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260411015419.114016-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10Merge tag 'riscv-for-linus-v7.0-rc8' of ↵Linus Torvalds-25/+18
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley: "Before v7.0 is released, fix a few issues with the CFI patchset, merged earlier in v7.0-rc, that primarily affect interfaces to non-kernel code: - Improve the prctl() interface for per-task indirect branch landing pad control to expand abbreviations and to resemble the speculation control prctl() interface - Expand the "LP" and "SS" abbreviations in the ptrace uapi header file to "branch landing pad" and "shadow stack", to improve readability - Fix a typo in a CFI-related macro name in the ptrace uapi header file - Ensure that the indirect branch tracking state and shadow stack state are unlocked immediately after an exec() on the new task so that libc subsequently can control it - While working in this area, clean up the kernel-internal, cross-architecture prctl() function names by expanding the abbreviations mentioned above" * tag 'riscv-for-linus-v7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: prctl: cfi: change the branch landing pad prctl()s to be more descriptive riscv: ptrace: cfi: expand "SS" references to "shadow stack" in uapi headers prctl: rename branch landing pad implementation functions to be more explicit riscv: ptrace: expand "LP" references to "branch landing pads" in uapi headers riscv: cfi: clear CFI lock status in start_thread() riscv: ptrace: cfi: fix "PRACE" typo in uapi header
2026-04-10net: bridge: add stp_mode attribute for STP mode selectionAndy Roulin-0/+39
The bridge-stp usermode helper is currently restricted to the initial network namespace, preventing userspace STP daemons (e.g. mstpd) from operating on bridges in other network namespaces. Since commit ff62198553e4 ("bridge: Only call /sbin/bridge-stp for the initial network namespace"), bridges in non-init namespaces silently fall back to kernel STP with no way to use userspace STP. Add a new bridge attribute IFLA_BR_STP_MODE that allows explicit per-bridge control over STP mode selection: BR_STP_MODE_AUTO (default) - Existing behavior: invoke the /sbin/bridge-stp helper in init_net only; fall back to kernel STP if it fails or in non-init namespaces. BR_STP_MODE_USER - Directly enable userspace STP (BR_USER_STP) without invoking the helper. Works in any network namespace. Userspace is responsible for ensuring an STP daemon manages the bridge. BR_STP_MODE_KERNEL - Directly enable kernel STP (BR_KERNEL_STP) without invoking the helper. The mode can only be changed while STP is disabled, or set to the same value (-EBUSY otherwise). IFLA_BR_STP_MODE is processed before IFLA_BR_STP_STATE in br_changelink(), so both can be set atomically in a single netlink message. The mode can also be changed in the same message that disables STP. The stp_mode struct field is u8 since all possible values fit, while NLA_U32 is used for the netlink attribute since it occupies the same space in the netlink message as NLA_U8. A new stp_helper_active boolean tracks whether the /sbin/bridge-stp helper was invoked during br_stp_start(), so that br_stp_stop() only calls the helper for stop when it was called for start. This avoids calling the helper asymmetrically when stp_mode changes between start and stop. Suggested-by: Ido Schimmel <idosch@nvidia.com> Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Andy Roulin <aroulin@nvidia.com> Link: https://patch.msgid.link/20260405205224.3163000-2-aroulin@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-10PCI/P2PDMA: Add Google SoCs to the P2P DMA host bridge listJacob Moroni-0/+2
All Google SoCs support peer-to-peer DMA between Root Ports, so add a wildcard rule to the host bridge list. Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: David Hu <xuehaohu@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Link: https://patch.msgid.link/20260409150123.3538444-2-jmoroni@google.com
2026-04-10bpf: poison dead stack slotsAlexei Starovoitov-0/+1
As a sanity check poison stack slots that stack liveness determined to be dead, so that any read from such slots will cause program rejection. If stack liveness logic is incorrect the poison can cause valid program to be rejected, but it also will prevent unsafe program to be accepted. Allow global subprogs "read" poisoned stack slots. The static stack liveness determined that subprog doesn't read certain stack slots, but sizeof(arg_type) based global subprog validation isn't accurate enough to know which slots will actually be read by the callee, so it needs to check full sizeof(arg_type) at the caller. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260410-patch-set-v4-14-5d4eecb343db@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10bpf: change logging scheme for live stack analysisEduard Zingerman-5/+0
Instead of breadcrumbs like: (d2,cs15) frame 0 insn 18 +live -16 (d2,cs15) frame 0 insn 17 +live -16 Print final accumulated stack use/def data per-func_instance per-instruction. printed func_instance's are ordered by callsite and depth. For example: stack use/def subprog#0 shared_instance_must_write_overwrite (d0,cs0): 0: (b7) r1 = 1 1: (7b) *(u64 *)(r10 -8) = r1 ; def: fp0-8 2: (7b) *(u64 *)(r10 -16) = r1 ; def: fp0-16 3: (bf) r1 = r10 4: (07) r1 += -8 5: (bf) r2 = r10 6: (07) r2 += -16 7: (85) call pc+7 ; use: fp0-8 fp0-16 8: (bf) r1 = r10 9: (07) r1 += -16 10: (bf) r2 = r10 11: (07) r2 += -8 12: (85) call pc+2 ; use: fp0-8 fp0-16 13: (b7) r0 = 0 14: (95) exit stack use/def subprog#1 forwarding_rw (d1,cs7): 15: (85) call pc+1 ; use: fp0-8 fp0-16 16: (95) exit stack use/def subprog#1 forwarding_rw (d1,cs12): 15: (85) call pc+1 ; use: fp0-8 fp0-16 16: (95) exit stack use/def subprog#2 write_first_read_second (d2,cs15): 17: (7a) *(u64 *)(r1 +0) = 42 18: (79) r0 = *(u64 *)(r2 +0) ; use: fp0-8 fp0-16 19: (95) exit For groups of three or more consecutive stack slots, abbreviate as follows: 25: (85) call bpf_loop#181 ; use: fp2-8..-512 fp1-8..-512 fp0-8..-512 Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260410-patch-set-v4-10-5d4eecb343db@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10bpf: simplify liveness to use (callsite, depth) keyed func_instancesEduard Zingerman-19/+0
Rework func_instance identification and remove the dynamic liveness API, completing the transition to fully static stack liveness analysis. Replace callchain-based func_instance keys with (callsite, depth) pairs. The full callchain (all ancestor callsites) is no longer part of the hash key; only the immediate callsite and the call depth matter. This does not lose precision in practice and simplifies the data structure significantly: struct callchain is removed entirely, func_instance stores just callsite, depth. Drop must_write_acc propagation. Previously, must_write marks were accumulated across successors and propagated to the caller via propagate_to_outer_instance(). Instead, callee entry liveness (live_before at subprog start) is pulled directly back to the caller's callsite in analyze_subprog() after each callee returns. Since (callsite, depth) instances are shared across different call chains that invoke the same subprog at the same depth, must_write marks from one call may be stale for another. To handle this, analyze_subprog() records into a fresh_instance() when the instance was already visited (must_write_initialized), then merge_instances() combines the results: may_read is unioned, must_write is intersected. This ensures only slots written on ALL paths through all call sites are marked as guaranteed writes. This replaces commit_stack_write_marks() logic. Skip recursive descent into callees that receive no FP-derived arguments (has_fp_args() check). This is needed because global subprogram calls can push depth beyond MAX_CALL_FRAMES (max depth is 64 for global calls but only 8 frames are accommodated for FP passing). It also handles the case where a callback subprog cannot be determined by argument tracking: such callbacks will be processed by analyze_subprog() at depth 0 independently. Update lookup_instance() (used by is_live_before queries) to search for the func_instance with maximal depth at the corresponding callsite, walking depth downward from frameno to 0. This accounts for the fact that instance depth no longer corresponds 1:1 to bpf_verifier_state->curframe, since skipped non-FP calls create gaps. Remove the dynamic public liveness API from verifier.c: - bpf_mark_stack_{read,write}(), bpf_reset/commit_stack_write_marks() - bpf_update_live_stack(), bpf_reset_live_stack_callchain() - All call sites in check_stack_{read,write}_fixed_off(), check_stack_range_initialized(), mark_stack_slot_obj_read(), mark/unmark_stack_slots_{dynptr,iter,irq_flag}() - The per-instruction write mark accumulation in do_check() - The bpf_update_live_stack() call in prepare_func_exit() mark_stack_read() and mark_stack_write() become static functions in liveness.c, called only from the static analysis pass. The func_instance->updated and must_write_dropped flags are removed. Remove spis_single_slot(), spis_one_bit() helpers from bpf_verifier.h as they are no longer used. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260410-patch-set-v4-9-5d4eecb343db@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10bpf: record arg tracking results in bpf_liveness masksEduard Zingerman-1/+0
After arg tracking reaches a fixed point, perform a single linear scan over the converged at_in[] state and translate each memory access into liveness read/write masks on the func_instance: - Load/store instructions: FP-derived pointer's frame and offset(s) are converted to half-slot masks targeting per_frame_masks->{may_read,must_write} - Helper/kfunc calls: record_call_access() queries bpf_helper_stack_access_bytes() / bpf_kfunc_stack_access_bytes() for each FP-derived argument to determine access size and direction. Unknown access size (S64_MIN) conservatively marks all slots from fp_off to fp+0 as read. - Imprecise pointers (frame == ARG_IMPRECISE): conservatively mark all slots in every frame covered by the pointer's frame bitmask as fully read. - Static subprog calls with unresolved arguments: conservatively mark all frames as fully read. Instead of a call to clean_live_states(), start cleaning the current state continuously as registers and stack become dead since the static analysis provides complete liveness information. This makes clean_live_states() and bpf_verifier_state->cleaned unnecessary. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260410-patch-set-v4-8-5d4eecb343db@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10bpf: introduce forward arg-tracking dataflow analysisEduard Zingerman-0/+4
The analysis is a basis for static liveness tracking mechanism introduced by the next two commits. A forward fixed-point analysis that tracks which frame's FP each register value is derived from, and at what byte offset. This is needed because a callee can receive a pointer to its caller's stack frame (e.g. r1 = fp-16 at the call site), then do *(u64 *)(r1 + 0) inside the callee — a cross-frame stack access that the callee's local liveness must attribute to the caller's stack. Each register holds an arg_track value from a three-level lattice: - Precise {frame=N, off=[o1,o2,...]} — known frame index and up to 4 concrete byte offsets - Offset-imprecise {frame=N, off_cnt=0} — known frame, unknown offset - Fully-imprecise {frame=ARG_IMPRECISE, mask=bitmask} — unknown frame, mask says which frames might be involved At CFG merge points the lattice moves toward imprecision (same frame+offset stays precise, same frame different offsets merges offset sets or becomes offset-imprecise, different frames become fully-imprecise with OR'd bitmask). The analysis also tracks spills/fills to the callee's own stack (at_stack_in/out), so FP derived values spilled and reloaded. This pass is run recursively per call site: when subprog A calls B with specific FP-derived arguments, B is re-analyzed with those entry args. The recursion follows analyze_subprog -> compute_subprog_args -> (for each call insn) -> analyze_subprog. Subprogs that receive no FP-derived args are skipped during recursion and analyzed independently at depth 0. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260410-patch-set-v4-7-5d4eecb343db@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10bpf: make liveness.c track stack with 4-byte granularityEduard Zingerman-2/+2
Convert liveness bitmask type from u64 to spis_t, doubling the number of trackable stack slots from 64 to 128 to support 4-byte granularity. Each 8-byte SPI now maps to two consecutive 4-byte sub-slots in the bitmask: spi*2 half and spi*2+1 half. In verifier.c, check_stack_write_fixed_off() now reports 4-byte aligned writes of 4-byte writes as half-slot marks and 8-byte aligned 8-byte writes as two slots. Similar logic applied in check_stack_read_fixed_off(). Queries (is_live_before) are not yet migrated to half-slot granularity. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260410-patch-set-v4-4-5d4eecb343db@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10bpf: Add spis_*() helpers for 4-byte stack slot bitmasksAlexei Starovoitov-0/+67
Add helper functions for manipulating u64[2] bitmasks that represent 4-byte stack slot liveness. The 512-byte BPF stack is divided into 128 4-byte slots, requiring 128 bits (two u64s) to track. These will be used by the static stack liveness analysis in the next commit. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260410-patch-set-v4-3-5d4eecb343db@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10bpf: save subprogram name in bpf_subprog_infoEduard Zingerman-1/+1
Subprogram name can be computed from function info and BTF, but it is convenient to have the name readily available for logging purposes. Update comment saying that bpf_subprog_info->start has to be the first field, this is no longer true, relevant sites access .start field by it's name. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260410-patch-set-v4-2-5d4eecb343db@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10bpf: share several utility functions as internal APIEduard Zingerman-0/+2
Namely: - bpf_subprog_is_global - bpf_vlog_alignment Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260410-patch-set-v4-1-5d4eecb343db@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-10clockevents: Prevent timer interrupt starvationThomas Gleixner-0/+2
Calvin reported an odd NMI watchdog lockup which claims that the CPU locked up in user space. He provided a reproducer, which sets up a timerfd based timer and then rearms it in a loop with an absolute expiry time of 1ns. As the expiry time is in the past, the timer ends up as the first expiring timer in the per CPU hrtimer base and the clockevent device is programmed with the minimum delta value. If the machine is fast enough, this ends up in a endless loop of programming the delta value to the minimum value defined by the clock event device, before the timer interrupt can fire, which starves the interrupt and consequently triggers the lockup detector because the hrtimer callback of the lockup mechanism is never invoked. As a first step to prevent this, avoid reprogramming the clock event device when: - a forced minimum delta event is pending - the new expiry delta is less then or equal to the minimum delta Thanks to Calvin for providing the reproducer and to Borislav for testing and providing data from his Zen5 machine. The problem is not limited to Zen5, but depending on the underlying clock event device (e.g. TSC deadline timer on Intel) and the CPU speed not necessarily observable. This change serves only as the last resort and further changes will be made to prevent this scenario earlier in the call chain as far as possible. [ tglx: Updated to restore the old behaviour vs. !force and delta <= 0 and fixed up the tick-broadcast handlers as pointed out by Borislav ] Fixes: d316c57ff6bf ("[PATCH] clockevents: add core functionality") Reported-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Calvin Owens <calvin@wbinvd.org> Tested-by: Borislav Petkov <bp@alien8.de> Link: https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/ Link: https://patch.msgid.link/20260407083247.562657657@kernel.org
2026-04-10sched_ext: Remove runtime kfunc mask enforcementCheng-Yang Chou-28/+0
Now that scx_kfunc_context_filter enforces context-sensitive kfunc restrictions at BPF load time, the per-task runtime enforcement via scx_kf_mask is redundant. Remove it entirely: - Delete enum scx_kf_mask, the kf_mask field on sched_ext_entity, and the scx_kf_allow()/scx_kf_disallow()/scx_kf_allowed() helpers along with the higher_bits()/highest_bit() helpers they used. - Strip the @mask parameter (and the BUILD_BUG_ON checks) from the SCX_CALL_OP[_RET]/SCX_CALL_OP_TASK[_RET]/SCX_CALL_OP_2TASKS_RET macros and update every call site. Reflow call sites that were wrapped only to fit the old 5-arg form and now collapse onto a single line under ~100 cols. - Remove the in-kfunc scx_kf_allowed() runtime checks from scx_dsq_insert_preamble(), scx_dsq_move(), scx_bpf_dispatch_nr_slots(), scx_bpf_dispatch_cancel(), scx_bpf_dsq_move_to_local___v2(), scx_bpf_sub_dispatch(), scx_bpf_reenqueue_local(), and the per-call guard inside select_cpu_from_kfunc(). scx_bpf_task_cgroup() and scx_kf_allowed_on_arg_tasks() were already cleaned up in the "drop redundant rq-locked check" patch. scx_kf_allowed_if_unlocked() was rewritten in the preceding "decouple" patch. No further changes to those helpers here. Co-developed-by: Juntong Deng <juntong.deng@outlook.com> Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
2026-04-10vfs: introduce d_mark_tmpfile_name()Paulo Alcantara-0/+1
CIFS requires O_TMPFILE dentries to have names of newly created delete-on-close files in the server so it can build full pathnames from the root of the share when performing operations on them. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2026-04-10Merge tag 'vfs-7.0-rc8.fixes' of ↵Linus Torvalds-45/+54
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "The kernfs rbtree is keyed by (hash, ns, name) where the hash is seeded with the raw namespace pointer via init_name_hash(ns). The resulting hash values are exposed to userspace through readdir seek positions, and the pointer-based ordering in kernfs_name_compare() is observable through entry order. Switch from raw pointers to ns_common::ns_id for both hashing and comparison. A preparatory commit first replaces all const void * namespace parameters with const struct ns_common * throughout kernfs, sysfs, and kobject so the code can access ns->ns_id. Also compare the ns_id when hashes match in the rbtree to handle crafted collisions. Also fix eventpoll RCU grace period issue and a cachefiles refcount problem" * tag 'vfs-7.0-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: kernfs: make directory seek namespace-aware kernfs: use namespace id instead of pointer for hashing and comparison kernfs: pass struct ns_common instead of const void * for namespace tags eventpoll: defer struct eventpoll free to RCU grace period cachefiles: fix incorrect dentry refcount in cachefiles_cull()
2026-04-10mmc: sdio: add MediaTek MT7902 SDIO device IDSean Wang-0/+1
Add SDIO device ID (0x790a) for MediaTek MT7902 to sdio_ids.h. Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-04-10Merge branches 'for-next/misc', 'for-next/tlbflush', ↵Catalin Marinas-55/+235
'for-next/ttbr-macros-cleanup', 'for-next/kselftest', 'for-next/feat_lsui', 'for-next/mpam', 'for-next/hotplug-batched-tlbi', 'for-next/bbml2-fixes', 'for-next/sysreg', 'for-next/generic-entry' and 'for-next/acpi', remote-tracking branches 'arm64/for-next/perf' and 'arm64/for-next/read-once' into for-next/core * arm64/for-next/perf: : Perf updates perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc() perf/arm-cmn: Fix incorrect error check for devm_ioremap() perf: add NVIDIA Tegra410 C2C PMU perf: add NVIDIA Tegra410 CPU Memory Latency PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU perf/arm_cspmu: Add arm_cspmu_acpi_dev_get perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU perf/arm_cspmu: nvidia: Rename doc to Tegra241 perf/arm-cmn: Stop claiming entire iomem region arm64: cpufeature: Use pmuv3_implemented() function arm64: cpufeature: Make PMUVer and PerfMon unsigned KVM: arm64: Read PMUVer as unsigned * arm64/for-next/read-once: : Fixes for __READ_ONCE() with CONFIG_LTO=y arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y arm64: Optimize __READ_ONCE() with CONFIG_LTO=y * for-next/misc: : Miscellaneous cleanups/fixes arm64: rsi: use linear-map alias for realm config buffer arm64: Kconfig: fix duplicate word in CMDLINE help text arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps arm64: kexec: Remove duplicate allocation for trans_pgd arm64: mm: Use generic enum pgtable_level arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0 arm64: remove ARCH_INLINE_* * for-next/tlbflush: : Refactor the arm64 TLB invalidation API and implementation arm64: mm: __ptep_set_access_flags must hint correct TTL arm64: mm: Provide level hint for flush_tlb_page() arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range() arm64: mm: More flags for __flush_tlb_range() arm64: mm: Refactor __flush_tlb_range() to take flags arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid() arm64: mm: Simplify __flush_tlb_range_limit_excess() arm64: mm: Simplify __TLBI_RANGE_NUM() macro arm64: mm: Re-implement the __flush_tlb_range_op macro in C arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range() arm64: mm: Push __TLBI_VADDR() into __tlbi_level() arm64: mm: Implicitly invalidate user ASID based on TLBI operation arm64: mm: Introduce a C wrapper for by-range TLB invalidation arm64: mm: Re-implement the __tlbi_level macro as a C function * for-next/ttbr-macros-cleanup: : Cleanups of the TTBR1_* macros arm64/mm: Directly use TTBRx_EL1_CnP arm64/mm: Directly use TTBRx_EL1_ASID_MASK arm64/mm: Describe TTBR1_BADDR_4852_OFFSET * for-next/kselftest: : arm64 kselftest updates selftests/arm64: Implement cmpbr_sigill() to hwcap test * for-next/feat_lsui: : Futex support using FEAT_LSUI instructions to avoid toggling PAN arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present arm64: Kconfig: Add support for LSUI KVM: arm64: Use CAST instruction for swapping guest descriptor arm64: futex: Support futex with FEAT_LSUI arm64: futex: Refactor futex atomic operation KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI KVM: arm64: Expose FEAT_LSUI to guests arm64: cpufeature: Add FEAT_LSUI * for-next/mpam: (40 commits) : Expose MPAM to user-space via resctrl: : - Add architecture context-switch and hiding of the feature from KVM. : - Add interface to allow MPAM to be exposed to user-space using resctrl. : - Add errata workaoround for some existing platforms. : - Add documentation for using MPAM and what shape of platforms can use resctrl arm64: mpam: Add initial MPAM documentation arm_mpam: Quirk CMN-650's CSU NRDY behaviour arm_mpam: Add workaround for T241-MPAM-6 arm_mpam: Add workaround for T241-MPAM-4 arm_mpam: Add workaround for T241-MPAM-1 arm_mpam: Add quirk framework arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl arm64: mpam: Select ARCH_HAS_CPU_RESCTRL arm_mpam: resctrl: Add empty definitions for assorted resctrl functions arm_mpam: resctrl: Update the rmid reallocation limit arm_mpam: resctrl: Add resctrl_arch_rmid_read() arm_mpam: resctrl: Allow resctrl to allocate monitors arm_mpam: resctrl: Add support for csu counters arm_mpam: resctrl: Add monitor initialisation and domain boilerplate arm_mpam: resctrl: Add kunit test for control format conversions arm_mpam: resctrl: Add support for 'MB' resource arm_mpam: resctrl: Wait for cacheinfo to be ready arm_mpam: resctrl: Add rmid index helpers arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT ... * for-next/hotplug-batched-tlbi: : arm64/mm: Enable batched TLB flush in unmap_hotplug_range() arm64/mm: Reject memory removal that splits a kernel leaf mapping arm64/mm: Enable batched TLB flush in unmap_hotplug_range() * for-next/bbml2-fixes: : Fixes for realm guest and BBML2_NOABORT arm64: mm: Remove pmd_sect() and pud_sect() arm64: mm: Handle invalid large leaf mappings correctly arm64: mm: Fix rodata=full block mapping support for realm guests * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06 * for-next/generic-entry: : More arm64 refactoring towards using the generic entry code arm64: Check DAIF (and PMR) at task-switch time arm64: entry: Use split preemption logic arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() arm64: entry: Consistently prefix arm64-specific wrappers arm64: entry: Don't preempt with SError or Debug masked entry: Split preemption from irqentry_exit_to_kernel_mode() entry: Split kernel mode logic from irqentry_{enter,exit}() entry: Move irqentry_enter() prototype later entry: Remove local_irq_{enable,disable}_exit_to_user() entry: Fix stale comment for irqentry_enter() * for-next/acpi: : arm64 ACPI updates ACPI: AGDI: fix missing newline in error message
2026-04-10regmap: i3c: Add non-devm regmap_init_i3c() helperPei Xiao-0/+17
Add __regmap_init_i3c() and the corresponding regmap_init_i3c() macro to allow creating a regmap for I3C devices without using the device-managed version. This mirrors the pattern already established for other buses such as I2C, SPI and so on, giving drivers more flexibility when the regmap lifetime is not directly tied to the device. Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Link: https://patch.msgid.link/a81256a8866b163979a20406abf01df7d7440104.1775788105.git.xiaopei01@kylinos.cn Signed-off-by: Mark Brown <broonie@kernel.org>
2026-04-10Merge branch 'pm-powercap'Rafael J. Wysocki-7/+52
Merge power capping updates for 7.1-rc1: - Clean up and rearrange the intel_rapl power capping driver to make the respective interface drivers (TPMI, MSR, and MMOI) hold their own settings and primitives and consolidate PL4 and PMU support flags into rapl_defaults (Kuppuswamy Sathyanarayanan) - Correct kernel-doc function parameter names in the power capping core code (Randy Dunlap) * pm-powercap: powercap: intel_rapl: Consolidate PL4 and PMU support flags into rapl_defaults powercap: intel_rapl: Move MSR primitives to MSR driver thermal: intel: int340x: processor: Move MMIO primitives to MMIO driver powercap: intel_rapl: Move TPMI primitives to TPMI driver powercap: intel_rapl: Move primitive info to header for interface drivers powercap: intel_rapl: Remove unused macro definitions powercap: intel_rapl: Move MSR default settings into MSR interface driver powercap: intel_rapl: Remove unused AVERAGE_POWER primitive powercap: correct kernel-doc function parameter names thermal: intel: int340x: processor: Move RAPL defaults to MMIO driver powercap: intel_rapl: Move TPMI default settings into TPMI interface driver powercap: intel_rapl: Allow interface drivers to configure rapl_defaults powercap: intel_rapl: Use unit conversion macros from units.h powercap: intel_rapl: Use GENMASK() and BIT() macros powercap: intel_rapl: Use shifts for power-of-2 operations powercap: intel_rapl: Simplify rapl_compute_time_window_atom() powercap: intel_rapl: Remove unused TIME_WINDOW macros powercap: intel_rapl: Cleanup coding style powercap: intel_rapl: Add a symbol namespace for intel_rapl exports
2026-04-10ASoC: uda1380: Modernize the driverLinus Walleij-19/+0
This codec driver depended on the legacy GPIO API, and nothing in the kernel is defining the platform data, so get rid of this. Two in-kernel device trees are defining this codec using undocumented device tree properties, so support these for now. The same properties can be defined using software nodes if board files are desired. The device tree use the "-gpio" rather than "-gpios" suffix but the GPIO DT parser will deal with that. Since there may be out of tree users, migrate to GPIO descriptors, drop the platform data that is unused, and assign the dac_clk the value that was used in all platforms found in a historical dig, and support setting the clock to the PLL using the undocumented device tree property. Add some menuconfig so the codec can be selected and tested. Signed-off-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260409-asoc-uda1380-v3-1-b3d5a53f31be@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2026-04-10netfilter: conntrack: remove UDP-Lite conntrack supportFernando Fernandez Mancera-10/+0
UDP-Lite (RFC 3828) socket support was recently retired from the core networking stack. As a follow-up of that, drop the connection tracker and NAT support for UDP-Lite in Netfilter. This patch removes CONFIG_NF_CT_PROTO_UDPLITE and scrubs UDP-Lite awareness from the conntrack core, NAT core, nft_ct, and ctnetlink. Please note that stateless packet inspection, matching, ipsets or logging support for IPPROTO_UDPLITE is preserved. As conntrack no longer extracts UDP-Lite ports or tracks its L4 state, when performing NAT the UDP-Lite checksum cannot be updated anymore. That is an expected and acceptable consequence of removing UDP-Lite conntrack module. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-04-10Merge branch 'pm-cpufreq'Rafael J. Wysocki-5/+7
Merge cpufreq updates for 7.1-rc1: - Update qcom-hw DT bindings to include Eliza hardware (Abel Vesa) - Update cpufreq-dt-platdev blocklist (Faruque Ansari) - Minor updates to driver and dt-bindings for Tegra (Thierry Reding, Rosen Penev) - Add MAINTAINERS entry for CPPC driver (Viresh Kumar) - Add support for new features: CPPC performance priority, Dynamic EPP, Raw EPP, and new unit tests for them to amd-pstate (Gautham Shenoy, Mario Limonciello) - Fix sysfs files being present when HW missing and broken/outdated documentation in the amd-pstate driver (Ninad Naik, Gautham Shenoy) - Pass the policy to cpufreq_driver->adjust_perf() to avoid using cpufreq_cpu_get() in the .adjust_perf() callback in amd-pstate which leads to a scheduling-while-atomic bug (K Prateek Nayak) - Clean up dead code in Kconfig for cpufreq (Julian Braha) - Remove max_freq_req update for pre-existing cpufreq policy and add a boost_freq_req QoS request to save the boost constraint instead of overwriting the last scaling_max_freq constraint (Pierre Gondois) - Embed cpufreq QoS freq_req objects in cpufreq policy so they all are allocated in one go along with the policy to simplify lifetime rules and avoid error handling issues (Viresh Kumar) - Use DMI max speed when CPPC is unavailable in the acpi-cpufreq scaling driver (Henry Tseng) - Switch policy_is_shared() in cpufreq to using cpumask_nth() instead of cpumask_weight() because the former is more efficient (Yury Norov) - Use sysfs_emit() in sysfs show functions for cpufreq governor attributes (Thorsten Blum) - Update intel_pstate to stop returning an error when "off" is written to its status sysfs attribute while the driver is already off (Fabio De Francesco) - Include current frequency in the debug message printed by __cpufreq_driver_target() (Pengjie Zhang) * pm-cpufreq: (38 commits) cpufreq/amd-pstate: Add POWER_SUPPLY select for dynamic EPP MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer cpufreq: Pass the policy to cpufreq_driver->adjust_perf() cpufreq/amd-pstate: Pass the policy to amd_pstate_update() cpufreq/amd-pstate-ut: Add a unit test for raw EPP cpufreq/amd-pstate: Add support for raw EPP writes cpufreq/amd-pstate: Add support for platform profile class cpufreq/amd-pstate: add kernel command line to override dynamic epp cpufreq/amd-pstate: Add dynamic energy performance preference Documentation: amd-pstate: fix dead links in the reference section cpufreq/amd-pstate: Cache the max frequency in cpudata Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count} Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file amd-pstate-ut: Add a testcase to validate the visibility of driver attributes amd-pstate-ut: Add module parameter to select testcases amd-pstate: Introduce a tracepoint trace_amd_pstate_cppc_req2() amd-pstate: Add sysfs support for floor_freq and floor_count amd-pstate: Add support for CPPC_REQ2 and FLOOR_PERF x86/cpufeatures: Add AMD CPPC Performance Priority feature. ...
2026-04-10xen/grant-table: guard gnttab_suspend/resume with CONFIG_HIBERNATE_CALLBACKSPengpeng Hou-0/+12
In current linux.git, gnttab_suspend() and gnttab_resume() are defined and declared unconditionally. However, their only in-tree callers reside in drivers/xen/manage.c, which are guarded by CONFIG_HIBERNATE_CALLBACKS. Match the helper scope to their callers by wrapping the definitions in CONFIG_HIBERNATE_CALLBACKS and providing no-op stubs in the header. This fixes the config-scope mismatch and reduces the code footprint when hibernation callbacks are disabled. Signed-off-by: Pengpeng Hou <pengpeng.hou@isrc.iscas.ac.cn> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260310080800.742223-1-pengpeng.hou@isrc.iscas.ac.cn>
2026-04-10hvc/xen: Check console connection flagJason Andryuk-0/+13
When the console out buffer is filled, __write_console() will return 0 as it cannot send any data. domU_write_console() will then spin in `while (len)` as len doesn't decrement until xenconsoled attaches. This would block a domU and nullify the parallelism of Hyperlaunch until dom0 userspace starts xenconsoled, which empties the buffer. Xen 4.21 added a connection field to the xen console page. This is set to XENCONSOLE_DISCONNECTED (1) when a domain is built, and xenconsoled will set it to XENCONSOLE_CONNECTED (0) when it connects. Update the hvc_xen driver to check the field. When the field is disconnected, drop the write with -ENOTCONN. We only drop the write when the field is XENCONSOLE_DISCONNECTED (1) to try for maximum compatibility. The Xen toolstack has historically zero initialized the console, so it should see XENCONSOLE_CONNECTED (0) by default. If an implemenation used uninitialized memory, only checking for XENCONSOLE_DISCONNECTED could have the lowest chance of not connecting. This lets the hyperlaunched domU boot without stalling. Once dom0 starts xenconsoled, xl console can be used to access the domU's hvc0. Paritally sync console.h from xen.git to bring in the new field. Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260318235326.14568-1-jason.andryuk@amd.com>
2026-04-10erofs: clean up encoded map flagsGao Xiang-4/+3
- Remove EROFS_MAP_ENCODED since it was always set together with EROFS_MAP_MAPPED for compressed extents and checked redundantly; - Replace the EROFS_MAP_FULL_MAPPED flag with the opposite EROFS_MAP_PARTIAL_MAPPED flag so that extents are implicitly fully mapped initially to simplify the logic; - Make fragment extents independent of EROFS_MAP_MAPPED since they are not directly allocated on disk; thus fragment extents are no longer twisted with mapped extents. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-04-09net: remove the netif_get_rx_queue_lease_locked() helpersJakub Kicinski-5/+0
The netif_get_rx_queue_lease_locked() API hides the locking and the descend onto the leased queue. Making the code harder to follow (at least to me). Remove the API and open code the descend a bit. Most of the code now looks like: if (!leased) return __helper(x); hw_rxq = .. netdev_lock(hw_rxq->dev); ret = __helper(x); netdev_unlock(hw_rxq->dev); return ret; Of course if we have more code paths that need the wrapping we may need to revisit. For now, IMHO, having to know what netif_get_rx_queue_lease_locked() does is not worth the 20LoC it saves. Link: https://patch.msgid.link/20260408151251.72bd2482@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'Jakub Kicinski-13/+75
Daniel Borkmann says: ==================== netkit: Support for io_uring zero-copy and AF_XDP Containers use virtual netdevs to route traffic from a physical netdev in the host namespace. They do not have access to the physical netdev in the host and thus can't use memory providers or AF_XDP that require reconfiguring/restarting queues in the physical netdev. This patchset adds the concept of queue leasing to virtual netdevs that allow containers to use memory providers and AF_XDP at native speed. Leased queues are bound to a real queue in a physical netdev and act as a proxy. Memory providers and AF_XDP operations take an ifindex and queue id, so containers would pass in an ifindex for a virtual netdev and a queue id of a leased queue, which then gets proxied to the underlying real queue. We have implemented support for this concept in netkit and tested the latter against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504 (bnxt_en) 100G NICs. For more details see the individual patches. ==================== Link: https://patch.msgid.link/20260402231031.447597-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09netkit: Add netkit notifier to check for unregistering devicesDaniel Borkmann-0/+2
Add a netdevice notifier in netkit to watch for NETDEV_UNREGISTER events. If the target device is indeed NETREG_UNREGISTERING and previously leased a queue to a netkit device, then collect the related netkit devices and batch-unregister_netdevice_many() them. If this were not done, then the netkit device would hold a reference on the physical device preventing it from going away. However, in case of both io_uring zero-copy as well as AF_XDP this situation is handled gracefully and the allocated resources are torn down. In the case where mentioned infra is used through netkit, the applications have a reference on netkit, and netkit in turn holds a reference on the physical device. In order to have netkit release the reference on the physical device, we need such watcher to then unregister the netkit ones. This is generally quite similar to the dependency handling in case of tunnels (e.g. vxlan bound to a underlying netdev) where the tunnel device gets removed along with the physical device. # ip a [...] 4: enp10s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000 link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff inet 10.0.0.2/24 scope global enp10s0f0np0 valid_lft forever preferred_lft forever [...] 8: nk@NONE: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff [...] # rmmod mlx5_ib # rmmod mlx5_core [...] [ 309.261822] mlx5_core 0000:0a:00.0 mlx5_0: Port: 1 Link DOWN [ 344.235236] mlx5_core 0000:0a:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 344.246948] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 344.463754] mlx5_core 0000:0a:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) [ 344.770155] mlx5_core 0000:0a:00.1: E-Switch: cleanup [...] # ip a [...] [ both enp10s0f0np0 and nk gone ] [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-13-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09netkit: Add single device mode for netkitDaniel Borkmann-0/+6
Add a single device mode for netkit instead of netkit pairs. The primary target for the paired devices is to connect network namespaces, of course, and support has been implemented in projects like Cilium [0]. For the rxq leasing the plan is to support two main scenarios related to single device mode: * For the use-case of io_uring zero-copy, the control plane can either set up a netkit pair where the peer device can perform rxq leasing which is then tied to the lifetime of the peer device, or the control plane can use a regular netkit pair to connect the hostns to a Pod/container and dynamically add/remove rxq leasing through a single device without having to interrupt the device pair. In the case of io_uring, the memory pool is used as skb non-linear pages, and thus the skb will go its way through the regular stack into netkit. Things like the netkit policy when no BPF is attached or skb scrubbing etc apply as-is in case the paired devices are used, or if the backend memory is tied to the single device and traffic goes through a paired device. * For the use-case of AF_XDP, the control plane needs to use netkit in the single device mode. The single device mode currently enforces only a pass policy when no BPF is attached, and does not yet support BPF link attachments for AF_XDP. skbs sent to that device get dropped at the moment. Given AF_XDP operates at a lower layer of the stack tying this to the netkit pair did not make sense. In future, the plan is to allow BPF at the XDP layer which can: i) process traffic coming from the AF_XDP application (e.g. QEMU with AF_XDP backend) to filter egress traffic or to push selected egress traffic up to the single netkit device to the local stack (e.g. DHCP requests), and ii) vice-versa skbs sent to the single netkit into the AF_XDP application (e.g. DHCP replies). Also, the control-plane can dynamically manage rxq leasing for the single netkit device without having to interrupt (e.g. down/up cycle) the main netkit pair for the Pod which has traffic going in and out. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jordan Rife <jordan@jrife.io> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://docs.cilium.io/en/stable/operations/performance/tuning/#netkit-device-mode [0] Link: https://patch.msgid.link/20260402231031.447597-11-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09net: Proxy netdev_queue_get_dma_dev for leased queuesDavid Wei-1/+3
Extend netdev_queue_get_dma_dev to return the physical device of the real rxq for DMA in case the queue was leased. This allows memory providers like io_uring zero-copy or devmem to bind to the physically leased rxq via virtual devices such as netkit. Signed-off-by: David Wei <dw@davidwei.uk> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-8-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09net: Slightly simplify net_mp_{open,close}_rxqDaniel Borkmann-6/+2
net_mp_open_rxq is currently not used in the tree as all callers are using __net_mp_open_rxq directly, and net_mp_close_rxq is only used once while all other locations use __net_mp_close_rxq. Consolidate into a single API, netif_mp_{open,close}_rxq, using the netif_ prefix to indicate that the caller is responsible for locking. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-6-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09net: Add lease info to queue-get responseDaniel Borkmann-0/+14
Populate nested lease info to the queue-get response that returns the ifindex, queue id with type and optionally netns id if the device resides in a different netns. Example with ynl client when using AF_XDP via queue leasing: # ip a [...] 4: enp10s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:24 qdisc mq state UP group default qlen 1000 link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff inet 10.0.0.2/24 scope global enp10s0f0np0 valid_lft forever preferred_lft forever inet6 fe80::eaeb:d3ff:fea3:43f6/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ethtool -i enp10s0f0np0 driver: mlx5_core [...] # ynl --family netdev --output-json --do queue-get \ --json '{"ifindex": 4, "id": 15, "type": "rx"}' {'id': 15, 'ifindex': 4, 'lease': {'ifindex': 8, 'netns-id': 0, 'queue': {'id': 1, 'type': 'rx'}}, 'napi-id': 8227, 'type': 'rx', 'xsk': {}} # ip netns list foo (id: 0) # ip netns exec foo ip a [...] 8: nk@NONE: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff inet6 fe80::200:ff:fe00:0/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ip netns exec foo ethtool -i nk driver: netkit [...] # ip netns exec foo ls /sys/class/net/nk/queues/ rx-0 rx-1 tx-0 # ip netns exec foo ynl --family netdev --output-json --do queue-get \ --json '{"ifindex": 8, "id": 1, "type": "rx"}' {"id": 1, "type": "rx", "ifindex": 8, "xsk": {}} Note that the caller of netdev_nl_queue_fill_one() holds the netdevice lock. For the queue-get we do not lock both devices. When queues get {un,}leased, both devices are locked, thus if __netif_get_rx_queue_lease() returns a lease pointer, it points to a valid device. The netns-id is fetched via peernet2id_alloc() similarly as done in OVS. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-4-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09net: Implement netdev_nl_queue_create_doitDaniel Borkmann-6/+37
Implement netdev_nl_queue_create_doit which creates a new rx queue in a virtual netdev and then leases it to a rx queue in a physical netdev. Example with ynl client: # ynl --family netdev --output-json --do queue-create \ --json '{"ifindex": 8, "type": "rx", "lease": {"ifindex": 4, "queue": {"type": "rx", "id": 15}}}' {'id': 1} Note that the netdevice locking order is always from the virtual to the physical device. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260402231031.447597-3-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09net: Add queue-create operationDaniel Borkmann-0/+11
Add a ynl netdev family operation called queue-create that creates a new queue on a netdevice: name: queue-create attribute-set: queue flags: [admin-perm] do: request: attributes: - ifindex - type - lease reply: &queue-create-op attributes: - id This is a generic operation such that it can be extended for various use cases in future. Right now it is mandatory to specify ifindex, the queue type which is enforced to rx and a lease. The newly created queue id is returned to the caller. A queue from a virtual device can have a lease which refers to another queue from a physical device. This is useful for memory providers and AF_XDP operations which take an ifindex and queue id to allow applications to bind against virtual devices in containers. The lease couples both queues together and allows to proxy the operations from a virtual device in a container to the physical device. In future, the nested lease attribute can be lifted and made optional for other use-cases such as dynamic queue creation for physical netdevs. The lack of lease and the specification of the physical device as an ifindex will imply that we need a real queue to be allocated. Similarly, the queue type enforcement to rx can then be lifted as well to support tx. An early implementation had only driver-specific integration [0], but in order for other virtual devices to reuse, it makes sense to have this as a generic API in core net. For leasing queues, the virtual netdev must have real_num_rx_queues less than num_rx_queues at the time of calling queue-create. The queue-type must be rx as only rx queues are supported for leasing for now. We also enforce that the queue-create ifindex must point to a virtual device, and that the nested lease attribute's ifindex must point to a physical device. The nested lease attribute set contains a netns-id attribute which is optional and can specify a netns-id relative to the caller's netns. It requires cap_net_admin and if the netns-id attribute is not specified, the lease ifindex will be retrieved from the current netns. Also, it is modeled as an s32 type similarly as done elsewhere in the stack. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0] Link: https://patch.msgid.link/20260402231031.447597-2-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-10Merge tag 'drm-misc-next-fixes-2026-04-09' of ↵Dave Airlie-2/+2
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next Short summary of fixes pull: dma-buf: - fence: fix docs for dma_fence_unlock_irqrestore() fb-helper: - unlock in error path gem-shmem: - fix PMD write update gem-vram: - remove obsolete documentation ivpu: - fix device-recovery handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260409113921.GA181028@linux.fritz.box
2026-04-09ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer supportMing Lei-1/+2
The __u32 len field cannot represent a 4GB buffer (0x100000000 overflows to 0). Change it to __u64 so buffers up to 4GB can be registered. Add a reserved field for alignment and validate it is zero. The kernel enforces a default max of 4GB (UBLK_SHMEM_BUF_SIZE_MAX) which may be increased in future. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260409133020.3780098-2-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski-39/+136
Cross-merge networking fixes after downstream PR (net-7.0-rc8). Conflicts: net/ipv6/seg6_iptunnel.c c3812651b522f ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel") 78723a62b969a ("seg6: add per-route tunnel source address") https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk net/ipv4/icmp.c fde29fd934932 ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()") d98adfbdd5c01 ("ipv4: drop ipv6_stub usage and use direct function calls") https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk Adjacent changes: drivers/net/ethernet/stmicro/stmmac/chain_mode.c 51f4e090b9f8 ("net: stmmac: fix integer underflow in chain mode") 6b4286e05508 ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09Merge branch 'acpi-apei'Rafael J. Wysocki-0/+11
Merge ACPI APEI updates for 7.1-rc1: - Add devm_ghes_register_vendor_record_notifier(), use it in the PCI hisi driver, and Add NVIDIA vendor CPER record handler (Kai-Heng Feng) * acpi-apei: ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler PCI: hisi: Use devm_ghes_register_vendor_record_notifier() ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
2026-04-09Merge branch 'acpi-driver'Rafael J. Wysocki-3/+6
Merge ACPI core driver core driver updates and assorted driver updates related to ACPI support for 7.1-rc1: - Clean up the ACPI AC and ACPI PAD (processor aggregator device) drivers (Rafael Wysocki) - Rework checking for duplicate video bus devices and consolidate pnp.bus_id workarounds handling in the ACPI video bus driver (Rafael Wysocki) - Update the ACPI core device drivers to stop setting acpi_device_name() unnecessarily (Rafael Wysocki) - Rearrange code using acpi_device_class() in the ACPI core device drivers and update them to stop setting acpi_device_class() unnecessarily (Rafael Wysocki) - Define ACPI_AC_CLASS in one place (Rafael Wysocki) - Convert the ni903x_wdt watchdog driver and the xen ACPI PAD driver to bind to platform devices instead of ACPI devices (Rafael Wysocki) * acpi-driver: watchdog: ni903x_wdt: Convert to a platform driver ACPI: PAD: xen: Convert to a platform driver ACPI: AC: Define ACPI_AC_CLASS in one place ACPI: driver: Do not set acpi_device_class() unnecessarily ACPI: driver: Avoid using pnp.device_class for netlink handling ACPI: event: Redefine acpi_notifier_call_chain() ACPI: driver: Do not set acpi_device_name() unnecessarily ACPI: video: Consolidate pnp.bus_id workarounds handling ACPI: video: Rework checking for duplicate video bus devices driver core: auxiliary bus: Introduce dev_is_auxiliary() ACPI: PAD: Rearrange notify handler installation and removal ACPI: AC: Get rid of unnecessary declarations
2026-04-09Merge branch 'acpi-cmos-rtc'Rafael J. Wysocki-9/+10
Merge updates related to the CMOS RTC driver and x86/ACPI CMOS RTC support for 7.1-rc1: - Add ACPI support to the platform device interface in the CMOS RTC driver, make the ACPI core device enumeration code create a platform device for the CMOS RTC, and drop CMOS RTC PNP device support (Rafael Wysocki) - Consolidate the x86-specific CMOS RTC handling with the ACPI TAD driver and clean up the CMOS RTC ACPI address space handler (Rafael Wysocki) - Enable ACPI alarm in the CMOS RTC driver if advertised in ACPI FADT and allow that driver to work without a dedicated IRQ if the ACPI alarm is used (Rafael Wysocki) * acpi-cmos-rtc: rtc: cmos: Do not require IRQ if ACPI alarm is used rtc: cmos: Enable ACPI alarm if advertised in ACPI FADT ACPI: TAD/x86: cmos_rtc: Consolidate address space handler setup rtc: cmos: Drop PNP device support x86: rtc: Drop PNP device check ACPI: PNP: Drop CMOS RTC PNP device support ACPI: x86/rtc-cmos: Use platform device for driver binding ACPI: x86: cmos_rtc: Create a CMOS RTC platform device ACPI: x86: cmos_rtc: Improve coordination with ACPI TAD driver ACPI: x86: cmos_rtc: Clean up address space handler driver
2026-04-09Merge branches 'acpi-processor' and 'acpi-cppc'Rafael J. Wysocki-1/+23
Merge ACPI processor driver updates and ACPI CPPC library updates for 7.1-rc1: - Address multiple assorted issues and clean up the code in the ACPI processor idle driver (Huisong Li) - Replace strlcat() in the ACPI processor idle drive with a better alternative (Andy Shevchenko) - Rearrange and clean up acpi_processor_errata_piix4() (Rafael Wysocki) - Move reference performance to capabilities and fix an uninitialized variable in the ACPI CPPC library (Pengjie Zhang) - Add support for the Performance Limited Register to the ACPI CPPC library (Sumit Gupta) - Add cppc_get_perf() API to read performance controls, extend cppc_set_epp_perf() for FFH/SystemMemory, and make the ACPI CPPC library warn on missing mandatory DESIRED_PERF register (Sumit Gupta) - Modify the cpufreq CPPC driver to update MIN_PERF/MAX_PERF in target callbacks to allow it to control performance bounds via standard scaling_min_freq and scaling_max_freq sysfs attributes and add sysfs documentation for the Performance Limited Register to it (Sumit Gupta) * acpi-processor: ACPI: processor: idle: Reset cpuidle on C-state list changes cpuidle: Extract and export no-lock variants of cpuidle_unregister_device() ACPI: processor: idle: Fix NULL pointer dereference in hotplug path ACPI: processor: idle: Reset power_setup_done flag on initialization failure ACPI: processor: Rearrange and clean up acpi_processor_errata_piix4() ACPI: processor: idle: Replace strlcat() with better alternative ACPI: processor: idle: Remove redundant static variable and rename cstate check function ACPI: processor: idle: Move max_cstate update out of the loop ACPI: processor: idle: Remove redundant cstate check in acpi_processor_power_init ACPI: processor: idle: Add missing bounds check in flatten_lpi_states() * acpi-cppc: ACPI: CPPC: Check cpc_read() return values consistently ACPI: CPPC: Fix uninitialized ref variable in cppc_get_perf_caps() ACPI: CPPC: Move reference performance to capabilities cpufreq: CPPC: Add sysfs documentation for perf_limited ACPI: CPPC: add APIs and sysfs interface for perf_limited cpufreq: cppc: Update MIN_PERF/MAX_PERF in target callbacks cpufreq: CPPC: Update cached perf_ctrls on sysfs write ACPI: CPPC: Extend cppc_set_epp_perf() for FFH/SystemMemory ACPI: CPPC: Warn on missing mandatory DESIRED_PERF register ACPI: CPPC: Add cppc_get_perf() API to read performance controls
2026-04-09ASoC: Yet another round of SDCA fixesMark Brown-0/+5
Charles Keepax <ckeepax@opensource.cirrus.com> says: Another round of SDCA fixes a couple of fix to the IRQ cleanup from Richard, and a minor tweak to the IRQ handling from me.
2026-04-09Merge tag 'sound-7.0' of ↵Linus Torvalds-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Still a bit higher amount than wished, but nothing looks really scary, and all changes are about nice and smooth device-specific fixes. - HD-audio quirks, one revert for a regression and another oneliner - AMD ACP quirks - Fixes for SDCA interrupt handling - A few Intel SOF, avs and NVL fixes - Fixes for TAS2552 DT, NAU8325, and STM32" * tag 'sound-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: amd: acp: update DMI quirk and add ACP DMIC for Lenovo platforms ASoC: SDCA: Unregister IRQ handlers on module remove ASoC: SDCA: mask Function_Status value ASoC: SDCA: Fix overwritten var within for loop ASoC: stm32_sai: fix incorrect BCLK polarity for DSP_A/B, LEFT_J ASoC: SOF: Intel: hda: modify period size constraints for ACE4 ALSA: hda/intel: enforce stricter period-size alignment for Intel NVL ASoC: nau8325: Add software reset during probe Revert "ALSA: hda/realtek: Add quirk for Gigabyte Technology to fix headphone" ASoC: Intel: avs: Fix memory leak in avs_register_i2s_test_boards() ASoC: SOF: Intel: fix iteration in is_endpoint_present() ASoC: SOF: Intel: Fix endpoint index if endpoints are missing ASoC: SDCA: Fix errors in IRQ cleanup ASoC: amd: acp: add Lenovo P16s G5 AMD quirk for legacy SDW machine ASoC: dt-bindings: ti,tas2552: Add sound-dai-cells ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IAH10
2026-04-09Merge tag 'pmdomain-v7.0-rc6' of ↵Linus Torvalds-74/+0
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain fixes from Ulf Hansson: - imx: Prevent hang at power down for imx8mp-blk-ctrl - thead: Fix buffer overflow for TH1520 AON driver - Change Ulf Hansson's email * tag 'pmdomain-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: MAINTAINERS, mailmap: Change Ulf Hansson's email pmdomain: imx8mp-blk-ctrl: Keep the NOC_HDCP clock enabled firmware: thead: Fix buffer overflow and use standard endian macros
2026-04-09bitops: Update kernel-doc for sign_extendXX()Andy Shevchenko-2/+8
The sign_extendXX() lack of Return section and have other style issues. Address that by updating kernel-doc accordingly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <ynorov@nvidia.com>
2026-04-09Merge tag 'net-7.0-rc8' of ↵Linus Torvalds-5/+28
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter, IPsec and wireless. This is again considerably bigger than the old average. No known outstanding regressions. Current release - regressions: - net: increase IP_TUNNEL_RECURSION_LIMIT to 5 - eth: ice: fix PTP timestamping broken by SyncE code on E825C Current release - new code bugs: - eth: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure Previous releases - regressions: - core: fix cross-cache free of KFENCE-allocated skb head - sched: act_csum: validate nested VLAN headers - rxrpc: fix call removal to use RCU safe deletion - xfrm: - wait for RCU readers during policy netns exit - fix refcount leak in xfrm_migrate_policy_find - wifi: rt2x00usb: fix devres lifetime - mptcp: fix slab-use-after-free in __inet_lookup_established - ipvs: fix NULL deref in ip_vs_add_service error path - eth: - airoha: fix memory leak in airoha_qdma_rx_process() - lan966x: fix use-after-free and leak in lan966x_fdma_reload() Previous releases - always broken: - ipv6: ioam: fix potential NULL dereferences in __ioam6_fill_trace_data() - ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump - bridge: guard local VLAN-0 FDB helpers against NULL vlan group - xsk: tailroom reservation and MTU validation - rxrpc: - fix to request an ack if window is limited - fix RESPONSE authenticator parser OOB read - netfilter: nft_ct: fix use-after-free in timeout object destroy - batman-adv: hold claim backbone gateways by reference - eth: - stmmac: fix PTP ref clock for Tegra234 - idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling - ipa: fix GENERIC_CMD register field masks for IPA v5.0+" * tag 'net-7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (104 commits) net: lan966x: fix use-after-free and leak in lan966x_fdma_reload() net: lan966x: fix page pool leak in error paths net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool() nfc: pn533: allocate rx skb before consuming bytes l2tp: Drop large packets with UDP encap net: ipa: fix event ring index not programmed for IPA v5.0+ net: ipa: fix GENERIC_CMD register field masks for IPA v5.0+ MAINTAINERS: Add Prashanth as additional maintainer for amd-xgbe driver devlink: Fix incorrect skb socket family dumping af_unix: read UNIX_DIAG_VFS data under unix_state_lock Revert "mptcp: add needs_id for netlink appending addr" mptcp: fix slab-use-after-free in __inet_lookup_established net: txgbe: leave space for null terminators on property_entry net: ioam6: fix OOB and missing lock rxrpc: proc: size address buffers for %pISpc output rxrpc: only handle RESPONSE during service challenge rxrpc: Fix buffer overread in rxgk_do_verify_authenticator() rxrpc: Fix leak of rxgk context in rxgk_verify_response() rxrpc: Fix integer overflow in rxgk_verify_response() rxrpc: Fix missing error checks for rxkad encryption/decryption failure ...
2026-04-09memblock: Permit existing reserved regions to be marked RSRV_KERNArd Biesheuvel-0/+1
Permit existing memblock reservations to be marked as RSRV_KERN. This will be used by the EFI code on x86 to distinguish between reservations of boot services data regions that have actual significance to the kernel and regions that are reserved temporarily to work around buggy firmware. Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-04-09jbd2: store jinode dirty range in PAGE_SIZE unitsLi Chen-12/+22
jbd2_inode fields are updated under journal->j_list_lock, but some paths read them without holding the lock (e.g. fast commit helpers and ordered truncate helpers). READ_ONCE() alone is not sufficient for the dirty range fields when they are stored as loff_t because 32-bit platforms can observe torn loads. Store the dirty range in PAGE_SIZE units as pgoff_t instead. Represent the dirty range end as an exclusive end page. This avoids a special sentinel value and keeps MAX_LFS_FILESIZE on 32-bit representable. Publish a new dirty range by updating end_page before start_page, and treat start_page >= end_page as empty in the accessor for robustness. Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the dirty range and i_flags to match the existing lockless access pattern. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-5-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>