summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorLines
2026-02-10LoongArch: dts: loongson-2k0500: Add nand controller supportBinbin Zhou-1/+30
The module is supported, enable it. Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: BPF: Implement bpf_addr_space_cast instructionHengqi Chen-0/+16
LLVM generates bpf_addr_space_cast instruction while translating pointers between native (zero) address space and __attribute__((address_space(N))). The addr_space=0 is reserved as bpf_arena address space. rY = addr_space_cast(rX, 0, 1) is processed by the verifier and converted to normal 32-bit move: wX = wY rY = addr_space_cast(rX, 1, 0) has to be converted by JIT. With this, the following test cases passed: $ ./test_progs -a arena_htab,arena_list,arena_strsearch,verifier_arena,verifier_arena_large #4/1 arena_htab/arena_htab_llvm:OK #4/2 arena_htab/arena_htab_asm:OK #4 arena_htab:OK #5/1 arena_list/arena_list_1:OK #5/2 arena_list/arena_list_1000:OK #5 arena_list:OK #7/1 arena_strsearch/arena_strsearch:OK #7 arena_strsearch:OK #507/1 verifier_arena/basic_alloc1:OK #507/2 verifier_arena/basic_alloc2:OK #507/3 verifier_arena/basic_alloc3:OK #507/4 verifier_arena/basic_reserve1:OK #507/5 verifier_arena/basic_reserve2:OK #507/6 verifier_arena/reserve_twice:OK #507/7 verifier_arena/reserve_invalid_region:OK #507/8 verifier_arena/iter_maps1:OK #507/9 verifier_arena/iter_maps2:OK #507/10 verifier_arena/iter_maps3:OK #507 verifier_arena:OK #508/1 verifier_arena_large/big_alloc1:OK #508/2 verifier_arena_large/access_reserved:OK #508/3 verifier_arena_large/request_partially_reserved:OK #508/4 verifier_arena_large/free_reserved:OK #508/5 verifier_arena_large/big_alloc2:OK #508 verifier_arena_large:OK Summary: 5/20 PASSED, 0 SKIPPED, 0 FAILED Acked-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: BPF: Implement PROBE_MEM32 pseudo instructionsHengqi Chen-6/+68
Add support for `{LDX,STX,ST} | PROBE_MEM32 | {B,H,W,DW}` instructions. They are similar to PROBE_MEM instructions with the following differences: * PROBE_MEM32 supports store. * PROBE_MEM32 relies on the verifier to clear upper 32-bit of the src/dst register * PROBE_MEM32 adds 64-bit kern_vm_start address (which is stored in S6 in the prologue). Due to bpf_arena constructions such S6 + reg + off16 access is guaranteed to be within arena virtual range, so no address check at run-time. * S6 is a free callee-saved register, so it is used to store arena_vm_start * PROBE_MEM32 allows ST and STX. If they fault the store is a nop. When LDX faults the destination register is zeroed. To support these on LoongArch, we employ the t2/t3 registers to store the intermediate results of reg_arena + src/dst reg and use the t2/t3 registers as the new src/dst reg. This allows us to reuse most of the existing code. Acked-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: BPF: Use BPF prog pack allocatorHengqi Chen-37/+83
Use bpf_jit_binary_pack_alloc() for BPF JIT binaries. The BPF prog pack allocator creates a pair of RW and RX buffers. The BPF JIT writes the program into the RW buffer. When the JIT is done, the program is copied to the final RX buffer with bpf_jit_binary_pack_finalize(). Acked-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Use IS_ERR_PCPU() macro for KGDBCarlos López-1/+1
In commit a759e37fb467 ("err.h: add ERR_PTR_PCPU(), PTR_ERR_PCPU() and IS_ERR_PCPU() macros"), specialized macros were added to check an error within a __percpu pointer, so use them instead of manually casting with __force, like all other users of register_wide_hw_breakpoint(). Signed-off-by: Carlos López <clopez@suse.de> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Rework KASAN initialization for PTW-enabled systemsTiezhu Yang-38/+40
kasan_init_generic() indicates that kasan is fully initialized, so it should be put at end of kasan_init(). Otherwise bringing up the primary CPU failed when CONFIG_KASAN is set on PTW-enabled systems, here are the call chains: kernel_entry() start_kernel() setup_arch() kasan_init() kasan_init_generic() The reason is PTW-enabled systems have speculative accesses which means memory accesses to the shadow memory after kasan_init() may be executed by hardware before. However, accessing shadow memory is safe only after kasan fully initialized because kasan_init() uses a temporary PGD table until we have populated all levels of shadow page tables and writen the PGD register. Moving kasan_init_generic() later can defer the occasion of kasan_enabled(), so as to avoid speculative accesses on shadow pages. After moving kasan_init_generic() to the end, kasan_init() can no longer call kasan_mem_to_shadow() for shadow address conversion because it will always return kasan_early_shadow_page. On the other hand, we should keep the current logic of kasan_mem_to_shadow() for both the early and final stage because there may be instrumentation before kasan_init(). To solve this, we factor out a new mem_to_shadow() function from current kasan_mem_to_shadow() for the shadow address conversion in kasan_init(). Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Disable instrumentation for setup_ptwalker()Tiezhu Yang-1/+1
According to Documentation/dev-tools/kasan.rst, software KASAN modes use compiler instrumentation to insert validity checks. Such instrumentation might be incompatible with some parts of the kernel, and therefore needs to be disabled, just use the attribute __no_sanitize_address to disable instrumentation for the low level function setup_ptwalker(). Otherwise bringing up the secondary CPUs failed when CONFIG_KASAN is set (especially when PTW is enabled), here are the call chains: smpboot_entry() start_secondary() cpu_probe() per_cpu_trap_init() tlb_init() setup_tlb_handler() setup_ptwalker() The reason is the PGD registers are configured in setup_ptwalker(), but KASAN instrumentation may cause TLB exceptions before that. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Remove some extern variables in source filesTiezhu Yang-7/+0
There are declarations of the variable "eentry", "pcpu_handlers[]" and "exception_handlers[]" in asm/setup.h, the source files already include this header file directly or indirectly, so no need to declare them in the source files, just remove the code. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Guard percpu handler under !CONFIG_PREEMPT_RTTiezhu Yang-1/+1
After commit 88fd2b70120d ("LoongArch: Fix sleeping in atomic context for PREEMPT_RT"), it should guard percpu handler under !CONFIG_PREEMPT_RT to avoid redundant operations. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Handle percpu handler address for ORC unwinderTiezhu Yang-0/+19
After commit 4cd641a79e69 ("LoongArch: Remove unnecessary checks for ORC unwinder"), the system can not boot normally under some configs (such as enable KASAN), there are many error messages "cannot find unwind pc". The kernel boots normally with the defconfig, so no problem found out at the first time. Here is one way to reproduce: cd linux make mrproper defconfig -j"$(nproc)" scripts/config -e KASAN make olddefconfig all -j"$(nproc)" sudo make modules_install sudo make install sudo reboot The address that can not unwind is not a valid kernel address which is between "pcpu_handlers[cpu]" and "pcpu_handlers[cpu] + vec_sz" due to the code of eentry was copied to the new area of pcpu_handlers[cpu] in setup_tlb_handler(), handle this special case to get the valid address to unwind normally. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Use %px to print unmodified unwinding addressTiezhu Yang-1/+1
Currently, use %p to prevent leaking information about the kernel memory layout when printing the PC address, but the kernel log messages are not useful to debug problem if bt_address() returns 0. Given that the type of "pc" variable is unsigned long, it should use %px to print the unmodified unwinding address. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Prefer top-down allocation after arch_mem_init()Huacai Chen-0/+1
Currently we use bottom-up allocation after sparse_init(), the reason is sparse_init() need a lot of memory, and bottom-up allocation may exhaust precious low memory (below 4GB). On the other hand, SWIOTLB and CMA need low memories for DMA32, so swiotlb_init() and dma_contiguous_reserve() need bottom-up allocation. Since swiotlb_init() and dma_contiguous_reserve() are both called in arch_mem_init(), we no longer need bottom-up allocation after that. So we set the allocation policy to top-down at the end of arch_mem_init(), in order to avoid later memory allocations (such as KASAN) exhaust low memory. This solve at least two problems: 1. Some buggy BIOSes use 0xfd000000~0xfe000000 for secondary CPUs, but didn't reserve this range, which causes smpboot failures. 2. Some DMA32 devices, such as Loongson-DRM and OHCI, cannot work with KASAN enabled. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Add HOTPLUG_SMT implementationHuacai Chen-1/+15
For benchmarking or debugging purpose, we usually want to control SMT via boot parameter and sysfs knobs. So add HOTPLUG_SMT implementation. 1. Boot parameters: nosmt: Disable SMT, can be enabled via sysfs knobs. nosmt=force: Disable SMT, cannot be enabled via sysfs knobs. 2. Runtime sysfs controls: Write "on", "off", "forceoff" or the number of SMT threads (1, 2, ...) to /sys/devices/system/cpu/smt/control. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Make cpumask_of_node() robust against NUMA_NO_NODEJohn Garry-1/+1
The arch definition of cpumask_of_node() cannot handle NUMA_NO_NODE - which is a valid index - so add a check for this. Cc: stable@vger.kernel.org Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Wire up memfd_secret system callLain "Fearyncess" Yang-4/+4
LoongArch supports ARCH_HAS_SET_DIRECT_MAP, therefore wire up the memfd_secret system call, which just depends on it. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Signed-off-by: Lain "Fearyncess" Yang <fearyncess@aosc.io> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Replace seq_printf() with seq_puts() for simple stringsGeorge Guo-24/+40
Fix warnings like: "Prefer seq_puts to seq_printf" by checkpatch.pl. Replace seq_printf() calls with seq_puts() in show_cpuinfo() when outputting simple constant strings without format specifiers. This improves performance slightly as seq_puts() avoids parsing the format string. Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Add 128-bit atomic cmpxchg supportGeorge Guo-0/+56
Implement 128-bit atomic compare-and-exchange using LoongArch's LL.D/SC.Q instructions. At the same time, this fix the BPF scheduler test failures (scx_central and scx_qmap) caused by kmalloc_nolock_noprof() returning NULL, due to missing 128-bit atomics. The NULL returns lead to -ENOMEM errors during scheduler initialization, causing test cases to fail. Verified by testing with the scx_qmap scheduler (located in tools/sched_ext/). Building with `make` and running ./tools/sched_ext/build/bin/scx_qmap. Link: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=5fb750e8a9ae Acked-by: Hengqi Chen <hengqi.chen@gmail.com> Tested-by: Hengqi Chen <hengqi.chen@gmail.com> Co-developed-by:: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Add detection for SC.Q supportGeorge Guo-30/+39
Check the CPUCFG2_SCQ bit to determine if the current CPU supports the SC.Q instruction. Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com> Tested-by: Hengqi Chen <hengqi.chen@gmail.com> Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn> Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn> Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Select HAVE_CMPXCHG_LOCAL in KconfigHuacai Chen-0/+1
LoongArch has already implemented cmpxchg_local(), this_cpu_cmpxchg() and similar functions, so select HAVE_CMPXCHG_LOCAL in Kconfig to avoid incurring the overhead of local_irq_save()/local_irq_restore() for page state helpers in mm/vmstat.c. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10KVM: s390: Increase permitted SE header size to 1 MiBSteffen Eiden-2/+2
Relax the maximum allowed Secure Execution (SE) header size from 8 KiB to 1 MiB. This allows individual secure guest images to run on a wider range of physical machines. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-10MAINTAINERS: Replace backup for s390 vfio-pciEric Farman-1/+2
Farhan has been doing a masterful job coming on in the s390 PCI space, and my own attention has been lacking. Let's make MAINTAINERS reflect reality. Signed-off-by: Eric Farman <farman@linux.ibm.com> Acked-by: Alex Williamson <alex@shazbot.org> Acked-by: Farhan Ali <alifm@linux.ibm.com> Acked-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-10Merge branch 'hsr-implement-more-robust-duplicate-discard-algorithm'Paolo Abeni-381/+913
Felix Maurer says: ==================== hsr: Implement more robust duplicate discard algorithm The duplicate discard algorithms for PRP and HSR do not work reliably with certain link faults. Especially with packet loss on one link, the duplicate discard algorithms drop valid packets. For a more thorough description see patches 4 (for PRP) and 6 (for HSR). This patchset replaces the current algorithms (based on a drop window for PRP and highest seen sequence number for HSR) with a single new one that tracks the received sequence numbers individually (descriptions again in patches 4 and 6). The changes will lead to higher memory usage and more work to do for each packet. But I argue that this is an acceptable trade-off to make for a more robust PRP and HSR behavior with faulty links. After all, both protocols are to be used in environments where redundancy is needed and people are willing to setup special network topologies to achieve that. Some more reasoning on the overhead and expected scale of the deployment from the RFC discussion: > As for the expected scale, there are two dimensions: the number of nodes > in the network and the data rate with which they send. > > The number of nodes in the network affect the memory usage because each > node now has the block buffer. For PRP that's 64 blocks * 32 byte = > 2kbyte for each node in the node table. A PRP network doesn't have an > explicit limit for the number of nodes. However, the whole network is a > single layer-2 segment which shouldn't grow too large anyways. Even if > one really tries to put 1000 nodes into the PRP network, the memory > overhead (2Mbyte) is acceptable in my opinion. > > For HSR, the blocks would be larger because we need to track the > sequence numbers per port. I expect 64 blocks * 80 byte = 5kbyte per > node in the node table. There is no explicit limit for the size of an > HSR ring either. But I expect them to be of limited size because the > forwarding delays add up throughout the ring. I've seen vendors limiting > the ring size to 50 nodes with 100Mbit/s links and 300 with 1Gbit/s > links. In both cases I consider the memory overhead acceptable. > > The data rates are harder to reason about. In general, the data rates > for HSR and PRP are limited because too high packet rates would lead to > very fast re-use of the 16bit sequence numbers. The IEC 62439-3:2021 > mentions 100Mbit/s links and 1Gbit/s links. I don't expect HSR or PRP > networks to scale out to, e.g., 10Gbit/s links with the current > specification as this would mean that sequence numbers could repeat as > often as every ~4ms. The default constants in the IEC standard, which we > also use, are oriented at a 100Mbit/s network. > > In my tests with veth pairs, the CPU overhead didn't lead to > significantly lower data rates. The main factor limiting the data rate > at the moment, I assume, is the per-node spinlock that is taken for each > received packet. IMHO, there is a lot more to gain in terms of CPU > overhead from making this lock smaller or getting rid of it, than we > loose with the more accurate duplicate discard algorithm in this patchset. > > The CPU overhead of the algorithm benefits from the fact that in high > packet rate scenarios (where it really matters) many packets will have > sequence numbers in already initialized blocks. These packets just have > additionally: one xarray lookup, one comparison, and one bit setting. If > a block needs to be initialized (once every 128 packets plus their 128 > duplicates if all sequence numbers are seen), we will have: one > xa_erase, a bunch of memory writes, and one xa_store. > > In theory, all packets could end up in the slow path if a node sends > every 128th packet to us. If this is sent from a well behaving node, the > packet rate wouldn't be an issue anymore, though. Signed-off-by: Felix Maurer <fmaurer@redhat.com> ==================== Link: https://patch.msgid.link/cover.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10MAINTAINERS: Assign hsr selftests to HSRFelix Maurer-0/+1
Despite the HSR subsystem being orphaned at the moment due to the original maintainer being unreachable for a while, assign the selftests to the subsystem for the future. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/f4a356b96f5e0c99d9db3984ea62596c99a97469.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10selftests: hsr: Add more link fault tests for HSRFelix Maurer-28/+44
Run the packet loss and reordering tests also for both HSR versions. Now they can be removed from the hsr_ping tests completely. The timeout needs to be increased because there are 15 link fault test cases now, with each of them taking 5-6sec for the test and at most 5sec for the HSR node tables to get merged and we also want some room to make the test runs stable. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/eb6f667d3804ce63d86f0ee3fbc0e0ac9e1a209a.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10hsr: Implement more robust duplicate discard for HSRFelix Maurer-143/+131
The HSR duplicate discard algorithm had even more basic problems than the described for PRP in the previous patch. It relied only on the last received sequence number to decide if a new frame should be forwarded to any port. This does not work correctly in any case where frames are received out of order. The linked bug report claims that this can even happen with perfectly fine links due to the order in which incoming frames are processed (which can be unexpected on multi-core systems). The issue also occasionally shows up in the HSR selftests. The main reason is that the sequence number that was last forwarded to the master port may have skipped a number which will in turn never be delivered to the host. As the problem (we accidentally skip over a sequence number that has not been received but will be received in the future) is similar to PRP, we can apply a similar solution. The duplicate discard algorithm based on the "sparse bitmap" works well for HSR if it is extended to track one bitmap for each port (A, B, master, interlink). To do this, change the sequence number blocks to contain a flexible array member as the last member that can keep chunks for as many bitmaps as we need. This design makes it easy to reuse the same algorithm in a potential PRP RedBox implementation. The duplicate discard algorithm functions are modified to deal with sequence number blocks of different sizes and to correctly use the array of bitmap chunks. There is a notable speciality for HSR: the port type has a special port type NONE with value 0. This leads to the number of port types being 5 instead of actually 4. To save memory, remove the NONE port from the bitmap (by subtracting 1) when setting up the block buffer and when accessing the bitmap chunks in the array. Removing the old algorithm allows us to get rid of a few fields that are not needed any more: time_out and seq_out for each port. We can also remove some functions that were only necessary for the previous duplicate discard algorithm. The removal of seq_out is possible despite its previous usage in hsr_register_frame_in: it was used to prevent updates to time_in when "invalid" sequence numbers were received. With the new duplicate discard algorithm, time_in has no relevance for the expiry of sequence numbers anymore. They will expire based on the timestamps in the sequence number blocks after at most 400ms. There is no need that a node "re-registers" to "resume communication": after 400ms, all sequence numbers are accepted again. Also, according to the IEC 62439-3:2021, all nodes are supposed to send no traffic for 500ms after boot to lead exactly to this expiry of seen sequence numbers. time_in is still used for pruning nodes from the node table after no traffic has been received for 60sec. Pruning is only needed if the node is really gone and has not been sending any traffic for that period. seq_out was also used to report the last incoming sequence number from a node through netlink. I am not sure how useful this value is to userspace at all, but added getting it from the sequence number blocks. This number can be outdated after node merging until a new block has been added. Update the KUnit test for the PRP duplicate discard so that the node allocation matches and expectations on the removed fields are removed. Reported-by: Yoann Congal <yoann.congal@smile.fr> Closes: https://lore.kernel.org/netdev/7d221a07-8358-4c0b-a09c-3b029c052245@smile.fr/ Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/36dc3bc5bdb7e68b70bb5ef86f53ca95a3f35418.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10selftests: hsr: Add tests for more link faults with PRPFelix Maurer-5/+74
Add tests where one link has different rates of packet loss or reorders packets. PRP should still be able to recover from these link faults and show no packet loss. However, it is acceptable to receive some level of duplicate packets. This matches the current specification (IEC 62439-3:2021) of the duplicate discard algorithm that requires it to be "designed such that it never rejects a legitimate frame, while occasional acceptance of a duplicate can be tolerated." The rate of acceptable duplicates in this test is intentionally high (10%) to make the test stable, the values I observed in the worst test cases (20% loss) are around 5% duplicates. The duplicates occur because of the 10ms ping interval in the test. As blocks expire after 400ms based on the timestamp of the first received sequence number in the block, every approx. 40th will lead to a new, clean block being used where the sequence number hasn't been seen before. As this occurs on both nodes in the test (for requests and replies), we observe around 20 duplicate frames. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/7b36506d3a80e53786fe56526cf6046c74dfeee1.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10hsr: Implement more robust duplicate discard for PRPFelix Maurer-138/+237
The PRP duplicate discard algorithm does not work reliably with certain link faults. Especially with packet loss on one link, the duplicate discard algorithm drops valid packets which leads to packet loss on the PRP interface where the link fault should in theory be perfectly recoverable by PRP. This happens because the algorithm opens the drop window on the lossy link, covering received and lost sequence numbers. If the other, non-lossy link receives the duplicate for a lost frame, it is within the drop window of the lossy link and therefore dropped. Since IEC 62439-3:2012, a node has one sequence number counter for frames it sends, instead of one sequence number counter for each destination. Therefore, a node can not expect to receive contiguous sequence numbers from a sender. A missing sequence number can be totally normal (if the sender intermittently communicates with another node) or mean a frame was lost. The algorithm, as previously implemented in commit 05fd00e5e7b1 ("net: hsr: Fix PRP duplicate detection"), was part of IEC 62439-3:2010 (HSRv0/PRPv0) but was removed with IEC 62439-3:2012 (HSRv1/PRPv1). Since that, no algorithm is specified but up to implementers. It should be "designed such that it never rejects a legitimate frame, while occasional acceptance of a duplicate can be tolerated" (IEC 62439-3:2021). For the duplicate discard algorithm, this means that 1) we need to track the sequence numbers individually to account for non-contiguous sequence numbers, and 2) we should always err on the side of accepting a duplicate than dropping a valid frame. The idea of the new algorithm is to store the seen sequence numbers in a bitmap. To keep the size of the bitmap in control, we store it as a "sparse bitmap" where the bitmap is split into blocks and not all blocks exist at the same time. The sparse bitmap is implemented using an xarray that keeps the references to the individual blocks and a backing ring buffer that stores the actual blocks. New blocks are initialized in the buffer and added to the xarray as needed when new frames arrive. Existing blocks are removed in two conditions: 1. The block found for an arriving sequence number is old and therefore not relevant to the duplicate discard algorithm anymore, i.e., it has been added more than the entry forget time ago. In this case, the block is removed from the xarray and marked as forgotten (by setting its timestamp to 0). 2. Space is needed in the ring buffer for a new block. In this case, the block is removed from the xarray, if it hasn't already been forgotten (by 1.). Afterwards, the new block is initialized in its place. This has the nice property that we can reliably track sequence numbers on low traffic situations (where they expire based on their timestamp) and more quickly forget sequence numbers in high traffic situations before they potentially wrap over and repeat before they are expired. When nodes are merged, the blocks are merged as well. The timestamp of a merged block is set to the minimum of the two timestamps to never keep around a seen sequence number for too long. The bitmaps are or'd to mark all seen sequence numbers as seen. All of this still happens under seq_out_lock, to prevent concurrent access to the blocks. The KUnit test for the algorithm is updated as well. The updates are done in a way to match the original intends pretty closely. Currently, there is much knowledge about the actual algorithm baked into the tests (especially the expectations) which may need some redesign in the future. Reported-by: Steffen Lindner <steffen.lindner@de.abb.com> Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Steffen Lindner <steffen.lindner@de.abb.com> Link: https://patch.msgid.link/8ce15a996099df2df5b700969a39e7df400e8dbb.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10selftests: hsr: Add tests for faulty linksFelix Maurer-11/+272
Add a test case that can support different types of faulty links for all protocol versions (HSRv0, HSRv1, PRPv1). It starts with a baseline with fully functional links. The first faulty case is one link being cut during the ping. This test uses a different function for ping that sends more packets in shorter intervals to stress the duplicate detection algorithms a bit more and allow for future tests with other link faults (packet loss, reordering, etc.). As the link fault tests now cover the cut link for HSR and PRP, it can be removed from the hsr_ping test. Note that the removed cut link test did not really test the fault because do_ping_long takes about 1sec while the link is only cut after a 3sec sleep. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/dad52276e2c349ecb96168bef7e3001bf7becc81.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10selftests: hsr: Check duplicates on HSR with VLANFelix Maurer-120/+70
Previously the hsr_ping test only checked that all nodes in a VLAN are reachable (using do_ping). Update the test to also check that there is no packet loss and no duplicate packets by running the same tests for VLANs as without VLANs (including using do_ping_long). This also adds tests for IPv6 over VLAN. To unify the test code, the topology without VLANs now uses IP addresses from dead:beef:0::/64 to align with the 100.64.0.0/24 range for IPv4. Error messages are updated across the board to make it easier to find what actually failed. Also update the VLAN test to only run in VLAN 2, as there is no need to check if ping really works with VLAN IDs 2, 3, 4, and 5. This lowers the number of long ping tests on VLANs to keep the overall test runtime in bounds. It's still necessary to bump the test timeout a bit, though: a ping long tests takes 1sec, do_ping_tests performs 12 of them, do_link_problem_tests 6, and the VLAN tests again 12. With some buffer for setup and waiting and for two protocol versions, 90sec timeout seems reasonable. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/e3ded0e2547b5f720524b62fabeb96debc579697.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10selftests: hsr: Add ping test for PRPFelix Maurer-0/+148
Add a selftest for PRP that performs a basic ping test on IPv4 and IPv6, over the plain PRP interface and a VLAN interface, similar to the existing ping test for HSR. The test first checks reachability of the other node, then checks for no loss and no duplicates. Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/4a342189e842d7308d037da72af566729ee75834.1770299429.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10Merge patch series "Revert "pid: make __task_pid_nr_ns(ns => NULL) safe for ↵Christian Brauner-3/+7
zombie callers"" Commit 006568ab4c5c ("pid: Add a judgment for ns null in pid_nr_ns") already added the ns != NULL check in pid_nr_ns(), so we can remove the same check from __task_pid_nr_ns(). * patches from https://patch.msgid.link/20251015123550.GA9447@redhat.com: pid: introduce task_ppid_vnr() helper Revert "pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers" Link: https://patch.msgid.link/20251015123550.GA9447@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-10pid: introduce task_ppid_vnr() helperOleg Nesterov-1/+6
Cosmetic change. Unlike all other similar helpers task_ppid_nr_ns() doesn't have a _vnr() version; add one for consistency. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://patch.msgid.link/20251015123633.GB9456@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-10mm/slab: drop the OBJEXTS_NOSPIN_ALLOC flag from enum objext_flagsHarry Yoo-12/+9
OBJEXTS_NOSPIN_ALLOC was used to remember whether a slabobj_ext vector was allocated via kmalloc_nolock(), so that free_slab_obj_exts() could call kfree_nolock() instead of kfree(). Now that kfree() supports freeing kmalloc_nolock() objects, this flag is no longer needed. Instead, pass the allow_spin parameter down to free_slab_obj_exts() to determine whether kfree_nolock() or kfree() should be called in the free path, and free one bit in enum objext_flags. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Hao Li <hao.li@linux.dev> Link: https://patch.msgid.link/20260210044642.139482-3-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-10pidfs: implement ino allocation without the pidmap lockMateusz Guzik-43/+73
This paves the way for scalable PID allocation later. The 32 bit variant merely takes a spinlock for simplicity, the 64 bit variant uses a scalable scheme. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20260120184539.1480930-1-mjguzik@gmail.com Co-developed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-10Revert "pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers"Oleg Nesterov-2/+1
This reverts commit abdfd4948e45c51b19162cf8b3f5003f8f53c9b9. The changelog in this commit explains why it is not easy to avoid ns == NULL when the caller is exiting, but pid_vnr() is equally unsafe in this case. However, commit 006568ab4c5c ("pid: Add a judgment for ns null in pid_nr_ns") already added the ns != NULL check in pid_nr_ns(), so we can remove the same check from __task_pid_nr_ns(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://patch.msgid.link/20251015123613.GA9456@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-10mm/slab: allow freeing kmalloc_nolock()'d objects using kfree[_rcu]()Harry Yoo-15/+32
Slab objects that are allocated with kmalloc_nolock() must be freed using kfree_nolock() because only a subset of alloc hooks are called, since kmalloc_nolock() can't spin on a lock during allocation. This imposes a limitation: such objects cannot be freed with kfree_rcu(), forcing users to work around this limitation by calling call_rcu() with a callback that frees the object using kfree_nolock(). Remove this limitation by teaching kmemleak to gracefully ignore cases when kmemleak_free() or kmemleak_ignore() is called without a prior kmemleak_alloc(). Unlike kmemleak, kfence already handles this case, because, due to its design, only a subset of allocations are served from kfence. With this change, kfree() and kfree_rcu() can be used to free objects that are allocated using kmalloc_nolock(). Suggested-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260210044642.139482-2-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-10pid: reorder fields in pid_namespace to reduce false sharingMateusz Guzik-7/+7
alloc_pid() loads pid_cachep, level and pid_max prior to taking the lock. It dirties idr and pid_allocated with the lock. Some of these fields share the cacheline as is, split them up. No change in the size of the struct. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20260120204820.1497002-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-10pidfs: convert rb-tree to rhashtableChristian Brauner-55/+46
Mateusz reported performance penalties [1] during task creation because pidfs uses pidmap_lock to add elements into the rbtree. Switch to an rhashtable to have separate fine-grained locking and to decouple from pidmap_lock moving all heavy manipulations outside of it. Convert the pidfs inode-to-pid mapping from an rb-tree with seqcount protection to an rhashtable. This removes the global pidmap_lock contention from pidfs_ino_get_pid() lookups and allows the hashtable insert to happen outside the pidmap_lock. pidfs_add_pid() is split. pidfs_prepare_pid() allocates inode number and initializes pid fields and is called inside pidmap_lock. pidfs_add_pid() inserts pid into rhashtable and is called outside pidmap_lock. Insertion into the rhashtable can fail and memory allocation may happen so we need to drop the spinlock. To guard against accidently opening an already reaped task pidfs_ino_get_pid() uses additional checks beyond pid_vnr(). If pid->attr is PIDFS_PID_DEAD or NULL the pid either never had a pidfd or it already went through pidfs_exit() aka the process as already reaped. If pid->attr is valid check PIDFS_ATTR_BIT_EXIT to figure out whether the task has exited. This slightly changes visibility semantics: pidfd creation is denied after pidfs_exit() runs, which is just before the pid number is removed from the via free_pid(). That should not be an issue though. Link: https://lore.kernel.org/20251206131955.780557-1-mjguzik@gmail.com [1] Link: https://patch.msgid.link/20260120-work-pidfs-rhashtable-v2-1-d593c4d0f576@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-10ipc: Add SPDX license id to mqueue.cTim Bird-2/+1
Add GPL-2.0 license id to file, replacing reference to GPL in the header comment. Signed-off-by: Tim Bird <tim.bird@sony.com> Link: https://patch.msgid.link/20260117202759.692347-1-tim.bird@sony.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-10KVM: s390: vsie: Fix race in acquire_gmap_shadow()Claudio Imbrenda-4/+17
The shadow gmap returned by gmap_create_shadow() could get dropped before taking the gmap->children_lock. This meant that the shadow gmap was sometimes being used while its reference count was 0. Fix this by taking the additional reference inside gmap_create_shadow() while still holding gmap->children_lock, instead of afterwards. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-10KVM: s390: vsie: Fix race in walk_guest_tables()Claudio Imbrenda-0/+3
It is possible that walk_guest_tables() is called on a shadow gmap that has been removed already, in which case its parent will be NULL. In such case, return -EAGAIN and let the callers deal with it. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Acked-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-10KVM: s390: Use guest address to mark guest page dirtyClaudio Imbrenda-2/+6
Stop using the userspace address to mark the guest page dirty. mark_page_dirty() expects a guest frame number, but was being passed a host virtual frame number. When slot == NULL, mark_page_dirty_in_slot() does nothing and does not complain. This means that in some circumstances the dirtiness of the guest page might have been lost. Fix by adding two fields in struct kvm_s390_adapter_int to keep the guest addressses, and use those for mark_page_dirty(). Fixes: f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-10net: atm: fix crash due to unvalidated vcc pointer in sigd_send()Jiayuan Chen-2/+54
Reproducer available at [1]. The ATM send path (sendmsg -> vcc_sendmsg -> sigd_send) reads the vcc pointer from msg->vcc and uses it directly without any validation. This pointer comes from userspace via sendmsg() and can be arbitrarily forged: int fd = socket(AF_ATMSVC, SOCK_DGRAM, 0); ioctl(fd, ATMSIGD_CTRL); // become ATM signaling daemon struct msghdr msg = { .msg_iov = &iov, ... }; *(unsigned long *)(buf + 4) = 0xdeadbeef; // fake vcc pointer sendmsg(fd, &msg, 0); // kernel dereferences 0xdeadbeef In normal operation, the kernel sends the vcc pointer to the signaling daemon via sigd_enq() when processing operations like connect(), bind(), or listen(). The daemon is expected to return the same pointer when responding. However, a malicious daemon can send arbitrary pointer values. Fix this by introducing find_get_vcc() which validates the pointer by searching through vcc_hash (similar to how sigd_close() iterates over all VCCs), and acquires a reference via sock_hold() if found. Since struct atm_vcc embeds struct sock as its first member, they share the same lifetime. Therefore using sock_hold/sock_put is sufficient to keep the vcc alive while it is being used. Note that there may be a race with sigd_close() which could mark the vcc with various flags (e.g., ATM_VF_RELEASED) after find_get_vcc() returns. However, sock_hold() guarantees the memory remains valid, so this race only affects the logical state, not memory safety. [1]: https://gist.github.com/mrpre/1ba5949c45529c511152e2f4c755b0f3 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+1f22cb1769f249df9fa0@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69039850.a70a0220.5b2ed.005d.GAE@google.com/T/ Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Link: https://patch.msgid.link/20260205095501.131890-1-jiayuan.chen@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10objtool/rust: add one more `noreturn` Rust functionMiguel Ojeda-1/+2
`objtool` with Rust 1.84.0 reports: rust/kernel.o: error: objtool: _RNvXNtNtCsaRPFapPOzLs_6kernel3str9parse_intaNtNtB2_7private12FromStrRadix14from_str_radix() falls through to next function _RNvXNtNtCsaRPFapPOzLs_6kernel3str9parse_intaNtNtB2_7private12FromStrRadix16from_u64_negated() This is very similar to commit c18f35e49049 ("objtool/rust: add one more `noreturn` Rust function"), which added `from_ascii_radix_panic` for Rust 1.86.0, except that Rust 1.84.0 ends up needing `from_str_radix_panic`. Thus add it to the list to fix the warning. Cc: FUJITA Tomonori <fujita.tomonori@gmail.com> Fixes: 51d9ee90ea90 ("rust: str: add radix prefixed integer parsing functions") Reported-by: Alice Ryhl <aliceryhl@google.com> Link: https://rust-for-linux.zulipchat.com/#narrow/channel/291565/topic/x/with/572427627 Tested-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20260206204336.38462-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2026-02-10Merge branch 'net-fec-improve-xdp-copy-mode-and-add-af_xdp-zero-copy-support'Paolo Abeni-379/+1234
Wei Fang says: ==================== net: fec: improve XDP copy mode and add AF_XDP zero-copy support This patch set optimizes the XDP copy mode logic as follows. 1. Separate the processing of RX XDP frames from fec_enet_rx_queue(), and adds a separate function fec_enet_rx_queue_xdp() for handling XDP frames. 2. For TX XDP packets, using the batch sending method to avoid frequent MMIO writes. 3. Use the switch statement to check the tx_buf type instead of the if...else... statement, making the cleanup logic of TX BD ring cleared and more efficient. We compared the performance of XDP copy mode before and after applying this patch set, and the results show that the performance has improved. Before applying this patch set. root@imx93evk:~# ./xdp-bench tx eth0 Summary 396,868 rx/s 0 err,drop/s Summary 396,024 rx/s 0 err,drop/s root@imx93evk:~# ./xdp-bench drop eth0 Summary 684,781 rx/s 0 err/s Summary 675,746 rx/s 0 err/s root@imx93evk:~# ./xdp-bench pass eth0 Summary 208,552 rx/s 0 err,drop/s Summary 208,654 rx/s 0 err,drop/s root@imx93evk:~# ./xdp-bench redirect eth0 eth0 eth0->eth0 311,210 rx/s 0 err,drop/s 311,208 xmit/s eth0->eth0 310,808 rx/s 0 err,drop/s 310,809 xmit/s After applying this patch set. root@imx93evk:~# ./xdp-bench tx eth0 Summary 425,778 rx/s 0 err,drop/s Summary 426,042 rx/s 0 err,drop/s root@imx93evk:~# ./xdp-bench drop eth0 Summary 698,351 rx/s 0 err/s Summary 701,882 rx/s 0 err/s root@imx93evk:~# ./xdp-bench pass eth0 Summary 210,348 rx/s 0 err,drop/s Summary 210,016 rx/s 0 err,drop/s root@imx93evk:~# ./xdp-bench redirect eth0 eth0 eth0->eth0 354,407 rx/s 0 err,drop/s 354,401 xmit/s eth0->eth0 350,381 rx/s 0 err,drop/s 350,389 xmit/s This patch set also addes the AF_XDP zero-copy support, and we tested the performance on i.MX93 platform with xdpsock tool. The following is the performance comparison of copy mode and zero-copy mode. It can be seen that the performance of zero-copy mode is better than that of copy mode. 1. MAC swap L2 forwarding 1.1 Zero-copy mode root@imx93evk:~# ./xdpsock -i eth0 -l -z sock0@eth0:0 l2fwd xdp-drv pps pkts 1.00 rx 414715 415455 tx 414715 415455 1.2 Copy mode root@imx93evk:~# ./xdpsock -i eth0 -l -c sock0@eth0:0 l2fwd xdp-drv pps pkts 1.00 rx 356396 356609 tx 356396 356609 2. TX only 2.1 Zero-copy mode root@imx93evk:~# ./xdpsock -i eth0 -t -s 64 -z sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 1119573 1126720 2.2 Copy mode root@imx93evk:~# ./xdpsock -i eth0 -t -s 64 -c sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 406864 407616 ==================== Link: https://patch.msgid.link/20260205085742.2685134-1-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10net: fec: add AF_XDP zero-copy supportWei Fang-44/+754
This patch adds AF_XDP zero-copy support for both TX and RX on the FEC driver. It introduces new functions for XSK buffer allocation, RX/TX queue processing in zero-copy mode, and XSK pool setup/teardown. For RX, fec_alloc_rxq_buffers_zc() is added to allocate RX buffers from XSK pool. And fec_enet_rx_queue_xsk() is used to process the frames from the RX queue which is bound to the AF_XDP socket. Similar to the copy mode, the zero-copy mode also supports XDP_TX, XDP_PASS, XDP_DROP and XDP_REDIRECT actions. In addition, fec_enet_xsk_tx_xmit() is similar to fec_enet_xdp_tx_xmit() and is used to handle XDP_TX action in zero-copy mode. For TX, there are two cases, one is the frames from the AF_XDP socket, so fec_enet_xsk_xmit() is added to directly transmit the frames from the socket and the buffer type is marked as FEC_TXBUF_T_XSK_XMIT. The other one is the frames from the RX queue (XDP_TX action), the buffer type is marked as FEC_TXBUF_T_XSK_TX. Therefore, fec_enet_tx_queue() could correctly clean the TX queue base on the buffer type. Also, some tests have been done on the i.MX93-EVK board with the xdpsock tool, the following are the results. Env: i.MX93 connects to a packet generator, the link speed is 1Gbps, and flow-control is off. The RX packet size is 64 bytes including FCS. Only one RX queue (CPU) is used to receive frames. 1. MAC swap L2 forwarding 1.1 Zero-copy mode root@imx93evk:~# ./xdpsock -i eth0 -l -z sock0@eth0:0 l2fwd xdp-drv pps pkts 1.00 rx 414715 415455 tx 414715 415455 1.2 Copy mode root@imx93evk:~# ./xdpsock -i eth0 -l -c sock0@eth0:0 l2fwd xdp-drv pps pkts 1.00 rx 356396 356609 tx 356396 356609 2. TX only 2.1 Zero-copy mode root@imx93evk:~# ./xdpsock -i eth0 -t -s 64 -z sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 1119573 1126720 2.2 Copy mode root@imx93evk:~# ./xdpsock -i eth0 -t -s 64 -c sock0@eth0:0 txonly xdp-drv pps pkts 1.00 rx 0 0 tx 406864 407616 Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-16-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10net: fec: improve fec_enet_tx_queue()Wei Fang-26/+17
To support AF_XDP zero-copy mode in the subsequent patch, the following adjustments have been made to fec_tx_queue(). 1. Change the parameters of fec_tx_queue(). 2. Some variables are initialized at the time of declaration, and the order of local variables is updated to follow the reverse xmas tree style. 3. Remove the variable xdpf and add the variable tx_buf. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-15-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10net: fec: add fec_alloc_rxq_buffers_pp() to allocate buffers from page poolWei Fang-32/+58
Currently, the buffers of RX queue are allocated from the page pool. In the subsequent patches to support XDP zero copy, the RX buffers will be allocated from the UMEM. Therefore, extract fec_alloc_rxq_buffers_pp() from fec_enet_alloc_rxq_buffers() and we will add another helper to allocate RX buffers from UMEM for the XDP zero copy mode. In addition, fec_alloc_rxq_buffers_pp() only initializes bdp->bufaddr and does not initialize other fields of bdp, because these will be initialized in fec_enet_bd_init(). Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-14-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10net: fec: move xdp_rxq_info* APIs out of fec_enet_create_page_pool()Wei Fang-18/+40
Extract fec_xdp_rxq_info_reg() from fec_enet_create_page_pool() and move it out of fec_enet_create_page_pool(), so that it can be reused in the subsequent patches to support XDP zero copy mode. Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-13-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-10net: fec: remove the size parameter from fec_enet_create_page_pool()Wei Fang-3/+3
Remove the size parameter from fec_enet_create_page_pool(), since rxq->bd.ring_size already contains this information. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260205085742.2685134-12-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>