linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message (Collapse)	Author	Lines
2026-04-29	xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete	Michal Kosiorek	-6/+6
	KASAN reproduces a slab-use-after-free in __xfrm_state_delete()'s hlist_del_rcu calls under syzkaller load on linux-6.12.y stable (reproduced on 6.12.47, also reachable via the same code path on torvalds/master and on the ipsec tree). Nine unique signatures cluster in the xfrm_state lifecycle, the load-bearing one being: BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline] BUG: KASAN: slab-use-after-free in hlist_del_rcu include/linux/rculist.h:516 [inline] BUG: KASAN: slab-use-after-free in __xfrm_state_delete net/xfrm/xfrm_state.c Write of size 8 at addr ffff8881198bcb70 by task kworker/u8:9/435 Workqueue: netns cleanup_net Call Trace: __hlist_del / hlist_del_rcu __xfrm_state_delete xfrm_state_delete xfrm_state_flush xfrm_state_fini ops_exit_list cleanup_net The other observed signatures hit the same slab object from __xfrm_state_lookup, xfrm_alloc_spi, __xfrm_state_insert and an OOB write variant of __xfrm_state_delete, all on the byseq/byspi hash chains. __xfrm_state_delete() guards its byseq and byspi unhashes with value-based predicates: if (x->km.seq) hlist_del_rcu(&x->byseq); if (x->id.spi) hlist_del_rcu(&x->byspi); while everywhere else in the file (e.g. state_cache, state_cache_input) the safer hlist_unhashed() check is used. xfrm_alloc_spi() sets x->id.spi = newspi inside xfrm_state_lock and then immediately inserts into byspi, but a path that observes x->id.spi != 0 outside of xfrm_state_lock can still skip-or-hit the byspi unhash inconsistently with whether x is actually on the list. The same holds for x->km.seq versus byseq, and the bydst/bysrc unhashes have no predicate at all, so a second __xfrm_state_delete() on the same object writes through LIST_POISON pprev. The defensive change here: - Use hlist_del_init_rcu() instead of hlist_del_rcu() on bydst, bysrc, byseq and byspi so a second deletion is a no-op rather than a write through LIST_POISON pprev. The byseq/byspi nodes are already initialised in xfrm_state_alloc(). - Test hlist_unhashed() rather than the value predicate for byseq/byspi, so the unhash decision tracks list state rather than mutable scalar fields. Empirical verification: applied this patch on top of v6.12.47, rebuilt, and re-ran the same syzkaller harness for 1h16m on a previously-crashy configuration that produced ~100 hits each of slab-use-after-free Read in xfrm_alloc_spi / Read in __xfrm_state_lookup / Write in __xfrm_state_delete. After the patch, 7.1M execs across 32 VMs at ~1550 exec/sec produced zero xfrm_state UAF/OOB hits. /proc/slabinfo confirms the xfrm_state slab is actively allocated and freed during the run (~143 KiB resident), so the fuzzer is still exercising those code paths -- they just no longer crash. Reproduction: - Linux 6.12.47 x86_64 + KASAN_GENERIC + KASAN_INLINE + KCOV - syzkaller @ 746545b8b1e4c3a128db8652b340d3df90ce61db - 32 QEMU/KVM VMs x 2 vCPU on AWS c5.metal bare metal - 9 unique signatures collected in ~9h, all within xfrm_state lifecycle Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq") Fixes: 7b4dc3600e48 ("[XFRM]: Do not add a state whose SPI is zero to the SPI hash.") Reported-by: Michal Kosiorek <mkosiorek121@gmail.com> Tested-by: Michal Kosiorek <mkosiorek121@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Michal Kosiorek <mkosiorek121@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-29	xfrm: provide message size for XFRM_MSG_MAPPING	Ruijie Li	-0/+1
	The compat 64=>32 translation path handles XFRM_MSG_MAPPING, but xfrm_msg_min[] does not provide the native payload size for this message type. Add the missing XFRM_MSG_MAPPING entry so compat translation can size and translate mapping notifications correctly. Fixes: 5461fc0c8d9f ("xfrm/compat: Add 64=>32-bit messages translator") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ruijie Li <ruijieli51@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-28	xfrm: Don't clobber inner headers when already set	Cosmin Ratiu	-6/+14
	On VXLAN over IPsec egress, xfrm{4,6}_transport_output() blindly overwrite inner_transport_header (== the inner TCP header saved in VXLAN iptunnel_handle_offloads() -> skb_reset_inner_headers()) with the current transport_header (== the VXLAN outer UDP header set by udp_tunnel_xmit_skb()). This was a latent bug, harmless until commit [1] added a doff validation check in qdisc_pkt_len_segs_init() for encapsulated GSO packets. With the wrong inner_transport_header set by xfrm, qdisc_pkt_len_segs_init() interprets inner_transport_header as a TCP header, reads doff=0 from the upper byte of the VNI and drops the packet with DROP_REASON_SKB_BAD_GSO. Besides the use in GSO to determine the header size of segmented packets, inner_transport_header might be used by drivers to set up inner checksum offloading by pointing the HW to the inner transport header. A quick browse through available drivers shows that mlx5 uses skb->csum_start specifically for this scenario, while others either don't support VXLAN over IPsec crypto offload (ixgbe) or the HW is capable of parsing the packets itself (nfp, Chelsio). But in all cases, it is more correct to let the inner_transport_header point to the innermost header instead of overwriting it in xfrm. So fix this by guarding all four inner header save sites in xfrm_output.c (xfrm{4,6}_transport_output, xfrm{4,6}_tunnel_encap_add) with a check for skb->inner_protocol. When inner_protocol is set, a tunnel layer (VXLAN, Geneve, GRE, etc.) has already saved the correct inner header offsets and they must not be overwritten. When inner_protocol is zero, no prior tunnel encapsulation exists and xfrm must save the inner headers itself. The tunnel mode checks are only added for completion, since they aren't strictly required, as xfrm_output() forces software GSO in tunnel mode before encap. This makes the previously added test pass: # ./tools/testing/selftests/drivers/net/hw/ipsec_vxlan.py TAP version 13 1..4 ok 1 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v4_inner_v4 ok 2 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v4_inner_v6 ok 3 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v6_inner_v4 ok 4 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v6_inner_v6 # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0 [1] commit 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()") Fixes: f1bd7d659ef0 ("xfrm: Add encapsulation header offsets while SKB is not encrypted") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-09	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	-9/+28
	Cross-merge networking fixes after downstream PR (net-7.0-rc8). Conflicts: net/ipv6/seg6_iptunnel.c c3812651b522f ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel") 78723a62b969a ("seg6: add per-route tunnel source address") https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk net/ipv4/icmp.c fde29fd934932 ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()") d98adfbdd5c01 ("ipv4: drop ipv6_stub usage and use direct function calls") https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk Adjacent changes: drivers/net/ethernet/stmicro/stmmac/chain_mode.c 51f4e090b9f8 ("net: stmmac: fix integer underflow in chain mode") 6b4286e05508 ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-08	Merge tag 'ipsec-next-2026-04-08' of ↵	Jakub Kicinski	-21/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2026-04-08 1) Update outdated comment in xfrm_dst_check(). From kexinsun. 2) Drop support for HMAC-RIPEMD-160 from IPsec. From Eric Biggers. * tag 'ipsec-next-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Drop support for HMAC-RIPEMD-160 xfrm: update outdated comment ==================== Link: https://patch.msgid.link/20260408094258.148555-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-07	xfrm: Drop support for HMAC-RIPEMD-160	Eric Biggers	-20/+0
	Drop support for HMAC-RIPEMD-160 from IPsec to reduce the UAPI surface and simplify future maintenance. It's almost certainly unused. RIPEMD-160 received some attention in the early 2000s when SHA-* weren't quite as well established. But it never received much adoption outside of certain niches such as Bitcoin. It's actually unclear that Linux + IPsec + HMAC-RIPEMD-160 has ever been used, even historically. When support for it was added in 2003, it was done so in a "cleanup" commit without any justification [1]. It didn't actually work until someone happened to fix it 5 years later [2]. That person didn't use or test it either [3]. Finally, also note that "hmac(rmd160)" is by far the slowest of the algorithms in aalg_list[]. Of course, today IPsec is usually used with an AEAD, such as AES-GCM. But even for IPsec users still using a dedicated auth algorithm, they almost certainly aren't using, and shouldn't use, HMAC-RIPEMD-160. Thus, let's just drop support for it. Note: no kconfig update is needed, since CRYPTO_RMD160 wasn't actually being selected anyway. References: [1] linux-history commit d462985fc1941a47 ("[IPSEC]: Clean up key manager algorithm handling.") [2] linux commit a13366c632132bb9 ("xfrm: xfrm_algo: correct usage of RIPEMD-160") [3] https://lore.kernel.org/all/1212340578-15574-1-git-send-email-rueegsegger@swiss-it.ch Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-07	xfrm_user: fix info leak in build_report()	Greg Kroah-Hartman	-0/+1
	struct xfrm_user_report is a __u8 proto field followed by a struct xfrm_selector which means there is three "empty" bytes of padding, but the padding is never zeroed before copying to userspace. Fix that up by zeroing the structure before setting individual member variables. Cc: stable <stable@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-07	xfrm_user: fix info leak in build_mapping()	Greg Kroah-Hartman	-0/+1
	struct xfrm_usersa_id has a one-byte padding hole after the proto field, which ends up never getting set to zero before copying out to userspace. Fix that up by zeroing out the whole structure before setting individual variables. Fixes: 3a2dfbe8acb1 ("xfrm: Notify changes in UDP encapsulation via netlink") Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-07	xfrm: fix refcount leak in xfrm_migrate_policy_find	Kotlyarov Mihail	-3/+0
	syzkaller reported a memory leak in xfrm_policy_alloc: BUG: memory leak unreferenced object 0xffff888114d79000 (size 1024): comm "syz.1.17", pid 931 ... xfrm_policy_alloc+0xb3/0x4b0 net/xfrm/xfrm_policy.c:432 The root cause is a double call to xfrm_pol_hold_rcu() in xfrm_migrate_policy_find(). The lookup function already returns a policy with held reference, making the second call redundant. Remove the redundant xfrm_pol_hold_rcu() call to fix the refcount imbalance and prevent the memory leak. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype") Signed-off-by: Kotlyarov Mihail <mihailkotlyarow@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-07	xfrm: hold dev ref until after transport_finish NF_HOOK	Qi Tang	-4/+14
	After async crypto completes, xfrm_input_resume() calls dev_put() immediately on re-entry before the skb reaches transport_finish. The skb->dev pointer is then used inside NF_HOOK and its okfn, which can race with device teardown. Remove the dev_put from the async resumption entry and instead drop the reference after the NF_HOOK call in transport_finish, using a saved device pointer since NF_HOOK may consume the skb. This covers NF_DROP, NF_QUEUE and NF_STOLEN paths that skip the okfn. For non-transport exits (decaps, gro, drop) and secondary async return points, release the reference inline when async is set. Suggested-by: Florian Westphal <fw@strlen.de> Fixes: acf568ee859f ("xfrm: Reinject transport-mode packets through tasklet") Cc: stable@vger.kernel.org Signed-off-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-04-07	xfrm: Wait for RCU readers during policy netns exit	Steffen Klassert	-0/+2
	xfrm_policy_fini() frees the policy_bydst hash tables after flushing the policy work items and deleting all policies, but it does not wait for concurrent RCU readers to leave their read-side critical sections first. The policy_bydst tables are published via rcu_assign_pointer() and are looked up through rcu_dereference_check(), so netns teardown must also wait for an RCU grace period before freeing the table memory. Fix this by adding synchronize_rcu() before freeing the policy hash tables. Fixes: e1e551bc5630 ("xfrm: policy: prepare policy_bydst hash for rcu lookups") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Florian Westphal <fw@strlen.de>
2026-03-30	xfrm: account XFRMA_IF_ID in aevent size calculation	Keenan Dong	-2/+8
	xfrm_get_ae() allocates the reply skb with xfrm_aevent_msgsize(), then build_aevent() appends attributes including XFRMA_IF_ID when x->if_id is set. xfrm_aevent_msgsize() does not include space for XFRMA_IF_ID. For states with if_id, build_aevent() can fail with -EMSGSIZE and hit BUG_ON(err < 0) in xfrm_get_ae(), turning a malformed netlink interaction into a kernel panic. Account XFRMA_IF_ID in the size calculation unconditionally and replace the BUG_ON with normal error unwinding. Fixes: 7e6526404ade ("xfrm: Add a new lookup key to match xfrm interfaces.") Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-30	xfrm: clear trailing padding in build_polexpire()	Yasuaki Torimaru	-0/+2
	build_expire() clears the trailing padding bytes of struct xfrm_user_expire after setting the hard field via memset_after(), but the analogous function build_polexpire() does not do this for struct xfrm_user_polexpire. The padding bytes after the __u8 hard field are left uninitialized from the heap allocation, and are then sent to userspace via netlink multicast to XFRMNLGRP_EXPIRE listeners, leaking kernel heap memory contents. Add the missing memset_after() call, matching build_expire(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Yasuaki Torimaru <yasuakitorimaru@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-29	net: convert remaining ipv6_stub users to direct function calls	Fernando Fernandez Mancera	-8/+4
	As IPv6 is built-in only, the ipv6_stub infrastructure is no longer necessary. Convert remaining ipv6_stub users to make direct function calls. The fallback functions introduced previously will prevent linkage errors when CONFIG_IPV6 is disabled. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Link: https://patch.msgid.link/20260325120928.15848-9-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	-72/+112
	Cross-merge networking fixes after downstream PR (net-7.0-rc6). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-24	Merge tag 'ipsec-2026-03-23' of ↵	Paolo Abeni	-72/+112
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-03-23 1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi. From Sabrina Dubroca. 2) Fix the condition on x->pcpu_num in xfrm_sa_len by using the proper check. From Sabrina Dubroca. 3) Call xdo_dev_state_delete during state update to properly cleanup the xdo device state. From Sabrina Dubroca. 4) Fix a potential skb leak in espintcp when async crypto is used. From Sabrina Dubroca. 5) Validate inner IPv4 header length in IPTFS payload to avoid parsing malformed packets. From Roshan Kumar. 6) Fix skb_put() panic on non-linear skb during IPTFS reassembly. From Fernando Fernandez Mancera. 7) Silence various sparse warnings related to RCU, state, and policy handling. From Sabrina Dubroca. 8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini(). From Hyunwoo Kim. 9) Prevent policy_hthresh.work from racing with netns teardown by using a proper cleanup mechanism. From Minwoo Ra. 10) Validate that the family of the source and destination addresses match in pfkey_send_migrate(). From Eric Dumazet. 11) Only publish mode_data after the clone is setup in the IPTFS receive path. This prevents leaving x->mode_data pointing at freed memory on error. From Paul Moses. Please pull or let me know if there are problems. ipsec-2026-03-23 * tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: iptfs: only publish mode_data after clone setup af_key: validate families in pfkey_send_migrate() xfrm: prevent policy_hthresh.work from racing with netns teardown xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini() xfrm: avoid RCU warnings around the per-netns netlink socket xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini} xfrm: state: silence sparse warnings during netns exit xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock xfrm: state: fix sparse warnings around XFRM_STATE_INSERT xfrm: state: fix sparse warnings in xfrm_state_init xfrm: state: fix sparse warnings on xfrm_state_hold_rcu xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly xfrm: iptfs: validate inner IPv4 header length in IPTFS payload esp: fix skb leak with espintcp and async crypto xfrm: call xdo_dev_state_delete during state update xfrm: fix the condition on x->pcpu_num in xfrm_sa_len xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi ==================== Link: https://patch.msgid.link/20260323083440.2741292-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-17	xfrm: iptfs: only publish mode_data after clone setup	Paul Moses	-3/+3
	iptfs_clone_state() stores x->mode_data before allocating the reorder window. If that allocation fails, the code frees the cloned state and returns -ENOMEM, leaving x->mode_data pointing at freed memory. The xfrm clone unwind later runs destroy_state() through x->mode_data, so the failed clone path tears down IPTFS state that clone_state() already freed. Keep the cloned IPTFS state private until all allocations succeed so failed clones leave x->mode_data unset. The destroy path already handles a NULL mode_data pointer. Fixes: 6be02e3e4f37 ("xfrm: iptfs: handle reordering of received packets") Cc: stable@vger.kernel.org Signed-off-by: Paul Moses <p@1g4.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-16	xfrm: prevent policy_hthresh.work from racing with netns teardown	Minwoo Ra	-0/+2
	A XFRM_MSG_NEWSPDINFO request can queue the per-net work item policy_hthresh.work onto the system workqueue. The queued callback, xfrm_hash_rebuild(), retrieves the enclosing struct net via container_of(). If the net namespace is torn down before that work runs, the associated struct net may already have been freed, and xfrm_hash_rebuild() may then dereference stale memory. xfrm_policy_fini() already flushes policy_hash_work during teardown, but it does not synchronize policy_hthresh.work. Synchronize policy_hthresh.work in xfrm_policy_fini() as well, so the queued work cannot outlive the net namespace teardown and access a freed struct net. Fixes: 880a6fab8f6b ("xfrm: configure policy hash table thresholds by netlink") Signed-off-by: Minwoo Ra <raminwo0202@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-13	xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()	Hyunwoo Kim	-1/+1
	After cancel_delayed_work_sync() is called from xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining states via __xfrm_state_delete(), which calls xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work. The following is a simple race scenario: cpu0 cpu1 cleanup_net() [Round 1] ops_undo_list() xfrm_net_exit() xfrm_nat_keepalive_net_fini() cancel_delayed_work_sync(nat_keepalive_work); xfrm_state_fini() xfrm_state_flush() xfrm_state_delete(x) __xfrm_state_delete(x) xfrm_nat_keepalive_state_updated(x) schedule_delayed_work(nat_keepalive_work); rcu_barrier(); net_complete_free(); net_passive_dec(net); llist_add(&net->defer_free_list, &defer_free_list); cleanup_net() [Round 2] rcu_barrier(); net_complete_free() kmem_cache_free(net_cachep, net); nat_keepalive_work() // on freed net To prevent this, cancel_delayed_work_sync() is replaced with disable_delayed_work_sync(). Fixes: f531d13bdfe3 ("xfrm: support sending NAT keepalives in ESP in UDP states") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: avoid RCU warnings around the per-netns netlink socket	Sabrina Dubroca	-8/+17
	net->xfrm.nlsk is used in 2 types of contexts: - fully under RCU, with rcu_read_lock + rcu_dereference and a NULL check - in the netlink handlers, with requests coming from a userspace socket In the 2nd case, net->xfrm.nlsk is guaranteed to stay non-NULL and the object is alive, since we can't enter the netns destruction path while the user socket holds a reference on the netns. After adding the __rcu annotation to netns_xfrm.nlsk (which silences sparse warnings in the RCU users and __net_init code), we need to tell sparse that the 2nd case is safe. Add a helper for that. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo	Sabrina Dubroca	-1/+4
	xfrm_input_afinfo is __rcu, we should use rcu_access_pointer to avoid a sparse warning: net/xfrm/xfrm_input.c:78:21: error: incompatible types in comparison expression (different address spaces): net/xfrm/xfrm_input.c:78:21: struct xfrm_input_afinfo const [noderef] __rcu * net/xfrm/xfrm_input.c:78:21: struct xfrm_input_afinfo const * Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo	Sabrina Dubroca	-1/+1
	xfrm_policy_afinfo is __rcu, use rcu_access_pointer to silence: net/xfrm/xfrm_policy.c:4152:43: error: incompatible types in comparison expression (different address spaces): net/xfrm/xfrm_policy.c:4152:43: struct xfrm_policy_afinfo const [noderef] __rcu * net/xfrm/xfrm_policy.c:4152:43: struct xfrm_policy_afinfo const * Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini}	Sabrina Dubroca	-4/+4
	In xfrm_policy_init: add rcu_assign_pointer to fix warning: net/xfrm/xfrm_policy.c:4238:29: warning: incorrect type in assignment (different address spaces) net/xfrm/xfrm_policy.c:4238:29: expected struct hlist_head [noderef] __rcu table net/xfrm/xfrm_policy.c:4238:29: got struct hlist_head add rcu_dereference_protected to silence warning: net/xfrm/xfrm_policy.c:4265:36: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_policy.c:4265:36: expected struct hlist_head n net/xfrm/xfrm_policy.c:4265:36: got struct hlist_head [noderef] __rcu table The netns is being created, no concurrent access is possible yet. In xfrm_policy_fini, net is going away, there shouldn't be any concurrent changes to the hashtables, so we can use rcu_dereference_protected to silence warnings: net/xfrm/xfrm_policy.c:4291:17: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_policy.c:4291:17: expected struct hlist_head const h net/xfrm/xfrm_policy.c:4291:17: got struct hlist_head [noderef] __rcu table net/xfrm/xfrm_policy.c:4292:36: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_policy.c:4292:36: expected struct hlist_head n net/xfrm/xfrm_policy.c:4292:36: got struct hlist_head [noderef] __rcu table Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: state: silence sparse warnings during netns exit	Sabrina Dubroca	-8/+10
	Silence sparse warnings in xfrm_state_fini: net/xfrm/xfrm_state.c:3327:9: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_state.c:3327:9: expected struct hlist_head const h net/xfrm/xfrm_state.c:3327:9: got struct hlist_head [noderef] __rcu state_byseq Add xfrm_state_deref_netexit() to wrap those calls. The netns is going away, we don't have to worry about the state_by* pointers being changed behind our backs. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto	Sabrina Dubroca	-9/+2
	xfrm_state_lookup_spi_proto is called under xfrm_state_lock by xfrm_alloc_spi, no need to take a reference on the state and pretend to be under RCU. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock	Sabrina Dubroca	-11/+11
	We're under xfrm_state_lock for all those walks, we can use xfrm_state_deref_prot to silence sparse warnings such as: net/xfrm/xfrm_state.c:933:17: warning: dereference of noderef expression Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: state: fix sparse warnings around XFRM_STATE_INSERT	Sabrina Dubroca	-11/+19
	We're under xfrm_state_lock in all those cases, use xfrm_state_deref_prot(state_by) to avoid sparse warnings: net/xfrm/xfrm_state.c:2597:25: warning: cast removes address space '__rcu' of expression net/xfrm/xfrm_state.c:2597:25: warning: incorrect type in argument 2 (different address spaces) net/xfrm/xfrm_state.c:2597:25: expected struct hlist_head h net/xfrm/xfrm_state.c:2597:25: got struct hlist_head [noderef] __rcu * Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: state: fix sparse warnings in xfrm_state_init	Sabrina Dubroca	-12/+20
	Use rcu_assign_pointer, and tmp variables for freeing on the error path without accessing net->xfrm.state_by*. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12	xfrm: state: fix sparse warnings on xfrm_state_hold_rcu	Sabrina Dubroca	-1/+1
	In all callers, x is not an __rcu pointer. We can drop the annotation to avoid sparse warnings: net/xfrm/xfrm_state.c:58:39: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_state.c:58:39: expected struct refcount_struct [usertype] r net/xfrm/xfrm_state.c:58:39: got struct refcount_struct [noderef] __rcu net/xfrm/xfrm_state.c:1166:42: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_state.c:1166:42: expected struct xfrm_state [noderef] __rcu x net/xfrm/xfrm_state.c:1166:42: got struct xfrm_state [assigned] x (repeated for each caller) Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-09	xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly	Fernando Fernandez Mancera	-0/+6
	In iptfs_reassem_cont(), IP-TFS attempts to append data to the new inner packet 'newskb' that is being reassembled. First a zero-copy approach is tried if it succeeds then newskb becomes non-linear. When a subsequent fragment in the same datagram does not meet the fast-path conditions, a memory copy is performed. It calls skb_put() to append the data and as newskb is non-linear it triggers SKB_LINEAR_ASSERT check. Oops: invalid opcode: 0000 [#1] SMP NOPTI [...] RIP: 0010:skb_put+0x3c/0x40 [...] Call Trace: <IRQ> iptfs_reassem_cont+0x1ab/0x5e0 [xfrm_iptfs] iptfs_input_ordered+0x2af/0x380 [xfrm_iptfs] iptfs_input+0x122/0x3e0 [xfrm_iptfs] xfrm_input+0x91e/0x1a50 xfrm4_esp_rcv+0x3a/0x110 ip_protocol_deliver_rcu+0x1d7/0x1f0 ip_local_deliver_finish+0xbe/0x1e0 __netif_receive_skb_core.constprop.0+0xb56/0x1120 __netif_receive_skb_list_core+0x133/0x2b0 netif_receive_skb_list_internal+0x1ff/0x3f0 napi_complete_done+0x81/0x220 virtnet_poll+0x9d6/0x116e [virtio_net] __napi_poll.constprop.0+0x2b/0x270 net_rx_action+0x162/0x360 handle_softirqs+0xdc/0x510 __irq_exit_rcu+0xe7/0x110 irq_exit_rcu+0xe/0x20 common_interrupt+0x85/0xa0 </IRQ> <TASK> Fix this by checking if the skb is non-linear. If it is, linearize it by calling skb_linearize(). As the initial allocation of newskb originally reserved enough tailroom for the entire reassembled packet we do not need to check if we have enough tailroom or extend it. Fixes: 5f2b6a909574 ("xfrm: iptfs: add skb-fragment sharing code") Reported-by: Hao Long <me@imlonghao.com> Closes: https://lore.kernel.org/netdev/DGRCO9SL0T5U.JTINSHJQ9KPK@imlonghao.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-02	net: remove addr_len argument of recvmsg() handlers	Eric Dumazet	-1/+1
	Use msg->msg_namelen as a place holder instead of a temporary variable, notably in inet[6]_recvmsg(). This removes stack canaries and allows tail-calls. $ scripts/bloat-o-meter -t vmlinux.old vmlinux add/remove: 0/0 grow/shrink: 2/19 up/down: 26/-532 (-506) Function old new delta rawv6_recvmsg 744 767 +23 vsock_dgram_recvmsg 55 58 +3 vsock_connectible_recvmsg 50 47 -3 unix_stream_recvmsg 161 158 -3 unix_seqpacket_recvmsg 62 59 -3 unix_dgram_recvmsg 42 39 -3 tcp_recvmsg 546 543 -3 mptcp_recvmsg 1568 1565 -3 ping_recvmsg 806 800 -6 tcp_bpf_recvmsg_parser 983 974 -9 ip_recv_error 588 576 -12 ipv6_recv_rxpmtu 442 428 -14 udp_recvmsg 1243 1224 -19 ipv6_recv_error 1046 1024 -22 udpv6_recvmsg 1487 1461 -26 raw_recvmsg 465 437 -28 udp_bpf_recvmsg 1027 984 -43 sock_common_recvmsg 103 27 -76 inet_recvmsg 257 175 -82 inet6_recvmsg 257 175 -82 tcp_bpf_recvmsg 663 568 -95 Total: Before=25143834, After=25143328, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260227151120.1346573-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02	xfrm: iptfs: validate inner IPv4 header length in IPTFS payload	Roshan Kumar	-0/+5
	Add validation of the inner IPv4 packet tot_len and ihl fields parsed from decrypted IPTFS payloads in __input_process_payload(). A crafted ESP packet containing an inner IPv4 header with tot_len=0 causes an infinite loop: iplen=0 leads to capturelen=min(0, remaining)=0, so the data offset never advances and the while(data < tail) loop never terminates, spinning forever in softirq context. Reject inner IPv4 packets where tot_len < ihl4 or ihl4 < sizeof(struct iphdr), which catches both the tot_len=0 case and malformed ihl values. The normal IP stack performs this validation in ip_rcv_core(), but IPTFS extracts and processes inner packets before they reach that layer. Reported-by: Roshan Kumar <roshaen09@gmail.com> Fixes: 6c82d2433671 ("xfrm: iptfs: add basic receive packet (tunnel egress) handling") Cc: stable@vger.kernel.org Signed-off-by: Roshan Kumar <roshaen09@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-26	Merge tag 'net-7.0-rc2' of ↵	Linus Torvalds	-4/+21
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: - bnxt_en: fix deleting of Ntuple filters - wan: farsync: fix use-after-free bugs caused by unfinished tasklets - xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: - mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - gve: fix incorrect buffer cleanup for QPL - team: avoid NETDEV_CHANGEMTU event when unregistering slave - usb: validate USB endpoints" * tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) netfilter: nf_conntrack_h323: fix OOB read in decode_choice() dpaa2-switch: validate num_ifs to prevent out-of-bounds write net: consume xmit errors of GSO frames vsock: document write-once behavior of the child_ns_mode sysctl vsock: lock down child_ns_mode as write-once selftests/vsock: change tests to respect write-once child ns mode net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query net/mlx5: Fix missing devlink lock in SRIOV enable error path net/mlx5: E-switch, Clear legacy flag when moving to switchdev net/mlx5: LAG, disable MPESW in lag_disable_change() net/mlx5: DR, Fix circular locking dependency in dump selftests: team: Add a reference count leak test team: avoid NETDEV_CHANGEMTU event when unregistering slave net: mana: Fix double destroy_workqueue on service rescan PCI path MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 tcp: re-enable acceptance of FIN packets when RWIN is 0 vsock: Use container_of() to get net namespace in sysctl handlers net: usb: kaweth: validate USB endpoints ...
2026-02-26	xfrm: update outdated comment	kexinsun	-1/+1
	The function __xfrm4_bundle_create() was refactored into xfrm_bundle_create() (among others) by commit 25ee3286dcbc ("xfrm: Merge common code into xfrm_bundle_create"). The responsibility for setting dst->obsolete to DST_OBSOLETE_FORCE_CHK now lives in xfrm_bundle_create(). Update the comment accordingly. Signed-off-by: kexinsun <kexinsun@smail.nju.edu.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-25	xfrm: call xdo_dev_state_delete during state update	Sabrina Dubroca	-0/+1
	When we update an SA, we construct a new state and call xdo_dev_state_add, but never insert it. The existing state is updated, then we immediately destroy the new state. Since we haven't added it, we don't go through the standard state delete code, and we're skipping removing it from the device (but xdo_dev_state_free will get called when we destroy the temporary state). This is similar to commit c5d4d7d83165 ("xfrm: Fix deletion of offloaded SAs on failure."). Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-25	xfrm: fix the condition on x->pcpu_num in xfrm_sa_len	Sabrina Dubroca	-1/+1
	pcpu_num = 0 is a valid value. The marker for "unset pcpu_num" which makes copy_to_user_state_extra not add the XFRMA_SA_PCPU attribute is UINT_MAX. Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-25	xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi	Sabrina Dubroca	-1/+4
	We're returning an error caused by invalid user input without setting an extack. Add one. Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-22	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses	Kees Cook	-3/+2
	Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument	Linus Torvalds	-3/+3
	This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21	treewide: Replace kmalloc with kmalloc_obj for non-scalar types	Kees Cook	-10/+11
	This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-20	Merge tag 'ipsec-2026-02-20' of ↵	Jakub Kicinski	-3/+20
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-02-20 1) Check the value of ipv6_dev_get_saddr() to fix an uninitialized saddr in xfrm6_get_saddr(). From Jiayuan Chen. 2) Skip the templates check for packet offload in tunnel mode. Is was already done by the hardware and causes an unexpected XfrmInTmplMismatch increase. From Leon Romanovsky. 3) Fix a unregister_netdevice stall due to not dropped refcounts by always flushing xfrm state and policy on a NETDEV_UNREGISTER event. From Tetsuo Handa. * tag 'ipsec-2026-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: always flush state and policy upon NETDEV_UNREGISTER event xfrm: skip templates check for packet offload tunnel mode xfrm6: fix uninitialized saddr in xfrm6_get_saddr() ==================== Link: https://patch.msgid.link/20260220094133.14219-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-19	espintcp: Fix race condition in espintcp_close()	Hyunwoo Kim	-1/+1
	This issue was discovered during a code audit. After cancel_work_sync() is called from espintcp_close(), espintcp_tx_work() can still be scheduled from paths such as the Delayed ACK handler or ksoftirqd. As a result, the espintcp_tx_work() worker may dereference a freed espintcp ctx or sk. The following is a simple race scenario: cpu0 cpu1 espintcp_close() cancel_work_sync(&ctx->work); espintcp_write_space() schedule_work(&ctx->work); To prevent this race condition, cancel_work_sync() is replaced with disable_work_sync(). Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/aZSie7rEdh9Nu0eM@v4bel Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10	Merge tag 'bpf-next-7.0' of ↵	Linus Torvalds	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support associating BPF program with struct_ops (Amery Hung) - Switch BPF local storage to rqspinlock and remove recursion detection counters which were causing false positives (Amery Hung) - Fix live registers marking for indirect jumps (Anton Protopopov) - Introduce execution context detection BPF helpers (Changwoo Min) - Improve verifier precision for 32bit sign extension pattern (Cupertino Miranda) - Optimize BTF type lookup by sorting vmlinux BTF and doing binary search (Donglin Peng) - Allow states pruning for misc/invalid slots in iterator loops (Eduard Zingerman) - In preparation for ASAN support in BPF arenas teach libbpf to move global BPF variables to the end of the region and enable arena kfuncs while holding locks (Emil Tsalapatis) - Introduce support for implicit arguments in kfuncs and migrate a number of them to new API. This is a prerequisite for cgroup sub-schedulers in sched-ext (Ihor Solodrai) - Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen) - Fix ORC stack unwind from kprobe_multi (Jiri Olsa) - Speed up fentry attach by using single ftrace direct ops in BPF trampolines (Jiri Olsa) - Require frozen map for calculating map hash (KP Singh) - Fix lock entry creation in TAS fallback in rqspinlock (Kumar Kartikeya Dwivedi) - Allow user space to select cpu in lookup/update operations on per-cpu array and hash maps (Leon Hwang) - Make kfuncs return trusted pointers by default (Matt Bobrowski) - Introduce "fsession" support where single BPF program is executed upon entry and exit from traced kernel function (Menglong Dong) - Allow bpf_timer and bpf_wq use in all programs types (Mykyta Yatsenko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei Starovoitov) - Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their definition across the tree (Puranjay Mohan) - Allow BPF arena calls from non-sleepable context (Puranjay Mohan) - Improve register id comparison logic in the verifier and extend linked registers with negative offsets (Puranjay Mohan) - In preparation for BPF-OOM introduce kfuncs to access memcg events (Roman Gushchin) - Use CFI compatible destructor kfunc type (Sami Tolvanen) - Add bitwise tracking for BPF_END in the verifier (Tianci Cao) - Add range tracking for BPF_DIV and BPF_MOD in the verifier (Yazhou Tang) - Make BPF selftests work with 64k page size (Yonghong Song) * tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits) selftests/bpf: Fix outdated test on storage->smap selftests/bpf: Choose another percpu variable in bpf for btf_dump test selftests/bpf: Remove test_task_storage_map_stress_lookup selftests/bpf: Update task_local_storage/task_storage_nodeadlock test selftests/bpf: Update task_local_storage/recursion test selftests/bpf: Update sk_storage_omem_uncharge test bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy} bpf: Support lockless unlink when freeing map or local storage bpf: Prepare for bpf_selem_unlink_nofail() bpf: Remove unused percpu counter from bpf_local_storage_map_free bpf: Remove cgroup local storage percpu counter bpf: Remove task local storage percpu counter bpf: Change local_storage->lock and b->lock to rqspinlock bpf: Convert bpf_selem_unlink to failable bpf: Convert bpf_selem_link_map to failable bpf: Convert bpf_selem_unlink_map to failable bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage selftests/xsk: fix number of Tx frags in invalid packet selftests/xsk: properly handle batch ending in the middle of a packet bpf: Prevent reentrance into call_rcu_tasks_trace() ...
2026-02-09	xfrm: always flush state and policy upon NETDEV_UNREGISTER event	Tetsuo Handa	-1/+11
	syzbot is reporting that "struct xfrm_state" refcount is leaking. unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 2 ref_tracker: netdev@ffff888052f24618 has 1/1 users at __netdev_tracker_alloc include/linux/netdevice.h:4400 [inline] netdev_tracker_alloc include/linux/netdevice.h:4412 [inline] xfrm_dev_state_add+0x3a5/0x1080 net/xfrm/xfrm_device.c:316 xfrm_state_construct net/xfrm/xfrm_user.c:986 [inline] xfrm_add_sa+0x34ff/0x5fa0 net/xfrm/xfrm_user.c:1022 xfrm_user_rcv_msg+0x58e/0xc00 net/xfrm/xfrm_user.c:3507 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2550 xfrm_netlink_rcv+0x71/0x90 net/xfrm/xfrm_user.c:3529 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa5d/0xc30 net/socket.c:2592 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2646 __sys_sendmsg+0x16d/0x220 net/socket.c:2678 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f This is because commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") implemented xfrm_dev_unregister() as no-op despite xfrm_dev_state_add() from xfrm_state_construct() acquires a reference to "struct net_device". I guess that that commit expected that NETDEV_DOWN event is fired before NETDEV_UNREGISTER event fires, and also assumed that xfrm_dev_state_add() is called only if (dev->features & NETIF_F_HW_ESP) != 0. Sabrina Dubroca identified steps to reproduce the same symptoms as below. echo 0 > /sys/bus/netdevsim/new_device dev=$(ls -1 /sys/bus/netdevsim/devices/netdevsim0/net/) ip xfrm state add src 192.168.13.1 dst 192.168.13.2 proto esp \ spi 0x1000 mode tunnel aead 'rfc4106(gcm(aes))' $key 128 \ offload crypto dev $dev dir out ethtool -K $dev esp-hw-offload off echo 0 > /sys/bus/netdevsim/del_device Like these steps indicate, the NETIF_F_HW_ESP bit can be cleared after xfrm_dev_state_add() acquired a reference to "struct net_device". Also, xfrm_dev_state_add() does not check for the NETIF_F_HW_ESP bit when acquiring a reference to "struct net_device". Commit 03891f820c21 ("xfrm: handle NETDEV_UNREGISTER for xfrm device") re-introduced the NETDEV_UNREGISTER event to xfrm_dev_event(), but that commit for unknown reason chose to share xfrm_dev_down() between the NETDEV_DOWN event and the NETDEV_UNREGISTER event. I guess that that commit missed the behavior in the previous paragraph. Therefore, we need to re-introduce xfrm_dev_unregister() in order to release the reference to "struct net_device" by unconditionally flushing state and policy. Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-02	xfrm: skip templates check for packet offload tunnel mode	Leon Romanovsky	-2/+9
	In packet offload, hardware is responsible to check templates. The result of its operation is forwarded through secpath by relevant drivers. That secpath is actually removed in __xfrm_policy_check2(). In case packet is forwarded, this secpath is reset in RX, but pushed again to TX where policy is rechecked again against dummy secpath in xfrm_policy_ok(). Such situation causes to unexpected XfrmInTmplMismatch increase. As a solution, simply skip template mismatch check. Fixes: 600258d555f0 ("xfrm: delete intermediate secpath entry in packet offload mode") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-01-02	bpf: xfrm: drop dead NULL check in bpf_xdp_get_xfrm_state()	Puranjay Mohan	-1/+1
	As KF_TRUSTED_ARGS is now considered the default for all kfuncs, the opts parameter in bpf_xdp_get_xfrm_state() can never be NULL. Verifier will detect this at load time and will not allow passing NULL to this function. This matches the documentation above the kfunc that says this parameter (opts) Cannot be NULL. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20260102180038.2708325-5-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-15	xfrm: set ipv4 no_pmtu_disc flag only on output sa when direction is set	Antony Antony	-0/+1
	The XFRM_STATE_NOPMTUDISC flag is only meaningful for output SAs, but it was being applied regardless of the SA direction when the sysctl ip_no_pmtu_disc is enabled. This can unintentionally affect input SAs. Limit setting XFRM_STATE_NOPMTUDISC to output SAs when the SA direction is configured. Closes: https://github.com/strongswan/strongswan/issues/2946 Fixes: a4a87fa4e96c ("xfrm: Add Direction to the SA in or out") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-11-20	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	-12/+36
	Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.") 45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-18	Merge tag 'ipsec-2025-11-18' of ↵	Jakub Kicinski	-12/+36
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2025-11-18 1) Misc fixes for xfrm_state creation/modification/deletion. Patchset from Sabrina Dubroca. 2) Fix inner packet family determination for xfrm offloads. From Jianbo Liu. 3) Don't push locally generated packets directly to L2 tunnel mode offloading, they still need processing from the standard xfrm path. From Jianbo Liu. 4) Fix memory leaks in xfrm_add_acquire for policy offloads and policy security contexts. From Zilin Guan. * tag 'ipsec-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix memory leak in xfrm_add_acquire() xfrm: Prevent locally generated packets from direct output in tunnel mode xfrm: Determine inner GSO type from packet inner protocol xfrm: Check inner packet family directly from skb_dst xfrm: check all hash buckets for leftover states during netns deletion xfrm: set err and extack on failure to create pcpu SA xfrm: call xfrm_dev_state_delete when xfrm_state_migrate fails to add the state xfrm: make state as DEAD before final put when migrate fails xfrm: also call xfrm_state_delete_tunnel at destroy time for states that were never added xfrm: drop SA reference in xfrm_state_update if dir doesn't match ==================== Link: https://patch.msgid.link/20251118085344.2199815-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-14	xfrm: fix memory leak in xfrm_add_acquire()	Zilin Guan	-0/+3
	The xfrm_add_acquire() function constructs an xfrm policy by calling xfrm_policy_construct(). This allocates the policy structure and potentially associates a security context and a device policy with it. However, at the end of the function, the policy object is freed using only kfree() . This skips the necessary cleanup for the security context and device policy, leading to a memory leak. To fix this, invoke the proper cleanup functions xfrm_dev_policy_delete(), xfrm_dev_policy_free(), and security_xfrm_policy_free() before freeing the policy object. This approach mirrors the error handling path in xfrm_add_policy(), ensuring that all associated resources are correctly released. Fixes: 980ebd25794f ("[IPSEC]: Sync series - acquire insert") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>