<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v6.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-09-24T00:01:05Z</updated>
<entry>
<title>nexthop: Forbid FDB status change while nexthop is in a group</title>
<updated>2025-09-24T00:01:05Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2025-09-21T15:08:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=390b3a300d7872cef9588f003b204398be69ce08'/>
<id>urn:sha1:390b3a300d7872cef9588f003b204398be69ce08</id>
<content type='text'>
The kernel forbids the creation of non-FDB nexthop groups with FDB
nexthops:

 # ip nexthop add id 1 via 192.0.2.1 fdb
 # ip nexthop add id 2 group 1
 Error: Non FDB nexthop group cannot have fdb nexthops.

And vice versa:

 # ip nexthop add id 3 via 192.0.2.2 dev dummy1
 # ip nexthop add id 4 group 3 fdb
 Error: FDB nexthop group can only have fdb nexthops.

However, as long as no routes are pointing to a non-FDB nexthop group,
the kernel allows changing the type of a nexthop from FDB to non-FDB and
vice versa:

 # ip nexthop add id 5 via 192.0.2.2 dev dummy1
 # ip nexthop add id 6 group 5
 # ip nexthop replace id 5 via 192.0.2.2 fdb
 # echo $?
 0

This configuration is invalid and can result in a NPD [1] since FDB
nexthops are not associated with a nexthop device:

 # ip route add 198.51.100.1/32 nhid 6
 # ping 198.51.100.1

Fix by preventing nexthop FDB status change while the nexthop is in a
group:

 # ip nexthop add id 7 via 192.0.2.2 dev dummy1
 # ip nexthop add id 8 group 7
 # ip nexthop replace id 7 via 192.0.2.2 fdb
 Error: Cannot change nexthop FDB status while in a group.

[1]
BUG: kernel NULL pointer dereference, address: 00000000000003c0
[...]
Oops: Oops: 0000 [#1] SMP
CPU: 6 UID: 0 PID: 367 Comm: ping Not tainted 6.17.0-rc6-virtme-gb65678cacc03 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014
RIP: 0010:fib_lookup_good_nhc+0x1e/0x80
[...]
Call Trace:
 &lt;TASK&gt;
 fib_table_lookup+0x541/0x650
 ip_route_output_key_hash_rcu+0x2ea/0x970
 ip_route_output_key_hash+0x55/0x80
 __ip4_datagram_connect+0x250/0x330
 udp_connect+0x2b/0x60
 __sys_connect+0x9c/0xd0
 __x64_sys_connect+0x18/0x20
 do_syscall_64+0xa4/0x2a0
 entry_SYSCALL_64_after_hwframe+0x4b/0x53

Fixes: 38428d68719c ("nexthop: support for fdb ecmp nexthops")
Reported-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c9a4d2.050a0220.3c6139.0e63.GAE@google.com/
Tested-by: syzbot+6596516dd2b635ba2350@syzkaller.appspotmail.com
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20250921150824.149157-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: Clear tcp_sk(sk)-&gt;fastopen_rsk in tcp_disconnect().</title>
<updated>2025-09-17T23:01:52Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2025-09-15T17:56:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=45c8a6cc2bcd780e634a6ba8e46bffbdf1fc5c01'/>
<id>urn:sha1:45c8a6cc2bcd780e634a6ba8e46bffbdf1fc5c01</id>
<content type='text'>
syzbot reported the splat below where a socket had tcp_sk(sk)-&gt;fastopen_rsk
in the TCP_ESTABLISHED state. [0]

syzbot reused the server-side TCP Fast Open socket as a new client before
the TFO socket completes 3WHS:

  1. accept()
  2. connect(AF_UNSPEC)
  3. connect() to another destination

As of accept(), sk-&gt;sk_state is TCP_SYN_RECV, and tcp_disconnect() changes
it to TCP_CLOSE and makes connect() possible, which restarts timers.

Since tcp_disconnect() forgot to clear tcp_sk(sk)-&gt;fastopen_rsk, the
retransmit timer triggered the warning and the intended packet was not
retransmitted.

Let's call reqsk_fastopen_remove() in tcp_disconnect().

[0]:
WARNING: CPU: 2 PID: 0 at net/ipv4/tcp_timer.c:542 tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7))
Modules linked in:
CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.17.0-rc5-g201825fb4278 #62 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7))
Code: 41 55 41 54 55 53 48 8b af b8 08 00 00 48 89 fb 48 85 ed 0f 84 55 01 00 00 0f b6 47 12 3c 03 74 0c 0f b6 47 12 3c 04 74 04 90 &lt;0f&gt; 0b 90 48 8b 85 c0 00 00 00 48 89 ef 48 8b 40 30 e8 6a 4f 06 3e
RSP: 0018:ffffc900002f8d40 EFLAGS: 00010293
RAX: 0000000000000002 RBX: ffff888106911400 RCX: 0000000000000017
RDX: 0000000002517619 RSI: ffffffff83764080 RDI: ffff888106911400
RBP: ffff888106d5c000 R08: 0000000000000001 R09: ffffc900002f8de8
R10: 00000000000000c2 R11: ffffc900002f8ff8 R12: ffff888106911540
R13: ffff888106911480 R14: ffff888106911840 R15: ffffc900002f8de0
FS:  0000000000000000(0000) GS:ffff88907b768000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8044d69d90 CR3: 0000000002c30003 CR4: 0000000000370ef0
Call Trace:
 &lt;IRQ&gt;
 tcp_write_timer (net/ipv4/tcp_timer.c:738)
 call_timer_fn (kernel/time/timer.c:1747)
 __run_timers (kernel/time/timer.c:1799 kernel/time/timer.c:2372)
 timer_expire_remote (kernel/time/timer.c:2385 kernel/time/timer.c:2376 kernel/time/timer.c:2135)
 tmigr_handle_remote_up (kernel/time/timer_migration.c:944 kernel/time/timer_migration.c:1035)
 __walk_groups.isra.0 (kernel/time/timer_migration.c:533 (discriminator 1))
 tmigr_handle_remote (kernel/time/timer_migration.c:1096)
 handle_softirqs (./arch/x86/include/asm/jump_label.h:36 ./include/trace/events/irq.h:142 kernel/softirq.c:580)
 irq_exit_rcu (kernel/softirq.c:614 kernel/softirq.c:453 kernel/softirq.c:680 kernel/softirq.c:696)
 sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1050 (discriminator 35) arch/x86/kernel/apic/apic.c:1050 (discriminator 35))
 &lt;/IRQ&gt;

Fixes: 8336886f786f ("tcp: TCP Fast Open Server - support TFO listeners")
Reported-by: syzkaller &lt;syzkaller@googlegroups.com&gt;
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20250915175800.118793-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/tcp: Fix a NULL pointer dereference when using TCP-AO with TCP_REPAIR</title>
<updated>2025-09-14T19:49:53Z</updated>
<author>
<name>Anderson Nascimento</name>
<email>anderson@allelesecurity.com</email>
</author>
<published>2025-09-11T23:07:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2e7bba08923ebc675b1f0e0e0959e68e53047838'/>
<id>urn:sha1:2e7bba08923ebc675b1f0e0e0959e68e53047838</id>
<content type='text'>
A NULL pointer dereference can occur in tcp_ao_finish_connect() during a
connect() system call on a socket with a TCP-AO key added and TCP_REPAIR
enabled.

The function is called with skb being NULL and attempts to dereference it
on tcp_hdr(skb)-&gt;seq without a prior skb validation.

Fix this by checking if skb is NULL before dereferencing it.

The commentary is taken from bpf_skops_established(), which is also called
in the same flow. Unlike the function being patched,
bpf_skops_established() validates the skb before dereferencing it.

int main(void){
	struct sockaddr_in sockaddr;
	struct tcp_ao_add tcp_ao;
	int sk;
	int one = 1;

	memset(&amp;sockaddr,'\0',sizeof(sockaddr));
	memset(&amp;tcp_ao,'\0',sizeof(tcp_ao));

	sk = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

	sockaddr.sin_family = AF_INET;

	memcpy(tcp_ao.alg_name,"cmac(aes128)",12);
	memcpy(tcp_ao.key,"ABCDEFGHABCDEFGH",16);
	tcp_ao.keylen = 16;

	memcpy(&amp;tcp_ao.addr,&amp;sockaddr,sizeof(sockaddr));

	setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &amp;tcp_ao,
	sizeof(tcp_ao));
	setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &amp;one, sizeof(one));

	sockaddr.sin_family = AF_INET;
	sockaddr.sin_port = htobe16(123);

	inet_aton("127.0.0.1", &amp;sockaddr.sin_addr);

	connect(sk,(struct sockaddr *)&amp;sockaddr,sizeof(sockaddr));

return 0;
}

$ gcc tcp-ao-nullptr.c -o tcp-ao-nullptr -Wall
$ unshare -Urn

BUG: kernel NULL pointer dereference, address: 00000000000000b6
PGD 1f648d067 P4D 1f648d067 PUD 1982e8067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 11/12/2020
RIP: 0010:tcp_ao_finish_connect (net/ipv4/tcp_ao.c:1182)

Fixes: 7c2ffaf21bd6 ("net/tcp: Calculate TCP-AO traffic keys")
Signed-off-by: Anderson Nascimento &lt;anderson@allelesecurity.com&gt;
Reviewed-by: Dmitry Safonov &lt;0x7f454c46@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250911230743.2551-3-anderson@allelesecurity.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2025-09-11T15:54:42Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-09-11T15:54:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=db87bd2ad1f736c2f7ab231f9b40c885934f6b2c'/>
<id>urn:sha1:db87bd2ad1f736c2f7ab231f9b40c885934f6b2c</id>
<content type='text'>
Pull networking fixes from Paolo Abeni:
 "Including fixes from CAN, netfilter and wireless.

  We have an IPv6 routing regression with the relevant fix still a WiP.
  This includes a last-minute revert to avoid more problems.

  Current release - new code bugs:

   - wifi: nl80211: completely disable per-link stats for now

  Previous releases - regressions:

   - dev_ioctl: take ops lock in hwtstamp lower paths

   - netfilter:
       - fix spurious set lookup failures
       - fix lockdep splat due to missing annotation

   - genetlink: fix genl_bind() invoking bind() after -EPERM

   - phy: transfer phy_config_inband() locking responsibility to phylink

   - can: xilinx_can: fix use-after-free of transmitted SKB

   - hsr: fix lock warnings

   - eth:
       - igb: fix NULL pointer dereference in ethtool loopback test
       - i40e: fix Jumbo Frame support after iPXE boot
       - macsec: sync features on RTM_NEWLINK

  Previous releases - always broken:

   - tunnels: reset the GSO metadata before reusing the skb

   - mptcp: make sync_socket_options propagate SOCK_KEEPOPEN

   - can: j1939: implement NETDEV_UNREGISTER notification hanidler

   - wifi: ath12k: fix WMI TLV header misalignment"

* tag 'net-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"
  hsr: hold rcu and dev lock for hsr_get_port_ndev
  hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr
  hsr: use rtnl lock when iterating over ports
  wifi: nl80211: completely disable per-link stats for now
  net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups
  net: ethtool: fix wrong type used in struct kernel_ethtool_ts_info
  MAINTAINERS: add Phil as netfilter reviewer
  netfilter: nf_tables: restart set lookup on base_seq change
  netfilter: nf_tables: make nft_set_do_lookup available unconditionally
  netfilter: nf_tables: place base_seq in struct net
  netfilter: nft_set_rbtree: continue traversal if element is inactive
  netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
  netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
  can: rcar_can: rcar_can_resume(): fix s2ram with PSCI
  can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
  can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
  can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
  can: j1939: implement NETDEV_UNREGISTER notification handler
  selftests: can: enable CONFIG_CAN_VCAN as a module
  ...
</content>
</entry>
<entry>
<title>Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf</title>
<updated>2025-09-11T14:54:16Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-09-11T14:54:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02ffd6f89c50ca0bff0c4578949ff99e70451757'/>
<id>urn:sha1:02ffd6f89c50ca0bff0c4578949ff99e70451757</id>
<content type='text'>
Pull bpf fixes from Alexei Starovoitov:
 "A number of fixes accumulated due to summer vacations

   - Fix out-of-bounds dynptr write in bpf_crypto_crypt() kfunc which
     was misidentified as a security issue (Daniel Borkmann)

   - Update the list of BPF selftests maintainers (Eduard Zingerman)

   - Fix selftests warnings with icecc compiler (Ilya Leoshkevich)

   - Disable XDP/cpumap direct return optimization (Jesper Dangaard
     Brouer)

   - Fix unexpected get_helper_proto() result in unusual configuration
     BPF_SYSCALL=y and BPF_EVENTS=n (Jiri Olsa)

   - Allow fallback to interpreter when JIT support is limited (KaFai
     Wan)

   - Fix rqspinlock and choose trylock fallback for NMI waiters. Pick
     the simplest fix. More involved fix is targeted bpf-next (Kumar
     Kartikeya Dwivedi)

   - Fix cleanup when tcp_bpf_send_verdict() fails to allocate
     psock-&gt;cork (Kuniyuki Iwashima)

   - Disallow bpf_timer in PREEMPT_RT for now. Proper solution is being
     discussed for bpf-next. (Leon Hwang)

   - Fix XSK cq descriptor production (Maciej Fijalkowski)

   - Tell memcg to use allow_spinning=false path in bpf_timer_init() to
     avoid lockup in cgroup_file_notify() (Peilin Ye)

   - Fix bpf_strnstr() to handle suffix match cases (Rong Tao)"

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Skip timer cases when bpf_timer is not supported
  bpf: Reject bpf_timer for PREEMPT_RT
  tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock-&gt;cork.
  bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
  bpf: Allow fall back to interpreter for programs with stack size &lt;= 512
  rqspinlock: Choose trylock fallback for NMI waiters
  xsk: Fix immature cq descriptor production
  bpf: Update the list of BPF selftests maintainers
  selftests/bpf: Add tests for bpf_strnstr
  selftests/bpf: Fix "expression result unused" warnings with icecc
  bpf: Fix bpf_strnstr() to handle suffix match cases better
  selftests/bpf: Extend crypto_sanity selftest with invalid dst buffer
  bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt
  bpf: Check the helper function is valid in get_helper_proto
  bpf, cpumap: Disable page_pool direct xdp_return need larger scope
</content>
</entry>
<entry>
<title>tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock-&gt;cork.</title>
<updated>2025-09-10T13:53:56Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2025-09-09T23:26:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a3967baad4d533dc254c31e0d221e51c8d223d58'/>
<id>urn:sha1:a3967baad4d533dc254c31e0d221e51c8d223d58</id>
<content type='text'>
syzbot reported the splat below. [0]

The repro does the following:

  1. Load a sk_msg prog that calls bpf_msg_cork_bytes(msg, cork_bytes)
  2. Attach the prog to a SOCKMAP
  3. Add a socket to the SOCKMAP
  4. Activate fault injection
  5. Send data less than cork_bytes

At 5., the data is carried over to the next sendmsg() as it is
smaller than the cork_bytes specified by bpf_msg_cork_bytes().

Then, tcp_bpf_send_verdict() tries to allocate psock-&gt;cork to hold
the data, but this fails silently due to fault injection + __GFP_NOWARN.

If the allocation fails, we need to revert the sk-&gt;sk_forward_alloc
change done by sk_msg_alloc().

Let's call sk_msg_free() when tcp_bpf_send_verdict fails to allocate
psock-&gt;cork.

The "*copied" also needs to be updated such that a proper error can
be returned to the caller, sendmsg. It fails to allocate psock-&gt;cork.
Nothing has been corked so far, so this patch simply sets "*copied"
to 0.

[0]:
WARNING: net/ipv4/af_inet.c:156 at inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156, CPU#1: syz-executor/5983
Modules linked in:
CPU: 1 UID: 0 PID: 5983 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156
Code: 0f 0b 90 e9 62 fe ff ff e8 7a db b5 f7 90 0f 0b 90 e9 95 fe ff ff e8 6c db b5 f7 90 0f 0b 90 e9 bb fe ff ff e8 5e db b5 f7 90 &lt;0f&gt; 0b 90 e9 e1 fe ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 9f fc
RSP: 0018:ffffc90000a08b48 EFLAGS: 00010246
RAX: ffffffff8a09d0b2 RBX: dffffc0000000000 RCX: ffff888024a23c80
RDX: 0000000000000100 RSI: 0000000000000fff RDI: 0000000000000000
RBP: 0000000000000fff R08: ffff88807e07c627 R09: 1ffff1100fc0f8c4
R10: dffffc0000000000 R11: ffffed100fc0f8c5 R12: ffff88807e07c380
R13: dffffc0000000000 R14: ffff88807e07c60c R15: 1ffff1100fc0f872
FS:  00005555604c4500(0000) GS:ffff888125af1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555604df5c8 CR3: 0000000032b06000 CR4: 00000000003526f0
Call Trace:
 &lt;IRQ&gt;
 __sk_destruct+0x86/0x660 net/core/sock.c:2339
 rcu_do_batch kernel/rcu/tree.c:2605 [inline]
 rcu_core+0xca8/0x1770 kernel/rcu/tree.c:2861
 handle_softirqs+0x286/0x870 kernel/softirq.c:579
 __do_softirq kernel/softirq.c:613 [inline]
 invoke_softirq kernel/softirq.c:453 [inline]
 __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1052 [inline]
 sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1052
 &lt;/IRQ&gt;

Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Reported-by: syzbot+4cabd1d2fa917a456db8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c0b6b5.050a0220.3c6139.0013.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Signed-off-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Link: https://patch.msgid.link/20250909232623.4151337-1-kuniyu@google.com
</content>
</entry>
<entry>
<title>tunnels: reset the GSO metadata before reusing the skb</title>
<updated>2025-09-09T11:03:33Z</updated>
<author>
<name>Antoine Tenart</name>
<email>atenart@kernel.org</email>
</author>
<published>2025-09-04T12:53:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e3c674db356c4303804b2415e7c2b11776cdd8c3'/>
<id>urn:sha1:e3c674db356c4303804b2415e7c2b11776cdd8c3</id>
<content type='text'>
If a GSO skb is sent through a Geneve tunnel and if Geneve options are
added, the split GSO skb might not fit in the MTU anymore and an ICMP
frag needed packet can be generated. In such case the ICMP packet might
go through the segmentation logic (and dropped) later if it reaches a
path were the GSO status is checked and segmentation is required.

This is especially true when an OvS bridge is used with a Geneve tunnel
attached to it. The following set of actions could lead to the ICMP
packet being wrongfully segmented:

1. An skb is constructed by the TCP layer (e.g. gso_type SKB_GSO_TCPV4,
   segs &gt;= 2).

2. The skb hits the OvS bridge where Geneve options are added by an OvS
   action before being sent through the tunnel.

3. When the skb is xmited in the tunnel, the split skb does not fit
   anymore in the MTU and iptunnel_pmtud_build_icmp is called to
   generate an ICMP fragmentation needed packet. This is done by reusing
   the original (GSO!) skb. The GSO metadata is not cleared.

4. The ICMP packet being sent back hits the OvS bridge again and because
   skb_is_gso returns true, it goes through queue_gso_packets...

5. ...where __skb_gso_segment is called. The skb is then dropped.

6. Note that in the above example on re-transmission the skb won't be a
   GSO one as it would be segmented (len &gt; MSS) and the ICMP packet
   should go through.

Fix this by resetting the GSO information before reusing an skb in
iptunnel_pmtud_build_icmp and iptunnel_pmtud_build_icmpv6.

Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
Reported-by: Adrian Moreno &lt;amorenoz@redhat.com&gt;
Signed-off-by: Antoine Tenart &lt;atenart@kernel.org&gt;
Reviewed-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Link: https://patch.msgid.link/20250904125351.159740-1-atenart@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>ipv4: Fix NULL vs error pointer check in inet_blackhole_dev_init()</title>
<updated>2025-09-03T23:58:44Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@linaro.org</email>
</author>
<published>2025-09-02T06:36:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a51160f8da850a65afbf165f5bbac7ffb388bf74'/>
<id>urn:sha1:a51160f8da850a65afbf165f5bbac7ffb388bf74</id>
<content type='text'>
The inetdev_init() function never returns NULL.  Check for error
pointers instead.

Fixes: 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/aLaQWL9NguWmeM1i@stanley.mountain
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>icmp: fix icmp_ndo_send address translation for reply direction</title>
<updated>2025-09-01T19:54:41Z</updated>
<author>
<name>Fabian Bläse</name>
<email>fabian@blaese.de</email>
</author>
<published>2025-08-28T09:14:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c6dd1aa2cbb72b33e0569f3e71d95792beab5042'/>
<id>urn:sha1:c6dd1aa2cbb72b33e0569f3e71d95792beab5042</id>
<content type='text'>
The icmp_ndo_send function was originally introduced to ensure proper
rate limiting when icmp_send is called by a network device driver,
where the packet's source address may have already been transformed
by SNAT.

However, the original implementation only considers the
IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
source address with that of the original-direction tuple. This causes
two problems:

1. For SNAT:
   Reply-direction packets were incorrectly translated using the source
   address of the CT original direction, even though no translation is
   required.

2. For DNAT:
   Reply-direction packets were not handled at all. In DNAT, the original
   direction's destination is translated. Therefore, in the reply
   direction the source address must be set to the reply-direction
   source, so rate limiting works as intended.

Fix this by using the connection direction to select the correct tuple
for source address translation, and adjust the pre-checks to handle
reply-direction packets in case of DNAT.

Additionally, wrap the `ct-&gt;status` access in READ_ONCE(). This avoids
possible KCSAN reports about concurrent updates to `ct-&gt;status`.

Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Fabian Bläse &lt;fabian@blaese.de&gt;
Cc: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Reviewed-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: ipv4: fix regression in local-broadcast routes</title>
<updated>2025-08-28T08:52:30Z</updated>
<author>
<name>Oscar Maes</name>
<email>oscmaes92@gmail.com</email>
</author>
<published>2025-08-27T06:23:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5189446ba995556eaa3755a6e875bc06675b88bd'/>
<id>urn:sha1:5189446ba995556eaa3755a6e875bc06675b88bd</id>
<content type='text'>
Commit 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes")
introduced a regression where local-broadcast packets would have their
gateway set in __mkroute_output, which was caused by fi = NULL being
removed.

Fix this by resetting the fib_info for local-broadcast packets. This
preserves the intended changes for directed-broadcast packets.

Cc: stable@vger.kernel.org
Fixes: 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes")
Reported-by: Brett A C Sheffield &lt;bacs@librecast.net&gt;
Closes: https://lore.kernel.org/regressions/20250822165231.4353-4-bacs@librecast.net
Signed-off-by: Oscar Maes &lt;oscmaes92@gmail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20250827062322.4807-1-oscmaes92@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
</feed>
