<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/netfilter, branch v6.16</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.16</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.16'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-07-17T09:23:33Z</updated>
<entry>
<title>netfilter: nf_conntrack: fix crash due to removal of uninitialised entry</title>
<updated>2025-07-17T09:23:33Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-07-16T18:39:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2d72afb340657f03f7261e9243b44457a9228ac7'/>
<id>urn:sha1:2d72afb340657f03f7261e9243b44457a9228ac7</id>
<content type='text'>
A crash in conntrack was reported while trying to unlink the conntrack
entry from the hash bucket list:
    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
 #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
 #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
 #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in
a partially initialised state:

 ct hlist pointer is garbage; looks like the ct hash value
 (hence crash).
 ct-&gt;status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
 ct-&gt;timeout is 30000 (=30s), which is unexpected.

Everything else looks like normal udp conntrack entry.  If we ignore
ct-&gt;status and pretend its 0, the entry matches those that are newly
allocated but not yet inserted into the hash:
  - ct hlist pointers are overloaded and store/cache the raw tuple hash
  - ct-&gt;timeout matches the relative time expected for a new udp flow
    rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED,
__nf_conntrack_find_get() would have skipped the entry.

Theory is that we did hit following race:

cpu x 			cpu y			cpu z
 found entry E		found entry E
 E is expired		&lt;preemption&gt;
 nf_ct_delete()
 return E to rcu slab
					init_conntrack
					E is re-inited,
					ct-&gt;status set to 0
					reply tuplehash hnnode.pprev
					stores hash value.

cpu y found E right before it was deleted on cpu x.
E is now re-inited on cpu z.  cpu y was preempted before
checking for expiry and/or confirm bit.

					-&gt;refcnt set to 1
					E now owned by skb
					-&gt;timeout set to 30000

If cpu y were to resume now, it would observe E as
expired but would skip E due to missing CONFIRMED bit.

					nf_conntrack_confirm gets called
					sets: ct-&gt;status |= CONFIRMED
					This is wrong: E is not yet added
					to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:
			&lt;resumes&gt;
			nf_ct_expired()
			 -&gt; yes (ct-&gt;timeout is 30s)
			confirmed bit set.

cpu y will try to delete E from the hashtable:
			nf_ct_delete() -&gt; set DYING bit
			__nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash:
cpu z still holds the table bucket lock(s) so y blocks:

			wait for spinlock held by z

					CONFIRMED is set but there is no
					guarantee ct will be added to hash:
					"chaintoolong" or "clash resolution"
					logic both skip the insert step.
					reply hnnode.pprev still stores the
					hash value.

					unlocks spinlock
					return NF_DROP
			&lt;unblocks, then
			 crashes on hlist_nulls_del_rcu pprev&gt;

In case CPU z does insert the entry into the hashtable, cpu y will unlink
E again right away but no crash occurs.

Without 'cpu y' race, 'garbage' hlist is of no consequence:
ct refcnt remains at 1, eventually skb will be free'd and E gets
destroyed via: nf_conntrack_put -&gt; nf_conntrack_destroy -&gt; nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table
insertion but before the unlock.

Pablo points out that the confirm-bit-store could be reordered to happen
before hlist add resp. the timeout fixup, so switch to set_bit and
before_atomic memory barrier to prevent this.

It doesn't matter if other CPUs can observe a newly inserted entry right
before the CONFIRMED bit was set:

Such event cannot be distinguished from above "E is the old incarnation"
case: the entry will be skipped.

Also change nf_ct_should_gc() to first check the confirmed bit.

The gc sequence is:
 1. Check if entry has expired, if not skip to next entry
 2. Obtain a reference to the expired entry.
 3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an
expiry check. After this patch, once the confirmed bit check passes
ct-&gt;timeout has been altered to reflect the absolute 'best before' date
instead of a relative time.  Step 3 will therefore not remove the entry.

Without this change to nf_ct_should_gc() we could still get this sequence:

 1. Check if entry has expired.
 2. Obtain a reference.
 3. Call nf_ct_should_gc() to double-check step 1:
    4 - entry is still observed as expired
    5 - meanwhile, ct-&gt;timeout is corrected to absolute value on other CPU
      and confirm bit gets set
    6 - confirm bit is seen
    7 - valid entry is removed again

First do check 6), then 4) so the gc expiry check always picks up either
confirmed bit unset (entry gets skipped) or expiry re-check failure for
re-inited conntrack objects.

This change cannot be backported to releases before 5.19. Without
commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list")
|= IPS_CONFIRMED line cannot be moved without further changes.

Cc: Razvan Cojocaru &lt;rzvncj@gmail.com&gt;
Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/
Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/
Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>Revert "netfilter: nf_tables: Add notifications for hook changes"</title>
<updated>2025-07-14T13:22:47Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-07-10T16:43:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=36a686c0784fcccdaa4f38b498a9ef0d42ea7cb8'/>
<id>urn:sha1:36a686c0784fcccdaa4f38b498a9ef0d42ea7cb8</id>
<content type='text'>
This reverts commit 465b9ee0ee7bc268d7f261356afd6c4262e48d82.

Such notifications fit better into core or nfnetlink_hook code,
following the NFNL_MSG_HOOK_GET message format.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: flowtable: account for Ethernet header in nf_flow_pppoe_proto()</title>
<updated>2025-07-11T00:12:28Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-07-07T12:45:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=18cdb3d982da8976b28d57691eb256ec5688fad2'/>
<id>urn:sha1:18cdb3d982da8976b28d57691eb256ec5688fad2</id>
<content type='text'>
syzbot found a potential access to uninit-value in nf_flow_pppoe_proto()

Blamed commit forgot the Ethernet header.

BUG: KMSAN: uninit-value in nf_flow_offload_inet_hook+0x7e4/0x940 net/netfilter/nf_flow_table_inet.c:27
  nf_flow_offload_inet_hook+0x7e4/0x940 net/netfilter/nf_flow_table_inet.c:27
  nf_hook_entry_hookfn include/linux/netfilter.h:157 [inline]
  nf_hook_slow+0xe1/0x3d0 net/netfilter/core.c:623
  nf_hook_ingress include/linux/netfilter_netdev.h:34 [inline]
  nf_ingress net/core/dev.c:5742 [inline]
  __netif_receive_skb_core+0x4aff/0x70c0 net/core/dev.c:5837
  __netif_receive_skb_one_core net/core/dev.c:5975 [inline]
  __netif_receive_skb+0xcc/0xac0 net/core/dev.c:6090
  netif_receive_skb_internal net/core/dev.c:6176 [inline]
  netif_receive_skb+0x57/0x630 net/core/dev.c:6235
  tun_rx_batched+0x1df/0x980 drivers/net/tun.c:1485
  tun_get_user+0x4ee0/0x6b40 drivers/net/tun.c:1938
  tun_chr_write_iter+0x3e9/0x5c0 drivers/net/tun.c:1984
  new_sync_write fs/read_write.c:593 [inline]
  vfs_write+0xb4b/0x1580 fs/read_write.c:686
  ksys_write fs/read_write.c:738 [inline]
  __do_sys_write fs/read_write.c:749 [inline]

Reported-by: syzbot+bf6ed459397e307c3ad2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/686bc073.a00a0220.c7b3.0086.GAE@google.com/T/#u
Fixes: 87b3593bed18 ("netfilter: flowtable: validate pppoe header")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Link: https://patch.msgid.link/20250707124517.614489-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: Add notifications for hook changes</title>
<updated>2025-05-23T11:57:14Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-05-21T20:44:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=465b9ee0ee7bc268d7f261356afd6c4262e48d82'/>
<id>urn:sha1:465b9ee0ee7bc268d7f261356afd6c4262e48d82</id>
<content type='text'>
Notify user space if netdev hooks are updated due to netdev add/remove
events. Send minimal notification messages by introducing
NFT_MSG_NEWDEV/DELDEV message types describing a single device only.

Upon NETDEV_CHANGENAME, the callback has no information about the
interface's old name. To provide a clear message to user space, include
the hook's stored interface name in the notification.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: Have a list of nf_hook_ops in nft_hook</title>
<updated>2025-05-23T11:57:13Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-05-21T20:44:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=73319a8ee18b9cf0b2dac87f8521595e0381ba0c'/>
<id>urn:sha1:73319a8ee18b9cf0b2dac87f8521595e0381ba0c</id>
<content type='text'>
Supporting a 1:n relationship between nft_hook and nf_hook_ops is
convenient since a chain's or flowtable's nft_hooks may remain in place
despite matching interfaces disappearing. This stabilizes ruleset dumps
in that regard and opens the possibility to claim newly added interfaces
which match the spec. Also it prepares for wildcard interface specs
since these will potentially match multiple interfaces.

All spots dealing with hook registration are updated to handle a list of
multiple nf_hook_ops, but nft_netdev_hook_alloc() only adds a single
item for now to retain the old behaviour. The only expected functional
change here is how vanishing interfaces are handled: Instead of dropping
the respective nft_hook, only the matching nf_hook_ops are dropped.

To safely remove individual ops from the list in netdev handlers, an
rcu_head is added to struct nf_hook_ops so kfree_rcu() may be used.
There is at least nft_flowtable_find_dev() which may be iterating
through the list at the same time.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}()</title>
<updated>2025-05-23T11:57:12Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-05-21T20:44:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e225376d78fb2d85e99a2436a9e65765dc1ac234'/>
<id>urn:sha1:e225376d78fb2d85e99a2436a9e65765dc1ac234</id>
<content type='text'>
Also a pretty dull wrapper around the hook-&gt;ops.dev comparison for now.
Will search the embedded nf_hook_ops list in future. The ugly cast to
eliminate the const qualifier will vanish then, too.

Since this future list will be RCU-protected, also introduce an _rcu()
variant here.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: nft_fib: consistent l3mdev handling</title>
<updated>2025-05-23T11:57:09Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-05-21T09:38:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9a119669fb1924cd9658c16da39a5a585e129e50'/>
<id>urn:sha1:9a119669fb1924cd9658c16da39a5a585e129e50</id>
<content type='text'>
fib has two modes:
1. Obtain output device according to source or destination address
2. Obtain the type of the address, e.g. local, unicast, multicast.

'fib daddr type' should return 'local' if the address is configured
in this netns or unicast otherwise.

'fib daddr . iif type' should return 'local' if the address is configured
on the input interface or unicast otherwise, i.e. more restrictive.

However, if the interface is part of a VRF, then 'fib daddr type'
returns unicast even if the address is configured on the incoming
interface.

This is broken for both ipv4 and ipv6.

In the ipv4 case, inet_dev_addr_type must only be used if the
'iif' or 'oif' (strict mode) was requested.

Else inet_addr_type_dev_table() needs to be used and the correct
dev argument must be passed as well so the correct fib (vrf) table
is used.

In the ipv6 case, the bug is similar, without strict mode, dev is NULL
so .flowi6_l3mdev will be set to 0.

Add a new 'nft_fib_l3mdev_master_ifindex_rcu()' helper and use that
to init the .l3mdev structure member.

For ipv6, use it from nft_fib6_flowi_init() which gets called from
both the 'type' and the 'route' mode eval functions.

This provides consistent behaviour for all modes for both ipv4 and ipv6:
If strict matching is requested, the input respectively output device
of the netfilter hooks is used.

Otherwise, use skb-&gt;dev to obtain the l3mdev ifindex.

Without this, most type checks in updated nft_fib.sh selftest fail:

  FAIL: did not find veth0 . 10.9.9.1 . local in fibtype4
  FAIL: did not find veth0 . dead:1::1 . local in fibtype6
  FAIL: did not find veth0 . dead:9::1 . local in fibtype6
  FAIL: did not find tvrf . 10.0.1.1 . local in fibtype4
  FAIL: did not find tvrf . 10.9.9.1 . local in fibtype4
  FAIL: did not find tvrf . dead:1::1 . local in fibtype6
  FAIL: did not find tvrf . dead:9::1 . local in fibtype6
  FAIL: fib expression address types match (iif in vrf)

(fib errounously returns 'unicast' for all of them, even
 though all of these addresses are local to the vrf).

Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'nf-next-25-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next</title>
<updated>2025-03-25T15:29:13Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-03-25T15:29:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=00a25cca0d7be87285c5d0acf7ed2a04910559f1'/>
<id>urn:sha1:00a25cca0d7be87285c5d0acf7ed2a04910559f1</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next:

1) Use kvmalloc in xt_hashlimit, from Denis Kirjanov.

2) Tighten nf_conntrack sysctl accepted values for nf_conntrack_max
   and nf_ct_expect_max, from Nicolas Bouchinet.

3) Avoid lookup in nft_fib if socket is available, from Florian Westphal.

4) Initialize struct lsm_context in nfnetlink_queue to avoid
   hypothetical ENOMEM errors, Chenyuan Yang.

5) Use strscpy() instead of _pad when initializing xtables table name,
   kzalloc is already used to initialized the table memory area.
   From Thorsten Blum.

6) Missing socket lookup by conntrack information for IPv6 traffic
   in nft_socket, there is a similar chunk in IPv4, this was never
   added when IPv6 NAT was introduced. From Maxim Mikityanskiy.

7) Fix clang issues with nf_tables CONFIG_MITIGATION_RETPOLINE,
   from WangYuli.

* tag 'nf-next-25-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: Only use nf_skip_indirect_calls() when MITIGATION_RETPOLINE
  netfilter: socket: Lookup orig tuple for IPv6 SNAT
  netfilter: xtables: Use strscpy() instead of strscpy_pad()
  netfilter: nfnetlink_queue: Initialize ctx to avoid memory allocation error
  netfilter: fib: avoid lookup if socket is available
  netfilter: conntrack: Bound nf_conntrack sysctl writes
  netfilter: xt_hashlimit: replace vmalloc calls with kvmalloc
====================

Link: https://patch.msgid.link/20250323100922.59983-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfilter: fib: avoid lookup if socket is available</title>
<updated>2025-03-21T09:12:15Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-02-20T13:07:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eaaff9b6702e99be5d79135f2afa9fc48a0d59e0'/>
<id>urn:sha1:eaaff9b6702e99be5d79135f2afa9fc48a0d59e0</id>
<content type='text'>
In case the fib match is used from the input hook we can avoid the fib
lookup if early demux assigned a socket for us: check that the input
interface matches sk-cached one.

Rework the existing 'lo bypass' logic to first check sk, then
for loopback interface type to elide the fib lookup.

This speeds up fib matching a little, before:
93.08 GBit/s (no rules at all)
75.1  GBit/s ("fib saddr . iif oif missing drop" in prerouting)
75.62 GBit/s ("fib saddr . iif oif missing drop" in input)

After:
92.48 GBit/s (no rules at all)
75.62 GBit/s (fib rule in prerouting)
90.37 GBit/s (fib rule in input).

Numbers for the 'no rules' and 'prerouting' are expected to
closely match in-between runs, the 3rd/input test case exercises the
the 'avoid lookup if cached ifindex in sk matches' case.

Test used iperf3 via veth interface, lo can't be used due to existing
loopback test.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: make destruction work queue pernet</title>
<updated>2025-03-06T12:35:54Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-03-06T03:05:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fb8286562ecfb585e26b033c5e32e6fb85efb0b3'/>
<id>urn:sha1:fb8286562ecfb585e26b033c5e32e6fb85efb0b3</id>
<content type='text'>
The call to flush_work before tearing down a table from the netlink
notifier was supposed to make sure that all earlier updates (e.g. rule
add) that might reference that table have been processed.

Unfortunately, flush_work() waits for the last queued instance.
This could be an instance that is different from the one that we must
wait for.

This is because transactions are protected with a pernet mutex, but the
work item is global, so holding the transaction mutex doesn't prevent
another netns from queueing more work.

Make the work item pernet so that flush_work() will wait for all
transactions queued from this netns.

A welcome side effect is that we no longer need to wait for transaction
objects from foreign netns.

The gc work queue is still global.  This seems to be ok because nft_set
structures are reference counted and each container structure owns a
reference on the net namespace.

The destroy_list is still protected by a global spinlock rather than
pernet one but the hold time is very short anyway.

v2: call cancel_work_sync before reaping the remaining tables (Pablo).

Fixes: 9f6958ba2e90 ("netfilter: nf_tables: unconditionally flush pending work before notifier")
Reported-by: syzbot+5d8c5789c8cb076b2c25@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
</feed>
