<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/netfilter, branch v6.16</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.16</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.16'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-07-17T09:23:33Z</updated>
<entry>
<title>netfilter: nf_conntrack: fix crash due to removal of uninitialised entry</title>
<updated>2025-07-17T09:23:33Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-07-16T18:39:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2d72afb340657f03f7261e9243b44457a9228ac7'/>
<id>urn:sha1:2d72afb340657f03f7261e9243b44457a9228ac7</id>
<content type='text'>
A crash in conntrack was reported while trying to unlink the conntrack
entry from the hash bucket list:
    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
 #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
 #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
 #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in
a partially initialised state:

 ct hlist pointer is garbage; looks like the ct hash value
 (hence crash).
 ct-&gt;status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
 ct-&gt;timeout is 30000 (=30s), which is unexpected.

Everything else looks like normal udp conntrack entry.  If we ignore
ct-&gt;status and pretend its 0, the entry matches those that are newly
allocated but not yet inserted into the hash:
  - ct hlist pointers are overloaded and store/cache the raw tuple hash
  - ct-&gt;timeout matches the relative time expected for a new udp flow
    rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED,
__nf_conntrack_find_get() would have skipped the entry.

Theory is that we did hit following race:

cpu x 			cpu y			cpu z
 found entry E		found entry E
 E is expired		&lt;preemption&gt;
 nf_ct_delete()
 return E to rcu slab
					init_conntrack
					E is re-inited,
					ct-&gt;status set to 0
					reply tuplehash hnnode.pprev
					stores hash value.

cpu y found E right before it was deleted on cpu x.
E is now re-inited on cpu z.  cpu y was preempted before
checking for expiry and/or confirm bit.

					-&gt;refcnt set to 1
					E now owned by skb
					-&gt;timeout set to 30000

If cpu y were to resume now, it would observe E as
expired but would skip E due to missing CONFIRMED bit.

					nf_conntrack_confirm gets called
					sets: ct-&gt;status |= CONFIRMED
					This is wrong: E is not yet added
					to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:
			&lt;resumes&gt;
			nf_ct_expired()
			 -&gt; yes (ct-&gt;timeout is 30s)
			confirmed bit set.

cpu y will try to delete E from the hashtable:
			nf_ct_delete() -&gt; set DYING bit
			__nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash:
cpu z still holds the table bucket lock(s) so y blocks:

			wait for spinlock held by z

					CONFIRMED is set but there is no
					guarantee ct will be added to hash:
					"chaintoolong" or "clash resolution"
					logic both skip the insert step.
					reply hnnode.pprev still stores the
					hash value.

					unlocks spinlock
					return NF_DROP
			&lt;unblocks, then
			 crashes on hlist_nulls_del_rcu pprev&gt;

In case CPU z does insert the entry into the hashtable, cpu y will unlink
E again right away but no crash occurs.

Without 'cpu y' race, 'garbage' hlist is of no consequence:
ct refcnt remains at 1, eventually skb will be free'd and E gets
destroyed via: nf_conntrack_put -&gt; nf_conntrack_destroy -&gt; nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table
insertion but before the unlock.

Pablo points out that the confirm-bit-store could be reordered to happen
before hlist add resp. the timeout fixup, so switch to set_bit and
before_atomic memory barrier to prevent this.

It doesn't matter if other CPUs can observe a newly inserted entry right
before the CONFIRMED bit was set:

Such event cannot be distinguished from above "E is the old incarnation"
case: the entry will be skipped.

Also change nf_ct_should_gc() to first check the confirmed bit.

The gc sequence is:
 1. Check if entry has expired, if not skip to next entry
 2. Obtain a reference to the expired entry.
 3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an
expiry check. After this patch, once the confirmed bit check passes
ct-&gt;timeout has been altered to reflect the absolute 'best before' date
instead of a relative time.  Step 3 will therefore not remove the entry.

Without this change to nf_ct_should_gc() we could still get this sequence:

 1. Check if entry has expired.
 2. Obtain a reference.
 3. Call nf_ct_should_gc() to double-check step 1:
    4 - entry is still observed as expired
    5 - meanwhile, ct-&gt;timeout is corrected to absolute value on other CPU
      and confirm bit gets set
    6 - confirm bit is seen
    7 - valid entry is removed again

First do check 6), then 4) so the gc expiry check always picks up either
confirmed bit unset (entry gets skipped) or expiry re-check failure for
re-inited conntrack objects.

This change cannot be backported to releases before 5.19. Without
commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list")
|= IPS_CONFIRMED line cannot be moved without further changes.

Cc: Razvan Cojocaru &lt;rzvncj@gmail.com&gt;
Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/
Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/
Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>Revert "netfilter: nf_tables: Add notifications for hook changes"</title>
<updated>2025-07-14T13:22:47Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-07-10T16:43:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=36a686c0784fcccdaa4f38b498a9ef0d42ea7cb8'/>
<id>urn:sha1:36a686c0784fcccdaa4f38b498a9ef0d42ea7cb8</id>
<content type='text'>
This reverts commit 465b9ee0ee7bc268d7f261356afd6c4262e48d82.

Such notifications fit better into core or nfnetlink_hook code,
following the NFNL_MSG_HOOK_GET message format.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: hide clash bit from userspace</title>
<updated>2025-07-14T13:22:35Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-07-07T20:32:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6ac86ac74e3a0608424240cc1ce813ad6e8a73dd'/>
<id>urn:sha1:6ac86ac74e3a0608424240cc1ce813ad6e8a73dd</id>
<content type='text'>
Its a kernel implementation detail, at least at this time:

We can later decide to revert this patch if there is a compelling
reason, but then we should also remove the ifdef that prevents exposure
of ip_conntrack_status enum IPS_NAT_CLASH value in the uapi header.

Clash entries are not included in dumps (true for both old /proc
and ctnetlink) either.  So for now exclude the clash bit when dumping.

Fixes: 7e5c6aa67e6f ("netfilter: nf_tables: add packets conntrack state to debug trace info")
Link: https://lore.kernel.org/netfilter-devel/aGwf3dCggwBlRKKC@strlen.de/
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>treewide, timers: Rename from_timer() to timer_container_of()</title>
<updated>2025-06-08T07:07:37Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2025-05-09T05:51:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=41cb08555c4164996d67c78b3bf1c658075b75f1'/>
<id>urn:sha1:41cb08555c4164996d67c78b3bf1c658075b75f1</id>
<content type='text'>
Move this API to the canonical timer_*() namespace.

[ tglx: Redone against pre rc1 ]

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com

</content>
</entry>
<entry>
<title>netfilter: nf_nat: also check reverse tuple to obtain clashing entry</title>
<updated>2025-06-05T08:50:05Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-05-30T10:34:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=50d9ce9679dd50df2dc51ada717fa875bc248fad'/>
<id>urn:sha1:50d9ce9679dd50df2dc51ada717fa875bc248fad</id>
<content type='text'>
The logic added in the blamed commit was supposed to only omit nat source
port allocation if neither the existing nor the new entry are subject to
NAT.

However, its not enough to lookup the conntrack based on the proposed
tuple, we must also check the reverse direction.

Otherwise there are esoteric cases where the collision is in the reverse
direction because that colliding connection has a port rewrite, but the
new entry doesn't.  In this case, we only check the new entry and then
erronously conclude that no clash exists anymore.

 The existing (udp) tuple is:
  a:p -&gt; b:P, with nat translation to s:P, i.e. pure daddr rewrite,
  reverse tuple in conntrack table is s:P -&gt; a:p.

When another UDP packet is sent directly to s, i.e. a:p-&gt;s:P, this is
correctly detected as a colliding entry: tuple is taken by existing reply
tuple in reverse direction.

But the colliding conntrack is only searched for with unreversed
direction, and we can't find such entry matching a:p-&gt;s:P.

The incorrect conclusion is that the clashing entry has timed out and
that no port address translation is required.

Such conntrack will then be discarded at nf_confirm time because the
proposed reverse direction clashes with an existing mapping in the
conntrack table.

Search for the reverse tuple too, this will then check the NAT bits of
the colliding entry and triggers port reallocation.

Followp patch extends nft_nat.sh selftest to cover this scenario.

The IPS_SEQ_ADJUST change is also a bug fix:
Instead of checking for SEQ_ADJ this tested for SEEN_REPLY and ASSURED
by accident -- _BIT is only for use with the test_bit() API.

This bug has little consequence in practice, because the sequence number
adjustments are only useful for TCP which doesn't support clash resolution.

The existing test case (conntrack_reverse_clash.sh) exercise a race
condition path (parallel conntrack creation on different CPUs), so
the colliding entries have neither SEEN_REPLY nor ASSURED set.

Thanks to Yafang Shao and Shaun Brady for an initial investigation
of this bug.

Fixes: d8f84a9bc7c4 ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1795
Reported-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Reported-by: Shaun Brady &lt;brady.1345@gmail.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Tested-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_set_pipapo_avx2: fix initial map fill</title>
<updated>2025-06-05T08:49:58Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-05-23T12:20:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ea77c397bff8b6d59f6d83dae1425b08f465e8b5'/>
<id>urn:sha1:ea77c397bff8b6d59f6d83dae1425b08f465e8b5</id>
<content type='text'>
If the first field doesn't cover the entire start map, then we must zero
out the remainder, else we leak those bits into the next match round map.

The early fix was incomplete and did only fix up the generic C
implementation.

A followup patch adds a test case to nft_concat_range.sh.

Fixes: 791a615b7ad2 ("netfilter: nf_set_pipapo: fix initial map fill")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Reviewed-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'nf-next-25-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next</title>
<updated>2025-05-26T16:53:41Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2025-05-26T16:53:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f5b60d6a575a7573a15e08aad129382aa39c228c'/>
<id>urn:sha1:f5b60d6a575a7573a15e08aad129382aa39c228c</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next,
specifically 26 patches: 5 patches adding/updating selftests,
4 fixes, 3 PREEMPT_RT fixes, and 14 patches to enhance nf_tables):

1) Improve selftest coverage for pipapo 4 bit group format, from
   Florian Westphal.

2) Fix incorrect dependencies when compiling a kernel without
   legacy ip{6}tables support, also from Florian.

3) Two patches to fix nft_fib vrf issues, including selftest updates
   to improve coverage, also from Florian Westphal.

4) Fix incorrect nesting in nft_tunnel's GENEVE support, from
   Fernando F. Mancera.

5) Three patches to fix PREEMPT_RT issues with nf_dup infrastructure
   and nft_inner to match in inner headers, from Sebastian Andrzej Siewior.

6) Integrate conntrack information into nft trace infrastructure,
   from Florian Westphal.

7) A series of 13 patches to allow to specify wildcard netdevice in
   netdev basechain and flowtables, eg.

   table netdev filter {
       chain ingress {
           type filter hook ingress devices = { eth0, eth1, vlan* } priority 0; policy accept;
       }
   }

   This also allows for runtime hook registration on NETDEV_{UN}REGISTER
   event, from Phil Sutter.

netfilter pull request 25-05-23

* tag 'nf-next-25-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: (26 commits)
  selftests: netfilter: Torture nftables netdev hooks
  netfilter: nf_tables: Add notifications for hook changes
  netfilter: nf_tables: Support wildcard netdev hook specs
  netfilter: nf_tables: Sort labels in nft_netdev_hook_alloc()
  netfilter: nf_tables: Handle NETDEV_CHANGENAME events
  netfilter: nf_tables: Wrap netdev notifiers
  netfilter: nf_tables: Respect NETDEV_REGISTER events
  netfilter: nf_tables: Prepare for handling NETDEV_REGISTER events
  netfilter: nf_tables: Have a list of nf_hook_ops in nft_hook
  netfilter: nf_tables: Pass nf_hook_ops to nft_unregister_flowtable_hook()
  netfilter: nf_tables: Introduce nft_register_flowtable_ops()
  netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}()
  netfilter: nf_tables: Introduce functions freeing nft_hook objects
  netfilter: nf_tables: add packets conntrack state to debug trace info
  netfilter: conntrack: make nf_conntrack_id callable without a module dependency
  netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmit
  netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctx
  netfilter: nf_dup{4, 6}: Move duplication check to task_struct
  netfilter: nft_tunnel: fix geneve_opt dump
  selftests: netfilter: nft_fib.sh: add type and oif tests with and without VRFs
  ...
====================

Link: https://patch.msgid.link/20250523132712.458507-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: Add notifications for hook changes</title>
<updated>2025-05-23T11:57:14Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-05-21T20:44:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=465b9ee0ee7bc268d7f261356afd6c4262e48d82'/>
<id>urn:sha1:465b9ee0ee7bc268d7f261356afd6c4262e48d82</id>
<content type='text'>
Notify user space if netdev hooks are updated due to netdev add/remove
events. Send minimal notification messages by introducing
NFT_MSG_NEWDEV/DELDEV message types describing a single device only.

Upon NETDEV_CHANGENAME, the callback has no information about the
interface's old name. To provide a clear message to user space, include
the hook's stored interface name in the notification.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: Support wildcard netdev hook specs</title>
<updated>2025-05-23T11:57:14Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-05-21T20:44:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6d07a289504a246aa97cdac0fec61a15834801b7'/>
<id>urn:sha1:6d07a289504a246aa97cdac0fec61a15834801b7</id>
<content type='text'>
User space may pass non-nul-terminated NFTA_DEVICE_NAME attribute values
to indicate a suffix wildcard.
Expect for multiple devices to match the given prefix in
nft_netdev_hook_alloc() and populate 'ops_list' with them all.
When checking for duplicate hooks, compare the shortest prefix so a
device may never match more than a single hook spec.
Finally respect the stored prefix length when hooking into new devices
from event handlers.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: Sort labels in nft_netdev_hook_alloc()</title>
<updated>2025-05-23T11:57:13Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-05-21T20:44:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6f670935b4703af1ef58b2cba7faf2bd154cce39'/>
<id>urn:sha1:6f670935b4703af1ef58b2cba7faf2bd154cce39</id>
<content type='text'>
No point in having err_hook_alloc, just call return directly. Also
rename err_hook_dev - it's not about the hook's device but freeing the
hook itself.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
</feed>
