<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/netfilter, branch v6.13</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.13</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.13'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-01-09T12:29:45Z</updated>
<entry>
<title>netfilter: conntrack: clamp maximum hashtable size to INT_MAX</title>
<updated>2025-01-09T12:29:45Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2025-01-08T21:56:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b541ba7d1f5a5b7b3e2e22dc9e40e18a7d6dbc13'/>
<id>urn:sha1:b541ba7d1f5a5b7b3e2e22dc9e40e18a7d6dbc13</id>
<content type='text'>
Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it
is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when
resizing hashtable because __GFP_NOWARN is unset. See:

  0708a0afe291 ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls")

Note: hashtable resize is only possible from init_netns.

Fixes: 9cc1c73ad666 ("netfilter: conntrack: avoid integer overflow when resizing")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: imbalance in flowtable binding</title>
<updated>2025-01-09T12:29:38Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2025-01-02T12:01:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=13210fc63f353fe78584048079343413a3cdf819'/>
<id>urn:sha1:13210fc63f353fe78584048079343413a3cdf819</id>
<content type='text'>
All these cases cause imbalance between BIND and UNBIND calls:

- Delete an interface from a flowtable with multiple interfaces

- Add a (device to a) flowtable with --check flag

- Delete a netns containing a flowtable

- In an interactive nft session, create a table with owner flag and
  flowtable inside, then quit.

Fix it by calling FLOW_BLOCK_UNBIND when unregistering hooks, then
remove late FLOW_BLOCK_UNBIND call when destroying flowtable.

Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
Reported-by: Phil Sutter &lt;phil@nwl.cc&gt;
Tested-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: ipset: Fix for recursive locking warning</title>
<updated>2024-12-18T23:28:47Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2024-12-17T19:56:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=70b6f46a4ed8bd56c85ffff22df91e20e8c85e33'/>
<id>urn:sha1:70b6f46a4ed8bd56c85ffff22df91e20e8c85e33</id>
<content type='text'>
With CONFIG_PROVE_LOCKING, when creating a set of type bitmap:ip, adding
it to a set of type list:set and populating it from iptables SET target
triggers a kernel warning:

| WARNING: possible recursive locking detected
| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
| --------------------------------------------
| ping/4018 is trying to acquire lock:
| ffff8881094a6848 (&amp;set-&gt;lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]
|
| but task is already holding lock:
| ffff88811034c048 (&amp;set-&gt;lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]

This is a false alarm: ipset does not allow nested list:set type, so the
loop in list_set_kadd() can never encounter the outer set itself. No
other set type supports embedded sets, so this is the only case to
consider.

To avoid the false report, create a distinct lock class for list:set
type ipset locks.

Fixes: f830837f0eed ("netfilter: ipset: list:set set type support")
Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: Fix clamp() of ip_vs_conn_tab on small memory systems</title>
<updated>2024-12-18T22:37:27Z</updated>
<author>
<name>David Laight</name>
<email>David.Laight@ACULAB.COM</email>
</author>
<published>2024-12-14T17:30:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cf2c97423a4f89c8b798294d3f34ecfe7e7035c3'/>
<id>urn:sha1:cf2c97423a4f89c8b798294d3f34ecfe7e7035c3</id>
<content type='text'>
The 'max_avail' value is calculated from the system memory
size using order_base_2().
order_base_2(x) is defined as '(x) ? fn(x) : 0'.
The compiler generates two copies of the code that follows
and then expands clamp(max, min, PAGE_SHIFT - 12) (11 on 32bit).
This triggers a compile-time assert since min is 5.

In reality a system would have to have less than 512MB memory
for the bounds passed to clamp to be reversed.

Swap the order of the arguments to clamp() to avoid the warning.

Replace the clamp_val() on the line below with clamp().
clamp_val() is just 'an accident waiting to happen' and not needed here.

Detected by compile time checks added to clamp(), specifically:
minmax.h: use BUILD_BUG_ON_MSG() for the lo &lt; hi test in clamp()

Reported-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Closes: https://lore.kernel.org/all/CA+G9fYsT34UkGFKxus63H6UVpYi5GRZkezT9MRLfAbM3f6ke0g@mail.gmail.com/
Fixes: 4f325e26277b ("ipvs: dynamically limit the connection hash table")
Tested-by: Bartosz Golaszewski &lt;bartosz.golaszewski@linaro.org&gt;
Reviewed-by: Bartosz Golaszewski &lt;bartosz.golaszewski@linaro.org&gt;
Signed-off-by: David Laight &lt;david.laight@aculab.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: do not defer rule destruction via call_rcu</title>
<updated>2024-12-11T22:27:50Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2024-12-07T11:14:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b04df3da1b5c6f6dc7cdccc37941740c078c4043'/>
<id>urn:sha1:b04df3da1b5c6f6dc7cdccc37941740c078c4043</id>
<content type='text'>
nf_tables_chain_destroy can sleep, it can't be used from call_rcu
callbacks.

Moreover, nf_tables_rule_release() is only safe for error unwinding,
while transaction mutex is held and the to-be-desroyed rule was not
exposed to either dataplane or dumps, as it deactives+frees without
the required synchronize_rcu() in-between.

nft_rule_expr_deactivate() callbacks will change -&gt;use counters
of other chains/sets, see e.g. nft_lookup .deactivate callback, these
must be serialized via transaction mutex.

Also add a few lockdep asserts to make this more explicit.

Calling synchronize_rcu() isn't ideal, but fixing this without is hard
and way more intrusive.  As-is, we can get:

WARNING: .. net/netfilter/nf_tables_api.c:5515 nft_set_destroy+0x..
Workqueue: events nf_tables_trans_destroy_work
RIP: 0010:nft_set_destroy+0x3fe/0x5c0
Call Trace:
 &lt;TASK&gt;
 nf_tables_trans_destroy_work+0x6b7/0xad0
 process_one_work+0x64a/0xce0
 worker_thread+0x613/0x10d0

In case the synchronize_rcu becomes an issue, we can explore alternatives.

One way would be to allocate nft_trans_rule objects + one nft_trans_chain
object, deactivate the rules + the chain and then defer the freeing to the
nft destroy workqueue.  We'd still need to keep the synchronize_rcu path as
a fallback to handle -ENOMEM corner cases though.

Reported-by: syzbot+b26935466701e56cfdc2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67478d92.050a0220.253251.0062.GAE@google.com/T/
Fixes: c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: IDLETIMER: Fix for possible ABBA deadlock</title>
<updated>2024-12-11T22:15:19Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2024-12-06T18:32:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f36b01994d68ffc253c8296e2228dfe6e6431c03'/>
<id>urn:sha1:f36b01994d68ffc253c8296e2228dfe6e6431c03</id>
<content type='text'>
Deletion of the last rule referencing a given idletimer may happen at
the same time as a read of its file in sysfs:

| ======================================================
| WARNING: possible circular locking dependency detected
| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
| ------------------------------------------------------
| iptables/3303 is trying to acquire lock:
| ffff8881057e04b8 (kn-&gt;active#48){++++}-{0:0}, at: __kernfs_remove+0x20
|
| but task is already holding lock:
| ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v]
|
| which lock already depends on the new lock.

A simple reproducer is:

| #!/bin/bash
|
| while true; do
|         iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
|         iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
| done &amp;
| while true; do
|         cat /sys/class/xt_idletimer/timers/testme &gt;/dev/null
| done

Avoid this by freeing list_mutex right after deleting the element from
the list, then continuing with the teardown.

Fixes: 0902b469bd25 ("netfilter: xtables: idletimer target implementation")
Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nft_set_hash: skip duplicated elements pending gc run</title>
<updated>2024-12-04T20:37:41Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2024-12-01T23:04:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7ffc7481153bbabf3332c6a19b289730c7e1edf5'/>
<id>urn:sha1:7ffc7481153bbabf3332c6a19b289730c7e1edf5</id>
<content type='text'>
rhashtable does not provide stable walk, duplicated elements are
possible in case of resizing. I considered that checking for errors when
calling rhashtable_walk_next() was sufficient to detect the resizing.
However, rhashtable_walk_next() returns -EAGAIN only at the end of the
iteration, which is too late, because a gc work containing duplicated
elements could have been already scheduled for removal to the worker.

Add a u32 gc worker sequence number per set, bump it on every workqueue
run. Annotate gc worker sequence number on the expired element. Use it
to skip those already seen in this gc workqueue run.

Note that this new field is never reset in case gc transaction fails, so
next gc worker run on the expired element overrides it. Wraparound of gc
worker sequence number should not be an issue with stale gc worker
sequence number in the element, that would just postpone the element
removal in one gc run.

Note that it is not possible to use flags to annotate that element is
pending gc run to detect duplicates, given that gc transaction can be
invalidated in case of update from the control plane, therefore, not
allowing to clear such flag.

On x86_64, pahole reports no changes in the size of nft_rhash_elem.

Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Reported-by: Laurent Fasnacht &lt;laurent.fasnacht@proton.ch&gt;
Tested-by: Laurent Fasnacht &lt;laurent.fasnacht@proton.ch&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: ipset: Hold module reference while requesting a module</title>
<updated>2024-12-04T14:39:22Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2024-11-29T15:30:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=456f010bfaefde84d3390c755eedb1b0a5857c3c'/>
<id>urn:sha1:456f010bfaefde84d3390c755eedb1b0a5857c3c</id>
<content type='text'>
User space may unload ip_set.ko while it is itself requesting a set type
backend module, leading to a kernel crash. The race condition may be
provoked by inserting an mdelay() right after the nfnl_unlock() call.

Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support")
Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Acked-by: Jozsef Kadlecsik &lt;kadlec@netfilter.org&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nft_inner: incorrect percpu area handling under softirq</title>
<updated>2024-12-03T21:10:58Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2024-11-27T11:46:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7b1d83da254be3bf054965c8f3b1ad976f460ae5'/>
<id>urn:sha1:7b1d83da254be3bf054965c8f3b1ad976f460ae5</id>
<content type='text'>
Softirq can interrupt ongoing packet from process context that is
walking over the percpu area that contains inner header offsets.

Disable bh and perform three checks before restoring the percpu inner
header offsets to validate that the percpu area is valid for this
skbuff:

1) If the NFT_PKTINFO_INNER_FULL flag is set on, then this skbuff
   has already been parsed before for inner header fetching to
   register.

2) Validate that the percpu area refers to this skbuff using the
   skbuff pointer as a cookie. If there is a cookie mismatch, then
   this skbuff needs to be parsed again.

3) Finally, validate if the percpu area refers to this tunnel type.

Only after these three checks the percpu area is restored to a on-stack
copy and bh is enabled again.

After inner header fetching, the on-stack copy is stored back to the
percpu area.

Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Reported-by: syzbot+84d0441b9860f0d63285@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level</title>
<updated>2024-11-28T12:14:24Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2024-11-26T10:59:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b7529880cb961d515642ce63f9d7570869bbbdc3'/>
<id>urn:sha1:b7529880cb961d515642ce63f9d7570869bbbdc3</id>
<content type='text'>
cgroup maximum depth is INT_MAX by default, there is a cgroup toggle to
restrict this maximum depth to a more reasonable value not to harm
performance. Remove unnecessary WARN_ON_ONCE which is reachable from
userspace.

Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces")
Reported-by: syzbot+57bac0866ddd99fe47c0@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
</feed>
