<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/netfilter, branch v6.15</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.15</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.15'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-03-25T15:29:13Z</updated>
<entry>
<title>Merge tag 'nf-next-25-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next</title>
<updated>2025-03-25T15:29:13Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-03-25T15:29:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=00a25cca0d7be87285c5d0acf7ed2a04910559f1'/>
<id>urn:sha1:00a25cca0d7be87285c5d0acf7ed2a04910559f1</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next:

1) Use kvmalloc in xt_hashlimit, from Denis Kirjanov.

2) Tighten nf_conntrack sysctl accepted values for nf_conntrack_max
   and nf_ct_expect_max, from Nicolas Bouchinet.

3) Avoid lookup in nft_fib if socket is available, from Florian Westphal.

4) Initialize struct lsm_context in nfnetlink_queue to avoid
   hypothetical ENOMEM errors, from Chenyuan Yang.

5) Use strscpy() instead of _pad when initializing xtables table name,
   kzalloc is already used to initialize the table memory area.
   From Thorsten Blum.

6) Add missing socket lookup by conntrack information for IPv6 traffic
   in nft_socket. There is a similar chunk in IPv4, but this was never
   added when IPv6 NAT was introduced. From Maxim Mikityanskiy.

7) Fix clang issues with nf_tables CONFIG_MITIGATION_RETPOLINE,
   from WangYuli.

* tag 'nf-next-25-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: Only use nf_skip_indirect_calls() when MITIGATION_RETPOLINE
  netfilter: socket: Lookup orig tuple for IPv6 SNAT
  netfilter: xtables: Use strscpy() instead of strscpy_pad()
  netfilter: nfnetlink_queue: Initialize ctx to avoid memory allocation error
  netfilter: fib: avoid lookup if socket is available
  netfilter: conntrack: Bound nf_conntrack sysctl writes
  netfilter: xt_hashlimit: replace vmalloc calls with kvmalloc
====================

Link: https://patch.msgid.link/20250323100922.59983-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfilter: fib: avoid lookup if socket is available</title>
<updated>2025-03-21T09:12:15Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-02-20T13:07:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eaaff9b6702e99be5d79135f2afa9fc48a0d59e0'/>
<id>urn:sha1:eaaff9b6702e99be5d79135f2afa9fc48a0d59e0</id>
<content type='text'>
In case the fib match is used from the input hook we can avoid the fib
lookup if early demux assigned a socket for us: check that the input
interface matches sk-cached one.

Rework the existing 'lo bypass' logic to first check sk, then check
for the loopback interface type, in order to elide the fib lookup.
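
The check order described above could be sketched as follows. This is
an illustrative Python model, not the kernel code: the function and
parameter names are hypothetical, and it assumes that when early demux
assigned a socket, the cached ifindex comparison alone decides.

```python
def can_skip_fib_lookup(sk_ifindex, in_ifindex, in_is_loopback):
    """Return True when the fib lookup may be elided (hypothetical model).

    sk_ifindex: interface index cached in the early-demux socket, or
                None when no socket was assigned to the packet.
    in_ifindex: index of the interface the packet arrived on.
    """
    # First check the socket assigned by early demux, if there is one.
    # Assumption: with a socket present, only the ifindex match counts.
    if sk_ifindex is not None:
        return sk_ifindex == in_ifindex
    # Without a socket, fall back to the loopback interface type check.
    return bool(in_is_loopback)
```

With a socket whose cached ifindex matches the input interface the
model returns True; without a socket only loopback traffic skips the
lookup.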

This speeds up fib matching a little, before:
93.08 GBit/s (no rules at all)
75.1  GBit/s ("fib saddr . iif oif missing drop" in prerouting)
75.62 GBit/s ("fib saddr . iif oif missing drop" in input)

After:
92.48 GBit/s (no rules at all)
75.62 GBit/s (fib rule in prerouting)
90.37 GBit/s (fib rule in input).

Numbers for the 'no rules' and 'prerouting' cases are expected to
closely match between runs; the 3rd/input test case exercises the
'avoid lookup if cached ifindex in sk matches' case.

The test used iperf3 over a veth interface; lo can't be used due to the
existing loopback test.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: make destruction work queue pernet</title>
<updated>2025-03-06T12:35:54Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-03-06T03:05:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fb8286562ecfb585e26b033c5e32e6fb85efb0b3'/>
<id>urn:sha1:fb8286562ecfb585e26b033c5e32e6fb85efb0b3</id>
<content type='text'>
The call to flush_work before tearing down a table from the netlink
notifier was supposed to make sure that all earlier updates (e.g. rule
add) that might reference that table have been processed.

Unfortunately, flush_work() waits for the last queued instance.
This could be an instance that is different from the one that we must
wait for.

This is because transactions are protected with a pernet mutex, but the
work item is global, so holding the transaction mutex doesn't prevent
another netns from queueing more work.

Make the work item pernet so that flush_work() will wait for all
transactions queued from this netns.

A welcome side effect is that we no longer need to wait for transaction
objects from foreign netns.

The gc work queue is still global.  This seems to be ok because nft_set
structures are reference counted and each container structure owns a
reference on the net namespace.

The destroy_list is still protected by a global spinlock rather than a
pernet one, but the hold time is very short anyway.

v2: call cancel_work_sync before reaping the remaining tables (Pablo).

Fixes: 9f6958ba2e90 ("netfilter: nf_tables: unconditionally flush pending work before notifier")
Reported-by: syzbot+5d8c5789c8cb076b2c25@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: flowtable: add CLOSING state</title>
<updated>2025-01-19T15:41:56Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2025-01-13T23:50:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fdbaf5163331342e90a2c29b87629021f4c15f0c'/>
<id>urn:sha1:fdbaf5163331342e90a2c29b87629021f4c15f0c</id>
<content type='text'>
tcp rst/fin packet triggers an immediate teardown of the flow which
results in sending flows back to the classic forwarding path.

This behaviour was introduced by:

  da5984e51063 ("netfilter: nf_flow_table: add support for sending flows back to the slow path")
  b6f27d322a0a ("netfilter: nf_flow_table: tear down TCP flows if RST or FIN was seen")

whose goal is to expedite removal of flow entries from the hardware
table. Before these patches, the flow was released after the flow entry
timed out.

However, this approach leads to packet races when restoring the
conntrack state as well as late flow re-offload situations when the TCP
connection is ending.

This patch adds a new CLOSING state that is entered when a tcp rst/fin
packet is seen. This allows for an early removal of the flow entry from
the hardware table. But the flow entry still remains in software, so tcp
packets to shut down the flow are not sent back to slow path.

If syn packet is seen from this new CLOSING state, then this flow enters
teardown state, ct state is set to TCP_CONNTRACK_CLOSE state and packet
is sent to slow path, so this TCP reopen scenario can be handled by
conntrack. TCP_CONNTRACK_CLOSE provides a small timeout that aims at
quickly releasing this stale entry from the conntrack table.

Moreover, skip hardware re-offload from flowtable software packet if the
flow is in CLOSING state.
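
The transitions described above could be modelled roughly as below.
This is a hedged Python sketch, not the flowtable implementation: the
OFFLOADED state name, the string flag names and both helper functions
are hypothetical and exist only for illustration.

```python
from enum import Enum, auto


class FlowState(Enum):
    OFFLOADED = auto()  # hypothetical name for the normal fast-path state
    CLOSING = auto()    # rst/fin seen, entry kept in software only
    TEARDOWN = auto()   # flow handed back to conntrack


def on_tcp_packet(state, flags):
    """Return (new_state, path) for a TCP packet with the given flags."""
    flags = set(flags)
    if state is FlowState.OFFLOADED and flags.intersection({"fin", "rst"}):
        # Enter CLOSING; the packet itself stays on the flowtable path.
        return FlowState.CLOSING, "flowtable"
    if state is FlowState.CLOSING and "syn" in flags:
        # TCP reopen: tear down and let conntrack handle the packet.
        # (The real patch also sets ct state to TCP_CONNTRACK_CLOSE.)
        return FlowState.TEARDOWN, "slow path"
    if state is FlowState.TEARDOWN:
        return state, "slow path"
    return state, "flowtable"


def may_reoffload_to_hardware(state):
    """Hardware re-offload is skipped once the flow is closing."""
    return state is FlowState.OFFLOADED
```

For example, a fin on an offloaded flow yields CLOSING while staying on
the flowtable path, and a later syn moves the flow to TEARDOWN and the
slow path.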

Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: conntrack: rework offload nf_conn timeout extension logic</title>
<updated>2025-01-19T15:41:55Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-01-13T23:50:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=03428ca5cee9f0792edc996c06ce4514816af1fb'/>
<id>urn:sha1:03428ca5cee9f0792edc996c06ce4514816af1fb</id>
<content type='text'>
Offload nf_conn entries may not see traffic for a very long time.

To prevent incorrect 'ct is stale' checks during nf_conntrack table
lookup, the gc worker extends the timeout of nf_conn entries marked for
offload to a large value.

The existing logic suffers from a few problems.

Garbage collection runs without locks; it's unlikely but possible
that @ct is removed right after the 'offload' bit test.

In that case, the timeout of a new/reallocated nf_conn entry will
be increased.

Prevent this by obtaining a reference count on the ct object and
re-checking the confirmed and offload bits.

If those are not set, the ct is being removed; skip the timeout
extension in this case.

Parallel teardown is also problematic:
 cpu1                                cpu2
 gc_worker
                                     calls flow_offload_teardown()
 tests OFFLOAD bit, set
                                     clear OFFLOAD bit
                                     ct-&gt;timeout is repaired (e.g. set to timeout[UDP_CT_REPLIED])
 nf_ct_offload_timeout() called
 expire value is fetched
 &lt;INTERRUPT&gt;
-&gt; NF_CT_DAY timeout for flow that isn't offloaded
(and might not see any further packets).

Use cmpxchg: if ct-&gt;timeout was repaired after the 2nd 'offload bit' test
passed, then ct-&gt;timeout will only be updated if ct-&gt;timeout was not
altered in between.
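
The compare-and-exchange idea could be sketched like this. It is a
Python model under stated assumptions, not the kernel code: the Conn
class and extend_timeout() are hypothetical, a lock merely emulates
the atomicity of cmpxchg, and NF_CT_DAY is expressed in plain seconds.

```python
import threading

NF_CT_DAY = 86400  # seconds; stands in for the large offload timeout


class Conn:
    """Hypothetical stand-in for an nf_conn entry."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._lock = threading.Lock()  # only emulates cmpxchg atomicity

    def cmpxchg_timeout(self, expected, new):
        """Store new only if timeout equals expected; return old value."""
        with self._lock:
            old = self.timeout
            if old == expected:
                self.timeout = new
            return old


def extend_timeout(ct, observed):
    """Extend the timeout unless it changed since observed was read."""
    return ct.cmpxchg_timeout(observed, NF_CT_DAY) == observed
```

If another cpu repairs the timeout between the read and the exchange,
extend_timeout() returns False and the repaired value is left intact.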

As we already have a gc worker for flowtable entries, ct-&gt;timeout repair
can be handled from the flowtable gc worker.

This avoids having flowtable specific logic in the conntrack core
and avoids checking entries that were never offloaded.

This allows removing the nf_ct_offload_timeout helper.
It's safe to use in the add case, but not on teardown.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: conntrack: remove skb argument from nf_ct_refresh</title>
<updated>2025-01-19T15:41:55Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2025-01-13T23:50:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=31768596b15aa8c9c55f078acad29d0238c8269b'/>
<id>urn:sha1:31768596b15aa8c9c55f078acad29d0238c8269b</id>
<content type='text'>
It's not used (and could be NULL), so remove it.
This allows using nf_ct_refresh in places where we don't have
an skb without having to double-check that skb == NULL would be safe.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: Tolerate chains with no remaining hooks</title>
<updated>2025-01-19T15:41:54Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-01-09T17:31:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fc0133428e7ad65aa6b7c8e65ccfe86e469e4512'/>
<id>urn:sha1:fc0133428e7ad65aa6b7c8e65ccfe86e469e4512</id>
<content type='text'>
Do not drop a netdev-family chain if the last interface it is registered
for vanishes. Users dumping and storing the ruleset upon shutdown to
restore it upon next boot may otherwise lose the chain and all contained
rules. They will still lose the list of devices; a later patch will fix
that. For now, this aligns the event handler's behaviour with that for
flowtables.
The controversial situation at netns exit should be no problem here:
the event handler will unregister the hooks, and the core nftables
cleanup code will drop the chain itself.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: Store user-defined hook ifname</title>
<updated>2025-01-19T15:41:53Z</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2025-01-09T17:31:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b7c2d793c28cda7dbb67d6b427e3280b7c1e601a'/>
<id>urn:sha1:b7c2d793c28cda7dbb67d6b427e3280b7c1e601a</id>
<content type='text'>
Prepare for hooks with NULL ops.dev pointer (due to non-existent device)
and store the interface name and length as specified by the user upon
creation. No functional change intended.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: fix set size with rbtree backend</title>
<updated>2025-01-19T15:41:41Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2025-01-06T22:40:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8d738c1869f611955d91d8d0fd0012d9ef207201'/>
<id>urn:sha1:8d738c1869f611955d91d8d0fd0012d9ef207201</id>
<content type='text'>
The existing rbtree implementation uses singleton elements to represent
ranges, however, userspace provides a set size according to the number
of ranges in the set.

Adjust the set size provided by userspace to the number of singleton
elements in the kernel by multiplying the number of ranges by two.

Check if the no-match all-zero element is already in the set; in that
case, release one slot in the set size.
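
The size adjustment could be illustrated as follows. This is a hedged
Python sketch, not the kernel implementation: the function name is
hypothetical, and it assumes that "releasing one slot" means allowing
one extra element so the all-zero element does not consume a
user-visible slot.

```python
def rbtree_max_elements(user_size, has_all_zero_element=False):
    """Translate a userspace set size (ranges) into singleton elements."""
    # Each range is stored as two singleton elements: start and end.
    limit = user_size * 2
    # Assumption: the no-match all-zero element is not counted against
    # the user-provided size, so one extra element is allowed.
    if has_all_zero_element:
        limit += 1
    return limit
```

Under these assumptions, a set declared with size 4 admits 8 singleton
elements, or 9 when the all-zero element is present.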

Fixes: 0ed6389c483d ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: conntrack: add conntrack event timestamp</title>
<updated>2025-01-09T13:42:16Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2024-11-15T13:46:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=601731fc7c6111bbca49ce3c9499c2e4d426079d'/>
<id>urn:sha1:601731fc7c6111bbca49ce3c9499c2e4d426079d</id>
<content type='text'>
Nadia Pinaeva writes:
  I am working on a tool that allows collecting network performance
  metrics by using conntrack events.
  Start time of a conntrack entry is used to evaluate seen_reply
  latency, therefore the sooner it is timestamped, the better the
  precision is.
  In particular, when using this tool to compare the performance of the
  same feature implemented using iptables/nftables/OVS it is crucial
  to have the entry timestamped earlier to see any difference.

At this time, conntrack events can only get timestamped at recv time in
userspace, so there can be some delay between the event being generated
and the userspace process consuming the message.

There is sys/net/netfilter/nf_conntrack_timestamp, which adds a
64bit timestamp (ns resolution) that records start and stop times,
but it's not suited for this either: start time is the 'hashtable
insertion time', not 'conntrack allocation time'.

There is concern that moving the start-time moment to conntrack
allocation will add overhead in case of flooding, where conntrack
entries are allocated and released right away without getting inserted
into the hashtable.

Also, even if this was changed it would not work with events other than
new (start time) and destroy (stop time).

Pablo suggested adding a new CTA_TIMESTAMP_EVENT; this patch adds it.
The timestamp is recorded in case both events are requested and the
sys/net/netfilter/nf_conntrack_timestamp toggle is enabled.

Reported-by: Nadia Pinaeva &lt;n.m.pinaeva@gmail.com&gt;
Suggested-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
</feed>
