<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net, branch v6.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-06-22T12:39:06Z</updated>
<entry>
<title>Merge tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf</title>
<updated>2023-06-22T12:39:06Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2023-06-22T12:39:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2ba7e7ebb6a71407cbe25cd349c9b05d40520bf0'/>
<id>urn:sha1:2ba7e7ebb6a71407cbe25cd349c9b05d40520bf0</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

This is v3, including a crash fix for patch 01/14.

The following patchset contains Netfilter/IPVS fixes for net:

1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.

2) Fix chain binding transaction logic, add a bound flag to rule
   transactions. Remove incorrect logic in nft_data_hold() and
   nft_data_release().

3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing
   the set/chain as a follow up to 1240eb93f061 ("netfilter: nf_tables:
   incorrect error path handling with NFT_MSG_NEWRULE")

4) Drop map element references from preparation phase instead of
   set destroy path, otherwise bogus EBUSY with transactions such as:

        flush chain ip x y
        delete chain ip x w

   where chain ip x y contains jump/goto from set elements.

5) Pipapo set type does not regard generation mask from the walk
   iteration.

6) Fix reference count underflow in set element reference to
   stateful object.

7) Several patches to tighten the nf_tables API:
   - disallow set element updates of bound anonymous set
   - disallow unbound anonymous set/chain at the end of transaction.
   - disallow updates of anonymous set.
   - disallow timeout configuration for anonymous sets.

8) Fix module reference leak in chain updates.

9) Fix nfnetlink_osf module autoload.

10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as
    in iptables-nft.

This Netfilter batch is larger than usual at this stage, I am aware we
are fairly late in the -rc cycle, if you prefer to route them through
net-next, please let me know.

netfilter pull request 23-06-21

* tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: Fix for deleting base chains with payload
  netfilter: nfnetlink_osf: fix module autoload
  netfilter: nf_tables: drop module reference after updating chain
  netfilter: nf_tables: disallow timeout for anonymous sets
  netfilter: nf_tables: disallow updates of anonymous sets
  netfilter: nf_tables: reject unbound chain set before commit phase
  netfilter: nf_tables: reject unbound anonymous set before commit phase
  netfilter: nf_tables: disallow element updates of bound anonymous sets
  netfilter: nf_tables: fix underflow in object reference counter
  netfilter: nft_set_pipapo: .walk does not deal with generations
  netfilter: nf_tables: drop map element references from preparation phase
  netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
  netfilter: nf_tables: fix chain binding transaction logic
  ipvs: align inner_mac_header for encapsulation
====================

Link: https://lore.kernel.org/r/20230621100731.68068-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: reject unbound anonymous set before commit phase</title>
<updated>2023-06-20T20:43:41Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2023-06-16T13:21:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=938154b93be8cd611ddfd7bafc1849f3c4355201'/>
<id>urn:sha1:938154b93be8cd611ddfd7bafc1849f3c4355201</id>
<content type='text'>
Add a new list to track set transaction and to check for unbound
anonymous sets before entering the commit phase.

Bail out at the end of the transaction handling if an anonymous set
remains unbound.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: drop map element references from preparation phase</title>
<updated>2023-06-20T20:43:40Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2023-06-16T12:51:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=628bd3e49cba1c066228e23d71a852c23e26da73'/>
<id>urn:sha1:628bd3e49cba1c066228e23d71a852c23e26da73</id>
<content type='text'>
set .destroy callback releases the references to other objects in maps.
This is very late and it results in spurious EBUSY errors. Drop refcount
from the preparation phase instead, update set backend not to drop
reference counter from set .destroy path.

Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the
reference counter because the transaction abort path releases the map
references for each element since the set is unbound. The abort path
also deals with releasing reference counter for new elements added to
unbound sets.

Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain</title>
<updated>2023-06-20T20:43:40Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2023-06-16T12:45:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=26b5a5712eb85e253724e56a54c17f8519bd8e4e'/>
<id>urn:sha1:26b5a5712eb85e253724e56a54c17f8519bd8e4e</id>
<content type='text'>
Add a new state to deal with rule expressions deactivation from the
newrule error path, otherwise the anonymous set remains in the list in
inactive state for the next generation. Mark the set/chain transaction
as unbound so the abort path releases this object, set it as inactive in
the next generation so it is not reachable anymore from this transaction
and reference counter is dropped.

Fixes: 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: fix chain binding transaction logic</title>
<updated>2023-06-20T20:41:51Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2023-06-16T12:45:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4bedf9eee016286c835e3d8fa981ddece5338795'/>
<id>urn:sha1:4bedf9eee016286c835e3d8fa981ddece5338795</id>
<content type='text'>
Add bound flag to rule and chain transactions as in 6a0a8d10a366
("netfilter: nf_tables: use-after-free in failing rule with bound set")
to skip them in case that the chain is already bound from the abort
path.

This patch fixes an imbalance in the chain use refcnt that triggers a
WARN_ON on the table and chain destroy path.

This patch also disallows nested chain bindings, which is not
supported from userspace.

The logic to deal with chain binding in nft_data_hold() and
nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a
special handling in case a chain is bound but next expressions in the
same rule fail to initialize as described by 1240eb93f061 ("netfilter:
nf_tables: incorrect error path handling with NFT_MSG_NEWRULE").

The chain is left bound if rule construction fails, so the objects
stored in this chain (and the chain itself) are released by the
transaction records from the abort path, follow up patch ("netfilter:
nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
completes this error handling.

When deleting an existing rule, chain bound flag is set off so the
rule expression .destroy path releases the objects.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'ipsec-2023-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec</title>
<updated>2023-06-20T12:33:50Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2023-06-20T12:33:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e438edaae26ce80c3fcc498fbba0c8f7e78497e5'/>
<id>urn:sha1:e438edaae26ce80c3fcc498fbba0c8f7e78497e5</id>
<content type='text'>
ipsec-2023-06-20
</content>
</entry>
<entry>
<title>net: dsa: introduce preferred_default_local_cpu_port and use on MT7530</title>
<updated>2023-06-20T08:40:26Z</updated>
<author>
<name>Vladimir Oltean</name>
<email>olteanv@gmail.com</email>
</author>
<published>2023-06-17T06:26:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b79d7c14f48083abb3fb061370c0c64a569edf4c'/>
<id>urn:sha1:b79d7c14f48083abb3fb061370c0c64a569edf4c</id>
<content type='text'>
Since the introduction of the OF bindings, DSA has always had a policy that
in case multiple CPU ports are present in the device tree, the numerically
smallest one is always chosen.

The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU
ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because
it has higher bandwidth.

The MT7530 driver developers had 3 options:
- to modify DSA when the MT7531 switch support was introduced, such as to
  prefer the better port
- to declare both CPU ports in device trees as CPU ports, and live with the
  sub-optimal performance resulting from not preferring the better port
- to declare just port 6 in the device tree as a CPU port

Of course they chose the path of least resistance (3rd option), kicking the
can down the road. The hardware description in the device tree is supposed
to be stable - developers are not supposed to adopt the strategy of
piecemeal hardware description, where the device tree is updated in
lockstep with the features that the kernel currently supports.

Now, as a result of the fact that they did that, any attempts to modify the
device tree and describe both CPU ports as CPU ports would make DSA change
its default selection from port 6 to 5, effectively resulting in a
performance degradation visible to users with the MT7531BE switch as can be
seen below.

Without preferring port 6:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-20.00  sec   374 MBytes   157 Mbits/sec  734    sender
[  5][TX-C]   0.00-20.00  sec   373 MBytes   156 Mbits/sec    receiver
[  7][RX-C]   0.00-20.00  sec  1.81 GBytes   778 Mbits/sec    0    sender
[  7][RX-C]   0.00-20.00  sec  1.81 GBytes   777 Mbits/sec    receiver

With preferring port 6:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-20.00  sec  1.99 GBytes   856 Mbits/sec  273    sender
[  5][TX-C]   0.00-20.00  sec  1.99 GBytes   855 Mbits/sec    receiver
[  7][RX-C]   0.00-20.00  sec  1.72 GBytes   737 Mbits/sec   15    sender
[  7][RX-C]   0.00-20.00  sec  1.71 GBytes   736 Mbits/sec    receiver

Using one port for WAN and the other ports for LAN is a very popular use
case which is what this test emulates.

As such, this change proposes that we retroactively modify stable kernels
(which don't support the modification of the CPU port assignments, so as to
let user space fix the problem and restore the throughput) to keep the
mt7530 driver preferring port 6 even with device trees where the hardware
is more fully described.

Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch")
Signed-off-by: Vladimir Oltean &lt;olteanv@gmail.com&gt;
Signed-off-by: Arınç ÜNAL &lt;arinc.unal@arinc9.com&gt;
Reviewed-by: Russell King (Oracle) &lt;rmk+kernel@armlinux.org.uk&gt;
Reviewed-by: Florian Fainelli &lt;florian.fainelli@broadcom.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting</title>
<updated>2023-06-14T08:31:39Z</updated>
<author>
<name>Peilin Ye</name>
<email>peilin.ye@bytedance.com</email>
</author>
<published>2023-06-11T03:30:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=84ad0af0bccd3691cb951c2974c5cb2c10594d4a'/>
<id>urn:sha1:84ad0af0bccd3691cb951c2974c5cb2c10594d4a</id>
<content type='text'>
mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized
in ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
access this per-net_device pointer in mini_qdisc_pair_swap().  Similar
for clsact Qdiscs and miniq_egress.

Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
requests (thanks Hillf Danton for the hint), when replacing ingress or
clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
causing race conditions [1] including a use-after-free bug in
mini_qdisc_pair_swap() reported by syzbot:

 BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
...
 Call Trace:
  &lt;TASK&gt;
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
  print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
  print_report mm/kasan/report.c:430 [inline]
  kasan_report+0x11c/0x130 mm/kasan/report.c:536
  mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
  tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
  tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
  tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
  tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
  tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
...

@old and @new should not affect each other.  In other words, @old should
never modify miniq_{in,e}gress after @new, and @new should not update
@old's RCU state.

Fixing without changing sch_api.c turned out to be difficult (please
refer to Closes: for discussions).  Instead, make sure @new's first call
always happen after @old's last call (in {ingress,clsact}_destroy()) has
finished:

In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests,
and call qdisc_destroy() for @old before grafting @new.

Introduce qdisc_refcount_dec_if_one() as the counterpart of
qdisc_refcount_inc_nz() used for filter requests.  Introduce a
non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
just like qdisc_put() etc.

Depends on patch "net/sched: Refactor qdisc_graft() for ingress and
clsact Qdiscs".

[1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
TC_H_ROOT (no longer possible after commit c7cfbd115001 ("net/sched:
sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8
transmission queues:

  Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2),
  then adds a flower filter X to A.

  Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
  b2) to replace A, then adds a flower filter Y to B.

 Thread 1               A's refcnt   Thread 2
  RTM_NEWQDISC (A, RTNL-locked)
   qdisc_create(A)               1
   qdisc_graft(A)                9

  RTM_NEWTFILTER (X, RTNL-unlocked)
   __tcf_qdisc_find(A)          10
   tcf_chain0_head_change(A)
   mini_qdisc_pair_swap(A) (1st)
            |
            |                         RTM_NEWQDISC (B, RTNL-locked)
         RCU sync                2     qdisc_graft(B)
            |                    1     notify_and_destroy(A)
            |
   tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-unlocked)
   qdisc_destroy(A)                    tcf_chain0_head_change(B)
   tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
   mini_qdisc_pair_swap(A) (3rd)                |
           ...                                 ...

Here, B calls mini_qdisc_pair_swap(), pointing eth0-&gt;miniq_ingress to
its mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
ingress_destroy(), setting eth0-&gt;miniq_ingress to NULL, so ingress
packets on eth0 will not find filter Y in sch_handle_ingress().

This is just one of the possible consequences of concurrently accessing
miniq_{in,e}gress pointers.

Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Signed-off-by: Peilin Ye &lt;peilin.ye@bytedance.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>net/sched: act_ct: Fix promotion of offloaded unreplied tuple</title>
<updated>2023-06-14T07:56:50Z</updated>
<author>
<name>Paul Blakey</name>
<email>paulb@nvidia.com</email>
</author>
<published>2023-06-09T12:22:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=41f2c7c342d3adb1c4dd5f2e3dd831adff16a669'/>
<id>urn:sha1:41f2c7c342d3adb1c4dd5f2e3dd831adff16a669</id>
<content type='text'>
Currently UNREPLIED and UNASSURED connections are added to the nf flow
table. This causes the following connection packets to be processed
by the flow table which then skips conntrack_in(), and thus such the
connections will remain UNREPLIED and UNASSURED even if reply traffic
is then seen. Even still, the unoffloaded reply packets are the ones
triggering hardware update from new to established state, and if
there aren't any to triger an update and/or previous update was
missed, hardware can get out of sync with sw and still mark
packets as new.

Fix the above by:
1) Not skipping conntrack_in() for UNASSURED packets, but still
   refresh for hardware, as before the cited patch.
2) Try and force a refresh by reply-direction packets that update
   the hardware rules from new to established state.
3) Remove any bidirectional flows that didn't failed to update in
   hardware for re-insertion as bidrectional once any new packet
   arrives.

Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections")
Co-developed-by: Vlad Buslov &lt;vladbu@nvidia.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@nvidia.com&gt;
Signed-off-by: Paul Blakey &lt;paulb@nvidia.com&gt;
Reviewed-by: Florian Westphal &lt;fw@strlen.de&gt;
Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'nf-23-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf</title>
<updated>2023-06-10T18:57:03Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2023-06-10T18:57:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=65d8bd81aa15c36d9703f4393651d10edf1f030c'/>
<id>urn:sha1:65d8bd81aa15c36d9703f4393651d10edf1f030c</id>
<content type='text'>
netfilter pull request 23-06-08

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for net:

1) Add commit and abort set operation to pipapo set abort path.

2) Bail out immediately in case of ENOMEM in nfnetlink batch.

3) Incorrect error path handling when creating a new rule leads to
   dangling pointer in set transaction list.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
