<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/netns, branch v6.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-06-03T05:55:43Z</updated>
<entry>
<title>net/ipv6: convert skip_notify_on_dev_down sysctl to u8</title>
<updated>2023-06-03T05:55:43Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-06-01T16:04:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ef62c0ae6db11c095880e473db9f846132d7eba8'/>
<id>urn:sha1:ef62c0ae6db11c095880e473db9f846132d7eba8</id>
<content type='text'>
Save a bit a space, and could help future sysctls to
use the same pattern.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Acked-by: Matthieu Baerts &lt;matthieu.baerts@tessares.net&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/ipv6: fix bool/int mismatch for skip_notify_on_dev_down</title>
<updated>2023-06-03T05:55:43Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-06-01T16:04:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=edf2e1d2019b2730d6076dbe4c040d37d7c10bbe'/>
<id>urn:sha1:edf2e1d2019b2730d6076dbe4c040d37d7c10bbe</id>
<content type='text'>
skip_notify_on_dev_down ctl table expects this field
to be an int (4 bytes), not a bool (1 byte).

Because proc_dou8vec_minmax() was added in 5.13,
this patch converts skip_notify_on_dev_down to an int.

Following patch then converts the field to u8 and use proc_dou8vec_minmax().

Fixes: 7c6bb7d2faaf ("net/ipv6: Add knob to skip DELROUTE message on device down")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Acked-by: Matthieu Baerts &lt;matthieu.baerts@tessares.net&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv6: add icmpv6_error_anycast_as_unicast for ICMPv6</title>
<updated>2023-04-21T03:07:50Z</updated>
<author>
<name>Mahesh Bandewar</name>
<email>maheshb@google.com</email>
</author>
<published>2023-04-19T01:32:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7ab75456be144a354fbb3df1516d82fc24d3d67d'/>
<id>urn:sha1:7ab75456be144a354fbb3df1516d82fc24d3d67d</id>
<content type='text'>
ICMPv6 error packets are not sent to the anycast destinations and this
prevents things like traceroute from working. So create a setting similar
to ECHO when dealing with Anycast sources (icmpv6_echo_ignore_anycast).

Signed-off-by: Mahesh Bandewar &lt;maheshb@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Reviewed-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Link: https://lore.kernel.org/r/20230419013238.2691167-1-maheshb@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf</title>
<updated>2023-02-23T05:25:23Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2023-02-23T05:25:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fd2a55e74a991ae5ff531c9da52963277dc7fbd5'/>
<id>urn:sha1:fd2a55e74a991ae5ff531c9da52963277dc7fbd5</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix broken listing of set elements when table has an owner.

2) Fix conntrack refcount leak in ctnetlink with related conntrack
   entries, from Hangyu Hua.

3) Fix use-after-free/double-free in ctnetlink conntrack insert path,
   from Florian Westphal.

4) Fix ip6t_rpfilter with VRF, from Phil Sutter.

5) Fix use-after-free in ebtables reported by syzbot, also from Florian.

6) Use skb-&gt;len in xt_length to deal with IPv6 jumbo packets,
   from Xin Long.

7) Fix NETLINK_LISTEN_ALL_NSID with ctnetlink, from Florian Westphal.

8) Fix memleak in {ip_,ip6_,arp_}tables in ENOMEM error case,
   from Pavel Tikhomirov.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: x_tables: fix percpu counter block leak on error path when creating new netns
  netfilter: ctnetlink: make event listener tracking global
  netfilter: xt_length: use skb len to match in length_mt6
  netfilter: ebtables: fix table blob use-after-free
  netfilter: ip6t_rpfilter: Fix regression with VRF interfaces
  netfilter: conntrack: fix rmmod double-free race
  netfilter: ctnetlink: fix possible refcount leak in ctnetlink_create_conntrack()
  netfilter: nf_tables: allow to fetch set elements when table has an owner
====================

Link: https://lore.kernel.org/r/20230222092137.88637-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfilter: ctnetlink: make event listener tracking global</title>
<updated>2023-02-21T23:28:47Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2023-02-20T16:24:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fdf6491193e411087ae77bcbc6468e3e1cff99ed'/>
<id>urn:sha1:fdf6491193e411087ae77bcbc6468e3e1cff99ed</id>
<content type='text'>
pernet tracking doesn't work correctly because other netns might have
set NETLINK_LISTEN_ALL_NSID on its event socket.

In this case its expected that events originating in other net
namespaces are also received.

Making pernet-tracking work while also honoring NETLINK_LISTEN_ALL_NSID
requires much more intrusive changes both in netlink and nfnetlink,
f.e. adding a 'setsockopt' callback that lets nfnetlink know that the
event socket entered (or left) ALL_NSID mode.

Move to global tracking instead: if there is an event socket anywhere
on the system, all net namespaces which have conntrack enabled and
use autobind mode will allocate the ecache extension.

netlink_has_listeners() returns false only if the given group has no
subscribers in any net namespace, the 'net' argument passed to
nfnetlink_has_listeners is only used to derive the protocol (nfnetlink),
it has no other effect.

For proper NETLINK_LISTEN_ALL_NSID-aware pernet tracking of event
listeners a new netlink_has_net_listeners() is also needed.

Fixes: 90d1daa45849 ("netfilter: conntrack: add nf_conntrack_events autodetect mode")
Reported-by: Bryce Kahle &lt;bryce.kahle@datadoghq.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>net: make default_rps_mask a per netns attribute</title>
<updated>2023-02-20T11:22:54Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2023-02-17T12:28:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=50bcfe8df7c73ce51762f65d218b4ef0cc5da3ee'/>
<id>urn:sha1:50bcfe8df7c73ce51762f65d218b4ef0cc5da3ee</id>
<content type='text'>
That really was meant to be a per netns attribute from the beginning.

The idea is that once proper isolation is in place in the main
namespace, additional demux in the child namespaces will be redundant.
Let's make child netns default rps mask empty by default.

To avoid bloating the netns with a possibly large cpumask, allocate
it on-demand during the first write operation.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: xsk: Don't include &lt;linux/rculist.h&gt;</title>
<updated>2022-12-07T04:04:34Z</updated>
<author>
<name>Christophe JAILLET</name>
<email>christophe.jaillet@wanadoo.fr</email>
</author>
<published>2022-12-03T16:51:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e9b4aeed56699b469206d05e706ddf2db95700a9'/>
<id>urn:sha1:e9b4aeed56699b469206d05e706ddf2db95700a9</id>
<content type='text'>
There is no need to include &lt;linux/rculist.h&gt; here.

Prefer the less invasive &lt;linux/types.h&gt; which is needed for 'hlist_head'.

Signed-off-by: Christophe JAILLET &lt;christophe.jaillet@wanadoo.fr&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/r/88d6a1d88764cca328610854f890a9ca1f4b029e.1670086246.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>sctp: add dif and sdif check in asoc and ep lookup</title>
<updated>2022-11-18T11:42:54Z</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2022-11-16T20:01:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0af03170637f47fb5cc6501d4b2dcbf1c14772a9'/>
<id>urn:sha1:0af03170637f47fb5cc6501d4b2dcbf1c14772a9</id>
<content type='text'>
This patch at first adds a pernet global l3mdev_accept to decide if it
accepts the packets from a l3mdev when a SCTP socket doesn't bind to
any interface. It's set to 1 to avoid any possible incompatible issue,
and in next patch, a sysctl will be introduced to allow to change it.

Then similar to inet/udp_sk_bound_dev_eq(), sctp_sk_bound_dev_eq() is
added to check either dif or sdif is equal to sk_bound_dev_if, and to
check sid is 0 or l3mdev_accept is 1 if sk_bound_dev_if is not set.
This function is used to match a association or a endpoint, namely
called by sctp_addrs_lookup_transport() and sctp_endpoint_is_match().
All functions that needs updating are:

sctp_rcv():
  asoc:
  __sctp_rcv_lookup()
    __sctp_lookup_association() -&gt; sctp_addrs_lookup_transport()
    __sctp_rcv_lookup_harder()
      __sctp_rcv_init_lookup()
         __sctp_lookup_association() -&gt; sctp_addrs_lookup_transport()
      __sctp_rcv_walk_lookup()
         __sctp_rcv_asconf_lookup()
           __sctp_lookup_association() -&gt; sctp_addrs_lookup_transport()

  ep:
  __sctp_rcv_lookup_endpoint() -&gt; sctp_endpoint_is_match()

sctp_connect():
  sctp_endpoint_is_peeled_off()
    __sctp_lookup_association()
      sctp_has_association()
        sctp_lookup_association()
          __sctp_lookup_association() -&gt; sctp_addrs_lookup_transport()

sctp_diag_dump_one():
  sctp_transport_lookup_process() -&gt; sctp_addrs_lookup_transport()

Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: Introduce optional per-netns hash table.</title>
<updated>2022-11-16T09:43:35Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-11-14T21:57:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9804985bf27f8fbcf0d96c7435b5ad94a2a6ea20'/>
<id>urn:sha1:9804985bf27f8fbcf0d96c7435b5ad94a2a6ea20</id>
<content type='text'>
The maximum hash table size is 64K due to the nature of the protocol. [0]
It's smaller than TCP, and fewer sockets can cause a performance drop.

On an EC2 c5.24xlarge instance (192 GiB memory), after running iperf3 in
different netns, creating 32Mi sockets without data transfer in the root
netns causes regression for the iperf3's connection.

  uhash_entries		sockets		length		Gbps
	    64K		      1		     1		5.69
			    1Mi		    16		5.27
			    2Mi		    32		4.90
			    4Mi		    64		4.09
			    8Mi		   128		2.96
			   16Mi		   256		2.06
			   32Mi		   512		1.12

The per-netns hash table breaks the lengthy lists into shorter ones.  It is
useful on a multi-tenant system with thousands of netns.  With smaller hash
tables, we can look up sockets faster, isolate noisy neighbours, and reduce
lock contention.

The max size of the per-netns table is 64K as well.  This is because the
possible hash range by udp_hashfn() always fits in 64K within the same
netns and we cannot make full use of the whole buckets larger than 64K.

  /* 0 &lt; num &lt; 64K  -&gt;  X &lt; hash &lt; X + 64K */
  (num + net_hash_mix(net)) &amp; mask;

Also, the min size is 128.  We use a bitmap to search for an available
port in udp_lib_get_port().  To keep the bitmap on the stack and not
fire the CONFIG_FRAME_WARN error at build time, we round up the table
size to 128.

The sysctl usage is the same with TCP:

  $ dmesg | cut -d ' ' -f 6- | grep "UDP hash"
  UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc)

  # sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = 65536  # can be changed by uhash_entries

  # sysctl net.ipv4.udp_child_hash_entries
  net.ipv4.udp_child_hash_entries = 0  # disabled by default

  # ip netns add test1
  # ip netns exec test1 sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = -65536  # share the global table

  # sysctl -w net.ipv4.udp_child_hash_entries=100
  net.ipv4.udp_child_hash_entries = 100

  # ip netns add test2
  # ip netns exec test2 sysctl net.ipv4.udp_hash_entries
  net.ipv4.udp_hash_entries = 128  # own a per-netns table with 2^n buckets

We could optimise the hash table lookup/iteration further by removing
the netns comparison for the per-netns one in the future.  Also, we
could optimise the sparse udp_hslot layout by putting it in udp_table.

[0]: https://lore.kernel.org/netdev/4ACC2815.7010101@gmail.com/

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: Set NULL to sk-&gt;sk_prot-&gt;h.udp_table.</title>
<updated>2022-11-16T09:43:35Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-11-14T21:57:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=67fb43308f4b354f13aabcc66dd5d99bfbb7e838'/>
<id>urn:sha1:67fb43308f4b354f13aabcc66dd5d99bfbb7e838</id>
<content type='text'>
We will soon introduce an optional per-netns hash table
for UDP.

This means we cannot use the global sk-&gt;sk_prot-&gt;h.udp_table
to fetch a UDP hash table.

Instead, set NULL to sk-&gt;sk_prot-&gt;h.udp_table for UDP and get
a proper table from net-&gt;ipv4.udp_table.

Note that we still need sk-&gt;sk_prot-&gt;h.udp_table for UDP LITE.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
