<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/netns, branch v5.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-06-30T17:45:08Z</updated>
<entry>
<title>bpf, netns: Keep a list of attached bpf_link's</title>
<updated>2020-06-30T17:45:08Z</updated>
<author>
<name>Jakub Sitnicki</name>
<email>jakub@cloudflare.com</email>
</author>
<published>2020-06-25T14:13:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ab53cad90eb10c9991f501ba08904680a074ef3d'/>
<id>urn:sha1:ab53cad90eb10c9991f501ba08904680a074ef3d</id>
<content type='text'>
To support multi-prog link-based attachments for new netns attach types, we
need to keep track of more than one bpf_link per attach type. Hence,
convert net-&gt;bpf.links into a list that currently can be either empty or
have just one item.

Instead of reusing bpf_prog_list from bpf-cgroup, we link together
bpf_netns_link's themselves. This makes list management simpler as we don't
have to allocate, initialize, and later release list elements. We can do
this because multi-prog attachment will be available only for bpf_link, and
we don't need to build a list of programs attached directly and indirectly
via links.

No functional changes intended.

Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200625141357.910330-4-jakub@cloudflare.com
</content>
</entry>
<entry>
<title>bpf, netns: Keep attached programs in bpf_prog_array</title>
<updated>2020-06-30T17:45:08Z</updated>
<author>
<name>Jakub Sitnicki</name>
<email>jakub@cloudflare.com</email>
</author>
<published>2020-06-25T14:13:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=695c12147a40181fe9221d321c3f2de33c9574ed'/>
<id>urn:sha1:695c12147a40181fe9221d321c3f2de33c9574ed</id>
<content type='text'>
Prepare for having multi-prog attachments for new netns attach types by
storing programs to run in a bpf_prog_array, which is well suited for
iterating over programs and running them in sequence.

After this change bpf(PROG_QUERY) may block to allocate memory in
bpf_prog_array_copy_to_user() for collected program IDs. This forces a
change in how we protect access to the attached program in the query
callback. Because bpf_prog_array_copy_to_user() can sleep, we switch from
an RCU read lock to holding a mutex that serializes updaters.

Because we allow only one BPF flow_dissector program to be attached to a
netns at any time, the bpf_prog_array pointed to by net-&gt;bpf.run_array is
always either detached (null) or one element long.

No functional changes intended.

Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200625141357.910330-3-jakub@cloudflare.com
</content>
</entry>
<entry>
<title>bpf: Add link-based BPF program attachment to network namespace</title>
<updated>2020-06-01T22:21:03Z</updated>
<author>
<name>Jakub Sitnicki</name>
<email>jakub@cloudflare.com</email>
</author>
<published>2020-05-31T08:28:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7f045a49fee04b5662cbdeaf0838f9322ae8c63a'/>
<id>urn:sha1:7f045a49fee04b5662cbdeaf0838f9322ae8c63a</id>
<content type='text'>
Extend bpf() syscall subcommands that operate on bpf_link, that is
LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to
network namespaces (only flow dissector at the moment).

Link-based and prog-based attachment can be used interchangeably, but only
one can exist at a time. Attempts to attach a link when a prog is already
attached directly, and the other way around, will be met with -EEXIST.
Attempts to detach a program when link exists result in -EINVAL.

Attachment of multiple links of same attach type to one netns is not
supported with the intention to lift the restriction when a use-case
presents itself. Because of that link create returns -E2BIG when trying to
create another netns link, when one already exists.
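
The attach rules above can be modelled in a few lines of Python. This is
only an illustrative sketch, not kernel code: the classes Link and
NetnsAttachPoint and their method names are hypothetical, errors are
returned as negative errno values to mirror the text, and the -ENOENT case
is an assumption of the sketch.

```python
import errno


class Link:
    """Toy stand-in for a bpf_link bound to a netns (hypothetical)."""

    def __init__(self, prog):
        self.prog = prog
        self.attached = True  # becomes False once auto-detached (defunct)


class NetnsAttachPoint:
    """Toy model of one netns attach type, e.g. flow dissector."""

    def __init__(self):
        self.prog = None  # program attached directly (prog-based)
        self.link = None  # the single link allowed per attach type

    def prog_attach(self, prog):
        if self.link is not None:
            return -errno.EEXIST  # a link already owns the attach point
        # Replacement policy for an existing direct program is not
        # described in the text; this sketch simply overwrites it.
        self.prog = prog
        return 0

    def prog_detach(self):
        if self.link is not None:
            return -errno.EINVAL  # cannot detach a prog while a link exists
        if self.prog is None:
            return -errno.ENOENT  # assumption: nothing attached
        self.prog = None
        return 0

    def link_create(self, prog):
        if self.link is not None:
            return -errno.E2BIG, None  # only one link per attach type
        if self.prog is not None:
            return -errno.EEXIST, None  # a prog is attached directly
        self.link = Link(prog)
        return 0, self.link

    def link_update(self, link, new_prog):
        if not link.attached:
            return -errno.ENOLINK  # link was auto-detached (defunct)
        link.prog = new_prog
        return 0

    def netns_exit(self):
        # Models the pernet pre_exit callback: auto-detach the link.
        if self.link is not None:
            self.link.attached = False
            self.link = None
```

In this model a link that outlives its netns keeps existing as an object,
but every later link_update() on it fails with -ENOLINK.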

Link-based attachments to netns don't keep a netns alive by holding a ref
to it. Instead links get auto-detached from netns when the latter is being
destroyed, using a pernet pre_exit callback.

When auto-detached, link lives in defunct state as long as there are open FDs
for it. -ENOLINK is returned if a user tries to update a defunct link.

Because bpf_link to netns doesn't hold a ref to struct net, special care is
taken when releasing, updating, or filling link info. The netns might be
getting torn down when any of these link operations are in progress. That
is why auto-detach and update/release/fill_info are synchronized by the
same mutex. Also, link ops have to always check if auto-detach has not
happened yet and if netns is still alive (refcnt &gt; 0).

Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200531082846.2117903-5-jakub@cloudflare.com
</content>
</entry>
<entry>
<title>net: Introduce netns_bpf for BPF programs attached to netns</title>
<updated>2020-06-01T22:21:02Z</updated>
<author>
<name>Jakub Sitnicki</name>
<email>jakub@cloudflare.com</email>
</author>
<published>2020-05-31T08:28:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a3fd7ceee05431d2c51ed86c6cae015d236a51f0'/>
<id>urn:sha1:a3fd7ceee05431d2c51ed86c6cae015d236a51f0</id>
<content type='text'>
In order to:

 (1) attach more than one BPF program type to netns, or
 (2) support attaching BPF programs to netns with bpf_link, or
 (3) support multi-prog attach points for netns

we will need to keep more state per netns than a single pointer like we
have now for BPF flow dissector program.

Prepare for the above by extracting netns_bpf that is part of struct net,
for storing all state related to BPF programs attached to netns.

Turn flow dissector callbacks for querying/attaching/detaching a program
into generic ones that operate on netns_bpf. Next patch will move the
generic callbacks into their own module.

This is similar to how it is organized for cgroup with cgroup_bpf.

Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/bpf/20200531082846.2117903-3-jakub@cloudflare.com
</content>
</entry>
<entry>
<title>nexthop: add support for notifiers</title>
<updated>2020-05-22T21:00:38Z</updated>
<author>
<name>Roopa Prabhu</name>
<email>roopa@cumulusnetworks.com</email>
</author>
<published>2020-05-22T05:26:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8590ceedb70181ad9de5a3dc2cfe50ca33a9576a'/>
<id>urn:sha1:8590ceedb70181ad9de5a3dc2cfe50ca33a9576a</id>
<content type='text'>
This patch adds nexthop add/del notifiers. They will be used by the
vxlan driver in a later patch and could possibly be used by
switchdev drivers in the future.

Signed-off-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: add hrtimer slack to sack compression</title>
<updated>2020-04-30T20:24:01Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2020-04-30T17:35:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a70437cc09a11771870e9f6bfc0ba1237161daa8'/>
<id>urn:sha1:a70437cc09a11771870e9f6bfc0ba1237161daa8</id>
<content type='text'>
Add a sysctl to control hrtimer slack, default of 100 usec.

This gives the opportunity to reduce system overhead,
and help very short RTT flows.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ipv4: add sysctl for nexthop api compatibility mode</title>
<updated>2020-04-28T19:50:37Z</updated>
<author>
<name>Roopa Prabhu</name>
<email>roopa@cumulusnetworks.com</email>
</author>
<published>2020-04-27T20:56:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4f80116d3df3b23ee4b83ea8557629e1799bc230'/>
<id>urn:sha1:4f80116d3df3b23ee4b83ea8557629e1799bc230</id>
<content type='text'>
Current route nexthop API maintains user space compatibility
with old route API by default. Dumps and netlink notifications
support both new and old API format. In systems which have
moved to the new API, this compatibility mode cancels some
of the performance benefits provided by the new nexthop API.

This patch adds new sysctl nexthop_compat_mode which is on
by default but provides the ability to turn off compatibility
mode allowing systems to run entirely with the new routing
API. Old route API behaviour and support is not modified by this
sysctl.
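
As a rough illustration, the current setting could be inspected from user
space as sketched below. The helper names sysctl_path and read_sysctl are
hypothetical, and the fully qualified name net.ipv4.nexthop_compat_mode is
an assumption based on this patch targeting the ipv4 sysctl table.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file path under root."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the sysctl value as a stripped string, or None if absent."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except FileNotFoundError:
        # Kernel without this sysctl, or a different root was given.
        return None
```

Calling read_sysctl("net.ipv4.nexthop_compat_mode") returns None on kernels
that do not provide the sysctl.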

Uses a single sysctl to cover both ipv4 and ipv6 following
other sysctls. Covers dumps and delete notifications as
suggested by David Ahern.

Signed-off-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>mptcp: add and use MIB counter infrastructure</title>
<updated>2020-03-30T05:14:49Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2020-03-27T21:48:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fc518953bc9c8d7d33c6ab261995f5038f3c87f9'/>
<id>urn:sha1:fc518953bc9c8d7d33c6ab261995f5038f3c87f9</id>
<content type='text'>
The counters are exported via the same /proc file as the Linux TCP MIB
counters, so "netstat -s" or "nstat" will show them automatically.

The MPTCP MIB counters are allocated in a distinct pcpu area in order to
avoid bloating/wasting TCP pcpu memory.

Counters are allocated once the first MPTCP socket is created in a
network namespace and free'd on exit.

If no sockets have been allocated, all-zero mptcp counters are shown.
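
A minimal sketch of how a tool might pick the MPTCP counters out of that
/proc file, assuming the usual paired layout of a name row followed by a
value row. The function name parse_netstat, the MPTcpExt prefix, and the
counter names in SAMPLE are assumptions used for illustration only.

```python
def parse_netstat(text):
    """Parse netstat-style text into {prefix: {counter: value}}."""
    lines = text.strip().splitlines()
    tables = {}
    # Lines come in pairs: a row of counter names, then a row of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError("header/value prefix mismatch")
        tables[prefix] = dict(zip(names.split(), map(int, numbers.split())))
    return tables


# Illustrative input only; real files carry many more counters.
SAMPLE = """\
TcpExt: SyncookiesSent SyncookiesRecv
TcpExt: 0 0
MPTcpExt: MPCapableSYNRX MPCapableACKRX
MPTcpExt: 0 0
"""
```

With this layout the all-zero case described above is just another value
row, so callers need no special handling for it.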

The MIB counter list is taken from the multipath-tcp.org kernel, but
only a few counters have been picked up so far.  The counter list can
be increased at any time later on.

v2 -&gt; v3:
 - remove 'inline' in foo.c files (David S. Miller)

Co-developed-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Mat Martineau &lt;mathew.j.martineau@linux.intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted.</title>
<updated>2020-03-12T19:08:09Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.co.jp</email>
</author>
<published>2020-03-10T08:05:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4b01a9674231a97553a55456d883f584e948a78d'/>
<id>urn:sha1:4b01a9674231a97553a55456d883f584e948a78d</id>
<content type='text'>
Commit aacd9289af8b82f5fb01bcdd53d0e3406d1333c7 ("tcp: bind() use stronger
condition for bind_conflict") introduced a restriction that forbids binding
SO_REUSEADDR enabled sockets to the same (addr, port) tuple, so that ports
are spread out and we can connect to the same remote host.

The change accelerates port depletion: we fail to bind sockets to the same
local port even when we want to connect to different remote hosts.

You can reproduce this issue by following the instructions below.

  1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768"
  2. set SO_REUSEADDR to two sockets.
  3. bind two sockets to (localhost, 0) and the latter fails.
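
Steps 2 and 3 can be sketched in user space as follows. This is a minimal
illustration rather than the original reproducer; the function name
bind_two_reuseaddr_sockets is hypothetical, and step 1 (narrowing
ip_local_port_range) still has to be done separately as root.

```python
import socket


def bind_two_reuseaddr_sockets(host="127.0.0.1"):
    """Bind two SO_REUSEADDR sockets to (host, 0); return the chosen ports."""
    socks = []
    ports = []
    try:
        for _ in range(2):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(s)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            # Port 0 asks the kernel to pick an ephemeral port.
            s.bind((host, 0))
            ports.append(s.getsockname()[1])
    finally:
        # Both sockets stay open until here, so the second bind()
        # competes with the first one for the ephemeral range.
        for s in socks:
            s.close()
    return ports
```

With the port range narrowed to a single port as in step 1, the second
bind() is expected to raise OSError instead of returning a port.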

Therefore, when ephemeral ports are exhausted, bind(0) should fall back to
the legacy behaviour to enable the SO_REUSEADDR option and make it possible
to connect to different remote (addr, port) tuples.

This patch allows us to bind SO_REUSEADDR enabled sockets to the same
(addr, port) only when net.ipv4.ip_autobind_reuse is set to 1 and all
ephemeral ports are exhausted. This also allows connect() and listen() to
share ports in the following way and may break some applications. For that
reason ip_autobind_reuse is 0 by default, which disables the feature.

  1. setsockopt(sk1, SO_REUSEADDR)
  2. setsockopt(sk2, SO_REUSEADDR)
  3. bind(sk1, saddr, 0)
  4. bind(sk2, saddr, 0)
  5. connect(sk1, daddr)
  6. listen(sk2)

If it is set to 1, we can fully utilize the 4-tuples, but we should use
IP_BIND_ADDRESS_NO_PORT for bind()+connect() whenever possible.

The notable thing is that if all sockets bound to the same port have
both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an
ephemeral port and also do listen().

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.co.jp&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2020-01-26T09:40:21Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2020-01-26T09:40:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4d8773b68e83558025303f266070b31bc4101e73'/>
<id>urn:sha1:4d8773b68e83558025303f266070b31bc4101e73</id>
<content type='text'>
Minor conflict in mlx5 because changes happened to code that has
moved meanwhile.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
