<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net, branch v5.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-03-02T00:41:27Z</updated>
<entry>
<title>ipv4: Add ICMPv6 support when parse route ipproto</title>
<updated>2019-03-02T00:41:27Z</updated>
<author>
<name>Hangbin Liu</name>
<email>liuhangbin@gmail.com</email>
</author>
<published>2019-02-27T08:15:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5e1a99eae84999a2536f50a0beaf5d5262337f40'/>
<id>urn:sha1:5e1a99eae84999a2536f50a0beaf5d5262337f40</id>
<content type='text'>
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
But for ip -6 route, currently we only support tcp, udp and icmp.

Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.

v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to
rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family.

Reported-by: Jianlin Shi &lt;jishi@redhat.com&gt;
Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: avoid use IPCB in cipso_v4_error</title>
<updated>2019-02-25T22:32:35Z</updated>
<author>
<name>Nazarov Sergey</name>
<email>s-nazarov@yandex.ru</email>
</author>
<published>2019-02-25T16:27:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3da1ed7ac398f34fff1694017a07054d69c5f5c5'/>
<id>urn:sha1:3da1ed7ac398f34fff1694017a07054d69c5f5c5</id>
<content type='text'>
Extract IP options in cipso_v4_error and use __icmp_send.

Signed-off-by: Sergey Nazarov &lt;s-nazarov@yandex.ru&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Add __icmp_send helper.</title>
<updated>2019-02-25T22:32:35Z</updated>
<author>
<name>Nazarov Sergey</name>
<email>s-nazarov@yandex.ru</email>
</author>
<published>2019-02-25T16:24:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9ef6b42ad6fd7929dd1b6092cb02014e382c6a91'/>
<id>urn:sha1:9ef6b42ad6fd7929dd1b6092cb02014e382c6a91</id>
<content type='text'>
Add __icmp_send function having ip_options struct parameter

Signed-off-by: Sergey Nazarov &lt;s-nazarov@yandex.ru&gt;
Reviewed-by: Paul Moore &lt;paul@paul-moore.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>phonet: fix building with clang</title>
<updated>2019-02-22T00:23:56Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2019-02-19T21:53:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6321aa197547da397753757bd84c6ce64b3e3d89'/>
<id>urn:sha1:6321aa197547da397753757bd84c6ce64b3e3d89</id>
<content type='text'>
clang warns about overflowing the data[] member in the struct pnpipehdr:

net/phonet/pep.c:295:8: warning: array index 4 is past the end of the array (which contains 1 element) [-Warray-bounds]
                        if (hdr-&gt;data[4] == PEP_IND_READY)
                            ^         ~
include/net/phonet/pep.h:66:3: note: array 'data' declared here
                u8              data[1];

Using a flexible array member at the end of the struct avoids the
warning, but since we cannot have a flexible array member inside
of the union, each index now has to be moved back by one, which
makes it a little uglier.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Rémi Denis-Courmont &lt;remi@remlab.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec</title>
<updated>2019-02-22T00:08:52Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2019-02-22T00:08:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b35560e485cb3a10bfe631732bcb75fe1a568da7'/>
<id>urn:sha1:b35560e485cb3a10bfe631732bcb75fe1a568da7</id>
<content type='text'>
Steffen Klassert says:

====================
pull request (net): ipsec 2019-02-21

1) Don't do TX bytes accounting for the esp trailer when sending
   from a request socket as this will result in an out of bounds
   memory write. From Martin Willi.

2) Destroy xfrm_state synchronously on net exit path to
   avoid nested gc flush callbacks that may trigger a
   warning in xfrm6_tunnel_net_exit(). From Cong Wang.

3) Do an unconditionally clone in pfkey_broadcast_one()
   to avoid a race when freeing the skb.
   From Sean Tranchetti.

4) Fix inbound traffic via XFRM interfaces across network
   namespaces. We did the lookup for interfaces and policies
   in the wrong namespace. From Tobias Brunner.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: fix possible overflow in __sk_mem_raise_allocated()</title>
<updated>2019-02-14T05:05:18Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-02-12T20:26:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5bf325a53202b8728cf7013b72688c46071e212e'/>
<id>urn:sha1:5bf325a53202b8728cf7013b72688c46071e212e</id>
<content type='text'>
With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.

They would increase their share of the memory, instead
of decreasing it.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ipv4: use a dedicated counter for icmp_v4 redirect packets</title>
<updated>2019-02-09T05:50:15Z</updated>
<author>
<name>Lorenzo Bianconi</name>
<email>lorenzo.bianconi@redhat.com</email>
</author>
<published>2019-02-06T18:18:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c09551c6ff7fe16a79a42133bcecba5fc2fc3291'/>
<id>urn:sha1:c09551c6ff7fe16a79a42133bcecba5fc2fc3291</id>
<content type='text'>
According to the algorithm described in the comment block at the
beginning of ip_rt_send_redirect, the host should try to send
'ip_rt_redirect_number' ICMP redirect packets with an exponential
backoff and then stop sending them at all assuming that the destination
ignores redirects.
If the device has previously sent some ICMP error packets that are
rate-limited (e.g TTL expired) and continues to receive traffic,
the redirect packets will never be transmitted. This happens since
peer-&gt;rate_tokens will be typically greater than 'ip_rt_redirect_number'
and so it will never be reset even if the redirect silence timeout
(ip_rt_redirect_silence) has elapsed without receiving any packet
requiring redirects.

Fix it by using a dedicated counter for the number of ICMP redirect
packets that has been sent by the host

I have not been able to identify a given commit that introduced the
issue since ip_rt_send_redirect implements the same rate-limiting
algorithm from commit 1da177e4c3f4 ("Linux-2.6.12-rc2")

Signed-off-by: Lorenzo Bianconi &lt;lorenzo.bianconi@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>xfrm: destroy xfrm_state synchronously on net exit path</title>
<updated>2019-02-05T05:29:20Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2019-01-31T21:05:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f75a2804da391571563c4b6b29e7797787332673'/>
<id>urn:sha1:f75a2804da391571563c4b6b29e7797787332673</id>
<content type='text'>
xfrm_state_put() moves struct xfrm_state to the GC list
and schedules the GC work to clean it up. On net exit call
path, xfrm_state_flush() is called to clean up and
xfrm_flush_gc() is called to wait for the GC work to complete
before exit.

However, this doesn't work because one of the -&gt;destructor(),
ipcomp_destroy(), schedules the same GC work again inside
the GC work. It is hard to wait for such a nested async
callback. This is also why syzbot still reports the following
warning:

 WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
 ...
  ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
  cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
  process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
  worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
  kthread+0x357/0x430 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

In fact, it is perfectly fine to bypass GC and destroy xfrm_state
synchronously on net exit call path, because it is in process context
and doesn't need a work struct to do any blocking work.

This patch introduces xfrm_state_put_sync() which simply bypasses
GC, and lets its callers to decide whether to use this synchronous
version. On net exit path, xfrm_state_fini() and
xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
blocking, it can use xfrm_state_put_sync() directly too.

Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
reflect this change.

Fixes: b48c05ab5d32 ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: unbind set in rule from commit path</title>
<updated>2019-02-04T16:29:17Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2019-02-02T09:49:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f6ac8585897684374a19863fff21186a05805286'/>
<id>urn:sha1:f6ac8585897684374a19863fff21186a05805286</id>
<content type='text'>
Anonymous sets that are bound to rules from the same transaction trigger
a kernel splat from the abort path due to double set list removal and
double free.

This patch updates the logic to search for the transaction that is
responsible for creating the set and disable the set list removal and
release, given the rule is now responsible for this. Lookup is reverse
since the transaction that adds the set is likely to be at the tail of
the list.

Moreover, this patch adds the unbind step to deliver the event from the
commit path.  This should not be done from the worker thread, since we
have no guarantees of in-order delivery to the listener.

This patch removes the assumption that both activate and deactivate
callbacks need to be provided.

Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Mikhail Morfikov &lt;mmorfikov@gmail.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvlan, l3mdev: fix broken l3s mode wrt local routes</title>
<updated>2019-01-31T06:13:34Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-01-30T11:49:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d5256083f62e2720f75bb3c5a928a0afe47d6bc3'/>
<id>urn:sha1:d5256083f62e2720f75bb3c5a928a0afe47d6bc3</id>
<content type='text'>
While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
I ran into the issue that while l3 mode is working fine, l3s mode
does not have any connectivity to kube-apiserver and hence all pods
end up in Error state as well. The ipvlan master device sits on
top of a bond device and hostns traffic to kube-apiserver (also running
in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
where the latter is the address of the bond0. While in l3 mode, a
curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
works fine from hostns, neither of them do in case of l3s. In the
latter only a curl to https://127.0.0.1:37573 appeared to work where
for local addresses of bond0 I saw kernel suddenly starting to emit
ARP requests to query HW address of bond0 which remained unanswered
and neighbor entries in INCOMPLETE state. These ARP requests only
happen while in l3s.

Debugging this further, I found the issue is that l3s mode is piggy-
backing on l3 master device, and in this case local routes are using
l3mdev_master_dev_rcu(dev) instead of net-&gt;loopback_dev as per commit
f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
a loopback"). I found that reverting them back into using the
net-&gt;loopback_dev fixed ipvlan l3s connectivity and got everything
working for the CNI.

Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
on l3 master device is to get the l3mdev_ip_rcv() receive hook for
setting the dst entry of the input route without adding its own
ipvlan specific hacks into the receive path, however, any l3 domain
semantics beyond just that are breaking l3s operation. Note that
ipvlan also has the ability to dynamically switch its internal
operation from l3 to l3s for all ports via ipvlan_set_port_mode()
at runtime. In any case, l3 vs l3s soley distinguishes itself by
'de-confusing' netfilter through switching skb-&gt;dev to ipvlan slave
device late in NF_INET_LOCAL_IN before handing the skb to L4.

Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
without any additional l3mdev semantics on top. This should also have
minimal impact since dev-&gt;priv_flags is already hot in cache. With
this set, l3s mode is working fine and I also get things like
masquerading pod traffic on the ipvlan master properly working.

  [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf

Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Mahesh Bandewar &lt;maheshb@google.com&gt;
Cc: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Martynas Pumputis &lt;m@lambda.lt&gt;
Acked-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
