<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net, branch v4.6</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.6</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.6'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-05-06T22:25:26Z</updated>
<entry>
<title>udp_tunnel: Remove redundant udp_tunnel_gro_complete().</title>
<updated>2016-05-06T22:25:26Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jarno@ovn.org</email>
</author>
<published>2016-05-03T23:10:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=43b8448cd7b42a4c39476c9a12c960c1408f1946'/>
<id>urn:sha1:43b8448cd7b42a4c39476c9a12c960c1408f1946</id>
<content type='text'>
The setting of the UDP tunnel GSO type is already performed by
udp[46]_gro_complete().

Signed-off-by: Jarno Rajahalme &lt;jarno@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec</title>
<updated>2016-05-04T20:35:31Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-05-04T20:35:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=32b583a0cb9b757d68c44f2819fa6ccf95dbb953'/>
<id>urn:sha1:32b583a0cb9b757d68c44f2819fa6ccf95dbb953</id>
<content type='text'>
Steffen Klassert says:

====================
pull request (net): ipsec 2016-05-04

1) The flowcache can hit an OOM condition if too
   many entries are in the gc_list. Fix this by
   counting the entries in the gc_list and refuse
   new allocations if the value is too high.

2) The inner headers are invalid after a xfrm transformation,
   so reset the skb encapsulation field to ensure nobody tries
   access the inner headers. Otherwise tunnel devices stacked
   on top of xfrm may build the outer headers based on wrong
   informations.

3) Add pmtu handling to vti, we need it to report
   pmtu informations for local generated packets.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: Add checksum check to the features check function</title>
<updated>2016-05-03T20:00:54Z</updated>
<author>
<name>Alexander Duyck</name>
<email>aduyck@mirantis.com</email>
</author>
<published>2016-05-02T16:25:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=af67eb9e7e1ab37880459f83153d34b3c42b0075'/>
<id>urn:sha1:af67eb9e7e1ab37880459f83153d34b3c42b0075</id>
<content type='text'>
We need to perform an additional check on the inner headers to determine if
we can offload the checksum for them.  Previously this check didn't occur
so we would generate an invalid frame in the case of an IPv6 header
encapsulated inside of an IPv4 tunnel.  To fix this I added a secondary
check to vxlan_features_check so that we can verify that we can offload the
inner checksum.

Signed-off-by: Alexander Duyck &lt;aduyck@mirantis.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>switchdev: Adding complete operation to deferred switchdev ops</title>
<updated>2016-04-24T18:23:32Z</updated>
<author>
<name>Elad Raz</name>
<email>eladr@mellanox.com</email>
</author>
<published>2016-04-21T10:52:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7ceb2afbd6aee4643056b47156baad6841db8e78'/>
<id>urn:sha1:7ceb2afbd6aee4643056b47156baad6841db8e78</id>
<content type='text'>
When using switchdev deferred operation (SWITCHDEV_F_DEFER), the operation
is executed in different context and the application doesn't have any way
to get the operation real status.

Adding a completion callback fixes that. This patch adds fields to
switchdev_attr and switchdev_obj "complete_priv" field which is used by
the "complete" callback.

Application can set a complete function which will be called once the
operation executed.

Signed-off-by: Elad Raz &lt;eladr@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Acked-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Merge tx_flags and tskey in tcp_shifted_skb</title>
<updated>2016-04-21T18:40:55Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2016-04-20T05:39:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cfea5a688eb37bcd1081255df9f9f777f4e61999'/>
<id>urn:sha1:cfea5a688eb37bcd1081255df9f9f777f4e61999</id>
<content type='text'>
After receiving sacks, tcp_shifted_skb() will collapse
skbs if possible.  tx_flags and tskey also have to be
merged.

This patch reuses the tcp_skb_collapse_tstamp() to handle
them.

BPF Output Before:
~~~~~
&lt;no-output-due-to-missing-tstamp-event&gt;

BPF Output After:
~~~~~
&lt;...&gt;-2024  [007] d.s.    88.644374: : ee_data:14599

Packetdrill Script:
~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 &lt; S 0:0(0) win 32792 &lt;mss 1460,sackOK,nop,nop,nop,wscale 7&gt;
0.100 &gt; S. 0:0(0) ack 1 &lt;mss 1460,nop,nop,sackOK,nop,wscale 7&gt;
0.200 &lt; . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 1460) = 1460
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 13140) = 13140

0.200 &gt; P. 1:1461(1460) ack 1
0.200 &gt; . 1461:8761(7300) ack 1
0.200 &gt; P. 8761:14601(5840) ack 1

0.300 &lt; . 1:1(0) ack 1 win 257 &lt;sack 1461:14601,nop,nop&gt;
0.300 &gt; P. 1:1461(1460) ack 1
0.400 &lt; . 1:1(0) ack 14601 win 257

0.400 close(4) = 0
0.400 &gt; F. 14601:14601(0) ack 1
0.500 &lt; F. 1:1(0) ack 14602 win 257
0.500 &gt; . 14602:14602(0) ack 2

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Tested-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>cls_cgroup: get sk_classid only from full sockets</title>
<updated>2016-04-20T00:09:25Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2016-04-18T11:37:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2309236c13feae619572324efd3e910e66ef6bd0'/>
<id>urn:sha1:2309236c13feae619572324efd3e910e66ef6bd0</id>
<content type='text'>
skb-&gt;sk could point to timewait or request socket which has no sk_classid.
Detected as "BUG: KASAN: slab-out-of-bounds in cls_cgroup_classify".

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>soreuseport: fix ordering for mixed v4/v6 sockets</title>
<updated>2016-04-15T01:14:03Z</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-04-12T17:11:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d894ba18d4e449b3a7f6eb491f16c9e02933736e'/>
<id>urn:sha1:d894ba18d4e449b3a7f6eb491f16c9e02933736e</id>
<content type='text'>
With the SO_REUSEPORT socket option, it is possible to create sockets
in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address.
This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on
the AF_INET6 sockets.

Prior to the commits referenced below, an incoming IPv4 packet would
always be routed to a socket of type AF_INET when this mixed-mode was used.
After those changes, the same packet would be routed to the most recently
bound socket (if this happened to be an AF_INET6 socket, it would
have an IPv4 mapped IPv6 address).

The change in behavior occurred because the recent SO_REUSEPORT optimizations
short-circuit the socket scoring logic as soon as they find a match.  They
did not take into account the scoring logic that favors AF_INET sockets
over AF_INET6 sockets in the event of a tie.

To fix this problem, this patch changes the insertion order of AF_INET
and AF_INET6 addresses in the TCP and UDP socket lists when the sockets
have SO_REUSEPORT set.  AF_INET sockets will be inserted at the head of the
list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at
the tail of the list.  This will force AF_INET sockets to always be
considered first.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection")

Reported-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: udp: Do a route lookup and update during release_cb</title>
<updated>2016-04-14T20:29:53Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2016-04-11T22:29:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e646b657f6983017783914a951039e323120dc55'/>
<id>urn:sha1:e646b657f6983017783914a951039e323120dc55</id>
<content type='text'>
This patch adds a release_cb for UDPv6.  It does a route lookup
and updates sk-&gt;sk_dst_cache if it is needed.  It picks up the
left-over job from ip6_sk_update_pmtu() if the sk was owned
by user during the pmtu update.

It takes a rcu_read_lock to protect the __sk_dst_get() operations
because another thread may do ip6_dst_store() without taking the
sk lock (e.g. sendmsg).

Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Reported-by: Wei Wang &lt;weiwan@google.com&gt;
Cc: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Wei Wang &lt;weiwan@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update</title>
<updated>2016-04-14T20:29:51Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2016-04-11T22:29:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=33c162a980fe03498fcecb917f618ad7e7c55e61'/>
<id>urn:sha1:33c162a980fe03498fcecb917f618ad7e7c55e61</id>
<content type='text'>
There is a case in connected UDP socket such that
getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
sequence could be the following:
1. Create a connected UDP socket
2. Send some datagrams out
3. Receive a ICMPV6_PKT_TOOBIG
4. No new outgoing datagrams to trigger the sk_dst_check()
   logic to update the sk-&gt;sk_dst_cache.
5. getsockopt(IPV6_MTU) returns the mtu from the invalid
   sk-&gt;sk_dst_cache instead of the newly created RTF_CACHE clone.

This patch updates the sk-&gt;sk_dst_cache for a connected datagram sk
during pmtu-update code path.

Note that the sk-&gt;sk_v6_daddr is used to do the route lookup
instead of skb-&gt;data (i.e. iph).  It is because a UDP socket can become
connected after sending out some datagrams in un-connected state.  or
It can be connected multiple times to different destinations.  Hence,
iph may not be related to where sk is currently connected to.

It is done under '!sock_owned_by_user(sk)' condition because
the user may make another ip6_datagram_connect()  (i.e changing
the sk-&gt;sk_v6_daddr) while dst lookup is happening in the pmtu-update
code path.

For the sock_owned_by_user(sk) == true case, the next patch will
introduce a release_cb() which will update the sk-&gt;sk_dst_cache.

Test:

Server (Connected UDP Socket):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Route Details:
[root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
2fac::/64 dev eth0  proto kernel  metric 256  pref medium
2fac:face::/64 via 2fac::face dev eth0  metric 1024  pref medium

A simple python code to create a connected UDP socket:

import socket
import errno

HOST = '2fac::1'
PORT = 8080

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.bind((HOST, PORT))
s.connect(('2fac:face::face', 53))
print("connected")
while True:
    try:
	data = s.recv(1024)
    except socket.error as se:
	if se.errno == errno.EMSGSIZE:
		pmtu = s.getsockopt(41, 24)
		print("PMTU:%d" % pmtu)
		break
s.close()

Python program output after getting a ICMPV6_PKT_TOOBIG:
[root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
connected
PMTU:1300

Cache routes after recieving TOOBIG:
[root@arch-fb-vm1 ~]# ip -6 r show table cache
2fac:face::face via 2fac::face dev eth0  metric 0
    cache  expires 463sec mtu 1300 pref medium

Client (Send the ICMPV6_PKT_TOOBIG):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scapy is used to generate the TOOBIG message.  Here is the scapy script I have
used:

&gt;&gt;&gt; p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::
1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
&gt;&gt;&gt; sendp(p, iface='qemubr0')

Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Reported-by: Wei Wang &lt;weiwan@google.com&gt;
Cc: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Wei Wang &lt;weiwan@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: vrf: Fix dst reference counting</title>
<updated>2016-04-11T19:56:20Z</updated>
<author>
<name>David Ahern</name>
<email>dsa@cumulusnetworks.com</email>
</author>
<published>2016-04-07T18:10:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9ab179d83b4e31ea277a123492e419067c2f129a'/>
<id>urn:sha1:9ab179d83b4e31ea277a123492e419067c2f129a</id>
<content type='text'>
Vivek reported a kernel exception deleting a VRF with an active
connection through it. The root cause is that the socket has a cached
reference to a dst that is destroyed. Converting the dst_destroy to
dst_release and letting proper reference counting kick in does not
work as the dst has a reference to the device which needs to be released
as well.

I talked to Hannes about this at netdev and he pointed out the ipv4 and
ipv6 dst handling has dst_ifdown for just this scenario. Rather than
continuing with the reinvented dst wheel in VRF just remove it and
leverage the ipv4 and ipv6 versions.

Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver")
Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
