<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/core, branch v5.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-09-12T10:55:34Z</updated>
<entry>
<title>net: Fix null de-reference of device refcount</title>
<updated>2019-09-12T10:55:34Z</updated>
<author>
<name>Subash Abhinov Kasiviswanathan</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2019-09-10T20:02:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=10cc514f451a0f239aa34f91bc9dc954a9397840'/>
<id>urn:sha1:10cc514f451a0f239aa34f91bc9dc954a9397840</id>
<content type='text'>
In the event of a failure during register_netdevice, free_netdev
is invoked immediately. free_netdev assumes that all the netdevice
refcounts have been dropped prior to it being called, and as a
result it frees and clears out the refcount pointer.

However, this is not necessarily true, as some of the operations
in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for
invocation after a grace period. The IPv4 callback in_dev_rcu_put
tries to access the refcount after free_netdev is called, which
leads to a null de-reference:

44837.761523:   &lt;6&gt; Unable to handle kernel paging request at
                    virtual address 0000004a88287000
44837.761651:   &lt;2&gt; pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654:   &lt;2&gt; lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393:   &lt;2&gt; Call trace:
44837.762398:   &lt;2&gt;  in_dev_finish_destroy+0x4c/0xc8
44837.762404:   &lt;2&gt;  in_dev_rcu_put+0x24/0x30
44837.762412:   &lt;2&gt;  rcu_nocb_kthread+0x43c/0x468
44837.762418:   &lt;2&gt;  kthread+0x118/0x128
44837.762424:   &lt;2&gt;  ret_from_fork+0x10/0x1c

Fix this by waiting for the completion of the call_rcu() in
case of register_netdevice errors.
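
A minimal sketch of the fix as it could look in the
register_netdevice() error path in net/core/dev.c (surrounding
rollback details abbreviated):

    ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
    ret = notifier_to_errno(ret);
    if (ret) {
        rollback_registered(dev);
        /* Wait for the queued call_rcu() callbacks (such as the IPv4
         * in_dev_rcu_put) to run before free_netdev() releases the
         * refcount storage they still dereference.
         */
        rcu_barrier();
        dev-&gt;reg_state = NETREG_UNREGISTERED;
    }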

Fixes: 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
Cc: Sean Tranchetti &lt;stranche@codeaurora.org&gt;
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list</title>
<updated>2019-09-07T15:58:48Z</updated>
<author>
<name>Shmulik Ladkani</name>
<email>shmulik@metanetworks.com</email>
</author>
<published>2019-09-06T09:23:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3dcbdb134f329842a38f0e6797191b885ab00a00'/>
<id>urn:sha1:3dcbdb134f329842a38f0e6797191b885ab00a00</id>
<content type='text'>
Historically, support for frag_list packets entering skb_segment() was
limited to frag_list members terminating on exact same gso_size
boundaries. This is verified with a BUG_ON since commit 89319d3801d1
("net: Add frag_list support to skb_segment"), quote:

    As such we require all frag_list members terminate on exact MSS
    boundaries.  This is checked using BUG_ON.
    As there should only be one producer in the kernel of such packets,
    namely GRO, this requirement should not be difficult to maintain.

However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"),
the "exact MSS boundaries" assumption no longer holds:
An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
leaves the frag_list members as originally merged by GRO with the
original 'gso_size'. Examples of such programs are bpf-based NAT46
or NAT64.

This leads to a kernel BUG_ON for flows involving:
 - GRO generating a frag_list skb
 - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
 - skb_segment() of the skb

See example BUG_ON reports in [0].

In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"),
skb_segment() was modified to support the "gso_size mangling" case of
a frag_list GRO'ed skb, but *only* for frag_list members having
head_frag==true (having a page-fragment head).

Alas, GRO packets having frag_list members with a linear kmalloced head
(head_frag==false) still hit the BUG_ON.

This commit adds support to skb_segment() for a 'head_skb' packet having
a frag_list whose members are *non* head_frag, with gso_size mangled, by
disabling SG and thus falling back to copying the data from the given
'head_skb' into the generated segmented skbs - as suggested by Willem de
Bruijn [1].

Since this approach involves the penalty of skb_copy_and_csum_bits()
when building the segments, care was taken to enable this solution
only when required:
 - untrusted gso_size, by testing that SKB_GSO_DODGY is set
   (SKB_GSO_DODGY is set by any of the gso_size mangling functions
    in net/core/filter.c)
 - the frag_list is non-empty, its first member is not head_frag, *and*
   the headlen of the given 'head_skb' does not match the gso_size;
   see the sketch below.
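
A hedged sketch of that guard near the top of skb_segment() in
net/core/skbuff.c (the real check is slightly more elaborate;
names follow the description above):

    /* gso_size is untrusted (mangled by a SKB_GSO_DODGY producer)
     * and the frag_list members have linear, non head_frag heads
     * whose length does not match gso_size: page-frag sharing is
     * impossible, so disable SG and build the segments by copying.
     */
    if (list_skb &amp;&amp; !list_skb-&gt;head_frag &amp;&amp;
        (skb_shinfo(head_skb)-&gt;gso_type &amp; SKB_GSO_DODGY) &amp;&amp;
        mss != GSO_BY_FRAGS &amp;&amp; mss != skb_headlen(head_skb))
        features &amp;= ~NETIF_F_SG;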

[0]
https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/

[1]
https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/

Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Suggested-by: Willem de Bruijn &lt;willemdebruijn.kernel@gmail.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Alexander Duyck &lt;alexander.duyck@gmail.com&gt;
Signed-off-by: Shmulik Ladkani &lt;shmulik.ladkani@gmail.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sock_map, fix missing ulp check in sock hash case</title>
<updated>2019-09-05T09:56:19Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2019-09-03T20:24:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=44580a0118d3ede95fec4dce32df5f75f73cd663'/>
<id>urn:sha1:44580a0118d3ede95fec4dce32df5f75f73cd663</id>
<content type='text'>
sock_map and ULP only work together when ULP is loaded after the sock
map is loaded. In the sock_map case we added a check for this to fail
the load if ULP is already set. However, we missed the check on the
sock_hash side.

Add a ULP check to the sock_hash update path.
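
A minimal sketch of such a check on the sock_hash update path,
assuming it mirrors the existing sock_map check on icsk_ulp_data:

    struct inet_connection_sock *icsk = inet_csk(sk);

    /* Refuse sockets that already have a ULP (e.g. kTLS) attached,
     * just as the sock_map update path does.
     */
    if (unlikely(icsk-&gt;icsk_ulp_data))
        return -EINVAL;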

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: syzbot+7a6ee4d0078eac6bf782@syzkaller.appspotmail.com
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: fix skb use after free in netpoll</title>
<updated>2019-08-28T03:52:02Z</updated>
<author>
<name>Feng Sun</name>
<email>loyou85@gmail.com</email>
</author>
<published>2019-08-26T06:46:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2c1644cf6d46a8267d79ed95cb9b563839346562'/>
<id>urn:sha1:2c1644cf6d46a8267d79ed95cb9b563839346562</id>
<content type='text'>
After commit baeababb5b85d5c4e6c917efe2a1504179438d3b
("tun: return NET_XMIT_DROP for dropped packets"),
when tun_net_xmit drops a packet, it frees the skb and returns
NET_XMIT_DROP, so netpoll_send_skb_on_dev runs into the following
use-after-free cases:
1. it retries netpoll_start_xmit with the freed skb;
2. it queues the freed skb in npinfo-&gt;txq, where queue_process
   then also runs into a use-after-free.
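
A hedged sketch of the safe pattern in netpoll_send_skb_on_dev(),
relying on dev_xmit_complete(), which is true for NETDEV_TX_OK as
well as for NET_XMIT_* codes such as NET_XMIT_DROP where the driver
has already consumed (freed) the skb:

    status = netpoll_start_xmit(skb, dev, txq);
    if (dev_xmit_complete(status))
        txq_trans_update(txq);

    /* Only an incomplete xmit (e.g. NETDEV_TX_BUSY) leaves the skb
     * ours to retry or queue; a dropped skb must be treated as
     * consumed.
     */
    if (!dev_xmit_complete(status)) {
        skb_queue_tail(&amp;npinfo-&gt;txq, skb);
        schedule_delayed_work(&amp;npinfo-&gt;tx_work, 0);
    }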

The first netpoll_send_skb_on_dev case was hit, with the following
kernel log:

[  117.864773] kernel BUG at mm/slub.c:306!
[  117.864773] invalid opcode: 0000 [#1] SMP PTI
[  117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P           OE     5.3.0-050300rc5-generic #201908182231
[  117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
[  117.864781] Call Trace:
[  117.864781]  ? tun_net_xmit+0x21c/0x460
[  117.864781]  kfree_skbmem+0x4e/0x60
[  117.864782]  kfree_skb+0x3a/0xa0
[  117.864782]  tun_net_xmit+0x21c/0x460
[  117.864782]  netpoll_start_xmit+0x11d/0x1b0
[  117.864788]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864789]  __br_forward+0x1b9/0x1e0 [bridge]
[  117.864789]  ? skb_clone+0x53/0xd0
[  117.864790]  ? __skb_clone+0x2e/0x120
[  117.864790]  deliver_clone+0x37/0x50 [bridge]
[  117.864790]  maybe_deliver+0x89/0xc0 [bridge]
[  117.864791]  br_flood+0x6c/0x130 [bridge]
[  117.864791]  br_dev_xmit+0x315/0x3c0 [bridge]
[  117.864792]  netpoll_start_xmit+0x11d/0x1b0
[  117.864792]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864792]  netpoll_send_udp+0x2c6/0x3e8
[  117.864793]  write_msg+0xd9/0xf0 [netconsole]
[  117.864793]  console_unlock+0x386/0x4e0
[  117.864793]  vprintk_emit+0x17e/0x280
[  117.864794]  vprintk_default+0x29/0x50
[  117.864794]  vprintk_func+0x4c/0xbc
[  117.864794]  printk+0x58/0x6f
[  117.864795]  loop_fun+0x24/0x41 [printmsg_loop]
[  117.864795]  kthread+0x104/0x140
[  117.864795]  ? 0xffffffffc05b1000
[  117.864796]  ? kthread_park+0x80/0x80
[  117.864796]  ret_from_fork+0x35/0x40

Signed-off-by: Feng Sun &lt;loyou85@gmail.com&gt;
Signed-off-by: Xiaojun Zhao &lt;xiaojunzhao141@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sock: fix potential memory leak in proto_register()</title>
<updated>2019-08-24T23:33:14Z</updated>
<author>
<name>zhanglin</name>
<email>zhang.lin16@zte.com.cn</email>
</author>
<published>2019-08-23T01:14:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b45ce32135d1c82a5bf12aa56957c3fd27956057'/>
<id>urn:sha1:b45ce32135d1c82a5bf12aa56957c3fd27956057</id>
<content type='text'>
If the number of registered protocols exceeds PROTO_INUSE_NR, prot
will still be added to proto_list, but no available bit is left for
prot in proto_inuse_idx.
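
A minimal sketch of the fix, assuming assign_proto_idx() in
net/core/sock.c is made to report exhaustion so that
proto_register() can unwind instead of leaving prot registered:

    static int assign_proto_idx(struct proto *prot)
    {
        prot-&gt;inuse_idx = find_first_zero_bit(proto_inuse_idx,
                                              PROTO_INUSE_NR);
        if (unlikely(prot-&gt;inuse_idx == PROTO_INUSE_NR)) {
            pr_err("PROTO_INUSE_NR exhausted\n");
            return -ENOSPC;   /* propagated by proto_register() */
        }
        set_bit(prot-&gt;inuse_idx, proto_inuse_idx);
        return 0;
    }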

Changes since v2:
* Propagate the error code properly

Signed-off-by: zhanglin &lt;zhang.lin16@zte.com.cn&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf</title>
<updated>2019-08-24T00:34:11Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2019-08-24T00:34:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=211c4624521527b19cb82ffd53fb34ae320059aa'/>
<id>urn:sha1:211c4624521527b19cb82ffd53fb34ae320059aa</id>
<content type='text'>
Daniel Borkmann says:

====================
pull-request: bpf 2019-08-24

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.

2) Fix a use-after-free in prog symbol exposure, from Daniel.

3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.

4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.

5) Fix a potential use-after-free on flow dissector detach, from Jakub.

6) Fix bpftool to close prog fd after showing metadata, from Quentin.

7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: allow narrow loads of some sk_reuseport_md fields with offset &gt; 0</title>
<updated>2019-08-23T23:25:41Z</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2019-08-20T15:50:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2c238177bd7f4b14bdf7447cc1cd9bb791f147e6'/>
<id>urn:sha1:2c238177bd7f4b14bdf7447cc1cd9bb791f147e6</id>
<content type='text'>
test_select_reuseport fails on s390 due to the verifier rejecting
test_select_reuseport_kern.o with the following message:

	; data_check.eth_protocol = reuse_md-&gt;eth_protocol;
	18: (69) r1 = *(u16 *)(r6 +22)
	invalid bpf_context access off=22 size=2

This is because on big-endian machines casts from __u32 to __u16 are
generated by referencing the respective variable as __u16 with an offset
of 2 (as opposed to 0 on little-endian machines).

The verifier already has all the infrastructure in place to allow such
accesses; it's just that they are not explicitly enabled for the
eth_protocol field. Enable them for the eth_protocol field by using
bpf_ctx_range instead of offsetof.

Ditto for ip_protocol, bind_inany and len, since they already allow
narrowing, and the same problem can arise when working with them.
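
A hedged sketch of the change in sk_reuseport_is_valid_access() in
net/core/filter.c; bpf_ctx_range() expands to a case range covering
every byte of the field, so the big-endian offset-2 load is accepted:

    /* offsetof() matches only a load at byte offset 0 of the field;
     * bpf_ctx_range() matches any byte inside it, which narrow loads
     * on big-endian machines require.
     */
    case bpf_ctx_range(struct sk_reuseport_md, eth_protocol):
        if (size &lt; FIELD_SIZEOF(struct sk_buff, protocol))
            return false;
        /* fall through */
    case bpf_ctx_range(struct sk_reuseport_md, ip_protocol):
    case bpf_ctx_range(struct sk_reuseport_md, bind_inany):
    case bpf_ctx_range(struct sk_reuseport_md, len):
        bpf_ctx_record_field_size(info, size_default);
        return bpf_ctx_narrow_access_ok(off, size, size_default);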

Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH</title>
<updated>2019-08-23T23:15:34Z</updated>
<author>
<name>Jakub Sitnicki</name>
<email>jakub@cloudflare.com</email>
</author>
<published>2019-08-21T12:17:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=db38de39684dda2bf307f41797db2831deba64e9'/>
<id>urn:sha1:db38de39684dda2bf307f41797db2831deba64e9</id>
<content type='text'>
A call to bpf_prog_put(), with the help of call_rcu(), queues an RCU
callback to free the program once a grace period has elapsed. The
callback can run together with new RCU readers that started after the
last grace period. New RCU readers can potentially see the "old"
to-be-freed or already-freed pointer to the program object before the
RCU update-side NULLs it.

Reorder the operations so that the RCU update-side resets the protected
pointer before the end of the grace period after which the program will be
freed.
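
A minimal sketch of the reordered detach path, assuming the v5.3
flow dissector attach point in struct net:

    attached = rcu_dereference_protected(net-&gt;flow_dissector_prog,
                    lockdep_is_held(&amp;flow_dissector_mutex));
    /* Clear the RCU-protected pointer first, so readers that begin
     * after this point cannot pick up the to-be-freed program ...
     */
    RCU_INIT_POINTER(net-&gt;flow_dissector_prog, NULL);
    /* ... and only then drop the reference, which queues the actual
     * free via call_rcu().
     */
    bpf_prog_put(attached);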

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: Petar Penkov &lt;ppenkov@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>tcp: make sure EPOLLOUT won't be missed</title>
<updated>2019-08-19T20:07:43Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-08-17T04:26:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ef8d8ccdc216f797e66cb4a1372f5c4c285ce1e4'/>
<id>urn:sha1:ef8d8ccdc216f797e66cb4a1372f5c4c285ce1e4</id>
<content type='text'>
As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.

However, Jason's patch had a bug, because the 'nonblocking' status,
as far as sk_stream_wait_memory() is concerned, is governed by the
MSG_DONTWAIT flag passed at sendmsg() time:

    long timeo = sock_sndtimeo(sk, flags &amp; MSG_DONTWAIT);

So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk-&gt;sk_sndtimeo has been set to a small (but not zero)
value.

This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.

It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.
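
A hedged sketch of the resulting shape of sk_stream_wait_memory()
in net/core/stream.c (the wait loop is elided; the label name
follows the rename described above):

    if (!timeo)
        goto do_eagain;
    /* ... blocking wait on stream memory elided ... */

do_eagain:
    /* Always set SOCK_NOSPACE when returning -EAGAIN, so that
     * tcp_check_space() will call tcp_new_space() and raise
     * EPOLLOUT once memory becomes available again.
     */
    set_bit(SOCK_NOSPACE, &amp;sk-&gt;sk_socket-&gt;flags);
    err = -EAGAIN;
    goto out;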

Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Reported-by: Vladimir Rutsky  &lt;rutsky@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Jason Baron &lt;jbaron@akamai.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sock: make cookie generation global instead of per netns</title>
<updated>2019-08-09T20:14:46Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-08-08T11:57:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd48bdda4fb82c2fe569d97af4217c530168c99c'/>
<id>urn:sha1:cd48bdda4fb82c2fe569d97af4217c530168c99c</id>
<content type='text'>
Generating and retrieving socket cookies is a useful feature that is
exposed to BPF for various program types through the
bpf_get_socket_cookie() helper.

The fact that the cookie counter is per netns is quite a limitation
for BPF in practice, in particular for programs in the host namespace
that use socket cookies as part of a map lookup key, since they will
cause socket cookie collisions, e.g. when attached to BPF cgroup hooks
or cls_bpf on tc egress in the host namespace while handling container
traffic from veth or ipvlan devices whose peer sits in a different
netns. Change the counter to be global instead.

Socket cookie consumers must treat the value as opaque in any case.
Not every socket needs to have a cookie generated, and knowledge of
the counter value itself does not provide much value either way,
hence the conversion to global is fine.
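
A minimal sketch of the resulting generator in net/core/sock_diag.c,
with the former per-netns sock_net(sk)-&gt;cookie_gen replaced by a
single file-scope counter:

    static atomic64_t cookie_gen;   /* global, shared by all netns */

    u64 sock_gen_cookie(struct sock *sk)
    {
        while (1) {
            u64 res = atomic64_read(&amp;sk-&gt;sk_cookie);

            if (res)
                return res;
            res = atomic64_inc_return(&amp;cookie_gen);
            atomic64_cmpxchg(&amp;sk-&gt;sk_cookie, 0, res);
        }
    }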

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Cc: Martynas Pumputis &lt;m@lambda.lt&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
