<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v5.15</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.15</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.15'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-10-26T19:29:33Z</updated>
<entry>
<title>net: Implement -&gt;sock_is_readable() for UDP and AF_UNIX</title>
<updated>2021-10-26T19:29:33Z</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-10-08T20:33:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=af493388950b6ea3a86f860cfaffab137e024fc8'/>
<id>urn:sha1:af493388950b6ea3a86f860cfaffab137e024fc8</id>
<content type='text'>
Yucong noticed we can't poll() sockets in sockmap even
when they are the destination sockets of redirections.
This is because we never poll any psock queues in -&gt;poll(),
except for TCP. With -&gt;sock_is_readable() now we can
overwrite &gt;sock_is_readable(), invoke and implement it for
both UDP and AF_UNIX sockets.

Reported-by: Yucong Sun &lt;sunyucong@gmail.com&gt;
Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20211008203306.37525-4-xiyou.wangcong@gmail.com
</content>
</entry>
<entry>
<title>skmsg: Extract and reuse sk_msg_is_readable()</title>
<updated>2021-10-26T19:29:33Z</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-10-08T20:33:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fb4e0a5e73d4bb5ab69b7905abd2ec3b580e9b59'/>
<id>urn:sha1:fb4e0a5e73d4bb5ab69b7905abd2ec3b580e9b59</id>
<content type='text'>
tcp_bpf_sock_is_readable() is pretty much generic,
we can extract it and reuse it for non-TCP sockets.

Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20211008203306.37525-3-xiyou.wangcong@gmail.com
</content>
</entry>
<entry>
<title>net: Rename -&gt;stream_memory_read to -&gt;sock_is_readable</title>
<updated>2021-10-26T19:29:33Z</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-10-08T20:33:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7b50ecfcc6cdfe87488576bc3ed443dc8d083b90'/>
<id>urn:sha1:7b50ecfcc6cdfe87488576bc3ed443dc8d083b90</id>
<content type='text'>
The proto ops -&gt;stream_memory_read() is currently only used
by TCP to check whether psock queue is empty or not. We need
to rename it before reusing it for non-TCP protocols, and
adjust the exsiting users accordingly.

Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20211008203306.37525-2-xiyou.wangcong@gmail.com
</content>
</entry>
<entry>
<title>tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function</title>
<updated>2021-10-26T19:25:55Z</updated>
<author>
<name>Liu Jian</name>
<email>liujian56@huawei.com</email>
</author>
<published>2021-10-12T05:20:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd9733f5d75c94a32544d6ce5be47e14194cf137'/>
<id>urn:sha1:cd9733f5d75c94a32544d6ce5be47e14194cf137</id>
<content type='text'>
With two Msgs, msgA and msgB and a user doing nonblocking sendmsg calls (or
multiple cores) on a single socket 'sk' we could get the following flow.

 msgA, sk                               msgB, sk
 -----------                            ---------------
 tcp_bpf_sendmsg()
 lock(sk)
 psock = sk-&gt;psock
                                        tcp_bpf_sendmsg()
                                        lock(sk) ... blocking
tcp_bpf_send_verdict
if (psock-&gt;eval == NONE)
   psock-&gt;eval = sk_psock_msg_verdict
 ..
 &lt; handle SK_REDIRECT case &gt;
   release_sock(sk)                     &lt; lock dropped so grab here &gt;
   ret = tcp_bpf_sendmsg_redir
                                        psock = sk-&gt;psock
                                        tcp_bpf_send_verdict
 lock_sock(sk) ... blocking on B
                                        if (psock-&gt;eval == NONE) &lt;- boom.
                                         psock-&gt;eval will have msgA state

The problem here is we dropped the lock on msgA and grabbed it with msgB.
Now we have old state in psock and importantly psock-&gt;eval has not been
cleared. So msgB will run whatever action was done on A and the verdict
program may never see it.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Liu Jian &lt;liujian56@huawei.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20211012052019.184398-1-liujian56@huawei.com
</content>
</entry>
<entry>
<title>tcp: md5: Allow MD5SIG_FLAG_IFINDEX with ifindex=0</title>
<updated>2021-10-15T13:36:57Z</updated>
<author>
<name>Leonard Crestez</name>
<email>cdleonard@gmail.com</email>
</author>
<published>2021-10-15T07:26:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a76c2315bec7afeb9bafe776fe532106fa0a8b05'/>
<id>urn:sha1:a76c2315bec7afeb9bafe776fe532106fa0a8b05</id>
<content type='text'>
Multiple VRFs are generally meant to be "separate" but right now md5
keys for the default VRF also affect connections inside VRFs if the IP
addresses happen to overlap.

So far the combination of TCP_MD5SIG_FLAG_IFINDEX with tcpm_ifindex == 0
was an error, accept this to mean "key only applies to default VRF".
This is what applications using VRFs for traffic separation want.

Signed-off-by: Leonard Crestez &lt;cdleonard@gmail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: md5: Fix overlap between vrf and non-vrf keys</title>
<updated>2021-10-15T13:36:57Z</updated>
<author>
<name>Leonard Crestez</name>
<email>cdleonard@gmail.com</email>
</author>
<published>2021-10-15T07:26:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=86f1e3a8489f6a0232c1f3bc2bdb379f5ccdecec'/>
<id>urn:sha1:86f1e3a8489f6a0232c1f3bc2bdb379f5ccdecec</id>
<content type='text'>
With net.ipv4.tcp_l3mdev_accept=1 it is possible for a listen socket to
accept connection from the same client address in different VRFs. It is
also possible to set different MD5 keys for these clients which differ
only in the tcpm_l3index field.

This appears to work when distinguishing between different VRFs but not
between non-VRF and VRF connections. In particular:

 * tcp_md5_do_lookup_exact will match a non-vrf key against a vrf key.
This means that adding a key with l3index != 0 after a key with l3index
== 0 will cause the earlier key to be deleted. Both keys can be present
if the non-vrf key is added later.
 * _tcp_md5_do_lookup can match a non-vrf key before a vrf key. This
casues failures if the passwords differ.

Fix this by making tcp_md5_do_lookup_exact perform an actual exact
comparison on l3index and by making  __tcp_md5_do_lookup perfer
vrf-bound keys above other considerations like prefixlen.

Fixes: dea53bb80e07 ("tcp: Add l3index to tcp_md5sig_key and md5 functions")
Signed-off-by: Leonard Crestez &lt;cdleonard@gmail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe</title>
<updated>2021-10-14T14:54:47Z</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2021-10-14T09:50:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1fcd794518b7644169595c66b1bfe726d1f498ab'/>
<id>urn:sha1:1fcd794518b7644169595c66b1bfe726d1f498ab</id>
<content type='text'>
In icmp_build_probe(), the icmp_ext_echo_iio parsing should be done
step by step and skb_header_pointer() return value should always be
checked, this patch fixes 3 places in there:

  - On case ICMP_EXT_ECHO_CTYPE_NAME, it should only copy ident.name
    from skb by skb_header_pointer(), its len is ident_len. Besides,
    the return value of skb_header_pointer() should always be checked.

  - On case ICMP_EXT_ECHO_CTYPE_INDEX, move ident_len check ahead of
    skb_header_pointer(), and also do the return value check for
    skb_header_pointer().

  - On case ICMP_EXT_ECHO_CTYPE_ADDR, before accessing iio-&gt;ident.addr.
    ctype3_hdr.addrlen, skb_header_pointer() should be called first,
    then check its return value and ident_len.
    On subcases ICMP_AFI_IP and ICMP_AFI_IP6, also do check for ident.
    addr.ctype3_hdr.addrlen and skb_header_pointer()'s return value.
    On subcase ICMP_AFI_IP, the len for skb_header_pointer() should be
    "sizeof(iio-&gt;extobj_hdr) + sizeof(iio-&gt;ident.addr.ctype3_hdr) +
    sizeof(struct in_addr)" or "ident_len".

v1-&gt;v2:
  - To make it more clear, call skb_header_pointer() once only for
    iio-&gt;indent's parsing as Jakub Suggested.
v2-&gt;v3:
  - The extobj_hdr.length check against sizeof(_iio) should be done
    before calling skb_header_pointer(), as Eric noticed.

Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages")
Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/31628dd76657ea62f5cf78bb55da6b35240831f1.1634205050.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: prefer socket bound to interface when not in VRF</title>
<updated>2021-10-07T14:27:55Z</updated>
<author>
<name>Mike Manning</name>
<email>mvrmanning@gmail.com</email>
</author>
<published>2021-10-05T13:03:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8d6c414cd2fb74aa6812e9bfec6178f8246c4f3a'/>
<id>urn:sha1:8d6c414cd2fb74aa6812e9bfec6178f8246c4f3a</id>
<content type='text'>
The commit 6da5b0f027a8 ("net: ensure unbound datagram socket to be
chosen when not in a VRF") modified compute_score() so that a device
match is always made, not just in the case of an l3mdev skb, then
increments the score also for unbound sockets. This ensures that
sockets bound to an l3mdev are never selected when not in a VRF.
But as unbound and bound sockets are now scored equally, this results
in the last opened socket being selected if there are matches in the
default VRF for an unbound socket and a socket bound to a dev that is
not an l3mdev. However, handling prior to this commit was to always
select the bound socket in this case. Reinstate this handling by
incrementing the score only for bound sockets. The required isolation
due to choosing between an unbound socket and a socket bound to an
l3mdev remains in place due to the device match always being made.
The same approach is taken for compute_score() for stream sockets.

Fixes: 6da5b0f027a8 ("net: ensure unbound datagram socket to be chosen when not in a VRF")
Fixes: e78190581aff ("net: ensure unbound stream socket to be chosen when not in a VRF")
Signed-off-by: Mike Manning &lt;mmanning@vyatta.att-mail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://lore.kernel.org/r/cf0a8523-b362-1edf-ee78-eef63cbbb428@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf</title>
<updated>2021-10-02T12:55:02Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2021-10-02T12:55:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dade7f9d819d89ba2da5f72a07cb4b91a1e1f74a'/>
<id>urn:sha1:dade7f9d819d89ba2da5f72a07cb4b91a1e1f74a</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter fixes for net (v2)

The following patchset contains Netfilter fixes for net:

1) Move back the defrag users fields to the global netns_nf area.
   Kernel fails to boot if conntrack is builtin and kernel is booted
   with: nf_conntrack.enable_hooks=1. From Florian Westphal.

2) Rule event notification is missing relevant context such as
   the position handle and the NLM_F_APPEND flag.

3) Rule replacement is expanded to add + delete using the existing
   rule handle, reverse order of this operation so it makes sense
   from rule notification standpoint.

4) Propagate to userspace the NLM_F_CREATE and NLM_F_EXCL flags
   from the rule notification path.

Patches #2, #3 and #4 are used by 'nft monitor' and 'iptables-monitor'
userspace utilities which are not correctly representing the following
operations through netlink notifications:

- rule insertions
- rule addition/insertion from position handle
- create table/chain/set/map/flowtable/...
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: udp: annotate data race around udp_sk(sk)-&gt;corkflag</title>
<updated>2021-09-28T12:21:16Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2021-09-28T00:29:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a9f5970767d11eadc805d5283f202612c7ba1f59'/>
<id>urn:sha1:a9f5970767d11eadc805d5283f202612c7ba1f59</id>
<content type='text'>
up-&gt;corkflag field can be read or written without any lock.
Annotate accesses to avoid possible syzbot/KCSAN reports.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
