<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv6, branch v5.10</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.10</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.10'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-12-10T02:55:46Z</updated>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf</title>
<updated>2020-12-10T02:55:46Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2020-12-10T02:55:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b7e4ba9a91dffd298d940b4d3f173121ff829a32'/>
<id>urn:sha1:b7e4ba9a91dffd298d940b4d3f173121ff829a32</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Switch to RCU in x_tables to fix possible NULL pointer dereference,
   from Subash Abhinov Kasiviswanathan.

2) Fix netlink dump of dynset timeouts later than 23 days.

3) Add comment for the indirect serialization of the nft commit mutex
   with rtnl_mutex.

4) Remove bogus check for confirmed conntrack when matching on the
   conntrack ID, from Brett Mastbergen.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Retain ECT bits for tos reflection</title>
<updated>2020-12-10T00:08:23Z</updated>
<author>
<name>Wei Wang</name>
<email>weiwan@google.com</email>
</author>
<published>2020-12-08T17:55:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8ef44b6fe49d2b8d03ba9aa69063612b474f963b'/>
<id>urn:sha1:8ef44b6fe49d2b8d03ba9aa69063612b474f963b</id>
<content type='text'>
For DCTCP, we have to retain the ECT bits set by the congestion control
algorithm on the socket when reflecting syn TOS in syn-ack, in order to
make ECN work properly.

Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Reported-by: Alexander Duyck &lt;alexanderduyck@fb.com&gt;
Signed-off-by: Wei Wang &lt;weiwan@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netfilter: x_tables: Switch synchronization to RCU</title>
<updated>2020-12-08T11:57:39Z</updated>
<author>
<name>Subash Abhinov Kasiviswanathan</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2020-11-25T18:27:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cc00bcaa589914096edef7fb87ca5cee4a166b5c'/>
<id>urn:sha1:cc00bcaa589914096edef7fb87ca5cee4a166b5c</id>
<content type='text'>
When running concurrent iptables rules replacement with data, the per CPU
sequence count is checked after the assignment of the new information.
The sequence count is used to synchronize with the packet path without the
use of any explicit locking. If there are any packets in the packet path using
the table information, the sequence count is incremented to an odd value and
is incremented to an even after the packet process completion.

The new table value assignment is followed by a write memory barrier so every
CPU should see the latest value. If the packet path has started with the old
table information, the sequence counter will be odd and the iptables
replacement will wait till the sequence count is even prior to freeing the
old table info.

However, this assumes that the new table information assignment and the memory
barrier is actually executed prior to the counter check in the replacement
thread. If CPU decides to execute the assignment later as there is no user of
the table information prior to the sequence check, the packet path in another
CPU may use the old table information. The replacement thread would then free
the table information under it leading to a use after free in the packet
processing context-

Unable to handle kernel NULL pointer dereference at virtual
address 000000000000008e
pc : ip6t_do_table+0x5d0/0x89c
lr : ip6t_do_table+0x5b8/0x89c
ip6t_do_table+0x5d0/0x89c
ip6table_filter_hook+0x24/0x30
nf_hook_slow+0x84/0x120
ip6_input+0x74/0xe0
ip6_rcv_finish+0x7c/0x128
ipv6_rcv+0xac/0xe4
__netif_receive_skb+0x84/0x17c
process_backlog+0x15c/0x1b8
napi_poll+0x88/0x284
net_rx_action+0xbc/0x23c
__do_softirq+0x20c/0x48c

This could be fixed by forcing instruction order after the new table
information assignment or by switching to RCU for the synchronization.

Fixes: 80055dab5de0 ("netfilter: x_tables: make xt_replace_table wait until old rules are not used anymore")
Reported-by: Sean Tranchetti &lt;stranche@codeaurora.org&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Suggested-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>net: ip6_gre: set dev-&gt;hard_header_len when using header_ops</title>
<updated>2020-12-02T19:16:12Z</updated>
<author>
<name>Antoine Tenart</name>
<email>atenart@kernel.org</email>
</author>
<published>2020-11-30T16:19:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=832ba596494b2c9eac7760259eff2d8b7dcad0ee'/>
<id>urn:sha1:832ba596494b2c9eac7760259eff2d8b7dcad0ee</id>
<content type='text'>
syzkaller managed to crash the kernel using an NBMA ip6gre interface. I
could reproduce it creating an NBMA ip6gre interface and forwarding
traffic to it:

  skbuff: skb_under_panic: text:ffffffff8250e927 len:148 put:44 head:ffff8c03c7a33
  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:109!
  Call Trace:
  skb_push+0x10/0x10
  ip6gre_header+0x47/0x1b0
  neigh_connected_output+0xae/0xf0

ip6gre tunnel provides its own header_ops-&gt;create, and sets it
conditionally when initializing the tunnel in NBMA mode. When
header_ops-&gt;create is used, dev-&gt;hard_header_len should reflect the
length of the header created. Otherwise, when not used,
dev-&gt;needed_headroom should be used.

Fixes: eb95f52fc72d ("net: ipv6_gre: Fix GRO to work on IPv6 over GRE tap")
Cc: Maria Pasechnik &lt;mariap@mellanox.com&gt;
Signed-off-by: Antoine Tenart &lt;atenart@kernel.org&gt;
Link: https://lore.kernel.org/r/20201130161911.464106-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init</title>
<updated>2020-11-25T19:20:16Z</updated>
<author>
<name>Wang Hai</name>
<email>wanghai38@huawei.com</email>
</author>
<published>2020-11-24T07:17:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e255e11e66da8281e337e4e352956e8a4999fca4'/>
<id>urn:sha1:e255e11e66da8281e337e4e352956e8a4999fca4</id>
<content type='text'>
kmemleak report a memory leak as follows:

unreferenced object 0xffff8880059c6a00 (size 64):
  comm "ip", pid 23696, jiffies 4296590183 (age 1755.384s)
  hex dump (first 32 bytes):
    20 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00   ...............
    1c 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00  ................
  backtrace:
    [&lt;00000000aa4e7a87&gt;] ip6addrlbl_add+0x90/0xbb0
    [&lt;0000000070b8d7f1&gt;] ip6addrlbl_net_init+0x109/0x170
    [&lt;000000006a9ca9d4&gt;] ops_init+0xa8/0x3c0
    [&lt;000000002da57bf2&gt;] setup_net+0x2de/0x7e0
    [&lt;000000004e52d573&gt;] copy_net_ns+0x27d/0x530
    [&lt;00000000b07ae2b4&gt;] create_new_namespaces+0x382/0xa30
    [&lt;000000003b76d36f&gt;] unshare_nsproxy_namespaces+0xa1/0x1d0
    [&lt;0000000030653721&gt;] ksys_unshare+0x3a4/0x780
    [&lt;0000000007e82e40&gt;] __x64_sys_unshare+0x2d/0x40
    [&lt;0000000031a10c08&gt;] do_syscall_64+0x33/0x40
    [&lt;0000000099df30e7&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9

We should free all rules when we catch an error in ip6addrlbl_net_init().
otherwise a memory leak will occur.

Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Wang Hai &lt;wanghai38@huawei.com&gt;
Link: https://lore.kernel.org/r/20201124071728.8385-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: Set ECT0 bit in tos/tclass for synack when BPF needs ECN</title>
<updated>2020-11-24T22:12:55Z</updated>
<author>
<name>Alexander Duyck</name>
<email>alexanderduyck@fb.com</email>
</author>
<published>2020-11-21T03:47:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=407c85c7ddd6b84d3cbdd2275616f70c27c17913'/>
<id>urn:sha1:407c85c7ddd6b84d3cbdd2275616f70c27c17913</id>
<content type='text'>
When a BPF program is used to select between a type of TCP congestion
control algorithm that uses either ECN or not there is a case where the
synack for the frame was coming up without the ECT0 bit set. A bit of
research found that this was due to the final socket being configured to
dctcp while the listener socket was staying in cubic.

To reproduce it all that is needed is to monitor TCP traffic while running
the sample bpf program "samples/bpf/tcp_cong_kern.c". What is observed,
assuming tcp_dctcp module is loaded or compiled in and the traffic matches
the rules in the sample file, is that for all frames with the exception of
the synack the ECT0 bit is set.

To address that it is necessary to make one additional call to
tcp_bpf_ca_needs_ecn using the request socket and then use the output of
that to set the ECT0 bit for the tos/tclass of the packet.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Alexander Duyck &lt;alexanderduyck@fb.com&gt;
Link: https://lore.kernel.org/r/160593039663.2604.1374502006916871573.stgit@localhost.localdomain
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix race condition when creating child sockets from syncookies</title>
<updated>2020-11-24T00:32:33Z</updated>
<author>
<name>Ricardo Dias</name>
<email>rdias@singlestore.com</email>
</author>
<published>2020-11-20T11:11:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=01770a166165738a6e05c3d911fb4609cc4eb416'/>
<id>urn:sha1:01770a166165738a6e05c3d911fb4609cc4eb416</id>
<content type='text'>
When the TCP stack is in SYN flood mode, the server child socket is
created from the SYN cookie received in a TCP packet with the ACK flag
set.

The child socket is created when the server receives the first TCP
packet with a valid SYN cookie from the client. Usually, this packet
corresponds to the final step of the TCP 3-way handshake, the ACK
packet. But is also possible to receive a valid SYN cookie from the
first TCP data packet sent by the client, and thus create a child socket
from that SYN cookie.

Since a client socket is ready to send data as soon as it receives the
SYN+ACK packet from the server, the client can send the ACK packet (sent
by the TCP stack code), and the first data packet (sent by the userspace
program) almost at the same time, and thus the server will equally
receive the two TCP packets with valid SYN cookies almost at the same
instant.

When such event happens, the TCP stack code has a race condition that
occurs between the momement a lookup is done to the established
connections hashtable to check for the existence of a connection for the
same client, and the moment that the child socket is added to the
established connections hashtable. As a consequence, this race condition
can lead to a situation where we add two child sockets to the
established connections hashtable and deliver two sockets to the
userspace program to the same client.

This patch fixes the race condition by checking if an existing child
socket exists for the same client when we are adding the second child
socket to the established connections socket. If an existing child
socket exists, we drop the packet and discard the second child socket
to the same client.

Signed-off-by: Ricardo Dias &lt;rdias@singlestore.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20201120111133.GA67501@rdias-suse-pc.lan
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header</title>
<updated>2020-11-21T02:09:47Z</updated>
<author>
<name>Alexander Duyck</name>
<email>alexanderduyck@fb.com</email>
</author>
<published>2020-11-19T21:23:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=861602b57730a5c6d3e0b1e4ca7133ca9a8b8538'/>
<id>urn:sha1:861602b57730a5c6d3e0b1e4ca7133ca9a8b8538</id>
<content type='text'>
An issue was recently found where DCTCP SYN/ACK packets did not have the
ECT bit set in the L3 header. A bit of code review found that the recent
change referenced below had gone though and added a mask that prevented the
ECN bits from being populated in the L3 header.

This patch addresses that by rolling back the mask so that it is only
applied to the flags coming from the incoming TCP request instead of
applying it to the socket tos/tclass field. Doing this the ECT bits were
restored in the SYN/ACK packets in my testing.

One thing that is not addressed by this patch set is the fact that
tcp_reflect_tos appears to be incompatible with ECN based congestion
avoidance algorithms. At a minimum the feature should likely be documented
which it currently isn't.

Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Alexander Duyck &lt;alexanderduyck@fb.com&gt;
Acked-by: Wei Wang &lt;weiwan@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv6: Remove dependency of ipv6_frag_thdr_truncated on ipv6 module</title>
<updated>2020-11-19T18:49:50Z</updated>
<author>
<name>Georg Kohmann</name>
<email>geokohma@cisco.com</email>
</author>
<published>2020-11-19T09:58:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2d8f6481c17db9fa5238b277cdbc392084060b09'/>
<id>urn:sha1:2d8f6481c17db9fa5238b277cdbc392084060b09</id>
<content type='text'>
IPV6=m
NF_DEFRAG_IPV6=y

ld: net/ipv6/netfilter/nf_conntrack_reasm.o: in function
`nf_ct_frag6_gather':
net/ipv6/netfilter/nf_conntrack_reasm.c:462: undefined reference to
`ipv6_frag_thdr_truncated'

Netfilter is depending on ipv6 symbol ipv6_frag_thdr_truncated. This
dependency is forcing IPV6=y.

Remove this dependency by moving ipv6_frag_thdr_truncated out of ipv6. This
is the same solution as used with a similar issues: Referring to
commit 70b095c843266 ("ipv6: remove dependency of nf_defrag_ipv6 on ipv6
module")

Fixes: 9d9e937b1c8b ("ipv6/netfilter: Discard first fragment not including all headers")
Reported-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Georg Kohmann &lt;geokohma@cisco.com&gt;
Acked-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Acked-by: Randy Dunlap &lt;rdunlap@infradead.org&gt; # build-tested
Link: https://lore.kernel.org/r/20201119095833.8409-1-geokohma@cisco.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ah6: fix error return code in ah6_input()</title>
<updated>2020-11-18T18:53:16Z</updated>
<author>
<name>Zhang Changzhong</name>
<email>zhangchangzhong@huawei.com</email>
</author>
<published>2020-11-17T02:45:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a5ebcbdf34b65fcc07f38eaf2d60563b42619a59'/>
<id>urn:sha1:a5ebcbdf34b65fcc07f38eaf2d60563b42619a59</id>
<content type='text'>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Zhang Changzhong &lt;zhangchangzhong@huawei.com&gt;
Link: https://lore.kernel.org/r/1605581105-35295-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
