<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v6.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.2</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-02-08T09:07:11Z</updated>
<entry>
<title>txhash: fix sk-&gt;sk_txrehash default</title>
<updated>2023-02-08T09:07:11Z</updated>
<author>
<name>Kevin Yang</name>
<email>yyd@google.com</email>
</author>
<published>2023-02-07T02:08:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c11204c78d6966c5bda6dd05c3ac5cbb193f93e3'/>
<id>urn:sha1:c11204c78d6966c5bda6dd05c3ac5cbb193f93e3</id>
<content type='text'>
This code fix a bug that sk-&gt;sk_txrehash gets its default enable
value from sysctl_txrehash only when the socket is a TCP listener.

We should have sysctl_txrehash to set the default sk-&gt;sk_txrehash,
no matter TCP, nor listerner/connector.

Tested by following packetdrill:
  0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
  +0 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 4
  // SO_TXREHASH == 74, default to sysctl_txrehash == 1
  +0 getsockopt(3, SOL_SOCKET, 74, [1], [4]) = 0
  +0 getsockopt(4, SOL_SOCKET, 74, [1], [4]) = 0

Fixes: 26859240e4ee ("txhash: Add socket option to control TX hash rethink behavior")
Signed-off-by: Kevin Yang &lt;yyd@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf</title>
<updated>2023-01-28T07:32:03Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2023-01-28T07:32:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0548c5f26a0fefa721b29be2bfff1c44f15b6d73'/>
<id>urn:sha1:0548c5f26a0fefa721b29be2bfff1c44f15b6d73</id>
<content type='text'>
Daniel Borkmann says:

====================
bpf 2023-01-27

We've added 10 non-merge commits during the last 9 day(s) which contain
a total of 10 files changed, 170 insertions(+), 59 deletions(-).

The main changes are:

1) Fix preservation of register's parent/live fields when copying
   range-info, from Eduard Zingerman.

2) Fix an off-by-one bug in bpf_mem_cache_idx() to select the right
   cache, from Hou Tao.

3) Fix stack overflow from infinite recursion in sock_map_close(),
   from Jakub Sitnicki.

4) Fix missing btf_put() in register_btf_id_dtor_kfuncs()'s error path,
   from Jiri Olsa.

5) Fix a splat from bpf_setsockopt() via lsm_cgroup/socket_sock_rcv_skb,
   from Kui-Feng Lee.

6) Fix bpf_send_signal[_thread]() helpers to hold a reference on the task,
   from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix the kernel crash caused by bpf_setsockopt().
  selftests/bpf: Cover listener cloning with progs attached to sockmap
  selftests/bpf: Pass BPF skeleton to sockmap_listen ops tests
  bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener
  bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself
  bpf: Add missing btf_put to register_btf_id_dtor_kfuncs
  selftests/bpf: Verify copy_register_state() preserves parent/live fields
  bpf: Fix to preserve reg parent/live fields when copying range info
  bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
  bpf: Fix off-by-one error in bpf_mem_cache_idx()
====================

Link: https://lore.kernel.org/r/20230127215820.4993-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener</title>
<updated>2023-01-25T05:32:55Z</updated>
<author>
<name>Jakub Sitnicki</name>
<email>jakub@cloudflare.com</email>
</author>
<published>2023-01-21T12:41:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ddce1e091757d0259107c6c0c7262df201de2b66'/>
<id>urn:sha1:ddce1e091757d0259107c6c0c7262df201de2b66</id>
<content type='text'>
A listening socket linked to a sockmap has its sk_prot overridden. It
points to one of the struct proto variants in tcp_bpf_prots. The variant
depends on the socket's family and which sockmap programs are attached.

A child socket cloned from a TCP listener initially inherits their sk_prot.
But before cloning is finished, we restore the child's proto to the
listener's original non-tcp_bpf_prots one. This happens in
tcp_create_openreq_child -&gt; tcp_bpf_clone.

Today, in tcp_bpf_clone we detect if the child's proto should be restored
by checking only for the TCP_BPF_BASE proto variant. This is not
correct. The sk_prot of listening socket linked to a sockmap can point to
to any variant in tcp_bpf_prots.

If the listeners sk_prot happens to be not the TCP_BPF_BASE variant, then
the child socket unintentionally is left if the inherited sk_prot by
tcp_bpf_clone.

This leads to issues like infinite recursion on close [1], because the
child state is otherwise not set up for use with tcp_bpf_prot operations.

Adjust the check in tcp_bpf_clone to detect all of tcp_bpf_prots variants.

Note that it wouldn't be sufficient to check the socket state when
overriding the sk_prot in tcp_bpf_update_proto in order to always use the
TCP_BPF_BASE variant for listening sockets. Since commit
b8b8315e39ff ("bpf, sockmap: Remove unhash handler for BPF sockmap usage")
it is possible for a socket to transition to TCP_LISTEN state while already
linked to a sockmap, e.g. connect() -&gt; insert into map -&gt;
connect(AF_UNSPEC) -&gt; listen().

[1]: https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/

Fixes: e80251555f0b ("tcp_bpf: Don't let child socket inherit parent protocol ops on copy")
Reported-by: syzbot+04c21ed96d861dccc5cd@syzkaller.appspotmail.com
Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-2-1e0ee7ac2f90@cloudflare.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: prevent potential spectre v1 gadget in fib_metrics_match()</title>
<updated>2023-01-24T05:37:39Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-01-20T13:31:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5e9398a26a92fc402d82ce1f97cc67d832527da0'/>
<id>urn:sha1:5e9398a26a92fc402d82ce1f97cc67d832527da0</id>
<content type='text'>
if (!type)
        continue;
    if (type &gt; RTAX_MAX)
        return false;
    ...
    fi_val = fi-&gt;fib_metrics-&gt;metrics[type - 1];

@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.

Fixes: 5f9ae3d9e7e4 ("ipv4: do metrics match when looking up and deleting a route")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20230120133140.3624204-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: prevent potential spectre v1 gadget in ip_metrics_convert()</title>
<updated>2023-01-24T05:37:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-01-20T13:30:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1d1d63b612801b3f0a39b7d4467cad0abd60e5c8'/>
<id>urn:sha1:1d1d63b612801b3f0a39b7d4467cad0abd60e5c8</id>
<content type='text'>
if (!type)
		continue;
	if (type &gt; RTAX_MAX)
		return -EINVAL;
	...
	metrics[type - 1] = val;

@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.

Fixes: 6cf9dfd3bd62 ("net: fib: move metrics parsing to a helper")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20230120133040.3623463-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix rate_app_limited to default to 1</title>
<updated>2023-01-20T13:23:35Z</updated>
<author>
<name>David Morley</name>
<email>morleyd@google.com</email>
</author>
<published>2023-01-19T19:00:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=300b655db1b5152d6101bcb6801d50899b20c2d6'/>
<id>urn:sha1:300b655db1b5152d6101bcb6801d50899b20c2d6</id>
<content type='text'>
The initial default value of 0 for tp-&gt;rate_app_limited was incorrect,
since a flow is indeed application-limited until it first sends
data. Fixing the default to be 1 is generally correct but also
specifically will help user-space applications avoid using the initial
tcpi_delivery_rate value of 0 that persists until the connection has
some non-zero bandwidth sample.

Fixes: eb8329e0a04d ("tcp: export data delivery rate")
Suggested-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David Morley &lt;morleyd@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Tested-by: David Morley &lt;morleyd@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/ulp: use consistent error code when blocking ULP</title>
<updated>2023-01-19T17:26:16Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2023-01-18T12:24:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8ccc99362b60c6f27bb46f36fdaaccf4ef0303de'/>
<id>urn:sha1:8ccc99362b60c6f27bb46f36fdaaccf4ef0303de</id>
<content type='text'>
The referenced commit changed the error code returned by the kernel
when preventing a non-established socket from attaching the ktls
ULP. Before to such a commit, the user-space got ENOTCONN instead
of EINVAL.

The existing self-tests depend on such error code, and the change
caused a failure:

  RUN           global.non_established ...
 tls.c:1673:non_established:Expected errno (22) == ENOTCONN (107)
 non_established: Test failed at step #3
          FAIL  global.non_established

In the unlikely event existing applications do the same, address
the issue by restoring the prior error code in the above scenario.

Note that the only other ULP performing similar checks at init
time - smc_ulp_ops - also fails with ENOTCONN when trying to attach
the ULP to a non-established socket.

Reported-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Fixes: 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Link: https://lore.kernel.org/r/7bb199e7a93317fb6f8bf8b9b2dc71c18f337cde.1674042685.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: avoid the lookup process failing to get sk in ehash table</title>
<updated>2023-01-19T12:06:45Z</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2023-01-18T01:59:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3f4ca5fafc08881d7a57daa20449d171f2887043'/>
<id>urn:sha1:3f4ca5fafc08881d7a57daa20449d171f2887043</id>
<content type='text'>
While one cpu is working on looking up the right socket from ehash
table, another cpu is done deleting the request socket and is about
to add (or is adding) the big socket from the table. It means that
we could miss both of them, even though it has little chance.

Let me draw a call trace map of the server side.
   CPU 0                           CPU 1
   -----                           -----
tcp_v4_rcv()                  syn_recv_sock()
                            inet_ehash_insert()
                            -&gt; sk_nulls_del_node_init_rcu(osk)
__inet_lookup_established()
                            -&gt; __sk_nulls_add_node_rcu(sk, list)

Notice that the CPU 0 is receiving the data after the final ack
during 3-way shakehands and CPU 1 is still handling the final ack.

Why could this be a real problem?
This case is happening only when the final ack and the first data
receiving by different CPUs. Then the server receiving data with
ACK flag tries to search one proper established socket from ehash
table, but apparently it fails as my map shows above. After that,
the server fetches a listener socket and then sends a RST because
it finds a ACK flag in the skb (data), which obeys RST definition
in RFC 793.

Besides, Eric pointed out there's one more race condition where it
handles tw socket hashdance. Only by adding to the tail of the list
before deleting the old one can we avoid the race if the reader has
already begun the bucket traversal and it would possibly miss the head.

Many thanks to Eric for great help from beginning to end.

Fixes: 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions")
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/
Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2023-01-05T20:40:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-01-05T20:40:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=50011c32f421215f6231996fcc84fd1fe81c4a48'/>
<id>urn:sha1:50011c32f421215f6231996fcc84fd1fe81c4a48</id>
<content type='text'>
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, wifi, and netfilter.

  Current release - regressions:

   - bpf: fix nullness propagation for reg to reg comparisons, avoid
     null-deref

   - inet: control sockets should not use current thread task_frag

   - bpf: always use maximal size for copy_array()

   - eth: bnxt_en: don't link netdev to a devlink port for VFs

  Current release - new code bugs:

   - rxrpc: fix a couple of potential use-after-frees

   - netfilter: conntrack: fix IPv6 exthdr error check

   - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes

   - eth: dsa: qca8k: various fixes for the in-band register access

   - eth: nfp: fix schedule in atomic context when sync mc address

   - eth: renesas: rswitch: fix getting mac address from device tree

   - mobile: ipa: use proper endpoint mask for suspend

  Previous releases - regressions:

   - tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
     Jiri / python tests

   - net: tc: don't intepret cls results when asked to drop, fix
     oob-access

   - vrf: determine the dst using the original ifindex for multicast

   - eth: bnxt_en:
      - fix XDP RX path if BPF adjusted packet length
      - fix HDS (header placement) and jumbo thresholds for RX packets

   - eth: ice: xsk: do not use xdp_return_frame() on tx_buf-&gt;raw_buf,
     avoid memory corruptions

  Previous releases - always broken:

   - ulp: prevent ULP without clone op from entering the LISTEN status

   - veth: fix race with AF_XDP exposing old or uninitialized
     descriptors

   - bpf:
      - pull before calling skb_postpull_rcsum() (fix checksum support
        and avoid a WARN())
      - fix panic due to wrong pageattr of im-&gt;image (when livepatch and
        kretfunc coexist)
      - keep a reference to the mm, in case the task is dead

   - mptcp: fix deadlock in fastopen error path

   - netfilter:
      - nf_tables: perform type checking for existing sets
      - nf_tables: honor set timeout and garbage collection updates
      - ipset: fix hash:net,port,net hang with /0 subnet
      - ipset: avoid hung task warning when adding/deleting entries

   - selftests: net:
      - fix cmsg_so_mark.sh test hang on non-x86 systems
      - fix the arp_ndisc_evict_nocarrier test for IPv6

   - usb: rndis_host: secure rndis_query check against int overflow

   - eth: r8169: fix dmar pte write access during suspend/resume with
     WOL

   - eth: lan966x: fix configuration of the PCS

   - eth: sparx5: fix reading of the MAC address

   - eth: qed: allow sleep in qed_mcp_trace_dump()

   - eth: hns3:
      - fix interrupts re-initialization after VF FLR
      - fix handling of promisc when MAC addr table gets full
      - refine the handling for VF heartbeat

   - eth: mlx5:
      - properly handle ingress QinQ-tagged packets on VST
      - fix io_eq_size and event_eq_size params validation on big endian
      - fix RoCE setting at HCA level if not supported at all
      - don't turn CQE compression on by default for IPoIB

   - eth: ena:
      - fix toeplitz initial hash key value
      - account for the number of XDP-processed bytes in interface stats
      - fix rx_copybreak value update

  Misc:

   - ethtool: harden phy stat handling against buggy drivers

   - docs: netdev: convert maintainer's doc from FAQ to a normal
     document"

* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
  caif: fix memory leak in cfctrl_linkup_request()
  inet: control sockets should not use current thread task_frag
  net/ulp: prevent ULP without clone op from entering the LISTEN status
  qed: allow sleep in qed_mcp_trace_dump()
  MAINTAINERS: Update maintainers for ptp_vmw driver
  usb: rndis_host: Secure rndis_query check against int overflow
  net: dpaa: Fix dtsec check for PCS availability
  octeontx2-pf: Fix lmtst ID used in aura free
  drivers/net/bonding/bond_3ad: return when there's no aggregator
  netfilter: ipset: Rework long task execution when adding/deleting entries
  netfilter: ipset: fix hash:net,port,net hang with /0 subnet
  net: sparx5: Fix reading of the MAC address
  vxlan: Fix memory leaks in error path
  net: sched: htb: fix htb_classify() kernel-doc
  net: sched: cbq: dont intepret cls results when asked to drop
  net: sched: atm: dont intepret cls results when asked to drop
  dt-bindings: net: marvell,orion-mdio: Fix examples
  dt-bindings: net: sun8i-emac: Add phy-supply property
  net: ipa: use proper endpoint mask for suspend
  selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
  ...
</content>
</entry>
<entry>
<title>inet: control sockets should not use current thread task_frag</title>
<updated>2023-01-05T04:38:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-01-03T19:27:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1ac88557447088ccd15eb2f2520ce46d463c8e0b'/>
<id>urn:sha1:1ac88557447088ccd15eb2f2520ce46d463c8e0b</id>
<content type='text'>
Because ICMP handlers run from softirq contexts,
they must not use current thread task_frag.

Previously, all sockets allocated by inet_ctl_sock_create()
would use the per-socket page fragment, with no chance of
recursion.

Fixes: 98123866fcf3 ("Treewide: Stop corrupting socket's task_frag")
Reported-by: syzbot+bebc6f1acdf4cbb79b03@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Acked-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Link: https://lore.kernel.org/r/20230103192736.454149-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
