<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/ip.h, branch v6.16</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.16</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.16'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-04-03T22:32:08Z</updated>
<entry>
<title>net: use netif_disable_lro in ipv6_add_dev</title>
<updated>2025-04-03T22:32:08Z</updated>
<author>
<name>Stanislav Fomichev</name>
<email>sdf@fomichev.me</email>
</author>
<published>2025-04-01T16:34:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8965c160b8f7333df895321c8aa6bad4a7175f2b'/>
<id>urn:sha1:8965c160b8f7333df895321c8aa6bad4a7175f2b</id>
<content type='text'>
ipv6_add_dev might call dev_disable_lro which unconditionally grabs
instance lock, so it will deadlock during NETDEV_REGISTER. Switch
to netif_disable_lro.

Make sure all callers hold the instance lock as well.

Cc: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations")
Signed-off-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Link: https://patch.msgid.link/20250401163452.622454-4-sdf@fomichev.me
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>inet: call inet6_ehashfn() once from inet6_hash_connect()</title>
<updated>2025-03-06T23:26:02Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-03-05T03:45:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d4438ce68bf145aa1d7ed03ebf3b8ece337e3f64'/>
<id>urn:sha1:d4438ce68bf145aa1d7ed03ebf3b8ece337e3f64</id>
<content type='text'>
inet6_ehashfn() being called from __inet6_check_established()
has a big impact on performance, as shown in the Tested section.

After prior patch, we can compute the hash for port 0
from inet6_hash_connect(), and derive each hash in
__inet_hash_connect() from this initial hash:

hash(saddr, lport, daddr, dport) == hash(saddr, 0, daddr, dport) + lport

Apply the same principle for __inet_check_established(),
although inet_ehashfn() has a smaller cost.

Tested:

Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog
Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server

Before this patch:

  utime_start=0.286131
  utime_end=4.378886
  stime_start=11.952556
  stime_end=1991.655533
  num_transactions=1446830
  latency_min=0.001061085
  latency_max=12.075275028
  latency_mean=0.376375302
  latency_stddev=1.361969596
  num_samples=306383
  throughput=151866.56

perf top:

 50.01%  [kernel]       [k] __inet6_check_established
 20.65%  [kernel]       [k] __inet_hash_connect
 15.81%  [kernel]       [k] inet6_ehashfn
  2.92%  [kernel]       [k] rcu_all_qs
  2.34%  [kernel]       [k] __cond_resched
  0.50%  [kernel]       [k] _raw_spin_lock
  0.34%  [kernel]       [k] sched_balance_trigger
  0.24%  [kernel]       [k] queued_spin_lock_slowpath

After this patch:

  utime_start=0.315047
  utime_end=9.257617
  stime_start=7.041489
  stime_end=1923.688387
  num_transactions=3057968
  latency_min=0.003041375
  latency_max=7.056589232
  latency_mean=0.141075048    # Better latency metrics
  latency_stddev=0.526900516
  num_samples=312996
  throughput=320677.21        # 111 % increase, and 229 % for the series

perf top: inet6_ehashfn is no longer seen.

 39.67%  [kernel]       [k] __inet_hash_connect
 37.06%  [kernel]       [k] __inet6_check_established
  4.79%  [kernel]       [k] rcu_all_qs
  3.82%  [kernel]       [k] __cond_resched
  1.76%  [kernel]       [k] sched_balance_domains
  0.82%  [kernel]       [k] _raw_spin_lock
  0.81%  [kernel]       [k] sched_balance_rq
  0.81%  [kernel]       [k] sched_balance_trigger
  0.76%  [kernel]       [k] queued_spin_lock_slowpath

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Tested-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Link: https://patch.msgid.link/20250305034550.879255-3-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: remove get_rttos</title>
<updated>2025-02-19T02:27:19Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2025-02-14T22:27:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9329b58395e51bba9c847419cc4ba176df3dd2b7'/>
<id>urn:sha1:9329b58395e51bba9c847419cc4ba176df3dd2b7</id>
<content type='text'>
Initialize the ip cookie tos field when initializing the cookie, in
ipcm_init_sk.

The existing code inverts the standard pattern for initializing cookie
fields. Default is to initialize the field from the sk, then possibly
overwrite that when parsing cmsgs (the unlikely case).

This field inverts that, setting the field to an illegal value and
after cmsg parsing checking whether the value is still illegal and
thus should be overridden.

Be careful to always apply mask INET_DSCP_MASK, as before.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20250214222720.3205500-5-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: initialize inet socket cookies with sockcm_init</title>
<updated>2025-02-19T02:27:19Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2025-02-14T22:27:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=94788792f37902f1f4d417f6f9663831cf7e91fc'/>
<id>urn:sha1:94788792f37902f1f4d417f6f9663831cf7e91fc</id>
<content type='text'>
Avoid open coding the same logic.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20250214222720.3205500-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL()</title>
<updated>2025-02-14T21:09:38Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-02-12T13:24:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=54568a84c95bdea20227cf48d41f198d083e78dd'/>
<id>urn:sha1:54568a84c95bdea20227cf48d41f198d083e78dd</id>
<content type='text'>
We have many EXPORT_SYMBOL(x) in networking tree because IPv6
can be built as a module.

CONFIG_IPV6=y is becoming the norm.

Define a EXPORT_IPV6_MOD(x) which only exports x
for modular IPv6.

Same principle applies to EXPORT_IPV6_MOD_GPL()

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Mateusz Polchlopek &lt;mateusz.polchlopek@intel.com&gt;
Link: https://patch.msgid.link/20250212132418.1524422-2-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: use RCU protection in ip_dst_mtu_maybe_forward()</title>
<updated>2025-02-07T00:14:14Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-02-05T15:51:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=071d8012869b6af352acca346ade13e7be90a49f'/>
<id>urn:sha1:071d8012869b6af352acca346ade13e7be90a49f</id>
<content type='text'>
ip_dst_mtu_maybe_forward() must use RCU protection to make
sure the net structure it reads does not disappear.

Fixes: f87c10a8aa1e8 ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://patch.msgid.link/20250205155120.1676781-4-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>sock: support SO_PRIORITY cmsg</title>
<updated>2024-12-17T02:13:44Z</updated>
<author>
<name>Anna Emese Nyiri</name>
<email>annaemesenyiri@gmail.com</email>
</author>
<published>2024-12-13T08:44:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a32f3e9d1ed146f81162702605d65447a319eb76'/>
<id>urn:sha1:a32f3e9d1ed146f81162702605d65447a319eb76</id>
<content type='text'>
The Linux socket API currently allows setting SO_PRIORITY at the
socket level, applying a uniform priority to all packets sent through
that socket. The exception to this is IP_TOS, when the priority value
is calculated during the handling of
ancillary data, as implemented in commit f02db315b8d8 ("ipv4: IP_TOS
and IP_TTL can be specified as ancillary data").
However, this is a computed
value, and there is currently no mechanism to set a custom priority
via control messages prior to this patch.

According to this patch, if SO_PRIORITY is specified as ancillary data,
the packet is sent with the priority value set through
sockc-&gt;priority, overriding the socket-level values
set via the traditional setsockopt() method. This is analogous to
the existing support for SO_MARK, as implemented in
commit c6af0c227a22 ("ip: support SO_MARK cmsg").

If both cmsg SO_PRIORITY and IP_TOS are passed, then the one that
takes precedence is the last one in the cmsg list.

This patch has the side effect that raw_send_hdrinc now interprets cmsg
IP_TOS.

Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Suggested-by: Ferenc Fejes &lt;fejes@inf.elte.hu&gt;
Signed-off-by: Anna Emese Nyiri &lt;annaemesenyiri@gmail.com&gt;
Link: https://patch.msgid.link/20241213084457.45120-3-annaemesenyiri@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: Switch inet_addr_hash() to less predictable hash.</title>
<updated>2024-10-23T11:17:35Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2024-10-18T01:41:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c972c1c41d9b20fb38b54e77dcee763e27e715a9'/>
<id>urn:sha1:c972c1c41d9b20fb38b54e77dcee763e27e715a9</id>
<content type='text'>
Recently, commit 4a0ec2aa0704 ("ipv6: switch inet6_addr_hash()
to less predictable hash") and commit 4daf4dc275f1 ("ipv6: switch
inet6_acaddr_hash() to less predictable hash") hardened IPv6
address hash functions.

inet_addr_hash() is also highly predictable, and a malicious use
could abuse a specific bucket.

Let's follow the change on IPv4 by using jhash_1word().

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20241018014100.93776-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>ipv4: tcp: give socket pointer to control skbs</title>
<updated>2024-10-15T00:39:37Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-10-10T17:48:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=79636038d37e7bd4d078238f2a3f002cab4423bc'/>
<id>urn:sha1:79636038d37e7bd4d078238f2a3f002cab4423bc</id>
<content type='text'>
ip_send_unicast_reply() send orphaned 'control packets'.

These are RST packets and also ACK packets sent from TIME_WAIT.

Some eBPF programs would prefer to have a meaningful skb-&gt;sk
pointer as much as possible.

This means that TCP can now attach TIME_WAIT sockets to outgoing
skbs.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Brian Vazquez &lt;brianvv@google.com&gt;
Link: https://patch.msgid.link/20241010174817.1543642-6-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: Convert ip_route_input() to dscp_t.</title>
<updated>2024-10-03T23:21:21Z</updated>
<author>
<name>Guillaume Nault</name>
<email>gnault@redhat.com</email>
</author>
<published>2024-10-01T19:28:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7e863e5db6185b1add0df4cb01b31a4ed1c4b738'/>
<id>urn:sha1:7e863e5db6185b1add0df4cb01b31a4ed1c4b738</id>
<content type='text'>
Pass a dscp_t variable to ip_route_input(), instead of a plain u8, to
prevent accidental setting of ECN bits in -&gt;flowi4_tos.

Callers of ip_route_input() to consider are:

  * input_action_end_dx4_finish() and input_action_end_dt4() in
    net/ipv6/seg6_local.c. These functions set the tos parameter to 0,
    which is already a valid dscp_t value, so they don't need to be
    adjusted for the new prototype.

  * icmp_route_lookup(), which already has a dscp_t variable to pass as
    parameter. We just need to remove the inet_dscp_to_dsfield()
    conversion.

  * br_nf_pre_routing_finish(), ip_options_rcv_srr() and ip4ip6_err(),
    which get the DSCP directly from IPv4 headers. Define a helper to
    read the .tos field of struct iphdr as dscp_t, so that these
    function don't have to do the conversion manually.

While there, declare *iph as const in br_nf_pre_routing_finish(),
declare its local variables in reverse-christmas-tree order and move
the "err = ip_route_input()" assignment out of the conditional to avoid
checkpatch warning.

Signed-off-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/e9d40781d64d3d69f4c79ac8a008b8d67a033e8d.1727807926.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
