<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/netns, branch v5.19</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.19</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.19'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2022-05-16T09:10:37Z</updated>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next</title>
<updated>2022-05-16T09:10:37Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2022-05-16T09:10:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1a01a0751731c807c04e81d3c19c5b782d205af7'/>
<id>urn:sha1:1a01a0751731c807c04e81d3c19c5b782d205af7</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

This is v2 including deadlock fix in conntrack ecache rework
reported by Jakub Kicinski.

The following patchset contains Netfilter updates for net-next,
mostly updates to conntrack from Florian Westphal.

1) Add a dedicated list for conntrack event redelivery.

2) Include event redelivery list in conntrack dumps of dying type.

3) Remove per-cpu dying list for event redelivery, not used anymore.

4) Add netns .pre_exit to cttimeout to zap timeout objects before
   synchronize_rcu() call.

5) Remove nf_ct_unconfirmed_destroy.

6) Add generation id for conntrack extensions for conntrack
   timeout and helpers.

7) Detach timeout policy from conntrack on cttimeout module removal.

8) Remove __nf_ct_unconfirmed_destroy.

9) Remove unconfirmed list.

10) Remove unconditional local_bh_disable in init_conntrack().

11) Consolidate conntrack iterator nf_ct_iterate_cleanup().

12) Detect if ctnetlink listeners exist to short-circuit event
    path early.

13) Un-inline nf_ct_ecache_ext_add().

14) Add nf_conntrack_events autodetect ctnetlink listener mode
    and make it default.

15) Add nf_ct_ecache_exist() to check for event cache extension.

16) Extend flowtable reverse route lookup to include source, iif,
    tos and mark, from Sven Auhagen.

17) Do not verify zero checksum UDP packets in nf_reject,
    from Kevin Mitchell.

====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netfilter: nfnetlink: allow to detect if ctnetlink listeners exist</title>
<updated>2022-05-13T16:56:28Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2022-04-25T13:15:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2794cdb0b97bfe62d25c996c8afe4832207e78bc'/>
<id>urn:sha1:2794cdb0b97bfe62d25c996c8afe4832207e78bc</id>
<content type='text'>
At this time, every new conntrack gets the 'event cache extension'
enabled for it.

This is because the 'net.netfilter.nf_conntrack_events' sysctl defaults
to 1.

Changing the default to 0 means that commands that rely on the event
notification extension, e.g. 'conntrack -E' or conntrackd, stop working.

We COULD detect if there is a listener by means of
'nfnetlink_has_listeners()' and only add the extension if this is true.

The downside is a dependency from conntrack module to nfnetlink module.

This adds a different way: inc/dec a counter whenever a ctnetlink group
is being (un)subscribed and toggle a flag in struct net.

Next patches will take advantage of this and will only add the event
extension if the flag is set.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: conntrack: remove unconfirmed list</title>
<updated>2022-05-13T16:53:27Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2022-04-11T11:01:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8a75a2c1741073f0c2d7bee146648e717527e048'/>
<id>urn:sha1:8a75a2c1741073f0c2d7bee146648e717527e048</id>
<content type='text'>
It has no function anymore and can be removed.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: conntrack: remove the percpu dying list</title>
<updated>2022-05-13T16:52:16Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2022-04-11T11:01:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1397af5bfd7d32b0cf2adb70a78c9a9e8f11d912'/>
<id>urn:sha1:1397af5bfd7d32b0cf2adb70a78c9a9e8f11d912</id>
<content type='text'>
Its no longer needed. Entries that need event redelivery are placed
on the new pernet dying list.

The advantage is that there is no need to take additional spinlock on
conntrack removal unless event redelivery failed or the conntrack entry
was never added to the table in the first place (confirmed bit not set).

The IPS_CONFIRMED bit now needs to be set as soon as the entry has been
unlinked from the unconfirmed list, else the destroy function may
attempt to unlink it a second time.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipv6: make ip6_rt_gc_expire an atomic_t</title>
<updated>2022-04-15T21:28:50Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-04-13T18:13:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9cb7c013420f98fa6fd12fc6a5dc055170c108db'/>
<id>urn:sha1:9cb7c013420f98fa6fd12fc6a5dc055170c108db</id>
<content type='text'>
Reads and Writes to ip6_rt_gc_expire always have been racy,
as syzbot reported lately [1]

There is a possible risk of under-flow, leading
to unexpected high value passed to fib6_run_gc(),
although I have not observed this in the field.

Hosts hitting ip6_dst_gc() very hard are under pretty bad
state anyway.

[1]
BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc

read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1:
 ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
 dst_alloc+0x9b/0x160 net/core/dst.c:86
 ip6_dst_alloc net/ipv6/route.c:344 [inline]
 icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
 mld_send_cr net/ipv6/mcast.c:2119 [inline]
 mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
 worker_thread+0x618/0xa70 kernel/workqueue.c:2436
 kthread+0x1a9/0x1e0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30

read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0:
 ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
 dst_alloc+0x9b/0x160 net/core/dst.c:86
 ip6_dst_alloc net/ipv6/route.c:344 [inline]
 icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
 mld_send_cr net/ipv6/mcast.c:2119 [inline]
 mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
 worker_thread+0x618/0xa70 kernel/workqueue.c:2436
 kthread+0x1a9/0x1e0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30

value changed: 0x00000bb3 -&gt; 0x00000ba9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d3bad-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: mld mld_ifc_work

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/</title>
<updated>2022-03-19T14:49:08Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2022-03-19T14:49:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=62f65554f5cffb17234e2b267d6376efc561d1c0'/>
<id>urn:sha1:62f65554f5cffb17234e2b267d6376efc561d1c0</id>
<content type='text'>
ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2022-03-19

1) Delete duplicated functions that calls same xfrm_api_check.
   From Leon Romanovsky.

2) Align userland API of the default policy structure to the
   internal structures. From Nicolas Dichtel.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>xfrm: rework default policy structure</title>
<updated>2022-03-18T06:23:12Z</updated>
<author>
<name>Nicolas Dichtel</name>
<email>nicolas.dichtel@6wind.com</email>
</author>
<published>2022-03-14T10:38:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b58b1f563ab78955d37e9e43e02790a85c66ac05'/>
<id>urn:sha1:b58b1f563ab78955d37e9e43e02790a85c66ac05</id>
<content type='text'>
This is a follow up of commit f8d858e607b2 ("xfrm: make user policy API
complete"). The goal is to align userland API to the internal structures.

Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Reviewed-by:  Antony Antony &lt;antony.antony@secunet.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>tcp: adjust TSO packet sizes based on min_rtt</title>
<updated>2022-03-10T04:05:44Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-03-09T01:57:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=65466904b015f6eeb9225b51aeb29b01a1d4b59c'/>
<id>urn:sha1:65466904b015f6eeb9225b51aeb29b01a1d4b59c</id>
<content type='text'>
Back when tcp_tso_autosize() and TCP pacing were introduced,
our focus was really to reduce burst sizes for long distance
flows.

The simple heuristic of using sk_pacing_rate/1024 has worked
well, but can lead to too small packets for hosts in the same
rack/cluster, when thousands of flows compete for the bottleneck.

Neal Cardwell had the idea of making the TSO burst size
a function of both sk_pacing_rate and tcp_min_rtt()

Indeed, for local flows, sending bigger bursts is better
to reduce cpu costs, as occasional losses can be repaired
quite fast.

This patch is based on Neal Cardwell implementation
done more than two years ago.
bbr is adjusting max_pacing_rate based on measured bandwidth,
while cubic would over estimate max_pacing_rate.

/proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable
this new feature, in logarithmic steps.

Tested:

100Gbit NIC, two hosts in the same rack, 4K MTU.
600 flows rate-limited to 20000000 bytes per second.

Before patch: (TSO sizes would be limited to 20000000/1024/4096 -&gt; 4 segments per TSO)

~# echo 0 &gt;/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96005

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         65,945.29 msec task-clock                #    2.845 CPUs utilized
         1,314,632      context-switches          # 19935.279 M/sec
             5,292      cpu-migrations            #   80.249 M/sec
           940,641      page-faults               # 14264.023 M/sec
   201,117,030,926      cycles                    # 3049769.216 GHz                   (83.45%)
    17,699,435,405      stalled-cycles-frontend   #    8.80% frontend cycles idle     (83.48%)
   136,584,015,071      stalled-cycles-backend    #   67.91% backend cycles idle      (83.44%)
    53,809,530,436      instructions              #    0.27  insn per cycle
                                                  #    2.54  stalled cycles per insn  (83.36%)
     9,062,315,523      branches                  # 137422329.563 M/sec               (83.22%)
       153,008,621      branch-misses             #    1.69% of all branches          (83.32%)

      23.182970846 seconds time elapsed

TcpInSegs                       15648792           0.0
TcpOutSegs                      58659110           0.0  # Average of 3.7 4K segments per TSO packet
TcpExtTCPDelivered              58654791           0.0
TcpExtTCPDeliveredCE            19                 0.0

After patch:

~# echo 9 &gt;/proc/sys/net/ipv4/tcp_tso_rtt_log
~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96046

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

         48,982.58 msec task-clock                #    2.104 CPUs utilized
           186,014      context-switches          # 3797.599 M/sec
             3,109      cpu-migrations            #   63.472 M/sec
           941,180      page-faults               # 19214.814 M/sec
   153,459,763,868      cycles                    # 3132982.807 GHz                   (83.56%)
    12,069,861,356      stalled-cycles-frontend   #    7.87% frontend cycles idle     (83.32%)
   120,485,917,953      stalled-cycles-backend    #   78.51% backend cycles idle      (83.24%)
    36,803,672,106      instructions              #    0.24  insn per cycle
                                                  #    3.27  stalled cycles per insn  (83.18%)
     5,947,266,275      branches                  # 121417383.427 M/sec               (83.64%)
        87,984,616      branch-misses             #    1.48% of all branches          (83.43%)

      23.281200256 seconds time elapsed

TcpInSegs                       1434706            0.0
TcpOutSegs                      58883378           0.0  # Average of 41 4K segments per TSO packet
TcpExtTCPDelivered              58878971           0.0
TcpExtTCPDeliveredCE            9664               0.0

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/smc: add sysctl for autocorking</title>
<updated>2022-03-01T14:25:12Z</updated>
<author>
<name>Dust Li</name>
<email>dust.li@linux.alibaba.com</email>
</author>
<published>2022-03-01T09:43:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=12bbb0d163a90d81a2677cf7808d364697290207'/>
<id>urn:sha1:12bbb0d163a90d81a2677cf7808d364697290207</id>
<content type='text'>
This add a new sysctl: net.smc.autocorking_size

We can dynamically change the behaviour of autocorking
by change the value of autocorking_size.
Setting to 0 disables autocorking in SMC

Signed-off-by: Dust Li &lt;dust.li@linux.alibaba.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/smc: add sysctl interface for SMC</title>
<updated>2022-03-01T14:25:12Z</updated>
<author>
<name>Dust Li</name>
<email>dust.li@linux.alibaba.com</email>
</author>
<published>2022-03-01T09:43:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=462791bbfa350189e309a5a94541f6b63cd874e8'/>
<id>urn:sha1:462791bbfa350189e309a5a94541f6b63cd874e8</id>
<content type='text'>
This patch add sysctl interface to support container environment
for SMC as we talk in the mail list.

Link: https://lore.kernel.org/netdev/20220224020253.GF5443@linux.alibaba.com
Co-developed-by: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Signed-off-by: Tony Lu &lt;tonylu@linux.alibaba.com&gt;
Signed-off-by: Dust Li &lt;dust.li@linux.alibaba.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
