<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/ip_vs.h, branch v6.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-02-02T13:02:01Z</updated>
<entry>
<title>ipvs: avoid kfree_rcu without 2nd arg</title>
<updated>2023-02-02T13:02:01Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2023-02-01T17:56:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e4d0fe71f59dc5137a2793ff7560730d80d1e1f4'/>
<id>urn:sha1:e4d0fe71f59dc5137a2793ff7560730d80d1e1f4</id>
<content type='text'>
Avoid a possible synchronize_rcu() as part of the
kfree_rcu() call when the 2nd arg is not provided.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: run_estimation should control the kthread tasks</title>
<updated>2022-12-10T21:44:43Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2022-11-22T16:46:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=144361c1949f227df9244302da02c258a363b674'/>
<id>urn:sha1:144361c1949f227df9244302da02c258a363b674</id>
<content type='text'>
Change the run_estimation flag to start/stop the kthread tasks.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: yunhong-cgl jiang &lt;xintian1976@gmail.com&gt;
Cc: "dust.li" &lt;dust.li@linux.alibaba.com&gt;
Reviewed-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: add est_cpulist and est_nice sysctl vars</title>
<updated>2022-12-10T21:44:43Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2022-11-22T16:46:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f0be83d5421718ead31707b6ece34cf77d411c00'/>
<id>urn:sha1:f0be83d5421718ead31707b6ece34cf77d411c00</id>
<content type='text'>
Allow the kthreads for stats to be configured for a
specific cpulist (isolation) and niceness (scheduling
priority).

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: yunhong-cgl jiang &lt;xintian1976@gmail.com&gt;
Cc: "dust.li" &lt;dust.li@linux.alibaba.com&gt;
Reviewed-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: use kthreads for stats estimation</title>
<updated>2022-12-10T21:44:43Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2022-11-22T16:46:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=705dd34440812735ece298eb5bc153fde9544d42'/>
<id>urn:sha1:705dd34440812735ece298eb5bc153fde9544d42</id>
<content type='text'>
Estimating all entries in a single list in timer context
by a single CPU causes large latency with multiple IPVS rules,
as reported in [1], [2], [3].

Spread the estimator structures over multiple chains and
use kthread(s) for the estimation. The chains are processed
over multiple (50) timer ticks to keep the 2-second interval
between estimations reasonably accurate. Every chain is
processed under RCU lock.
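
As a rough illustration of the paragraph above, the following Python
sketch spreads estimators over 50 per-tick chains and walks them once
per 2-second interval. It is not the kernel code: the function names
(spread, run_interval, tick_period_s) are hypothetical, and using one
chain per tick is a simplifying assumption.

```python
# Hypothetical sketch; names and layout are illustrative, not kernel code.
EST_INTERVAL_S = 2.0   # interval between estimations (from the text above)
TICKS = 50             # timer ticks per interval (from the text above)


def tick_period_s():
    """Return the time budget of a single tick in seconds."""
    return EST_INTERVAL_S / TICKS


def spread(estimators, ticks=TICKS):
    """Assign estimators round-robin to one chain per tick (assumption)."""
    chains = [[] for _ in range(ticks)]
    for i, est in enumerate(estimators):
        chains[i % ticks].append(est)
    return chains


def run_interval(chains, estimate):
    """Process one chain per tick; each estimator is visited exactly once."""
    visited = 0
    for chain in chains:
        for est in chain:
            estimate(est)
            visited += 1
    return visited
```

With this layout each tick gets a budget of 2 s / 50 = 40 ms, and the
work of one full estimation pass is split into 50 short slices instead
of one long run in timer context.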

Every kthread works over its own data structure and all
such contexts are attached to an array. The contexts can be
preserved while the kthread tasks are stopped or restarted.
When estimators are removed, unused kthread contexts are
released and their slots in the array are left empty.

The first kthread determines the parameters to use, e.g. the
maximum number of estimators to process per kthread based on
the chain's length (chain_max), allowing a sub-100us cond_resched
rate and estimation taking up to 1/8 of the CPU capacity
to avoid any problems if chain_max is not correctly
calculated.

chain_max is calculated taking into account factors
such as CPU speed and memory/cache speed, where the
cache_factor (4) is selected from real tests with the
current generation of CPU/NUMA configurations to
correct for the difference in CPU usage between the
cached (during calc phase) and non-cached (working) state
of the estimated per-cpu data.

The first kthread also plays the role of distributor of
added estimators to all kthreads, keeping the time to add
estimators low. The optimization is based on
the fact that a newly added estimator should be estimated
after 2 seconds, so we have time to offload the
adding to a chain from the controlling process to kthread 0.

The allocated kthread context may grow from 1 to 50
allocated structures for timer ticks, which saves memory for
setups with a small number of estimators.

We also add the delayed work est_reload_work, which
makes sure the kthread tasks are properly started/stopped.

ip_vs_start_estimator() is changed to report errors,
which allows the estimators to be stored safely in
allocated structures.

Many thanks to Jiri Wiesner for his valuable comments
and for spending a lot of time reviewing and testing
the changes on different platforms with 48-256 CPUs and
1-8 NUMA nodes under different cpufreq governors.

[1] Report from Yunhong Jiang:
https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/
[2]
https://marc.info/?l=linux-virtual-server&amp;m=159679809118027&amp;w=2
[3] Report from Dust:
https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: yunhong-cgl jiang &lt;xintian1976@gmail.com&gt;
Cc: "dust.li" &lt;dust.li@linux.alibaba.com&gt;
Reviewed-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Tested-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: use u64_stats_t for the per-cpu counters</title>
<updated>2022-12-10T21:44:42Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2022-11-22T16:46:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1dbd8d9a82e3f26b9d063292d47ece673f48fce2'/>
<id>urn:sha1:1dbd8d9a82e3f26b9d063292d47ece673f48fce2</id>
<content type='text'>
Use the provided u64_stats_t type to avoid
load/store tearing.

Fixes: 316580b69d0a ("u64_stats: provide u64_stats_t type")
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: yunhong-cgl jiang &lt;xintian1976@gmail.com&gt;
Cc: "dust.li" &lt;dust.li@linux.alibaba.com&gt;
Reviewed-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Tested-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: use common functions for stats allocation</title>
<updated>2022-12-10T21:44:42Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2022-11-22T16:46:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=de39afb3d811ba2c028de8662adafedb4899327b'/>
<id>urn:sha1:de39afb3d811ba2c028de8662adafedb4899327b</id>
<content type='text'>
Move the alloc_percpu/free_percpu logic into new functions.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: yunhong-cgl jiang &lt;xintian1976@gmail.com&gt;
Cc: "dust.li" &lt;dust.li@linux.alibaba.com&gt;
Reviewed-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: add rcu protection to stats</title>
<updated>2022-12-10T21:44:42Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2022-11-22T16:45:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5df7d714d8cbcce7642936cc0f6532f0c4c3d197'/>
<id>urn:sha1:5df7d714d8cbcce7642936cc0f6532f0c4c3d197</id>
<content type='text'>
In preparation for using RCU locking for the list
of estimators, make sure the struct ip_vs_stats
are released after an RCU grace period by using RCU
callbacks. This affects ipvs-&gt;tot_stats, where we
cannot use RCU callbacks for ipvs, so we use an
allocated struct ip_vs_stats_rcu. For services
and dests we force RCU callbacks in all cases.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: yunhong-cgl jiang &lt;xintian1976@gmail.com&gt;
Cc: "dust.li" &lt;dust.li@linux.alibaba.com&gt;
Reviewed-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: add sysctl_run_estimation to support disable estimation</title>
<updated>2021-10-07T17:52:58Z</updated>
<author>
<name>Dust Li</name>
<email>dust.li@linux.alibaba.com</email>
</author>
<published>2021-08-20T05:37:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2232642ec3fb4aad6ae4da1e109f55a0e7f2d204'/>
<id>urn:sha1:2232642ec3fb4aad6ae4da1e109f55a0e7f2d204</id>
<content type='text'>
estimation_timer() iterates over est_list to do estimation
for each ipvs stats entry. When there are lots of services, the
list can be very large.
We found that estimation_timer() ran for more than 200 ms on a
machine with 104 CPUs and 50K services.

yunhong-cgl jiang reported the same phenomenon earlier:
https://www.spinics.net/lists/lvs-devel/msg05426.html

In some cases (for example, a large K8S cluster with many ipvs services),
ipvs estimation may not be needed, so add a sysctl knob to allow
users to disable it completely.

Default: 1 (enabled)

Cc: yunhong-cgl jiang &lt;xintian1976@gmail.com&gt;
Signed-off-by: Dust Li &lt;dust.li@linux.alibaba.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Acked-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: move handlers to net/ip_vs.h</title>
<updated>2021-02-05T02:37:57Z</updated>
<author>
<name>Leon Romanovsky</name>
<email>leonro@nvidia.com</email>
</author>
<published>2021-02-03T13:51:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=edf597da02a01edb26bddf06890fb81eee3d82cf'/>
<id>urn:sha1:edf597da02a01edb26bddf06890fb81eee3d82cf</id>
<content type='text'>
Fix the following compilation warnings:
net/netfilter/ipvs/ip_vs_proto_tcp.c:147:1: warning: no previous prototype for 'tcp_snat_handler' [-Wmissing-prototypes]
  147 | tcp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
      | ^~~~~~~~~~~~~~~~
net/netfilter/ipvs/ip_vs_proto_udp.c:136:1: warning: no previous prototype for 'udp_snat_handler' [-Wmissing-prototypes]
  136 | udp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
      | ^~~~~~~~~~~~~~~~

Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipvs: remove dependency on ip6_tables</title>
<updated>2020-08-31T21:06:51Z</updated>
<author>
<name>Yaroslav Bolyukin</name>
<email>iam@lach.pw</email>
</author>
<published>2020-08-29T13:59:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=144b0a0e608690d46e9a77819249bdd8d23bdcb6'/>
<id>urn:sha1:144b0a0e608690d46e9a77819249bdd8d23bdcb6</id>
<content type='text'>
This dependency was added because ipv6_find_hdr() was in iptables-specific
code; it is no longer required.

Fixes: f8f626754ebe ("ipv6: Move ipv6_find_hdr() out of Netfilter code.")
Fixes: 63dca2c0b0e7 ("ipvs: Fix faulty IPv6 extension header handling in IPVS")
Signed-off-by: Yaroslav Bolyukin &lt;iam@lach.pw&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
</feed>
