<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/ip_vs.h, branch v7.1-rc2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v7.1-rc2</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v7.1-rc2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2026-03-04T10:45:45Z</updated>
<entry>
<title>ipvs: use more keys for connection hashing</title>
<updated>2026-03-04T10:45:45Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2026-03-03T21:04:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f20c73b0460d15301cf1bddf0f85d060a38a75df'/>
<id>urn:sha1:f20c73b0460d15301cf1bddf0f85d060a38a75df</id>
<content type='text'>
Simon Kirby reported a long time ago that IPVS connection hashing
based only on the client address/port (caddr, cport) as hash keys
is not suitable for setups that accept traffic on multiple virtual
IPs and ports. This affects multiple VIP:VPORT services, single or
multiple fwmark services that match several virtual IPs and ports,
and even passive FTP with persistence in DR/TUN mode, where traffic
is expected on multiple ports of the virtual IP.

Fix it by adding virtual addresses and ports to the hash function.
As a consequence, traffic from NAT real servers to clients now needs
a second hash for the in-&gt;out direction.

As a result:

- the IN direction from the client uses hash node hn0, with the
source/destination addresses and ports used by the client as
hash keys

- the OUT direction from NAT real servers uses hash node hn1 for
traffic from the real server to the client

- persistence templates are hashed only with parameters from the
IN direction, so they now also use the virtual address, port and
fwmark of the service.

OLD:
- all methods: c_list node: proto, caddr:cport
- persistence templates: c_list node: proto, caddr_net:0
- persistence engine templates: c_list node: per-PE, PE-SIP uses jhash

NEW:
- all methods: hn0 node (dir 0): proto, caddr:cport -&gt; vaddr:vport
- MASQ method: hn1 node (dir 1): proto, daddr:dport -&gt; caddr:cport
- persistence templates: hn0 node (dir 0):
  proto, caddr_net:0 -&gt; vaddr:vport_or_0
  proto, caddr_net:0 -&gt; fwmark:0
- persistence engine templates: hn0 node (dir 0): as before
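A rough userspace sketch of the OLD and NEW key sets, assuming IPv4
only and a hypothetical flattened struct conn_tuple (not the real
struct ip_vs_conn). FNV-1a is only a stand-in hash here; it is
neither the jhash nor the siphash used by IPVS. Mixing the dir 0
keys as a continuation of the OLD keys shows that two connections
from the same caddr:cport to different virtual IPs share one OLD
hash value but, in this example, get different dir 0 values:

```c
/* Quoted includes fall back to the system header search. */
#include "assert.h"
#include "stdint.h"

/* Hypothetical flattened IPv4 tuple, NOT struct ip_vs_conn. */
struct conn_tuple {
	uint8_t  proto;
	uint32_t caddr, vaddr, daddr;	/* client, virtual, real server */
	uint16_t cport, vport, dport;
};

/* Mix the low nbytes of v into an FNV-1a state (stand-in hash). */
static uint32_t mix(uint32_t h, uint32_t v, int nbytes)
{
	int i;

	assert(nbytes >= 1);
	assert(4 >= nbytes);
	for (i = 0; i != nbytes; i++) {
		h ^= (uint8_t)(v >> (8 * i));
		h *= 16777619u;
	}
	return h;
}

/* OLD keys: proto, caddr:cport */
static uint32_t hash_old(const struct conn_tuple *t)
{
	uint32_t h = 2166136261u;

	h = mix(h, t->proto, 1);
	h = mix(h, t->caddr, 4);
	return mix(h, t->cport, 2);
}

/* NEW hn0 (dir 0): proto, caddr:cport -> vaddr:vport */
static uint32_t hash_dir0(const struct conn_tuple *t)
{
	uint32_t h = hash_old(t);	/* same first keys, then add more */

	h = mix(h, t->vaddr, 4);
	return mix(h, t->vport, 2);
}

/* NEW hn1 (dir 1, MASQ only): proto, daddr:dport -> caddr:cport */
static uint32_t hash_dir1(const struct conn_tuple *t)
{
	uint32_t h = 2166136261u;

	h = mix(h, t->proto, 1);
	h = mix(h, t->daddr, 4);
	h = mix(h, t->dport, 2);
	h = mix(h, t->caddr, 4);
	return mix(h, t->cport, 2);
}
```

Since dir 1 uses neither vaddr nor vport, replies from a NAT real
server can be matched without knowing which virtual IP the client
connected to.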

Also reorder the ip_vs_conn fields so that the hash nodes share the
same read-mostly cache line, while write-mostly fields are on a
separate cache line.

Reported-by: Simon Kirby &lt;sim@hostway.ca&gt;
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
</entry>
<entry>
<title>ipvs: switch to per-net connection table</title>
<updated>2026-03-04T10:45:45Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2026-03-03T21:04:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2fa7cc9c70254d42a82bf82827d8d20cafe975d2'/>
<id>urn:sha1:2fa7cc9c70254d42a82bf82827d8d20cafe975d2</id>
<content type='text'>
Use a per-net resizable hash table for connections. The global table
is slow to walk when many namespaces are in use.

The table can be resized in the range [256 - ip_vs_conn_tab_size].
The table is attached only while services are present. Resizing is
done by delayed work based on the load (the number of connections).

Add a hash_key field to the connection to store the table ID in
the highest bit and the entry's hash value in the lowest bits. The
lowest part of the hash value is used as the bucket ID; the
remaining part is used to filter the entries in the bucket before
the keys are compared and, as a result, helps the lookup operation
access only one cache line. Knowing the table ID and bucket ID of an
entry, we can unlink it without calculating the hash value or doing
a lookup by keys. We only need to validate the saved hash_key under
the lock.
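A minimal sketch of the hash_key idea, assuming a 32-bit key with
the table ID in bit 31 and a 31-bit hash value below it. The helper
names (hk_make() and friends), struct entry and the bucket count are
hypothetical and do not come from the patch; division and modulo by
powers of two are used to split the key into its parts:

```c
/* Quoted includes fall back to the system header search. */
#include "assert.h"
#include "stdbool.h"
#include "stdint.h"

#define HK_TABLE_BIT 0x80000000u	/* bit 31: table ID */

/* Pack a table ID (0 or 1) with the low 31 bits of the hash value. */
static uint32_t hk_make(unsigned int table_id, uint32_t hash)
{
	assert(1 >= table_id);
	return (table_id ? HK_TABLE_BIT : 0) | (hash % HK_TABLE_BIT);
}

static unsigned int hk_table(uint32_t hk)
{
	return hk / HK_TABLE_BIT;
}

/* The lowest hash bits select the bucket (nbuckets: power of 2). */
static uint32_t hk_bucket(uint32_t hk, uint32_t nbuckets)
{
	return (hk % HK_TABLE_BIT) % nbuckets;
}

/* The remaining hash bits act as a cheap per-entry filter. */
static uint32_t hk_filter(uint32_t hk, uint32_t nbuckets)
{
	return (hk % HK_TABLE_BIT) / nbuckets;
}

struct entry {
	uint32_t hash_key;	/* saved when the entry was hashed */
	uint32_t key;		/* stands in for the real lookup keys */
};

/* Compare the saved hash_key first: most non-matching entries in
 * the bucket are rejected before their keys are compared. */
static bool entry_matches(const struct entry *e, uint32_t hk,
			  uint32_t key)
{
	if (e->hash_key != hk)
		return false;
	return e->key == key;
}
```

Because the bucket ID can be derived from the saved hash_key alone,
an entry can be found for unlinking without rehashing its keys; the
saved value then only has to be revalidated under the lock.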

For better security, switch from jhash to siphash for the default
connection hashing; the persistence engines may still use their own
function. Keeping the table load low (below 12% of the table size)
avoids collisions for 96+% of the connections.

ip_vs_conn_fill_cport() now rehashes the connection with proper
locking, because unhash+hash is not safe for RCU readers.

To invalidate templates, setting dport to 0xffff is enough; there is
no need to rehash them. As a result, ip_vs_conn_unhash() is now
unused and is removed.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
</entry>
<entry>
<title>ipvs: use resizable hash table for services</title>
<updated>2026-03-04T10:45:45Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2026-03-03T21:04:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=840aac3d900d09ec8fb8efe41bd7d09f9eb15538'/>
<id>urn:sha1:840aac3d900d09ec8fb8efe41bd7d09f9eb15538</id>
<content type='text'>
Make the hash table for services resizable in the bit range 4-20.
The table is attached only while services are present. Resizing is
done by delayed work based on the load (the number of hashed
services). The table grows when the load increases 2+ times (above
12.5% with lfactor=-3) and shrinks 8+ times when the load decreases
16+ times (below 0.78%).
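The grow/shrink thresholds above can be modelled roughly as follows.
This is a sketch of the described thresholds only, not the kernel's
resize code: the function name svc_tab_new_bits(), the doubling step
and the fixed 8x (3-bit) shrink step are assumptions chosen to match
"above 12.5%" and "below 0.78%" with lfactor=-3, within the 4-20 bit
range:

```c
/* Quoted includes fall back to the system header search. */
#include "assert.h"

#define SVC_TAB_MIN_BITS 4
#define SVC_TAB_MAX_BITS 20

/* 2^bits, computed without a shift operator */
static unsigned long tab_size(int bits)
{
	unsigned long size = 1;

	while (bits-- > 0)
		size *= 2;
	return size;
}

/* Hypothetical: new table size in bits for n hashed services. */
static int svc_tab_new_bits(unsigned long n, int bits)
{
	assert(bits >= SVC_TAB_MIN_BITS);
	assert(SVC_TAB_MAX_BITS >= bits);

	/* Grow 2x while the load is above 1/8 (12.5%) of the buckets. */
	while (n * 8 > tab_size(bits)) {
		if (bits == SVC_TAB_MAX_BITS)
			return bits;
		bits++;
	}

	/* Shrink 8x (clamped to the minimum) while the load is below
	 * 1/128 (~0.78%), i.e. 16 times under the grow threshold. */
	while (tab_size(bits) > n * 128) {
		if (bits == SVC_TAB_MIN_BITS)
			break;
		bits -= 3;
		if (SVC_TAB_MIN_BITS > bits)
			bits = SVC_TAB_MIN_BITS;
	}
	return bits;
}
```

The 16x gap between the two thresholds gives hysteresis, so a table
whose load hovers around one level is not resized back and forth.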

Switch to jhash to reduce collisions between multiple services.

Add a hash_key field to the service to store the table ID in
the highest bit and the entry's hash value in the lowest bits. The
lowest part of the hash value is used as the bucket ID; the
remaining part is used to filter the entries in the bucket before
the keys are compared and, as a result, helps the lookup operation
access only one cache line. Knowing the table ID and bucket ID of an
entry, we can unlink it without calculating the hash value or doing
a lookup by keys. We only need to validate the saved hash_key under
the lock.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
</entry>
<entry>
<title>ipvs: add resizable hash tables</title>
<updated>2026-03-04T10:45:45Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2026-03-03T21:04:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b655388111cf7e43f70e49db64bdaa42bcb8a038'/>
<id>urn:sha1:b655388111cf7e43f70e49db64bdaa42bcb8a038</id>
<content type='text'>
Add infrastructure for resizable hash tables based on hlist_bl,
which we will use in follow-up patches.

The tables allow RCU lookups during resizing. Bucket modifications
are protected by a per-bucket bit lock and additional custom locking.
The tables are resized when the load reaches thresholds determined
by a load factor parameter.

Compared to other implementations, we rely on:
* fast entry removal by unlinking the node without a pre-lookup
* entry rehashing when the hash key changes
* entries can contain multiple hash nodes
* custom locking depending on different contexts
* adjustable load factor to customize the grow/shrink process
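A single-threaded userspace sketch of two of the properties listed
above: an entry that carries two hash nodes (similar in spirit to
hn0/hn1 in the connection patches) and O(1) removal by unlinking a
node without a pre-lookup. It uses a plain circular doubly linked
list with sentinel heads instead of hlist_bl and has no RCU and no
bit locks; struct entry, hnode_entry() and the other names are
hypothetical:

```c
/* Quoted includes fall back to the system header search. */
#include "assert.h"
#include "stddef.h"

struct hnode {
	struct hnode *next, *prev;
};

/* Hypothetical entry hashed twice, once per direction. */
struct entry {
	int id;
	struct hnode hn[2];
};

#define NBUCKETS 8
static struct hnode buckets[NBUCKETS];	/* sentinel heads */

static void buckets_init(void)
{
	int i;

	for (i = 0; i != NBUCKETS; i++)
		buckets[i].next = buckets[i].prev = buckets + i;
}

static void hnode_add(struct hnode *head, struct hnode *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink in O(1): no hash recomputation and no lookup by keys. */
static void hnode_del(struct hnode *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* container_of()-style recovery of the entry from its dir-th node */
static struct entry *hnode_entry(struct hnode *n, int dir)
{
	assert(dir == 0 || dir == 1);
	return (struct entry *)((char *)(n - dir) -
				offsetof(struct entry, hn));
}

static int bucket_len(int b)
{
	struct hnode *n;
	int len = 0;

	for (n = buckets[b].next; n != buckets + b; n = n->next)
		len++;
	return len;
}
```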

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
</entry>
<entry>
<title>ipvs: no_cport and dropentry counters can be per-net</title>
<updated>2026-02-26T03:36:26Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2026-02-24T20:50:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=09b71fb459468b408f3fa3e2b75d20113374f057'/>
<id>urn:sha1:09b71fb459468b408f3fa3e2b75d20113374f057</id>
<content type='text'>
Make the no_cport counters per-net and per address family. This
should reduce the extra connection lookups done while NO_CPORT
connections are present.

With per-net instead of global dropentry counters, one net no
longer affects the drop rate of another net.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Link: https://patch.msgid.link/20260224205048.4718-7-fw@strlen.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipvs: use more counters to avoid service lookups</title>
<updated>2026-02-26T03:36:26Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2026-02-24T20:50:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c59bd9e62e060bb5cd4d697b73bbe6f23a723345'/>
<id>urn:sha1:c59bd9e62e060bb5cd4d697b73bbe6f23a723345</id>
<content type='text'>
When a new connection is created, we may look up services multiple
times to support fallback options. We already have some counters
to skip specific lookups, because each lookup costs CPU cycles for
hash calculation, etc.

Add more counters for fwmark/non-fwmark services (fwm_services and
nonfwm_services) and make all counters per address family.
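A sketch of how per-family service counters can gate lookups,
assuming a hypothetical struct net_svc_counters and address-family
indices; the real fields in the per-net IPVS state and the exact
fallback order may differ. A lookup is skipped outright when no
service of that kind exists for the packet's family, so no hash
value has to be computed for it:

```c
/* Quoted includes fall back to the system header search. */
#include "assert.h"
#include "stdbool.h"

/* Hypothetical address-family indices */
enum { AF_IDX_INET, AF_IDX_INET6, AF_IDX_MAX };

/* Hypothetical per-net counters, one slot per address family */
struct net_svc_counters {
	unsigned int fwm_services[AF_IDX_MAX];
	unsigned int nonfwm_services[AF_IDX_MAX];
};

/* Update the counters when a service is added (+1) or removed (-1). */
static void svc_count(struct net_svc_counters *c, int af, bool fwmark,
		      int delta)
{
	assert(delta == 1 || delta == -1);
	if (fwmark)
		c->fwm_services[af] += delta;
	else
		c->nonfwm_services[af] += delta;
}

/* Number of service hash lookups worth doing for a new connection:
 * a fwmark lookup only if the packet has a mark and fwmark services
 * exist, a non-fwmark lookup only if such services exist. */
static int svc_lookups_needed(const struct net_svc_counters *c, int af,
			      unsigned int mark)
{
	int lookups = 0;

	if (mark) {
		if (c->fwm_services[af])
			lookups++;
	}
	if (c->nonfwm_services[af])
		lookups++;
	return lookups;
}
```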

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Link: https://patch.msgid.link/20260224205048.4718-6-fw@strlen.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipvs: use single svc table</title>
<updated>2026-02-26T03:36:25Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2026-02-24T20:50:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b24ae1a387e404e832385448ccad30cb03520e45'/>
<id>urn:sha1:b24ae1a387e404e832385448ccad30cb03520e45</id>
<content type='text'>
Fwmark-based and non-fwmark-based services can be hashed in the
same service table. This reduces the burden of working with two
tables.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Link: https://patch.msgid.link/20260224205048.4718-4-fw@strlen.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipvs: make ip_vs_svc_table and ip_vs_svc_fwm_table per netns</title>
<updated>2026-02-26T03:36:25Z</updated>
<author>
<name>Jiejian Wu</name>
<email>jiejian@linux.alibaba.com</email>
</author>
<published>2026-02-24T20:50:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=74455a5b4326add2499cb4a1f9706154b3a1eab4'/>
<id>urn:sha1:74455a5b4326add2499cb4a1f9706154b3a1eab4</id>
<content type='text'>
Currently, IPVS uses one global mutex, "__ip_vs_mutex", to protect
the global "ip_vs_svc_table" and "ip_vs_svc_fwm_table". When there
are tens of thousands of services from different netns in the
tables, looking them up takes a long time, for example when running
"ipvsadm -ln" from different netns simultaneously.

We make "ip_vs_svc_table" and "ip_vs_svc_fwm_table" per netns and
add a per-netns "service_mutex" to protect these two tables instead
of the global "__ip_vs_mutex". As a result, looking up services from
different netns simultaneously no longer gets stuck, which shortens
the time spent in large-scale deployments. The problem can be
reproduced using the simple scripts below.

init.sh:
#!/bin/bash
for((i=1;i&lt;=4;i++));do
        ip netns add ns$i
        ip netns exec ns$i ip link set dev lo up
        ip netns exec ns$i sh add-services.sh
done

add-services.sh:
#!/bin/bash
for((i=0;i&lt;30000;i++)); do
        ipvsadm -A  -t 10.10.10.10:$((80+$i)) -s rr
done

runtest.sh:
#!/bin/bash
for((i=1;i&lt;4;i++));do
        ip netns exec ns$i ipvsadm -ln &gt; /dev/null &amp;
done
ip netns exec ns4 ipvsadm -ln &gt; /dev/null

Run "sh init.sh" to set up the network environment. Then run
"time ./runtest.sh" to measure the elapsed time. Our testbed is a
4-core Intel Xeon ECS. The original version takes around 8 seconds,
while the modified version takes only 0.8 seconds.

Signed-off-by: Jiejian Wu &lt;jiejian@linux.alibaba.com&gt;
Co-developed-by: Dust Li &lt;dust.li@linux.alibaba.com&gt;
Signed-off-by: Dust Li &lt;dust.li@linux.alibaba.com&gt;
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Link: https://patch.msgid.link/20260224205048.4718-2-fw@strlen.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipvs: Fix estimator kthreads preferred affinity</title>
<updated>2025-08-13T06:34:33Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-07-29T12:26:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c0a23bbc98e93704a1f4fb5e7e7bb2d7c0fb6eb3'/>
<id>urn:sha1:c0a23bbc98e93704a1f4fb5e7e7bb2d7c0fb6eb3</id>
<content type='text'>
The estimator kthreads' affinity is defined by preferences that may
be overridden via sysctl, and it is applied through a plain call to
the scheduler's affinity API.

However, since the introduction of managed kthreads preferred
affinity, such a practice bypasses the kthread core code, which
eventually overwrites the target with the default unbound affinity.

Fix this by using the appropriate kthread API.

Fixes: d1a89197589c ("kthread: Default affine kthread to its preferred NUMA node")
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
</content>
</entry>
<entry>
<title>ipvs: Correct spelling in comments</title>
<updated>2023-04-21T23:39:41Z</updated>
<author>
<name>Simon Horman</name>
<email>horms@kernel.org</email>
</author>
<published>2023-04-17T15:10:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c7d15aaa105a9484b5385e5c391ea5203347d6b0'/>
<id>urn:sha1:c7d15aaa105a9484b5385e5c391ea5203347d6b0</id>
<content type='text'>
Correct some spelling errors flagged by codespell and found by inspection.

Signed-off-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Horatiu Vultur &lt;horatiu.vultur@microchip.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
</feed>
