<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/netfilter/ipvs, branch v6.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-06-19T14:01:07Z</updated>
<entry>
<title>ipvs: align inner_mac_header for encapsulation</title>
<updated>2023-06-19T14:01:07Z</updated>
<author>
<name>Terin Stock</name>
<email>terin@cloudflare.com</email>
</author>
<published>2023-06-09T20:58:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d7fce52fdf96663ddc2eb21afecff3775588612a'/>
<id>urn:sha1:d7fce52fdf96663ddc2eb21afecff3775588612a</id>
<content type='text'>
When using encapsulation the original packet's headers are copied to the
inner headers. This preserves the space for an inner mac header, which
is not used by the inner payloads for the encapsulation types supported
by IPVS. If a packet is using GUE or GRE encapsulation and needs to be
segmented, flow can be passed to __skb_udp_tunnel_segment() which
calculates a negative tunnel header length. A negative tunnel header
length causes pskb_may_pull() to fail, dropping the packet.

This can be observed by attaching probes to ip_vs_in_hook(),
__dev_queue_xmit(), and __skb_udp_tunnel_segment():

    perf probe --add '__dev_queue_xmit skb-&gt;inner_mac_header \
    skb-&gt;inner_network_header skb-&gt;mac_header skb-&gt;network_header'
    perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen'
    perf probe -m ip_vs --add 'ip_vs_in_hook skb-&gt;inner_mac_header \
    skb-&gt;inner_network_header skb-&gt;mac_header skb-&gt;network_header'

These probes the headers and tunnel header length for packets which
traverse the IPVS encapsulation path. A TCP packet can be forced into
the segmentation path by being smaller than a calculated clamped MSS,
but larger than the advertised MSS.

    probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52
    probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
    probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
    probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2

When using veth-based encapsulation, the interfaces are set to be
mac-less, which does not preserve space for an inner mac header. This
prevents this issue from occurring.

In our real-world testing of sending a 32KB file we observed operation
time increasing from ~75ms for veth-based encapsulation to over 1.5s
using IPVS encapsulation due to retries from dropped packets.

This changeset modifies the packet on the encapsulation path in
ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac
header offset. This fixes UDP segmentation for both encapsulation types,
and corrects the inner headers for any IPIP flows that may use it.

Fixes: 84c0d5e96f3a ("ipvs: allow tunneling with gue encapsulation")
Signed-off-by: Terin Stock &lt;terin@cloudflare.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Acked-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: Remove {Enter,Leave}Function</title>
<updated>2023-04-21T23:39:41Z</updated>
<author>
<name>Simon Horman</name>
<email>horms@kernel.org</email>
</author>
<published>2023-04-17T15:10:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=210ffe4a74caead4d6790747d32b63c6152c70b7'/>
<id>urn:sha1:210ffe4a74caead4d6790747d32b63c6152c70b7</id>
<content type='text'>
Remove EnterFunction and LeaveFunction.

These debugging macros seem well past their use-by date.  And seem to
have little value these days. Removing them allows some trivial cleanup
of some exit paths for some functions. These are also included in this
patch. There is likely scope for further cleanup of both debugging and
unwind paths. But let's leave that for another day.

Only intended to change debug output, and only when CONFIG_IP_VS_DEBUG
is enabled. Compile tested only.

Signed-off-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Horatiu Vultur &lt;horatiu.vultur@microchip.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: Consistently use array_size() in ip_vs_conn_init()</title>
<updated>2023-04-21T23:39:41Z</updated>
<author>
<name>Simon Horman</name>
<email>horms@kernel.org</email>
</author>
<published>2023-04-17T15:10:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=280654932e341bf2860503af49bd5cb8e2b78522'/>
<id>urn:sha1:280654932e341bf2860503af49bd5cb8e2b78522</id>
<content type='text'>
Consistently use array_size() to calculate the size of ip_vs_conn_tab
in bytes.

Flagged by Coccinelle:
 WARNING: array_size is already used (line 1498) to compute the same size

No functional change intended.
Compile tested only.

Signed-off-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Horatiu Vultur &lt;horatiu.vultur@microchip.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: Update width of source for ip_vs_sync_conn_options</title>
<updated>2023-04-21T23:39:41Z</updated>
<author>
<name>Simon Horman</name>
<email>horms@kernel.org</email>
</author>
<published>2023-04-17T15:10:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e3478c68f6704638d08f437cbc552ca5970c151a'/>
<id>urn:sha1:e3478c68f6704638d08f437cbc552ca5970c151a</id>
<content type='text'>
In ip_vs_sync_conn_v0() copy is made to struct ip_vs_sync_conn_options.
That structure looks like this:

struct ip_vs_sync_conn_options {
        struct ip_vs_seq        in_seq;
        struct ip_vs_seq        out_seq;
};

The source of the copy is the in_seq field of struct ip_vs_conn.  Whose
type is struct ip_vs_seq. Thus we can see that the source - is not as
wide as the amount of data copied, which is the width of struct
ip_vs_sync_conn_option.

The copy is safe because the next field in is another struct ip_vs_seq.
Make use of struct_group() to annotate this.

Flagged by gcc-13 as:

 In file included from ./include/linux/string.h:254,
                  from ./include/linux/bitmap.h:11,
                  from ./include/linux/cpumask.h:12,
                  from ./arch/x86/include/asm/paravirt.h:17,
                  from ./arch/x86/include/asm/cpuid.h:62,
                  from ./arch/x86/include/asm/processor.h:19,
                  from ./arch/x86/include/asm/timex.h:5,
                  from ./include/linux/timex.h:67,
                  from ./include/linux/time32.h:13,
                  from ./include/linux/time.h:60,
                  from ./include/linux/stat.h:19,
                  from ./include/linux/module.h:13,
                  from net/netfilter/ipvs/ip_vs_sync.c:38:
 In function 'fortify_memcpy_chk',
     inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3:
 ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
   529 |                         __read_overflow2_field(q_size_field, size);
       |

Compile tested only.

Signed-off-by: Simon Horman &lt;horms@kernel.org&gt;
Reviewed-by: Horatiu Vultur &lt;horatiu.vultur@microchip.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>net: dst: Switch to rcuref_t reference counting</title>
<updated>2023-03-29T01:52:28Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2023-03-23T20:55:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bc9d3a9f2afca189a6ae40225b6985e3c775375e'/>
<id>urn:sha1:bc9d3a9f2afca189a6ae40225b6985e3c775375e</id>
<content type='text'>
Under high contention dst_entry::__refcnt becomes a significant bottleneck.

atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into
high retry rates on contention.

Switch the reference count to rcuref_t which results in a significant
performance gain. Rename the reference count member to __rcuref to reflect
the change.

The gain depends on the micro-architecture and the number of concurrent
operations and has been measured in the range of +25% to +130% with a
localhost memtier/memcached benchmark which amplifies the problem
massively.

Running the memtier/memcached benchmark over a real (1Gb) network
connection the conversion on top of the false sharing fix for struct
dst_entry::__refcnt results in a total gain in the 2%-5% range over the
upstream baseline.

Reported-by: Wangyang Guo &lt;wangyang.guo@intel.com&gt;
Reported-by: Arjan Van De Ven &lt;arjan.van.de.ven@intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20230307125538.989175656@linutronix.de
Link: https://lore.kernel.org/r/20230323102800.215027837@linutronix.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next</title>
<updated>2023-02-20T10:53:56Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2023-02-20T10:53:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1155a2281de9e7c08c5c6e265b32b28d1fe9ea07'/>
<id>urn:sha1:1155a2281de9e7c08c5c6e265b32b28d1fe9ea07</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add safeguard to check for NULL tupe in objects updates via
   NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari.

2) Incorrect pointer check in the new destroy rule command,
   from Yang Yingliang.

3) Incorrect status bitcheck in nf_conntrack_udp_packet(),
   from Florian Westphal.

4) Simplify seq_print_acct(), from Ilia Gavrilov.

5) Use 2-arg optimal variant of kfree_rcu() in IPVS,
   from Julian Anastasov.

6) TCP connection enters CLOSE state in conntrack for locally
   originated TCP reset packet from the reject target,
   from Florian Westphal.

The fixes #2 and #3 in this series address issues from the previous pull
nf-next request in this net-next cycle.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipvs: avoid kfree_rcu without 2nd arg</title>
<updated>2023-02-02T13:02:01Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2023-02-01T17:56:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e4d0fe71f59dc5137a2793ff7560730d80d1e1f4'/>
<id>urn:sha1:e4d0fe71f59dc5137a2793ff7560730d80d1e1f4</id>
<content type='text'>
Avoid possible synchronize_rcu() as part from the
kfree_rcu() call when 2nd arg is not provided.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: use skb_ip_totlen and iph_totlen</title>
<updated>2023-02-02T04:54:27Z</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2023-01-28T15:58:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a13fbf5ed5b4fc9095f12e955ca3a59b5507ff01'/>
<id>urn:sha1:a13fbf5ed5b4fc9095f12e955ca3a59b5507ff01</id>
<content type='text'>
There are also quite some places in netfilter that may process IPv4 TCP
GSO packets, we need to replace them too.

In length_mt(), we have to use u_int32_t/int to accept skb_ip_totlen()
return value, otherwise it may overflow and mismatch. This change will
also help us add selftest for IPv4 BIG TCP in the following patch.

Note that we don't need to replace the one in tcpmss_tg4(), as it will
return if there is data after tcphdr in tcpmss_mangle_packet(). The
same in mangle_contents() in nf_nat_helper.c, it returns false when
skb-&gt;len + extra &gt; 65535 in enlarge_skb().

Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>treewide: Convert del_timer*() to timer_shutdown*()</title>
<updated>2022-12-25T21:38:09Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2022-12-20T18:45:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=292a089d78d3e2f7944e60bb897c977785a321e3'/>
<id>urn:sha1:292a089d78d3e2f7944e60bb897c977785a321e3</id>
<content type='text'>
Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called "shutdown".  After a timer is set to this state, then it can no
longer be re-armed.

The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed.  It also ignores any locations where
the timer-&gt;function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.

This was created by using a coccinelle script and the following
commands:

    $ cat timer.cocci
    @@
    expression ptr, slab;
    identifier timer, rfield;
    @@
    (
    -       del_timer(&amp;ptr-&gt;timer);
    +       timer_shutdown(&amp;ptr-&gt;timer);
    |
    -       del_timer_sync(&amp;ptr-&gt;timer);
    +       timer_shutdown_sync(&amp;ptr-&gt;timer);
    )
      ... when strict
          when != ptr-&gt;timer
    (
            kfree_rcu(ptr, rfield);
    |
            kmem_cache_free(slab, ptr);
    |
            kfree(ptr);
    )

    $ spatch timer.cocci . &gt; /tmp/t.patch
    $ patch -p1 &lt; /tmp/t.patch

Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Acked-by: Pavel Machek &lt;pavel@ucw.cz&gt; [ LED ]
Acked-by: Kalle Valo &lt;kvalo@kernel.org&gt; [ wireless ]
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt; [ networking ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf</title>
<updated>2022-12-14T03:32:53Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-12-14T03:32:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7ae9888d6e1ce4062d27367a28e46a26270a3e52'/>
<id>urn:sha1:7ae9888d6e1ce4062d27367a28e46a26270a3e52</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

1) Fix NAT IPv6 flowtable hardware offload, from Qingfang DENG.

2) Add a safety check to IPVS socket option interface report a
   warning if unsupported command is seen, this. From Li Qiong.

3) Document SCTP conntrack timeouts, from Sriram Yagnaraman.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: conntrack: document sctp timeouts
  ipvs: add a 'default' case in do_ip_vs_set_ctl()
  netfilter: flowtable: really fix NAT IPv6 offload
====================

Link: https://lore.kernel.org/r/20221213140923.154594-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
