<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/netfilter/ipvs, branch v4.20</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.20</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.20'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2018-11-26T09:23:42Z</updated>
<entry>
<title>ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notf</title>
<updated>2018-11-26T09:23:42Z</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2018-11-15T07:14:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2a31e4bd9ad255ee40809b5c798c4b1c2b09703b'/>
<id>urn:sha1:2a31e4bd9ad255ee40809b5c798c4b1c2b09703b</id>
<content type='text'>
ip_vs_dst_event is supposed to clean up all dst used in ipvs'
destinations when a net dev is going down. But it works only
when the dst's dev is the same as the dev from the event.

Now with the same priority but late registration,
ip_vs_dst_notifier is always called later than ipv6_dev_notf
where the dst's dev is set to lo for NETDEV_DOWN event.

As the dst's dev lo is not the same as the dev from the event
in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work.
Also as these dst have to wait for dest_trash_timer to clean
them up. It would cause some non-permanent kernel warnings:

  unregister_netdevice: waiting for br0 to become free. Usage count = 3

To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf
by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5.

Note that for ipv4 route fib_netdev_notifier doesn't set dst's
dev to lo in NETDEV_DOWN event, so this fix is only needed when
IP_VS_IPV6 is defined.

Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring")
Reported-by: Li Shuang &lt;shuali@redhat.com&gt;
Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Acked-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>iov_iter: Separate type from direction and use accessor functions</title>
<updated>2018-10-23T23:41:07Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-10-19T23:57:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aa563d7bca6e882ec2bdae24603c8f016401a144'/>
<id>urn:sha1:aa563d7bca6e882ec2bdae24603c8f016401a144</id>
<content type='text'>
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements.  This makes it easier to add further
iterator types.  Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.

Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself.  Only the direction is required.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>net: Add extack to nlmsg_parse</title>
<updated>2018-10-08T17:39:04Z</updated>
<author>
<name>David Ahern</name>
<email>dsahern@gmail.com</email>
</author>
<published>2018-10-08T03:16:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dac9c9790e542777079999900594fd069ba10489'/>
<id>urn:sha1:dac9c9790e542777079999900594fd069ba10489</id>
<content type='text'>
Make sure extack is passed to nlmsg_parse where easy to do so.
Most of these are dump handlers and leveraging the extack in
the netlink_callback.

Signed-off-by: David Ahern &lt;dsahern@gmail.com&gt;
Acked-by: Christian Brauner &lt;christian@brauner.io&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu()</title>
<updated>2018-09-27T03:30:55Z</updated>
<author>
<name>Maciej Żenczykowski</name>
<email>maze@google.com</email>
</author>
<published>2018-09-26T03:56:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d888f39666774c7debfa34e4e20ba33cf61a6d71'/>
<id>urn:sha1:d888f39666774c7debfa34e4e20ba33cf61a6d71</id>
<content type='text'>
(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>treewide: convert ISO_8859-1 text comments to utf-8</title>
<updated>2018-08-24T01:48:43Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2018-08-24T00:01:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3723c63247854c97fe044c12a40e29043e9bbc1b'/>
<id>urn:sha1:3723c63247854c97fe044c12a40e29043e9bbc1b</id>
<content type='text'>
Almost all files in the kernel are either plain text or UTF-8 encoded.  A
couple however are ISO_8859-1, usually just a few characters in a C
comments, for historic reasons.

This converts them all to UTF-8 for consistency.

Link: http://lkml.kernel.org/r/20180724111600.4158975-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Simon Horman &lt;horms@verge.net.au&gt;			[IPVS portion]
Acked-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;	[IIO]
Acked-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;			[powerpc]
Acked-by: Rob Herring &lt;robh@kernel.org&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Samuel Ortiz &lt;sameo@linux.intel.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Rob Herring &lt;robh+dt@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipvs: don't show negative times in ip_vs_conn</title>
<updated>2018-08-16T17:36:57Z</updated>
<author>
<name>Matteo Croce</name>
<email>mcroce@redhat.com</email>
</author>
<published>2018-07-31T16:03:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b71ed54dc2a1dbb4328f7d7c98d712747aee92ba'/>
<id>urn:sha1:b71ed54dc2a1dbb4328f7d7c98d712747aee92ba</id>
<content type='text'>
Since commit 500462a9de65 ("timers: Switch to a non-cascading wheel"),
timers duration can last even 12.5% more than the scheduled interval.

IPVS has two handlers, /proc/net/ip_vs_conn and /proc/net/ip_vs_conn_sync,
which shows the remaining time before that a connection expires.
The default expire time for a connection is 60 seconds, and the
expiration timer can fire even 4 seconds later than the scheduled time.
The expiration time is calculated subtracting jiffies to the scheduled
expiration time, and it's shown as a huge number when the timer fires late,
since both values are unsigned.

This can confuse script and tools which relies on it, like ipvsadm:

    root@mcroce-redhat:~# while ipvsadm -lc |grep SYN_RECV; do sleep 1 ; done
    TCP 00:05  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:04  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:03  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:02  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:01  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:00  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:44 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:43 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:42 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:41 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:40 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:39 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000

Signed-off-by: Matteo Croce &lt;mcroce@redhat.com&gt;
Acked-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: fix race between ip_vs_conn_new() and ip_vs_del_dest()</title>
<updated>2018-08-16T17:36:51Z</updated>
<author>
<name>Tan Hu</name>
<email>tan.hu@zte.com.cn</email>
</author>
<published>2018-07-25T07:23:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a53b42c11815d2357e31a9403ae3950517525894'/>
<id>urn:sha1:a53b42c11815d2357e31a9403ae3950517525894</id>
<content type='text'>
We came across infinite loop in ipvs when using ipvs in docker
env.

When ipvs receives new packets and cannot find an ipvs connection,
it will create a new connection, then if the dest is unavailable
(i.e. IP_VS_DEST_F_AVAILABLE), the packet will be dropped sliently.

But if the dropped packet is the first packet of this connection,
the connection control timer never has a chance to start and the
ipvs connection cannot be released. This will lead to memory leak, or
infinite loop in cleanup_net() when net namespace is released like
this:

    ip_vs_conn_net_cleanup at ffffffffa0a9f31a [ip_vs]
    __ip_vs_cleanup at ffffffffa0a9f60a [ip_vs]
    ops_exit_list at ffffffff81567a49
    cleanup_net at ffffffff81568b40
    process_one_work at ffffffff810a851b
    worker_thread at ffffffff810a9356
    kthread at ffffffff810b0b6f
    ret_from_fork at ffffffff81697a18

race condition:
    CPU1                           CPU2
    ip_vs_in()
      ip_vs_conn_new()
                                   ip_vs_del_dest()
                                     __ip_vs_unlink_dest()
                                       ~IP_VS_DEST_F_AVAILABLE
      cp-&gt;dest &amp;&amp; !IP_VS_DEST_F_AVAILABLE
      __ip_vs_conn_put
    ...
    cleanup_net  ---&gt; infinite looping

Fix this by checking whether the timer already started.

Signed-off-by: Tan Hu &lt;tan.hu@zte.com.cn&gt;
Reviewed-by: Jiang Biao &lt;jiang.biao2@zte.com.cn&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Acked-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>net: Remove some unneeded semicolon</title>
<updated>2018-08-04T20:05:39Z</updated>
<author>
<name>zhong jiang</name>
<email>zhongjiang@huawei.com</email>
</author>
<published>2018-08-04T11:41:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=07d53ae4fbdf7458f4d51249aa24d75c76fe52a8'/>
<id>urn:sha1:07d53ae4fbdf7458f4d51249aa24d75c76fe52a8</id>
<content type='text'>
These semicolons are not needed.  Just remove them.

Signed-off-by: zhong jiang &lt;zhongjiang@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipvs: drop conn templates under attack</title>
<updated>2018-07-18T09:26:41Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2018-07-06T05:25:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=762c40076684771c0efbce6490ded26086441ce6'/>
<id>urn:sha1:762c40076684771c0efbce6490ded26086441ce6</id>
<content type='text'>
Before now, connection templates were ignored by the random
dropentry procedure. But Michal Koutný suggests that we
should add exception for connections under SYN attack.
He provided patch that implements it for TCP:

&lt;quote&gt;

IPVS includes protection against filling the ip_vs_conn_tab by
dropping 1/32 of feasible entries every second. The template
entries (for persistent services) are never directly deleted by
this mechanism but when a picked TCP connection entry is being
dropped (1), the respective template entry is dropped too (realized
by expiring 60 seconds after the connection entry being dropped).

There is another mechanism that removes connection entries when they
time out (2), in this case the associated template entry is not deleted.
Under SYN flood template entries would accumulate (due to their entry
longer timeout).

The accumulation takes place also with drop_entry being enabled. Roughly
15% ((31/32)^60) of SYN_RECV connections survive the dropping mechanism
(1) and are removed by the timeout mechanism (2)(defaults to 60 seconds
for SYN_RECV), thus template entries would still accumulate.

The patch ensures that when a connection entry times out, we also remove
the template entry from the table. To prevent breaking persistent
services (since the connection may time out in already established state)
we add a new entry flag to protect templates what spawned at least one
established TCP connection.

&lt;/quote&gt;

We already added ASSURED flag for the templates in previous patch, so
that we can use it now to decide which connection templates should be
dropped under attack. But we also have some cases that need special
handling.

We modify the dropentry procedure as follows:

- Linux timers currently use LIFO ordering but we can not rely on
this to drop controlling connections. So, set cp-&gt;timeout to 0
to indicate that connection was dropped and that on expiration we
should try to drop our controlling connections. As result, we can
now avoid the ip_vs_conn_expire_now call.

- move the cp-&gt;n_control check above, so that it avoids restarting
the timer for controlling connections when not needed.

- drop unassured connection templates here if they are not referred
by any connections.

On connection expiration: if connection was dropped (cp-&gt;timeout=0)
try to drop our controlling connection except if it is a template
in assured state.

In ip_vs_conn_flush change order of ip_vs_conn_expire_now calls
according to the LIFO timer expiration order. It should work
faster for controlling connections with single controlled one.

Suggested-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: add assured state for conn templates</title>
<updated>2018-07-18T09:26:40Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2018-07-06T05:25:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=275411430f892407b885be1de2548b2e632892c3'/>
<id>urn:sha1:275411430f892407b885be1de2548b2e632892c3</id>
<content type='text'>
cp-&gt;state was not used for templates. Add support for state bits
and for the first "assured" bit which indicates that some
connection controlled by this template was established or assured
by the real server. In a followup patch we will use it to drop
templates under SYN attack.

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
</feed>
