<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/netfilter/ipvs, branch v4.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-11-08T22:53:30Z</updated>
<entry>
<title>ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr</title>
<updated>2016-11-08T22:53:30Z</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2016-11-04T00:14:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8fbfef7f505bba60fb57078b7621270ee57cd1c4'/>
<id>urn:sha1:8fbfef7f505bba60fb57078b7621270ee57cd1c4</id>
<content type='text'>
family.maxattr is the max index for policy[], the size of
ops[] is determined with ARRAY_SIZE().

Reported-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Tested-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Cc: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>netfilter: ip_vs_sync: fix bogus maybe-uninitialized warning</title>
<updated>2016-10-28T12:14:51Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2016-10-24T15:34:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5747620257812530adda58cbff591fede6fb261e'/>
<id>urn:sha1:5747620257812530adda58cbff591fede6fb261e</id>
<content type='text'>
Building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86
confuses the compiler to the point where it produces a rather
dubious warning message:

net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  struct ip_vs_sync_conn_options opt;
                                 ^~~
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&amp;opt+12).init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&amp;opt+12).delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&amp;opt+12).previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

The problem appears to be a combination of a number of factors, including
the __builtin_bswap32 compiler builtin being slightly odd, having a large
amount of code inlined into a single function, and the way that some
functions only get partially inlined here.

I've spent way too much time trying to work out a way to improve the
code, but the best I've come up with is to add an explicit memset
right before the ip_vs_seq structure is first initialized here. When
the compiler works correctly, this has absolutely no effect, but in the
case that produces the warning, the warning disappears.

In the process of analysing this warning, I also noticed that
we use memcpy to copy the larger ip_vs_sync_conn_options structure
over two members of the ip_vs_conn structure. This works because
the layout is identical, but seems error-prone, so I'm changing
this in the process to directly copy the two members. This change
seemed to have no effect on the object code or the warning, but
it deals with the same data, so I kept the two changes together.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: use nf_ct_kill helper</title>
<updated>2016-08-11T22:43:52Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2016-08-03T13:21:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a6c46d9bc9d8ca6e30e681aadd30b73c44434b7d'/>
<id>urn:sha1:a6c46d9bc9d8ca6e30e681aadd30b73c44434b7d</id>
<content type='text'>
Once timer is removed from nf_conn struct we cannot open-code
the removal sequence anymore.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next</title>
<updated>2016-07-25T05:02:36Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-07-25T05:02:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c42d7121fbee1ee30fd9221d594e9c5a4bc1fed6'/>
<id>urn:sha1:c42d7121fbee1ee30fd9221d594e9c5a4bc1fed6</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
they are:

1) Count pre-established connections as active in "least connection"
   schedulers such that pre-established connections to avoid overloading
   backend servers on peak demands, from Michal Kubecek via Simon Horman.

2) Address a race condition when resizing the conntrack table by caching
   the bucket size when fulling iterating over the hashtable in these
   three possible scenarios: 1) dump via /proc/net/nf_conntrack,
   2) unlinking userspace helper and 3) unlinking custom conntrack timeout.
   From Liping Zhang.

3) Revisit early_drop() path to perform lockless traversal on conntrack
   eviction under stress, use del_timer() as synchronization point to
   avoid two CPUs evicting the same entry, from Florian Westphal.

4) Move NAT hlist_head to nf_conn object, this simplifies the existing
   NAT extension and it doesn't increase size since recent patches to
   align nf_conn, from Florian.

5) Use rhashtable for the by-source NAT hashtable, also from Florian.

6) Don't allow --physdev-is-out from OUTPUT chain, just like
   --physdev-out is not either, from Hangbin Liu.

7) Automagically set on nf_conntrack counters if the user tries to
   match ct bytes/packets from nftables, from Liping Zhang.

8) Remove possible_net_t fields in nf_tables set objects since we just
   simply pass the net pointer to the backend set type implementations.

9) Fix possible off-by-one in h323, from Toby DiPasquale.

10) early_drop() may be called from ctnetlink patch, so we must hold
    rcu read size lock from them too, this amends Florian's patch #3
    coming in this batch, from Liping Zhang.

11) Use binary search to validate jump offset in x_tables, this
    addresses the O(n!) validation that was introduced recently
    resolve security issues with unpriviledge namespaces, from Florian.

12) Fix reference leak to connlabel in error path of nft_ct, from Zhang.

13) Three updates for nft_log: Fix log prefix leak in error path. Bail
    out on loglevel larger than debug in nft_log and set on the new
    NF_LOG_F_COPY_LEN flag when snaplen is specified. Again from Zhang.

14) Allow to filter rule dumps in nf_tables based on table and chain
    names.

15) Simplify connlabel to always use 128 bits to store labels and
    get rid of unused function in xt_connlabel, from Florian.

16) Replace set_expect_timeout() by mod_timer() from the h323 conntrack
    helper, by Gao Feng.

17) Put back x_tables module reference in nft_compat on error, from
    Liping Zhang.

18) Add a reference count to the x_tables extensions cache in
    nft_compat, so we can remove them when unused and avoid a crash
    if the extensions are rmmod, again from Zhang.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge tag 'ipvs-for-v4.8' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next</title>
<updated>2016-07-11T10:16:34Z</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2016-07-11T10:16:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c080b460df960f1dc3b35d009392458b2861e801'/>
<id>urn:sha1:c080b460df960f1dc3b35d009392458b2861e801</id>
<content type='text'>
Simon Horman says:

====================
IPVS Updates for v4.8

please consider these enhancements to the IPVS. This alters the behaviour
of the "least connection" schedulers such that pre-established connections
are included in the active connection count. This avoids overloading
servers when a large number of new connections arrive in a short space of
time - e.g. when clients reconnect after a node or network failure.
====================

Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: count pre-established TCP states as active</title>
<updated>2016-07-07T18:30:52Z</updated>
<author>
<name>Michal Kubecek</name>
<email>mkubecek@suse.cz</email>
</author>
<published>2016-06-03T15:56:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=be2cef49904b34dd5f75d96bbc8cd8341bab1bc0'/>
<id>urn:sha1:be2cef49904b34dd5f75d96bbc8cd8341bab1bc0</id>
<content type='text'>
Some users observed that "least connection" distribution algorithm doesn't
handle well bursts of TCP connections from reconnecting clients after
a node or network failure.

This is because the algorithm counts active connection as worth 256
inactive ones where for TCP, "active" only means TCP connections in
ESTABLISHED state. In case of a connection burst, new connections are
handled before previous ones have finished the three way handshaking so
that all are still counted as "inactive", i.e. cheap ones. The become
"active" quickly but at that time, all of them are already assigned to one
real server (or few), resulting in highly unbalanced distribution.

Address this by counting the "pre-established" states as "active".

Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
</content>
</entry>
<entry>
<title>ipvs: fix bind to link-local mcast IPv6 address in backup</title>
<updated>2016-07-07T18:21:32Z</updated>
<author>
<name>Quentin Armitage</name>
<email>quentin@armitage.org.uk</email>
</author>
<published>2016-06-16T07:00:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3777ed688fba82d0bd43f9fc1ebbc6abe788576d'/>
<id>urn:sha1:3777ed688fba82d0bd43f9fc1ebbc6abe788576d</id>
<content type='text'>
When using HEAD from
https://git.kernel.org/cgit/utils/kernel/ipvsadm/ipvsadm.git/,
the command:
ipvsadm --start-daemon backup --mcast-interface eth0.60 \
    --mcast-group ff02::1:81
fails with the error message:
Argument list too long

whereas both:
ipvsadm --start-daemon master --mcast-interface eth0.60 \
    --mcast-group ff02::1:81
and:
ipvsadm --start-daemon backup --mcast-interface eth0.60 \
    --mcast-group 224.0.0.81
are successful.

The error message "Argument list too long" isn't helpful. The error occurs
because an IPv6 address is given in backup mode.

The error is in make_receive_sock() in net/netfilter/ipvs/ip_vs_sync.c,
since it fails to set the interface on the address or the socket before
calling inet6_bind() (via sock-&gt;ops-&gt;bind), where the test
'if (!sk-&gt;sk_bound_dev_if)' failed.

Setting sock-&gt;sk-&gt;sk_bound_dev_if on the socket before calling
inet6_bind() resolves the issue.

Fixes: d33288172e72 ("ipvs: add more mcast parameters for the sync daemon")
Signed-off-by: Quentin Armitage &lt;quentin@armitage.org.uk&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
</content>
</entry>
<entry>
<title>ipvs: update real-server binding of outgoing connections in SIP-pe</title>
<updated>2016-06-06T00:47:25Z</updated>
<author>
<name>Marco Angaroni</name>
<email>marcoangaroni@gmail.com</email>
</author>
<published>2016-05-16T17:18:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3ec10d3a2ba591c87da94219c1e46b02ae97757a'/>
<id>urn:sha1:3ec10d3a2ba591c87da94219c1e46b02ae97757a</id>
<content type='text'>
Previous patch that introduced handling of outgoing packets in SIP
persistent-engine did not call ip_vs_check_template() in case packet was
matching a connection template. Assumption was that real-server was
healthy, since it was sending a packet just in that moment.

There are however real-server fault conditions requiring that association
between call-id and real-server (represented by connection template)
gets updated. Here is an example of the sequence of events:
  1) RS1 is a back2back user agent that handled call-id1 and call-id2
  2) RS1 is down and was marked as unavailable
  3) new message from outside comes to IPVS with call-id1
  4) IPVS reschedules the message to RS2, which becomes new call handler
  5) RS2 forwards the message outside, translating call-id1 to call-id2
  6) inside pe-&gt;conn_out() IPVS matches call-id2 with existing template
  7) IPVS does not change association call-id2 &lt;-&gt; RS1
  8) new message comes from client with call-id2
  9) IPVS reschedules the message to a real-server potentially different
     from RS2, which is now the correct destination

This patch introduces ip_vs_check_template() call in the handling of
outgoing packets for SIP-pe. And also introduces a second optional
argument for ip_vs_check_template() that allows to check if dest
associated to a connection template is the same dest that was identified
as the source of the packet. This is to change the real-server bound to a
particular call-id independently from its availability status: the idea
is that it's more reliable, for in-&gt;out direction (where internal
network can be considered trusted), to always associate a call-id with
the last real-server that used it in one of its messages. Think about
above sequence of events where, just after step 5, RS1 returns instead
to be available.

Comparison of dests is done by simply comparing pointers to struct
ip_vs_dest; there should be no cases where struct ip_vs_dest keeps its
memory address, but represent a different real-server in terms of
ip-address / port.

Fixes: 39b972231536 ("ipvs: handle connections started by real-servers")
Signed-off-by: Marco Angaroni &lt;marcoangaroni@gmail.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
</content>
</entry>
<entry>
<title>net: define gso types for IPx over IPv4 and IPv6</title>
<updated>2016-05-20T22:03:15Z</updated>
<author>
<name>Tom Herbert</name>
<email>tom@herbertland.com</email>
</author>
<published>2016-05-18T16:06:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7e13318daa4a67bff2f800923a993ef3818b3c53'/>
<id>urn:sha1:7e13318daa4a67bff2f800923a993ef3818b3c53</id>
<content type='text'>
This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
NETIF_F_GSO_IPXIP6. These are used to described IP in IP
tunnel and what the outer protocol is. The inner protocol
can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
are removed (these are both instances of SKB_GSO_IPXIP4).
SKB_GSO_IPXIP6 will be used when support for GSO with IP
encapsulation over IPv6 is added.

Signed-off-by: Tom Herbert &lt;tom@herbertland.com&gt;
Acked-by: Jeff Kirsher &lt;jeffrey.t.kirsher@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next</title>
<updated>2016-05-09T19:02:58Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-05-09T19:02:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e8ed77dfa90dd79c5343415a4bbbfdab9787b35a'/>
<id>urn:sha1:e8ed77dfa90dd79c5343415a4bbbfdab9787b35a</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following large patchset contains Netfilter updates for your
net-next tree. My initial intention was to send you this in two goes but
when I looked back twice I already had this burden on top of me.

Several updates for IPVS from Marco Angaroni:

1) Allow SIP connections originating from real-servers to be load
   balanced by the SIP persistence engine as is already implemented
   in the other direction.

2) Release connections immediately for One-packet-scheduling (OPS)
   in IPVS, instead of making it via timer and rcu callback.

3) Skip deleting conntracks for each one packet in OPS, and don't call
   nf_conntrack_alter_reply() since no reply is expected.

4) Enable drop on exhaustion for OPS + SIP persistence.

Miscelaneous conntrack updates from Florian Westphal, including fix for
hash resize:

5) Move conntrack generation counter out of conntrack pernet structure
   since this is only used by the init_ns to allow hash resizing.

6) Use get_random_once() from packet path to collect hash random seed
    instead of our compound.

7) Don't disable BH from ____nf_conntrack_find() for statistics,
   use NF_CT_STAT_INC_ATOMIC() instead.

8) Fix lookup race during conntrack hash resizing.

9) Introduce clash resolution on conntrack insertion for connectionless
   protocol.

Then, Florian's netns rework to get rid of per-netns conntrack table,
thus we use one single table for them all. There was consensus on this
change during the NFWS 2015 and, on top of that, it has recently been
pointed as a source of multiple problems from unpriviledged netns:

11) Use a single conntrack hashtable for all namespaces. Include netns
    in object comparisons and make it part of the hash calculation.
    Adapt early_drop() to consider netns.

12) Use single expectation and NAT hashtable for all namespaces.

13) Use a single slab cache for all namespaces for conntrack objects.

14) Skip full table scanning from nf_ct_iterate_cleanup() if the pernet
    conntrack counter tells us the table is empty (ie. equals zero).

Fixes for nf_tables interval set element handling, support to set
conntrack connlabels and allow set names up to 32 bytes.

15) Parse element flags from element deletion path and pass it up to the
    backend set implementation.

16) Allow adjacent intervals in the rbtree set type for dynamic interval
    updates.

17) Add support to set connlabel from nf_tables, from Florian Westphal.

18) Allow set names up to 32 bytes in nf_tables.

Several x_tables fixes and updates:

19) Fix incorrect use of IS_ERR_VALUE() in x_tables, original patch
    from Andrzej Hajda.

And finally, miscelaneous netfilter updates such as:

20) Disable automatic helper assignment by default. Note this proc knob
    was introduced by a9006892643a ("netfilter: nf_ct_helper: allow to
    disable automatic helper assignment") 4 years ago to start moving
    towards explicit conntrack helper configuration via iptables CT
    target.

21) Get rid of obsolete and inconsistent debugging instrumentation
    in x_tables.

22) Remove unnecessary check for null after ip6_route_output().
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
