<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/dst.h, branch v4.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-12-15T04:52:00Z</updated>
<entry>
<title>net: fix IP early demux races</title>
<updated>2015-12-15T04:52:00Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-12-14T22:08:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5037e9ef9454917b047f9f3a19b4dd179fbf7cd4'/>
<id>urn:sha1:5037e9ef9454917b047f9f3a19b4dd179fbf7cd4</id>
<content type='text'>
David Wilder reported crashes caused by dst reuse.

&lt;quote David&gt;
  I am seeing a crash on a distro V4.2.3 kernel caused by a double
  release of a dst_entry.  In ipv4_dst_destroy() the call to
  list_empty() finds a poisoned next pointer, indicating the dst_entry
  has already been removed from the list and freed. The crash occurs
  18 to 24 hours into a run of a network stress exerciser.
&lt;/quote&gt;

Thanks to his detailed report and analysis, we were able to understand
the core issue.

IP early demux can associate a dst to skb, after a lookup in TCP/UDP
sockets.

When socket cache is not properly set, we want to store into
sk-&gt;sk_dst_cache the dst for future IP early demux lookups,
by acquiring a stable refcount on the dst.

Problem is this acquisition is simply using an atomic_inc(),
which works well, unless the dst was queued for destruction from
dst_release() noticing dst refcount went to zero, if DST_NOCACHE
was set on dst.

We need to make sure current refcount is not zero before incrementing
it, or risk double free as David reported.

This patch, being a stable candidate, adds two new helpers, and use
them only from IP early demux problematic paths.

It might be possible to merge in net-next skb_dst_force() and
skb_dst_force_safe(), but I prefer having the smallest patch for stable
kernels : Maybe some skb_dst_force() callers do not expect skb-&gt;dst
can suddenly be cleared.

Can probably be backported back to linux-3.6 kernels

Reported-by: David J. Wilder &lt;dwilder@us.ibm.com&gt;
Tested-by: David J. Wilder &lt;dwilder@us.ibm.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>dst: Pass net into dst-&gt;output</title>
<updated>2015-10-08T11:27:03Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-10-07T21:48:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ede2059dbaf9c6557a49d466c8c7778343b208ff'/>
<id>urn:sha1:ede2059dbaf9c6557a49d466c8c7778343b208ff</id>
<content type='text'>
The network namespace is already passed into dst_output pass it into
dst-&gt;output lwt-&gt;output and friends.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Pass net into dst_output and remove dst_output_okfn</title>
<updated>2015-10-08T11:26:54Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-10-07T21:48:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=13206b6bff3b15b724926a222406476bf2c23c40'/>
<id>urn:sha1:13206b6bff3b15b724926a222406476bf2c23c40</id>
<content type='text'>
Replace dst_output_okfn with dst_output

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: constify ip_route_output_flow() socket argument</title>
<updated>2015-09-25T20:00:37Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-09-25T14:39:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6f9c961546699ff8bc5e1c1c52200616867ec68a'/>
<id>urn:sha1:6f9c961546699ff8bc5e1c1c52200616867ec68a</id>
<content type='text'>
Very soon, TCP stack might call inet_csk_route_req(), which
calls inet_csk_route_req() with an unlocked listener socket,
so we need to make sure ip_route_output_flow() is not trying to
change any field from its socket argument.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netfilter: Pass net into okfn</title>
<updated>2015-09-18T00:18:37Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-09-16T01:04:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0c4b51f0054ce85c0ec578ab818f0631834573eb'/>
<id>urn:sha1:0c4b51f0054ce85c0ec578ab818f0631834573eb</id>
<content type='text'>
This is immediately motivated by the bridge code that chains functions that
call into netfilter.  Without passing net into the okfns the bridge code would
need to guess about the best expression for the network namespace to process
packets in.

As net is frequently one of the first things computed in continuation functions
after netfilter has done it's job passing in the desired network namespace is in
many cases a code simplification.

To support this change the function dst_output_okfn is introduced to
simplify passing dst_output as an okfn.  For the moment dst_output_okfn
just silently drops the struct net.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Merge dst_output and dst_output_sk</title>
<updated>2015-09-18T00:18:32Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-09-16T01:03:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5a70649e0dae02ba5090540fffce667d2300bc5a'/>
<id>urn:sha1:5a70649e0dae02ba5090540fffce667d2300bc5a</id>
<content type='text'>
Add a sock paramter to dst_output making dst_output_sk superfluous.
Add a skb-&gt;sk parameter to all of the callers of dst_output
Have the callers of dst_output_sk call dst_output.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: use dctcp if enabled on the route to the initiator</title>
<updated>2015-08-31T19:34:00Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-08-31T13:58:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c3a8d9474684d391b0afc3970d9b249add15ec07'/>
<id>urn:sha1:c3a8d9474684d391b0afc3970d9b249add15ec07</id>
<content type='text'>
Currently, the following case doesn't use DCTCP, even if it should:
A responder has f.e. Cubic as system wide default, but for a specific
route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
initiating host then uses DCTCP as congestion control, but since the
initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
and we have to fall back to Reno after 3WHS completes.

We were thinking on how to solve this in a minimal, non-intrusive
way without bloating tcp_ecn_create_request() needlessly: lets cache
the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
to only do a single metric feature lookup inside tcp_ecn_create_request().

Joint work with Florian Westphal.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>dst: Add __skb_dst_copy() variation</title>
<updated>2015-08-27T18:40:42Z</updated>
<author>
<name>Joe Stringer</name>
<email>joestringer@nicira.com</email>
</author>
<published>2015-08-26T18:31:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e79e259588a414589a016edc428ee8dd308f81ad'/>
<id>urn:sha1:e79e259588a414589a016edc428ee8dd308f81ad</id>
<content type='text'>
This variation on skb_dst_copy() doesn't require two skbs.

Signed-off-by: Joe Stringer &lt;joestringer@nicira.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>route: fix breakage after moving lwtunnel state</title>
<updated>2015-08-23T23:51:17Z</updated>
<author>
<name>Jiri Benc</name>
<email>jbenc@redhat.com</email>
</author>
<published>2015-08-21T10:41:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=751a587ac9f9a8bf314590fbac32d9e418060c5a'/>
<id>urn:sha1:751a587ac9f9a8bf314590fbac32d9e418060c5a</id>
<content type='text'>
__recnt and related fields need to be in its own cacheline for performance
reasons. Commit 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
broke that on 32bit archs, causing BUILD_BUG_ON in dst_hold to be triggered.

This patch fixes the breakage by moving the lwtunnel state to the end of
dst_entry on 32bit archs. Unfortunately, this makes it share the cacheline
with __refcnt and may affect performance, thus further patches may be
needed.

Reported-by: kbuild test robot &lt;fengguang.wu@intel.com&gt;
Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
Signed-off-by: Jiri Benc &lt;jbenc@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>route: move lwtunnel state to dst_entry</title>
<updated>2015-08-20T22:42:36Z</updated>
<author>
<name>Jiri Benc</name>
<email>jbenc@redhat.com</email>
</author>
<published>2015-08-20T11:56:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=61adedf3e3f1d3f032c5a6a299978d91eff6d555'/>
<id>urn:sha1:61adedf3e3f1d3f032c5a6a299978d91eff6d555</id>
<content type='text'>
Currently, the lwtunnel state resides in per-protocol data. This is
a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa).
The xmit function of the tunnel does not know whether the packet has been
routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the
lwtstate data to dst_entry makes such inter-protocol tunneling possible.

As a bonus, this brings a nice diffstat.

Signed-off-by: Jiri Benc &lt;jbenc@redhat.com&gt;
Acked-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
