<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/dst.h, branch v4.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-04-25T20:20:09Z</updated>
<entry>
<title>route: move lwtunnel state to a single place</title>
<updated>2016-04-25T20:20:09Z</updated>
<author>
<name>Jiri Benc</name>
<email>jbenc@redhat.com</email>
</author>
<published>2016-04-22T10:40:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0868e2538e45a9ed68e2b14adc42b020a36aae1d'/>
<id>urn:sha1:0868e2538e45a9ed68e2b14adc42b020a36aae1d</id>
<content type='text'>
Commit 751a587ac9f9 ("route: fix breakage after moving lwtunnel state")
moved lwtstate to the end of dst_entry for 32bit archs. This makes it share
the cacheline with __refcnt which had an unkown effect on performance. For
this reason, the pointer was kept in place for 64bit archs.

However, later performance measurements showed this is of no concern. It
turns out that every performance sensitive path that accesses lwtstate
accesses also struct rtable or struct rt6_info which share the same cache
line.

Thus, to get rid of a few #ifdefs, move the field to the end of the struct
also for 64bit.

Signed-off-by: Jiri Benc &lt;jbenc@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf, dst: add and use dst_tclassid helper</title>
<updated>2016-03-18T23:38:46Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-03-16T00:42:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=808c1b697c3c4dd2a7132882424c390b0d0acfb9'/>
<id>urn:sha1:808c1b697c3c4dd2a7132882424c390b0d0acfb9</id>
<content type='text'>
We can just add a small helper dst_tclassid() for retrieving the
dst-&gt;tclassid value. It makes the code a bit better in that we can
get rid of the ifdef from filter.c by moving this into the header.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: fix IP early demux races</title>
<updated>2015-12-15T04:52:00Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-12-14T22:08:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5037e9ef9454917b047f9f3a19b4dd179fbf7cd4'/>
<id>urn:sha1:5037e9ef9454917b047f9f3a19b4dd179fbf7cd4</id>
<content type='text'>
David Wilder reported crashes caused by dst reuse.

&lt;quote David&gt;
  I am seeing a crash on a distro V4.2.3 kernel caused by a double
  release of a dst_entry.  In ipv4_dst_destroy() the call to
  list_empty() finds a poisoned next pointer, indicating the dst_entry
  has already been removed from the list and freed. The crash occurs
  18 to 24 hours into a run of a network stress exerciser.
&lt;/quote&gt;

Thanks to his detailed report and analysis, we were able to understand
the core issue.

IP early demux can associate a dst to skb, after a lookup in TCP/UDP
sockets.

When socket cache is not properly set, we want to store into
sk-&gt;sk_dst_cache the dst for future IP early demux lookups,
by acquiring a stable refcount on the dst.

Problem is this acquisition is simply using an atomic_inc(),
which works well, unless the dst was queued for destruction from
dst_release() noticing dst refcount went to zero, if DST_NOCACHE
was set on dst.

We need to make sure current refcount is not zero before incrementing
it, or risk double free as David reported.

This patch, being a stable candidate, adds two new helpers, and use
them only from IP early demux problematic paths.

It might be possible to merge in net-next skb_dst_force() and
skb_dst_force_safe(), but I prefer having the smallest patch for stable
kernels : Maybe some skb_dst_force() callers do not expect skb-&gt;dst
can suddenly be cleared.

Can probably be backported back to linux-3.6 kernels

Reported-by: David J. Wilder &lt;dwilder@us.ibm.com&gt;
Tested-by: David J. Wilder &lt;dwilder@us.ibm.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>dst: Pass net into dst-&gt;output</title>
<updated>2015-10-08T11:27:03Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-10-07T21:48:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ede2059dbaf9c6557a49d466c8c7778343b208ff'/>
<id>urn:sha1:ede2059dbaf9c6557a49d466c8c7778343b208ff</id>
<content type='text'>
The network namespace is already passed into dst_output pass it into
dst-&gt;output lwt-&gt;output and friends.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Pass net into dst_output and remove dst_output_okfn</title>
<updated>2015-10-08T11:26:54Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-10-07T21:48:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=13206b6bff3b15b724926a222406476bf2c23c40'/>
<id>urn:sha1:13206b6bff3b15b724926a222406476bf2c23c40</id>
<content type='text'>
Replace dst_output_okfn with dst_output

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: constify ip_route_output_flow() socket argument</title>
<updated>2015-09-25T20:00:37Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-09-25T14:39:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6f9c961546699ff8bc5e1c1c52200616867ec68a'/>
<id>urn:sha1:6f9c961546699ff8bc5e1c1c52200616867ec68a</id>
<content type='text'>
Very soon, TCP stack might call inet_csk_route_req(), which
calls inet_csk_route_req() with an unlocked listener socket,
so we need to make sure ip_route_output_flow() is not trying to
change any field from its socket argument.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netfilter: Pass net into okfn</title>
<updated>2015-09-18T00:18:37Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-09-16T01:04:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0c4b51f0054ce85c0ec578ab818f0631834573eb'/>
<id>urn:sha1:0c4b51f0054ce85c0ec578ab818f0631834573eb</id>
<content type='text'>
This is immediately motivated by the bridge code that chains functions that
call into netfilter.  Without passing net into the okfns the bridge code would
need to guess about the best expression for the network namespace to process
packets in.

As net is frequently one of the first things computed in continuation functions
after netfilter has done it's job passing in the desired network namespace is in
many cases a code simplification.

To support this change the function dst_output_okfn is introduced to
simplify passing dst_output as an okfn.  For the moment dst_output_okfn
just silently drops the struct net.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Merge dst_output and dst_output_sk</title>
<updated>2015-09-18T00:18:32Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-09-16T01:03:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5a70649e0dae02ba5090540fffce667d2300bc5a'/>
<id>urn:sha1:5a70649e0dae02ba5090540fffce667d2300bc5a</id>
<content type='text'>
Add a sock paramter to dst_output making dst_output_sk superfluous.
Add a skb-&gt;sk parameter to all of the callers of dst_output
Have the callers of dst_output_sk call dst_output.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: use dctcp if enabled on the route to the initiator</title>
<updated>2015-08-31T19:34:00Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-08-31T13:58:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c3a8d9474684d391b0afc3970d9b249add15ec07'/>
<id>urn:sha1:c3a8d9474684d391b0afc3970d9b249add15ec07</id>
<content type='text'>
Currently, the following case doesn't use DCTCP, even if it should:
A responder has f.e. Cubic as system wide default, but for a specific
route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
initiating host then uses DCTCP as congestion control, but since the
initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
and we have to fall back to Reno after 3WHS completes.

We were thinking on how to solve this in a minimal, non-intrusive
way without bloating tcp_ecn_create_request() needlessly: lets cache
the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
to only do a single metric feature lookup inside tcp_ecn_create_request().

Joint work with Florian Westphal.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>dst: Add __skb_dst_copy() variation</title>
<updated>2015-08-27T18:40:42Z</updated>
<author>
<name>Joe Stringer</name>
<email>joestringer@nicira.com</email>
</author>
<published>2015-08-26T18:31:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e79e259588a414589a016edc428ee8dd308f81ad'/>
<id>urn:sha1:e79e259588a414589a016edc428ee8dd308f81ad</id>
<content type='text'>
This variation on skb_dst_copy() doesn't require two skbs.

Signed-off-by: Joe Stringer &lt;joestringer@nicira.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
