<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/ip_fib.h, branch v3.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2012-10-04T17:58:26Z</updated>
<entry>
<title>ipv4: add a fib_type to fib_info</title>
<updated>2012-10-04T17:58:26Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-10-04T01:25:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f4ef85bbda96324785097356336bc79cdd37db0a'/>
<id>urn:sha1:f4ef85bbda96324785097356336bc79cdd37db0a</id>
<content type='text'>
commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.

This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.

David suggested adding fib_type to fib_info so that we don't
inadvertently share the same fib_info for different purposes.

With help from Julian Anastasov, who provided very helpful
hints, reproduced here:

&lt;quote&gt;
        Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:

broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src
192.168.0.1
192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1

        The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.

        And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.

&lt;/quote&gt;
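
As a rough illustration of the sharing problem described in the quote,
here is a simplified Python model. It is not kernel code: the key fields
and the helper names fib_info_key() and share_fib_info() are assumptions
made for this sketch, and the real fib_info comparison covers more fields.

```python
# Simplified model, not kernel code. The key fields and helper names are
# assumptions chosen to mirror the three routes listed in the quote.

def fib_info_key(route, include_type):
    """Build the tuple used to decide whether two routes share a fib_info."""
    key = (route["dev"], route["proto"], route["scope"], route["src"])
    if include_type:
        key += (route["type"],)
    return key


def share_fib_info(routes, include_type):
    """Group route destinations by the fib_info they would end up sharing."""
    shared = {}
    for route in routes:
        shared.setdefault(fib_info_key(route, include_type), []).append(route["dst"])
    return shared


COMMON = {"dev": "DEV", "proto": "kernel", "src": "192.168.0.1"}
ROUTES = [
    dict(COMMON, type="broadcast", dst="192.168.0.255", scope="link"),
    dict(COMMON, type="unicast", dst="192.168.0.0/24", scope="link"),
    dict(COMMON, type="local", dst="192.168.0.1", scope="host"),
]
```

In this model, share_fib_info(ROUTES, False) puts the broadcast and /24
routes in one group, while share_fib_info(ROUTES, True) keeps all three
routes apart.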

Many thanks to Chris Clayton for his patience and help.

Reported-by: Chris Clayton &lt;chris2553@googlemail.com&gt;
Bisected-by: Chris Clayton &lt;chris2553@googlemail.com&gt;
Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Julian Anastasov &lt;ja@ssi.bg&gt;
Tested-by: Chris Clayton &lt;chris2553@googlemail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Cache routes in nexthop exception entries.</title>
<updated>2012-07-31T22:02:02Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-31T22:02:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c5038a8327b980a5b279fa193163c468011de009'/>
<id>urn:sha1:c5038a8327b980a5b279fa193163c468011de009</id>
<content type='text'>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: percpu nh_rth_output cache</title>
<updated>2012-07-31T21:41:39Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-31T05:45:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d26b3a7c4b3b26319f18bb645de93eba8f4bdcd5'/>
<id>urn:sha1:d26b3a7c4b3b26319f18bb645de93eba8f4bdcd5</id>
<content type='text'>
Input path is mostly run under RCU and doesn't touch dst refcnt.

But the output path on forwarding or UDP workloads hits the dst
refcount badly, and we have a lot of false sharing, for example
in ipv4_mtu() when reading rt-&gt;rt_pmtu.

Using a percpu cache for nh_rth_output gives a nice performance
increase at a small cost.

24 udpflood test on my 24 cpu machine (dummy0 output device)
(each process sends 1.000.000 udp frames, 24 processes are started)

before : 5.24 s
after : 2.06 s
For reference, time on linux-3.5 : 6.60 s

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Tested-by: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Restore old dst_free() behavior.</title>
<updated>2012-07-31T21:41:38Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-07-31T01:08:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=54764bb647b2e847c512acf8d443df965da35000'/>
<id>urn:sha1:54764bb647b2e847c512acf8d443df965da35000</id>
<content type='text'>
commit 404e0a8b6a55 (net: ipv4: fix RCU races on dst refcounts) tried
to solve a race but added a problem at device/fib dismantle time :

We really want to call dst_free() as soon as possible, even if sockets
still have dst in their cache.
dst_release() calls in free_fib_info_rcu() are not welcome.

The root of the problem was that, now that we also cache output routes
(in nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
rt_free(), because output route lookups are done in process context.

Based on feedback and initial patch from David Miller (adding another
call_rcu_bh() call in fib, but it appears it was not the right fix)

I left the inet_sk_rx_dst_set() helper and added __rcu attributes
to nh_rth_output and nh_rth_input to better document what is going on in
this code.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Cache input routes in fib_info nexthops.</title>
<updated>2012-07-20T20:36:40Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-17T19:58:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5'/>
<id>urn:sha1:d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5</id>
<content type='text'>
Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions.  (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).

However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Cache output routes in fib_info nexthops.</title>
<updated>2012-07-20T20:36:16Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-17T19:20:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f2bb4bedf35d5167a073dcdddf16543f351ef3ae'/>
<id>urn:sha1:f2bb4bedf35d5167a073dcdddf16543f351ef3ae</id>
<content type='text'>
If we have an output route that lacks nexthop exceptions, we can cache
it in the FIB info nexthop.

Such routes will have DST_HOST cleared because such routes refer to a
family of destinations, rather than just one.

The sequence of the handling of exceptions during route lookup is
adjusted to make the logic work properly.

Before we allocate the route, we look up the exception.

Then we know if we will cache this route or not, and therefore whether
DST_HOST should be set on the allocated route.

Then we use DST_HOST to key off whether we should store the resulting
route, during rt_set_nexthop(), in the FIB nexthop cache.
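
The ordering above can be sketched in Python. This is a toy model under
stated assumptions, not the kernel implementation: the dictionaries and
the names make_nexthop() and lookup_output_route() are hypothetical, and
an exception is reduced to a learned PMTU value.

```python
# Toy model of the lookup ordering; all names here are hypothetical.

def make_nexthop():
    """A nexthop with per-destination exceptions and one cached route slot."""
    return {"exceptions": {}, "cached": None}


def lookup_output_route(nexthop, daddr):
    # Step 1: look up the exception before allocating anything.
    exception = nexthop["exceptions"].get(daddr)

    # No exception: an already cached shared route can be returned as is.
    if exception is None and nexthop["cached"] is not None:
        return nexthop["cached"]

    # Step 2: allocate the route. DST_HOST is set only when the route is
    # specific to one destination, i.e. when an exception exists.
    route = {"dst_host": exception is not None, "pmtu": exception}

    # Step 3: key off DST_HOST to decide whether to store the route in the
    # nexthop cache. Only shared (non-host) routes are stored.
    if not route["dst_host"]:
        nexthop["cached"] = route
    return route
```

Two lookups for destinations without exceptions return the same cached
object, while a destination with an exception gets its own route and
leaves the cache untouched.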

With help from Eric Dumazet.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: use seqlock for nh_exceptions</title>
<updated>2012-07-19T17:30:14Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2012-07-18T10:15:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aee06da6726d4981c51928c2d6d1e2cabeec7a10'/>
<id>urn:sha1:aee06da6726d4981c51928c2d6d1e2cabeec7a10</id>
<content type='text'>
Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller

Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Add FIB nexthop exceptions.</title>
<updated>2012-07-17T15:48:50Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-17T11:19:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4895c771c7f006b4b90f9d6b1d2210939ba57b38'/>
<id>urn:sha1:4895c771c7f006b4b90f9d6b1d2210939ba57b38</id>
<content type='text'>
In a regime where we have subnetted route entries, we need
persistent storage for destination-specific learned values
such as redirects and PMTU values.

This is implemented here via nexthop exceptions.

The initial implementation is a 2048 entry hash table with reclaiming
starting at chain length 5.  A more sophisticated scheme can be
devised if that proves necessary.
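
A minimal Python sketch of such a table follows. It is an illustration
only: the class name ExceptionTable, the entry fields, and the choice to
reuse the oldest entry once a chain holds five entries are assumptions
about one reasonable reading of "reclaiming starting at chain length 5",
not a transcription of the kernel code.

```python
# Illustrative model of a fixed-size exception hash table with reclaiming.
# Names and the exact reclaim policy are assumptions for this sketch.

HASH_SIZE = 2048      # number of hash buckets
RECLAIM_DEPTH = 5     # chain length at which entries are reused


class ExceptionTable:
    def __init__(self):
        self.buckets = [[] for _ in range(HASH_SIZE)]
        self.clock = 0  # monotonically increasing stamp, stands in for time

    def update(self, daddr, pmtu):
        """Record a learned PMTU for daddr, reusing the oldest entry if full."""
        self.clock += 1
        chain = self.buckets[hash(daddr) % HASH_SIZE]

        # Existing entry for this destination: refresh it in place.
        for entry in chain:
            if entry["daddr"] == daddr:
                entry["pmtu"] = pmtu
                entry["stamp"] = self.clock
                return entry

        # Chain is at the reclaim depth: overwrite the oldest entry.
        if len(chain) == RECLAIM_DEPTH:
            entry = min(chain, key=lambda e: e["stamp"])
            entry.update(daddr=daddr, pmtu=pmtu, stamp=self.clock)
            return entry

        # Otherwise grow the chain with a fresh entry.
        entry = {"daddr": daddr, "pmtu": pmtu, "stamp": self.clock}
        chain.append(entry)
        return entry
```

Bounding the chain length keeps lookups cheap and memory use fixed, at
the cost of forgetting the least recently updated destination.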

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Don't store a rule pointer in fib_result.</title>
<updated>2012-07-13T15:21:29Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-13T15:21:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=85b91b0339e764f7e56ff5968fa10d85451378b4'/>
<id>urn:sha1:85b91b0339e764f7e56ff5968fa10d85451378b4</id>
<content type='text'>
We only use it to fetch the rule's tclassid, so just store the
tclassid there instead.

This also decreases the size of fib_result by a full 8 bytes on
64-bit.  On 32-bit it's a wash.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Remove tb_peers from fib_table.</title>
<updated>2012-07-12T16:39:28Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2012-07-12T16:39:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=391e5c22f5f4e55817f8ba18a08ea717ed2d4a1f'/>
<id>urn:sha1:391e5c22f5f4e55817f8ba18a08ea717ed2d4a1f</id>
<content type='text'>
No longer used.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
