<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4/ip_forward.c, branch v4.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.1</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-04-20T18:07:33Z</updated>
<entry>
<title>ip_forward: Drop frames with attached skb-&gt;sk</title>
<updated>2015-04-20T18:07:33Z</updated>
<author>
<name>Sebastian Pöhn</name>
<email>sebastian.poehn@gmail.com</email>
</author>
<published>2015-04-20T07:19:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2ab957492d13bb819400ac29ae55911d50a82a13'/>
<id>urn:sha1:2ab957492d13bb819400ac29ae55911d50a82a13</id>
<content type='text'>
Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.

This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

v2:
Remove useless comment

Signed-off-by: Sebastian Poehn &lt;sebastian.poehn@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netfilter: Pass socket pointer down through okfn().</title>
<updated>2015-04-07T19:25:55Z</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-04-06T02:19:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7026b1ddb6b8d4e6ee33dc2bd06c0ca8746fa7ab'/>
<id>urn:sha1:7026b1ddb6b8d4e6ee33dc2bd06c0ca8746fa7ab</id>
<content type='text'>
On the output paths in particular, we have to sometimes deal with two
socket contexts.  First, and usually skb-&gt;sk, is the local socket that
generated the frame.

And second, is potentially the socket used to control a tunneling
socket, such as one the encapsulates using UDP.

We do not want to disassociate skb-&gt;sk when encapsulating in order
to fix this, because that would break socket memory accounting.

The most extreme case where this can cause huge problems is an
AF_PACKET socket transmitting over a vxlan device.  We hit code
paths doing checks that assume they are dealing with an ipv4
socket, but are actually operating upon the AF_PACKET one.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>xps: must clear sender_cpu before forwarding</title>
<updated>2015-03-12T03:51:18Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-03-12T01:42:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c29390c6dfeee0944ac6b5610ebbe403944378fc'/>
<id>urn:sha1:c29390c6dfeee0944ac6b5610ebbe403944378fc</id>
<content type='text'>
John reported that my previous commit added a regression
on his router.

This is because sender_cpu &amp; napi_id share a common location,
so get_xps_queue() can see garbage and perform an out of bound access.

We need to make sure sender_cpu is cleared before doing the transmit,
otherwise any NIC busy poll enabled (skb_mark_napi_id()) can trigger
this bug.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: John &lt;jw@nuclearfallout.net&gt;
Bisected-by: John &lt;jw@nuclearfallout.net&gt;
Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: try to cache dst_entries which would cause a redirect</title>
<updated>2015-01-27T01:28:27Z</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2015-01-23T11:01:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df4d92549f23e1c037e83323aff58a21b3de7fe0'/>
<id>urn:sha1:df4d92549f23e1c037e83323aff58a21b3de7fe0</id>
<content type='text'>
Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f88649721268999 ("ipv4: fix dst race in sk_dst_get()").

Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU.  Unfortunately waiting for
RCU grace period just takes too long, we can end up with &gt;1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.

Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.

This issue was discovered by Marcelo Leitner.

Cc: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Marcelo Leitner &lt;mleitner@redhat.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: rename local_df to ignore_df</title>
<updated>2014-05-12T18:03:41Z</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2014-05-04T23:39:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=60ff746739bf805a912484643c720b6124826140'/>
<id>urn:sha1:60ff746739bf805a912484643c720b6124826140</id>
<content type='text'>
As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".

Cc: Maciej Żenczykowski &lt;maze@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ip: push gso skb forwarding handling down the stack</title>
<updated>2014-05-07T19:49:07Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2014-05-05T13:00:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c7ba65d7b64984ff371cb5630b36af23506c50d5'/>
<id>urn:sha1:c7ba65d7b64984ff371cb5630b36af23506c50d5</id>
<content type='text'>
Doing the segmentation in the forward path has one major drawback:

When using virtio, we may process gso udp packets coming
from host network stack.  In that case, netfilter POSTROUTING
will see one packet with udp header followed by multiple ip
fragments.

Delay the segmentation and do it after POSTROUTING invocation
to avoid this.

Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ipv4: ip_forward: fix inverted local_df test</title>
<updated>2014-05-07T19:26:09Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2014-05-04T21:24:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ca6c5d4ad216d5942ae544bbf02503041bd802aa'/>
<id>urn:sha1:ca6c5d4ad216d5942ae544bbf02503041bd802aa</id>
<content type='text'>
local_df means 'ignore DF bit if set', so if its set we're
allowed to perform ip fragmentation.

This wasn't noticed earlier because the output path also drops such skbs
(and emits needed icmp error) and because netfilter ip defrag did not
set local_df until couple of days ago.

Only difference is that DF-packets-larger-than MTU now discarded
earlier (f.e. we avoid pointless netfilter postrouting trip).

While at it, drop the repeated test ip_exceeds_mtu, checking it once
is enough...

Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2014-02-19T06:24:22Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2014-02-19T06:24:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1e8d6421cff2c24fe0b345711e7a21af02e8bcf5'/>
<id>urn:sha1:1e8d6421cff2c24fe0b345711e7a21af02e8bcf5</id>
<content type='text'>
Conflicts:
	drivers/net/bonding/bond_3ad.h
	drivers/net/bonding/bond_main.c

Two minor conflicts in bonding, both of which were overlapping
changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: ip_forward: perform skb-&gt;pkt_type check at the beginning</title>
<updated>2014-02-13T23:35:56Z</updated>
<author>
<name>Denis Kirjanov</name>
<email>kda@linux-powerpc.org</email>
</author>
<published>2014-02-13T04:58:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d4f2fa6ad61ec1db713569a179183df4d0fc6ae7'/>
<id>urn:sha1:d4f2fa6ad61ec1db713569a179183df4d0fc6ae7</id>
<content type='text'>
Packets which have L2 address different from ours should be
already filtered before entering into ip_forward().

Perform that check at the beginning to avoid processing such packets.

Signed-off-by: Denis Kirjanov &lt;kda@linux-powerpc.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ip, ipv6: handle gso skbs in forwarding path</title>
<updated>2014-02-13T22:17:02Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2014-02-13T22:09:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fe6cc55f3a9a053482a76f5a6b2257cee51b4663'/>
<id>urn:sha1:fe6cc55f3a9a053482a76f5a6b2257cee51b4663</id>
<content type='text'>
Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.

Given:
Host &lt;mtu1500&gt; R1 &lt;mtu1200&gt; R2

Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.

In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.

When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.

This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.

For ipv6, we send out pkt too big error for gso if the individual
segments are too big.

For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.

Eric Dumazet suggested to simply shrink mss via -&gt;gso_size to avoid
sofware segmentation.

However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there.  I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.

Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.

This uses sofware segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small.  Its not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway.  Also its not like this could not be improved later
once the dust settles.

Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Reported-by: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
