<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v3.19</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.19</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.19'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-02-02T07:06:19Z</updated>
<entry>
<title>ipv4: tcp: get rid of ugly unicast_sock</title>
<updated>2015-02-02T07:06:19Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-01-30T05:35:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bdbbb8527b6f6a358dbcb70dac247034d665b8e4'/>
<id>urn:sha1:bdbbb8527b6f6a358dbcb70dac247034d665b8e4</id>
<content type='text'>
In commit be9f4a44e7d41 ("ipv4: tcp: remove per net tcp_sock")
I tried to address contention on a socket lock, but the solution
I chose was horrible:

commit 3a7c384ffd57e ("ipv4: tcp: unicast_sock should not land outside
of TCP stack") addressed an SELinux regression.

commit 0980e56e506b ("ipv4: tcp: set unicast_sock uc_ttl to -1")
took care of another regression.

commit b5ec8eeac46 ("ipv4: fix ip_send_skb()") fixed another regression.

commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
was another shot in the dark.

Really, just use a proper per-cpu socket and remove the skb_orphan()
call to re-enable flow control.
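
A minimal sketch of the per-cpu approach, assuming a per-netns percpu
pointer named tcp_sk (abridged; error unwinding omitted):

	/* At netns init time, create one control socket per possible
	 * cpu instead of sharing a single static unicast_sock. */
	net-&gt;ipv4.tcp_sk = alloc_percpu(struct sock *);
	if (!net-&gt;ipv4.tcp_sk)
		return -ENOMEM;

	for_each_possible_cpu(cpu) {
		struct sock *sk;

		res = inet_ctl_sock_create(&amp;sk, PF_INET,
					   SOCK_RAW, IPPROTO_TCP, net);
		if (res)
			return res;
		*per_cpu_ptr(net-&gt;ipv4.tcp_sk, cpu) = sk;
	}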

This solves a serious problem with the FQ packet scheduler when used in
hostile environments, as we do not want to allocate a flow structure
for every RST packet sent in response to a spoofed packet.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Don't increase PMTU with Datagram Too Big message.</title>
<updated>2015-01-29T23:28:59Z</updated>
<author>
<name>Li Wei</name>
<email>lw@cn.fujitsu.com</email>
</author>
<published>2015-01-29T08:09:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3cdaa5be9e81a914e633a6be7b7d2ef75b528562'/>
<id>urn:sha1:3cdaa5be9e81a914e633a6be7b7d2ef75b528562</id>
<content type='text'>
RFC 1191 states, "a host MUST not increase its estimate of the Path
MTU in response to the contents of a Datagram Too Big message."
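
A hypothetical helper illustrating the rule (not the actual patch; the
real change enforces the same check in the route PMTU update path):

	static u32 pmtu_update(u32 current_pmtu, u32 reported_mtu)
	{
		/* A Datagram Too Big report may only lower the
		 * estimate, never raise it. */
		return reported_mtu &lt; current_pmtu ? reported_mtu
						   : current_pmtu;
	}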

Signed-off-by: Li Wei &lt;lw@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: ipv4: initialize unicast_sock sk_pacing_rate</title>
<updated>2015-01-29T07:24:47Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-01-28T13:47:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=811230cd853d62f09ed0addd0ce9a1b9b0e13fb5'/>
<id>urn:sha1:811230cd853d62f09ed0addd0ce9a1b9b0e13fb5</id>
<content type='text'>
When I added the sk_pacing_rate field, I forgot to initialize its value
in the per-cpu unicast_sock used in ip_send_unicast_reply().

This means that for sch_fq users, RST packets or ACK packets sent
on behalf of TIME_WAIT sockets might be sent too slowly or even dropped
once we reach the per-flow limit.
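
The fix is one line in the static per-cpu unicast_sock template,
roughly (abridged sketch, other fields omitted):

	static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
		.sk = {
			.sk_wmem_alloc	= ATOMIC_INIT(1),
			.sk_allocation	= GFP_ATOMIC,
			/* ~0U means "no pacing cap", so sch_fq will
			 * not throttle or drop these replies */
			.sk_pacing_rate	= ~0U,
		},
	};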

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: 95bd09eb2750 ("tcp: TSO packets automatic sizing")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix timing issue in CUBIC slope calculation</title>
<updated>2015-01-29T06:18:38Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2015-01-29T01:01:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d6b1a8a92a1417f8859a6937d2e6ffe2dfab4e6d'/>
<id>urn:sha1:d6b1a8a92a1417f8859a6937d2e6ffe2dfab4e6d</id>
<content type='text'>
This patch fixes a bug in CUBIC that causes cwnd to increase slightly
too slowly when multiple ACKs arrive in the same jiffy.

If cwnd is supposed to increase at a rate of more than once per jiffy,
then CUBIC was sometimes too slow. Because bic_target is calculated for
a future point in time, with time measured in jiffies, cwnd can increase
over the course of the jiffy while bic_target (the proper CUBIC cwnd at
time t=tcp_time_stamp+rtt) does not increase, since tcp_time_stamp only
advances on jiffy tick boundaries.

So since the cnt is set to:
	ca-&gt;cnt = cwnd / (bic_target - cwnd);
as cwnd increases but bic_target does not increase due to jiffy
granularity, the cnt becomes too large, causing cwnd to increase
too slowly.

For example:
- suppose at the beginning of a jiffy, cwnd=40, bic_target=44
- so CUBIC sets:
   ca-&gt;cnt =  cwnd / (bic_target - cwnd) = 40 / (44 - 40) = 40/4 = 10
- suppose we get 10 acks, each for 1 segment, so tcp_cong_avoid_ai()
   increases cwnd to 41
- so CUBIC sets:
   ca-&gt;cnt =  cwnd / (bic_target - cwnd) = 41 / (44 - 41) = 41 / 3 = 13

So now CUBIC will wait for 13 packets to be ACKed before increasing
cwnd to 42, instead of 10 as it should.

The fix is to avoid adjusting the slope (determined by ca-&gt;cnt)
multiple times within a jiffy, and instead skip ahead to the "TCP
friendliness" code path, which computes the Reno cwnd.
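
The guard then has roughly this shape in bictcp_update() (a sketch of
the idea, not the verbatim diff):

	/* Adjust ca-&gt;cnt (the slope) at most once per jiffy; on
	 * further ACKs within the same jiffy, keep the slope and
	 * fall through to the Reno "TCP friendliness" computation. */
	if (ca-&gt;epoch_start &amp;&amp; tcp_time_stamp == ca-&gt;last_time)
		goto tcp_friendliness;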

Reported-by: Eyal Perry &lt;eyalpe@mellanox.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix stretch ACK bugs in CUBIC</title>
<updated>2015-01-29T06:18:38Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2015-01-29T01:01:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9cd981dcf174d26805a032aefa791436da709bee'/>
<id>urn:sha1:9cd981dcf174d26805a032aefa791436da709bee</id>
<content type='text'>
Change CUBIC to properly handle stretch ACKs in additive increase mode
by passing in the count of ACKed packets to tcp_cong_avoid_ai().

In addition, because we now account precisely for stretch ACKs,
including delayed ACKs, we can remove the code that tracked and
estimated recent delayed ACK behavior in ca-&gt;delayed_ack.
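
The call-site change is small; roughly (sketch), CUBIC's additive
increase now forwards the real ACKed-packet count instead of a
constant 1:

	/* was: tcp_cong_avoid_ai(tp, ca-&gt;cnt, 1); */
	tcp_cong_avoid_ai(tp, ca-&gt;cnt, acked);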

Reported-by: Eyal Perry &lt;eyalpe@mellanox.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix stretch ACK bugs in Reno</title>
<updated>2015-01-29T06:18:38Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2015-01-29T01:01:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c22bdca94782f05b9337d8548bde51b2f38ef17f'/>
<id>urn:sha1:c22bdca94782f05b9337d8548bde51b2f38ef17f</id>
<content type='text'>
Change Reno to properly handle stretch ACKs in additive increase mode
by passing in the count of ACKed packets to tcp_cong_avoid_ai().

In addition, if snd_cwnd crosses snd_ssthresh during slow start
processing, and we then exit slow start mode, we need to carry over
any remaining "credit" for packets ACKed and apply that to additive
increase by passing this remaining "acked" count to
tcp_cong_avoid_ai().
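
With both changes, tcp_reno_cong_avoid() ends up looking roughly like
this (sketch; names as in net/ipv4/tcp_cong.c):

	void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 acked)
	{
		struct tcp_sock *tp = tcp_sk(sk);

		if (!tcp_is_cwnd_limited(sk))
			return;

		if (tp-&gt;snd_cwnd &lt;= tp-&gt;snd_ssthresh) {
			/* Slow start returns the leftover "acked"
			 * credit if cwnd crossed snd_ssthresh. */
			acked = tcp_slow_start(tp, acked);
			if (!acked)
				return;
		}
		/* Spend remaining credit in additive increase. */
		tcp_cong_avoid_ai(tp, tp-&gt;snd_cwnd, acked);
	}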

Reported-by: Eyal Perry &lt;eyalpe@mellanox.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix the timid additive increase on stretch ACKs</title>
<updated>2015-01-29T06:18:37Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2015-01-29T01:01:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=814d488c61260521b1b3cc97063700a5a6667c8f'/>
<id>urn:sha1:814d488c61260521b1b3cc97063700a5a6667c8f</id>
<content type='text'>
tcp_cong_avoid_ai() was too timid (snd_cwnd increased too slowly) on
"stretch ACKs" -- cases where the receiver ACKed more than 1 packet in
a single ACK. For example, suppose w is 10 and we get a stretch ACK
for 20 packets, so acked is 20. We ought to increase snd_cwnd by 2
(since acked/w = 20/10 = 2), but instead we were only increasing cwnd
by 1. This patch fixes that behavior.
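
The fixed tcp_cong_avoid_ai() therefore credits the full acked count
and can raise snd_cwnd by more than one per call, roughly (sketch;
the snd_cwnd_clamp capping is omitted):

	void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked)
	{
		tp-&gt;snd_cwnd_cnt += acked;
		if (tp-&gt;snd_cwnd_cnt &gt;= w) {
			u32 delta = tp-&gt;snd_cwnd_cnt / w;

			/* e.g. w=10, acked=20: delta=2 */
			tp-&gt;snd_cwnd_cnt -= delta * w;
			tp-&gt;snd_cwnd += delta;
		}
	}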

Reported-by: Eyal Perry &lt;eyalpe@mellanox.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: stretch ACK fixes prep</title>
<updated>2015-01-29T06:18:37Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2015-01-29T01:01:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e73ebb0881ea5534ce606c1d71b4ac44db5c6930'/>
<id>urn:sha1:e73ebb0881ea5534ce606c1d71b4ac44db5c6930</id>
<content type='text'>
LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that
cover more than the RFC-specified maximum of 2 packets. These stretch
ACKs can cause serious performance shortfalls in common congestion
control algorithms that were designed and tuned years ago with
receiver hosts that were not using LRO or GRO, and were instead
politely ACKing every other packet.

This patch series fixes Reno and CUBIC to handle stretch ACKs.

This patch prepares for the upcoming stretch ACK bug fix patches. It
adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future
fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and
changes all congestion control algorithms to pass in 1 for the ACKed
count. It also changes tcp_slow_start() to return the number of packet
ACK "credits" that were not consumed in slow start mode, so that they
can be processed by the congestion control module in additive increase
mode.
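
The tcp_slow_start() change then has roughly this shape (sketch):
consume ACK credit up to snd_ssthresh and hand back the rest.

	u32 tcp_slow_start(struct tcp_sock *tp, u32 acked)
	{
		u32 cwnd = tp-&gt;snd_cwnd + acked;

		if (cwnd &gt; tp-&gt;snd_ssthresh)
			cwnd = tp-&gt;snd_ssthresh + 1;
		/* Credit not consumed here is returned to the caller
		 * for use in additive increase mode. */
		acked -= cwnd - tp-&gt;snd_cwnd;
		tp-&gt;snd_cwnd = min(cwnd, tp-&gt;snd_cwnd_clamp);

		return acked;
	}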

In future patches we will fix tcp_cong_avoid_ai() to handle stretch
ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start
and additive increase mode.

Reported-by: Eyal Perry &lt;eyalpe@mellanox.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ping: Fix race in free in receive path</title>
<updated>2015-01-27T08:04:53Z</updated>
<author>
<name>subashab@codeaurora.org</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2015-01-23T22:26:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fc752f1f43c1c038a2c6ae58cc739ebb5953ccb0'/>
<id>urn:sha1:fc752f1f43c1c038a2c6ae58cc739ebb5953ccb0</id>
<content type='text'>
An exception is seen in the ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases the socket reference with sock_put(), which
internally frees the socket. Later, icmp_rcv() tries to free the skb;
as part of this the skb destructor is called, which leads to a kernel
panic because the socket was already freed in ping_rcv().

--&gt;|exception
-007|sk_mem_uncharge
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish

Fix this incorrect free by cloning the skb and processing the clone
instead.
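
The receive-path change then has roughly this shape in ping_rcv()
(sketch, not the verbatim diff):

	struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);

	/* Queue the clone: the socket destructor attaches to the
	 * clone, while the original skb that icmp_rcv() later frees
	 * no longer references the socket after sock_put(). */
	if (skb2)
		ping_queue_rcv_skb(sk, skb2);
	sock_put(sk);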

This patch was suggested by Eric Dumazet.

Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp_diag: Fix socket skipping within chain</title>
<updated>2015-01-27T08:02:41Z</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2015-01-23T21:02:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=86f3cddbc3037882414c7308973530167906b7e9'/>
<id>urn:sha1:86f3cddbc3037882414c7308973530167906b7e9</id>
<content type='text'>
While working on rhashtable walking I noticed that the UDP diag
dumping code is buggy.  In particular, the socket skipping within
a chain never happens, even though we record the number of sockets
that should be skipped.

As this code was supposedly copied from TCP, this patch does what
TCP does and resets num before we walk a chain.
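
The dump loop then counts per chain, roughly (abridged sketch; locking
and filtering omitted):

	s_slot = cb-&gt;args[0];
	num = s_num = cb-&gt;args[1];

	for (slot = s_slot; slot &lt;= table-&gt;mask; s_num = 0, slot++) {
		num = 0;	/* the fix: count from the chain head,
				 * so "num &lt; s_num" really skips */

		sk_nulls_for_each(sk, node, &amp;table-&gt;hash[slot].head) {
			if (num &lt; s_num)
				goto next;
			/* ... dump sk ... */
next:
			num++;
		}
	}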

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Acked-by: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
