<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v3.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-10-21T22:56:23Z</updated>
<entry>
<title>tcp: initialize passive-side sk_pacing_rate after 3WHS</title>
<updated>2013-10-21T22:56:23Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2013-10-21T19:40:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02cf4ebd82ff0ac7254b88e466820a290ed8289a'/>
<id>urn:sha1:02cf4ebd82ff0ac7254b88e466820a290ed8289a</id>
<content type='text'>
For passive TCP connections, upon receiving the ACK that completes the
3WHS, make sure we set our pacing rate after we get our first RTT
sample.

On passive TCP connections, when we receive the ACK completing the
3WHS we do not take an RTT sample in tcp_ack(), but rather in
tcp_synack_rtt_meas(). So upon receiving the ACK that completes the
3WHS, tcp_ack() leaves sk_pacing_rate at its initial value.

Originally the initial sk_pacing_rate value was 0, so passive-side
connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs
made in the first RTT. With a default initial cwnd of 10 packets, this
happened to be correct for RTTs 5ms or bigger, so it was hard to
see problems in WAN or emulated WAN testing.

Since 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing"), the
initial sk_pacing_rate is 0xffffffff. So after that change, passive
TCP connections were keeping this value (and using large numbers of
segments per skbuff) until receiving an ACK for data.

Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ip_output: do skb ufo init for peeked non ufo skb as well</title>
<updated>2013-10-19T23:20:52Z</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@resnulli.us</email>
</author>
<published>2013-10-19T10:29:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e93b7d748be887cd7639b113ba7d7ef792a7efb9'/>
<id>urn:sha1:e93b7d748be887cd7639b113ba7d7ef792a7efb9</id>
<content type='text'>
Now, if user application does:
sendto len&lt;mtu flag MSG_MORE
sendto len&gt;mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

introduced by:
commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach"

Signed-off-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()</title>
<updated>2013-10-17T20:08:08Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-10-15T19:24:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f26fb1c1ed81c33f5d87c5936f4d9d1b4118918'/>
<id>urn:sha1:8f26fb1c1ed81c33f5d87c5936f4d9d1b4118918</id>
<content type='text'>
sk_can_gso() should only be used as a hint in tcp_sendmsg() to build GSO
packets in the first place. (As a performance hint)

Once we have GSO packets in write queue, we can not decide they are no
longer GSO only because flow now uses a route which doesn't handle
TSO/GSO.

Core networking stack handles the case very well for us, all we need
is keeping track of packet counts in MSS terms, regardless of
segmentation done later (in GSO or hardware)

Right now, if  tcp_fragment() splits a GSO packet in two parts,
@left and @right, and route changed through a non GSO device,
both @left and @right have pcount set to 1, which is wrong,
and leads to incorrect packet_count tracking.

This problem was added in commit d5ac99a648 ("[TCP]: skb pcount with MTU
discovery")

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Reported-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: must unclone packets before mangling them</title>
<updated>2013-10-17T20:08:08Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-10-15T18:54:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c52e2421f7368fd36cbe330d2cf41b10452e39a9'/>
<id>urn:sha1:c52e2421f7368fd36cbe330d2cf41b10452e39a9</id>
<content type='text'>
TCP stack should make sure it owns skbs before mangling them.

We had various crashes using bnx2x, and it turned out gso_size
was cleared right before bnx2x driver was populating TC descriptor
of the _previous_ packet send. TCP stack can sometime retransmit
packets that are still in Qdisc.

Of course we could make bnx2x driver more robust (using
ACCESS_ONCE(shinfo-&gt;gso_size) for example), but the bug is TCP stack.

We have identified two points where skb_unclone() was needed.

This patch adds a WARN_ON_ONCE() to warn us if we missed another
fix of this kind.

Kudos to Neal for finding the root cause of this bug. Its visible
using small MSS.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: fix incorrect ca_state in tail loss probe</title>
<updated>2013-10-17T19:41:02Z</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2013-10-12T17:16:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=031afe4990a7c9dbff41a3a742c44d3e740ea0a1'/>
<id>urn:sha1:031afe4990a7c9dbff41a3a742c44d3e740ea0a1</id>
<content type='text'>
On receiving an ACK that covers the loss probe sequence, TLP
immediately sets the congestion state to Open, even though some packets
are not recovered and retransmisssion are on the way.  The later ACks
may trigger a WARN_ON check in step D of tcp_fastretrans_alert(), e.g.,
https://bugzilla.redhat.com/show_bug.cgi?id=989251

The fix is to follow the similar procedure in recovery by calling
tcp_try_keep_open(). The sender switches to Open state if no packets
are retransmissted. Otherwise it goes to Disorder and let subsequent
ACKs move the state to Recovery or Open.

Reported-By: Michael Sterrett &lt;michael@sterretts.net&gt;
Tested-By: Dormando &lt;dormando@rydia.net&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vti: get rid of nf mark rule in prerouting</title>
<updated>2013-10-11T18:52:17Z</updated>
<author>
<name>Christophe Gouault</name>
<email>christophe.gouault@6wind.com</email>
</author>
<published>2013-10-08T15:21:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7263a5187f9e9de45fcb51349cf0e031142c19a1'/>
<id>urn:sha1:7263a5187f9e9de45fcb51349cf0e031142c19a1</id>
<content type='text'>
This patch fixes and improves the use of vti interfaces (while
lightly changing the way of configuring them).

Currently:

- it is necessary to identify and mark inbound IPsec
  packets destined to each vti interface, via netfilter rules in
  the mangle table at prerouting hook.

- the vti module cannot retrieve the right tunnel in input since
  commit b9959fd3: vti tunnels all have an i_key, but the tunnel lookup
  is done with flag TUNNEL_NO_KEY, so there no chance to retrieve them.

- the i_key is used by the outbound processing as a mark to lookup
  for the right SP and SA bundle.

This patch uses the o_key to store the vti mark (instead of i_key) and
enables:

- to avoid the need for previously marking the inbound skbuffs via a
  netfilter rule.
- to properly retrieve the right tunnel in input, only based on the IPsec
  packet outer addresses.
- to properly perform an inbound policy check (using the tunnel o_key
  as a mark).
- to properly perform an outbound SPD and SAD lookup (using the tunnel
  o_key as a mark).
- to keep the current mark of the skbuff. The skbuff mark is neither
  used nor changed by the vti interface. Only the vti interface o_key
  is used.

SAs have a wildcard mark.
SPs have a mark equal to the vti interface o_key.

The vti interface must be created as follows (i_key = 0, o_key = mark):

   ip link add vti1 mode vti local 1.1.1.1 remote 2.2.2.2 okey 1

The SPs attached to vti1 must be created as follows (mark = vti1 o_key):

   ip xfrm policy add dir out mark 1 tmpl src 1.1.1.1 dst 2.2.2.2 \
      proto esp mode tunnel
   ip xfrm policy add dir in  mark 1 tmpl src 2.2.2.2 dst 1.1.1.1 \
      proto esp mode tunnel

The SAs are created with the default wildcard mark. There is no
distinction between global vs. vti SAs. Just their addresses will
possibly link them to a vti interface:

   ip xfrm state add src 1.1.1.1 dst 2.2.2.2 proto esp spi 1000 mode tunnel \
                 enc "cbc(aes)" "azertyuiopqsdfgh"

   ip xfrm state add src 2.2.2.2 dst 1.1.1.1 proto esp spi 2000 mode tunnel \
                 enc "cbc(aes)" "sqbdhgqsdjqjsdfh"

To avoid matching "global" (not vti) SPs in vti interfaces, global SPs
should no use the default wildcard mark, but explicitly match mark 0.

To avoid a double SPD lookup in input and output (in global and vti SPDs),
the NOPOLICY and NOXFRM options should be set on the vti interfaces:

   echo 1 &gt; /proc/sys/net/ipv4/conf/vti1/disable_policy
   echo 1 &gt; /proc/sys/net/ipv4/conf/vti1/disable_xfrm

The outgoing traffic is steered to vti1 by a route via the vti interface:

   ip route add 192.168.0.0/16 dev vti1

The incoming IPsec traffic is steered to vti1 because its outer addresses
match the vti1 tunnel configuration.

Signed-off-by: Christophe Gouault &lt;christophe.gouault@6wind.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec</title>
<updated>2013-10-09T17:41:45Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2013-10-09T17:41:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f606385068afe00683f080555f5d0fa9fb8e1f37'/>
<id>urn:sha1:f606385068afe00683f080555f5d0fa9fb8e1f37</id>
<content type='text'>
Steffen Klassert says:

====================
1) We used the wrong netlink attribute to verify the
   lenght of the replay window on async events. Fix this by
   using the right netlink attribute.

2) Policy lookups can not match the output interface on forwarding.
   Add the needed informations to the flow informations.

3) We update the pmtu when we receive a ICMPV6_DEST_UNREACH message
   on IPsec with ipv6. This is wrong and leads to strange fragmented
   packets, only ICMPV6_PKT_TOOBIG messages should update the pmtu.
   Fix this by removing the ICMPV6_DEST_UNREACH check from the IPsec
   protocol error handlers.

4) The legacy IPsec anti replay mechanism supports anti replay
   windows up to 32 packets. If a user requests for a bigger
   anti replay window, we use 32 packets but pretend that we use
   the requested window size. Fix from Fan Du.

5) If asynchronous events are enabled and replay_maxdiff is set to
   zero, we generate an async event for every received packet instead
   of checking whether a timeout occurred. Fix from Thomas Egerer.

6) Policies need a refcount when the state resolution timer is armed.
   Otherwise the timer can fire after the policy is deleted.

7) We might dreference a NULL pointer if the hold_queue is empty,
   add a check to avoid this.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: fix ineffective source address selection</title>
<updated>2013-10-07T19:26:46Z</updated>
<author>
<name>Jiri Benc</name>
<email>jbenc@redhat.com</email>
</author>
<published>2013-10-04T15:04:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0a7e22609067ff524fc7bbd45c6951dd08561667'/>
<id>urn:sha1:0a7e22609067ff524fc7bbd45c6951dd08561667</id>
<content type='text'>
When sending out multicast messages, the source address in inet-&gt;mc_addr is
ignored and rewritten by an autoselected one. This is caused by a typo in
commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output
route lookups").

Signed-off-by: Jiri Benc &lt;jbenc@redhat.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: do not forget FIN in tcp_shifted_skb()</title>
<updated>2013-10-04T18:16:36Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-10-04T17:31:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5e8a402f831dbe7ee831340a91439e46f0d38acd'/>
<id>urn:sha1:5e8a402f831dbe7ee831340a91439e46f0d38acd</id>
<content type='text'>
Yuchung found following problem :

 There are bugs in the SACK processing code, merging part in
 tcp_shift_skb_data(), that incorrectly resets or ignores the sacked
 skbs FIN flag. When a receiver first SACK the FIN sequence, and later
 throw away ofo queue (e.g., sack-reneging), the sender will stop
 retransmitting the FIN flag, and hangs forever.

Following packetdrill test can be used to reproduce the bug.

$ cat sack-merge-bug.pkt
`sysctl -q net.ipv4.tcp_fack=0`

// Establish a connection and send 10 MSS.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+.000 bind(3, ..., ...) = 0
+.000 listen(3, 1) = 0

+.050 &lt; S 0:0(0) win 32792 &lt;mss 1000,sackOK,nop,nop,nop,wscale 7&gt;
+.000 &gt; S. 0:0(0) ack 1 &lt;mss 1460,nop,nop,sackOK,nop,wscale 6&gt;
+.001 &lt; . 1:1(0) ack 1 win 1024
+.000 accept(3, ..., ...) = 4

+.100 write(4, ..., 12000) = 12000
+.000 shutdown(4, SHUT_WR) = 0
+.000 &gt; . 1:10001(10000) ack 1
+.050 &lt; . 1:1(0) ack 2001 win 257
+.000 &gt; FP. 10001:12001(2000) ack 1
+.050 &lt; . 1:1(0) ack 2001 win 257 &lt;sack 10001:11001,nop,nop&gt;
+.050 &lt; . 1:1(0) ack 2001 win 257 &lt;sack 10001:12002,nop,nop&gt;
// SACK reneg
+.050 &lt; . 1:1(0) ack 12001 win 257
+0 %{ print "unacked: ",tcpi_unacked }%
+5 %{ print "" }%

First, a typo inverted left/right of one OR operation, then
code forgot to advance end_seq if the merged skb carried FIN.

Bug was added in 2.6.29 by commit 832d11c5cd076ab
("tcp: Try to restore large SKBs while SACK processing")

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Ilpo Järvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Acked-by: Ilpo Järvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: do not call sock_put() on TIMEWAIT sockets</title>
<updated>2013-10-02T21:05:54Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-10-02T04:04:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=80ad1d61e72d626e30ebe8529a0455e660ca4693'/>
<id>urn:sha1:80ad1d61e72d626e30ebe8529a0455e660ca4693</id>
<content type='text'>
commit 3ab5aee7fe84 ("net: Convert TCP &amp; DCCP hash tables to use RCU /
hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.

We should instead use inet_twsk_put()

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
