<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4/proc.c, branch v4.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-07-22T05:36:33Z</updated>
<entry>
<title>net: track success and failure of TCP PMTU probing</title>
<updated>2015-07-22T05:36:33Z</updated>
<author>
<name>Rick Jones</name>
<email>rick.jones2@hp.com</email>
</author>
<published>2015-07-21T23:14:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b56ea2985d389a3676638203323ebe22c261b7fe'/>
<id>urn:sha1:b56ea2985d389a3676638203323ebe22c261b7fe</id>
<content type='text'>
Track success and failure of TCP PMTU probing.

Signed-off-by: Rick Jones &lt;rick.jones2@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: add TCPWinProbe and TCPKeepAlive SNMP counters</title>
<updated>2015-05-09T20:42:32Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-05-06T21:26:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e520af48c7e5acae5f17f82a79ba7ab7cf156f3b'/>
<id>urn:sha1:e520af48c7e5acae5f17f82a79ba7ab7cf156f3b</id>
<content type='text'>
Diagnosing problems related to Window Probes has been hard because
we lack a counter.

TCPWinProbe counts the number of ACK packets a sender has to send
at regular intervals to make sure a reverse ACK packet opening back
a window had not been lost.

TCPKeepAlive counts the number of ACK packets sent to keep TCP
flows alive (SO_KEEPALIVE)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Nandita Dukkipati &lt;nanditad@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp/dccp: get rid of central timewait timer</title>
<updated>2015-04-13T20:40:05Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-04-13T01:51:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=789f558cfb3680aeb52de137418637f6b04b7d22'/>
<id>urn:sha1:789f558cfb3680aeb52de137418637f6b04b7d22</id>
<content type='text'>
Using a timer wheel for timewait sockets was nice ~15 years ago when
memory was expensive and machines had a single processor.

This does not scale, code is ugly and source of huge latencies
(Typically 30 ms have been seen, cpus spinning on death_lock spinlock.)

We can afford to use an extra 64 bytes per timewait sock and spread
timewait load to all cpus to have better behavior.

Tested:

On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1
on the target (lpaa24)

Before patch :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
419594

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
437171

While test is running, we can observe 25 or even 33 ms latencies.

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms
rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms
rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2

After patch :

About 90% increase of throughput :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
810442

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
800992

And latencies are kept to minimal values during this load, even
if network utilization is 90% higher :

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms
rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks</title>
<updated>2015-02-08T09:03:12Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2015-02-06T21:04:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=032ee4236954eb214651cb9bfc1b38ffa8fd7a01'/>
<id>urn:sha1:032ee4236954eb214651cb9bfc1b38ffa8fd7a01</id>
<content type='text'>
Helpers for mitigating ACK loops by rate-limiting dupacks sent in
response to incoming out-of-window packets.

This patch includes:

- rate-limiting logic
- sysctl to control how often we allow dupacks to out-of-window packets
- SNMP counter for cases where we rate-limited our dupack sending

The rate-limiting logic in this patch decides to not send dupacks in
response to out-of-window segments if (a) they are SYNs or pure ACKs
and (b) the remote endpoint is sending them faster than the configured
rate limit.

We rate-limit our responses rather than blocking them entirely or
resetting the connection, because legitimate connections can rely on
dupacks in response to some out-of-window segments. For example, zero
window probes are typically sent with a sequence number that is below
the current window, and ZWPs thus expect to thus elicit a dupack in
response.

We allow dupacks in response to TCP segments with data, because these
may be spurious retransmissions for which the remote endpoint wants to
receive DSACKs. This is safe because segments with data can't
realistically be part of ACK loops, which by their nature consist of
each side sending pure/data-less ACKs to each other.

The dupack interval is controlled by a new sysctl knob,
tcp_invalid_ratelimit, given in milliseconds, in case an administrator
needs to dial this upward in the face of a high-rate DoS attack. The
name and units are chosen to be analogous to the existing analogous
knob for ICMP, icmp_ratelimit.

The default value for tcp_invalid_ratelimit is 500ms, which allows at
most one such dupack per 500ms. This is chosen to be 2x faster than
the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
2.4). We allow the extra 2x factor because network delay variations
can cause packets sent at 1 second intervals to be compressed and
arrive much closer.

Reported-by: Avery Fay &lt;avery@mixpanel.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp_cubic: add SNMP counters to track how effective is Hystart</title>
<updated>2014-12-09T19:58:23Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-12-05T00:13:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6e3a8a937c2f86ee0b2d354808fc026a143b4518'/>
<id>urn:sha1:6e3a8a937c2f86ee0b2d354808fc026a143b4518</id>
<content type='text'>
When deploying FQ pacing, one thing we noticed is that CUBIC Hystart
triggers too soon.

Having SNMP counters to have an idea of how often the various Hystart
methods trigger is useful prior to any modifications.

This patch adds SNMP counters tracking, how many time "ack train" or
"Delay" based Hystart triggers, and cumulative sum of cwnd at the time
Hystart decided to end SS (Slow Start)

myhost:~# nstat -a | grep Hystart
TcpExtTCPHystartTrainDetect     9                  0.0
TcpExtTCPHystartTrainCwnd       20650              0.0
TcpExtTCPHystartDelayDetect     10                 0.0
TcpExtTCPHystartDelayCwnd       360                0.0

-&gt;
 Train detection was triggered 9 times, and average cwnd was
 20650/9=2294,
 Delay detection was triggered 10 times and average cwnd was 36

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts</title>
<updated>2014-11-07T20:45:50Z</updated>
<author>
<name>Rick Jones</name>
<email>rick.jones2@hp.com</email>
</author>
<published>2014-11-06T18:37:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=36cbb2452cbafca64dcdd3578047433787900cf0'/>
<id>urn:sha1:36cbb2452cbafca64dcdd3578047433787900cf0</id>
<content type='text'>
As NIC multicast filtering isn't perfect, and some platforms are
quite content to spew broadcasts, we should not trigger an event
for skb:kfree_skb when we do not have a match for such an incoming
datagram.  We do though want to avoid sweeping the matter under the
rug entirely, so increment a suitable statistic.

This incorporates feedback from David L. Stevens, Karl Neiss and Eric
Dumazet.

V3 - use bool per David Miller

Signed-off-by: Rick Jones &lt;rick.jones2@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: use seq_puts instead of seq_printf where possible</title>
<updated>2014-11-04T20:09:52Z</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2014-11-04T19:31:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c9f503b00622525bb9a549ff38c4bb20b4287b84'/>
<id>urn:sha1:c9f503b00622525bb9a549ff38c4bb20b4287b84</id>
<content type='text'>
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: frag: don't account number of fragment queues</title>
<updated>2014-07-28T05:34:36Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2014-07-24T14:50:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=434d305405ab86414f6ea3f261307d443a2c3506'/>
<id>urn:sha1:434d305405ab86414f6ea3f261307d443a2c3506</id>
<content type='text'>
The 'nqueues' counter is protected by the lru list lock,
once thats removed this needs to be converted to atomic
counter.  Given this isn't used for anything except for
reporting it to userspace via /proc, just remove it.

We still report the memory currently used by fragment
reassembly queues.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: clean up snmp stats code</title>
<updated>2014-05-07T20:06:05Z</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2014-05-05T22:55:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=698365fa1874aa7635d51667a34a2842228e9837'/>
<id>urn:sha1:698365fa1874aa7635d51667a34a2842228e9837</id>
<content type='text'>
commit 8f0ea0fe3a036a47767f9c80e (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:

	commit 933393f58fef9963eac61db8093689544e29a600
	Date:   Thu Dec 22 11:58:51 2011 -0600

	    percpu: Remove irqsafe_cpu_xxx variants

	    We simply say that regular this_cpu use must be safe regardless of
	    preemption and interrupt state.  That has no material change for x86
	    and s390 implementations of this_cpu operations.  However, arches that
	    do not provide their own implementation for this_cpu operations will
	    now get code generated that disables interrupts instead of preemption.

probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.

So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.

Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: snmp stats for Fast Open, SYN rtx, and data pkts</title>
<updated>2014-03-03T20:58:03Z</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2014-03-03T20:31:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd'/>
<id>urn:sha1:f19c29e3e391a66a273e9afebaf01917245148cd</id>
<content type='text'>
Add the following snmp stats:

TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed beacuse
the remote does not accept it or the attempts timed out.

TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
retransmissions into SYN, fast-retransmits, timeout retransmits, etc.

TCPOrigDataSent: number of outgoing packets with original data (excluding
retransmission but including data-in-SYN). This counter is different from
TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
more useful to track the TCP retransmission rate.

Change TCPFastOpenActive to track only successful Fast Opens to be symmetric to
TCPFastOpenPassive.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Nandita Dukkipati &lt;nanditad@google.com&gt;
Signed-off-by: Lawrence Brakmo &lt;brakmo@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
