<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4/proc.c, branch v3.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2012-09-01T00:02:18Z</updated>
<entry>
<title>tcp: TCP Fast Open Server - header &amp; support functions</title>
<updated>2012-09-01T00:02:18Z</updated>
<author>
<name>Jerry Chu</name>
<email>hkchu@google.com</email>
</author>
<published>2012-08-31T12:29:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1046716368979dee857a2b8a91c4a8833f21b9cb'/>
<id>urn:sha1:1046716368979dee857a2b8a91c4a8833f21b9cb</id>
<content type='text'>
This patch adds all the necessary data structure and support
functions to implement TFO server side. It also documents a number
of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
extension MIBs.

In addition, it includes the following:

1. a new TCP_FASTOPEN socket option an application must call to
supply a max backlog allowed in order to enable TFO on its listener.

2. A number of key data structures:
"fastopen_rsk" in tcp_sock - for a big socket to access its
request_sock for retransmission and ack processing purpose. It is
non-NULL iff 3WHS not completed.

"fastopenq" in request_sock_queue - points to a per Fast Open
listener data structure "fastopen_queue" to keep track of qlen (# of
outstanding Fast Open requests) and max_qlen, among other things.

"listener" in tcp_request_sock - to point to the original listener
for book-keeping purpose, i.e., to maintain qlen against max_qlen
as part of defense against IP spoofing attack.

3. various data structure and functions, many in tcp_fastopen.c, to
support server side Fast Open cookie operations, including
/proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.

Signed-off-by: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-tcp: Fast Open client - sending SYN-data</title>
<updated>2012-07-19T18:02:03Z</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2012-07-19T06:43:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=783237e8daf13481ee234997cbbbb823872ac388'/>
<id>urn:sha1:783237e8daf13481ee234997cbbbb823872ac388</id>
<content type='text'>
This patch implements sending SYN-data in tcp_connect(). The data is
from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).

The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
type of the SYN. If the cookie is not cached (len==0), the host sends
data-less SYN with Fast Open cookie request option to solicit a cookie
from the remote. If cookie is not available (len &gt; 0), the host sends
a SYN-data with Fast Open cookie option. If cookie length is negative,
  the SYN will not include any Fast Open option (for fall back operations).

To deal with middleboxes that may drop SYN with data or experimental TCP
option, the SYN-data is only sent once. SYN retransmits do not include
data or Fast Open options. The connection will fall back to regular TCP
handshake.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: implement RFC 5961 4.2</title>
<updated>2012-07-17T14:40:46Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-17T01:41:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0c24604b68fc7810d429d6c3657b6f148270e528'/>
<id>urn:sha1:0c24604b68fc7810d429d6c3657b6f148270e528</id>
<content type='text'>
Implement the RFC 5691 mitigation against Blind
Reset attack using SYN bit.

Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop
incoming packet, instead of resetting the session.

Add a new SNMP counter to count number of challenge acks sent
in response to SYN packets.
(netstat -s | grep TCPSYNChallenge)

Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session
because of a SYN flag.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Kiran Kumar Kella &lt;kkiran@broadcom.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: implement RFC 5961 3.2</title>
<updated>2012-07-17T08:36:20Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-17T08:13:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=282f23c6ee343126156dd41218b22ece96d747e3'/>
<id>urn:sha1:282f23c6ee343126156dd41218b22ece96d747e3</id>
<content type='text'>
Implement the RFC 5691 mitigation against Blind
Reset attack using RST bit.

Idea is to validate incoming RST sequence,
to match RCV.NXT value, instead of previouly accepted
window : (RCV.NXT &lt;= SEG.SEQ &lt; RCV.NXT+RCV.WND)

If sequence is in window but not an exact match, send
a "challenge ACK", so that the other part can resend an
RST with the appropriate sequence.

Add a new sysctl, tcp_challenge_ack_limit, to limit
number of challenge ACK sent per second.

Add a new SNMP counter to count number of challenge acks sent.
(netstat -s | grep TCPChallengeACK)

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Kiran Kumar Kella &lt;kkiran@broadcom.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: add OFO snmp counters</title>
<updated>2012-07-17T05:12:00Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-16T01:41:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a6df1ae9383697c4eb1365176002f154982325ad'/>
<id>urn:sha1:a6df1ae9383697c4eb1365176002f154982325ad</id>
<content type='text'>
Add three SNMP TCP counters, to better track TCP behavior
at global stage (netstat -s), when packets are received
Out Of Order (OFO)

TCPOFOQueue : Number of packets queued in OFO queue

TCPOFODrop  : Number of packets meant to be queued in OFO
              but dropped because socket rcvbuf limit hit.

TCPOFOMerge : Number of packets in OFO that were merged with
              other packets.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: reduce out_of_order memory use</title>
<updated>2012-03-19T20:53:08Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-03-18T11:07:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c8628155ece363487b57d33441ea0359018c0fa7'/>
<id>urn:sha1:c8628155ece363487b57d33441ea0359018c0fa7</id>
<content type='text'>
With increasing receive window sizes, but speed of light not improved
that much, out of order queue can contain a huge number of skbs, waiting
to be moved to receive_queue when missing packets can fill the holes.

Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU &lt;= 1500) frames. This makes highly
probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
many cases.

When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
latency killer and cpu cache blower.

Doing the coalescing attempt each time we add a frame in ofo queue
permits to keep memory use tight and in many cases avoid the
tcp_collapse() thing later.

Tested on various wireless setups (b43, ath9k, ...) known to use big skb
truesize, this patch removed the "packets collapsed in receive queue due
to low socket buffer" I had before.

This also reduced average memory used by tcp sockets.

With help from Neal Cardwell.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: H.K. Jerry Chu &lt;hkchu@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Ilpo Järvinen &lt;ilpo.jarvinen@helsinki.fi&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: add LINUX_MIB_TCPRETRANSFAIL counter</title>
<updated>2012-01-26T18:51:00Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-01-25T04:44:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=09e9b813d34d9a09d64a64580a9959d8bae1f4f5'/>
<id>urn:sha1:09e9b813d34d9a09d64a64580a9959d8bae1f4f5</id>
<content type='text'>
It might be useful to get a counter of failed tcp_retransmit_skb()
calls.

Reported-by: Satoru Moriya &lt;satoru.moriya@hds.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: detect loss above high_seq in recovery</title>
<updated>2012-01-22T20:08:44Z</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2012-01-19T14:42:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=974c12360dfe6ab01201fe9e708e7755c413f8b6'/>
<id>urn:sha1:974c12360dfe6ab01201fe9e708e7755c413f8b6</id>
<content type='text'>
Correctly implement a loss detection heuristic: New sequences (above
high_seq) sent during the fast recovery are deemed lost when higher
sequences are SACKed.

Current code does not catch these losses, because tcp_mark_head_lost()
does not check packets beyond high_seq. The fix is straight-forward by
checking packets until the highest sacked packet. In addition, all the
FLAG_DATA_LOST logic are in-effective and redundant and can be removed.

Update the loss heuristic comments. The algorithm above is documented
as heuristic B, but it is redundant too because heuristic A already
covers B.

Note that this change only marks some forward-retransmitted packets LOST.
It does NOT forbid TCP performing further CWR on new losses. A potential
follow-up patch under preparation is to perform another CWR on "new"
losses such as
1) sequence above high_seq is lost (by resetting high_seq to snd_nxt)
2) retransmission is lost.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>foundations of per-cgroup memory pressure controlling.</title>
<updated>2011-12-13T00:04:10Z</updated>
<author>
<name>Glauber Costa</name>
<email>glommer@parallels.com</email>
</author>
<published>2011-12-11T21:47:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=180d8cd942ce336b2c869d324855c40c5db478ad'/>
<id>urn:sha1:180d8cd942ce336b2c869d324855c40c5db478ad</id>
<content type='text'>
This patch replaces all uses of struct sock fields' memory_pressure,
memory_allocated, sockets_allocated, and sysctl_mem to acessor
macros. Those macros can either receive a socket argument, or a mem_cgroup
argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is
expected in the case where we don't have cgroups disabled.

Signed-off-by: Glauber Costa &lt;glommer@parallels.com&gt;
Reviewed-by: Hiroyouki Kamezawa &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
CC: David S. Miller &lt;davem@davemloft.net&gt;
CC: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
CC: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: reduce percpu needs for icmpmsg mibs</title>
<updated>2011-11-09T21:04:20Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-11-08T13:04:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=acb32ba3dee66d58704caeeb8c6ff95f60efdc66'/>
<id>urn:sha1:acb32ba3dee66d58704caeeb8c6ff95f60efdc66</id>
<content type='text'>
Reading /proc/net/snmp on a machine with a lot of cpus is very expensive
(can be ~88000 us).

This is because ICMPMSG MIB uses 4096 bytes per cpu, and folding values
for all possible cpus can read 16 Mbytes of memory.

ICMP messages are not considered as fast path on a typical server, and
eventually few cpus handle them anyway. We can afford an atomic
operation instead of using percpu data.

This saves 4096 bytes per cpu and per network namespace.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
