<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/ip.h, branch v3.18</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.18</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.18'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-09-28T20:35:42Z</updated>
<entry>
<title>ipv4: rename ip_options_echo to __ip_options_echo()</title>
<updated>2014-09-28T20:35:42Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-09-27T16:50:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=24a2d43d8886f5a29c3cf108927f630c545a9a38'/>
<id>urn:sha1:24a2d43d8886f5a29c3cf108927f630c545a9a38</id>
<content type='text'>
ip_options_echo() assumes struct ip_options is provided in &amp;IPCB(skb)-&gt;opt
Lets break this assumption, but provide a helper to not change all call points.

ip_send_unicast_reply() gets a new struct ip_options pointer.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>icmp: add a global rate limitation</title>
<updated>2014-09-23T16:47:38Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-09-19T14:38:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4cdf507d54525842dfd9f6313fdafba039084046'/>
<id>urn:sha1:4cdf507d54525842dfd9f6313fdafba039084046</id>
<content type='text'>
Current ICMP rate limiting uses inetpeer cache, which is an RBL tree
protected by a lock, meaning that hosts can be stuck hard if all cpus
want to check ICMP limits.

When say a DNS or NTP server process is restarted, inetpeer tree grows
quick and machine comes to its knees.

iptables can not help because the bottleneck happens before ICMP
messages are even cooked and sent.

This patch adds a new global limitation, using a token bucket filter,
controlled by two new sysctl :

icmp_msgs_per_sec - INTEGER
    Limit maximal number of ICMP packets sent per second from this host.
    Only messages whose type matches icmp_ratemask are
    controlled by this limit.
    Default: 1000

icmp_msgs_burst - INTEGER
    icmp_msgs_per_sec controls number of ICMP packets sent per second,
    while icmp_msgs_burst controls the burst size of these packets.
    Default: 50

Note that if we really want to send millions of ICMP messages per
second, we might extend idea and infra added in commit 04ca6973f7c1a
("ip: make IP identifiers less predictable") :
add a token bucket in the ip_idents hash and no longer rely on inetpeer.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/ipv4: bind ip_nonlocal_bind to current netns</title>
<updated>2014-09-09T18:27:09Z</updated>
<author>
<name>Vincent Bernat</name>
<email>vincent@bernat.im</email>
</author>
<published>2014-09-05T13:09:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=49a601589caaf0e93194c0cc9b4ecddbe75dd2d5'/>
<id>urn:sha1:49a601589caaf0e93194c0cc9b4ecddbe75dd2d5</id>
<content type='text'>
net.ipv4.ip_nonlocal_bind sysctl was global to all network
namespaces. This patch allows to set a different value for each
network namespace.

Signed-off-by: Vincent Bernat &lt;vincent@bernat.im&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: add gro_compute_pseudo functions</title>
<updated>2014-08-25T01:09:23Z</updated>
<author>
<name>Tom Herbert</name>
<email>therbert@google.com</email>
</author>
<published>2014-08-22T20:34:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1933a7852ce6a81349855431b25122d7666bbfca'/>
<id>urn:sha1:1933a7852ce6a81349855431b25122d7666bbfca</id>
<content type='text'>
Add inet_gro_compute_pseudo and ip6_gro_compute_pseudo. These are
the logical equivalents of inet_compute_pseudo and ip6_compute_pseudo
for GRO path. The IP header is taken from skb_gro_network_header.

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2014-07-30T20:25:49Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2014-07-30T20:25:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f139c74a8df071217dcd63f3ef06ae7be7071c4d'/>
<id>urn:sha1:f139c74a8df071217dcd63f3ef06ae7be7071c4d</id>
<content type='text'>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: fail early when creating netdev named all or default</title>
<updated>2014-07-29T18:43:50Z</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2014-07-25T22:25:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=20e61da7ffcfd84a1b6f797e745608572e5bc218'/>
<id>urn:sha1:20e61da7ffcfd84a1b6f797e745608572e5bc218</id>
<content type='text'>
We create a proc dir for each network device, this will cause
conflicts when the devices have name "all" or "default".

Rather than emitting an ugly kernel warning, we could just
fail earlier by checking the device name.

Reported-by: Stephane Chazelas &lt;stephane.chazelas@gmail.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ip: make IP identifiers less predictable</title>
<updated>2014-07-29T01:46:34Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-07-26T06:58:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04ca6973f7c1a0d8537f2d9906a0cf8e69886d75'/>
<id>urn:sha1:04ca6973f7c1a0d8537f2d9906a0cf8e69886d75</id>
<content type='text'>
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.

With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.

This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.

Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.

This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.

We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.

For IPv6, adds saddr into the hash as well, but not nexthdr.

If I ping the patched target, we can see ID are now hard to predict.

21:57:11.008086 IP (...)
    A &gt; target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
    target &gt; A: ICMP echo reply, seq 1, length 64

21:57:12.013133 IP (...)
    A &gt; target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
    target &gt; A: ICMP echo reply, seq 2, length 64

21:57:13.016580 IP (...)
    A &gt; target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
    target &gt; A: ICMP echo reply, seq 3, length 64

[1] TCP sessions uses a per flow ID generator not changed by this patch.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Jeffrey Knockel &lt;jeffk@cs.unm.edu&gt;
Reported-by: Jedidiah R. Crandall &lt;crandall@cs.unm.edu&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Cc: Hannes Frederic Sowa &lt;hannes@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: frag: don't account number of fragment queues</title>
<updated>2014-07-28T05:34:36Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2014-07-24T14:50:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=434d305405ab86414f6ea3f261307d443a2c3506'/>
<id>urn:sha1:434d305405ab86414f6ea3f261307d443a2c3506</id>
<content type='text'>
The 'nqueues' counter is protected by the lru list lock,
once thats removed this needs to be converted to atomic
counter.  Given this isn't used for anything except for
reporting it to userspace via /proc, just remove it.

We still report the memory currently used by fragment
reassembly queues.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Save TX flow hash in sock and set in skbuf on xmit</title>
<updated>2014-07-08T04:14:21Z</updated>
<author>
<name>Tom Herbert</name>
<email>therbert@google.com</email>
</author>
<published>2014-07-02T04:32:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b73c3d0e4f0e1961e15bec18720e48aabebe2109'/>
<id>urn:sha1:b73c3d0e4f0e1961e15bec18720e48aabebe2109</id>
<content type='text'>
For a connected socket we can precompute the flow hash for setting
in skb-&gt;hash on output. This is a performance advantage over
calculating the skb-&gt;hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb-&gt;hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:

  95.02% CPU utilization
  154/256/505 90/95/99% latencies
  1.13042e+06 tps

  Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

After fix:

  94.95% CPU utilization
  156/254/485 90/95/99% latencies
  1.15447e+06

  Neither __skb_get_hash nor skb_flow_dissect appear in perf

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: get rid of ip_id_count</title>
<updated>2014-06-02T18:00:41Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-02T12:26:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=73f156a6e8c1074ac6327e0abd1169e95eb66463'/>
<id>urn:sha1:73f156a6e8c1074ac6327e0abd1169e95eb66463</id>
<content type='text'>
Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
