<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/core/sysctl_net_core.c, branch v4.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-02-09T09:28:06Z</updated>
<entry>
<title>net:Add sysctl_max_skb_frags</title>
<updated>2016-02-09T09:28:06Z</updated>
<author>
<name>Hans Westgaard Ry</name>
<email>hans.westgaard.ry@oracle.com</email>
</author>
<published>2016-02-03T08:26:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5f74f82ea34c0da80ea0b49192bb5ea06e063593'/>
<id>urn:sha1:5f74f82ea34c0da80ea0b49192bb5ea06e063593</id>
<content type='text'>
Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments.

Signed-off-by: Hans Westgaard Ry &lt;hans.westgaard.ry@oracle.com&gt;
Reviewed-by: Håkon Bugge &lt;haakon.bugge@oracle.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-03-20T22:51:09Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-03-20T22:51:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0fa74a4be48e0f810d3dc6ddbc9d6ac7e86cbee8'/>
<id>urn:sha1:0fa74a4be48e0f810d3dc6ddbc9d6ac7e86cbee8</id>
<content type='text'>
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	net/core/sysctl_net_core.c
	net/ipv4/inet_diag.c

The be_main.c conflict resolution was really tricky.  The conflict
hunks generated by GIT were very unhelpful, to say the least.  It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().

So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged.  And this worked beautifully.

The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: increase sk_[max_]ack_backlog</title>
<updated>2015-03-20T16:40:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-03-20T02:04:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=becb74f0acca19b5abfcb24dc602530f3deea66a'/>
<id>urn:sha1:becb74f0acca19b5abfcb24dc602530f3deea66a</id>
<content type='text'>
sk_ack_backlog &amp; sk_max_ack_backlog were 16bit fields, meaning
listen() backlog was limited to 65535.

It is time to increase the width to allow much bigger backlog,
if admins change /proc/sys/net/core/somaxconn &amp;
/proc/sys/net/ipv4/tcp_max_syn_backlog default values.

Tested:

echo 5000000 &gt;/proc/sys/net/core/somaxconn
echo 5000000 &gt;/proc/sys/net/ipv4/tcp_max_syn_backlog

Ran a SYNFLOOD test against a listener using listen(fd, 5000000)

myhost~# grep request_sock_TCP /proc/slabinfo
request_sock_TCP  4185642 4411940    304   13    1 : tunables   54   27    8 : slabdata 339380 339380      0

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sysctl_net_core: check SNDBUF and RCVBUF for min length</title>
<updated>2015-03-12T01:25:13Z</updated>
<author>
<name>Alexey Kodanev</name>
<email>alexey.kodanev@oracle.com</email>
</author>
<published>2015-03-11T11:29:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b1cb59cf2efe7971d3d72a7b963d09a512d994c9'/>
<id>urn:sha1:b1cb59cf2efe7971d3d72a7b963d09a512d994c9</id>
<content type='text'>
sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be
set to incorrect values. Given that 'struct sk_buff' allocates from
rcvbuf, incorrectly set buffer length could result to memory
allocation failures. For example, set them as follows:

    # sysctl net.core.rmem_default=64
      net.core.wmem_default = 64
    # sysctl net.core.wmem_default=64
      net.core.wmem_default = 64
    # ping localhost -s 1024 -i 0 &gt; /dev/null

This could result to the following failure:

skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:&lt;NULL&gt;
kernel BUG at net/core/skbuff.c:102!
invalid opcode: 0000 [#1] SMP
...
task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
RIP: 0010:[&lt;ffffffff8155fbd1&gt;]  [&lt;ffffffff8155fbd1&gt;] skb_put+0xa1/0xb0
RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
Stack:
 ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
 ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
Call Trace:
 [&lt;ffffffff81628db4&gt;] unix_stream_sendmsg+0x2c4/0x470
 [&lt;ffffffff81556f56&gt;] sock_write_iter+0x146/0x160
 [&lt;ffffffff811d9612&gt;] new_sync_write+0x92/0xd0
 [&lt;ffffffff811d9cd6&gt;] vfs_write+0xd6/0x180
 [&lt;ffffffff811da499&gt;] SyS_write+0x59/0xd0
 [&lt;ffffffff81651532&gt;] system_call_fastpath+0x12/0x17
Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
      00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 &lt;0f&gt; 0b
      eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
RIP  [&lt;ffffffff8155fbd1&gt;] skb_put+0xa1/0xb0
RSP &lt;ffff88003ae8bc68&gt;
Kernel panic - not syncing: Fatal exception

Moreover, the possible minimum is 1, so we can get another kernel panic:
...
BUG: unable to handle kernel paging request at ffff88013caee5c0
IP: [&lt;ffffffff815604cf&gt;] __alloc_skb+0x12f/0x1f0
...

Signed-off-by: Alexey Kodanev &lt;alexey.kodanev@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: use %*pb[l] to print bitmaps including cpumasks and nodemasks</title>
<updated>2015-02-14T05:21:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-02-13T22:37:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f09068276c5cbe2dd76679b2c9fcc91e12eb7ebe'/>
<id>urn:sha1:f09068276c5cbe2dd76679b2c9fcc91e12eb7ebe</id>
<content type='text'>
printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>net:rfs: adjust table size checking</title>
<updated>2015-02-09T05:54:09Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-02-09T04:39:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=93c1af6ca94c1e763efba76a127b5c135e3d23a6'/>
<id>urn:sha1:93c1af6ca94c1e763efba76a127b5c135e3d23a6</id>
<content type='text'>
Make sure root user does not try something stupid.

Also make sure mask field in struct rps_sock_flow_table
does not share a cache line with the potentially often dirtied
flow table.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: 567e4b79731c ("net: rfs: add hash collision detection")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: rfs: add hash collision detection</title>
<updated>2015-02-09T00:53:57Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-02-06T20:59:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=567e4b79731c352a17d73c483959f795d3593e03'/>
<id>urn:sha1:567e4b79731c352a17d73c483959f795d3593e03</id>
<content type='text'>
Receive Flow Steering is a nice solution but suffers from
hash collisions when a mix of connected and unconnected traffic
is received on the host, when flow hash table is populated.

Also, clearing flow in inet_release() makes RFS not very good
for short lived flows, as many packets can follow close().
(FIN , ACK packets, ...)

This patch extends the information stored into global hash table
to not only include cpu number, but upper part of the hash value.

I use a 32bit value, and dynamically split it in two parts.

For host with less than 64 possible cpus, this gives 6 bits for the
cpu number, and 26 (32-6) bits for the upper part of the hash.

Since hash bucket selection use low order bits of the hash, we have
a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
enough.

If the hash found in flow table does not match, we fallback to RPS (if
it is enabled for the rxqueue).

This means that a packet for an non connected flow can avoid the
IPI through a unrelated/victim CPU.

This also means we no longer have to clear the table at socket
close time, and this helps short lived flows performance.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-timestamp: no-payload only sysctl</title>
<updated>2015-02-03T02:46:51Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2015-01-30T18:29:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b245be1f4db1a0394e4b6eb66059814b46670ac3'/>
<id>urn:sha1:b245be1f4db1a0394e4b6eb66059814b46670ac3</id>
<content type='text'>
Tx timestamps are looped onto the error queue on top of an skb. This
mechanism leaks packet headers to processes unless the no-payload
options SOF_TIMESTAMPING_OPT_TSONLY is set.

Add a sysctl that optionally drops looped timestamp with data. This
only affects processes without CAP_NET_RAW.

The policy is checked when timestamps are generated in the stack.
It is possible for timestamps with data to be reported after the
sysctl is set, if these were queued internally earlier.

No vulnerability is immediately known that exploits knowledge
gleaned from packet headers, but it may still be preferable to allow
administrators to lock down this path at the cost of possible
breakage of legacy applications.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;

----

Changes
  (v1 -&gt; v2)
  - test socket CAP_NET_RAW instead of capable(CAP_NET_RAW)
  (rfc -&gt; v1)
  - document the sysctl in Documentation/sysctl/net.txt
  - fix access control race: read .._OPT_TSONLY only once,
        use same value for permission check and skb generation.
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: provide a per host RSS key generic infrastructure</title>
<updated>2014-11-16T20:59:11Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-11-16T14:23:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=960fb622f85180f36d3aff82af53e2be3db2f888'/>
<id>urn:sha1:960fb622f85180f36d3aff82af53e2be3db2f888</id>
<content type='text'>
RSS (Receive Side Scaling) typically uses Toeplitz hash and a 40 or 52 bytes
RSS key.

Some drivers use a constant (and well known key), some drivers use a random
key per port, making bonding setups hard to tune. Well known keys increase
attack surface, considering that number of queues is usually a power of two.

This patch provides infrastructure to help drivers doing the right thing.

netdev_rss_key_fill() should be used by drivers to initialize their RSS key,
even if they provide ethtool -X support to let user redefine the key later.

A new /proc/sys/net/core/netdev_rss_key file can be used to get the host
RSS key even for drivers not providing ethtool -x support, in case some
applications want to precisely setup flows to match some RX queues.

Tested:

myhost:~# cat /proc/sys/net/core/netdev_rss_key
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41:36:40:74:b6:15:ca:27:44:aa:b3:4d:72

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited</title>
<updated>2014-11-11T19:10:31Z</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2014-11-11T18:59:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ba7a46f16dd29f93303daeb1fee8af316c5a07f4'/>
<id>urn:sha1:ba7a46f16dd29f93303daeb1fee8af316c5a07f4</id>
<content type='text'>
Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_&lt;LEVEL&gt; uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled.  Even so,
these messages are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings".  For backward compatibility,
the sysctl is not removed, but it has no function.  The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
