<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/tipc/socket.c, branch v4.7</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.7</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.7'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-06-18T04:38:10Z</updated>
<entry>
<title>tipc: fix socket timer deadlock</title>
<updated>2016-06-18T04:38:10Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-06-17T10:35:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f1d048f24e66ba85d3dabf3d076cefa5f2b546b0'/>
<id>urn:sha1:f1d048f24e66ba85d3dabf3d076cefa5f2b546b0</id>
<content type='text'>
We sometimes observe a 'deadly embrace' type deadlock occurring
between mutually connected sockets on the same node. This happens
when the one-hour peer supervision timers happen to expire
simultaneously in both sockets.

The scenario is as follows:

CPU 1:                          CPU 2:
--------                        --------
tipc_sk_timeout(sk1)            tipc_sk_timeout(sk2)
  lock(sk1.slock)                 lock(sk2.slock)
  msg_create(probe)               msg_create(probe)
  unlock(sk1.slock)               unlock(sk2.slock)
  tipc_node_xmit_skb()            tipc_node_xmit_skb()
    tipc_node_xmit()                tipc_node_xmit()
      tipc_sk_rcv(sk2)                tipc_sk_rcv(sk1)
        lock(sk2.slock)                 lock(sk1.slock)
        filter_rcv()                    filter_rcv()
          tipc_sk_proto_rcv()             tipc_sk_proto_rcv()
            msg_create(probe_rsp)           msg_create(probe_rsp)
            tipc_sk_respond()               tipc_sk_respond()
              tipc_node_xmit_skb()            tipc_node_xmit_skb()
                tipc_node_xmit()                tipc_node_xmit()
                  tipc_sk_rcv(sk1)                tipc_sk_rcv(sk2)
                    lock(sk1.slock)                 lock(sk2.slock)
                    ===&gt; DEADLOCK                   ===&gt; DEADLOCK

Further analysis reveals that there are three different locations in the
socket code where tipc_sk_respond() is called within the context of the
socket lock, with ensuing risk of similar deadlocks.

We now solve this by passing a buffer queue along with all upcalls where
sk_lock.slock may potentially be held. Response or rejected message
buffers are accumulated into this queue instead of being sent out
directly, and only sent once we know we are safely outside the slock
context.
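
The deferred-send idea can be illustrated with a minimal Python sketch.
It is not the kernel code: the Sock class, deliver(), timeout() and
run_pair() are hypothetical names, and a plain threading.Lock stands in
for the slock. Replies are appended to a queue while the lock is held
and are only sent after it has been released, so no thread ever holds
one socket lock while waiting for the other.

```python
import threading
from collections import deque


class Sock:
    """Toy stand-in for a TIPC socket; all names here are hypothetical."""

    def __init__(self, name):
        self.name = name
        self.slock = threading.Lock()
        self.peer = None
        self.received = []

    def rcv(self, msg, xmitq):
        # Handle the message under our own lock, but only queue the reply.
        with self.slock:
            self.received.append(msg)
            if msg == "probe":
                xmitq.append((self.peer, "probe_rsp"))


def deliver(dst, msg):
    """Deliver msg to dst, then flush queued replies with no lock held."""
    xmitq = deque()
    xmitq.append((dst, msg))
    while xmitq:
        target, m = xmitq.popleft()
        target.rcv(m, xmitq)  # target.slock is released before the next send


def timeout(sk):
    # Build the probe under the lock, send it after unlocking.
    with sk.slock:
        probe = "probe"
    deliver(sk.peer, probe)


def run_pair():
    """Fire both supervision timers concurrently on two connected sockets."""
    sk1, sk2 = Sock("sk1"), Sock("sk2")
    sk1.peer, sk2.peer = sk2, sk1
    threads = [threading.Thread(target=timeout, args=(s,)) for s in (sk1, sk2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(sk1.received), sorted(sk2.received)
```

Calling run_pair() completes with each socket having seen one probe and
one probe response, since neither lock is ever taken while the other is
held.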

Reported-by: GUNA &lt;gbalasun@gmail.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial</title>
<updated>2016-05-18T00:05:30Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-05-18T00:05:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=16bf8348055fe4615bd08ef50f9874f5dcc10268'/>
<id>urn:sha1:16bf8348055fe4615bd08ef50f9874f5dcc10268</id>
<content type='text'>
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits)
  gitignore: fix wording
  mfd: ab8500-debugfs: fix "between" in printk
  memstick: trivial fix of spelling mistake on management
  cpupowerutils: bench: fix "average"
  treewide: Fix typos in printk
  IB/mlx4: printk fix
  pinctrl: sirf/atlas7: fix printk spelling
  serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/
  w1: comment spelling s/minmum/minimum/
  Blackfin: comment spelling s/divsor/divisor/
  metag: Fix misspellings in comments.
  ia64: Fix misspellings in comments.
  hexagon: Fix misspellings in comments.
  tools/perf: Fix misspellings in comments.
  cris: Fix misspellings in comments.
  c6x: Fix misspellings in comments.
  blackfin: Fix misspelling of 'register' in comment.
  avr32: Fix misspelling of 'definitions' in comment.
  treewide: Fix typos in printk
  Doc: treewide : Fix typos in DocBook/filesystem.xml
  ...
</content>
</entry>
<entry>
<title>tipc: check nl sock before parsing nested attributes</title>
<updated>2016-05-17T01:58:54Z</updated>
<author>
<name>Richard Alpe</name>
<email>richard.alpe@ericsson.com</email>
</author>
<published>2016-05-16T09:14:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=45e093ae2830cd1264677d47ff9a95a71f5d9f9c'/>
<id>urn:sha1:45e093ae2830cd1264677d47ff9a95a71f5d9f9c</id>
<content type='text'>
Make sure the socket for which the user is listing publication exists
before parsing the socket netlink attributes.

Prior to this patch a call without any socket caused a NULL pointer
dereference in tipc_nl_publ_dump().

Tested-and-reported-by: Baozeng Ding &lt;sploving1@gmail.com&gt;
Signed-off-by: Richard Alpe &lt;richard.alpe@ericsson.com&gt;
Acked-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: redesign connection-level flow control</title>
<updated>2016-05-03T19:51:16Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-05-02T15:58:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=10724cc7bb7832b482df049c20fd824d928c5eaa'/>
<id>urn:sha1:10724cc7bb7832b482df049c20fd824d928c5eaa</id>
<content type='text'>
There are two flow control mechanisms in TIPC: one at link level that
handles network congestion, burst control, and retransmission, and one
at connection level whose only remaining task is to prevent overflow
in the receiving socket buffer. In TIPC, the latter task has to be
solved end-to-end because messages cannot be thrown away once they
have been accepted and delivered upwards from the link layer, i.e., we
can never permit the receive buffer to overflow.

Currently, this algorithm is message based. A counter in the receiving
socket keeps track of the number of consumed messages, and sends a dedicated
acknowledge message back to the sender for every 256 consumed messages.
A counter at the sending end keeps track of the sent, not yet
acknowledged messages, and blocks the sender if this number ever reaches
512 unacknowledged messages. When the missing acknowledge arrives, the
socket is then woken up for renewed transmission. This works well for
keeping the message flow running, as it almost never happens that a
sender socket is blocked this way.

A problem with the current mechanism is that it is potentially very
memory consuming. Since we don't distinguish between small and large
messages, we have to dimension the socket receive buffer according
to a worst-case of both. I.e., the window size must be chosen large
enough to sustain a reasonable throughput even for the smallest
messages, while we must still consider a scenario where all messages
are of maximum size. Hence, the current fixed window size of 512 messages
and a maximum message size of 66k results in a receive buffer of 66 MB
when truesize(66k) = 131k is taken into account. It is possible to do
much better.

This commit introduces an algorithm where we instead use 1024-byte
blocks as the base unit. This unit, always rounded upwards from the
actual message size, is used when we advertise windows as well as when
we count and acknowledge transmitted data. The advertised window is
based on the configured receive buffer size in such a way that even
the worst-case truesize/msgsize ratio always is covered. Since the
smallest possible message size (from a flow control viewpoint) now is
1024 bytes, we can safely assume this ratio to be less than four, which
is the value we are now using.

This way, we have been able to reduce the default receive buffer size
from 66 MB to 2 MB with maintained performance.
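
The block arithmetic described above can be written out as a small
Python sketch. The constant and function names are hypothetical, and
the window formula is only an assumption derived from the text (buffer
size divided by the block size and by the worst-case ratio of four); it
is not copied from the patch.

```python
BLOCK_SIZE = 1024     # base unit named in the text
TRUESIZE_RATIO = 4    # worst-case truesize/msgsize ratio named in the text


def blocks_for(msg_len):
    """Number of 1024-byte blocks charged for a message, rounded upwards."""
    return (msg_len + BLOCK_SIZE - 1) // BLOCK_SIZE


def adv_window_blocks(rcvbuf_bytes):
    """Window, in blocks, that a receive buffer of this size can cover
    even if every block costs TRUESIZE_RATIO times its nominal size."""
    return rcvbuf_bytes // BLOCK_SIZE // TRUESIZE_RATIO
```

Under these assumptions a 1-byte message and a 1024-byte message each
cost one block, a 1025-byte message costs two, and a 2 MB receive
buffer (taken here as 2 * 1024 * 1024 bytes) yields an advertised
window of 512 blocks.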

In order to keep this solution backwards compatible, we introduce a
new capability bit in the discovery protocol, and use this throughout
the message sending/reception path to always select the right unit.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: propagate peer node capabilities to socket layer</title>
<updated>2016-05-03T19:51:15Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-05-02T15:58:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=60020e1857042387cdcd4cd6680a9e5496213379'/>
<id>urn:sha1:60020e1857042387cdcd4cd6680a9e5496213379</id>
<content type='text'>
During neighbor discovery, nodes advertise their capabilities as a bit
map in a dedicated 16-bit field in the discovery message header. This
bit map has so far only been stored in the node structure on the peer
nodes, but we now see the need to keep a copy even in the socket
structure.

This commit adds this functionality.
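
As a rough illustration only, the following Python sketch models a
socket that copies the peer's capability map when it connects. The
Caps members, their bit values, and the Node and Sock classes are all
hypothetical; the real bit assignments are not given in this message.

```python
from enum import IntFlag


class Caps(IntFlag):
    # Hypothetical capability bits, not the real assignments.
    BCAST_SYNCH = 1
    BLOCK_FLOWCTL = 2


class Node:
    """Toy peer node holding the capability map learned at discovery."""

    def __init__(self, caps):
        self.caps = caps


class Sock:
    """Toy socket that keeps its own copy of the peer's capabilities."""

    def __init__(self):
        self.peer_caps = Caps(0)

    def connect(self, peer_node):
        # Copy the map into the socket (assumed purpose: later checks
        # on the data path can be made from the socket alone).
        self.peer_caps = peer_node.caps

    def peer_supports(self, cap):
        return cap in self.peer_caps
```

A socket connected to Node(Caps.BCAST_SYNCH | Caps.BLOCK_FLOWCTL) would
then report peer_supports(Caps.BLOCK_FLOWCTL) as true, while an
unconnected socket reports it as false.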

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: re-enable compensation for socket receive buffer double counting</title>
<updated>2016-05-03T19:51:14Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-05-02T15:58:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7c8bcfb1255fe9d929c227d67bdcd84430fd200b'/>
<id>urn:sha1:7c8bcfb1255fe9d929c227d67bdcd84430fd200b</id>
<content type='text'>
In the refactoring commit d570d86497ee ("tipc: enqueue arrived buffers
in socket in separate function") we accidentally replaced the test

if (sk-&gt;sk_backlog.len == 0)
     atomic_set(&amp;tsk-&gt;dupl_rcvcnt, 0);

with

if (sk-&gt;sk_backlog.len)
     atomic_set(&amp;tsk-&gt;dupl_rcvcnt, 0);

This effectively disables the compensation we have for the double
receive buffer accounting that occurs temporarily when buffers are
moved from the backlog to the socket receive queue. Until now, this
has gone unnoticed because of the large receive buffer limits we are
applying, but becomes indispensable when we reduce this buffer limit
later in this series.

We now fix this by inverting the mentioned condition.
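
The effect of the corrected test can be shown with a simplified Python
model. The field names follow the snippets above, but the class, its
methods, and the accounting around them are hypothetical
simplifications rather than the kernel logic.

```python
class Sock:
    """Toy model of the double-counting compensation."""

    def __init__(self):
        self.backlog_len = 0   # bytes still charged to the backlog
        self.dupl_rcvcnt = 0   # bytes currently counted twice

    def move_from_backlog(self, truesize):
        # Assumption: a buffer moved to the receive queue stays charged
        # to the backlog until the backlog as a whole has been flushed.
        self.dupl_rcvcnt += truesize

    def enqueue_check(self):
        # Corrected test: reset only once the backlog is empty, because
        # from that point on nothing is counted twice any more.
        if self.backlog_len == 0:
            self.dupl_rcvcnt = 0
```

With the inverted (buggy) condition the counter would be cleared
whenever the backlog was non-empty, which is exactly when the
compensation is needed.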

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>treewide: Fix typos in printk</title>
<updated>2016-04-18T09:23:24Z</updated>
<author>
<name>Masanari Iida</name>
<email>standby24x7@gmail.com</email>
</author>
<published>2016-02-08T11:53:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c19ca6cb4c0891049009d48a0da79d9e8c475462'/>
<id>urn:sha1:c19ca6cb4c0891049009d48a0da79d9e8c475462</id>
<content type='text'>
This patch fixes spelling typos found in printk
within various parts of the kernel sources.

Signed-off-by: Masanari Iida &lt;standby24x7@gmail.com&gt;
Acked-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2016-03-08T17:34:12Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-03-08T17:34:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=810813c47a564416f6306ae214e2661366c987a7'/>
<id>urn:sha1:810813c47a564416f6306ae214e2661366c987a7</id>
<content type='text'>
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: move netlink policies to netlink.c</title>
<updated>2016-03-07T19:56:41Z</updated>
<author>
<name>Richard Alpe</name>
<email>richard.alpe@ericsson.com</email>
</author>
<published>2016-03-04T16:04:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=49cc66eaee19e772997b63b057ea4b4bf7d48db0'/>
<id>urn:sha1:49cc66eaee19e772997b63b057ea4b4bf7d48db0</id>
<content type='text'>
Make the c files less cluttered and enable netlink attributes to be
shared between files.

Signed-off-by: Richard Alpe &lt;richard.alpe@ericsson.com&gt;
Reviewed-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Acked-by: Parthasarathy Bhuvaragan &lt;parthasarathy.bhuvaragan@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: Revert "tipc: use existing sk_write_queue for outgoing packet chain"</title>
<updated>2016-03-03T21:30:29Z</updated>
<author>
<name>Parthasarathy Bhuvaragan</name>
<email>parthasarathy.bhuvaragan@ericsson.com</email>
</author>
<published>2016-03-01T10:07:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f214fc402967e1bc94ad7f39faa03db5813d6849'/>
<id>urn:sha1:f214fc402967e1bc94ad7f39faa03db5813d6849</id>
<content type='text'>
Reverts commit 94153e36e709e ("tipc: use existing sk_write_queue for
outgoing packet chain")

In commit 94153e36e709e, we assumed that we fill &amp; empty the socket's
sk_write_queue within the same lock_sock() session.

This is not true if the link is congested. During congestion, the
socket lock is released while we wait for the congestion to cease.
This implementation causes a NULL pointer dereference if the user space
program has several threads accessing the same socket descriptor.

Consider two threads of the same program performing the following:
     Thread1                                  Thread2
--------------------                    ----------------------
Enter tipc_sendmsg()                    Enter tipc_sendmsg()
lock_sock()                             lock_sock()
Enter tipc_link_xmit(), ret=ELINKCONG   spin on socket lock..
sk_wait_event()                             :
release_sock()                          grab socket lock
    :                                   Enter tipc_link_xmit(), ret=0
    :                                   release_sock()
Wakeup after congestion
lock_sock()
skb = skb_peek(pktchain);
!! TIPC_SKB_CB(skb)-&gt;wakeup_pending = tsk-&gt;link_cong;

In this case, the second thread transmits the buffers belonging to
both thread1 and thread2 successfully. When the first thread wakes up
after the congestion, it assumes that the pktchain is intact and
operates on the skbs in it, which leads to the following exception:

[2102.439969] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
[2102.440074] IP: [&lt;ffffffffa005f330&gt;] __tipc_link_xmit+0x2b0/0x4d0 [tipc]
[2102.440074] PGD 3fa3f067 PUD 3fa6b067 PMD 0
[2102.440074] Oops: 0000 [#1] SMP
[2102.440074] CPU: 2 PID: 244 Comm: sender Not tainted 3.12.28 #1
[2102.440074] RIP: 0010:[&lt;ffffffffa005f330&gt;]  [&lt;ffffffffa005f330&gt;] __tipc_link_xmit+0x2b0/0x4d0 [tipc]
[...]
[2102.440074] Call Trace:
[2102.440074]  [&lt;ffffffff8163f0b9&gt;] ? schedule+0x29/0x70
[2102.440074]  [&lt;ffffffffa006a756&gt;] ? tipc_node_unlock+0x46/0x170 [tipc]
[2102.440074]  [&lt;ffffffffa005f761&gt;] tipc_link_xmit+0x51/0xf0 [tipc]
[2102.440074]  [&lt;ffffffffa006d8ae&gt;] tipc_send_stream+0x11e/0x4f0 [tipc]
[2102.440074]  [&lt;ffffffff8106b150&gt;] ? __wake_up_sync+0x20/0x20
[2102.440074]  [&lt;ffffffffa006dc9c&gt;] tipc_send_packet+0x1c/0x20 [tipc]
[2102.440074]  [&lt;ffffffff81502478&gt;] sock_sendmsg+0xa8/0xd0
[2102.440074]  [&lt;ffffffff81507895&gt;] ? release_sock+0x145/0x170
[2102.440074]  [&lt;ffffffff815030d8&gt;] ___sys_sendmsg+0x3d8/0x3e0
[2102.440074]  [&lt;ffffffff816426ae&gt;] ? _raw_spin_unlock+0xe/0x10
[2102.440074]  [&lt;ffffffff81115c2a&gt;] ? handle_mm_fault+0x6ca/0x9d0
[2102.440074]  [&lt;ffffffff8107dd65&gt;] ? set_next_entity+0x85/0xa0
[2102.440074]  [&lt;ffffffff816426de&gt;] ? _raw_spin_unlock_irq+0xe/0x20
[2102.440074]  [&lt;ffffffff8107463c&gt;] ? finish_task_switch+0x5c/0xc0
[2102.440074]  [&lt;ffffffff8163ea8c&gt;] ? __schedule+0x34c/0x950
[2102.440074]  [&lt;ffffffff81504e12&gt;] __sys_sendmsg+0x42/0x80
[2102.440074]  [&lt;ffffffff81504e62&gt;] SyS_sendmsg+0x12/0x20
[2102.440074]  [&lt;ffffffff8164aed2&gt;] system_call_fastpath+0x16/0x1b

In this commit, we always maintain the skb list on the stack.
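
The difference between the two approaches can be sketched in Python
with a deterministic stand-in for the race. The function names are
hypothetical, and the while_waiting callback merely models whatever
another thread does while the socket lock is released during
congestion.

```python
from collections import deque


def send_shared(sock_queue, msgs, while_waiting):
    """Reverted pattern: the chain lives in a queue shared via the socket."""
    sock_queue.extend(msgs)
    while_waiting()  # socket lock is released here during congestion
    # After waking up, the head of the chain may already be gone.
    return sock_queue[0] if sock_queue else None


def send_local(msgs, while_waiting):
    """Pattern after the revert: the chain is local to this call."""
    pktchain = deque(msgs)
    while_waiting()  # other threads cannot reach pktchain
    return pktchain[0]
```

If while_waiting drains the shared queue, send_shared() finds no head
buffer (returned as None here, where the kernel dereferenced a NULL
pointer), whereas send_local() still sees its own first buffer.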

Signed-off-by: Parthasarathy Bhuvaragan &lt;parthasarathy.bhuvaragan@ericsson.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Acked-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
