<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/tipc/node.c, branch v4.20</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.20</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.20'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2018-11-28T00:30:39Z</updated>
<entry>
<title>tipc: fix lockdep warning during node delete</title>
<updated>2018-11-28T00:30:39Z</updated>
<author>
<name>Jon Maloy</name>
<email>donmalo99@gmail.com</email>
</author>
<published>2018-11-26T17:26:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec835f891232d7763dea9da0358f31e24ca6dfb7'/>
<id>urn:sha1:ec835f891232d7763dea9da0358f31e24ca6dfb7</id>
<content type='text'>
We see the following lockdep warning:

[ 2284.078521] ======================================================
[ 2284.078604] WARNING: possible circular locking dependency detected
[ 2284.078604] 4.19.0+ #42 Tainted: G            E
[ 2284.078604] ------------------------------------------------------
[ 2284.078604] rmmod/254 is trying to acquire lock:
[ 2284.078604] 00000000acd94e28 ((&amp;n-&gt;timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
[ 2284.078604]
[ 2284.078604] but task is already holding lock:
[ 2284.078604] 00000000f997afc0 (&amp;(&amp;tn-&gt;node_list_lock)-&gt;rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
[ 2284.078604]
[ 2284.078604] which lock already depends on the new lock.
[ 2284.078604]
[ 2284.078604]
[ 2284.078604] the existing dependency chain (in reverse order) is:
[ 2284.078604]
[ 2284.078604] -&gt; #1 (&amp;(&amp;tn-&gt;node_list_lock)-&gt;rlock){+.-.}:
[ 2284.078604]        tipc_node_timeout+0x20a/0x330 [tipc]
[ 2284.078604]        call_timer_fn+0xa1/0x280
[ 2284.078604]        run_timer_softirq+0x1f2/0x4d0
[ 2284.078604]        __do_softirq+0xfc/0x413
[ 2284.078604]        irq_exit+0xb5/0xc0
[ 2284.078604]        smp_apic_timer_interrupt+0xac/0x210
[ 2284.078604]        apic_timer_interrupt+0xf/0x20
[ 2284.078604]        default_idle+0x1c/0x140
[ 2284.078604]        do_idle+0x1bc/0x280
[ 2284.078604]        cpu_startup_entry+0x19/0x20
[ 2284.078604]        start_secondary+0x187/0x1c0
[ 2284.078604]        secondary_startup_64+0xa4/0xb0
[ 2284.078604]
[ 2284.078604] -&gt; #0 ((&amp;n-&gt;timer)#2){+.-.}:
[ 2284.078604]        del_timer_sync+0x34/0xa0
[ 2284.078604]        tipc_node_delete+0x1a/0x40 [tipc]
[ 2284.078604]        tipc_node_stop+0xcb/0x190 [tipc]
[ 2284.078604]        tipc_net_stop+0x154/0x170 [tipc]
[ 2284.078604]        tipc_exit_net+0x16/0x30 [tipc]
[ 2284.078604]        ops_exit_list.isra.8+0x36/0x70
[ 2284.078604]        unregister_pernet_operations+0x87/0xd0
[ 2284.078604]        unregister_pernet_subsys+0x1d/0x30
[ 2284.078604]        tipc_exit+0x11/0x6f2 [tipc]
[ 2284.078604]        __x64_sys_delete_module+0x1df/0x240
[ 2284.078604]        do_syscall_64+0x66/0x460
[ 2284.078604]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2284.078604]
[ 2284.078604] other info that might help us debug this:
[ 2284.078604]
[ 2284.078604]  Possible unsafe locking scenario:
[ 2284.078604]
[ 2284.078604]        CPU0                    CPU1
[ 2284.078604]        ----                    ----
[ 2284.078604]   lock(&amp;(&amp;tn-&gt;node_list_lock)-&gt;rlock);
[ 2284.078604]                                lock((&amp;n-&gt;timer)#2);
[ 2284.078604]                                lock(&amp;(&amp;tn-&gt;node_list_lock)-&gt;rlock);
[ 2284.078604]   lock((&amp;n-&gt;timer)#2);
[ 2284.078604]
[ 2284.078604]  *** DEADLOCK ***
[ 2284.078604]
[ 2284.078604] 3 locks held by rmmod/254:
[ 2284.078604]  #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
[ 2284.078604]  #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
[ 2284.078604]  #2: 00000000f997afc0 (&amp;(&amp;tn-&gt;node_list_lock)-&gt;rlock){+.-.}, at: tipc_node_stop+0xac/0x19
[...}

The reason is that the node timer handler sometimes needs to delete a
node which has been disconnected for too long. To do this, it grabs
the lock 'node_list_lock', which may at the same time be held by the
generic node cleanup function, tipc_node_stop(), during module removal.
Since the latter is calling del_timer_sync() inside the same lock, we
have a potential deadlock.

We fix this letting the timer cleanup function use spin_trylock()
instead of just spin_lock(), and when it fails to grab the lock it
just returns so that the timer handler can terminate its execution.
This is safe to do, since tipc_node_stop() anyway is about to
delete both the timer and the node instance.

Fixes: 6a939f365bdb ("tipc: Auto removal of peer down node instance")
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: ignore STATE_MSG on wrong link session</title>
<updated>2018-10-02T05:35:30Z</updated>
<author>
<name>LUU Duc Canh</name>
<email>canh.d.luu@dektech.com.au</email>
</author>
<published>2018-09-26T20:28:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d949cfedbcbab4e91590576cbace2671924ad69c'/>
<id>urn:sha1:d949cfedbcbab4e91590576cbace2671924ad69c</id>
<content type='text'>
The initial session number when a link is created is based on a random
value, taken from struct tipc_net-&gt;random. It is then incremented for
each link reset to avoid mixing protocol messages from different link
sessions.

However, when a bearer is reset all its links are deleted, and will
later be re-created using the same random value as the first time.
This means that if the link never went down between creation and
deletion we will still sometimes have two subsequent sessions with
the same session number. In virtual environments with potentially
long transmission times this has turned out to be a real problem.

We now fix this by randomizing the session number each time a link
is created.

With a session number size of 16 bits this gives a risk of session
collision of 1/64k. To reduce this further, we also introduce a sanity
check on the very first STATE message arriving at a link. If this has
an acknowledge value differing from 0, which is logically impossible,
we ignore the message. The final risk for session collision is hence
reduced to 1/4G, which should be sufficient.

Signed-off-by: LUU Duc Canh &lt;canh.d.luu@dektech.com.au&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: fix failover problem</title>
<updated>2018-09-29T18:45:14Z</updated>
<author>
<name>LUU Duc Canh</name>
<email>canh.d.luu@dektech.com.au</email>
</author>
<published>2018-09-26T19:00:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c140eb166d681f66bd7e99fb121357db1a503e7f'/>
<id>urn:sha1:c140eb166d681f66bd7e99fb121357db1a503e7f</id>
<content type='text'>
We see the following scenario:
1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
   Since there is a second working link, failover procedure is started.
2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
   A on node 2. The node item 1-&gt;2 goes to state FAILINGOVER.
3) Linke endpoint A/2 receives the failover, and is supposed to take
   down its parallell link endpoint B/2, while producing a FAILOVER
   message to send back to A/1.
4) However, B/2 has already been deleted, so no FAILOVER message can
   created.
5) Node 1-&gt;2 remains in state FAILINGOVER forever, refusing to receive
   any messages that can bring B/1 up again. We are left with a non-
   redundant link between node 1 and 2.

We fix this with letting endpoint A/2 build a dummy FAILOVER message
to send to back to A/1, so that the situation can be resolved.

Signed-off-by: LUU Duc Canh &lt;canh.d.luu@dektech.com.au&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux</title>
<updated>2018-07-21T04:17:12Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2018-07-20T21:45:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c4c5551df136a7c4edd7c2f433d9a296b39826a2'/>
<id>urn:sha1:c4c5551df136a7c4edd7c2f433d9a296b39826a2</id>
<content type='text'>
All conflicts were trivial overlapping changes, so reasonably
easy to resolve.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: make link capability update thread safe</title>
<updated>2018-07-20T19:36:13Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2018-07-18T17:50:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=40999f11ce677ce3c5d0e8f5f76c40192a26b479'/>
<id>urn:sha1:40999f11ce677ce3c5d0e8f5f76c40192a26b479</id>
<content type='text'>
The commit referred to below introduced an update of the link
capabilities field that is not safe. Given the recently added
feature to remove idle node and link items after 5 minutes, there
is a small risk that the update will happen at the very moment the
targeted link is being removed. To avoid this we have to perform
the update inside the node item's write lock protection.

Fixes: 9012de508956 ("tipc: add sequence number check for link STATE messages")
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: check session number before accepting link protocol messages</title>
<updated>2018-07-12T06:06:14Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2018-07-09T23:07:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7ea817f4e8322fa27fb860d15025bf72f68b179f'/>
<id>urn:sha1:7ea817f4e8322fa27fb860d15025bf72f68b179f</id>
<content type='text'>
In some virtual environments we observe a significant higher number of
packet reordering and delays than we have been used to traditionally.

This makes it necessary with stricter checks on incoming link protocol
messages' session number, which until now only has been validated for
RESET messages.

Since the other two message types, ACTIVATE and STATE messages also
carry this number, it is easy to extend the validation check to those
messages.

We also introduce a flag indicating if a link has a valid peer session
number or not. This eliminates the mixing of 32- and 16-bit arithmethics
we are currently using to achieve this.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: add sequence number check for link STATE messages</title>
<updated>2018-07-12T06:06:14Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2018-07-09T23:07:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9012de5089560136b849b920ad038b96160ed8f6'/>
<id>urn:sha1:9012de5089560136b849b920ad038b96160ed8f6</id>
<content type='text'>
Some switch infrastructures produce huge amounts of packet duplicates.
This becomes a problem if those messages are STATE/NACK protocol
messages, causing unnecessary retransmissions of already accepted
packets.

We now introduce a unique sequence number per STATE protocol message
so that duplicates can be identified and ignored. This will also be
useful when tracing such cases, and to avert replay attacks when TIPC
is encrypted.

For compatibility reasons we have to introduce a new capability flag
TIPC_LINK_PROTO_SEQNO to handle this new feature.

Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: fix wrong return value from function tipc_node_try_addr()</title>
<updated>2018-07-07T10:49:01Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2018-07-06T18:10:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2a57f182420174c7fd4b19db979a2d135231a963'/>
<id>urn:sha1:2a57f182420174c7fd4b19db979a2d135231a963</id>
<content type='text'>
The function for checking if there is an node address conflict is
supposed to return a suggestion for a new address if it finds a
conflict, and zero otherwise. But in case the peer being checked
is previously unknown it does instead return a "suggestion" for
the checked address itself. This results in a DSC_TRIAL_FAIL_MSG
being sent unecessarily to the peer, and sometimes makes the trial
period starting over again.

Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: Auto removal of peer down node instance</title>
<updated>2018-06-30T12:05:23Z</updated>
<author>
<name>GhantaKrishnamurthy MohanKrishna</name>
<email>mohan.krishna.ghanta.krishnamurthy@ericsson.com</email>
</author>
<published>2018-06-29T11:23:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6a939f365bdb03a74b4617bdb4402fc08da088b9'/>
<id>urn:sha1:6a939f365bdb03a74b4617bdb4402fc08da088b9</id>
<content type='text'>
A peer node is considered down if there are no
active links (or) lost contact to the node. In current implementation,
a peer node instance is deleted either if

a) TIPC module is removed (or)
b) Application can use a netlink/iproute2 interface to delete a
specific down node.

Thus, a down node instance lives in the system forever, unless the
application explicitly removes it.

We fix this by deleting the nodes which are down for
a specified amount of time (5 minutes).
Existing node supervision timer is used to achieve this.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Acked-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: GhantaKrishnamurthy MohanKrishna &lt;mohan.krishna.ghanta.krishnamurthy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: optimize function tipc_node_timeout()</title>
<updated>2018-06-30T11:51:39Z</updated>
<author>
<name>Tung Nguyen</name>
<email>tung.q.nguyen@dektech.com.au</email>
</author>
<published>2018-06-28T20:39:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=759f29b62fb9af5274e7f761f9f4cdfa7bb5a1f2'/>
<id>urn:sha1:759f29b62fb9af5274e7f761f9f4cdfa7bb5a1f2</id>
<content type='text'>
In single-link usage, the function tipc_node_timeout() still iterates
over the whole link array to handle each link. Given that the maximum
number of bearers are 3, there are 2 redundant iterations with lock
grab/release. Since this function is executing very frequently it makes
sense to optimize it.

This commit adds conditional checking to exit from the loop if the
known number of configured links has already been accessed.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Tung Nguyen &lt;tung.q.nguyen@dektech.com.au&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
