<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/tipc/socket.h, branch v4.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.1</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-03-18T02:11:26Z</updated>
<entry>
<title>tipc: fix netns refcnt leak</title>
<updated>2015-03-18T02:11:26Z</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2015-03-18T01:32:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=76100a8a64bc2ae898bc49d51dd28c1f4f5ed37b'/>
<id>urn:sha1:76100a8a64bc2ae898bc49d51dd28c1f4f5ed37b</id>
<content type='text'>
When the TIPC module is loaded, we launch a topology server in kernel
space, which in its turn is creating TIPC sockets for communication
with topology server users. Because both the socket's creator and
provider reside in the same module, it is necessary that the TIPC
module's reference count remains zero after the server is started and
the socket created; otherwise it becomes impossible to perform "rmmod"
even on an idle module.

Currently, we achieve this by defining a separate "tipc_proto_kern"
protocol struct, that is used only for kernel space socket allocations.
This structure has the "owner" field set to NULL, which restricts the
module reference count from being be bumped when sk_alloc() for local
sockets is called. Furthermore, we have defined three kernel-specific
functions, tipc_sock_create_local(), tipc_sock_release_local() and
tipc_sock_accept_local(), to avoid the module counter being modified
when module local sockets are created or deleted. This has worked well
until we introduced name space support.

However, after name space support was introduced, we have observed that
a reference count leak occurs, because the netns counter is not
decremented in tipc_sock_delete_local().

This commit remedies this problem. But instead of just modifying
tipc_sock_delete_local(), we eliminate the whole parallel socket
handling infrastructure, and start using the regular sk_create_kern(),
kernel_accept() and sk_release_kernel() calls. Since those functions
manipulate the module counter, we must now compensate for that by
explicitly decrementing the counter after module local sockets are
created, and increment it just before calling sk_release_kernel().

Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace")
Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Reviewed-by: Jon Maloy &lt;jon.maloy@ericson.com&gt;
Reviewed-by: Erik Hugne &lt;erik.hugne@ericsson.com&gt;
Reported-by: Cong Wang &lt;cwang@twopensource.com&gt;
Tested-by: Erik Hugne &lt;erik.hugne@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: convert legacy nl socket dump to nl compat</title>
<updated>2015-02-09T21:20:48Z</updated>
<author>
<name>Richard Alpe</name>
<email>richard.alpe@ericsson.com</email>
</author>
<published>2015-02-09T08:50:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=487d2a3a1326d339ce273ffbcd03247f2b7b052e'/>
<id>urn:sha1:487d2a3a1326d339ce273ffbcd03247f2b7b052e</id>
<content type='text'>
Convert socket (port) listing to compat dumpit call. If a socket
(port) has publications a second dumpit call is issued to collect them
and format then into the legacy buffer before continuing to process
the sockets (ports).

Command converted in this patch:
TIPC_CMD_SHOW_PORTS

Signed-off-by: Richard Alpe &lt;richard.alpe@ericsson.com&gt;
Reviewed-by: Erik Hugne &lt;erik.hugne@ericsson.com&gt;
Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Reviewed-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: eliminate race condition at multicast reception</title>
<updated>2015-02-06T00:00:03Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2015-02-05T13:36:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cb1b728096f54e7408d60fb571944bed00c5b771'/>
<id>urn:sha1:cb1b728096f54e7408d60fb571944bed00c5b771</id>
<content type='text'>
In a previous commit in this series we resolved a race problem during
unicast message reception.

Here, we resolve the same problem at multicast reception. We apply the
same technique: an input queue serializing the delivery of arriving
buffers. The main difference is that here we do it in two steps.
First, the broadcast link feeds arriving buffers into the tail of an
arrival queue, which head is consumed at the socket level, and where
destination lookup is performed. Second, if the lookup is successful,
the resulting buffer clones are fed into a second queue, the input
queue. This queue is consumed at reception in the socket just like
in the unicast case. Both queues are protected by the same lock, -the
one of the input queue.

Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: simplify socket multicast reception</title>
<updated>2015-02-06T00:00:03Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2015-02-05T13:36:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3c724acdd5049907555a831f814bfd5927c3350c'/>
<id>urn:sha1:3c724acdd5049907555a831f814bfd5927c3350c</id>
<content type='text'>
The structure 'tipc_port_list' is used to collect port numbers
representing multicast destination socket on a receiving node.
The list is not based on a standard linked list, and is in reality
optimized for the uncommon case that there are more than one
multicast destinations per node. This makes the list handling
unecessarily complex, and as a consequence, even the socket
multicast reception becomes more complex.

In this commit, we replace 'tipc_port_list' with a new 'struct
tipc_plist', which is based on a standard list. We give the new
list stack (push/pop) semantics, someting that simplifies
the implementation of the function tipc_sk_mcast_rcv().

Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: resolve race problem at unicast message reception</title>
<updated>2015-02-06T00:00:02Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2015-02-05T13:36:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c637c1035534867b85b78b453c38c495b58e2c5a'/>
<id>urn:sha1:c637c1035534867b85b78b453c38c495b58e2c5a</id>
<content type='text'>
TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.

We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.

This solves the sequentiality problem, and seems to cause no measurable
performance degradation.

A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.

Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: make subscriber server support net namespace</title>
<updated>2015-01-12T21:24:33Z</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2015-01-09T07:27:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a62fbccecd62bacb4416fc427239f5b43b25d05e'/>
<id>urn:sha1:a62fbccecd62bacb4416fc427239f5b43b25d05e</id>
<content type='text'>
TIPC establishes one subscriber server which allows users to subscribe
their interesting name service status. After tipc supports namespace,
one dedicated tipc stack instance is created for each namespace, and
each instance can be deemed as one independent TIPC node. As a result,
subscriber server must be built for each namespace.

Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Tested-by: Tero Aho &lt;Tero.Aho@coriant.com&gt;
Reviewed-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: make tipc socket support net namespace</title>
<updated>2015-01-12T21:24:33Z</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2015-01-09T07:27:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e05b31f4bf8994d49322e9afb004ad479a129db0'/>
<id>urn:sha1:e05b31f4bf8994d49322e9afb004ad479a129db0</id>
<content type='text'>
Now tipc socket table is statically allocated as a global variable.
Through it, we can look up one socket instance with port ID, insert
a new socket instance to the table, and delete a socket from the
table. But when tipc supports net namespace, each namespace must own
its specific socket table. So the global variable of socket table
must be redefined in tipc_net structure. As a concequence, a new
socket table will be allocated when a new namespace is created, and
a socket table will be deallocated when namespace is destroyed.

Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Tested-by: Tero Aho &lt;Tero.Aho@coriant.com&gt;
Reviewed-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: make tipc node table aware of net namespace</title>
<updated>2015-01-12T21:24:32Z</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2015-01-09T07:27:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f2f9800d4955a96d92896841d8ba9b04201deaa1'/>
<id>urn:sha1:f2f9800d4955a96d92896841d8ba9b04201deaa1</id>
<content type='text'>
Global variables associated with node table are below:
- node table list (node_htable)
- node hash table list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)

To make node table support namespace, above global variables must be
moved to tipc_net structure in order to keep secret for different
namespaces. As a consequence, these variables are allocated and
initialized when namespace is created, and deallocated when namespace
is destroyed. After the change, functions associated with these
variables have to utilize a namespace pointer to access them. So
adding namespace pointer as a parameter of these functions is the
major change made in the commit.

Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Tested-by: Tero Aho &lt;Tero.Aho@coriant.com&gt;
Reviewed-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: cleanup core.c and core.h files</title>
<updated>2015-01-12T21:24:31Z</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2015-01-09T07:27:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=859fc7c0cedca0f84dac471fa31e9512259e1ecd'/>
<id>urn:sha1:859fc7c0cedca0f84dac471fa31e9512259e1ecd</id>
<content type='text'>
Only the works of initializing and shutting down tipc module are done
in core.h and core.c files, so all stuffs which are not closely
associated with the two tasks should be moved to appropriate places.

Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Tested-by: Tero Aho &lt;Tero.Aho@coriant.com&gt;
Reviewed-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: convert tipc reference table to use generic rhashtable</title>
<updated>2015-01-09T03:47:14Z</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2015-01-07T05:41:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=07f6c4bc048a7a8939c68a668bf77474890794c5'/>
<id>urn:sha1:07f6c4bc048a7a8939c68a668bf77474890794c5</id>
<content type='text'>
As tipc reference table is statically allocated, its memory size
requested on stack initialization stage is quite big even if the
maximum port number is just restricted to 8191 currently, however,
the number already becomes insufficient in practice. But if the
maximum ports is allowed to its theory value - 2^32, its consumed
memory size will reach a ridiculously unacceptable value. Apart from
this, heavy tipc users spend a considerable amount of time in
tipc_sk_get() due to the read-lock on ref_table_lock.

If tipc reference table is converted with generic rhashtable, above
mentioned both disadvantages would be resolved respectively: making
use of the new resizable hash table can avoid locking on the lookup;
smaller memory size is required at initial stage, for example, 256
hash bucket slots are requested at the beginning phase instead of
allocating the entire 8191 slots in old mode. The hash table will
grow if entries exceeds 75% of table size up to a total table size
of 1M, and it will automatically shrink if usage falls below 30%,
but the minimum table size is allowed down to 256.

Also converts ref_table_lock to a separate mutex to protect hash table
mutations on write side. Lastly defers the release of the socket
reference using call_rcu() to allow using an RCU read-side protected
call to rhashtable_lookup().

Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Acked-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Acked-by: Erik Hugne &lt;erik.hugne@ericsson.com&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
