<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/tipc/msg.h, branch v4.20</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.20</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.20'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2018-09-29T18:24:22Z</updated>
<entry>
<title>tipc: buffer overflow handling in listener socket</title>
<updated>2018-09-29T18:24:22Z</updated>
<author>
<name>Tung Nguyen</name>
<email>tung.q.nguyen@dektech.com.au</email>
</author>
<published>2018-09-28T18:23:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6787927475e52f6933e3affce365dabb2aa2fadf'/>
<id>urn:sha1:6787927475e52f6933e3affce365dabb2aa2fadf</id>
<content type='text'>
Default socket receive buffer size for a listener socket is 2Mb. For
each arriving empty SYN, the linux kernel allocates a 768 bytes buffer.
This means that a listener socket can serve maximum 2700 simultaneous
empty connection setup requests before it hits a receive buffer
overflow, and much fewer if the SYN is carrying any significant
amount of data.

When this happens the setup request is rejected, and the client
receives an ECONNREFUSED error.

This commit mitigates this problem by letting the client socket try to
retransmit the SYN message multiple times when it sees it rejected with
the code TIPC_ERR_OVERLOAD. Retransmission is done at random intervals
in the range of [100 ms, setup_timeout / 4], as many times as there is
room for within the setup timeout limit.

Signed-off-by: Tung Nguyen &lt;tung.q.nguyen@dektech.com.au&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: add SYN bit to connection setup messages</title>
<updated>2018-09-29T18:24:22Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2018-09-28T18:23:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=25b9221b959483f17c2964d0922869e16caa86b5'/>
<id>urn:sha1:25b9221b959483f17c2964d0922869e16caa86b5</id>
<content type='text'>
Messages intended for intitating a connection are currently
indistinguishable from regular datagram messages. The TIPC
protocol specification defines bit 17 in word 0 as a SYN bit
to allow sanity check of such messages in the listening socket,
but this has so far never been implemented.

We do that in this commit.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: handle collisions of 32-bit node address hash values</title>
<updated>2018-03-23T17:12:18Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2018-03-22T19:42:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=25b0b9c4e835ffaa65b61c3efe2e28acf84d0259'/>
<id>urn:sha1:25b0b9c4e835ffaa65b61c3efe2e28acf84d0259</id>
<content type='text'>
When a 32-bit node address is generated from a 128-bit identifier,
there is a risk of collisions which must be discovered and handled.

We do this as follows:
- We don't apply the generated address immediately to the node, but do
  instead initiate a 1 sec trial period to allow other cluster members
  to discover and handle such collisions.

- During the trial period the node periodically sends out a new type
  of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
  to all the other nodes in the cluster.

- When a node is receiving such a message, it must check that the
  presented 32-bit identifier either is unused, or was used by the very
  same peer in a previous session. In both cases it accepts the request
  by not responding to it.

- If it finds that the same node has been up before using a different
  address, it responds with a DSC_TRIAL_FAIL_MSG containing that
  address.

- If it finds that the address has already been taken by some other
  node, it generates a new, unused address and returns it to the
  requester.

- During the trial period the requesting node must always be prepared
  to accept a failure message, i.e., a message where a peer suggests a
  different (or equal)  address to the one tried. In those cases it
  must apply the suggested value as trial address and restart the trial
  period.

This algorithm ensures that in the vast majority of cases a node will
have the same address before and after a reboot. If a legacy user
configures the address explicitly, there will be no trial period and
messages, so this protocol addition is completely backwards compatible.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: fall back to smaller MTU if allocation of local send skb fails</title>
<updated>2017-12-01T20:21:25Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2017-11-30T15:47:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4c94cc2d3d57a2e843ab10887f67faa82c2337f9'/>
<id>urn:sha1:4c94cc2d3d57a2e843ab10887f67faa82c2337f9</id>
<content type='text'>
When sending node local messages the code is using an 'mtu' of 66060
bytes to avoid unnecessary fragmentation. During situations of low
memory tipc_msg_build() may sometimes fail to allocate such large
buffers, resulting in unnecessary send failures. This can easily be
remedied by falling back to a smaller MTU, and then reassemble the
buffer chain as if the message were arriving from a remote node.

At the same time, we change the initial MTU setting of the broadcast
link to a lower value, so that large messages always are fragmented
into smaller buffers even when we run in single node mode. Apart from
obtaining the same advantage as for the 'fallback' solution above, this
turns out to give a significant performance improvement. This can
probably be explained with the __pskb_copy() operation performed on the
buffer for each recipient during reception. We found the optimal value
for this, considering the most relevant skb pool, to be 3744 bytes.

Acked-by: Ying Xue &lt;ying.xue@ericsson.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: enforce valid ratio between skb truesize and contents</title>
<updated>2017-11-16T01:49:00Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2017-11-15T20:23:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d618d09a68e4eed7a435beb2e355250f6f40664a'/>
<id>urn:sha1:d618d09a68e4eed7a435beb2e355250f6f40664a</id>
<content type='text'>
The socket level flow control is based on the assumption that incoming
buffers meet the condition (skb-&gt;truesize / roundup(skb-&gt;len) &lt;= 4),
where the latter value is rounded off upwards to the nearest 1k number.
This does empirically hold true for the device drivers we know, but we
cannot trust that it will always be so, e.g., in a system with jumbo
frames and very small packets.

We now introduce a check for this condition at packet arrival, and if
we find it to be false, we copy the packet to a new, smaller buffer,
where the condition will be true. We expect this to affect only a small
fraction of all incoming packets, if at all.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: improve link resiliency when rps is activated</title>
<updated>2017-11-11T06:36:05Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2017-11-08T08:59:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8d6e79d3ce13e34957de87f7584cbf1bcde74c57'/>
<id>urn:sha1:8d6e79d3ce13e34957de87f7584cbf1bcde74c57</id>
<content type='text'>
Currently, the TIPC RPS dissector is based only on the incoming packets'
source node address, hence steering all traffic from a node to the same
core. We have seen that this makes the links vulnerable to starvation
and unnecessary resets when we turn down the link tolerance to very low
values.

To reduce the risk of this happening, we exempt probe and probe replies
packets from the convergence to one core per source node. Instead, we do
the opposite, - we try to diverge those packets across as many cores as
possible, by randomizing the flow selector key.

To make such packets identifiable to the dissector, we add a new
'is_keepalive' bit to word 0 of the LINK_PROTOCOL header. This bit is
set both for PROBE and PROBE_REPLY messages, and only for those.

It should be noted that these packets are not part of any flow anyway,
and only constitute a minuscule fraction of all packets sent across a
link. Hence, there is no risk that this will affect overall performance.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: add multipoint-to-point flow control</title>
<updated>2017-10-13T15:46:01Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2017-10-13T09:04:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04d7b574b245c66001a33cb9da2c0311063af73f'/>
<id>urn:sha1:04d7b574b245c66001a33cb9da2c0311063af73f</id>
<content type='text'>
We already have point-to-multipoint flow control within a group. But
we even need the opposite; -a scheme which can handle that potentially
hundreds of sources may try to send messages to the same destination
simultaneously without causing buffer overflow at the recipient. This
commit adds such a mechanism.

The algorithm works as follows:

- When a member detects a new, joining member, it initially set its
  state to JOINED and advertises a minimum window to the new member.
  This window is chosen so that the new member can send exactly one
  maximum sized message, or several smaller ones, to the recipient
  before it must stop and wait for an additional advertisement. This
  minimum window ADV_IDLE is set to 65 1kB blocks.

- When a member receives the first data message from a JOINED member,
  it changes the state of the latter to ACTIVE, and advertises a larger
  window ADV_ACTIVE = 12 x ADV_IDLE blocks to the sender, so it can
  continue sending with minimal disturbances to the data flow.

- The active members are kept in a dedicated linked list. Each time a
  message is received from an active member, it will be moved to the
  tail of that list. This way, we keep a record of which members have
  been most (tail) and least (head) recently active.

- There is a maximum number (16) of permitted simultaneous active
  senders per receiver. When this limit is reached, the receiver will
  not advertise anything immediately to a new sender, but instead put
  it in a PENDING state, and add it to a corresponding queue. At the
  same time, it will pick the least recently active member, send it an
  advertisement RECLAIM message, and set this member to state
  RECLAIMING.

- The reclaimee member has to respond with a REMIT message, meaning that
  it goes back to a send window of ADV_IDLE, and returns its unused
  advertised blocks beyond that value to the reclaiming member.

- When the reclaiming member receives the REMIT message, it unlinks
  the reclaimee from its active list, resets its state to JOINED, and
  notes that it is now back at ADV_IDLE advertised blocks to that
  member. If there are still unread data messages sent out by
  reclaimee before the REMIT, the member goes into an intermediate
  state REMITTED, where it stays until the said messages have been
  consumed.

- The returned advertised blocks can now be re-advertised to the
  pending member, which is now set to state ACTIVE and added to
  the active member list.

- To be proactive, i.e., to minimize the risk that any member will
  end up in the pending queue, we start reclaiming resources already
  when the number of active members exceeds 3/4 of the permitted
  maximum.

Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: guarantee that group broadcast doesn't bypass group unicast</title>
<updated>2017-10-13T15:46:01Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2017-10-13T09:04:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2f487712b89376fce267223bbb0db93d393d4b09'/>
<id>urn:sha1:2f487712b89376fce267223bbb0db93d393d4b09</id>
<content type='text'>
We need a mechanism guaranteeing that group unicasts sent out from a
socket are not bypassed by later sent broadcasts from the same socket.
We do this as follows:

- Each time a unicast is sent, we set a the broadcast method for the
  socket to "replicast" and "mandatory". This forces the first
  subsequent broadcast message to follow the same network and data path
  as the preceding unicast to a destination, hence preventing it from
  overtaking the latter.

- In order to make the 'same data path' statement above true, we let
  group unicasts pass through the multicast link input queue, instead
  of as previously through the unicast link input queue.

- In the first broadcast following a unicast, we set a new header flag,
  requiring all recipients to immediately acknowledge its reception.

- During the period before all the expected acknowledges are received,
  the socket refuses to accept any more broadcast attempts, i.e., by
  blocking or returning EAGAIN. This period should typically not be
  longer than a few microseconds.

- When all acknowledges have been received, the sending socket will
  open up for subsequent broadcasts, this time giving the link layer
  freedom to itself select the best transmission method.

- The forced and/or abrupt transmission method changes described above
  may lead to broadcasts arriving out of order to the recipients. We
  remedy this by introducing code that checks and if necessary
  re-orders such messages at the receiving end.

Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: introduce group multicast messaging</title>
<updated>2017-10-13T15:46:01Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2017-10-13T09:04:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5b8dddb63769587badc50725ec9857caaeba4de0'/>
<id>urn:sha1:5b8dddb63769587badc50725ec9857caaeba4de0</id>
<content type='text'>
The previously introduced message transport to all group members is
based on the tipc multicast service, but is logically a broadcast
service within the group, and that is what we call it.

We now add functionality for sending messages to all group members
having a certain identity. Correspondingly, we call this feature 'group
multicast'. The service is using unicast when only one destination is
found, otherwise it will use the bearer broadcast service to transfer
the messages. In the latter case, the receiving members filter arriving
messages by looking at the intended destination instance. If there is
no match, the message will be dropped, while still being considered
received and read as seen by the flow control mechanism.

Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: introduce group unicast messaging</title>
<updated>2017-10-13T15:46:00Z</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2017-10-13T09:04:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=27bd9ec027f396457d1a147043c92ff22fc4c71e'/>
<id>urn:sha1:27bd9ec027f396457d1a147043c92ff22fc4c71e</id>
<content type='text'>
We now make it possible to send connectionless unicast messages
within a communication group. To send a message, the sender can use
either a direct port address, aka port identity, or an indirect port
name to be looked up.

This type of messages are subject to the same start synchronization
and flow control mechanism as group broadcast messages.

Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
