<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/tipc/msg.h, branch v4.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-10-29T21:21:09Z</updated>
<entry>
<title>tipc: fix broadcast link synchronization problem</title>
<updated>2016-10-29T21:21:09Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-10-27T22:51:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=06bd2b1ed04ca9fdbc767859885944a1e8b86b40'/>
<id>urn:sha1:06bd2b1ed04ca9fdbc767859885944a1e8b86b40</id>
<content type='text'>
In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
criteria") we tried to fix a problem with the initial synchronization
of broadcast link acknowledge values. Unfortunately that solution is
not sufficient to solve the issue.

We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
broadcast acknowledge numbers, leading to premature opening of the
broadcast link. When the bypassed packets finally arrive, they are
inadvertently accepted, and the already correctly initialized
acknowledge number in the broadcast receive link is overwritten by
the invalid (zero) value of the said packets. After this the broadcast
link goes stale.

We now fix this by marking the packets where we know the acknowledge
value is or may be invalid, and then ignoring the acks from those.

To this purpose, we claim an unused bit in the header to indicate that
the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
packets, plus those LINK_PROTOCOL packets sent out before the broadcast
links are fully synchronized.

This minor protocol update is fully backwards compatible.

Reported-by: John Thompson &lt;thompa.atl@gmail.com&gt;
Tested-by: John Thompson &lt;thompa.atl@gmail.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: transfer broadcast nacks in link state messages</title>
<updated>2016-09-03T00:10:24Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-09-01T17:52:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02d11ca20091fcef904f05defda80c53e5b4e793'/>
<id>urn:sha1:02d11ca20091fcef904f05defda80c53e5b4e793</id>
<content type='text'>
When we send broadcasts in clusters of more 70-80 nodes, we sometimes
see the broadcast link resetting because of an excessive number of
retransmissions. This is caused by a combination of two factors:

1) A 'NACK crunch", where loss of broadcast packets is discovered
   and NACK'ed by several nodes simultaneously, leading to multiple
   redundant broadcast retransmissions.

2) The fact that the NACKS as such also are sent as broadcast, leading
   to excessive load and packet loss on the transmitting switch/bridge.

This commit deals with the latter problem, by moving sending of
broadcast nacks from the dedicated BCAST_PROTOCOL/NACK message type
to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused
bits in word 8 of the said message for this purpose, and introduce a
new capability bit, TIPC_BCAST_STATE_NACK in order to keep the change
backwards compatible.

Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: unclone unbundled buffers before forwarding</title>
<updated>2016-06-22T20:33:35Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-06-20T13:20:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=27777daa8b6df0c19aaf591d1536a586b3eb5e36'/>
<id>urn:sha1:27777daa8b6df0c19aaf591d1536a586b3eb5e36</id>
<content type='text'>
When extracting an individual message from a received "bundle" buffer,
we just create a clone of the base buffer, and adjust it to point into
the right position of the linearized data area of the latter. This works
well for regular message reception, but during periods of extremely high
load it may happen that an extracted buffer, e.g, a connection probe, is
reversed and forwarded through an external interface while the preceding
extracted message is still unhandled. When this happens, the header or
data area of the preceding message will be partially overwritten by a
MAC header, leading to unpredicatable consequences, such as a link
reset.

We now fix this by ensuring that the msg_reverse() function never
returns a cloned buffer, and that the returned buffer always contains
sufficient valid head and tail room to be forwarded.

Reported-by: Erik Hugne &lt;erik.hugne@gmail.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: redesign connection-level flow control</title>
<updated>2016-05-03T19:51:16Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-05-02T15:58:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=10724cc7bb7832b482df049c20fd824d928c5eaa'/>
<id>urn:sha1:10724cc7bb7832b482df049c20fd824d928c5eaa</id>
<content type='text'>
There are two flow control mechanisms in TIPC; one at link level that
handles network congestion, burst control, and retransmission, and one
at connection level which' only remaining task is to prevent overflow
in the receiving socket buffer. In TIPC, the latter task has to be
solved end-to-end because messages can not be thrown away once they
have been accepted and delivered upwards from the link layer, i.e, we
can never permit the receive buffer to overflow.

Currently, this algorithm is message based. A counter in the receiving
socket keeps track of number of consumed messages, and sends a dedicated
acknowledge message back to the sender for each 256 consumed message.
A counter at the sending end keeps track of the sent, not yet
acknowledged messages, and blocks the sender if this number ever reaches
512 unacknowledged messages. When the missing acknowledge arrives, the
socket is then woken up for renewed transmission. This works well for
keeping the message flow running, as it almost never happens that a
sender socket is blocked this way.

A problem with the current mechanism is that it potentially is very
memory consuming. Since we don't distinguish between small and large
messages, we have to dimension the socket receive buffer according
to a worst-case of both. I.e., the window size must be chosen large
enough to sustain a reasonable throughput even for the smallest
messages, while we must still consider a scenario where all messages
are of maximum size. Hence, the current fix window size of 512 messages
and a maximum message size of 66k results in a receive buffer of 66 MB
when truesize(66k) = 131k is taken into account. It is possible to do
much better.

This commit introduces an algorithm where we instead use 1024-byte
blocks as base unit. This unit, always rounded upwards from the
actual message size, is used when we advertise windows as well as when
we count and acknowledge transmitted data. The advertised window is
based on the configured receive buffer size in such a way that even
the worst-case truesize/msgsize ratio always is covered. Since the
smallest possible message size (from a flow control viewpoint) now is
1024 bytes, we can safely assume this ratio to be less than four, which
is the value we are now using.

This way, we have been able to reduce the default receive buffer size
from 66 MB to 2 MB with maintained performance.

In order to keep this solution backwards compatible, we introduce a
new capability bit in the discovery protocol, and use this throughout
the message sending/reception path to always select the right unit.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: guarantee peer bearer id exchange after reboot</title>
<updated>2016-04-15T20:09:05Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-04-15T17:33:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=634696b197411e7a95b346d6e5c21841f29fcedd'/>
<id>urn:sha1:634696b197411e7a95b346d6e5c21841f29fcedd</id>
<content type='text'>
When a link endpoint is going down locally, e.g., because its interface
is being stopped, it will spontaneously send out a RESET message to
its peer, informing it about this fact. This saves the peer from
detecting the failure via probing, and hence gives both speedier and
less resource consuming failure detection on the peer side.

According to the link FSM, a receiver of a RESET message, ignoring the
reason for it, must now consider the sender ready to come back up, and
starts periodically sending out ACTIVATE messages to the peer in order
to re-establish the link. Also, according to the FSM, the receiver of
an ACTIVATE message can now go directly to state ESTABLISHED and start
sending regular traffic packets. This is a well-proven and robust FSM.

However, in the case of a reboot, there is a small possibilty that link
endpoint on the rebooted node may have been re-created with a new bearer
identity between the moment it sent its (pre-boot) RESET and the moment
it receives the ACTIVATE from the peer. The new bearer identity cannot
be known by the peer according to this scenario, since traffic headers
don't convey such information. This is a problem, because both endpoints
need to know the correct value of the peer's bearer id at any moment in
time in order to be able to produce correct link events for their users.

The only way to guarantee this is to enforce a full setup message
exchange (RESET + ACTIVATE) even after the reboot, since those messages
carry the bearer idientity in their header.

In this commit we do this by introducing and setting a "stopping" bit in
the header of the spontaneously generated RESET messages, informing the
peer that the sender will not be immediately ready to re-establish the
link. A receiver seeing this bit must act as if this were a locally
detected connectivity failure, and hence has to go through a full two-
way setup message exchange before any link can be re-established.

Although never reported, this problem seems to have always been around.

This protocol addition is fully backwards compatible.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: stricter filtering of packets in bearer layer</title>
<updated>2016-04-07T21:00:13Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-04-07T14:09:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5b7066c3dd24c7d538e5ee402eb24bb182c16dab'/>
<id>urn:sha1:5b7066c3dd24c7d538e5ee402eb24bb182c16dab</id>
<content type='text'>
Resetting a bearer/interface, with the consequence of resetting all its
pertaining links, is not an atomic action. This becomes particularly
evident in very large clusters, where a lot of traffic may happen on the
remaining links while we are busy shutting them down. In extreme cases,
we may even see links being re-created and re-established before we are
finished with the job.

To solve this, we now introduce a solution where we temporarily detach
the bearer from the interface when the bearer is reset. This inhibits
all packet reception, while sending still is possible. For the latter,
we use the fact that the device's user pointer now is zero to filter out
which packets can be sent during this situation; i.e., outgoing RESET
messages only.  This filtering serves to speed up the neighbors'
detection of the loss event, and saves us from unnecessary probing.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: let broadcast packet reception use new link receive function</title>
<updated>2015-10-24T13:56:37Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2015-10-22T12:51:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5266698661401afc5e4a1a521cf9ba10724d10dd'/>
<id>urn:sha1:5266698661401afc5e4a1a521cf9ba10724d10dd</id>
<content type='text'>
The code path for receiving broadcast packets is currently distinct
from the unicast path. This leads to unnecessary code and data
duplication, something that can be avoided with some effort.

We now introduce separate per-peer tipc_link instances for handling
broadcast packet reception. Each receive link keeps a pointer to the
common, single, broadcast link instance, and can hence handle release
and retransmission of send buffers as if they belonged to the own
instance.

Furthermore, we let each unicast link instance keep a reference to both
the pertaining broadcast receive link, and to the common send link.
This makes it possible for the unicast links to easily access data for
broadcast link synchronization, as well as for carrying acknowledges for
received broadcast packets.

Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: let broadcast transmission use new link transmit function</title>
<updated>2015-10-24T13:56:32Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2015-10-22T12:51:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2f566124570625c29c3fd79bac4d9cd97c0c31a1'/>
<id>urn:sha1:2f566124570625c29c3fd79bac4d9cd97c0c31a1</id>
<content type='text'>
This commit simplifies the broadcast link transmission function, by
leveraging previous changes to the link transmission function and the
broadcast transmission link life cycle.

Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: make struct tipc_link generic to support broadcast</title>
<updated>2015-10-24T13:56:32Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2015-10-22T12:51:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c1ab3f1dea3df566ad38caf98baf69c656679090'/>
<id>urn:sha1:c1ab3f1dea3df566ad38caf98baf69c656679090</id>
<content type='text'>
Realizing that unicast is just a special case of broadcast, we also see
that we can go in the other direction, i.e., that modest changes to the
current unicast link can make it generic enough to support broadcast.

The following changes are introduced here:

- A new counter ("ackers") in struct tipc_link, to indicate how many
  peers need to ack a packet before it can be released.
- A corresponding counter in the skb user area, to keep track of how
  many peers a are left to ack before a buffer can be released.
- A new counter ("acked"), to keep persistent track of how far a peer
  has acked at the moment, i.e., where in the transmission queue to
  start updating buffers when the next ack arrives. This is to avoid
  double acknowledgements from a peer, with inadvertent relase of
  packets as a result.
- A more generic tipc_link_retrans() function, where retransmit starts
  from a given sequence number, instead of the first packet in the
  transmision queue. This is to minimize the number of retransmitted
  packets on the broadcast media.

When the new functionality is taken into use in the next commits,
we expect it to have minimal effect on unicast mode performance.

Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-10-20T13:08:27Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-10-20T13:08:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=26440c835f8b1a491e2704118ac55bf87334366c'/>
<id>urn:sha1:26440c835f8b1a491e2704118ac55bf87334366c</id>
<content type='text'>
Conflicts:
	drivers/net/usb/asix_common.c
	net/ipv4/inet_connection_sock.c
	net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
