<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/openvswitch/flow.h, branch v3.16</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.16</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.16'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-06-29T21:10:51Z</updated>
<entry>
<title>openvswitch: Fix tracking of flags seen in TCP flows.</title>
<updated>2014-06-29T21:10:51Z</updated>
<author>
<name>Ben Pfaff</name>
<email>blp@nicira.com</email>
</author>
<published>2014-05-06T23:48:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ad55200734c65a3ec5d0c39d6ea904008baea536'/>
<id>urn:sha1:ad55200734c65a3ec5d0c39d6ea904008baea536</id>
<content type='text'>
Flow statistics need to take into account the TCP flags from the packet
currently being processed (in 'key'), not the TCP flags matched by the
flow found in the kernel flow table (in 'flow').

This bug made the Open vSwitch userspace fin_timeout action have no effect
in many cases.
This bug was introduced by commit 88d73f6c411ac2f0578 (openvswitch: Use
TCP flags in the flow key for stats.)
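
The difference can be sketched in a few lines of Python.  This is only an
illustration under stated assumptions: FlowStats, the flag constants and
flow_entry_tcp_flags are hypothetical stand-ins, not the kernel structures.

```python
# Hypothetical stand-ins for the kernel structures; only TCP flags matter here.
FIN, SYN, ACK = 0x01, 0x02, 0x10


class FlowStats:
    def __init__(self):
        self.tcp_flags = 0  # union of the flags accumulated so far

    def update(self, tcp_flags):
        # OR the given flags into the running union.
        self.tcp_flags |= tcp_flags


packets = (SYN, ACK, FIN | ACK)  # flags carried by three packets of one flow
flow_entry_tcp_flags = ACK       # assumed flags stored in the matched flow entry

# Intended behaviour: accumulate the flags of each packet being processed.
stats = FlowStats()
for packet_flags in packets:
    stats.update(packet_flags)

# Behaviour described as the bug: accumulate the flow entry's flags instead.
buggy = FlowStats()
for _ in packets:
    buggy.update(flow_entry_tcp_flags)

fin_seen = (stats.tcp_flags | FIN) == stats.tcp_flags
fin_seen_buggy = (buggy.tcp_flags | FIN) == buggy.tcp_flags
```

In this sketch only the first variant ever records FIN, which is the kind of
information a FIN-based timeout depends on.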

Reported-by: Len Gao &lt;leng@vmware.com&gt;
Signed-off-by: Ben Pfaff &lt;blp@nicira.com&gt;
Acked-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Acked-by: Jesse Gross &lt;jesse@nicira.com&gt;
Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Fix ovs_flow_stats_get/clear RCU dereference.</title>
<updated>2014-05-22T23:27:35Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jrajahalme@nicira.com</email>
</author>
<published>2014-05-05T21:17:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=86ec8dbae27e5fa2b5d54f10f77286d9ef55732a'/>
<id>urn:sha1:86ec8dbae27e5fa2b5d54f10f77286d9ef55732a</id>
<content type='text'>
Using ovsl_dereference() in ovs_flow_stats_get() was wrong, since
flow dumps call this function with the RCU read lock held.

ovs_flow_stats_clear() is always called with ovs_mutex held, so it can
use ovsl_dereference().

Also, make the ovs_flow_stats_get() 'flow' argument const to make
later patches cleaner.

Signed-off-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Compact sw_flow_key.</title>
<updated>2014-05-22T23:27:34Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jrajahalme@nicira.com</email>
</author>
<published>2014-05-05T16:54:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1139e241ec436b9e9610c7a33ac5c6657f87fda1'/>
<id>urn:sha1:1139e241ec436b9e9610c7a33ac5c6657f87fda1</id>
<content type='text'>
Minimize padding in sw_flow_key and move 'tp' to the main struct.
These changes simplify code when accessing the transport port numbers
and the tcp flags, and make the sw_flow_key 8 bytes smaller on 64-bit
systems (128-&gt;120 bytes).  These changes also make the keys for IPv4
packets fit in one cache line.

There is a valid concern for safety of packing the struct
ovs_key_ipv4_tunnel, as it would be possible to take the address of
the tun_id member as a __be64 * which could result in unaligned access
in some systems. However:

- sw_flow_key itself is 64-bit aligned, so the tun_id within is always
  64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force every
  second tun_key to be misaligned).
- We never take the address of the tun_id into a __be64 *.
- Wherever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key,
  it is on the stack (in tunnel input functions), where the compiler
  has full control of the alignment.
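
As a generic illustration of how member order affects padding (not the
actual sw_flow_key layout), the following Python ctypes sketch compares two
orderings of the same members.  The struct and field names are hypothetical.

```python
import ctypes


class Padded(ctypes.Structure):
    # Hypothetical layout: two 1-byte members separated by an 8-byte member,
    # so the compiler-style layout has to pad after each small member.
    _fields_ = [
        ("proto", ctypes.c_uint8),
        ("tun_id", ctypes.c_uint64),
        ("ttl", ctypes.c_uint8),
    ]


class Compact(ctypes.Structure):
    # Same members with the widest one first; the small members now share
    # a single padded tail.
    _fields_ = [
        ("tun_id", ctypes.c_uint64),
        ("proto", ctypes.c_uint8),
        ("ttl", ctypes.c_uint8),
    ]


padded_size = ctypes.sizeof(Padded)
compact_size = ctypes.sizeof(Compact)
print(padded_size, compact_size)
```

The exact sizes depend on the platform ABI, but the reordered struct should
come out smaller on common 32-bit and 64-bit targets.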

Signed-off-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Per NUMA node flow stats.</title>
<updated>2014-05-16T20:40:29Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jrajahalme@nicira.com</email>
</author>
<published>2014-03-27T19:42:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63e7959c4b9bd6f791061c460a22d9ee32ae2240'/>
<id>urn:sha1:63e7959c4b9bd6f791061c460a22d9ee32ae2240</id>
<content type='text'>
Keep kernel flow stats for each NUMA node rather than each (logical)
CPU.  This avoids using the per-CPU allocator and removes most of the
kernel-side OVS locking overhead that otherwise shows up at the top of
perf reports, allowing OVS to scale better with a higher number of
threads.

With 9 handlers and 4 revalidators, the netperf TCP_CRR test flow setup
rate doubles on a server with two hyper-threaded physical CPUs (16
logical cores each) compared to the current OVS master.  Tested with a
non-trivial flow table with a TCP port match rule forcing all new
connections with unique port numbers to OVS userspace.  The IP
addresses are still wildcarded, so the kernel flows are not considered
exact-match 5-tuple flows.  Flows of this type can be expected to
appear in large numbers as the result of more effective wildcarding
made possible by improvements in the OVS userspace flow classifier.

Perf results for this test (master):

Events: 305K cycles
+   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
+   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
+   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
+   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
+   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
...

And after this patch:

Events: 356K cycles
+   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
+   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
+   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
+   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
+   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
...

There is a small increase in kernel spinlock overhead due to the same
spinlock being shared between multiple cores of the same physical CPU,
but that is barely visible in the netperf TCP_CRR test performance
(maybe ~1% performance drop, hard to tell exactly due to variance in
the test results) when testing for kernel module throughput (with no
userspace activity and a handful of kernel flows).

On flow setup, a single stats instance is allocated (for NUMA node
0).  As CPUs from multiple NUMA nodes start updating stats, new
NUMA-node specific stats instances are allocated.  This allocation on
the packet processing code path is made to never block or look for
emergency memory pools, minimizing the allocation latency.  If the
allocation fails, the existing preallocated stats instance is used.
Also, if only CPUs from one NUMA-node are updating the preallocated
stats instance, no additional stats instances are allocated.  This
eliminates the need to pre-allocate stats instances that will not be
used, also relieving the stats reader from the burden of reading stats
that are never used.
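
The allocation policy described above can be sketched as follows.  This is
a simplified, single-threaded Python model with no locking; Stats, Flow,
last_writer and try_alloc are hypothetical names chosen for illustration,
and try_alloc stands in for a non-blocking allocation that may fail.

```python
class Stats:
    """Counters for one NUMA node (hypothetical stand-in for the kernel struct)."""

    def __init__(self):
        self.packets = 0
        self.bytes = 0


class Flow:
    def __init__(self, num_nodes):
        # Flow setup preallocates a single instance, in the slot for node 0.
        self.stats = [Stats()] + [None] * (num_nodes - 1)
        self.last_writer = None  # node that last used the preallocated instance

    def update(self, node, length, try_alloc=Stats):
        stats = self.stats[node]
        if stats is not None:
            if node == 0:
                self.last_writer = 0
        else:
            stats = self.stats[0]  # fall back to the preallocated instance
            if self.last_writer != node:
                if self.last_writer is not None:
                    # A second node is writing: try a node-specific instance.
                    # try_alloc may return None to model a failed allocation.
                    new_stats = try_alloc()
                    if new_stats is not None:
                        self.stats[node] = stats = new_stats
                if stats is self.stats[0]:
                    self.last_writer = node
        stats.packets += 1
        stats.bytes += length

    def totals(self):
        # Readers only visit the instances that were actually allocated.
        used = [s for s in self.stats if s is not None]
        return sum(s.packets for s in used), sum(s.bytes for s in used)


flow = Flow(num_nodes=2)
flow.update(node=1, length=100)  # single writer: reuses the preallocated slot
flow.update(node=0, length=50)
flow.update(node=1, length=60)   # second node active: gets its own instance
```

With only one node writing, no extra instance is created; once a second node
shows up, the sketch allocates one for it and falls back to the preallocated
instance if that allocation fails.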

Signed-off-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Remove 5-tuple optimization.</title>
<updated>2014-05-16T20:40:29Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jrajahalme@nicira.com</email>
</author>
<published>2014-03-27T19:35:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=23dabf88abb48a866fdb19ee08ebcf1ddd9b1840'/>
<id>urn:sha1:23dabf88abb48a866fdb19ee08ebcf1ddd9b1840</id>
<content type='text'>
The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch.  Remove it first to make the changes easier to
grasp.

Signed-off-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Per cpu flow stats.</title>
<updated>2014-01-06T23:52:24Z</updated>
<author>
<name>Pravin B Shelar</name>
<email>pshelar@nicira.com</email>
</author>
<published>2013-10-30T00:22:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e298e505700604c97e6a9edb21cebb080bdb91f6'/>
<id>urn:sha1:e298e505700604c97e6a9edb21cebb080bdb91f6</id>
<content type='text'>
With the megaflow implementation, an OVS flow can be shared between
multiple CPUs, which makes stats updates a highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and it is therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.

Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit).</title>
<updated>2014-01-06T23:51:27Z</updated>
<author>
<name>Ben Pfaff</name>
<email>blp@nicira.com</email>
</author>
<published>2013-11-25T18:41:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f49ce1135676e5790d8ac5f8ecb2a218c07a33a'/>
<id>urn:sha1:8f49ce1135676e5790d8ac5f8ecb2a218c07a33a</id>
<content type='text'>
We won't normally have a ton of flow masks but using a size_t to store
values no bigger than sizeof(struct sw_flow_key) seems excessive.

This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit
systems.  On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but
sw_flow_mask only by 8 bytes due to padding.
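
The sw_flow_key_range numbers can be reproduced with a small Python ctypes
sketch.  It assumes, for illustration only, that the range holds two
members named start and end, stored as size_t before the change and as
unsigned short after it.

```python
import ctypes


class RangeWide(ctypes.Structure):
    # Assumed layout before the change: two size_t members.
    _fields_ = [("start", ctypes.c_size_t), ("end", ctypes.c_size_t)]


class RangeNarrow(ctypes.Structure):
    # Assumed layout after the change: two unsigned short members.
    _fields_ = [("start", ctypes.c_ushort), ("end", ctypes.c_ushort)]


wide_size = ctypes.sizeof(RangeWide)
narrow_size = ctypes.sizeof(RangeNarrow)
saved = wide_size - narrow_size
print(wide_size, narrow_size, saved)
```

With an 8-byte size_t the difference is 12 bytes, and with a 4-byte size_t
it is 4 bytes, in line with the figures quoted above for the range struct.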

Compile tested only.

Signed-off-by: Ben Pfaff &lt;blp@nicira.com&gt;
Acked-by: Andy Zhou &lt;azhou@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: TCP flags matching support.</title>
<updated>2013-11-02T01:43:45Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jrajahalme@nicira.com</email>
</author>
<published>2013-10-23T08:44:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5eb26b156e29eadcc21f73fb5d14497f0db24b86'/>
<id>urn:sha1:5eb26b156e29eadcc21f73fb5d14497f0db24b86</id>
<content type='text'>
    tcp_flags=flags/mask
        Bitwise match on TCP flags.  The flags and mask are 16-bit
        numbers written in decimal or in hexadecimal prefixed by 0x.
        Each 1-bit in mask requires that the corresponding bit in flags
        must match.  Each 0-bit in mask causes the corresponding bit to
        be ignored.

        The TCP protocol currently defines 9 flag bits, and an additional
        3 bits are reserved (must be transmitted as zero); see RFCs 793,
        3168, and 3540.  The flag bits are, numbering from the least
        significant bit:

        0: FIN No more data from sender.

        1: SYN Synchronize sequence numbers.

        2: RST Reset the connection.

        3: PSH Push function.

        4: ACK Acknowledgement field significant.

        5: URG Urgent pointer field significant.

        6: ECE ECN Echo.

        7: CWR Congestion Window Reduced.

        8: NS  Nonce Sum.

        9-11:  Reserved.

        12-15: Not matchable, must be zero.
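
The flags/mask semantics above can be sketched in Python.  This is an
illustration only; tcp_flags_match, flag and the constant names are
hypothetical and not part of any OVS interface.

```python
from operator import and_  # and_(a, b) is the bitwise AND of a and b

# Bit positions from the list above, least significant bit first.
TCP_FLAG_BITS = {"FIN": 0, "SYN": 1, "RST": 2, "PSH": 3, "ACK": 4,
                 "URG": 5, "ECE": 6, "CWR": 7, "NS": 8}
MATCHABLE_BITS = 0x0FFF  # bits 12-15 are not matchable


def flag(name):
    """Return the 16-bit value with only the named flag bit set."""
    return 2 ** TCP_FLAG_BITS[name]


def tcp_flags_match(packet_flags, flags, mask):
    """Return True if packet_flags agrees with flags on every 1-bit of mask."""
    if and_(mask, MATCHABLE_BITS) != mask or and_(flags, MATCHABLE_BITS) != flags:
        raise ValueError("bits 12-15 must be zero")
    # Bits that differ between packet and flags only matter where mask is 1.
    return and_(packet_flags ^ flags, mask) == 0


# Match "SYN set and ACK clear": flags=SYN, mask=SYN|ACK.
syn_mask = flag("SYN") | flag("ACK")
initial_syn = tcp_flags_match(flag("SYN"), flag("SYN"), syn_mask)
syn_ack = tcp_flags_match(flag("SYN") | flag("ACK"), flag("SYN"), syn_mask)
```

Here a bare SYN matches, while a SYN+ACK does not, because the ACK bit is
covered by the mask and differs from the requested value.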

Signed-off-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Widen TCP flags handling.</title>
<updated>2013-11-02T01:43:45Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jrajahalme@nicira.com</email>
</author>
<published>2013-10-23T08:40:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df23e9f642830f10c505c8a3d57772ad1238c701'/>
<id>urn:sha1:df23e9f642830f10c505c8a3d57772ad1238c701</id>
<content type='text'>
Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits are currently of interest
to userspace.

Signed-off-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Restructure datapath.c and flow.c</title>
<updated>2013-10-04T01:16:47Z</updated>
<author>
<name>Pravin B Shelar</name>
<email>pshelar@nicira.com</email>
</author>
<published>2013-10-04T01:16:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e64457191a259537bbbfaebeba9a8043786af96f'/>
<id>urn:sha1:e64457191a259537bbbfaebeba9a8043786af96f</id>
<content type='text'>
Over time, datapath.c and flow.c have become pretty large files.
This patch restructures their functionality into three different
components:

flow.c: contains flow extract.
flow_netlink.c: netlink flow api.
flow_table.c: flow table api.

This patch restructures code without changing logic.

Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
</feed>
