<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/openvswitch/flow.c, branch v3.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-03-28T20:41:53Z</updated>
<entry>
<title>openvswitch: fix a possible deadlock and lockdep warning</title>
<updated>2014-03-28T20:41:53Z</updated>
<author>
<name>Flavio Leitner</name>
<email>fbl@redhat.com</email>
</author>
<published>2014-03-27T14:05:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4f647e0a3c37b8d5086214128614a136064110c3'/>
<id>urn:sha1:4f647e0a3c37b8d5086214128614a136064110c3</id>
<content type='text'>
There are two problematic situations.

A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.

The second situation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:

       CPU#0                            CPU#1
  ovs_flow_stats_get()
   stats_read()
 +-&gt;spin_lock remote CPU#1        ovs_flow_stats_get()
 |  &lt;interrupted&gt;                  stats_read()
 |  ...                       +--&gt;  spin_lock remote CPU#0
 |                            |     &lt;interrupted&gt;
 |  ovs_flow_stats_update()   |     ...
 |   spin_lock local CPU#0 &lt;--+     ovs_flow_stats_update()
 +---------------------------------- spin_lock local CPU#1

This patch disables BH for both cases, fixing the deadlocks.
Acked-by: Jesse Gross &lt;jesse@nicira.com&gt;

=================================
[ INFO: inconsistent lock state ]
3.14.0-rc8-00007-g632b06a #1 Tainted: G          I
---------------------------------
inconsistent {SOFTIRQ-ON-W} -&gt; {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
(&amp;(&amp;cpu_stats-&gt;lock)-&gt;rlock){+.?...}, at: [&lt;ffffffffa05dd8a1&gt;] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
{SOFTIRQ-ON-W} state was registered at:
[&lt;ffffffff810f973f&gt;] __lock_acquire+0x68f/0x1c40
[&lt;ffffffff810fb4e2&gt;] lock_acquire+0xa2/0x1d0
[&lt;ffffffff817d8d9e&gt;] _raw_spin_lock+0x3e/0x80
[&lt;ffffffffa05dd9e4&gt;] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
[&lt;ffffffffa05da855&gt;] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
[&lt;ffffffffa05daf05&gt;] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
[&lt;ffffffffa05db41d&gt;] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
[&lt;ffffffff816c245d&gt;] genl_family_rcv_msg+0x1cd/0x3f0
[&lt;ffffffff816c270e&gt;] genl_rcv_msg+0x8e/0xd0
[&lt;ffffffff816c0239&gt;] netlink_rcv_skb+0xa9/0xc0
[&lt;ffffffff816c0798&gt;] genl_rcv+0x28/0x40
[&lt;ffffffff816bf830&gt;] netlink_unicast+0x100/0x1e0
[&lt;ffffffff816bfc57&gt;] netlink_sendmsg+0x347/0x770
[&lt;ffffffff81668e9c&gt;] sock_sendmsg+0x9c/0xe0
[&lt;ffffffff816692d9&gt;] ___sys_sendmsg+0x3a9/0x3c0
[&lt;ffffffff8166a911&gt;] __sys_sendmsg+0x51/0x90
[&lt;ffffffff8166a962&gt;] SyS_sendmsg+0x12/0x20
[&lt;ffffffff817e3ce9&gt;] system_call_fastpath+0x16/0x1b
irq event stamp: 1740726
hardirqs last  enabled at (1740726): [&lt;ffffffff8175d5e0&gt;] ip6_finish_output2+0x4f0/0x840
hardirqs last disabled at (1740725): [&lt;ffffffff8175d59b&gt;] ip6_finish_output2+0x4ab/0x840
softirqs last  enabled at (1740674): [&lt;ffffffff8109be12&gt;] _local_bh_enable+0x22/0x50
softirqs last disabled at (1740675): [&lt;ffffffff8109db05&gt;] irq_exit+0xc5/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&amp;(&amp;cpu_stats-&gt;lock)-&gt;rlock);
  &lt;Interrupt&gt;
    lock(&amp;(&amp;cpu_stats-&gt;lock)-&gt;rlock);

 *** DEADLOCK ***

5 locks held by swapper/0/0:
 #0:  (((&amp;ifa-&gt;dad_timer))){+.-...}, at: [&lt;ffffffff810a7155&gt;] call_timer_fn+0x5/0x320
 #1:  (rcu_read_lock){.+.+..}, at: [&lt;ffffffff81788a55&gt;] mld_sendpack+0x5/0x4a0
 #2:  (rcu_read_lock_bh){.+....}, at: [&lt;ffffffff8175d149&gt;] ip6_finish_output2+0x59/0x840
 #3:  (rcu_read_lock_bh){.+....}, at: [&lt;ffffffff8168ba75&gt;] __dev_queue_xmit+0x5/0x9b0
 #4:  (rcu_read_lock){.+.+..}, at: [&lt;ffffffffa05e41b5&gt;] internal_dev_xmit+0x5/0x110 [openvswitch]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I  3.14.0-rc8-00007-g632b06a #1
Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
 0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
 ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
 ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
Call Trace:
 &lt;IRQ&gt;  [&lt;ffffffff817cfe3c&gt;] dump_stack+0x4d/0x66
 [&lt;ffffffff817cb6da&gt;] print_usage_bug+0x1f4/0x205
 [&lt;ffffffff810f7f10&gt;] ? check_usage_backwards+0x180/0x180
 [&lt;ffffffff810f8963&gt;] mark_lock+0x223/0x2b0
 [&lt;ffffffff810f96d3&gt;] __lock_acquire+0x623/0x1c40
 [&lt;ffffffff810f5707&gt;] ? __lock_is_held+0x57/0x80
 [&lt;ffffffffa05e26c6&gt;] ? masked_flow_lookup+0x236/0x250 [openvswitch]
 [&lt;ffffffff810fb4e2&gt;] lock_acquire+0xa2/0x1d0
 [&lt;ffffffffa05dd8a1&gt;] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [&lt;ffffffff817d8d9e&gt;] _raw_spin_lock+0x3e/0x80
 [&lt;ffffffffa05dd8a1&gt;] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [&lt;ffffffffa05dd8a1&gt;] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [&lt;ffffffffa05dcc64&gt;] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
 [&lt;ffffffff810f93f7&gt;] ? __lock_acquire+0x347/0x1c40
 [&lt;ffffffffa05e3bea&gt;] ovs_vport_receive+0x2a/0x30 [openvswitch]
 [&lt;ffffffffa05e4218&gt;] internal_dev_xmit+0x68/0x110 [openvswitch]
 [&lt;ffffffffa05e41b5&gt;] ? internal_dev_xmit+0x5/0x110 [openvswitch]
 [&lt;ffffffff8168b4a6&gt;] dev_hard_start_xmit+0x2e6/0x8b0
 [&lt;ffffffff8168be87&gt;] __dev_queue_xmit+0x417/0x9b0
 [&lt;ffffffff8168ba75&gt;] ? __dev_queue_xmit+0x5/0x9b0
 [&lt;ffffffff8175d5e0&gt;] ? ip6_finish_output2+0x4f0/0x840
 [&lt;ffffffff8168c430&gt;] dev_queue_xmit+0x10/0x20
 [&lt;ffffffff8175d641&gt;] ip6_finish_output2+0x551/0x840
 [&lt;ffffffff8176128a&gt;] ? ip6_finish_output+0x9a/0x220
 [&lt;ffffffff8176128a&gt;] ip6_finish_output+0x9a/0x220
 [&lt;ffffffff8176145f&gt;] ip6_output+0x4f/0x1f0
 [&lt;ffffffff81788c29&gt;] mld_sendpack+0x1d9/0x4a0
 [&lt;ffffffff817895b8&gt;] mld_send_initial_cr.part.32+0x88/0xa0
 [&lt;ffffffff817691b0&gt;] ? addrconf_dad_completed+0x220/0x220
 [&lt;ffffffff8178e301&gt;] ipv6_mc_dad_complete+0x31/0x50
 [&lt;ffffffff817690d7&gt;] addrconf_dad_completed+0x147/0x220
 [&lt;ffffffff817691b0&gt;] ? addrconf_dad_completed+0x220/0x220
 [&lt;ffffffff8176934f&gt;] addrconf_dad_timer+0x19f/0x1c0
 [&lt;ffffffff810a71e9&gt;] call_timer_fn+0x99/0x320
 [&lt;ffffffff810a7155&gt;] ? call_timer_fn+0x5/0x320
 [&lt;ffffffff817691b0&gt;] ? addrconf_dad_completed+0x220/0x220
 [&lt;ffffffff810a76c4&gt;] run_timer_softirq+0x254/0x3b0
 [&lt;ffffffff8109d47d&gt;] __do_softirq+0x12d/0x480

Signed-off-by: Flavio Leitner &lt;fbl@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>openvswitch: Correctly report flow used times for first 5 minutes after boot.</title>
<updated>2014-03-20T17:45:21Z</updated>
<author>
<name>Ben Pfaff</name>
<email>blp@nicira.com</email>
</author>
<published>2014-03-20T17:45:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f9b8c4c8baded129535d82d74df8e87a7a369f54'/>
<id>urn:sha1:f9b8c4c8baded129535d82d74df8e87a7a369f54</id>
<content type='text'>
The kernel starts out its "jiffies" timer as 5 minutes below zero, as
shown in include/linux/jiffies.h:

  /*
   * Have the 32 bit jiffies value wrap 5 minutes after boot
   * so jiffies wrap bugs show up earlier.
   */
  #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))

The loop in ovs_flow_stats_get() starts out with 'used' set to 0, then
takes any "later" time.  This means that for the first five minutes after
boot, flows will always be reported as never used, since 0 is greater than
any time already seen.
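
The effect can be reproduced outside the kernel. The following is a
minimal Python sketch, not the kernel code: it assumes a 32-bit
unsigned long and HZ=1000, and the helper names time_after(),
latest_used_buggy() and latest_used_guarded() are illustrative only.
The guarded variant shows one possible way to treat 0 as "no time seen
yet"; it is not necessarily the exact change made by this patch.

```python
BITS = 32                 # assumption: 32-bit unsigned long
MOD = 2 ** BITS
HZ = 1000                 # assumption: illustrative tick rate

# Same arithmetic as INITIAL_JIFFIES: -300*HZ reduced to 32 bits.
INITIAL_JIFFIES = (-300 * HZ) % MOD


def time_after(a, b):
    """Wrap-safe test that a is later than b: (b - a) is negative as a signed value."""
    diff = (b - a) % MOD
    return diff // (MOD // 2) == 1   # top bit set means negative when signed


def latest_used_buggy(samples):
    """Start from 0 and keep any later time, as the loop described above does."""
    used = 0
    for sample in samples:
        if time_after(sample, used):
            used = sample
    return used


def latest_used_guarded(samples):
    """Hypothetical guard: accept the first sample unconditionally."""
    used = 0
    for sample in samples:
        if used == 0 or time_after(sample, used):
            used = sample
    return used


if __name__ == "__main__":
    early = [INITIAL_JIFFIES + 10, INITIAL_JIFFIES + 20]
    print(latest_used_buggy(early))     # stays 0: flow looks never used
    print(latest_used_guarded(early))   # the later of the two samples
```

With these assumptions, every timestamp taken during the first five
minutes compares as earlier than 0, so the unguarded loop never
replaces the initial value.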

Signed-off-by: Ben Pfaff &lt;blp@nicira.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Read tcp flags only when the transport header is present.</title>
<updated>2014-02-16T01:37:45Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jrajahalme@nicira.com</email>
</author>
<published>2014-02-16T01:37:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04382a3303c22b0c536fbd0c94c1f012f2b8ed60'/>
<id>urn:sha1:04382a3303c22b0c536fbd0c94c1f012f2b8ed60</id>
<content type='text'>
Only the first IP fragment can have a TCP header, check for this.

Signed-off-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Per cpu flow stats.</title>
<updated>2014-01-06T23:52:24Z</updated>
<author>
<name>Pravin B Shelar</name>
<email>pshelar@nicira.com</email>
</author>
<published>2013-10-30T00:22:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e298e505700604c97e6a9edb21cebb080bdb91f6'/>
<id>urn:sha1:e298e505700604c97e6a9edb21cebb080bdb91f6</id>
<content type='text'>
With the mega flow implementation, an ovs flow can be shared between
multiple CPUs, which makes stats updates a highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.

Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: TCP flags matching support.</title>
<updated>2013-11-02T01:43:45Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jrajahalme@nicira.com</email>
</author>
<published>2013-10-23T08:44:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5eb26b156e29eadcc21f73fb5d14497f0db24b86'/>
<id>urn:sha1:5eb26b156e29eadcc21f73fb5d14497f0db24b86</id>
<content type='text'>
    tcp_flags=flags/mask
        Bitwise  match on TCP flags.  The flags and mask are 16-bit
        numbers written in decimal or in hexadecimal prefixed by 0x.  Each
        1-bit  in  mask requires that the corresponding bit in flags must
        match.  Each 0-bit in mask causes the corresponding  bit  to  be
        ignored.

        TCP  protocol  currently  defines  9 flag bits, and additional 3
        bits are reserved (must be transmitted as zero), see  RFCs  793,
        3168, and 3540.  The flag bits are, numbering from the least
        significant bit:

        0: FIN No more data from sender.

        1: SYN Synchronize sequence numbers.

        2: RST Reset the connection.

        3: PSH Push function.

        4: ACK Acknowledgement field significant.

        5: URG Urgent pointer field significant.

        6: ECE ECN Echo.

        7: CWR Congestion Window Reduced.

        8: NS  Nonce Sum.

        9-11:  Reserved.

        12-15: Not matchable, must be zero.
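
The flags/mask semantics described above can be sketched in Python as
follows. This is an illustration only, not the datapath code: the
names TCP_FLAG_BITS, flag_value() and tcp_flags_match() are
hypothetical, and operator.and_ / operator.or_ are simply the function
forms of bitwise AND and OR.

```python
from operator import and_, or_

# Bit positions as listed above, numbered from the least significant bit.
TCP_FLAG_BITS = {
    "FIN": 0, "SYN": 1, "RST": 2, "PSH": 3, "ACK": 4,
    "URG": 5, "ECE": 6, "CWR": 7, "NS": 8,
}


def flag_value(*names):
    """Build a 16-bit flags word from flag names (hypothetical helper)."""
    value = 0
    for name in names:
        value = or_(value, 2 ** TCP_FLAG_BITS[name])
    return value


def tcp_flags_match(packet_flags, flags, mask):
    """Bits set in mask must be equal; bits clear in mask are ignored."""
    return and_(packet_flags, mask) == and_(flags, mask)


if __name__ == "__main__":
    # Match "SYN set, ACK clear", ignoring all other bits.
    flags = flag_value("SYN")
    mask = flag_value("SYN", "ACK")
    print(tcp_flags_match(flag_value("SYN"), flags, mask))
    print(tcp_flags_match(flag_value("SYN", "ACK"), flags, mask))
```

In this sketch a bare SYN matches, while SYN+ACK does not, because the
ACK bit is covered by the mask and differs from the requested value.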

Signed-off-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Widen TCP flags handling.</title>
<updated>2013-11-02T01:43:45Z</updated>
<author>
<name>Jarno Rajahalme</name>
<email>jrajahalme@nicira.com</email>
</author>
<published>2013-10-23T08:40:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df23e9f642830f10c505c8a3d57772ad1238c701'/>
<id>urn:sha1:df23e9f642830f10c505c8a3d57772ad1238c701</id>
<content type='text'>
Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.

Signed-off-by: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>openvswitch: Restructure datapath.c and flow.c</title>
<updated>2013-10-04T01:16:47Z</updated>
<author>
<name>Pravin B Shelar</name>
<email>pshelar@nicira.com</email>
</author>
<published>2013-10-04T01:16:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e64457191a259537bbbfaebeba9a8043786af96f'/>
<id>urn:sha1:e64457191a259537bbbfaebeba9a8043786af96f</id>
<content type='text'>
Over time, datapath.c and flow.c have become pretty large files.
This patch restructures their functionality into three
different components:

flow.c: contains flow extract.
flow_netlink.c: netlink flow api.
flow_table.c: flow table api.

This patch restructures code without changing logic.

Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
<entry>
<title>net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs</title>
<updated>2013-09-11T20:09:58Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>dborkman@redhat.com</email>
</author>
<published>2013-09-07T07:41:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3bf4b5b11d381fed6a94a7e487e01c8b3bc436b9'/>
<id>urn:sha1:3bf4b5b11d381fed6a94a7e487e01c8b3bc436b9</id>
<content type='text'>
In function __parse_flow_nlattrs(), we check for condition
(type &gt; OVS_KEY_ATTR_MAX) and if true, print an error, but we do
not return from this function as in other checks. It seems this
has been forgotten, as otherwise, we could access beyond the
memory of ovs_key_lens, which is of ovs_key_lens[OVS_KEY_ATTR_MAX + 1].
Hence, a maliciously prepared nla_type from user space could access
beyond this upper limit.
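
The pattern can be illustrated with a small Python sketch. It is not
the kernel function: KEY_ATTR_MAX, KEY_LENS and parse_attrs() are
hypothetical stand-ins for OVS_KEY_ATTR_MAX, ovs_key_lens and
__parse_flow_nlattrs(), and the table contents are made up. The point
is only that the range check must return before the table is indexed.

```python
KEY_ATTR_MAX = 3             # hypothetical stand-in for OVS_KEY_ATTR_MAX
KEY_LENS = [0, 4, 2, 12]     # hypothetical table with KEY_ATTR_MAX + 1 entries


def parse_attrs(attrs):
    """Validate (type, payload) pairs; return a dict, or None on error."""
    parsed = {}
    for attr_type, payload in attrs:
        if attr_type not in range(KEY_ATTR_MAX + 1):
            # Return right away: falling through would index KEY_LENS
            # with an out-of-range type supplied by the caller.
            return None
        if len(payload) != KEY_LENS[attr_type]:
            return None
        parsed[attr_type] = payload
    return parsed


if __name__ == "__main__":
    print(parse_attrs([(1, b"abcd")]))   # valid attribute
    print(parse_attrs([(4, b"")]))       # type beyond the table: rejected
```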

Introduced by 03f0d916a ("openvswitch: Mega flow implementation").

Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Cc: Andy Zhou &lt;azhou@nicira.com&gt;
Acked-by: Jesse Gross &lt;jesse@nicira.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>openvswitch: Fix alignment of struct sw_flow_key.</title>
<updated>2013-09-05T19:54:37Z</updated>
<author>
<name>Jesse Gross</name>
<email>jesse@nicira.com</email>
</author>
<published>2013-09-05T19:17:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0d40f75bdab241868c0eb6f97aef9f8b3a66f7b3'/>
<id>urn:sha1:0d40f75bdab241868c0eb6f97aef9f8b3a66f7b3</id>
<content type='text'>
sw_flow_key alignment was declared as " __aligned(__alignof__(long))".
However, this breaks on the m68k architecture where long is 32 bit in
size but 16 bit aligned by default. This aligns to the size of a long to
ensure that we can always do comparisons in full long-sized chunks. It
also adds an additional build check to catch any reduction in alignment.

CC: Andy Zhou &lt;azhou@nicira.com&gt;
Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Reported-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>openvswitch: optimize flow compare and mask functions</title>
<updated>2013-08-27T20:13:09Z</updated>
<author>
<name>Andy Zhou</name>
<email>azhou@nicira.com</email>
</author>
<published>2013-08-27T20:02:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5828cd9a68873df1340b420371c02c47647878fb'/>
<id>urn:sha1:5828cd9a68873df1340b420371c02c47647878fb</id>
<content type='text'>
Make sure the sw_flow_key structure and valid mask boundaries are always
machine word aligned. Optimize the flow compare and mask operations
using machine word size operations. This patch improves throughput on
average by 15% when CPU is the bottleneck of forwarding packets.

This patch is inspired by ideas and code from a patch submitted by Peter
Klausler titled "replace memcmp() with specialized comparator".
However, the original patch only optimizes for architectures that
support unaligned machine word access. This patch optimizes for all
architectures.

Signed-off-by: Andy Zhou &lt;azhou@nicira.com&gt;
Signed-off-by: Jesse Gross &lt;jesse@nicira.com&gt;
</content>
</entry>
</feed>
