<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/openvswitch/datapath.h, branch v5.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-04-24T01:26:11Z</updated>
<entry>
<title>net: openvswitch: expand the meters supported number</title>
<updated>2020-04-24T01:26:11Z</updated>
<author>
<name>Tonghao Zhang</name>
<email>xiangxia.m.yue@gmail.com</email>
</author>
<published>2020-04-24T00:08:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c7c4c44c9a95d87e50ced38f7480e779cb472174'/>
<id>urn:sha1:c7c4c44c9a95d87e50ced38f7480e779cb472174</id>
<content type='text'>
In kernel datapath of Open vSwitch, there are only 1024
buckets of meter in one datapath. If installing more than
1024 (e.g. 8192) meters, it may lead to the performance drop.
But in some case, for example, Open vSwitch used as edge
gateway, there should be 20K at least, where meters used for
IP address bandwidth limitation.

[Open vSwitch userspace datapath has this issue too.]

For more scalable meter, this patch use meter array instead of
hash tables, and expand/shrink the array when necessary. So we
can install more meters than before in the datapath.
Introducing the struct *dp_meter_instance, it's easy to
expand meter though changing the *ti point in the struct
*dp_meter_table.

Cc: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Cc: Andy Zhou &lt;azhou@ovn.org&gt;
Signed-off-by: Tonghao Zhang &lt;xiangxia.m.yue@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: openvswitch: add hash info to upcall</title>
<updated>2019-11-15T01:29:46Z</updated>
<author>
<name>Tonghao Zhang</name>
<email>xiangxia.m.yue@gmail.com</email>
</author>
<published>2019-11-13T15:04:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bd1903b7c4596ba6f7677d0dfefd05ba5876707d'/>
<id>urn:sha1:bd1903b7c4596ba6f7677d0dfefd05ba5876707d</id>
<content type='text'>
When using the kernel datapath, the upcall don't
include skb hash info relatived. That will introduce
some problem, because the hash of skb is important
in kernel stack. For example, VXLAN module uses
it to select UDP src port. The tx queue selection
may also use the hash in stack.

Hash is computed in different ways. Hash is random
for a TCP socket, and hash may be computed in hardware,
or software stack. Recalculation hash is not easy.

Hash of TCP socket is computed:
tcp_v4_connect
    -&gt; sk_set_txhash (is random)

__tcp_transmit_skb
    -&gt; skb_set_hash_from_sk

There will be one upcall, without information of skb
hash, to ovs-vswitchd, for the first packet of a TCP
session. The rest packets will be processed in Open vSwitch
modules, hash kept. If this tcp session is forward to
VXLAN module, then the UDP src port of first tcp packet
is different from rest packets.

TCP packets may come from the host or dockers, to Open vSwitch.
To fix it, we store the hash info to upcall, and restore hash
when packets sent back.

+---------------+          +-------------------------+
|   Docker/VMs  |          |     ovs-vswitchd        |
+----+----------+          +-+--------------------+--+
     |                       ^                    |
     |                       |                    |
     |                       |  upcall            v restore packet hash (not recalculate)
     |                     +-+--------------------+--+
     |  tap netdev         |                         |   vxlan module
     +---------------&gt;     +--&gt;  Open vSwitch ko     +--&gt;
       or internal type    |                         |
                           +-------------------------+

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
Signed-off-by: Tonghao Zhang &lt;xiangxia.m.yue@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: openvswitch: Set OvS recirc_id from tc chain index</title>
<updated>2019-09-06T12:59:18Z</updated>
<author>
<name>Paul Blakey</name>
<email>paulb@mellanox.com</email>
</author>
<published>2019-09-04T13:56:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=95a7233c452a58a4c2310c456c73997853b2ec46'/>
<id>urn:sha1:95a7233c452a58a4c2310c456c73997853b2ec46</id>
<content type='text'>
Offloaded OvS datapath rules are translated one to one to tc rules,
for example the following simplified OvS rule:

recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)

Will be translated to the following tc rule:

$ tc filter add dev dev1 ingress \
	    prio 1 chain 0 proto ip \
		flower tcp ct_state -trk \
		action ct pipe \
		action goto chain 2

Received packets will first travel though tc, and if they aren't stolen
by it, like in the above rule, they will continue to OvS datapath.
Since we already did some actions (action ct in this case) which might
modify the packets, and updated action stats, we would like to continue
the proccessing with the correct recirc_id in OvS (here recirc_id(2))
where we left off.

To support this, introduce a new skb extension for tc, which
will be used for translating tc chain to ovs recirc_id to
handle these miss cases. Last tc chain index will be set
by tc goto chain action and read by OvS datapath.

Signed-off-by: Paul Blakey &lt;paulb@mellanox.com&gt;
Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 269</title>
<updated>2019-06-05T15:30:29Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-29T14:12:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c942299924a70b458320846e53b742ba11e985b3'/>
<id>urn:sha1:c942299924a70b458320846e53b742ba11e985b3</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of version 2 of the gnu general public license as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not write to the free
  software foundation inc 51 franklin street fifth floor boston ma
  02110 1301 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 21 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Alexios Zavras &lt;alexios.zavras@intel.com&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Reviewed-by: Richard Fontana &lt;rfontana@redhat.com&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141334.228102212@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>openvswitch: Support conntrack zone limit</title>
<updated>2018-05-25T20:45:19Z</updated>
<author>
<name>Yi-Hung Wei</name>
<email>yihung.wei@gmail.com</email>
</author>
<published>2018-05-25T00:56:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=11efd5cb04a184eea4f57b68ea63dddd463158d1'/>
<id>urn:sha1:11efd5cb04a184eea4f57b68ea63dddd463158d1</id>
<content type='text'>
Currently, nf_conntrack_max is used to limit the maximum number of
conntrack entries in the conntrack table for every network namespace.
For the VMs and containers that reside in the same namespace,
they share the same conntrack table, and the total # of conntrack entries
for all the VMs and containers are limited by nf_conntrack_max.  In this
case, if one of the VM/container abuses the usage the conntrack entries,
it blocks the others from committing valid conntrack entries into the
conntrack table.  Even if we can possibly put the VM in different network
namespace, the current nf_conntrack_max configuration is kind of rigid
that we cannot limit different VM/container to have different # conntrack
entries.

To address the aforementioned issue, this patch proposes to have a
fine-grained mechanism that could further limit the # of conntrack entries
per-zone.  For example, we can designate different zone to different VM,
and set conntrack limit to each zone.  By providing this isolation, a
mis-behaved VM only consumes the conntrack entries in its own zone, and
it will not influence other well-behaved VMs.  Moreover, the users can
set various conntrack limit to different zone based on their preference.

The proposed implementation utilizes Netfilter's nf_conncount backend
to count the number of connections in a particular zone.  If the number of
connection is above a configured limitation, ovs will return ENOMEM to the
userspace.  If userspace does not configure the zone limit, the limit
defaults to zero that is no limitation, which is backward compatible to
the behavior without this patch.

The following high leve APIs are provided to the userspace:
  - OVS_CT_LIMIT_CMD_SET:
    * set default connection limit for all zones
    * set the connection limit for a particular zone
  - OVS_CT_LIMIT_CMD_DEL:
    * remove the connection limit for a particular zone
  - OVS_CT_LIMIT_CMD_GET:
    * get the default connection limit for all zones
    * get the connection limit for a particular zone

Signed-off-by: Yi-Hung Wei &lt;yihung.wei@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>openvswitch: Add meter action support</title>
<updated>2017-11-13T01:37:07Z</updated>
<author>
<name>Andy Zhou</name>
<email>azhou@ovn.org</email>
</author>
<published>2017-11-10T20:09:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd8a6c33693c1b89d2737ffdbf9611564e9ac907'/>
<id>urn:sha1:cd8a6c33693c1b89d2737ffdbf9611564e9ac907</id>
<content type='text'>
Implements OVS kernel meter action support.

Signed-off-by: Andy Zhou &lt;azhou@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>openvswitch: Add meter infrastructure</title>
<updated>2017-11-13T01:37:07Z</updated>
<author>
<name>Andy Zhou</name>
<email>azhou@ovn.org</email>
</author>
<published>2017-11-10T20:09:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=96fbc13d7e770b542d2d1fcf700d0baadc6e8063'/>
<id>urn:sha1:96fbc13d7e770b542d2d1fcf700d0baadc6e8063</id>
<content type='text'>
OVS kernel datapath so far does not support Openflow meter action.
This is the first stab at adding kernel datapath meter support.
This implementation supports only drop band type.

Signed-off-by: Andy Zhou &lt;azhou@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>openvswitch: export get_dp() API.</title>
<updated>2017-11-13T01:37:07Z</updated>
<author>
<name>Andy Zhou</name>
<email>azhou@ovn.org</email>
</author>
<published>2017-11-10T20:09:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9602c01e57f7b868d748c2ba2aef0efa64b71ffc'/>
<id>urn:sha1:9602c01e57f7b868d748c2ba2aef0efa64b71ffc</id>
<content type='text'>
Later patches will invoke get_dp() outside of datapath.c. Export it.

Signed-off-by: Andy Zhou &lt;azhou@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>openvswitch: reliable interface indentification in port dumps</title>
<updated>2017-11-05T12:49:17Z</updated>
<author>
<name>Jiri Benc</name>
<email>jbenc@redhat.com</email>
</author>
<published>2017-11-02T19:04:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9354d452034273a50a4fd703bea31e5d6b1fc20b'/>
<id>urn:sha1:9354d452034273a50a4fd703bea31e5d6b1fc20b</id>
<content type='text'>
This patch allows reliable identification of netdevice interfaces connected
to openvswitch bridges. In particular, user space queries the netdev
interfaces belonging to the ports for statistics, up/down state, etc.
Datapath dump needs to provide enough information for the user space to be
able to do that.

Currently, only interface names are returned. This is not sufficient, as
openvswitch allows its ports to be in different name spaces and the
interface name is valid only in its name space. What is needed and generally
used in other netlink APIs, is the pair ifindex+netnsid.

The solution is addition of the ifindex+netnsid pair (or only ifindex if in
the same name space) to vport get/dump operation.

On request side, ideally the ifindex+netnsid pair could be used to
get/set/del the corresponding vport. This is not implemented by this patch
and can be added later if needed.

Signed-off-by: Jiri Benc &lt;jbenc@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>openvswitch: fix skb_panic due to the incorrect actions attrlen</title>
<updated>2017-08-16T21:12:37Z</updated>
<author>
<name>Liping Zhang</name>
<email>zlpnobody@gmail.com</email>
</author>
<published>2017-08-16T05:30:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=494bea39f3201776cdfddc232705f54a0bd210c4'/>
<id>urn:sha1:494bea39f3201776cdfddc232705f54a0bd210c4</id>
<content type='text'>
For sw_flow_actions, the actions_len only represents the kernel part's
size, and when we dump the actions to the userspace, we will do the
convertions, so it's true size may become bigger than the actions_len.

But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
to alloc the skbuff, so the user_skb's size may become insufficient and
oops will happen like this:
  skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
  ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:&lt;NULL&gt;
  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:129!
  [...]
  Call Trace:
   &lt;IRQ&gt;
   [&lt;ffffffff8148be82&gt;] skb_put+0x43/0x44
   [&lt;ffffffff8148fabf&gt;] skb_zerocopy+0x6c/0x1f4
   [&lt;ffffffffa0290d36&gt;] queue_userspace_packet+0x3a3/0x448 [openvswitch]
   [&lt;ffffffffa0292023&gt;] ovs_dp_upcall+0x30/0x5c [openvswitch]
   [&lt;ffffffffa028d435&gt;] output_userspace+0x132/0x158 [openvswitch]
   [&lt;ffffffffa01e6890&gt;] ? ip6_rcv_finish+0x74/0x77 [ipv6]
   [&lt;ffffffffa028e277&gt;] do_execute_actions+0xcc1/0xdc8 [openvswitch]
   [&lt;ffffffffa028e3f2&gt;] ovs_execute_actions+0x74/0x106 [openvswitch]
   [&lt;ffffffffa0292130&gt;] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
   [&lt;ffffffffa0292b77&gt;] ? key_extract+0x63c/0x8d5 [openvswitch]
   [&lt;ffffffffa029848b&gt;] ovs_vport_receive+0xa1/0xc3 [openvswitch]
  [...]

Also we can find that the actions_len is much little than the orig_len:
  crash&gt; struct sw_flow_actions 0xffff8812f539d000
  struct sw_flow_actions {
    rcu = {
      next = 0xffff8812f5398800,
      func = 0xffffe3b00035db32
    },
    orig_len = 1384,
    actions_len = 592,
    actions = 0xffff8812f539d01c
  }

So as a quick fix, use the orig_len instead of the actions_len to alloc
the user_skb.

Last, this oops happened on our system running a relative old kernel, but
the same risk still exists on the mainline, since we use the wrong
actions_len from the beginning.

Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
Cc: Neil McKee &lt;neil.mckee@inmon.com&gt;
Signed-off-by: Liping Zhang &lt;zlpnobody@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
