<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/vxlan.h, branch v6.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-03-18T05:41:16Z</updated>
<entry>
<title>vxlan: Expose helper vxlan_build_gbp_hdr</title>
<updated>2023-03-18T05:41:16Z</updated>
<author>
<name>Gavin Li</name>
<email>gavinl@nvidia.com</email>
</author>
<published>2023-03-16T07:07:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c641e9279f3530aa2fe4bcb250477b555b75104a'/>
<id>urn:sha1:c641e9279f3530aa2fe4bcb250477b555b75104a</id>
<content type='text'>
The function vxlan_build_gbp_hdr will be used by other modules to build
gbp option in vxlan header according to gbp flags.

Signed-off-by: Gavin Li &lt;gavinl@nvidia.com&gt;
Reviewed-by: Gavi Teitz &lt;gavi@nvidia.com&gt;
Reviewed-by: Roi Dayan &lt;roid@nvidia.com&gt;
Reviewed-by: Maor Dickman &lt;maord@nvidia.com&gt;
Acked-by: Saeed Mahameed &lt;saeedm@nvidia.com&gt;
Reviewed-by: Simon Horman &lt;simon.horman@corigine.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>vxlan: mdb: Add an internal flag to indicate MDB usage</title>
<updated>2023-03-17T08:05:49Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2023-03-15T13:11:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bc6c6b013ffee36eb555cc0a68aa3d9608e1fad2'/>
<id>urn:sha1:bc6c6b013ffee36eb555cc0a68aa3d9608e1fad2</id>
<content type='text'>
Add an internal flag to indicate whether MDB entries are configured or
not. Set the flag after installing the first MDB entry and clear it
before deleting the last one.

The flag will be consulted by the data path which will only perform an
MDB lookup if the flag is set, thereby keeping the MDB overhead to a
minimum when the MDB is not used.

Another option would have been to use a static key, but it is global and
not per-device, unlike the current approach.

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: mdb: Add MDB control path support</title>
<updated>2023-03-17T08:05:49Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2023-03-15T13:11:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a3a48de5eade770e911d35291217bdd69ce04ef1'/>
<id>urn:sha1:a3a48de5eade770e911d35291217bdd69ce04ef1</id>
<content type='text'>
Implement MDB control path support, enabling the creation, deletion,
replacement and dumping of MDB entries in a similar fashion to the
bridge driver. Unlike the bridge driver, each entry stores a list of
remote VTEPs to which matched packets need to be replicated to and not a
list of bridge ports.

The motivating use case is the installation of MDB entries by a user
space control plane in response to received EVPN routes. As such, only
allow permanent MDB entries to be installed and do not implement
snooping functionality, avoiding a lot of unnecessary complexity.

Since entries can only be modified by user space under RTNL, use RTNL as
the write lock. Use RCU to ensure that MDB entries and remotes are not
freed while being accessed from the data path during transmission.

In terms of uAPI, reuse the existing MDB netlink interface, but add a
few new attributes to request and response messages:

* IP address of the destination VXLAN tunnel endpoint where the
  multicast receivers reside.

* UDP destination port number to use to connect to the remote VXLAN
  tunnel endpoint.

* VXLAN VNI Network Identifier to use to connect to the remote VXLAN
  tunnel endpoint. Required when Ingress Replication (IR) is used and
  the remote VTEP is not a member of originating broadcast domain
  (VLAN/VNI) [1].

* Source VNI Network Identifier the MDB entry belongs to. Used only when
  the VXLAN device is in external mode.

* Interface index of the outgoing interface to reach the remote VXLAN
  tunnel endpoint. This is required when the underlay destination IP is
  multicast (P2MP), as the multicast routing tables are not consulted.

All the new attributes are added under the 'MDBA_SET_ENTRY_ATTRS' nest
which is strictly validated by the bridge driver, thereby automatically
rejecting the new attributes.

[1] https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-3.2.2

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>drivers: vxlan: vnifilter: per vni stats</title>
<updated>2022-03-01T08:38:02Z</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@nvidia.com</email>
</author>
<published>2022-03-01T05:04:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4095e0e1328a3cd9e3b30174d6cb0edb3824256d'/>
<id>urn:sha1:4095e0e1328a3cd9e3b30174d6cb0edb3824256d</id>
<content type='text'>
Add per-vni statistics for vni filter mode. Counting Rx/Tx
bytes/packets/drops/errors at the appropriate places.

This patch changes vxlan_vs_find_vni to also return the
vxlan_vni_node in cases where the vni belongs to a vni
filtering vxlan device

Signed-off-by: Nikolay Aleksandrov &lt;nikolay@nvidia.com&gt;
Signed-off-by: Roopa Prabhu &lt;roopa@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: vni filtering support on collect metadata device</title>
<updated>2022-03-01T08:38:02Z</updated>
<author>
<name>Roopa Prabhu</name>
<email>roopa@nvidia.com</email>
</author>
<published>2022-03-01T05:04:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f9c4bb0b245cee35ef66f75bf409c9573d934cf9'/>
<id>urn:sha1:f9c4bb0b245cee35ef66f75bf409c9573d934cf9</id>
<content type='text'>
This patch adds vnifiltering support to collect metadata device.

Motivation:
You can only use a single vxlan collect metadata device for a given
vxlan udp port in the system today. The vxlan collect metadata device
terminates all received vxlan packets. As shown in the below diagram,
there are use-cases where you need to support multiple such vxlan devices in
independent bridge domains. Each vxlan device must terminate the vni's
it is configured for.
Example usecase: In a service provider network a service provider
typically supports multiple bridge domains with overlapping vlans.
One bridge domain per customer. Vlans in each bridge domain are
mapped to globally unique vxlan ranges assigned to each customer.

vnifiltering support in collect metadata devices terminates only configured
vnis. This is similar to vlan filtering in bridge driver. The vni filtering
capability is provided by a new flag on collect metadata device.

In the below pic:
	- customer1 is mapped to br1 bridge domain
	- customer2 is mapped to br2 bridge domain
	- customer1 vlan 10-11 is mapped to vni 1001-1002
	- customer2 vlan 10-11 is mapped to vni 2001-2002
	- br1 and br2 are vlan filtering bridges
	- vxlan1 and vxlan2 are collect metadata devices with
	  vnifiltering enabled

┌──────────────────────────────────────────────────────────────────┐
│  switch                                                          │
│                                                                  │
│         ┌───────────┐                 ┌───────────┐              │
│         │           │                 │           │              │
│         │   br1     │                 │   br2     │              │
│         └┬─────────┬┘                 └──┬───────┬┘              │
│     vlans│         │               vlans │       │               │
│     10,11│         │                10,11│       │               │
│          │     vlanvnimap:               │    vlanvnimap:        │
│          │       10-1001,11-1002         │      10-2001,11-2002  │
│          │         │                     │       │               │
│   ┌──────┴┐     ┌──┴─────────┐       ┌───┴────┐  │               │
│   │ swp1  │     │vxlan1      │       │ swp2   │ ┌┴─────────────┐ │
│   │       │     │  vnifilter:│       │        │ │vxlan2        │ │
│   └───┬───┘     │   1001,1002│       └───┬────┘ │ vnifilter:   │ │
│       │         └────────────┘           │      │  2001,2002   │ │
│       │                                  │      └──────────────┘ │
│       │                                  │                       │
└───────┼──────────────────────────────────┼───────────────────────┘
        │                                  │
        │                                  │
  ┌─────┴───────┐                          │
  │  customer1  │                    ┌─────┴──────┐
  │ host/VM     │                    │customer2   │
  └─────────────┘                    │ host/VM    │
                                     └────────────┘

With this implementation, vxlan dst metadata device can
be associated with range of vnis.
struct vxlan_vni_node is introduced to represent
a configured vni. We start with vni and its
associated remote_ip in this structure. This
structure can be extended to bring in other
per vni attributes if there are usecases for it.
A vni inherits an attribute from the base vxlan device
if there is no per vni attributes defined.

struct vxlan_dev gets a new rhashtable for
vnis called vxlan_vni_group. vxlan_vnifilter.c
implements the necessary netlink api, notifications
and helper functions to process and manage lifecycle
of vxlan_vni_node.

This patch also adds new helper functions in vxlan_multicast.c
to handle per vni remote_ip multicast groups which are part
of vxlan_vni_group.

Fix build problems:
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Roopa Prabhu &lt;roopa@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: vxlan: add macro definition for number of IANA VXLAN-GPE port</title>
<updated>2021-11-29T12:19:53Z</updated>
<author>
<name>Hao Chen</name>
<email>chenhao288@hisilicon.com</email>
</author>
<published>2021-11-27T09:34:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ed618bd80947fa8d9644baf8ac18cb2a02223a5e'/>
<id>urn:sha1:ed618bd80947fa8d9644baf8ac18cb2a02223a5e</id>
<content type='text'>
Add macro definition for number of IANA VXLAN-GPE port for generic use.

Signed-off-by: Hao Chen &lt;chenhao288@hisilicon.com&gt;
Signed-off-by: Guangbin Huang &lt;huangguangbin2@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: only keep the available bits when setting vxlan md-&gt;gbp</title>
<updated>2020-09-14T23:49:39Z</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2020-09-13T11:51:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=13e6ce98aa65ab5ce19351c020419360dfe8af29'/>
<id>urn:sha1:13e6ce98aa65ab5ce19351c020419360dfe8af29</id>
<content type='text'>
As we can see from vxlan_build/parse_gbp_hdr(), when processing metadata
on vxlan rx/tx path, only dont_learn/policy_applied/policy_id fields can
be set to or parse from the packet for vxlan gbp option.

So we'd better do the mask when set it in act_tunnel_key and cls_flower.
Otherwise, when users don't know these bits, they may configure with a
value which can never be matched.

Reported-by: Shuang Li &lt;shuali@redhat.com&gt;
Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: ecmp support for mac fdb entries</title>
<updated>2020-05-22T21:00:38Z</updated>
<author>
<name>Roopa Prabhu</name>
<email>roopa@cumulusnetworks.com</email>
</author>
<published>2020-05-22T05:26:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1274e1cc42264d4e629841e4f182795cb0becfd2'/>
<id>urn:sha1:1274e1cc42264d4e629841e4f182795cb0becfd2</id>
<content type='text'>
Todays vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.

New route nexthop API is perfect for this usecase.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
  exclusive
- since this is a new use-case and the requirement is for ecmp
nexthop groups, the fdb add and update path checks that the
nexthop is really an ecmp nexthop group. This check can be relaxed
in the future, if we want to introduce replication fdb nexthop groups
and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
group
- I have wrapped the switchdev offload code around the presence of
rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: kbuild test robot &lt;lkp@intel.com&gt;
Signed-off-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>vxlan: add adjacent link to limit depth level</title>
<updated>2019-10-24T21:53:49Z</updated>
<author>
<name>Taehee Yoo</name>
<email>ap420073@gmail.com</email>
</author>
<published>2019-10-21T18:47:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0ce1822c2a08f6e05e22239bcb1778dcc916c7bc'/>
<id>urn:sha1:0ce1822c2a08f6e05e22239bcb1778dcc916c7bc</id>
<content type='text'>
Current vxlan code doesn't limit the number of nested devices.
Nested devices would be handled recursively and this routine needs
huge stack memory. So, unlimited nested devices could make
stack overflow.

In order to fix this issue, this patch adds adjacent links.
The adjacent link APIs internally check the depth level.

Test commands:
    ip link add dummy0 type dummy
    ip link add vxlan0 type vxlan id 0 group 239.1.1.1 dev dummy0 \
	    dstport 4789
    for i in {1..100}
    do
	    let A=$i-1
	    ip link add vxlan$i type vxlan id $i group 239.1.1.1 \
		    dev vxlan$A dstport 4789
    done
    ip link del dummy0

The top upper link is vxlan100 and the lowest link is vxlan0.
When vxlan0 is deleting, the upper devices will be deleted recursively.
It needs huge stack memory so it makes stack overflow.

Splat looks like:
[  229.628477] =============================================================================
[  229.629785] BUG page-&gt;ptl (Not tainted): Padding overwritten. 0x0000000026abf214-0x0000000091f6abb2
[  229.629785] -----------------------------------------------------------------------------
[  229.629785]
[  229.655439] ==================================================================
[  229.629785] INFO: Slab 0x00000000ff7cfda8 objects=19 used=19 fp=0x00000000fe33776c flags=0x200000000010200
[  229.655688] BUG: KASAN: stack-out-of-bounds in unmap_single_vma+0x25a/0x2e0
[  229.655688] Read of size 8 at addr ffff888113076928 by task vlan-network-in/2334
[  229.655688]
[  229.629785] Padding 0000000026abf214: 00 80 14 0d 81 88 ff ff 68 91 81 14 81 88 ff ff  ........h.......
[  229.629785] Padding 0000000001e24790: 38 91 81 14 81 88 ff ff 68 91 81 14 81 88 ff ff  8.......h.......
[  229.629785] Padding 00000000b39397c8: 33 30 62 a7 ff ff ff ff ff eb 60 22 10 f1 ff 1f  30b.......`"....
[  229.629785] Padding 00000000bc98f53a: 80 60 07 13 81 88 ff ff 00 80 14 0d 81 88 ff ff  .`..............
[  229.629785] Padding 000000002aa8123d: 68 91 81 14 81 88 ff ff f7 21 17 a7 ff ff ff ff  h........!......
[  229.629785] Padding 000000001c8c2369: 08 81 14 0d 81 88 ff ff 03 02 00 00 00 00 00 00  ................
[  229.629785] Padding 000000004e290c5d: 21 90 a2 21 10 ed ff ff 00 00 00 00 00 fc ff df  !..!............
[  229.629785] Padding 000000000e25d731: 18 60 07 13 81 88 ff ff c0 8b 13 05 81 88 ff ff  .`..............
[  229.629785] Padding 000000007adc7ab3: b3 8a b5 41 00 00 00 00                          ...A....
[  229.629785] FIX page-&gt;ptl: Restoring 0x0000000026abf214-0x0000000091f6abb2=0x5a
[  ... ]

Fixes: acaf4e70997f ("net: vxlan: when lower dev unregisters remove vxlan dev as well")
Signed-off-by: Taehee Yoo &lt;ap420073@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Convert usage of IN_MULTICAST to ipv4_is_multicast</title>
<updated>2019-09-05T07:38:32Z</updated>
<author>
<name>Dave Taht</name>
<email>dave.taht@gmail.com</email>
</author>
<published>2019-09-02T23:29:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=842841ece540f7d7739bec3e9b79bdf9669d77d7'/>
<id>urn:sha1:842841ece540f7d7739bec3e9b79bdf9669d77d7</id>
<content type='text'>
IN_MULTICAST's primary intent is as a uapi macro.

Elsewhere in the kernel we use ipv4_is_multicast consistently.

This patch unifies linux's multicast checks to use that function
rather than this macro.

Signed-off-by: Dave Taht &lt;dave.taht@gmail.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@toke.dk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
