linux/drivers/net/hyperv, branch v6.15

hv_netvsc: Remove rmsg_pgcnt

2025-05-15T02:45:24Z

init_page_array() now always creates a single page buffer array entry for the rndis message, even if the rndis message crosses a page boundary. As such, the number of page buffer array entries used for the rndis message must no longer be tracked -- it is always just 1. Remove the rmsg_pgcnt field and use "1" where the value is needed. Cc: # 6.1.x Signed-off-by: Michael Kelley Link: https://patch.msgid.link/20250513000604.1396-5-mhklinux@outlook.com Signed-off-by: Jakub Kicinski

hv_netvsc: Preserve contiguous PFN grouping in the page buffer array

2025-05-15T02:45:24Z

Starting with commit dca5161f9bd0 ("hv_netvsc: Check status in SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet so there is no functional error, but performance is slightly impacted. Commit dca5161f9bd0 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux. The root cause of the problem is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that two rndis header PFNs be in a single "GPA range" data struture, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error. The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre- allocated send buffer slot because the pre-allocated slots don't have page crossing issues. After commit 14ad6ed30a10 ("net: allow small head cache usage with large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in an ssh session that outputs a lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed30a10 are perfectly valid -- they just had the side effect of making the netvsc bug more prominent. Current code in init_page_array() creates a separate page buffer array entry for each PFN required to identify the data to be transmitted. Contiguous PFNs get separate entries in the page buffer array, and any information about contiguity is lost. Fix the core issue by having init_page_array() construct the page buffer array to represent contiguous ranges rather than individual pages. When these ranges are subsequently passed to netvsc_build_mpb_array(), it can build GPA ranges that contain multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete error status: 2". If instead the network packet is sent by copying into a pre-allocated send buffer slot, the copy proceeds using the contiguous ranges rather than individual pages, but the result of the copying is the same. Also fix rndis_filter_send_request() to construct a contiguous range, since it has its own page buffer array. This change has a side benefit in CoCo VMs in that netvsc_dma_map() calls dma_map_single() on each contiguous range instead of on each page. This results in fewer calls to dma_map_single() but on larger chunks of memory, which should reduce contention on the swiotlb. Since the page buffer array now contains one entry for each contiguous range instead of for each individual page, the number of entries in the array can be reduced, saving 208 bytes of stack space in netvsc_xmit() when MAX_SKG_FRAGS has the default value of 17. [1] https://bugzilla.kernel.org/show_bug.cgi?id=217503 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503 Cc: # 6.1.x Signed-off-by: Michael Kelley Link: https://patch.msgid.link/20250513000604.1396-4-mhklinux@outlook.com Signed-off-by: Jakub Kicinski

hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages

2025-05-15T02:45:23Z

netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus messages. This function creates a series of GPA ranges, each of which contains a single PFN. However, if the rndis header in the VMBus message crosses a page boundary, the netvsc protocol with the host requires that both PFNs for the rndis header must be in a single "GPA range" data structure, which isn't possible with vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a new function netvsc_build_mpb_array() to build a VMBus message with multiple GPA ranges, each of which may contain multiple PFNs. Use vmbus_sendpacket_mpb_desc() to send this VMBus message to the host. There's no functional change since higher levels of netvsc don't maintain or propagate knowledge of contiguous PFNs. Based on its input, netvsc_build_mpb_array() still produces a separate GPA range for each PFN and the behavior is the same as with vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a subsequent patch to provide the necessary grouping. Cc: # 6.1.x Signed-off-by: Michael Kelley Link: https://patch.msgid.link/20250513000604.1396-3-mhklinux@outlook.com Signed-off-by: Jakub Kicinski

net: move misc netdev_lock flavors to a separate header

2025-03-08T17:06:50Z

Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuilding the kernel much faster (after touching the header with the helpers). The main netdev_lock() / netdev_unlock() functions are used in static inlines in netdevice.h and will probably be used most commonly, so keep them in netdevice.h. Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski

hv_netvsc: Use VF's tso_max_size value when data path is VF

2025-02-19T09:42:52Z

On Azure, increasing VF's gso/gro packet size to up-to GSO_MAX_SIZE is not possible without allowing the same for netvsc NIC (as the NICs are bonded together). For bonded NICs, the min of the max aggregated pkt size of the members is propagated in the stack. Therefore, we use netif_set_tso_max_size() to set max aggregated pkt size to VF's packet size for netvsc too, when the data path is switched over to the VF Tested on azure env with Accelerated Networking enabled and disabled. Signed-off-by: Shradha Gupta Reviewed-by: Haiyang Zhang Signed-off-by: David S. Miller

hv_netvsc: Replace one-element array with flexible array member

2025-01-18T03:07:48Z

Replace the deprecated one-element array with a modern flexible array member in the struct nvsp_1_message_send_receive_buffer_complete. Use struct_size_t(,,1) instead of sizeof() to maintain the same size. Compile-tested only. Link: https://github.com/KSPP/linux/issues/79 Signed-off-by: Thorsten Blum Tested-by: Roman Kisel Reviewed-by: Roman Kisel Link: https://patch.msgid.link/20250116211932.139564-2-thorsten.blum@linux.dev Signed-off-by: Jakub Kicinski

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2024-10-25T07:08:22Z

Cross-merge networking fixes after downstream PR. No conflicts and no adjacent changes. Signed-off-by: Paolo Abeni

hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event

2024-10-24T10:43:20Z

The existing code moves VF to the same namespace as the synthetic NIC during netvsc_register_vf(). But, if the synthetic device is moved to a new namespace after the VF registration, the VF won't be moved together. To make the behavior more consistent, add a namespace check for synthetic NIC's NETDEV_REGISTER event (generated during its move), and move the VF if it is not in the same namespace. Cc: stable@vger.kernel.org Fixes: c0a41b887ce6 ("hv_netvsc: move VF to same namespace as netvsc device") Suggested-by: Stephen Hemminger Signed-off-by: Haiyang Zhang Reviewed-by: Simon Horman Link: https://patch.msgid.link/1729275922-17595-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni

hv_netvsc: Link queues to NAPIs

2024-10-06T15:24:06Z

Use netif_queue_set_napi to link queues to NAPI instances so that they can be queried with netlink. Shradha Gupta tested the patch and reported that the results are as expected: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 4, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'}, {'id': 5, 'ifindex': 2, 'napi-id': 8198, 'type': 'rx'}, {'id': 6, 'ifindex': 2, 'napi-id': 8199, 'type': 'rx'}, {'id': 7, 'ifindex': 2, 'napi-id': 8200, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'tx'}, {'id': 4, 'ifindex': 2, 'napi-id': 8197, 'type': 'tx'}, {'id': 5, 'ifindex': 2, 'napi-id': 8198, 'type': 'tx'}, {'id': 6, 'ifindex': 2, 'napi-id': 8199, 'type': 'tx'}, {'id': 7, 'ifindex': 2, 'napi-id': 8200, 'type': 'tx'}] Signed-off-by: Joe Damato Reviewed-by: Haiyang Zhang Tested-by: Shradha Gupta Signed-off-by: David S. Miller

hv_netvsc: Don't assume cpu_possible_mask is dense

2024-10-04T20:09:20Z

Current code allocates the pcpu_sum array with size num_possible_cpus(). This code assumes the cpu_possible_mask is dense, which is not true in the general case per [1]. If cpu_possible_mask is sparse, the array might be indexed by a value beyond the size of the array. However, the configurations that Hyper-V provides to guest VMs on x86 and ARM64 hardware, in combination with how architecture specific code assigns Linux CPU numbers, *does* always produce a dense cpu_possible_mask. So the dense assumption is not currently causing failures. But for robustness against future changes in how cpu_possible_mask is populated, update the code to no longer assume dense. The correct approach is to allocate and initialize the array using size "nr_cpu_ids". While this leaves unused array entries corresponding to holes in cpu_possible_mask, the holes are assumed to be minimal and hence the amount of memory wasted by unused entries is minimal. [1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@SN6PR02MB4157.namprd02.prod.outlook.com/ Signed-off-by: Michael Kelley Acked-by: Peter Zijlstra (Intel) Link: https://patch.msgid.link/20241003035333.49261-6-mhklinux@outlook.com Signed-off-by: Jakub Kicinski