linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message (Collapse)	Author	Lines
2026-03-09	gve: add support for UDP GSO for DQO format	Ankit Garg	-12/+29
	Enable support for UDP GSO when using DQO format. Advertise the feature flag during device initialization and enable offload by default. Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20260306224816.3391551-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-05	gve: Enable hw-gro by default if device supported	Ankit Garg	-1/+3
	Change the driver's default behavior to enable hw-gro whenever supported for device. Performance observations: - We observed ~10% improvement in RX single stream throughput across various MTU sizes. - No change in TCP_RR/TCP_CRR latencies Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20260303195549.2679070-5-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05	gve: pull network headers into skb linear part	Ankit Garg	-4/+16
	Currently, in DQO mode with hw-gro enabled, entire received packet is placed into skb fragments when header-split is disabled. This leaves the skb linear part empty, forcing the networking stack to do multiple small memory copies to access eth, IP and TCP headers. This patch adds a single memcpy to put all headers into linear portion before packet reaches the SW GRO stack; thus eliminating multiple smaller memcpy calls. Additionally, the criteria for calling napi_gro_frags() was updated. Since skb->head is now populated, we instead check if the SKB is the cached NAPI scratchpad to ensure we continue using the zero-allocation path. Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20260303195549.2679070-4-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05	gve: fix SW coalescing when hw-GRO is used	Ankit Garg	-2/+21
	Leaving gso_segs unpopulated on hardware GRO packet prevents further coalescing by software stack because the kernel's GRO logic marks the SKB for flush because the expected length of all segments doesn't match actual payload length. Setting gso_segs correctly results in significantly more segments being coalesced as measured by the result of dev_gro_receive(). gso_segs are derived from payload length. When header-split is enabled, payload is in the non-linear portion of skb. And when header-split is disabled, we have to parse the headers to determine payload length. Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20260303195549.2679070-3-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05	gve: Advertise NETIF_F_GRO_HW instead of NETIF_F_LRO	Ankit Garg	-10/+11
	The device behind DQO format has always coalesced packets per stricter hardware GRO spec even though it was being advertised as LRO. Update advertised capability to match device behavior. Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20260303195549.2679070-2-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-28	gve: Enable reading max ring size from the device in DQO-QPL mode	Matt Olson	-6/+4
	The gVNIC device indicates a device option (MODIFY_RING) to the driver, which presents a range of ring sizes from which the user is allowed to select. But in DQO-QPL queue format, the driver ignores the "max" of this range and instead allows the user to configure the ring size in the range [min, default]. This was done because increasing the ring size could result in the number of registered pages being higher than the max allowed by the device. In order to support large ring sizes, stop ignoring the "max" of the range presented in the MODIFY_RING option. Signed-off-by: Matt Olson <maolson@google.com> Signed-off-by: Max Yuan <maxyuan@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20260225182342.1049816-3-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28	gve: Update QPL page registration logic	Matt Olson	-35/+54
	For DQO, change QPL page registration logic to be more flexible to honor the "max_registered_pages" parameter from the gVNIC device. Previously the number of RX pages per QPL was hardcoded to twice the ring size, and the number of TX pages per QPL was dictated by the device in the DQO-QPL device option. Now [in DQO-QPL mode], the driver will ignore the "tx_pages_per_qpl" parameter indicated in the DQO-QPL device option and instead allocate up to (tx_queue_length / 2) pages per TX QPL and up to (rx_queue_length * 2) pages per RX QPL while keeping the total number of pages under the "max_registered_pages". Merge DQO and GQI QPL page calculation logic into a unified gve_update_num_qpl_pages function. Add rx_pages_per_qpl to the priv struct for consumption by both DQO and GQI. Signed-off-by: Matt Olson <maolson@google.com> Signed-off-by: Max Yuan <maxyuan@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20260225182342.1049816-2-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26	Merge tag 'net-7.0-rc2' of ↵	Linus Torvalds	-31/+25
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: - bnxt_en: fix deleting of Ntuple filters - wan: farsync: fix use-after-free bugs caused by unfinished tasklets - xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: - mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - gve: fix incorrect buffer cleanup for QPL - team: avoid NETDEV_CHANGEMTU event when unregistering slave - usb: validate USB endpoints" * tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) netfilter: nf_conntrack_h323: fix OOB read in decode_choice() dpaa2-switch: validate num_ifs to prevent out-of-bounds write net: consume xmit errors of GSO frames vsock: document write-once behavior of the child_ns_mode sysctl vsock: lock down child_ns_mode as write-once selftests/vsock: change tests to respect write-once child ns mode net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query net/mlx5: Fix missing devlink lock in SRIOV enable error path net/mlx5: E-switch, Clear legacy flag when moving to switchdev net/mlx5: LAG, disable MPESW in lag_disable_change() net/mlx5: DR, Fix circular locking dependency in dump selftests: team: Add a reference count leak test team: avoid NETDEV_CHANGEMTU event when unregistering slave net: mana: Fix double destroy_workqueue on service rescan PCI path MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 tcp: re-enable acceptance of FIN packets when RWIN is 0 vsock: Use container_of() to get net namespace in sysctl handlers net: usb: kaweth: validate USB endpoints ...
2026-02-23	gve: fix incorrect buffer cleanup in gve_tx_clean_pending_packets for QPL	Ankit Garg	-31/+25
	In DQ-QPL mode, gve_tx_clean_pending_packets() incorrectly uses the RDA buffer cleanup path. It iterates num_bufs times and attempts to unmap entries in the dma array. This leads to two issues: 1. The dma array shares storage with tx_qpl_buf_ids (union). Interpreting buffer IDs as DMA addresses results in attempting to unmap incorrect memory locations. 2. num_bufs in QPL mode (counting 2K chunks) can significantly exceed the size of the dma array, causing out-of-bounds access warnings (trace below is how we noticed this issue). UBSAN: array-index-out-of-bounds in drivers/net/ethernet/drivers/net/ethernet/google/gve/gve_tx_dqo.c:178:5 index 18 is out of range for type 'dma_addr_t[18]' (aka 'unsigned long long[18]') Workqueue: gve gve_service_task [gve] Call Trace: <TASK> dump_stack_lvl+0x33/0xa0 __ubsan_handle_out_of_bounds+0xdc/0x110 gve_tx_stop_ring_dqo+0x182/0x200 [gve] gve_close+0x1be/0x450 [gve] gve_reset+0x99/0x120 [gve] gve_service_task+0x61/0x100 [gve] process_scheduled_works+0x1e9/0x380 Fix this by properly checking for QPL mode and delegating to gve_free_tx_qpl_bufs() to reclaim the buffers. Cc: stable@vger.kernel.org Fixes: a6fb8d5a8b69 ("gve: Tx path for DQO-QPL") Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260220215324.1631350-1-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-22	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses	Kees Cook	-5/+4
	Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments	Linus Torvalds	-12/+6
	This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument	Linus Torvalds	-6/+6
	This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21	treewide: Replace kmalloc with kmalloc_obj for non-scalar types	Kees Cook	-29/+27
	This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-06	gve: Remove jumbo_remove step from TX path	Alice Mikityanska	-3/+0
	Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove unnecessary steps from the gve TX path, that used to check and remove HBH. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-10-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	-28/+53
	Cross-merge networking fixes after downstream PR (net-6.19-rc9). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c 3125fc1701694 ("net: spacemit: k1-emac: fix jumbo frame support") f66086798f91f ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aYIysFIE9ooavWia@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-03	gve: Correct ethtool rx_dropped calculation	Max Yuan	-6/+17
	The gve driver's "rx_dropped" statistic, exposed via `ethtool -S`, incorrectly includes `rx_buf_alloc_fail` counts. These failures represent an inability to allocate receive buffers, not true packet drops where a received packet is discarded. This misrepresentation can lead to inaccurate diagnostics. This patch rectifies the ethtool "rx_dropped" calculation. It removes `rx_buf_alloc_fail` from the total and adds `xdp_tx_errors` and `xdp_redirect_errors`, which represent legitimate packet drops within the XDP path. Cc: stable@vger.kernel.org Fixes: 433e274b8f7b ("gve: Add stats for gve.") Signed-off-by: Max Yuan <maxyuan@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Joshua Washington <joshwash@google.com> Reviewed-by: Matt Olson <maolson@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260202193925.3106272-3-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-03	gve: Fix stats report corruption on queue count change	Debarghya Kundu	-22/+36
	The driver and the NIC share a region in memory for stats reporting. The NIC calculates its offset into this region based on the total size of the stats region and the size of the NIC's stats. When the number of queues is changed, the driver's stats region is resized. If the queue count is increased, the NIC can write past the end of the allocated stats region, causing memory corruption. If the queue count is decreased, there is a gap between the driver and NIC stats, leading to incorrect stats reporting. This change fixes the issue by allocating stats region with maximum size, and the offset calculation for NIC stats is changed to match with the calculation of the NIC. Cc: stable@vger.kernel.org Fixes: 24aeb56f2d38 ("gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.") Signed-off-by: Debarghya Kundu <debarghyak@google.com> Reviewed-by: Joshua Washington <joshwash@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260202193925.3106272-2-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-29	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	-15/+14
	Cross-merge networking fixes after downstream PR (net-6.19-rc8). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c 2c84959167d64 ("net: spacemit: Check for netif_carrier_ok() in emac_stats_update()") f66086798f91f ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aXjAqZA3iEWD_DGM@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-27	gve: fix probe failure if clock read fails	Jordan Rhee	-15/+14
	If timestamping is supported, GVE reads the clock during probe, which can fail for various reasons. Previously, this failure would abort the driver probe, rendering the device unusable. This behavior has been observed on production GCP VMs, causing driver initialization to fail completely. This patch allows the driver to degrade gracefully. If gve_init_clock() fails, it logs a warning and continues loading the driver without PTP support. Cc: stable@vger.kernel.org Fixes: a479a27f4da4 ("gve: Move gve_init_clock to after AQ CONFIGURE_DEVICE_RESOURCES call") Signed-off-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Shachar Raindel <shacharr@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20260127010210.969823-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20	Merge tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux	Jakub Kicinski	-3/+6
	Pavel Begunkov says: ==================== Add support for providers with large rx buffer Many modern NICs support configurable receive buffer lengths, and zcrx and memory providers can use buffers larger than 4K to improve performance. When paired with hw-gro larger rx buffer sizes can drastically reduce the number of buffers traversing the stack and save a lot of processing time. It also allows to give to users larger contiguous chunks of data. Single stream benchmarks showed up to ~30% CPU util improvement. E.g. comparison for 4K vs 32K buffers using a 200Gbit NIC: packets=23987040 (MB=2745098), rps=199559 (MB/s=22837) CPU %usr %nice %sys %iowait %irq %soft %idle 0 1.53 0.00 27.78 2.72 1.31 66.45 0.22 packets=24078368 (MB=2755550), rps=200319 (MB/s=22924) CPU %usr %nice %sys %iowait %irq %soft %idle 0 0.69 0.00 8.26 31.65 1.83 57.00 0.57 This series adds net infrastructure for memory providers configuring the size and implements it for bnxt. It's an opt-in feature for drivers, they should advertise support for the parameter in the qops and must check if the hardware supports the given size. It's limited to memory providers as it drastically simplifies implementation. It doesn't affect the fast path zcrx uAPI, and the user exposed parameter is defined in zcrx terms, which allows it to be flexible and adjusted in the future. A liburing example can be found at [2] full branch: [1] https://github.com/isilence/linux.git zcrx/large-buffers-v8 Liburing example: [2] https://github.com/isilence/liburing.git zcrx/rx-buf-len * tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux: io_uring/zcrx: document area chunking parameter selftests: iou-zcrx: test large chunk sizes eth: bnxt: support qcfg provided rx page size eth: bnxt: adjust the fill level of agg queues with larger buffers eth: bnxt: store rx buffer size per queue net: pass queue rx page size from memory provider net: add bare bone queue configs net: reduce indent of struct netdev_queue_mgmt_ops members net: memzero mp params when closing a queue ==================== Link: https://patch.msgid.link/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14	net: add bare bone queue configs	Pavel Begunkov	-3/+6
	We'll need to pass extra parameters when allocating a queue for memory providers. Define a new structure for queue configurations, and pass it to qapi callbacks. It's empty for now, actual parameters will be added in following patches. Configurations should persist across resets, and for that they're default-initialised on device registration and stored in struct netdev_rx_queue. We also add a new qapi callback for defaulting a given config. It must be implemented if a driver wants to use queue configs and is optional otherwise. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2026-01-06	net: gve: convert to use .get_rx_ring_count	Breno Leitao	-3/+8
	Convert the Google Virtual Ethernet (GVE) driver to use the new .get_rx_ring_count ethtool operation instead of handling ETHTOOL_GRXRINGS in .get_rxnfc. This simplifies the code by moving the ring count query to a dedicated callback. The new callback provides the same functionality in a more direct way, following the ongoing ethtool API modernization. Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260105-gxring_google-v2-1-e7cfe924d429@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-29	gve: defer interrupt enabling until NAPI registration	Ankit Garg	-1/+3
	Currently, interrupts are automatically enabled immediately upon request. This allows interrupt to fire before the associated NAPI context is fully initialized and cause failures like below: [ 0.946369] Call Trace: [ 0.946369] <IRQ> [ 0.946369] __napi_poll+0x2a/0x1e0 [ 0.946369] net_rx_action+0x2f9/0x3f0 [ 0.946369] handle_softirqs+0xd6/0x2c0 [ 0.946369] ? handle_edge_irq+0xc1/0x1b0 [ 0.946369] __irq_exit_rcu+0xc3/0xe0 [ 0.946369] common_interrupt+0x81/0xa0 [ 0.946369] </IRQ> [ 0.946369] <TASK> [ 0.946369] asm_common_interrupt+0x22/0x40 [ 0.946369] RIP: 0010:pv_native_safe_halt+0xb/0x10 Use the `IRQF_NO_AUTOEN` flag when requesting interrupts to prevent auto enablement and explicitly enable the interrupt in NAPI initialization path (and disable it during NAPI teardown). This ensures that interrupt lifecycle is strictly coupled with readiness of NAPI context. Cc: stable@vger.kernel.org Fixes: 1dfc2e46117e ("gve: Refactor napi add and remove functions") Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Joshua Washington <joshwash@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20251219102945.2193617-1-hramamurthy@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-04	gve: Move gve_init_clock to after AQ CONFIGURE_DEVICE_RESOURCES call	Tim Hostetler	-7/+10
	commit 46e7860ef941 ("gve: Move ptp_schedule_worker to gve_init_clock") moved the first invocation of the AQ command REPORT_NIC_TIMESTAMP to gve_probe(). However, gve_init_clock() invoking REPORT_NIC_TIMESTAMP is not valid until after gve_probe() invokes the AQ command CONFIGURE_DEVICE_RESOURCES. Failure to do so results in the following error: gve 0000:00:07.0: failed to read NIC clock -11 This was missed earlier because the driver under test was loaded at runtime instead of boot-time. The boot-time driver had already initialized the device, causing the runtime driver to successfully call gve_init_clock() incorrectly. Fixes: 46e7860ef941 ("gve: Move ptp_schedule_worker to gve_init_clock") Reviewed-by: Ankit Garg <nktgrg@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251202200207.1434749-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27	gve: Fix race condition on tx->dropped_pkt update	Max Yuan	-0/+8
	The tx->dropped_pkt counter is a 64-bit integer that is incremented directly. On 32-bit architectures, this operation is not atomic and can lead to read/write tearing if a reader accesses the counter during the update. This can result in incorrect values being reported for dropped packets. To prevent this potential data corruption, wrap the increment operation with u64_stats_update_begin() and u64_stats_update_end(). This ensures that updates to the 64-bit counter are atomic, even on 32-bit systems, by using a sequence lock. The u64_stats_sync API requires the writer to have exclusive access, which is already provided in this context by the network stack's serialization of the transmit path (net_device_ops::ndo_start_xmit [1]) for a given queue. [1]: https://www.kernel.org/doc/Documentation/networking/netdevices.txt Signed-off-by: Max Yuan <maxyuan@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-18	gve: Add Rx HWTS metadata to AF_XDP ZC mode	Tim Hostetler	-2/+21
	By overlaying the struct gve_xdp_buff on top of the struct xdp_buff_xsk that AF_XDP utilizes, the driver records the 32 bit timestamp via the completion descriptor and the cached 64 bit NIC timestamp via gve_priv. The driver's implementation of xmo_rx_timestamp extends the timestamp to the full and up to date 64 bit timestamp and returns it to the user. gve_rx_xsk_dqo is modified to accept a pointer to the completion descriptor and no longer takes a buf_len explicitly as it can be pulled out of the descriptor. With this patch gve now supports bpf_xdp_metadata_rx_timestamp. Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-5-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-18	gve: Prepare bpf_xdp_metadata_rx_timestamp support	Tim Hostetler	-10/+34
	Support populating XDP RX metadata with hardware RX timestamps. This patch utilizes the same underlying logic to calculate hardware timestamps as the regular RX path. xdp_metadata_ops is registered with the net_device in a future patch. gve_rx_calculate_hwtstamp was pulled out so as to not duplicate logic between gve_xdp_rx_timestamp and gve_rx_hwtstamp. Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-4-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-18	gve: Wrap struct xdp_buff	Tim Hostetler	-8/+13
	RX timestamping will need to keep track of extra temporary information per-packet. In preparation for this, introduce gve_xdp_buff to wrap the xdp_buff. This is similar in function to stmmac_xdp_buff and ice_xdp_buff. Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-3-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-18	gve: Move ptp_schedule_worker to gve_init_clock	Tim Hostetler	-4/+12
	Previously, gve had been only initializing ptp aux work when hardware timestamping was initialized through ndo_hwtsatmp_set. As this patch series introduces XDP hardware timestamp metadata which will require the ptp aux work, the work can't be gated on the kernel_hwtstamp_config being set and must be initialized elsewhere. For simplicity, ptp_schedule_worker is invoked right after the ptp_clock is registered with the kernel (which happens during gve_probe or following reset). The worker is scheduled in GVE_NIC_TS_SYNC_INTERVAL_MS as the synchronous call to gve_clock_nic_ts_read makes the worker redundant if scheduled immediately. If gve cannot read the device clock immediately, it errors out of gve_init_clock. Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251114211146.292068-2-joshwash@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-10	gve: Default to max_rx_buffer_size for DQO if device supported	Ankit Garg	-0/+4
	Change the driver's default behavior to prefer the largest available RX buffer length supported by the device for DQO format, rather than always using the hardcoded 2K default. Previously, the driver would initialize with `GVE_DEFAULT_RX_BUFFER_SIZE` (2K), even if the device advertised support for a larger length (e.g., 4K). Performance observations: - With LRO disabled, we observed >10% improvement in RX single stream throughput when MTU >=2048. - With LRO enabled, we observed >10% improvement in RX single stream throughput when MTU >=1460. - No regressions were observed. Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251106192746.243525-5-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10	gve: Allow ethtool to configure rx_buf_len	Ankit Garg	-1/+60
	Add support for getting and setting the RX buffer length via the ethtool ring parameters (`ethtool -g`/`-G`). The driver restricts the allowed buffer length to 2048 (SZ_2K) by default and allows 4096 (SZ_4K) based on device options. As XDP is only supported when the `rx_buf_len` is 2048, the driver now enforces this in two places: 1. In `gve_xdp_set`, rejecting XDP programs if the current buffer length is not 2048. 2. In `gve_set_rx_buf_len_config`, rejecting buffer length changes if XDP is loaded and the new length is not 2048. Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251106192746.243525-4-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10	gve: Use extack to log xdp config verification errors	Ankit Garg	-9/+15
	Plumb extack as it allows us to send more detailed error messages back and append 'gve' suffix to method name per convention. NL_SET_ERR_MSG_FMT_MOD doesn't support format string longer than 80 chars so keeping netdev warning with actual queue count details. Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251106192746.243525-3-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-10	gve: Decouple header split from RX buffer length	Ankit Garg	-15/+0
	Previously, enabling header split via `gve_set_hsplit_config` also implicitly changed the RX buffer length to 4K (if supported by the device). This coupled two settings that should be orthogonal; this patch removes that side effect. After this change, `gve_set_hsplit_config` only toggles the header split configuration. The RX buffer length is no longer affected and must be configured independently. Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251106192746.243525-2-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	-0/+15
	Cross-merge networking fixes after downstream PR (net-6.18-rc5). Conflicts: drivers/net/wireless/ath/ath12k/mac.c 9222582ec524 ("Revert "wifi: ath12k: Fix missing station power save configuration"") 6917e268c433 ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon") https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net Adjacent changes: drivers/net/ethernet/intel/Kconfig b1d16f7c0063 ("libie: depend on DEBUG_FS when building LIBIE_FWLOG") 93f53db9f9dc ("ice: switch to Page Pool") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31	gve: Implement settime64 with -EOPNOTSUPP	Tim Hostetler	-0/+7
	ptp_clock_settime() assumes every ptp_clock has implemented settime64(). Stub it with -EOPNOTSUPP to prevent a NULL dereference. Fixes: acd16380523b ("gve: Add initial PTP device support") Reported-by: syzbot+a546141ca6d53b90aba3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a546141ca6d53b90aba3 Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251029184555.3852952-3-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31	gve: Implement gettimex64 with -EOPNOTSUPP	Tim Hostetler	-0/+8
	gve implemented a ptp_clock for sole use of do_aux_work at this time. ptp_clock_gettime() and ptp_sys_offset() assume every ptp_clock has implemented either gettimex64 or gettime64. Stub gettimex64 and return -EOPNOTSUPP to prevent NULL dereferencing. Fixes: acd16380523b ("gve: Add initial PTP device support") Reported-by: syzbot+c8c0e7ccabd456541612@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c8c0e7ccabd456541612 Signed-off-by: Tim Hostetler <thostet@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251029184555.3852952-2-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-20	gve: Consolidate and persist ethtool ring changes	Ankit Garg	-55/+51
	Refactor the ethtool ring parameter configuration logic to address two issues: unnecessary queue resets and lost configuration changes when the interface is down. Previously, `gve_set_ringparam` could trigger multiple queue destructions and recreations for a single command, as different settings (e.g., header split, ring sizes) were applied one by one. Furthermore, if the interface was down, any changes made via ethtool were discarded instead of being saved for the next time the interface was brought up. This patch centralizes the configuration logic. Individual functions like `gve_set_hsplit_config` are modified to only validate and stage changes in a temporary config struct. The main `gve_set_ringparam` function now gathers all staged changes and applies them as a single, combined configuration: 1. If the interface is up, it calls `gve_adjust_config` once. 2. If the interface is down, it saves the settings directly to the driver's private struct, ensuring they persist and are used when the interface is brought back up. Signed-off-by: Ankit Garg <nktgrg@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Jordan Rhee <jordanrhee@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Link: https://patch.msgid.link/20251017012614.3631351-1-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15	gve: Check valid ts bit on RX descriptor before hw timestamping	Tim Hostetler	-7/+16
	The device returns a valid bit in the LSB of the low timestamp byte in the completion descriptor that the driver should check before setting the SKB's hardware timestamp. If the timestamp is not valid, do not hardware timestamp the SKB. Cc: stable@vger.kernel.org Fixes: b2c7aeb49056 ("gve: Implement ndo_hwtstamp_get/set for RX timestamping") Reviewed-by: Joshua Washington <joshwash@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251014004740.2775957-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-21	gve: support unreadable netmem	Mina Almasry	-5/+35
	Declare PP_FLAG_ALLOW_UNREADABLE_NETMEM to turn on unreadable netmem support in GVE. We also drop any net_iov packets where header split is not enabled. We're unable to process packets where the header landed in unreadable netmem. Use page_pool_dma_sync_netmem_for_cpu in lieu of dma_sync_single_range_for_cpu to correctly handle unreadable netmem that should not be dma-sync'd. Disable rx_copybreak optimization if payload is unreadable netmem as that needs access to the payload. Signed-off-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250818210507.3781705-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-19	gve: prevent ethtool ops after shutdown	Jordan Rhee	-0/+2
	A crash can occur if an ethtool operation is invoked after shutdown() is called. shutdown() is invoked during system shutdown to stop DMA operations without performing expensive deallocations. It is discouraged to unregister the netdev in this path, so the device may still be visible to userspace and kernel helpers. In gve, shutdown() tears down most internal data structures. If an ethtool operation is dispatched after shutdown(), it will dereference freed or NULL pointers, leading to a kernel panic. While graceful shutdown normally quiesces userspace before invoking the reboot syscall, forced shutdowns (as observed on GCP VMs) can still trigger this path. Fix by calling netif_device_detach() in shutdown(). This marks the device as detached so the ethtool ioctl handler will skip dispatching operations to the driver. Fixes: 974365e51861 ("gve: Implement suspend/resume/shutdown") Signed-off-by: Jordan Rhee <jordanrhee@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250818211245.1156919-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	-30/+37
	Cross-merge networking fixes after downstream PR (net-6.16-rc8). Conflicts: drivers/net/ethernet/microsoft/mana/gdma_main.c 9669ddda18fb ("net: mana: Fix warnings for missing export.h header inclusion") 755391121038 ("net: mana: Allocate MSI-X vectors dynamically") https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au Adjacent changes: drivers/net/ethernet/ti/icssg/icssg_prueth.h 6e86fb73de0f ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG") ffe8a4909176 ("net: ti: icssg-prueth: Read firmware-names from device tree") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-22	gve: implement DQO RX datapath and control path for AF_XDP zero-copy	Joshua Washington	-14/+149
	Add the RX datapath for AF_XDP zero-copy for DQ RDA. The RX path is quite similar to that of the normal XDP case. Parallel methods are introduced to properly handle XSKs instead of normal driver buffers. To properly support posting from XSKs, queues are destroyed and recreated, as the driver was initially making use of page pool buffers instead of the XSK pool memory. Expose support for AF_XDP zero-copy, as the TX and RX datapaths both exist. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250717152839.973004-6-jeroendb@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-22	gve: implement DQO TX datapath for AF_XDP zero-copy	Joshua Washington	-3/+171
	In the descriptor clean path, a number of changes need to be made to accommodate out of order completions and double completions. The XSK stack can only handle completions being processed in order, as a single counter is incremented in xsk_tx_completed to sigify how many XSK descriptors have been completed. Because completions can come back out of order in DQ, a separate queue of XSK descriptors must be maintained. This queue keeps the pending packets in the order that they were written so that the descriptors can be counted in xsk_tx_completed in the same order. For double completions, a new pending packet state and type are introduced. The new type, GVE_TX_PENDING_PACKET_DQO_XSK, plays an anlogous role to pre-existing _SKB and _XDP_FRAME pending packet types for XSK descriptors. The new state, GVE_PACKET_STATE_XSK_COMPLETE, represents packets for which no more completions are expected. This includes packets which have received a packet completion or reinjection completion, as well as packets whose reinjection completion timer have timed out. At this point, such packets can be counted as part of xsk_tx_completed() and freed. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250717152839.973004-5-jeroendb@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-22	gve: keep registry of zc xsk pools in netdev_priv	Joshua Washington	-4/+34
	Relying on xsk_get_pool_from_qid for getting whether zero copy is enabled on a queue is erroneous, as an XSK pool is registered in xp_assign_dev whether AF_XDP zero-copy is enabled or not. This becomes problematic when queues are restarted in copy mode, as all RX queues with XSKs will register a pool, causing the driver to exercise the zero-copy codepath. This patch adds a bitmap to keep track of which queues have zero-copy enabled. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250717152839.973004-4-jeroendb@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-22	gve: merge xdp and xsk registration	Joshua Washington	-18/+10
	The existence of both of these xdp_rxq and xsk_rxq is redundant. xdp_rxq can be used in both the zero-copy mode and the copy mode case. XSK pool memory model registration is prioritized over normal memory model registration to ensure that memory model registration happens only once per queue. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250717152839.973004-3-jeroendb@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-22	gve: deduplicate xdp info and xsk pool registration logic	Joshua Washington	-86/+83
	The XDP registration path currently has a lot of reused logic, leading changes to the codepaths to be unnecessarily complex. gve_reg_xsk_pool extracts the logic of registering an XSK pool with a queue into a method that can be used by both XDP_SETUP_XSK_POOL and gve_reg_xdp_info. gve_unreg_xdp_info is used to undo XDP info registration in the error path instead of explicitly unregistering the XDP info, as it is more complete and idempotent. This patch will be followed by other changes to the XDP registration logic, and will simplify those changes due to the use of common methods. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250717152839.973004-2-jeroendb@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-21	gve: Fix stuck TX queue for DQ queue format	Praveen Kaligineedi	-30/+37
	gve_tx_timeout was calculating missed completions in a way that is only relevant in the GQ queue format. Additionally, it was attempting to disable device interrupts, which is not needed in either GQ or DQ queue formats. As a result, TX timeouts with the DQ queue format likely would have triggered early resets without kicking the queue at all. This patch drops the check for pending work altogether and always kicks the queue after validating the queue has not seen a TX timeout too recently. Cc: stable@vger.kernel.org Fixes: 87a7f321bb6a ("gve: Recover from queue stall due to missed IRQ") Co-developed-by: Tim Hostetler <thostet@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250717192024.1820931-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-09	gve: make IRQ handlers and page allocation NUMA aware	Bailey Forrest	-17/+37
	All memory in GVE is currently allocated without regard for the NUMA node of the device. Because access to NUMA-local memory access is significantly cheaper than access to a remote node, this change attempts to ensure that page frags used in the RX path, including page pool frags, are allocated on the NUMA node local to the gVNIC device. Note that this attempt is best-effort. If necessary, the driver will still allocate non-local memory, as __GFP_THISNODE is not passed. Descriptor ring allocations are not updated, as dma_alloc_coherent handles that. This change also modifies the IRQ affinity setting to only select CPUs from the node local to the device, preserving the behavior that TX and RX queues of the same index share CPU affinity. Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250707210107.2742029-1-jeroendb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-08	gve: global: fix "for a while" typo	Ahelenia Ziemiańska	-1/+1
	Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/5zsbhtyox3cvbntuvhigsn42uooescbvdhrat6s3d6rczznzg5@tarta.nabijaczleweli.xyz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-21	gve: add XDP_TX and XDP_REDIRECT support for DQ RDA	Joshua Washington	-33/+254
	This patch adds support for XDP_TX and XDP_REDIRECT for the DQ RDA queue format. To appropriately support transmission of XDP frames, a new pending packet type GVE_TX_PENDING_PACKET_DQO_XDP_FRAME is introduced for completion handling, as there was a previous assumption that completed packets would be SKBs. XDP_TX handling completes the basic XDP actions, so the feature is recorded accordingly. This patch also enables the ndo_xdp_xmit callback allowing DQ to handle XDP_REDIRECT packets originating from another interface. The XDP spinlock is moved to common TX ring fields so that it can be used in both GQ and DQ. Originally, it was in a section which was mutually exclusive for GQ and DQ. In summary, 3 XDP features are exposed for the DQ RDA queue format: 1) NETDEV_XDP_ACT_BASIC 2) NETDEV_XDP_ACT_NDO_XMIT 3) NETDEV_XDP_ACT_REDIRECT Note that XDP and header-data split are mutually exclusive for the time being due to lack of multi-buffer XDP support. This patch does not add support for the DQ QPL format. That is to come in a future patch series. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>