summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/meta
AgeCommit message (Collapse)AuthorLines
2026-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski-1/+1
Merge in late fixes in preparation for the net-next PR. Conflicts: include/net/sch_generic.h a6bd339dbb351 ("net_sched: fix skb memory leak in deferred qdisc drops") ff2998f29f390 ("net: sched: introduce qdisc-specific drop reason tracing") https://lore.kernel.org/adz0iX85FHMz0HdO@sirena.org.uk drivers/net/ethernet/airoha/airoha_eth.c 1acdfbdb516b ("net: airoha: Fix VIP configuration for AN7583 SoC") bf3471e6e6c0 ("net: airoha: Make flow control source port mapping dependent on nbq parameter") Adjacent changes: drivers/net/ethernet/airoha/airoha_ppe.c f44218cd5e6a ("net: airoha: Reset PPE cpu port configuration in airoha_ppe_hw_init()") 7da62262ec96 ("inet: add ip_local_port_step_width sysctl to improve port usage distribution") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09eth: fbnic: Use wake instead of startMohsin Bashir-1/+1
fbnic_up() calls netif_tx_start_all_queues(), which only clears __QUEUE_STATE_DRV_XOFF. If qdisc backlog has accumulated on any TX queue before the reconfiguration (e.g. ring resize via ethtool -G), start does not call __netif_schedule() to kick the qdisc, so the pending backlog is never drained and the queue stalls. Switch to netif_tx_wake_all_queues(), which clears DRV_XOFF and also calls __netif_schedule() on every queue, ensuring any backlog that built up before the down/up cycle is promptly dequeued. Fixes: bc6107771bb4 ("eth: fbnic: Allocate a netdevice and napi vectors with queues") Signed-off-by: Mohsin Bashir <hmohsin@meta.com> Link: https://patch.msgid.link/20260408002415.2963915-1-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski-5/+5
Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()") 0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic") 57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c 4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") 687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h b6045c899e37 ("wifi: iwlwifi: mld: Refactor scan command handling") ec66ec6a5a8f ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c 078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") 323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64Dimitri Daskalakis-1/+1
On systems with 64K pages, RX queues will be wedged if users set the descriptor count to the current minimum (16). Fbnic fragments large pages into 4K chunks, and scales down the ring size accordingly. With 64K pages and 16 descriptors, the ring size mask is 0 and will never be filled. 32 descriptors is another special case that wedges the RX rings. Internally, the rings track pages for the head/tail pointers, not page fragments. So with 32 descriptors, there's only 1 usable page as one ring slot is kept empty to disambiguate between an empty/full ring. As a result, the head pointer never advances and the HW stalls after consuming 16 page fragments. Fixes: 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260401162848.2335350-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-31fbnic: Set Relaxed Ordering PCIe TLP attributes for DMA enginesAlexander Duyck-2/+27
Add ATTR CSR bit field definitions for the DMA engine TLP header configuration registers: AW_CFG: RDE_ATTR[17:15], RQM_ATTR[14:12], TQM_ATTR[11:9] AR_CFG: TDE_ATTR[17:15], RQM_ATTR[14:12], TQM_ATTR[11:9] These fields control the PCIe TLP attribute bits for outbound transactions from the TQM, RQM, RDE (write path), and TDE (read path) DMA engines. An enum is added with standard PCIe TLP attribute values: NS (No Snoop), RO (Relaxed Ordering), and IDO (ID-based Ordering). Read the PCIe Relaxed Ordering capability at probe time and store it in fbnic_dev. Configure Relaxed Ordering on the PCIe TLP attributes in fbnic_mbx_init_desc_ring when the capability is enabled. For the write path (AW_CFG), set RO on RDE and TQM attributes. For the read path (AR_CFG), set RO on all three attributes (TDE, RQM, TQM). This allows the PCIe fabric to reorder these transactions for improved throughput. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260327204445.3074446-1-dimitri.daskalakis1@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-27eth: fbnic: Fix debugfs output for BDQ's with page fragsDimitri Daskalakis-1/+1
The rings size_mask represents the number of pages, so we need to determine the number of page frags when dumping the descriptors. Fixes: df04373b0dab ("eth fbnic: Add debugfs hooks for tx/rx rings") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260324195123.3486219-3-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27eth: fbnic: Account for page fragments when updating BDQ tailDimitri Daskalakis-3/+3
FBNIC supports fixed size buffers of 4K. When PAGE_SIZE > 4K, we fragment the page across multiple descriptors (FBNIC_BD_FRAG_COUNT). When refilling the BDQ, the correct number of entries are populated, but tail was only incremented by one. So on a system with 64K pages, HW would get one descriptor refilled for every 16 we populate. Additionally, we program the ring size in the HW when enabling the BDQ. This was not accounting for page fragments, so on systems with 64K pages, the HW used 1/16th of the ring. Fixes: 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260324195123.3486219-2-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10eth fbnic: Add mailbox self testMike Marciniszyn (Meta)-0/+142
The mailbox self test ensures the interface to and from the firmware is healthy by sending a test message and fielding the response from the firmware. This patch uses the new completion API [1][2] that allocates a completion structure, binds the completion to the TEST message, and uses a new FW parsing routine that wraps the completion processing around the TLV parser. Link: https://patch.msgid.link/20250516164804.741348-1-lee@trager.us [1] Link: https://patch.msgid.link/20260115003353.4150771-6-mohsin.bashr@gmail.com [2] Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-6-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10eth fbnic: TLV support for use by MBX self testMike Marciniszyn (Meta)-0/+303
The TLV (Type-Value-Length) self uses a known set of data to create a TLV message. These routines support the MBX self test by creating the test messages and parsing the response message coming back from the firmware. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-5-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10eth fbnic: Add msix self testMike Marciniszyn (Meta)-0/+214
This function is meant to test the global interrupt registers and the PCIe IP MSI-X functionality. It essentially goes through and tests various combinations of the set, clear, and mask bits in order to verify the behavior is as we expect it to be from the driver. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-4-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10eth fbnic: Add register self testMike Marciniszyn (Meta)-0/+197
The register test will be used to verify hardware is behaving as expected. The test itself will have us writing to registers that should have no side effects due to us resetting after the test has been completed. While the test is being run the interface should be offline. This patch counts on the first patch of this series to export netif_open() and also ensures that the half close calls netif_close() to avoid deadlock. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-3-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05eth: fbnic: Fetch TX pause storm statsMohsin Bashir-0/+20
With pause storm protection in place, track the occurrence of pause storm events. Since there is a one-to-one mapping between pause storm interrupts and events, use the interrupt count to track this metric. ./ethtool -I -a eth0 Pause parameters for eth0: Autonegotiate: off RX: off TX: on Statistics: tx_pause_frames: 759657 rx_pause_frames: 0 tx_pause_storm_events: 219 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260302230149.1580195-5-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05eth: fbnic: Add protection against pause stormMohsin Bashir-0/+186
Add protection against TX pause storms. A pause storm occurs when a device fails to send received packets up to the stack. When a pause storm is detected (pause state persists beyond the configured timeout), the device stops sending the pause frames and begins dropping packets instead of back-pressuring. The timeout is configurable via ethtool tunable (pfc-prevention-tout) with a maximum value of 10485ms, and the default value of 500ms. Once the device transitions to the storm-detected state, the service task periodically attempts recovery, returning the device to normal operation to handle any subsequent pause storm episodes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260302230149.1580195-4-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-21Convert 'alloc_flex' family to use the new default GFP_KERNEL argumentLinus Torvalds-1/+1
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook-1/+1
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-19eth: fbnic: Advertise supported XDP features.Dimitri Daskalakis-0/+2
Drivers are supposed to advertise the XDP features they support. This was missed while adding XDP support. Before: $ ynl --family netdev --dump dev-get ... {'ifindex': 3, 'xdp-features': set(), 'xdp-rx-metadata-features': set(), 'xsk-features': set()}, ... After: $ ynl --family netdev --dump dev-get ... {'ifindex': 3, 'xdp-features': {'basic', 'rx-sg'}, 'xdp-rx-metadata-features': set(), 'xsk-features': set()}, ... Fixes: 168deb7b31b2 ("eth: fbnic: Add support for XDP_TX action") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260218030620.3329608-1-dimitri.daskalakis1@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-18eth: fbnic: Add validation for MTU changesDimitri Daskalakis-0/+18
Increasing the MTU beyond the HDS threshold causes the hardware to fragment packets across multiple buffers. If a single-buffer XDP program is attached, the driver will drop all multi-frag frames. While we can't prevent a remote sender from sending non-TCP packets larger than the MTU, this will prevent users from inadvertently breaking new TCP streams. Traditionally, drivers supported XDP with MTU less than 4Kb (packet per page). Fbnic currently prevents attaching XDP when MTU is too high. But it does not prevent increasing MTU after XDP is attached. Fixes: 1b0a3950dbd4 ("eth: fbnic: Add XDP pass, drop, abort support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2026-02-17eth: fbnic: set DMA_HINT_L4 for all flowsBobby Eshleman-3/+5
fbnic always advertises ETHTOOL_TCP_DATA_SPLIT_ENABLED via ethtool .get_ringparam. To enable proper splitting for all flow types, even for IP/Ethernet flows, this patch sets DMA_HINT_L4 unconditionally for all RSS and NFC flow steering rules. According to the spec, L4 falls back to L3 if no valid L4 is found, and L3 falls back to L2 if no L3 is found. This makes sure that the correct header boundary is used regardless of traffic type. This is important for zero-copy use cases where we must ensure that all ZC packets are split correctly. Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-3-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-17eth: fbnic: increase FBNIC_HDR_BYTES_MIN from 128 to 256 bytesBobby Eshleman-1/+1
Increase FBNIC_HDR_BYTES_MIN from 128 to 256 bytes. The previous minimum was too small to guarantee that very long L2+L3+L4 headers always fit within the header buffer. When EN_HDR_SPLIT is disabled and a packet exceeds MAX_HEADER_BYTES, splitting occurs at that byte offset instead of the header boundary, resulting in some of the header landing in the payload page. The increased minimum ensures headers always fit with the MAX_HEADER_BYTES cut off and land in the header page. Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-2-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-17eth: fbnic: set FBNIC_QUEUE_RDE_CTL0_EN_HDR_SPLIT on RDE_CTL0Bobby Eshleman-11/+14
Fix EN_HDR_SPLIT configuration by writing the field to RDE_CTL0 instead of RDE_CTL1. Because drop mode configuration and header splitting enablement both use RDE_CTL0, we consolidate these configurations into the single function fbnic_config_drop_mode. Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-1-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-13fbnic: close fw_log race between users and teardownChengfeng Ye-10/+12
Fixes a theoretical race on fw_log between the teardown path and fw_log write functions. fw_log is written inside fbnic_fw_log_write() and can be reached from the mailbox handler fbnic_fw_msix_intr(), but fw_log is freed before IRQ/MBX teardown during cleanup, resulting in a potential data race of dereferencing a freed/null variable. Possible Interleaving Scenario: CPU0: fbnic_fw_msix_intr() // Entry fbnic_fw_log_write() if (fbnic_fw_log_ready()) // true ... preempt ... CPU1: fbnic_remove() // Entry fbnic_fw_log_free() vfree(log->data_start); log->data_start = NULL; CPU0: continues, walks log->entries or writes to log->data_start The initialization also has an incorrect order problem, as the fw_log is currently allocated after MBX setup during initialization. Fix the problems by adjusting the synchronization order to put initialization in place before the mailbox is enabled, and not cleared until after the mailbox has been disabled. Fixes: ecc53b1b46c89 ("eth: fbnic: Enable firmware logging") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Link: https://patch.msgid.link/20260211191329.530886-1-dg573847474@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-30eth fbnic: Add debugfs hooks for tx/rx ringsMike Marciniszyn (Meta)-1/+400
Add debugfs hooks to display tx/rx rings for each napi vector. Note that the cloning mechanism in fbnic_ethtool.c for configuration changes protects against concurrency issues with simultaneous config changes along with debugs ring accesses. The configuration switch builds up the new configuration offline, takes the current config down, which removes the debugfs nv files, and switches to the new configuration. The new configuration is brought up which brings the debugfs files back on top of the new configuration rings. The interaction with fbnic_queue_stop() and fbnic_queue_start() will similarly delete and add the files for the indicated vector. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260127200644.11640-3-mike.marciniszyn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-30eth fbnic: Add debugfs hooks for firmware mailboxMike Marciniszyn (Meta)-1/+50
This patch adds reporting the Rx and Tx information interfacing with the firmware. The result of reading fbnic/fw_mbx is: Rx Rdy: 1 Head: 11 Tail: 10 Idx Len E Addr F H Raw ---------------------------------- 00 4096 0 000101fea000 0 1 1000000101fea001 01 4096 0 000101feb000 0 1 1000000101feb001 . . . 15 4096 0 000101fe9000 0 1 1000000101fe9001 Tx Rdy: 1 Head: 4 Tail: 4 Idx Len E Addr F H Raw ---------------------------------- 00 0004 1 00010321b000 1 1 000440010321b003 01 0004 1 00010228d000 1 1 000440010228d003 . . . 15 0004 1 00010321b000 1 1 000440010321b003 Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260127200644.11640-2-mike.marciniszyn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-23net: fbnic: convert to use .get_rx_ring_countBreno Leitao-4/+8
Use the newly introduced .get_rx_ring_count ethtool ops callback instead of handling ETHTOOL_GRXRINGS directly in .get_rxnfc(). Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Link: https://patch.msgid.link/20260122-grxring_big_v4-v2-5-94dbe4dcaa10@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Update RX mbox timeout valueMohsin Bashir-5/+14
While waiting for completions on read requests, driver is using different timeout values for different messages. Make use of a single timeout value. Introduce a wrapper function to handle the wait, which also simplify maintaining the 80 char line limit. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-6-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Remove retry supportMohsin Bashir-19/+5
The driver retries sensor read requests from firmware, but this is unnecessary. A functioning firmware should respond to each request within the timeout period. Remove the retry logic and set the timeout to the sum of all retry timeouts. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-5-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Reuse RX mailbox pagesMohsin Bashir-9/+16
Currently, the RX mailbox frees and reallocates a page for each received message. Since FW Rx messages are processed synchronously, and nothing hold these pages (unlike skbs which we hand over to the stack), reuse the pages and put them back on the Rx ring. Now that we ensure the ring is always fully populated we don't have to worry about filling it up after partial population during init, either. Update fbnic_mbx_process_rx_msgs() to recycle pages after message processing. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-4-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Allocate all pages for RX mailboxMohsin Bashir-5/+9
Now that memory is allocated with GFP_KERNEL, allocation failures should be extremely rare. Ensure the FW communication ring is always fully populated with free pages, and hard fail initialization otherwise. This enables simplifications in next patches. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-3-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Use GFP_KERNEL to allocting mbx pagesMohsin Bashir-2/+1
Replace GFP_ATOMIC with GFP_KERNEL for mailbox RX page allocation. Since interrupt handler is threaded GFP_KERNEL is a safe option to reduce allocation failures. Also remove __GFP_NOWARN so the kernel reports a warning on allocation failure to aid debugging. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-2-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14net: add bare bone queue configsPavel Begunkov-2/+6
We'll need to pass extra parameters when allocating a queue for memory providers. Define a new structure for queue configurations, and pass it to qapi callbacks. It's empty for now, actual parameters will be added in following patches. Configurations should persist across resets, and for that they're default-initialised on device registration and stored in struct netdev_rx_queue. We also add a new qapi callback for defaulting a given config. It must be implemented if a driver wants to use queue configs and is optional otherwise. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2025-12-04Merge tag 'pci-v6.19-changes' of ↵Linus Torvalds-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan Williams) - Switch vmd from custom domain number allocator to the common allocator to prevent a potential race with new non-VMD buses (Dan Williams) - Enable Precision Time Measurement (PTM) only if device advertises support for a relevant role, to prevent invalid PTM Requests that cause ACS violations that are reported as AER Uncorrectable Non-Fatal errors (Mika Westerberg) Resource management: - Prevent resource tree corruption when BAR resize fails (Ilpo Järvinen) - Restore BARs to the original size if a BAR resize fails (Ilpo Järvinen) - Remove BAR release from BAR resize attempts by the xe, i915, and amdgpu drivers so the PCI core can restore BARs if the resize fails (Ilpo Järvinen) - Move Resizable BAR code to rebar.c (Ilpo Järvinen) - Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo Järvinen) - Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo Järvinen) Power management and error handling: - For drivers using PCI legacy suspend, save config state at suspend so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - For devices with no driver or a driver that lacks power management, save config state at hibernate so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - Save device config space on device addition, before driver binding, so error recovery works more reliably (Lukas Wunner) - Drop pci_save_state() from several drivers that no longer need it since the PCI core always does it and pci_restore_state() no longer invalidates the saved state (Lukas Wunner) - Document use of pci_save_state() by drivers to capture the state they want restored during error recovery (Lukas Wunner) Power control: - Add a struct pci_ops.assert_perst() function pointer to assert/deassert PCIe PERST# and implement it for the qcom driver (Krishna Chaitanya Chundru) - Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe switch, which must be held in reset after poweron so the pwrctrl driver can configure the switch via I2C before bringing up the links (Krishna Chaitanya Chundru) Endpoint framework: - Convert the endpoint doorbell test to use a threaded IRQ to fix a 'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri) - Add endpoint VNTB MSI doorbell support to reduce latency between host and endpoint (Frank Li) New native PCIe controller drivers: - Add CIX Sky1 host controller DT binding and driver (Hans Zhang) - Add NXP S32G host controller DT binding and driver (Vincent Guittot) - Add Renesas RZ/G3S host controller DT binding and driver (Claudiu Beznea) - Add SpacemiT K1 host controller DT binding and driver (Alex Elder) Amlogic Meson PCIe controller driver: - Update DT binding to name DBI region 'dbi', not 'elbi', and update driver to support both (Manivannan Sadhasivam) Apple PCIe controller driver: - Move struct pci_host_bridge allocation from pci_host_common_init() to callers, which significantly simplifies pcie-apple (Marc Zyngier) Broadcom STB PCIe controller driver: - Disable advertising ASPM L0s support correctly (Jim Quinlan) - Add a panic/die handler to print diagnostic info in case PCIe caused an unrecoverable abort (Jim Quinlan) Cadence PCIe controller driver: - Add module support for Cadence platform host and endpoint controller driver (Manikandan K Pillai) - Split headers into 'legacy' (LGA) and 'high perf' (HPA) to prepare for new CIX Sky1 driver (Manikandan K Pillai) MediaTek PCIe controller driver: - Convert DT binding to YAML schema (Christian Marangi) - Add Airoha AN7583 DT compatible and driver support (Christian Marangi) Qualcomm PCIe controller driver: - Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu) - Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280, sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT schemas (Krzysztof Kozlowski) - Look up OPP using both frequency and data rate (not just frequency) so RPMh votes can account for both (Krishna Chaitanya Chundru) Rockchip DesignWare PCIe controller driver: - Add Rockchip RK3528 compatible strings in DT binding (Yao Zi) STMicroelectronics STM32MP25 PCIe controller driver: - Fix a race between link training and endpoint register initialization (Christian Bruel) - Align endpoint allocations to match the ATU requirements (Christian Bruel) Synopsys DesignWare PCIe controller driver: - Clear L1 PM Substate Capability 'Supported' bits unless glue driver says it's supported, which prevents users from enabling non-working L1SS. Currently only qcom and tegra194 support L1SS (Bjorn Helgaas) - Remove now-superfluous L1SS disable code from tegra194 (Bjorn Helgaas) - Configure L1SS support in dw-rockchip when DT says 'supports-clkreq' (Shawn Lin) TI Keystone PCIe controller driver: - Fail the probe instead of silently succeeding if ks_pcie_of_data didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli) - Make keystone buildable as a loadable module, except on ARM32 where hook_fault_code() is __init (Siddharth Vadapalli)" * tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (100 commits) MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer PCI: sky1: Add PCIe host support for CIX Sky1 dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings PCI: cadence: Add support for High Perf Architecture (HPA) controller MAINTAINERS: Add NXP S32G PCIe controller driver maintainer PCI: s32g: Add NXP S32G PCIe controller driver (RC) PCI: dwc: Add register and bitfield definitions dt-bindings: PCI: s32g: Add NXP S32G PCIe controller PCI: Add Renesas RZ/G3S host controller driver PCI: host-generic: Move bridge allocation outside of pci_host_common_init() dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding PCI: Validate pci_rebar_size_supported() input Documentation: PCI: Amend error recovery doc with pci_save_state() rules treewide: Drop pci_save_state() after pci_restore_state() PCI/ERR: Ensure error recoverability at all times PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths PCI: dw-rockchip: Configure L1SS support PCI: tegra194: Remove unnecessary L1SS disable code ...
2025-11-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski-1/+1
Conflicts: net/xdp/xsk.c 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number") 8da7bea7db69 ("xsk: add indirect call for xsk_destruct_skb") 30ed05adca4a ("xsk: use a smaller new lock for shared pool case") https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27fbnic: Replace use of internal PCS w/ Designware XPCSAlexander Duyck-63/+55
As we have exposed the PCS registers via the SWMII we can now start looking at connecting the XPCS driver to those registers and let it mange the PCS instead of us doing it directly from the fbnic driver. For now this just gets us the ability to detect link. The hope is in the future to add some of the vendor specific registers to begin enabling XPCS configuration of the interface. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374325295.959489.14521115864034905277.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27fbnic: Add SW shim for MDIO interface to PMD and PCSAlexander Duyck-0/+205
In order for us to support a PCS device we need to add an MDIO bus to allow the drivers to have access to the registers for the device. This change adds such an interface. The interface will consist of 2 PHY addrs, the first one consisting of a PMD and PCS, and the second just being a PCS. There is a need for 2 PHYs addrs due to the fact that in order to support the 50GBase-CR2 mode we will need to access and configure the PCS vendor registers and RSFEC registers from the second lane identical to the first. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374324532.959489.15389723111560978054.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27fbnic: Add handler for reporting link down event statisticsAlexander Duyck-0/+9
We were previously not displaying the number of link_down_events tracked by the device. With this change we should now be able to display the value. The value itself tracks the calls from the phylink interface to the mac_link_down call. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374323824.959489.6915296616773178954.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27fbnic: Add logic to track PMD state via MAC/PCS signalsAlexander Duyck-52/+175
One complication with the design of our part is that the PMD doesn't provide a direct signal to the host. Instead we have visibility to signals that the PCS provides to the MAC that allow us to check the link state through that. We will need to account for several things in the PMD and firmware when managing the link. Specifically when the link first starts to come up the PMD will cause the link to flap. This is due to the firmware starting a training cycle when the link is first detected. This will cause link flapping if we were to immediately report link up when the PCS first detects it. To address that we are adding a pmd_state variable that is meant to be a countdown of sorts indicating the state of the PMD. If the link is down or has been reconfigured the PMD will start out in the initialize state. By default the link is assumed to be in the SEND_DATA state if it is available on initial link inspection. If link is detected while in the initialize state the PMD state will switch to training, and if after 4 seconds the link is still stable we will transition to link_ready, and finally the send_data state. With this we can avoid link flapping when a cable is first connected to the NIC. One side effect of this is that we need to pull the link state away from the PCS. For now we use a union of the PCS link state register value and the pmd_state. The plan is to add a PMD register to report the pmd_state to the phylink interface. With that we can then look at switching over to the use of the XPCS driver for fbnic instead of having an internal one. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374323107.959489.14951134213387615059.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27fbnic: Rename PCS IRQ to MAC IRQ as it is actually a MAC interruptAlexander Duyck-32/+32
Throughout several spots in the code I had called out the IRQ as being related to the PCS. However the actual IRQ is a part of the MAC and it is just exposing PCS data. To more accurately reflect the owner of the calls this change makes it so that we rename the functions and values that are taking in the interrupt value and processing it to reflect that it is a MAC call and not a PCS one. This change is mostly motivated by the fact that we will be moving the handling of this interrupt from being PCS focused to being more PMA/PMD focused as this will drive the phydev driver that I am adding instead of driving the PCS directly. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374322373.959489.12018231545479053860.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-26eth: fbnic: Fix counter roll-over issueMohsin Bashir-1/+1
Fix a potential counter roll-over issue in fbnic_mbx_alloc_rx_msgs() when calculating descriptor slots. The issue occurs when head - tail results in a large positive value (unsigned) and the compiler interprets head - tail - 1 as a signed value. Since FBNIC_IPC_MBX_DESC_LEN is a power of two, use a masking operation, which is a common way of avoiding this problem when dealing with these sort of ring space calculations. Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20251125211704.3222413-1-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25drivers: net: fbnic: Return the true error in fbnic_alloc_napi_vectors.Dimitri Daskalakis-1/+1
The error case in fbnic_alloc_napi_vectors defaulted to returning ENOMEM. This can mask the true error case, causing confusion. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Link: https://patch.msgid.link/20251124200518.1848029-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24treewide: Drop pci_save_state() after pci_restore_state()Lukas Wunner-1/+0
In 2009, commit c82f63e411f1 ("PCI: check saved state before restore") changed the behavior of pci_restore_state() such that it became necessary to call pci_save_state() afterwards, lest recovery from subsequent PCI errors fails. The commit has just been reverted and so all the pci_save_state() after pci_restore_state() calls that have accumulated in the tree are now superfluous. Drop them. Two drivers chose a different approach to achieve the same result: drivers/scsi/ipr.c and drivers/net/ethernet/intel/e1000e/netdev.c set the pci_dev's "state_saved" flag to true before calling pci_restore_state(). Drop this as well. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> # qat Link: https://patch.msgid.link/c2b28cc4defa1b743cf1dedee23c455be98b397a.1760274044.git.lukas@wunner.de
2025-11-20eth: fbnic: access @pp through netmem_desc instead of pageByungchul Park-1/+2
To eliminate the use of struct page in page pool, the page pool users should use netmem descriptor and APIs instead. Make fbnic access @pp through netmem_desc instead of page. Signed-off-by: Byungchul Park <byungchul@sk.com> Link: https://patch.msgid.link/20251120011118.73253-1-byungchul@sk.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17eth: fbnic: Configure RDE settings for pause frameMohsin Bashir-4/+28
fbnic supports pause frames. When pause frames are enabled presumably user expects lossless operation from the NIC. Make sure we configure RDE (Rx DMA Engine) to DROP_NEVER mode to avoid discards due to delays in fetching Rx descriptors from the host. While at it enable DROP_NEVER when NIC only has a single queue configured. In this case the NIC acts as a FIFO so there's no risk of head-of-line blocking other queues by making RDE wait. If pause is disabled this just moves the packet loss from the DMA engine to the Rx buffer. Remove redundant call to fbnic_config_drop_mode_rcq(), introduced by commit 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free"). This call does not add value as fbnic_enable_rcq(), which is called immediately afterward, already handles this. Although we do not support autoneg at this time, preserve tx_pause in .mac_link_up instead of fbnic_phylink_get_pauseparam() Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251113232610.1151712-1-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-22eth: fbnic: fix integer overflow warning in TLV_MAX_DATA definitionPei Xiao-1/+1
The TLV_MAX_DATA macro calculates (PAGE_SIZE - 512) which can exceed the maximum value of a 16-bit unsigned integer on architectures with large page sizes, causing compiler warnings: drivers/net/ethernet/meta/fbnic/fbnic_tlv.h:83:24: warning: conversion from 'long unsigned int' to 'short unsigned int' changes value from '261632' to '65024' [-Woverflow] Fix this by explicitly masking the result to 16 bits using bitwise AND with 0xFFFF, ensuring the value fits within the expected data type while maintaining the intended behavior for normal page sizes. This preserves the existing functionality while eliminating the compiler warning and potential undefined behavior from integer truncation. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510190832.3SQkTCHe-lkp@intel.com/ Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Link: https://patch.msgid.link/182b9d0235d044d69d7a57c1296cc6f46e395beb.1761039651.git.xiaopei01@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-16net: fbnic: Allow builds for all 64 bit architecturesDimitri Daskalakis-1/+1
This enables aarch64 testing, but there's no reason we cannot support other architectures. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251013211449.1377054-3-dimitri.daskalakis1@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-16net: fbnic: Fix page chunking logic when PAGE_SIZE > 4KDimitri Daskalakis-0/+1
The HW always works on a 4K page size. When the OS supports larger pages, we fragment them across multiple BDQ descriptors. We were not properly incrementing the descriptor, which resulted in us specifying the last chunks id/addr and then 15 zero descriptors. This would cause packet loss and driver crashes. This is not a fix since the Kconfig prevents use outside of x86. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251013211449.1377054-2-dimitri.daskalakis1@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-14eth: fbnic: fix various typos in comments and stringsAlok Tiwari-7/+7
Fix several minor typos and grammatical errors in comments and log (in fbnic firmware, PCI, and time modules) Changes include: - "cordeump" -> "coredump" - "of" -> "off" in RPC config comment - "healty" -> "healthy" in firmware heartbeat comment - "Firmware crashed detected!" -> "Firmware crash detected!" - "The could be caused" -> "This could be caused" - "lockng" -> "locking" in fbnic_time.c Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251013160507.768820-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-09eth: fbnic: fix reporting of alloc_failed qstatsJakub Kicinski-11/+58
Rx processing under normal circumstances has 3 rings - 2 buffer rings (heads, payloads) and a completion ring. All the rings have a struct fbnic_ring. Make sure we expose alloc_failed counter from the buffer rings, previously only the alloc_failed from the completion ring was reported, even tho all ring types may increment this counter (buffer rings in __fbnic_fill_bdq()). This makes the pp_alloc_fail.py test pass, it expects the qstat to be incrementing as page pool injections happen. Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 67dc4eb5fc92 ("eth: fbnic: report software Rx queue stats") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251007232653.2099376-7-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-09eth: fbnic: fix saving stats from XDP_TX rings on closeJakub Kicinski-6/+6
When rings are freed - stats get added to the device level stat structs. Save the stats from the XDP_TX ring just as Tx stats. Previously they would be saved to Rx and Tx stats. So we'd not see XDP_TX packets as Rx during runtime but after an down/up cycle the packets would appear in stats. Correct the helper used by ethtool code which does a runtime config switch. Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 5213ff086344 ("eth: fbnic: Collect packet statistics for XDP") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251007232653.2099376-4-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-09eth: fbnic: fix accounting of XDP packetsJakub Kicinski-15/+15
Make XDP-handled packets appear in the Rx stats. The driver has been counting XDP_TX packets on the Tx ring, but there wasn't much accounting on the Rx side (the Rx bytes appear to be incremented on XDP_TX but XDP_DROP / XDP_ABORT are only counted as Rx drops). Counting XDP_TX packets (not just bytes) in Rx stats looks like a simple bug of omission. The XDP_DROP handling appears to be intentional. Whether XDP_DROP packets should be counted in interface-level Rx stats is a bit unclear historically. When we were defining qstats, however, we clarified based on operational experience that in this context: name: rx-packets doc: | Number of wire packets successfully received and passed to the stack. For drivers supporting XDP, XDP is considered the first layer of the stack, so packets consumed by XDP are still counted here. fbnic does not obey this requirement. Since XDP support has been added in current release cycle, instead of splitting interface and qstat handling - make them both follow the qstat definition. Another small tweak here is that we count bytes as received on the wire rather than post-XDP bytes (xdp_get_buff_len() vs skb->len). Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 5213ff086344 ("eth: fbnic: Collect packet statistics for XDP") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251007232653.2099376-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-09eth: fbnic: fix missing programming of the default descriptorJakub Kicinski-0/+8
XDP_TX typically uses no offloads. To optimize XDP we added a "default descriptor" feature to the chip, which allows us to send XDP frames with just the buffer descriptors (DMA address + length). All the metadata descriptors are derived from the queue config. Commit under Fixes missed adding setting the defaults up when transplanting the code from the prototype driver. Importantly after reset the "request completion" bit is not set. Packets still get sent but there's no completion, so ring is not cleaned up. We can send one ring's worth of packets and then will start dropping all frames that got the XDP_TX action from the XDP prog. Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 168deb7b31b2 ("eth: fbnic: Add support for XDP_TX action") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251007232653.2099376-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>