aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-09-11RDMA/ionic: Implement device stats opsAbhijit Gangurde4-0/+554
Implement device stats operations for hw stats and qp stats. Co-developed-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://patch.msgid.link/20250903061606.4139957-14-abhijit.gangurde@amd.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/ionic: Register device ops for miscellaneous functionalityAbhijit Gangurde4-0/+217
Implement idbdev ops for device and port information. Co-developed-by: Andrew Boyer <andrew.boyer@amd.com> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com> Co-developed-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://patch.msgid.link/20250903061606.4139957-13-abhijit.gangurde@amd.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/ionic: Register device ops for datapathAbhijit Gangurde5-0/+1534
Implement device supported verb APIs for datapath. Co-developed-by: Andrew Boyer <andrew.boyer@amd.com> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com> Co-developed-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://patch.msgid.link/20250903061606.4139957-12-abhijit.gangurde@amd.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/ionic: Register device ops for control pathAbhijit Gangurde6-9/+3626
Implement device supported verb APIs for control path. Co-developed-by: Andrew Boyer <andrew.boyer@amd.com> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com> Co-developed-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://patch.msgid.link/20250903061606.4139957-11-abhijit.gangurde@amd.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/ionic: Create device queues to support admin operationsAbhijit Gangurde9-0/+2300
Setup RDMA admin queues using device command exposed over auxiliary device and manage these queues using ida. Co-developed-by: Andrew Boyer <andrew.boyer@amd.com> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com> Co-developed-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://patch.msgid.link/20250903061606.4139957-10-abhijit.gangurde@amd.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/ionic: Register auxiliary module for ionic ethernet adapterAbhijit Gangurde4-0/+314
Register auxiliary module to create ibdevice for ionic ethernet adapter. Co-developed-by: Andrew Boyer <andrew.boyer@amd.com> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com> Co-developed-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://patch.msgid.link/20250903061606.4139957-9-abhijit.gangurde@amd.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Call strscpy() with correct size argumentThorsten Blum1-2/+1
In bnxt_re_register_ib(), strscpy() is called with the length of the source string rather than the size of the destination buffer. This is fine as long as the destination buffer is larger than the source string, but we should still use the destination buffer size instead to call strscpy() as intended. And since 'node_desc' has a fixed size, we can safely omit the size argument and let strscpy() infer it using sizeof(). Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20250901150038.227036-2-thorsten.blum@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/core: fix "truely"->"truly"Xichao Zhao1-1/+1
Trivial fix to spelling mistake in comment text. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Link: https://patch.msgid.link/20250827120007.489496-1-zhao.xichao@vivo.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/rdmavt: Use int type to store negative error codesQianfeng Rong1-7/+6
Change 'ret' from u32 to int in alloc_qpn() to store -EINVAL, and remove the 'bail' label as it simply returns 'ret'. Storing negative error codes in an u32 causes no runtime issues, but it's ugly as pants, Change 'ret' from u32 to int type - this change has no runtime impact. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Link: https://patch.msgid.link/20250826150556.541440-1-rongqianfeng@vivo.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/mlx5: Fix page size bitmap calculation for KSM modeEdward Srouji1-0/+4
When using KSM (Key Scatter-gather Memory) access mode, the HW requires the IOVA to be aligned to the selected page size. Without this alignment, the HW may not function correctly. Currently, mlx5_umem_mkc_find_best_pgsz() does not filter out page sizes that would result in misaligned IOVAs for KSM mode. This can lead to selecting page sizes that are incompatible with the given IOVA. Fix this by filtering the page size bitmap when in KSM mode, keeping only page sizes to which the IOVA is aligned to. Fixes: fcfb03597b7d ("RDMA/mlx5: Align mkc page size capability check to PRM") Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20250824144839.154717-1-edwards@nvidia.com Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Remove unnecessary condition checksKalesh AP1-18/+1
The check for "rdev" and "en_dev" pointer validity always return false. Remove them. Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250822040801.776196-11-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Use firmware provided message timeout valueSaravanan Vajravel2-14/+22
Before this patch, we used a hardcoded value of 500 msec as the default value for L2 firmware message response timeout. With this commit, the driver is using the firmware timeout value from the firmware. As part of this change moved bnxt_re_query_hwrm_intf_version() to bnxt_re_setup_chip_ctx() so that timeout value is queries before sending first command. Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Co-developed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250822040801.776196-10-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Initialize fw with roce_mirror supportSaravanan Vajravel5-3/+18
- Check FW capability for roce_mirror support. - Initialize FW with roce_mirror support. - When modifying QP, use unique GID for sgid incase of RawEth QP. Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Anantha Prabhu <anantha.prabhu@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250822040801.776196-9-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Add support for flow create/destroySaravanan Vajravel7-1/+173
- Added support for create_flow and destroy_flow verbs. These verbs are used on RawEth QP to add a specific flow action. - To support TCP dump on RoCE, added IB_FLOW_ATTR_SNIFFER attribute. - In create_flow verb, driver allocates mirror_vnic and configure it with RawEth QP. Once this is done, driver will enable mirroring. - In destroy_flow, driver will disable mirroring and free the mirror vnic. Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250822040801.776196-8-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Add support for mirror vnicSaravanan Vajravel2-0/+71
Added below support: - Querying the pre-reserved mirror_vnic_id - Allocating/freeing mirror_vnic - Configuring mirror vnic to associate it with raw qp These functions will be used in the subsequent patch in this series. Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250822040801.776196-7-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Add support for unique GIDSaravanan Vajravel5-4/+105
- RawEth QP requires unique GID so that per function stats_ctx is not polluted by packets mirrored to RoCE vnic. - Added support to add unique GID when RawEth type QP is created. - Added support to destroy unique GID when RawEth type QP is destroyed. - Allocated exclusive stats_ctx to use for RawEth type QP. Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250822040801.776196-6-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Refactor stats context memory allocationKalesh AP3-25/+47
Moved the stats context allocation logic to a new function. The stats context memory allocation code has been moved from bnxt_qplib_alloc_hwctx() to the newly added bnxt_re_get_stats_ctx() function. Also, the code to send the firmware command has been moved. This patch is in preparation for other patches in this series. There is no functional changes intended. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250822040801.776196-5-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Refactor hw context memory allocationKalesh AP3-24/+20
This patch is in preparation for other patches in this series. There is no functional changes intended. 1. Rename bnxt_qplib_alloc_ctx() to bnxt_qplib_alloc_hwctx(). 2. Rename bnxt_qplib_free_ctx() to bnxt_qplib_free_hwctx(). 3. Reduce the number of arguments of bnxt_qplib_alloc_hwctx() by moving a check outside of it. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250822040801.776196-4-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Add data structures for RoCE mirror supportSaravanan Vajravel5-0/+18
Added data structures required for supporting mirroring on RoCE device. Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250822040801.776196-3-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-11RDMA/bnxt_re: Enhance a log message when bnxt_re_register_netdev failsKalesh AP1-2/+3
Make a error log message more user friendly. When bnxt_re_register_netdev()() fails, the current log does not convey much information. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250814112555.221665-10-kalesh-anakkur.purayil@broadcom.com Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-08RDMA/bnxt_re: Delete always true SGID table checkKalesh AP1-1/+1
The "sgid_tbl" inside "rdev->qplib_res" is a static memory. Hence, the check always return true. Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250814112555.221665-9-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-08RDMA/bnxt_re: Report udp source port for flow_label in bnxt_re_query_qpAbhishek Mohapatra4-2/+5
The firmware doesn't capture the flow_label. Therefore the value that's always returned by qplib_qp->ah.flow_label is 0 whenever a qp is created. And as per IB spec, udp source port can be reported for flow_label. Hence reported udp source port for flow_label in bnxt_re_query_qp by populating the value of qplib_qp->udp_sport into qp_attr->ah_attr.grh.flow_label. Signed-off-by: Abhishek Mohapatra <abhishek.mohapatra@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Link: https://patch.msgid.link/20250814112555.221665-8-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-08RDMA/bnxt_re: RoCE related hardware counters updateVasuthevan Maheswaran2-17/+37
Support for new hardware counters added, and existing hardware counters have been modified according to the design documents for compatibility with open-source monitoring agents. Signed-off-by: Vasuthevan Maheswaran <vasuthevan.maheswaran@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250814112555.221665-7-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-08RDMA/bnxt_re: Optimize bnxt_qplib_get_dev_attr functionDamodharam Ammepalli3-8/+9
Optimize bnxt_qplib_get_dev_attr() by separating out query_version which uses creq notification method to host. Due to serialization of cmdq by firmware, expected latency in response to heavy multi-threaded rdma applications might be observed. This patch separates the version_query logic out of device attribute query and called only during rdma driver init. Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250814112555.221665-6-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-08RDMA/bnxt_re: RoCE Driver Dynamic Debug for HWRM'sChenna Arnoori1-0/+2
Add Linux kernel dynamic debug prints to ROCE HWRM's. Dumping request and response buffers for the ROCE HWRM's using print_hex_dump_bytes() to be part of kernel dynmic debug. Signed-off-by: Chenna Arnoori <chenna.arnoori@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Link: https://patch.msgid.link/20250814112555.221665-4-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-08RDMA/bnxt_re: Show srq_limit in fill_res_srq_entry hookKashyap Desai1-0/+2
Added srq_limit in rdma show resource srq hook. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250814112555.221665-3-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski11-66/+55
Cross-merge networking fixes after downstream PR (net-6.17-rc4). No conflicts. Adjacent changes: drivers/net/ethernet/intel/idpf/idpf_txrx.c 02614eee26fb ("idpf: do not linearize big TSO packets") 6c4e68480238 ("idpf: remove obsolete stashing code") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-25RDMA/erdma: Use vcalloc() instead of vzalloc()Qianfeng Rong1-1/+1
Replace vzalloc() with vcalloc() in vmalloc_to_dma_addrs(). As noted in the kernel documentation [1], open-coded multiplication in allocator arguments is discouraged because it can lead to integer overflow. Use vcalloc() to gain built-in overflow protection, making memory allocation safer when calculating allocation size compared to explicit multiplication. [1]: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Link: https://patch.msgid.link/r/20250821072209.510348-1-rongqianfeng@vivo.com Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-08-25IB/mlx5: Fix obj_type mismatch for SRQ event subscriptionsOr Har-Toov1-0/+1
Fix a bug where the driver's event subscription logic for SRQ-related events incorrectly sets obj_type for RMP objects. When subscribing to SRQ events, get_legacy_obj_type() did not handle the MLX5_CMD_OP_CREATE_RMP case, which caused obj_type to be 0 (default). This led to a mismatch between the obj_type used during subscription (0) and the value used during notification (1, taken from the event's type field). As a result, event mapping for SRQ objects could fail and event notification would not be delivered correctly. This fix adds handling for MLX5_CMD_OP_CREATE_RMP in get_legacy_obj_type, returning MLX5_EVENT_QUEUE_TYPE_RQ so obj_type is consistent between subscription and notification. Fixes: 759738537142 ("IB/mlx5: Enable subscription for device events over DEVX") Link: https://patch.msgid.link/r/8f1048e3fdd1fde6b90607ce0ed251afaf8a148c.1755088962.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-08-25RDMA/mlx5: Fix vport loopback forcing for MPV devicePatrisious Haddad2-4/+16
Previously loopback for MPV was supposed to be permanently enabled, however other driver flows were able to over-ride that configuration and disable it. Add force_lb parameter that indicates that loopback should always be enabled which prevents all other driver flows from disabling it. Fixes: a9a9e68954f2 ("RDMA/mlx5: Fix vport loopback for MPV device") Link: https://patch.msgid.link/r/cfc6b1f0f99f8100b087483cc14da6025317f901.1755088808.git.leon@kernel.org Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-08-25RDMA/mlx5: Better estimate max_qp_wr to reflect WQE countOr Har-Toov1-1/+47
The mlx5 driver currently derives max_qp_wr directly from the log_max_qp_sz HCA capability: props->max_qp_wr = 1 << MLX5_CAP_GEN(mdev, log_max_qp_sz); However, this value represents the number of WQEs in units of Basic Blocks (see MLX5_SEND_WQE_BB), not actual number of WQEs. Since the size of a WQE can vary depending on transport type and features (e.g., atomic operations, UMR, LSO), the actual number of WQEs can be significantly smaller than the WQEBB count suggests. This patch introduces a conservative estimation of the worst-case WQE size — considering largest segments possible with 1 SGE and no inline data or special features. It uses this to derive a more accurate max_qp_wr value. Fixes: 938fe83c8dcb ("net/mlx5_core: New device capabilities handling") Link: https://patch.msgid.link/r/7d992c9831c997ed5c33d30973406dc2dcaf5e89.1755088725.git.leon@kernel.org Reported-by: Chuck Lever <cel@kernel.org> Closes: https://lore.kernel.org/all/20250506142202.GJ2260621@ziepe.ca/ Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-08-25RDMA/mlx5: Enable Data-Direct with Relaxed OrderingYishai Hadas4-7/+41
Relaxed Ordering can improve performance in certain scenarios. Enable it in the Data-Direct use case as well. Link: https://patch.msgid.link/r/1221dcdda8061ba5f6bc3519044083c7438b257e.1755088503.git.leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Gal Shalom <galshalom@Nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-08-25RDMA/efa: Extend admin timeout error printMichael Margolin1-5/+9
Add command id to the printed message for additional debug information. Link: https://patch.msgid.link/r/20250703182314.16442-1-mrgolin@amazon.com Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-08-15{rdma,net}/mlx5: export mlx5_vport_get_vhca_idSaeed Mahameed1-23/+4
vhca id is already cached in the vport structure no need to query on every mlx5 layer, use the mlx5_vport_get_vhca_id, where possible. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
2025-08-13RDMA/hns: Fix dip entries leak on devices newer than hip09Junxian Huang1-1/+1
DIP algorithm is also supported on devices newer than hip09, so free dip entries too. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250812122602.3524602-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13IB/hfi1: Use for_each_online_cpu() instead of for_each_cpu()Fushuai Wang1-1/+1
Replace the opencoded for_each_cpu(cpu, cpu_online_mask) loop with the more readable and equivalent for_each_online_cpu(cpu) macro. Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Link: https://patch.msgid.link/20250811062534.1041-1-wangfushuai@baidu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/core: Free pfn_list with appropriate kvfree callAkhilesh Patil1-2/+2
Ensure that pfn_list allocated by kvcalloc() is freed using corresponding kvfree() function. Match memory allocation and free routines kvcalloc -> kvfree. Fixes: 259e9bd07c57 ("RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices") Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in> Link: https://patch.msgid.link/aJjcPjL1BVh8QrMN@bhairav-test.ee.iitb.ac.in Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/bnxt_re: Fix to initialize the PBL arrayAnantha Prabhu1-0/+2
memset the PBL page pointer and page map arrays before populating the SGL addresses of the HWQ. Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Signed-off-by: Anantha Prabhu <anantha.prabhu@broadcom.com> Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250805101000.233310-5-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/bnxt_re: Fix a possible memory leak in the driverKalesh AP1-0/+23
The GID context reuse logic requires the context memory to be not freed if and when DEL_GID firmware command fails. But, if there's no subsequent ADD_GID to reuse it, the context memory must be freed when the driver is unloaded. Otherwise it leads to a memory leak. Below is the kmemleak trace reported: unreferenced object 0xffff88817a4f34d0 (size 8): comm "insmod", pid 1072504, jiffies 4402561550 hex dump (first 8 bytes): 01 00 00 00 00 00 00 00 ........ backtrace (crc ccaa009e): __kmalloc_cache_noprof+0x33e/0x400 0xffffffffc2db9d48 add_modify_gid+0x5e0/0xb60 [ib_core] __ib_cache_gid_add+0x213/0x350 [ib_core] update_gid+0xf2/0x180 [ib_core] enum_netdev_ipv4_ips+0x3f3/0x690 [ib_core] enum_all_gids_of_dev_cb+0x125/0x1b0 [ib_core] ib_enum_roce_netdev+0x14b/0x250 [ib_core] ib_cache_setup_one+0x2e5/0x540 [ib_core] ib_register_device+0x82c/0xf10 [ib_core] 0xffffffffc2df5ad9 0xffffffffc2da8b07 0xffffffffc2db174d auxiliary_bus_probe+0xa5/0x120 really_probe+0x1e4/0x850 __driver_probe_device+0x18f/0x3d0 Fixes: 4a62c5e9e2e1 ("RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250805101000.233310-4-kalesh-anakkur.purayil@broadcom.com Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/bnxt_re: Fix to remove workload check in SRQ limit pathKashyap Desai3-35/+2
There should not be any checks of current workload to set srq_limit value to SRQ hw context. Remove all such workload checks and make a direct call to set srq_limit via doorbell SRQ_ARM. Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250805101000.233310-3-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/bnxt_re: Fix to do SRQ armena by defaultKashyap Desai1-2/+1
Whenever SRQ is created, make sure SRQ arm enable is always set. Driver is always ready to receive SRQ ASYNC event. Additional note - There is no need to do srq arm enable conditionally. See bnxt_qplib_armen_db in bnxt_qplib_create_cq(). Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Link: https://patch.msgid.link/20250805101000.233310-2-kalesh-anakkur.purayil@broadcom.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/hns: Fix querying wrong SCC context for DIP algorithmwenglianfa2-3/+10
When using DIP algorithm, all QPs establishing connections with the same destination IP share the same SCC, which is indexed by dip_idx, but dip_idx isn't necessarily equal to qpn. Therefore, dip_idx should be used to query SCC context instead of qpn. Fixes: 124a9fbe43aa ("RDMA/hns: Append SCC context to the raw dump of QPC") Signed-off-by: wenglianfa <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250726075345.846957-1-huangjunxian6@hisilicon.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/mana_ib: Drain send wrs of GSI QPKonstantin Taranov3-0/+32
Drain send WRs of the GSI QP on device removal. In rare servicing scenarios, the hardware may delete the state of the GSI QP, preventing it from generating CQEs for pending send WRs. Since WRs submitted to the GSI QP hold CM resources, the device cannot be removed until those WRs are completed. This patch marks all pending send WRs as failed, allowing the GSI QP to release the CM resources and enabling safe device removal. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1753779618-23629-1-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/erdma: Use dma_map_page to map scatter MTT bufferBoshi Yu2-43/+71
Each high-level indirect MTT entry is assumed to point to exactly one page of the low-level MTT buffer, but dma_map_sg may merge contiguous physical pages when mapping. To avoid extra overhead from splitting merged regions, use dma_map_page to map the scatter MTT buffer page by page. Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20250725055410.67520-2-boshiyu@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/erdma: Fix unset QPN of GSI QPBoshi Yu1-0/+2
The QPN of the GSI QP was not set, which may cause issues. Set the QPN to 1 when creating the GSI QP. Fixes: 999a0a2e9b87 ("RDMA/erdma: Support UD QPs and UD WRs") Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20250725055410.67520-4-boshiyu@linux.alibaba.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/erdma: Fix ignored return value of init_kernel_qpBoshi Yu1-1/+3
The init_kernel_qp interface may fail. Check its return value and free related resources properly when it does. Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation") Reviewed-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Boshi Yu <boshiyu@linux.alibaba.com> Link: https://patch.msgid.link/20250725055410.67520-3-boshiyu@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/rxe: Flush delayed SKBs while releasing RXE resourcesZhu Yanjun2-22/+9
When skb packets are sent out, these skb packets still depends on the rxe resources, for example, QP, sk, when these packets are destroyed. If these rxe resources are released when the skb packets are destroyed, the call traces will appear. To avoid skb packets hang too long time in some network devices, a timestamp is added when these skb packets are created. If these skb packets hang too long time in network devices, these network devices can free these skb packets to release rxe resources. Reported-by: syzbot+8425ccfb599521edb153@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8425ccfb599521edb153 Tested-by: syzbot+8425ccfb599521edb153@syzkaller.appspotmail.com Fixes: 1a633bdc8fd9 ("RDMA/rxe: Let destroy qp succeed with stuck packet") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250726013104.463570-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/ucma: Support write an event into a CMMark Zhang1-1/+51
Enable user-space to inject an event into a CM through it's event channel. Two new events are added and supported: RDMA_CM_EVENT_USER and RDMA_CM_EVENT_INTERNAL. With these 2 events a new event parameter "arg" is supported, which is passed from sender to receiver transparently. With this feature an application is able to write an event into a CM channel with a new user-space rdmacm API. For example thread T1 could write an event with the API: rdma_write_cm_event(cm_id, RDMA_CM_EVENT_USER, status, arg); and thread T2 could receive the event with rdma_get_cm_event(). Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/fdf49d0b17a45933c5d8c1d90605c9447d9a3c73.1751279794.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/ucma: Support query resolved service recordsMark Zhang1-0/+40
Enable user-space to query resolved service records through a ucma command when a RDMA_CM_EVENT_ADDRINFO_RESOLVED event is received. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/1090ee7c00c3f8058c4f9e7557de983504a16715.1751279794.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-08-13RDMA/cma: Support IB service record resolutionMark Zhang3-4/+166
Add new UCMA command and the corresponding CMA implementation. Userspace can send this command to request service resolution based on service name or ID. On a successful resolution, one or multiple service records are returned, the first one will be used as destination address by default. Two new CM events are added and returned to caller accordingly: - RDMA_CM_EVENT_ADDRINFO_RESOLVED: Resolve succeeded; - RDMA_CM_EVENT_ADDRINFO_ERROR: Resolve failed. Internally two new CM states are added: - RDMA_CM_ADDRINFO_QUERY: CM is in the process of IB service resolution; - RDMA_CM_ADDRINFO_RESOLVED: CM has finished the resolve process. With these new states, beside existing state transfer processes, 2 new processes are supported: 1. The default address is used: RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDRINFO_QUERY -> RDMA_CM_ADDRINFO_RESOLVED -> RDMA_CM_ROUTE_QUERY 2. To use a different address: RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDRINFO_QUERY-> RDMA_CM_ADDRINFO_RESOLVED -> RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_RESOLVED -> RDMA_CM_ROUTE_QUERY In the 2nd case, resolve_addrinfo returns multiple records, a user could call rdma_resolve_addr() with the one that is not the first. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Vlad Dumitrescu <vdumitrescu@nvidia.com> Link: https://patch.msgid.link/b6e82ad75522a13b5efe4ff86da0e465aab04cc2.1751279794.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>