aboutsummaryrefslogtreecommitdiffstats
path: root/drivers (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-09-25amd/amdkfd: enhance kfd process check in switch partitionYifan Zhang3-0/+16
current switch partition only check if kfd_processes_table is empty. kfd_prcesses_table entry is deleted in kfd_process_notifier_release, but kfd_process tear down is in kfd_process_wq_release. consider two processes: Process A (workqueue) -> kfd_process_wq_release -> Access kfd_node member Process B switch partition -> amdgpu_xcp_pre_partition_switch -> amdgpu_amdkfd_device_fini_sw -> kfd_node tear down. Process A and B may trigger a race as shown in dmesg log. This patch is to resolve the race by adding an atomic kfd_process counter kfd_processes_count, it increment as create kfd process, decrement as finish kfd_process_wq_release. v2: Put kfd_processes_count per kfd_dev, move decrement to kfd_process_destroy_pdds and bug fix. (Philip Yang) [3966658.307702] divide error: 0000 [#1] SMP NOPTI [3966658.350818] i10nm_edac [3966658.356318] CPU: 124 PID: 38435 Comm: kworker/124:0 Kdump: loaded Tainted [3966658.356890] Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu] [3966658.362839] nfit [3966658.366457] RIP: 0010:kfd_get_num_sdma_engines+0x17/0x40 [amdgpu] [3966658.366460] Code: 00 00 e9 ac 81 02 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48 8b 4f 08 48 8b b7 00 01 00 00 8b 81 58 26 03 00 99 <f7> be b8 01 00 00 80 b9 70 2e 00 00 00 74 0b 83 f8 02 ba 02 00 00 [3966658.380967] x86_pkg_temp_thermal [3966658.391529] RSP: 0018:ffffc900a0edfdd8 EFLAGS: 00010246 [3966658.391531] RAX: 0000000000000008 RBX: ffff8974e593b800 RCX: ffff888645900000 [3966658.391531] RDX: 0000000000000000 RSI: ffff888129154400 RDI: ffff888129151c00 [3966658.391532] RBP: ffff8883ad79d400 R08: 0000000000000000 R09: ffff8890d2750af4 [3966658.391532] R10: 0000000000000018 R11: 0000000000000018 R12: 0000000000000000 [3966658.391533] R13: ffff8883ad79d400 R14: ffffe87ff662ba00 R15: ffff8974e593b800 [3966658.391533] FS: 0000000000000000(0000) GS:ffff88fe7f600000(0000) knlGS:0000000000000000 [3966658.391534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [3966658.391534] CR2: 0000000000d71000 CR3: 000000dd0e970004 CR4: 0000000002770ee0 [3966658.391535] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [3966658.391535] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [3966658.391536] PKRU: 55555554 [3966658.391536] Call Trace: [3966658.391674] deallocate_sdma_queue+0x38/0xa0 [amdgpu] [3966658.391762] process_termination_cpsch+0x1ed/0x480 [amdgpu] [3966658.399754] intel_powerclamp [3966658.402831] kfd_process_dequeue_from_all_devices+0x5b/0xc0 [amdgpu] [3966658.402908] kfd_process_wq_release+0x1a/0x1a0 [amdgpu] [3966658.410516] coretemp [3966658.434016] process_one_work+0x1ad/0x380 [3966658.434021] worker_thread+0x49/0x310 [3966658.438963] kvm_intel [3966658.446041] ? process_one_work+0x380/0x380 [3966658.446045] kthread+0x118/0x140 [3966658.446047] ? __kthread_bind_mask+0x60/0x60 [3966658.446050] ret_from_fork+0x1f/0x30 [3966658.446053] Modules linked in: kpatch_20765354(OEK) [3966658.455310] kvm [3966658.464534] mptcp_diag xsk_diag raw_diag unix_diag af_packet_diag netlink_diag udp_diag act_pedit act_mirred act_vlan cls_flower kpatch_21951273(OEK) kpatch_18424469(OEK) kpatch_19749756(OEK) [3966658.473462] idxd_mdev [3966658.482306] kpatch_17971294(OEK) sch_ingress xt_conntrack amdgpu(OE) amdxcp(OE) amddrm_buddy(OE) amd_sched(OE) amdttm(OE) amdkcl(OE) intel_ifs iptable_mangle tcm_loop target_core_pscsi tcp_diag target_core_file inet_diag target_core_iblock target_core_user target_core_mod coldpgs kpatch_18383292(OEK) ip6table_nat ip6table_filter ip6_tables ip_set_hash_ipportip ip_set_hash_ipportnet ip_set_hash_ipport ip_set_bitmap_port xt_comment iptable_nat nf_nat iptable_filter ip_tables ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sn_core_odd(OE) i40e overlay binfmt_misc tun bonding(OE) aisqos(OE) aisqos_hotfixes(OE) rfkill uio_pci_generic uio cuse fuse nf_tables nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm idxd_mdev [3966658.491237] vfio_pci [3966658.501196] vfio_pci vfio_virqfd mdev vfio_iommu_type1 vfio iax_crypto intel_pmt_telemetry iTCO_wdt intel_pmt_class iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq [3966658.508537] vfio_virqfd [3966658.517569] snd_seq_device ipmi_ssif isst_if_mbox_pci isst_if_mmio pcspkr snd_pcm idxd intel_uncore ses isst_if_common intel_vsec idxd_bus enclosure snd_timer mei_me snd i2c_i801 i2c_smbus mei i2c_ismt soundcore joydev acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad vfat fat [3966658.526851] mdev [3966658.536096] nfsd auth_rpcgss nfs_acl lockd grace slb_vtoa(OE) sunrpc dm_mod hookers mlx5_ib(OE) ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm mlx5_core(OE) mlxfw(OE) [3966658.540381] vfio_iommu_type1 [3966658.544341] nvme mpt3sas tls drm nvme_core pci_hyperv_intf raid_class psample libcrc32c crc32c_intel mlxdevm(OE) i2c_core [3966658.551254] vfio [3966658.558742] scsi_transport_sas wmi pinctrl_emmitsburg sd_mod t10_pi sg ahci libahci libata rdma_ucm(OE) ib_uverbs(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_umad(OE) ib_core(OE) ib_ucm(OE) mlx_compat(OE) [3966658.563004] iax_crypto [3966658.570988] [last unloaded: diagnose] [3966658.571027] ---[ end trace cc9dbb180f9ae537 ]--- Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Philip.Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_swYifan Zhang1-1/+9
There is race in amdgpu_amdkfd_device_fini_sw and interrupt. if amdgpu_amdkfd_device_fini_sw run in b/w kfd_cleanup_nodes and kfree(kfd), and KGD interrupt generated. kernel panic log: BUG: kernel NULL pointer dereference, address: 0000000000000098 amdgpu 0000:c8:00.0: amdgpu: Requesting 4 partitions through PSP PGD d78c68067 P4D d78c68067 kfd kfd: amdgpu: Allocated 3969056 bytes on gart PUD 1465b8067 PMD @ Oops: @002 [#1] SMP NOPTI kfd kfd: amdgpu: Total number of KFD nodes to be created: 4 CPU: 115 PID: @ Comm: swapper/115 Kdump: loaded Tainted: G S W OE K RIP: 0010:_raw_spin_lock_irqsave+0x12/0x40 Code: 89 e@ 41 5c c3 cc cc cc cc 66 66 2e Of 1f 84 00 00 00 00 00 OF 1f 40 00 Of 1f 44% 00 00 41 54 9c 41 5c fa 31 cO ba 01 00 00 00 <fO> OF b1 17 75 Ba 4c 89 e@ 41 Sc 89 c6 e8 07 38 5d RSP: 0018: ffffc90@1a6b0e28 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000018 0000000000000001 RSI: ffff8883bb623e00 RDI: 0000000000000098 ffff8883bb000000 RO8: ffff888100055020 ROO: ffff888100055020 0000000000000000 R11: 0000000000000000 R12: 0900000000000002 ffff888F2b97da0@ R14: @000000000000098 R15: ffff8883babdfo00 CS: 010 DS: 0000 ES: 0000 CRO: 0000000080050033 CR2: 0000000000000098 CR3: 0000000e7cae2006 CR4: 0000000002770ce0 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 0000000000000000 DR6: 00000000fffeO7FO DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> kgd2kfd_interrupt+@x6b/0x1f@ [amdgpu] ? amdgpu_fence_process+0xa4/0x150 [amdgpu] kfd kfd: amdgpu: Node: 0, interrupt_bitmap: 3 YcpxFl Rant tErace amdgpu_irq_dispatch+0x165/0x210 [amdgpu] amdgpu_ih_process+0x80/0x100 [amdgpu] amdgpu: Virtual CRAT table created for GPU amdgpu_irq_handler+0x1f/@x60 [amdgpu] __handle_irq_event_percpu+0x3d/0x170 amdgpu: Topology: Add dGPU node [0x74a2:0x1002] handle_irq_event+0x5a/@xcO handle_edge_irq+0x93/0x240 kfd kfd: amdgpu: KFD node 1 partition @ size 49148M asm_call_irq_on_stack+0xf/@x20 </IRQ> common_interrupt+0xb3/0x130 asm_common_interrupt+0x1le/0x40 5.10.134-010.a1i5000.a18.x86_64 #1 Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Philip Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25drm/amd/display: Reject modes with too high pixel clock on DCE6-10Timur Kristóf5-3/+35
Reject modes with a pixel clock higher than the maximum display clock. Use 400 MHz as a fallback value when the maximum display clock is not known. Pixel clocks that are higher than the display clock just won't work and are not supported. With the addition of the YUV422 fallback, DC can now accidentally select a mode requiring higher pixel clock than actually supported when the DP version supports the required bandwidth but the clock is otherwise too high for the display engine. DCE 6-10 don't support these modes but they don't have a bandwidth calculation to reject them properly. Fixes: db291ed1732e ("drm/amd/display: Add fallback path for YCBCR422") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes()Mario Limonciello1-2/+0
[Why] amdgpu_connector_add_common_modes() has a check for the width and height of common modes being too small, but the array of common_modes[] has fixed values. The check is dead code. [How] Drop unnecessary check. Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250924161624.1975819-3-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25drm/amd/display: Only enable common modes for eDP and LVDSMario Limonciello1-0/+4
[Why] The main reason common modes are added is for compatibility with clone mode when a laptop is connected to a projector or external monitor. Since commit 978fa2f6d0b12 ("drm/amd/display: Use scaling for non-native resolutions on eDP") when non-native modes are picked for eDP the GPU scalar will be used. This is because it is inconsistent whether eDP panels have the capability to actually drive non-native resolutions. With panels connected to other connectors this limitation generally doesn't exist as we the EDID will advertise support for a number of resolutions and monitors will use built in scaling hardware. Comparing DC and non-DC code paths the non-DC code path only adds common modes for LVDS and eDP whereas the DC codepath does it for all connector types. In the past there was an experiment done to disable common mode adding for eDP and LVDS from commit 6d396e7ac1ce3 ("drm/amd/display: Disable common modes for LVDS") and commit 7948afb46af92 ("drm/amd/display: Disable common modes for eDP") but this was reverted in commit a8b79b09185de ("drm/amd: Re-enable common modes for eDP and LVDS") because it caused problems with Xorg. [How] Only add common modes for eDP and LVDS for DC, matching the behavior of non-DC. Suggested-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250924161624.1975819-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25drm/amdgpu: remove the redeclaration of variable iSunil Khatri1-1/+0
Variable "i" has been redeclared as integer later in the function which is wrong and not serving any purpose. Fixes: 899fbde14646 ("drm/amdgpu: replace get_user_pages with HMM mirror helpers") Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25drm/amdgpu/userq: assign an error code for invalid userq vaPrike Liang1-0/+2
It should return an error code if userq VA validation fails. Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25drm/amdgpu: revert "rework reserved VMID handling" v2Christian König4-41/+50
This reverts commit e44a0fe630c58b0a87d8281f5c1077a3479e5fce. Initially we used VMID reservation to enforce isolation between processes. That has now been replaced by proper fence handling. Both OpenGL, RADV and ROCm developers requested a way to reserve a VMID for SPM, so restore that approach by reverting back to only allowing a single process to use the reserved VMID. Only compile tested for now. v2: use -ENOENT instead of -EINVAL if VMID is not available Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25drm/amdgpu: remove leftover from enforcing isolation by VMIDChristian König1-5/+0
Initially we enforced isolation by reserving a VMID, but that practice was now removed. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25drm/amdgpu: Add fallback to pipe reset if KCQ ring reset failsJesse.Zhang1-0/+12
Add a fallback mechanism to attempt pipe reset when KCQ reset fails to recover the ring. After performing the KCQ reset and queue remapping, test the ring functionality. If the ring test fails, initiate a pipe reset as an additional recovery step. v2: fix the typo (Lijo) v3: try pipeline reset when kiq mapping fails (Lijo) Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-25drm/amd: Fix hybrid sleepMario Limonciello (AMD)1-1/+1
[Why] commit 530694f54dd5e ("drm/amdgpu: do not resume device in thaw for normal hibernation") optimized the flow for systems that are going into S4 where the power would be turned off. Basically the thaw() callback wouldn't resume the device if the hibernation image was successfully created since the system would be powered off. This however isn't the correct flow for a system entering into s0i3 after the hibernation image is created. Some of the amdgpu callbacks have different behavior depending upon the intended state of the suspend. [How] Use pm_hibernation_mode_is_suspend() as an input to decide whether to run resume during thaw() callback. Reported-by: Ionut Nechita <ionut_n2001@yahoo.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4573 Tested-by: Ionut Nechita <ionut_n2001@yahoo.com> Fixes: 530694f54dd5e ("drm/amdgpu: do not resume device in thaw for normal hibernation") Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Kenneth Crudup <kenny@panix.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Cc: 6.17+ <stable@vger.kernel.org> # 6.17+: 495c8d35035e: PM: hibernate: Add pm_hibernation_mode_is_suspend() Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski105-347/+905
Cross-merge networking fixes after downstream PR (net-6.17-rc8). Conflicts: drivers/net/can/spi/hi311x.c 6b6968084721 ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled") 27ce71e1ce81 ("net: WQ_PERCPU added to alloc_workqueue users") https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org net/smc/smc_loopback.c drivers/dibs/dibs_loopback.c a35c04de2565 ("net/smc: fix warning in smc_rx_splice() when calling get_page()") cc21191b584c ("dibs: Move data path to dibs layer") https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org Adjacent changes: drivers/net/dsa/lantiq/lantiq_gswip.c c0054b25e2f1 ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()") 7a1eaef0a791 ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-25s390/dasd: enforce dma_alignment to ensure proper buffer validationJaehoon Kim1-0/+5
The block layer validates buffer alignment using the device's dma_alignment value. If dma_alignment is smaller than logical_block_size(bp_block) -1, misaligned buffer incorrectly pass validation and propagate to the lower-level driver. This patch adjusts dma_alignment to be at least logical_block_size -1, ensuring that misalignment buffers are properly rejected at the block layer and do not reach the DASD driver unnecessarily. Fixes: 2a07bb64d801 ("s390/dasd: Remove DMA alignment") Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Cc: stable@vger.kernel.org #6.11+ Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-25s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_requestJaehoon Kim1-5/+7
Currently, if CCW request creation fails with -EINVAL, the DASD driver returns BLK_STS_IOERR to the block layer. This can happen, for example, when a user-space application such as QEMU passes a misaligned buffer, but the original cause of the error is masked as a generic I/O error. This patch changes the behavior so that -EINVAL is returned as BLK_STS_INVAL, allowing user space to properly detect alignment issues instead of interpreting them as I/O errors. Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Cc: stable@vger.kernel.org #6.11+ Signed-off-by: Jaehoon Kim <jhkim@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-25Merge tag 'net-6.17-rc8' of ↵Linus Torvalds27-108/+190
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth, IPsec and CAN. No known regressions at this point. Current release - regressions: - xfrm: xfrm_alloc_spi shouldn't use 0 as SPI Previous releases - regressions: - xfrm: fix offloading of cross-family tunnels - bluetooth: fix several races leading to UaFs - dsa: lantiq_gswip: fix FDB entries creation for the CPU port - eth: - tun: update napi->skb after XDP process - mlx: fix UAF in flow counter release Previous releases - always broken: - core: forbid FDB status change while nexthop is in a group - smc: fix warning in smc_rx_splice() when calling get_page() - can: provide missing ndo_change_mtu(), to prevent buffer overflow. - eth: - i40e: fix VF config validation - broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl" * tag 'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits) octeontx2-pf: Fix potential use after free in otx2_tc_add_flow() net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup() libie: fix string names for AQ error codes net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD net/mlx5: HWS, ignore flow level for multi-dest table net/mlx5: fs, fix UAF in flow counter release selftests: fib_nexthops: Add test cases for FDB status change selftests: fib_nexthops: Fix creation of non-FDB nexthops nexthop: Forbid FDB status change while nexthop is in a group net: allow alloc_skb_with_frags() to use MAX_SKB_FRAGS bnxt_en: correct offset handling for IPv6 destination address ptp: document behavior of PTP_STRICT_FLAGS broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl broadcom: fix support for PTP_PEROUT_DUTY_CYCLE Bluetooth: MGMT: Fix possible UAFs Bluetooth: hci_event: Fix UAF in hci_acl_create_conn_sync Bluetooth: hci_event: Fix UAF in hci_conn_tx_dequeue Bluetooth: hci_sync: Fix hci_resume_advertising_sync Bluetooth: Fix build after header cleanup ...
2025-09-25hwmon: (mlxreg-fan) Add support for new flavour of capability registerVadim Pasternak1-3/+15
FAN platform data is common across the various systems, while fan driver should be able to apply only the fan instances relevant to specific system. For example, platform data might contain descriptions for fan1, fan2, ..., fan{n}, while some systems equipped with all 'n' fans, others with less. Also, on some systems fan drawer can be equipped with several tachometers and on others only with one. For detection of the real number of equipped drawers and tachometers special capability registers are used. These registers used to indicate presence of drawers and tachometers through the bitmap. For some new big modular systems this register will provide presence data by counter. Use slot parameter to distinct whether capability register contains bitmask or counter. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20250113084859.27064-3-vadimp@nvidia.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-09-25hwmon: (mlxreg-fan) Separate methods of fan setting coming from different ↵Vadim Pasternak1-8/+16
subsystems Distinct between fan speed setting request coming for hwmon and thermal subsystems. There are fields 'last_hwmon_state' and 'last_thermal_state' in the structure 'mlxreg_fan_pwm', which respectively store the cooling state set by the 'hwmon' and 'thermal' subsystem. The purpose is to make arbitration of fan speed setting. For example, if fan speed required to be not lower than some limit, such setting is to be performed through 'hwmon' subsystem, thus 'thermal' subsystem will not set fan below this limit. Currently, the 'last_thermal_state' is also be updated by 'hwmon' causing cooling state to never be set to a lower value. Eliminate update of 'last_thermal_state', when request is coming from 'hwmon' subsystem. Fixes: da74944d3a46 ("hwmon: (mlxreg-fan) Use pwm attribute for setting fan speed low limit") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20250113084859.27064-2-vadimp@nvidia.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-09-25hwmon: (cros_ec) register fans into thermal framework cooling devicesSung-Chi Li1-0/+83
Register fans connected under EC as thermal cooling devices as well, so these fans can then work with the thermal framework. During the driver probing phase, we will also try to register each fan as a thermal cooling device based on previous probe result (whether the there are fans connected on that channel, and whether EC supports fan control). The basic get max state, get current state, and set current state methods are then implemented as well. Signed-off-by: Sung-Chi Li <lschyi@chromium.org> Acked-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250911-cros_ec_fan-v6-3-a1446cc098af@google.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-09-25hwmon: (cros_ec) add PWM control over fansSung-Chi Li1-0/+230
Newer EC firmware supports controlling fans through host commands, so adding corresponding implementations for controlling these fans in the driver for other kernel services and userspace to control them. The driver will first probe the supported host command versions (get and set of fan PWM values, get and set of fan control mode) to see if the connected EC fulfills the requirements of controlling the fan, then exposes corresponding sysfs nodes for userspace to control the fan with corresponding read and write implementations. As EC will automatically change the fan mode to auto when the device is suspended, the power management hooks are added as well to keep the fan control mode and fan PWM value consistent during suspend and resume. As we need to access the hwmon device in the power management hook, update the driver by storing the hwmon device in the driver data as well. Signed-off-by: Sung-Chi Li <lschyi@chromium.org> Acked-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250911-cros_ec_fan-v6-2-a1446cc098af@google.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-09-25hwmon: add SMARC-sAM67 supportMichael Walle3-0/+172
Add a new driver for the Kontron SMARC-sAM67 board management controller. It has two voltage sensors and one temperature sensor. Signed-off-by: Michael Walle <mwalle@kernel.org> Link: https://lore.kernel.org/r/20250912120745.2295115-7-mwalle@kernel.org [groeck: Added sa67 to index.rst] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-09-25hwmon: (asus-ec-sensors) add TUF GAMING X670E PLUS WIFIMohamad Kamal1-0/+2
Add support for TUF GAMING X670E PLUS WIFI Signed-off-by: Mohamad Kamal <mohamad.kamal.85@gmail.com> Signed-off-by: Eugene Shalygin <eugene.shalygin@gmail.com> Link: https://lore.kernel.org/r/20250914084019.1108941-1-eugene.shalygin@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-09-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-24/+18
Pull virtio fixes from Michael Tsirkin: "virtio,vhost: last minute fixes More small fixes. Most notably this fixes crashes and hangs in vhost-net" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: MAINTAINERS, mailmap: Update address for Peter Hilber virtio_config: clarify output parameters uapi: vduse: fix typo in comment vhost: Take a reference on the task in struct vhost_task. vhost-net: flush batched before enabling notifications Revert "vhost/net: Defer TX queue re-enable until after sendmsg" vhost-net: unbreak busy polling vhost-scsi: fix argument order in tport allocation error message
2025-09-25s390/tape: Add WQ_PERCPU to alloc_workqueue usersMarco Crivellari1-1/+1
Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This patch adds a new WQ_PERCPU flag to explicitly request the use of the per-CPU behavior. Both flags coexist for one release cycle to allow callers to transition their calls. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. All existing users have been updated accordingly. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-09-25tpm: loongson: Add bufsiz parameter to tpm_loongson_send()Nathan Chancellor1-1/+1
Commit 5c83b07df9c5 ("tpm: Add a driver for Loongson TPM device") has a semantic conflict with commit 07d8004d6fb9 ("tpm: add bufsiz parameter in the .send callback"), as the former change was developed against a tree without the latter change. This results in a build error: drivers/char/tpm/tpm_loongson.c:48:17: error: initialization of 'int (*)(struct tpm_chip *, u8 *, size_t, size_t)' {aka 'int (*)(struct tpm_chip *, unsigned char *, long unsigned int, long unsigned int)'} from incompatible pointer type 'int (*)(struct tpm_chip *, u8 *, size_t)' {aka 'int (*)(struct tpm_chip *, unsigned char *, long unsigned int)'} [-Wincompatible-pointer-types] 48 | .send = tpm_loongson_send, | ^~~~~~~~~~~~~~~~~ drivers/char/tpm/tpm_loongson.c:48:17: note: (near initialization for 'tpm_loongson_ops.send') drivers/char/tpm/tpm_loongson.c:31:12: note: 'tpm_loongson_send' declared here 31 | static int tpm_loongson_send(struct tpm_chip *chip, u8 *buf, size_t count) | ^~~~~~~~~~~~~~~~~ Add the expected bufsiz parameter to tpm_loongson_send() to resolve the error. Fixes: 5c83b07df9c5 ("tpm: Add a driver for Loongson TPM device") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Lee Jones <lee@kernel.org>
2025-09-25net: gso: restore ids of outer ip headers correctlyRichard Gobert2-6/+19
Currently, NETIF_F_TSO_MANGLEID indicates that the inner-most ID can be mangled. Outer IDs can always be mangled. Make GSO preserve outer IDs by default, with NETIF_F_TSO_MANGLEID allowing both inner and outer IDs to be mangled. This commit also modifies a few drivers that use SKB_GSO_FIXEDID directly. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250923085908.4687-4-richardbgobert@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-25eth: fbnic: Read module EEPROMMohsin Bashir3-0/+217
Add support to read module EEPROM for fbnic. Towards this, add required support to issue a new command to the firmware and to receive the response to the corresponding command. Create a local copy of the data in the completion struct before writing to ethtool_module_eeprom to avoid writing to data in case it is freed. Given that EEPROM pages are small, the overhead of additional copy is negligible. Do not block API with explicit checks since API has appropriate checks in place for length, offset, and page. Explicitly check bank, page, offset, and length in fbnic_fw_parse_qsfp_read_resp() to match EEPROM read responses to the correct request. This is important because if the driver times out waiting for an EEPROM read response, a subsequent read request with different values is susceptible to receiving an erroneous response (i.e., the response to the previous request). Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20250922231855.3717483-1-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-25platform/x86: lg-laptop: Fix WMAB call in fan_mode_store()Daniel Lee1-20/+14
When WMAB is called to set the fan mode, the new mode is read from either bits 0-1 or bits 4-5 (depending on the value of some other EC register). Thus when WMAB is called with bits 4-5 zeroed and called again with bits 0-1 zeroed, the second call undoes the effect of the first call. This causes writes to /sys/devices/platform/lg-laptop/fan_mode to have no effect (and causes reads to always report a status of zero). Fix this by calling WMAB once, with the mode set in bits 0,1 and 4,5. When the fan mode is returned from WMAB it always has this form, so there is no need to preserve the other bits. As a bonus, the driver now supports the "Performance" fan mode seen in the LG-provided Windows control app, which provides less aggressive CPU throttling but louder fan noise and shorter battery life. Also, correct the documentation to reflect that 0 corresponds to the default mode (what the Windows app calls "Optimal") and 1 corresponds to the silent mode. Fixes: dbf0c5a6b1f8 ("platform/x86: Add LG Gram laptop special features driver") Link: https://bugzilla.kernel.org/show_bug.cgi?id=204913#c4 Signed-off-by: Daniel Lee <dany97@live.ca> Link: https://patch.msgid.link/MN2PR06MB55989CB10E91C8DA00EE868DDC1CA@MN2PR06MB5598.namprd06.prod.outlook.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-09-25octeontx2-pf: Fix potential use after free in otx2_tc_add_flow()Dan Carpenter1-1/+1
This code calls kfree_rcu(new_node, rcu) and then dereferences "new_node" and then dereferences it on the next line. Two lines later, we take a mutex so I don't think this is an RCU safe region. Re-order it to do the dereferences before queuing up the free. Fixes: 68fbff68dbea ("octeontx2-pf: Add police action for TC flower") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/aNKCL1jKwK8GRJHh@stanley.mountain Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-25drm/panthor: Defer scheduler entitiy destruction to queue releaseAdrián Larumbe1-7/+1
Commit de8548813824 ("drm/panthor: Add the scheduler logical block") handled destruction of a group's queues' drm scheduler entities early into the group destruction procedure. However, that races with the group submit ioctl, because by the time entities are destroyed (through the group destroy ioctl), the submission procedure might've already obtained a group handle, and therefore the ability to push jobs into entities. This is met with a DRM error message within the drm scheduler core as a situation that should never occur. Fix by deferring drm scheduler entity destruction to queue release time. Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250919164436.531930-1-adrian.larumbe@collabora.com
2025-09-25net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added ↵Vladimir Oltean1-1/+2
to the CPU port The blamed commit and others in that patch set started the trend of reusing existing DSA driver API for a new purpose: calling ds->ops->port_fdb_add() on the CPU port. The lantiq_gswip driver was not prepared to handle that, as can be seen from the many errors that Daniel presents in the logs: [ 174.050000] gswip 1e108000.switch: port 2 failed to add fa:aa:72:f4:8b:1e vid 1 to fdb: -22 [ 174.060000] gswip 1e108000.switch lan2: entered promiscuous mode [ 174.070000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 0 to fdb: -22 [ 174.090000] gswip 1e108000.switch: port 2 failed to add 00:01:02:03:04:02 vid 1 to fdb: -22 [ 174.090000] gswip 1e108000.switch: port 2 failed to delete fa:aa:72:f4:8b:1e vid 1 from fdb: -2 The errors are because gswip_port_fdb() wants to get a handle to the bridge that originated these FDB events, to associate it with a FID. Absolutely honourable purpose, however this only works for user ports. To get the bridge that generated an FDB entry for the CPU port, one would need to look at the db.bridge.dev argument. But this was introduced in commit c26933639b54 ("net: dsa: request drivers to perform FDB isolation"), first appeared in v5.18, and when the blamed commit was introduced in v5.14, no such API existed. So the core DSA feature was introduced way too soon for lantiq_gswip. Not acting on these host FDB entries and suppressing any errors has no other negative effect, and practically returns us to not supporting the host filtering feature at all - peacefully, this time. Fixes: 10fae4ac89ce ("net: dsa: include bridge addresses which are local in the host fdb list") Reported-by: Daniel Golle <daniel@makrotopia.org> Closes: https://lore.kernel.org/netdev/aJfNMLNoi1VOsPrN@pidgin.makrotopia.org/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250918072142.894692-3-vladimir.oltean@nxp.com Tested-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-25net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()Vladimir Oltean1-4/+14
A port added to a "single port bridge" operates as standalone, and this is mutually exclusive to being part of a Linux bridge. In fact, gswip_port_bridge_join() calls gswip_add_single_port_br() with add=false, i.e. removes the port from the "single port bridge" to enable autonomous forwarding. The blamed commit seems to have incorrectly thought that ds->ops->port_enable() is called one time per port, during the setup phase of the switch. However, it is actually called during the ndo_open() implementation of DSA user ports, which is to say that this sequence of events: 1. ip link set swp0 down 2. ip link add br0 type bridge 3. ip link set swp0 master br0 4. ip link set swp0 up would cause swp0 to join back the "single port bridge" which step 3 had just removed it from. The correct DSA hook for one-time actions per port at switch init time is ds->ops->port_setup(). This is what seems to match the coder's intention; also see the comment at the beginning of the file: * At the initialization the driver allocates one bridge table entry for ~~~~~~~~~~~~~~~~~~~~~ * each switch port which is used when the port is used without an * explicit bridge. Fixes: 8206e0ce96b3 ("net: dsa: lantiq: Add VLAN unaware bridge offloading") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250918072142.894692-2-vladimir.oltean@nxp.com Tested-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-25accel/habanalabs: add Infineon version checkPavan S1-2/+9
On HL338 ASICs, the Infineon first‑stage firmware is not present and the reported version is 0. In this case printing a version number is misleading, as it suggests valid firmware when it does not exist. Fix this by printing the first‑stage Infineon firmware version only if the reported value is non‑zero. This avoids confusing or incorrect log messages on devices where the first stage is not applicable. Signed-off-by: Pavan S <pavan.sreenivas@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: read preboot status after recovering from dirty stateKonstantin Sinyuk1-1/+7
Dirty state can occur when the host VM undergoes a reset while the device does not. In such a case, the driver must reset the device before it can be used again. As part of this reset, the device capabilities are zeroed. Therefore, the driver must read the Preboot status again to learn the Preboot state, capabilities, and security configuration. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: add HL_GET_P_STATE passthrough typeAriel Aviad1-0/+3
Add a new passthrough type HL_GET_P_STATE to the cpucp generic ioctl to allow userspace to read the device performance state via firmware. Signed-off-by: Ariel Aviad <ariel.aviad@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: add debugfs interface for HLDIO testingKonstantin Sinyuk2-0/+214
Add debugfs files for NVMe Direct I/O (HLDIO) functionality. This interface allows userspace access to direct SSD ↔ device transfers through debugfs nodes. Four debugfs files are created under /sys/kernel/debug/habanalabs/hlN/: - dio_ssd2hl : trigger SSD-to-device transfers - dio_hl2ssd : trigger device-to-SSD transfers (placeholder, not yet implemented) - dio_stats : show transfer statistics - dio_reset : reset statistics counters Usage examples: # Perform SSD → device transfer echo "fd=3 va=0x10000 off=0 len=4096" > \ /sys/kernel/debug/habanalabs/hl0/dio_ssd2hl # View statistics cat /sys/kernel/debug/habanalabs/hl0/dio_stats # Reset counters echo 1 > /sys/kernel/debug/habanalabs/hl0/dio_reset This interface provides access to HLDIO functionality for validation and diagnostics. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Farah Kassabri <farah.kassabri@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: add NVMe Direct I/O (HLDIO) infrastructureKonstantin Sinyuk6-2/+624
Introduce NVMe Direct I/O (HLDIO) infrastructure to support peer‑to‑peer DMA in the habanalabs driver. This adds internal helpers and data structures to enable direct transfers between NVMe storage and device memory. The feature is built only when CONFIG_HL_HLDIO is enabled. A debugfs interface is also provided for functional validation. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Farah Kassabri <farah.kassabri@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: support mapping cb with vmalloc-backed coherent memoryMoti Haimovski2-0/+26
When IOMMU is enabled, dma_alloc_coherent() with GFP_USER may return addresses from the vmalloc range. If such an address is mapped without VM_MIXEDMAP, vm_insert_page() will trigger a BUG_ON due to the VM_PFNMAP restriction. Fix this by checking for vmalloc addresses and setting VM_MIXEDMAP in the VMA before mapping. This ensures safe mapping and avoids kernel crashes. The memory is still driver-allocated and cannot be accessed directly by userspace. Signed-off-by: Moti Haimovski <moti.haimovski@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: remove old interface variation of 'access_ok()'Ilia Levi1-5/+0
The access_ok() API no longer requires the VERIFY_WRITE argument, and the use of the old interface with VERIFY_WRITE is deprecated. Clean up the habanalabs memory manager to use the modern access_ok() interface consistently. This removes old #ifdef guards and aligns the driver with current upstream kernel APIs. Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: use the CPLD_SHUTDOWN event handlerKonstantin Sinyuk2-4/+2
After CPLD shutdown event the device is not usable anymore. The common CPLD_SHUTDOWN event handler disables any subsequent device access. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: disable device access after CPLD_SHUTDOWNKonstantin Sinyuk2-0/+28
After a CPLD shutdown event the device becomes unusable. Prevent further device access once this event is received. Signed-off-by: Konstantin Sinyuk <konstantin.sinyuk@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: clarify ctx use after hl_ctx_put() in dmabuf releaseTomer Tayar1-1/+6
In hl_release_dmabuf(), ctx is dereferenced after calling hl_ctx_put() to obtain the compute device file. This is safe because the dma-buf object holds a file reference taken in export_dmabuf(), and the file release (which drops another ctx reference) can only happen after we drop that file reference via fput(). Thus, this hl_ctx_put() call cannot be the last one at this point. Add a comment explaining this to avoid confusion. Signed-off-by: Tomer Tayar <tomer.tayar@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: add support for logging register accesses from debugfsSharley Calzolari3-1/+148
Add infrastructure for logging the last configuration register accesses that occur via debugfs read/write operations. At interrupt time, these log entries can be dumped to dmesg, which helps in diagnosing the cause of RAZWI and ADDR_DEC interrupts. The logging is implemented as a ring buffer of access entries, with each entry recording timestamp and access details. To ensure correctness under concurrent access, operations are now protected using spinlocks. Entries are copied under lock and then printed after releasing it, which minimizes time spent in the critical section. Signed-off-by: Sharley Calzolari <sharley.calzolari@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: stringify engine/queue idsAriel Suller2-8/+367
Print engine/queue names instead of numerical engine/queue IDs to make logs and debug output more readable. Signed-off-by: Ariel Suller <ariel.suller@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: add generic message type to get error countersVitaly Margolin1-0/+3
Add a new CPUCP generic message type to retrieve HBM, SRAM and critical error counters from the device. Signed-off-by: Vitaly Margolin <vitaly.margolin@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs/gaudi2: fix BMON disable configurationVered Yavniely1-1/+1
Change the BMON_CR register value back to its original state before enabling, so that BMON does not continue to collect information after being disabled. Signed-off-by: Vered Yavniely <vered.yavniely@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-25accel/habanalabs: return ENOMEM if less than requested pages were pinnedTomer Tayar1-1/+1
EFAULT is currently returned if less than requested user pages are pinned. This value means a "bad address" which might be confusing to the user, as the address of the given user memory is not necessarily "bad". Modify the return value to ENOMEM, as "out of memory" is more suitable in this case. Signed-off-by: Tomer Tayar <tomer.tayar@intel.com> Reviewed-by: Koby Elbaz <koby.elbaz@intel.com> Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
2025-09-24bnxt_en: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()Vadim Fedorenko3-27/+23
Convert bnxt dirver to use new timestamping configuration API. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250923173310.139623-3-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24tg3: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()Vadim Fedorenko1-37/+29
Convert tg3 driver to new timestamping configuration API. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250923173310.139623-2-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24net: phy: micrel: Fix default LED behaviourHoratiu Vultur1-0/+19
By default the LED will be ON when there is a link but they are not blinking when there is any traffic activity. Therefore change this to blink when there is any traffic. Fixes: 5a774b64cd6a ("net: phy: micrel: Add support for lan8842") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://patch.msgid.link/20250922130314.758229-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24net: stmmac: simplify stmmac_init_phy()Russell King (Oracle)1-13/+12
If we fail to attach a PHY, there is no point trying to configure WoL settings. Exit the function after printing the "cannot attach to PHY" error, and remove the now unnecessary code indentation for configuring the LPI timer in phylink. Since we know that "ret" must be zero at this point, change the final return to use a constant rather than "ret". Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/E1v11A8-0000000774M-3pmH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>