aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-10-16drm/msm: Fix pgtable prealloc error pathRob Clark1-0/+5
The following splat was reported: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000008d0fd8000 [0000000000000010] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP CPU: 5 UID: 1000 PID: 149076 Comm: Xwayland Tainted: G S 6.16.0-rc2-00809-g0b6974bb4134-dirty #367 PREEMPT Tainted: [S]=CPU_OUT_OF_SPEC Hardware name: Qualcomm Technologies, Inc. SM8650 HDK (DT) pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : build_detached_freelist+0x28/0x224 lr : kmem_cache_free_bulk.part.0+0x38/0x244 sp : ffff000a508c7a20 x29: ffff000a508c7a20 x28: ffff000a508c7d50 x27: ffffc4e49d16f350 x26: 0000000000000058 x25: 00000000fffffffc x24: 0000000000000000 x23: ffff00098c4e1450 x22: 00000000fffffffc x21: 0000000000000000 x20: ffff000a508c7af8 x19: 0000000000000002 x18: 00000000000003e8 x17: ffff000809523850 x16: ffff000809523820 x15: 0000000000401640 x14: ffff000809371140 x13: 0000000000000130 x12: ffff0008b5711e30 x11: 00000000001058fa x10: 0000000000000a80 x9 : ffff000a508c7940 x8 : ffff000809371ba0 x7 : 781fffe033087fff x6 : 0000000000000000 x5 : ffff0008003cd000 x4 : 781fffe033083fff x3 : ffff000a508c7af8 x2 : fffffdffc0000000 x1 : 0001000000000000 x0 : ffff0008001a6a00 Call trace: build_detached_freelist+0x28/0x224 (P) kmem_cache_free_bulk.part.0+0x38/0x244 kmem_cache_free_bulk+0x10/0x1c msm_iommu_pagetable_prealloc_cleanup+0x3c/0xd0 msm_vma_job_free+0x30/0x240 msm_ioctl_vm_bind+0x1d0/0x9a0 drm_ioctl_kernel+0x84/0x104 drm_ioctl+0x358/0x4d4 __arm64_sys_ioctl+0x8c/0xe0 invoke_syscall+0x44/0x100 el0_svc_common.constprop.0+0x3c/0xe0 do_el0_svc+0x18/0x20 el0_svc+0x30/0x100 el0t_64_sync_handler+0x104/0x130 el0t_64_sync+0x170/0x174 Code: aa0203f5 b26287e2 f2dfbfe2 aa0303f4 (f8737ab6) ---[ end trace 0000000000000000 ]--- Since msm_vma_job_free() is called directly from the ioctl, this looks like an error path cleanup issue. Which I think results from prealloc_cleanup() called without a preceding successful prealloc_allocate() call. So handle that case better. Reported-by: Connor Abbott <cwabbott0@gmail.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/678677/ Message-ID: <20251006153542.419998-1-robin.clark@oss.qualcomm.com>
2025-10-16drm/sched: Fix potential double free in drm_sched_job_add_resv_dependenciesTvrtko Ursulin1-6/+7
When adding dependencies with drm_sched_job_add_dependency(), that function consumes the fence reference both on success and failure, so in the latter case the dma_fence_put() on the error path (xarray failed to expand) is a double free. Interestingly this bug appears to have been present ever since commit ebd5f74255b9 ("drm/sched: Add dependency tracking"), since the code back then looked like this: drm_sched_job_add_implicit_dependencies(): ... for (i = 0; i < fence_count; i++) { ret = drm_sched_job_add_dependency(job, fences[i]); if (ret) break; } for (; i < fence_count; i++) dma_fence_put(fences[i]); Which means for the failing 'i' the dma_fence_put was already a double free. Possibly there were no users at that time, or the test cases were insufficient to hit it. The bug was then only noticed and fixed after commit 9c2ba265352a ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2") landed, with its fixup of commit 4eaf02d6076c ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies"). At that point it was a slightly different flavour of a double free, which commit 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder") noticed and attempted to fix. But it only moved the double free from happening inside the drm_sched_job_add_dependency(), when releasing the reference not yet obtained, to the caller, when releasing the reference already released by the former in the failure case. As such it is not easy to identify the right target for the fixes tag so lets keep it simple and just continue the chain. While fixing we also improve the comment and explain the reason for taking the reference and not dropping it. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/dri-devel/aNFbXq8OeYl3QSdm@stanley.mountain/ Cc: Christian König <christian.koenig@amd.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Philipp Stanner <phasta@kernel.org> Cc: Christian König <ckoenig.leichtzumerken@gmail.com> Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.16+ Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20251015084015.6273-1-tvrtko.ursulin@igalia.com
2025-10-15drm/xe/evict: drop bogus assertMatthew Auld1-8/+0
This assert can trigger here with non pin_map users that select LATE_RESTORE, since the vmap is allowed to be NULL given that save/restore can now use the blitter instead. The check here doesn't seem to have much value anymore given that we no longer move pinned memory, so any existing vmap is left well alone, and doesn't need to be recreated upon restore, so just drop the assert here. Fixes: 86f69c26113c ("drm/xe: use backup object for pinned save/restore") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6213 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251010152457.177884-2-matthew.auld@intel.com (cherry picked from commit a10b4a69c7f8f596d2c5218fbe84430734fab3b2) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-15drm/xe/migrate: don't misalign current bytesMatthew Auld1-1/+3
If current bytes exceeds the max copy size, ensure the clamped size still accounts for the XE_CACHELINE_BYTES alignment, otherwise we trigger the assert in xe_migrate_vram with the size now being out of alignment. Fixes: 8c2d61e0e916 ("drm/xe/migrate: don't overflow max copy size") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6212 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251010162020.190962-2-matthew.auld@intel.com (cherry picked from commit 641bcf8731d21b56760e3646a39a65f471e9efd1) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-15drm/xe/kunit: Fix kerneldoc for parameterized testsMatt Roper1-0/+5
Kunit's generate_params() was recently updated to take an additional test context parameter. Xe's IP and platform parameter generators were updated accordingly at the same time, but the new parameter was not added to the functions' kerneldoc, resulting in the following warnings: Warning: drivers/gpu/drm/xe/tests/xe_pci.c:78 function parameter 'test' not described in 'xe_pci_fake_data_gen_params' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:254 function parameter 'test' not described in 'xe_pci_graphics_ip_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:278 function parameter 'test' not described in 'xe_pci_media_ip_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:302 function parameter 'test' not described in 'xe_pci_id_gen_param' Warning: drivers/gpu/drm/xe/tests/xe_pci.c:390 function parameter 'test' not described in 'xe_pci_live_device_gen_param' 5 warnings as errors Document the new parameter to eliminate the warnings and make CI happy. Fixes: b9a214b5f6aa ("kunit: Pass parameterized test context to generate_params()") Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20251013153014.2362879-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 89e347f8a70165d1e8d88a93d875da7742c902ce) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-15drm/xe/svm: Ensure data will be migrated to system if indicated by madvise.Thomas Hellström1-1/+3
If the location madvise() is set to DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM, the drm_pagemap in the SVM gpu fault handler will be set to NULL. However there is nothing that explicitly migrates the data to system if it is already present in device memory. In that case, set the device memory owner to NULL to ensure data gets properly migrated to system on page-fault. v2: - Remove redundant dpagemap assignment (Himal Prasad Ghimiray) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251010104149.72783-2-thomas.hellstrom@linux.intel.com Fixes: 10aa5c806030 ("drm/gpusvm, drm/xe: Fix userptr to not allow device private pages") (cherry picked from commit 2cfcea7a745794f9b8e265a309717ca6ba335fc4) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-15drm/i915/psr: Deactivate PSR only on LNL and when selective fetch enabledJouni Högander1-2/+10
Using intel_psr_exit in frontbuffer flush on older platforms seems to be causing problems. Sending single full frame update using intel_psr_force_update is anyways more optimal compared to psr deactivate/activate -> move back to this approach on PSR1, PSR HW tracking and Panel Replay full frame update and use deactivate/activate only on LunarLake and only when selective fetch is enabled. Tested-by: Lemen <lemen@lemen.xyz> Tested-by: Koos Vriezen <koos.vriezen@gmail.com> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14946 Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Mika Kahola <mika.kahola@intel.com> Link: https://lore.kernel.org/r/20250922102725.2752742-1-jouni.hogander@intel.com (cherry picked from commit 924adb0bbdd8fef25fd229c76e3f602c3e8752ee) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-10-15drm/ast: Blank with VGACR17 sync enable, always clear VGACRB6 sync offThomas Zimmermann2-8/+11
Blank the display by disabling sync pulses with VGACR17<7>. Unblank by reenabling them. This VGA setting should be supported by all Aspeed hardware. Ast currently blanks via sync-off bits in VGACRB6. Not all BMCs handle VGACRB6 correctly. After disabling sync during a reboot, some BMCs do not reenable it after the soft reset. The display output remains dark. When the display is off during boot, some BMCs set the sync-off bits in VGACRB6, so the display remains dark. Observed with Blackbird AST2500 BMCs. Clearing the sync-off bits unconditionally fixes these issues. Also do not modify VGASR1's SD bit for blanking, as it only disables GPU access to video memory. v2: - init vgacrb6 correctly (Jocelyn) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: ce3d99c83495 ("drm: Call drm_atomic_helper_shutdown() at shutdown time for misc drivers") Tested-by: Nick Bowler <nbowler@draconx.ca> Reported-by: Nick Bowler <nbowler@draconx.ca> Closes: https://lore.kernel.org/dri-devel/wpwd7rit6t4mnu6kdqbtsnk5bhftgslio6e2jgkz6kgw6cuvvr@xbfswsczfqsi/ Cc: Douglas Anderson <dianders@chromium.org> Cc: Dave Airlie <airlied@redhat.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.7+ Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20251014084743.18242-1-tzimmermann@suse.de
2025-10-14Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann1133-19291/+40342
Updating drm-misc-fixes to the state of v6.18-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-10-14drm/rockchip: vop2: use correct destination rectangle height checkAlok Tiwari1-1/+1
The vop2_plane_atomic_check() function incorrectly checks drm_rect_width(dest) twice instead of verifying both width and height. Fix the second condition to use drm_rect_height(dest) so that invalid destination rectangles with height < 4 are correctly rejected. Fixes: 604be85547ce ("drm/rockchip: Add VOP2 driver") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20251012142005.660727-1-alok.a.tiwari@oracle.com
2025-10-14drm/draw: fix color truncation in drm_draw_fill24Francesco Valla2-2/+2
The color parameter passed to drm_draw_fill24() was truncated to 16 bits, leading to an incorrect color drawn to the target iosys_map. Fix this behavior, widening the parameter to 32 bits. Fixes: 31fa2c1ca0b2 ("drm/panic: Move drawing functions to drm_draw") Signed-off-by: Francesco Valla <francesco@valla.it> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20251003-drm_draw_fill24_fix-v1-1-8fb7c1c2a893@valla.it Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-13drm/xe/guc: Check GuC running state before deregistering exec queueShuicheng Lin1-1/+12
In normal operation, a registered exec queue is disabled and deregistered through the GuC, and freed only after the GuC confirms completion. However, if the driver is forced to unbind while the exec queue is still running, the user may call exec_destroy() after the GuC has already been stopped and CT communication disabled. In this case, the driver cannot receive a response from the GuC, preventing proper cleanup of exec queue resources. Fix this by directly releasing the resources when GuC is not running. Here is the failure dmesg log: " [ 468.089581] ---[ end trace 0000000000000000 ]--- [ 468.089608] pci 0000:03:00.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535) [ 468.090558] pci 0000:03:00.0: [drm] GT0: total 65535 [ 468.090562] pci 0000:03:00.0: [drm] GT0: used 1 [ 468.090564] pci 0000:03:00.0: [drm] GT0: range 1..1 (1) [ 468.092716] ------------[ cut here ]------------ [ 468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe] " v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled(). As CT may go down and come back during VF migration. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251010172529.2967639-2-shuicheng.lin@intel.com (cherry picked from commit 9b42321a02c50a12b2beb6ae9469606257fbecea) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13drm/xe: Enable media sampler power gatingVinay Belgaumkar2-0/+9
Where applicable, enable media sampler power gating. Also, add it to the powergate_info debugfs. v2: Remove the sampler powergate status since it is cleared quickly anyway. v3: Use vcs mask (Rodrigo) and fix the version check for media v4: Remove extra spaces v5: Media samplers are independent of vcs mask, use Media version 1255 (Matt Roper) Fixes: 38e8c4184ea0 ("drm/xe: Enable Coarse Power Gating") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20251010011047.2047584-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 4cbc08649a54c3d533df9832342d52d409dfbbf0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13drm/xe: Handle mixed mappings and existing VRAM on atomic faultsMatthew Brost1-1/+12
Moving to VRAM will fail if mixed mappings are present or if the page is already located in VRAM. Atomic faults that require a move to VRAM currently retry without attempting to evict mixed mappings or locate existing VRAM mappings. This patch fixes the issue by attempting to evict mixed mappings or find existing VRAM pages when a move to VRAM fails during atomic fault handling. Fixes: a9ac0fa455b0 ("drm/xe: Strict migration policy for atomic SVM faults") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251009130629.3531962-1-matthew.brost@intel.com (cherry picked from commit 75188605c56d10c1bd3b1cd94f4872f349c3a9c8) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13drm/xe/migrate: Fix an error pathThomas Hellström1-1/+1
The exhaustive eviction accidently changed an error path goto to a return. Fix this. Fixes: 59eabff2a352 ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://lore.kernel.org/r/20250910160939.103473-1-thomas.hellstrom@linux.intel.com (cherry picked from commit 381f1ed15159c4b3f00dd37cc70924dedebeb111) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13drm/xe: Move rebar to be done earlierLucas De Marchi3-8/+29
There may be cases in which the BAR0 also needs to move to accommodate the bigger BAR2. However if it's not released, the BAR2 resize fails. During the vram probe it can't be released as it's already in use by xe_mmio for early register access. Add a new function in xe_vram and let xe_pci call it directly before even early device probe. This allows the BAR2 to resize in cases BAR0 also needs to move, assuming there aren't other reasons to hold that move: [] xe 0000:03:00.0: vgaarb: deactivate vga console [] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned [] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned [] pcieport 0000:00:01.0: PCI bridge to [bus 01-04] [] pcieport 0000:00:01.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:00:01.0: bridge window [mem 0x4000000000-0x44007fffff 64bit pref] [] pcieport 0000:01:00.0: PCI bridge to [bus 02-04] [] pcieport 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff] [] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] pcieport 0000:02:01.0: PCI bridge to [bus 03] [] pcieport 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff] [] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref] [] xe 0000:03:00.0: [drm] BAR2 resized to 16384M [] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ... For BMG there are additional fix needed in the PCI side, but this helps getting it to a working resize. All the rebar logic is more pci-specific than xe-specific and can be done very early in the probe sequence. In future it would be good to move it out of xe_vram.c, but this refactor is left for later. Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: stable@vger.kernel.org # 6.12+ Link: https://lore.kernel.org/intel-xe/fafda2a3-fc63-ce97-d22b-803f771a4d19@linux.intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250918-xe-pci-rebar-2-v1-2-6c094702a074@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 45e33f220fd625492c11e15733d8e9b4f9db82a4) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13drm/xe: Don't allow evicting of BOs in same VM in array of VM bindsMatthew Brost2-9/+24
An array of VM binds can potentially evict other buffer objects (BOs) within the same VM under certain conditions, which may lead to NULL pointer dereferences later in the bind pipeline. To prevent this, clear the allow_res_evict flag in the xe_bo_validate call. v2: - Invert polarity of no_res_evict (Thomas) - Add comment in code explaining issue (Thomas) Cc: stable@vger.kernel.org Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6268 Fixes: 774b5fa509a9 ("drm/xe: Avoid evicting object of the same vm in none fault mode") Fixes: 77f2ef3f16f5 ("drm/xe: Lock all gpuva ops during VM bind IOCTL") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251009110618.3481870-1-matthew.brost@intel.com (cherry picked from commit 8b9ba8d6d95fe75fed6b0480bb03da4b321bea08) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13drm/xe: Increase global invalidation timeout to 1000usKenneth Graunke1-1/+1
The previous timeout of 500us seems to be too small; panning the map in the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered timeouts within a few seconds of usage, causing the monitor to freeze and the following to be printed to dmesg: [Jul30 13:44] xe 0000:03:00.0: [drm] *ERROR* GT0: Global invalidation timeout [Jul30 13:48] xe 0000:03:00.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out I haven't hit a single timeout since increasing it to 1000us even after several multi-hour testing sessions. Fixes: 0dd2dd0182bc ("drm/xe: Move DSB l2 flush to a more sensible place") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <dev@lankhorst.se> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 146046907b56578263434107f5a7d5051847c459) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13drm/amdkfd: fix suspend/resume all calls in mes based eviction pathJonathan Kim1-52/+21
Suspend/resume all gangs should be done with the device lock is held. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: enable suspend/resume all for gfx 12Jonathan Kim1-7/+4
Suspend/resume all gangs has been available for GFX12 for a while now so enable it. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: fix hung reset queue array memory allocationJonathan Kim5-10/+20
By design the MES will return an array result that is twice the number of hung doorbells it can report. i.e. if up k reported doorbells are supported, then the second half of the array, also of length k, holds the HQD information (type/queue/pipe) where queue 1 corresponds to index 0 and k, queue 2 corresponds to index 1 and k + 1 etc ... The driver will use the HDQ info to target queue/pipe reset for hardware scheduled user compute queues. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: fix initialization of doorbell array for detect and hangJonathan Kim1-1/+1
Initialized doorbells should be set to invalid rather than 0 to prevent driver from over counting hung doorbells since it checks against the invalid value to begin with. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: fix gfx12 mes packet status return checkJonathan Kim1-1/+6
GFX12 MES uses low 32 bits of status return for success (1 or 0) and high bits for debug information if low bits are 0. GFX11 MES doesn't do this so checking full 64-bit status return for 1 or 0 is still valid. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-10-13drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devicesJesse.Zhang3-6/+7
Previously, APU platforms (and other scenarios with uninitialized VRAM managers) triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root cause is not that the `struct ttm_resource_manager *man` pointer itself is NULL, but that `man->bdev` (the backing device pointer within the manager) remains uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to a kernel OOPS. 1. **amdgpu_cs.c**: Extend the existing bandwidth control check in `amdgpu_cs_get_threshold_for_moves()` to include a check for `ttm_resource_manager_used()`. If the manager is not used (uninitialized `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific logic that would trigger the NULL dereference. 2. **amdgpu_kms.c**: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info reporting to use a conditional: if the manager is used, return the real VRAM usage; otherwise, return 0. This avoids accessing `man->bdev` when it is NULL. 3. **amdgpu_virt.c**: Modify the vf2pf (virtual function to physical function) data write path. Use `ttm_resource_manager_used()` to check validity: if the manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set `fb_usage` to 0 (APUs have no discrete framebuffer to report). This approach is more robust than APU-specific checks because it: - Works for all scenarios where the VRAM manager is uninitialized (not just APUs), - Aligns with TTM's design by using its native helper function, - Preserves correct behavior for discrete GPUs (which have fully initialized `man->bdev` and pass the `ttm_resource_manager_used()` check). v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: hide VRAM sysfs attributes on GPUs without VRAMChristian König1-0/+3
Otherwise accessing them can cause a crash. Signed-off-by: Christian König <christian.koenig@amd.com> Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: fix bit shift logicSathishkumar S1-1/+1
BIT_ULL(n) sets nth bit, remove explicit shift and set the position Fixes: a7a411e24626 ("drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set") Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amd/powerplay: Fix CIK shutdown temperatureTimur Kristóf1-2/+1
Remove extra multiplication. CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case the shutdown temperature is hardcoded in smu7_init_dpm_defaults and is already multiplied by 1000. The value was mistakenly multiplied another time by smu7_get_thermal_temperature_range. Fixes: 4ba082572a42 ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: use atomic functions with memory barriers for vm fault infoGui-Dong Han3-11/+8
The atomic variable vm_fault_info_updated is used to synchronize access to adev->gmc.vm_fault_info between the interrupt handler and get_vm_fault_info(). The default atomic functions like atomic_set() and atomic_read() do not provide memory barriers. This allows for CPU instruction reordering, meaning the memory accesses to vm_fault_info and the vm_fault_info_updated flag are not guaranteed to occur in the intended order. This creates a race condition that can lead to inconsistent or stale data being used. The previous implementation, which used an explicit mb(), was incomplete and inefficient. It failed to account for all potential CPU reorderings, such as the access of vm_fault_info being reordered before the atomic_read of the flag. This approach is also more verbose and less performant than using the proper atomic functions with acquire/release semantics. Fix this by switching to atomic_set_release() and atomic_read_acquire(). These functions provide the necessary acquire and release semantics, which act as memory barriers to ensure the correct order of operations. It is also more efficient and idiomatic than using explicit full memory barriers. Fixes: b97dfa27ef3a ("drm/amdgpu: save vm fault information for amdkfd") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: set an error on all fences from a bad contextAlex Deucher3-6/+37
When we backup ring contents to reemit after a queue reset, we don't backup ring contents from the bad context. When we signal the fences, we should set an error on those fences as well. v2: misc cleanups v3: add locking for fence error, fix comment (Christian) v4: fix wrap around, locking (Christian) Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: handle wrap around in reemit handlingAlex Deucher1-5/+10
Compare the sequence numbers directly. Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: fix handling of harvesting for ip_discovery firmwareAlex Deucher1-1/+17
Chips which use the IP discovery firmware loaded by the driver reported incorrect harvesting information in the ip discovery table in sysfs because the driver only uses the ip discovery firmware for populating sysfs and not for direct parsing for the driver itself as such, the fields that are used to print the harvesting info in sysfs report incorrect data for some IPs. Populate the relevant fields for this case as well. Fixes: 514678da56da ("drm/amdgpu/discovery: fix fw based ip discovery") Acked-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: block CE CS if not explicitely allowed by module optionChristian König3-1/+14
The Constant Engine found on gfx6-gfx10 HW has been a notorious source of problems. RADV never used it in the first place, radeonsi only used it for a few releases around 2017 for gfx6-gfx9 before dropping support for it as well. While investigating another problem I just recently found that submitting to the CE seems to be completely broken on gfx9 for quite a while. Since nobody complained about that problem it most likely means that nobody is using any of the affected radeonsi versions on current Linux kernels any more. So to potentially phase out the support for the CE and eliminate another source of problems block submitting CE IBs unless it is enabled again using a debug flag. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: remove two invalid BUG_ON()sChristian König2-4/+0
Those can be triggered trivially by userspace. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amd: Disable ASPM on SITimur Kristóf1-0/+7
Enabling ASPM causes randoms hangs on Tahiti and Oland on Zen4. It's unclear if this is a platform-specific or GPU-specific issue. Disable ASPM on SI for the time being. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amd/pm: Disable MCLK switching on SI at high pixel clocksTimur Kristóf1-0/+5
On various SI GPUs, a flickering can be observed near the bottom edge of the screen when using a single 4K 60Hz monitor over DP. Disabling MCLK switching works around this problem. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13Revert "drm/amd/display: Only restore backlight after amdgpu_dm_init or ↵Matthew Schwartz2-15/+4
dm_resume" This fix regressed the original issue that commit 7875afafba84 ("drm/amd/display: Fix brightness level not retained over reboot") solved, so revert it until a different approach to solve the regression that it caused with AMD_PRIVATE_COLOR is found. Fixes: a490c8d77d50 ("drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4620 Cc: stable@vger.kernel.org Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/i915/fb: Fix the set_tiling vs. addfb race, againVille Syrjälä1-18/+20
intel_frontbuffer_get() is what locks out subsequent set_tiling changes to the bo. Thus the fence vs. modifier check must be done after intel_frontbuffer_get(), or else a concurrent set_tiling ioctl might sneak in and change the fence after the check has been done. Close the race again. See commit dd689287b977 ("drm/i915: Prevent concurrent tiling/framebuffer modifications") for the previous instance. v2: Reorder intel_user_framebuffer_destroy() to match the unwind (Jani) Cc: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Fixes: 10690b8a49bc ("drm/i915/display: Add intel_fb_bo_framebuffer_fini") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20251003145734.7634-3-ville.syrjala@linux.intel.com (cherry picked from commit 1d1e4ded216017f8febd91332ee337f0e0e79285) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-10-13drm/i915/frontbuffer: Move bo refcounting intel_frontbuffer_{get,release}()Ville Syrjälä2-3/+9
Currently xe's intel_frontbuffer implementation forgets to hold a reference on the bo. This makes the entire thing extremely fragile as the cleanup order now depends on bo references held by other things (namely intel_fb_bo_framebuffer_fini()). Move the bo refcounting to intel_frontbuffer_{get,release}() so that both i915 and xe do this the same way. I first tried to fix this by having xe do the refcounting from its intel_bo_set_frontbuffer() implementation (which is what i915 does currently), but turns out xe's drm_gem_object_free() can sleep and thus drm_gem_object_put() isn't safe to call while we hold fb_tracking.lock. Fixes: 10690b8a49bc ("drm/i915/display: Add intel_fb_bo_framebuffer_fini") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20251003145734.7634-2-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit eb4d490729a5fd8dc5a76d334f8d01fec7c14bbe) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-10-13drm/i915/guc: Skip communication warning on reset in progressZhanjun Dong1-1/+8
GuC IRQ and tasklet handler receive just single G2H message, and let other messages to be received from next tasklet. During this chained tasklet process, if reset process started, communication will be disabled. Skip warning for this condition. Fixes: 65dd4ed0f4e1 ("drm/i915/guc: Don't receive all G2H messages in irq handler") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15018 Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250929152904.269776-1-zhanjun.dong@intel.com (cherry picked from commit 604b5ee4a653a70979ce689dbd6a5d942eb016bf) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-10-11drm/bridge: lt9211: Drop check for last nibble of version registerMarek Vasut1-2/+1
There is now a new LT9211 rev. U5, which reports chip ID 0x18 0x01 0xe4 . The previous LT9211 reported chip ID 0x18 0x01 0xe3 , which is what the driver checks for right now. Since there is a possibility there will be yet another revision of the LT9211 in the future, drop the last version nibble check to allow all future revisions of the chip to work with this driver. This fix makes LT9211 rev. U5 work with this driver. Fixes: 8ce4129e3de4 ("drm/bridge: lt9211: Add Lontium LT9211 bridge driver") Signed-off-by: Marek Vasut <marek.vasut@mailbox.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251011110017.12521-1-marek.vasut@mailbox.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-10-10Merge tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds43-248/+385
Pull more drm fixes from Dave Airlie: "Just the follow up fixes for rc1 from the next branch, amdgpu and xe mostly with a single v3d fix in there. amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL xe: - Fix build with clang 16 - Fix handling of invalid configfs syntax usage and spell out the expected syntax in the documentation - Do not try late bind firmware when running as VF since it shouldn't handle firmware loading - Fix idle assertion for local BOs - Fix uninitialized variable for late binding - Do not require perfmon_capable to expose free memory at page granularity. Handle it like other drm drivers do - Fix lock handling on suspend error path - Fix I2C controller resume after S3 v3d: - fix fence locking" * tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits) drm/amd/display: Incorrect Mirror Cositing drm/amd/display: Enable Dynamic DTBCLK Switch drm/amdgpu: Report individual reset error drm/amdgpu: partially revert "revert to old status lock handling v3" drm/amd/display: Fix unsafe uses of kernel mode FPU drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine drm/amdgpu: Check swus/ds for switch state save drm/amdkfd: Fix two comments in kfd_ioctl.h drm/amd/pm: Avoid interface mismatch messaging drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init drm/amd/amdgpu: Fix the mes version that support inv_tlbs drm/amd: Check whether secure display TA loaded successfully drm/amdkfd: Fix mmap write lock not release drm/amdkfd: Fix kfd process ref leaking when userptr unmapping drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O. drm/amd/display: Disable scaling on DCE6 for now drm/amd/display: Properly disable scaling on DCE6 drm/amd/display: Properly clear SCL_*_FILTER_CONTROL on DCE6 drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs ...
2025-10-10Merge tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds3-8/+17
Pull drm fixes from Dave Airlie: "Some fixes leftover from our fixes branch, just nouveau and vmwgfx: nouveau: - Return errno code from TTM move helper vmwgfx: - Fix null-ptr access in cursor code - Fix UAF in validation - Use correct iterator in validation" * tag 'drm-fixes-2025-10-11' of https://gitlab.freedesktop.org/drm/kernel: drm/nouveau: fix bad ret code in nouveau_bo_move_prep drm/vmwgfx: Fix copy-paste typo in validation drm/vmwgfx: Fix Use-after-free in validation drm/vmwgfx: Fix a null-ptr access in the cursor snooper
2025-10-11Merge tag 'drm-misc-fixes-2025-10-09' of ↵Dave Airlie3-8/+17
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: nouveau: - Return errno code from TTM move helper vmwgfx: - Fix null-ptr access in cursor code - Fix UAF in validation - Use correct iterator in validation Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20251009120004.GA17570@linux.fritz.box
2025-10-10Merge tag 'amd-drm-next-6.18-2025-10-09' of ↵Dave Airlie27-172/+274
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.18-2025-10-09: amdgpu: - DC DCE6 fixes - GPU reset fixes - Secure diplay messaging cleanup - MES fix - GPUVM locking fixes - PMFW messaging cleanup - PCI US/DS switch handling fix - VCN queue reset fix - DC FPU handling fix - DCN 3.5 fix - DC mirroring fix amdkfd: - Fix kfd process ref leak - mmap write lock handling fix - Fix comments in IOCTL Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20251009162915.981503-1-alexander.deucher@amd.com
2025-10-09drm/panthor: Ensure MCU is disabled on suspendKetil Johnsen1-0/+1
Currently the Panthor driver needs the GPU to be powered down between suspend and resume. If this is not done, then the MCU_CONTROL register will be preserved as AUTO, which again will cause a premature FW boot on resume. The FW will go directly into fatal state in this case. This case needs to be handled as there is no guarantee that the GPU will be powered down after the suspend callback on all platforms. The fix is to call panthor_fw_stop() in "pre-reset" path to ensure the MCU_CONTROL register is cleared (set DISABLE). This matches well with the already existing call to panthor_fw_start() from the "post-reset" path. Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block") Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20251008105112.4077015-1-ketil.johnsen@arm.com
2025-10-07drm/nouveau: fix bad ret code in nouveau_bo_move_prepShuhao Fu1-1/+1
In `nouveau_bo_move_prep`, if `nouveau_mem_map` fails, an error code should be returned. Currently, it returns zero even if vmm addr is not correctly mapped. Cc: stable@vger.kernel.org Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Fixes: 9ce523cc3bf2 ("drm/nouveau: separate buffer object backing memory from nvkm structures") Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-10-07drm/amd/display: Incorrect Mirror CositingJesse Agate1-5/+5
[WHY] hinit/vinit are incorrect in the case of mirroring. [HOW] Cositing sign must be flipped when image is mirrored in the vertical or horizontal direction. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Samson Tam <samson.tam@amd.com> Signed-off-by: Jesse Agate <jesse.agate@amd.com> Signed-off-by: Brendan Leder <breleder@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-07drm/amd/display: Enable Dynamic DTBCLK SwitchFangzhi Zuo1-0/+4
[WHAT] Since dcn35, DTBCLK can be disabled when no DP2 sink connected for power saving purpose. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-07drm/amdgpu: Report individual reset errorLijo Lazar1-10/+15
If reinitialization of one of the GPUs fails after reset, it logs failure on all subsequent GPUs eventhough they have resumed successfully. A sample log where only device at 0000:95:00.0 had a failure - amdgpu 0000:15:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:65:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:75:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:85:00.0: amdgpu: GPU reset(19) succeeded! amdgpu 0000:95:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:e5:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:f5:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:05:00.0: amdgpu: GPU reset(19) failed amdgpu 0000:15:00.0: amdgpu: GPU reset end with ret = -5 To avoid confusion, report the error for each device separately and return the first error as the overall result. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-07drm/amdgpu: partially revert "revert to old status lock handling v3"Christian König4-68/+105
The CI systems are pointing out list corruptions, so we still need to fix something here. Keep the asserts, but revert the lock changes for now. Fixes: 59e4405e9ee2 ("drm/amdgpu: revert to old status lock handling v3") Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>