| Age | Commit message (Collapse) | Author | Files | Lines |
|
Prevent application hangs caused by out-of-order fence signaling when
user fences are attached. Use drm_syncobj (via dma-fence-chain) to
guarantee that each user fence signals in order, regardless of the
signaling order of the attached fences. Ensure user fence writebacks to
user space occur in the correct sequence.
v7:
- Skip drm_syncbj create of error (CI)
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251031234050.3043507-2-matthew.brost@intel.com
(cherry picked from commit adda4e855ab6409a3edaa585293f1f2069ab7299)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Currently Xe driver is triggering flr without any clean-up on
shutdown. This is causing random warnings from pending related works as the
underlying hardware is reset in the middle of their execution.
Fix this by performing clean shutdown also when using flr.
Fixes: 501d799a47e2 ("drm/xe: Wire up device shutdown handler")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://patch.msgid.link/20251031122312.1836534-1-jouni.hogander@intel.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
(cherry picked from commit a4ff26b7c8ef38e4dd34f77cbcd73576fdde6dd4)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
The xe_device_shutdown() function was needing a few declarations
that were only required under a specific condition. This change
moves those declarations to be within that conditional branch
to avoid unnecessary declarations.
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20251007100208.1407021-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit 15b3036045188f4da4ca62b2ed01b0f160252e9b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Cancel and wait for any Dead CT worker to complete before continuing
with device unbinding. Else the worker will end up using resources freed
by the undind operation.
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Fixes: d2c5a5a926f4 ("drm/xe/guc: Dead CT helper")
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251103123144.3231829-6-balasubramani.vivekanandan@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 492671339114e376aaa38626d637a2751cdef263)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Waking the device during a GT reset can lead to unintended memory
allocation, which is not allowed since GT resets occur in the reclaim
path. Prevent this by holding a PM reference while a reset is in flight.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20251022005538.828980-3-matthew.brost@intel.com
(cherry picked from commit 480b358e7d8ef69fd8f1b0cad6e07c7d70a36ee4)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
the DEFINE_CLASS() macro creates an inline function and
the init args are passed down to it; since _ret is passed as an int,
whatever value is set inside the function is not visible to the caller.
Pass _ret as a pointer so its value propagates to the caller.
Fixes: c460bc2311df ("drm/xe: Introduce an xe_validation wrapper around drm_exec")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6220
Cc: Maarten Lankhorst <maarten.lankhorst@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251027131228.12098-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit fcb8c304f4673747d535c74b340b5b8a4823727b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Workqueue allocation can fail, so check the return value of the GGTT
workqueue allocation and fail driver initialization if the allocation
fails.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20251022005538.828980-2-matthew.brost@intel.com
(cherry picked from commit 1f1314e8e71385bae319e43082b798c11f6648bc)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
The madvise implementation currently resets the SVM madvise if the
underlying CPU map is unmapped. This is in an attempt to mimic the
CPU madvise behaviour. However, it's not clear that this is a desired
behaviour since if the end app user relies on it for malloc()ed
objects or stack objects, it may not work as intended.
Instead of having the autoreset functionality being a direct
application-facing implicit UAPI, make the UMD explicitly choose
this behaviour if it wants to expose it by introducing
DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET, and add a semantics
description.
v2:
- Kerneldoc fixes. Fix a commit log message.
Fixes: a2eb8aec3ebe ("drm/xe: Reset VMA attributes to default in SVM garbage collector")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: "Falkowski, John" <john.falkowski@intel.com>
Cc: "Mrozek, Michal" <michal.mrozek@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20251015170726.178685-2-thomas.hellstrom@linux.intel.com
(cherry picked from commit 59a2d3f38ab23cce4cd9f0c4a5e08fdfe9e67ae7)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
When splitting and restoring vmas for madvise, we only copied the
XE_VMA_SYSTEM_ALLOCATOR flag. That meant we lost flags for read_only,
dumpable and sparse (in case anyone would call madvise for the latter).
Instead, define a mask of relevant flags and ensure all are replicated,
To simplify this and make the code a bit less fragile, remove the
conversion to VMA_CREATE flags and instead just pass around the
gpuva flags after initial conversion from user-space.
Fixes: a2eb8aec3ebe ("drm/xe: Reset VMA attributes to default in SVM garbage collector")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251015170726.178685-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit b3af8658ec70f2196190c66103478352286aba3b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
This assert can trigger here with non pin_map users that select
LATE_RESTORE, since the vmap is allowed to be NULL given that
save/restore can now use the blitter instead. The check here doesn't
seem to have much value anymore given that we no longer move pinned
memory, so any existing vmap is left well alone, and doesn't need to be
recreated upon restore, so just drop the assert here.
Fixes: 86f69c26113c ("drm/xe: use backup object for pinned save/restore")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6213
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20251010152457.177884-2-matthew.auld@intel.com
(cherry picked from commit a10b4a69c7f8f596d2c5218fbe84430734fab3b2)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
If current bytes exceeds the max copy size, ensure the clamped size
still accounts for the XE_CACHELINE_BYTES alignment, otherwise we
trigger the assert in xe_migrate_vram with the size now being out of
alignment.
Fixes: 8c2d61e0e916 ("drm/xe/migrate: don't overflow max copy size")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6212
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251010162020.190962-2-matthew.auld@intel.com
(cherry picked from commit 641bcf8731d21b56760e3646a39a65f471e9efd1)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Kunit's generate_params() was recently updated to take an additional
test context parameter. Xe's IP and platform parameter generators were
updated accordingly at the same time, but the new parameter was not
added to the functions' kerneldoc, resulting in the following warnings:
Warning: drivers/gpu/drm/xe/tests/xe_pci.c:78 function parameter 'test' not described in 'xe_pci_fake_data_gen_params'
Warning: drivers/gpu/drm/xe/tests/xe_pci.c:254 function parameter 'test' not described in 'xe_pci_graphics_ip_gen_param'
Warning: drivers/gpu/drm/xe/tests/xe_pci.c:278 function parameter 'test' not described in 'xe_pci_media_ip_gen_param'
Warning: drivers/gpu/drm/xe/tests/xe_pci.c:302 function parameter 'test' not described in 'xe_pci_id_gen_param'
Warning: drivers/gpu/drm/xe/tests/xe_pci.c:390 function parameter 'test' not described in 'xe_pci_live_device_gen_param'
5 warnings as errors
Document the new parameter to eliminate the warnings and make CI happy.
Fixes: b9a214b5f6aa ("kunit: Pass parameterized test context to generate_params()")
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20251013153014.2362879-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 89e347f8a70165d1e8d88a93d875da7742c902ce)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
If the location madvise() is set to
DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM, the drm_pagemap in the
SVM gpu fault handler will be set to NULL. However there is nothing
that explicitly migrates the data to system if it is already present
in device memory.
In that case, set the device memory owner to NULL to ensure
data gets properly migrated to system on page-fault.
v2:
- Remove redundant dpagemap assignment (Himal Prasad Ghimiray)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20251010104149.72783-2-thomas.hellstrom@linux.intel.com
Fixes: 10aa5c806030 ("drm/gpusvm, drm/xe: Fix userptr to not allow device private pages")
(cherry picked from commit 2cfcea7a745794f9b8e265a309717ca6ba335fc4)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
In normal operation, a registered exec queue is disabled and
deregistered through the GuC, and freed only after the GuC confirms
completion. However, if the driver is forced to unbind while the exec
queue is still running, the user may call exec_destroy() after the GuC
has already been stopped and CT communication disabled.
In this case, the driver cannot receive a response from the GuC,
preventing proper cleanup of exec queue resources. Fix this by directly
releasing the resources when GuC is not running.
Here is the failure dmesg log:
"
[ 468.089581] ---[ end trace 0000000000000000 ]---
[ 468.089608] pci 0000:03:00.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535)
[ 468.090558] pci 0000:03:00.0: [drm] GT0: total 65535
[ 468.090562] pci 0000:03:00.0: [drm] GT0: used 1
[ 468.090564] pci 0000:03:00.0: [drm] GT0: range 1..1 (1)
[ 468.092716] ------------[ cut here ]------------
[ 468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe]
"
v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled().
As CT may go down and come back during VF migration.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251010172529.2967639-2-shuicheng.lin@intel.com
(cherry picked from commit 9b42321a02c50a12b2beb6ae9469606257fbecea)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Where applicable, enable media sampler power gating. Also, add
it to the powergate_info debugfs.
v2: Remove the sampler powergate status since it is cleared quickly anyway.
v3: Use vcs mask (Rodrigo) and fix the version check for media
v4: Remove extra spaces
v5: Media samplers are independent of vcs mask,
use Media version 1255 (Matt Roper)
Fixes: 38e8c4184ea0 ("drm/xe: Enable Coarse Power Gating")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20251010011047.2047584-1-vinay.belgaumkar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 4cbc08649a54c3d533df9832342d52d409dfbbf0)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Moving to VRAM will fail if mixed mappings are present or if the page is
already located in VRAM. Atomic faults that require a move to VRAM
currently retry without attempting to evict mixed mappings or locate
existing VRAM mappings.
This patch fixes the issue by attempting to evict mixed mappings or find
existing VRAM pages when a move to VRAM fails during atomic fault
handling.
Fixes: a9ac0fa455b0 ("drm/xe: Strict migration policy for atomic SVM faults")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20251009130629.3531962-1-matthew.brost@intel.com
(cherry picked from commit 75188605c56d10c1bd3b1cd94f4872f349c3a9c8)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
The exhaustive eviction accidently changed an error path goto to
a return. Fix this.
Fixes: 59eabff2a352 ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250910160939.103473-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 381f1ed15159c4b3f00dd37cc70924dedebeb111)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
There may be cases in which the BAR0 also needs to move to accommodate
the bigger BAR2. However if it's not released, the BAR2 resize fails.
During the vram probe it can't be released as it's already in use by
xe_mmio for early register access.
Add a new function in xe_vram and let xe_pci call it directly before
even early device probe. This allows the BAR2 to resize in cases BAR0
also needs to move, assuming there aren't other reasons to hold that
move:
[] xe 0000:03:00.0: vgaarb: deactivate vga console
[] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB
[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing
[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned
[] pcieport 0000:00:01.0: PCI bridge to [bus 01-04]
[] pcieport 0000:00:01.0: bridge window [mem 0x83000000-0x840fffff]
[] pcieport 0000:00:01.0: bridge window [mem 0x4000000000-0x44007fffff 64bit pref]
[] pcieport 0000:01:00.0: PCI bridge to [bus 02-04]
[] pcieport 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff]
[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
[] pcieport 0000:02:01.0: PCI bridge to [bus 03]
[] pcieport 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff]
[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
[] xe 0000:03:00.0: [drm] BAR2 resized to 16384M
[] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ...
For BMG there are additional fix needed in the PCI side, but this
helps getting it to a working resize.
All the rebar logic is more pci-specific than xe-specific and can be
done very early in the probe sequence. In future it would be good to
move it out of xe_vram.c, but this refactor is left for later.
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org # 6.12+
Link: https://lore.kernel.org/intel-xe/fafda2a3-fc63-ce97-d22b-803f771a4d19@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20250918-xe-pci-rebar-2-v1-2-6c094702a074@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 45e33f220fd625492c11e15733d8e9b4f9db82a4)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
An array of VM binds can potentially evict other buffer objects (BOs)
within the same VM under certain conditions, which may lead to NULL
pointer dereferences later in the bind pipeline. To prevent this, clear
the allow_res_evict flag in the xe_bo_validate call.
v2:
- Invert polarity of no_res_evict (Thomas)
- Add comment in code explaining issue (Thomas)
Cc: stable@vger.kernel.org
Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6268
Fixes: 774b5fa509a9 ("drm/xe: Avoid evicting object of the same vm in none fault mode")
Fixes: 77f2ef3f16f5 ("drm/xe: Lock all gpuva ops during VM bind IOCTL")
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20251009110618.3481870-1-matthew.brost@intel.com
(cherry picked from commit 8b9ba8d6d95fe75fed6b0480bb03da4b321bea08)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
The previous timeout of 500us seems to be too small; panning the map in
the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered
timeouts within a few seconds of usage, causing the monitor to freeze
and the following to be printed to dmesg:
[Jul30 13:44] xe 0000:03:00.0: [drm] *ERROR* GT0: Global invalidation timeout
[Jul30 13:48] xe 0000:03:00.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out
I haven't hit a single timeout since increasing it to 1000us even after
several multi-hour testing sessions.
Fixes: 0dd2dd0182bc ("drm/xe: Move DSB l2 flush to a more sensible place")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 146046907b56578263434107f5a7d5051847c459)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Pull more drm fixes from Dave Airlie:
"Just the follow up fixes for rc1 from the next branch, amdgpu and xe
mostly with a single v3d fix in there.
amdgpu:
- DC DCE6 fixes
- GPU reset fixes
- Secure diplay messaging cleanup
- MES fix
- GPUVM locking fixes
- PMFW messaging cleanup
- PCI US/DS switch handling fix
- VCN queue reset fix
- DC FPU handling fix
- DCN 3.5 fix
- DC mirroring fix
amdkfd:
- Fix kfd process ref leak
- mmap write lock handling fix
- Fix comments in IOCTL
xe:
- Fix build with clang 16
- Fix handling of invalid configfs syntax usage and spell out the
expected syntax in the documentation
- Do not try late bind firmware when running as VF since it shouldn't
handle firmware loading
- Fix idle assertion for local BOs
- Fix uninitialized variable for late binding
- Do not require perfmon_capable to expose free memory at page
granularity. Handle it like other drm drivers do
- Fix lock handling on suspend error path
- Fix I2C controller resume after S3
v3d:
- fix fence locking"
* tag 'drm-next-2025-10-11-1' of https://gitlab.freedesktop.org/drm/kernel: (34 commits)
drm/amd/display: Incorrect Mirror Cositing
drm/amd/display: Enable Dynamic DTBCLK Switch
drm/amdgpu: Report individual reset error
drm/amdgpu: partially revert "revert to old status lock handling v3"
drm/amd/display: Fix unsafe uses of kernel mode FPU
drm/amd/pm: Disable VCN queue reset on SMU v13.0.6 due to regression
drm/amdgpu: Fix general protection fault in amdgpu_vm_bo_reset_state_machine
drm/amdgpu: Check swus/ds for switch state save
drm/amdkfd: Fix two comments in kfd_ioctl.h
drm/amd/pm: Avoid interface mismatch messaging
drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init
drm/amd/amdgpu: Fix the mes version that support inv_tlbs
drm/amd: Check whether secure display TA loaded successfully
drm/amdkfd: Fix mmap write lock not release
drm/amdkfd: Fix kfd process ref leaking when userptr unmapping
drm/amdgpu: Fix for GPU reset being blocked by KIQ I/O.
drm/amd/display: Disable scaling on DCE6 for now
drm/amd/display: Properly disable scaling on DCE6
drm/amd/display: Properly clear SCL_*_FILTER_CONTROL on DCE6
drm/amd/display: Add missing DCE6 SCL_HORZ_FILTER_INIT* SRIs
...
|
|
In S3 and above sleep states, the device can loose power regardless of
d3cold.allowed flag. Bring up I2C controller explicitly in system PM
path to ensure its normal operation after losing power.
v2: Cover S3 and above states (Rodrigo)
Fixes: 0ea07b69517a ("drm/xe/pm: Wire up suspend/resume for I2C controller")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250918103200.2952576-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit e4863f1159befcd70df24fcb5458afaf2feab043)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
In xe_hw_engine_group_get_mode(), a write lock is acquired before
calling switch_mode(), which in turn invokes
xe_hw_engine_group_suspend_faulting_lr_jobs().
On failure inside xe_hw_engine_group_suspend_faulting_lr_jobs(),
the write lock is released there, and then again in
xe_hw_engine_group_get_mode(), leading to a double release.
Fix this by keeping both acquire and release operation in
xe_hw_engine_group_get_mode().
Fixes: 770bd1d34113 ("drm/xe/hw_engine_group: Ensure safe transition between execution modes")
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250925023145.1203004-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 662d98b8b373007fa1b08ba93fee11f6fd3e387c)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Currently this is hidden behind perfmon_capable() since this is
technically an info leak, given that this is a system wide metric.
However the granularity reported here is always PAGE_SIZE aligned, which
matches what the core kernel is already willing to expose to userspace
if querying how many free RAM pages there are on the system, and that
doesn't need any special privileges. In addition other drm drivers seem
happy to expose this.
The motivation here if with oneAPI where they want to use the system
wide 'used' reporting here, so not the per-client fdinfo stats. This has
also come up with some perf overlay applications wanting this
information.
Fixes: 1105ac15d2a1 ("drm/xe/uapi: restrict system wide accounting")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joshua Santosh <joshua.santosh.ranjan@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250919122052.420979-2-matthew.auld@intel.com
(cherry picked from commit 4d0b035fd6dae8ee48e9c928b10f14877e595356)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Initialize the uval variable to 0 in xe_late_bind_fw_num_fans() to fix
a potential use of uninitialized variable warning and ensure predictable
behavior.
The variable is passed by reference to xe_pcode_read() which should
populate it on success, but initializing it to 0 provides a safe
default value and follows kernel coding best practices.
v2:
- uval = 0 which serves as both a safe default and the fallback
value when the pcode read operation fails.
v3:
- Handle MMIO failure (Rodrigo)
- The function should probably return the error and make the uval as
pointer-argument, like the pcode_read.
- Change the caller of this function to propagate the error
upwards if mmio failed.
Fixes: 45832bf9c10f3 ("drm/xe/xe_late_bind_fw: Initialize late binding firmware")
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Link: https://lore.kernel.org/r/20251002005648.3185636-1-mallesh.koujalagi@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 07abc16c14693df703763c45e9fc0abfefc927d5)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
When userptr is used on SVM-enabled VMs, a non-NULL
hmm_range::dev_private_owner value might mean that
hmm_range_fault() attempts to return device private pages.
Either that will fail, or the userptr code will not know
how to handle those.
Use NULL for hmm_range::dev_private_owner to migrate
such pages to system. In order to do that, move the
struct drm_gpusvm::device_private_page_owner field to
struct drm_gpusvm_ctx::device_private_page_owner so that
it doesn't remain immutable over the drm_gpusvm lifetime.
v2:
- Don't conditionally compile xe_svm_devm_owner().
- Kerneldoc xe_svm_devm_owner().
Fixes: 9e9787414882 ("drm/xe/userptr: replace xe_hmm with gpusvm")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250930122752.96034-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit ad298d9ec957414dbf3d51f3c8bca4b6d2416c0c)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
The variable offset is not being initialized, and it is only set inside
a for-loop if entry->name is the same as manifest_entry. In the case
where it is not initialized a non-zero check on offset is potentialy checking
a bogus uninitalized value. Fix this by initializing offset to zero.
Fixes: efa29317a553 ("drm/xe/xe_late_bind_fw: Extract and print version info")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250924102208.9216-1-colin.i.king@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 20f3b28e2e07747fd27301f0f5deb3cb569ee15c)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Before calling ttm_bo_populate() in the CPU fault path of a bo,
we assert that the bo is not being migrated. However, for
local bos we share the reservation object with other local bos
that might be in the process of being migrated. Also some VM
operations may attach USAGE_KERNEL fences to the common
reservation object and trigger false positives from the assert.
So remove the assert and instead wait for bo idle. This may
unnecessarily wait for idle in some cases but since we're
doing this wait later in the fault path anyway we might as
well do it here as well.
This fixes warnings like:
Sep 25 14:56:23 desky kernel: ------------[ cut here ]------------
Sep 25 14:56:23 desky kernel: xe 0000:03:00.0: [drm] Assertion `dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL) || (tbo->ttm && ttm_tt_is_populated(tbo->ttm))` failed!
platform: BATTLEMAGE subplatform: 1
graphics: Xe2_HPG 20.01 step A0
media: Xe2_HPM 13.01 step A1
Sep 25 14:56:23 desky kernel: WARNING: CPU: 6 PID: 24767 at drivers/gpu/drm/xe/xe_bo.c:1748 xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Modules linked in: cpuid dm_crypt xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfr>
Sep 25 14:56:23 desky kernel: snd_soc_sdca snd_seq_midi prime_numbers coretemp snd_seq_midi_event drm_ttm_helper snd_hda_codec drm_buddy drm_exec snd_rawmidi snd_soc_core snd_hda_cor>
Sep 25 14:56:23 desky kernel: CPU: 6 UID: 1000 PID: 24767 Comm: steamwebhelper Tainted: G U W 6.17.0-rc7+ #32 PREEMPT(voluntary)
Sep 25 14:56:23 desky kernel: Tainted: [U]=USER, [W]=WARN
Sep 25 14:56:23 desky kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D36/PRO Z690-P DDR4 (MS-7D36), BIOS A.A1 10/18/2022
Sep 25 14:56:23 desky kernel: RIP: 0010:xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Code: fa 64 29 f9 48 c7 c7 40 e0 d3 c1 51 48 c7 c1 c0 e3 d3 c1 52 4c 8b 45 c0 41 50 44 8b 4d c8 4d 89 e0 48 8b 55 a8 e8 25 27 95 ef <0f> 0b 48 83 c4 40 4>
Sep 25 14:56:23 desky kernel: RSP: 0000:ffffae1ca88c7b10 EFLAGS: 00010286
Sep 25 14:56:23 desky kernel: RAX: 0000000000000000 RBX: ffff8d7cfd7e6800 RCX: 0000000000000027
Sep 25 14:56:23 desky kernel: RDX: ffff8d845019cec8 RSI: 0000000000000001 RDI: ffff8d845019cec0
Sep 25 14:56:23 desky kernel: RBP: ffffae1ca88c7bc8 R08: 0000000000000000 R09: 0000000000000000
Sep 25 14:56:23 desky kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffffc1db1faa
Sep 25 14:56:23 desky kernel: R13: ffffffffc1db2ab4 R14: 0000000000000001 R15: ffffae1ca88c7bd8
Sep 25 14:56:23 desky kernel: FS: 00007fb1baf31940(0000) GS:ffff8d849c870000(0000) knlGS:0000000000000000
Sep 25 14:56:23 desky kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 14:56:23 desky kernel: CR2: 00007fb1b2860020 CR3: 00000001705a9004 CR4: 0000000000772ef0
Sep 25 14:56:23 desky kernel: PKRU: 55555558
Sep 25 14:56:23 desky kernel: Call Trace:
Sep 25 14:56:23 desky kernel: <TASK>
Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault_fastpath+0x11e/0x220 [xe]
Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault+0x84/0x410 [xe]
Sep 25 14:56:23 desky kernel: ? __x64_sys_mmap+0x33/0x50
Sep 25 14:56:23 desky kernel: ? x64_sys_call+0x1b2e/0x20d0
Sep 25 14:56:23 desky kernel: ? do_syscall_64+0x9d/0x1f0
Sep 25 14:56:23 desky kernel: ? __check_object_size+0x4a/0x2e0
Sep 25 14:56:23 desky kernel: __do_fault+0x36/0x190
Sep 25 14:56:23 desky kernel: do_fault+0xcf/0x570
Sep 25 14:56:23 desky kernel: __handle_mm_fault+0x92b/0xfe0
Sep 25 14:56:23 desky kernel: ? ktime_get_mono_fast_ns+0x39/0xd0
Sep 25 14:56:23 desky kernel: handle_mm_fault+0x164/0x2c0
Sep 25 14:56:23 desky kernel: do_user_addr_fault+0x2cb/0x840
Sep 25 14:56:23 desky kernel: exc_page_fault+0x75/0x180
Sep 25 14:56:23 desky kernel: asm_exc_page_fault+0x27/0x30
Sep 25 14:56:23 desky kernel: RIP: 0033:0x7fb1bc388bb7
Sep 25 14:56:23 desky kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 <c5> fd 7f 0f c5 fd 7>
Sep 25 14:56:23 desky kernel: RSP: 002b:00007ffd7814fad8 EFLAGS: 00010207
Sep 25 14:56:23 desky kernel: RAX: 00007fb1b2860000 RBX: 0000000000000690 RCX: 00007fb1b2860000
Sep 25 14:56:23 desky kernel: RDX: 00007fb1b2860610 RSI: 0000556eda79f4c0 RDI: 00007fb1b2860020
Sep 25 14:56:23 desky kernel: RBP: 00007ffd7814fb60 R08: 0000000000000000 R09: 000000012be0e000
Sep 25 14:56:23 desky kernel: R10: 00007fb1b2860000 R11: 0000000000000246 R12: 0000556edd39a240
Sep 25 14:56:23 desky kernel: R13: 00007fb1b2dcb010 R14: 0000556eda79f420 R15: 0000000000000000
Sep 25 14:56:23 desky kernel: </TASK>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5250
Fixes: c2ae94cf8cd8 ("drm/xe: Convert the CPU fault handler for exhaustive eviction")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250929112649.6131-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 8f1756a7ea33b352a54e6f53d76c552b3a424187)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
In general, the VFs can't load firmwares so attempt to initialize
the firmware late-bind component leads to errors like:
[] xe 0000:03:00.1: [drm] *ERROR* Late bind component not bound
Fixes: 918bd789d62e ("drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6190
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250928174811.198933-3-michal.wajdeczko@intel.com
(cherry picked from commit e35e288090f362be88d77b60d9846cea15df173e)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
This is a VF only function and its name should reflect that to
avoid any confusion. Move the VF check to the caller side.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250928174811.198933-2-michal.wajdeczko@intel.com
(cherry picked from commit b88bb1eefa88f0cefc00fe5e78b1186cd8f9db78)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Spell out the syntax instead of only using examples. Particularly
important the <engine-class> part since that's different than
engines_allowed and may confuse users. The same batch buffer is used for
all engines of a certain class.
Cc: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Fixes: e2a9854d806e ("drm/xe/configfs: Allow to select by class only")
Link: https://lore.kernel.org/r/20250924152709.659483-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 47ca7acff4011fa322853a3612f464b959e88210)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
If mask is NULL, only the engine class should be accepted, so the
pattern string should be completely parsed. This should fix passing e.g.
rcs0 to ctx_restore_post_bb when it's only expecting the engine class.
Reported-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Closes: https://lore.kernel.org/r/20250922155544.67712-1-jonathan.cavitt@intel.com
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/aNJKnrCQmL9xS9Gv@stanley.mountain
Fixes: e2a9854d806e ("drm/xe/configfs: Allow to select by class only")
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250924152709.659483-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit dd797967160b79cc0ca2d2eb05fc55436b66dce0)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
The following error was reported when building with clang 16.0.6:
In file included from drivers/gpu/drm/xe/xe_pci.c:1104:
>> drivers/gpu/drm/xe/tests/xe_pci.c:214:2: error: initializer \
element is not a compile-time constant
graphics_ip_xelp,
^~~~~~~~~~~~~~~~
drivers/gpu/drm/xe/tests/xe_pci.c:221:2: error: initializer \
element is not a compile-time constant
media_ip_xem,
^~~~~~~~~~~~
2 errors generated.
Fix that by explicit re-definition of pre-GMDID IPs, as there are
not so many of them.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509192041.tQwdE4DS-lkp@intel.com/
Fixes: 5bb5258e357e ("drm/xe/tests: Add pre-GMDID IP descriptors to param generators")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250922101207.192028-1-michal.wajdeczko@intel.com
(cherry picked from commit 2de80e2da74b402a9d838b8e729cd01cf94cdcbc)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Pull drm updates from Dave Airlie:
"cross-subsystem:
- i2c-hid: Make elan touch controllers power on after panel is
enabled
- dt bindings for STM32MP25 SoC
- pci vgaarb: use screen_info helpers
- rust pin-init updates
- add MEI driver for late binding firmware update/load
uapi:
- add ioctl for reassigning GEM handles
- provide boot_display attribute on boot-up devices
core:
- document DRM_MODE_PAGE_FLIP_EVENT
- add vendor specific recovery method to drm device wedged uevent
gem:
- Simplify gpuvm locking
ttm:
- add interface to populate buffers
sched:
- Fix race condition in trace code
atomic:
- Reallow no-op async page flips
display:
- dp: Fix command length
video:
- Improve pixel-format handling for struct screen_info
rust:
- drop Opaque<> from ioctl args
- Alloc:
- BorrowedPage type and AsPageIter traits
- Implement Vmalloc::to_page() and VmallocPageIter
- DMA/Scatterlist:
- Add dma::DataDirection and type alias for dma_addr_t
- Abstraction for struct scatterlist and sg_table
- DRM:
- simplify use of generics
- add DriverFile type alias
- drop Object::SIZE
- Rust:
- pin-init tree merge
- Various methods for AsBytes and FromBytes traits
gpuvm:
- Support madvice in Xe driver
gpusvm:
- fix hmm_pfn_to_map_order usage in gpusvm
bridge:
- Improve and fix ref counting on bridge management
- cdns-dsi: Various improvements to mode setting
- Support Solomon SSD2825 plus DT bindings
- Support Waveshare DSI2DPI plus DT bindings
- Support Content Protection property
- display-connector: Improve DP display detection
- Add support for Radxa Ra620 plus DT bindings
- adv7511: Provide SPD and HDMI infoframes
- it6505: Replace crypto_shash with sha()
- synopsys: Add support for DW DPTX Controller plus DT bindings
- adv7511: Write full Audio infoframe
- ite6263: Support vendor-specific infoframes
- simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings
panel:
- panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64;
Support SHP LQ134Z1; Fixes
- panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings
- Support Samsung AMS561RA01
- Support Hydis HV101HD1 plus DT bindings
- ilitek-ili9881c: Refactor mode setting; Add support for Bestar
BSD1218-A101KL68 LCD plus DT bindings
- lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings
- edp: Add support for additonal mt8189 Chromebook panels
- lvds: Add DT bindings for EDT ETML0700Z8DHA
amdgpu:
- add CRIU support for gem objects
- RAS updates
- VCN SRAM load fixes
- EDID read fixes
- eDP ALPM support
- Documentation updates
- Rework PTE flag generation
- DCE6 fixes
- VCN devcoredump cleanup
- MMHUB client id fixes
- VCN 5.0.1 RAS support
- SMU 13.0.x updates
- Expanded PCIe DPC support
- Expanded VCN reset support
- VPE per queue reset support
- give kernel jobs unique id for tracing
- pre-populate exported buffers
- cyan skillfish updates
- make vbios build number available in sysfs
- userq updates
- HDCP updates
- support MMIO remap page as ttm pool
- JPEG parser updates
- DCE6 DC updates
- use devm for i2c buses
- GPUVM locking updates
- Drop non-DC DCE11 code
- improve fallback handling for pixel encoding
amdkfd:
- SVM/page migration fixes
- debugfs fixes
- add CRIO support for gem objects
- SVM updates
radeon:
- use dev_warn_once in CS parsers
xe:
- add madvise interface
- add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count
and memory attributes
- drop L# bank mask reporting from media GT3 on Xe3+.
- add SLPC power_profile sysfs interface
- add configs attribs to add post/mid context-switch commands
- handle firmware reported hardware errors notifying userspace with
device wedged uevent
- use same dir structure across sysfs/debugfs
- cleanup and future proof vram region init
- add G-states and PCI link states to debugfs
- Add SRIOV support for CCS surfaces on Xe2+
- Enable SRIOV PF mode by default on supported platforms
- move flush to common code
- extended core workarounds for Xe2/3
- use DRM scheduler for delayed GT TLB invalidations
- configs improvements and allow VF device enablement
- prep work to expose mmio regions to userspace
- VF migration support added
- prepare GPU SVM for THP migration
- start fixing XE_PAGE_SIZE vs PAGE_SIZE
- add PSMI support for hw validation
- resize VF bars to max possible size according to number of VFs
- Ensure GT is in C0 during resume
- pre-populate exported buffers
- replace xe_hmm with gpusvm
- add more SVM GT stats to debugfs
- improve fake pci and WA kunnit handle for new platform testing
- Test GuC to GuC comms to add debugging
- use attribute groups to simplify sysfs registration
- add Late Binding firmware code to interact with MEI
i915:
- apply multiple JSL/EHL/Gen7/Gen6 workarounds properly
- protect against overflow in active_engine()
- Use try_cmpxchg64() in __active_lookup()
- include GuC registers in error state
- get rid of dev->struct_mutex
- iopoll: generalize read_poll_timout
- lots more display refactoring
- Reject HBR3 in any eDP Panel
- Prune modes for YUV420
- Display Wa fix, additions, and updates
- DP: Fix 2.7 Gbps link training on g4x
- DP: Adjust the idle pattern handling
- DP: Shuffle the link training code a bit
- Don't set/read the DSI C clock divider on GLK
- Enable_psr kernel parameter changes
- Type-C enabled/disconnected dp-alt sink
- Wildcat Lake enabling
- DP HDR updates
- DRAM detection
- wait PSR idle on dsb commit
- Remove FBC modulo 4 restriction for ADL-P+
- panic: refactor framebuffer allocation
habanalabs:
- debug/visibility improvements
- vmalloc-backed coherent mmap support
- HLDIO infrastructure
nova-core:
- various register!() macro improvements
- minor vbios/firmware fixes/refactoring
- advance firmware boot stages; process Booter and patch signatures
- process GSP and GSP bootloader
- Add r570.144 firmware bindings and update to it
- Move GSP boot code to own module
- Use new pin-init features to store driver's private data in a
single allocation
- Update ARef import from sync::aref
nova-drm:
- Update ARef import from sync::aref
tyr:
- initial driver skeleton for a rust driver for ARM Mali GPUs
- capable of powering up, query metadata and provide it to userspace.
msm:
- GPU and Core:
- in DT bindings describe clocks per GPU type
- GMU bandwidth voting for x1-85
- a623/a663 speedbins
- cleanup some remaining no-iommu leftovers after VM_BIND conversion
- fix GEM obj 32b size truncation
- add missing VM_BIND param validation
- IFPC for x1-85 and a750
- register xml and gen_header.py sync from mesa
- Display:
- add missing bindings for display on SC8180X
- added DisplayPort MST bindings
- conversion from round_rate() to determine_rate()
amdxdna:
- add IOCTL_AMDXDNA_GET_ARRAY
- support user space allocated buffers
- streamline PM interfaces
- Refactoring wrt. hardware contexts
- improve error reporting
nouveau:
- use GSP firmware by default
- improve error reporting
- Pre-populate exported buffers
ast:
- Clean up detection of DRAM config
exynos:
- add DSIM bridge driver support for Exynos7870
- Document Exynos7870 DSIM compatible in dt-binding
panthor:
- Print task/pid on errors
- Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25
- Improve cache flushing
- Fail VM bind if BO has offset
renesas:
- convert to RUNTIME_PM_OPS
rcar-du:
- Make number of lanes configurable
- Use RUNTIME_PM_OPS
- Add support for DSI commands
rocket:
- Add driver for Rockchip NPU plus DT bindings
- Use kfree() and sizeof() correctly
- Test DMA status
rockchip:
- dsi2: Add support for RK3576 plus DT bindings
- Add support for RK3588 DPTX output
tidss:
- Use crtc_ fields for programming display mode
- Remove other drivers from aperture
pixpaper:
- Add support for Mayqueen Pixpaper plus DT bindings
v3d:
- Support querying nubmer of GPU resets for KHR_robustness
stm:
- Clean up logging
- ltdc: Add support support for STM32MP257F-EV1 plus DT bindings
sitronix:
- st7571-i2c: Add support for inverted displays and 2-bit grayscale
tidss:
- Convert to kernel's FIELD_ macros
vesadrm:
- Support 8-bit palette mode
imagination:
- Improve power management
- Add support for TH1520 GPU
- Support Risc-V architectures
v3d:
- Improve job management and locking
vkms:
- Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x
- Spport YUV with 16-bit components"
* tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel: (1455 commits)
drm/amd: Add name to modes from amdgpu_connector_add_common_modes()
drm/amd: Drop some common modes from amdgpu_connector_add_common_modes()
drm/amdgpu: update MODULE_PARM_DESC for freesync_video
drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes()
drm/amd/display: Share dce100_validate_global with DCE6-8
drm/amd/display: Share dce100_validate_bandwidth with DCE6-8
drm/amdgpu: Fix fence signaling race condition in userqueue
amd/amdkfd: enhance kfd process check in switch partition
amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
drm/amd/display: Reject modes with too high pixel clock on DCE6-10
drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes()
drm/amd/display: Only enable common modes for eDP and LVDS
drm/amdgpu: remove the redeclaration of variable i
drm/amdgpu/userq: assign an error code for invalid userq va
drm/amdgpu: revert "rework reserved VMID handling" v2
drm/amdgpu: remove leftover from enforcing isolation by VMID
drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails
accel/habanalabs: add Infineon version check
accel/habanalabs/gaudi2: read preboot status after recovering from dirty state
accel/habanalabs: add HL_GET_P_STATE passthrough type
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- New parameterized test features
KUnit parameterized tests supported two primary methods for getting
parameters:
- Defining custom logic within a generate_params() function.
- Using the KUNIT_ARRAY_PARAM() and KUNIT_ARRAY_PARAM_DESC() macros
with a pre-defined static array and passing the created
*_gen_params() to KUNIT_CASE_PARAM().
These methods present limitations when dealing with dynamically
generated parameter arrays, or in scenarios where populating
parameters sequentially via generate_params() is inefficient or
overly complex.
These limitations are fixed with a parameterized test method
- Fix issues in kunit build artifacts cleanup
- Fix parsing skipped test problem in kselftest framework
- Enable PCI on UML without triggering WARN()
- a few other fixes and adds support for new configs such as MIPS
* tag 'linux_kselftest-kunit-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Extend kconfig help text for KUNIT_UML_PCI
rust: kunit: allow `cfg` on `test`s
kunit: qemu_configs: Add MIPS configurations
kunit: Enable PCI on UML without triggering WARN()
Documentation: kunit: Document new parameterized test features
kunit: Add example parameterized test with direct dynamic parameter array setup
kunit: Add example parameterized test with shared resource management using the Resource API
kunit: Enable direct registration of parameter arrays to a KUnit test
kunit: Pass parameterized test context to generate_params()
kunit: Introduce param_init/exit for parameterized test context management
kunit: Add parent kunit for parameterized test context
kunit: tool: Accept --raw_output=full as an alias of 'all'
kunit: tool: Parse skipped tests from kselftest.h
kunit: Always descend into kunit directory during build
|
|
We were copying the bo content the bos on the list
"xe->pinned.late.kernel_bo_present" twice on suspend.
Presumingly the intent is to copy the pinned external bos on
the first pass.
This is harmless since we (currently) should have no pinned
external bos needing copy since
a) exernal system bos don't have compressed content,
b) We do not (yet) allow pinning of VRAM bos.
Still, fix this up so that we copy pinned external bos on
the first pass. We're about to allow bos pinned in VRAM.
Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.16+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250918092207.54472-2-thomas.hellstrom@linux.intel.com
(cherry picked from commit 9e69bafece43dcefec864f00b3ec7e088aa7fcbc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
When building with CONFIG_MODULES=n, the __exit functions are dropped.
However our init functions may call them for error handling, so they are
not good candidates for the exit sections.
Fix this error reported by 0day:
ld.lld: error: relocation refers to a symbol in a discarded section: xe_configfs_exit
>>> defined in vmlinux.a(drivers/gpu/drm/xe/xe_configfs.o)
>>> referenced by xe_module.c
>>> drivers/gpu/drm/xe/xe_module.o:(init_funcs) in archive vmlinux.a
This is the only exit function using __exit. Drop it to fix the build.
Cc: Riana Tauro <riana.tauro@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506092221.1FmUQmI8-lkp@intel.com/
Fixes: 16280ded45fb ("drm/xe: Add configfs to enable survivability mode")
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250912-fix-nomodule-build-v1-1-d11b70a92516@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit d9b2623319fa20c2206754284291817488329648)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
VFs can't read BMG_PCIE_CAP(0x138340) register nor access PCODE
(already guarded by the info.skip_pcode flag) so we shouldn't
expose attributes that require any of them to avoid errors like:
[] xe 0000:03:00.1: [drm] Tile0: GT0: VF is trying to read an \
inaccessible register 0x138340+0x0
[] RIP: 0010:xe_gt_sriov_vf_read32+0x6c2/0x9a0 [xe]
[] Call Trace:
[] xe_mmio_read32+0x110/0x280 [xe]
[] auto_link_downgrade_capable_show+0x2e/0x70 [xe]
[] dev_attr_show+0x1a/0x70
[] sysfs_kf_seq_show+0xaa/0x120
[] kernfs_seq_show+0x41/0x60
Fixes: 0e414bf7ad01 ("drm/xe: Expose PCIe link downgrade attributes")
Fixes: cdc36b66cd41 ("drm/xe: Expose fan control and voltage regulator version")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250916170029.3313-2-michal.wajdeczko@intel.com
(cherry picked from commit a2d6223d224f333f705ed8495bf8bebfbc585c35)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Drop L3 bank mask reporting from the media GT on Xe3 and later. Only
do that for the primary GT. No userspace needs or uses it for media
and some platforms may report bogus values.
- Add SLPC power_profile sysfs interface with support for base and
power_saving modes (Vinay Belgaumkar, Rodrigo Vivi)
- Add configfs attributes to add post/mid context-switch commands
(Lucas De Marchi)
Cross-subsystem Changes:
- Fix hmm_pfn_to_map_order() usage in gpusvm and refactor APIs to
align with pieces previous handled by xe_hmm (Matthew Auld)
Core Changes:
- Add MEI driver for Late Binding Firmware Update/Upload
(Alexander Usyskin)
Driver Changes:
- Fix GuC CT teardown wrt TLB invalidation (Satyanarayana)
- Fix CCS save/restore on VF (Satyanarayana)
- Increase default GuC crash buffer size (Zhanjun)
- Allow to clear GT stats in debugfs to aid debugging (Matthew Brost)
- Add more SVM GT stats to debugfs (Matthew Brost)
- Fix error handling in VMA attr query (Himal)
- Move sa_info in debugfs to be per tile (Michal Wajdeczko)
- Limit number of retries upon receiving NO_RESPONSE_RETRY from GuC to
avoid endless loop (Michal Wajdeczko)
- Fix configfs handling for survivability_mode undoing user choice when
unbinding the module (Michal Wajdeczko)
- Refactor configfs attribute visibility to future-proof it and stop
exposing survivability_mode if not applicable (Michal Wajdeczko)
- Constify some functions (Harish Chegondi, Michal Wajdeczko)
- Add/extend more HW workarounds for Xe2 and Xe3
(Harish Chegondi, Tangudu Tilak Tirumalesh)
- Replace xe_hmm with gpusvm (Matthew Auld)
- Improve fake pci and WA kunit handling for testing new platforms
(Michal Wajdeczko)
- Reduce unnecessary PTE writes when migrating (Sanjay Yadav)
- Cleanup GuC interface definitions and log message (John Harrison)
- Small improvements around VF CCS (Michal Wajdeczko)
- Enable bus mastering for the I2C controller (Raag Jadav)
- Prefer devm_mutex of hand rolling it (Christophe JAILLET)
- Drop sysfs and debugfs attributes not available for VF (Michal Wajdeczko)
- GuC CT devm actions improvements (Michal Wajdeczko)
- Recommend new GuC versions for PTL and BMG (Julia Filipchuk)
- Improveme driver handling for exhaustive eviction using new
xe_validation wrapper around drm_exec (Thomas Hellström)
- Add and use printk wrappers for tile and device (Michal Wajdeczko)
- Better document workaround handling in Xe (Lucas De Marchi)
- Improvements on ARRAY_SIZE and ERR_CAST usage (Lucas De Marchi,
Fushuai Wang)
- Align CSS firmware headers with the GuC APIs (John Harrison)
- Test GuC to GuC (G2G) communication to aid debug in pre-production
firmware (John Harrison)
- Bail out driver probing if GuC fails to load (John Harrison)
- Allow error injection in xe_pxp_exec_queue_add()
(Daniele Ceraolo Spurio)
- Minor refactors in xe_svm (Shuicheng Lin)
- Fix madvise ioctl error handling (Shuicheng Lin)
- Use attribute groups to simplify sysfs registration
(Michal Wajdeczko)
- Add Late Binding Firmware implementation in Xe to work together with
the MEI component (Badal Nilawar, Daniele Ceraolo Spurio, Rodrigo
Vivi)
- Fix build with CONFIG_MODULES=n (Lucas De Marchi)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/c2et6dnkst2apsgt46dklej4nprqdukjosb55grpaknf3pvcxy@t7gtn3hqtp6n
|
|
When building with CONFIG_MODULES=n, the __exit functions are dropped.
However our init functions may call them for error handling, so they are
not good candidates for the exit sections.
Fix this error reported by 0day:
ld.lld: error: relocation refers to a symbol in a discarded section: xe_configfs_exit
>>> defined in vmlinux.a(drivers/gpu/drm/xe/xe_configfs.o)
>>> referenced by xe_module.c
>>> drivers/gpu/drm/xe/xe_module.o:(init_funcs) in archive vmlinux.a
This is the only exit function using __exit. Drop it to fix the build.
Cc: Riana Tauro <riana.tauro@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506092221.1FmUQmI8-lkp@intel.com/
Fixes: 16280ded45fb ("drm/xe: Add configfs to enable survivability mode")
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250912-fix-nomodule-build-v1-1-d11b70a92516@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Cross-subsystem Changes:
- Overflow: add range_overflows and range_end_overflows (Jani)
Core Changes:
- Get rid of dev->struct_mutex (Luiz)
Non-display related:
- GVT: Remove redundant ternary operators (Liao)
- Various i915_utils clean-ups (Jani)
Display related:
- Wait PSR idle before on dsb commit (Jouni)
- Fix size for for_each_set_bit() in abox iteration (Jani)
- Abstract figuring out encoder name (Jani)
- Remove FBC modulo 4 restriction for ADL-P+ (Uma)
- Panic: refactor framebuffer allocation (Jani)
- Backlight luminance control improvements (Suraj, Aaron)
- Add intel_display_device_present (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aMxX_lBxm7wd5wmi@intel.com
|
|
Like done for post context restore, allow the user to add commands to
the middle of context restore, at the beginning of engine restore
commands.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-7-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Like done for post-context-restore commands, allow to add commands from
configfs in the middle of context restore. Since currently the indirect
ctx hardcodes the offset to CTX_INDIRECT_CTX_OFFSET_DEFAULT, this is
executed in the very beginning of engine context restore.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-6-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Currently it's only allowed for render and compute. Going forward we
want to enable it for more engine classes. Let the XE_LRC_FLAG_INDIRECT_CTX
flag (and thus gt_engine_needs_indirect_ctx()) be the deciding factor
for its availability.
While at it, add the missing const to rcs_funcs array. Since
CTX_INDIRECT_CTX_OFFSET_DEFAULT already matches the HW default and
gt_engine_needs_indirect_ctx() only ever enables it for rcs/ccs, there
is no change in behavior, it's only preparation for future use case.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-5-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Allow the user to specify commands to execute during a context restore.
Currently it's possible to parse 2 types of actions:
- cmd: the instructions are added as is to the bb
- reg: just use the address and value, without worrying about
encoding the right LRI instruction. This is possibly the most
useful use case, so added a dedicated action for that.
This also prepares for future BBs: mid context restore and rc6 context
restore that can re-use the same parsing functions.
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-4-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
During validation it's useful to allows additional commands to be
executed on context switch. Fetch the commands from configfs (to be
added) and add them to the WA BB.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-3-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
For a future configfs attribute, it's desirable to select by engine mask
only as the instance doesn't make sense.
Rename the function lookup_engine_mask() to lookup_engine_info() and
make it return the entry. This allows parse_engine() to still return an
item if the caller wants to allow parsing a class-only string like
"rcs", "bcs", "ccs", etc.
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-2-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Move the part that copies the engine to a local buffer so it can be
shared in future for other configfs attributes parsing an engine.
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-1-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Extract and print version info of the late binding binary.
v2: Some refinements (Daniele)
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-10-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Introduce a debug filesystem node to disable late binding fw reload
during the system or runtime resume. This is intended for situations
where the late binding fw needs to be loaded from user mode,
perticularly for validation purpose.
Note that xe kmd doesn't participate in late binding flow from user
space. Binary loaded from the userspace will be lost upon entering to
D3 cold hence user space app need to handle this situation.
v2:
- s/(uval == 1) ? true : false/!!uval/ (Daniele)
v3:
- Refine the commit message (Daniele)
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-9-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|