linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message (Collapse)	Author	Lines
2025-06-03	drm/i915: Use dma-fence driver and timeline name helpers	Tvrtko Ursulin	-5/+5
	Access the dma-fence internals via the previously added helpers. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250515095004.28318-6-tvrtko.ursulin@igalia.com
2025-06-03	dma-fence: Use a flag for 64-bit seqnos	Tvrtko Ursulin	-10/+7
	With the goal of reducing the need for drivers to touch (and dereference) fence->ops, we move the 64-bit seqnos flag from struct dma_fence_ops to the fence->flags. Drivers which were setting this flag are changed to use new dma_fence_init64() instead of dma_fence_init(). v2: * Streamlined init and added kerneldoc. * Rebase for amdgpu userq which landed since. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> # v1 Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250515095004.28318-3-tvrtko.ursulin@igalia.com
2025-06-03	drm/ttm: Increase pool shrinker batch target	Tvrtko Ursulin	-1/+8
	The default core shrink target of 128 pages (SHRINK_BATCH) is quite low relative to how cheap TTM pool shrinking is, and how the free pages are distributed in page order pools. We can make the target a bit more aggressive by making it roughly the average number of pages across all pools, freeing more of the cached pages every time shrinker core invokes our callback. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250603112750.34997-3-tvrtko.ursulin@igalia.com
2025-06-03	drm/ttm: Respect the shrinker core free target	Tvrtko Ursulin	-3/+5
	Currently the TTM shrinker aborts shrinking as soon as it frees pages from any of the page order pools and by doing so it can fail to respect the freeing target which was configured by the shrinker core. We use the wording "can fail" because the number of freed pages will depend on the presence of pages in the pools and the order of the pools on the LRU list. For example if there are no free pages in the high order pools the shrinker core may require multiple passes over the TTM shrinker before it will free the default target of 128 pages (assuming there are free pages in the low order pools). This inefficiency can be compounded by the pool LRU where multiple further calls into the TTM shrinker are required to end up looking at the pool with pages. Improve this by never freeing less than the shrinker core has requested. At the same time we start reporting the number of scanned pages (freed in this case), which prevents the core shrinker from giving up on the TTM shrinker too soon and moving on. v2: * Simplify loop logic. (Christian) * Improve commit message. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20250603112750.34997-2-tvrtko.ursulin@igalia.com
2025-06-03	Merge tag 'bitmap-for-6.16-rc1' of https://github.com/norov/linux	Linus Torvalds	-97/+11
	Pull bitmap updates from Yury Norov: - dead code cleanups for cpumasks and nodemasks (me) - fixed-width flavors of GENMASK() and BIT() (Vincent, Lucas and me) - FIELD_MODIFY() helper (Luo) - for_each_node_with_cpus() optimization (me) - bitmap-str fixes (Andy) * tag 'bitmap-for-6.16-rc1' of https://github.com/norov/linux: topology: make for_each_node_with_cpus() O(N) bitfield: Add FIELD_MODIFY() helper bitmap-str: Add missing header(s) bitmap-str: Get rid of 'extern' for function prototypes build_bug.h: more user friendly error messages in BUILD_BUG_ON_ZERO() test_bits: add tests for BIT_U() test_bits: add tests for GENMASK_U() drm/i915: Convert REG_GENMASK() to fixed-width GENMASK_U() bits: introduce fixed-type BIT_U() bits: introduce fixed-type GENMASK_U() bits: add comments and newlines to #if, #else and #endif directives cpumask: drop cpumask_assign_cpu() riscv: switch set_icache_stale_mask() to using non-atomic assign_cpu() cpumask: add non-atomic __assign_cpu() nodemask: drop nodes_shift
2025-06-03	Merge drm-next-2025-05-28 into drm-misc-next	Maxime Ripard	-22702/+36034
	Christian needs a recent drm-next branch to merge fence patches. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-03	drm/i915/display: Fix u32 overflow in SNPS PHY HDMI PLL setup	Dibin Moolakadan Subrahmanian	-8/+8
	When configuring the HDMI PLL, calculations use DIV_ROUND_UP_ULL and DIV_ROUND_DOWN_ULL macros, which internally rely on do_div. However, do_div expects a 32-bit (u32) divisor, and at higher data rates, the divisor can exceed this limit. This leads to incorrect division results and ultimately misconfigured PLL values. This fix replaces do_div calls with div64_base64 calls where diviser can exceed u32 limit. Fixes: 5947642004bf ("drm/i915/display: Add support for SNPS PHY HDMI PLL algorithm for DG2") Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Cc: Suraj Kandpal <suraj.kandpal@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Dibin Moolakadan Subrahmanian <dibin.moolakadan.subrahmanian@intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Link: https://lore.kernel.org/r/20250528064557.4172149-1-dibin.moolakadan.subrahmanian@intel.com
2025-06-03	drm/xe/vf: Add sanity check for GGTT configuration	Michal Wajdeczko	-1/+10
	The VF GGTT configuration was prepared by the PF, which should be trusted, was obtained from the GuC, which likely already did some sanity checks too, but since it's a received data, we should have our own sanity checks to detect early any misconfiguration. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250602103325.549-4-michal.wajdeczko@intel.com
2025-06-03	drm/xe/vf: Move tile-related VF functions to separate file	Michal Wajdeczko	-253/+269
	Some of our VF functions, even if they take a GT pointer, work only on primary GT and really are tile-related and would be better to keep them separate from the rest of true GT-oriented functions. Move them to a file and update to take a tile pointer instead. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Tomasz Lis <tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20250602103325.549-3-michal.wajdeczko@intel.com
2025-06-03	drm/xe/vf: Introduce helpers to access GGTT configuration	Michal Wajdeczko	-1/+38
	In upcoming patch we want to separate tile-oriented VF functions from GT-oriented functions and to allow the former access a GGTT configuration stored at GT level we need to provide some helpers. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Tomasz Lis<tomasz.lis@intel.com> Link: https://lore.kernel.org/r/20250602103325.549-2-michal.wajdeczko@intel.com
2025-06-03	drm/udl: use DRM_GEM_SHMEM_DRIVER_OPS_NO_MAP_SGT	Shixiong Ou	-1/+1
	Import dmabuf without mapping its sg_table to avoid issues likes: udl 2-1.4:1.0: swiotlb buffer is full (sz: 2097152 bytes), total 65536 (slots), used 1 (slots) Signed-off-by: Shixiong Ou <oushixiong@kylinos.cn> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250522070714.439824-3-oushixiong1025@163.com
2025-06-03	drm/ast: use DRM_GEM_SHMEM_DRIVER_OPS_NO_MAP_SGT	Shixiong Ou	-1/+1
	Import dmabuf without mapping its sg_table to avoid issues likes: ast 0000:07:00.0: swiotlb buffer is full (sz: 3145728 bytes), total 32768 (slots), used 0 (slots) Signed-off-by: Shixiong Ou <oushixiong@kylinos.cn> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250522070714.439824-2-oushixiong1025@163.com
2025-06-03	drm/shmem-helper: Import dmabuf without mapping its sg_table	Shixiong Ou	-9/+84
	[WHY] 1. Drivers using DRM_GEM_SHADOW_PLANE_HELPER_FUNCS and DRM_GEM_SHMEM_DRIVER_OPS (e.g., udl, ast) do not require sg_table import. They only need dma_buf_vmap() to access the shared buffer's kernel virtual address. 2. On certain Aspeed-based boards, a dma_mask of 0xffff_ffff may trigger SWIOTLB during dmabuf import. However, IO_TLB_SEGSIZE restricts the maximum DMA streaming mapping memory, resulting in errors like: ast 0000:07:00.0: swiotlb buffer is full (sz: 3145728 bytes), total 32768 (slots), used 0 (slots) [HOW] Provide a gem_prime_import implementation without sg_table mapping to avoid issues (e.g., "swiotlb buffer is full"). Drivers that do not require sg_table can adopt this. Signed-off-by: Shixiong Ou <oushixiong@kylinos.cn> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250522070714.439824-1-oushixiong1025@163.com
2025-06-02	drm/xe: remove unmatched xe_vm_unlock() from __xe_exec_queue_init()	Maciej Patelczyk	-5/+1
	There is unmatched xe_vm_unlock() in the __xe_exec_queue_init(). Leftover from commit fbeaad071a98 ("drm/xe: Create LRC BO without VM") Fixes: fbeaad071a98 ("drm/xe: Create LRC BO without VM") Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://lore.kernel.org/r/20250530135627.2821612-1-maciej.patelczyk@intel.com
2025-06-02	drm/xe/configfs: Add attribute to disable engines	Lucas De Marchi	-2/+147
	Add the userspace interface to load the driver with fewer engines. The syntax is to just echo the engine names to a file in configfs, like below: echo 'rcs0,bcs0' > /sys/kernel/config/xe/<bdf>/engine_allowed With that engines other than rcs0 and bcs0 will not be enabled. To enable all instances from a class, a '*' can be used. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250528-engine-mask-v4-4-f4636d2a890a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-02	drm/xe: Allow to disable engines	Lucas De Marchi	-0/+38
	Sometimes it's useful to load the driver with a smaller set of engines to allow more targeted debugging, particularly on early enabling. Besides checking what is fused off in hardware, add similar logic to disable engines in software. This will use configfs to allow users to set what engine to disable, so already add prepare for that. The exact configfs interface will be added later. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250528-engine-mask-v4-3-f4636d2a890a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-02	drm/xe: Convert "fused off" messages to be gt-based	Lucas De Marchi	-11/+6
	It's useful to see in the log message what GT was being checked for disabled/fused-off engines. Especially on multi-tile platforms the different tiles may be fused differently making it harder to parse the information. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250528-engine-mask-v4-2-f4636d2a890a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-02	drm/xe/configfs: Drop trailing semicolons	Lucas De Marchi	-4/+4
	Drop the semicolons from the dummy implementation: they shouldn't be there. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250528-engine-mask-v4-1-f4636d2a890a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-02	drm/xe/guc: Resend potentially lost H2G MMIO request	Michal Wajdeczko	-0/+7
	There could be a scenario where the VF driver is resuming faster than the driver PF is able to complete the VF FLR sequence which includes reset of the VF scratch registers. This may result in deletion of the ongoing HXG message (it could be either a host request or a GuC response). When we detect that HXG message was likey lost (scratch register with HXG header was zeroed) try to send this request once more before giving up. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250528090021.329-1-michal.wajdeczko@intel.com
2025-06-02	drm/xe: Use GT-oriented printer to dump topology on init	Michal Wajdeczko	-2/+2
	During the probe we dump the discovered GT topology, but instead of a generic printer we can use our own GT-oriented printer which contains information about the source GT. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250530210524.505-1-michal.wajdeczko@intel.com
2025-06-02	drm/xe: Convert page fault messages to be GT-oriented	Michal Wajdeczko	-38/+36
	We are processing here G2H messages, so we should use GT oriented messages to retain information about the origin GT. While at it, print error codes in a user-friendly way. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250530164835.461-1-michal.wajdeczko@intel.com
2025-06-02	drm/xe/hwmon: Simplify and fix 32b wrap	Lucas De Marchi	-10/+8
	Like done in commit eaa287069a70 ("drm/xe/guc_submit: Simplify and fix diff calculation"), just use u32 for wrapping the value, which is simpler and more correct: when wrapping on 32b, the accumulated value was off by one. Also, do not mix the u64 value from pmt with the u32 value used for the calculation. Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250530-xe-hwmon-wrap-v2-1-ce653db7fe4a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-02	drm/xe/pxp: Decouple queue addition from PXP start	Daniele Ceraolo Spurio	-61/+86
	Starting PXP and adding a queue to the PXP queue list are separate actions. Given that a queue can only be added to the list if PXP is active, the 2 actions were bundled together to avoid having to re-lock and re-check the status to perform the queue addition after having done so during the PXP start. However, we don't save a lot of complexity by doing so and we lose in clarity of code, so overall it's cleaner to just keep the 2 actions separate. v2: remove leftover rpm_get (John), fix rpm_put in error case Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250522225401.3953243-8-daniele.ceraolospurio@intel.com
2025-06-02	drm/xe/pxp: Clarify PXP queue creation behavior if PXP is not ready	Daniele Ceraolo Spurio	-2/+6
	The expected flow of operations when using PXP is to query the PXP status and wait for it to transition to "ready" before attempting to create an exec_queue. This flow is followed by the Mesa driver, but there is no guarantee that an incorrectly coded (or malicious) app will not attempt to create the queue first without querying the status. Therefore, we need to clarify what the expected behavior of the queue creation ioctl is in this scenario. Currently, the ioctl always fails with an -EBUSY code no matter the error, but for consistency it is better to distinguish between "failed to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl does. Note that, while this is a change in the return code of an ioctl, the behavior of the ioctl in this particular corner case was not clearly spec'd, so no one should have been relying on it (and we know that Mesa, which is the only known userspace for this, didn't). v2: Minor rework of the doc (Rodrigo) Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
2025-06-02	drm/xe/pxp: Use the correct define in the set_property_funcs array	Daniele Ceraolo Spurio	-1/+1
	The define of the extension type was accidentally used instead of the one of the property itself. They're both zero, so no functional issue, but we should use the correct define for code correctness. Fixes: 41a97c4a1294 ("drm/xe/pxp/uapi: Add API to mark a BO as using PXP") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250522225401.3953243-6-daniele.ceraolospurio@intel.com
2025-06-02	drm/panfrost: Fix panfrost device variable name in devfreq	Adrián Larumbe	-2/+2
	Commit 64111a0e22a9 ("drm/panfrost: Fix incorrect updating of current device frequency") was a Panfrost port of a similar fix in Panthor. Fix the Panfrost device pointer variable name so that it follows Panfrost naming conventions. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Fixes: 64111a0e22a9 ("drm/panfrost: Fix incorrect updating of current device frequency") Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250520174634.353267-6-adrian.larumbe@collabora.com
2025-06-02	drm/panfrost: show device-wide list of DRM GEM objects over DebugFS	Adrián Larumbe	-0/+236
	This change is essentially a Panfrost port of commit a3707f53eb3f ("drm/panthor: show device-wide list of DRM GEM objects over DebugFS"). The DebugFS file is almost the same as in Panthor, minus the GEM object usage flags, since Panfrost has no kernel-only BO's. Two additional GEM state flags which are displayed but aren't relevant to Panthor are 'Purged' and 'Purgeable', since Panfrost implements an explicit shrinker and a madvise ioctl to flag objects as reclaimable. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250520174634.353267-5-adrian.larumbe@collabora.com
2025-06-02	drm/panfrost: Add driver IOCTL for setting BO labels	Adrián Larumbe	-1/+45
	Allow UM to label a BO for which it possesses a DRM handle. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250520174634.353267-4-adrian.larumbe@collabora.com
2025-06-02	drm/panfrost: Internally label some BOs	Adrián Larumbe	-0/+12
	Perfcnt samples buffer is not exposed to UM, but we would like to keep a tag on it as a potential debug aid. PRIME imported GEM buffers are UM exposed, but since the usual Panfrost UM driver code path is not followed in their creation, they might remain unlabelled for their entire lifetime, so a generic tag was deemed preferable. The tag is assigned before a UM handle is created so it doesn't contradict the logic about labelling internal BOs. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250520174634.353267-3-adrian.larumbe@collabora.com
2025-06-02	drm/panfrost: Add BO labelling to Panfrost	Adrián Larumbe	-0/+59
	Functions for labelling UM-exposed an internal BOs are provided. An example of the latter would be the Perfcnt sample buffer. This commit is done in preparation of a following one that will allow UM to set BO labels through a new ioctl(). Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250520174634.353267-2-adrian.larumbe@collabora.com
2025-06-02	drm/sched/tests: Use one lock for fence context	Philipp Stanner	-4/+2
	There is no need for separate locks for single jobs and the entire scheduler. The dma_fence context can be protected by the scheduler lock, allowing for removing the jobs' locks. This simplifies things and reduces the likelyhood of deadlocks etc. Replace the jobs' locks with the mock scheduler lock. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://lore.kernel.org/r/20250527101029.56491-2-phasta@kernel.org
2025-06-02	drm/xe/sched: stop re-submitting signalled jobs	Matthew Auld	-1/+9
	Customer is reporting a really subtle issue where we get random DMAR faults, hangs and other nasties for kernel migration jobs when stressing stuff like s2idle/s3/s4. The explosions seems to happen somewhere after resuming the system with splats looking something like: PM: suspend exit rfkill: input handler disabled xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=bcs, logical_mask: 0x2, guc_id=0 xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=24496, lrc_seqno=24496, guc_id=0, flags=0x13 in no process [-1] xe 0000:00:02.0: [drm] GT0: Kernel-submitted job timed out The likely cause appears to be a race between suspend cancelling the worker that processes the free_job()'s, such that we still have pending jobs to be freed after the cancel. Following from this, on resume the pending_list will now contain at least one already complete job, but it looks like we call drm_sched_resubmit_jobs(), which will then call run_job() on everything still on the pending_list. But if the job was already complete, then all the resources tied to the job, like the bb itself, any memory that is being accessed, the iommu mappings etc. might be long gone since those are usually tied to the fence signalling. This scenario can be seen in ftrace when running a slightly modified xe_pm IGT (kernel was only modified to inject artificial latency into free_job to make the race easier to hit): xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ... xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13 xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x4 xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=0, guc_state=0x0, flags=0x3 xe_exec_queue_stop: dev=0000:00:02.0, 1:0x1, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x3 xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=2, guc_state=0x0, flags=0x3 xe_exec_queue_resubmit: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13 xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ... ..... xe_exec_queue_memory_cat_error: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x3, flags=0x13 So the job_run() is clearly triggered twice for the same job, even though the first must have already signalled to completion during suspend. We can also see a CAT error after the re-submit. To prevent this only resubmit jobs on the pending_list that have not yet signalled. v2: - Make sure to re-arm the fence callbacks with sched_start(). v3 (Matt B): - Stop using drm_sched_resubmit_jobs(), which appears to be deprecated and just open-code a simple loop such that we skip calling run_job() on anything already signalled. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4856 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: William Tseng <william.tseng@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250528113328.289392-2-matthew.auld@intel.com
2025-06-02	drm/xe: Rework eviction rejection of bound external bos	Thomas Hellström	-18/+105
	For preempt_fence mode VM's we're rejecting eviction of shared bos during VM_BIND. However, since we do this in the move() callback, we're getting an eviction failure warning from TTM. The TTM callback intended for these things is eviction_valuable(). However, the latter doesn't pass in the struct ttm_operation_ctx needed to determine whether the caller needs this. Instead, attach the needed information to the vm under the vm->resv, until we've been able to update TTM to provide the needed information. And add sufficient lockdep checks to prevent misuse and races. v2: - Fix a copy-paste error in xe_vm_clear_validating() v3: - Fix kerneldoc errors. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 0af944f0e308 ("drm/xe: Reject BO eviction if BO is bound to current VM") Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250528164105.234718-1-thomas.hellstrom@linux.intel.com
2025-06-02	drm/i915/guc: Handle race condition where wakeref count drops below 0	Jesus Narvaez	-3/+14
	There is a rare race condition when preparing for a reset where guc_lrc_desc_unpin() could be in the process of deregistering a context while a different thread is scrubbing outstanding contexts and it alters the context state and does a wakeref put. Then, if there is a failure with deregister_context(), a second wakeref put could occur. As a result the wakeref count could drop below 0 and fail an INTEL_WAKEREF_BUG_ON() check. Therefore if there is a failure with deregister_context(), undo the context state changes and do a wakeref put only if the context was set to be destroyed earlier. v2: Expand comment to better explain change. (Daniele) v3: Removed addition to the original comment. (Daniele) Fixes: 2f2cc53b5fe7 ("drm/i915/guc: Close deregister-context race against CT-loss") Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Mousumi Jana <mousumi.jana@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250528230551.1855177-1-jesus.narvaez@intel.com (cherry picked from commit f36a75aba1c3176d177964bca76f86a075d2943a) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-06-02	drm/i915/psr: Fix using wrong mask in REG_FIELD_PREP	Jouni Högander	-2/+2
	Wrong mask is used in PORT_ALPM_LFPS_CTL_FIRST_LFPS_HALF_CYCLE_DURATION and PORT_ALPM_LFPS_CTL_LAST_LFPS_HALF_CYCLE_DURATION. Fixes: 295099580f04 ("drm/i915/psr: Add missing ALPM AUX-Less register definitions") Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Link: https://lore.kernel.org/r/20250526120512.1702815-12-jouni.hogander@intel.com (cherry picked from commit 8097128a40ff378761034ec72cdbf6f46e466dc0) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-06-02	drm/i915/guc: Check if expecting reply before decrementing ↵	Jesus Narvaez	-1/+1
	outstanding_submission_g2h When sending a H2G message where a reply is expected in guc_submission_send_busy_loop(), outstanding_submission_g2h is incremented before the send. However, if there is an error sending the message, outstanding_submission_g2h is decremented without checking if a reply is expected. Therefore, check if reply is expected when there is a failure before decrementing outstanding_submission_g2h. Fixes: 2f2cc53b5fe7 ("drm/i915/guc: Close deregister-context race against CT-loss") Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Mousumi Jana <mousumi.jana@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250514225224.4142684-1-jesus.narvaez@intel.com (cherry picked from commit a6a26786f22a4ab0227bcf610510c4c9c2df0808) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2025-06-02	drm/tests: hdmi: Add test for unsuccessful fallback to YUV420	Cristian Ciocaltea	-0/+87
	Provide test to verify a mandatory fallback to YUV420 output cannot succeed when driver doesn't advertise YUV420 support. Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-19-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Add max TMDS rate fallback tests for YUV420 mode	Cristian Ciocaltea	-0/+154
	Provide tests to verify drm_atomic_helper_connector_hdmi_check() helper fallback behavior when using YUV420 output format. Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-18-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Provide EDID supporting 4K@30Hz with RGB/YUV	Cristian Ciocaltea	-0/+114
	Create a test EDID advertising the following capabilities: Max resolution: 3840x2160@30Hz with RGB, YUV444, YUV422, YUV420 Max BPC: 16 for all modes Max TMDS clock: 340 MHz Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-17-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Rename max TMDS rate fallback tests	Cristian Ciocaltea	-4/+4
	In preparation to extend the max TMDS rate fallback tests for covering YUV420 output, update the rather generic function names drm_test_check_max_tmds_rate_{bpc\|format}_fallback() to properly indicate the intended test cases. Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-16-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Add limited range tests for YUV420 mode	Cristian Ciocaltea	-5/+103
	Provide tests to verify that drm_atomic_helper_connector_hdmi_check() helper behaviour when using YUV420 output format is to always set the limited RGB quantization range to 'limited', no matter what the value of Broadcast RGB property is. Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-15-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Provide EDID supporting 4K@30Hz with YUV420 only	Cristian Ciocaltea	-0/+118
	Create a test EDID advertising the following capabilities: Max resolution: - 1920x1080@60Hz with RGB, YUV444, YUV422 - 3840x2160@30Hz with YUV420 only Max BPC: 16 for all modes Max TMDS clock: 200 MHz Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-14-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Switch to drm_atomic_get_new_connector_state() where possible	Cristian Ciocaltea	-6/+6
	Replace the calls to drm_atomic_get_connector_state() with drm_atomic_get_new_connector_state() for cases which do not require allocating the connector state, e.g. after drm_atomic_check_only() when the intent is to only read the new connector state. The rational is to avoid the need to handle the potential EDEADLK error returned by the former helper, which would require restarting the entire atomic sequence. Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-13-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Setup ycbcr_420_allowed before initializing connector	Cristian Ciocaltea	-0/+2
	Initializing HDMI connector via drmm_connector_hdmi_init() requires its ->ycbcr_420_allowed flag to be adjusted according to the supported formats passed as function argument, prior to the actual invocation. In order to allow providing test coverage for YUV420 modes, ensure the flag is properly setup. Reviewed-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-12-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Drop unused drm_kunit_helper_connector_hdmi_init_funcs()	Cristian Ciocaltea	-10/+0
	After updating the code to make use of the new EDID setup helper, drm_kunit_helper_connector_hdmi_init_funcs() became unused, hence drop it. Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-11-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Replace open coded EDID setup	Cristian Ciocaltea	-142/+92
	Make use of the recently introduced macros to reduce boilerplate code around EDID setup. This also helps dropping the redundant calls to set_connector_edid(). No functional changes intended. Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-10-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Add macro to simplify EDID setup	Cristian Ciocaltea	-18/+28
	Factor out the HDMI connector initialization from drm_kunit_helper_connector_hdmi_init_funcs() into a common __connector_hdmi_init() function, while extending its functionality to allow setting custom (i.e. non-default) EDID data. Introduce a macro as a wrapper over the new helper to allow dropping the open coded EDID setup from all test cases. The actual conversion will be handled separately; for now just apply it to drm_kunit_helper_connector_hdmi_init() helper. Reviewed-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-9-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Switch to 'void *' type for EDID data	Cristian Ciocaltea	-2/+2
	Replace 'const char ' with 'const void ' type for current_edid member in struct drm_atomic_helper_connector_hdmi_priv, as well as for the edid parameter of set_connector_edid() function. Suggested-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-8-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/tests: hdmi: Replace '[_]MHz' with 'mhz'	Cristian Ciocaltea	-4/+4
	Improve consistency throughout drm_hdmi_state_helper_test.c by replacing the two occurrences of '[_]MHz' substring with 'mhz'. As a bonus, this also helps getting rid of checkpatch.pl complaint: CHECK: Avoid CamelCase: <reject_100_MHz_connector_hdmi_funcs> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-7-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-06-02	drm/connector: hdmi: Use YUV420 output format as an RGB fallback	Cristian Ciocaltea	-4/+14
	Try to make use of YUV420 when computing the best output format and RGB cannot be supported for any of the available color depths. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250527-hdmi-conn-yuv-v5-6-74c9c4a8ac0c@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>