linux/drivers/gpu/drm/xe/instructions, branch master

drm/xe/xelp: Wait for AuxCCS invalidation to complete

2026-03-24T13:29:11Z

On AuxCCS platforms we need to wait for AuxCCS invalidations to complete. Signed-off-by: Tvrtko Ursulin Reviewed-by: Rodrigo Vivi Link: https://patch.msgid.link/20260324084018.20353-6-tvrtko.ursulin@igalia.com Signed-off-by: Rodrigo Vivi

drm/xe: Add MI_SEMAPHORE_WAIT command definition

2026-03-23T09:38:13Z

This command supports memory based Semaphore WAIT. Memory based semaphores will be used for synchronization between the Producer and the Consumer contexts. Producer and Consumer Contexts could be running on different engines or on the same engine inside GT. Bspec: 45749, 60244 Signed-off-by: Michal Wajdeczko Reviewed-by: Michał Winiarski Link: https://patch.msgid.link/20260303201354.17948-3-michal.wajdeczko@intel.com
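As a rough illustration of the memory-based wait described above, the C sketch below models a single poll of a semaphore dword that a producer writes and a consumer checks. The enum, the function names, and the set of compare modes are hypothetical and chosen for illustration only; they are not the definitions added by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compare modes; the real command encodings are not shown here. */
enum ex_sem_compare {
	EX_SEM_WAIT_GTE,	/* satisfied once memory value >= wanted value */
	EX_SEM_WAIT_EQ,		/* satisfied once memory value == wanted value */
	EX_SEM_WAIT_NEQ,	/* satisfied once memory value != wanted value */
};

/* Producer side: publish progress by writing the semaphore dword. */
static inline void ex_sem_signal(volatile uint32_t *sem, uint32_t value)
{
	*sem = value;
}

/* Consumer side: one poll of the semaphore; true means the wait is over. */
static inline bool ex_sem_wait_satisfied(const volatile uint32_t *sem,
					 uint32_t wanted,
					 enum ex_sem_compare mode)
{
	uint32_t seen = *sem;

	switch (mode) {
	case EX_SEM_WAIT_GTE:
		return seen >= wanted;
	case EX_SEM_WAIT_EQ:
		return seen == wanted;
	case EX_SEM_WAIT_NEQ:
		return seen != wanted;
	}
	return false;
}
```

In this model the producer and the consumer share only the memory location, which is why it does not matter whether they run on the same engine or on different ones.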

drm/xe/xe3p_lpg: Add LRC parsing for additional RCS engine state

2026-02-10T13:09:05Z

Xe3p_LPG adds some additional state instructions to the RCS engine's LRC. Add support for these to the debugfs LRC parser. Note that the bspec's LRC description page seems to have a few mistakes in the name/spelling of these new instructions (e.g., "3DSTATE_TASK_DATA_EXT" instead of "3DSTATE_TASK_SHADER_DATA_EXT" or "3DSTATE_VIEWPORT_STATE_POINTERS_CL_SF_2" instead of "3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP_2"). Bspec: 65182 Signed-off-by: Matt Roper Reviewed-by: Matt Atwood Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-6-636e1ad32688@intel.com Signed-off-by: Gustavo Sousa

drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches

2025-12-12T03:21:34Z

To properly support soft light restore between batches being arbitrated at the CFEG, PIPE_CONTROL instructions have a new bit in the first DW, QUEUE_DRAIN_MODE. When set, this indicates to the CFEG that it should only drain the current queue. Additionally we no longer want to set the CS_STALL bit for these multi queue queues as this causes the entire pipeline to stall waiting for completion of the prior batch, preventing this soft light restore from occurring between queues in a queue group. v4: Assert !multi_queue where applicable (Matt Roper) Bspec: 56551 Signed-off-by: Stuart Summers Signed-off-by: Niranjana Vishwanathapura Reviewed-by: Matt Roper Link: https://patch.msgid.link/20251211010249.1647839-29-niranjana.vishwanathapura@intel.com
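The flag choice described above can be sketched roughly as follows. This is a minimal model, not the driver code: the bit positions are placeholders, and the struct and function names are hypothetical. It also assumes that non-multi-queue submissions keep setting CS_STALL as before.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for illustration only, not the hardware layout. */
#define EX_PC_QUEUE_DRAIN_MODE	(1u << 0)	/* lives in the first DW */
#define EX_PC_CS_STALL		(1u << 1)	/* lives in the flags DW */

struct ex_pipe_control_bits {
	uint32_t dw0;	/* extra bits for the first DW of PIPE_CONTROL */
	uint32_t flags;	/* flag bits carried in a later DW */
};

/* Pick the stall/drain bits for a batch, depending on the queue type. */
static inline struct ex_pipe_control_bits ex_select_flush_bits(bool multi_queue)
{
	struct ex_pipe_control_bits bits = { 0, 0 };

	if (multi_queue)
		bits.dw0 |= EX_PC_QUEUE_DRAIN_MODE;	/* drain only this queue */
	else
		bits.flags |= EX_PC_CS_STALL;		/* stall the whole pipeline */

	return bits;
}
```

Keeping the two bits mutually exclusive mirrors the stated intent: a multi queue batch must not stall the whole pipeline, otherwise the soft light restore between queues cannot happen.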

drm/xe/migrate: support MEM_COPY instruction

2025-10-23T09:48:39Z

Make this the default on xe2+ when doing a copy. This has a few advantages over the existing copy instruction: 1) It has a special PAGE_COPY mode that claims to be optimised for page-in/page-out, which is the vast majority of current users. 2) It also has a simple BYTE_COPY mode that supports byte granularity copying without any restrictions. With 2) we can now easily skip the bounce buffer flow when copying buffers with strange sizing/alignment, like for memory_access. But that is left for the next patch. v2 (Matt Brost): - Use device info to check whether device should use the MEM_COPY path. This should fit better with making this a configfs tunable. - And with that also keep old path still functional on xe2 for possible experimentation. - Add a define for PAGE_COPY page-size. v3 (Matt Brost): - Fall back to an actual linear copy for pitch=1. - Also update NVL. BSpec: 57561 Signed-off-by: Matthew Auld Cc: Matthew Brost Reviewed-by: Matthew Brost Link: https://lore.kernel.org/r/20251022163836.191405-7-matthew.auld@intel.com
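To make the difference between the two modes concrete, here is a small C sketch of how a caller might choose between them. The selection rule (use PAGE_COPY only when both addresses and the size are multiples of the page-copy granularity) and the 256-byte granularity are assumptions made for illustration; the names are hypothetical and do not come from the patch.

```c
#include <stdint.h>

enum ex_copy_mode {
	EX_BYTE_COPY,	/* byte granularity, no alignment restrictions */
	EX_PAGE_COPY,	/* page-in/page-out style copy of whole units */
};

/* Hypothetical PAGE_COPY granularity; must be a power of two. */
#define EX_PAGE_COPY_SIZE 256u

/* Assumed rule: PAGE_COPY needs aligned addresses and a whole number of units. */
static inline enum ex_copy_mode ex_pick_copy_mode(uint64_t src, uint64_t dst,
						  uint64_t size)
{
	uint64_t mask = EX_PAGE_COPY_SIZE - 1;

	if (size && !((src | dst | size) & mask))
		return EX_PAGE_COPY;

	return EX_BYTE_COPY;
}
```

Under this rule an oddly sized or misaligned buffer simply falls through to BYTE_COPY, which is the property that would let a caller avoid a bounce buffer.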

drm/xe/xelp: Implement Wa_16010904313

2025-07-25T15:42:49Z

Add XeLP workaround 16010904313. The description calls for it to be emitted as the indirect context buffer workaround for render and compute, and from the workaround batch buffer for the other engines. Therefore we plug into the previously added respective top level emission functions. The actual command streamer programming sequence differs from what is described in the PRM, in that it assumes the listed LRCA offset was supposed to actually refer to the location of the CTX_TIMESTAMP register instead of LRCA + 0x180c (which is in GPR space). The latter appears to make more sense under the assumption that multiple writes are helping with restoring the CTX_TIMESTAMP register content from the saved context state. Signed-off-by: Tvrtko Ursulin Reviewed-by: Lucas De Marchi Cc: Matt Roper Link: https://lore.kernel.org/r/20250711160153.49833-8-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi

drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value

2025-05-12T21:33:23Z

For determining actual job execution time, save the current value of the CTX_TIMESTAMP register rather than the value saved in LRC since the current register value is the closest to the start time of the job. v2: Define MI_STORE_REGISTER_MEM to fix compile error v3: Place MI_STORE_REGISTER_MEM sorted by MI_INSTR (Lucas) Fixes: 65921374c48f ("drm/xe: Emit ctx timestamp copy in ring ops") Signed-off-by: Umesh Nerlige Ramappa Reviewed-by: Matthew Brost Reviewed-by: Lucas De Marchi Link: https://lore.kernel.org/r/20250509161159.2173069-6-umesh.nerlige.ramappa@intel.com
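As a small worked example of why the start value matters, the sketch below computes a job's run time in timestamp ticks from a saved start value and a later reading. It assumes a free-running 32-bit counter that may wrap between the two samples; that width and the function name are assumptions for illustration, not something this commit states.

```c
#include <stdint.h>

/*
 * Ticks elapsed between the saved start value and a later sample.
 * Unsigned subtraction wraps modulo 2^32, so a single counter
 * wraparound between the two samples is handled correctly.
 */
static inline uint32_t ex_job_run_ticks(uint32_t start, uint32_t now)
{
	return now - start;
}
```

The closer the saved start value is to the moment the job actually began, the less unrelated time ends up in this difference.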

drm/xe: Invalidate L3 read-only cachelines for geometry streams too

2025-03-31T16:18:41Z

Historically, the Vertex Fetcher unit has not been an L3 client. That meant that, when a buffer containing vertex data was written to, it was necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any VF L2 cachelines associated with that buffer, so the new value would be properly read from memory. On Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER have included an "L3 Bypass Disable" bit which userspace drivers can set to request that the vertex fetcher unit snoop L3. However, unlike most true L3 clients, the "VF Cache Invalidate" bit continues to only invalidate the VF L2 cache - and not any associated L3 lines. To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation Bit", which according to the docs, "controls the invalidation of the Geometry streams cached in L3 cache at the top of the pipe." In other words, the vertex and index buffer data that gets cached in L3 when "L3 Bypass Disable" is set. Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and whenever it issues a VF Cache Invalidate, it also issues an L3 Read Only Cache Invalidate so that both L2 and L3 vertex data is invalidated. xe is issuing VF cache invalidates too (which handles cases like CPU writes to a buffer between GPU batches). Because userspace may enable L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well. Fixes significant flickering in Firefox on Meteorlake, which was writing to vertex buffers via the CPU between batches; the missing L3 Read Only invalidates were causing the vertex fetcher to read stale data from L3. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460 Fixes: 6ef3bb60557d ("drm/xe: enable lite restore") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Kenneth Graunke Reviewed-by: Rodrigo Vivi Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi
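The rule this fix enforces, that a VF cache invalidate is always accompanied by an L3 read-only cache invalidate, can be modelled in a few lines of C. The flag values below are placeholders in a single flat word and the helper name is hypothetical; in the real command the two bits need not live in the same DW.

```c
#include <stdint.h>

/* Placeholder flag values for illustration only, not the hardware layout. */
#define EX_VF_CACHE_INVALIDATE		(1u << 0)
#define EX_L3_RO_CACHE_INVALIDATE	(1u << 1)

/*
 * Whenever the VF (L2) cache is invalidated, also invalidate the
 * read-only geometry data cached in L3, since userspace may have
 * asked the vertex fetcher to snoop L3.
 */
static inline uint32_t ex_pair_l3_ro_invalidate(uint32_t flags)
{
	if (flags & EX_VF_CACHE_INVALIDATE)
		flags |= EX_L3_RO_CACHE_INVALIDATE;

	return flags;
}
```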

drm/xe: Add MI_MATH and ALU instruction definitions

2025-03-12T10:37:50Z

The command streamer implements an Arithmetic Logic Unit (ALU) which supports basic arithmetic and logical operations on two 64-bit operands. Access to this ALU is through the MI_MATH command and sixteen General Purpose Register (GPR) 64-bit registers, which are used as temporary storage. Bspec: 45737, 60236 # MI Bspec: 45525, 60132 # ALU Bspec: 45533, 60309 # GPR Signed-off-by: Michal Wajdeczko Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20250304162307.1866-1-michal.wajdeczko@intel.com
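For readers unfamiliar with the programming model, the following C sketch is a toy software model of such an ALU: sixteen 64-bit GPRs and an operation applied to two operands. It does not reproduce the MI_MATH encoding, and the particular operation list, the type names, and the function name are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NUM_GPR 16	/* sixteen 64-bit general purpose registers */

/* Assumed set of basic operations; the real ALU opcodes are not modelled. */
enum ex_alu_op { EX_ALU_ADD, EX_ALU_SUB, EX_ALU_AND, EX_ALU_OR, EX_ALU_XOR };

struct ex_alu {
	uint64_t gpr[EX_NUM_GPR];
};

/* gpr[dst] = gpr[a] <op> gpr[b], with 64-bit wrapping arithmetic. */
static inline void ex_alu_exec(struct ex_alu *alu, enum ex_alu_op op,
			       unsigned int dst, unsigned int a, unsigned int b)
{
	uint64_t x, y, r = 0;

	assert(dst < EX_NUM_GPR && a < EX_NUM_GPR && b < EX_NUM_GPR);
	x = alu->gpr[a];
	y = alu->gpr[b];

	switch (op) {
	case EX_ALU_ADD: r = x + y; break;
	case EX_ALU_SUB: r = x - y; break;
	case EX_ALU_AND: r = x & y; break;
	case EX_ALU_OR:  r = x | y; break;
	case EX_ALU_XOR: r = x ^ y; break;
	}

	alu->gpr[dst] = r;
}
```

In this model the GPRs play the same role as in the description above: operands are staged in them, and the result is written back to one of them for later use.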

drm/xe: Add MI_LOAD_REGISTER_REG command definition

2025-03-12T10:37:49Z

The MI_LOAD_REGISTER_REG command reads a value from a source register location and writes that value to a destination register location. Bspec: 45730, 60233 Signed-off-by: Michal Wajdeczko Reviewed-by: Matt Roper Link: https://patchwork.freedesktop.org/patch/msgid/20250303173522.1822-2-michal.wajdeczko@intel.com
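The register-to-register semantics can be modelled with a tiny register file indexed by byte offset, as in the sketch below. The register file size, the 32-bit register width, and all names are assumptions for illustration; this models the effect of the command, not its encoding.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MMIO_SIZE 0x100u	/* hypothetical register space, in bytes */

struct ex_mmio {
	uint32_t regs[EX_MMIO_SIZE / 4];
};

/* Copy the value at byte offset src to the register at byte offset dst. */
static inline void ex_load_register_reg(struct ex_mmio *mmio,
					uint32_t src, uint32_t dst)
{
	assert(src % 4 == 0 && dst % 4 == 0);
	assert(src < EX_MMIO_SIZE && dst < EX_MMIO_SIZE);

	mmio->regs[dst / 4] = mmio->regs[src / 4];
}
```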