<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/xe/instructions, branch master</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=master</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2026-03-24T13:29:11Z</updated>
<entry>
<title>drm/xe/xelp: Wait for AuxCCS invalidation to complete</title>
<updated>2026-03-24T13:29:11Z</updated>
<author>
<name>Tvrtko Ursulin</name>
<email>tvrtko.ursulin@igalia.com</email>
</author>
<published>2026-03-24T08:40:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd1a516234ebb049007ce20c6b6e76936b29bade'/>
<id>urn:sha1:cd1a516234ebb049007ce20c6b6e76936b29bade</id>
<content type='text'>
On AuxCCS platforms we need to wait for AuxCCS invalidations to complete.

Signed-off-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Reviewed-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Link: https://patch.msgid.link/20260324084018.20353-6-tvrtko.ursulin@igalia.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Add MI_SEMAPHORE_WAIT command definition</title>
<updated>2026-03-23T09:38:13Z</updated>
<author>
<name>Michal Wajdeczko</name>
<email>michal.wajdeczko@intel.com</email>
</author>
<published>2026-03-03T20:13:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d95fda29758b59f4279465892905ca57dfd4bb10'/>
<id>urn:sha1:d95fda29758b59f4279465892905ca57dfd4bb10</id>
<content type='text'>
This command supports memory based Semaphore WAIT. Memory based
semaphores will be used for synchronization between the Producer
and the Consumer contexts. Producer and Consumer Contexts could
be running on different engines or on the same engine inside GT.

Bspec: 45749, 60244
Signed-off-by: Michal Wajdeczko &lt;michal.wajdeczko@intel.com&gt;
Reviewed-by: Michał Winiarski &lt;michal.winiarski@intel.com&gt;
Link: https://patch.msgid.link/20260303201354.17948-3-michal.wajdeczko@intel.com
</content>
</entry>
<entry>
<title>drm/xe/xe3p_lpg: Add LRC parsing for additional RCS engine state</title>
<updated>2026-02-10T13:09:05Z</updated>
<author>
<name>Matt Roper</name>
<email>matthew.d.roper@intel.com</email>
</author>
<published>2026-02-06T18:36:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4a0836a2604f1300b9f8886049e47f5aa4300c57'/>
<id>urn:sha1:4a0836a2604f1300b9f8886049e47f5aa4300c57</id>
<content type='text'>
Xe3p_LPG adds some additional state instructions to the RCS engine's
LRC.  Add support for these to the debugfs LRC parser.

Note that the bspec's LRC description page seems to have a few mistakes
in the name/spelling of these new instructions (e.g.,
"3DSTATE_TASK_DATA_EXT" instead of "3DSTATE_TASK_SHADER_DATA_EXT" or
"3DSTATE_VIEWPORT_STATE_POINTERS_CL_SF_2" instead of
"3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP_2").

Bspec: 65182
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Reviewed-by: Matt Atwood &lt;matthew.s.atwood@intel.com&gt;
Link: https://patch.msgid.link/20260206-nvl-p-upstreaming-v3-6-636e1ad32688@intel.com
Signed-off-by: Gustavo Sousa &lt;gustavo.sousa@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches</title>
<updated>2025-12-12T03:21:34Z</updated>
<author>
<name>Niranjana Vishwanathapura</name>
<email>niranjana.vishwanathapura@intel.com</email>
</author>
<published>2025-12-11T01:02:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1b5d39e6672fdee158c3306f5cb2df8975c77e5a'/>
<id>urn:sha1:1b5d39e6672fdee158c3306f5cb2df8975c77e5a</id>
<content type='text'>
To properly support soft light restore between batches
being arbitrated at the CFEG, PIPE_CONTROL instructions
have a new bit in the first DW, QUEUE_DRAIN_MODE. When
set, this indicates to the CFEG that it should only
drain the current queue.

Additionally we no longer want to set the CS_STALL bit
for these multi queue queues as this causes the entire
pipeline to stall waiting for completion of the prior
batch, preventing this soft light restore from occurring
between queues in a queue group.

v4: Assert !multi_queue where applicable (Matt Roper)

Bspec: 56551
Signed-off-by: Stuart Summers &lt;stuart.summers@intel.com&gt;
Signed-off-by: Niranjana Vishwanathapura &lt;niranjana.vishwanathapura@intel.com&gt;
Reviewed-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patch.msgid.link/20251211010249.1647839-29-niranjana.vishwanathapura@intel.com
</content>
</entry>
<entry>
<title>drm/xe/migrate: support MEM_COPY instruction</title>
<updated>2025-10-23T09:48:39Z</updated>
<author>
<name>Matthew Auld</name>
<email>matthew.auld@intel.com</email>
</author>
<published>2025-10-22T16:38:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1e12dbae9d726b1e4ada1e5e101ccf6bb7a8c8aa'/>
<id>urn:sha1:1e12dbae9d726b1e4ada1e5e101ccf6bb7a8c8aa</id>
<content type='text'>
Make this the default on xe2+ when doing a copy. This has a few
advantages over the exiting copy instruction:

1) It has a special PAGE_COPY mode that claims to be optimised for
   page-in/page-out, which is the vast majority of current users.

2) It also has a simple BYTE_COPY mode that supports byte granularity
   copying without any restrictions.

With 2) we can now easily skip the bounce buffer flow when copying
buffers with strange sizing/alignment, like for memory_access. But that
is left for the next patch.

v2 (Matt Brost):
  - Use device info to check whether device should use the MEM_COPY
    path. This should fit better with making this a configfs tunable.
  - And with that also keep old path still functional on xe2 for possible
    experimentation.
  - Add a define for PAGE_COPY page-size.
v3 (Matt Brost):
  - Fallback to an actual linear copy for pitch=1.
  - Also update NVL.

BSpec: 57561
Signed-off-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://lore.kernel.org/r/20251022163836.191405-7-matthew.auld@intel.com
</content>
</entry>
<entry>
<title>drm/xe/xelp: Implement Wa_16010904313</title>
<updated>2025-07-25T15:42:49Z</updated>
<author>
<name>Tvrtko Ursulin</name>
<email>tvrtko.ursulin@igalia.com</email>
</author>
<published>2025-07-11T16:01:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e8372edec948318934c71542e939d1a8fb8fabcf'/>
<id>urn:sha1:e8372edec948318934c71542e939d1a8fb8fabcf</id>
<content type='text'>
Add XeLP workaround 16010904313.

The description calls for it to be emitted as the indirect context buffer
workaround for render and compute, and from the workaround batch buffer
for the other engines. Therefore we plug into the previously added
respective top level emission functions.

The actual command streamer programming sequence differs from what is
described in the PRM, in that it assumes the listed LRCA offset was
supposed to actually refer to the location of the CTX_TIMESTAMP register
instead of LRCA + 0x180c (which is in GPR space). Latter appears to make
more sense under the assumption that multiple writes are helping with
restoring the CTX_TIMESTAMP register content from the saved context state.

Signed-off-by: Tvrtko Ursulin &lt;tvrtko.ursulin@igalia.com&gt;
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Cc: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://lore.kernel.org/r/20250711160153.49833-8-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value</title>
<updated>2025-05-12T21:33:23Z</updated>
<author>
<name>Umesh Nerlige Ramappa</name>
<email>umesh.nerlige.ramappa@intel.com</email>
</author>
<published>2025-05-09T16:12:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=38b14233e5deff51db8faec287b4acd227152246'/>
<id>urn:sha1:38b14233e5deff51db8faec287b4acd227152246</id>
<content type='text'>
For determining actual job execution time, save the current value of the
CTX_TIMESTAMP register rather than the value saved in LRC since the
current register value is the closest to the start time of the job.

v2: Define MI_STORE_REGISTER_MEM to fix compile error
v3: Place MI_STORE_REGISTER_MEM sorted by MI_INSTR (Lucas)

Fixes: 65921374c48f ("drm/xe: Emit ctx timestamp copy in ring ops")
Signed-off-by: Umesh Nerlige Ramappa &lt;umesh.nerlige.ramappa@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Link: https://lore.kernel.org/r/20250509161159.2173069-6-umesh.nerlige.ramappa@intel.com
</content>
</entry>
<entry>
<title>drm/xe: Invalidate L3 read-only cachelines for geometry streams too</title>
<updated>2025-03-31T16:18:41Z</updated>
<author>
<name>Kenneth Graunke</name>
<email>kenneth@whitecape.org</email>
</author>
<published>2025-03-30T16:59:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=61672806b579dd5a150a042ec9383be2bbc2ae7e'/>
<id>urn:sha1:61672806b579dd5a150a042ec9383be2bbc2ae7e</id>
<content type='text'>
Historically, the Vertex Fetcher unit has not been an L3 client.  That
meant that, when a buffer containing vertex data was written to, it was
necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
VF L2 cachelines associated with that buffer, so the new value would be
properly read from memory.

Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
have included an "L3 Bypass Enable" bit which userspace drivers can set
to request that the vertex fetcher unit snoop L3.  However, unlike most
true L3 clients, the "VF Cache Invalidate" bit continues to only
invalidate the VF L2 cache - and not any associated L3 lines.

To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
Bit", which according to the docs, "controls the invalidation of the
Geometry streams cached in L3 cache at the top of the pipe."  In other
words, the vertex and index buffer data that gets cached in L3 when
"L3 Bypass Disable" is set.

Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
Cache Invalidate so that both L2 and L3 vertex data is invalidated.

xe is issuing VF cache invalidates too (which handles cases like CPU
writes to a buffer between GPU batches).  Because userspace may enable
L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.

Fixes significant flickering in Firefox on Meteorlake, which was writing
to vertex buffers via the CPU between batches; the missing L3 Read Only
invalidates were causing the vertex fetcher to read stale data from L3.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
Fixes: 6ef3bb60557d ("drm/xe: enable lite restore")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Kenneth Graunke &lt;kenneth@whitecape.org&gt;
Reviewed-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Add MI_MATH and ALU instruction definitions</title>
<updated>2025-03-12T10:37:50Z</updated>
<author>
<name>Michal Wajdeczko</name>
<email>michal.wajdeczko@intel.com</email>
</author>
<published>2025-03-04T16:23:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b823f80bbd63f67c926a3a0c5dcf246fb8736d7b'/>
<id>urn:sha1:b823f80bbd63f67c926a3a0c5dcf246fb8736d7b</id>
<content type='text'>
The command streamer implements an Arithmetic Logic Unit (ALU)
which supports basic arithmetic and logical operations on two
64-bit operands. Access to this ALU is thru the MI_MATH command
and sixteen General Purpose Register (GPR) 64-bit registers,
which are used as temporary storage.

Bspec: 45737, 60236 # MI
Bspec: 45525, 60132 # ALU
Bspec: 45533, 60309 # GPR
Signed-off-by: Michal Wajdeczko &lt;michal.wajdeczko@intel.com&gt;
Reviewed-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250304162307.1866-1-michal.wajdeczko@intel.com
</content>
</entry>
<entry>
<title>drm/xe: Add MI_LOAD_REGISTER_REG command definition</title>
<updated>2025-03-12T10:37:49Z</updated>
<author>
<name>Michal Wajdeczko</name>
<email>michal.wajdeczko@intel.com</email>
</author>
<published>2025-03-03T17:35:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4383dd88fa77b8489b627125b268c3f1ab934e37'/>
<id>urn:sha1:4383dd88fa77b8489b627125b268c3f1ab934e37</id>
<content type='text'>
The MI_LOAD_REGISTER_REG command reads value from a source register
location and writes that value to a destination register location.

Bspec: 45730, 60233
Signed-off-by: Michal Wajdeczko &lt;michal.wajdeczko@intel.com&gt;
Reviewed-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250303173522.1822-2-michal.wajdeczko@intel.com
</content>
</entry>
</feed>
