aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/xe/instructions (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-07-25drm/xe/xelp: Implement Wa_16010904313Tvrtko Ursulin1-0/+1
Add XeLP workaround 16010904313. The description calls for it to be emitted as the indirect context buffer workaround for render and compute, and from the workaround batch buffer for the other engines. Therefore we plug into the previously added respective top level emission functions. The actual command streamer programming sequence differs from what is described in the PRM, in that it assumes the listed LRCA offset was supposed to actually refer to the location of the CTX_TIMESTAMP register instead of LRCA + 0x180c (which is in GPR space). Latter appears to make more sense under the assumption that multiple writes are helping with restoring the CTX_TIMESTAMP register content from the saved context state. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-8-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-12drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC valueUmesh Nerlige Ramappa1-0/+4
For determining actual job execution time, save the current value of the CTX_TIMESTAMP register rather than the value saved in LRC since the current register value is the closest to the start time of the job. v2: Define MI_STORE_REGISTER_MEM to fix compile error v3: Place MI_STORE_REGISTER_MEM sorted by MI_INSTR (Lucas) Fixes: 65921374c48f ("drm/xe: Emit ctx timestamp copy in ring ops") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250509161159.2173069-6-umesh.nerlige.ramappa@intel.com
2025-03-31drm/xe: Invalidate L3 read-only cachelines for geometry streams tooKenneth Graunke1-0/+1
Historically, the Vertex Fetcher unit has not been an L3 client. That meant that, when a buffer containing vertex data was written to, it was necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any VF L2 cachelines associated with that buffer, so the new value would be properly read from memory. Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER have included an "L3 Bypass Enable" bit which userspace drivers can set to request that the vertex fetcher unit snoop L3. However, unlike most true L3 clients, the "VF Cache Invalidate" bit continues to only invalidate the VF L2 cache - and not any associated L3 lines. To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation Bit", which according to the docs, "controls the invalidation of the Geometry streams cached in L3 cache at the top of the pipe." In other words, the vertex and index buffer data that gets cached in L3 when "L3 Bypass Disable" is set. Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only Cache Invalidate so that both L2 and L3 vertex data is invalidated. xe is issuing VF cache invalidates too (which handles cases like CPU writes to a buffer between GPU batches). Because userspace may enable L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well. Fixes significant flickering in Firefox on Meteorlake, which was writing to vertex buffers via the CPU between batches; the missing L3 Read Only invalidates were causing the vertex fetcher to read stale data from L3. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460 Fixes: 6ef3bb60557d ("drm/xe: enable lite restore") Cc: stable@vger.kernel.org # v6.13+ Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-12drm/xe: Add MI_MATH and ALU instruction definitionsMichal Wajdeczko2-0/+80
The command streamer implements an Arithmetic Logic Unit (ALU) which supports basic arithmetic and logical operations on two 64-bit operands. Access to this ALU is thru the MI_MATH command and sixteen General Purpose Register (GPR) 64-bit registers, which are used as temporary storage. Bspec: 45737, 60236 # MI Bspec: 45525, 60132 # ALU Bspec: 45533, 60309 # GPR Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250304162307.1866-1-michal.wajdeczko@intel.com
2025-03-12drm/xe: Add MI_LOAD_REGISTER_REG command definitionMichal Wajdeczko1-0/+4
The MI_LOAD_REGISTER_REG command reads value from a source register location and writes that value to a destination register location. Bspec: 45730, 60233 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250303173522.1822-2-michal.wajdeczko@intel.com
2025-03-10drm/xe/xe3: Recognize 3DSTATE_COARSE_PIXEL in LRC dumpsMatt Roper1-0/+1
Xe3 adds a new 3DSTATE_COARSE_PIXEL state instruction as part of the render engine LRC. Ensure we can recognize and report this properly in the LRC dumps. Bspec: 65182, 73415 Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307190754.678376-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-02-03drm/xe/pxp: Add VCS inline termination supportDaniele Ceraolo Spurio3-0/+34
The key termination is done with a specific submission to the VCS engine. This flow will be triggered in response to a termination interrupt, whose handling is coming in a follow-up patch in the series. v2: clean up defines and command emission code. (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-4-daniele.ceraolospurio@intel.com
2025-01-02Revert "drm/xe: Force write completion of MI_STORE_DATA_IMM"José Roberto de Souza1-7/+6
This reverts commit 1460bb1fef9ccf7390af0d74a15252442fd6effd. In all places the MI_STORE_DATA_IMM are not followed by a read of the same memory address in the same batch buffer and the posted writes are flushed with PIPE_CONTROL or MI_FLUSH_DW in xe_ring_ops.c functions so there is no need to set this register. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Fixes: 1460bb1fef9c ("drm/xe: Force write completion of MI_STORE_DATA_IMM") Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241227183230.101334-1-jose.souza@intel.com
2024-12-18drm/xe: Force write completion of MI_STORE_DATA_IMMJosé Roberto de Souza1-6/+7
With Force write completion unset there is no guarantees of when the write will be globally visible what is not the behavior wanted. Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241217160732.46280-1-jose.souza@intel.com
2024-06-18drm/xe/oa: Add OAR supportAshutosh Dixit1-0/+1
Add OAR support to allow userspace to execute MI_REPORT_PERF_COUNT on render engines. Configuration batches are used to program the OAR unit, as well as modifying the render engine context image of a specified exec queue (to have correct register values when that context switches in). v2: Rename/refactor xe_oa_modify_self (Umesh) v3: Move IS_MI_LRI_CMD() into xe_oa.c (Michal) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-11-ashutosh.dixit@intel.com
2024-06-12drm/xe: Add MI_COPY_MEM_MEM GPU instruction definitionsMatthew Brost1-0/+4
MI_COPY_MEM_MEM GPU instructions are used to copy ctx timestamp from a LRC registers to another location at the beginning of every jobs execution. Add MI_COPY_MEM_MEM GPU instruction definitions. v2: - Include MI_COPY_MEM_MEM based on instruction order (Michal) - Fix tabs/spaces issue (Michal) - Use macro for DW definition (Michal) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240611144053.2805091-3-matthew.brost@intel.com
2024-05-09drm/xe: Move xe_gpu_commands.h file to instructions/Michal Wajdeczko1-0/+70
All other files with commands definitions are in instructions/ folder. Move xe_gpu_commands.h also there. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240508174856.1908-1-michal.wajdeczko@intel.com
2024-02-29drm/xe: Add LRC parsing for more GPU instructionsMatt Roper3-0/+22
The LRCs on some of our newer platforms appear to contain a few GPU instructions that weren't handled in our LRC parser. Add the relevant instruction names and opcodes so that our debugfs LRC dumps will properly indicate what these are. Bspec: 55866, 64848, 46931 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240222184009.6857-2-matthew.d.roper@intel.com
2023-12-21drm/xe: Add command MI_LOAD_REGISTER_MEMMichal Wajdeczko1-0/+3
We will need this shortly during context state preparation. Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20231214185955.1791-2-michal.wajdeczko@intel.com Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
2023-12-21drm/xe/gsc: Query GSC compatibility versionDaniele Ceraolo Spurio1-0/+2
The version is obtained via a dedicated MKHI GSC HECI command. The compatibility version is what we want to match against for the GSC, so we need to call the FW version checker after obtaining the version. Since this is the first time we send a GSC HECI command via the GSCCS, this patch also introduces common infrastructure to send such commands to the GSC. Communication with the GSC FW is done via input/output buffers, whose addresses are provided via a GSCCS command. The buffers contain a generic header and a client-specific packet (e.g. PXP, HDCP); the clients don't care about the header format and/or the GSCCS command in the batch, they only care about their client-specific header. This patch therefore introduces helpers that allow the callers to automatically fill in the input header, submit the GSCCS job and decode the output header, to make it so that the caller only needs to worry about their client-specific input and output messages. v3: squash of 2 separate patches ahead of merge, so that the common functions and their first user are added at the same time Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Suraj Kandpal <suraj.kandpal@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.Com> #v1 Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe/gsc: GSC FW loadDaniele Ceraolo Spurio2-0/+35
The GSC FW must be copied in a 4MB stolen memory allocation, whose GGTT address is then passed as a parameter to a dedicated load instruction submitted via the GSC engine. Since the GSC load is relatively slow (up to 250ms), we perform it asynchronously via a worker. This requires us to make sure that the worker has stopped before suspending/unloading. Note that we can't yet use xe_migrate_copy for the copy because it doesn't work with stolen memory right now, so we do a memcpy from the CPU side instead. v2: add comment about timeout value, fix GSC status checking before load (John) Bspec: 65306, 65346 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe/xe2: Update SVG state handlingMatt Roper1-0/+1
As with DG2/MTL, Xe2 also fails to emit instruction headers for SVG state instructions if no explicit state has been set. The SVG part of the LRC is nearly identical to DG2/MTL; the only change is that 3DSTATE_DRAWING_RECTANGLE has been replaced by 3DSTATE_DRAWING_RECTANGLE_FAST, so we can just re-use the same state table and handle that single instruction when we encounter it. Bspec: 65182 Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20231025151732.3461842-8-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: Emit SVG state on RCS during driver load on DG2 and MTLMatt Roper1-0/+51
When recording the default LRC, the expectation is that the hardware's original state settings (both register and instruction) will be written out to the LRC upon first context switch. For many 3DSTATE_* state instructions that don't truly have "default" values, this translates to a simple instruction header (opcodes + dword length) being written to the LRC, followed by an appropriate number of blank dwords as a place holder. When userspace creates a context (which starts as a copy of the default LRC), they'll generally emit real 3DSTATE_* as part of their initialization to select the settings they desire. If they don't emit one of the 3DSTATE instructions, then the zeroed dwords that remain in their LRC image generally translate to various state remaining disabled. This will either be what userspace wants or will lead to very reproducible and easily-debugged problems (rendering glitches, engine hangs). It turns out that a subset of the 3DSTATE instructions, specifically those belonging to the SVG (State Variable - Global) unit, are not only emitting 0's for the instruction's "body" dwords, but also for the instruction header dword if no specific state has been explicitly set before context switch. This means that when the hardware switches to a context that hasn't explicitly provided an appropriate state setting, the hardware will just see a sequence of NOOPs in the spot reserved for that 3DSTATE instruction while executing the LRC, and the actual hardware state setting will unintentionally inherit the configuration used by the previously running context. Now when userspace makes a mistake and forgets to emit an important state instruction they no longer get consistent, easily-reproducible corruption/hangs, but rather erratic behavior where the presence/absence of a problem depends on what other workloads are running on the system and what order the contexts are scheduled on the engine. A specific example of this that came up recently related to mesh shading The OpenGL driver was not specifically emitting a 3DSTATE_MESH_CONTROL to disable mesh shading at context init, so on context switch, mesh shading would either be on or off depending on what the previous context had been doing. Vulkan apps _were_ enabling mesh shading, so running a Vulkan app and then context switching to an OpenGL app resulted in mesh shading still unexpectedly being enabled during OpenGL operation, and since other Mesh-related state was not properly initialized for that context a GPU hang was seen. Due to the specific ordering requirements (Vulkan app runs first, followed by OpenGL app), it took additional debug effort to track down the cause of the problem. There are various workarounds related to this behavior, with current implementations handled in the userspace drivers. E.g., Wa_14019789679 and Wa_22018402687. However it's been suggested that the kernel driver can help simplify things here by emitting zeroed SVG state with proper instruction headers as part of our default context creation (i.e., at the same point we apply LRC workarounds). This will help ensure that any future cases where a userspace driver does not emit an important state setting will result in consistent behavior. Bspec: 46261 Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20231025151732.3461842-7-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe/debugfs: Include GFXPIPE commands in LRC dumpMatt Roper1-0/+108
RCS and CCS engines include several non-register gfxpipe commands in their LRC images. Include these in the dump output so that we can see exactly what's inside the context snapshot. v2: - Include raw instruction header in output - Add 3DSTATE_AMFS_TEXTURE_POINTERS and 3DSTATE_MONOFILTER_SIZE. The first was supposed to be removed in Xe_HPG, and the second by gen12, but both still show up in the RCS LRC. v3: - Sanity check that we don't have numdw > remaining_dw. (Lucas) Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-14-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe/debugfs: Add dump of default LRCs' MI instructionsMatt Roper2-0/+4
For non-RCS engines, nearly all of the LRC state is composed of MI instructions (specifically MI_LOAD_REGISTER_IMM). Providing a dump interface allows us to verify that the context image layout matches what's documented in the bspec, and also allows us to check whether LRC workarounds are being properly captured by the default state we record at startup. For now, the non-MI instructions found in the RCS and CCS engines will dump as "unknown;" parsing of those will be added in a follow-up patch. v2: - Add raw instruction header as well as decoded meaning. (Lucas) - Check that num_dw isn't greater than remaining_dw for instructions that have a "# dwords" field. (Lucas) - Clarify comment about skipping over ppHWSP. (Lucas) Bspec: 64993 Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21drm/xe: Extract MI_* instructions to their own headerMatt Roper2-0/+89
Extracting the common MI_* instructions that can be used with any engine to their own header will make it easier as we add additional engine instructions in upcoming patches. Also, since the majority of GPU instructions (both MI and non-MI) have a "length" field in bits 7:0 of the instruction header, a common define is added for that. Instruction-specific length fields are still defined for special case instructions that have larger/smaller length fields. v2: - Use "instr" instead of "inst" as the short form of "instruction" everywhere. (Lucas) - Include xe_reg_defs.h instead of the i915 compat header. (Lucas) Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-12-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>