linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message (Collapse)	Author	Files	Lines
2025-11-07	drm/xe: Enforce correct user fence signaling order using	Matthew Brost	1	-0/+14
	Prevent application hangs caused by out-of-order fence signaling when user fences are attached. Use drm_syncobj (via dma-fence-chain) to guarantee that each user fence signals in order, regardless of the signaling order of the attached fences. Ensure user fence writebacks to user space occur in the correct sequence. v7: - Skip drm_syncbj create of error (CI) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251031234050.3043507-2-matthew.brost@intel.com (cherry picked from commit adda4e855ab6409a3edaa585293f1f2069ab7299) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-16	drm/xe: Fix error handling if PXP fails to start	Daniele Ceraolo Spurio	1	-7/+15
	Since the PXP start comes after __xe_exec_queue_init() has completed, we need to cleanup what was done in that function in case of a PXP start error. __xe_exec_queue_init calls the submission backend init() function, so we need to introduce an opposite for that. Unfortunately, while we already have a fini() function pointer, it performs other operations in addition to cleaning up what was done by the init(). Therefore, for clarity, the existing fini() has been renamed to destroy(), while a new fini() has been added to only clean up what was done by the init(), with the latter being called by the former (via xe_exec_queue_fini). Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com
2025-08-27	drm/xe: s/tlb_invalidation/tlb_inval	Matthew Brost	1	-1/+1
	tlb_invalidation is a bit verbose leading to ugly wraps in the code, shorten to tlb_inval. Signed-off-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250826182911.392550-4-stuart.summers@intel.com
2025-08-08	drm/xe/vf: Refactor CCS save/restore to use default migration context	Satyanarayana K V P	1	-0/+15
	Previously, CCS save/restore operations created separate migration contexts with new VM memory allocations, resulting in significant overhead. This commit eliminates redundant context creation reusing the default migration context by registering new execution queues for CCS save and restore on the existing migrate VM. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Suggested-by: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250808073628.32745-2-satyanarayana.k.v.p@intel.com
2025-08-04	drm/xe/vf: Refresh utilization buffer during migration recovery	Tomasz Lis	1	-1/+9
	The WA buffer we use to capture context utilization contains GGTT references. This means its instructions have to be either fixed or re-emitted during VF post-migration recovery. This patch adds re-emitting content of the utilization WA BB during the recovery. The way we write to vram requires scratch buffer to be used before the whole block is memcopied. We are re-using a scratch buffer introduced in earlier part of the recovery. This is not a performance optimization, but a necessity to avoid creating dependencies between locks. v2: Notable rebase after "Prepare WA BB setup for more users" patch v3: Added error propagation Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-8-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-08-04	drm/xe/vf: Post migration, repopulate ring area for pending request	Tomasz Lis	1	-0/+24
	The commands within ring area allocated for a request may contain references to GGTT. These references require update after VF migration, in order to continue any preempted LRCs, or jobs which were emitted to the ring but not sent to GuC yet. This change calls the emit function again for all such jobs, as part of post-migration recovery. v2: Moved few functions to better files v3: Take job_list_lock v4: Rephrased comments Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-7-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-08-04	drm/xe/vf: Rebase MEMIRQ structures for all contexts after migration	Tomasz Lis	1	-2/+5
	All contexts require an update of state data, as the data includes GGTT references to memirq-related buffers. Default contexts need these references updated as well, because they are not refreshed when a new context is created from them. The way we write to vram requires scratch buffer to be used before the whole block is memcopied. Since using kalloc() within specific recovery functions would lead to unintended relations between locks, we are allocating the buffer earlier, before any locks are taken. The same buffer will be used for other steps of the recovery. v2: Update addresses by xe_lrc_write_ctx_reg() rather than set_memory_based_intr() v3: Renamed parameter, reordered parameters in some functs v4: Check if have MEMIRQ, move `xe_gt*` funct to proper file v5: Revert back to requiring scratch buffer, but allocate it earlier this time Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-6-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-08-04	drm/xe/vf: Rebase HWSP of all contexts after migration	Tomasz Lis	1	-0/+13
	All contexts require an update due to GGTT range shift, as that affects their HWSP. The HW status page of a context contains GGTT references, which need to be shifted to a new range (or re-computed using the previously updated vma nodes). The references include ring start address and indirect state address. v2: move some functions to better matched files v3: Add missing kerneldocs v4: Style fix Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-5-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-07-24	drm/xe: Add dependency scheduler for GT TLB invalidations to bind queues	Matthew Brost	1	-0/+48
	Add a generic dependency scheduler for GT TLB invalidations, used to schedule jobs that issue GT TLB invalidations to bind queues. v2: - Use shared GT TLB invalidation queue for dep scheduler - Break allocation of dep scheduler into its own function - Add define for max number tlb invalidations - Skip media if not present Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250724191216.4076566-5-matthew.brost@intel.com
2025-07-02	drm/xe: Don't compare GT ID to GT count when determining valid GTs	Matt Roper	1	-1/+1
	On current platforms with multiple GTs, all of the GT IDs are consecutive; as a result we know that the GT IDs range from 0 to gt_count-1 and can determine if a GT ID is valid by comparing against the count. The consecutive nature of GT IDs may not hold true on future platforms if/when we have platforms that are both multi-tile and have multiple GTs within each tile. Once such platforms exist, it's quite possible that we could wind up with something like a GT list composed of IDs 0, 2, and 3 with no GT 1 (which would be a 2-tile platform with media only on the second tile). To future-proof the code we should stop comparing against the GT count to determine whether a GT ID is valid or not. Instead we should do an actual lookup of the ID to determine whether the GT exists. This also means that our GT loop macro should not end at the GT count, but should rather examine the entire space up to (# of tiles) * (max GT per tile) to ensure it doesn't stop prematurely. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250701201320.2514369-15-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-06-02	drm/xe: remove unmatched xe_vm_unlock() from __xe_exec_queue_init()	Maciej Patelczyk	1	-5/+1
	There is unmatched xe_vm_unlock() in the __xe_exec_queue_init(). Leftover from commit fbeaad071a98 ("drm/xe: Create LRC BO without VM") Fixes: fbeaad071a98 ("drm/xe: Create LRC BO without VM") Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Link: https://lore.kernel.org/r/20250530135627.2821612-1-maciej.patelczyk@intel.com
2025-05-29	drm/xe: Create LRC BO without VM	Niranjana Vishwanathapura	1	-9/+0
	Specifying VM during lrc->bo creation requires VM's reference to be held for the lifetime of lrc->bo as it will use VM's dma reservation object. Using VM's dma reservation object for lrc->bo doesn't provide any advantage. Hence do not pass VM while creating lrc->bo. v2: Use xe_bo_unpin_map_no_vm (Matthew Brost) Fixes: 264eecdba211 ("drm/xe: Decouple xe_exec_queue and xe_lrc") Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250529052031.2429120-2-niranjana.vishwanathapura@intel.com
2025-05-12	drm/xe: Add WA BB to capture active context utilization	Umesh Nerlige Ramappa	1	-1/+1
	Context Timestamp (CTX_TIMESTAMP) in the LRC accumulates the run ticks of the context, but only gets updated when the context switches out. In order to check how long a context has been active before it switches out, two things are required: (1) Determine if the context is running: To do so, we program the WA BB to set an initial value for CTX_TIMESTAMP in the LRC. The value chosen is 1 since 0 is the initial value when the LRC is initialized. During a query, we just check for this value to determine if the context is active. If the context switched out, it would overwrite this location with the actual CTX_TIMESTAMP MMIO value. Note that WA BB runs as the last part of the context restore, so reusing this LRC location will not clobber anything. (2) Calculate the time that the context has been active for: The CTX_TIMESTAMP ticks only when the context is active. If a context is active, we just use the CTX_TIMESTAMP MMIO as the new value of utilization. While doing so, we need to read the CTX_TIMESTAMP MMIO for the specific engine instance. Since we do not know which instance the context is running on until it is scheduled, we also read the ENGINE_ID MMIO in the WA BB and store it in the PPHSWP. Using the above 2 instructions in a WA BB, capture active context utilization. v2: (Matt Brost) - This breaks TDR, fix it by saving the CTX_TIMESTAMP register "drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value" - Drop tile from LRC if using gt "drm/xe: Save the gt pointer in LRC and drop the tile" v3: - Remove helpers for bb_per_ctx_ptr (Matt) - Add define for context active value (Matt) - Use 64 bit CTX TIMESTAMP for platforms that support it. For platforms that don't, live with the rare race. (Matt, Lucas) - Convert engine id to hwe and get the MMIO value (Lucas) - Correct commit message on when WA BB runs (Lucas) v4: - s/GRAPHICS_VER(...)/xe->info.has_64bit_timestamp/ (Matt) - Drop support for active utilization on a VF (CI failure) - In xe_lrc_init ensure the lrc value is 0 to begin with (CI regression) v5: - Minor checkpatch fix - Squash into previous commit and make TDR use 32-bit time - Update code comment to match commit msg Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4532 Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250509161159.2173069-8-umesh.nerlige.ramappa@intel.com
2025-05-07	drm/xe: Use copy_from_user() instead of __copy_from_user()	Harish Chegondi	1	-5/+4
	copy_from_user() has more checks and is more safer than __copy_from_user() Suggested-by: Kees Cook <kees@kernel.org> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/acabf20aa8621c7bc8de09b1bffb8d14b5376484.1746126614.git.harish.chegondi@intel.com
2025-03-06	drm/xe: Allow fault injection in exec queue IOCTLs	Francois Dugast	1	-0/+1
	Use fault injection infrastructure to allow specific functions to be configured over debugfs for failing during the execution of xe_exec_queue_create_ioctl(). xe_exec_queue_destroy_ioctl() and xe_exec_queue_get_property_ioctl() are not considered as there is no unwinding code to test with fault injection. This allows more thorough testing from user space by going through code paths for error handling and unwinding which cannot be reached by simply injecting errors in IOCTL arguments. This can help increase code robustness. The corresponding IGT series is: https://patchwork.freedesktop.org/series/144138/ Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250305150659.46276-1-francois.dugast@intel.com Signed-off-by: Francois Dugast <francois.dugast@intel.com>
2025-03-05	drm/xe/uapi: Use hint for guc to set GT frequency	Tejas Upadhyay	1	-3/+7
	Allow user to provide a low latency hint. When set, KMD sends a hint to GuC which results in special handling for that process. SLPC will ramp the GT frequency aggressively every time it switches to this process. We need to enable the use of SLPC Compute strategy during init, but it will apply only to processes that set this bit during process creation. Improvement with this approach as below: Before, :~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency Platform: Intel(R) OpenCL Graphics Device: Intel(R) Graphics [0xe20b] Driver version : 24.52.0 (Linux x64) Compute units : 160 Clock frequency : 2850 MHz Kernel launch latency : 283.16 us After, :~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency Platform: Intel(R) OpenCL Graphics Device: Intel(R) Graphics [0xe20b] Driver version : 24.52.0 (Linux x64) Compute units : 160 Clock frequency : 2850 MHz Kernel launch latency : 63.38 us Compute PR: https://github.com/intel/compute-runtime/pull/794 Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214 IGT PR: https://patchwork.freedesktop.org/patch/639989/ V10(Lucas): - Remove doc from drm-uapi.rst v9(Vinay): - remove extra line, align commit message v8(Vinay): - Add separate example for using low latency hint v7(Jose): - Update UMD PR - applicable to all gpus V6: - init flags, remove redundant flags check (MAuld) V5: - Move uapi doc to documentation and GuC ABI specific change (Rodrigo) - Modify logic to restrict exec queue flags (MAuld) V4: - To make it clear, dont use exec queue word (Vinay) - Correct typo in description of flag (Jose/Vinay) - rename set_strategy api and replace ctx with exec queue(Vinay) - Start with 0th bit to indentify user flags (Jose) V3: - Conver user flag to kernel internal flag and use (Oak) - Support query config for use to check kernel support (Jose) - Dont need to take runtime pm (Vinay) V2: - DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 planned for other hint(Szymon) - Add motivation to description (Lucas) Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250228070224.739295-2-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2025-02-19	drm/xe: Drop unnecessary GT lookup in xe_exec_queue_create_ioctl()	Matt Roper	1	-4/+2
	xe_exec_queue_create_ioctl() performs a lookup of the xe_gt for the GT ID passed from userspace, but the result is never actually used. Since there's already a separate (and earlier) check that the ID passed from userspace is valid, the unnecessary lookup can be removed. Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250218200511.4050060-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-02-03	drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues	Daniele Ceraolo Spurio	1	-2/+54
	Userspace is required to mark a queue as using PXP to guarantee that the PXP instructions will work. In addition to managing the PXP sessions, when a PXP queue is created the driver will set the relevant bits in its context control register. On submission of a valid PXP queue, the driver will validate all encrypted objects mapped to the VM to ensured they were encrypted with the current key. v2: Remove pxp_types include outside of PXP code (Jani), better comments and code cleanup (John) v3: split the internal PXP management to a separate patch for ease of review. re-order ioctl checks to always return -EINVAL if parameters are invalid, rebase on msix changes. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-9-daniele.ceraolospurio@intel.com
2025-02-03	drm/xe/pxp: Add PXP queue tracking and session start	Daniele Ceraolo Spurio	1	-0/+1
	We expect every queue that uses PXP to be marked as doing so, to allow the driver to correctly manage the encryption status. The API for doing this from userspace is coming in the next patch, while this patch implement the management side of things. When a PXP queue is created, the driver will do the following: - Start the default PXP session if it is not already running; - assign an rpm ref to the queue to keep for its lifetime (this is required because PXP HWDRM sessions are killed by the HW suspend flow). Since PXP start and termination can race each other, this patch also introduces locking and a state machine to keep track of the pending operations. Note that since we'll need to take the lock from the suspend/resume paths as well, we can't do submissions while holding it, which means we need a slightly more complicated state machine to keep track of intermediate steps. v4: new patch in the series, split from the following interface patch to keep review manageable. Lock and status rework to not do submissions under lock. v5: Improve comments and error logs (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-8-daniele.ceraolospurio@intel.com
2025-02-03	drm/xe/pxp: Allocate PXP execution resources	Daniele Ceraolo Spurio	1	-0/+3
	PXP requires submissions to the HW for the following operations 1) Key invalidation, done via the VCS engine 2) Communication with the GSC FW for session management, done via the GSCCS. Key invalidation submissions are serialized (only 1 termination can be serviced at a given time) and done via GGTT, so we can allocate a simple BO and a kernel queue for it. Submissions for session management are tied to a PXP client (identified by a unique host_session_id); from the GSC POV this is a user-accessible construct, so all related submission must be done via PPGTT. The driver does not currently support PPGTT submission from within the kernel, so to add this support, the following changes have been included: - a new type of kernel-owned VM (marked as GSC), required to ensure we don't use fault mode on the engine and to mark the different lock usage with lockdep. - a new function to map a BO into a VM from within the kernel. v2: improve comments and function name, remove unneeded include (John) v3: fix variable/function names in documentation Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-3-daniele.ceraolospurio@intel.com
2025-01-10	Merge tag 'v6.13-rc6' into drm-next	Dave Airlie	1	-1/+0
	This backmerges Linux 6.13-rc6 this is need for the newer pulls. Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-12-23	drm/xe: Fix fault on fd close after unbind	Lucas De Marchi	1	-0/+9
	If userspace holds an fd open, unbinds the device and then closes it, the driver shouldn't try to access the hardware. Protect it by using drm_dev_enter()/drm_dev_exit(). This fixes the following page fault: <6> [IGT] xe_wedged: exiting, ret=98 <1> BUG: unable to handle page fault for address: ffffc901bc5e508c <1> #PF: supervisor read access in kernel mode <1> #PF: error_code(0x0000) - not-present page ... <4> xe_lrc_update_timestamp+0x1c/0xd0 [xe] <4> xe_exec_queue_update_run_ticks+0x50/0xb0 [xe] <4> xe_exec_queue_fini+0x16/0xb0 [xe] <4> __guc_exec_queue_fini_async+0xc4/0x190 [xe] <4> guc_exec_queue_fini_async+0xa0/0xe0 [xe] <4> guc_exec_queue_fini+0x23/0x40 [xe] <4> xe_exec_queue_destroy+0xb3/0xf0 [xe] <4> xe_file_close+0xd4/0x1a0 [xe] <4> drm_file_free+0x210/0x280 [drm] <4> drm_close_helper.isra.0+0x6d/0x80 [drm] <4> drm_release_noglobal+0x20/0x90 [drm] Fixes: 514447a12190 ("drm/xe: Stop accumulating LRC timestamp on job_free") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3421 Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241218053122.2730195-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 4ca1fd418338d4d135428a0eb1e16e3b3ce17ee8) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-12-20	drm/xe: Use q->xef for accessing xe file	Lucas De Marchi	1	-8/+4
	No need to traverse through the vm object as each exec queue maintains a reference to xe_file. Also improve/simplify the comment on why xef is checked. Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241218053122.2730195-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-12-20	drm/xe: Fix fault on fd close after unbind	Lucas De Marchi	1	-0/+9
	If userspace holds an fd open, unbinds the device and then closes it, the driver shouldn't try to access the hardware. Protect it by using drm_dev_enter()/drm_dev_exit(). This fixes the following page fault: <6> [IGT] xe_wedged: exiting, ret=98 <1> BUG: unable to handle page fault for address: ffffc901bc5e508c <1> #PF: supervisor read access in kernel mode <1> #PF: error_code(0x0000) - not-present page ... <4> xe_lrc_update_timestamp+0x1c/0xd0 [xe] <4> xe_exec_queue_update_run_ticks+0x50/0xb0 [xe] <4> xe_exec_queue_fini+0x16/0xb0 [xe] <4> __guc_exec_queue_fini_async+0xc4/0x190 [xe] <4> guc_exec_queue_fini_async+0xa0/0xe0 [xe] <4> guc_exec_queue_fini+0x23/0x40 [xe] <4> xe_exec_queue_destroy+0xb3/0xf0 [xe] <4> xe_file_close+0xd4/0x1a0 [xe] <4> drm_file_free+0x210/0x280 [drm] <4> drm_close_helper.isra.0+0x6d/0x80 [drm] <4> drm_release_noglobal+0x20/0x90 [drm] Fixes: 83db047d9425 ("drm/xe: Stop accumulating LRC timestamp on job_free") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3421 Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241218053122.2730195-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-12-13	drm/xe: Initial MSI-X support for HW engines	Ilia Levi	1	-1/+3
	- Configure the HW engines to work with MSI-X - Program the LRC to use memirq infra (similar to VF) - CS_INT_VEC field added to the LRC Bspec: 60342, 72547 Signed-off-by: Ilia Levi <ilia.levi@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241213072538.6823-3-ilia.levi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-11-14	drm/xe: Allow fault injection in vm create and vm bind IOCTLs	Francois Dugast	1	-0/+1
	Use fault injection infrastructure to allow specific functions to be configured over debugfs for failing during the execution of xe_vm_create_ioctl() and xe_vm_bind_ioctl(). This allows more thorough testing from user space by going through code paths for error handling and unwinding which cannot be reached by simply injecting errors in IOCTL arguments. This can help increase code robustness. v2: Add xe_pt_update_ops_{prepare,run} (Matthew Brost) Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241113162212.2154103-1-francois.dugast@intel.com Signed-off-by: Francois Dugast <francois.dugast@intel.com>
2024-11-13	drm/xe: Wait on killed exec queues	Lucas De Marchi	1	-0/+6
	When an exec queue is killed it triggers an async process of asking the GuC to schedule the context out. The timestamp in the context image is only updated when this process completes. In case a userspace process kills an exec and tries to read the timestamp, it may not get an updated runtime. Add synchronization between the process reading the fdinfo and the exec queue being killed. After reading all the timestamps, wait on exec queues in the process of being killed. When that wait is over, xe_exec_queue_fini() was already called and updated the timestamps. v2: Do not update pending_removal before validating user args (Matthew Auld) v3: Move wait on pending to be done before getting any timestamp so it's more likely for the gpu and exec queue timestamps to be closer together Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667 Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241108053318.3483678-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-05	drm/xe: Stop accumulating LRC timestamp on job_free	Lucas De Marchi	1	-0/+6
	The exec queue timestamp is only really useful when it's being queried through the fdinfo. There's no need to update it so often, on every job_free. Tracing a simple app like vkcube running shows an update rate of ~ 120Hz. In case of discrete, the BO is on vram, creating a lot of pcie transactions. The update on job_free() is used to cover a gap: if exec queue is created and destroyed rapidly, before a new query, the timestamp still needs to be accumulated and accounted for in the xef. Initial implementation in commit 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime") couldn't do it on the exec_queue_fini since the xef could be gone at that point. However since commit ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") the xef is refcounted and the exec queue always holds a reference, making this safe now. Improve the fix in commit 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") by reducing the frequency in which the update is needed. Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241104143815.2112272-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 83db047d9425d9a649f01573797558eff0f632e1) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-05	drm/xe: Stop accumulating LRC timestamp on job_free	Lucas De Marchi	1	-0/+6
	The exec queue timestamp is only really useful when it's being queried through the fdinfo. There's no need to update it so often, on every job_free. Tracing a simple app like vkcube running shows an update rate of ~ 120Hz. In case of discrete, the BO is on vram, creating a lot of pcie transactions. The update on job_free() is used to cover a gap: if exec queue is created and destroyed rapidly, before a new query, the timestamp still needs to be accumulated and accounted for in the xef. Initial implementation in commit 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime") couldn't do it on the exec_queue_fini since the xef could be gone at that point. However since commit ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") the xef is refcounted and the exec queue always holds a reference, making this safe now. Improve the fix in commit 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") by reducing the frequency in which the update is needed. Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241104143815.2112272-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03	drm/xe/queue: move xa_alloc to prevent UAF	Matthew Auld	1	-1/+3
	Evil user can guess the next id of the queue before the ioctl completes and then call queue destroy ioctl to trigger UAF since create ioctl is still referencing the same queue. Move the xa_alloc all the way to the end to prevent this. v2: - Rebase Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240925071426.144015-4-matthew.auld@intel.com (cherry picked from commit 16536582ddbebdbdf9e1d7af321bbba2bf955a87) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03	drm/xe: Clean up VM / exec queue file lock usage.	Matthew Brost	1	-2/+0
	Both the VM / exec queue file lock protect the lookup and reference to the object, nothing more. These locks are not intended anything else underneath them. XA have their own locking too, so no need to take the VM / exec queue file lock aside from when doing a lookup and reference get. Add some kernel doc to make this clear and cleanup a few typos too. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240921011712.2681510-1-matthew.brost@intel.com (cherry picked from commit fe4f5d4b661666a45b48fe7f95443f8fefc09c8c) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-09-27	drm/xe/queue: move xa_alloc to prevent UAF	Matthew Auld	1	-1/+3
	Evil user can guess the next id of the queue before the ioctl completes and then call queue destroy ioctl to trigger UAF since create ioctl is still referencing the same queue. Move the xa_alloc all the way to the end to prevent this. v2: - Rebase Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240925071426.144015-4-matthew.auld@intel.com
2024-09-24	drm/xe: Clean up VM / exec queue file lock usage.	Matthew Brost	1	-2/+0
	Both the VM / exec queue file lock protect the lookup and reference to the object, nothing more. These locks are not intended anything else underneath them. XA have their own locking too, so no need to take the VM / exec queue file lock aside from when doing a lookup and reference get. Add some kernel doc to make this clear and cleanup a few typos too. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240921011712.2681510-1-matthew.brost@intel.com
2024-09-12	drm/xe: fix missing 'xe_vm_put'	Dafna Hirschfeld	1	-1/+3
	Fix memleak caused by missing xe_vm_put Fixes: 852856e3b6f6 ("drm/xe: Use reserved copy engine for user binds on faulting devices") Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240901044227.1177211-1-dhirschfeld@habana.ai Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 249df8cbecf0ab4877eab66cae857748631831a9) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-09-06	drm/xe: fix missing 'xe_vm_put'	Dafna Hirschfeld	1	-1/+3
	Fix memleak caused by missing xe_vm_put Fixes: 852856e3b6f6 ("drm/xe: Use reserved copy engine for user binds on faulting devices") Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240901044227.1177211-1-dhirschfeld@habana.ai Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-08-28	drm/xe: replace #include <drm/xe_drm.h> with <uapi/drm/xe_drm.h>	Jani Nikula	1	-1/+1
	include/drm/xe_drm.h does not exist. Prefer the explicit uapi include. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240827091539.4136838-1-jani.nikula@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-08-19	drm/xe: prevent UAF around preempt fence	Matthew Auld	1	-1/+0
	The fence lock is part of the queue, therefore in the current design anything locking the fence should then also hold a ref to the queue to prevent the queue from being freed. However, currently it looks like we signal the fence and then drop the queue ref, but if something is waiting on the fence, the waiter is kicked to wake up at some later point, where upon waking up it first grabs the lock before checking the fence state. But if we have already dropped the queue ref, then the lock might already be freed as part of the queue, leading to uaf. To prevent this, move the fence lock into the fence itself so we don't run into lifetime issues. Alternative might be to have device level lock, or only release the queue in the fence release callback, however that might require pushing to another worker to avoid locking issues. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2454 References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2342 References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2020 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240814110129.825847-2-matthew.auld@intel.com
2024-08-17	drm/xe/exec_queue: Prepare last fence for hw engine group resume context	Francois Dugast	1	-2/+31
	Ensure we can safely take a ref of the exec queue's last fence from the context of resuming jobs from the hw engine group. The locking requirements differ from the general case, hence the introduction of this new function. v2: Add kernel doc, rework the code to prevent code duplication v3: Fix kernel doc, remove now unnecessary lockdep variants (Matt Brost) v4: Remove new put function (Matt Brost) Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-7-francois.dugast@intel.com
2024-08-17	drm/xe/exec_queue: Remove duplicated code	Francois Dugast	1	-4/+1
	This code section is the same as the body of xe_exec_queue_last_fence_put_unlocked() so call the function instead and remove duplicated code to make maintenance easier. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-6-francois.dugast@intel.com
2024-08-17	'drm/xe/hw_engine_group: Register hw engine group's exec queues	Francois Dugast	1	-0/+11
	Add helpers to safely add and delete the exec queues attached to a hw engine group, and make use them at the time of creation and destruction of the exec queues. Keeping track of them is required to control the execution mode of the hw engine group. v2: Improve error handling and robustness, suspend exec queues created in fault mode if group in dma-fence mode, init queue link (Matt Brost) v3: Delete queue from hw engine group when it is destroyed by the user, also clean up at the time of closing the file in case the user did not destroy the queue v4: Use correct list when checking if empty, do not add the queue if VM is in xe_vm_in_preempt_fence_mode (Matt Brost) v5: Remove unrelated newline, add checks and asserts for group, unwind on suspend failure (Matt Brost) Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-4-francois.dugast@intel.com
2024-08-17	drm/xe: Use reserved copy engine for user binds on faulting devices	Matthew Brost	1	-59/+63
	User binds map to engines with can fault, faults depend on user binds completion, thus we can deadlock. Avoid this by using reserved copy engine for user binds on faulting devices. While we are here, normalize bind queue creation with a helper. v2: - Pass in extensions to bind queue creation (CI) v3: - s/resevered/reserved (Lucas) - Fix NULL hwe check (Jonathan) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240816034033.53837-1-matthew.brost@intel.com
2024-08-15	drm/xe: Make exec_queue_kill safe to call twice	Daniele Ceraolo Spurio	1	-0/+9
	An upcoming PXP patch will kill queues at runtime when a PXP invalidation event occurs, so we need exec_queue_kill to be safe to call multiple times. v2: Add documentation (Matt B) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240814205654.1716586-1-daniele.ceraolospurio@intel.com
2024-08-09	drm/xe: Move VM dma-resv lock from xe_exec_queue_create to __xe_exec_queue_init	Matthew Brost	1	-9/+14
	The critical section which requires the VM dma-resv is the call xe_lrc_create in __xe_exec_queue_init. Move this lock to __xe_exec_queue_init holding it just around xe_lrc_create. Not only is good practice, this also fixes a locking double of the VM dma-resv in the error paths of __xe_exec_queue_init as xe_lrc_put tries to acquire this too resulting in a deadlock. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240724152831.1848325-1-matthew.brost@intel.com
2024-07-30	drm/xe: Move and export xe_hw_engine lookup.	Dominik Grzegorzek	1	-33/+4
	Move and export xe_hw_engine lookup. This is in preparation to use this in eudebug code where we want to find active engine. v2: s/tile/gt due to uapi changes (Mika) Signed-off-by: Dominik Grzegorzek <dominik.grzegorzek@intel.com> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240729130152.100130-1-mika.kuoppala@linux.intel.com
2024-07-19	drm/xe: Fix use after free when client stats are captured	Umesh Nerlige Ramappa	1	-1/+9
	xe_file_close triggers an asynchronous queue cleanup and then frees up the xef object. Since queue cleanup flushes all pending jobs and the KMD stores client usage stats into the xef object after jobs are flushed, we see a use-after-free for the xef object. Resolve this by taking a reference to xef from xe_exec_queue. While at it, revert an earlier change that contained a partial work around for this issue. v2: - Take a ref to xef even for the VM bind queue (Matt) - Squash patches relevant to that fix and work around (Lucas) v3: Fix typo (Lucas) Fixes: ce62827bc294 ("drm/xe: Do not access xe file when updating exec queue run_ticks") Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1908 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240718210548.3580382-5-umesh.nerlige.ramappa@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-07-03	drm/xe: Add xe_exec_queue_last_fence_test_dep	Matthew Brost	1	-0/+23
	Helpful to determine if a bind can immediately use CPU or needs to be deferred a drm scheduler job. v7: - Better wording in kernel doc (Matthew Auld) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240704041652.272920-4-matthew.brost@intel.com
2024-06-17	drm/xe/exec_queue: Rename xe_exec_queue::compute to xe_exec_queue::lr	Francois Dugast	1	-3/+3
	The properties of this struct are used in long running context so make that clear by renaming it to lr, in alignment with the rest of the code. Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240613170348.723245-1-francois.dugast@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-06-07	drm/xe: Drop EXEC_QUEUE_FLAG_BANNED	Matthew Brost	1	-1/+1
	Clean up laying violation of setting q->flags EXEC_QUEUE_FLAG_BANNED bit in GuC backend. Move banned to GuC owned bit and report banned status to upper layers via reset_status vfunc. This is a slight change in behavior as reset_status returns true if wedged or killed bits set too, but in all of these cases submission to queue is no longer allowed. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240604184700.1946918-1-matthew.brost@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-29	drm/xe: Decouple xe_exec_queue and xe_lrc	Niranjana Vishwanathapura	1	-12/+14
	Decouple xe_lrc from xe_exec_queue and reference count xe_lrc. Removing hard coupling between xe_exec_queue and xe_lrc allows flexible design where the user interface xe_exec_queue can be destroyed independent of the hardware/firmware interface xe_lrc. v2: Fix lrc indexing in wq_item_append() Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240530032211.29299-1-niranjana.vishwanathapura@intel.com
2024-05-27	drm/xe: Do not access xe file when updating exec queue run_ticks	Umesh Nerlige Ramappa	1	-4/+1
	The current code is running into a use after free case where xe file is closed before the exec queue run_ticks can be updated. This is occurring in the xe_file_close path. To fix that, do not access xe file when updating the exec queue run_ticks. Instead store the exec queue run_ticks locally in the exec queue object and accumulate it when the user dumps the drm client stats. We know that the xe file is valid when user is dumping the run_ticks for the drm client, so this effectively removes the dependency on xe file object in xe_exec_queue_update_run_ticks(). v2: - Fix the accumulation of q->run_ticks delta into xe file run_ticks - s/runtime/run_ticks/ (Rodrigo) Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908 Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524234744.1352543-2-umesh.nerlige.ramappa@intel.com