linux/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c, branch v6.8

drm/amdkfd: update SIMD distribution algo for GFXIP 9.4.2 onwards

2024-02-15T19:18:44Z

In certain cooperative group dispatch scenarios the default SPI resource allocation may cause reduced per-CU workgroup occupancy. Set COMPUTE_RESOURCE_LIMITS.FORCE_SIMD_DIST=1 to mitigate soft hang scenarions. Reviewed-by: Felix Kuehling Suggested-by: Joseph Greathouse Signed-off-by: Rajneesh Bhardwaj Signed-off-by: Alex Deucher

drm/amdkfd: only flush mes process context if mes support is there

2023-12-15T17:15:13Z

Fix up on mes process context flush to prevent non-mes devices from spamming error messages or running into undefined behaviour during process termination. Fixes: bd33bb1409b4 ("drm/amdkfd: fix mes set shader debugger process management") Signed-off-by: Jonathan Kim Reviewed-by: Eric Huang Signed-off-by: Alex Deucher

drm/amdkfd: fix mes set shader debugger process management

2023-12-13T21:07:43Z

MES provides the driver a call to explicitly flush stale process memory within the MES to avoid a race condition that results in a fatal memory violation. When SET_SHADER_DEBUGGER is called, the driver passes a memory address that represents a process context address MES uses to keep track of future per-process calls. Normally, MES will purge its process context list when the last queue has been removed. The driver, however, can call SET_SHADER_DEBUGGER regardless of whether a queue has been added or not. If SET_SHADER_DEBUGGER has been called with no queues as the last call prior to process termination, the passed process context address will still reside within MES. On a new process call to SET_SHADER_DEBUGGER, the driver may end up passing an identical process context address value (based on per-process gpu memory address) to MES but is now pointing to a new allocated buffer object during KFD process creation. Since the MES is unaware of this, access of the passed address points to the stale object within MES and triggers a fatal memory violation. The solution is for KFD to explicitly flush the process context address from MES on process termination. Note that the flush call and the MES debugger calls use the same MES interface but are separated as KFD calls to avoid conflicting with each other. Signed-off-by: Jonathan Kim Tested-by: Alice Wong Reviewed-by: Eric Huang Signed-off-by: Alex Deucher

drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit

2023-11-29T21:49:23Z

[Why] Memory leaks of gang_ctx_bo and wptr_bo. [How] Free gang_ctx_bo and wptr_bo in pqm_uninit. v2: add a common function pqm_clean_queue_resource to free queue's resources. v3: reset pdd->pqd.num_gws when destorying GWS queue. Reviewed-by: Felix Kuehling Signed-off-by: ZhenGuo Yin Signed-off-by: Alex Deucher

drm/amdkfd: get doorbell's absolute offset based on the db_size

2023-10-09T21:02:34Z

Here, Adding db_size in byte to find the doorbell's absolute offset for both 32-bit and 64-bit doorbell sizes. So that doorbell offset will be aligned based on the doorbell size. v2: - Addressed the review comment from Felix. v3: - Adding doorbell_size as parameter to get db absolute offset. v4: Squash the two patches into one. Cc: Christian Koenig Cc: Alex Deucher Reviewed-by: Felix Kuehling Signed-off-by: Shashank Sharma Signed-off-by: Arvind Yadav Signed-off-by: Alex Deucher

drm/amdgpu: use doorbell mgr for kfd process doorbells

2023-08-07T21:14:07Z

This patch: - adds a doorbell object in kfd pdd structure. - allocates doorbells for a process while creating its queue. - frees the doorbells with pdd destroy. - moves doorbell bitmap init function to kfd_doorbell.c PS: This patch ensures that we don't break the existing KFD functionality, but now KFD userspace library should also create doorbell pages as AMDGPU GEM objects using libdrm functions in userspace. The reference code for the same is available with AMDGPU Usermode queue libdrm MR. Once this is done, we will not need to create process doorbells in kernel. V2: - Do not use doorbell wrapper API, use amdgpu_bo_create_kernel instead (Alex). - Do not use custom doorbell structure, instead use separate variables for bo and doorbell_bitmap (Alex) V3: - Do not allocate doorbell page with PDD, delay doorbell process page allocation until really needed (Felix) Cc: Alex Deucher Cc: Christian Koenig Cc: Felix Kuehling Acked-by: Christian König Reviewed-by: Felix Kuehling Signed-off-by: Shashank Sharma Signed-off-by: Alex Deucher

drm/amdkfd: enable cooperative groups for gfx11

2023-07-25T17:35:43Z

MES can concurrently schedule queues on the device that require exclusive device access if marked exclusively_scheduled without the requirement of GWS. Similar to the F32 HWS, MES will manage quality of service for these queues. Use this for cooperative groups since cooperative groups are device occupancy limited. Since some GFX11 devices can only be debugged with partial CUs, do not allow the debugging of cooperative groups on these devices as the CU occupancy limit will change on attach. In addition, zero initialize the MES add queue submission vector for MES initialization tests as we do not want these to be cooperative dispatches. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Enable GWS on GFX9.4.3

2023-06-30T17:06:49Z

Enable GWS capable queue creation for forward progress gaurantee on GFX 9.4.3. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: add debug queue snapshot operation

2023-06-09T16:36:57Z

Allow the debugger to get a snapshot of a specified number of queues containing various queue property information that is copied to the debugger. Since the debugger doesn't know how many queues exist at any given time, allow the debugger to pass the requested number of snapshots as 0 to get the actual number of potential snapshots to use for a subsequent snapshot request for actual information. To prevent future ABI breakage, pass in the requested entry_size. The KFD will return it's own entry_size in case the debugger still wants log the information in a core dump on sizing failure. Also allow the debugger to clear exceptions when doing a snapshot. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: add debug suspend and resume process queues operation

2023-06-09T16:36:43Z

In order to inspect waves from the saved context at any point during a debug session, the debugger must be able to preempt queues to trigger context save by suspending them. On queue suspend, the KFD will copy the context save header information so that the debugger can correctly crawl the appropriate size of the saved context. The debugger must then also be allowed to resume suspended queues. A queue that is newly created cannot be suspended because queue ids are recycled after destruction so the debugger needs to know that this has occurred. Query functions will be later added that will clear a given queue of its new queue status. A queue cannot be destroyed while it is suspended to preserve its saved context during debugger inspection. Have queue destruction block while a queue is suspended and unblocked when it is resumed. Likewise, if a queue is about to be destroyed, it cannot be suspended. Return the number of queues successfully suspended or resumed along with a per queue status array where the upper bits per queue status show that the request was invalid (new/destroyed queue suspend request, missing queue) or an error occurred (HWS in a fatal state so it can't suspend or resume queues). Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher