linux/drivers/gpu/drm/amd/amdkfd/kfd_queue.c, branch v7.0

drm/amdkfd: Fix queue preemption/eviction failures by aligning control stack size to GPU page size

2026-03-30T20:22:44Z

The control stack size is calculated based on the number of CUs and waves, and is then aligned to PAGE_SIZE. When the resulting control stack size is aligned to 64 KB, GPU hangs and queue preemption failures are observed while running RCCL unit tests on systems with more than two GPUs.

    amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008
    amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
    amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4
    amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008
    amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
    amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues

This issue is observed on both 4 KB and 64 KB system page-size configurations.

This patch fixes the issue by aligning the control stack size to AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size will not be 64 KB on systems with a 64 KB page size and queue preemption works correctly.

Additionally, in the current code, wg_data_size is aligned to PAGE_SIZE, which can waste memory if the system page size is large. In this patch, wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated from wg_data_size and the control stack size, is aligned to PAGE_SIZE.

Reviewed-by: Felix Kuehling
Signed-off-by: Donet Tom
Signed-off-by: Alex Deucher
(cherry picked from commit a3e14436304392fbada359edd0f1d1659850c9b7)
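
The alignment change described in this commit can be illustrated with a small userspace sketch. It is not the driver code: the function and struct names are hypothetical, the GPU page size is taken to be 4 KB, and the real calculation in kfd_queue.c may include further terms that are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the GPU page size is 4 KB. */
#define SKETCH_GPU_PAGE_SIZE 4096u

/* Round x up to a multiple of a; a must be a power of two. */
static uint32_t sketch_align_up(uint32_t x, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical container for the three sizes named in the commit text. */
struct sketch_cwsr_sizes {
	uint32_t ctl_stack_size;
	uint32_t wg_data_size;
	uint32_t cwsr_size;
};

/*
 * The two components are aligned to the GPU page size, and only their
 * sum is aligned to the host page size, as the commit message describes.
 */
static struct sketch_cwsr_sizes
sketch_cwsr_compute(uint32_t raw_ctl_stack, uint32_t raw_wg_data,
		    uint32_t host_page_size)
{
	struct sketch_cwsr_sizes s;

	s.ctl_stack_size = sketch_align_up(raw_ctl_stack, SKETCH_GPU_PAGE_SIZE);
	s.wg_data_size = sketch_align_up(raw_wg_data, SKETCH_GPU_PAGE_SIZE);
	s.cwsr_size = sketch_align_up(s.ctl_stack_size + s.wg_data_size,
				      host_page_size);
	return s;
}
```

With a raw control stack of 20000 bytes on a 64 KB host page, aligning to the host page would round the control stack up to 65536 bytes, while aligning to the GPU page keeps it at 20480 bytes.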

drm/amdkfd: Align expected_queue_size to PAGE_SIZE

2026-03-30T20:11:29Z

The AQL queue size can be 4K, but the minimum buffer object (BO) allocation size is PAGE_SIZE. On systems with a page size larger than 4K, the expected queue size does not match the allocated BO size, causing queue creation to fail.

Align the expected queue size to PAGE_SIZE so that it matches the allocated BO size and allows queue creation to succeed.

Reviewed-by: Felix Kuehling
Signed-off-by: Donet Tom
Signed-off-by: Alex Deucher
(cherry picked from commit b01cd158a2f5230b137396c5f8cda3fc780abbc2)
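
A minimal sketch of the size mismatch, assuming the driver compares an expected size against the BO size for equality. The helper names are hypothetical and the check is simplified to a single comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t sketch_align_up64(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical check: does the BO backing the queue have the expected size?
 * With align_expected set, the expected size is first rounded up to the
 * host page size, which is what the commit message describes.
 */
static bool sketch_queue_bo_size_ok(uint64_t queue_size, uint64_t bo_size,
				    uint64_t host_page_size,
				    bool align_expected)
{
	uint64_t expected = align_expected ?
		sketch_align_up64(queue_size, host_page_size) : queue_size;

	return expected == bo_size;
}
```

For a 4K queue on a 64K-page system the BO is 64K, so the unaligned comparison fails and the aligned one succeeds. On a 4K-page system both forms agree.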

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51Z

This was done entirely with mindless brute force, using git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28Z

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances:

Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)

Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning "TYPE *".

Signed-off-by: Kees Cook
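
The three call shapes can be mimicked in userspace to show why a typed return value helps. The sketch_alloc_* macros below are hypothetical stand-ins built on calloc and are not the kernel helpers: they take no GFP flags, zero the memory, and skip the overflow checking that struct_size() provides. They rely on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-ins (NOT the kernel API). TYPE may be a
 * type name or an expression such as *ptr. Each macro yields a typed
 * pointer instead of "void *".
 */
#define sketch_alloc_obj(TYPE) \
	((__typeof__(TYPE) *)calloc(1, sizeof(TYPE)))
#define sketch_alloc_objs(TYPE, COUNT) \
	((__typeof__(TYPE) *)calloc((COUNT), sizeof(TYPE)))
#define sketch_alloc_flex(TYPE, FAM, COUNT) \
	((__typeof__(TYPE) *)calloc(1, offsetof(__typeof__(TYPE), FAM) + \
		(size_t)(COUNT) * sizeof(((__typeof__(TYPE) *)0)->FAM[0])))

struct point { int x; int y; };
struct packet { size_t len; unsigned char data[]; };

static_assert(offsetof(struct packet, data) >= sizeof(size_t),
	      "flexible array follows the fixed part of the struct");

/* Exercise all three shapes; returns 1 on success, 0 on failure. */
static int sketch_demo(void)
{
	struct point *one = sketch_alloc_obj(struct point);
	struct point *many = sketch_alloc_objs(*one, 4); /* TYPE given as *VAR */
	struct packet *pkt = sketch_alloc_flex(struct packet, data, 16);
	int ok = one && many && pkt;

	if (ok) {
		many[3].y = 7;
		pkt->len = 16;
		pkt->data[15] = 0xff;
		ok = (many[3].y == 7) && (pkt->data[15] == 0xff);
	}
	free(pkt);
	free(many);
	free(one);
	return ok;
}
```

Because each macro yields a pointer of the requested type, assigning the result to a pointer of a different struct type draws a compiler diagnostic, which a plain "void *" return would not.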

drm/amdkfd: Relax size checking during queue buffer get

2026-01-14T19:28:48Z

HW-supported EOP buffer sizes are 4K and 32K. On systems that do not use 4K pages, the minimum buffer object (BO) allocation size is PAGE_SIZE (for example, 64K). During queue buffer acquisition, the driver currently checks the allocated BO size against the supported EOP buffer size. Since the allocated BO is larger than the expected size, this check fails, preventing queue creation.

Relax the strict size validation and allow PAGE_SIZE-sized BOs to be used. Only the required 4K region of the buffer will be used as the EOP buffer. This avoids queue creation failures on non-4K page systems.

Acked-by: Christian König
Suggested-by: Philip Yang
Signed-off-by: Donet Tom
Signed-off-by: Felix Kuehling
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher

drm/amdkfd: Add metadata ring buffer for compute

2026-01-05T21:59:56Z

Add support for a separate ring buffer for metadata packets when using compute queues. The userspace application allocates the metadata ring buffer and the queue ring buffer with a single allocation. The metadata ring buffer starts after the queue ring buffer.

Signed-off-by: David Yat Sin
Reviewed-by: Philip Yang
Signed-off-by: Alex Deucher
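
The layout described here can be sketched as simple address arithmetic. The struct and function are hypothetical, and the sketch assumes no padding between the two rings, which the commit message does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one combined allocation. */
struct sketch_ring_layout {
	uint64_t queue_ring_addr;    /* start of the queue ring buffer */
	uint64_t metadata_ring_addr; /* start of the metadata ring buffer */
	uint64_t total_size;         /* size of the single allocation */
};

/*
 * Place the metadata ring directly after the queue ring inside one
 * allocation starting at base. Assumes no padding between the rings.
 */
static struct sketch_ring_layout
sketch_layout_rings(uint64_t base, uint64_t queue_ring_size,
		    uint64_t metadata_ring_size)
{
	struct sketch_ring_layout l;

	assert(queue_ring_size > 0);
	l.queue_ring_addr = base;
	l.metadata_ring_addr = base + queue_ring_size;
	l.total_size = queue_ring_size + metadata_ring_size;
	return l;
}
```

A consumer given only the base address and the queue ring size can then locate the metadata ring without a second address being passed in.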

drm/amdkfd: Update CWSR area calculations for GFX 12.1

2025-12-16T18:27:45Z

Update the SGPR, VGPR, HWREG size and number of waves supported for GFX 12.1 CWSR memory limits. The CU calculation changed in topology; as a result, the values need to be updated.

Signed-off-by: Mukul Joshi
Reviewed-by: Feifei Xu
Signed-off-by: Alex Deucher

drm/amdkfd: bump minimum vgpr size for gfx1151

2025-12-08T19:26:06Z

GFX1151 has 1.5x the number of available physical VGPRs per SIMD. Bump total memory availability for acquire checks on queue creation.

Signed-off-by: Jonathan Kim
Reviewed-by: Mario Limonciello
Signed-off-by: Alex Deucher

drm/amdkfd: relax checks for over allocation of save area

2025-11-12T02:54:17Z

Over allocation of save area is not fatal, only under allocation is. ROCm has various components that independently claim authority over save area size. Unless KFD decides to claim single authority, relax size checks.

Signed-off-by: Jonathan Kim
Reviewed-by: Philip Yang
Signed-off-by: Alex Deucher
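
The difference between a strict and a relaxed size check can be sketched as follows. Both helpers are hypothetical, and the strict form assumes the earlier check required an exact match, which the commit message implies but does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed earlier behaviour: the user-supplied size must match exactly. */
static bool sketch_save_area_ok_strict(uint32_t user_size, uint32_t required)
{
	return user_size == required;
}

/* Relaxed behaviour: only under-allocation is rejected. */
static bool sketch_save_area_ok_relaxed(uint32_t user_size, uint32_t required)
{
	assert(required > 0);
	return user_size >= required;
}
```

An over-sized save area passes the relaxed check and only wastes memory, while an under-sized one is still rejected.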

drm/amdkfd: Drop workaround for GC v9.4.3 revID 0

2025-04-07T19:18:59Z

Remove workaround code for the early engineering samples GC v9.4.3 SOCs with revID 0.

Reviewed-by: Amber Lin
Signed-off-by: Apurv Mishra
Signed-off-by: Alex Deucher