<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdkfd/kfd_queue.c, branch v7.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v7.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v7.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2026-03-30T20:22:44Z</updated>
<entry>
<title>drm/amdkfd: Fix queue preemption/eviction failures by aligning control stack size to GPU page size</title>
<updated>2026-03-30T20:22:44Z</updated>
<author>
<name>Donet Tom</name>
<email>donettom@linux.ibm.com</email>
</author>
<published>2026-03-23T04:28:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=78746a474e92fc7aaed12219bec7c78ae1bd6156'/>
<id>urn:sha1:78746a474e92fc7aaed12219bec7c78ae1bd6156</id>
<content type='text'>
The control stack size is calculated based on the number of CUs and
waves, and is then aligned to PAGE_SIZE. When the resulting control
stack size is aligned to 64 KB, GPU hangs and queue preemption
failures are observed while running RCCL unit tests on systems with
more than two GPUs.

amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4
amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues

This issue is observed on both 4 KB and 64 KB system page-size
configurations.

This patch fixes the issue by aligning the control stack size to
AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size
will not be 64 KB on systems with a 64 KB page size and queue
preemption works correctly.

Additionally, In the current code, wg_data_size is aligned to PAGE_SIZE,
which can waste memory if the system page size is large. In this patch,
wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated
from wg_data_size and the control stack size, is aligned to PAGE_SIZE.

Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Donet Tom &lt;donettom@linux.ibm.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit a3e14436304392fbada359edd0f1d1659850c9b7)
</content>
</entry>
<entry>
<title>drm/amdkfd: Align expected_queue_size to PAGE_SIZE</title>
<updated>2026-03-30T20:11:29Z</updated>
<author>
<name>Donet Tom</name>
<email>donettom@linux.ibm.com</email>
</author>
<published>2026-03-23T04:28:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=68484a648ade555842e0c75a392f3572b3d51998'/>
<id>urn:sha1:68484a648ade555842e0c75a392f3572b3d51998</id>
<content type='text'>
The AQL queue size can be 4K, but the minimum buffer object (BO)
allocation size is PAGE_SIZE. On systems with a page size larger
than 4K, the expected queue size does not match the allocated BO
size, causing queue creation to fail.

Align the expected queue size to PAGE_SIZE so that it matches the
allocated BO size and allows queue creation to succeed.

Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Donet Tom &lt;donettom@linux.ibm.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit b01cd158a2f5230b137396c5f8cda3fc780abbc2)
</content>
</entry>
<entry>
<title>Convert 'alloc_obj' family to use the new default GFP_KERNEL argument</title>
<updated>2026-02-22T01:09:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-22T00:37:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43'/>
<id>urn:sha1:bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43</id>
<content type='text'>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: Replace kmalloc with kmalloc_obj for non-scalar types</title>
<updated>2026-02-21T09:02:28Z</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2026-02-21T07:49:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=69050f8d6d075dc01af7a5f2f550a8067510366f'/>
<id>urn:sha1:69050f8d6d075dc01af7a5f2f550a8067510366f</id>
<content type='text'>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Relax size checking during queue buffer get</title>
<updated>2026-01-14T19:28:48Z</updated>
<author>
<name>Donet Tom</name>
<email>donettom@linux.ibm.com</email>
</author>
<published>2026-01-12T14:06:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=42ea9cf2f16b7131cb7302acb3dac510968f8bdc'/>
<id>urn:sha1:42ea9cf2f16b7131cb7302acb3dac510968f8bdc</id>
<content type='text'>
HW-supported EOP buffer sizes are 4K and 32K. On systems that do not
use 4K pages, the minimum buffer object (BO) allocation size is
PAGE_SIZE (for example, 64K). During queue buffer acquisition, the driver
currently checks the allocated BO size against the supported EOP buffer
size. Since the allocated BO is larger than the expected size, this check
fails, preventing queue creation.

Relax the strict size validation and allow PAGE_SIZE-sized BOs to be used.
Only the required 4K region of the buffer will be used as the EOP buffer
and avoids queue creation failures on non-4K page systems.

Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Philip Yang &lt;yangp@amd.com&gt;
Signed-off-by: Donet Tom &lt;donettom@linux.ibm.com&gt;
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Add metadata ring buffer for compute</title>
<updated>2026-01-05T21:59:56Z</updated>
<author>
<name>David Yat Sin</name>
<email>David.YatSin@amd.com</email>
</author>
<published>2025-03-18T19:49:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c51bb53d5c68041dd02f66d9b638cda33647623e'/>
<id>urn:sha1:c51bb53d5c68041dd02f66d9b638cda33647623e</id>
<content type='text'>
Add support for separate ring-buffer for metadata packets when using
compute queues. Userspace application allocate the metadata ring-buffer
and the queue ring-buffer with a single allocation. The metadata
ring-buffer starts after the queue ring-buffer.

Signed-off-by: David Yat Sin &lt;David.YatSin@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Update CWSR area calculations for GFX 12.1</title>
<updated>2025-12-16T18:27:45Z</updated>
<author>
<name>Mukul Joshi</name>
<email>mukul.joshi@amd.com</email>
</author>
<published>2025-01-10T03:04:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=91689b5a7ce43e3742be6a249c7fc73be26b7d5f'/>
<id>urn:sha1:91689b5a7ce43e3742be6a249c7fc73be26b7d5f</id>
<content type='text'>
Update the SGPR, VGPR, HWREG size and number of waves supported
for GFX 12.1 CWSR memory limits. The CU calculation changed in
topology, as a result, the values need to be updated.

Signed-off-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Reviewed-by: Feifei Xu &lt;Feifei.Xu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: bump minimum vgpr size for gfx1151</title>
<updated>2025-12-08T19:26:06Z</updated>
<author>
<name>Jonathan Kim</name>
<email>jonathan.kim@amd.com</email>
</author>
<published>2025-12-05T19:41:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b42f3bf9536c9b710fd1d4deb7d1b0dc819dc72d'/>
<id>urn:sha1:b42f3bf9536c9b710fd1d4deb7d1b0dc819dc72d</id>
<content type='text'>
GFX1151 has 1.5x the number of available physical VGPRs per SIMD.
Bump total memory availability for acquire checks on queue creation.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Reviewed-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: relax checks for over allocation of save area</title>
<updated>2025-11-12T02:54:17Z</updated>
<author>
<name>Jonathan Kim</name>
<email>jonathan.kim@amd.com</email>
</author>
<published>2025-11-06T15:17:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=15bd4958fe38e763bc17b607ba55155254a01f55'/>
<id>urn:sha1:15bd4958fe38e763bc17b607ba55155254a01f55</id>
<content type='text'>
Over allocation of save area is not fatal, only under allocation is.
ROCm has various components that independently claim authority over save
area size.

Unless KFD decides to claim single authority, relax size checks.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Reviewed-by: Philip Yang &lt;philip.yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Drop workaround for GC v9.4.3 revID 0</title>
<updated>2025-04-07T19:18:59Z</updated>
<author>
<name>Apurv Mishra</name>
<email>Apurv.Mishra@amd.com</email>
</author>
<published>2025-03-17T18:00:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=daafa303d19f5522e4c24fbf5c1c981a16df2c2f'/>
<id>urn:sha1:daafa303d19f5522e4c24fbf5c1c981a16df2c2f</id>
<content type='text'>
Remove workaround code for the early engineering
samples GC v9.4.3 SOCs with revID 0

Reviewed-by: Amber Lin &lt;Amber.Lin@amd.com&gt;
Signed-off-by: Apurv Mishra &lt;Apurv.Mishra@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
