<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c, branch v6.18</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.18</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.18'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-10-13T18:14:28Z</updated>
<entry>
<title>drm/amdkfd: fix suspend/resume all calls in mes based eviction path</title>
<updated>2025-10-13T18:14:28Z</updated>
<author>
<name>Jonathan Kim</name>
<email>jonathan.kim@amd.com</email>
</author>
<published>2025-06-18T14:31:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=079ae5118e1f0dcf5b1ab68ffdb5760b06ed79a2'/>
<id>urn:sha1:079ae5118e1f0dcf5b1ab68ffdb5760b06ed79a2</id>
<content type='text'>
Suspend/resume all gangs should be done with the device lock is held.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;harish.kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix checkpoint-restore on multi-xcc</title>
<updated>2025-08-04T19:38:49Z</updated>
<author>
<name>David Yat Sin</name>
<email>David.YatSin@amd.com</email>
</author>
<published>2025-07-16T22:04:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f6c0f3d24478a0792e50a64c2eba9f34d65519f2'/>
<id>urn:sha1:f6c0f3d24478a0792e50a64c2eba9f34d65519f2</id>
<content type='text'>
GPUs with multi-xcc have multiple MQDs per queue. This patch saves and
restores all the MQDs within the partition.

Signed-off-by: David Yat Sin &lt;David.YatSin@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit a578f2a58c3ab38f0643b1b6e7534af860233cb1)
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdgpu/sdma: allow caller to handle kernel rings in engine reset</title>
<updated>2025-07-07T17:48:25Z</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-06-26T12:58:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0c3c2e334c4fd00ed7a8ddb9c163a8c1138af1f1'/>
<id>urn:sha1:0c3c2e334c4fd00ed7a8ddb9c163a8c1138af1f1</id>
<content type='text'>
Add a parameter to amdgpu_sdma_reset_engine() to let the
caller handle the kernel rings.  This allows the kernel
rings to back up their unprocessed state if the reset comes in
via the drm scheduler rather than KFD.

Reviewed-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Avoid queue reset if disabled</title>
<updated>2025-07-07T17:48:12Z</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2025-06-30T04:37:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=91134e800894fc6992cb0cdadea4cc94fe21b6e2'/>
<id>urn:sha1:91134e800894fc6992cb0cdadea4cc94fe21b6e2</id>
<content type='text'>
If ring reset is disabled, skip resetting queues. Instead, fall back to
device based reset.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: change error to warning message for SDMA queues creation</title>
<updated>2025-05-07T21:41:27Z</updated>
<author>
<name>Eric Huang</name>
<email>jinhuieric.huang@amd.com</email>
</author>
<published>2025-05-02T18:58:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f0be138691d9294760b59e034a9594bc23c2884e'/>
<id>urn:sha1:f0be138691d9294760b59e034a9594bc23c2884e</id>
<content type='text'>
SDMA doesn't support oversubsciption, it is the user matter to create
queues over HW limit, but not supposed to be a KFD error.

Signed-off-by: Eric Huang &lt;jinhuieric.huang@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu/sdma: fix engine reset handling</title>
<updated>2025-03-21T16:16:34Z</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-03-14T23:23:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e02fcf73081b7fc941aa6de007b0239e67688e15'/>
<id>urn:sha1:e02fcf73081b7fc941aa6de007b0239e67688e15</id>
<content type='text'>
Move the kfd suspend/resume code into the caller.  That
is where the KFD is likely to detect a reset so on the KFD
side there is no need to call them.  Also add a mutex to
lock the actual reset sequence.

v2: make the locking per instance

Fixes: bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+")
Reviewed-by: Jesse Zhang &lt;jesse.zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd/amdkfd: Evict all queues even HWS remove queue failed</title>
<updated>2025-03-11T16:36:52Z</updated>
<author>
<name>Yifan Zha</name>
<email>Yifan.Zha@amd.com</email>
</author>
<published>2025-03-05T05:14:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=42c854b8fb0cce512534aa2b7141948e80c6ebb0'/>
<id>urn:sha1:42c854b8fb0cce512534aa2b7141948e80c6ebb0</id>
<content type='text'>
[Why]
If reset is detected and kfd need to evict working queues, HWS moving queue will be failed.
Then remaining queues are not evicted and in active state.

After reset done, kfd uses HWS to termination remaining activated queues but HWS is resetted.
So remove queue will be failed again.

[How]
Keep removing all queues even if HWS returns failed.
It will not affect cpsch as it checks reset_domain-&gt;sem.

v2: If any queue failed, evict queue returns error.
v3: Declare err inside the if-block.

Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Yifan Zha &lt;Yifan.Zha@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Add pm_config_dequeue_wait_counts API</title>
<updated>2025-03-10T17:23:12Z</updated>
<author>
<name>Harish Kasiviswanathan</name>
<email>Harish.Kasiviswanathan@amd.com</email>
</author>
<published>2025-02-11T17:56:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ed962f8d0603da15c26f1c9ce60cba42607a2768'/>
<id>urn:sha1:ed962f8d0603da15c26f1c9ce60cba42607a2768</id>
<content type='text'>
Update pm_update_grace_period() to more cleaner
pm_config_dequeue_wait_counts(). Previously, grace_period variable was
overloaded as a variable and a macro, making it inflexible to configure
additional dequeue wait times.

pm_config_dequeue_wait_counts() now takes in a cmd / variable. This
allows flexibility to update different dequeue wait times.

Signed-off-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Reviewed-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Add support for more per-process flag</title>
<updated>2025-03-07T20:33:49Z</updated>
<author>
<name>Harish Kasiviswanathan</name>
<email>Harish.Kasiviswanathan@amd.com</email>
</author>
<published>2025-01-14T21:02:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cf6d949a409e09539477d32dbe7c954e4852e744'/>
<id>urn:sha1:cf6d949a409e09539477d32dbe7c954e4852e744</id>
<content type='text'>
Add support for more per-process flags starting with option to configure
MFMA precision for gfx 9.5

v2: Change flag name to KFD_PROC_FLAG_MFMA_HIGH_PRECISION
    Remove unused else condition
v3: Bump the KFD API version
v4: Missed SH_MEM_CONFIG__PRECISION_MODE__SHIFT define. Added it.

Signed-off-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Amber Lin &lt;Amber.Lin@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Set per-process flags only once cik/vi</title>
<updated>2025-03-07T20:33:49Z</updated>
<author>
<name>Harish Kasiviswanathan</name>
<email>Harish.Kasiviswanathan@amd.com</email>
</author>
<published>2025-01-14T19:07:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=289e68503a4533b014f8447e2af28ad44c92c221'/>
<id>urn:sha1:289e68503a4533b014f8447e2af28ad44c92c221</id>
<content type='text'>
Set per-process static sh_mem config only once during process
initialization. Move all static changes from update_qpd() which is
called each time a queue is created to set_cache_memory_policy() which
is called once during process initialization.

set_cache_memory_policy() is currently defined only for cik and vi
family. So this commit only focuses on these two. A separate commit will
address other asics.

Signed-off-by: Harish Kasiviswanathan &lt;Harish.Kasiviswanathan@amd.com&gt;
Reviewed-by: Amber Lin &lt;Amber.Lin@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
