<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdkfd, branch v6.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-09-18T18:59:24Z</updated>
<entry>
<title>drm/amdkfd: add proper handling for S0ix</title>
<updated>2025-09-18T18:59:24Z</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-09-17T16:42:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2ade36eaa9ac05e4913e9785df19c2cde8f912fb'/>
<id>urn:sha1:2ade36eaa9ac05e4913e9785df19c2cde8f912fb</id>
<content type='text'>
When in S0i3, the GFX state is retained, so all we need to do
is stop the runlist so GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Tested-by: David Perry &lt;david.perry@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 4bfa8609934dbf39bbe6e75b4f971469384b50b1)
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: fix p2p links bug in topology</title>
<updated>2025-09-09T16:28:18Z</updated>
<author>
<name>Eric Huang</name>
<email>jinhuieric.huang@amd.com</email>
</author>
<published>2025-08-25T13:50:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ce42a3b581a9db10765eb835840b04dbe7972135'/>
<id>urn:sha1:ce42a3b581a9db10765eb835840b04dbe7972135</id>
<content type='text'>
When creating p2p links, KFD needs to check XGMI link
with two conditions, hive_id and is_sharing_enabled,
but it is missing to check is_sharing_enabled, so add
it to fix the error.

Signed-off-by: Eric Huang &lt;jinhuieric.huang@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 36cc7d13178d901982da7a122c883861d98da624)
</content>
</entry>
<entry>
<title>drm/amdkfd: Destroy KFD debugfs after destroy KFD wq</title>
<updated>2025-08-06T20:52:08Z</updated>
<author>
<name>Amber Lin</name>
<email>Amber.Lin@amd.com</email>
</author>
<published>2025-08-01T00:45:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2e58401a24e7b2d4ec619104e1a76590c1284a4c'/>
<id>urn:sha1:2e58401a24e7b2d4ec619104e1a76590c1284a4c</id>
<content type='text'>
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
    debugfs_remove_recursive(entry-&gt;proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/&lt;pid&gt; while
/sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel
NULL pointer.

Signed-off-by: Amber Lin &lt;Amber.Lin@amd.com&gt;
Reviewed-by: Eric Huang &lt;jinhuieric.huang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 0333052d90683d88531558dcfdbf2525cc37c233)
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix checkpoint-restore on multi-xcc</title>
<updated>2025-08-04T19:38:49Z</updated>
<author>
<name>David Yat Sin</name>
<email>David.YatSin@amd.com</email>
</author>
<published>2025-07-16T22:04:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f6c0f3d24478a0792e50a64c2eba9f34d65519f2'/>
<id>urn:sha1:f6c0f3d24478a0792e50a64c2eba9f34d65519f2</id>
<content type='text'>
GPUs with multi-xcc have multiple MQDs per queue. This patch saves and
restores all the MQDs within the partition.

Signed-off-by: David Yat Sin &lt;David.YatSin@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit a578f2a58c3ab38f0643b1b6e7534af860233cb1)
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: enable kfd on LoongArch systems</title>
<updated>2025-07-15T18:07:50Z</updated>
<author>
<name>Han Gao</name>
<email>rabenda.cn@gmail.com</email>
</author>
<published>2025-07-09T06:51:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa301127ba9a22f40b4261f569f1fc8b3d66e04e'/>
<id>urn:sha1:fa301127ba9a22f40b4261f569f1fc8b3d66e04e</id>
<content type='text'>
KFD has been confirmed that can run on LoongArch systems.
It's necessary to support CONFIG_HSA_AMD on LoongArch.

Signed-off-by: Han Gao &lt;rabenda.cn@gmail.com&gt;
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu/sdma: allow caller to handle kernel rings in engine reset</title>
<updated>2025-07-07T17:48:25Z</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-06-26T12:58:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0c3c2e334c4fd00ed7a8ddb9c163a8c1138af1f1'/>
<id>urn:sha1:0c3c2e334c4fd00ed7a8ddb9c163a8c1138af1f1</id>
<content type='text'>
Add a parameter to amdgpu_sdma_reset_engine() to let the
caller handle the kernel rings.  This allows the kernel
rings to back up their unprocessed state if the reset comes in
via the drm scheduler rather than KFD.

Reviewed-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Avoid queue reset if disabled</title>
<updated>2025-07-07T17:48:12Z</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2025-06-30T04:37:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=91134e800894fc6992cb0cdadea4cc94fe21b6e2'/>
<id>urn:sha1:91134e800894fc6992cb0cdadea4cc94fe21b6e2</id>
<content type='text'>
If ring reset is disabled, skip resetting queues. Instead, fall back to
device based reset.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'amd-drm-next-6.17-2025-07-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next</title>
<updated>2025-07-04T00:06:29Z</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2025-07-04T00:06:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7e2818386aad54ba5ab70e228c555814d33bdad1'/>
<id>urn:sha1:7e2818386aad54ba5ab70e228c555814d33bdad1</id>
<content type='text'>
amd-drm-next-6.17-2025-07-01:

amdgpu:
- FAMS2 fixes
- OLED fixes
- Misc cleanups
- AUX fixes
- DMCUB updates
- SR-IOV hibernation support
- RAS updates
- DP tunneling fixes
- DML2 fixes
- Backlight improvements
- Suspend improvements
- Use scaling for non-native modes on eDP
- SDMA 4.4.x fixes
- PCIe DPM fixes
- SDMA 5.x fixes
- Cleaner shader updates for GC 9.x
- Remove fence slab
- ISP genpd support
- Parition handling rework
- SDMA FW checks for userq support
- Add missing firmware declaration
- Fix leak in amdgpu_ctx_mgr_entity_fini()
- Freesync fix
- Ring reset refactoring
- Legacy dpm verbosity changes

amdkfd:
- GWS fix
- mtype fix for ext coherent system memory
- MMU notifier fix
- gfx7/8 fix

radeon:
- CS validation support for additional GL extensions
- Bump driver version for new CS validation checks

From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://lore.kernel.org/r/20250701194707.32905-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Don't call mmput from MMU notifier callback</title>
<updated>2025-06-30T15:54:02Z</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2025-06-20T22:32:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a29e067bd38946f752b0ef855f3dfff87e77bec7'/>
<id>urn:sha1:a29e067bd38946f752b0ef855f3dfff87e77bec7</id>
<content type='text'>
If the process is exiting, the mmput inside mmu notifier callback from
compactd or fork or numa balancing could release the last reference
of mm struct to call exit_mmap and free_pgtable, this triggers deadlock
with below backtrace.

The deadlock will leak kfd process as mmu notifier release is not called
and cause VRAM leaking.

The fix is to take mm reference mmget_non_zero when adding prange to the
deferred list to pair with mmput in deferred list work.

If prange split and add into pchild list, the pchild work_item.mm is not
used, so remove the mm parameter from svm_range_unmap_split and
svm_range_add_child.

The backtrace of hung task:

 INFO: task python:348105 blocked for more than 64512 seconds.
 Call Trace:
  __schedule+0x1c3/0x550
  schedule+0x46/0xb0
  rwsem_down_write_slowpath+0x24b/0x4c0
  unlink_anon_vmas+0xb1/0x1c0
  free_pgtables+0xa9/0x130
  exit_mmap+0xbc/0x1a0
  mmput+0x5a/0x140
  svm_range_cpu_invalidate_pagetables+0x2b/0x40 [amdgpu]
  mn_itree_invalidate+0x72/0xc0
  __mmu_notifier_invalidate_range_start+0x48/0x60
  try_to_unmap_one+0x10fa/0x1400
  rmap_walk_anon+0x196/0x460
  try_to_unmap+0xbb/0x210
  migrate_page_unmap+0x54d/0x7e0
  migrate_pages_batch+0x1c3/0xae0
  migrate_pages_sync+0x98/0x240
  migrate_pages+0x25c/0x520
  compact_zone+0x29d/0x590
  compact_zone_order+0xb6/0xf0
  try_to_compact_pages+0xbe/0x220
  __alloc_pages_direct_compact+0x96/0x1a0
  __alloc_pages_slowpath+0x410/0x930
  __alloc_pages_nodemask+0x3a9/0x3e0
  do_huge_pmd_anonymous_page+0xd7/0x3e0
  __handle_mm_fault+0x5e3/0x5f0
  handle_mm_fault+0xf7/0x2e0
  hmm_vma_fault.isra.0+0x4d/0xa0
  walk_pmd_range.isra.0+0xa8/0x310
  walk_pud_range+0x167/0x240
  walk_pgd_range+0x55/0x100
  __walk_page_range+0x87/0x90
  walk_page_range+0xf6/0x160
  hmm_range_fault+0x4f/0x90
  amdgpu_hmm_range_get_pages+0x123/0x230 [amdgpu]
  amdgpu_ttm_tt_get_user_pages+0xb1/0x150 [amdgpu]
  init_user_pages+0xb1/0x2a0 [amdgpu]
  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x543/0x7d0 [amdgpu]
  kfd_ioctl_alloc_memory_of_gpu+0x24c/0x4e0 [amdgpu]
  kfd_ioctl+0x29d/0x500 [amdgpu]

Fixes: fa582c6f3684 ("drm/amdkfd: Use mmget_not_zero in MMU notifier")
Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd: Do not include &lt;linux/export.h&gt; when unused</title>
<updated>2025-06-30T15:53:45Z</updated>
<author>
<name>André Almeida</name>
<email>andrealmeid@igalia.com</email>
</author>
<published>2025-06-13T18:26:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6531fd55f3217b3cb8cdc41611e39ca12e095fe5'/>
<id>urn:sha1:6531fd55f3217b3cb8cdc41611e39ca12e095fe5</id>
<content type='text'>
Fix the following compile time warning when building with W=1:

warning: EXPORT_SYMBOL() is not used, but #include &lt;linux/export.h&gt; is present

Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: André Almeida &lt;andrealmeid@igalia.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
