<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c, branch v6.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-05-07T21:41:49Z</updated>
<entry>
<title>drm/amdgpu: Implement Runtime Bad Page query for VFs</title>
<updated>2025-05-07T21:41:49Z</updated>
<author>
<name>Ellen Pan</name>
<email>yunru.pan@amd.com</email>
</author>
<published>2025-04-29T20:48:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5da3d8820dd3202d5ae871050e45facb2787235d'/>
<id>urn:sha1:5da3d8820dd3202d5ae871050e45facb2787235d</id>
<content type='text'>
Host will send a notification when new bad pages are available.

Uopn guest request, the first 256 bad page addresses
will be placed into the PF2VF region.
Guest should pause the PF2VF worker thread while
the copy is in progress.

Reviewed-by: Shravan Kumar Gande &lt;Shravankumar.Gande@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Signed-off-by: Ellen Pan &lt;yunru.pan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix CPER error handling on VFs</title>
<updated>2025-04-07T22:00:40Z</updated>
<author>
<name>Victor Skvortsov</name>
<email>Victor.Skvortsov@amd.com</email>
</author>
<published>2025-03-30T18:54:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f9fbc338811c8d123f0c05cd699ffa3ae7f08d34'/>
<id>urn:sha1:f9fbc338811c8d123f0c05cd699ffa3ae7f08d34</id>
<content type='text'>
CPER read will loop infinitely if an error is encountered and
the more bit is set. Add error checks to break upon failure.

v2: added function pointer checks

Suggested-by: Tony Yi &lt;Tony.Yi@amd.com&gt;
Signed-off-by: Victor Skvortsov &lt;Victor.Skvortsov@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd/amdgpu: Fix MES init sequence</title>
<updated>2025-03-14T03:16:08Z</updated>
<author>
<name>Shaoyun Liu</name>
<email>shaoyun.liu@amd.com</email>
</author>
<published>2025-03-10T16:38:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f81cd793119e7f4b426a825435d49cc10a081c7a'/>
<id>urn:sha1:f81cd793119e7f4b426a825435d49cc10a081c7a</id>
<content type='text'>
When MES is been used , the set_hw_resource_1 API is required to
initialize MES internal context correctly

Signed-off-by: Shaoyun Liu &lt;shaoyun.liu@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add support for CPERs on virtualization</title>
<updated>2025-03-05T15:47:03Z</updated>
<author>
<name>Tony Yi</name>
<email>Tony.Yi@amd.com</email>
</author>
<published>2025-02-26T22:03:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a91d91b6004796b868374394962331a1322da7ab'/>
<id>urn:sha1:a91d91b6004796b868374394962331a1322da7ab</id>
<content type='text'>
Add support for CPERs on VFs.

VFs do not receive PMFW messages directly; as such, they need to
query them from the host. To avoid hitting host event guard,
CPER queries need to be rate limited. CPER queries share the same
RAS telemetry buffer as error count query, so a mutex protecting
the shared buffer was added as well.

For readability, the amdgpu_detect_virtualization was refactored
into multiple individual functions.

Signed-off-by: Tony Yi &lt;Tony.Yi@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV</title>
<updated>2025-02-19T20:14:38Z</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2025-02-14T08:52:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dc0297f3198bd60108ccbd167ee5d9fa4af31ed0'/>
<id>urn:sha1:dc0297f3198bd60108ccbd167ee5d9fa4af31ed0</id>
<content type='text'>
RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).

Fixes the below:

[  253.013423] =============================
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G     U     OE
[  253.013464] -----------------------------
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] ffff9f30542e3cf8 (&amp;adev-&gt;virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[  253.013877]  #1: ffffb789c008be40 ((work_completion)(&amp;wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: ffff9f3054281838 (&amp;adev-&gt;gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 #14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  &lt;TASK&gt;
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[  253.017493]  local_pci_probe+0x4b/0xb0
[  253.017746]  work_for_cpu_fn+0x1a/0x30
[  253.017995]  process_one_work+0x21e/0x680
[  253.018248]  worker_thread+0x190/0x330
[  253.018500]  ? __pfx_worker_thread+0x10/0x10
[  253.018746]  kthread+0xe7/0x120
[  253.018988]  ? __pfx_kthread+0x10/0x10
[  253.019231]  ret_from_fork+0x3c/0x60
[  253.019468]  ? __pfx_kthread+0x10/0x10
[  253.019701]  ret_from_fork_asm+0x1a/0x30
[  253.019939]  &lt;/TASK&gt;

v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).

Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao &lt;lin.cao@amd.com&gt;
Cc: Jingwen Chen &lt;Jingwen.Chen2@amd.com&gt;
Cc: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Cc: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Suggested-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Skip err_count sysfs creation on VF unsupported RAS blocks</title>
<updated>2025-02-13T02:02:59Z</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2025-01-21T03:00:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04893397766a2b2f1bc7fe5c6414e4c0846ed171'/>
<id>urn:sha1:04893397766a2b2f1bc7fe5c6414e4c0846ed171</id>
<content type='text'>
VFs are not able to query error counts for all RAS blocks. Rather than
returning error for queries on these blocks, skip sysfs the creation
all together.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/admgpu: replace kmalloc() and memcpy() with kmemdup()</title>
<updated>2024-12-18T17:39:08Z</updated>
<author>
<name>Mirsad Todorovac</name>
<email>mtodorovac69@gmail.com</email>
</author>
<published>2024-12-17T22:58:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a21ab06b8c2d8d25c4a83bdf39542834b1f3beae'/>
<id>urn:sha1:a21ab06b8c2d8d25c4a83bdf39542834b1f3beae</id>
<content type='text'>
The static analyser tool gave the following advice:

./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup

 → 1266         tmp = kmalloc(used_size, GFP_KERNEL);
   1267         if (!tmp)
   1268                 return -ENOMEM;
   1269
 → 1270         memcpy(tmp, &amp;host_telemetry-&gt;body.error_count, used_size);

Replacing kmalloc() + memcpy() with kmemdump() doesn't change semantics.
Original code works without fault, so this is not a bug fix but proposed improvement.

Link: https://lwn.net/Articles/198928/
Fixes: 84a2947ecc85 ("drm/amdgpu: Implement virt req_ras_err_count")
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: "Christian König" &lt;christian.koenig@amd.com&gt;
Cc: Xinhui Pan &lt;Xinhui.Pan@amd.com&gt;
Cc: David Airlie &lt;airlied@gmail.com&gt;
Cc: Simona Vetter &lt;simona@ffwll.ch&gt;
Cc: Zhigang Luo &lt;Zhigang.Luo@amd.com&gt;
Cc: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Cc: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Cc: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Cc: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Cc: Jack Xiao &lt;Jack.Xiao@amd.com&gt;
Cc: Vignesh Chander &lt;Vignesh.Chander@amd.com&gt;
Cc: Danijel Slivka &lt;danijel.slivka@amd.com&gt;
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Mirsad Todorovac &lt;mtodorovac69@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Implement virt req_ras_err_count</title>
<updated>2024-11-11T16:55:42Z</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-10-30T14:18:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=84a2947ecc85c67f433f2cc2186e54cdb9047b61'/>
<id>urn:sha1:84a2947ecc85c67f433f2cc2186e54cdb9047b61</id>
<content type='text'>
Enable RAS late init  if VF RAS Telemetry is supported.

When enabled, the VF can use this interface to query total
RAS error counts from the host.

The VF FB access may abruptly end due to a fatal error,
therefore the VF must cache and sanitize the input.

The Host allows 15 Telemetry messages every 60 seconds, afterwhich
the host will ignore any more in-coming telemetry messages. The VF will
rate limit its msg calling to once every 5 seconds (12 times in 60 seconds).
While the VF is rate limited, it will continue to report the last
good cached data.

v2: Flip generate report &amp; update statistics order for VF

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Acked-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: VF Query RAS Caps from Host if supported</title>
<updated>2024-11-11T16:55:36Z</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-10-30T13:58:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=907fec2dfd061ca422d8b121f4af1b6062e098ba'/>
<id>urn:sha1:907fec2dfd061ca422d8b121f4af1b6062e098ba</id>
<content type='text'>
If VF RAS Capability support is enabled, guest is able to
retrieve the real RAS support from the host.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Disable dpm_enabled flag while VF is in reset</title>
<updated>2024-08-13T16:12:52Z</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-08-08T17:22:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f83cec3b3a7c968bbceb810b7acd1baf3fe8cd87'/>
<id>urn:sha1:f83cec3b3a7c968bbceb810b7acd1baf3fe8cd87</id>
<content type='text'>
VFs do not perform HW fini/suspend in FLR, so the dpm_enabled
is incorrectly kept enabled. Add interface to disable it in
virt_pre_reset call.

v2: Made implementation generic for all asics
v3: Re-order conditionals so PP_MP1_STATE_FLR is only evaluated on VF

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
