<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c, branch v6.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2024-08-13T16:12:52Z</updated>
<entry>
<title>drm/amdgpu: Disable dpm_enabled flag while VF is in reset</title>
<updated>2024-08-13T16:12:52Z</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-08-08T17:22:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f83cec3b3a7c968bbceb810b7acd1baf3fe8cd87'/>
<id>urn:sha1:f83cec3b3a7c968bbceb810b7acd1baf3fe8cd87</id>
<content type='text'>
VFs do not perform HW fini/suspend in FLR, so the dpm_enabled
flag is incorrectly left set. Add an interface to clear it in
the virt_pre_reset call.

v2: Made implementation generic for all asics
v3: Re-order conditionals so PP_MP1_STATE_FLR is only evaluated on VF

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu/mes: add multiple mes ring instances support</title>
<updated>2024-08-13T14:29:25Z</updated>
<author>
<name>Jack Xiao</name>
<email>Jack.Xiao@amd.com</email>
</author>
<published>2024-08-07T03:53:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c7d4355648ffa02a1551495b05c71ea6c884d29c'/>
<id>urn:sha1:c7d4355648ffa02a1551495b05c71ea6c884d29c</id>
<content type='text'>
Add multiple mes ring instances in mes structure to support
multiple mes pipes.

Signed-off-by: Jack Xiao &lt;Jack.Xiao@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Set no_hw_access when VF request full GPU fails</title>
<updated>2024-07-08T20:46:56Z</updated>
<author>
<name>Yifan Zha</name>
<email>Yifan.Zha@amd.com</email>
</author>
<published>2024-06-27T07:06:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=33f23fc3155b13c4a96d94a0a22dc26db767440b'/>
<id>urn:sha1:33f23fc3155b13c4a96d94a0a22dc26db767440b</id>
<content type='text'>
[Why]
If a VF requests full GPU access and the request fails,
the VF driver can get stuck accessing registers for an extended period during
the unload of KMS.

[How]
Set the no_hw_access flag when the VF's request for full GPU access fails.
This prevents further hardware access attempts, avoiding the prolonged
stuck state.

Signed-off-by: Yifan Zha &lt;Yifan.Zha@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: process RAS fatal error MB notification</title>
<updated>2024-06-27T21:31:37Z</updated>
<author>
<name>Vignesh Chander</name>
<email>Vignesh.Chander@amd.com</email>
</author>
<published>2024-06-24T21:44:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cbda2758d8bfae323b846210a3e52f0ad5fe7164'/>
<id>urn:sha1:cbda2758d8bfae323b846210a3e52f0ad5fe7164</id>
<content type='text'>
In the RAS error scenario, the VF guest driver checks the mailbox
and sets the fed flag to avoid unnecessary HW accesses.
Additionally, poll for the reset completion message first
to avoid accidentally spamming multiple reset requests to the host.

v2: add another mailbox check for handling case where kfd detects
timeout first

v3: set host_flr bit and use wait_for_reset

Signed-off-by: Vignesh Chander &lt;Vignesh.Chander@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;Zhigang.Luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix sriov host flr handler</title>
<updated>2024-06-14T20:15:58Z</updated>
<author>
<name>Yunxiang Li</name>
<email>Yunxiang.Li@amd.com</email>
</author>
<published>2024-05-24T20:22:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5c0a1cdd17ce9eb315102c65084af899622ed268'/>
<id>urn:sha1:5c0a1cdd17ce9eb315102c65084af899622ed268</id>
<content type='text'>
We send back the ready to reset message before we stop anything. This is
wrong. Move it to when we are actually ready for the FLR to happen.

In the current state, since we take tens of seconds to stop everything,
it is very likely that the host would give up waiting and reset the GPU
before we send ready, so it would be the same as before. But this gets
rid of the hack with reset_domain locking and also lets us tell from the
host how slow ready to reset actually is. The ready to reset speed can
be improved later.

Signed-off-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add skip_hw_access checks for sriov</title>
<updated>2024-06-14T20:15:58Z</updated>
<author>
<name>Yunxiang Li</name>
<email>Yunxiang.Li@amd.com</email>
</author>
<published>2024-05-24T20:14:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b3948ad1ac582f560e1f3aeaecf384619921c48d'/>
<id>urn:sha1:b3948ad1ac582f560e1f3aeaecf384619921c48d</id>
<content type='text'>
Accessing registers via host is missing the check for skip_hw_access and
the lockdep check that comes with it.

Signed-off-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix failure mapping legacy queue when FLR</title>
<updated>2024-06-05T15:25:14Z</updated>
<author>
<name>Lin.Cao</name>
<email>lincao12@amd.com</email>
</author>
<published>2024-05-31T06:02:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c8ad1bbbc2751063c7a5825911e58996ef849628'/>
<id>urn:sha1:c8ad1bbbc2751063c7a5825911e58996ef849628</id>
<content type='text'>
The flag "mes.ring.sched.ready" is set to true after mes hw init and set
to false on mes hw fini, to avoid duplicate initialization. But hw fini
is not called during a function level reset (FLR), so mes hw init is
skipped during FLR, which causes mapping the legacy queue to fail.
Setting this flag to false in post reset fixes this issue.

Signed-off-by: Lin.Cao &lt;lincao12@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Add lock around VF RLCG interface</title>
<updated>2024-05-29T18:48:30Z</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-05-27T20:10:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e864180ee49b4d30e640fd1e1d852b86411420c9'/>
<id>urn:sha1:e864180ee49b4d30e640fd1e1d852b86411420c9</id>
<content type='text'>
flush_gpu_tlb may be called from another thread while
device_gpu_recover is running.

Both of these threads access registers through the VF
RLCG interface during VF Full Access. Add a lock around this interface
to prevent race conditions between these threads.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Queue KFD reset workitem in VF FED</title>
<updated>2024-05-20T20:20:25Z</updated>
<author>
<name>Victor Skvortsov</name>
<email>victor.skvortsov@amd.com</email>
</author>
<published>2024-05-19T14:39:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5434bc03f52de2ec57d6ce684b1853928f508cbc'/>
<id>urn:sha1:5434bc03f52de2ec57d6ce684b1853928f508cbc</id>
<content type='text'>
The guest recovery sequence is buggy on a fatal error when both
FLR &amp; KFD reset workitems are queued at the same time. In addition,
the FLR guest recovery sequence is out of order when PF/VF communication
breaks due to a GPU fatal error.

As a temporary workaround, perform a KFD style reset (initiate the reset
request from the guest) inside the pf2vf thread on FED.

Signed-off-by: Victor Skvortsov &lt;victor.skvortsov@amd.com&gt;
Reviewed-by: Zhigang Luo &lt;zhigang.luo@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Fix two reset triggered in a row</title>
<updated>2024-05-02T19:40:44Z</updated>
<author>
<name>Yunxiang Li</name>
<email>Yunxiang.Li@amd.com</email>
</author>
<published>2024-04-22T18:59:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f4322b9f8ad5f9f62add288c785d2e10bb6a5efe'/>
<id>urn:sha1:f4322b9f8ad5f9f62add288c785d2e10bb6a5efe</id>
<content type='text'>
Sometimes a hung GPU causes multiple reset sources to schedule resets.
The second source will be able to trigger an unnecessary reset if it
schedules after we call amdgpu_device_stop_pending_resets.

Move amdgpu_device_stop_pending_resets to after the reset is done. Since
at this point the GPU is supposedly in a good state, any reset scheduled
after this point would be a legitimate reset.

Remove unnecessary and incorrect checks for amdgpu_in_reset that were
kind of serving this purpose.

Signed-off-by: Yunxiang Li &lt;Yunxiang.Li@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
