<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu, branch v6.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-04-18T21:14:13Z</updated>
<entry>
<title>drm/amdgpu: Fix desktop freezed after gpu-reset</title>
<updated>2023-04-18T21:14:13Z</updated>
<author>
<name>Alan Liu</name>
<email>HaoPing.Liu@amd.com</email>
</author>
<published>2023-04-14T10:39:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c8b5a95b570949536a2b75cd8fc4f1de0bc60629'/>
<id>urn:sha1:c8b5a95b570949536a2b75cd8fc4f1de0bc60629</id>
<content type='text'>
[Why]
After gpu-reset, sometimes the driver fails to enable vblank irq,
causing flip_done timed out and the desktop freezed.

During gpu-reset, we disable and enable vblank irq in dm_suspend() and
dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we check
irqs' refcount and decide to enable or disable the irqs again.

However, we have 2 sets of API for controling vblank irq, one is
dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has
its own refcount and flag to store the state of vblank irq, and they
are not synchronized.

In drm we use the first API to control vblank irq but in
amdgpu_irq_gpu_reset_resume_helper() we use the second set of API.

The failure happens when vblank irq was enabled by dm_vblank_get()
before gpu-reset, we have vblank-&gt;enabled true. However, during
gpu-reset, in amdgpu_irq_gpu_reset_resume_helper() vblank irq's state
checked from amdgpu_irq_update() is DISABLED. So finally it disables
vblank irq again. After gpu-reset, if there is a cursor plane commit,
the driver will try to enable vblank irq by calling drm_vblank_enable(),
but the vblank-&gt;enabled is still true, so it fails to turn on vblank
irq and causes flip_done can't be completed in vblank irq handler and
desktop become freezed.

[How]
Combining the 2 vblank control APIs by letting drm's API finally calls
amdgpu_irq's API, so the irq's refcount and state of both APIs can be
synchronized. Also add a check to prevent refcount from being less then
0 in amdgpu_irq_put().

v2:
- Add warning in amdgpu_irq_enable() if the irq is already disabled.
- Call dc_interrupt_set() in dm_set_vblank() to avoid refcount change
  if it is in gpu-reset.

v3:
- Improve commit message and code comments.

Signed-off-by: Alan Liu &lt;HaoPing.Liu@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdgpu: allow more APUs to do mode2 reset when go to S4</title>
<updated>2023-03-30T15:23:58Z</updated>
<author>
<name>Tim Huang</name>
<email>tim.huang@amd.com</email>
</author>
<published>2023-03-30T02:33:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2fec9dc8e0acc3dfb56d1389151bcf405f087b10'/>
<id>urn:sha1:2fec9dc8e0acc3dfb56d1389151bcf405f087b10</id>
<content type='text'>
Skip mode2 reset only for IMU enabled APUs when do S4.

This patch is to fix the regression issue
https://gitlab.freedesktop.org/drm/amd/-/issues/2483
It is generated by commit b589626674de ("drm/amdgpu: skip ASIC reset
for APUs when go to S4").

Fixes: b589626674de ("drm/amdgpu: skip ASIC reset for APUs when go to S4")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2483
Tested-by:  Yuan  Perry &lt;Perry.Yuan@amd.com&gt;
Signed-off-by: Tim Huang &lt;tim.huang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.1.x
</content>
</entry>
<entry>
<title>drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi</title>
<updated>2023-03-23T13:34:35Z</updated>
<author>
<name>Kai-Heng Feng</name>
<email>kai.heng.feng@canonical.com</email>
</author>
<published>2023-03-15T12:07:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2b072442f4962231a8516485012bb2d2551ef2fe'/>
<id>urn:sha1:2b072442f4962231a8516485012bb2d2551ef2fe</id>
<content type='text'>
S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is
caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default").

The root cause is still not clear for now.

So extend and apply the ASPM quirk from commit e02fe3bc7aba
("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to
workaround the issue on Navi cards too.

Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Kai-Heng Feng &lt;kai.heng.feng@canonical.com&gt;
Reviewed-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdgpu/gfx: set cg flags to enter/exit safe mode</title>
<updated>2023-03-23T13:32:01Z</updated>
<author>
<name>Jane Jian</name>
<email>Jane.Jian@amd.com</email>
</author>
<published>2023-03-15T10:59:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e06bfcc1a1c41bcb8c31470d437e147ce9f0acfd'/>
<id>urn:sha1:e06bfcc1a1c41bcb8c31470d437e147ce9f0acfd</id>
<content type='text'>
sriov needs to enter/exit safe mode in update umd p state
add the cg flag to let it enter or exit while needed

Signed-off-by: Jane Jian &lt;Jane.Jian@amd.com&gt;
Reviewed-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs</title>
<updated>2023-03-23T13:31:30Z</updated>
<author>
<name>YuBiao Wang</name>
<email>YuBiao.Wang@amd.com</email>
</author>
<published>2023-03-16T03:30:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=033c56474acf567a450f8bafca50e0b610f2b716'/>
<id>urn:sha1:033c56474acf567a450f8bafca50e0b610f2b716</id>
<content type='text'>
[Why]
For engines not supporting soft reset, i.e. VCN, there will be a failed
ib test before mode 1 reset during asic reset. The fences in this case
are never signaled and next time when we try to free the sa_bo, kernel
will hang.

[How]
During pre_asic_reset, driver will clear job fences and afterwards the
fences' refcount will be reduced to 1. For drm_sched_jobs it will be
released in job_free_cb, and for non-sched jobs like ib_test, it's meant
to be released in sa_bo_free but only when the fences are signaled. So
we have to force signal the non_sched bad job's fence during
pre_asic_reset or the clear is not complete.

Signed-off-by: YuBiao Wang &lt;YuBiao.Wang@amd.com&gt;
Acked-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: add mes resume when do gfx post soft reset</title>
<updated>2023-03-23T13:30:59Z</updated>
<author>
<name>Tong Liu01</name>
<email>Tong.Liu01@amd.com</email>
</author>
<published>2023-03-15T07:24:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4eb0b49a0ad3e004a6a65b84efe37bc7e66d560f'/>
<id>urn:sha1:4eb0b49a0ad3e004a6a65b84efe37bc7e66d560f</id>
<content type='text'>
[why]
when gfx do soft reset, mes will also do reset, if mes is not
resumed when do recover from soft reset, mes is unable to respond
in later sequence

[how]
resume mes when do gfx post soft reset

Signed-off-by: Tong Liu01 &lt;Tong.Liu01@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: skip ASIC reset for APUs when go to S4</title>
<updated>2023-03-23T13:29:55Z</updated>
<author>
<name>Tim Huang</name>
<email>tim.huang@amd.com</email>
</author>
<published>2023-03-09T08:27:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b589626674de94d977e81c99bf7905872b991197'/>
<id>urn:sha1:b589626674de94d977e81c99bf7905872b991197</id>
<content type='text'>
For GC IP v11.0.4/11, PSP TMR need to be reserved
for ASIC mode2 reset. But for S4, when psp suspend,
it will destroy the TMR that fails the ASIC reset.

[  96.006101] amdgpu 0000:62:00.0: amdgpu: MODE2 reset
[  100.409717] amdgpu 0000:62:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000011 SMN_C2PMSG_82:0x00000002
[  100.411593] amdgpu 0000:62:00.0: amdgpu: Mode2 reset failed!
[  100.412470] amdgpu 0000:62:00.0: PM: pci_pm_freeze(): amdgpu_pmops_freeze+0x0/0x50 [amdgpu] returns -62
[  100.414020] amdgpu 0000:62:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -62
[  100.415311] amdgpu 0000:62:00.0: PM: pci_pm_freeze+0x0/0xd0 returned -62 after 4623202 usecs
[  100.416608] amdgpu 0000:62:00.0: PM: failed to freeze async: error -62

We can skip the reset on APUs, assuming we can resume them
properly. Verified on some GFX11, GFX10 and old GFX9 APUs.

Signed-off-by: Tim Huang &lt;tim.huang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.1.x
</content>
</entry>
<entry>
<title>drm/amdgpu: reposition the gpu reset checking for reuse</title>
<updated>2023-03-23T13:29:32Z</updated>
<author>
<name>Tim Huang</name>
<email>tim.huang@amd.com</email>
</author>
<published>2023-03-15T07:52:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aaee0ce460b954e08b6e630d7e54b2abb672feb8'/>
<id>urn:sha1:aaee0ce460b954e08b6e630d7e54b2abb672feb8</id>
<content type='text'>
Move the amdgpu_acpi_should_gpu_reset out of
CONFIG_SUSPEND to share it with hibernate case.

Signed-off-by: Tim Huang &lt;tim.huang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org # 6.1.x
</content>
</entry>
<entry>
<title>drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes</title>
<updated>2023-03-14T14:40:48Z</updated>
<author>
<name>Guilherme G. Piccoli</name>
<email>gpiccoli@igalia.com</email>
</author>
<published>2023-03-12T16:51:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=542a56e8eb4467ae654eefab31ff194569db39cd'/>
<id>urn:sha1:542a56e8eb4467ae654eefab31ff194569db39cd</id>
<content type='text'>
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevents this mode to working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:

[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block &lt;vcn_v3_0&gt; failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]

Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW proper probe and VCN IP block to work fine.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: stable@vger.kernel.org
Cc: James Zhu &lt;James.Zhu@amd.com&gt;
Cc: Leo Liu &lt;leo.liu@amd.com&gt;
Signed-off-by: Guilherme G. Piccoli &lt;gpiccoli@igalia.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu/nv: fix codec array for SR_IOV</title>
<updated>2023-03-14T14:40:41Z</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2023-03-09T03:45:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=45aa07fa832412f1de99194f37fd847915d7e0f6'/>
<id>urn:sha1:45aa07fa832412f1de99194f37fd847915d7e0f6</id>
<content type='text'>
Copy paste error.

Fixes: 384334120b66 ("drm/amdgpu/nv: don't expose AV1 if VCN0 is harvested")
Reported-by: Abaci Robot &lt;abaci@linux.alibaba.com&gt;
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4454
Cc: Jiapeng Chong &lt;jiapeng.chong@linux.alibaba.com&gt;
Acked-by: Guchun Chen &lt;guchun.chen@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
