linux/drivers/gpu/drm/amd/amdgpu, branch v5.17

drm/amdgpu: fix suspend/resume hang regression

2022-03-02T23:36:43Z

Regression has been reported that suspend/resume may hang with the previous vm ready check commit. So bring back the evicted list check as a temp fix. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1922 Fixes: c1a66c3bc425 ("drm/amdgpu: check vm ready by amdgpu_vm->evicting flag") Reviewed-by: Christian König Signed-off-by: Qiang Yu Signed-off-by: Alex Deucher

drm/amdgpu: check vm ready by amdgpu_vm->evicting flag

2022-02-23T21:31:06Z

Workstation application ANSA/META v21.1.4 get this error dmesg when running CI test suite provided by ANSA/META: [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16) This is caused by: 1. create a 256MB buffer in invisible VRAM 2. CPU map the buffer and access it causes vm_fault and try to move it to visible VRAM 3. force visible VRAM space and traverse all VRAM bos to check if evicting this bo is valuable 4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable() will set amdgpu_vm->evicting, but latter due to not in visible VRAM, won't really evict it so not add it to amdgpu_vm->evicted 5. before next CS to clear the amdgpu_vm->evicting, user VM ops ioctl will pass amdgpu_vm_ready() (check amdgpu_vm->evicted) but fail in amdgpu_vm_bo_update_mapping() (check amdgpu_vm->evicting) and get this error log This error won't affect functionality as next CS will finish the waiting VM ops. But we'd better clear the error log by checking the amdgpu_vm->evicting flag in amdgpu_vm_ready() to stop calling amdgpu_vm_bo_update_mapping() later. Another reason is amdgpu_vm->evicted list holds all BOs (both user buffer and page table), but only page table BOs' eviction prevent VM ops. amdgpu_vm->evicting flag is set only for page table BOs, so we should use evicting flag instead of evicted list in amdgpu_vm_ready(). The side effect of this change is: previously blocked VM op (user buffer in "evicted" list but no page table in it) gets done immediately. v2: update commit comments. Acked-by: Paul Menzel Reviewed-by: Christian König Signed-off-by: Qiang Yu Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org

drm/amdgpu: bypass tiling flag check in virtual display case (v2)

2022-02-23T21:31:06Z

vkms leverages common amdgpu framebuffer creation, and also as it does not support FB modifier, there is no need to check tiling flags when initing framebuffer when virtual display is enabled. This can fix below calltrace: amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu] v2: check adev->enable_virtual_display instead as vkms can be enabled in bare metal as well. Signed-off-by: Leslie Shi Signed-off-by: Guchun Chen Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

Revert "drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()"

2022-02-23T21:31:06Z

This reverts commit 4046afcebfc3c8c0dd5666c2671b2c192b344f78. No need to support modifier in virtual kms, otherwise, in SRIOV mode, when lanuching X server, set crtc will fail due to mismatch between primary plane modifier and framebuffer modifier. Signed-off-by: Guchun Chen Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: do not enable asic reset for raven2

2022-02-23T21:31:06Z

The GPU reset function of raven2 is not maintained or tested, so it should be very unstable. Now the amdgpu_asic_reset function is added to amdgpu_pmops_suspend, which causes the S3 test of raven2 to fail, so the asic_reset of raven2 is ignored here. Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)") Signed-off-by: Chen Gong Acked-by: Alex Deucher Reviewed-by: Mario Limonciello Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org

drm/amd: Check if ASPM is enabled from PCIe subsystem

2022-02-23T21:31:06Z

commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") enabled ASPM by default but a variety of hardware configurations it turns out that this caused a regression. * PPC64LE hardware does not support ASPM at a hardware level. CONFIG_PCIEASPM is often disabled on these architectures. * Some dGPUs on ALD platforms don't work with ASPM enabled and PCIe subsystem disables it Check with the PCIe subsystem to see that ASPM has been enabled or not. Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Link: https://wiki.raptorcs.com/w/images/a/ad/P9_PHB_version1.0_27July2018_pub.pdf Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1723 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1739 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1907 Tested-by: koba.ko@canonical.com Reviewed-by: Alex Deucher Signed-off-by: Mario Limonciello Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org

drm/amdgpu: disable MMHUB PG for Picasso

2022-02-21T22:55:17Z

MMHUB PG needs to be disabled for Picasso for stability reasons. Signed-off-by: Evan Quan Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org

drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.

2022-02-14T19:59:46Z

[Why] SDMA ring buffer test failed if suspend is aborted during S0i3 resume. [How] If suspend is aborted for some reason during S0i3 resume cycle, it follows SDMA ring test failing and errors in amdgpu resume. For RN/CZN/Picasso, SMU saves and restores SDMA registers during S0ix cycle. So, skipping SDMA suspend and resume from driver solves the issue. This time, the system is able to resume gracefully even the suspend is aborted. Reviewed-by: Mario Limonciello Reviewed-by: Alex Deucher Signed-off-by: Rajib Mahapatra Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org

drm/amdgpu: add utcl2_harvest to gc 10.3.1

2022-02-09T20:08:05Z

Confirmed with hardware team, there is harvesting for gc 10.3.1. Signed-off-by: Aaron Liu Reviewed-by: Huang Rui Signed-off-by: Alex Deucher

drm/amdgpu: fix logic inversion in check

2022-02-02T23:35:00Z

We probably never trigger this, but the logic inside the check is inverted. Signed-off-by: Christian König Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher