linux/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v6.0

drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV

2022-09-27T22:03:36Z

- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor during the suspend time. Furthermore, we cannot request a mode 1 reset under SRIOV as VF. Therefore, we will skip it as it is called in suspend_noirq() function. - In the resume code path, we need to send REQ_GPU_INIT to the hypervisor and also resume PSP IP block under SRIOV. Signed-off-by: Bokun Zhang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org

drm/amdgpu: make sure to init common IP before gmc

2022-09-14T18:21:49Z

Move common IP init before GMC init so that HDP gets remapped before GMC init which uses it. This fixes the Unsupported Request error reported through AER during driver load. The error happens as a write happens to the remap offset before real remapping is done. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373 The error was unnoticed before and got visible because of the commit referenced below. This doesn't fix anything in the commit below, rather fixes the issue in amdgpu exposed by the commit. The reference is only to associate this commit with below one so that both go together. Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: Christian König Reviewed-by: Lijo Lazar Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org

drm/amdgpu: ensure no PCIe peer access for CPU XGMI iolinks

2022-08-30T21:07:43Z

[Why] Devices with CPU XGMI iolink do not support PCIe peer access. Signed-off-by: Alex Sierra Acked-by: Alex Deucher Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdgpu: Remove the additional kfd pre reset call for sriov

2022-08-19T21:06:38Z

The additional call is caused by merge conflict Reviewed-by: Felix Kuehling Signed-off-by: shaoyunl Signed-off-by: Alex Deucher

drm/amdgpu: fix hive reference leak when adding xgmi device

2022-08-19T21:06:21Z

Only amdgpu_get_xgmi_hive but no amdgpu_put_xgmi_hive which will leak the hive reference. Signed-off-by: YiPeng Chai Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher

drm/amdgpu: Avoid another list of reset devices

2022-08-10T19:07:14Z

A list of devices to be reset is already created in amdgpu_device_gpu_recover function. Creating another list with the same nodes is incorrect and not supported in list_head. Instead, pass the device list as part of reset context. Fixes: 9e08564727fc (drm/amdgpu: Refactor mode2 reset logic for v13.0.2) Signed-off-by: Lijo Lazar Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher

drm/amdgpu: move mes self test after drm sched re-started

2022-07-28T20:28:54Z

mes self test rely on vm mapping, move it after drm sched re-started so that vm mapping can work during gpu reset. Signed-off-by: Jack Xiao Acked-and-tested-by: Evan Quan Signed-off-by: Alex Deucher

drm/amdgpu: drop non-necessary call trace dump

2022-07-28T20:28:54Z

This extra call trace dump comes out in every gpu reset. And it gives people a wrong impression that something went wrong. Although actually there was not. Signed-off-by: Evan Quan Acked-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Get rid of amdgpu_job->external_hw_fence

2022-07-18T20:37:25Z

This is a follow-up cleanup to [1]. See bellow refcount balancing for calling amdgpu_job_submit_direct after this cleanup as far as I calculated. amdgpu_fence_emit dma_fence_init 1 dma_fence_get(fence) 2 rcu_assign_pointer(*ptr, dma_fence_get(fence) 3 ---> amdgpu_job_submit_direct completes before fence signaled amdgpu_sa_bo_free (*sa_bo)->fence = dma_fence_get(fence) 4 amdgpu_job_free dma_fence_put 3 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 4 dma_fence_put(f); 3 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 2 amdgpu_fence_process dma_fence_put 1 amdgpu_sa_bo_remove_locked dma_fence_put 0 ---> amdgpu_job_submit_direct completes after fence signaled amdgpu_fence_process dma_fence_put 2 amdgpu_job_free dma_fence_put 1 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 2 dma_fence_put(f); 1 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 0 [1] - https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzovsky@amd.com/ Signed-off-by: Andrey Grodzovsky Suggested-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: support reset flag set for gpu reset

2022-07-13T15:25:17Z

Move reset_context out of gpu recover function to make it configurable for different reset purpose. For the reset way of call gpu_recovery sysfs, force to use full reset method. Otherwise, try soft reset by default if the related ASIC supportted, if soft reset failed, will use full reset. Signed-off-by: Likun Gao Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher