linux/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c, branch v5.17

drm/amdgpu: add dummy event6 for vega10

2022-01-07T22:19:34Z

[why] A malicious mailbox event1 causes driver loading to fail on vega10. A dummy event6 prevents the driver from taking the response to a malicious event1 as its own. [how] On vega10, send a mailbox event6 before sending event1. Signed-off-by: James Yao Reviewed-by: Jingwen Chen Signed-off-by: Alex Deucher

drm/amdgpu: SRIOV flr_work should use down_write

2021-12-14T21:09:02Z

A host-initiated VF FLR may fail if someone else is already holding a read_lock. Change from down_write_trylock to down_write to guarantee that the reset goes through. Signed-off-by: Victor Skvortsov Reviewed-by: Shaoyun.liu Signed-off-by: Alex Deucher

drm/amd/amdgpu: Add ready_to_reset resp for vega10

2021-08-30T18:59:33Z

Send a response to the host after receiving the FLR notification from the host. Port the NV change to vega10. Signed-off-by: YuBiao Wang Reviewed-by: Jingwen Chen Signed-off-by: Alex Deucher

drm/amdgpu: SRIOV flr_work should take write_lock

2021-07-13T15:48:09Z

[Why] If flr_work takes the read_lock, other threads that take the read_lock can still access the hardware while the host is doing a VF FLR. [How] flr_work should take the write_lock to avoid this case. Signed-off-by: Jingwen Chen Reviewed-by: Monk Liu Signed-off-by: Alex Deucher

drm/amdgpu/sriov Stop data exchange for wholegpu reset

2021-01-14T04:47:39Z

[Why] When the host triggers a whole GPU reset, the guest keeps waiting until the host finishes the reset. But there is a work queue in the guest exchanging data between VF and PF, which needs to access the frame buffer. During a whole GPU reset the frame buffer is not accessible, and this causes the call trace. [How] After the VF gets the reset notification from the PF, stop the data exchange. Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang Reviewed-by: Monk Liu Signed-off-by: Alex Deucher

drm/amdgpu/SRIOV: Extend VF reset request wait period

2020-12-15T16:35:35Z

In the virtualization case, when one VF sends too many FLR requests, the hypervisor stops responding to this VF's requests for a long period of time. This is called event guard. During this cooling period, the guest driver should wait instead of doing other things. After this period, the guest driver resumes the reset process and returns to normal. Currently, the guest driver waits 12 seconds and returns failure if it doesn't get a response from the host. Solution: extend this waiting time in the guest driver and poll for the response periodically. A poll happens every 6 seconds and polling lasts for 60 seconds. v2: change the max repetition count from a number to a macro. Signed-off-by: Jiange Zhao Acked-by: Hawking Zhang Signed-off-by: Alex Deucher
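The bounded polling described in this message can be sketched as follows. This is a userspace illustration under stated assumptions, not the driver code: the macro names, the poll_fn callback type, and the fake_host helper are all hypothetical, and only the 6-second interval and 60-second window come from the commit message.

```c
#include <stdbool.h>

/* Values from the commit message; the macro names are hypothetical. */
#define POLL_INTERVAL_MS 6000
#define POLL_WINDOW_MS   60000
#define POLL_MAX_REP     (POLL_WINDOW_MS / POLL_INTERVAL_MS)

/*
 * Hypothetical callback: waits up to timeout_ms for the host response
 * and returns true if it arrived within that time.
 */
typedef bool (*poll_fn)(void *ctx, unsigned int timeout_ms);

/*
 * Polls repeatedly until the host answers or the window is used up.
 * Returns the 1-based attempt that succeeded, or -1 on timeout.
 */
int wait_for_reset_ack(poll_fn poll, void *ctx)
{
    for (int attempt = 1; attempt <= POLL_MAX_REP; attempt++) {
        if (poll(ctx, POLL_INTERVAL_MS))
            return attempt;
    }
    return -1;
}

/* Fake host for experimentation: answers on the Nth poll (0 = never). */
struct fake_host {
    int calls;
    int ready_on_call;
};

bool fake_poll(void *ctx, unsigned int timeout_ms)
{
    struct fake_host *host = ctx;

    (void)timeout_ms; /* the fake host does not actually sleep */
    host->calls++;
    return host->ready_on_call > 0 && host->calls >= host->ready_on_call;
}
```

Expressing the repetition limit as a macro derived from the interval and window keeps the two numbers from drifting apart, which is in the spirit of the v2 note above.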

drm/amdgpu: Do gpu recovery when no job is running

2020-09-15T21:24:18Z

In function flr_work, we should do gpu recovery when no job is running. Fix the logic by inverting it. v2: modify the description Reviewed-by: Christian König Signed-off-by: Liu ChengZhe Signed-off-by: Alex Deucher
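The corrected condition amounts to a single predicate. A minimal sketch, assuming the handler can see a count of pending jobs; the function and parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * Decides whether the FLR work handler should start GPU recovery itself.
 * pending_jobs is a hypothetical count of jobs still queued on the
 * schedulers. Recovery runs only when nothing is pending.
 */
bool flr_should_recover(unsigned int pending_jobs)
{
    return pending_jobs == 0;
}
```

An inverted check would have triggered recovery exactly when jobs were still in flight and skipped it when the GPU was idle.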

drm/amdgpu: change reset lock from mutex to rw_semaphore

2020-08-24T16:23:48Z

Clients don't need the reset lock for synchronization when no GPU recovery is in progress. v2: change to return the return value of down_read_killable. v3: if GPU recovery has begun, the VF ignores the FLR notification. Reviewed-by: Monk Liu Acked-by: Christian König Signed-off-by: Dennis Li Signed-off-by: Alex Deucher

drm/amdgpu: refine codes to avoid reentering GPU recovery

2020-08-24T16:22:56Z

If other threads already hold the reset lock, recovery will fail at try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: Hawking Zhang Signed-off-by: Dennis Li Signed-off-by: Alex Deucher
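An atomic flag of this kind can be sketched with C11 atomics. This is an illustration under assumptions, not the driver's implementation: struct fake_device and the function names are hypothetical, and the kernel uses its own atomic_t helpers rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device with a flag analogous to adev->in_gpu_reset. */
struct fake_device {
    atomic_int in_gpu_reset;
};

/*
 * Attempts to enter recovery. Only the caller that flips the flag
 * from 0 to 1 gets true; concurrent or nested callers get false.
 */
bool recovery_try_enter(struct fake_device *dev)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&dev->in_gpu_reset, &expected, 1);
}

/* Marks recovery as finished so that a later reset can start. */
void recovery_exit(struct fake_device *dev)
{
    atomic_store(&dev->in_gpu_reset, 0);
}

/* Reports whether a recovery is currently in progress. */
bool device_in_reset(struct fake_device *dev)
{
    return atomic_load(&dev->in_gpu_reset) != 0;
}
```

Because the compare-and-exchange both tests and sets the flag in one step, a second caller cannot slip in between the check and the update.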

drm/amdgpu: Fix repeatly flr issue

2020-08-18T22:22:02Z

Recovery in the FLR notification handler is needed only for the test case where no job is running. When there are jobs in the mirror list, let the guest driver hit the job timeout and then do the recovery. Signed-off-by: jqdeng Acked-by: Nirmoy Das Signed-off-by: Alex Deucher