linux/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c, branch v5.19

drm/amdgpu: assign the cpu/gpu address of fence from ring

2022-05-04T14:03:31Z

assign the cpu/gpu address of fence for the normal or mes ring from ring structure. Signed-off-by: Jack Xiao Acked-by: Christian König Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher

drm/amdgpu: Move scheduler init to after XGMI is ready

2022-02-09T17:15:04Z

Before we initialize schedulers we must know which reset domain are we in - for single device there iis a single domain per device and so single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König Link: https://www.spinics.net/lists/amd-gfx/msg74112.html

Revert "drm/amdgpu: stop scheduler when calling hw_fini (v2)"

2022-01-09T18:32:16Z

This reverts commit f7d6779df642720e22bffd449e683bb8690bd3bf. This bisected regression has impacted suspend-resume stability since 5.15-rc1. It regressed -stable via 5.14.10. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215315 Fixes: f7d6779df64 ("drm/amdgpu: stop scheduler when calling hw_fini (v2)") Cc: Guchun Chen Cc: Andrey Grodzovsky Cc: Christian Koenig Cc: Alex Deucher Cc: # 5.14+ Signed-off-by: Len Brown Signed-off-by: Linus Torvalds

drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence

2021-12-17T17:47:28Z

The job embedded fence donesn't initialize the flags at dma_fence_init(). Then we will go a wrong way in amdgpu_fence_get_timeline_name callback and trigger a null pointer panic once we enabled the trace event here. So introduce new amdgpu_fence object to indicate the job embedded fence. [ 156.131790] BUG: kernel NULL pointer dereference, address: 00000000000002a0 [ 156.131804] #PF: supervisor read access in kernel mode [ 156.131811] #PF: error_code(0x0000) - not-present page [ 156.131817] PGD 0 P4D 0 [ 156.131824] Oops: 0000 [#1] PREEMPT SMP PTI [ 156.131832] CPU: 6 PID: 1404 Comm: sdma0 Tainted: G OE 5.16.0-rc1-custom #1 [ 156.131842] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016 [ 156.131848] RIP: 0010:strlen+0x0/0x20 [ 156.131859] Code: 89 c0 c3 0f 1f 80 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 [ 156.131872] RSP: 0018:ffff9bd0018dbcf8 EFLAGS: 00010206 [ 156.131880] RAX: 00000000000002a0 RBX: ffff8d0305ef01b0 RCX: 000000000000000b [ 156.131888] RDX: ffff8d03772ab924 RSI: ffff8d0305ef01b0 RDI: 00000000000002a0 [ 156.131895] RBP: ffff9bd0018dbd60 R08: ffff8d03002094d0 R09: 0000000000000000 [ 156.131901] R10: 000000000000005e R11: 0000000000000065 R12: ffff8d03002094d0 [ 156.131907] R13: 000000000000001f R14: 0000000000070018 R15: 0000000000000007 [ 156.131914] FS: 0000000000000000(0000) GS:ffff8d062ed80000(0000) knlGS:0000000000000000 [ 156.131923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 156.131929] CR2: 00000000000002a0 CR3: 000000001120a005 CR4: 00000000003706e0 [ 156.131937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 156.131942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 156.131949] Call Trace: [ 156.131953] [ 156.131957] ? trace_event_raw_event_dma_fence+0xcc/0x200 [ 156.131973] ? ring_buffer_unlock_commit+0x23/0x130 [ 156.131982] dma_fence_init+0x92/0xb0 [ 156.131993] amdgpu_fence_emit+0x10d/0x2b0 [amdgpu] [ 156.132302] amdgpu_ib_schedule+0x2f9/0x580 [amdgpu] [ 156.132586] amdgpu_job_run+0xed/0x220 [amdgpu] v2: fix mismatch warning between the prototype and function name (Ray, kernel test robot) Signed-off-by: Huang Rui Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: use adev_to_drm for consistency when accessing drm_device

2021-10-08T17:22:13Z

adev_to_drm is used everywhere, so improve recent changes when accessing drm_device pointer from amdgpu_device. Signed-off-by: Guchun Chen Acked-by: Christian König Signed-off-by: Alex Deucher

Merge drm/drm-next into drm-misc-next

2021-09-14T07:25:30Z

Kickstart new drm-misc-next cycle. Signed-off-by: Maxime Ripard

dma-buf: nuke DMA_FENCE_TRACE macros v2

2021-09-02T10:40:52Z

Only the DRM GPU scheduler, radeon and amdgpu where using them and they depend on a non existing config option to actually emit some code. v2: keep the signal path as is for now Signed-off-by: Christian König Acked-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20210818105443.1578-1-christian.koenig@amd.com

drm/amdgpu: stop scheduler when calling hw_fini (v2)

2021-08-31T18:19:47Z

This gurantees no more work on the ring can be submitted to hardware in suspend/resume case, otherwise a potential race will occur and the ring will get no chance to stay empty before suspend. v2: Call drm_sched_resubmit_job before drm_sched_start to restart jobs from the pending list. Suggested-by: Andrey Grodzovsky Suggested-by: Christian König Signed-off-by: Guchun Chen Reviewed-by: Christian König Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org

drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-16T19:16:58Z

Why: Previously hw fence is alloced separately with job. It caused historical lifetime issues and corner cases. The ideal situation is to take fence to manage both job and fence's lifetime, and simplify the design of gpu-scheduler. How: We propose to embed hw_fence into amdgpu_job. 1. We cover the normal job submission by this method. 2. For ib_test, and submit without a parent job keep the legacy way to create a hw fence separately. v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is embedded in a job. v3: remove redundant variable ring in amdgpu_job v4: add tdr sequence support for this feature. Add a job_run_counter to indicate whether this job is a resubmit job. v5 add missing handling in amdgpu_fence_enable_signaling Signed-off-by: Jingwen Chen Signed-off-by: Jack Zhang Reviewed-by: Andrey Grodzovsky Reviewed by: Monk Liu Signed-off-by: Alex Deucher

drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-06T01:17:58Z

In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop scheduler in s3 test, otherwise, fence related failure will arrive after resume. To fix this and for a better clean up, move drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of driver shutdown, and should never be called in hw_fini. v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, to keep sw_init and sw_fini paired. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1668 Fixes: 8d35a2596164c1 ("drm/amdgpu: adjust fence driver enable sequence") Suggested-by: Christian König Tested-by: Mike Lothian Signed-off-by: Guchun Chen Reviewed-by: Christian König Signed-off-by: Alex Deucher