aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu (follow)
AgeCommit message (Collapse)AuthorFilesLines
2024-06-19drm/amdgpu: init TA fw for psp v14Likun Gao1-0/+5
Add support to init TA firmware for psp v14. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19drm/amdgpu: refine gfx6 firmware loadingYang Wang1-10/+9
refine gfx6 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19Revert "drm/amdgpu: change aca bank error lock type to spinlock"Yang Wang2-11/+11
This reverts commit f6bce954f432c556659a57be9e18fecdc575affb. Revert this patch to modify lock type back to 'mutex' to avoid kernel calltrace issue. [ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu] [ 602.668939] Call Trace: [ 602.668940] <TASK> [ 602.668941] dump_stack_lvl+0x4c/0x70 [ 602.668945] dump_stack+0x14/0x20 [ 602.668946] __schedule_bug+0x5a/0x70 [ 602.668950] __schedule+0x940/0xb30 [ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668955] ? hrtimer_reprogram+0x77/0xb0 [ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668959] ? hrtimer_start_range_ns+0x126/0x370 [ 602.668961] schedule+0x39/0xe0 [ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140 [ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 602.668966] schedule_hrtimeout_range+0x17/0x20 [ 602.668967] usleep_range_state+0x69/0x90 [ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu] [ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu] [ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu] [ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu] [ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669339] ? stack_depot_save+0x12/0x20 [ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669343] ? set_track_prepare+0x52/0x70 [ 602.669346] ? kmemleak_alloc+0x4f/0x90 [ 602.669348] ? __kmalloc_node+0x34b/0x450 [ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu] [ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu] [ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu] [ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu] [ 602.669743] ? kmemleak_free+0x36/0x60 [ 602.669745] ? kvfree+0x32/0x40 [ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669749] ? kfree+0x15d/0x2a0 [ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu] [ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu] [ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0 [ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu] [ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu] [ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu] [ 602.670187] ? srso_alias_return_thunk+0x5/0 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: YiPeng Chai <yipeng.chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19Revert "drm/amdgpu: change bank cache lock type to spinlock"Yang Wang2-6/+7
This reverts commit 258ed689bc3163f86204f75df6c23f92b59b3fad revert this patch to modify lock type back to 'mutex' to avoid kernel calltrace issue. [ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu] [ 602.668939] Call Trace: [ 602.668940] <TASK> [ 602.668941] dump_stack_lvl+0x4c/0x70 [ 602.668945] dump_stack+0x14/0x20 [ 602.668946] __schedule_bug+0x5a/0x70 [ 602.668950] __schedule+0x940/0xb30 [ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668955] ? hrtimer_reprogram+0x77/0xb0 [ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.668959] ? hrtimer_start_range_ns+0x126/0x370 [ 602.668961] schedule+0x39/0xe0 [ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140 [ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 602.668966] schedule_hrtimeout_range+0x17/0x20 [ 602.668967] usleep_range_state+0x69/0x90 [ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu] [ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu] [ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu] [ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu] [ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669339] ? stack_depot_save+0x12/0x20 [ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669343] ? set_track_prepare+0x52/0x70 [ 602.669346] ? kmemleak_alloc+0x4f/0x90 [ 602.669348] ? __kmalloc_node+0x34b/0x450 [ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu] [ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu] [ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu] [ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu] [ 602.669743] ? kmemleak_free+0x36/0x60 [ 602.669745] ? kvfree+0x32/0x40 [ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669749] ? kfree+0x15d/0x2a0 [ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu] [ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu] [ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0 [ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu] [ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu] [ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu] [ 602.670187] ? srso_alias_return_thunk+0x5/0xfbef5 [ 602.670189] ? __schedule+0x37d/0xb30 [ 602.670191] process_one_work+0x176/0x350 [ 602.670194] worker_thread+0x2f7/0x420 [ 602.670197] ? Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19drm/amdgpu: remove amdgpu_mes_fence_wait_polling()Alex Deucher2-16/+0
No longer used so remove it. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19drm/amdgpu: cleanup MES12 command submissionAlex Deucher1-28/+48
The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: ade887c63394 ("drm/amdgpu/mes12: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19drm/amdgpu: refine gfx10 firmware loadingYang Wang1-13/+12
refine gfx10 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19drm/amdgpu: refine gfx9 firmware loadingYang Wang2-30/+26
refine gfx9 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19drm/amdgpu: cleanup MES11 command submissionChristian König1-28/+48
The approach of having a separate WB slot for each submission doesn't really work well and for example breaks GPU reset. Use a status query packet for the fence update instead since those should always succeed we can use the fence of the original packet to signal the state of the operation. While at it cleanup the coding style. Fixes: eef016ba8986 ("drm/amdgpu/mes11: Use a separate fence per transaction") Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19drm/amdgpu: fix using the reserved VMID with gang submitChristian König3-13/+45
We need to ensure that even when using a reserved VMID that the gang members can still run in parallel. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19drm/amdgpu: Do not wait for MP0_C2PMSG_33 IFWI init in SRIOVVictor Lu1-12/+14
SRIOV does not need to wait for IFWI init, and MP0_C2PMSG_33 is blocked for VF access. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19Revert "drm/amdgpu: Add missing locking for MES API calls"Mukul Joshi1-12/+0
This reverts commit 3612702852acbded39233b1600c8d9f47e40139f. This is causing a BUG message during suspend. [ 61.603542] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283 [ 61.603550] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2028, name: kworker/u64:14 [ 61.603553] preempt_count: 1, expected: 0 [ 61.603555] RCU nest depth: 0, expected: 0 [ 61.603557] Preemption disabled at: [ 61.603559] [<ffffffffc08a3261>] amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.603789] CPU: 9 PID: 2028 Comm: kworker/u64:14 Tainted: G W 6.8.0+ #7 [ 61.603795] Workqueue: events_unbound async_run_entry_fn [ 61.603801] Call Trace: [ 61.603803] <TASK> [ 61.603806] dump_stack_lvl+0x37/0x50 [ 61.603811] ? amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu] [ 61.604007] dump_stack+0x10/0x20 [ 61.604010] __might_resched+0x16f/0x1d0 [ 61.604016] __might_sleep+0x43/0x70 [ 61.604020] mutex_lock+0x1f/0x60 [ 61.604024] amdgpu_mes_unmap_legacy_queue+0x6d/0x100 [amdgpu] [ 61.604226] gfx11_kiq_unmap_queues+0x3dc/0x430 [amdgpu] [ 61.604422] ? srso_alias_return_thunk+0x5/0xfbef5 [ 61.604429] amdgpu_gfx_disable_kgq+0x122/0x160 [amdgpu] [ 61.604621] gfx_v11_0_hw_fini+0xda/0x100 [amdgpu] [ 61.604814] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 61.605008] amdgpu_device_ip_suspend_phase2+0x135/0x1d0 [amdgpu] [ 61.605175] amdgpu_device_suspend+0xec/0x180 [amdgpu] Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: refine gfx8 firmware loadingYang Wang1-36/+33
refine gfx8 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: set RAS fed status for more casesTao Zhou2-0/+2
Indicate fatal error for each RAS block and NBIO. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: create amdgpu_ras_in_recovery to simplify codeTao Zhou4-35/+25
Reduce redundant code and user doesn't need to pay attention to RAS details. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: trigger mode1 reset for RAS RMA statusTao Zhou3-11/+29
Check RMA status in bad page retirement flow. v2: fix coding bugs in v1. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: refine gfx7 firmware loadingYang Wang1-14/+13
refine gfx7 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2Christian König3-51/+0
This reverts commit b8c415e3bf98 ("drm/amdgpu: take runtime pm reference when we attach a buffer") and commit 425285d39afd ("drm/amdgpu: add amdgpu runpm usage trace for separate funcs"). Taking a runtime pm reference for DMA-buf is actually completely unnecessary and even dangerous. The problem is that calling pm_runtime_get_sync() from the DMA-buf callbacks is illegal because we have the reservation locked here which is also taken during resume. So this would deadlock. When the buffer is in GTT it is still accessible even when the GPU is powered down and when it is in VRAM the buffer gets migrated to GTT before powering down. The only use case which would make it mandatory to keep the runtime pm reference would be if we pin the buffer into VRAM, and that's not something we currently do. v2: improve the commit message Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org
2024-06-14drm/amdgpu: refine imu firmware loadingYang Wang2-12/+8
refine imu firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: refine gmc firmware loadingYang Wang3-19/+8
refine gmc firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: refine vpe firmware loadingYang Wang1-4/+2
refine vpe firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: refine vcn firmware loadingYang Wang1-9/+5
refine vcn firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: move aca/mca init functions into ras_init() stageYang Wang4-37/+69
adjust the function position to better match aca/mca fini code in ras_fini(). Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: fix overflowed constant warning in mmhub_set_clockgating()Bob Zhou4-4/+4
To fix potential overflowed constant warning, modify the variables to u32 for getting the return value of RREG32_SOC15(). Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: Indicate CU havest info to CPHarish Kasiviswanathan1-2/+13
To achieve full occupancy CP hardware needs to know if CUs in SE are symmetrically or asymmetrically harvested v2: Reset is_symmetric_cus for each loop Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: Skip coredump during resets for debugLijo Lazar1-0/+1
Skip scheduling coredump when gpu reset is intentionally triggered through debugfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: refine sdma firmware loadingYang Wang4-19/+22
refine sdma firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: refine psp firmware loadingYang Wang1-19/+7
refine psp firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: refine mes firmware loadingYang Wang1-4/+2
v1: refine mes firmware loading v2: use dev_info instead of DRM_INFO Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: Add missing locking for MES API callsMukul Joshi1-0/+12
Add missing locking at a few places when calling MES APIs to ensure exclusive access to MES queue. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amd/display: Set default brightness according to ACPIMario Limonciello2-0/+5
Currently, amdgpu will always set up the brightness at 100% when it loads. However this is jarring when the BIOS has it previously programmed to a much lower value. The ACPI ATIF method includes two members for "ac_level" and "dc_level". These represent the default values that should be used if the system is brought up in AC and DC respectively. Use these values to set up the default brightness when the backlight device is registered. v2: squash in ACPI fix Reviewed-by: Leo Li <sunpeng.li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: drop some kernel messages in VCN codeDavid (Ming Qiang) Wu14-104/+17
Similar to commit 813e7d4cd05e where some kernel log messages are dropped. With this commit, more log messages in older version of VCN/JPEG code are dropped. Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: refine gpu_info firmware loadingYang Wang1-5/+4
refine gpu_info firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: enhance amdgpu_ucode_request() function flexibilityYang Wang2-9/+24
v1: Adding formatting string feature to improve function flexibility. v2: modify macro name to ADMGPU_UCODE_MAX_NAME. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: fix the overflowed constant warning for RREG32_SOC15()Bob Zhou1-3/+4
To fix potential overflowed constant warning reported by Coverity, modify the variables to uint32_t. Signed-off-by: Bob Zhou <bob.zhou@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: add lock in amdgpu_gart_invalidate_tlbYunxiang Li1-1/+5
We need to take the reset domain lock before flush hdp. We can't put the lock inside amdgpu_device_flush_hdp itself because it is used during reset where we already take the write side lock. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: fix locking scope when flushing tlbYunxiang Li1-32/+34
Which method is used to flush tlb does not depend on whether a reset is in progress or not. We should skip flush altogether if the GPU will get reset. So put both path under reset_domain read lock. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org
2024-06-14drm/amdgpu: call flush_gpu_tlb directly in gfxhub enableYunxiang Li3-5/+7
Here since we are in reset and takes the reset_domain write side lock already. We can't use the flush tlb helper which tries to take the read side. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: use helper in amdgpu_gart_unbindYunxiang Li1-4/+1
When amdgpu_gart_invalidate_tlb helper is introduced this part was left out of the conversion. Avoid the code duplication here. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recoverYunxiang Li1-2/+0
At this point the gart is not set up, there's no point to invalidate tlb here and it could even be harmful. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: fix sriov host flr handlerYunxiang Li6-52/+50
We send back the ready to reset message before we stop anything. This is wrong. Move it to when we are actually ready for the FLR to happen. In the current state since we take tens of seconds to stop everything, it is very likely that host would give up waiting and reset the GPU before we send ready, so it would be the same as before. But this gets rid of the hack with reset_domain locking and also let us tell how slow ready to reset actually is from the host. The ready to reset speed can be improved later. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: add skip_hw_access checks for sriovYunxiang Li1-0/+9
Accessing registers via host is missing the check for skip_hw_access and the lockdep check that comes with it. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: add reset source in various casesEric Huang3-0/+3
To fullfill the reset event description. Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: fix NULL pointer in amdgpu_reset_get_descEric Huang1-4/+2
amdgpu_job_ring may return NULL, which causes kernel NULL pointer error, using another way to print ring name instead of ring->name. Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: remove dead code in atom_get_src_intJesse Zhang1-4/+4
Since the range of align is 0~7, the expression is: align = (attr >> 3) & 7. In the case of ATOM_ARG_IMM, the code cannot reach the default case. So there is no need for "break". Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: add sdma 7.0 support for copy dcc bufferFrank Min4-4/+27
1. Add dcc buffer flag for copy buffer 2. Add sdma 7.0 support copy dcc buffer Signed-off-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: support for DCC featureLikun Gao2-0/+9
Deal with AMDGPU_GEM_CREATE_GFX12_DCC to set DCC bit when needed. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: add additional VM bitsAlex Deucher1-0/+1
Add additional VM PTE bits. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14Merge drm/drm-fixes into drm-misc-fixesMaxime Ripard5-18/+27
Roll -rc3 and current drm/fixes in. This will also unstuck our for-next branch. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-06-11Merge tag 'amd-drm-next-6.11-2024-06-07' of ↵Dave Airlie121-1498/+14182
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.11-2024-06-07: amdgpu: - DCN 4.0.x support - DCN 3.5 updates - GC 12.0 support - DP MST fixes - Cursor fixes - MES11 updates - MMHUB 4.1 support - DML2 Updates - DCN 3.1.5 fixes - IPS fixes - Various code cleanups - GMC 12.0 support - SDMA 7.0 support - SMU 13 updates - SR-IOV fixes - VCN 5.x fixes - MES12 support - SMU 14.x updates - Devcoredump improvements - Fixes for HDP flush on platforms with >4k pages - GC 9.4.3 fixes - RAS ACA updates - Silence UBSAN flex array warnings - MMHUB 3.3 updates amdkfd: - Contiguous VRAM allocations - GC 12.0 support - SDMA 7.0 support - SR-IOV fixes radeon: - Backlight workaround for iMac - Silence UBSAN flex array warnings UAPI: - GFX12 modifier and DCC support Proposed Mesa changes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510 - KFD GFX ALU exceptions Proposed ROCdebugger changes: https://github.com/ROCm/ROCdbgapi/commit/08c760622b6601abf906f75abbc5e21d9fd425df https://github.com/ROCm/ROCgdb/commit/944fe1c1414a68700414e86e32273b6bfa62ba6f - KFD Contiguous VRAM allocation flag Proposed ROCr/HIP changes: https://github.com/ROCm/ROCT-Thunk-Interface/commit/f7b4a269914a3ab4f1e2453c2879adb97b5cc9e5 https://github.com/ROCm/ROCR-Runtime/pull/214/commits/26e8530d05a775872cb06dde6693db72be0c454a https://github.com/ROCm/clr/commit/1d48f2a1ab38b632919c4b7274899b3faf4279ff Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240607195900.902537-1-alexander.deucher@amd.com