linux/drivers/gpu/drm/amd/amdkfd, branch v6.0

drm/amdkfd: fix dropped interrupt in kfd_int_process_v11

2022-09-27T21:54:25Z

Shader wave interrupts were getting dropped in event_interrupt_wq_v11 if the PRIV bit was set to 1. This would often lead to a hang. Until debugger logic is upstreamed, expand comment to stop early return. Signed-off-by: Graham Sider Reviewed-by: Harish Kasiviswanathan Signed-off-by: Alex Deucher

drm/amdgpu: pass queue size and is_aql_queue to MES

2022-09-27T21:54:12Z

Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also re-use gds_size for the queue size (unused for KFD). MES requires the queue size in order to compute the actual wptr offset within the queue RB since it increases monotonically for AQL queues. v2: Make is_aql_queue assign clearer Signed-off-by: Graham Sider Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: fix MQD init for GFX11 in init_mqd

2022-09-27T21:54:01Z

Set remaining compute_static_thread_mgmt_se* accordingly. Signed-off-by: Graham Sider Acked-by: Alex Deucher Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Fix isa version for the GC 10.3.7

2022-08-25T17:54:08Z

Correct the isa version for handling KFD test. Fixes: 7c4f4f197e0c ("drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD definitions") Signed-off-by: Prike Liang Reviewed-by: Aaron Liu Signed-off-by: Alex Deucher

Merge tag 'amd-drm-fixes-6.0-2022-08-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

2022-08-18T23:45:22Z

amd-drm-fixes-6.0-2022-08-17: amdgpu: - Revert some DML stack changes - Rounding fixes in KFD allocations - atombios vram info table parsing fix - DCN 3.1.4 fixes - Clockgating fixes for various new IPs - SMU 13.0.4 fixes - DCN 3.1.4 FP fixes - TMDS fixes for YCbCr420 4k modes - DCN 3.2.x fixes - USB 4 fixes - SMU 13.0 fixes - SMU driver unload memory leak fixes - Display orientation fix - Regression fix for generic fbdev conversion - SDMA 6.x fixes - SR-IOV fixes - IH 6.x fixes - Use after free fix in bo list handling - Revert pipe1 support - XGMI hive reset fix amdkfd: - Fix potential crach in kfd_create_indirect_link_prop() Signed-off-by: Dave Airlie From: Alex Deucher Link: https://patchwork.freedesktop.org/patch/msgid/20220818025206.6463-1-alexander.deucher@amd.com

drm/amdkfd: potential crash in kfd_create_indirect_link_prop()

2022-08-16T22:06:29Z

This code has two bugs. If kfd_topology_device_by_proximity_domain() failed on the first iteration through the loop then "cpu_link" is uninitialized and should not be dereferenced. The second bug is that we cannot dereference a list iterator when it points to the list head. In other words, if we exit the list_for_each_entry() loop exits without hitting a break then "cpu_link" is not a valid pointer and should not be dereferenced. Fix both of these problems by setting "cpu_link" to NULL when it is invalid and non-NULL when it is valid. That makes it easier to test for valid vs invalid. Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links") Signed-off-by: Dan Carpenter Signed-off-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: reserve 2 queues for sdma 6.0.1 in bitmap

2022-08-16T22:06:23Z

There is only one engine in sdma 6.0.1, the total number of reserved queues should be 2, reflect this number in bitmap as well. Signed-off-by: Yifan Zhang Reviewed-by: Tim Huang Signed-off-by: Alex Deucher

drm/amdkfd: Fix mm reference in SVM eviction worker

2022-08-16T22:03:23Z

Use the mm reference from the fence. This allows removing the svm_bo->svms pointer, which was problematic because we cannot assume that the struct kfd_process containing the svms is still allocated without holding a refcount on the process. Use mmget_not_zero to ensure the mm is still valid, and drop the svm_bo reference if it isn't. Signed-off-by: Felix Kuehling Reviewed-by: Philip Yang Signed-off-by: Alex Deucher

drm/amdkfd: Handle restart of kfd_ioctl_wait_events

2022-08-10T19:41:23Z

When kfd_ioctl_wait_events needs to restart due to a signal, we need to update the timeout to account for the time already elapsed. We also need to undo auto_reset of events that have signaled already, so that the restarted ioctl will be able to count those signals again. This fixes infinite hangs when kfd_ioctl_wait_events is interrupted by a signal. Signed-off-by: Felix Kuehling Reviewed-and-tested-by: Xiaogang Chen Signed-off-by: Alex Deucher

Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2022-08-05T23:32:45Z

Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...