linux/drivers/gpu/drm/amd/amdkfd, branch v4.19

drm/amdkfd: Fix incorrect use of process->mm

2018-10-04T15:37:25Z

This mm_struct pointer should never be dereferenced. If running in a user thread, just use current->mm. If running in a kernel worker, use get_task_mm to get a safe reference to the mm_struct.

Reviewed-by: Oded Gabbay
Acked-by: Christian König
Signed-off-by: Felix Kuehling
Signed-off-by: Alex Deucher
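The worker-side pattern described in this message can be sketched in userspace as follows. This is only a model: the struct layouts, the mock_ helpers, and worker_touch_mm are hypothetical stand-ins for the kernel's mm_struct, get_task_mm and mmput, and are not the driver's code.

```c
#include <stddef.h>

/* Userspace stand-ins for kernel types; NOT the real kernel definitions. */
struct mm_struct { int users; };
struct task_struct { struct mm_struct *mm; };

/* Mock of get_task_mm(): take a reference, or return NULL if the task has no mm. */
static struct mm_struct *mock_get_task_mm(struct task_struct *task)
{
    if (!task->mm)
        return NULL;
    task->mm->users++;
    return task->mm;
}

/* Mock of mmput(): drop the reference taken above. */
static void mock_mmput(struct mm_struct *mm)
{
    mm->users--;
}

/* Hypothetical worker: never dereference a cached mm pointer, take a reference instead. */
static int worker_touch_mm(struct task_struct *lead_thread)
{
    struct mm_struct *mm = mock_get_task_mm(lead_thread);

    if (!mm)
        return -1; /* the process has already dropped its address space */

    /* ... the mm is safe to use between get and put ... */

    mock_mmput(mm);
    return 0;
}
```

The point of the pattern is that every use in a worker is bracketed by a get and a put, so the address space cannot disappear in between.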

drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs

2018-09-20T15:25:23Z

Because CRAT_CU_FLAGS_IOMMU_PRESENT was not set in the CRAT provided by some BIOSes, we need to work around this. For future compatibility, we also overwrite the bit in capability according to the value of needs_iommu_device.

Acked-by: Alex Deucher
Signed-off-by: Yong Zhao
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
Signed-off-by: Alex Deucher
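Overwriting a capability bit from a device property, as described above, might look roughly like this. The bit position EXAMPLE_CAP_ATS_PRESENT and the function name are placeholders chosen for illustration, not the values or names used in the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit position, for illustration only. */
#define EXAMPLE_CAP_ATS_PRESENT (1u << 1)

/* Force the ATS bit to match needs_iommu_device, whatever the BIOS table said. */
static uint32_t fixup_ats_capability(uint32_t capability, bool needs_iommu_device)
{
    if (needs_iommu_device)
        return capability | EXAMPLE_CAP_ATS_PRESENT;
    return capability & ~EXAMPLE_CAP_ATS_PRESENT;
}
```

Setting and clearing explicitly, rather than only setting, is what makes the reported capability independent of the firmware-provided value.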

drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9

2018-09-20T15:25:17Z

CWSR fails on Raven if the control stack is MTYPE_UC, which is used for regular GART mappings. As a workaround, we map it using MTYPE_NC. The MEC firmware expects the control stack at an offset of one page from the start of the MQD, so it is part of the MQD allocation on GFXv9. AMDGPU added a memory allocation flag just for this purpose.

Acked-by: Alex Deucher
Signed-off-by: Yong Zhao
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
Signed-off-by: Alex Deucher

drm/amdkfd: Call kfd2kgd.set_compute_idle

2018-07-16T23:10:37Z

User mode queue submissions don't go through KFD. Therefore we don't know exactly when compute is idle or not idle. We use the existence of user mode queues on a device as an approximation. register_process is called when the first queue of a process is created. Conversely, unregister_process is called when the last queue is destroyed. The first process that is registered takes compute out of idle. The last process that is unregistered sets compute back to idle.

Signed-off-by: Felix Kuehling
Reviewed-by: Eric Huang
Reviewed-by: Alex Deucher
Signed-off-by: Oded Gabbay
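The first-in/last-out approximation described above amounts to a counter with two edge transitions. A minimal userspace sketch, assuming a hypothetical example_device struct and helper names (the real driver calls kfd2kgd.set_compute_idle rather than setting a flag):

```c
#include <stdbool.h>

/* Hypothetical per-device state for illustration. */
struct example_device {
    unsigned int process_count; /* processes with at least one queue */
    bool compute_idle;          /* stands in for the set_compute_idle call */
};

/* Called when a process creates its first queue on the device. */
static void example_register_process(struct example_device *dev)
{
    if (dev->process_count++ == 0)
        dev->compute_idle = false; /* first process: leave idle */
}

/* Called when a process destroys its last queue on the device. */
static void example_unregister_process(struct example_device *dev)
{
    if (dev->process_count > 0 && --dev->process_count == 0)
        dev->compute_idle = true; /* last process: back to idle */
}
```

Only the 0 to 1 and 1 to 0 transitions change the idle state; registrations in between leave it untouched.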

drm/amdkfd: Add CU-masking ioctl to KFD

2018-07-14T23:05:59Z

CU-masking allows a KFD client to control the set of CUs used by a user mode queue for executing compute dispatches. This can be used for optimizing the partitioning of the GPU and minimizing conflicts between concurrent tasks.

Signed-off-by: Flora Cui
Signed-off-by: Kent Russell
Signed-off-by: Eric Huang
Signed-off-by: Felix Kuehling
Acked-by: Oded Gabbay
Signed-off-by: Oded Gabbay
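To make the idea of a CU mask concrete, the sketch below builds a bitmask that enables the first N compute units, stored as an array of 32-bit words. The helper name and the word layout are assumptions for illustration; the actual ioctl argument layout is defined in the KFD UAPI header and is not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/*
 * Enable the first num_enabled CUs in a bitmask of num_words 32-bit words.
 * Returns 0 on success, -1 if the mask is too small to hold that many bits.
 */
static int build_cu_mask(uint32_t *mask, size_t num_words, unsigned int num_enabled)
{
    if (num_enabled > num_words * 32)
        return -1;

    memset(mask, 0, num_words * sizeof(*mask));
    for (unsigned int cu = 0; cu < num_enabled; cu++)
        mask[cu / 32] |= 1u << (cu % 32); /* bit cu set means CU cu is usable */

    return 0;
}
```

Two queues given disjoint masks would then run their dispatches on separate sets of CUs, which is the partitioning use case the message mentions.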

drm/amdkfd: Enable Raven for KFD

2018-07-13T20:17:48Z

Add DID and kfd_device_info for Raven.

Signed-off-by: Yong Zhao
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
Acked-by: Alex Deucher
Signed-off-by: Oded Gabbay

drm/amdkfd: Optimize out some duplicated code in kfd_signal_iommu_event()

2018-07-13T20:17:47Z

memory_exception_data is already initialized for not-present faults. It only needs to be overridden for permission faults.

Signed-off-by: Yong Zhao
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
Acked-by: Alex Deucher
Signed-off-by: Oded Gabbay
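The "default to not-present, override only for permission faults" structure can be sketched as below. The struct, field names, and function signature are hypothetical simplifications, not the layout of the real memory_exception_data.

```c
#include <stdbool.h>

/* Hypothetical, simplified failure flags. */
struct example_failure {
    bool not_present;
    bool read_only;
    bool no_execute;
};

static struct example_failure classify_fault(bool page_mapped,
                                             bool is_write, bool writable,
                                             bool is_execute, bool executable)
{
    /* Initialize once for the not-present case... */
    struct example_failure f = { .not_present = true };

    /* ...and override only when the address is mapped, i.e. a permission fault. */
    if (page_mapped) {
        f.not_present = false;
        f.read_only = is_write && !writable;
        f.no_execute = is_execute && !executable;
    }
    return f;
}
```

With the default set up front, the unmapped paths need no assignments of their own, which is where the duplicated code goes away.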

drm/amdkfd: Workaround to accommodate Raven too many PPR issue

2018-07-13T20:17:46Z

On Raven, multiple PPRs can be queued up by the hardware. When the first of those requests is handled by the IOMMU driver, the memory access succeeds. After that, the application may be done with the memory and unmap it. At that point the page table entries are invalidated, but there are still outstanding duplicate PPRs for those addresses. When the IOMMU driver processes those duplicate requests, it finds invalid page table entries and triggers an invalid PPR fault.

As a workaround, don't signal invalid PPR faults on Raven to avoid segfaulting applications that haven't done anything wrong. As a side effect, real GPU memory access faults may go unnoticed by the application.

Signed-off-by: Yong Zhao
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
Acked-by: Alex Deucher
Signed-off-by: Oded Gabbay
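The decision this workaround introduces can be reduced to a single predicate. The enum values and function name below are invented for illustration and do not correspond to the driver's ASIC identifiers.

```c
#include <stdbool.h>

/* Hypothetical ASIC identifiers for illustration. */
enum example_asic {
    EXAMPLE_ASIC_KAVERI,
    EXAMPLE_ASIC_CARRIZO,
    EXAMPLE_ASIC_RAVEN,
};

/*
 * Decide whether an invalid PPR fault should be signalled to the application.
 * On Raven, stale duplicate PPRs make such faults unreliable, so they are dropped.
 */
static bool should_signal_invalid_ppr(enum example_asic asic)
{
    return asic != EXAMPLE_ASIC_RAVEN;
}
```

The trade-off stated in the message is visible here: on Raven the predicate is false for genuine faults as well as for the spurious duplicates.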

drm/amdkfd: Avoid flooding dmesg on Raven due to IOMMU issues

2018-07-13T20:17:45Z

On Raven, invalid PPRs (peripheral page requests) can be reported because multiple PPRs can still be queued when memory is freed. Apply a rate limit to avoid flooding the log in this case.

Signed-off-by: Yong Zhao
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
Acked-by: Alex Deucher
Signed-off-by: Oded Gabbay
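In the kernel, rate limiting of log output is normally done with the existing ratelimited print helpers (for example pr_warn_ratelimited) rather than hand-written code. The userspace sketch below only illustrates the underlying idea of a fixed window with a burst allowance; the struct, function name, and tick-based clock are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-window rate limiter. */
struct example_ratelimit {
    uint64_t interval;     /* window length, in caller-defined ticks */
    unsigned int burst;    /* messages allowed per window */
    uint64_t window_start; /* tick at which the current window began */
    unsigned int emitted;  /* messages already allowed in this window */
};

/* Return true if a message may be logged at time 'now'. */
static bool example_ratelimit_allow(struct example_ratelimit *rl, uint64_t now)
{
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now; /* start a fresh window */
        rl->emitted = 0;
    }
    if (rl->emitted >= rl->burst)
        return false; /* over budget: suppress this message */

    rl->emitted++;
    return true;
}
```

A burst of duplicate invalid-PPR reports then produces at most burst log lines per window instead of one line per report.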

drm/amdkfd: Make SDMA engine number an ASIC-dependent variable

2018-07-13T20:17:44Z

On Raven there is only one SDMA engine instead of the two previously assumed, so we need to adapt our code to this new scenario.

Signed-off-by: Yong Zhao
Reviewed-by: Felix Kuehling
Signed-off-by: Felix Kuehling
Acked-by: Alex Deucher
Signed-off-by: Oded Gabbay
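Turning a hard-coded engine count into a per-ASIC property can be sketched as a field in a device-info table. The struct, the table entries, and the queues-per-engine constant below are illustrative assumptions rather than the contents of kfd_device_info.

```c
/* Hypothetical per-ASIC description for illustration. */
struct example_device_info {
    const char *name;
    unsigned int num_sdma_engines;
};

/* Assumed value, chosen only to make the arithmetic concrete. */
#define EXAMPLE_SDMA_QUEUES_PER_ENGINE 2u

static const struct example_device_info example_two_engine_asic = { "two-engine ASIC", 2 };
static const struct example_device_info example_raven = { "raven", 1 };

/* Derive the queue count from the ASIC's engine count instead of a global constant. */
static unsigned int example_total_sdma_queues(const struct example_device_info *info)
{
    return info->num_sdma_engines * EXAMPLE_SDMA_QUEUES_PER_ENGINE;
}
```

Any code that sizes bitmaps or loops over SDMA queues would then read the count from the device info, so a one-engine part needs no special casing.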