linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c, branch v4.15

drm/amdgpu: bypass lru touch for KIQ ring submission

2017-11-08T22:55:14Z

KIQ ring submission is used for register accessing on SRIOV VF that could happen both in irq enabled and irq disabled cases. Inversion lock could happen on adev->ring_lru_list_lock, while this operation is useless and just adds overhead in this use case. Signed-off-by: Pixel Ding Reviewed-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: add framework for HW specific priority settings v9

2017-10-09T20:30:21Z

Add an initial framework for changing the HW priorities of rings. The framework allows requesting priority changes for the lifetime of an amdgpu_job. After the job completes the priority will decay to the next lowest priority for which a request is still valid. A new ring function set_priority() can now be populated to take care of the HW specific programming sequence for priority changes. v2: set priority before emitting IB, and take a ref on amdgpu_job v3: use AMD_SCHED_PRIORITY_* instead of AMDGPU_CTX_PRIORITY_* v4: plug amdgpu_ring_restore_priority_cb into amdgpu_job_free_cb v5: use atomic for tracking job priorities instead of last_job v6: rename amdgpu_ring_priority_[get/put]() and align parameters v7: replace spinlocks with mutexes for KIQ compatibility v8: raise ring priority during cs_ioctl, instead of job_run v9: priority_get() before push_job() Reviewed-by: Christian König Acked-by: Christian König Signed-off-by: Andres Rodriguez Signed-off-by: Alex Deucher

drm/amdgpu: map compute rings by least recently used pipe

2017-09-28T20:03:22Z

This patch provides a guarantee that the first n queues allocated by an application will be on different pipes. Where n is the number of pipes available from the hardware. This helps avoid ring aliasing which can result in work executing in time-sliced mode instead of truly parallel mode. Reviewed-by: Alex Deucher Signed-off-by: Andres Rodriguez Signed-off-by: Alex Deucher

drm/amdgpu: set sched_hw_submission higher for KIQ (v3)

2017-08-29T19:27:44Z

KIQ doesn't really use the GPU scheduler. The base drivers generally use the KIQ ring directly rather than submitting IBs. However, amdgpu_sched_hw_submission (which defaults to 2) limits the number of outstanding fences to 2. KFD uses the KIQ for TLB flushes and the 2 fence limit hurts performance when there are several KFD processes running. v2: move some expressions to one line change KIQ sched_hw_submission to at least 16 v3: bump to 256 Reviewed-by: Christian König Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdgpu: don't finish the ring if not initialized

2017-08-15T18:46:17Z

If a ring is not initialized, it also should not be finished. For example, in Vega10's SR-IOV environment, UVD's decode ring is not initialized, but will be finnished in amdgpu_uvd_sw_fini, because UVD driver put all the uvd decode ring's finish operation into amdgpu_uvd_sw_fini function, while not uvd_vXXX_0_sw_fini. This will lead to amdgpu module unloading failure. Signed-off-by: Trigger Huang Reviewed-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: use 256 bit buffers for all wb allocations (v2)

2017-08-15T18:46:08Z

May waste a bit of memory, but simplifies the interface significantly. v2: convert internal accounting to use 256bit slots Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: make wb 256bit function names consistent

2017-08-15T18:45:59Z

Use a lower case b to be consistent with the other wb functions. Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu:fix gfx fence allocate size

2017-07-25T20:29:26Z

1, for sriov, we need 8dw for the gfx fence due to CP behaviour 2, cleanup wrong logic in wptr/rptr wb alloc and free Change-Id: Ifbfed17a4621dae57244942ffac7de1743de0294 Signed-off-by: Monk Liu Signed-off-by: Xiangliang Yu Reviewed-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Move compute vm bug logic to amdgpu_vm.c

2017-06-01T20:00:20Z

In review, Christian would like to keep the logic inside amdgpu_vm.c with a cost of slightly slower. The loop is still optimized out with this patch. v2: remove the if statement. Now it is not slower. Signed-off-by: Alex Xie Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: guarantee bijective mapping of ring ids for LRU v3

2017-05-31T20:49:03Z

Depending on usage patterns, the current LRU policy may create a non-injective mapping between userspace ring ids and kernel rings. This behaviour is undesired as apps that attempt to fill all HW blocks would be unable to reach some of them. This change forces the LRU policy to create bijective mappings only. v2: compress ring_blacklist v3: simplify amdgpu_ring_is_blacklisted() logic Signed-off-by: Andres Rodriguez Reviewed-by: Nicolai Hähnle Signed-off-by: Alex Deucher