linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c, branch v4.14

drm/amdgpu: set sched_hw_submission higher for KIQ (v3)

2017-08-24T15:48:45Z

KIQ doesn't really use the GPU scheduler. The base drivers generally use the KIQ ring directly rather than submitting IBs. However, amdgpu_sched_hw_submission (which defaults to 2) limits the number of outstanding fences to 2. KFD uses the KIQ for TLB flushes, and the 2-fence limit hurts performance when several KFD processes are running.

v2: move some expressions to one line
    change KIQ sched_hw_submission to at least 16
v3: bump to 256

Reviewed-by: Christian König
Reviewed-by: Felix Kuehling
Signed-off-by: Alex Deucher
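A minimal sketch of the idea, assuming a simplified model in which each ring's fence limit is derived from the module parameter. The enum, `ring_hw_submission_limit()` and `can_emit_fence()` are hypothetical names; only the default of 2 and the KIQ value of 256 come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring types for this sketch; amdgpu defines its own. */
enum sketch_ring_type { SKETCH_RING_GFX, SKETCH_RING_COMPUTE, SKETCH_RING_KIQ };

/* Stand-in for the amdgpu_sched_hw_submission module parameter (default 2). */
static int sched_hw_submission = 2;

/* Per-ring limit on outstanding fences. KIQ bypasses the GPU scheduler,
 * so its limit is raised to at least 256 (the v3 value in the commit). */
static unsigned int ring_hw_submission_limit(enum sketch_ring_type type)
{
	unsigned int limit = (unsigned int)sched_hw_submission;

	if (type == SKETCH_RING_KIQ && limit < 256)
		limit = 256;
	return limit;
}

/* A new fence can be emitted without waiting only while fewer than
 * `limit` fences are still unsignaled. Unsigned subtraction keeps the
 * comparison correct across sequence-number wrap-around. */
static bool can_emit_fence(unsigned int emitted, unsigned int signaled,
			   unsigned int limit)
{
	assert(limit > 0);
	return emitted - signaled < limit;
}
```

With two fences in flight, a ring limited to 2 must wait before emitting a third, while the KIQ ring can keep issuing TLB-flush fences.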

drm/amdgpu: don't finish the ring if not initialized

2017-08-15T18:46:17Z

If a ring is not initialized, it also should not be finished. For example, in Vega10's SR-IOV environment, UVD's decode ring is not initialized, but it is still finished in amdgpu_uvd_sw_fini, because the UVD driver puts the finish operation for all UVD decode rings into amdgpu_uvd_sw_fini rather than into uvd_vXXX_0_sw_fini. This leads to a failure when unloading the amdgpu module.

Signed-off-by: Trigger Huang
Reviewed-by: Monk Liu
Reviewed-by: Christian König
Signed-off-by: Alex Deucher
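The guard can be sketched as an early return in the teardown function. This assumes a hypothetical `initialized` flag on a simplified ring struct; the commit message does not say how the driver records initialization, so the struct and function names here are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified ring structure. */
struct sketch_ring {
	bool initialized;   /* set by sketch_ring_init() on success */
	int fini_calls;     /* counts teardowns that actually ran */
};

static void sketch_ring_init(struct sketch_ring *ring)
{
	ring->initialized = true;
}

/* Tear down a ring only if it was actually initialized. A shared fini
 * path (like amdgpu_uvd_sw_fini) may call this for rings that were
 * skipped during init, e.g. under SR-IOV. */
static void sketch_ring_fini(struct sketch_ring *ring)
{
	if (ring == NULL || !ring->initialized)
		return;
	ring->fini_calls++;
	ring->initialized = false; /* a second fini becomes a no-op too */
}
```

Clearing the flag at the end also makes repeated teardown harmless, which is the property a shared fini path needs.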

drm/amdgpu: use 256 bit buffers for all wb allocations (v2)

2017-08-15T18:46:08Z

May waste a bit of memory, but simplifies the interface significantly.

v2: convert internal accounting to use 256bit slots

Reviewed-by: Christian König
Signed-off-by: Alex Deucher
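A minimal sketch of slot-based writeback (wb) accounting, assuming a bitmap with one bit per 256-bit slot. A 256-bit slot is 8 dwords, so the caller receives a dword offset of `slot * 8`. `wb_get()`, `wb_free()` and the slot count are hypothetical stand-ins, not the driver's functions.

```c
#include <assert.h>
#include <stdint.h>

#define WB_SLOTS 32          /* hypothetical number of 256-bit slots */
#define WB_DW_PER_SLOT 8     /* 256 bits = 8 x 32-bit dwords */

static uint32_t wb_used;     /* bit i set means slot i is allocated */

/* Allocate one 256-bit slot. On success store its dword offset in *dw
 * and return 0; return -1 if every slot is taken. */
static int wb_get(unsigned int *dw)
{
	for (unsigned int i = 0; i < WB_SLOTS; i++) {
		if (!(wb_used & (1u << i))) {
			wb_used |= 1u << i;
			*dw = i * WB_DW_PER_SLOT;
			return 0;
		}
	}
	return -1;
}

/* Release the slot that contains dword offset dw. */
static void wb_free(unsigned int dw)
{
	unsigned int slot = dw / WB_DW_PER_SLOT;

	assert(slot < WB_SLOTS);
	wb_used &= ~(1u << slot);
}
```

Because every allocation has the same size, one bitmap covers all users; the cost is up to 7 unused dwords for callers that need only one.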

drm/amdgpu: make wb 256bit function names consistent

2017-08-15T18:45:59Z

Use a lowercase b to be consistent with the other wb functions.

Reviewed-by: Christian König
Signed-off-by: Alex Deucher

drm/amdgpu:fix gfx fence allocate size

2017-07-25T20:29:26Z

1. For SR-IOV, we need 8 dw for the gfx fence due to CP behaviour.
2. Clean up wrong logic in wptr/rptr wb alloc and free.

Change-Id: Ifbfed17a4621dae57244942ffac7de1743de0294
Signed-off-by: Monk Liu
Signed-off-by: Xiangliang Yu
Reviewed-by: Alex Deucher
Reviewed-by: Christian König
Signed-off-by: Alex Deucher

drm/amdgpu: Move compute vm bug logic to amdgpu_vm.c

2017-06-01T20:00:20Z

In review, Christian wanted to keep the logic inside amdgpu_vm.c, at the cost of being slightly slower. The loop is still optimized out with this patch.

v2: remove the if statement. Now it is not slower.

Signed-off-by: Alex Xie
Reviewed-by: Christian König
Signed-off-by: Alex Deucher

drm/amdgpu: guarantee bijective mapping of ring ids for LRU v3

2017-05-31T20:49:03Z

Depending on usage patterns, the current LRU policy may create a non-injective mapping between userspace ring ids and kernel rings. This behaviour is undesired, as apps that attempt to fill all HW blocks would be unable to reach some of them.

This change forces the LRU policy to create bijective mappings only.

v2: compress ring_blacklist
v3: simplify amdgpu_ring_is_blacklisted() logic

Signed-off-by: Andres Rodriguez
Reviewed-by: Nicolai Hähnle
Signed-off-by: Alex Deucher
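A minimal sketch of how a blacklist keeps the mapping injective, assuming the blacklist holds the kernel rings already mapped to other userspace ring ids of the same type. `lru[]` is ordered least recently used first. The helper names echo amdgpu_ring_is_blacklisted() but are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if kernel ring r is already mapped (blacklisted). */
static bool ring_is_blacklisted(int r, const int *blacklist, int nbl)
{
	for (int i = 0; i < nbl; i++)
		if (blacklist[i] == r)
			return true;
	return false;
}

/* Pick the least recently used ring that is not blacklisted.
 * Returns -1 when every ring is already taken. */
static int lru_pick(const int *lru, int nlru, const int *blacklist, int nbl)
{
	for (int i = 0; i < nlru; i++)
		if (!ring_is_blacklisted(lru[i], blacklist, nbl))
			return lru[i];
	return -1;
}
```

Mapping four userspace ids over four kernel rings then yields four distinct rings: each pick is appended to the blacklist, so no two ids can land on the same ring.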

drm/amdgpu: implement lru amdgpu_queue_mgr policy for compute v4

2017-05-31T20:49:02Z

Use an LRU policy to map usermode rings to HW compute queues.

Most compute clients use one queue, and usually the first queue available. This results in poor pipe/queue work distribution when multiple compute apps are running. In most cases pipe 0 queue 0 is the only queue that gets used.

In order to better distribute work across multiple HW queues, we adopt a policy to map the usermode ring ids to the LRU HW queue. This fixes the large majority of cases where multi-app compute workloads share the same HW queue even though 7 other queues are available.

v2: use ring->funcs->type instead of ring->hw_ip
v3: remove amdgpu_queue_mapper_funcs
v4: change ring_lru_list_lock to spinlock, grab only once in lru_get()

Signed-off-by: Andres Rodriguez
Signed-off-by: Alex Deucher
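A minimal single-threaded sketch of an LRU queue policy, assuming 8 HW queues kept in an array ordered least recently used first. The kernel version uses a linked list protected by a spinlock; this sketch has no locking, and `lru_get()` / `lru_touch()` are hypothetical names.

```c
#include <assert.h>

#define NUM_QUEUES 8

/* lru[0] is the least recently used queue, lru[NUM_QUEUES - 1] the most. */
static int lru[NUM_QUEUES] = {0, 1, 2, 3, 4, 5, 6, 7};

/* Mark queue q as just used by moving it to the back of the list. */
static void lru_touch(int q)
{
	int i = 0;

	while (i < NUM_QUEUES && lru[i] != q)
		i++;
	assert(i < NUM_QUEUES);
	for (; i < NUM_QUEUES - 1; i++)
		lru[i] = lru[i + 1];
	lru[NUM_QUEUES - 1] = q;
}

/* Hand out the least recently used queue and mark it as used. */
static int lru_get(void)
{
	int q = lru[0];

	lru_touch(q);
	return q;
}
```

Three apps that each ask for "the first queue" now receive queues 0, 1 and 2 instead of all sharing queue 0.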

drm/amdgpu: Optimize a function called by every IB sheduling

2017-05-31T18:16:38Z

Move several if statements and a loop statement from run time to initialization time.

Signed-off-by: Alex Xie
Reviewed-by: Chunming Zhou
Signed-off-by: Alex Deucher
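The general technique, hoisting per-submission checks into a flag computed once at init, can be sketched as follows. Everything here is hypothetical: the struct fields, the firmware threshold, and the function names are stand-ins and do not describe which checks the driver actually moved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical firmware version from which the workaround is unneeded. */
#define SKETCH_FIXED_FW 100

/* Hypothetical, simplified ring description. */
struct sketch_ring {
	bool is_compute;
	unsigned int fw_version;
	bool needs_workaround;   /* computed once in sketch_ring_setup() */
};

/* Init time: evaluate the conditions once and cache the result. */
static void sketch_ring_setup(struct sketch_ring *ring)
{
	ring->needs_workaround = ring->is_compute &&
				 ring->fw_version < SKETCH_FIXED_FW;
}

/* Hot path, called for every IB: a single field load, no branches
 * over ring type or firmware version. */
static bool sketch_ib_needs_workaround(const struct sketch_ring *ring)
{
	return ring->needs_workaround;
}
```

The trade-off is that the cached flag must be recomputed if any input (such as the firmware version) can change after init.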

drm/amd/amdgpu: Correct ring wptr address in debugfs (v2)

2017-03-30T03:55:53Z

On gfx9 hardware the write pointer is not wrapped and is a 64-bit value, so we reduce it modulo the ring size.

Signed-off-by: Tom St Denis
Reviewed-by: Christian König

(v2) use buf_mask instead of computing on the fly

Signed-off-by: Alex Deucher
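A minimal sketch of the modulo reduction, assuming the ring size is a power of two in bytes and the write pointer counts dwords. Under that assumption the mask is `ring_size / 4 - 1`, and masking is equivalent to taking the remainder. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for a ring of ring_size bytes (power of two), i.e. ring_size / 4 dwords. */
static uint32_t ring_buf_mask(uint32_t ring_size)
{
	assert(ring_size >= 4 && (ring_size & (ring_size - 1)) == 0);
	return ring_size / 4 - 1;
}

/* Reduce a free-running 64-bit dword write pointer to an index into the
 * ring buffer; the AND is equivalent to wptr % (buf_mask + 1). */
static uint32_t wptr_to_index(uint64_t wptr, uint32_t buf_mask)
{
	return (uint32_t)(wptr & buf_mask);
}
```

Precomputing the mask once per ring avoids recomputing the divisor each time the pointer is read.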