<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c, branch v4.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2017-08-24T15:48:45Z</updated>
<entry>
<title>drm/amdgpu: set sched_hw_submission higher for KIQ (v3)</title>
<updated>2017-08-24T15:48:45Z</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2017-08-22T20:39:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eab2c600fcc8f687f22bd2f2fa2b92ad9a043809'/>
<id>urn:sha1:eab2c600fcc8f687f22bd2f2fa2b92ad9a043809</id>
<content type='text'>
KIQ doesn't really use the GPU scheduler.  The base
drivers generally use the KIQ ring directly rather than
submitting IBs.  However, amdgpu_sched_hw_submission
(which defaults to 2) limits the number of outstanding
fences to 2.  KFD uses the KIQ for TLB flushes and the
2 fence limit hurts performance when there are several KFD
processes running.

v2: move some expressions to one line
    change KIQ sched_hw_submission to at least 16
v3: bump to 256

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: don't finish the ring if not initialized</title>
<updated>2017-08-15T18:46:17Z</updated>
<author>
<name>Trigger Huang</name>
<email>trigger.huang@amd.com</email>
</author>
<published>2017-08-08T10:42:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=41cc07cff24d55661a76efc07d70e80a97af4276'/>
<id>urn:sha1:41cc07cff24d55661a76efc07d70e80a97af4276</id>
<content type='text'>
If a ring is not initialized, it also should not be finished.
For example, in Vega10's SR-IOV environment, UVD's decode ring is not
initialized, but will be finnished in amdgpu_uvd_sw_fini, because UVD
driver put all the uvd decode ring's finish operation into
amdgpu_uvd_sw_fini function, while not uvd_vXXX_0_sw_fini. This will
lead to amdgpu module unloading failure.

Signed-off-by: Trigger Huang &lt;trigger.huang@amd.com&gt;
Reviewed-by: Monk Liu &lt;monk.liu@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: use 256 bit buffers for all wb allocations (v2)</title>
<updated>2017-08-15T18:46:08Z</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2017-07-28T16:14:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=97407b63ea605c12f328ce46b155026080b34246'/>
<id>urn:sha1:97407b63ea605c12f328ce46b155026080b34246</id>
<content type='text'>
May waste a bit of memory, but simplifies the interface
significantly.

v2: convert internal accounting to use 256bit slots

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: make wb 256bit function names consistent</title>
<updated>2017-08-15T18:45:59Z</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2017-07-27T19:10:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eacf3e149ea64c550925b4635f854062bb535005'/>
<id>urn:sha1:eacf3e149ea64c550925b4635f854062bb535005</id>
<content type='text'>
Use a lower case b to be consistent with the other wb functions.

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu:fix gfx fence allocate size</title>
<updated>2017-07-25T20:29:26Z</updated>
<author>
<name>Monk Liu</name>
<email>Monk.Liu@amd.com</email>
</author>
<published>2017-06-19T14:19:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0915fdbc69f58644f437730fbc9e1f1ab426fe18'/>
<id>urn:sha1:0915fdbc69f58644f437730fbc9e1f1ab426fe18</id>
<content type='text'>
1, for sriov, we need 8dw for the gfx fence due to CP
behaviour
2, cleanup wrong logic in wptr/rptr wb alloc and free

Change-Id: Ifbfed17a4621dae57244942ffac7de1743de0294
Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Signed-off-by: Xiangliang Yu &lt;Xiangliang.Yu@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Move compute vm bug logic to amdgpu_vm.c</title>
<updated>2017-06-01T20:00:20Z</updated>
<author>
<name>Alex Xie</name>
<email>AlexBin.Xie@amd.com</email>
</author>
<published>2017-06-01T13:42:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e59c020598666ffc22c627910667e44ac2412304'/>
<id>urn:sha1:e59c020598666ffc22c627910667e44ac2412304</id>
<content type='text'>
  In review, Christian would like to keep the logic
  inside amdgpu_vm.c with a cost of slightly slower.
  The loop is still optimized out with this patch.

v2: remove the if statement. Now it is not slower.

Signed-off-by: Alex Xie &lt;AlexBin.Xie@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koeng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: guarantee bijective mapping of ring ids for LRU v3</title>
<updated>2017-05-31T20:49:03Z</updated>
<author>
<name>Andres Rodriguez</name>
<email>andresx7@gmail.com</email>
</author>
<published>2017-03-17T18:30:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6065343a116fce16f7523ab10841efd942ce612d'/>
<id>urn:sha1:6065343a116fce16f7523ab10841efd942ce612d</id>
<content type='text'>
Depending on usage patterns, the current LRU policy may create a
non-injective mapping between userspace ring ids and kernel rings.

This behaviour is undesired as apps that attempt to fill all HW blocks
would be unable to reach some of them.

This change forces the LRU policy to create bijective mappings only.

v2: compress ring_blacklist
v3: simplify amdgpu_ring_is_blacklisted() logic

Signed-off-by: Andres Rodriguez &lt;andresx7@gmail.com&gt;
Reviewed-by: Nicolai Hähnle &lt;nicolai.haehnle@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: implement lru amdgpu_queue_mgr policy for compute v4</title>
<updated>2017-05-31T20:49:02Z</updated>
<author>
<name>Andres Rodriguez</name>
<email>andresx7@gmail.com</email>
</author>
<published>2017-03-06T21:27:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=795f2813e628bcf57a69f2dfe413360d14a1d7f4'/>
<id>urn:sha1:795f2813e628bcf57a69f2dfe413360d14a1d7f4</id>
<content type='text'>
Use an LRU policy to map usermode rings to HW compute queues.

Most compute clients use one queue, and usually the first queue
available. This results in poor pipe/queue work distribution when
multiple compute apps are running. In most cases pipe 0 queue 0 is
the only queue that gets used.

In order to better distribute work across multiple HW queues, we adopt
a policy to map the usermode ring ids to the LRU HW queue.

This fixes a large majority of multi-app compute workloads sharing the
same HW queue, even though 7 other queues are available.

v2: use ring-&gt;funcs-&gt;type instead of ring-&gt;hw_ip
v3: remove amdgpu_queue_mapper_funcs
v4: change ring_lru_list_lock to spinlock, grab only once in lru_get()

Signed-off-by: Andres Rodriguez &lt;andresx7@gmail.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: Optimize a function called by every IB sheduling</title>
<updated>2017-05-31T18:16:38Z</updated>
<author>
<name>Alex Xie</name>
<email>AlexBin.Xie@amd.com</email>
</author>
<published>2017-05-30T21:10:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dd684d313e280c3bad2ebb7b33e7688ab5409bc9'/>
<id>urn:sha1:dd684d313e280c3bad2ebb7b33e7688ab5409bc9</id>
<content type='text'>
  Move several if statements and a loop statment from
  run time to initialization time.

Signed-off-by: Alex Xie &lt;AlexBin.Xie@amd.com&gt;
Reviewed-by: Chunming Zhou &lt;david1.zhou@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amd/amdgpu: Correct ring wptr address in debugfs (v2)</title>
<updated>2017-03-30T03:55:53Z</updated>
<author>
<name>Tom St Denis</name>
<email>tom.stdenis@amd.com</email>
</author>
<published>2017-03-29T17:01:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec63982e90a8793a838dfba9037bde597bb565e1'/>
<id>urn:sha1:ec63982e90a8793a838dfba9037bde597bb565e1</id>
<content type='text'>
On gfx9 hardware the value is not wrapped and is a 64-bit value.  So
we reduce it modulo the ring size.

Signed-off-by: Tom St Denis &lt;tom.stdenis@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;

(v2) use buf_mask instead of computing on the fly

Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
