<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdkfd, branch v5.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-09-15T22:16:04Z</updated>
<entry>
<title>drm/amdkfd: fix a memory leak issue</title>
<updated>2020-09-15T22:16:04Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2020-09-02T09:11:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=087d764159996ae378b08c0fdd557537adfd6899'/>
<id>urn:sha1:087d764159996ae378b08c0fdd557537adfd6899</id>
<content type='text'>
In the resume stage of GPU recovery, start_cpsch calls pm_init,
which sets pm-&gt;allocated to false, so the next pm_release_ib never
gets a chance to release the IB memory.

Call pm_release_ib in stop_cpsch, which is called in the suspend
stage of GPU recovery.
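
A minimal userspace sketch of the leak and the fix, assuming a heavily
simplified packet manager: struct packet_manager, pm_alloc_ib(),
live_ibs, recovery_cycle() and the with_fix flag are hypothetical, and
IB allocations are counted rather than backed by real memory. It only
shows why the IB has to be released in the suspend stage once pm_init
clears the allocated flag on resume.

```c
/*
 * Hypothetical, heavily simplified model: the real amdkfd packet manager
 * and scheduler functions have different structures and signatures.
 * IB allocations are tracked with a counter instead of real memory.
 */
struct packet_manager {
	int allocated;		/* nonzero while an IB is held */
};

int live_ibs;			/* IBs allocated and not yet released */

void pm_init(struct packet_manager *pm)
{
	pm->allocated = 0;	/* clears the flag without releasing the IB */
}

void pm_alloc_ib(struct packet_manager *pm)
{
	pm->allocated = 1;
	live_ibs++;
}

void pm_release_ib(struct packet_manager *pm)
{
	if (pm->allocated) {	/* only releases while the flag is set */
		pm->allocated = 0;
		live_ibs--;
	}
}

/* Suspend stage of GPU recovery; with_fix adds the pm_release_ib call. */
void stop_cpsch(struct packet_manager *pm, int with_fix)
{
	if (with_fix)
		pm_release_ib(pm);
}

/* Resume stage of GPU recovery. */
void start_cpsch(struct packet_manager *pm)
{
	pm_init(pm);
}

/* Runs one recovery cycle and returns the number of leaked IBs. */
int recovery_cycle(int with_fix)
{
	struct packet_manager pm[1];	/* one-element array: decays to a pointer */

	live_ibs = 0;
	pm_init(pm);
	pm_alloc_ib(pm);		/* an IB is in use when the GPU hangs */
	stop_cpsch(pm, with_fix);	/* suspend */
	start_cpsch(pm);		/* resume: pm_init clears allocated */
	pm_release_ib(pm);		/* no-op once the flag was cleared */
	return live_ibs;
}
```

In this model, recovery_cycle(0) leaves one IB outstanding and
recovery_cycle(1) leaves none.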

Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/kfd: fix a system crash issue during GPU recovery</title>
<updated>2020-09-15T22:15:28Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2020-09-02T04:57:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=66a5710beaf42903d553378f609166034bd219c7'/>
<id>urn:sha1:66a5710beaf42903d553378f609166034bd219c7</id>
<content type='text'>
The crash log is as follows:

[Thu Aug 20 23:18:14 2020] general protection fault: 0000 [#1] SMP NOPTI
[Thu Aug 20 23:18:14 2020] CPU: 152 PID: 1837 Comm: kworker/152:1 Tainted: G           OE     5.4.0-42-generic #46~18.04.1-Ubuntu
[Thu Aug 20 23:18:14 2020] Hardware name: GIGABYTE G482-Z53-YF/MZ52-G40-00, BIOS R12 05/13/2020
[Thu Aug 20 23:18:14 2020] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
[Thu Aug 20 23:18:14 2020] RIP: 0010:evict_process_queues_cpsch+0xc9/0x130 [amdgpu]
[Thu Aug 20 23:18:14 2020] Code: 49 8d 4d 10 48 39 c8 75 21 eb 44 83 fa 03 74 36 80 78 72 00 74 0c 83 ab 68 01 00 00 01 41 c6 45 41 00 48 8b 00 48 39 c8 74 25 &lt;80&gt; 78 70 00 c6 40 6d 01 74 ee 8b 50 28 c6 40 70 00 83 ab 60 01 00
[Thu Aug 20 23:18:14 2020] RSP: 0018:ffffb29b52f6fc90 EFLAGS: 00010213
[Thu Aug 20 23:18:14 2020] RAX: 1c884edb0a118914 RBX: ffff8a0d45ff3c00 RCX: ffff8a2d83e41038
[Thu Aug 20 23:18:14 2020] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8a0e2e4178c0
[Thu Aug 20 23:18:14 2020] RBP: ffffb29b52f6fcb0 R08: 0000000000001b64 R09: 0000000000000004
[Thu Aug 20 23:18:14 2020] R10: ffffb29b52f6fb78 R11: 0000000000000001 R12: ffff8a0d45ff3d28
[Thu Aug 20 23:18:14 2020] R13: ffff8a2d83e41028 R14: 0000000000000000 R15: 0000000000000000
[Thu Aug 20 23:18:14 2020] FS:  0000000000000000(0000) GS:ffff8a0e2e400000(0000) knlGS:0000000000000000
[Thu Aug 20 23:18:14 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Aug 20 23:18:14 2020] CR2: 000055c783c0e6a8 CR3: 00000034a1284000 CR4: 0000000000340ee0
[Thu Aug 20 23:18:14 2020] Call Trace:
[Thu Aug 20 23:18:14 2020]  kfd_process_evict_queues+0x43/0xd0 [amdgpu]
[Thu Aug 20 23:18:14 2020]  kfd_suspend_all_processes+0x60/0xf0 [amdgpu]
[Thu Aug 20 23:18:14 2020]  kgd2kfd_suspend.part.7+0x43/0x50 [amdgpu]
[Thu Aug 20 23:18:14 2020]  kgd2kfd_pre_reset+0x46/0x60 [amdgpu]
[Thu Aug 20 23:18:14 2020]  amdgpu_amdkfd_pre_reset+0x1a/0x20 [amdgpu]
[Thu Aug 20 23:18:14 2020]  amdgpu_device_gpu_recover+0x377/0xf90 [amdgpu]
[Thu Aug 20 23:18:14 2020]  ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
[Thu Aug 20 23:18:14 2020]  amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
[Thu Aug 20 23:18:14 2020]  process_one_work+0x20f/0x400
[Thu Aug 20 23:18:14 2020]  worker_thread+0x34/0x410

When the GPU hangs, a user process fails to create a compute queue,
whose struct object is freed later, but the driver wrongly adds this
queue to the queue list of the process. kfd_process_evict_queues then
accesses the freed memory, which causes a system crash.

v2:
The failure to execute_queues should probably not be reported to
the caller of create_queue, because the queue was already created.
Therefore, ignore the return value from execute_queues.
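
A minimal userspace sketch of the v2 behaviour, assuming a simplified
split in which the caller frees the queue whenever creation reports an
error. struct queue, struct process_queues, create_queue_in_dqm(),
count_queues(), destroy_queues() and the gpu_hung flag are hypothetical,
and execute_queues() here is only a stand-in that fails while the GPU is
hung. Because the queue is already linked when execute_queues() runs,
the sketch ignores that failure, so the caller never frees a listed
queue.

```c
#include "errno.h"
#include "stdlib.h"

/*
 * Hypothetical, simplified model of the caller/callee split; the real
 * amdkfd queue-creation code and data structures differ.
 */
struct queue {
	struct queue *next;
	int id;
};

struct process_queues {
	struct queue *head;	/* queue list of the process */
};

/* Stand-in for execute_queues: fails while the GPU is hung. */
int execute_queues(int gpu_hung)
{
	return gpu_hung ? -EIO : 0;
}

/* Links q into the process queue list, then runs execute_queues. */
int create_queue_in_dqm(struct process_queues *pqs, struct queue *q,
			int gpu_hung)
{
	q->next = pqs->head;
	pqs->head = q;
	/*
	 * v2: q is already on the list, so a failure here is ignored.
	 * Returning it would make the caller free q while it is still
	 * linked, leaving a dangling entry for the eviction path.
	 */
	(void)execute_queues(gpu_hung);
	return 0;
}

/* Caller: frees q if creation reports an error. */
int create_queue(struct process_queues *pqs, int id, int gpu_hung)
{
	struct queue *q = malloc(sizeof(*q));
	int ret;

	if (q == NULL)
		return -ENOMEM;
	q->id = id;
	ret = create_queue_in_dqm(pqs, q, gpu_hung);
	if (ret)
		free(q);
	return ret;
}

/* Walks the list like an eviction pass; every node must still be live. */
int count_queues(const struct process_queues *pqs)
{
	const struct queue *q;
	int n = 0;

	for (q = pqs->head; q != NULL; q = q->next)
		n++;
	return n;
}

void destroy_queues(struct process_queues *pqs)
{
	while (pqs->head != NULL) {
		struct queue *q = pqs->head;

		pqs->head = q->next;
		free(q);
	}
}
```

Creating a queue while gpu_hung is set still succeeds in this model, and
the eviction-style walk only ever sees live nodes.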

Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amd/amdkfd: Fix large framesize for kfd_smi_ev_read()</title>
<updated>2020-07-15T17:27:34Z</updated>
<author>
<name>Aurabindo Pillai</name>
<email>aurabindo.pillai@amd.com</email>
</author>
<published>2020-05-19T20:48:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6e14adea0ac3037d923a9591d1a094c115d7947c'/>
<id>urn:sha1:6e14adea0ac3037d923a9591d1a094c115d7947c</id>
<content type='text'>
The buffer is 1024 bytes. Allocate it from the heap instead of the
stack.

Also remove the stack size check, since the buffer is now allocated
from the heap.
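
A userspace analogue of the change, assuming calloc()/free() stand in
for kzalloc()/kfree() and memcpy() for copy_to_user().
smi_ev_read_sketch() and its pending string are hypothetical, and
MAX_KFIFO_SIZE is an assumed name for the 1024-byte size.

```c
#include "errno.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"

#define MAX_KFIFO_SIZE 1024	/* assumed name for the 1024-byte buffer */

/*
 * Hypothetical userspace analogue of the read path. The old version
 * declared char buf[MAX_KFIFO_SIZE] on the stack, adding 1 KiB to the
 * frame; here the buffer comes from the heap instead.
 */
int smi_ev_read_sketch(char *user_buf, size_t size, const char *pending)
{
	char *buf;
	int len;

	buf = calloc(MAX_KFIFO_SIZE, 1);	/* kzalloc() in the kernel */
	if (buf == NULL)
		return -ENOMEM;

	/* Stand-in for pulling pending event text out of the kfifo. */
	len = snprintf(buf, MAX_KFIFO_SIZE, "%s", pending);
	if (len >= MAX_KFIFO_SIZE)
		len = MAX_KFIFO_SIZE - 1;	/* output was truncated */
	if ((size_t)len > size)
		len = (int)size;		/* never overrun the caller */

	memcpy(user_buf, buf, len);		/* copy_to_user() in the kernel */
	free(buf);				/* kfree() in the kernel */
	return len;
}
```

The stack frame now holds only a pointer and a length, which is what
makes the frame size check unnecessary.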

Signed-off-by: Aurabindo Pillai &lt;aurabindo.pillai@amd.com&gt;
Tested-by: Amber Lin &lt;Amber.Lin@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Provide SMI events watch</title>
<updated>2020-07-15T17:27:34Z</updated>
<author>
<name>Amber Lin</name>
<email>Amber.Lin@amd.com</email>
</author>
<published>2020-05-13T12:19:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=938a0650aae6275ba8e924685836bdee2c6aa3db'/>
<id>urn:sha1:938a0650aae6275ba8e924685836bdee2c6aa3db</id>
<content type='text'>
When compute is malfunctioning or performance drops, the system admin
uses the SMI (System Management Interface) tool to monitor and diagnose
what went wrong. This patch provides an event watch interface for user
space to register devices and subscribe to the events it is interested
in. Once registered, the user can call poll() on the anonymous file
descriptor with a specified wait time to wait for events to happen.
Once an event happens, the user can use read() to retrieve information
related to the event.

This patch implements the VM fault event.
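
A minimal sketch of the user-space side of the poll()/read() flow,
assuming fd is the anonymous event file descriptor and that events were
already enabled through it; obtaining the fd and enabling events are not
shown. wait_for_event() is a hypothetical helper, and no particular
event message format is assumed.

```c
#include "poll.h"
#include "unistd.h"

/*
 * Hypothetical helper: waits up to timeout_ms for the event fd to become
 * readable, then reads one batch of event text into buf. Returns the
 * byte count from read(), 0 on timeout, or -1 on error.
 */
ssize_t wait_for_event(int fd, char *buf, size_t size, int timeout_ms)
{
	struct pollfd pfd[1];
	int ret;

	pfd[0].fd = fd;
	pfd[0].events = POLLIN;
	pfd[0].revents = 0;

	ret = poll(pfd, 1, timeout_ms);
	if (ret != 1)		/* 0: timed out, -1: poll() failed */
		return ret;
	return read(fd, buf, size);
}
```

Any readable fd can exercise the helper; for instance, a pipe with a
write on one end stands in for an event arriving.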

v2: - remove UNREGISTER and add event ENABLE/DISABLE
    - correct kfifo usage
    - move event message API to kfd_ioctl.h
v3: send the event msg in text rather than in binary
v4: support multiple clients
v5: move events enablement from ioctl to fd write
v6: sparse fix

Signed-off-by: Amber Lin &lt;Amber.Lin@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Add kfd2kgd_funcs for navy_flounder kfd support</title>
<updated>2020-07-15T16:46:59Z</updated>
<author>
<name>Chengming Gui</name>
<email>Jack.Gui@amd.com</email>
</author>
<published>2020-06-05T02:59:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=09759e13f4b97dde05f13520b47618bd37e415a5'/>
<id>urn:sha1:09759e13f4b97dde05f13520b47618bd37e415a5</id>
<content type='text'>
Add callbacks to KGD for navy flounder.

Signed-off-by: Chengming Gui &lt;Jack.Gui@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Support navy_flounder KFD</title>
<updated>2020-07-15T16:46:55Z</updated>
<author>
<name>Chengming Gui</name>
<email>Jack.Gui@amd.com</email>
</author>
<published>2020-06-02T08:15:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=de89b2e456f7f37ae4680c6a063eeb5bd09cf148'/>
<id>urn:sha1:de89b2e456f7f37ae4680c6a063eeb5bd09cf148</id>
<content type='text'>
Add KFD support for Navy Flounder.

Signed-off-by: Chengming Gui &lt;Jack.Gui@amd.com&gt;
Reviewed-by: Tao Zhou &lt;tao.zhou1@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: fix kernel-doc and cleanup</title>
<updated>2020-07-15T16:41:04Z</updated>
<author>
<name>Rajneesh Bhardwaj</name>
<email>rajneesh.bhardwaj@amd.com</email>
</author>
<published>2020-07-13T15:15:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a4497974ed339985fa89fabde9d5f6038bf1a59e'/>
<id>urn:sha1:a4497974ed339985fa89fabde9d5f6038bf1a59e</id>
<content type='text'>
 - fix some styling issues
 - fix kernel-doc formatting

Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Rajneesh Bhardwaj &lt;rajneesh.bhardwaj@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Remove redundant kfd2kgd interface lookup</title>
<updated>2020-07-08T13:02:54Z</updated>
<author>
<name>Felix Kuehling</name>
<email>Felix.Kuehling@amd.com</email>
</author>
<published>2020-07-01T02:28:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c12139118bb631d26d359f8e272cedeff0fb7516'/>
<id>urn:sha1:c12139118bb631d26d359f8e272cedeff0fb7516</id>
<content type='text'>
kfd_pasid.c isn't using the kfd2kgd interface any more. Remove redundant
code trying to look up a device for finding that interface.

Signed-off-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Reviewed-by: Kent Russell &lt;kent.russell@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Add Arcturus GWS support and fix VG10</title>
<updated>2020-07-02T16:02:56Z</updated>
<author>
<name>Joseph Greathouse</name>
<email>Joseph.Greathouse@amd.com</email>
</author>
<published>2020-06-29T23:05:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fea7d919158ab04ddefe2b1cf1b8e21418f4e569'/>
<id>urn:sha1:fea7d919158ab04ddefe2b1cf1b8e21418f4e569</id>
<content type='text'>
Add support for GWS in Arcturus, which needs MEC2 firmware #48
or above. Fix the MEC2 version check for Vega 10 GWS support,
since Vega 10 firmware adds 0x8000 to the actual firmware
revision. We were previously declaring support where it did not
exist.
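
A minimal sketch of the corrected comparison, assuming a simplified ASIC
enum; the enum, gws_supported() and the vega10_min_rev parameter are
hypothetical, since the actual Vega 10 minimum revision is not stated
here. The point is that the Vega 10 threshold must include the 0x8000
offset: comparing the raw version against an un-offset minimum accepts
every Vega 10 firmware, because each raw value is at least 0x8000.

```c
/* Hypothetical, simplified ASIC identifiers for this sketch only. */
enum asic { ASIC_VEGA10, ASIC_ARCTURUS, ASIC_OTHER };

#define VEGA10_FW_OFFSET 0x8000	/* Vega 10 reports revision + 0x8000 */
#define ARCTURUS_MIN_MEC2 48		/* MEC2 firmware #48 or above */

/*
 * Returns 1 if the MEC2 firmware version supports GWS. Other ASICs are
 * out of scope for this sketch and simply return 0.
 */
int gws_supported(enum asic a, unsigned int mec2_fw,
		  unsigned int vega10_min_rev)
{
	switch (a) {
	case ASIC_VEGA10:
		/* Compare against the offset threshold, not the bare revision. */
		return mec2_fw >= VEGA10_FW_OFFSET + vega10_min_rev;
	case ASIC_ARCTURUS:
		return mec2_fw >= ARCTURUS_MIN_MEC2;
	default:
		return 0;
	}
}
```

With a hypothetical minimum revision of 100, a raw Vega 10 value of
0x8063 (revision 99) is rejected, whereas the un-offset comparison would
have accepted it.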

Signed-off-by: Joseph Greathouse &lt;Joseph.Greathouse@amd.com&gt;
Reviewed-by: Kent Russell &lt;kent.russell@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Update hardware scheduling time quanta</title>
<updated>2020-07-02T16:02:55Z</updated>
<author>
<name>Joseph Greathouse</name>
<email>Joseph.Greathouse@amd.com</email>
</author>
<published>2020-06-29T21:23:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5d7c6f18d2a9262e6daf5dd8806961d32060c976'/>
<id>urn:sha1:5d7c6f18d2a9262e6daf5dd8806961d32060c976</id>
<content type='text'>
Update PROCESS_QUANTUM, the time the hardware scheduler allows
processes to run before switching to other processes when it becomes
over-subscribed. Increase this to 10ms, to allow processes to better
amortize their task switch times.

Update HQD Quantum, the amount of time that an active queue stays
attached to the CP before we forcibly switch it for another active
queue for fairness.

Setting these so that HQD &lt; PROCESS makes it easier to ensure that
we get fairness when we have multiple active queues on the device.
Otherwise we may start process-swapping before we get to all the
queues in a CP.
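
A back-of-the-envelope sketch of why the HQD quantum should be shorter
than the process quantum, under a simplified round-robin model (an
assumption, not the CP firmware's actual policy) in which each active
queue holds the CP for one HQD quantum. Only the 10 ms PROCESS_QUANTUM
comes from this commit; the HQD quantum is a hypothetical parameter and
both helper functions are illustrative.

```c
#define PROCESS_QUANTUM_US 10000	/* 10 ms, per this commit */

/* Number of HQD quanta that fit into one process quantum. */
unsigned int queue_turns_per_process_quantum(unsigned int hqd_quantum_us)
{
	if (hqd_quantum_us == 0)
		return 0;		/* avoid dividing by zero */
	return PROCESS_QUANTUM_US / hqd_quantum_us;
}

/*
 * Returns 1 if every active queue of a process gets at least one HQD
 * quantum before the scheduler may swap the process out.
 */
int all_queues_served(unsigned int hqd_quantum_us, unsigned int active_queues)
{
	return queue_turns_per_process_quantum(hqd_quantum_us) >= active_queues;
}
```

With a hypothetical 1 ms HQD quantum, ten queues can each get a turn
within one 10 ms process quantum; with an HQD quantum longer than 10 ms,
not even one queue completes its quantum before a process swap.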

Signed-off-by: Joseph Greathouse &lt;Joseph.Greathouse@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
