<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdkfd/kfd_debug.c, branch v6.18</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.18</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.18'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-03-11T16:36:38Z</updated>
<entry>
<title>drm/amdkfd: delete stray tab in kfd_dbg_set_mes_debug_mode()</title>
<updated>2025-03-11T16:36:38Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@linaro.org</email>
</author>
<published>2025-03-10T10:47:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5b1fa87f305050d17c553381c39ad8f1e17ce062'/>
<id>urn:sha1:5b1fa87f305050d17c553381c39ad8f1e17ce062</id>
<content type='text'>
These lines are indented one tab more than they should be.  Delete
the stray tabs.

Signed-off-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Have kfd driver use same PASID values from graphic driver</title>
<updated>2025-02-13T02:02:55Z</updated>
<author>
<name>Xiaogang Chen</name>
<email>xiaogang.chen@amd.com</email>
</author>
<published>2025-01-13T23:35:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8544374c0f82edb285779f21b149826fe2c2977c'/>
<id>urn:sha1:8544374c0f82edb285779f21b149826fe2c2977c</id>
<content type='text'>
Current kfd driver has its own PASID value for a kfd process and uses it to
locate vm at interrupt handler or mapping between kfd process and vm. That
design is not working when a physical gpu device has multiple spatial
partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid
values that graphic driver generated which is per vm per pasid.

These pasid values are passed to fw/hardware. We do not need change interrupt
handler though more pasid values are used. Also, pasid values at log are
replaced by user process pid; pasid values are not exposed to user. Users see
their process pids that have meaning in user space.

Signed-off-by: Xiaogang Chen &lt;xiaogang.chen@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: fixed page fault when enable MES shader debugger</title>
<updated>2025-01-06T20:13:37Z</updated>
<author>
<name>Jesse.zhang@amd.com</name>
<email>Jesse.zhang@amd.com</email>
</author>
<published>2024-12-18T10:23:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9738609449c3e44d1afb73eecab4763362b57930'/>
<id>urn:sha1:9738609449c3e44d1afb73eecab4763362b57930</id>
<content type='text'>
Initialize the process context address before setting the shader debugger.

[  260.781212] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
[  260.781236] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[  260.781255] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A40
[  260.781270] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CPC (0x5)
[  260.781284] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x0
[  260.781296] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[  260.781308] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x4
[  260.781320] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  260.781332] amdgpu 0000:03:00.0: amdgpu:      RW: 0x1
[  260.782017] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
[  260.782039] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[  260.782058] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A41
[  260.782073] amdgpu 0000:03:00.0: amdgpu:      Faulty UTCL2 client ID: CPC (0x5)
[  260.782087] amdgpu 0000:03:00.0: amdgpu:      MORE_FAULTS: 0x1
[  260.782098] amdgpu 0000:03:00.0: amdgpu:      WALKER_ERROR: 0x0
[  260.782110] amdgpu 0000:03:00.0: amdgpu:      PERMISSION_FAULTS: 0x4
[  260.782122] amdgpu 0000:03:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  260.782137] amdgpu 0000:03:00.0: amdgpu:      RW: 0x1
[  260.782155] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
[  260.782166] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10

Fixes: 438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3849
Signed-off-by: Jesse Zhang &lt;jesse.zhang@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
(cherry picked from commit 5b231f5bc9ff02ec5737f2ec95cdf15ac95088e9)
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>drm/amdkfd: fix debug watchpoints for logical devices</title>
<updated>2024-08-06T14:43:39Z</updated>
<author>
<name>Jonathan Kim</name>
<email>Jonathan.Kim@amd.com</email>
</author>
<published>2024-07-22T17:26:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b41a382932263b2951bc9e83a22168d579a94865'/>
<id>urn:sha1:b41a382932263b2951bc9e83a22168d579a94865</id>
<content type='text'>
The number of watchpoints should be set and constrained per logical
partition device, not by the socket device.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Reviewed-by: Harish Kasiviswanathan &lt;harish.kasiviswanathan@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: fix the kdf debugger issue</title>
<updated>2024-06-05T15:25:00Z</updated>
<author>
<name>Jesse Zhang</name>
<email>jesse.zhang@amd.com</email>
</author>
<published>2024-05-31T01:56:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8178cfb0b48b122dd72ba6ffc2251926f62a0002'/>
<id>urn:sha1:8178cfb0b48b122dd72ba6ffc2251926f62a0002</id>
<content type='text'>
The expression caps | HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED
and  caps | HSA_CAP_TRAP_DEBUG_PRECISE_ALU_OPERATIONS_SUPPORTED
are always 1/true regardless of the values of its operand.

Fixes: 9243240bed38 ("drm/amdkfd: enable single alu ops for gfx12")
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Suggested-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Check debug trap enable before write dbg_ev_file</title>
<updated>2024-05-08T19:17:05Z</updated>
<author>
<name>Lin.Cao</name>
<email>lincao12@amd.com</email>
</author>
<published>2024-04-24T03:27:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=547033b593063eb85bfdf9b25a5f1b8fd1911be2'/>
<id>urn:sha1:547033b593063eb85bfdf9b25a5f1b8fd1911be2</id>
<content type='text'>
In interrupt context, write dbg_ev_file will be run by work queue. It
will cause write dbg_ev_file execution after debug_trap_disable, which
will cause NULL pointer access.
v2: cancel work "debug_event_workarea" before set dbg_ev_file as NULL.

Signed-off-by: Lin.Cao &lt;lincao12@amd.com&gt;
Reviewed-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: enable single alu ops for gfx12</title>
<updated>2024-05-02T20:18:13Z</updated>
<author>
<name>Jonathan Kim</name>
<email>jonathan.kim@amd.com</email>
</author>
<published>2023-08-21T15:47:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9243240bed3859fba2d15c016902a4c73a186249'/>
<id>urn:sha1:9243240bed3859fba2d15c016902a4c73a186249</id>
<content type='text'>
GFX12 debugging requires setting up precise ALU operation for catching
ALU exceptions.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Tested-by: Lancelot Six &lt;lancelot.six@amd.com&gt;
Reviewed-by: Eric Huang &lt;jinhuieric.huang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix variable dereferenced before NULL check in 'kfd_dbg_trap_device_snapshot()'</title>
<updated>2024-01-15T23:35:37Z</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2024-01-05T05:57:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=81d4b970684554dbd8faff90fbcd600b86847a68'/>
<id>urn:sha1:81d4b970684554dbd8faff90fbcd600b86847a68</id>
<content type='text'>
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:1024 kfd_dbg_trap_device_snapshot() warn: variable dereferenced before check 'entry_size' (see line 1021)

Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: fix and enable ttmp setup for gfx11</title>
<updated>2023-07-27T18:59:29Z</updated>
<author>
<name>Jonathan Kim</name>
<email>jonathan.kim@amd.com</email>
</author>
<published>2023-06-12T15:31:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fc7f1d9697bcd3f7a4c64fb00a0e9bb050eaaa2a'/>
<id>urn:sha1:fc7f1d9697bcd3f7a4c64fb00a0e9bb050eaaa2a</id>
<content type='text'>
The MES cached process context must be cleared on adding any queue for
the first time.

For proper debug support, the MES will clear it's cached process context
on the first call to SET_SHADER_DEBUGGER.

This allows TTMPs to be pesistently enabled in a safe manner.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Reviewed-by: Eric Huang &lt;jinhuieric@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: enable cooperative groups for gfx11</title>
<updated>2023-07-25T17:35:43Z</updated>
<author>
<name>Jonathan Kim</name>
<email>jonathan.kim@amd.com</email>
</author>
<published>2023-07-12T20:58:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7a1c5c6753858cbbf0b073eaa9b53d8f56ee0927'/>
<id>urn:sha1:7a1c5c6753858cbbf0b073eaa9b53d8f56ee0927</id>
<content type='text'>
MES can concurrently schedule queues on the device that require
exclusive device access if marked exclusively_scheduled without the
requirement of GWS.  Similar to the F32 HWS, MES will manage
quality of service for these queues.
Use this for cooperative groups since cooperative groups are device
occupancy limited.

Since some GFX11 devices can only be debugged with partial CUs, do not
allow the debugging of cooperative groups on these devices as the CU
occupancy limit will change on attach.

In addition, zero initialize the MES add queue submission vector for MES
initialization tests as we do not want these to be cooperative
dispatches.

Signed-off-by: Jonathan Kim &lt;jonathan.kim@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
