<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/gpu/drm/amd/amdkfd/kfd_queue.c, branch v6.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-04-07T19:18:59Z</updated>
<entry>
<title>drm/amdkfd: Drop workaround for GC v9.4.3 revID 0</title>
<updated>2025-04-07T19:18:59Z</updated>
<author>
<name>Apurv Mishra</name>
<email>Apurv.Mishra@amd.com</email>
</author>
<published>2025-03-17T18:00:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=daafa303d19f5522e4c24fbf5c1c981a16df2c2f'/>
<id>urn:sha1:daafa303d19f5522e4c24fbf5c1c981a16df2c2f</id>
<content type='text'>
Remove workaround code for the early engineering
samples GC v9.4.3 SOCs with revID 0

Reviewed-by: Amber Lin &lt;Amber.Lin@amd.com&gt;
Signed-off-by: Apurv Mishra &lt;Apurv.Mishra@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix NULL Pointer Dereference in KFD queue</title>
<updated>2025-03-05T15:45:35Z</updated>
<author>
<name>Andrew Martin</name>
<email>Andrew.Martin@amd.com</email>
</author>
<published>2025-02-28T16:26:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=049e5bf3c8406f87c3d8e1958e0a16804fa1d530'/>
<id>urn:sha1:049e5bf3c8406f87c3d8e1958e0a16804fa1d530</id>
<content type='text'>
Through KFD IOCTL Fuzzing we encountered a NULL pointer derefrence
when calling kfd_queue_acquire_buffers.

Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size")
Signed-off-by: Andrew Martin &lt;Andrew.Martin@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Signed-off-by: Andrew Martin &lt;Andrew.Martin@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix user queue validation on Gfx7/8</title>
<updated>2025-02-17T19:09:29Z</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2025-01-29T17:37:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e7a477735f1771b9a9346a5fbd09d7ff0641723a'/>
<id>urn:sha1:e7a477735f1771b9a9346a5fbd09d7ff0641723a</id>
<content type='text'>
To workaround queue full h/w issue on Gfx7/8, when application create
AQL queue, the ring buffer bo allocate size is queue_size/2 and
map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each
attachment map size is queue_size/2, with same ring_bo backing memory.

For Gfx7/8, user queue buffer validation should use queue_size/2 to
verify ring_bo allocation and mapping size.

Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers")
Suggested-by: Tomáš Trnka &lt;trnka@scm.com&gt;
Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: update the cwsr area size for gfx950</title>
<updated>2024-12-10T15:26:51Z</updated>
<author>
<name>Le Ma</name>
<email>le.ma@amd.com</email>
</author>
<published>2024-08-07T09:33:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5a7c8c579dd1d35dc385724fd34ffe94f90d872f'/>
<id>urn:sha1:5a7c8c579dd1d35dc385724fd34ffe94f90d872f</id>
<content type='text'>
Update cwsr area size for gfx950 to fit the new user queue buffer validation.
The size of LDS calculation is referred from gfx950 thunk implementation.

Signed-off-by: Le Ma &lt;le.ma@amd.com&gt;
Acked-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Handle queue destroy buffer access race</title>
<updated>2024-08-13T14:43:16Z</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2024-08-02T15:28:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a1fc9f584c4aaf8bc1ebfa459fc57a3f26a290d8'/>
<id>urn:sha1:a1fc9f584c4aaf8bc1ebfa459fc57a3f26a290d8</id>
<content type='text'>
Add helper function kfd_queue_unreference_buffers to reduce queue buffer
refcount, separate it from release queue buffers.

Because it is circular locking to hold dqm_lock to take vm lock,
kfd_ioctl_destroy_queue should take vm lock, unreference queue buffers
first, but not release queue buffers, to handle error in case failed to
hold vm lock. Then hold dqm_lock to remove queue from queue list and
then release queue buffers.

Restore process worker restore queue hold dqm_lock, will always find
the queue with valid queue buffers.

v2 (Felix):
- renamed kfd_queue_unreference_buffer(s) to kfd_queue_unref_bo_va(s)
- added two FIXME comments for follow up

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Signed-off-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix compile error if HMM support not enabled</title>
<updated>2024-08-06T14:40:30Z</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2024-07-25T23:10:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1cb62da0802c8f08e26443a5409edba99b8a1f6e'/>
<id>urn:sha1:1cb62da0802c8f08e26443a5409edba99b8a1f6e</id>
<content type='text'>
Fixes the below if kernel config not enable HMM support

&gt;&gt; drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:26: error:
implicit declaration of function 'svm_range_from_addr'

&gt;&gt; drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:24: error:
assignment to 'struct svm_range *' from 'int' makes pointer from integer
without a cast [-Wint-conversion]

&gt;&gt; drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:111:28: error:
invalid use of undefined type 'struct svm_range'

Fixes: b049504e211e ("drm/amdkfd: Validate user queue svm memory residency")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202407252127.zvnxaKRA-lkp@intel.com/
Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Fix missing error code in kfd_queue_acquire_buffers</title>
<updated>2024-07-27T21:29:37Z</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2024-07-26T06:47:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7c5b344537a143d15385992e41a50a9c5125e93c'/>
<id>urn:sha1:7c5b344537a143d15385992e41a50a9c5125e93c</id>
<content type='text'>
The fix involves setting 'err' to '-EINVAL' before each 'goto
out_err_unreserve'.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:265 kfd_queue_acquire_buffers()
warn: missing error code 'err'

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c
    226 int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_properties *properties)
    227 {
    228         struct kfd_topology_device *topo_dev;
    229         struct amdgpu_vm *vm;
    230         u32 total_cwsr_size;
    231         int err;
    232
    233         topo_dev = kfd_topology_device_by_id(pdd-&gt;dev-&gt;id);
    234         if (!topo_dev)
    235                 return -EINVAL;
    236
    237         vm = drm_priv_to_vm(pdd-&gt;drm_priv);
    238         err = amdgpu_bo_reserve(vm-&gt;root.bo, false);
    239         if (err)
    240                 return err;
    241
    242         err = kfd_queue_buffer_get(vm, properties-&gt;write_ptr, &amp;properties-&gt;wptr_bo, PAGE_SIZE);
    243         if (err)
    244                 goto out_err_unreserve;
    245
    246         err = kfd_queue_buffer_get(vm, properties-&gt;read_ptr, &amp;properties-&gt;rptr_bo, PAGE_SIZE);
    247         if (err)
    248                 goto out_err_unreserve;
    249
    250         err = kfd_queue_buffer_get(vm, (void *)properties-&gt;queue_address,
    251                                    &amp;properties-&gt;ring_bo, properties-&gt;queue_size);
    252         if (err)
    253                 goto out_err_unreserve;
    254
    255         /* only compute queue requires EOP buffer and CWSR area */
    256         if (properties-&gt;type != KFD_QUEUE_TYPE_COMPUTE)
    257                 goto out_unreserve;

This is clearly a success path.

    258
    259         /* EOP buffer is not required for all ASICs */
    260         if (properties-&gt;eop_ring_buffer_address) {
    261                 if (properties-&gt;eop_ring_buffer_size != topo_dev-&gt;node_props.eop_buffer_size) {
    262                         pr_debug("queue eop bo size 0x%lx not equal to node eop buf size 0x%x\n",
    263                                 properties-&gt;eop_buf_bo-&gt;tbo.base.size,
    264                                 topo_dev-&gt;node_props.eop_buffer_size);
--&gt; 265                         goto out_err_unreserve;

This has err in the label name.  err = -EINVAL?

    266                 }
    267                 err = kfd_queue_buffer_get(vm, (void *)properties-&gt;eop_ring_buffer_address,
    268                                            &amp;properties-&gt;eop_buf_bo,
    269                                            properties-&gt;eop_ring_buffer_size);
    270                 if (err)
    271                         goto out_err_unreserve;
    272         }
    273
    274         if (properties-&gt;ctl_stack_size != topo_dev-&gt;node_props.ctl_stack_size) {
    275                 pr_debug("queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
    276                         properties-&gt;ctl_stack_size,
    277                         topo_dev-&gt;node_props.ctl_stack_size);
    278                 goto out_err_unreserve;

err?

    279         }
    280
    281         if (properties-&gt;ctx_save_restore_area_size != topo_dev-&gt;node_props.cwsr_size) {
    282                 pr_debug("queue cwsr size 0x%x not equal to node cwsr size 0x%x\n",
    283                         properties-&gt;ctx_save_restore_area_size,
    284                         topo_dev-&gt;node_props.cwsr_size);
    285                 goto out_err_unreserve;

err?  Not sure.

    286         }
    287
    288         total_cwsr_size = (topo_dev-&gt;node_props.cwsr_size + topo_dev-&gt;node_props.debug_memory_size)
    289                           * NUM_XCC(pdd-&gt;dev-&gt;xcc_mask);
    290         total_cwsr_size = ALIGN(total_cwsr_size, PAGE_SIZE);
    291
    292         err = kfd_queue_buffer_get(vm, (void *)properties-&gt;ctx_save_restore_area_address,
    293                                    &amp;properties-&gt;cwsr_bo, total_cwsr_size);
    294         if (!err)
    295                 goto out_unreserve;
    296
    297         amdgpu_bo_unreserve(vm-&gt;root.bo);
    298
    299         err = kfd_queue_buffer_svm_get(pdd, properties-&gt;ctx_save_restore_area_address,
    300                                        total_cwsr_size);
    301         if (err)
    302                 goto out_err_release;
    303
    304         return 0;
    305
    306 out_unreserve:
    307         amdgpu_bo_unreserve(vm-&gt;root.bo);
    308         return 0;
    309
    310 out_err_unreserve:
    311         amdgpu_bo_unreserve(vm-&gt;root.bo);
    312 out_err_release:
    313         kfd_queue_release_buffers(pdd, properties);
    314         return err;
    315 }

Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size")
Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Cc: Philip Yang &lt;Philip.Yang@amd.com&gt;
Cc: Felix Kuehling &lt;Felix.Kuehling@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Reviewed-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Validate queue cwsr area and eop buffer size</title>
<updated>2024-07-24T18:46:06Z</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2024-06-26T19:03:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=629568d25fea8ece4f65073f039aeef4e240ab67'/>
<id>urn:sha1:629568d25fea8ece4f65073f039aeef4e240ab67</id>
<content type='text'>
When creating KFD user compute queue, check if queue eop buffer size,
cwsr area size, ctl stack size equal to the size of KFD node
properities.

Check the entire cwsr area which may split into multiple svm ranges
aligned to granularity boundary.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Store queue cwsr area size to node properties</title>
<updated>2024-07-24T18:45:58Z</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2024-06-26T18:52:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=517fff221c1e6b8a8db69e7a440116caee120ff5'/>
<id>urn:sha1:517fff221c1e6b8a8db69e7a440116caee120ff5</id>
<content type='text'>
Use the queue eop buffer size, cwsr area size, ctl stack size
calculation from Thunk, store the value to KFD node properties.

Those will be used to validate queue eop buffer size, cwsr area size,
ctl stack size when creating KFD user compute queue.

Those will be exposed to user space via sysfs KFD node properties, to
remove the duplicate calculation code from Thunk.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/amdkfd: Validate user queue svm memory residency</title>
<updated>2024-07-24T18:43:28Z</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2024-06-20T16:44:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b049504e211e8f4dbcd40434f2dcab2215ea1039'/>
<id>urn:sha1:b049504e211e8f4dbcd40434f2dcab2215ea1039</id>
<content type='text'>
Queue CWSR area maybe registered to GPU as svm memory, create queue to
ensure svm mapped to GPU with KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag.

Add queue_refcount to struct svm_range, to track queue CWSR area usage.

Because unmap mmu notifier callback return value is ignored, if
application unmap the CWSR area while queue is active, pr_warn message
in dmesg log. To be safe, evict user queue.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Reviewed-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
</feed>
