path: root/drivers/accel/amdxdna
Age | Commit message | Author | Lines
2026-03-27 | Merge tag 'drm-misc-next-2026-03-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next | Dave Airlie | -275/+396
drm-misc-next for v7.1:

UAPI Changes:

amdxdna:
- support per-BO memory-usage queries

docs:
- Improve UAPI documentation

panthor:
- extend timestamp query with flags

Core Changes:

edid:
- provide enum drm_output_color_format; mass-convert drivers

gem-dma:
- use drm_dev_dma_dev() for DMA mappings
- set VM_DONTDUMP on mmap

mipi-dbi:
- drop simple-display; mass-convert drivers

prime:
- use drm_dev_dma_dev() for DMA mappings

ttm:
- improve handling of gfp_retry_mayfail

Driver Changes:

amdgpu:
- use atomic_create_state for private_obj

amdxdna:
- refactor GEM implementation
- fixes

bridge:
- provide clear-and-put helper for reliable cleanup
- analogix_dp: Use DP helpers for link training
- lontium-lt8713sx: Fix 64-bit division and Kconfig
- samsung-dsim: Use clear-and-put

imagination:
- improve power-off sequence
- support context-reset notification from firmware

komeda:
- support Arm China Linlon D6 plus DT bindings

mediatek:
- use drm_dev_dma_dev() for DMA mappings

panel:
- support Himax HX83121A plus DT bindings
- support JuTouch JT070TM041 plus DT bindings
- support Samsung S6E8FC0 plus DT bindings
- himax-hx83102c: support Samsung S6E8FC0 plus DT bindings; support backlight
- ili9806e: support Rocktech RK050HR345-CT106A plus DT bindings
- simple: support Tianma TM050RDH03 plus DT bindings

panthor:
- support various sources for timestamp queries
- fixes

omapdrm:
- use atomic_create_state for private_obj

rcar-du:
- fix suspend/resume wrt VSP interface
- fix leak of device_link
- clean up

sun4i:
- use drm_dev_dma_dev() for DMA mappings

tegra:
- use atomic_create_state for private_obj

xe:
- send 'none' recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260326151812.GA76082@linux.fritz.box
2026-03-26 | BackMerge tag 'v7.0-rc4' into drm-next | Dave Airlie | -12/+12
Linux 7.0-rc4 Needed for rust tree. Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-03-25 | accel/amdxdna: Add per-process BO memory usage query support | Max Zhen | -13/+142
Add support for querying per-process buffer object (BO) memory usage through the amdxdna GET_ARRAY UAPI. Introduce a new query type, DRM_AMDXDNA_BO_USAGE, along with struct amdxdna_drm_bo_usage to report BO memory usage statistics, including heap, total, and internal usage. Track BO memory usage on a per-client basis by maintaining counters in GEM open/close and heap allocation/free paths. This ensures the reported statistics reflect the current memory footprint of each process. Wire the new query into the GET_ARRAY implementation to expose the usage information to userspace. Link: https://github.com/amd/xdna-driver/commit/0546f2aaadbdacf1c3556410ecd71622044cd916 Signed-off-by: Max Zhen <max.zhen@amd.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260324163159.2425461-1-lizhi.hou@amd.com
2026-03-23 | accel/amdxdna: Return ERR_PTR on dma_alloc_noncoherent failure | Wendy Liang | -1/+6
dma_alloc_noncoherent() returns NULL on failure, but callers of aie2_alloc_msg_buffer() check for IS_ERR(). Return ERR_PTR(-ENOMEM) instead of NULL to match the amdxdna_iommu_alloc() path and the caller's error checking convention. Fixes: ece3e8980907 ("accel/amdxdna: Allow forcing IOVA-based DMA via module parameter") Signed-off-by: Wendy Liang <wendy.liang@amd.com> Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260323173719.2311474-1-lizhi.hou@amd.com
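The mismatch described above can be illustrated with a minimal userspace sketch. The helpers `err_ptr()`, `is_err()`, and `ptr_err()` are stand-ins written here to mimic the kernel's `ERR_PTR()`, `IS_ERR()`, and `PTR_ERR()`; `alloc_msg_buffer()` and its `fail` parameter are hypothetical and are not the driver's `aie2_alloc_msg_buffer()`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
static inline void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline bool is_err(const void *p)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. NULL does not. */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/*
 * Hypothetical allocator. The underlying allocation reports failure as
 * NULL, so it is converted to an error pointer before returning, which
 * is what an is_err()-checking caller expects.
 */
static void *alloc_msg_buffer(size_t size, bool fail)
{
    void *p = fail ? NULL : malloc(size);

    if (!p)
        return err_ptr(-ENOMEM);
    return p;
}
```

Because `is_err(NULL)` is false, a caller that only checks for error pointers would treat a bare NULL return as a valid buffer, which is why the conversion matters.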
2026-03-23 | accel/amdxdna: fix missing newline in pr_err message | haoyu.lu | -1/+1
Add missing newline to pr_err message in amdxdna_mailbox.c. Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox") Signed-off-by: haoyu.lu <hechushiguitu666@gmail.com> Reviewed-by: Lizhi.hou <lizhi.hou@amd.com> Signed-off-by: Lizhi.hou <lizhi.hou@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260323034933.216-1-hechushiguitu666@gmail.com
2026-03-20 | accel/amdxdna: Refactor GEM BO handling and add helper APIs for address retrieval | Max Zhen | -274/+261
Refactor amdxdna GEM buffer object (BO) handling to simplify address management and unify BO type semantics.

Introduce helper APIs to retrieve commonly used BO addresses:
- User virtual address (UVA)
- Kernel virtual address (KVA)
- Device address (IOVA/PA)

These helpers centralize address lookup logic and avoid duplicating BO-specific handling across submission and execution paths. This also improves readability and reduces the risk of inconsistent address handling in future changes.

As part of the refactor:
- Rename SHMEM BO type to SHARE to better reflect its usage.
- Merge CMD BO handling into SHARE, removing special-case logic for command buffers.
- Consolidate BO type handling paths to reduce code duplication and simplify maintenance.

No functional change is intended. The refactor prepares the driver for future enhancements by providing a cleaner abstraction for BO address management.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Max Zhen <max.zhen@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260320210615.1973016-1-lizhi.hou@amd.com
2026-03-17 | accel/amdxdna: Support retrieving hardware context debug information | Lizhi Hou | -11/+213
The firmware implements the GET_APP_HEALTH command to collect debug information for a specific hardware context. When a command times out, the driver issues this command to collect the relevant debug information. User space tools can also retrieve this information through the hardware context query IOCTL. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260317044906.1513133-1-lizhi.hou@amd.com
2026-03-16 | accel/amdxdna: Add debug prints for command submission | Lizhi Hou | -5/+29
Add debug prints to help diagnose issues with incoming command submissions. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260316175642.1451749-1-lizhi.hou@amd.com
2026-03-12 | accel/amdxdna: Allow forcing IOVA-based DMA via module parameter | Lizhi Hou | -32/+325
The amdxdna driver normally performs DMA using userspace virtual address plus PASID. For debugging and validation purposes, add a module parameter, force_iova, to force DMA to go through IOMMU IOVA mapping. When force_iova=1 is set, the driver will allocate and map DMA buffers using IOVA. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260126193001.1400545-1-lizhi.hou@amd.com
2026-03-12 | Merge drm/drm-next into drm-misc-next | Maxime Ripard | -113/+167
Biju Das needs a patch for rz-du merged in 7.0-rc3 Signed-off-by: Maxime Ripard <mripard@kernel.org>
2026-03-11 | accel/amdxdna: Support sensors for column utilization | Mario Limonciello (AMD) | -5/+37
The AMD PMF driver provides realtime column utilization (npu_busy) metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor query to expose these metrics to userspace. Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum and update aie2_get_sensors() to return both the total power and up to 8 column utilization sensors if the user buffer permits. Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> [lizhi: support legacy tool which uses small buffer. checkpatch cleanup] Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260311171842.473453-1-lizhi.hou@amd.com
2026-03-11 | accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate | Lizhi Hou | -1/+51
The AMD PMF driver provides an interface to obtain realtime power estimates for the NPU. Expose this information to userspace through a new DRM_IOCTL_AMDXDNA_GET_INFO parameter, allowing applications to query the current NPU power level. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> (Update comment to indicate power and utilization) Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260228061109.361239-2-superm1@kernel.org
2026-03-11 | accel/amdxdna: Import AMD_PMF namespace | Mario Limonciello (AMD) | -0/+1
The amdxdna driver uses amd_pmf_get_npu_data() which is exported in the AMD_PMF namespace. Import the AMD_PMF namespace. Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260301005028.367618-1-superm1@kernel.org
2026-03-10 | accel/amdxdna: Fix runtime suspend deadlock when there is pending job | Lizhi Hou | -12/+12
The runtime suspend callback drains the running job workqueue before suspending the device. If a job is still executing and calls pm_runtime_resume_and_get(), it can deadlock with the runtime suspend path. Fix this by moving pm_runtime_resume_and_get() from the job execution routine to the job submission routine, ensuring the device is resumed before the job is queued and avoiding the deadlock during runtime suspend. Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260310180058.336348-1-lizhi.hou@amd.com
2026-03-05 | accel/amdxdna: Split mailbox channel create function | Lizhi Hou | -90/+112
The management channel used for firmware control command submission is currently created after the firmware is started. If channel creation fails (for example, due to memory allocation failure or workqueue creation interruption), the firmware remains in a pending state and is unable to receive any control commands.

To avoid leaving the firmware in this inconsistent state, split xdna_mailbox_create_channel() into two separate functions so that resource allocation can be completed before interacting with the hardware.

xdna_mailbox_alloc_channel()
    Allocates memory and initializes the workqueue. This can be called earlier, before interacting with the hardware.

xdna_mailbox_start_channel()
    Performs the hardware interaction required to start the channel.

Rename xdna_mailbox_destroy_channel() to xdna_mailbox_free_channel(). Ensure that xdna_mailbox_stop_channel() and xdna_mailbox_free_channel() properly unwind the corresponding start and allocation steps, respectively.

Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260305062041.3954024-1-lizhi.hou@amd.com
2026-03-04 | accel/amdxdna: Fix major version check on NPU1 platform | Lizhi Hou | -1/+1
Add the missing major number in npu1_fw_feature_table. Without the major version specified, the firmware feature check fails, preventing new firmware commands from being enabled on the NPU1 platform. With the correct major version populated, the driver properly detects firmware support and enables the new command. Fixes: f1eac46fe5f7 ("accel/amdxdna: Update firmware version check for latest firmware") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260304195012.3616908-1-lizhi.hou@amd.com
2026-03-02 | accel/amdxdna: Fix NULL pointer dereference of mgmt_chann | Lizhi Hou | -10/+19
mgmt_chann may be set to NULL if the firmware returns an unexpected error in aie2_send_mgmt_msg_wait(). This can later lead to a NULL pointer dereference in aie2_hw_stop(). Fix this by introducing a dedicated helper to destroy mgmt_chann and by adding proper NULL checks before accessing it. Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260226213857.3068474-1-lizhi.hou@amd.com
2026-02-27 | accel/amdxdna: Fill invalid payload for failed command | Lizhi Hou | -15/+38
Newer userspace applications may read the payload of a failed command to obtain detailed error information. However, the driver and old firmware versions may not support returning advanced error information. In this case, initialize the command payload with an invalid value so userspace can detect that no detailed error information is available. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260227004841.3080241-1-lizhi.hou@amd.com
2026-02-25 | accel/amdxdna: Use a different name for latest firmware | Lizhi Hou | -5/+26
Using legacy driver with latest firmware causes a power off issue. Fix this by assigning a different filename (npu_7.sbin) to the latest firmware. The driver attempts to load the latest firmware first and falls back to the previous firmware version if loading fails. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5009 Fixes: f1eac46fe5f7 ("accel/amdxdna: Update firmware version check for latest firmware") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260225204752.2711734-1-lizhi.hou@amd.com
2026-02-23 | accel/amdxdna: Validate command buffer payload count | Lizhi Hou | -1/+4
The count field in the command header is used to determine the valid payload size. Verify that the valid payload does not exceed the remaining buffer space. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260219211946.1920485-1-lizhi.hou@amd.com
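A bounds check of this kind might look like the sketch below. The `struct cmd_hdr` layout, the assumption that `count` is measured in 32-bit words, and the name `cmd_payload_valid()` are all illustrative and are not taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command header: `count` payload words follow it. */
struct cmd_hdr {
    uint32_t count;
};

/*
 * Return true only if a payload of `count` 32-bit words fits in the
 * space left in the buffer after the header. Dividing the remaining
 * space (rather than multiplying count) avoids an overflow in the
 * check itself.
 */
static bool cmd_payload_valid(size_t buf_size, uint32_t count)
{
    size_t remaining;

    if (buf_size < sizeof(struct cmd_hdr))
        return false;

    remaining = buf_size - sizeof(struct cmd_hdr);
    return count <= remaining / sizeof(uint32_t);
}
```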
2026-02-23 | accel/amdxdna: Prevent ubuf size overflow | Lizhi Hou | -1/+5
The ubuf size calculation may overflow, resulting in an undersized allocation and possible memory corruption. Use check_add_overflow() helpers to validate the size calculation before allocation. Fixes: bd72d4acda10 ("accel/amdxdna: Support user space allocated buffer") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260217192815.1784689-1-lizhi.hou@amd.com
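As a rough illustration of the overflow check, the userspace sketch below sums segment lengths with the GCC/Clang builtin `__builtin_add_overflow()`, which is used here as a stand-in for the kernel's `check_add_overflow()`. The `struct seg` type and `total_size()` helper are hypothetical and do not reflect the driver's ubuf structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one user buffer segment. */
struct seg {
    uint64_t len;
};

/*
 * Add up all segment lengths. Returns false, leaving *total untouched,
 * if the running sum would wrap around, so the caller never allocates
 * from a truncated size.
 */
static bool total_size(const struct seg *segs, size_t n, uint64_t *total)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (__builtin_add_overflow(sum, segs[i].len, &sum))
            return false;
    }

    *total = sum;
    return true;
}
```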
2026-02-23 | accel/amdxdna: Fix out-of-bounds memset in command slot handling | Lizhi Hou | -4/+4
The remaining space in a command slot may be smaller than the size of the command header. Clearing the command header with memset() before verifying the available slot space can result in an out-of-bounds write and memory corruption. Fix this by moving the memset() call after the size validation. Fixes: 3d32eb7a5ecf ("accel/amdxdna: Fix cu_idx being cleared by memset() during command setup") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260217185415.1781908-1-lizhi.hou@amd.com
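The ordering issue can be sketched as follows: validate the available space first, and only then clear the header. The `struct slot_hdr` fields and the `slot_init_hdr()` helper are hypothetical names chosen for illustration, not the driver's actual slot layout.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command slot header. */
struct slot_hdr {
    unsigned int cu_idx;
    unsigned int arg_cnt;
};

/*
 * Zero the header at the start of a slot, but only after confirming
 * the slot is large enough to hold it. Calling memset() before the
 * size check would write past the end of a too-small slot.
 */
static int slot_init_hdr(void *slot, size_t slot_size)
{
    if (slot_size < sizeof(struct slot_hdr))
        return -ENOSPC;

    memset(slot, 0, sizeof(struct slot_hdr));
    return 0;
}
```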
2026-02-23 | accel/amdxdna: Fix command hang on suspended hardware context | Lizhi Hou | -7/+11
When a hardware context is suspended, the job scheduler is stopped. If a command is submitted while the context is suspended, the job is queued in the scheduler but aie2_sched_job_run() is never invoked to restart the hardware context. As a result, the command hangs. Fix this by modifying the hardware context suspend routine to keep the job scheduler running so that queued jobs can trigger context restart properly. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260211205341.722982-1-lizhi.hou@amd.com
2026-02-23 | accel/amdxdna: Fix suspend failure after enabling turbo mode | Lizhi Hou | -4/+5
Enabling turbo mode disables hardware clock gating. Suspend requires hardware clock gating to be re-enabled, otherwise suspend will fail. Fix this by calling aie2_runtime_cfg() from aie2_hw_stop() to re-enable clock gating during suspend. Also ensure that firmware is initialized in aie2_hw_start() before modifying clock-gating settings during resume. Fixes: f4d7b8a6bc8c ("accel/amdxdna: Enhance power management settings") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260211204716.722788-1-lizhi.hou@amd.com
2026-02-23 | accel/amdxdna: Fix dead lock for suspend and resume | Lizhi Hou | -19/+26
When an application issues a query IOCTL while auto suspend is running, a deadlock can occur. The query path holds dev_lock and then calls pm_runtime_resume_and_get(), which waits for the ongoing suspend to complete. Meanwhile, the suspend callback attempts to acquire dev_lock and blocks, resulting in a deadlock. Fix this by releasing dev_lock before calling pm_runtime_resume_and_get() and reacquiring it after the call completes. Also acquire dev_lock in the resume callback to keep the locking consistent. Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260211204644.722758-1-lizhi.hou@amd.com
2026-02-23 | accel/amdxdna: Reduce log noise during process termination | Mario Limonciello | -3/+7
During process termination, several error messages are logged that are not actual errors but expected conditions when a process is killed or interrupted. This creates unnecessary noise in the kernel log.

The specific scenarios are:

1. HMM invalidation returns -ERESTARTSYS when the wait is interrupted by a signal during process cleanup. This is expected when a process is being terminated and should not be logged as an error.

2. Context destruction returns -ENODEV when the firmware or device has already stopped, which commonly occurs during cleanup if the device was already torn down. This is also an expected condition during orderly shutdown.

Downgrade these expected error conditions from error level to debug level to reduce log noise while still keeping genuine errors visible.

Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup")
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260210164521.1094274-3-mario.limonciello@amd.com
2026-02-23 | accel/amdxdna: Fix crash when destroying a suspended hardware context | Lizhi Hou | -0/+3
If userspace issues an ioctl to destroy a hardware context that has already been automatically suspended, the driver may crash because the mailbox channel pointer is NULL for the suspended context. Fix this by checking the mailbox channel pointer in aie2_destroy_context() before accessing it. Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260206060306.4050531-1-lizhi.hou@amd.com
2026-02-23 | accel/amdxdna: Switch to always use chained command | Lizhi Hou | -2/+2
Preempt commands are only supported when submitted as chained commands. To ensure preempt support works consistently, always submit commands in chained command format. Set force_cmdlist to true so that single commands are filled using the chained command layout, enabling correct handling of preempt commands. Fixes: 3a0ff7b98af4 ("accel/amdxdna: Support preemption requests") Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260206060251.4050512-1-lizhi.hou@amd.com
2026-02-23 | accel/amdxdna: Remove buffer size check when creating command BO | Lizhi Hou | -19/+19
Large command buffers may be used, and they do not always need to be mapped or accessed by the driver. Performing a size check at command BO creation time unnecessarily rejects valid use cases. Remove the buffer size check from command BO creation, and defer vmap and size validation to the paths where the driver actually needs to map and access the command buffer. Fixes: ac49797c1815 ("accel/amdxdna: Add GEM buffer object management") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260206060237.4050492-1-lizhi.hou@amd.com
2026-02-21 | Convert more 'alloc_obj' cases to default GFP_KERNEL arguments | Linus Torvalds | -2/+1
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | Convert 'alloc_flex' family to use the new default GFP_KERNEL argument | Linus Torvalds | -3/+3
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | Convert 'alloc_obj' family to use the new default GFP_KERNEL argument | Linus Torvalds | -16/+16
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | treewide: Replace kmalloc with kmalloc_obj for non-scalar types | Kees Cook | -21/+21
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances:

Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)

Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning "TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-04 | accel/amdxdna: Move RPM resume into job run function | Lizhi Hou | -10/+9
Currently, amdxdna_pm_resume_get() is called during job creation, and amdxdna_pm_suspend_put() is called when the hardware notifies job completion. If a job is canceled before it is run, no hardware completion notification is generated, resulting in an unbalanced runtime PM resume/suspend pair. Fix this by moving amdxdna_pm_resume_get() to the job run path, ensuring runtime PM is only resumed for jobs that are actually executed. Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260204171118.3165607-1-lizhi.hou@amd.com
2026-02-04 | accel/amdxdna: Fix incorrect DPM level after suspend/resume | Lizhi Hou | -2/+3
The suspend routine sets the DPM level to 0, which unintentionally overwrites the previously saved DPM level. As a result, the device always resumes with DPM level 0 instead of restoring the original value. Fix this by ensuring the suspend path does not overwrite the saved DPM level, allowing the correct DPM level to be restored during resume. Fixes: f4d7b8a6bc8c ("accel/amdxdna: Enhance power management settings") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260204171048.3165580-1-lizhi.hou@amd.com
2026-02-03 | accel/amdxdna: Fix incorrect error code returned for failed chain command | Lizhi Hou | -1/+1
The driver currently returns an incorrect error code when a chain command fails. In this case, ERT_CMD_STATE_ERROR is expected to be reported for failed chain commands. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260203184037.2751889-1-lizhi.hou@amd.com
2026-02-03 | accel/amdxdna: Remove hardware context status | Lizhi Hou | -28/+5
One newly supported command does not require hardware context configuration to be performed upfront. As a result, checking hardware context status causes this command to fail incorrectly. Remove hardware context status handling entirely. For other commands, if userspace submits a request without configuring the hardware context first, the firmware will report an error or time out as appropriate. Fixes: aac243092b70 ("accel/amdxdna: Add command execution") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260202212450.2681273-1-lizhi.hou@amd.com
2026-01-30 | accel/amdxdna: Fix memory leak in amdxdna_ubuf_map | Zishun Yi | -2/+8
The amdxdna_ubuf_map() function allocates memory for sg and internal sg table structures, but it fails to free them if subsequent operations (sg_alloc_table_from_pages or dma_map_sgtable) fail. Fixes: bd72d4acda10 ("accel/amdxdna: Support user space allocated buffer") Signed-off-by: Zishun Yi <zishun.yi.dev@gmail.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Min Ma <mamin506@gmail.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260129171022.68578-1-zishun.yi.dev@gmail.com
2026-01-30 | accel/amdxdna: Stop job scheduling across aie2_release_resource() | Lizhi Hou | -0/+6
Running jobs on a hardware context while it is in the process of releasing resources can lead to use-after-free and crashes. Fix this by stopping job scheduling before calling aie2_release_resource() and restarting it after the release completes. Additionally, aie2_sched_job_run() now checks whether the hardware context is still active. Fixes: 4fd6ca90fc7f ("accel/amdxdna: Refactor hardware context destroy routine") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260130003255.2083255-1-lizhi.hou@amd.com
2026-01-30 | accel/amdxdna: Hold mm structure across iommu_sva_unbind_device() | Lizhi Hou | -0/+4
Some tests trigger a crash in iommu_sva_unbind_device() due to accessing iommu_mm after the associated mm structure has been freed. Fix this by taking an explicit reference to the mm structure after successfully binding the device, and releasing it only after the device is unbound. This ensures the mm remains valid for the entire SVA bind/unbind lifetime. Fixes: be462c97b7df ("accel/amdxdna: Add hardware context") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20260128002356.1858122-1-lizhi.hou@amd.com
2026-01-16 | Merge tag 'drm-misc-next-2026-01-15' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next | Dave Airlie | -58/+54
drm-misc-next for 6.20:

Core Changes:
- atomic: Introduce Gamma/Degamma LUT size check
- gem: Fix a leak in drm_gem_get_unmapped_area
- gpuvm: API sanitation for Rust bindings
- panic: Few corner-cases fixes

Driver Changes:
- Replace system workqueue with percpu equivalent
- amdxdna: Update message buffer allocation requirements, Update firmware version check
- imagination: Add AM62P support
- ivpu: Implement warm boot flow
- rockchip: Get rid of atomic_check fixups, Add Rockchip RK3506 Support
- rocket: Cleanups
- bridge:
  - dw-hdmi-qp: Add support for HPD-less setups
- panel:
  - mantix: Various power management related improvements
  - new panels: Innolux G150XGE-L05,
- dma-buf:
  - cma: Call clear_page instead of memset

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260115-lilac-dragon-of-opposition-ac0a30@houat
2026-01-14 | accel/amdxdna: Fix notifier_wq flushing warning | Lizhi Hou | -1/+1
Create notifier_wq with WQ_MEM_RECLAIM flag to fix the possible warning.

    workqueue: WQ_MEM_RECLAIM amdxdna_js:drm_sched_free_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM notifier_wq:0x0

Fixes: e486147c912f ("accel/amdxdna: Add BO import and export")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260113173624.256053-1-lizhi.hou@amd.com
2026-01-08 | accel/amdxdna: Update firmware version check for latest firmware | Lizhi Hou | -42/+20
The latest firmware increases the major version number. Update aie2_check_protocol() to accept and support the new firmware version. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251219014356.2234241-2-lizhi.hou@amd.com
2026-01-08 | accel/amdxdna: Update message DMA buffer allocation | Lizhi Hou | -15/+33
The latest firmware requires the message DMA buffer to
- have a minimum size of 8K
- use a power-of-two size
- be aligned to the buffer size
- not cross 64M boundary

Update the buffer allocation logic to meet these requirements and support the latest firmware.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251219014356.2234241-1-lizhi.hou@amd.com
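The four buffer requirements listed in this commit can be expressed as a small size-rounding helper plus a validity predicate. This is only a sketch under stated assumptions: `MSG_BUF_MIN`, `MSG_BUF_BOUND`, `msg_buf_size()`, and `msg_buf_ok()` are hypothetical names, and the driver's actual allocation logic may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSG_BUF_MIN   (8ULL * 1024)          /* minimum size: 8K */
#define MSG_BUF_BOUND (64ULL * 1024 * 1024)  /* must not cross 64M */

/*
 * Round a requested size up to a power of two of at least 8K.
 * Returns 0 for requests above 64M, since such a buffer could not
 * avoid crossing a 64M boundary.
 */
static uint64_t msg_buf_size(uint64_t req)
{
    uint64_t size = MSG_BUF_MIN;

    if (req > MSG_BUF_BOUND)
        return 0;
    while (size < req)
        size <<= 1;
    return size;
}

/* Check an (address, size) pair against all four requirements. */
static bool msg_buf_ok(uint64_t addr, uint64_t size)
{
    bool pow2 = size != 0 && (size & (size - 1)) == 0;

    if (!pow2 || size < MSG_BUF_MIN)
        return false;
    if (addr & (size - 1))                   /* aligned to its own size */
        return false;
    /* First and last byte must fall in the same 64M window. */
    return addr / MSG_BUF_BOUND == (addr + size - 1) / MSG_BUF_BOUND;
}
```

Note that a power-of-two buffer of at most 64M that is aligned to its own size can never straddle a 64M boundary, so in this sketch the last check only rejects oversized buffers.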
2025-12-26 | Merge tag 'drm-misc-next-2025-12-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next | Dave Airlie | -178/+101
drm-misc-next for 6.20:

Core Changes:
- dma-buf: Add tracepoints
- sched: Introduce new helpers

Driver Changes:
- amdxdna: Enable hardware context priority, Remove (obsolete and never public) NPU2 Support, Race condition fix
- rockchip: Add RK3368 HDMI Support
- rz-du: Add RZ/V2H(P) MIPI-DSI Support
- panels:
  - st7571: Introduce SPI support
  - New panels: Sitronix ST7920, Samsung LTL106HL02, LG LH546WF1-ED01, HannStar HSD156JUW2

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20251219-arcane-quaint-skunk-e383b0@houat
2025-12-26 | Merge tag 'drm-misc-next-2025-12-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next | Dave Airlie | -70/+64
drm-misc-next for 6.19:

UAPI Changes:
- panfrost: Add PANFROST_BO_SYNC ioctl
- panthor: Add PANTHOR_BO_SYNC ioctl

Core Changes:
- atomic: Add drm_device pointer to drm_private_obj
- bridge: Introduce drm_bridge_unplug, drm_bridge_enter, and drm_bridge_exit
- dma-buf: Improve sg_table debugging
- dma-fence: Add new helpers, and use them when needed
- dp_mst: Avoid out-of-bounds access with VCPI==0
- gem: Reduce page table overhead with transparent huge pages
- panic: Report invalid panic modes
- sched: Add TODO entries
- ttm: Various cleanups
- vblank: Various refactoring and cleanups
- Kconfig cleanups
- Removed support for kdb

Driver Changes:
- amdxdna: Fix race conditions at suspend, Improve handling of zero tail pointers, Fix cu_idx being overwritten during command setup
- ast: Support imported cursor buffers
- panthor: Enable timestamp propagation, Multiple improvements and fixes to improve the overall robustness, notably of the scheduler.
- panels:
  - panel-edp: Support for CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H

Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: fix mm conflict]
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20251212-spectacular-agama-of-abracadabra-aaef32@penduick
2025-12-18 | accel/amdxdna: Enable hardware context priority | Lizhi Hou | -1/+27
Newer firmware supports hardware context priority. Set the priority based on application input. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251217171719.2139025-1-lizhi.hou@amd.com
2025-12-18 | accel/amdxdna: Enable temporal sharing only mode | Lizhi Hou | -4/+21
Newer firmware versions prefer temporal sharing only mode. In this mode, the driver no longer needs to manage AIE array column allocation. Instead, a new field, num_unused_col, is added to the hardware context creation request to specify how many columns will not be used by this hardware context. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251217191150.2145937-1-lizhi.hou@amd.com
2025-12-18 | accel/amdxdna: Remove NPU2 support | Lizhi Hou | -120/+0
NPU2 hardware was never publicly released and is now obsolete. Remove all remaining NPU2 support from the driver. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251217190818.2145781-1-lizhi.hou@amd.com
2025-12-17 | accel/amdxdna: Remove amdxdna_flush() | Lizhi Hou | -18/+11
amdxdna_flush() was introduced to ensure that the device does not access a process address space after it has been freed. However, this is no longer necessary because the driver now increments the mm reference count when a command is submitted and decrements it only after the command has completed. This guarantees that the process address space remains valid for the entire duration of command execution. Remove amdxdna_flush to simplify the teardown path. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251216031311.2033399-1-lizhi.hou@amd.com