aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/stackcollapse.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-09-18drm/xe/configfs: Add post context restore bbLucas De Marchi1-2/+278
Allow the user to specify commands to execute during a context restore. Currently it's possible to parse 2 types of actions: - cmd: the instructions are added as is to the bb - reg: just use the address and value, without worrying about encoding the right LRI instruction. This is possibly the most useful use case, so added a dedicated action for that. This also prepares for future BBs: mid context restore and rc6 context restore that can re-use the same parsing functions. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-4-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/lrc: Allow to add user commands on context switchLucas De Marchi3-0/+51
During validation it's useful to allows additional commands to be executed on context switch. Fetch the commands from configfs (to be added) and add them to the WA BB. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-3-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/configfs: Allow to select by class onlyLucas De Marchi1-15/+35
For a future configfs attribute, it's desirable to select by engine mask only as the instance doesn't make sense. Rename the function lookup_engine_mask() to lookup_engine_info() and make it return the entry. This allows parse_engine() to still return an item if the caller wants to allow parsing a class-only string like "rcs", "bcs", "ccs", etc. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-2-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/configfs: Extract function to parse engineLucas De Marchi1-11/+21
Move the part that copies the engine to a local buffer so it can be shared in future for other configfs attributes parsing an engine. Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-1-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/xe_late_bind_fw: Extract and print version infoBadal Nilawar3-0/+193
Extract and print version info of the late binding binary. v2: Some refinements (Daniele) Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-10-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/xe_late_bind_fw: Introduce debug fs node to disable late bindingBadal Nilawar3-0/+46
Introduce a debug filesystem node to disable late binding fw reload during the system or runtime resume. This is intended for situations where the late binding fw needs to be loaded from user mode, perticularly for validation purpose. Note that xe kmd doesn't participate in late binding flow from user space. Binary loaded from the userspace will be lost upon entering to D3 cold hence user space app need to handle this situation. v2: - s/(uval == 1) ? true : false/!!uval/ (Daniele) v3: - Refine the commit message (Daniele) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-9-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/xe_late_bind_fw: Reload late binding fw during system resumeBadal Nilawar1-0/+4
Reload late binding fw during resume from system suspend v2: - Unconditionally reload late binding fw (Rodrigo) - Flush worker during system suspend Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-8-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/xe_late_bind_fw: Reload late binding fw in rpm resumeBadal Nilawar3-1/+6
Reload late binding fw during runtime resume. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-7-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/xe_late_bind_fw: Load late binding firmwareBadal Nilawar3-2/+165
Load late binding firmware v2: - s/EAGAIN/EBUSY/ - Flush worker in suspend and driver unload (Daniele) v3: - Use retry interval of 6s, in steps of 200ms, to allow other OS components release MEI CL handle (Sasha) v4: - return -ENODEV if component not added (Daniele) - parse and print status returned by csc v5: - Use payload to check firmware valid (Daniele) - Obtain the RPM reference before scheduling the worker to ensure the device remains awake until the worker completes firmware loading (Rodrigo) v6: - In case of error donot re-attempt fw download (Daniele) v7 (Rodrigo): - Rename of mei structs and callback. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-6-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/xe_late_bind_fw: Initialize late binding firmwareBadal Nilawar2-1/+129
Search for late binding firmware binaries and populate the meta data of firmware structures. v2 (Daniele): - drm_err if firmware size is more than max pay load size - s/request_firmware/firmware_request_nowarn/ as firmware will not be available for all possible cards v3 (Daniele): - init firmware from within xe_late_bind_init, propagate error - switch late_bind_fw to array to handle multiple firmware types v4 (Daniele): - Alloc payload dynamically, fix nits v6 (Daniele) - %s/MAX_PAYLOAD_SIZE/XE_LB_MAX_PAYLOAD_SIZE/ Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-5-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fwBadal Nilawar8-0/+147
Introduce xe_late_bind_fw to enable firmware loading for the devices, such as the fan controller, during the driver probe. Typically, firmware for such devices are part of IFWI flash image but can be replaced at probe after OEM tuning. This patch binds mei late binding component to enable firmware loading. v2: - Add devm_add_action_or_reset to remove the component (Daniele) - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele) v3: - Fail driver probe if late bind initialization fails, add has_late_bind flag (Daniele) v4: - %s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/ v6: - rebased v7: - rebased - In xe_late_bind_init, use drm_err when returning an error to stop the probe (Lucas) - Use imperative mode in commit message (Lucas) Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905154953.3974335-4-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18mei: late_bind: add late binding component driverAlexander Usyskin5-0/+397
Introduce a new MEI client driver to support Late Binding firmware upload/update for Intel discrete graphics platforms. Late Binding is a runtime firmware upload/update mechanism that allows payloads, such as fan control and voltage regulator, to be securely delivered and applied without requiring SPI flash updates or system reboots. This driver enables the Xe graphics driver and other user-space tools to push such firmware blobs to the authentication firmware via the MEI interface. The driver handles authentication, versioning, and communication with the authentication firmware, which in turn coordinates with the PUnit/PCODE to apply the payload. This is a foundational component for enabling dynamic, secure, and re-entrant configuration updates on platforms like Battlemage. Cc: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20250905154953.3974335-3-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18mei: bus: add mei_cldev_mtu interfaceAlexander Usyskin2-0/+14
Add a new helper function that allows MEI client drivers to query the maximum transmission unit (MTU) for a connected MEI client. This is useful for clients that need to transmit large payloads, such as firmware blobs, allowing them to determine the maximum message size that can be safely sent before starting transmission and size of the buffer to allocate when receiving data. Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20250905154953.3974335-2-badal.nilawar@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18drm/xe: Work around clang multiple goto-label errorThomas Hellström1-10/+19
When using drm_exec_retry_on_contention(), clang may consider all labels for which we take addresses in a function as potential retry goto targets, although strictly only one is possible. It will then in some situations generate false positive errors. In this case, the compiler, for some architectures, consider the might_lock(&m->job_mutex); as a potential goto target from drm_exec_retry_on_contention(), and errors. Work around that by moving the xe_validate / drm_exec transaction to a separate function. v2: - New commit message based on analysis of Nathan Chancellor Fixes: 59eabff2a352 ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction") Cc: Matthew Brost <matthew.brost@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509101853.nDmyxTEM-lkp@intel.com/ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> # build Link: https://lore.kernel.org/r/20250911080324.180307-1-thomas.hellstrom@linux.intel.com
2025-09-17drm/xe/sysfs: Simplify sysfs registrationMichal Wajdeczko1-63/+41
Instead of manually maintaining each sysfs file define and use attribute groups and register them using device managed function. Then use is_visible() to filter-out unsupported attributes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-3-michal.wajdeczko@intel.com
2025-09-17drm/xe/vf: Don't expose sysfs attributes not applicable for VFsMichal Wajdeczko1-1/+1
VFs can't read BMG_PCIE_CAP(0x138340) register nor access PCODE (already guarded by the info.skip_pcode flag) so we shouldn't expose attributes that require any of them to avoid errors like: [] xe 0000:03:00.1: [drm] Tile0: GT0: VF is trying to read an \ inaccessible register 0x138340+0x0 [] RIP: 0010:xe_gt_sriov_vf_read32+0x6c2/0x9a0 [xe] [] Call Trace: [] xe_mmio_read32+0x110/0x280 [xe] [] auto_link_downgrade_capable_show+0x2e/0x70 [xe] [] dev_attr_show+0x1a/0x70 [] sysfs_kf_seq_show+0xaa/0x120 [] kernfs_seq_show+0x41/0x60 Fixes: 0e414bf7ad01 ("drm/xe: Expose PCIe link downgrade attributes") Fixes: cdc36b66cd41 ("drm/xe: Expose fan control and voltage regulator version") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-2-michal.wajdeczko@intel.com
2025-09-17drm/xe/madvise: Fix ioctl argument checkShuicheng Lin1-4/+2
It is "preferred_mem_loc" instead of "atomic" for the ATTR_PREFERRED_LOC path. Also include 2 minor changes with no functional impact. 1. Remove the redundant "attr.atomic_access" assignment. 2. Replace down_read_interruptible() with xe_svm_notifier_lock_interruptible() to pair with xe_svm_notifier_unlock(). Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250911173139.1405878-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-09-17drm/xe: Misc refine for svmShuicheng Lin1-14/+11
These changes should have no functional impact. 1. Correct typo of "operation"in macro range_debug(). 2. Combine 2 spin_lock() call in xe_svm_garbage_collector() into 1. 3. Drop redundant preferred_region_is_vram check in xe_svm_range_needs_migrate_to_vram(). 4. Combine the devmem_possible check in xe_svm_handle_pagefault(). need_vram includes the IS_DGFX() check, so there is no change for .devmem_only. v2: revert !ctx.devmem_only change (Matt) v3: rebase code and refine commit message. v4: rebase code and refine commit message. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250911031405.1371812-2-shuicheng.lin@intel.com
2025-09-17drm/xe/tests: Add pre-GMDID IP descriptors to param generatorsMichal Wajdeczko1-0/+35
Recently introduced kunit parameter generators were based on the existing arrays which have only GDMID-based IPs and didn't take into account IP definitions from pre-GMDID era. Add test only arrays with pre-GMDID IPs (as those will not change) and extend param generators to start iterating over them. [ ] =================== xe_pci (2 subtests) ==================== [ ] ==================== check_graphics_ip ==================== [ ] [PASSED] 12.00 Xe_LP [ ] [PASSED] 12.10 Xe_LP+ [ ] [PASSED] 12.55 Xe_HPG [ ] [PASSED] 12.60 Xe_HPC [ ] [PASSED] 12.70 Xe_LPG [ ] [PASSED] 12.71 Xe_LPG [ ] [PASSED] 12.74 Xe_LPG+ [ ] [PASSED] 20.01 Xe2_HPG [ ] [PASSED] 20.02 Xe2_HPG [ ] [PASSED] 20.04 Xe2_LPG [ ] [PASSED] 30.00 Xe3_LPG [ ] [PASSED] 30.01 Xe3_LPG [ ] [PASSED] 30.03 Xe3_LPG [ ] ================ [PASSED] check_graphics_ip ================ [ ] ===================== check_media_ip ====================== [ ] [PASSED] 12.00 Xe_M [ ] [PASSED] 12.55 Xe_HPM [ ] [PASSED] 13.00 Xe_LPM+ [ ] [PASSED] 13.01 Xe2_HPM [ ] [PASSED] 20.00 Xe2_LPM [ ] [PASSED] 30.00 Xe3_LPM [ ] [PASSED] 30.02 Xe3_LPM [ ] ================= [PASSED] check_media_ip ================== [ ] ===================== [PASSED] xe_pci ====================== Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916171645.3335-1-michal.wajdeczko@intel.com
2025-09-16drm/xe: Allow error injection for xe_pxp_exec_queue_addDaniele Ceraolo Spurio1-0/+1
This will allow us to simulate this function returning an error like we do for other functions called in the exec_queue_create path. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-4-daniele.ceraolospurio@intel.com
2025-09-16drm/xe: Fix error handling if PXP fails to startDaniele Ceraolo Spurio6-41/+72
Since the PXP start comes after __xe_exec_queue_init() has completed, we need to cleanup what was done in that function in case of a PXP start error. __xe_exec_queue_init calls the submission backend init() function, so we need to introduce an opposite for that. Unfortunately, while we already have a fini() function pointer, it performs other operations in addition to cleaning up what was done by the init(). Therefore, for clarity, the existing fini() has been renamed to destroy(), while a new fini() has been added to only clean up what was done by the init(), with the latter being called by the former (via xe_exec_queue_fini). Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com
2025-09-16drm/xe: Remove duplicate header filesYang Li2-4/+2
Fix some duplicate includes in xe: ./drivers/gpu/drm/xe/xe_tlb_inval.c: xe_tlb_inval.h is included more than once. ./drivers/gpu/drm/xe/xe_pt.c: xe_tlb_inval_job.h is included more than once. While at it, also sort the include lines alphabetically. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24705 Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=24706 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> [Reword commit message] Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916021039.1632766-1-yang.lee@linux.alibaba.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-16drm/xe/guc: Return an error code if the GuC load failsJohn Harrison1-4/+9
Due to multiple explosion issues in the early days of the Xe driver, the GuC load was hacked to never return a failure. That prevented kernel panics and such initially, but now all it achieves is creating more confusing errors when the driver tries to submit commands to a GuC it already knows is not there. So fix that up. As a stop-gap and to help with debug of load failures due to invalid GuC init params, a wedge call had been added to the inner GuC load function. The reason being that it leaves the GuC log accessible via debugfs. However, for an end user, simply aborting the module load is much cleaner than wedging and trying to continue. The wedge blocks user submissions but it seems that various bits of the driver itself still try to submit to a dead GuC and lots of subsequent errors occur. And with regards to developers debugging why their particular code change is being rejected by the GuC, it is trivial to either add the wedge back in and hack the return code to zero again or to just do a GuC log dump to dmesg. v2: Add support for error injection testing and drop the now redundant wedge call. CC: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250909224132.536320-1-John.C.Harrison@Intel.com
2025-09-16drm/xe/sysfs: Add cleanup action in xe_device_sysfs_initZongyao Bai1-2/+6
On partial failure, some sysfs files created before the failure might not be removed. Add common cleanup step to remove them all immediately, as is should be harmless to attempt to remove non-existing files. Fixes: 0e414bf7ad01 ("drm/xe: Expose PCIe link downgrade attributes") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Zongyao Bai <zongyao.bai@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250915214716.1327379-2-zongyao.bai@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-15drm/xe/guc: Add test for G2G communicationsJohn Harrison8-0/+801
Add a test for sending messages from every GuC to every other GuC to test G2G communications. Note that, being a debug only feature, the test interface only exists in pre-production builds of the GuC firmware. v2: Fix 'default' case to actually use the driver's registration code as well as allocation. Add comments explaining the different test types. Fix (C) date and an assert. Review feedback from Daniele. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-5-John.C.Harrison@Intel.com
2025-09-15drm/xe: Allow freeing of a managed boJohn Harrison2-0/+6
If a bo is created via xe_managed_bo_create_pin_map() then it cannot be freed by the driver using xe_bo_unpin_map_no_vm(), or indeed any other existing function. The DRM layer will still have a pointer stashed away for later freeing, causing a invalid memory access on driver unload. So add a helper for releasing the DRM action as well. v2: Drop 'xe' parameter (review feedbak from Michal W) Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-4-John.C.Harrison@Intel.com
2025-09-15drm/xe/guc: Add firmware build type to available infoJohn Harrison2-0/+4
Some test features are not available in production builds of the GuC firmware. So add the build type field to the available information that tests can inspect to decide if they should skip or run. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-3-John.C.Harrison@Intel.com
2025-09-15drm/xe/guc: Update CSS header structuresJohn Harrison2-35/+53
Rework the CSS header structure according to recent updates to the GuC API spec. Also include more field definitions. v2: Also pass the new GuC specific structure to a GuC specific function instead of the higher level, generic structure (review feedback from Daniele). Also correct naming of CSS_TIME_* fields. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-2-John.C.Harrison@Intel.com
2025-09-15drm/xe: Use ERR_CAST instead of ERR_PTR(PTR_ERR(...))Fushuai Wang1-1/+1
Use ERR_CAST inline function instead of ERR_PTR(PTR_ERR(...)). Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250914101630.17719-1-wangfushuai@baidu.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-15drm/xe: Use ARRAY_SIZE in guc_waklv_init()Lucas De Marchi1-2/+2
Prefer using ARRAY_SIZE where needed and just passing 1 instead of calculating the size of one element. Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508130158.eogeBZQT-lkp@intel.com/ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250912-guc-ads-array-size-v1-1-a6555392a1f8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-15drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue()Dan Carpenter1-2/+2
The xe_preempt_fence_create() function returns error pointers. It never returns NULL. Update the error checking to match. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/aJTMBdX97cof_009@stanley.mountain Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-12drm/xe: defer free of NVM auxiliary container to device release callbackNitin Gote1-1/+4
Do not kfree the intel_dg_nvm_dev in xe_nvm_fini() right after auxiliary_device_delete/uninit. The auxiliary_device embeds the device/kobject (and its name); freeing it too early can race with asynchronous device_del/udev processing and cause a use-after-free. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Fixes: c28bfb107dac ("drm/xe/nvm: add on-die non-volatile memory device") Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250911052823.226696-1-nitin.r.gote@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12drm/xe/configfs: Fix documentation warningLucas De Marchi1-2/+2
Fix this warning while building the documentation: Documentation/gpu/xe/xe_configfs:9: drivers/gpu/drm/xe/xe_configfs.c:138: WARNING: Definition list ends without a blank line; unexpected unindent. That also makes it better formatted in the output. While at it, also fix the underline length in "Overview". Fixes: e2b33fce5eb0 ("drm/xe/configfs: Improve documentation steps") Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-2-c8f7e48f7eae@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12drm/xe: Update workaround documentationLucas De Marchi1-13/+32
Bring it up to reality, better documenting the existing batch buffers, OOB rules and fixing some typos. Bspec: 60122 Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250911-wa-bb-cmds-v4-1-c8f7e48f7eae@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12drm/xe/hwmon: Remove type castingMallesh Koujalagi1-16/+19
Refactor: eliminate type casts by using proper u32 declarations. v2: - Address review comments. (Karthik) v3: - Use the proper u32 type and drop cast. (Lucas De Marchi) - Modify variable when actually using u64 value. - Change r value to reg_value with u32 type. v4: - Remove newline between trailer and Signed-off-by. (Lucas De Marchi) - Change reg_val to val for more user-friendly logging. - Use mul_u32_u32 function since both values are u32. v5: - mul_u32_u32 function with shift. (Lucas De Marchi) Fixes: 7596d839f6228 ("drm/xe/hwmon: Add support to manage power limits though mailbox") Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250912113458.2815172-1-mallesh.koujalagi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12drm/xe/guc: Fix spelling mistake "sheduling" -> "scheduling"Colin Ian King1-1/+1
There is a spelling mistake in a xe_gt_err error message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250912074330.1275279-1-colin.i.king@gmail.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12drm/xe/xe3: Extend Wa_18041344222 to graphics IP versions 30.00 and 30.01Harish Chegondi1-0/+7
Apply WA 18041344222 to Xe3 LPG graphics IP versions 30.00 and 30.01 too. Bspec: 56024 Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/7368f8059013424ac94f4a01c23f9c98a37b06dc.1757552915.git.harish.chegondi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-12drm/xe: Fix circular locking dependencyRodrigo Vivi2-1/+3
Fix this: ====================================================== WARNING: possible circular locking dependency detected 6.17.0-rc4-lgci-xe-xe-pw-153723v2+ #1 Tainted: G S U ------------------------------------------------------ xe_pm/11324 is trying to acquire lock: ffff8881085f22a0 (&pc->freq_lock){+.+.}-{3:3}, at: xe_guc_pc_start+0x39f/0xf70 [xe] but task is already holding lock: ffffffffa1020420 (xe_rpm_nod3cold_map){+.+.}-{0:0}, at: xe_rpm_lockmap_acquire+0x1a/0x70 [xe] which lock already depends on the new lock. Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(xe_rpm_nod3cold_map); lock(&pc->freq_lock); lock(xe_rpm_nod3cold_map); lock(&pc->freq_lock); Reported-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6122 Fixes: 60d2b7899142 ("drm/xe/guc: Add SLPC power profile interface") Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250911212024.966757-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-12drm/xe: Use tile-oriented messages in GGTT codeMichal Wajdeczko1-6/+6
Use recently added macros to print tile-oriented messages. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-6-michal.wajdeczko@intel.com
2025-09-12drm/xe: Add dedicated printk macros for tile and deviceMichal Wajdeczko3-6/+261
We already have dedicated helper macros for printing GT-oriented messages but we don't have any to print messages that are tile oriented and we wrongly try to use plain drm or GT-oriented ones. Add tile-oriented printk messages and to provide similar coverage as we have with xe_assert() macros. Also add set of simple macros for the top level xe_device, which we could easily tweak to include extra device specific info if needed. Typical output of our printk macros will look like: [drm] this is xe_WARN() [drm] *ERROR* this is xe_err() [drm] *ERROR* this is xe_err_printer() [drm] this is xe_info() [drm] this is xe_info_printer() [drm:printk_demo.cold] this is xe_dbg() [drm:printk_demo.cold] this is xe_dbg_printer() [drm] Tile0: this is xe_tile_WARN() [drm] *ERROR* Tile0: this is xe_tile_err() [drm] *ERROR* Tile0: this is xe_tile_err_printer() [drm] Tile0: this is xe_tile_info() [drm] Tile0: this is xe_tile_info_printer() [drm:printk_demo.cold] Tile0: this is xe_tile_dbg() [drm:printk_demo.cold] Tile0: this is xe_tile_dbg_printer() [drm] Tile0: GT0: this is xe_gt_WARN() [drm] *ERROR* Tile0: GT0: this is xe_gt_err() [drm] *ERROR* Tile0: GT0: this is xe_gt_err_printer() [drm] Tile0: GT0: this is xe_gt_info() [drm] Tile0: GT0: this is xe_gt_info_printer() [drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg() [drm:printk_demo.cold] Tile0: GT0: this is xe_gt_dbg_printer() Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-5-michal.wajdeczko@intel.com
2025-09-12drm/xe: Prepare format for GT-oriented messages in one placeMichal Wajdeczko1-4/+9
To avoid code duplication (and thus potential mistakes) and to allow easier changes (if needed) of the prefix format of the GT-oriented messages, prepare that prefix in dedicated macro. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-4-michal.wajdeczko@intel.com
2025-09-12drm/xe: Drop "gt_" prefix from xe_gt_WARN() macrosMichal Wajdeczko1-2/+2
Those WARN messages will already include GT-specific "GT%u:" prefix so there is no point to include additional "gt_" prefix. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-3-michal.wajdeczko@intel.com
2025-09-12drm/xe: Keep xe_gt_err() macro definitions togetherMichal Wajdeczko1-5/+5
There is no need to keep them separated. No functional changes. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250909165941.31730-2-michal.wajdeczko@intel.com
2025-09-11drm/xe/guc: Set RCS/CCS yield policyDaniele Ceraolo Spurio6-5/+97
All recent platforms (including all the ones officially supported by the Xe driver) do not allow concurrent execution of RCS and CCS workloads from different address spaces, with the HW blocking the context switch when it detects such a scenario. The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a context it knows will not be able to execute. This, however, causes a new problem: if RCS and CCS queues have pending workloads from different address spaces, the GuC needs to choose from which of the 2 queues to pick the next workload to execute. By default, the GuC prioritizes RCS submissions over CCS ones, which can lead to CCS workloads being significantly (or completely) starved of execution time. The driver can tune this by setting a dedicated scheduling policy KLV; this KLV allows the driver to specify a quantum (in ms) and a ratio (percentage value between 0 and 100), and the GuC will prioritize the CCS for that percentage of each quantum. Given that we want to guarantee enough RCS throughput to avoid missing frames, we set the yield policy to 20% of each 80ms interval. v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable in gt_sanitize Fixes: d9a1ae0d17bd ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com
2025-09-11drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitationMichal Wajdeczko1-1/+0
This effectively reverts commit 4c3fe5eae46b ("drm/xe/pf: Limit fair VF LMEM provisioning") since we don't need it any more after non-contig VRAM allocations were fixed. This allows larger LMEM auto-provisioning for VFs, so instead: [ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M) [ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM or [ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M) [ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM we may get: [ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M) [ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM and [ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M) [ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM Fixes: 1e32ffbc9dc8 ("drm/xe/sriov: support non-contig VRAM provisioning") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com
2025-09-11drm/xe: Fix driver reference in FLR commentVarun Gupta1-1/+1
Rectify the reference of i915 to Xe in a comment. v2: Cosmetic changes. (Karthik) v3: Rephrased the commit message. (Karthik) Signed-off-by: Varun Gupta <varun.gupta@intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Link: https://lore.kernel.org/r/20250911111712.811524-1-varun.gupta@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-11drm/xe/guc: Add SLPC power profile interfaceVinay Belgaumkar5-0/+102
GuC has an interface to set a power profile for the SLPC algorithm. Base mode is default and ensures a balanced performance, power_saving mode has conservative up/down thresholds and is suitable for use with apps that typically need to be power efficient. This will result in lower GT frequencies, thus consuming lower power. Selected power profile will be displayed in this format: $ cat power_profile [base] power_saving $ echo power_saving > power_profile $ cat power_profile base [power_saving] v2: Address review comments (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250903232120.390190-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-11drm/xe: Fix uninitialized return valuesThomas Hellström2-2/+2
clang warned about two uninitialized variables used as return values in the exhaustive eviction series. Fix those. Fixes: 1f1541720f65 ("drm/xe: Rework instances of variants of xe_bo_create_locked()") Fixes: 7bcb6e38c14d ("drm/xe/display: Convert __xe_pin_fb_vma()") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250910151128.49693-1-thomas.hellstrom@linux.intel.com
2025-09-10drm/xe/tile: Release kobject for the failure pathShuicheng Lin1-5/+7
Call kobject_put() for the failure path to release the kobject v2: remove extra newline. (Matt) Fixes: e3d0839aa501 ("drm/xe/tile: Abort driver load for sysfs creation failure") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250819153950.2973344-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-10drm/xe: Convert pinned suspend eviction for exhaustive evictionThomas Hellström1-81/+103
Pinned suspend eviction and preparation for eviction validates system memory for eviction buffers. Do that under a validation exclusive lock to avoid interfering with other processes validating system graphics memory. v2: - Avoid gotos from within xe_validation_guard(). - Adapt to signature change of xe_validation_guard(). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-14-thomas.hellstrom@linux.intel.com