| Age | Commit message (Collapse) | Author | Lines |
|
The PF driver might be resumed just to configure VFs, but since
it is doing some asynchronous GuC reconfigurations after fresh
reset, we should wait until all pending works are completed.
This is especially important in case of LMEM provisioning, since
we also need to update the LMTT and send invalidation requests
to all GuCs, which are expected to be already in the VGT mode.
Fixes: 68ae022278a1 ("drm/xe/pf: Force GuC virtualization mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-3-michal.wajdeczko@intel.com
(cherry picked from commit c6c86441c465ea440dfb5039f1c26e629a6fd64c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
We can't let restart worker run once device is removed, since other
data that it might want to access could be already released.
Explicitly disable worker as part of device cleanup action.
Fixes: a4d1c5d0b99b ("drm/xe/pf: Move VFs reprovisioning to worker")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-2-michal.wajdeczko@intel.com
(cherry picked from commit a424353937c24554bb242a6582ed8f018b4a411c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Doing devcoredump initializing before GT though look harmless, it leads
to problem during driver unbind. Because of this order, GT/Engine
release functions will be called before xe devcoredump release function
(xe_driver_devcoredump_fini) leading to the following kernel crash[1]
because the devcoredump functions might still use GT/Engine
datastructures after those are freed.
The following crash is observed while running the IGT
xe_wedged@wedged-at-any-timeout. The test forces a wedged state by
submitting a workload which hangs. Then does a unbind/rebind of the
driver to recover from the wedged state.
The hanged workload leads to a devcoredump. The following crash is
noticed when the devcoredump capture races with the driver unbind.
During driver unbind, the release function hw_engine_fini() will be
called which assigns NULL to hwe->gt. But the same data structure is
accessed during the coredump capture in the function
xe_engine_snapshot_print by reading snapshot->hwe->gt.
With this patch, we make sure the devcoredump is stopped before
deinitializing the core driver functions.
[1]:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe]
RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe]
Call Trace:
<TASK>
? drm_printf+0x64/0x90
__xe_devcoredump_read+0x23f/0x2d0 [xe]
? __pfx___drm_printfn_coredump+0x10/0x10
? __pfx___drm_puts_coredump+0x10/0x10
xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe]
process_one_work+0x22e/0x6f0
worker_thread+0x1e8/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x11f/0x250
? __pfx_kthread+0x10/0x10
ret_from_fork+0x47/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
v2: Detailed commit description (Rodrigo)
v3: FIXME added (Rodrigo, Stuart)
Fixes: 4209d635a823 ("drm/xe: Remove devcoredump during driver release")
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 1fdc4c381ff765479d76ccf3134717c430c871b8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
We already claim official support for SR-IOV PF/VF modes on PTL
and BMG platforms, but by default we start the Xe driver on those
platforms in non-virtualized mode (native) since we still have
max_vfs modparam set to disable creation of the VFs.
It's time to let the Xe driver support SR-IOV PF mode by default.
We were already testing this on our CI, which was relying on the
patch that was enabling it for CONFIG_DRM_XE_DEBUG used by our CI.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250722182618.30811-3-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit a2b461bd6f3b36bded0a74178dec0e58e4714d3d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Extend WA 13012615864 to Graphics Versions 20.01,20.02,20.04
and 30.03.
Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20250731220143.72942-2-jonathan.cavitt@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Fix typo in comments:
souch -> such.
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250721152818.1891212-1-hugo@hugovil.com
|
|
Remove unnecessary semicolons reported by Coccinelle/coccicheck and the
semantic patch at scripts/coccinelle/misc/semicolon.cocci.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250729054214.2264377-1-nichen@iscas.ac.cn
|
|
Fix bug in nt35560_set_brightness() which causes the function to
erroneously report an error. mipi_dsi_dcs_write() returns either a
negative value when an error occurred or a positive number of bytes
written when no error occurred. The buggy code reports an error under
either condition.
Fixes: 8152c2bfd780 ("drm/panel: Add driver for Sony ACX424AKP panel")
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Brigham Campbell <me@brighamcampbell.com>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250731032343.1258366-2-me@brighamcampbell.com
|
|
HV101HD1-1E1 is a color active matrix TFT LCD module using amorphous
silicon TFT's (Thin Film Transistors) as an active switching devices. This
module has a 10.1 inch diagonally measured active area with HD resolutions
(1366 horizontal by 768 vertical pixel array).
Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250717135752.55958-3-clamor95@gmail.com
|
|
Use dev_err_probe() helper as directed by core driver model to handle
driver probe error. Use standard helper defined at drivers/base/core.c
to maintain code consistency.
Inspired by,
commit a787e5400a1ce ("driver core: add device probe log helper")
Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/aIJagJ/RnhSCtb2t@bhairav-test.ee.iitb.ac.in
|
|
There is a spelling mistake in the LEDS_BD2606MVV config. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250724112358.142351-1-colin.i.king@gmail.com
|
|
Samsung AMS561RA01 is an AMOLED panel, using the Samsung S6E8AA5X01 MIPI
DSI controller. Implement a basic panel driver for such panels.
The driver also initializes a backlight device, which works by changing
the panel's gamma values and aid brightness levels appropriately, with
the help of look-up tables acquired from downstream kernel sources.
Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250721-panel-samsung-s6e8aa5x01-v5-2-1a315aba530b@disroot.org
|
|
Add support for the Olimex LCD-OLinuXino-5CTS, a 5-inch TFT LCD panel
with a mode operating at 33.3 MHz.
Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250702082230.1291953-2-paulk@sys-base.io
|
|
Parallel exec queues have an additional command streamer buffer which holds
a GGTT reference to data within context status. The GGTT references have to
be fixed after VF migration.
v2: Properly handle nop entry, verify if parsing goes ok
v3: Improve error/warn logging, add propagation of errors,
give names to magic offsets
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-9-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
The WA buffer we use to capture context utilization contains GGTT
references. This means its instructions have to be either fixed or
re-emitted during VF post-migration recovery.
This patch adds re-emitting content of the utilization WA BB during
the recovery.
The way we write to vram requires scratch buffer to be used before
the whole block is memcopied. We are re-using a scratch buffer
introduced in earlier part of the recovery. This is not a performance
optimization, but a necessity to avoid creating dependencies between
locks.
v2: Notable rebase after "Prepare WA BB setup for more users" patch
v3: Added error propagation
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-8-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
The commands within ring area allocated for a request may contain
references to GGTT. These references require update after VF
migration, in order to continue any preempted LRCs, or jobs which
were emitted to the ring but not sent to GuC yet.
This change calls the emit function again for all such jobs,
as part of post-migration recovery.
v2: Moved few functions to better files
v3: Take job_list_lock
v4: Rephrased comments
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-7-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
All contexts require an update of state data, as the data includes
GGTT references to memirq-related buffers.
Default contexts need these references updated as well, because they
are not refreshed when a new context is created from them.
The way we write to vram requires scratch buffer to be used
before the whole block is memcopied. Since using kalloc() within
specific recovery functions would lead to unintended relations
between locks, we are allocating the buffer earlier, before
any locks are taken. The same buffer will be used for other steps
of the recovery.
v2: Update addresses by xe_lrc_write_ctx_reg() rather than
set_memory_based_intr()
v3: Renamed parameter, reordered parameters in some functs
v4: Check if have MEMIRQ, move `xe_gt*` funct to proper file
v5: Revert back to requiring scratch buffer, but allocate it
earlier this time
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-6-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
All contexts require an update due to GGTT range shift, as that
affects their HWSP.
The HW status page of a context contains GGTT references, which
need to be shifted to a new range (or re-computed using the
previously updated vma nodes). The references include ring start
address and indirect state address.
v2: move some functions to better matched files
v3: Add missing kerneldocs
v4: Style fix
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-5-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
Resetting GuC during recovery could interfere with the recovery
process. Such reset might be also triggered without justification,
due to migration taking time, rather than due to the workload not
progressing.
Doing GuC reset during the recovery would cause exit of RESFIX state,
and therefore continuation of GuC work while fixups are still being
applied. To avoid that, reset needs to be blocked during the recovery.
This patch blocks the reset during recovery. Reset request in that
time range will be stalled, and unblocked only after GuC goes out
of RESFIX state.
In case a reset procedure already started while the recovery is
triggered, there isn't much we can do - we cannot wait for it to
finish as it involves waiting for hardware, and we can't be sure
at which exact point of the reset procedure the GPU got switched.
Therefore, the rare cases where migration happens while reset is
in progress, are still dangerous. Resets are not a part of the
standard flow, and cause unfinished workloads - that will happen
during the reset interrupted by migration as well, so it doesn't
diverge that much from what normally happens during such resets.
v2: Introduce a new atomic for reset blocking, as we cannot reuse
`stopped` atomic (that could lead to losing a workload).
v3: Switched atomic functs to ones which include proper barriers
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-4-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
While applying post-migration fixups to VF, GuC will not respond
to any commands. This means submissions have no way of finishing.
To avoid acquiring additional resources and then stalling
on hardware access, pause the submission work. This will
decrease the chance of depleting resources, and speed up
the recovery.
v2: Commented xe_irq_resume() call
v3: Typo fix
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-3-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
Non-virtualized resources require fixups after SRIOV VF migration.
Caching GGTT references rather than re-computing them from the
underlying Buffer Object is something we want to avoid, as such
code would require additional fixup step and additional locking
around all the places where the address is accessed.
This change removes the cached GPU address from the Sub-Allocation
Manager, and introduces a function which recomputes and returns
the address instead.
v2: renamed xe_sa_manager_gpu_addr(), added kerneldoc
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-2-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
|
|
It has been observed that during `xe_display_pm_suspend()` execution,
an HPD interrupt can still be triggered, resulting in `dig_port_work`
being scheduled. The issue arises when this work executes after
`xe_display_pm_suspend_late()`, by which time the display is fully
suspended.
This can lead to errors such as "DC state mismatch", as the dig_port
work accesses display resources that are no longer available or
powered.
To address this, introduce 'intel_encoder_block_all_hpds' and
'intel_encoder_unblock_all_hpds' functions, which iterate over all
encoders and block/unblock HPD respectively.
These are used to:
- Block HPD IRQs before calling 'intel_hpd_cancel_work' in suspend
and shutdown
- Unblock HPD IRQs after 'intel_hpd_init' in resume
This will prevent 'dig_port_work' being scheduled during display
suspend.
Continuation of previous patch discussion:
https://patchwork.freedesktop.org/patch/663964/
Changes in v2:
- Add 'intel_encoder_block_all_hpds' to 'xe_display_pm_shutdown'.(Imre
Deak)
- Add 'intel_hpd_cancel_work' to 'xe_display_fini_early' to cancel
any HPD pending work at late driver removal. (Imre Deak)
Changes in v3:
- Move 'intel_encoder_block_all_hpds' after intel_dp_mst_suspend
in 'xe_display_pm_shutdown'.(Imre Deak)
Signed-off-by: Dibin Moolakadan Subrahmanian <dibin.moolakadan.subrahmanian@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://lore.kernel.org/r/20250724083928.2298199-1-dibin.moolakadan.subrahmanian@intel.com
|
|
This driver, for the time being, assumes that the kernel page size is 4kB,
so it fails on loong64 and aarch64 with 16kB pages, and ppc64el with 64kB
pages.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org # v6.8+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250802024152.3021-1-Simon.Richter@hogyros.de
|
|
Display has switched to its own workqueue, no longer using
xe->unordered_wq.
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250731111214.1130130-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Expose the places that need i915_utils.h, and include it where needed.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/6338c8524e600e048b56c5484624cfb51ed49d1d.1753965351.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
After refactors, a lot of platform macros have become unused. Remove
them before new users have a chance to pop up.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/4507b49ead12c997de4615fa6ec277e666e5226a.1753965351.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
drm_bridge_detect
This fix the make htmldocs warnings:
drivers/gpu/drm/drm_bridge.c:1242: warning: Function parameter or struct
member 'connector' not described in 'drm_bridge_detect'
Fixes: 5d156a9c3d5e ("drm/bridge: Pass down connector to drm bridge detect hook")
Signed-off-by: Andy Yan <andyshrk@163.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250716125602.3166573-1-andyshrk@163.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
|
|
Seems we missed one drm-misc-next pull-request in drm-misc-next-fixes.
This is required to pull in 5d156a9c3d5e ("drm/bridge: Pass down
connector to drm bridge detect hook") and update its docs.
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
Use recommended values as per wa_14021694213 to compare with the
calculated value and choose minimum of them.
v2: corrected checkpatch warning and retain the restriction for
min_hblank (Jani)
v3: use calculated value to compare with recomended value and choose
minimum of them (Imre)
v4: As driver supported min bpc is 8, omit the condition check for
bpc6 with ycbcr420. Added a note for the same (Imre)
v5: Add a warn for the unexpected case of 6bpc + uhbr + ycbcr420
v6: Reworded the comments and check fo anything < compressed bpp 8(Imre)
v7: Fix checkpatch warning. (Ankit)
Bspec: 74379
Signed-off-by: Arun R Murthy <arun.r.murthy@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Link: https://lore.kernel.org/r/20250730-min_hblank-v7-1-179360220ced@intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
us closer to being able to remove page->mapping
- "relayfs: misc changes" (Jason Xing) does some maintenance and
minor feature addition work in relayfs
- "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
us from static preallocation of the kdump crashkernel's working
memory over to dynamic allocation. So the difficulty of a-priori
estimation of the second kernel's needs is removed and the first
kernel obtains extra memory
- "generalize panic_print's dump function to be used by other
kernel parts" (Feng Tang) implements some consolidation and
rationalization of the various ways in which a failing kernel
splats information at the operator
* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
tools/getdelays: add backward compatibility for taskstats version
kho: add test for kexec handover
delaytop: enhance error logging and add PSI feature description
samples: Kconfig: fix spelling mistake "instancess" -> "instances"
fat: fix too many log in fat_chain_add()
scripts/spelling.txt: add notifer||notifier to spelling.txt
xen/xenbus: fix typo "notifer"
net: mvneta: fix typo "notifer"
drm/xe: fix typo "notifer"
cxl: mce: fix typo "notifer"
KVM: x86: fix typo "notifer"
MAINTAINERS: add maintainers for delaytop
ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
ucount: fix atomic_long_inc_below() argument type
kexec: enable CMA based contiguous allocation
stackdepot: make max number of pools boot-time configurable
lib/xxhash: remove unused functions
init/Kconfig: restore CONFIG_BROKEN help text
lib/raid6: update recov_rvv.c zero page usage
docs: update docs after introducing delaytop
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Enable a set of Clippy lints: 'ptr_as_ptr', 'ptr_cast_constness',
'as_ptr_cast_mut', 'as_underscore', 'cast_lossless' and
'ref_as_ptr'
These are intended to avoid type casts with the 'as' operator,
which are quite powerful, into restricted variants that are less
powerful and thus should help to avoid mistakes
- Remove the 'author' key now that most instances were moved to the
plural one in the previous cycle
'kernel' crate:
- New 'bug' module: add 'warn_on!' macro which reuses the existing
'BUG'/'WARN' infrastructure, i.e. it respects the usual sysctls and
kernel parameters:
warn_on!(value == 42);
To avoid duplicating the assembly code, the same strategy is
followed as for the static branch code in order to share the
assembly between both C and Rust
This required a few rearrangements on C arch headers -- the
existing C macros should still generate the same outputs, thus no
functional change expected there
- 'workqueue' module: add delayed work items, including a
'DelayedWork' struct, a 'impl_has_delayed_work!' macro and an
'enqueue_delayed' method, e.g.:
/// Enqueue the struct for execution on the system workqueue,
/// where its value will be printed 42 jiffies later.
fn print_later(value: Arc<MyStruct>) {
let _ = workqueue::system().enqueue_delayed(value, 42);
}
- New 'bits' module: add support for 'bit' and 'genmask' functions,
with runtime- and compile-time variants, e.g.:
static_assert!(0b00010000 == bit_u8(4));
static_assert!(0b00011110 == genmask_u8(1..=4));
assert!(checked_bit_u32(u32::BITS).is_none());
- 'uaccess' module: add 'UserSliceReader::strcpy_into_buf', which
reads NUL-terminated strings from userspace into a '&CStr'
Introduce 'UserPtr' newtype, similar in purpose to '__user' in C,
to minimize mistakes handling userspace pointers, including mixing
them up with integers and leaking them via the 'Debug' trait. Add
it to the prelude, too
- Start preparations for the replacement of our custom 'CStr' type
with the analogous type in the 'core' standard library. This will
take place across several cycles to make it easier. For this one,
it includes a new 'fmt' module, using upstream method names and
some other cleanups
Replace 'fmt!' with a re-export, which helps Clippy lint properly,
and clean up the found 'uninlined-format-args' instances
- 'dma' module:
- Clarify wording and be consistent in 'coherent' nomenclature
- Convert the 'read!()' and 'write!()' macros to return a 'Result'
- Add 'as_slice()', 'write()' methods in 'CoherentAllocation'
- Expose 'count()' and 'size()' in 'CoherentAllocation' and add
the corresponding type invariants
- Implement 'CoherentAllocation::dma_handle_with_offset()'
- 'time' module:
- Make 'Instant' generic over clock source. This allows the
compiler to assert that arithmetic expressions involving the
'Instant' use 'Instants' based on the same clock source
- Make 'HrTimer' generic over the timer mode. 'HrTimer' timers
take a 'Duration' or an 'Instant' when setting the expiry time,
depending on the timer mode. With this change, the compiler can
check the type matches the timer mode
- Add an abstraction for 'fsleep'. 'fsleep' is a flexible sleep
function that will select an appropriate sleep method depending
on the requested sleep time
- Avoid 64-bit divisions on 32-bit hardware when calculating
timestamps
- Seal the 'HrTimerMode' trait. This prevents users of the
'HrTimerMode' from implementing the trait on their own types
- Pass the correct timer mode ID to 'hrtimer_start_range_ns()'
- 'list' module: remove 'OFFSET' constants, allowing to remove
pointer arithmetic; now 'impl_list_item!' invokes
'impl_has_list_links!' or 'impl_has_list_links_self_ptr!'. Other
simplifications too
- 'types' module: remove 'ForeignOwnable::PointedTo' in favor of a
constant, which avoids exposing the type of the opaque pointer, and
require 'into_foreign' to return non-null
Remove the 'Either<L, R>' type as well. It is unused, and we want
to encourage the use of custom enums for concrete use cases
- 'sync' module: implement 'Borrow' and 'BorrowMut' for 'Arc' types
to allow them to be used in generic APIs
- 'alloc' module: implement 'Borrow' and 'BorrowMut' for 'Box<T, A>';
and 'Borrow', 'BorrowMut' and 'Default' for 'Vec<T, A>'
- 'Opaque' type: add 'cast_from' method to perform a restricted cast
that cannot change the inner type and use it in callers of
'container_of!'. Rename 'raw_get' to 'cast_into' to match it
- 'rbtree' module: add 'is_empty' method
- 'sync' module: new 'aref' submodule to hold 'AlwaysRefCounted' and
'ARef', which are moved from the too general 'types' module which
we want to reduce or eventually remove. Also fix a safety comment
in 'static_lock_class'
'pin-init' crate:
- Add 'impl<T, E> [Pin]Init<T, E> for Result<T, E>', so results are
now (pin-)initializers
- Add 'Zeroable::init_zeroed()' that delegates to 'init_zeroed()'
- New 'zeroed()', a safe version of 'mem::zeroed()' and also provide
it via 'Zeroable::zeroed()'
- Implement 'Zeroable' for 'Option<&T>', 'Option<&mut T>' and for
'Option<[unsafe] [extern "abi"] fn(...args...) -> ret>' for
'"Rust"' and '"C"' ABIs and up to 20 arguments
- Changed blanket impls of 'Init' and 'PinInit' from 'impl<T, E>
[Pin]Init<T, E> for T' to 'impl<T> [Pin]Init<T> for T'
- Renamed 'zeroed()' to 'init_zeroed()'
- Upstream dev news: improve CI more to deny warnings, use
'--all-targets'. Check the synchronization status of the two
'-next' branches in upstream and the kernel
MAINTAINERS:
- Add Vlastimil Babka, Liam R. Howlett, Uladzislau Rezki and Lorenzo
Stoakes as reviewers (thanks everyone)
And a few other cleanups and improvements"
* tag 'rust-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (76 commits)
rust: Add warn_on macro
arm64/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
riscv/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
x86/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust
rust: kernel: move ARef and AlwaysRefCounted to sync::aref
rust: sync: fix safety comment for `static_lock_class`
rust: types: remove `Either<L, R>`
rust: kernel: use `core::ffi::CStr` method names
rust: str: add `CStr` methods matching `core::ffi::CStr`
rust: str: remove unnecessary qualification
rust: use `kernel::{fmt,prelude::fmt!}`
rust: kernel: add `fmt` module
rust: kernel: remove `fmt!`, fix clippy::uninlined-format-args
scripts: rust: emit path candidates in panic message
scripts: rust: replace length checks with match
rust: list: remove nonexistent generic parameter in link
rust: bits: add support for bits/genmask macros
rust: list: remove OFFSET constants
rust: list: add `impl_list_item!` examples
rust: list: use fully qualified path
...
|
|
There is a spelling mistake of 'notifer' in the comment which
should be 'notifier'.
Link: https://lkml.kernel.org/r/94190C5F54A19F3E+20250722073431.21983-3-wangyuli@uniontech.com
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Make sure to drop the OF node reference taken when creating the aux
bridge device when the device is later released.
Fixes: 6914968a0b52 ("drm/bridge: properly refcount DT nodes in aux bridge drivers")
Cc: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250708085124.15445-2-johan@kernel.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
|
|
Pull virtio updates from Michael Tsirkin:
- vhost can now support legacy threading if enabled in Kconfig
- vsock memory allocation strategies for large buffers have been
improved, reducing pressure on kmalloc
- vhost now supports the in-order feature. guest bits missed the merge
window.
- fixes, cleanups all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (30 commits)
vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers
vsock/virtio: Rename virtio_vsock_skb_rx_put()
vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
vsock/virtio: Move SKB allocation lower-bound check to callers
vsock/virtio: Rename virtio_vsock_alloc_skb()
vsock/virtio: Resize receive buffers so that each SKB fits in a 4K page
vsock/virtio: Move length check to callers of virtio_vsock_skb_rx_put()
vsock/virtio: Validate length in packet header before skb_put()
vhost/vsock: Avoid allocating arbitrarily-sized SKBs
vhost_net: basic in_order support
vhost: basic in order support
vhost: fail early when __vhost_add_used() fails
vhost: Reintroduce kthread API and add mode selection
vdpa: Fix IDR memory leak in VDUSE module exit
vdpa/mlx5: Fix release of uninitialized resources on error path
vhost-scsi: Fix check for inline_sg_cnt exceeding preallocated limit
virtio: virtio_dma_buf: fix missing parameter documentation
vhost: Fix typos
vhost: vringh: Remove unused functions
vhost: vringh: Remove unused iotlb functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Allow built-in drivers, not just modular drivers, to use async
initial probing (Lukas Wunner)
- Support Immediate Readiness even on devices with no PM Capability
(Sean Christopherson)
- Consolidate definition of PCIE_RESET_CONFIG_WAIT_MS (100ms), the
required delay between a reset and sending config requests to a
device (Niklas Cassel)
- Add pci_is_display() to check for "Display" base class and use it
in ALSA hda, vfio, vga_switcheroo, vt-d (Mario Limonciello)
- Allow 'isolated PCI functions' (multi-function devices without a
function 0) for LoongArch, similar to s390 and jailhouse (Huacai
Chen)
Power control:
- Add ability to enable optional slot clock for cases where the PCIe
host controller and the slot are supplied by different clocks
(Marek Vasut)
PCIe native device hotplug:
- Fix runtime PM ref imbalance on Hot-Plug Capable ports caused by
misinterpreting a config read failure after a device has been
removed (Lukas Wunner)
- Avoid creating a useless PCIe port service device for pciehp if the
slot is handled by the ACPI hotplug driver (Lukas Wunner)
- Ignore ACPI hotplug slots when calculating depth of pciehp hotplug
ports (Lukas Wunner)
Virtualization:
- Save VF resizable BAR state and restore it after reset (Michał
Winiarski)
- Allow IOV resources (VF BARs) to be resized (Michał Winiarski)
- Add pci_iov_vf_bar_set_size() so drivers can control VF BAR size
(Michał Winiarski)
Endpoint framework:
- Add RC-to-EP doorbell support using platform MSI controller,
including a test case (Frank Li)
- Allow BAR assignment via configfs so platforms have flexibility in
determining BAR usage (Jerome Brunet)
Native PCIe controller drivers:
- Convert amazon,al-alpine-v[23]-pcie, apm,xgene-pcie,
axis,artpec6-pcie, marvell,armada-3700-pcie, st,spear1340-pcie to
DT schema format (Rob Herring)
- Use dev_fwnode() instead of of_fwnode_handle() to remove OF
dependency in altera (fixes an unused variable), designware-host,
mediatek, mediatek-gen3, mobiveil, plda, xilinx, xilinx-dma,
xilinx-nwl (Jiri Slaby, Arnd Bergmann)
- Convert aardvark, altera, brcmstb, designware-host, iproc,
mediatek, mediatek-gen3, mobiveil, plda, rcar-host, vmd, xilinx,
xilinx-dma, xilinx-nwl from using pci_msi_create_irq_domain() to
using msi_create_parent_irq_domain() instead; this makes the
interrupt controller per-PCI device, allows dynamic allocation of
vectors after initialization, and allows support of IMS (Nam Cao)
APM X-Gene PCIe controller driver:
- Rewrite MSI handling to MSI CPU affinity, drop useless CPU hotplug
bits, use device-managed memory allocations, and clean things up
(Marc Zyngier)
- Probe xgene-msi as a standard platform driver rather than a
subsys_initcall (Marc Zyngier)
Broadcom STB PCIe controller driver:
- Add optional DT 'num-lanes' property and if present, use it to
override the Maximum Link Width advertised in Link Capabilities
(Jim Quinlan)
Cadence PCIe controller driver:
- Use PCIe Message routing types from the PCI core rather than
defining private ones (Hans Zhang)
Freescale i.MX6 PCIe controller driver:
- Add IMX8MQ_EP third 64-bit BAR in epc_features (Richard Zhu)
- Add IMX8MM_EP and IMX8MP_EP fixed 256-byte BAR 4 in epc_features
(Richard Zhu)
- Configure LUT for MSI/IOMMU in Endpoint mode so Root Complex can
trigger doorbel on Endpoint (Frank Li)
- Remove apps_reset (LTSSM_EN) from
imx_pcie_{assert,deassert}_core_reset(), which fixes a hotplug
regression on i.MX8MM (Richard Zhu)
- Delay Endpoint link start until configfs 'start' written (Richard
Zhu)
Intel VMD host bridge driver:
- Add Intel Panther Lake (PTL)-H/P/U Vendor ID (George D Sworo)
Qualcomm PCIe controller driver:
- Add DT binding and driver support for SA8255p, which supports ECAM
for Configuration Space access (Mayank Rana)
- Update DT binding and driver to describe PHYs and per-Root Port
resets in a Root Port stanza and deprecate describing them in the
host bridge; this makes it possible to support multiple Root Ports
in the future (Krishna Chaitanya Chundru)
- Add Qualcomm QCS615 to SM8150 DT binding (Ziyue Zhang)
- Add Qualcomm QCS8300 to SA8775p DT binding (Ziyue Zhang)
- Drop TBU and ref clocks from Qualcomm SM8150 and SC8180x DT
bindings (Konrad Dybcio)
- Document 'link_down' reset in Qualcomm SA8775P DT binding (Ziyue
Zhang)
- Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
(Niklas Cassel)
Rockchip PCIe controller driver:
- Drop unused PCIe Message routing and code definitions (Hans Zhang)
- Remove several unused header includes (Hans Zhang)
- Use standard PCIe config register definitions instead of
rockchip-specific redefinitions (Geraldo Nascimento)
- Set Target Link Speed to 5.0 GT/s before retraining so we have a
chance to train at a higher speed (Geraldo Nascimento)
Rockchip DesignWare PCIe controller driver:
- Prevent race between link training and register update via DBI by
inhibiting link training after hot reset and link down (Wilfred
Mallawa)
- Add required PCIE_RESET_CONFIG_WAIT_MS delay after Link up IRQ
(Niklas Cassel)
Sophgo PCIe controller driver:
- Add DT binding and driver for Sophgo SG2044 PCIe controller driver
in Root Complex mode (Inochi Amaoto)
Synopsys DesignWare PCIe controller driver:
- Add required PCIE_RESET_CONFIG_WAIT_MS after waiting for Link up on
Ports that support > 5.0 GT/s. Slower Ports still rely on the
not-quite-correct PCIE_LINK_WAIT_SLEEP_MS 90ms default delay while
waiting for the Link (Niklas Cassel)"
* tag 'pci-v6.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits)
dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset
dt-bindings: PCI: Remove 83xx-512x-pci.txt
dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema
dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema
dt-bindings: PCI: Convert apm,xgene-pcie to DT schema
dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema
dt-bindings: PCI: Convert st,spear1340-pcie to DT schema
PCI: Move is_pciehp check out of pciehp_is_native()
PCI: pciehp: Use is_pciehp instead of is_hotplug_bridge
PCI/portdrv: Use is_pciehp instead of is_hotplug_bridge
PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
selftests: pci_endpoint: Add doorbell test case
misc: pci_endpoint_test: Add doorbell test case
PCI: endpoint: pci-epf-test: Add doorbell test support
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for inbound address alignment
PCI: endpoint: pci-ep-msi: Add checks for MSI parent and mutability
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: dwc: Add Sophgo SG2044 PCIe controller driver in Root Complex mode
PCI: vmd: Switch to msi_create_parent_irq_domain()
PCI: vmd: Convert to lock guards
...
|
|
There is an extraneous space before a newline in a drm_dbg_kms message.
Remove the space.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801164658.2438212-1-colin.i.king@gmail.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The PF's restart worker shouldn't attempt to resume the device on
its own, since its goal is to finish PF and VFs reprovisioning on
the recently reset GuC. Take extra RPM reference while scheduling
a work and release it from the worker or when we cancel a work.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-4-michal.wajdeczko@intel.com
|
|
The PF driver might be resumed just to configure VFs, but since
it is doing some asynchronous GuC reconfigurations after fresh
reset, we should wait until all pending works are completed.
This is especially important in case of LMEM provisioning, since
we also need to update the LMTT and send invalidation requests
to all GuCs, which are expected to be already in the VGT mode.
Fixes: 68ae022278a1 ("drm/xe/pf: Force GuC virtualization mode")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-3-michal.wajdeczko@intel.com
|
|
We can't let restart worker run once device is removed, since other
data that it might want to access could be already released.
Explicitly disable worker as part of device cleanup action.
Fixes: a4d1c5d0b99b ("drm/xe/pf: Move VFs reprovisioning to worker")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801142822.180530-2-michal.wajdeczko@intel.com
|
|
During VF unprovisioning, if VF was not provisioned with LMEM,
there is no need to trigger LMTT update, as VF LMTT was never
set. This will spare us sending full TLB invalidation requests.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250801144418.180584-1-michal.wajdeczko@intel.com
|
|
Avoid fd_install() until there are no more potential error paths, to
avoid put_unused_fd() after the fd is made visible to userspace.
Fixes: 68dc6c2d5eec ("drm/msm: Fix submit error-path leaks")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/665363/
|
|
submit_unpin_objects() should come before we unlock the objects. This
fixes the splat:
WARNING: CPU: 2 PID: 2171 at drivers/gpu/drm/msm/msm_gem.h:395 msm_gem_unpin_locked+0x8c/0xd8 [msm]
Modules linked in: uinput snd_seq_dummy snd_hrtimer aes_ce_ccm snd_soc_wsa884x regmap_sdw q6prm_clocks q6apm_lpass_dais q6apm_dai snd_q6dsp_common q6prm snd_q6apm qcom_pd_mapper cdc_mbim cdc_wdm cdc_ncm r8153_ecm cdc_ether usbnet sunrpc nls_ascii nls_cp437 vfat fat snd_soc_x1e80100 snd_soc_lpass_rx_macro snd_soc_lpass_tx_macro snd_soc_lpass_va_macro snd_soc_lpass_wsa_macro snd_soc_qcom_common soundwire_qcom snd_soc_lpass_macro_common snd_soc_hdmi_codec snd_soc_qcom_sdw ext4 snd_soc_core snd_compress soundwire_bus snd_pcm_dmaengine snd_seq mbcache jbd2 snd_seq_device snd_pcm pm8941_pwrkey snd_timer r8152 qcom_spmi_temp_alarm industrialio snd lenovo_yoga_slim7x ath12k mii arm_smccc_trng soundcore rng_core evdev loop panel_samsung_atna33xc20 msm ubwc_config drm_client_lib drm_gpuvm drm_exec gpu_sched drm_display_helper pmic_glink_altmode aux_hpd_bridge ucsi_glink qcom_battmgr phy_qcom_qmp_combo ps883x cec aux_bridge drm_dp_aux_bus i2c_hid_of aes_ce_blk drm_kms_helper aes_ce_cipher i2c_hid qcom_q6v5_pas
ghash_ce qcom_pil_info drm sha1_ce qcom_common phy_snps_eusb2 qcom_geni_serial qcom_q6v5 qcom_sysmon pinctrl_sm8550_lpass_lpi lpasscc_sc8280xp sbsa_gwdt mdt_loader gpio_keys pmic_glink i2c_dev efivarfs autofs4
CPU: 2 UID: 1000 PID: 2171 Comm: gnome-shell Not tainted 6.16.0-rc4-debug+ #25 PREEMPT(voluntary)
Hardware name: LENOVO 83ED/LNVNB161216, BIOS NHCN53WW 08/02/2024
pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : msm_gem_unpin_locked+0x8c/0xd8 [msm]
lr : msm_gem_unpin_locked+0x88/0xd8 [msm]
sp : ffff80009c963820
x29: ffff80009c963820 x28: ffff80009c9639f8 x27: ffff00080552a830
x26: 0000000000000000 x25: ffff0009d5655800 x24: 0000000000000000
x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000
x20: ffff000831db5480 x19: ffff000816e74400 x18: 0000000000000000
x17: 0000000000000000 x16: ffffc1396afdd720 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: ffff0008c065bc00
x11: ffff0008c065c000 x10: 0000000000000000 x9 : ffffc13945b19074
x8 : 0000000000000000 x7 : 0000000000000209 x6 : 0000000000000002
x5 : 0000000000019d01 x4 : ffff0008ba8db080 x3 : 000000000004093f
x2 : ffff3ed5e727f000 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
msm_gem_unpin_locked+0x8c/0xd8 [msm] (P)
msm_ioctl_gem_submit+0x32c/0x1760 [msm]
drm_ioctl_kernel+0xc8/0x138 [drm]
drm_ioctl+0x2c8/0x618 [drm]
__arm64_sys_ioctl+0xac/0x108
invoke_syscall.constprop.0+0x64/0xe8
el0_svc_common.constprop.0+0x40/0xe8
do_el0_svc+0x24/0x38
el0_svc+0x54/0x1d8
el0t_64_sync_handler+0x10c/0x138
el0t_64_sync+0x19c/0x1a0
irq event stamp: 2185036
hardirqs last enabled at (2185035): [<ffffc1396afeef9c>] _raw_spin_unlock_irqrestore+0x74/0x80
hardirqs last disabled at (2185036): [<ffffc1396afd8164>] el1_dbg+0x24/0x90
softirqs last enabled at (2184778): [<ffffc13969675e44>] fpsimd_restore_current_state+0x3c/0x328
softirqs last disabled at (2184776): [<ffffc13969675e14>] fpsimd_restore_current_state+0xc/0x328
---[ end trace 0000000000000000 ]---
Fixes: 111fdd2198e6 ("drm/msm: drm_gpuvm conversion")
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/665357/
|
|
If we hit an error path in GEM obj creation before msm_gem_new_handle()
updates obj->resv to point to the gpuvm resv object, then obj->resv
still points to &obj->_resv. In this case we don't want to decrement
the refcount of the object being freed (since the refcnt is already
zero). This fixes the following splat:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 9 PID: 7013 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148
Modules linked in: uinput snd_seq_dummy snd_hrtimer aes_ce_ccm snd_soc_wsa884x regmap_sdw q6prm_clocks q6apm_lpass_da>
qcom_pil_info i2c_hid drm_kms_helper qcom_common qcom_q6v5 phy_snps_eusb2 qcom_geni_serial drm qcom_sysmon pinctrl_s>
CPU: 9 UID: 1000 PID: 7013 Comm: deqp-vk Not tainted 6.16.0-rc4-debug+ #25 PREEMPT(voluntary)
Hardware name: LENOVO 83ED/LNVNB161216, BIOS NHCN53WW 08/02/2024
pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : refcount_warn_saturate+0xf4/0x148
lr : refcount_warn_saturate+0xf4/0x148
sp : ffff8000a2073920
x29: ffff8000a2073920 x28: 0000000000000010 x27: 0000000000000010
x26: 0000000000000042 x25: ffff000810e09800 x24: 0000000000000010
x23: ffff8000a2073b94 x22: ffff000ddb22de00 x21: ffff000ddb22dc00
x20: ffff000ddb22ddf8 x19: ffff0008024934e0 x18: 000000000000000a
x17: 0000000000000000 x16: ffff9f8c67d77340 x15: 0000000000000000
x14: 00000000ffffffff x13: 2e656572662d7265 x12: 7466612d65737520
x11: 3b776f6c66726564 x10: 00000000ffff7fff x9 : ffff9f8c67506c70
x8 : ffff9f8c69fa26f0 x7 : 00000000000bffe8 x6 : c0000000ffff7fff
x5 : ffff000f53e14548 x4 : ffff6082ea2b2000 x3 : ffff0008b86ab080
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0008b86ab080
Call trace:
refcount_warn_saturate+0xf4/0x148 (P)
msm_gem_free_object+0x248/0x260 [msm]
drm_gem_object_free+0x24/0x40 [drm]
msm_gem_new+0x1c4/0x1e0 [msm]
msm_gem_new_handle+0x3c/0x1a0 [msm]
msm_ioctl_gem_new+0x38/0x70 [msm]
drm_ioctl_kernel+0xc8/0x138 [drm]
drm_ioctl+0x2c8/0x618 [drm]
__arm64_sys_ioctl+0xac/0x108
invoke_syscall.constprop.0+0x64/0xe8
el0_svc_common.constprop.0+0x40/0xe8
do_el0_svc+0x24/0x38
el0_svc+0x54/0x1d8
el0t_64_sync_handler+0x10c/0x138
el0t_64_sync+0x19c/0x1a0
irq event stamp: 3698694
hardirqs last enabled at (3698693): [<ffff9f8c675021dc>] __up_console_sem+0x74/0x90
hardirqs last disabled at (3698694): [<ffff9f8c68ce8164>] el1_dbg+0x24/0x90
softirqs last enabled at (3697578): [<ffff9f8c6744ec5c>] handle_softirqs+0x454/0x4b0
softirqs last disabled at (3697567): [<ffff9f8c67360244>] __do_softirq+0x1c/0x28
---[ end trace 0000000000000000 ]---
Fixes: b58e12a66e47 ("drm/msm: Add _NO_SHARE flag")
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/665355/
|
|
The global fault counter is no longer used since commit 12578c075f89
("drm/msm/gpu: Skip retired submits in recover worker"). However, it's
still needed, as we need to handle cases where a GPU fault occurs after
the faulting process has already ended.
Hence, increment the global fault counter when the submitting process
had already ended. This way, the number of faults returned by
MSM_PARAM_FAULTS will stay consistent.
While here, s/unusuable/unusable.
Fixes: 12578c075f89 ("drm/msm/gpu: Skip retired submits in recover worker")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Patchwork: https://patchwork.freedesktop.org/patch/664853/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
An atomic counter is not sufficient, as one task could still be in the
process of tearing things down while another task increments the counter
back up to one and begins setup again. The race condition existed since
commit b145c6e65eb0 ("drm/msm: Add support to create a local pagetable")
but got bigger in commit dbbde63c9e9d ("drm/msm: Add PRR support").
Fixes: dbbde63c9e9d ("drm/msm: Add PRR support")
Fixes: b145c6e65eb0 ("drm/msm: Add support to create a local pagetable")
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/664433/
|
|
When commit 98290b0a7d60 ("drm/msm: make it possible to disable
KMS-related code.") was rebased on top of commit 3bebfd53af0f ("drm/msm:
Defer VMA unmap for fb unpins"), the additional use of msm_kms was
overlooked, resulting in a build break when KMS is disabled. Add some
additional ifdef to fix that.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 98290b0a7d60 ("drm/msm: make it possible to disable KMS-related code.")
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jessica Zhang <jessica.zhang@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/663240/
|
|
block
Stephen Rothwell reports multiple indentation warnings when merging
drm-msm tree:
Documentation/gpu/drm-mm:506: ./drivers/gpu/drm/drm_gpuvm.c:2445: ERROR: Unexpected indentation. [docutils]
Documentation/gpu/drm-mm:506: ./drivers/gpu/drm/drm_gpuvm.c:2447: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
Documentation/gpu/drm-mm:506: ./drivers/gpu/drm/drm_gpuvm.c:2451: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Documentation/gpu/drm-mm:506: ./drivers/gpu/drm/drm_gpuvm.c:2452: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Documentation/gpu/drm-mm:506: ./drivers/gpu/drm/drm_gpuvm.c:2456: ERROR: Unexpected indentation. [docutils]
Documentation/gpu/drm-mm:506: ./drivers/gpu/drm/drm_gpuvm.c:2457: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Documentation/gpu/drm-mm:506: ./drivers/gpu/drm/drm_gpuvm.c:2458: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Documentation/gpu/drm-mm:506: ./drivers/gpu/drm/drm_gpuvm.c:2459: WARNING: Definition list ends without a blank line; unexpected unindent. [docutils]
Fix these by wrapping drm_gpuvm_sm_map_exec_lock() expected usage
example in literal code block.
Fixes: 471920ce25d5 ("drm/gpuvm: Add locking helpers")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250708192038.6b0fd31d@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/663121/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
Doing devcoredump initializing before GT though look harmless, it leads
to problem during driver unbind. Because of this order, GT/Engine
release functions will be called before xe devcoredump release function
(xe_driver_devcoredump_fini) leading to the following kernel crash[1]
because the devcoredump functions might still use GT/Engine
datastructures after those are freed.
The following crash is observed while running the IGT
xe_wedged@wedged-at-any-timeout. The test forces a wedged state by
submitting a workload which hangs. Then does a unbind/rebind of the
driver to recover from the wedged state.
The hanged workload leads to a devcoredump. The following crash is
noticed when the devcoredump capture races with the driver unbind.
During driver unbind, the release function hw_engine_fini() will be
called which assigns NULL to hwe->gt. But the same data structure is
accessed during the coredump capture in the function
xe_engine_snapshot_print by reading snapshot->hwe->gt.
With this patch, we make sure the devcoredump is stopped before
deinitializing the core driver functions.
[1]:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe]
RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe]
Call Trace:
<TASK>
? drm_printf+0x64/0x90
__xe_devcoredump_read+0x23f/0x2d0 [xe]
? __pfx___drm_printfn_coredump+0x10/0x10
? __pfx___drm_puts_coredump+0x10/0x10
xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe]
process_one_work+0x22e/0x6f0
worker_thread+0x1e8/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x11f/0x250
? __pfx_kthread+0x10/0x10
ret_from_fork+0x47/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
v2: Detailed commit description (Rodrigo)
v3: FIXME added (Rodrigo, Stuart)
Fixes: 4209d635a823 ("drm/xe: Remove devcoredump during driver release")
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The xe_migrate_alloc() function returns NULL on error. It doesn't return
error pointers. Update the checking to match.
Fixes: a843b9894705 ("drm/xe/vf: Fix VM crash during VF driver release")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/aIzB8-Y6wtZvfNQT@stanley.mountain
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Calling drm_dev_unplug() is the drm way to say the device
is gone and can not be accessed any more.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250507082821.2710706-1-kraxel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|