summaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/stackcollapse.py
diff options
context:
space:
mode:
authorCosmin Ratiu <cratiu@nvidia.com>2026-03-05 10:10:19 +0200
committerJakub Kicinski <kuba@kernel.org>2026-03-06 17:24:28 -0800
commitaed763abf0e905b4b8d747d1ba9e172961572f57 (patch)
treea679848d9f0e9ea5e1a4e5f3fe166fc3610137cf /tools/perf/scripts/python/stackcollapse.py
parent55f854dd5bdd8e19b936a00ef1f8d776ac32c7b0 (diff)
downloadlinux-aed763abf0e905b4b8d747d1ba9e172961572f57.tar.gz
linux-aed763abf0e905b4b8d747d1ba9e172961572f57.zip
net/mlx5: Fix deadlock between devlink lock and esw->wq
esw->work_queue executes esw_functions_changed_event_handler -> esw_vfs_changed_event_handler and acquires the devlink lock. .eswitch_mode_set (acquires devlink lock in devlink_nl_pre_doit) -> mlx5_devlink_eswitch_mode_set -> mlx5_eswitch_disable_locked -> mlx5_eswitch_event_handler_unregister -> flush_workqueue deadlocks when esw_vfs_changed_event_handler executes. Fix that by no longer flushing the work to avoid the deadlock, and using a generation counter to keep track of work relevance. This avoids an old handler manipulating an esw that has undergone one or more mode changes: - the counter is incremented in mlx5_eswitch_event_handler_unregister. - the counter is read and passed to the ephemeral mlx5_host_work struct. - the work handler takes the devlink lock and bails out if the current generation is different than the one it was scheduled to operate on. - mlx5_eswitch_cleanup does the final draining before destroying the wq. No longer flushing the workqueue has the side effect of maybe no longer cancelling pending vport_change_handler work items, but that's ok since those are disabled elsewhere: - mlx5_eswitch_disable_locked disables the vport eq notifier. - mlx5_esw_vport_disable disarms the HW EQ notification and marks vport->enabled under state_lock to false to prevent pending vport handler from doing anything. - mlx5_eswitch_cleanup destroys the workqueue and makes sure all events are disabled/finished. Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260305081019.1811100-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'tools/perf/scripts/python/stackcollapse.py')
0 files changed, 0 insertions, 0 deletions