diff options
| author | Cosmin Ratiu <cratiu@nvidia.com> | 2026-03-05 10:10:19 +0200 |
|---|---|---|
| committer | Jakub Kicinski <kuba@kernel.org> | 2026-03-06 17:24:28 -0800 |
| commit | aed763abf0e905b4b8d747d1ba9e172961572f57 (patch) | |
| tree | a679848d9f0e9ea5e1a4e5f3fe166fc3610137cf /tools/perf/scripts/python | |
| parent | 55f854dd5bdd8e19b936a00ef1f8d776ac32c7b0 (diff) | |
| download | linux-aed763abf0e905b4b8d747d1ba9e172961572f57.tar.gz linux-aed763abf0e905b4b8d747d1ba9e172961572f57.zip | |
net/mlx5: Fix deadlock between devlink lock and esw->wq
esw->work_queue executes esw_functions_changed_event_handler ->
esw_vfs_changed_event_handler and acquires the devlink lock.
.eswitch_mode_set (acquires devlink lock in devlink_nl_pre_doit) ->
mlx5_devlink_eswitch_mode_set -> mlx5_eswitch_disable_locked ->
mlx5_eswitch_event_handler_unregister -> flush_workqueue deadlocks
when esw_vfs_changed_event_handler executes.
Fix that by no longer flushing the work to avoid the deadlock, and using
a generation counter to keep track of work relevance. This avoids an old
handler manipulating an esw that has undergone one or more mode changes:
- the counter is incremented in mlx5_eswitch_event_handler_unregister.
- the counter is read and passed to the ephemeral mlx5_host_work struct.
- the work handler takes the devlink lock and bails out if the current
generation is different than the one it was scheduled to operate on.
- mlx5_eswitch_cleanup does the final draining before destroying the wq.
No longer flushing the workqueue has the side effect of maybe no longer
cancelling pending vport_change_handler work items, but that's ok since
those are disabled elsewhere:
- mlx5_eswitch_disable_locked disables the vport eq notifier.
- mlx5_esw_vport_disable disarms the HW EQ notification and marks
vport->enabled under state_lock to false to prevent pending vport
handler from doing anything.
- mlx5_eswitch_cleanup destroys the workqueue and makes sure all events
are disabled/finished.
Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260305081019.1811100-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'tools/perf/scripts/python')
0 files changed, 0 insertions, 0 deletions
