linux/net/core/dev.h, branch v6.16

netdev: add "ops compat locking" helpers

2025-04-10T00:01:51Z

Add helpers to "lock a netdev in a backward-compatible way", which for ops-locked netdevs will mean take the instance lock. For drivers which haven't opted into the ops locking we'll take rtnl_lock. The scoped foreach is dropping and re-taking the lock for each device, even if prev and next are both under rtnl_lock. I hope that's fine since we expect that netdev nl to be mostly supported by modern drivers, and modern drivers should also opt into the instance locking. Note that these helpers are mostly needed for queue related state, because drivers modify queue config in their ops in a non-atomic way. Or differently put, queue changes don't have a clear-cut API like NAPI configuration. Any state that can should just use the instance lock directly, not the "compat" hacks. Reviewed-by: Joe Damato Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250408195956.412733-4-kuba@kernel.org Signed-off-by: Jakub Kicinski

net: avoid potential race between netdev_get_by_index_lock() and netns switch

2025-04-10T00:01:51Z

netdev_get_by_index_lock() performs following steps: rcu_lock(); dev = lookup(netns, ifindex); dev_get(dev); rcu_unlock(); [... lock & validate the dev ...] return dev Validation right now only checks if the device is registered but since the lookup is netns-aware we must also protect against the device switching netns right after we dropped the RCU lock. Otherwise the caller in netns1 may get a pointer to a device which has just switched to netns2. We can't hold the lock for the entire netns change process (because of the NETDEV_UNREGISTER notifier), and there's no existing marking to indicate that the netns is unlisted because of netns move, so add one. AFAIU none of the existing netdev_get_by_index_lock() callers can suffer from this problem (NAPI code double checks the netns membership and other callers are either under rtnl_lock or not ns-sensitive), so this patch does not have to be treated as a fix. Reviewed-by: Joe Damato Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250408195956.412733-2-kuba@kernel.org Signed-off-by: Jakub Kicinski

net: rps: remove kfree_rcu_mightsleep() use

2025-04-08T19:30:55Z

Add an rcu_head to sd_flow_limit and rps_sock_flow_table structs to use the more conventional and predictable k[v]free_rcu(). Signed-off-by: Eric Dumazet Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20250407163602.170356-5-edumazet@google.com Signed-off-by: Jakub Kicinski

net: rps: annotate data-races around (struct sd_flow_limit)->count

2025-04-08T19:30:55Z

softnet_seq_show() can read fl->count while another cpu updates this field from skb_flow_limit(). Make this field an 'unsigned int', as its only consumer only deals with 32 bit. Signed-off-by: Eric Dumazet Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20250407163602.170356-3-edumazet@google.com Signed-off-by: Jakub Kicinski

net: rps: change skb_flow_limit() hash function

2025-04-08T19:30:55Z

As explained in commit f3483c8e1da6 ("net: rfs: hash function change"), masking low order bits of skb_get_hash(skb) has low entropy. A NIC with 32 RX queues uses the 5 low order bits of rss key to select a queue. This means all packets landing to a given queue share the same 5 low order bits. Switch to hash_32() to reduce hash collisions. Signed-off-by: Eric Dumazet Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20250407163602.170356-2-edumazet@google.com Signed-off-by: Jakub Kicinski

Revert "net: replace dev_addr_sem with netdev instance lock"

2025-03-19T17:52:00Z

This reverts commit df43d8bf10316a7c3b1e47e3cc0057a54df4a5b8. Cc: Kohei Enju Reviewed-by: Kuniyuki Iwashima Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock") Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250312190513.1252045-2-sdf@fomichev.me Tested-by: Lei Yang Signed-off-by: Paolo Abeni

net: move misc netdev_lock flavors to a separate header

2025-03-08T17:06:50Z

Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuilding the kernel much faster (after touching the header with the helpers). The main netdev_lock() / netdev_unlock() functions are used in static inlines in netdevice.h and will probably be used most commonly, so keep them in netdevice.h. Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski

net: replace dev_addr_sem with netdev instance lock

2025-03-06T20:59:45Z

Lockdep reports possible circular dependency in [0]. Instead of fixing the ordering, replace global dev_addr_sem with netdev instance lock. Most of the paths that set/get mac are RTNL protected. Two places where it's not, convert to explicit locking: - sysfs address_show - dev_get_mac_address via dev_ioctl 0: https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/993321/24-router-bridge-1d-lag-sh/stderr Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-12-sdf@fomichev.me Signed-off-by: Jakub Kicinski

net: hold netdev instance lock during rtnetlink operations

2025-03-06T20:59:44Z

To preserve the atomicity, hold the lock while applying multiple attributes. The major issue with a full conversion to the instance lock are software nesting devices (bonding/team/vrf/etc). Those devices call into the core stack for their lower (potentially real hw) devices. To avoid explicitly wrapping all those places into instance lock/unlock, introduce new API boundaries: - (some) existing dev_xxx calls are now considered "external" (to drivers) APIs and they transparently grab the instance lock if needed (dev_api.c) - new netif_xxx calls are internal core stack API (naming is sketchy, I've tried netdev_xxx_locked per Jakub's suggestion, but it feels a bit verbose; but happy to get back to this naming scheme if this is the preference) This avoids touching most of the existing ioctl/sysfs/drivers paths. Note the special handling of ndo_xxx_slave operations: I exploit the fact that none of the drivers that call these functions need/use instance lock. At the same time, they use dev_xxx APIs, so the lower device has to be unlocked. Changes in unregister_netdevice_many_notify (to protect dev->state with instance lock) trigger lockdep - the loop over close_list (mostly from cleanup_net) introduces spurious ordering issues. netdev_lock_cmp_fn has a justification on why it's ok to suppress for now. Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski

net: hold netdev instance lock during ndo_open/ndo_stop

2025-03-06T20:59:43Z

For the drivers that use shaper API, switch to the mode where core stack holds the netdev lock. This affects two drivers: * iavf - already grabs netdev lock in ndo_open/ndo_stop, so mostly remove these * netdevsim - switch to _locked APIs to avoid deadlock iavf_close diff is a bit confusing, the existing call looks like this: iavf_close() { netdev_lock() .. netdev_unlock() wait_event_timeout(down_waitqueue) } I change it to the following: netdev_lock() iavf_close() { .. netdev_unlock() wait_event_timeout(down_waitqueue) netdev_lock() // reusing this lock call } netdev_unlock() Since I'm reusing existing netdev_lock call, so it looks like I only add netdev_unlock. Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski