<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/workqueue.c, branch v7.1-rc2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v7.1-rc2</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v7.1-rc2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2026-04-15T17:32:08Z</updated>
<entry>
<title>Merge tag 'wq-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq</title>
<updated>2026-04-15T17:32:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-15T17:32:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7de6b4a246330fe29fa2fd144b4724ca35d60d6c'/>
<id>urn:sha1:7de6b4a246330fe29fa2fd144b4724ca35d60d6c</id>
<content type='text'>
Pull workqueue updates from Tejun Heo:

 - New default WQ_AFFN_CACHE_SHARD affinity scope subdivides LLCs into
   smaller shards to improve scalability on machines with many CPUs per
   LLC

 - Misc:
    - system_dfl_long_wq for long unbound works
    - devm_alloc_workqueue() for device-managed allocation
    - sysfs exposure for ordered workqueues and the EFI workqueue
    - removal of HK_TYPE_WQ from wq_unbound_cpumask
    - various small fixes

* tag 'wq-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (21 commits)
  workqueue: validate cpumask_first() result in llc_populate_cpu_shard_id()
  workqueue: use NR_STD_WORKER_POOLS instead of hardcoded value
  workqueue: avoid unguarded 64-bit division
  docs: workqueue: document WQ_AFFN_CACHE_SHARD affinity scope
  workqueue: add test_workqueue benchmark module
  tools/workqueue: add CACHE_SHARD support to wq_dump.py
  workqueue: set WQ_AFFN_CACHE_SHARD as the default affinity scope
  workqueue: add WQ_AFFN_CACHE_SHARD affinity scope
  workqueue: fix typo in WQ_AFFN_SMT comment
  workqueue: Remove HK_TYPE_WQ from affecting wq_unbound_cpumask
  workqueue: unlink pwqs from wq-&gt;pwqs list in alloc_and_link_pwqs() error path
  workqueue: Remove NULL wq WARN in __queue_delayed_work()
  workqueue: fix parse_affn_scope() prefix matching bug
  workqueue: devres: Add device-managed allocate workqueue
  workqueue: Add system_dfl_long_wq for long unbound works
  tools/workqueue/wq_dump.py: add NODE prefix to all node columns
  tools/workqueue/wq_dump.py: fix column alignment in node_nr/max_active section
  tools/workqueue/wq_dump.py: remove backslash separator from node_nr/max_active header
  efi: Allow to expose the workqueue via sysfs
  workqueue: Allow to expose ordered workqueues via sysfs
  ...
</content>
</entry>
<entry>
<title>workqueue: validate cpumask_first() result in llc_populate_cpu_shard_id()</title>
<updated>2026-04-13T16:15:26Z</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2026-04-13T14:26:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=76af54648899abbd6b449c035583e47fd407078a'/>
<id>urn:sha1:76af54648899abbd6b449c035583e47fd407078a</id>
<content type='text'>
On uniprocessor (UP) configs such as nios2, NR_CPUS is 1, so
cpu_shard_id[] is a single-element array (int[1]). In
llc_populate_cpu_shard_id(), cpumask_first(sibling_cpus) returns an
unsigned int that the compiler cannot prove is always 0, triggering
a -Warray-bounds warning when the result is used to index
cpu_shard_id[]:

  kernel/workqueue.c:8321:55: warning: array subscript 1 is above
  array bounds of 'int[1]' [-Warray-bounds]
   8321 |  cpu_shard_id[c] = cpu_shard_id[cpumask_first(sibling_cpus)];
        |                    ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is a false positive: sibling_cpus can never be empty here because
'c' itself is always set in it, so cpumask_first() will always return a
valid CPU. However, the compiler cannot prove this statically, and the
warning only manifests on UP configs where the array size is 1.

Add a bounds check with WARN_ON_ONCE to silence the warning, and store
the result in a local variable to make the code clearer and avoid calling
cpumask_first() twice.

Fixes: 5920d046f7ae ("workqueue: add WQ_AFFN_CACHE_SHARD affinity scope")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202604022343.GQtkF2vO-lkp@intel.com/
Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: use NR_STD_WORKER_POOLS instead of hardcoded value</title>
<updated>2026-04-07T18:13:19Z</updated>
<author>
<name>Maninder Singh</name>
<email>maninder1.s@samsung.com</email>
</author>
<published>2026-04-07T03:42:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=034db4dd4449c556705e6b32bc07bd31df3889ba'/>
<id>urn:sha1:034db4dd4449c556705e6b32bc07bd31df3889ba</id>
<content type='text'>
use NR_STD_WORKER_POOLS for irq_work_fns[] array definition.
NR_STD_WORKER_POOLS is also 2, but better to use MACRO.
Initialization loop for_each_bh_worker_pool() also uses same MACRO.

Signed-off-by: Maninder Singh &lt;maninder1.s@samsung.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: set WQ_AFFN_CACHE_SHARD as the default affinity scope</title>
<updated>2026-04-01T20:24:18Z</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2026-04-01T13:03:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4cdc8a7389d5025051f6c4a60fb5b7cb9b7960bb'/>
<id>urn:sha1:4cdc8a7389d5025051f6c4a60fb5b7cb9b7960bb</id>
<content type='text'>
Set WQ_AFFN_CACHE_SHARD as the default affinity scope for unbound
workqueues. On systems where many CPUs share one LLC, the previous
default (WQ_AFFN_CACHE) collapses all CPUs to a single worker pool,
causing heavy spinlock contention on pool-&gt;lock.

WQ_AFFN_CACHE_SHARD subdivides each LLC into smaller groups, providing
a better balance between locality and contention. Users can revert to
the previous behavior with workqueue.default_affinity_scope=cache.

On systems with 8 or fewer cores per LLC, CACHE_SHARD produces a single
shard covering the entire LLC, making it functionally identical to the
previous CACHE default. The sharding only activates when an LLC has more
than 8 cores.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: add WQ_AFFN_CACHE_SHARD affinity scope</title>
<updated>2026-04-01T20:24:18Z</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2026-04-01T13:03:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5920d046f7ae3bf9cf51b9d915c1fff13d299d84'/>
<id>urn:sha1:5920d046f7ae3bf9cf51b9d915c1fff13d299d84</id>
<content type='text'>
On systems where many CPUs share one LLC, unbound workqueues using
WQ_AFFN_CACHE collapse to a single worker pool, causing heavy spinlock
contention on pool-&gt;lock. For example, Chuck Lever measured 39% of
cycles lost to native_queued_spin_lock_slowpath on a 12-core shared-L3
NFS-over-RDMA system.

The existing affinity hierarchy (cpu, smt, cache, numa, system) offers
no intermediate option between per-LLC and per-SMT-core granularity.

Add WQ_AFFN_CACHE_SHARD, which subdivides each LLC into groups of at
most wq_cache_shard_size cores (default 8, tunable via boot parameter).
Shards are always split on core (SMT group) boundaries so that
Hyper-Threading siblings are never placed in different pods. Cores are
distributed across shards as evenly as possible -- for example, 36 cores
in a single LLC with max shard size 8 produces 5 shards of 8+7+7+7+7
cores.

The implementation follows the same comparator pattern as other affinity
scopes: precompute_cache_shard_ids() pre-fills the cpu_shard_id[] array
from the already-initialized WQ_AFFN_CACHE and WQ_AFFN_SMT topology,
and cpus_share_cache_shard() is passed to init_pod_type().

Benchmark on NVIDIA Grace (72 CPUs, single LLC, 50k items/thread), show
cache_shard delivers ~5x the throughput and ~6.5x lower p50 latency
compared to cache scope on this 72-core single-LLC system.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works</title>
<updated>2026-04-01T20:18:22Z</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2026-04-01T01:07:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=703ccb63ae9f7444d6ff876d024e17f628103c69'/>
<id>urn:sha1:703ccb63ae9f7444d6ff876d024e17f628103c69</id>
<content type='text'>
In unplug_oldest_pwq(), the first inactive work item on the
pool_workqueue is activated correctly. However, if multiple inactive
works exist on the same pool_workqueue, subsequent works fail to
activate because wq_node_nr_active.pending_pwqs is empty — the list
insertion is skipped when the pool_workqueue is plugged.

Fix this by checking for additional inactive works in
unplug_oldest_pwq() and updating wq_node_nr_active.pending_pwqs
accordingly.

Fixes: 4c065dbce1e8 ("workqueue: Enable unbound cpumask update on ordered workqueues")
Cc: stable@vger.kernel.org
Cc: Carlos Santa &lt;carlos.santa@intel.com&gt;
Cc: Ryan Neph &lt;ryanneph@google.com&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
</content>
</entry>
<entry>
<title>workqueue: Remove HK_TYPE_WQ from affecting wq_unbound_cpumask</title>
<updated>2026-03-31T21:43:25Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2026-03-31T18:35:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2ab739383113107a1335ce7bcbc93110afb69267'/>
<id>urn:sha1:2ab739383113107a1335ce7bcbc93110afb69267</id>
<content type='text'>
For historical reason, wq_unbound_cpumask is initially set as
intersection of HK_TYPE_DOMAIN, HK_TYPE_WQ and workqueue.unbound_cpus
boot command line option.

At run time, users can update the unbound cpumask via the
/sys/devices/virtual/workqueue/cpumask sysfs file. Creation
and modification of cpuset isolated partitions will also update
wq_unbound_cpumask based on the latest HK_TYPE_DOMAIN cpumask.
The HK_TYPE_WQ cpumask is out of the picture with these runtime updates.

Complete the transition by taking HK_TYPE_WQ out from the workqueue code
and make it depends on HK_TYPE_DOMAIN only from the housekeeping side.
The final goal is to eliminate HK_TYPE_WQ as a housekeeping cpumask type.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: unlink pwqs from wq-&gt;pwqs list in alloc_and_link_pwqs() error path</title>
<updated>2026-03-25T15:52:01Z</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2026-03-23T10:18:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=afeaa9f2532d1d8d04803d09ac2d4f7107854f29'/>
<id>urn:sha1:afeaa9f2532d1d8d04803d09ac2d4f7107854f29</id>
<content type='text'>
When alloc_and_link_pwqs() fails partway through the per-cpu allocation
loop, some pool_workqueues may have already been linked into wq-&gt;pwqs
via link_pwq(). The error path frees these pwqs with kmem_cache_free()
but never removes them from the wq-&gt;pwqs list, leaving dangling pointers
in the list.

Currently this is not exploitable because the workqueue was never added
to the global workqueues list and the caller frees the wq immediately
after. However, this makes sure that alloc_and_link_pwqs() doesn't leave
any half-baked structure, which may have side effects if not properly
cleaned up.

Fix this by unlinking each pwq from wq-&gt;pwqs before freeing it. No
locking is needed as the workqueue has not been published yet, thus
no concurrency is possible.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Better describe stall check</title>
<updated>2026-03-25T15:51:02Z</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2026-03-25T12:34:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e398978ddf18fe5a2fc8299c77e6fe50e6c306c4'/>
<id>urn:sha1:e398978ddf18fe5a2fc8299c77e6fe50e6c306c4</id>
<content type='text'>
Try to be more explicit why the workqueue watchdog does not take
pool-&gt;lock by default. Spin locks are full memory barriers which
delay anything. Obviously, they would primary delay operations
on the related worker pools.

Explain why it is enough to prevent the false positive by re-checking
the timestamp under the pool-&gt;lock.

Finally, make it clear what would be the alternative solution in
__queue_work() which is a hotter path.

Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Fix false positive stall reports</title>
<updated>2026-03-22T04:34:59Z</updated>
<author>
<name>Song Liu</name>
<email>song@kernel.org</email>
</author>
<published>2026-03-22T03:30:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c7f27a8ab9f2f43570f0725256597a0d7abe2c5b'/>
<id>urn:sha1:c7f27a8ab9f2f43570f0725256597a0d7abe2c5b</id>
<content type='text'>
On weakly ordered architectures (e.g., arm64), the lockless check in
wq_watchdog_timer_fn() can observe a reordering between the worklist
insertion and the last_progress_ts update. Specifically, the watchdog
can see a non-empty worklist (from a list_add) while reading a stale
last_progress_ts value, causing a false positive stall report.

This was confirmed by reading pool-&gt;last_progress_ts again after holding
pool-&gt;lock in wq_watchdog_timer_fn():

  workqueue watchdog: pool 7 false positive detected!
    lockless_ts=4784580465 locked_ts=4785033728
    diff=453263ms worklist_empty=0

To avoid slowing down the hot path (queue_work, etc.), recheck
last_progress_ts with pool-&gt;lock held. This will eliminate the false
positive with minimal overhead.

Remove two extra empty lines in wq_watchdog_timer_fn() as we are on it.

Fixes: 82607adcf9cd ("workqueue: implement lockup detector")
Cc: stable@vger.kernel.org # v4.5+
Assisted-by: claude-code:claude-opus-4-6
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
