<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/rcu/tree_plugin.h, branch v6.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-08-11T03:13:49Z</updated>
<entry>
<title>rcu: Fix racy re-initialization of irq_work causing hangs</title>
<updated>2025-08-11T03:13:49Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-08-08T17:03:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=61399e0c5410567ef60cb1cda34cca42903842e3'/>
<id>urn:sha1:61399e0c5410567ef60cb1cda34cca42903842e3</id>
<content type='text'>
RCU re-initializes the deferred QS irq work everytime before attempting
to queue it. However there are situations where the irq work is
attempted to be queued even though it is already queued. In that case
re-initializing messes-up with the irq work queue that is about to be
handled.

The chances for that to happen are higher when the architecture doesn't
support self-IPIs and irq work are then all lazy, such as with the
following sequence:

1) rcu_read_unlock() is called when IRQs are disabled and there is a
   grace period involving blocked tasks on the node. The irq work
   is then initialized and queued.

2) The related tasks are unblocked and the CPU quiescent state
   is reported. rdp-&gt;defer_qs_iw_pending is reset to DEFER_QS_IDLE,
   allowing the irq work to be requeued in the future (note the previous
   one hasn't fired yet).

3) A new grace period starts and the node has blocked tasks.

4) rcu_read_unlock() is called when IRQs are disabled again. The irq work
   is re-initialized (but it's queued! and its node is cleared) and
   requeued. Which means it's requeued to itself.

5) The irq work finally fires with the tick. But since it was requeued
   to itself, it loops and hangs.

Fix this with initializing the irq work only once before the CPU boots.

Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Joel Fernandes &lt;joelagnelf@nvidia.com&gt;
Signed-off-by: Neeraj Upadhyay (AMD) &lt;neeraj.upadhyay@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branches 'rcu-exp.23.07.2025', 'rcu.22.07.2025', 'torture-scripts.16.07.2025', 'srcu.19.07.2025', 'rcu.nocb.18.07.2025' and 'refscale.07.07.2025' into rcu.merge.23.07.2025</title>
<updated>2025-07-23T16:12:20Z</updated>
<author>
<name>Neeraj Upadhyay (AMD)</name>
<email>neeraj.upadhyay@kernel.org</email>
</author>
<published>2025-07-23T16:12:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cc1d1365f0f414f6522378867baa997642a7e6b2'/>
<id>urn:sha1:cc1d1365f0f414f6522378867baa997642a7e6b2</id>
<content type='text'>
</content>
</entry>
<entry>
<title>rcu: Refactor expedited handling check in rcu_read_unlock_special()</title>
<updated>2025-07-16T04:31:06Z</updated>
<author>
<name>Joel Fernandes</name>
<email>joelagnelf@nvidia.com</email>
</author>
<published>2025-07-15T20:01:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=908a97eba8c8b510996bf5d77d1e3070d59caa6d'/>
<id>urn:sha1:908a97eba8c8b510996bf5d77d1e3070d59caa6d</id>
<content type='text'>
Extract the complex expedited handling condition in rcu_read_unlock_special()
into a separate function rcu_unlock_needs_exp_handling() with detailed
comments explaining each condition.

This improves code readability. No functional change intended.

Reviewed-by: "Paul E. McKenney" &lt;paulmck@kernel.org&gt;
Signed-off-by: Joel Fernandes &lt;joelagnelf@nvidia.com&gt;
Signed-off-by: Neeraj Upadhyay (AMD) &lt;neeraj.upadhyay@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu: Fix rcu_read_unlock() deadloop due to IRQ work</title>
<updated>2025-07-16T04:08:26Z</updated>
<author>
<name>Joel Fernandes</name>
<email>joelagnelf@nvidia.com</email>
</author>
<published>2025-07-08T14:22:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b41642c87716bbd09797b1e4ea7d904f06c39b7b'/>
<id>urn:sha1:b41642c87716bbd09797b1e4ea7d904f06c39b7b</id>
<content type='text'>
During rcu_read_unlock_special(), if this happens during irq_exit(), we
can lockup if an IPI is issued. This is because the IPI itself triggers
the irq_exit() path causing a recursive lock up.

This is precisely what Xiongfeng found when invoking a BPF program on
the trace_tick_stop() tracepoint As shown in the trace below. Fix by
managing the irq_work state correctly.

irq_exit()
  __irq_exit_rcu()
    /* in_hardirq() returns false after this */
    preempt_count_sub(HARDIRQ_OFFSET)
    tick_irq_exit()
      tick_nohz_irq_exit()
	    tick_nohz_stop_sched_tick()
	      trace_tick_stop()  /* a bpf prog is hooked on this trace point */
		   __bpf_trace_tick_stop()
		      bpf_trace_run2()
			    rcu_read_unlock_special()
                              /* will send a IPI to itself */
			      irq_work_queue_on(&amp;rdp-&gt;defer_qs_iw, rdp-&gt;cpu);

A simple reproducer can also be obtained by doing the following in
tick_irq_exit(). It will hang on boot without the patch:

  static inline void tick_irq_exit(void)
  {
 +	rcu_read_lock();
 +	WRITE_ONCE(current-&gt;rcu_read_unlock_special.b.need_qs, true);
 +	rcu_read_unlock();
 +

Reported-by: Xiongfeng Wang &lt;wangxiongfeng2@huawei.com&gt;
Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
Tested-by: Qi Xi &lt;xiqi2@huawei.com&gt;
Signed-off-by: Joel Fernandes &lt;joelagnelf@nvidia.com&gt;
Reviewed-by: "Paul E. McKenney" &lt;paulmck@kernel.org&gt;
Reported-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
[neeraj: Apply Frederic's suggested fix for PREEMPT_RT]
Signed-off-by: Neeraj Upadhyay (AMD) &lt;neeraj.upadhyay@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu: Protect -&gt;defer_qs_iw_pending from data race</title>
<updated>2025-07-16T03:52:02Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2025-04-24T23:49:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=90c09d57caeca94e6f3f87c49e96a91edd40cbfd'/>
<id>urn:sha1:90c09d57caeca94e6f3f87c49e96a91edd40cbfd</id>
<content type='text'>
On kernels built with CONFIG_IRQ_WORK=y, when rcu_read_unlock() is
invoked within an interrupts-disabled region of code [1], it will invoke
rcu_read_unlock_special(), which uses an irq-work handler to force the
system to notice when the RCU read-side critical section actually ends.
That end won't happen until interrupts are enabled at the soonest.

In some kernels, such as those booted with rcutree.use_softirq=y, the
irq-work handler is used unconditionally.

The per-CPU rcu_data structure's -&gt;defer_qs_iw_pending field is
updated by the irq-work handler and is both read and updated by
rcu_read_unlock_special().  This resulted in the following KCSAN splat:

------------------------------------------------------------------------

BUG: KCSAN: data-race in rcu_preempt_deferred_qs_handler / rcu_read_unlock_special

read to 0xffff96b95f42d8d8 of 1 bytes by task 90 on cpu 8:
 rcu_read_unlock_special+0x175/0x260
 __rcu_read_unlock+0x92/0xa0
 rt_spin_unlock+0x9b/0xc0
 __local_bh_enable+0x10d/0x170
 __local_bh_enable_ip+0xfb/0x150
 rcu_do_batch+0x595/0xc40
 rcu_cpu_kthread+0x4e9/0x830
 smpboot_thread_fn+0x24d/0x3b0
 kthread+0x3bd/0x410
 ret_from_fork+0x35/0x40
 ret_from_fork_asm+0x1a/0x30

write to 0xffff96b95f42d8d8 of 1 bytes by task 88 on cpu 8:
 rcu_preempt_deferred_qs_handler+0x1e/0x30
 irq_work_single+0xaf/0x160
 run_irq_workd+0x91/0xc0
 smpboot_thread_fn+0x24d/0x3b0
 kthread+0x3bd/0x410
 ret_from_fork+0x35/0x40
 ret_from_fork_asm+0x1a/0x30

no locks held by irq_work/8/88.
irq event stamp: 200272
hardirqs last  enabled at (200272): [&lt;ffffffffb0f56121&gt;] finish_task_switch+0x131/0x320
hardirqs last disabled at (200271): [&lt;ffffffffb25c7859&gt;] __schedule+0x129/0xd70
softirqs last  enabled at (0): [&lt;ffffffffb0ee093f&gt;] copy_process+0x4df/0x1cc0
softirqs last disabled at (0): [&lt;0000000000000000&gt;] 0x0

------------------------------------------------------------------------

The problem is that irq-work handlers run with interrupts enabled, which
means that rcu_preempt_deferred_qs_handler() could be interrupted,
and that interrupt handler might contain an RCU read-side critical
section, which might invoke rcu_read_unlock_special().  In the strict
KCSAN mode of operation used by RCU, this constitutes a data race on
the -&gt;defer_qs_iw_pending field.

This commit therefore disables interrupts across the portion of the
rcu_preempt_deferred_qs_handler() that updates the -&gt;defer_qs_iw_pending
field.  This suffices because this handler is not a fast path.

Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Neeraj Upadhyay (AMD) &lt;neeraj.upadhyay@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu/exp: Remove confusing needless full barrier on task unblock</title>
<updated>2025-07-08T17:50:19Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-04-29T13:43:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4b9432ed65cb3a692e8d8bd86abb93f5f2dc5b69'/>
<id>urn:sha1:4b9432ed65cb3a692e8d8bd86abb93f5f2dc5b69</id>
<content type='text'>
A full memory barrier in the RCU-PREEMPT task unblock path advertizes
to order the context switch (or rather the accesses prior to
rcu_read_unlock()) with the expedited grace period fastpath.

However the grace period can not complete without the rnp calling into
rcu_report_exp_rnp() with the node locked. This reports the quiescent
state in a fully ordered fashion against updater's accesses thanks to:

1) The READ-SIDE smp_mb__after_unlock_lock() barrier across nodes
   locking while propagating QS up to the root.

2) The UPDATE-SIDE smp_mb__after_unlock_lock() barrier while holding the
   the root rnp to wait/check for the GP completion.

3) The (perhaps redundant given step 1) and 2)) smp_mb() in rcu_seq_end()
   before the grace period completes.

This makes the explicit barrier in this place superfluous. Therefore
remove it as it is confusing.

Acked-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Joel Fernandes &lt;joelagnelf@nvidia.com&gt;
Signed-off-by: Neeraj Upadhyay (AMD) &lt;neeraj.upadhyay@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu/nocb: Add Safe checks for access offloaded rdp</title>
<updated>2025-05-16T13:00:54Z</updated>
<author>
<name>Zqiang</name>
<email>qiang.zhang1211@gmail.com</email>
</author>
<published>2025-05-07T11:26:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=36f8e3087562bc3f55f584fb42d105dfdf6686f4'/>
<id>urn:sha1:36f8e3087562bc3f55f584fb42d105dfdf6686f4</id>
<content type='text'>
For built with CONFIG_PROVE_RCU=y and CONFIG_PREEMPT_RT=y kernels,
Disable BH does not change the SOFTIRQ corresponding bits in
preempt_count(), but change current-&gt;softirq_disable_cnt, this
resulted in the following splat:

WARNING: suspicious RCU usage
kernel/rcu/tree_plugin.h:36 Unsafe read of RCU_NOCB offloaded state!
stack backtrace:
CPU: 0 UID: 0 PID: 22 Comm: rcuc/0
Call Trace:
[    0.407907]  &lt;TASK&gt;
[    0.407910]  dump_stack_lvl+0xbb/0xd0
[    0.407917]  dump_stack+0x14/0x20
[    0.407920]  lockdep_rcu_suspicious+0x133/0x210
[    0.407932]  rcu_rdp_is_offloaded+0x1c3/0x270
[    0.407939]  rcu_core+0x471/0x900
[    0.407942]  ? lockdep_hardirqs_on+0xd5/0x160
[    0.407954]  rcu_cpu_kthread+0x25f/0x870
[    0.407959]  ? __pfx_rcu_cpu_kthread+0x10/0x10
[    0.407966]  smpboot_thread_fn+0x34c/0xa50
[    0.407970]  ? trace_preempt_on+0x54/0x120
[    0.407977]  ? __pfx_smpboot_thread_fn+0x10/0x10
[    0.407982]  kthread+0x40e/0x840
[    0.407990]  ? __pfx_kthread+0x10/0x10
[    0.407994]  ? rt_spin_unlock+0x4e/0xb0
[    0.407997]  ? rt_spin_unlock+0x4e/0xb0
[    0.408000]  ? __pfx_kthread+0x10/0x10
[    0.408006]  ? __pfx_kthread+0x10/0x10
[    0.408011]  ret_from_fork+0x40/0x70
[    0.408013]  ? __pfx_kthread+0x10/0x10
[    0.408018]  ret_from_fork_asm+0x1a/0x30
[    0.408042]  &lt;/TASK&gt;

Currently, triggering an rdp offloaded state change need the
corresponding rdp's CPU goes offline, and at this time the rcuc
kthreads has already in parking state. this means the corresponding
rcuc kthreads can safely read offloaded state of rdp while it's
corresponding cpu is online.

This commit therefore add softirq_count() check for
Preempt-RT kernels.

Suggested-by: Joel Fernandes &lt;joelagnelf@nvidia.com&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Zqiang &lt;qiang.zhang1211@gmail.com&gt;
Signed-off-by: Joel Fernandes &lt;joelagnelf@nvidia.com&gt;
</content>
</entry>
<entry>
<title>rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y</title>
<updated>2025-02-05T15:01:55Z</updated>
<author>
<name>Ankur Arora</name>
<email>ankur.a.arora@oracle.com</email>
</author>
<published>2024-12-13T04:06:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=83b28cfe796464ebbde1cf7916c126da6d572685'/>
<id>urn:sha1:83b28cfe796464ebbde1cf7916c126da6d572685</id>
<content type='text'>
With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
states for read-side critical sections via rcu_all_qs().
One reason why this was needed: lacking preempt-count, the tick
handler has no way of knowing whether it is executing in a
read-side critical section or not.

With (PREEMPT_LAZY=y, PREEMPT_DYNAMIC=n), we get (PREEMPT_COUNT=y,
PREEMPT_RCU=n). In this configuration cond_resched() is a stub and
does not provide quiescent states via rcu_all_qs().
(PREEMPT_RCU=y provides this information via rcu_read_unlock() and
its nesting counter.)

So, use the availability of preempt_count() to report quiescent states
in rcu_flavor_sched_clock_irq().

Suggested-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Ankur Arora &lt;ankur.a.arora@oracle.com&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Boqun Feng &lt;boqun.feng@gmail.com&gt;
</content>
</entry>
<entry>
<title>rcu: handle unstable rdp in rcu_read_unlock_strict()</title>
<updated>2025-02-05T15:01:55Z</updated>
<author>
<name>Ankur Arora</name>
<email>ankur.a.arora@oracle.com</email>
</author>
<published>2024-12-13T04:06:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fcf0e25ad4c8d14d2faab4d9a17040f31efce205'/>
<id>urn:sha1:fcf0e25ad4c8d14d2faab4d9a17040f31efce205</id>
<content type='text'>
rcu_read_unlock_strict() can be called with preemption enabled
which can make for an unstable rdp and a racy norm value.

Fix this by dropping the preempt-count in __rcu_read_unlock()
after the call to rcu_read_unlock_strict(), adjusting the
preempt-count check appropriately.

Suggested-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Ankur Arora &lt;ankur.a.arora@oracle.com&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Boqun Feng &lt;boqun.feng@gmail.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks</title>
<updated>2025-01-22T01:10:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-01-22T01:10:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1d6d3992235ed08929846f98fecf79682e0b422c'/>
<id>urn:sha1:1d6d3992235ed08929846f98fecf79682e0b422c</id>
<content type='text'>
Pull kthread updates from Frederic Weisbecker:
 "Kthreads affinity follow either of 4 existing different patterns:

   1) Per-CPU kthreads must stay affine to a single CPU and never
      execute relevant code on any other CPU. This is currently handled
      by smpboot code which takes care of CPU-hotplug operations.
      Affinity here is a correctness constraint.

   2) Some kthreads _have_ to be affine to a specific set of CPUs and
      can't run anywhere else. The affinity is set through
      kthread_bind_mask() and the subsystem takes care by itself to
      handle CPU-hotplug operations. Affinity here is assumed to be a
      correctness constraint.

   3) Per-node kthreads _prefer_ to be affine to a specific NUMA node.
      This is not a correctness constraint but merely a preference in
      terms of memory locality. kswapd and kcompactd both fall into this
      category. The affinity is set manually like for any other task and
      CPU-hotplug is supposed to be handled by the relevant subsystem so
      that the task is properly reaffined whenever a given CPU from the
      node comes up. Also care should be taken so that the node affinity
      doesn't cross isolated (nohz_full) cpumask boundaries.

   4) Similar to the previous point except kthreads have a _preferred_
      affinity different than a node. Both RCU boost kthreads and RCU
      exp kworkers fall into this category as they refer to "RCU nodes"
      from a distinctly distributed tree.

  Currently the preferred affinity patterns (3 and 4) have at least 4
  identified users, with more or less success when it comes to handle
  CPU-hotplug operations and CPU isolation. Each of which do it in its
  own ad-hoc way.

  This is an infrastructure proposal to handle this with the following
  API changes:

   - kthread_create_on_node() automatically affines the created kthread
     to its target node unless it has been set as per-cpu or bound with
     kthread_bind[_mask]() before the first wake-up.

   - kthread_affine_preferred() is a new function that can be called
     right after kthread_create_on_node() to specify a preferred
     affinity different than the specified node.

  When the preferred affinity can't be applied because the possible
  targets are offline or isolated (nohz_full), the kthread is affine to
  the housekeeping CPUs (which means to all online CPUs most of the time
  or only the non-nohz_full CPUs when nohz_full= is set).

  kswapd, kcompactd, RCU boost kthreads and RCU exp kworkers have been
  converted, along with a few old drivers.

  Summary of the changes:

   - Consolidate a bunch of ad-hoc implementations of
     kthread_run_on_cpu()

   - Introduce task_cpu_fallback_mask() that defines the default last
     resort affinity of a task to become nohz_full aware

   - Add some correctness check to ensure kthread_bind() is always
     called before the first kthread wake up.

   - Default affine kthread to its preferred node.

   - Convert kswapd / kcompactd and remove their halfway working ad-hoc
     affinity implementation

   - Implement kthreads preferred affinity

   - Unify kthread worker and kthread API's style

   - Convert RCU kthreads to the new API and remove the ad-hoc affinity
     implementation"

* tag 'kthread-for-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks:
  kthread: modify kernel-doc function name to match code
  rcu: Use kthread preferred affinity for RCU exp kworkers
  treewide: Introduce kthread_run_worker[_on_cpu]()
  kthread: Unify kthread_create_on_cpu() and kthread_create_worker_on_cpu() automatic format
  rcu: Use kthread preferred affinity for RCU boost
  kthread: Implement preferred affinity
  mm: Create/affine kswapd to its preferred node
  mm: Create/affine kcompactd to its preferred node
  kthread: Default affine kthread to its preferred NUMA node
  kthread: Make sure kthread hasn't started while binding it
  sched,arm64: Handle CPU isolation on last resort fallback rq selection
  arm64: Exclude nohz_full CPUs from 32bits el0 support
  lib: test_objpool: Use kthread_run_on_cpu()
  kallsyms: Use kthread_run_on_cpu()
  soc/qman: test: Use kthread_run_on_cpu()
  arm/bL_switcher: Use kthread_run_on_cpu()
</content>
</entry>
</feed>
