<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/rcu/tree.c, branch v6.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-04-06T17:04:23Z</updated>
<entry>
<title>rcu/kvfree: Avoid freeing new kfree_rcu() memory after old grace period</title>
<updated>2023-04-06T17:04:23Z</updated>
<author>
<name>Ziwei Dai</name>
<email>ziwei.dai@unisoc.com</email>
</author>
<published>2023-03-31T12:42:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5da7cb193db32da783a3f3e77d8b639989321d48'/>
<id>urn:sha1:5da7cb193db32da783a3f3e77d8b639989321d48</id>
<content type='text'>
Memory passed to kvfree_rcu() that is to be freed is tracked by a
per-CPU kfree_rcu_cpu structure, which in turn contains pointers
to kvfree_rcu_bulk_data structures that contain pointers to memory
that has not yet been handed to RCU, along with an kfree_rcu_cpu_work
structure that tracks the memory that has already been handed to RCU.
These structures track three categories of memory: (1) Memory for
kfree(), (2) Memory for kvfree(), and (3) Memory for both that arrived
during an OOM episode.  The first two categories are tracked in a
cache-friendly manner involving a dynamically allocated page of pointers
(the aforementioned kvfree_rcu_bulk_data structures), while the third
uses a simple (but decidedly cache-unfriendly) linked list through the
rcu_head structures in each block of memory.

On a given CPU, these three categories are handled as a unit, with that
CPU's kfree_rcu_cpu_work structure having one pointer for each of the
three categories.  Clearly, new memory for a given category cannot be
placed in the corresponding kfree_rcu_cpu_work structure until any old
memory has had its grace period elapse and thus has been removed.  And
the kfree_rcu_monitor() function does in fact check for this.

Except that the kfree_rcu_monitor() function checks these pointers one
at a time.  This means that if the previous kfree_rcu() memory passed
to RCU had only category 1 and the current one has only category 2, the
kfree_rcu_monitor() function will send that current category-2 memory
along immediately.  This can result in memory being freed too soon,
that is, out from under unsuspecting RCU readers.

To see this, consider the following sequence of events, in which:

o	Task A on CPU 0 calls rcu_read_lock(), then uses "from_cset",
	then is preempted.

o	CPU 1 calls kfree_rcu(cset, rcu_head) in order to free "from_cset"
	after a later grace period.  Except that "from_cset" is freed
	right after the previous grace period ended, so that "from_cset"
	is immediately freed.  Task A resumes and references "from_cset"'s
	member, after which nothing good happens.

In full detail:

CPU 0					CPU 1
----------------------			----------------------
count_memcg_event_mm()
|rcu_read_lock()  &lt;---
|mem_cgroup_from_task()
 |// css_set_ptr is the "from_cset" mentioned on CPU 1
 |css_set_ptr = rcu_dereference((task)-&gt;cgroups)
 |// Hard irq comes, current task is scheduled out.

					cgroup_attach_task()
					|cgroup_migrate()
					|cgroup_migrate_execute()
					|css_set_move_task(task, from_cset, to_cset, true)
					|cgroup_move_task(task, to_cset)
					|rcu_assign_pointer(.., to_cset)
					|...
					|cgroup_migrate_finish()
					|put_css_set_locked(from_cset)
					|from_cset-&gt;refcount return 0
					|kfree_rcu(cset, rcu_head) // free from_cset after new gp
					|add_ptr_to_bulk_krc_lock()
					|schedule_delayed_work(&amp;krcp-&gt;monitor_work, ..)

					kfree_rcu_monitor()
					|krcp-&gt;bulk_head[0]'s work attached to krwp-&gt;bulk_head_free[]
					|queue_rcu_work(system_wq, &amp;krwp-&gt;rcu_work)
					|if rwork-&gt;rcu.work is not in WORK_STRUCT_PENDING_BIT state,
					|call_rcu(&amp;rwork-&gt;rcu, rcu_work_rcufn) &lt;--- request new gp

					// There is a perious call_rcu(.., rcu_work_rcufn)
					// gp end, rcu_work_rcufn() is called.
					rcu_work_rcufn()
					|__queue_work(.., rwork-&gt;wq, &amp;rwork-&gt;work);

					|kfree_rcu_work()
					|krwp-&gt;bulk_head_free[0] bulk is freed before new gp end!!!
					|The "from_cset" is freed before new gp end.

// the task resumes some time later.
 |css_set_ptr-&gt;subsys[(subsys_id) &lt;--- Caused kernel crash, because css_set_ptr is freed.

This commit therefore causes kfree_rcu_monitor() to refrain from moving
kfree_rcu() memory to the kfree_rcu_cpu_work structure until the RCU
grace period has completed for all three categories.

v2: Use helper function instead of inserted code block at kfree_rcu_monitor().

Fixes: 34c881745549 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Fixes: 5f3c8d620447 ("rcu/tree: Maintain separate array for vmalloc ptrs")
Reported-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Signed-off-by: Ziwei Dai &lt;ziwei.dai@unisoc.com&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Tested-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'stall.2023.01.09a' into HEAD</title>
<updated>2023-02-03T00:40:07Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2023-02-03T00:40:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bba8d3d17dc2678f9647962900aa421a18c25320'/>
<id>urn:sha1:bba8d3d17dc2678f9647962900aa421a18c25320</id>
<content type='text'>
stall.2023.01.09a: RCU CPU stall-warning updates.
</content>
</entry>
<entry>
<title>Merge branches 'doc.2023.01.05a', 'fixes.2023.01.23a', 'kvfree.2023.01.03a', 'srcu.2023.01.03a', 'srcu-always.2023.02.02a', 'tasks.2023.01.03a', 'torture.2023.01.05a' and 'torturescript.2023.01.03a' into HEAD</title>
<updated>2023-02-03T00:33:43Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2023-02-03T00:33:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8e1704b6a8c91f37910f4d9957857f6d4424415c'/>
<id>urn:sha1:8e1704b6a8c91f37910f4d9957857f6d4424415c</id>
<content type='text'>
doc.2023.01.05a: Documentation update.
fixes.2023.01.23a: Miscellaneous fixes.
kvfree.2023.01.03a: kvfree_rcu() updates.
srcu.2023.01.03a: SRCU updates.
srcu-always.2023.02.02a: Finish making SRCU be unconditionally available.
tasks.2023.01.03a: Tasks-RCU updates.
torture.2023.01.05a: Torture-test updates.
torturescript.2023.01.03a: Torture-test scripting updates.
</content>
</entry>
<entry>
<title>rcu: Disable laziness if lazy-tracking says so</title>
<updated>2023-01-24T00:51:29Z</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2023-01-12T00:52:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cf7066b97e27b2319af1ae2ef6889c4a1704312d'/>
<id>urn:sha1:cf7066b97e27b2319af1ae2ef6889c4a1704312d</id>
<content type='text'>
During suspend, we see failures to suspend 1 in 300-500 suspends.
Looking closer, it appears that asynchronous RCU callbacks are being
queued as lazy even though synchronous callbacks are expedited. These
delays appear to not be very welcome by the suspend/resume code as
evidenced by these occasional suspend failures.

This commit modifies call_rcu() to check if rcu_async_should_hurry(),
which will return true if we are in suspend or in-kernel boot.

[ paulmck: Alphabetize local variables. ]

Ignoring the lazy hint makes the 3000 suspend/resume cycles pass
reliably on a 12th gen 12-core Intel CPU, and there is some evidence
that it also slightly speeds up boot performance.

Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power")
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu: Track laziness during boot and suspend</title>
<updated>2023-01-18T04:20:11Z</updated>
<author>
<name>Joel Fernandes (Google)</name>
<email>joel@joelfernandes.org</email>
</author>
<published>2023-01-12T00:52:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6efdda8bec2900ce5166ee4ff4b1844b47b529cd'/>
<id>urn:sha1:6efdda8bec2900ce5166ee4ff4b1844b47b529cd</id>
<content type='text'>
Boot and suspend/resume should not be slowed down in kernels built with
CONFIG_RCU_LAZY=y.  In particular, suspend can sometimes fail in such
kernels.

This commit therefore adds rcu_async_hurry(), rcu_async_relax(), and
rcu_async_should_hurry() functions that track whether or not either
a boot or a suspend/resume operation is in progress.  This will
enable a later commit to refrain from laziness during those times.

Export rcu_async_should_hurry(), rcu_async_hurry(), and rcu_async_relax()
for later use by rcutorture.

[ paulmck: Apply feedback from Steve Rostedt. ]

Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power")
Signed-off-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu: Remove redundant call to rcu_boost_kthread_setaffinity()</title>
<updated>2023-01-12T19:30:11Z</updated>
<author>
<name>Zqiang</name>
<email>qiang1.zhang@intel.com</email>
</author>
<published>2022-12-21T19:15:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ccfe1fef9409ca80ffad6ce822a6d15eaee67c91'/>
<id>urn:sha1:ccfe1fef9409ca80ffad6ce822a6d15eaee67c91</id>
<content type='text'>
The rcu_boost_kthread_setaffinity() function is invoked at
rcutree_online_cpu() and rcutree_offline_cpu() time, early in the online
timeline and late in the offline timeline, respectively.  It is also
invoked from rcutree_dead_cpu(), however, in the absence of userspace
manipulations (for which userspace must take responsibility), this call
is redundant with that from rcutree_offline_cpu().  This redundancy can
be demonstrated by printing out the relevant cpumasks

This commit therefore removes the call to rcu_boost_kthread_setaffinity()
from rcutree_dead_cpu().

Signed-off-by: Zqiang &lt;qiang1.zhang@intel.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu: Add RCU stall diagnosis information</title>
<updated>2023-01-05T20:21:11Z</updated>
<author>
<name>Zhen Lei</name>
<email>thunder.leizhen@huawei.com</email>
</author>
<published>2022-11-19T09:25:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=be42f00b73a0f50710d16eb7cb4efda0cce062dd'/>
<id>urn:sha1:be42f00b73a0f50710d16eb7cb4efda0cce062dd</id>
<content type='text'>
Because RCU CPU stall warnings are driven from the scheduling-clock
interrupt handler, a workload consisting of a very large number of
short-duration hardware interrupts can result in misleading stall-warning
messages.  On systems supporting only a single level of interrupts,
that is, where interrupts handlers cannot be interrupted, this can
produce misleading diagnostics.  The stack traces will show the
innocent-bystander interrupted task, not the interrupts that are
at the very least exacerbating the stall.

This situation can be improved by displaying the number of interrupts
and the CPU time that they have consumed.  Diagnosing other types
of stalls can be eased by also providing the count of softirqs and
the CPU time that they consumed as well as the number of context
switches and the task-level CPU time consumed.

Consider the following output given this change:

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:     0-....: (1250 ticks this GP) &lt;omitted&gt;
rcu:          hardirqs   softirqs   csw/system
rcu:  number:      624         45            0
rcu: cputime:       69          1         2425   ==&gt; 2500(ms)

This output shows that the number of hard and soft interrupts is small,
there are no context switches, and the system takes up a lot of time. This
indicates that the current task is looping with preemption disabled.

The impact on system performance is negligible because snapshot is
recorded only once for all continuous RCU stalls.

This added debugging information is suppressed by default and can be
enabled by building the kernel with CONFIG_RCU_CPU_STALL_CPUTIME=y or
by booting with rcupdate.rcu_cpu_stall_cputime=1.

Signed-off-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Reviewed-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu/kvfree: Split ready for reclaim objects from a batch</title>
<updated>2023-01-04T01:48:41Z</updated>
<author>
<name>Uladzislau Rezki (Sony)</name>
<email>urezki@gmail.com</email>
</author>
<published>2022-12-14T12:06:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2ca836b1da1777c75b7363a7ca2973e8ab11fc21'/>
<id>urn:sha1:2ca836b1da1777c75b7363a7ca2973e8ab11fc21</id>
<content type='text'>
This patch splits the lists of objects so as to avoid sending any
through RCU that have already been queued for more than one grace
period.  These long-term-resident objects are immediately freed.
The remaining short-term-resident objects are queued for later freeing
using queue_rcu_work().

This change avoids delaying workqueue handlers with synchronize_rcu()
invocations.  Yes, workqueue handlers are designed to handle blocking,
but avoiding blocking when unnecessary improves performance during
low-memory situations.

Signed-off-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu/kvfree: Carefully reset number of objects in krcp</title>
<updated>2023-01-04T01:48:41Z</updated>
<author>
<name>Uladzislau Rezki (Sony)</name>
<email>urezki@gmail.com</email>
</author>
<published>2022-12-14T12:06:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4c33464ae85e59cba3f8048a34d571edf229823a'/>
<id>urn:sha1:4c33464ae85e59cba3f8048a34d571edf229823a</id>
<content type='text'>
The schedule_delayed_monitor_work() function relies on the count of
objects queued into any given kfree_rcu_cpu structure.  This count is
used to determine how quickly to schedule passing these objects to RCU.

There are three pipes where pointers can be placed.  When any pipe is
offloaded, the kfree_rcu_cpu structure's -&gt;count counter is set to zero,
which is wrong because the other pipes might still be non-empty.

This commit therefore maintains per-pipe counters, and introduces a
krc_count() helper to access the aggregate value of those counters.

Signed-off-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu/kvfree: Use READ_ONCE() when access to krcp-&gt;head</title>
<updated>2023-01-04T01:48:41Z</updated>
<author>
<name>Uladzislau Rezki (Sony)</name>
<email>urezki@gmail.com</email>
</author>
<published>2022-12-02T13:18:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9627456101ec9bb502daae7276e5141f66a9ddd1'/>
<id>urn:sha1:9627456101ec9bb502daae7276e5141f66a9ddd1</id>
<content type='text'>
The need_offload_krc() function is now lock-free, which gives the
compiler freedom to load old values from plain C-language loads from
the kfree_rcu_cpu struture's -&gt;head pointer.  This commit therefore
applied READ_ONCE() to these loads.

Signed-off-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
</feed>
