<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/rcu, branch v5.19</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.19</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.19'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2022-07-22T17:01:20Z</updated>
<entry>
<title>Merge tag 'rcu-urgent.2022.07.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu</title>
<updated>2022-07-22T17:01:20Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-07-22T17:01:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4ba1329cbb9456c937bff1ed8ad4ca91ab75eab6'/>
<id>urn:sha1:4ba1329cbb9456c937bff1ed8ad4ca91ab75eab6</id>
<content type='text'>
Pull RCU fix from Paul McKenney:
 "This contains a pair of commits that fix 282d8998e997 ("srcu: Prevent
  expedited GPs and blocking readers from consuming CPU"), which was
  itself a fix to an SRCU expedited grace-period problem that could
  prevent kernel live patching (KLP) from completing.

  That SRCU fix for KLP introduced large (as in minutes) boot-time
  delays to embedded Linux kernels running on qemu/KVM. These delays
  were due to the emulation of certain MMIO operations controlling
  memory layout, which were emulated with one expedited grace period per
  access. Common configurations required thousands of boot-time MMIO
  accesses, and thus thousands of boot-time expedited SRCU grace
  periods.

  In these configurations, the occasional sleeps that allowed KLP to
  proceed caused excessive boot delays. These commits preserve enough
  sleeps to permit KLP to proceed, but few enough that the virtual
  embedded kernels still boot reasonably quickly.

  This represents a regression introduced in the v5.19 merge window, and
  the bug is causing significant inconvenience"

* tag 'rcu-urgent.2022.07.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  srcu: Make expedited RCU grace periods block even less frequently
  srcu: Block less aggressively for expedited grace periods
</content>
</entry>
<entry>
<title>srcu: Make expedited RCU grace periods block even less frequently</title>
<updated>2022-07-19T18:39:59Z</updated>
<author>
<name>Neeraj Upadhyay</name>
<email>quic_neeraju@quicinc.com</email>
</author>
<published>2022-07-01T03:15:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4f2bfd9494a072d58203600de6bedd72680e612a'/>
<id>urn:sha1:4f2bfd9494a072d58203600de6bedd72680e612a</id>
<content type='text'>
The purpose of commit 282d8998e997 ("srcu: Prevent expedited GPs
and blocking readers from consuming CPU") was to prevent a long
series of never-blocking expedited SRCU grace periods from blocking
kernel-live-patching (KLP) progress.  Although it was successful, it also
resulted in excessive boot times on certain embedded workloads running
under qemu with the "-bios QEMU_EFI.fd" command line.  Here "excessive"
means increasing the boot time up into the three-to-four minute range.
This increase in boot time was due to the more than 6000 back-to-back
invocations of synchronize_rcu_expedited() within the KVM host OS, which
in turn resulted from qemu's emulation of a long series of MMIO accesses.

Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace
periods") did not significantly help this particular use case.

Zhangfei Gao and Shameerali Kolothum Thodi did experiments varying the
value of SRCU_MAX_NODELAY_PHASE with HZ=250 and with various values
of non-sleeping per phase counts on a system with preemption enabled,
and observed the following boot times:

+──────────────────────────+────────────────+
| SRCU_MAX_NODELAY_PHASE   | Boot time (s)  |
+──────────────────────────+────────────────+
| 100                      | 30.053         |
| 150                      | 25.151         |
| 200                      | 20.704         |
| 250                      | 15.748         |
| 500                      | 11.401         |
| 1000                     | 11.443         |
| 10000                    | 11.258         |
| 1000000                  | 11.154         |
+──────────────────────────+────────────────+

Analysis on the experiment results show additional improvements with
CPU-bound delays approaching one jiffy in duration. This improvement was
also seen when number of per-phase iterations were scaled to one jiffy.

This commit therefore scales per-grace-period phase number of non-sleeping
polls so that non-sleeping polls extend for about one jiffy. In addition,
the delay-calculation call to srcu_get_delay() in srcu_gp_end() is
replaced with a simple check for an expedited grace period.  This change
schedules callback invocation immediately after expedited grace periods
complete, which results in greatly improved boot times.  Testing done
by Marc and Zhangfei confirms that this change recovers most of the
performance degradation in boottime; for CONFIG_HZ_250 configuration,
specifically, boot times improve from 3m50s to 41s on Marc's setup;
and from 2m40s to ~9.7s on Zhangfei's setup.

In addition to the changes to default per phase delays, this
change adds 3 new kernel parameters - srcutree.srcu_max_nodelay,
srcutree.srcu_max_nodelay_phase, and srcutree.srcu_retry_check_delay.
This allows users to configure the srcu grace period scanning delays in
order to more quickly react to additional use cases.

Fixes: 640a7d37c3f4 ("srcu: Block less aggressively for expedited grace periods")
Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
Reported-by: Zhangfei Gao &lt;zhangfei.gao@linaro.org&gt;
Reported-by: yueluck &lt;yueluck@163.com&gt;
Signed-off-by: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Tested-by: Marc Zyngier &lt;maz@kernel.org&gt;
Tested-by: Zhangfei Gao &lt;zhangfei.gao@linaro.org&gt;
Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>srcu: Block less aggressively for expedited grace periods</title>
<updated>2022-07-19T18:39:59Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2022-06-12T22:00:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8f870e6eb8c0c3f9869bf3fcf9db39f86cfcea49'/>
<id>urn:sha1:8f870e6eb8c0c3f9869bf3fcf9db39f86cfcea49</id>
<content type='text'>
Commit 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
from consuming CPU") fixed a problem where a long-running expedited SRCU
grace period could block kernel live patching.  It did so by giving up
on expediting once a given SRCU expedited grace period grew too old.

Unfortunately, this added excessive delays to boots of virtual embedded
systems specifying "-bios QEMU_EFI.fd" to qemu.  This commit therefore
makes the transition away from expediting less aggressive, increasing
the per-grace-period phase number of non-sleeping polls of readers from
one to three and increasing the required grace-period age from one jiffy
(actually from zero to one jiffies) to two jiffies (actually from one
to two jiffies).

Fixes: 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers from consuming CPU")
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Reported-by: Zhangfei Gao &lt;zhangfei.gao@linaro.org&gt;
Reported-by: chenxiang (M)" &lt;chenxiang66@hisilicon.com&gt;
Cc: Shameerali Kolothum Thodi  &lt;shameerali.kolothum.thodi@huawei.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Reviewed-by: Neeraj Upadhyay &lt;quic_neeraju@quicinc.com&gt;
Link: https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
</content>
</entry>
<entry>
<title>Merge branch 'rework/kthreads' into for-linus</title>
<updated>2022-06-23T17:11:28Z</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2022-06-23T17:11:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=51889d225ce2ce118d8413eb4282045add81a689'/>
<id>urn:sha1:51889d225ce2ce118d8413eb4282045add81a689</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Revert "printk: add functions to prefer direct printing"</title>
<updated>2022-06-23T16:41:40Z</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2022-06-23T14:51:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=07a22b61946f0b80065b0ddcc703b715f84355f5'/>
<id>urn:sha1:07a22b61946f0b80065b0ddcc703b715f84355f5</id>
<content type='text'>
This reverts commit 2bb2b7b57f81255c13f4395ea911d6bdc70c9fe2.

The testing of 5.19 release candidates revealed missing synchronization
between early and regular console functionality.

It would be possible to start the console kthreads later as a workaround.
But it is clear that console lock serialized console drivers between
each other. It opens a big area of possible problems that were not
considered by people involved in the development and review.

printk() is crucial for debugging kernel issues and console output is
very important part of it. The number of consoles is huge and a proper
review would take some time. As a result it need to be reverted for 5.19.

Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Link: https://lore.kernel.org/r/20220623145157.21938-7-pmladek@suse.com
</content>
</entry>
<entry>
<title>Merge tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux</title>
<updated>2022-05-26T23:57:20Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-26T23:57:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=44d35720c9a660074b77ab9de37abf2c01c5b44f'/>
<id>urn:sha1:44d35720c9a660074b77ab9de37abf2c01c5b44f</id>
<content type='text'>
Pull sysctl updates from Luis Chamberlain:
 "For two kernel releases now kernel/sysctl.c has been being cleaned up
  slowly, since the tables were grossly long, sprinkled with tons of
  #ifdefs and all this caused merge conflicts with one susbystem or
  another.

  This tree was put together to help try to avoid conflicts with these
  cleanups going on different trees at time. So nothing exciting on this
  pull request, just cleanups.

  Thanks a lot to the Uniontech and Huawei folks for doing some of this
  nasty work"

* tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (28 commits)
  sched: Fix build warning without CONFIG_SYSCTL
  reboot: Fix build warning without CONFIG_SYSCTL
  kernel/kexec_core: move kexec_core sysctls into its own file
  sysctl: minor cleanup in new_dir()
  ftrace: fix building with SYSCTL=y but DYNAMIC_FTRACE=n
  fs/proc: Introduce list_for_each_table_entry for proc sysctl
  mm: fix unused variable kernel warning when SYSCTL=n
  latencytop: move sysctl to its own file
  ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y
  ftrace: Fix build warning
  ftrace: move sysctl_ftrace_enabled to ftrace.c
  kernel/do_mount_initrd: move real_root_dev sysctls to its own file
  kernel/delayacct: move delayacct sysctls to its own file
  kernel/acct: move acct sysctls to its own file
  kernel/panic: move panic sysctls to its own file
  kernel/lockdep: move lockdep sysctls to its own file
  mm: move page-writeback sysctls to their own file
  mm: move oom_kill sysctls to their own file
  kernel/reboot: move reboot sysctls to its own file
  sched: Move energy_aware sysctls to topology.c
  ...
</content>
</entry>
<entry>
<title>Merge tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux</title>
<updated>2022-05-25T17:32:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-25T17:32:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=537e62c865dcb9b91d07ed83f8615b71fa0b51bb'/>
<id>urn:sha1:537e62c865dcb9b91d07ed83f8615b71fa0b51bb</id>
<content type='text'>
Pull printk updates from Petr Mladek:

 - Offload writing printk() messages on consoles to per-console
   kthreads.

   It prevents soft-lockups when an extensive amount of messages is
   printed. It was observed, for example, during boot of large systems
   with a lot of peripherals like disks or network interfaces.

   It prevents live-lockups that were observed, for example, when
   messages about allocation failures were reported and a CPU handled
   consoles instead of reclaiming the memory. It was hard to solve even
   with rate limiting because it would need to take into account the
   amount of messages and the speed of all consoles.

   It is a must to have for real time. Otherwise, any printk() might
   break latency guarantees.

   The per-console kthreads allow to handle each console on its own
   speed. Slow consoles do not longer slow down faster ones. And
   printk() does not longer unpredictably slows down various code paths.

   There are situations when the kthreads are either not available or
   not reliable, for example, early boot, suspend, or panic. In these
   situations, printk() uses the legacy mode and tries to handle
   consoles immediately.

 - Add documentation for the printk index.

* tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk, tracing: fix console tracepoint
  printk: remove @console_locked
  printk: extend console_lock for per-console locking
  printk: add kthread console printers
  printk: add functions to prefer direct printing
  printk: add pr_flush()
  printk: move buffer definitions into console_emit_next_record() caller
  printk: refactor and rework printing logic
  printk: add con_printk() macro for console details
  printk: call boot_delay_msec() in printk_delay()
  printk: get caller_id/timestamp after migration disable
  printk: wake waiters for safe and NMI contexts
  printk: wake up all waiters
  printk: add missing memory barrier to wake_up_klogd()
  printk: cpu sync always disable interrupts
  printk: rename cpulock functions
  printk/index: Printk index feature documentation
  MAINTAINERS: Add printk indexing maintainers on mention of printk_index
</content>
</entry>
<entry>
<title>Merge branch 'exp.2022.05.11a' into HEAD</title>
<updated>2022-05-11T18:49:35Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@kernel.org</email>
</author>
<published>2022-05-11T18:49:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ce13389053a347aa9f8ffbfda2238352536e15c9'/>
<id>urn:sha1:ce13389053a347aa9f8ffbfda2238352536e15c9</id>
<content type='text'>
exp.2022.05.11a: Expedited-grace-period latency-reduction updates.
</content>
</entry>
<entry>
<title>rcu: Move expedited grace period (GP) work to RT kthread_worker</title>
<updated>2022-05-11T18:47:10Z</updated>
<author>
<name>Kalesh Singh</name>
<email>kaleshsingh@google.com</email>
</author>
<published>2022-04-09T00:35:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9621fbee44df940e2e1b94b0676460a538dffefa'/>
<id>urn:sha1:9621fbee44df940e2e1b94b0676460a538dffefa</id>
<content type='text'>
Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period
latency because its workqueues run at SCHED_OTHER, and thus can be
delayed by normal processes.  This commit avoids these delays by moving
the expedited GP work items to a real-time-priority kthread_worker.

This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by
default on PREEMPT_RT=y kernels which disable expedited grace periods
after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1.

The results were evaluated on arm64 Android devices (6GB ram) running
5.10 kernel, and capturing trace data in critical user-level code.

The table below shows the resulting order-of-magnitude improvements
in synchronize_rcu_expedited() latency:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Count                    |          725  |            688  |         |
------------------------------------------------------------------------
| Min Duration       (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Q1                 (ns)  |       39,428  |         38,971  |  -1.16% |
------------------------------------------------------------------------
| Q2 - Median        (ns)  |       98,225  |         69,743  | -29.00% |
------------------------------------------------------------------------
| Q3                 (ns)  |      342,122  |        126,638  | -62.98% |
------------------------------------------------------------------------
| Max Duration       (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Avg Duration       (ns)  |    2,746,353  |        151,242  | -94.49% |
------------------------------------------------------------------------
| Standard Deviation (ns)  |   19,327,765  |        294,408  |         |
------------------------------------------------------------------------

The below table show the range of maximums/minimums for
synchronize_rcu_expedited() latency from all experiments:

------------------------------------------------------------------------
|                          |   workqueues  |  kthread_worker |  Diff   |
------------------------------------------------------------------------
| Total No. of Experiments |           25  |             23  |         |
------------------------------------------------------------------------
| Largest  Maximum   (ns)  |  372,766,967  |      2,329,671  | -99.38% |
------------------------------------------------------------------------
| Smallest Maximum   (ns)  |       38,819  |         86,954  | 124.00% |
------------------------------------------------------------------------
| Range of Maximums  (ns)  |  372,728,148  |      2,242,717  |         |
------------------------------------------------------------------------
| Largest  Minimum   (ns)  |       88,623  |         27,588  | -68.87% |
------------------------------------------------------------------------
| Smallest Minimum   (ns)  |          326  |            447  |  37.12% |
------------------------------------------------------------------------
| Range of Minimums  (ns)  |       88,297  |         27,141  |         |
------------------------------------------------------------------------

Cc: "Paul E. McKenney" &lt;paulmck@kernel.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Tim Murray &lt;timmurray@google.com&gt;
Reported-by: Wei Wang &lt;wvw@google.com&gt;
Tested-by: Kyle Lin &lt;kylelin@google.com&gt;
Tested-by: Chunwei Lu &lt;chunweilu@google.com&gt;
Tested-by: Lulu Wang &lt;luluw@google.com&gt;
Signed-off-by: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT</title>
<updated>2022-05-11T18:38:50Z</updated>
<author>
<name>Uladzislau Rezki</name>
<email>uladzislau.rezki@sony.com</email>
</author>
<published>2022-02-16T13:52:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=28b3ae426598e722cf5d5ab9cc7038791b955a56'/>
<id>urn:sha1:28b3ae426598e722cf5d5ab9cc7038791b955a56</id>
<content type='text'>
Currently both expedited and regular grace period stall warnings use
a single timeout value that with units of seconds.  However, recent
Android use cases problem require a sub-100-millisecond expedited RCU CPU
stall warning.  Given that expedited RCU grace periods normally complete
in far less than a single millisecond, especially for small systems,
this is not unreasonable.

Therefore introduce the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel
configuration that defaults to 20 msec on Android and remains the same
as that of the non-expedited stall warnings otherwise.  It also can be
changed in run-time via: /sys/.../parameters/rcu_exp_cpu_stall_timeout.

[ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ]

Signed-off-by: Uladzislau Rezki &lt;uladzislau.rezki@sony.com&gt;
Signed-off-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
</content>
</entry>
</feed>
