<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v5.15</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.15</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.15'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-10-19T15:46:11Z</updated>
<entry>
<title>sched/scs: Reset the shadow stack when idle_task_exit</title>
<updated>2021-10-19T15:46:11Z</updated>
<author>
<name>Woody Lin</name>
<email>woodylin@google.com</email>
</author>
<published>2021-10-12T08:35:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63acd42c0d4942f74710b11c38602fb14dea7320'/>
<id>urn:sha1:63acd42c0d4942f74710b11c38602fb14dea7320</id>
<content type='text'>
Commit f1a0a376ca0c ("sched/core: Initialize the idle task with
preemption disabled") removed the init_idle() call from
idle_thread_get(). This was the sole call-path on hotplug that resets
the Shadow Call Stack (scs) Stack Pointer (sp).

Not resetting the scs-sp leads to scs overflow after enough hotplug
cycles. Therefore add an explicit scs_task_reset() to the hotplug code
to make sure the scs-sp does get reset on hotplug.

Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled")
Signed-off-by: Woody Lin &lt;woodylin@google.com&gt;
[peterz: Changelog]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Link: https://lore.kernel.org/r/20211012083521.973587-1-woodylin@google.com
</content>
</entry>
<entry>
<title>sched/fair: Null terminate buffer when updating tunable_scaling</title>
<updated>2021-10-01T11:57:57Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2021-09-27T11:46:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=703066188f63d66cc6b9d678e5b5ef1213c5938e'/>
<id>urn:sha1:703066188f63d66cc6b9d678e5b5ef1213c5938e</id>
<content type='text'>
This patch null-terminates the temporary buffer in sched_scaling_write()
so kstrtouint() does not return failure and checks the value is valid.

Before:
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1
  $ echo 0 &gt; /sys/kernel/debug/sched/tunable_scaling
  -bash: echo: write error: Invalid argument
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1

After:
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1
  $ echo 0 &gt; /sys/kernel/debug/sched/tunable_scaling
  $ cat /sys/kernel/debug/sched/tunable_scaling
  0
  $ echo 3 &gt; /sys/kernel/debug/sched/tunable_scaling
  -bash: echo: write error: Invalid argument

Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lore.kernel.org/r/20210927114635.GH3959@techsingularity.net
</content>
</entry>
<entry>
<title>sched/fair: Add ancestors of unthrottled undecayed cfs_rq</title>
<updated>2021-10-01T11:57:57Z</updated>
<author>
<name>Michal Koutný</name>
<email>mkoutny@suse.com</email>
</author>
<published>2021-09-17T15:30:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2630cde26711dab0d0b56a8be1616475be646d13'/>
<id>urn:sha1:2630cde26711dab0d0b56a8be1616475be646d13</id>
<content type='text'>
Since commit a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to
list on unthrottle") we add cfs_rqs with no runnable tasks but not fully
decayed into the load (leaf) list. We may ignore adding some ancestors
and therefore breaking tmp_alone_branch invariant. This broke LTP test
cfs_bandwidth01 and it was partially fixed in commit fdaba61ef8a2
("sched/fair: Ensure that the CFS parent is added after unthrottling").

I noticed the named test still fails even with the fix (but with low
probability, 1 in ~1000 executions of the test). The reason is when
bailing out of unthrottle_cfs_rq early, we may miss adding ancestors of
the unthrottled cfs_rq, thus, not joining tmp_alone_branch properly.

Fix this by adding ancestors if we notice the unthrottled cfs_rq was
added to the load list.

Fixes: a7b359fc6a37 ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")
Signed-off-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Odin Ugedal &lt;odin@uged.al&gt;
Link: https://lore.kernel.org/r/20210917153037.11176-1-mkoutny@suse.com
</content>
</entry>
<entry>
<title>Merge tag 'sched_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2021-09-12T18:37:41Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-09-12T18:37:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=56c244382fdb793986097df8e29f9f9320bc2c60'/>
<id>urn:sha1:56c244382fdb793986097df8e29f9f9320bc2c60</id>
<content type='text'>
Pull scheduler fixes from Borislav Petkov:

 - Make sure the idle timer expires in hardirq context, on PREEMPT_RT

 - Make sure the run-queue balance callback is invoked only on the
   outgoing CPU

* tag 'sched_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Prevent balance_push() on remote runqueues
  sched/idle: Make the idle timer expire in hard interrupt context
</content>
</entry>
<entry>
<title>sched: Prevent balance_push() on remote runqueues</title>
<updated>2021-09-09T09:27:23Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-28T13:55:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=868ad33bfa3bf39960982682ad3a0f8ebda1656e'/>
<id>urn:sha1:868ad33bfa3bf39960982682ad3a0f8ebda1656e</id>
<content type='text'>
sched_setscheduler() and rt_mutex_setprio() invoke the run-queue balance
callback after changing priorities or the scheduling class of a task. The
run-queue for which the callback is invoked can be local or remote.

That's not a problem for the regular rq::push_work which is serialized with
a busy flag in the run-queue struct, but for the balance_push() work which
is only valid to be invoked on the outgoing CPU that's wrong. It not only
triggers the debug warning, but also leaves the per CPU variable push_work
unprotected, which can result in double enqueues on the stop machine list.

Remove the warning and validate that the function is invoked on the
outgoing CPU.

Fixes: ae7927023243 ("sched: Optimize finish_lock_switch()")
Reported-by: Sebastian Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87zgt1hdw7.ffs@tglx
</content>
</entry>
<entry>
<title>sched/idle: Make the idle timer expire in hard interrupt context</title>
<updated>2021-09-09T08:36:16Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2021-09-06T11:30:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9848417926353daa59d2b05eb26e185063dbac6e'/>
<id>urn:sha1:9848417926353daa59d2b05eb26e185063dbac6e</id>
<content type='text'>
The intel powerclamp driver will setup a per-CPU worker with RT
priority. The worker will then invoke play_idle() in which it remains in
the idle poll loop until it is stopped by the timer it started earlier.

That timer needs to expire in hard interrupt context on PREEMPT_RT.
Otherwise the timer will expire in ksoftirqd as a SOFT timer but that task
won't be scheduled on the CPU because its priority is lower than the
priority of the worker which is in the idle loop.

Always expire the idle timer in hard interrupt context.

Reported-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20210906113034.jgfxrjdvxnjqgtmc@linutronix.de

</content>
</entry>
<entry>
<title>Merge tag 'pm-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2021-08-31T20:21:58Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-08-31T20:21:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5cbba60596b1f32f637190ca9ed5b1acdadb852c'/>
<id>urn:sha1:5cbba60596b1f32f637190ca9ed5b1acdadb852c</id>
<content type='text'>
Pull power management updates from Rafael Wysocki:
 "These address some PCI device power management issues, add new
  hardware support to the RAPL power capping driver, add HWP guaranteed
  performance change notification support to the intel_pstate driver,
  replace deprecated CPU-hotplug functions in a few places, update CPU
  PM notifiers to use raw spinlocks, update the PM domains framework
  (new DT property support, Kconfig fix), do a couple of cleanups in
  code related to system sleep, and improve the energy model and the
  schedutil cpufreq governor.

  Specifics:

   - Address 3 PCI device power management issues (Rafael Wysocki).

   - Add Power Limit4 support for Alder Lake to the Intel RAPL power
     capping driver (Sumeet Pawnikar).

   - Add HWP guaranteed performance change notification support to the
     intel_pstate driver (Srinivas Pandruvada).

   - Replace deprecated CPU-hotplug functions in code related to power
     management (Sebastian Andrzej Siewior).

   - Update CPU PM notifiers to use raw spinlocks (Valentin Schneider).

   - Add support for 'required-opps' DT property to the generic power
     domains (genpd) framework and use this property for I2C on ARM64
     sc7180 (Rajendra Nayak).

   - Fix Kconfig issue related to genpd (Geert Uytterhoeven).

   - Increase energy calculation precision in the Energy Model (Lukasz
     Luba).

   - Fix kobject deletion in the exit code of the schedutil cpufreq
     governor (Kevin Hao).

   - Unmark some functions as kernel-doc in the PM core to avoid
     false-positive documentation build warnings (Randy Dunlap).

   - Check RTC features instead of ops in suspend_test Alexandre
     Belloni)"

* tag 'pm-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: domains: Fix domain attach for CONFIG_PM_OPP=n
  powercap: Add Power Limit4 support for Alder Lake SoC
  cpufreq: intel_pstate: Process HWP Guaranteed change notification
  thermal: intel: Allow processing of HWP interrupt
  notifier: Remove atomic_notifier_call_chain_robust()
  PM: cpu: Make notifier chain use a raw_spinlock_t
  PM: sleep: unmark 'state' functions as kernel-doc
  arm64: dts: sc7180: Add required-opps for i2c
  PM: domains: Add support for 'required-opps' to set default perf state
  opp: Don't print an error if required-opps is missing
  cpufreq: schedutil: Use kobject release() method to free sugov_tunables
  PM: EM: Increase energy calculation precision
  PM: sleep: check RTC features instead of ops in suspend_test
  PM: sleep: s2idle: Replace deprecated CPU-hotplug functions
  cpufreq: Replace deprecated CPU-hotplug functions
  powercap: intel_rapl: Replace deprecated CPU-hotplug functions
  PCI: PM: Enable PME if it can be signaled from D3cold
  PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently
  PCI: Use pci_update_current_state() in pci_enable_device_flags()
</content>
</entry>
<entry>
<title>Merge tag 'locking-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2021-08-30T21:26:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-08-30T21:26:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e5e726f7bb9f711102edea7e5bd511835640e3b4'/>
<id>urn:sha1:e5e726f7bb9f711102edea7e5bd511835640e3b4</id>
<content type='text'>
Pull locking and atomics updates from Thomas Gleixner:
 "The regular pile:

   - A few improvements to the mutex code

   - Documentation updates for atomics to clarify the difference between
     cmpxchg() and try_cmpxchg() and to explain the forward progress
     expectations.

   - Simplification of the atomics fallback generator

   - The addition of arch_atomic_long*() variants and generic arch_*()
     bitops based on them.

   - Add the missing might_sleep() invocations to the down*() operations
     of semaphores.

  The PREEMPT_RT locking core:

   - Scheduler updates to support the state preserving mechanism for
     'sleeping' spin- and rwlocks on RT.

     This mechanism is carefully preserving the state of the task when
     blocking on a 'sleeping' spin- or rwlock and takes regular wake-ups
     targeted at the same task into account. The preserved or updated
     (via a regular wakeup) state is restored when the lock has been
     acquired.

   - Restructuring of the rtmutex code so it can be utilized and
     extended for the RT specific lock variants.

   - Restructuring of the ww_mutex code to allow sharing of the ww_mutex
     specific functionality for rtmutex based ww_mutexes.

   - Header file disentangling to allow substitution of the regular lock
     implementations with the PREEMPT_RT variants without creating an
     unmaintainable #ifdef mess.

   - Shared base code for the PREEMPT_RT specific rw_semaphore and
     rwlock implementations.

     Contrary to the regular rw_semaphores and rwlocks the PREEMPT_RT
     implementation is writer unfair because it is infeasible to do
     priority inheritance on multiple readers. Experience over the years
     has shown that real-time workloads are not the typical workloads
     which are sensitive to writer starvation.

     The alternative solution would be to allow only a single reader
     which has been tried and discarded as it is a major bottleneck
     especially for mmap_sem. Aside of that many of the writer
     starvation critical usage sites have been converted to a writer
     side mutex/spinlock and RCU read side protections in the past
     decade so that the issue is less prominent than it used to be.

   - The actual rtmutex based lock substitutions for PREEMPT_RT enabled
     kernels which affect mutex, ww_mutex, rw_semaphore, spinlock_t and
     rwlock_t. The spin/rw_lock*() functions disable migration across
     the critical section to preserve the existing semantics vs per-CPU
     variables.

   - Rework of the futex REQUEUE_PI mechanism to handle the case of
     early wake-ups which interleave with a re-queue operation to
     prevent the situation that a task would be blocked on both the
     rtmutex associated to the outer futex and the rtmutex based hash
     bucket spinlock.

     While this situation cannot happen on !RT enabled kernels the
     changes make the underlying concurrency problems easier to
     understand in general. As a result the difference between !RT and
     RT kernels is reduced to the handling of waiting for the critical
     section. !RT kernels simply spin-wait as before and RT kernels
     utilize rcu_wait().

   - The substitution of local_lock for PREEMPT_RT with a spinlock which
     protects the critical section while staying preemptible. The CPU
     locality is established by disabling migration.

  The underlying concepts of this code have been in use in PREEMPT_RT for
  way more than a decade. The code has been refactored several times over
  the years and this final incarnation has been optimized once again to be
  as non-intrusive as possible, i.e. the RT specific parts are mostly
  isolated.

  It has been extensively tested in the 5.14-rt patch series and it has
  been verified that !RT kernels are not affected by these changes"

* tag 'locking-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (92 commits)
  locking/rtmutex: Return success on deadlock for ww_mutex waiters
  locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes
  locking/rtmutex: Dequeue waiter on ww_mutex deadlock
  locking/rtmutex: Dont dereference waiter lockless
  locking/semaphore: Add might_sleep() to down_*() family
  locking/ww_mutex: Initialize waiter.ww_ctx properly
  static_call: Update API documentation
  locking/local_lock: Add PREEMPT_RT support
  locking/spinlock/rt: Prepare for RT local_lock
  locking/rtmutex: Add adaptive spinwait mechanism
  locking/rtmutex: Implement equal priority lock stealing
  preempt: Adjust PREEMPT_LOCK_OFFSET for RT
  locking/rtmutex: Prevent lockdep false positive with PI futexes
  futex: Prevent requeue_pi() lock nesting issue on RT
  futex: Simplify handle_early_requeue_pi_wakeup()
  futex: Reorder sanity checks in futex_requeue()
  futex: Clarify comment in futex_requeue()
  futex: Restructure futex_requeue()
  futex: Correct the number of requeued waiters for PI
  futex: Remove bogus condition for requeue PI
  ...
</content>
</entry>
<entry>
<title>Merge tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2021-08-30T20:42:10Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-08-30T20:42:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5d3c0db4598c5de511824649df2aa976259cf10a'/>
<id>urn:sha1:5d3c0db4598c5de511824649df2aa976259cf10a</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:

 - The biggest change in this cycle is scheduler support for asymmetric
   scheduling affinity, to support the execution of legacy 32-bit tasks
   on AArch32 systems that also have 64-bit-only CPUs.

   Architectures can fill in this functionality by defining their own
   task_cpu_possible_mask(p). When this is done, the scheduler will make
   sure the task will only be scheduled on CPUs that support it.

   (The actual arm64 specific changes are not part of this tree.)

   For other architectures there will be no change in functionality.

 - Add cgroup SCHED_IDLE support

 - Increase node-distance flexibility &amp; delay determining it until a CPU
   is brought online. (This enables platforms where node distance isn't
   final until the CPU is only.)

 - Deadline scheduler enhancements &amp; fixes

 - Misc fixes &amp; cleanups.

* tag 'sched-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  eventfd: Make signal recursion protection a task bit
  sched/fair: Mark tg_is_idle() an inline in the !CONFIG_FAIR_GROUP_SCHED case
  sched: Introduce dl_task_check_affinity() to check proposed affinity
  sched: Allow task CPU affinity to be restricted on asymmetric systems
  sched: Split the guts of sched_setaffinity() into a helper function
  sched: Introduce task_struct::user_cpus_ptr to track requested affinity
  sched: Reject CPU affinity changes based on task_cpu_possible_mask()
  cpuset: Cleanup cpuset_cpus_allowed_fallback() use in select_fallback_rq()
  cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()
  cpuset: Don't use the cpu_possible_mask as a last resort for cgroup v1
  sched: Introduce task_cpu_possible_mask() to limit fallback rq selection
  sched: Cgroup SCHED_IDLE support
  sched/topology: Skip updating masks for non-online nodes
  sched: Replace deprecated CPU-hotplug functions.
  sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS
  sched: Fix UCLAMP_FLAG_IDLE setting
  sched/deadline: Fix missing clock update in migrate_task_rq_dl()
  sched/fair: Avoid a second scan of target in select_idle_cpu
  sched/fair: Use prev instead of new target as recent_used_cpu
  sched: Don't report SCHED_FLAG_SUGOV in sched_getattr()
  ...
</content>
</entry>
<entry>
<title>Merge branch 'core-rcu.2021.08.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu</title>
<updated>2021-08-30T19:48:01Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-08-30T19:48:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4ca4256453effb885c1688633676682529593f82'/>
<id>urn:sha1:4ca4256453effb885c1688633676682529593f82</id>
<content type='text'>
Pull RCU updates from Paul McKenney:
 "RCU changes for this cycle were:

   - Documentation updates

   - Miscellaneous fixes

   - Offloaded-callbacks updates

   - Updates to the nolibc library

   - Tasks-RCU updates

   - In-kernel torture-test updates

   - Torture-test scripting, perhaps most notably the pinning of
     torture-test guest OSes so as to force differences in memory
     latency. For example, in a two-socket system, a four-CPU guest OS
     will have one pair of its CPUs pinned to threads in a single core
     on one socket and the other pair pinned to threads in a single core
     on the other socket. This approach proved able to force race
     conditions that earlier testing missed. Some of these race
     conditions are still being tracked down"

* 'core-rcu.2021.08.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (61 commits)
  torture: Replace deprecated CPU-hotplug functions.
  rcu: Replace deprecated CPU-hotplug functions
  rcu: Print human-readable message for schedule() in RCU reader
  rcu: Explain why rcu_all_qs() is a stub in preemptible TREE RCU
  rcu: Use per_cpu_ptr to get the pointer of per_cpu variable
  rcu: Remove useless "ret" update in rcu_gp_fqs_loop()
  rcu: Mark accesses in tree_stall.h
  rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack
  rcu: Mark lockless -&gt;qsmask read in rcu_check_boost_fail()
  srcutiny: Mark read-side data races
  rcu: Start timing stall repetitions after warning complete
  rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()
  rcu/tree: Handle VM stoppage in stall detection
  rculist: Unify documentation about missing list_empty_rcu()
  rcu: Mark accesses to -&gt;rcu_read_lock_nesting
  rcu: Weaken -&gt;dynticks accesses and updates
  rcu: Remove special bit at the bottom of the -&gt;dynticks counter
  rcu: Fix stall-warning deadlock due to non-release of rcu_node -&gt;lock
  rcu: Fix to include first blocked task in stall warning
  torture: Make kvm-test-1-run-qemu.sh check for reboot loops
  ...
</content>
</entry>
</feed>
