<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v5.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-08-26T17:02:00Z</updated>
<entry>
<title>sched: Fix get_push_task() vs migrate_disable()</title>
<updated>2021-08-26T17:02:00Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2021-08-26T13:37:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e681dcbaa4b284454fecd09617f8b24231448446'/>
<id>urn:sha1:e681dcbaa4b284454fecd09617f8b24231448446</id>
<content type='text'>
push_rt_task() attempts to move the currently running task away if the
next runnable task has migration disabled and therefore is pinned on the
current CPU.

The current task is retrieved via get_push_task() which only checks for
nr_cpus_allowed == 1, but does not check whether the task has migration
disabled and therefore cannot be moved either. The consequence is a
pointless invocation of the migration thread which correctly observes
that the task cannot be moved.

Return NULL if the task has migration disabled and cannot be moved to
another CPU.

Fixes: a7c81556ec4d3 ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
</content>
</entry>
<entry>
<title>sched: Fix Core-wide rq-&gt;lock for uninitialized CPUs</title>
<updated>2021-08-20T10:32:53Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-08-19T11:09:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3c474b3239f12fe0b00d7e82481f36a1f31e79ab'/>
<id>urn:sha1:3c474b3239f12fe0b00d7e82481f36a1f31e79ab</id>
<content type='text'>
Eugene tripped over the case where rq_lock(), as called in a
for_each_possible_cpu() loop came apart because rq-&gt;core hadn't been
setup yet.

This is a somewhat unusual, but valid case.

Rework things such that rq-&gt;core is initialized to point at itself. IOW
initialize each CPU as a single threaded Core. CPU online will then join
the new CPU (thread) to an existing Core where needed.

For completeness sake, have CPU offline fully undo the state so as to
not presume the topology will match the next time it comes online.

Fixes: 9edeaea1bc45 ("sched: Core-wide rq-&gt;lock")
Reported-by: Eugene Syromiatnikov &lt;esyr@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Josh Don &lt;joshdon@google.com&gt;
Tested-by: Eugene Syromiatnikov &lt;esyr@redhat.com&gt;
Link: https://lkml.kernel.org/r/YR473ZGeKqMs6kw+@hirez.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>sched/rt: Fix double enqueue caused by rt_effective_prio</title>
<updated>2021-08-04T13:16:31Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-08-03T10:45:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f558c2b834ec27e75d37b1c860c139e7b7c3a8e4'/>
<id>urn:sha1:f558c2b834ec27e75d37b1c860c139e7b7c3a8e4</id>
<content type='text'>
Double enqueues in rt runqueues (list) have been reported while running
a simple test that spawns a number of threads doing a short sleep/run
pattern while being concurrently setscheduled between rt and fair class.

  WARNING: CPU: 3 PID: 2825 at kernel/sched/rt.c:1294 enqueue_task_rt+0x355/0x360
  CPU: 3 PID: 2825 Comm: setsched__13
  RIP: 0010:enqueue_task_rt+0x355/0x360
  Call Trace:
   __sched_setscheduler+0x581/0x9d0
   _sched_setscheduler+0x63/0xa0
   do_sched_setscheduler+0xa0/0x150
   __x64_sys_sched_setscheduler+0x1a/0x30
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xae

  list_add double add: new=ffff9867cb629b40, prev=ffff9867cb629b40,
		       next=ffff98679fc67ca0.
  kernel BUG at lib/list_debug.c:31!
  invalid opcode: 0000 [#1] PREEMPT_RT SMP PTI
  CPU: 3 PID: 2825 Comm: setsched__13
  RIP: 0010:__list_add_valid+0x41/0x50
  Call Trace:
   enqueue_task_rt+0x291/0x360
   __sched_setscheduler+0x581/0x9d0
   _sched_setscheduler+0x63/0xa0
   do_sched_setscheduler+0xa0/0x150
   __x64_sys_sched_setscheduler+0x1a/0x30
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xae

__sched_setscheduler() uses rt_effective_prio() to handle proper queuing
of priority boosted tasks that are setscheduled while being boosted.
rt_effective_prio() is however called twice per each
__sched_setscheduler() call: first directly by __sched_setscheduler()
before dequeuing the task and then by __setscheduler() to actually do
the priority change. If the priority of the pi_top_task is concurrently
being changed however, it might happen that the two calls return
different results. If, for example, the first call returned the same rt
priority the task was running at and the second one a fair priority, the
task won't be removed by the rt list (on_list still set) and then
enqueued in the fair runqueue. When eventually setscheduled back to rt
it will be seen as enqueued already and the WARNING/BUG be issued.

Fix this by calling rt_effective_prio() only once and then reusing the
return value. While at it refactor code as well for clarity. Concurrent
priority inheritance handling is still safe and will eventually converge
to a new state by following the inheritance chain(s).

Fixes: 0782e63bc6fe ("sched: Handle priority boosted tasks proper in setscheduler()")
[squashed Peterz changes; added changelog]
Reported-by: Mark Simmons &lt;msimmons@redhat.com&gt;
Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20210803104501.38333-1-juri.lelli@redhat.com
</content>
</entry>
<entry>
<title>Merge tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2021-07-11T18:13:57Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-07-11T18:13:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=877029d9216dcc842f50d37571f318cd17a30a2d'/>
<id>urn:sha1:877029d9216dcc842f50d37571f318cd17a30a2d</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Three fixes:

   - Fix load tracking bug/inconsistency

   - Fix a sporadic CFS bandwidth constraints enforcement bug

   - Fix a uclamp utilization tracking bug for newly woken tasks"

* tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/uclamp: Ignore max aggregation if rq is idle
  sched/fair: Fix CFS bandwidth hrtimer expiry type
  sched/fair: Sync load_sum with load_avg after dequeue
</content>
</entry>
<entry>
<title>Merge tag 'pm-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2021-07-07T20:22:59Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-07-07T20:22:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aef4226f914016cc00affa8476ba5164dcca56fd'/>
<id>urn:sha1:aef4226f914016cc00affa8476ba5164dcca56fd</id>
<content type='text'>
Pull more power management updates from Rafael Wysocki:
 "These include cpufreq core simplifications and fixes, cpufreq driver
  updates, cpuidle driver update, a generic power domains (genpd)
  locking fix and a debug-related simplification of the PM core.

  Specifics:

   - Drop the -&gt;stop_cpu() (not really useful) and -&gt;resolve_freq()
     (unused) cpufreq driver callbacks and modify the users of the
     former accordingly (Viresh Kumar, Rafael Wysocki).

   - Add frequency invariance support to the ACPI CPPC cpufreq driver
     again along with the related fixes and cleanups (Viresh Kumar).

   - Update the Meditak, qcom and SCMI ARM cpufreq drivers (Fabien
     Parent, Seiya Wang, Sibi Sankar, Christophe JAILLET).

   - Rename black/white-lists in the DT cpufreq driver (Viresh Kumar).

   - Add generic performance domains support to the dvfs DT bindings
     (Sudeep Holla).

   - Refine locking in the generic power domains (genpd) support code to
     avoid lock dependency issues (Stephen Boyd).

   - Update the MSM and qcom ARM cpuidle drivers (Bartosz Dudziak).

   - Simplify the PM core debug code by using ktime_us_delta() to
     compute time interval lengths (Mark-PK Tsai)"

* tag 'pm-5.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (21 commits)
  PM: domains: Shrink locking area of the gpd_list_lock
  PM: sleep: Use ktime_us_delta() in initcall_debug_report()
  cpufreq: CPPC: Add support for frequency invariance
  arch_topology: Avoid use-after-free for scale_freq_data
  cpufreq: CPPC: Pass structure instance by reference
  cpufreq: CPPC: Fix potential memleak in cppc_cpufreq_cpu_init
  cpufreq: Remove -&gt;resolve_freq()
  cpufreq: Reuse cpufreq_driver_resolve_freq() in __cpufreq_driver_target()
  cpufreq: Remove the -&gt;stop_cpu() driver callback
  cpufreq: powernv: Migrate to -&gt;exit() callback instead of -&gt;stop_cpu()
  cpufreq: CPPC: Migrate to -&gt;exit() callback instead of -&gt;stop_cpu()
  cpufreq: intel_pstate: Combine -&gt;stop_cpu() and -&gt;offline()
  cpuidle: qcom: Add SPM register data for MSM8226
  dt-bindings: arm: msm: Add SAW2 for MSM8226
  dt-bindings: cpufreq: update cpu type and clock name for MT8173 SoC
  clk: mediatek: remove deprecated CLK_INFRA_CA57SEL for MT8173 SoC
  cpufreq: dt: Rename black/white-lists
  cpufreq: scmi: Fix an error message
  cpufreq: mediatek: add support for mt8365
  dt-bindings: dvfs: Add support for generic performance domains
  ...
</content>
</entry>
<entry>
<title>sched/uclamp: Ignore max aggregation if rq is idle</title>
<updated>2021-07-02T13:58:24Z</updated>
<author>
<name>Xuewen Yan</name>
<email>xuewen.yan@unisoc.com</email>
</author>
<published>2021-06-30T14:12:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3e1493f46390618ea78607cb30c58fc19e2a5035'/>
<id>urn:sha1:3e1493f46390618ea78607cb30c58fc19e2a5035</id>
<content type='text'>
When a task wakes up on an idle rq, uclamp_rq_util_with() would max
aggregate with rq value. But since there is no task enqueued yet, the
values are stale based on the last task that was running. When the new
task actually wakes up and enqueued, then the rq uclamp values should
reflect that of the newly woken up task effective uclamp values.

This is a problem particularly for uclamp_max because it default to
1024. If a task p with uclamp_max = 512 wakes up, then max aggregation
would ignore the capping that should apply when this task is enqueued,
which is wrong.

Fix that by ignoring max aggregation if the rq is idle since in that
case the effective uclamp value of the rq will be the ones of the task
that will wake up.

Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()")
Signed-off-by: Xuewen Yan &lt;xuewen.yan@unisoc.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
[qias: Changelog]
Reviewed-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Link: https://lore.kernel.org/r/20210630141204.8197-1-xuewen.yan94@gmail.com
</content>
</entry>
<entry>
<title>sched/fair: Fix CFS bandwidth hrtimer expiry type</title>
<updated>2021-07-02T13:58:24Z</updated>
<author>
<name>Odin Ugedal</name>
<email>odin@uged.al</email>
</author>
<published>2021-06-29T12:14:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=72d0ad7cb5bad265adb2014dbe46c4ccb11afaba'/>
<id>urn:sha1:72d0ad7cb5bad265adb2014dbe46c4ccb11afaba</id>
<content type='text'>
The time remaining until expiry of the refresh_timer can be negative.
Casting the type to an unsigned 64-bit value will cause integer
underflow, making the runtime_refresh_within return false instead of
true. These situations are rare, but they do happen.

This does not cause user-facing issues or errors; other than
possibly unthrottling cfs_rq's using runtime from the previous period(s),
making the CFS bandwidth enforcement less strict in those (special)
situations.

Signed-off-by: Odin Ugedal &lt;odin@uged.al&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Link: https://lore.kernel.org/r/20210629121452.18429-1-odin@uged.al
</content>
</entry>
<entry>
<title>sched/fair: Sync load_sum with load_avg after dequeue</title>
<updated>2021-07-02T13:58:23Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2021-07-01T17:18:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ceb6ba45dc8074d2a1ec1117463dc94a20d4203d'/>
<id>urn:sha1:ceb6ba45dc8074d2a1ec1117463dc94a20d4203d</id>
<content type='text'>
commit 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
reported some inconsitencies between *_avg and *_sum.

commit 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent")
fixed some but one remains when dequeuing load.

sync the cfs's load_sum with its load_avg after dequeuing the load of a
sched_entity.

Fixes: 9e077b52d86a ("sched/pelt: Check that *_avg are null when *_sum are")
Reported-by: Sachin Sant &lt;sachinp@linux.vnet.ibm.com&gt;
Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Odin Ugedal &lt;odin@uged.al&gt;
Tested-by: Sachin Sant &lt;sachinp@linux.vnet.ibm.com&gt;
Link: https://lore.kernel.org/r/20210701171837.32156-1-vincent.guittot@linaro.org
</content>
</entry>
<entry>
<title>Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2021-07-02T00:22:14Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-07-02T00:22:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3dbdb38e286903ec220aaf1fb29a8d94297da246'/>
<id>urn:sha1:3dbdb38e286903ec220aaf1fb29a8d94297da246</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - cgroup.kill is added which implements atomic killing of the whole
   subtree.

   Down the line, this should be able to replace the multiple userland
   implementations of "keep killing till empty".

 - PSI can now be turned off at boot time to avoid overhead for
   configurations which don't care about PSI.

* 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: make per-cgroup pressure stall tracking configurable
  cgroup: Fix kernel-doc
  cgroup: inline cgroup_task_freeze()
  tests/cgroup: test cgroup.kill
  tests/cgroup: move cg_wait_for(), cg_prepare_for_wait()
  tests/cgroup: use cgroup.kill in cg_killall()
  docs/cgroup: add entry for cgroup.kill
  cgroup: introduce cgroup.kill
</content>
</entry>
<entry>
<title>cpufreq: CPPC: Add support for frequency invariance</title>
<updated>2021-07-01T02:02:14Z</updated>
<author>
<name>Viresh Kumar</name>
<email>viresh.kumar@linaro.org</email>
</author>
<published>2020-06-23T10:19:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1eb5dde674f57b1a1918dab33f09e35cdd64eb07'/>
<id>urn:sha1:1eb5dde674f57b1a1918dab33f09e35cdd64eb07</id>
<content type='text'>
The Frequency Invariance Engine (FIE) is providing a frequency scaling
correction factor that helps achieve more accurate load-tracking.

Normally, this scaling factor can be obtained directly with the help of
the cpufreq drivers as they know the exact frequency the hardware is
running at. But that isn't the case for CPPC cpufreq driver.

Another way of obtaining that is using the arch specific counter
support, which is already present in kernel, but that hardware is
optional for platforms.

This patch updates the CPPC driver to register itself with the topology
core to provide its own implementation (cppc_scale_freq_tick()) of
topology_scale_freq_tick() which gets called by the scheduler on every
tick. Note that the arch specific counters have higher priority than
CPPC counters, if available, though the CPPC driver doesn't need to have
any special handling for that.

On an invocation of cppc_scale_freq_tick(), we schedule an irq work
(since we reach here from hard-irq context), which then schedules a
normal work item and cppc_scale_freq_workfn() updates the per_cpu
arch_freq_scale variable based on the counter updates since the last
tick.

To allow platforms to disable this CPPC counter-based frequency
invariance support, this is all done under CONFIG_ACPI_CPPC_CPUFREQ_FIE,
which is enabled by default.

This also exports sched_setattr_nocheck() as the CPPC driver can be
built as a module.

Cc: linux-acpi@vger.kernel.org
Tested-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Ionela Voinescu &lt;ionela.voinescu@arm.com&gt;
Tested-by: Qian Cai &lt;quic_qiancai@quicinc.com&gt;
Signed-off-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
</content>
</entry>
</feed>
