<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched/core.c, branch v5.10</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.10</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.10'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-11-17T12:15:28Z</updated>
<entry>
<title>sched/deadline: Fix priority inheritance with multiple scheduling classes</title>
<updated>2020-11-17T12:15:28Z</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2020-11-17T06:14:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2279f540ea7d05f22d2f0c4224319330228586bc'/>
<id>urn:sha1:2279f540ea7d05f22d2f0c4224319330228586bc</id>
<content type='text'>
Glenn reported that "an application [he developed produces] a BUG in
deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested
PTHREAD_PRIO_INHERIT mutexes.  I believe the bug is triggered when a CFS
task that was boosted by a SCHED_DEADLINE task boosts another CFS task
(nested priority inheritance).

 ------------[ cut here ]------------
 kernel BUG at kernel/sched/deadline.c:1462!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ...
 Hardware name: ...
 RIP: 0010:enqueue_task_dl+0x335/0x910
 Code: ...
 RSP: 0018:ffffc9000c2bbc68 EFLAGS: 00010002
 RAX: 0000000000000009 RBX: ffff888c0af94c00 RCX: ffffffff81e12500
 RDX: 000000000000002e RSI: ffff888c0af94c00 RDI: ffff888c10b22600
 RBP: ffffc9000c2bbd08 R08: 0000000000000009 R09: 0000000000000078
 R10: ffffffff81e12440 R11: ffffffff81e1236c R12: ffff888bc8932600
 R13: ffff888c0af94eb8 R14: ffff888c10b22600 R15: ffff888bc8932600
 FS:  00007fa58ac55700(0000) GS:ffff888c10b00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa58b523230 CR3: 0000000bf44ab003 CR4: 00000000007606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  ? intel_pstate_update_util_hwp+0x13/0x170
  rt_mutex_setprio+0x1cc/0x4b0
  task_blocks_on_rt_mutex+0x225/0x260
  rt_spin_lock_slowlock_locked+0xab/0x2d0
  rt_spin_lock_slowlock+0x50/0x80
  hrtimer_grab_expiry_lock+0x20/0x30
  hrtimer_cancel+0x13/0x30
  do_nanosleep+0xa0/0x150
  hrtimer_nanosleep+0xe1/0x230
  ? __hrtimer_init_sleeper+0x60/0x60
  __x64_sys_nanosleep+0x8d/0xa0
  do_syscall_64+0x4a/0x100
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fa58b52330d
 ...
 ---[ end trace 0000000000000002 ]—

He also provided a simple reproducer creating the situation below:

 So the execution order of locking steps are the following
 (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
 are mutexes that are enabled * with priority inheritance.)

 Time moves forward as this timeline goes down:

 N1              N2               D1
 |               |                |
 |               |                |
 Lock(M1)        |                |
 |               |                |
 |             Lock(M2)           |
 |               |                |
 |               |              Lock(M2)
 |               |                |
 |             Lock(M1)           |
 |             (!!bug triggered!) |

Daniel reported a similar situation as well, by just letting ksoftirqd
run with DEADLINE (and eventually block on a mutex).

Problem is that boosted entities (Priority Inheritance) use static
DEADLINE parameters of the top priority waiter. However, there might be
cases where top waiter could be a non-DEADLINE entity that is currently
boosted by a DEADLINE entity from a different lock chain (i.e., nested
priority chains involving entities of non-DEADLINE classes). In this
case, top waiter static DEADLINE parameters could be null (initialized
to 0 at fork()) and replenish_dl_entity() would hit a BUG().

Fix this by keeping track of the original donor and using its parameters
when a task is boosted.

Reported-by: Glenn Elliott &lt;glenn@aurora.tech&gt;
Reported-by: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Link: https://lkml.kernel.org/r/20201117061432.517340-1-juri.lelli@redhat.com
</content>
</entry>
<entry>
<title>sched: Fix rq-&gt;nr_iowait ordering</title>
<updated>2020-11-17T12:15:28Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-09-24T11:50:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec618b84f6e15281cc3660664d34cd0dd2f2579e'/>
<id>urn:sha1:ec618b84f6e15281cc3660664d34cd0dd2f2579e</id>
<content type='text'>
  schedule()				ttwu()
    deactivate_task();			  if (p-&gt;on_rq &amp;&amp; ...) // false
					    atomic_dec(&amp;task_rq(p)-&gt;nr_iowait);
    if (prev-&gt;in_iowait)
      atomic_inc(&amp;rq-&gt;nr_iowait);

Allows nr_iowait to be decremented before it gets incremented,
resulting in more dodgy IO-wait numbers than usual.

Note that because we can now do ttwu_queue_wakelist() before
p-&gt;on_cpu==0, we lose the natural ordering and have to further delay
the decrement.

Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p-&gt;on_cpu")
Reported-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Link: https://lkml.kernel.org/r/20201117093829.GD3121429@hirez.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>sched/features: Fix !CONFIG_JUMP_LABEL case</title>
<updated>2020-10-14T17:55:46Z</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2020-10-13T05:31:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a73f863af4ce9730795eab7097fb2102e6854365'/>
<id>urn:sha1:a73f863af4ce9730795eab7097fb2102e6854365</id>
<content type='text'>
Commit:

  765cc3a4b224e ("sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds")

made sched features static for !CONFIG_SCHED_DEBUG configurations, but
overlooked the CONFIG_SCHED_DEBUG=y and !CONFIG_JUMP_LABEL cases.

For the latter echoing changes to /sys/kernel/debug/sched_features has
the nasty effect of effectively changing what sched_features reports,
but without actually changing the scheduler behaviour (since different
translation units get different sysctl_sched_features).

Fix CONFIG_SCHED_DEBUG=y and !CONFIG_JUMP_LABEL configurations by properly
restructuring ifdefs.

Fixes: 765cc3a4b224e ("sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds")
Co-developed-by: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Signed-off-by: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Patrick Bellasi &lt;patrick.bellasi@matbug.net&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Link: https://lore.kernel.org/r/20201013053114.160628-1-juri.lelli@redhat.com
</content>
</entry>
<entry>
<title>Merge tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-10-12T19:56:01Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-10-12T19:56:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=edaa5ddf3833669a25654d42c0fb653dfdd906df'/>
<id>urn:sha1:edaa5ddf3833669a25654d42c0fb653dfdd906df</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:

 - reorganize &amp; clean up the SD* flags definitions and add a bunch of
   sanity checks. These new checks caught quite a few bugs or at least
   inconsistencies, resulting in another set of patches.

 - rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ

 - add a new tracepoint to improve CPU capacity tracking

 - improve overloaded SMP system load-balancing behavior

 - tweak SMT balancing

 - energy-aware scheduling updates

 - NUMA balancing improvements

 - deadline scheduler fixes and improvements

 - CPU isolation fixes

 - misc cleanups, simplifications and smaller optimizations

* tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  sched/deadline: Unthrottle PI boosted threads while enqueuing
  sched/debug: Add new tracepoint to track cpu_capacity
  sched/fair: Tweak pick_next_entity()
  rseq/selftests: Test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  rseq/selftests,x86_64: Add rseq_offset_deref_addv()
  rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  sched/fair: Use dst group while checking imbalance for NUMA balancer
  sched/fair: Reduce busy load balance interval
  sched/fair: Minimize concurrent LBs between domain level
  sched/fair: Reduce minimal imbalance threshold
  sched/fair: Relax constraint on task's load during load balance
  sched/fair: Remove the force parameter of update_tg_load_avg()
  sched/fair: Fix wrong cpu selecting from isolated domain
  sched: Remove unused inline function uclamp_bucket_base_value()
  sched/rt: Disable RT_RUNTIME_SHARE by default
  sched/deadline: Fix stale throttling on de-/boosted tasks
  sched/numa: Use runnable_avg to classify node
  sched/topology: Move sd_flag_debug out of #ifdef CONFIG_SYSCTL
  MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
  sched/topology: Move SD_DEGENERATE_GROUPS_MASK out of linux/sched/topology.h
  ...
</content>
</entry>
<entry>
<title>sched/debug: Add new tracepoint to track cpu_capacity</title>
<updated>2020-10-03T14:30:52Z</updated>
<author>
<name>Vincent Donnefort</name>
<email>vincent.donnefort@arm.com</email>
</author>
<published>2020-08-28T09:00:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=51cf18c90ca1b51d1cb4af3064e85fcf8610b5d2'/>
<id>urn:sha1:51cf18c90ca1b51d1cb4af3064e85fcf8610b5d2</id>
<content type='text'>
rq-&gt;cpu_capacity is a key element in several scheduler parts, such as EAS
task placement and load balancing. Tracking this value enables testing
and/or debugging by a toolkit.

Signed-off-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/1598605249-72651-1-git-send-email-vincent.donnefort@arm.com
</content>
</entry>
<entry>
<title>sched: Remove unused inline function uclamp_bucket_base_value()</title>
<updated>2020-09-25T12:23:25Z</updated>
<author>
<name>YueHaibing</name>
<email>yuehaibing@huawei.com</email>
</author>
<published>2020-09-22T13:24:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=51bd5121c4eb25b911f6bc1ab4de5fe865fe0dcb'/>
<id>urn:sha1:51bd5121c4eb25b911f6bc1ab4de5fe865fe0dcb</id>
<content type='text'>
There is no caller in tree, so can remove it.

Signed-off-by: YueHaibing &lt;yuehaibing@huawei.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Link: https://lkml.kernel.org/r/20200922132410.48440-1-yuehaibing@huawei.com
</content>
</entry>
<entry>
<title>sched: Cache task_struct::flags in sched_submit_work()</title>
<updated>2020-08-26T10:41:58Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2020-08-19T20:00:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c1cecf884ad748f63f9139d5a18ee265ee2f70fb'/>
<id>urn:sha1:c1cecf884ad748f63f9139d5a18ee265ee2f70fb</id>
<content type='text'>
sched_submit_work() is considered to be a hot path. The preempt_disable()
instruction is a compiler barrier and forces the compiler to load
task_struct::flags for the second comparison.
By using a local variable, the compiler can load the value once and keep it in
a register for the second comparison.

Verified on x86-64 with gcc-10.

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20200819200025.lqvmyefqnbok5i4f@linutronix.de
</content>
</entry>
<entry>
<title>treewide: Use fallthrough pseudo-keyword</title>
<updated>2020-08-23T22:36:59Z</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavoars@kernel.org</email>
</author>
<published>2020-08-23T22:36:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df561f6688fef775baa341a0f5d960becd248b11'/>
<id>urn:sha1:df561f6688fef775baa341a0f5d960becd248b11</id>
<content type='text'>
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-08-15T17:36:40Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-08-15T17:36:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1195d58f003bf84d37fc5c30017600e09ba567b2'/>
<id>urn:sha1:1195d58f003bf84d37fc5c30017600e09ba567b2</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Two fixes: fix a new tracepoint's output value, and fix the formatting
  of show-state syslog printouts"

* tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Fix the alignment of the show-state debug output
  sched: Fix use of count for nr_running tracepoint
</content>
</entry>
<entry>
<title>sched/debug: Fix the alignment of the show-state debug output</title>
<updated>2020-08-14T10:36:18Z</updated>
<author>
<name>Libing Zhou</name>
<email>libing.zhou@nokia-sbell.com</email>
</author>
<published>2020-08-14T03:02:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cc172ff301d8079e941a6eb31758951a6d764084'/>
<id>urn:sha1:cc172ff301d8079e941a6eb31758951a6d764084</id>
<content type='text'>
Current sysrq(t) output task fields name are not aligned with
actual task fields value, e.g.:

	kernel: sysrq: Show State
	kernel:  task                        PC stack   pid father
	kernel: systemd         S12456     1      0 0x00000000
	kernel: Call Trace:
	kernel: ? __schedule+0x240/0x740

To make it more readable, print fields name together with task fields
value in the same line, with fixed width:

	kernel: sysrq: Show State
	kernel: task:systemd         state:S stack:12920 pid:    1 ppid:     0 flags:0x00000000
	kernel: Call Trace:
	kernel: __schedule+0x282/0x620

Signed-off-by: Libing Zhou &lt;libing.zhou@nokia-sbell.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20200814030236.37835-1-libing.zhou@nokia-sbell.com
</content>
</entry>
</feed>
