<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched/rt.c, branch v5.6</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.6</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.6'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-12-25T09:42:10Z</updated>
<entry>
<title>sched/rt: Make RT capacity-aware</title>
<updated>2019-12-25T09:42:10Z</updated>
<author>
<name>Qais Yousef</name>
<email>qais.yousef@arm.com</email>
</author>
<published>2019-10-09T10:46:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=804d402fb6f6487b825aae8cf42fda6426c62867'/>
<id>urn:sha1:804d402fb6f6487b825aae8cf42fda6426c62867</id>
<content type='text'>
Capacity Awareness refers to the fact that on heterogeneous systems
(like Arm big.LITTLE), the capacity of the CPUs is not uniform, hence
when placing tasks we need to be aware of this difference of CPU
capacities.

In such scenarios we want to ensure that the selected CPU has enough
capacity to meet the requirement of the running task. Enough capacity
means here that capacity_orig_of(cpu) &gt;= task.requirement.

The definition of task.requirement is dependent on the scheduling class.

For CFS, utilization is used to select a CPU whose capacity is &gt;= the
cfs_task.util.

	capacity_orig_of(cpu) &gt;= cfs_task.util

DL isn't capacity aware at the moment but can make use of the bandwidth
reservation to implement that in a manner similar to how CFS uses
utilization. The following patchset implements that:

https://lore.kernel.org/lkml/20190506044836.2914-1-luca.abeni@santannapisa.it/

	capacity_orig_of(cpu)/SCHED_CAPACITY_SCALE &gt;= dl_runtime/dl_deadline

For RT we don't have a per task utilization signal and we lack any
information in general about what performance requirement the RT task
needs. But with the introduction of uclamp, RT tasks can now control
that by setting uclamp_min to guarantee a minimum performance point.

At the moment the uclamp values are only used for frequency selection; but
on heterogeneous systems this is not enough and we need to ensure that the
capacity of the CPU is &gt;= uclamp_min, which is what is implemented here.

	capacity_orig_of(cpu) &gt;= rt_task.uclamp_min

Note that by default uclamp.min is 1024, which means that RT tasks will
always be biased towards the big CPUs, which makes for better, more
predictable behavior in the default case.

It must be stressed that the bias acts as a hint rather than a definite
placement strategy. For example, if all big cores are busy executing
other RT tasks we can't guarantee that a new RT task will be placed
there.

On non-heterogeneous systems the original behavior of RT should be
retained. The same applies if uclamp is not selected in the config.
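
As a rough illustration of the fitness rule above, the following Python
sketch models the check outside the kernel. It is not the kernel code: the
helper names, the fallback rule and the capacity values (446 for LITTLE
CPUs, 1024 for big CPUs) are assumptions made up for this example.

```python
# Illustrative model only; the names below are hypothetical, not kernel code.
SCHED_CAPACITY_SCALE = 1024  # full-scale CPU capacity, as in the kernel


def rt_task_fits_capacity(cpu_capacity, uclamp_min):
    """Return True if capacity_orig_of(cpu) >= rt_task.uclamp_min."""
    return cpu_capacity >= uclamp_min


def fitting_cpus(capacities, uclamp_min=SCHED_CAPACITY_SCALE):
    """Prefer CPUs that fit the task; otherwise keep every CPU."""
    fitting = [cpu for cpu, cap in sorted(capacities.items())
               if rt_task_fits_capacity(cap, uclamp_min)]
    # The bias is a hint: if nothing fits, every CPU stays a candidate
    # (an assumed fallback, chosen for this sketch).
    return fitting if fitting else sorted(capacities)


# Assumed topology: four LITTLE CPUs (446) and two big CPUs (1024).
CAPACITIES = {0: 446, 1: 446, 2: 446, 3: 446, 4: 1024, 5: 1024}
```

With the default uclamp_min of 1024, fitting_cpus(CAPACITIES) keeps only
the big CPUs 4 and 5. Lowering uclamp_min to 300 makes every CPU a
candidate again.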

[ mingo: Minor edits to comments. ]

Signed-off-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lkml.kernel.org/r/20191009104611.15363-1-qais.yousef@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Further clarify sched_class::set_next_task()</title>
<updated>2019-11-11T07:35:21Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-11-08T13:16:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a0e813f26ebcb25c0b5e504498fbd796cca1a4ba'/>
<id>urn:sha1:a0e813f26ebcb25c0b5e504498fbd796cca1a4ba</id>
<content type='text'>
It turns out there really is something special to the first
set_next_task() invocation. Specifically, the 'change' pattern really
should not cause balance callbacks.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: ktkhai@virtuozzo.com
Cc: mgorman@suse.de
Cc: qais.yousef@arm.com
Cc: qperret@google.com
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Fixes: f95d4eaee6d0 ("sched/{rt,deadline}: Fix set_next_task vs pick_next_task")
Link: https://lkml.kernel.org/r/20191108131909.775434698@infradead.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Simplify sched_class::pick_next_task()</title>
<updated>2019-11-11T07:35:20Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-11-08T13:15:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=98c2f700edb413e4baa4a0368c5861d96211a775'/>
<id>urn:sha1:98c2f700edb413e4baa4a0368c5861d96211a775</id>
<content type='text'>
Now that the indirect class call never uses the last two arguments of
pick_next_task(), remove them.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: ktkhai@virtuozzo.com
Cc: mgorman@suse.de
Cc: qais.yousef@arm.com
Cc: qperret@google.com
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20191108131909.660595546@infradead.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix pick_next_task() vs 'change' pattern race</title>
<updated>2019-11-08T21:34:14Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-11-08T10:11:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6e2df0581f569038719cf2bc2b3baa3fcc83cab4'/>
<id>urn:sha1:6e2df0581f569038719cf2bc2b3baa3fcc83cab4</id>
<content type='text'>
Commit 67692435c411 ("sched: Rework pick_next_task() slow-path")
inadvertently introduced a race because it changed a previously
unexplored dependency between dropping the rq-&gt;lock and
sched_class::put_prev_task().

The comments about dropping rq-&gt;lock, in for example
newidle_balance(), only mention the task being current and -&gt;on_cpu
being set. But when we look at the 'change' pattern (in for example
sched_setnuma()):

	queued = task_on_rq_queued(p); /* p-&gt;on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq-&gt;curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

It becomes obvious that if we do this after put_prev_task() has
already been called on @p, things go sideways. This is exactly what
the commit in question allows to happen when it does:

	prev-&gt;sched_class-&gt;put_prev_task(rq, prev, rf);
	if (!rq-&gt;nr_running)
		newidle_balance(rq, rf);

The newidle_balance() call will drop rq-&gt;lock after we've called
put_prev_task() and that allows the above 'change' pattern to
interleave and mess up the state.

Furthermore, it turns out we lost the RT-pull when we put the last DL
task.

Fix both problems by extracting the balancing from put_prev_task() and
doing a multi-class balance() pass before put_prev_task().
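
The ordering problem can be sketched with a toy Python model. This is a
simplification under stated assumptions, not kernel code: ToyRq,
change_pattern(), pick_buggy() and pick_fixed() are hypothetical names, the
lock drop is modelled as a callback, and "messing up the state" is modelled
as an exception raised on a second put_prev_task().

```python
# Toy model; class and function names are hypothetical, not kernel code.
class ToyRq:
    def __init__(self, curr):
        self.curr = curr
        self.curr_is_set = True  # set_next_task() has run for rq.curr

    def put_prev_task(self):
        # A second put without an intervening set is the broken state.
        if not self.curr_is_set:
            raise RuntimeError("put_prev_task() called twice on rq.curr")
        self.curr_is_set = False

    def set_next_task(self):
        self.curr_is_set = True


def change_pattern(rq, task):
    """The 'change' pattern, reduced to its 'running' half."""
    running = rq.curr is task
    if running:
        rq.put_prev_task()
    # ... change task properties ...
    if running:
        rq.set_next_task()


def pick_buggy(rq, lock_dropped):
    rq.put_prev_task()
    lock_dropped(rq)  # newidle_balance() drops the lock after the put


def pick_fixed(rq, lock_dropped):
    lock_dropped(rq)  # balance pass runs first and may drop the lock
    rq.put_prev_task()
```

If the callback runs change_pattern() on rq.curr, pick_buggy() raises
because the task has already been put, while pick_fixed() completes with
the task put exactly once.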

Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
Reported-by: Quentin Perret &lt;qperret@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Quentin Perret &lt;qperret@google.com&gt;
Tested-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2019-09-17T19:35:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-17T19:35:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7f2444d38f6bbfa12bc15e2533d8f9daa85ca02b'/>
<id>urn:sha1:7f2444d38f6bbfa12bc15e2533d8f9daa85ca02b</id>
<content type='text'>
Pull core timer updates from Thomas Gleixner:
 "Timers and timekeeping updates:

   - A large overhaul of the posix CPU timer code which is a preparation
     for moving the CPU timer expiry out into task work so it can be
     properly accounted on the task/process.

     An update to the bogus permission checks will come later during the
     merge window as feedback was not complete before heading off for
     travel.

   - Switch the timerqueue code to use cached rbtrees and get rid of the
     homebrewed caching of the leftmost node.

   - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
     single function

   - Implement the separation of hrtimers to be forced to expire in hard
     interrupt context even when PREEMPT_RT is enabled and mark the
     affected timers accordingly.

   - Implement a mechanism for hrtimers and the timer wheel to protect
     RT against priority inversion and live lock issues when a (hr)timer
     which should be canceled is currently executing the callback.
     Instead of infinitely spinning, the task which tries to cancel the
     timer blocks on a per cpu base expiry lock which is held and
     released by the (hr)timer expiry code.

   - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
     resulting in faster access to timekeeping functions.

   - Updates to various clocksource/clockevent drivers and their device
     tree bindings.

   - The usual small improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  posix-cpu-timers: Fix permission check regression
  posix-cpu-timers: Always clear head pointer on dequeue
  hrtimer: Add a missing bracket and hide `migration_base' on !SMP
  posix-cpu-timers: Make expiry_active check actually work correctly
  posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
  tick: Mark sched_timer to expire in hard interrupt context
  hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
  x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
  posix-cpu-timers: Utilize timerqueue for storage
  posix-cpu-timers: Move state tracking to struct posix_cputimers
  posix-cpu-timers: Deduplicate rlimit handling
  posix-cpu-timers: Remove pointless comparisons
  posix-cpu-timers: Get rid of 64bit divisions
  posix-cpu-timers: Consolidate timer expiry further
  posix-cpu-timers: Get rid of zero checks
  rlimit: Rewrite non-sensical RLIMIT_CPU comment
  posix-cpu-timers: Respect INFINITY for hard RTTIME limit
  posix-cpu-timers: Switch thread group sampling to array
  posix-cpu-timers: Restructure expiry array
  posix-cpu-timers: Remove cputime_expires
  ...
</content>
</entry>
<entry>
<title>posix-cpu-timers: Move expiry cache into struct posix_cputimers</title>
<updated>2019-08-28T09:50:35Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-08-21T19:09:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3a245c0f110e2bfcf7f2cd2248a29005c78999e3'/>
<id>urn:sha1:3a245c0f110e2bfcf7f2cd2248a29005c78999e3</id>
<content type='text'>
The expiry cache belongs in the posix_cputimers container, where the other
CPU timer information lives.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190821192921.014444012@linutronix.de

</content>
</entry>
<entry>
<title>sched: Rework pick_next_task() slow-path</title>
<updated>2019-08-08T07:09:31Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-05-29T20:36:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=67692435c411e5c53a1c588ecca2037aebd81f2e'/>
<id>urn:sha1:67692435c411e5c53a1c588ecca2037aebd81f2e</id>
<content type='text'>
Avoid the RETRY_TASK case in the pick_next_task() slow path.

By doing the put_prev_task() early, we get the rt/deadline pull done,
and by testing rq-&gt;nr_running we know if we need newidle_balance().

This then gives a stable state to pick a task from.

Since the fast-path is fair only, the other classes will always
have pick_next_task(.prev=NULL, .rf=NULL) and we can simplify.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Aaron Lu &lt;aaron.lwe@gmail.com&gt;
Cc: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: mingo@kernel.org
Cc: Phil Auld &lt;pauld@redhat.com&gt;
Cc: Julien Desfossez &lt;jdesfossez@digitalocean.com&gt;
Cc: Nishanth Aravamudan &lt;naravamudan@digitalocean.com&gt;
Link: https://lkml.kernel.org/r/aa34d24b36547139248f32a30138791ac6c02bd6.1559129225.git.vpillai@digitalocean.com
</content>
</entry>
<entry>
<title>sched: Allow put_prev_task() to drop rq-&gt;lock</title>
<updated>2019-08-08T07:09:31Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-05-29T20:36:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5f2a45fc9e89e022233085e6f0f352eb6ff770bb'/>
<id>urn:sha1:5f2a45fc9e89e022233085e6f0f352eb6ff770bb</id>
<content type='text'>
Currently the pick_next_task() loop is convoluted and ugly because of
how it can drop the rq-&gt;lock and needs to restart the picking.

For the RT/Deadline classes, it is put_prev_task() where we do
balancing, and we could do this before the picking loop. Make this
possible.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: Aaron Lu &lt;aaron.lwe@gmail.com&gt;
Cc: mingo@kernel.org
Cc: Phil Auld &lt;pauld@redhat.com&gt;
Cc: Julien Desfossez &lt;jdesfossez@digitalocean.com&gt;
Cc: Nishanth Aravamudan &lt;naravamudan@digitalocean.com&gt;
Link: https://lkml.kernel.org/r/e4519f6850477ab7f3d257062796e6425ee4ba7c.1559129225.git.vpillai@digitalocean.com
</content>
</entry>
<entry>
<title>sched: Add task_struct pointer to sched_class::set_curr_task</title>
<updated>2019-08-08T07:09:31Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-05-29T20:36:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=03b7fad167efca3b7abbbb39733933f9df56e79c'/>
<id>urn:sha1:03b7fad167efca3b7abbbb39733933f9df56e79c</id>
<content type='text'>
In preparation for further separating pick_next_task() and
set_curr_task() we have to pass the actual task into it. While there,
rename the thing to better pair with put_prev_task().

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Aaron Lu &lt;aaron.lwe@gmail.com&gt;
Cc: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: mingo@kernel.org
Cc: Phil Auld &lt;pauld@redhat.com&gt;
Cc: Julien Desfossez &lt;jdesfossez@digitalocean.com&gt;
Cc: Nishanth Aravamudan &lt;naravamudan@digitalocean.com&gt;
Link: https://lkml.kernel.org/r/a96d1bcdd716db4a4c5da2fece647a1456c0ed78.1559129225.git.vpillai@digitalocean.com
</content>
</entry>
<entry>
<title>sched/{rt,deadline}: Fix set_next_task vs pick_next_task</title>
<updated>2019-08-08T07:09:30Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-05-29T20:36:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f95d4eaee6d0207bff2dc93371133d31227d4cfb'/>
<id>urn:sha1:f95d4eaee6d0207bff2dc93371133d31227d4cfb</id>
<content type='text'>
Because pick_next_task() implies set_curr_task() and some of the
details haven't mattered too much, some of what _should_ be in
set_curr_task() ended up in pick_next_task(). Correct this.

This prepares the way for a pick_next_task() variant that does not
affect the current state; allowing remote picking.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Aaron Lu &lt;aaron.lwe@gmail.com&gt;
Cc: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: mingo@kernel.org
Cc: Phil Auld &lt;pauld@redhat.com&gt;
Cc: Julien Desfossez &lt;jdesfossez@digitalocean.com&gt;
Cc: Nishanth Aravamudan &lt;naravamudan@digitalocean.com&gt;
Link: https://lkml.kernel.org/r/38c61d5240553e043c27c5e00b9dd0d184dd6081.1559129225.git.vpillai@digitalocean.com
</content>
</entry>
</feed>
