<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched/rt.c, branch v5.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-11-08T21:34:14Z</updated>
<entry>
<title>sched: Fix pick_next_task() vs 'change' pattern race</title>
<updated>2019-11-08T21:34:14Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-11-08T10:11:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6e2df0581f569038719cf2bc2b3baa3fcc83cab4'/>
<id>urn:sha1:6e2df0581f569038719cf2bc2b3baa3fcc83cab4</id>
<content type='text'>
Commit 67692435c411 ("sched: Rework pick_next_task() slow-path")
inadvertently introduced a race because it changed a previously
unexplored dependency between dropping the rq-&gt;lock and
sched_class::put_prev_task().

The comments about dropping rq-&gt;lock, in for example
newidle_balance(), only mention the task being current and -&gt;on_cpu
being set. But when we look at the 'change' pattern (in for example
sched_setnuma()):

	queued = task_on_rq_queued(p); /* p-&gt;on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq-&gt;curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

It becomes obvious that if this pattern runs after put_prev_task() has
already been called on @p, things go sideways. This is exactly what
the commit in question makes possible when it does:

	prev-&gt;sched_class-&gt;put_prev_task(rq, prev, rf);
	if (!rq-&gt;nr_running)
		newidle_balance(rq, rf);

The newidle_balance() call will drop rq-&gt;lock after we've called
put_prev_task() and that allows the above 'change' pattern to
interleave and mess up the state.

Furthermore, it turns out we lost the RT-pull when we put the last DL
task.

Fix both problems by extracting the balancing from put_prev_task() and
doing a multi-class balance() pass before put_prev_task().
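
As a rough sketch of the resulting slow path (simplified; the macro and
helper names are assumed from kernel/sched/, details are elided), the
balance pass runs for every relevant class before the previous task is
put:

	for_class_range(class, prev-&gt;sched_class, &amp;idle_sched_class) {
		/* May drop and re-take rq-&gt;lock to pull/push tasks ... */
		if (class-&gt;balance(rq, prev, rf))
			break;
	}

	/* ... but only now is the previous task actually put. */
	put_prev_task(rq, prev);

	for_each_class(class) {
		p = class-&gt;pick_next_task(rq, NULL, NULL);
		if (p)
			return p;
	}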

Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
Reported-by: Quentin Perret &lt;qperret@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Quentin Perret &lt;qperret@google.com&gt;
Tested-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2019-09-17T19:35:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-17T19:35:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7f2444d38f6bbfa12bc15e2533d8f9daa85ca02b'/>
<id>urn:sha1:7f2444d38f6bbfa12bc15e2533d8f9daa85ca02b</id>
<content type='text'>
Pull core timer updates from Thomas Gleixner:
 "Timers and timekeeping updates:

   - A large overhaul of the posix CPU timer code which is a preparation
     for moving the CPU timer expiry out into task work so it can be
     properly accounted on the task/process.

     An update to the bogus permission checks will come later during the
     merge window as feedback was not complete before heading off for
     travel.

   - Switch the timerqueue code to use cached rbtrees and get rid of the
     homebrew caching of the leftmost node.

   - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
     single function.

   - Implement the separation of hrtimers which must expire in hard
     interrupt context even when PREEMPT_RT is enabled, and mark the
     affected timers accordingly.

   - Implement a mechanism for hrtimers and the timer wheel to protect
     RT against priority inversion and live lock issues when a (hr)timer
     which should be canceled is currently executing the callback.
     Instead of infinitely spinning, the task which tries to cancel the
     timer blocks on a per cpu base expiry lock which is held and
     released by the (hr)timer expiry code.

   - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
     resulting in faster access to timekeeping functions.

   - Updates to various clocksource/clockevent drivers and their device
     tree bindings.

   - The usual small improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  posix-cpu-timers: Fix permission check regression
  posix-cpu-timers: Always clear head pointer on dequeue
  hrtimer: Add a missing bracket and hide `migration_base' on !SMP
  posix-cpu-timers: Make expiry_active check actually work correctly
  posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
  tick: Mark sched_timer to expire in hard interrupt context
  hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
  x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
  posix-cpu-timers: Utilize timerqueue for storage
  posix-cpu-timers: Move state tracking to struct posix_cputimers
  posix-cpu-timers: Deduplicate rlimit handling
  posix-cpu-timers: Remove pointless comparisons
  posix-cpu-timers: Get rid of 64bit divisions
  posix-cpu-timers: Consolidate timer expiry further
  posix-cpu-timers: Get rid of zero checks
  rlimit: Rewrite non-sensical RLIMIT_CPU comment
  posix-cpu-timers: Respect INFINITY for hard RTTIME limit
  posix-cpu-timers: Switch thread group sampling to array
  posix-cpu-timers: Restructure expiry array
  posix-cpu-timers: Remove cputime_expires
  ...
</content>
</entry>
<entry>
<title>posix-cpu-timers: Move expiry cache into struct posix_cputimers</title>
<updated>2019-08-28T09:50:35Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-08-21T19:09:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3a245c0f110e2bfcf7f2cd2248a29005c78999e3'/>
<id>urn:sha1:3a245c0f110e2bfcf7f2cd2248a29005c78999e3</id>
<content type='text'>
The expiry cache belongs in the posix_cputimers container, where the
other CPU timer information lives.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Link: https://lkml.kernel.org/r/20190821192921.014444012@linutronix.de

</content>
</entry>
<entry>
<title>sched: Rework pick_next_task() slow-path</title>
<updated>2019-08-08T07:09:31Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-05-29T20:36:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=67692435c411e5c53a1c588ecca2037aebd81f2e'/>
<id>urn:sha1:67692435c411e5c53a1c588ecca2037aebd81f2e</id>
<content type='text'>
Avoid the RETRY_TASK case in the pick_next_task() slow path.

By doing the put_prev_task() early, we get the rt/deadline pull done,
and by testing rq-&gt;nr_running we know if we need newidle_balance().

This then gives a stable state to pick a task from.

Since the fast path handles the fair class only, the other classes will
always see pick_next_task(.prev=NULL, .rf=NULL), which lets us simplify.
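
A condensed sketch of the slow path this results in (simplified; see
kernel/sched/core.c for the real thing):

	/* Get the rt/deadline pull done and stabilize the rq state. */
	prev-&gt;sched_class-&gt;put_prev_task(rq, prev, rf);
	if (!rq-&gt;nr_running)
		newidle_balance(rq, rf);

	/* Pick from a now-stable state; prev and rf are no longer needed. */
	for_each_class(class) {
		p = class-&gt;pick_next_task(rq, NULL, NULL);
		if (p)
			return p;
	}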

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Aaron Lu &lt;aaron.lwe@gmail.com&gt;
Cc: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: mingo@kernel.org
Cc: Phil Auld &lt;pauld@redhat.com&gt;
Cc: Julien Desfossez &lt;jdesfossez@digitalocean.com&gt;
Cc: Nishanth Aravamudan &lt;naravamudan@digitalocean.com&gt;
Link: https://lkml.kernel.org/r/aa34d24b36547139248f32a30138791ac6c02bd6.1559129225.git.vpillai@digitalocean.com
</content>
</entry>
<entry>
<title>sched: Allow put_prev_task() to drop rq-&gt;lock</title>
<updated>2019-08-08T07:09:31Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-05-29T20:36:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5f2a45fc9e89e022233085e6f0f352eb6ff770bb'/>
<id>urn:sha1:5f2a45fc9e89e022233085e6f0f352eb6ff770bb</id>
<content type='text'>
Currently the pick_next_task() loop is convoluted and ugly because of
how it can drop the rq-&gt;lock and needs to restart the picking.

For the RT/Deadline classes, the balancing is done in put_prev_task(),
and it could be done before the picking loop. Make this possible.
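
Concretely this means threading the rq_flags through the callback,
roughly as follows (a sketch of the sched_class hook; the real
prototype lives in kernel/sched/sched.h):

	/* Before: the callback had no way to safely drop rq-&gt;lock. */
	void (*put_prev_task)(struct rq *rq, struct task_struct *p);

	/*
	 * After: rf lets the RT/Deadline balancing unpin, drop and re-take
	 * rq-&gt;lock while the previous task is being put.
	 */
	void (*put_prev_task)(struct rq *rq, struct task_struct *p,
			      struct rq_flags *rf);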

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: Aaron Lu &lt;aaron.lwe@gmail.com&gt;
Cc: mingo@kernel.org
Cc: Phil Auld &lt;pauld@redhat.com&gt;
Cc: Julien Desfossez &lt;jdesfossez@digitalocean.com&gt;
Cc: Nishanth Aravamudan &lt;naravamudan@digitalocean.com&gt;
Link: https://lkml.kernel.org/r/e4519f6850477ab7f3d257062796e6425ee4ba7c.1559129225.git.vpillai@digitalocean.com
</content>
</entry>
<entry>
<title>sched: Add task_struct pointer to sched_class::set_curr_task</title>
<updated>2019-08-08T07:09:31Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-05-29T20:36:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=03b7fad167efca3b7abbbb39733933f9df56e79c'/>
<id>urn:sha1:03b7fad167efca3b7abbbb39733933f9df56e79c</id>
<content type='text'>
In preparation for further separating pick_next_task() and
set_curr_task(), we have to pass the actual task into it; while there,
rename the thing to better pair with put_prev_task().
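
In sched_class terms the change is roughly (sketch; see
kernel/sched/sched.h for the real declarations):

	/* Before: implicitly operated on rq-&gt;curr. */
	void (*set_curr_task)(struct rq *rq);

	/* After: takes the task explicitly and pairs with put_prev_task(). */
	void (*set_next_task)(struct rq *rq, struct task_struct *p);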

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Aaron Lu &lt;aaron.lwe@gmail.com&gt;
Cc: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: mingo@kernel.org
Cc: Phil Auld &lt;pauld@redhat.com&gt;
Cc: Julien Desfossez &lt;jdesfossez@digitalocean.com&gt;
Cc: Nishanth Aravamudan &lt;naravamudan@digitalocean.com&gt;
Link: https://lkml.kernel.org/r/a96d1bcdd716db4a4c5da2fece647a1456c0ed78.1559129225.git.vpillai@digitalocean.com
</content>
</entry>
<entry>
<title>sched/{rt,deadline}: Fix set_next_task vs pick_next_task</title>
<updated>2019-08-08T07:09:30Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2019-05-29T20:36:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f95d4eaee6d0207bff2dc93371133d31227d4cfb'/>
<id>urn:sha1:f95d4eaee6d0207bff2dc93371133d31227d4cfb</id>
<content type='text'>
Because pick_next_task() implies set_curr_task(), and some of the
details haven't mattered too much, some of what _should_ be in
set_curr_task() ended up in pick_next_task(). Correct this.

This prepares the way for a pick_next_task() variant that does not
affect the current state; allowing remote picking.
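
For the RT class the shape of the change is roughly (heavily simplified
sketch of kernel/sched/rt.c; the deadline class is analogous):

	p = _pick_next_task_rt(rq);

	/* All "p is becoming current" bookkeeping now lives in one place. */
	set_next_task_rt(rq, p);

	return p;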

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Aaron Lu &lt;aaron.lwe@gmail.com&gt;
Cc: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Cc: mingo@kernel.org
Cc: Phil Auld &lt;pauld@redhat.com&gt;
Cc: Julien Desfossez &lt;jdesfossez@digitalocean.com&gt;
Cc: Nishanth Aravamudan &lt;naravamudan@digitalocean.com&gt;
Link: https://lkml.kernel.org/r/38c61d5240553e043c27c5e00b9dd0d184dd6081.1559129225.git.vpillai@digitalocean.com
</content>
</entry>
<entry>
<title>sched: Mark hrtimers to expire in hard interrupt context</title>
<updated>2019-08-01T18:51:19Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2019-07-26T18:30:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d5096aa65acd0ef2d18ac8247260ab4481ade399'/>
<id>urn:sha1:d5096aa65acd0ef2d18ac8247260ab4481ade399</id>
<content type='text'>
The scheduler-related hrtimers need to expire in hard interrupt context
even on PREEMPT_RT enabled kernels. Mark them as such.

No functional change.
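
For the RT bandwidth period timer in kernel/sched/rt.c this amounts to
initializing it with one of the new _HARD modes, along the lines of the
following sketch:

	hrtimer_init(&amp;rt_b-&gt;rt_period_timer, CLOCK_MONOTONIC,
		     HRTIMER_MODE_REL_HARD);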

[ tglx: Split out from larger combo patch. Add changelog. ]

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20190726185753.077004842@linutronix.de
</content>
</entry>
<entry>
<title>sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks</title>
<updated>2019-06-24T17:23:48Z</updated>
<author>
<name>Patrick Bellasi</name>
<email>patrick.bellasi@arm.com</email>
</author>
<published>2019-06-21T08:42:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=982d9cdc22c9f6df5ad790caa229ff74fb1d95e7'/>
<id>urn:sha1:982d9cdc22c9f6df5ad790caa229ff74fb1d95e7</id>
<content type='text'>
Each time a frequency update is required via schedutil, a frequency is
selected to (possibly) satisfy the utilization reported by each
scheduling class and irqs. However, when utilization clamping is in use,
the frequency selection should consider userspace utilization clamping
hints.  This makes it possible, for example, to:

 - boost tasks which are directly affecting the user experience
   by running them at least at a minimum "requested" frequency

 - cap low priority tasks not directly affecting the user experience
   by running them only up to a maximum "allowed" frequency

These constraints are meant to support a per-task based tuning of the
frequency selection thus supporting a fine grained definition of
performance boosting vs energy saving strategies in kernel space.

Add support to clamp the utilization of RUNNABLE FAIR and RT tasks
within the boundaries defined by their aggregated utilization clamp
constraints.

Do that by considering the max(min_util, max_util) to give boosted tasks
the performance they need even when they happen to be co-scheduled with
other capped tasks.
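
A minimal sketch of the clamping step (illustrative only; the helper
name and the way the aggregated rq-level clamp values are obtained are
placeholders, not the kernel's actual API):

	/*
	 * Clamp the utilization seen by schedutil into the [min, max]
	 * window aggregated from the RUNNABLE tasks' clamp hints.
	 */
	static inline unsigned long example_clamp_util(unsigned long util,
						       unsigned long min_util,
						       unsigned long max_util)
	{
		return clamp(util, min_util, max_util);
	}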

Signed-off-by: Patrick Bellasi &lt;patrick.bellasi@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alessio Balsini &lt;balsini@android.com&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Joel Fernandes &lt;joelaf@google.com&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Morten Rasmussen &lt;morten.rasmussen@arm.com&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Quentin Perret &lt;quentin.perret@arm.com&gt;
Cc: Rafael J . Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Steve Muckle &lt;smuckle@google.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Todd Kjos &lt;tkjos@google.com&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Cc: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Link: https://lkml.kernel.org/r/20190621084217.8167-10-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Provide a pointer to the valid CPU mask</title>
<updated>2019-06-03T09:49:37Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2019-04-23T14:26:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3bd3706251ee8ab67e69d9340ac2abdca217e733'/>
<id>urn:sha1:3bd3706251ee8ab67e69d9340ac2abdca217e733</id>
<content type='text'>
In commit:

  4b53a3412d66 ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper")

the tsk_nr_cpus_allowed() wrapper was removed. There was not
much difference in !RT but in RT we used this to implement
migrate_disable(). Within a migrate_disable() section the CPU mask is
restricted to a single CPU while the "normal" CPU mask remains untouched.

As an alternative implementation Ingo suggested to use:

	struct task_struct {
		const cpumask_t		*cpus_ptr;
		cpumask_t		cpus_mask;
	};
with
	t-&gt;cpus_ptr = &amp;t-&gt;cpus_mask;

In -RT we then can switch the cpus_ptr to:

	t-&gt;cpus_ptr = &amp;cpumask_of(task_cpu(t));

in a migration disabled region. The rules are simple:

 - Code that 'uses' -&gt;cpus_allowed would use the pointer.
 - Code that 'modifies' -&gt;cpus_allowed would use the direct mask.
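
For illustration, the two rules translate into patterns like these
(sketch only; nothing beyond the field names introduced above is
assumed):

	/* 'uses' -&gt;cpus_allowed: always read through the pointer. */
	allowed = cpumask_test_cpu(cpu, p-&gt;cpus_ptr);

	/* 'modifies' -&gt;cpus_allowed: always write the mask itself. */
	cpumask_copy(&amp;p-&gt;cpus_mask, new_mask);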

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20190423142636.14347-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
