<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched/syscalls.c, branch v6.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2024-10-29T12:57:51Z</updated>
<entry>
<title>sched: Pass correct scheduling policy to __setscheduler_class</title>
<updated>2024-10-29T12:57:51Z</updated>
<author>
<name>Aboorva Devarajan</name>
<email>aboorvad@linux.ibm.com</email>
</author>
<published>2024-10-25T18:50:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5db91545ef8150c45a526675ef99e8998b648a41'/>
<id>urn:sha1:5db91545ef8150c45a526675ef99e8998b648a41</id>
<content type='text'>
Commit 98442f0ccd82 ("sched: Fix delayed_dequeue vs
switched_from_fair()") overlooked that __setscheduler_prio(), now
__setscheduler_class() relies on p-&gt;policy for task_should_scx(), and
moved the call before __setscheduler_params() updates it, causing it
to be using the old p-&gt;policy value.

Resolve this by changing task_should_scx() to take the policy itself
instead of a task pointer, such that __sched_setscheduler() can pass
in the updated policy.

Fixes: 98442f0ccd82 ("sched: Fix delayed_dequeue vs switched_from_fair()")
Signed-off-by: Aboorva Devarajan &lt;aboorvad@linux.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix delayed_dequeue vs switched_from_fair()</title>
<updated>2024-10-11T08:49:32Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2024-10-10T09:54:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=98442f0ccd828ac42e89281a815e9e7a97533822'/>
<id>urn:sha1:98442f0ccd828ac42e89281a815e9e7a97533822</id>
<content type='text'>
Commit 2e0199df252a ("sched/fair: Prepare exit/cleanup paths for delayed_dequeue")
and its follow up fixes try to deal with a rather unfortunate
situation where is task is enqueued in a new class, even though it
shouldn't have been. Mostly because the existing -&gt;switched_to/from()
hooks are in the wrong place for this case.

This all led to Paul being able to trigger failures at something like
once per 10k CPU hours of RCU torture.

For now, do the ugly thing and move the code to the right place by
ignoring the switch hooks.

Note: Clean up the whole sched_class::switch*_{to,from}() thing.

Fixes: 2e0199df252a ("sched/fair: Prepare exit/cleanup paths for delayed_dequeue")
Reported-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20241003185037.GA5594@noisy.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2024-09-21T16:44:57Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-09-21T16:44:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=88264981f2082248e892a706b2c5004650faac54'/>
<id>urn:sha1:88264981f2082248e892a706b2c5004650faac54</id>
<content type='text'>
Pull sched_ext support from Tejun Heo:
 "This implements a new scheduler class called ‘ext_sched_class’, or
  sched_ext, which allows scheduling policies to be implemented as BPF
  programs.

  The goals of this are:

   - Ease of experimentation and exploration: Enabling rapid iteration
     of new scheduling policies.

   - Customization: Building application-specific schedulers which
     implement policies that are not applicable to general-purpose
     schedulers.

   - Rapid scheduler deployments: Non-disruptive swap outs of scheduling
     policies in production environments"

See individual commits for more documentation, but also the cover letter
for the latest series:

Link: https://lore.kernel.org/all/20240618212056.2833381-1-tj@kernel.org/

* tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (110 commits)
  sched: Move update_other_load_avgs() to kernel/sched/pelt.c
  sched_ext: Don't trigger ops.quiescent/runnable() on migrations
  sched_ext: Synchronize bypass state changes with rq lock
  scx_qmap: Implement highpri boosting
  sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()
  sched_ext: Compact struct bpf_iter_scx_dsq_kern
  sched_ext: Replace consume_local_task() with move_local_task_to_local_dsq()
  sched_ext: Move consume_local_task() upward
  sched_ext: Move sanity check and dsq_mod_nr() into task_unlink_from_dsq()
  sched_ext: Reorder args for consume_local/remote_task()
  sched_ext: Restructure dispatch_to_local_dsq()
  sched_ext: Fix processs_ddsp_deferred_locals() by unifying DTL_INVALID handling
  sched_ext: Make find_dsq_for_dispatch() handle SCX_DSQ_LOCAL_ON
  sched_ext: Refactor consume_remote_task()
  sched_ext: Rename scx_kfunc_set_sleepable to unlocked and relocate
  sched_ext: Add missing static to scx_dump_data
  sched_ext: Add missing static to scx_has_op[]
  sched_ext: Temporarily work around pick_task_scx() being called without balance_scx()
  sched_ext: Add a cgroup scheduler which uses flattened hierarchy
  sched_ext: Add cgroup support
  ...
</content>
</entry>
<entry>
<title>Merge tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2024-09-19T13:55:58Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-09-19T13:55:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2004cef11ea072838f99bd95cefa5c8e45df0847'/>
<id>urn:sha1:2004cef11ea072838f99bd95cefa5c8e45df0847</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:

 - Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot
   de Oliveira's last major contribution to the kernel:

     "SCHED_DEADLINE servers can help fixing starvation issues of low
      priority tasks (e.g., SCHED_OTHER) when higher priority tasks
      monopolize CPU cycles. Today we have RT Throttling; DEADLINE
      servers should be able to replace and improve that."

   (Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes, Youssef
   Esmat, Huang Shijie)

 - Preparatory changes for sched_ext integration:
     - Use set_next_task(.first) where required
     - Fix up set_next_task() implementations
     - Clean up DL server vs. core sched
     - Split up put_prev_task_balance()
     - Rework pick_next_task()
     - Combine the last put_prev_task() and the first set_next_task()
     - Rework dl_server
     - Add put_prev_task(.next)

   (Peter Zijlstra, with a fix by Tejun Heo)

 - Complete the EEVDF transition and refine EEVDF scheduling:
     - Implement delayed dequeue
     - Allow shorter slices to wakeup-preempt
     - Use sched_attr::sched_runtime to set request/slice suggestion
     - Document the new feature flags
     - Remove unused and duplicate-functionality fields
     - Simplify &amp; unify pick_next_task_fair()
     - Misc debuggability enhancements

   (Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann, Valentin
   Schneider and Chuyi Zhou)

 - Initialize the vruntime of a new task when it is first enqueued,
   resulting in significant decrease in latency of newly woken tasks
   (Zhang Qiao)

 - Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
   (K Prateek Nayak, Peter Zijlstra)

 - Clean up and clarify the usage of Clean up usage of rt_task()
   (Qais Yousef)

 - Preempt SCHED_IDLE entities in strict cgroup hierarchies
   (Tianchen Ding)

 - Clarify the documentation of time units for deadline scheduler
   parameters (Christian Loehle)

 - Remove the HZ_BW chicken-bit feature flag introduced a year ago,
   the original change seems to be working fine (Phil Auld)

 - Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie,
   Peilin He, Qais Yousefm and Vincent Guittot)

* tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched/cpufreq: Use NSEC_PER_MSEC for deadline task
  cpufreq/cppc: Use NSEC_PER_MSEC for deadline task
  sched/deadline: Clarify nanoseconds in uapi
  sched/deadline: Convert schedtool example to chrt
  sched/debug: Fix the runnable tasks output
  sched: Fix sched_delayed vs sched_core
  kernel/sched: Fix util_est accounting for DELAY_DEQUEUE
  kthread: Fix task state in kthread worker if being frozen
  sched/pelt: Use rq_clock_task() for hw_pressure
  sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c
  sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
  sched: Add put_prev_task(.next)
  sched: Rework dl_server
  sched: Combine the last put_prev_task() and the first set_next_task()
  sched: Rework pick_next_task()
  sched: Split up put_prev_task_balance()
  sched: Clean up DL server vs core sched
  sched: Fixup set_next_task() implementations
  sched: Use set_next_task(.first) where required
  sched/fair: Properly deactivate sched_delayed task upon class change
  ...
</content>
</entry>
<entry>
<title>sched: Move update_other_load_avgs() to kernel/sched/pelt.c</title>
<updated>2024-09-12T06:00:21Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-09-11T19:36:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=902d67a2d40f5b0815f4f627b26d91f96cc51fb3'/>
<id>urn:sha1:902d67a2d40f5b0815f4f627b26d91f96cc51fb3</id>
<content type='text'>
96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from
__update_blocked_others()") added update_other_load_avgs() in
kernel/sched/syscalls.c right above effective_cpu_util(). This location
didn't fit that well in the first place, and with 5d871a63997f ("sched/fair:
Move effective_cpu_util() and effective_cpu_util() in fair.c") moving
effective_cpu_util() to kernel/sched/fair.c, it looks even more out of
place.

Relocate the function to kernel/sched/pelt.c where all its callees are.

No functional changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tip/sched/core' into sched_ext/for-6.12</title>
<updated>2024-09-11T18:43:26Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-09-11T18:43:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0b1777f0fa045c561fd26c8fda61f5eb7a930ed3'/>
<id>urn:sha1:0b1777f0fa045c561fd26c8fda61f5eb7a930ed3</id>
<content type='text'>
Pull in tip/sched/core to resolve two merge conflicts:

- 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()")
  5d871a63997f ("sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c")

  A simple context conflict. The former added __update_blocked_others() in
  the same #ifdef CONFIG_SMP block that effective_cpu_util() and
  sched_cpu_util() are in and the latter moved those functions to fair.c.
  This makes __update_blocked_others() more out of place. Will follow up
  with a patch to relocate.

- 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()")
  84d265281d6c ("sched/pelt: Use rq_clock_task() for hw_pressure")

  The former factored out the body of __update_blocked_others() into
  update_other_load_avgs(). The latter changed how update_hw_load_avg() is
  called in the body. Resolved by applying the change to
  update_other_load_avgs() instead.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c</title>
<updated>2024-09-10T07:51:14Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2024-09-04T09:24:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5d871a63997fa8bcf80adb49ea1f2f7840dff932'/>
<id>urn:sha1:5d871a63997fa8bcf80adb49ea1f2f7840dff932</id>
<content type='text'>
Move effective_cpu_util() and sched_cpu_util() functions in fair.c file
with others utilization related functions.

No functional change.

Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20240904092417.20660-1-vincent.guittot@linaro.org
</content>
</entry>
<entry>
<title>hrtimer: Use and report correct timerslack values for realtime tasks</title>
<updated>2024-08-23T18:13:02Z</updated>
<author>
<name>Felix Moessbauer</name>
<email>felix.moessbauer@siemens.com</email>
</author>
<published>2024-08-14T12:10:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ed4fb6d7ef68111bb539283561953e5c6e9a6e38'/>
<id>urn:sha1:ed4fb6d7ef68111bb539283561953e5c6e9a6e38</id>
<content type='text'>
The timerslack_ns setting is used to specify how much the hardware
timers should be delayed, to potentially dispatch multiple timers in a
single interrupt. This is a performance optimization. Timers of
realtime tasks (having a realtime scheduling policy) should not be
delayed.

This logic was inconsitently applied to the hrtimers, leading to delays
of realtime tasks which used timed waits for events (e.g. condition
variables). Due to the downstream override of the slack for rt tasks,
the procfs reported incorrect (non-zero) timerslack_ns values.

This is changed by setting the timer_slack_ns task attribute to 0 for
all tasks with a rt policy. By that, downstream users do not need to
specially handle rt tasks (w.r.t. the slack), and the procfs entry
shows the correct value of "0". Setting non-zero slack values (either
via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored,
as stated in "man 2 PR_SET_TIMERSLACK":

  Timer slack is not applied to threads that are scheduled under a
  real-time scheduling policy (see sched_setscheduler(2)).

The special handling of timerslack on rt tasks in downstream users
is removed as well.

Signed-off-by: Felix Moessbauer &lt;felix.moessbauer@siemens.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemens.com

</content>
</entry>
<entry>
<title>Merge branch 'tip/sched/core' into for-6.12</title>
<updated>2024-08-20T18:55:26Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-08-20T18:55:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5ac998574f93ac042cb84b4f1d919e2b20966afe'/>
<id>urn:sha1:5ac998574f93ac042cb84b4f1d919e2b20966afe</id>
<content type='text'>
To receive 863ccdbb918a ("sched: Allow sched_class::dequeue_task() to fail")
which makes sched_class.dequeue_task() return bool instead of void. This
leads to compile breakage and will be fixed by a follow-up patch.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion</title>
<updated>2024-08-17T09:06:45Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-05-22T11:46:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=857b158dc5e81c6de795ef6be006eed146098fc6'/>
<id>urn:sha1:857b158dc5e81c6de795ef6be006eed146098fc6</id>
<content type='text'>
Allow applications to directly set a suggested request/slice length using
sched_attr::sched_runtime.

The implementation clamps the value to: 0.1[ms] &lt;= slice &lt;= 100[ms]
which is 1/10 the size of HZ=1000 and 10 times the size of HZ=100.

Applications should strive to use their periodic runtime at a high
confidence interval (95%+) as the target slice. Using a smaller slice
will introduce undue preemptions, while using a larger value will
increase latency.

For all the following examples assume a scheduling quantum of 8, and for
consistency all examples have W=4:

  {A,B,C,D}(w=1,r=8):

  ABCD...
  +---+---+---+---

  t=0, V=1.5				t=1, V=3.5
  A  |------&lt;				A          |------&lt;
  B   |------&lt;				B   |------&lt;
  C    |------&lt;				C    |------&lt;
  D     |------&lt;			D     |------&lt;
  ---+*------+-------+---		---+--*----+-------+---

  t=2, V=5.5				t=3, V=7.5
  A          |------&lt;			A          |------&lt;
  B           |------&lt;			B           |------&lt;
  C    |------&lt;				C            |------&lt;
  D     |------&lt;			D     |------&lt;
  ---+----*--+-------+---		---+------*+-------+---

Note: 4 identical tasks in FIFO order

~~~

  {A,B}(w=1,r=16) C(w=2,r=16)

  AACCBBCC...
  +---+---+---+---

  t=0, V=1.25				t=2, V=5.25
  A  |--------------&lt;                   A                  |--------------&lt;
  B   |--------------&lt;                  B   |--------------&lt;
  C    |------&lt;                         C    |------&lt;
  ---+*------+-------+---               ---+----*--+-------+---

  t=4, V=8.25				t=6, V=12.25
  A                  |--------------&lt;   A                  |--------------&lt;
  B   |--------------&lt;                  B                   |--------------&lt;
  C            |------&lt;                 C            |------&lt;
  ---+-------*-------+---               ---+-------+---*---+---

Note: 1 heavy task -- because q=8, double r such that the deadline of the w=2
      task doesn't go below q.

Note: observe the full schedule becomes: W*max(r_i/w_i) = 4*2q = 8q in length.

Note: the period of the heavy task is half the full period at:
      W*(r_i/w_i) = 4*(2q/2) = 4q

~~~

  {A,C,D}(w=1,r=16) B(w=1,r=8):

  BAACCBDD...
  +---+---+---+---

  t=0, V=1.5				t=1, V=3.5
  A  |--------------&lt;			A  |---------------&lt;
  B   |------&lt;				B           |------&lt;
  C    |--------------&lt;			C    |--------------&lt;
  D     |--------------&lt;		D     |--------------&lt;
  ---+*------+-------+---		---+--*----+-------+---

  t=3, V=7.5				t=5, V=11.5
  A                  |---------------&lt;  A                  |---------------&lt;
  B           |------&lt;                  B           |------&lt;
  C    |--------------&lt;                 C                    |--------------&lt;
  D     |--------------&lt;                D     |--------------&lt;
  ---+------*+-------+---               ---+-------+--*----+---

  t=6, V=13.5
  A                  |---------------&lt;
  B                   |------&lt;
  C                    |--------------&lt;
  D     |--------------&lt;
  ---+-------+----*--+---

Note: 1 short task -- again double r so that the deadline of the short task
      won't be below q. Made B short because its not the leftmost task, but is
      eligible with the 0,1,2,3 spread.

Note: like with the heavy task, the period of the short task observes:
      W*(r_i/w_i) = 4*(1q/1) = 4q

~~~

  A(w=1,r=16) B(w=1,r=8) C(w=2,r=16)

  BCCAABCC...
  +---+---+---+---

  t=0, V=1.25				t=1, V=3.25
  A  |--------------&lt;                   A  |--------------&lt;
  B   |------&lt;                          B           |------&lt;
  C    |------&lt;                         C    |------&lt;
  ---+*------+-------+---               ---+--*----+-------+---

  t=3, V=7.25				t=5, V=11.25
  A  |--------------&lt;                   A                  |--------------&lt;
  B           |------&lt;                  B           |------&lt;
  C            |------&lt;                 C            |------&lt;
  ---+------*+-------+---               ---+-------+--*----+---

  t=6, V=13.25
  A                  |--------------&lt;
  B                   |------&lt;
  C            |------&lt;
  ---+-------+----*--+---

Note: 1 heavy and 1 short task -- combine them all.

Note: both the short and heavy task end up with a period of 4q

~~~

  A(w=1,r=16) B(w=2,r=16) C(w=1,r=8)

  BBCAABBC...
  +---+---+---+---

  t=0, V=1				t=2, V=5
  A  |--------------&lt;                   A  |--------------&lt;
  B   |------&lt;                          B           |------&lt;
  C    |------&lt;                         C    |------&lt;
  ---+*------+-------+---               ---+----*--+-------+---

  t=3, V=7				t=5, V=11
  A  |--------------&lt;                   A                  |--------------&lt;
  B           |------&lt;                  B           |------&lt;
  C            |------&lt;                 C            |------&lt;
  ---+------*+-------+---               ---+-------+--*----+---

  t=7, V=15
  A                  |--------------&lt;
  B                   |------&lt;
  C            |------&lt;
  ---+-------+------*+---

Note: as before but permuted

~~~

From all this it can be deduced that, for the steady state:

 - the total period (P) of a schedule is:	W*max(r_i/w_i)
 - the average period of a task is:		W*(r_i/w_i)
 - each task obtains the fair share:		w_i/W of each full period P

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Link: https://lkml.kernel.org/r/20240727105030.842834421@infradead.org
</content>
</entry>
</feed>
