<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched/ext.c, branch v6.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2024-11-15T17:59:51Z</updated>
<entry>
<title>Merge tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2024-11-15T17:59:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-15T17:59:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d79944b0948c3a5e80229606e36281d6ef746b21'/>
<id>urn:sha1:d79944b0948c3a5e80229606e36281d6ef746b21</id>
<content type='text'>
Pull sched_ext fix from Tejun Heo:
 "One more fix for v6.12-rc7

  ops.cpu_acquire() was being invoked with the wrong kfunc mask allowing
  the operation to call kfuncs which shouldn't be allowed. Fix it by
  using SCX_KF_REST instead, which is trivial and low risk"

* tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
</content>
</entry>
<entry>
<title>sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST</title>
<updated>2024-11-14T18:50:58Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-11-14T18:50:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a4af89cc50f3c1035c1e0dfb50948a23107f3e95'/>
<id>urn:sha1:a4af89cc50f3c1035c1e0dfb50948a23107f3e95</id>
<content type='text'>
ops.cpu_acquire() is currently called with 0 kf_maks which is interpreted as
SCX_KF_UNLOCKED which allows all unlocked kfuncs, but ops.cpu_acquire() is
called from balance_one() under the rq lock and should only be allowed call
kfuncs that are safe under the rq lock. Update it to use SCX_KF_REST.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: David Vernet &lt;void@manifault.com&gt;
Cc: Zhao Mengmeng &lt;zhaomzhao@126.com&gt;
Link: http://lkml.kernel.org/r/ZzYvf2L3rlmjuKzh@slm.duckdns.org
Fixes: 245254f7081d ("sched_ext: Implement sched_ext_ops.cpu_acquire/release()")
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2024-11-11T22:09:57Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-11T22:09:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3022e9d00ebec31ed435ae0844e3f235dba998a9'/>
<id>urn:sha1:3022e9d00ebec31ed435ae0844e3f235dba998a9</id>
<content type='text'>
Pull sched_ext fixes from Tejun Heo:

 - The fair sched class currently has a bug where its balance() returns
   true telling the sched core that it has tasks to run but then NULL
   from pick_task(). This makes sched core call sched_ext's pick_task()
   without preceding balance() which can lead to stalls in partial mode.

   For now, work around by detecting the condition and forcing the CPU
   to go through another scheduling cycle.

 - Add a missing newline to an error message and fix drgn introspection
   tool which went out of sync.

* tag 'sched_ext-for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx()
  sched_ext: Update scx_show_state.py to match scx_ops_bypass_depth's new type
  sched_ext: Add a missing newline at the end of an error message
</content>
</entry>
<entry>
<title>sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx()</title>
<updated>2024-11-09T20:43:55Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-11-09T20:43:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a6250aa251eacaf3ebfcfe152a96a727fd483ecd'/>
<id>urn:sha1:a6250aa251eacaf3ebfcfe152a96a727fd483ecd</id>
<content type='text'>
sched_ext dispatches tasks from the BPF scheduler from balance_scx() and
thus every pick_task_scx() call must be preceded by balance_scx(). While
this usually holds, due to a bug, there are cases where the fair class's
balance() returns true indicating that it has tasks to run on the CPU and
thus terminating balance() calls but fails to actually find the next task to
run when pick_task() is called. In such cases, pick_task_scx() can be called
without preceding balance_scx().

Detect this condition using SCX_RQ_BAL_PENDING flags. If detected, keep
running the previous task if possible and avoid stalling from entering idle
without balancing.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/Ztj_h5c2LYsdXYbA@slm.duckdns.org
</content>
</entry>
<entry>
<title>sched_ext: Add a missing newline at the end of an error message</title>
<updated>2024-11-05T21:45:24Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-11-05T21:45:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f7d1b585e1533e26801c13569b96b84b2ad2d3c1'/>
<id>urn:sha1:f7d1b585e1533e26801c13569b96b84b2ad2d3c1</id>
<content type='text'>
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2024-11-03T18:18:28Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-03T18:18:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=33e83ffe4c57132c73b7d3fb7919006c5296c496'/>
<id>urn:sha1:33e83ffe4c57132c73b7d3fb7919006c5296c496</id>
<content type='text'>
Pull scheduler fixes from Thomas Gleixner:

 - Plug a race between pick_next_task_fair() and try_to_wake_up() where
   both try to write to the same task, even though both paths hold a
   runqueue lock, but obviously from different runqueues.

   The problem is that the store to task::on_rq in __block_task() is
   visible to try_to_wake_up() which assumes that the task is not
   queued. Both sides then operate on the same task.

   Cure it by rearranging __block_task() so the the store to task::on_rq
   is the last operation on the task.

 - Prevent a potential NULL pointer dereference in task_numa_work()

   task_numa_work() iterates the VMAs of a process. A concurrent unmap
   of the address space can result in a NULL pointer return from
   vma_next() which is unchecked.

   Add the missing NULL pointer check to prevent this.

 - Operate on the correct scheduler policy in task_should_scx()

   task_should_scx() returns true when a task should be handled by sched
   EXT. It checks the tasks scheduling policy.

   This fails when the check is done before a policy has been set.

   Cure it by handing the policy into task_should_scx() so it operates
   on the requested value.

 - Add the missing handling of sched EXT in the delayed dequeue
   mechanism. This was simply forgotten.

* tag 'sched-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/ext: Fix scx vs sched_delayed
  sched: Pass correct scheduling policy to __setscheduler_class
  sched/numa: Fix the potential null pointer dereference in task_numa_work()
  sched: Fix pick_next_task_fair() vs try_to_wake_up() race
</content>
</entry>
<entry>
<title>sched/ext: Fix scx vs sched_delayed</title>
<updated>2024-10-30T21:42:12Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2024-10-30T10:49:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=69d5e722be949a1e2409c3f2865ba6020c279db6'/>
<id>urn:sha1:69d5e722be949a1e2409c3f2865ba6020c279db6</id>
<content type='text'>
Commit 98442f0ccd82 ("sched: Fix delayed_dequeue vs
switched_from_fair()") forgot about scx :/

Fixes: 98442f0ccd82 ("sched: Fix delayed_dequeue vs switched_from_fair()")
Reported-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lkml.kernel.org/r/20241030104934.GK14555@noisy.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-6.12-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2024-10-30T02:35:40Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-10-30T02:35:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=daa9f66fe194f672d2c94d879b6dad7035e03ebe'/>
<id>urn:sha1:daa9f66fe194f672d2c94d879b6dad7035e03ebe</id>
<content type='text'>
Pull sched_ext fixes from Tejun Heo:

 - Instances of scx_ops_bypass() could race each other leading to
   misbehavior. Fix by protecting the operation with a spinlock.

 - selftest and userspace header fixes

* tag 'sched_ext-for-6.12-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Fix enq_last_no_enq_fails selftest
  sched_ext: Make cast_mask() inline
  scx: Fix raciness in scx_ops_bypass()
  scx: Fix exit selftest to use custom DSQ
  sched_ext: Fix function pointer type mismatches in BPF selftests
  selftests/sched_ext: add order-only dependency of runner.o on BPFOBJ
</content>
</entry>
<entry>
<title>sched: Pass correct scheduling policy to __setscheduler_class</title>
<updated>2024-10-29T12:57:51Z</updated>
<author>
<name>Aboorva Devarajan</name>
<email>aboorvad@linux.ibm.com</email>
</author>
<published>2024-10-25T18:50:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5db91545ef8150c45a526675ef99e8998b648a41'/>
<id>urn:sha1:5db91545ef8150c45a526675ef99e8998b648a41</id>
<content type='text'>
Commit 98442f0ccd82 ("sched: Fix delayed_dequeue vs
switched_from_fair()") overlooked that __setscheduler_prio(), now
__setscheduler_class() relies on p-&gt;policy for task_should_scx(), and
moved the call before __setscheduler_params() updates it, causing it
to be using the old p-&gt;policy value.

Resolve this by changing task_should_scx() to take the policy itself
instead of a task pointer, such that __sched_setscheduler() can pass
in the updated policy.

Fixes: 98442f0ccd82 ("sched: Fix delayed_dequeue vs switched_from_fair()")
Signed-off-by: Aboorva Devarajan &lt;aboorvad@linux.ibm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>scx: Fix raciness in scx_ops_bypass()</title>
<updated>2024-10-25T21:10:51Z</updated>
<author>
<name>David Vernet</name>
<email>void@manifault.com</email>
</author>
<published>2024-10-25T20:54:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0e7ffff1b8117b05635c87d3c9099f6aa9c9b689'/>
<id>urn:sha1:0e7ffff1b8117b05635c87d3c9099f6aa9c9b689</id>
<content type='text'>
scx_ops_bypass() can currently race on the ops enable / disable path as
follows:

1. scx_ops_bypass(true) called on enable path, bypass depth is set to 1
2. An op on the init path exits, which schedules scx_ops_disable_workfn()
3. scx_ops_bypass(false) is called on the disable path, and bypass depth
   is decremented to 0
4. kthread is scheduled to execute scx_ops_disable_workfn()
5. scx_ops_bypass(true) called, bypass depth set to 1
6. scx_ops_bypass() races when iterating over CPUs

While it's not safe to take any blocking locks on the bypass path, it is
safe to take a raw spinlock which cannot be preempted. This patch therefore
updates scx_ops_bypass() to use a raw spinlock to synchronize, and changes
scx_ops_bypass_depth to be a regular int.

Without this change, we observe the following warnings when running the
'exit' sched_ext selftest (sometimes requires a couple of runs):

.[root@virtme-ng sched_ext]# ./runner -t exit
===== START =====
TEST: exit
...
[   14.935078] WARNING: CPU: 2 PID: 360 at kernel/sched/ext.c:4332 scx_ops_bypass+0x1ca/0x280
[   14.935126] Modules linked in:
[   14.935150] CPU: 2 UID: 0 PID: 360 Comm: sched_ext_ops_h Not tainted 6.11.0-virtme #24
[   14.935192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[   14.935242] Sched_ext: exit (enabling+all)
[   14.935244] RIP: 0010:scx_ops_bypass+0x1ca/0x280
[   14.935300] Code: ff ff ff e8 48 96 10 00 fb e9 08 ff ff ff c6 05 7b 34 e8 01 01 90 48 c7 c7 89 86 88 87 e8 be 1d f8 ff 90 0f 0b 90 90 eb 95 90 &lt;0f&gt; 0b 90 41 8b 84 24 24 0a 00 00 eb 97 90 0f 0b 90 41 8b 84 24 24
[   14.935394] RSP: 0018:ffffb706c0957ce0 EFLAGS: 00010002
[   14.935424] RAX: 0000000000000009 RBX: 0000000000000001 RCX: 00000000e3fb8b2a
[   14.935465] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff88a4c080
[   14.935512] RBP: 0000000000009b56 R08: 0000000000000004 R09: 00000003f12e520a
[   14.935555] R10: ffffffff863a9795 R11: 0000000000000000 R12: ffff8fc5fec31300
[   14.935598] R13: ffff8fc5fec31318 R14: 0000000000000286 R15: 0000000000000018
[   14.935642] FS:  0000000000000000(0000) GS:ffff8fc5fe680000(0000) knlGS:0000000000000000
[   14.935684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.935721] CR2: 0000557d92890b88 CR3: 000000002464a000 CR4: 0000000000750ef0
[   14.935765] PKRU: 55555554
[   14.935782] Call Trace:
[   14.935802]  &lt;TASK&gt;
[   14.935823]  ? __warn+0xce/0x220
[   14.935850]  ? scx_ops_bypass+0x1ca/0x280
[   14.935881]  ? report_bug+0xc1/0x160
[   14.935909]  ? handle_bug+0x61/0x90
[   14.935934]  ? exc_invalid_op+0x1a/0x50
[   14.935959]  ? asm_exc_invalid_op+0x1a/0x20
[   14.935984]  ? raw_spin_rq_lock_nested+0x15/0x30
[   14.936019]  ? scx_ops_bypass+0x1ca/0x280
[   14.936046]  ? srso_alias_return_thunk+0x5/0xfbef5
[   14.936081]  ? __pfx_scx_ops_disable_workfn+0x10/0x10
[   14.936111]  scx_ops_disable_workfn+0x146/0xac0
[   14.936142]  ? finish_task_switch+0xa9/0x2c0
[   14.936172]  ? srso_alias_return_thunk+0x5/0xfbef5
[   14.936211]  ? __pfx_scx_ops_disable_workfn+0x10/0x10
[   14.936244]  kthread_worker_fn+0x101/0x2c0
[   14.936268]  ? __pfx_kthread_worker_fn+0x10/0x10
[   14.936299]  kthread+0xec/0x110
[   14.936327]  ? __pfx_kthread+0x10/0x10
[   14.936351]  ret_from_fork+0x37/0x50
[   14.936374]  ? __pfx_kthread+0x10/0x10
[   14.936400]  ret_from_fork_asm+0x1a/0x30
[   14.936427]  &lt;/TASK&gt;
[   14.936443] irq event stamp: 21002
[   14.936467] hardirqs last  enabled at (21001): [&lt;ffffffff863aa35f&gt;] resched_cpu+0x9f/0xd0
[   14.936521] hardirqs last disabled at (21002): [&lt;ffffffff863dd0ba&gt;] scx_ops_bypass+0x11a/0x280
[   14.936571] softirqs last  enabled at (20642): [&lt;ffffffff863683d7&gt;] __irq_exit_rcu+0x67/0xd0
[   14.936622] softirqs last disabled at (20637): [&lt;ffffffff863683d7&gt;] __irq_exit_rcu+0x67/0xd0
[   14.936672] ---[ end trace 0000000000000000 ]---
[   14.953282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   14.953352] ------------[ cut here ]------------
[   14.953383] WARNING: CPU: 2 PID: 360 at kernel/sched/ext.c:4335 scx_ops_bypass+0x1d8/0x280
[   14.953428] Modules linked in:
[   14.953453] CPU: 2 UID: 0 PID: 360 Comm: sched_ext_ops_h Tainted: G        W          6.11.0-virtme #24
[   14.953505] Tainted: [W]=WARN
[   14.953527] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[   14.953574] RIP: 0010:scx_ops_bypass+0x1d8/0x280
[   14.953603] Code: c6 05 7b 34 e8 01 01 90 48 c7 c7 89 86 88 87 e8 be 1d f8 ff 90 0f 0b 90 90 eb 95 90 0f 0b 90 41 8b 84 24 24 0a 00 00 eb 97 90 &lt;0f&gt; 0b 90 41 8b 84 24 24 0a 00 00 eb 92 f3 0f 1e fa 49 8d 84 24 f0
[   14.953693] RSP: 0018:ffffb706c0957ce0 EFLAGS: 00010046
[   14.953722] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
[   14.953763] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8fc5fec31318
[   14.953804] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   14.953845] R10: ffffffff863a9795 R11: 0000000000000000 R12: ffff8fc5fec31300
[   14.953888] R13: ffff8fc5fec31318 R14: 0000000000000286 R15: 0000000000000018
[   14.953934] FS:  0000000000000000(0000) GS:ffff8fc5fe680000(0000) knlGS:0000000000000000
[   14.953974] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.954009] CR2: 0000557d92890b88 CR3: 000000002464a000 CR4: 0000000000750ef0
[   14.954052] PKRU: 55555554
[   14.954068] Call Trace:
[   14.954085]  &lt;TASK&gt;
[   14.954102]  ? __warn+0xce/0x220
[   14.954126]  ? scx_ops_bypass+0x1d8/0x280
[   14.954150]  ? report_bug+0xc1/0x160
[   14.954178]  ? handle_bug+0x61/0x90
[   14.954203]  ? exc_invalid_op+0x1a/0x50
[   14.954226]  ? asm_exc_invalid_op+0x1a/0x20
[   14.954250]  ? raw_spin_rq_lock_nested+0x15/0x30
[   14.954285]  ? scx_ops_bypass+0x1d8/0x280
[   14.954311]  ? __mutex_unlock_slowpath+0x3a/0x260
[   14.954343]  scx_ops_disable_workfn+0xa3e/0xac0
[   14.954381]  ? __pfx_scx_ops_disable_workfn+0x10/0x10
[   14.954413]  kthread_worker_fn+0x101/0x2c0
[   14.954442]  ? __pfx_kthread_worker_fn+0x10/0x10
[   14.954479]  kthread+0xec/0x110
[   14.954507]  ? __pfx_kthread+0x10/0x10
[   14.954530]  ret_from_fork+0x37/0x50
[   14.954553]  ? __pfx_kthread+0x10/0x10
[   14.954576]  ret_from_fork_asm+0x1a/0x30
[   14.954603]  &lt;/TASK&gt;
[   14.954621] irq event stamp: 21002
[   14.954644] hardirqs last  enabled at (21001): [&lt;ffffffff863aa35f&gt;] resched_cpu+0x9f/0xd0
[   14.954686] hardirqs last disabled at (21002): [&lt;ffffffff863dd0ba&gt;] scx_ops_bypass+0x11a/0x280
[   14.954735] softirqs last  enabled at (20642): [&lt;ffffffff863683d7&gt;] __irq_exit_rcu+0x67/0xd0
[   14.954782] softirqs last disabled at (20637): [&lt;ffffffff863683d7&gt;] __irq_exit_rcu+0x67/0xd0
[   14.954829] ---[ end trace 0000000000000000 ]---
[   15.022283] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   15.092282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[   15.149282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
ok 1 exit #
=====  END  =====

And with it, the test passes without issue after 1000s of runs:

.[root@virtme-ng sched_ext]# ./runner -t exit
===== START =====
TEST: exit
DESCRIPTION: Verify we can cleanly exit a scheduler in multiple places
OUTPUT:
[    7.412856] sched_ext: BPF scheduler "exit" enabled
[    7.427924] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.466677] sched_ext: BPF scheduler "exit" enabled
[    7.475923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.512803] sched_ext: BPF scheduler "exit" enabled
[    7.532924] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.586809] sched_ext: BPF scheduler "exit" enabled
[    7.595926] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.661923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
[    7.723923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF)
ok 1 exit #
=====  END  =====

=============================

RESULTS:

PASSED:  1
SKIPPED: 0
FAILED:  0

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Signed-off-by: David Vernet &lt;void@manifault.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
