<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v6.18</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.18</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.18'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-11-20T19:04:37Z</updated>
<entry>
<title>Merge tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2025-11-20T19:04:37Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-11-20T19:04:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fd95357fd8c6778ac7dea6c57a19b8b182b6e91f'/>
<id>urn:sha1:fd95357fd8c6778ac7dea6c57a19b8b182b6e91f</id>
<content type='text'>
Pull sched_ext fix from Tejun Heo:
 "One low risk and obvious fix: scx_enable() was dereferencing an error
  pointer on helper kthread creation failure. Fixed"

* tag 'sched_ext-for-6.18-rc6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Fix scx_enable() crash on helper kthread creation failure
</content>
</entry>
<entry>
<title>sched_ext: Fix scx_enable() crash on helper kthread creation failure</title>
<updated>2025-11-20T18:45:43Z</updated>
<author>
<name>Saket Kumar Bhaskar</name>
<email>skb99@linux.ibm.com</email>
</author>
<published>2025-11-19T10:37:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7b6216baae751369195fa3c83d434d23bcda406a'/>
<id>urn:sha1:7b6216baae751369195fa3c83d434d23bcda406a</id>
<content type='text'>
A crash was observed when the sched_ext selftests runner was
terminated with Ctrl+\ while test 15 was running:

NIP [c00000000028fa58] scx_enable.constprop.0+0x358/0x12b0
LR [c00000000028fa2c] scx_enable.constprop.0+0x32c/0x12b0
Call Trace:
scx_enable.constprop.0+0x32c/0x12b0 (unreliable)
bpf_struct_ops_link_create+0x18c/0x22c
__sys_bpf+0x23f8/0x3044
sys_bpf+0x2c/0x6c
system_call_exception+0x124/0x320
system_call_vectored_common+0x15c/0x2ec

kthread_run_worker() returns an ERR_PTR() on failure rather than NULL,
but the current code in scx_alloc_and_add_sched() only checks for a NULL
helper. Incase of failure on SIGQUIT, the error is not handled in
scx_alloc_and_add_sched() and scx_enable() ends up dereferencing an
error pointer.

Error handling is fixed in scx_alloc_and_add_sched() to propagate
PTR_ERR() into ret, so that scx_enable() jumps to the existing error
path, avoiding random dereference on failure.

Fixes: bff3b5aec1b7 ("sched_ext: Move disable machinery into scx_sched")
Cc: stable@vger.kernel.org # v6.16+
Reported-and-tested-by: Samir Mulani &lt;samir@linux.ibm.com&gt;
Signed-off-by: Saket Kumar Bhaskar &lt;skb99@linux.ibm.com&gt;
Reviewed-by: Emil Tsalapatis &lt;emil@etsalapatis.com&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Reviewed-by: Vishal Chourasia &lt;vishalc@linux.ibm.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2025-11-17T17:01:22Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-11-17T17:01:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=418592a04021baf5fcfbd0b7b7c5330bb135d10f'/>
<id>urn:sha1:418592a04021baf5fcfbd0b7b7c5330bb135d10f</id>
<content type='text'>
Pull sched_ext fixes from Tejun Heo:
 "Five fixes addressing PREEMPT_RT compatibility and locking issues.

  Three commits fix potential deadlocks and sleeps in atomic contexts on
  RT kernels by converting locks to raw spinlocks and ensuring IRQ work
  runs in hard-irq context. The remaining two fix unsafe locking in the
  debug dump path and a variable dereference typo"

* tag 'sched_ext-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Use IRQ_WORK_INIT_HARD() to initialize rq-&gt;scx.kick_cpus_irq_work
  sched_ext: Fix possible deadlock in the deferred_irq_workfn()
  sched/ext: convert scx_tasks_lock to raw spinlock
  sched_ext: Fix unsafe locking in the scx_dump_state()
  sched_ext: Fix use of uninitialized variable in scx_bpf_cpuperf_set()
</content>
</entry>
<entry>
<title>sched_ext: Use IRQ_WORK_INIT_HARD() to initialize rq-&gt;scx.kick_cpus_irq_work</title>
<updated>2025-11-17T15:07:22Z</updated>
<author>
<name>Zqiang</name>
<email>qiang.zhang@linux.dev</email>
</author>
<published>2025-11-17T12:53:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=36c6f3c03d104faf1aa90922f2310549c175420f'/>
<id>urn:sha1:36c6f3c03d104faf1aa90922f2310549c175420f</id>
<content type='text'>
For PREEMPT_RT kernels, the kick_cpus_irq_workfn() be invoked in
the per-cpu irq_work/* task context and there is no rcu-read critical
section to protect. this commit therefore use IRQ_WORK_INIT_HARD() to
initialize the per-cpu rq-&gt;scx.kick_cpus_irq_work in the
init_sched_ext_class().

Signed-off-by: Zqiang &lt;qiang.zhang@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Fix possible deadlock in the deferred_irq_workfn()</title>
<updated>2025-11-13T18:29:28Z</updated>
<author>
<name>Zqiang</name>
<email>qiang.zhang@linux.dev</email>
</author>
<published>2025-11-13T11:43:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a257e974210320ede524f340ffe16bf4bf0dda1e'/>
<id>urn:sha1:a257e974210320ede524f340ffe16bf4bf0dda1e</id>
<content type='text'>
For PREEMPT_RT=y kernels, the deferred_irq_workfn() is executed in
the per-cpu irq_work/* task context and not disable-irq, if the rq
returned by container_of() is current CPU's rq, the following scenarios
may occur:

lock(&amp;rq-&gt;__lock);
&lt;Interrupt&gt;
  lock(&amp;rq-&gt;__lock);

This commit use IRQ_WORK_INIT_HARD() to replace init_irq_work() to
initialize rq-&gt;scx.deferred_irq_work, make the deferred_irq_workfn()
is always invoked in hard-irq context.

Signed-off-by: Zqiang &lt;qiang.zhang@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/ext: convert scx_tasks_lock to raw spinlock</title>
<updated>2025-11-12T18:42:02Z</updated>
<author>
<name>Emil Tsalapatis</name>
<email>etsal@meta.com</email>
</author>
<published>2025-11-12T18:42:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c87488a12393a23f8a1b9850b989b386c58cac3f'/>
<id>urn:sha1:c87488a12393a23f8a1b9850b989b386c58cac3f</id>
<content type='text'>
Update scx_task_locks so that it's safe to lock/unlock in a
non-sleepable context in PREEMPT_RT kernels. scx_task_locks is
(non-raw) spinlock used to protect the list of tasks under SCX.
This list is updated during from finish_task_switch(), which
cannot sleep. Regular spinlocks can be locked in such a context
in non-RT kernels, but are sleepable under when CONFIG_PREEMPT_RT=y.

Convert scx_task_locks into a raw spinlock, which is not sleepable
even on RT kernels.

Sample backtrace:

&lt;TASK&gt;
dump_stack_lvl+0x83/0xa0
__might_resched+0x14a/0x200
rt_spin_lock+0x61/0x1c0
? sched_ext_dead+0x2d/0xf0
? lock_release+0xc6/0x280
sched_ext_dead+0x2d/0xf0
? srso_alias_return_thunk+0x5/0xfbef5
finish_task_switch.isra.0+0x254/0x360
__schedule+0x584/0x11d0
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? tick_nohz_idle_exit+0x7e/0x120
schedule_idle+0x23/0x40
cpu_startup_entry+0x29/0x30
start_secondary+0xf8/0x100
common_startup_64+0x13e/0x148
&lt;/TASK&gt;

Signed-off-by: Emil Tsalapatis &lt;emil@etsalapatis.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Fix unsafe locking in the scx_dump_state()</title>
<updated>2025-11-12T16:28:32Z</updated>
<author>
<name>Zqiang</name>
<email>qiang.zhang@linux.dev</email>
</author>
<published>2025-11-12T07:33:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5f02151c411dda46efcc5dc57b0845efcdcfc26d'/>
<id>urn:sha1:5f02151c411dda46efcc5dc57b0845efcdcfc26d</id>
<content type='text'>
For built with CONFIG_PREEMPT_RT=y kernels, the dump_lock will be converted
sleepable spinlock and not disable-irq, so the following scenarios occur:

inconsistent {IN-HARDIRQ-W} -&gt; {HARDIRQ-ON-W} usage.
irq_work/0/27 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&amp;rq-&gt;__lock){?...}-{2:2}, at: raw_spin_rq_lock_nested+0x2b/0x40
{IN-HARDIRQ-W} state was registered at:
   lock_acquire+0x1e1/0x510
   _raw_spin_lock_nested+0x42/0x80
   raw_spin_rq_lock_nested+0x2b/0x40
   sched_tick+0xae/0x7b0
   update_process_times+0x14c/0x1b0
   tick_periodic+0x62/0x1f0
   tick_handle_periodic+0x48/0xf0
   timer_interrupt+0x55/0x80
   __handle_irq_event_percpu+0x20a/0x5c0
   handle_irq_event_percpu+0x18/0xc0
   handle_irq_event+0xb5/0x150
   handle_level_irq+0x220/0x460
   __common_interrupt+0xa2/0x1e0
   common_interrupt+0xb0/0xd0
   asm_common_interrupt+0x2b/0x40
   _raw_spin_unlock_irqrestore+0x45/0x80
   __setup_irq+0xc34/0x1a30
   request_threaded_irq+0x214/0x2f0
   hpet_time_init+0x3e/0x60
   x86_late_time_init+0x5b/0xb0
   start_kernel+0x308/0x410
   x86_64_start_reservations+0x1c/0x30
   x86_64_start_kernel+0x96/0xa0
   common_startup_64+0x13e/0x148

 other info that might help us debug this:
 Possible unsafe locking scenario:

        CPU0
        ----
   lock(&amp;rq-&gt;__lock);
   &lt;Interrupt&gt;
     lock(&amp;rq-&gt;__lock);

  *** DEADLOCK ***

 stack backtrace:
 CPU: 0 UID: 0 PID: 27 Comm: irq_work/0
 Call Trace:
  &lt;TASK&gt;
  dump_stack_lvl+0x8c/0xd0
  dump_stack+0x14/0x20
  print_usage_bug+0x42e/0x690
  mark_lock.part.44+0x867/0xa70
  ? __pfx_mark_lock.part.44+0x10/0x10
  ? string_nocheck+0x19c/0x310
  ? number+0x739/0x9f0
  ? __pfx_string_nocheck+0x10/0x10
  ? __pfx_check_pointer+0x10/0x10
  ? kvm_sched_clock_read+0x15/0x30
  ? sched_clock_noinstr+0xd/0x20
  ? local_clock_noinstr+0x1c/0xe0
  __lock_acquire+0xc4b/0x62b0
  ? __pfx_format_decode+0x10/0x10
  ? __pfx_string+0x10/0x10
  ? __pfx___lock_acquire+0x10/0x10
  ? __pfx_vsnprintf+0x10/0x10
  lock_acquire+0x1e1/0x510
  ? raw_spin_rq_lock_nested+0x2b/0x40
  ? __pfx_lock_acquire+0x10/0x10
  ? dump_line+0x12e/0x270
  ? raw_spin_rq_lock_nested+0x20/0x40
  _raw_spin_lock_nested+0x42/0x80
  ? raw_spin_rq_lock_nested+0x2b/0x40
  raw_spin_rq_lock_nested+0x2b/0x40
  scx_dump_state+0x3b3/0x1270
  ? finish_task_switch+0x27e/0x840
  scx_ops_error_irq_workfn+0x67/0x80
  irq_work_single+0x113/0x260
  irq_work_run_list.part.3+0x44/0x70
  run_irq_workd+0x6b/0x90
  ? __pfx_run_irq_workd+0x10/0x10
  smpboot_thread_fn+0x529/0x870
  ? __pfx_smpboot_thread_fn+0x10/0x10
  kthread+0x305/0x3f0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x40/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  &lt;/TASK&gt;

This commit therefore use rq_lock_irqsave/irqrestore() to replace
rq_lock/unlock() in the scx_dump_state().

Fixes: 07814a9439a3 ("sched_ext: Print debug dump after an error exit")
Signed-off-by: Zqiang &lt;qiang.zhang@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining</title>
<updated>2025-11-06T11:30:52Z</updated>
<author>
<name>Aaron Lu</name>
<email>ziqianlu@bytedance.com</email>
</author>
<published>2025-10-30T03:27:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=956dfda6a70885f18c0f8236a461aa2bc4f556ad'/>
<id>urn:sha1:956dfda6a70885f18c0f8236a461aa2bc4f556ad</id>
<content type='text'>
When a cfs_rq is to be throttled, its limbo list should be empty and
that's why there is a warn in tg_throttle_down() for non empty
cfs_rq-&gt;throttled_limbo_list.

When running a test with the following hierarchy:

          root
        /      \
        A*     ...
     /  |  \   ...
        B
       /  \
      C*

where both A and C have quota settings, that warn on non empty limbo list
is triggered for a cfs_rq of C, let's call it cfs_rq_c(and ignore the cpu
part of the cfs_rq for the sake of simpler representation).

Debug showed it happened like this:
Task group C is created and quota is set, so in tg_set_cfs_bandwidth(),
cfs_rq_c is initialized with runtime_enabled set, runtime_remaining
equals to 0 and *unthrottled*. Before any tasks are enqueued to cfs_rq_c,
*multiple* throttled tasks can migrate to cfs_rq_c (e.g., due to task
group changes). When enqueue_task_fair(cfs_rq_c, throttled_task) is
called and cfs_rq_c is in a throttled hierarchy (e.g., A is throttled),
these throttled tasks are directly placed into cfs_rq_c's limbo list by
enqueue_throttled_task().

Later, when A is unthrottled, tg_unthrottle_up(cfs_rq_c) enqueues these
tasks. The first enqueue triggers check_enqueue_throttle(), and with zero
runtime_remaining, cfs_rq_c can be throttled in throttle_cfs_rq() if it
can't get more runtime and enters tg_throttle_down(), where the warning
is hit due to remaining tasks in the limbo list.

I think it's a chaos to trigger throttle on unthrottle path, the status
of a being unthrottled cfs_rq can be in a mixed state in the end, so fix
this by granting 1ns to cfs_rq in tg_set_cfs_bandwidth(). This ensures
cfs_rq_c has a positive runtime_remaining when initialized as unthrottled
and cannot enter tg_unthrottle_up() with zero runtime_remaining.

Also, update outdated comments in tg_throttle_down() since
unthrottle_cfs_rq() is no longer called with zero runtime_remaining.
While at it, remove a redundant assignment to se in tg_throttle_down().

Fixes: e1fad12dcb66 ("sched/fair: Switch to task based throttle model")
Reviewed-By: Benjamin Segall &lt;bsegall@google.com&gt;
Suggested-by: Benjamin Segall &lt;bsegall@google.com&gt;
Signed-off-by: Aaron Lu &lt;ziqianlu@bytedance.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Tested-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Tested-by: Hao Jia &lt;jiahao1@lixiang.com&gt;
Link: https://patch.msgid.link/20251030032755.560-1-ziqianlu@bytedance.com
</content>
</entry>
<entry>
<title>sched_ext: Fix use of uninitialized variable in scx_bpf_cpuperf_set()</title>
<updated>2025-10-29T15:14:39Z</updated>
<author>
<name>Andrea Righi</name>
<email>arighi@nvidia.com</email>
</author>
<published>2025-10-29T13:08:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f4fa7c25f632cd925352b4d46f245653a23b1d1a'/>
<id>urn:sha1:f4fa7c25f632cd925352b4d46f245653a23b1d1a</id>
<content type='text'>
scx_bpf_cpuperf_set() has a typo where it dereferences the local
variable @sch, instead of the global @scx_root pointer. Fix by
dereferencing the correct variable.

Fixes: 956f2b11a8a4f ("sched_ext: Drop kf_cpu_valid()")
Signed-off-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Reviewed-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-6.18-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2025-10-27T17:52:18Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-10-27T17:52:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fd57572253bc356330dbe5b233c2e1d8426c66fd'/>
<id>urn:sha1:fd57572253bc356330dbe5b233c2e1d8426c66fd</id>
<content type='text'>
Pull sched_ext fixes from Tejun Heo:

 - Fix scx_kick_pseqs corruption when multiple schedulers are loaded
   concurrently

 - Allocate scx_kick_cpus_pnt_seqs lazily using kvzalloc() to handle
   systems with large CPU counts

 - Defer queue_balance_callback() until after ops.dispatch to fix
   callback ordering issues

 - Sync error_irq_work before freeing scx_sched to prevent
   use-after-free

 - Mark scx_bpf_dsq_move_set_[slice|vtime]() with KF_RCU for proper RCU
   protection

 - Fix flag check for deferred callbacks

* tag 'sched_ext-for-6.18-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: fix flag check for deferred callbacks
  sched_ext: Fix scx_kick_pseqs corruption on concurrent scheduler loads
  sched_ext: Allocate scx_kick_cpus_pnt_seqs lazily using kvzalloc()
  sched_ext: defer queue_balance_callback() until after ops.dispatch
  sched_ext: Sync error_irq_work before freeing scx_sched
  sched_ext: Mark scx_bpf_dsq_move_set_[slice|vtime]() with KF_RCU
</content>
</entry>
</feed>
