<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v6.15</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.15</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.15'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-05-13T01:02:05Z</updated>
<entry>
<title>Merge tag 'sched_ext-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2025-05-13T01:02:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-05-13T01:02:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e9565e23cd89d4d5cd4388f8742130be1d6f182d'/>
<id>urn:sha1:e9565e23cd89d4d5cd4388f8742130be1d6f182d</id>
<content type='text'>
Pull sched_ext fixes from Tejun Heo:
 "A little bit invasive for rc6 but they're important fixes, pass tests
  fine and won't break anything outside sched_ext:

   - scx_bpf_cpuperf_set() calls internal functions that require the rq
     to be locked. It assumed that the BPF caller has rq locked but
     that's not always true. Fix it by tracking whether rq is currently
     held by the CPU and grabbing it if necessary

   - bpf_iter_scx_dsq_new() was leaving the DSQ iterator in an
     uninitialized state after an error. However, next() and destroy()
     can be called on an iterator which failed initialization and thus
     they always need to be initialized even after an init error. Fix by
     always initializing the iterator

   - Remove duplicate BTF_ID_FLAGS() entries"

* tag 'sched_ext-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: bpf_iter_scx_dsq_new() should always initialize iterator
  sched_ext: Fix rq lock state in hotplug ops
  sched_ext: Remove duplicate BTF_ID_FLAGS definitions
  sched_ext: Fix missing rq lock in scx_bpf_cpuperf_set()
  sched_ext: Track currently locked rq
</content>
</entry>
<entry>
<title>sched_ext: bpf_iter_scx_dsq_new() should always initialize iterator</title>
<updated>2025-05-07T16:24:07Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-05-05T21:30:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=428dc9fc0873989d73918d4a9cc22745b7bbc799'/>
<id>urn:sha1:428dc9fc0873989d73918d4a9cc22745b7bbc799</id>
<content type='text'>
BPF programs may call next() and destroy() on BPF iterators even after new()
returns an error value (e.g. bpf_for_each() macro ignores error returns from
new()). bpf_iter_scx_dsq_new() could leave the iterator in an uninitialized
state after an error return causing bpf_iter_scx_dsq_next() to dereference
garbage data. Make bpf_iter_scx_dsq_new() always clear $kit-&gt;dsq so that
next() and destroy() become noops.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 650ba21b131e ("sched_ext: Implement DSQ iterator")
Cc: stable@vger.kernel.org # v6.12+
Acked-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
<entry>
<title>sched_ext: Fix rq lock state in hotplug ops</title>
<updated>2025-04-29T18:01:32Z</updated>
<author>
<name>Andrea Righi</name>
<email>arighi@nvidia.com</email>
</author>
<published>2025-04-28T21:43:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e38be1c7647c8c78304ce6d931b3b654e27948b3'/>
<id>urn:sha1:e38be1c7647c8c78304ce6d931b3b654e27948b3</id>
<content type='text'>
The ops.cpu_online() and ops.cpu_offline() callbacks incorrectly assume
that the rq involved in the operation is locked, which is not the case
during hotplug, triggering the following warning:

  WARNING: CPU: 1 PID: 20 at kernel/sched/sched.h:1504 handle_hotplug+0x280/0x340

Fix by not tracking the target rq as locked in the context of
ops.cpu_online() and ops.cpu_offline().

Fixes: 18853ba782bef ("sched_ext: Track currently locked rq")
Reported-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Tested-by: Changwoo Min &lt;changwoo@igalia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2025-04-26T16:23:20Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-04-26T16:23:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3d23ef05c32464dfb1b010301e332c0dfc62e282'/>
<id>urn:sha1:3d23ef05c32464dfb1b010301e332c0dfc62e282</id>
<content type='text'>
Pull scheduler fix from Ingo Molnar:
 "Fix sporadic crashes in dequeue_entities() due to ... bad math.

  [ Arguably if pick_eevdf()/pick_next_entity() was less trusting of
    complex math being correct it could have de-escalated a crash into
    a warning, but that's for a different patch ]"

* tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/eevdf: Fix se-&gt;slice being set to U64_MAX and resulting crash
</content>
</entry>
<entry>
<title>sched/eevdf: Fix se-&gt;slice being set to U64_MAX and resulting crash</title>
<updated>2025-04-26T08:44:36Z</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2025-04-25T08:51:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bbce3de72be56e4b5f68924b7da9630cc89aa1a8'/>
<id>urn:sha1:bbce3de72be56e4b5f68924b7da9630cc89aa1a8</id>
<content type='text'>
There is a code path in dequeue_entities() that can set the slice of a
sched_entity to U64_MAX, which sometimes results in a crash.

The offending case is when dequeue_entities() is called to dequeue a
delayed group entity, and then the entity's parent's dequeue is delayed.
In that case:

1. In the if (entity_is_task(se)) else block at the beginning of
   dequeue_entities(), slice is set to
   cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then
   it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX.
2. The first for_each_sched_entity() loop dequeues the entity.
3. If the entity was its parent's only child, then the next iteration
   tries to dequeue the parent.
4. If the parent's dequeue needs to be delayed, then it breaks from the
   first for_each_sched_entity() loop _without updating slice_.
5. The second for_each_sched_entity() loop sets the parent's -&gt;slice to
   the saved slice, which is still U64_MAX.

This throws off subsequent calculations with potentially catastrophic
results. A manifestation we saw in production was:

6. In update_entity_lag(), se-&gt;slice is used to calculate limit, which
   ends up as a huge negative number.
7. limit is used in se-&gt;vlag = clamp(vlag, -limit, limit). Because limit
   is negative, vlag &gt; limit, so se-&gt;vlag is set to the same huge
   negative number.
8. In place_entity(), se-&gt;vlag is scaled, which overflows and results in
   another huge (positive or negative) number.
9. The adjusted lag is subtracted from se-&gt;vruntime, which increases or
   decreases se-&gt;vruntime by a huge number.
10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which
    incorrectly returns false because the vruntime is so far from the
    other vruntimes on the queue, causing the
    (vruntime - cfs_rq-&gt;min_vruntime) * load calulation to overflow.
11. Nothing appears to be eligible, so pick_eevdf() returns NULL.
12. pick_next_entity() tries to dereference the return value of
    pick_eevdf() and crashes.

Dumping the cfs_rq states from the core dumps with drgn showed tell-tale
huge vruntime ranges and bogus vlag values, and I also traced se-&gt;slice
being set to U64_MAX on live systems (which was usually "benign" since
the rest of the runqueue needed to be in a particular state to crash).

Fix it in dequeue_entities() by always setting slice from the first
non-empty cfs_rq.

Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com
</content>
</entry>
<entry>
<title>sched_ext: Remove duplicate BTF_ID_FLAGS definitions</title>
<updated>2025-04-25T18:41:02Z</updated>
<author>
<name>Andrea Righi</name>
<email>arighi@nvidia.com</email>
</author>
<published>2025-04-25T17:57:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e7dcd1304b9ac08c428c84fdb96ce6a04ff2114f'/>
<id>urn:sha1:e7dcd1304b9ac08c428c84fdb96ce6a04ff2114f</id>
<content type='text'>
Some kfuncs specific to the idle CPU selection policy are registered in
both the scx_kfunc_ids_any and scx_kfunc_ids_idle blocks, even though
they should only be defined in the latter.

Remove the duplicates from scx_kfunc_ids_any.

Fixes: 337d1b354a297 ("sched_ext: Move built-in idle CPU selection policy to a separate file")
Signed-off-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Fix missing rq lock in scx_bpf_cpuperf_set()</title>
<updated>2025-04-22T19:28:12Z</updated>
<author>
<name>Andrea Righi</name>
<email>arighi@nvidia.com</email>
</author>
<published>2025-04-22T08:26:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a11d6784d7316a6c77ca9f14fb1a698ebbb3c1fb'/>
<id>urn:sha1:a11d6784d7316a6c77ca9f14fb1a698ebbb3c1fb</id>
<content type='text'>
scx_bpf_cpuperf_set() can be used to set a performance target level on
any CPU. However, it doesn't correctly acquire the corresponding rq
lock, which may lead to unsafe behavior and trigger the following
warning, due to the lockdep_assert_rq_held() check:

[   51.713737] WARNING: CPU: 3 PID: 3899 at kernel/sched/sched.h:1512 scx_bpf_cpuperf_set+0x1a0/0x1e0
...
[   51.713836] Call trace:
[   51.713837]  scx_bpf_cpuperf_set+0x1a0/0x1e0 (P)
[   51.713839]  bpf_prog_62d35beb9301601f_bpfland_init+0x168/0x440
[   51.713841]  bpf__sched_ext_ops_init+0x54/0x8c
[   51.713843]  scx_ops_enable.constprop.0+0x2c0/0x10f0
[   51.713845]  bpf_scx_reg+0x18/0x30
[   51.713847]  bpf_struct_ops_link_create+0x154/0x1b0
[   51.713849]  __sys_bpf+0x1934/0x22a0

Fix by properly acquiring the rq lock when possible or raising an error
if we try to operate on a CPU that is not the one currently locked.

Fixes: d86adb4fc0655 ("sched_ext: Add cpuperf support")
Signed-off-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Acked-by: Changwoo Min &lt;changwoo@igalia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Track currently locked rq</title>
<updated>2025-04-22T19:27:50Z</updated>
<author>
<name>Andrea Righi</name>
<email>arighi@nvidia.com</email>
</author>
<published>2025-04-22T08:26:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=18853ba782bef65fc81ef2b3370382e5b479c5eb'/>
<id>urn:sha1:18853ba782bef65fc81ef2b3370382e5b479c5eb</id>
<content type='text'>
Some kfuncs provided by sched_ext may need to operate on a struct rq,
but they can be invoked from various contexts, specifically, different
scx callbacks.

While some of these callbacks are invoked with a particular rq already
locked, others are not. This makes it impossible for a kfunc to reliably
determine whether it's safe to access a given rq, triggering potential
bugs or unsafe behaviors, see for example [1].

To address this, track the currently locked rq whenever a sched_ext
callback is invoked via SCX_CALL_OP*().

This allows kfuncs that need to operate on an arbitrary rq to retrieve
the currently locked one and apply the appropriate action as needed.

[1] https://lore.kernel.org/lkml/20250325140021.73570-1-arighi@nvidia.com/

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Acked-by: Changwoo Min &lt;changwoo@igalia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2025-04-22T02:16:29Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-04-22T02:16:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a33b5a08cbbdd7aadff95f40cbb45ab86841679e'/>
<id>urn:sha1:a33b5a08cbbdd7aadff95f40cbb45ab86841679e</id>
<content type='text'>
Pull sched_ext fixes from Tejun Heo:

 - Use kvzalloc() so that large exit_dump buffer allocations don't fail
   easily

 - Remove cpu.weight / cpu.idle unimplemented warnings which are more
   annoying than helpful.

   This makes SCX_OPS_HAS_CGROUP_WEIGHT unnecessary. Mark it for
   deprecation

* tag 'sched_ext-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Mark SCX_OPS_HAS_CGROUP_WEIGHT for deprecation
  sched_ext: Remove cpu.weight / cpu.idle unimplemented warnings
  sched_ext: Use kvzalloc for large exit_dump allocation
</content>
</entry>
<entry>
<title>cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()</title>
<updated>2025-04-17T15:54:44Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2025-04-15T10:00:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=75da043d8f880bde8616fd81638c4e2cdb186a08'/>
<id>urn:sha1:75da043d8f880bde8616fd81638c4e2cdb186a08</id>
<content type='text'>
Notice that ignore_dl_rate_limit() need not piggy back on the
limits_changed handling to achieve its goal (which is to enforce a
frequency update before its due time).

Namely, if sugov_should_update_freq() is updated to check
sg_policy-&gt;need_freq_update and return 'true' if it is set when
sg_policy-&gt;limits_changed is not set, ignore_dl_rate_limit() may
set the former directly instead of setting the latter, so it can
avoid hitting the memory barrier in sugov_should_update_freq().

Update the code accordingly.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
Link: https://patch.msgid.link/10666429.nUPlyArG6x@rjwysocki.net
</content>
</entry>
</feed>
