<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched/debug.c, branch v6.16</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.16</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.16'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-07-10T04:07:56Z</updated>
<entry>
<title>Revert "sched/numa: add statistics of numa balance task"</title>
<updated>2025-07-10T04:07:56Z</updated>
<author>
<name>Chen Yu</name>
<email>yu.c.chen@intel.com</email>
</author>
<published>2025-07-04T13:56:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=db6cc3f4ac2e6cdc898fc9cbc8b32ae1bf56bdad'/>
<id>urn:sha1:db6cc3f4ac2e6cdc898fc9cbc8b32ae1bf56bdad</id>
<content type='text'>
This reverts commit ad6b26b6a0a79166b53209df2ca1cf8636296382.

That commit introduced per-memcg/task NUMA balance statistics, but it
also introduced a NULL pointer dereference due to the following race
condition: after a swap task candidate was chosen, its mm_struct pointer
was set to NULL because the task exited.  Later, when the actual task
swap was performed, dereferencing p-&gt;mm caused the problem.

CPU0                                   CPU1
:
...
task_numa_migrate
     task_numa_find_cpu
      task_numa_compare
        # a normal task p is chosen
        env-&gt;best_task = p

                                          # p exit:
                                          exit_signals(p);
                                             p-&gt;flags |= PF_EXITING
                                          exit_mm
                                             p-&gt;mm = NULL;

      migrate_swap_stop
        __migrate_swap_task(arg-&gt;src_task, arg-&gt;dst_cpu)
         count_memcg_event_mm(p-&gt;mm, NUMA_TASK_SWAP)  # p-&gt;mm is NULL

task_lock() would have to be held and the PF_EXITING flag checked to
prevent this from happening, as sketched below.  After discussion, the
conclusion was that taking a lock is not worthwhile merely for statistics
counting.  Revert the change and rely on the existing tracepoint for this
purpose.
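
For reference, a minimal sketch of the guarded access that was discussed
(hypothetical, never applied; NUMA_TASK_SWAP is the event the reverted
commit had introduced):

    /* Take task_lock() and re-check PF_EXITING so that p-&gt;mm cannot be
     * cleared underneath us while the counter is bumped. */
    static void count_numa_task_swap(struct task_struct *p)
    {
            task_lock(p);
            if (!(p-&gt;flags &amp; PF_EXITING) &amp;&amp; p-&gt;mm)
                    count_memcg_event_mm(p-&gt;mm, NUMA_TASK_SWAP);
            task_unlock(p);
    }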

Link: https://lkml.kernel.org/r/20250704135620.685752-1-yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/20250708064917.BBD13C4CEED@smtp.kernel.org
Fixes: ad6b26b6a0a7 ("sched/numa: add statistics of numa balance task")
Signed-off-by: Chen Yu &lt;yu.c.chen@intel.com&gt;
Reported-by: Jirka Hladky &lt;jhladky@redhat.com&gt;
Closes: https://lore.kernel.org/all/CAE4VaGBLJxpd=NeRJXpSCuw=REhC5LWJpC29kDy-Zh2ZDyzQZA@mail.gmail.com/
Reported-by: Srikanth Aithal &lt;Srikanth.Aithal@amd.com&gt;
Reported-by: Suneeth D &lt;Suneeth.D@amd.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Hladky &lt;jhladky@redhat.com&gt;
Cc: Libo Chen &lt;libo.chen@oracle.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-06-02T23:00:26Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-06-02T23:00:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fd1f8473503e5bf897bd3e8efe3545c0352954e6'/>
<id>urn:sha1:fd1f8473503e5bf897bd3e8efe3545c0352954e6</id>
<content type='text'>
Pull more MM updates from Andrew Morton:

 - "zram: support algorithm-specific parameters" from Sergey Senozhatsky
   adds infrastructure for passing algorithm-specific parameters into
   zram. A single parameter `winbits' is implemented at this time.

 - "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg
   charging NMI-safe, which is required by BPF, which can operate in NMI
   context.

 - "Some random fixes and cleanup to shmem" from Kemeng Shi implements
   small fixes and cleanups in the shmem code.

 - "Skip mm selftests instead when kernel features are not present" from
   Zi Yan fixes some issues in the MM selftest code.

 - "mm/damon: build-enable essential DAMON components by default" from
   SeongJae Park reworks DAMON Kconfig to make it easier to enable
   CONFIG_DAMON.

 - "sched/numa: add statistics of numa balance task migration" from Libo
   Chen adds more info into sysfs and procfs files to improve visibility
   into the NUMA balancer's task migration activity.

 - "selftests/mm: cow and gup_longterm cleanups" from Mark Brown
   provides various updates to some of the MM selftests to make them
   play better with the overall containing framework.

* tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits)
  mm/khugepaged: clean up refcount check using folio_expected_ref_count()
  selftests/mm: fix test result reporting in gup_longterm
  selftests/mm: report unique test names for each cow test
  selftests/mm: add helper for logging test start and results
  selftests/mm: use standard ksft_finished() in cow and gup_longterm
  selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
  sched/numa: add statistics of numa balance task
  sched/numa: fix task swap by skipping kernel threads
  tools/testing: check correct variable in open_procmap()
  tools/testing/vma: add missing function stub
  mm/gup: update comment explaining why gup_fast() disables IRQs
  selftests/mm: two fixes for the pfnmap test
  mm/khugepaged: fix race with folio split/free using temporary reference
  mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order
  mmu_notifiers: remove leftover stub macros
  selftests/mm: deduplicate test names in madv_populate
  kcov: rust: add flags for KCOV with Rust
  mm: rust: make CONFIG_MMU ifdefs more narrow
  mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
  mm/damon/Kconfig: enable CONFIG_DAMON by default
  ...
</content>
</entry>
<entry>
<title>sched/numa: add statistics of numa balance task</title>
<updated>2025-06-01T05:46:15Z</updated>
<author>
<name>Chen Yu</name>
<email>yu.c.chen@intel.com</email>
</author>
<published>2025-05-23T12:51:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ad6b26b6a0a79166b53209df2ca1cf8636296382'/>
<id>urn:sha1:ad6b26b6a0a79166b53209df2ca1cf8636296382</id>
<content type='text'>
On systems with NUMA balancing enabled, it has been found that tracking
task activities resulting from NUMA balancing is beneficial.  NUMA
balancing employs two mechanisms for task migration: one is to migrate
a task to an idle CPU within its preferred node, and the other is to
swap tasks located on different nodes when they are on each other's
preferred nodes.

The kernel already provides NUMA page migration statistics in
/sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched.  However, it
lacks statistics regarding task migration and swapping.  Therefore,
relevant counts for task migration and swapping should be added.

The following two new fields:

numa_task_migrated
numa_task_swapped

will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched
and /proc/vmstat.
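
As a usage illustration, here is a small userspace sketch that pulls the
two system-wide counters out of /proc/vmstat (memory.stat and
/proc/{PID}/sched can be parsed the same way):

    #include &lt;stdio.h&gt;
    #include &lt;string.h&gt;

    /* Print the NUMA task migration/swap counters from /proc/vmstat. */
    int main(void)
    {
            char line[256];
            FILE *fp = fopen("/proc/vmstat", "r");

            if (!fp)
                    return 1;
            while (fgets(line, sizeof(line), fp)) {
                    if (!strncmp(line, "numa_task_migrated", 18) ||
                        !strncmp(line, "numa_task_swapped", 17))
                            fputs(line, stdout);
            }
            fclose(fp);
            return 0;
    }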

Introducing both per-task and per-memory cgroup (memcg) NUMA balancing
statistics facilitates a rapid evaluation of the performance and
resource utilization of the target workload.  For instance, users can
first identify the container with high NUMA balancing activity and then
further pinpoint a specific task within that group, and subsequently
adjust the memory policy for that task.  In short, although it is
possible to iterate through /proc/$pid/sched to locate the problematic
task, the introduction of aggregated NUMA balancing activity for tasks
within each memcg can assist users in identifying the task more
efficiently through a divide-and-conquer approach.

As Libo Chen pointed out, the memcg event relies on the text names in
vmstat_text, and /proc/vmstat generates corresponding items based on
vmstat_text.  Thus, the relevant task migration and swapping events
introduced in vmstat_text also need to be populated by
count_vm_numa_event(), otherwise these values are zero in /proc/vmstat.
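
A minimal sketch of the resulting counting site, assuming event
identifiers that mirror the new field names (the exact enum names are
defined by this patch):

    /* Sketch: bump the global vmstat counter (surfaced via vmstat_text
     * in /proc/vmstat) and the per-memcg event (surfaced in memory.stat)
     * at the task-migration site. */
    count_vm_numa_event(NUMA_TASK_MIGRATE);
    count_memcg_event_mm(p-&gt;mm, NUMA_TASK_MIGRATE);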

In theory, task migration and swap events are part of the scheduler's
activities.  The reason for exposing them through the
memory.stat/vmstat interface is that we already have NUMA balancing
statistics in memory.stat/vmstat, and these events are closely related
to each other.  Following Shakeel's suggestion, we describe the
end-to-end flow/story of all these events occurring on a timeline for
future reference:

The goal of NUMA balancing is to co-locate a task and its memory pages
on the same NUMA node.  There are two strategies: migrate the pages to
the task's node, or migrate the task to the node where its pages
reside.

Suppose a task p1 is running on Node 0, but its pages are located on
Node 1.  NUMA page fault statistics for p1 reveal its "page footprint"
across nodes.  If NUMA balancing detects that most of p1's pages are on
Node 1:

1. Page Migration Attempt:
The NUMA balancer first tries to migrate p1's pages to Node 0.
The numa_page_migrate counter increments.

2. Task Migration Strategies:
After the page migration finishes, the NUMA balancer checks every
second to see if p1 can be migrated to Node 1.

Case 2.1: Idle CPU Available

  If Node 1 has an idle CPU, p1 is directly scheduled there.  This
  event is logged as numa_task_migrated.

Case 2.2: No Idle CPU (Task Swap)

  If all CPUs on Node 1 are busy, direct migration could cause CPU
  contention or load imbalance.  Instead, the NUMA balancer selects a
  candidate task p2 on Node 1 that prefers Node 0 (e.g., due to its own
  page footprint).  p1 and p2 are swapped.  This cross-node swap is
  recorded as numa_task_swapped.
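
In pseudo-C, the placement decision above reduces to the following
sketch; all helper names here are illustrative placeholders, not the
actual fair.c code:

    /* Sketch of the task-placement decision on p1's preferred node. */
    cpu = find_idle_cpu_on(preferred_node);
    if (cpu &gt;= 0) {
            migrate_task_to(p1, cpu);        /* counted as numa_task_migrated */
    } else {
            /* a task on the preferred node that itself prefers p1's node */
            p2 = pick_swap_candidate(preferred_node);
            swap_tasks(p1, p2);              /* counted as numa_task_swapped */
    }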

Link: https://lkml.kernel.org/r/d00edb12ba0f0de3c5222f61487e65f2ac58f5b1.1748493462.git.yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/7ef90a88602ed536be46eba7152ed0d33bad5790.1748002400.git.yu.c.chen@intel.com
Signed-off-by: Chen Yu &lt;yu.c.chen@intel.com&gt;
Tested-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Tested-by: Madadi Vineeth Reddy &lt;vineethr@linux.ibm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Cc: Aubrey Li &lt;aubrey.li@intel.com&gt;
Cc: Ayush Jain &lt;Ayush.jain3@amd.com&gt;
Cc: "Chen, Tim C" &lt;tim.c.chen@intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Libo Chen &lt;libo.chen@oracle.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/debug: Print the local group's asym_prefer_cpu</title>
<updated>2025-04-16T19:09:11Z</updated>
<author>
<name>K Prateek Nayak</name>
<email>kprateek.nayak@amd.com</email>
</author>
<published>2025-04-09T05:34:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=44671e21e3463f36f6c6e4b691216f60e85840e4'/>
<id>urn:sha1:44671e21e3463f36f6c6e4b691216f60e85840e4</id>
<content type='text'>
Add a file to read the local group's "asym_prefer_cpu" from debugfs. This
information was useful when debugging issues where "asym_prefer_cpu" was
incorrectly set to a CPU with a lower asym priority.
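
A rough sketch of emitting the value, assuming a seq_file show handler
in the sched domain debugfs code (the file name and layout are
assumptions; sd-&gt;groups is the local group):

    /* Sketch: print the local group's asym_prefer_cpu next to the other
     * per-domain debug fields. */
    seq_printf(m, "asym_prefer_cpu %d\n", sd-&gt;groups-&gt;asym_prefer_cpu);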

Signed-off-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250409053446.23367-5-kprateek.nayak@amd.com
</content>
</entry>
<entry>
<title>sched/topology: Wrappers for sched_domains_mutex</title>
<updated>2025-03-17T10:23:41Z</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2025-03-13T17:03:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=56209334dda1832c0a919e1d74768c6d0f3b2ca9'/>
<id>urn:sha1:56209334dda1832c0a919e1d74768c6d0f3b2ca9</id>
<content type='text'>
Create wrappers for sched_domains_mutex so that it can be used
transparently on both CONFIG_SMP and !CONFIG_SMP builds, as some
functions will need to do.
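
A minimal sketch of such wrappers (the names are assumed from this
description; on !CONFIG_SMP they collapse to no-ops):

    #ifdef CONFIG_SMP
    void sched_domains_mutex_lock(void)
    {
            mutex_lock(&amp;sched_domains_mutex);
    }
    void sched_domains_mutex_unlock(void)
    {
            mutex_unlock(&amp;sched_domains_mutex);
    }
    #else
    static inline void sched_domains_mutex_lock(void) { }
    static inline void sched_domains_mutex_unlock(void) { }
    #endif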

Fixes: 53916d5fd3c0 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
Reported-by: Jon Hunter &lt;jonathanh@nvidia.com&gt;
Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Tested-by: Waiman Long &lt;longman@redhat.com&gt;
Tested-by: Jon Hunter &lt;jonathanh@nvidia.com&gt;
Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Link: https://lore.kernel.org/r/Z9MP5Oq9RB8jBs3y@jlelli-thinkpadt14gen4.remote.csb
</content>
</entry>
<entry>
<title>sched: Add a generic function to return the preemption string</title>
<updated>2025-03-17T10:23:38Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-03-14T16:08:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8bdc5daaa01e3054647d394d354762210ad88f17'/>
<id>urn:sha1:8bdc5daaa01e3054647d394d354762210ad88f17</id>
<content type='text'>
The individual architectures often add the preemption model to the
beginning of the backtrace. This is the case on X86 or ARM64 for the
"die" case but not for regular warnings. With the addition of
PREEMPT_DYNAMIC for PREEMPT_RT we end up with CONFIG_PREEMPT and
CONFIG_PREEMPT_RT set simultaneously. That means that everyone who tried
to add that piece of information gets it wrong for PREEMPT_RT because
PREEMPT is checked first.

Provide a generic function which returns the current scheduling model
considering LAZY preempt and the current state of PREEMPT_DYNAMIC.

The resulting strings are:
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃   Model   ┃  -RT -DYN    ┃     +RT -DYN      ┃     -RT +DYN       ┃     +RT +DYN      ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│NONE       │ NONE         │ n/a               │ PREEMPT(none)      │ n/a               │
├───────────┼──────────────┼───────────────────┼────────────────────┼───────────────────┤
│VOLUNTARY  │ VOLUNTARY    │ n/a               │ PREEMPT(voluntary) │ n/a               │
├───────────┼──────────────┼───────────────────┼────────────────────┼───────────────────┤
│FULL       │ PREEMPT      │ PREEMPT_RT        │ PREEMPT(full)      │ PREEMPT_{RT,full} │
├───────────┼──────────────┼───────────────────┼────────────────────┼───────────────────┤
│LAZY       │ PREEMPT_LAZY │ PREEMPT_{RT,LAZY} │ PREEMPT(lazy)      │ PREEMPT_{RT,lazy} │
└───────────┴──────────────┴───────────────────┴────────────────────┴───────────────────┘

[ The dynamic building of the string can lead to an empty string if the
  function is invoked simultaneously on two CPUs. ]
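
A simplified sketch of how such a string can be derived from the
existing preempt_model_*() predicates (the function name is an
assumption; the real helper also composes the RT prefix and the
PREEMPT_DYNAMIC "(none/voluntary/full/lazy)" decoration shown above):

    static const char *preempt_label(void)
    {
            if (preempt_model_rt())
                    return "PREEMPT_RT";
            if (preempt_model_lazy())
                    return "PREEMPT_LAZY";
            if (preempt_model_full())
                    return "PREEMPT";
            if (preempt_model_voluntary())
                    return "VOLUNTARY";
            return "NONE";
    }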

Co-developed-by: "Peter Zijlstra (Intel)" &lt;peterz@infradead.org&gt;
Signed-off-by: "Peter Zijlstra (Intel)" &lt;peterz@infradead.org&gt;
Co-developed-by: "Steven Rostedt (Google)" &lt;rostedt@goodmis.org&gt;
Signed-off-by: "Steven Rostedt (Google)" &lt;rostedt@goodmis.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Shrikanth Hegde &lt;sshegde@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250314160810.2373416-2-bigeasy@linutronix.de
</content>
</entry>
<entry>
<title>Merge tag 'sched-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2025-02-08T19:16:22Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-02-08T19:16:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c7b92e8969cd60a585629cef8d19b7ef330bc171'/>
<id>urn:sha1:c7b92e8969cd60a585629cef8d19b7ef330bc171</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Fix a cfs_rq-&gt;h_nr_runnable accounting bug that trips up a defensive
  SCHED_WARN_ON() on certain workloads. The bug is believed to be
  (accidentally) self-correcting, hence no behavioral side effects are
  expected.

  Also print se.slice in debug output, since this value can now be set
  via the syscall ABI and can be useful to track"

* tag 'sched-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Provide slice length for fair tasks
  sched/fair: Fix inaccurate h_nr_runnable accounting with delayed dequeue
</content>
</entry>
<entry>
<title>sched/debug: Provide slice length for fair tasks</title>
<updated>2025-01-31T09:45:33Z</updated>
<author>
<name>Christian Loehle</name>
<email>christian.loehle@arm.com</email>
</author>
<published>2025-01-29T17:59:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9065ce69754dece78606c8bbb3821449272e56bf'/>
<id>urn:sha1:9065ce69754dece78606c8bbb3821449272e56bf</id>
<content type='text'>
Since commit:

  857b158dc5e8 ("sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion")

... we have the userspace per-task tunable slice length, which is
a key parameter that is otherwise difficult to obtain, so provide
it in /proc/$PID/sched.

[ mingo: Clarified the changelog. ]
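
A small userspace sketch for reading the new value, assuming the line in
/proc/$PID/sched carries the usual "se." prefix (e.g. "se.slice"):

    #include &lt;stdio.h&gt;
    #include &lt;string.h&gt;

    /* Print the slice line for the current task from /proc/self/sched. */
    int main(void)
    {
            char line[256];
            FILE *fp = fopen("/proc/self/sched", "r");

            if (!fp)
                    return 1;
            while (fgets(line, sizeof(line), fp))
                    if (strstr(line, "se.slice"))
                            fputs(line, stdout);
            fclose(fp);
            return 0;
    }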

Signed-off-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/453349b1-1637-42f5-a7b2-2385392b5956@arm.com
</content>
</entry>
<entry>
<title>sched/debug: Change need_resched warnings to pr_err</title>
<updated>2025-01-13T13:10:23Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2025-01-10T00:24:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8061b9f5e111a3012f8b691e5b75dd81eafbb793'/>
<id>urn:sha1:8061b9f5e111a3012f8b691e5b75dd81eafbb793</id>
<content type='text'>
need_resched warnings, if enabled, are treated as WARNINGs.  If
kernel.panic_on_warn is enabled, then this causes a kernel panic.

It's highly unlikely that a panic is desired for these warnings; a stack
trace is normally all that is required to debug and resolve them.

Thus, switch need_resched warnings to simply be a printk with an
associated stack trace so they are no longer in scope for panic_on_warn.
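
In sketch form, the reporting path changes roughly as follows (the
message text is illustrative, not the actual string):

    /* Before: a WARN()-class report, which panics under panic_on_warn. */
    /* After: a plain KERN_ERR message plus a stack dump. */
    pr_err("need_resched set for too long without rescheduling\n");
    dump_stack();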

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Madadi Vineeth Reddy &lt;vineethr@linux.ibm.com&gt;
Acked-by: Josh Don &lt;joshdon@google.com&gt;
Link: https://lkml.kernel.org/r/e8d52023-5291-26bd-5299-8bb9eb604929@google.com
</content>
</entry>
<entry>
<title>sched/fair: Rename cfs_rq.nr_running into nr_queued</title>
<updated>2024-12-09T10:48:12Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2024-12-02T17:46:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=736c55a02c477ad31c57ae4c69130f437855e051'/>
<id>urn:sha1:736c55a02c477ad31c57ae4c69130f437855e051</id>
<content type='text'>
Rename cfs_rq.nr_running to cfs_rq.nr_queued, which better reflects
reality, as the value includes both the ready-to-run tasks and the
delayed-dequeue tasks.

Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Link: https://lore.kernel.org/r/20241202174606.4074512-10-vincent.guittot@linaro.org
</content>
</entry>
</feed>
