summaryrefslogtreecommitdiffstats
path: root/Documentation/scheduler
AgeCommit message (Collapse)AuthorLines
2026-02-10Merge tag 'bpf-next-7.0' of ↵Linus Torvalds-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support associating BPF program with struct_ops (Amery Hung) - Switch BPF local storage to rqspinlock and remove recursion detection counters which were causing false positives (Amery Hung) - Fix live registers marking for indirect jumps (Anton Protopopov) - Introduce execution context detection BPF helpers (Changwoo Min) - Improve verifier precision for 32bit sign extension pattern (Cupertino Miranda) - Optimize BTF type lookup by sorting vmlinux BTF and doing binary search (Donglin Peng) - Allow states pruning for misc/invalid slots in iterator loops (Eduard Zingerman) - In preparation for ASAN support in BPF arenas teach libbpf to move global BPF variables to the end of the region and enable arena kfuncs while holding locks (Emil Tsalapatis) - Introduce support for implicit arguments in kfuncs and migrate a number of them to new API. This is a prerequisite for cgroup sub-schedulers in sched-ext (Ihor Solodrai) - Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen) - Fix ORC stack unwind from kprobe_multi (Jiri Olsa) - Speed up fentry attach by using single ftrace direct ops in BPF trampolines (Jiri Olsa) - Require frozen map for calculating map hash (KP Singh) - Fix lock entry creation in TAS fallback in rqspinlock (Kumar Kartikeya Dwivedi) - Allow user space to select cpu in lookup/update operations on per-cpu array and hash maps (Leon Hwang) - Make kfuncs return trusted pointers by default (Matt Bobrowski) - Introduce "fsession" support where single BPF program is executed upon entry and exit from traced kernel function (Menglong Dong) - Allow bpf_timer and bpf_wq use in all programs types (Mykyta Yatsenko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei Starovoitov) - Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their definition across the tree (Puranjay Mohan) - Allow BPF arena calls from non-sleepable context (Puranjay Mohan) - Improve register id comparison logic in the verifier and extend linked registers with negative offsets (Puranjay Mohan) - In preparation for BPF-OOM introduce kfuncs to access memcg events (Roman Gushchin) - Use CFI compatible destructor kfunc type (Sami Tolvanen) - Add bitwise tracking for BPF_END in the verifier (Tianci Cao) - Add range tracking for BPF_DIV and BPF_MOD in the verifier (Yazhou Tang) - Make BPF selftests work with 64k page size (Yonghong Song) * tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits) selftests/bpf: Fix outdated test on storage->smap selftests/bpf: Choose another percpu variable in bpf for btf_dump test selftests/bpf: Remove test_task_storage_map_stress_lookup selftests/bpf: Update task_local_storage/task_storage_nodeadlock test selftests/bpf: Update task_local_storage/recursion test selftests/bpf: Update sk_storage_omem_uncharge test bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy} bpf: Support lockless unlink when freeing map or local storage bpf: Prepare for bpf_selem_unlink_nofail() bpf: Remove unused percpu counter from bpf_local_storage_map_free bpf: Remove cgroup local storage percpu counter bpf: Remove task local storage percpu counter bpf: Change local_storage->lock and b->lock to rqspinlock bpf: Convert bpf_selem_unlink to failable bpf: Convert bpf_selem_link_map to failable bpf: Convert bpf_selem_unlink_map to failable bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage selftests/xsk: fix number of Tx frags in invalid packet selftests/xsk: properly handle batch ending in the middle of a packet bpf: Prevent reentrance into call_rcu_tasks_trace() ...
2026-02-09Merge tag 'docs-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linuxLinus Torvalds-7/+0
Pull documentation updates from Jonathan Corbet: "A slightly calmer cycle for docs this time around, though there is still a fair amount going on, including: - Some signs of life on the long-moribund Japanese translation - Documentation on policies around the use of generative tools for patch submissions, and a separate document intended for consumption by generative tools - The completion of the move of the documentation tools to tools/docs. For now we're leaving a /scripts/kernel-doc symlink behind to avoid breaking scripts - Ongoing build-system work includes the incorporation of documentation in Python code, better support for documenting variables, and lots of improvements and fixes - Automatic linking of man-page references -- cat(1), for example -- to the online pages in the HTML build ...and the usual array of typo fixes and such" * tag 'docs-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux: (107 commits) doc: development-process: add notice on testing tools: sphinx-build-wrapper: improve its help message docs: sphinx-build-wrapper: allow -v override -q docs: kdoc: Fix pdfdocs build for tools docs: ja_JP: process: translate 'Obtain a current source tree' docs: fix 're-use' -> 'reuse' in documentation docs: ioctl-number: fix a typo in ioctl-number.rst docs: filesystems: ensure proc pid substitutable is complete docs: automarkup.py: Skip common English words as C identifiers Documentation: use a source-read extension for the index link boilerplate docs: parse_features: make documentation more consistent docs: add parse_features module documentation docs: jobserver: do some documentation improvements docs: add jobserver module documentation docs: kabi: helpers: add documentation for each "enum" value docs: kabi: helpers: add helper for debug bits 7 and 8 docs: kabi: system_symbols: end docstring phrases with a dot docs: python: abi_regex: do some improvements at documentation docs: python: abi_parser: do some improvements at documentation docs: add kabi modules documentation ...
2026-01-30Documentation: Fix typos in energy model documentationPatrick Little-4/+4
Fix typos in documentation related to energy model management. Signed-off-by: Patrick Little <plittle@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260128-documentation-fix-grammar-v1-1-39238dc471f9@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-01-23Documentation: use a source-read extension for the index link boilerplateJani Nikula-7/+0
The root document usually has a special :ref:`genindex` link to the generated index. This is also the case for Documentation/index.rst. The other index.rst files deeper in the directory hierarchy usually don't. For SPHINXDIRS builds, the root document isn't Documentation/index.rst, but some other index.rst in the hierarchy. Currently they have a ".. only::" block to add the index link when doing SPHINXDIRS html builds. This is obviously very tedious and repetitive. The link is also added to all index.rst files in the hierarchy for SPHINXDIRS builds, not just the root document. Put the boilerplate in a sphinx-includes/subproject-index.rst file, and include it at the end of the root document for subproject builds in an ad-hoc source-read extension defined in conf.py. For now, keep having the boilerplate in translations, because this approach currently doesn't cover translated index link headers. Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Tested-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> [jc: did s/doctree/kern_doc_dir/ ] Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260123143149.2024303-1-jani.nikula@intel.com>
2025-12-19lib/Kconfig.debug: Set the minimum required pahole version to v1.22Ihor Solodrai-1/+0
Subsequent patches in the series change vmlinux linking scripts to unconditionally pass --btf_encode_detached to pahole, which was introduced in v1.22 [1][2]. This change allows to remove PAHOLE_HAS_SPLIT_BTF Kconfig option and other checks of older pahole versions. [1] https://github.com/acmel/dwarves/releases/tag/v1.22 [2] https://lore.kernel.org/bpf/cbafbf4e-9073-4383-8ee6-1353f9e5869c@oracle.com/ Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Nicolas Schier <nsc@kernel.org> Link: https://lore.kernel.org/bpf/20251219181825.1289460-1-ihor.solodrai@linux.dev
2025-07-31Merge tag 'sched_ext-for-6.17' of ↵Linus Torvalds-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - Add support for cgroup "cpu.max" interface - Code organization cleanup so that ext_idle.c doesn't depend on the source-file-inclusion build method of sched/ - Drop UP paths in accordance with sched core changes - Documentation and other misc changes * tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Fix scx_bpf_reenqueue_local() reference sched_ext: Drop kfuncs marked for removal in 6.15 sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic kernel/sched/ext.c: fix typo "occured" -> "occurred" in comments sched_ext: Add support for cgroup bandwidth control interface sched_ext, sched/core: Factor out struct scx_task_group sched_ext: Return NULL in llc_span sched_ext: Always use SMP versions in kernel/sched/ext_idle.h sched_ext: Always use SMP versions in kernel/sched/ext_idle.c sched_ext: Always use SMP versions in kernel/sched/ext.h sched_ext: Always use SMP versions in kernel/sched/ext.c sched_ext: Documentation: Clarify time slice handling in task lifecycle sched_ext: Make scx_locked_rq() inline sched_ext: Make scx_rq_bypassing() inline sched_ext: idle: Make local functions static in ext_idle.c sched_ext: idle: Remove unnecessary ifdef in scx_bpf_cpu_node()
2025-06-09docs/sched: Make the sched-stats documentation consistentSwapnil Sapkal-22/+31
pull_task(), the original function to move the task from src_rq to the dst_rq during load balancing was renamed to move_tasks() in commit ddcdf6e7d991 ("sched: Rename load-balancing fields") As a part of commit 163122b7fcfa ("sched/fair: Remove double_lock_balance() from load_balance()"), move_task() was broken down into detach_tasks() and attach_tasks() pair to avoid holding locks of both src_rq and dst_rq at the same time during load balancing. Despite the evolution of pull_task() over the years, the sched-stats documentation remained unchanged. Update the documentation to refer to detach_task() instead of pull_task() which is responsible for removing the task from the src_rq during load balancing. commit 1c055a0f5d3b ("sched: Move sched domain name out of CONFIG_SCHED_DEBUG") moves sched domain name out of CONFIG_SCHED_DEBUG. Update the documentation related to that. Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250430062559.1188661-1-swapnil.sapkal@amd.com
2025-06-09sched_deadline, docs: add affinity setting with cgroup2 cpuset controllerShashank Balaji-6/+23
Setting the cpu affinity mask of a SCHED_DEADLINE process using the cgroup v1 cpuset controller is already detailed. Add similar information for cgroup v2's cpuset controller. Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250527-sched-deadline-cpu-affinity-v2-2-b8b40a4feefa@sony.com
2025-06-09sched_deadline, docs: replace rt-app examples with chrt or use config.jsonShashank Balaji-22/+34
rt-app no longer accepts command-line arguments. So, replace rt-app example with chrt and use the JSON format in the other example instead of command- line arguments. Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250527-sched-deadline-cpu-affinity-v2-1-b8b40a4feefa@sony.com
2025-06-09sched_ext: Documentation: Clarify time slice handling in task lifecycleAndrea Righi-3/+8
It is not always obvious how a task's time slice can be refilled, either explicitly from ops.dispatch() or automatically by the sched_ext core, to skip subsequent ops.enqueue() and ops.dispatch() calls. This typically happens when the task is the only one running on a CPU. To make this behavior easier to understand, update the task lifecycle diagram to explicitly document how time slice handling works in such cases. Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-27Merge tag 'sched_ext-for-6.16' of ↵Linus Torvalds-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - More in-kernel idle CPU selection improvements. Expand topology awareness coverage add scx_bpf_select_cpu_and() to allow more flexibility. The idle CPU selection kfuncs can now be called from unlocked contexts too. - A bunch of reorganization changes to lay the foundation for multiple hierarchical scheduler support. This isn't ready yet and the included changes don't make meaningful behavior differences. One notable change is replacing some static_key tests with dynamic tests as the test results may differ depending on the scheduler instance. This isn't expected to cause meaningful performance difference. - Other minor and doc updates. - There were multiple patches in for-6.15-fixes which conflicted with changes in for-6.16. for-6.15-fixes were pulled three times into for-6.16 to resolve the conflicts. * tag 'sched_ext-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (49 commits) sched_ext: Call ops.update_idle() after updating builtin idle bits sched_ext, docs: convert mentions of "CFS" to "fair-class scheduler" selftests/sched_ext: Update test enq_select_cpu_fails sched_ext: idle: Consolidate default idle CPU selection kfuncs selftests/sched_ext: Add test for scx_bpf_select_cpu_and() via test_run sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context sched_ext: idle: Validate locking correctness in scx_bpf_select_cpu_and() sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c sched_ext, docs: add label sched_ext: Explain the temporary situation around scx_root dereferences sched_ext: Add @sch to SCX_CALL_OP*() sched_ext: Cleanup [__]scx_exit/error*() sched_ext: Add @sch to SCX_CALL_OP*() sched_ext: Clean up scx_root usages Documentation: scheduler: Changed lowercase acronyms to uppercase sched_ext: Avoid NULL scx_root deref in __scx_exit() sched_ext: Add RCU protection to scx_root in DSQ iterator sched_ext: Clean up SCX_EXIT_NONE handling in scx_disable_workfn() sched_ext: Move disable machinery into scx_sched sched_ext: Move event_stats_cpu into scx_sched ...
2025-05-22sched_ext, docs: convert mentions of "CFS" to "fair-class scheduler"Shashank Balaji-4/+4
Mentions of CFS are stale since the fair-class scheduler is implemented using EEVDF. So, convert such mentions to "fair-class scheduler" to stay algorithm-name agnostic. Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-21Documentation/scheduler: Fix typo in sched-stats domain field descriptionMadadi Vineeth Reddy-1/+1
Fixes a typo in the description of the 23rd field of the scheduling domain statistics, which was missing the word "cpu". Fixes: 7c8cd569ff66 ("docs: Update Schedstat version to 17") Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20250520100752.39921-1-vineethr@linux.ibm.com>
2025-05-20sched_ext, docs: add labelShashank Balaji-0/+2
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-13Documentation: scheduler: Changed lowercase acronyms to uppercaseJake Rice-2/+2
Everywhere else in this doc, the dispatch queue acronym (DSQ) is uppercase. There were a couple places where the acronym was written in lowercase. I changed them to uppercase to make it homogeneous. Signed-off-by: Jake Rice <jake@jakerice.dev> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-24Merge tag 'sched-core-2025-03-22' of ↵Linus Torvalds-8/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core & fair scheduler changes: - Cancel the slice protection of the idle entity (Zihan Zhou) - Reduce the default slice to avoid tasks getting an extra tick (Zihan Zhou) - Force propagating min_slice of cfs_rq when {en,de}queue tasks (Tianchen Ding) - Refactor can_migrate_task() to elimate looping (I Hsin Cheng) - Add unlikey branch hints to several system calls (Colin Ian King) - Optimize current_clr_polling() on certain architectures (Yujun Dong) Deadline scheduler: (Juri Lelli) - Remove redundant dl_clear_root_domain call - Move dl_rebuild_rd_accounting to cpuset.h Uclamp: - Use the uclamp_is_used() helper instead of open-coding it (Xuewen Yan) - Optimize sched_uclamp_used static key enabling (Xuewen Yan) Scheduler topology support: (Juri Lelli) - Ignore special tasks when rebuilding domains - Add wrappers for sched_domains_mutex - Generalize unique visiting of root domains - Rebuild root domain accounting after every update - Remove partition_and_rebuild_sched_domains - Stop exposing partition_sched_domains_locked RSEQ: (Michael Jeanson) - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y - Fix segfault on registration when rseq_cs is non-zero - selftests: Add rseq syscall errors test - selftests: Ensure the rseq ABI TLS is actually 1024 bytes Membarriers: - Fix redundant load of membarrier_state (Nysal Jan K.A.) Scheduler debugging: - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior) - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar) Fixes and cleanups: - Always save/restore x86 TSC sched_clock() on suspend/resume (Guilherme G. Piccoli) - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian Andrzej Siewior)" * tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling() sched/debug: Remove CONFIG_SCHED_DEBUG sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional sched/debug: Make 'const_debug' tunables unconditional __read_mostly sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() rseq/selftests: Fix namespace collision with rseq UAPI header include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h sched/topology: Stop exposing partition_sched_domains_locked cgroup/cpuset: Remove partition_and_rebuild_sched_domains sched/topology: Remove redundant dl_clear_root_domain call sched/deadline: Rebuild root domain accounting after every update sched/deadline: Generalize unique visiting of root domains sched/topology: Wrappers for sched_domains_mutex sched/deadline: Ignore special tasks when rebuilding domains tracing: Use preempt_model_str() xtensa: Rely on generic printing of preemption model x86: Rely on generic printing of preemption model s390: Rely on generic printing of preemption model ...
2025-03-24Merge tag 'docs-6.15' of git://git.lwn.net/linuxLinus Torvalds-1/+1
Pull documentation updates from Jonathan Corbet: "It has been a reasonably busy cycle for docs... - Significant changes throughout the tree to bring Python code up to current standards and raise the minimum Python required to 3.9 Much of this is preparatory to replacing the ancient Perl scripts/kernel-doc horror with a slightly less horrifying Python implementation, expected for 6.16 - Update the minimum Sphinx required to 3.4.3, allowing us to remove a bunch of older compatibility code - Rework and improve the generation of the ABI documentation (All of the above done by Mauro) - Lots of translation updates. Alex Shi and Yanteng Si are taking on responsibility for the Chinese translations going forward; that work will still get to you via docs-next - Try to standardize the format for indicating a developer's affiliation in commit tags - Clarify the TAB's role in CoC enforcement actions - Try to spell out the rules for when a commit tag can name another developer without their explicit permission Plus lots of other typo fixes and updates" * tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits) docs/zh_CN: fix spelling mistake docs/Chinese: change the disclaimer words docs/zh_CN: Add snp-tdx-threat-model index Chinese translation docs: driver-api: firmware: clarify userspace requirements docs: clarify rules wrt tagging other people docs: Remove outdated highuid.rst documentation Documentation: dma-buf: heaps: Add heap name definitions docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README docs: Correct installation instruction Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst Documentation/CoC: Spell out the TAB role in enforcement decisions Documentation: ocxl.rst: Update consortium site scripts: get_feat.pl: substitute s390x with s390 scripts/kernel-doc: drop dead code for Wcontents_before_sections scripts/kernel-doc: don't add not needed new lines docs: driver-api/infiniband.rst: fix Kerneldoc markup drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup drivers: media: intel-ipu3.h: fix identation on a kernel-doc markup include/asm-generic/io.h: fix kerneldoc markup Docs/arch/arm64: Fix spelling in amu.rst ...
2025-03-24Merge tag 'sched_ext-for-6.15' of ↵Linus Torvalds-0/+36
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - Add mechanism to count and report internal events. This significantly improves visibility on subtle corner conditions. - The default idle CPU selection logic is revamped and improved in multiple ways including being made topology aware. - sched_ext was disabling ttwu_queue for simplicity, which can be costly when hardware topology is more complex. Implement SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively enable ttwu_queue. - tools/sched_ext updates to improve compatibility among others. - Other misc updates and fixes. - sched_ext/for-6.14-fixes were pulled a few times to receive prerequisite fixes and resolve conflicts. * tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits) sched_ext: idle: Refactor scx_select_cpu_dfl() sched_ext: idle: Honor idle flags in the built-in idle selection policy sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local() sched_ext: Add trace point to track sched_ext core events sched_ext: Change the event type from u64 to s64 sched_ext: Documentation: add task lifecycle summary tools/sched_ext: Provide a compatible helper for scx_bpf_events() selftests/sched_ext: Add NUMA-aware scheduler test tools/sched_ext: Provide consistent access to scx flags sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior sched_ext: idle: Introduce scx_bpf_nr_node_ids() sched_ext: idle: Introduce node-aware idle cpu kfunc helpers sched_ext: idle: Per-node idle cpumasks sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE sched_ext: idle: Make idle static keys private sched/topology: Introduce for_each_node_numadist() iterator mm/numa: Introduce nearest_node_nodemask() nodemask: numa: reorganize inclusion path nodemask: add nodes_copy() tools/sched_ext: Sync with scx repo ...
2025-03-19sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from ↵Ingo Molnar-8/+6
documentation Since it's enabled unconditionally now, remove all references to it. (Left out languages I cannot read.) Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250317104257.3496611-5-mingo@kernel.org
2025-03-06sched/rt: Update limit of sched_rt sysctl in documentationShrikanth Hegde-0/+3
By default fair_server dl_server allocates 5% of the bandwidth to the root domain. Due to this writing any value less than 5% fails due to -EBUSY: $ cat /proc/sys/kernel/sched_rt_period_us 1000000 $ echo 49999 > /proc/sys/kernel/sched_rt_runtime_us -bash: echo: write error: Device or resource busy $ echo 50000 > /proc/sys/kernel/sched_rt_runtime_us $ Since the sched_rt_runtime_us allows -1 as the minimum, put this restriction in the documentation. One should check average of runtime/period in /sys/kernel/debug/sched/fair_server/cpuX/* for exact value. Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20250306052954.452005-3-sshegde@linux.ibm.com
2025-02-27sched_ext: Documentation: add task lifecycle summaryAndrea Righi-0/+36
Understanding the lifecycle of a task in sched_ext can be not trivial, therefore add a section to the main documentation that summarizes the entire workflow of a task using pseudo-code. Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-21docs: scheduler: fix spelling in sched-bwc documentationSuchit Karunakaran-1/+1
Fix spelling of "interference" in the CFS bandwidth control documentation. The word was misspelled as "interferenece". Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250219112254.28691-1-suchitkarunakaran@gmail.com
2025-01-23Merge tag 'sched_ext-for-6.14' of ↵Linus Torvalds-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - scx_bpf_now() added so that BPF scheduler can access the cached timestamp in struct rq to avoid reading TSC multiple times within a locked scheduling operation. - Minor updates to the built-in idle CPU selection logic. - tool/sched_ext updates and other misc changes. * tag 'sched_ext-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: fix kernel-doc warnings sched_ext: Use time helpers in BPF schedulers sched_ext: Replace bpf_ktime_get_ns() to scx_bpf_now() sched_ext: Add time helpers for BPF schedulers sched_ext: Add scx_bpf_now() for BPF scheduler sched_ext: Implement scx_bpf_now() sched_ext: Relocate scx_enabled() related code sched_ext: Add option -l in selftest runner to list all available tests sched_ext: Include remaining task time slice in error state dump sched_ext: update scx_bpf_dsq_insert() doc for SCX_DSQ_LOCAL_ON sched_ext: idle: small CPU iteration refactoring sched_ext: idle: introduce check_builtin_idle_enabled() helper sched_ext: idle: clarify comments sched_ext: idle: use assign_cpu() to update the idle cpumask sched_ext: Use str_enabled_disabled() helper in update_selcpu_topology() sched_ext: Use sizeof_field for key_len in dsq_hash_params tools/sched_ext: Receive updates from SCX repo sched_ext: Use the NUMA scheduling domain for NUMA optimizations
2025-01-21Merge tag 'docs-6.14' of git://git.lwn.net/linuxLinus Torvalds-13/+14
Pull Documentation updates from Jonathan Corbet: - Quite a bit of Chinese and Spanish translation work - Clarifying that Git commit IDs >12chars are OK - A new nvme-multipath document - A reorganization of the admin-guide top-level page to make it readable - Clarification of the role of Acked-by and maintainer discretion on their acceptance - Some reorganization of debugging-oriented docs ... and typo fixes, documentation updates, etc as usual * tag 'docs-6.14' of git://git.lwn.net/linux: (50 commits) Documentation: Fix x86_64 UEFI outdated references to elilo Documentation/sysctl: Add timer_migration to kernel.rst docs/mm: Physical memory: Remove zone_t docs: submitting-patches: clarify that signers may use their discretion on tags docs: submitting-patches: clarify difference between Acked-by and Reviewed-by docs: submitting-patches: clarify Acked-by and introduce "# Suffix" Documentation: bug-hunting.rst: remove odd contact information docs/zh_CN: Add sak index Chinese translation doc: module: DEFAULT_SYMBOL_NAMESPACE must be defined before #includes doc: module: Fix documented type of namespace Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst docs/zh_CN: Add landlock index Chinese translation Documentation: Fix typo localmodonfig -> localmodconfig overlayfs.rst: Fix and improve grammar docs/zh_CN: Add siphash index Chinese translation docs/zh_CN: Add security IMA-templates Chinese translation docs/zh_CN: Add security digsig Chinese translation Align git commit ID abbreviation guidelines and checks docs: process: submitting-patches: split canonical patch format section docs/zh_CN: Add security lsm Chinese translation ...
2025-01-06sched_ext: update scx_bpf_dsq_insert() doc for SCX_DSQ_LOCAL_ONAndrea Righi-3/+3
With commit 5b26f7b920f7 ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct dispatches"), scx_bpf_dsq_insert() can use SCX_DSQ_LOCAL_ON for direct dispatch from ops.enqueue() to target the local DSQ of any CPU. Update the documentation accordingly. Fixes: 5b26f7b920f7 ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct dispatches") Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-12-20docs: Update Schedstat version to 17Swapnil Sapkal-51/+75
Update the Schedstat version to 17 as more fields are added to report different kinds of imbalances in the sched domain. Also domain field started printing corresponding domain name. Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241220063224.17767-7-swapnil.sapkal@amd.com
2024-12-13Documentation: sched/RT: Update paragraphs about RT bandwidth controlMichal Koutný-10/+11
This has slightly changed with the introduction of fair_server. Update the most relevant parts. Link: https://lore.kernel.org/r/Z0c8S8i3qt7SEU14@jlelli-thinkpadt14gen4.remote.csb/ Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241211170052.2449581-1-mkoutny@suse.com
2024-12-11Documentation: remove :kyb: tagsCengiz Can-3/+3
:kyb: is an extra markup that we should avoid when we can. It worsens the plain-text reading experience and adds very little value to rendered views. Remove all :kbd: tags from Documentation/* Signed-off-by: Cengiz Can <cengiz@kernel.wtf> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241202090514.1716-1-cengiz@kernel.wtf
2024-11-11sched_ext: Rename scx_bpf_consume() to scx_bpf_dsq_move_to_local()Tejun Heo-11/+10
In sched_ext API, a repeatedly reported pain point is the overuse of the verb "dispatch" and confusion around "consume": - ops.dispatch() - scx_bpf_dispatch[_vtime]() - scx_bpf_consume() - scx_bpf_dispatch[_vtime]_from_dsq*() This overloading of the term is historical. Originally, there were only built-in DSQs and moving a task into a DSQ always dispatched it for execution. Using the verb "dispatch" for the kfuncs to move tasks into these DSQs made sense. Later, user DSQs were added and scx_bpf_dispatch[_vtime]() updated to be able to insert tasks into any DSQ. The only allowed DSQ to DSQ transfer was from a non-local DSQ to a local DSQ and this operation was named "consume". This was already confusing as a task could be dispatched to a user DSQ from ops.enqueue() and then the DSQ would have to be consumed in ops.dispatch(). Later addition of scx_bpf_dispatch_from_dsq*() made the confusion even worse as "dispatch" in this context meant moving a task to an arbitrary DSQ from a user DSQ. Clean up the API with the following renames: 1. scx_bpf_dispatch[_vtime]() -> scx_bpf_dsq_insert[_vtime]() 2. scx_bpf_consume() -> scx_bpf_dsq_move_to_local() 3. scx_bpf_dispatch[_vtime]_from_dsq*() -> scx_bpf_dsq_move[_vtime]*() This patch performs the second rename. Compatibility is maintained by: - The previous kfunc names are still provided by the kernel so that old binaries can run. Kernel generates a warning when the old names are used. - compat.bpf.h provides wrappers for the new names which automatically fall back to the old names when running on older kernels. They also trigger build error if old names are used for new builds. The compat features will be dropped after v6.15. v2: Comment and documentation updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com> Acked-by: Johannes Bechberger <me@mostlynerdless.de> Acked-by: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Dan Schatzberg <dschatzberg@meta.com> Cc: Ming Yang <yougmark94@gmail.com>
2024-11-11sched_ext: Rename scx_bpf_dispatch[_vtime]() to scx_bpf_dsq_insert[_vtime]()Tejun Heo-25/+25
In sched_ext API, a repeatedly reported pain point is the overuse of the verb "dispatch" and confusion around "consume": - ops.dispatch() - scx_bpf_dispatch[_vtime]() - scx_bpf_consume() - scx_bpf_dispatch[_vtime]_from_dsq*() This overloading of the term is historical. Originally, there were only built-in DSQs and moving a task into a DSQ always dispatched it for execution. Using the verb "dispatch" for the kfuncs to move tasks into these DSQs made sense. Later, user DSQs were added and scx_bpf_dispatch[_vtime]() updated to be able to insert tasks into any DSQ. The only allowed DSQ to DSQ transfer was from a non-local DSQ to a local DSQ and this operation was named "consume". This was already confusing as a task could be dispatched to a user DSQ from ops.enqueue() and then the DSQ would have to be consumed in ops.dispatch(). Later addition of scx_bpf_dispatch_from_dsq*() made the confusion even worse as "dispatch" in this context meant moving a task to an arbitrary DSQ from a user DSQ. Clean up the API with the following renames: 1. scx_bpf_dispatch[_vtime]() -> scx_bpf_dsq_insert[_vtime]() 2. scx_bpf_consume() -> scx_bpf_dsq_move_to_local() 3. scx_bpf_dispatch[_vtime]_from_dsq*() -> scx_bpf_dsq_move[_vtime]*() This patch performs the first set of renames. Compatibility is maintained by: - The previous kfunc names are still provided by the kernel so that old binaries can run. Kernel generates a warning when the old names are used. - compat.bpf.h provides wrappers for the new names which automatically fall back to the old names when running on older kernels. They also trigger build error if old names are used for new builds. The compat features will be dropped after v6.15. v2: Documentation updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com> Acked-by: Johannes Bechberger <me@mostlynerdless.de> Acked-by: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Dan Schatzberg <dschatzberg@meta.com> Cc: Ming Yang <yougmark94@gmail.com>
2024-10-08sched_ext: Documentation: Update instructions for running example schedulersDevaansh-Kumar-1/+1
Since the artifact paths for tools changed, we need to update the documentation to reflect that path. Signed-off-by: Devaansh-Kumar <devaanshk840@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-23sched_ext: Provide a sysfs enable_seq counterAndrea Righi-0/+10
As discussed during the distro-centric session within the sched_ext Microconference at LPC 2024, introduce a sequence counter that is incremented every time a BPF scheduler is loaded. This feature can help distributions in diagnosing potential performance regressions by identifying systems where users are running (or have ran) custom BPF schedulers. Example: arighi@virtme-ng~> cat /sys/kernel/sched_ext/enable_seq 0 arighi@virtme-ng~> sudo scx_simple local=1 global=0 ^CEXIT: unregistered from user space arighi@virtme-ng~> cat /sys/kernel/sched_ext/enable_seq 1 In this way user-space tools (such as Ubuntu's apport and similar) are able to gather and include this information in bug reports. Cc: Giovanni Gherdovich <giovanni.gherdovich@suse.com> Cc: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Cc: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Cc: Phil Auld <pauld@redhat.com> Signed-off-by: Andrea Righi <andrea.righi@linux.dev> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-21Merge tag 'sched_ext-for-6.12' of ↵Linus Torvalds-0/+317
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext support from Tejun Heo: "This implements a new scheduler class called ‘ext_sched_class’, or sched_ext, which allows scheduling policies to be implemented as BPF programs. The goals of this are: - Ease of experimentation and exploration: Enabling rapid iteration of new scheduling policies. - Customization: Building application-specific schedulers which implement policies that are not applicable to general-purpose schedulers. - Rapid scheduler deployments: Non-disruptive swap outs of scheduling policies in production environments" See individual commits for more documentation, but also the cover letter for the latest series: Link: https://lore.kernel.org/all/20240618212056.2833381-1-tj@kernel.org/ * tag 'sched_ext-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (110 commits) sched: Move update_other_load_avgs() to kernel/sched/pelt.c sched_ext: Don't trigger ops.quiescent/runnable() on migrations sched_ext: Synchronize bypass state changes with rq lock scx_qmap: Implement highpri boosting sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq() sched_ext: Compact struct bpf_iter_scx_dsq_kern sched_ext: Replace consume_local_task() with move_local_task_to_local_dsq() sched_ext: Move consume_local_task() upward sched_ext: Move sanity check and dsq_mod_nr() into task_unlink_from_dsq() sched_ext: Reorder args for consume_local/remote_task() sched_ext: Restructure dispatch_to_local_dsq() sched_ext: Fix processs_ddsp_deferred_locals() by unifying DTL_INVALID handling sched_ext: Make find_dsq_for_dispatch() handle SCX_DSQ_LOCAL_ON sched_ext: Refactor consume_remote_task() sched_ext: Rename scx_kfunc_set_sleepable to unlocked and relocate sched_ext: Add missing static to scx_dump_data sched_ext: Add missing static to scx_has_op[] sched_ext: Temporarily work around pick_task_scx() being called without balance_scx() sched_ext: Add a cgroup scheduler which uses flattened hierarchy sched_ext: Add cgroup support ...
2024-09-19Merge tag 'sched-core-2024-09-19' of ↵Linus Torvalds-8/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot de Oliveira's last major contribution to the kernel: "SCHED_DEADLINE servers can help fixing starvation issues of low priority tasks (e.g., SCHED_OTHER) when higher priority tasks monopolize CPU cycles. Today we have RT Throttling; DEADLINE servers should be able to replace and improve that." (Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes, Youssef Esmat, Huang Shijie) - Preparatory changes for sched_ext integration: - Use set_next_task(.first) where required - Fix up set_next_task() implementations - Clean up DL server vs. core sched - Split up put_prev_task_balance() - Rework pick_next_task() - Combine the last put_prev_task() and the first set_next_task() - Rework dl_server - Add put_prev_task(.next) (Peter Zijlstra, with a fix by Tejun Heo) - Complete the EEVDF transition and refine EEVDF scheduling: - Implement delayed dequeue - Allow shorter slices to wakeup-preempt - Use sched_attr::sched_runtime to set request/slice suggestion - Document the new feature flags - Remove unused and duplicate-functionality fields - Simplify & unify pick_next_task_fair() - Misc debuggability enhancements (Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann, Valentin Schneider and Chuyi Zhou) - Initialize the vruntime of a new task when it is first enqueued, resulting in significant decrease in latency of newly woken tasks (Zhang Qiao) - Introduce SM_IDLE and an idle re-entry fast-path in __schedule() (K Prateek Nayak, Peter Zijlstra) - Clean up and clarify the usage of Clean up usage of rt_task() (Qais Yousef) - Preempt SCHED_IDLE entities in strict cgroup hierarchies (Tianchen Ding) - Clarify the documentation of time units for deadline scheduler parameters (Christian Loehle) - Remove the HZ_BW chicken-bit feature flag introduced a year ago, the original change seems to be working fine (Phil Auld) - Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie, Peilin He, Qais Yousefm and Vincent Guittot) * tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) sched/cpufreq: Use NSEC_PER_MSEC for deadline task cpufreq/cppc: Use NSEC_PER_MSEC for deadline task sched/deadline: Clarify nanoseconds in uapi sched/deadline: Convert schedtool example to chrt sched/debug: Fix the runnable tasks output sched: Fix sched_delayed vs sched_core kernel/sched: Fix util_est accounting for DELAY_DEQUEUE kthread: Fix task state in kthread worker if being frozen sched/pelt: Use rq_clock_task() for hw_pressure sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule() sched: Add put_prev_task(.next) sched: Rework dl_server sched: Combine the last put_prev_task() and the first set_next_task() sched: Rework pick_next_task() sched: Split up put_prev_task_balance() sched: Clean up DL server vs core sched sched: Fixup set_next_task() implementations sched: Use set_next_task(.first) where required sched/fair: Properly deactivate sched_delayed task upon class change ...
2024-09-11Merge branch 'tip/sched/core' into sched_ext/for-6.12Tejun Heo-8/+6
Pull in tip/sched/core to resolve two merge conflicts: - 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()") 5d871a63997f ("sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c") A simple context conflict. The former added __update_blocked_others() in the same #ifdef CONFIG_SMP block that effective_cpu_util() and sched_cpu_util() are in and the latter moved those functions to fair.c. This makes __update_blocked_others() more out of place. Will follow up with a patch to relocate. - 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()") 84d265281d6c ("sched/pelt: Use rq_clock_task() for hw_pressure") The former factored out the body of __update_blocked_others() into update_other_load_avgs(). The latter changed how update_hw_load_avg() is called in the body. Resolved by applying the change to update_other_load_avgs() instead. Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-11sched/deadline: Convert schedtool example to chrtChristian Loehle-8/+6
chrt has SCHED_DEADLINE support so convert the example instead of relying on a schedtool fork. While at it fix the wrong mentioning of microseconds, it was nanoseconds for both schedtool and chrt. Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20240813144348.1180344-2-christian.loehle@arm.com
2024-09-05docs: scheduler: completion: Update member of struct completionI Hsin Cheng-1/+1
The member "wait" in struct completion isn't of type wait_queue_head_t anymore, as it is now "struct swait_queue_head", fix it to match with the current implementation. Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240828165036.178011-1-richard120310@gmail.com
2024-08-07docs: scheduler: Start documenting the EEVDF schedulerCarlos Bilbao-4/+50
Add some documentation regarding the newly introduced scheduler EEVDF. Signed-off-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240720002207.444286-2-carlos.bilbao.osdev@gmail.com
2024-07-30Merge tag 'v6.11-rc1' into for-6.12Tejun Heo-0/+2
Linux 6.11-rc1
2024-07-09docs/sp_SP: Add translation for scheduler/sched-design-CFS.rstSergio González Collado-0/+2
Translate Documentation/scheduler/sched-design-CFS.rst into Spanish Signed-off-by: Sergio González Collado <sergio.collado@gmail.com> Reviewed-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240707195047.14359-1-sergio.collado@gmail.com
2024-07-02sched_ext: Documentation: Remove mentions of scx_bpf_switch_allAboorva Devarajan-11/+13
Updated sched_ext doc to eliminate references to scx_bpf_switch_all, which has been removed in recent sched_ext versions. Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-06-18sched_ext: Documentation: scheduler: Document extensible scheduler classTejun Heo-0/+315
Add Documentation/scheduler/sched-ext.rst which gives a high-level overview and pointers to the examples. v6: - Add paragraph explaining debug dump. v5: - Updated to reflect /sys/kernel interface change. Kconfig options added. v4: - README improved, reformatted in markdown and renamed to README.md. v3: - Added tools/sched_ext/README. - Dropped _example prefix from scheduler names. v2: - Apply minor edits suggested by Bagas. Caveats section dropped as all of them are addressed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Vernet <dvernet@meta.com> Acked-by: Josh Don <joshdon@google.com> Acked-by: Hao Luo <haoluo@google.com> Acked-by: Barret Rhoden <brho@google.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com>
2024-03-25Merge tag 'v6.9-rc1' into sched/core, to pick up fixes and to refresh the branchIngo Molnar-0/+43
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-03-24Merge tag 'sched-urgent-2024-03-24' of ↵Linus Torvalds-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler doc clarification from Thomas Gleixner: "A single update for the documentation of the base_slice_ns tunable to clarify that any value which is less than the tick slice has no effect because the scheduler tick is not guaranteed to happen within the set time slice" * tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
2024-03-21sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relationMukesh Kumar Chaurasiya-0/+3
The tunable base_slice_ns is dependent on CONFIG_HZ (i.e. TICK_NSEC) for any significant performance improvement. The reason being the scheduler tick is not frequent enough to force preemption when base_slice expires in case of: base_slice_ns < TICK_NSEC The below data is of stress-ng: Number of CPU: 1 Stressor threads: 4 Time: 30sec On CONFIG_HZ=1000 | base_slice | avg-run (msec) | context-switches | | ---------- | -------------- | ---------------- | | 3ms | 2.914 | 10342 | | 6ms | 4.857 | 6196 | | 9ms | 6.754 | 4482 | | 12ms | 7.872 | 3802 | | 22ms | 11.294 | 2710 | | 32ms | 13.425 | 2284 | On CONFIG_HZ=100 | base_slice | avg-run (msec) | context-switches | | ---------- | -------------- | ---------------- | | 3ms | 9.144 | 3337 | | 6ms | 9.113 | 3301 | | 9ms | 8.991 | 3315 | | 12ms | 12.935 | 2328 | | 22ms | 16.031 | 1915 | | 32ms | 18.608 | 1622 | base_slice: the value of base_slice in ms avg-run (msec): average time of the stressor threads got on cpu before it got preempted context-switches: number of context switches for the stress-ng process Signed-off-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240320173815.927637-2-mchauras@linux.ibm.com
2024-03-12sched/balancing: Rename load_balance() => sched_balance_rq()Ingo Molnar-18/+18
Standardize scheduler load-balancing function names on the sched_balance_() prefix. Also load_balance() has become somewhat of a misnomer: historically it was the first and primary load-balancing function that was called, but with the introduction of sched domains, it's become a lower layer function that balances runqueues. Rename it to sched_balance_rq() accordingly. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-6-mingo@kernel.org
2024-03-12sched/balancing: Rename rebalance_domains() => sched_balance_domains()Ingo Molnar-1/+1
Standardize scheduler load-balancing function names on the sched_balance_() prefix. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-5-mingo@kernel.org
2024-03-12sched/balancing: Rename trigger_load_balance() => sched_balance_trigger()Ingo Molnar-1/+1
Standardize scheduler load-balancing function names on the sched_balance_() prefix. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-4-mingo@kernel.org
2024-03-12sched/balancing: Rename scheduler_tick() => sched_tick()Ingo Molnar-2/+2
- Standardize on prefixing scheduler-internal functions defined in <linux/sched.h> with sched_*() prefix. scheduler_tick() was the only function using the scheduler_ prefix. Harmonize it. - The other reason to rename it is the NOHZ scheduler tick handling functions are already named sched_tick_*(). Make the 'git grep sched_tick' more meaningful. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-3-mingo@kernel.org
2024-03-12sched/balancing: Rename run_rebalance_domains() => sched_balance_softirq()Ingo Molnar-1/+1
run_rebalance_domains() is a misnomer, as it doesn't only run rebalance_domains(), but since the introduction of the NOHZ code it also runs nohz_idle_balance(). Rename it to sched_balance_softirq(), reflecting its more generic purpose and that it's a softirq handler. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20240308111819.1101550-2-mingo@kernel.org