<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sched, branch v6.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2022-09-09T19:08:40Z</updated>
<entry>
<title>Merge tag 'driver-core-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core</title>
<updated>2022-09-09T19:08:40Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-09-09T19:08:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e35be05d748a1b82c0bd3f62dafbad859a3bd027'/>
<id>urn:sha1:e35be05d748a1b82c0bd3f62dafbad859a3bd027</id>
<content type='text'>
Pull driver core fixes from Greg KH:
 "Here are some small driver core and debugfs fixes for 6.0-rc5.

  Included in here are:

   - multiple attempts to get the arch_topology code to work properly on
     non-cluster SMT systems. First attempt caused build breakages in
     linux-next and 0-day, second try worked.

   - debugfs fixes for a long-suffering memory leak. The pattern of
     debugfs_remove(debugfs_lookup(...)) turns out to leak dentries, so
     add debugfs_lookup_and_remove() to fix this problem. Also fix up
     the scheduler debug code that highlighted this problem. Fixes for
     other subsystems will be trickling in over the next few months for
     this same issue once the debugfs function is merged.

  All of these have been in linux-next since Wednesday with no reported
  problems"

* tag 'driver-core-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  arch_topology: Make cluster topology span at least SMT CPUs
  sched/debug: fix dentry leak in update_sched_domain_debugfs
  debugfs: add debugfs_lookup_and_remove()
  driver core: fix driver_set_override() issue with empty strings
  Revert "arch_topology: Make cluster topology span at least SMT CPUs"
  arch_topology: Make cluster topology span at least SMT CPUs
</content>
</entry>
<entry>
<title>sched/debug: fix dentry leak in update_sched_domain_debugfs</title>
<updated>2022-09-05T11:02:38Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2022-09-02T12:31:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c2e406596571659451f4b95e37ddfd5a8ef1d0dc'/>
<id>urn:sha1:c2e406596571659451f4b95e37ddfd5a8ef1d0dc</id>
<content type='text'>
Kuyo reports that the pattern of using debugfs_remove(debugfs_lookup())
leaks a dentry and with a hotplug stress test, the machine eventually
runs out of memory.

Fix this up by using the newly created debugfs_lookup_and_remove() call
instead which properly handles the dentry reference counting logic.

Cc: Major Chen &lt;major.chen@samsung.com&gt;
Cc: stable &lt;stable@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Cc: Valentin Schneider &lt;vschneid@redhat.com&gt;
Cc: Matthias Brugger &lt;matthias.bgg@gmail.com&gt;
Reported-by: Kuyo Chang &lt;kuyo.chang@mediatek.com&gt;
Tested-by: Kuyo Chang &lt;kuyo.chang@mediatek.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20220902123107.109274-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>wait_on_bit: add an acquire memory barrier</title>
<updated>2022-08-26T16:30:25Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2022-08-26T13:17:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8238b4579866b7c1bb99883cfe102a43db5506ff'/>
<id>urn:sha1:8238b4579866b7c1bb99883cfe102a43db5506ff</id>
<content type='text'>
There are several places in the kernel where wait_on_bit is not followed
by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read).

On architectures with weak memory ordering, it may happen that memory
accesses that follow wait_on_bit are reordered before wait_on_bit and
they may return invalid data.

Fix this class of bugs by introducing a new function "test_bit_acquire"
that works like test_bit, but has acquire memory ordering semantics.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/psi: Remove unused parameter nbytes of psi_trigger_create()</title>
<updated>2022-08-15T22:35:25Z</updated>
<author>
<name>Hao Jia</name>
<email>jiahao.os@bytedance.com</email>
</author>
<published>2022-08-06T12:05:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=76b079ef4cc954fc2c2e0333a01855b0b2b6bdee'/>
<id>urn:sha1:76b079ef4cc954fc2c2e0333a01855b0b2b6bdee</id>
<content type='text'>
psi_trigger_create()'s 'nbytes' parameter is not used, so we can remove it.

Signed-off-by: Hao Jia &lt;jiahao.os@bytedance.com&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/psi: Zero the memory of struct psi_group</title>
<updated>2022-08-15T22:35:13Z</updated>
<author>
<name>Hao Jia</name>
<email>jiahao.os@bytedance.com</email>
</author>
<published>2022-08-06T12:05:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2b97cf76289a4fcae66d7959b0d74a87207d7068'/>
<id>urn:sha1:2b97cf76289a4fcae66d7959b0d74a87207d7068</id>
<content type='text'>
After commit 5f69a6577bc3 ("psi: dont alloc memory for psi by default"),
the memory used by struct psi_group is no longer allocated and zeroed
in cgroup_create().

Since the memory of struct psi_group is not zeroed, the data in this
memory is random, which will lead to inaccurate psi statistics when
creating a new cgroup.

So we use kzlloc() to allocate and zero the struct psi_group and
remove the redundant zeroing in group_init().

Steps to reproduce:
1. Use cgroup v2 and enable CONFIG_PSI
2. Create a new cgroup, and query psi statistics
mkdir /sys/fs/cgroup/test
cat /sys/fs/cgroup/test/cpu.pressure
some avg10=0.00 avg60=0.00 avg300=47927752200.00 total=12884901
full avg10=561815124.00 avg60=125835394188.00 avg300=1077090462000.00 total=10273561772

cat /sys/fs/cgroup/test/io.pressure
some avg10=1040093132823.95 avg60=1203770351379.21 avg300=3862252669559.46 total=4294967296
full avg10=921884564601.39 avg60=0.00 avg300=1984507298.35 total=442381631

cat /sys/fs/cgroup/test/memory.pressure
some avg10=232476085778.11 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=2585658472280.57 total=12884901

Fixes: commit 5f69a6577bc3 ("psi: dont alloc memory for psi by default")
Signed-off-by: Hao Jia &lt;jiahao.os@bytedance.com&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-08-07T00:34:06Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-08-07T00:34:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cac03ac368fabff0122853de2422d4e17a32de08'/>
<id>urn:sha1:cac03ac368fabff0122853de2422d4e17a32de08</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Various fixes: a deadline scheduler fix, a migration fix, a Sparse fix
  and a comment fix"

* tag 'sched-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Do not requeue task on CPU excluded from cpus_mask
  sched/rt: Fix Sparse warnings due to undefined rt.c declarations
  exit: Fix typo in comment: s/sub-theads/sub-threads
  sched, cpuset: Fix dl_cpu_busy() panic due to empty cs-&gt;cpus_allowed
</content>
</entry>
<entry>
<title>sched/core: Do not requeue task on CPU excluded from cpus_mask</title>
<updated>2022-08-04T09:26:13Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2022-08-04T09:21:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=751d4cbc43879229dbc124afefe240b70fd29a85'/>
<id>urn:sha1:751d4cbc43879229dbc124afefe240b70fd29a85</id>
<content type='text'>
The following warning was triggered on a large machine early in boot on
a distribution kernel but the same problem should also affect mainline.

   WARNING: CPU: 439 PID: 10 at ../kernel/workqueue.c:2231 process_one_work+0x4d/0x440
   Call Trace:
    &lt;TASK&gt;
    rescuer_thread+0x1f6/0x360
    kthread+0x156/0x180
    ret_from_fork+0x22/0x30
    &lt;/TASK&gt;

Commit c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p-&gt;on_cpu")
optimises ttwu by queueing a task that is descheduling on the wakelist,
but does not check if the task descheduling is still allowed to run on that CPU.

In this warning, the problematic task is a workqueue rescue thread which
checks if the rescue is for a per-cpu workqueue and running on the wrong CPU.
While this is early in boot and it should be possible to create workers,
the rescue thread may still used if the MAYDAY_INITIAL_TIMEOUT is reached
or MAYDAY_INTERVAL and on a sufficiently large machine, the rescue
thread is being used frequently.

Tracing confirmed that the task should have migrated properly using the
stopper thread to handle the migration. However, a parallel wakeup from udev
running on another CPU that does not share CPU cache observes p-&gt;on_cpu and
uses task_cpu(p), queues the task on the old CPU and triggers the warning.

Check that the wakee task that is descheduling is still allowed to run
on its current CPU and if not, wait for the descheduling to complete
and select an allowed CPU.

Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p-&gt;on_cpu")
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20220804092119.20137-1-mgorman@techsingularity.net
</content>
</entry>
<entry>
<title>Merge tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2022-08-03T16:45:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-08-03T16:45:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b6bb70f9ab80a11161252bf217993d2c40ea5eb2'/>
<id>urn:sha1:b6bb70f9ab80a11161252bf217993d2c40ea5eb2</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:
 "Several core optimizations:

   - threadgroup_rwsem write locking is skipped when configuring
     controllers in empty subtrees.

     Combined with CLONE_INTO_CGROUP, this allows the common static
     usage pattern to not grab threadgroup_rwsem at all (glibc still
     doesn't seem ready for CLONE_INTO_CGROUP unfortunately).

   - threadgroup_rwsem used to be put into non-percpu mode by default
     due to latency concerns in specific use cases. There's no reason
     for everyone else to pay for it. Make the behavior optional.

   - psi no longer allocates memory when disabled.

  ... along with some code cleanups"

* tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Skip subtree root in cgroup_update_dfl_csses()
  cgroup: remove "no" prefixed mount options
  cgroup: Make !percpu threadgroup_rwsem operations optional
  cgroup: Add "no" prefixed mount options
  cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
  cgroup.c: remove redundant check for mixable cgroup in cgroup_migrate_vet_dst
  cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes
  psi: dont alloc memory for psi by default
</content>
</entry>
<entry>
<title>sched/rt: Fix Sparse warnings due to undefined rt.c declarations</title>
<updated>2022-08-03T09:22:37Z</updated>
<author>
<name>Ben Dooks</name>
<email>ben-linux@fluff.org</email>
</author>
<published>2022-07-21T14:51:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=87514b2c24f294c32e9e743b095541dcf43928f7'/>
<id>urn:sha1:87514b2c24f294c32e9e743b095541dcf43928f7</id>
<content type='text'>
There are several symbols defined in kernel/sched/sched.h but get wrapped
in CONFIG_CGROUP_SCHED, even though dummy versions get built in rt.c and
therefore trigger Sparse warnings:

  kernel/sched/rt.c:309:6: warning: symbol 'unregister_rt_sched_group' was not declared. Should it be static?
  kernel/sched/rt.c:311:6: warning: symbol 'free_rt_sched_group' was not declared. Should it be static?
  kernel/sched/rt.c:313:5: warning: symbol 'alloc_rt_sched_group' was not declared. Should it be static?

Fix this by moving them outside the CONFIG_CGROUP_SCHED block.

[ mingo: Refreshed to the latest scheduler tree, tweaked changelog. ]

Signed-off-by: Ben Dooks &lt;ben-linux@fluff.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20220721145155.358366-1-ben-linux@fluff.org
</content>
</entry>
<entry>
<title>sched, cpuset: Fix dl_cpu_busy() panic due to empty cs-&gt;cpus_allowed</title>
<updated>2022-08-03T08:34:26Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2022-08-03T01:54:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b6e8d40d43ae4dec00c8fea2593eeea3114b8f44'/>
<id>urn:sha1:b6e8d40d43ae4dec00c8fea2593eeea3114b8f44</id>
<content type='text'>
With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating
that the cpuset will just use the effective CPUs of its parent. So
cpuset_can_attach() can call task_can_attach() with an empty mask.
This can lead to cpumask_any_and() returns nr_cpu_ids causing the call
to dl_bw_of() to crash due to percpu value access of an out of bound
CPU value. For example:

	[80468.182258] BUG: unable to handle page fault for address: ffffffff8b6648b0
	  :
	[80468.191019] RIP: 0010:dl_cpu_busy+0x30/0x2b0
	  :
	[80468.207946] Call Trace:
	[80468.208947]  cpuset_can_attach+0xa0/0x140
	[80468.209953]  cgroup_migrate_execute+0x8c/0x490
	[80468.210931]  cgroup_update_dfl_csses+0x254/0x270
	[80468.211898]  cgroup_subtree_control_write+0x322/0x400
	[80468.212854]  kernfs_fop_write_iter+0x11c/0x1b0
	[80468.213777]  new_sync_write+0x11f/0x1b0
	[80468.214689]  vfs_write+0x1eb/0x280
	[80468.215592]  ksys_write+0x5f/0xe0
	[80468.216463]  do_syscall_64+0x5c/0x80
	[80468.224287]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix that by using effective_cpus instead. For cgroup v1, effective_cpus
is the same as cpus_allowed. For v2, effective_cpus is the real cpumask
to be used by tasks within the cpuset anyway.

Also update task_can_attach()'s 2nd argument name to cs_effective_cpus to
reflect the change. In addition, a check is added to task_can_attach()
to guard against the possibility that cpumask_any_and() may return a
value &gt;= nr_cpu_ids.

Fixes: 7f51412a415d ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com
</content>
</entry>
</feed>
