<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/cgroup, branch v6.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2022-09-23T17:18:45Z</updated>
<entry>
<title>cgroup: cgroup_get_from_id() must check the looked-up kn is a directory</title>
<updated>2022-09-23T17:18:45Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2022-09-23T11:51:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df02452f3df069a59bc9e69c84435bf115cb6e37'/>
<id>urn:sha1:df02452f3df069a59bc9e69c84435bf115cb6e37</id>
<content type='text'>
cgroup has to be one kernfs dir, otherwise kernel panic is caused,
especially cgroup id is provide from userspace.

Reported-by: Marco Patalano &lt;mpatalan@redhat.com&gt;
Fixes: 6b658c4863c1 ("scsi: cgroup: Add cgroup_get_from_id()")
Cc: Muneendra &lt;muneendra.kumar@broadcom.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Acked-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()</title>
<updated>2022-08-25T17:36:30Z</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@I-love.SAKURA.ne.jp</email>
</author>
<published>2022-08-25T08:38:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=43626dade36fa74d3329046f4ae2d7fdefe401c6'/>
<id>urn:sha1:43626dade36fa74d3329046f4ae2d7fdefe401c6</id>
<content type='text'>
syzbot is hitting percpu_rwsem_assert_held(&amp;cpu_hotplug_lock) warning at
cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem &lt;-&gt; cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() like what cgroup_procs_write_start() does.

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e [1]
Reported-by: syzbot &lt;syzbot+29d3a3b4d86c8136ad9e@syzkaller.appspotmail.com&gt;
Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem &lt;-&gt; cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Fix race condition at rebind_subsystems()</title>
<updated>2022-08-23T18:11:06Z</updated>
<author>
<name>Jing-Ting Wu</name>
<email>Jing-Ting.Wu@mediatek.com</email>
</author>
<published>2022-08-23T05:41:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=763f4fb76e24959c370cdaa889b2492ba6175580'/>
<id>urn:sha1:763f4fb76e24959c370cdaa889b2492ba6175580</id>
<content type='text'>
Root cause:
The rebind_subsystems() is no lock held when move css object from A
list to B list,then let B's head be treated as css node at
list_for_each_entry_rcu().

Solution:
Add grace period before invalidating the removed rstat_css_node.

Reported-by: Jing-Ting Wu &lt;jing-ting.wu@mediatek.com&gt;
Suggested-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Jing-Ting Wu &lt;jing-ting.wu@mediatek.com&gt;
Tested-by: Jing-Ting Wu &lt;jing-ting.wu@mediatek.com&gt;
Link: https://lore.kernel.org/linux-arm-kernel/d8f0bc5e2fb6ed259f9334c83279b4c011283c41.camel@mediatek.com/T/
Acked-by: Mukesh Ojha &lt;quic_mojha@quicinc.com&gt;
Fixes: a7df69b81aac ("cgroup: rstat: support cgroup1")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Fix threadgroup_rwsem &lt;-&gt; cpus_read_lock() deadlock</title>
<updated>2022-08-17T17:36:05Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2022-08-15T23:27:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4f7e7236435ca0abe005c674ebd6892c6e83aeb3'/>
<id>urn:sha1:4f7e7236435ca0abe005c674ebd6892c6e83aeb3</id>
<content type='text'>
Bringing up a CPU may involve creating and destroying tasks which requires
read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
cpus_read_lock(). However, cpuset's -&gt;attach(), which may be called with
thredagroup_rwsem write-locked, also wants to disable CPU hotplug and
acquires cpus_read_lock(), leading to a deadlock.

Fix it by guaranteeing that -&gt;attach() is always called with CPU hotplug
disabled and removing cpus_read_lock() call from cpuset_attach().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-and-tested-by: Imran Khan &lt;imran.f.khan@oracle.com&gt;
Reported-and-tested-by: Xuewen Yan &lt;xuewen.yan@unisoc.com&gt;
Fixes: 05c7b7a92cc8 ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
</content>
</entry>
<entry>
<title>sched/psi: Remove unused parameter nbytes of psi_trigger_create()</title>
<updated>2022-08-15T22:35:25Z</updated>
<author>
<name>Hao Jia</name>
<email>jiahao.os@bytedance.com</email>
</author>
<published>2022-08-06T12:05:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=76b079ef4cc954fc2c2e0333a01855b0b2b6bdee'/>
<id>urn:sha1:76b079ef4cc954fc2c2e0333a01855b0b2b6bdee</id>
<content type='text'>
psi_trigger_create()'s 'nbytes' parameter is not used, so we can remove it.

Signed-off-by: Hao Jia &lt;jiahao.os@bytedance.com&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-08-07T00:34:06Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-08-07T00:34:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cac03ac368fabff0122853de2422d4e17a32de08'/>
<id>urn:sha1:cac03ac368fabff0122853de2422d4e17a32de08</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Various fixes: a deadline scheduler fix, a migration fix, a Sparse fix
  and a comment fix"

* tag 'sched-urgent-2022-08-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Do not requeue task on CPU excluded from cpus_mask
  sched/rt: Fix Sparse warnings due to undefined rt.c declarations
  exit: Fix typo in comment: s/sub-theads/sub-threads
  sched, cpuset: Fix dl_cpu_busy() panic due to empty cs-&gt;cpus_allowed
</content>
</entry>
<entry>
<title>Merge tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2022-08-03T16:45:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-08-03T16:45:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b6bb70f9ab80a11161252bf217993d2c40ea5eb2'/>
<id>urn:sha1:b6bb70f9ab80a11161252bf217993d2c40ea5eb2</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:
 "Several core optimizations:

   - threadgroup_rwsem write locking is skipped when configuring
     controllers in empty subtrees.

     Combined with CLONE_INTO_CGROUP, this allows the common static
     usage pattern to not grab threadgroup_rwsem at all (glibc still
     doesn't seem ready for CLONE_INTO_CGROUP unfortunately).

   - threadgroup_rwsem used to be put into non-percpu mode by default
     due to latency concerns in specific use cases. There's no reason
     for everyone else to pay for it. Make the behavior optional.

   - psi no longer allocates memory when disabled.

  ... along with some code cleanups"

* tag 'cgroup-for-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Skip subtree root in cgroup_update_dfl_csses()
  cgroup: remove "no" prefixed mount options
  cgroup: Make !percpu threadgroup_rwsem operations optional
  cgroup: Add "no" prefixed mount options
  cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
  cgroup.c: remove redundant check for mixable cgroup in cgroup_migrate_vet_dst
  cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes
  psi: dont alloc memory for psi by default
</content>
</entry>
<entry>
<title>sched, cpuset: Fix dl_cpu_busy() panic due to empty cs-&gt;cpus_allowed</title>
<updated>2022-08-03T08:34:26Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2022-08-03T01:54:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b6e8d40d43ae4dec00c8fea2593eeea3114b8f44'/>
<id>urn:sha1:b6e8d40d43ae4dec00c8fea2593eeea3114b8f44</id>
<content type='text'>
With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating
that the cpuset will just use the effective CPUs of its parent. So
cpuset_can_attach() can call task_can_attach() with an empty mask.
This can lead to cpumask_any_and() returns nr_cpu_ids causing the call
to dl_bw_of() to crash due to percpu value access of an out of bound
CPU value. For example:

	[80468.182258] BUG: unable to handle page fault for address: ffffffff8b6648b0
	  :
	[80468.191019] RIP: 0010:dl_cpu_busy+0x30/0x2b0
	  :
	[80468.207946] Call Trace:
	[80468.208947]  cpuset_can_attach+0xa0/0x140
	[80468.209953]  cgroup_migrate_execute+0x8c/0x490
	[80468.210931]  cgroup_update_dfl_csses+0x254/0x270
	[80468.211898]  cgroup_subtree_control_write+0x322/0x400
	[80468.212854]  kernfs_fop_write_iter+0x11c/0x1b0
	[80468.213777]  new_sync_write+0x11f/0x1b0
	[80468.214689]  vfs_write+0x1eb/0x280
	[80468.215592]  ksys_write+0x5f/0xe0
	[80468.216463]  do_syscall_64+0x5c/0x80
	[80468.224287]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix that by using effective_cpus instead. For cgroup v1, effective_cpus
is the same as cpus_allowed. For v2, effective_cpus is the real cpumask
to be used by tasks within the cpuset anyway.

Also update task_can_attach()'s 2nd argument name to cs_effective_cpus to
reflect the change. In addition, a check is added to task_can_attach()
to guard against the possibility that cpumask_any_and() may return a
value &gt;= nr_cpu_ids.

Fixes: 7f51412a415d ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com
</content>
</entry>
<entry>
<title>Merge tag 'sched-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-08-01T18:49:06Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-08-01T18:49:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b167fdffe9e737007cbf7c691cde5fa489ca58d7'/>
<id>urn:sha1:b167fdffe9e737007cbf7c691cde5fa489ca58d7</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:
"Load-balancing improvements:

   - Improve NUMA balancing on AMD Zen systems for affine workloads.

   - Improve the handling of reduced-capacity CPUs in load-balancing.

   - Energy Model improvements: fix &amp; refine all the energy fairness
     metrics (PELT), and remove the conservative threshold requiring 6%
     energy savings to migrate a task. Doing this improves power
     efficiency for most workloads, and also increases the reliability
     of energy-efficiency scheduling.

   - Optimize/tweak select_idle_cpu() to spend (much) less time
     searching for an idle CPU on overloaded systems. There's reports of
     several milliseconds spent there on large systems with large
     workloads ...

     [ Since the search logic changed, there might be behavioral side
       effects. ]

   - Improve NUMA imbalance behavior. On certain systems with spare
     capacity, initial placement of tasks is non-deterministic, and such
     an artificial placement imbalance can persist for a long time,
     hurting (and sometimes helping) performance.

     The fix is to make fork-time task placement consistent with runtime
     NUMA balancing placement.

     Note that some performance regressions were reported against this,
     caused by workloads that are not memory bandwith limited, which
     benefit from the artificial locality of the placement bug(s). Mel
     Gorman's conclusion, with which we concur, was that consistency is
     better than random workload benefits from non-deterministic bugs:

        "Given there is no crystal ball and it's a tradeoff, I think
         it's better to be consistent and use similar logic at both fork
         time and runtime even if it doesn't have universal benefit."

   - Improve core scheduling by fixing a bug in
     sched_core_update_cookie() that caused unnecessary forced idling.

   - Improve wakeup-balancing by allowing same-LLC wakeup of idle CPUs
     for newly woken tasks.

   - Fix a newidle balancing bug that introduced unnecessary wakeup
     latencies.

  ABI improvements/fixes:

   - Do not check capabilities and do not issue capability check denial
     messages when a scheduler syscall doesn't require privileges. (Such
     as increasing niceness.)

   - Add forced-idle accounting to cgroups too.

   - Fix/improve the RSEQ ABI to not just silently accept unknown flags.
     (No existing tooling is known to have learned to rely on the
     previous behavior.)

   - Depreciate the (unused) RSEQ_CS_FLAG_NO_RESTART_ON_* flags.

  Optimizations:

   - Optimize &amp; simplify leaf_cfs_rq_list()

   - Micro-optimize set_nr_{and_not,if}_polling() via try_cmpxchg().

  Misc fixes &amp; cleanups:

   - Fix the RSEQ self-tests on RISC-V and Glibc 2.35 systems.

   - Fix a full-NOHZ bug that can in some cases result in the tick not
     being re-enabled when the last SCHED_RT task is gone from a
     runqueue but there's still SCHED_OTHER tasks around.

   - Various PREEMPT_RT related fixes.

   - Misc cleanups &amp; smaller fixes"

* tag 'sched-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  rseq: Kill process when unknown flags are encountered in ABI structures
  rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags
  sched/core: Fix the bug that task won't enqueue into core tree when update cookie
  nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt()
  sched/core: Always flush pending blk_plug
  sched/fair: fix case with reduced capacity CPU
  sched/core: Use try_cmpxchg in set_nr_{and_not,if}_polling
  sched/core: add forced idle accounting for cgroups
  sched/fair: Remove the energy margin in feec()
  sched/fair: Remove task_util from effective utilization in feec()
  sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu()
  sched/fair: Rename select_idle_mask to select_rq_mask
  sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util()
  sched/fair: Decay task PELT values during wakeup migration
  sched/fair: Provide u64 read for 32-bits arch helper
  sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg
  sched: only perform capability check on privileged operation
  sched: Remove unused function group_first_cpu()
  sched/fair: Remove redundant word " *"
  selftests/rseq: check if libc rseq support is registered
  ...
</content>
</entry>
<entry>
<title>cgroup: Skip subtree root in cgroup_update_dfl_csses()</title>
<updated>2022-07-28T17:26:30Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2022-07-28T00:58:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=265792d0dede9259f0ca56bb3efcc23eceee7d01'/>
<id>urn:sha1:265792d0dede9259f0ca56bb3efcc23eceee7d01</id>
<content type='text'>
The cgroup_update_dfl_csses() function updates css associations when a
cgroup's subtree_control file is modified. Any changes made to a cgroup's
subtree_control file, however, will only affect its descendants but not
the cgroup itself. So there is no point in migrating csses associated
with that cgroup. We can skip them instead.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Reviewed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
