<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/cgroup/cgroup.c, branch v6.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-08-22T17:47:43Z</updated>
<entry>
<title>cgroup/psi: Set of-&gt;priv to NULL upon file release</title>
<updated>2025-08-22T17:47:43Z</updated>
<author>
<name>Chen Ridong</name>
<email>chenridong@huawei.com</email>
</author>
<published>2025-08-22T07:07:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=94a4acfec14615e971eb2c9e1fa6c992c85ff6c6'/>
<id>urn:sha1:94a4acfec14615e971eb2c9e1fa6c992c85ff6c6</id>
<content type='text'>
Setting of-&gt;priv to NULL when the file is released enables earlier bug
detection. This allows potential bugs to manifest as NULL pointer
dereferences rather than use-after-free errors[1], which are generally more
difficult to diagnose.

[1] https://lore.kernel.org/cgroups/38ef3ff9-b380-44f0-9315-8b3714b0948d@huaweicloud.com/T/#m8a3b3f88f0ff3da5925d342e90043394f8b2091b
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: split cgroup_destroy_wq into 3 workqueues</title>
<updated>2025-08-22T17:44:11Z</updated>
<author>
<name>Chen Ridong</name>
<email>chenridong@huawei.com</email>
</author>
<published>2025-08-19T01:07:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=79f919a89c9d06816dbdbbd168fa41d27411a7f9'/>
<id>urn:sha1:79f919a89c9d06816dbdbbd168fa41d27411a7f9</id>
<content type='text'>
A hung task can occur during [1] LTP cgroup testing when repeatedly
mounting/unmounting perf_event and net_prio controllers with
systemd.unified_cgroup_hierarchy=1. The hang manifests in
cgroup_lock_and_drain_offline() during root destruction.

Related case:
cgroup_fj_function_perf_event cgroup_fj_function.sh perf_event
cgroup_fj_function_net_prio cgroup_fj_function.sh net_prio

Call Trace:
	cgroup_lock_and_drain_offline+0x14c/0x1e8
	cgroup_destroy_root+0x3c/0x2c0
	css_free_rwork_fn+0x248/0x338
	process_one_work+0x16c/0x3b8
	worker_thread+0x22c/0x3b0
	kthread+0xec/0x100
	ret_from_fork+0x10/0x20

Root Cause:

CPU0                            CPU1
mount perf_event                umount net_prio
cgroup1_get_tree                cgroup_kill_sb
rebind_subsystems               // root destruction enqueues
				// cgroup_destroy_wq
// kill all perf_event css
                                // one perf_event css A is dying
                                // css A offline enqueues cgroup_destroy_wq
                                // root destruction will be executed first
                                css_free_rwork_fn
                                cgroup_destroy_root
                                cgroup_lock_and_drain_offline
                                // some perf descendants are dying
                                // cgroup_destroy_wq max_active = 1
                                // waiting for css A to die

Problem scenario:
1. CPU0 mounts perf_event (rebind_subsystems)
2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
3. A dying perf_event CSS gets queued for offline after root destruction
4. Root destruction waits for offline completion, but offline work is
   blocked behind root destruction in cgroup_destroy_wq (max_active=1)

Solution:
Split cgroup_destroy_wq into three dedicated workqueues:
cgroup_offline_wq – Handles CSS offline operations
cgroup_release_wq – Manages resource release
cgroup_free_wq – Performs final memory deallocation

This separation eliminates blocking in the CSS free path while waiting for
offline operations to complete.

[1] https://github.com/linux-test-project/ltp/blob/master/runtest/controllers
Fixes: 334c3679ec4b ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
Reported-by: Gao Yingjie &lt;gaoyingjie@uniontech.com&gt;
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Suggested-by: Teju Heo &lt;tj@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Add bpf prog revisions to struct cgroup_bpf</title>
<updated>2025-06-09T23:17:11Z</updated>
<author>
<name>Yonghong Song</name>
<email>yonghong.song@linux.dev</email>
</author>
<published>2025-06-06T16:31:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9b8367b604c739947ec308874f087fe0eb80f412'/>
<id>urn:sha1:9b8367b604c739947ec308874f087fe0eb80f412</id>
<content type='text'>
One of key items in mprog API is revision for prog list. The revision
number will be increased if the prog list changed, e.g., attach, detach
or replace.

Add 'revisions' field to struct cgroup_bpf, representing revisions for
all cgroup related attachment types. The initial revision value is
set to 1, the same as kernel mprog implementations.

Signed-off-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20250606163136.2428732-1-yonghong.song@linux.dev
</content>
</entry>
<entry>
<title>Merge tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2025-05-28T03:59:53Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-05-28T03:59:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3b66e6b3c098646f75c54e269dd963e2281c555c'/>
<id>urn:sha1:3b66e6b3c098646f75c54e269dd963e2281c555c</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - cgroup rstat shared the tracking tree across all controllers with the
   rationale being that a cgroup which is using one resource is likely
   to be using other resources at the same time (ie. if something is
   allocating memory, it's probably consuming CPU cycles).

   However, this turned out to not scale very well especially with memcg
   using rstat for internal operations which made memcg stat read and
   flush patterns substantially different from other controllers. JP
   Kobryn split the rstat tree per controller.

 - cgroup BPF support was hooking into cgroup init/exit paths directly.

   Convert them to use a notifier chain instead so that other usages can
   be added easily. The two of the patches which implement this are
   mislabeled as belonging to sched_ext instead of cgroup. Sorry.

 - Relatively minor cpuset updates

 - Documentation updates

* tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits)
  sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier
  sched_ext: Introduce cgroup_lifetime_notifier
  cgroup: Minor reorganization of cgroup_create()
  cgroup, docs: cpu controller's interaction with various scheduling policies
  cgroup, docs: convert space indentation to tab indentation
  cgroup: avoid per-cpu allocation of size zero rstat cpu locks
  cgroup, docs: be specific about bandwidth control of rt processes
  cgroup: document the rstat per-cpu initialization
  cgroup: helper for checking rstat participation of css
  cgroup: use subsystem-specific rstat locks to avoid contention
  cgroup: use separate rstat trees for each subsystem
  cgroup: compare css to cgroup::self in helper for distingushing css
  cgroup: warn on rstat usage by early init subsystems
  cgroup/cpuset: drop useless cpumask_empty() in compute_effective_exclusive_cpumask()
  cgroup/rstat: Improve cgroup_rstat_push_children() documentation
  cgroup: fix goto ordering in cgroup_init()
  cgroup: fix pointer check in css_rstat_init()
  cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs
  cgroup/cpuset: Fix obsolete comment in cpuset_css_offline()
  cgroup/cpuset: Always use cpu_active_mask
  ...
</content>
</entry>
<entry>
<title>sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier</title>
<updated>2025-05-22T19:20:19Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-05-14T04:46:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=82648b8b2ae0a0ff371e2a98133844658cfaae9a'/>
<id>urn:sha1:82648b8b2ae0a0ff371e2a98133844658cfaae9a</id>
<content type='text'>
Replace explicit cgroup_bpf_inherit/offline() calls from cgroup
creation/destruction paths with notification callback registered on
cgroup_lifetime_notifier.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Introduce cgroup_lifetime_notifier</title>
<updated>2025-05-22T19:20:11Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-05-14T04:44:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9e8c67a9e526f497d606d721cf6b92dd68d90806'/>
<id>urn:sha1:9e8c67a9e526f497d606d721cf6b92dd68d90806</id>
<content type='text'>
Other subsystems may make use of the cgroup hierarchy with the cgroup_bpf
support being one such example. For such a feature, it's useful to be able
to hook into cgroup creation and destruction paths to perform
feature-specific initializations and cleanups.

Add cgroup_lifetime_notifier which generates CGROUP_LIFETIME_ONLINE and
CGROUP_LIFETIME_OFFLINE events whenever cgroups are created and destroyed,
respectively.

The next patch will convert cgroup_bpf to use the new notifier and other
uses are planned.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: Minor reorganization of cgroup_create()</title>
<updated>2025-05-22T19:19:57Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-05-14T04:43:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd22cbad1b800e55f5bd38c58be16f9d0b171d9a'/>
<id>urn:sha1:cd22cbad1b800e55f5bd38c58be16f9d0b171d9a</id>
<content type='text'>
cgroup_bpf init and exit handling will be moved to a notifier chain. In
prepartion, reorganize cgroup_create() a bit so that the new cgroup is fully
initialized before any outside changes are made.

- cgrp-&gt;ancestors[] initialization and the hierarchical nr_descendants and
  nr_frozen_descendants updates were in the same loop. Separate them out and
  do the former earlier and do the latter later.

- Relocate cgroup_bpf_inherit() call so that it's after all cgroup
  initializations are complete.

No visible behavior changes expected.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: use subsystem-specific rstat locks to avoid contention</title>
<updated>2025-05-19T20:29:42Z</updated>
<author>
<name>JP Kobryn</name>
<email>inwardvessel@gmail.com</email>
</author>
<published>2025-05-15T00:19:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=748922dcfabdd655d25fb6dd09a60e694a3d35e6'/>
<id>urn:sha1:748922dcfabdd655d25fb6dd09a60e694a3d35e6</id>
<content type='text'>
It is possible to eliminate contention between subsystems when
updating/flushing stats by using subsystem-specific locks. Let the existing
rstat locks be dedicated to the cgroup base stats and rename them to
reflect that. Add similar locks to the cgroup_subsys struct for use with
individual subsystems.

Lock initialization is done in the new function ss_rstat_init(ss) which
replaces cgroup_rstat_boot(void). If NULL is passed to this function, the
global base stat locks will be initialized. Otherwise, the subsystem locks
will be initialized.

Change the existing lock helper functions to accept a reference to a css.
Then within these functions, conditionally select the appropriate locks
based on the subsystem affiliation of the given css. Add helper functions
for this selection routine to avoid repeated code.

Signed-off-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: use separate rstat trees for each subsystem</title>
<updated>2025-05-19T20:28:59Z</updated>
<author>
<name>JP Kobryn</name>
<email>inwardvessel@gmail.com</email>
</author>
<published>2025-05-15T00:19:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5da3bfa029d6809e192d112f39fca4dbe0137aaf'/>
<id>urn:sha1:5da3bfa029d6809e192d112f39fca4dbe0137aaf</id>
<content type='text'>
Different subsystems may call cgroup_rstat_updated() within the same
cgroup, resulting in a tree of pending updates from multiple subsystems.
When one of these subsystems is flushed via cgroup_rstat_flushed(), all
other subsystems with pending updates on the tree will also be flushed.

Change the paradigm of having a single rstat tree for all subsystems to
having separate trees for each subsystem. This separation allows for
subsystems to perform flushes without the side effects of other subsystems.
As an example, flushing the cpu stats will no longer cause the memory stats
to be flushed and vice versa.

In order to achieve subsystem-specific trees, change the tree node type
from cgroup to cgroup_subsys_state pointer. Then remove those pointers from
the cgroup and instead place them on the css. Finally, change update/flush
functions to make use of the different node type (css). These changes allow
a specific subsystem to be associated with an update or flush. Separate
rstat trees will now exist for each unique subsystem.

Since updating/flushing will now be done at the subsystem level, there is
no longer a need to keep track of updated css nodes at the cgroup level.
The list management of these nodes done within the cgroup (rstat_css_list
and related) has been removed accordingly.

Conditional guards for checking validity of a given css were placed within
css_rstat_updated/flush() to prevent undefined behavior occuring from kfunc
usage in bpf programs. Guards were also placed within css_rstat_init/exit()
in order to help consolidate calls to them. At call sites for all four
functions, the existing guards were removed.

Signed-off-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: compare css to cgroup::self in helper for distingushing css</title>
<updated>2025-05-19T20:21:24Z</updated>
<author>
<name>JP Kobryn</name>
<email>inwardvessel@gmail.com</email>
</author>
<published>2025-05-15T00:19:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=541a4219bd66bef56d93dbd306dc64a4d70ae99e'/>
<id>urn:sha1:541a4219bd66bef56d93dbd306dc64a4d70ae99e</id>
<content type='text'>
Adjust the implementation of css_is_cgroup() so that it compares the given
css to cgroup::self. Rename the function to css_is_self() in order to
reflect that. Change the existing css-&gt;ss NULL check to a warning in the
true branch. Finally, adjust call sites to use the new function name.

Signed-off-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
