<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/cgroup, branch v6.13</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.13</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.13'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-01-09T01:54:39Z</updated>
<entry>
<title>cgroup/cpuset: remove kernfs active break</title>
<updated>2025-01-09T01:54:39Z</updated>
<author>
<name>Chen Ridong</name>
<email>chenridong@huawei.com</email>
</author>
<published>2025-01-06T08:19:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3cb97a927fffe443e1e7e8eddbfebfdb062e86ed'/>
<id>urn:sha1:3cb97a927fffe443e1e7e8eddbfebfdb062e86ed</id>
<content type='text'>
A warning was found:

WARNING: CPU: 10 PID: 3486953 at fs/kernfs/file.c:828
CPU: 10 PID: 3486953 Comm: rmdir Kdump: loaded Tainted: G
RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
RSP: 0018:ffff8881107ef9e0 EFLAGS: 00010202
RAX: 0000000080000002 RBX: ffff888154738c00 RCX: dffffc0000000000
RDX: 0000000000000007 RSI: 0000000000000004 RDI: ffff888154738c04
RBP: ffff888154738c04 R08: ffffffffaf27fa15 R09: ffffed102a8e7180
R10: ffff888154738c07 R11: 0000000000000000 R12: ffff888154738c08
R13: ffff888750f8c000 R14: ffff888750f8c0e8 R15: ffff888154738ca0
FS:  00007f84cd0be740(0000) GS:ffff8887ddc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555f9fbe00c8 CR3: 0000000153eec001 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 kernfs_drain+0x15e/0x2f0
 __kernfs_remove+0x165/0x300
 kernfs_remove_by_name_ns+0x7b/0xc0
 cgroup_rm_file+0x154/0x1c0
 cgroup_addrm_files+0x1c2/0x1f0
 css_clear_dir+0x77/0x110
 kill_css+0x4c/0x1b0
 cgroup_destroy_locked+0x194/0x380
 cgroup_rmdir+0x2a/0x140

It can be explained by:
rmdir 				echo 1 &gt; cpuset.cpus
				kernfs_fop_write_iter // active=0
cgroup_rm_file
kernfs_remove_by_name_ns	kernfs_get_active // active=1
__kernfs_remove					  // active=0x80000002
kernfs_drain			cpuset_write_resmask
wait_event
//waiting (active == 0x80000001)
				kernfs_break_active_protection
				// active = 0x80000001
// continue
				kernfs_unbreak_active_protection
				// active = 0x80000002
...
kernfs_should_drain_open_files
// warning occurs
				kernfs_put_active
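
The counter values in the diagram above can be replayed with a small
single-threaded model. This is only a sketch, not kernel code: the bias
value is inferred from the 0x80000001 shown in the trace, and the final
check is assumed to compare the counter against the bare bias.

```python
# Toy, single-threaded model of the kn->active counter; not kernel code.
INT_MIN = -2**31
KN_DEACTIVATED_BIAS = INT_MIN + 1   # assumption: printed as 0x80000001


def as_u32(value):
    """Render a signed 32-bit value the way the trace above prints it."""
    return hex(value % 2**32)


active = 0                          # kernfs_fop_write_iter
active += 1                         # kernfs_get_active
active += KN_DEACTIVATED_BIAS       # __kernfs_remove deactivates the node
trace = [as_u32(active)]            # counter while rmdir starts draining

active -= 1                         # kernfs_break_active_protection
drain_may_continue = active == KN_DEACTIVATED_BIAS
trace.append(as_u32(active))        # kernfs_drain stops waiting here

active += 1                         # kernfs_unbreak_active_protection
trace.append(as_u32(active))        # writer holds its reference again

# Assumed check in kernfs_should_drain_open_files: bare bias expected.
warning_fires = active != KN_DEACTIVATED_BIAS

print(trace, drain_may_continue, warning_fires)
```

In this model the drain side proceeds as soon as the writer drops its
reference, and the writer's later re-acquire is what leaves the counter
off by one when the check runs.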

This warning is triggered when cpuset_write_resmask() calls
'kernfs_break_active_protection' during a write to cpuset.cpus while the
cgroup is being removed concurrently.

Commit 3a5a6d0c2b03 ("cpuset: don't nest cgroup_mutex inside
get_online_cpus()") made cpuset_hotplug_workfn asynchronous. This change
involves calling flush_work(), which can create a circular locking
dependency across multiple processes that involves cgroup_mutex,
potentially leading to a deadlock. To avoid the deadlock, commit
76bb5ab8f6e3 ("cpuset: break kernfs active protection in
cpuset_write_resmask()") added 'kernfs_break_active_protection' to
cpuset_write_resmask(). This could lead to the warning above.

After commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
processing synchronous"), cpuset_write_resmask() no longer needs to
wait for the hotplug to finish, which means that concurrent hotplug and
cpuset operations are no longer possible. Therefore, the deadlock no
longer exists and there is no need to break active protection. To fix
this warning, just remove the kernfs_break_active_protection operation
from cpuset_write_resmask().

Fixes: bdb2fd7fc56e ("kernfs: Skip kernfs_drain_open_files() more aggressively")
Fixes: 76bb5ab8f6e3 ("cpuset: break kernfs active protection in cpuset_write_resmask()")
Reported-by: Ji Fa &lt;jifa@huawei.com&gt;
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Prevent leakage of isolated CPUs into sched domains</title>
<updated>2024-12-11T15:45:52Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2024-12-05T19:51:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9b496a8bbed9cc292b0dfd796f38ec58b6d0375f'/>
<id>urn:sha1:9b496a8bbed9cc292b0dfd796f38ec58b6d0375f</id>
<content type='text'>
Isolated CPUs are not allowed to be used in a non-isolated partition.
The only exception is the top cpuset which is allowed to contain boot
time isolated CPUs.

Commit ccac8e8de99c ("cgroup/cpuset: Fix remote root partition creation
problem") introduces a simplified scheme of including only partition
roots in sched domain generation. However, it does not properly account
for this exception case. This can result in leakage of isolated CPUs
into a sched domain.

Fix it by making sure that isolated CPUs are excluded from the top
cpuset before generating sched domains.
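
The exclusion step can be pictured as a set difference. This is a toy
sketch with made-up CPU numbers and hypothetical variable names, not the
kernel's cpumask code.

```python
# Toy model with CPU sets as Python sets; CPU numbers are made up.
top_cpuset_cpus = set(range(8))       # top cpuset may contain isolated CPUs
isolated_cpus = {6, 7}                # e.g. isolated at boot time

# Exclude isolated CPUs before the top cpuset feeds a sched domain.
sched_domain_cpus = top_cpuset_cpus - isolated_cpus
print(sorted(sched_domain_cpus))
```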

Also update the way the boot time isolated CPUs are handled in
test_cpuset_prs.sh to make sure that those isolated CPUs are really
isolated instead of just skipping them in the tests.

Fixes: ccac8e8de99c ("cgroup/cpuset: Fix remote root partition creation problem")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Remove stale text</title>
<updated>2024-12-11T06:38:41Z</updated>
<author>
<name>Costa Shulyupin</name>
<email>costa.shul@redhat.com</email>
</author>
<published>2024-12-04T11:04:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eb1dd15fb26d9ad85204f444ef03f29f9049eb1e'/>
<id>urn:sha1:eb1dd15fb26d9ad85204f444ef03f29f9049eb1e</id>
<content type='text'>
Task's cpuset pointer was removed by
commit 8793d854edbc ("Task Control Groups: make cpusets a client of cgroups")

Paragraph "The task_lock() exception ...." was removed by
commit 2df167a300d7 ("cgroups: update comments in cpuset.c")

Remove stale text:

 We also require taking task_lock() when dereferencing a
 task's cpuset pointer. See "The task_lock() exception", at the end of this
 comment.

 Accessing a task's cpuset should be done in accordance with the
 guidelines for accessing subsystem state in kernel/cgroup.c

and reformat.

Co-developed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Co-developed-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Costa Shulyupin &lt;costa.shul@redhat.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'cgroup-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2024-11-20T17:54:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-20T17:54:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7586d5276515a54656bc46530b32e10913c44b1f'/>
<id>urn:sha1:7586d5276515a54656bc46530b32e10913c44b1f</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - cpu.stat now also shows niced CPU time

 - Freezer and cpuset optimizations

 - Other misc changes

* tag 'cgroup-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing
  cgroup/cpuset: Further optimize code if CONFIG_CPUSETS_V1 not set
  cgroup/cpuset: Enforce at most one rebuild_sched_domains_locked() call per operation
  cgroup/cpuset: Revert "Allow suppression of sched domain rebuild in update_cpumasks_hier()"
  MAINTAINERS: remove Zefan Li
  cgroup/freezer: Add cgroup CGRP_FROZEN flag update helper
  cgroup/freezer: Reduce redundant traversal for cgroup_freeze
  cgroup/bpf: only cgroup v2 can be attached by bpf programs
  Revert "cgroup: Fix memory leak caused by missing cgroup_bpf_offline"
  selftests/cgroup: Fix compile error in test_cpu.c
  cgroup/rstat: Selftests for niced CPU statistics
  cgroup/rstat: Tracking cgroup-level niced CPU time
  cgroup/cpuset: Fix spelling errors in file kernel/cgroup/cpuset.c
</content>
</entry>
<entry>
<title>Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2024-11-18T20:24:06Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-18T20:24:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0f25f0e4efaeb68086f7e65c442f2d648b21736f'/>
<id>urn:sha1:0f25f0e4efaeb68086f7e65c442f2d648b21736f</id>
<content type='text'>
Pull 'struct fd' class updates from Al Viro:
 "The bulk of struct fd memory safety stuff

  Making sure that struct fd instances are destroyed in the same scope
  where they'd been created, getting rid of reassignments and passing
  them by reference, converting to CLASS(fd{,_pos,_raw}).

  We are getting very close to having the memory safety of that stuff
  trivial to verify"

* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
  deal with the last remaing boolean uses of fd_file()
  css_set_fork(): switch to CLASS(fd_raw, ...)
  memcg_write_event_control(): switch to CLASS(fd)
  assorted variants of irqfd setup: convert to CLASS(fd)
  do_pollfd(): convert to CLASS(fd)
  convert do_select()
  convert vfs_dedupe_file_range().
  convert cifs_ioctl_copychunk()
  convert media_request_get_by_fd()
  convert spu_run(2)
  switch spufs_calls_{get,put}() to CLASS() use
  convert cachestat(2)
  convert do_preadv()/do_pwritev()
  fdget(), more trivial conversions
  fdget(), trivial conversions
  privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
  o2hb_region_dev_store(): avoid goto around fdget()/fdput()
  introduce "fd_pos" class, convert fdget_pos() users to it.
  fdget_raw() users: switch to CLASS(fd_raw)
  convert vmsplice() to CLASS(fd)
  ...
</content>
</entry>
<entry>
<title>cgroup/cpuset: Disable cpuset_cpumask_can_shrink() test if not load balancing</title>
<updated>2024-11-14T18:44:03Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2024-11-14T18:19:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fbfbf86685b3270dc27d1c5d6108532334aaf329'/>
<id>urn:sha1:fbfbf86685b3270dc27d1c5d6108532334aaf329</id>
<content type='text'>
Some recently proposed changes [1] in the deadline server code cause
a test failure in test_cpuset_prs.sh when a change is made to an
isolated partition. This is due to failing the
cpuset_cpumask_can_shrink() check for SCHED_DEADLINE tasks at
validate_change().

This is actually a false positive as the failed test case involves an
isolated partition with load balancing disabled. The deadline check
is not meaningful in this case and the users should know what they
are doing.

Fix this by doing the cpuset_cpumask_can_shrink() check only when load
balancing is enabled. Also change its arguments to use effective_cpus
for the current cpuset and user_xcpus() as an approximation for the
target effective_cpus, as the real effective_cpus hasn't been fully
computed yet at this early stage.

As the check isn't comprehensive, there may be false positives or
negatives. We may have to revise the code to do a more thorough check
in the future if this becomes a concern.

[1] https://lore.kernel.org/lkml/82be06c1-6d6d-4651-86c9-bcc828cbcb80@redhat.com/T/#t

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Further optimize code if CONFIG_CPUSETS_V1 not set</title>
<updated>2024-11-12T19:07:38Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2024-11-10T02:50:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c4c9cebe2fb9cdc73e55513de7af7a4f50260e88'/>
<id>urn:sha1:c4c9cebe2fb9cdc73e55513de7af7a4f50260e88</id>
<content type='text'>
Currently the cpuset code uses cgroup_subsys_on_dfl() to check if we
are running with cgroup v2. If CONFIG_CPUSETS_V1 isn't set, there is
really no need to do this check and we can optimize out some of the
unneeded v1 specific code paths. Introduce a new cpuset_v2() and use it
to replace the cgroup_subsys_on_dfl() check to further optimize the
code.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Enforce at most one rebuild_sched_domains_locked() call per operation</title>
<updated>2024-11-12T19:07:09Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2024-11-10T02:50:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a040c351283e3ac75422621ea205b1d8d687e108'/>
<id>urn:sha1:a040c351283e3ac75422621ea205b1d8d687e108</id>
<content type='text'>
Since commit ff0ce721ec21 ("cgroup/cpuset: Eliminate unncessary
sched domains rebuilds in hotplug"), there is only one
rebuild_sched_domains_locked() call per hotplug operation. However,
writing to the various cpuset control files may still cause more than
one rebuild_sched_domains_locked() call to happen in some cases.

Juri found that two rebuild_sched_domains_locked() calls in
update_prstate(), one from update_cpumasks_hier() and another one from
update_partition_sd_lb(), could cause a cpuset partition to be created
with null total_bw for DL tasks. IOW, DL tasks may not be scheduled
correctly in such a partition.

A sample command sequence that can reproduce null total_bw is as
follows.

  # echo Y &gt;/sys/kernel/debug/sched/verbose
  # echo +cpuset &gt;/sys/fs/cgroup/cgroup.subtree_control
  # mkdir /sys/fs/cgroup/test
  # echo 0-7 &gt; /sys/fs/cgroup/test/cpuset.cpus
  # echo 6-7 &gt; /sys/fs/cgroup/test/cpuset.cpus.exclusive
  # echo root &gt;/sys/fs/cgroup/test/cpuset.cpus.partition

Fix this double rebuild_sched_domains_locked() call problem by
replacing the existing calls with cpuset_force_rebuild(), except for
the rebuild_sched_domains_cpuslocked() call at the end of
cpuset_handle_hotplug(). The force_sd_rebuild flag is now checked at
the end of cpuset_write_resmask() and update_prstate() to determine
whether rebuild_sched_domains_locked() should be called.
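
The deferred-rebuild idea can be sketched as follows. This is a toy
model rather than the kernel implementation: the CpusetModel class and
the rebuild_count counter are hypothetical, and the method names are
borrowed from the functions mentioned above purely for readability.

```python
# Toy model of a deferred-rebuild flag; not kernel code.
class CpusetModel:
    def __init__(self):
        self.force_sd_rebuild = False
        self.rebuild_count = 0        # hypothetical counter for the demo

    def cpuset_force_rebuild(self):
        # Request a rebuild instead of performing one immediately.
        self.force_sd_rebuild = True

    def rebuild_sched_domains_locked(self):
        self.force_sd_rebuild = False
        self.rebuild_count += 1

    def update_cpumasks_hier(self):
        self.cpuset_force_rebuild()

    def update_partition_sd_lb(self):
        self.cpuset_force_rebuild()

    def update_prstate(self):
        # Both helpers want a rebuild, but they only set the flag.
        self.update_cpumasks_hier()
        self.update_partition_sd_lb()
        # Single check at the end of the operation.
        if self.force_sd_rebuild:
            self.rebuild_sched_domains_locked()


model = CpusetModel()
model.update_prstate()
print(model.rebuild_count)
```

Because the helpers only record that a rebuild is wanted, the number of
rebuilds per operation no longer depends on how many helpers ran.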

The cpuset v1 code can still call rebuild_sched_domains_locked()
directly as double rebuild_sched_domains_locked() calls are not possible
there.

Reported-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Closes: https://lore.kernel.org/lkml/ZyuUcJDPBln1BK1Y@jlelli-thinkpadt14gen4.remote.csb/
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Tested-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Revert "Allow suppression of sched domain rebuild in update_cpumasks_hier()"</title>
<updated>2024-11-12T19:07:01Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2024-11-10T02:50:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bcd7012afd7bcd45fcd7a0e2f48e57b273702317'/>
<id>urn:sha1:bcd7012afd7bcd45fcd7a0e2f48e57b273702317</id>
<content type='text'>
Revert commit 3ae0b773211e ("cgroup/cpuset: Allow suppression of sched
domain rebuild in update_cpumasks_hier()") to allow for an alternative
way to suppress unnecessary rebuild_sched_domains_locked() calls in
update_cpumasks_hier() and elsewhere in a following commit.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>css_set_fork(): switch to CLASS(fd_raw, ...)</title>
<updated>2024-11-03T06:28:07Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2024-06-02T19:03:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=457a6549394cd680e935bc6743e832ac42f2603a'/>
<id>urn:sha1:457a6549394cd680e935bc6743e832ac42f2603a</id>
<content type='text'>
reference acquired there by fget_raw() is not stashed anywhere -
we could as well borrow instead.

Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
</feed>
