<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/cpuset.c, branch v4.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-12-03T15:18:21Z</updated>
<entry>
<title>cgroup: fix handling of multi-destination migration from subtree_control enabling</title>
<updated>2015-12-03T15:18:21Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-12-03T15:18:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1f7dd3e5a6e4f093017fff12232572ee1aa4639b'/>
<id>urn:sha1:1f7dd3e5a6e4f093017fff12232572ee1aa4639b</id>
<content type='text'>
Consider the following v2 hierarchy.

  P0 (+memory) --- P1 (-memory) --- A
                                 \- B
       
P0 has memory enabled in its subtree_control while P1 doesn't.  If
both A and B contain processes, they would belong to the memory css of
P1.  Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter.  IOW, enabling controllers
can cause atomic migrations into different csses.

The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses.  pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.

 WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
 Modules linked in:
 CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
 ...
  ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
  ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
  ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
 Call Trace:
  [&lt;ffffffff81551ffc&gt;] dump_stack+0x4e/0x82
  [&lt;ffffffff810de202&gt;] warn_slowpath_common+0x82/0xc0
  [&lt;ffffffff810de2fa&gt;] warn_slowpath_null+0x1a/0x20
  [&lt;ffffffff8118e031&gt;] pids_cancel.constprop.6+0x31/0x40
  [&lt;ffffffff8118e0fd&gt;] pids_can_attach+0x6d/0xf0
  [&lt;ffffffff81188a4c&gt;] cgroup_taskset_migrate+0x6c/0x330
  [&lt;ffffffff81188e05&gt;] cgroup_migrate+0xf5/0x190
  [&lt;ffffffff81189016&gt;] cgroup_attach_task+0x176/0x200
  [&lt;ffffffff8118949d&gt;] __cgroup_procs_write+0x2ad/0x460
  [&lt;ffffffff81189684&gt;] cgroup_procs_write+0x14/0x20
  [&lt;ffffffff811854e5&gt;] cgroup_file_write+0x35/0x1c0
  [&lt;ffffffff812e26f1&gt;] kernfs_fop_write+0x141/0x190
  [&lt;ffffffff81265f88&gt;] __vfs_write+0x28/0xe0
  [&lt;ffffffff812666fc&gt;] vfs_write+0xac/0x1a0
  [&lt;ffffffff81267019&gt;] SyS_write+0x49/0xb0
  [&lt;ffffffff81bcef32&gt;] entry_SYSCALL_64_fastpath+0x12/0x76

This patch fixes the bug by removing @css parameter from the three
migration methods, -&gt;can_attach, -&gt;cancel_attach() and -&gt;attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated.  All controllers are
updated accordingly.

* Controllers which don't care whether there are one or multiple
  target csses can be converted trivially.  cpu, io, freezer, perf,
  netclassid and netprio fall in this category.

* cpuset's current implementation assumes that there's single source
  and destination and thus doesn't support v2 hierarchy already.  The
  only change made by this patchset is how that single destination css
  is obtained.

* memory migration path already doesn't do anything on v2.  How the
  single destination css is obtained is updated and the prep stage of
  mem_cgroup_can_attach() is reordered to accomodate the change.

* pids is the only controller which was affected by this bug.  It now
  correctly handles multi-destination migrations and no longer causes
  counter underflow from incorrect accounting.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Cc: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2015-11-06T07:10:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-06T07:10:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2e3078af2c67730c479f1d183af5b367f5d95337'/>
<id>urn:sha1:2e3078af2c67730c479f1d183af5b367f5d95337</id>
<content type='text'>
Merge patch-bomb from Andrew Morton:

 - inotify tweaks

 - some ocfs2 updates (many more are awaiting review)

 - various misc bits

 - kernel/watchdog.c updates

 - Some of mm.  I have a huge number of MM patches this time and quite a
   lot of it is quite difficult and much will be held over to next time.

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (162 commits)
  selftests: vm: add tests for lock on fault
  mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
  mm: introduce VM_LOCKONFAULT
  mm: mlock: add new mlock system call
  mm: mlock: refactor mlock, munlock, and munlockall code
  kasan: always taint kernel on report
  mm, slub, kasan: enable user tracking by default with KASAN=y
  kasan: use IS_ALIGNED in memory_is_poisoned_8()
  kasan: Fix a type conversion error
  lib: test_kasan: add some testcases
  kasan: update reference to kasan prototype repo
  kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
  kasan: various fixes in documentation
  kasan: update log messages
  kasan: accurately determine the type of the bad access
  kasan: update reported bug types for kernel memory accesses
  kasan: update reported bug types for not user nor kernel memory accesses
  mm/kasan: prevent deadlock in kasan reporting
  mm/kasan: don't use kasan shadow pointer in generic functions
  mm/kasan: MODULE_VADDR is not available on all archs
  ...
</content>
</entry>
<entry>
<title>mm, oom: remove task_lock protecting comm printing</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2015-11-06T02:48:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=da39da3a54fed88e29024f2f1f6cd7357cd03a44'/>
<id>urn:sha1:da39da3a54fed88e29024f2f1f6cd7357cd03a44</id>
<content type='text'>
The oom killer takes task_lock() in a couple of places solely to protect
printing the task's comm.

A process's comm, including current's comm, may change due to
/proc/pid/comm or PR_SET_NAME.

The comm will always be NULL-terminated, so the worst race scenario would
only be during update.  We can tolerate a comm being printed that is in
the middle of an update to avoid taking the lock.

Other locations in the kernel have already dropped task_lock() when
printing comm, so this is consistent.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky.work@gmail.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: replace cgroup_has_tasks() with cgroup_is_populated()</title>
<updated>2015-10-15T20:41:50Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-10-15T20:41:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=27bd4dbb8d51c476298e62bd088225317b7853de'/>
<id>urn:sha1:27bd4dbb8d51c476298e62bd088225317b7853de</id>
<content type='text'>
Currently, cgroup_has_tasks() tests whether the target cgroup has any
css_set linked to it.  This works because a css_set's refcnt converges
with the number of tasks linked to it and thus there's no css_set
linked to a cgroup if it doesn't have any live tasks.

To help tracking resource usage of zombie tasks, putting the ref of
css_set will be separated from disassociating the task from the
css_set which means that a cgroup may have css_sets linked to it even
when it doesn't have any live tasks.

This patch replaces cgroup_has_tasks() with cgroup_is_populated()
which tests cgroup-&gt;nr_populated instead which locally counts the
number of populated css_sets.  Unlike cgroup_has_tasks(),
cgroup_is_populated() is recursive - if any of the descendants is
populated, the cgroup is populated too.  While this changes the
meaning of the test, all the existing users are okay with the change.

While at it, replace the open-coded -&gt;populated_cnt test in
cgroup_events_show() with cgroup_is_populated().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup, memcg, cpuset: implement cgroup_taskset_for_each_leader()</title>
<updated>2015-09-22T16:46:53Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-09-11T19:00:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4530eddb59494b89650d6bcd980fc7f7717ad80c'/>
<id>urn:sha1:4530eddb59494b89650d6bcd980fc7f7717ad80c</id>
<content type='text'>
It wasn't explicitly documented but, when a process is being migrated,
cpuset and memcg depend on cgroup_taskset_first() returning the
threadgroup leader; however, this approach is somewhat ghetto and
would no longer work for the planned multi-process migration.

This patch introduces explicit cgroup_taskset_for_each_leader() which
iterates over only the threadgroup leaders and replaces
cgroup_taskset_first() usages for accessing the leader with it.

This prepares both memcg and cpuset for multi-process migration.  This
patch also updates the documentation for cgroup_taskset_for_each() to
clarify the iteration rules and removes comments mentioning task
ordering in tasksets.

v2: A previous patch which added threadgroup leader test was dropped.
    Patch updated accordingly.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
</content>
</entry>
<entry>
<title>cpuset: migrate memory only for threadgroup leaders</title>
<updated>2015-09-22T16:46:53Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-09-11T19:00:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3df9ca0a2b8b50db5a079ae9d97c5b55435e9a6c'/>
<id>urn:sha1:3df9ca0a2b8b50db5a079ae9d97c5b55435e9a6c</id>
<content type='text'>
If memory_migrate flag is set, cpuset migrates memory according to the
destnation css's nodemask.  The current implementation migrates memory
whenever any thread of a process is migrated making the behavior
somewhat arbitrary.  Let's tie memory operations to the threadgroup
leader so that memory is migrated only when the leader is migrated.

While this is a behavior change, given the inherent fuziness, this
change is not too likely to be noticed and allows us to clearly define
who owns the memory (always the leader) and helps the planned atomic
multi-process migration.

Note that we're currently migrating memory in migration path proper
while holding all the locks.  In the long term, this should be moved
out to an async work item.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>cgroup: replace cftype-&gt;mode with CFTYPE_WORLD_WRITABLE</title>
<updated>2015-09-18T21:54:23Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-09-18T21:54:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7dbdb199d3bf88f043ea17e97113eb28d5b100bc'/>
<id>urn:sha1:7dbdb199d3bf88f043ea17e97113eb28d5b100bc</id>
<content type='text'>
cftype-&gt;mode allows controllers to give arbitrary permissions to
interface knobs.  Except for "cgroup.event_control", the existing uses
are spurious.

* Some explicitly specify S_IRUGO | S_IWUSR even though that's the
  default.

* "cpuset.memory_pressure" specifies S_IRUGO while also setting a
  write callback which returns -EACCES.  All it needs to do is simply
  not setting a write callback.

"cgroup.event_control" uses cftype-&gt;mode to make the file
world-writable.  It's a misdesigned interface and we don't want
controllers to be tweaking interface file permissions in general.
This patch removes cftype-&gt;mode and all its spurious uses and
implements CFTYPE_WORLD_WRITABLE for "cgroup.event_control" which is
marked as compatibility-only.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
</content>
</entry>
<entry>
<title>cgroup: replace cgroup_on_dfl() tests in controllers with cgroup_subsys_on_dfl()</title>
<updated>2015-09-18T15:56:28Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-09-18T15:56:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9e10a130d9b62af976d17d120c95f3650769312c'/>
<id>urn:sha1:9e10a130d9b62af976d17d120c95f3650769312c</id>
<content type='text'>
cgroup_on_dfl() tests whether the cgroup's root is the default
hierarchy; however, an individual controller is only interested in
whether the controller is attached to the default hierarchy and never
tests a cgroup which doesn't belong to the hierarchy that the
controller is attached to.

This patch replaces cgroup_on_dfl() tests in controllers with faster
static_key based cgroup_subsys_on_dfl().  This leaves cgroup core as
the only user of cgroup_on_dfl() and the function is moved from the
header file to cgroup.c.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Zefan Li &lt;lizefan@huawei.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
</content>
</entry>
<entry>
<title>cpuset: use trialcs-&gt;mems_allowed as a temp variable</title>
<updated>2015-08-10T15:18:41Z</updated>
<author>
<name>Alban Crequy</name>
<email>alban.crequy@gmail.com</email>
</author>
<published>2015-08-06T14:21:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=24ee3cf89bef04e8bc23788aca4e029a3f0f06d9'/>
<id>urn:sha1:24ee3cf89bef04e8bc23788aca4e029a3f0f06d9</id>
<content type='text'>
The comment says it's using trialcs-&gt;mems_allowed as a temp variable but
it didn't match the code. Change the code to match the comment.

This fixes an issue when writing in cpuset.mems when a sub-directory
exists: we need to write several times for the information to persist:

| root@alban:/sys/fs/cgroup/cpuset# mkdir footest9
| root@alban:/sys/fs/cgroup/cpuset# cd footest9
| root@alban:/sys/fs/cgroup/cpuset/footest9# mkdir aa
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
|
| root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 &gt; cpuset.mems
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
|
| root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 &gt; cpuset.mems
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
| 0
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems
|
| root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 &gt; aa/cpuset.mems
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems
| 0
| root@alban:/sys/fs/cgroup/cpuset/footest9#

This should help to fix the following issue in Docker:
https://github.com/opencontainers/runc/issues/133
In some conditions, a Docker container needs to be started twice in
order to work.

Signed-off-by: Alban Crequy &lt;alban@endocode.com&gt;
Tested-by: Iago López Galeiras &lt;iago@endocode.com&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.17+
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel, cpuset: remove exception for __GFP_THISNODE</title>
<updated>2015-04-14T23:49:03Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2015-04-14T22:47:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6e276d2a517fba71a2be2252fdffb237e746e784'/>
<id>urn:sha1:6e276d2a517fba71a2be2252fdffb237e746e784</id>
<content type='text'>
Nothing calls __cpuset_node_allowed() with __GFP_THISNODE set anymore, so
remove the obscure comment about it and its special-case exception.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Pravin Shelar &lt;pshelar@nicira.com&gt;
Cc: Jarno Rajahalme &lt;jrajahalme@nicira.com&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
