<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/cpuset.c, branch v3.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-02-20T17:18:31Z</updated>
<entry>
<title>Merge branch 'for-3.9-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2013-02-20T17:18:31Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-02-20T17:18:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9ae46e6702d98d22037368896298d05958ad5737'/>
<id>urn:sha1:9ae46e6702d98d22037368896298d05958ad5737</id>
<content type='text'>
Pull cpuset changes from Tejun Heo:

 - Synchornization has seen a lot of changes with focus on decoupling
   cpuset synchronization from cgroup internal locking.

   After this change, there only remain a couple of mostly trivial
   dependencies on cgroup_lock outside cgroup core proper.  cgroup_lock
   is scheduled to be unexported in this devel cycle.

   This will finally remove the fragile locking order around cgroup
   (cgroup locking wants to / should be one of the outermost but yet has
   been acquired from deep inside individual controllers).

 - At this point, Li is most knowlegeable with cpuset and taking over
   the maintainership of cpuset.

* 'for-3.9-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: drop spurious retval assignment in proc_cpuset_show()
  cpuset: fix RCU lockdep splat
  cpuset: update MAINTAINERS
  cpuset: remove cpuset-&gt;parent
  cpuset: replace cpuset-&gt;stack_list with cpuset_for_each_descendant_pre()
  cpuset: replace cgroup_mutex locking with cpuset internal locking
  cpuset: schedule hotplug propagation from cpuset_attach() if the cpuset is empty
  cpuset: pin down cpus and mems while a task is being attached
  cpuset: make CPU / memory hotplug propagation asynchronous
  cpuset: drop async_rebuild_sched_domains()
  cpuset: don't nest cgroup_mutex inside get_online_cpus()
  cpuset: reorganize CPU / memory hotplug handling
  cpuset: cleanup cpuset[_can]_attach()
  cpuset: introduce cpuset_for_each_child()
  cpuset: introduce CS_ONLINE
  cpuset: introduce -&gt;css_on/offline()
  cpuset: remove fast exit path from remove_tasks_in_empty_cpuset()
  cpuset: remove unused cpuset_unlock()
</content>
</entry>
<entry>
<title>cpuset: fix cpuset_print_task_mems_allowed() vs rename() race</title>
<updated>2013-02-18T17:08:15Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2013-01-25T08:08:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63f43f55c9bbc14f76b582644019b8a07dc8219a'/>
<id>urn:sha1:63f43f55c9bbc14f76b582644019b8a07dc8219a</id>
<content type='text'>
rename() will change dentry-&gt;d_name. The result of this race can
be worse than seeing partially rewritten name, but we might access
a stale pointer because rename() will re-allocate memory to hold
a longer name.

It's safe in the protection of dentry-&gt;d_lock.

v2: check NULL dentry before acquiring dentry lock.

Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>cpuset: drop spurious retval assignment in proc_cpuset_show()</title>
<updated>2013-01-15T16:38:55Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2013-01-15T06:11:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d127027baf98dce3ca31bec18c2c0e048ceda7c4'/>
<id>urn:sha1:d127027baf98dce3ca31bec18c2c0e048ceda7c4</id>
<content type='text'>
proc_cpuset_show() has a spurious -EINVAL assignment which does
nothing.  Remove it.

This patch doesn't make any functional difference.

tj: Rewrote patch description.

Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cpuset: fix RCU lockdep splat</title>
<updated>2013-01-15T16:38:55Z</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2013-01-15T06:10:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=27e89ae5d6e94e30231ee89a7736f62d84ba4c6f'/>
<id>urn:sha1:27e89ae5d6e94e30231ee89a7736f62d84ba4c6f</id>
<content type='text'>
5d21cc2db040d01f8c19b8602f6987813e1176b4 ("cpuset: replace
cgroup_mutex locking with cpuset internal locking") incorrectly
converted proc_cpuset_show() from cgroup_lock() to cpuset_mutex.
proc_cpuset_show() is accessing cgroup hierarchy proper to determine
cgroup path which can't be protected by cpuset_mutex.  This triggered
the following RCU warning.

 ===============================
 [ INFO: suspicious RCU usage. ]
 3.8.0-rc3-next-20130114-sasha-00016-ga107525-dirty #262 Tainted: G        W
 -------------------------------
 include/linux/cgroup.h:534 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 1
 2 locks held by trinity/7514:
  #0:  (&amp;p-&gt;lock){+.+.+.}, at: [&lt;ffffffff812b06aa&gt;] seq_read+0x3a/0x3e0
  #1:  (cpuset_mutex){+.+...}, at: [&lt;ffffffff811abae4&gt;] proc_cpuset_show+0x84/0x190

 stack backtrace:
 Pid: 7514, comm: trinity Tainted: G        W
+3.8.0-rc3-next-20130114-sasha-00016-ga107525-dirty #262
 Call Trace:
  [&lt;ffffffff81182cab&gt;] lockdep_rcu_suspicious+0x10b/0x120
  [&lt;ffffffff811abb71&gt;] proc_cpuset_show+0x111/0x190
  [&lt;ffffffff812b0827&gt;] seq_read+0x1b7/0x3e0
  [&lt;ffffffff812b0670&gt;] ? seq_lseek+0x110/0x110
  [&lt;ffffffff8128b4fb&gt;] do_loop_readv_writev+0x4b/0x90
  [&lt;ffffffff8128b776&gt;] do_readv_writev+0xf6/0x1d0
  [&lt;ffffffff8128b8ee&gt;] vfs_readv+0x3e/0x60
  [&lt;ffffffff8128b960&gt;] sys_readv+0x50/0xd0
  [&lt;ffffffff83d33d18&gt;] tracesys+0xe1/0xe6

The operation can be performed under RCU read lock.  Replace
cpuset_mutex locking with RCU read locking.

tj: Rewrote patch description.

Reported-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cpuset: remove cpuset-&gt;parent</title>
<updated>2013-01-07T16:51:08Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-01-07T16:51:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c431069fe4bacc0cd3ca94a8453987140a5b3517'/>
<id>urn:sha1:c431069fe4bacc0cd3ca94a8453987140a5b3517</id>
<content type='text'>
cgroup already tracks the hierarchy.  Follow cgroup-&gt;parent to find
the parent and drop cpuset-&gt;parent.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>cpuset: replace cpuset-&gt;stack_list with cpuset_for_each_descendant_pre()</title>
<updated>2013-01-07T16:51:08Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-01-07T16:51:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fc560a26accea9567b7f8f41eefa01e1bb127d74'/>
<id>urn:sha1:fc560a26accea9567b7f8f41eefa01e1bb127d74</id>
<content type='text'>
Implement cpuset_for_each_descendant_pre() and replace the
cpuset-specific tree walking using cpuset-&gt;stack_list with it.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>cpuset: replace cgroup_mutex locking with cpuset internal locking</title>
<updated>2013-01-07T16:51:08Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-01-07T16:51:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5d21cc2db040d01f8c19b8602f6987813e1176b4'/>
<id>urn:sha1:5d21cc2db040d01f8c19b8602f6987813e1176b4</id>
<content type='text'>
Supposedly for historical reasons, cpuset depends on cgroup core for
locking.  It depends on cgroup_mutex in cgroup callbacks and grabs
cgroup_mutex from other places where it wants to be synchronized.
This is majorly messy and highly prone to introducing circular locking
dependency especially because cgroup_mutex is supposed to be one of
the outermost locks.

As previous patches already plugged possible races which may happen by
decoupling from cgroup_mutex, replacing cgroup_mutex with cpuset
specific cpuset_mutex is mostly straight-forward.  Introduce
cpuset_mutex, replace all occurrences of cgroup_mutex with it, and add
cpuset_mutex locking to places which inherited cgroup_mutex from
cgroup core.

The only complication is from cpuset wanting to initiate task
migration when a cpuset loses all cpus or memory nodes.  Task
migration may go through full cgroup and all subsystem locking and
should be initiated without holding any cpuset specific lock; however,
a previous patch already made hotplug handled asynchronously and
moving the task migration part outside other locks is easy.
cpuset_propagate_hotplug_workfn() now invokes
remove_tasks_in_empty_cpuset() without holding any lock.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>cpuset: schedule hotplug propagation from cpuset_attach() if the cpuset is empty</title>
<updated>2013-01-07T16:51:08Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-01-07T16:51:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02bb586372a71595203b3ff19a9be48eaa076f6c'/>
<id>urn:sha1:02bb586372a71595203b3ff19a9be48eaa076f6c</id>
<content type='text'>
cpuset is scheduled to be decoupled from cgroup_lock which will make
hotplug handling race with task migration.  cpus or mems will be
allowed to go offline between -&gt;can_attach() and -&gt;attach().  If
hotplug takes down all cpus or mems of a cpuset while attach is in
progress, -&gt;attach() may end up putting tasks into an empty cpuset.

This patchset makes -&gt;attach() schedule hotplug propagation if the
cpuset is empty after attaching is complete.  This will move the tasks
to the nearest ancestor which can execute and the end result would be
as if hotplug handling happened after the tasks finished attaching.

cpuset_write_resmask() now also flushes cpuset_propagate_hotplug_wq to
wait for propagations scheduled directly by cpuset_attach().

This currently doesn't make any functional difference as everything is
protected by cgroup_mutex but enables decoupling the locking.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>cpuset: pin down cpus and mems while a task is being attached</title>
<updated>2013-01-07T16:51:07Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-01-07T16:51:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=452477fa68c6d8ef80adebd05194c1c157ad9a53'/>
<id>urn:sha1:452477fa68c6d8ef80adebd05194c1c157ad9a53</id>
<content type='text'>
cpuset is scheduled to be decoupled from cgroup_lock which will make
configuration updates race with task migration.  Any config update
will be allowed to happen between -&gt;can_attach() and -&gt;attach().  If
such config update removes either all cpus or mems, by the time
-&gt;attach() is called, the condition verified by -&gt;can_attach(), that
the cpuset is capable of hosting the tasks, is no longer true.

This patch adds cpuset-&gt;attach_in_progress which is incremented from
-&gt;can_attach() and decremented when the attach operation finishes
either successfully or not.  validate_change() treats cpusets w/
non-zero -&gt;attach_in_progress like cpusets w/ tasks and refuses to
remove all cpus or mems from it.

This currently doesn't make any functional difference as everything is
protected by cgroup_mutex but enables decoupling the locking.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>cpuset: make CPU / memory hotplug propagation asynchronous</title>
<updated>2013-01-07T16:51:07Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-01-07T16:51:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8d03394877ecdf87e1d694664c460747b8e05aa1'/>
<id>urn:sha1:8d03394877ecdf87e1d694664c460747b8e05aa1</id>
<content type='text'>
cpuset_hotplug_workfn() has been invoking cpuset_propagate_hotplug()
directly to propagate hotplug updates to !root cpusets; however, this
has the following problems.

* cpuset locking is scheduled to be decoupled from cgroup_mutex,
  cgroup_mutex will be unexported, and cgroup_attach_task() will do
  cgroup locking internally, so propagation can't synchronously move
  tasks to a parent cgroup while walking the hierarchy.

* We can't use cgroup generic tree iterator because propagation to
  each cpuset may sleep.  With propagation done asynchronously, we can
  lose the rather ugly cpuset specific iteration.

Convert cpuset_propagate_hotplug() to
cpuset_propagate_hotplug_workfn() and execute it from newly added
cpuset-&gt;hotplug_work.  The work items are run on an ordered workqueue,
so the propagation order is preserved.  cpuset_hotplug_workfn()
schedules all propagations while holding cgroup_mutex and waits for
completion without cgroup_mutex.  Each in-flight propagation holds a
reference to the cpuset-&gt;css.

This patch doesn't cause any functional difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
</feed>
