<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/cgroup.c, branch v4.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-01-22T15:42:58Z</updated>
<entry>
<title>cgroup: make sure a parent css isn't freed before its children</title>
<updated>2016-01-22T15:42:58Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-01-21T20:32:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8bb5ef79bc0f4016ecf79e8dce6096a3c63603e4'/>
<id>urn:sha1:8bb5ef79bc0f4016ecf79e8dce6096a3c63603e4</id>
<content type='text'>
There are three subsystem callbacks in css shutdown path -
css_offline(), css_released() and css_free().  Except for
css_released(), cgroup core didn't guarantee the order of invocation.
css_offline() or css_free() could be called on a parent css before its
children.  This behavior is unexpected and led to bugs in cpu and
memory controller.

The previous patch updated ordering for css_offline() which fixes the
cpu controller issue.  While there currently isn't a known bug caused
by misordering of css_free() invocations, let's fix it too for
consistency.

css_free() ordering can be trivially fixed by moving putting of the
parent css below css_free() invocation.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>cgroup: make sure a parent css isn't offlined before its children</title>
<updated>2016-01-22T15:42:57Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-01-21T20:31:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aa226ff4a1ce79f229c6b7a4c0a14e17fececd01'/>
<id>urn:sha1:aa226ff4a1ce79f229c6b7a4c0a14e17fececd01</id>
<content type='text'>
There are three subsystem callbacks in css shutdown path -
css_offline(), css_released() and css_free().  Except for
css_released(), cgroup core didn't guarantee the order of invocation.
css_offline() or css_free() could be called on a parent css before its
children.  This behavior is unexpected and led to bugs in cpu and
memory controller.

This patch updates offline path so that a parent css is never offlined
before its children.  Each css keeps online_cnt which reaches zero iff
itself and all its children are offline and offline_css() is invoked
only after online_cnt reaches zero.

This fixes the memory controller bug and allows the fix for cpu
controller.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Reported-by: Brian Christiansen &lt;brian.o.christiansen@gmail.com&gt;
Link: http://lkml.kernel.org/g/5698A023.9070703@de.ibm.com
Link: http://lkml.kernel.org/g/CAKB58ikDkzc8REt31WBkD99+hxNzjK4+FBmhkgS+NVrC9vjMSg@mail.gmail.com
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>cpuset: make mm migration asynchronous</title>
<updated>2016-01-22T15:22:46Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-01-19T17:18:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e93ad19d05648397ef3bcb838d26aec06c245dc0'/>
<id>urn:sha1:e93ad19d05648397ef3bcb838d26aec06c245dc0</id>
<content type='text'>
If "cpuset.memory_migrate" is set, when a process is moved from one
cpuset to another with a different memory node mask, pages in used by
the process are migrated to the new set of nodes.  This was performed
synchronously in the -&gt;attach() callback, which is synchronized
against process management.  Recently, the synchronization was changed
from per-process rwsem to global percpu rwsem for simplicity and
optimization.

Combined with the synchronous mm migration, this led to deadlocks
because mm migration could schedule a work item which may in turn try
to create a new worker blocking on the process management lock held
from cgroup process migration path.

This heavy an operation shouldn't be performed synchronously from that
deep inside cgroup migration in the first place.  This patch punts the
actual migration to an ordered workqueue and updates cgroup process
migration and cpuset config update paths to flush the workqueue after
all locks are released.  This way, the operations still seem
synchronous to userland without entangling mm migration with process
management synchronization.  CPU hotplug can also invoke mm migration
but there's no reason for it to wait for mm migrations and thus
doesn't synchronize against their completions.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: stable@vger.kernel.org # v4.4+
</content>
</entry>
<entry>
<title>Merge branch 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2016-01-13T03:20:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-01-13T03:20:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8'/>
<id>urn:sha1:34a9304a96d6351c2d35dcdc9293258378fc0bd8</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - cgroup v2 interface is now official.  It's no longer hidden behind a
   devel flag and can be mounted using the new cgroup2 fs type.

   Unfortunately, cpu v2 interface hasn't made it yet due to the
   discussion around in-process hierarchical resource distribution and
   only memory and io controllers can be used on the v2 interface at the
   moment.

 - The existing documentation which has always been a bit of mess is
   relocated under Documentation/cgroup-v1/. Documentation/cgroup-v2.txt
   is added as the authoritative documentation for the v2 interface.

 - Some features are added through for-4.5-ancestor-test branch to
   enable netfilter xt_cgroup match to use cgroup v2 paths.  The actual
   netfilter changes will be merged through the net tree which pulled in
   the said branch.

 - Various cleanups

* 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rename cgroup documentations
  cgroup: fix a typo.
  cgroup: Remove resource_counter.txt in Documentation/cgroup-legacy/00-INDEX.
  cgroup: demote subsystem init messages to KERN_DEBUG
  cgroup: Fix uninitialized variable warning
  cgroup: put controller Kconfig options in meaningful order
  cgroup: clean up the kernel configuration menu nomenclature
  cgroup_pids: fix a typo.
  Subject: cgroup: Fix incomplete dd command in blkio documentation
  cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends
  cpuset: Replace all instances of time_t with time64_t
  cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation
  cgroup: rename Documentation/cgroups/ to Documentation/cgroup-legacy/
  cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
</content>
</entry>
<entry>
<title>cgroup: fix a typo.</title>
<updated>2016-01-10T17:26:45Z</updated>
<author>
<name>Rami Rosen</name>
<email>rami.rosen@intel.com</email>
</author>
<published>2016-01-09T21:33:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2cfa2b191e56fd4c5c6ffb72e1d1e82ed225676d'/>
<id>urn:sha1:2cfa2b191e56fd4c5c6ffb72e1d1e82ed225676d</id>
<content type='text'>
This patch fixes a typo in a comment in cgroup.c.

Signed-off-by: Rami Rosen &lt;rami.rosen@intel.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: demote subsystem init messages to KERN_DEBUG</title>
<updated>2016-01-02T11:49:43Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-12-29T19:53:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a5ae989957cbc3f3b7bc40677bd9e459c0917528'/>
<id>urn:sha1:a5ae989957cbc3f3b7bc40677bd9e459c0917528</id>
<content type='text'>
These are noisy during boot and not all that interesting.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-12-18T03:08:28Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-12-18T03:08:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b3e0d3d7bab14f2544a3314bec53a23dc7dd2206'/>
<id>urn:sha1:b3e0d3d7bab14f2544a3314bec53a23dc7dd2206</id>
<content type='text'>
Conflicts:
	drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net, cgroup: cgroup_sk_updat_lock was missing initializer</title>
<updated>2015-12-14T19:20:33Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-12-14T16:24:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3fa4cc9c2df37b393b968dcc3bb2ab1e2ff7ea7f'/>
<id>urn:sha1:3fa4cc9c2df37b393b968dcc3bb2ab1e2ff7ea7f</id>
<content type='text'>
bd1060a1d671 ("sock, cgroup: add sock-&gt;sk_cgroup") added global
spinlock cgroup_sk_update_lock but erroneously skipped initializer
leading to uninitialized spinlock warning.  Fix it by using
DEFINE_SPINLOCK().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Dexuan Cui &lt;decui@microsoft.com&gt;
Fixes: bd1060a1d671 ("sock, cgroup: add sock-&gt;sk_cgroup")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sock, cgroup: add sock-&gt;sk_cgroup</title>
<updated>2015-12-09T03:02:33Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-12-07T22:38:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bd1060a1d67128bb8fbe2e1384c518912cbe54e7'/>
<id>urn:sha1:bd1060a1d67128bb8fbe2e1384c518912cbe54e7</id>
<content type='text'>
In cgroup v1, dealing with cgroup membership was difficult because the
number of membership associations was unbound.  As a result, cgroup v1
grew several controllers whose primary purpose is either tagging
membership or pull in configuration knobs from other subsystems so
that cgroup membership test can be avoided.

net_cls and net_prio controllers are examples of the latter.  They
allow configuring network-specific attributes from cgroup side so that
network subsystem can avoid testing cgroup membership; unfortunately,
these are not only cumbersome but also problematic.

Both net_cls and net_prio aren't properly hierarchical.  Both inherit
configuration from the parent on creation but there's no interaction
afterwards.  An ancestor doesn't restrict the behavior in its subtree
in anyway and configuration changes aren't propagated downwards.
Especially when combined with cgroup delegation, this is problematic
because delegatees can mess up whatever network configuration
implemented at the system level.  net_prio would allow the delegatees
to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
the same for classid.

While it is possible to solve these issues from controller side by
implementing hierarchical allowable ranges in both controllers, it
would involve quite a bit of complexity in the controllers and further
obfuscate network configuration as it becomes even more difficult to
tell what's actually being configured looking from the network side.
While not much can be done for v1 at this point, as membership
handling is sane on cgroup v2, it'd be better to make cgroup matching
behave like other network matches and classifiers than introducing
further complications.

In preparation, this patch updates sock-&gt;sk_cgrp_data handling so that
it points to the v2 cgroup that sock was created in until either
net_prio or net_cls is used.  Once either of the two is used,
sock-&gt;sk_cgrp_data reverts to its previous role of carrying prioidx
and classid.  This is to avoid adding yet another cgroup related field
to struct sock.

As the mode switching can happen at most once per boot, the switching
mechanism is aimed at lowering hot path overhead.  It may leak a
finite, likely small, number of cgroup refs and report spurious
prioidx or classid on switching; however, dynamic updates of prioidx
and classid have always been racy and lossy - socks between creation
and fd installation are never updated, config changes don't update
existing sockets at all, and prioidx may index with dead and recycled
cgroup IDs.  Non-critical inaccuracies from small race windows won't
make any noticeable difference.

This patch doesn't make use of the pointer yet.  The following patch
will implement netfilter match for cgroup2 membership.

v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
    cgroup specific field.

v3: Add comments explaining why sock_data_prioidx() and
    sock_data_classid() use different fallback values.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
CC: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.5-ancestor-test' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-4.5</title>
<updated>2015-12-07T22:24:10Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-12-07T22:24:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=177493987c1a15145922a65240f2f8ab6c63770a'/>
<id>urn:sha1:177493987c1a15145922a65240f2f8ab6c63770a</id>
<content type='text'>
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
