<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/cgroup.c, branch v3.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2012-07-07T23:08:18Z</updated>
<entry>
<title>cgroup: fix cgroup hierarchy umount race</title>
<updated>2012-07-07T23:08:18Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-07-07T23:08:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5db9a4d99b0157a513944e9a44d29c9cec2e91dc'/>
<id>urn:sha1:5db9a4d99b0157a513944e9a44d29c9cec2e91dc</id>
<content type='text'>
48ddbe1946 "cgroup: make css-&gt;refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.

Destroying a superblock which has dentries with positive refcnts is a
critical bug and triggers BUG() in vfs code.  As each cgroup dentry
holds an s_active reference, any lingering cgroup has both its dentry
and the superblock pinned and thus preventing premature release of
superblock.

Unfortunately, after 48ddbe1946, there's a small window while
releasing a cgroup which is directly under the root of the hierarchy.
When a cgroup directory is released, vfs layer first deletes the
corresponding dentry and then invokes dput() on the parent, which may
recurse further, so when a cgroup directly below root cgroup is
released, the cgroup is first destroyed - which releases the s_active
it was holding - and then the dentry for the root cgroup is dput().

This creates a window where the root dentry's refcnt isn't zero but
superblock's s_active is.  If umount happens before or during this
window, vfs will see the root dentry with non-zero refcnt and trigger
BUG().

Before 48ddbe1946, this problem didn't exist because the last dentry
reference was guaranteed to be put synchronously from rmdir(2)
invocation which holds s_active around the whole process.

Fix it by holding an extra superblock-&gt;s_active reference across
dput() from css release, which is the dput() path added by 48ddbe1946
and the only one which doesn't hold an extra s_active ref across the
final cgroup dput().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
LKML-Reference: &lt;4FEEA5CB.8070809@huawei.com&gt;
Reported-by: shyju pv &lt;shyju.pv@huawei.com&gt;
Tested-by: shyju pv &lt;shyju.pv@huawei.com&gt;
Cc: Sasha Levin &lt;levinsasha928@gmail.com&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>Revert "cgroup: superblock can't be released with active dentries"</title>
<updated>2012-07-07T22:55:47Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-07-07T22:55:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7db5b3ca0ecdb2e8fad52a4770e4e320e61c77a6'/>
<id>urn:sha1:7db5b3ca0ecdb2e8fad52a4770e4e320e61c77a6</id>
<content type='text'>
This reverts commit fa980ca87d15bb8a1317853f257a505990f3ffde.  The
commit was an attempt to fix a race condition where a cgroup hierarchy
may be unmounted with positive dentry reference on root cgroup.  While
the commit made the race condition slightly more difficult to trigger,
the race was still there and could be reliably triggered using a
different test case.

Revert the incorrect fix.  The next commit will describe the race and
fix it correctly.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
LKML-Reference: &lt;4FEEA5CB.8070809@huawei.com&gt;
Reported-by: shyju pv &lt;shyju.pv@huawei.com&gt;
Cc: Sasha Levin &lt;levinsasha928@gmail.com&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>cgroups: Account for CSS_DEACT_BIAS in __css_put</title>
<updated>2012-06-18T22:38:02Z</updated>
<author>
<name>Salman Qazi</name>
<email>sqazi@google.com</email>
</author>
<published>2012-06-14T21:55:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8e3bbf42c6d73881956863cc3305456afe2bc4ea'/>
<id>urn:sha1:8e3bbf42c6d73881956863cc3305456afe2bc4ea</id>
<content type='text'>
When we fixed the race between atomic_dec and css_refcnt, we missed
the fact that css_refcnt internally subtracts CSS_DEACT_BIAS to get
the actual reference count.  This can potentially cause a refcount leak
if __css_put races with cgroup_clear_css_refs.

Signed-off-by: Salman Qazi &lt;sqazi@google.com&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: make sure that decisions in __css_put are atomic</title>
<updated>2012-06-07T01:51:35Z</updated>
<author>
<name>Salman Qazi</name>
<email>sqazi@google.com</email>
</author>
<published>2012-06-07T01:51:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=967db0ea65b0bf8507a7643ac8f296c4f2c0a834'/>
<id>urn:sha1:967db0ea65b0bf8507a7643ac8f296c4f2c0a834</id>
<content type='text'>
__css_put is using atomic_dec on the ref count, and then
looking at the ref count to make decisions.  This is prone
to races, as someone else may decrement ref count between
our decrement and our decision.  Instead, we should base our
decisions on the value that we decremented the ref count to.

(This results in an actual race on Google's kernel which I
haven't been able to reproduce on the upstream kernel.  Having
said that, it's still incorrect by inspection).

Signed-off-by: Salman Qazi &lt;sqazi@google.com&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@vger.kernel.org
</content>
</entry>
<entry>
<title>Merge branch 'for-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2012-06-05T18:54:12Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-06-05T18:54:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=365f0e173f44aad979c464eb8250f6138a9911ef'/>
<id>urn:sha1:365f0e173f44aad979c464eb8250f6138a9911ef</id>
<content type='text'>
Pull cgroup fix from Tejun Heo:
 "This fixes the possible premature superblock release on umount bug
  mentioned during v3.5-rc1 pull request.

  Originally, cgroup dentry destruction path assumed that cgroup dentry
  didn't have any reference left after cgroup removal thus put super
  during dentry removal.  Now that there can be lingering dentry
  references, this led to super being put with live dentries.  This
  patch fixes the problem by putting super ref on dentry release instead
  of removal."

* 'for-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: superblock can't be released with active dentries
</content>
</entry>
<entry>
<title>kernel: cgroup: push rcu read locking from css_is_ancestor() to callsite</title>
<updated>2012-05-29T23:22:20Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2012-05-29T22:06:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=91c63734f6908425903aed69c04035592f18d398'/>
<id>urn:sha1:91c63734f6908425903aed69c04035592f18d398</id>
<content type='text'>
Library functions should not grab locks when the callsites can do it,
even if the lock nests like the rcu read-side lock does.

Push the rcu_read_lock() from css_is_ancestor() to its single user,
mem_cgroup_same_or_subtree() in preparation for another user that may
already hold the rcu read-side lock.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: superblock can't be released with active dentries</title>
<updated>2012-05-28T00:22:56Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-05-24T15:24:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa980ca87d15bb8a1317853f257a505990f3ffde'/>
<id>urn:sha1:fa980ca87d15bb8a1317853f257a505990f3ffde</id>
<content type='text'>
48ddbe1946 "cgroup: make css-&gt;refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.

cgroup_create() does grab an active reference on the superblock to
prevent it from going away while there are !root cgroups; however, the
reference is put from cgroup_diput() which is invoked on cgroup
removal, so cgroup dentries which are removed but persisting due to
lingering csses already have released their superblock active refs
allowing superblock to be killed while those dentries are around.

Given the right condition, this makes cgroup_kill_sb() call
kill_litter_super() with dentries with non-zero d_count leading to
BUG() in shrink_dcache_for_umount_subtree().

Fix it by adding cgroup_dops-&gt;d_release() operation and moving
deactivate_super() to it.  cgroup_diput() now marks dentry-&gt;d_fsdata
with itself if superblock should be deactivated and cgroup_d_release()
deactivates the superblock on dentry release.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Sasha Levin &lt;levinsasha928@gmail.com&gt;
Tested-by: Sasha Levin &lt;levinsasha928@gmail.com&gt;
LKML-Reference: &lt;CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g@mail.gmail.com&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2012-05-24T00:42:39Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-05-24T00:42:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=644473e9c60c1ff4f6351fed637a6e5551e3dce7'/>
<id>urn:sha1:644473e9c60c1ff4f6351fed637a6e5551e3dce7</id>
<content type='text'>
Pull user namespace enhancements from Eric Biederman:
 "This is a course correction for the user namespace, so that we can
  reach an inexpensive, maintainable, and reasonably complete
  implementation.

  Highlights:
   - Config guards make it impossible to enable the user namespace and
     code that has not been converted to be user namespace safe.

   - Use of the new kuid_t type ensures the if you somehow get past the
     config guards the kernel will encounter type errors if you enable
     user namespaces and attempt to compile in code whose permission
     checks have not been updated to be user namespace safe.

   - All uids from child user namespaces are mapped into the initial
     user namespace before they are processed.  Removing the need to add
     an additional check to see if the user namespace of the compared
     uids remains the same.

   - With the user namespaces compiled out the performance is as good or
     better than it is today.

   - For most operations absolutely nothing changes performance or
     operationally with the user namespace enabled.

   - The worst case performance I could come up with was timing 1
     billion cache cold stat operations with the user namespace code
     enabled.  This went from 156s to 164s on my laptop (or 156ns to
     164ns per stat operation).

   - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
     Most uid/gid setting system calls treat these value specially
     anyway so attempting to use -1 as a uid would likely cause
     entertaining failures in userspace.

   - If setuid is called with a uid that can not be mapped setuid fails.
     I have looked at sendmail, login, ssh and every other program I
     could think of that would call setuid and they all check for and
     handle the case where setuid fails.

   - If stat or a similar system call is called from a context in which
     we can not map a uid we lie and return overflowuid.  The LFS
     experience suggests not lying and returning an error code might be
     better, but the historical precedent with uids is different and I
     can not think of anything that would break by lying about a uid we
     can't map.

   - Capabilities are localized to the current user namespace making it
     safe to give the initial user in a user namespace all capabilities.

  My git tree covers all of the modifications needed to convert the core
  kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  userns:  Silence silly gcc warning.
  cred: use correct cred accessor with regards to rcu read lock
  userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
  userns: Convert cgroup permission checks to use uid_eq
  userns: Convert tmpfs to use kuid and kgid where appropriate
  userns: Convert sysfs to use kgid/kuid where appropriate
  userns: Convert sysctl permission checks to use kuid and kgids.
  userns: Convert proc to use kuid/kgid where appropriate
  userns: Convert ext4 to user kuid/kgid where appropriate
  userns: Convert ext3 to use kuid/kgid where appropriate
  userns: Convert ext2 to use kuid/kgid where appropriate.
  userns: Convert devpts to use kuid/kgid where appropriate
  userns: Convert binary formats to use kuid/kgid where appropriate
  userns: Add negative depends on entries to avoid building code that is userns unsafe
  userns: signal remove unnecessary map_cred_ns
  userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  userns: Convert stat to return values mapped from kuids and kgids
  userns: Convert user specfied uids and gids in chown into kuids and kgid
  userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  ...
</content>
</entry>
<entry>
<title>userns: Convert cgroup permission checks to use uid_eq</title>
<updated>2012-05-15T21:59:30Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2012-03-12T22:44:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=14a590c3f987977d7b09ec926481ee0238c08eee'/>
<id>urn:sha1:14a590c3f987977d7b09ec926481ee0238c08eee</id>
<content type='text'>
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads</title>
<updated>2012-04-23T18:03:51Z</updated>
<author>
<name>Mike Galbraith</name>
<email>mgalbraith@suse.de</email>
</author>
<published>2012-04-21T07:13:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c4c27fbdda4e8ba87806c415b6d15266b07bce4b'/>
<id>urn:sha1:c4c27fbdda4e8ba87806c415b6d15266b07bce4b</id>
<content type='text'>
Allowing kthreadd to be moved to a non-root group makes no sense, it being
a global resource, and needlessly leads unsuspecting users toward trouble.

1. An RT workqueue worker thread spawned in a task group with no rt_runtime
allocated is not schedulable.  Simple user error, but harmful to the box.

2. A worker thread which acquires PF_THREAD_BOUND can never leave a cpuset,
rendering the cpuset immortal.

Save the user some unexpected trouble, just say no.

Signed-off-by: Mike Galbraith &lt;mgalbraith@suse.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
