<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/Documentation/cgroups, branch v3.19</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.19</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.19'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-12-13T21:00:36Z</updated>
<entry>
<title>Merge branch 'akpm' (second patch-bomb from Andrew)</title>
<updated>2014-12-13T21:00:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-12-13T21:00:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=78a45c6f067824cf5d0a9fedea7339ac2e28603c'/>
<id>urn:sha1:78a45c6f067824cf5d0a9fedea7339ac2e28603c</id>
<content type='text'>
Merge second patchbomb from Andrew Morton:
 - the rest of MM
 - misc fs fixes
 - add execveat() syscall
 - new ratelimit feature for fault-injection
 - decompressor updates
 - ipc/ updates
 - fallocate feature creep
 - fsnotify cleanups
 - a few other misc things

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (99 commits)
  cgroups: Documentation: fix trivial typos and wrong paragraph numberings
  parisc: percpu: update comments referring to __get_cpu_var
  percpu: update local_ops.txt to reflect this_cpu operations
  percpu: remove __get_cpu_var and __raw_get_cpu_var macros
  fsnotify: remove destroy_list from fsnotify_mark
  fsnotify: unify inode and mount marks handling
  fallocate: create FAN_MODIFY and IN_MODIFY events
  mm/cma: make kmemleak ignore CMA regions
  slub: fix cpuset check in get_any_partial
  slab: fix cpuset check in fallback_alloc
  shmdt: use i_size_read() instead of -&gt;i_size
  ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
  ipc/msg: increase MSGMNI, remove scaling
  ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
  ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
  lib/decompress.c: consistency of compress formats for kernel image
  decompress_bunzip2: off by one in get_next_block()
  usr/Kconfig: make initrd compression algorithm selection not expert
  fault-inject: add ratelimit option
  ratelimit: add initialization macro
  ...
</content>
</entry>
<entry>
<title>cgroups: Documentation: fix trivial typos and wrong paragraph numberings</title>
<updated>2014-12-13T20:42:53Z</updated>
<author>
<name>SeongJae Park</name>
<email>sj38.park@gmail.com</email>
</author>
<published>2014-12-13T00:58:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=29d293b6007b91a4463f05bc8d0b26e0e65c5816'/>
<id>urn:sha1:29d293b6007b91a4463f05bc8d0b26e0e65c5816</id>
<content type='text'>
Signed-off-by: SeongJae Park &lt;sj38.park@gmail.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6</title>
<updated>2014-12-12T22:42:48Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-12-12T22:42:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=823e334ecd247dd49ca2c5c90414435d77135340'/>
<id>urn:sha1:823e334ecd247dd49ca2c5c90414435d77135340</id>
<content type='text'>
Pull documentation update from Jonathan Corbet:
 "Here's my set of accumulated documentation changes for 3.19.

  It includes a couple of additions to the coding style document, some
  fixes for minor build problems within the documentation tree, the
  relocation of the kselftest docs, and various tweaks and additions.

  A couple of changes reach outside of Documentation/; they only make
  trivial comment changes and I did my best to get the required acks.

  Complete with a shiny signed tag this time around"

* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6:
  kobject: grammar fix
  Input: xpad - update docs to reflect current state
  Documentation: Build mic/mpssd only for x86_64
  cgroups: Documentation: fix wrong cgroupfs paths
  Documentation/email-clients.txt: add info about Claws Mail
  CodingStyle: add some more error handling guidelines
  kselftest: Move the docs to the Documentation dir
  Documentation: fix formatting to make 's' happy
  Documentation: power: Fix typo in Documentation/power
  Documentation: vm: Add 1GB large page support information
  ipv4: add kernel parameter tcpmhash_entries
  Documentation: Fix a typo in mailbox.txt
  treewide: Fix typo in Documentation/DocBook/device-drivers
  CodingStyle: Add a chapter on conditional compilation
</content>
</entry>
<entry>
<title>mm: embed the memcg pointer directly into struct page</title>
<updated>2014-12-11T01:41:09Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2014-12-10T23:44:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1306a85aed3ec3db98945aafb7dfbe5648a1203c'/>
<id>urn:sha1:1306a85aed3ec3db98945aafb7dfbe5648a1203c</id>
<content type='text'>
Memory cgroups used to have 5 per-page pointers.  To allow users to
disable that amount of overhead during runtime, those pointers were
allocated in a separate array, with a translation layer between them and
struct page.

There is now only one page pointer remaining: the memcg pointer, that
indicates which cgroup the page is associated with when charged.  The
complexity of runtime allocation and the runtime translation overhead is
no longer justified to save that *potential* 0.19% of memory.  With
CONFIG_SLUB, page-&gt;mem_cgroup actually sits in the doubleword padding
after the page-&gt;private member and doesn't even increase struct page,
and then this patch actually saves space.  Remaining users that care can
still compile their kernels without CONFIG_MEMCG.

     text    data     bss     dec     hex     filename
  8828345 1725264  983040 11536649 b00909  vmlinux.old
  8827425 1725264  966656 11519345 afc571  vmlinux.new

[mhocko@suse.cz: update Documentation/cgroups/memory.txt]
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Acked-by: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel: res_counter: remove the unused API</title>
<updated>2014-12-11T01:41:04Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2014-12-10T23:42:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5b1efc027c0b51ca3e76f4e00c83358f8349f543'/>
<id>urn:sha1:5b1efc027c0b51ca3e76f4e00c83358f8349f543</id>
<content type='text'>
All memory accounting and limiting has been switched over to the
lockless page counters.  Bye, res_counter!

[akpm@linux-foundation.org: update Documentation/cgroups/memory.txt]
[mhocko@suse.cz: ditch the last remainings of res_counter]
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Paul Bolle &lt;pebolle@tiscali.nl&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: hugetlb_cgroup: convert to lockless page counters</title>
<updated>2014-12-11T01:41:04Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2014-12-10T23:42:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=71f87bee38edddb21d97895fa938744cf3f477bb'/>
<id>urn:sha1:71f87bee38edddb21d97895fa938744cf3f477bb</id>
<content type='text'>
Abandon the spinlock-protected byte counters in favor of the unlocked
page counters in the hugetlb controller as well.

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: lockless page counters</title>
<updated>2014-12-11T01:41:04Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2014-12-10T23:42:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3e32cb2e0a12b6915056ff04601cf1bb9b44f967'/>
<id>urn:sha1:3e32cb2e0a12b6915056ff04601cf1bb9b44f967</id>
<content type='text'>
Memory is internally accounted in bytes, using spinlock-protected 64-bit
counters, even though the smallest accounting delta is a page.  The
counter interface is also convoluted and does too many things.

Introduce a new lockless word-sized page counter API, then change all
memory accounting over to it.  The translation from and to bytes then only
happens when interfacing with userspace.

The removed locking overhead is noticable when scaling beyond the per-cpu
charge caches - on a 4-socket machine with 144-threads, the following test
shows the performance differences of 288 memcgs concurrently running a
page fault benchmark:

vanilla:

   18631648.500498      task-clock (msec)         #  140.643 CPUs utilized            ( +-  0.33% )
         1,380,638      context-switches          #    0.074 K/sec                    ( +-  0.75% )
            24,390      cpu-migrations            #    0.001 K/sec                    ( +-  8.44% )
     1,843,305,768      page-faults               #    0.099 M/sec                    ( +-  0.00% )
50,134,994,088,218      cycles                    #    2.691 GHz                      ( +-  0.33% )
   &lt;not supported&gt;      stalled-cycles-frontend
   &lt;not supported&gt;      stalled-cycles-backend
 8,049,712,224,651      instructions              #    0.16  insns per cycle          ( +-  0.04% )
 1,586,970,584,979      branches                  #   85.176 M/sec                    ( +-  0.05% )
     1,724,989,949      branch-misses             #    0.11% of all branches          ( +-  0.48% )

     132.474343877 seconds time elapsed                                          ( +-  0.21% )

lockless:

   12195979.037525      task-clock (msec)         #  133.480 CPUs utilized            ( +-  0.18% )
           832,850      context-switches          #    0.068 K/sec                    ( +-  0.54% )
            15,624      cpu-migrations            #    0.001 K/sec                    ( +- 10.17% )
     1,843,304,774      page-faults               #    0.151 M/sec                    ( +-  0.00% )
32,811,216,801,141      cycles                    #    2.690 GHz                      ( +-  0.18% )
   &lt;not supported&gt;      stalled-cycles-frontend
   &lt;not supported&gt;      stalled-cycles-backend
 9,999,265,091,727      instructions              #    0.30  insns per cycle          ( +-  0.10% )
 2,076,759,325,203      branches                  #  170.282 M/sec                    ( +-  0.12% )
     1,656,917,214      branch-misses             #    0.08% of all branches          ( +-  0.55% )

      91.369330729 seconds time elapsed                                          ( +-  0.45% )

On top of improved scalability, this also gets rid of the icky long long
types in the very heart of memcg, which is great for 32 bit and also makes
the code a lot more readable.

Notable differences between the old and new API:

- res_counter_charge() and res_counter_charge_nofail() become
  page_counter_try_charge() and page_counter_charge() resp. to match
  the more common kernel naming scheme of try_do()/do()

- res_counter_uncharge_until() is only ever used to cancel a local
  counter and never to uncharge bigger segments of a hierarchy, so
  it's replaced by the simpler page_counter_cancel()

- res_counter_set_limit() is replaced by page_counter_limit(), which
  expects its callers to serialize against themselves

- res_counter_memparse_write_strategy() is replaced by
  page_counter_limit(), which rounds down to the nearest page size -
  rather than up.  This is more reasonable for explicitely requested
  hard upper limits.

- to keep charging light-weight, page_counter_try_charge() charges
  speculatively, only to roll back if the result exceeds the limit.
  Because of this, a failing bigger charge can temporarily lock out
  smaller charges that would otherwise succeed.  The error is bounded
  to the difference between the smallest and the biggest possible
  charge size, so for memcg, this means that a failing THP charge can
  send base page charges into reclaim upto 2MB (4MB) before the limit
  would have been reached.  This should be acceptable.

[akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse]
[akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroups: Documentation: fix wrong cgroupfs paths</title>
<updated>2014-12-03T13:53:12Z</updated>
<author>
<name>SeongJae Park</name>
<email>sj38.park@gmail.com</email>
</author>
<published>2014-12-03T11:53:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=845502d26301f8c37e2a57ad867672f306e674fb'/>
<id>urn:sha1:845502d26301f8c37e2a57ad867672f306e674fb</id>
<content type='text'>
Few paths used as example to describe cgroupfs usage have been wrong
from f6e07d38078e ("Documentation: update cgroupfs mount point") by
mistake. This patch fix those trivial wrong paths.

Signed-off-by: SeongJae Park &lt;sj38.park@gmail.com&gt;
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags</title>
<updated>2014-09-25T02:16:06Z</updated>
<author>
<name>Zefan Li</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-09-25T01:41:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2ad654bc5e2b211e92f66da1d819e47d79a866f0'/>
<id>urn:sha1:2ad654bc5e2b211e92f66da1d819e47d79a866f0</id>
<content type='text'>
When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk-&gt;flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.

Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.

Here's the full report:
https://lkml.org/lkml/2014/9/19/230

To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.

v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.

Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time")
Cc: &lt;stable@vger.kernel.org&gt; # 2.6.31+
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm: memcontrol: rewrite uncharge API</title>
<updated>2014-08-08T22:57:17Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2014-08-08T21:19:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0a31bc97c80c3fa87b32c091d9a930ac19cd0c40'/>
<id>urn:sha1:0a31bc97c80c3fa87b32c091d9a930ac19cd0c40</id>
<content type='text'>
The memcg uncharging code that is involved towards the end of a page's
lifetime - truncation, reclaim, swapout, migration - is impressively
complicated and fragile.

Because anonymous and file pages were always charged before they had their
page-&gt;mapping established, uncharges had to happen when the page type
could still be known from the context; as in unmap for anonymous, page
cache removal for file and shmem pages, and swap cache truncation for swap
pages.  However, these operations happen well before the page is actually
freed, and so a lot of synchronization is necessary:

- Charging, uncharging, page migration, and charge migration all need
  to take a per-page bit spinlock as they could race with uncharging.

- Swap cache truncation happens during both swap-in and swap-out, and
  possibly repeatedly before the page is actually freed.  This means
  that the memcg swapout code is called from many contexts that make
  no sense and it has to figure out the direction from page state to
  make sure memory and memory+swap are always correctly charged.

- On page migration, the old page might be unmapped but then reused,
  so memcg code has to prevent untimely uncharging in that case.
  Because this code - which should be a simple charge transfer - is so
  special-cased, it is not reusable for replace_page_cache().

But now that charged pages always have a page-&gt;mapping, introduce
mem_cgroup_uncharge(), which is called after the final put_page(), when we
know for sure that nobody is looking at the page anymore.

For page migration, introduce mem_cgroup_migrate(), which is called after
the migration is successful and the new page is fully rmapped.  Because
the old page is no longer uncharged after migration, prevent double
charges by decoupling the page's memcg association (PCG_USED and
pc-&gt;mem_cgroup) from the page holding an actual charge.  The new bits
PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
to the new page during migration.

mem_cgroup_migrate() is suitable for replace_page_cache() as well,
which gets rid of mem_cgroup_replace_page_cache().  However, care
needs to be taken because both the source and the target page can
already be charged and on the LRU when fuse is splicing: grab the page
lock on the charge moving side to prevent changing pc-&gt;mem_cgroup of a
page under migration.  Also, the lruvecs of both pages change as we
uncharge the old and charge the new during migration, and putback may
race with us, so grab the lru lock and isolate the pages iff on LRU to
prevent races and ensure the pages are on the right lruvec afterward.

Swap accounting is massively simplified: because the page is no longer
uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
before the final put_page() in page reclaim.

Finally, page_cgroup changes are now protected by whatever protection the
page itself offers: anonymous pages are charged under the page table lock,
whereas page cache insertions, swapin, and migration hold the page lock.
Uncharging happens under full exclusion with no outstanding references.
Charging and uncharging also ensure that the page is off-LRU, which
serializes against charge migration.  Remove the very costly page_cgroup
lock and set pc-&gt;flags non-atomically.

[mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
[vdavydov@parallels.com: fix flags definition]
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Tested-by: Jet Chen &lt;jet.chen@intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Tested-by: Felipe Balbi &lt;balbi@ti.com&gt;
Signed-off-by: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
