<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/fork.c, branch v4.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-02-13T02:54:10Z</updated>
<entry>
<title>mm: do not use mm-&gt;nr_pmds on !MMU configurations</title>
<updated>2015-02-13T02:54:10Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-02-12T22:59:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2d2f5119b8bb057595e18f5b2f07aa097ea1b233'/>
<id>urn:sha1:2d2f5119b8bb057595e18f5b2f07aa097ea1b233</id>
<content type='text'>
mm-&gt;nr_pmds doesn't make sense on !MMU configurations

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix false-positive warning on exit due mm_nr_pmds(mm)</title>
<updated>2015-02-12T01:06:04Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-02-11T23:26:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b30fe6c7ced70f62862c3d09357e7e8084e98d9f'/>
<id>urn:sha1:b30fe6c7ced70f62862c3d09357e7e8084e98d9f</id>
<content type='text'>
The problem is that we check nr_ptes/nr_pmds in exit_mmap() which happens
*before* pgd_free().  And if an arch does pte/pmd allocation in
pgd_alloc() and frees them in pgd_free() we see offset in counters by the
time of the checks.

We tried to workaround this by offsetting expected counter value according
to FIRST_USER_ADDRESS for both nr_pte and nr_pmd in exit_mmap().  But it
doesn't work in some cases:

1. ARM with LPAE enabled also has non-zero USER_PGTABLES_CEILING, but
   upper addresses occupied with huge pmd entries, so the trick with
   offsetting expected counter value will get really ugly: we will have
   to apply it nr_pmds, but not nr_ptes.

2. Metag has non-zero FIRST_USER_ADDRESS, but doesn't do allocation
   pte/pmd page tables allocation in pgd_alloc(), just setup a pgd entry
   which is allocated at boot and shared accross all processes.

The proposal is to move the check to check_mm() which happens *after*
pgd_free() and do proper accounting during pgd_alloc() and pgd_free()
which would bring counters to zero if nothing leaked.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-by: Tyler Baker &lt;tyler.baker@linaro.org&gt;
Tested-by: Tyler Baker &lt;tyler.baker@linaro.org&gt;
Tested-by: Nishanth Menon &lt;nm@ti.com&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: Guan Xuetao &lt;gxt@mprc.pku.edu.cn&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: account pmd page tables to the process</title>
<updated>2015-02-12T01:06:04Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-02-11T23:26:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dc6c9a35b66b520cf67e05d8ca60ebecad3b0479'/>
<id>urn:sha1:dc6c9a35b66b520cf67e05d8ca60ebecad3b0479</id>
<content type='text'>
Dave noticed that unprivileged process can allocate significant amount of
memory -- &gt;500 MiB on x86_64 -- and stay unnoticed by oom-killer and
memory cgroup.  The trick is to allocate a lot of PMD page tables.  Linux
kernel doesn't account PMD tables to the process, only PTE.

The use-cases below use few tricks to allocate a lot of PMD page tables
while keeping VmRSS and VmPTE low.  oom_score for the process will be 0.

	#include &lt;errno.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;stdlib.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;sys/mman.h&gt;
	#include &lt;sys/prctl.h&gt;

	#define PUD_SIZE (1UL &lt;&lt; 30)
	#define PMD_SIZE (1UL &lt;&lt; 21)

	#define NR_PUD 130000

	int main(void)
	{
		char *addr = NULL;
		unsigned long i;

		prctl(PR_SET_THP_DISABLE);
		for (i = 0; i &lt; NR_PUD ; i++) {
			addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
			if (addr == MAP_FAILED) {
				perror("mmap");
				break;
			}
			*addr = 'x';
			munmap(addr, PMD_SIZE);
			mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
			if (addr == MAP_FAILED)
				perror("re-mmap"), exit(1);
		}
		printf("PID %d consumed %lu KiB in PMD page tables\n",
				getpid(), i * 4096 &gt;&gt; 10);
		return pause();
	}

The patch addresses the issue by account PMD tables to the process the
same way we account PTE.

The main place where PMD tables is accounted is __pmd_alloc() and
free_pmd_range(). But there're few corner cases:

 - HugeTLB can share PMD page tables. The patch handles by accounting
   the table to all processes who share it.

 - x86 PAE pre-allocates few PMD tables on fork.

 - Architectures with FIRST_USER_ADDRESS &gt; 0. We need to adjust sanity
   check on exit(2).

Accounting only happens on configuration where PMD page table's level is
present (PMD is not folded).  As with nr_ptes we use per-mm counter.  The
counter value is used to calculate baseline for badness score by
oom-killer.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Tested-by: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>rmap: drop support of non-linear mappings</title>
<updated>2015-02-10T22:30:31Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-02-10T22:09:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=27ba0644ea9dfe6e7693abc85837b60e40583b96'/>
<id>urn:sha1:27ba0644ea9dfe6e7693abc85837b60e40583b96</id>
<content type='text'>
We don't create non-linear mappings anymore.  Let's drop code which
handles them in rmap.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use new helper functions around the i_mmap_mutex</title>
<updated>2014-12-13T20:42:45Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2014-12-13T00:54:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=83cde9e8ba95d180eaefefe834958fbf7008cf39'/>
<id>urn:sha1:83cde9e8ba95d180eaefefe834958fbf7008cf39</id>
<content type='text'>
Convert all open coded mutex_lock/unlock calls to the
i_mmap_[lock/unlock]_write() helpers.

Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>signal: Document the RCU protection of -&gt;sighand</title>
<updated>2014-10-29T17:07:18Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-09-28T21:44:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=392809b25833548ccfc55e61b76c8451a5073216'/>
<id>urn:sha1:392809b25833548ccfc55e61b76c8451a5073216</id>
<content type='text'>
__cleanup_sighand() frees sighand without RCU grace period. This is
correct but this looks "obviously buggy" and constantly confuses the
readers, add the comments to explain how this works.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Pranith Kumar &lt;bobby.prani@gmail.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-10-13T14:23:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-10-13T14:23:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=faafcba3b5e15999cf75d5c5a513ac8e47e2545f'/>
<id>urn:sha1:faafcba3b5e15999cf75d5c5a513ac8e47e2545f</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
     Hansen)

   - Various sched/idle refinements for better idle handling (Nicolas
     Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

   - sched/numa updates and optimizations (Rik van Riel)

   - sysbench speedup (Vincent Guittot)

   - capacity calculation cleanups/refactoring (Vincent Guittot)

   - Various cleanups to thread group iteration (Oleg Nesterov)

   - Double-rq-lock removal optimization and various refactorings
     (Kirill Tkhai)

   - various sched/deadline fixes

  ... and lots of other changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
  sched/fair: Delete resched_cpu() from idle_balance()
  sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
  sched: Improve sysbench performance by fixing spurious active migration
  sched/x86: Fix up typo in topology detection
  x86, sched: Add new topology for multi-NUMA-node CPUs
  sched/rt: Use resched_curr() in task_tick_rt()
  sched: Use rq-&gt;rd in sched_setaffinity() under RCU read lock
  sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
  sched: Use dl_bw_of() under RCU read lock
  sched/fair: Remove duplicate code from can_migrate_task()
  sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
  sched: print_rq(): Don't use tasklist_lock
  sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
  sched: Fix the task-group check in tg_has_rt_tasks()
  sched/fair: Leverage the idle state info when choosing the "idlest" cpu
  sched: Let the scheduler see CPU idle states
  sched/deadline: Fix inter- exclusive cpusets migrations
  sched/deadline: Clear dl_entity params when setscheduling to different class
  sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
  ...
</content>
</entry>
<entry>
<title>mm: use VM_BUG_ON_MM where possible</title>
<updated>2014-10-10T02:25:58Z</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2014-10-09T22:28:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=96dad67ff244e797c4bc3e4f7f0fdaa0cfdf0a7d'/>
<id>urn:sha1:96dad67ff244e797c4bc3e4f7f0fdaa0cfdf0a7d</id>
<content type='text'>
Dump the contents of the relevant struct_mm when we hit the bug condition.

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>perf: fix perf bug in fork()</title>
<updated>2014-10-02T23:28:44Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-10-02T23:17:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6c72e3501d0d62fc064d3680e5234f3463ec5a86'/>
<id>urn:sha1:6c72e3501d0d62fc064d3680e5234f3463ec5a86</id>
<content type='text'>
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
calling perf_event_free_task() when failing sched_fork() we will not yet
have done the memset() on -&gt;perf_event_ctxp[] and will therefore try and
'free' the inherited contexts, which are still in use by the parent
process.  This is bad..

Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Sylvain 'ythier' Hitier &lt;sylvain.hitier@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>init/main.c: Give init_task a canary</title>
<updated>2014-09-19T10:35:22Z</updated>
<author>
<name>Aaron Tomlin</name>
<email>atomlin@redhat.com</email>
</author>
<published>2014-09-12T13:16:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d4311ff1a8da48d609db9500f121c15580dfeeb7'/>
<id>urn:sha1:d4311ff1a8da48d609db9500f121c15580dfeeb7</id>
<content type='text'>
Tasks get their end of stack set to STACK_END_MAGIC with the
aim to catch stack overruns. Currently this feature does not
apply to init_task. This patch removes this restriction.

Note that a similar patch was posted by Prarit Bhargava
some time ago but was never merged:

  http://marc.info/?l=linux-kernel&amp;m=127144305403241&amp;w=2

Signed-off-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: aneesh.kumar@linux.vnet.ibm.com
Cc: dzickus@redhat.com
Cc: bmr@redhat.com
Cc: jcastillo@redhat.com
Cc: jgh@redhat.com
Cc: minchan@kernel.org
Cc: tglx@linutronix.de
Cc: hannes@cmpxchg.org
Cc: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Daeseok Youn &lt;daeseok.youn@gmail.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Masami Hiramatsu &lt;masami.hiramatsu.pt@hitachi.com&gt;
Cc: Michael Opdenacker &lt;michael.opdenacker@free-electrons.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Cc: Seiji Aguchi &lt;seiji.aguchi@hds.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Vladimir Davydov &lt;vdavydov@parallels.com&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
