<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/fork.c, branch v3.13</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.13</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.13'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-01-18T01:29:36Z</updated>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2014-01-18T01:29:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-01-18T01:29:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=48ba620aab90f4c7e9bb002e2f30863a4ea0f915'/>
<id>urn:sha1:48ba620aab90f4c7e9bb002e2f30863a4ea0f915</id>
<content type='text'>
Pull namespace fixes from Eric Biederman:
 "This is a set of 3 regression fixes.

  This fixes /proc/mounts when using "ip netns add &lt;netns&gt;" to display
  the actual mount point.

  This fixes a regression in clone that broke lxc-attach.

  This fixes a regression in the permission checks for mounting /proc
  that made proc unmountable if binfmt_misc was in use.  Oops.

  My apologies for sending this pull request so late.  Al Viro gave
  interesting review comments about the d_path fix that I wanted to
  address in detail before I sent this pull request.  Unfortunately a
  bad round of colds kept from addressing that in detail until today.
  The executive summary of the review was:

  Al: Is patching d_path really sufficient?
      The prepend_path, d_path, d_absolute_path, and __d_path family of
      functions is a really mess.

  Me: Yes, patching d_path is really sufficient.  Yes, the code is mess.
      No it is not appropriate to rewrite all of d_path for a regression
      that has existed for entirely too long already, when a two line
      change will do"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  vfs: Fix a regression in mounting proc
  fork:  Allow CLONE_PARENT after setns(CLONE_NEWPID)
  vfs: In d_path don't call d_dname on a mount point
</content>
</entry>
<entry>
<title>mm: fix TLB flush race between migration, and change_protection_range</title>
<updated>2013-12-19T03:04:51Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2013-12-19T01:08:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=20841405940e7be0617612d521e206e4b6b325db'/>
<id>urn:sha1:20841405940e7be0617612d521e206e4b6b325db</id>
<content type='text'>
There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.

During that time, a CPU may continue writing to the page.

This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.

When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this:

CPU A			CPU B			CPU C

						load TLB entry
make entry PTE/PMD_NUMA
			fault on entry
						read/write old page
			start migrating page
			change PTE/PMD to new page
						read/write old page [*]
flush TLB
						reload TLB from new entry
						read/write new page
						lose data

[*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.

This should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fork:  Allow CLONE_PARENT after setns(CLONE_NEWPID)</title>
<updated>2013-11-27T04:54:15Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2013-11-15T05:10:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1f7f4dde5c945f41a7abc2285be43d918029ecc5'/>
<id>urn:sha1:1f7f4dde5c945f41a7abc2285be43d918029ecc5</id>
<content type='text'>
Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt; writes:
&gt; Hi Oleg,
&gt;
&gt; commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
&gt; "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
&gt; breaks lxc-attach in 3.12.  That code forks a child which does
&gt; setns() and then does a clone(CLONE_PARENT).  That way the
&gt; grandchild can be in the right namespaces (which the child was
&gt; not) and be a child of the original task, which is the monitor.
&gt;
&gt; lxc-attach in 3.11 was working fine with no side effects that I
&gt; could see.  Is there a real danger in allowing CLONE_PARENT
&gt; when current-&gt;nsproxy-&gt;pidns_for_children is not our pidns,
&gt; or was this done out of an "over-abundance of caution"?  Can we
&gt; safely revert that new extra check?

The two fundamental things I know we can not allow are:
- A shared signal queue aka CLONE_THREAD.  Because we compute the pid
  and uid of the signal when we place it in the queue.

- Changing the pid and by extention pid_namespace of an existing
  process.

From a parents perspective there is nothing special about the pid
namespace, to deny CLONE_PARENT, because the parent simply won't know or
care.

From the childs perspective all that is special really are shared signal
queues.

User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks
in different pid namespaces is almost certainly going to break because
it is complicated.  But shared signal handlers can look at per thread
information to know which pid namespace a process is in, so I don't know
of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads
at the kernel level.  It would be absolutely stupid to implement but
that is a different thing.

So hmm.

Because it can do no harm, and because it is a regression let's remove
the CLONE_PARENT check and send it stable.

Cc: stable@vger.kernel.org
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Acked-by: Serge E. Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>mm: implement split page table lock for PMD level</title>
<updated>2013-11-15T00:32:15Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2013-11-14T22:31:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e009bb30c8df8a52a9622b616b67436b6a03a0cd'/>
<id>urn:sha1:e009bb30c8df8a52a9622b616b67436b6a03a0cd</id>
<content type='text'>
The basic idea is the same as with PTE level: the lock is embedded into
struct page of table's page.

We can't use mm-&gt;pmd_huge_pte to store pgtables for THP, since we don't
take mm-&gt;page_table_lock anymore.  Let's reuse page-&gt;lru of table's page
for that.

pgtable_pmd_page_ctor() returns true, if initialization is successful
and false otherwise.  Current implementation never fails, but assumption
that constructor can fail will help to port it to -rt where spinlock_t
is rather huge and cannot be embedded into struct page -- dynamic
allocation is required.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: "Paul E . McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Robin Holt &lt;robinmholt@gmail.com&gt;
Cc: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: convert mm-&gt;nr_ptes to atomic_long_t</title>
<updated>2013-11-15T00:32:14Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2013-11-14T22:30:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e1f56c89b040134add93f686931cc266541d239a'/>
<id>urn:sha1:e1f56c89b040134add93f686931cc266541d239a</id>
<content type='text'>
With split page table lock for PMD level we can't hold mm-&gt;page_table_lock
while updating nr_ptes.

Let's convert it to atomic_long_t to avoid races.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Tested-by: Alex Thorlton &lt;athorlton@sgi.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: "Paul E . McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Robin Holt &lt;robinmholt@gmail.com&gt;
Cc: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-11-12T01:20:12Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-11-12T01:20:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=39cf275a1a18ba3c7eb9b986c5c9b35b57332798'/>
<id>urn:sha1:39cf275a1a18ba3c7eb9b986c5c9b35b57332798</id>
<content type='text'>
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle are:

   - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
     van Riel, Peter Zijlstra et al.  Yay!

   - optimize preemption counter handling: merge the NEED_RESCHED flag
     into the preempt_count variable, by Peter Zijlstra.

   - wait.h fixes and code reorganization from Peter Zijlstra

   - cfs_bandwidth fixes from Ben Segall

   - SMP load-balancer cleanups from Peter Zijstra

   - idle balancer improvements from Jason Low

   - other fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
  stop_machine: Fix race between stop_two_cpus() and stop_cpus()
  sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
  sched: Fix asymmetric scheduling for POWER7
  sched: Move completion code from core.c to completion.c
  sched: Move wait code from core.c to wait.c
  sched: Move wait.c into kernel/sched/
  sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
  sched: Avoid throttle_cfs_rq() racing with period_timer stopping
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq-&gt;lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  sched: Remove extra put_online_cpus() inside sched_setaffinity()
  sched/rt: Fix task_tick_rt() comment
  sched/wait: Fix build breakage
  sched/wait: Introduce prepare_to_wait_event()
  sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
  sched: Remove get_online_cpus() usage
  sched: Fix race in migrate_swap_stop()
  ...
</content>
</entry>
<entry>
<title>uprobes: Teach uprobe_copy_process() to handle CLONE_VFORK</title>
<updated>2013-10-29T17:02:55Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-10-16T17:39:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3ab679661721b1ec2aaad99a801870ed59ab1110'/>
<id>urn:sha1:3ab679661721b1ec2aaad99a801870ed59ab1110</id>
<content type='text'>
uprobe_copy_process() does nothing if the child shares -&gt;mm with
the forking process, but there is a special case: CLONE_VFORK.
In this case it would be more correct to do dup_utask() but avoid
dup_xol(). This is not that important, the child should not unwind
its stack too much, this can corrupt the parent's stack, but at
least we need this to allow to ret-probe __vfork() itself.

Note: in theory, it would be better to check task_pt_regs(p)-&gt;sp
instead of CLONE_VFORK, we need to dup_utask() if and only if the
child can return from the function called by the parent. But this
needs the arch-dependant helper, and I think that nobody actually
does clone(same_stack, CLONE_VM).

Reported-by: Martin Cermak &lt;mcermak@redhat.com&gt;
Reported-by: David Smith &lt;dsmith@redhat.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
</content>
</entry>
<entry>
<title>uprobes: Change the callsite of uprobe_copy_process()</title>
<updated>2013-10-29T17:02:48Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-10-13T19:18:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b68e0749100e1b901bf11330f149b321c082178e'/>
<id>urn:sha1:b68e0749100e1b901bf11330f149b321c082178e</id>
<content type='text'>
Preparation for the next patches.

Move the callsite of uprobe_copy_process() in copy_process() down
to the succesfull return. We do not care if copy_process() fails,
uprobe_free_utask() won't be called in this case so the wrong
-&gt;utask != NULL doesn't matter.

OTOH, with this change we know that copy_process() can't fail when
uprobe_copy_process() is called, the new task should either return
to user-mode or call do_exit(). This way uprobe_copy_process() can:

	1. setup p-&gt;utask != NULL if necessary

	2. setup uprobes_state.xol_area

	3. use task_work_add(p)

Also, move the definition of uprobe_copy_process() down so that it
can see get_utask().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>sched/numa: Stay on the same node if CLONE_VM</title>
<updated>2013-10-09T12:47:57Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2013-10-07T10:29:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5e1576ed0e54d419286a8096133029062b6ad456'/>
<id>urn:sha1:5e1576ed0e54d419286a8096133029062b6ad456</id>
<content type='text'>
A newly spawned thread inside a process should stay on the same
NUMA node as its parent. This prevents processes from being "torn"
across multiple NUMA nodes every time they spawn a new thread.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1381141781-10992-49-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node"</title>
<updated>2013-10-09T10:40:17Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2013-10-07T10:28:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b726b7dfb400c937546fa91cf8523dcb1aa2fc6e'/>
<id>urn:sha1:b726b7dfb400c937546fa91cf8523dcb1aa2fc6e</id>
<content type='text'>
PTE scanning and NUMA hinting fault handling is expensive so commit
5bca2303 ("mm: sched: numa: Delay PTE scanning until a task is scheduled
on a new node") deferred the PTE scan until a task had been scheduled on
another node. The problem is that in the purely shared memory case that
this may never happen and no NUMA hinting fault information will be
captured. We are not ruling out the possibility that something better
can be done here but for now, this patch needs to be reverted and depend
entirely on the scan_delay to avoid punishing short-lived processes.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1381141781-10992-16-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
