<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sys.c, branch v4.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-02-28T17:57:51Z</updated>
<entry>
<title>kernel/sys.c: fix UNAME26 for 4.0</title>
<updated>2015-02-28T17:57:51Z</updated>
<author>
<name>Jon DeVree</name>
<email>nuxi@vault24.org</email>
</author>
<published>2015-02-27T23:52:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=39afb5ee4640b4ed2cdd9e12b2a67cf785cfced8'/>
<id>urn:sha1:39afb5ee4640b4ed2cdd9e12b2a67cf785cfced8</id>
<content type='text'>
There's a uname workaround for broken userspace which can't handle kernel
versions of 3.x.  Update it for 4.x.
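
A rough userspace sketch of the version mapping follows. It assumes that UNAME26 reports 3.x as 2.6.(40 + x) and that this fix makes 4.x report 2.6.(60 + x). Under that mapping 4.0 (2.6.60) sorts after 3.19 (2.6.59), where it would presumably otherwise collide with 3.0 (2.6.40). The helper names are hypothetical. The kernel's own code also appends the rest of the release string (for example an -rc suffix), which the sketch leaves out.

```c
#include "stdio.h"

/* Hypothetical helper: the 2.6.x sublevel reported under UNAME26 for a
 * kernel major.minor, assuming 3.x -> 2.6.(40 + x) and 4.x -> 2.6.(60 + x).
 * Returns -1 for majors this sketch does not cover. */
static int uname26_sublevel(unsigned int major, unsigned int minor)
{
	switch (major) {
	case 3:
		return 40 + (int)minor;
	case 4:
		return 60 + (int)minor;
	default:
		return -1;
	}
}

/* Hypothetical helper: format the fake release string into buf.
 * Returns the snprintf() result, or -1 if the version is not covered. */
static int uname26_release(char *buf, size_t len, unsigned int major,
			   unsigned int minor)
{
	int sub = uname26_sublevel(major, minor);

	if (sub == -1)
		return -1;
	return snprintf(buf, len, "2.6.%d", sub);
}
```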

Signed-off-by: Jon DeVree &lt;nuxi@vault24.org&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus</title>
<updated>2015-02-22T03:41:38Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-02-22T03:41:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a135c717d5cdb311cff7661af4c17fef0562e590'/>
<id>urn:sha1:a135c717d5cdb311cff7661af4c17fef0562e590</id>
<content type='text'>
Pull MIPS updates from Ralf Baechle:
 "This is the main pull request for MIPS:

   - a number of fixes that didn't make the 3.19 release.

   - a number of cleanups.

   - preliminary support for Cavium's Octeon 3 SoCs which feature up to
     48 MIPS64 R3 cores with FPU and hardware virtualization.

   - support for MIPS R6 processors.

     Revision 6 of the MIPS architecture is a major revision of the MIPS
     architecture which does away with many of the original sins of the
     architecture such as branch delay slots.  This and other changes in
     R6 require major changes throughout the entire MIPS core
     architecture code and make up the lion's share of this pull
     request.

   - finally some preparatory work for eXtended Physical Address
     support, which allows support of up to 40 bits of physical address
     space on 32-bit processors"

     [ Ahh, MIPS can't leave the PAE brain damage alone.  It's like
       every CPU architect has to make that mistake, but pee in the snow
       by changing the TLA.  But whether it's called PAE, LPAE or XPA,
       it's horrid crud   - Linus ]

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (114 commits)
  MIPS: sead3: Corrected get_c0_perfcount_int
  MIPS: mm: Remove dead macro definitions
  MIPS: OCTEON: irq: add CIB and other fixes
  MIPS: OCTEON: Don't do acknowledge operations for level triggered irqs.
  MIPS: OCTEON: More OCTEONIII support
  MIPS: OCTEON: Remove setting of processor specific CVMCTL icache bits.
  MIPS: OCTEON: Core-15169 Workaround and general CVMSEG cleanup.
  MIPS: OCTEON: Update octeon-model.h code for new SoCs.
  MIPS: OCTEON: Implement DCache errata workaround for all CN6XXX
  MIPS: OCTEON: Add little-endian support to asm/octeon/octeon.h
  MIPS: OCTEON: Implement the core-16057 workaround
  MIPS: OCTEON: Delete unused COP2 saving code
  MIPS: OCTEON: Use correct instruction to read 64-bit COP0 register
  MIPS: OCTEON: Save and restore CP2 SHA3 state
  MIPS: OCTEON: Fix FP context save.
  MIPS: OCTEON: Save/Restore wider multiply registers in OCTEON III CPUs
  MIPS: boot: Provide more uImage options
  MIPS: Remove unneeded #ifdef __KERNEL__ from asm/processor.h
  MIPS: ip22-gio: Remove legacy suspend/resume support
  mips: pci: Add ifdef around pci_proc_domain
  ...
</content>
</entry>
<entry>
<title>MIPS,prctl: add PR_[GS]ET_FP_MODE prctl options for MIPS</title>
<updated>2015-02-12T11:30:29Z</updated>
<author>
<name>Paul Burton</name>
<email>paul.burton@imgtec.com</email>
</author>
<published>2015-01-08T12:17:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9791554b45a2acc28247f66a5fd5bbc212a6b8c8'/>
<id>urn:sha1:9791554b45a2acc28247f66a5fd5bbc212a6b8c8</id>
<content type='text'>
Userland code may be built using an ABI which permits linking to objects
that have more restrictive floating point requirements. For example,
userland code may be built to target the O32 FPXX ABI. Such code may be
linked with other FPXX code, or with code built for either of the more
restrictive FP32 or FP64 ABIs. When linking with more restrictive code, the
overall requirement of the process becomes that of the more restrictive
code. The kernel has no way to know in advance which mode the process
will need to be executed in, and indeed it may need to change during
execution. The dynamic loader is the only code which will know the
overall required mode, and so it needs to have a means to instruct the
kernel to switch the FP mode of the process.

This patch introduces 2 new options to the prctl syscall which provide
such a capability. The FP mode of the process is represented as a
simple bitmask combining a number of mode bits mirroring those present
in the hardware. Userland can either retrieve the current FP mode of
the process:

  mode = prctl(PR_GET_FP_MODE);

or modify the current FP mode of the process:

  err = prctl(PR_SET_FP_MODE, new_mode);
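
The code below sketches the decision the dynamic loader has to make. It combines per-object FP ABI requirements and turns the result into a PR_SET_FP_MODE request. The enum and helper names are hypothetical. The sketch simply treats an FP32/FP64 mix as a conflict. It assumes the PR_* constants come from sys/prctl.h, with fallbacks to the values in linux/prctl.h (45, 46, and an FR bit of 1). On kernels without this patch, or on non-MIPS kernels, prctl() fails with EINVAL.

```c
#include "sys/prctl.h"

/* Fallbacks for older headers: values as defined in linux/prctl.h. */
#ifndef PR_SET_FP_MODE
#define PR_SET_FP_MODE 45
#define PR_GET_FP_MODE 46
#endif
#ifndef PR_FP_MODE_FR
#define PR_FP_MODE_FR 1		/* 64-bit FP registers */
#endif

/* Hypothetical names for the O32 FP ABI variants described above. */
enum fp_abi { FP_ABI_FP32, FP_ABI_FPXX, FP_ABI_FP64 };

/* Combine two objects' requirements: FPXX defers to the other side,
 * identical requirements agree, and an FP32/FP64 mix is treated as a
 * conflict (-1) in this simplified sketch. */
static int combine_fp_abi(enum fp_abi a, enum fp_abi b)
{
	if (a == FP_ABI_FPXX)
		return b;
	if (b == FP_ABI_FPXX || a == b)
		return a;
	return -1;
}

/* Map a combined requirement to a mode bitmask: FP64 needs FR=1,
 * FP32 needs FR=0, and FPXX runs in either, so this keeps FR=0 for it. */
static int fp_mode_for_abi(enum fp_abi abi)
{
	return abi == FP_ABI_FP64 ? PR_FP_MODE_FR : 0;
}

/* Ask the kernel to switch only if the current mode differs.
 * Returns 0 on success, -1 (errno set) on failure. */
int switch_fp_mode(enum fp_abi abi)
{
	int want = fp_mode_for_abi(abi);
	int have = prctl(PR_GET_FP_MODE, 0, 0, 0, 0);

	if (have == -1)
		return -1;
	if (have == want)
		return 0;
	return prctl(PR_SET_FP_MODE, want, 0, 0, 0);
}
```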

Signed-off-by: Paul Burton &lt;paul.burton@imgtec.com&gt;
Cc: Matthew Fortune &lt;matthew.fortune@imgtec.com&gt;
Cc: Markos Chandras &lt;markos.chandras@imgtec.com&gt;
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8899/
Signed-off-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
</content>
</entry>
<entry>
<title>x86, mpx: Strictly enforce empty prctl() args</title>
<updated>2015-01-22T20:11:06Z</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@linux.intel.com</email>
</author>
<published>2015-01-08T22:30:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e9d1b4f3c60997fe197bf0243cb4a41a44387a88'/>
<id>urn:sha1:e9d1b4f3c60997fe197bf0243cb4a41a44387a88</id>
<content type='text'>
Description from Michael Kerrisk.  He suggested an identical patch
to one I had already coded up and tested.

commit fe3d197f8431 "x86, mpx: On-demand kernel allocation of bounds
tables" added two new prctl() operations, PR_MPX_ENABLE_MANAGEMENT and
PR_MPX_DISABLE_MANAGEMENT.  However, no checks were included to ensure
that unused arguments are zero, as is done in many existing prctl()s
and as should be done for all new prctl()s. This patch adds the
required checks.
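
Here is a userspace model of the check being added, as a sketch. model_prctl() is a hypothetical stand-in for the kernel's prctl() dispatch, and the option numbers are assumed to match linux/prctl.h (43 and 44). Refusing nonzero unused arguments lets a later kernel assign them a meaning without changing behavior for callers that already pass zeros.

```c
#include "errno.h"

#ifndef PR_MPX_ENABLE_MANAGEMENT
#define PR_MPX_ENABLE_MANAGEMENT  43
#define PR_MPX_DISABLE_MANAGEMENT 44
#endif

/* Hypothetical model of the prctl() dispatch for the two MPX options:
 * every argument the option does not use must be zero. */
static long model_prctl(int option, unsigned long arg2, unsigned long arg3,
			unsigned long arg4, unsigned long arg5)
{
	switch (option) {
	case PR_MPX_ENABLE_MANAGEMENT:
	case PR_MPX_DISABLE_MANAGEMENT:
		if (arg2 || arg3 || arg4 || arg5)
			return -EINVAL;
		return 0;	/* the real handler would act here */
	default:
		return -EINVAL;
	}
}
```

From userspace, the only accepted form is therefore prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0).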

Suggested-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Suggested-by: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Link: http://lkml.kernel.org/r/20150108223022.7F56FD13@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>x86, mpx: On-demand kernel allocation of bounds tables</title>
<updated>2014-11-17T23:58:53Z</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@linux.intel.com</email>
</author>
<published>2014-11-14T15:18:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fe3d197f84319d3bce379a9c0dc17b1f48ad358c'/>
<id>urn:sha1:fe3d197f84319d3bce379a9c0dc17b1f48ad358c</id>
<content type='text'>
This is really the meat of the MPX patch set.  If there is one patch to
review in the entire series, this is the one.  There is a new ABI here
and this kernel code also interacts with userspace memory in a
relatively unusual manner.  (small FAQ below).

Long Description:

This patch adds two prctl() commands to enable or disable the kernel's
management of bounds tables, including on-demand kernel
allocation (see the patch "on-demand kernel allocation of bounds tables")
and cleanup (see the patch "cleanup unused bound tables"). Applications
do not strictly need the kernel to manage bounds tables and we expect
some applications to use MPX without taking advantage of this kernel
support. This means the kernel can not simply infer whether an application
needs bounds table management from the MPX registers.  The prctl() is an
explicit signal from userspace.

PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
request the kernel's help in managing bounds tables.

PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace doesn't
want the kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
won't allocate and free bounds tables even if the CPU supports MPX.

PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
directory out of a userspace register (bndcfgu) and then cache it into
a new field (-&gt;bd_addr) in the 'mm_struct'.  PR_MPX_DISABLE_MANAGEMENT
will set "bd_addr" to an invalid address.  Using this scheme, we can
use "bd_addr" to determine whether the management of bounds tables in
kernel is enabled.

Also, the only way to access that bndcfgu register is via an xsaves,
which can be expensive.  Caching "bd_addr" like this also helps reduce
the cost of those xsaves when doing table cleanup at munmap() time.
Unfortunately, we can not apply this optimization to #BR fault time
because we need an xsave to get the value of BNDSTATUS.
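
The bd_addr caching scheme can be modelled in userspace roughly as follows. struct model_mm, the model_* functions and the all-ones "invalid" marker are hypothetical stand-ins for the kernel's types. The sketch assumes the BNDCFGU layout described in the Intel SDM: bit 0 is the enable flag, the low 12 bits hold configuration, and the remaining bits hold the page-aligned bounds-directory base.

```c
#include "stdint.h"

#define BD_ADDR_INVALID UINT64_MAX	/* hypothetical "not managed" marker */

struct model_mm {			/* hypothetical stand-in for mm_struct */
	uint64_t bd_addr;
};

/* The cached value alone answers "is the kernel managing tables?",
 * e.g. at munmap() time, without another xsave. */
static int kernel_manages_tables(const struct model_mm *mm)
{
	return mm->bd_addr != BD_ADDR_INVALID;
}

/* PR_MPX_ENABLE_MANAGEMENT: cache the directory base from BNDCFGU,
 * which the kernel can only read via an xsave. */
static int model_enable(struct model_mm *mm, uint64_t bndcfgu)
{
	if (bndcfgu % 2 == 0) {		/* enable bit (bit 0) clear */
		mm->bd_addr = BD_ADDR_INVALID;
		return -1;
	}
	/* Clear the 12 configuration bits, leaving the directory base. */
	mm->bd_addr = bndcfgu - bndcfgu % 4096;
	return 0;
}

/* PR_MPX_DISABLE_MANAGEMENT: store the invalid marker. */
static void model_disable(struct model_mm *mm)
{
	mm->bd_addr = BD_ADDR_INVALID;
}
```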

==== Why does the hardware even have these Bounds Tables? ====

MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs to
spill them somewhere. It has two special instructions for this
which allow the bounds to be moved between the bounds registers
and some new "bounds tables".

#BR exceptions are conceptually similar to page faults and are raised by
the MPX hardware both on bounds violations and when the tables
are not present. This patch handles those #BR exceptions for
not-present tables by carving the space out of the process's normal
address space (essentially calling the new mmap() interface introduced
earlier in this patch set) and then pointing the bounds directory
over to it.

The tables *need* to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register pointing to
memory is dereferenced. Any direct kernel involvement (like a syscall)
to access the tables would obviously destroy performance.

==== Why not do this in userspace? ====

This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace. Here are
a few ways this *could* be done. I don't think any of them are
practical in the real-world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so
   that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual
   area needs 4*X GB of virtual space, plus 2GB for the bounds
   directory. If we were to preallocate them for the 128TB of
   user virtual address space, we would need to reserve 512TB+2GB,
   which is larger than the entire virtual address space today.
   This means they can not be reserved ahead of time. Also, a
   single process's pre-populated bounds directory consumes 2GB
   of virtual *AND* physical memory. IOW, it's completely
   infeasible to prepopulate bounds directories.

Q: Can we preallocate bounds table space at the same time memory
   is allocated which might contain pointers that might eventually
   need bounds tables?
A: This would work if we could hook the site of each and every
   memory allocation syscall. This can be done for small,
   constrained applications. But, it isn't practical at a larger
   scale since a given app has no way of controlling how all the
   parts of the app might allocate memory (think libraries). The
   kernel is really the only place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables
   allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of async-signal-safe
   functions, and even if mmap() worked it would still
   require locking or nasty tricks to keep track of the
   allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.
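
The sizing argument in the first answer above can be checked with a few lines of arithmetic. This sketch assumes 64-bit mode, where each 8-byte pointer slot is covered by a 32-byte bounds-table entry (hence the factor of 4) and the bounds directory is 2GB. It treats the quoted TB and GB as binary units. The helper name is hypothetical.

```c
#include "stdint.h"

#define GiB (1024ULL * 1024 * 1024)
#define TiB (1024ULL * GiB)

#define POINTER_BYTES   8ULL	/* one pointer slot in 64-bit mode */
#define BT_ENTRY_BYTES 32ULL	/* bounds-table entry per pointer slot */
#define BD_BYTES (2 * GiB)	/* bounds directory */

/* Virtual space needed to pre-reserve bounds tables covering va_bytes
 * of address space, plus the fixed-size bounds directory. */
static uint64_t mpx_reserve_bytes(uint64_t va_bytes)
{
	return va_bytes / POINTER_BYTES * BT_ENTRY_BYTES + BD_BYTES;
}
```

For the 128TB of user address space this gives 512TB plus 2GB, twice the 256TB that a 48-bit virtual address space can hold at all.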

Based-on-patch-by: Qiaowei Ren &lt;qiaowei.ren@intel.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen &lt;dave@sr71.net&gt;
Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-10-13T14:23:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-10-13T14:23:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=faafcba3b5e15999cf75d5c5a513ac8e47e2545f'/>
<id>urn:sha1:faafcba3b5e15999cf75d5c5a513ac8e47e2545f</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
     Hansen)

   - Various sched/idle refinements for better idle handling (Nicolas
     Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

   - sched/numa updates and optimizations (Rik van Riel)

   - sysbench speedup (Vincent Guittot)

   - capacity calculation cleanups/refactoring (Vincent Guittot)

   - Various cleanups to thread group iteration (Oleg Nesterov)

   - Double-rq-lock removal optimization and various refactorings
     (Kirill Tkhai)

   - various sched/deadline fixes

  ... and lots of other changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
  sched/fair: Delete resched_cpu() from idle_balance()
  sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
  sched: Improve sysbench performance by fixing spurious active migration
  sched/x86: Fix up typo in topology detection
  x86, sched: Add new topology for multi-NUMA-node CPUs
  sched/rt: Use resched_curr() in task_tick_rt()
  sched: Use rq-&gt;rd in sched_setaffinity() under RCU read lock
  sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
  sched: Use dl_bw_of() under RCU read lock
  sched/fair: Remove duplicate code from can_migrate_task()
  sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
  sched: print_rq(): Don't use tasklist_lock
  sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
  sched: Fix the task-group check in tg_has_rt_tasks()
  sched/fair: Leverage the idle state info when choosing the "idlest" cpu
  sched: Let the scheduler see CPU idle states
  sched/deadline: Fix inter- exclusive cpusets migrations
  sched/deadline: Clear dl_entity params when setscheduling to different class
  sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
  ...
</content>
</entry>
<entry>
<title>kernel/sys.c: compat sysinfo syscall: fix undefined behavior</title>
<updated>2014-10-10T02:26:04Z</updated>
<author>
<name>Scotty Bauer</name>
<email>sbauer@eng.utah.edu</email>
</author>
<published>2014-10-09T22:30:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0baae41ea8365a7b5a34c6474a77d7eb1126f6b2'/>
<id>urn:sha1:0baae41ea8365a7b5a34c6474a77d7eb1126f6b2</id>
<content type='text'>
Fix undefined behavior and a compiler warning by replacing a right shift
by 32 with the upper_32_bits() macro.
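
A small illustration of the problem and the fix. When unsigned long is only 32 bits wide, shifting it right by 32 is undefined behavior in C, and compilers warn about it. The macro below has the same shape as the kernel's upper_32_bits(). It splits the shift into two 16-bit steps so the result is well defined for both widths. The helper name is hypothetical.

```c
#include "stdint.h"

/* Same shape as the kernel's upper_32_bits(): two 16-bit shifts, each
 * narrower than any type this is applied to, so the result is well
 * defined (and 0) even when n is only 32 bits wide. */
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))

/* Hypothetical helper mirroring the compat check: can this memory
 * count be stored in a 32-bit field without scaling? */
static int fits_in_32_bits(unsigned long value)
{
	return upper_32_bits(value) == 0;
}
```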

Signed-off-by: Scotty Bauer &lt;sbauer@eng.utah.edu&gt;
Cc: Clemens Ladisch &lt;clemens@ladisch.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/sys.c: whitespace fixes</title>
<updated>2014-10-10T02:26:04Z</updated>
<author>
<name>vishnu.ps</name>
<email>vishnu.ps@samsung.com</email>
</author>
<published>2014-10-09T22:30:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec94fc3d59b54561da03a0e433d93217b08c1481'/>
<id>urn:sha1:ec94fc3d59b54561da03a0e433d93217b08c1481</id>
<content type='text'>
Fix minor errors and warning messages in kernel/sys.c.  These errors were
reported by checkpatch while I was working on some modifications to the
sys.c file.  Fixing them first will make my further patches easier.

ERROR: trailing whitespace - 9
ERROR: do not use assignment in if condition - 4
ERROR: spaces required around that '?' (ctx:VxO) - 10
ERROR: switch and case should be at the same indent - 3

total 26 errors &amp; 3 warnings fixed.
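
For readers unfamiliar with checkpatch, here is a hypothetical before/after pair. It shows two of the reported patterns: an assignment inside an if condition, and a ternary operator without surrounding spaces. The functions are illustrative, not code from sys.c. The point is that the rewrite changes layout only, not behavior.

```c
#include "errno.h"

/* Hypothetical check standing in for any call that may fail. */
static int check_value(int value)
{
	return value > 100 ? -EINVAL : 0;
}

/* Before: checkpatch reports "do not use assignment in if condition"
 * and "spaces required around that '?'". */
static int set_value_before(int value)
{
	int err;

	if ((err = check_value(value)))
		return err;
	return value?1:0;
}

/* After: the assignment is hoisted out of the condition and the
 * ternary operator is spaced; behavior is unchanged. */
static int set_value_after(int value)
{
	int err;

	err = check_value(value);
	if (err)
		return err;
	return value ? 1 : 0;
}
```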

Signed-off-by: vishnu.ps &lt;vishnu.ps@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: use VM_BUG_ON_MM where possible</title>
<updated>2014-10-10T02:25:58Z</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2014-10-09T22:28:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=96dad67ff244e797c4bc3e4f7f0fdaa0cfdf0a7d'/>
<id>urn:sha1:96dad67ff244e797c4bc3e4f7f0fdaa0cfdf0a7d</id>
<content type='text'>
Dump the contents of the relevant struct mm_struct when we hit the bug condition.
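
A userspace analogue of the pattern, for illustration only. The struct, the dump helper and DEMO_BUG_ON_MM() are hypothetical, and abort() stands in for the kernel's BUG(). The point is that the assertion reports the state of the mm it tripped on, not just the file and line.

```c
#include "stdio.h"
#include "stdlib.h"

/* Hypothetical stand-in for struct mm_struct with a few fields. */
struct demo_mm {
	unsigned long start_brk, brk;
	int map_count;
};

/* Format the interesting fields into buf; returns the snprintf() result. */
static int dump_demo_mm(const struct demo_mm *mm, char *buf, size_t len)
{
	return snprintf(buf, len, "mm start_brk %#lx brk %#lx map_count %d",
			mm->start_brk, mm->brk, mm->map_count);
}

/* Like VM_BUG_ON_MM(): print the mm before dying, not just the line. */
#define DEMO_BUG_ON_MM(cond, mm)					\
	do {								\
		if (cond) {						\
			char buf_[128];					\
			dump_demo_mm((mm), buf_, sizeof(buf_));		\
			fprintf(stderr, "BUG: %s\n", buf_);		\
			abort();					\
		}							\
	} while (0)
```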

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation</title>
<updated>2014-10-10T02:25:55Z</updated>
<author>
<name>Cyrill Gorcunov</name>
<email>gorcunov@openvz.org</email>
</author>
<published>2014-10-09T22:27:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f606b77f1a9e362451aca8f81d8f36a3a112139e'/>
<id>urn:sha1:f606b77f1a9e362451aca8f81d8f36a3a112139e</id>
<content type='text'>
During development of c/r (checkpoint/restore) we noticed that supporting
user namespaces runs into a capability problem with the prctl(PR_SET_MM,
...) call: once a new user namespace is created,
capable(CAP_SYS_RESOURCE) no longer passes.

The approach taken here is to eliminate the CAP_SYS_RESOURCE check and
instead pass all new values in one bundle, which allows the kernel to
sanity-check the values more thoroughly and at the same time allows us to
support checkpoint/restore of user namespaces.

Thus a new command, PR_SET_MM_MAP, is introduced.  It takes a pointer to a
prctl_mm_map structure which carries all the members to be updated.

	prctl(PR_SET_MM, PR_SET_MM_MAP, struct prctl_mm_map *, size)

	struct prctl_mm_map {
		__u64	start_code;
		__u64	end_code;
		__u64	start_data;
		__u64	end_data;
		__u64	start_brk;
		__u64	brk;
		__u64	start_stack;
		__u64	arg_start;
		__u64	arg_end;
		__u64	env_start;
		__u64	env_end;
		__u64	*auxv;
		__u32	auxv_size;
		__u32	exe_fd;
	};

All members except @exe_fd correspond to members of struct mm_struct.  To
make clear which values these members may take, here is what each member
means:

 - start_code, end_code: represent bounds of executable code area
 - start_data, end_data: represent bounds of data area
 - start_brk, brk: used to calculate bounds for brk() syscall
 - start_stack: used when accounting space needed for command
   line arguments, environment and shmat() syscall
 - arg_start, arg_end, env_start, env_end: represent memory area
   supplied for command line arguments and environment variables
 - auxv, auxv_size: carry the auxiliary vector (ELF format specifics)
 - exe_fd: file descriptor number for executable link (/proc/self/exe)

Thus we apply the following requirements to the values:

1) Every member except @auxv, @auxv_size and @exe_fd is an address
   in user space, so it must lie inside the [mmap_min_addr, mmap_max_addr)
   interval.

2) While @[start|end]_code and @[start|end]_data may point to nonexistent
   VMAs (say, a program maps its own new .text and .data segments during
   execution), the rest of the members must belong to an existing VMA.

3) Addresses must be ordered, i.e. a @start_ member must not be greater
   than or equal to the corresponding @end_ member.

4) As in the regular ELF loading procedure, we require that @start_brk and
   @brk be greater than @end_data.

5) If the RLIMIT_DATA rlimit is not set to infinity, the new values should
   not exceed the existing limit.  The same applies to RLIMIT_STACK.

6) The auxiliary vector size must not exceed the existing one (which is
   predefined as AT_VECTOR_SIZE and depends on the architecture).

7) The file descriptor passed in @exe_fd should point to an executable
   file (because we use the existing prctl_set_mm_exe_file_locked helper,
   it ensures that the file we are going to use as the exe link has all
   required permissions granted).
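
Rules 1, 3 and 4 are simple enough to pre-check in userspace before issuing the call. The sketch below is a hypothetical helper, not the kernel's validation code. It mirrors only the address fields of struct prctl_mm_map and takes the [mmap_min_addr, mmap_max_addr) window as parameters. It applies rule 3 to the @start_/@end_ pairs (code and data) and skips rules 2 and 5 to 7, which need kernel state.

```c
#include "stddef.h"
#include "stdint.h"

/* Hypothetical mirror of the address fields of struct prctl_mm_map. */
struct mm_map_addrs {
	uint64_t start_code, end_code;
	uint64_t start_data, end_data;
	uint64_t start_brk, brk;
	uint64_t start_stack;
	uint64_t arg_start, arg_end;
	uint64_t env_start, env_end;
};

/* Does v lie in the half-open window [lo, hi)? */
static int in_window(uint64_t v, uint64_t lo, uint64_t hi)
{
	if (lo > v)
		return 0;
	return hi > v;
}

/* Returns 0 if the map passes rules 1, 3 and 4, -1 otherwise. */
static int check_mm_map(const struct mm_map_addrs *m,
			uint64_t min_addr, uint64_t max_addr)
{
	const uint64_t addrs[] = {
		m->start_code, m->end_code, m->start_data, m->end_data,
		m->start_brk, m->brk, m->start_stack,
		m->arg_start, m->arg_end, m->env_start, m->env_end,
	};
	size_t i;

	/* Rule 1: every address lies in [min_addr, max_addr). */
	for (i = 0; i != sizeof(addrs) / sizeof(addrs[0]); i++)
		if (!in_window(addrs[i], min_addr, max_addr))
			return -1;

	/* Rule 3, for the start_/end_ pairs: start strictly below end. */
	if (m->start_code >= m->end_code || m->start_data >= m->end_data)
		return -1;

	/* Rule 4: the heap lies above the end of the data segment. */
	if (m->end_data >= m->start_brk || m->end_data >= m->brk)
		return -1;
	return 0;
}
```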

Now, where these members are used inside the kernel:

 - @start_code and @end_code are used in /proc/$pid/[stat|statm] output;

 - @start_data and @end_data are used in /proc/$pid/[stat|statm] output;
   they are also used to check whether there is enough space for the
   brk() syscall result if RLIMIT_DATA is set;

 - @start_brk is shown in /proc/$pid/stat output and accounted in the brk()
   syscall if RLIMIT_DATA is set; this member is also tested to find a
   symbolic name for perf mmap events (to decide whether an event was
   generated for the "heap" area); one more application is SELinux, where
   we test whether a process has the PROCESS__EXECHEAP permission when it
   tries to make the heap area executable with the mprotect() syscall;

 - @brk is the current value for the brk() syscall and lies inside the
   heap area; it's shown in /proc/$pid/stat.  When the brk() syscall
   successfully provides a new memory area to userspace, mm::brk is
   updated to carry the new value;

   Both @start_brk and @brk are actively used in /proc/$pid/maps
   and /proc/$pid/smaps output to find the symbolic name "heap" for
   the VMA being scanned;

 - @start_stack is printed out in /proc/$pid/stat and used to
   find the symbolic name "stack" for tasks and threads in
   /proc/$pid/maps and /proc/$pid/smaps output, and, as with
   @start_brk, the perf system uses it for event naming.
   The kernel also treats this member as the start address of where
   to map vDSO pages and uses it to check whether there is enough space
   for the shmat() syscall;

 - @arg_start, @arg_end, @env_start and @env_end are printed out
   in /proc/$pid/stat.  The data these members describe can also be
   read through /proc/$pid/environ or /proc/$pid/cmdline; the kernel
   performs any such read with the access_process_vm helper, so a user
   must have sufficient rights for this action;

 - @auxv and @auxv_size may be read from /proc/$pid/auxv.  Strictly
   speaking, the kernel doesn't care much about what exactly is
   stored there because it is solely for userspace;

 - @exe_fd is referred to from /proc/$pid/exe and when generating a
   coredump.  We use the prctl_set_mm_exe_file_locked helper to update
   this member, so exe-file link modification remains a one-shot
   action.

Note also that updating the exe-file link no longer requires the
CAP_SYS_RESOURCE capability: after all, there is not much to be gained by
preventing a process from setting up its own file link (there are a number
of ways to execute one's own code, such as ptrace or LD_PRELOAD, so the only
reliable way to find out exactly which code is being executed is to inspect
the running program's memory).  We still require the caller to be at least
the root user of its user namespace.

I believe the old interface should be deprecated and ripped out in a
couple of kernel releases if no one objects.

To test whether the new interface is implemented in the kernel, one can
pass the PR_SET_MM_MAP_SIZE opcode, and the kernel reports the size of the
currently supported struct prctl_mm_map.
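
A minimal userspace sketch of the probe and the one-shot call follows. It assumes headers that define PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE and struct prctl_mm_map. It also assumes, as on the kernel side, that the size is written to an unsigned int whose address is passed as the third argument, with arg4 zero. The function names are hypothetical. On kernels that lack PR_SET_MM_MAP the probe fails and mm_map_supported() returns 0. submit_mm_map() additionally needs a fully populated map and the privileges described above.

```c
#include "sys/prctl.h"

/* Returns 1 if the kernel accepts a struct prctl_mm_map of the size
 * these headers define, 0 if the probe fails or the sizes differ. */
int mm_map_supported(void)
{
	unsigned int size[1] = { 0 };	/* out-parameter filled by the kernel */

	if (prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, (unsigned long)size, 0, 0) != 0)
		return 0;
	return size[0] == sizeof(struct prctl_mm_map);
}

/* Submit a fully populated map in one call; arg4 carries its size.
 * Returns 0 on success, -1 with errno set on failure. */
int submit_mm_map(const struct prctl_mm_map *map)
{
	return prctl(PR_SET_MM, PR_SET_MM_MAP, (unsigned long)map,
		     sizeof(*map), 0);
}
```

A restorer would call mm_map_supported() first and fall back to the older per-field PR_SET_MM operations when it returns 0.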

[akpm@linux-foundation.org: fix 80-col wordwrap in macro definitions]
Signed-off-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Andrew Vagin &lt;avagin@openvz.org&gt;
Tested-by: Andrew Vagin &lt;avagin@openvz.org&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Vasiliy Kulikov &lt;segoon@openwall.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: Julien Tinnes &lt;jln@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
