<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/sysctl.c, branch v4.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2016-01-22T18:24:03Z</updated>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2016-01-22T18:24:03Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-01-22T18:24:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eadee0ce6fd33defe449c97e671bf83fa230b5de'/>
<id>urn:sha1:eadee0ce6fd33defe449c97e671bf83fa230b5de</id>
<content type='text'>
Pull more vfs updates from Al Viro:
 "Embarrassing braino fix + pipe page accounting + fixing an eyesore in
  find_filesystem() (checking that s1 is equal to prefix of s2 of given
  length can be done in many ways, but "compare strlen(s1) with length
  and then do strncmp()" is not a good one...)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [regression] fix braino in fs/dlm/user.c
  pipe: limit the per-user amount of pages allocated in pipes
  find_filesystem(): simplify comparison
</content>
</entry>
<entry>
<title>sysctl: enable strict writes</title>
<updated>2016-01-21T01:09:18Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2016-01-20T23:00:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=41662f5cc55335807d39404371cfcbb1909304c4'/>
<id>urn:sha1:41662f5cc55335807d39404371cfcbb1909304c4</id>
<content type='text'>
SYSCTL_WRITES_WARN was added in commit f4aacea2f5d1 ("sysctl: allow for
strict write position handling"), and released in v3.16 in August of
2014.  Since then I can find only 1 instance of non-zero offset
writing[1], and it was fixed immediately in CRIU[2].  As such, it
appears safe to flip this to the strict state now.

[1] https://www.google.com/search?q="when%20file%20position%20was%20not%200"
[2] http://lists.openvz.org/pipermail/criu/2015-April/019819.html

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pipe: limit the per-user amount of pages allocated in pipes</title>
<updated>2016-01-20T00:25:21Z</updated>
<author>
<name>Willy Tarreau</name>
<email>w@1wt.eu</email>
</author>
<published>2016-01-18T15:36:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=759c01142a5d0f364a462346168a56de28a80f52'/>
<id>urn:sha1:759c01142a5d0f364a462346168a56de28a80f52</id>
<content type='text'>
On no-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.

This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.

The limit are controlled by two new sysctls : pipe-user-pages-soft, and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (eg: for splicing).

Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: mmap: add new /proc tunable for mmap_base ASLR</title>
<updated>2016-01-15T00:00:49Z</updated>
<author>
<name>Daniel Cashman</name>
<email>dcashman@google.com</email>
</author>
<published>2016-01-14T23:19:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d07e22597d1d355829b7b18ac19afa912cf758d1'/>
<id>urn:sha1:d07e22597d1d355829b7b18ac19afa912cf758d1</id>
<content type='text'>
Address Space Layout Randomization (ASLR) provides a barrier to
exploitation of user-space processes in the presence of security
vulnerabilities by making it more difficult to find desired code/data
which could help an attack.  This is done by adding a random offset to
the location of regions in the process address space, with a greater
range of potential offset values corresponding to better protection/a
larger search-space for brute force, but also to greater potential for
fragmentation.

The offset added to the mmap_base address, which provides the basis for
the majority of the mappings for a process, is set once on process exec
in arch_pick_mmap_layout() and is done via hard-coded per-arch values,
which reflect, hopefully, the best compromise for all systems.  The
trade-off between increased entropy in the offset value generation and
the corresponding increased variability in address space fragmentation
is not absolute, however, and some platforms may tolerate higher amounts
of entropy.  This patch introduces both new Kconfig values and a sysctl
interface which may be used to change the amount of entropy used for
offset generation on a system.

The direct motivation for this change was in response to the
libstagefright vulnerabilities that affected Android, specifically to
information provided by Google's project zero at:

  http://googleprojectzero.blogspot.com/2015/09/stagefrightened.html

The attack presented therein, by Google's project zero, specifically
targeted the limited randomness used to generate the offset added to the
mmap_base address in order to craft a brute-force-based attack.
Concretely, the attack was against the mediaserver process, which was
limited to respawning every 5 seconds, on an arm device.  The hard-coded
8 bits used resulted in an average expected success rate of defeating
the mmap ASLR after just over 10 minutes (128 tries at 5 seconds a
piece).  With this patch, and an accompanying increase in the entropy
value to 16 bits, the same attack would take an average expected time of
over 45 hours (32768 tries), which makes it both less feasible and more
likely to be noticed.

The introduced Kconfig and sysctl options are limited by per-arch
minimum and maximum values, the minimum of which was chosen to match the
current hard-coded value and the maximum of which was chosen so as to
give the greatest flexibility without generating an invalid mmap_base
address, generally a 3-4 bits less than the number of bits in the
user-space accessible virtual address space.

When decided whether or not to change the default value, a system
developer should consider that mmap_base address could be placed
anywhere up to 2^(value) bits away from the non-randomized location,
which would introduce variable-sized areas above and below the mmap_base
address such that the maximum vm_area_struct size may be reduced,
preventing very large allocations.

This patch (of 4):

ASLR only uses as few as 8 bits to generate the random offset for the
mmap base address on 32 bit architectures.  This value was chosen to
prevent a poorly chosen value from dividing the address space in such a
way as to prevent large allocations.  This may not be an issue on all
platforms.  Allow the specification of a minimum number of bits so that
platforms desiring greater ASLR protection may determine where to place
the trade-off.

Signed-off-by: Daniel Cashman &lt;dcashman@google.com&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Heinrich Schuchardt &lt;xypron.glpk@gmx.de&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Mark Salyzyn &lt;salyzyn@android.com&gt;
Cc: Jeff Vander Stoep &lt;jeffv@google.com&gt;
Cc: Nick Kralevich &lt;nnk@google.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Hector Marco-Gisbert &lt;hecmargi@upv.es&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/*: switch to memdup_user_nul()</title>
<updated>2016-01-04T15:27:55Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2015-12-24T05:13:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=70f6cbb6f9c95535acd327d1ac1ce5fd078cff1e'/>
<id>urn:sha1:70f6cbb6f9c95535acd327d1ac1ce5fd078cff1e</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>kernel/watchdog.c: add sysctl knob hardlockup_panic</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Don Zickus</name>
<email>dzickus@redhat.com</email>
</author>
<published>2015-11-06T02:44:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ac1f591249d95372f3a5ab3828d4af5dfbf5efd3'/>
<id>urn:sha1:ac1f591249d95372f3a5ab3828d4af5dfbf5efd3</id>
<content type='text'>
The only way to enable a hardlockup to panic the machine is to set
'nmi_watchdog=panic' on the kernel command line.

This makes it awkward for end users and folks who want to run automate
tests (like myself).

Mimic the softlockup_panic knob and create a /proc/sys/kernel/hardlockup_panic
knob.

Signed-off-by: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Jiri Kosina</name>
<email>jkosina@suse.cz</email>
</author>
<published>2015-11-06T02:44:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=55537871ef666b4153fd1ef8782e4a13fee142cc'/>
<id>urn:sha1:55537871ef666b4153fd1ef8782e4a13fee142cc</id>
<content type='text'>
In many cases of hardlockup reports, it's actually not possible to know
why it triggered, because the CPU that got stuck is usually waiting on a
resource (with IRQs disabled) in posession of some other CPU is holding.

IOW, we are often looking at the stacktrace of the victim and not the
actual offender.

Introduce sysctl / cmdline parameter that makes it possible to have
hardlockup detector perform all-CPU backtrace.

Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Acked-by: Don Zickus &lt;dzickus@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: enable non-root eBPF programs</title>
<updated>2015-10-13T02:13:35Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-10-08T05:23:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1be7f75d1668d6296b80bf35dcf6762393530afc'/>
<id>urn:sha1:1be7f75d1668d6296b80bf35dcf6762393530afc</id>
<content type='text'>
In order to let unprivileged users load and execute eBPF programs
teach verifier to prevent pointer leaks.
Verifier will prevent
- any arithmetic on pointers
  (except R10+Imm which is used to compute stack addresses)
- comparison of pointers
  (except if (map_value_ptr == 0) ... )
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps

Spill/fill of pointers into stack is allowed, but mangling
of pointers stored in the stack or reading them byte by byte is not.

Within bpf programs the pointers do exist, since programs need to
be able to access maps, pass skb pointer to LD_ABS insns, etc
but programs cannot pass such pointer values to the outside
or obfuscate them.

Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
so that socket filters (tcpdump), af_packet (quic acceleration)
and future kcm can use it.
tracing and tc cls/act program types still require root permissions,
since tracing actually needs to be able to see all kernel pointers
and tc is for root only.

For example, the following unprivileged socket filter program is allowed:
int bpf_prog1(struct __sk_buff *skb)
{
  u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
  u64 *value = bpf_map_lookup_elem(&amp;my_map, &amp;index);

  if (value)
	*value += skb-&gt;len;
  return 0;
}

but the following program is not:
int bpf_prog1(struct __sk_buff *skb)
{
  u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
  u64 *value = bpf_map_lookup_elem(&amp;my_map, &amp;index);

  if (value)
	*value += (u64) skb;
  return 0;
}
since it would leak the kernel address into the map.

Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns

The feature is controlled by sysctl kernel.unprivileged_bpf_disabled.
This toggle defaults to off (0), but can be set true (1).  Once true,
bpf programs and maps cannot be accessed from unprivileged process,
and the toggle cannot be set back to false.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sysctl: fix int -&gt; unsigned long assignments in INT_MIN case</title>
<updated>2015-09-10T20:29:01Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-09-09T22:39:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9a5bc726d559221a3394bb8ef97d0abc1ee94d00'/>
<id>urn:sha1:9a5bc726d559221a3394bb8ef97d0abc1ee94d00</id>
<content type='text'>
The following

    if (val &lt; 0)
        *lvalp = (unsigned long)-val;

is incorrect because the compiler is free to assume -val to be positive
and use a sign-extend instruction for extending the bit pattern.  This is
a problem if val == INT_MIN:

    # echo -2147483648 &gt;/proc/sys/dev/scsi/logging_level
    # cat /proc/sys/dev/scsi/logging_level
    -18446744071562067968

Cast to unsigned long before negation - that way we first sign-extend and
then negate an unsigned, which is well defined.  With this:

    # cat /proc/sys/dev/scsi/logging_level
    -2147483648

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Cc: Mikulas Patocka &lt;mikulas@twibright.com&gt;
Cc: Robert Xiao &lt;nneonneo@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kexec: split kexec_load syscall from kexec core code</title>
<updated>2015-09-10T20:29:01Z</updated>
<author>
<name>Dave Young</name>
<email>dyoung@redhat.com</email>
</author>
<published>2015-09-09T22:38:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2965faa5e03d1e71e9ff9aa143fff39e0a77543a'/>
<id>urn:sha1:2965faa5e03d1e71e9ff9aa143fff39e0a77543a</id>
<content type='text'>
There are two kexec load syscalls, kexec_load another and kexec_file_load.
 kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Petr Tesarik &lt;ptesarik@suse.cz&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: Josh Boyer &lt;jwboyer@fedoraproject.org&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
