<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/Documentation/sysctl, branch v4.15</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.15</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.15'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2017-11-30T02:40:43Z</updated>
<entry>
<title>Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"</title>
<updated>2017-11-30T02:40:43Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2017-11-30T00:10:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=90daf3062fc0f8f919d5496fe167bbd6016a6a63'/>
<id>urn:sha1:90daf3062fc0f8f919d5496fe167bbd6016a6a63</id>
<content type='text'>
This reverts commit 0f6d24f87856 ("mm/page-writeback.c: print a warning
if the vm dirtiness settings are illogical") because it causes false
positive warnings during OOM situations as noticed by Tetsuo Handa:

  Node 0 active_anon:3525940kB inactive_anon:8372kB active_file:216kB inactive_file:1872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2504kB dirty:52kB writeback:0kB shmem:8660kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 636928kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
  Node 0 DMA free:14848kB min:284kB low:352kB high:420kB active_anon:992kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 2687 3645 3645
  Node 0 DMA32 free:53004kB min:49608kB low:62008kB high:74408kB active_anon:2712648kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2773132kB mlocked:0kB kernel_stack:96kB pagetables:5096kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 0 958 958
  Node 0 Normal free:17140kB min:17684kB low:22104kB high:26524kB active_anon:812300kB inactive_anon:8372kB active_file:1228kB inactive_file:1868kB unevictable:0kB writepending:52kB present:1048576kB managed:981224kB mlocked:0kB kernel_stack:3520kB pagetables:8552kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  [...]
  Out of memory: Kill process 8459 (a.out) score 999 or sacrifice child
  Killed process 8459 (a.out) total-vm:4180kB, anon-rss:88kB, file-rss:0kB, shmem-rss:0kB
  oom_reaper: reaped process 8459 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  vm direct limit must be set greater than background limit.

The problem is that both thresh and bg_thresh will be 0 if
available_memory is less than 4 pages when evaluating
global_dirtyable_memory.

While this might be worked around the whole point of the warning is
dubious at best.  We do rely on admins to do sensible things when
changing tunable knobs.  Dirty memory writeback knobs are not any
special in that regards so revert the warning rather than adding more
hacks to work this around.

Debugged by Yafang Shao.

Link: http://lkml.kernel.org/r/20171127091939.tahb77nznytcxw55@dhcp22.suse.cz
Fixes: 0f6d24f87856 ("mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical")
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Documentation/sysctl/vm.txt: fix typo</title>
<updated>2017-11-18T00:10:03Z</updated>
<author>
<name>Kangmin Park</name>
<email>l4stpr0gr4m@gmail.com</email>
</author>
<published>2017-11-17T23:30:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2743232c0c4c82311718bb4ec1fb659ed64ecf7a'/>
<id>urn:sha1:2743232c0c4c82311718bb4ec1fb659ed64ecf7a</id>
<content type='text'>
Link: http://lkml.kernel.org/r/CAKW4uUyCi=PnKf3epgFVz8z=1tMtHSOHNm+fdNxrNw3-THvRCA@mail.gmail.com
Signed-off-by: Kangmin Park &lt;l4stpr0gr4m@gmail.com&gt;
Cc: Jiri Kosina &lt;trivial@kernel.org&gt;
Cc: Alan Cox &lt;alan@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, sysctl: make NUMA stats configurable</title>
<updated>2017-11-16T02:21:07Z</updated>
<author>
<name>Kemi Wang</name>
<email>kemi.wang@intel.com</email>
</author>
<published>2017-11-16T01:38:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4518085e127dff97e74f74a8780d7564e273bec8'/>
<id>urn:sha1:4518085e127dff97e74f74a8780d7564e273bec8</id>
<content type='text'>
This is the second step which introduces a tunable interface that allow
numa stats configurable for optimizing zone_statistics(), as suggested
by Dave Hansen and Ying Huang.

=========================================================================

When page allocation performance becomes a bottleneck and you can
tolerate some possible tool breakage and decreased numa counter
precision, you can do:

	echo 0 &gt; /proc/sys/vm/numa_stat

In this case, numa counter update is ignored.  We can see about
*4.8%*(185-&gt;176) drop of cpu cycles per single page allocation and
reclaim on Jesper's page_bench01 (single thread) and *8.1%*(343-&gt;315)
drop of cpu cycles per single page allocation and reclaim on Jesper's
page_bench03 (88 threads) running on a 2-Socket Broadwell-based server
(88 threads, 126G memory).

Benchmark link provided by Jesper D Brouer (increase loop times to
10000000):

  https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench

=========================================================================

When page allocation performance is not a bottleneck and you want all
tooling to work, you can do:

	echo 1 &gt; /proc/sys/vm/numa_stat

This is system default setting.

Many thanks to Michal Hocko, Dave Hansen, Ying Huang and Vlastimil Babka
for comments to help improve the original patch.

[keescook@chromium.org: make sure mutex is a global static]
  Link: http://lkml.kernel.org/r/20171107213809.GA4314@beast
Link: http://lkml.kernel.org/r/1508290927-8518-1-git-send-email-kemi.wang@intel.com
Signed-off-by: Kemi Wang &lt;kemi.wang@intel.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reported-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Suggested-by: Dave Hansen &lt;dave.hansen@intel.com&gt;
Suggested-by: Ying Huang &lt;ying.huang@intel.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: "Luis R . Rodriguez" &lt;mcgrof@kernel.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Christopher Lameter &lt;cl@linux.com&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Tim Chen &lt;tim.c.chen@intel.com&gt;
Cc: Andi Kleen &lt;andi.kleen@intel.com&gt;
Cc: Aaron Lu &lt;aaron.lu@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: consolidate page table accounting</title>
<updated>2017-11-16T02:21:04Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2017-11-16T01:35:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=af5b0f6a09e42c9f4fa87735f2a366748767b686'/>
<id>urn:sha1:af5b0f6a09e42c9f4fa87735f2a366748767b686</id>
<content type='text'>
Currently, we account page tables separately for each page table level,
but that's redundant -- we only make use of total memory allocated to
page tables for oom_badness calculation.  We also provide the
information to userspace, but it has dubious value there too.

This patch switches page table accounting to single counter.

mm-&gt;pgtables_bytes is now used to account all page table levels.  We use
bytes, because page table size for different levels of page table tree
may be different.

The change has user-visible effect: we don't have VmPMD and VmPUD
reported in /proc/[pid]/status.  Not sure if anybody uses them.  (As
alternative, we can always report 0 kB for them.)

OOM-killer report is also slightly changed: we now report pgtables_bytes
instead of nr_ptes, nr_pmd, nr_puds.

Apart from reducing number of counters per-mm, the benefit is that we
now calculate oom_badness() more correctly for machines which have
different size of page tables depending on level or where page tables
are less than a page in size.

The only downside can be debuggability because we do not know which page
table level could leak.  But I do not remember many bugs that would be
caught by separate counters so I wouldn't lose sleep over this.

[akpm@linux-foundation.org: fix mm/huge_memory.c]
Link: http://lkml.kernel.org/r/20171006100651.44742-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
[kirill.shutemov@linux.intel.com: fix build]
  Link: http://lkml.kernel.org/r/20171016150113.ikfxy3e7zzfvsr4w@black.fi.intel.com
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: account pud page tables</title>
<updated>2017-11-16T02:21:04Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2017-11-16T01:35:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b4e98d9ac775907cc53fb08fcb6776deb7694e30'/>
<id>urn:sha1:b4e98d9ac775907cc53fb08fcb6776deb7694e30</id>
<content type='text'>
On a machine with 5-level paging support a process can allocate
significant amount of memory and stay unnoticed by oom-killer and memory
cgroup.  The trick is to allocate a lot of PUD page tables.  We don't
account PUD page tables, only PMD and PTE.

We already addressed the same issue for PMD page tables, see commit
dc6c9a35b66b ("mm: account pmd page tables to the process").
Introduction of 5-level paging brings the same issue for PUD page
tables.

The patch expands accounting to PUD level.

[kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/]
  Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com
[heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting]
  Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical</title>
<updated>2017-11-16T02:21:03Z</updated>
<author>
<name>Yafang Shao</name>
<email>laoar.shao@gmail.com</email>
</author>
<published>2017-11-16T01:33:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0f6d24f878568fac579a1962d0bf7cb9f01e0ceb'/>
<id>urn:sha1:0f6d24f878568fac579a1962d0bf7cb9f01e0ceb</id>
<content type='text'>
The vm direct limit setting must be set greater than vm background limit
setting.  Otherwise print a warning to help the operator to figure out
that the vm dirtiness settings is in illogical state.

Link: http://lkml.kernel.org/r/1506592464-30962-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>docs: Update binfmt_misc links</title>
<updated>2017-10-03T21:23:38Z</updated>
<author>
<name>Tom Saeger</name>
<email>tom.saeger@oracle.com</email>
</author>
<published>2017-10-03T21:16:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=852f1a21ffae19a45d4944791ce574e92a2b31ab'/>
<id>urn:sha1:852f1a21ffae19a45d4944791ce574e92a2b31ab</id>
<content type='text'>
Documentation/binfmt_misc.txt moved to
Documentation/admin-guide/binfmt-misc.rst

Signed-off-by: Tom Saeger &lt;tom.saeger@oracle.com&gt;
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>Merge tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux</title>
<updated>2017-09-23T02:16:41Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-09-23T02:16:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c0a3a64e723324ae6dda53214061a71de63808c3'/>
<id>urn:sha1:c0a3a64e723324ae6dda53214061a71de63808c3</id>
<content type='text'>
Pull seccomp updates from Kees Cook:
 "Major additions:

   - sysctl and seccomp operation to discover available actions
     (tyhicks)

   - new per-filter configurable logging infrastructure and sysctl
     (tyhicks)

   - SECCOMP_RET_LOG to log allowed syscalls (tyhicks)

   - SECCOMP_RET_KILL_PROCESS as the new strictest possible action

   - self-tests for new behaviors"

[ This is the seccomp part of the security pull request during the merge
  window that was nixed due to unrelated problems   - Linus ]

* tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  samples: Unrename SECCOMP_RET_KILL
  selftests/seccomp: Test thread vs process killing
  seccomp: Implement SECCOMP_RET_KILL_PROCESS action
  seccomp: Introduce SECCOMP_RET_KILL_PROCESS
  seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD
  seccomp: Action to log before allowing
  seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
  seccomp: Selftest for detection of filter flag support
  seccomp: Sysctl to configure actions that are allowed to be logged
  seccomp: Operation for checking if an action is available
  seccomp: Sysctl to display available actions
  seccomp: Provide matching filter for introspection
  selftests/seccomp: Refactor RET_ERRNO tests
  selftests/seccomp: Add simple seccomp overhead benchmark
  selftests/seccomp: Add tests for basic ptrace actions
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2017-09-07T03:49:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-09-07T03:49:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d34fc1adf01ff87026da85fb972dc259dc347540'/>
<id>urn:sha1:d34fc1adf01ff87026da85fb972dc259dc347540</id>
<content type='text'>
Merge updates from Andrew Morton:

 - various misc bits

 - DAX updates

 - OCFS2

 - most of MM

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (119 commits)
  mm,fork: introduce MADV_WIPEONFORK
  x86,mpx: make mpx depend on x86-64 to free up VMA flag
  mm: add /proc/pid/smaps_rollup
  mm: hugetlb: clear target sub-page last when clearing huge page
  mm: oom: let oom_reap_task and exit_mmap run concurrently
  swap: choose swap device according to numa node
  mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
  mm, oom: do not rely on TIF_MEMDIE for memory reserves access
  z3fold: use per-cpu unbuddied lists
  mm, swap: don't use VMA based swap readahead if HDD is used as swap
  mm, swap: add sysfs interface for VMA based swap readahead
  mm, swap: VMA based swap readahead
  mm, swap: fix swap readahead marking
  mm, swap: add swap readahead hit statistics
  mm/vmalloc.c: don't reinvent the wheel but use existing llist API
  mm/vmstat.c: fix wrong comment
  selftests/memfd: add memfd_create hugetlbfs selftest
  mm/shmem: add hugetlbfs support to memfd_create()
  mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
  mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
  ...
</content>
</entry>
<entry>
<title>mm, page_alloc: rip out ZONELIST_ORDER_ZONE</title>
<updated>2017-09-07T00:27:25Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2017-09-06T23:20:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c9bff3eebc09be23fbc868f5e6731666d23cbea3'/>
<id>urn:sha1:c9bff3eebc09be23fbc868f5e6731666d23cbea3</id>
<content type='text'>
Patch series "cleanup zonelists initialization", v1.

This is aimed at cleaning up the zonelists initialization code we have
but the primary motivation was bug report [2] which got resolved but the
usage of stop_machine is just too ugly to live.  Most patches are
straightforward but 3 of them need a special consideration.

Patch 1 removes zone ordered zonelists completely.  I am CCing linux-api
because this is a user visible change.  As I argue in the patch
description I do not think we have a strong usecase for it these days.
I have kept sysctl in place and warn into the log if somebody tries to
configure zone lists ordering.  If somebody has a real usecase for it we
can revert this patch but I do not expect anybody will actually notice
runtime differences.  This patch is not strictly needed for the rest but
it made patch 6 easier to implement.

Patch 7 removes stop_machine from build_all_zonelists without adding any
special synchronization between iterators and updater which I _believe_
is acceptable as explained in the changelog.  I hope I am not missing
anything.

Patch 8 then removes zonelists_mutex which is kind of ugly as well and
not really needed AFAICS but a care should be taken when double checking
my thinking.

This patch (of 9):

Supporting zone ordered zonelists costs us just a lot of code while the
usefulness is arguable if existent at all.  Mel has already made node
ordering default on 64b systems.  32b systems are still using
ZONELIST_ORDER_ZONE because it is considered better to fallback to a
different NUMA node rather than consume precious lowmem zones.

This argument is, however, weaken by the fact that the memory reclaim
has been reworked to be node rather than zone oriented.  This means that
lowmem requests have to skip over all highmem pages on LRUs already and
so zone ordering doesn't save the reclaim time much.  So the only
advantage of the zone ordering is under a light memory pressure when
highmem requests do not ever hit into lowmem zones and the lowmem
pressure doesn't need to reclaim.

Considering that 32b NUMA systems are rather suboptimal already and it
is generally advisable to use 64b kernel on such a HW I believe we
should rather care about the code maintainability and just get rid of
ZONELIST_ORDER_ZONE altogether.  Keep systcl in place and warn if
somebody tries to set zone ordering either from kernel command line or
the sysctl.

[mhocko@suse.com: reading vm.numa_zonelist_order will never terminate]
Link: http://lkml.kernel.org/r/20170721143915.14161-2-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Joonsoo Kim &lt;js1304@gmail.com&gt;
Cc: Shaohua Li &lt;shaohua.li@intel.com&gt;
Cc: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Cc: Abdul Haleem &lt;abdhalee@linux.vnet.ibm.com&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
