<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/Documentation/admin-guide/sysctl, branch v5.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-08-12T17:58:01Z</updated>
<entry>
<title>coredump: add %f for executable filename</title>
<updated>2020-08-12T17:58:01Z</updated>
<author>
<name>Lepton Wu</name>
<email>ytht.net@gmail.com</email>
</author>
<published>2020-08-12T01:36:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f38c85f1ba6902e4e2e2bf1b84edf065a904cdeb'/>
<id>urn:sha1:f38c85f1ba6902e4e2e2bf1b84edf065a904cdeb</id>
<content type='text'>
The document reads "%e" should be "executable filename" while actually it
could be changed by things like pr_ctl PR_SET_NAME.  People who uses "%e"
in core_pattern get surprised when they find out they get thread name
instead of executable filename.

This is either a bug of document or a bug of code.  Since the behavior of
"%e" is there for long time, it could bring another surprise for users if
we "fix" the code.

So we just "fix" the document.  And more, for users who really need the
"executable filename" in core_pattern, we introduce a new "%f" for the
real executable filename.  We already have "%E" for executable path in
kernel, so just reuse most of its code for the new added "%f" format.

Signed-off-by: Lepton Wu &lt;ytht.net@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20200701031432.2978761-1-ytht.net@gmail.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: proactive compaction</title>
<updated>2020-08-12T17:57:56Z</updated>
<author>
<name>Nitin Gupta</name>
<email>nigupta@nvidia.com</email>
</author>
<published>2020-08-12T01:31:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=facdaa917c4d5a376d09d25865f5a863f906234a'/>
<id>urn:sha1:facdaa917c4d5a376d09d25865f5a863f906234a</id>
<content type='text'>
For some applications, we need to allocate almost all memory as hugepages.
However, on a running system, higher-order allocations can fail if the
memory is fragmented.  Linux kernel currently does on-demand compaction as
we request more hugepages, but this style of compaction incurs very high
latency.  Experiments with one-time full memory compaction (followed by
hugepage allocations) show that kernel is able to restore a highly
fragmented memory state to a fairly compacted memory state within &lt;1 sec
for a 32G system.  Such data suggests that a more proactive compaction can
help us allocate a large fraction of memory as hugepages keeping
allocation latencies low.

For a more proactive compaction, the approach taken here is to define a
new sysctl called 'vm.compaction_proactiveness' which dictates bounds for
external fragmentation which kcompactd tries to maintain.

The tunable takes a value in range [0, 100], with a default of 20.

Note that a previous version of this patch [1] was found to introduce too
many tunables (per-order extfrag{low, high}), but this one reduces them to
just one sysctl.  Also, the new tunable is an opaque value instead of
asking for specific bounds of "external fragmentation", which would have
been difficult to estimate.  The internal interpretation of this opaque
value allows for future fine-tuning.

Currently, we use a simple translation from this tunable to [low, high]
"fragmentation score" thresholds (low=100-proactiveness, high=low+10%).
The score for a node is defined as weighted mean of per-zone external
fragmentation.  A zone's present_pages determines its weight.

To periodically check per-node score, we reuse per-node kcompactd threads,
which are woken up every 500 milliseconds to check the same.  If a node's
score exceeds its high threshold (as derived from user-provided
proactiveness value), proactive compaction is started until its score
reaches its low threshold value.  By default, proactiveness is set to 20,
which implies threshold values of low=80 and high=90.

This patch is largely based on ideas from Michal Hocko [2].  See also the
LWN article [3].

Performance data
================

System: x64_64, 1T RAM, 80 CPU threads.
Kernel: 5.6.0-rc3 + this patch

echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

Before starting the driver, the system was fragmented from a userspace
program that allocates all memory and then for each 2M aligned section,
frees 3/4 of base pages using munmap.  The workload is mainly anonymous
userspace pages, which are easy to move around.  I intentionally avoided
unmovable pages in this test to see how much latency we incur when
hugepage allocations hit direct compaction.

1. Kernel hugepage allocation latencies

With the system in such a fragmented state, a kernel driver then allocates
as many hugepages as possible and measures allocation latency:

(all latency values are in microseconds)

- With vanilla 5.6.0-rc3

  percentile latency
  –––––––––– –––––––
	   5    7894
	  10    9496
	  25   12561
	  30   15295
	  40   18244
	  50   21229
	  60   27556
	  75   30147
	  80   31047
	  90   32859
	  95   33799

Total 2M hugepages allocated = 383859 (749G worth of hugepages out of 762G
total free =&gt; 98% of free memory could be allocated as hugepages)

- With 5.6.0-rc3 + this patch, with proactiveness=20

sysctl -w vm.compaction_proactiveness=20

  percentile latency
  –––––––––– –––––––
	   5       2
	  10       2
	  25       3
	  30       3
	  40       3
	  50       4
	  60       4
	  75       4
	  80       4
	  90       5
	  95     429

Total 2M hugepages allocated = 384105 (750G worth of hugepages out of 762G
total free =&gt; 98% of free memory could be allocated as hugepages)

2. JAVA heap allocation

In this test, we first fragment memory using the same method as for (1).

Then, we start a Java process with a heap size set to 700G and request the
heap to be allocated with THP hugepages.  We also set THP to madvise to
allow hugepage backing of this heap.

/usr/bin/time
 java -Xms700G -Xmx700G -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch

The above command allocates 700G of Java heap using hugepages.

- With vanilla 5.6.0-rc3

17.39user 1666.48system 27:37.89elapsed

- With 5.6.0-rc3 + this patch, with proactiveness=20

8.35user 194.58system 3:19.62elapsed

Elapsed time remains around 3:15, as proactiveness is further increased.

Note that proactive compaction happens throughout the runtime of these
workloads.  The situation of one-time compaction, sufficient to supply
hugepages for following allocation stream, can probably happen for more
extreme proactiveness values, like 80 or 90.

In the above Java workload, proactiveness is set to 20.  The test starts
with a node's score of 80 or higher, depending on the delay between the
fragmentation step and starting the benchmark, which gives more-or-less
time for the initial round of compaction.  As t he benchmark consumes
hugepages, node's score quickly rises above the high threshold (90) and
proactive compaction starts again, which brings down the score to the low
threshold level (80).  Repeat.

bpftrace also confirms proactive compaction running 20+ times during the
runtime of this Java benchmark.  kcompactd threads consume 100% of one of
the CPUs while it tries to bring a node's score within thresholds.

Backoff behavior
================

Above workloads produce a memory state which is easy to compact.  However,
if memory is filled with unmovable pages, proactive compaction should
essentially back off.  To test this aspect:

- Created a kernel driver that allocates almost all memory as hugepages
  followed by freeing first 3/4 of each hugepage.
- Set proactiveness=40
- Note that proactive_compact_node() is deferred maximum number of times
  with HPAGE_FRAG_CHECK_INTERVAL_MSEC of wait between each check
  (=&gt; ~30 seconds between retries).

[1] https://patchwork.kernel.org/patch/11098289/
[2] https://lore.kernel.org/linux-mm/20161230131412.GI13301@dhcp22.suse.cz/
[3] https://lwn.net/Articles/817905/

Signed-off-by: Nitin Gupta &lt;nigupta@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Tested-by: Oleksandr Natalenko &lt;oleksandr@redhat.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Reviewed-by: Oleksandr Natalenko &lt;oleksandr@redhat.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Nitin Gupta &lt;ngupta@nitingupta.dev&gt;
Cc: Oleksandr Natalenko &lt;oleksandr@redhat.com&gt;
Link: http://lkml.kernel.org/r/20200616204527.19185-1-nigupta@nvidia.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'docs-5.9' of git://git.lwn.net/linux</title>
<updated>2020-08-05T05:47:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-08-05T05:47:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2324d50d051ec0f14a548e78554fb02513d6dcef'/>
<id>urn:sha1:2324d50d051ec0f14a548e78554fb02513d6dcef</id>
<content type='text'>
Pull documentation updates from Jonathan Corbet:
 "It's been a busy cycle for documentation - hopefully the busiest for a
  while to come. Changes include:

   - Some new Chinese translations

   - Progress on the battle against double words words and non-HTTPS
     URLs

   - Some block-mq documentation

   - More RST conversions from Mauro. At this point, that task is
     essentially complete, so we shouldn't see this kind of churn again
     for a while. Unless we decide to switch to asciidoc or
     something...:)

   - Lots of typo fixes, warning fixes, and more"

* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
  scripts/kernel-doc: optionally treat warnings as errors
  docs: ia64: correct typo
  mailmap: add entry for &lt;alobakin@marvell.com&gt;
  doc/zh_CN: add cpu-load Chinese version
  Documentation/admin-guide: tainted-kernels: fix spelling mistake
  MAINTAINERS: adjust kprobes.rst entry to new location
  devices.txt: document rfkill allocation
  PCI: correct flag name
  docs: filesystems: vfs: correct flag name
  docs: filesystems: vfs: correct sync_mode flag names
  docs: path-lookup: markup fixes for emphasis
  docs: path-lookup: more markup fixes
  docs: path-lookup: fix HTML entity mojibake
  CREDITS: Replace HTTP links with HTTPS ones
  docs: process: Add an example for creating a fixes tag
  doc/zh_CN: add Chinese translation prefer section
  doc/zh_CN: add clearing-warn-once Chinese version
  doc/zh_CN: add admin-guide index
  doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
  futex: MAINTAINERS: Re-add selftests directory
  ...
</content>
</entry>
<entry>
<title>Documentation/sysctl: Document uclamp sysctl knobs</title>
<updated>2020-07-29T11:51:48Z</updated>
<author>
<name>Qais Yousef</name>
<email>qais.yousef@arm.com</email>
</author>
<published>2020-07-16T11:03:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1f73d1abe5836bd8ffe747ff5cb7561b17ce5bc6'/>
<id>urn:sha1:1f73d1abe5836bd8ffe747ff5cb7561b17ce5bc6</id>
<content type='text'>
Uclamp exposes 3 sysctl knobs:

	* sched_util_clamp_min
	* sched_util_clamp_max
	* sched_util_clamp_min_rt_default

Document them in sysctl/kernel.rst.

Signed-off-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20200716110347.19553-3-qais.yousef@arm.com
</content>
</entry>
<entry>
<title>Replace HTTP links with HTTPS ones: Documentation/admin-guide</title>
<updated>2020-07-05T20:13:00Z</updated>
<author>
<name>Alexander A. Klimov</name>
<email>grandmaster@al2klimov.de</email>
</author>
<published>2020-06-27T07:29:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6b2484e13a52e9836e67e89cadc6189c40c8ae8c'/>
<id>urn:sha1:6b2484e13a52e9836e67e89cadc6189c40c8ae8c</id>
<content type='text'>
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If both the HTTP and HTTPS versions
          return 200 OK and serve the same content:
            Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov &lt;grandmaster@al2klimov.de&gt;
Link: https://lore.kernel.org/r/20200627072935.62652-1-grandmaster@al2klimov.de
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>Documentation/admin-guide: sysctl/kernel: drop doubled word</title>
<updated>2020-07-05T20:01:49Z</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2020-07-04T03:20:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ee74db082abf8f4ce2ef534da4ed8950de21ae29'/>
<id>urn:sha1:ee74db082abf8f4ce2ef534da4ed8950de21ae29</id>
<content type='text'>
Drop the doubled word "set".

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/20200704032020.21923-12-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'mauro' into docs-next</title>
<updated>2020-06-26T17:35:10Z</updated>
<author>
<name>Jonathan Corbet</name>
<email>corbet@lwn.net</email>
</author>
<published>2020-06-26T17:35:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=435a77434653faabcdf26c3d08c4fd25602c8613'/>
<id>urn:sha1:435a77434653faabcdf26c3d08c4fd25602c8613</id>
<content type='text'>
A big set of fixes and RST conversions from Mauro.  He swears that this is
the last RST conversion set, which is certainly cause for celebration.
</content>
</entry>
<entry>
<title>docs: move nommu-mmap.txt to admin-guide and rename to ReST</title>
<updated>2020-06-26T17:33:35Z</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab+huawei@kernel.org</email>
</author>
<published>2020-06-23T13:31:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=800c02f5d03019716a5926b73144be3bf0276923'/>
<id>urn:sha1:800c02f5d03019716a5926b73144be3bf0276923</id>
<content type='text'>
The nommu-mmap.txt file provides description of user visible
behaviuour. So, move it to the admin-guide.

As it is already at the ReST, also rename it.

Suggested-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Suggested-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
Signed-off-by: Mauro Carvalho Chehab &lt;mchehab+huawei@kernel.org&gt;
Link: https://lore.kernel.org/r/3a63d1833b513700755c85bf3bda0a6c4ab56986.1592918949.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>docs: sysctl/kernel: document random</title>
<updated>2020-06-26T17:25:14Z</updated>
<author>
<name>Stephen Kitt</name>
<email>steve@sk2.org</email>
</author>
<published>2020-06-23T11:25:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0b227076d509c2facf177d7cdb67cbbb885cb661'/>
<id>urn:sha1:0b227076d509c2facf177d7cdb67cbbb885cb661</id>
<content type='text'>
This documents the random directory, based on the behaviour seen in
drivers/char/random.c.

Signed-off-by: Stephen Kitt &lt;steve@sk2.org&gt;
Link: https://lore.kernel.org/r/20200623112514.10650-1-steve@sk2.org
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
<entry>
<title>Documentation: fix sysctl/kernel.rst heading format warnings</title>
<updated>2020-06-19T19:24:57Z</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2020-06-15T04:11:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e996919b7292d8aa1ff16d90b1e0684e7627fc53'/>
<id>urn:sha1:e996919b7292d8aa1ff16d90b1e0684e7627fc53</id>
<content type='text'>
Fix heading format warnings in admin-guide/sysctl/kernel.rst:

Documentation/admin-guide/sysctl/kernel.rst:339: WARNING: Title underline too short.
hung_task_all_cpu_backtrace:
================

Documentation/admin-guide/sysctl/kernel.rst:650: WARNING: Title underline too short.
oops_all_cpu_backtrace:
================

Fixes: 0ec9dc9bcba0 ("kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected")
Fixes: 60c958d8df9c ("panic: add sysctl to dump all CPUs backtraces on oops event")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Reviewed-by: Mauro Carvalho Chehab &lt;mchehab+huawei@kernel.org&gt;
Link: https://lore.kernel.org/r/8af1cb77-4b5a-64b9-da5d-f6a95e537f99@infradead.org
Signed-off-by: Jonathan Corbet &lt;corbet@lwn.net&gt;
</content>
</entry>
</feed>
