<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/init/main.c, branch v5.12</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.12</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.12'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-02-26T17:41:05Z</updated>
<entry>
<title>kgdb: fix to kill breakpoints on initmem after boot</title>
<updated>2021-02-26T17:41:05Z</updated>
<author>
<name>Sumit Garg</name>
<email>sumit.garg@linaro.org</email>
</author>
<published>2021-02-26T01:22:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d54ce6158e354f5358a547b96299ecd7f3725393'/>
<id>urn:sha1:d54ce6158e354f5358a547b96299ecd7f3725393</id>
<content type='text'>
Currently breakpoints in kernel .init.text section are not handled
correctly while allowing to remove them even after corresponding pages
have been freed.

Fix it via killing .init.text section breakpoints just prior to initmem
pages being freed.

Doug: "HW breakpoints aren't handled by this patch but it's probably
not such a big deal".

Link: https://lkml.kernel.org/r/20210224081652.587785-1-sumit.garg@linaro.org
Signed-off-by: Sumit Garg &lt;sumit.garg@linaro.org&gt;
Suggested-by: Doug Anderson &lt;dianders@chromium.org&gt;
Acked-by: Doug Anderson &lt;dianders@chromium.org&gt;
Acked-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Tested-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Cc: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib: stackdepot: add support to disable stack depot</title>
<updated>2021-02-26T17:41:04Z</updated>
<author>
<name>Vijayanand Jitta</name>
<email>vjitta@codeaurora.org</email>
</author>
<published>2021-02-26T01:21:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e1fdc403349c64fa58f4c163f4bf9b860b4db808'/>
<id>urn:sha1:e1fdc403349c64fa58f4c163f4bf9b860b4db808</id>
<content type='text'>
Add a kernel parameter stack_depot_disable to disable stack depot.  So
that stack hash table doesn't consume any memory when stack depot is
disabled.

The use case is CONFIG_PAGE_OWNER without page_owner=on.  Without this
patch, stackdepot will consume the memory for the hashtable.  By default,
it's 8M which is never trivial.

With this option, in CONFIG_PAGE_OWNER configured system, page_owner=off,
stack_depot_disable in kernel command line, we could save the wasted
memory for the hashtable.

[akpm@linux-foundation.org: fix CONFIG_STACKDEPOT=n build]

Link: https://lkml.kernel.org/r/1611749198-24316-2-git-send-email-vjitta@codeaurora.org
Signed-off-by: Vinayak Menon &lt;vinmenon@codeaurora.org&gt;
Signed-off-by: Vijayanand Jitta &lt;vjitta@codeaurora.org&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Yogesh Lal &lt;ylal@codeaurora.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add Kernel Electric-Fence infrastructure</title>
<updated>2021-02-26T17:41:02Z</updated>
<author>
<name>Alexander Potapenko</name>
<email>glider@google.com</email>
</author>
<published>2021-02-26T01:18:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0ce20dd840897b12ae70869c69f1ba34d6d16965'/>
<id>urn:sha1:0ce20dd840897b12ae70869c69f1ba34d6d16965</id>
<content type='text'>
Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.  This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).

We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) and production server-workload benchmarks that a kernel with
KFENCE (using sample intervals 100-500ms) is performance-neutral
compared to a non-KFENCE baseline kernel.

KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].

For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:

	https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence

This patch (of 9):

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error. To detect out-of-bounds
writes to memory within the object's page itself, KFENCE also uses
pattern-based redzones. The following figure illustrates the page
layout:

  ---+-----------+-----------+-----------+-----------+-----------+---
     | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
     | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
     | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
     | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
     | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
     | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
  ---+-----------+-----------+-----------+-----------+-----------+---

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval, a
guarded allocation from the KFENCE object pool is returned to the main
allocator (SLAB or SLUB). At this point, the timer is reset, and the
next allocation is set up after the expiration of the interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. To date, we have verified by running synthetic
benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
is performance-neutral compared to the non-KFENCE baseline.

For more details, see Documentation/dev-tools/kfence.rst (added later in
the series).

[elver@google.com: fix parameter description for kfence_object_start()]
  Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com
[elver@google.com: avoid stalling work queue task without allocations]
  Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
  Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
[elver@google.com: fix potential deadlock due to wake_up()]
  Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
  Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
[elver@google.com: add option to use KFENCE without static keys]
  Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
[elver@google.com: add missing copyright and description headers]
  Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com

Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Alexander Potapenko &lt;glider@google.com&gt;
Reviewed-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reviewed-by: SeongJae Park &lt;sjpark@amazon.de&gt;
Co-developed-by: Marco Elver &lt;elver@google.com&gt;
Reviewed-by: Jann Horn &lt;jannh@google.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Cc: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Cc: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Christopher Lameter &lt;cl@linux.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Joern Engel &lt;joern@purestorage.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sfi: Remove framework for deprecated firmware</title>
<updated>2021-02-15T19:09:46Z</updated>
<author>
<name>Andy Shevchenko</name>
<email>andriy.shevchenko@linux.intel.com</email>
</author>
<published>2021-02-11T13:40:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4590d98f5a4f466d17e5c81d7c9fc796da9a8cee'/>
<id>urn:sha1:4590d98f5a4f466d17e5c81d7c9fc796da9a8cee</id>
<content type='text'>
SFI-based platforms are gone. So does this framework.

This removes mention of SFI through the drivers and other code as well.

Signed-off-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Reviewed-by: Hans de Goede &lt;hdegoede@redhat.com&gt;
Acked-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov</title>
<updated>2021-02-05T19:03:47Z</updated>
<author>
<name>Johannes Berg</name>
<email>johannes.berg@intel.com</email>
</author>
<published>2021-02-05T02:32:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=55b6f763d8bcb5546997933105d66d3e6b080e6a'/>
<id>urn:sha1:55b6f763d8bcb5546997933105d66d3e6b080e6a</id>
<content type='text'>
On ARCH=um, loading a module doesn't result in its constructors getting
called, which breaks module gcov since the debugfs files are never
registered.  On the other hand, in-kernel constructors have already been
called by the dynamic linker, so we can't call them again.

Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be
selected, but avoiding the in-kernel constructor calls.

Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we
really do want CONSTRUCTORS, just not kernel binary ones.

Link: https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeid
Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Reviewed-by: Peter Oberparleiter &lt;oberpar@linux.ibm.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Jessica Yu &lt;jeyu@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "init/console: Use ttynull as a fallback when there is no console"</title>
<updated>2021-01-08T19:02:18Z</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2021-01-08T11:48:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a91bd6223ecd46addc71ee6fcd432206d39365d2'/>
<id>urn:sha1:a91bd6223ecd46addc71ee6fcd432206d39365d2</id>
<content type='text'>
This reverts commit 757055ae8dedf5333af17b3b5b4b70ba9bc9da4e.

The commit caused that ttynull was used as the default console
on several systems[1][2][3]. As a result, the console was
blank even when a better alternative existed.

It happened when there was no console configured
on the command line and ttynull_init() was the first initcall
calling register_console().

Or it happened when /dev/ did not exist when console_on_rootfs()
was called. It was not able to open /dev/console even though
a console driver was registered. It tried to add ttynull console
but it obviously did not help. But ttynull became the preferred
console and was used by /dev/console when it was available later.

The commit tried to fix a historical problem that have been there
for ages. The primary motivation was the commit 3cffa06aeef7ece30f6
("printk/console: Allow to disable console output by using console=""
 or console=null"). It provided a clean solution for a workaround
 that was widely used and worked only by chance.

This revert causes that the console="" or console=null command line
options will again work only by chance. These options will cause that
a particular console will be preferred and the default (tty) ones
will not get enabled. There will be no console registered at
all. As a result there won't be stdin, stdout, and stderr for
the init process. But it worked exactly this way even before.

The proper solution has to fulfill many conditions:

  + Register ttynull only when explicitly required or as
    the ultimate fallback.

  + ttynull should get associated with /dev/console but it must
    not become preferred console when used as a fallback.
    Especially, it must still be possible to replace it
    by a better console later.

Such a change requires clean up of the register_console() code.
Otherwise, it would be even harder to follow. Especially, the use
of has_preferred_console and CON_CONSDEV flag is tricky. The clean
up is risky. The ordering of consoles is not well defined. And
any changes tend to break existing user settings.

Do the revert at the least risky solution for now.

[1] https://lore.kernel.org/linux-kselftest/20201221144302.GR4077@smile.fi.intel.com/
[2] https://lore.kernel.org/lkml/d2a3b3c0-e548-7dd1-730f-59bc5c04e191@synopsys.com/
[3] https://patchwork.ozlabs.org/project/linux-um/patch/20210105120128.10854-1-thomas@m3y3r.de/

Reported-by: Andy Shevchenko &lt;andriy.shevchenko@linux.intel.com&gt;
Reported-by: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Reported-by: Thomas Meyer &lt;thomas@m3y3r.de&gt;
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu</title>
<updated>2021-01-04T18:55:19Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-01-04T18:55:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=36bbbd0e234d817938bdc52121a0f5473b3e58f5'/>
<id>urn:sha1:36bbbd0e234d817938bdc52121a0f5473b3e58f5</id>
<content type='text'>
Pull RCU fix from Paul McKenney:
 "This is a fix for a regression in the v5.10 merge window, but it was
  reported quite late in the v5.10 process, plus generating and testing
  the fix took some time.

  The regression is due to commit 36dadef23fcc ("kprobes: Init kprobes
  in early_initcall") which on powerpc can use RCU Tasks before
  initialization, resulting in boot failures.

  The fix is straightforward, simply moving initialization of RCU Tasks
  before the early_initcall()s. The fix has been exposed to -next and
  kbuild test robot testing, and has been tested by the PowerPC guys"

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu-tasks: Move RCU-tasks initialization to before early_initcall()
</content>
</entry>
<entry>
<title>Merge tag 'printk-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux</title>
<updated>2020-12-16T18:45:11Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-12-16T18:45:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d3eb52113d162cc88975fbd03c9e6f9cf2f8a771'/>
<id>urn:sha1:d3eb52113d162cc88975fbd03c9e6f9cf2f8a771</id>
<content type='text'>
Pull printk updates from Petr Mladek:

 - Finally allow parallel writes and reads into/from the lockless
   ringbuffer. But it is not a complete solution. Readers are still
   serialized against each other. And nested writes are still prevented
   by printk_safe per-CPU buffers.

 - Use ttynull as the ultimate fallback for /dev/console.

 - Officially allow disabling console output by using console="" or
   console=null

 - A few code cleanups

* tag 'printk-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: remove logbuf_lock writer-protection of ringbuffer
  printk: inline log_output(),log_store() in vprintk_store()
  printk: remove obsolete dead assignment
  printk/console: Allow to disable console output by using console="" or console=null
  init/console: Use ttynull as a fallback when there is no console
  printk: ringbuffer: Reference text_data_ring directly in callees.
</content>
</entry>
<entry>
<title>mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters</title>
<updated>2020-12-15T20:13:46Z</updated>
<author>
<name>Vlastimil Babka</name>
<email>vbabka@suse.cz</email>
</author>
<published>2020-12-15T03:13:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04013513cc84c401c7de9023ff3eda7863fc4add'/>
<id>urn:sha1:04013513cc84c401c7de9023ff3eda7863fc4add</id>
<content type='text'>
Patch series "cleanup page poisoning", v3.

I have identified a number of issues and opportunities for cleanup with
CONFIG_PAGE_POISON and friends:

 - interaction with init_on_alloc and init_on_free parameters depends on
   the order of parameters (Patch 1)

 - the boot time enabling uses static key, but inefficienty (Patch 2)

 - sanity checking is incompatible with hibernation (Patch 3)

 - CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have
   init_on_free (Patch 4)

 - CONFIG_PAGE_POISONING_ZERO can be most likely removed now that we
   have init_on_free (Patch 5)

This patch (of 5):

Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1
produces a warning in dmesg that page_poison takes precedence.  However,
as these warnings are printed in early_param handlers for
init_on_alloc/free, they are not printed if page_poison is enabled later
on the command line (handlers are called in the order of their
parameters), or when init_on_alloc/free is always enabled by the
respective config option - before the page_poison early param handler is
called, it is not considered to be enabled.  This is inconsistent.

We can remove the dependency on order by making the init_on_* parameters
only set a boolean variable, and postponing the evaluation after all early
params have been processed.  Introduce a new
init_mem_debugging_and_hardening() function for that, and move the related
debug_pagealloc processing there as well.

As a result init_mem_debugging_and_hardening() knows always accurately if
init_on_* and/or page_poison options were enabled.  Thus we can also
optimize want_init_on_alloc() and want_init_on_free().  We don't need to
check page_poisoning_enabled() there, we can instead not enable the
init_on_* static keys at all, if page poisoning is enabled.  This results
in a simpler and more effective code.

Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz
Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Mateusz Nosek &lt;mateusznosek0@gmail.com&gt;
Cc: Laura Abbott &lt;labbott@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>init/main: fix broken buffer_init when DEFERRED_STRUCT_PAGE_INIT set</title>
<updated>2020-12-15T20:13:44Z</updated>
<author>
<name>Lin Feng</name>
<email>linf@wangsu.com</email>
</author>
<published>2020-12-15T03:11:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ba8f3587f55667c688acd7c5103c870983e294dd'/>
<id>urn:sha1:ba8f3587f55667c688acd7c5103c870983e294dd</id>
<content type='text'>
In the booting phase if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
we have following callchain:

start_kernel
...
  mm_init
    mem_init
     memblock_free_all
       reset_all_zones_managed_pages
       free_low_memory_core_early
...
  buffer_init
    nr_free_buffer_pages
      zone-&gt;managed_pages
...
  rest_init
    kernel_init
      kernel_init_freeable
        page_alloc_init_late
          kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
          wait_for_completion(&amp;pgdat_init_all_done_comp);
          ...
          files_maxfiles_init

It's clear that buffer_init depends on zone-&gt;managed_pages, but it's reset
in reset_all_zones_managed_pages after that pages are readded into
zone-&gt;managed_pages, but when buffer_init runs this process is half done
and most of them will finally be added till deferred_init_memmap done.  In
large memory couting of nr_free_buffer_pages drifts too much, also
drifting from kernels to kernels on same hardware.

Fix is simple, it delays buffer_init run till deferred_init_memmap all
done.

But as corrected by this patch, max_buffer_heads becomes very large, the
value is roughly as many as 4 times of totalram_pages, formula:
max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct
buffer_head));

Say in a 64GB memory box we have 16777216 pages, then max_buffer_heads
turns out to be roughly 67,108,864.  In common cases, should a buffer_head
be mapped to one page/block(4KB)?  So max_buffer_heads never exceeds
totalram_pages.  IMO it's likely to make buffer_heads_over_limit bool
value alwasy false, then make codes 'if (buffer_heads_over_limit)' test in
vmscan unnecessary.

So this patch will change the original behavior related to
buffer_heads_over_limit in vmscan since we used a half done value of
zone-&gt;managed_pages before, or should we use a smaller factor(&lt;10%) in
previous formula.

akpm: I think this is OK - the max_buffer_heads code is only needed on
highmem machines, to prevent ZONE_NORMAL from being consumed by large
amounts of buffer_heads attached to highmem pagecache.  This problem will
not occur on 64-bit machines, so this feature's non-functionality on such
machines is a feature, not a bug.

Link: https://lkml.kernel.org/r/20201123110500.103523-1-linf@wangsu.com
Signed-off-by: Lin Feng &lt;linf@wangsu.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
