<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/sparse.c, branch v5.19</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.19</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.19'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2022-05-13T14:20:19Z</updated>
<entry>
<title>mm/memory-failure.c: move clear_hwpoisoned_pages</title>
<updated>2022-05-13T14:20:19Z</updated>
<author>
<name>zhenwei pi</name>
<email>pizhenwei@bytedance.com</email>
</author>
<published>2022-05-13T03:23:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=60f272f6b09a8f14156df88cccd21447ab394452'/>
<id>urn:sha1:60f272f6b09a8f14156df88cccd21447ab394452</id>
<content type='text'>
Patch series "memory-failure: fix hwpoison_filter", v2.

As is well known, the memory failure mechanism handles memory corruption
events and tries to send SIGBUS to the user process which uses the
corrupted page.

In the virtualization case, QEMU catches the SIGBUS and tries to inject an
MCE into the guest, and the guest handles the memory failure again.  Thus
the guest sees only a minimal effect from the hardware memory corruption.

The further steps I'm working on:

1, try to modify the code to decrease the poisoned page count in a single
   place (mm/memory-failure.c: simplify num_poisoned_pages_dec in this
   series).

2, try to use page_handle_poison() to handle SetPageHWPoison() and
   num_poisoned_pages_inc() together.  It would be best to call
   num_poisoned_pages_inc() in a single place too.

3, introduce a memory failure notifier list in memory-failure.c: notify
   the corrupted PFN to anyone who registers on this list.  Once the [1]
   and [2] parts are complete, [3] will be quite easy (just call the
   notifier list after increasing the poisoned page count).

4, introduce a memory recover VQ for the memory balloon device, and
   register it on the memory failure notifier list.  While the guest
   kernel handles a memory failure, the balloon device gets notified by
   the memory failure notifier list and tells the host to recover the
   corrupted PFN (GPA) via the new VQ.

5, the host side remaps the corrupted page (HVA) and tells the guest side
   to unpoison the PFN (GPA).  The guest then fixes the corrupted page
   (GPA) dynamically.


This patch (of 5):

clear_hwpoisoned_pages() clears the HWPoison flag and decreases the number
of poisoned pages; this actually works as part of memory failure handling.

Move this function from sparse.c to memory-failure.c; after this, there is
no CONFIG_MEMORY_FAILURE dependency left in sparse.c.
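
For reference, a sketch of the function being moved (modulo comments, this
is how it sits in mm/sparse.c before the move; see the patch itself for
the authoritative version):

    #ifdef CONFIG_MEMORY_FAILURE
    static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
    {
            int i;

            /* Quick global check to skip the loop when no pages are bad. */
            if (atomic_long_read(&amp;num_poisoned_pages) == 0)
                    return;

            for (i = 0; i &lt; nr_pages; i++) {
                    if (PageHWPoison(&amp;memmap[i])) {
                            num_poisoned_pages_dec();
                            ClearPageHWPoison(&amp;memmap[i]);
                    }
            }
    }
    #endif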

Link: https://lkml.kernel.org/r/20220509105641.491313-1-pizhenwei@bytedance.com
Link: https://lkml.kernel.org/r/20220509105641.491313-2-pizhenwei@bytedance.com
Signed-off-by: zhenwei pi &lt;pizhenwei@bytedance.com&gt;
Acked-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/sparse-vmemmap: add a pgmap argument to section activation</title>
<updated>2022-04-29T06:16:15Z</updated>
<author>
<name>Joao Martins</name>
<email>joao.m.martins@oracle.com</email>
</author>
<published>2022-04-29T06:16:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e3246d8f52173a798710314a42fea83223036fc8'/>
<id>urn:sha1:e3246d8f52173a798710314a42fea83223036fc8</id>
<content type='text'>
Patch series "sparse-vmemmap: memory savings for compound devmaps (device-dax)", v9.

This series minimizes 'struct page' overhead by pursuing an approach
similar to Muchun Song's series "Free some vmemmap pages of hugetlb page"
(merged since v5.14), but applied to devmaps with a @vmemmap_shift
(device-dax).

The original vmemmap deduplication idea (already used in HugeTLB) is to
reuse/deduplicate tail-page vmemmap areas, in particular the areas which
only describe tail pages.  A vmemmap page describes 64 struct pages, so
the first vmemmap page for a given ZONE_DEVICE compound page contains the
head page and 63 tail pages.  The second vmemmap page contains only tail
pages, and that's what gets reused across the rest of the
subsection/section.  The bigger the page size, the bigger the savings (2M
hpage -&gt; save 6 vmemmap pages; 1G hpage -&gt; save 4094 vmemmap pages).

This is done for PMEM /specifically only/ on device-dax configured
namespaces, not fsdax.  In other words, a devmap with a @vmemmap_shift.

In terms of savings, per 1Tb of memory, the struct page cost would go down
with compound devmap:

* with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of
  total memory)

* with 1G pages we lose 40MB instead of 16G (0.0014% instead of 1.5% of
  total memory)

The series is mostly summed up by patch 4.  To summarize what the series
does:

Patches 1 - 3: Minor cleanups in preparation for patch 4.  Move the very
nice docs of hugetlb_vmemmap.c into a Documentation/vm/ entry.

Patch 4: Patch 4 is the one that takes care of the struct page savings
(also referred to here as tail-page/vmemmap deduplication).  Much like
Muchun's series, we reuse the second PTE tail page vmemmap areas across a
given @vmemmap_shift.  One important difference, though, is that contrary
to the hugetlbfs series, there's no pre-existing vmemmap for the area,
because we are late-populating it as opposed to remapping a system-ram
range.  IOW there is no freeing of pages of already initialized vmemmap as
in the hugetlbfs case, which greatly simplifies the logic (besides not
being arch-specific).  The altmap case is unchanged and still goes via
vmemmap_populate().  Also adjust the newly added docs to the device-dax
case.

[Note that device-dax is still a little behind HugeTLB in terms of
savings.  I have an additional simple patch that reuses the head vmemmap
page too, as a follow-up.  That will double the savings and further speed
up namespace initialization.]

Patch 5: Initialize fewer struct pages depending on the page size, for
DRAM-backed struct pages -- because fewer pages are unique and most are
tail pages (with a bigger vmemmap_shift).

    NVDIMM namespace bootstrap improves from ~268-358 ms to
    ~80-110 ms / &lt;1 ms on 128G NVDIMMs with 2M and 1G page sizes
    respectively.  And the needed struct page capacity will be 3.8x /
    1071x smaller for 2M and 1G respectively.  Tested on x86 with 1.5Tb
    of pmem (including pinning, and RDMA registration/deregistration
    scalability with 2M MRs).


This patch (of 5):

In support of using compound pages for devmap mappings, plumb the pgmap
down to the vmemmap_populate implementation.  Note that while altmap is
retrievable from pgmap, the memory hotplug code passes altmap without
pgmap [*], so both need to be independently plumbed.

So in addition to @altmap, pass @pgmap to the sparse section populate
functions, namely:

	sparse_add_section
	  section_activate
	    populate_section_memmap
	      __populate_section_memmap

Passing @pgmap allows __populate_section_memmap() to fetch the
vmemmap_shift with which the memmap metadata is created, and also lets
sparse-vmemmap fetch the pgmap ranges to correlate with a given section
and decide whether to just reuse tail pages from previously onlined
sections.
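
For illustration, the plumbed prototypes end up along these lines (a
sketch; see the patch itself for the exact signatures):

    int __meminit sparse_add_section(int nid, unsigned long start_pfn,
                    unsigned long nr_pages, struct vmem_altmap *altmap,
                    struct dev_pagemap *pgmap);

    struct page * __populate_section_memmap(unsigned long pfn,
                    unsigned long nr_pages, int nid,
                    struct vmem_altmap *altmap, struct dev_pagemap *pgmap);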

While at it, fix the kdoc for @altmap for sparse_add_section().

[*] https://lore.kernel.org/linux-mm/20210319092635.6214-1-osalvador@suse.de/

Link: https://lkml.kernel.org/r/20220420155310.9712-1-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/20220420155310.9712-2-joao.m.martins@oracle.com
Signed-off-by: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/sparse: make mminit_validate_memmodel_limits() static</title>
<updated>2022-03-22T22:57:05Z</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2022-03-22T21:42:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c7878534a1b61c5cc2effa3a539099f2cf87cd3a'/>
<id>urn:sha1:c7878534a1b61c5cc2effa3a539099f2cf87cd3a</id>
<content type='text'>
It's only used in sparse.c now, so we can make it static and further clean
up the relevant code.
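
The change itself is essentially the following (illustrative diff; the
declaration is also dropped from mm/internal.h):

    -void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
    -                                               unsigned long *end_pfn)
    +static void __meminit mminit_validate_memmodel_limits(unsigned long *start_pfn,
    +                                                      unsigned long *end_pfn)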

Link: https://lkml.kernel.org/r/20220127093221.63524-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reviewed-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bootmem: Use page-&gt;index instead of page-&gt;freelist</title>
<updated>2022-01-06T11:27:03Z</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2021-10-04T13:46:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c5e97ed154589524a1df4ae2be55c4cfdb0d0573'/>
<id>urn:sha1:c5e97ed154589524a1df4ae2be55c4cfdb0d0573</id>
<content type='text'>
page-&gt;freelist is for the use of slab.  page-&gt;index covers the same
set of bits as page-&gt;freelist, and by using an integer instead of a
pointer we can avoid casts.
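
An illustrative before/after for this kind of conversion (a sketch of the
bootmem info hunks, not the verbatim patch):

    -       page-&gt;freelist = (void *)type;
    +       page-&gt;index = type;

    -       type = (unsigned long)page-&gt;freelist;
    +       type = page-&gt;index;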

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: &lt;x86@kernel.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
</content>
</entry>
<entry>
<title>memblock: use memblock_free for freeing virtual pointers</title>
<updated>2021-11-06T20:30:41Z</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.ibm.com</email>
</author>
<published>2021-11-05T20:43:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4421cca0a3e4833b3bf0f20de98eb580ab8c7290'/>
<id>urn:sha1:4421cca0a3e4833b3bf0f20de98eb580ab8c7290</id>
<content type='text'>
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer, so that memblock_free() will be a
counterpart of memblock_alloc().

The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.

    @@
    identifier vaddr;
    expression size;
    @@
    (
    - memblock_phys_free(__pa(vaddr), size);
    + memblock_free(vaddr, size);
    |
    - memblock_free_ptr(vaddr, size);
    + memblock_free(vaddr, size);
    )
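
After the rename, allocation and free pair up naturally (a minimal usage
sketch, with size as a placeholder):

    void *buf = memblock_alloc(size, SMP_CACHE_BYTES);
    ...
    /* buf is a virtual pointer, the counterpart of memblock_alloc() */
    memblock_free(buf, size);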

[sfr@canb.auug.org.au: fixup]
  Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au

Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Shahab Vahedi &lt;Shahab.Vahedi@synopsys.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memblock: rename memblock_free to memblock_phys_free</title>
<updated>2021-11-06T20:30:41Z</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.ibm.com</email>
</author>
<published>2021-11-05T20:43:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3ecc68349bbab6bff1d12cbc7951ca6019b2faf6'/>
<id>urn:sha1:3ecc68349bbab6bff1d12cbc7951ca6019b2faf6</id>
<content type='text'>
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().

The callers are updated with the below semantic patch:

    @@
    expression addr;
    expression size;
    @@
    - memblock_free(addr, size);
    + memblock_phys_free(addr, size);
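
With the rename, the physical-range API pairs up as follows (a minimal
usage sketch, with size/align as placeholders):

    phys_addr_t pa = memblock_phys_alloc(size, align);
    ...
    /* pa is a physical address, not a virtual pointer */
    memblock_phys_free(pa, size);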

Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Shahab Vahedi &lt;Shahab.Vahedi@synopsys.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memblock: drop memblock_free_early_nid() and memblock_free_early()</title>
<updated>2021-11-06T20:30:41Z</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.ibm.com</email>
</author>
<published>2021-11-05T20:43:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa27717110ae51b9b9013ced0b5143888257bb79'/>
<id>urn:sha1:fa27717110ae51b9b9013ced0b5143888257bb79</id>
<content type='text'>
memblock_free_early_nid() is unused and memblock_free_early() is an
alias for memblock_free().

Replace calls to memblock_free_early() with calls to memblock_free() and
remove memblock_free_early() and memblock_free_early_nid().
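
A typical conversion looks like this (illustrative diff; at this point in
the series memblock_free() still takes a physical address):

    -       memblock_free_early(__pa(ptr), size);
    +       memblock_free(__pa(ptr), size);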

Link: https://lkml.kernel.org/r/20210930185031.18648-4-rppt@kernel.org
Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Shahab Vahedi &lt;Shahab.Vahedi@synopsys.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: introduce memmap_alloc() to unify memory map allocation</title>
<updated>2021-09-03T16:58:15Z</updated>
<author>
<name>Mike Rapoport</name>
<email>rppt@linux.ibm.com</email>
</author>
<published>2021-09-02T21:58:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c803b3c8b3b70f306ee6300bf8acdd70ffd1441a'/>
<id>urn:sha1:c803b3c8b3b70f306ee6300bf8acdd70ffd1441a</id>
<content type='text'>
There are several places that allocate memory for the memory map:
alloc_node_mem_map() for FLATMEM, sparse_buffer_init() and
__populate_section_memmap() for SPARSEMEM.

The memory allocated in the FLATMEM case is zeroed and it is never
poisoned, regardless of the CONFIG_PAGE_POISONING setting.

The memory allocated in the SPARSEMEM cases is not zeroed and it is
implicitly poisoned inside memblock if CONFIG_PAGE_POISONING is set.

Introduce a memmap_alloc() wrapper for the memblock allocators that will
be used for both the FLATMEM and SPARSEMEM cases, and will make memory map
zeroing and poisoning consistent across the different memory models.
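
A sketch of the wrapper (close to the actual implementation; see the
patch for the exact version):

    void *__init memmap_alloc(phys_addr_t size, phys_addr_t align,
                              phys_addr_t min_addr, int nid, bool exact_nid)
    {
            void *ptr;

            if (exact_nid)
                    ptr = memblock_alloc_exact_nid_raw(size, align, min_addr,
                                                       MEMBLOCK_ALLOC_ACCESSIBLE,
                                                       nid);
            else
                    ptr = memblock_alloc_try_nid_raw(size, align, min_addr,
                                                     MEMBLOCK_ALLOC_ACCESSIBLE,
                                                     nid);

            /* Zeroing/poisoning now happens here for all memory models. */
            if (ptr &amp;&amp; size &gt; 0)
                    page_init_poison(ptr, size);

            return ptr;
    }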

Link: https://lkml.kernel.org/r/20210714123739.16493-4-rppt@kernel.org
Signed-off-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Michal Simek &lt;monstr@monstr.eu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/sparse: clarify pgdat_to_phys</title>
<updated>2021-09-03T16:58:14Z</updated>
<author>
<name>Miles Chen</name>
<email>miles.chen@mediatek.com</email>
</author>
<published>2021-09-02T21:57:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bdbda735508ca83341899a77f143e4d5c58007b3'/>
<id>urn:sha1:bdbda735508ca83341899a77f143e4d5c58007b3</id>
<content type='text'>
Clarify pgdat_to_phys() by testing if
pgdat == &amp;contig_page_data when CONFIG_NUMA=n.

We only expect contig_page_data in that case, so we
use &amp;contig_page_data directly instead of pgdat.

No functional change intended when CONFIG_DEBUG_VM=n.

Comment from Mark [1]:
"
... and I reckon it'd be clearer and more robust to define
pgdat_to_phys() in the same ifdefs as contig_page_data so
that these, stay in-sync. e.g. have:

| #ifdef CONFIG_NUMA
| #define pgdat_to_phys(x)	virt_to_phys(x)
| #else /* CONFIG_NUMA */
|
| extern struct pglist_data contig_page_data;
| ...
| #define pgdat_to_phys(x)	__pa_symbol(&amp;contig_page_data)
|
| #endif /* CONFIG_NUMA */
"

[1] https://lore.kernel.org/linux-arm-kernel/20210615131902.GB47121@C02TD0UTHF1T.local/

Link: https://lkml.kernel.org/r/20210723123342.26406-1-miles.chen@mediatek.com
Signed-off-by: Miles Chen &lt;miles.chen@mediatek.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>include/linux/mmzone.h: avoid a warning in sparse memory support</title>
<updated>2021-09-03T16:58:14Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2021-09-02T21:57:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e0dbb2bccf19ce5e870afb420a3d0480c582bb7b'/>
<id>urn:sha1:e0dbb2bccf19ce5e870afb420a3d0480c582bb7b</id>
<content type='text'>
cppcheck warns that we're possibly losing information by shifting an int.
It's a false positive, because we don't allow for a NUMA node ID that
large, but if we ever change SECTION_NID_SHIFT, it could become a problem,
and in any case this is usually a legitimate warning.  Fix it by adding
the necessary cast, which makes the compiler generate the right code.
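
The fix is along these lines (illustrative; the shift happens when
encoding the early NUMA node ID for a mem section):

    -       return (nid &lt;&lt; SECTION_NID_SHIFT);
    +       return ((unsigned long)nid &lt;&lt; SECTION_NID_SHIFT);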

Link: https://lkml.kernel.org/r/YOya+aBZFFmC476e@casper.infradead.org
Link: https://lkml.kernel.org/r/202107130348.6LsVT9Nc-lkp@intel.com
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
