linux/mm/memory.c, branch v5.10

mm: allow a NULL fn callback in apply_to_page_range

2020-10-18T16:27:10Z

Besides calling the callback on each page, apply_to_page_range also has the effect of pre-faulting all PTEs for the range. To support callers that only need the pre-faulting, make the callback optional. Based on a patch from Minchan Kim . Signed-off-by: Christoph Hellwig Signed-off-by: Andrew Morton Cc: Boris Ostrovsky Cc: Chris Wilson Cc: Jani Nikula Cc: Joonas Lahtinen Cc: Juergen Gross Cc: Matthew Auld Cc: "Matthew Wilcox (Oracle)" Cc: Nitin Gupta Cc: Peter Zijlstra Cc: Rodrigo Vivi Cc: Stefano Stabellini Cc: Tvrtko Ursulin Cc: Uladzislau Rezki (Sony) Link: https://lkml.kernel.org/r/20201002122204.1534411-5-hch@lst.de Signed-off-by: Linus Torvalds

mm/memory: remove page fault assumption of compound page size

2020-10-16T18:11:15Z

A compound page in the page cache will not necessarily be of PMD size, so check explicitly. [willy@infradead.org: fix remove page fault assumption of compound page size] Link: https://lkml.kernel.org/r/20201001152259.14932-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton Cc: Huang Ying Cc: Kirill A. Shutemov Link: https://lkml.kernel.org/r/20200908195539.25896-3-willy@infradead.org Signed-off-by: Linus Torvalds

Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping

2020-10-15T21:43:29Z

Pull dma-mapping updates from Christoph Hellwig: - rework the non-coherent DMA allocator - move private definitions out of - lower CMA_ALIGNMENT (Paul Cercueil) - remove the omap1 dma address translation in favor of the common code - make dma-direct aware of multiple dma offset ranges (Jim Quinlan) - support per-node DMA CMA areas (Barry Song) - increase the default seg boundary limit (Nicolin Chen) - misc fixes (Robin Murphy, Thomas Tai, Xu Wang) - various cleanups * tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits) ARM/ixp4xx: add a missing include of dma-map-ops.h dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling dma-direct: factor out a dma_direct_alloc_from_pool helper dma-direct check for highmem pages in dma_direct_alloc_pages dma-mapping: merge into dma-mapping: move large parts of to kernel/dma dma-mapping: move dma-debug.h to kernel/dma/ dma-mapping: remove dma-mapping: merge into dma-contiguous: remove dma_contiguous_set_default dma-contiguous: remove dev_set_cma_area dma-contiguous: remove dma_declare_contiguous dma-mapping: split cma: decrease CMA_ALIGNMENT lower limit to 2 firewire-ohci: use dma_alloc_pages dma-iommu: implement ->alloc_noncoherent dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods dma-mapping: add a new dma_alloc_pages API dma-mapping: remove dma_cache_sync 53c700: convert to dma_alloc_noncoherent ...

mm: remove src/dst mm parameter in copy_page_range()

2020-10-14T01:38:32Z

Both of the mm pointers are not needed after commit 7a4830c380f3 ("mm/fork: Pass new vma pointer into copy_page_range()"). Jason Gunthorpe also reported that the ordering of copy_page_range() is odd. Since working at it, reorder the parameters to be logical, by (1) always put the dst_* fields to be before src_* fields, and (2) keep the same type of parameters together. [peterx@redhat.com: further reorder some parameters and line format, per Jason] Link: https://lkml.kernel.org/r/20201002192647.7161-1-peterx@redhat.com [peterx@redhat.com: fix warnings] Link: https://lkml.kernel.org/r/20201006200138.GA6026@xz-x1 Reported-by: Kirill A. Shutemov Signed-off-by: Peter Xu Signed-off-by: Andrew Morton Reviewed-by: Jason Gunthorpe Acked-by: Kirill A. Shutemov Link: https://lkml.kernel.org/r/20200930204950.6668-1-peterx@redhat.com Signed-off-by: Linus Torvalds

mm/memory.c: fix spello of "function"

2020-10-14T01:38:31Z

Fix typo/spello of "function". Signed-off-by: Randy Dunlap Signed-off-by: Andrew Morton Link: https://lkml.kernel.org/r/e7bf180e-c558-b1d5-9a15-6d9708823c9c@infradead.org Signed-off-by: Linus Torvalds

mm/memory.c: replace vmf->vma with variable vma

2020-10-14T01:38:31Z

The code has declared a vma_struct named vma which is assigned a value of vmf->vma. Thus, use variable vma directly here. Signed-off-by: Yanfei Xu Signed-off-by: Andrew Morton Reviewed-by: Matthew Wilcox (Oracle) Link: http://lkml.kernel.org/r/20200818084607.37616-1-yanfei.xu@windriver.com Signed-off-by: Linus Torvalds

mm/memory.c: fix typo in __do_fault() comment

2020-10-14T01:38:31Z

It's "pte_alloc_one", not "pte_alloc_pne". Let's fix that. Signed-off-by: Yanfei Xu Signed-off-by: Andrew Morton Reviewed-by: David Hildenbrand Link: http://lkml.kernel.org/r/20200818104339.5310-1-yanfei.xu@windriver.com Signed-off-by: Linus Torvalds

mm: avoid early COW write protect games during fork()

2020-10-08T17:11:32Z

In commit 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes") we write-protected the PTE before doing the page pinning check, in order to avoid a race with concurrent fast-GUP pinning (which doesn't take the mm semaphore or the page table lock). That trick doesn't actually work - it doesn't handle memory ordering properly, and doing so would be prohibitively expensive. It also isn't really needed. While we're moving in the direction of allowing and supporting page pinning without marking the pinned area with MADV_DONTFORK, the fact is that we've never really supported this kind of odd "concurrent fork() and page pinning", and doing the serialization on a pte level is just wrong. We can add serialization with a per-mm sequence counter, so we know how to solve that race properly, but we'll do that at a more appropriate time. Right now this just removes the write protect games. It also turns out that the write protect games actually break on Power, as reported by Aneesh Kumar: "Architecture like ppc64 expects set_pte_at to be not used for updating a valid pte. This is further explained in commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")" and the code triggered a warning there: WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185 Call Trace: copy_present_page mm/memory.c:857 [inline] copy_present_pte mm/memory.c:899 [inline] copy_pte_range mm/memory.c:1014 [inline] copy_pmd_range mm/memory.c:1092 [inline] copy_pud_range mm/memory.c:1127 [inline] copy_p4d_range mm/memory.c:1150 [inline] copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212 dup_mmap kernel/fork.c:592 [inline] dup_mm+0x77c/0xab0 kernel/fork.c:1355 copy_mm kernel/fork.c:1411 [inline] copy_process+0x1f00/0x2740 kernel/fork.c:2070 _do_fork+0xc4/0x10b0 kernel/fork.c:2429 Link: https://lore.kernel.org/lkml/CAHk-=wiWr+gO0Ro4LvnJBMs90OiePNyrE3E+pJvc9PzdBShdmw@mail.gmail.com/ Link: https://lore.kernel.org/linuxppc-dev/20201008092541.398079-1-aneesh.kumar@linux.ibm.com/ Reported-by: Aneesh Kumar K.V Tested-by: Leon Romanovsky Cc: Peter Xu Cc: Jason Gunthorpe Cc: John Hubbard Cc: Andrew Morton Cc: Jan Kara Cc: Michal Hocko Cc: Kirill Shutemov Cc: Hugh Dickins Signed-off-by: Linus Torvalds

dma-mapping: move dma-debug.h to kernel/dma/

2020-10-06T05:07:05Z

Most of dma-debug.h is not required by anything outside of kernel/dma. Move the four declarations needed by dma-mappin.h or dma-ops providers into dma-mapping.h and dma-map-ops.h, and move the remainder of the file to kernel/dma/debug.h. Signed-off-by: Christoph Hellwig

mm: Do early cow for pinned pages during fork() for ptes

2020-09-27T18:21:35Z

This allows copy_pte_range() to do early cow if the pages were pinned on the source mm. Currently we don't have an accurate way to know whether a page is pinned or not. The only thing we have is page_maybe_dma_pinned(). However that's good enough for now. Especially, with the newly added mm->has_pinned flag to make sure we won't affect processes that never pinned any pages. It would be easier if we can do GFP_KERNEL allocation within copy_one_pte(). Unluckily, we can't because we're with the page table locks held for both the parent and child processes. So the page allocation needs to be done outside copy_one_pte(). Some trick is there in copy_present_pte(), majorly the wrprotect trick to block concurrent fast-gup. Comments in the function should explain better in place. Oleg Nesterov reported a (probably harmless) bug during review that we didn't reset entry.val properly in copy_pte_range() so that potentially there's chance to call add_swap_count_continuation() multiple times on the same swp entry. However that should be harmless since even if it happens, the same function (add_swap_count_continuation()) will return directly noticing that there're enough space for the swp counter. So instead of a standalone stable patch, it is touched up in this patch directly. Link: https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/ Suggested-by: Linus Torvalds Signed-off-by: Peter Xu Signed-off-by: Linus Torvalds