linux/mm/memory.c, branch v3.0

mm: __tlb_remove_page() check the correct batch

2011-07-09T04:14:43Z

__tlb_remove_page() switches to a new batch page, but still checks space in the old batch. This check always fails, and causes a forced tlb flush. Signed-off-by: Shaohua Li Acked-by: Peter Zijlstra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: move vmtruncate_range to truncate.c

2011-06-28T01:00:12Z

You would expect to find vmtruncate_range() next to vmtruncate() in mm/truncate.c: move it there. Signed-off-by: Hugh Dickins Acked-by: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: fix wrong kunmap_atomic() pointer

2011-06-16T03:04:00Z

Running a ktest.pl test, I hit the following bug on x86_32: ------------[ cut here ]------------ WARNING: at arch/x86/mm/highmem_32.c:81 __kunmap_atomic+0x64/0xc1() Hardware name: Modules linked in: Pid: 93, comm: sh Not tainted 2.6.39-test+ #1 Call Trace: [] warn_slowpath_common+0x7c/0x91 [] ? __kunmap_atomic+0x64/0xc1 [] ? __kunmap_atomic+0x64/0xc1^M [] warn_slowpath_null+0x22/0x24 [] __kunmap_atomic+0x64/0xc1 [] unmap_vmas+0x43a/0x4e0 [] exit_mmap+0x91/0xd2 [] mmput+0x43/0xad [] exit_mm+0x111/0x119 [] do_exit+0x1ff/0x5fa [] ? set_current_blocked+0x3c/0x40 [] ? sigprocmask+0x7e/0x8e [] do_group_exit+0x65/0x88 [] sys_exit_group+0x18/0x1c [] sysenter_do_call+0x12/0x38 ---[ end trace 8055f74ea3c0eb62 ]--- Running a ktest.pl git bisect, found the culprit: commit e303297e6c3a ("mm: extended batches for generic mmu_gather") But although this was the commit triggering the bug, it was not the one originally responsible for the bug. That was commit d16dfc550f53 ("mm: mmu_gather rework"). The code in zap_pte_range() has something that looks like the following: pte = pte_offset_map_lock(mm, pmd, addr, &ptl); do { [...] } while (pte++, addr += PAGE_SIZE, addr != end); pte_unmap_unlock(pte - 1, ptl); The pte starts off pointing at the first element in the page table directory that was returned by the pte_offset_map_lock(). When it's done with the page, pte will be pointing to anything between the next entry and the first entry of the next page inclusive. By doing a pte - 1, this puts the pte back onto the original page, which is all that pte_unmap_unlock() needs. In most archs (64 bit), this is not an issue as the pte is ignored in the pte_unmap_unlock(). But on 32 bit archs, where things may be kmapped, it is essential that the pte passed to pte_unmap_unlock() resides on the same page that was given by pte_offest_map_lock(). The problem came in d16dfc55 ("mm: mmu_gather rework") where it introduced a "break;" from the while loop. This alone did not seem to easily trigger the bug. But the modifications made by e303297e6 caused that "break;" to be hit on the first iteration, before the pte++. The pte not being incremented will now cause pte_unmap_unlock(pte - 1) to be pointing to the previous page. This will cause the wrong page to be unmapped, and also trigger the warning above. The simple solution is to just save the pointer given by pte_offset_map_lock() and use it in the unlock. Signed-off-by: Steven Rostedt Cc: Peter Zijlstra Cc: KAMEZAWA Hiroyuki Acked-by: Hugh Dickins Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm/memory.c: fix kernel-doc notation

2011-06-16T03:03:59Z

Fix new kernel-doc warnings in mm/memory.c: Warning(mm/memory.c:1327): No description found for parameter 'tlb' Warning(mm/memory.c:1327): Excess function parameter 'tlbp' description in 'unmap_vmas' Signed-off-by: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memcg: add the pagefault count into memcg stats

2011-05-27T00:12:36Z

Two new stats in per-memcg memory.stat which tracks the number of page faults and number of major page faults. "pgfault" "pgmajfault" They are different from "pgpgin"/"pgpgout" stat which count number of pages charged/discharged to the cgroup and have no meaning of reading/ writing page to disk. It is valuable to track the two stats for both measuring application's performance as well as the efficiency of the kernel page reclaim path. Counting pagefaults per process is useful, but we also need the aggregated value since processes are monitored and controlled in cgroup basis in memcg. Functional test: check the total number of pgfault/pgmajfault of all memcgs and compare with global vmstat value: $ cat /proc/vmstat | grep fault pgfault 1070751 pgmajfault 553 $ cat /dev/cgroup/memory.stat | grep fault pgfault 1071138 pgmajfault 553 total_pgfault 1071142 total_pgmajfault 553 $ cat /dev/cgroup/A/memory.stat | grep fault pgfault 199 pgmajfault 0 total_pgfault 199 total_pgmajfault 0 Performance test: run page fault test(pft) wit 16 thread on faulting in 15G anon pages in 16G container. There is no regression noticed on the "flt/cpu/s" Sample output from pft: TAG pft:anon-sys-default: Gb Thr CLine User System Wall flt/cpu/s fault/wsec 15 16 1 0.67s 233.41s 14.76s 16798.546 266356.260 +-------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 16682.962 17344.027 16913.524 16928.812 166.5362 + 10 16695.568 16923.896 16820.604 16824.652 84.816568 No difference proven at 95.0% confidence [akpm@linux-foundation.org: fix build] [hughd@google.com: shmem fix] Signed-off-by: Ying Han Acked-by: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Reviewed-by: Minchan Kim Cc: Daisuke Nishimura Acked-by: Balbir Singh Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: don't access vm_flags as 'int'

2011-05-26T16:20:31Z

The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor 'unsigned int'. This patch fixes such misuse. Signed-off-by: KOSAKI Motohiro [ Changed to use a typedef - we'll extend it to cover more cases later, since there has been discussion about making it a 64-bit type.. - Linus ] Signed-off-by: Linus Torvalds

mm: uninline large generic tlb.h functions

2011-05-25T15:39:20Z

Some of these functions have grown beyond inline sanity, move them out-of-line. Signed-off-by: Peter Zijlstra Requested-by: Andrew Morton Requested-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: Convert i_mmap_lock to a mutex

2011-05-25T15:39:18Z

Straightforward conversion of i_mmap_lock to a mutex. Signed-off-by: Peter Zijlstra Acked-by: Hugh Dickins Cc: Benjamin Herrenschmidt Cc: David Miller Cc: Martin Schwidefsky Cc: Russell King Cc: Paul Mundt Cc: Jeff Dike Cc: Richard Weinberger Cc: Tony Luck Cc: KAMEZAWA Hiroyuki Cc: Mel Gorman Cc: KOSAKI Motohiro Cc: Nick Piggin Cc: Namhyung Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: Remove i_mmap_lock lockbreak

2011-05-25T15:39:17Z

Hugh says: "The only significant loser, I think, would be page reclaim (when concurrent with truncation): could spin for a long time waiting for the i_mmap_mutex it expects would soon be dropped? " Counter points: - cpu contention makes the spin stop (need_resched()) - zap pages should be freeing pages at a higher rate than reclaim ever can I think the simplification of the truncate code is definitely worth it. Effectively reverts: 2aa15890f3c ("mm: prevent concurrent unmap_mapping_range() on the same inode") and takes out the code that caused its problem. Signed-off-by: Peter Zijlstra Reviewed-by: KAMEZAWA Hiroyuki Cc: Hugh Dickins Cc: Benjamin Herrenschmidt Cc: David Miller Cc: Martin Schwidefsky Cc: Russell King Cc: Paul Mundt Cc: Jeff Dike Cc: Richard Weinberger Cc: Tony Luck Cc: Mel Gorman Cc: KOSAKI Motohiro Cc: Nick Piggin Cc: Namhyung Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: extended batches for generic mmu_gather

2011-05-25T15:39:16Z

Instead of using a single batch (the small on-stack, or an allocated page), try and extend the batch every time it runs out and only flush once either the extend fails or we're done. Signed-off-by: Peter Zijlstra Requested-by: Nick Piggin Reviewed-by: KAMEZAWA Hiroyuki Acked-by: Hugh Dickins Cc: Benjamin Herrenschmidt Cc: David Miller Cc: Martin Schwidefsky Cc: Russell King Cc: Paul Mundt Cc: Jeff Dike Cc: Richard Weinberger Cc: Tony Luck Cc: Mel Gorman Cc: KOSAKI Motohiro Cc: Nick Piggin Cc: Namhyung Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds