<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm/filemap.c, branch v5.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-01-04T21:13:48Z</updated>
<entry>
<title>mm/: remove caller signal_pending branch predictions</title>
<updated>2019-01-04T21:13:48Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2019-01-03T23:28:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa45f1162f28cbba6c38180647b7b300f317ecb4'/>
<id>urn:sha1:fa45f1162f28cbba6c38180647b7b300f317ecb4</id>
<content type='text'>
This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-5-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, fault_around: do not take a reference to a locked page</title>
<updated>2018-12-28T20:11:51Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2018-12-28T08:38:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e0975b2aae0e669f995f7d5f11db25c3080ae11c'/>
<id>urn:sha1:e0975b2aae0e669f995f7d5f11db25c3080ae11c</id>
<content type='text'>
filemap_map_pages takes a speculative reference to each page in the range
before it tries to lock that page.  While this is correct it also can
influence page migration which will bail out when seeing an elevated
reference count.  The faultaround code would bail on seeing a locked page
so we can pro-actively check the PageLocked bit before
page_cache_get_speculative and prevent from pointless reference count
churn.

Link: http://lkml.kernel.org/r/20181211142741.2607-4-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/filemap.c: remove useless check in pagecache_get_page()</title>
<updated>2018-12-28T20:11:50Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2018-12-28T08:37:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c16eb000ca03f8cdc2153bca6c977b6f898e6934'/>
<id>urn:sha1:c16eb000ca03f8cdc2153bca6c977b6f898e6934</id>
<content type='text'>
page always is not NULL, so we may remove this useless check.

Link: http://lkml.kernel.org/r/154419752044.18559.2452963074922917720.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Acked-by: Cyrill Gorcunov &lt;gorcunov@gmail.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: put_and_wait_on_page_locked() while page is migrated</title>
<updated>2018-12-28T20:11:48Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2018-12-28T08:36:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9a1ea439b16b92002e0a6fceebc5d1794906e297'/>
<id>urn:sha1:9a1ea439b16b92002e0a6fceebc5d1794906e297</id>
<content type='text'>
Waiting on a page migration entry has used wait_on_page_locked() all along
since 2006: but you cannot safely wait_on_page_locked() without holding a
reference to the page, and that extra reference is enough to make
migrate_page_move_mapping() fail with -EAGAIN, when a racing task faults
on the entry before migrate_page_move_mapping() gets there.

And that failure is retried nine times, amplifying the pain when trying to
migrate a popular page.  With a single persistent faulter, migration
sometimes succeeds; with two or three concurrent faulters, success becomes
much less likely (and the more the page was mapped, the worse the overhead
of unmapping and remapping it on each try).

This is especially a problem for memory offlining, where the outer level
retries forever (or until terminated from userspace), because a heavy
refault workload can trigger an endless loop of migration failures.
wait_on_page_locked() is the wrong tool for the job.

David Herrmann (but was he the first?) noticed this issue in 2014:
https://marc.info/?l=linux-mm&amp;m=140110465608116&amp;w=2

Tim Chen started a thread in August 2017 which appears relevant:
https://marc.info/?l=linux-mm&amp;m=150275941014915&amp;w=2 where Kan Liang went
on to implicate __migration_entry_wait():
https://marc.info/?l=linux-mm&amp;m=150300268411980&amp;w=2 and the thread ended
up with the v4.14 commits: 2554db916586 ("sched/wait: Break up long wake
list walk") 11a19c7b099f ("sched/wait: Introduce wakeup boomark in
wake_up_page_bit")

Baoquan He reported "Memory hotplug softlock issue" 14 November 2018:
https://marc.info/?l=linux-mm&amp;m=154217936431300&amp;w=2

We have all assumed that it is essential to hold a page reference while
waiting on a page lock: partly to guarantee that there is still a struct
page when MEMORY_HOTREMOVE is configured, but also to protect against
reuse of the struct page going to someone who then holds the page locked
indefinitely, when the waiter can reasonably expect timely unlocking.

But in fact, so long as wait_on_page_bit_common() does the put_page(), and
is careful not to rely on struct page contents thereafter, there is no
need to hold a reference to the page while waiting on it.  That does mean
that this case cannot go back through the loop: but that's fine for the
page migration case, and even if used more widely, is limited by the "Stop
walking if it's locked" optimization in wake_page_function().

Add interface put_and_wait_on_page_locked() to do this, using "behavior"
enum in place of "lock" arg to wait_on_page_bit_common() to implement it.
No interruptible or killable variant needed yet, but they might follow: I
have a vague notion that reporting -EINTR should take precedence over
return from wait_on_page_bit_common() without knowing the page state, so
arrange it accordingly - but that may be nothing but pedantic.

__migration_entry_wait() still has to take a brief reference to the page,
prior to calling put_and_wait_on_page_locked(): but now that it is dropped
before waiting, the chance of impeding page migration is very much
reduced.  Should we perhaps disable preemption across this?

shrink_page_list()'s __ClearPageLocked(): that was a surprise!  This
survived a lot of testing before that showed up.  PageWaiters may have
been set by wait_on_page_bit_common(), and the reference dropped, just
before shrink_page_list() succeeds in freezing its last page reference: in
such a case, unlock_page() must be used.  Follow the suggestion from
Michal Hocko, just revert a978d6f52106 ("mm: unlockless reclaim") now:
that optimization predates PageWaiters, and won't buy much these days; but
we can reinstate it for the !PageWaiters case if anyone notices.

It does raise the question: should vmscan.c's is_page_cache_freeable() and
__remove_mapping() now treat a PageWaiters page as if an extra reference
were held?  Perhaps, but I don't think it matters much, since
shrink_page_list() already had to win its trylock_page(), so waiters are
not very common there: I noticed no difference when trying the bigger
change, and it's surely not needed while put_and_wait_on_page_locked() is
only used for page migration.

[willy@infradead.org: add put_and_wait_on_page_locked() kerneldoc]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261121330.1116@eggly.anvils
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reported-by: Baoquan He &lt;bhe@redhat.com&gt;
Tested-by: Baoquan He &lt;bhe@redhat.com&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: David Herrmann &lt;dh.herrmann@gmail.com&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Kan Liang &lt;kan.liang@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Nick Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux</title>
<updated>2018-11-02T16:33:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-11-02T16:33:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c2aa1a444cab2c673650ada80a7dffc4345ce2e6'/>
<id>urn:sha1:c2aa1a444cab2c673650ada80a7dffc4345ce2e6</id>
<content type='text'>
Pull vfs dedup fixes from Dave Chinner:
 "This reworks the vfs data cloning infrastructure.

  We discovered many issues with these interfaces late in the 4.19 cycle
  - the worst of them (data corruption, setuid stripping) were fixed for
  XFS in 4.19-rc8, but a larger rework of the infrastructure fixing all
  the problems was needed. That rework is the contents of this pull
  request.

  Rework the vfs_clone_file_range and vfs_dedupe_file_range
  infrastructure to use a common .remap_file_range method and supply
  generic bounds and sanity checking functions that are shared with the
  data write path. The current VFS infrastructure has problems with
  rlimit, LFS file sizes, file time stamps, maximum filesystem file
  sizes, stripping setuid bits, etc and so they are addressed in these
  commits.

  We also introduce the ability for the -&gt;remap_file_range methods to
  return short clones so that clones for vfs_copy_file_range() don't get
  rejected if the entire range can't be cloned. It also allows
  filesystems to sliently skip deduplication of partial EOF blocks if
  they are not capable of doing so without requiring errors to be thrown
  to userspace.

  Existing filesystems are converted to user the new remap_file_range
  method, and both XFS and ocfs2 are modified to make use of the new
  generic checking infrastructure"

* tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits)
  xfs: remove [cm]time update from reflink calls
  xfs: remove xfs_reflink_remap_range
  xfs: remove redundant remap partial EOF block checks
  xfs: support returning partial reflink results
  xfs: clean up xfs_reflink_remap_blocks call site
  xfs: fix pagecache truncation prior to reflink
  ocfs2: remove ocfs2_reflink_remap_range
  ocfs2: support partial clone range and dedupe range
  ocfs2: fix pagecache truncation prior to reflink
  ocfs2: truncate page cache for clone destination file before remapping
  vfs: clean up generic_remap_file_range_prep return value
  vfs: hide file range comparison function
  vfs: enable remap callers that can handle short operations
  vfs: plumb remap flags through the vfs dedupe functions
  vfs: plumb remap flags through the vfs clone functions
  vfs: make remap_file_range functions take and return bytes completed
  vfs: remap helper should update destination inode metadata
  vfs: pass remap flags to generic_remap_checks
  vfs: pass remap flags to generic_remap_file_range_prep
  vfs: combine the clone and dedupe into a single remap_file_range
  ...
</content>
</entry>
<entry>
<title>Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2018-11-02T02:58:52Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-11-02T02:58:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9931a07d518e86eb58a75e508ed9626f86359303'/>
<id>urn:sha1:9931a07d518e86eb58a75e508ed9626f86359303</id>
<content type='text'>
Pull AFS updates from Al Viro:
 "AFS series, with some iov_iter bits included"

* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  missing bits of "iov_iter: Separate type from direction and use accessor functions"
  afs: Probe multiple fileservers simultaneously
  afs: Fix callback handling
  afs: Eliminate the address pointer from the address list cursor
  afs: Allow dumping of server cursor on operation failure
  afs: Implement YFS support in the fs client
  afs: Expand data structure fields to support YFS
  afs: Get the target vnode in afs_rmdir() and get a callback on it
  afs: Calc callback expiry in op reply delivery
  afs: Fix FS.FetchStatus delivery from updating wrong vnode
  afs: Implement the YFS cache manager service
  afs: Remove callback details from afs_callback_break struct
  afs: Commit the status on a new file/dir/symlink
  afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
  afs: Don't invoke the server to read data beyond EOF
  afs: Add a couple of tracepoints to log I/O errors
  afs: Handle EIO from delivery function
  afs: Fix TTL on VL server and address lists
  afs: Implement VL server rotation
  afs: Improve FS server rotation error handling
  ...
</content>
</entry>
<entry>
<title>vfs: enable remap callers that can handle short operations</title>
<updated>2018-10-29T23:42:10Z</updated>
<author>
<name>Darrick J. Wong</name>
<email>darrick.wong@oracle.com</email>
</author>
<published>2018-10-29T23:42:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eca3654e3cc7d93e9734d0fa96cfb15c7f356244'/>
<id>urn:sha1:eca3654e3cc7d93e9734d0fa96cfb15c7f356244</id>
<content type='text'>
Plumb in a remap flag that enables the filesystem remap handler to
shorten remapping requests for callers that can handle it.  Now
copy_file_range can report partial success (in case we run up against
alignment problems, resource limits, etc.).

We also enable CAN_SHORTEN for fideduperange to maintain existing
userspace-visible behavior where xfs/btrfs shorten the dedupe range to
avoid stale post-eof data exposure.

Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
<entry>
<title>vfs: make remap_file_range functions take and return bytes completed</title>
<updated>2018-10-29T23:41:49Z</updated>
<author>
<name>Darrick J. Wong</name>
<email>darrick.wong@oracle.com</email>
</author>
<published>2018-10-29T23:41:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=42ec3d4c02187a18e27ff94b409ec27234bf2ffd'/>
<id>urn:sha1:42ec3d4c02187a18e27ff94b409ec27234bf2ffd</id>
<content type='text'>
Change the remap_file_range functions to take a number of bytes to
operate upon and return the number of bytes they operated on.  This is a
requirement for allowing fs implementations to return short clone/dedupe
results to the user, which will enable us to obey resource limits in a
graceful manner.

A subsequent patch will enable copy_file_range to signal to the
-&gt;clone_file_range implementation that it can handle a short length,
which will be returned in the function's return value.  For now the
short return is not implemented anywhere so the behavior won't change --
either copy_file_range manages to clone the entire range or it tries an
alternative.

Neither clone ioctl can take advantage of this, alas.

Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
<entry>
<title>vfs: pass remap flags to generic_remap_checks</title>
<updated>2018-10-29T23:41:34Z</updated>
<author>
<name>Darrick J. Wong</name>
<email>darrick.wong@oracle.com</email>
</author>
<published>2018-10-29T23:41:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3d28193e1df043764deb7abdaba5e3a6660bc393'/>
<id>urn:sha1:3d28193e1df043764deb7abdaba5e3a6660bc393</id>
<content type='text'>
Pass the same remap flags to generic_remap_checks for consistency.

Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
<entry>
<title>vfs: strengthen checking of file range inputs to generic_remap_checks</title>
<updated>2018-10-29T23:40:46Z</updated>
<author>
<name>Darrick J. Wong</name>
<email>darrick.wong@oracle.com</email>
</author>
<published>2018-10-29T23:40:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9fd91a90cb9837372af24a804853e15c11aed93e'/>
<id>urn:sha1:9fd91a90cb9837372af24a804853e15c11aed93e</id>
<content type='text'>
File range remapping, if allowed to run past the destination file's EOF,
is an optimization on a regular file write.  Regular file writes that
extend the file length are subject to various constraints which are not
checked by range cloning.

This is a correctness problem because we're never allowed to touch
ranges that the page cache can't support (s_maxbytes); we're not
supposed to deal with large offsets (MAX_NON_LFS) if O_LARGEFILE isn't
set; and we must obey resource limits (RLIMIT_FSIZE).

Therefore, add these checks to the new generic_remap_checks function so
that we curtail unexpected behavior.

Signed-off-by: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dave Chinner &lt;david@fromorbit.com&gt;

</content>
</entry>
</feed>
