<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/mpage.c, branch v4.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-10-16T18:42:28Z</updated>
<entry>
<title>mm, fs: obey gfp_mapping for add_to_page_cache()</title>
<updated>2015-10-16T18:42:28Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2015-10-15T22:28:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=063d99b4fa762cbae9324dbbf9b6bff4b3a8cfdc'/>
<id>urn:sha1:063d99b4fa762cbae9324dbbf9b6bff4b3a8cfdc</id>
<content type='text'>
Commit 6afdb859b710 ("mm: do not ignore mapping_gfp_mask in page cache
allocation paths") has caught some users of hardcoded GFP_KERNEL used in
the page cache allocation paths.  This, however, wasn't complete and
there were others which went unnoticed.

Dave Chinner has reported the following deadlock for xfs on loop device:
: With the recent merge of the loop device changes, I'm now seeing
: XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
:
: The deadlocked is as follows:
:
: kloopd1: loop_queue_read_work
:       xfs_file_iter_read
:       lock XFS inode XFS_IOLOCK_SHARED (on image file)
:       page cache read (GFP_KERNEL)
:       radix tree alloc
:       memory reclaim
:       reclaim XFS inodes
:       log force to unpin inodes
:       &lt;wait for log IO completion&gt;
:
: xfs-cil/loop1: &lt;does log force IO work&gt;
:       xlog_cil_push
:       xlog_write
:       &lt;loop issuing log writes&gt;
:               xlog_state_get_iclog_space()
:               &lt;blocks due to all log buffers under write io&gt;
:               &lt;waits for IO completion&gt;
:
: kloopd1: loop_queue_write_work
:       xfs_file_write_iter
:       lock XFS inode XFS_IOLOCK_EXCL (on image file)
:       &lt;wait for inode to be unlocked&gt;
:
: i.e. the kloopd, with it's split read and write work queues, has
: introduced a dependency through memory reclaim. i.e. that writes
: need to be able to progress for reads make progress.
:
: The problem, fundamentally, is that mpage_readpages() does a
: GFP_KERNEL allocation, rather than paying attention to the inode's
: mapping gfp mask, which is set to GFP_NOFS.
:
: The didn't used to happen, because the loop device used to issue
: reads through the splice path and that does:
:
:       error = add_to_page_cache_lru(page, mapping, index,
:                       GFP_KERNEL &amp; mapping_gfp_mask(mapping));

This has changed by commit aa4d86163e4 ("block: loop: switch to VFS
ITER_BVEC").

This patch changes mpage_readpage{s} to follow gfp mask set for the
mapping.  There are, however, other places which are doing basically the
same.

lustre:ll_dir_filler is doing GFP_KERNEL from the function which
apparently uses GFP_NOFS for other allocations so let's make this
consistent.

cifs:readpages_get_pages is called from cifs_readpages and
__cifs_readpages_from_fscache called from the same path obeys mapping
gfp.

ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well
regardless it uses mapping_gfp_mask for the page allocation.

ext4_mpage_readpages is the called from the page cache allocation path
same as read_pages and read_cache_pages

As I've noticed in my previous post I cannot say I would be happy about
sprinkling mapping_gfp_mask all over the place and it sounds like we
should drop gfp_mask argument altogether and use it internally in
__add_to_page_cache_locked that would require all the filesystems to use
mapping gfp consistently which I am not sure is the case here.  From a
quick glance it seems that some file system use it all the time while
others are selective.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reported-by: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Ming Lei &lt;ming.lei@canonical.com&gt;
Cc: Andreas Dilger &lt;andreas.dilger@intel.com&gt;
Cc: Oleg Drokin &lt;oleg.drokin@intel.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>block: remove bio_get_nr_vecs()</title>
<updated>2015-08-13T18:32:04Z</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@gmail.com</email>
</author>
<published>2015-05-19T12:31:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b54ffb73cadcdcff9cc1ae0e11f502407e3e2e4c'/>
<id>urn:sha1:b54ffb73cadcdcff9cc1ae0e11f502407e3e2e4c</id>
<content type='text'>
We can always fill up the bio now, no need to estimate the possible
size based on queue parameters.

Acked-by: Steven Whitehouse &lt;swhiteho@redhat.com&gt;
Signed-off-by: Kent Overstreet &lt;kent.overstreet@gmail.com&gt;
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ming Lin &lt;ming.l@ssi.samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>block: add a bi_error field to struct bio</title>
<updated>2015-07-29T14:55:15Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2015-07-20T13:29:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4246a0b63bd8f56a1469b12eafeb875b1041a451'/>
<id>urn:sha1:4246a0b63bd8f56a1469b12eafeb875b1041a451</id>
<content type='text'>
Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario.  Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>writeback: implement foreign cgroup inode detection</title>
<updated>2015-06-02T14:40:20Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-05-28T18:50:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2a81490811d0296d390c571bb64eaa93e5ed7def'/>
<id>urn:sha1:2a81490811d0296d390c571bb64eaa93e5ed7def</id>
<content type='text'>
As concurrent write sharing of an inode is expected to be very rare
and memcg only tracks page ownership on first-use basis severely
confining the usefulness of such sharing, cgroup writeback tracks
ownership per-inode.  While the support for concurrent write sharing
of an inode is deemed unnecessary, an inode being written to by
different cgroups at different points in time is a lot more common,
and, more importantly, charging only by first-use can too readily lead
to grossly incorrect behaviors (single foreign page can lead to
gigabytes of writeback to be incorrectly attributed).

To resolve this issue, cgroup writeback detects the majority dirtier
of an inode and will transfer the ownership to it.  To avoid
unnnecessary oscillation, the detection mechanism keeps track of
history and gives out the switch verdict only if the foreign usage
pattern is stable over a certain amount of time and/or writeback
attempts.

The detection mechanism has fairly low space and computation overhead.
It adds 8 bytes to struct inode (one int and two u16's) and minimal
amount of calculation per IO.  The detection mechanism converges to
the correct answer usually in several seconds of IO time when there's
a clear majority dirtier.  Even when there isn't, it can reach an
acceptable answer fairly quickly under most circumstances.

Please see wb_detach_inode() for more details.

This patch only implements detection.  Following patches will
implement actual switching.

v2: wbc_account_io() now checks whether the wbc is associated with a
    wb before dereferencing it.  This can happen when pageout() is
    writing pages directly without going through the usual writeback
    path.  As pageout() path is single-threaded, we don't want it to
    be blocked behind a slow cgroup and ultimately want it to delegate
    actual writing to the usual writeback path.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>writeback: make writeback_control track the inode being written back</title>
<updated>2015-06-02T14:39:48Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-06-02T14:39:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b16b1deb553adcd7b3b7ce3e6d6fd1b923f314da'/>
<id>urn:sha1:b16b1deb553adcd7b3b7ce3e6d6fd1b923f314da</id>
<content type='text'>
Currently, for cgroup writeback, the IO submission paths directly
associate the bio's with the blkcg from inode_to_wb_blkcg_css();
however, it'd be necessary to keep more writeback context to implement
foreign inode writeback detection.  wbc (writeback_control) is the
natural fit for the extra context - it persists throughout the
writeback of each inode and is passed all the way down to IO
submission paths.

This patch adds wbc_attach_and_unlock_inode(), wbc_detach_inode(), and
wbc_attach_fdatawrite_inode() which are used to associate wbc with the
inode being written back.  IO submission paths now use wbc_init_bio()
instead of directly associating bio's with blkcg themselves.  This
leaves inode_to_wb_blkcg_css() w/o any user.  The function is removed.

wbc currently only tracks the associated wb (bdi_writeback).  Future
patches will add more for foreign inode detection.  The association is
established under i_lock which will be depended upon when migrating
foreign inodes to other wb's.

As currently, once established, inode to wb association never changes,
going through wbc when initializing bio's doesn't cause any behavior
changes.

v2: submit_blk_blkcg() now checks whether the wbc is associated with a
    wb before dereferencing it.  This can happen when pageout() is
    writing pages directly without going through the usual writeback
    path.  As pageout() path is single-threaded, we don't want it to
    be blocked behind a slow cgroup and ultimately want it to delegate
    actual writing to the usual writeback path.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>mpage: make __mpage_writepage() honor cgroup writeback</title>
<updated>2015-06-02T14:38:04Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-05-22T21:14:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=429b3fb027492b2b9f96e00bb1a9255fc56d4934'/>
<id>urn:sha1:429b3fb027492b2b9f96e00bb1a9255fc56d4934</id>
<content type='text'>
__mpage_writepage() is used to implement mpage_writepages() which in
turn is used for -&gt;writepages() of various filesystems.  All writeback
logic is now updated to handle cgroup writeback and the block cgroup
to issue IOs for is encoded in writeback_control and can be retrieved
from the inode; however, __mpage_writepage() currently ignores the
blkcg indicated by the inode and issues all bio's without explicit
blkcg association.

This patch updates __mpage_writepage() so that the issued bio's are
associated with inode_to_writeback_blkcg_css(inode).

v2: Updated for per-inode wb association.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>vfs: guard end of device for mpage interface</title>
<updated>2014-10-10T02:25:53Z</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2014-10-09T22:26:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4db96b71e3caea5bb39053d57683129e0682c66f'/>
<id>urn:sha1:4db96b71e3caea5bb39053d57683129e0682c66f</id>
<content type='text'>
Add guard_bio_eod() check for mpage code in order to allow us to do IO
even on the odd last sectors of a device, even if the block size is some
multiple of the physical sector size.

Using mpage_readpages() for block device requires this guard check.

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/block_dev.c: add bdev_read_page() and bdev_write_page()</title>
<updated>2014-06-04T23:54:02Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>matthew.r.wilcox@intel.com</email>
</author>
<published>2014-06-04T23:07:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=47a191fd38ebddb1bd1510ec2bc1085c578c8868'/>
<id>urn:sha1:47a191fd38ebddb1bd1510ec2bc1085c578c8868</id>
<content type='text'>
A block device driver may choose to provide a rw_page operation.  These
will be called when the filesystem is attempting to do page sized I/O to
page cache pages (ie not for direct I/O).  This does preclude I/Os that
are larger than page size, so this may only be a performance gain for
some devices.

Signed-off-by: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Tested-by: Dheeraj Reddy &lt;dheeraj.reddy@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/mpage.c: factor page_endio() out of mpage_end_io()</title>
<updated>2014-06-04T23:54:02Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>matthew.r.wilcox@intel.com</email>
</author>
<published>2014-06-04T23:07:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=57d998456ae8680ed446aa1993f45f4d8a9a5973'/>
<id>urn:sha1:57d998456ae8680ed446aa1993f45f4d8a9a5973</id>
<content type='text'>
page_endio() takes care of updating all the appropriate page flags once
I/O has finished to a page.  Switch to using mapping_set_error() instead
of setting AS_EIO directly; this will handle thin-provisioned devices
correctly.

Signed-off-by: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Dheeraj Reddy &lt;dheeraj.reddy@intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/mpage.c: factor clean_buffers() out of __mpage_writepage()</title>
<updated>2014-06-04T23:54:02Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>matthew.r.wilcox@intel.com</email>
</author>
<published>2014-06-04T23:07:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=90768eee4565adb28ea28b4ac5081c676a8fe1f2'/>
<id>urn:sha1:90768eee4565adb28ea28b4ac5081c676a8fe1f2</id>
<content type='text'>
__mpage_writepage() is over 200 lines long, has 20 local variables, four
goto labels and could desperately use simplification.  Splitting
clean_buffers() into a helper function improves matters a little,
removing 20+ lines from it.

Signed-off-by: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Dheeraj Reddy &lt;dheeraj.reddy@intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
