<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/mm, branch v4.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-10-23T22:20:57Z</updated>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.dk/linux-block</title>
<updated>2015-10-23T22:20:57Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-10-23T22:20:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ea1ee5ff1b500ccdc64782ecef13d276afb08f14'/>
<id>urn:sha1:ea1ee5ff1b500ccdc64782ecef13d276afb08f14</id>
<content type='text'>
Pull block layer fixes from Jens Axboe:
 "A final set of fixes for 4.3.

  It is (again) bigger than I would have liked, but it's all been
  through the testing mill and has been carefully reviewed by multiple
  parties.  Each fix is either a regression fix for this cycle, or is
  marked stable.  You can scold me at KS.  The pull request contains:

   - Three simple fixes for NVMe, fixing regressions since 4.3.  From
     Arnd, Christoph, and Keith.

   - A single xen-blkfront fix from Cathy, fixing a NULL dereference if
     an error is returned through the staste change callback.

   - Fixup for some bad/sloppy code in nbd that got introduced earlier
     in this cycle.  From Markus Pargmann.

   - A blk-mq tagset use-after-free fix from Junichi.

   - A backing device lifetime fix from Tejun, fixing a crash.

   - And finally, a set of regression/stable fixes for cgroup writeback
     from Tejun"

* 'for-linus' of git://git.kernel.dk/linux-block:
  writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in cgwb_bdi_destroy()
  NVMe: Fix memory leak on retried commands
  block: don't release bdi while request_queue has live references
  nvme: use an integer value to Linux errno values
  blk-mq: fix use-after-free in blk_mq_free_tag_set()
  nvme: fix 32-bit build warning
  writeback: fix incorrect calculation of available memory for memcg domains
  writeback: memcg dirty_throttle_control should be initialized with wb-&gt;memcg_completions
  writeback: bdi_writeback iteration must not skip dying ones
  writeback: fix bdi_writeback iteration in wakeup_dirtytime_writeback()
  writeback: laptop_mode_timer_fn() needs rcu_read_lock() around bdi_writeback iteration
  nbd: Add locking for tasks
  xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)
</content>
</entry>
<entry>
<title>mm: make sendfile(2) killable</title>
<updated>2015-10-23T08:55:10Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.com</email>
</author>
<published>2015-10-22T20:32:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=296291cdd1629c308114504b850dc343eabc2782'/>
<id>urn:sha1:296291cdd1629c308114504b850dc343eabc2782</id>
<content type='text'>
Currently a simple program below issues a sendfile(2) system call which
takes about 62 days to complete in my test KVM instance.

        int fd;
        off_t off = 0;

        fd = open("file", O_RDWR | O_TRUNC | O_SYNC | O_CREAT, 0644);
        ftruncate(fd, 2);
        lseek(fd, 0, SEEK_END);
        sendfile(fd, fd, &amp;off, 0xfffffff);

Now you should not ask kernel to do a stupid stuff like copying 256MB in
2-byte chunks and call fsync(2) after each chunk but if you do, sysadmin
should have a way to stop you.

We actually do have a check for fatal_signal_pending() in
generic_perform_write() which triggers in this path however because we
always succeed in writing something before the check is done, we return
value &gt; 0 from generic_perform_write() and thus the information about
signal gets lost.

Fix the problem by doing the signal check before writing anything.  That
way generic_perform_write() returns -EINTR, the error gets propagated up
and the sendfile loop terminates early.

Signed-off-by: Jan Kara &lt;jack@suse.com&gt;
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>thp: use is_zero_pfn() only after pte_present() check</title>
<updated>2015-10-23T08:55:10Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan@kernel.org</email>
</author>
<published>2015-10-22T20:32:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=47aee4d8e314384807e98b67ade07f6da476aa75'/>
<id>urn:sha1:47aee4d8e314384807e98b67ade07f6da476aa75</id>
<content type='text'>
Use is_zero_pfn() on pteval only after pte_present() check on pteval
(It might be better idea to introduce is_zero_pte() which checks
pte_present() first).

Otherwise when working on a swap or migration entry and if pte_pfn's
result is equal to zero_pfn by chance, we lose user's data in
__collapse_huge_page_copy().  So if you're unlucky, the application
segfaults and finally you could see below message on exit:

BUG: Bad rss-counter state mm:ffff88007f099300 idx:2 val:3

Fixes: ca0984caa823 ("mm: incorporate zero pages into transparent huge pages")
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Reviewed-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.1+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: cma: fix incorrect type conversion for size during dma allocation</title>
<updated>2015-10-23T08:55:10Z</updated>
<author>
<name>Rohit Vaswani</name>
<email>rvaswani@codeaurora.org</email>
</author>
<published>2015-10-22T20:32:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=67a2e213e7e937c41c52ab5bc46bf3f4de469f6e'/>
<id>urn:sha1:67a2e213e7e937c41c52ab5bc46bf3f4de469f6e</id>
<content type='text'>
This was found during userspace fuzzing test when a large size dma cma
allocation is made by driver(like ion) through userspace.

  show_stack+0x10/0x1c
  dump_stack+0x74/0xc8
  kasan_report_error+0x2b0/0x408
  kasan_report+0x34/0x40
  __asan_storeN+0x15c/0x168
  memset+0x20/0x44
  __dma_alloc_coherent+0x114/0x18c

Signed-off-by: Rohit Vaswani &lt;rvaswani@codeaurora.org&gt;
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in cgwb_bdi_destroy()</title>
<updated>2015-10-21T14:17:29Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-10-13T22:14:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e27c5b9d23168cc2cb8fec147ae7ed1f7a2005c3'/>
<id>urn:sha1:e27c5b9d23168cc2cb8fec147ae7ed1f7a2005c3</id>
<content type='text'>
a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi
destruction") added rbtree_postorder_for_each_entry_safe() which is
used to remove all entries; however, according to Cody, the iterator
isn't safe against operations which may rebalance the tree.  Fix it by
switching to repeatedly removing rb_first() until empty.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Cody P Schafer &lt;dev@codyps.com&gt;
Fixes: a20135ffbc44 ("writeback: don't drain bdi_writeback_congested on bdi destruction")
Link: http://lkml.kernel.org/g/1443997973-1700-1-git-send-email-dev@codyps.com
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2015-10-16T18:42:37Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-10-16T18:42:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3d875182d7f4b27b7778c3ab6a39800d383968cb'/>
<id>urn:sha1:3d875182d7f4b27b7778c3ab6a39800d383968cb</id>
<content type='text'>
Merge misc fixes from Andrew Morton:
 "6 fixes"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;:
  sh: add copy_user_page() alias for __copy_user()
  lib/Kconfig: ZLIB_DEFLATE must select BITREVERSE
  mm, dax: fix DAX deadlocks
  memcg: convert threshold to bytes
  builddeb: remove debian/files before build
  mm, fs: obey gfp_mapping for add_to_page_cache()
</content>
</entry>
<entry>
<title>mm, dax: fix DAX deadlocks</title>
<updated>2015-10-16T18:42:28Z</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2015-10-15T22:28:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0f90cc6609c72b0bdf2aad0cb0456194dd896e19'/>
<id>urn:sha1:0f90cc6609c72b0bdf2aad0cb0456194dd896e19</id>
<content type='text'>
The following two locking commits in the DAX code:

commit 843172978bb9 ("dax: fix race between simultaneous faults")
commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")

introduced a number of deadlocks and other issues which need to be fixed
for the v4.3 kernel.  The list of issues in DAX after these commits
(some newly introduced by the commits, some preexisting) can be found
here:

  https://lkml.org/lkml/2015/9/25/602 (Subject: "Re: [PATCH] dax: fix deadlock in __dax_fault").

This undoes most of the changes introduced by those two commits,
essentially returning us to the DAX locking scheme that was used in
v4.2.

Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Tested-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Cc: Jan Kara &lt;jack@suse.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: convert threshold to bytes</title>
<updated>2015-10-16T18:42:28Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2015-10-15T22:28:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=424cdc14138088ada1b0e407a2195b2783c6e5ef'/>
<id>urn:sha1:424cdc14138088ada1b0e407a2195b2783c6e5ef</id>
<content type='text'>
page_counter_memparse() returns pages for the threshold, while
mem_cgroup_usage() returns bytes for memory usage.  Convert the
threshold to bytes.

Fixes: 3e32cb2e0a12b6915 ("memcg: rename cgroup_event to mem_cgroup_event").
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, fs: obey gfp_mapping for add_to_page_cache()</title>
<updated>2015-10-16T18:42:28Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2015-10-15T22:28:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=063d99b4fa762cbae9324dbbf9b6bff4b3a8cfdc'/>
<id>urn:sha1:063d99b4fa762cbae9324dbbf9b6bff4b3a8cfdc</id>
<content type='text'>
Commit 6afdb859b710 ("mm: do not ignore mapping_gfp_mask in page cache
allocation paths") has caught some users of hardcoded GFP_KERNEL used in
the page cache allocation paths.  This, however, wasn't complete and
there were others which went unnoticed.

Dave Chinner has reported the following deadlock for xfs on loop device:
: With the recent merge of the loop device changes, I'm now seeing
: XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
:
: The deadlocked is as follows:
:
: kloopd1: loop_queue_read_work
:       xfs_file_iter_read
:       lock XFS inode XFS_IOLOCK_SHARED (on image file)
:       page cache read (GFP_KERNEL)
:       radix tree alloc
:       memory reclaim
:       reclaim XFS inodes
:       log force to unpin inodes
:       &lt;wait for log IO completion&gt;
:
: xfs-cil/loop1: &lt;does log force IO work&gt;
:       xlog_cil_push
:       xlog_write
:       &lt;loop issuing log writes&gt;
:               xlog_state_get_iclog_space()
:               &lt;blocks due to all log buffers under write io&gt;
:               &lt;waits for IO completion&gt;
:
: kloopd1: loop_queue_write_work
:       xfs_file_write_iter
:       lock XFS inode XFS_IOLOCK_EXCL (on image file)
:       &lt;wait for inode to be unlocked&gt;
:
: i.e. the kloopd, with it's split read and write work queues, has
: introduced a dependency through memory reclaim. i.e. that writes
: need to be able to progress for reads make progress.
:
: The problem, fundamentally, is that mpage_readpages() does a
: GFP_KERNEL allocation, rather than paying attention to the inode's
: mapping gfp mask, which is set to GFP_NOFS.
:
: The didn't used to happen, because the loop device used to issue
: reads through the splice path and that does:
:
:       error = add_to_page_cache_lru(page, mapping, index,
:                       GFP_KERNEL &amp; mapping_gfp_mask(mapping));

This has changed by commit aa4d86163e4 ("block: loop: switch to VFS
ITER_BVEC").

This patch changes mpage_readpage{s} to follow gfp mask set for the
mapping.  There are, however, other places which are doing basically the
same.

lustre:ll_dir_filler is doing GFP_KERNEL from the function which
apparently uses GFP_NOFS for other allocations so let's make this
consistent.

cifs:readpages_get_pages is called from cifs_readpages and
__cifs_readpages_from_fscache called from the same path obeys mapping
gfp.

ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well
regardless it uses mapping_gfp_mask for the page allocation.

ext4_mpage_readpages is the called from the page cache allocation path
same as read_pages and read_cache_pages

As I've noticed in my previous post I cannot say I would be happy about
sprinkling mapping_gfp_mask all over the place and it sounds like we
should drop gfp_mask argument altogether and use it internally in
__add_to_page_cache_locked that would require all the filesystems to use
mapping gfp consistently which I am not sure is the case here.  From a
quick glance it seems that some file system use it all the time while
others are selective.

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reported-by: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Ming Lei &lt;ming.lei@canonical.com&gt;
Cc: Andreas Dilger &lt;andreas.dilger@intel.com&gt;
Cc: Oleg Drokin &lt;oleg.drokin@intel.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmstat: explicitly schedule per-cpu work on the CPU we need it to run on</title>
<updated>2015-10-15T20:01:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-10-15T20:01:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=176bed1de5bf977938cad26551969eca8f0883b1'/>
<id>urn:sha1:176bed1de5bf977938cad26551969eca8f0883b1</id>
<content type='text'>
The vmstat code uses "schedule_delayed_work_on()" to do the initial
startup of the delayed work on the right CPU, but then once it was
started it would use the non-cpu-specific "schedule_delayed_work()" to
re-schedule it on that CPU.

That just happened to schedule it on the same CPU historically (well, in
almost all situations), but the code _requires_ this work to be per-cpu,
and should say so explicitly rather than depend on the non-cpu-specific
scheduling to schedule on the current CPU.

The timer code is being changed to not be as single-minded in always
running things on the calling CPU.

See also commit 874bbfe600a6 ("workqueue: make sure delayed work run in
local cpu") that for now maintains the local CPU guarantees just in case
there are other broken users that depended on the accidental behavior.

Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
