linux/drivers/block, branch v4.7

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

2016-07-07T22:34:09Z

Pull block IO fixes from Jens Axboe:

"Three small fixes that have been queued up and tested for this series:

 - A bug fix for xen-blkfront from Bob Liu, fixing an issue with incomplete requests during migration.

 - A fix for an ancient issue in retrieving the IO priority of a different PID than self, preventing that task from going away while we access it. From Omar.

 - A writeback fix from Tahsin, fixing a case where we'd call ihold() with a zero ref count inode"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: fix use-after-free in sys_ioprio_get()
  writeback: inode cgroup wb switch should not call ihold()
  xen-blkfront: save uncompleted reqs in blkfront_resume()

Merge branch 'stable/for-jens-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus

2016-06-29T19:15:19Z

xen-blkfront: save uncompleted reqs in blkfront_resume()

2016-06-29T16:32:39Z

Uncompleted reqs used to be 'saved and resubmitted' in blkfront_recover() during migration, but that's too late after multi-queue was introduced.

After a migrate to another host (which may not have multiqueue support), the number of rings (block hardware queues) may be changed and the ring and shadow structure will also be reallocated.

blkfront_recover() then can't 'save and resubmit' the real uncompleted reqs because the shadow structure has been reallocated.

This patch fixes this issue by moving the 'save' logic out of blkfront_recover() to an earlier place in blkfront_resume(). The 'resubmit' is not changed and is still in blkfront_recover().

Signed-off-by: Bob Liu
Signed-off-by: Konrad Rzeszutek Wilk
Cc: stable@vger.kernel.org
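The ordering problem described in this commit can be illustrated with a small userspace sketch. It is not the driver code: struct req, struct ring, struct saved_list and all three functions are hypothetical stand-ins. It only shows why the 'save' step has to run before the bookkeeping array is reallocated, while the 'resubmit' step can run afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SAVED 16

/* Hypothetical stand-ins; not the real xen-blkfront structures. */
struct req { int id; int completed; };

struct ring {
    struct req *shadow;   /* per-ring bookkeeping, freed on reallocation */
    size_t nr;
};

struct saved_list {
    int ids[MAX_SAVED];
    size_t count;
};

/* "Save" step: must run while the old shadow array is still valid. */
static void save_uncompleted(const struct ring *r, struct saved_list *out)
{
    for (size_t i = 0; i < r->nr && out->count < MAX_SAVED; i++)
        if (!r->shadow[i].completed)
            out->ids[out->count++] = r->shadow[i].id;
    assert(out->count <= MAX_SAVED);
}

/* Reallocation, as when the ring count changes after a migration.
 * Everything the old shadow array held is lost here. */
static int realloc_ring(struct ring *r, size_t nr)
{
    free(r->shadow);
    r->shadow = calloc(nr, sizeof(*r->shadow));
    r->nr = r->shadow ? nr : 0;
    return r->shadow ? 0 : -1;
}

/* "Resubmit" step: runs later and only needs the saved list.
 * Returns how many saved requests fit into the new ring. */
static size_t resubmit(struct ring *r, const struct saved_list *saved)
{
    size_t n = 0;

    for (size_t i = 0; i < saved->count && i < r->nr; i++) {
        r->shadow[i].id = saved->ids[i];
        r->shadow[i].completed = 0;
        n++;
    }
    return n;
}
```

Calling save_uncompleted() after realloc_ring() would find only zeroed entries, which mirrors the failure mode the commit message describes.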

tree wide: get rid of __GFP_REPEAT for order-0 allocations part I

2016-06-25T00:23:52Z

This is the third version of the patchset previously sent [1]. I have basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get rid of superfluous gfp flags" which went through dm tree. I am sending it now because it is tree wide and chances for conflicts are reduced considerably when we want to target rc2. I plan to send the next step and rename the flag and move to a better semantic later during this release cycle so we will have a new semantic ready for 4.8 merge window hopefully.

Motivation:

While working on something unrelated I've checked the current usage of __GFP_REPEAT in the tree. It seems that a majority of the usage is and always has been bogus because __GFP_REPEAT has always been about costly high order allocations while we are using it for order-0 or very small orders very often. It seems that a big pile of them is just a copy&paste when code has been adopted from one arch to another.

I think it makes some sense to get rid of them because they are just making the semantic more unclear. Please note that __GFP_REPEAT is documented as

  * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
  * _might_ fail. This depends upon the particular VM implementation.

while !costly requests have basically nofail semantic. So one could reasonably expect that an order-0 request with __GFP_REPEAT will not loop forever. This is not implemented right now though.

I would like to move on with __GFP_REPEAT and define a better semantic for it.

  $ git grep __GFP_REPEAT origin/master | wc -l
  111
  $ git grep __GFP_REPEAT | wc -l
  36

So we are down to a third after this patch series. The remaining places really seem to be relying on __GFP_REPEAT due to large allocation requests. This still needs some double checking which I will do later after all the simple ones are sorted out.

I am touching a lot of arch specific code here and I hope I got it right, but as a matter of fact I didn't even compile test for some archs as I do not have a cross compiler for them. Patches should be quite trivial to review for stupid compile mistakes though. The tricky parts are usually hidden by macro definitions and that's where I would appreciate help from arch maintainers.

[1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org

This patch (of 19):

__GFP_REPEAT has a rather weak semantic, but since it was introduced around 2.6.12 it has been ignored for low order allocations. Yet we have the full kernel tree with its usage for apparently order-0 allocations. This is really confusing because __GFP_REPEAT is explicitly documented to allow allocation failures, which is a weaker semantic than the current order-0 has (basically nofail).

Let's simply drop __GFP_REPEAT from those places. This would allow us to identify places which really need the allocator to retry harder and formulate a more specific semantic for what the flag is supposed to do actually.

Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko
Cc: "David S. Miller"
Cc: "H. Peter Anvin"
Cc: "James E.J. Bottomley"
Cc: "Theodore Ts'o"
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Catalin Marinas
Cc: Chen Liqin
Cc: Chris Metcalf [for tile]
Cc: Guan Xuetao
Cc: Heiko Carstens
Cc: Helge Deller
Cc: Ingo Molnar
Cc: Jan Kara
Cc: John Crispin
Cc: Lennox Wu
Cc: Ley Foon Tan
Cc: Martin Schwidefsky
Cc: Matt Fleming
Cc: Ralf Baechle
Cc: Rich Felker
Cc: Russell King
Cc: Thomas Gleixner
Cc: Vineet Gupta
Cc: Will Deacon
Cc: Yoshinori Sato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Merge branch 'stable/for-jens-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus

2016-06-09T15:49:55Z

Konrad writes: This has two fixes for a guest migrating from a host that has multi-queue to one without it (and vice-versa).

xen-blkfront: fix resume issues after a migration

2016-06-08T17:54:46Z

After a migrate to another host (which may not have multiqueue support), the number of rings (block hardware queues) may be changed and the ring info structure will also be reallocated.

This patch fixes two related bugs:

 * Call blk_mq_update_nr_hw_queues() to make blk-core know that the number of hardware queues has changed.

 * Don't store the rinfo pointer in hctx->driver_data, because rinfo may be reallocated; use hctx->queue_num to get the rinfo structure instead.

Signed-off-by: Bob Liu
Signed-off-by: Konrad Rzeszutek Wilk
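The second fix (look the ring up by queue number rather than caching a pointer) can be sketched in plain C. This is an illustration under assumptions, not the driver code: struct ring_info, struct front_info, struct hw_ctx, get_rinfo() and set_nr_rings() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-ring and per-device structures. */
struct ring_info { unsigned int id; };

struct front_info {
    struct ring_info *rinfo;   /* array, reallocated on resume */
    unsigned int nr_rings;
};

/* Stand-in for a hardware queue context: it keeps only the stable
 * queue index, never a pointer into the reallocated array. */
struct hw_ctx { unsigned int queue_num; };

/* Resolve the ring at use time, so the result is always taken from
 * the current array. */
static struct ring_info *get_rinfo(const struct front_info *info,
                                   unsigned int i)
{
    assert(i < info->nr_rings);
    return &info->rinfo[i];
}

/* Replace the ring array, as after migrating to a host with a
 * different number of queues. Any pointer cached before this call
 * would dangle afterwards. */
static int set_nr_rings(struct front_info *info, unsigned int nr)
{
    struct ring_info *fresh = calloc(nr, sizeof(*fresh));

    if (!fresh)
        return -1;
    for (unsigned int i = 0; i < nr; i++)
        fresh[i].id = i;
    free(info->rinfo);
    info->rinfo = fresh;
    info->nr_rings = nr;
    return 0;
}
```

The index stays meaningful across set_nr_rings() as long as it is below the new ring count, which is why the block layer also has to be told about the new count.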

xen-blkfront: don't call talk_to_blkback when already connected to blkback

2016-06-08T17:54:39Z

Sometimes blkfront may twice receive blkback_changed() notification (XenbusStateConnected) after migration, which will cause talk_to_blkback() to be called twice too and confuse xen-blkback.

The flow is as follows:

   blkfront                                        blkback
blkfront_resume()
 > talk_to_blkback()
  > Set blkfront to XenbusStateInitialised
                                                front_changed()
                                                 > Connect()
                                                  > Set blkback to XenbusStateConnected

blkback_changed()
 > Skip talk_to_blkback()
   because frontstate == XenbusStateInitialised
 > blkfront_connect()
  > Set blkfront to XenbusStateConnected

-----
And here we get another XenbusStateConnected notification leading to:
-----

blkback_changed()
 > because now frontstate != XenbusStateInitialised
   talk_to_blkback() is also called again
  > blkfront state changed from XenbusStateConnected to XenbusStateInitialised
    (Which is not correct!)
                                                front_changed():
                                                 > Do nothing because blkback
                                                   already in XenbusStateConnected

Now blkback is in XenbusStateConnected but blkfront is still in XenbusStateInitialised - leading to no disks.

Poking of the XenbusStateConnected state is allowed (to deal with block disk change) and has to be dealt with. The most likely cause of this bug is custom udev scripts hooking up the disks and then validating the size.

Signed-off-by: Bob Liu
Signed-off-by: Konrad Rzeszutek Wilk
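One way to picture the guard implied by the subject line is a tiny state machine in which the negotiation step is skipped when the frontend is already initialised or already connected. The sketch below is a simplified model under assumptions, not the xen-blkfront code: the enum, struct frontend, negotiate(), connect_frontend() and backend_connected() are hypothetical names.

```c
#include <assert.h>

/* Simplified, hypothetical frontend states. */
enum fe_state { FE_INITIALISING, FE_INITIALISED, FE_CONNECTED };

struct frontend {
    enum fe_state state;
    int negotiations;   /* how often the negotiation step ran */
};

/* Models the negotiation step: it always resets the state to
 * "initialised", which is wrong for an already connected frontend. */
static void negotiate(struct frontend *fe)
{
    fe->state = FE_INITIALISED;
    fe->negotiations++;
}

/* Models the final connect step. */
static void connect_frontend(struct frontend *fe)
{
    assert(fe->state == FE_INITIALISED || fe->state == FE_CONNECTED);
    fe->state = FE_CONNECTED;
}

/* Handler for a "backend is connected" notification. A repeated
 * notification must not trigger a second negotiation. */
static void backend_connected(struct frontend *fe)
{
    if (fe->state != FE_INITIALISED && fe->state != FE_CONNECTED)
        negotiate(fe);
    connect_frontend(fe);
}
```

With this guard a second notification leaves the frontend connected, instead of dropping it back to the initialised state while the backend stays connected.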

nbd: pass the nbd pointer for flags debugfs

2016-06-08T15:03:54Z

We were passing in &nbd for the private data in debugfs_create_file() for the flags entry. We expect it to just be nbd; fix this so we get proper output from this debugfs entry.

Signed-off-by: Josef Bacik
Signed-off-by: Jens Axboe
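The class of bug fixed here (handing a callback the address of a pointer instead of the pointer itself) can be shown without any kernel API. The sketch below is a userspace illustration with hypothetical names (struct nbd_like, struct entry, register_flags_entry(), read_flags()); it does not call debugfs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device structure with a flags field. */
struct nbd_like { unsigned long flags; };

/* Stand-in for a registered file entry that stores opaque private data. */
struct entry { void *private; };

/* Register the entry. 'dev' is already a pointer to the device, so it
 * is stored as-is. Storing &dev instead would record the address of
 * this function's local pointer variable (a struct nbd_like **), and
 * the reader below would then interpret pointer bytes as flags. */
static struct entry register_flags_entry(struct nbd_like *dev)
{
    struct entry e = { .private = dev };
    return e;
}

/* Reader in the style of a show callback: it casts the private data
 * back to the device type it expects. */
static unsigned long read_flags(void *private)
{
    struct nbd_like *dev = private;

    assert(dev != NULL);
    return dev->flags;
}
```

Because the private data is a void pointer, the compiler accepts either form silently, which is how such a mistake can survive until someone reads the file.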

Merge tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

2016-05-27T02:34:26Z

Pull misc DAX updates from Vishal Verma:

"DAX error handling for 4.7

 - Until now, dax has been disabled if media errors were found on any device. This enables the use of DAX in the presence of these errors by making all sector-aligned zeroing go through the driver.

 - The driver (already) has the ability to clear errors on writes that are sent through the block layer using 'DSMs' defined in ACPI 6.1.

Other misc changes:

 - When mounting DAX filesystems, check to make sure the partition is page aligned. This is a requirement for DAX, and previously, we allowed such unaligned mounts to succeed, but subsequent reads/writes would fail.

 - Misc/cleanup fixes from Jan that remove unused code from DAX related to zeroing, writeback, and some size checks"

* tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: fix a comment in dax_zero_page_range and dax_truncate_page
  dax: for truncate/hole-punch, do zeroing through the driver if possible
  dax: export a low-level __dax_zero_page_range helper
  dax: use sb_issue_zerout instead of calling dax_clear_sectors
  dax: enable dax in the presence of known media errors (badblocks)
  dax: fallback from pmd to pte on error
  block: Update blkdev_dax_capable() for consistency
  xfs: Add alignment check for DAX mount
  ext2: Add alignment check for DAX mount
  ext4: Add alignment check for DAX mount
  block: Add bdev_dax_supported() for dax mount checks
  block: Add vfs_msg() interface
  dax: Remove redundant inode size checks
  dax: Remove pointless writeback from dax_do_io()
  dax: Remove zeroing from dax_io()
  dax: Remove dead zeroing code from fault handlers
  ext2: Avoid DAX zeroing to corrupt data
  ext2: Fix block zeroing in ext2_get_blocks() for DAX
  dax: Remove complete_unwritten argument
  DAX: move RADIX_DAX_ definitions to dax.c

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

2016-05-26T21:10:32Z

Pull Ceph updates from Sage Weil:

"This changeset has a few main parts:

 - Ilya has finished a huge refactoring effort to sync up the client-side logic in libceph with the user-space client code, which has evolved significantly over the last couple years, with lots of additional behaviors (e.g., how requests are handled when cluster is full and transitions from full to non-full).

   This structure of the code is more closely aligned with userspace now such that it will be much easier to maintain going forward when behavior changes take place. There are some locking improvements bundled in as well.

 - Zheng adds multi-filesystem support (multiple namespaces within the same Ceph cluster)

 - Zheng has changed the readdir offsets and directory enumeration so that dentry offsets are hash-based and therefore stable across directory fragmentation events on the MDS.

 - Zheng has a smorgasbord of bug fixes across fs/ceph"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
  ceph: fix wake_up_session_cb()
  ceph: don't use truncate_pagecache() to invalidate read cache
  ceph: SetPageError() for writeback pages if writepages fails
  ceph: handle interrupted ceph_writepage()
  ceph: make ceph_update_writeable_page() uninterruptible
  libceph: make ceph_osdc_wait_request() uninterruptible
  ceph: handle -EAGAIN returned by ceph_update_writeable_page()
  ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
  ceph: block non-fatal signals for fault/page_mkwrite
  ceph: make logical calculation functions return bool
  ceph: tolerate bad i_size for symlink inode
  ceph: improve fragtree change detection
  ceph: keep leaf frag when updating fragtree
  ceph: fix dir_auth check in ceph_fill_dirfrag()
  ceph: don't assume frag tree splits in mds reply are sorted
  ceph: fix inode reference leak
  ceph: using hash value to compose dentry offset
  ceph: don't forbid marking directory complete after forward seek
  ceph: record 'offset' for each entry of readdir result
  ceph: define 'end/complete' in readdir reply as bit flags
  ...