<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/md, branch v5.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-01-16T04:43:09Z</updated>
<entry>
<title>block: fix an integer overflow in logical block size</title>
<updated>2020-01-16T04:43:09Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2020-01-15T13:35:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ad6bf88a6c19a39fb3b0045d78ea880325dfcf15'/>
<id>urn:sha1:ad6bf88a6c19a39fb3b0045d78ea880325dfcf15</id>
<content type='text'>
Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.

For exmaple (run this on an architecture with 64k pages):

Mount will fail with this error because it tries to read the superblock using 2-sector
access:
  device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
  EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.

Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-linus-20191212' of git://git.kernel.dk/linux-block</title>
<updated>2019-12-13T22:27:19Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-12-13T22:27:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f1fcd7786ec8e316b69860ab856f29f346a9b301'/>
<id>urn:sha1:f1fcd7786ec8e316b69860ab856f29f346a9b301</id>
<content type='text'>
Pull block fixes from Jens Axboe:

 - stable fix for the bi_size overflow. Not a corruption issue, but a
   case wher we could merge but disallowed (Andreas)

 - NVMe pull request via Keith, with various fixes.

 - MD pull request from Song.

 - Merge window regression fix for the rq passthrough stats (Logan)

 - Remove unused blkcg_drain_queue() function (Guoqing)

* tag 'for-linus-20191212' of git://git.kernel.dk/linux-block:
  blk-cgroup: remove blkcg_drain_queue
  block: fix NULL pointer dereference in account statistics with IDE
  md: make sure desc_nr less than MD_SB_DISKS
  md: raid1: check rdev before reference in raid1_sync_request func
  raid5: need to set STRIPE_HANDLE for batch head
  block: fix "check bi_size overflow before merge"
  nvme/pci: Fix read queue count
  nvme/pci Limit write queue sizes to possible cpus
  nvme/pci: Fix write and poll queue types
  nvme/pci: Remove last_cq_head
  nvme: Namepace identification descriptor list is optional
  nvme-fc: fix double-free scenarios on hw queues
  nvme: else following return is not needed
  nvme: add error message on mismatching controller ids
  nvme_fc: add module to ops template to allow module references
  nvmet-loop: Avoid preallocating big SGL for data
  nvme-fc: Avoid preallocating big SGL for data
  nvme-rdma: Avoid preallocating big SGL for data
</content>
</entry>
<entry>
<title>Merge tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm</title>
<updated>2019-12-13T22:13:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-12-13T22:13:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=15da849c910da05622c75c5741dd6e176a8b6fe2'/>
<id>urn:sha1:15da849c910da05622c75c5741dd6e176a8b6fe2</id>
<content type='text'>
Pull device mapper fixes from Mike Snitzer:

 - Fix DM multipath by restoring full path selector functionality for
   bio-based configurations that don't haave a SCSI device handler.

 - Fix dm-btree removal to ensure non-root btree nodes have at least
   (max_entries / 3) entries. This resolves userspace thin_check
   utility's report of "too few entries in btree_node".

 - Fix both the DM thin-provisioning and dm-clone targets to properly
   flush the data device prior to metadata commit. This resolves the
   potential for inconsistency across a power loss event when the data
   device has a volatile writeback cache.

 - Small documentation fixes to dm-clone and dm-integrity.

* tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  docs: dm-integrity: remove reference to ARC4
  dm thin: Flush data device before committing metadata
  dm thin metadata: Add support for a pre-commit callback
  dm clone: Flush destination device before committing metadata
  dm clone metadata: Use a two phase commit
  dm clone metadata: Track exact changes per transaction
  dm btree: increase rebalance threshold in __rebalance2()
  dm: add dm-clone to the documentation index
  dm mpath: remove harmful bio-based optimization
</content>
</entry>
<entry>
<title>md: make sure desc_nr less than MD_SB_DISKS</title>
<updated>2019-12-11T18:38:08Z</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2019-12-10T07:01:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3b7436cc9449d5ff7fa1c1fd5bc3edb6402ff5b8'/>
<id>urn:sha1:3b7436cc9449d5ff7fa1c1fd5bc3edb6402ff5b8</id>
<content type='text'>
For super_90_load, we need to make sure 'desc_nr' less
than MD_SB_DISKS, avoiding invalid memory access of 'sb-&gt;disks'.

Fixes: 228fc7d76db6 ("md: avoid invalid memory access for array sb-&gt;dev_roles")
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>md: raid1: check rdev before reference in raid1_sync_request func</title>
<updated>2019-12-11T18:36:13Z</updated>
<author>
<name>Zhiqiang Liu</name>
<email>liuzhiqiang26@huawei.com</email>
</author>
<published>2019-12-10T02:42:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=028288df635f5a9addd48ac4677b720192747944'/>
<id>urn:sha1:028288df635f5a9addd48ac4677b720192747944</id>
<content type='text'>
In raid1_sync_request func, rdev should be checked before reference.

Signed-off-by: Zhiqiang Liu &lt;liuzhiqiang26@huawei.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>raid5: need to set STRIPE_HANDLE for batch head</title>
<updated>2019-12-11T18:12:09Z</updated>
<author>
<name>Guoqing Jiang</name>
<email>guoqing.jiang@cloud.ionos.com</email>
</author>
<published>2019-11-27T16:57:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a7ede3d16808b8f3915c8572d783530a82b2f027'/>
<id>urn:sha1:a7ede3d16808b8f3915c8572d783530a82b2f027</id>
<content type='text'>
With commit 6ce220dd2f8ea71d6afc29b9a7524c12e39f374a ("raid5: don't set
STRIPE_HANDLE to stripe which is in batch list"), we don't want to set
STRIPE_HANDLE flag for sh which is already in batch list.

However, the stripe which is the head of batch list should set this flag,
otherwise panic could happen inside init_stripe at BUG_ON(sh-&gt;batch_head),
it is reproducible with raid5 on top of nvdimm devices per Xiao oberserved.

Thanks for Xiao's effort to verify the change.

Fixes: 6ce220dd2f8ea ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list")
Reported-by: Xiao Ni &lt;xni@redhat.com&gt;
Tested-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Guoqing Jiang &lt;guoqing.jiang@cloud.ionos.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
</content>
</entry>
<entry>
<title>treewide: Use sizeof_field() macro</title>
<updated>2019-12-09T18:36:44Z</updated>
<author>
<name>Pankaj Bharadiya</name>
<email>pankaj.laxminarayan.bharadiya@intel.com</email>
</author>
<published>2019-12-09T18:31:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c593642c8be046915ca3a4a300243a68077cd207'/>
<id>urn:sha1:c593642c8be046915ca3a4a300243a68077cd207</id>
<content type='text'>
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya &lt;pankaj.laxminarayan.bharadiya@intel.com&gt;
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: David Miller &lt;davem@davemloft.net&gt; # for net
</content>
</entry>
<entry>
<title>dm thin: Flush data device before committing metadata</title>
<updated>2019-12-06T16:46:16Z</updated>
<author>
<name>Nikos Tsironis</name>
<email>ntsironis@arrikto.com</email>
</author>
<published>2019-12-04T14:07:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=694cfe7f31db36912725e63a38a5179c8628a496'/>
<id>urn:sha1:694cfe7f31db36912725e63a38a5179c8628a496</id>
<content type='text'>
The thin provisioning target maintains per thin device mappings that map
virtual blocks to data blocks in the data device.

When we write to a shared block, in case of internal snapshots, or
provision a new block, in case of external snapshots, we copy the shared
block to a new data block (COW), update the mapping for the relevant
virtual block and then issue the write to the new data block.

Suppose the data device has a volatile write-back cache and the
following sequence of events occur:

1. We write to a shared block
2. A new data block is allocated
3. We copy the shared block to the new data block using kcopyd (COW)
4. We insert the new mapping for the virtual block in the btree for that
   thin device.
5. The commit timeout expires and we commit the metadata, that now
   includes the new mapping from step (4).
6. The system crashes and the data device's cache has not been flushed,
   meaning that the COWed data are lost.

The next time we read that virtual block of the thin device we read it
from the data block allocated in step (2), since the metadata have been
successfully committed. The data are lost due to the crash, so we read
garbage instead of the old, shared data.

This has the following implications:

1. In case of writes to shared blocks, with size smaller than the pool's
   block size (which means we first copy the whole block and then issue
   the smaller write), we corrupt data that the user never touched.

2. In case of writes to shared blocks, with size equal to the device's
   logical block size, we fail to provide atomic sector writes. When the
   system recovers the user will read garbage from that sector instead
   of the old data or the new data.

3. Even for writes to shared blocks, with size equal to the pool's block
   size (overwrites), after the system recovers, the written sectors
   will contain garbage instead of a random mix of sectors containing
   either old data or new data, thus we fail again to provide atomic
   sectors writes.

4. Even when the user flushes the thin device, because we first commit
   the metadata and then pass down the flush, the same risk for
   corruption exists (if the system crashes after the metadata have been
   committed but before the flush is passed down to the data device.)

The only case which is unaffected is that of writes with size equal to
the pool's block size and with the FUA flag set. But, because FUA writes
trigger metadata commits, this case can trigger the corruption
indirectly.

Moreover, apart from internal and external snapshots, the same issue
exists for newly provisioned blocks, when block zeroing is enabled.
After the system recovers the provisioned blocks might contain garbage
instead of zeroes.

To solve this and avoid the potential data corruption we flush the
pool's data device **before** committing its metadata.

This ensures that the data blocks of any newly inserted mappings are
properly written to non-volatile storage and won't be lost in case of a
crash.

Cc: stable@vger.kernel.org
Signed-off-by: Nikos Tsironis &lt;ntsironis@arrikto.com&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm thin metadata: Add support for a pre-commit callback</title>
<updated>2019-12-05T22:05:24Z</updated>
<author>
<name>Nikos Tsironis</name>
<email>ntsironis@arrikto.com</email>
</author>
<published>2019-12-04T14:07:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ecda7c0280e6b3398459dc589b9a41c1adb45529'/>
<id>urn:sha1:ecda7c0280e6b3398459dc589b9a41c1adb45529</id>
<content type='text'>
Add support for one pre-commit callback which is run right before the
metadata are committed.

This allows the thin provisioning target to run a callback before the
metadata are committed and is required by the next commit.

Cc: stable@vger.kernel.org
Signed-off-by: Nikos Tsironis &lt;ntsironis@arrikto.com&gt;
Acked-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm clone: Flush destination device before committing metadata</title>
<updated>2019-12-05T22:05:23Z</updated>
<author>
<name>Nikos Tsironis</name>
<email>ntsironis@arrikto.com</email>
</author>
<published>2019-12-04T14:06:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8b3fd1f53af3591d5624ab9df718369b14d09ed1'/>
<id>urn:sha1:8b3fd1f53af3591d5624ab9df718369b14d09ed1</id>
<content type='text'>
dm-clone maintains an on-disk bitmap which records which regions are
valid in the destination device, i.e., which regions have already been
hydrated, or have been written to directly, via user I/O.

Setting a bit in the on-disk bitmap meas the corresponding region is
valid in the destination device and we redirect all I/O regarding it to
the destination device.

Suppose the destination device has a volatile write-back cache and the
following sequence of events occur:

1. A region gets hydrated, either through the background hydration or
   because it was written to directly, via user I/O.

2. The commit timeout expires and we commit the metadata, marking that
   region as valid in the destination device.

3. The system crashes and the destination device's cache has not been
   flushed, meaning the region's data are lost.

The next time we read that region we read it from the destination
device, since the metadata have been successfully committed, but the
data are lost due to the crash, so we read garbage instead of the old
data.

This has several implications:

1. In case of background hydration or of writes with size smaller than
   the region size (which means we first copy the whole region and then
   issue the smaller write), we corrupt data that the user never
   touched.

2. In case of writes with size equal to the device's logical block size,
   we fail to provide atomic sector writes. When the system recovers the
   user will read garbage from the sector instead of the old data or the
   new data.

3. In case of writes without the FUA flag set, after the system
   recovers, the written sectors will contain garbage instead of a
   random mix of sectors containing either old data or new data, thus we
   fail again to provide atomic sector writes.

4. Even when the user flushes the dm-clone device, because we first
   commit the metadata and then pass down the flush, the same risk for
   corruption exists (if the system crashes after the metadata have been
   committed but before the flush is passed down).

The only case which is unaffected is that of writes with size equal to
the region size and with the FUA flag set. But, because FUA writes
trigger metadata commits, this case can trigger the corruption
indirectly.

To solve this and avoid the potential data corruption we flush the
destination device **before** committing the metadata.

This ensures that any freshly hydrated regions, for which we commit the
metadata, are properly written to non-volatile storage and won't be lost
in case of a crash.

Fixes: 7431b7835f55 ("dm: add clone target")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Nikos Tsironis &lt;ntsironis@arrikto.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
</entry>
</feed>
