<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/md, branch v6.13</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.13</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.13'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-01-08T18:12:01Z</updated>
<entry>
<title>Merge tag 'for-6.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm</title>
<updated>2025-01-08T18:12:01Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-01-08T18:12:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0b7958fa05d562e514fd0abe2a4800042abf868b'/>
<id>urn:sha1:0b7958fa05d562e514fd0abe2a4800042abf868b</id>
<content type='text'>
Pull device mapper fixes from Mikulas Patocka:

 - dm-array fixes

 - dm-verity forward error correction fixes

 - remove the flag DM_TARGET_PASSES_INTEGRITY from dm-ebs

 - dm-thin RCU list fix

* tag 'for-6.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm thin: make get_first_thin use rcu-safe list first function
  dm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITY
  dm-verity FEC: Avoid copying RS parity bytes twice.
  dm-verity FEC: Fix RS FEC repair for roots unaligned to block size (take 2)
  dm array: fix cursor index when skipping across block boundaries
  dm array: fix unreleased btree blocks on closing a faulty array cursor
  dm array: fix releasing a faulty array block twice in dm_array_cursor_end
</content>
</entry>
<entry>
<title>dm thin: make get_first_thin use rcu-safe list first function</title>
<updated>2025-01-08T14:29:39Z</updated>
<author>
<name>Krister Johansen</name>
<email>kjlx@templeofstupid.com</email>
</author>
<published>2025-01-07T23:24:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=80f130bfad1dab93b95683fc39b87235682b8f72'/>
<id>urn:sha1:80f130bfad1dab93b95683fc39b87235682b8f72</id>
<content type='text'>
The documentation in rculist.h explains the absence of list_empty_rcu()
and cautions programmers against relying on a list_empty() -&gt;
list_first() sequence in RCU safe code.  This is because each of these
functions performs its own READ_ONCE() of the list head.  This can lead
to a situation where the list_empty() sees a valid list entry, but the
subsequent list_first() sees a different view of list head state after a
modification.

In the case of dm-thin, this author had a production box crash from a GP
fault in the process_deferred_bios path.  This function saw a valid list
head in get_first_thin() but when it subsequently dereferenced that and
turned it into a thin_c, it got the inside of the struct pool, since the
list was now empty and referring to itself.  The kernel on which this
occurred printed both a warning about a refcount_t being saturated, and
a UBSAN error for an out-of-bounds cpuid access in the queued spinlock,
prior to the fault itself.  When the resulting kdump was examined, it
was possible to see another thread patiently waiting in thin_dtr's
synchronize_rcu.

The thin_dtr call managed to pull the thin_c out of the active thins
list (and have it be the last entry in the active_thins list) at just
the wrong moment, which led to this crash.

Fortunately, the fix here is straightforward.  Switch the get_first_thin()
function to use list_first_or_null_rcu(), which performs just a single
READ_ONCE() and returns NULL if the list is already empty.
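
The difference between the two patterns can be sketched with a small,
single-threaded model.  This is an illustration only, not the kernel
code: Node, first_two_reads(), first_single_read() and the between_reads
hook are hypothetical names, and the hook stands in for a concurrent
thin_dtr() unlinking the last entry.

```python
# Single-threaded model of a circular list head. Illustration only.
class Node:
    def __init__(self, name):
        self.name = name
        self.next = self  # an empty circular list points at itself


def make_list():
    """Build a list that holds exactly one entry."""
    head, thin = Node("head"), Node("thin")
    head.next, thin.next = thin, head
    return head, thin


def first_two_reads(head, between_reads):
    """Racy pattern: one read to test for empty, a second to fetch."""
    if head.next is head:      # read 1, as list_empty() would do
        return None
    between_reads()            # a writer unlinks the entry here
    return head.next           # read 2 now sees the head itself


def first_single_read(head, between_reads):
    """Pattern of list_first_or_null_rcu(): decide from one snapshot."""
    entry = head.next          # the only read of the head
    between_reads()
    return None if entry is head else entry


def demo():
    head, _ = make_list()
    racy = first_two_reads(head, lambda: setattr(head, "next", head))
    head2, thin2 = make_list()
    safe = first_single_read(head2, lambda: setattr(head2, "next", head2))
    # racy is the list head posing as an entry; safe is the real entry
    return racy is head, safe is thin2


print(demo())  # (True, True)
```

In the two-read variant the caller gets the list head back and would
treat it as a thin_c.  In the single-read variant it gets either the
entry it saw or None.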

This was run against the devicemapper test suite's thin-provisioning
suites for delete and suspend and no regressions were observed.

Signed-off-by: Krister Johansen &lt;kjlx@templeofstupid.com&gt;
Fixes: b10ebd34ccca ("dm thin: fix rcu_read_lock being held in code that can sleep")
Cc: stable@vger.kernel.org
Acked-by: Ming-Hung Tsai &lt;mtsai@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITY</title>
<updated>2025-01-08T14:28:47Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2025-01-07T16:47:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=47f33c27fc9565fb0bc7dfb76be08d445cd3d236'/>
<id>urn:sha1:47f33c27fc9565fb0bc7dfb76be08d445cd3d236</id>
<content type='text'>
dm-ebs uses dm-bufio to process requests that are not aligned on logical
sector size. dm-bufio doesn't support passing integrity data (and it is
unclear how it should do so), so we shouldn't set the
DM_TARGET_PASSES_INTEGRITY flag.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Cc: stable@vger.kernel.org
Fixes: d3c7b35c20d6 ("dm: add emulated block size target")
</content>
</entry>
<entry>
<title>dm-verity FEC: Avoid copying RS parity bytes twice.</title>
<updated>2025-01-03T16:08:49Z</updated>
<author>
<name>Milan Broz</name>
<email>gmazyland@gmail.com</email>
</author>
<published>2024-12-18T12:56:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=548c6edbed92031baa4aa32cae55628c810c3ebb'/>
<id>urn:sha1:548c6edbed92031baa4aa32cae55628c810c3ebb</id>
<content type='text'>
Caching RS parity bytes is already done in fec_decode_bufs() now, so
there is no need to use yet another buffer for conversion to uint16_t.

This patch removes that double copy of RS parity bytes.

Signed-off-by: Milan Broz &lt;gmazyland@gmail.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm-verity FEC: Fix RS FEC repair for roots unaligned to block size (take 2)</title>
<updated>2025-01-03T16:08:25Z</updated>
<author>
<name>Milan Broz</name>
<email>gmazyland@gmail.com</email>
</author>
<published>2024-12-18T12:56:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6df90c02bae468a3a6110bafbc659884d0c4966c'/>
<id>urn:sha1:6df90c02bae468a3a6110bafbc659884d0c4966c</id>
<content type='text'>
This patch fixes an issue that was fixed in the commit
  df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
but later broken again in the commit
  8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")

If the Reed-Solomon roots setting spans multiple blocks, the code does not
use proper parity bytes and randomly fails to repair even trivial errors.

This bug cannot happen if the sector size is a multiple of the RS roots
setting (the Android case with roots 2).

The previous solution was to find a dm-bufio block size that is a multiple
of the device sector size and roots size. Unfortunately, the optimization
in commit 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
is incorrect and uses data block size for some roots (for example, it uses
4096 block size for roots = 20).

This patch uses a different approach:

 - It always uses the configured data block size for dm-bufio to avoid
 possible misaligned IOs.

 - It caches the processed parity bytes, so it can join them
 if they span two blocks.

As the RS calculation is called only if an error is detected and
the process is computationally intensive, copying a few more bytes
should not introduce performance issues.
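
The alignment problem can be checked with a little arithmetic.  The
sketch below assumes that the parity bytes of consecutive RS codewords
are stored back to back, "roots" bytes each, and are read in blocks of
4096 bytes; straddling_groups() is a hypothetical helper, not a kernel
function.

```python
BLOCK_SIZE = 4096  # data block size used in the example above


def straddling_groups(roots, block_size=BLOCK_SIZE, groups=1000):
    """Return indices of parity groups whose bytes cross a block boundary."""
    out = []
    for i in range(groups):
        first = i * roots           # offset of the first parity byte
        last = first + roots - 1    # offset of the last parity byte
        if first // block_size != last // block_size:
            out.append(i)
    return out


print(straddling_groups(2))   # []
print(straddling_groups(20))  # [204, 409, 614, 819]
```

With roots = 2 every group stays inside one block, while with
roots = 20 some groups start in one block and end in the next, which
is the case that needs the cached parity bytes.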

The issue was reported to cryptsetup with a trivial reproducer:
  https://gitlab.com/cryptsetup/cryptsetup/-/issues/923

Reproducer (with roots=20):

 # create verity device with RS FEC
 dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
 veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=20 | \
 awk '/^Root hash/{ print $3 }' &gt;roothash

 # create an erasure that should always be repairable with this roots setting
 dd if=/dev/zero of=data.img conv=notrunc bs=1 count=4 seek=4 status=none

 # try to read it through dm-verity
 veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=20 $(cat roothash)
 dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer

 Even now the log says it cannot repair it:
   : verity-fec: 7:1: FEC 0: failed to correct: -74
   : device-mapper: verity: 7:1: data block 0 is corrupted
   ...

With this fix, errors are properly repaired.
   : verity-fec: 7:1: FEC 0: corrected 4 errors

Signed-off-by: Milan Broz &lt;gmazyland@gmail.com&gt;
Fixes: 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm array: fix cursor index when skipping across block boundaries</title>
<updated>2024-12-13T13:39:18Z</updated>
<author>
<name>Ming-Hung Tsai</name>
<email>mtsai@redhat.com</email>
</author>
<published>2024-12-05T11:41:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0bb1968da2737ba68fd63857d1af2b301a18d3bf'/>
<id>urn:sha1:0bb1968da2737ba68fd63857d1af2b301a18d3bf</id>
<content type='text'>
dm_array_cursor_skip() seeks to the target position by loading array
blocks iteratively until the specified number of entries to skip is
reached. When seeking across block boundaries, it uses
dm_array_cursor_next() to step into the next block.
dm_array_cursor_skip() must first move the cursor index to the end
of the current block; otherwise, the cursor position could incorrectly
remain in the same block, causing the actual number of skipped entries
to be much smaller than expected.
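
The effect can be seen in a simplified model of the cursor.  This is a
sketch, not the kernel implementation: the Cursor class, the tiny
ENTRIES_PER_BLOCK value and the move_to_block_end switch are
assumptions made for illustration.

```python
ENTRIES_PER_BLOCK = 4  # hypothetical, tiny so the effect is easy to see


class Cursor:
    def __init__(self):
        self.block = 0   # which array block is loaded
        self.index = 0   # position inside that block

    def position(self):
        return self.block * ENTRIES_PER_BLOCK + self.index

    def next(self):
        """Advance one entry, loading the next block at the boundary."""
        self.index += 1
        if self.index == ENTRIES_PER_BLOCK:
            self.block += 1
            self.index = 0

    def skip(self, count, move_to_block_end):
        while True:
            remaining = ENTRIES_PER_BLOCK - self.index
            if count in range(remaining):   # target is in this block
                self.index += count
                return
            count -= remaining
            if move_to_block_end:           # the step added by the fix
                self.index += remaining - 1
            self.next()


def skipped(count, fixed):
    c = Cursor()
    c.skip(count, fixed)
    return c.position()


print(skipped(6, fixed=True))   # 6
print(skipped(6, fixed=False))  # 3
```

Without the move to the end of the block, next() only advances by one
entry, so the cursor ends up well short of the requested position.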

This bug affects cache resizing in v2 metadata and could lead to data
loss if the fast device is shrunk during the first-time resume. For
example:

1. create cache metadata consisting of 32768 blocks, with a dirty block
   assigned to the second bitmap block. cache_restore v1.0 is required.

cat &lt;&lt;EOF &gt;&gt; cmeta.xml
&lt;superblock uuid="" block_size="64" nr_cache_blocks="32768" \
policy="smq" hint_width="4"&gt;
  &lt;mappings&gt;
    &lt;mapping cache_block="32767" origin_block="0" dirty="true"/&gt;
  &lt;/mappings&gt;
&lt;/superblock&gt;
EOF
dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2

2. bring up the cache while attempting to discard all the blocks belonging
   to the second bitmap block (block# 32576 to 32767). The last command
   is expected to fail, but it actually succeeds.

dmsetup create cdata --table "0 2084864 linear /dev/sdc 8192"
dmsetup create corig --table "0 65536 linear /dev/sdc 2105344"
dmsetup create cache --table "0 65536 cache /dev/mapper/cmeta \
/dev/mapper/cdata /dev/mapper/corig 64 2 metadata2 writeback smq \
2 migration_threshold 0"

In addition to the reproducer described above, this fix can be
verified using the "array_cursor/skip" tests in dm-unit:
  dm-unit run /pdata/array_cursor/skip/ --kernel-dir &lt;KERNEL_DIR&gt;

Signed-off-by: Ming-Hung Tsai &lt;mtsai@redhat.com&gt;
Fixes: 9b696229aa7d ("dm persistent data: add cursor skip functions to the cursor APIs")
Reviewed-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm array: fix unreleased btree blocks on closing a faulty array cursor</title>
<updated>2024-12-13T13:37:39Z</updated>
<author>
<name>Ming-Hung Tsai</name>
<email>mtsai@redhat.com</email>
</author>
<published>2024-12-05T11:41:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=626f128ee9c4133b1cfce4be2b34a1508949370e'/>
<id>urn:sha1:626f128ee9c4133b1cfce4be2b34a1508949370e</id>
<content type='text'>
The cached block pointer in dm_array_cursor might be NULL if it reaches
an unreadable array block, or the array is empty. Therefore,
dm_array_cursor_end() should call dm_btree_cursor_end() unconditionally,
to prevent leaving unreleased btree blocks.

This fix can be verified using the "array_cursor/iterate/empty" test
in dm-unit:
  dm-unit run /pdata/array_cursor/iterate/empty --kernel-dir &lt;KERNEL_DIR&gt;

Signed-off-by: Ming-Hung Tsai &lt;mtsai@redhat.com&gt;
Fixes: fdd1315aa5f0 ("dm array: introduce cursor api")
Reviewed-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm array: fix releasing a faulty array block twice in dm_array_cursor_end</title>
<updated>2024-12-13T13:33:38Z</updated>
<author>
<name>Ming-Hung Tsai</name>
<email>mtsai@redhat.com</email>
</author>
<published>2024-12-05T11:41:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f2893c0804d86230ffb8f1c8703fdbb18648abc8'/>
<id>urn:sha1:f2893c0804d86230ffb8f1c8703fdbb18648abc8</id>
<content type='text'>
When dm_bm_read_lock() fails due to locking or checksum errors, it
releases the faulty block implicitly while leaving an invalid output
pointer behind. The caller of dm_bm_read_lock() should not operate on
this invalid dm_block pointer, or it will lead to undefined behavior.
For example, the dm_array_cursor incorrectly caches the invalid pointer
on reading a faulty array block, causing a double release in
dm_array_cursor_end(), then hitting the BUG_ON in dm-bufio cache_put().

Steps to reproduce:

1. initialize a cache device

dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1
dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \
/dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"

2. wipe the second array block offline

dmsetup remove cache cmeta cdata corig
mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \
2&gt;/dev/null | hexdump -e '1/8 "%u\n"')
ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \
2&gt;/dev/null | hexdump -e '1/8 "%u\n"')
dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock

3. try to reopen the cache device

dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \
/dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"

Kernel logs:

(snip)
device-mapper: array: array_block_check failed: blocknr 0 != wanted 10
device-mapper: block manager: array validator check failed for block 10
device-mapper: array: get_ablock failed
device-mapper: cache metadata: dm_array_cursor_next for mapping failed
------------[ cut here ]------------
kernel BUG at drivers/md/dm-bufio.c:638!

Fix by setting the cached block pointer to NULL on errors.
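
The double release can be modelled in a few lines.  This is a sketch
under simplifying assumptions, not the dm-array code: BlockManager,
Cursor, the clear_on_error switch and run() are hypothetical, and a
RuntimeError stands in for the BUG_ON in dm-bufio.

```python
class BlockManager:
    def __init__(self):
        self.held = 0  # number of blocks currently held

    def read_lock(self, good):
        self.held += 1
        if not good:
            self.held -= 1   # implicit release on a validation failure
            return False
        return True

    def unlock(self):
        if self.held == 0:
            raise RuntimeError("double release")
        self.held -= 1


class Cursor:
    def __init__(self, bm, clear_on_error):
        self.bm = bm
        self.clear_on_error = clear_on_error
        self.block = None    # cached block pointer

    def load(self, good):
        ok = self.bm.read_lock(good)
        self.block = "block"          # output pointer is left behind either way
        if not ok and self.clear_on_error:
            self.block = None         # the fix: forget the invalid pointer
        return ok

    def end(self):
        if self.block is not None:
            self.bm.unlock()
            self.block = None


def run(clear_on_error):
    cursor = Cursor(BlockManager(), clear_on_error)
    cursor.load(good=False)           # reading a faulty array block
    try:
        cursor.end()
    except RuntimeError as err:
        return str(err)
    return "ok"


print(run(clear_on_error=False))  # double release
print(run(clear_on_error=True))   # ok
```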

In addition to the reproducer described above, this fix can be
verified using the "array_cursor/damaged" test in dm-unit:
  dm-unit run /pdata/array_cursor/damaged --kernel-dir &lt;KERNEL_DIR&gt;

Signed-off-by: Ming-Hung Tsai &lt;mtsai@redhat.com&gt;
Fixes: fdd1315aa5f0 ("dm array: introduce cursor api")
Reviewed-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm: Fix dm-zoned-reclaim zone write pointer alignment</title>
<updated>2024-12-10T16:15:33Z</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2024-12-09T12:23:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b76b840fd93374240b59825f1ab8e2f5c9907acb'/>
<id>urn:sha1:b76b840fd93374240b59825f1ab8e2f5c9907acb</id>
<content type='text'>
The zone reclaim processing of the dm-zoned device mapper uses
blkdev_issue_zeroout() to align the write pointer of a zone being used
for reclaiming another zone, to write the valid data blocks from the
zone being reclaimed at the same position relative to the zone start in
the reclaim target zone.

The first call to blkdev_issue_zeroout() will try to use hardware
offload using a REQ_OP_WRITE_ZEROES operation if the device reports a
non-zero max_write_zeroes_sectors queue limit. If this operation fails
because of the lack of hardware support, blkdev_issue_zeroout() falls
back to using a regular write operation with the zero-page as buffer.
Currently, such REQ_OP_WRITE_ZEROES failure is automatically handled by
the block layer zone write plugging code which will execute a report
zones operation to ensure that the write pointer of the target zone of
the failed operation has not changed and to "rewind" the zone write
pointer offset of the target zone as it was advanced when the write zero
operation was submitted. So the REQ_OP_WRITE_ZEROES failure does not
cause any issue and blkdev_issue_zeroout() works as expected.

However, since the automatic recovery of zone write pointers by the zone
write plugging code can potentially cause deadlocks with queue freeze
operations, a different recovery must be implemented in preparation for
the removal of zone write plugging report zones based recovery.

Do this by introducing the new function blk_zone_issue_zeroout(). This
function first calls blkdev_issue_zeroout() with the flag
BLKDEV_ZERO_NOFALLBACK to intercept failures on the first execution
which attempt to use the device hardware offload with the
REQ_OP_WRITE_ZEROES operation. If this attempt fails, a report zone
operation is issued to restore the zone write pointer offset of the
target zone to the correct position and blkdev_issue_zeroout() is called
again without the BLKDEV_ZERO_NOFALLBACK flag. The report zones
operation performing this recovery is implemented using the helper
function disk_zone_sync_wp_offset() which calls the gendisk report_zones
file operation with the callback disk_report_zones_cb(). This callback
updates the target write pointer offset of the target zone using the new
function disk_zone_wplug_sync_wp_offset().
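
The retry flow described above can be sketched as follows.  This is a
model of the control flow only, not the block layer code: the zeroout
and sync_wp_offset callables are hypothetical stand-ins for
blkdev_issue_zeroout() and disk_zone_sync_wp_offset(), and their return
conventions are simplified to 0 for success and a negative errno for
failure.

```python
def zone_issue_zeroout(zeroout, sync_wp_offset):
    """Model of the retry flow; both arguments are hypothetical callables.

    zeroout(nofallback) returns 0 on success or a negative errno.
    sync_wp_offset() returns 0 once the write pointer offset is restored.
    """
    ret = zeroout(True)        # first try: hardware offload only
    if ret == 0:
        return 0
    ret = sync_wp_offset()     # restore the zone write pointer offset
    if ret != 0:
        return ret
    return zeroout(False)      # second try: regular writes are allowed


def demo():
    calls = []

    def zeroout(nofallback):
        calls.append(("zeroout", nofallback))
        return -95 if nofallback else 0   # -95 is -EOPNOTSUPP on Linux

    def sync_wp_offset():
        calls.append(("sync",))
        return 0

    return zone_issue_zeroout(zeroout, sync_wp_offset), calls


print(demo())
```

In this model a device without offload support goes through all three
steps: the failed offload attempt, the write pointer resync, and the
second zeroout call that may fall back to regular writes.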

dmz_reclaim_align_wp() is modified to change its call to
blkdev_issue_zeroout() to a call to blk_zone_issue_zeroout() without any
other change needed as the two functions are functionally equivalent.

Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Link: https://lore.kernel.org/r/20241209122357.47838-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: revert replacing IS_ERR_OR_NULL with IS_ERR again</title>
<updated>2024-12-03T22:06:27Z</updated>
<author>
<name>Liequan Che</name>
<email>cheliequan@inspur.com</email>
</author>
<published>2024-12-02T11:56:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b2e382ae12a63560fca35050498e19e760adf8c0'/>
<id>urn:sha1:b2e382ae12a63560fca35050498e19e760adf8c0</id>
<content type='text'>
Commit 028ddcac477b ("bcache: Remove unnecessary NULL point check in
node allocations") leads to a NULL pointer dereference in cache_set_flush().

1721         if (!IS_ERR_OR_NULL(c-&gt;root))
1722                 list_add(&amp;c-&gt;root-&gt;list, &amp;c-&gt;btree_cache);

From the above code in cache_set_flush(), if the earlier registration code
fails before allocating c-&gt;root, c-&gt;root can still be NULL, which is its
initial value. __bch_btree_node_alloc() never returns NULL, but
c-&gt;root can still be NULL at line 1721 above.

This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this.
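
The distinction matters because a NULL pointer is not an error pointer.
A small model, assuming 64-bit pointers and the kernel's MAX_ERRNO of
4095; the Python function names are stand-ins for the C macros, not
real APIs:

```python
MAX_ERRNO = 4095
WORD = 2 ** 64  # assumption: 64-bit pointers


def err_ptr(errno):
    """Encode a positive errno the way ERR_PTR(-errno) would."""
    return (-errno) % WORD


def is_err(ptr):
    """True only for the top MAX_ERRNO values of the address space."""
    return ptr in range(WORD - MAX_ERRNO, WORD)


def is_err_or_null(ptr):
    return ptr == 0 or is_err(ptr)


NULL = 0
ENOMEM = 12

print(is_err(NULL))             # False: a NULL root passes an IS_ERR() check
print(is_err_or_null(NULL))     # True
print(is_err(err_ptr(ENOMEM)))  # True
```

A check of the form "not IS_ERR()" therefore lets a NULL c-&#62;root
through to list_add(), while "not IS_ERR_OR_NULL()" rejects it.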

Fixes: 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations")
Signed-off-by: Liequan Che &lt;cheliequan@inspur.com&gt;
Cc: stable@vger.kernel.org
Cc: Zheng Wang &lt;zyytlz.wz@163.com&gt;
Reviewed-by: Mingzhe Zou &lt;mingzhe.zou@easystack.cn&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20241202115638.28957-1-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
