linux/net/ceph, branch v6.19

libceph: make calc_target() set t->paused, not just clear it

2026-01-05T23:39:43Z

Currently calc_target() clears t->paused if the request shouldn't be paused anymore, but doesn't ever set t->paused even though it's able to determine when the request should be paused. Setting t->paused is left to __submit_request() which is fine for regular requests but doesn't work for linger requests -- since __submit_request() doesn't operate on linger requests, there is nowhere for lreq->t.paused to be set. One consequence of this is that watches don't get reestablished on paused -> unpaused transitions in cases where requests have been paused long enough for the (paused) unwatch request to time out and for the subsequent (re)watch request to enter the paused state. On top of the watch not getting reestablished, rbd_reregister_watch() gets stuck with rbd_dev->watch_mutex held: rbd_register_watch __rbd_register_watch ceph_osdc_watch linger_reg_commit_wait It's waiting for lreq->reg_commit_wait to be completed, but for that to happen the respective request needs to end up on need_resend_linger list and be kicked when requests are unpaused. There is no chance for that if the request in question is never marked paused in the first place. The fact that rbd_dev->watch_mutex remains taken out forever then prevents the image from getting unmapped -- "rbd unmap" would inevitably hang in D state on an attempt to grab the mutex. Cc: stable@vger.kernel.org Reported-by: Raphael Zimmer Signed-off-by: Ilya Dryomov Reviewed-by: Viacheslav Dubeyko

libceph: reset sparse-read state in osd_fault()

2026-01-05T21:46:43Z

When a fault occurs, the connection is abandoned, reestablished, and any pending operations are retried. The OSD client tracks the progress of a sparse-read reply using a separate state machine, largely independent of the messenger's state. If a connection is lost mid-payload or the sparse-read state machine returns an error, the sparse-read state is not reset. The OSD client will then interpret the beginning of a new reply as the continuation of the old one. If this makes the sparse-read machinery enter a failure state, it may never recover, producing loops like: libceph: [0] got 0 extents libceph: data len 142248331 != extent len 0 libceph: osd0 (1)...:6801 socket error on read libceph: data len 142248331 != extent len 0 libceph: osd0 (1)...:6801 socket error on read Therefore, reset the sparse-read state in osd_fault(), ensuring retries start from a clean state. Cc: stable@vger.kernel.org Fixes: f628d7999727 ("libceph: add sparse read support to OSD client") Signed-off-by: Sam Edwards Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov

libceph: return the handler error from mon_handle_auth_done()

2026-01-05T21:46:43Z

Currently any error from ceph_auth_handle_reply_done() is propagated via finish_auth() but isn't returned from mon_handle_auth_done(). This results in higher layers learning that (despite the monitor considering us to be successfully authenticated) something went wrong in the authentication phase and reacting accordingly, but msgr2 still trying to proceed with establishing the session in the background. In the case of secure mode this can trigger a WARN in setup_crypto() and later lead to a NULL pointer dereference inside of prepare_auth_signature(). Cc: stable@vger.kernel.org Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Signed-off-by: Ilya Dryomov Reviewed-by: Viacheslav Dubeyko

libceph: make free_choose_arg_map() resilient to partial allocation

2026-01-05T12:28:26Z

free_choose_arg_map() may dereference a NULL pointer if its caller fails after a partial allocation. For example, in decode_choose_args(), if allocation of arg_map->args fails, execution jumps to the fail label and free_choose_arg_map() is called. Since arg_map->size is updated to a non-zero value before memory allocation, free_choose_arg_map() will iterate over arg_map->args and dereference a NULL pointer. To prevent this potential NULL pointer dereference and make free_choose_arg_map() more resilient, add checks for pointers before iterating. Cc: stable@vger.kernel.org Co-authored-by: Ilya Dryomov Signed-off-by: Tuo Li Reviewed-by: Viacheslav Dubeyko Signed-off-by: Ilya Dryomov

libceph: replace overzealous BUG_ON in osdmap_apply_incremental()

2026-01-05T12:28:26Z

If the osdmap is (maliciously) corrupted such that the incremental osdmap epoch is different from what is expected, there is no need to BUG. Instead, just declare the incremental osdmap to be invalid. Cc: stable@vger.kernel.org Reported-by: ziming zhang Signed-off-by: Ilya Dryomov

libceph: prevent potential out-of-bounds reads in handle_auth_done()

2026-01-05T12:28:25Z

Perform an explicit bounds check on payload_len to avoid a possible out-of-bounds access in the callout. [ idryomov: changelog ] Cc: stable@vger.kernel.org Signed-off-by: ziming zhang Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov

Merge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client

2025-12-14T03:24:10Z

Pull ceph updates from Ilya Dryomov: "We have a patch that adds an initial set of tracepoints to the MDS client from Max, a fix that hardens osdmap parsing code from myself (marked for stable) and a few assorted fixups" * tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client: rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AES ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AES libceph: make decode_pool() more resilient against corrupted osdmaps libceph: Amend checking to fix `make W=1` build breakage ceph: Amend checking to fix `make W=1` build breakage ceph: add trace points to the MDS client libceph: fix log output race condition in OSD client

libceph: make decode_pool() more resilient against corrupted osdmaps

2025-12-10T10:50:54Z

If the osdmap is (maliciously) corrupted such that the encoded length of ceph_pg_pool envelope is less than what is expected for a particular encoding version, out-of-bounds reads may ensue because the only bounds check that is there is based on that length value. This patch adds explicit bounds checks for each field that is decoded or skipped. Cc: stable@vger.kernel.org Reported-by: ziming zhang Signed-off-by: Ilya Dryomov Reviewed-by: Xiubo Li Tested-by: ziming zhang

libceph: Amend checking to fix `make W=1` build breakage

2025-12-10T10:50:54Z

In a few cases the code compares 32-bit value to a SIZE_MAX derived constant which is much higher than that value on 64-bit platforms, Clang, in particular, is not happy about this net/ceph/osdmap.c:1441:10: error: result of comparison of constant 4611686018427387891 with expression of type 'u32' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 1441 | if (len > (SIZE_MAX - sizeof(*pg)) / sizeof(u32)) | ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/ceph/osdmap.c:1624:10: error: result of comparison of constant 2305843009213693945 with expression of type 'u32' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 1624 | if (len > (SIZE_MAX - sizeof(*pg)) / (2 * sizeof(u32))) | ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by casting to size_t. Note, that possible replacement of SIZE_MAX by U32_MAX may lead to the behaviour changes on the corner cases. Signed-off-by: Andy Shevchenko Reviewed-by: Viacheslav Dubeyko Signed-off-by: Ilya Dryomov

libceph: fix log output race condition in OSD client

2025-12-10T10:50:53Z

OSD client logging has a problem in get_osd() and put_osd(). For one logging output refcount_read() is called twice. If recount value changes between both calls logging output is not consistent. This patch prints out only the resulting value. [ idryomov: don't make the log messages more verbose ] Signed-off-by: Simon Buttgereit Reviewed-by: Viacheslav Dubeyko Signed-off-by: Ilya Dryomov