summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorLines
3 daysMerge tag 'nfs-for-7.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds-2/+8
Pull NFS client fixes from Anna Schumaker: - Fix NFS KConfig typos - Decrement re_receiving on the early exit paths - return EISDIR on nfs3_proc_create if d_alias is a dir * tag 'nfs-for-7.0-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix NFS KConfig typos xprtrdma: Decrement re_receiving on the early exit paths nfs: return EISDIR on nfs3_proc_create if d_alias is a dir
3 daysMerge tag 'for-7.0-rc3-tag' of ↵Linus Torvalds-9/+126
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - detect possible file name hash collision earlier so it does not lead to transaction abort - handle b-tree leaf overflows when snapshotting a subvolume with set received UUID, leading to transaction abort - in zoned mode, reorder relocation block group initialization after the transaction kthread start - fix orphan cleanup state tracking of subvolume, this could lead to invalid dentries under some conditions - add locking around updates of dynamic reclain state update - in subpage mode, add missing RCU unlock when trying to releae extent buffer - remap tree fixes: - add missing description strings for the newly added remap tree - properly update search key when iterating backrefs * tag 'for-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: remove duplicated definition of btrfs_printk_in_rcu() btrfs: remove unnecessary transaction abort in the received subvol ioctl btrfs: abort transaction on failure to update root in the received subvol ioctl btrfs: fix transaction abort on set received ioctl due to item overflow btrfs: fix transaction abort when snapshotting received subvolumes btrfs: fix transaction abort on file creation due to name hash collision btrfs: read key again after incrementing slot in move_existing_remaps() btrfs: add missing RCU unlock in error path in try_release_subpage_extent_buffer() btrfs: set BTRFS_ROOT_ORPHAN_CLEANUP during subvol create btrfs: zoned: move btrfs_zoned_reserve_data_reloc_bg() after kthread start btrfs: hold space_info->lock when clearing periodic reclaim ready btrfs: print-tree: add remap tree definitions
3 daysMerge tag 'net-7.0-rc4' of ↵Linus Torvalds-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN and netfilter. Current release - regressions: - eth: mana: Null service_wq on setup error to prevent double destroy Previous releases - regressions: - nexthop: fix percpu use-after-free in remove_nh_grp_entry - sched: teql: fix NULL pointer dereference in iptunnel_xmit on TEQL slave xmit - bpf: fix nd_tbl NULL dereference when IPv6 is disabled - neighbour: restore protocol != 0 check in pneigh update - tipc: fix divide-by-zero in tipc_sk_filter_connect() - eth: - mlx5: - fix crash when moving to switchdev mode - fix DMA FIFO desync on error CQE SQ recovery - iavf: fix PTP use-after-free during reset - bonding: fix type confusion in bond_setup_by_slave() - lan78xx: fix WARN in __netif_napi_del_locked on disconnect Previous releases - always broken: - core: add xmit recursion limit to tunnel xmit functions - net-shapers: don't free reply skb after genlmsg_reply() - netfilter: - fix stack out-of-bounds read in pipapo_drop() - fix OOB read in nfnl_cthelper_dump_table() - mctp: - fix device leak on probe failure - i2c: fix skb memory leak in receive path - can: keep the max bitrate error at 5% - eth: - bonding: fix nd_tbl NULL dereference when IPv6 is disabled - bnxt_en: fix RSS table size check when changing ethtool channels - amd-xgbe: prevent CRC errors during RX adaptation with AN disabled - octeontx2-af: devlink: fix NIX RAS reporter recovery condition" * tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits) net: prevent NULL deref in ip[6]tunnel_xmit() octeontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt status octeontx2-af: devlink: fix NIX RAS reporter recovery condition net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support net/mana: Null service_wq on setup error to prevent double destroy selftests: rtnetlink: add neighbour update test neighbour: restore protocol != 0 check in pneigh update net: dsa: realtek: Fix LED group port bit for non-zero LED group tipc: fix divide-by-zero in tipc_sk_filter_connect() net: dsa: microchip: Fix error path in PTP IRQ setup bpf: bpf_out_neigh_v6: Fix nd_tbl NULL dereference when IPv6 is disabled bpf: bpf_out_neigh_v4: Fix nd_tbl NULL dereference when IPv6 is disabled net: bonding: Fix nd_tbl NULL dereference when IPv6 is disabled ipv6: move the disable_ipv6_mod knob to core code net: bcmgenet: fix broken EEE by converting to phylib-managed state net-shapers: don't free reply skb after genlmsg_reply() net: dsa: mxl862xx: don't set user_mii_bus net: ethernet: arc: emac: quiesce interrupts before requesting IRQ page_pool: store detach_time as ktime_t to avoid false-negatives net: macb: Shuffle the tx ring before enabling tx ...
7 daysksmbd: Don't log keys in SMB3 signing and encryption key generationThorsten Blum-20/+2
When KSMBD_DEBUG_AUTH logging is enabled, generate_smb3signingkey() and generate_smb3encryptionkey() log the session, signing, encryption, and decryption key bytes. Remove the logs to avoid exposing credentials. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
7 dayssmb: server: fix use-after-free in smb2_open()Marios Makassikis-3/+2
The opinfo pointer obtained via rcu_dereference(fp->f_opinfo) is dereferenced after rcu_read_unlock(), creating a use-after-free window. Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
7 daysksmbd: fix use-after-free in smb_lazy_parent_lease_break_close()Namjae Jeon-2/+4
opinfo pointer obtained via rcu_dereference(fp->f_opinfo) is being accessed after rcu_read_unlock() has been called. This creates a race condition where the memory could be freed by a concurrent writer between the unlock and the subsequent pointer dereferences (opinfo->is_lease, etc.), leading to a use-after-free. Fixes: 5fb282ba4fef ("ksmbd: fix possible null-deref in smb_lazy_parent_lease_break_close") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
7 daysksmbd: fix use-after-free by using call_rcu() for oplock_infoNamjae Jeon-10/+24
ksmbd currently frees oplock_info immediately using kfree(), even though it is accessed under RCU read-side critical sections in places like opinfo_get() and proc_show_files(). Since there is no RCU grace period delay between nullifying the pointer and freeing the memory, a reader can still access oplock_info structure after it has been freed. This can leads to a use-after-free especially in opinfo_get() where atomic_inc_not_zero() is called on already freed memory. Fix this by switching to deferred freeing using call_rcu(). Fixes: 18b4fac5ef17 ("ksmbd: fix use-after-free in smb_break_all_levII_oplock()") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
7 daysksmbd: fix use-after-free in proc_show_files due to early rcu_read_unlockAli Khaledi-5/+5
The opinfo pointer obtained via rcu_dereference(fp->f_opinfo) is dereferenced after rcu_read_unlock(), creating a use-after-free window. A concurrent opinfo_put() can free the opinfo between the unlock and the subsequent access to opinfo->is_lease, opinfo->o_lease->state, and opinfo->level. Fix this by deferring rcu_read_unlock() until after all opinfo field accesses are complete. The values needed (const_names, count, level) are copied into local variables under the RCU read lock, and the potentially-sleeping seq_printf calls happen after the lock is released. Found by AI-assisted code review (Claude Opus 4.6, Anthropic) in collaboration with Ali Khaledi. Cc: stable@vger.kernel.org Fixes: b38f99c1217a ("ksmbd: add procfs interface for runtime monitoring and statistics") Signed-off-by: Ali Khaledi <ali.khaledi1989@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
7 dayssmb/server: Fix another refcount leak in smb2_open()Guenter Roeck-1/+2
If ksmbd_override_fsids() fails, we jump to err_out2. At that point, fp is NULL because it hasn't been assigned dh_info.fp yet, so ksmbd_fd_put(work, fp) will not be called. However, dh_info.fp was already inserted into the session file table by ksmbd_reopen_durable_fd(), so it will leak in the session file table until the session is closed. Move fp = dh_info.fp; ahead of the ksmbd_override_fsids() check to fix the problem. Found by an experimental AI code review agent at Google. Fixes: c8efcc786146a ("ksmbd: add support for durable handles v1/v2") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
9 daysrxrpc, afs: Fix missing error pointer check after rxrpc_kernel_lookup_peer()Miaoqian Lin-4/+4
rxrpc_kernel_lookup_peer() can also return error pointers in addition to NULL, so just checking for NULL is not sufficient. Fix this by: (1) Changing rxrpc_kernel_lookup_peer() to return -ENOMEM rather than NULL on allocation failure. (2) Making the callers in afs use IS_ERR() and PTR_ERR() to pass on the error code returned. Fixes: 72904d7b9bfb ("rxrpc, afs: Allow afs to pin rxrpc_peer objects") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/368272.1772713861@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysMerge tag 'v7.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds-58/+107
Pull smb client fixes from Steve French: - Fix potential oops on open failure - Fix unmount to better free deferred closes - Use proper constant-time MAC comparison function - Two buffer allocation size fixes - Two minor cleanups - make SMB2 kunit tests a distinct module * tag 'v7.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix oops due to uninitialised var in smb2_unlink() cifs: open files should not hold ref on superblock smb: client: Compare MACs in constant time smb/client: remove unused SMB311_posix_query_info() smb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info() smb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op() smb: update some doc references smb/client: make SMB2 maperror KUnit tests a separate module
10 dayssmb: client: fix oops due to uninitialised var in smb2_unlink()Paulo Alcantara-1/+3
If SMB2_open_init() or SMB2_close_init() fails (e.g. reconnect), the iovs set @rqst will be left uninitialised, hence calling SMB2_open_free(), SMB2_close_free() or smb2_set_related() on them will oops. Fix this by initialising @close_iov and @open_iov before setting them in @rqst. Reported-by: Thiago Becker <tbecker@redhat.com> Fixes: 1cf9f2a6a544 ("smb: client: handle unlink(2) of files open by different clients") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
10 daysMerge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds-0/+3
Pull fsverity fix from Eric Biggers: "Prevent CONFIG_FS_VERITY from being enabled when the page size is 256K, since it doesn't work in that case" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: add dependency on 64K or smaller pages
11 daysMerge tag 'vfs-7.0-rc3.fixes' of ↵Linus Torvalds-87/+265
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - kthread: consolidate kthread exit paths to prevent use-after-free - iomap: - don't mark folio uptodate if read IO has bytes pending - don't report direct-io retries to fserror - reject delalloc mappings during writeback - ns: tighten visibility checks - netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence * tag 'vfs-7.0-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: reject delalloc mappings during writeback iomap: don't mark folio uptodate if read IO has bytes pending selftests: fix mntns iteration selftests nstree: tighten permission checks for listing nsfs: tighten permission checks for handle opening nsfs: tighten permission checks for ns iteration ioctls netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence kthread: consolidate kthread exit paths to prevent use-after-free iomap: don't report direct-io retries to fserror
12 dayscifs: open files should not hold ref on superblockShyam Prasad N-13/+50
Today whenever we deal with a file, in addition to holding a reference on the dentry, we also get a reference on the superblock. This happens in two cases: 1. when a new cinode is allocated 2. when an oplock break is being processed The reasoning for holding the superblock ref was to make sure that when umount happens, if there are users of inodes and dentries, it does not try to clean them up and wait for the last ref to superblock to be dropped by last of such users. But the side effect of doing that is that umount silently drops a ref on the superblock and we could have deferred closes and lease breaks still holding these refs. Ideally, we should ensure that all of these users of inodes and dentries are cleaned up at the time of umount, which is what this code is doing. This code change allows these code paths to use a ref on the dentry (and hence the inode). That way, umount is ensured to clean up SMB client resources when it's the last ref on the superblock (For ex: when same objects are shared). The code change also moves the call to close all the files in deferred close list to the umount code path. It also waits for oplock_break workers to be flushed before calling kill_anon_super (which eventually frees up those objects). Fixes: 24261fc23db9 ("cifs: delay super block destruction until all cifsFileInfo objects are gone") Fixes: 705c79101ccf ("smb: client: fix use-after-free in cifs_oplock_break") Cc: <stable@vger.kernel.org> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
12 daysiomap: reject delalloc mappings during writebackDarrick J. Wong-6/+7
Filesystems should never provide a delayed allocation mapping to writeback; they're supposed to allocate the space before replying. This can lead to weird IO errors and crashes in the block layer if the filesystem is being malicious, or if it hadn't set iomap->dev because it's a delalloc mapping. Fix this by failing writeback on delalloc mappings. Currently no filesystems actually misbehave in this manner, but we ought to be stricter about things like that. Cc: stable@vger.kernel.org # v5.5 Fixes: 598ecfbaa742ac ("iomap: lift the xfs writeback code to iomap") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20260302173002.GL13829@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
12 daysiomap: don't mark folio uptodate if read IO has bytes pendingJoanne Koong-3/+12
If a folio has ifs metadata attached to it and the folio is partially read in through an async IO helper with the rest of it then being read in through post-EOF zeroing or as inline data, and the helper successfully finishes the read first, then post-EOF zeroing / reading inline will mark the folio as uptodate in iomap_set_range_uptodate(). This is a problem because when the read completion path later calls iomap_read_end(), it will call folio_end_read(), which sets the uptodate bit using XOR semantics. Calling folio_end_read() on a folio that was already marked uptodate clears the uptodate bit. Fix this by not marking the folio as uptodate if the read IO has bytes pending. The folio uptodate state will be set in the read completion path through iomap_end_read() -> folio_end_read(). Reported-by: Wei Gao <wegao@suse.com> Suggested-by: Sasha Levin <sashal@kernel.org> Tested-by: Wei Gao <wegao@suse.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: stable@vger.kernel.org # v6.19 Link: https://lore.kernel.org/linux-fsdevel/aYbmy8JdgXwsGaPP@autotest-wegao.qe.prg2.suse.org/ Fixes: b2f35ac4146d ("iomap: add caller-provided callbacks for read and readahead") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260303233420.874231-2-joannelkoong@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
12 dayssmb: client: Compare MACs in constant timeEric Biggers-2/+5
To prevent timing attacks, MAC comparisons need to be constant-time. Replace the memcmp() with the correct function, crypto_memneq(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
12 dayssmb/client: remove unused SMB311_posix_query_info()ZhangGuoDong-21/+0
It is currently unused, as now we are doing compounding instead (see smb2_query_path_info()). Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
12 dayssmb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info()ZhangGuoDong-1/+1
SMB311_posix_query_info() is currently unused, but it may still be used in some stable versions, so these changes are submitted as a separate patch. Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer, so the allocated buffer matches the actual struct size. Fixes: b1bc1874b885 ("smb311: Add support for SMB311 query info (non-compounded)") Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
12 dayssmb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op()ZhangGuoDong-2/+2
Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer, so the allocated buffer matches the actual struct size. Fixes: 6a5f6592a0b6 ("SMB311: Add support for query info using posix extensions (level 100)") Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
13 daysMerge tag 'for-7.0-rc2-tag' of ↵Linus Torvalds-28/+67
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "One-liner or short fixes for minor/moderate problems reported recently: - fixes or level adjustments of error messages - fix leaked transaction handles after aborted transactions, when using the remap tree feature - fix a few leaked chunk maps after errors - fix leaked page array in io_uring encoded read if an error occurs and the 'finished' is not called - fix double release of reserved extents when doing a range COW - don't commit super block when the filesystem is in shutdown state - fix squota accounting condition when checking members vs parent usage - other error handling fixes" * tag 'for-7.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: check block group lookup in remove_range_from_remap_tree() btrfs: fix transaction handle leaks in btrfs_last_identity_remap_gone() btrfs: fix chunk map leak in btrfs_map_block() after btrfs_translate_remap() btrfs: fix chunk map leak in btrfs_map_block() after btrfs_chunk_map_num_copies() btrfs: fix compat mask in error messages in btrfs_check_features() btrfs: print correct subvol num if active swapfile prevents deletion btrfs: fix warning in scrub_verify_one_metadata() btrfs: fix objectid value in error message in check_extent_data_ref() btrfs: fix incorrect key offset in error message in check_dev_extent_item() btrfs: fix error message order of parameters in btrfs_delete_delayed_dir_index() btrfs: don't commit the super block when unmounting a shutdown filesystem btrfs: free pages on error in btrfs_uring_read_extent() btrfs: fix referenced/exclusive check in squota_check_parent_usage() btrfs: remove pointless WARN_ON() in cache_save_setup() btrfs: convert log messages to error level in btrfs_replay_log() btrfs: remove btrfs_handle_fs_error() after failure to recover log trees btrfs: remove redundant warning message in btrfs_check_uuid_tree() btrfs: change warning messages to error level in open_ctree() btrfs: fix a double release on reserved extents in cow_one_range() btrfs: handle discard errors in in btrfs_finish_extent_commit()
13 daysbtrfs: remove duplicated definition of btrfs_printk_in_rcu()Filipe Manana-3/+0
It's defined twice in a row for the !CONFIG_PRINTK case, so remove one of the duplicates. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: remove unnecessary transaction abort in the received subvol ioctlFilipe Manana-1/+0
If we fail to remove an item from the uuid tree, we don't need to abort the transaction since we have not done any change before. So remove that transaction abort. Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: abort transaction on failure to update root in the received subvol ioctlFilipe Manana-1/+2
If we failed to update the root we don't abort the transaction, which is wrong since we already used the transaction to remove an item from the uuid tree. Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree") CC: stable@vger.kernel.org # 3.12+ Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: fix transaction abort on set received ioctl due to item overflowFilipe Manana-2/+59
If the set received ioctl fails due to an item overflow when attempting to add the BTRFS_UUID_KEY_RECEIVED_SUBVOL we have to abort the transaction since we did some metadata updates before. This means that if a user calls this ioctl with the same received UUID field for a lot of subvolumes, we will hit the overflow, trigger the transaction abort and turn the filesystem into RO mode. A malicious user could exploit this, and this ioctl does not even requires that a user has admin privileges (CAP_SYS_ADMIN), only that he/she owns the subvolume. Fix this by doing an early check for item overflow before starting a transaction. This is also race safe because we are holding the subvol_sem semaphore in exclusive (write) mode. A test case for fstests will follow soon. Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree") CC: stable@vger.kernel.org # 3.12+ Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: fix transaction abort when snapshotting received subvolumesFilipe Manana-0/+16
Currently a user can trigger a transaction abort by snapshotting a previously received snapshot a bunch of times until we reach a BTRFS_UUID_KEY_RECEIVED_SUBVOL item overflow (the maximum item size we can store in a leaf). This is very likely not common in practice, but if it happens, it turns the filesystem into RO mode. The snapshot, send and set_received_subvol and subvol_setflags (used by receive) don't require CAP_SYS_ADMIN, just inode_owner_or_capable(). A malicious user could use this to turn a filesystem into RO mode and disrupt a system. Reproducer script: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi # Use smallest node size to make the test faster. mkfs.btrfs -f --nodesize 4K $DEV mount $DEV $MNT # Create a subvolume and set it to RO so that it can be used for send. btrfs subvolume create $MNT/sv touch $MNT/sv/foo btrfs property set $MNT/sv ro true # Send and receive the subvolume into snaps/sv. mkdir $MNT/snaps btrfs send $MNT/sv | btrfs receive $MNT/snaps # Now snapshot the received subvolume, which has a received_uuid, a # lot of times to trigger the leaf overflow. total=500 for ((i = 1; i <= $total; i++)); do echo -ne "\rCreating snapshot $i/$total" btrfs subvolume snapshot -r $MNT/snaps/sv $MNT/snaps/sv_$i > /dev/null done echo umount $MNT When running the test: $ ./test.sh (...) Create subvolume '/mnt/sdi/sv' At subvol /mnt/sdi/sv At subvol sv Creating snapshot 496/500ERROR: Could not create subvolume: Value too large for defined data type Creating snapshot 497/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 498/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 499/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 500/500ERROR: Could not create subvolume: Read-only file system And in dmesg/syslog: $ dmesg (...) [251067.627338] BTRFS warning (device sdi): insert uuid item failed -75 (0x4628b21c4ac8d898, 0x2598bee2b1515c91) type 252! [251067.629212] ------------[ cut here ]------------ [251067.630033] BTRFS: Transaction aborted (error -75) [251067.630871] WARNING: fs/btrfs/transaction.c:1907 at create_pending_snapshot.cold+0x52/0x465 [btrfs], CPU#10: btrfs/615235 [251067.632851] Modules linked in: btrfs dm_zero (...) [251067.644071] CPU: 10 UID: 0 PID: 615235 Comm: btrfs Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full) [251067.646165] Tainted: [W]=WARN [251067.646733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [251067.648735] RIP: 0010:create_pending_snapshot.cold+0x55/0x465 [btrfs] [251067.649984] Code: f0 48 0f (...) [251067.653313] RSP: 0018:ffffce644908fae8 EFLAGS: 00010292 [251067.653987] RAX: 00000000ffffff01 RBX: ffff8e5639e63a80 RCX: 00000000ffffffd3 [251067.655042] RDX: ffff8e53faa76b00 RSI: 00000000ffffffb5 RDI: ffffffffc0919750 [251067.656077] RBP: ffffce644908fbd8 R08: 0000000000000000 R09: ffffce644908f820 [251067.657068] R10: ffff8e5adc1fffa8 R11: 0000000000000003 R12: ffff8e53c0431bd0 [251067.658050] R13: ffff8e5414593600 R14: ffff8e55efafd000 R15: 00000000ffffffb5 [251067.659019] FS: 00007f2a4944b3c0(0000) GS:ffff8e5b27dae000(0000) knlGS:0000000000000000 [251067.660115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [251067.660943] CR2: 00007ffc5aa57898 CR3: 00000005813a2003 CR4: 0000000000370ef0 [251067.661972] Call Trace: [251067.662292] <TASK> [251067.662653] create_pending_snapshots+0x97/0xc0 [btrfs] [251067.663413] btrfs_commit_transaction+0x26e/0xc00 [btrfs] [251067.664257] ? btrfs_qgroup_convert_reserved_meta+0x35/0x390 [btrfs] [251067.665238] ? _raw_spin_unlock+0x15/0x30 [251067.665837] ? record_root_in_trans+0xa2/0xd0 [btrfs] [251067.666531] btrfs_mksubvol+0x330/0x580 [btrfs] [251067.667145] btrfs_mksnapshot+0x74/0xa0 [btrfs] [251067.667827] __btrfs_ioctl_snap_create+0x194/0x1d0 [btrfs] [251067.668595] btrfs_ioctl_snap_create_v2+0x107/0x130 [btrfs] [251067.669479] btrfs_ioctl+0x1580/0x2690 [btrfs] [251067.670093] ? count_memcg_events+0x6d/0x180 [251067.670849] ? handle_mm_fault+0x1a0/0x2a0 [251067.671652] __x64_sys_ioctl+0x92/0xe0 [251067.672406] do_syscall_64+0x50/0xf20 [251067.673129] entry_SYSCALL_64_after_hwframe+0x76/0x7e [251067.674096] RIP: 0033:0x7f2a495648db [251067.674812] Code: 00 48 89 (...) [251067.678227] RSP: 002b:00007ffc5aa57840 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [251067.679691] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2a495648db [251067.681145] RDX: 00007ffc5aa588b0 RSI: 0000000050009417 RDI: 0000000000000004 [251067.682511] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 [251067.683842] R10: 000000000000000a R11: 0000000000000246 R12: 00007ffc5aa59910 [251067.685176] R13: 00007ffc5aa588b0 R14: 0000000000000004 R15: 0000000000000006 [251067.686524] </TASK> [251067.686972] ---[ end trace 0000000000000000 ]--- [251067.687890] BTRFS: error (device sdi state A) in create_pending_snapshot:1907: errno=-75 unknown [251067.689049] BTRFS info (device sdi state EA): forced readonly [251067.689054] BTRFS warning (device sdi state EA): Skipping commit of aborted transaction. [251067.690119] BTRFS: error (device sdi state EA) in cleanup_transaction:2043: errno=-75 unknown [251067.702028] BTRFS info (device sdi state EA): last unmount of filesystem 46dc3975-30a2-4a69-a18f-418b859cccda Fix this by ignoring -EOVERFLOW errors from btrfs_uuid_tree_add() in the snapshot creation code when attempting to add the BTRFS_UUID_KEY_RECEIVED_SUBVOL item. This is OK because it's not critical and we are still able to delete the snapshot, as snapshot/subvolume deletion ignores if a BTRFS_UUID_KEY_RECEIVED_SUBVOL is missing (see inode.c:btrfs_delete_subvolume()). As for send/receive, we can still do send/receive operations since it always peeks the first root ID in the existing BTRFS_UUID_KEY_RECEIVED_SUBVOL (it could peek any since all snapshots have the same content), and even if the key is missing, it falls back to searching by BTRFS_UUID_KEY_SUBVOL key. A test case for fstests will be sent soon. Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree") CC: stable@vger.kernel.org # 3.12+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: fix transaction abort on file creation due to name hash collisionFilipe Manana-0/+19
If we attempt to create several files with names that result in the same hash, we have to pack them in same dir item and that has a limit inherent to the leaf size. However if we reach that limit, we trigger a transaction abort and turns the filesystem into RO mode. This allows for a malicious user to disrupt a system, without the need to have administration privileges/capabilities. Reproducer: $ cat exploit-hash-collisions.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi # Use smallest node size to make the test faster and require fewer file # names that result in hash collision. mkfs.btrfs -f --nodesize 4K $DEV mount $DEV $MNT # List of names that result in the same crc32c hash for btrfs. declare -a names=( 'foobar' '%a8tYkxfGMLWRGr55QSeQc4PBNH9PCLIvR6jZnkDtUUru1t@RouaUe_L:@xGkbO3nCwvLNYeK9vhE628gss:T$yZjZ5l-Nbd6CbC$M=hqE-ujhJICXyIxBvYrIU9-TDC' 'AQci3EUB%shMsg-N%frgU:02ByLs=IPJU0OpgiWit5nexSyxZDncY6WB:=zKZuk5Zy0DD$Ua78%MelgBuMqaHGyKsJUFf9s=UW80PcJmKctb46KveLSiUtNmqrMiL9-Y0I_l5Fnam04CGIg=8@U:Z' 'CvVqJpJzueKcuA$wqwePfyu7VxuWNN3ho$p0zi2H8QFYK$7YlEqOhhb%:hHgjhIjW5vnqWHKNP4' 'ET:vk@rFU4tsvMB0$C_p=xQHaYZjvoF%-BTc%wkFW8yaDAPcCYoR%x$FH5O:' 'HwTon%v7SGSP4FE08jBwwiu5aot2CFKXHTeEAa@38fUcNGOWvE@Mz6WBeDH_VooaZ6AgsXPkVGwy9l@@ZbNXabUU9csiWrrOp0MWUdfi$EZ3w9GkIqtz7I_eOsByOkBOO' 'Ij%2VlFGXSuPvxJGf5UWy6O@1svxGha%b@=%wjkq:CIgE6u7eJOjmQY5qTtxE2Rjbis9@us' 'KBkjG5%9R8K9sOG8UTnAYjxLNAvBmvV5vz3IiZaPmKuLYO03-6asI9lJ_j4@6Xo$KZicaLWJ3Pv8XEwVeUPMwbHYWwbx0pYvNlGMO9F:ZhHAwyctnGy%_eujl%WPd4U2BI7qooOSr85J-C2V$LfY' 'NcRfDfuUQ2=zP8K3CCF5dFcpfiOm6mwenShsAb_F%n6GAGC7fT2JFFn:c35X-3aYwoq7jNX5$ZJ6hI3wnZs$7KgGi7wjulffhHNUxAT0fRRLF39vJ@NvaEMxsMO' 'Oj42AQAEzRoTxa5OuSKIr=A_lwGMy132v4g3Pdq1GvUG9874YseIFQ6QU' 'Ono7avN5GjC:_6dBJ_' 'WHmN2gnmaN-9dVDy4aWo:yNGFzz8qsJyJhWEWcud7$QzN2D9R0efIWWEdu5kwWr73NZm4=@CoCDxrrZnRITr-kGtU_cfW2:%2_am' 'WiFnuTEhAG9FEC6zopQmj-A-$LDQ0T3WULz%ox3UZAPybSV6v1Z$b4L_XBi4M4BMBtJZpz93r9xafpB77r:lbwvitWRyo$odnAUYlYMmU4RvgnNd--e=I5hiEjGLETTtaScWlQp8mYsBovZwM2k' 'XKyH=OsOAF3p%uziGF_ZVr$ivrvhVgD@1u%5RtrV-gl_vqAwHkK@x7YwlxX3qT6WKKQ%PR56NrUBU2dOAOAdzr2=5nJuKPM-T-$ZpQfCL7phxQbUcb:BZOTPaFExc-qK-gDRCDW2' 'd3uUR6OFEwZr%ns1XH_@tbxA@cCPmbBRLdyh7p6V45H$P2$F%w0RqrD3M0g8aGvWpoTFMiBdOTJXjD:JF7=h9a_43xBywYAP%r$SPZi%zDg%ql-KvkdUCtF9OLaQlxmd' 'ePTpbnit%hyNm@WELlpKzNZYOzOTf8EQ$sEfkMy1VOfIUu3coyvIr13-Y7Sv5v-Ivax2Go_GQRFMU1b3362nktT9WOJf3SpT%z8sZmM3gvYQBDgmKI%%RM-G7hyrhgYflOw%z::ZRcv5O:lDCFm' 'evqk743Y@dvZAiG5J05L_ROFV@$2%rVWJ2%3nxV72-W7$e$-SK3tuSHA2mBt$qloC5jwNx33GmQUjD%akhBPu=VJ5g$xhlZiaFtTrjeeM5x7dt4cHpX0cZkmfImndYzGmvwQG:$euFYmXn$_2rA9mKZ' 'gkgUtnihWXsZQTEkrMAWIxir09k3t7jk_IK25t1:cy1XWN0GGqC%FrySdcmU7M8MuPO_ppkLw3=Dfr0UuBAL4%GFk2$Ma10V1jDRGJje%Xx9EV2ERaWKtjpwiZwh0gCSJsj5UL7CR8RtW5opCVFKGGy8Cky' 'hNgsG_8lNRik3PvphqPm0yEH3P%%fYG:kQLY=6O-61Wa6nrV_WVGR6TLB09vHOv%g4VQRP8Gzx7VXUY1qvZyS' 'isA7JVzN12xCxVPJZ_qoLm-pTBuhjjHMvV7o=F:EaClfYNyFGlsfw-Kf%uxdqW-kwk1sPl2vhbjyHU1A6$hz' 'kiJ_fgcdZFDiOptjgH5PN9-PSyLO4fbk_:u5_2tz35lV_iXiJ6cx7pwjTtKy-XGaQ5IefmpJ4N_ZqGsqCsKuqOOBgf9LkUdffHet@Wu' 'lvwtxyhE9:%Q3UxeHiViUyNzJsy:fm38pg_b6s25JvdhOAT=1s0$pG25x=LZ2rlHTszj=gN6M4zHZYr_qrB49i=pA--@WqWLIuX7o1S_SfS@2FSiUZN' 'rC24cw3UBDZ=5qJBUMs9e$=S4Y94ni%Z8639vnrGp=0Hv4z3dNFL0fBLmQ40=EYIY:Z=SLc@QLMSt2zsss2ZXrP7j4=' 'uwGl2s-fFrf@GqS=DQqq2I0LJSsOmM%xzTjS:lzXguE3wChdMoHYtLRKPvfaPOZF2fER@j53evbKa7R%A7r4%YEkD=kicJe@SFiGtXHbKe4gCgPAYbnVn' 'UG37U6KKua2bgc:IHzRs7BnB6FD:2Mt5Cc5NdlsW%$1tyvnfz7S27FvNkroXwAW:mBZLA1@qa9WnDbHCDmQmfPMC9z-Eq6QT0jhhPpqyymaD:R02ghwYo%yx7SAaaq-:x33LYpei$5g8DMl3C' 'y2vjek0FE1PDJC0qpfnN:x8k2wCFZ9xiUF2ege=JnP98R%wxjKkdfEiLWvQzmnW' '8-HCSgH5B%K7P8_jaVtQhBXpBk:pE-$P7ts58U0J@iR9YZntMPl7j$s62yAJO@_9eanFPS54b=UTw$94C-t=HLxT8n6o9P=QnIxq-f1=Ne2dvhe6WbjEQtc' 'YPPh:IFt2mtR6XWSmjHptXL_hbSYu8bMw-JP8@PNyaFkdNFsk$M=xfL6LDKCDM-mSyGA_2MBwZ8Dr4=R1D%7-mCaaKGxb990jzaagRktDTyp' '9hD2ApKa_t_7x-a@GCG28kY:7$M@5udI1myQ$x5udtggvagmCQcq9QXWRC5hoB0o-_zHQUqZI5rMcz_kbMgvN5jr63LeYA4Cj-c6F5Ugmx6DgVf@2Jqm%MafecpgooqreJ53P-QTS' ) # Now create files with all those names in the same parent directory. # It should not fail since a 4K leaf has enough space for them. for name in "${names[@]}"; do touch $MNT/$name done # Now add one more file name that causes a crc32c hash collision. # This should fail, but it should not turn the filesystem into RO mode # (which could be exploited by malicious users) due to a transaction # abort. touch $MNT/'W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt' # Check that we are able to create another file, with a name that does not cause # a crc32c hash collision. echo -n "hello world" > $MNT/baz # Unmount and mount again, verify file baz exists and with the right content. umount $MNT mount $DEV $MNT echo "File baz content: $(cat $MNT/baz)" umount $MNT When running the reproducer: $ ./exploit-hash-collisions.sh (...) touch: cannot touch '/mnt/sdi/W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt': Value too large for defined data type ./exploit-hash-collisions.sh: line 57: /mnt/sdi/baz: Read-only file system cat: /mnt/sdi/baz: No such file or directory File baz content: And the transaction abort stack trace in dmesg/syslog: $ dmesg (...) [758240.509761] ------------[ cut here ]------------ [758240.510668] BTRFS: Transaction aborted (error -75) [758240.511577] WARNING: fs/btrfs/inode.c:6854 at btrfs_create_new_inode+0x805/0xb50 [btrfs], CPU#6: touch/888644 [758240.513513] Modules linked in: btrfs dm_zero (...) [758240.523221] CPU: 6 UID: 0 PID: 888644 Comm: touch Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full) [758240.524621] Tainted: [W]=WARN [758240.525037] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [758240.526331] RIP: 0010:btrfs_create_new_inode+0x80b/0xb50 [btrfs] [758240.527093] Code: 0f 82 cf (...) [758240.529211] RSP: 0018:ffffce64418fbb48 EFLAGS: 00010292 [758240.529935] RAX: 00000000ffffffd3 RBX: 0000000000000000 RCX: 00000000ffffffb5 [758240.531040] RDX: 0000000d04f33e06 RSI: 00000000ffffffb5 RDI: ffffffffc0919dd0 [758240.531920] RBP: ffffce64418fbc10 R08: 0000000000000000 R09: 00000000ffffffb5 [758240.532928] R10: 0000000000000000 R11: ffff8e52c0000000 R12: ffff8e53eee7d0f0 [758240.533818] R13: ffff8e57f70932a0 R14: ffff8e5417629568 R15: 0000000000000000 [758240.534664] FS: 00007f1959a2a740(0000) GS:ffff8e5b27cae000(0000) knlGS:0000000000000000 [758240.535821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [758240.536644] CR2: 00007f1959b10ce0 CR3: 000000012a2cc005 CR4: 0000000000370ef0 [758240.537517] Call Trace: [758240.537828] <TASK> [758240.538099] btrfs_create_common+0xbf/0x140 [btrfs] [758240.538760] path_openat+0x111a/0x15b0 [758240.539252] do_filp_open+0xc2/0x170 [758240.539699] ? preempt_count_add+0x47/0xa0 [758240.540200] ? __virt_addr_valid+0xe4/0x1a0 [758240.540800] ? __check_object_size+0x1b3/0x230 [758240.541661] ? alloc_fd+0x118/0x180 [758240.542315] do_sys_openat2+0x70/0xd0 [758240.543012] __x64_sys_openat+0x50/0xa0 [758240.543723] do_syscall_64+0x50/0xf20 [758240.544462] entry_SYSCALL_64_after_hwframe+0x76/0x7e [758240.545397] RIP: 0033:0x7f1959abc687 [758240.546019] Code: 48 89 fa (...) [758240.548522] RSP: 002b:00007ffe16ff8690 EFLAGS: 00000202 ORIG_RAX: 0000000000000101 [758240.566278] RAX: ffffffffffffffda RBX: 00007f1959a2a740 RCX: 00007f1959abc687 [758240.567068] RDX: 0000000000000941 RSI: 00007ffe16ffa333 RDI: ffffffffffffff9c [758240.567860] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [758240.568707] R10: 00000000000001b6 R11: 0000000000000202 R12: 0000561eec7c4b90 [758240.569712] R13: 0000561eec7c311f R14: 00007ffe16ffa333 R15: 0000000000000000 [758240.570758] </TASK> [758240.571040] ---[ end trace 0000000000000000 ]--- [758240.571681] BTRFS: error (device sdi state A) in btrfs_create_new_inode:6854: errno=-75 unknown [758240.572899] BTRFS info (device sdi state EA): forced readonly Fix this by checking for hash collision, and if the adding a new name is possible, early in btrfs_create_new_inode() before we do any tree updates, so that we don't need to abort the transaction if we cannot add the new name due to the leaf size limit. A test case for fstests will be sent soon. Fixes: caae78e03234 ("btrfs: move common inode creation code into btrfs_create_new_inode()") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: read key again after incrementing slot in move_existing_remaps()Mark Harmstone-0/+2
Fix move_existing_remaps() so that if we increment the slot because the key we encounter isn't a REMAP_BACKREF, we don't reuse the objectid and offset of the old item. Link: https://lore.kernel.org/linux-btrfs/20260125123908.2096548-1-clm@meta.com/ Reported-by: Chris Mason <clm@fb.com> Fixes: bbea42dfb91f ("btrfs: move existing remaps before relocating block group") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: add missing RCU unlock in error path in ↵Bart Van Assche-0/+1
try_release_subpage_extent_buffer() Call rcu_read_lock() before exiting the loop in try_release_subpage_extent_buffer() because there is a rcu_read_unlock() call past the loop. This has been detected by the Clang thread-safety analyzer. Fixes: ad580dfa388f ("btrfs: fix subpage deadlock in try_release_subpage_extent_buffer()") CC: stable@vger.kernel.org # 6.18+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: set BTRFS_ROOT_ORPHAN_CLEANUP during subvol createBoris Burkov-0/+7
We have recently observed a number of subvolumes with broken dentries. ls-ing the parent dir looks like: drwxrwxrwt 1 root root 16 Jan 23 16:49 . drwxr-xr-x 1 root root 24 Jan 23 16:48 .. d????????? ? ? ? ? ? broken_subvol and similarly stat-ing the file fails. In this state, deleting the subvol fails with ENOENT, but attempting to create a new file or subvol over it errors out with EEXIST and even aborts the fs. Which leaves us a bit stuck. dmesg contains a single notable error message reading: "could not do orphan cleanup -2" 2 is ENOENT and the error comes from the failure handling path of btrfs_orphan_cleanup(), with the stack leading back up to btrfs_lookup(). btrfs_lookup btrfs_lookup_dentry btrfs_orphan_cleanup // prints that message and returns -ENOENT After some detailed inspection of the internal state, it became clear that: - there are no orphan items for the subvol - the subvol is otherwise healthy looking, it is not half-deleted or anything, there is no drop progress, etc. - the subvol was created a while ago and does the meaningful first btrfs_orphan_cleanup() call that sets BTRFS_ROOT_ORPHAN_CLEANUP much later. - after btrfs_orphan_cleanup() fails, btrfs_lookup_dentry() returns -ENOENT, which results in a negative dentry for the subvolume via d_splice_alias(NULL, dentry), leading to the observed behavior. The bug can be mitigated by dropping the dentry cache, at which point we can successfully delete the subvolume if we want. i.e., btrfs_lookup() btrfs_lookup_dentry() if (!sb_rdonly(inode->vfs_inode)->vfs_inode) btrfs_orphan_cleanup(sub_root) test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP) btrfs_search_slot() // finds orphan item for inode N ... prints "could not do orphan cleanup -2" if (inode == ERR_PTR(-ENOENT)) inode = NULL; return d_splice_alias(NULL, dentry) // NEGATIVE DENTRY for valid subvolume btrfs_orphan_cleanup() does test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP) on the root when it runs, so it cannot run more than once on a given root, so something else must run concurrently. However, the obvious routes to deleting an orphan when nlinks goes to 0 should not be able to run without first doing a lookup into the subvolume, which should run btrfs_orphan_cleanup() and set the bit. The final important observation is that create_subvol() calls d_instantiate_new() but does not set BTRFS_ROOT_ORPHAN_CLEANUP, so if the dentry cache gets dropped, the next lookup into the subvolume will make a real call into btrfs_orphan_cleanup() for the first time. This opens up the possibility of concurrently deleting the inode/orphan items but most typical evict() paths will be holding a reference on the parent dentry (child dentry holds parent->d_lockref.count via dget in d_alloc(), released in __dentry_kill()) and prevent the parent from being removed from the dentry cache. The one exception is delayed iputs. Ordered extent creation calls igrab() on the inode. If the file is unlinked and closed while those refs are held, iput() in __dentry_kill() decrements i_count but does not trigger eviction (i_count > 0). The child dentry is freed and the subvol dentry's d_lockref.count drops to 0, making it evictable while the inode is still alive. Since there are two races (the race between writeback and unlink and the race between lookup and delayed iputs), and there are too many moving parts, the following three diagrams show the complete picture. (Only the second and third are races) Phase 1: Create Subvol in dentry cache without BTRFS_ROOT_ORPHAN_CLEANUP set btrfs_mksubvol() lookup_one_len() __lookup_slow() d_alloc_parallel() __d_alloc() // d_lockref.count = 1 create_subvol(dentry) // doesn't touch the bit.. d_instantiate_new(dentry, inode) // dentry in cache with d_lockref.count == 1 Phase 2: Create a delayed iput for a file in the subvol but leave the subvol in state where its dentry can be evicted (d_lockref.count == 0) T1 (task) T2 (writeback) T3 (OE workqueue) write() // dirty pages btrfs_writepages() btrfs_run_delalloc_range() cow_file_range() btrfs_alloc_ordered_extent() igrab() // i_count: 1 -> 2 btrfs_unlink_inode() btrfs_orphan_add() close() __fput() dput() finish_dput() __dentry_kill() dentry_unlink_inode() iput() // 2 -> 1 --parent->d_lockref.count // 1 -> 0; evictable finish_ordered_fn() btrfs_finish_ordered_io() btrfs_put_ordered_extent() btrfs_add_delayed_iput() Phase 3: Once the delayed iput is pending and the subvol dentry is evictable, the shrinker can free it, causing the next lookup to go through btrfs_lookup() and call btrfs_orphan_cleanup() for the first time. If the cleaner kthread processes the delayed iput concurrently, the two race: T1 (shrinker) T2 (cleaner kthread) T3 (lookup) super_cache_scan() prune_dcache_sb() __dentry_kill() // subvol dentry freed btrfs_run_delayed_iputs() iput() // i_count -> 0 evict() // sets I_FREEING btrfs_evict_inode() // truncation loop btrfs_lookup() btrfs_lookup_dentry() btrfs_orphan_cleanup() // first call (bit never set) btrfs_iget() // blocks on I_FREEING btrfs_orphan_del() // inode freed // returns -ENOENT btrfs_del_orphan_item() // -ENOENT // "could not do orphan cleanup -2" d_splice_alias(NULL, dentry) // negative dentry for valid subvol The most straightforward fix is to ensure the invariant that a dentry for a subvolume can exist if and only if that subvolume has BTRFS_ROOT_ORPHAN_CLEANUP set on its root (and is known to have no orphans or ran btrfs_orphan_cleanup()). Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: zoned: move btrfs_zoned_reserve_data_reloc_bg() after kthread startJohannes Thumshirn-1/+6
btrfs_zoned_reserve_data_reloc_bg() is called on each mount of a file system and allocates a new block-group, to assign it to be the dedicated relocation target, if no pre-existing usable block-group for this task is found. If for some reason the transaction is aborted, btrfs_end_transaction() will wake up the transaction kthread. But the transaction kthread is not yet initialized at the time btrfs_zoned_reserve_data_reloc_bg() is called, leading to the following NULL-pointer dereference: RSP: 0018:ffffc9000c617c98 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 000000000000073c RCX: 0000000000000002 RDX: 0000000000000001 RSI: 0000000000000003 RDI: 0000000000000001 RBP: 0000000000000207 R08: ffffffff8223c71d R09: 0000000000000635 R10: ffff888108588000 R11: 0000000000000003 R12: 0000000000000003 R13: 000000000000073c R14: 0000000000000000 R15: ffff888114dd6000 FS: 00007f2993745840(0000) GS:ffff8882b508d000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000073c CR3: 0000000121a82006 CR4: 0000000000770eb0 PKRU: 55555554 Call Trace: <TASK> try_to_wake_up (./include/linux/spinlock.h:557 kernel/sched/core.c:4106) __btrfs_end_transaction (fs/btrfs/transaction.c:1115 (discriminator 2)) btrfs_zoned_reserve_data_reloc_bg (fs/btrfs/zoned.c:2840) open_ctree (fs/btrfs/disk-io.c:3588) btrfs_get_tree.cold (fs/btrfs/super.c:982 fs/btrfs/super.c:1944 fs/btrfs/super.c:2087 fs/btrfs/super.c:2121) vfs_get_tree (fs/super.c:1752) __do_sys_fsconfig (fs/fsopen.c:231 fs/fsopen.c:295 fs/fsopen.c:473) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131) RIP: 0033:0x7f299392740e Move the call to btrfs_zoned_reserve_data_reloc_bg() after the transaction_kthread has been initialized to fix this problem. Fixes: 694ce5e143d6 ("btrfs: zoned: reserve data_reloc block group on mount") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: hold space_info->lock when clearing periodic reclaim readySun YangKai-1/+4
btrfs_set_periodic_reclaim_ready() requires space_info->lock to be held, as enforced by lockdep_assert_held(). However, btrfs_reclaim_sweep() was calling it after do_reclaim_sweep() returns, at which point space_info->lock is no longer held. Fix this by explicitly acquiring space_info->lock before clearing the periodic reclaim ready flag in btrfs_reclaim_sweep(). Reported-by: Chris Mason <clm@meta.com> Link: https://lore.kernel.org/linux-btrfs/20260208182556.891815-1-clm@meta.com/ Fixes: 19eff93dc738 ("btrfs: fix periodic reclaim condition") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: print-tree: add remap tree definitionsMark Harmstone-0/+10
Add the definitions for the remap tree to print-tree.c, so that we get more useful information if a tree is dumped to dmesg. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysfsverity: add dependency on 64K or smaller pagesEric Biggers-0/+3
Currently, all filesystems that support fsverity (ext4, f2fs, and btrfs) cache the Merkle tree in the pagecache at a 64K aligned offset after the end of the file data. This offset needs to be a multiple of the page size, which is guaranteed only when the page size is 64K or smaller. 64K was chosen to be the "largest reasonable page size". But it isn't the largest *possible* page size: the hexagon and powerpc ports of Linux support 256K pages, though that configuration is rarely used. For now, just disable support for FS_VERITY in these odd configurations to ensure it isn't used in cases where it would have incorrect behavior. Fixes: 671e67b47e9f ("fs-verity: add Kconfig and the helper functions for hashing") Reported-by: Christoph Hellwig <hch@lst.de> Closes: https://lore.kernel.org/r/20260119063349.GA643@lst.de Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20260221204525.30426-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
14 daysMerge tag 'nfsd-7.0-1' of ↵Linus Torvalds-14/+15
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Restore previous nfsd thread count reporting behavior - Fix credential reference leaks in the NFSD netlink admin protocol * tag 'nfsd-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: report the requested maximum number of threads instead of number running nfsd: Fix cred ref leak in nfsd_nl_listener_set_doit(). nfsd: Fix cred ref leak in nfsd_nl_threads_set_doit().
2026-03-01smb: update some doc referencesZhangGuoDong-3/+9
To make it easier to locate the documentation during development. Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-01smb/client: make SMB2 maperror KUnit tests a separate moduleChenXiaoSong-16/+38
Build the SMB2 maperror KUnit tests as `smb2maperror_test.ko`. Link: https://lore.kernel.org/linux-cifs/20260210081040.4156383-1-geert@linux-m68k.org/ Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-01Merge tag 'sched-urgent-2026-03-01' of ↵Linus Torvalds-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix zero_vruntime tracking when there's a single task running - Fix slice protection logic - Fix the ->vprot logic for reniced tasks - Fix lag clamping in mixed slice workloads - Fix objtool uaccess warning (and bug) in the !CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected un-inlining, which triggers with older compilers - Fix a comment in the rseq registration rseq_size bound check code - Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes differently, which special size we now reached naturally and want to avoid. The visible ugliness of the new reserved field will be avoided the next time the RSEQ area is extended. * tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: slice ext: Ensure rseq feature size differs from original rseq size rseq: Clarify rseq registration rseq_size bound check comment sched/core: Fix wakeup_preempt's next_class tracking rseq: Mark rseq_arm_slice_extension_timer() __always_inline sched/fair: Fix lag clamp sched/eevdf: Update se->vprot in reweight_entity() sched/fair: Only set slice protection at pick time sched/fair: Fix zero_vruntime tracking
2026-02-28Merge tag 'v7.0rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds-445/+489
Pull smb client fixes from Steve French: - Two multichannel fixes - Locking fix for superblock flags - Fix to remove debug message that could log password - Cleanup fix for setting credentials * tag 'v7.0rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: Use snprintf in cifs_set_cifscreds smb: client: Don't log plaintext credentials in cifs_set_cifscreds smb: client: fix broken multichannel with krb5+signing smb: client: use atomic_t for mnt_cifs_flags smb: client: fix cifs_pick_channel when channels are equally loaded
2026-02-27nsfs: tighten permission checks for handle openingChristian Brauner-1/+1
Even privileged services should not necessarily be able to see other privileged service's namespaces so they can't leak information to each other. Use may_see_all_namespaces() helper that centralizes this policy until the nstree adapts. Link: https://patch.msgid.link/20260226-work-visibility-fixes-v1-2-d2c2853313bd@kernel.org Fixes: 5222470b2fbb ("nsfs: support file handles") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@kernel.org # v6.18+ Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-27nsfs: tighten permission checks for ns iteration ioctlsChristian Brauner-0/+13
Even privileged services should not necessarily be able to see other privileged service's namespaces so they can't leak information to each other. Use may_see_all_namespaces() helper that centralizes this policy until the nstree adapts. Link: https://patch.msgid.link/20260226-work-visibility-fixes-v1-1-d2c2853313bd@kernel.org Fixes: a1d220d9dafa ("nsfs: iterate through mount namespaces") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@kernel.org # v6.12+ Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-27NFS: Fix NFS KConfig typosAnna Schumaker-1/+2
Two issues were noticed after the NFS v4.0 KConfig changes were merged upstream. First, the text of CONFIG_NFS_V4 should not encourage people to select it if they are unsure. Second, the new CONFIG_NFS_V4_0 option should default to "on" instead of "off" to avoid breaking people's setups if they are using NFS v4.0. Reported-by: Niklas Cassel <cassel@kernel.org> Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Fixes: 4e0269352534 ("NFS: Add a way to disable NFS v4.0 via KConfig") Fixes: 7537db24806f ("NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-27Merge tag 'xfs-fixes-7.0-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds-80/+209
Pull xfs fixes from Carlos Maiolino: "Nothing reeeally stands out here: a few bug fixes, some refactoring to easily fit the bug fixes, and a couple cosmetic changes" * tag 'xfs-fixes-7.0-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: add static size checks for ioctl UABI xfs: remove duplicate static size checks xfs: Add comments for usages of some macros. xfs: Update lazy counters in xfs_growfs_rt_bmblock() xfs: Add a comment in xfs_log_sb() xfs: Fix xfs_last_rt_bmblock() xfs: don't report half-built inodes to fserror xfs: don't report metadata inodes to fserror xfs: fix potential pointer access race in xfs_healthmon_get xfs: fix xfs_group release bug in xfs_dax_notify_dev_failure xfs: fix xfs_group release bug in xfs_verify_report_losses xfs: fix copy-paste error in previous fix xfs: Fix error pointer dereference xfs: remove metafile inodes from the active inode stat xfs: cleanup inode counter stats xfs: fix code alignment issues in xfs_ondisk.c xfs: Replace &rtg->rtg_group with rtg_group() xfs: Refactoring the nagcount and delta calculation xfs: Replace ASSERT with XFS_IS_CORRUPT in xfs_rtcopy_summary()
2026-02-27smb: client: Use snprintf in cifs_set_cifscredsThorsten Blum-7/+7
Replace unbounded sprintf() calls with the safer snprintf(). Avoid using magic numbers and use strlen() to calculate the key descriptor buffer size. Save the size in a local variable and reuse it for the bounded snprintf() calls. Remove CIFSCREDS_DESC_SIZE. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-02-26smb: client: Don't log plaintext credentials in cifs_set_cifscredsThorsten Blum-1/+0
When debug logging is enabled, cifs_set_cifscreds() logs the key payload and exposes the plaintext username and password. Remove the debug log to avoid exposing credentials. Fixes: 8a8798a5ff90 ("cifs: fetch credentials out of keyring for non-krb5 auth multiuser mounts") Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-02-26smb: client: fix broken multichannel with krb5+signingPaulo Alcantara-12/+10
When mounting a share with 'multichannel,max_channels=n,sec=krb5i', the client was duplicating signing key for all secondary channels, thus making the server fail all commands sent from secondary channels due to bad signatures. Every channel has its own signing key, so when establishing a new channel with krb5 auth, make sure to use the new session key as the derived key to generate channel's signing key in SMB2_auth_kerberos(). Repro: $ mount.cifs //srv/share /mnt -o multichannel,max_channels=4,sec=krb5i $ sleep 5 $ umount /mnt $ dmesg ... CIFS: VFS: sign fail cmd 0x5 message id 0x2 CIFS: VFS: \\srv SMB signature verification returned error = -13 CIFS: VFS: sign fail cmd 0x5 message id 0x2 CIFS: VFS: \\srv SMB signature verification returned error = -13 CIFS: VFS: sign fail cmd 0x4 message id 0x2 CIFS: VFS: \\srv SMB signature verification returned error = -13 Reported-by: Xiaoli Feng <xifeng@redhat.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2026-02-26smb: client: use atomic_t for mnt_cifs_flagsPaulo Alcantara-414/+462
Use atomic_t for cifs_sb_info::mnt_cifs_flags as it's currently accessed locklessly and may be changed concurrently in mount/remount and reconnect paths. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2026-02-26Merge tag 'v7.0-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds-5/+9
Pull smb server fixes from Steve French: - auth security improvement - fix potential buffer overflow in smbdirect negotiation * tag 'v7.0-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix signededness bug in smb_direct_prepare_negotiation() ksmbd: Compare MACs in constant time
2026-02-26Merge tag 'mm-hotfixes-stable-2026-02-26-14-14' of ↵Linus Torvalds-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "12 hotfixes. 7 are cc:stable. 8 are for MM. All are singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-02-26-14-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: update Yosry Ahmed's email address mailmap: add entry for Daniele Alessandrelli mm: fix NULL NODE_DATA dereference for memoryless nodes on boot mm/tracing: rss_stat: ensure curr is false from kthread context mm/kfence: fix KASAN hardware tag faults during late enablement mm/damon/core: disallow non-power of two min_region_sz Squashfs: check metadata block offset is within range MAINTAINERS, mailmap: update e-mail address for Vlastimil Babka liveupdate: luo_file: remember retrieve() status mm: thp: deny THP for files on anonymous inodes mm: change vma_alloc_folio_noprof() macro to inline function mm/kfence: disable KFENCE upon KASAN HW tags enablement