linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message	Author	Lines
2026-04-13	Merge tag 'vfs-7.1-rc1.bh.metadata' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds	-7/+25
	Pull vfs buffer_head updates from Christian Brauner:
	"This cleans up the mess that has accumulated over the years in metadata buffer_head tracking for inodes. It moves the tracking into a dedicated structure in the filesystem-private part of the inode (so that we don't use private_list, private_data, and private_lock in struct address_space), and also moves a couple of other users of private_data and private_list so these are removed from struct address_space, saving 3 longs in struct inode for 99% of inodes"
	* tag 'vfs-7.1-rc1.bh.metadata' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
	  fs: Drop i_private_list from address_space
	  fs: Drop mapping_metadata_bhs from address space
	  ext4: Track metadata bhs in fs-private inode part
	  minix: Track metadata bhs in fs-private inode part
	  udf: Track metadata bhs in fs-private inode part
	  fat: Track metadata bhs in fs-private inode part
	  bfs: Track metadata bhs in fs-private inode part
	  affs: Track metadata bhs in fs-private inode part
	  ext2: Track metadata bhs in fs-private inode part
	  fs: Provide functions for handling mapping_metadata_bhs directly
	  fs: Switch inode_has_buffers() to take mapping_metadata_bhs
	  fs: Make bhs point to mapping_metadata_bhs
	  fs: Move metadata bhs tracking to a separate struct
	  fs: Fold fsync_buffers_list() into sync_mapping_buffers()
	  fs: Drop osync_buffers_list()
	  kvm: Use private inode list instead of i_private_list
	  fs: Remove i_private_data
	  aio: Stop using i_private_data and i_private_lock
	  hugetlbfs: Stop using i_private_data
	  fs: Stop using i_private_data for metadata bh tracking
	  ...
2026-04-13	Merge tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds	-5/+5
	Pull vfs i_ino updates from Christian Brauner:
	"For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode.
	This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment.
	The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully.
	With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanup for later just to keep this simple"
	* tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
	  nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group()
	  EVM: add comment describing why ino field is still unsigned long
	  vfs: remove externs from fs.h on functions modified by i_ino widening
	  treewide: fix missed i_ino format specifier conversions
	  ext4: fix signed format specifier in ext4_load_inode trace event
	  treewide: change inode->i_ino from unsigned long to u64
	  nilfs2: widen trace event i_ino fields to u64
	  f2fs: widen trace event i_ino fields to u64
	  ext4: widen trace event i_ino fields to u64
	  zonefs: widen trace event i_ino fields to u64
	  hugetlbfs: widen trace event i_ino fields to u64
	  ext2: widen trace event i_ino fields to u64
	  cachefiles: widen trace event i_ino fields to u64
	  vfs: widen trace event i_ino fields to u64
	  net: change sock.sk_ino and sock_i_ino() to u64
	  audit: widen ino fields to u64
	  vfs: widen inode hash/lookup functions to u64
2026-03-26	minix: Track metadata bhs in fs-private inode part	Jan Kara	-8/+24
	Track metadata bhs for an inode in fs-private part of the inode. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-81-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-26	minix: Sync and invalidate metadata buffers from minix_evict_inode()	Jan Kara	-0/+2
	There are only very few filesystems using generic metadata buffer head tracking and everybody is paying the overhead. When we remove this tracking for inode reclaim code .evict will start to see inodes with metadata buffers attached so write them out and prune them. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-59-jack@suse.cz Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-26	minix: Switch to generic_buffers_fsync()	Jan Kara	-2/+2
	Minix uses list of metadata bhs attached to an inode. Switch it to generic_buffers_fsync() instead of generic_file_fsync() as we'll be removing metadata bh handling from generic_file_fsync(). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-52-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-06	treewide: change inode->i_ino from unsigned long to u64	Jeff Layton	-5/+5
	On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
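The truncation and the format-specifier change described above can be illustrated with a small userspace sketch. This is not kernel code: `format_ino()` and `truncate_ino()` are hypothetical helpers, `uint32_t` stands in for a 32-bit `unsigned long`, and the cast to `unsigned long long` is only needed because userspace `uint64_t` is not guaranteed to match `%llu` the way the kernel's `u64` does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: print a 64-bit inode number with %llu, the
 * specifier the treewide conversion switches to.
 */
static int format_ino(char *buf, size_t len, uint64_t ino)
{
	return snprintf(buf, len, "ino=%llu", (unsigned long long)ino);
}

/*
 * Hypothetical helper: model what storing a 64-bit inode number in a
 * 32-bit unsigned long did. The upper 32 bits are silently dropped.
 */
static uint32_t truncate_ino(uint64_t ino)
{
	return (uint32_t)ino;
}
```

With these helpers, inode numbers 1 and 2^32 + 1 collide once truncated, which is the class of collision the commit message refers to, while the 64-bit value formats without loss.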
2026-02-25	Merge tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds	-1/+1
	Pull vfs fixes from Christian Brauner:
	- Fix an uninitialized variable in file_getattr(). The flags_valid field wasn't initialized before calling vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse
	- Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is not enabled. sysctl_hung_task_timeout_secs is 0 in that case causing spurious "waiting for writeback completion for more than 1 seconds" warnings
	- Fix a null-ptr-deref in do_statmount() when the mount is internal
	- Add missing kernel-doc description for the @private parameter in iomap_readahead()
	- Fix mount namespace creation to hold namespace_sem across the mount copy in create_new_namespace(). The previous drop-and-reacquire pattern was fragile and failed to clean up mount propagation links if the real rootfs was a shared or dependent mount
	- Fix /proc mount iteration where m->index wasn't updated when m->show() overflows, causing a restart to repeatedly show the same mount entry in a rapidly expanding mount table
	- Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the inode number is out of range
	- Fix unshare(2) when CLONE_NEWNS is set and current->fs isn't shared. copy_mnt_ns() received the live fs_struct so if a subsequent namespace creation failed the rollback would leave pwd and root pointing to detached mounts. Always allocate a new fs_struct when CLONE_NEWNS is requested
	- fserror bug fixes:
	    - Remove the unused fsnotify_sb_error() helper now that all callers have been converted to fserror_report_metadata
	    - Fix a lockdep splat in fserror_report() where igrab() takes inode::i_lock which can be held in IRQ context. Replace igrab() with a direct i_count bump since filesystems should not report inodes that are about to be freed or not yet exposed
	- Handle error pointer in procfs for try_lookup_noperm()
	- Fix an integer overflow in ep_loop_check_proc() where recursive calls returning INT_MAX would overflow when +1 is added, breaking the recursion depth check
	- Fix a misleading break in pidfs
	* tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
	  pidfs: avoid misleading break
	  eventpoll: Fix integer overflow in ep_loop_check_proc()
	  proc: Fix pointer error dereference
	  fserror: fix lockdep complaint when igrabbing inode
	  fsnotify: drop unused helper
	  unshare: fix unshare_fs() handling
	  minix: Correct errno in minix_new_inode
	  namespace: fix proc mount iteration
	  mount: hold namespace_sem across copy in create_new_namespace()
	  iomap: Describe @private in iomap_readahead()
	  statmount: Fix the null-ptr-deref in do_statmount()
	  writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK
	  fs: init flags_valid before calling vfs_fileattr_get
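The ep_loop_check_proc() item above describes a depth check that breaks when INT_MAX has 1 added to it. The sketch below shows one generic way to avoid that kind of overflow, a saturating increment. It is an illustration only and is not claimed to be the fix that was merged; `DEMO_MAX_DEPTH`, `depth_plus_one()` and `depth_too_deep()` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_MAX_DEPTH 4 /* hypothetical nesting limit for this sketch */

/*
 * Add one level to a child's reported depth. If the child already
 * reported INT_MAX ("too deep"), stay at INT_MAX instead of computing
 * INT_MAX + 1, which is undefined behaviour for a signed int.
 */
static int depth_plus_one(int child_depth)
{
	return child_depth == INT_MAX ? INT_MAX : child_depth + 1;
}

/* Returns nonzero when the resulting depth exceeds the limit. */
static int depth_too_deep(int child_depth)
{
	return depth_plus_one(child_depth) > DEMO_MAX_DEPTH;
}
```

Because the sentinel value is preserved rather than wrapped, the comparison against the limit keeps giving the expected answer.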
2026-02-21	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument	Linus Torvalds	-1/+1
	This was done entirely with mindless brute force, using
	    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
	        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
	to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument.
	Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically.
	For the same reason the 'flex' versions will be done as a separate conversion.
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21	treewide: Replace kmalloc with kmalloc_obj for non-scalar types	Kees Cook	-1/+1
	This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances:
	Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...)
	Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...)
	Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...)
	(where TYPE may also be VAR)
	The resulting allocations no longer return "void *", instead returning "TYPE *".
	Signed-off-by: Kees Cook <kees@kernel.org>
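The idea of a type-carrying allocation helper can be sketched in userspace. The macros below are hypothetical analogues, not the kernel's definitions: they take no GFP argument, accept only a type (not a variable), and the array form uses `calloc()`, which zeroes memory, unlike `kmalloc_array()`.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_entry {
	int id;
	char name[12];
};

/*
 * Hypothetical userspace analogues of the typed allocation helpers:
 * the result is already a "TYPE *" rather than a "void *".
 */
#define demo_alloc_obj(TYPE)         ((TYPE *)malloc(sizeof(TYPE)))
#define demo_alloc_objs(TYPE, COUNT) ((TYPE *)calloc((COUNT), sizeof(TYPE)))

/* Allocate count entries and number them; returns NULL on failure. */
static struct demo_entry *make_entries(size_t count)
{
	struct demo_entry *entries = demo_alloc_objs(struct demo_entry, count);

	if (!entries)
		return NULL;
	for (size_t i = 0; i < count; i++)
		entries[i].id = (int)i;
	return entries;
}
```

Because the macro result has the element type, assigning it to a pointer of a different struct type draws a compiler diagnostic, which a plain `void *` return would not.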
2026-02-18	minix: Correct errno in minix_new_inode	Jori Koolstra	-1/+1
	The cases (!j || j > sbi->s_ninodes) can never occur unless the filesystem is broken, so this should not return ENOSPC, but EFSCORRUPTED. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Link: https://patch.msgid.link/20251201122338.90568-1-jkoolstra@xs4all.nl Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
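The distinction drawn above, "no space" versus "on-disk data is inconsistent", can be sketched as follows. This is a simplified userspace model, not the minix_new_inode() code: `demo_check_new_ino()` and its `bitmap_full` parameter are hypothetical, and `DEMO_EFSCORRUPTED` assumes the Linux convention of aliasing EFSCORRUPTED to EUCLEAN.

```c
#include <assert.h>
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value, assumed here for hosts that lack it */
#endif
#define DEMO_EFSCORRUPTED EUCLEAN /* assumption: EFSCORRUPTED aliases EUCLEAN */

/*
 * j is the candidate inode number derived from the bitmap search and
 * ninodes is the inode count from the superblock. A full bitmap is a
 * genuine out-of-space condition; an out-of-range j can only come from
 * a broken filesystem.
 */
static int demo_check_new_ino(unsigned long j, unsigned long ninodes,
			      int bitmap_full)
{
	if (bitmap_full)
		return -ENOSPC;
	if (!j || j > ninodes)
		return -DEMO_EFSCORRUPTED;
	return 0;
}
```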
2026-02-09	Merge tag 'vfs-7.0-rc1.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds	-21/+29
	Pull minix update from Christian Brauner:
	"Consolidate and strengthen superblock validation in minix_check_superblock()
	The minix filesystem driver does not validate several superblock fields before using them during mount, allowing a crafted filesystem image to trigger out-of-bounds accesses (reported by syzbot)"
	* tag 'vfs-7.0-rc1.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
	  minix: Add required sanity checking to minix_check_superblock()
2026-01-19	minix: Add required sanity checking to minix_check_superblock()	Jori Koolstra	-21/+29
	The fs/minix implementation of the minix filesystem does not currently support any other value for s_log_zone_size than 0. This is also the only value supported in util-linux; see mkfs.minix.c line 511. In addition, this patch adds some sanity checking for the other minix superblock fields, and moves the minix_blocks_needed() checks for the zmap and imap also to minix_check_superblock(). This also closes a related syzbot bug report. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Link: https://patch.msgid.link/20251208153947.108343-1-jkoolstra@xs4all.nl Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: syzbot+5ad0824204c7bf9b67f2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5ad0824204c7bf9b67f2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13	uapi: promote EFSCORRUPTED and EUCLEAN to errno.h	Darrick J. Wong	-2/+0
	Stop defining these privately and instead move them to the uapi errno.h so that they become canonical instead of copy pasta. Cc: linux-api@vger.kernel.org Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402587.3490369.17659117524205214600.stgit@frogsfrogsfrogs Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-01	Merge tag 'vfs-6.19-rc1.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds	-7/+57
	Pull minix fixes from Christian Brauner:
	"Fix two syzbot corruption bugs in the minix filesystem. Syzbot fuzzes filesystems by trying to mount and manipulate deliberately corrupted images. This should not lead to BUG_ONs and WARN_ONs for easy to detect corruptions.
	- Add error handling to minix filesystem for inode corruption detection, enabling the filesystem to report such corruptions cleanly.
	- Fix a drop_nlink warning in minix_rmdir() triggered by corrupted directory link counts.
	- Fix a drop_nlink warning in minix_rename() triggered by corrupted inode link counts"
	* tag 'vfs-6.19-rc1.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
	  Fix a drop_nlink warning in minix_rename
	  Fix a drop_nlink warning in minix_rmdir
	  Add error handling to minix filesystem for inode corruption detection
2025-11-05	Fix a drop_nlink warning in minix_rename	Jori Koolstra	-0/+16
	Syzbot found a drop_nlink warning that is triggered by an easy to detect nlink corruption. This patch adds sanity checks to minix_unlink and minix_rename to prevent the warning and instead return EFSCORRUPTED to the caller. The changes were tested using the syzbot reproducer as well as local testing. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Link: https://patch.msgid.link/20251104143005.3283980-4-jkoolstra@xs4all.nl Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: syzbot+a65e824272c5f741247d@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=a65e824272c5f741247d Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-05	Fix a drop_nlink warning in minix_rmdir	Jori Koolstra	-8/+17
	Syzbot found a drop_nlink warning that is triggered by an easy to detect nlink corruption of a directory. This patch adds a sanity check to minix_rmdir to prevent the warning and instead return EFSCORRUPTED to the caller. The changes were tested using the syzbot reproducer as well as local testing. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Link: https://patch.msgid.link/20251104143005.3283980-3-jkoolstra@xs4all.nl Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: syzbot+4e49728ec1cbaf3b91d2@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=4e49728ec1cbaf3b91d2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-05	Add error handling to minix filesystem for inode corruption detection	Jori Koolstra	-0/+25
	We would like to provide early and specific warnings of filesystem corruption without running into generic WARN_ONs and BUG_ONs. Towards this goal, ext4, e.g., has a EFSCORRUPTED errno and a standardized inode corruption message format. This patch adds this errno and message format to the minix filesystem. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Link: https://patch.msgid.link/20251104143005.3283980-2-jkoolstra@xs4all.nl Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20	Coccinelle-based conversion to use ->i_state accessors	Mateusz Guzik	-1/+1
	All places were patched by coccinelle with the default expecting that ->i_lock is held, afterwards entries got fixed up by hand to use unlocked variants as needed.
	The script:
	@@
	expression inode, flags;
	@@
	- inode->i_state & flags
	+ inode_state_read(inode) & flags

	@@
	expression inode, flags;
	@@
	- inode->i_state &= ~flags
	+ inode_state_clear(inode, flags)

	@@
	expression inode, flag1, flag2;
	@@
	- inode->i_state &= ~flag1 & ~flag2
	+ inode_state_clear(inode, flag1 | flag2)

	@@
	expression inode, flags;
	@@
	- inode->i_state |= flags
	+ inode_state_set(inode, flags)

	@@
	expression inode, flags;
	@@
	- inode->i_state = flags
	+ inode_state_assign(inode, flags)

	@@
	expression inode, flags;
	@@
	- flags = inode->i_state
	+ flags = inode_state_read(inode)

	@@
	expression inode, flags;
	@@
	- READ_ONCE(inode->i_state) & flags
	+ inode_state_read(inode) & flags

	Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
	Signed-off-by: Christian Brauner <brauner@kernel.org>
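The shape of the accessors that the script substitutes for open-coded ->i_state manipulation can be sketched in userspace. Everything below is a hypothetical model: `struct demo_inode`, the `DEMO_I_*` flags and the `demo_state_*()` functions only mirror the read/set/clear/assign split, and they omit the locking expectations that the commit message says the kernel variants carry.

```c
#include <assert.h>

/* Hypothetical stand-ins for the inode and two of its state flags. */
struct demo_inode {
	unsigned int i_state;
};

#define DEMO_I_NEW   (1u << 0)
#define DEMO_I_DIRTY (1u << 1)

/* Read the whole state word; callers mask it with the flags they want. */
static unsigned int demo_state_read(const struct demo_inode *inode)
{
	return inode->i_state;
}

/* Replaces "inode->i_state |= flags". */
static void demo_state_set(struct demo_inode *inode, unsigned int flags)
{
	inode->i_state |= flags;
}

/* Replaces "inode->i_state &= ~flags"; several flags can be OR-ed together. */
static void demo_state_clear(struct demo_inode *inode, unsigned int flags)
{
	inode->i_state &= ~flags;
}

/* Replaces "inode->i_state = flags". */
static void demo_state_assign(struct demo_inode *inode, unsigned int flags)
{
	inode->i_state = flags;
}
```

Funnelling every access through a few functions gives one place to add checks later, which direct field access scattered across the tree does not.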
2025-08-19	minixfs: Verify inode mode when loading from disk	Tetsuo Handa	-1/+7
	The inode mode loaded from corrupted disk can be invalid. Do like what commit 0a9e74051313 ("isofs: Verify inode mode when loading from disk") does. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/ec982681-84b8-4624-94fa-8af15b77cbd2@I-love.SAKURA.ne.jp Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-28	Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds	-1/+1
	Pull mmap_prepare updates from Christian Brauner:
	"Last cycle we introduced f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback").
	This is preferred to the existing f_op->mmap() hook as it does not require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations.
	This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state.
	Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past.
	The intent is to gradually deprecate f_op->mmap, and in that vein this series converts the majority of file systems to using f_op->mmap_prepare.
	Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted.
	Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed.
	As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems.
	We additionally do not update resctrl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, sysfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later"
	* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
	  doc: update porting, vfs documentation to describe mmap_prepare()
	  fs: replace mmap hook with .mmap_prepare for simple mappings
	  fs: convert most other generic_file_*mmap() users to .mmap_prepare()
	  fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
	  mm/filemap: introduce generic_file_*_mmap_prepare() helpers
	  fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
	  fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
	  fs/dax: make it possible to check dev dax support without a VMA
	  fs: consistently use can_mmap_file() helper
	  mm/nommu: use file_has_valid_mmap_hooks() helper
	  mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-16	fs: change write_begin/write_end interface to take struct kiocb *	Taotao Chen	-3/+4
	Change the address_space_operations callbacks write_begin() and write_end() to take struct kiocb * as the first argument instead of struct file *. Update all affected function prototypes, implementations, call sites, and related documentation across VFS, filesystems, and block layer. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-24	fs: Remove three arguments from block_write_end()	Matthew Wilcox (Oracle)	-1/+1
	block_write_end() looks like it can be used as a ->write_end() implementation. However, it can't as it does not unlock nor put the folio. Since it does not use the 'file', 'mapping' nor 'fsdata' arguments, remove them. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/20250624132130.1590285-1-willy@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-17	fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()	Lorenzo Stoakes	-1/+1
	Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"), the f_op->mmap() hook has been deprecated in favour of f_op->mmap_prepare(). We have provided generic .mmap_prepare() equivalents, so update all file systems that specify these directly in their file_operations structures. This updates 9p, adfs, affs, bfs, fat, hfs, hfsplus, hostfs, hpfs, jffs2, jfs, minix, omfs, ramfs and ufs file systems directly. It updates generic_ro_fops which impacts qnx4, cramfs, befs, squashfs, freevxfs, qnx6, efs, romfs, erofs and isofs file systems. There are remaining file systems which use generic hooks in a less direct way which we address in a subsequent commit. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/c7dc90e44a9e75e750939ea369290d6e441a18e6.1750099179.git.lorenzo.stoakes@oracle.com Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27	Change inode_operations.mkdir to return struct dentry *	NeilBrown	-4/+4
	Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns.
	This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed.
	This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir.
	To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are:
	  NULL - the directory was created and no other dentry was used
	  ERR_PTR() - an error occurred
	  non-NULL - this other dentry was spliced in
	This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry.
	Not all filesystems reliably result in a positive hashed dentry:
	- NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option.
	- kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry.
	The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this.
	Reviewed-by: Jeff Layton <jlayton@kernel.org>
	Reviewed-by: Jan Kara <jack@suse.cz>
	Signed-off-by: NeilBrown <neilb@suse.de>
	Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
	Signed-off-by: Christian Brauner <brauner@kernel.org>
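The three-way return convention described in this commit (NULL, an error pointer, or a different dentry) can be modelled in userspace as below. This is a sketch under stated assumptions: `struct demo_dentry`, `demo_err_ptr()`, `demo_is_err()`, `demo_ptr_err()` and `demo_mkdir_result()` are hypothetical, and the error-pointer encoding only imitates the kernel's ERR_PTR() idea of reserving the top 4095 address values for negative errno codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_dentry {
	const char *name;
};

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (imitates ERR_PTR()). */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True when the pointer lies in the reserved error range (imitates IS_ERR()). */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Recover the errno value from an error pointer (imitates PTR_ERR()). */
static long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/*
 * Given the dentry that was passed to a ->mkdir()-style call and the
 * value it returned, pick what the caller should continue with:
 * an error pointer is propagated, a non-NULL result replaces the
 * passed dentry, and NULL means the passed dentry is still the one.
 */
static struct demo_dentry *demo_mkdir_result(struct demo_dentry *passed,
					     struct demo_dentry *returned)
{
	if (demo_is_err(returned))
		return returned;
	if (returned)
		return returned;
	return passed;
}
```

A caller that follows this pattern never needs a separate lookup after the mkdir, which is the barrier the commit message says it wants to remove.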
2024-08-07	buffer: Convert __block_write_begin() to take a folio	Matthew Wilcox (Oracle)	-1/+1
	Almost all callers have a folio now, so change __block_write_begin() to take a folio and remove a call to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	fs: Convert aops->write_begin to take a folio	Matthew Wilcox (Oracle)	-2/+2
	Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	buffer: Convert block_write_end() to take a folio	Matthew Wilcox (Oracle)	-1/+1
	All callers now have a folio, so pass it in instead of converting from a folio to a page and back to a folio again. Saves a call to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	minixfs: Convert dir_commit_chunk() to take a folio	Matthew Wilcox (Oracle)	-8/+8
	All callers now have a folio, so pass it in. Saves a call to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	minixfs: Convert minix_prepare_chunk() to take a folio	Matthew Wilcox (Oracle)	-18/+18
	All callers now have a folio, so convert minix_prepare_chunk() to take one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	minixfs: Convert minix_make_empty() to use a folio	Matthew Wilcox (Oracle)	-9/+9
	Removes a few hidden calls to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	minixfs: Convert minix_delete_entry() to work on a folio	Matthew Wilcox (Oracle)	-10/+10
	Match ext2 and remove a few hidden calls to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	minixfs: Convert minix_set_link() and minix_dotdot() to take a folio	Matthew Wilcox (Oracle)	-20/+17
	This matches ext2 and removes a few hidden calls to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	minixfs: Convert minix_find_entry() to take a folio	Matthew Wilcox (Oracle)	-33/+30
	Remove a few hidden calls to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	minixfs: Convert dir_get_page() to dir_get_folio()	Matthew Wilcox (Oracle)	-31/+35
	Remove a few conversions between page and folio. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-15	Merge tag 'vfs-6.11.module.description' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds	-0/+1
	Pull vfs module description updates from Christian Brauner:
	"This contains patches to add module descriptions to all modules under fs/ currently lacking them"
	* tag 'vfs-6.11.module.description' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
	  openpromfs: add missing MODULE_DESCRIPTION() macro
	  fs: nls: add missing MODULE_DESCRIPTION() macros
	  fs: autofs: add MODULE_DESCRIPTION()
	  fs: fat: add missing MODULE_DESCRIPTION() macros
	  fs: binfmt: add missing MODULE_DESCRIPTION() macros
	  fs: cramfs: add MODULE_DESCRIPTION()
	  fs: hfs: add MODULE_DESCRIPTION()
	  fs: hpfs: add MODULE_DESCRIPTION()
	  qnx4: add MODULE_DESCRIPTION()
	  qnx6: add MODULE_DESCRIPTION()
	  fs: sysv: add MODULE_DESCRIPTION()
	  fs: efs: add MODULE_DESCRIPTION()
	  fs: minix: add MODULE_DESCRIPTION()
2024-07-10	minixfs: Fix minixfs_rename with HIGHMEM	Matthew Wilcox (Oracle)	-2/+1
	minixfs now uses kmap_local_page(), so we can't call kunmap() to undo it. This one call was missed as part of the commit this fixes. Fixes: 6628f69ee66a (minixfs: Use dir_put_page() in minix_unlink() and minix_rename()) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240709195841.1986374-1-willy@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-28	fs: minix: add MODULE_DESCRIPTION()	Jeff Johnson	-0/+1
	Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/minix/minix.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240525-md-fs-minix-v1-1-824800f78f7d@quicinc.com Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-26	minix: convert minix to use the new mount api	Bill O'Donnell	-18/+30
	Convert the minix filesystem to use the new mount API. Tested using mount and remount on minix device. Signed-off-by: Bill O'Donnell <bodonnel@redhat.com> Link: https://lore.kernel.org/r/20240307163325.998723-1-bodonnel@redhat.com Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27	minix: remove SLAB_MEM_SPREAD flag usage	Chengming Zhou	-1/+1
	The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which causes no functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224134935.829715-1-chengming.zhou@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-11	Merge tag 'pull-minix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds	-57/+38
	Pull minixfs updates from Al Viro: "minixfs kmap_local_page() switchover and related fixes - very similar to sysv series" * tag 'pull-minix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: minixfs: switch to kmap_local_page() minixfs: Use dir_put_page() in minix_unlink() and minix_rename() minixfs: change the signature of dir_get_page() minixfs: use offset_in_page()
2023-12-29	minix: remove writepage implementation	Matthew Wilcox (Oracle)	-3/+6
	If the filesystem implements migrate_folio and writepages, there is no need for a writepage implementation. Link: https://lkml.kernel.org/r/20231215200245.748418-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-18	minixfs: switch to kmap_local_page()	Al Viro	-22/+16
	Again, a counterpart of Fabio's fs/sysv patch Reviewed-by: Fabio M. De Francesco <fabio.maria.de.francesco@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-12-18	minixfs: Use dir_put_page() in minix_unlink() and minix_rename()	Al Viro	-14/+9
	... rather than open-coding it there. Counterpart of the corresponding fs/sysv commit from Fabio's series... Reviewed-by: Fabio M. De Francesco <fabio.maria.de.francesco@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-12-18	minixfs: change the signature of dir_get_page()	Al Viro	-26/+20
	Change the signature of dir_get_page() to prepare this function for conversion to kmap_local_page(). Also change the call sites that must adjust to the new signature. Essentially a copy of the corresponding fs/sysv commit by Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Fabio M. De Francesco <fabio.maria.de.francesco@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-12-18	minixfs: use offset_in_page()	Al Viro	-5/+3
	It's cheaper and more idiomatic than subtracting page_address() of the corresponding page... Reviewed-by: Fabio M. De Francesco <fabio.maria.de.francesco@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
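The claim in the commit above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel's code: every `sketch_`-prefixed name is hypothetical, and the 4 KiB page size is an assumption (the kernel takes PAGE_SIZE from the architecture). It shows that masking the low bits of a pointer gives the same offset as subtracting the page's base address, without needing the base address at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. These names are hypothetical stand-ins. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Mask-based offset: keep only the bits below the page boundary. */
static unsigned long sketch_offset_in_page(const void *p)
{
	return (unsigned long)(uintptr_t)p & ~SKETCH_PAGE_MASK;
}

/* Older pattern: subtract the page's base address from the pointer. */
static unsigned long sketch_offset_by_subtraction(const char *page_base,
						  const char *p)
{
	return (unsigned long)(p - page_base);
}

/*
 * Compare both methods for every byte of one page-aligned buffer.
 * Returns the number of offsets checked.
 */
static size_t sketch_check_offsets(void)
{
	static _Alignas(4096) char page[4096];
	size_t i;

	for (i = 0; i < sizeof(page); i++)
		assert(sketch_offset_in_page(&page[i]) ==
		       sketch_offset_by_subtraction(page, &page[i]));
	return i;
}
```

The mask form needs only the pointer itself, which is why it avoids the extra page_address() lookup mentioned in the commit message.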
2023-10-18	minix: convert to new timestamp accessors	Jeff Layton	-14/+13
	Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-48-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-29	Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux	Linus Torvalds	-0/+1
	Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() 
before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ...
2023-08-09	fs: pass the request_mask to generic_fillattr	Jeff Layton	-1/+1
	generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
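The idea of gating a side-effecting attribute on the caller's request mask can be sketched in plain C. This is an illustration under stated assumptions, not the kernel's generic_fillattr(): the structures, flag values, and the `version_queried` field are all hypothetical and exist only to model "reporting this attribute has a side effect".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values; the real STATX_* constants differ. */
#define SKETCH_BASIC_STATS   0x1u
#define SKETCH_CHANGE_COOKIE 0x2u

/* Hypothetical, heavily reduced stand-ins for the inode and stat structs. */
struct sketch_inode {
	uint64_t size;
	uint64_t version;
	int version_queried;	/* models the side effect of reporting */
};

struct sketch_stat {
	uint32_t result_mask;
	uint64_t size;
	uint64_t change_cookie;
};

/* Fill basic fields always; touch the cookie only when it was requested. */
static void sketch_fillattr(uint32_t request_mask, struct sketch_inode *inode,
			    struct sketch_stat *st)
{
	assert(inode && st);

	st->result_mask = SKETCH_BASIC_STATS;
	st->size = inode->size;
	st->change_cookie = 0;

	if (request_mask & SKETCH_CHANGE_COOKIE) {
		inode->version_queried = 1;	/* side effect happens only here */
		st->change_cookie = inode->version;
		st->result_mask |= SKETCH_CHANGE_COOKIE;
	}
}
```

A caller that passes only the basic mask leaves `version_queried` untouched, which is the behaviour the commit message describes as the reason for threading the mask through.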
2023-08-02	fs: add CONFIG_BUFFER_HEAD	Christoph Hellwig	-0/+1
	Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it. For the block device nodes an alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist. Otherwise this is just Kconfig and ifdef changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
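Defining a flag to 0 when a feature is compiled out makes every test of that flag constant-false, so the guarded path is never entered. A minimal sketch of the pattern follows; the `SKETCH_` names and the bit position are hypothetical and stand in for the Kconfig symbol and the iomap flag, they are not copied from the kernel headers.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the Kconfig symbol. Left undefined here to
 * model a build without buffer_head support.
 */
/* #define SKETCH_CONFIG_BUFFER_HEAD 1 */

#ifdef SKETCH_CONFIG_BUFFER_HEAD
#define SKETCH_F_BUFFER_HEAD (1U << 4)	/* assumed bit, for illustration */
#else
#define SKETCH_F_BUFFER_HEAD 0U		/* flag can never be set */
#endif

/* True only when the flag bit exists in this build and is set in flags. */
static int sketch_uses_buffer_heads(unsigned int flags)
{
	return (flags & SKETCH_F_BUFFER_HEAD) != 0;
}

/* With the symbol undefined, even an all-ones flags word selects nothing. */
static void sketch_selfcheck(void)
{
	assert(sketch_uses_buffer_heads(~0U) == 0);
}
```

Because the condition folds to a constant, a compiler can drop the guarded branch entirely, so callers need no extra ifdefs around each use of the flag.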
2023-07-24	kernfs: convert to ctime accessor functions	Jeff Layton	-16/+12
	In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-54-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>