path: root/fs/jfs
Age | Commit message | Author | Lines
2026-04-15 | Merge tag 'jfs-7.1' of github.com:kleikamp/linux-shaggy | Linus Torvalds | -30/+344
Pull jfs updates from Dave Kleikamp:
 "More robust data integrity checking and some fixes"

* tag 'jfs-7.1' of github.com:kleikamp/linux-shaggy:
  jfs: avoid -Wtautological-constant-out-of-range-compare warning again
  JFS: always load filesystem UUID during mount
  jfs: hold LOG_LOCK on umount to avoid null-ptr-deref
  jfs: Set the lbmDone flag at the end of lbmIODone
  jfs: fix corrupted list in dbUpdatePMap
  jfs: add dmapctl integrity check to prevent invalid operations
  jfs: add dtpage integrity check to prevent index/pointer overflows
  jfs: add dtroot integrity check to prevent index out-of-bounds
2026-03-16 | jfs: avoid -Wtautological-constant-out-of-range-compare warning again | Arnd Bergmann | -5/+2
The comparison of an __s8 value against DTPAGEMAXSLOT still has a constant result (it is always false), causing a harmless (default disabled) warning with clang:

fs/jfs/jfs_dtree.c:4419:25: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
 4419 |                     p->header.freelist >= DTPAGEMAXSLOT)) {
      |                     ~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~

I previously worked around two of these in commit 7833570dae83 ("jfs: avoid -Wtautological-constant-out-of-range-compare warning"), but now a new one has come up, so address it the same way by dropping the redundant range check.

Fixes: 119e448bb50a ("jfs: add dtpage integrity check to prevent index/pointer overflows")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
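The reason the upper-bound test can be dropped is that an 8-bit signed value tops out at 127, so it can never reach 128. A minimal user-space sketch of that reasoning follows; the `s8` typedef and the `DTPAGEMAXSLOT` value are assumptions mirroring the warning text, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-ins for the kernel's s8 type and the constant 128
 * quoted in the clang diagnostic. */
typedef int8_t s8;
#define DTPAGEMAXSLOT 128

/* The part of the range check that still carries information. */
int slot_index_is_negative(s8 idx)
{
	return idx < 0;
}

/* The upper-bound test that clang flags. It is routed through an int
 * here so this sketch compiles without the warning; the result is
 * still always 0 for every possible s8 value. */
int slot_index_too_large(s8 idx)
{
	int widened = idx;

	return widened >= DTPAGEMAXSLOT;
}
```

Because `INT8_MAX` is 127, `slot_index_too_large()` returns 0 for every input, which is why removing the check does not change behavior.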
2026-03-11 | JFS: always load filesystem UUID during mount | João Paredes | -1/+2
The filesystem UUID was only being loaded into super_block sb when an external journal device was in use. When mounting without an external journal, the UUID remained unset, which prevented the computation of a filesystem ID (fsid), which could be confirmed via `stat -f -c "%i"` and thus user space could not use fanotify correctly. A missing filesystem ID causes fanotify to return ENODEV when marking the filesystem for events like FAN_CREATE, FAN_DELETE, FAN_MOVED_TO, and FAN_MOVED_FROM. As a result, applications relying on fanotify could not monitor these events on JFS filesystems without an external journal. Moved the UUID initialization so it is always performed during mount, ensuring the superblock UUID is consistently available. Signed-off-by: João Paredes <joaommp@yahoo.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11 | jfs: hold LOG_LOCK on umount to avoid null-ptr-deref | Helen Koike | -9/+24
write_special_inodes() iterates through log->sb_list and accesses the sbi fields, which umount can concurrently set to NULL. Fix the race by holding LOG_LOCK and checking for NULL.

Reported-by: syzbot+e14b1036481911ae4d77@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e14b1036481911ae4d77
Signed-off-by: Helen Koike <koike@igalia.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11 | jfs: Set the lbmDone flag at the end of lbmIODone | Edward Adam Davis | -11/+7
In lbmRead(), the I/O that wait_event() waits for can complete before the caller goes to sleep, and lbmIODone() then sets the lbmDONE flag prematurely, which ends the wait. wait_event() therefore returns before lbmREAD is cleared (because lbmDONE was set first), and the lbuf is released before lbmIODone() returns, triggering the use-after-free reported in [1].

Setting the lbmDONE flag only after clearing lbmREAD in lbmIODone() avoids this use-after-free.

[1]
BUG: KASAN: slab-use-after-free in rt_spin_lock+0x88/0x3e0 kernel/locking/spinlock_rt.c:56
Call Trace:
 blk_update_request+0x57e/0xe60 block/blk-mq.c:1007
 blk_mq_end_request+0x3e/0x70 block/blk-mq.c:1169
 blk_complete_reqs block/blk-mq.c:1244 [inline]
 blk_done_softirq+0x10a/0x160 block/blk-mq.c:1249

Allocated by task 6101:
 lbmLogInit fs/jfs/jfs_logmgr.c:1821 [inline]
 lmLogInit+0x3d0/0x19e0 fs/jfs/jfs_logmgr.c:1269
 open_inline_log fs/jfs/jfs_logmgr.c:1175 [inline]
 lmLogOpen+0x4e1/0xfa0 fs/jfs/jfs_logmgr.c:1069
 jfs_mount_rw+0xe9/0x670 fs/jfs/jfs_mount.c:257
 jfs_fill_super+0x754/0xd80 fs/jfs/super.c:532

Freed by task 6101:
 kfree+0x1bd/0x900 mm/slub.c:6876
 lbmLogShutdown fs/jfs/jfs_logmgr.c:1864 [inline]
 lmLogInit+0x1137/0x19e0 fs/jfs/jfs_logmgr.c:1415
 open_inline_log fs/jfs/jfs_logmgr.c:1175 [inline]
 lmLogOpen+0x4e1/0xfa0 fs/jfs/jfs_logmgr.c:1069
 jfs_mount_rw+0xe9/0x670 fs/jfs/jfs_mount.c:257
 jfs_fill_super+0x754/0xd80 fs/jfs/super.c:532

Reported-by: syzbot+1d38eedcb25a3b5686a7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1d38eedcb25a3b5686a7
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11 | jfs: fix corrupted list in dbUpdatePMap | Yun Zhou | -2/+4
This patch resolves the "list_add corruption. next is NULL" Oops reported by syzkaller in dbUpdatePMap(). The root cause is uninitialized synclist nodes in struct metapage and struct TxBlock, plus improper list node removal using list_del() (which leaves nodes in an invalid state). This fixes the following Oops reported by syzkaller. list_add corruption. next is NULL. ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:28! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 UID: 0 PID: 122 Comm: jfsCommit Not tainted syzkaller #0 PREEMPT_{RT,(full)} Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 RIP: 0010:__list_add_valid_or_report+0xc3/0x130 lib/list_debug.c:27 Code: 4c 89 f2 48 89 d9 e8 0c 88 a4 fc 90 0f 0b 48 c7 c7 20 de 3d 8b e8 fd 87 a4 fc 90 0f 0b 48 c7 c7 c0 de 3d 8b e8 ee 87 a4 fc 90 <0f> 0b 48 89 df e8 13 c3 7d fd 42 80 7c 2d 00 00 74 08 4c 89 e7 e8 RSP: 0018:ffffc9000395fa20 EFLAGS: 00010246 RAX: 0000000000000022 RBX: 0000000000000000 RCX: 270c5dfadb559700 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 00000000000f0000 R08: 0000000000000000 R09: 0000000000000000 R10: dffffc0000000000 R11: fffff5200072bee9 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000004 R15: 1ffff92000632266 FS: 0000000000000000(0000) GS:ffff888126ef9000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056341fdb86c0 CR3: 0000000040a18000 CR4: 00000000003526f0 Call Trace: <TASK> __list_add_valid include/linux/list.h:96 [inline] __list_add include/linux/list.h:158 [inline] list_add include/linux/list.h:177 [inline] dbUpdatePMap+0x7e4/0xeb0 fs/jfs/jfs_dmap.c:577 txAllocPMap+0x57d/0x6b0 fs/jfs/jfs_txnmgr.c:2426 txUpdateMap+0x81e/0x9c0 fs/jfs/jfs_txnmgr.c:2364 txLazyCommit fs/jfs/jfs_txnmgr.c:2665 [inline] jfs_lazycommit+0x3f1/0xa10 fs/jfs/jfs_txnmgr.c:2734 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158 
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- Reported-by: syzbot+4d0a0feb49c5138cac46@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4d0a0feb49c5138cac46 Tested-by: syzbot+4d0a0feb49c5138cac46@syzkaller.appspotmail.com Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
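Why an unlinked or never-initialized node can break a later list_add can be sketched in user space. The code below is a simplified re-implementation for illustration only, not the kernel's list API: the `node_*` names are hypothetical, and NULL stands in for the poison values the kernel writes on deletion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Point a node at itself: the "empty but valid" state. */
void node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* Insert n right after head; head must already be initialized. */
void node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and leave its own pointers invalid. */
void node_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = NULL;
	n->prev = NULL;
}

/* Unlink n and re-initialize it so it can be tested or reused. */
void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* A node may serve as an insertion point only if both links are set. */
int node_is_valid(const struct list_node *n)
{
	return n->next != NULL && n->prev != NULL;
}
```

In this sketch, a node removed with `node_del()` fails `node_is_valid()`, which corresponds to the "next is NULL" condition in the report, while a node removed with `node_del_init()` stays usable.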
2026-03-11 | jfs: add dmapctl integrity check to prevent invalid operations | Yun Zhou | -3/+111
Add check_dmapctl() to validate dmapctl structure integrity, focusing on preventing invalid operations caused by on-disk corruption. Key checks: - nleafs bounded by [0, LPERCTL] (maximum leaf nodes per dmapctl). - l2nleafs bounded by [0, L2LPERCTL] and consistent with nleafs (nleafs must be 2^l2nleafs). - leafidx must be exactly CTLLEAFIND (expected leaf index position). - height bounded by [0, L2LPERCTL >> 1] (valid tree height range). - budmin validity: NOFREE only if nleafs=0; otherwise >= BUDMIN. - Leaf nodes fit within stree array (leafidx + nleafs <= CTLTREESIZE). - Leaf node values are either non-negative or NOFREE. Invoked in dbAllocAG(), dbFindCtl(), dbAdjCtl() and dbExtendFS() when accessing dmapctl pages, catching corruption early before dmap operations trigger invalid memory access or logic errors. This fixes the following UBSAN warning. [58245.668090][T14017] ------------[ cut here ]------------ [58245.668103][T14017] UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:2641:11 [58245.668119][T14017] shift exponent 110 is too large for 32-bit type 'int' [58245.668137][T14017] CPU: 0 UID: 0 PID: 14017 Comm: 4c1966e88c28fa9 Tainted: G E 6.18.0-rc4-00253-g21ce5d4ba045-dirty #124 PREEMPT_{RT,(full)} [58245.668174][T14017] Tainted: [E]=UNSIGNED_MODULE [58245.668176][T14017] Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [58245.668184][T14017] Call Trace: [58245.668200][T14017] <TASK> [58245.668208][T14017] dump_stack_lvl+0x189/0x250 [58245.668288][T14017] ? __pfx_dump_stack_lvl+0x10/0x10 [58245.668301][T14017] ? __pfx__printk+0x10/0x10 [58245.668315][T14017] ? 
lock_metapage+0x303/0x400 [jfs] [58245.668406][T14017] ubsan_epilogue+0xa/0x40 [58245.668422][T14017] __ubsan_handle_shift_out_of_bounds+0x386/0x410 [58245.668462][T14017] dbSplit+0x1f8/0x200 [jfs] [58245.668543][T14017] dbAdjCtl+0x34c/0xa20 [jfs] [58245.668628][T14017] dbAllocNear+0x2ee/0x3d0 [jfs] [58245.668710][T14017] dbAlloc+0x933/0xba0 [jfs] [58245.668797][T14017] ea_write+0x374/0xdd0 [jfs] [58245.668888][T14017] ? __pfx_ea_write+0x10/0x10 [jfs] [58245.668966][T14017] ? __jfs_setxattr+0x76e/0x1120 [jfs] [58245.669046][T14017] __jfs_setxattr+0xa01/0x1120 [jfs] [58245.669135][T14017] ? __pfx___jfs_setxattr+0x10/0x10 [jfs] [58245.669216][T14017] ? mutex_lock_nested+0x154/0x1d0 [58245.669252][T14017] ? __jfs_xattr_set+0xb9/0x170 [jfs] [58245.669333][T14017] __jfs_xattr_set+0xda/0x170 [jfs] [58245.669430][T14017] ? __pfx___jfs_xattr_set+0x10/0x10 [jfs] [58245.669509][T14017] ? xattr_full_name+0x6f/0x90 [58245.669546][T14017] ? jfs_xattr_set+0x33/0x60 [jfs] [58245.669636][T14017] ? __pfx_jfs_xattr_set+0x10/0x10 [jfs] [58245.669726][T14017] __vfs_setxattr+0x43c/0x480 [58245.669743][T14017] __vfs_setxattr_noperm+0x12d/0x660 [58245.669756][T14017] vfs_setxattr+0x16b/0x2f0 [58245.669768][T14017] ? __pfx_vfs_setxattr+0x10/0x10 [58245.669782][T14017] filename_setxattr+0x274/0x600 [58245.669795][T14017] ? __pfx_filename_setxattr+0x10/0x10 [58245.669806][T14017] ? getname_flags+0x1e5/0x540 [58245.669829][T14017] path_setxattrat+0x364/0x3a0 [58245.669840][T14017] ? __pfx_path_setxattrat+0x10/0x10 [58245.669859][T14017] ? __se_sys_chdir+0x1b9/0x280 [58245.669876][T14017] __x64_sys_lsetxattr+0xbf/0xe0 [58245.669888][T14017] do_syscall_64+0xfa/0xfa0 [58245.669901][T14017] ? lockdep_hardirqs_on+0x9c/0x150 [58245.669913][T14017] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [58245.669927][T14017] ? 
exc_page_fault+0xab/0x100 [58245.669937][T14017] entry_SYSCALL_64_after_hwframe+0x77/0x7f Reported-by: syzbot+4c1966e88c28fa96e053@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4c1966e88c28fa96e053 Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
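A subset of the checks listed in this commit message can be sketched as a standalone validator. This is not the kernel's check_dmapctl(): the struct layout, the host-endian integer fields, and the constant values are assumptions made for illustration, and the budmin rule is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real definitions live in the JFS headers. */
#define LPERCTL     1024
#define L2LPERCTL   10
#define CTLLEAFIND  (256 + 64 + 16 + 4 + 1)
#define CTLTREESIZE (1024 + 256 + 64 + 16 + 4 + 1)
#define NOFREE      (-1)

/* Hypothetical in-memory view of a dmapctl page. */
struct dmapctl_sketch {
	int32_t nleafs;
	int32_t l2nleafs;
	int32_t leafidx;
	int32_t height;
	int8_t stree[CTLTREESIZE];
};

/* Returns 1 if the structure passes the sketched checks, else 0. */
int dmapctl_looks_sane(const struct dmapctl_sketch *dcp)
{
	int i;

	if (dcp->nleafs < 0 || dcp->nleafs > LPERCTL)
		return 0;
	/* Bound l2nleafs before shifting by it, so the shift is defined. */
	if (dcp->l2nleafs < 0 || dcp->l2nleafs > L2LPERCTL)
		return 0;
	if (dcp->nleafs != (1 << dcp->l2nleafs))
		return 0;
	if (dcp->leafidx != CTLLEAFIND)
		return 0;
	if (dcp->height < 0 || dcp->height > (L2LPERCTL >> 1))
		return 0;
	if (dcp->leafidx + dcp->nleafs > CTLTREESIZE)
		return 0;
	/* Each leaf must be non-negative or exactly NOFREE. */
	for (i = 0; i < dcp->nleafs; i++)
		if (dcp->stree[dcp->leafidx + i] < NOFREE)
			return 0;
	return 1;
}
```

Checking the l2nleafs range before using it as a shift count is the step that corresponds to the reported "shift exponent 110" failure: an out-of-range value is rejected before any shift happens.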
2026-03-11 | jfs: add dtpage integrity check to prevent index/pointer overflows | Yun Zhou | -4/+107
Add check_dtpage() to validate dtpage_t integrity, focusing on preventing index/pointer overflows from on-disk corruption.

Key checks:
- maxslot must be exactly DTPAGEMAXSLOT (128) as defined for dtpage slot array.
- freecnt bounded by [0, DTPAGEMAXSLOT-1] (slot[0] reserved for header).
- freelist validity: -1 when freecnt=0; 1~DTPAGEMAXSLOT-1 when non-zero, with linked list checks (no duplicates, proper termination via next=-1).
- stblindex bounds: must be within range that avoids overlapping with stbl itself (stblindex < DTPAGEMAXSLOT - stblsize).
- nextindex bounded by stbl size (stblsize << L2DTSLOTSIZE).
- stbl entries validity: within 1~DTPAGEMAXSLOT-1, no duplicates (excluding invalid entries marked as -1).

Invoked when loading dtpage (in BT_GETPAGE macro context) to catch corruption early before directory operations trigger out-of-bounds access.

Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
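The freelist rule above (bounded indices, no duplicates, termination with -1 after exactly freecnt entries) can be sketched as follows. This is an illustrative user-space version, not check_dtpage() itself: the slot array is reduced to a hypothetical array of `next` links, and `MAXSLOT` stands in for DTPAGEMAXSLOT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAXSLOT 128 /* DTPAGEMAXSLOT, per the commit message */

/* next[i] holds the free-list link stored in slot i (hypothetical layout). */
bool freelist_is_sane(const int8_t next[MAXSLOT], int freelist, int freecnt)
{
	bool seen[MAXSLOT] = { false };
	int idx = freelist;

	if (freecnt < 0 || freecnt > MAXSLOT - 1)
		return false;
	if (freecnt == 0)
		return freelist == -1;

	for (int i = 0; i < freecnt; i++) {
		/* slot 0 is the header, so valid entries are 1..MAXSLOT-1 */
		if (idx < 1 || idx > MAXSLOT - 1)
			return false;
		/* a repeated index means the list loops back on itself */
		if (seen[idx])
			return false;
		seen[idx] = true;
		idx = next[idx];
	}
	/* after freecnt entries the chain must end */
	return idx == -1;
}
```

Bounding the walk by freecnt keeps the loop finite even when the on-disk links form a cycle.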
2026-03-11 | jfs: add dtroot integrity check to prevent index out-of-bounds | Yun Zhou | -0/+92
Add check_dtroot() to validate dtroot_t integrity, focusing on preventing index/pointer overflows from on-disk corruption. Key checks: - freecnt bounded by [0, DTROOTMAXSLOT-1] (slot[0] reserved for header). - freelist validity: -1 when freecnt=0; 1~DTROOTMAXSLOT-1 when non-zero, with linked list checks (no duplicates, proper termination via next=-1). - stbl bounds: nextindex within stbl array size; entries within 0~8, no duplicates (excluding idx=0). Invoked in copy_from_dinode() when loading directory inodes, catching corruption early before directory operations trigger out-of-bounds access. This fixes the following UBSAN warning. [ 101.832754][ T5960] ------------[ cut here ]------------ [ 101.832762][ T5960] UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:3713:8 [ 101.832792][ T5960] index -1 is out of range for type 'struct dtslot[128]' [ 101.832807][ T5960] CPU: 2 UID: 0 PID: 5960 Comm: 5f7f0caf9979e9d Tainted: G E 6.18.0-rc4-00250-g2603eb907f03 #119 PREEMPT_{RT,(full [ 101.832817][ T5960] Tainted: [E]=UNSIGNED_MODULE [ 101.832819][ T5960] Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 101.832823][ T5960] Call Trace: [ 101.832833][ T5960] <TASK> [ 101.832838][ T5960] dump_stack_lvl+0x189/0x250 [ 101.832909][ T5960] ? __pfx_dump_stack_lvl+0x10/0x10 [ 101.832925][ T5960] ? __pfx__printk+0x10/0x10 [ 101.832934][ T5960] ? rt_mutex_slowunlock+0x493/0x8a0 [ 101.832959][ T5960] ubsan_epilogue+0xa/0x40 [ 101.832966][ T5960] __ubsan_handle_out_of_bounds+0xe9/0xf0 [ 101.833007][ T5960] dtInsertEntry+0x936/0x1430 [jfs] [ 101.833094][ T5960] dtSplitPage+0x2c8b/0x3ed0 [jfs] [ 101.833177][ T5960] ? __pfx_rt_mutex_slowunlock+0x10/0x10 [ 101.833193][ T5960] dtInsert+0x109b/0x6000 [jfs] [ 101.833283][ T5960] ? rt_mutex_slowunlock+0x493/0x8a0 [ 101.833296][ T5960] ? __pfx_rt_mutex_slowunlock+0x10/0x10 [ 101.833307][ T5960] ? rt_spin_unlock+0x161/0x200 [ 101.833315][ T5960] ? 
__pfx_dtInsert+0x10/0x10 [jfs] [ 101.833391][ T5960] ? txLock+0xaf9/0x1cb0 [jfs] [ 101.833477][ T5960] ? dtInitRoot+0x22a/0x670 [jfs] [ 101.833556][ T5960] jfs_mkdir+0x6ec/0xa70 [jfs] [ 101.833636][ T5960] ? __pfx_jfs_mkdir+0x10/0x10 [jfs] [ 101.833721][ T5960] ? generic_permission+0x2e5/0x690 [ 101.833760][ T5960] ? bpf_lsm_inode_mkdir+0x9/0x20 [ 101.833776][ T5960] vfs_mkdir+0x306/0x510 [ 101.833786][ T5960] do_mkdirat+0x247/0x590 [ 101.833795][ T5960] ? __pfx_do_mkdirat+0x10/0x10 [ 101.833804][ T5960] ? getname_flags+0x1e5/0x540 [ 101.833815][ T5960] __x64_sys_mkdir+0x6c/0x80 [ 101.833823][ T5960] do_syscall_64+0xfa/0xfa0 [ 101.833832][ T5960] ? lockdep_hardirqs_on+0x9c/0x150 [ 101.833840][ T5960] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 101.833847][ T5960] ? exc_page_fault+0xab/0x100 [ 101.833856][ T5960] entry_SYSCALL_64_after_hwframe+0x77/0x7f Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-06 | treewide: change inode->i_ino from unsigned long to u64 | Jeff Layton | -3/+3
On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
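The truncation described here, and the matching format-specifier change, can be illustrated in user space. In this sketch a `uint32_t` models a 32-bit `unsigned long` so the result is the same on any host; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Models storing a 64-bit inode number in a 32-bit unsigned long. */
uint32_t store_ino_32(uint64_t ino)
{
	return (uint32_t)ino; /* the upper 32 bits are silently dropped */
}

/* Formats a 64-bit inode number with %llu, the specifier the
 * conversion switches to. Returns the number of characters written. */
int format_ino(char *buf, size_t len, uint64_t ino)
{
	return snprintf(buf, len, "%llu", (unsigned long long)ino);
}
```

Two distinct 64-bit inode numbers such as 0x100000001 and 1 become the same value once narrowed to 32 bits, which is the collision the commit message warns about.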
2026-02-21 | Convert 'alloc_obj' family to use the new default GFP_KERNEL argument | Linus Torvalds | -8/+8
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | treewide: Replace kmalloc with kmalloc_obj for non-scalar types | Kees Cook | -10/+10
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances:

Single allocations:      kmalloc(sizeof(TYPE), ...)
are replaced with:       kmalloc_obj(TYPE, ...)

Array allocations:       kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:       kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:  kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:       kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning "TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
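The idea of an allocator that takes a type and returns a typed pointer can be sketched in user space with malloc(). The macros below are hypothetical analogues, not the kernel's kmalloc_obj() family: they accept only a type name (not `*VAR`) and take no allocation-flags argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space analogues: the result is "TYPE *", not "void *". */
#define malloc_obj(TYPE)         ((TYPE *)malloc(sizeof(TYPE)))
#define calloc_objs(TYPE, COUNT) ((TYPE *)calloc((COUNT), sizeof(TYPE)))

struct point {
	int x, y;
};

/* Allocates one struct point; returns NULL on allocation failure. */
struct point *point_new(int x, int y)
{
	struct point *p = malloc_obj(struct point);

	if (p) {
		p->x = x;
		p->y = y;
	}
	return p;
}
```

Because the macro result already has the right pointer type, assigning it to a pointer of a different struct type produces a compiler diagnostic, which an untyped `void *` return would not.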
2026-02-12 | Merge tag 'jfs-7.0' of github.com:kleikamp/linux-shaggy | Linus Torvalds | -4/+7
Pull jfs updates from Dave Kleikamp:
 "Just a handful of minor jfs fixes"

* tag 'jfs-7.0' of github.com:kleikamp/linux-shaggy:
  jfs: avoid -Wtautological-constant-out-of-range-compare warning
  jfs: Add missing set_freezable() for freezable kthread
  jfs: nlink overflow in jfs_rename
2026-02-09 | Merge tag 'vfs-7.0-rc1.misc' of ↵ | Linus Torvalds | -7/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains a mix of VFS cleanups, performance improvements, API fixes, documentation, and a deprecation notice. Scalability and performance: - Rework pid allocation to only take pidmap_lock once instead of twice during alloc_pid(), improving thread creation/teardown throughput by 10-16% depending on false-sharing luck. Pad the namespace refcount to reduce false-sharing - Track file lock presence via a flag in ->i_opflags instead of reading ->i_flctx, avoiding false-sharing with ->i_readcount on open/close hot paths. Measured 4-16% improvement on 24-core open-in-a-loop benchmarks - Use a consume fence in locks_inode_context() to match the store-release/load-consume idiom, eliminating a hardware fence on some architectures - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent false-sharing - Remove a redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu() that never fires since the caller already verifies it, eliminating a 100% mispredicted branch - Fix a 100% mispredicted likely() in devcgroup_inode_permission() that became wrong after a prior code reorder Bug fixes and correctness: - Make insert_inode_locked() wait for inode destruction instead of skipping, fixing a corner case where two matching inodes could exist in the hash - Move f_mode initialization before file_ref_init() in alloc_file() to respect the SLAB_TYPESAFE_BY_RCU ordering contract - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with no buffers attached, preventing a null pointer dereference when AS_RELEASE_ALWAYS is set but no release_folio op exists - Fix select restart_block to store end_time as timespec64, avoiding truncation of tv_sec on 32-bit architectures - Make dump_inode() use get_kernel_nofault() to safely access inode and superblock fields, matching the dump_mapping() pattern API modernization: - Make posix_acl_to_xattr() allocate the buffer internally 
since every single caller was doing it anyway. Reduces boilerplate and unnecessary error checking across ~15 filesystems - Replace deprecated simple_strtoul() with kstrtoul() for the ihash_entries, dhash_entries, mhash_entries, and mphash_entries boot parameters, adding proper error handling - Convert chardev code to use guard(mutex) and __free(kfree) cleanup patterns - Replace min_t() with min() or umin() in VFS code to avoid silently truncating unsigned long to unsigned int - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers already check the flag Deprecation: - Begin deprecating legacy BSD process accounting (acct(2)). The interface has numerous footguns and better alternatives exist (eBPF) Documentation: - Fix and complete kernel-doc for struct export_operations, removing duplicated documentation between ReST and source - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait() Testing: - Add a kunit test for initramfs cpio handling of entries with filesize > PATH_MAX Misc: - Add missing <linux/init_task.h> include in fs_struct.c" * tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) posix_acl: make posix_acl_to_xattr() alloc the buffer fs: make insert_inode_locked() wait for inode destruction initramfs_test: kunit test for cpio.filesize > PATH_MAX fs: improve dump_inode() to safely access inode fields fs: add <linux/init_task.h> for 'init_fs' docs: exportfs: Use source code struct documentation fs: move initializing f_mode before file_ref_init() exportfs: Complete kernel-doc for struct export_operations exportfs: Mark struct export_operations functions at kernel-doc exportfs: Fix kernel-doc output for get_name() acct(2): begin the deprecation of legacy BSD process accounting device_cgroup: remove branch hint after code refactor VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in 
__follow_mount_rcu fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS select: store end_time as timespec64 in restart block chardev: Switch to guard(mutex) and __free(kfree) namespace: Replace simple_strtoul with kstrtoul to parse boot params dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries ...
2026-02-02 | jfs: avoid -Wtautological-constant-out-of-range-compare warning | Arnd Bergmann | -2/+2
A recent change for the range check started triggering a clang warning:

fs/jfs/jfs_dtree.c:2906:31: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
 2906 |                         if (stbl[i] < 0 || stbl[i] >= DTPAGEMAXSLOT) {
      |                                            ~~~~~~~ ^  ~~~~~~~~~~~~~
fs/jfs/jfs_dtree.c:3111:30: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
 3111 |                 if (stbl[0] < 0 || stbl[0] >= DTPAGEMAXSLOT) {
      |                                    ~~~~~~~ ^  ~~~~~~~~~~~~~

Both the old and the new check were useless, but the previous version apparently did not lead to the warning. Remove the extraneous range check for simplicity.

Fixes: cafc6679824a ("jfs: replace hardcoded magic number with DTPAGEMAXSLOT constant")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-01-16 | posix_acl: make posix_acl_to_xattr() alloc the buffer | Miklos Szeredi | -7/+2
Without exception, all callers do that, so move the allocation into the helper. This reduces boilerplate and removes unnecessary error checking.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260115122341.556026-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12 | jfs: add setlease file operation | Jeff Layton | -0/+4
Add the setlease file_operation to jfs_file_operations and jfs_dir_operations, pointing to generic_setlease. A future patch will change the default behavior to reject lease attempts with -EINVAL when there is no setlease file operation defined. Add generic_setlease to retain the ability to set leases on this filesystem. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260108-setlease-6-20-v1-12-ea4dec9b67fa@kernel.org Acked-by: Richard Weinberger <richard@nod.at> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-02 | jfs: Add missing set_freezable() for freezable kthread | Haotian Zhang | -0/+1
The jfsIOWait() thread calls try_to_freeze() but lacks set_freezable(), causing it to remain non-freezable by default. This prevents proper freezing during system suspend. Add set_freezable() to make the thread freezable as intended. Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-12-02 | jfs: nlink overflow in jfs_rename | Jori Koolstra | -2/+4
If nlink is maximal for a directory (-1) and inside that directory you perform a rename for some child directory (not moving from the parent), then the nlink of the first directory is first incremented and later decremented. Normally this is fine, but when nlink = -1 this causes a wrap around to 0, and then drop_nlink issues a warning. After applying the patch syzbot no longer issues any warnings. I also ran some basic fs tests to look for any regressions. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Reported-by: syzbot+9131ddfd7870623b719f@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=9131ddfd7870623b719f Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
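The wrap-around described here follows from unsigned arithmetic: incrementing the maximum value yields 0. The sketch below models the link count as an `unsigned int` and treats a decrement from 0 as the condition that draws the warning; the helper names are hypothetical and this is not the jfs_rename() code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Increment a link count the way plain unsigned arithmetic does. */
unsigned int link_inc(unsigned int nlink)
{
	return nlink + 1; /* wraps to 0 when nlink == UINT_MAX */
}

/* Assumption: dropping a link from a count of 0 is what triggers
 * the warning mentioned in the commit message. */
bool drop_would_warn(unsigned int nlink)
{
	return nlink == 0;
}
```

With a count of `UINT_MAX` (the "-1" in the message), the temporary increment produces 0, so the later decrement starts from 0 and the warning fires; any smaller count passes through the increment and decrement unchanged.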
2025-12-01 | Merge tag 'vfs-6.19-rc1.inode' of ↵ | Linus Torvalds | -4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors 
fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...
2025-10-29 | jfs: Rename _inline to avoid conflict with clang's '-fms-extensions' | Nathan Chancellor | -3/+3
Building fs/jfs with clang and '-fms-extensions' errors with:

  In file included from fs/jfs/jfs_unicode.c:8:
  fs/jfs/jfs_incore.h:86:13: error: type name does not allow function specifier to be specified
     86 |                 unchar _inline[128];
        |                        ^
  fs/jfs/jfs_incore.h:86:20: error: expected member name or ';' after declaration specifiers
     86 |                 unchar _inline[128];
        |                 ~~~~~~~~~~~~~~^

'-fms-extensions' in clang enables several other Microsoft specific keywords such as _inline [1], presumably for compatibility with MSVC, as Microsoft's documentation [2] mentions:

  For compatibility with previous versions, _inline and _forceinline are synonyms for __inline and __forceinline, respectively

Rename the _inline array in 'struct jfs_inode_info' to _inline_sym to avoid this conflict, which is not a large workaround as this member is only ever referred to via the i_inline macro.

Link: https://github.com/llvm/llvm-project/blob/249883d0c5883996bed038cd82a8999f342994c9/clang/include/clang/Basic/TokenKinds.def#L744-L79 [1]
Link: https://learn.microsoft.com/en-us/cpp/c-language/inline-functions [2]
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Link: https://patch.msgid.link/20251023-jfs-fix-conflict-with-clang-ms-ext-v1-1-e219d59a1e68@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-10-20 | Coccinelle-based conversion to use ->i_state accessors | Mateusz Guzik | -4/+4
All places were patched by coccinelle with the default expecting that ->i_lock is held, afterwards entries got fixed up by hand to use unlocked variants as needed.

The script:
@@
expression inode, flags;
@@

- inode->i_state & flags
+ inode_state_read(inode) & flags

@@
expression inode, flags;
@@

- inode->i_state &= ~flags
+ inode_state_clear(inode, flags)

@@
expression inode, flag1, flag2;
@@

- inode->i_state &= ~flag1 & ~flag2
+ inode_state_clear(inode, flag1 | flag2)

@@
expression inode, flags;
@@

- inode->i_state |= flags
+ inode_state_set(inode, flags)

@@
expression inode, flags;
@@

- inode->i_state = flags
+ inode_state_assign(inode, flags)

@@
expression inode, flags;
@@

- flags = inode->i_state
+ flags = inode_state_read(inode)

@@
expression inode, flags;
@@

- READ_ONCE(inode->i_state) & flags
+ inode_state_read(inode) & flags

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
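What routing every state access through accessors buys can be sketched with a small user-space model. The struct and function names below are hypothetical and deliberately differ from the kernel's inode_state_*() helpers; the sanity check in the clear helper is an illustrative assumption, not the kernel's behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode model whose state word is only touched via helpers. */
struct inode_sketch {
	unsigned int state_private;
};

unsigned int sketch_state_read(const struct inode_sketch *inode)
{
	return inode->state_private;
}

void sketch_state_set(struct inode_sketch *inode, unsigned int flags)
{
	inode->state_private |= flags;
}

/* Clears flags and reports whether all of them were actually set,
 * the kind of check that open-coded "&= ~flags" cannot perform. */
bool sketch_state_clear(struct inode_sketch *inode, unsigned int flags)
{
	bool all_present = (inode->state_private & flags) == flags;

	inode->state_private &= ~flags;
	return all_present;
}
```

With a single choke point like this, extra assertions (for example about locking) can be added in one place instead of at every call site.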
2025-10-03 | Merge tag 'jfs-6.18' of github.com:kleikamp/linux-shaggy | Linus Torvalds | -13/+19
Pull jfs updates from Dave Kleikamp:
 "A few fixes and cleanups for JFS"

* tag 'jfs-6.18' of github.com:kleikamp/linux-shaggy:
  jfs: replace hardcoded magic number with DTPAGEMAXSLOT constant
  JFS: Remove redundant 0 value initialization
  JFS: Remove unnecessary parentheses
  jfs: fix uninitialized waitqueue in transaction manager
  jfs: Verify inode mode when loading from disk
2025-09-18 | jfs: replace hardcoded magic number with DTPAGEMAXSLOT constant | Zheng Yu | -2/+2
Replace hardcoded value 127 with DTPAGEMAXSLOT constant in boundary checks within jfs_readdir() and dtReadFirst(). This improves code maintainability and ensures consistency with the defined maximum slot value. Signed-off-by: Zheng Yu <zheng.yu@northwestern.edu> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-09-18 | JFS: Remove redundant 0 value initialization | Liao Yuanhong | -1/+0
The jfs_log struct is already zeroed by kzalloc(). It's redundant to initialize dummy_log->base to 0. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-09-18 | JFS: Remove unnecessary parentheses | Liao Yuanhong | -5/+5
When using &, it's unnecessary to have parentheses afterward. Remove redundant parentheses to enhance readability. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-09-18 | jfs: fix uninitialized waitqueue in transaction manager | Shaurya Rane | -4/+5
The transaction manager initialization in txInit() was not properly initializing TxBlock[0].waitor waitqueue, causing a crash when txEnd(0) is called on read-only filesystems. When a filesystem is mounted read-only, txBegin() returns tid=0 to indicate no transaction. However, txEnd(0) still gets called and tries to access TxBlock[0].waitor via tid_to_tblock(0), but this waitqueue was never initialized because the initialization loop started at index 1 instead of 0. This causes a 'non-static key' lockdep warning and system crash: INFO: trying to register non-static key in txEnd Fix by ensuring all transaction blocks including TxBlock[0] have their waitqueues properly initialized during txInit(). Reported-by: syzbot+c4f3462d8b2ad7977bea@syzkaller.appspotmail.com Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
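The off-by-one is easy to see in a reduced model. The sketch below is an assumption-laden user-space illustration: `demo_TxBlock`, `DEMO_NTXBLOCK`, and the boolean `waitor_ready` are hypothetical stand-ins for the kernel's transaction block array and its waitqueue initialization.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NTXBLOCK 8 /* hypothetical table size */

/* Stand-in for a transaction block; the flag models "waitqueue initialized". */
struct demo_tblock {
	bool waitor_ready;
};

static struct demo_tblock demo_TxBlock[DEMO_NTXBLOCK];

/* Initialize every block. Starting at 1 would leave slot 0 untouched. */
static void demo_tx_init(void)
{
	for (int k = 0; k < DEMO_NTXBLOCK; k++)
		demo_TxBlock[k].waitor_ready = true;
}

/*
 * Models the end-of-transaction path: tid 0 means "no transaction" on a
 * read-only mount, yet its block is still looked up, so it must be valid.
 */
static bool demo_tx_end(int tid)
{
	return demo_TxBlock[tid].waitor_ready;
}
```

With the loop starting at index 1, `demo_tx_end(0)` would report an uninitialized block, which corresponds to the lockdep warning quoted above.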
2025-09-18 | jfs: Verify inode mode when loading from disk | Tetsuo Handa | -1/+7
The inode mode loaded from a corrupted disk can be invalid. Verify it the same way commit 0a9e74051313 ("isofs: Verify inode mode when loading from disk") does. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
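One plausible shape for such a check is to accept only the known file-type bits and reject everything else. The sketch below assumes that approach; `demo_mode_is_valid()` and the `DEMO_S_*` constants (which mirror the usual Linux octal values) are illustrative and not taken from the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* File-type bits, using the conventional Linux octal values. */
#define DEMO_S_IFMT   0170000u
#define DEMO_S_IFSOCK 0140000u
#define DEMO_S_IFLNK  0120000u
#define DEMO_S_IFREG  0100000u
#define DEMO_S_IFBLK  0060000u
#define DEMO_S_IFDIR  0040000u
#define DEMO_S_IFCHR  0020000u
#define DEMO_S_IFIFO  0010000u

/* Return true only if the type bits of an on-disk mode name a known type. */
static bool demo_mode_is_valid(unsigned int mode)
{
	switch (mode & DEMO_S_IFMT) {
	case DEMO_S_IFREG:
	case DEMO_S_IFDIR:
	case DEMO_S_IFLNK:
	case DEMO_S_IFCHR:
	case DEMO_S_IFBLK:
	case DEMO_S_IFIFO:
	case DEMO_S_IFSOCK:
		return true;
	default:
		/* Unknown or empty type bits: treat the inode as corrupted. */
		return false;
	}
}
```

A loader built on this would fail the inode read when the function returns false, rather than letting an unknown type reach later code.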
2025-09-13 | treewide: remove MIGRATEPAGE_SUCCESS | David Hildenbrand | -4/+4
At this point MIGRATEPAGE_SUCCESS is misnamed for all folio users, and now that we remove MIGRATEPAGE_UNMAP, it's really the only "success" return value that the code uses and expects. Let's just get rid of MIGRATEPAGE_SUCCESS completely and just use "0" for success. Link: https://lkml.kernel.org/r/20250811143949.1117439-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> [mm] Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> [jfs] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Byungchul Park <byungchul@sk.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Gregory Price <gourry@gourry.net> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Lance Yang <lance.yang@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-31 | Merge tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy | Linus Torvalds | -69/+96
Pull jfs updates from Dave Kleikamp:
 "Fixes and cleanups for JFS filesystem"

* tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy:
  jfs: fix metapage reference count leak in dbAllocCtl
  jfs: stop using write_cache_pages
  jfs: truncate good inode pages when hard link is 0
  jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage()
  jfs: Regular file corruption check
  jfs: upper bound check of tree index in dbAllocAG
2025-07-29 | jfs: fix metapage reference count leak in dbAllocCtl | Zheng Yu | -1/+3
In dbAllocCtl(), read_metapage() increases the reference count of the metapage. However, when dp->tree.budmin < 0, the function returns -EIO without calling release_metapage() to decrease the reference count, leading to a memory leak. Add release_metapage(mp) before the error return to properly manage the metapage reference count and prevent the leak. Fixes: a5f5e4698f8abbb25fe4959814093fb5bfa1aa9d ("jfs: fix shift-out-of-bounds in dbSplit") Signed-off-by: Zheng Yu <zheng.yu@northwestern.edu> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
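The leak follows a common pattern: a reference taken at the top of a function must be dropped on every exit path. A minimal user-space sketch, assuming a plain integer reference count; `struct demo_metapage`, `demo_read_metapage()`, `demo_release_metapage()`, and `demo_alloc_ctl()` are hypothetical stand-ins for the kernel functions named above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical metapage carrying only a reference count. */
struct demo_metapage {
	int refcount;
};

/* Taking the page bumps its reference count. */
static struct demo_metapage *demo_read_metapage(struct demo_metapage *page)
{
	page->refcount++;
	return page;
}

/* Releasing the page drops the reference taken above. */
static void demo_release_metapage(struct demo_metapage *mp)
{
	mp->refcount--;
}

/* Returns 0 on success or -EIO if the (simulated) on-disk value is bad. */
static int demo_alloc_ctl(struct demo_metapage *page, int budmin)
{
	struct demo_metapage *mp = demo_read_metapage(page);

	if (budmin < 0) {
		/* The fix: drop the reference before the early return. */
		demo_release_metapage(mp);
		return -EIO;
	}

	/* ... allocation work would happen here ... */

	demo_release_metapage(mp);
	return 0;
}
```

After either path the count returns to its starting value; without the release in the error branch, each corrupted lookup would leave one reference behind.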
2025-07-28 | Merge tag 'vfs-6.17-rc1.fileattr' of ↵ | Linus Torvalds | -4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. 
In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file
2025-07-28 | Merge tag 'vfs-6.17-rc1.mmap_prepare' of ↵ | Linus Torvalds | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduced f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does not require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series converts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems. 
We additionally do not update resctrl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series; equally, we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, sysfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_*mmap() users to .mmap_prepare() fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28 | Merge tag 'vfs-6.17-rc1.misc' of ↵ | Linus Torvalds | -7/+9
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc VFS updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add ext4 IOCB_DONTCACHE support This refactors the address_space_operations write_begin() and write_end() callbacks to take const struct kiocb * as their first argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate to the filesystem's buffered I/O path. Ext4 is updated to implement handling of the IOCB_DONTCACHE flag and advertises support via the FOP_DONTCACHE file operation flag. Additionally, the i915 driver's shmem write paths are updated to bypass the legacy write_begin/write_end interface in favor of directly calling write_iter() with a constructed synchronous kiocb. Another i915 change replaces a manual write loop with kernel_write() during GEM shmem object creation. Cleanups: - don't duplicate vfs_open() in kernel_file_open() - proc_fd_getattr(): don't bother with S_ISDIR() check - fs/ecryptfs: replace snprintf with sysfs_emit in show function - vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() - filelock: add new locks_wake_up_waiter() helper - fs: Remove three arguments from block_write_end() - VFS: change old_dir and new_dir in struct renamedata to dentrys - netfs: Remove unused declaration netfs_queue_write_request() Fixes: - eventpoll: Fix semi-unbounded recursion - eventpoll: fix sphinx documentation build warning - fs/read_write: Fix spelling typo - fs: annotate data race between poll_schedule_timeout() and pollwake() - fs/pipe: set FMODE_NOWAIT in create_pipe_files() - docs/vfs: update references to i_mutex to i_rwsem - fs/buffer: remove comment about hard sectorsize - fs/buffer: remove the min and max limit checks in __getblk_slow() - fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable - fs_context: fix parameter name in infofc() macro - fs: Prevent file descriptor table allocations exceeding INT_MAX" * tag 
'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits) netfs: Remove unused declaration netfs_queue_write_request() eventpoll: fix sphinx documentation build warning ext4: support uncached buffered I/O mm/pagemap: add write_begin_get_folio() helper function fs: change write_begin/write_end interface to take struct kiocb * drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter drm/i915: Use kernel_write() in shmem object create eventpoll: Fix semi-unbounded recursion vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable fs/buffer: remove the min and max limit checks in __getblk_slow() fs: Prevent file descriptor table allocations exceeding INT_MAX fs: Remove three arguments from block_write_end() fs/ecryptfs: replace snprintf with sysfs_emit in show function fs: annotate suspected data race between poll_schedule_timeout() and pollwake() docs/vfs: update references to i_mutex to i_rwsem fs/buffer: remove comment about hard sectorsize fs_context: fix parameter name in infofc() macro VFS: change old_dir and new_dir in struct renamedata to dentrys proc_fd_getattr(): don't bother with S_ISDIR() check ...
2025-07-16 | fs: change write_begin/write_end interface to take struct kiocb * | Taotao Chen | -7/+9
Change the address_space_operations callbacks write_begin() and write_end() to take struct kiocb * as the first argument instead of struct file *. Update all affected function prototypes, implementations, call sites, and related documentation across VFS, filesystems, and block layer. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-14 | jfs: stop using write_cache_pages | Christoph Hellwig | -3/+5
Stop using the obsolete write_cache_pages and use writeback_iter directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-14 | jfs: truncate good inode pages when hard link is 0 | Lizhi Xu | -1/+1
The reproducer creates an on-disk inode whose fileset value is AGGR_RESERVED_I. When this inode is evicted, its hard link count is 0, so its inode pages are not truncated. This triggers the BUG_ON in clear_inode() because nrpages is greater than 0. Reported-by: syzbot+6e516bb515d93230bc7b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6e516bb515d93230bc7b Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-14 | jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage() | Suchit Karunakaran | -64/+78
Replace the legacy XT_GETPAGE macro with an inline function that returns an xtpage_t pointer, and update all instances of XT_GETPAGE in jfs_xtree.c. Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Simplified xt_getpage by removing size and rc arguments. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-14 | jfs: Regular file corruption check | Edward Adam Davis | -0/+3
The reproducer builds a corrupted file on disk with a negative i_size value. Add a check when opening this file to avoid subsequent operation failures. Reported-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=630f6d40b3ccabc8e96e Tested-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-14 | jfs: upper bound check of tree index in dbAllocAG | Arnaud Lecomte | -0/+6
When computing the tree index in dbAllocAG, we never check if we are out of bounds relative to the size of the stree. This could happen in a scenario where the filesystem metadata are corrupted. Reported-by: syzbot+cffd18309153948f3c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cffd18309153948f3c3e Tested-by: syzbot+cffd18309153948f3c3e@syzkaller.appspotmail.com Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-04 | tree-wide: s/struct fileattr/struct file_kattr/g | Christian Brauner | -4/+4
Now that we expose struct file_attr as our uapi struct, rename the internal struct to struct file_kattr to clearly communicate that it is a kernel internal struct. This is similar to struct mount_{k}attr and others. Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-17 | fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() | Lorenzo Stoakes | -1/+1
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"), the f_op->mmap() hook has been deprecated in favour of f_op->mmap_prepare(). We have provided generic .mmap_prepare() equivalents, so update all file systems that specify these directly in their file_operations structures. This updates 9p, adfs, affs, bfs, fat, hfs, hfsplus, hostfs, hpfs, jffs2, jfs, minix, omfs, ramfs and ufs file systems directly. It updates generic_ro_fops which impacts qnx4, cramfs, befs, squashfs, freevxfs, qnx6, efs, romfs, erofs and isofs file systems. There are remaining file systems which use generic hooks in a less direct way which we address in a subsequent commit. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/c7dc90e44a9e75e750939ea369290d6e441a18e6.1750099179.git.lorenzo.stoakes@oracle.com Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-10 | new helper: set_default_d_op() | Al Viro | -1/+1
... to be used instead of manually assigning to ->s_d_op. All in-tree filesystems converted (and the field itself is renamed, so any out-of-tree ones in need of conversion will be caught by the compiler). Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-05-31 | Merge tag 'mm-stable-2025-05-31-14-50' of ↵ | Linus Torvalds | -0/+106
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. 
- "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. 
- ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. 
- "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. 
Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...
2025-05-12 | jfs: implement migrate_folio for jfs_metapage_aops | Shivank Garg | -0/+106
Add the missing migrate_folio operation to jfs_metapage_aops to fix warnings during memory compaction. These warnings were introduced by commit 7ee3647243e5 ("migrate: Remove call to ->writepage") which added explicit warnings when filesystems don't implement migrate_folio. System reports following warnings: jfs_metapage_aops does not implement migrate_folio WARNING: CPU: 0 PID: 6870 at mm/migrate.c:955 fallback_migrate_folio mm/migrate.c:953 [inline] WARNING: CPU: 0 PID: 6870 at mm/migrate.c:955 move_to_new_folio+0x70e/0x840 mm/migrate.c:1007 Implement metapage_migrate_folio() which handles both single and multiple metapages per page configurations. [shivankg@amd.com: change comment style] Link: https://lkml.kernel.org/r/1967593d-8084-4a4a-b384-35d5adc54eb4@amd.com [akpm@linux-foundation.org: fix build] [shivankg@amd.com: remove redundant NULL check in __metapage_migrate_folio()] Link: https://lkml.kernel.org/r/a67db238-0ca6-4725-abb2-dc092de87e1b@amd.com Link: https://lkml.kernel.org/r/20250430100150.279751-3-shivankg@amd.com Fixes: 35474d52c605 ("jfs: Convert metapage_writepage to metapage_write_folio") Signed-off-by: Shivank Garg <shivankg@amd.com> Reported-by: syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67faff52.050a0220.379d84.001b.GAE@google.com Tested-by: syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com Cc: Alistair Popple <apopple@nvidia.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-03 | jfs: fix array-index-out-of-bounds read in add_missing_indices | Aditya Dutt | -3/+15
stbl is s8 but it must contain offsets into slot which can go from 0 to 127. Added a bound check for that error and return -EIO if the check fails. Also make jfs_readdir return with error if add_missing_indices returns with an error. Reported-by: syzbot+b974bd41515f770c608b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com./bug?extid=b974bd41515f770c608b Signed-off-by: Aditya Dutt <duttaditya18@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
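Because a signed 8-bit value cannot exceed 127, only the negative side of the range needs a runtime check. The sketch below illustrates that under stated assumptions: `DEMO_DTPAGEMAXSLOT` is taken to be 128, and `demo_slot_from_stbl()` and `demo_check_stbl()` are hypothetical helpers, not the functions changed by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_DTPAGEMAXSLOT 128 /* assumed slot count; valid indices are 0..127 */

/* int8_t tops out at 127, so an upper-bound comparison would be tautological. */
_Static_assert(INT8_MAX == DEMO_DTPAGEMAXSLOT - 1, "s8 max must equal last slot");

/* Return the slot index for one sorted-table entry, or -EIO if it is corrupt. */
static int demo_slot_from_stbl(int8_t entry)
{
	if (entry < 0)
		return -EIO;
	return entry;
}

/* Validate n entries; stop and propagate the error at the first bad one. */
static int demo_check_stbl(const int8_t *stbl, int n)
{
	for (int i = 0; i < n; i++) {
		int slot = demo_slot_from_stbl(stbl[i]);

		if (slot < 0)
			return slot;
	}
	return 0;
}
```

Propagating the error to the caller mirrors the second half of the change described above, where the directory read fails instead of continuing with a bad index.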
2025-04-03 | jfs: Fix null-ptr-deref in jfs_ioc_trim | Dylan Wolff | -1/+2
[ Syzkaller Report ]

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000087: 0000 [#1
KASAN: null-ptr-deref in range [0x0000000000000438-0x000000000000043f]
CPU: 2 UID: 0 PID: 10614 Comm: syz-executor.0 Not tainted 6.13.0-rc6-gfbfd64d25c7a-dirty #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Sched_ext: serialise (enabled+all), task: runnable_at=-30ms
RIP: 0010:jfs_ioc_trim+0x34b/0x8f0
Code: e7 e8 59 a4 87 fe 4d 8b 24 24 4d 8d bc 24 38 04 00 00 48 8d 93 90 82 fe ff 4c 89 ff 31 f6
RSP: 0018:ffffc900055f7cd0 EFLAGS: 00010206
RAX: 0000000000000087 RBX: 00005866a9e67ff8 RCX: 000000000000000a
RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000001
RBP: dffffc0000000000 R08: ffff88807c180003 R09: 1ffff1100f830000
R10: dffffc0000000000 R11: ffffed100f830001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000438
FS: 00007fe520225640(0000) GS:ffff8880b7e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005593c91b2c88 CR3: 000000014927c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die_body+0x61/0xb0
 ? die_addr+0xb1/0xe0
 ? exc_general_protection+0x333/0x510
 ? asm_exc_general_protection+0x26/0x30
 ? jfs_ioc_trim+0x34b/0x8f0
 jfs_ioctl+0x3c8/0x4f0
 ? __pfx_jfs_ioctl+0x10/0x10
 ? __pfx_jfs_ioctl+0x10/0x10
 __se_sys_ioctl+0x269/0x350
 ? __pfx___se_sys_ioctl+0x10/0x10
 ? do_syscall_64+0xfb/0x210
 do_syscall_64+0xee/0x210
 ? syscall_exit_to_user_mode+0x1e0/0x330
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe51f4903ad
Code: c3 e8 a7 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d
RSP: 002b:00007fe5202250c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fe51f5cbf80 RCX: 00007fe51f4903ad
RDX: 0000000020000680 RSI: 00000000c0185879 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe520225640
R13: 000000000000000e R14: 00007fe51f44fca0 R15: 00007fe52021d000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:jfs_ioc_trim+0x34b/0x8f0
Code: e7 e8 59 a4 87 fe 4d 8b 24 24 4d 8d bc 24 38 04 00 00 48 8d 93 90 82 fe ff 4c 89 ff 31 f6
RSP: 0018:ffffc900055f7cd0 EFLAGS: 00010206
RAX: 0000000000000087 RBX: 00005866a9e67ff8 RCX: 000000000000000a
RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000001
RBP: dffffc0000000000 R08: ffff88807c180003 R09: 1ffff1100f830000
R10: dffffc0000000000 R11: ffffed100f830001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000438
FS: 00007fe520225640(0000) GS:ffff8880b7e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005593c91b2c88 CR3: 000000014927c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Kernel panic - not syncing: Fatal exception

[ Analysis ]

We believe that we have found a concurrency bug in the `fs/jfs` module that results in a null pointer dereference. There is a closely related issue which has been fixed:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6c1b3599b2feb5c7291f5ac3a36e5fa7cedb234

... but, unfortunately, the accepted patch appears to still be susceptible to a null pointer dereference under some interleavings.

To trigger the bug, we think that `JFS_SBI(ipbmap->i_sb)->bmap` is set to NULL in `dbFreeBits` and then dereferenced in `jfs_ioc_trim`. This bug manifests quite rarely under normal circumstances, but is triggerable from a syz-program.

Reported-and-tested-by: Dylan J. Wolff <wolffd@comp.nus.edu.sg>
Reported-and-tested-by: Jiacheng Xu <stitch@zju.edu.cn>
Signed-off-by: Dylan J. Wolff <wolffd@comp.nus.edu.sg>
Signed-off-by: Jiacheng Xu <stitch@zju.edu.cn>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-04-03 | jfs: validate AG parameters in dbMount() to prevent crashes | Vasiliy Kovalev | -1/+5
Validate db_agheight, db_agwidth, and db_agstart in dbMount to catch corrupted metadata early and avoid undefined behavior in dbAllocAG. Limits are derived from L2LPERCTL, LPERCTL/MAXAG, and CTLTREESIZE:

- agheight: 0 to L2LPERCTL/2 (0 to 5) ensures shift (L2LPERCTL - 2*agheight) >= 0.
- agwidth: 1 to min(LPERCTL/MAXAG, 2^(L2LPERCTL - 2*agheight)) ensures agperlev >= 1.
  - Ranges: 1-8 (agheight 0-3), 1-4 (agheight 4), 1 (agheight 5).
  - LPERCTL/MAXAG = 1024/128 = 8 limits leaves per AG; 2^(10 - 2*agheight) prevents division to 0.
- agstart: 0 to CTLTREESIZE-1 - agwidth*(MAXAG-1) keeps ti within stree (size 1365).
  - Ranges: 0-1237 (agwidth 1), 0-348 (agwidth 8).

UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:1400:9
shift exponent -335544310 is negative
CPU: 0 UID: 0 PID: 5822 Comm: syz-executor130 Not tainted 6.14.0-rc5-syzkaller #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:231 [inline]
 __ubsan_handle_shift_out_of_bounds+0x3c8/0x420 lib/ubsan.c:468
 dbAllocAG+0x1087/0x10b0 fs/jfs/jfs_dmap.c:1400
 dbDiscardAG+0x352/0xa20 fs/jfs/jfs_dmap.c:1613
 jfs_ioc_trim+0x45a/0x6b0 fs/jfs/jfs_discard.c:105
 jfs_ioctl+0x2cd/0x3e0 fs/jfs/ioctl.c:131
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:906 [inline]
 __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+fe8264911355151c487f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fe8264911355151c487f
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
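The limits listed in the commit message above can be worked through in a small userspace sketch. The constant values (10, 1024, 128, 1365) are the ones quoted in the message; the function name `sketch_validate_ag`, the `int32_t` parameter types, and the choice of -EIO as the error code are assumptions made for illustration and are not taken from the patch itself.

```c
#include <errno.h>
#include <stdint.h>

/* Constants as quoted in the commit message. */
#define L2LPERCTL   10
#define LPERCTL     (1 << L2LPERCTL)    /* 1024 */
#define MAXAG       128
#define CTLTREESIZE 1365

/* Hypothetical helper: 0 if all three AG parameters are in range, else -EIO. */
static int sketch_validate_ag(int32_t agheight, int32_t agwidth, int32_t agstart)
{
    int32_t max_width, max_start;

    /* Keeps the shift amount (L2LPERCTL - 2*agheight) non-negative. */
    if (agheight < 0 || agheight > L2LPERCTL / 2)
        return -EIO;

    /* agwidth <= min(LPERCTL/MAXAG, 2^(L2LPERCTL - 2*agheight)). */
    max_width = 1 << (L2LPERCTL - 2 * agheight);
    if (max_width > LPERCTL / MAXAG)
        max_width = LPERCTL / MAXAG;
    if (agwidth < 1 || agwidth > max_width)
        return -EIO;

    /* Keeps the last tree index touched inside stree[CTLTREESIZE]. */
    max_start = CTLTREESIZE - 1 - agwidth * (MAXAG - 1);
    if (agstart < 0 || agstart > max_start)
        return -EIO;

    return 0;
}
```

With these constants, the upper bound on agstart works out to 1364 - 127 = 1237 for agwidth 1 and 1364 - 1016 = 348 for agwidth 8, matching the ranges given in the message.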
2025-03-27 | Merge tag 'jfs-6.14' of github.com:kleikamp/linux-shaggy | Linus Torvalds | -41/+51
Pull jfs updates from David Kleikamp:
 "Various bug fixes and cleanups for JFS"

* tag 'jfs-6.14' of github.com:kleikamp/linux-shaggy:
  jfs: add index corruption check to DT_GETPAGE()
  fs/jfs: consolidate sanity checking in dbMount
  jfs: add sanity check for agwidth in dbMount
  jfs: Prevent copying of nlink with value 0 from disk inode
  fs/jfs: Prevent integer overflow in AG size calculation
  fs/jfs: cast inactags to s64 to prevent potential overflow
  jfs: Fix uninit-value access of imap allocated in the diMount() function
  jfs: fix slab-out-of-bounds read in ea_get()
  jfs: add check read-only before truncation in jfs_truncate_nolock()
  jfs: add check read-only before txBeginAnon() call
  jfs: reject on-disk inodes of an unsupported type
  jfs: Remove reference to bh->b_page
  jfs: Delete a couple tabs in jfs_reconfigure()
2025-03-11 | jfs: add index corruption check to DT_GETPAGE() | Roman Smirnov | -1/+2
If the file system is corrupted, the header.stblindex variable may become greater than 127. Because of this, an array access out of bounds may occur:

------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:3096:10
index 237 is out of range for type 'struct dtslot[128]'
CPU: 0 UID: 0 PID: 5822 Comm: syz-executor740 Not tainted 6.13.0-rc4-syzkaller-00110-g4099a71718b0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:231 [inline]
 __ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429
 dtReadFirst+0x622/0xc50 fs/jfs/jfs_dtree.c:3096
 dtReadNext fs/jfs/jfs_dtree.c:3147 [inline]
 jfs_readdir+0x9aa/0x3c50 fs/jfs/jfs_dtree.c:2862
 wrap_directory_iterator+0x91/0xd0 fs/readdir.c:65
 iterate_dir+0x571/0x800 fs/readdir.c:108
 __do_sys_getdents64 fs/readdir.c:403 [inline]
 __se_sys_getdents64+0x1e2/0x4b0 fs/readdir.c:389
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
 </TASK>
---[ end trace ]---

Add a stblindex check for corruption.

Reported-by: syzbot <syzbot+9120834fc227768625ba@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=9120834fc227768625ba
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Roman Smirnov <r.smirnov@omp.ru>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
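A minimal userspace sketch of the kind of check this commit describes follows. It assumes the on-disk index is an unsigned 8-bit field, which would explain how a value such as 237 can be read from a corrupted page; `sketch_stblindex_ok` and `SKETCH_DTPAGEMAXSLOT` are hypothetical names chosen for illustration, with the limit of 128 taken from the `struct dtslot[128]` type in the UBSAN report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot array length, per 'struct dtslot[128]' in the report above. */
#define SKETCH_DTPAGEMAXSLOT 128

/*
 * Hypothetical helper: true if an index read from disk can safely be
 * used to address the slot array. uint8_t is an assumed field type.
 */
static bool sketch_stblindex_ok(uint8_t stblindex)
{
    return stblindex < SKETCH_DTPAGEMAXSLOT;
}
```

A page whose index fails this test would be treated as corrupted rather than used for the array access.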