summaryrefslogtreecommitdiffstats
path: root/fs/jbd2
AgeCommit message (Collapse)AuthorLines
2026-04-17Merge tag 'ext4_for_linux-7.0-rc1' of ↵Linus Torvalds-55/+155
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - Refactor code paths involved with partial block zero-out in prearation for converting ext4 to use iomap for buffered writes - Remove use of d_alloc() from ext4 in preparation for the deprecation of this interface - Replace some J_ASSERTS with a journal abort so we can avoid a kernel panic for a localized file system error - Simplify various code paths in mballoc, move_extent, and fast commit - Fix rare deadlock in jbd2_journal_cancel_revoke() that can be triggered by generic/013 when blocksize < pagesize - Fix memory leak when releasing an extended attribute when its value is stored in an ea_inode - Fix various potential kunit test bugs in fs/ext4/extents.c - Fix potential out-of-bounds access in check_xattr() with a corrupted file system - Make the jbd2_inode dirty range tracking safe for lockless reads - Avoid a WARN_ON when writeback files due to a corrupted file system; we already print an ext4 warning indicatign that data will be lost, so the WARN_ON is not necessary and doesn't add any new information * tag 'ext4_for_linux-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits) jbd2: fix deadlock in jbd2_journal_cancel_revoke() ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all() ext4: fix possible null-ptr-deref in mbt_kunit_exit() ext4: fix possible null-ptr-deref in extents_kunit_exit() ext4: fix the error handling process in extents_kunit_init). ext4: call deactivate_super() in extents_kunit_exit() ext4: fix miss unlock 'sb->s_umount' in extents_kunit_init() ext4: fix bounds check in check_xattrs() to prevent out-of-bounds access ext4: zero post-EOF partial block before appending write ext4: move pagecache_isize_extended() out of active handle ext4: remove ctime/mtime update from ext4_alloc_file_blocks() ext4: unify SYNC mode checks in fallocate paths ext4: ensure zeroed partial blocks are persisted in SYNC mode ext4: move zero partial block range functions out of active handle ext4: pass allocate range as loff_t to ext4_alloc_file_blocks() ext4: remove handle parameters from zero partial block functions ext4: move ordered data handling out of ext4_block_do_zero_range() ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range() ext4: factor out journalled block zeroing range ext4: rename and extend ext4_block_truncate_page() ...
2026-04-13Merge tag 'vfs-7.1-rc1.kino' of ↵Linus Torvalds-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64
2026-04-09jbd2: fix deadlock in jbd2_journal_cancel_revoke()Zhang Yi-3/+5
Commit f76d4c28a46a ("fs/jbd2: use sleeping version of __find_get_block()") changed jbd2_journal_cancel_revoke() to use __find_get_block_nonatomic() which holds the folio lock instead of i_private_lock. This breaks the lock ordering (folio -> buffer) and causes an ABBA deadlock when the filesystem blocksize < pagesize: T1 T2 ext4_mkdir() ext4_init_new_dir() ext4_append() ext4_getblk() lock_buffer() <- A sync_blockdev() blkdev_writepages() writeback_iter() writeback_get_folio() folio_lock() <- B ext4_journal_get_create_access() jbd2_journal_cancel_revoke() __find_get_block_nonatomic() folio_lock() <- B block_write_full_folio() lock_buffer() <- A This can occasionally cause generic/013 to hang. Fix by only calling __find_get_block_nonatomic() when the passed buffer_head doesn't belong to the bdev, which is the only case that we need to look up its bdev alias. Otherwise, the lookup is redundant since the found buffer_head is equal to the one we passed in. Fixes: f76d4c28a46a ("fs/jbd2: use sleeping version of __find_get_block()") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260409114204.917154-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2026-04-09jbd2: store jinode dirty range in PAGE_SIZE unitsLi Chen-23/+58
jbd2_inode fields are updated under journal->j_list_lock, but some paths read them without holding the lock (e.g. fast commit helpers and ordered truncate helpers). READ_ONCE() alone is not sufficient for the dirty range fields when they are stored as loff_t because 32-bit platforms can observe torn loads. Store the dirty range in PAGE_SIZE units as pgoff_t instead. Represent the dirty range end as an exclusive end page. This avoids a special sentinel value and keeps MAX_LFS_FILESIZE on 32-bit representable. Publish a new dirty range by updating end_page before start_page, and treat start_page >= end_page as empty in the accessor for robustness. Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the dirty range and i_flags to match the existing lockless access pattern. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-5-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09jbd2: gracefully abort on transaction state corruptionsMilos Nikic-28/+86
Auditing the jbd2 codebase reveals several legacy J_ASSERT calls that enforce internal state machine invariants (e.g., verifying jh->b_transaction or jh->b_next_transaction pointers). When these invariants are broken, the journal is in a corrupted state. However, triggering a fatal panic brings down the entire system for a localized filesystem error. This patch targets a specific class of these asserts: those residing inside functions that natively return integer error codes, booleans, or error pointers. It replaces the hard J_ASSERTs with WARN_ON_ONCE to capture the offending stack trace, safely drops any held locks, gracefully aborts the journal, and returns -EINVAL. This prevents a catastrophic kernel panic while ensuring the corrupted journal state is safely contained and upstream callers (like ext4 or ocfs2) can gracefully handle the aborted handle. Functions modified in fs/jbd2/transaction.c: - jbd2__journal_start() - do_get_write_access() - jbd2_journal_dirty_metadata() - jbd2_journal_forget() - jbd2_journal_try_to_free_buffers() - jbd2_journal_file_inode() Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260304172016.23525-3-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09jbd2: gracefully abort instead of panicking on unlocked bufferMilos Nikic-1/+6
In jbd2_journal_get_create_access(), if the caller passes an unlocked buffer, the code currently triggers a fatal J_ASSERT. While an unlocked buffer here is a clear API violation and a bug in the caller, crashing the entire system is an overly severe response. It brings down the whole machine for a localized filesystem inconsistency. Replace the J_ASSERT with a WARN_ON_ONCE to capture the offending caller's stack trace, and return an error (-EINVAL). This allows the journal to gracefully abort the transaction, protecting data integrity without causing a kernel panic. Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260304172016.23525-2-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-03-27jbd2: gracefully abort on checkpointing state corruptionsMilos Nikic-2/+13
This patch targets two internal state machine invariants in checkpoint.c residing inside functions that natively return integer error codes. - In jbd2_cleanup_journal_tail(): A blocknr of 0 indicates a severely corrupted journal superblock. Replaced the J_ASSERT with a WARN_ON_ONCE and a graceful journal abort, returning -EFSCORRUPTED. - In jbd2_log_do_checkpoint(): Replaced the J_ASSERT_BH checking for an unexpected buffer_jwrite state. If the warning triggers, we explicitly drop the just-taken get_bh() reference and call __flush_batch() to safely clean up any previously queued buffers in the j_chkpt_bhs array, preventing a memory leak before returning -EFSCORRUPTED. Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260311041548.159424-1-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2026-03-06treewide: change inode->i_ino from unsigned long to u64Jeff Layton-3/+3
On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds-2/+1
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds-4/+4
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook-7/+6
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2025-11-26jbd2: fix the inconsistency between checksum and data in memory for journal sbYe Bin-0/+14
Copying the file system while it is mounted as read-only results in a mount failure: [~]# mkfs.ext4 -F /dev/sdc [~]# mount /dev/sdc -o ro /mnt/test [~]# dd if=/dev/sdc of=/dev/sda bs=1M [~]# mount /dev/sda /mnt/test1 [ 1094.849826] JBD2: journal checksum error [ 1094.850927] EXT4-fs (sda): Could not load journal inode mount: mount /dev/sda on /mnt/test1 failed: Bad message The process described above is just an abstracted way I came up with to reproduce the issue. In the actual scenario, the file system was mounted read-only and then copied while it was still mounted. It was found that the mount operation failed. The user intended to verify the data or use it as a backup, and this action was performed during a version upgrade. Above issue may happen as follows: ext4_fill_super set_journal_csum_feature_set(sb) if (ext4_has_metadata_csum(sb)) incompat = JBD2_FEATURE_INCOMPAT_CSUM_V3; if (test_opt(sb, JOURNAL_CHECKSUM) jbd2_journal_set_features(sbi->s_journal, compat, 0, incompat); lock_buffer(journal->j_sb_buffer); sb->s_feature_incompat |= cpu_to_be32(incompat); //The data in the journal sb was modified, but the checksum was not updated, so the data remaining in memory has a mismatch between the data and the checksum. unlock_buffer(journal->j_sb_buffer); In this case, the journal sb copied over is in a state where the checksum and data are inconsistent, so mounting fails. To solve the above issue, update the checksum in memory after modifying the journal sb. Fixes: 4fd5ea43bc11 ("jbd2: checksum journal superblock") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251103010123.3753631-1-yebin@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2025-11-26jbd2: store more accurate errno in superblock when possibleWengang Wang-9/+13
When jbd2_journal_abort() is called, the provided error code is stored in the journal superblock. Some existing calls hard-code -EIO even when the actual failure is not I/O related. This patch updates those calls to pass more accurate error codes, allowing the superblock to record the true cause of failure. This helps improve diagnostics and debugging clarity when analyzing journal aborts. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Message-ID: <20251031210501.7337-1-wen.gang.wang@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-11-26jbd2: avoid bug_on in jbd2_journal_get_create_access() when file system ↵Ye Bin-5/+14
corrupted There's issue when file system corrupted: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1289! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 5 UID: 0 PID: 2031 Comm: mkdir Not tainted 6.18.0-rc1-next RIP: 0010:jbd2_journal_get_create_access+0x3b6/0x4d0 RSP: 0018:ffff888117aafa30 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88811a86b000 RCX: ffffffff89a63534 RDX: 1ffff110200ec602 RSI: 0000000000000004 RDI: ffff888100763010 RBP: ffff888100763000 R08: 0000000000000001 R09: ffff888100763028 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88812c432000 R14: ffff88812c608000 R15: ffff888120bfc000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91d6970c99 CR3: 00000001159c4000 CR4: 00000000000006f0 Call Trace: <TASK> __ext4_journal_get_create_access+0x42/0x170 ext4_getblk+0x319/0x6f0 ext4_bread+0x11/0x100 ext4_append+0x1e6/0x4a0 ext4_init_new_dir+0x145/0x1d0 ext4_mkdir+0x326/0x920 vfs_mkdir+0x45c/0x740 do_mkdirat+0x234/0x2f0 __x64_sys_mkdir+0xd6/0x120 do_syscall_64+0x5f/0xfa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The above issue occurs with us in errors=continue mode when accompanied by storage failures. There have been many inconsistencies in the file system data. In the case of file system data inconsistency, for example, if the block bitmap of a referenced block is not set, it can lead to the situation where a block being committed is allocated and used again. As a result, the following condition will not be satisfied then trigger BUG_ON. Of course, it is entirely possible to construct a problematic image that can trigger this BUG_ON through specific operations. In fact, I have constructed such an image and easily reproduced this issue. Therefore, J_ASSERT() holds true only under ideal conditions, but it may not necessarily be satisfied in exceptional scenarios. Using J_ASSERT() directly in abnormal situations would cause the system to crash, which is clearly not what we want. So here we directly trigger a JBD abort instead of immediately invoking BUG_ON. Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251025072657.307851-1-yebin@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2025-11-13jbd2: use a weaker annotation in journal handlingByungchul Park-1/+1
jbd2 journal handling code doesn't want jbd2_might_wait_for_commit() to be placed between start_this_handle() and stop_this_handle(). So it marks the region with rwsem_acquire_read() and rwsem_release(). However, the annotation is too strong for that purpose. We don't have to use more than try lock annotation for that. rwsem_acquire_read() implies: 1. might be a waiter on contention of the lock. 2. enter to the critical section of the lock. All we need in here is to act 2, not 1. So trylock version of annotation is sufficient for that purpose. Now that dept partially relies on lockdep annotaions, dept interpets rwsem_acquire_read() as a potential wait and might report a deadlock by the wait. Replace it with trylock version of annotation. Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Message-ID: <20251024073940.1063-1-byungchul@sk.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-11-13jbd2: use a per-journal lock_class_key for jbd2_trans_commit_keyTetsuo Handa-2/+4
syzbot is reporting possibility of deadlock due to sharing lock_class_key for jbd2_handle across ext4 and ocfs2. But this is a false positive, for one disk partition can't have two filesystems at the same time. Reported-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6e493c165d26d6fcbf72 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <987110fc-5470-457a-a218-d286a09dd82f@I-love.SAKURA.ne.jp> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2025-10-10jbd2: ensure that all ongoing I/O complete before freeing blocksZhang Yi-4/+9
When releasing file system metadata blocks in jbd2_journal_forget(), if this buffer has not yet been checkpointed, it may have already been written back, currently be in the process of being written back, or has not yet written back. jbd2_journal_forget() calls jbd2_journal_try_remove_checkpoint() to check the buffer's status and add it to the current transaction if it has not been written back. This buffer can only be reallocated after the transaction is committed. jbd2_journal_try_remove_checkpoint() attempts to lock the buffer and check its dirty status while holding the buffer lock. If the buffer has already been written back, everything proceeds normally. However, there are two issues. First, the function returns immediately if the buffer is locked by the write-back process. It does not wait for the write-back to complete. Consequently, until the current transaction is committed and the block is reallocated, there is no guarantee that the I/O will complete. This means that ongoing I/O could write stale metadata to the newly allocated block, potentially corrupting data. Second, the function unlocks the buffer as soon as it detects that the buffer is still dirty. If a concurrent write-back occurs immediately after this unlocking and before clear_buffer_dirty() is called in jbd2_journal_forget(), data corruption can theoretically still occur. Although these two issues are unlikely to occur in practice since the undergoing metadata writeback I/O does not take this long to complete, it's better to explicitly ensure that all ongoing I/O operations are completed. Fixes: 597599268e3b ("jbd2: discard dirty data when forgetting an un-journalled buffer") Cc: stable@kernel.org Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20250916093337.3161016-2-yi.zhang@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-09-25jbd2: increase IO priority of checkpointJulian Sun-1/+1
In commit 6a3afb6ac6df ("jbd2: increase the journal IO's priority"), the priority of IOs initiated by jbd2 has been raised, exempting them from WBT throttling. Checkpoint is also a crucial operation of jbd2. While no serious issues have been observed so far, it should still be reasonable to exempt checkpoint from WBT throttling. Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-08-13jbd2: prevent softlockup in jbd2_log_do_checkpoint()Baokun Li-0/+1
Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list() periodically release j_list_lock after processing a batch of buffers to avoid long hold times on the j_list_lock. However, since both functions contend for j_list_lock, the combined time spent waiting and processing can be significant. jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when need_resched() is true to avoid softlockups during prolonged operations. But jbd2_log_do_checkpoint() only exits its loop when need_resched() is true, relying on potentially sleeping functions like __flush_batch() or wait_on_buffer() to trigger rescheduling. If those functions do not sleep, the kernel may hit a softlockup. watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373] CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ #10 Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017 Workqueue: writeback wb_workfn (flush-7:2) pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : native_queued_spin_lock_slowpath+0x358/0x418 lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] Call trace: native_queued_spin_lock_slowpath+0x358/0x418 jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] __jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2] add_transaction_credits+0x3bc/0x418 [jbd2] start_this_handle+0xf8/0x560 [jbd2] jbd2__journal_start+0x118/0x228 [jbd2] __ext4_journal_start_sb+0x110/0x188 [ext4] ext4_do_writepages+0x3dc/0x740 [ext4] ext4_writepages+0xa4/0x190 [ext4] do_writepages+0x94/0x228 __writeback_single_inode+0x48/0x318 writeback_sb_inodes+0x204/0x590 __writeback_inodes_wb+0x54/0xf8 wb_writeback+0x2cc/0x3d8 wb_do_writeback+0x2e0/0x2f8 wb_workfn+0x80/0x2a8 process_one_work+0x178/0x3e8 worker_thread+0x234/0x3b8 kthread+0xf0/0x108 ret_from_fork+0x10/0x20 So explicitly call cond_resched() in jbd2_log_do_checkpoint() to avoid softlockup. Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250812063752.912130-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-06-08treewide, timers: Rename from_timer() to timer_container_of()Ingo Molnar-1/+1
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-05-20jbd2: remove journal_t argument from jbd2_superblock_csum()Eric Biggers-3/+3
Since jbd2_superblock_csum() no longer uses its journal_t argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-5-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20jbd2: remove journal_t argument from jbd2_chksum()Eric Biggers-12/+12
Since jbd2_chksum() no longer uses its journal_t argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large folioZhang Yi-3/+4
jbd2_journal_blocks_per_page() returns the number of blocks in a single page. Rename it to jbd2_journal_blocks_per_folio() and make it returns the number of blocks in the largest folio, preparing for the calculation of journal credits blocks when allocating blocks within a large folio in the writeback path. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20jbd2: fix data-race and null-ptr-deref in jbd2_journal_dirty_metadata()Jeongjun Park-2/+3
Since handle->h_transaction may be a NULL pointer, so we should change it to call is_handle_aborted(handle) first before dereferencing it. And the following data-race was reported in my fuzzer: ================================================================== BUG: KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata write to 0xffff888011024104 of 4 bytes by task 10881 on cpu 1: jbd2_journal_dirty_metadata+0x2a5/0x770 fs/jbd2/transaction.c:1556 __ext4_handle_dirty_metadata+0xe7/0x4b0 fs/ext4/ext4_jbd2.c:358 ext4_do_update_inode fs/ext4/inode.c:5220 [inline] ext4_mark_iloc_dirty+0x32c/0xd50 fs/ext4/inode.c:5869 __ext4_mark_inode_dirty+0xe1/0x450 fs/ext4/inode.c:6074 ext4_dirty_inode+0x98/0xc0 fs/ext4/inode.c:6103 .... read to 0xffff888011024104 of 4 bytes by task 10880 on cpu 0: jbd2_journal_dirty_metadata+0xf2/0x770 fs/jbd2/transaction.c:1512 __ext4_handle_dirty_metadata+0xe7/0x4b0 fs/ext4/ext4_jbd2.c:358 ext4_do_update_inode fs/ext4/inode.c:5220 [inline] ext4_mark_iloc_dirty+0x32c/0xd50 fs/ext4/inode.c:5869 __ext4_mark_inode_dirty+0xe1/0x450 fs/ext4/inode.c:6074 ext4_dirty_inode+0x98/0xc0 fs/ext4/inode.c:6103 .... value changed: 0x00000000 -> 0x00000001 ================================================================== This issue is caused by missing data-race annotation for jh->b_modified. Therefore, the missing annotation needs to be added. Reported-by: syzbot+de24c3fe3c4091051710@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=de24c3fe3c4091051710 Fixes: 6e06ae88edae ("jbd2: speedup jbd2_journal_dirty_metadata()") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250514130855.99010-1-aha310510@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2025-05-08ext4: rework fast commit commit pathHarshad Shirwadkar-2/+0
This patch reworks fast commit's commit path to remove locking the journal for the entire duration of a fast commit. Instead, we only lock the journal while marking all the eligible inodes as "committing". This allows handles to make progress in parallel with the fast commit. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://patch.msgid.link/20250508175908.1004880-5-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-04-22fs/jbd2: use sleeping version of __find_get_block()Davidlohr Bueso-6/+9
Convert to the new nonatomic flavor to benefit from potential performance benefits and adapt in the future vs migration such that semantics are kept. - jbd2_journal_revoke(): can sleep (has might_sleep() in the beginning) - jbd2_journal_cancel_revoke(): only used from do_get_write_access() and do_get_create_access() which do sleep. So can sleep. - jbd2_clear_buffer_revoked_flags() - only called from journal commit code which sleeps. So can sleep. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-6-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-05treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner-2/+2
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-21jbd2: add a missing data flush during file and fs synchronizationZhang Yi-1/+11
When the filesystem performs file or filesystem synchronization (e.g., ext4_sync_file()), it queries the journal to determine whether to flush the file device through jbd2_trans_will_send_data_barrier(). If the target transaction has not started committing, it assumes that the journal will submit the flush command, allowing the filesystem to bypass a redundant flush command. However, this assumption is not always valid. If the journal is not located on the filesystem device, the journal commit thread will not submit the flush command unless the variable ->t_need_data_flush is set to 1. Consequently, the flush may be missed, and data may be lost following a power failure or system crash, even if the synchronization appears to succeed. Unfortunately, we cannot determine with certainty whether the target transaction will flush to the filesystem device before it commits. However, if it has not started committing, it must be the running transaction. Therefore, fix it by always set its t_need_data_flush to 1, ensuring that the committing thread will flush the filesystem device. Fixes: bbd2be369107 ("jbd2: Add function jbd2_trans_will_send_data_barrier()") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241206111327.4171337-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18jbd2: remove jbd2_journal_unfile_buffer()Baokun Li-15/+0
Since the function jbd2_journal_unfile_buffer() is no longer called anywhere after commit e5a120aeb57f ("jbd2: remove journal_head from descriptor buffers"), so let's remove it. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250306063240.157884-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18jbd2: fix off-by-one while erasing journalZhang Yi-9/+6
In __jbd2_journal_erase(), the block_stop parameter includes the last block of a contiguous region; however, the calculation of byte_stop is incorrect, as it does not account for the bytes in that last block. Consequently, the page cache is not cleared properly, which occasionally causes the ext4/050 test to fail. Since block_stop operates on inclusion semantics, it involves repeated increments and decrements by 1, significantly increasing the complexity of the calculations. Optimize the calculation and fix the incorrect byte_stop by make both block_stop and byte_stop to use exclusion semantics. This fixes a failure in fstests ext4/050. Fixes: 01d5d96542fd ("ext4: add discard/zeroout flags to journal flush") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250217065955.3829229-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18ext4: remove references to bh->b_pageMatthew Wilcox (Oracle)-1/+1
Buffer heads are attached to folios, not to pages. Also flush_dcache_page() is now deprecated in favour of flush_dcache_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250213182303.2133205-1-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-17jbd2: remove redundant function jbd2_journal_has_csum_v2or3_featureEric Biggers-2/+2
Since commit dd348f054b24 ("jbd2: switch to using the crc32c library"), jbd2_journal_has_csum_v2or3() and jbd2_journal_has_csum_v2or3_feature() are the same. Remove jbd2_journal_has_csum_v2or3_feature() and just keep jbd2_journal_has_csum_v2or3(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20250207031424.42755-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-17jbd2: do not try to recover wiped journalJan Kara-5/+6
If a journal is wiped, we will set journal->j_tail to 0. However if 'write' argument is not set (as it happens for read-only device or for ocfs2), the on-disk superblock is not updated accordingly and thus jbd2_journal_recover() cat try to recover the wiped journal. Fix the check in jbd2_journal_recover() to use journal->j_tail for checking empty journal instead. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250206094657.20865-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-17jbd2: remove wrong sb->s_sequence checkJan Kara-1/+0
Journal emptiness is not determined by sb->s_sequence == 0 but rather by sb->s_start == 0 (which is set a few lines above). Furthermore 0 is a valid transaction ID so the check can spuriously trigger. Remove the invalid WARN_ON. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250206094657.20865-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13jbd2: Correct stale comment of release_buffer_pageKemeng Shi-2/+2
Update stale lock info in comment of release_buffer_page. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123155014.2097920-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13jbd2: correct stale function name in commentKemeng Shi-5/+5
Rename stale journal_clear_revoked_flag to jbd2_clear_buffer_revoked_flags. Rename stale journal_switch_revoke to jbd2_journal_switch_revoke_table. Rename stale __journal_file_buffer to __jbd2_journal_file_buffer. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123155014.2097920-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13jbd2: remove stale comment of update_t_max_waitKemeng Shi-3/+0
Commit 2d44292058828 ("jbd2: remove CONFIG_JBD2_DEBUG to update t_max_wait") removed jbd2_journal_enable_debug, just remove stale comment about jbd2_journal_enable_debug. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250123155014.2097920-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13jbd2: remove unused return value of do_readaheadKemeng Shi-8/+3
Remove unused return value of do_readahead. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123155014.2097920-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13jbd2: remove unused return value of jbd2_journal_cancel_revokeKemeng Shi-4/+1
Remove unused return value of jbd2_journal_cancel_revoke. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123155014.2097920-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13jbd2: drop JBD2_ABORT_ON_SYNCDATA_ERRBaokun Li-4/+2
Since ext4's data_err=abort mode doesn't depend on JBD2_ABORT_ON_SYNCDATA_ERR anymore, and nobody else uses it, we can drop it and only warn in jbd2 as it used to be long ago. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122110533.4116662-7-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-02-13jbd2: Avoid long replay times due to high number or revoke blocksJan Kara-14/+52
Some users are reporting journal replay takes a long time when there is excessive number of revoke blocks in the journal. Reported times are like: 1048576 records - 95 seconds 2097152 records - 580 seconds The problem is that hash chains in the revoke table gets excessively long in these cases. Fix the problem by sizing the revoke table appropriately before the revoke pass. Thanks to Alexey Zhuravlev <azhuravlev@ddn.com> for benchmarking the patch with large numbers of revoke blocks [1]. [1] https://lore.kernel.org/all/20250113183107.7bfef7b6@x390.bzzz77.ru Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250121140925.17231-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-02-10jbd2: remove unused transaction->t_private_listKemeng Shi-1/+0
After we remove ext4 journal callback, transaction->t_private_list is not used anymore. Just remove unused transaction->t_private_list. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241218145414.1422946-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-01-22Merge tag 'crc-for-linus' of ↵Linus Torvalds-29/+3
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: - Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be directly accessible via the library API, instead of requiring the crypto API. This is much simpler and more efficient. - Convert some users such as ext4 to use the CRC32 library API instead of the crypto API. More conversions like this will come later. - Add a KUnit test that tests and benchmarks multiple CRC variants. Remove older, less-comprehensive tests that are made redundant by this. - Add an entry to MAINTAINERS for the kernel's CRC library code. I'm volunteering to maintain it. I have additional cleanups and optimizations planned for future cycles. * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits) MAINTAINERS: add entry for CRC library powerpc/crc: delete obsolete crc-vpmsum_test.c lib/crc32test: delete obsolete crc32test.c lib/crc16_kunit: delete obsolete crc16_kunit.c lib/crc_kunit.c: add KUnit test suite for CRC library functions powerpc/crc-t10dif: expose CRC-T10DIF function through lib arm64/crc-t10dif: expose CRC-T10DIF function through lib arm/crc-t10dif: expose CRC-T10DIF function through lib x86/crc-t10dif: expose CRC-T10DIF function through lib crypto: crct10dif - expose arch-optimized lib function lib/crc-t10dif: add support for arch overrides lib/crc-t10dif: stop wrapping the crypto API scsi: target: iscsi: switch to using the crc32c library f2fs: switch to using the crc32 library jbd2: switch to using the crc32c library ext4: switch to using the crc32c library lib/crc32: make crc32c() go directly to lib bcachefs: Explicitly select CRYPTO from BCACHEFS_FS x86/crc32: expose CRC32 functions through lib x86/crc32: update prototype for crc32_pclmul_le_16() ...
2024-12-04jbd2: flush filesystem device before updating tail sequenceZhang Yi-2/+2
When committing transaction in jbd2_journal_commit_transaction(), the disk caches for the filesystem device should be flushed before updating the journal tail sequence. However, this step is missed if the journal is not located on the filesystem device. As a result, the filesystem may become inconsistent following a power failure or system crash. Fix it by ensuring that the filesystem device is flushed appropriately. Fixes: 3339578f0578 ("jbd2: cleanup journal tail after transaction commit") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20241203014407.805916-3-yi.zhang@huaweicloud.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-04jbd2: increase IO priority for writing revoke recordsZhang Yi-1/+1
Commit '6a3afb6ac6df ("jbd2: increase the journal IO's priority")' increases the priority of journal I/O by marking I/O with the JBD2_JOURNAL_REQ_FLAGS. However, that commit missed the revoke buffers, so also addresses that kind of I/Os. Fixes: 6a3afb6ac6df ("jbd2: increase the journal IO's priority") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20241203014407.805916-2-yi.zhang@huaweicloud.com Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-01jbd2: switch to using the crc32c libraryEric Biggers-29/+3
Now that the crc32c() library function directly takes advantage of architecture-specific optimizations, it is unnecessary to go through the crypto API. Just use crc32c(). This is much simpler, and it improves performance due to eliminating the crypto API overhead. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20241202010844.144356-18-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-11-13jbd2: Fix comment describing journal_init_common()Daniel Martín Gómez-3/+4
The code indicates that journal_init_common() fills the journal_t object it returns while the comment incorrectly states that only a few fields are initialised. Also, the comment claims that journal structures could be created from scratch which isn't possible as journal_init_common() calls journal_load_superblock() which loads and checks journal superblock from disk. Signed-off-by: Daniel Martín Gómez <dalme@riseup.net> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241107144538.3544-1-dalme@riseup.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12jbd2: make b_frozen_data allocation always succeedZhihao Cheng-11/+1
The b_frozen_data allocation should not be failed during journal committing process, otherwise jbd2 will abort. Since commit 490c1b444ce653d("jbd2: do not fail journal because of frozen_buffer allocation failure") already added '__GFP_NOFAIL' flag in do_get_write_access(), just add '__GFP_NOFAIL' flag for all allocations in jbd2_journal_write_metadata_buffer(), like 'new_bh' allocation does. Besides, remove all error handling branches for do_get_write_access(). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241012085530.2147846-1-chengzhihao@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12jbd2: remove the 'success' parameter from the jbd2_do_replay() functionYe Bin-9/+12
Keep 'success' internally to track if any error happened and then return it at the end in do_one_pass(). If jbd2_do_replay() return -ENOMEM then stop replay journal. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240930005942.626942-7-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12jbd2: remove useless 'block_error' variableYe Bin-7/+2
The judgement 'if (block_error && success == 0)' is never valid. Just remove useless 'block_error' variable. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240930005942.626942-6-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>