summaryrefslogtreecommitdiffstats
path: root/fs/bcachefs/rebalance.c
AgeCommit message (Collapse)AuthorLines
2025-09-29Remove bcachefs core codeLinus Torvalds-889/+0
bcachefs was marked 'externally maintained' in 6.17 but the code remained to make the transition smoother. It's now a DKMS module, making the in-kernel code stale, so remove it to avoid any version confusion. Link: https://lore.kernel.org/linux-bcachefs/yokpt2d2g2lluyomtqrdvmkl3amv3kgnipmenobkpgx537kay7@xgcgjviv3n7x/T/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-06-02bcachefs: bch_err_throw()Kent Overstreet-2/+2
Add a tracepoint for any time we return an error and unwind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-01bcachefs: Replace rcu_read_lock() with guardsKent Overstreet-9/+8
The new guard(), scoped_guard() allow for more natural code. Some of the uses with creative flow control have been left. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-27bcachefs: Fix lost rebalance wakeupsKent Overstreet-2/+4
Fix a missing wakeup in 'bcachefs set-file-option' -> xattr option update -> inode_write this was missing because the wakeup needs to happen after transaction commit. Also, add a 'kick' counter, to make sure we don't miss a wakeup that occured right after we finished checking the rebalance_work btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: struct bch_fs_recoveryKent Overstreet-1/+1
bch_fs has gotten obnoxiously big, let's start organizing thins a bit better. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: do_rebalance_scan() now only updates bch_extent_rebalanceKent Overstreet-15/+27
This ensures that our pending rebalance work accounting is accurate quickly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: opts.rebalance_on_ac_onlyKent Overstreet-4/+44
Add an option for setting rebalance to only run when connected to mains power. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_dev_in_target() no longer takes rcu_read_lock()Kent Overstreet-2/+8
Minor optimization, the caller generally has it already. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_check_rebalance_work()Kent Overstreet-0/+116
Add a pass for checking the rebalance_work btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: plumb btree_id through move_pred_fdKent Overstreet-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Rebalance now skips poisoned extentsKent Overstreet-0/+6
Let's not move poisoned extents unnecessarily, since we can't guard against introducing more bitrot. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-14bcachefs: Don't strip rebalance_opts from indirect extentsKent Overstreet-1/+1
Fix bch2_bkey_clear_needs_rebalance(): indirect extents are never supposed to have bch_extent_rebalance stripped off, because that's how we get the IO path options when we don't have the original inode it belonged to. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-21bcachefs: Start copygc, rebalance threads earlierKent Overstreet-0/+7
Previously, copygc and rebalance weren't started until the very end of mounting, after all recvoery passes have finished. But copygc really should be started earlier, since it may be needed for allocations to make forward progress. Additionally, we've been seeing occasional bug reports where starting the kthread fails due to a pending signal - i.e. we're getting timed out by systemd (during a version upgrade), but we're not seeing the signal until mount is about to complete. Additionally, we now have copygc/rebalance explicitly wait for check_snapshots to complete (if being run); they require that for snapshot_is_ancestor() in the data move path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-20bcachefs: bch2_copygc_wakeup()Kent Overstreet-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: Kill btree_iter.transKent Overstreet-6/+6
This was planned to be done ages ago, now finally completed; there are places where we have quite a few btree_trans objects on the stack, so this reduces stack usage somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30bcachefs: Don't use designated initializers for disk_accounting_posKent Overstreet-1/+2
Not all compilers fully initialize these - they're not guaranteed to because of the union shenanigans. Fixes: https://github.com/koverstreet/bcachefs/issues/844 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30bcachefs: fix units in rebalance_statusKent Overstreet-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: bch2_bkey_ptrs_rebalance_opts()Kent Overstreet-4/+9
Small optimization for bch2_bkey_sectors_need_rebalance() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16bcachefs: rebalance, copygc status also print stacktraceKent Overstreet-3/+26
These are commonly needed when debugging, and saves from having to ask users to dig. Also, rebalance_status now includes pending rebalance work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14bcachefs: Rename BCH_WRITE flags fer consistency with other x-macros enumsKent Overstreet-2/+2
The uppercase/lowercase style is nice for making the namespace explicit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: bch2_bkey_sectors_need_rebalance() now only depends on ↵Kent Overstreet-5/+3
bch_extent_rebalance Previously, bch2_bkey_sectors_need_rebalance() called bch2_target_accepts_data(), checking whether the target is writable. However, this means that adding or removing devices from a target would change the value of bch2_bkey_sectors_need_rebalance() for an existing extent; this needs to be invariant so that the extent trigger can correctly maintain rebalance_work accounting. Instead, check target_accepts_data() in io_opts_to_rebalance_opts(), before creating the bch_extent_rebalance entry. This fixes (one?) cause of rebalance_work accounting being off. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Kill BCH_TRANS_COMMIT_lazy_rwKent Overstreet-2/+1
We unconditionally go read-write, if we're going to do so, before journal replay: lazy_rw is obsolete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Move bch_extent_rebalance code to rebalance.cKent Overstreet-0/+186
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Improve trace_rebalance_extentKent Overstreet-5/+21
We now say explicitly which pointers are being moved or compressed Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Simplify option logic in rebalanceKent Overstreet-32/+15
Since bch2_move_get_io_opts() now synchronizes io_opts with options from bch_extent_rebalance, delete the ad-hoc logic in rebalance.c that previously did this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: get_update_rebalance_opts()Kent Overstreet-1/+1
bch2_move_get_io_opts() now synchronizes options loaded from the filesystem and inode (if present, i.e. not walking the reflink btree directly) with options from the bch_extent_rebalance_entry, updating the extent if necessary. Since bch_extent_rebalance tracks where its option came from we can preserve "inode options override filesystem options", even for indirect extents where we don't have access to the inode the options came from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_write_inode() now checks for changing rebalance optionsKent Overstreet-2/+2
Previously, BCHFS_IOC_REINHERIT_ATTRS didn't trigger rebalance scans when changing rebalance options - it had been missed, only the xattr interface triggered them. Ideally they'd be done by the transactional trigger, but unpacking the inode to get the options is too heavy to be done in the low level trigger - the inode trigger is run on every extent update, since the bch_inode.bi_journal_seq has to be updated for fsync. bch2_write_inode() is a good compromise, it already unpacks and repacks and is not run in any super-fast paths. Additionally, creating the new rebalance entry to trigger the scan is now done in the same transaction as the inode update that changed the options. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: copygc_enabled, rebalance_enabled now opts.h optionsKent Overstreet-2/+2
They can now be set at mount time Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: rename bch_extent_rebalance fields to match other opts structsKent Overstreet-6/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: bch2_io_opts_fixups()Kent Overstreet-2/+2
Centralize some io path option fixups - they weren't always being applied correctly: - background_compression uses compression if unset - background_target uses foreground_target if unset - nocow disables most fancy io path options Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-18bcachefs: Don't use commit_do() unnecessarilyKent Overstreet-1/+3
Using commit_do() to call alloc_sectors_start_trans() breaks when we're randomly injecting transaction restarts - the restart in the commit causes us to leak the lock that alloc_sectorS_start_trans() takes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21bcachefs: rebalance writes use BCH_WRITE_ONLY_SPECIFIED_DEVSKent Overstreet-0/+3
this was an oversight: rebalance is moving data to a specific device, so we don't want it falling back to the full filesystem Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_trans_unlock() must always be followed by relock() or begin()Kent Overstreet-0/+2
We're about to add new asserts for btree_trans locking consistency, and part of that requires that aren't using the btree_trans while it's unlocked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet-4/+4
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: rebalance_status now shows correct unitsDaniel Hill-2/+2
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: opts->compression can now also be applied in the backgroundKent Overstreet-3/+2
The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: remove redundant variable tmpColin Ian King-3/+1
The variable tmp is being assigned a value but it isn't being read afterwards. The assignment is redundant and so tmp can be removed. Cleans up clang scan build warning: warning: Although the value stored to 'ret' is used in the enclosing expression, the value is never actually read from 'ret' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Improve move_extent tracepointKent Overstreet-2/+1
Also print out the data_opts, so that we can see what specifically is being done to an extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Add missing bch2_moving_ctxt_flush_all()Kent Overstreet-0/+1
This fixes a bug with rebalance IOs getting stuck with reads completed, but writes never being issued. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_err_(fn|msg) check if should printKent Overstreet-3/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: rebalance should wakeup on shutdown if disabledDaniel Hill-2/+10
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Add a rebalance, data_update tracepointsKent Overstreet-0/+15
Add a tracepoint for rebalance, printing out - the target option - the compression option - the key being rebalanced Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Rename BTREE_INSERT flagsKent Overstreet-3/+3
BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04bcachefs: Data move path now uses bch2_trans_unlock_long()Kent Overstreet-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04bcachefs: Ensure copygc does not spinKent Overstreet-6/+4
If copygc does no work - finds no fragmented buckets - wait for a bit of IO to happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: rebalance_workKent Overstreet-226/+327
This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: move.c exports, refactoringKent Overstreet-2/+1
Prep work for the new rebalance code - we need a few helpers exported. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Break up io.cKent Overstreet-2/+0
More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert more code to bch_err_msg()Kent Overstreet-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix divide by zero in rebalance_work()Kent Overstreet-0/+4
This fixes https://github.com/koverstreet/bcachefs-tools/issues/159 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>