linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message	Author	Lines
2026-04-07	gfs2: drain ail under sd_log_flush_lock	Andreas Gruenbacher	-2/+1
	When a withdraw is carried out, call gfs2_ail_drain() under the sdp->sd_log_flush_lock. This isn't strictly necessary but should be easier to read, and more robust against possible future bugs. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07	drm/xe: Fix bug in idledly unit conversion	Vinay Belgaumkar	-2/+1
	We only need to convert to picosecond units before writing to RING_IDLEDLY. Fixes: 7c53ff050ba8 ("drm/xe: Apply Wa_16023105232") Cc: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Acked-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260401012710.4165547-1-vinay.belgaumkar@intel.com (cherry picked from commit 13743bd628bc9d9a0e2fe53488b2891aedf7cc74) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-04-07	scripts/dtc: Update to upstream version v1.7.2-69-g53373d135579	Rob Herring (Arm)	-8/+18
	This adds the following commits from upstream: 53373d135579 dtc: Remove unused dts_version in dtc-lexer.l caf7465c5d60 libfdt: fdt_check_full: Handle FDT_NOP when FDT_END is expected 5976c4a66098 libfdt: fdt_rw: Introduce fdt_downgrade_version() 5bb5bedd347d fdtdump: Return an error code on wrong tag value 68b960e299f7 fdtdump: Remove dtb version check adba02caf554 dtc: Use a consistent type for basenamelen 8d15a63e84ff libfdt: Verify alignment of sub-blocks in dtb Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2026-04-07	cpufreq/amd-pstate: Add POWER_SUPPLY select for dynamic EPP	Mario Limonciello	-0/+1
	The dynamic EPP feature uses power_supply_reg_notifier() and power_supply_unreg_notifier() but doesn't declare a dependency on POWER_SUPPLY, causing linker errors when POWER_SUPPLY is not enabled. Add POWER_SUPPLY to the selects. Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com> Fixes: e30ca6dd5345 ("cpufreq/amd-pstate: Add dynamic energy performance preference") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202604040742.ySEdkuAa-lkp@intel.com/ Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/20260407194949.310114-1-mario.limonciello@amd.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-04-07	kbuild: expand inlining hints with -fdiagnostics-show-inlining-chain	Justin Stitt	-0/+4
	Clang recently added -fdiagnostics-show-inlining-chain [1] to improve the visibility of inlining chains in diagnostics. This is particularly useful for CONFIG_FORTIFY_SOURCE where detections can happen deep in inlined functions. Add this flag to KBUILD_CFLAGS under a cc-option so it is enabled if the compiler supports it. Note that GCC does not have an equivalent flag as it supports a similar diagnostic structure unconditionally. Link: https://github.com/llvm/llvm-project/pull/174892 [1] Link: https://github.com/ClangBuiltLinux/linux/issues/1571 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20260330-kbuild-show-inlining-v2-1-c0c481a4ea7b@google.com Signed-off-by: Nicolas Schier <nsc@kernel.org>
2026-04-07	PCI: Remove no_pci_devices()	Heiner Kallweit	-20/+0
	After having removed the last usage of no_pci_devices(), this function can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/b0ce592d-c34c-4e0b-b389-4e346b3a0c44@gmail.com
2026-04-07	Input: pc110pad - remove driver	Dmitry Torokhov	-171/+0
	Palm Top PC 110 is a handheld personal computer with 80486SX CPU that was released exclusively in Japan in September 1995. While the kernel still supports 486 CPU it is highly unlikely that anyone is using this device with the latest kernel. Remove the driver. [bhelgaas: since this was posted, "x86/cpu: Remove M486/M486SX/ELAN support" has been queued for v7.1, so pc110pad is no longer relevant: https://lore.kernel.org/all/20251214084710.3606385-2-mingo@kernel.org/] Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20240808172733.1194442-4-dmitry.torokhov@gmail.com
2026-04-07	bpf: Retire rcu_trace_implies_rcu_gp()	Kumar Kartikeya Dwivedi	-60/+19
	RCU Tasks Trace grace period implies RCU grace period, and this guarantee is expected to remain in the future. Only BPF is the user of this predicate, hence retire the API and clean up all in-tree users. RCU Tasks Trace is now implemented on SRCU-fast and its grace period mechanism always has at least one call to synchronize_rcu() as it is required for SRCU-fast's correctness (it replaces the smp_mb() that SRCU-fast readers skip). So, RCU-tt GP will always imply RCU GP. Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260407162234.785270-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07	selftests/bpf: Allow prog name matching for tests with __description	Kumar Kartikeya Dwivedi	-20/+76
	For tests that carry a __description tag, allow matching on both the description string and program name for convenience. Before this commit, the description string must be spelt out to filter the tests. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260407145606.3991770-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07	watchdog: ni903x_wdt: Convert to a platform driver	Rafael J. Wysocki	-13/+14
	In all cases in which a struct acpi_driver is used for binding a driver to an ACPI device object, a corresponding platform device is created by the ACPI core and that device is regarded as a proper representation of underlying hardware. Accordingly, a struct platform_driver should be used by driver code to bind to that device. There are multiple reasons why drivers should not bind directly to ACPI device objects [1]. In particular, registering a watchdog device under a struct acpi_device is questionable because it causes the watchdog to be hidden in the ACPI bus sysfs hierarchy and it goes against the general rule that a struct acpi_device can only be a parent of another struct acpi_device. Overall, it is better to bind drivers to platform devices than to their ACPI companions, so convert the ni903x_wdt watchdog ACPI driver to a platform one. While this is not expected to alter functionality, it changes sysfs layout and so it will be visible to user space. Note that after this change it actually makes sense to look for the "timeout-sec" property via device_property_read_u32() under the device passed to watchdog_init_timeout() because it has an fwnode handle (unlike a struct acpi_device which is an fwnode itself). Link: https://lore.kernel.org/all/2396510.ElGaqSPkdT@rafael.j.wysocki/ [1] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://patch.msgid.link/13996583.uLZWGnKmhe@rafael.j.wysocki
2026-04-07	ACPI: PAD: xen: Convert to a platform driver	Rafael J. Wysocki	-11/+12
	In all cases in which a struct acpi_driver is used for binding a driver to an ACPI device object, a corresponding platform device is created by the ACPI core and that device is regarded as a proper representation of underlying hardware. Accordingly, a struct platform_driver should be used by driver code to bind to that device. There are multiple reasons why drivers should not bind directly to ACPI device objects [1]. Overall, it is better to bind drivers to platform devices than to their ACPI companions, so convert the Xen ACPI processor aggregator device (PAD) driver to a platform one. While this is not expected to alter functionality, it changes sysfs layout and so it will be visible to user space. Link: https://lore.kernel.org/all/2396510.ElGaqSPkdT@rafael.j.wysocki/ [1] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/8683270.T7Z3S40VBb@rafael.j.wysocki
2026-04-07	fs/resctrl: Add missing return value descriptions	Reinette Chatre	-0/+8
	Using the stricter "./tools/docs/kernel-doc -Wall -v" to verify proper formatting of documentation comments includes warnings related to return markup on functions that are omitted during the default verification checks. This stricter verification reports a couple of missing return descriptions in resctrl: Warning: .../fs/resctrl/rdtgroup.c:1536 No description found for return value of 'rdtgroup_cbm_to_size' Warning: .../fs/resctrl/rdtgroup.c:3131 No description found for return value of 'mon_get_kn_priv' Warning: .../fs/resctrl/rdtgroup.c:3523 No description found for return value of 'cbm_ensure_valid' Warning: .../fs/resctrl/monitor.c:238 No description found for return value of 'resctrl_find_cleanest_closid' Add the missing return descriptions. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/1c50b9f7c73251c007133590986f127e1af57780.1775576382.git.reinette.chatre@intel.com
2026-04-07	MAINTAINERS: Update resctrl entry	Reinette Chatre	-0/+2
	The x86 maintainers handle the resctrl filesystem and x86 architectural resctrl code. Even so, the x86 maintainers are not part of the resctrl section and not returned when scripts/get_maintainer.pl is run on resctrl filesystem code. With patches flowing via x86 maintainers resctrl should also ensure it follows the tip rules. Add the x86 maintainer alias, x86@kernel.org, to the resctrl section to ensure x86 maintainers are included in associated resctrl submissions. Add a reference to the tip tree handbook to make it clear which rules resctrl follows. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/4c14dd82e81737c6413e10fe097475b1cc0886fc.1775576382.git.reinette.chatre@intel.com
2026-04-07	sched_ext: Documentation: Fix scx_bpf_move_to_local kfunc name	fangqiurong	-2/+2
	The correct kfunc name is scx_bpf_dsq_move_to_local(), not scx_bpf_move_to_local(). Fix the two references in the Scheduling Cycle section. Signed-off-by: fangqiurong <fangqiurong@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-04-07	workqueue: use NR_STD_WORKER_POOLS instead of hardcoded value	Maninder Singh	-2/+2
	Use NR_STD_WORKER_POOLS for the irq_work_fns[] array definition. NR_STD_WORKER_POOLS is also 2, but it is better to use the macro than a hardcoded value. The initialization loop for_each_bh_worker_pool() also uses the same macro. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-04-07	of: property: Allow fw_devlink device-tree on x86	Herve Codina	-1/+25
	PCI drivers can use a device-tree overlay to describe the hardware available on the PCI board. This is the case, for instance, of the LAN966x PCI device driver. Adding more nodes in the device-tree overlay adds more consumer/supplier relationships between devices instantiated from this overlay. Those fw_node consumer/supplier relationships are handled by fw_devlink and are created based on the device-tree parsing done by the of_fwnode_add_links() function. Those consumer/supplier links are needed in order to ensure correct PM runtime management and a correct removal order between devices. For instance, without those links a supplier can be removed before its consumers are removed, leading to all kinds of issues if a consumer still wants to use the already removed supplier. The support for the usage of an overlay from a PCI driver has been added on x86 systems in commit 1f340724419ed ("PCI: of: Create device tree PCI host bridge node"). In the past, support for fw_devlink on x86 had been tried but this support has been removed in commit 4a48b66b3f52 ("of: property: Disable fw_devlink DT support for X86"). Indeed, this support was breaking some x86 systems such as OLPC systems and the regression was reported in [0]. Instead of disabling this support for all x86 systems, use a finer granularity and disable this support only for the possibly problematic subset of x86 systems (at least OLPC and CE4100). Those systems use a device-tree to describe their hardware. Identify those systems using key properties in the device-tree. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Link: https://lore.kernel.org/lkml/3c1f2473-92ad-bfc4-258e-a5a08ad73dd0@web.de/ [0] Link: https://patch.msgid.link/20260325143555.451852-18-herve.codina@bootlin.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2026-04-07	btrfs: btrfs_log_dev_io_error() on all bio errors	Boris Burkov	-2/+10
	As far as I can tell, we never intentionally constrained ourselves to these status codes, and it is misleading and surprising to lack the bdev error logging when we get a different error code from the block layer. This can lead to jumping to a wrong conclusion like "this system didn't see any bio failures but aborted with EIO". For example on nvme devices, I observe many failures coming back as BLK_STS_MEDIUM. It is apparent that the nvme driver returns a variety of BLK_STS_* status values in nvme_error_status(). So handle the known expected errors and make some noise on the rest which we expect won't really happen. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: fix silent IO error loss in encoded writes and zoned split	Michal Grzedzicki	-2/+2
	can_finish_ordered_extent() and btrfs_finish_ordered_zoned() set BTRFS_ORDERED_IOERR via bare set_bit(). Later, btrfs_mark_ordered_extent_error() in btrfs_finish_one_ordered() uses test_and_set_bit(), finds it already set, and skips mapping_set_error(). The error is never recorded on the inode's address_space, making it invisible to fsync. For encoded writes this causes btrfs receive to silently produce files with zero-filled holes. Fix: replace bare set_bit(BTRFS_ORDERED_IOERR) with btrfs_mark_ordered_extent_error() which pairs test_and_set_bit() with mapping_set_error(), guaranteeing the error is recorded exactly once. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: Michal Grzedzicki <mge@meta.com> Signed-off-by: David Sterba <dsterba@suse.com>
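The interaction described above can be modelled in user space. This is a minimal sketch, not the btrfs code: the structs, the boolean flag standing in for BTRFS_ORDERED_IOERR, and every function name below are hypothetical stand-ins chosen for illustration. It shows why a bare flag set followed by a test-and-set helper loses the error, and why routing every setter through the helper records it exactly once.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the inode mapping and the ordered extent. */
struct mapping_sketch { int err; };
struct ordered_extent_sketch { bool ioerr; struct mapping_sketch *mapping; };

/* Models test_and_set_bit(): sets the flag and returns its previous value. */
bool test_and_set_ioerr(struct ordered_extent_sketch *oe)
{
	bool old = oe->ioerr;

	oe->ioerr = true;
	return old;
}

/* Models the helper: only the caller that flips the flag records the error. */
void mark_ordered_extent_error(struct ordered_extent_sketch *oe)
{
	if (!test_and_set_ioerr(oe))
		oe->mapping->err = -EIO;
}

/* Old pattern: bare flag set first, helper later. Returns the recorded error. */
int finish_with_bare_set(void)
{
	struct mapping_sketch m = { 0 };
	struct ordered_extent_sketch oe = { false, &m };

	oe.ioerr = true;                /* bare set: nothing recorded on the mapping */
	mark_ordered_extent_error(&oe); /* flag already set, so recording is skipped */
	return m.err;
}

/* Fixed pattern: every setter goes through the helper. */
int finish_with_helper(void)
{
	struct mapping_sketch m = { 0 };
	struct ordered_extent_sketch oe = { false, &m };

	mark_ordered_extent_error(&oe); /* first call records -EIO */
	mark_ordered_extent_error(&oe); /* second call is a no-op */
	return m.err;
}
```

In this model finish_with_bare_set() returns 0 (the error is lost) while finish_with_helper() returns -EIO.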
2026-04-07	btrfs: skip clearing EXTENT_DEFRAG for NOCOW ordered extents	Dave Chen	-3/+7
	In btrfs_finish_one_ordered(), clear_bits is unconditionally initialized with EXTENT_DEFRAG. For NOCOW ordered extents this is always a no-op because should_nocow() already forces the COW path when EXTENT_DEFRAG is set, so a NOCOW ordered extent can never have EXTENT_DEFRAG on its range. Although harmless, the unconditional btrfs_clear_extent_bit() call still performs a cold rbtree lookup under the io tree spinlock on every NOCOW write completion. Avoid this by only adding EXTENT_DEFRAG to clear_bits for non-NOCOW ordered extents, and skip the call entirely when there are no bits to clear. Signed-off-by: Dave Chen <davechen@synology.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: use BTRFS_FS_UPDATE_UUID_TREE_GEN flag for UUID tree rescan check	Dave Chen	-1/+1
	The UUID tree rescan check in open_ctree() compares fs_info->generation with the superblock's uuid_tree_generation. This comparison is not reliable because fs_info->generation is bumped at transaction start time in join_transaction(), while uuid_tree_generation is only updated at commit time via update_super_roots(). Between the early BTRFS_FS_UPDATE_UUID_TREE_GEN flag check and the late rescan decision, mount operations such as file orphan cleanup from an unclean shutdown start transactions without committing them. This advances fs_info->generation past uuid_tree_generation and produces a false-positive mismatch. Use the BTRFS_FS_UPDATE_UUID_TREE_GEN flag directly instead. The flag was already set earlier in open_ctree() when the generations were known to match, and accurately represents "UUID tree is up to date" without being affected by subsequent transaction starts. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Dave Chen <davechen@synology.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: remove duplicate journal_info reset on failure to commit transaction	Filipe Manana	-2/+0
	If we get an error during the transaction commit path, we are resetting current->journal_info to NULL twice - once in btrfs_commit_transaction() right before calling cleanup_transaction() and then once again inside cleanup_transaction(). Remove the instance in btrfs_commit_transaction(). Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: tag as unlikely if statements that check for fs in error state	Filipe Manana	-16/+16
	Having the filesystem in an error state, meaning we had a transaction abort, is unexpected. Mark every check for the error state with the unlikely annotation to convey that and to allow the compiler to generate better code. On x86_64, using gcc 14.2.0-19 from Debian, resulted in a slightly reduced object size and better code. Before: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 2008598 175912 15592 2200102 219226 fs/btrfs/btrfs.ko After: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 2008450 175912 15592 2199954 219192 fs/btrfs/btrfs.ko Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
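For readers unfamiliar with the annotation, a minimal user-space sketch follows. The macro mirrors the common definition built on the GCC/Clang builtin __builtin_expect; the struct, the function name and the returned error code are hypothetical and are not taken from btrfs.

```c
#include <errno.h>
#include <stdbool.h>

/* Branch hint: tells the compiler the condition is expected to be false.
 * __builtin_expect is a GCC/Clang builtin, so this sketch assumes one of them. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical filesystem state; the real check lives in btrfs. */
struct fs_state_sketch { bool error; };

/* The error check is marked unlikely so the common path stays straight-line. */
int start_operation_sketch(const struct fs_state_sketch *fs)
{
	if (unlikely(fs->error))
		return -EROFS; /* illustrative error code only */
	return 0;
}
```

The hint changes code layout only; the function returns the same values with or without it.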
2026-04-07	Merge tag 'ata-7.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux	Linus Torvalds	-0/+14
	Pull ata fix from Niklas Cassel: - Add a quirk for JMicron JMB582/JMB585 AHCI controllers such that they only use 32-bit DMA addresses. While these controllers do report that they support 64-bit DMA addresses, a user reports that using 64-bit DMA addresses causes silent corruption even on modern x86 systems (Arthur) * tag 'ata-7.0-final' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: ahci: force 32-bit DMA for JMicron JMB582/JMB585
2026-04-07	Merge tag 'hyperv-fixes-signed-20260406' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux	Linus Torvalds	-8/+29
	Pull Hyper-V fixes from Wei Liu: - Two fixes for Hyper-V PCI driver (Long Li, Sahil Chandna) - Fix an infinite loop issue in MSHV driver (Stanislav Kinsburskii) * tag 'hyperv-fixes-signed-20260406' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: mshv: Fix infinite fault loop on permission-denied GPA intercepts PCI: hv: Fix double ida_free in hv_pci_probe error path PCI: hv: Set default NUMA node to 0 for devices without affinity info
2026-04-07	Merge tag 'mm-hotfixes-stable-2026-04-06-15-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm	Linus Torvalds	-6/+82
	Pull misc fixes from Andrew Morton: "Eight hotfixes. All are cc:stable and seven are for MM. All are singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-04-06-15-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: ocfs2: fix out-of-bounds write in ocfs2_write_end_inline mm/damon/stat: deallocate damon_call() failure leaking damon_ctx mm/vma: fix memory leak in __mmap_region() mm/memory_hotplug: maintain N_NORMAL_MEMORY during hotplug mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails mm: reinstate unconditional writeback start in balance_dirty_pages() liveupdate: propagate file deserialization failures mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
2026-04-07	ASoC: amd: ps: fix the pcm device numbering for acp pdm dmic	Syed Saba Kareem	-0/+1
	Fixed PCM device numbering is required for the acp pdm dmic pcm device to have common UCM changes. Set the 'use_dai_pcm_id' flag to true in the acp pdm dma driver for the acp 6.3 platform. This will fix the pcm device numbering based on dai_link->id. Fixes: 33cea6bbe488 ("ASoC: amd: add acp6.2 pdm platform driver") Signed-off-by: Syed Saba Kareem <Syed.SabaKareem@amd.com> Fixes: tag. Link: https://patch.msgid.link/20260403100624.676953-1-syed.sabakareem@amd.com Signed-off-by: Mark Brown <broonie@kernel.org>
2026-04-07	alarmtimer: Access timerqueue node under lock in suspend	Zhan Xusheng	-4/+8
	In alarmtimer_suspend(), timerqueue_getnext() is called under base->lock, but next->expires is read after the lock is released. This is safe because suspend freezes all relevant task contexts, but reading the node while holding the lock makes the code easier to reason about and not worry about a theoretical UAF. Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260407143627.19405-1-zhanxusheng@xiaomi.com
2026-04-07	dt-bindings: arm-smmu: qcom: Add compatible for Hawi SoC	Mukesh Ojha	-0/+1
	The Qualcomm Hawi SoC includes an apps SMMU that implements arm,mmu-500, which is used to translate device-visible virtual addresses to physical addresses. Add a compatible for it. Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-04-07	btrfs: fix double free in create_space_info() error path	Guangshuo Li	-1/+1
	When kobject_init_and_add() fails, the call chain is: create_space_info() -> btrfs_sysfs_add_space_info_type() -> kobject_init_and_add() -> failure -> kobject_put(&space_info->kobj) -> space_info_release() -> kfree(space_info) Then control returns to create_space_info(): btrfs_sysfs_add_space_info_type() returns error -> goto out_free -> kfree(space_info) This causes a double free. Keep the direct kfree(space_info) for the earlier failure path, but after btrfs_sysfs_add_space_info_type() has called kobject_put(), let the kobject release callback handle the cleanup. Fixes: a11224a016d6d ("btrfs: fix memory leaks in create_space_info() error paths") CC: stable@vger.kernel.org # 6.19+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
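The ownership rule behind this fix can be sketched in user space. This is a simplified model under stated assumptions: the refcount struct, kobj_put_sketch() and the other names are hypothetical stand-ins for the kobject machinery, and the failure is forced unconditionally. Once the final put has run the release callback, the caller must not free the object again.

```c
#include <errno.h>
#include <stdlib.h>

/* Counts how many times the release callback freed an object. */
int release_calls;

/* Hypothetical, heavily simplified model of a kobject. */
struct kobj_sketch {
	int refcount;
	void (*release)(struct kobj_sketch *);
};

/* The embedded object is the first member, so the pointers are interchangeable. */
struct space_info_sketch { struct kobj_sketch kobj; };

void space_info_release_sketch(struct kobj_sketch *k)
{
	release_calls++;
	free((struct space_info_sketch *)k);
}

/* Dropping the last reference invokes the release callback. */
void kobj_put_sketch(struct kobj_sketch *k)
{
	if (--k->refcount == 0)
		k->release(k);
}

/* Models the sysfs helper failing: it takes a reference, then puts it. */
int sysfs_add_always_fails(struct space_info_sketch *si)
{
	si->kobj.refcount = 1;
	si->kobj.release = space_info_release_sketch;
	kobj_put_sketch(&si->kobj); /* frees si through the release callback */
	return -EINVAL;             /* illustrative error code */
}

/* Fixed caller: after the helper fails, si is already gone, so no free(). */
int create_space_info_sketch(void)
{
	struct space_info_sketch *si = calloc(1, sizeof(*si));

	if (!si)
		return -ENOMEM;
	return sysfs_add_always_fails(si);
}
```

After one call to create_space_info_sketch(), release_calls is 1: the object is freed once, by the release callback only.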
2026-04-07	btrfs: fix double free in create_space_info_sub_group() error path	Guangshuo Li	-3/+1
	When kobject_init_and_add() fails, the call chain is: create_space_info_sub_group() -> btrfs_sysfs_add_space_info_type() -> kobject_init_and_add() -> failure -> kobject_put(&sub_group->kobj) -> space_info_release() -> kfree(sub_group) Then control returns to create_space_info_sub_group(), where: btrfs_sysfs_add_space_info_type() returns error -> kfree(sub_group) Thus, sub_group is freed twice. Keep parent->sub_group[index] = NULL for the failure path, but after btrfs_sysfs_add_space_info_type() has called kobject_put(), let the kobject release callback handle the cleanup. Fixes: f92ee31e031c ("btrfs: introduce btrfs_space_info sub-group") CC: stable@vger.kernel.org # 6.18+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: do not reject a valid running dev-replace	Qu Wenruo	-1/+6
	[BUG] There is a bug report that a btrfs filesystem with a running dev-replace was rejected at mount with the following messages: BTRFS error (device sdk1): devid 0 path /dev/sdk1 is registered but not found in chunk tree BTRFS error (device sdk1): remove the above devices or use 'btrfs device scan --forget <dev>' to unregister them before mount BTRFS error (device sdk1): open_ctree failed: -117 [CAUSE] The tree and super block dumps show the fs is completely sane, except for one thing: there is no dev item for devid 0 in the chunk tree. However this is not a bug, as we do not insert a dev item for devid 0 in the first place. Since devid 0 is only there temporarily, we do not really need to insert a dev item for it and then later remove it again. It is commit 34308187395f ("btrfs: add extra device item checks at mount") that added an overly strict check, which triggers a false alert and rejects the valid filesystem. [FIX] Add special handling for devid 0, and do not require devid 0 to have a device item in the chunk tree. Reported-by: Jaron Viëtor <jaron@vietors.com> Link: https://lore.kernel.org/linux-btrfs/CAF1bhLVYLZvD=j2XyuxXDKD-NWNJAwDnpVN+UYeQW-HbzNRn1A@mail.gmail.com/ Fixes: 34308187395f ("btrfs: add extra device item checks at mount") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: only invalidate btree inode pages after all ebs are released	Qu Wenruo	-7/+7
	In close_ctree(), we call invalidate_inode_pages2() to invalidate all pages from the btree inode. But the problem is, it never returns 0, but always -EBUSY. The reason is that we are still holding all the essential tree root nodes, so the pages holding those tree blocks cannot be invalidated and invalidate_inode_pages2() always returns -EBUSY. This is also inconsistent with the error cleanup path of open_ctree(), which properly frees all root pointers before calling invalidate_inode_pages2(). So fix the order by delaying invalidate_inode_pages2() until we have freed all root pointers. Reviewed-by: Anand Jain <asj@kernel.org> Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: prevent direct reclaim during compressed readahead	JP Kobryn (Meta)	-18/+45
	Under memory pressure, direct reclaim can kick in during compressed readahead. This puts the associated task into D-state. Then shrink_lruvec() disables interrupts when acquiring the LRU lock. Under heavy pressure, we've observed reclaim can run long enough that the CPU becomes prone to CSD lock stalls since it cannot service incoming IPIs. Although the CSD lock stalls are the worst case scenario, we have found many more subtle occurrences of this latency on the order of seconds, over a minute in some cases. Prevent direct reclaim during compressed readahead. This is achieved by using different GFP flags at key points when the bio is marked for readahead. There are two functions that allocate during compressed readahead: btrfs_alloc_compr_folio() and add_ra_bio_pages(). Both currently use GFP_NOFS which includes __GFP_DIRECT_RECLAIM. For the internal API call btrfs_alloc_compr_folio(), the signature changes to accept an additional gfp_t parameter. At the readahead call site, it gets flags similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM. __GFP_NOWARN is added since these allocations are allowed to fail. Demand reads still use full GFP_NOFS and will enter reclaim if needed. All other existing call sites of btrfs_alloc_compr_folio() now explicitly pass GFP_NOFS to retain their current behavior. add_ra_bio_pages() gains a bool parameter which allows callers to specify if they want to allow direct reclaim or not. In either case, the __GFP_NOWARN flag was added unconditionally since the allocations are speculative. There has been some previous work done on calling add_ra_bio_pages() [0]. This patch is complementary: where that patch reduces call frequency, this patch reduces the latency associated with those calls. 
[0] https://lore.kernel.org/linux-btrfs/656838ec1232314a2657716e59f4f15a8eadba64.1751492111.git.boris@bur.io/ Reviewed-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
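The flag manipulation described in this commit message can be illustrated with a small bitmask sketch. The bit values, the X_ prefixed names, the typedef and the function below are all hypothetical; the kernel's real gfp_t bits differ. The only property carried over is that GFP_NOFS includes direct reclaim, which the readahead path masks out while adding a no-warn bit.

```c
/* Hypothetical flag type and bit values, for illustration only. */
typedef unsigned int gfp_sketch_t;

#define X_GFP_IO             0x01u
#define X_GFP_DIRECT_RECLAIM 0x02u
#define X_GFP_KSWAPD_RECLAIM 0x04u
#define X_GFP_NOWARN         0x08u

/* Modelled after GFP_NOFS: reclaim allowed (direct and kswapd) plus I/O. */
#define X_GFP_NOFS (X_GFP_IO | X_GFP_DIRECT_RECLAIM | X_GFP_KSWAPD_RECLAIM)

/* Picks allocation flags: readahead may fail quietly but must not enter
 * direct reclaim; demand reads keep the full flag set. */
gfp_sketch_t compr_folio_gfp_sketch(int is_readahead)
{
	if (is_readahead)
		return (X_GFP_NOFS & ~X_GFP_DIRECT_RECLAIM) | X_GFP_NOWARN;
	return X_GFP_NOFS;
}
```

Passing the flags in as a parameter, as the commit does for btrfs_alloc_compr_folio(), keeps the policy decision at the call site rather than inside the allocator helper.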
2026-04-07	btrfs: replace BUG_ON() with error return in cache_save_setup()	Teng Liu	-1/+7
	In cache_save_setup(), if create_free_space_inode() succeeds but the subsequent lookup_free_space_inode() still fails on retry, the BUG_ON(retries) will crash the kernel. This can happen due to I/O errors or transient failures, not just programming bugs. Replace the BUG_ON with proper error handling that returns the original error code through the existing cleanup path. The callers already handle this gracefully: disk_cache_state defaults to BTRFS_DC_ERROR, so the space cache simply won't be written for that block group. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Teng Liu <27rabbitlt@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
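The control flow of a retry-once lookup with an error return in place of an assertion can be sketched as follows. This is a user-space approximation: the lookup and create functions are hypothetical stubs, and the counter that forces failures exists only to drive the example.

```c
#include <errno.h>

/* Hypothetical knob: how many upcoming lookups should fail with -ENOENT. */
int lookups_to_fail;

/* Stub standing in for the inode lookup. */
int lookup_inode_sketch(void)
{
	if (lookups_to_fail > 0) {
		lookups_to_fail--;
		return -ENOENT;
	}
	return 0;
}

/* Stub standing in for inode creation; always succeeds here. */
int create_inode_sketch(void)
{
	return 0;
}

/* Look up, create on a miss, retry once. A second miss is returned to the
 * caller as an error instead of crashing via an assertion. */
int cache_setup_sketch(void)
{
	int retries = 0;
	int ret;

again:
	ret = lookup_inode_sketch();
	if (ret == -ENOENT) {
		if (retries)
			return ret; /* this is where the assertion used to fire */
		retries++;
		ret = create_inode_sketch();
		if (ret)
			return ret;
		goto again;
	}
	return ret;
}
```

With one forced miss the function succeeds after creating the inode; with two forced misses it returns -ENOENT to the caller.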
2026-04-07	btrfs: zstd: don't cache sectorsize in a local variable	David Sterba	-8/+4
	The sectorsize is used once or at most twice in the callbacks, no need to cache it on stack. Minor effect on zstd_compress_folios() where it saves 8 bytes of stack. Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: zlib: don't cache sectorsize in a local variable	David Sterba	-5/+3
	The sectorsize is used once or at most twice in the callbacks, no need to cache it on stack. Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: zlib: drop redundant folio address variable	David Sterba	-7/+3
	We're caching the current output folio address but it's not really necessary as we store it in the variable and then pass it to the stream context. We can read the folio address directly. Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: lzo: inline read/write length helpers	David Sterba	-22/+6
	The LZO_LEN read/write helpers are supposed to be trivial and we're duplicating the put/get unaligned helpers so use them directly. Signed-off-by: David Sterba <dsterba@suse.com>
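As background, an unaligned store and load of a length field can be written byte by byte so that it works at any address. The sketch below assumes a 4-byte little-endian length, which this commit message does not state; the function names are hypothetical and are not the kernel's put/get unaligned helpers.

```c
#include <stdint.h>

/* Store a 32-bit value as little-endian bytes at any (unaligned) address. */
void put_le32_sketch(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a 32-bit little-endian value from any (unaligned) address. */
uint32_t get_le32_sketch(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

Because the accessors are this small, wrapping them in another layer of per-format helpers adds little, which is the point the commit makes.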
2026-04-07	btrfs: use common eb range validation in read_extent_buffer_to_user_nofault()	David Sterba	-2/+2
	The extent buffer access is checked in other helpers by check_eb_range(), which validates the requested start, length against the extent buffer. While this almost never fails we should still handle it as an error and not just warn. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: read eb folio index right before loops	David Sterba	-9/+10
	There are generic helpers to access extent buffer folio data of any length, potentially iterating over a few of them. This is a slow path, either we use the type based accessors or the eb folio allocation is contiguous and we can use the memcpy/memcmp helpers. The initialization of 'i' is done at the beginning though it may not be needed. Move it right before the folio loop, this has minor effect on generated code in __write_extent_buffer(). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: rename local variable for offset in folio	David Sterba	-4/+4
	Use proper abbreviation of the 'offset in folio' in the variable name, same as we have in accessors.c. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: unify types for binary search variables	David Sterba	-1/+1
	The variables calculating where to jump next use mixed integer types, which requires some conversions at the instruction level. Using 'u32' removes one 'movslq' instruction, making the main loop shorter. This complements the type conversion done in a724f313f84beb ("btrfs: do unsigned integer division in the extent buffer binary search loop") Signed-off-by: David Sterba <dsterba@suse.com>
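To make the idea concrete, here is a generic binary search in which the low, high and mid indices all share one unsigned 32-bit type, so no sign extension is needed between them. It is a sketch over a plain sorted array, not btrfs_bin_search(); the function name and the return convention (0 when found, 1 when not, with the slot set either way) are assumptions of this example.

```c
#include <stdint.h>

/* Search a sorted array. Returns 0 and sets *slot to the match index, or
 * returns 1 and sets *slot to the position where key would be inserted. */
int bin_search_sketch(const uint64_t *keys, uint32_t nr, uint64_t key,
		      uint32_t *slot)
{
	uint32_t low = 0;
	uint32_t high = nr;

	while (low < high) {
		/* Same unsigned type throughout: no widening conversions. */
		uint32_t mid = (low + high) / 2;

		if (keys[mid] < key) {
			low = mid + 1;
		} else if (keys[mid] > key) {
			high = mid;
		} else {
			*slot = mid;
			return 0;
		}
	}
	*slot = low;
	return 1;
}
```

Whether a compiler emits fewer instructions for this form depends on the target and optimization level; the commit reports the effect for its own build.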
2026-04-07	btrfs: remove duplicate calculation of eb offset in btrfs_bin_search()	David Sterba	-1/+0
	In the main search loop the variable 'oil' (offset in folio) is set twice, the second time redundantly when the key fits completely in the contiguous range. We can remove the duplicate, and while it's just a simple calculation, the binary search loop is executed many times so micro optimizations add up. The code size is reduced by 64 bytes on release config, the loop is reorganized a bit and is a few instructions shorter. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: tree-checker: add remap-tree checks to check_block_group_item()	Mark Harmstone	-0/+41
	Add some write-time checks for block group items relating to the remap tree. Here we're checking:
	* That the REMAPPED or METADATA_REMAP flags aren't set unless the REMAP_TREE incompat flag is also set
	* That `remap_bytes` isn't more than the size of the block group
	* That `identity_remap_count` isn't more than the number of sectors in the block group
	Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
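The three checks listed in this commit can be sketched as one validation function. Everything here is illustrative: `struct ex_block_group`, the flag bit values, the field widths, and `ex_block_group_is_valid()` are hypothetical and do not reflect the kernel's on-disk definitions. The sector size is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch, not the kernel's real values. */
#define EX_BG_REMAPPED       (1ULL << 0)
#define EX_BG_METADATA_REMAP (1ULL << 1)

/* Hypothetical in-memory view of a block group item. */
struct ex_block_group {
	uint64_t length;               /* size of the block group in bytes */
	uint64_t flags;
	uint64_t remap_bytes;
	uint32_t identity_remap_count;
};

static bool ex_block_group_is_valid(const struct ex_block_group *bg,
				    uint32_t sectorsize, bool has_remap_tree)
{
	const uint64_t remap_flags = EX_BG_REMAPPED | EX_BG_METADATA_REMAP;

	/* Remap flags require the remap-tree incompat feature. */
	if ((bg->flags & remap_flags) && !has_remap_tree)
		return false;

	/* Cannot have remapped more bytes than the block group holds. */
	if (bg->remap_bytes > bg->length)
		return false;

	/* Cannot have more identity remaps than sectors. */
	if (bg->identity_remap_count > bg->length / sectorsize)
		return false;

	return true;
}
```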
2026-04-07	btrfs: make btrfs_free_log() and btrfs_free_log_root_tree() return void	Filipe Manana	-8/+4
	These functions never fail, always return success (0) and none of the callers care about their return values. Change their return type from int to void. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: fix deadlock between reflink and transaction commit when using flushoncommit	Filipe Manana	-0/+45
	When using the flushoncommit mount option, we can have a deadlock between a transaction commit and a reflink operation that copied an inline extent to an offset beyond the current i_size of the destination inode. The deadlock happens like this: 1) Task A clones an inline extent from inode X to an offset of inode Y that is beyond Y's current i_size. This means we copied the inline extent's data to a folio of inode Y that is beyond its EOF, using a call to copy_inline_to_page(); 2) Task B starts a transaction commit and calls btrfs_start_delalloc_flush() to flush delalloc; 3) The delalloc flushing sees the new dirty folio of inode Y and when it attempts to flush it, it ends up at extent_writepage() and sees that the offset of the folio is beyond the i_size of inode Y, so it attempts to invalidate the folio by calling folio_invalidate(), which ends up at btrfs' folio invalidate callback - btrfs_invalidate_folio(). There it tries to lock the folio's range in inode Y's extent io tree, but it blocks since it's currently locked by task A - during a reflink we lock the inodes and the source and destination ranges after flushing all delalloc and waiting for ordered extent completion - after that we don't expect to have dirty folios in the ranges, the exception is if we have to copy an inline extent's data (because the destination offset is not zero); 4) Task A then attempts to start a transaction to update the inode item, and then it's blocked since the current transaction is in the TRANS_STATE_COMMIT_START state. Therefore task A has to wait for the current transaction to become unblocked (its state >= TRANS_STATE_UNBLOCKED). So task A is waiting for the transaction commit done by task B, and the latter is waiting on the extent lock of inode Y that is currently held by task A. Syzbot recently reported this with the following stack traces: INFO: task kworker/u8:7:1053 blocked for more than 143 seconds. 
Not tainted syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u8:7 state:D stack:23520 pid:1053 tgid:1053 ppid:2 task_flags:0x4208060 flags:0x00080000 Workqueue: writeback wb_workfn (flush-btrfs-46) Call Trace: <TASK> context_switch kernel/sched/core.c:5298 [inline] __schedule+0x1553/0x5240 kernel/sched/core.c:6911 __schedule_loop kernel/sched/core.c:6993 [inline] schedule+0x164/0x360 kernel/sched/core.c:7008 wait_extent_bit fs/btrfs/extent-io-tree.c:811 [inline] btrfs_lock_extent_bits+0x59c/0x700 fs/btrfs/extent-io-tree.c:1914 btrfs_lock_extent fs/btrfs/extent-io-tree.h:152 [inline] btrfs_invalidate_folio+0x43d/0xc40 fs/btrfs/inode.c:7704 extent_writepage fs/btrfs/extent_io.c:1852 [inline] extent_write_cache_pages fs/btrfs/extent_io.c:2580 [inline] btrfs_writepages+0x12ff/0x2440 fs/btrfs/extent_io.c:2713 do_writepages+0x32e/0x550 mm/page-writeback.c:2554 __writeback_single_inode+0x133/0x11a0 fs/fs-writeback.c:1750 writeback_sb_inodes+0x995/0x19d0 fs/fs-writeback.c:2042 wb_writeback+0x456/0xb70 fs/fs-writeback.c:2227 wb_do_writeback fs/fs-writeback.c:2374 [inline] wb_workfn+0x41a/0xf60 fs/fs-writeback.c:2414 process_one_work kernel/workqueue.c:3276 [inline] process_scheduled_works+0xb6e/0x18c0 kernel/workqueue.c:3359 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3440 kthread+0x388/0x470 kernel/kthread.c:436 ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> INFO: task syz.4.64:6910 blocked for more than 143 seconds. Not tainted syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
task:syz.4.64 state:D stack:22752 pid:6910 tgid:6905 ppid:5944 task_flags:0x400140 flags:0x00080002 Call Trace: <TASK> context_switch kernel/sched/core.c:5298 [inline] __schedule+0x1553/0x5240 kernel/sched/core.c:6911 __schedule_loop kernel/sched/core.c:6993 [inline] schedule+0x164/0x360 kernel/sched/core.c:7008 wait_current_trans+0x39f/0x590 fs/btrfs/transaction.c:535 start_transaction+0x6a7/0x1650 fs/btrfs/transaction.c:705 clone_copy_inline_extent fs/btrfs/reflink.c:299 [inline] btrfs_clone+0x128a/0x24d0 fs/btrfs/reflink.c:529 btrfs_clone_files+0x271/0x3f0 fs/btrfs/reflink.c:750 btrfs_remap_file_range+0x76b/0x1320 fs/btrfs/reflink.c:903 vfs_copy_file_range+0xda7/0x1390 fs/read_write.c:1600 __do_sys_copy_file_range fs/read_write.c:1683 [inline] __se_sys_copy_file_range+0x2fb/0x480 fs/read_write.c:1650 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f73afc799 RSP: 002b:00007f5f7315e028 EFLAGS: 00000246 ORIG_RAX: 0000000000000146 RAX: ffffffffffffffda RBX: 00007f5f73d75fa0 RCX: 00007f5f73afc799 RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000005 RBP: 00007f5f73b92c99 R08: 0000000000000863 R09: 0000000000000000 R10: 00002000000000c0 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f5f73d76038 R14: 00007f5f73d75fa0 R15: 00007fff138a5068 </TASK> INFO: task syz.4.64:6975 blocked for more than 143 seconds. Not tainted syzkaller #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
task:syz.4.64 state:D stack:24736 pid:6975 tgid:6905 ppid:5944 task_flags:0x400040 flags:0x00080002 Call Trace: <TASK> context_switch kernel/sched/core.c:5298 [inline] __schedule+0x1553/0x5240 kernel/sched/core.c:6911 __schedule_loop kernel/sched/core.c:6993 [inline] schedule+0x164/0x360 kernel/sched/core.c:7008 wb_wait_for_completion+0x3e8/0x790 fs/fs-writeback.c:227 __writeback_inodes_sb_nr+0x24c/0x2d0 fs/fs-writeback.c:2838 try_to_writeback_inodes_sb+0x9a/0xc0 fs/fs-writeback.c:2886 btrfs_start_delalloc_flush fs/btrfs/transaction.c:2175 [inline] btrfs_commit_transaction+0x82e/0x31a0 fs/btrfs/transaction.c:2364 btrfs_ioctl+0xca7/0xd00 fs/btrfs/ioctl.c:5206 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xff/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5f73afc799 RSP: 002b:00007f5f7313d028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f5f73d76090 RCX: 00007f5f73afc799 RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000004 RBP: 00007f5f73b92c99 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f5f73d76128 R14: 00007f5f73d76090 R15: 00007fff138a5068 </TASK> Fix this by updating the i_size of the destination inode of a reflink operation after we copy an inline extent's data to an offset beyond the i_size and before attempting to start a transaction to update the inode's item. Reported-by: syzbot+63056bf627663701bbbf@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/69bba3fe.050a0220.227207.002f.GAE@google.com/ Fixes: 05a5a7621ce6 ("Btrfs: implement full reflink support for inline extents") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: tree-checker: check remap-tree flags in btrfs_check_chunk_valid()	Mark Harmstone	-0/+14
	Add a check to btrfs_check_chunk_valid() that the METADATA_REMAP and REMAPPED flags are only set if the REMAP_TREE incompat flag is also set. Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07	btrfs: tree-checker: add checker for items in remap tree	Mark Harmstone	-0/+70
	Add write-time checking of items in the remap tree, to catch errors before they are written to disk. We're checking:
	* That remap items, remap backrefs, and identity remaps aren't written unless the REMAP_TREE incompat flag is set
	* That identity remaps have a size of 0
	* That remap items and remap backrefs have a size of sizeof(struct btrfs_remap_item)
	* That the objectid for these items is aligned to the sector size
	* That the offset for these items (i.e. the size of the remapping) isn't 0 and is aligned to the sector size
	* That objectid + offset doesn't overflow
	Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
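The key-related checks from this commit (alignment, non-zero size, no overflow) can be sketched as follows. `ex_remap_key_is_valid()` is a hypothetical helper, not the tree-checker code, and it assumes the sector size is a power of two so that alignment can be tested with a mask.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * objectid: start address of the remapping
 * offset:   size of the remapping
 * Assumes sectorsize is a non-zero power of two.
 */
static bool ex_remap_key_is_valid(uint64_t objectid, uint64_t offset,
				  uint32_t sectorsize)
{
	const uint64_t mask = (uint64_t)sectorsize - 1;

	/* The start must be sector aligned. */
	if (objectid & mask)
		return false;

	/* The size must be non-zero and sector aligned. */
	if (offset == 0 || (offset & mask))
		return false;

	/* objectid + offset must not wrap around 64 bits. */
	if (objectid > UINT64_MAX - offset)
		return false;

	return true;
}
```

The overflow test compares against `UINT64_MAX - offset` instead of computing the sum, so the check itself cannot wrap.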
2026-04-07	btrfs: fix unnecessary flush on close when truncating zero-sized files	Dave Chen	-1/+1
	In btrfs_setsize(), when a file is truncated to size 0, the BTRFS_INODE_FLUSH_ON_CLOSE flag is unconditionally set to ensure pending writes get flushed on close. This flag was designed to protect the "truncate-then-rewrite" pattern, where an application truncates a file with existing data down to zero and writes new content, ensuring the new data reach disk on close. However, when a file already has a size of 0 (e.g. a newly created file opened with O_CREAT | O_TRUNC), oldsize and newsize are both 0. In this case, setting BTRFS_INODE_FLUSH_ON_CLOSE is unnecessary because no "good data" was truncated away. The subsequent filemap_flush() in btrfs_release_file() then triggers avoidable writeback that disrupts the normal delayed writeback batching, adding I/O overhead. This comes from a real workload. A backup service creates temporary files via mkstemp(), closes them, and later reopens them with O_TRUNC for writing. The O_TRUNC is defensive. The file creation and usage is done by a different component, so removing the unneeded truncation is not straightforward. This pattern repeats for a large number of files, and each close() triggers an unnecessary filemap_flush(). Signed-off-by: Dave Chen <davechen@synology.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
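The decision described here reduces to a small predicate. This is a sketch under the assumption that the fix compares the old and new sizes; the exact condition in btrfs_setsize() is not quoted in the commit message, and `ex_should_flush_on_close()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Flag the inode for flush-on-close only when existing data was
 * actually truncated away, i.e. a non-empty file shrank to zero.
 */
static bool ex_should_flush_on_close(uint64_t oldsize, uint64_t newsize)
{
	return newsize == 0 && oldsize != 0;
}
```

With this predicate, the mkstemp() then O_TRUNC pattern from the workload (old size 0, new size 0) no longer requests a flush, while truncate-then-rewrite of a file with data still does.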
2026-04-07	btrfs: move shutdown and remove_bdev callbacks out of experimental features	Qu Wenruo	-6/+0
	These two new callbacks were introduced in v6.19, so by v7.1 they will have been present for two releases. During that time no bugs related to these two features have been exposed, thus it's time to expose them to end users. It's especially important to expose the remove_bdev callback to end users. That new callback makes btrfs automatically shut down or go degraded when a device is missing (depending on whether the fs can maintain RW), which affects end users. We want some feedback from early adopters. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>