path: root/include
Age | Commit message | Author | Lines
2026-02-15 | Merge tag 'i2c-for-7.0-rc1' of ↵ | Linus Torvalds | -0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: - core: cleaner fwnode usage - tegra: timing improvements and Tegra264 support - lpi2c: fix SMBus block read NACK after byte count - amd-mp2, designware, mlxbf, rtl9300, spacemit, tegra: cleanups - designware: - use a dedicated algorithm for AMD Navi - replace magic numbers with named constants - replace min_t() with min() to avoid u8 truncation - refactor core to enable mode switching - imx-lpi2c: add runtime PM support for IRQ and clock handling - lan9691-i2c: add new driver - rtl9300: use OF helpers directly and avoid fwnode handling - spacemit: add bus reset support - units: add HZ_PER_GHZ and use it in several i2c drivers - at24 i2c eeprom: - add a set of new compatibles to DT bindings - use dev_err_probe() consistently in the driver * tag 'i2c-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (46 commits) i2c: imx-lpi2c: fix SMBus block read NACK after byte count i2c: designware: Remove an unnecessary condition i2c: designware: Enable mode swapping i2c: designware: Combine the init functions i2c: designware: Combine some of the common functions i2c: designware: Use device_is_compatible() instead of custom approach dt-bindings: eeprom: at24: Add compatible for Puya P24C128F drivers/i2c/busses: use min() instead of min_t() i2c: imx-lpi2c: Add runtime PM support for IRQ and clock management on i.MX8QXP/8QM i2c: amd-mp2: clean up amd_mp2_find_device() i2c: designware: Replace magic numbers with named constants i2c: rtl9300: use of instead of fwnode i2c: rtl9300: remove const cast i2c: tegra: remove unused rst i2c: designware: Remove not-going-to-be-supported code for Baikal SoC i2c: spacemit: drop useless spaces i2c: mlxbf: Use HZ_PER_KHZ in the driver i2c: mlxbf: Remove unused bus speed definitions i2c: core: Use dev_fwnode() i2c: core: Replace custom implementation of device_match_fwnode() ...
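The `min_t()` to `min()` conversion listed for the designware driver guards against a narrowing cast. As a rough user-space illustration (a Python model, not the driver code; the helper names `to_u8` and `min_t_u8` and the sample values are hypothetical), `min_t(u8, a, b)` casts both operands to `u8` before comparing, so a value of 256 wraps to 0 and wins the comparison:

```python
def to_u8(value: int) -> int:
    """Model a C cast to u8: keep only the low 8 bits."""
    return value & 0xFF


def min_t_u8(a: int, b: int) -> int:
    """Model min_t(u8, a, b): cast both operands first, then compare."""
    return min(to_u8(a), to_u8(b))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from any i2c driver.
    requested, fifo_depth = 256, 32
    print(min_t_u8(requested, fifo_depth))  # 0: 256 was truncated to 0
    print(min(requested, fifo_depth))       # 32: no truncation
```

A plain `min()` on operands of compatible types leaves the values untouched, which is presumably why the series prefers it here.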
2026-02-15 | Merge tag 'input-for-v7.0-rc0' of ↵ | Linus Torvalds | -180/+0
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - support for FocalTech FT8112 added to i2c-hid driver - support for FocalTech FT3518 added to edt-ft5x06 driver - support for power buttons in TWL603x chips added to twl4030-pwrbutton driver - an update to gpio-decoder driver to make it usable on non-OF platforms and to clean up the code - an update to synaptics_i2c driver switching it to use managed resources and a fix to restarting polling after resume - an update to gpio-keys driver to fall back to getting IRQ from resources if not specified using other means - an update to ili210x driver to support polling mode - a number of input drivers switched to scnprintf() to suppress truncation warnings - a number of updates and conversions of device tree bindings to yaml format - fixes to spelling in comments and messages in several drivers - other assorted fixups * tag 'input-for-v7.0-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits) dt-bindings: input: qcom,pm8941-pwrkey: Document PMM8654AU dt-bindings: input: touchscreen: imagis: allow linux,keycodes for ist3038 Input: apbps2 - fix comment style and typos Input: gpio_keys - fall back to platform_get_irq() for interrupt-only keys Input: novatek-nvt-ts - drop wake_type check dt-bindings: input: touchscreen: tsc2007: document '#io-channel-cells' Input: ili210x - add support for polling mode dt-bindings: touchscreen: trivial-touch: Drop 'interrupts' requirement for old Ilitek Input: appletouch - fix potential race between resume and open HID: i2c-hid: Add FocalTech FT8112 dt-bindings: input: i2c-hid: Introduce FocalTech FT8112 Input: synaptics_i2c - switch to using managed resources Input: synaptics_i2c - guard polling restart in resume Input: gpio_decoder - don't use "proxy" headers Input: gpio_decoder - make use of the macros from bits.h Input: gpio_decoder - replace custom loop by gpiod_get_array_value_cansleep() Input: gpio_decoder - unify 
messages with help of dev_err_probe() Input: gpio_decoder - make use of device properties Input: serio - complete sizeof(*pointer) conversions Input: wdt87xx_i2c - switch to use dev_err_probe() ...
2026-02-15 | Merge tag 'clk-for-linus' of ↵ | Linus Torvalds | -52/+1390
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "Not much changed in the clk framework this time except the clk.h consumer API moved the context saving APIs around to fix a build error in certain configurations. There was a change to the core framework for CLK_OPS_PARENT_ENABLE behavior during registration, but it wrecked existing drivers that didn't expect things to be turned off during clk registration so it got reverted. This cycle is really a large collection of new clk drivers, primarily for Qualcomm SoCs but also for Amlogic, SpacemiT, Google, and Aspeed. Another big change in here is support for automatic hardware clock gating on Samsung SoCs where the clks turn on and off when needed. Ideally more vendors move to this method for better power savings. The highlights are in the updates section below. Beyond all the new drivers we have a bunch of cleanups like converting drivers from divider_round_rate() to divider_determine_rate() and using scoped for each OF child loops. Otherwise it's the usual data fixes and plugging reference leaks, etc. that's all pretty ordinary but not critical enough to fix until the next release. 
New Drivers: - Qualcomm Kaanapali global, tcsr, rpmh, display, gpu, camera, and video clk controllers - Qualcomm SM8750 camera clk controllers - Qualcomm MSM8940 and SDM439 global clk controllers - Google GS101 Display Process Unit (DPU) clk controllers - SpacemiT K3 clk controllers - Amlogic t7 clk controllers - Aspeed AST2700 clk controllers Updates: - Convert clock dividers from round_rate() to determine_rate() - Fix sparse warnings, kernel-doc warnings, and plug leaked OF refs - Automatic hardware clk gating on Google GS101 SoCs - Amlogic s4 video clks - CAN-FD clks and resets on Renesas RZ/T2H, RZ/N2H, RZ/V2H, and RZ/V2N - Expanded Serial Peripheral Interface (xSPI) clocks and resets on Renesas RZ/T21H and RZ/N2H - DMAC, interrupt controller (ICU), SPI, and thermal (TSU) clocks and resets on Renesas RZ/V2N - More serial (RSCI) clocks and resets on Renesas RZ/V2H and RZ/V2N - CPU frequency scaling on T-HEAD TH1520" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (165 commits) clk: aspeed: Add reset for HACE/VIDEO dt-bindings: clock: aspeed: Add VIDEO reset definition clk: aspeed: add AST2700 clock driver MAINTAINERS: Add entry for ASPEED clock drivers. clk: aspeed: Move the existing ASPEED clk drivers into aspeed subdirectory. 
Revert "clk: Respect CLK_OPS_PARENT_ENABLE during recalc" clk: Disable KUNIT_UML_PCI dt-bindings: clk: rs9: Fix DIF pattern match clk: rs9: Convert to DEFINE_SIMPLE_DEV_PM_OPS() clk: rs9: Reserve 8 struct clk_hw slots for for 9FGV0841 clk: qcom: sm8750: Constify 'qcom_cc_desc' in SM8750 camcc clk: zynqmp: pll: Fix zynqmp_clk_divider_determine_rate kerneldoc clk: zynqmp: divider: Fix zynqmp_clk_divider_determine_rate kerneldoc clk: mediatek: Fix error handling in runtime PM setup clk: mediatek: don't select clk-mt8192 for all ARM64 builds clk: mediatek: Add mfg_eb as parent to mt8196 mfgpll clocks clk: mediatek: Refactor pllfh registration to pass device clk: mediatek: Pass device to clk_hw_register for PLLs clk: mediatek: Refactor pll registration to pass device clk: Respect CLK_OPS_PARENT_ENABLE during recalc ...
2026-02-14 | Merge branch 'next' into for-linus | Dmitry Torokhov | -238/+398
Prepare input updates for 7.0 merge window.
2026-02-14 | Merge tag 'fbdev-for-7.0-rc1' of ↵ | Linus Torvalds | -11/+11
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev updates from Helge Deller: "It's now easily possible to replace the framebuffer penguin boot logo with an own logo at compile time (Vincent Mailhol) The hyperv framebuffer driver has been removed, since the hyperv DRM driver now seems to provide equal functionality. Various console_conditional_schedule() calls across the console drivers (fbcon, printk, vt) have been removed since they are no longer necessary. All other patches are either fixes in au1100fb, au1200fb, ffb, rivafb, vt8500lcdfb and of_display_timing, or minor cleanups in the fbcon and omapfb drivers" * tag 'fbdev-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: (32 commits) fbcon: Declare struct fb_info.fbcon_par as of type struct fbcon_par fbcon: Remove struct fbcon_display.inverse fbdev: au1200fb: Fix a memory leak in au1200fb_drv_probe() fbdev: ffb: fix corrupted video output on Sun FFB1 fbdev: of_display_timing: Fix device node reference leak in of_get_display_timings() staging: fbtft: Make framebuffer registration message debug-only staging: fbtft: Fix build failure when CONFIG_FB_DEVICE=n fbdev: au1100fb: Check return value of clk_enable() in .resume() printk, vt, fbcon: Remove console_conditional_schedule() fbdev: fix fb_pad_unaligned_buffer mask fbdev: of: display_timing: fix refcount leak in of_get_display_timings() fbdev: vt8500lcdfb: fix missing dma_free_coherent() video/logo: don't select LOGO_LINUX_MONO and LOGO_LINUX_VGA16 by default video/logo: move logo selection logic to Kconfig video/logo: remove logo_mac_clut224 sh: defconfig: remove CONFIG_LOGO_SUPERH_* newport_con: depend on LOGO_LINUX_CLUT224 instead of LOGO_SGI_CLUT224 video/logo: allow custom logo video/logo: add a type parameter to the logo makefile function video/logo: remove orphan .pgm Makefile rule ...
2026-02-14 | Merge tag 'mailbox-v6.20' of ↵ | Linus Torvalds | -29/+32
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox Pull mailbox updates from Jassi Brar: "Platform and core updates PCC: - Updates to transmission and interrupt handling, including dynamic txdone configuration, ->last_tx_done() wiring, and SHMEM initialization fixes. Reverted previous shared buffer patch MediaTek - Introduce mtk-vcp-mailbox driver and bindings for MT8196 VCP - Expand mtk-cmdq for MT8196 with GCE virtualization, mminfra_offset, and instruction generation data Spreadtrum (SPRD) - Add Mailbox Revision 2 support and UMS9230 bindings - Fix unhandled interrupt masking and TX done delivery flags Microchip - Add pic64gx compatibility to MPFS - Fix out-of-bounds access and smatch warnings in mchp-ipc-sbi Core & Misc Platform Updates - Prevent out-of-bounds access in fw_mbox_index_xlate() - Add bindings for Qualcomm CPUCP (Kaanapali) - Simplify mtk-cmdq and zynqmp-ipi with scoped OF child iterators - Consolidate various minor fixes, dead code removal, and typo corrections across Broadcom, NXP, Samsung, Xilinx, ARM, and core headers" * tag 'mailbox-v6.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: (34 commits) mailbox: sprd: mask interrupts that are not handled mailbox: sprd: add support for mailbox revision 2 mailbox: sprd: clear delivery flag before handling TX done dt-bindings: mailbox: sprd: add compatible for UMS9230 mailbox: bcm-ferxrm-mailbox: Use default primary handler mailbox: Remove mailbox_client.h from controller drivers mailbox: zynqmp-ipi: Simplify with scoped for each OF child loop mailbox: mtk-cmdq: Simplify with scoped for each OF child loop dt-bindings: mailbox: xlnx,zynqmp-ipi-mailbox: Document msg region requirement mailbox: Improve RISCV_SBI_MPXY_MBOX guidance mailbox: mchp-ipc-sbi: fix uninitialized symbol and other smatch warnings mailbox: arm_mhuv3: fix typo in comment mailbox: cix: fix typo in error message mailbox: imx: Skip the suspend flag for i.MX7ULP mailbox: exynos: drop unneeded runtime 
pointer (pclk) mailbox: pcc: Remove spurious IRQF_ONESHOT usage mailbox: mtk-cmdq: Add driver data to support for MT8196 mailbox: mtk-cmdq: Add mminfra_offset configuration for DRAM transaction mailbox: mtk-cmdq: Add GCE hardware virtualization configuration mailbox: mtk-cmdq: Add cmdq private data to cmdq_pkt for generating instruction ...
2026-02-14 | Merge branches 'clk-aspeed' and 'clk-qcom' into clk-next | Stephen Boyd | -0/+594
* clk-aspeed: clk: aspeed: Add reset for HACE/VIDEO dt-bindings: clock: aspeed: Add VIDEO reset definition clk: aspeed: add AST2700 clock driver MAINTAINERS: Add entry for ASPEED clock drivers. clk: aspeed: Move the existing ASPEED clk drivers into aspeed subdirectory. * clk-qcom: (49 commits) clk: qcom: sm8750: Constify 'qcom_cc_desc' in SM8750 camcc clk: qcom: gfx3d: add parent to parent request map clk: qcom: dispcc-sm7150: Fix dispcc_mdss_pclk1_clk_src clk: qcom: dispcc-sdm845: Enable parents for pixel clocks clk: qcom: regmap-divider: convert from divider_round_rate() to divider_determine_rate() clk: qcom: regmap-divider: convert from divider_ro_round_rate() to divider_ro_determine_rate() clk: qcom: alpha-pll: convert from divider_round_rate() to divider_determine_rate() clk: qcom: Add support for GPUCC and GXCLK for Kaanapali clk: qcom: Add support for VideoCC driver for Kaanapali clk: qcom: camcc: Add support for camera clock controller for Kaanapali clk: qcom: dispcc: Add support for display clock controller Kaanapali clk: qcom: clk-alpha-pll: Add support for controlling Pongo EKO_T PLL clk: qcom: clk-alpha-pll: Update the PLL support for cal_l clk: qcom: camcc: Add camera clock controller driver for SM8750 SoC clk: qcom: clk-alpha-pll: Add support for controlling Rivian PLL dt-bindings: clock: qcom: document the Kaanapali GPU Clock Controller dt-bindings: clock: qcom: Add Kaanapali video clock controller dt-bindings: clock: qcom: Add support for CAMCC for Kaanapali dt-bindings: clock: qcom: document Kaanapali DISPCC clock controller dt-bindings: clock: qcom: Add camera clock controller for SM8750 SoC ...
2026-02-14 | Merge branches 'clk-amlogic', 'clk-thead', 'clk-mediatek' and 'clk-samsung' ↵ | Stephen Boyd | -0/+383
into clk-next * clk-amlogic: clk: meson: gxbb: use the existing HHI_HDMI_PLL_CNTL3 macro clk: meson: g12a: Limit the HDMI PLL OD to /4 clk: meson: gxbb: Limit the HDMI PLL OD to /4 on GXL/GXM SoCs clk: amlogic: remove potentially unsafe flags from S4 video clocks clk: amlogic: add video-related clocks for S4 SoC dt-bindings: clock: add video clock indices for Amlogic S4 SoC clk: meson: t7: add t7 clock peripherals controller driver clk: meson: t7: add support for the T7 SoC PLL clock dt-bindings: clock: add Amlogic T7 peripherals clock controller dt-bindings: clock: add Amlogic T7 SCMI clock controller dt-bindings: clock: add Amlogic T7 PLL clock controller * clk-thead: clk: thead: th1520-ap: Support CPU frequency scaling clk: thead: th1520-ap: Add macro to define multiplexers with flags clk: thead: th1520-ap: Support setting PLL rates clk: thead: th1520-ap: Add C910 bus clock clk: thead: th1520-ap: Poll for PLL lock and wait for stability dt-bindings: clock: thead,th1520-clk-ap: Add ID for C910 bus clock * clk-mediatek: Revert "clk: Respect CLK_OPS_PARENT_ENABLE during recalc" clk: mediatek: Fix error handling in runtime PM setup clk: mediatek: don't select clk-mt8192 for all ARM64 builds clk: mediatek: Add mfg_eb as parent to mt8196 mfgpll clocks clk: mediatek: Refactor pllfh registration to pass device clk: mediatek: Pass device to clk_hw_register for PLLs clk: mediatek: Refactor pll registration to pass device clk: Respect CLK_OPS_PARENT_ENABLE during recalc dt-bindings: clock: mediatek,mt7622-pciesys: Remove syscon compatible clk: mediatek: Drop __initconst from gates * clk-samsung: clk: samsung: gs101: add support for Display Process Unit (DPU) clocks dt-bindings: samsung: exynos-sysreg: add gs101 dpu compatible dt-bindings: clock: google,gs101-clock: Add DPU clock management unit dt-bindings: clock: google,gs101-clock: fix alphanumeric ordering clk: samsung: fix sysreg save/restore when PM is enabled for CMU clk: samsung: avoid warning message on legacy 
Exynos (auto clock gating) clk: samsung: gs101: Enable auto_clock_gate mode for each gs101 CMU clk: samsung: Implement automatic clock gating mode for CMUs dt-bindings: clock: google,gs101-clock: add samsung,sysreg property as required clk: samsung: exynosautov920: add clock support dt-bindings: clock: exynosautov920: add MFD clock definitions
2026-02-14 | Merge branches 'clk-renesas', 'clk-cleanup', 'clk-spacemit' and 'clk-tegra' ↵ | Stephen Boyd | -63/+737
into clk-next * clk-renesas: (25 commits) dt-bindings: clk: rs9: Fix DIF pattern match clk: rs9: Convert to DEFINE_SIMPLE_DEV_PM_OPS() clk: rs9: Reserve 8 struct clk_hw slots for for 9FGV0841 clk: renesas: Add missing log message terminators clk: renesas: rzg2l: Remove DSI clock rate restrictions clk: renesas: rzv2h: Deassert reset on assert timeout clk: renesas: rzg2l: Deassert reset on assert timeout clk: renesas: cpg-mssr: Unlock before reset verification clk: renesas: r9a09g056: Add entries for CANFD clk: renesas: r9a09g057: Add entries for CANFD clk: renesas: r9a09g077: Add CANFD clocks clk: renesas: cpg-mssr: Handle RZ/T2H register layout in PM callbacks dt-bindings: clock: renesas,r9a09g077/87: Add PCLKCAN ID clk: renesas: cpg-mssr: Simplify pointer math in cpg_rzt2h_mstp_read() clk: renesas: r9a09g056: Add clock and reset entries for TSU clk: renesas: r9a09g057: Add entries for RSCIs clk: renesas: r9a09g056: Add entries for RSCIs clk: renesas: r9a09g056: Add entries for the RSPIs clk: renesas: r9a09g056: Add entries for ICU clk: renesas: r9a09g056: Add entries for the DMACs ... 
* clk-cleanup: clk: Disable KUNIT_UML_PCI clk: zynqmp: pll: Fix zynqmp_clk_divider_determine_rate kerneldoc clk: zynqmp: divider: Fix zynqmp_clk_divider_determine_rate kerneldoc clk: tegra: tegra124-emc: fix device leak on set_rate() clk: Annotate #else and #endif clk: Merge prepare and unprepare sections clk: Move clk_{save,restore}_context() to COMMON_CLK section clk: clk-apple-nco: Add "apple,t8103-nco" compatible clk: versatile: impd1: Simplify with scoped for each OF child loop clk: scpi: Simplify with scoped for each OF child loop clk: lmk04832: Simplify with scoped for each OF child loop * clk-spacemit: clk: spacemit: k3: add the clock tree clk: spacemit: k3: extract common header clk: spacemit: ccu_pll: add plla type clock clk: spacemit: ccu_mix: add inverted enable gate clock dt-bindings: soc: spacemit: k3: add clock support clk: spacemit: add platform SoC prefix to reset name clk: spacemit: extract common ccu functions reset: spacemit: fix auxiliary device id clk: spacemit: prepare common ccu header clk: spacemit: Hide common clock driver from user controller clk: spacemit: Respect Kconfig setting when building modules * clk-tegra: clk: tegra30: Add CSI pad clock gates clk: tegra: Set CSUS as vi_sensor's gate for Tegra20, Tegra30 and Tegra114 clk: tegra20: Reparent dsi clock to pll_d_out0 clk: tegra: tegra124-emc: Simplify with scoped for each OF child loop clk: tegra: Adjust callbacks in tegra_clock_pm clk: tegra: tegra124-emc: Fix potential memory leak in tegra124_clk_register_emc()
2026-02-14 | Merge tag 'f2fs-for-7.0-rc1' of ↵ | Linus Torvalds | -29/+186
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this development cycle, we focused on several key performance optimizations: - introducing large folio support to enhance read speeds for immutable files - reducing checkpoint=enable latency by flushing only committed dirty pages - implementing tracepoints to diagnose and resolve lock priority inversion. Additionally, we introduced the packed_ssa feature to optimize the SSA footprint when utilizing large block sizes. Detail summary: Enhancements: - support large folio for immutable non-compressed case - support non-4KB block size without packed_ssa feature - optimize f2fs_enable_checkpoint() to avoid long delay - optimize f2fs_overwrite_io() for f2fs_iomap_begin - optimize NAT block loading during checkpoint write - add write latency stats for NAT and SIT blocks in f2fs_write_checkpoint - pin files do not require sbi->writepages lock for ordering - avoid f2fs_map_blocks() for consecutive holes in readpages - flush plug periodically during GC to maximize readahead effect - add tracepoints to catch lock overheads - add several sysfs entries to tune internal lock priorities Fixes: - fix lock priority inversion issue - fix incomplete block usage in compact SSA summaries - fix to show simulate_lock_timeout correctly - fix to avoid mapping wrong physical block for swapfile - fix IS_CHECKPOINTED flag inconsistency issue caused by concurrent atomic commit and checkpoint writes - fix to avoid UAF in f2fs_write_end_io()" * tag 'f2fs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (61 commits) f2fs: sysfs: introduce critical_task_priority f2fs: introduce trace_f2fs_priority_update f2fs: fix lock priority inversion issue f2fs: optimize f2fs_overwrite_io() for f2fs_iomap_begin f2fs: fix incomplete block usage in compact SSA summaries f2fs: decrease maximum flush retry count in f2fs_enable_checkpoint() f2fs: optimize NAT block loading during checkpoint 
write f2fs: change size parameter of __has_cursum_space() to unsigned int f2fs: add write latency stats for NAT and SIT blocks in f2fs_write_checkpoint f2fs: pin files do not require sbi->writepages lock for ordering f2fs: fix to show simulate_lock_timeout correctly f2fs: introduce FAULT_SKIP_WRITE f2fs: check skipped write in f2fs_enable_checkpoint() Revert "f2fs: add timeout in f2fs_enable_checkpoint()" f2fs: fix to unlock folio in f2fs_read_data_large_folio() f2fs: fix error path handling in f2fs_read_data_large_folio() f2fs: use folio_end_read f2fs: fix to avoid mapping wrong physical block for swapfile f2fs: avoid f2fs_map_blocks() for consecutive holes in readpages f2fs: advance index and offset after zeroing in large folio read ...
2026-02-14 | block: update docs for bio and bvec_iter | Andreas Hindborg | -9/+28
The documentation for bio and bvec_iter refers to a vector named bvl_vec. This does not exist. Update the documentation comment with correct use. Also update documentation comments for remaining fields of `bvec_iter` to improve readability. The fields of `bvec_iter` are using a mix of tabs and spaces for indentation. While at it, change them all to tabs, which is most prevalent in this struct definition. Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-14 | drm/fourcc: fix plane order for 10/12/16-bit YCbCr formats | Simon Ser | -6/+6
The short comments had the correct order, but the long comments had the planes reversed. Fixes: 2271e0a20ef7 ("drm: drm_fourcc: add 10/12/16bit software decoder YCbCr formats") Signed-off-by: Simon Ser <contact@emersion.fr> Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Robert Mader <robert.mader@collabora.com> Link: https://patch.msgid.link/20260208224718.57199-1-contact@emersion.fr
2026-02-14 | fbcon: Declare struct fb_info.fbcon_par as of type struct fbcon_par | Thomas Zimmermann | -1/+2
The only correct type for the field fbcon_par in struct fb_info is struct fbcon_par. Declare it as such. The field is a pointer to fbcon-private data. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-14 | printk, vt, fbcon: Remove console_conditional_schedule() | Sebastian Andrzej Siewior | -1/+0
do_con_write(), fbcon_redraw.*() invoke console_conditional_schedule() which is a conditional scheduling point based on printk's internal variable console_may_schedule. It may only be used if the console lock is acquired, for instance via console_lock() or console_trylock(). Printk sets the internal variable to 1 (which allows scheduling) if the console lock has been acquired via console_lock(). The trylock does not allow it. The console_conditional_schedule() invocation in do_con_write() is invoked shortly before console_unlock(). The console_conditional_schedule() invocation in fbcon_redraw.*() originates from fbcon_scroll() / vt's con_scroll(), which originate from a line feed. In console_unlock() the variable is set to 0 (which forbids scheduling) and it tries to schedule while making progress printing. This is brand new compared to when console_conditional_schedule() was added in v2.4.9.11. In v2.6.38-rc3, console_unlock() (when it came into existence) iterated over all consoles and flushed them with disabled interrupts. A scheduling attempt here was not possible; it relied on a long print scheduling before console_unlock(). Since commit 8d91f8b15361d ("printk: do cond_resched() between lines while outputting to consoles"), which appeared in v4.5-rc1, console_unlock() attempts to schedule if it was allowed to schedule during console_lock(). Each record is ideally one line, so it does so after every line feed. This console_conditional_schedule() is also only relevant on PREEMPT_NONE and PREEMPT_VOLUNTARY builds. In other configurations cond_resched() becomes a nop and has no impact. I'm bringing this all up just as proof that it is not required anymore. It becomes a problem on a PREEMPT_RT build with debug code enabled because that might_sleep() in cond_resched() remains and triggers a warning.
This is due to legacy_kthread_func-> console_flush_one_record -> vt_console_print-> lf -> con_scroll -> fbcon_scroll and vt_console_print() acquires a spinlock_t which does not allow a voluntary schedule. There is no need for fb_scroll() to schedule since console_flush_one_record() attempts to schedule after each line. !PREEMPT_RT is not affected because the legacy printing thread is only enabled on PREEMPT_RT builds. Therefore I suggest removing console_conditional_schedule(). Cc: Simona Vetter <simona@ffwll.ch> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Fixes: 5f53ca3ff83b4 ("printk: Implement legacy printer kthread for PREEMPT_RT") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Petr Mladek <pmladek@suse.com> # from printk() POV Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-14 | video/logo: move logo selection logic to Kconfig | Vincent Mailhol | -7/+0
Now that the path to the logo file can be directly entered in Kbuild, there is no more need to handle all the logo file selection in the Makefile and the C files. The only exception is the logo_spe_clut224 which is only used by the Cell processor (found for example in the PlayStation 3) [1]. This extra logo uses its own different image which shows up on a separate line just below the normal logo. Because the extra logo uses a different image, it can not be factorized under the custom logo logic. Move all the logo file selection logic to Kbuild (except for logo_spe_clut224.ppm). Once this is done, clean up the C code to only leave one entry for each logo type (monochrome, 16-colors and 224-colors). [1] Cell SPE logos Link: https://lore.kernel.org/all/20070710122702.765654000@pademelon.sonytel.be/ Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-14 | video/logo: remove logo_mac_clut224 | Vincent Mailhol | -1/+0
The logo_mac_clut224 depends on the runtime value MACH_IS_MAC being true to be displayed. This makes that logo a one-of-a-kind, as it is the only one whose selection can not be decided at compile time. This dynamic logo selection logic conflicts with our upcoming plans to simplify the logo selection code. Considering that the logo_mac_clut224 is only used by the Macintosh 68k, a machine whose sales ended some thirty years ago and which thus represents a very small user base, it is preferable to resolve the conflict in favour of code simplicity. Remove the logo_mac_clut224 so that the logo selection can be statically determined at compile time. The users who wish to continue using that logo can still download it from [1] and add: CONFIG_LOGO_LINUX_CLUT224=y CONFIG_LOGO_LINUX_CLUT224_FILE="/path/to/logo_mac_clut224.ppm" to their configuration file to restore it. [1] logo_mac_clut224.ppm file Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/drivers/video/logo/logo_mac_clut224.ppm?h=v6.18 Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-14 | fb: Add dev_of_fbinfo() helper for optional sysfs support | Chintan Patel | -0/+9
Add dev_of_fbinfo() to return the framebuffer struct device when CONFIG_FB_DEVICE is enabled, or NULL otherwise. This allows fbdev drivers to use sysfs interfaces via runtime checks instead of CONFIG_FB_DEVICE ifdefs, keeping the code clean while remaining fully buildable. Suggested-by: Helge Deller <deller@gmx.de> Reviewed-by: Helge Deller <deller@gmx.de> Reviewed-by: Andy Shevchenko <andy@kernel.org> Signed-off-by: Chintan Patel <chintanlike@gmail.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-14 | fbdev: Use device_create_with_groups() to fix sysfs groups registration race | Hans de Goede | -1/+0
The fbdev sysfs attributes are registered after sending the uevent for the device creation, leaving a race window where e.g. udev rules may not be able to access the sysfs attributes because the registration is not done yet. Fix this by switching to device_create_with_groups(). This also results in a nice cleanup. After switching to device_create_with_groups() all that is left of fb_init_device() is setting the drvdata and that can be passed to device_create[_with_groups]() too. After which fb_init_device() can be completely removed. Dropping fb_init_device() + fb_cleanup_device() in turn allows removing fb_info.class_flag as they were the only user of this field. Fixes: 5fc830d6aca1 ("fbdev: Register sysfs groups through device_add_group") Cc: stable@vger.kernel.org Cc: Shixiong Ou <oushixiong@kylinos.cn> Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Signed-off-by: Helge Deller <deller@gmx.de>
2026-02-13 | Merge tag 'trace-v7.0' of ↵ | Linus Torvalds | -15/+26
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: "User visible changes: - Add an entry into MAINTAINERS file for RUST versions of code There's now RUST code for tracing and static branches. To differentiate that code from the C code, add entries for the RUST version (with "[RUST]" around it) so that the right maintainers get notified on changes. - New bitmask-list option added to tracefs When this is set, bitmasks in trace events are not displayed as hex numbers, but instead as lists: e.g. 0-5,7,9 instead of 0000015f - New show_event_filters file in tracefs Instead of having to search all events/*/*/filter for any active filters enabled in the trace instance, the file show_event_filters will list them so that there's only one file that needs to be examined to see if any filters are active. - New show_event_triggers file in tracefs Instead of having to search all events/*/*/trigger for any active triggers enabled in the trace instance, the file show_event_triggers will list them so that there's only one file that needs to be examined to see if any triggers are active. - Have traceoff_on_warning disable trace printk buffer too Recently recording of trace_printk() could go to other trace instances instead of the top level instance. But if traceoff_on_warning triggers, it doesn't stop the buffer with trace_printk() and that data can easily be lost by being overwritten. Have traceoff_on_warning also disable the instance that has trace_printk() being written to it. - Update the hist_debug file to show what function the field uses When CONFIG_HIST_TRIGGERS_DEBUG is enabled, a hist_debug file exists for every event. This displays the internal data of any histogram enabled for that event. But it is lacking the function that is called to process one of its fields. This is very useful information that was missing when debugging histograms.
- Up the histogram stack size from 16 to 31 Stack traces can be used as keys for event histograms. Currently the size of the stack that is stored is limited to just 16 entries. But the storage space in the histogram is 256 bytes, meaning that it can store up to 31 entries (plus one for the count of entries). Instead of letting that space go to waste, up the limit from 16 to 31. This makes the keys much more useful. - Fix permissions of per CPU file buffer_size_kb The per CPU file of buffer_size_kb was incorrectly set to read only in a previous cleanup. It should be writable. - Reset "last_boot_info" if the persistent buffer is cleared The last_boot_info shows address information of a persistent ring buffer if it contains data from a previous boot. It is cleared when recording starts again, but it is not cleared when the buffer is reset. The data is useless after a reset so clear it on reset too. Internal changes: - A change was made to allow tracepoint callbacks to have preemption enabled, and instead be protected by SRCU. This required some updates to the callbacks for perf and BPF. perf needed to disable preemption directly in its callback because it expects preemption disabled in the later code. BPF needed to disable migration, as its code expects to run completely on the same CPU. - Have irq_work wake up other CPU if current CPU is "isolated" When there's a waiter waiting on ring buffer data and a new event happens, an irq work is triggered to wake up that waiter. This is noisy on isolated CPUs (running NO_HZ_FULL). Trigger an IPI to a house keeping CPU instead. - Use proper free of trigger_data instead of open coding it in. - Remove redundant call of event_trigger_reset_filter() It was called immediately in a function that was called right after it. - Workqueue cleanups - Report errors if tracing_update_buffers() were to fail. 
- Make the enum update workqueue generic for other parts of tracing On boot up, a work queue is created to convert enum names into their numbers in the trace event format files. This work queue can also be used for other aspects of tracing that takes some time and shouldn't be called by the init call code. The blk_trace initialization takes a bit of time. Have the initialization code moved to the new tracing generic work queue function. - Skip kprobe boot event creation call if there's no kprobes defined on cmdline The kprobe initialization to set up kprobes if they are defined on the cmdline requires taking the event_mutex lock. This can be held by other tracing code doing initialization for a long time. Since kprobes added to the kernel command line need to be setup immediately, as they may be tracing early initialization code, they cannot be postponed in a work queue and must be setup in the initcall code. If there's no kprobe on the kernel cmdline, there's no reason to take the mutex and slow down the boot up code waiting to get the lock only to find out there's nothing to do. Simply exit out early if there's no kprobes on the kernel cmdline. If there are kprobes on the cmdline, then someone cares more about tracing over the speed of boot up. - Clean up the trigger code a bit - Move code out of trace.c and into their own files trace.c is now over 11,000 lines of code and has become more difficult to maintain. Start splitting it up so that related code is in their own files. Move all the trace_printk() related code into trace_printk.c. Move the __always_inline stack functions into trace.h. Move the pid filtering code into a new trace_pid.c file. - Better define the max latency and snapshot code The latency tracers have a "max latency" buffer that is a copy of the main buffer and gets swapped with it when a new high latency is detected. This keeps the trace up to the highest latency around where this max_latency buffer is never written to. 
It is only used to save the last max latency trace. A while ago a snapshot feature was added to tracefs to allow user space to perform the same logic. It could also enable events to trigger a "snapshot" if one of their fields hit a new high. This was built on top of the latency max_latency buffer logic. Because snapshots came later, they were dependent on the latency tracers to be enabled. In reality, the latency tracers depend on the snapshot code and not the other way around. It was just that they came first. Restructure the code and the kconfigs to have the latency tracers depend on snapshot code instead. This actually simplifies the logic a bit and allows disabling more when the latency tracers are not defined and the snapshot code is. - Fix a "false sharing" in the hwlat tracer code The loop to search for latency in hardware was using a variable that could be changed by user space for each sample. If the user changes this variable, it could cause bus contention, and reading that variable can show up as a large latency in the trace causing a false positive. Read this variable at the start of the sample with a READ_ONCE() into a local variable and keep the code from sharing cache lines with readers. - Fix function graph tracer static branch optimization code When only one tracer is defined for function graph tracing, it uses a static branch to call that tracer directly. When another tracer is added, it goes into loop logic to call all the registered callbacks. The code was incorrect when going back to one tracer and never re-enabled the static branch again to do the optimization code. 
- And other small fixes and cleanups" * tag 'trace-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (46 commits) function_graph: Restore direct mode when callbacks drop to one tracing: Fix indentation of return statement in print_trace_fmt() tracing: Reset last_boot_info if ring buffer is reset tracing: Fix to set write permission to per-cpu buffer_size_kb tracing: Fix false sharing in hwlat get_sample() tracing: Move d_max_latency out of CONFIG_FSNOTIFY protection tracing: Better separate SNAPSHOT and MAX_TRACE options tracing: Add tracer_uses_snapshot() helper to remove #ifdefs tracing: Rename trace_array field max_buffer to snapshot_buffer tracing: Move pid filtering into trace_pid.c tracing: Move trace_printk functions out of trace.c and into trace_printk.c tracing: Use system_state in trace_printk_init_buffers() tracing: Have trace_printk functions use flags instead of using global_trace tracing: Make tracing_update_buffers() take NULL for global_trace tracing: Make printk_trace global for tracing system tracing: Move ftrace_trace_stack() out of trace.c and into trace.h tracing: Move __trace_buffer_{un}lock_*() functions to trace.h tracing: Make tracing_selftest_running global to the tracing subsystem tracing: Make tracing_disabled global for tracing system tracing: Clean up use of trace_create_maxlat_file() ...
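The bitmask-list option described in the tracing pull above prints a bitmask as a list of bit ranges rather than as a hex number. A minimal user-space sketch of that kind of conversion follows. The function name `mask_to_list` and its signature are hypothetical and chosen for illustration; this is not the kernel's implementation, only one plausible way to produce the list format.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the kernel's code): format the set bits of
 * `mask` as a comma-separated list of ranges, e.g. "0,2,4-7".
 * Returns the number of characters written, or -1 if `buf` is too small.
 */
int mask_to_list(unsigned long mask, char *buf, size_t len)
{
    const int nbits = (int)(sizeof(mask) * 8);
    size_t used = 0;
    int bit = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';

    while (bit < nbits) {
        int start, end, n;

        /* Skip clear bits. */
        if (!((mask >> bit) & 1UL)) {
            bit++;
            continue;
        }

        /* Found the start of a run; extend it while the next bit is set. */
        start = bit;
        while (bit + 1 < nbits && ((mask >> (bit + 1)) & 1UL))
            bit++;
        end = bit++;

        /* Emit "N" for a single bit or "N-M" for a run, comma-separated. */
        if (start == end)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", start);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", start, end);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With this sketch, calling `mask_to_list(0xF5UL, buf, sizeof(buf))` on a 64-byte buffer leaves `0,2,4-7` in `buf`, since 0xF5 has bits 0, 2 and 4 through 7 set.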
2026-02-13  Merge tag 'platform-drivers-x86-v7.0-1' of ↵  Linus Torvalds  -52/+93
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Ilpo Järvinen: "Highlights: - amd/pmf: - Avoid overwriting BIOS input values when events occur rapidly - Fix PMF driver issues related to S4 (in part on crypto/ccp side) - Add NPU metrics API (for accel side consumers) - Allow disabling Smart PC function through a module parameter - asus-wmi & HID/asus: - Unification of backlight control (replaces quirks) - Support multiple interfaces for controlling keyboard/RGB brightness - Simplify init sequence - hp-wmi: - Add manual fan control for Victus S models - Add fan mode keep-alive - Fix platform profile values for Omen 16-wf1xxx - Add EC offset to get the thermal profile - intel/pmc: Show substate residencies also for non-primary PMCs - intel/ISST: - Store and restore data for all domains - Write interface improvements - lenovo-wmi: - Support multiple Capability Data - Add HWMON reporting and tuning support - mellanox/mlx-platform: Add HI173 & HI174 support - surface/aggregator_registry: Add Surface Pro 11 (QCOM) - thinkpad_acpi: Add support for HW damage detection capability - uniwill: Implement cTGP setting - wmi: - Introduce marshalling support - Convert a few drivers to use the new buffer-based WMI API - tools/power/x86/intel-speed-select: Allow read operations for non-root - Miscellaneous cleanups / refactoring / improvements" * tag 'platform-drivers-x86-v7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (68 commits) platform/x86: lenovo-wmi-{capdata,other}: Fix HWMON channel visibility platform/x86: hp-wmi: Add EC offsets to read Victus S thermal profile platform: mellanox: mlx-platform: Add support DGX flavor of next-generation 800GB/s ethernet switch. 
platform: mellanox: mlx-platform: Add support for new Nvidia DGX system based on class VMOD0010 HID: asus: add support for the asus-wmi brightness handler platform/x86: asus-wmi: add keyboard brightness event handler platform/x86: asus-wmi: remove unused keyboard backlight quirk HID: asus: listen to the asus-wmi brightness device instead of creating one platform/x86: asus-wmi: Add support for multiple kbd led handlers HID: asus: early return for ROG devices HID: asus: move vendor initialization to probe HID: asus: fortify keyboard handshake HID: asus: use same report_id in response HID: asus: initialize additional endpoints only for certain devices HID: asus: simplify RGB init sequence platform/wmi: string-kunit: Add missing oversized string test case platform/x86/amd/pmf: Added a module parameter to disable the Smart PC function platform/x86/uniwill: Implement cTGP setting platform/x86: uniwill-laptop: Introduce device descriptor system platform/x86/amd: Use scope-based cleanup for wbrf_record() ...
2026-02-13  Merge tag 'mtd/for-7.0' of ↵  Linus Torvalds  -18/+152
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD updates from Miquel Raynal: "MTD: - prioritize ofpart in physmap-core probing - conversions to scoped for each OF child loops Bindings: - The bulk of the changes consists of binding fixes/updates to restrict the use of undefined properties, which was mostly ineffective in the current form because of the nesting of partition nodes and the lack of compatible strings - YAML conversions and the addition of a dma-coherent property in the cdns,hp-nfc driver SPI NAND: - support for octal DTR modes (8D-8D-8D) - support for Foresee F35SQB002G chips And small misc fixes" * tag 'mtd/for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (65 commits) mtd: spi-nor: hisi-sfc: fix refcounting bug in hisi_spi_nor_register_all() mtd: spinand: fix NULL pointer dereference in spinand_support_vendor_ops() mtd: rawnand: pl353: Add message about ECC mode mtd: rawnand: pl353: Fix software ECC support mtd: spinand: winbond: Remove unneeded semicolon dt-bindings: mtd: cdns,hp-nfc: Add dma-coherent property mtd: spinand: Disable continuous read during probe mtd: spinand: add Foresee F35SQB002G flash support mtd: spinand: winbond: W35N octal DTR support mtd: spinand: Add octal DTR support mtd: spinand: Warn if using SSDR-only vendor commands in a non SSDR mode mtd: spinand: Give the bus interface to the configuration helper mtd: spinand: Propagate the bus interface across core helpers mtd: spinand: Add support for setting a bus interface mtd: spinand: Gather all the bus interface steps in one single function mtd: spinand: winbond: Configure the IO mode after the dummy cycles mtd: spinand: winbond: Rename IO_MODE register macro mtd: spinand: winbond: Fix style mtd: spinand: winbond: Register W35N vendor specific operation mtd: spinand: winbond: Register W25N vendor specific operation ...
2026-02-13  Merge tag 'dma-mapping-7.0-2026-02-13' of ↵  Linus Torvalds  -8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping update from Marek Szyprowski: "A small code cleanup for the DMA-mapping subsystem: removal of unused hooks (Robin Murphy)" * tag 'dma-mapping-7.0-2026-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: Remove dma_mark_clean (again)
2026-02-13  ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()  Qanux  -0/+2
On the receive path, __ioam6_fill_trace_data() uses trace->nodelen to decide how much data to write for each node. It trusts this field as-is from the incoming packet, with no consistency check against trace->type (the 24-bit field that tells which data items are present). A crafted packet can set nodelen=0 while setting type bits 0-21, causing the function to write ~100 bytes past the allocated region (into skb_shared_info), which corrupts adjacent heap memory and leads to a kernel panic.

Add a shared helper ioam6_trace_compute_nodelen() in ioam6.c to derive the expected nodelen from the type field, and use it:
- in ioam6_iptunnel.c (send path, existing validation) to replace the open-coded computation;
- in exthdrs.c (receive path, ipv6_hop_ioam) to drop packets whose nodelen is inconsistent with the type field, before any data is written.

Per RFC 9197, bits 12-21 are each short (4-octet) fields, so they are included in IOAM6_MASK_SHORT_FIELDS (changed from 0xff100000 to 0xff1ffc00).

Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace")
Cc: stable@vger.kernel.org
Signed-off-by: Junxi Qian <qjx1298677004@gmail.com>
Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
Link: https://patch.msgid.link/20260211040412.86195-1-qjx1298677004@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
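The consistency check described in this commit, deriving the expected nodelen from the trace type and rejecting packets that disagree, can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the kernel's ioam6_trace_compute_nodelen(): the function names are hypothetical, the 24-bit type is assumed to sit in the upper bits of a host-order 32-bit word, and the wide-field mask (bits 8-10, 8 octets each) is an assumption based on RFC 9197. Only the short-field mask value comes from the commit message.

```c
#include <stdint.h>

/* Short (4-octet) fields: value quoted in the commit message. */
#define MASK_SHORT_FIELDS 0xff1ffc00u
/* Wide (8-octet) fields, bits 8-10: an assumption, not from the commit. */
#define MASK_WIDE_FIELDS  0x00e00000u

/* Portable population count: number of set bits in v. */
static unsigned int popcount32(uint32_t v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: expected per-node length in 4-octet units.
 * Each short field contributes one unit, each wide field two.
 */
unsigned int expected_nodelen(uint32_t type_word)
{
    return popcount32(type_word & MASK_SHORT_FIELDS) +
           2u * popcount32(type_word & MASK_WIDE_FIELDS);
}

/* Hypothetical receive-path check: nonzero if nodelen matches the type. */
int nodelen_is_consistent(uint32_t type_word, unsigned int nodelen)
{
    return nodelen == expected_nodelen(type_word);
}
```

Under these assumptions, a type word with bits 0-21 set (0xfffffc00) yields 19 short fields plus 3 wide fields, so 19 + 6 = 25 units, or 100 octets per node. That agrees with the "~100 bytes" overflow the commit describes for a packet that claims nodelen=0, and such a packet would fail `nodelen_is_consistent()`.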
2026-02-13  Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost  Linus Torvalds  -13/+112
Pull virtio updates from Michael Tsirkin: - in-order support in virtio core - multiple address space support in vduse - fixes, cleanups all over the place, notably dma alignment fixes for non-cache-coherent systems * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (59 commits) vduse: avoid adding implicit padding vhost: fix caching attributes of MMIO regions by setting them explicitly vdpa/mlx5: update MAC address handling in mlx5_vdpa_set_attr() vdpa/mlx5: reuse common function for MAC address updates vdpa/mlx5: update mlx_features with driver state check crypto: virtio: Replace package id with numa node id crypto: virtio: Remove duplicated virtqueue_kick in virtio_crypto_skcipher_crypt_req crypto: virtio: Add spinlock protection with virtqueue notification Documentation: Add documentation for VDUSE Address Space IDs vduse: bump version number vduse: add vq group asid support vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls vduse: take out allocations from vduse_dev_alloc_coherent vduse: remove unused vaddr parameter of vduse_domain_free_coherent vduse: refactor vdpa_dev_add for goto err handling vhost: forbid change vq groups ASID if DRIVER_OK is set vdpa: document set_group_asid thread safety vduse: return internal vq group struct as map token vduse: add vq group support vduse: add v1 API definition ...
2026-02-13  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  Linus Torvalds  -10/+50
Pull KVM updates from Paolo Bonzini: "Loongarch: - Add more CPUCFG mask bits - Improve feature detection - Add lazy load support for FPU and binary translation (LBT) register state - Fix return value for memory reads from and writes to in-kernel devices - Add support for detecting preemption from within a guest - Add KVM steal time test case to tools/selftests ARM: - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers - More pKVM fixes for features that are not supposed to be exposed to guests - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through" encoding - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest - Preliminary work for guest GICv5 support - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures - A small set of FPSIMD cleanups - Selftest fixes addressing the incorrect alignment of page allocation - Other assorted low-impact fixes and spelling fixes RISC-V: - Fixes for issues discovered by KVM API fuzzing in kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and kvm_riscv_vcpu_aia_imsic_update() - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM - Transparent huge page support for hypervisor page tables - Adjust the number of available guest irq files based on MMIO register sizes found in the device tree or the ACPI tables - Add RISC-V specific paging modes to KVM selftests - Detect paging mode at runtime for selftests s390: - Performance improvement for vSIE (aka nested virtualization) - Completely 
new memory management. s390 was a special snowflake that enlisted help from the architecture's page table management to build hypervisor page tables, in particular enabling sharing the last level of page tables. This however was a lot of code (~3K lines) in order to support KVM, and also blocked several features. The biggest advantage is that the page size of userspace is completely independent of the page size used by the guest: userspace can mix normal pages, THPs and hugetlbfs as it sees fit, and in fact transparent hugepages were not possible before. It's also now possible to have nested guests and guests with huge pages running on the same host - Maintainership change for s390 vfio-pci - Small quality of life improvement for protected guests x86: - Add support for giving the guest full ownership of PMU hardware (context switched around the fastpath run loop) and allowing direct access to data MSRs and PMCs (restricted by the vPMU model). KVM still intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. This is more accurate, since it has no risk of contention and thus dropped events, and also has significantly less overhead. For more information, see the commit message for merge commit bf2c3138ae36 ("Merge tag 'kvm-x86-pmu-6.20' ...") - Disallow changing the virtual CPU model if L2 is active, for all the same reasons KVM disallows changing the model after the first KVM_RUN - Fix a bug where KVM would incorrectly reject host accesses to PV MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those were advertised as supported to userspace - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM would attempt to read CR3 when configuring an async #PF entry - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. 
Only a few exports are intended for external usage, and those are allowed explicitly - When checking nested events after a vCPU is unblocked, ignore -EBUSY instead of WARNing. Userspace can sometimes put the vCPU into what should be an impossible state, and spurious exit to userspace on -EBUSY does not really do anything to solve the issue - Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller stuffing architecturally impossible states into KVM - Add support for new Intel instructions that don't require anything beyond enumerating feature flags to userspace - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2 - Add WARNs to guard against modifying KVM's CPU caps outside of the intended setup flow, as nested VMX in particular is sensitive to unexpected changes in KVM's golden configuration - Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts when the suppression feature is enabled by the guest (currently limited to split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an option as some userspaces have come to rely on KVM's buggy behavior (KVM advertises Suppress EOI Broadcast irrespective of whether or not userspace I/O APIC supports Directed EOIs) - Clean up KVM's handling of marking mapped vCPU pages dirty - Drop a pile of *ancient* sanity checks hidden behind KVM's unused ASSERT() macro, most of which could be trivially triggered by the guest and/or user, and all of which were useless - Fold "struct dest_map" into its sole user, "struct rtc_status", to make it more obvious what the weird parameter is used for, and to allow dropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n - Bury all of ioapic.h, i8254.h and related ioctls (including KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y - Add a regression test for recent APICv update fixes - Handle "hardware APIC ISR", a.k.a. 
SVI, updates in kvm_apic_update_apicv() to consolidate the updates, and to co-locate SVI updates with the updates for KVM's own cache of ISR information - Drop a dead function declaration - Minor cleanups x86 (Intel): - Rework KVM's handling of VMCS updates while L2 is active to temporarily switch to vmcs01 instead of deferring the update until the next nested VM-Exit. The deferred updates approach directly contributed to several bugs, was proving to be a maintenance burden due to the difficulty in auditing the correctness of deferred updates, and was polluting "struct nested_vmx" with a growing pile of booleans - Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults, and instead always reflect them into the guest. Since KVM doesn't shadow EPCM entries, EPCM violations cannot be due to KVM interference and can't be resolved by KVM - Fix a bug where KVM would register its posted interrupt wakeup handler even if loading kvm-intel.ko ultimately failed - Disallow access to vmcb12 fields that aren't fully supported, mostly to avoid weirdness and complexity for FRED and other features, where KVM wants to enable VMCS shadowing for fields that conditionally exist - Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or refuses to online a CPU) due to a VMCS config mismatch x86 (AMD): - Drop a user-triggerable WARN on nested_svm_load_cr3() failure - Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS relies on an upcoming, publicly announced change in the APM to reduce the set of conditions where hardware (i.e. 
KVM) *must* flush the RAP - Ignore nSVM intercepts for instructions that are not supported according to L1's virtual CPU model - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath for EPT Misconfig - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM, and allow userspace to restore nested state with GIF=0 - Treat exit_code as an unsigned 64-bit value through all of KVM - Add support for fetching SNP certificates from userspace - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD or VMSAVE on behalf of L2 - Misc fixes and cleanups x86 selftests: - Add a regression test for TPR<=>CR8 synchronization and IRQ masking - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support, and extend x86's infrastructure to support EPT and NPT (for L2 guests) - Extend several nested VMX tests to also cover nested SVM - Add a selftest for nested VMLOAD/VMSAVE - Rework the nested dirty log test, originally added as a regression test for PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage and to hopefully make the test easier to understand and maintain guest_memfd: - Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage handling. SEV/SNP was the only user of the tracking and it can do it via the RMP - Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to avoid non-trivial complexity for something that no known VMM seems to be doing and to avoid an API special case for in-place conversion, which simply can't support unaligned sources - When populating guest_memfd memory, GUP the source page in common code and pass the refcounted page to the vendor callback, instead of letting vendor code do the heavy lifting. 
Doing so avoids a looming deadlock bug with in-place conversion due to an AB-BA conflict between mmap_lock and guest_memfd's filemap invalidate lock Generic: - Fix a bug where KVM would ignore the vCPU's selected address space when creating a vCPU-specific mapping of guest memory. Actually this bug could not be hit even on x86, the only architecture with multiple address spaces, but it's a bug nevertheless" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits) KVM: s390: Increase permitted SE header size to 1 MiB MAINTAINERS: Replace backup for s390 vfio-pci KVM: s390: vsie: Fix race in acquire_gmap_shadow() KVM: s390: vsie: Fix race in walk_guest_tables() KVM: s390: Use guest address to mark guest page dirty irqchip/riscv-imsic: Adjust the number of available guest irq files RISC-V: KVM: Transparent huge page support RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test RISC-V: KVM: Allow Zalasr extensions for Guest/VM KVM: riscv: selftests: Add riscv vm satp modes KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr() RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr() RISC-V: KVM: Remove unnecessary 'ret' assignment KVM: s390: Add explicit padding to struct kvm_s390_keyop KVM: LoongArch: selftests: Add steal time test case LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side LoongArch: KVM: Add paravirt preempt feature in hypervisor side ...
2026-02-12  Merge tag 'riscv-for-linus-7.0-mw1' of ↵  Linus Torvalds  -2/+36
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley: - Add support for control flow integrity for userspace processes. This is based on the standard RISC-V ISA extensions Zicfiss and Zicfilp - Improve ptrace behavior regarding vector registers, and add some selftests - Optimize our strlen() assembly - Enable the ISO-8859-1 code page as built-in, similar to ARM64, for EFI volume mounting - Clean up some code slightly, including defining copy_user_page() as copy_page() rather than memcpy(), aligning us with other architectures; and using max3() to slightly simplify an expression in riscv_iommu_init_check() * tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits) riscv: lib: optimize strlen loop efficiency selftests: riscv: vstate_exec_nolibc: Use the regular prctl() function selftests: riscv: verify ptrace accepts valid vector csr values selftests: riscv: verify ptrace rejects invalid vector csr inputs selftests: riscv: verify syscalls discard vector context selftests: riscv: verify initial vector state with ptrace selftests: riscv: test ptrace vector interface riscv: ptrace: validate input vector csr registers riscv: csr: define vtype register elements riscv: vector: init vector context with proper vlenb riscv: ptrace: return ENODATA for inactive vector extension kselftest/riscv: add kselftest for user mode CFI riscv: add documentation for shadow stack riscv: add documentation for landing pad / indirect branch tracking riscv: create a Kconfig fragment for shadow stack and landing pad support arch/riscv: add dual vdso creation logic and select vdso based on hw arch/riscv: compile vdso with landing pad and shadow stack note riscv: enable kernel access to shadow stack memory via the FWFT SBI call riscv: add kernel command line option to opt out of user CFI riscv/hwprobe: add zicfilp / zicfiss enumeration in hwprobe ...
2026-02-12  Merge tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs  Linus Torvalds  -13/+8
Pull NFS client updates from Anna Schumaker: "New Features: - Use an LRU list for returning unused delegations - Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1 the default Bugfixes: - NFS/localio: - Handle short writes by retrying - Prevent direct reclaim recursion into NFS via nfs_writepages - Use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit - Remove -EAGAIN handling in nfs_local_doio() - pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN - fs/nfs: Fix a readdir slow-start regression - SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path Other cleanups and improvements: - A few other NFS/localio cleanups - Various other delegation handling cleanups from Christoph - Unify security_inode_listsecurity() calls - Improvements to NFSv4 lease handling - Clean up SUNRPC *_debug fields when CONFIG_SUNRPC_DEBUG is not set" * tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (60 commits) SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path nfs: nfs4proc: Convert comma to semicolon SUNRPC: Change list definition method sunrpc: rpc_debug and others are defined even if CONFIG_SUNRPC_DEBUG unset NFSv4: limit lease period in nfs4_set_lease_period() NFSv4: pass lease period in seconds to nfs4_set_lease_period() nfs: unify security_inode_listsecurity() calls fs/nfs: Fix readdir slow-start regression pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN NFS: fix delayed delegation return handling NFS: simplify error handling in nfs_end_delegation_return NFS: fold nfs_abort_delegation_return into nfs_end_delegation_return NFS: remove the delegation == NULL check in nfs_end_delegation_return NFS: use bool for the issync argument to nfs_end_delegation_return NFS: return void from ->return_delegation NFS: return void from nfs4_inode_make_writeable NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4 NFS: Add a way to disable NFS v4.0 via KConfig NFS: Move sequence slot operations into minorversion operations 
NFS: Pass a struct nfs_client to nfs4_init_sequence() ...
2026-02-12  Merge tag 'ata-6.20-rc1' of ↵  Linus Torvalds  -40/+36
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ATA updates from Damien Le Moal: - Cleanup IRQ masking in the handling of completed report zones commands (Niklas) - Improve the handling of Thunderbolt attached devices to speed up device removal (Henry) - Several patches to generalize the existing max_sec quirks to facilitate quirking the maximum command size of buggy drives, many of which have recently shown up with the recent increase of the default max_sectors block limit (Niklas) - Cleanup the ahci-platform and sata dt-bindings schema (Rob, Manivannan) - Improve device node scan in the ahci-dwc driver (Krzysztof) - Remove clang W=1 warnings with the ahci-imx and ahci-xgene drivers (Krzysztof) - Fix a long standing potential command starvation situation with non-NCQ commands issued when NCQ commands are on-going (me) - Limit max_sectors to 8191 on the INTEL SSDSC2KG480G8 SSD (Niklas) - Remove Vesa Local Bus (VLB) support in the pata_legacy driver (Ethan) - Simple fixes in the pata_cypress (typo) and pata_ftide010 (timing) drivers (Ethan, Linus W) * tag 'ata-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: pata_ftide010: Fix some DMA timings ata: pata_cypress: fix typo in error message ata: pata_legacy: remove VLB support ata: libata-core: Quirk INTEL SSDSC2KG480G8 max_sectors dt-bindings: ata: sata: Document the graph port ata: libata-scsi: avoid Non-NCQ command starvation ata: libata-scsi: refactor ata_scsi_translate() ata: ahci-xgene: Fix Wvoid-pointer-to-enum-cast warning ata: ahci-imx: Fix Wvoid-pointer-to-enum-cast warning ata: ahci-dwc: Simplify with scoped for each OF child loop dt-bindings: ata: ahci-platform: Drop unnecessary select schema ata: libata: Allow more quirks ata: libata: Add libata.force parameter max_sec ata: libata: Add support to parse equal sign in libata.force ata: libata: Change libata.force to use the generic ATA_QUIRK_MAX_SEC quirk ata: libata: Add ata_force_get_fe_for_dev() helper ata: 
libata: Add ATA_QUIRK_MAX_SEC and convert all device quirks ata: libata: avoid long timeouts on hot-unplugged SATA DAS ata: libata-scsi: Remove superfluous local_irq_save()
2026-02-12  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  Linus Torvalds  -5/+170
Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, irdma - Robustness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rtrs and many other small improvements - Rare list corruption fix in iwcm - Better support for different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remotely triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory 
support ...
2026-02-12Merge tag 'cxl-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds-9/+80
Pull CXL updates from Dave Jiang: - Introduce cxl_memdev_attach and pave way for soft reserved handling, type2 accelerator enabling, and LSA 2.0 enabling. All these series require the endpoint driver to settle before continuing the memdev driver probe. - Address CXL port error protocol handling and reporting. The large patch series was split into three parts. The first two parts are included here with the final part coming later. The first part consists of a series of code refactoring to PCI AER sub-system that addresses CXL and also CXL RAS code to prepare for port error handling. The second part refactors the CXL code to move management of component registers to cxl_port objects to allow all CXL AER errors to be handled through the cxl_port hierarchy. - Provide AMD Zen5 platform address translation for CXL using ACPI PRMT. This includes a conventions document to explain why this is needed and how it's implemented. - Misc CXL patches of fixes, cleanups, and updates. Including CXL address translation for unaligned MOD3 regions. 
[ TLA service: CXL is "Compute Express Link" ] * tag 'cxl-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (59 commits) cxl: Disable HPA/SPA translation handlers for Normalized Addressing cxl/region: Factor out code into cxl_region_setup_poison() cxl/atl: Lock decoders that need address translation cxl: Enable AMD Zen5 address translation using ACPI PRMT cxl/acpi: Prepare use of EFI runtime services cxl: Introduce callback for HPA address ranges translation cxl/region: Use region data to get the root decoder cxl/region: Add @hpa_range argument to function cxl_calc_interleave_pos() cxl/region: Separate region parameter setup and region construction cxl: Simplify cxl_root_ops allocation and handling cxl/region: Store HPA range in struct cxl_region cxl/region: Store root decoder in struct cxl_region cxl/region: Rename misleading variable name @hpa to @hpa_range Documentation/driver-api/cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement cxl, doc: Moving conventions in separate files cxl, doc: Remove isonum.txt inclusion cxl/port: Unify endpoint and switch port lookup cxl/port: Move endpoint component register management to cxl_port cxl/port: Map Port RAS registers cxl/port: Move dport RAS setup to dport add time ...
2026-02-12Merge tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds-20/+18
Pull VFIO updates from Alex Williamson: "A small cycle with the bulk in selftests and reintroducing poison handling in the nvgrace-gpu driver. The rest are fixes, cleanups, and some dmabuf structure consolidation. - Update outdated mdev comment referencing the renamed mdev_type_add() function (Julia Lawall) - Introduce selftest support for IOMMU mapping of PCI MMIO BARs (Alex Mastro) - Relax selftest assertion relative to differences in huge page handling between legacy (v1) TYPE1 IOMMU mapping behavior and the compatibility mode supported by IOMMUFD (David Matlack) - Reintroduce memory poison handling support for non-struct-page- backed memory in the nvgrace-gpu variant driver (Ankit Agrawal) - Replace dma_buf_phys_vec with phys_vec to avoid duplicate structure and semantics (Leon Romanovsky) - Add missing upstream bridge locking across PCI function reset, resolving an assertion failure when secondary bus reset is used to provide that reset (Anthony Pighin) - Fixes to hisi_acc vfio-pci variant driver to resolve corner case issues related to resets, repeated migration, and error injection scenarios (Longfang Liu, Weili Qian) - Restrict vfio selftest builds to arm64 and x86_64, resolving compiler warnings on 32-bit archs (Ted Logan) - Un-deprecate the fsl-mc vfio bus driver as a new maintainer has stepped up (Ioana Ciornei)" * tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio: vfio/fsl-mc: add myself as maintainer vfio: selftests: only build tests on arm64 and x86_64 hisi_acc_vfio_pci: fix the queue parameter anomaly issue hisi_acc_vfio_pci: resolve duplicate migration states hisi_acc_vfio_pci: update status after RAS error hisi_acc_vfio_pci: fix VF reset timeout issue vfio/pci: Lock upstream bridge for vfio_pci_core_disable() types: reuse common phys_vec type instead of DMABUF open‑coded variant vfio/nvgrace-gpu: register device memory for poison handling mm: add stubs for PFNMAP memory failure registration functions vfio: selftests: Drop IOMMU mapping 
size assertions for VFIO_TYPE1_IOMMU vfio: selftests: Add vfio_dma_mapping_mmio_test vfio: selftests: Align BAR mmaps for efficient IOMMU mapping vfio: selftests: Centralize IOMMU mode name definitions vfio/mdev: update outdated comment
2026-02-12Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds-29/+61
Pull SCSI updates from James Bottomley: "Usual driver updates (qla2xxx, mpi3mr, mpt3sas, ufs) plus assorted cleanups and fixes. The biggest core change is the massive code motion in the sd driver to remove forward declarations and the most significant change is to enumify the queuecommand return" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (78 commits) scsi: csiostor: Fix dereference of null pointer rn scsi: buslogic: Reduce stack usage scsi: ufs: host: mediatek: Require CONFIG_PM scsi: ufs: mediatek: Fix page faults in ufs_mtk_clk_scale() trace event scsi: smartpqi: Fix memory leak in pqi_report_phys_luns() scsi: mpi3mr: Make driver probing asynchronous scsi: ufs: core: Flush exception handling work when RPM level is zero scsi: efct: Use IRQF_ONESHOT and default primary handler scsi: ufs: core: Use a host-wide tagset in SDB mode scsi: qla2xxx: target: Add WQ_PERCPU to alloc_workqueue() users scsi: qla2xxx: Add WQ_PERCPU to alloc_workqueue() users scsi: qla4xxx: Add WQ_PERCPU to alloc_workqueue() users scsi: mpi3mr: Driver version update to 8.17.0.3.50 scsi: mpi3mr: Fixed the W=1 compilation warning scsi: mpi3mr: Record and report controller firmware faults scsi: mpi3mr: Update MPI Headers to revision 39 scsi: mpi3mr: Use negotiated link rate from DevicePage0 scsi: mpi3mr: Avoid redundant diag-fault resets scsi: mpi3mr: Rename log data save helper to reflect threaded/BH context scsi: mpi3mr: Add module parameter to control threaded IRQ polling ...
2026-02-12mm: rmap: support batched checks of the references for large foliosBaolin Wang-4/+40
Patch series "support batch checking of references and unmapping for large folios", v6. Currently, folio_referenced_one() always checks the young flag for each PTE sequentially, which is inefficient for large folios. This inefficiency is especially noticeable when reclaiming clean file-backed large folios, where folio_referenced() is observed as a significant performance hotspot. Moreover, on the Arm architecture, which supports contiguous PTEs, there is already an optimization to clear the young flags for PTEs within a contiguous range. However, this is not sufficient. We can extend this to perform batched operations for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE). Similar to folio_referenced_one(), we can also apply batched unmapping for large file folios to optimize the performance of file folio reclamation. By supporting batched checking of the young flags, flushing TLB entries, and unmapping, I observed significant performance improvements in my performance tests for file folio reclamation. Please check the performance data in the commit message of each patch. This patch (of 5): Currently, folio_referenced_one() always checks the young flag for each PTE sequentially, which is inefficient for large folios. This inefficiency is especially noticeable when reclaiming clean file-backed large folios, where folio_referenced() is observed as a significant performance hotspot. Moreover, on the Arm64 architecture, which supports contiguous PTEs, there is already an optimization to clear the young flags for PTEs within a contiguous range. However, this is not sufficient. We can extend this to perform batched operations for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE). Introduce a new API: clear_flush_young_ptes() to facilitate batched checking of the young flags and flushing TLB entries, thereby improving performance during large folio reclamation. 
And it will be overridden by the architecture that implements a more efficient batch operation in the following patches. While we are at it, rename ptep_clear_flush_young_notify() to clear_flush_young_ptes_notify() to indicate that this is a batch operation. Link: https://lkml.kernel.org/r/cover.1770645603.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/12132694536834262062d1fb304f8f8a064b6750.1770645603.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Barry Song <baohua@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
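The batching idea described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the names model_clear_young_ptes() and MODEL_PTE_YOUNG, the bit position, and the flush counter are all hypothetical and are not the kernel's clear_flush_young_ptes() implementation or signature. It shows one pass over a range of PTE-like words that clears every young bit and accounts for a single flush for the whole range, instead of one check and one flush per entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "young" (accessed) bit; real bit positions are arch-specific. */
#define MODEL_PTE_YOUNG (UINT64_C(1) << 5)

/*
 * Toy model of a batched young-flag check: walk nr PTE-like words once,
 * clear the young bit wherever it is set, and record a single flush for
 * the whole range if any entry was young. Returns true if any was young.
 */
static inline bool model_clear_young_ptes(uint64_t *ptes, size_t nr,
					  unsigned int *flushes)
{
	bool young = false;

	assert(ptes != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (ptes[i] & MODEL_PTE_YOUNG) {
			ptes[i] &= ~MODEL_PTE_YOUNG;
			young = true;
		}
	}

	/* One flush covers the whole batch rather than one per entry. */
	if (young && flushes)
		(*flushes)++;

	return young;
}
```

Calling the function on a range where several entries are young leaves the counter at one, which is the saving the series aims for on large folios; the real code has to deal with locking, MMU notifiers and architecture hooks that this model leaves out.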
2026-02-12mm: make vm_area_desc utilise vma_flags_t onlyLorenzo Stoakes-6/+8
Now we have eliminated all uses of vm_area_desc->vm_flags, eliminate this field, and have mmap_prepare users utilise the vma_flags_t vm_area_desc->vma_flags field only. As part of this change we alter is_shared_maywrite() to accept a vma_flags_t parameter, and introduce is_shared_maywrite_vm_flags() for use with legacy vm_flags_t flags. We also update struct mmap_state to add a union between vma_flags and vm_flags temporarily until the mmap logic is also converted to using vma_flags_t. Also update the VMA userland tests to reflect this change. Link: https://lkml.kernel.org/r/fd2a2938b246b4505321954062b1caba7acfc77a.1769097829.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Chris Mason <clm@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12mm: update all remaining mmap_prepare users to use vma_flags_tLorenzo Stoakes-9/+23
We will shortly be removing the vm_flags_t field from vm_area_desc so we need to update all mmap_prepare users to only use the desc->vma_flags field. This patch achieves that and makes all ancillary changes required to make this possible. This lays the groundwork for future work to eliminate the use of vm_flags_t in vm_area_desc altogether and more broadly throughout the kernel. While we're here, we take the opportunity to replace VM_REMAP_FLAGS with VMA_REMAP_FLAGS, the vma_flags_t equivalent. No functional changes intended. Link: https://lkml.kernel.org/r/fb1f55323799f09fe6a36865b31550c9ec67c225.1769097829.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Damien Le Moal <dlemoal@kernel.org> [zonefs] Acked-by: "Darrick J. Wong" <djwong@kernel.org> Acked-by: Pedro Falcato <pfalcato@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Chris Mason <clm@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12mm: update shmem_[kernel]_file_*() functions to use vma_flags_tLorenzo Stoakes-5/+3
In order to be able to use only vma_flags_t in vm_area_desc we must adjust shmem file setup functions to operate in terms of vma_flags_t rather than vm_flags_t. This patch makes this change and updates all callers to use the new functions. No functional changes intended. [akpm@linux-foundation.org: comment fixes, per Baolin] Link: https://lkml.kernel.org/r/736febd280eb484d79cef5cf55b8a6f79ad832d2.1769097829.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Chris Mason <clm@fb.com> Cc: Pedro Falcato <pfalcato@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12mm: update hugetlbfs to use VMA flags on mmap_prepareLorenzo Stoakes-3/+13
In order to update all mmap_prepare users to utilising the new VMA flags type vma_flags_t and associated helper functions, we start by updating hugetlbfs which has a lot of additional logic that requires updating to make this change. This is laying the groundwork for eliminating the vm_flags_t from struct vm_area_desc and using vma_flags_t only, which further lays the ground for removing the deprecated vm_flags_t type altogether. No functional changes intended. Link: https://lkml.kernel.org/r/9226bec80c9aa3447cc2b83354f733841dba8a50.1769097829.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Chris Mason <clm@fb.com> Cc: Pedro Falcato <pfalcato@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12mm: add basic VMA flag operation helper functionsLorenzo Stoakes-1/+168
Now we have the mk_vma_flags() macro helper which permits easy specification of any number of VMA flags, add helper functions which operate with vma_flags_t parameters. This patch provides vma_flags_test[_mask](), vma_flags_set[_mask]() and vma_flags_clear[_mask]() respectively testing, setting and clearing flags with the _mask variants accepting vma_flag_t parameters, and the non-mask variants implemented as macros which accept a list of flags. This allows us to trivially test/set/clear aggregate VMA flag values as necessary, for instance: if (vma_flags_test(&flags, VMA_READ_BIT, VMA_WRITE_BIT)) goto readwrite; vma_flags_set(&flags, VMA_READ_BIT, VMA_WRITE_BIT); vma_flags_clear(&flags, VMA_READ_BIT, VMA_WRITE_BIT); We also add a function for testing that ALL flags are set for convenience, e.g.: if (vma_flags_test_all(&flags, VMA_READ_BIT, VMA_MAYREAD_BIT)) { /* Both READ and MAYREAD flags set */ ... } The compiler generates optimal assembly for each such that they behave as if the caller were setting the bitmap flags manually. This is important for e.g. drivers which manipulate flag values rather than a VMA's specific flag values. We also add helpers for testing, setting and clearing flags for VMA's and VMA descriptors to reduce boilerplate. Also add the EMPTY_VMA_FLAGS define to aid initialisation of empty flags. Finally, update the userland VMA tests to add the helpers there so they can be utilised as part of userland testing. Link: https://lkml.kernel.org/r/885d4897d67a6a57c0b07fa182a7055ad752df11.1769097829.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. 
Howlett <Liam.Howlett@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Chris Mason <clm@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
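The semantics of the mask helpers named above (test any, test all, set, clear) can be sketched in userspace as follows. This is a minimal model, assuming a single-word bitmap; the model_* names and the model_flags_t type are hypothetical stand-ins and do not reproduce the kernel's vma_flags_t definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-word stand-in for a bitmap flags type. */
typedef struct {
	unsigned long bits;
} model_flags_t;

/* Build a one-bit mask from a bit number. */
static inline model_flags_t model_flag(int bit)
{
	assert(bit >= 0 && bit < (int)(8 * sizeof(unsigned long)));
	return (model_flags_t){ 1UL << bit };
}

/* True if ANY bit of to_test is set in flags. */
static inline bool model_flags_test_mask(model_flags_t flags,
					 model_flags_t to_test)
{
	return (flags.bits & to_test.bits) != 0;
}

/* True only if ALL bits of to_test are set in flags. */
static inline bool model_flags_test_all_mask(model_flags_t flags,
					     model_flags_t to_test)
{
	return (flags.bits & to_test.bits) == to_test.bits;
}

/* Set every bit of to_set in *flags. */
static inline void model_flags_set_mask(model_flags_t *flags,
					model_flags_t to_set)
{
	flags->bits |= to_set.bits;
}

/* Clear every bit of to_clear in *flags. */
static inline void model_flags_clear_mask(model_flags_t *flags,
					  model_flags_t to_clear)
{
	flags->bits &= ~to_clear.bits;
}
```

The distinction between the "any" and "all" tests is the one the commit message draws between vma_flags_test() and vma_flags_test_all(): with bits 0 and 1 set, testing against a mask of bits 0 to 2 succeeds for "any" and fails for "all".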
2026-02-12mm: add mk_vma_flags() bitmap flag macro helperLorenzo Stoakes-0/+33
This patch introduces the mk_vma_flags() macro helper to allow easy manipulation of VMA flags utilising the new bitmap representation of VMA flags defined by the vma_flags_t type. It is a variadic macro which provides a bitwise-or'd representation of each individual VMA flag specified. Note that, while we maintain VM_xxx flags for backwards compatibility until the conversion is complete, we define VMA flags of type vma_flag_t using VMA_xxx_BIT to avoid confusing the two. This helper macro therefore can be used thusly: vma_flags_t flags = mk_vma_flags(VMA_READ_BIT, VMA_WRITE_BIT); Testing has demonstrated that the compiler optimises this code such that it generates the same assembly utilising this macro as it does if the flags were specified manually, for instance: vma_flags_t get_flags(void) { return mk_vma_flags(VMA_READ_BIT, VMA_WRITE_BIT, VMA_EXEC_BIT); } Generates the same code as: vma_flags_t get_flags(void) { vma_flags_t flags; vma_flags_clear_all(&flags); vma_flag_set(&flags, VMA_READ_BIT); vma_flag_set(&flags, VMA_WRITE_BIT); vma_flag_set(&flags, VMA_EXEC_BIT); return flags; } And: vma_flags_t get_flags(void) { vma_flags_t flags; unsigned long *bitmap = ACCESS_PRIVATE(&flags, __vma_flags); *bitmap = 1UL << (__force int)VMA_READ_BIT; *bitmap |= 1UL << (__force int)VMA_WRITE_BIT; *bitmap |= 1UL << (__force int)VMA_EXEC_BIT; return flags; } That is: get_flags: movl $7, %eax ret Link: https://lkml.kernel.org/r/fde00df6ff7fb8c4b42cc0defa5a4924c7a1943a.1769097829.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. 
Howlett <Liam.Howlett@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Chris Mason <clm@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
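One way a variadic flag-building macro of this kind can be written in plain C is sketched below. It is an assumption-laden userspace illustration, not the kernel's mk_vma_flags(): the SKETCH_* bit names, the sketch_vma_flags_t type and the helper sketch_mk_flags() are hypothetical, and the bitmap is a single unsigned long. The macro turns its argument list into a compound-literal array and ORs the corresponding bits together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit numbers, chosen so READ|WRITE|EXEC == 7 as in the text. */
enum sketch_vma_flag_bit {
	SKETCH_READ_BIT = 0,
	SKETCH_WRITE_BIT = 1,
	SKETCH_EXEC_BIT = 2,
};

/* Hypothetical single-word stand-in for a bitmap flags type. */
typedef struct {
	unsigned long bits;
} sketch_vma_flags_t;

/* OR together the bits named in bit_list[0..count-1]. */
static inline sketch_vma_flags_t sketch_mk_flags(size_t count,
						 const int *bit_list)
{
	sketch_vma_flags_t flags = { 0 };

	for (size_t i = 0; i < count; i++) {
		assert(bit_list[i] >= 0 &&
		       bit_list[i] < (int)(8 * sizeof(unsigned long)));
		flags.bits |= 1UL << bit_list[i];
	}
	return flags;
}

/* Variadic front end: the argument list becomes a compound-literal array. */
#define SKETCH_MK_FLAGS(...)                                           \
	sketch_mk_flags(sizeof((int[]){ __VA_ARGS__ }) / sizeof(int),  \
			(const int[]){ __VA_ARGS__ })
```

With constant arguments, an optimising compiler can typically fold the loop into a constant, which is consistent with the single "movl $7" quoted in the commit message; the sketch itself makes no claim about the code the kernel macro generates.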
2026-02-12mm: rename vma_flag_test/set_atomic() to vma_test/set_atomic_flag()Lorenzo Stoakes-8/+5
In order to stay consistent between functions which manipulate a vm_flags_t argument of the form of vma_flags_...() and those which manipulate a VMA (in this case the flags field of a VMA), rename vma_flag_[test/set]_atomic() to vma_[test/set]_atomic_flag(). This lays the groundwork for adding VMA flag manipulation functions in a subsequent commit. Link: https://lkml.kernel.org/r/033dcf12e819dee5064582bced9b12ea346d1607.1769097829.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Chris Mason <clm@fb.com> Cc: Pedro Falcato <pfalcato@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12mm/vma: remove __private sparse decoration from vma_flags_tLorenzo Stoakes-8/+10
Patch series "mm: add bitmap VMA flag helpers and convert all mmap_prepare to use them", v2. We introduced the bitmap VMA type vma_flags_t in the aptly named commit 9ea35a25d51b ("mm: introduce VMA flags bitmap type") in order to permit future growth in VMA flags and to prevent the asinine requirement that VMA flags be available to 64-bit kernels only if they happened to use a bit number above 32 bits. This is a long-term project as there are very many users of VMA flags within the kernel that need to be updated in order to utilise this new type. In order to further this aim, this series adds a number of helper functions to enable ordinary interactions with VMA flags - that is testing, setting and clearing them. In order to make working with VMA bit numbers less cumbersome this series introduces the mk_vma_flags() helper macro which generates a vma_flags_t from a variadic parameter list, e.g.: vma_flags_t flags = mk_vma_flags(VMA_READ_BIT, VMA_WRITE_BIT, VMA_EXEC_BIT); It turns out that the compiler optimises this very well to the point that this is just as efficient as using VM_xxx pre-computed bitmap values. This series then introduces the following functions: bool vma_flags_test_mask(vma_flags_t flags, vma_flags_t to_test); bool vma_flags_test_all_mask(vma_flags_t flags, vma_flags_t to_test); void vma_flags_set_mask(vma_flags_t *flags, vma_flags_t to_set); void vma_flags_clear_mask(vma_flags_t *flags, vma_flags_t to_clear); Providing means of testing any flag, testing all flags, setting, and clearing a specific vma_flags_t mask. For convenience, helper macros are provided - vma_flags_test(), vma_flags_set() and vma_flags_clear(), each of which utilise mk_vma_flags() to make these operations easier, as well as an EMPTY_VMA_FLAGS macro to make initialisation of an empty vma_flags_t value easier, e.g.: vma_flags_t flags = EMPTY_VMA_FLAGS; vma_flags_set(&flags, VMA_READ_BIT, VMA_WRITE_BIT, VMA_EXEC_BIT); ... if (vma_flags_test(flags, VMA_READ_BIT)) { ... } ... 
if (vma_flags_test_all_mask(flags, VMA_REMAP_FLAGS)) { ... } ... vma_flags_clear(&flags, VMA_READ_BIT); Since callers are often dealing with a vm_area_struct (VMA) or vm_area_desc (VMA descriptor as used in .mmap_prepare) object, this series further provides helpers for these - firstly vma_set_flags_mask() and vma_set_flags() for a VMA: vma_flags_t flags = EMPTY_VMA_FLAGS; vma_flags_set(&flags, VMA_READ_BIT, VMA_WRITE_BIT, VMA_EXEC_BIT); ... vma_set_flags_mask(&vma, flags); ... vma_set_flags(&vma, VMA_DONTDUMP_BIT); Note that these do NOT ensure appropriate locks are taken and assume the caller takes care of this. For VMA descriptors this series adds vma_desc_[test, set, clear]_flags_mask() and vma_desc_[test, set, clear]_flags(), e.g.: static int foo_mmap_prepare(struct vm_area_desc *desc) { ... vma_desc_set_flags(desc, VMA_SEQ_READ_BIT); vma_desc_clear_flags(desc, VMA_RAND_READ_BIT); ... if (vma_desc_test_flags(desc, VMA_SHARED_BIT)) { ... } ... } With these helpers introduced, this series then updates all mmap_prepare users to make use of the vma_flags_t vm_area_desc->vma_flags field rather than the legacy vm_flags_t vm_area_desc->vm_flags field. In order to do so, several other related functions need to be updated, with separate patches for larger changes in hugetlbfs, secretmem and shmem before finally removing vm_area_desc->vm_flags altogether. This lays the foundations for future elimination of vm_flags_t and associated defines and functionality altogether in the long run, and elimination of the use of vm_flags_t in f_op->mmap() hooks in the near term as mmap_prepare replaces these. There is a useful synergy between the VMA flags and mmap_prepare work here as with this change in place, converting f_op->mmap() to f_op->mmap_prepare naturally also converts use of vm_flags_t to vma_flags_t in all drivers which declare mmap handlers. 
This accounts for the majority of the users of the legacy vm_flags_*() helpers and thus a large number of drivers which need to interact with VMA flags in general. This series also updates the userland VMA tests to account for the change, and adds unit tests for these helper functions to assert that they behave as expected. In order to facilitate this change in a sensible way, the series also separates out the VMA unit tests into code that is duplicated from the kernel that should be kept in sync, code that is customised for test purposes and code that is stubbed out. We also separate out the VMA userland tests into separate files to make it easier to manage and to provide a sensible baseline for adding the userland tests for these helpers. This patch (of 13): We need to pass around these values and access them in a way that sparse does not allow, as __private implies noderef, i.e. disallowing dereference of the value, which manifests as sparse warnings even when passed around benignly. Link: https://lkml.kernel.org/r/cover.1769097829.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/64fa89f416f22a60ae74cfff8fd565e7677be192.1769097829.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Chris Mason <clm@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12mm/vma: use unmap_desc in exit_mmap() and vms_clear_ptes()Liam R. Howlett-4/+0
Convert vms_clear_ptes() to use unmap_desc to call unmap_vmas() instead of the large argument list. The UNMAP_STATE() cannot be used because the vma iterator in the vms does not point to the correct maple state (mas_detach), and the tree_end will be set incorrectly. Setting up the arguments manually avoids setting the struct up incorrectly and doing extra work to get the correct pagetable range. exit_mmap() also calls unmap_vmas() with many arguments. Using the unmap_all_init() function to set the unmap descriptor for all vmas makes this a bit easier to read. Update to the vma test code is necessary to ensure testing continues to function. No functional changes intended. Link: https://lkml.kernel.org/r/20260121164946.2093480-10-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12mm: relocate the page table ceiling and floor definitionsLiam R. Howlett-19/+19
Patch series " Remove XA_ZERO from error recovery of dup_mmap()", v3. It is possible that the dup_mmap() call fails on allocating or setting up a vma after the maple tree of the oldmm is copied. Today, that failure point is marked by inserting an XA_ZERO entry over the failure point so that the exact location does not need to be communicated through to exit_mmap(). However, a race exists in the tear down process because the dup_mmap() drops the mmap lock before exit_mmap() can remove the partially set up vma tree. This means that other tasks may get to the mm tree and find the invalid vma pointer (since it's an XA_ZERO entry), even though the mm is marked as MMF_OOM_SKIP and MMF_UNSTABLE. To remove the race fully, the tree must be cleaned up before dropping the lock. This is accomplished by extracting the vma cleanup in exit_mmap() and changing the required functions to pass through the vma search limit. Any other tree modifications would require extra cycles which should be spent on freeing memory. This does run the risk of increasing the possibility of finding no vmas (which is already possible!) in code that isn't careful. The final four patches are to address the excessive argument lists being passed between the functions. Using the struct unmap_desc also allows some special-case code to be removed in favour of the struct setup differences. This patch (of 11): pgtables.h defines a fallback for ceiling and floor of the page tables within the CONFIG_MMU section. Moving the definitions to outside the CONFIG_MMU allows for using them in generic code. [akpm@linux-foundation.org: remove stray newline, per SeongJae] Link: https://lkml.kernel.org/r/20260121164946.2093480-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20260121164946.2093480-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. 
Howlett <Liam.Howlett@oracle.com> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by: SeongJae Park <sj@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-12mm/vmscan: select the closest preferred node in demote_folio_list()Bing Jiao-3/+3
The preferred demotion node (migration_target_control.nid) should be the one closest to the source node to minimize migration latency. Currently, a discrepancy exists where demote_folio_list() randomly selects an allowed node if the preferred node from next_demotion_node() is not set in mems_effective. To address it, update next_demotion_node() to select a preferred target against allowed nodes; and to return the closest demotion target if all preferred nodes are not in mems_effective via next_demotion_node(). It ensures that the preferred demotion target is consistently the closest available node to the source node. [akpm@linux-foundation.org: fix comment typo, per Shakeel] Link: https://lkml.kernel.org/r/20260114205305.2869796-3-bingjiao@google.com Signed-off-by: Bing Jiao <bingjiao@google.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Gregory Price <gourry@gourry.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
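The selection rule described above (prefer the closest preferred node that is allowed, otherwise fall back to the closest allowed demotion target) can be sketched with plain bitmasks and a distance table. Everything here is a hypothetical userspace model: the model_* names, the fixed MODEL_MAX_NODES, the unsigned int node masks and the tie-break towards the lowest node id are assumptions, not the kernel's next_demotion_node() code or its nodemask_t API.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_MAX_NODES 8

/*
 * Return the node in 'candidates' (a bitmask of node ids) with the smallest
 * distance from 'src', or -1 if the mask is empty. Ties go to the lowest id.
 */
static inline int model_closest_node(int src, unsigned int candidates,
				     int nr_nodes,
				     const int distance[][MODEL_MAX_NODES])
{
	int best = -1;
	int best_distance = INT_MAX;

	for (int node = 0; node < nr_nodes; node++) {
		if (!(candidates & (1U << node)))
			continue;
		if (distance[src][node] < best_distance) {
			best_distance = distance[src][node];
			best = node;
		}
	}
	return best;
}

/*
 * Model of the described policy: first try the preferred targets that are
 * also allowed; if none qualifies, fall back to the closest allowed node
 * among all demotion targets. Returns -1 when no allowed target exists.
 */
static inline int model_next_demotion_node(int src, unsigned int preferred,
					   unsigned int targets,
					   unsigned int allowed, int nr_nodes,
					   const int distance[][MODEL_MAX_NODES])
{
	int node;

	assert(nr_nodes > 0 && nr_nodes <= MODEL_MAX_NODES);
	assert(src >= 0 && src < nr_nodes);

	node = model_closest_node(src, preferred & allowed, nr_nodes, distance);
	if (node >= 0)
		return node;

	return model_closest_node(src, targets & allowed, nr_nodes, distance);
}
```

For a four-node layout where node 2 is the preferred target of node 0 and node 3 is farther away, excluding node 2 from the allowed mask makes the model return node 3 rather than an arbitrary allowed node, which is the behaviour the commit message asks for.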
2026-02-12mm/vmscan: fix demotion targets checks in reclaim/demotionBing Jiao-6/+6
Patch series "mm/vmscan: fix demotion targets checks in reclaim/demotion", v9.

This patch series addresses two issues in demote_folio_list(), can_demote(), and next_demotion_node() in reclaim/demotion.

1. demote_folio_list() and can_demote() do not correctly check demotion targets against cpuset.mems_effective, which will cause (a) pages to be demoted to not-allowed nodes and (b) pages to fail demotion even if the system still has allowed demotion nodes. Patch 1 fixes this bug by updating cpuset_node_allowed() and mem_cgroup_node_allowed() to return effective_mems, allowing a direct logical AND against demotion targets.

2. next_demotion_node() returns a preferred demotion target, but it does not check the node against allowed nodes. Patch 2 ensures that next_demotion_node() filters against the allowed node mask and selects the closest demotion target to the source node.

This patch (of 2):

Fix two bugs in demote_folio_list() and can_demote() due to incorrect demotion target checks against cpuset.mems_effective in reclaim/demotion.

Commit 7d709f49babc ("vmscan,cgroup: apply mems_effective to reclaim") introduces the cpuset.mems_effective check and applies it to can_demote(). However:

1. It does not apply this check in demote_folio_list(), which leads to situations where pages are demoted to nodes that are explicitly excluded from the task's cpuset.mems.

2. It checks only the nodes in the immediate next demotion hierarchy and does not check all allowed demotion targets in can_demote(). This can cause pages to never be demoted if the nodes in the next demotion hierarchy are not set in mems_effective.

These bugs break resource isolation provided by cpuset.mems. This is visible from userspace because pages can either fail to be demoted entirely or are demoted to nodes that are not allowed in multi-tier memory systems.

To address these bugs, update cpuset_node_allowed() and mem_cgroup_node_allowed() to return effective_mems, allowing a direct logical AND against demotion targets. Also update can_demote() and demote_folio_list() accordingly.

Bug 1 reproduction:

Assume a system with 4 nodes, where nodes 0-1 are top-tier and nodes 2-3 are far-tier memory. All nodes have equal capacity.

Test script:

    echo 1 > /sys/kernel/mm/numa/demotion_enabled
    mkdir /sys/fs/cgroup/test
    echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
    echo "0-2" > /sys/fs/cgroup/test/cpuset.mems
    echo $$ > /sys/fs/cgroup/test/cgroup.procs
    swapoff -a
    # Expectation: Should respect node 0-2 limit.
    # Observation: Node 3 shows significant allocation (MemFree drops)
    stress-ng --oomable --vm 1 --vm-bytes 150% --mbind 0,1

Bug 2 reproduction:

Assume a system with 6 nodes, where nodes 0-2 are top-tier, node 3 is a far-tier node, and nodes 4-5 are the farthest-tier nodes. All nodes have equal capacity.

Test script:

    echo 1 > /sys/kernel/mm/numa/demotion_enabled
    mkdir /sys/fs/cgroup/test
    echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
    echo "0-2,4-5" > /sys/fs/cgroup/test/cpuset.mems
    echo $$ > /sys/fs/cgroup/test/cgroup.procs
    swapoff -a
    # Expectation: Pages are demoted to Nodes 4-5
    # Observation: No pages are demoted before oom.
    stress-ng --oomable --vm 1 --vm-bytes 150% --mbind 0,1,2

Link: https://lkml.kernel.org/r/20260114205305.2869796-1-bingjiao@google.com Link: https://lkml.kernel.org/r/20260114205305.2869796-2-bingjiao@google.com Fixes: 7d709f49babc ("vmscan,cgroup: apply mems_effective to reclaim") Signed-off-by: Bing Jiao <bingjiao@google.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Gregory Price <gourry@gourry.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
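The two checks this commit message describes reduce to intersecting a set of demotion targets with the allowed set. The following is a minimal sketch under stated assumptions, not the kernel implementation: node sets are modelled as 32-bit masks, and node_bit(), allowed_demotion_targets(), can_demote_sketch() and target_is_allowed() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit for node n in a hypothetical 32-node mask. */
static inline uint32_t node_bit(unsigned int n)
{
    assert(n < 32);
    return UINT32_C(1) << n;
}

/* Demotion targets that remain once the allowed (effective) set is applied. */
static inline uint32_t allowed_demotion_targets(uint32_t targets,
                                                uint32_t mems_effective)
{
    return targets & mems_effective;
}

/* Demotion is possible if at least one target survives the intersection. */
static inline bool can_demote_sketch(uint32_t targets, uint32_t mems_effective)
{
    return allowed_demotion_targets(targets, mems_effective) != 0;
}

/* Per-node check, as would be needed before demoting to a chosen node. */
static inline bool target_is_allowed(unsigned int nid, uint32_t mems_effective)
{
    return (node_bit(nid) & mems_effective) != 0;
}
```

In the Bug 2 layout (allowed nodes 0-2 and 4-5), checking only the next tier, node 3, yields an empty intersection, while checking all lower-tier targets (nodes 3-5) leaves nodes 4-5. In the Bug 1 layout (allowed nodes 0-2), the per-node check rejects node 3.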
2026-02-12Merge tag 'for-7.0/io_uring-zcrx-large-buffers-20260206' of ↵Linus Torvalds-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring large rx buffer support from Jens Axboe: "Now that the networking updates are upstream, here's the support for large buffers for zcrx. Using larger (bigger than 4K) rx buffers can increase the efficiency of zcrx. For example, it's been shown that using 32K buffers can decrease CPU usage by ~30% compared to 4K buffers" * tag 'for-7.0/io_uring-zcrx-large-buffers-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/zcrx: implement large rx buffer support
2026-02-12Merge tag 'trace-rv-v7.0' of ↵Linus Torvalds-524/+499
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verifier updates from Steven Rostedt: - Refactor da_monitor to minimize macros Complete refactor of da_monitor.h to reduce reliance on macros generating functions. Use generic static functions and use the preprocessor only when strictly necessary (e.g. for tracepoint handlers). The change essentially relies on functions with generic names (e.g. da_handle) instead of monitor-specific ones, and requires defining constants (e.g. MONITOR_NAME, MONITOR_TYPE) before including the header rather than calling macros that would define functions. Also adapt monitors and documentation accordingly. - Cleanup DA code generation scripts Clean up functions in dot2c, removing reimplementations of trivial library functions (__buff_to_string) and some other unused intermediate steps. - Annotate functions with types in the rvgen python scripts - Remove superfluous assignments and cleanup generated code The rvgen scripts generate a superfluous assignment to 0 for enum variables and don't add commas to the last elements, which is against the kernel coding standards. Change the generation process for better compliance and slightly simpler logic. - Remove superfluous declarations from generated code The monitor container source files contained a declaration and a definition for the rv_monitor variable. The former is superfluous and was removed. 
- Fix reference to outdated documentation s/da_monitor_synthesis.rst/monitor_synthesis.rst in comment in da_monitor.h * tag 'trace-rv-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rv: Fix documentation reference in da_monitor.h verification/rvgen: Remove unused variable declaration from containers verification/dot2c: Remove superfluous enum assignment and add last comma verification/dot2c: Remove __buff_to_string() and cleanup verification/rvgen: Annotate DA functions with types verification/rvgen: Adapt dot2k and templates after refactoring da_monitor.h Documentation/rv: Adapt documentation after da_monitor refactoring rv: Cleanup da_monitor after refactor rv: Refactor da_monitor to minimise macros
2026-02-12drm/amdgpu: set family for GC 11.5.4Alex Deucher-0/+1
Set the family for GC 11.5.4 Fixes: 47ae1f938d12 ("drm/amdgpu: add support for GC IP version 11.5.4") Cc: Tim Huang <tim.huang@amd.com> Cc: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Cc: Roman Li <Roman.Li@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-12Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of ↵Linus Torvalds-288/+1065
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in 
need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...
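The ARRAY_END() series mentioned in this pull request is about bounding pointer iteration by a one-past-the-end pointer rather than a hand-computed index. A small sketch of that idea follows. The macro definitions here are an assumption made for illustration (hence the _SKETCH suffix) and may not match the kernel's actual definition. The array and helper function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definitions for illustration only; the kernel's may differ. */
#define ARRAY_SIZE_SKETCH(a) (sizeof(a) / sizeof((a)[0]))
#define ARRAY_END_SKETCH(a)  ((a) + ARRAY_SIZE_SKETCH(a))

static const int sample_values[] = { 1, 2, 3, 4 };

/* Sum every element, using the one-past-the-end pointer as the loop bound. */
static inline int sum_sample_values(void)
{
    int sum = 0;

    assert(ARRAY_SIZE_SKETCH(sample_values) > 0);
    for (const int *p = sample_values; p < ARRAY_END_SKETCH(sample_values); p++)
        sum += *p;
    return sum;
}
```

Writing the bound once, as a macro over the array itself, avoids the class of mistakes where a loop stops one element early or runs one element past the end because the limit was computed by hand.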
2026-02-12Merge tag 'mm-stable-2026-02-11-19-22' of ↵Linus Torvalds-544/+1053
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "powerpc/64s: do not re-activate batched TLB flush" makes arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev) It adds a generic enter/leave layer and switches architectures to use it. Various hacks were removed in the process. - "zram: introduce compressed data writeback" implements data compression for zram writeback (Richard Chang and Sergey Senozhatsky) - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous page ranges for hugepages. Large improvements during demand faulting are demonstrated (David Hildenbrand) - "memcg cleanups" tidies up some memcg code (Chen Ridong) - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos stats" improves DAMOS stat's provided information, deterministic control, and readability (SeongJae Park) - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few issues in the hugetlb cgroup charging selftests (Li Wang) - "Fix va_high_addr_switch.sh test failure - again" addresses several issues in the va_high_addr_switch test (Chunyu Hu) - "mm/damon/tests/core-kunit: extend existing test scenarios" improves the KUnit test coverage for DAMON (Shu Anzai) - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN (Shivank Garg) - "arch, mm: consolidate hugetlb early reservation" reworks and consolidates a pile of straggly code related to reservation of hugetlb memory from bootmem and creation of CMA areas for hugetlb (Mike Rapoport) - "mm: clean up anon_vma implementation" cleans up the anon_vma implementation in various ways (Lorenzo Stoakes) - "tweaks for __alloc_pages_slowpath()" does a little streamlining of the page allocator's slowpath code (Vlastimil Babka) - "memcg: separate private and public ID namespaces" cleans up the memcg ID code and prevents the internal-only private IDs from 
being exposed to userspace (Shakeel Butt) - "mm: hugetlb: allocate frozen gigantic folio" cleans up the allocation of frozen folios and avoids some atomic refcount operations (Kefeng Wang) - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement of memory between the active and inactive LRUs and adds auto-tuning of the ratio-based quotas and of monitoring intervals (SeongJae Park) - "Support page table check on PowerPC" makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan) - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes nodes_and() and nodes_andnot() propagate the return values from the underlying bit operations, enabling some cleanup in calling code (Yury Norov) - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up some DAMON internal interfaces (SeongJae Park) - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work in khugepaged and fixes a scan limit accounting issue (Shivank Garg) - "mm: balloon infrastructure cleanups" goes to town on the balloon infrastructure and its page migration function. 
Mainly cleanups, also some locking simplification (David Hildenbrand) - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds additional tracepoints to the page reclaim code (Jiayuan Chen) - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is part of Marco's kernel-wide migration from the legacy workqueue APIs over to the preferred unbound workqueues (Marco Crivellari) - "Various mm kselftests improvements/fixes" provides various unrelated improvements/fixes for the mm kselftests (Kevin Brodsky) - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic folio allocation, mainly by avoiding unnecessary work in pfn_range_valid_contig() (Kefeng Wang) - "selftests/damon: improve leak detection and wss estimation reliability" improves the reliability of two of the DAMON selftests (SeongJae Park) - "mm/damon: cleanup kdamond, damon_call(), damos filter and DAMON_MIN_REGION" does some cleanup work in the core DAMON code (SeongJae Park) - "Docs/mm/damon: update intro, modules, maintainer profile, and misc" performs maintenance work on the DAMON documentation (SeongJae Park) - "mm: add and use vma_assert_stabilised() helper" refactors and cleans up the core VMA code. The main aim here is to be able to use the mmap write lock's lockdep state to perform various assertions regarding the locking which the VMA code requires (Lorenzo Stoakes) - "mm, swap: swap table phase II: unify swapin use" removes some old swap code (swap cache bypassing and swap synchronization) which wasn't working very well. Various other cleanups and simplifications were made. The end result is a 20% speedup in one benchmark (Kairui Song) - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM available on 64-bit alpha, loongarch, mips, parisc, and um. 
Various cleanups were performed along the way (Qi Zheng) * tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits) mm/memory: handle non-split locks correctly in zap_empty_pte_table() mm: move pte table reclaim code to memory.c mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config um: mm: enable MMU_GATHER_RCU_TABLE_FREE parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE mips: mm: enable MMU_GATHER_RCU_TABLE_FREE LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles zsmalloc: make common caches global mm: add SPDX id lines to some mm source files mm/zswap: use %pe to print error pointers mm/vmscan: use %pe to print error pointers mm/readahead: fix typo in comment mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file() mm: refactor vma_map_pages to use vm_insert_pages mm/damon: unify address range representation with damon_addr_range mm/cma: replace snprintf with strscpy in cma_new_area ...
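The nodemask item in this pull request says nodes_and() and nodes_andnot() now propagate the return value of the underlying bit operation. As a rough sketch of why that helps callers, assume the underlying operation reports whether the result is non-empty. The 64-bit mask type and the names mask_and() and masks_intersect_via_and() below are hypothetical and are not the kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* AND two masks into *dst and report whether the result has any bit set. */
static inline bool mask_and(uint64_t *dst, uint64_t a, uint64_t b)
{
    *dst = a & b;
    return *dst != 0;
}

/*
 * Caller-side simplification: the emptiness test comes for free from the
 * return value, so no separate "is the result empty?" call is needed.
 */
static inline bool masks_intersect_via_and(uint64_t a, uint64_t b)
{
    uint64_t tmp;

    return mask_and(&tmp, a, b);
}
```

A caller that previously computed the intersection and then tested it for emptiness in a second step can fold both into one call.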