path: root/arch/arm64/kernel

Age | Commit message | Author | Files | Lines
2025-07-08 | arm64: debug: split bkpt32 exception entry | Ada Couprie Diaz | 2 | -13/+16

Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier.

The BKPT32 exception can only be triggered by a BKPT instruction. Thus, we know that the PC is a legitimate address and isn't being used to train a branch predictor with a bogus address: we don't need to call `arm64_apply_bp_hardening()`. The handler for this exception only pends a signal and doesn't depend on any per-CPU state: we don't need to inhibit preemption, nor do we need to keep the DAIF exceptions masked, so we can unmask them earlier.

Split the BKPT32 exception entry and adjust the function signature and behaviour to match its relaxed constraints compared to other debug exceptions. We can also remove `NOKPROBE_SYMBOL`, as this cannot lead to a kprobe recursion.

This replaces the last usage of `el0_dbg()`, so remove it.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-13-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
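To make the relaxed constraints concrete, here is a minimal sketch of what such a dedicated EL0 entry path can look like, in the style of arm64's entry-common.c. `do_bkpt32()` is a hypothetical handler name; `enter_from_user_mode()`, `exit_to_user_mode()` and `local_daif_restore(DAIF_PROCCTX)` are existing helpers in that file:

    /* Sketch only: a BKPT32-specific EL0 entry. DAIF exceptions can be
     * unmasked early because the handler merely pends a signal and
     * depends on no per-CPU state.
     */
    static void noinstr el0_bkpt32(struct pt_regs *regs, unsigned long esr)
    {
            enter_from_user_mode(regs);
            local_daif_restore(DAIF_PROCCTX);  /* unmask earlier than before */
            do_bkpt32(regs, esr);              /* hypothetical: pends SIGTRAP */
            exit_to_user_mode(regs);
    }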
2025-07-08 | arm64: debug: split brk64 exception entry | Ada Couprie Diaz | 2 | -33/+37

Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier.

The BRK64 exception can only be triggered by a BRK instruction. Thus, we know that the PC is a legitimate address and isn't being used to train a branch predictor with a bogus address: we don't need to call `arm64_apply_bp_hardening()`. We do not need to handle the Cortex-A76 erratum #1463225 either, as it is only relevant for single stepping at EL1. BRK64 does not write FAR_EL1 either, as only hardware watchpoints do so.

Split the BRK64 exception entry, and adjust the function signature and behaviour to match the lack of needed mitigations. Further, as the EL0 and EL1 code paths are cleanly separated, we can split `do_brk64()` into `do_el0_brk64()` and `do_el1_brk64()`, and call them directly from the relevant entry paths. Use `die()` directly for the EL1 error path, as in `do_el1_bti()` and `do_el1_undef()`. We can also remove `NOKPROBE_SYMBOL` for the EL0 path, as it cannot lead to a kprobe recursion.

When taking a BRK64 exception from EL0, the exception handling is safely preemptible: the only possible handler is `uprobe_brk_handler()`. It only operates on task-local data and properly checks its validity, then raises a Thread Information Flag, processed before returning to userspace in `do_notify_resume()`, which is already preemptible. Thus we can safely unmask interrupts and enable preemption before handling the break itself, fixing a PREEMPT_RT issue where the handler could call a sleeping function with preemption disabled.

Given that the break hook registration is handled statically in `call_break_hook` since (arm64: debug: call software break handlers statically), and that we now bypass the exception handler registration, this change renders `early_brk64` redundant: its functionality is now handled through the post-init path.

This also removes the last usage of `el1_dbg()`, and the last usage of `el0_dbg()` without `CONFIG_COMPAT`. Mark it `__maybe_unused` to prevent a warning when building this patch without `CONFIG_COMPAT`, as the following patch removes `el0_dbg()`.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-12-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08 | arm64: debug: split hardware watchpoint exception entry | Ada Couprie Diaz | 2 | -12/+36

Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier.

Hardware watchpoints are the only debug exceptions that will write FAR_EL1, so we need to preserve it and pass it down. However, they cannot be used to maliciously train branch predictors, so we can omit calling `arm64_apply_bp_hardening()`; nor do they need to handle the Cortex-A76 erratum #1463225, as it only applies to single stepping exceptions.

As the hardware watchpoint handler only returns 0 and never triggers the call to `arm64_notify_die()`, we can call it directly from `entry-common.c`.

Split the hardware watchpoint exception entry and adjust the behaviour to match the lack of needed mitigations.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-11-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08 | arm64: debug: split single stepping exception entry | Ada Couprie Diaz | 3 | -47/+71

Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier.

The single stepping exception has the most constraints: it can be exploited to train branch predictors and it needs special handling at EL1 for the Cortex-A76 erratum #1463225. We need to preserve all those mitigations. However, it does not write an address to FAR_EL1, as only hardware watchpoints do so.

The single-step handler does its own signaling if it needs to and only returns 0, so we can call it directly from `entry-common.c`.

Split the single stepping exception entry, adjust the function signature, and keep the security mitigation and erratum handling. Further, as the EL0 and EL1 code paths are cleanly separated, we can split `do_softstep()` into `do_el0_softstep()` and `do_el1_softstep()` and call them directly from the relevant entry paths. We can also remove `NOKPROBE_SYMBOL` for the EL0 path, as it cannot lead to a kprobe recursion.

Move the call to `arm64_apply_bp_hardening()` to `entry-common.c` so that we can do it as early as possible, and only for the exceptions coming from EL0, where it is needed. This is safe to do as it is `noinstr`, as are all the functions it may call. `el0_ia()` and `el0_pc()` already call it this way.

When taking a soft-step exception from EL0, most of the single stepping handling is safely preemptible: the only possible handler is `uprobe_single_step_handler()`. It only operates on task-local data and properly checks its validity, then raises a Thread Information Flag, processed before returning to userspace in `do_notify_resume()`, which is already preemptible. However, the soft-step handler first calls `reinstall_suspended_bps()` to check if there is any hardware breakpoint or watchpoint pending or already stepped through. This cannot be preempted as it manipulates the hardware breakpoint and watchpoint registers.

Move the call to `try_step_suspended_breakpoints()` to `entry-common.c` and adjust the relevant comments. We can now safely unmask interrupts before handling the step itself, fixing a PREEMPT_RT issue where the handler could call a sleeping function with preemption disabled.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Closes: https://lore.kernel.org/linux-arm-kernel/Z6YW_Kx4S2tmj2BP@uudg.org/
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-10-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08 | arm64: debug: refactor reinstall_suspended_bps() | Ada Couprie Diaz | 2 | -14/+13

`reinstall_suspended_bps()` plays a key part in the stepping process when we have hardware breakpoints and watchpoints enabled. It checks whether we need to step a suspended breakpoint, re-enables it if it has been handled, and returns whether or not we need to proceed with a single-step. However, the current naming and return values make it harder to understand the logic and goal of the function.

Rename it `try_step_suspended_breakpoints()` and change the return value to a boolean, aligning it with similar functions used in `do_el0_undef()` like `try_emulate_mrs()`, and making its behaviour more obvious.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-9-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
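A hedged sketch of the before/after interface; the signatures are assumed from the description rather than copied from the patch:

    /* Before: ambiguous name and an int return code. */
    int reinstall_suspended_bps(struct pt_regs *regs);

    /* After: the name states the intent and the boolean result reads
     * naturally at the call site, as try_emulate_mrs() already does.
     */
    bool try_step_suspended_breakpoints(struct pt_regs *regs);

    /* Call-site pattern: */
    if (try_step_suspended_breakpoints(regs))
            return; /* a suspended breakpoint was stepped; nothing more to do */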
2025-07-08 | arm64: debug: split hardware breakpoint exception entry | Ada Couprie Diaz | 2 | -10/+34

Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier.

Hardware breakpoint exceptions are generated by the hardware after user configuration. As such, they can be exploited to train branch predictors outside of the userspace VA range: they still need to call `arm64_apply_bp_hardening()` if needed to mitigate against this attack. However, they do not need to handle the Cortex-A76 erratum #1463225, as it only applies to single stepping exceptions. They do not set an address in FAR_EL1 either; only hardware watchpoints do.

As the hardware breakpoint handler only returns 0 and never triggers the call to `arm64_notify_die()`, we can call it directly from `entry-common.c`.

Split the hardware breakpoint exception entry, and adjust the function signature and the handling of the Cortex-A76 erratum to fit the behaviour of the exception.

Move the call to `arm64_apply_bp_hardening()` to `entry-common.c` so that we can do it as early as possible, and only for the exceptions coming from EL0, where it is needed. This is safe to do as it is `noinstr`, as are all the functions it may call. `el0_ia()` and `el0_pc()` already call it this way.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-8-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08 | arm64: entry: Add entry and exit functions for debug exceptions | Ada Couprie Diaz | 1 | -0/+22

Move the `debug_exception_enter()` and `debug_exception_exit()` functions from mm/fault.c, as they are needed to split the debug exception entry paths from the current unified one.

Make them externally visible in include/asm/exception.h until the caller in mm/fault.c is cleaned up.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-7-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08 | arm64: debug: remove break/step handler registration infrastructure | Ada Couprie Diaz | 1 | -63/+0

Remove all infrastructure for the dynamic registration previously used by software breakpoint and stepping handlers.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-6-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08 | arm64: debug: call step handlers statically | Ada Couprie Diaz | 3 | -47/+10

Software stepping checks for the correct handler by iterating over a list of dynamically registered handlers and calling all of them until one handles the exception. This is the only generic way to handle software stepping handlers in arm64, as the exception does not provide an immediate that could be checked, contrary to software breakpoints.

However, the registration mechanism is not exported and has only two current users: the KGDB stepping handler, and the uprobe single step handler. Given that one comes from user mode and the other from kernel mode, call the appropriate one by checking the source EL of the exception, as sketched below. Add a stand-in that returns DBG_HOOK_ERROR when the configuration options are not enabled.

Remove `arch_init_uprobes()` as it is not useful anymore and is specific to arm64.

Unify the naming of the handlers to XXX_single_step_handler(), making it clear they are related.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-5-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
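A minimal sketch of that source-EL dispatch, assuming the renamed handler signatures; when CONFIG_UPROBES or CONFIG_KGDB is disabled, the corresponding stand-in would simply return DBG_HOOK_ERROR:

    /* Dispatch statically on the EL the step exception was taken from,
     * instead of walking a dynamically registered hook list.
     */
    static int call_step_hook(struct pt_regs *regs, unsigned long esr)
    {
            if (user_mode(regs))
                    return uprobe_single_step_handler(regs, esr);   /* EL0 */

            return kgdb_single_step_handler(regs, esr);             /* EL1 */
    }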
2025-07-08 | arm64: debug: call software breakpoint handlers statically | Ada Couprie Diaz | 6 | -111/+62

Software breakpoints pass an immediate value in ESR (the "comment") that can be used to call a specialized handler (KGDB, KASAN...). We do so in two different ways:
- during early boot, `early_brk64` statically checks against known immediates and calls the corresponding handler;
- during init, handlers are dynamically registered into a list. When called, the generic software breakpoint handler iterates over the list to find the appropriate handler.

The dynamic registration does not provide any benefit here, as it is not exported and all its uses are within the arm64 tree. It also depends on an RCU list, whose safe access currently relies on the non-preemptible state of `do_debug_exception()`.

Replace the list iteration logic in `call_break_hook()` with static calls to the breakpoint handlers, if they are enabled, like in `early_brk64` (see the sketch below). Expose the handlers in their respective headers so they are reachable from `arch/arm64/kernel/debug-monitors.c` at link time.

Unify the naming of the software breakpoint handlers to XXX_brk_handler(), making it clear they are related and to differentiate them from the hardware breakpoints.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-4-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
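A hedged sketch of the static dispatch on the BRK immediate. The ESR mask and immediate macros exist in arm64's esr.h/brk-imm.h; the handler names merely follow the XXX_brk_handler() convention described above and may not match the patch exactly:

    static int call_break_hook(struct pt_regs *regs, unsigned long esr)
    {
            unsigned long comment = esr & ESR_ELx_BRK64_ISS_COMMENT_MASK;

            if (IS_ENABLED(CONFIG_KPROBES) && comment == KPROBES_BRK_IMM)
                    return kprobe_brk_handler(regs, esr);
            if (IS_ENABLED(CONFIG_KGDB) && comment == KGDB_COMPILED_DBG_BRK_IMM)
                    return kgdb_compiled_brk_handler(regs, esr);
            /* ... further immediates (KASAN, BUG, ...) elided ... */

            return DBG_HOOK_ERROR;
    }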
2025-07-08 | arm64: refactor aarch32_break_handler() | Ada Couprie Diaz | 2 | -6/+6

`aarch32_break_handler()` is called in `do_el0_undef()` when we are trying to handle an exception whose Exception Syndrome is unknown. It checks if the instruction hit might be a 32-bit arm break (be it A32 or T2), and if so sends a SIGTRAP to userspace so that it can be handled. However, this is badly represented in the naming of the function, and is not consistent with the other functions called with the same logic in `do_el0_undef()`.

Rename it `try_handle_aarch32_break()` and change the return value to a boolean to align with the logic of the other tentative handlers in `do_el0_undef()`, the previous error code being ignored anyway.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250707114109.35672-3-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08 | arm64: debug: clean up single_step_handler logic | Ada Couprie Diaz | 1 | -6/+4

Remove the unnecessary boolean that tracked whether a handler was found, and return early instead.

Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250707114109.35672-2-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04 | arm64: poe: Handle spurious Overlay faults | Kevin Brodsky | 1 | -0/+5

We do not currently issue an ISB after updating POR_EL0 when context-switching it, for instance. The rationale is that if the old value of POR_EL0 is more restrictive and causes a fault during uaccess, the access will be retried [1]. In other words, we are trading an ISB on every context switch for the (unlikely) possibility of a spurious fault. We may also miss faults if the new value of POR_EL0 is more restrictive, but that's considered acceptable.

However, as things stand, a spurious Overlay fault results in uaccess failing right away since it causes fault_from_pkey() to return true. If an Overlay fault is reported, we therefore need to double check POR_EL0 against vma_pkey(vma) - this is what arch_vma_access_permitted() already does.

As it turns out, we already perform that explicit check if no Overlay fault is reported, and we need to keep that check (see comment added in fault_from_pkey()). Net result: the Overlay ISS2 bit isn't of much help to decide whether a pkey fault occurred.

Remove the check for the Overlay bit from fault_from_pkey() and add a comment to try and explain the situation. While at it, also add a comment to permission_overlay_switch() in case anyone gets surprised by the lack of ISB.

[1] https://lore.kernel.org/linux-arm-kernel/ZtYNGBrcE-j35fpw@arm.com/

Fixes: 160a8e13de6c ("arm64: context switch POR_EL0 register")
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20250619160042.2499290-2-kevin.brodsky@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04 | arm64: Filter out SME hwcaps when FEAT_SME isn't implemented | Mark Brown | 1 | -25/+32

We have a number of hwcaps for various SME subfeatures enumerated via ID_AA64SMFR0_EL1. Currently we advertise these without cross-checking against the main SME feature, advertised in ID_AA64PFR1_EL1.SME, which means that if the two are out of sync userspace can see a confusing situation where SME subfeatures are advertised without the base SME hwcap. This can be readily triggered by using the arm64.nosme override, which only masks out ID_AA64PFR1_EL1.SME, and there have also been reports of VMMs which do the same thing.

Fix this as we did previously for SVE in 064737920bdb ("arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented") by filtering out the SME subfeature hwcaps when FEAT_SME is not present.

Fixes: 5e64b862c482 ("arm64/sme: Basic enumeration support")
Reported-by: Yury Khrustalev <yury.khrustalev@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250620-arm64-sme-filter-hwcaps-v1-1-02b9d3c2d8ef@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
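As a hedged illustration of the filtering idea (the real patch wires the check into the hwcap match callbacks; the helper below is illustrative, while `read_sanitised_ftr_reg()` and `cpuid_feature_extract_unsigned_field()` are existing arm64 cpufeature APIs):

    /* Only advertise an SME subfeature hwcap when FEAT_SME itself is
     * present in the sanitised ID registers.
     */
    static bool sme_subfeature_usable(u64 smfr0_subfield)
    {
            u64 pfr1 = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);

            if (!cpuid_feature_extract_unsigned_field(pfr1,
                                            ID_AA64PFR1_EL1_SME_SHIFT))
                    return false;   /* FEAT_SME absent: hide subfeatures */

            return smfr0_subfield != 0;
    }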
2025-07-04 | arm64: move smp_send_stop() cpu mask off stack | Arnd Bergmann | 1 | -1/+1

For really large values of CONFIG_NR_CPUS, a CPU mask value should not be put on the stack:

  arch/arm64/kernel/smp.c:1188:1: error: the frame size of 8544 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]

This could be avoided using alloc_cpumask_var(), which makes it depend on CONFIG_CPUMASK_OFFSTACK, but as this function is already serialized and can only run on one CPU, making the variable 'static' is easier.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250620111045.3364827-1-arnd@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
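A minimal sketch of the 'static' approach, with the IPI plumbing abridged:

    void smp_send_stop(void)
    {
            /* Serialized: only one CPU ever runs this, so a single static
             * mask is safe and keeps multi-KiB cpumasks off the stack.
             */
            static cpumask_t mask;

            cpumask_copy(&mask, cpu_online_mask);
            cpumask_clear_cpu(smp_processor_id(), &mask);

            /* ... send the stop IPI to every CPU left in mask ... */
    }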
2025-07-04 | arm64: Unconditionally select CONFIG_JUMP_LABEL | Marc Zyngier | 1 | -2/+1

Aneesh reports that his kernel fails to boot in nVHE mode with KVM's protected mode enabled. Further investigation by Mostafa reveals that this fails because CONFIG_JUMP_LABEL=n and we have static keys shared between EL1 and EL2.

While this can be worked around, it is obvious that we have long relied on having CONFIG_JUMP_LABEL enabled at all times, as all supported compilers now have 'asm goto' (which is the basic block for jump labels).

Let's simplify our lives once and for all by mandating jump labels. It's not like anyone else is testing anything without them, and we already rely on them for other things (kfence, xfs, preempt).

Link: https://lore.kernel.org/r/yq5ah60pkq03.fsf@kernel.org
Reported-by: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Reported-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250613141936.2219895-1-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2025-07-04 | arm64: efi: Fix KASAN false positive for EFI runtime stack | Breno Leitao | 1 | -3/+8

KASAN reports invalid accesses during arch_stack_walk() for EFI runtime services due to vmalloc tagging [1]. The EFI runtime stack must be allocated with KASAN tags reset to avoid false positives.

This patch uses arch_alloc_vmap_stack() instead of __vmalloc_node() for EFI stack allocation, which internally calls kasan_reset_tag(). The change ensures EFI runtime stacks are properly sanitized for KASAN while maintaining functional consistency.

Link: https://lore.kernel.org/all/aFVVEgD0236LdrL6@gmail.com/ [1]
Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20250704-arm_kasan-v2-1-32ebb4fd7607@debian.org
Signed-off-by: Will Deacon <will@kernel.org>
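A sketch of the allocation change, assuming illustrative variable and function names; `arch_alloc_vmap_stack()` is arm64's existing helper, which applies `kasan_reset_tag()` to the stack it returns:

    static unsigned long *efi_rt_stack;     /* illustrative name */

    static int __init efi_rt_stack_alloc(void)
    {
            /* Replaces a raw __vmalloc_node() whose KASAN-tagged memory
             * tripped arch_stack_walk() during EFI runtime services.
             */
            efi_rt_stack = arch_alloc_vmap_stack(THREAD_SIZE, NUMA_NO_NODE);
            return efi_rt_stack ? 0 : -ENOMEM;
    }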
2025-07-04 | arm64/watchdog_hld: Add a cpufreq notifier for update watchdog thresh | Yicong Yang | 1 | -0/+58

arm64 depends on the cpufreq driver to obtain the maximum CPU frequency used to convert watchdog_thresh into a perf event period. cpufreq drivers like cppc_cpufreq may be initialized late, after the hard lockup detector has been initialized, in which case a safe fallback frequency is used and the resulting period is inaccurate. Use a cpufreq notifier to adjust the event's period to a more accurate one.

Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20250701110214.27242-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
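A hedged sketch of such a notifier; `update_wd_event_period()` is a hypothetical helper standing in for the watchdog's period recalculation, while the cpufreq notifier API calls are real:

    static int wd_cpufreq_notifier(struct notifier_block *nb,
                                   unsigned long event, void *data)
    {
            struct cpufreq_policy *policy = data;

            /* Recompute the perf event period once the real maximum
             * frequency of the policy's CPUs is known.
             */
            if (event == CPUFREQ_CREATE_POLICY)
                    update_wd_event_period(policy->cpuinfo.max_freq);

            return NOTIFY_DONE;
    }

    static struct notifier_block wd_cpufreq_nb = {
            .notifier_call = wd_cpufreq_notifier,
    };

    /* e.g. from an initcall:
     *      cpufreq_register_notifier(&wd_cpufreq_nb, CPUFREQ_POLICY_NOTIFIER);
     */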
2025-07-03 | arm64/debug: Drop redundant DBG_MDSCR_* macros | Anshuman Khandual | 2 | -13/+13

MDSCR_EL1 has already been defined in tools sysreg format and hence can be used in all debug monitor related call paths. But using the generated sysreg definitions causes build warnings because there is a mismatch between the mdscr variable (u32) and the GENMASK() based masks (long unsigned int). Convert all variables handling the MDSCR_EL1 register to u64, which also reflects its true width:

  arch/arm64/kernel/debug-monitors.c: In function ‘disable_debug_monitors’:
  arch/arm64/kernel/debug-monitors.c:108:13: warning: conversion from ‘long unsigned int’ to ‘u32’ {aka ‘unsigned int’} changes value from ‘18446744073709518847’ to ‘4294934527’ [-Woverflow]
    108 |         disable = ~MDSCR_EL1_MDE;
        |                   ^

While here, replace an open encoding with MDSCR_EL1_TDCC in __cpu_setup().

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250613023646.1215700-2-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02 | arm64/hwcaps: Add MTE_STORE_ONLY hwcaps | Yeoreum Yun | 2 | -0/+2

Since Armv8.9, FEAT_MTE_STORE_ONLY can be used to restrict tag check faults so that they are raised on store operations only. Add the MTE_STORE_ONLY hwcap so that userspace can detect and use this feature.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20250618092957.2069907-5-yeoreum.yun@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02 | arm64/kernel: Support store-only mte tag check | Yeoreum Yun | 2 | -2/+15

Introduce a new flag, MTE_CTRL_STORE_ONLY, used to request store-only tag checking. This flag isn't overridden by the preferred TCF setting; instead it is set together with the preferred way of reporting tag check faults.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250618092957.2069907-4-yeoreum.yun@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02 | arm64/cpufeature: Add MTE_STORE_ONLY feature | Yeoreum Yun | 1 | -0/+8

Since Armv8.9, FEAT_MTE_STORE_ONLY can be used to restrict tag check faults so that they are raised on store operations only. Add the MTE_STORE_ONLY cpufeature.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250618092957.2069907-2-yeoreum.yun@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02 | arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported | Yeoreum Yun | 2 | -0/+2

If FEAT_MTE_TAGGED_FAR (Armv8.9) is supported, bits 63:60 of the fault address are preserved in response to synchronous tag check faults (SEGV_MTESERR). This patch modifies the following to support this feature:
- Use the original FAR_EL1 value when an MTE tag check fault occurs, if ARM64_MTE_FAR is supported, so that not only the logical tag (bits 59:56) but also the address tag (bits 63:60) is reported.
- Add a HWCAP for mtefar to let userspace know that bits 63:60 include address tag information when FEAT_MTE_TAGGED_FAR is supported.

Applications that require this information should install a signal handler with the SA_EXPOSE_TAGBITS flag, as sketched below. While this introduces a minor ABI change, most applications do not set this flag and therefore will not be affected.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20250618084513.1761345-3-yeoreum.yun@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
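For userspace, opting in looks roughly like the sketch below. SA_EXPOSE_TAGBITS is the existing Linux sigaction flag (0x00000800 in the UAPI headers); the fallback define is only needed if the libc headers are older:

    #include <signal.h>
    #include <string.h>

    #ifndef SA_EXPOSE_TAGBITS
    #define SA_EXPOSE_TAGBITS 0x00000800
    #endif

    static void mte_handler(int sig, siginfo_t *info, void *ucontext)
    {
            /* With FEAT_MTE_TAGGED_FAR, si_addr now also carries the
             * address tag in bits 63:60 for SEGV_MTESERR faults.
             */
    }

    int install_mte_handler(void)
    {
            struct sigaction sa;

            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = mte_handler;
            sa.sa_flags = SA_SIGINFO | SA_EXPOSE_TAGBITS;
            return sigaction(SIGSEGV, &sa, NULL);
    }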
2025-07-02 | arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature | Yeoreum Yun | 1 | -0/+8

Add the FEAT_MTE_TAGGED_FAR cpucap, which makes FAR_ELx report all non-address bits on a synchronous MTE tag check fault since Armv8.9.

Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Acked-by: Yury Khrustalev <yury.khrustalev@arm.com>
Link: https://lore.kernel.org/r/20250618084513.1761345-2-yeoreum.yun@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-01 | arm64: Implement HAVE_LIVEPATCH | Song Liu | 1 | -0/+4

Allocate a task flag used to represent the patch pending state for the task. When a livepatch is being loaded or unloaded, the livepatch code uses this flag to select the proper version of the kernel functions being patched for the current task.

In arch/arm64/Kconfig, select HAVE_LIVEPATCH and include the proper Kconfig. This is largely based on [1] by Suraj Jitindar Singh.

[1] https://lore.kernel.org/all/20210604235930.603-1-surajjs@amazon.com/

Cc: Suraj Jitindar Singh <surajjs@amazon.com>
Cc: Torsten Duwe <duwe@suse.de>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Breno Leitao <leitao@debian.org>
Tested-by: Andrea della Porta <andrea.porta@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250630174502.842486-1-song@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
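A sketch of how an architecture typically consumes the new task flag on the return-to-userspace path; `klp_update_patch_state()` is the existing livepatch API, and the surrounding code is abridged:

    #include <linux/livepatch.h>

    static void do_notify_resume(struct pt_regs *regs, unsigned long thread_flags)
    {
            /* Transition the task to the new set of patched functions. */
            if (thread_flags & _TIF_PATCH_PENDING)
                    klp_update_patch_state(current);

            /* ... signals, rescheduling, etc. ... */
    }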
2025-07-01 | ACPI: Suppress misleading SPCR console message when SPCR table is absent | Li Chen | 1 | -3/+7

The kernel currently always prints "Use ACPI SPCR as default console: No/Yes", even on systems that lack an SPCR table. This can mislead users into thinking the SPCR table exists on machines without SPCR.

With this change, "Yes" is only printed if the SPCR table is present, parsed successfully, and !param_acpi_nospcr. This avoids user confusion on SPCR-less systems.

Signed-off-by: Li Chen <chenl311@chinatelecom.cn>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/20250620131309.126555-3-me@linux.beauty
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-01 | arm64: pi: use 'targets' instead of extra-y in Makefile | Masahiro Yamada | 1 | -1/+1

%.pi.o files are built as prerequisites of other objects. There is no need to use extra-y, which is planned for deprecation.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20250602180937.528459-1-masahiroy@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-06-30 | arm64: Add BBM Level 2 cpu feature | Mikołaj Lenczewski | 1 | -0/+38

The Break-Before-Make cpu feature supports multiple levels (levels 0-2), and this commit adds a dedicated BBML2 cpufeature to test support against.

To support BBML2 in as wide a range of contexts as we can, we want not only the architectural guarantees that BBML2 makes, but additionally want BBML2 to not create TLB conflict aborts. Not causing aborts avoids us having to prove that no recursive faults can be induced in any path that uses BBML2, allowing its use for arbitrary kernel mappings.

This feature builds on the previous ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE, as all early cpus must support BBML2 for us to enable it (and any later cpus must also support it to be onlined). Not onlining late cpus that do not support BBML2 is unavoidable, as we might currently be using BBML2 semantics for kernel memory regions. This could cause faults in the late cpus, and would be difficult to unwind, so let us avoid the case altogether.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20250625113435.26849-3-miko.lenczewski@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
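A hedged sketch of what a cpufeature table entry for this can look like; the capability name and match function are illustrative, while the EARLY_LOCAL_CPU_FEATURE type comes from the previous commit:

    static const struct arm64_cpu_capabilities bbml2_cap = {
            .desc = "BBM Level 2 without TLB conflict aborts",
            .capability = ARM64_HAS_BBML2_NOABORT,          /* hypothetical */
            .type = ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE,
            .matches = has_bbml2_noabort,                   /* hypothetical */
    };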
2025-06-30 | arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type | Catalin Marinas | 1 | -11/+49

For system-wide capabilities, the kernel has the SCOPE_SYSTEM type. Such capabilities are checked once the SMP boot has completed using the sanitised ID registers. However, there is a need for a new capability type similar in scope to the system one but with checking performed locally on each CPU during boot (e.g. based on MIDR_EL1, which is not a sanitised register).

Introduce ARM64_CPUCAP_MATCH_ALL_EARLY_CPUS which, together with ARM64_CPUCAP_SCOPE_LOCAL_CPU, ensures that such a capability is enabled only if all early CPUs have it. For ease of use, define ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE, which combines SCOPE_LOCAL_CPU, PERMITTED_FOR_LATE_CPUS and MATCH_ALL_EARLY_CPUS.

Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com>
Reviewed-by: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Link: https://lore.kernel.org/r/20250625113435.26849-2-miko.lenczewski@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-06-20 | arm64: stacktrace: Implement arch_stack_walk_reliable() | Song Liu | 1 | -10/+43

Add arch_stack_walk_reliable(), which will be used during kernel live patching to detect when threads have completed executing old versions of functions.

Note that arch_stack_walk_reliable() only needs to guarantee that it returns an error code when it cannot provide a reliable stacktrace. It is not required to provide a reliable stacktrace in all scenarios, so long as it returns said error code.

At present we can only reliably unwind up to an exception boundary. In future we should be able to improve this with additional data from the compiler (e.g. sframe).

Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250320171559.3423224-2-song@kernel.org
[ Mark: Simplify logic, clarify commit message ]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrea della Porta <andrea.porta@suse.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Song Liu <song@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250521111000.2237470-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
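A heavily hedged sketch of the contract, with hypothetical unwinder helpers standing in for arm64's internal unwind_state machinery; the only load-bearing part is the error return whenever a frame cannot be verified:

    int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
                                 void *cookie, struct task_struct *task)
    {
            struct unwind_state state;      /* hypothetical unwinder state */

            for (unwind_init(&state, task); !unwind_done(&state); unwind_next(&state)) {
                    if (!unwind_is_reliable(&state))  /* e.g. exception boundary */
                            return -EINVAL;
                    if (!consume_entry(cookie, unwind_pc(&state)))
                            return -EINVAL;
            }

            return 0;
    }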
2025-06-20 | arm64: stacktrace: Check kretprobe_find_ret_addr() return value | Mark Rutland | 1 | -0/+2

If kretprobe_find_ret_addr() fails to find the original return address, it returns 0. Check for this case so that a reliable stacktrace won't silently ignore it.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andrea della Porta <andrea.porta@suse.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Song Liu <song@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-and-tested-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250521111000.2237470-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
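A sketch of the check, using the real kretprobe_find_ret_addr() signature inside an illustrative wrapper:

    #include <linux/kprobes.h>

    static int recover_kretprobe_pc(struct task_struct *tsk, void *fp,
                                    struct llist_node **kr_cur,
                                    unsigned long *pc)
    {
            unsigned long orig = kretprobe_find_ret_addr(tsk, fp, kr_cur);

            if (!orig)
                    return -EINVAL; /* don't silently report a bogus entry */

            *pc = orig;
            return 0;
    }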
2025-06-20 | arm64/module: Use text-poke API for late relocations | Dylan Hatch | 1 | -44/+57

To enable late module patching, livepatch modules need to be able to apply some of their relocations well after being loaded. In this scenario, however, the livepatch module text and data is already RX-only, so special treatment is needed to make the late relocations possible. To do this, use the text-poking API for these late relocations.

This patch is partially based on commit 88fc078a7a8f6 ("x86/module: Use text_poke() for late relocations").

Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250603223417.3700218-1-dylanbhatch@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-06-12 | arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth() | Tengda Wu | 1 | -1/+1

KASAN reports a stack-out-of-bounds read in regs_get_kernel_stack_nth().

Call Trace:
[   97.283505] BUG: KASAN: stack-out-of-bounds in regs_get_kernel_stack_nth+0xa8/0xc8
[   97.284677] Read of size 8 at addr ffff800089277c10 by task 1.sh/2550
[   97.285732]
[   97.286067] CPU: 7 PID: 2550 Comm: 1.sh Not tainted 6.6.0+ #11
[   97.287032] Hardware name: linux,dummy-virt (DT)
[   97.287815] Call trace:
[   97.288279]  dump_backtrace+0xa0/0x128
[   97.288946]  show_stack+0x20/0x38
[   97.289551]  dump_stack_lvl+0x78/0xc8
[   97.290203]  print_address_description.constprop.0+0x84/0x3c8
[   97.291159]  print_report+0xb0/0x280
[   97.291792]  kasan_report+0x84/0xd0
[   97.292421]  __asan_load8+0x9c/0xc0
[   97.293042]  regs_get_kernel_stack_nth+0xa8/0xc8
[   97.293835]  process_fetch_insn+0x770/0xa30
[   97.294562]  kprobe_trace_func+0x254/0x3b0
[   97.295271]  kprobe_dispatcher+0x98/0xe0
[   97.295955]  kprobe_breakpoint_handler+0x1b0/0x210
[   97.296774]  call_break_hook+0xc4/0x100
[   97.297451]  brk_handler+0x24/0x78
[   97.298073]  do_debug_exception+0xac/0x178
[   97.298785]  el1_dbg+0x70/0x90
[   97.299344]  el1h_64_sync_handler+0xcc/0xe8
[   97.300066]  el1h_64_sync+0x78/0x80
[   97.300699]  kernel_clone+0x0/0x500
[   97.301331]  __arm64_sys_clone+0x70/0x90
[   97.302084]  invoke_syscall+0x68/0x198
[   97.302746]  el0_svc_common.constprop.0+0x11c/0x150
[   97.303569]  do_el0_svc+0x38/0x50
[   97.304164]  el0_svc+0x44/0x1d8
[   97.304749]  el0t_64_sync_handler+0x100/0x130
[   97.305500]  el0t_64_sync+0x188/0x190
[   97.306151]
[   97.306475] The buggy address belongs to stack of task 1.sh/2550
[   97.307461]  and is located at offset 0 in frame:
[   97.308257]  __se_sys_clone+0x0/0x138
[   97.308910]
[   97.309241] This frame has 1 object:
[   97.309873]  [48, 184) 'args'
[   97.309876]
[   97.310749] The buggy address belongs to the virtual mapping at
[   97.310749]  [ffff800089270000, ffff800089279000) created by:
[   97.310749]  dup_task_struct+0xc0/0x2e8
[   97.313347]
[   97.313674] The buggy address belongs to the physical page:
[   97.314604] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14f69a
[   97.315885] flags: 0x15ffffe00000000(node=1|zone=2|lastcpupid=0xfffff)
[   97.316957] raw: 015ffffe00000000 0000000000000000 dead000000000122 0000000000000000
[   97.318207] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   97.319445] page dumped because: kasan: bad access detected
[   97.320371]
[   97.320694] Memory state around the buggy address:
[   97.321511]  ffff800089277b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   97.322681]  ffff800089277b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   97.323846] >ffff800089277c00: 00 00 f1 f1 f1 f1 f1 f1 00 00 00 00 00 00 00 00
[   97.325023]                          ^
[   97.325683]  ffff800089277c80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[   97.326856]  ffff800089277d00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00

This issue seems to be related to the behavior of some gcc compilers and was also fixed on the s390 architecture before:

  commit d93a855c31b7 ("s390/ptrace: Avoid KASAN false positives in regs_get_kernel_stack_nth()")

As described in that commit, regs_get_kernel_stack_nth() has confirmed that `addr` is on the stack, so reading the value at `*addr` should be allowed. Use the READ_ONCE_NOCHECK() helper to silence the KASAN check for this case.

Fixes: 0a8ea52c3eb1 ("arm64: Add HAVE_REGS_AND_STACK_ACCESS_API feature")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
Link: https://lore.kernel.org/r/20250604005533.1278992-1-wutengda@huaweicloud.com
[will: Use '*addr' as the argument to READ_ONCE_NOCHECK()]
Signed-off-by: Will Deacon <will@kernel.org>
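A sketch of the fixed accessor, reconstructed from the description (the in-tree helper may differ in detail):

    unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs, unsigned int n)
    {
            unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);

            addr += n;
            if (regs_within_kernel_stack(regs, (unsigned long)addr))
                    /* addr is known to be on the stack: skip the KASAN check */
                    return READ_ONCE_NOCHECK(*addr);

            return 0;
    }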
2025-06-12 | arm64/gcs: Don't call gcs_free() during flush_gcs() | Mark Brown | 1 | -1/+3

Currently we call gcs_free() during flush_gcs() to reset the thread state for GCS. This includes unmapping any kernel-allocated GCS, but this is redundant when doing a flush_thread(), since we are reinitialising the thread memory too. Inline the reinitialisation of the thread struct.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250611-arm64-gcs-flush-thread-v1-1-cc26feeddabd@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2025-06-07 | Merge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild | Linus Torvalds | 1 | -1/+1

Pull Kbuild updates from Masahiro Yamada:
- Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which exports a symbol only to specified modules
- Improve ABI handling in gendwarfksyms
- Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n
- Add checkers for redundant or missing <linux/export.h> inclusion
- Deprecate the extra-y syntax
- Fix a genksyms bug when including enum constants from *.symref files

* tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
  genksyms: Fix enum consts from a reference affecting new values
  arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds
  kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES}
  efi/libstub: use 'targets' instead of extra-y in Makefile
  module: make __mod_device_table__* symbols static
  scripts/misc-check: check unnecessary #include <linux/export.h> when W=1
  scripts/misc-check: check missing #include <linux/export.h> when W=1
  scripts/misc-check: add double-quotes to satisfy shellcheck
  kbuild: move W=1 check for scripts/misc-check to top-level Makefile
  scripts/tags.sh: allow to use alternative ctags implementation
  kconfig: introduce menu type enum
  docs: symbol-namespaces: fix reST warning with literal block
  kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n
  tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
  docs/core-api/symbol-namespaces: drop table of contents and section numbering
  modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time
  kbuild: move kbuild syntax processing to scripts/Makefile.build
  Makefile: remove dependency on archscripts for header installation
  Documentation/kbuild: Add new gendwarfksyms kABI rules
  Documentation/kbuild: Drop section numbers
  ...
2025-06-07 | arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds | Masahiro Yamada | 1 | -1/+1

The extra-y syntax is deprecated. Instead, use always-$(KBUILD_BUILTIN), which behaves equivalently.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
2025-06-05 | Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux | Linus Torvalds | 4 | -4/+30

Pull arm64 fixes from Will Deacon:
"We've got a couple of build fixes when using LLD, a missing TLB invalidation and a workaround for broken firmware on SoCs with CPUs that implement MPAM:
- Disable problematic linker assertions for broken versions of LLD
- Work around sporadic link failure with LLD and various randconfig builds
- Fix missing invalidation in the TLB batching code when reclaim races with mprotect() and friends
- Add a command-line override for MPAM to allow booting on systems with broken firmware"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: Add override for MPAM
  arm64/mm: Close theoretical race where stale TLB entry remains valid
  arm64: Work around convergence issue with LLD linker
  arm64: Disable LLD linker ASSERT()s for the time being
2025-06-02 | arm64: Add override for MPAM | Xi Ruoyao | 3 | -4/+13

As the message of commit 09e6b306f3ba ("arm64: cpufeature: discover CPU support for MPAM") already states, if buggy firmware fails to either enable MPAM or emulate the trap as if it were disabled, the kernel will just fail to boot. While upgrading the firmware would be the best solution, we have some hardware whose vendor has made no response two months after we requested a firmware update. Allow overriding the feature so our devices don't become e-waste.

Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Cc: Mingcong Bai <jeffbai@aosc.io>
Cc: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Cc: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250602043723.216338-1-xry111@xry111.site
Signed-off-by: Will Deacon <will@kernel.org>
2025-06-02 | arm64: Work around convergence issue with LLD linker | Ard Biesheuvel | 1 | -0/+11

LLD will occasionally error out with a '__init_end does not converge' error if INIT_IDMAP_DIR_SIZE is defined in terms of _end, as this results in a circular dependency.

Counter this by dimensioning the initial IDMAP page tables based on a new boundary marker 'kimage_limit', and define it such that its value should not change as a result of the initdata segment being pushed over a 64k segment boundary due to changes in INIT_IDMAP_DIR_SIZE, provided that its value doesn't change by more than 2M between linker passes.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250531123005.3866382-2-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-06-02 | arm64: Disable LLD linker ASSERT()s for the time being | Ard Biesheuvel | 1 | -0/+6

It turns out [1] that the way LLD handles ASSERT()s in the linker script can result in spurious failures, so disable them for the newly introduced BSS symbol export checks. Since we're not aware of any issues with the existing assertions in vmlinux.lds.S, leave those alone for now so that they can continue to provide useful coverage.

A linker fix [2] is due to land in version 21 of LLD.

Link: https://lore.kernel.org/r/202505261019.OUlitN6m-lkp@intel.com [1]
Link: https://github.com/llvm/llvm-project/commit/5859863bab7f [2]
Link: https://github.com/ClangBuiltLinux/linux/issues/2094
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505261019.OUlitN6m-lkp@intel.com/
Link: https://lore.kernel.org/r/20250529073507.2984959-2-ardb+git@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2025-05-31 | Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds | 4 | -6/+6

Pull MM updates from Andrew Morton:

- "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architectures must implement to provide this.

- "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work.

- "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size.

- "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invocation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic.

- "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleanups and a small efficiency improvement to this part of our swap handling code.

- "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscall arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value". This series should have been incorporated into mm.git's "non-MM" branch, but I goofed.

- "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions.

- "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error.

- "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities.

- "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM.

- "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecessary work, but this is fixed up where it is anticipated to occur.

- "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures.

- "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years.

- "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation.

- "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramatically reduced.

- "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code.

- "Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit.

- "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering.

- "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection.

- "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. Presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to be violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently.

- "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in the huge page splitting and migrating code.

- "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization.

- "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim and MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios.

- "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved.

- "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range, by skipping ranges of invalid pfns.

- "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned to a single NUMA node. Dramatic performance benefits were seen in some real world cases.

- "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS.

- "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c.

- "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function.

- "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.

- "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest.

- "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging.

- "memcg: decouple memcg and objcg stocks" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement.

- "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents.

- "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement.

- "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code.

* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
  mm: pcp: increase pcp->free_count threshold to trigger free_high
  mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
  mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
  mm/hugetlb: pass folio instead of page to unmap_ref_private()
  memcg: objcg stock trylock without irq disabling
  memcg: no stock lock for cpu hot-unplug
  memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
  memcg: make count_memcg_events re-entrant safe against irqs
  memcg: make mod_memcg_state re-entrant safe against irqs
  memcg: move preempt disable to callers of memcg_rstat_updated
  memcg: memcg_rstat_updated re-entrant safe against irqs
  mm: khugepaged: decouple SHMEM and file folios' collapse
  selftests/eventfd: correct test name and improve messages
  alloc_tag: check mem_profiling_support in alloc_tag_init
  Docs/damon: update titles and brief introductions to explain DAMOS
  selftests/damon/_damon_sysfs: read tried regions directories in order
  mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
  mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
  mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
  ...
2025-05-30 | Merge tag 'efi-next-for-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi | Linus Torvalds | 1 | -3/+3

Pull EFI updates from Ard Biesheuvel:
"Not a lot going on in the EFI tree this cycle. The only thing that stands out is the new support for SBAT metadata, which was a bit contentious when it was first proposed, because in the initial incarnation, it would have required us to maintain a revocation index, and bump it each time a vulnerability affecting UEFI secure boot got fixed. This was shot down for obvious reasons.

This time, only the changes needed to emit the SBAT section into the PE/COFF image are being carried upstream, and it is up to the distros to decide what to put in there when creating and signing the build.

This only has the EFI zboot bits (which the distros will be using for arm64); the x86 bzImage changes should be arriving next cycle, presumably via the -tip tree.

Summary:
- Add support for emitting a .sbat section into the EFI zboot image, so that downstreams can easily include revocation metadata in the signed EFI images
- Align PE symbolic constant names with other projects
- Bug fix for the efi_test module
- Log the physical address and size of the EFI memory map when failing to map it
- A kerneldoc fix for the EFI stub code"

* tag 'efi-next-for-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  include: pe.h: Fix PE definitions
  efi/efi_test: Fix missing pending status update in getwakeuptime
  efi: zboot specific mechanism for embedding SBAT section
  efi/libstub: Describe missing 'out' parameter in efi_load_initrd
  efi: Improve logging around memmap init
2025-05-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-6/+42
Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support. Quotes are appropriate because (at 6k lines and 100+ commits) it is much bigger than the rest, which will come later this week and consists mostly of bugfixes and selftests. s390 changes will also come in the second batch.

ARM:

 - Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance.

 - Enable nested virtualisation support on systems that support it, though it is disabled by default.

 - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes.

 - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture.

 - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above.

 - New selftest checking the pKVM ownership transition rules

 - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it.

 - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts.

 - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process.

 - Add a new selftest for the SVE host state being corrupted by a guest.

 - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

 - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised.

 - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion.

 - and the usual random cleanups.

LoongArch:

 - Don't flush tlb if the host supports hardware page table walks.

 - Add KVM selftests support.

RISC-V:

 - Add vector registers to get-reg-list selftest

 - VCPU reset related improvements

 - Remove scounteren initialization from VCPU reset

 - Support VCPU reset from userspace using set_mpstate() ioctl

x86:

 - Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module.

   This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit 7bcf7246c42a ('Merge branch 'kvm-tdx-finish-initial' into HEAD')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
  x86/tdx: mark tdh_vp_enter() as __flatten
  Documentation: virt/kvm: remove unreferenced footnote
  RISC-V: KVM: lock the correct mp_state during reset
  KVM: arm64: Fix documentation for vgic_its_iter_next()
  KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  KVM: arm64: Stage-2 huge mappings for np-guests
  KVM: arm64: Add a range to pkvm_mappings
  KVM: arm64: Convert pkvm_mappings to interval tree
  KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
  KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
  KVM: arm64: Add a range to __pkvm_host_unshare_guest()
  KVM: arm64: Add a range to __pkvm_host_share_guest()
  KVM: arm64: Introduce for_each_hyp_page
  KVM: arm64: Handle huge mappings for np-guest CMOs
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
  RISC-V: KVM: Remove scounteren initialization
  KVM: RISC-V: remove unnecessary SBI reset state
  ...
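The pKVM ownership rework mentioned above replaces stage-2 page-table walks with a flat, per-page metadata lookup. A minimal C sketch of the idea follows; the names echo pKVM's hyp_vmemmap and struct hyp_page, but the fields and state encoding here are simplified illustrations, not the real definitions:

  /* Simplified illustration; not the actual pKVM definitions. */
  enum pkvm_page_state {
          PKVM_PAGE_OWNED,            /* exclusively owned            */
          PKVM_PAGE_SHARED_OWNED,     /* shared, we are the owner     */
          PKVM_PAGE_SHARED_BORROWED,  /* shared, borrowed from owner  */
  };

  struct hyp_page {
          unsigned short refcount;
          unsigned char  order;
          unsigned char  state;       /* enum pkvm_page_state */
  };

  extern struct hyp_page *hyp_vmemmap; /* one entry per page of memory */

  /* An ownership query becomes an array index instead of a page-table
   * walk, which is what makes tracking block (THP) mappings cheap. */
  static inline enum pkvm_page_state hyp_page_state(unsigned long pfn)
  {
          return (enum pkvm_page_state)hyp_vmemmap[pfn].state;
  }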
2025-05-28Merge tag 'arm64-upstream' of ↵Linus Torvalds17-554/+525
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
"The headline feature is the re-enablement of support for Arm's Scalable Matrix Extension (SME) thanks to a bumper crop of fixes from Mark Rutland. If matrices aren't your thing, then Ryan's page-table optimisation work is much more interesting.

Summary:

ACPI, EFI and PSCI:

 - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree

 - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls

 - Fix a node refcount imbalance in the PSCI device-tree code

CPU Features:

 - Ensure register sanitisation is applied to fields in ID_AA64MMFR4

 - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM

 - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code

Entry code:

 - Hook up TIF_NEED_RESCHED_LAZY so that CONFIG_PREEMPT_LAZY can be selected

Memory management:

 - Prevent BSS exports from being used by the early PI code

 - Propagate level and stride information to the low-level TLB invalidation routines when operating on hugetlb entries

 - Use the page-table contiguous hint for vmap() mappings with VM_ALLOW_HUGE_VMAP where possible

 - Optimise vmalloc()/vmap() page-table updates to use "lazy MMU mode" and hook this up on arm64 so that the trailing DSB (used to publish the updates to the hardware walker) can be deferred until the end of the mapping operation

 - Extend mmap() randomisation for 52-bit virtual addresses (on par with 48-bit addressing) and remove limited support for randomisation of the linear map

Perf and PMUs:

 - Add support for probing the CMN-S3 driver using ACPI

 - Minor driver fixes to the CMN, Arm-NI and amlogic PMU drivers

Selftests:

 - Fix FPSIMD and SME tests to align with the freshly re-enabled SME support

 - Fix default setting of the OUTPUT variable so that tests are installed in the right location

vDSO:

 - Replace raw counter access from inline assembly code with a call to the __arch_counter_get_cntvct() helper function

Miscellaneous:

 - Add some missing header inclusions to the CCA headers

 - Rework rendering of /proc/cpuinfo to follow the x86 approach and avoid repeated buffer expansion (the user-visible format remains identical)

 - Remove redundant selection of CONFIG_CRC32

 - Extend early error message when failing to map the device-tree blob"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
  arm64: cputype: Add cputype definition for HIP12
  arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again
  perf/arm-cmn: Add CMN S3 ACPI binding
  arm64/boot: Disallow BSS exports to startup code
  arm64/boot: Move global CPU override variables out of BSS
  arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace
  perf/arm-cmn: Initialise cmn->cpu earlier
  kselftest/arm64: Set default OUTPUT path when undefined
  arm64: Update comment regarding values in __boot_cpu_mode
  arm64: mm: Drop redundant check in pmd_trans_huge()
  arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1
  arm64/mm: Permit lazy_mmu_mode to be nested
  arm64/mm: Disable barrier batching in interrupt contexts
  arm64/cpuinfo: only show one cpu's info in c_show()
  arm64/mm: Batch barriers when updating kernel mappings
  mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes
  arm64/mm: Support huge pte-mapped pages in vmap
  mm/vmalloc: Gracefully unmap huge ptes
  mm/vmalloc: Warn on improper use of vunmap_range()
  arm64/mm: Hoist barriers out of set_ptes_anysz() loop
  ...
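The "lazy MMU mode" item above batches the barrier, not the PTE writes themselves. A rough sketch of the pattern follows, using the generic arch_enter/arch_leave_lazy_mmu_mode() hook names; emit_pte_barrier() and the per-CPU bookkeeping are hypothetical simplifications (the real arm64 code tracks the pending state via thread flags):

  #include <linux/percpu.h>
  #include <asm/barrier.h>

  /* Sketch only: PTE updates performed between enter/leave record that
   * a publish barrier is owed; a single DSB is issued on leave instead
   * of one per set_ptes() call. */
  static DEFINE_PER_CPU(bool, lazy_mmu_active);
  static DEFINE_PER_CPU(bool, lazy_mmu_pending);

  void arch_enter_lazy_mmu_mode(void)
  {
          __this_cpu_write(lazy_mmu_active, true);
  }

  static void emit_pte_barrier(void)  /* hypothetical helper */
  {
          if (__this_cpu_read(lazy_mmu_active))
                  __this_cpu_write(lazy_mmu_pending, true);
          else
                  dsb(ishst);   /* publish to the hardware walker now */
  }

  void arch_leave_lazy_mmu_mode(void)
  {
          if (__this_cpu_read(lazy_mmu_pending)) {
                  dsb(ishst);   /* one barrier for the whole batch */
                  __this_cpu_write(lazy_mmu_pending, false);
          }
          __this_cpu_write(lazy_mmu_active, false);
  }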
2025-05-27Merge branch 'for-next/sme-fixes' into for-next/coreWill Deacon6-404/+376
* for-next/sme-fixes: (35 commits)
  arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected
  arm64/fpsimd: ptrace: Gracefully handle errors
  arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state
  arm64/fpsimd: ptrace: Do not present register data for inactive mode
  arm64/fpsimd: ptrace: Save task state before generating SVE header
  arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state
  arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data
  arm64/fpsimd: Make clone() compatible with ZA lazy saving
  arm64/fpsimd: Clear PSTATE.SM during clone()
  arm64/fpsimd: Consistently preserve FPSIMD state during clone()
  arm64/fpsimd: Remove redundant task->mm check
  arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return()
  arm64/fpsimd: Add task_smstop_sm()
  arm64/fpsimd: Factor out {sve,sme}_state_size() helpers
  arm64/fpsimd: Clarify sve_sync_*() functions
  arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE
  arm64/fpsimd: signal: Consistently read FPSIMD context
  arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state
  arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only
  arm64/fpsimd: Do not discard modified SVE state
  ...
2025-05-27Merge branch 'for-next/mm' into for-next/coreWill Deacon8-61/+58
* for-next/mm:
  arm64/boot: Disallow BSS exports to startup code
  arm64/boot: Move global CPU override variables out of BSS
  arm64/boot: Move init_pgdir[] and init_idmap_pgdir[] into __pi_ namespace
  arm64: mm: Drop redundant check in pmd_trans_huge()
  arm64/mm: Permit lazy_mmu_mode to be nested
  arm64/mm: Disable barrier batching in interrupt contexts
  arm64/mm: Batch barriers when updating kernel mappings
  mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes
  arm64/mm: Support huge pte-mapped pages in vmap
  mm/vmalloc: Gracefully unmap huge ptes
  mm/vmalloc: Warn on improper use of vunmap_range()
  arm64/mm: Hoist barriers out of set_ptes_anysz() loop
  arm64: hugetlb: Use __set_ptes_anysz() and __ptep_get_and_clear_anysz()
  arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear()
  mm/page_table_check: Batch-check pmds/puds just like ptes
  arm64: hugetlb: Refine tlb maintenance scope
  arm64: hugetlb: Cleanup huge_pte size discovery mechanisms
  arm64: pageattr: Explicitly bail out when changing permissions for vmalloc_huge mappings
  arm64: Support ARM64_VA_BITS=52 when setting ARCH_MMAP_RND_BITS_MAX
  arm64/mm: Remove randomization of the linear map
2025-05-27Merge branch 'for-next/misc' into for-next/coreWill Deacon2-59/+58
* for-next/misc:
  arm64/cpuinfo: only show one cpu's info in c_show()
  arm64: Extend pr_crit message on invalid FDT
  arm64: Kconfig: remove unnecessary selection of CRC32
  arm64: Add missing includes for mem_encrypt
2025-05-27Merge branch 'for-next/fixes' into for-next/coreWill Deacon2-1/+10
Merge in for-next/fixes, as subsequent improvements to our early PI code that disallow BSS exports depend on the 'arm64_use_ng_mappings' fix here.

* for-next/fixes:
  arm64: cpufeature: Move arm64_use_ng_mappings to the .data section to prevent wrong idmap generation
  arm64: errata: Add missing sentinels to Spectre-BHB MIDR arrays
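The arm64_use_ng_mappings fix illustrates why BSS exports to early code are dangerous: the early PI/idmap code can run before .bss has been initialised, so a flag living there may read as garbage and lead to a wrong idmap. One way to force a zero-initialised variable out of .bss and into .data is sketched below; this is an illustrative pattern, and the actual patch may differ in detail:

  /* A zero-initialised global normally lands in .bss, which the early
   * startup code must not rely on. Placing it in .data gives it an
   * initialised home that is valid from the very first access. */
  bool arm64_use_ng_mappings __section(".data") = false;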
2025-05-27Merge branch 'for-next/entry' into for-next/coreWill Deacon2-1/+3
* for-next/entry:
  arm64: el2_setup.h: Make __init_el2_fgt labels consistent, again
  arm64: Update comment regarding values in __boot_cpu_mode
  arm64/mm: Re-organise setting up FEAT_S1PIE registers PIRE0_EL1 and PIR_EL1
  arm64: enable PREEMPT_LAZY
2025-05-27Merge branch 'for-next/efi' into for-next/coreWill Deacon2-31/+27
* for-next/efi:
  arm64/fpsimd: Avoid unnecessary per-CPU buffers for EFI runtime calls