summaryrefslogtreecommitdiffstats
path: root/arch
AgeCommit message (Collapse)AuthorLines
2026-05-17Merge tag 'x86-urgent-2026-05-17' of ↵Linus Torvalds-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: - Fix x86 boot crash for non-kjump kexecs (David Woodhouse) * tag 'x86-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kexec: Push kjump return address even for non-kjump kexec
2026-05-17Merge tag 'sched-urgent-2026-05-17' of ↵Linus Torvalds-7/+24
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: - Fix ARM64-specific rseq regressions (Mark Rutland) * tag 'sched-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/entry: Fix arm64-specific rseq brokenness
2026-05-17Merge tag 'ras-urgent-2026-05-17' of ↵Linus Torvalds-28/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MCE fix from Ingo Molnar: - Fix an MCE polling interval adjustment regression (Borislav Petkov) * tag 'ras-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Restore MCA polling interval halving
2026-05-17Merge tag 'riscv-for-linus-7.1-rc4' of ↵Linus Torvalds-22/+72
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "Relatively low-impact fixes. Probably the most notable one is that we no longer ask the monitor-mode firmware to delegate misaligned access handling to the kernel by default, since the kernel code needs significant improvement to match the functionality of the firmware. This change avoids functional problems at some cost in performance, but shouldn't affect any system with misaligned access handling in hardware. - Disable satp register probing when no5lvl is specified on the kernel command line - Fix a CFI-related issue with the misaligned access speed measurement code - Reduce the CFI shadow stack size limit from 4GB to 2GB (following ARM64 GCS) - Prevent the kernel from requesting delegation of misaligned access faults unless a new Kconfig option, RISCV_SBI_FWFT_DELEGATE_MISALIGNED, is enabled. This will depend on CONFIG_NONPORTABLE until the deficiencies of the kernel misaligned access fixup code are fixed - Fix some potential uninitialized memory accesses in error paths in compat_riscv_gpr_set() and compat_restore_sigcontext() - Fix a bug in the RISC-V MIPS vendor errata patching code where a logical-and was used in place of a bitwise-and - Drop some unnecessary code in riscv_fill_hwcap_from_isa_string() - Use macros for isa2hwcap indices in riscv_fill_hwcap(), rather than open-coding them - Fix some documentation typos (one affecting 'make htmldocs')" * tag 'riscv-for-linus-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: misaligned: Make enabling delegation depend on NONPORTABLE riscv: Docs: fix unmatched quote warning riscv: cfi: reduce shadow stack size limit from 4GB to 2GB riscv: cpufeature: Use pre-defined ISA ext macros to index isa2hwcap riscv: mm: Fixup no5lvl failure when vaddr is invalid riscv: Fix register corruption from uninitialized cregs on error riscv: errata: Fix bitwise vs logical AND in MIPS errata patching Documentation: riscv: cmodx: fix typos riscv: cpufeature: Drop this_hwcap clear in T-Head vector workaround riscv: Define __riscv_copy_{,vec_}{words,bytes}_unaligned() using SYM_TYPED_FUNC_START
2026-05-16Merge tag 'powerpc-7.1-3' of ↵Linus Torvalds-52/+27
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Fix preempt count leak in sysfs show paths - Fix error handling in pika_dtm_thread - Remove pmac_low_i2c_{lock,unlock}() - Enable all windfarms by default - Fix dead default for GUEST_STATE_BUFFER_TEST - Remove redundant preempt_disable|enable() calls from arch_irq_work_raise() Thanks to Aboorva Devarajan, Ally Heev, Amit Machhiwal, Bart Van Assche, Christophe Leroy, Christophe Leroy (CS GROUP), Dan Carpenter, Gautam Menghani, Harsh Prateek Bora, Julian Braha, Krzysztof Kozlowski, Linus Walleij, Ma Ke, Ritesh Harjani (IBM), and Sayali Patil * tag 'powerpc-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/time: Remove redundant preempt_disable|enable() calls from arch_irq_work_raise() powerpc/hv-gpci: fix preempt count leak in sysfs show paths powerpc: fix dead default for GUEST_STATE_BUFFER_TEST powerpc/powermac: Remove pmac_low_i2c_{lock,unlock}() powerpc/warp: Fix error handling in pika_dtm_thread powerpc: 82xx: fix uninitialized pointers with free attribute powerpc/g5: Enable all windfarms by default
2026-05-15Merge tag 'for-linus-7.1b-rc4-tag' of ↵Linus Torvalds-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - one simple cleanup - a fix for a corner case when running as Xen PV dom0 - a fix of a regression for Xen PV guests, introduced in 7.0 * tag 'for-linus-7.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Tolerate nested XEN_LAZY_MMU entering/leaving x86/xen: Fix xen_e820_swap_entry_with_ram() xen/arm: Replace __ASSEMBLY__ with __ASSEMBLER__ in interface.h
2026-05-14Merge tag 'acpi-7.1-rc4' of ↵Linus Torvalds-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI support fixes from Rafael Wysocki: "These fix several platform drivers that use the ACPI companion of the given platform device without checking its presence, which may lead to a NULL pointer dereference or other kind of malfunction if the driver is forced to match a device without an ACPI companion via driver override, and restore debug log level for some messages in the ACPI CPPC library: - Check ACPI_COMPANION() against NULL during probe in several core ACPI device drivers (Rafael Wysocki) - Restore log level of messages in amd_set_max_freq_ratio() (Mario Limonciello)" * tag 'acpi-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PAD: xen: Check ACPI_COMPANION() against NULL ACPI: driver: Check ACPI_COMPANION() against NULL during probe Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"
2026-05-14Merge branch 'acpi-cppc'Rafael J. Wysocki-3/+3
Merge a revert of an ACPI CPPC commit that increased the log level of some debug messages which turned out to be a bad idea: - Restore log level of messages in amd_set_max_freq_ratio() (Mario Limonciello) * acpi-cppc: Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"
2026-05-14x86/xen: Tolerate nested XEN_LAZY_MMU entering/leavingJuergen Gross-2/+6
With the support of nested lazy mmu sections it can happen that arch_enter_lazy_mmu_mode() is being called twice without a call of arch_leave_lazy_mmu_mode() in between, as the lazy_mmu_*() helpers are not disabling preemption when checking for nested lazy mmu sections. This is a problem when running as a Xen PV guest, as xen_enter_lazy_mmu() and xen_leave_lazy_mmu() don't tolerate this case. Fix that in xen_enter_lazy_mmu() and xen_leave_lazy_mmu() in order not to hurt all other lazy mmu mode users. Fixes: 291b3abed657 ("x86/xen: use lazy_mmu_state when context-switching") Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260508143933.493013-1-jgross@suse.com>
2026-05-14x86/xen: Fix xen_e820_swap_entry_with_ram()Juergen Gross-1/+1
When swapping a not page-aligned E820 map entry with RAM, the start address of the modified entry is calculated wrong (the offset into the page is subtracted instead of being added to the page address). Fixes: be35d91c8880 ("xen: tolerate ACPI NVS memory overlapping with Xen allocated memory") Reported-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260505102417.208138-1-jgross@suse.com>
2026-05-14powerpc/time: Remove redundant preempt_disable|enable() calls from ↵Sayali Patil-2/+4
arch_irq_work_raise() A kernel panic is observed when handling machine check exceptions from real mode. BUG: Unable to handle kernel data access on read at 0xc00000006be21300 Oops: Kernel access of bad area, sig: 11 [#1] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 88222248 XER: 00000005 CFAR: c00000000003ffc4 DAR: c00000006be21300 DSISR: 40000000 IRQMASK: 0 NIP [c000000000029e40] arch_irq_work_raise+0x10/0x70 LR [c00000000003ffc8] machine_check_queue_event+0xa8/0x150 Call Trace: [c0000000179d3c70] [c00000000003ff64] machine_check_queue_event+0x44/0x150 [c0000000179d3d30] [c0000000000084e0] machine_check_early_common+0x1f0/0x2c0 The crash occurs because arch_irq_work_raise() calls preempt_disable() from machine check exception (MCE) handlers running in real mode. In this context, accessing the preempt_count can fault, leading to the panic. The preempt_disable()/preempt_enable() pair in arch_irq_work_raise() was originally added by commit 0fe1ac48bef0 ("powerpc/perf_event: Fix oops due to perf_event_do_pending call") to avoid races while raising irq work from exception context. Later, commit 471ba0e686cb ("irq_work: Do not raise an IPI when queueing work on the local CPU") added preemption protection in irq_work_queue() path, while commit 20b876918c06 ("irq_work: Use per cpu atomics instead of regular atomics") added equivalent protection in irq_work_queue_on() before reaching arch_irq_work_raise(): irq_work_queue() / irq_work_queue_on() -> preempt_disable() -> __irq_work_queue_local() -> irq_work_raise() -> arch_irq_work_raise() As a result, callers other than mce_irq_work_raise() already execute with preemption disabled, making the additional preempt_disable()/preempt_enable() pair in arch_irq_work_raise() redundant. The arch_irq_work_raise() function executes in NMI context when called from MCE handler. Hence we will not be preempted or scheduled out since we are in NMI context with MSR[EE]=0. Therefore, it is safe to remove the preempt_disable()/preempt_enable() calls from here. Remove it to avoid accessing preempt_count from real mode context. Fixes: cc15ff327569 ("powerpc/mce: Avoid using irq_work_queue() in realmode") Suggested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> [Maddy: Fixed the commit title] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260513081413.222490-1-sayalip@linux.ibm.com
2026-05-13riscv: misaligned: Make enabling delegation depend on NONPORTABLEVivian Wang-1/+23
The unaligned access emulation code in Linux has various deficiencies. For example, it doesn't emulate vector instructions [1] [2], and doesn't emulate KVM guest accesses. Therefore, requesting misaligned exception delegation with SBI FWFT actually regresses vector instructions' and KVM guests' behavior. Until Linux can handle it properly, guard these sbi_fwft_set() calls behind RISCV_SBI_FWFT_DELEGATE_MISALIGNED, which in turn depends on NONPORTABLE. Those who are sure that this wouldn't be a problem can enable this option, perhaps getting better performance. The rest of the existing code proceeds as before, except as if SBI_FWFT_MISALIGNED_EXC_DELEG is not available, to handle any remaining address misaligned exceptions on a best-effort basis. The KVM SBI FWFT implementation is also not touched, but it is disabled if the firmware emulates unaligned accesses. Cc: stable@vger.kernel.org Fixes: cf5a8abc6560 ("riscv: misaligned: request misaligned exception from SBI") Reported-by: Songsong Zhang <U2FsdGVkX1@gmail.com> # KVM Link: https://lore.kernel.org/linux-riscv/38ce44c1-08cf-4e3f-8ade-20da224f529c@iscas.ac.cn/ [1] Link: https://lore.kernel.org/linux-riscv/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/ [2] Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20260401-riscv-misaligned-dont-delegate-v2-1-5014a288c097@iscas.ac.cn Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-05-13riscv: cfi: reduce shadow stack size limit from 4GB to 2GBZong Li-3/+4
Follow the ARM64 GCS (Guarded Control Stack) implementation approach by reducing the shadow stack size allocation from min(RLIMIT_STACK, 4GB) to min(RLIMIT_STACK/2, 2GB). See commit 506496bcbb42 ("arm64/gcs: Ensure that new threads have a GCS") Rationale: 1. Shadow stacks only store return addresses (8 bytes per entry), not local variables, function parameters, or saved registers. A 2GB shadow stack is far more than sufficient for any practical application, even with extremely deep recursion. Using half the size maintains adequate margin while being more resource-efficient. 2. On memory-constrained systems (e.g., platforms with only 4GB of physical memory, which is a common configuration), allocating 4GB of virtual address space for shadow stack per process/thread can lead to virtual memory allocation failures when the overcommit mode is set to OVERCOMMIT_GUESS or OVERCOMMIT_NEVER: Error: "__vm_enough_memory: not enough memory for the allocation" This reduces virtual address space consumption by 50% while maintaining more than adequate space for return address storage. Signed-off-by: Zong Li <zong.li@sifive.com> Link: https://patch.msgid.link/20260428024105.645162-1-zong.li@sifive.com [pjw@kernel.org: clean up patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-05-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds-58/+182
Pull kvm fixes from Paolo Bonzini: "arm64: - Add the pKVM side of the workaround for ARM's erratum 4193714, provided that the EL3 firmware does its part of the job. KVM will refuse to initialise otherwise - Correctly handle 52bit VAs for guest EL2 stage-1 translations when running under NV with E2H==0 - Correctly deal with permission faults in guest_memfd memslots - Fix the steal-time selftest after the infrastructure was reworked - Make sure the host cannot pass a non-sensical clock update to the EL2 tracing infrastructure - Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390 ability to run arm64 guests, which will inevitably lead to arm64 code being directly used on s390 - Make sure that EL2 is configured with both exception entry and exit being Context Synchronization Events - Handle the current vcpu being NULL on EL2 panic - Fix the selftest_vcpu memcache being empty at the point of donation or sharing - Check that the memcache has enough capacity before engaging on the share/donate path - Fix __deactivate_fgt() to use its parameter rather than a variable in the macro context s390: - Fix array overrun with large amounts of PCI devices x86: - Never use L0's PAUSE loop exiting while L2 is running, since it's unlikely that a nested guest will help solving the hypervisor's spinlock contention - Fix emulation of MOVNTDQA - Fix typo in Xen hypercall tracepoint - Add back an optimization that was left behind when recently fixing a bug - Add module parameter to disable CET, whose implementation seems to have issues. For now it remains enabled by default Generic: - Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn() Documentation: - Update stale links Selftests: - Fix guest_memfd_test with host page size > guest page size" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) KVM: VMX: introduce module parameter to disable CET KVM: x86: Swap the dst and src operand for MOVNTDQA KVM: x86: use again the flush argument of __link_shadow_page() KVM: selftests: Ensure gmem file sizes are multiple of host page size Documentation: kvm: update links in the references section of AMD Memory Encryption KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running KVM: x86: Fix Xen hypercall tracepoint argument assignment KVM: Reject wrapped offset in kvm_reset_dirty_gfn() KVM: arm64: Pre-check vcpu memcache for host->guest donate KVM: arm64: Pre-check vcpu memcache for host->guest share KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache KVM: arm64: Fix __deactivate_fgt macro parameter typo KVM: arm64: Guard against NULL vcpu on VHE hyp panic path KVM: arm64: Make EL2 exception entry and exit context-synchronization events MAINTAINERS: Add Steffen as reviewer for KVM/arm64 KVM: arm64: Remove potential UB on nvhe tracing clock update KVM: selftests: arm64: Fix steal_time test after UAPI refactoring KVM: arm64: Handle permission faults with guest_memfd KVM: arm64: nv: Consider the DS bit when translating TCR_EL2 KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests ...
2026-05-13x86/mce: Restore MCA polling interval halvingBorislav Petkov (AMD)-28/+5
RongQing reported that the MCA polling interval doesn't halve when an error gets logged. It was traced down to the commit in Fixes:, because: mce_timer_fn() |-> mce_poll_banks() |-> machine_check_poll() |-> mce_log() which will queue the work and return. Now, back in mce_timer_fn(): /* * Alert userspace if needed. If we logged an MCE, reduce the polling * interval, otherwise increase the polling interval. */ if (mce_notify_irq()) <--- here we haven't ran the notifier chain yet so mce_need_notify is not set yet so this won't hit and we won't halve the interval iv. Now the notifier chain runs. mce_early_notifier() sets the bit, does mce_notify_irq(), that clears the bit and then the notifier chain a little later logs the error. So this is a silly timing issue. But, that's all unnecessary. All it needs to happen here is, the "should we notify of a logged MCE" mce_notify_irq() asks, should be simply a question to the mce gen pool: "Are you empty?" And that then turns into a simple yes or no answer and it all JustWorks(tm). So do that and also distribute the functionality where it belongs: - Print that MCE events have been logged in mce_log() - Trigger the mcelog tool specific work in the first notifier As a result, mce_notify_irq() can go now. Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector") Reported-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@baidu.com
2026-05-13KVM: VMX: introduce module parameter to disable CETPaolo Bonzini-2/+16
There have been reports of host hangs caused by CET virtualization. Until these are analyzed further, introduce a module parameter that makes it possible to easily disable it. Link: https://lore.kernel.org/all/85548beb-1486-40f9-beb4-632c78e3360b@proxmox.com/ Cc: David Riley <d.riley@proxmox.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-12Merge tag 'kvm-s390-master-7.1-1' of ↵Paolo Bonzini-8/+5
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: pci: fix array indexing For large amounts of PCI devices its possible to overrun the arrays as the index was miscalculated in 2 places.
2026-05-12KVM: x86: Swap the dst and src operand for MOVNTDQASean Christopherson-1/+1
Swap the MOVNTDQA operands, as MOVNTDQA does NOT in fact have "the same characteristics as 0F E7 (MOVNTDQ)"; MOVNTDQA loads from memory and stores to registers, while MOVNTDQ loads from registers and stores to memory. Per the SDM: MOVNTDQ - Move packed integer values in xmm1 to m128 using non-temporal hint. MOVNTDQA - Move double quadword from m128 to xmm1 using non-temporal hint if WC memory type. Reported-by: Josh Eads <josheads@google.com> Fixes: c57d9bafbd0b ("KVM: x86: Add support for emulating MOVNTDQA") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260506213514.2781948-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-12KVM: x86: use again the flush argument of __link_shadow_page()Paolo Bonzini-2/+21
Except in the case of parentless nested-TDP pages, mmu_page_zap_pte() clears the SPTE but leaves the invalid_list empty. In this case, using kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill. Avoid flushing the entirety of the remote TLBs unless the invalid_list was populated: instead, use a more efficient gfn-targeting flush (if available) and skip it altogether if the caller guarantees that a TLB flush is not necessary. Based-on: <20260503201029.106481-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20260503210917.121840-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-12Merge tag 'kvmarm-fixes-7.1-2' of ↵Paolo Bonzini-13/+111
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #2 - Add the pKVM side of the workaround for ARM's erratum 4193714, provided that the EL3 firmware does its part of the job. KVM will refuse to initialise otherwise. - Correctly handle 52bit VAs for guest EL2 stage-1 translations when running under NV with E2H==0. - Correctly deal with permission faults in guest_memfd memslots. - Fix the steal-time selftest after the infrastructure was reworked. - Make sure the host cannot pass a non-sensical clock update to the EL2 tracing infrastructure. - Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390 ability to run arm64 guests, which will inevitably lead to arm64 code being directly used on s390. - Make sure that EL2 is configured with both exception entry and exit being Context Synchronization Events. - Handle the current vcpu being NULL on EL2 panic. - Fix the selftest_vcpu memcache being empty at the point of donation or sharing. - Check that the memcache has enough capacity before engaging on the share/donate path. - Fix __deactivate_fgt() to use its parameter rather than a variable in the macro context.
2026-05-12KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is runningSean Christopherson-31/+27
Never use L0's (KVM's) PAUSE loop exiting controls while L2 is running, and instead always configure vmcb02 according to L1's exact capabilities and desires. The purpose of intercepting PAUSE after N attempts is to detect when the vCPU may be stuck waiting on a lock, so that KVM can schedule in a different vCPU that may be holding said lock. Barring a very interesting setup, L1 and L2 do not share locks, and it's extremely unlikely that an L1 vCPU would hold a spinlock while running L2. I.e. having a vCPU executing in L1 yield to a vCPU running in L2 will not allow the L1 vCPU to make forward progress, and vice versa. While teaching KVM's "on spin" logic to only yield to other vCPUs in L2 is doable, in all likelihood it would do more harm than good for most setups. KVM has limited visibility into which L2 "vCPUs" belong to the same VM, and thus share a locking domain. And even if L2 vCPUs are in the same VM, KVM has no visilibity into L2 vCPU's that are scheduled out by the L1 hypervisor. Furthermore, KVM doesn't actually steal PAUSE exits from L1. If L1 is intercepting PAUSE, KVM will route PAUSE exits to L1, not L0, as nested_svm_intercept() gives priority to the vmcb12 intercept. As such, overriding the count/threshold fields in vmcb02 with vmcb01's values is nonsensical, as doing so clobbers all the training/learning that has been done in L1. Even worse, if L1 is not intercepting PAUSE, i.e. KVM is handling PAUSE exits, then KVM will adjust the PLE knobs based on L2 behavior, which could very well be detrimental to L1, e.g. due to essentially poisoning L1 PLE training with bad data. And copying the count from vmcb02 to vmcb01 on a nested VM-Exit makes even less sense, because again, the purpose of PLE is to detect spinning vCPUs. Whether or not a vCPU is spinning in L2 at the time of a nested VM-Exit has no relevance as to the behavior of the vCPU when it executes in L1. The only scenarios where any of this actually works is if at least one of KVM or L1 is NOT intercepting PAUSE for the guest. Per the original changelog, those were the only scenarios considered to be supported. Disabling KVM's use of PLE makes it so the VM is always in a "supported" mode. Last, but certainly not least, using KVM's count/threshold instead of the values provided by L1 is a blatant violation of the SVM architecture. Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Cc: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260508213321.373309-1-seanjc@google.com/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-12KVM: x86: Fix Xen hypercall tracepoint argument assignmentQiang Ma-1/+1
TRACE_EVENT(kvm_xen_hypercall) stores a5 in __entry->a4 instead of __entry->a5. That overwrites the recorded a4 argument and leaves a5 unset in the trace entry. Fix the typo so both arguments are captured correctly. Signed-off-by: Qiang Ma <maqianga@uniontech.com> Link: https://patch.msgid.link/20260512015313.1685784-1-maqianga@uniontech.com/ Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-12powerpc/hv-gpci: fix preempt count leak in sysfs show pathsAboorva Devarajan-8/+16
Four sysfs show() callbacks in hv-gpci take get_cpu_var(hv_gpci_reqb) (which calls preempt_disable()) but only call the matching put_cpu_var() on the error path under the 'out:' label. Every successful read leaks one preempt_disable(): processor_bus_topology_show() processor_config_show() affinity_domain_via_virtual_processor_show() affinity_domain_via_domain_show() (affinity_domain_via_partition_show() was already correct.) On a CONFIG_PREEMPT=y kernel, repeated reads raise preempt_count and eventually return to userspace with preemption still disabled. The next user-mode page fault then hits faulthandler_disabled() == 1, gets forced to SIGSEGV, and the resulting coredump trips 'BUG: scheduling while atomic' in call_usermodehelper_exec -> wait_for_completion_state -> schedule: BUG: scheduling while atomic: <task>/<pid>/0x00000004 ... __schedule_bug+0x6c/0x90 __schedule+0x58c/0x13a0 schedule+0x48/0x1a0 schedule_timeout+0x104/0x170 wait_for_completion_state+0x16c/0x330 call_usermodehelper_exec+0x254/0x2d0 vfs_coredump+0x1050/0x2590 get_signal+0xb9c/0xc80 do_notify_resume+0xf8/0x470 Add an out_success label that calls put_cpu_var() before returning the byte count, mirroring affinity_domain_via_partition_show(). Fixes: 71f1c39647d8 ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor bus topology information") Fixes: 1a160c2a13c6 ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show processor config information") Fixes: 71a7ccb478fc ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via virtual processor information") Fixes: a69a57cac1ec ("powerpc/hv_gpci: Add sysfs file inside hv_gpci device to show affinity domain via domain information") Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260508041256.3447113-1-aboorvad@linux.ibm.com
2026-05-12powerpc: fix dead default for GUEST_STATE_BUFFER_TESTJulian Braha-2/+1
The GUEST_STATE_BUFFER_TEST config option should default to KUNIT_ALL_TESTS so that if all tests are enabled then it is included, but currently the 'default KUNIT_ALL_TESTS' statement is shadowed by 'def_tristate n', meaning that this second default statement is currently dead code. It looks to me like the commit 6ccbbc33f06a ("KVM: PPC: Add helper library for Guest State Buffers") intended to set the default to KUNIT_ALL_TESTS, but mistakenly missed the def_tristate. This dead code was found by kconfirm, a static analysis tool for Kconfig. Fixes: 6ccbbc33f06a ("KVM: PPC: Add helper library for Guest State Buffers") Signed-off-by: Julian Braha <julianbraha@gmail.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Reviewed-by: Amit Machhiwal <amachhiw@linux.ibm.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260405161545.161006-1-julianbraha@gmail.com
2026-05-12powerpc/powermac: Remove pmac_low_i2c_{lock,unlock}()Bart Van Assche-38/+0
Commit a28d3af2a26c ("[PATCH] 2/5 powerpc: Rework PowerMac i2c part 2") removed the last calls to the pmac_low_i2c_{lock,unlock}() functions. Hence, remove these two functions. Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260316174747.3871924-1-bvanassche@acm.org
2026-05-12powerpc/warp: Fix error handling in pika_dtm_threadMa Ke-0/+2
pika_dtm_thread() acquires client through of_find_i2c_device_by_node() but fails to release it in error handling path. This could result in a reference count leak, preventing proper cleanup and potentially leading to resource exhaustion. Add put_device() to release the reference in the error handling path. Found by code review. Cc: stable@vger.kernel.org Fixes: 3984114f0562 ("powerpc/warp: Platform fix for i2c change") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251116024411.21968-1-make24@iscas.ac.cn
2026-05-12powerpc: 82xx: fix uninitialized pointers with free attributeAlly Heev-2/+2
Uninitialized pointers with `__free` attribute can cause undefined behavior as the memory allocated to the pointer is freed automatically when the pointer goes out of scope. powerpc/km82xx doesn't have any bugs related to this as of now, but, it is better to initialize and assign pointers with `__free` attribute in one statement to ensure proper scope-based cleanup Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/aPiG_F5EBQUjZqsl@stanley.mountain/ Signed-off-by: Ally Heev <allyheev@gmail.com> Fixes: 4aa5cc1e0012 ("powerpc-km82xx.c: replace of_node_put() with __free") Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251116-aheev-uninitialized-free-attr-km82xx-v2-1-4307e2b5300d@gmail.com
2026-05-12powerpc/g5: Enable all windfarms by defaultLinus Walleij-0/+2
The G5 defconfig is clearly intended for the G5 Powermac series, and that should enable all the available windfarm drivers, or the machine will overheat a short while after booting and shut itself down, which is annoying. Signed-off-by: Linus Walleij <linusw@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260505-powermac-g5-config-v3-1-7747bf72f874@kernel.org
2026-05-11x86/CPU/AMD: Prevent improper isolation of shared resources in Zen2's op cachePrathyushi Nangia-1/+5
Make sure resources are not improperly shared in the op cache and cause instruction corruption this way. Signed-off-by: Prathyushi Nangia <prathyushi.nangia@amd.com> Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-05-09Merge tag 'powerpc-7.1-2' of ↵Linus Torvalds-121/+235
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Fix KASAN sanitization flag for core_$(BITS).o - Fixes for handling offset values in pseries htmdump - Fix interrupt mask in cpm1_gpiochip_add16() - ps3/pasemi fixes to drop redundant result assignment - Fixes in papr-hvpipe code path - powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events Thanks to Aboorva Devarajan, Athira Rajeev, Christophe Leroy (CS GROUP), Geert Uytterhoeven, Haren Myneni, Krzysztof Kozlowski, Mukesh Kumar Chaurasiya (IBM), Nathan Chancellor, Ritesh Harjani (IBM), Shivani Nittor, Sourabh Jain, Thomas Zimmermann, and Venkat Rao Bagalkote. * tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits) powerpc/pasemi: Drop redundant res assignment powerpc/ps3: Drop redundant result assignment powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16() powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ() pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg() pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release() pseries/papr-hvpipe: Fix the usage of copy_to_user() pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init() pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle() pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace pseries/papr-hvpipe: Fix race with interrupt handler powerpc/pseries/htmdump: Add memory configuration dump support to htmdump module powerpc/pseries/htmdump: Fix the offset value used in htm status dump powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump ...
2026-05-08Merge tag 'x86-urgent-2026-05-09' of ↵Linus Torvalds-5/+14
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix memory map enumeration bug in the Xen e820 parsing code (Juergen Gross) - Re-enable e820 BIOS fallback if e820 table is empty (David Gow) * tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/e820: Re-enable BIOS fallback if e820 table is empty x86/xen: Fix a potential problem in xen_e820_resolve_conflicts()
2026-05-08Merge tag 'perf-urgent-2026-05-09' of ↵Linus Torvalds-16/+57
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fixes from Ingo Molnar: - Fix deadlock in the perf_mmap() failure path (Peter Zijlstra) - Intel ACR (Auto Counter Reload) fixes (Dapeng Mi): - Fix validation and configuration of ACR masks - Fix ACR rescheduling bug causing stale masks - Disable the PMI on ACR-enabled hardware - Enable ACR on Panther Cover uarch too * tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Enable auto counter reload for DMR perf/x86/intel: Disable PMI for self-reloaded ACR events perf/x86/intel: Always reprogram ACR events to prevent stale masks perf/x86/intel: Improve validation and configuration of ACR masks perf/core: Fix deadlock in perf_mmap() failure path
2026-05-08Merge tag 'arm64-fixes' of ↵Linus Torvalds-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: - ptrace(PTRACE_SETREGSET) fix to zero the target's fpsimd_state rather than the tracer's * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's
2026-05-08Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"Mario Limonciello-3/+3
Some older systems don't support CPPC in the firmware and this just makes noise for them when booting. Drop back to debug. This reverts commit 21fb59ab4b9767085f4fe1edbdbe3177fbb9ec97. Fixes: 21fb59ab4b976 ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn") Suggested-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: All applicable <stable@vger.kernel.org> Link: https://patch.msgid.link/20260504230141.484743-2-mario.limonciello@amd.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-05-08arm64/entry: Fix arm64-specific rseq brokennessMark Rutland-7/+24
Mathias Stearn reports that since v6.19, there are two big issues affecting rseq: (1) On arm64 specifically, rseq critical sections aren't aborted when they should be. (2) The 'cpu_id_start' field is no longer written by the kernel in all cases it used to be, including some cases where TCMalloc depends on the kernel clobbering the field. This patch fixes issue #1. This patch DOES NOT fix issue #2, which will need to be addressed by other patches. The arm64-specific brokenness is a result of commits: 2fc0e4b4126c ("rseq: Record interrupt from user space") 39a167560a61 ("rseq: Optimize event setting") The first commit failed to add a call to rseq_note_user_irq_entry() on arm64. Thus arm64 never sets rseq_event::user_irq to record that it may be necessary to abort an active rseq critical section upon return to userspace. On its own, this commit had no functional impact as the value of rseq_event::user_irq was not consumed. The second commit relied upon rseq_event::user_irq to determine whether or not to bother to perform rseq work when returning to userspace. As rseq_event::user_irq wasn't set on arm64, this work would be skipped, and consequently an active rseq critical section would not be aborted. Fix this by giving arm64 syscall-specific entry/exit paths, and performing the relevant logic in syscall and non-syscall paths, including calling rseq_note_user_irq_entry() for non-syscall entry. Currently arm64 cannot use syscall_enter_from_user_mode(), syscall_exit_to_user_mode(), and irqentry_exit_to_user_mode(), due to ordering constraints with exception masking, and risk of ABI breakage for syscall tracing/audit/etc. For the moment the entry/exit logic is left as arm64-specific, directly using enter_from_user_mode() and exit_to_user_mode(), but mirroring the generic code. I intend to follow up with refactoring/cleanup, as we did for kernel mode entry paths in commit: 041aa7a85390 ("entry: Split preemption from irqentry_exit_to_kernel_mode()") ... which will allow arm64 to use the GENERIC_IRQ_ENTRY functions directly. Fixes: 39a167560a61 ("rseq: Optimize event setting") Reported-by: Mathias Stearn <mathias@mongodb.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/regressions/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com/ Link: https://patch.msgid.link/20260508142023.3268622-1-mark.rutland@arm.com
2026-05-08x86/kexec: Push kjump return address even for non-kjump kexecDavid Woodhouse-0/+8
The version of purgatory code shipped by kexec-tools attempts to look above the top of its stack to find a return address for a kjump, even in a non-kjump kexec. After the commit in Fixes: the word above the stack might not be there, leading to a fault (which is at least now caught by my exception-handling code in kexec). That commit fixed things for the actual kjump path, but no longer "gratuitously" pushes the unused return address to the stack in the non-kjump path. Put that *back* in the non-kjump path, to prevent purgatory from crashing when trying to access it. Fixes: 2cacf7f23a02 ("x86/kexec: Fix stack and handling of re-entry point for ::preserve_context") Reported-by: Rohan Kakulawaram <rohanka@google.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Rohan Kakulawaram <rohanka@google.com> Cc: <stable@kernel.org> Link: https://patch.msgid.link/32d627134143ffd957891cb697138e839c623211.camel@infradead.org
2026-05-07riscv: cpufeature: Use pre-defined ISA ext macros to index isa2hwcapHui Wang-8/+8
We have pre-defined ISA extension macros, here use those macros to replace a magic number for isa2hwcap definition and some array indexing for isa2hwcap access. This doesn't change the original functionality, just improve the code maintainability and readability. Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://patch.msgid.link/20260506132152.53239-1-hui.wang@canonical.com Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-05-07KVM: arm64: Pre-check vcpu memcache for host->guest donateFuad Tabba-0/+4
__pkvm_host_donate_guest() flips the host stage-2 PTE for the donated page to a non-valid annotation via host_stage2_set_owner_metadata_locked() and then calls kvm_pgtable_stage2_map() to install the matching guest stage-2 mapping. The map's return value is wrapped in WARN_ON() and otherwise discarded, asserting that the call cannot fail. WARN_ON() at nVHE EL2 panics, so this assertion is only correct if the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail with -ENOMEM even at PAGE_SIZE granularity: the donate path verifies PKVM_NOPAGE for the guest IPA before the map, so the walker must allocate fresh page-table pages from the vcpu memcache, and the host controls the vcpu memcache via the topup interface. An under-provisioned donation request would otherwise turn a recoverable -ENOMEM into a fatal hyp panic. Bound the worst-case walker allocation alongside the existing __host_check_page_state_range() / __guest_check_page_state_range() pre-checks, using the helper introduced for host->guest share. If the vcpu memcache holds fewer pages than kvm_mmu_cache_min_pages(), return -ENOMEM before any state mutation. Fixes: 1e579adca177 ("KVM: arm64: Introduce __pkvm_host_donate_guest()") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-07KVM: arm64: Pre-check vcpu memcache for host->guest shareFuad Tabba-0/+20
__pkvm_host_share_guest() ends with kvm_pgtable_stage2_map() to install the guest stage-2 mapping, after a forward pass that mutates the host vmemmap (sets PKVM_PAGE_SHARED_OWNED and increments host_share_guest_count) for every page in the range. The map's return value is wrapped in WARN_ON() and otherwise discarded, asserting that the call cannot fail. WARN_ON() at nVHE EL2 panics, so this assertion is only correct if the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail with -ENOMEM when the stage-2 walker exhausts the caller's memcache, and the host controls the vcpu memcache via the topup interface, so an under-provisioned share request would otherwise turn a recoverable -ENOMEM into a fatal hyp panic. Bound the worst-case walker allocation in the existing pre-check pass so that kvm_pgtable_stage2_map() cannot fail at the call site, using kvm_mmu_cache_min_pages() -- the same bound host EL1 uses for its own stage-2 maps. If the vcpu memcache holds fewer pages, return -ENOMEM before any state mutation. Fixes: d0bd3e6570ae ("KVM: arm64: Introduce __pkvm_host_share_guest()") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-07KVM: arm64: Seed pkvm_ownership_selftest vcpu memcacheFuad Tabba-1/+15
The hypercall handlers call pkvm_refill_memcache() to top up the hyp_vcpu memcache before invoking __pkvm_host_{share,donate}_guest(). pkvm_ownership_selftest invokes those functions directly with a static selftest_vcpu that has an empty memcache. Seed selftest_vcpu's memcache from the prepopulated selftest pages, leaving the remainder for selftest_vm.pool. Required by the memcache-sufficiency pre-check added in the following patches. Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-07KVM: arm64: Fix __deactivate_fgt macro parameter typoFuad Tabba-1/+1
__deactivate_fgt() declares its first parameter as "htcxt" but the body references "hctxt". The parameter is unused; the macro silently captures "hctxt" from the enclosing scope. Both existing callers (__deactivate_traps_hfgxtr() and __deactivate_traps_ich_hfgxtr()) happen to define a local "struct kvm_cpu_context *hctxt", so the macro works by coincidence. A future caller without an "hctxt" local in scope, or naming it differently, would compile but bind to the wrong context. Align the parameter name with the sibling __activate_fgt() macro. The "vcpu" parameter remains unused in the body, kept for API symmetry with __activate_fgt() (which uses it). Fixes: f5a5a406b4b8 ("KVM: arm64: Propagate and handle Fine-Grained UNDEF bits") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-07KVM: arm64: Guard against NULL vcpu on VHE hyp panic pathFuad Tabba-1/+2
On VHE, __hyp_call_panic() unconditionally calls __deactivate_traps(vcpu) on the vcpu pointer read from host_ctxt->__hyp_running_vcpu. That pointer is cleared after every guest exit (and is never set when no guest is running), so an unexpected EL2 exception landing in _guest_exit_panic, e.g. via the el2t*_invalid / el2h_irq_invalid vectors - reaches this function with vcpu == NULL. __deactivate_traps() then dereferences vcpu via ___deactivate_traps() -> vserror_state_is_nested() -> vcpu_has_nv() -> vcpu->arch.features, faulting inside the panic handler and obscuring the original failure. The nVHE counterpart (hyp_panic() in arch/arm64/kvm/hyp/nvhe/switch.c) already guards its vcpu-using cleanup with "if (vcpu)"; mirror that here. sysreg_restore_host_state_vhe() does not depend on vcpu and continues to run unconditionally, preserving panic forensics. The trailing panic("...VCPU:%p", vcpu) prints "(null)" safely via printk's %p handling. Fixes: 6a0259ed29bb ("KVM: arm64: Remove hyp_panic arguments") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-07KVM: arm64: Make EL2 exception entry and exit context-synchronization eventsFuad Tabba-1/+1
SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI 0487 M.b D24.2.175 (p. D24-9754): - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally a CSE. - FEAT_ExS: the reset value is architecturally UNKNOWN; software must set the bit to make the entry/exit a CSE. INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after MSRs to context-switching system registers (HCR_EL2, ZCR_EL2, ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not architecturally backed unless EOS=1 (and, for entry, EIS=1). Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON was setting both unconditionally. The conversion made SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only models unconditionally-RES1 fields and EIS/EOS are RES1 only when FEAT_ExS is absent, the auto-generated mask is UL(0). The seven other bits dropped from the old mask (positions 4, 5, 16, 18, 23, 28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS are the only bits whose semantics changed for FEAT_ExS hardware and where the kernel relies on the value being 1. Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are unconditionally CSEs regardless of whether FEAT_ExS is implemented. This matches the pairing in arch/arm64/kvm/config.c which treats EIS and EOS together as RES1 under !FEAT_ExS. Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure") Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-07x86/boot/e820: Re-enable BIOS fallback if e820 table is emptyDavid Gow-1/+5
In commit: 157266edcc56 ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables") the check on the number of entries in the e820 table was removed. The intention was to support single-entry maps, but by removing the check entirely, we also skip the fallback (to, e.g., the BIOS 88h function). This means that if no E820 map is passed in from the bootloader (which is the case on some bootloaders, like linld), we end up with an empty memory map, and the kernel fails to boot (either by deadlocking on OOM, or by failing to allocate the real mode trampoline, or similar). Re-instate the check in append_e820_table(), but only check that nr_entries is non-zero. This allows e820__memory_setup_default() to fall back to other memory size sources, and doesn't affect e820__memory_setup_extended(), as the latter ignores the return value from append_e820_table(). In doing so, we also update the return values to be proper error codes, with -ENOENT for this case (there are no entries), and -EINVAL for the case where an entry appears invalid. Given none of the callers check the actual value -- just whether it's nonzero -- this is largely aesthetic in practice. Tested against linld, and the kernel boots again fine. [ mingo: Readability edits to the comment and the changelog. ] Fixes: 157266edcc56 ("x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables") Signed-off-by: David Gow <david@davidgow.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: stable@vger.kernel.org Cc: Arnd Bergmann <arnd@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://patch.msgid.link/20260416065746.1896647-1-david@davidgow.net
2026-05-06Merge tag 'parisc-for-7.1-rc3' of ↵Linus Torvalds-21/+30
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: - Revert "parisc: led: fix reference leak on failed device registration" - Fix build failures introduced when allowing to build 32-/64-bit only VDSO - Switch to dynamic parisc root device to avoid upcoming warnings - Fix IRQ leak in LASI driver * tag 'parisc-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix IRQ leak in LASI driver parisc: Fix 64-bit kernel build when CONFIG_COMPAT=n parisc: Fix build failure for 32-bit kernel with PA2.0 instruction set parisc: drivers: switch to dynamic root device Revert "parisc: led: fix reference leak on failed device registration"
2026-05-06KVM: arm64: Remove potential UB on nvhe tracing clock updateMostafa Saleh-0/+3
Sashiko(locally) reports possiblity of division by zero and out-of-bounds bitwise shift in trace_clock_update(). Although the clock update is untrusted, we should at least have some basic checks to avoid undefined behaviours. Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://patch.msgid.link/20260430103724.2151625-1-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-06KVM: arm64: Handle permission faults with guest_memfdAlexandru Elisei-8/+21
gmem_abort() calls kvm_pgtable_stage2_map() to make changes to stage 2. It does this for both relaxing permissions on an existing mapping and to install a missing mapping. kvm_pgtable_stage2_map() doesn't make changes to stage 2 if there is an existing, valid entry and the new entry modifies only the permissions. This is checked in: kvm_pgtable_stage2_map() stage2_map_walk_leaf() stage2_map_walker_try_leaf() stage2_pte_needs_update() and if only the permissions differ, kvm_pgtable_stage2_map() returns -EAGAIN and KVM returns to the guest to replay the instruction. The assumption is that a concurrent fault on a different VCPU already mapped the faulting IPA, and replaying the instruction will either succeed, or cause a permission fault, which should be handled with kvm_pgtable_stage2_relax_perms(). gmem_abort(), on a read or write fault on a system without DIC (instruction cache invalidation required for data to instruction coherence), installs a valid entry with read and write permissions, but without executable permissions. On an execution fault on the same page, gmem_abort() attempts to relax the permissions to allow execution, but calls kvm_pgtable_stage2_map() to change the existing, valid, entry. kvm_pgtable_stage2_map() returns -EAGAIN and KVM resumes execution from the faulting instruction, which leads to an infinite loop of permission faults on the same instruction. Allow the guest to make progress by using kvm_pgtable_stage2_relax_perms() to relax permissions. Fixes: a7b57e099592 ("KVM: arm64: Handle guest_memfd-backed guest page faults") Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260505094913.75317-1-alexandru.elisei@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-06KVM: arm64: nv: Consider the DS bit when translating TCR_EL2Wei-Lin Chang-0/+1
When running an nVHE L1, TCR_EL2 is mapped to TCR_EL1. Writes to the register are trapped and written to TCR_EL1 after a translation. Booting an nVHE L1 with 52-bit VA isn't working because the translation was ignoring the DS bit set by the guest, hence causing repeating level 0 faults. Add it in the translation function. Signed-off-by: Wei-Lin Chang <weilin.chang@arm.com> Link: https://patch.msgid.link/20260505144735.1496530-1-weilin.chang@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-06KVM: arm64: Work around C1-Pro erratum 4193714 for protected guestsJames Morse-1/+43
C1-Pro cores with SME have an erratum where TLBI+DSB does not complete all outstanding SME accesses. Instead a DSB needs to be executed on the affected CPUs. The implication is that pages cannot be unmapped from the host Stage 2 and then provided to a protected guest or to the hypervisor. Host SME accesses may still complete after this point. This erratum breaks pKVM's guarantees, and the workaround is hard to implement as EL2 and EL1 share a security state meaning EL1 can mask IPIs sent by EL2, leading to interrupt blackouts. Instead, do this in EL3. This has the advantage of a separate security state, meaning lower EL cannot mask the IPI. It is also simpler for EL3 to know about CPUs that are off or in PSCI's CPU_SUSPEND. Add the needed hook to host_stage2_set_owner_metadata_locked(). This covers the cases where the host loses access to a page: __pkvm_host_donate_guest() __pkvm_guest_unshare_host() host_stage2_set_owner_locked() when owner_id == PKVM_ID_HYP Since pKVM relies on the firmware call for correctness, check for the firmware counterpart during protected KVM initialisation and fail the pKVM initialisation if it is missing. Signed-off-by: James Morse <james.morse@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oupton@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Sudeep Holla <sudeep.holla@kernel.org> Link: https://patch.msgid.link/20260505165205.2690919-1-catalin.marinas@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-06Merge tag 'efi-fixes-for-v7.1-1' of ↵Linus Torvalds-4/+14
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Fix issues in EFI graceful recovery on x86 introduced by changes to the kernel mode FPU APIs - I-cache coherency fixes for the LoongArch EFI stub - Locking fix for EFI pstore - Code tweak for efivarfs * tag 'efi-fixes-for-v7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efi: Restore IRQ state in EFI page fault handler x86/efi: Fix graceful fault handling after FPU softirq changes efi/libstub: Synchronize instruction cache after kernel relocation efi/loongarch: Implement efi_cache_sync_image() efi/libstub: Move efi_relocate_kernel() into its only remaining user efi: pstore: Drop efivar lock when efi_pstore_open() returns with an error efivarfs: use QSTR() in efivarfs_alloc_dentry