summaryrefslogtreecommitdiffstats
path: root/arch/arm64/tools
AgeCommit message (Collapse)AuthorLines
2026-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds-17/+137
Pull KVM updates from Paolo Bonzini: "Loongarch: - Add more CPUCFG mask bits - Improve feature detection - Add lazy load support for FPU and binary translation (LBT) register state - Fix return value for memory reads from and writes to in-kernel devices - Add support for detecting preemption from within a guest - Add KVM steal time test case to tools/selftests ARM: - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers - More pKVM fixes for features that are not supposed to be exposed to guests - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest - Preliminary work for guest GICv5 support - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures - A small set of FPSIMD cleanups - Selftest fixes addressing the incorrect alignment of page allocation - Other assorted low-impact fixes and spelling fixes RISC-V: - Fixes for issues discoverd by KVM API fuzzing in kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and kvm_riscv_vcpu_aia_imsic_update() - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM - Transparent huge page support for hypervisor page tables - Adjust the number of available guest irq files based on MMIO register sizes found in the device tree or the ACPI tables - Add RISC-V specific paging modes to KVM selftests - Detect paging mode at runtime for selftests s390: - Performance improvement for vSIE (aka nested virtualization) - Completely new memory management. s390 was a special snowflake that enlisted help from the architecture's page table management to build hypervisor page tables, in particular enabling sharing the last level of page tables. This however was a lot of code (~3K lines) in order to support KVM, and also blocked several features. The biggest advantages is that the page size of userspace is completely independent of the page size used by the guest: userspace can mix normal pages, THPs and hugetlbfs as it sees fit, and in fact transparent hugepages were not possible before. It's also now possible to have nested guests and guests with huge pages running on the same host - Maintainership change for s390 vfio-pci - Small quality of life improvement for protected guests x86: - Add support for giving the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allowing direct access to data MSRs and PMCs (restricted by the vPMU model). KVM still intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. This is more accurate, since it has no risk of contention and thus dropped events, and also has significantly less overhead. For more information, see the commit message for merge commit bf2c3138ae36 ("Merge tag 'kvm-x86-pmu-6.20' ...") - Disallow changing the virtual CPU model if L2 is active, for all the same reasons KVM disallows change the model after the first KVM_RUN - Fix a bug where KVM would incorrectly reject host accesses to PV MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those were advertised as supported to userspace, - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM would attempt to read CR3 configuring an async #PF entry - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Only a few exports that are intended for external usage, and those are allowed explicitly - When checking nested events after a vCPU is unblocked, ignore -EBUSY instead of WARNing. Userspace can sometimes put the vCPU into what should be an impossible state, and spurious exit to userspace on -EBUSY does not really do anything to solve the issue - Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller stuffing architecturally impossible states into KVM - Add support for new Intel instructions that don't require anything beyond enumerating feature flags to userspace - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2 - Add WARNs to guard against modifying KVM's CPU caps outside of the intended setup flow, as nested VMX in particular is sensitive to unexpected changes in KVM's golden configuration - Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts when the suppression feature is enabled by the guest (currently limited to split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an option as some userspaces have come to rely on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective of whether or not userspace I/O APIC supports Directed EOIs) - Clean up KVM's handling of marking mapped vCPU pages dirty - Drop a pile of *ancient* sanity checks hidden behind in KVM's unused ASSERT() macro, most of which could be trivially triggered by the guest and/or user, and all of which were useless - Fold "struct dest_map" into its sole user, "struct rtc_status", to make it more obvious what the weird parameter is used for, and to allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n - Bury all of ioapic.h, i8254.h and related ioctls (including KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y - Add a regression test for recent APICv update fixes - Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv() to consolidate the updates, and to co-locate SVI updates with the updates for KVM's own cache of ISR information - Drop a dead function declaration - Minor cleanups x86 (Intel): - Rework KVM's handling of VMCS updates while L2 is active to temporarily switch to vmcs01 instead of deferring the update until the next nested VM-Exit. The deferred updates approach directly contributed to several bugs, was proving to be a maintenance burden due to the difficulty in auditing the correctness of deferred updates, and was polluting "struct nested_vmx" with a growing pile of booleans - Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults, and instead always reflect them into the guest. Since KVM doesn't shadow EPCM entries, EPCM violations cannot be due to KVM interference and can't be resolved by KVM - Fix a bug where KVM would register its posted interrupt wakeup handler even if loading kvm-intel.ko ultimately failed - Disallow access to vmcb12 fields that aren't fully supported, mostly to avoid weirdness and complexity for FRED and other features, where KVM wants enable VMCS shadowing for fields that conditionally exist - Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or refuses to online a CPU) due to a VMCS config mismatch x86 (AMD): - Drop a user-triggerable WARN on nested_svm_load_cr3() failure - Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS relies on an upcoming, publicly announced change in the APM to reduce the set of conditions where hardware (i.e. KVM) *must* flush the RAP - Ignore nSVM intercepts for instructions that are not supported according to L1's virtual CPU model - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath for EPT Misconfig - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM, and allow userspace to restore nested state with GIF=0 - Treat exit_code as an unsigned 64-bit value through all of KVM - Add support for fetching SNP certificates from userspace - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD or VMSAVE on behalf of L2 - Misc fixes and cleanups x86 selftests: - Add a regression test for TPR<=>CR8 synchronization and IRQ masking - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support, and extend x86's infrastructure to support EPT and NPT (for L2 guests) - Extend several nested VMX tests to also cover nested SVM - Add a selftest for nested VMLOAD/VMSAVE - Rework the nested dirty log test, originally added as a regression test for PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage and to hopefully make the test easier to understand and maintain guest_memfd: - Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage handling. SEV/SNP was the only user of the tracking and it can do it via the RMP - Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to avoid non-trivial complexity for something that no known VMM seems to be doing and to avoid an API special case for in-place conversion, which simply can't support unaligned sources - When populating guest_memfd memory, GUP the source page in common code and pass the refcounted page to the vendor callback, instead of letting vendor code do the heavy lifting. Doing so avoids a looming deadlock bug with in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap invalidate lock Generic: - Fix a bug where KVM would ignore the vCPU's selected address space when creating a vCPU-specific mapping of guest memory. Actually this bug could not be hit even on x86, the only architecture with multiple address spaces, but it's a bug nevertheless" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits) KVM: s390: Increase permitted SE header size to 1 MiB MAINTAINERS: Replace backup for s390 vfio-pci KVM: s390: vsie: Fix race in acquire_gmap_shadow() KVM: s390: vsie: Fix race in walk_guest_tables() KVM: s390: Use guest address to mark guest page dirty irqchip/riscv-imsic: Adjust the number of available guest irq files RISC-V: KVM: Transparent huge page support RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test RISC-V: KVM: Allow Zalasr extensions for Guest/VM KVM: riscv: selftests: Add riscv vm satp modes KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr() RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr() RISC-V: KVM: Remove unnecessary 'ret' assignment KVM: s390: Add explicit padding to struct kvm_s390_keyop KVM: LoongArch: selftests: Add steal time test case LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side LoongArch: KVM: Add paravirt preempt feature in hypervisor side ...
2026-02-10Merge tag 'sched-core-2026-02-09' of ↵Linus Torvalds-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Scheduler Kconfig space updates: - Further consolidate configurable preemption modes (Peter Zijlstra) Reduce the number of architectures that are allowed to offer PREEMPT_NONE and PREEMPT_VOLUNTARY, reducing the number of preemption models from four to just two: 'full' and 'lazy' on up-to-date architectures (arm64, loongarch, powerpc, riscv, s390, x86). None and voluntary are only available as legacy features on platforms that don't implement lazy preemption yet, or which don't even support preemption. The goal is to eventually remove cond_resched() and voluntary preemption altogether. RSEQ based 'scheduler time slice extension' support (Thomas Gleixner and Peter Zijlstra): This allows a thread to request a time slice extension when it enters a critical section to avoid contention on a resource when the thread is scheduled out inside of the critical section. - Add fields and constants for time slice extension - Provide static branch for time slice extensions - Add statistics for time slice extensions - Add prctl() to enable time slice extensions - Implement sys_rseq_slice_yield() - Implement syscall entry work for time slice extensions - Implement time slice extension enforcement timer - Reset slice extension when scheduled - Implement rseq_grant_slice_extension() - entry: Hook up rseq time slice extension - selftests: Implement time slice extension test - Allow registering RSEQ with slice extension - Move slice_ext_nsec to debugfs - Lower default slice extension - selftests/rseq: Add rseq slice histogram script Scheduler performance/scalability improvements: - Update rq->avg_idle when a task is moved to an idle CPU, which improves the scalability of various workloads (Shubhang Kaushik) - Reorder fields in 'struct rq' for better caching (Blake Jones) - Fair scheduler SMP NOHZ balancing code speedups (Shrikanth Hegde): - Move checking for nohz cpus after time check - Change likelyhood of nohz.nr_cpus - Remove nohz.nr_cpus and use weight of cpumask instead - Avoid false sharing for sched_clock_irqtime (Wangyang Guo) - Cleanups (Yury Norov): - Drop useless cpumask_empty() in find_energy_efficient_cpu() - Simplify task_numa_find_cpu() - Use cpumask_weight_and() in sched_balance_find_dst_group() DL scheduler updates: - Add a deadline server for sched_ext tasks (by Andrea Righi and Joel Fernandes, with fixes by Peter Zijlstra) RT scheduler updates: - Skip currently executing CPU in rto_next_cpu() (Chen Jinghuang) Entry code updates and performance improvements (Jinjie Ruan) This is part of the scheduler tree in this cycle due to inter- dependencies with the RSEQ based time slice extension work: - Remove unused syscall argument from syscall_trace_enter() - Rework syscall_exit_to_user_mode_work() for architecture reuse - Add arch_ptrace_report_syscall_entry/exit() - Inline syscall_exit_work() and syscall_trace_enter() Scheduler core updates (Peter Zijlstra): - Rework sched_class::wakeup_preempt() and rq_modified_*() - Avoid rq->lock bouncing in sched_balance_newidle() - Rename rcu_dereference_check_sched_domain() => rcu_dereference_sched_domain() - <linux/compiler_types.h>: Add the __signed_scalar_typeof() helper Fair scheduler updates/refactoring (Peter Zijlstra and Ingo Molnar): - Fold the sched_avg update - Change rcu_dereference_check_sched_domain() to rcu-sched - Switch to rcu_dereference_all() - Remove superfluous rcu_read_lock() - Limit hrtick work - Join two #ifdef CONFIG_FAIR_GROUP_SCHED blocks - Clean up comments in 'struct cfs_rq' - Separate se->vlag from se->vprot - Rename cfs_rq::avg_load to cfs_rq::sum_weight - Rename cfs_rq::avg_vruntime to ::sum_w_vruntime & helper functions - Introduce and use the vruntime_cmp() and vruntime_op() wrappers for wrapped-signed aritmetics - Sort out 'blocked_load*' namespace noise Scheduler debugging code updates: - Export hidden tracepoints to modules (Gabriele Monaco) - Convert copy_from_user() + kstrtouint() to kstrtouint_from_user() (Fushuai Wang) - Add assertions to QUEUE_CLASS (Peter Zijlstra) - hrtimer: Fix tracing oddity (Thomas Gleixner) Misc fixes and cleanups: - Re-evaluate scheduling when migrating queued tasks out of throttled cgroups (Zicheng Qu) - Remove task_struct->faults_disabled_mapping (Christoph Hellwig) - Fix math notation errors in avg_vruntime comment (Zhan Xusheng) - sched/cpufreq: Use %pe format for PTR_ERR() printing (zenghongling)" * tag 'sched-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups sched/cpufreq: Use %pe format for PTR_ERR() printing sched/rt: Skip currently executing CPU in rto_next_cpu() sched/clock: Avoid false sharing for sched_clock_irqtime selftests/sched_ext: Add test for DL server total_bw consistency selftests/sched_ext: Add test for sched_ext dl_server sched/debug: Fix dl_server (re)start conditions sched/debug: Add support to change sched_ext server params sched_ext: Add a DL server for sched_ext tasks sched/debug: Stop and start server based on if it was active sched/debug: Fix updating of ppos on server write ops sched/deadline: Clear the defer params entry: Inline syscall_exit_work() and syscall_trace_enter() entry: Add arch_ptrace_report_syscall_entry/exit() entry: Rework syscall_exit_to_user_mode_work() for architecture reuse entry: Remove unused syscall argument from syscall_trace_enter() sched: remove task_struct->faults_disabled_mapping sched: Update rq->avg_idle when a task is moved to an idle CPU selftests/rseq: Add rseq slice histogram script hrtimer: Fix trace oddity ...
2026-02-05Merge branch kvm-arm64/resx into kvmarm-master/nextMarc Zyngier-10/+72
* kvm-arm64/resx: : . : Add infrastructure to deal with the full gamut of RESx bits : for NV. As a result, it is now possible to have the expected : semantics for some bits such as SCTLR_EL2.SPAN. : . KVM: arm64: Add debugfs file dumping computed RESx values KVM: arm64: Add sanitisation to SCTLR_EL2 KVM: arm64: Remove all traces of HCR_EL2.MIOCNCE KVM: arm64: Remove all traces of FEAT_TME KVM: arm64: Simplify handling of full register invalid constraint KVM: arm64: Get rid of FIXED_VALUE altogether KVM: arm64: Simplify handling of HCR_EL2.E2H RESx KVM: arm64: Move RESx into individual register descriptors KVM: arm64: Add RES1_WHEN_E2Hx constraints as configuration flags KVM: arm64: Add REQUIRES_E2H1 constraint as configuration flags KVM: arm64: Simplify FIXED_VALUE handling KVM: arm64: Convert HCR_EL2.RW to AS_RES1 KVM: arm64: Correctly handle SCTLR_EL1 RES1 bits for unsupported features KVM: arm64: Allow RES1 bits to be inferred from configuration KVM: arm64: Inherit RESx bits from FGT register descriptors KVM: arm64: Extend unified RESx handling to runtime sanitisation KVM: arm64: Introduce data structure tracking both RES0 and RES1 bits KVM: arm64: Introduce standalone FGU computing primitive KVM: arm64: Remove duplicate configuration for SCTLR_EL1.{EE,E0E} arm64: Convert SCTLR_EL2 to sysreg infrastructure Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05Merge branch kvm-arm64/gicv5-prologue into kvmarm-master/nextMarc Zyngier-1/+1
* kvm-arm64/gicv5-prologue: : . : Prologue to GICv5 support, courtesy of Sascha Bischoff. : : This is preliminary work that sets the scene for the full-blow : support. : . irqchip/gic-v5: Check if impl is virt capable KVM: arm64: gic: Set vgic_model before initing private IRQs arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1 KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05KVM: arm64: Remove all traces of HCR_EL2.MIOCNCEMarc Zyngier-2/+1
MIOCNCE had the potential to eat your data, and also was never implemented by anyone. It's been retrospectively removed from the architecture, and we're happy to follow that lead. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-19-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05KVM: arm64: Remove all traces of FEAT_TMEMarc Zyngier-9/+3
FEAT_TME has been dropped from the architecture. Retrospectively. I'm sure someone is crying somewhere, but most of us won't. Clean-up time. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-18-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05arm64: Convert SCTLR_EL2 to sysreg infrastructureMarc Zyngier-0/+69
Convert SCTLR_EL2 to the sysreg infrastructure, as per the 2025-12_rel revision of the Registers.json file. Note that we slightly deviate from the above, as we stick to the ARM ARM M.a definition of SCTLR_EL2[9], which is RES0, in order to avoid dragging the POE2 definitions... Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-30arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1Sascha Bischoff-1/+1
The GICv5 architecture is dropping the ICC_HAPR_EL1 and ICV_HAPR_EL1 system registers. These registers were never added to the sysregs, but the traps for them were. Drop the trap bit from the ICH_HFGRTR_EL2 and make it Res1 as per the upcoming GICv5 spec change. Additionally, update the EL2 setup code to not attempt to set that bit. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260128175919.3828384-4-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-29Merge branch 'for-next/errata' into for-next/coreWill Deacon-0/+1
* for-next/errata: arm64: errata: Workaround for SI L1 downstream coherency issue
2026-01-23arm64: errata: Workaround for SI L1 downstream coherency issueLucas Wei-0/+1
When software issues a Cache Maintenance Operation (CMO) targeting a dirty cache line, the CPU and DSU cluster may optimize the operation by combining the CopyBack Write and CMO into a single combined CopyBack Write plus CMO transaction presented to the interconnect (MCN). For these combined transactions, the MCN splits the operation into two separate transactions, one Write and one CMO, and then propagates the write and optionally the CMO to the downstream memory system or external Point of Serialization (PoS). However, the MCN may return an early CompCMO response to the DSU cluster before the corresponding Write and CMO transactions have completed at the external PoS or downstream memory. As a result, stale data may be observed by external observers that are directly connected to the external PoS or downstream memory. This erratum affects any system topology in which the following conditions apply: - The Point of Serialization (PoS) is located downstream of the interconnect. - A downstream observer accesses memory directly, bypassing the interconnect. Conditions: This erratum occurs only when all of the following conditions are met: 1. Software executes a data cache maintenance operation, specifically, a clean or clean&invalidate by virtual address (DC CVAC or DC CIVAC), that hits on unique dirty data in the CPU or DSU cache. This results in a combined CopyBack and CMO being issued to the interconnect. 2. The interconnect splits the combined transaction into separate Write and CMO transactions and returns an early completion response to the CPU or DSU before the write has completed at the downstream memory or PoS. 3. A downstream observer accesses the affected memory address after the early completion response is issued but before the actual memory write has completed. This allows the observer to read stale data that has not yet been updated at the PoS or downstream memory. The implementation of workaround put a second loop of CMOs at the same virtual address whose operation meet erratum conditions to wait until cache data be cleaned to PoC. This way of implementation mitigates performance penalty compared to purely duplicate original CMO. Signed-off-by: Lucas Wei <lucaswei@google.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-23Merge branch kvm-arm64/feat_idst into kvmarm-master/nextMarc Zyngier-3/+4
* kvm-arm64/feat_idst: : . : Add support for FEAT_IDST, allowing ID registers that are not implemented : to be reported as a normal trap rather than as an UNDEF exception. : . KVM: arm64: selftests: Add a test for FEAT_IDST KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome KVM: arm64: pkvm: Add a generic synchronous exception injection primitive KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers KVM: arm64: Add a generic synchronous exception injection primitive KVM: arm64: Add trap routing for GMID_EL1 arm64: Repaint ID_AA64MMFR2_EL1.IDS description Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23Merge branch kvm-arm64/vtcr into kvmarm-master/nextMarc Zyngier-3/+60
* kvm-arm64/vtcr: : . : VTCR_EL2 conversion to the configuration-driven RESx framework, : fixing a couple of UXN/PXN/XN bugs in the process. : . KVM: arm64: nv: Return correct RES0 bits for FGT registers KVM: arm64: Always populate FGT masks at boot time KVM: arm64: Honor UX/PX attributes for EL2 S1 mappings KVM: arm64: Convert VTCR_EL2 to config-driven sanitisation KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co arm64: Convert VTCR_EL2 to sysreg infratructure arm64: Convert ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 to UnsignedEnum KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers KVM: arm64: Don't blindly set set PSTATE.PAN on guest exit KVM: arm64: nv: Respect stage-2 write permssion when setting stage-1 AF KVM: arm64: Remove unused vcpu_{clear,set}_wfx_traps() KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate() KVM: arm64: Remove extra argument for __pvkm_host_{share,unshare}_hyp() KVM: arm64: Inject UNDEF for a register trap without accessor KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load KVM: arm64: Fix EL2 S1 XN handling for hVHE setups KVM: arm64: gic: Check for vGICv3 when clearing TWI Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-22arm64: Add support for FEAT_{LS64, LS64_V}Yicong Yang-0/+2
Armv8.7 introduces single-copy atomic 64-byte loads and stores instructions and its variants named under FEAT_{LS64, LS64_V}. These features are identified by ID_AA64ISAR1_EL1.LS64 and the use of such instructions in userspace (EL0) can be trapped. As st64bv (FEAT_LS64_V) and st64bv0 (FEAT_LS64_ACCDATA) can not be tell apart, FEAT_LS64 and FEAT_LS64_ACCDATA which will be supported in later patch will be exported to userspace, FEAT_LS64_V will be enabled only in kernel. In order to support the use of corresponding instructions in userspace: - Make ID_AA64ISAR1_EL1.LS64 visbile to userspace - Add identifying and enabling in the cpufeature list - Expose these support of these features to userspace through HWCAP3 and cpuinfo ld64b/st64b (FEAT_LS64) and st64bv (FEAT_LS64_V) is intended for special memory (device memory) so requires support by the CPU, system and target memory location (device that support these instructions). The HWCAP3_LS64, implies the support of CPU and system (since no identification method from system, so SoC vendors should advertise support in the CPU if system also support them). Otherwise for ld64b/st64b the atomicity may not be guaranteed or a DABT will be generated, so users (probably userspace driver developer) should make sure the target memory (device) also have the support. For st64bv 0xffffffffffffffff will be returned as status result for unsupported memory so user should check it. Document the restrictions along with HWCAP3_LS64. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22rseq: Implement sys_rseq_slice_yield()Thomas Gleixner-0/+1
Provide a new syscall which has the only purpose to yield the CPU after the kernel granted a time slice extension. sched_yield() is not suitable for that because it unconditionally schedules, but the end of the time slice extension is not required to schedule when the task was already preempted. This also allows to have a strict check for termination to catch user space invoking random syscalls including sched_yield() from a time slice extension region. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20251215155708.929634896@linutronix.de
2026-01-15arm64: Repaint ID_AA64MMFR2_EL1.IDS descriptionMarc Zyngier-3/+4
ID_AA64MMFR2_EL1.IDS, as described in the sysreg file, is pretty horrible as it diesctly give the ESR value. Repaint it using the usual NI/IMP identifiers to describe the absence/presence of FEAT_IDST. Also add the new EL3 routing feature, even if we really don't care about it. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://patch.msgid.link/20260108173233.2911955-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15arm64: Convert VTCR_EL2 to sysreg infratructureMarc Zyngier-0/+57
Our definition of VTCR_EL2 is both partial (tons of fields are missing) and totally inconsistent (some constants are shifted, some are not). They are also expressed in terms of TCR, which is rather inconvenient. Replace the ad-hoc definitions with the the generated version. This results in a bunch of additional changes to make the code with the unshifted nature of generated enumerations. The register data was extracted from the BSD licenced AARCHMRS (AARCHMRS_OPENSOURCE_A_profile_FAT-2025-09_ASL0). Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15arm64: Convert ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 to UnsignedEnumMarc Zyngier-3/+3
ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 are currently represented as unordered enumerations. However, the architecture treats them as Unsigned, as hinted to by the MRS data: (FEAT_S2TGran4K <=> (((UInt(ID_AA64MMFR0_EL1.TGran4_2) == 0) && FEAT_TGran4K) || (UInt(ID_AA64MMFR0_EL1.TGran4_2) >= 2)))) and similar descriptions exist for 16 and 64k. This is also confirmed by D24.1.3.3 ("Alternative ID scheme used for ID_AA64MMFR0_EL1 stage 2 granule sizes") in the L.b revision of the ARM ARM. Turn these fields into UnsignedEnum so that we can use the above description more or less literally. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-12-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds-0/+2
Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is increased number of exits (and worse performance) on z10 and earlier processor; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper of a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...
2025-12-02Merge tag 'arm64-upstream' of ↵Linus Torvalds-47/+120
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "These are the arm64 updates for 6.19. The biggest part is the Arm MPAM driver under drivers/resctrl/. There's a patch touching mm/ to handle spurious faults for huge pmd (similar to the pte version). The corresponding arm64 part allows us to avoid the TLB maintenance if a (huge) page is reused after a write fault. There's EFI refactoring to allow runtime services with preemption enabled and the rest is the usual perf/PMU updates and several cleanups/typos. Summary: Core features: - Basic Arm MPAM (Memory system resource Partitioning And Monitoring) driver under drivers/resctrl/ which makes use of the fs/rectrl/ API Perf and PMU: - Avoid cycle counter on multi-threaded CPUs - Extend CSPMU device probing and add additional filtering support for NVIDIA implementations - Add support for the PMUs on the NoC S3 interconnect - Add additional compatible strings for new Cortex and C1 CPUs - Add support for data source filtering to the SPE driver - Add support for i.MX8QM and "DB" PMU in the imx PMU driver Memory managemennt: - Avoid broadcast TLBI if page reused in write fault - Elide TLB invalidation if the old PTE was not valid - Drop redundant cpu_set_*_tcr_t0sz() macros - Propagate pgtable_alloc() errors outside of __create_pgd_mapping() - Propagate return value from __change_memory_common() ACPI and EFI: - Call EFI runtime services without disabling preemption - Remove unused ACPI function Miscellaneous: - ptrace support to disable streaming on SME-only systems - Improve sysreg generation to include a 'Prefix' descriptor - Replace __ASSEMBLY__ with __ASSEMBLER__ - Align register dumps in the kselftest zt-test - Remove some no longer used macros/functions - Various spelling corrections" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic arm64/pageattr: Propagate return value from __change_memory_common arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros Documentation/arm64: Fix the typo of register names ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state ...
2025-12-01Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/nextOliver Upton-0/+1
* kvm-arm64/nv-xnx-haf: (22 commits) : Support for FEAT_XNX and FEAT_HAF in nested : : Add support for a couple of MMU-related features that weren't : implemented by KVM's software page table walk: : : - FEAT_XNX: Allows the hypervisor to describe execute permissions : separately for EL0 and EL1 : : - FEAT_HAF: Hardware update of the Access Flag, which in the context of : nested means software walkers must also set the Access Flag. : : The series also adds some basic support for testing KVM's emulation of : the AT instruction, including the implementation detail that AT sets the : Access Flag in KVM. KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2 ... Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trappingMarc Zyngier-0/+1
A long time ago, an unsuspecting architect forgot to add a trap bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but what's a bit of spec between friends? Thankfully, this was fixed in a later revision, and ARM "deprecates" the lack of trapping ability. Unfortuantely, a few (billion) CPUs went out with that defect, anything ARMv8.0 from ARM, give or take. And on these CPUs, you can't trap DIR on its own, full stop. As the next best thing, we can trap everything in the common group, which is a tad expensive, but hey ho, that's what you get. You can otherwise recycle the HW in the neaby bin. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24arm64: Detect FEAT_XNXOliver Upton-0/+1
Detect the feature in anticipation of using it in KVM. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-2-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-13arm64/sysreg: Add ICH_VMCR_EL2Sascha Bischoff-0/+21
Add the ICH_VMCR_EL2 register, which is required for the upcoming GICv5 KVM support. This register has two different field encodings, based on if it is used for GICv3 or GICv5-based VMs. The GICv5-specific field encodings are generated with a FEAT_GCIE prefix. This register is already described in the GICv3 KVM code directly. This will be ported across to use the generated encodings as part of an upcoming change. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13arm64/sysreg: Move generation of RES0/RES1/UNKN to functionSascha Bischoff-12/+14
The RESx and UNKN define generation happens in two places (EndSysreg and EndSysregFields), and was using nearly identical code. Split this out into a function, and call that instead, rather then keeping the dupliated code. There are no changes to the generated sysregs as part of this change. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13arm64/sysreg: Support feature-specific fields with 'Prefix' descriptorSascha Bischoff-38/+88
Some system register field encodings change based on, for example the in-use architecture features, or the context in which they are accessed. In order to support these different field encodings, introduce the Prefix descriptor (Prefix, EndPrefix) for describing such sysregs. The Prefix descriptor can be used in the following way: Sysreg EXAMPLE 0 1 2 3 4 Prefix FEAT_A Field 63:0 Foo EndPrefix Prefix FEAT_B Field 63:1 Bar Res0 0 EndPrefix Field 63:0 Baz EndSysreg This will generate a single set of system register encodings (REG_, SYS_, ...), and then generate three sets of field definitions for the system register called EXAMPLE. The first set is prefixed by FEAT_A, e.g. FEAT_A_EXAMPLE_Foo. The second set is prefixed by FEAT_B, e.g., FEAT_B_EXAMPLE_Bar. The third set is not given a prefix at all, e.g. EXAMPLE_BAZ. For each set, a corresponding set of defines for Res0, Res1, and Unkn is generated. The intent for the final prefix-less fields is to describe default or legacy field encodings. This ensure that prefixed encodings can be added to already-present sysregs without affecting existing legacy code. Prefixed fields must be defined before those without a prefix, and this is checked by the generator. This ensures consisnt ordering within the sysregs definitions. The Prefix descriptor can be used within Sysreg or SysregFields blocks. Field, Res0, Res1, Unkn, Rax, SignedEnum, Enum can all be used within a Prefix block. Fields and Mapping can not. Fields that vary with features must be described as part of a SysregFields block, instead. Mappings, which are just a code comment, make little sense in this context, and have hence not been included. There are no changes to the generated system register definitions as part of this change. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13arm64/sysreg: Fix checks for incomplete sysreg definitionsSascha Bischoff-3/+3
The checks for incomplete sysreg definitions were checking if the next_bit was greater than 0, which is incorrect and missed occasions where bit 0 hasn't been defined for a sysreg. The reason is that next_bit is -1 when all bits have been processed (LSB - 1). Change the checks to use >= 0, instead. Also, set next_bit in Mapping to -1 instead of 0 to match these new checks. There are no changes to the generated sysreg definitons as part of this change, and conveniently no definitions lack definitions for bit 0. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-03arch: hookup listns() system callChristian Brauner-0/+1
Add the listns() system call to all architectures. Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-20-2e6f823ebdc0@kernel.org Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds-0/+1
Pull kvm updates from Paolo Bonzini: "This excludes the bulk of the x86 changes, which I will send separately. They have two not complex but relatively unusual conflicts so I will wait for other dust to settle. guest_memfd: - Add support for host userspace mapping of guest_memfd-backed memory for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have no way to detect private vs. shared). This lays the groundwork for removal of guest memory from the kernel direct map, as well as for limited mmap() for guest_memfd-backed memory. For more information see: - commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD") - guest_memfd in Firecracker: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding - direct map removal: https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/ - mmap support: https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/ ARM: - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. LoongArch: - Detect page table walk feature on new hardware - Add sign extension with kernel MMIO/IOCSR emulation - Improve in-kernel IPI emulation - Improve in-kernel PCH-PIC emulation - Move kvm_iocsr tracepoint out of generic code RISC-V: - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V - Added SBI v3.0 PMU enhancements in KVM and perf driver s390: - Improve interrupt cpu for wakeup, in particular the heuristic to decide which vCPU to deliver a floating interrupt to. - Clear the PTE when discarding a swapped page because of CMMA; this bug was introduced in 6.16 when refactoring gmap code. x86 selftests: - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). - Minor cleanups and improvements x86 (guest side): - For the legacy PCI hole (memory between TOLUD and 4GiB) to UC when overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping could map devices as WB and prevent the device drivers from mapping their devices with UC/UC-. - Make kvm_async_pf_task_wake() a local static helper and remove its export. - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU bindings even when PV_UNHALT is unsupported. Generic: - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is now included in GFP_NOWAIT. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits) KVM: s390: Fix to clear PTE when discarding a swapped page KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs KVM: arm64: selftests: Cope with arch silliness in EL2 selftest KVM: arm64: selftests: Add basic test for running in VHE EL2 KVM: arm64: selftests: Enable EL2 by default KVM: arm64: selftests: Initialize HCR_EL2 KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 KVM: arm64: selftests: Select SMCCC conduit based on current EL KVM: arm64: selftests: Provide helper for getting default vCPU target KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts KVM: arm64: selftests: Create a VGICv3 for 'default' VMs KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation KVM: arm64: selftests: Add helper to check for VGICv3 support KVM: arm64: selftests: Initialize VGICv3 only once KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code KVM: selftests: Add ex_str() to print human friendly name of exception vectors selftests/kvm: remove stale TODO in xapic_state_test KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount ...
2025-09-29Merge tag 'arm64-upstream' of ↵Linus Torvalds-23/+80
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "There's good stuff across the board, including some nice mm improvements for CPUs with the 'noabort' BBML2 feature and a clever patch to allow ptdump to play nicely with block mappings in the vmalloc area. Confidential computing: - Add support for accepting secrets from firmware (e.g. ACPI CCEL) and mapping them with appropriate attributes. CPU features: - Advertise atomic floating-point instructions to userspace - Extend Spectre workarounds to cover additional Arm CPU variants - Extend list of CPUs that support break-before-make level 2 and guarantee not to generate TLB conflict aborts for changes of mapping granularity (BBML2_NOABORT) - Add GCS support to our uprobes implementation. Documentation: - Remove bogus SME documentation concerning register state when entering/exiting streaming mode. Entry code: - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY) - Micro-optimise syscall entry path with a compiler branch hint. Memory management: - Enable huge mappings in vmalloc space even when kernel page-table dumping is enabled - Tidy up the types used in our early MMU setup code - Rework rodata= for closer parity with the behaviour on x86 - For CPUs implementing BBML2_NOABORT, utilise block mappings in the linear map even when rodata= applies to virtual aliases - Don't re-allocate the virtual region between '_text' and '_stext', as doing so confused tools parsing /proc/vmcore. Miscellaneous: - Clean-up Kconfig menuconfig text for architecture features - Avoid redundant bitmap_empty() during determination of supported SME vector lengths - Re-enable warnings when building the 32-bit vDSO object - Avoid breaking our eggs at the wrong end. Perf and PMUs: - Support for v3 of the Hisilicon L3C PMU - Support for Hisilicon's MN and NoC PMUs - Support for Fujitsu's Uncore PMU - Support for SPE's extended event filtering feature - Preparatory work to enable data source filtering in SPE - Support for multiple lanes in the DWC PCIe PMU - Support for i.MX94 in the IMX DDR PMU driver - MAINTAINERS update (Thank you, Yicong) - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets). Selftests: - Add basic LSFE check to the existing hwcaps test - Support nolibc in GCS tests - Extend SVE ptrace test to pass unsupported regsets and invalid vector lengths - Minor cleanups (typos, cosmetic changes). System registers: - Fix ID_PFR1_EL1 definition - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1 - Sync TCR_EL1 definition with the latest Arm ARM (L.b) - Be stricter about the input fed into our AWK sysreg generator script - Typo fixes and removal of redundant definitions. ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls - Fix a node refcount imbalance in the PSCI device-tree code. CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4 - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64: cpufeature: Remove duplicate asm/mmu.h header arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN perf/dwc_pcie: Fix use of uninitialized variable arm/syscalls: mark syscall invocation as likely in invoke_syscall Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU Documentation: hisi-pmu: Fix of minor format error drivers/perf: hisi: Add support for L3C PMU v3 drivers/perf: hisi: Refactor the event configuration of L3C PMU drivers/perf: hisi: Extend the field of tt_core drivers/perf: hisi: Extract the event filter check of L3C PMU drivers/perf: hisi: Simplify the probe process of each L3C PMU version drivers/perf: hisi: Export hisi_uncore_pmu_isr() drivers/perf: hisi: Relax the event ID check in the framework perf: Fujitsu: Add the Uncore PMU driver arm64: map [_text, _stext) virtual address range non-executable+read-only arm64/sysreg: Update TCR_EL1 register arm64: Enable vmalloc-huge with ptdump arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list arm64: errata: Apply workarounds for Neoverse-V3AE arm64: cputype: Add Neoverse-V3AE definitions ...
2025-09-24Merge branch 'for-next/sysregs' into for-next/coreWill Deacon-21/+69
* for-next/sysregs: arm64/sysreg: Update TCR_EL1 register arm64: sysreg: Add validation checks to sysreg header generation script arm64: sysreg: Correct sign definitions for EIESB and DoubleLock arm64: sysreg: Fix and tidy up sysreg field definitions
2025-09-22arm64/sysreg: Update TCR_EL1 registerAnshuman Khandual-8/+44
Update TCR_EL1 register fields as per latest ARM ARM DDI 0487 L.B and while here drop an explicit sysreg definition SYS_TCR_EL1 from sysreg.h, which is now redundant. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-18arm64: sysreg: Add new PMSFCR_EL1 fields and PMSDSFR_EL1 registerJames Clark-2/+11
Add new fields and register that are introduced for the features FEAT_SPE_EFT (extended filtering) and FEAT_SPE_FDS (data source filtering). Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-17arm64: cpucaps: Add GICv5 Legacy vCPU interface (GCIE_LEGACY) capabilitySascha Bischoff-0/+1
Implement the GCIE_LEGACY capability as a system feature to be able to check for support from KVM. The type is explicitly ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE, which means that the capability is enabled early if all boot CPUs support it. Additionally, if this capability is enabled during boot, it prevents late onlining of CPUs that lack it, thereby avoiding potential mismatched configurations which would break KVM. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-11arm64: sysreg: Add validation checks to sysreg header generation scriptFuad Tabba-0/+20
The gen_sysreg.awk script processes the system register specification in the sysreg text file to generate C macro definitions. The current script will silently accept certain errors in the specification file, leading to incorrect header generation. For example, a Sysreg or SysregFields can be accidentally duplicated, causing its macros to be emitted twice. An Enum can contain duplicate values for different items, which is architecturally incorrect. Add checks to catch these errors at build time. The script now tracks all seen Sysreg and SysregFields definitions and checks for duplicates. It also tracks values within each Enum block to ensure entries are unique. Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-11arm64: sysreg: Correct sign definitions for EIESB and DoubleLockFuad Tabba-2/+2
The `ID_AA64MMFR4_EL1.EIESB` field, is an unsigned enumeration, but was incorrectly defined as a `SignedEnum` when introduced in commit cfc680bb04c5 ("arm64: sysreg: Add layout for ID_AA64MMFR4_EL1"). This is corrected to `UnsignedEnum`. Conversely, the `ID_AA64DFR0_EL1.DoubleLock` field, is a signed enumeration, but was incorrectly defined as an `UnsignedEnum`. This is corrected to `SignedEnum`, which wasn't correctly set when annotated as such in commit ad16d4cf0b4f ("arm64/sysreg: Initial unsigned annotations for ID registers"). Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-09-11arm64: sysreg: Fix and tidy up sysreg field definitionsFuad Tabba-11/+3
Fix the value of ID_PFR1_EL1.Security NSACR_RFR to be 0b0010, as per DDI0601/2025-06, which wasn't correctly set when introduced in commit 1224308075f1 ("arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation"). While at it, remove redundant definitions of CPACR_EL12 and RCWSMASK_EL1 and fix some typos. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-08-29Merge tag 'kvmarm-fixes-6.17-1' of ↵Paolo Bonzini-0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, take #2 - Correctly handle 'invariant' system registers for protected VMs - Improved handling of VNCR data aborts, including external aborts - Fixes for handling of FEAT_RAS for NV guests, providing a sane fault context during SEA injection and preventing the use of RASv1p1 fault injection hardware - Ensure that page table destruction when a VM is destroyed gives an opportunity to reschedule - Large fix to KVM's infrastructure for managing guest context loaded on the CPU, addressing issues where the output of AT emulation doesn't get reflected to the guest - Fix AT S12 emulation to actually perform stage-2 translation when necessary - Avoid attempting vLPI irqbypass when GICv4 has been explicitly disabled for a VM - Minor KVM + selftest fixes
2025-08-21arm64: Add capability denoting FEAT_RASv1p1Marc Zyngier-0/+1
Detecting FEAT_RASv1p1 is rather complicated, as there are two ways for the architecture to advertise the same thing (always a delight...). Add a capability that will advertise this in a synthetic way to the rest of the kernel. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20250817202158.395078-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds-4/+514
Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
2025-07-29Merge tag 'arm64-upstream' of ↵Linus Torvalds-0/+135
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "A quick summary: perf support for Branch Record Buffer Extensions (BRBE), typical PMU hardware updates, small additions to MTE for store-only tag checking and exposing non-address bits to signal handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on. There is also a TLBI optimisation on hardware that does not require break-before-make when changing the user PTEs between contiguous and non-contiguous. More details: Perf and PMU updates: - Add support for new (v3) Hisilicon SLLC and DDRC PMUs - Add support for Arm-NI PMU integrations that share interrupts between clock domains within a given instance - Allow SPE to be configured with a lower sample period than the minimum recommendation advertised by PMSIDR_EL1.Interval - Add suppport for Arm's "Branch Record Buffer Extension" (BRBE) - Adjust the perf watchdog period according to cpu frequency changes - Minor driver fixes and cleanups Hardware features: - Support for MTE store-only checking (FEAT_MTE_STORE_ONLY) - Support for reporting the non-address bits during a synchronous MTE tag check fault (FEAT_MTE_TAGGED_FAR) - Optimise the TLBI when folding/unfolding contiguous PTEs on hardware with FEAT_BBM (break-before-make) level 2 and no TLB conflict aborts Software features: - Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable() and using the text-poke API for late module relocations - Force VMAP_STACK always on and change arm64_efi_rt_init() to use arch_alloc_vmap_stack() in order to avoid KASAN false positives ACPI: - Improve SPCR handling and messaging on systems lacking an SPCR table Debug: - Simplify the debug exception entry path - Drop redundant DBG_MDSCR_* macros Kselftests: - Cleanups and improvements for SME, SVE and FPSIMD tests Miscellaneous: - Optimise loop to reduce redundant operations in contpte_ptep_get() - Remove ISB when resetting POR_EL0 during signal handling - Mark the kernel as tainted on SEA and SError panic - Remove redundant gcs_free() call" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64/gcs: task_gcs_el0_enable() should use passed task arm64: Kconfig: Keep selects somewhat alphabetically ordered arm64: signal: Remove ISB when resetting POR_EL0 kselftest/arm64: Handle attempts to disable SM on SME only systems kselftest/arm64: Fix SVE write data generation for SME only systems kselftest/arm64: Test SME on SME only systems in fp-ptrace kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace kselftest/arm64: Allow sve-ptrace to run on SME only systems arm64/mm: Drop redundant addr increment in set_huge_pte_at() kselftest/arm4: Provide local defines for AT_HWCAP3 arm64: Mark kernel as tainted on SAE and SError panic arm64/gcs: Don't call gcs_free() when releasing task_struct drivers/perf: hisi: Support PMUs with no interrupt drivers/perf: hisi: Relax the event number check of v2 PMUs drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver drivers/perf: hisi: Simplify the probe process for each DDRC version perf/arm-ni: Support sharing IRQs within an NI instance perf/arm-ni: Consolidate CPU affinity handling ...
2025-07-28Merge branch 'kvm-arm64/config-masks' into kvmarm/nextOliver Upton-2/+11
* kvm-arm64/config-masks: : More config-driven mask computation, courtesy of Marc Zyngier : : Converts more system registers to the config-driven computation of RESx : masks based on the advertised feature set KVM: arm64: Tighten the definition of FEAT_PMUv3p9 KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26Merge branch 'kvm-arm64/gcie-legacy' into kvmarm/nextOliver Upton-0/+6
* kvm-arm64/gcie-legacy: : Support for GICv3 emulation on GICv5, courtesy of Sascha Bischoff : : FEAT_GCIE_LEGACY adds the necessary hardware for GICv5 systems to : support the legacy GICv3 for VMs, including a backwards-compatible VGIC : implementation that we all know and love. : : As a starting point for GICv5 enablement in KVM, enable + use the : GICv3-compatible feature when running VMs on GICv5 hardware. KVM: arm64: gic-v5: Probe for GICv5 KVM: arm64: gic-v5: Support GICv3 compat arm64/sysreg: Add ICH_VCTLR_EL2 irqchip/gic-v5: Populate struct gic_kvm_info irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26Merge tag 'irqchip-gic-v5-host' into kvmarm/nextOliver Upton-2/+496
GICv5 initial host support Add host kernel support for the new arm64 GICv5 architecture, which is quite a departure from the previous ones. Include support for the full gamut of the architecture (interrupt routing and delivery to CPUs, wired interrupts, MSIs, and interrupt translation). * tag 'irqchip-gic-v5-host': (32 commits) arm64: smp: Fix pNMI setup after GICv5 rework arm64: Kconfig: Enable GICv5 docs: arm64: gic-v5: Document booting requirements for GICv5 irqchip/gic-v5: Add GICv5 IWB support irqchip/gic-v5: Add GICv5 ITS support irqchip/msi-lib: Add IRQ_DOMAIN_FLAG_FWNODE_PARENT handling irqchip/gic-v3: Rename GICv3 ITS MSI parent PCI/MSI: Add pci_msi_map_rid_ctlr_node() helper function of/irq: Add of_msi_xlate() helper function irqchip/gic-v5: Enable GICv5 SMP booting irqchip/gic-v5: Add GICv5 LPI/IPI support irqchip/gic-v5: Add GICv5 IRS/SPI support irqchip/gic-v5: Add GICv5 PPI support arm64: Add support for GICv5 GSB barriers arm64: smp: Support non-SGIs for IPIs arm64: cpucaps: Add GICv5 CPU interface (GCIE) capability arm64: cpucaps: Rename GICv3 CPU interface capability arm64: Disable GICv5 read/write/instruction traps arm64/sysreg: Add ICH_HFGITR_EL2 arm64/sysreg: Add ICH_HFGWTR_EL2 ... Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-24Merge branch 'for-next/feat_mte_store_only' into for-next/coreCatalin Marinas-0/+1
* for-next/feat_mte_store_only: : MTE feature to restrict tag checking to store only operations kselftest/arm64/mte: Add MTE_STORE_ONLY testcases kselftest/arm64/mte: Preparation for mte store only test kselftest/arm64/abi: Add MTE_STORE_ONLY feature hwcap test KVM: arm64: Expose MTE_STORE_ONLY feature to guest arm64/hwcaps: Add MTE_STORE_ONLY hwcaps arm64/kernel: Support store-only mte tag check prctl: Introduce PR_MTE_STORE_ONLY arm64/cpufeature: Add MTE_STORE_ONLY feature
2025-07-24Merge branches 'for-next/livepatch', 'for-next/user-contig-bbml2', ↵Catalin Marinas-0/+2
'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: (23 commits) drivers/perf: hisi: Support PMUs with no interrupt drivers/perf: hisi: Relax the event number check of v2 PMUs drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver drivers/perf: hisi: Simplify the probe process for each DDRC version perf/arm-ni: Support sharing IRQs within an NI instance perf/arm-ni: Consolidate CPU affinity handling perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation perf/cxlpmu: Remove unintended newline from IRQ name format string perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe() perf: arm_spe: Relax period restriction perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) KVM: arm64: nvhe: Disable branch generation in nVHE guests arm64: Handle BRBE booting requirements arm64/sysreg: Add BRBE registers and fields perf/arm: Add missing .suppress_bind_attrs perf/arm-cmn: Reduce stack usage during discovery perf: imx9_perf: make the read-only array mask static const perf/arm-cmn: Broaden module description for wider interconnect support ... * for-next/livepatch: : Support for HAVE_LIVEPATCH on arm64 arm64: Kconfig: Keep selects somewhat alphabetically ordered arm64: Implement HAVE_LIVEPATCH arm64: stacktrace: Implement arch_stack_walk_reliable() arm64: stacktrace: Check kretprobe_find_ret_addr() return value arm64/module: Use text-poke API for late relocations. * for-next/user-contig-bbml2: : Optimise the TLBI when folding/unfolding contigous PTEs on hardware with BBML2 and no TLB conflict aborts arm64/mm: Elide tlbi in contpte_convert() under BBML2 iommu/arm: Add BBM Level 2 smmu feature arm64: Add BBM Level 2 cpu feature arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type * for-next/misc: : Miscellaneous arm64 patches arm64/gcs: task_gcs_el0_enable() should use passed task arm64: signal: Remove ISB when resetting POR_EL0 arm64/mm: Drop redundant addr increment in set_huge_pte_at() arm64: Mark kernel as tainted on SAE and SError panic arm64/gcs: Don't call gcs_free() when releasing task_struct arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get arm64: pi: use 'targets' instead of extra-y in Makefile * for-next/acpi: : Various ACPI arm64 changes ACPI: Suppress misleading SPCR console message when SPCR table is absent ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled * for-next/debug-entry: : Simplify the debug exception entry path arm64: debug: remove debug exception registration infrastructure arm64: debug: split bkpt32 exception entry arm64: debug: split brk64 exception entry arm64: debug: split hardware watchpoint exception entry arm64: debug: split single stepping exception entry arm64: debug: refactor reinstall_suspended_bps() arm64: debug: split hardware breakpoint exception entry arm64: entry: Add entry and exit functions for debug exceptions arm64: debug: remove break/step handler registration infrastructure arm64: debug: call step handlers statically arm64: debug: call software breakpoint handlers statically arm64: refactor aarch32_break_handler() arm64: debug: clean up single_step_handler logic * for-next/feat_mte_tagged_far: : Support for reporting the non-address bits during a synchronous MTE tag check fault kselftest/arm64/mte: Add mtefar tests on check_mmap_options kselftest/arm64/mte: Refactor check_mmap_option test kselftest/arm64/mte: Add verification for address tag in signal handler kselftest/arm64/mte: Add address tag related macro and function kselftest/arm64/mte: Check MTE_FAR feature is supported kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS kselftest/arm64: Add MTE_FAR hwcap test KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature * for-next/kselftest: : Kselftest updates for arm64 kselftest/arm64: Handle attempts to disable SM on SME only systems kselftest/arm64: Fix SVE write data generation for SME only systems kselftest/arm64: Test SME on SME only systems in fp-ptrace kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace kselftest/arm64: Allow sve-ptrace to run on SME only systems kselftest/arm4: Provide local defines for AT_HWCAP3 kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace kselftest/arm64: Fix check for setting new VLs in sve-ptrace kselftest/arm64: Convert tpidr2 test to use kselftest.h * for-next/mdscr-cleanup: : Drop redundant DBG_MDSCR_* macros KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t arm64/debug: Drop redundant DBG_MDSCR_* macros * for-next/vmap-stack: : Force VMAP_STACK on arm64 arm64: remove CONFIG_VMAP_STACK checks from entry code arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN arm64: efi: Remove CONFIG_VMAP_STACK check arm64: Mandate VMAP_STACK arm64: efi: Fix KASAN false positive for EFI runtime stack arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth() arm64/gcs: Don't call gcs_free() during flush_gcs() arm64: Restrict pagetable teardown to avoid false warning docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
2025-07-15arm64: sysreg: Add THE/ASID2 controls to TCR2_ELxMarc Zyngier-2/+11
FEAT_THE and FEAT_ASID2 add new controls to the TCR2_ELx registers. Add them to the register descriptions. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250714115503.3334242-2-maz@kernel.org [ fix whitespace ] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08arm64/sysreg: Add ICH_VCTLR_EL2Sascha Bischoff-0/+6
This system register is required to enable/disable V3 legacy mode when running on a GICv5 host. Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://lore.kernel.org/r/20250627100847.1022515-4-sascha.bischoff@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08arm64: Detect FEAT_SCTLR2Oliver Upton-0/+1
KVM is about to pick up support for SCTLR2. Add cpucap for later use in the guest/host context switch hot path. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08arm64: cpucaps: Add GICv5 CPU interface (GCIE) capabilityLorenzo Pieralisi-0/+1
Implement the GCIE capability as a strict boot cpu capability to detect whether architectural GICv5 support is available in HW. Plug it in with a naming consistent with the existing GICv3 CPU interface capability. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-17-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08arm64: cpucaps: Rename GICv3 CPU interface capabilityLorenzo Pieralisi-1/+1
In preparation for adding a GICv5 CPU interface capability, rework the existing GICv3 CPUIF capability - change its name and description so that the subsequent GICv5 CPUIF capability can be added with a more consistent naming on top. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-16-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>