Explicitly zero kvm_host_pmu instead of attempting to get the perf PMU
capabilities when running on a hybrid CPU to avoid running afoul of perf's
sanity check.
------------[ cut here ]------------
WARNING: arch/x86/events/core.c:3089 at perf_get_x86_pmu_capability+0xd/0xc0,
Call Trace:
<TASK>
kvm_x86_vendor_init+0x1b0/0x1a40 [kvm]
vmx_init+0xdb/0x260 [kvm_intel]
vt_init+0x12/0x9d0 [kvm_intel]
do_one_initcall+0x60/0x3f0
do_init_module+0x97/0x2b0
load_module+0x2d08/0x2e30
init_module_from_file+0x96/0xe0
idempotent_init_module+0x117/0x330
__x64_sys_finit_module+0x73/0xe0
Always read the capabilities for non-hybrid CPUs, i.e. don't entirely
revert to reading capabilities if and only if KVM wants to use a PMU, as
it may be useful to have the host PMU capabilities available, e.g. if only
for debug.
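A minimal sketch of the resulting flow (the call site and hybrid-CPU check
shown here are assumptions, not the exact upstream diff):

  if (boot_cpu_has(X86_FEATURE_HYBRID_CPU)) {
          /* Perf's capability query WARNs on hybrid CPUs; report no PMU. */
          memset(&kvm_host_pmu, 0, sizeof(kvm_host_pmu));
  } else {
          perf_get_x86_pmu_capability(&kvm_host_pmu);
  }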
Reported-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Closes: https://lore.kernel.org/all/70b64347-2aca-4511-af78-a767d5fa8226@intel.com/
Fixes: 51f34b1e650f ("KVM: x86/pmu: Snapshot host (i.e. perf's) reported PMU capabilities")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20251010005239.146953-1-dapeng1.mi@linux.intel.com
[sean: rework changelog, call out hybrid CPUs in shortlog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Expand the prefault memory selftest to add a regression test for a KVM bug
where KVM's retry logic would result in (breakable) deadlock due to the
memslot deletion waiting on prefaulting to release SRCU, and prefaulting
waiting on the memslot to fully disappear (KVM uses a two-step process to
delete memslots, and KVM x86 retries page faults if a to-be-deleted, a.k.a.
INVALID, memslot is encountered).
To exercise concurrent memslot remove, spawn a second thread to initiate
memslot removal at roughly the same time as prefaulting. Test memslot
removal for all testcases, i.e. don't limit concurrent removal to only the
success case. There are essentially three prefault scenarios (so far)
that are of interest:
1. Success
2. ENOENT due to no memslot
3. EAGAIN due to INVALID memslot
For all intents and purposes, #1 and #2 are mutually exclusive, or rather,
easier to test via separate testcases since writing to non-existent memory
is trivial. But for #3, making it mutually exclusive with #1 _or_ #2 is
actually more complex than testing memslot removal for all scenarios. The
only requirement to let memslot removal coexist with other scenarios is a
way to guarantee a stable result, e.g. that the "no memslot" test observes
ENOENT, not EAGAIN, for the final checks.
So, rather than make memslot removal mutually exclusive with the ENOENT
scenario, simply restore the memslot and retry prefaulting. For the "no
memslot" case, KVM_PRE_FAULT_MEMORY should be idempotent, i.e. should
always fail with ENOENT regardless of how many times userspace attempts
prefaulting.
Pass in both the base GPA and the offset (instead of the "full" GPA) so
that the worker can recreate the memslot.
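A rough sketch of the worker (selftest helpers as in tools/testing/selftests/kvm;
the slot, GPA, and size constants are placeholders):

  static void *remove_memslot_worker(void *arg)
  {
          struct kvm_vm *vm = arg;

          /* Race memslot deletion against the in-flight KVM_PRE_FAULT_MEMORY. */
          vm_mem_region_delete(vm, TEST_SLOT);

          /*
           * Restore the memslot (recreated from the base GPA) so that the
           * final prefault pass observes a stable result, not EAGAIN.
           */
          vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, TEST_BASE_GPA,
                                      TEST_SLOT, TEST_NPAGES, 0);
          return NULL;
  }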
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250924174255.2141847-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rework almost all of KVM x86's exports to expose symbols only to KVM's
vendor modules, i.e. to kvm-{amd,intel}.ko. Keep the generic exports that
are guarded by CONFIG_KVM_EXTERNAL_WRITE_TRACKING=y, as they're explicitly
designed/intended for external usage.
Link: https://lore.kernel.org/r/20250919003303.1355064-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop the exporting of several kvm_arch_xxx() hooks that are only called
from arch-neutral code, i.e. that are only called from kvm.ko.
Link: https://lore.kernel.org/r/20250919003303.1355064-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move kvm_intr_is_single_vcpu() to lapic.c, drop its export, and make its
"fast" helper local to lapic.c. kvm_intr_is_single_vcpu() is only usable
if the local APIC is in-kernel, i.e. it most definitely belongs in the
local APIC code.
No functional change intended.
Fixes: cf04ec393ed0 ("KVM: x86: Dedup AVIC vs. PI code for identifying target vCPU")
Link: https://lore.kernel.org/r/20250919003303.1355064-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Rework the vast majority of KVM's exports to expose symbols only to KVM
submodules, i.e. to x86's kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko.
With few exceptions, KVM's exported APIs are intended (and safe) for KVM-
internal usage only.
Keep kvm_get_kvm(), kvm_get_kvm_safe(), and kvm_put_kvm() as normal
exports, as they are needed by VFIO, and are generally safe for external
usage (though ideally even the get/put APIs would be KVM-internal, and
VFIO would pin a VM by grabbing a reference to its associated file).
Implement a framework in kvm_types.h in anticipation of providing a macro
to restrict KVM-specific kernel exports, i.e. to provide symbol exports
for KVM if and only if KVM is built as one or more modules.
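A sketch of the kind of macro such a framework could provide (the macro name,
guard, and module list are assumptions layered on top of the generic
EXPORT_SYMBOL_GPL_FOR_MODULES() helper):

  #if IS_MODULE(CONFIG_KVM)
  #define EXPORT_SYMBOL_FOR_KVM_INTERNAL(sym) \
          EXPORT_SYMBOL_GPL_FOR_MODULES(sym, "kvm,kvm-amd,kvm-intel")
  #else
  #define EXPORT_SYMBOL_FOR_KVM_INTERNAL(sym)
  #endif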
Link: https://lore.kernel.org/r/20250919003303.1355064-3-seanjc@google.com
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use kvm_is_gpa_in_memslot() to check the validity of the notification
indicator byte address instead of open coding equivalent logic in the VFIO
AP driver.
Opportunistically use a dedicated wrapper that exists and is exported
expressly for the VFIO AP module. kvm_is_gpa_in_memslot() is generally
unsuitable for use outside of KVM; other drivers typically shouldn't rely
on KVM's memslots, and using the API requires kvm->srcu (or slots_lock) to
be held for the entire duration of the usage, e.g. to avoid TOCTOU bugs.
handle_pqap() is a bit of a special case, as it's explicitly invoked from
KVM with kvm->srcu already held, and the VFIO AP driver is in many ways an
extension of KVM that happens to live in a separate module.
Providing a dedicated API for the VFIO AP driver will allow restricting
the vast majority of generic KVM's exports to KVM submodules (e.g. to x86's
kvm-{amd,intel}.ko vendor modules).
No functional change intended.
Acked-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20250919003303.1355064-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM runs fail when guests with the 'cmm' cpu feature and the host are
under memory pressure and using swap heavily. This is because
npages becomes -ENOMEM (out of memory) in hva_to_pfn_slow(),
which in turn propagates as EFAULT to qemu. Clearing the page
table entry when discarding an address that maps to a swap
entry resolves the issue.
Fixes: 200197908dc4 ("KVM: s390: Refactor and split some gmap helpers")
Cc: stable@vger.kernel.org
Suggested-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Gautam Gala <ggala@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
We have a couple of writable bitfields in ID_AA64ISAR3_EL1, but the
set_id_regs selftest does not cover this register at all; add coverage.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Currently we list the main set of registers with bits we test three
times, once in the test_regs array which is used at runtime, once in the
guest code and once in a list of ARRAY_SIZE() operations we use to tell
kselftest how many tests we plan to execute. This is needlessly fiddly
when adding new registers, as the test_cnt calculation is formatted with
two registers per line. Instead, count the number of bitfields in the
register arrays at runtime.
The existing code subtracts ARRAY_SIZE(test_regs) from the number of
tests to account for the terminating FTR_REG_END entries in the
per-register arrays; the new code accounts for this when enumerating.
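A sketch of the runtime counting (the structure layout and terminator check
are assumptions modeled on the test's existing per-register arrays):

  static int count_test_cnt(void)
  {
          int i, j, cnt = 0;

          for (i = 0; i < ARRAY_SIZE(test_regs); i++) {
                  /* Count entries up to the FTR_REG_END terminator. */
                  for (j = 0; !reg_ftr_is_end(&test_regs[i].ftr_bits[j]); j++)
                          cnt++;
          }

          return cnt;
  }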
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Implementations without FEAT_FGT aren't required to trap the entire ID
register space when HCR_EL2.TID3 is set. This is a terrible idea, as the
hypervisor may need to advertise the absence of a feature to the VM
using a negative value in a signed field, FEAT_E2H0 being a great
example of this.
Cope with uncooperative implementations in the EL2 selftest by accepting
a zero value when FEAT_FGT is absent and otherwise only tolerating the
expected nonzero value.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add an embarrassingly simple selftest for sanity checking KVM's VHE EL2
and test that the ID register bits are consistent with HCR_EL2.E2H being
RES1.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Take advantage of VHE to implicitly promote KVM selftests to run at EL2
with only slight modification. Update the smccc_filter test to account
for this now that the EL2-ness of a VM is visible to tests.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Initialize HCR_EL2 such that EL2&0 is considered 'InHost', allowing the
use of (mostly) unmodified EL1 selftests at EL2.
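A minimal sketch of the idea, using kernel-style HCR bit names (the selftest's
own init path and helpers may differ):

  /* E2H=1 plus TGE=1 makes the EL2&0 regime "InHost"; RW selects AArch64. */
  write_sysreg(HCR_RW | HCR_E2H | HCR_TGE, hcr_el2);
  isb();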
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Configuring the number of implemented counters via PMCR_EL0.N was a bad
idea in retrospect as it interacts poorly with nested. Migrate the
selftest to use the vCPU attribute instead of the KVM_SET_ONE_REG
mechanism.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Arch timer registers are redirected to their hypervisor counterparts
when running in VHE EL2. This is great, except for the fact that the
hypervisor timers use different PPIs. Use the correct INTIDs when that
is the case.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
HVCs are taken within the VM when EL2 is in use. Ensure tests use the
SMC instruction when running at EL2 to interact with the host.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The default vCPU target in KVM selftests is pretty boring in that it
doesn't enable any vCPU features. Expose a helper for getting the
default target to prepare for cramming in more features. Call
KVM_ARM_PREFERRED_TARGET directly from get-reg-list as it needs
fine-grained control over feature flags.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
FEAT_VHE has the somewhat nice property of implicitly redirecting EL1
register aliases to their corresponding EL2 representations when E2H=1.
Unfortunately, there's no such abstraction for userspace and EL2
registers are always accessed by their canonical encoding.
Introduce a helper that applies EL2 redirections to sysregs and use
aggressive inlining to catch misuse at compile time. Go a little past
the architectural definition for ease of use for test authors (e.g. the
stack pointer).
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Start creating a VGICv3 by default unless explicitly opted-out by the
test. While having an interrupt controller is nice, the real benefit
here is clearing a hurdle for EL2 VMs which mandate the presence of a
VGIC.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
vgic_v3_setup() has a good bit of sanity checking internally to ensure
that vCPUs have actually been created and match the dimensioning of the
vgic itself. Spin off an unsanitised setup and initialization helper so
vgic initialization can be wired in around a 'default' VM's vCPU
creation.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Introduce a proper predicate for probing VGICv3 by performing a 'test'
creation of the device on a dummy VM.
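A sketch of such a predicate built on existing selftest helpers (the function
name is an assumption):

  bool vgic_v3_supported(void)
  {
          struct kvm_vm *vm = vm_create_barebones();
          int ret;

          /* A "test" creation probes device support without instantiating it. */
          ret = __kvm_test_create_device(vm, KVM_DEV_TYPE_ARM_VGIC_V3);
          kvm_vm_free(vm);

          return !ret;
  }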
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
vgic_v3_setup() unnecessarily initializes the vgic twice. Keep the
initialization after configuring MMIO frames and get rid of the other.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In order to compel the default usage of EL2 in selftests, move
kvm_arch_vm_post_create() to library code and expose an opt-in for using
MTE by default.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Make CR4.CET a guest-owned bit under VMX by extending
KVM_POSSIBLE_CR4_GUEST_BITS accordingly.
There's no need to intercept changes to CR4.CET, as it's neither
included in KVM's MMU role bits, nor does KVM specifically care about
the actual value of a (nested) guest's CR4.CET, aside from
enforcing architectural constraints, i.e. ensuring that CR0.WP=1 if
CR4.CET=1.
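Conceptually, the change amounts to (KVM_EXISTING_CR4_GUEST_BITS is a
placeholder for the current bit list, which is elided here):

  /* CR4.CET joins the guest-owned CR4 bits; VMX no longer intercepts it. */
  #define KVM_POSSIBLE_CR4_GUEST_BITS \
          (KVM_EXISTING_CR4_GUEST_BITS | X86_CR4_CET)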
Intercepting writes to CR4.CET is particularly bad for grsecurity
kernels with KERNEXEC or, even worse, KERNSEAL enabled. These features
heavily make use of read-only kernel objects and use a cpu-local CR0.WP
toggle to override it, when needed. Under a CET-enabled kernel, this
also requires toggling CR4.CET, hence the motivation to make it
guest-owned.
Using the old test from [1] gives the following runtime numbers (perf
stat -r 5 ssdd 10 50000):
* grsec guest on linux-6.16-rc5 + cet patches:
2.4647 +- 0.0706 seconds time elapsed ( +- 2.86% )
* grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
1.5648 +- 0.0240 seconds time elapsed ( +- 1.53% )
Not only does not intercepting CR4.CET make the test run ~35% faster,
it's also more stable with less fluctuation due to fewer VMEXITs.
Therefore, make CR4.CET a guest-owned bit where possible.
This change is VMX-specific, as SVM has no such fine-grained control
register intercept control.
If KVM's assumptions regarding MMU role handling wrt. a guest's CR4.CET
value ever change, the BUILD_BUG_ON()s related to KVM_MMU_CR4_ROLE_BITS
and KVM_POSSIBLE_CR4_GUEST_BITS will catch that early.
Link: https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/ [1]
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-52-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a check in the MSRs test to verify that KVM's reported support for
MSRs with feature bits is consistent between KVM's MSR save/restore lists
and KVM's supported CPUID.
To deal with Intel's wonderful decision to bundle IBT and SHSTK under CET,
track the "second" feature to avoid false failures when running on a CPU
with only one of IBT or SHSTK.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-51-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add test coverage for the KVM-defined GUEST_SSP "register" in the MSRs
test. While _KVM's_ goal is to not tie the uAPI of KVM-defined registers
to any particular internal implementation, i.e. to not commit in uAPI to
handling GUEST_SSP as an MSR, treating GUEST_SSP as an MSR for testing
purposes is a-ok and is a natural fit given the semantics of SSP.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-50-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When KVM_{G,S}ET_ONE_REG are supported, verify that MSRs can be accessed
via ONE_REG and through the dedicated MSR ioctls. For simplicity, run
the test twice, e.g. instead of trying to get MSR values into the exact
right state when switching write methods.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-49-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a third vCPU to the MSRs test that runs with all features disabled in
the vCPU's CPUID model, to verify that KVM does the right thing with
respect to emulating accesses to MSRs that shouldn't exist. Use the same
VM to verify that KVM is honoring the vCPU model, e.g. isn't looking at
per-VM state when emulating MSR accesses.
Link: https://lore.kernel.org/r/20250919223258.1604852-48-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Extend the MSRs test to support {S,U}_CET, which are a bit of a pain to
handle due to the MSRs existing if IBT *or* SHSTK is supported. To deal
with Intel's wonderful decision to bundle IBT and SHSTK under CET, track
the second feature, but skip only RDMSR #GP tests to avoid false failures
when running on a CPU with only one of IBT or SHSTK (the WRMSR #GP tests
are still valid since the enable bits are per-feature).
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-47-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a selftest to verify reads and writes to various MSRs, from both the
guest and host, and expect success/failure based on whether or not the
vCPU supports the MSR according to supported CPUID.
Note, this test is extremely similar to KVM-Unit-Test's "msr" test, but
provides more coverage with respect to host accesses, and will be extended
to provide additional testing of CPUID-based features, save/restore lists,
and KVM_{G,S}ET_ONE_REG, all of which are extremely difficult to validate in
KUT.
If kvm.ignore_msrs=true, skip the unsupported and reserved testcases as
KVM's ABI is a mess; what exactly is supposed to be ignored, and when,
varies wildly.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-46-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add {HV,VC,SX}_VECTOR definitions for AMD's Hypervisor Injection Exception,
VMM Communication Exception, and SVM Security Exception vectors, along with
human friendly formatting for trace_kvm_inj_exception().
Note, KVM is all but guaranteed to never observe or inject #SX, and #HV is
also likely to go unused. Add the architectural collateral mostly for
completeness, and on the off chance that hardware goes off the rails.
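The new collateral boils down to definitions along these lines (vector numbers
per the AMD architecture; the human-friendly strings for
trace_kvm_inj_exception() are added alongside):

  #define HV_VECTOR 28    /* #HV - Hypervisor Injection Exception */
  #define VC_VECTOR 29    /* #VC - VMM Communication Exception    */
  #define SX_VECTOR 30    /* #SX - SVM Security Exception         */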
Link: https://lore.kernel.org/r/20250919223258.1604852-44-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a CP_VECTOR definition for CET's Control Protection Exception (#CP),
along with human friendly formatting for trace_kvm_inj_exception().
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-43-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add XM_VECTOR and VE_VECTOR pretty-printing for
trace_kvm_inj_exception().
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-42-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Remove the explicit clearing of shadow stack CPU capabilities.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: John Allen <john.allen@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-41-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Synchronize XSS from the GHCB to KVM's internal tracking if the guest
marks XSS as valid on a #VMGEXIT. Like XCR0, KVM needs an up-to-date copy
of XSS in order to compute the required XSTATE size when emulating
CPUID.0xD.0x1 for the guest.
Treat the incoming XSS change as an emulated write, i.e. validate the
guest-provided value, to avoid letting the guest load garbage into KVM's
tracking. Simply ignore bad values, as either the guest managed to get an
unsupported value into hardware, or the guest is misbehaving and providing
pure garbage. In either case, KVM can't fix the broken guest.
Explicitly allow access to XSS at all times, as KVM needs to ensure its
copy of XSS stays up-to-date. E.g. KVM supports migration of SEV-ES guests
and so needs to allow the host to save/restore XSS, otherwise a guest
that *knows* its XSS hasn't changed could get stale/bad CPUID emulation if
the guest doesn't provide XSS in the GHCB on every exit. This creates a
hypothetical problem where a guest could request emulation of RDMSR or
WRMSR on XSS, but arguably that's not even a problem, e.g. it would be
entirely reasonable for a guest to request "emulation" as a way to inform
the hypervisor that its XSS value has been modified.
Note, emulating the change as an MSR write also takes care of side effects,
e.g. marking dynamic CPUID bits as dirty.
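A rough sketch of the flow in sev_es_sync_from_ghcb() (the GHCB accessor and
MSR-write helper names are assumptions modeled on the existing XCR0 handling):

  if (kvm_ghcb_xss_is_valid(svm)) {
          u64 xss = ghcb_get_xss(svm->sev_es.ghcb);

          /* Emulated write: rejects unsupported bits, marks CPUID dirty. */
          kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, xss);
  }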
Suggested-by: John Allen <john.allen@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-40-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Pass through XSAVE managed CET MSRs on SVM when KVM supports shadow
stack. These cannot be intercepted without also intercepting XSAVE which
would likely cause unacceptable performance overhead.
MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: John Allen <john.allen@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-39-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add shadow stack VMCB fields to dump_vmcb. PL0_SSP, PL1_SSP, PL2_SSP,
PL3_SSP, and U_CET are part of the SEV-ES save area and are encrypted,
but can be decrypted and dumped if the guest policy allows debugging.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: John Allen <john.allen@amd.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-38-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Transfer the three CET Shadow Stack VMCB fields (S_CET, ISST_ADDR, and
SSP) on VMRUN, #VMEXIT, and loading nested state (saving nested state
simply copies the entire save area). SVM doesn't provide a way to
disallow L1 from enabling Shadow Stacks for L2, i.e. KVM *must* provide
nested support before advertising SHSTK to userspace.
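A sketch of the VMRUN direction (VMCB save area field names per the
architectural layout; the exact copy sites are assumptions):

  /* On nested VMRUN, propagate L1's (vmcb12) CET state into vmcb02. */
  vmcb02->save.s_cet     = vmcb12->save.s_cet;
  vmcb02->save.isst_addr = vmcb12->save.isst_addr;
  vmcb02->save.ssp       = vmcb12->save.ssp;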
Link: https://lore.kernel.org/r/20250919223258.1604852-37-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Emulate shadow stack MSR access by reading and writing to the
corresponding fields in the VMCB.
Signed-off-by: John Allen <john.allen@amd.com>
[sean: mark VMCB_CET dirty/clean as appropriate]
Link: https://lore.kernel.org/r/20250919223258.1604852-36-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Advertise the LOAD_CET_STATE VM-Entry/Exit control bits in the nested VMX
MSRs, as all nested support for CET virtualization, including consistency
checks, is in place.
Advertise support if and only if KVM supports at least one of IBT or SHSTK.
While it's userspace's responsibility to provide a consistent CPU model to
the guest, that doesn't mean KVM should set userspace up to fail.
Note, the existing {CLEAR,LOAD}_BNDCFGS behavior predates
KVM_X86_QUIRK_STUFF_FEATURE_MSRS, i.e. KVM "solved" the inconsistent CPU
model problem by overwriting the VMX MSRs provided by userspace.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-35-seanjc@google.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Introduce consistency checks for CET states during nested VM-entry.
A VMCS contains both guest and host CET states, each comprising the
IA32_S_CET MSR, SSP, and IA32_INTERRUPT_SSP_TABLE_ADDR MSR. Various
checks are applied to CET states during VM-entry as documented in SDM
Vol3 Chapter "VM ENTRIES". Implement all these checks during nested
VM-entry to emulate the architectural behavior.
In summary, there are three kinds of checks on guest/host CET states
during VM-entry:
A. Checks applied to both guest states and host states:
* The IA32_S_CET field must not set any reserved bits; bits 10 (SUPPRESS)
and 11 (TRACKER) cannot both be set.
* SSP should not have bits 1:0 set.
* The IA32_INTERRUPT_SSP_TABLE_ADDR field must be canonical.
B. Checks applied to host states only
* IA32_S_CET MSR and SSP must be canonical if the CPU enters 64-bit mode
after VM-exit. Otherwise, IA32_S_CET and SSP must have their higher 32
bits cleared.
C. Checks applied to guest states only:
* IA32_S_CET MSR and SSP are not required to be canonical (i.e., 63:N-1
are identical, where N is the CPU's maximum linear-address width). But,
bits 63:N of SSP must be identical.
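Roughly, the common checks in (A) reduce to a helper along these lines (names
and bit masks here are illustrative, not the exact upstream code):

  static int nested_vmx_check_cet_state(struct kvm_vcpu *vcpu, u64 s_cet,
                                        u64 ssp, u64 ssp_tbl)
  {
          /* No reserved bits; SUPPRESS and TRACKER are mutually exclusive. */
          if ((s_cet & CET_RESERVED) ||
              ((s_cet & CET_SUPPRESS) && (s_cet & CET_WAIT_ENDBR)))
                  return -EINVAL;

          /* SSP must be 4-byte aligned. */
          if (ssp & GENMASK_ULL(1, 0))
                  return -EINVAL;

          /* The interrupt SSP table address must be canonical. */
          if (is_noncanonical_msr_address(ssp_tbl, vcpu))
                  return -EINVAL;

          return 0;
  }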
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-34-seanjc@google.com
[sean: have common helper return 0/-EINVAL, not true/false]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add consistency checks for CR4.CET and CR0.WP in guest-state or host-state
area in the VMCS12. This ensures that configurations with CR4.CET set and
CR0.WP not set result in VM-entry failure, aligning with architectural
behavior.
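The check itself is tiny; a sketch of the shape it takes for either state
area:

  /* CR4.CET requires CR0.WP, for both the guest and host fields of vmcs12. */
  if (CC((cr4 & X86_CR4_CET) && !(cr0 & X86_CR0_WP)))
          return -EINVAL;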
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-33-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Set up CET MSRs, related VM_ENTRY/EXIT control bits and fixed CR4 setting
to enable CET for nested VM.
vmcs12 and vmcs02 need to be synced when L2 exits to L1 or when L1 wants
to resume L2, that way correct CET states can be observed by one another.
Please note that consistency checks regarding CET state during VM-Entry
will be added later to prevent this patch from becoming too large.
Advertising the new CET VM_ENTRY/EXIT control bits is also deferred
until after the consistency checks are added.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Tested-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-32-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Per the SDM description (Vol. 3D, Appendix A.1):
"If bit 56 is read as 1, software can use VM entry to deliver a hardware
exception with or without an error code, regardless of vector."
Modify the has_error_code check performed before injecting events into a
nested guest. Only enforce the check when the guest is in real mode, the
exception is not a hard exception, or the platform doesn't enumerate bit 56
in VMX_BASIC; in all other cases, ignore the check to make the logic
consistent with the SDM.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-31-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Swap the order between configuring nested VMX capabilities and base CPU
capabilities, so that nested VMX support can be conditioned on core KVM
support, e.g. to allow conditioning support for LOAD_CET_STATE on the
presence of IBT or SHSTK. Because the sanity checks on nested VMX config
performed by vmx_check_processor_compat() run _after_ vmx_hardware_setup(),
any use of kvm_cpu_cap_has() when configuring nested VMX support will lead
to failures in vmx_check_processor_compat().
While swapping the order of two (or more) configuration flows can lead to
a game of whack-a-mole, in this case nested support inarguably should be
done after base support. KVM should never condition base support on nested
support, because nested support is fully optional, while obviously it's
desirable to condition nested support on base support. And there's zero
evidence the current ordering was intentional, e.g. commit 66a6950f9995
("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking")
likely placed the call to kvm_set_cpu_caps() after nested setup because it
looked pretty.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-30-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add support for the LOAD_CET_STATE VM-Enter and VM-Exit controls, the
CET XFEATURE bits in XSS, and advertise support for IBT and SHSTK to
userspace. Explicitly clear IBT and SHSTK on SVM, as additional work is
needed to enable CET on SVM, e.g. to context switch S_CET and other state.
Disable the KVM CET feature if unrestricted_guest is unsupported/disabled,
as KVM does not support emulating CET and running without Unrestricted Guest
can result in KVM emulating large swaths of guest code. While it's highly
unlikely any guest will trigger emulation while also utilizing IBT or
SHSTK, there's zero reason to allow CET without Unrestricted Guest as that
combination should only be possible when explicitly disabling
unrestricted_guest for testing purposes.
Disable CET if VMX_BASIC[bit56] == 0, i.e. if hardware strictly enforces
the presence of an Error Code based on exception vector, as attempting to
inject a #CP with an Error Code (#CP architecturally has an Error Code)
will fail due to the #CP vector historically not having an Error Code.
Clear the S_CET and SSP-related VMCS fields on "reset" to emulate the
architectural behavior of CET MSRs and SSP being reset to 0 after RESET,
power-up and INIT. Note,
KVM already clears guest CET state that is managed via XSTATE in
kvm_xstate_reset().
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: move some bits to separate patches, massage changelog]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-29-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Make IBT and SHSTK virtualization mutually exclusive with "officially"
supporting setups with guest.MAXPHYADDR < host.MAXPHYADDR, i.e. if the
allow_smaller_maxphyaddr module param is set. Running a guest with a
smaller MAXPHYADDR requires intercepting #PF, and can also trigger
emulation of arbitrary instructions. Intercepting and reacting to #PFs
doesn't play nice with SHSTK, as KVM's MMU hasn't been taught to handle
Shadow Stack accesses, and emulating arbitrary instructions doesn't play
nice with IBT or SHSTK, as KVM's emulator doesn't handle the various side
effects, e.g. doesn't enforce end-branch markers or model Shadow Stack
updates.
Note, hiding IBT and SHSTK based solely on allow_smaller_maxphyaddr is
overkill, as allow_smaller_maxphyaddr is only problematic if the guest is
actually configured to have a smaller MAXPHYADDR. However, KVM's ABI
doesn't provide a way to express that IBT and SHSTK may break if enabled
in conjunction with guest.MAXPHYADDR < host.MAXPHYADDR. I.e. the
alternative is to do nothing in KVM and instead update documentation and
hope KVM users are thorough readers. Go with the conservative-but-correct
approach; worst case scenario, this restriction can be dropped if there's
a strong use case for enabling CET on hosts with allow_smaller_maxphyaddr.
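A minimal sketch of the restriction (its exact placement in the setup flow is
an assumption):

  if (allow_smaller_maxphyaddr) {
          kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
          kvm_cpu_cap_clear(X86_FEATURE_IBT);
  }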
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-28-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Initialize allow_smaller_maxphyaddr during hardware setup as soon as KVM
knows whether or not TDP will be utilized. To avoid having to teach KVM's
emulator all about CET, KVM's upcoming CET virtualization support will be
mutually exclusive with allow_smaller_maxphyaddr, i.e. will disable SHSTK
and IBT if allow_smaller_maxphyaddr is enabled.
In general, allow_smaller_maxphyaddr should be initialized as soon as
possible since it's globally visible while its only input is whether or
not EPT/NPT is enabled. I.e. there's effectively zero risk of setting
allow_smaller_maxphyaddr too early, and substantial risk of setting it
too late.
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250922184743.1745778-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Make TDP a hard requirement for Shadow Stacks, as there are no plans to
add Shadow Stack support to the Shadow MMU. E.g. KVM hasn't been taught
to understand the magic Writable=0,Dirty=1 combination that is required
for Shadow Stack accesses, and so enabling Shadow Stacks when using
shadow paging will put the guest into an infinite #PF loop (KVM thinks the
shadow page tables have a valid mapping, hardware says otherwise).
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-27-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|