|
Move PMU_CAP_{FW_WRITES,LBR_FMT} into msr-index.h and rename them with a
PERF_CAP prefix to stay consistent with the other perf capabilities macros.
No functional change intended.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://lore.kernel.org/r/20250806195706.1650976-24-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rename the two helpers vmx_vmentry/vmexit_ctrl() to
vmx_get_initial_vmentry/vmexit_ctrl() to better reflect what they actually
return.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://lore.kernel.org/r/20250806195706.1650976-23-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Take a snapshot of the unadulterated PMU capabilities provided by perf so
that KVM can compare guest vPMU capabilities against hardware capabilities
when determining whether or not to intercept PMU MSRs (and RDPMC).
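As a rough sketch (the variable name and placement here are assumptions,
not taken from the patch):

  /* Sketch: snapshot perf's raw capabilities before KVM clamps its own
   * working copy, so later code can compare guest vs. hardware caps. */
  static struct x86_pmu_capability kvm_host_pmu;

  void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
  {
          perf_get_x86_pmu_capability(&kvm_host_pmu);  /* unadulterated */
          /* ... derive/clamp kvm_pmu_cap from the snapshot as before ... */
  }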
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://lore.kernel.org/r/20250806195706.1650976-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Gate access to PMC MSRs based on pmu->version, not on kvm->arch.enable_pmu,
to more accurately reflect KVM's behavior. This is a glorified nop, as
pmu->version and pmu->nr_arch_gp_counters can only be non-zero if
amd_pmu_refresh() is reached, kvm_pmu_refresh() invokes amd_pmu_refresh()
if and only if kvm->arch.enable_pmu is true, and amd_pmu_refresh() forces
pmu->version to be 1 or 2.
I.e. the following holds true:
!pmu->nr_arch_gp_counters || kvm->arch.enable_pmu == (pmu->version > 0)
and so the only way for amd_pmu_get_pmc() to return a non-NULL value is if
both kvm->arch.enable_pmu and pmu->version evaluate to true.
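In code form, the change amounts to something like the following sketch
(the helper shown is illustrative, not the literal diff):

  static struct kvm_pmc *amd_pmu_get_pmc(struct kvm_pmu *pmu, int pmc_idx)
  {
          /* Was: a check on pmu_to_vcpu(pmu)->kvm->arch.enable_pmu. */
          if (!pmu->version)      /* non-zero iff amd_pmu_refresh() ran */
                  return NULL;

          if (pmc_idx >= pmu->nr_arch_gp_counters)
                  return NULL;

          return &pmu->gp_counters[pmc_idx];
  }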
No real functional change intended.
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://lore.kernel.org/r/20250806195706.1650976-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Setup the golden VMCS config during vmx_init(), before the call to
kvm_x86_vendor_init(), instead of waiting until the callback to do
hardware setup. setup_vmcs_config() only touches VMX state, i.e. doesn't
poke anything in kvm.ko, and has no runtime dependencies beyond
hv_init_evmcs().
Setting the VMCS config early on will allow referencing VMCS and VMX
capabilities at any point during setup, e.g. to check for PERF_GLOBAL_CTRL
save/load support during mediated PMU initialization.
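The resulting init order looks roughly like this (error handling trimmed;
a sketch, not the literal diff):

  static int __init vmx_init(void)
  {
          int r;

          r = hv_init_evmcs();            /* the only runtime dependency */
          if (r)
                  return r;

          /* Golden VMCS config, before kvm_x86_vendor_init(). */
          r = setup_vmcs_config(&vmcs_config, &vmx_capability);
          if (r)
                  return r;

          return kvm_x86_vendor_init(&vmx_init_ops);
  }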
Tested-by: Xudong Hao <xudong.hao@intel.com>
Link: https://lore.kernel.org/r/20250806195706.1650976-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Explicitly document that the behavior of KVM_SET_PIT2 strictly conforms
to the Intel 8254 PIT hardware specification, specifically that a write of
'0' adheres to the spec's definition that a programmed count of '0' is
converted to the maximum possible value (2^16). E.g. an unaware userspace
might attempt to validate that KVM_GET_PIT2 returns the exact state set
via KVM_SET_PIT2, and be surprised when the returned count is 65536, not 0.
Add a reference to the Intel 8254 PIT datasheet that will hopefully stay
fresh for some time (the internet isn't exactly brimming with copies of
the 8254 datasheet).
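For example, an unaware userspace doing a read-back check would observe
the following (a sketch, assuming an in-kernel PIT exists for the VM):

  struct kvm_pit_state2 state = {};

  ioctl(vm_fd, KVM_GET_PIT2, &state);
  state.channels[0].count = 0;            /* programmed count of '0' */
  ioctl(vm_fd, KVM_SET_PIT2, &state);

  ioctl(vm_fd, KVM_GET_PIT2, &state);
  /* state.channels[0].count is now 65536 (2^16), not 0 */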
Link: https://lore.kernel.org/all/CANypQFbEySjKOFLqtFFf2vrEe=NBr7XJfbkjQhqXuZGg7Rpoxw@mail.gmail.com
Signed-off-by: Jiaming Zhang <r772577952@gmail.com>
Link: https://lore.kernel.org/r/20250905174736.260694-1-r772577952@gmail.com
[sean: add context Link, drop local APIC change, massage changelog accordingly]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use guard(mutex) instead of mutex_lock/mutex_unlock pair to simplify the
error handling when setting up the TSC page for a Hyper-V guest.
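The pattern, roughly (a sketch; guard() comes from <linux/cleanup.h>):

  guard(mutex)(&hv->hv_lock);     /* auto-unlocked on every return path */

  if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN)
          return;                 /* no explicit mutex_unlock() needed */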
No functional change intended.
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Link: https://lore.kernel.org/r/20250901131604.646415-1-liaoyuanhong@vivo.com
[sean: tweak changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use guard(mutex) instead of mutex_lock/mutex_unlock pair to simplify the
error handling when allocating the APIC access page.
No functional change intended.
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Link: https://lore.kernel.org/r/20250901131822.647802-1-liaoyuanhong@vivo.com
[sean: add blank line to isolate guard(), tweak changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Fix typos. "_COUTNERS" -> "_COUNTERS".
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Link: https://lore.kernel.org/r/20250718001905.196989-2-dapeng1.mi@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Reject KVM_CREATE_IRQCHIP if the VM type has protected EOIs, i.e. if KVM
can't intercept EOI and thus can't faithfully emulate level-triggered
interrupts that are routed through the I/O APIC. For TDX VMs, the
TDX-Module owns the VMX EOI-bitmap and configures all IRQ vectors to have
the CPU accelerate EOIs, i.e. doesn't allow KVM to intercept any EOIs.
KVM already requires a split irqchip[1], but does so during vCPU creation,
which is both too late to allow userspace to fall back to a split irqchip
and a less-than-stellar experience for userspace since an -EINVAL on
KVM_CREATE_VCPU is far harder to debug/triage than failure exactly on
KVM_CREATE_IRQCHIP. And of course, allowing an action that ultimately
fails is arguably a bug regardless of the impact on userspace.
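I.e., roughly (a sketch of the ioctl-handler check; exact placement may
differ):

  /* kvm_arch_vm_ioctl(), sketch */
  case KVM_CREATE_IRQCHIP:
          r = -EINVAL;
          if (kvm->arch.has_protected_eoi)        /* e.g. TDX VMs */
                  break;
          /* ... existing in-kernel irqchip creation ... */
          break;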
Link: https://lore.kernel.org/lkml/20250222014757.897978-11-binbin.wu@linux.intel.com [1]
Link: https://lore.kernel.org/lkml/aK3vZ5HuKKeFuuM4@google.com
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20250827011726.2451115-1-sagis@google.com
[sean: massage shortlog+changelog, relocate setting has_protected_eoi]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Move the vector_hashing module param into lapic.c now that all usage is
contained within the local APIC emulation code.
Opportunistically drop the accessor and append "_enabled" to the variable
to help capture that it's a boolean module param.
No functional change intended.
Link: https://lore.kernel.org/r/20250821214209.3463350-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Make various helpers for resolving lowest priority IRQs local to lapic.c
now that kvm_irq_delivery_to_apic() lives in lapic.c as well.
No functional change intended.
Link: https://lore.kernel.org/r/20250821214209.3463350-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Move kvm_irq_delivery_to_apic() to lapic.c as it is specific to local APIC
emulation. This will allow burying more local APIC code in lapic.c, e.g.
the various "lowest priority" helpers.
No functional change intended.
Link: https://lore.kernel.org/r/20250821214209.3463350-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Tweak the code a bit to facilitate resetting more xstate components in
the future, e.g., CET's xstate-managed MSRs.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/r/20250812025606.74625-6-chao.gao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Don't manually clear/zero MPX state on RESET, as the guest FPU state is
zero allocated and KVM only does RESET during vCPU creation, i.e. the
relevant state is guaranteed to be all zeroes.
Opportunistically move the relevant code into a helper in anticipation of
adding support for CET shadow stacks, which also has state that is zeroed
on INIT.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/r/20250812025606.74625-5-chao.gao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Wrap __kvm_{get,set}_msr() into two new helpers for KVM usage and use the
helpers to replace existing usage of the raw functions.
kvm_msr_{read,write}() are KVM-internal helpers, i.e. used when KVM needs
to get/set an MSR value for emulating CPU behavior, i.e., host_initiated ==
%true in the helpers.
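Conceptually, the wrappers reduce to (a sketch; exact signatures assumed):

  int kvm_msr_read(struct kvm_vcpu *vcpu, u32 index, u64 *data)
  {
          return __kvm_get_msr(vcpu, index, data, true);  /* host_initiated */
  }

  int kvm_msr_write(struct kvm_vcpu *vcpu, u32 index, u64 data)
  {
          return __kvm_set_msr(vcpu, index, data, true);  /* host_initiated */
  }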
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/r/20250812025606.74625-4-chao.gao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use the double-underscore helpers for emulating MSR reads and writes in
the no-underscore versions to better capture the relationship between the
two sets of APIs (the double-underscore versions don't honor userspace MSR
filters).
No functional change intended.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/r/20250812025606.74625-3-chao.gao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rename
kvm_{g,s}et_msr_with_filter()
kvm_{g,s}et_msr()
to
kvm_emulate_msr_{read,write}
__kvm_emulate_msr_{read,write}
to make it more obvious that KVM uses these helpers to emulate guest
behaviors, i.e., host_initiated == false in these helpers.
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Link: https://lore.kernel.org/r/20250812025606.74625-2-chao.gao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Advertise support for the immediate form of MSR instructions to userspace
if the instructions are supported by the underlying CPU, and KVM is using
VMX, i.e. is running on an Intel-compatible CPU.
For SVM, explicitly clear X86_FEATURE_MSR_IMM to ensure KVM doesn't over-
report support if AMD-compatible CPUs ever implement the immediate forms,
as SVM will likely require explicit enablement in KVM.
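In terms of KVM's CPU-cap helpers, that is roughly (a sketch):

  /* vmx.c: advertise iff the underlying CPU enumerates the feature */
  kvm_cpu_cap_check_and_set(X86_FEATURE_MSR_IMM);

  /* svm.c: never advertise, pending explicit SVM enablement in KVM */
  kvm_cpu_cap_clear(X86_FEATURE_MSR_IMM);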
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
[sean: massage changelog]
Link: https://lore.kernel.org/r/20250805202224.1475590-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add support for handling "WRMSRNS with an immediate" VM-Exits in KVM's
fastpath. On Intel, all writes to the x2APIC ICR and to the TSC Deadline
MSR are non-serializing, i.e. it's highly likely guest kernels will switch
to using WRMSRNS when possible. And in general, any MSR written via
WRMSRNS is probably worth handling in the fastpath, as the entire point of
WRMSRNS is to shave cycles in hot paths.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
[sean: rewrite changelog, split rename to separate patch]
Link: https://lore.kernel.org/r/20250805202224.1475590-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add support for the immediate forms of RDMSR and WRMSRNS (currently
Intel-only). The immediate variants are only valid in 64-bit mode, and
use a single general purpose register for the data (the register is also
encoded in the instruction, i.e. not implicit like regular RDMSR/WRMSR).
The immediate variants are primarily motivated by performance, not code
size: by having the MSR index in an immediate, it is available *much*
earlier in the CPU pipeline, which allows hardware much more leeway about
how a particular MSR is handled.
Intel VMX support for the immediate forms of MSR accesses communicates
exit information to the host as follows:
1) The immediate form of RDMSR uses VM-Exit Reason 84.
2) The immediate form of WRMSRNS uses VM-Exit Reason 85.
3) For both VM-Exit reasons 84 and 85, the Exit Qualification field is
set to the MSR index that triggered the VM-Exit.
4) Bits 3 ~ 6 of the VM-Exit Instruction Information field are set to
the register encoding used by the immediate form of the instruction,
i.e. the destination register for RDMSR, and the source for WRMSRNS.
5) The VM-Exit Instruction Length field records the size of the
immediate form of the MSR instruction.
To deal with userspace RDMSR exits, stash the destination register in a
new kvm_vcpu_arch field, similar to cui_linear_rip, pio, etc.
Alternatively, the register could be saved in kvm_run.msr or re-retrieved
from the VMCS, but the former would require sanitizing the value to ensure
userspace doesn't clobber the value to an out-of-bounds index, and the
latter would require a new one-off kvm_x86_ops hook.
Don't bother adding support for the instructions in KVM's emulator, as the
only way for RDMSR/WRMSR to be encountered is if KVM is emulating large
swaths of code due to invalid guest state, and a vCPU cannot have invalid
guest state while in 64-bit mode.
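Putting points 1-5 together, decoding an immediate-form exit reduces to
roughly the following (a sketch; stashing the destination register in the
new kvm_vcpu_arch field is elided):

  u32 msr = vmx_get_exit_qual(vcpu);              /* MSR index, per (3) */
  u32 info = vmcs_read32(VMX_INSTRUCTION_INFO);
  int reg = (info >> 3) & 0xf;                    /* bits 3:6, per (4) */

  /* RDMSR-imm: 'reg' is the destination; WRMSRNS-imm: the source. */
  u64 data = kvm_register_read(vcpu, reg);        /* WRMSRNS-imm case */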
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
[sean: minor tweaks, massage and expand changelog]
Link: https://lore.kernel.org/r/20250805202224.1475590-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rename the WRMSR fastpath API to drop "irqoff", as that information is
redundant (the fastpath always runs with IRQs disabled), and to prepare
for adding a fastpath for the immediate variant of WRMSRNS.
No functional change intended.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
[sean: split to separate patch, write changelog]
Link: https://lore.kernel.org/r/20250805202224.1475590-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rename "ecx" variables in {RD,WR}MSR and RDPMC helpers to "msr" and "pmc"
respectively, in anticipation of adding support for the immediate variants
of RDMSR and WRMSRNS, and to better document what the variables hold
(versus where the data originated).
No functional change intended.
Link: https://lore.kernel.org/r/20250805202224.1475590-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The immediate form of MSR access instructions are primarily motivated
by performance, not code size: by having the MSR number in an immediate,
it is available *much* earlier in the pipeline, which allows the
hardware much more leeway about how a particular MSR is handled.
Use a scattered CPU feature bit for MSR immediate form instructions.
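A scattered bit is declared in arch/x86/kernel/cpu/scattered.c; the entry
would look something like this (the exact CPUID leaf/register/bit shown is
an assumption, not taken from the patch):

  static const struct cpuid_bit cpuid_bits[] = {
          /* ... */
          { X86_FEATURE_MSR_IMM, CPUID_ECX, 5, 0x00000007, 1 },
          /* ... */
  };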
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Link: https://lore.kernel.org/r/20250805202224.1475590-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a fastpath handler for INVD so that the common fastpath logic can be
trivially tested on both Intel and AMD. Under KVM, INVD is always:
(a) intercepted, (b) available to the guest, and (c) emulated as a nop,
with no side effects. Combined with INVD not having any inputs or outputs,
i.e. no register constraints, INVD is the perfect instruction for
exercising KVM's fastpath as it can be inserted into practically any
guest-side code stream.
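The handler itself is nearly trivial, along these lines (a sketch):

  static fastpath_t handle_fastpath_invd(struct kvm_vcpu *vcpu)
  {
          /* INVD is emulated as a nop; only the skip itself can fail. */
          if (!kvm_skip_emulated_instruction(vcpu))
                  return EXIT_FASTPATH_NONE;

          return EXIT_FASTPATH_REENTER_GUEST;
  }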
Link: https://lore.kernel.org/r/20250805190526.1453366-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Acquire SRCU in the VM-Exit fastpath if and only if KVM needs to check the
PMU event filter, to further trim the amount of code that is executed with
SRCU protection in the fastpath. Counter-intuitively, holding SRCU can do
more harm than good due to masking potential bugs, and introducing a new
SRCU-protected asset to code reachable via kvm_skip_emulated_instruction()
would be quite notable, i.e. definitely worth auditing.
E.g. the primary user of kvm->srcu is KVM's memslots, accessing memslots
all but guarantees guest memory may be accessed, accessing guest memory
can fault, and page faults might sleep, which isn't allowed while IRQs are
disabled. Not acquiring SRCU means the (hypothetical) illegal sleep would
be flagged when running with PROVE_RCU=y, even if DEBUG_ATOMIC_SLEEP=n.
Note, performance is NOT a motivating factor, as SRCU lock/unlock only
adds ~15 cycles of latency to fastpath VM-Exits. I.e. overhead isn't a
concern _if_ SRCU protection needs to be extended beyond PMU events, e.g.
to honor userspace MSR filters.
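Conceptually, the fastpath turns into something like the following sketch
(both helper names here are hypothetical):

  if (kvm_pmu_needs_event_filter(vcpu)) {         /* hypothetical */
          int idx = srcu_read_lock(&vcpu->kvm->srcu);

          kvm_pmu_instruction_retired(vcpu);      /* consults the filter */
          srcu_read_unlock(&vcpu->kvm->srcu, idx);
  }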
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250805190526.1453366-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rename check_pmu_event_filter() to make its polarity more obvious, and to
connect the dots to is_gp_event_allowed() and is_fixed_event_allowed().
No functional change intended.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250805190526.1453366-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Drop the check on a PMC being locally enabled when triggering emulated
events, as the bitmap of passed-in PMCs only contains locally enabled PMCs.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250805190526.1453366-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When triggering PMC events in response to emulation, drop the redundant
checks on a PMC being globally and locally enabled, as the passed in bitmap
contains only PMCs that are locally enabled (and counting the right event),
and the local copy of the bitmap has already been masked with global_ctrl.
No true functional change intended.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250805190526.1453366-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Open code pmc_event_is_allowed() in its callers, as kvm_pmu_trigger_event()
only needs to check the event filter (both global and local enables are
consulted outside of the loop).
No functional change intended.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250805190526.1453366-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rename pmc_speculative_in_use() to pmc_is_locally_enabled() to better
capture what it actually tracks, and to show its relationship to
pmc_is_globally_enabled(). While neither AMD nor Intel refer to event
selectors or the fixed counter control MSR as "local", it's the obvious
name to pair with "global".
As for "speculative", there's absolutely nothing speculative about the
checks. E.g. for PMUs without PERF_GLOBAL_CTRL, from the guest's
perspective, the counters are "in use" without any qualifications.
No functional change intended.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250805190526.1453366-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Calculate and track PMCs that are counting instructions/branches retired
when the PMC's event selector (or fixed counter control) is modified
instead of evaluating the event selector on-demand. Immediately recalc a
PMC's configuration on writes to avoid false negatives/positives when
KVM skips an emulated WRMSR, which is guaranteed to occur before the
main run loop processes KVM_REQ_PMU.
Out of an abundance of caution, and because it's relatively cheap, recalc
reprogrammed PMCs in kvm_pmu_handle_event() as well. Recalculating in
response to KVM_REQ_PMU _should_ be unnecessary, but for now be paranoid
to avoid introducing easily-avoidable bugs in edge cases. The code can be
removed in the future if necessary, e.g. in the unlikely event that the
overhead of recalculating to-be-emulated PMCs is noticeable.
Note! Deliberately don't check the PMU event filters, as doing so could
result in KVM consuming stale information.
Tracking which PMCs are counting branches/instructions will allow grabbing
SRCU in the fastpath VM-Exit handlers if and only if a PMC event might be
triggered (to consult the event filters), and will also allow the upcoming
mediated PMU to do the right thing with respect to counting instructions
(the mediated PMU won't be able to update PMCs in the VM-Exit fastpath).
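A sketch of the recalculation on event-selector writes follows; note that
pmc_is_locally_enabled() comes from an earlier patch in the series, while
the other names here are hypothetical:

  static void pmc_recalc_emulated_events(struct kvm_pmc *pmc)
  {
          struct kvm_pmu *pmu = pmc_to_pmu(pmc);

          /* Deliberately ignore the PMU event filters; they may be stale. */
          if (pmc_is_locally_enabled(pmc) &&
              eventsel_counts_instructions(pmc))          /* hypothetical */
                  set_bit(pmc->idx, pmu->pmc_counting_instructions);
          else
                  clear_bit(pmc->idx, pmu->pmc_counting_instructions);
  }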
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250805190526.1453366-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add wrappers for triggering instruction retired and branch retired PMU
events in anticipation of reworking the internal mechanisms to track
which PMCs need to be evaluated, e.g. to avoid having to walk and check
every PMC.
Opportunistically bury "struct kvm_pmu_emulated_event_selectors" in pmu.c.
No functional change intended.
Link: https://lore.kernel.org/r/20250805190526.1453366-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Move kvm_init_pmu_capability() to pmu.c so that future changes can access
variables that have no business being visible outside of pmu.c.
kvm_init_pmu_capability() is called once per module load; there is zero
reason it needs to be inlined.
No functional change intended.
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250805190526.1453366-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Fold the per-MSR WRMSR fastpath helpers into the main handler now that the
IPI path in particular is relatively tiny. In addition to eliminating a
decent amount of boilerplate, this removes the ugly -errno/1/0 => bool
conversion (which is "necessitated" by kvm_x2apic_icr_write_fast()).
Opportunistically drop the comment about IPIs, as the purpose of the
fastpath is hopefully self-evident, and _if_ it needs more documentation,
the documentation (and rules!) should be placed in a more central location.
No functional change intended.
Link: https://lore.kernel.org/r/20250805190526.1453366-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Always grab EDX:EAX in the WRMSR fastpath to deduplicate and simplify the
case statements, and to prepare for handling immediate variants of WRMSRNS
in the fastpath (the data register is explicitly provided in that case).
There's no harm in reading the registers, as their values are always
available, i.e. don't require VMREADs (or similarly slow operations).
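I.e. (a sketch):

  /* EDX:EAX is always available; reading GPRs doesn't require VMREADs. */
  u64 data = kvm_read_edx_eax(vcpu);

  switch (msr) {
  case APIC_BASE_MSR + (APIC_ICR >> 4):   /* x2APIC ICR */
  case MSR_IA32_TSC_DEADLINE:
          /* ... each case now consumes 'data' directly ... */
          break;
  }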
No real functional change intended.
Cc: Xin Li <xin@zytor.com>
Link: https://lore.kernel.org/r/20250805190526.1453366-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Acquire SRCU in the WRMSR fastpath if and only if an instruction needs to
be skipped, i.e. only if the fastpath succeeds. The reasoning in commit
3f2739bd1e0b ("KVM: x86: Acquire SRCU read lock when handling fastpath MSR
writes") about "avoid having to play whack-a-mole" seems sound, but in
hindsight unconditionally acquiring SRCU does more harm than good.
While acquiring/releasing SRCU isn't slow per se, the things that are
_protected_ by kvm->srcu are generally safe to access only in the "slow"
VM-Exit path. E.g. accessing memslots in generic helpers is never safe,
because accessing guest memory with IRQs disabled is always unsafe (except
when kvm_vcpu_read_guest_atomic() is used, but that API should never be
used in emulation helpers).
In other words, playing whack-a-mole is actually desirable in this case,
because every access to an asset protected by kvm->srcu warrants further
scrutiny.
Link: https://lore.kernel.org/r/20250805190526.1453366-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Drop the fastpath VM-Exit requirement that KVM can use the hypervisor
timer to emulate the APIC timer in TSC deadline mode. I.e. unconditionally
handle MSR_IA32_TSC_DEADLINE WRMSRs in the fastpath. Restricting the
fastpath to *maybe* using the VMX preemption timer is ineffective and
unnecessary.
If the requested deadline can't be programmed into the VMX preemption
timer, KVM will fall back to hrtimers, i.e. the restriction is ineffective
as far as preventing any kind of worst case scenario.
But guarding against a worst case scenario is completely unnecessary as
the "slow" path, start_sw_tscdeadline() => hrtimer_start(), explicitly
disables IRQs. In fact, the worst case scenario is when KVM thinks it
can use the VMX preemption timer, as KVM will eat the overhead of calling
into vmx_set_hv_timer() and falling back to hrtimers.
Opportunistically limit kvm_can_use_hv_timer() to lapic.c as the fastpath
code was the only external user.
Stating the obvious, this allows handling MSR_IA32_TSC_DEADLINE writes in
the fastpath on AMD CPUs.
Link: https://lore.kernel.org/r/20250805190526.1453366-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Drop the restrictions on fastpath IPIs only working for fixed IRQs with a
physical destination now that the fastpath is explicitly limited to "fast"
delivery. Limiting delivery to a single physical APIC ID guarantees only
one vCPU will receive the event, but that isn't necessarily "fast", e.g. if
the targeted vCPU is the last of 4096 vCPUs. And logical destination mode
or shorthand (to self) can also be fast, e.g. if only a few vCPUs are
being targeted. Lastly, there's nothing inherently slow about delivering
an NMI, INIT, SIPI, SMI, etc., i.e. there's no reason to artificially
limit fastpath delivery to fixed vector IRQs.
Link: https://lore.kernel.org/r/20250805190526.1453366-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Explicitly restrict fastpath ICR writes to IPIs that are "fast", i.e. can
be delivered without having to walk all vCPUs, and that target at most 16
vCPUs. Artificially restricting ICR writes to physical mode guarantees
at most one vCPU will receive an IPI (because x2APIC IDs are read-only),
but that delivery might not be "fast". E.g. even if the vCPU exists, KVM
might have to iterate over 4096 vCPUs to find the right one.
Limiting delivery to fast IPIs aligns the WRMSR fastpath with
kvm_arch_set_irq_inatomic() (which also runs with IRQs disabled), and will
allow dropping the semi-arbitrary restrictions on delivery mode and type.
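Roughly (a sketch; using the existing kvm_irq_delivery_to_apic_fast() as
the "is this fast?" oracle is an assumption):

  struct kvm_lapic_irq irq;
  int r;

  /* ... build 'irq' from the ICR value ... */
  if (!kvm_irq_delivery_to_apic_fast(vcpu->kvm, apic, &irq, &r, NULL))
          return 1;       /* not "fast", punt to the slow path */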
Link: https://lore.kernel.org/r/20250805190526.1453366-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Extract the code for converting an ICR message into a kvm_lapic_irq
structure into a local helper so that a fast-only IPI path can share the
conversion logic.
No functional change intended.
Link: https://lore.kernel.org/r/20250805190526.1453366-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use find_nth_bit() and make the function almost a one-liner.
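The commit message doesn't name the function; assuming it selects the Nth
set bit in a destination bitmap (as lowest-priority arbitration does), the
conversion looks like:

  /* Before: an open-coded loop walking the bitmap to the Nth set bit. */
  return find_nth_bit(bitmap, bitmap_size, vector % dest_vcpus);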
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Bypass the Centaur-only filter for the CPUID signature leaf so that
processing continues when the CPU vendor is Zhaoxin.
Signed-off-by: Ewan Hai <ewanhai-oc@zhaoxin.com>
Link: https://lore.kernel.org/r/20250818083034.93935-1-ewanhai-oc@zhaoxin.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The Free Software Foundation does not reside in "59 Temple Place"
anymore, so we should not mention that address in the source code here.
But instead of updating the address to their current location, let's
rather drop the license boilerplate text here and use a proper SPDX
license identifier instead. The text talks about the "GNU *Lesser*
General Public License" and "any later version", so LGPL-2.1+ is the
right choice here.
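I.e., the boilerplate collapses to a single header line, e.g. for a C
header:

  /* SPDX-License-Identifier: LGPL-2.1+ */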
Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20250728152843.310260-1-thuth@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Skip the WRMSR and HLT fastpaths in SVM's VM-Exit handler if the next RIP
isn't valid, e.g. because KVM is running with nrips=false. SVM must
decode and emulate to skip the instruction if the CPU doesn't provide the
next RIP, and getting the instruction bytes to decode requires reading
guest memory. Reading guest memory through the emulator can fault, i.e.
can sleep, which is disallowed since the fastpath handlers run with IRQs
disabled.
BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:106
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 32611, name: qemu
preempt_count: 1, expected: 0
INFO: lockdep is turned off.
irq event stamp: 30580
hardirqs last enabled at (30579): [<ffffffffc08b2527>] vcpu_run+0x1787/0x1db0 [kvm]
hardirqs last disabled at (30580): [<ffffffffb4f62e32>] __schedule+0x1e2/0xed0
softirqs last enabled at (30570): [<ffffffffb4247a64>] fpu_swap_kvm_fpstate+0x44/0x210
softirqs last disabled at (30568): [<ffffffffb4247a64>] fpu_swap_kvm_fpstate+0x44/0x210
CPU: 298 UID: 0 PID: 32611 Comm: qemu Tainted: G U 6.16.0-smp--e6c618b51cfe-sleep #782 NONE
Tainted: [U]=USER
Hardware name: Google Astoria-Turin/astoria, BIOS 0.20241223.2-0 01/17/2025
Call Trace:
<TASK>
dump_stack_lvl+0x7d/0xb0
__might_resched+0x271/0x290
__might_fault+0x28/0x80
kvm_vcpu_read_guest_page+0x8d/0xc0 [kvm]
kvm_fetch_guest_virt+0x92/0xc0 [kvm]
__do_insn_fetch_bytes+0xf3/0x1e0 [kvm]
x86_decode_insn+0xd1/0x1010 [kvm]
x86_emulate_instruction+0x105/0x810 [kvm]
__svm_skip_emulated_instruction+0xc4/0x140 [kvm_amd]
handle_fastpath_invd+0xc4/0x1a0 [kvm]
vcpu_run+0x11a1/0x1db0 [kvm]
kvm_arch_vcpu_ioctl_run+0x5cc/0x730 [kvm]
kvm_vcpu_ioctl+0x578/0x6a0 [kvm]
__se_sys_ioctl+0x6d/0xb0
do_syscall_64+0x8a/0x2c0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f479d57a94b
</TASK>
Note, this is essentially a reapply of commit 5c30e8101e8d ("KVM: SVM:
Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"), but with
different justification (KVM now grabs SRCU when skipping the instruction
for other reasons).
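The guard itself is simple (a sketch):

  /* svm_exit_handlers_fastpath(), sketch: bail if next RIP is unknown. */
  if (!nrips || !svm->vmcb->control.next_rip)
          return EXIT_FASTPATH_NONE;   /* skipping needs the emulator */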
Fixes: b439eb8ab578 ("Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250805190526.1453366-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Emulate PERF_CNTR_GLOBAL_STATUS_SET when PerfMonV2 is enumerated to the
guest, as the MSR is supposed to exist in all AMD v2 PMUs.
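Emulating a *_STATUS_SET MSR is essentially an OR into GLOBAL_STATUS,
roughly as follows (a sketch; the macro name and the reserved-bit handling
are assumptions):

  case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
          /* Writing '1' sets the corresponding GLOBAL_STATUS bit;
           * reserved bits must still be rejected/masked. */
          pmu->global_status |= data;
          break;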
Fixes: 4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
Cc: stable@vger.kernel.org
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20250711172746.1579423-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When completing emulation of an instruction that generated a userspace exit
for I/O, don't recheck L1 intercepts as KVM has already finished that
phase of instruction execution, i.e. has already committed to allowing L2
to perform I/O. If L1 (or host userspace) modifies the I/O permission
bitmaps during the exit to userspace, KVM will treat the access as being
intercepted despite already having emulated the I/O access.
Pivot on EMULTYPE_NO_DECODE to detect that KVM is completing emulation.
Of the three users of EMULTYPE_NO_DECODE, only complete_emulated_io() (the
intended "recipient") can reach the code in question. gp_interception()'s
use is mutually exclusive with is_guest_mode(), and
complete_emulated_insn_gp() unconditionally pairs EMULTYPE_NO_DECODE with
EMULTYPE_SKIP.
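In code, the pivot is essentially (a sketch; the helper names are
hypothetical stand-ins for the emulator's intercept-check path):

  /* Only check L1's I/O permission bitmaps on the initial pass, i.e.
   * skip the check when KVM is completing a userspace I/O exit. */
  if (is_guest_mode(vcpu) && !(emulation_type & EMULTYPE_NO_DECODE) &&
      l1_io_intercepted(vcpu, port, len))       /* hypothetical */
          return forward_to_l1(vcpu);           /* hypothetical */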
The bad behavior was detected by a syzkaller program that toggles port I/O
interception during the userspace I/O exit, ultimately resulting in a WARN
on vcpu->arch.pio.count being non-zero due to KVM not completing emulation
of the I/O instruction.
WARNING: CPU: 23 PID: 1083 at arch/x86/kvm/x86.c:8039 emulator_pio_in_out+0x154/0x170 [kvm]
Modules linked in: kvm_intel kvm irqbypass
CPU: 23 UID: 1000 PID: 1083 Comm: repro Not tainted 6.16.0-rc5-c1610d2d66b1-next-vm #74 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:emulator_pio_in_out+0x154/0x170 [kvm]
PKRU: 55555554
Call Trace:
<TASK>
kvm_fast_pio+0xd6/0x1d0 [kvm]
vmx_handle_exit+0x149/0x610 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xda8/0x1ac0 [kvm]
kvm_vcpu_ioctl+0x244/0x8c0 [kvm]
__x64_sys_ioctl+0x8a/0xd0
do_syscall_64+0x5d/0xc60
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
Reported-by: syzbot+cc2032ba16cc2018ca25@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68790db4.a00a0220.3af5df.0020.GAE@google.com
Fixes: 8a76d7f25f8f ("KVM: x86: Add x86 callback for intercept check")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250715190638.1899116-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
All CPUID call sites were updated at commit:
968e30006807 ("x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header")
to include <asm/cpuid/api.h> instead of <asm/cpuid.h>.
The <asm/cpuid.h> header was still retained as a wrapper, just in case
some new code in -next started using it. Now that everything is merged
to Linus' tree, remove the header.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250815070227.19981-2-darwi@linutronix.de
|
|
In order to support future versions of the SVSM_CORE_PVALIDATE call, all
reserved fields within a PVALIDATE entry must be set to zero: an SVSM is
expected to ensure that reserved fields are zero, so that future protocol
versions can make use of the reserved areas.
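I.e., roughly (a sketch; field names follow the SVSM PVALIDATE structures,
some assumed):

  /* Zero the whole entry so all reserved fields are guaranteed clear,
   * then fill in only the fields this protocol version defines. */
  memset(pe, 0, sizeof(*pe));
  pe->page_size = RMP_PG_SIZE_4K;
  pe->action    = action;
  pe->pfn       = pfn;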
Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/7cde412f8b057ea13a646fb166b1ca023f6a5031.1755098819.git.thomas.lendacky@amd.com
|