|
Our fault injection mechanism is mildly primitive, and doesn't
really implement the architecture when it comes to reporting
the level of a failing S1 PTW (we blindly report a SEA outside
of a PTW).
Now that we can walk the S1 page tables and look for a particular
IPA in the descriptors, it is pretty easy to improve the SEA
injection code.
Note that we only do it for AArch64 guests, and that 32bit guests
are left to their own devices (oddly enough, I don't fancy writing
a 32bit PTW...).
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Use the filtering hook infrastructure to implement a new walker
that, for a given VA and an IPA, returns the level of the first
occurrence of this IPA in the walk from that VA.
This will be used to improve our SEA syndrome reporting.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
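
As an illustration only, the idea of a per-level filter that runs before
each descriptor read, and of using it to find the level at which a given
IPA is first touched, can be sketched on a toy 4-level table. Everything
below (the toy_* names, the 2-bits-per-level layout, the 4kB alignment)
is hypothetical and is not the KVM code.

```c
#include <stdint.h>

#define TOY_LEVELS   4          /* levels 0..3 */
#define TOY_ENTRIES  4          /* 2 VA bits resolved per level */
#define TOY_NTABLES  8
#define TOY_VALID    1ULL
#define TOY_FOUND    1          /* filter asked to stop the walk */
#define TOY_FAULT    (-1)       /* invalid descriptor or bad table */

/* Toy "guest memory": table i lives at IPA i * 0x1000. */
struct toy_mem {
	uint64_t table[TOY_NTABLES][TOY_ENTRIES];
};

/* State handed to the filter before each descriptor read. */
struct toy_walk_ctx {
	int level;
	uint64_t table_ipa;
};

typedef int (*toy_filter_fn)(const struct toy_walk_ctx *ctx, void *priv);

static int toy_walk(const struct toy_mem *mem, uint64_t root_ipa,
		    unsigned int va, toy_filter_fn filter, void *priv)
{
	uint64_t table_ipa = root_ipa;

	for (int level = 0; level < TOY_LEVELS; level++) {
		struct toy_walk_ctx ctx = { .level = level, .table_ipa = table_ipa };
		/* The hook runs *before* the descriptor is read. */
		int ret = filter ? filter(&ctx, priv) : 0;
		if (ret)
			return ret;

		uint64_t t = table_ipa / 0x1000;
		unsigned int idx = (va >> (2 * (TOY_LEVELS - 1 - level))) & (TOY_ENTRIES - 1);
		if (t >= TOY_NTABLES)
			return TOY_FAULT;

		uint64_t desc = mem->table[t][idx];
		if (!(desc & TOY_VALID))
			return TOY_FAULT;
		table_ipa = desc & ~0xfffULL;	/* next-level table (or final page) */
	}
	return 0;
}

struct toy_match {
	uint64_t ipa;
	int level;
};

/* Filter: record the level whose table sits at the IPA we look for. */
static int toy_match_ipa(const struct toy_walk_ctx *ctx, void *priv)
{
	struct toy_match *m = priv;

	if ((m->ipa & ~0xfffULL) != ctx->table_ipa)
		return 0;
	m->level = ctx->level;
	return TOY_FOUND;
}

/* Level of the first occurrence of @ipa in the walk for @va, or -1. */
static int toy_find_level(const struct toy_mem *mem, uint64_t root_ipa,
			  unsigned int va, uint64_t ipa)
{
	struct toy_match m = { .ipa = ipa, .level = -1 };

	return toy_walk(mem, root_ipa, va, toy_match_ipa, &m) == TOY_FOUND ? m.level : -1;
}
```

Because the filter sees the table address before the read happens, an
access that would fault can still be attributed to a level.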
|
|
Add a filtering hook that can get called on each level of the
walk, and provides access to the full state.
Crucially, this is called *before* the access is made, so that
it is possible to track down the level of a faulting access.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
If calling into the AT code from guest EL1, there is no need
to consider any context switch, as we are guaranteed to be
in the correct context.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As we are about to plug the SW PTW into the EL1-only code, we can
no longer assume that the EL1 state is not resident on the CPU,
as we don't necessarily get there from EL2 traps.
Turn the __vcpu_sys_reg() access on the EL1 state into calls to
the vcpu_read_sys_reg() helper, which is guaranteed to do the
right thing.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As we are about to use the S1 PTW in non-NV contexts, we must make
sure that we don't evaluate the EL2 state when dealing with the EL1&0
translation regime.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Translation faults from TTBR must be reported on the start level,
and not level-0. Enforcing this requires moving quite a lot of
code around so that the start level can be computed early enough
that it is usable.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
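
For reference, a minimal sketch of how a start level follows from the
input address size and the granule, using the generic stage-1 rule that
each level resolves (pgshift - 3) bits. The helper names are
hypothetical, the sketch assumes ia_bits > pgshift, and it ignores any
further constraints the real code has to check.

```c
/* Round-up integer division. */
static unsigned int div_round_up_sketch(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * ia_bits: input address size in bits (64 - TxSZ)
 * pgshift: 12 (4kB), 14 (16kB) or 16 (64kB)
 *
 * Returns the level at which the walk starts, counting so that the
 * last level is always 3. A result of -1 is possible with 52bit
 * input addresses and 4kB pages.
 */
static int start_level_sketch(unsigned int ia_bits, unsigned int pgshift)
{
	unsigned int stride = pgshift - 3;	/* 8-byte descriptors */
	unsigned int levels = div_round_up_sketch(ia_bits - pgshift, stride);

	return 4 - (int)levels;
}
```

A translation fault taken on the TTBR itself would then be reported
against this level rather than against level 0.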
|
|
With 52bit PAs, block mappings can exist at different levels (such
as level 0 for 4kB pages, or level 1 for 16kB and 64kB pages).
Account for this in walk_s1().
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
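
A sketch of the resulting rule, for illustration: which levels may hold
a block descriptor for each granule, with the extra levels that 52bit
PAs bring. The function name and the single pa52 flag are assumptions
made for this example; walk_s1() may structure the check differently.

```c
#include <stdbool.h>

/*
 * pgshift: 12 (4kB), 14 (16kB) or 16 (64kB)
 * pa52:    true when 52bit PAs are in use for this granule
 */
static bool block_allowed_sketch(unsigned int pgshift, int level, bool pa52)
{
	switch (pgshift) {
	case 12:
		/* 4kB: levels 1 and 2, plus level 0 with 52bit PAs */
		return level == 1 || level == 2 || (pa52 && level == 0);
	case 14:
	case 16:
		/* 16kB and 64kB: level 2, plus level 1 with 52bit PAs */
		return level == 2 || (pa52 && level == 1);
	default:
		return false;
	}
}
```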
|
|
Expand the output address populated in PAR_EL1 to 52bit addresses.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
LPA2 gets the memory access shareability from TCR_ELx instead of
getting it from the descriptors. Store it in the walk info struct
so that it is passed around and evaluated as required.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Instead of just passing the translation regime, pass the full
walk_info structure to compute_par_s1(). This will help further
changes that will require it.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add a helper converting the descriptor into a nicely formed OA,
irrespective of the in-descriptor representation (< 52bit, LPA
or LPA2).
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
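
To make the three representations concrete, here is a hedged sketch of
such a conversion. It assumes the architectural layouts: OA[51:48] in
descriptor bits [15:12] for LPA (64kB granule), and OA[51:50] in
descriptor bits [9:8] for LPA2 (4kB and 16kB granules). The enum and
function names are hypothetical, not the helper added by this patch.

```c
#include <stdint.h>

enum pa52_sketch { PA52_NONE, PA52_LPA, PA52_LPA2 };

/* Mask covering bits hi..lo inclusive. */
static uint64_t mask_sketch(unsigned int hi, unsigned int lo)
{
	return (~0ULL >> (63 - hi)) & (~0ULL << lo);
}

/* pgshift: 12, 14 or 16. LPA only makes sense with pgshift == 16. */
static uint64_t desc_to_oa_sketch(uint64_t desc, enum pa52_sketch mode,
				  unsigned int pgshift)
{
	uint64_t oa;

	switch (mode) {
	case PA52_LPA:
		oa = desc & mask_sketch(47, 16);
		oa |= ((desc >> 12) & 0xf) << 48;	/* desc[15:12] -> OA[51:48] */
		break;
	case PA52_LPA2:
		oa = desc & mask_sketch(49, pgshift);
		oa |= ((desc >> 8) & 0x3) << 50;	/* desc[9:8] -> OA[51:50] */
		break;
	default:
		oa = desc & mask_sketch(47, pgshift);
		break;
	}
	return oa;
}
```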
|
|
52bit addresses from TTBR need extra adjustment and alignment
checks. Implement the requirements of the architecture.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Adjust the computation of the max OA to account for 52bit PAs.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Track whether the guest is using 52bit PAs, either LPA or LPA2.
This further simplifies the handling of LVA for 4k and 16k pages,
as LPA2 implies LVA in this case.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The changes to the debug architecture up to v8.8 are concerned with
external debug, which of course has no direct impact on VMs. Raise the
feature limit and document what's preventing us from raising it further.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
While KVM does not expose IMPDEF features to VMs, FEAT_TIDCP1 is an
architecturally-defined EL1 trap of a particular sysreg encoding range.
Furthermore, KVM already advertises this feature to non-NV VMs.
As there is no interaction with EL2 traps, expose the feature.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
FEAT_SpecSEI is an informational feature describing whether speculative
loads may generate SErrors. Since there are already cases where KVM
reinjects an SError into the VM, such an SError may already be the
result of a speculative load within the VM.
Stop hiding the feature from NV-enabled VMs.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM now handles HCR_EL2.{TWEDEn,TWEDEL} correctly when computing the
effective HCR for a nested context. Advertise the feature.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Ignore the guest hypervisor's configured TWE delay if it hasn't actually
requested WFE traps. Otherwise, OR'ing these fields into the effective
HCR when the guest sets TWE is safe as KVM doesn't use FEAT_TWED and
leaves the fields initialized to 0.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
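
The rule can be sketched as follows. The bit positions are the
architectural HCR_EL2 ones (TWE is bit 14, TWEDEn is bit 59, TWEDEL is
bits [63:60]); the macro and function names are made up for this
example, and only the three fields discussed here are folded in.

```c
#include <stdint.h>

#define SK_HCR_TWE	(1ULL << 14)
#define SK_HCR_TWEDEn	(1ULL << 59)
#define SK_HCR_TWEDEL	(0xfULL << 60)

/*
 * host_hcr:  what the host wants, with TWEDEn/TWEDEL left at 0
 * guest_hcr: the guest hypervisor's view of HCR_EL2
 */
static uint64_t fold_twed_sketch(uint64_t host_hcr, uint64_t guest_hcr)
{
	uint64_t fold = guest_hcr & (SK_HCR_TWE | SK_HCR_TWEDEn | SK_HCR_TWEDEL);

	/* No WFE trap requested by the guest: its delay is irrelevant. */
	if (!(fold & SK_HCR_TWE))
		fold = 0;

	return host_hcr | fold;
}
```

If the host traps WFE for its own purposes while the guest does not,
the guest's delay configuration never reaches the effective value.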
|
|
FEAT_AFP doesn't intersect with any EL2 trap behavior, so expose it to
NV-enabled VMs.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The exact wording of the restrictions on branch prediction due to
FEAT_ECBHB in DDI0487L.b is as follows:
When FEAT_ECBHB is implemented, the branch history information created
in a context before an exception to a higher Exception level using
AArch64 cannot be used by code before that exception to exploitatively
control the execution of any indirect branches in code in a different
context after the exception.
While vEL2 and EL1 are multiplexed at EL1, they exist in different
hardware-described contexts as KVM uses different stage-2 MMUs to
represent the corresponding translation regimes. Additionally, exception
entries into vEL2 always imply a hardware exception entry into literal EL2
for the emulated regime change.
Given all of this, and the fact that FEAT_ECBHB places no limitation on
the EL of the protected context after the exception, we can claim
FEAT_ECBHB on supporting hardware.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM already supports FEAT_RASv1p1 for NV-enabled VMs but only when
advertised through the canonical field. Stop masking the silly frac
field to expose the feature on systems without FEAT_DF.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The supporting infrastructure in KVM's abort injection code was merged a
while ago, but the author (me!) forgot to relax the NV limitation when
FEAT_DF2 got exposed to non-NV VMs. Fix it.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
ID_AA64DFR0_EL1.DoubleLock is one of those annoying signed feature
fields where a non-negative value implies that a feature is implemented
and a negative value implies that it is not. While the intention of
masking this field was likely to hide the feature, KVM actually
advertises it, even on unsupporting hardware.
Remove FEAT_DoubleLock from the mask, making the NI value visible to the
VM. Take care to accept the old, incorrect values for this field as
we've lied to userspace.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
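
A small sketch of why masking such a field to zero advertises the
feature instead of hiding it. DoubleLock occupies bits [39:36] of
ID_AA64DFR0_EL1; the helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_DOUBLELOCK_SHIFT	36

/* Read a 4-bit ID register field as a signed value (-8..7). */
static int signed_field_sketch(uint64_t reg, unsigned int shift)
{
	int v = (int)((reg >> shift) & 0xf);

	return v >= 8 ? v - 16 : v;
}

/* Signed fields: non-negative means implemented, negative means not. */
static bool signed_feat_implemented_sketch(uint64_t reg, unsigned int shift)
{
	return signed_field_sketch(reg, shift) >= 0;
}
```

With this encoding, a field cleared to 0b0000 reads as "implemented",
while the NI value is 0b1111.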
|
|
Consistently use denylisting of features such that the limitations of
KVM's nested implementation are explicitly documented (rather than
implied).
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Assert that the EL2 features {HCX, TWED} of ID_AA64MMFR1_EL1 are writable
from userspace. They are only allowed to be downgraded in userspace.
Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Allow userspace to downgrade {HCX, TWED} in ID_AA64MMFR1_EL1. Userspace can
only change the value from high to low.
Signed-off-by: Jinqian Yang <yangjinqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
While MDCR_EL2 cannot be RES0, convert it to the same infrastructure
anyway, as it makes things cleaner.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
While SCTLR_EL1 cannot be RES0, convert it to the same infrastructure
anyway, as it makes things cleaner.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Enforce that TCR2_EL2 is RES0 when FEAT_TCR2 isn't present.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Enforce that SCTLR2_EL{1,2} are RES0 when FEAT_SCTLR2 isn't present.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
HCR_EL2 is unlikely to ever be RES0 (at least when NV is on),
but consistency doesn't hurt, and it can be described in the same
way as the other registers.
Convert it over to the new RES0-computing infrastructure.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add the dependency between the HCRX_EL2 register and FEAT_HCX.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Similarly to the FEAT_FGT registers, add the dependency between
the registers and the controlling feature.
While we're at it, add the missing checks for the RES0 vs valid
bit overlap.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As we want to enforce FGT registers behaving as RES0 when FEAT_FGT
is not exposed to the guest, we move a bunch of things that are
so far passed as parameters into a structure that points to the
bit description.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
struct reg_bits_to_feat_map is great to describe bit-to-feature
dependency, but not so much to describe register-to-feature
dependency. Yet both need to exist.
Add a new reg_feat_map_desc structure to describe this.
Extra complexity is added by the need to source the RES0 bits from
the runtime-computed FGT masks, for which we need an extra flag
and extra complexity. Oh well.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Turns out I'm rather bad at noticing that the description of features
has already been added. Remove superfluous definitions for SYSREG128
and MTE2.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM needs to ensure the guest hypervisor's traps take effect when the
vCPU is in a nested context. While supporting infrastructure is in place
for most of the EL2 trap registers, MDCR_EL2 is not.
Fold the guest's trap configuration into the effective MDCR_EL2. Apply
it directly to the in-memory representation as it gets recomputed on
every vcpu_load() anyway.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In case you haven't realized it yet, the architecture is _slightly_
broken in the context of nested virt. Here we have another example of
FEAT_NV2 redirecting a sysreg (MDSCR_EL1) to memory that actually
affects execution at vEL2.
Fortunately, MDCR_EL2.TDA provides the necessary traps to hide this
mess at the expense of unnecessarily trapping the breakpoint/watchpoint
registers. Yes, FEAT_FGT gives us a precise trap but let's just opt for
obvious correctness to start.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The presence of FEAT_GCIE_LEGACY is now handled as a CPU
feature. Therefore, drop the check and flag from the GIC driver and
gic_kvm_info as it is no longer required or used by KVM.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The previous implementation of the probing function had the flaw that
it wouldn't catch mismatched CPU features. Specifically, GICv5 legacy
support (support for GICv3 VMs on a GICv5 host) was being enabled as
long as the initial boot CPU had support for the feature. This allowed
the support to become enabled on mismatched configurations.
Move to using cpus_have_final_cap(ARM64_HAS_GICV5_LEGACY) instead,
which only returns true when all booted CPUs support
FEAT_GCIE_LEGACY. A byproduct of this is that it ensures that late
onlining of CPUs is blocked on feature mismatch.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Implement the GCIE_LEGACY capability as a system feature to be able to
check for support from KVM. The type is explicitly
ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE, which means that the capability
is enabled early if all boot CPUs support it. Additionally, if this
capability is enabled during boot, it prevents late onlining of CPUs
that lack it, thereby avoiding potential mismatched configurations
which would break KVM.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Extend the NV check to pass for a GICv5 host that has
FEAT_GCIE_LEGACY. The has_gcie_v3_compat flag is only set on GICv5
hosts (that explicitly support FEAT_GCIE_LEGACY), and hence the
explicit check for a VGIC_V5 is omitted.
As of this change, vGICv3-based VMs can run with nested virtualization
on a compatible GICv5 host.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We currently access ICC_SRE_EL2 at each load/put on VHE, and on each
entry/exit on nVHE. Both are quite onerous on NV, as this register
always traps.
We do this to make sure the EL1 guest doesn't flip between v2 and v3
behind our back. But all modern implementations have dropped v2,
and this is just overhead.
At the same time, the GICv5 spec has been fixed to allow access to
ICC_SRE_EL2 in legacy mode. Use this opportunity to replace the
GICv5 checks with v2 compat checks, using an ad-hoc static key.
Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Map the hyp text section as RO: there are no secrets there,
and this allows the kernel to extract info for debugging.
In case of a panic, we can now dump the faulting instructions,
similar to the kernel.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Similar to the kernel panic, where the instruction code is printed,
we can do the same for hypervisor panics.
This patch does that only in case of "CONFIG_NVHE_EL2_DEBUG" or nVHE.
The next patch adds support for pKVM.
Also, remove the hardcoded argument to dump_kernel_instr().
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Kunwu Chan <chentao@kylinos.cn>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Device MMIO registration may happen quite frequently during VM boot,
and the SRCU synchronization each time has a measurable effect
on VM startup time. In our experiments it can account for around 25%
of a VM's startup time.
Replace the synchronization with a deferred free of the old kvm_io_bus
structure.
Tested-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
This ensures that, if a VCPU has "observed" that an IO registration has
occurred, the instruction currently being trapped or emulated will also
observe the IO registration.
At the same time, enforce that kvm_get_bus() is used only on the
update side, ensuring that a long-term reference cannot be obtained by
an SRCU reader.
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation to remove synchronize_srcu() from MMIO registration,
remove the distributor's dependency on this implicit barrier by
direct acquire-release synchronization on the flag write and its
lock-free check.
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|