| Age | Commit message | Author | Files | Lines |
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/dt
Samsung DTS ARM changes for v6.18
1. Drop S3C2416 SoC from bindings, because it was removed from kernel
in 2023.
2. Add Ethernet attached via SROM controller (memory bus) on SMDK5250.
This wasn't tested, but code should work just like it is working on
Exynos5410-based boards.
* tag 'samsung-dt-6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: dts: samsung: smdk5250: add sromc node
ARM: dts: samsung: exynos5250: describe sromc bank memory map
ARM: dts: samsung: exynos5410: use multiple tuples for sromc ranges
dt-bindings: arm: samsung: Drop S3C2416
Link: https://lore.kernel.org/r/20250909184559.105777-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt into soc/dt
Minor improvements in ARM64 DTS for v6.18
Add default address cells for interrupt controllers to fix dtc W=1
warnings on Amazon, APM, Socionext and Toshiba boards.
* tag 'dt64-cleanup-6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt:
arm64: dts: toshiba: tmpv7708: Add default GIC address cells
arm64: dts: amazon: alpine-v3: Add default GIC address cells
arm64: dts: amazon: alpine-v2: Add default GIC address cells
arm64: dts: apm: storm: Add default GIC address cells
arm64: dts: socionext: uniphier-pxs3: Add default PCI interrup controller address cells
arm64: dts: socionext: uniphier-ld20: Add default PCI interrup controller address cells
Link: https://lore.kernel.org/r/20250909182256.102840-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into soc/dt
i2c-gpio-fixes-for-6.18
We have dedicated bindings for scl/sda nowadays. Switch away from the
deprecated plain 'gpios' property.
* tag 'i2c-gpio-fixes-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
ARM: dts: stm32: use recent scl/sda gpio bindings
ARM: dts: cirrus: ep7211: use recent scl/sda gpio bindings
Link: https://lore.kernel.org/r/aLlgGdrFEjh26knK@shikoro
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Having a platform need a mach-* directory should be seen as a negative,
it means the platform needs special non-standard handling. ARM64 support
does not allow mach-* directories at all. While we may not get to that
given all the non-standard architectures we support, we should still try
to get as close as we can and reduce the number of mach directories.
The mach-hpe/ directory and its files provide just one "feature":
having the kernel print the machine name if the DTB does not also contain
a "model" string (which they always do). To reduce the number of mach-*
directories let's do without that feature and remove this directory.
Note, we drop the l2c_aux_mask = ~0 line, but this is safe as
the fallback GENERIC_DT machine has that as the default.
Signed-off-by: Andrew Davis <afd@ti.com>
Link: https://lore.kernel.org/r/20250813170308.290349-1-afd@ti.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/dt
Samsung DTS ARM64 changes for v6.18
1. Exynos850 e850 board: Enable Ethernet.
2. Exynos990: Enable watchdog and USB, add more clock controllers.
3. Exynos2200: Switch to 32-bit address space for blocks, because all
peripherals fit there. Add remaining serial engine (USI) nodes
(serial, I2C).
4. New Artpec ARTPEC-8 SoC with board. That's a design from Samsung,
sharing all basic blocks with other Samsung SoCs (busses, clock
controllers, pin controllers, PCIe, USB) and having media/video
related blocks from Axis.
Only basic support is added here: a few clock controllers, pin
controller and UART.
5. Several cleanups.
* tag 'samsung-dt64-6.18' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
arm64: dts: exynos990: Enable PERIC0 and PERIC1 clock controllers
arm64: dts: axis: Add ARTPEC-8 Grizzly dts support
arm64: dts: exynos: axis: Add initial ARTPEC-8 SoC support
dt-bindings: arm: axis: Add ARTPEC-8 grizzly board
arm64: dts: exynos8895: Minor whitespace cleanup
dt-bindings: arm: Convert Axis board/soc bindings to json-schema
arm64: dts: exynos2200: Add default GIC address cells
arm64: dts: fsd: Add default GIC address cells
arm64: dts: google: gs101: Add default GIC address cells
arm64: dts: exynos5433: Add default GIC address cells
arm64: dts: exynos2200: define all usi nodes
arm64: dts: exynos2200: increase the size of all syscons
arm64: dts: exynos2200: use 32-bit address space for /soc
arm64: dts: exynos2200: fix typo in hsi2c23 bus pins label
arm64: dts: exynos990-r8s: Enable USB
arm64: dts: exynos990-c1s: Enable USB
arm64: dts: exynos990-x1s-common: Enable USB
arm64: dts: exynos990: Add USB nodes
arm64: dts: exynos990: Enable watchdog timer
arm64: dts: exynos: Add Ethernet node for E850-96 board
Link: https://lore.kernel.org/r/20250909180127.99783-4-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into soc/dt
SoCFPGA DTS updates for v6.18
- Add and enable gmac for Agilex5
* tag 'socfpga_dts_updates_for_v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
arm64: dts: socfpga: agilex5: enable gmac2 on the Agilex5 dev kit
arm64: dts: Agilex5 Add gmac nodes to DTSI for Agilex5
Link: https://lore.kernel.org/r/20250908040718.187857-1-dinguyen@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
HDMI-CEC and -audio on RK3288-Miqi
* tag 'v6.18-rockchip-dts32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: dts: rockchip: add HDMI audio to rk3288-miqi
ARM: dts: rockchip: add CEC pinctrl to rk3288-miqi
Link: https://lore.kernel.org/r/12138356.VV5PYv0bhD@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt
New boards: FriendlyElec NanoPi Zero2, ArmSoM Sige1, Radxa ROCK 2A/2F,
HINLINK H66K / H68K.
Interesting new peripherals: I guess the most interesting one is likely
the NPU on RK3588. The rocket driver has been merged into both the DRM
tree as well as mainline Mesa.
Other still interesting ones are DW-Displayport on RK3588, DSI on RK3576
(missing soc pwm-support to be useful on most boards), thermal support
and watchdog on RK3576.
The rest are peripheral additions on a number of boards (Beelink A1,
Pine{phone,book}, rk3576-evb1-v10, Rock 5*, ...)
* tag 'v6.18-rockchip-dts64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (46 commits)
arm64: dts: rockchip: Enable DP2HDMI for ROCK 5 ITX
arm64: dts: rockchip: Enable DisplayPort for rk3588s Cool Pi 4B
arm64: dts: rockchip: Add DP1 for rk3588
arm64: dts: rockchip: Add DP0 for rk3588
arm64: dts: rockchip: Add FriendlyElec NanoPi Zero2
dt-bindings: arm: rockchip: Add FriendlyElec NanoPi Zero2
arm64: dts: rockchip: Add ArmSoM Sige1
dt-bindings: arm: rockchip: Add ArmSoM Sige1
arm64: dts: rockchip: Add Radxa ROCK 2A/2F
dt-bindings: arm: rockchip: Add Radxa ROCK 2A/2F
dt-bindings: soc: rockchip: add missing clock reference for rk3576-dcphy syscon
arm64: dts: rockchip: add USB3 on Beelink A1
arm64: dts: rockchip: add SPDIF audio to Beelink A1
arm64: dts: rockchip: add IR receiver to rk3328-roc
arm64: dts: rockchip: Further describe the WiFi for the Pinephone Pro
arm64: dts: rockchip: Further describe the WiFi for the Pinebook Pro
arm64: dts: rockchip: Enable the NPU on NanoPi R6C/R6S
arm64: dts: rockchip: enable NPU on OPI5/5B
arm64: dts: rockchip: Add Bluetooth on rk3576-evb1-v10
arm64: dts: rockchip: Add WiFi on rk3576-evb1-v10
...
Link: https://lore.kernel.org/r/5241735.C4sosBPzcN@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux into soc/dt
T-HEAD Devicetrees for v6.18
Add a device tree node for the IMG BXM-4-64 GPU present in the T-HEAD
TH1520 SoC used by the Lichee Pi 4A board. This node enables support for
the GPU using the drm/imagination driver.
By adding this node, the kernel can recognize and initialize the GPU,
providing graphics acceleration capabilities on the Lichee Pi 4A and
other boards based on the TH1520 SoC. The display controller and HDMI
output are still a work in progress.
Also included is a MAINTAINERS patch that adds an entry for the T-Head
SoC patchwork.
Signed-off-by: Drew Fustini <fustini@kernel.org>
* tag 'thead-dt-for-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/fustini/linux:
MAINTAINERS: Add RISC-V T-HEAD SoC patchwork
riscv: dts: thead: th1520: Add IMG BXM-4-64 GPU node
Link: https://lore.kernel.org/r/aLyIXR1G9DUzwGWc@x1
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
- Enable Netfilter legacy tables support,
- Drop CONFIG_IP_NF_FILTER=m, CONFIG_IP_NF_MANGLE=m,
CONFIG_IP6_NF_FILTER=m, and CONFIG_IP6_NF_MANGLE=m (auto-modular
since commit 9fce66583f06c212 ("netfilter: Exclude LEGACY TABLES on
PREEMPT_RT.")),
- Enable legacy EBTABLES support (no longer auto-selected since commit
9fce66583f06c212 ("netfilter: Exclude LEGACY TABLES on
PREEMPT_RT.")),
- Drop CONFIG_CDROM_PKTCDVD=m (removed in commit 1cea5180f2f812c4
("block: remove pktcdvd driver")),
- Move CONFIG_CRC_BENCHMARK=y (moved in commit 89a51591405e09a8
("lib/crc: Move files into lib/crc/")).
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/012bc96a01eef989b39eedbe84591bd50c022e57.1754904412.git.geert@linux-m68k.org
|
|
The function signatures of the m68k-optimized implementations of the
find_{first,next}_{,zero_}bit() helpers do not match the generic
variants.
Fix this by changing all non-pointer inputs and outputs to "unsigned
long", and updating a few local variables.
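For illustration only, a minimal portable sketch of a helper with the
"unsigned long" signature described above. This is not the m68k-optimized
code; sketch_find_first_bit() and SKETCH_BITS_PER_LONG are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical, portable stand-in for find_first_bit(): the size argument
 * and the return value are both "unsigned long".
 * Returns the index of the first set bit, or size if no bit is set.
 */
static unsigned long sketch_find_first_bit(const unsigned long *addr,
					   unsigned long size)
{
	unsigned long i;

	assert(addr != NULL);
	for (i = 0; i < size; i++) {
		if (addr[i / SKETCH_BITS_PER_LONG] &
		    (1UL << (i % SKETCH_BITS_PER_LONG)))
			return i;
	}
	return size;
}
```

With matching types, callers can compare the result against size without
any implicit narrowing or sign conversion.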
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509092305.ncd9mzaZ-lkp@intel.com/
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: "Yury Norov (NVIDIA)" <yury.norov@gmail.com>
Link: https://patch.msgid.link/de6919554fbb4cd1427155c6bafbac8a9df822c8.1757517135.git.geert@linux-m68k.org
|
|
Detect the Bhyve hypervisor and enable 15-bit MSI support if available.
Detecting Bhyve used to be a purely cosmetic issue of the kernel printing
'Hypervisor detected: Bhyve' at boot time.
But FreeBSD 15.0 will support¹ the 15-bit MSI enlightenment to support
more than 255 vCPUs (http://david.woodhou.se/ExtDestId.pdf) which means
there's now actually some functional reason to do so.
¹ https://github.com/freebsd/freebsd-src/commit/313a68ea20b4
[ bp: Massage, move tail comment ontop. ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ahmed S. Darwish <darwi@linutronix.de>
Link: https://lore.kernel.org/03802f6f7f5b5cf8c5e8adfe123c397ca8e21093.camel@infradead.org
|
|
Map the hyp text section as RO. There are no secrets there, and this
allows the kernel to extract info for debugging: in case of a panic we
can now dump the faulting instructions, similar to the kernel.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Similar to the kernel panic, where the instruction code is printed,
we can do the same for hypervisor panics.
This patch does that only in case of "CONFIG_NVHE_EL2_DEBUG" or nvhe.
The next patch adds support for pKVM.
Also, remove the hardcoded argument dump_kernel_instr().
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Kunwu Chan <chentao@kylinos.cn>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Currently the uprobe syscall handles all errors by forcing SIGILL on the
current process. As suggested by Andrii, it would be helpful for uprobe
syscall detection to return an error value for the !in_uprobe_trampoline check.
This way we can just call the uprobe syscall and find out from the return
value whether the kernel has it.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
|
|
The ns_bpf_qdisc selftest triggers a kernel panic:
Unable to handle kernel paging request at virtual address ffffffffa38dbf58
Current test_progs pgtable: 4K pagesize, 57-bit VAs, pgdp=0x00000001109cc000
[ffffffffa38dbf58] pgd=000000011fffd801, p4d=000000011fffd401, pud=000000011fffd001, pmd=0000000000000000
Oops [#1]
Modules linked in: bpf_testmod(OE) xt_conntrack nls_iso8859_1 [...] [last unloaded: bpf_testmod(OE)]
CPU: 1 UID: 0 PID: 23584 Comm: test_progs Tainted: G W OE 6.17.0-rc1-g2465bb83e0b4 #1 NONE
Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2024.01+dfsg-1ubuntu5.1 01/01/2024
epc : __qdisc_run+0x82/0x6f0
ra : __qdisc_run+0x6e/0x6f0
epc : ffffffff80bd5c7a ra : ffffffff80bd5c66 sp : ff2000000eecb550
gp : ffffffff82472098 tp : ff60000096895940 t0 : ffffffff8001f180
t1 : ffffffff801e1664 t2 : 0000000000000000 s0 : ff2000000eecb5d0
s1 : ff60000093a6a600 a0 : ffffffffa38dbee8 a1 : 0000000000000001
a2 : ff2000000eecb510 a3 : 0000000000000001 a4 : 0000000000000000
a5 : 0000000000000010 a6 : 0000000000000000 a7 : 0000000000735049
s2 : ffffffffa38dbee8 s3 : 0000000000000040 s4 : ff6000008bcda000
s5 : 0000000000000008 s6 : ff60000093a6a680 s7 : ff60000093a6a6f0
s8 : ff60000093a6a6ac s9 : ff60000093140000 s10: 0000000000000000
s11: ff2000000eecb9d0 t3 : 0000000000000000 t4 : 0000000000ff0000
t5 : 0000000000000000 t6 : ff60000093a6a8b6
status: 0000000200000120 badaddr: ffffffffa38dbf58 cause: 000000000000000d
[<ffffffff80bd5c7a>] __qdisc_run+0x82/0x6f0
[<ffffffff80b6fe58>] __dev_queue_xmit+0x4c0/0x1128
[<ffffffff80b80ae0>] neigh_resolve_output+0xd0/0x170
[<ffffffff80d2daf6>] ip6_finish_output2+0x226/0x6c8
[<ffffffff80d31254>] ip6_finish_output+0x10c/0x2a0
[<ffffffff80d31446>] ip6_output+0x5e/0x178
[<ffffffff80d2e232>] ip6_xmit+0x29a/0x608
[<ffffffff80d6f4c6>] inet6_csk_xmit+0xe6/0x140
[<ffffffff80c985e4>] __tcp_transmit_skb+0x45c/0xaa8
[<ffffffff80c995fe>] tcp_connect+0x9ce/0xd10
[<ffffffff80d66524>] tcp_v6_connect+0x4ac/0x5e8
[<ffffffff80cc19b8>] __inet_stream_connect+0xd8/0x318
[<ffffffff80cc1c36>] inet_stream_connect+0x3e/0x68
[<ffffffff80b42b20>] __sys_connect_file+0x50/0x88
[<ffffffff80b42bee>] __sys_connect+0x96/0xc8
[<ffffffff80b42c40>] __riscv_sys_connect+0x20/0x30
[<ffffffff80e5bcae>] do_trap_ecall_u+0x256/0x378
[<ffffffff80e69af2>] handle_exception+0x14a/0x156
Code: 892a 0363 1205 489c 8bc1 c7e5 2d03 084a 2703 080a (2783) 0709
---[ end trace 0000000000000000 ]---
The bpf_fifo_dequeue prog returns a skb, which is a pointer. The pointer
is treated as a 32-bit value and sign-extended to 64 bits in the epilogue.
This behavior is right for most bpf prog types but wrong for struct ops,
which requires RISC-V ABI.
So let's sign extend struct ops return values according to the function
model and RISC-V ABI([0]).
[0]: https://riscv.org/wp-content/uploads/2024/12/riscv-calling.pdf
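A minimal sketch of the failure mode, assuming a 64-bit return register:
sign-extending the low 32 bits of a kernel pointer discards its upper half.
sext32() and epilogue_ret() are hypothetical helpers, not the JIT code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low 32 bits of a register value to 64 bits, i.e. what
 * the text describes the epilogue doing for ordinary return values.
 */
static uint64_t sext32(uint64_t reg)
{
	uint64_t low = reg & 0xffffffffULL;

	return (low ^ 0x80000000ULL) - 0x80000000ULL;
}

/* Hypothetical epilogue helper: a pointer must be returned untouched. */
static uint64_t epilogue_ret(uint64_t reg, int returns_pointer)
{
	return returns_pointer ? reg : sext32(reg);
}
```

For example, a pointer such as 0xff60000093a6a600 would come back as
0xffffffff93a6a600 if it were treated as a 32-bit value. The bad address in
the trace above appears to have the same sign-extended shape.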
Fixes: 25ad10658dc1 ("riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/bpf/20250908012448.1695-1-hengqi.chen@gmail.com
|
|
The bpf_flush_icache() is done by bpf_arch_text_copy() already.
Remove the duplicated one in arch_prepare_bpf_trampoline().
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/bpf/20250904105119.21861-1-hengqi.chen@gmail.com
|
|
The logic for allocating ppc64_stub_entry trampolines in the .stubs
section relies on an inline sentinel, where a NULL .funcdata member
indicates an available slot.
While preceding commits fixed the initialization bugs that led to ftrace
stub corruption, the sentinel-based approach remains fragile: it depends
on an implicit convention between subsystems modifying different
struct types in the same memory area.
Replace the sentinel with an explicit counter, module->arch.num_stubs.
Instead of iterating through memory to find a NULL marker, the module
loader uses this counter as the boundary for the next free slot.
This simplifies the allocation code, hardens it against future changes
to stub structures, and removes the need for an extra relocation slot
previously reserved to terminate the sentinel search.
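The counter-based scheme can be sketched as follows. The structures and
names here (sketch_stub, sketch_stub_area, sketch_stub_alloc) are simplified,
hypothetical stand-ins and not the module loader's real types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct sketch_stub {
	unsigned long funcdata;
};

struct sketch_stub_area {
	struct sketch_stub *stubs;	/* start of the stub array */
	size_t capacity;		/* slots reserved at load time */
	size_t num_stubs;		/* slots handed out so far */
};

/*
 * Return the next free slot, or NULL when the area is full. The counter is
 * the only source of truth; slot contents are never inspected, so a slot
 * whose funcdata happens to be 0 is not mistaken for a free one.
 */
static struct sketch_stub *sketch_stub_alloc(struct sketch_stub_area *area)
{
	assert(area != NULL);
	if (area->num_stubs >= area->capacity)
		return NULL;
	return &area->stubs[area->num_stubs++];
}
```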
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-4-joe.lawrence@redhat.com
|
|
CONFIG_PPC_FTRACE_OUT_OF_LINE introduced setup_ftrace_ool_stubs() to
extend the ppc64le module .stubs section with an array of
ftrace_ool_stub structures for each patchable function.
Fix its ppc64_stub_entry stub reservation loop to properly write across
all of the num_stubs used and not just the first entry.
Fixes: eec37961a56a ("powerpc64/ftrace: Move ftrace sequence out of line")
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-3-joe.lawrence@redhat.com
|
|
When an ftrace call site is converted to a NOP, its corresponding
dyn_ftrace record should have its ftrace_ops pointer set to
ftrace_nop_ops.
Correct the powerpc implementation to ensure the
ftrace_rec_set_nop_ops() helper is called on all successful NOP
initialization paths. This ensures all ftrace records are consistent
before being handled by the ftrace core.
Fixes: eec37961a56a ("powerpc64/ftrace: Move ftrace sequence out of line")
Suggested-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250912142740.3581368-2-joe.lawrence@redhat.com
|
|
Configure mbm_event mode on AMD platforms. On AMD platforms, it is
recommended to use the mbm_event mode, if supported, to prevent the
hardware from resetting counters between reads. This can result in
misleading values or display "Unavailable" if no counter is assigned
to the event.
Enable mbm_event mode, known as ABMC (Assignable Bandwidth Monitoring
Counters) on AMD, by default if the system supports it.
Update ABMC across all logical processors within the resctrl domain to
ensure proper functionality.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
|
|
When Linux is booted at EL1, host_data_ptr() resolves to the nVHE
hypervisor's copy of host data. When hyp mode isn't available for
KVM the nVHE percpu bases remain uninitialized. Consequently, any usage
of host_data_ptr() will result in a NULL dereference which has been
observed in KVM's trace filtering helpers.
Add an early return to the trace filtering helpers if KVM isn't
initialized, avoiding the NULL dereference. Take this opportunity
to move the TRBE-skipping checks to a common helper.
Fixes: 054b88391bbe2 ("KVM: arm64: Support trace filtering for guests")
Signed-off-by: Yingchao Deng <yingchao.deng@oss.qualcomm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
[maz: repainted the helpers to be readable, and the commit message
with Oliver's suggestion]
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
System software reads resctrl event data for a particular resource by writing
the RMID and Event Identifier (EvtID) to the QM_EVTSEL register and then
reading the event data from the QM_CTR register.
In ABMC mode, the event data of a specific counter ID is read by setting the
following fields: QM_EVTSEL.ExtendedEvtID = 1, QM_EVTSEL.EvtID = L3CacheABMC
(=1) and setting QM_EVTSEL.RMID to the desired counter ID. Reading the QM_CTR
then returns the contents of the specified counter ID. RMID_VAL_ERROR bit is
set if the counter configuration is invalid, or if an invalid counter ID is
set in the QM_EVTSEL.RMID field. RMID_VAL_UNAVAIL bit is set if the counter
data is unavailable.
Introduce resctrl_arch_reset_cntr() and resctrl_arch_cntr_read() to reset and
read event data for a specific counter.
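A sketch of the register encoding described above. The field positions
(ExtendedEvtID at bit 31, RMID starting at bit 32, error and unavailable
flags at bits 63 and 62) are assumptions made for illustration, and the SK_
and sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions, for illustration only. */
#define SK_EVTSEL_EXT_EVT_ID	(1ULL << 31)	/* QM_EVTSEL.ExtendedEvtID */
#define SK_EVTSEL_EVT_ABMC	1ULL		/* EvtID = L3CacheABMC (=1) */
#define SK_EVTSEL_RMID_SHIFT	32		/* QM_EVTSEL.RMID */
#define SK_CTR_ERROR		(1ULL << 63)	/* RMID_VAL_ERROR */
#define SK_CTR_UNAVAIL		(1ULL << 62)	/* RMID_VAL_UNAVAIL */

/* Build the QM_EVTSEL value that selects counter cntr_id in ABMC mode. */
static uint64_t sk_abmc_evtsel(uint32_t cntr_id)
{
	return ((uint64_t)cntr_id << SK_EVTSEL_RMID_SHIFT) |
	       SK_EVTSEL_EXT_EVT_ID | SK_EVTSEL_EVT_ABMC;
}

/*
 * Decode a raw QM_CTR value. Returns 0 and stores the count on success,
 * -1 for an invalid counter ID or configuration, -2 if data is unavailable.
 */
static int sk_decode_qm_ctr(uint64_t raw, uint64_t *count)
{
	if (raw & SK_CTR_ERROR)
		return -1;
	if (raw & SK_CTR_UNAVAIL)
		return -2;
	*count = raw;
	return 0;
}
```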
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
|
|
resctrl_arch_rmid_read() adjusts the value obtained from MSR_IA32_QM_CTR to
account for the overflow for MBM events and apply counter scaling for all the
events. This logic is common to both reading an RMID and reading a hardware
counter directly.
Refactor the hardware value adjustment logic into get_corrected_val() to
prepare for support of reading a hardware counter.
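A minimal sketch of this kind of adjustment, assuming a counter that is only
"width" bits wide and a per-chunk scale factor. The sk_ helpers are
hypothetical and do not reproduce get_corrected_val() itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of chunks counted between two raw reads of a counter that is only
 * `width` bits wide (1..64). Shifting both values to the top of a 64-bit
 * word makes the subtraction wrap the same way the narrow counter does.
 */
static uint64_t sk_overflow_delta(uint64_t prev, uint64_t cur,
				  unsigned int width)
{
	unsigned int shift = 64 - width;

	assert(width >= 1 && width <= 64);
	return ((cur << shift) - (prev << shift)) >> shift;
}

/* Hypothetical scaling step: convert chunks to bytes. */
static uint64_t sk_scale(uint64_t chunks, uint64_t scale)
{
	return chunks * scale;
}
```

With a 24-bit counter, a previous read of 0xFFFFF0 followed by a read of
0x10 yields a delta of 0x20 chunks even though the raw value went down.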
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
|
|
with ABMC
The ABMC feature allows users to assign a hardware counter to an RMID,
event pair and monitor bandwidth usage as long as it is assigned. The
hardware continues to track the assigned counter until it is explicitly
unassigned by the user.
Implement an x86 architecture-specific handler to configure a counter. This
architecture specific handler is called by resctrl fs when a counter is
assigned or unassigned as well as when an already assigned counter's
configuration should be updated. Configure counters by writing to the
L3_QOS_ABMC_CFG MSR, specifying the counter ID, bandwidth source (RMID),
and event configuration.
The ABMC feature details are documented in APM [1] available from [2].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
|
|
The ABMC feature allows users to assign a hardware counter to an RMID,
event pair and monitor bandwidth usage as long as it is assigned. The
hardware continues to track the assigned counter until it is explicitly
unassigned by the user.
The ABMC feature implements an MSR L3_QOS_ABMC_CFG (C000_03FDh).
ABMC counter assignment is done by setting the counter id, bandwidth
source (RMID) and bandwidth configuration.
Attempts to read or write the MSR when ABMC is not enabled will result
in a #GP(0) exception.
Introduce the data structures and definitions for MSR L3_QOS_ABMC_CFG
(C000_03FDh):
==========================================================================
Bits    Mnemonic  Description                 Access Type  Reset Value
==========================================================================
63      CfgEn     Configuration Enable        R/W          0
62      CtrEn     Enable/disable counting     R/W          0
61:53   –         Reserved                    MBZ          0
52:48   CtrID     Counter Identifier          R/W          0
47      IsCOS     BwSrc field is a CLOSID     R/W          0
                  (not an RMID)
46:44   –         Reserved                    MBZ          0
43:32   BwSrc     Bandwidth Source            R/W          0
                  (RMID or CLOSID)
31:0    BwType    Bandwidth configuration     R/W          0
                  tracked by the CtrID
==========================================================================
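As an illustration, the bit layout in the table can be packed into a 64-bit
value like this. sk_abmc_cfg() is a hypothetical helper, not the kernel's
definition; reserved bits are left at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the L3_QOS_ABMC_CFG fields from the table into a 64-bit value. */
static uint64_t sk_abmc_cfg(int cfg_en, int ctr_en, unsigned int ctr_id,
			    int is_cos, unsigned int bw_src, uint32_t bw_type)
{
	uint64_t v = 0;

	assert(ctr_id < (1u << 5));	/* CtrID is bits 52:48 */
	assert(bw_src < (1u << 12));	/* BwSrc is bits 43:32 */

	v |= (uint64_t)(cfg_en != 0) << 63;	/* CfgEn */
	v |= (uint64_t)(ctr_en != 0) << 62;	/* CtrEn */
	v |= (uint64_t)ctr_id << 48;		/* CtrID */
	v |= (uint64_t)(is_cos != 0) << 47;	/* IsCOS */
	v |= (uint64_t)bw_src << 32;		/* BwSrc */
	v |= bw_type;				/* BwType, bits 31:0 */
	return v;
}
```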
The ABMC feature details are documented in APM [1] available from [2].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
[ bp: Touchups. ]
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
|
|
Add the functionality to enable/disable the AMD ABMC feature.
The AMD ABMC feature is enabled by setting the enable bit (bit 0) in the
L3_QOS_EXT_CFG MSR. When the state of ABMC is changed, the MSR needs to be
updated on all the logical processors in the QOS Domain.
Hardware counters will reset when ABMC state is changed.
[ bp: Massage commit message. ]
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
|
|
ABMC feature details are reported via CPUID Fn8000_0020_EBX_x5.
Bits Description
15:0 MAX_ABMC Maximum Supported Assignable Bandwidth
Monitoring Counter ID + 1
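A sketch of how the counter count could be derived from that field. It
assumes the field holds the highest counter ID, so the number of counters is
the field value plus one. That reading, and the helper name, are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the number of assignable counters from the
 * EBX value, assuming bits 15:0 hold the highest supported counter ID.
 */
static unsigned int sk_num_abmc_counters(uint32_t ebx)
{
	return (ebx & 0xffffu) + 1;
}
```

Under this reading, a field value of 31 corresponds to 32 counters.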
The ABMC feature details are documented in APM [1] available from [2].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
Detect the feature and number of assignable counters supported. For backward
compatibility, upon detecting the assignable counter feature, enable the
mbm_total_bytes and mbm_local_bytes events that users are familiar with as
part of original L3 MBM support.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
|
|
The cache allocation and memory bandwidth allocation feature properties are
consolidated into struct resctrl_cache and struct resctrl_membw respectively.
In preparation for more monitoring properties that will clobber the existing
resource struct more, re-organize the monitoring specific properties to also
be in a separate structure.
Also convert "bandwidth sources" terminology to "memory transactions" to have
consistency within resctrl for related monitoring features.
[ bp: Massage commit message. ]
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
|
|
Add a kernel command-line parameter to enable or disable the exposure of
the ABMC (Assignable Bandwidth Monitoring Counters) hardware feature to
resctrl.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
|
|
Users can create as many monitor groups as RMIDs supported by the hardware.
However, the bandwidth monitoring feature on AMD only guarantees that RMIDs
currently assigned to a processor will be tracked by hardware. The counters of
any other RMIDs which are no longer being tracked will be reset to zero.
The MBM event counters return "Unavailable" for the RMIDs that are not tracked
by hardware. So, there can be only a limited number of groups that can give
guaranteed monitoring numbers. With ever changing configurations there is no
way to definitely know which of these groups are being tracked during
a particular time. Users do not have the option to monitor a group or set of
groups for a certain period of time without worrying about RMID being reset in
between.
The ABMC feature allows users to assign a hardware counter to an RMID, event
pair and monitor bandwidth usage as long as it is assigned. The hardware
continues to track the assigned counter until it is explicitly unassigned by
the user. There is no need to worry about counters being reset during this
period. Additionally, the user can specify the type of memory transactions
(e.g., reads, writes) for the counter to track.
Without ABMC enabled, monitoring will work in current mode without assignment
option.
The Linux resctrl subsystem provides an interface that allows monitoring of up
to two memory bandwidth events per group, selected from a combination of
available total and local events. When ABMC is enabled, two events will be
assigned to each group by default, in line with the current interface design.
Users will also have the option to configure which types of memory
transactions are counted by these events.
Due to the limited number of available counters (32), users may quickly
exhaust the available counters. If the system runs out of assignable ABMC
counters, the kernel will report an error. In such cases, users will need to
unassign one or more active counters to free up counters for new assignments.
resctrl will provide options to assign or unassign events through the
group-specific interface file.
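The assign/unassign behaviour with a fixed pool of counters can be sketched
as below. The bitmap bookkeeping and the sk_ names are hypothetical and only
illustrate the exhaustion case; they are not the resctrl implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NUM_CNTRS 32	/* counter count quoted in the text */

/* Bit i set means counter i is assigned. Hypothetical bookkeeping. */
struct sk_cntr_pool {
	uint32_t in_use;
};

/* Assign the lowest free counter; return its ID, or -1 if all are in use. */
static int sk_cntr_assign(struct sk_cntr_pool *pool)
{
	int id;

	for (id = 0; id < SK_NUM_CNTRS; id++) {
		if (!(pool->in_use & (UINT32_C(1) << id))) {
			pool->in_use |= UINT32_C(1) << id;
			return id;
		}
	}
	return -1;
}

/* Release a counter so that a later assignment can reuse it. */
static void sk_cntr_unassign(struct sk_cntr_pool *pool, int id)
{
	assert(id >= 0 && id < SK_NUM_CNTRS);
	pool->in_use &= ~(UINT32_C(1) << id);
}
```

Once all 32 counters are taken, a further assignment fails until one of the
active counters is unassigned.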
The feature is detected via CPUID_Fn80000020_EBX_x00 bit 5: ABMC (Assignable
Bandwidth Monitoring Counters).
The ABMC feature details are documented in APM [1] available from [2]. [1]
AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC).
[ bp: Massage commit message, fixup enumeration due to VMSCAPE ]
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
|
|
There's a rule in computer programming that objects appear zero, once, or many
times. So code accordingly.
There are two MBM events and resctrl is coded with a lot of
if (local)
do one thing
if (total)
do a different thing
Change the rdt_mon_domain and rdt_hw_mon_domain structures to hold arrays of
pointers to per event data instead of explicit fields for total and local
bandwidth.
Simplify by coding for many events, using loops over those which are enabled.
Move resctrl_is_mbm_event() to <linux/resctrl.h> so it can be used more
widely. Also provide a for_each_mbm_event_id() helper macro.
Clean up variable names in functions touched to consistently use "eventid" for
those with type enum resctrl_event_id.
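A simplified sketch of the array-plus-loop pattern. The enum, the macro and
the structure are hypothetical stand-ins (values are stored directly rather
than through pointers), not the definitions in <linux/resctrl.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified event IDs. */
enum sk_event_id {
	SK_MBM_TOTAL,
	SK_MBM_LOCAL,
	SK_NUM_MBM_EVENTS
};

#define sk_for_each_mbm_event_id(id) \
	for ((id) = SK_MBM_TOTAL; (id) < SK_NUM_MBM_EVENTS; (id)++)

struct sk_mon_domain {
	bool enabled[SK_NUM_MBM_EVENTS];
	uint64_t bytes[SK_NUM_MBM_EVENTS];	/* per-event state, indexed by ID */
};

/* Reset every enabled event with one loop instead of one branch per event. */
static unsigned int sk_reset_enabled(struct sk_mon_domain *d)
{
	unsigned int n = 0;
	int id;

	sk_for_each_mbm_event_id(id) {
		if (!d->enabled[id])
			continue;
		d->bytes[id] = 0;
		n++;
	}
	return n;
}
```

Adding a third event then only means adding an enum member; the loop body
stays unchanged.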
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
|
|
rdt_mon_features is used as a bitmask of enabled monitor events. A monitor
event's status is now maintained in mon_evt::enabled with all monitor events'
mon_evt structures found in the filesystem's mon_event_all[] array.
Remove the remaining uses of rdt_mon_features.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
|
|
This ensures that, if a VCPU has "observed" that an IO registration has
occurred, the instruction currently being trapped or emulated will also
observe the IO registration.
At the same time, enforce that kvm_get_bus() is used only on the
update side, ensuring that a long-term reference cannot be obtained by
an SRCU reader.
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for removing synchronize_srcu() from MMIO registration,
remove the distributor's dependency on this implicit barrier by using
direct acquire-release synchronization on the flag write and its
lock-free check.
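The acquire-release pairing can be illustrated with a userspace C11 sketch.
The struct, its fields and the sketch_* functions are hypothetical stand-ins.
Kernel code would normally use smp_store_release() and smp_load_acquire()
rather than <stdatomic.h>, and the message above does not say which primitives
the patch itself uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_dist {
	int nr_regions;		/* plain data published by the writer */
	atomic_bool ready;	/* flag checked without taking a lock */
};

/* Writer: fill in the data first, then publish it with release semantics. */
static void sketch_publish(struct sketch_dist *d, int nr_regions)
{
	d->nr_regions = nr_regions;
	atomic_store_explicit(&d->ready, true, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store above, so once
 * 'ready' is observed as true the earlier write to nr_regions is visible.
 */
static bool sketch_try_read(struct sketch_dist *d, int *out)
{
	if (!atomic_load_explicit(&d->ready, memory_order_acquire))
		return false;
	*out = d->nr_regions;
	return true;
}
```

The point of the pairing is that the reader needs no lock and no external
barrier: ordering comes from the flag accesses themselves.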
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
It is now used only within kvm_vgic_map_resources(). vgic_dist::ready
is already written directly by this function, so it is clearer to
bypass the macro for reads as well.
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The resctrl file system now has complete knowledge of the status of every
event. So there is no need for per-event function calls to check.
Replace each of the resctrl_arch_is_{event}enabled() calls with
resctrl_is_mon_event_enabled(QOS_{EVENT}).
No functional change.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
|
|
* kvm-arm64/pkvm_vm_handle:
: pKVM VM handle allocation fixes, courtesy of Fuad Tabba.
:
: From the cover letter (20250909072437.4110547-1-tabba@google.com):
:
: "In pKVM, this handle is allocated when the VM is initialized at the
: hypervisor, which is on the first vCPU run. However, the host starts
: initializing the VM and setting up its data structures earlier. MMU
: notifiers for the VMs are also registered before VM initialization at
: the hypervisor, and rely on the handle to identify the VM.
:
: Therefore, there is a potential gap between when the VM is (partially)
: setup at the host, but still without a valid pKVM handle to identify it
: when communicating with the hypervisor."
KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm()
KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization
KVM: arm64: Consolidate pKVM hypervisor VM initialization logic
KVM: arm64: Separate allocation and insertion of pKVM VM table entries
KVM: arm64: Decouple hyp VM creation state from its handle
KVM: arm64: Clarify comments to distinguish pKVM mode from protected VMs
KVM: arm64: Rename 'host_kvm' to 'kvm' in pKVM host code
KVM: arm64: Rename pkvm.enabled to pkvm.is_protected
KVM: arm64: Add build-time check for duplicate DECLARE_REG use
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When a pKVM guest is active, TLB invalidations triggered by host MMU
notifiers require a valid hypervisor handle. Currently, this handle is
only allocated when the first vCPU is run.
However, the guest's memory is associated with the host MMU much
earlier, during kvm_arch_init_vm(). This creates a window where an MMU
invalidation could occur after the kvm_pgtable pointer checked by the
notifiers is set but before the pKVM handle has been created.
Fix this by reserving the pKVM handle when the host VM is first set up.
Move the call to the __pkvm_reserve_vm hypercall from the first-vCPU-run
path into pkvm_init_host_vm(), which is called during initial VM setup.
This ensures the handle is available before any subsystem can trigger an
MMU notification for the VM.
The VM destruction path is updated to call __pkvm_unreserve_vm for cases
where a VM was reserved but never fully created at the hypervisor,
ensuring the handle is properly released.
This fix leverages the two-stage reservation/initialization hypercall
interface introduced in preceding patches.
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The existing __pkvm_init_vm hypercall performs both the reservation of a
VM table entry and the initialization of the hypervisor VM state in a
single operation. This design prevents the host from obtaining a VM
handle from the hypervisor until all preparation for the creation and
the initialization of the VM is done, which is on the first vCPU run
operation.
To support more flexible VM lifecycle management, the host needs the
ability to reserve a handle early, before the first vCPU run.
Refactor the hypercall interface to enable this, splitting the single
hypercall into a two-stage process:
- __pkvm_reserve_vm: A new hypercall that allocates a slot in the
  hypervisor's vm_table, marks it as reserved, and returns a unique
  handle to the host.
- __pkvm_unreserve_vm: A corresponding cleanup hypercall to safely
  release the reservation if the host fails to proceed with full
  initialization.
- __pkvm_init_vm: The existing hypercall is modified to no longer
  allocate a slot. It now expects a pre-reserved handle and commits the
  donated VM memory to that slot.
For now, the host-side code in __pkvm_create_hyp_vm calls the new
reserve and init hypercalls back-to-back to maintain existing behavior.
This paves the way for subsequent patches to separate the reservation
and initialization steps in the VM's lifecycle.
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The insert_vm_table_entry() function was performing tasks beyond its
primary responsibility. In addition to inserting a VM pointer into the
vm_table, it was also initializing several fields within 'struct
pkvm_hyp_vm', such as the VMID and stage-2 MMU pointers. This mixing of
concerns made the code harder to follow.
As another preparatory step towards allowing a VM table entry to be
reserved before the VM is fully created, this logic must be cleaned up.
By separating table insertion from state initialization, we can control
the timing of the initialization step more precisely in subsequent
patches.
Refactor the code to consolidate all initialization logic into
init_pkvm_hyp_vm():
- Move the initialization of the handle, VMID, and MMU fields from
  insert_vm_table_entry() to init_pkvm_hyp_vm().
- Simplify insert_vm_table_entry() to perform only one action: placing
  the provided pkvm_hyp_vm pointer into the vm_table.
- Update the calling sequence in __pkvm_init_vm() to first allocate an
  entry in the VM table, initialize the VM, and then insert the VM into
  the VM table. This is all protected by the vm_table_lock for now.
  Subsequent patches will adjust the sequence and not hold the
  vm_table_lock while initializing the VM at the hypervisor
  (init_pkvm_hyp_vm()).
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The current insert_vm_table_entry() function performs two actions at
once: it finds a free slot in the pKVM VM table and populates it with
the pkvm_hyp_vm pointer.
Refactor this function as a preparatory step for future work that will
require reserving a VM slot and its corresponding handle earlier in the
VM lifecycle, before the pkvm_hyp_vm structure is initialized and ready
to be inserted.
Split the function into a two-phase process:
- A new allocate_vm_table_entry() function finds an empty slot, marks it
  as reserved with a RESERVED_ENTRY placeholder, and returns a handle
  derived from the slot's index.
- The insert_vm_table_entry() function is repurposed to take the handle,
  validate that the corresponding slot is in the reserved state, and
  then populate it with the pkvm_hyp_vm pointer.
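The two-phase scheme can be sketched in plain C as follows. The table size,
the handle offset, the marker object and all sketch_* names are assumptions
made for illustration only. Locking is deliberately left out here; the
hypervisor code is described elsewhere in this series as protecting the table
with vm_table_lock.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VMS		4
#define SKETCH_HANDLE_OFFSET	0x1000	/* hypothetical handle base */

struct sketch_vm { int vmid; };

/* Placeholder distinct from NULL (free slot) and from any real VM pointer. */
static struct sketch_vm sketch_reserved_marker;
#define SKETCH_RESERVED	(&sketch_reserved_marker)

static struct sketch_vm *sketch_vm_table[SKETCH_MAX_VMS];

/* Phase 1: find a free slot, mark it reserved, return its handle or -1. */
static int sketch_allocate_entry(void)
{
	for (int i = 0; i < SKETCH_MAX_VMS; i++) {
		if (!sketch_vm_table[i]) {
			sketch_vm_table[i] = SKETCH_RESERVED;
			return SKETCH_HANDLE_OFFSET + i;
		}
	}
	return -1;	/* table full */
}

/* Phase 2: populate a slot, but only if it was reserved beforehand. */
static int sketch_insert_entry(int handle, struct sketch_vm *vm)
{
	int idx = handle - SKETCH_HANDLE_OFFSET;

	if (idx < 0 || idx >= SKETCH_MAX_VMS)
		return -1;	/* handle out of range */
	if (sketch_vm_table[idx] != SKETCH_RESERVED)
		return -1;	/* slot is free or already populated */
	sketch_vm_table[idx] = vm;
	return 0;
}
```

Because a reserved slot is neither free nor populated, a handle can be handed
out early while the structure it will eventually point to is still being
built.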
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Currently, the presence of a pKVM handle (pkvm.handle != 0) is used to
determine if the corresponding hypervisor (EL2) VM has been created and
initialized. This couples the handle's lifecycle with the VM's creation
state.
This coupling will become problematic with upcoming changes that will
allocate the pKVM handle earlier in the VM's life, before the VM is
instantiated at the hypervisor.
To prepare for this and make the state tracking explicit, decouple the
two concepts. Introduce a new boolean flag, 'pkvm.is_created', to track
whether the hypervisor-side VM has been created and initialized.
A new helper, pkvm_hyp_vm_is_created(), is added to check this flag. All
call sites that previously checked for the handle's existence are
converted to use the new, explicit check. The 'is_created' flag is set
to true upon successful creation in the hypervisor (EL2) and cleared
upon destruction.
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The hypervisor code for protected KVM contains comments that are
imprecise and at times flat-out wrong. They often refer to a "protected
VM" in contexts where the code or data structure applies to _any_ VM
managed by the hypervisor when pKVM is enabled.
For instance, the 'vm_table' holds handles for all VMs known to the
hypervisor, not exclusively for those that are configured as protected.
This inaccurate terminology can make the code scope harder to understand
for future (and current) developers.
Clarify the comments throughout the pKVM hypervisor code to make a clear
distinction between the pKVM feature itself (i.e., "protected mode") and
the VMs that are specifically configured to be protected. This involves
replacing ambiguous uses of "protected VM" with more accurate phrasing.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In hypervisor (EL2) code, it is important to distinguish between the
host's 'struct kvm' and a protected VM's 'struct kvm'. Using 'host_kvm'
as variable name in that context makes this distinction clear.
However, in the host kernel code (EL1), there is no such ambiguity. The
code is only ever concerned with the host's own 'struct kvm' instance.
The 'host_' prefix is therefore redundant and adds unnecessary
verbosity.
Simplify the code by renaming the 'host_kvm' parameter to 'kvm' in all
functions within host-side kernel code (EL1). This improves readability
and makes the naming consistent with other host-side kernel code.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The 'pkvm.enabled' field in struct kvm_protected_vm is confusingly
named. Its purpose is to indicate whether a VM is a _protected_ VM under
pKVM, and not whether the VM itself is enabled or running.
For a non-protected VM, the VM can be fully active and running, yet this
field would be false. This ambiguity can lead to incorrect assumptions
about the VM's operational state and makes the code harder to reason
about.
Rename the field to 'is_protected' to make it unambiguous that the flag
tracks the protected status of the VM.
No functional change intended.
Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Kunwu Chan <chentao@kylinos.cn>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The DECLARE_REG() macro provides a convenient way to create a local
variable initialized from a cpu context in the hyp trap handlers.
However, a common error is to use the macro multiple times in the same
scope with the same register index, but for different logical purposes.
This results in valid C code that compiles without error, but introduces
subtle bugs where a developer expects two different variables to hold
values from two different registers, when in fact they are both sourced
from the same one.
To prevent this entire class of bugs, modify the DECLARE_REG() macro
to declare a dummy variable whose name is derived from the register
index. If the macro is used again with the same index in the same
scope, the compiler will fail with a "redeclaration of variable"
error, turning a subtle runtime bug into an obvious build-time failure.
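A sketch of the technique, assuming a simplified context structure. The
SKETCH_* and sketch_* names are hypothetical and the macro body is not claimed
to match the one in the patch. The unused-variable attribute is a GCC/Clang
extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU context passed to a trap handler. */
struct sketch_ctxt { uint64_t regs[8]; };

#define sketch_cpu_reg(ctxt, r)	((ctxt)->regs[(r)])

/*
 * Besides the real variable, declare a dummy whose name embeds the register
 * index. A second use with the same index in the same scope redeclares the
 * dummy, which the compiler rejects.
 */
#define SKETCH_DECLARE_REG(type, name, ctxt, reg)			\
	__attribute__((unused)) int sketch_check_reg_##reg;		\
	type name = (type)sketch_cpu_reg(ctxt, (reg))

static uint64_t sketch_handler(struct sketch_ctxt *ctxt)
{
	SKETCH_DECLARE_REG(uint64_t, addr, ctxt, 1);
	SKETCH_DECLARE_REG(uint64_t, size, ctxt, 2);
	/*
	 * Adding SKETCH_DECLARE_REG(uint64_t, prot, ctxt, 2); here would
	 * fail to build, because sketch_check_reg_2 already exists.
	 */
	return addr + size;
}
```

Note that the check relies on token pasting, so it only works when the index
is written as a literal token rather than as an expression.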
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
There are currently only three monitor events, all associated with the
RDT_RESOURCE_L3 resource. Growing support for additional events will be easier
with some restructuring to have a single point in file system code where all
attributes of all events are defined.
Place all event descriptions into an array mon_event_all[]. Doing this has the
beneficial side effect of removing the need for rdt_resource::evt_list.
Add resctrl_event_id::QOS_FIRST_EVENT for a lower bound on range checks for
event ids and as the starting index to scan mon_event_all[].
Drop the code that builds evt_list and change the two places where the list is
scanned to scan mon_event_all[] instead using a new helper macro
for_each_mon_event().
Architecture code now informs file system code which events are available with
resctrl_enable_mon_event().
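As a rough illustration of a single event table with an enable call and an
iteration helper, consider the sketch below. The sketch_* identifiers, the
enum layout and the event name strings are assumptions for illustration and
do not reproduce the resctrl structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ids; SKETCH_FIRST_EVENT is the lower bound for range checks. */
enum sketch_evt {
	SKETCH_FIRST_EVENT = 1,
	SKETCH_OCCUP = SKETCH_FIRST_EVENT,
	SKETCH_TOTAL,
	SKETCH_LOCAL,
	SKETCH_NUM_EVENTS,
};

struct sketch_mon_evt {
	const char *name;
	bool enabled;
};

/* Single place where all attributes of all events are defined. */
static struct sketch_mon_evt sketch_event_all[SKETCH_NUM_EVENTS] = {
	[SKETCH_OCCUP] = { .name = "llc_occupancy" },
	[SKETCH_TOTAL] = { .name = "mbm_total_bytes" },
	[SKETCH_LOCAL] = { .name = "mbm_local_bytes" },
};

/* Scan the table starting at the first valid event id. */
#define sketch_for_each_mon_event(e)				\
	for ((e) = &sketch_event_all[SKETCH_FIRST_EVENT];	\
	     (e) < &sketch_event_all[SKETCH_NUM_EVENTS]; (e)++)

/* Called by "architecture" code to report that an event is available. */
static void sketch_enable_mon_event(enum sketch_evt id)
{
	if (id < SKETCH_FIRST_EVENT || id >= SKETCH_NUM_EVENTS)
		return;
	sketch_event_all[id].enabled = true;
}

static bool sketch_is_mon_event_enabled(enum sketch_evt id)
{
	return id >= SKETCH_FIRST_EVENT && id < SKETCH_NUM_EVENTS &&
	       sketch_event_all[id].enabled;
}

/* Example consumer: count enabled events with the iteration helper. */
static int sketch_count_enabled(void)
{
	struct sketch_mon_evt *e;
	int n = 0;

	sketch_for_each_mon_event(e)
		if (e->enabled)
			n++;
	return n;
}
```

Keeping the enabled state in the table entry means consumers need no separate
per-resource list or bitmask to find out which events exist.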
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/cover.1757108044.git.babu.moger@amd.com
|
|
When running with transparent huge pages and CONFIG_NVHE_EL2_DEBUG enabled,
the debug checking in assert_host_shared_guest() fails on the launch of an
np-guest. The failing WARN_ON() causes a panic and generates the stack below.
In __pkvm_host_relax_perms_guest() the debug checking assumes the mapping
is a single page, but it may be a block map. Update the checking so that
the size is not checked and the correct size is simply assumed.
While we're here, make the same fix in __pkvm_host_mkyoung_guest().
Info: # lkvm run -k /share/arch/arm64/boot/Image -m 704 -c 8 --name guest-128
Info: Removed ghost socket file "/.lkvm//guest-128.sock".
[ 1406.521757] kvm [141]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:1088!
[ 1406.521804] kvm [141]: nVHE call trace:
[ 1406.521828] kvm [141]: [<ffff8000811676b4>] __kvm_nvhe_hyp_panic+0xb4/0xe8
[ 1406.521946] kvm [141]: [<ffff80008116d12c>] __kvm_nvhe_assert_host_shared_guest+0xb0/0x10c
[ 1406.522049] kvm [141]: [<ffff80008116f068>] __kvm_nvhe___pkvm_host_relax_perms_guest+0x48/0x104
[ 1406.522157] kvm [141]: [<ffff800081169df8>] __kvm_nvhe_handle___pkvm_host_relax_perms_guest+0x64/0x7c
[ 1406.522250] kvm [141]: [<ffff800081169f0c>] __kvm_nvhe_handle_trap+0x8c/0x1a8
[ 1406.522333] kvm [141]: [<ffff8000811680fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
[ 1406.522454] kvm [141]: ---[ end nVHE call trace ]---
[ 1406.522477] kvm [141]: Hyp Offset: 0xfffece8013600000
[ 1406.522554] Kernel panic - not syncing: HYP panic:
[ 1406.522554] PS:834003c9 PC:0000b1806db6d170 ESR:00000000f2000800
[ 1406.522554] FAR:ffff8000804be420 HPFAR:0000000000804be0 PAR:0000000000000000
[ 1406.522554] VCPU:0000000000000000
[ 1406.523337] CPU: 3 UID: 0 PID: 141 Comm: kvm-vcpu-0 Not tainted 6.16.0-rc7 #97 PREEMPT
[ 1406.523485] Hardware name: FVP Base RevC (DT)
[ 1406.523566] Call trace:
[ 1406.523629] show_stack+0x18/0x24 (C)
[ 1406.523753] dump_stack_lvl+0xd4/0x108
[ 1406.523899] dump_stack+0x18/0x24
[ 1406.524040] panic+0x3d8/0x448
[ 1406.524184] nvhe_hyp_panic_handler+0x10c/0x23c
[ 1406.524325] kvm_handle_guest_abort+0x68c/0x109c
[ 1406.524500] handle_exit+0x60/0x17c
[ 1406.524630] kvm_arch_vcpu_ioctl_run+0x2e0/0x8c0
[ 1406.524794] kvm_vcpu_ioctl+0x1a8/0x9cc
[ 1406.524919] __arm64_sys_ioctl+0xac/0x104
[ 1406.525067] invoke_syscall+0x48/0x10c
[ 1406.525189] el0_svc_common.constprop.0+0x40/0xe0
[ 1406.525322] do_el0_svc+0x1c/0x28
[ 1406.525441] el0_svc+0x38/0x120
[ 1406.525588] el0t_64_sync_handler+0x10c/0x138
[ 1406.525750] el0t_64_sync+0x1ac/0x1b0
[ 1406.525876] SMP: stopping secondary CPUs
[ 1406.525965] Kernel Offset: disabled
[ 1406.526032] CPU features: 0x0000,00000080,8e134ca1,9446773f
[ 1406.526130] Memory Limit: none
[ 1406.959099] ---[ end Kernel panic - not syncing: HYP panic:
[ 1406.959099] PS:834003c9 PC:0000b1806db6d170 ESR:00000000f2000800
[ 1406.959099] FAR:ffff8000804be420 HPFAR:0000000000804be0 PAR:0000000000000000
[ 1406.959099] VCPU:0000000000000000 ]
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Fixes: f28f1d02f4eaa ("KVM: arm64: Add a range to __pkvm_host_unshare_guest()")
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: stable@vger.kernel.org
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|