| Age | Commit message (Collapse) | Author | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix SEV guest boot failures in certain circumstances, due to
very early code relying on a BSS-zeroed variable that isn't
actually zeroed yet an may contain non-zero bootup values
Move the variable into the .data section go gain even earlier
zeroing
- Expose & allow the IBPB-on-Entry feature on SNP guests, which
was not properly exposed to guests due to initial implementational
caution
- Fix O= build failure when CONFIG_EFI_SBAT_FILE is using relative
file paths
- Fix the various SNC (Sub-NUMA Clustering) topology enumeration
bugs/artifacts (sched-domain build errors mostly).
SNC enumeration data got more complicated with Granite Rapids X
(GNR) and Clearwater Forest X (CWF), which exposed these bugs
and made their effects more serious
- Also use the now sane(r) SNC code to fix resctrl SNC detection bugs
- Work around a historic libgcc unwinder bug in the vdso32 sigreturn
code (again), which regressed during an overly aggressive recent
cleanup of DWARF annotations
* tag 'x86-urgent-2026-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/vdso32: Work around libgcc unwinder bug
x86/resctrl: Fix SNC detection
x86/topo: Fix SNC topology mess
x86/topo: Replace x86_has_numa_in_package
x86/topo: Add topology_num_nodes_per_package()
x86/numa: Store extra copy of numa_nodes_parsed
x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file paths
x86/sev: Allow IBPB-on-Entry feature for SNP guests
x86/boot/sev: Move SEV decompressor variables into the .data section
|
|
CONFIG_EFI_SBAT_FILE can be a relative path. When compiling using a different
output directory (O=) the build currently fails because it can't find the
filename set in CONFIG_EFI_SBAT_FILE:
arch/x86/boot/compressed/sbat.S: Assembler messages:
arch/x86/boot/compressed/sbat.S:6: Error: file not found: kernel.sbat
Add $(srctree) as include dir for sbat.o.
[ bp: Massage commit message. ]
Fixes: 61b57d35396a ("x86/efi: Implement support for embedding SBAT data for x86")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <stable@kernel.org>
Link: https://patch.msgid.link/f4eda155b0cef91d4d316b4e92f5771cb0aa7187.1772047658.git.jstancek@redhat.com
|
|
The SEV-SNP IBPB-on-Entry feature does not require a guest-side
implementation. It was added in Zen5 h/w, after the first SNP Zen
implementation, and thus was not accounted for when the initial set of SNP
features were added to the kernel.
In its abundant precaution, commit
8c29f0165405 ("x86/sev: Add SEV-SNP guest feature negotiation support")
included SEV_STATUS' IBPB-on-Entry bit as a reserved bit, thereby masking
guests from using the feature.
Allow guests to make use of IBPB-on-Entry when supported by the hypervisor, as
the bit is now architecturally defined and safe to expose.
Fixes: 8c29f0165405 ("x86/sev: Add SEV-SNP guest feature negotiation support")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@kernel.org
Link: https://patch.msgid.link/20260203222405.4065706-2-kim.phillips@amd.com
|
|
As part of the work to remove the dependency on calling into the decompressor
code (startup_64()) for a UEFI boot, a call to rmpadjust() was removed from
sev_enable() in favor of checking the value of the snp_vmpl variable.
When booting through a non-UEFI path and calling startup_64(), the call to
sev_enable() is performed before the BSS section is zeroed. With the removal
of the rmpadjust() call and the corresponding check of the return code, the
snp_vmpl variable is checked.
Since the kernel is running at VMPL0, the snp_vmpl variable will not have been
set and should be the default value of 0. However, since the call occurs
before the BSS is zeroed, the snp_vmpl variable may not actually be zero,
which will cause the guest boot to fail.
Since the decompressor relocates itself, the BSS would need to be cleared both
before and after the relocation, but this would, in effect, cause all of the
changes to BSS variables before relocation to be lost after relocation.
Instead, move the snp_vmpl variable into the .data section so that it is
initialized and the value made safe during relocation. As a pre-caution
against future changes, move other SEV-related decompressor variables into the
.data section, too.
Fixes: 68a501d7fd82 ("x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Tested-by: Kevin Hui <kevinhui@meta.com>
Tested-by: Changyuan Lyu <changyuanl@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/5648b7de5b0a5d0dfef3785f9582b718678c6448.1770217260.git.thomas.lendacky@amd.com
|
|
Commit 3e86e4d74c04 ("kbuild: keep .modinfo section in
vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it
from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and
ELF_DETAILS was present in all architecture specific vmlinux linker
scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and
COMMON_DISCARDS may be used by other linker scripts, such as the s390
and x86 compressed boot images, which may not expect to have a .modinfo
section. In certain circumstances, this could result in a bootloader
failing to load the compressed kernel [1].
Commit ddc6cbef3ef1 ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with
SecureBoot trailer") recently addressed this for the s390 bzImage but
the same bug remains for arm, parisc, and x86. The presence of .modinfo
in the x86 bzImage was the root cause of the issue worked around with
commit d50f21091358 ("kbuild: align modinfo section for Secureboot
Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes
lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its
MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition.
Split .modinfo out from ELF_DETAILS into its own macro and handle it in
all vmlinux linker scripts. Discard .modinfo in the places where it was
previously being discarded from being in COMMON_DISCARDS, as it has
never been necessary in those uses.
Cc: stable@vger.kernel.org
Fixes: 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped")
Reported-by: Ed W <lists@wildgooses.com>
Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1]
Tested-by: Ed W <lists@wildgooses.com> # x86_64
Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "powerpc/64s: do not re-activate batched TLB flush" makes
arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)
It adds a generic enter/leave layer and switches architectures to use
it. Various hacks were removed in the process.
- "zram: introduce compressed data writeback" implements data
compression for zram writeback (Richard Chang and Sergey Senozhatsky)
- "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
page ranges for hugepages. Large improvements during demand faulting
are demonstrated (David Hildenbrand)
- "memcg cleanups" tidies up some memcg code (Chen Ridong)
- "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
stats" improves DAMOS stat's provided information, deterministic
control, and readability (SeongJae Park)
- "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
issues in the hugetlb cgroup charging selftests (Li Wang)
- "Fix va_high_addr_switch.sh test failure - again" addresses several
issues in the va_high_addr_switch test (Chunyu Hu)
- "mm/damon/tests/core-kunit: extend existing test scenarios" improves
the KUnit test coverage for DAMON (Shu Anzai)
- "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
transiently return -EAGAIN (Shivank Garg)
- "arch, mm: consolidate hugetlb early reservation" reworks and
consolidates a pile of straggly code related to reservation of
hugetlb memory from bootmem and creation of CMA areas for hugetlb
(Mike Rapoport)
- "mm: clean up anon_vma implementation" cleans up the anon_vma
implementation in various ways (Lorenzo Stoakes)
- "tweaks for __alloc_pages_slowpath()" does a little streamlining of
the page allocator's slowpath code (Vlastimil Babka)
- "memcg: separate private and public ID namespaces" cleans up the
memcg ID code and prevents the internal-only private IDs from being
exposed to userspace (Shakeel Butt)
- "mm: hugetlb: allocate frozen gigantic folio" cleans up the
allocation of frozen folios and avoids some atomic refcount
operations (Kefeng Wang)
- "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
of memory betewwn the active and inactive LRUs and adds auto-tuning
of the ratio-based quotas and of monitoring intervals (SeongJae Park)
- "Support page table check on PowerPC" makes
CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)
- "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
nodes_and() and nodes_andnot() propagate the return values from the
underlying bit operations, enabling some cleanup in calling code
(Yury Norov)
- "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
some DAMON internal interfaces (SeongJae Park)
- "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
in khupaged and fixes a scan limit accounting issue (Shivank Garg)
- "mm: balloon infrastructure cleanups" goes to town on the balloon
infrastructure and its page migration function. Mainly cleanups, also
some locking simplification (David Hildenbrand)
- "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
additional tracepoints to the page reclaim code (Jiayuan Chen)
- "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
part of Marco's kernel-wide migration from the legacy workqueue APIs
over to the preferred unbound workqueues (Marco Crivellari)
- "Various mm kselftests improvements/fixes" provides various unrelated
improvements/fixes for the mm kselftests (Kevin Brodsky)
- "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
folio allocation, mainly by avoiding unnecessary work in
pfn_range_valid_contig() (Kefeng Wang)
- "selftests/damon: improve leak detection and wss estimation
reliability" improves the reliability of two of the DAMON selftests
(SeongJae Park)
- "mm/damon: cleanup kdamond, damon_call(), damos filter and
DAMON_MIN_REGION" does some cleanup work in the core DAMON code
(SeongJae Park)
- "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
performs maintenance work on the DAMON documentation (SeongJae Park)
- "mm: add and use vma_assert_stabilised() helper" refactors and cleans
up the core VMA code. The main aim here is to be able to use the mmap
write lock's lockdep state to perform various assertions regarding
the locking which the VMA code requires (Lorenzo Stoakes)
- "mm, swap: swap table phase II: unify swapin use" removes some old
swap code (swap cache bypassing and swap synchronization) which
wasn't working very well. Various other cleanups and simplifications
were made. The end result is a 20% speedup in one benchmark (Kairui
Song)
- "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
available on 64-bit alpha, loongarch, mips, parisc, and um. Various
cleanups were performed along the way (Qi Zheng)
* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
mm/memory: handle non-split locks correctly in zap_empty_pte_table()
mm: move pte table reclaim code to memory.c
mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
um: mm: enable MMU_GATHER_RCU_TABLE_FREE
parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
zsmalloc: make common caches global
mm: add SPDX id lines to some mm source files
mm/zswap: use %pe to print error pointers
mm/vmscan: use %pe to print error pointers
mm/readahead: fix typo in comment
mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
mm: refactor vma_map_pages to use vm_insert_pages
mm/damon: unify address range representation with damon_addr_range
mm/cma: replace snprintf with strscpy in cma_new_area
...
|
|
Architectures currently opt in for implementing lazy_mmu helpers by
defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE.
In preparation for introducing a generic lazy_mmu layer that will require
storage in task_struct, let's switch to a cleaner approach: instead of
defining a macro, select a CONFIG option.
This patch introduces CONFIG_ARCH_HAS_LAZY_MMU_MODE and has each arch
select it when it implements lazy_mmu helpers.
__HAVE_ARCH_ENTER_LAZY_MMU_MODE is removed and <linux/pgtable.h> relies on
the new CONFIG instead.
On x86, lazy_mmu helpers are only implemented if PARAVIRT_XXL is selected.
This creates some complications in arch/x86/boot/, because a few files
manually undefine PARAVIRT* options. As a result <asm/paravirt.h> does
not define the lazy_mmu helpers, but this breaks the build as
<linux/pgtable.h> only defines them if !CONFIG_ARCH_HAS_LAZY_MMU_MODE.
There does not seem to be a clean way out of this - let's just undefine
that new CONFIG too.
Link: https://lkml.kernel.org/r/20251215150323.2218608-7-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juegren Gross <jgross@suse.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Move the internal header out of the usual include/asm/ include path
because having an "internal" header there doesn't really make it
internal - quite the opposite - that's the normal arch include path.
So move where it belongs and make it really internal.
No functional changes.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251204145716.GDaTGhTEHNOtSdTkEe@fat_crate.local
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Borislav Petkov:
- Use the proper accessors when reading CR3 as part of the page level
transitions (5-level to 4-level, the use case being kexec) so that
only the physical address in CR3 is picked up and not flags which are
above the physical mask shift
- Clean up and unify __phys_addr_symbol() definitions
* tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub: Fix page table access in 5-level to 4-level paging transition
x86/boot: Fix page table access in 5-level to 4-level paging transition
x86/mm: Unify __phys_addr_symbol()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Largely cleanups along with a change to save XSS to the GHCB
(Guest-Host Communication Block) in SEV-ES guests so that the
hypervisor can determine the guest's XSAVES buffer size properly
and thus support shadow stacks in AMD confidential guests
* tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cc: Fix enum spelling to fix kernel-doc warnings
x86/boot: Drop unused sev_enable() fallback
x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled()
x86/sev: Include XSS value in GHCB CPUID request
x86/boot: Move boot_*msr helpers to asm/shared/msr.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- The mandatory pile of cleanups the cat drags in every merge window
* tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Clean up whitespace in a20.c
x86/mm: Delete disabled debug code
x86/{boot,mtrr}: Remove unused function declarations
x86/percpu: Use BIT_WORD() and BIT_MASK() macros
x86/cpufeatures: Correct LKGS feature flag description
x86/idtentry: Add missing '*' to kernel-doc lines
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Callchain support:
- Add support for deferred user-space stack unwinding for perf,
enabled on x86. (Peter Zijlstra, Steven Rostedt)
- unwind_user/x86: Enable frame pointer unwinding on x86 (Josh
Poimboeuf)
x86 PMU support and infrastructure:
- x86/insn: Simplify for_each_insn_prefix() (Peter Zijlstra)
- x86/insn,uprobes,alternative: Unify insn_is_nop() (Peter Zijlstra)
Intel PMU driver:
- Large series to prepare for and implement architectural PEBS
support for Intel platforms such as Clearwater Forest (CWF) and
Panther Lake (PTL). (Dapeng Mi, Kan Liang)
- Check dynamic constraints (Kan Liang)
- Optimize PEBS extended config (Peter Zijlstra)
- cstates:
- Remove PC3 support from LunarLake (Zhang Rui)
- Add Pantherlake support (Zhang Rui)
- Clearwater Forest support (Zide Chen)
AMD PMU driver:
- x86/amd: Check event before enable to avoid GPF (George Kennedy)
Fixes and cleanups:
- task_work: Fix NMI race condition (Peter Zijlstra)
- perf/x86: Fix NULL event access and potential PEBS record loss
(Dapeng Mi)
- Misc other fixes and cleanups (Dapeng Mi, Ingo Molnar, Peter
Zijlstra)"
* tag 'perf-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
perf/x86/intel: Fix and clean up intel_pmu_drain_arch_pebs() type use
perf/x86/intel: Optimize PEBS extended config
perf/x86/intel: Check PEBS dyn_constraints
perf/x86/intel: Add a check for dynamic constraints
perf/x86/intel: Add counter group support for arch-PEBS
perf/x86/intel: Setup PEBS data configuration and enable legacy groups
perf/x86/intel: Update dyn_constraint base on PEBS event precise level
perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR
perf/x86/intel: Process arch-PEBS records or record fragments
perf/x86/intel/ds: Factor out PEBS group processing code to functions
perf/x86/intel/ds: Factor out PEBS record processing code to functions
perf/x86/intel: Initialize architectural PEBS
perf/x86/intel: Correct large PEBS flag check
perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call
perf/x86: Fix NULL event access and potential PEBS record loss
perf/x86: Remove redundant is_x86_event() prototype
entry,unwind/deferred: Fix unwind_reset_info() placement
unwind_user/x86: Fix arch=um build
perf: Support deferred user unwind
unwind_user/x86: Teach FP unwind about start of function
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- klp-build livepatch module generation (Josh Poimboeuf)
Introduce new objtool features and a klp-build script to generate
livepatch modules using a source .patch as input.
This builds on concepts from the longstanding out-of-tree kpatch
project which began in 2012 and has been used for many years to
generate livepatch modules for production kernels. However, this is a
complete rewrite which incorporates hard-earned lessons from 12+
years of maintaining kpatch.
Key improvements compared to kpatch-build:
- Integrated with objtool: Leverages objtool's existing control-flow
graph analysis to help detect changed functions.
- Works on vmlinux.o: Supports late-linked objects, making it
compatible with LTO, IBT, and similar.
- Simplified code base: ~3k fewer lines of code.
- Upstream: No more out-of-tree #ifdef hacks, far less cruft.
- Cleaner internals: Vastly simplified logic for
symbol/section/reloc inclusion and special section extraction.
- Robust __LINE__ macro handling: Avoids false positive binary diffs
caused by the __LINE__ macro by introducing a fix-patch-lines
script which injects #line directives into the source .patch to
preserve the original line numbers at compile time.
- Disassemble code with libopcodes instead of running objdump
(Alexandre Chartre)
- Disassemble support (-d option to objtool) by Alexandre Chartre,
which supports the decoding of various Linux kernel code generation
specials such as alternatives:
17ef: sched_balance_find_dst_group+0x62f mov 0x34(%r9),%edx
17f3: sched_balance_find_dst_group+0x633 | <alternative.17f3> | X86_FEATURE_POPCNT
17f3: sched_balance_find_dst_group+0x633 | call 0x17f8 <__sw_hweight64> | popcnt %rdi,%rax
17f8: sched_balance_find_dst_group+0x638 cmp %eax,%edx
... jump table alternatives:
1895: sched_use_asym_prio+0x5 test $0x8,%ch
1898: sched_use_asym_prio+0x8 je 0x18a9 <sched_use_asym_prio+0x19>
189a: sched_use_asym_prio+0xa | <jump_table.189a> | JUMP
189a: sched_use_asym_prio+0xa | jmp 0x18ae <sched_use_asym_prio+0x1e> | nop2
189c: sched_use_asym_prio+0xc mov $0x1,%eax
18a1: sched_use_asym_prio+0x11 and $0x80,%ecx
... exception table alternatives:
native_read_msr:
5b80: native_read_msr+0x0 mov %edi,%ecx
5b82: native_read_msr+0x2 | <ex_table.5b82> | EXCEPTION
5b82: native_read_msr+0x2 | rdmsr | resume at 0x5b84 <native_read_msr+0x4>
5b84: native_read_msr+0x4 shl $0x20,%rdx
.... x86 feature flag decoding (also see the X86_FEATURE_POPCNT
example in sched_balance_find_dst_group() above):
2faaf: start_thread_common.constprop.0+0x1f jne 0x2fba4 <start_thread_common.constprop.0+0x114>
2fab5: start_thread_common.constprop.0+0x25 | <alternative.2fab5> | X86_FEATURE_ALWAYS | X86_BUG_NULL_SEG
2fab5: start_thread_common.constprop.0+0x25 | jmp 0x2faba <.altinstr_aux+0x2f4> | jmp 0x4b0 <start_thread_common.constprop.0+0x3f> | nop5
2faba: start_thread_common.constprop.0+0x2a mov $0x2b,%eax
... NOP sequence shortening:
1048e2: snapshot_write_finalize+0xc2 je 0x104917 <snapshot_write_finalize+0xf7>
1048e4: snapshot_write_finalize+0xc4 nop6
1048ea: snapshot_write_finalize+0xca nop11
1048f5: snapshot_write_finalize+0xd5 nop11
104900: snapshot_write_finalize+0xe0 mov %rax,%rcx
104903: snapshot_write_finalize+0xe3 mov 0x10(%rdx),%rax
... and much more.
- Function validation tracing support (Alexandre Chartre)
- Various -ffunction-sections fixes (Josh Poimboeuf)
- Clang AutoFDO (Automated Feedback-Directed Optimizations) support
(Josh Poimboeuf)
- Misc fixes and cleanups (Borislav Petkov, Chen Ni, Dylan Hatch, Ingo
Molnar, John Wang, Josh Poimboeuf, Pankaj Raghav, Peter Zijlstra,
Thorsten Blum)
* tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
objtool: Fix segfault on unknown alternatives
objtool: Build with disassembly can fail when including bdf.h
objtool: Trim trailing NOPs in alternative
objtool: Add wide output for disassembly
objtool: Compact output for alternatives with one instruction
objtool: Improve naming of group alternatives
objtool: Add Function to get the name of a CPU feature
objtool: Provide access to feature and flags of group alternatives
objtool: Fix address references in alternatives
objtool: Disassemble jump table alternatives
objtool: Disassemble exception table alternatives
objtool: Print addresses with alternative instructions
objtool: Disassemble group alternatives
objtool: Print headers for alternatives
objtool: Preserve alternatives order
objtool: Add the --disas=<function-pattern> action
objtool: Do not validate IBT for .return_sites and .call_sites
objtool: Improve tracing of alternative instructions
objtool: Add functions to better name alternatives
objtool: Identify the different types of alternatives
...
|
|
Remove trailing whitespace on empty lines.
No functional changes.
[ bp: Massage commit message. ]
Signed-off-by: Harry Fellowes <harryfellowes1@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20250825192832.6444-3-harryfellowes1@gmail.com
|
|
Commits
28be1b454c2b ("x86/boot: Remove unused copy_*_gs() functions")
34d2819f2078 ("x86, mtrr: Remove unused mtrr/state.c")
removed the functions but left the prototypes. Remove them.
[ bp: Merge into a single patch. ]
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251120121037.1479334-1-yuehaibing@huawei.com
|
|
The misc.h header is not included by the EFI stub, which is the only
C caller of sev_enable(). This means the fallback for cases where
CONFIG_AMD_MEM_ENCRYPT is not set is never used, so it can be dropped.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/20250909080631.2867579-6-ardb+git@google.com
|
|
With CONFIG_KLP_BUILD enabled, checksums are only needed during a
klp-build run. There's no need to enable them for normal kernel builds.
This also has the benefit of softening the xxhash dependency.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/edbb1ca215e4926e02edb493b68b9d6d063e902f.1762990139.git.jpoimboe@kernel.org
|
|
Drop one redundant definition, while at it.
There should be no functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251031122122.GKaQSpwhLvkinKKbjG@fat_crate.local
|
|
When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3 and
applying PAGE_MASK. This approach has several issues:
- __native_read_cr3() returns the raw CR3 register value, which on x86_64
includes not just the physical address but also flags. Bits above the
physical address width of the system i.e. above __PHYSICAL_MASK_SHIFT) are
also not masked.
- The PGD entry is masked by PAGE_SIZE which doesn't take into account the
higher bits such as _PAGE_BIT_NOPTISHADOW.
Replace this with proper accessor functions:
- native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask metadata out
of CR3 (like SME or LAM bits). All remaining bits are real address bits or
reserved and must be 0.
- mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for flags
above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below 51, but above
the max physical address are reserved and must be 0.
Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/a482fd68-ce54-472d-8df1-33d6ac9f6bb5@intel.com
|
|
This is a follow up to commit c4781dc3d1cf ("Kbuild: enable
-fms-extensions") but in a separate change due to being substantially
different from the initial submission.
There are many places within the kernel that use their own CFLAGS
instead of the main KBUILD_CFLAGS, meaning code written with the main
kernel's use of '-fms-extensions' in mind that may be tangentially
included in these areas will result in "error: declaration does not
declare anything" messages from the compiler.
Add '-fms-extensions' to all these areas to ensure consistency, along
with -Wno-microsoft-anon-tag to silence clang's warning about use of the
extension that the kernel cares about using. parisc does not build with
clang so it does not need this warning flag. LoongArch does not need it
either because -W flags from KBUILD_FLAGS are pulled into cflags-vdso.
Reported-by: Christian Brauner <brauner@kernel.org>
Closes: https://lore.kernel.org/20251030-meerjungfrau-getrocknet-7b46eacc215d@brauner/
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
|
|
The boot_{rdmsr,wrmsr}() helpers are *just* the barebones MSR access
functionality, without any tracing or exception handling glue as it is done in
kernel proper.
Move these helpers to asm/shared/msr.h and rename to raw_{rdmsr,wrmsr}() to
indicate what they are.
[ bp: Correct the reason why those helpers exist. I should've caught that in
the original patch that added them:
176db622573f ("x86/boot: Introduce helpers for MSR reads/writes"
but oh well...
- fixup include path delimiters to <> ]
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/all/20250924200852.4452-2-john.allen@amd.com
|
|
Use the new-found freedom of allowing variable declarions inside
for() to simplify the for_each_insn_prefix() iterator to no longer
need an external temporary.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- Simplify inline asm flag output operands now that the minimum
compiler version supports the =@ccCOND syntax
- Remove a bunch of AS_* Kconfig symbols which detect assembler support
for various instruction mnemonics now that the minimum assembler
version supports them all
- The usual cleanups all over the place
* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h>
x86/mtrr: Remove license boilerplate text with bad FSF address
x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h>
x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h>
x86/entry/fred: Push __KERNEL_CS directly
x86/kconfig: Remove CONFIG_AS_AVX512
crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
crypto: X86 - Remove CONFIG_AS_VAES
crypto: x86 - Remove CONFIG_AS_GFNI
x86/kconfig: Drop unused and needless config X86_64_SMP
|
|
The kexec code will call set_pages_state() after tearing down all the GHCBs,
which will therefore result in a call to early_set_pages_state().
This means the __init annotation is wrong, and must be dropped.
Fixes: c5c30a373693 ("x86/boot: Move startup code out of __head section")
Reported-by: Srikanth Aithal <Srikanth.Aithal@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Srikanth Aithal <Srikanth.Aithal@amd.com>
|
|
Document the CPUID reading the different SEV guest types do - the SNP
one which relies on the presence of a CPUID table and the SEV-ES one,
which reads the CPUID supplied by the hypervisor.
The intent being to clarify the two back-to-back, similar CPUID
invocations.
No functional changes.
[ bp: Turn into a proper patch. ]
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/fbb24767-0e06-d1d6-36e0-1757d98aca66@amd.com
|
|
The minimum supported GCC version is 8.1, which supports flag output operands
and always defines __GCC_ASM_FLAG_OUTPUTS__ macro.
Remove code depending on __GCC_ASM_FLAG_OUTPUTS__ and use the "=@ccCOND" flag
output operand directly.
Use the equivalent "=@ccz" instead of "=@cce" flag output operand for
CMPXCHG8B and CMPXCHG16B instructions. These instructions set a single flag
bit - the Zero flag - and "=@ccz" is used to distinguish the CC user from
comparison instructions, where set ZERO flag indeed means that the values are
equal.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250905121723.GCaLrU04lP2A50PT-B@fat_crate.local
|
|
Conflicts:
arch/x86/include/asm/sev-internal.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It is a silly oneliner anyway. Replace it with its equivalent.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Move startup code out of the __head section, now that this no longer has
a special significance. Move everything into .text or .init.text as
appropriate, so that startup code is not kept around unnecessarily.
[ bp: Fold in hunk to fix 32-bit CPU hotplug:
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509022207.56fd97f4-lkp@intel.com ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-45-ardb+git@google.com
|
|
Recent EFI x86 systems are more strict when it comes to mapping boot
images, and require that mappings are either read-write or read-execute.
Now that the boot code is being cleaned up and refactored, most of it is
being moved into .init.text [where it arguably belongs] but that implies
that when booting on such strict EFI firmware, we need to take care to
map .init.text (and the .altinstr_aux section that follows it)
read-execute as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-44-ardb+git@google.com
|
|
In order to be able to have tight control over which code may execute
from the early 1:1 mapping of memory, but still link vmlinux as a single
executable, prefix all symbol references in startup code with __pi_, and
invoke it from outside using the __pi_ prefix.
Use objtool to check that no absolute symbol references are present in
the startup code, as these cannot be used from code running from the 1:1
mapping.
Note that this also requires disabling the latent-entropy GCC plugin, as
the global symbol references that it injects would require explicit
exports, and given that the startup code rarely executes more than once,
it is not a useful source of entropy anyway.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-43-ardb+git@google.com
|
|
Invoke objtool on each startup code object individually to check for the
absence of absolute relocations. This is needed because this code will
be invoked from the 1:1 mapping of memory before those absolute virtual
addresses (which are derived from the kernel virtual base address
provided to the linker and possibly shifted at boot) are mapped.
Only objects built under arch/x86/boot/startup/ have this restriction,
and once they have been incorporated into vmlinux.o, this distinction is
difficult to make. So force the invocation of objtool for each object
file individually, even if objtool is deferred to vmlinux.o for the rest
of the build. In the latter case, only pass --noabs and nothing else;
otherwise, append it to the existing objtool command line.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-40-ardb+git@google.com
|
|
Create aliases that expose routines that are part of the startup code to
other code in the core kernel, so that they can be called later as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-38-ardb+git@google.com
|
|
Rename sev-nmi.c to noinstr.c, and move the get/put GHCB routines into it too,
which are also annotated as 'noinstr' and suffer from the same problem as the
NMI code, i.e., that GCC may ignore the __no_sanitize_address__ function
attribute implied by 'noinstr' and insert KASAN instrumentation anyway.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-37-ardb+git@google.com
|
|
Provide PIC aliases for data objects that are shared between the SEV startup
code and the SEV code that executes later. This is needed so that the confined
startup code is permitted to access them.
This requires some of these variables to be moved into a source file that is
not part of the startup code, as the PIC alias is already implied, and
exporting variables in the opposite direction is not supported.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-36-ardb+git@google.com
|
|
snp_vmpl will be assigned a non-zero value when executing at a VMPL other than
0, and this is inferred from a call to RMPADJUST, which only works when
running at VMPL0.
This means that testing snp_vmpl is sufficient, and there is no need to
perform the same check again.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-34-ardb+git@google.com
|
|
To avoid having to reason about whether or not to use the per-CPU SVSM calling
area when running startup and init code on the boot CPU, reuse the boot SVSM
calling area as the per-CPU area for the BSP.
Thus, remove the need to make the per-CPU variables and associated state in
sev_cfg accessible to the startup code once confined.
[ bp: Massage commit message. ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-33-ardb+git@google.com
|
|
The early page state change API is mostly only used very early, when only the
boot time SVSM calling area is in use. However, this API is also called by the
kexec finishing code, which runs very late, and potentially from a different
CPU (which uses a different calling area).
To avoid pulling the per-CPU SVSM calling area pointers and related SEV state
into the startup code, refactor the page state change API so the SVSM calling
area virtual and physical addresses can be provided by the caller.
No functional change intended.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-32-ardb+git@google.com
|
|
Both the decompressor and the SEV startup code implement the exact same
sequence for invoking the MSR based communication protocol to effectuate
a page state change.
Before tweaking the internal APIs used in both versions, merge them and
share them so those tweaks are only needed in a single place.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-31-ardb+git@google.com
|
|
The boottime SVSM calling area is used both by the startup code running from
a 1:1 mapping, and potentially later on running from the ordinary kernel
mapping.
This SVSM calling area is statically allocated, and so its physical address
doesn't change. However, its virtual address depends on the calling context
(1:1 mapping or kernel virtual mapping), and even though the variable that
holds the virtual address of this calling area gets updated from 1:1 address
to kernel address during the boot, it is hard to reason about why this is
guaranteed to be safe.
So instead, take the RIP-relative address of the boottime SVSM calling area
whenever its virtual address is required, and only use a global variable for
the physical address.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/20250828102202.1849035-30-ardb+git@google.com
|
|
Both the decompressor and the core kernel implement an early #VC handler,
which only deals with CPUID instructions, and full featured one, which can
handle any #VC exception.
The former communicates with the hypervisor using the MSR based protocol,
whereas the latter uses a shared GHCB page, which is configured a bit later
during the boot, when the kernel runs from its ordinary virtual mapping,
rather than the 1:1 mapping that the startup code uses.
Accessing this shared GHCB page from the core kernel's startup code is
problematic, because it involves converting the GHCB address provided by the
caller to a physical address. In the startup code, virtual to physical address
translations are problematic, given that the virtual address might be a 1:1
mapped address, and such translations should therefore be avoided.
This means that exposing startup code dealing with the GHCB to callers that
execute from the ordinary kernel virtual mapping should be avoided too. So
move all GHCB page based communication out of the startup code, now that all
communication occurring before the kernel virtual mapping is up relies on the
MSR protocol only.
As an exception, add a flag representing the need to apply the coherency
fix in order to avoid exporting CPUID* helpers because of the code
running too early for the *cpu_has* infrastructure.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250828102202.1849035-29-ardb+git@google.com
|
|
Now that Secure AVIC support is complete, make it part of to the SNP present
features.
Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Link: https://lore.kernel.org/20250828113225.209174-1-Neeraj.Upadhyay@amd.com
|
|
Determining the VMPL at which the kernel runs involves performing a RMPADJUST
operation on an arbitrary page of memory, and observing whether it succeeds.
The use of boot_ghcb_page in the core kernel in this case is completely
arbitrary, but results in the need to provide a PIC alias for it. So use
boot_svsm_ca_page instead, which already needs this alias for other reasons.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/20250828102202.1849035-28-ardb+git@google.com
|
|
The early page state change API performs an SVSM call to PVALIDATE each page
when running under a SVSM, and this involves either a GHCB page based call or
a call based on the MSR protocol.
The GHCB page based variant involves VA to PA translation of the GHCB address,
and this is best avoided in the startup code, where virtual addresses are
ambiguous (1:1 or kernel virtual).
As this is the last remaining occurrence of svsm_perform_call_protocol() in
the startup code, switch to the MSR protocol exclusively in this particular
case, so that the GHCB based plumbing can be moved out of the startup code
entirely in a subsequent patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/20250828102202.1849035-27-ardb+git@google.com
|
|
As the preceding code comment already indicates, remapping the SVSM
calling area occurs long before the GHCB page is configured, and so
calling svsm_perform_call_protocol() is guaranteed to result in a call
to svsm_perform_msr_protocol().
So just call the latter directly. This allows most of the GHCB based API
infrastructure to be moved out of the startup code in a subsequent
patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/20250828102202.1849035-26-ardb+git@google.com
|
|
The Secure AVIC feature provides SEV-SNP guests hardware acceleration for
performance sensitive APIC accesses while securely managing the guest-owned
APIC state through the use of a private APIC backing page.
This helps prevent the hypervisor from generating unexpected interrupts for
a vCPU or otherwise violate architectural assumptions around the APIC
behavior.
Add a new x2APIC driver that will serve as the base of the Secure AVIC
support. It is initially the same as the x2APIC physical driver (without IPI
callbacks), but will be modified as features are implemented.
As the new driver does not implement Secure AVIC features yet, if the
hypervisor sets the Secure AVIC bit in SEV_STATUS, maintain the existing
behavior to enforce the guest termination.
[ bp: Massage commit message. ]
Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Link: https://lore.kernel.org/20250828070334.208401-2-Neeraj.Upadhyay@amd.com
|
|
There are two distinct callers of snp_cpuid(): the MSR protocol and the GHCB
page based interface.
The snp_cpuid() logic does not care about the distinction, which only matters
at a lower level. But the fact that it supports both interfaces means that the
GHCB page based logic is pulled into the early startup code where PA to VA
conversions are problematic, given that it runs from the 1:1 mapping of memory.
So keep snp_cpuid() itself in the startup code, but factor out the hypervisor
calls via a callback, so that the GHCB page handling can be moved out.
Code refactoring only - no functional change intended.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/20250828102202.1849035-25-ardb+git@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Remove a transitional asm/cpuid.h header which was added only as a
fallback during cpuid helpers reorg
- Initialize reserved fields in the SVSM page validation calls
structure to zero in order to allow for future structure extensions
- Have the sev-guest driver's buffers used in encryption operations be
in linear mapping space as the encryption operation can be offloaded
to an accelerator
- Have a read-only MSR write when in an AMD SNP guest trap to the
hypervisor as it is usually done. This makes the guest user
experience better by simply raising a #GP instead of terminating said
guest
- Do not output AVX512 elapsed time for kernel threads because the data
is wrong and fix a NULL pointer dereferencing in the process
- Adjust the SRSO mitigation selection to the new attack vectors
* tag 'x86_urgent_for_v6.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpuid: Remove transitional <asm/cpuid.h> header
x86/sev: Ensure SVSM reserved fields in a page validation entry are initialized to zero
virt: sev-guest: Satisfy linear mapping requirement in get_derived_key()
x86/sev: Improve handling of writes to intercepted TSC MSRs
x86/fpu: Fix NULL dereference in avx512_status()
x86/bugs: Select best SRSO mitigation
|
|
initialized to zero
In order to support future versions of the SVSM_CORE_PVALIDATE call, all
reserved fields within a PVALIDATE entry must be set to zero as an SVSM should
be ensuring all reserved fields are zero in order to support future usage of
reserved areas based on the protocol version.
Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/7cde412f8b057ea13a646fb166b1ca023f6a5031.1755098819.git.thomas.lendacky@amd.com
|
|
An SNP cache coherency vulnerability requires a cache line eviction
mitigation when validating memory after a page state change to private.
The specific mitigation is to touch the first and last byte of each 4K
page that is being validated. There is no need to perform the mitigation
when performing a page state change to shared and rescinding validation.
CPUID bit Fn8000001F_EBX[31] defines the COHERENCY_SFW_NO CPUID bit
that, when set, indicates that the software mitigation for this
vulnerability is not needed.
Implement the mitigation and invoke it when validating memory (making it
private) and the COHERENCY_SFW_NO bit is not set, indicating the SNP
guest is vulnerable.
Co-developed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
|