summaryrefslogtreecommitdiffstats
path: root/arch/x86/lib
AgeCommit message (Collapse)AuthorLines
2026-02-12Merge tag 'mm-stable-2026-02-11-19-22' of ↵Linus Torvalds-31/+8
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "powerpc/64s: do not re-activate batched TLB flush" makes arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev) It adds a generic enter/leave layer and switches architectures to use it. Various hacks were removed in the process. - "zram: introduce compressed data writeback" implements data compression for zram writeback (Richard Chang and Sergey Senozhatsky) - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous page ranges for hugepages. Large improvements during demand faulting are demonstrated (David Hildenbrand) - "memcg cleanups" tidies up some memcg code (Chen Ridong) - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos stats" improves DAMOS stat's provided information, deterministic control, and readability (SeongJae Park) - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few issues in the hugetlb cgroup charging selftests (Li Wang) - "Fix va_high_addr_switch.sh test failure - again" addresses several issues in the va_high_addr_switch test (Chunyu Hu) - "mm/damon/tests/core-kunit: extend existing test scenarios" improves the KUnit test coverage for DAMON (Shu Anzai) - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN (Shivank Garg) - "arch, mm: consolidate hugetlb early reservation" reworks and consolidates a pile of straggly code related to reservation of hugetlb memory from bootmem and creation of CMA areas for hugetlb (Mike Rapoport) - "mm: clean up anon_vma implementation" cleans up the anon_vma implementation in various ways (Lorenzo Stoakes) - "tweaks for __alloc_pages_slowpath()" does a little streamlining of the page allocator's slowpath code (Vlastimil Babka) - "memcg: separate private and public ID namespaces" cleans up the memcg ID code and prevents the internal-only private IDs from being exposed to userspace (Shakeel Butt) - "mm: hugetlb: allocate frozen gigantic folio" cleans up the allocation of frozen folios and avoids some atomic refcount operations (Kefeng Wang) - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement of memory betewwn the active and inactive LRUs and adds auto-tuning of the ratio-based quotas and of monitoring intervals (SeongJae Park) - "Support page table check on PowerPC" makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan) - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes nodes_and() and nodes_andnot() propagate the return values from the underlying bit operations, enabling some cleanup in calling code (Yury Norov) - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up some DAMON internal interfaces (SeongJae Park) - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work in khupaged and fixes a scan limit accounting issue (Shivank Garg) - "mm: balloon infrastructure cleanups" goes to town on the balloon infrastructure and its page migration function. Mainly cleanups, also some locking simplification (David Hildenbrand) - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds additional tracepoints to the page reclaim code (Jiayuan Chen) - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is part of Marco's kernel-wide migration from the legacy workqueue APIs over to the preferred unbound workqueues (Marco Crivellari) - "Various mm kselftests improvements/fixes" provides various unrelated improvements/fixes for the mm kselftests (Kevin Brodsky) - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic folio allocation, mainly by avoiding unnecessary work in pfn_range_valid_contig() (Kefeng Wang) - "selftests/damon: improve leak detection and wss estimation reliability" improves the reliability of two of the DAMON selftests (SeongJae Park) - "mm/damon: cleanup kdamond, damon_call(), damos filter and DAMON_MIN_REGION" does some cleanup work in the core DAMON code (SeongJae Park) - "Docs/mm/damon: update intro, modules, maintainer profile, and misc" performs maintenance work on the DAMON documentation (SeongJae Park) - "mm: add and use vma_assert_stabilised() helper" refactors and cleans up the core VMA code. The main aim here is to be able to use the mmap write lock's lockdep state to perform various assertions regarding the locking which the VMA code requires (Lorenzo Stoakes) - "mm, swap: swap table phase II: unify swapin use" removes some old swap code (swap cache bypassing and swap synchronization) which wasn't working very well. Various other cleanups and simplifications were made. The end result is a 20% speedup in one benchmark (Kairui Song) - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM available on 64-bit alpha, loongarch, mips, parisc, and um. Various cleanups were performed along the way (Qi Zheng) * tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits) mm/memory: handle non-split locks correctly in zap_empty_pte_table() mm: move pte table reclaim code to memory.c mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config um: mm: enable MMU_GATHER_RCU_TABLE_FREE parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE mips: mm: enable MMU_GATHER_RCU_TABLE_FREE LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles zsmalloc: make common caches global mm: add SPDX id lines to some mm source files mm/zswap: use %pe to print error pointers mm/vmscan: use %pe to print error pointers mm/readahead: fix typo in comment mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file() mm: refactor vma_map_pages to use vm_insert_pages mm/damon: unify address range representation with damon_addr_range mm/cma: replace snprintf with strscpy in cma_new_area ...
2026-02-10Merge tag 'x86_misc_for_7.0-rc1' of ↵Linus Torvalds-22/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Dave Hansen: "The usual smattering of x86/misc changes. The IPv6 patch in here surprised me in a couple of ways. First, the function it inlines is able to eat a lot more CPU time than I would have expected. Second, the inlining does not seem to bloat the kernel, at least in the configs folks have tested. - Inline x86-specific IPv6 checksum helper - Update IOMMU docs to use stable identifiers - Print unhashed pointers on fatal stack overflows" * tag 'x86_misc_for_7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/traps: Print unhashed pointers on stack overflow Documentation/x86: Update IOMMU spec references to use stable identifiers x86/lib: Inline csum_ipv6_magic()
2026-01-20x86/mm: simplify clear_page_*Ankur Arora-31/+8
clear_page_rep() and clear_page_erms() are wrappers around "REP; STOS" variations. Inlining gets rid of an unnecessary CALL/RET (which isn't free when using RETHUNK speculative execution mitigations.) Fixup and rename clear_page_orig() to adapt to the changed calling convention. Also add a comment from Dave Hansen detailing various clearing mechanisms used in clear_page(). Link: https://lkml.kernel.org/r/20260107072009.1615991-5-ankur.a.arora@oracle.com Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Tested-by: Raghavendra K T <raghavendra.kt@amd.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Hildenbrand <david@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-12x86/paravirt: Remove not needed includes of paravirt.hJuergen Gross-1/+0
In some places asm/paravirt.h is included without really being needed. Remove the related #include statements. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-2-jgross@suse.com
2026-01-05x86/lib: Inline csum_ipv6_magic()Eric Dumazet-22/+0
Inline this small helper. It has been observed to consume up to 0.75%, which is significant for such a small function. This should reduce register pressure, as saddr and daddr are often back to back in memory. For instance code inlined in tcp6_gro_receive() will look like: 55a: 48 03 73 28 add 0x28(%rbx),%rsi 55e: 8b 43 70 mov 0x70(%rbx),%eax 561: 29 f8 sub %edi,%eax 563: 0f c8 bswap %eax 565: 89 c0 mov %eax,%eax 567: 48 05 00 06 00 00 add $0x600,%rax 56d: 48 03 46 08 add 0x8(%rsi),%rax 571: 48 13 46 10 adc 0x10(%rsi),%rax 575: 48 13 46 18 adc 0x18(%rsi),%rax 579: 48 13 46 20 adc 0x20(%rsi),%rax 57d: 48 83 d0 00 adc $0x0,%rax 581: 48 89 c6 mov %rax,%rsi 584: 48 c1 ee 20 shr $0x20,%rsi 588: 01 f0 add %esi,%eax 58a: 83 d0 00 adc $0x0,%eax 58d: 89 c6 mov %eax,%esi 58f: 66 31 c0 xor %ax,%ax Surprisingly, this inlining does not seem to bloat kernel text size. It at least two cases[1], it either has no effect or results in a slightly smaller kernel. 1. https://lore.kernel.org/all/CANn89iJzcb_XO9oCApKYfRxsMMmg7BHukRDqWTca3ZLQ8HT0iQ@mail.gmail.com/ [ dhansen: add justification and note about lack of kernel bloat ] Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251113154545.594580-1-edumazet@google.com
2025-12-06Merge tag 'objtool-urgent-2025-12-06' of ↵Linus Torvalds-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fixes from Ingo Molnar: "Address various objtool scalability bugs/inefficiencies exposed by allmodconfig builds, plus improve the quality of alternatives instructions generated code and disassembly" * tag 'objtool-urgent-2025-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Simplify .annotate_insn code generation output some more objtool: Add more robust signal error handling, detect and warn about stack overflows objtool: Remove newlines and tabs from annotation macros objtool: Consolidate annotation macros x86/asm: Remove ANNOTATE_DATA_SPECIAL usage x86/alternative: Remove ANNOTATE_DATA_SPECIAL usage objtool: Fix stack overflow in validate_branch()
2025-12-03objtool: Remove newlines and tabs from annotation macrosJosh Poimboeuf-1/+1
Remove newlines and tabs from the annotation macros so the invoking code can insert them as needed to match the style of the surrounding code. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://patch.msgid.link/66305834c2eb78f082217611b756231ae9c0b555.1764694625.git.jpoimboe@kernel.org
2025-12-02Merge tag 'x86_misc_for_6.19-rc1' of ↵Linus Torvalds-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Dave Hansen: "The most significant are some changes to ensure that symbols exported for KVM are used only by KVM modules themselves, along with some related cleanups. In true x86/misc fashion, the other patch is completely unrelated and just enhances an existing pr_warn() to make it clear to users how they have tainted their kernel when something is mucking with MSRs. Summary: - Make MSR-induced taint easier for users to track down - Restrict KVM-specific exports to KVM itself" * tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs" x86/mtrr: Drop unnecessary export of "mtrr_state" x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base" x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)
2025-12-02Merge tag 'x86_sev_for_v6.19_rc1' of ↵Linus Torvalds-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Largely cleanups along with a change to save XSS to the GHCB (Guest-Host Communication Block) in SEV-ES guests so that the hypervisor can determine the guest's XSAVES buffer size properly and thus support shadow stacks in AMD confidential guests * tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cc: Fix enum spelling to fix kernel-doc warnings x86/boot: Drop unused sev_enable() fallback x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled() x86/sev: Include XSS value in GHCB CPUID request x86/boot: Move boot_*msr helpers to asm/shared/msr.h
2025-11-12x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possibleSean Christopherson-6/+8
Extend KVM's export macro framework to provide EXPORT_SYMBOL_FOR_KVM(), and use the helper macro to export symbols for KVM throughout x86 if and only if KVM will build one or more modules, and only for those modules. To avoid unnecessary exports when CONFIG_KVM=m but kvm.ko will not be built (because no vendor modules are selected), let arch code #define EXPORT_SYMBOL_FOR_KVM to suppress/override the exports. Note, the set of symbols to restrict to KVM was generated by manual search and audit; any "misses" are due to human error, not some grand plan. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251112173944.1380633-5-seanjc%40google.com
2025-11-11x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled()Borislav Petkov (AMD)-1/+1
Drop one redundant definition, while at it. There should be no functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251031122122.GKaQSpwhLvkinKKbjG@fat_crate.local
2025-10-16x86/insn: Simplify for_each_insn_prefix()Peter Zijlstra-7/+5
Use the new-found freedom of allowing variable declarions inside for() to simplify the for_each_insn_prefix() iterator to no longer need an external temporary. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-10-16x86/insn,uprobes,alternative: Unify insn_is_nop()Peter Zijlstra-0/+145
Both uprobes and alternatives have insn_is_nop() variants, unify them and make sure insn_is_nop() works for both x86_64 and i386. Specifically, uprobe must not compare userspace instructions to kernel nops as that does not work right in the compat case. For the uprobe case we therefore must recognise common 32bit and 64bit nops. Because uprobe will consume the instruction as a nop, it must not mistakenly claim a non-nop instruction to be a nop. Eg. 'REX.b3 NOP' is 'xchg %r8,%rax' - not a nop. For the kernel case similar constraints apply, is it used to optimize NOPs by replacing strings of short(er) nops with longer nops. Must not claim an instruction is a nop if it really isn't. Not recognising a nop is non-fatal. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-10-11Merge tag 'x86_core_for_v6.18_rc1' of ↵Linus Torvalds-31/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x86 updates from Borislav Petkov: - Remove a bunch of asm implementing condition flags testing in KVM's emulator in favor of int3_emulate_jcc() which is written in C - Replace KVM fastops with C-based stubs which avoids problems with the fastop infra related to latter not adhering to the C ABI due to their special calling convention and, more importantly, bypassing compiler control-flow integrity checking because they're written in asm - Remove wrongly used static branches and other ugliness accumulated over time in hyperv's hypercall implementation with a proper static function call to the correct hypervisor call variant - Add some fixes and modifications to allow running FRED-enabled kernels in KVM even on non-FRED hardware - Add kCFI improvements like validating indirect calls and prepare for enabling kCFI with GCC. Add cmdline params documentation and other code cleanups - Use the single-byte 0xd6 insn as the official #UD single-byte undefined opcode instruction as agreed upon by both x86 vendors - Other smaller cleanups and touchups all over the place * tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86,retpoline: Optimize patch_retpoline() x86,ibt: Use UDB instead of 0xEA x86/cfi: Remove __noinitretpoline and __noretpoline x86/cfi: Add "debug" option to "cfi=" bootparam x86/cfi: Standardize on common "CFI:" prefix for CFI reports x86/cfi: Document the "cfi=" bootparam options x86/traps: Clarify KCFI instruction layout compiler_types.h: Move __nocfi out of compiler-specific header objtool: Validate kCFI calls x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware x86/fred: Install system vector handlers even if FRED isn't fully enabled x86/hyperv: Use direct call to hypercall-page x86/hyperv: Clean up hv_do_hypercall() KVM: x86: Remove fastops KVM: x86: Convert em_salc() to C KVM: x86: Introduce EM_ASM_3WCL KVM: x86: Introduce EM_ASM_1SRC2 KVM: x86: Introduce EM_ASM_2CL KVM: x86: Introduce EM_ASM_2W ...
2025-09-30Merge tag 'x86_bugs_for_v6.18_rc1' of ↵Linus Torvalds-35/+40
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mitigation updates from Borislav Petkov: - Add VMSCAPE to the attack vector controls infrastructure - A bunch of the usual cleanups and fixlets, some of them resulting from fuzzing the different mitigation options * tag 'x86_bugs_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Report correct retbleed mitigation status x86/bugs: Fix reporting of LFENCE retpoline x86/bugs: Fix spectre_v2 forcing x86/bugs: Remove uses of cpu_mitigations_off() x86/bugs: Simplify SSB cmdline parsing x86/bugs: Use early_param() for spectre_v2 x86/bugs: Use early_param() for spectre_v2_user x86/bugs: Add attack vector controls for VMSCAPE x86/its: Move ITS indirect branch thunks to .text..__x86.indirect_thunk
2025-09-12x86/its: Move ITS indirect branch thunks to .text..__x86.indirect_thunkJosh Poimboeuf-35/+40
The ITS mitigation includes both indirect branch thunks and return thunks. Both are currently placed in .text..__x86.return_thunk, which is appropriate for the latter but not the former. For consistency with other mitigations, move the indirect branch thunks to .text..__x86.indirect_thunk. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
2025-09-04x86,ibt: Use UDB instead of 0xEAPeter Zijlstra-31/+31
A while ago [0] FineIBT started using the 0xEA instruction to raise #UD. All existing parts will generate #UD in 64bit mode on that instruction. However; Intel/AMD have not blessed using this instruction, it is on their 'reserved' opcode list for future use. Peter Anvin worked the committees and got use of 0xD6 blessed, it shall be called UDB (per the next SDM or so), and it being a single byte instruction is easy to slip into a single byte immediate -- as is done by this very patch. Reworking the FineIBT code to use UDB wasn't entirely trivial. Notably the FineIBT-BHI1 case ran out of bytes. In order to condense the encoding some it was required to move the hash register from R10D to EAX (thanks hpa!). Per the x86_64 ABI, RAX is used to pass the number of vector registers for vararg function calls -- something that should not happen in the kernel. More so, the kernel is built with -mskip-rax-setup, which should leave RAX completely unused, allowing its re-use. [ For BPF; while the bpf2bpf tail-call uses RAX in its calling convention, that does not use CFI and is unaffected. Only the 'regular' C->BPF transition is covered by CFI. ] The ENDBR poison value is changed from 'OSP NOP3' to 'NOPL -42(%RAX)', this is basically NOP4 but with UDB as its immediate. As such it is still a non-standard NOP value unique to prior ENDBR sites, but now also provides UDB. Per Agner Fog's optimization guide, Jcc is assumed not-taken. That is, the expected path should be the fallthrough case for improved throughput. Since the preamble now relies on the ENDBR poison to provide UDB, the code is changed to write the poison right along with the initial preamble -- this is possible because the ITS mitigation already disabled IBT over rewriting the CFI scheme. The scheme in detail: Preamble: FineIBT FineIBT-BHI1 FineIBT-BHI __cfi_\func: __cfi_\func: __cfi_\func: endbr endbr endbr subl $0x12345678, %eax subl $0x12345678, %eax subl $0x12345678, %eax jne.d32,np \func+3 cmovne %rax, %rdi cs cs call __bhi_args_N jne.d8,np \func+3 \func: \func: \func: nopl -42(%rax) nopl -42(%rax) nopl -42(%rax) Notably there are 7 bytes available after the SUBL; this enables the BHI1 case to fit without the nasty overlapping case it had previously. The !BHI case uses Jcc.d32,np to consume all 7 bytes without the need for an additional NOP, while the BHI case uses CS padding to align the CALL with the end of the preamble such that it returns to \func+0. Caller: FineIBT Paranoid-FineIBT fineibt_caller: fineibt_caller: mov $0x12345678, %eax mov $0x12345678, %eax lea -10(%r11), %r11 cmp -0x11(%r11), %eax nop5 cs lea -0x10(%r11), %r11 retpoline: retpoline: cs call __x86_indirect_thunk_r11 jne fineibt_caller+0xd call *%r11 nop Notably this is before apply_retpolines() which will fix up the retpoline call -- since all parts with IBT also have eIBRS (lets ignore ITS). Typically the retpoline site is rewritten (when still intact) into: call *%r11 nop3 [0] 06926c6cdb95 ("x86/ibt: Optimize the FineIBT instruction sequence") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250901191307.GI4067720@noisy.programming.kicks-ass.net
2025-08-18x86/insn: Add XOP prefix instructions decoder supportMasami Hiramatsu (Google)-10/+149
Support decoding AMD's XOP prefix encoded instructions. These instructions are introduced for Bulldozer micro architecture, and not supported on Intel's processors. But when compiling kernel with CONFIG_X86_NATIVE_CPU on some AMD processor (e.g. -march=bdver2), these instructions can be used. Closes: https://lore.kernel.org/all/871pq06728.fsf@wylie.me.uk/ Reported-by: Alan J. Wylie <alan@wylie.me.uk> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Alan J. Wylie <alan@wylie.me.uk> Link: https://lore.kernel.org/175386161199.564247.597496379413236944.stgit@devnote2
2025-07-29Merge tag 'x86_core_for_v6.17_rc1' of ↵Linus Torvalds-2/+24
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - Add helpers for WB{NO,}INVD with the purpose of using them in KVM and thus diminish the number of invalidations needed. With preceding cleanups, as always * tag 'x86_core_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/lib: Add WBINVD and WBNOINVD helpers to target multiple CPUs x86/lib: Add WBNOINVD helper functions x86/lib: Drop the unused return value from wbinvd_on_all_cpus() drm/gpu: Remove dead checks on wbinvd_on_all_cpus()'s return value
2025-07-28Merge tag 'libcrypto-updates-for-linus' of ↵Linus Torvalds-9666/+4
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: "This is the main crypto library pull request for 6.17. The main focus this cycle is on reorganizing the SHA-1 and SHA-2 code, providing high-quality library APIs for SHA-1 and SHA-2 including HMAC support, and establishing conventions for lib/crypto/ going forward: - Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares most of the SHA-512 code) into lib/crypto/. This includes both the generic and architecture-optimized code. Greatly simplify how the architecture-optimized code is integrated. Add an easy-to-use library API for each SHA variant, including HMAC support. Finally, reimplement the crypto_shash support on top of the library API. - Apply the same reorganization to the SHA-256 code (and also SHA-224 which shares most of the SHA-256 code). This is a somewhat smaller change, due to my earlier work on SHA-256. But this brings in all the same additional improvements that I made for SHA-1 and SHA-512. There are also some smaller changes: - Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For these algorithms it's just a move, not a full reorganization yet. - Fix the MIPS chacha-core.S to build with the clang assembler. - Fix the Poly1305 functions to work in all contexts. - Fix a performance regression in the x86_64 Poly1305 code. - Clean up the x86_64 SHA-NI optimized SHA-1 assembly code. Note that since the new organization of the SHA code is much simpler, the diffstat of this pull request is negative, despite the addition of new fully-documented library APIs for multiple SHA and HMAC-SHA variants. These APIs will allow further simplifications across the kernel as users start using them instead of the old-school crypto API. (I've already written a lot of such conversion patches, removing over 1000 more lines of code. But most of those will target 6.18 or later)" * tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits) lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils lib/crypto: x86/sha1-ni: Convert to use rounds macros lib/crypto: x86/sha1-ni: Minor optimizations and cleanup crypto: sha1 - Remove sha1_base.h lib/crypto: x86/sha1: Migrate optimized code into library lib/crypto: sparc/sha1: Migrate optimized code into library lib/crypto: s390/sha1: Migrate optimized code into library lib/crypto: powerpc/sha1: Migrate optimized code into library lib/crypto: mips/sha1: Migrate optimized code into library lib/crypto: arm64/sha1: Migrate optimized code into library lib/crypto: arm/sha1: Migrate optimized code into library crypto: sha1 - Use same state format as legacy drivers crypto: sha1 - Wrap library and add HMAC support lib/crypto: sha1: Add HMAC support lib/crypto: sha1: Add SHA-1 library functions lib/crypto: sha1: Rename sha1_init() to sha1_init_raw() crypto: x86/sha1 - Rename conflicting symbol lib/crypto: sha2: Add hmac_sha*_init_usingrawkey() lib/crypto: arm/poly1305: Remove unneeded empty weak function lib/crypto: x86/poly1305: Fix performance regression on short messages ...
2025-07-10x86/lib: Add WBINVD and WBNOINVD helpers to target multiple CPUsZheyun Shen-0/+12
Extract KVM's open-coded calls to do writeback caches on multiple CPUs to common library helpers for both WBINVD and WBNOINVD (KVM will use both). Put the onus on the caller to check for a non-empty mask to simplify the SMP=n implementation, e.g. so that it doesn't need to check that the one and only CPU in the system is present in the mask. [sean: move to lib, add SMP=n helpers, clarify usage] Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250128015345.7929-2-szy0127@sjtu.edu.cn Link: https://lore.kernel.org/20250522233733.3176144-5-seanjc@google.com
2025-07-10x86/lib: Add WBNOINVD helper functionsKevin Loughlin-0/+11
In line with WBINVD usage, add WBNOINVD helper functions. Explicitly fall back to WBINVD (via alternative()) if WBNOINVD isn't supported even though the instruction itself is backwards compatible (WBNOINVD is WBINVD with an ignored REP prefix), so that disabling X86_FEATURE_WBNOINVD behaves as one would expect, e.g. in case there's a hardware issue that affects WBNOINVD. Opportunistically, add comments explaining the architectural behavior of WBINVD and WBNOINVD, and provide hints and pointers to uarch-specific behavior. Note, alternative() ensures compatibility with early boot code as needed. [ bp: Massage, fix typos, make export _GPL. ] Signed-off-by: Kevin Loughlin <kevinloughlin@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/20250522233733.3176144-4-seanjc@google.com
2025-07-10x86/lib: Drop the unused return value from wbinvd_on_all_cpus()Sean Christopherson-2/+1
Drop wbinvd_on_all_cpus()'s return value; both the "real" version and the stub always return '0', and none of the callers check the return. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250522233733.3176144-3-seanjc@google.com
2025-06-30lib/crc: x86: Migrate optimized CRC code into lib/crc/Eric Biggers-1435/+0
Move the x86-optimized CRC code from arch/x86/lib/crc* into its new location in lib/crc/x86/, and wire it up in the new way. This new way of organizing the CRC code eliminates the need to artificially split the code for each CRC variant into separate arch and generic modules, enabling better inlining and dead code elimination. For more details, see "lib/crc: Prepare for arch-optimized code in subdirs of lib/crc/". Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20250607200454.73587-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30x86/crc: drop checks of CONFIG_AS_VPCLMULQDQEric Biggers-9/+1
Now that the minimum binutils version supports VPCLMULQDQ (and the minimum clang version does too), there is no need to check for assembler support before compiling code that uses these instructions. Link: https://lore.kernel.org/r/20250531211318.83677-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-06-30lib/crypto: x86: Move arch/x86/lib/crypto/ into lib/crypto/Eric Biggers-9666/+4
Move the contents of arch/x86/lib/crypto/ into lib/crypto/x86/. The new code organization makes a lot more sense for how this code actually works and is developed. In particular, it makes it possible to build each algorithm as a single module, with better inlining and dead code elimination. For a more detailed explanation, see the patchset which did this for the CRC library code: https://lore.kernel.org/r/20250607200454.73587-1-ebiggers@kernel.org/. Also see the patchset which did this for SHA-512: https://lore.kernel.org/linux-crypto/20250616014019.415791-1-ebiggers@kernel.org/ This is just a preparatory commit, which does the move to get the files into their new location but keeps them building the same way as before. Later commits will make the actual improvements to the way the arch-optimized code is integrated for each algorithm. Add a gitignore entry for the removed directory arch/x86/lib/crypto/ so that people don't accidentally commit leftover generated files. Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20250619191908.134235-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-05-26Merge tag 'x86-core-2025-05-25' of ↵Linus Torvalds-76/+86
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "Boot code changes: - A large series of changes to reorganize the x86 boot code into a better isolated and easier to maintain base of PIC early startup code in arch/x86/boot/startup/, by Ard Biesheuvel. Motivation & background: | Since commit | | c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") | | dated Jun 6 2017, we have been using C code on the boot path in a way | that is not supported by the toolchain, i.e., to execute non-PIC C | code from a mapping of memory that is different from the one provided | to the linker. It should have been obvious at the time that this was a | bad idea, given the need to sprinkle fixup_pointer() calls left and | right to manipulate global variables (including non-pointer variables) | without crashing. | | This C startup code has been expanding, and in particular, the SEV-SNP | startup code has been expanding over the past couple of years, and | grown many of these warts, where the C code needs to use special | annotations or helpers to access global objects. This tree includes the first phase of this work-in-progress x86 boot code reorganization. Scalability enhancements and micro-optimizations: - Improve code-patching scalability (Eric Dumazet) - Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper) CPU features enumeration updates: - Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S. Darwish) - Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish, Thomas Gleixner) - Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish) Memory management changes: - Allow temporary MMs when IRQs are on (Andy Lutomirski) - Opt-in to IRQs-off activate_mm() (Andy Lutomirski) - Simplify choose_new_asid() and generate better code (Borislav Petkov) - Simplify 32-bit PAE page table handling (Dave Hansen) - Always use dynamic memory layout (Kirill A. Shutemov) - Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov) - Make 5-level paging support unconditional (Kirill A. Shutemov) - Stop prefetching current->mm->mmap_lock on page faults (Mateusz Guzik) - Predict valid_user_address() returning true (Mateusz Guzik) - Consolidate initmem_init() (Mike Rapoport) FPU support and vector computing: - Enable Intel APX support (Chang S. Bae) - Reorgnize and clean up the xstate code (Chang S. Bae) - Make task_struct::thread constant size (Ingo Molnar) - Restore fpu_thread_struct_whitelist() to fix CONFIG_HARDENED_USERCOPY=y (Kees Cook) - Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg Nesterov) - Always preserve non-user xfeatures/flags in __state_perm (Sean Christopherson) Microcode loader changes: - Help users notice when running old Intel microcode (Dave Hansen) - AMD: Do not return error when microcode update is not necessary (Annie Li) - AMD: Clean the cache if update did not load microcode (Boris Ostrovsky) Code patching (alternatives) changes: - Simplify, reorganize and clean up the x86 text-patching code (Ingo Molnar) - Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish() (Nikolay Borisov) - Refactor the {,un}use_temporary_mm() code (Peter Zijlstra) Debugging support: - Add early IDT and GDT loading to debug relocate_kernel() bugs (David Woodhouse) - Print the reason for the last reset on modern AMD CPUs (Yazen Ghannam) - Add AMD Zen debugging document (Mario Limonciello) - Fix opcode map (!REX2) superscript tags (Masami Hiramatsu) - Stop decoding i64 instructions in x86-64 mode at opcode (Masami Hiramatsu) CPU bugs and bug mitigations: - Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov) - Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov) - Restructure and harmonize the various CPU bug mitigation methods (David Kaplan) - Fix spectre_v2 mitigation default on Intel (Pawan Gupta) MSR API: - Large MSR code and API cleanup (Xin Li) - In-kernel MSR API type cleanups and renames (Ingo Molnar) PKEYS: - Simplify PKRU update in signal frame (Chang S. Bae) NMI handling code: - Clean up, refactor and simplify the NMI handling code (Sohil Mehta) - Improve NMI duration console printouts (Sohil Mehta) Paravirt guests interface: - Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov) SEV support: - Share the sev_secrets_pa value again (Tom Lendacky) x86 platform changes: - Introduce the <asm/amd/> header namespace (Ingo Molnar) - i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to <asm/amd/fch.h> (Mario Limonciello) Fixes and cleanups: - x86 assembly code cleanups and fixes (Uros Bizjak) - Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf, Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)" * tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits) x86/bugs: Fix spectre_v2 mitigation default on Intel x86/bugs: Restructure ITS mitigation x86/xen/msr: Fix uninitialized variable 'err' x86/msr: Remove a superfluous inclusion of <asm/asm.h> x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only x86/mm/64: Make 5-level paging support unconditional x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model x86/mm/64: Always use dynamic memory layout x86/bugs: Fix indentation due to ITS merge x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor() x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2() x86/cpuid: Rename have_cpuid_p() to cpuid_feature() x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h> x86/msr: Add rdmsrl_on_cpu() compatibility wrapper x86/mm: Fix kernel-doc descriptions of various pgtable methods x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too x86/boot: Defer initialization of VM space related global variables ...
2025-05-26Merge tag 'v6.16-p1' of ↵Linus Torvalds-0/+9666
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Fix memcpy_sglist to handle partially overlapping SG lists - Use memcpy_sglist to replace null skcipher - Rename CRYPTO_TESTS to CRYPTO_BENCHMARK - Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS - Hide CRYPTO_MANAGER - Add delayed freeing of driver crypto_alg structures Compression: - Allocate large buffers on first use instead of initialisation in scomp - Drop destination linearisation buffer in scomp - Move scomp stream allocation into acomp - Add acomp scatter-gather walker - Remove request chaining - Add optional async request allocation Hashing: - Remove request chaining - Add optional async request allocation - Move partial block handling into API - Add ahash support to hmac - Fix shash documentation to disallow usage in hard IRQs Algorithms: - Remove unnecessary SIMD fallback code on x86 and arm/arm64 - Drop avx10_256 xts(aes)/ctr(aes) on x86 - Improve avx-512 optimisations for xts(aes) - Move chacha arch implementations into lib/crypto - Move poly1305 into lib/crypto and drop unused Crypto API algorithm - Disable powerpc/poly1305 as it has no SIMD fallback - Move sha256 arch implementations into lib/crypto - Convert deflate to acomp - Set block size correctly in cbcmac Drivers: - Do not use sg_dma_len before mapping in sun8i-ss - Fix warm-reboot failure by making shutdown do more work in qat - Add locking in zynqmp-sha - Remove cavium/zip - Add support for PCI device 0x17D8 to ccp - Add qat_6xxx support in qat - Add support for RK3576 in rockchip-rng - Add support for i.MX8QM in caam Others: - Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up - Add new SEV/SNP platform shutdown API in ccp" * tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits) x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining crypto: qat - add missing header inclusion crypto: api - Redo lookup on EEXIST Revert "crypto: testmgr - Add hash export format testing" crypto: marvell/cesa - Do not chain submitted requests crypto: powerpc/poly1305 - add depends on BROKEN for now Revert "crypto: powerpc/poly1305 - Add SIMD fallback" crypto: ccp - Add missing tee info reg for teev2 crypto: ccp - Add missing bootloader info reg for pspv5 crypto: sun8i-ce - move fallback ahash_request to the end of the struct crypto: octeontx2 - Use dynamic allocated memory region for lmtst crypto: octeontx2 - Initialize cptlfs device info once crypto: xts - Only add ecb if it is not already there crypto: lrw - Only add ecb if it is not already there crypto: testmgr - Add hash export format testing crypto: testmgr - Use ahash for generic tfm crypto: hmac - Add ahash support crypto: testmgr - Ignore EEXIST on shash allocation crypto: algapi - Add driver template support to crypto_inst_setname crypto: shash - Set reqsize in shash_alg ...
2025-05-26Merge tag 'crc-for-linus' of ↵Linus Torvalds-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: "Cleanups for the kernel's CRC (cyclic redundancy check) code: - Use __ro_after_init where appropriate - Remove unnecessary static_key on s390 - Rename some source code files - Rename the crc32 and crc32c crypto API modules - Use subsys_initcall instead of arch_initcall - Restore maintainers for crc_kunit.c - Fold crc16_byte() into crc16.c - Add some SPDX license identifiers" * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crc32: add SPDX license identifier lib/crc16: unexport crc16_table and crc16_byte() w1: ds2406: use crc16() instead of crc16_byte() loop MAINTAINERS: add crc_kunit.c back to CRC LIBRARY lib/crc: make arch-optimized code use subsys_initcall crypto: crc32 - remove "generic" from file and module names x86/crc: drop "glue" from filenames sparc/crc: drop "glue" from filenames s390/crc: drop "glue" from filenames powerpc/crc: rename crc32-vpmsum_core.S to crc-vpmsum-template.S powerpc/crc: drop "glue" from filenames arm64/crc: drop "glue" from filenames arm/crc: drop "glue" from filenames s390/crc32: Remove no-op module init and exit functions s390/crc32: Remove have_vxrs static key lib/crc: make the CPU feature static keys __ro_after_init
2025-05-13Merge commit 'its-for-linus-20250509-merge' into x86/core, to resolve conflictsIngo Molnar-0/+48
Conflicts: Documentation/admin-guide/hw-vuln/index.rst arch/x86/include/asm/cpufeatures.h arch/x86/kernel/alternative.c arch/x86/kernel/cpu/bugs.c arch/x86/kernel/cpu/common.c drivers/base/cpu.c include/linux/cpu.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13Merge branch 'x86/msr' into x86/core, to resolve conflictsIngo Molnar-18/+19
Conflicts: arch/x86/boot/startup/sme.c arch/x86/coco/sev/core.c arch/x86/kernel/fpu/core.c arch/x86/kernel/fpu/xstate.c Semantic conflict: arch/x86/include/asm/sev-internal.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13Merge branch 'x86/boot' into x86/core, to merge dependent commitsIngo Molnar-0/+4
Prepare to resolve conflicts with an upstream series of fixes that conflict with pending x86 changes: 6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13Merge branch 'x86/asm' into x86/core, to merge dependent commitsIngo Molnar-55/+53
Prepare to resolve conflicts with an upstream series of fixes that conflict with pending x86 changes: 6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13Merge branch 'x86/alternatives' into x86/core, to merge dependent commitsIngo Molnar-3/+10
Prepare to resolve conflicts with an upstream series of fixes that conflict with pending x86 changes: 6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-12crypto: lib/chacha - add array bounds to function prototypesEric Biggers-4/+4
Add explicit array bounds to the function prototypes for the parameters that didn't already get handled by the conversion to use chacha_state: - chacha_block_*(): Change 'u8 *out' or 'u8 *stream' to u8 out[CHACHA_BLOCK_SIZE]. - hchacha_block_*(): Change 'u32 *out' or 'u32 *stream' to u32 out[HCHACHA_OUT_WORDS]. - chacha_init(): Change 'const u32 *key' to 'const u32 key[CHACHA_KEY_WORDS]'. Change 'const u8 *iv' to 'const u8 iv[CHACHA_IV_SIZE]'. No functional changes. This just makes it clear when fixed-size arrays are expected. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-12crypto: lib/chacha - strongly type the ChaCha stateEric Biggers-24/+34
The ChaCha state matrix is 16 32-bit words. Currently it is represented in the code as a raw u32 array, or even just a pointer to u32. This weak typing is error-prone. Instead, introduce struct chacha_state: struct chacha_state { u32 x[16]; }; Convert all ChaCha and HChaCha functions to use struct chacha_state. No functional changes. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-09lib/crc: make arch-optimized code use subsys_initcallEric Biggers-3/+3
Make the architecture-optimized CRC code do its CPU feature checks in subsys_initcalls instead of arch_initcalls. This makes it consistent with arch/*/lib/crypto/ and ensures that it runs after initcalls that possibly could be a prerequisite for kernel-mode FPU, such as x86's xfd_update_static_branch() and loongarch's init_euen_mask(). Note: as far as I can tell, x86's xfd_update_static_branch() isn't *actually* needed for kernel-mode FPU. loongarch's init_euen_mask() is needed to enable save/restore of the vector registers, but loongarch doesn't yet have any CRC or crypto code that uses vector registers anyway. Regardless, let's be consistent with arch/*/lib/crypto/ and robust against any potential future dependency on an arch_initcall. Link: https://lore.kernel.org/r/20250510035959.87995-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-05-09x86/its: FineIBT-paranoid vs ITSPeter Zijlstra-3/+12
FineIBT-paranoid was using the retpoline bytes for the paranoid check, disabling retpolines, because all parts that have IBT also have eIBRS and thus don't need no stinking retpolines. Except... ITS needs the retpolines for indirect calls must not be in the first half of a cacheline :-/ So what was the paranoid call sequence: <fineibt_paranoid_start>: 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d a: 4d 8d 5b <f0> lea -0x10(%r11), %r11 e: 75 fd jne d <fineibt_paranoid_start+0xd> 10: 41 ff d3 call *%r11 13: 90 nop Now becomes: <fineibt_paranoid_start>: 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d a: 4d 8d 5b f0 lea -0x10(%r11), %r11 e: 2e e8 XX XX XX XX cs call __x86_indirect_paranoid_thunk_r11 Where the paranoid_thunk looks like: 1d: <ea> (bad) __x86_indirect_paranoid_thunk_r11: 1e: 75 fd jne 1d __x86_indirect_its_thunk_r11: 20: 41 ff eb jmp *%r11 23: cc int3 [ dhansen: remove initialization to false ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09x86/its: Add support for ITS-safe return thunkPawan Gupta-1/+12
RETs in the lower half of cacheline may be affected by ITS bug, specifically when the RSB-underflows. Use ITS-safe return thunk for such RETs. RETs that are not patched: - RET in retpoline sequence does not need to be patched, because the sequence itself fills an RSB before RET. - RET in Call Depth Tracking (CDT) thunks __x86_indirect_{call|jump}_thunk and call_depth_return_thunk are not patched because CDT by design prevents RSB-underflow. - RETs in .init section are not reachable after init. - RETs that are explicitly marked safe with ANNOTATE_UNRET_SAFE. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09x86/its: Add support for ITS-safe indirect thunkPawan Gupta-0/+28
Due to ITS, indirect branches in the lower half of a cacheline may be vulnerable to branch target injection attack. Introduce ITS-safe thunks to patch indirect branches in the lower half of cacheline with the thunk. Also thunk any eBPF generated indirect branches in emit_indirect_jump(). Below category of indirect branches are not mitigated: - Indirect branches in the .init section are not mitigated because they are discarded after boot. - Indirect branches that are explicitly marked retpoline-safe. Note that retpoline also mitigates the indirect branches against ITS. This is because the retpoline sequence fills an RSB entry before RET, and it does not suffer from RSB-underflow part of the ITS. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-06x86/insn: Stop decoding i64 instructions in x86-64 mode at opcodeMasami Hiramatsu (Google)-4/+9
In commit 2e044911be75 ("x86/traps: Decode 0xEA instructions as #UD") FineIBT starts using 0xEA as an invalid instruction like UD2. But insn decoder always returns the length of "0xea" instruction as 7 because it does not check the (i64) superscript. The x86 instruction decoder should also decode 0xEA on x86-64 as a one-byte invalid instruction by decoding the "(i64)" superscript tag. This stops decoding instruction which has (i64) but does not have (o64) superscript in 64-bit mode at opcode and skips other fields. With this change, insn_decoder_test says 0xea is 1 byte length if x86-64 (-y option means 64-bit): $ printf "0:\tea\t\n" | insn_decoder_test -y -v insn_decoder_test: success: Decoded and checked 1 instructions Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/174580490000.388420.5225447607417115496.stgit@devnote2
2025-05-06x86/insn: Fix opcode map (!REX2) superscript tagsMasami Hiramatsu (Google)-25/+25
Commit: 159039af8c07 ("x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map") added (!REX2) superscript with a space, but the correct format requires ',' for concatination with other superscript tags. Add ',' to generate correct insn attribute tables. I confirmed with following command: arch/x86/lib/x86-opcode-map.txt | grep e8 | head -n 1 [0xe8] = INAT_MAKE_IMM(INAT_IMM_VWORD32) | INAT_FORCE64 | INAT_NO_REX2, Fixes: 159039af8c07 ("x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/174580489027.388420.15539375184727726142.stgit@devnote2
2025-05-06Merge tag 'v6.15-rc4' into x86/asm, to pick up fixesIngo Molnar-2/+2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-05crypto: x86/sha256 - Add simd block functionHerbert Xu-3/+10
Add CRYPTO_ARCH_HAVE_LIB_SHA256_SIMD and a SIMD block function so that the caller can decide whether to use SIMD. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: arch/sha256 - Export block functions as GPL onlyHerbert Xu-2/+2
Export the block functions as GPL only, there is no reason to let arbitrary modules use these internal functions. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: x86/blake2s - Include linux/init.hHerbert Xu-7/+5
Explicitly include linux/init.h rather than pulling it through potluck. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05Revert "crypto: run initcalls for generic implementations earlier"Herbert Xu-3/+3
This reverts commit c4741b23059794bd99beef0f700103b0d983b3fd. Crypto API self-tests no longer run at registration time and now occur either at late_initcall or upon the first use. Therefore the premise of the above commit no longer exists. Revert it and subsequent additions of subsys_initcall and arch_initcall. Note that lib/crypto calls will stay at subsys_initcall (or rather downgraded from arch_initcall) because they may need to occur before Crypto API registration. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: x86/sha256 - implement library instead of shashEric Biggers-0/+2064
Instead of providing crypto_shash algorithms for the arch-optimized SHA-256 code, instead implement the SHA-256 library. This is much simpler, it makes the SHA-256 library functions be arch-optimized, and it fixes the longstanding issue where the arch-optimized SHA-256 was disabled by default. SHA-256 still remains available through crypto_shash, but individual architectures no longer need to handle it. To match sha256_blocks_arch(), change the type of the nblocks parameter of the assembly functions from int to size_t. The assembly functions actually already treated it as size_t. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: lib/poly1305 - Use block-only interfaceHerbert Xu-60/+0
Now that every architecture provides a block function, use that to implement the lib/poly1305 and remove the old per-arch code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05crypto: x86/poly1305 - Add block-only interfaceHerbert Xu-85/+69
Add block-only interface. Also remove the unnecessary SIMD fallback path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>