summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorLines
2026-03-19perf annotate-data: Add invalidate_reg_state() helper for x86Zecheng Li-12/+17
Add a helper function to consistently invalidate register state instead of field assignments. This ensures kind, ok, and copied_from are all properly cleared when a register becomes invalid. The helper sets: - kind = TSR_KIND_INVALID - ok = false - copied_from = -1 Replace all invalidation patterns with calls to this helper. No functional change and this removes some incorrect annotations that were caused by incomplete invalidation (e.g. a obsolete copied_from from an invalidated register). Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf annotate-data: Handle global variable access with const registerZecheng Li-0/+5
When a register holds a constant value (TSR_KIND_CONST) and is used with a negative offset, treat it as a potential global variable access instead of falling through to CFA (frame) handling. This fixes cases like array indexing with computed offsets: movzbl -0x7d72725a(%rax), %eax # array[%rax] Where %rax contains a computed index and the negative offset points to a global array. Previously this fell through to the CFA path which doesn't handle global variables, resulting in "no type information". The fix redirects such accesses to check_kernel which calls get_global_var_type() to resolve the type from the global variable cache. This is only done for kernel DSOs since the pattern relies on kernel-specific global variable resolution. We could also treat registers with integer types to the global variable path, but this requires more changes. Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf annotate-data: Collect global variables without nameZecheng Li-6/+1
Previously, global_var__collect() required get_global_var_info() to succeed (i.e., the variable must have a symbol name) before caching a global variable. This prevented variables that exist in DWARF but lack symbol table coverage from being cached. Remove the symbol table requirement since DW_OP_addr already provides the variable's address directly from DWARF. The symbol table lookup is now optional to obtain the variable name when available. Also remove the var_offset != 0 check, which was intended to skip variables where the access address doesn't match the symbol start. The symbol table lookup is now optional and I found removing this check has no effect on the annotation results for both kernel and userspace programs. Test results show improved annotation coverage especially for userspace programs with RIP-relative addressing instructions. Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf dwarf-aux: Handle array types in die_get_member_typeZecheng Li-1/+16
When a struct member is an array type, die_get_member_type() would stop iterating since array types weren't handled in the loop. This caused accesses to array elements within structs to not resolve properly. Add array type handling by resolving the array to its element type and calculating the offset within an element using modulo arithmetic This improves type annotation coverage for struct members that are arrays. Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf annotate-data: Improve type comparison from different scopesZecheng Li-1/+3
When comparing types from different scopes, first compare their type offsets. A larger offset means the field belongs to an outer (enclosing) struct. This helps resolve cases where a pointer is found in an inner scope, but a struct containing that pointer exists in an outer scope. Previously, is_better_type would prefer the pointer type, but the struct type is actually more complete and should be chosen. Prefer types from outer scopes when is_better_type cannot determine a better type. This is a heuristic for the case `struct A { struct B; }` where A and B have the same size but I think in most cases A is in the outer scope and should be preferred. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf dwarf-aux: Skip check_variable for variable lookupZecheng Li-30/+30
Both die_find_variable_by_reg and die_find_variable_by_addr call match_var_offset which already performs sufficient checking and type matching. The additional check_variable call is redundant, and its need_pointer logic is only a heuristic. Since DWARF encodes accurate type information, which match_var_offset verifies, skipping check_variable improves both coverage and accuracy. Return the matched type from die_find_variable_by_reg and die_find_variable_by_addr via the existing `type` field in find_var_data, removing the need for check_variable in find_data_type_die. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf dwarf-aux: Preserve typedefs in match_var_offsetZecheng Li-11/+15
Preserve typedefs in match_var_offset to match the results by __die_get_real_type. Also move the (offset == 0) branch after the is_pointer check to ensure the correct type is used, fixing cases where an incorrect pointer type was chosen when the access offset was 0. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19perf dwarf-aux: Add die_get_pointer_type to get pointer typesZecheng Li-17/+51
When a variable type is wrapped in typedef/qualifiers, callers may need to first resolve it to the underlying DW_TAG_pointer_type or DW_TAG_array_type. A simple tag check is not enough and directly calling __die_get_real_type() can stop at the pointer type (e.g. typedef -> pointer) instead of the pointee type. Add die_get_pointer_type() helper that follows typedef/qualifier chains and returns the underlying pointer DIE. Use it in annotate-data.c so pointer checks and dereference work correctly for typedef'd pointers. Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2026-03-19PCI: eswin: Add ESWIN PCIe Root Complex driverSenchuan Zhang-0/+426
Add driver for the ESWIN PCIe Root Complex based on the DesignWare PCIe core, IP revision 5.96a. The PCIe Gen.3 Root Complex supports data rate of 8 GT/s and x4 lanes, with INTx and MSI interrupt capability. Signed-off-by: Yu Ning <ningyu@eswincomputing.com> Signed-off-by: Yanghui Ou <ouyanghui@eswincomputing.com> Signed-off-by: Senchuan Zhang <zhangsenchuan@eswincomputing.com> [mani: renamed "EIC7700" to "ESWIN", added maintainers entry, removed async probe] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> [bhelgaas: add driver tag in subject] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260227111808.1996-1-zhangsenchuan@eswincomputing.com
2026-03-19dt-bindings: PCI: eswin: Add ESWIN PCIe Root ComplexSenchuan Zhang-0/+166
Add Device Tree binding documentation for the ESWIN PCIe Root Complex. The Root Complex is based on Synopsys Designware PCIe IP. Signed-off-by: Yu Ning <ningyu@eswincomputing.com> Signed-off-by: Yanghui Ou <ouyanghui@eswincomputing.com> Signed-off-by: Senchuan Zhang <zhangsenchuan@eswincomputing.com> [mani: Renamed 'EIC7700' to 'ESWIN'] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> [bhelgaas: add driver tag in subject] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260227111732.1979-1-zhangsenchuan@eswincomputing.com
2026-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski-3382/+5779
Cross-merge networking fixes after downstream PR (net-7.0-rc5). net/netfilter/nft_set_rbtree.c 598adea720b97 ("netfilter: revert nft_set_rbtree: validate open interval overlap") 3aea466a43998 ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") https://lore.kernel.org/abgaQBpeGstdN4oq@sirena.org.uk No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-19io_uring/kbuf: propagate BUF_MORE through early buffer commit pathJens Axboe-3/+10
When io_should_commit() returns true (eg for non-pollable files), buffer commit happens at buffer selection time and sel->buf_list is set to NULL. When __io_put_kbufs() generates CQE flags at completion time, it calls __io_put_kbuf_ring() which finds a NULL buffer_list and hence cannot determine whether the buffer was consumed or not. This means that IORING_CQE_F_BUF_MORE is never set for non-pollable input with incrementally consumed buffers. Likewise for io_buffers_select(), which always commits upfront and discards the return value of io_kbuf_commit(). Add REQ_F_BUF_MORE to store the result of io_kbuf_commit() during early commit. Then __io_put_kbuf_ring() can check this flag and set IORING_F_BUF_MORE accordingy. Reported-by: Martin Michaelis <code@mgjm.de> Cc: stable@vger.kernel.org Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Link: https://github.com/axboe/liburing/issues/1553 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-19io_uring/kbuf: fix missing BUF_MORE for incremental buffers at EOFJens Axboe-0/+4
For a zero length transfer, io_kbuf_inc_commit() is called with !len. Since we never enter the while loop to consume the buffers, io_kbuf_inc_commit() ends up returning true, consuming the buffer. But if no data was consumed, by definition it cannot have consumed the buffer. Return false for that case. Reported-by: Martin Michaelis <code@mgjm.de> Cc: stable@vger.kernel.org Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Link: https://github.com/axboe/liburing/issues/1553 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-03-19selftests/landlock: Test tsync interruption and cancellation pathsMickaël Salaün-1/+90
Add tsync_interrupt test to exercise the signal interruption path in landlock_restrict_sibling_threads(). When a signal interrupts wait_for_completion_interruptible() while the calling thread waits for sibling threads to finish credential preparation, the kernel: 1. Sets ERESTARTNOINTR to request a transparent syscall restart. 2. Calls cancel_tsync_works() to opportunistically dequeue task works that have not started running yet. 3. Breaks out of the preparation loop, then unblocks remaining task works via complete_all() and waits for them to finish. 4. Returns the error, causing abort_creds() in the syscall handler. Specifically, cancel_tsync_works() in its entirety, the ERESTARTNOINTR error branch in landlock_restrict_sibling_threads(), and the abort_creds() error branch in the landlock_restrict_self() syscall handler are timing-dependent and not exercised by the existing tsync tests, making code coverage measurements non-deterministic. The test spawns a signaler thread that rapidly sends SIGUSR1 to the calling thread while it performs landlock_restrict_self() with LANDLOCK_RESTRICT_SELF_TSYNC. Since ERESTARTNOINTR causes a transparent restart, userspace always sees the syscall succeed. This is a best-effort coverage test: the interruption path is exercised when the signal lands during the preparation wait, which depends on thread scheduling. The test creates enough idle sibling threads (200) to ensure multiple serialized waves of credential preparation even on machines with many cores (e.g., 64), widening the window for the signaler. Deterministic coverage would require wrapping the wait call with ALLOW_ERROR_INJECTION() and using CONFIG_FAIL_FUNCTION. Test coverage for security/landlock was 90.2% of 2105 lines according to LLVM 21, and it is now 91.1% of 2105 lines with this new test. Cc: Günther Noack <gnoack@google.com> Cc: Justin Suess <utilityemal77@gmail.com> Cc: Tingmao Wang <m@maowtm.org> Cc: Yihan Ding <dingyihan@uniontech.com> Link: https://lore.kernel.org/r/20260310190416.1913908-1-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-03-19bpf: Add warning to detect memory leak in bpf_selem_unlink_nofail()Amery Hung-0/+8
While very unlikely, local storage theoretically may leak memory of the size of "struct bpf_local_storage" when destroy() fails to grab local_storage->lock and initializes selem->local_storage before other racing map_free() see it. Warn the user to allow debugging the issue instead of leaking the memory silently. Note that test_maps in bpf selftests already stress tested bpf_selem_unlink_nofail() by creating 4096 sockets and then immediately destroying them in multiple threads. With 64 threads, 64 x 4096 socket local storages were created and destroyed during the test and no warning in the function were triggered. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://patch.msgid.link/20260318224219.615105-1-ameryhung@gmail.com
2026-03-19smb: client: fix generic/694 due to wrong ->i_blocksPaulo Alcantara-32/+16
When updating ->i_size, make sure to always update ->i_blocks as well until we query new allocation size from the server. generic/694 was failing because smb3_simple_falloc() was missing the update of ->i_blocks after calling cifs_setsize(). So, fix this by updating ->i_blocks directly in cifs_setsize(), so all places that call it doesn't need to worry about updating ->i_blocks later. Reported-by: Shyam Prasad N <sprasad@microsoft.com> Closes: https://lore.kernel.org/r/CANT5p=rqgRwaADB=b_PhJkqXjtfq3SFv41SSTXSVEHnuh871pA@mail.gmail.com Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-19pinctrl: mediatek: common: Fix probe failure for devices without EINTLuca Leonardo Scorcia-3/+6
Some pinctrl devices like mt6397 or mt6392 don't support EINT at all, but the mtk_eint_init function is always called and returns -ENODEV, which then bubbles up and causes probe failure. To address this only call mtk_eint_init if EINT pins are present. Tested on Xiaomi Mi Smart Clock x04g (mt6392). Fixes: e46df235b4e6 ("pinctrl: mediatek: refactor EINT related code for all MediaTek pinctrl can fit") Signed-off-by: Luca Leonardo Scorcia <l.scorcia@gmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Linus Walleij <linusw@kernel.org>
2026-03-19Bluetooth: L2CAP: Fix regressions caused by reusing identLuiz Augusto von Dentz-3/+27
This attempt to fix regressions caused by reusing ident which apparently is not handled well on certain stacks causing the stack to not respond to requests, so instead of simple returning the first unallocated id this stores the last used tx_ident and then attempt to use the next until all available ids are exausted and then cycle starting over to 1. Link: https://bugzilla.kernel.org/show_bug.cgi?id=221120 Link: https://bugzilla.kernel.org/show_bug.cgi?id=221177 Fixes: 6c3ea155e5ee ("Bluetooth: L2CAP: Fix not tracking outstanding TX ident") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Christian Eggers <ceggers@arri.de>
2026-03-19Bluetooth: L2CAP: Fix null-ptr-deref on l2cap_sock_ready_cbHelen Koike-0/+3
Before using sk pointer, check if it is null. Fix the following: KASAN: null-ptr-deref in range [0x0000000000000260-0x0000000000000267] CPU: 0 UID: 0 PID: 5985 Comm: kworker/0:5 Not tainted 7.0.0-rc4-00029-ga989fde763f4 #1 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-9.fc43 06/10/2025 Workqueue: events l2cap_info_timeout RIP: 0010:kasan_byte_accessible+0x12/0x30 Code: 79 ff ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 40 d6 48 c1 ef 03 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 07 3c 08 0f 92 c0 c3 cc cce veth0_macvtap: entered promiscuous mode RSP: 0018:ffffc90006e0f808 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffffffff89746018 RCX: 0000000080000001 RDX: 0000000000000000 RSI: ffffffff89746018 RDI: 000000000000004c RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: dffffc0000000000 R11: ffffffff8aae3e70 R12: 0000000000000000 R13: 0000000000000260 R14: 0000000000000260 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880983c2000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005582615a5008 CR3: 000000007007e000 CR4: 0000000000752ef0 PKRU: 55555554 Call Trace: <TASK> __kasan_check_byte+0x12/0x40 lock_acquire+0x79/0x2e0 lock_sock_nested+0x48/0x100 ? l2cap_sock_ready_cb+0x46/0x160 l2cap_sock_ready_cb+0x46/0x160 l2cap_conn_start+0x779/0xff0 ? __pfx_l2cap_conn_start+0x10/0x10 ? l2cap_info_timeout+0x60/0xa0 ? __pfx___mutex_lock+0x10/0x10 l2cap_info_timeout+0x68/0xa0 ? process_scheduled_works+0xa8d/0x18c0 process_scheduled_works+0xb6e/0x18c0 ? __pfx_process_scheduled_works+0x10/0x10 ? assign_work+0x3d5/0x5e0 worker_thread+0xa53/0xfc0 kthread+0x388/0x470 ? __pfx_worker_thread+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x51e/0xb90 ? __pfx_ret_from_fork+0x10/0x10 veth1_macvtap: entered promiscuous mode ? __switch_to+0xc7d/0x1450 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- batman_adv: batadv0: Interface activated: batadv_slave_0 batman_adv: batadv0: Interface activated: batadv_slave_1 netdevsim netdevsim7 netdevsim0: set [1, 0] type 2 family 0 port 6081 - 0 netdevsim netdevsim7 netdevsim1: set [1, 0] type 2 family 0 port 6081 - 0 netdevsim netdevsim7 netdevsim2: set [1, 0] type 2 family 0 port 6081 - 0 netdevsim netdevsim7 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0 RIP: 0010:kasan_byte_accessible+0x12/0x30 Code: 79 ff ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 40 d6 48 c1 ef 03 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 07 3c 08 0f 92 c0 c3 cc cce ieee80211 phy39: Selected rate control algorithm 'minstrel_ht' RSP: 0018:ffffc90006e0f808 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffffffff89746018 RCX: 0000000080000001 RDX: 0000000000000000 RSI: ffffffff89746018 RDI: 000000000000004c RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: dffffc0000000000 R11: ffffffff8aae3e70 R12: 0000000000000000 R13: 0000000000000260 R14: 0000000000000260 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880983c2000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7e16139e9c CR3: 000000000e74e000 CR4: 0000000000752ef0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception Fixes: 54a59aa2b562 ("Bluetooth: Add l2cap_chan->ops->ready()") Signed-off-by: Helen Koike <koike@igalia.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-03-19Bluetooth: hci_ll: Fix firmware leak on error pathAnas Iqbal-0/+2
Smatch reports: drivers/bluetooth/hci_ll.c:587 download_firmware() warn: 'fw' from request_firmware() not released on lines: 544. In download_firmware(), if request_firmware() succeeds but the returned firmware content is invalid (no data or zero size), the function returns without releasing the firmware, resulting in a resource leak. Fix this by calling release_firmware() before returning when request_firmware() succeeded but the firmware content is invalid. Fixes: 371805522f87 ("bluetooth: hci_uart: add LL protocol serdev driver support") Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Anas Iqbal <mohd.abd.6602@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-03-19Bluetooth: hci_sync: annotate data-races around hdev->req_statusCen Zhang-12/+12
__hci_cmd_sync_sk() sets hdev->req_status under hdev->req_lock: hdev->req_status = HCI_REQ_PEND; However, several other functions read or write hdev->req_status without holding any lock: - hci_send_cmd_sync() reads req_status in hci_cmd_work (workqueue) - hci_cmd_sync_complete() reads/writes from HCI event completion - hci_cmd_sync_cancel() / hci_cmd_sync_cancel_sync() read/write - hci_abort_conn() reads in connection abort path Since __hci_cmd_sync_sk() runs on hdev->req_workqueue while hci_send_cmd_sync() runs on hdev->workqueue, these are different workqueues that can execute concurrently on different CPUs. The plain C accesses constitute a data race. Add READ_ONCE()/WRITE_ONCE() annotations on all concurrent accesses to hdev->req_status to prevent potential compiler optimizations that could affect correctness (e.g., load fusing in the wait_event condition or store reordering). Signed-off-by: Cen Zhang <zzzccc427@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-03-19Bluetooth: MGMT: Fix dangling pointer on mgmt_add_adv_patterns_monitor_completeLuiz Augusto von Dentz-1/+1
This fixes the condition checking so mgmt_pending_valid is executed whenever status != -ECANCELED otherwise calling mgmt_pending_free(cmd) would kfree(cmd) without unlinking it from the list first, leaving a dangling pointer. Any subsequent list traversal (e.g., mgmt_pending_foreach during __mgmt_power_off, or another mgmt_pending_valid call) would dereference freed memory. Link: https://lore.kernel.org/linux-bluetooth/20260315132013.75ab40c5@kernel.org/T/#m1418f9c82eeff8510c1beaa21cf53af20db96c06 Fixes: 302a1f674c00 ("Bluetooth: MGMT: Fix possible UAFs") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
2026-03-19Bluetooth: SCO: Fix use-after-free in sco_recv_frame() due to missing sock_holdHyunwoo Kim-3/+7
sco_recv_frame() reads conn->sk under sco_conn_lock() but immediately releases the lock without holding a reference to the socket. A concurrent close() can free the socket between the lock release and the subsequent sk->sk_state access, resulting in a use-after-free. Other functions in the same file (sco_sock_timeout(), sco_conn_del()) correctly use sco_sock_hold() to safely hold a reference under the lock. Fix by using sco_sock_hold() to take a reference before releasing the lock, and adding sock_put() on all exit paths. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-03-19Bluetooth: L2CAP: Validate PDU length before reading SDU length in ↵Hyunwoo Kim-0/+5
l2cap_ecred_data_rcv() l2cap_ecred_data_rcv() reads the SDU length field from skb->data using get_unaligned_le16() without first verifying that skb contains at least L2CAP_SDULEN_SIZE (2) bytes. When skb->len is less than 2, this reads past the valid data in the skb. The ERTM reassembly path correctly calls pskb_may_pull() before reading the SDU length (l2cap_reassemble_sdu, L2CAP_SAR_START case). Apply the same validation to the Enhanced Credit Based Flow Control data path. Fixes: aac23bf63659 ("Bluetooth: Implement LE L2CAP reassembly") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-03-19Bluetooth: L2CAP: Fix stack-out-of-bounds read in l2cap_ecred_conn_reqMinseo Park-3/+3
Syzbot reported a KASAN stack-out-of-bounds read in l2cap_build_cmd() that is triggered by a malformed Enhanced Credit Based Connection Request. The vulnerability stems from l2cap_ecred_conn_req(). The function allocates a local stack buffer (`pdu`) designed to hold a maximum of 5 Source Channel IDs (SCIDs), totaling 18 bytes. When an attacker sends a request with more than 5 SCIDs, the function calculates `rsp_len` based on this unvalidated `cmd_len` before checking if the number of SCIDs exceeds L2CAP_ECRED_MAX_CID. If the SCID count is too high, the function correctly jumps to the `response` label to reject the packet, but `rsp_len` retains the attacker's oversized value. Consequently, l2cap_send_cmd() is instructed to read past the end of the 18-byte `pdu` buffer, triggering a KASAN panic. Fix this by moving the assignment of `rsp_len` to after the `num_scid` boundary check. If the packet is rejected, `rsp_len` will safely remain 0, and the error response will only read the 8-byte base header from the stack. Fixes: c28d2bff7044 ("Bluetooth: L2CAP: Fix result of L2CAP_ECRED_CONN_RSP when MTU is too short") Reported-by: syzbot+b7f3e7d9a596bf6a63e3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b7f3e7d9a596bf6a63e3 Tested-by: syzbot+b7f3e7d9a596bf6a63e3@syzkaller.appspotmail.com Signed-off-by: Minseo Park <jacob.park.9436@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-03-19vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFOYishai Hadas-39/+83
When userspace opts into VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2, the driver may report the VFIO_PRECOPY_INFO_REINIT output flag in response to the VFIO_MIG_GET_PRECOPY_INFO ioctl, along with a new initial_bytes value. The presence of the VFIO_PRECOPY_INFO_REINIT flag indicates to the caller that new initial data is available in the migration stream. If the firmware reports a new initial-data chunk, any previously dirty bytes in memory are treated as initial bytes, since the caller must read both sets before reaching the end of the initial-data region. In this case, the driver issues a new SAVE command to fetch the data and prepare it for a subsequent read() from userspace. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20260317161753.18964-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19vfio/mlx5: consider inflight SAVE during PRE_COPYYishai Hadas-1/+8
Consider an inflight SAVE operation during the PRE_COPY phase, so the caller will wait when no data is currently available but is expected to arrive. This enables a follow-up patch to avoid returning -ENOMSG while a new *initial_bytes* chunk is still pending from an asynchronous SAVE command issued by the VFIO_MIG_GET_PRECOPY_INFO ioctl. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20260317161753.18964-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19net/mlx5: Add IFC bits for migration stateYishai Hadas-2/+14
Add the relevant IFC bits for querying an extra migration state from the device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20260317161753.18964-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctlYishai Hadas-56/+68
Introduce a core helper function for VFIO_MIG_GET_PRECOPY_INFO and adapt all drivers to use it. It centralizes the common code and ensures that output flags are cleared on entry, in case user opts in to VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. This preventing any unintended echoing of userspace data back to userspace. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20260317161753.18964-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2Yishai Hadas-0/+22
Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't assign info.flags before copy_to_user(). Because they copy the struct in from userspace first, this effectively echoes userspace-provided flags back as output, preventing the field from being used to report new reliable data from the drivers. Add support for a new device feature named VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. On SET, enables the v2 pre_copy_info behaviour, where the vfio_precopy_info.flags is a valid output field. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20260317161753.18964-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19vfio: Define uAPI for re-init initial bytes during the PRE_COPY phaseYishai Hadas-0/+24
As currently defined, initial_bytes is monotonically decreasing and precedes dirty_bytes when reading from the saving file descriptor. The transition from initial_bytes to dirty_bytes is unidirectional and irreversible. The initial_bytes are considered as critical data that is highly recommended to be transferred to the target as part of PRE_COPY, without this data, the PRE_COPY phase would be ineffective. We come to solve the case when a new chunk of critical data is introduced during the PRE_COPY phase and the driver would like to report an entirely new value for the initial_bytes. For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output flag named VFIO_PRECOPY_INFO_REINIT to allow drivers reporting a new initial_bytes value during the PRE_COPY phase. Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't assign info.flags before copy_to_user(), this effectively echoes userspace-provided flags back as output, preventing the field from being used to report new reliable data from the drivers. Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace to explicitly opt in by enabling the VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 device feature. When the caller opts in, the driver may report an entirely new value for initial_bytes. It may be larger, it may be smaller, it may include the previous unread initial_bytes, it may discard the previous unread initial_bytes, up to the driver logic and state. The presence of the VFIO_PRECOPY_INFO_REINIT output flag set by the driver indicates that new initial data is present on the stream. Once the caller sees this flag, the initial_bytes value should be re-evaluated relative to the readiness state for transition to STOP_COPY. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20260317161753.18964-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19vfio: selftests: Fix VLA initialisation in vfio_pci_irq_set()Manish Honap-1/+3
C does not permit an initialiser expression on a variable-length array (C99 Section 6.7.9 constraint: "The type of the entity to be initialized shall not be a variable length array type"). vfio_pci_irq_set() declared: u8 buf[sizeof(struct vfio_irq_set) + sizeof(int) * count] = {}; where `count` is a runtime function parameter, making `buf` a VLA. GCC rejects this with (tried with GCC-9.4.0): error: variable-sized object may not be initialized Fix by removing the `= {}` initialiser and inserting an explicit memset() immediately after the declaration. memset() on a VLA is perfectly legal and achieves the same zero-initialisation on all conforming C implementations. Fixes: 19faf6fd969c ("vfio: selftests: Add a helper library for VFIO selftests") Cc: stable@vger.kernel.org Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Manish Honap <mhonap@nvidia.com> Link: https://lore.kernel.org/r/20260317051402.3725670-1-mhonap@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19vfio: uapi: fix comment typoJoseph Salisbury-1/+1
The file contains a spelling error in a source comment (succes). Typos in comments reduce readability and make text searches less reliable for developers and maintainers. Replace 'succes' with 'success' in the affected comment. This is a comment-only cleanup and does not change behavior. Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com> Link: https://lore.kernel.org/r/20260316185617.166414-1-joseph.salisbury@oracle.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19vfio: mdev: replace mtty_dev->vd_class with a const struct classJori Koolstra-8/+9
The class_create() call has been deprecated in favor of class_register() as the driver core now allows for a struct class to be in read-only memory. Replace mtty_dev->vd_class with a const struct class and drop the class_create() call. Compile tested and found no errors/warns in dmesg after enabling CONFIG_VFIO and CONFIG_SAMPLE_VFIO_MDEV_MTTY. Link: https://lore.kernel.org/all/2023040244-duffel-pushpin-f738@gregkh/ Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20260308214939.1215682-1-jkoolstra@xs4all.nl Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-03-19bpf: Do not allow deleting local storage in NMIAmery Hung-0/+3
Currently, local storage may deadlock when deferring freeing selem or local storage through kfree_rcu(), call_rcu() or call_rcu_tasks_trace() in NMI or reentrant. Since deleting selem in NMI is an unlikely use case, partially mitigate it by returning error when calling from bpf_xxx_storage_delete() helpers in NMI. Note that, it is still possible to deadlock through reentrant. A full mitigation requires returning error when irqs_disabled() is true, which, however is too heavy-handed for bpf_xxx_storage_delete(). The long-term solution requires _nolock versions of call_rcu. Another possible solution is to defer the free through irq_work [0], but it would grow the size of selem, which is non-ideal. The check is only needed in bpf_selem_unlink(), which is used by helpers and syscalls. bpf_selem_unlink_nofail() is fine as it is called during map and owner tear down that never run in NMI or reentrant. [0] https://lore.kernel.org/bpf/20260205190233.912-1-alexei.starovoitov@gmail.com/ Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://patch.msgid.link/20260319025716.2361065-1-ameryhung@gmail.com
2026-03-19Merge tag 'net-7.0-rc5' of ↵Linus Torvalds-459/+734
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless, Bluetooth and netfilter. Nothing too exciting here, mostly fixes for corner cases. Current release - fix to a fix: - bonding: prevent potential infinite loop in bond_header_parse() Current release - new code bugs: - wifi: mac80211: check tdls flag in ieee80211_tdls_oper Previous releases - regressions: - af_unix: give up GC if MSG_PEEK intervened - netfilter: conntrack: add missing netlink policy validations - NFC: nxp-nci: allow GPIOs to sleep" * tag 'net-7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (78 commits) MPTCP: fix lock class name family in pm_nl_create_listen_socket icmp: fix NULL pointer dereference in icmp_tag_validation() net: dsa: bcm_sf2: fix missing clk_disable_unprepare() in error paths net: shaper: protect from late creation of hierarchy net: shaper: protect late read accesses to the hierarchy net: mvpp2: guard flow control update with global_tx_fc in buffer switching nfnetlink_osf: validate individual option lengths in fingerprints netfilter: nf_tables: release flowtable after rcu grace period on error netfilter: bpf: defer hook memory release until rcu readers are done net: bonding: fix NULL deref in bond_debug_rlb_hash_show udp_tunnel: fix NULL deref caused by udp_sock_create6 when CONFIG_IPV6=n net/mlx5e: Fix race condition during IPSec ESN update net/mlx5e: Prevent concurrent access to IPSec ASO context net/mlx5: qos: Restrict RTNL area to avoid a lock cycle ipv6: add NULL checks for idev in SRv6 paths NFC: nxp-nci: allow GPIOs to sleep net: macb: fix uninitialized rx_fs_lock net: macb: fix use-after-free access to PTP clock netdevsim: drop PSP ext ref on forward failure wifi: mac80211: always free skb on ieee80211_tx_prepare_skb() failure ...
2026-03-19KVM: arm64: selftests: Add no-vgic-v5 selftestSascha Bischoff-178/+298
Now that GICv5 is supported, it is important to check that all of the GICv5 register state is hidden from a guest that doesn't create a vGICv5. Rename the no-vgic-v3 selftest to no-vgic, and extend it to check GICv5 system registers too. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-42-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: selftests: Introduce a minimal GICv5 PPI selftestSascha Bischoff-0/+379
This basic selftest creates a vgic_v5 device (if supported), and tests that one of the PPI interrupts works as expected with a basic single-vCPU guest. Upon starting, the guest enables interrupts. That means that it is initialising all PPIs to have reasonable priorities, but marking them as disabled. Then the priority mask in the ICC_PCR_EL1 is set, and interrupts are enable in ICC_CR0_EL1. At this stage the guest is able to receive interrupts. The architected SW_PPI (64) is enabled and KVM_IRQ_LINE ioctl is used to inject the state into the guest. The guest's interrupt handler has an explicit WFI in order to ensure that the guest skips WFI when there are pending and enabled PPI interrupts. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-41-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic-v5: Communicate userspace-driveable PPIs via a UAPISascha Bischoff-1/+69
GICv5 systems will likely not support the full set of PPIs. The presence of any virtual PPI is tied to the presence of the physical PPI. Therefore, the available PPIs will be limited by the physical host. Userspace cannot drive any PPIs that are not implemented. Moreover, it is not desirable to expose all PPIs to the guest in the first place, even if they are supported in hardware. Some devices, such as the arch timer, are implemented in KVM, and hence those PPIs shouldn't be driven by userspace, either. Provided a new UAPI: KVM_DEV_ARM_VGIC_GRP_CTRL => KVM_DEV_ARM_VGIC_USERPSPACE_PPIs This allows userspace to query which PPIs it is able to drive via KVM_IRQ_LINE. Additionally, introduce a check in kvm_vm_ioctl_irq_line() to reject any PPIs not in the userspace mask. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-40-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19Documentation: KVM: Introduce documentation for VGICv5Sascha Bischoff-0/+38
Now that it is possible to create a VGICv5 device, provide initial documentation for it. At this stage, there is little to document. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-39-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic-v5: Probe for GICv5 deviceSascha Bischoff-11/+37
The basic GICv5 PPI support is now complete. Allow probing for a native GICv5 rather than just the legacy support. The implementation doesn't support protected VMs with GICv5 at this time. Therefore, if KVM has protected mode enabled the native GICv5 init is skipped, but legacy VMs are allowed if the hardware supports it. At this stage the GICv5 KVM implementation only supports PPIs, and doesn't interact with the host IRS at all. This means that there is no need to check how many concurrent VMs or vCPUs per VM are supported by the IRS - the PPI support only requires the CPUIF. The support is artificially limited to VGIC_V5_MAX_CPUS, i.e. 512, vCPUs per VM. With this change it becomes possible to run basic GICv5-based VMs, provided that they only use PPIs. Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-38-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic-v5: Set ICH_VCTLR_EL2.En on bootSascha Bischoff-0/+2
This control enables virtual HPPI selection, i.e., selection and delivery of interrupts for a guest (assuming that the guest itself has opted to receive interrupts). This is set to enabled on boot as there is no reason for disabling it in normal operation as virtual interrupt signalling itself is still controlled via the HCR_EL2. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-37-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic-v5: Introduce kvm_arm_vgic_v5_ops and register themSascha Bischoff-0/+75
Only the KVM_DEV_ARM_VGIC_GRP_CTRL->KVM_DEV_ARM_VGIC_CTRL_INIT op is currently supported. All other ops are stubbed out. Co-authored-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-36-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic-v5: Hide FEAT_GCIE from NV GICv5 guestsSascha Bischoff-0/+5
Currently, NV guests are not supported with GICv5. Therefore, make sure that FEAT_GCIE is always hidden from such guests. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-35-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic: Hide GICv5 for protected guestsSascha Bischoff-0/+10
We don't support running protected guest with GICv5 at the moment. Therefore, be sure that we don't expose it to the guest at all by actively hiding it when running a protected guest. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-34-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic-v5: Mandate architected PPI for PMU emulation on GICv5Sascha Bischoff-5/+18
Make it mandatory to use the architected PPI when running a GICv5 guest. Attempts to set anything other than the architected PPI (23) are rejected. Additionally, KVM_ARM_VCPU_PMU_V3_INIT is relaxed to no longer require KVM_ARM_VCPU_PMU_V3_IRQ to be called for GICv5-based guests. In this case, the architectued PPI is automatically used. Documentation is bumped accordingly. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-33-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic-v5: Enlighten arch timer for GICv5Sascha Bischoff-22/+94
Now that GICv5 has arrived, the arch timer requires some TLC to address some of the key differences introduced with GICv5. For PPIs on GICv5, the queue_irq_unlock irq_op is used as AP lists are not required at all for GICv5. The arch timer also introduces an irq_op - get_input_level. Extend the arch-timer-provided irq_ops to include the PPI op for vgic_v5 guests. When possible, DVI (Direct Virtual Interrupt) is set for PPIs when using a vgic_v5, which directly inject the pending state into the guest. This means that the host never sees the interrupt for the guest for these interrupts. This has three impacts. * First of all, the kvm_cpu_has_pending_timer check is updated to explicitly check if the timers are expected to fire. * Secondly, for mapped timers (which use DVI) they must be masked on the host prior to entering a GICv5 guest, and unmasked on the return path. This is handled in set_timer_irq_phys_masked. * Thirdly, it makes zero sense to attempt to inject state for a DVI'd interrupt. Track which timers are direct, and skip the call to kvm_vgic_inject_irq() for these. The final, but rather important, change is that the architected PPIs for the timers are made mandatory for a GICv5 guest. Attempts to set them to anything else are actively rejected. Once a vgic_v5 is initialised, the arch timer PPIs are also explicitly reinitialised to ensure the correct GICv5-compatible PPIs are used - this also adds in the GICv5 PPI type to the intid. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-32-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19irqchip/gic-v5: Introduce minimal irq_set_type() for PPIsSascha Bischoff-0/+18
GICv5 does not support configuring the handling mode or trigger mode of PPIs at runtime - these choices are made at implementation time, and most of the architected PPIs have an architected handling mode (as reported in the ICH_PPI_HMRn_EL1 registers). As chip->set_irq_type() is optional, this has not been implemented for GICv5 PPIs as it served no real purpose. However, although the set_irq_type() function is marked as optional, the lack of it breaks attempts to create a domain hierarchy on top of GICv5's PPI domain. This is due to __irq_set_trigger() calling chip->set_irq_type(), which returns -ENOSYS if the parent domain doesn't implement the set_irq_type() call. In order to make things work, this change introduces a set_irq_type() call for GICv5 PPIs. This performs a basic sanity check (that the hardware's handling mode (Level/Edge) matches what is being set as the type, and does nothing else. This is sufficient to get hierarchical domains working for GICv5 PPIs (such as the one KVM introduces for the arch timer). It has the side benefit (or drawback) that it will catch cases where the firmware description doesn't match what the hardware reports. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-31-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic-v5: Initialise ID and priority bits when resetting vcpuSascha Bischoff-1/+21
Determine the number of priority bits and ID bits exposed to the guest as part of resetting the vcpu state. These values are presented to the guest by trapping and emulating reads from ICC_IDR0_EL1. GICv5 supports either 16- or 24-bits of ID space (for SPIs and LPIs). It is expected that 2^16 IDs is more than enough, and therefore this value is chosen irrespective of the hardware supporting more or not. The GICv5 architecture only supports 5 bits of priority in the CPU interface (but potentially fewer in the IRS). Therefore, this is the default value chosen for the number of priority bits in the CPU IF. Note: We replicate the way that GICv3 uses the num_id_bits and num_pri_bits variables. That is, num_id_bits stores the value of the hardware field verbatim (0 means 16-bits, 1 would mean 24-bits for GICv5), and num_pri_bits stores the actual number of priority bits; the field value + 1. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://patch.msgid.link/20260319154937.3619520-30-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-19KVM: arm64: gic-v5: Create and initialise vgic_v5Sascha Bischoff-20/+63
Update kvm_vgic_create to create a vgic_v5 device. When creating a vgic, FEAT_GCIE in the ID_AA64PFR2 is only exposed to vgic_v5-based guests, and is hidden otherwise. GIC in ~ID_AA64PFR0_EL1 is never exposed for a vgic_v5 guest. When initialising a vgic_v5, skip kvm_vgic_dist_init as GICv5 doesn't support one. The current vgic_v5 implementation only supports PPIs, so no SPIs are initialised either. The current vgic_v5 support doesn't extend to nested guests. Therefore, the init of vgic_v5 for a nested guest is failed in vgic_v5_init. As the current vgic_v5 doesn't require any resources to be mapped, vgic_v5_map_resources is simply used to check that the vgic has indeed been initialised. Again, this will change as more GICv5 support is merged in. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260319154937.3619520-29-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>