aboutsummaryrefslogtreecommitdiffstats
path: root/include/uapi (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-07-28Merge tag 'vfs-6.17-rc1.fallocate' of ↵Linus Torvalds1-0/+17
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fallocate updates from Christian Brauner: "fallocate() currently supports creating preallocated files efficiently. However, on most filesystems fallocate() will preallocate blocks in an unwriten state even if FALLOC_FL_ZERO_RANGE is specified. The extent state must later be converted to a written state when the user writes data into this range, which can trigger numerous metadata changes and journal I/O. This may leads to significant write amplification and performance degradation in synchronous write mode. At the moment, the only method to avoid this is to create an empty file and write zero data into it (for example, using 'dd' with a large block size). However, this method is slow and consumes a considerable amount of disk bandwidth. Now that more and more flash-based storage devices are available it is possible to efficiently write zeros to SSDs using the unmap write zeroes command if the devices do not write physical zeroes to the media. For example, if SCSI SSDs support the UMMAP bit or NVMe SSDs support the DEAC bit[1], the write zeroes command does not write actual data to the device, instead, NVMe converts the zeroed range to a deallocated state, which works fast and consumes almost no disk write bandwidth. This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and device-mapper drivers, and add the FALLOC_FL_WRITE_ZEROES and STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices. fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a way that subsequent writes to that range do not require further changes to the file mapping metadata. This flag is beneficial for subsequent pure overwriting within this range, as it can save on block allocation and, consequently, significant metadata changes" * tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ext4: add FALLOC_FL_WRITE_ZEROES support block: add FALLOC_FL_WRITE_ZEROES support block: factor out common part in blkdev_fallocate() fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate dm: clear unmap write zeroes limits when disabling write zeroes scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP nvmet: set WZDS and DRB if device enables unmap write zeroes operation nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits
2025-07-28Merge tag 'vfs-6.17-rc1.nsfs' of ↵Linus Torvalds2-0/+22
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains namespace updates. This time specifically for nsfs: - Userspace heavily relies on the root inode numbers for namespaces to identify the initial namespaces. That's already a hard dependency. So we cannot change that anymore. Move the initial inode numbers to a public header and align the only two namespaces that currently don't do that with all the other namespaces. - The root inode of /proc having a fixed inode number has been part of the core kernel ABI since its inception, and recently some userspace programs (mainly container runtimes) have started to explicitly depend on this behaviour. The main reason this is useful to userspace is that by checking that a suspect /proc handle has fstype PROC_SUPER_MAGIC and is PROCFS_ROOT_INO, they can then use openat2() together with RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH} to ensure that there isn't a bind-mount that replaces some procfs file with a different one. This kind of attack has lead to security issues in container runtimes in the past (such as CVE-2019-19921) and libraries like libpathrs[1] use this feature of procfs to provide safe procfs handling functions" * tag 'vfs-6.17-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: uapi: export PROCFS_ROOT_INO mntns: use stable inode number for initial mount ns netns: use stable inode number for initial mount ns nsfs: move root inode number to uapi
2025-07-28Merge tag 'vfs-6.17-rc1.coredump' of ↵Linus Torvalds1-0/+104
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull coredump updates from Christian Brauner: "This contains an extension to the coredump socket and a proper rework of the coredump code. - This extends the coredump socket to allow the coredump server to tell the kernel how to process individual coredumps. This allows for fine-grained coredump management. Userspace can decide to just let the kernel write out the coredump, or generate the coredump itself, or just reject it. * COREDUMP_KERNEL The kernel will write the coredump data to the socket. * COREDUMP_USERSPACE The kernel will not write coredump data but will indicate to the parent that a coredump has been generated. This is used when userspace generates its own coredumps. * COREDUMP_REJECT The kernel will skip generating a coredump for this task. * COREDUMP_WAIT The kernel will prevent the task from exiting until the coredump server has shutdown the socket connection. The flexible coredump socket can be enabled by using the "@@" prefix instead of the single "@" prefix for the regular coredump socket: @@/run/systemd/coredump.socket - Cleanup the coredump code properly while we have to touch it anyway. Split out each coredump mode in a separate helper so it's easy to grasp what is going on and make the code easier to follow. The core coredump function should now be very trivial to follow" * tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits) cleanup: add a scoped version of CLASS() coredump: add coredump_skip() helper coredump: avoid pointless variable coredump: order auto cleanup variables at the top coredump: add coredump_cleanup() coredump: auto cleanup prepare_creds() cred: add auto cleanup method coredump: directly return coredump: auto cleanup argv coredump: add coredump_write() coredump: use a single helper for the socket coredump: move pipe specific file check into coredump_pipe() coredump: split pipe coredumping into coredump_pipe() coredump: move core_pipe_count to global variable coredump: prepare to simplify exit paths coredump: split file coredumping into coredump_file() coredump: rename do_coredump() to vfs_coredump() selftests/coredump: make sure invalid paths are rejected coredump: validate socket path in coredump_parse() coredump: don't allow ".." in coredump socket path ...
2025-07-28Merge tag 'pull-headers_param' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull asm/param cleanup from Al Viro: "This massages asm/param.h to simpler and more uniform shape: - all arch/*/include/uapi/asm/param.h are either generated includes of <asm-generic/param.h> or a #define or two followed by such include - no arch/*/include/asm/param.h anywhere, generated or not - include <asm/param.h> resolves to arch/*/include/uapi/asm/param.h of the architecture in question (or that of host in case of uml) - include/asm-generic/param.h pulls uapi/asm-generic/param.h and deals with USER_HZ, CLOCKS_PER_SEC and with HZ redefinition after that" * tag 'pull-headers_param' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: loongarch, um, xtensa: get rid of generated arch/$ARCH/include/asm/param.h alpha: regularize the situation with asm/param.h xtensa: get rid uapi/asm/param.h
2025-07-28Merge tag 'i2c-host-6.17-pt1' of ↵Wolfram Sang7-23/+39
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow i2c-host for v6.17, part 1 Cleanups and refactorings: - lpi2c, riic, st, stm32f7: general improvements - riic: support more flexible IRQ configurations - tegra: fix documentation Improvements: - lpi2c: improve register polling and add atomic transfer - imx: use guarded spinlocks New hardware support: - Samsung Exynos 2200 - Renesas RZ/T2H (R9A09G077), RZ/N2H (R9A09G087) DT binding: - rk3x: enable power domains - nxp: support clock property
2025-07-27Input: Add and document BTN_GRIP*Vicki Pfau1-0/+5
Many controllers these days have started including grip buttons. As there has been no particular assigned BTN_* constants for these, they've been haphazardly assigned to BTN_TRIGGER_HAPPY*. Unfortunately, the assignment of these has varied significantly between drivers. Add and document new constants for these grip buttons. Signed-off-by: Vicki Pfau <vi@endrift.com> Link: https://lore.kernel.org/r/20250702040102.125432-2-vi@endrift.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-07-25Merge tag 'nf-next-25-07-25' of ↵Jakub Kicinski1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following series contains Netfilter/IPVS updates for net-next: 1) Display netns inode in conntrack table full log, from lvxiafei. 2) Autoload nf_log_syslog in case no logging backend is available, from Lance Yang. 3) Three patches to remove unused functions in x_tables, nf_tables and conntrack. From Yue Haibing. 4) Exclude LEGACY TABLES on PREEMPT_RT: Add NETFILTER_XTABLES_LEGACY to exclude xtables legacy infrastructure. 5) Restore selftests by toggling NETFILTER_XTABLES_LEGACY where needed. From Florian Westphal. 6) Use CONFIG_INET_SCTP_DIAG in tools/testing/selftests/net/netfilter/config, from Sebastian Andrzej Siewior. 7) Use timer_delete in comment in IPVS codebase, from WangYuli. 8) Dump flowtable information in nfnetlink_hook, this includes an initial patch to consolidate common code in helper function, from Phil Sutter. 9) Remove unused arguments in nft_pipapo set backend, from Florian Westphal. 10) Return nft_set_ext instead of boolean in set lookup function, from Florian Westphal. 11) Remove indirection in dynamic set infrastructure, also from Florian. 12) Consolidate pipapo_get/lookup, from Florian. 13) Use kvmalloc in nft_pipapop, from Florian Westphal. 14) syzbot reports slab-out-of-bounds in xt_nfacct log message, fix from Florian Westphal. 15) Ignored tainted kernels in selftest nft_interface_stress.sh, from Phil Sutter. 16) Fix IPVS selftest by disabling rp_filter with ipip tunnel device, from Yi Chen. * tag 'nf-next-25-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: selftests: netfilter: ipvs.sh: Explicity disable rp_filter on interface tunl0 selftests: netfilter: Ignore tainted kernels in interface stress test netfilter: xt_nfacct: don't assume acct name is null-terminated netfilter: nft_set_pipapo: prefer kvmalloc for scratch maps netfilter: nft_set_pipapo: merge pipapo_get/lookup netfilter: nft_set: remove indirection from update API call netfilter: nft_set: remove one argument from lookup and update functions netfilter: nft_set_pipapo: remove unused arguments netfilter: nfnetlink_hook: Dump flowtable info netfilter: nfnetlink: New NFNLA_HOOK_INFO_DESC helper ipvs: Rename del_timer in comment in ip_vs_conn_expire_now() selftests: netfilter: Enable CONFIG_INET_SCTP_DIAG selftests: net: Enable legacy netfilter legacy options. netfilter: Exclude LEGACY TABLES on PREEMPT_RT. netfilter: conntrack: Remove unused net in nf_conntrack_double_lock() netfilter: nf_tables: Remove unused nft_reduce_is_readonly() netfilter: x_tables: Remove unused functions xt_{in|out}name() netfilter: load nf_log_syslog on enabling nf_conntrack_log_invalid netfilter: conntrack: table full detailed log ==================== Link: https://patch.msgid.link/20250725170340.21327-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25ipv6: add `force_forwarding` sysctl to enable per-interface forwardingGabriel Goller3-0/+3
It is currently impossible to enable ipv6 forwarding on a per-interface basis like in ipv4. To enable forwarding on an ipv6 interface we need to enable it on all interfaces and disable it on the other interfaces using a netfilter rule. This is especially cumbersome if you have lots of interfaces and only want to enable forwarding on a few. According to the sysctl docs [0] the `net.ipv6.conf.all.forwarding` enables forwarding for all interfaces, while the interface-specific `net.ipv6.conf.<interface>.forwarding` configures the interface Host/Router configuration. Introduce a new sysctl flag `force_forwarding`, which can be set on every interface. The ip6_forwarding function will then check if the global forwarding flag OR the force_forwarding flag is active and forward the packet. To preserve backwards-compatibility reset the flag (on all interfaces) to 0 if the net.ipv6.conf.all.forwarding flag is set to 0. Add a short selftest that checks if a packet gets forwarded with and without `force_forwarding`. [0]: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Gabriel Goller <g.goller@proxmox.com> Link: https://patch.msgid.link/20250722081847.132632-1-g.goller@proxmox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25netfilter: nfnetlink_hook: Dump flowtable infoPhil Sutter1-0/+2
Introduce NFNL_HOOK_TYPE_NFT_FLOWTABLE to distinguish flowtable hooks from base chain ones. Nested attributes are shared with the old NFTABLES hook info type since they fit apart from their misleading name. Old nftables in user space will ignore this new hook type and thus continue to print flowtable hooks just like before, e.g.: | family netdev { | hook ingress device test0 { | 0000000000 nf_flow_offload_ip_hook [nf_flow_table] | } | } With this patch in place and support for the new hook info type, output becomes more useful: | family netdev { | hook ingress device test0 { | 0000000000 flowtable ip mytable myft [nf_flow_table] | } | } Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25accel/rocket: Add IOCTLs for synchronizing memory accessesTomeu Vizoso1-0/+34
The NPU cores have their own access to the memory bus, and this isn't cache coherent with the CPUs. Add IOCTLs so userspace can mark when the caches need to be flushed, and also when a writer job needs to be waited for before the buffer can be accessed from the CPU. Initially based on the same IOCTLs from the Etnaviv driver. v2: - Don't break UABI by reordering the IOCTL IDs (Jeff Hugo) v3: - Check that padding fields in IOCTLs are zero (Jeff Hugo) v6: - Fix conversion logic to make sure we use DMA_BIDIRECTIONAL when needed (Lucas Stach) v8: - Always sync BOs in both directions (Robin Murphy) Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250721-6-10-rocket-v9-5-77ebd484941e@tomeuvizoso.net
2025-07-25accel/rocket: Add job submission IOCTLTomeu Vizoso1-0/+64
Using the DRM GPU scheduler infrastructure, with a scheduler for each core. Userspace can decide for a series of tasks to be executed sequentially in the same core, so SRAM locality can be taken advantage of. The job submission code was initially based on Panfrost. v2: - Remove hardcoded number of cores - Misc. style fixes (Jeffrey Hugo) - Repack IOCTL struct (Jeffrey Hugo) v3: - Adapt to a split of the register block in the DT bindings (Nicolas Frattaroli) - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo) - Use drm_* logging functions (Thomas Zimmermann) - Rename reg i/o macros (Thomas Zimmermann) - Add padding to ioctls and check for zero (Jeff Hugo) - Improve error handling (Nicolas Frattaroli) v6: - Use mutexes guard (Markus Elfring) - Use u64_to_user_ptr (Jeff Hugo) - Drop rocket_fence (Rob Herring) v7: - Assign its own IOMMU domain to each client, for isolation (Daniel Stone and Robin Murphy) v8: - Use reset lines to reset the cores (Robin Murphy) - Use the macros to compute the values for the bitfields (Robin Murphy) - More descriptive name for the IRQ (Robin Murphy) - Simplify job interrupt handing (Robin Murphy) - Correctly acquire a reference to the IOMMU (Robin Murphy) - Specify the size of the embedded structs in the IOCTLs for future extensibility (Rob Herring) - Expose only 32 bits for the address of the regcmd BO (Robin Murphy) Tested-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250721-6-10-rocket-v9-4-77ebd484941e@tomeuvizoso.net
2025-07-25accel/rocket: Add IOCTL for BO creationTomeu Vizoso1-0/+44
This uses the SHMEM DRM helpers and we map right away to the CPU and NPU sides, as all buffers are expected to be accessed from both. v2: - Sync the IOMMUs for the other cores when mapping and unmapping. v3: - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo) v6: - Use mutexes guard (Markus Elfring) v7: - Assign its own IOMMU domain to each client, for isolation (Daniel Stone and Robin Murphy) v8: - Correctly acquire a reference to the IOMMU (Robin Murphy) - Allocate DMA address ourselves with drm_mm (Robin Murphy) - Use refcount_read (Heiko Stuebner) - Remove superfluous dma_sync_sgtable_for_device (Robin Murphy) Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250721-6-10-rocket-v9-3-77ebd484941e@tomeuvizoso.net
2025-07-24net: define an enum for the napi threaded stateSamiullah Khawaja1-0/+5
Instead of using '0' and '1' for napi threaded state use an enum with 'disabled' and 'enabled' states. Tested: ./tools/testing/selftests/net/nl_netdev.py TAP version 13 1..7 ok 1 nl_netdev.empty_check ok 2 nl_netdev.lo_check ok 3 nl_netdev.page_pool_check ok 4 nl_netdev.napi_list_check ok 5 nl_netdev.dev_set_threaded ok 6 nl_netdev.napi_set_threaded ok 7 nl_netdev.nsim_rxq_reset_down # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Link: https://patch.msgid.link/20250723013031.2911384-4-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24Merge tag 'wireless-next-2025-07-24' of ↵Jakub Kicinski1-0/+39
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Another wireless update: - rtw89: - STA+P2P concurrency - support for USB devices RTL8851BU/RTL8852BU - ath9k: OF support - ath12k: - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - iwlwifi: some FIPS interoperability - brcm80211: support SDIO 43751 device - rt2x00: better DT/OF support - cfg80211/mac80211: - improved S1G support - beacon monitor for MLO * tag 'wireless-next-2025-07-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (199 commits) ssb: use new GPIO line value setter callbacks for the second GPIO chip wifi: Fix typos wifi: brcmsmac: Use str_true_false() helper wifi: brcmfmac: fix EXTSAE WPA3 connection failure due to AUTH TX failure wifi: brcm80211: Remove yet more unused functions wifi: brcm80211: Remove more unused functions wifi: brcm80211: Remove unused functions wifi: iwlwifi: Revert "wifi: iwlwifi: remove support of several iwl_ppag_table_cmd versions" wifi: iwlwifi: check validity of the FW API range wifi: iwlwifi: don't export symbols that we shouldn't wifi: iwlwifi: mld: use spec link id and not FW link id wifi: iwlwifi: mld: decode EOF bit for AMPDUs wifi: iwlwifi: Remove support for rx OMI bandwidth reduction wifi: iwlwifi: stop supporting iwl_omi_send_status_notif ver 1 wifi: iwlwifi: remove SC2F firmware support wifi: iwlwifi: mvm: Remove NAN support wifi: iwlwifi: mld: avoid outdated reorder buffer head_sn wifi: iwlwifi: mvm: avoid outdated reorder buffer head_sn wifi: iwlwifi: disable certain features for fips_enabled wifi: iwlwifi: mld: support channel survey collection for ACS scans ... ==================== Link: https://patch.msgid.link/20250724100349.21564-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24misc: pci_endpoint_test: Add doorbell test caseFrank Li1-0/+1
Add doorbell support with the help of three new registers: PCIE_ENDPOINT_TEST_DB_BAR, PCIE_ENDPOINT_TEST_DB_ADDR, and PCIE_ENDPOINT_TEST_DB_DATA. The testcase works by triggering the doorbell in Endpoint by writing the value from PCI_ENDPOINT_TEST_DB_DATA register to the address provided by PCI_ENDPOINT_TEST_DB_OFFSET register of the BAR indicated by the PCIE_ENDPOINT_TEST_DB_BAR register and waiting for the completion status from the Endpoint. Signed-off-by: Frank Li <Frank.Li@nxp.com> [mani: removed one spurious change and reworded the commit message] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20250710-ep-msi-v21-7-57683fc7fb25@nxp.com
2025-07-23sched: Dump configuration and statistics of dualpi2 qdiscChia-Yu Chang1-0/+15
The configuration and statistics dump of the DualPI2 Qdisc provides information related to both queues, such as packet numbers and queuing delays in the L-queue and C-queue, as well as general information such as probability value, WRR credits, memory usage, packet marking counters, max queue size, etc. The following patch includes enqueue/dequeue for DualPI2. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Link: https://patch.msgid.link/20250722095915.24485-3-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23sched: Struct definition and parsing of dualpi2 qdiscChia-Yu Chang1-0/+53
DualPI2 is the reference implementation of IETF RFC9332 DualQ Coupled AQM (https://datatracker.ietf.org/doc/html/rfc9332) providing two queues called low latency (L-queue) and classic (C-queue). By default, it enqueues non-ECN and ECT(0) packets into the C-queue and ECT(1) and CE packets into the low latency queue (L-queue), as per IETF RFC9332 spec. This patch defines the dualpi2 Qdisc structure and parsing, and the following two patches include dumping and enqueue/dequeue for the DualPI2. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Link: https://patch.msgid.link/20250722095915.24485-2-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23devlink: Fix excessive stack usage in rate TC bandwidth parsingCarolina Jubran1-2/+9
The devlink_nl_rate_tc_bw_parse function uses a large stack array for devlink attributes, which triggers a warning about excessive stack usage: net/devlink/rate.c: In function 'devlink_nl_rate_tc_bw_parse': net/devlink/rate.c:382:1: error: the frame size of 1648 bytes is larger than 1536 bytes [-Werror=frame-larger-than=] Introduce a separate attribute set specifically for rate TC bandwidth parsing that only contains the two attributes actually used: index and bandwidth. This reduces the stack array from DEVLINK_ATTR_MAX entries to just 2 entries, solving the stack usage issue. Update devlink selftest to use the new 'index' and 'bw' attribute names consistent with the YAML spec. Example usage with ynl with the new spec: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-set --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1, "rate-tc-bws": [ {"index": 0, "bw": 50}, {"index": 1, "bw": 50}, {"index": 2, "bw": 0}, {"index": 3, "bw": 0}, {"index": 4, "bw": 0}, {"index": 5, "bw": 0}, {"index": 6, "bw": 0}, {"index": 7, "bw": 0} ] }' ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \ --do rate-get --json '{ "bus-name": "pci", "dev-name": "0000:08:00.0", "port-index": 1 }' output for rate-get: {'bus-name': 'pci', 'dev-name': '0000:08:00.0', 'port-index': 1, 'rate-tc-bws': [{'bw': 50, 'index': 0}, {'bw': 50, 'index': 1}, {'bw': 0, 'index': 2}, {'bw': 0, 'index': 3}, {'bw': 0, 'index': 4}, {'bw': 0, 'index': 5}, {'bw': 0, 'index': 6}, {'bw': 0, 'index': 7}], 'rate-tx-max': 0, 'rate-tx-priority': 0, 'rate-tx-share': 0, 'rate-tx-weight': 0, 'rate-type': 'leaf'} Fixes: 566e8f108fc7 ("devlink: Extend devlink rate API with traffic classes bandwidth management") Reported-by: Arnd Bergmann <arnd@arndb.de> Closes: https://lore.kernel.org/netdev/20250708160652.1810573-1-arnd@kernel.org/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507171943.W7DJcs6Y-lkp@intel.com/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Tested-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/1753175609-330621-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-23IB: Extend UVERBS_METHOD_REG_MR to get DMAHYishai Hadas1-0/+1
Extend UVERBS_METHOD_REG_MR to get DMAH and pass it to all drivers. It will be used in mlx5 driver as part of the next patch from the series. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/2ae1e628c0675db81f092cc00d3ad6fbf6139405.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-23RDMA/core: Introduce a DMAH object and its alloc/free APIsYishai Hadas1-0/+17
Introduce a new DMA handle (DMAH) object along with its corresponding allocation and deallocation APIs. This DMAH object encapsulates attributes intended for use in DMA transactions. While its initial purpose is to support TPH functionality, it is designed to be extensible for future features such as DMA PCI multipath, PCI UIO configurations, PCI traffic class selection, and more. Further details: ---------------- We ensure that a caller requesting a DMA handle for a specific CPU ID is permitted to be scheduled on it. This prevent a potential security issue where a non privilege user may trigger DMA operations toward a CPU that it's not allowed to run on. We manage reference counting for the DMAH object and its consumers (e.g., memory regions) as will be detailed in subsequent patches in the series. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/2cad097e849597e49d6b61e6865dba878257f371.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-23IB/core: Add UVERBS_METHOD_REG_MR on the MR objectYishai Hadas1-0/+14
This new method enables us to use a single ioctl from user space which supports the below variants of reg_mr [1]. The method will be extended in the next patches from the series with an extra attribute to let us pass DMA handle to be used as part of the registration. [1] ibv_reg_mr(), ibv_reg_mr_iova(), ibv_reg_mr_iova2(), ibv_reg_dmabuf_mr(). Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/5a3822ceef084efe967c9752e89c58d8250337c7.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-22accel/amdxdna: Support user space allocated bufferLizhi Hou1-0/+25
Enhance DRM_IOCTL_AMDXDNA_CREATE_BO to accept user space allocated buffer pointer. The buffer pages will be pinned in memory. Unless the CAP_IPC_LOCK is enabled for the application process, the total pinned memory can not beyond rlimit_memlock. Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20250716164414.112091-1-lizhi.hou@amd.com
2025-07-21ethtool: rss: support removing contexts via NetlinkJakub Kicinski1-0/+2
Implement removing additional RSS contexts via Netlink. Technically it'd be possible to shoehorn the delete operation into ethnl_request_ops-compatible handler. The code ends up longer than open coded version, and I think we'll need a custom way of sending notifications at some stage (if we allow tying the context lifetime to the netlink socket, in the future). Link: https://patch.msgid.link/20250717234343.2328602-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-21ethtool: rss: support creating contexts via NetlinkJakub Kicinski1-0/+3
Support creating contexts via Netlink. Setting flow hashing fields on the new context is not supported at this stage, it can be added later. An empty indirection table is not supported. This is a carry over from the IOCTL interface where empty indirection table meant delete. We can repurpose empty indirection table in Netlink but for now to avoid confusion reject it using the policy. Support letting user choose the ID for the new context. This was not possible in IOCTL since the context ID field for the create action had to be set to the ETH_RXFH_CONTEXT_ALLOC magic value. Link: https://patch.msgid.link/20250717234343.2328602-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-22btrfs: defrag: add flag to force no-compressionDavid Sterba1-0/+3
Currently the defrag ioctl cannot rewrite the extents without compression. Add a new flag for that, as setting compression to 0 (or "no compression") means to do no changes to compression so take what is the current default, like mount options or properties. The defrag setting overrides mount or properties. The compression BTRFS_DEFRAG_DONT_COMPRESS is only used for in-memory operations and does not need to have a fixed value. Mount with zstd:9, copy test file from /usr/bin/ (about 260KB): $ mount -o compress=zstd:9 /dev/vda /mnt $ filefrag -vsb testfile filefrag: -b needs a blocksize option, assuming 1024-byte blocks. Filesystem type is: 9123683e File size of testfile is 297704 (292 blocks of 1024 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 127: 13312.. 13439: 128: encoded 1: 128.. 255: 13364.. 13491: 128: 13440: encoded 2: 256.. 291: 13424.. 13459: 36: 13492: last,encoded,eof testfile: 3 extents found $ compsize testfile Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments. Type Perc Disk Usage Uncompressed Referenced TOTAL 42% 124K 292K 292K zstd 42% 124K 292K 292K Defrag to uncompressed: $ btrfs fi defrag --nocomp testfile $ filefrag -vsb testfile filefrag: -b needs a blocksize option, assuming 1024-byte blocks. Filesystem type is: 9123683e File size of testfile is 297704 (292 blocks of 1024 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 291: 291840.. 292131: 292: last,eof testfile: 1 extent found $ compsize testfile Processed 1 file, 1 regular extents (1 refs), 0 inline, 1 fragments. Type Perc Disk Usage Uncompressed Referenced TOTAL 100% 292K 292K 292K none 100% 292K 292K 292K Compress again with LZO: $ btrfs fi defrag -clzo testfile $ filefrag -vsb testfile filefrag: -b needs a blocksize option, assuming 1024-byte blocks. Filesystem type is: 9123683e File size of testfile is 297704 (292 blocks of 1024 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 127: 13312.. 13439: 128: encoded 1: 128.. 255: 13392.. 13519: 128: 13440: encoded 2: 256.. 291: 13480.. 13515: 36: 13520: last,encoded,eof testfile: 3 extents found $ compsize testfile Processed 1 file, 3 regular extents (3 refs), 0 inline, 1 fragments. Type Perc Disk Usage Uncompressed Referenced TOTAL 64% 188K 292K 292K lzo 64% 188K 292K 292K Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-21Merge tag 'v6.16-rc7' into tty-nextGreg Kroah-Hartman4-14/+6
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-18iommufd: Destroy vdevice on idevice destroyXu Yilun1-0/+5
Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destruction so that vdev can't outlive idev. idev represents the physical device bound to iommufd, while the vdev represents the virtual instance of the physical device in the VM. The lifecycle of the vdev should not be longer than idev. This doesn't cause real problem on existing use cases cause vdev doesn't impact the physical device, only provides virtualization information. But to extend vdev for Confidential Computing (CC), there are needs to do secure configuration for the vdev, e.g. TSM Bind/Unbind. These configurations should be rolled back on idev destroy, or the external driver (VFIO) functionality may be impact. The idev is created by external driver so its destruction can't fail. The idev implements pre_destroy() op to actively remove its associated vdev before destroying itself. There are 3 cases on idev pre_destroy(): 1. vdev is already destroyed by userspace. No extra handling needed. 2. vdev is still alive. Use iommufd_object_tombstone_user() to destroy vdev and tombstone the vdev ID. 3. vdev is being destroyed by userspace. The vdev ID is already freed, but vdev destroy handler is not completed. This requires multi-threads syncing - vdev holds idev's short term users reference until vdev destruction completes, idev leverages existing wait_shortterm mechanism for syncing. idev should also block any new reference to it after pre_destroy(), or the following wait shortterm would timeout. Introduce a 'destroying' flag, set it to true on idev pre_destroy(). Any attempt to reference idev should honor this flag under the protection of idev->igroup->lock. Link: https://patch.msgid.link/r/20250716070349.1807226-5-yilun.xu@linux.intel.com Originally-by: Nicolin Chen <nicolinc@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Co-developed-by: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> Signed-off-by: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@kernel.org> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc6Alexei Starovoitov5-20/+32
Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-18wifi: cfg80211: support configuring an S1G short beaconing BSSLachlan Hodges1-0/+39
S1G short beacons are an optional frame type used in an S1G BSS that contain a limited set of elements. While they are optional, they are a fundamental part of S1G that enables significant power saving. Expose 2 additional netlink attributes, NL80211_ATTR_S1G_LONG_BEACON_PERIOD which denotes the number of beacon intervals between each long beacon and NL80211_ATTR_S1G_SHORT_BEACON which is a nested attribute containing the short beacon tail and head. We split them as the long beacon period cannot be updated, and is only used when initialisng the interface, whereas the short beacon data can be used to both initialise and update the templates. This follows how things such as the beacon interval and DTIM period currently operate. During the initialisation path, we ensure we have the long beacon period if the short beacon data is being passed down, whereas the update path will simply update the template if its sent down. The short beacon data is validated using the same routines for regular beacons as they support correctly parsing the short beacon format while ensuring the frame is well-formed. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250717074205.312577-2-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-07-18drm: Move drm_gem ioctl kerneldoc to uapi fileDavid Francis1-12/+28
The drm_gem ioctls were documented in internal file drm_gem.c instead of uapi header drm.h. Move them there and change to appropriate kerneldoc formatting. Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250717143556.857893-3-David.Francis@amd.com
2025-07-18drm: Add DRM prime interface to reassign GEM handleDavid Francis1-0/+23
CRIU restore of drm buffer objects requires the ability to create or import a buffer object with a specific gem handle. Add new drm ioctl DRM_IOCTL_GEM_CHANGE_HANDLE, which takes the gem handle of an object and moves that object to a specified new gem handle. This ioctl needs to call drm_prime_remove_buf_handle, but that function acquires the prime lock, which the ioctl needs to hold for other purposes. Make drm_prime_remove_buf_handle not acquire the prime lock, and change its other caller to reflect this. The rest of the kernel patches required to enable CRIU can be found at https://lore.kernel.org/dri-devel/20250617194536.538681-1-David.Francis@amd.com/ v2 - Move documentation to UAPI headers v3 - Always return 0 on success Signed-off-by: David Francis <David.Francis@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250717143556.857893-2-David.Francis@amd.com
2025-07-17ethtool: rss: initial RSS_SET (indirection table handling)Jakub Kicinski1-0/+1
Add initial support for RSS_SET, for now only operations on the indirection table are supported. Unlike the ioctl don't check if at least one parameter is being changed. This is how other ethtool-nl ops behave, so pick the ethtool-nl consistency vs copying ioctl behavior. There are two special cases here: 1) resetting the table to defaults; 2) support for tables of different size. For (1) I use an empty Netlink attribute (array of size 0). (2) may require some background. AFAICT a lot of modern devices allow allocating RSS tables of different sizes. mlx5 can upsize its tables, bnxt has some "table size calculation", and Intel folks asked about RSS table sizing in context of resource allocation in the past. The ethtool IOCTL API has a concept of table size, but right now the user is expected to provide a table exactly the size the device requests. Some drivers may change the table size at runtime (in response to queue count changes) but the user is not in control of this. What's not great is that all RSS contexts share the same table size. For example a device with 128 queues enabled, 16 RSS contexts 8 queues in each will likely have 256 entry tables for each of the 16 contexts, while 32 would be more than enough given each context only has 8 queues. To address this the Netlink API should avoid enforcing table size at the uAPI level, and should allow the user to express the min table size they expect. To fully solve (2) we will need more driver plumbing but at the uAPI level this patch allows the user to specify a table size smaller than what the device advertises. The device table size must be a multiple of the user requested table size. We then replicate the user-provided table to fill the full device size table. This addresses the "allow the user to express the min table size" objective, while not enforcing any fixed size. From Netlink perspective .get_rxfh_indir_size() is now de facto the "max" table size supported by the device. We may choose to support table replication in ethtool, too, when we actually plumb this thru the device APIs. Initially I was considering moving full pattern generation to the kernel (which queues to use, at which frequency and what min sequence length). I don't think this complexity would buy us much and most if not all devices have pow-2 table sizes, which simplifies the replication a lot. Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20250716000331.1378807-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-12/+0
Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml 880d43ca9aa4 ("netlink: specs: clean up spaces in brackets") af52020fc599 ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c a44312d58e78 ("net: phy: Don't register LEDs for genphy") f0f2b992d818 ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c 5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17drm: document DRM_MODE_PAGE_FLIP_EVENT interactions with atomicSimon Ser1-0/+8
It's not obvious off-hand which CRTCs will get a page-flip event when using this flag in an atomic commit, because it's all implicitly implied based on the contents of the atomic commit. Document requirements for using this flag and how to request an event for a CRTC. Note, because prepare_signaling() runs right after drm_atomic_set_property() calls, page-flip events are not delivered for CRTCs pulled in later by DRM core (e.g. on modeset by drm_atomic_helper_check_modeset()) or the driver (e.g. other CRTCs sharing a DP-MST connector). v2: fix cut off sentence in commit message (Pekka) Signed-off-by: Simon Ser <contact@emersion.fr> Reviewed-by: Simona Vetter <simona@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Pekka Paalanen <pekka.paalanen@collabora.com> Cc: David Turner <david.turner@raspberrypi.com> Cc: Daniel Stone <daniel@fooishbar.org> Link: https://lore.kernel.org/r/20250501112945.6448-1-contact@emersion.fr
2025-07-17drm/v3d: Add parameter to retrieve the number of GPU resets per-fdMaíra Canal1-0/+1
The GL extension KHR_robustness uses the number of global and per-context GPU resets to learn about graphics resets that affect a GL context. This commit introduces a new V3D parameter to retrieve the number of GPU resets triggered by jobs submitted through a file descriptor. To retrieve this information, user-space must use DRM_V3D_PARAM_CONTEXT_RESET_COUNTER. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://lore.kernel.org/r/20250711-v3d-reset-counter-v1-2-1ac73e9fca2d@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-17drm/v3d: Add parameter to retrieve the global number of GPU resetsMaíra Canal1-0/+1
The GL extension KHR_robustness uses the number of global and per-context GPU resets to learn about graphics resets that affect a GL context. This commit introduces a new V3D parameter to retrieve the global number of GPU resets that have happened since the driver was probed. To retrieve this information, user-space must use DRM_V3D_PARAM_GLOBAL_RESET_COUNTER. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Link: https://lore.kernel.org/r/20250711-v3d-reset-counter-v1-1-1ac73e9fca2d@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-16bpf: Add struct bpf_token_infoTao Chen1-0/+8
The 'commit 35f96de04127 ("bpf: Introduce BPF token object")' added BPF token as a new kind of BPF kernel object. And BPF_OBJ_GET_INFO_BY_FD already used to get BPF object info, so we can also get token info with this cmd. One usage scenario, when program runs failed with token, because of the permission failure, we can report what BPF token is allowing with this API for debugging. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250716134654.1162635-1-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-16drm/amdgpu: Replace HQD terminology with slots namingJesse Zhang1-2/+2
The term "HQD" is CP-specific and doesn't accurately describe the queue resources for other IP blocks like SDMA, VCN, or VPE. This change: 1. Renames `num_hqds` to `num_slots` in amdgpu_kms.c to better reflect the generic nature of the resource counting 2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots` 3. Maintains the same functionality while using more appropriate terminology Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16drm/amdgpu: Add user queue instance count in HW IP infoJesse Zhang1-0/+2
This change exposes the number of available user queue instances for each hardware IP type (GFX, COMPUTE, SDMA) through the drm_amdgpu_info_hw_ip interface. Key changes: 1. Added userq_num_instance field to drm_amdgpu_info_hw_ip structure 2. Implemented counting of available HQD slots using: - mes.gfx_hqd_mask for GFX queues - mes.compute_hqd_mask for COMPUTE queues - mes.sdma_hqd_mask for SDMA queues 3. Only counts available instances when user queues are enabled (!disable_uq) v2: using the adev->mes.gfx_hqd_mask[]/compute_hqd_mask[]/sdma_hqd_mask[] masks to determine the number of queue slots available for each engine type (Alex) v3: rename userq_num_instance to userq_num_hqds (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-14tcp: add LINUX_MIB_BEYOND_WINDOWEric Dumazet1-0/+1
Add a new SNMP MIB : LINUX_MIB_BEYOND_WINDOW Incremented when an incoming packet is received beyond the receiver window. nstat -az | grep TcpExtBeyondWindow Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711114006.480026-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14Add support to set NAPI threaded for individual NAPISamiullah Khawaja1-0/+1
A net device has a threaded sysctl that can be used to enable threaded NAPI polling on all of the NAPI contexts under that device. Allow enabling threaded NAPI polling at individual NAPI level using netlink. Extend the netlink operation `napi-set` and allow setting the threaded attribute of a NAPI. This will enable the threaded polling on a NAPI context. Add a test in `nl_netdev.py` that verifies various cases of threaded NAPI being set at NAPI and at device level. Tested ./tools/testing/selftests/net/nl_netdev.py TAP version 13 1..7 ok 1 nl_netdev.empty_check ok 2 nl_netdev.lo_check ok 3 nl_netdev.page_pool_check ok 4 nl_netdev.napi_list_check ok 5 nl_netdev.dev_set_threaded ok 6 nl_netdev.napi_set_threaded ok 7 nl_netdev.nsim_rxq_reset_down # Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-14PCI/IOV: Restore VF resizable BAR state after resetMichał Winiarski1-0/+9
Similar to regular resizable BARs, VF BARs can also be resized, e.g. by the system firmware or the PCI subsystem itself. The capability layout is the same as PCI_EXT_CAP_ID_REBAR. Add the capability ID and restore it as a part of IOV state. See PCIe r6.2, sec 7.8.7. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20250702093522.518099-2-michal.winiarski@intel.com
2025-07-14Revert "netfilter: nf_tables: Add notifications for hook changes"Phil Sutter2-12/+0
This reverts commit 465b9ee0ee7bc268d7f261356afd6c4262e48d82. Such notifications fit better into core or nfnetlink_hook code, following the NFNL_MSG_HOOK_GET message format. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-14ASoC: codec: Convert to GPIO descriptors forMark Brown2-2/+6
Merge series from Peng Fan <peng.fan@nxp.com>: This patchset is a pick up of patch 1,2 from [1]. And I also collect Linus's R-b for patch 2. After this patchset, there is only one user of of_gpio.h left in sound driver(pxa2xx-ac97). of_gpio.h is deprecated, update the driver to use GPIO descriptors. Patch 1 is to drop legacy platform data which in-tree no users are using it Patch 2 is to convert to GPIO descriptors Checking the DTS that use the device, all are using GPIOD_ACTIVE_LOW polarity for reset-gpios, so all should work as expected with this patch. [1] https://lore.kernel.org/all/20250408-asoc-gpio-v1-0-c0db9d3fd6e9@nxp.com/
2025-07-14i2c: Clarify behavior of I2C_M_RD flagI Viswanath1-1/+2
Update the description of I2C_M_RD to clarify that not setting it signals a write transaction Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-07-13RDMA/efa: Add CQ with external memory supportMichael Margolin1-1/+2
Add an option to create CQ using external memory instead of allocating in the driver. The memory can be passed from userspace by dmabuf fd and an offset or a VA. One of the possible usages is creating CQs that reside in accelerator memory, allowing low latency asynchronous direct polling from the accelerator device. Add a capability bit to reflect on the feature support. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://patch.msgid.link/20250708202308.24783-4-mrgolin@amazon.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-13RDMA/uverbs: Add a common way to create CQ with umemMichael Margolin1-0/+4
Add ioctl command attributes and a common handling for the option to create CQs with memory buffers passed from userspace. When required attributes are supplied, create umem and provide it for driver's use. The extension enables creation of CQs on top of preallocated CPU virtual or device memory buffers, by supplying VA or dmabuf fd, in a common way. Drivers can support this flow by initializing a new create_cq_umem fp field in their ops struct, with a function that can handle the new parameter. Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://patch.msgid.link/20250708202308.24783-2-mrgolin@amazon.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-11iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV supportNicolin Chen1-0/+15
Add a new vEVENTQ type for VINTFs that are assigned to the user space. Simply report the two 64-bit LVCMDQ_ERR_MAPs register values. Link: https://patch.msgid.link/r/68161a980da41fa5022841209638aeff258557b5.1752126748.git.nicolinc@nvidia.com Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11iommu/tegra241-cmdqv: Add user-space use supportNicolin Chen1-0/+59
The CMDQV HW supports a user-space use for virtualization cases. It allows the VM to issue guest-level TLBI or ATC_INV commands directly to the queue and executes them without a VMEXIT, as HW will replace the VMID field in a TLBI command and the SID field in an ATC_INV command with the preset VMID and SID. This is built upon the vIOMMU infrastructure by allowing VMM to allocate a VINTF (as a vIOMMU object) and assign VCMDQs (HW QUEUE objs) to the VINTF. So firstly, replace the standard vSMMU model with the VINTF implementation but reuse the standard cache_invalidate op (for unsupported commands) and the standard alloc_domain_nested op (for standard nested STE). Each VINTF has two 64KB MMIO pages (128B per logical VCMDQ): - Page0 (directly accessed by guest) has all the control and status bits. - Page1 (trapped by VMM) has guest-owned queue memory location/size info. VMM should trap the emulated VINTF0's page1 of the guest VM for the guest- level VCMDQ location/size info and forward that to the kernel to translate to a physical memory location to program the VCMDQ HW during an allocation call. Then, it should mmap the assigned VINTF's page0 to the VINTF0 page0 of the guest VM. This allows the guest OS to read and write the guest-own VINTF's page0 for direct control of the VCMDQ HW. For ATC invalidation commands that hold an SID, it requires all devices to register their virtual SIDs to the SID_MATCH registers and their physical SIDs to the pairing SID_REPLACE registers, so that HW can use those as a lookup table to replace those virtual SIDs with the correct physical SIDs. Thus, implement the driver-allocated vDEVICE op with a tegra241_vintf_sid structure to allocate SID_REPLACE and to program the SIDs accordingly. This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to the standard SMMUv3 operating in the nested translation mode trapping CMDQ for TLBI and ATC_INV commands, this gives a huge performance improvement: 70% to 90% reductions of invalidation time were measured by various DMA unmap tests running in a guest OS. Link: https://patch.msgid.link/r/fb0eab83f529440b6aa181798912a6f0afa21eb0.1752126748.git.nicolinc@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-11iommufd: Allow an input data_type via iommu_hw_infoNicolin Chen1-1/+19
The iommu_hw_info can output via the out_data_type field the vendor data type from a driver, but this only allows driver to report one data type. Now, with SMMUv3 having a Tegra241 CMDQV implementation, it has two sets of types and data structs to report. One way to support that is to use the same type field bidirectionally. Reuse the same field by adding an "in_data_type", allowing user space to request for a specific type and to get the corresponding data. For backward compatibility, since the ioctl handler has never checked an input value, add an IOMMU_HW_INFO_FLAG_INPUT_TYPE to switch between the old output-only field and the new bidirectional field. Link: https://patch.msgid.link/r/887378a7167e1786d9d13cde0c36263ed61823d7.1752126748.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>