summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorLines
2025-10-30pidfd: add a new supported_mask fieldChristian Brauner-0/+3
Some of the future fields in struct pidfd_info can be optional. If the kernel has nothing to emit in that field, then it doesn't set the flag in the reply. This presents a problem: There is currently no way to know what mask flags the kernel supports since one can't always count on them being in the reply. Add a new PIDFD_INFO_SUPPORTED_MASK flag and field that the kernel can set in the reply. Userspace can use this to determine if the fields it requires from the kernel are supported. This also gives us a way to deprecate fields in the future, if that should become necessary. Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-5-ca449b7b7aa0@kernel.org Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-30pidfs: add missing PIDFD_INFO_SIZE_VER1Christian Brauner-0/+1
We grew struct pidfd_info not too long ago. Link: https://patch.msgid.link/20251028-work-coredump-signal-v1-3-ca449b7b7aa0@kernel.org Fixes: 1d8db6fd698d ("pidfs, coredump: add PIDFD_INFO_COREDUMP") Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-30net/smc: make wr buffer count configurableHalil Pasic-0/+2
Think SMC_WR_BUF_CNT_SEND := SMC_WR_BUF_CNT used in send context and SMC_WR_BUF_CNT_RECV := 3 * SMC_WR_BUF_CNT used in recv context. Those get replaced with lgr->max_send_wr and lgr->max_recv_wr respective. Please note that although with the default sysctl values qp_attr.cap.max_send_wr == qp_attr.cap.max_recv_wr is maintained but can not be assumed to be generally true any more. I see no downside to that, but my confidence level is rather modest. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20251027224856.2970019-2-pasic@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-30sysctl: fix kernel-doc format warningRandy Dunlap-7/+4
Describe the "type" struct member using '@type' and move it together with the rest of the doc for ctl_table_header to avoid a kernel-doc warning: Warning: include/linux/sysctl.h:178 Incorrect use of kernel-doc format: * enum type - Enumeration to differentiate between ctl target types Fixes: 2f2665c13af4 ("sysctl: replace child with an enumeration") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-10-30netfilter: fix typo in nf_conntrack_l4proto.h commentcaivive (Weibiao Tu)-1/+1
In the comment for nf_conntrack_l4proto.h, the word "nfnetink" was incorrectly spelled. It has been corrected to "nfnetlink". Fixes a typo to enhance readability and ensure consistency. Signed-off-by: caivive (Weibiao Tu) <cavivie@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-10-30xfrm: Determine inner GSO type from packet inner protocolJianbo Liu-1/+2
The GSO segmentation functions for ESP tunnel mode (xfrm4_tunnel_gso_segment and xfrm6_tunnel_gso_segment) were determining the inner packet's L2 protocol type by checking the static x->inner_mode.family field from the xfrm state. This is unreliable. In tunnel mode, the state's actual inner family could be defined by x->inner_mode.family or by x->inner_mode_iaf.family. Checking only the former can lead to a mismatch with the actual packet being processed, causing GSO to create segments with the wrong L2 header type. This patch fixes the bug by deriving the inner mode directly from the packet's inner protocol stored in XFRM_MODE_SKB_CB(skb)->protocol. Instead of replicating the code, this patch modifies the xfrm_ip2inner_mode helper function. It now correctly returns &x->inner_mode if the selector family (x->sel.family) is already specified, thereby handling both specific and AF_UNSPEC cases appropriately. With this change, ESP GSO can use xfrm_ip2inner_mode to get the correct inner mode. It doesn't affect existing callers, as the updated logic now mirrors the checks they were already performing externally. Fixes: 26dbd66eab80 ("esp: choose the correct inner protocol for GSO on inter address family tunnels") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-30accel/ivpu: Add support for userptr buffer objectsJacek Lawrynowicz-0/+52
Introduce a new ioctl `drm_ivpu_bo_create_from_userptr` that allows users to create GEM buffer objects from user pointers to memory regions. The user pointer must be page-aligned and the memory region must remain valid for the buffer object's lifetime. Userptr buffers enable direct use of mmapped files (e.g. inference weights) in NPU workloads without copying data to NPU buffer objects. This reduces memory usage and provides better flexibility for NPU applications. Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://patch.msgid.link/20251029091752.203198-1-karol.wachowski@linux.intel.com
2025-10-30wifi: mac80211: add RX flag to report radiotap VHT informationBenjamin Berg-1/+21
mac80211 already reports some basic information in the radiotap header with the known fields declared by the driver. However, drivers may want to report more accurate information and in that case the full VHT radiotap structure needs to be provided. Add a new RX_FLAG_RADIOTAP_VHT which is set when the VHT information should be pulled from the skb. Update the code to fill in the VHT fields to only do so when requested by the driver or if the information has not yet been set. This way the driver can fully control the information if it chooses so. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251027142118.0bad1c307a21.I2cf285c20a822698039603f2af00ed9c548f2ee0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-29crypto: blake2b - Reimplement using library APIEric Biggers-126/+0
Replace blake2b_generic.c with a new file blake2b.c which implements the BLAKE2b crypto_shash algorithms on top of the BLAKE2b library API. Change the driver name suffix from "-generic" to "-lib" to reflect that these algorithms now just use the (possibly arch-optimized) library. This closely mirrors crypto/{md5,sha1,sha256,sha512}.c. Remove include/crypto/internal/blake2b.h since it is no longer used. Likewise, remove struct blake2b_state from include/crypto/blake2b.h. Omit support for import_core and export_core, since there are no legacy drivers that need these for these algorithms. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: blake2b: Add BLAKE2b library functionsEric Biggers-13/+137
Add a library API for BLAKE2b, closely modeled after the BLAKE2s API. This will allow in-kernel users such as btrfs to use BLAKE2b without going through the generic crypto layer. In addition, as usual the BLAKE2b crypto_shash algorithms will be reimplemented on top of this. Note: to create lib/crypto/blake2b.c I made a copy of lib/crypto/blake2s.c and made the updates from BLAKE2s => BLAKE2b. This way, the BLAKE2s and BLAKE2b code is kept consistent. Therefore, it borrows the SPDX-License-Identifier and Copyright from lib/crypto/blake2s.c rather than crypto/blake2b_generic.c. The library API uses 'struct blake2b_ctx', consistent with other lib/crypto/ APIs. The existing 'struct blake2b_state' will be removed once the blake2b crypto_shash algorithms are updated to stop using it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29byteorder: Add le64_to_cpu_array() and cpu_to_le64_array()Eric Biggers-0/+16
Add le64_to_cpu_array() and cpu_to_le64_array(). These mirror the corresponding 32-bit functions. These will be used by the BLAKE2b code. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: blake2s: Document the BLAKE2s library APIEric Biggers-0/+58
Add kerneldoc for the BLAKE2s library API. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: blake2s: Drop excessive const & rename block => dataEric Biggers-7/+6
A couple more small cleanups to the BLAKE2s code before these things get propagated into the BLAKE2b code: - Drop 'const' from some non-pointer function parameters. It was a bit excessive and not conventional. - Rename 'block' argument of blake2s_compress*() to 'data'. This is for consistency with the SHA-* code, and also to avoid the implication that it points to a singular "block". No functional changes. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: blake2s: Rename blake2s_state to blake2s_ctxEric Biggers-30/+29
For consistency with the SHA-1, SHA-2, SHA-3 (in development), and MD5 library APIs, rename blake2s_state to blake2s_ctx. As a refresher, the ctx name: - Is a bit shorter. - Avoids confusion with the compression function state, which is also often called the state (but is just part of the full context). - Is consistent with OpenSSL. Not a big deal, of course. But consistency is nice. With a BLAKE2b library API about to be added, this is a convenient time to update this. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: blake2s: Adjust parameter order of blake2s()Eric Biggers-3/+3
Reorder the parameters of blake2s() from (out, in, key, outlen, inlen, keylen) to (key, keylen, in, inlen, out, outlen). This aligns BLAKE2s with the common conventions of pairing buffers and their lengths, and having outputs follow inputs. This is widely used elsewhere in lib/crypto/ and crypto/, and even elsewhere in the BLAKE2s code itself such as blake2s_init_key() and blake2s_final(). So blake2s() was a bit of an exception. Notably, this results in the same order as hmac_*_usingrawkey(). Note that since the type signature changed, it's not possible for a blake2s() call site to be silently missed. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29scsi: ufs: core: Add a quirk to suppress link_startup_againAdrian Hunter-0/+7
ufshcd_link_startup() has a facility (link_startup_again) to issue DME_LINKSTARTUP a 2nd time even though the 1st time was successful. Some older hardware benefits from that, however the behaviour is non-standard, and has been found to cause link startup to be unreliable for some Intel Alder Lake based host controllers. Add UFSHCD_QUIRK_PERFORM_LINK_STARTUP_ONCE to suppress link_startup_again, in preparation for setting the quirk for affected controllers. Fixes: 7dc9fb47bc9a ("scsi: ufs: ufs-pci: Add support for Intel ADL") Cc: stable@vger.kernel.org Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251024085918.31825-3-adrian.hunter@intel.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-29libeth: xdp: Disable generic kCFI pass for libeth_xdp_tx_xmit_bulk()Nathan Chancellor-1/+1
When building drivers/net/ethernet/intel/idpf/xsk.c for ARCH=arm with CONFIG_CFI=y using a version of LLVM prior to 22.0.0, there is a BUILD_BUG_ON failure: $ cat arch/arm/configs/repro.config CONFIG_BPF_SYSCALL=y CONFIG_CFI=y CONFIG_IDPF=y CONFIG_XDP_SOCKETS=y $ make -skj"$(nproc)" ARCH=arm LLVM=1 clean defconfig repro.config drivers/net/ethernet/intel/idpf/xsk.o In file included from drivers/net/ethernet/intel/idpf/xsk.c:4: include/net/libeth/xsk.h:205:2: error: call to '__compiletime_assert_728' declared with 'error' attribute: BUILD_BUG_ON failed: !__builtin_constant_p(tmo == libeth_xsktmo) 205 | BUILD_BUG_ON(!__builtin_constant_p(tmo == libeth_xsktmo)); | ^ ... libeth_xdp_tx_xmit_bulk() indirectly calls libeth_xsk_xmit_fill_buf() but these functions are marked as __always_inline so that the compiler can turn these indirect calls into direct ones and see that the tmo parameter to __libeth_xsk_xmit_fill_buf_md() is ultimately libeth_xsktmo from idpf_xsk_xmit(). Unfortunately, the generic kCFI pass in LLVM expands the kCFI bundles from the indirect calls in libeth_xdp_tx_xmit_bulk() in such a way that later optimizations cannot turn these calls into direct ones, making the BUILD_BUG_ON fail because it cannot be proved at compile time that tmo is libeth_xsktmo. Disable the generic kCFI pass for libeth_xdp_tx_xmit_bulk() to ensure these indirect calls can always be turned into direct calls to avoid this error. Closes: https://github.com/ClangBuiltLinux/linux/issues/2124 Fixes: 9705d6552f58 ("idpf: implement Rx path for AF_XDP") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20251025-idpf-fix-arm-kcfi-build-error-v1-3-ec57221153ae@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-10-29compiler_types: Introduce __nocfi_genericNathan Chancellor-0/+6
There are two different ways that LLVM can expand kCFI operand bundles in LLVM IR: generically in the middle end or using an architecture specific sequence when lowering LLVM IR to machine code in the backend. The generic pass allows any architecture to take advantage of kCFI but the expansion of these bundles in the middle end can mess with optimizations that may turn indirect calls into direct calls when the call target is known at compile time, such as after inlining. Add __nocfi_generic, dependent on an architecture selecting CONFIG_ARCH_USES_CFI_GENERIC_LLVM_PASS, to disable kCFI bundle generation in functions where only the generic kCFI pass may cause problems. Link: https://github.com/ClangBuiltLinux/linux/issues/2124 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20251025-idpf-fix-arm-kcfi-build-error-v1-1-ec57221153ae@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-10-29Merge patch series "ufs: Add support for AMD Versal Gen2 UFS"Martin K. Petersen-0/+55
Ajay Neeli <ajay.neeli@amd.com> says: This patch series adds support for the UFS driver on the AMD Versal Gen 2 SoC. It includes: - Device tree bindings and driver implementation. - Secure read support for the secure retrieval of UFS calibration values. The UFS host driver is based upon the Synopsis DesignWare (DWC) UFS architecture, utilizing the existing UFSHCD_DWC and UFSHCD_PLATFORM drivers. Link: https://patch.msgid.link/20251021113003.13650-1-ajay.neeli@amd.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-29scsi: ufs: amd-versal2: Add UFS support for AMD Versal Gen 2 SoCSai Krishna Potthuri-0/+1
Add support for the UFS host controller on the AMD Versal Gen 2 SoC, built on the Synopsys DWC UFS architecture, using the UFSHCD DWC and UFSHCD platform driver. This controller requires specific configurations like M-PHY/RMMI/UniPro and vendor specific registers programming before doing the UIC_LINKSTARTUP. Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Signed-off-by: Ajay Neeli <ajay.neeli@amd.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251021113003.13650-5-ajay.neeli@amd.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-29scsi: firmware: xilinx: Add APIs for UFS PHY initializationAjay Neeli-0/+39
- Add APIs for UFS PHY initialization. - Verify M-PHY TX-RX configuration readiness. - Confirm SRAM initialization and Set SRAM bypass. - Retrieve UFS calibration values. Signed-off-by: Ajay Neeli <ajay.neeli@amd.com> Acked-by: Senthil Nathan Thangaraj <senthilnathan.thangaraj@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251021113003.13650-4-ajay.neeli@amd.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-29scsi: firmware: xilinx: Add support for secure read/write ioctl interfaceIzhar Ameer Shaikh-0/+15
Add support for a generic ioctl read/write interface using which users can request firmware to perform read/write operations on a protected and secure address space. The functionality is introduced through the means of two new IOCTL IDs which extend the existing PM_IOCTL EEMI API: - IOCTL_READ_REG - IOCTL_MASK_WRITE_REG The caller only passes the node id of the given device and an offset. The base address is not exposed to the caller and internally retrieved by the firmware. Firmware will enforce an access policy on the incoming read/write request. Signed-off-by: Izhar Ameer Shaikh <izhar.ameer.shaikh@amd.com> Reviewed-by: Tanmay Shah <tanmay.shah@amd.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Signed-off-by: Ajay Neeli <ajay.neeli@amd.com> Acked-by: Senthil Nathan Thangaraj <senthilnathan.thangaraj@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251021113003.13650-3-ajay.neeli@amd.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-29net: phy: add iterator mdiobus_for_each_phyHeiner Kallweit-1/+10
Add an iterator for all PHY's on a MII bus, and phy_find_next() as a prerequisite. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/cd112f15-401a-43d9-8525-9ff0965a68cd@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29net: tls: Cancel RX async resync request on rcd_delta overflowShahar Shitrit-0/+6
When a netdev issues a RX async resync request for a TLS connection, the TLS module handles it by logging record headers and attempting to match them to the tcp_sn provided by the device. If a match is found, the TLS module approves the tcp_sn for resynchronization. While waiting for a device response, the TLS module also increments rcd_delta each time a new TLS record is received, tracking the distance from the original resync request. However, if the device response is delayed or fails (e.g due to unstable connection and device getting out of tracking, hardware errors, resource exhaustion etc.), the TLS module keeps logging and incrementing, which can lead to a WARN() when rcd_delta exceeds the threshold. To address this, introduce tls_offload_rx_resync_async_request_cancel() to explicitly cancel resync requests when a device response failure is detected. Call this helper also as a final safeguard when rcd_delta crosses its threshold, as reaching this point implies that earlier cancellation did not occur. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1761508983-937977-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29net: tls: Change async resync helpers argumentShahar Shitrit-13/+8
Update tls_offload_rx_resync_async_request_start() and tls_offload_rx_resync_async_request_end() to get a struct tls_offload_resync_async parameter directly, rather than extracting it from struct sock. This change aligns the function signatures with the upcoming tls_offload_rx_resync_async_request_cancel() helper, which will be introduced in a subsequent patch. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1761508983-937977-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29ipv6: icmp: Add RFC 5837 supportIdo Schimmel-0/+1
Add the ability to append the incoming IP interface information to ICMPv6 error messages in accordance with RFC 5837 and RFC 4884. This is required for more meaningful traceroute results in unnumbered networks. The feature is disabled by default and controlled via a new sysctl ("net.ipv6.icmp.errors_extension_mask") which accepts a bitmask of ICMP extensions to append to ICMP error messages. Currently, only a single value is supported, but the interface and the implementation should be able to support more extensions, if needed. Clone the skb and copy the relevant data portions before modifying the skb as the caller of icmp6_send() still owns the skb after the function returns. This should be fine since by default ICMP error messages are rate limited to 1000 per second and no more than 1 per second per specific host. Trim or pad the packet to 128 bytes before appending the ICMP extension structure in order to be compatible with legacy applications that assume that the ICMP extension structure always starts at this offset (the minimum length specified by RFC 4884). Since commit 20e1954fe238 ("ipv6: RFC 4884 partial support for SIT/GRE tunnels") it is possible for icmp6_send() to be called with an skb that already contains ICMP extensions. This can happen when we receive an ICMPv4 message with extensions from a tunnel and translate it to an ICMPv6 message towards an IPv6 host in the overlay network. I could not find an RFC that supports this behavior, but it makes sense to not overwrite the original extensions that were appended to the packet. Therefore, avoid appending extensions if the length field in the provided ICMPv6 header is already filled. Export netdev_copy_name() using EXPORT_IPV6_MOD_GPL() to make it available to IPv6 when it is built as a module. Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251027082232.232571-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29ipv4: icmp: Add RFC 5837 supportIdo Schimmel-0/+33
Add the ability to append the incoming IP interface information to ICMPv4 error messages in accordance with RFC 5837 and RFC 4884. This is required for more meaningful traceroute results in unnumbered networks. The feature is disabled by default and controlled via a new sysctl ("net.ipv4.icmp_errors_extension_mask") which accepts a bitmask of ICMP extensions to append to ICMP error messages. Currently, only a single value is supported, but the interface and the implementation should be able to support more extensions, if needed. Clone the skb and copy the relevant data portions before modifying the skb as the caller of __icmp_send() still owns the skb after the function returns. This should be fine since by default ICMP error messages are rate limited to 1000 per second and no more than 1 per second per specific host. Trim or pad the packet to 128 bytes before appending the ICMP extension structure in order to be compatible with legacy applications that assume that the ICMP extension structure always starts at this offset (the minimum length specified by RFC 4884). Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251027082232.232571-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29tcp: add newval parameter to tcp_rcvbuf_grow()Eric Dumazet-1/+1
This patch has no functional change, and prepares the following one. tcp_rcvbuf_grow() will need to have access to tp->rcvq_space.space old and new values. Change mptcp_rcvbuf_grow() in a similar way. Signed-off-by: Eric Dumazet <edumazet@google.com> [ Moved 'oldval' declaration to the next patch to avoid warnings at build time. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20251028-net-tcp-recv-autotune-v3-3-74b43ba4c84c@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29trace: tcp: add three metrics to trace_tcp_rcvbuf_grow()Eric Dumazet-0/+9
While chasing yet another receive autotuning bug, I found useful to add rcv_ssthresh, window_clamp and rcv_wnd. tcp_stream 40597 [068] 2172.978198: tcp:tcp_rcvbuf_grow: time=50307 rtt_us=50179 copied=77824 inq=0 space=40960 ooo=0 scaling_ratio=219 rcvbuf=131072 rcv_ssthresh=107474 window_clamp=112128 rcv_wnd=110592 tcp_stream 40597 [068] 2173.028528: tcp:tcp_rcvbuf_grow: time=50336 rtt_us=50206 copied=110592 inq=0 space=77824 ooo=0 scaling_ratio=219 rcvbuf=509444 rcv_ssthresh=328658 window_clamp=435813 rcv_wnd=331776 tcp_stream 40597 [068] 2173.078830: tcp:tcp_rcvbuf_grow: time=50305 rtt_us=50070 copied=270336 inq=0 space=110592 ooo=0 scaling_ratio=219 rcvbuf=509444 rcv_ssthresh=431159 window_clamp=435813 rcv_wnd=434176 tcp_stream 40597 [068] 2173.129137: tcp:tcp_rcvbuf_grow: time=50313 rtt_us=50118 copied=434176 inq=0 space=270336 ooo=0 scaling_ratio=219 rcvbuf=2457847 rcv_ssthresh=1299511 window_clamp=2102611 rcv_wnd=1302528 tcp_stream 40597 [068] 2173.179451: tcp:tcp_rcvbuf_grow: time=50318 rtt_us=50041 copied=1019904 inq=0 space=434176 ooo=0 scaling_ratio=219 rcvbuf=2457847 rcv_ssthresh=2087445 window_clamp=2102611 rcv_wnd=2088960 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20251028-net-tcp-recv-autotune-v3-2-74b43ba4c84c@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29fs: Make wbc_to_tag() inline and use it in fs.Julian Sun-0/+7
The logic in wbc_to_tag() is widely used in file systems, so modify this function to be inline and use it in file systems. This patch has only passed compilation tests, but it should be fine. Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29writeback: allow the file system to override MIN_WRITEBACK_PAGESChristoph Hellwig-0/+6
The relatively low minimal writeback size of 4MiB means that written back inodes on rotational media are switched a lot. Besides introducing additional seeks, this also can lead to extreme file fragmentation on zoned devices when a lot of files are cached relative to the available writeback bandwidth. Add a superblock field that allows the file system to override the default size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251017034611.651385-3-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29mm: rename filemap_fdatawrite_range_kick to filemap_flush_rangeChristoph Hellwig-3/+3
Rename filemap_fdatawrite_range_kick to filemap_flush_range because it is the ranged version of filemap_flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251024080431.324236-11-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29mm: remove __filemap_fdatawrite_rangeChristoph Hellwig-2/+0
Use filemap_fdatawrite_range and filemap_fdatawrite_range_kick instead of the low-level __filemap_fdatawrite_range that requires the caller to know the internals of the writeback_control structure and remove __filemap_fdatawrite_range now that it is trivial and only two callers would be left. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251024080431.324236-10-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29mm: remove filemap_fdatawrite_wbcChristoph Hellwig-2/+0
Replace filemap_fdatawrite_wbc, which exposes a writeback_control to the callers with a filemap_writeback helper that takes all the possible arguments and declares the writeback_control itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251024080431.324236-9-hch@lst.de Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29mm,btrfs: add a filemap_flush_nr helperChristoph Hellwig-0/+1
Abstract out the btrfs-specific behavior of kicking off I/O on a number of pages on an address_space into a well-defined helper. Note: there is no kerneldoc comment for the new function because it is not part of the public API. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251024080431.324236-7-hch@lst.de Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29regmap: add flat cache with sparse validitySander Vanheule-6/+11
The flat regcache will always assume the data in the cache is valid. Since the cache is preferred over hardware access, this may shadow the actual state of the device. Add a new containing cache structure with the flat data table and a bitmap indicating cache validity. REGCACHE_FLAT will still behave as before, as the validity is ignored. Define new cache type REGCACHE_FLAT_S: a flat cache with sparse validity. The sparse validity is used to determine if a hardware access should occur to initialize the cache on the fly, vs. at regmap init for REGCACHE_FLAT. Contrary to REGCACHE_FLAT, this allows us to implement regcache_ops.drop. Signed-off-by: Sander Vanheule <sander@svanheule.net> Link: https://patch.msgid.link/20251029081248.52607-2-sander@svanheule.net Signed-off-by: Mark Brown <broonie@kernel.org>
2025-10-29perf: Support deferred user unwindPeter Zijlstra-14/+34
Add support for deferred userspace unwind to perf. Where perf currently relies on in-place stack unwinding; from NMI context and all that. This moves the userspace part of the unwind to right before the return-to-userspace. This has two distinct benefits, the biggest is that it moves the unwind to a faultable context. It becomes possible to fault in debug info (.eh_frame, SFrame etc.) that might not otherwise be readily available. And secondly, it de-duplicates the user callchain where multiple samples happen during the same kernel entry. To facilitate this the perf interface is extended with a new record type: PERF_RECORD_CALLCHAIN_DEFERRED and two new attribute flags: perf_event_attr::defer_callchain - to request the user unwind be deferred perf_event_attr::defer_output - to request PERF_RECORD_CALLCHAIN_DEFERRED records The existing PERF_RECORD_SAMPLE callchain section gets a new context type: PERF_CONTEXT_USER_DEFERRED After which will come a single entry, denoting the 'cookie' of the deferred callchain that should be attached here, matching the 'cookie' field of the above mentioned PERF_RECORD_CALLCHAIN_DEFERRED. The 'defer_callchain' flag is expected on all events with PERF_SAMPLE_CALLCHAIN. The 'defer_output' flag is expect on the event responsible for collecting side-band events (like mmap, comm etc.). Setting 'defer_output' on multiple events will get you duplicated PERF_RECORD_CALLCHAIN_DEFERRED records. Based on earlier patches by Josh and Steven. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251023150002.GR4067720@noisy.programming.kicks-ass.net
2025-10-29unwind_user/x86: Teach FP unwind about start of functionPeter Zijlstra-0/+1
When userspace is interrupted at the start of a function, before we get a chance to complete the frame, unwind will miss one caller. X86 has a uprobe specific fixup for this, add bits to the generic unwinder to support this. Suggested-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251024145156.GM4068168@noisy.programming.kicks-ass.net
2025-10-29unwind: Implement compat fp unwindPeter Zijlstra-0/+1
It is important to be able to unwind compat tasks too. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20250924080119.613695709@infradead.org
2025-10-29unwind: Make unwind_task_info::unwind_mask consistentPeter Zijlstra-3/+4
The unwind_task_info::unwind_mask was manipulated using a mixture of: regular store WRITE_ONCE() try_cmpxchg() set_bit() atomic_long_*() Clean up and make it consistently atomic_long_t. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20250924080119.384384486@infradead.org
2025-10-29unwind: Simplify unwind_reset_info()Peter Zijlstra-14/+14
Invert the condition of the first if and make it an early exit to reduce an indent level for the rest fo the function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20250924080118.777916262@infradead.org
2025-10-29unwind: Add required include filesPeter Zijlstra-0/+2
To be self sufficient, the file needs to include linux/types.h. This provides things like u32/u64 and struct callback_head. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20250924080118.665787071@infradead.org
2025-10-29unwind: Shorten linesPeter Zijlstra-4/+14
There are some exceptionally long lines that cause ugly wrapping. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20250924080118.545274393@infradead.org
2025-10-29dma-mapping: remove unused map_page callbackLeon Romanovsky-7/+0
After conversion of arch code to use physical address mapping, there are no users of .map_page() and .unmap_page() callbacks, so let's remove them. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-14-3bbfe3a25cdf@kernel.org
2025-10-29dma-mapping: remove unused mapping resource callbacksLeon Romanovsky-6/+0
After ARM and XEN conversions to use physical addresses for the mapping, there are no in-kernel users for map_resource/unmap_resource callbacks, so remove them. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-6-3bbfe3a25cdf@kernel.org
2025-10-29dma-mapping: prepare dma_map_ops to conversion to physical addressLeon Romanovsky-0/+7
Add new .map_phys() and .unmap_phys() callbacks to dma_map_ops as a preparation to replace .map_page() and .unmap_page() respectively. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-1-3bbfe3a25cdf@kernel.org
2025-10-29dma-mapping: benchmark: Restore padding to ensure uABI remained consistentQinxin Xia-0/+1
The padding field in the structure was previously reserved to maintain a stable interface for potential new fields, ensuring compatibility with user-space shared data structures. However,it was accidentally removed by tiantao in a prior commit, which may lead to incompatibility between user space and the kernel. This patch reinstates the padding to restore the original structure layout and preserve compatibility. Fixes: 8ddde07a3d28 ("dma-mapping: benchmark: extract a common header file for map_benchmark definition") Cc: stable@vger.kernel.org Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com> Reported-by: Barry Song <baohua@kernel.org> Closes: https://lore.kernel.org/lkml/CAGsJ_4waiZ2+NBJG+SCnbNk+nQ_ZF13_Q5FHJqZyxyJTcEop2A@mail.gmail.com/ Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251028120900.2265511-2-xiaqinxin@huawei.com
2025-10-29tools/dma: move dma_map_benchmark from selftests to tools/dmaQinxin Xia-5/+8
dma_map_benchmark is a standalone developer tool rather than an automated selftest. It has no pass/fail criteria, expects manual invocation, and is built as a normal userspace binary. Move it to tools/dma/ and add a minimal Makefile. Suggested-by: Marek Szyprowski <m.szyprowski@samsung.com> Suggested-by: Barry Song <baohua@kernel.org> Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com> Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251028120900.2265511-3-xiaqinxin@huawei.com
2025-10-29Merge branch 'linus/master' into sched/core, to resolve conflictPeter Zijlstra-48/+180
Conflicts: kernel/sched/ext.c Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-10-28sctp: Constify struct sctp_sched_opsChristophe JAILLET-3/+3
'struct sctp_sched_ops' is not modified in these drivers. Constifying this structure moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 8019 568 0 8587 218b net/sctp/stream_sched_fc.o After: ===== text data bss dec hex filename 8275 312 0 8587 218b net/sctp/stream_sched_fc.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/dce03527eb7b7cc8a3c26d5cdac12bafe3350135.1761377890.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>