linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message (Collapse)	Author	Lines
2025-09-22	virtio-spi: Add virtio-spi.h	Haixu Cui	-0/+181
	Add virtio-spi.h header for virtio SPI. Signed-off-by: Haixu Cui <quic_haixcui@quicinc.com> Link: https://patch.msgid.link/20250908092348.1283552-3-quic_haixcui@quicinc.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-09-22	virtio: Add ID for virtio SPI	Haixu Cui	-0/+1
	Add VIRTIO_ID_SPI definition for virtio SPI. Signed-off-by: Haixu Cui <quic_haixcui@quicinc.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Link: https://patch.msgid.link/20250908092348.1283552-2-quic_haixcui@quicinc.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-09-22	Merge tag 'amd-drm-next-6.18-2025-09-19' of ↵	Dave Airlie	-6/+11
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.18-2025-09-19: amdgpu: - Fence drv clean up fix - DPC fixes - Misc display fixes - Support the MMIO remap page as a ttm pool - JPEG parser updates - UserQ updates - VCN ctx handling fixes - Documentation updates - Misc cleanups - SMU 13.0.x updates - SI DPM updates - GC 11.x cleaner shader updates - DMCUB updates - DML fixes - Improve fallback handling for pixel encoding - VCN reset improvements - DCE6 DC updates - DSC fixes - Use devm for i2c buses - GPUVM locking updates - GPUVM documentation improvements - Drop non-DC DCE11 code - S0ix fixes - Backlight fix - SR-IOV fixes amdkfd: - SVM updates Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250919193354.2989255-1-alexander.deucher@amd.com
2025-09-21	uapi: vduse: fix typo in comment	Ashwini Sahu	-1/+1
	Fix a spelling mistake in vduse.h: "regsion" → "region" in the documentation for struct vduse_iova_info. No functional change. Signed-off-by: Ashwini Sahu <ashwini@wisig.com> Message-Id: <20250908095645.610336-1-ashwini@wisig.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-09-20	Merge tag 'asoc-v6.18' of ↵	Takashi Iwai	-4/+39
	https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next ASoC: Updates for v6.18 A relatively quiet release for ASoC, we've had a lot of maintainance work going on and several new drivers but really the most remarkable thing is that we removed a driver, the WL1273 driver used in some old Nokia systems that have had the underlying system support removed from the kernel. - Morimoto-san continues his work on cleanups of the core APIs and enforcement of abstraction layers. - Lots of cleanups and conversions of DT bindings. - Substantial maintainance work on the Intel AVS drivers. - Support for Qualcomm Glymur and PM4125, Realtek RT1321, Shanghai FourSemi FS2104/5S, Texas Instruments PCM1754. - Remove support for TI WL1273.
2025-09-19	rtnetlink: add needed_{head,tail}room attributes	Alasdair McWilliam	-0/+2
	Various network interface types make use of needed_{head,tail}room values to efficiently reserve buffer space for additional encapsulation headers, such as VXLAN, Geneve, IPSec, etc. However, it is not currently possible to query these values in a generic way. Introduce ability to query the needed_{head,tail}room values of a network device via rtnetlink, such that applications that may wish to use these values can do so. For example, Cilium agent iterates over present devices based on user config (direct routing, vxlan, geneve, wireguard etc.) and in future will configure netkit in order to expose the needed_{head,tail}room into K8s pods. See b9ed315d3c4c ("netkit: Allow for configuring needed_{head,tail}room"). Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alasdair McWilliam <alasdair@mcwilliam.dev> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20250917095543.14039-1-alasdair@mcwilliam.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19	ALSA: compress_offload: increase SNDRV_COMPRESS_VERSION minor version by 1	Alexey Klimov	-1/+1
	Since addition of raw opus codec support we need to update compress API minor version by one. Bump the SNDRV_COMPRESS_VERSION to 0.4.1. Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-09-19	ALSA: compress: add raw opus codec define and opus decoder structs	Alexey Klimov	-1/+42
	Adds a raw opus codec define and raw opus decoder structs. This is for raw OPUS packets not packed in any type of container (for instance OGG container). The decoder struct fields are taken from corresponding RFC document: RFC 7845 Section 5. Cc: Srinivas Kandagatla <srini@kernel.org> Cc: Vinod Koul <vkoul@kernel.org> Co-developed-by: Annemarie Porter <annemari@quicinc.com> Signed-off-by: Annemarie Porter <annemari@quicinc.com> Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-09-19	nsfs: add inode number for anon namespace	Christian Brauner	-0/+3
	Add an inode number anonymous namespaces. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19	nsfs: add missing id retrieval support	Christian Brauner	-2/+4
	The mount namespace has supported id retrieval for a while already. Add support for the other types as well. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19	nsfs: support exhaustive file handles	Christian Brauner	-0/+1
	Pidfd file handles are exhaustive meaning they don't require a handle on another pidfd to pass to open_by_handle_at() so it can derive the filesystem to decode in. Instead it can be derived from the file handle itself. The same is possible for namespace file handles. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19	nsfs: support file handles	Christian Brauner	-0/+9
	A while ago we added support for file handles to pidfs so pidfds can be encoded and decoded as file handles. Userspace has adopted this quickly and it's proven very useful. Implement file handles for namespaces as well. A process is not always able to open /proc/self/ns/. That requires procfs to be mounted and for /proc/self/ or /proc/self/ns/ to not be overmounted. However, userspace can always derive a namespace fd from a pidfd. And that always works for a task's own namespace. There's no need to introduce unnecessary behavioral differences between /proc/self/ns/ fds, pidfd-derived namespace fds, and file-handle-derived namespace fds. So namespace file handles are always decodable if the caller is located in the namespace the file handle refers to. This also allows a task to e.g., store a set of file handles to its namespaces in a file on-disk so it can verify when it gets rexeced that they're still valid and so on. This is akin to the pidfd use-case. Or just plainly for namespace comparison reasons where a file handle to the task's own namespace can be easily compared against others. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19	wifi: cfg80211: correctly implement and validate S1G chandef	Lachlan Hodges	-0/+15
	Currently, the S1G channelisation implementation differs from that of VHT, which is the PHY that S1G is based on. The major difference between the clock rate is 1/10th of VHT. However how their channelisation is represented within cfg80211 and mac80211 vastly differ. To rectify this, remove the use of IEEE80211_CHAN_1/2/4.. flags that were previously used to indicate the control channel width, however it should be implied that the control channels are 1MHz in the case of S1G. Additionally, introduce the invert - being IEEE80211_CHAN_NO_4/8/16MHz - that imply the control channel may not be used for a certain bandwidth. With these new flags, we can perform regulatory and chandef validation just as we would for VHT. To deal with the notion that S1G PHYs may contain a 2MHz primary channel, introduce a new variable, s1g_primary_2mhz, which indicates whether we are operating on a 2MHz primary channel. In this case, the chandef::chan points to the 1MHz primary channel pointed to by the primary channel location. Alongside this, introduce some new helper routines that can extract the sibling 1MHz channel. The sibling being the alternate 1MHz primary subchannel within the 2MHz primary channel that is not pointed to by chandef::chan. Furthermore, due to unique restrictions imposed on S1G PHYs, introduce a new flag, IEEE80211_CHAN_S1G_NO_PRIMARY, which states that the 1MHz channel cannot be used as a primary channel. This is assumed to be set by vendors as it is hardware and regdom specific, When we validate a 2MHz primary channel, we need to ensure both 1MHz subchannels do not contain this flag. If one or both of the 1MHz subchannels contain this flag then the 2MHz primary is not permitted for use as a primary channel. Properly integrate S1G channel validation such that it is implemented according with other PHY types such as VHT. Additionally, implement a new S1G-specific regulatory flag to allow cfg80211 to understand specific vendor requirements for S1G PHYs. Signed-off-by: Arien Judge <arien.judge@morsemicro.com> Signed-off-by: Andrew Pope <andrew.pope@morsemicro.com> Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250918051913.500781-2-lachlan.hodges@morsemicro.com [remove redundant NL80211_ATTR_S1G_PRIMARY_2MHZ check] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-09-19	wifi: nl80211: Add more NAN capabilities	Andrei Otcheretianski	-0/+55
	Add better break down for NAN capabilities, as NAN has multiple optional features. This allows to better indicate which features are supported or or offloaded to the device. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250908140015.bb02cd8c1596.I01fb2e8dc3662b847f3c27117bc4e199fc96d0a3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-09-19	wifi: cfg80211: Add cluster joined notification APIs	Andrei Otcheretianski	-0/+8
	The drivers should notify upper layers and user space when a NAN device joins a cluster. This is needed, for example, to set the correct addr3 in SDF frames. Add API to report cluster join event. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250908140015.ad27b7b6e4d9.I70b213a2a49f18d1ba2ad325e67e8eff51cc7a1f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-09-19	wifi: nl80211: Add NAN Discovery Window (DW) notification	Andrei Otcheretianski	-0/+16
	This notification will be used by the device to inform user space about upcoming DW. When received, user space will be able to prepare multicast Service Discovery Frames (SDFs) to be transmitted during the next DW using %NL80211_CMD_FRAME command on the NAN management interface. The device/driver will take care to transmit the frames in the correct timing. This allows to implement a synchronized Discovery Engine (DE) in user space, if the device doesn't support DE offload. Note that this notification can be sent before the actual DW starts as long as the driver/device handles the actual timing of the SDF transmission. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250908140015.0e1d15031bab.I5b1721e61b63910452b3c5cdcdc1e94cb094d4c9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-09-19	wifi: nl80211: Add more configuration options for NAN commands	Andrei Otcheretianski	-2/+108
	Current NAN APIs have only basic configuration for master preference and operating bands. Add and parse additional parameters which provide more control over NAN synchronization. The newly added attributes allow to publish additional NAN attributes and vendor elements in NAN beacons, control scan and discovery beacons periodicity, enable/disable DW notifications etc. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> tested: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250908140015.a4779492bf8e.I375feb919bd72358173766b9fe10010c40796b33@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-09-19	rust_binder: add Rust Binder driver	Alice Ryhl	-1/+1
	We're generally not proponents of rewrites (nasty uncomfortable things that make you late for dinner!). So why rewrite Binder? Binder has been evolving over the past 15+ years to meet the evolving needs of Android. Its responsibilities, expectations, and complexity have grown considerably during that time. While we expect Binder to continue to evolve along with Android, there are a number of factors that currently constrain our ability to develop/maintain it. Briefly those are: 1. Complexity: Binder is at the intersection of everything in Android and fulfills many responsibilities beyond IPC. It has become many things to many people, and due to its many features and their interactions with each other, its complexity is quite high. In just 6kLOC it must deliver transactions to the right threads. It must correctly parse and translate the contents of transactions, which can contain several objects of different types (e.g., pointers, fds) that can interact with each other. It controls the size of thread pools in userspace, and ensures that transactions are assigned to threads in ways that avoid deadlocks where the threadpool has run out of threads. It must track refcounts of objects that are shared by several processes by forwarding refcount changes between the processes correctly. It must handle numerous error scenarios and it combines/nests 13 different locks, 7 reference counters, and atomic variables. Finally, It must do all of this as fast and efficiently as possible. Minor performance regressions can cause a noticeably degraded user experience. 2. Things to improve: Thousand-line functions [1], error-prone error handling [2], and confusing structure can occur as a code base grows organically. After more than a decade of development, this codebase could use an overhaul. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/android/binder.c?h=v6.5#n2896 [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/android/binder.c?h=v6.5#n3658 3. Security critical: Binder is a critical part of Android's sandboxing strategy. Even Android's most de-privileged sandboxes (e.g. the Chrome renderer, or SW Codec) have direct access to Binder. More than just about any other component, it's important that Binder provide robust security, and itself be robust against security vulnerabilities. It's #1 (high complexity) that has made continuing to evolve Binder and resolving #2 (tech debt) exceptionally difficult without causing #3 (security issues). For Binder to continue to meet Android's needs, we need better ways to manage (and reduce!) complexity without increasing the risk. The biggest change is obviously the choice of programming language. We decided to use Rust because it directly addresses a number of the challenges within Binder that we have faced during the last years. It prevents mistakes with ref counting, locking, bounds checking, and also does a lot to reduce the complexity of error handling. Additionally, we've been able to use the more expressive type system to encode the ownership semantics of the various structs and pointers, which takes the complexity of managing object lifetimes out of the hands of the programmer, reducing the risk of use-after-frees and similar problems. Rust has many different pointer types that it uses to encode ownership semantics into the type system, and this is probably one of the most important aspects of how it helps in Binder. The Binder driver has a lot of different objects that have complex ownership semantics; some pointers own a refcount, some pointers have exclusive ownership, and some pointers just reference the object and it is kept alive in some other manner. With Rust, we can use a different pointer type for each kind of pointer, which enables the compiler to enforce that the ownership semantics are implemented correctly. Another useful feature is Rust's error handling. Rust allows for more simplified error handling with features such as destructors, and you get compilation failures if errors are not properly handled. This means that even though Rust requires you to spend more lines of code than C on things such as writing down invariants that are left implicit in C, the Rust driver is still slightly smaller than C binder: Rust is 5.5kLOC and C is 5.8kLOC. (These numbers are excluding blank lines, comments, binderfs, and any debugging facilities in C that are not yet implemented in the Rust driver. The numbers include abstractions in rust/kernel/ that are unlikely to be used by other drivers than Binder.) Although this rewrite completely rethinks how the code is structured and how assumptions are enforced, we do not fundamentally change how the driver does the things it does. A lot of careful thought has gone into the existing design. The rewrite is aimed rather at improving code health, structure, readability, robustness, security, maintainability and extensibility. We also include more inline documentation, and improve how assumptions in the code are enforced. Furthermore, all unsafe code is annotated with a SAFETY comment that explains why it is correct. We have left the binderfs filesystem component in C. Rewriting it in Rust would be a large amount of work and requires a lot of bindings to the file system interfaces. Binderfs has not historically had the same challenges with security and complexity, so rewriting binderfs seems to have lower value than the rest of Binder. Correctness and feature parity ------------------------------ Rust binder passes all tests that validate the correctness of Binder in the Android Open Source Project. We can boot a device, and run a variety of apps and functionality without issues. We have performed this both on the Cuttlefish Android emulator device, and on a Pixel 6 Pro. As for feature parity, Rust binder currently implements all features that C binder supports, with the exception of some debugging facilities. The missing debugging facilities will be added before we submit the Rust implementation upstream. Tracepoints ----------- I did not include all of the tracepoints as I felt that the mechansim for making C access fields of Rust structs should be discussed on list separately. I also did not include the support for building Rust Binder as a module since that requires exporting a bunch of additional symbols on the C side. Original RFC Link with old benchmark numbers: https://lore.kernel.org/r/20231101-rust-binder-v1-0-08ba9197f637@google.com Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com> Co-developed-by: Matt Gilbride <mattgilbride@google.com> Signed-off-by: Matt Gilbride <mattgilbride@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250919-rust-binder-v2-1-a384b09f28dd@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-18	bpf: Return hashes of maps in BPF_OBJ_GET_INFO_BY_FD	KP Singh	-0/+2
	Currently only array maps are supported, but the implementation can be extended for other maps and objects. The hash is memoized only for exclusive and frozen maps as their content is stable until the exclusive program modifies the map. This is required for BPF signing, enabling a trusted loader program to verify a map's integrity. The loader retrieves the map's runtime hash from the kernel and compares it against an expected hash computed at build time. Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250914215141.15144-7-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18	bpf: Implement exclusive map creation	KP Singh	-0/+6
	Exclusive maps allow maps to only be accessed by program with a program with a matching hash which is specified in the excl_prog_hash attr. For the signing use-case, this allows the trusted loader program to load the map and verify the integrity Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250914215141.15144-3-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	-2/+4
	Cross-merge networking fixes after downstream PR (net-6.17-rc7). No conflicts. Adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en/fs.h 9536fbe10c9d ("net/mlx5e: Add PSP steering in local NIC RX") 7601a0a46216 ("net/mlx5e: Add a miss level for ipsec crypto offload") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18	drm/panfrost: Introduce uAPI for JM context creation	Boris Brezillon	-0/+50
	The new uAPI lets user space query the KM driver for the available priorities a job can be given at submit time. These are managed through the notion of a context, for which we also provide new creation and destruction ioctls. Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20250917191859.500279-2-adrian.larumbe@collabora.com
2025-09-18	net: psp: add socket security association code	Jakub Kicinski	-0/+21
	Add the ability to install PSP Rx and Tx crypto keys on TCP connections. Netlink ops are provided for both operations. Rx side combines allocating a new Rx key and installing it on the socket. Theoretically these are separate actions, but in practice they will always be used one after the other. We can add distinct "alloc" and "install" ops later. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Daniel Zahka <daniel.zahka@gmail.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250917000954.859376-9-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18	psp: add op for rotation of device key	Jakub Kicinski	-0/+3
	Rotating the device key is a key part of the PSP protocol design. Some external daemon needs to do it once a day, or so. Add a netlink op to perform this operation. Add a notification group for informing users that key has been rotated and they should rekey (next rotation will cut them off). Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250917000954.859376-6-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18	psp: base PSP device support	Jakub Kicinski	-0/+42
	Add a netlink family for PSP and allow drivers to register support. The "PSP device" is its own object. This allows us to perform more flexible reference counting / lifetime control than if PSP information was part of net_device. In the future we should also be able to "delegate" PSP access to software devices, such as *vlan, veth or netkit more easily. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250917000954.859376-3-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18	RDMA/irdma: Add SRQ support	Faisal Latif	-1/+14
	Implement verb API and UAPI changes to support SRQ functionality in GEN3 devices. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20250827152545.2056-13-tatyana.e.nikolova@intel.com Tested-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18	RDMA/irdma: Introduce GEN3 vPort driver support	Mustafa Ismail	-0/+1
	In the IPU model, a function can host one or more logical network endpoints called vPorts. Each vPort may be associated with either a physical or an internal communication port, and can be RDMA capable. A vPort features a netdev and, if RDMA capable, must have an associated ib_dev. This change introduces a GEN3 auxiliary vPort driver responsible for registering a verbs device for every RDMA-capable vPort. Additionally, the UAPI is updated to prevent the binding of GEN3 devices to older user-space providers. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20250827152545.2056-8-tatyana.e.nikolova@intel.com Tested-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-09-18	tcp: accecn: AccECN option failure handling	Chia-Yu Chang	-0/+2
	AccECN option may fail in various way, handle these: - Attempt to negotiate the use of AccECN on the 1st retransmitted SYN - From the 2nd retransmitted SYN, stop AccECN negotiation - Remove option from SYN/ACK rexmits to handle blackholes - If no option arrives in SYN/ACK, assume Option is not usable - If an option arrives later, re-enabled - If option is zeroed, disable AccECN option processing This patch use existing padding bits in tcp_request_sock and holes in tcp_sock without increasing the size. Signed-off-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916082434.100722-9-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18	tcp: accecn: AccECN option	Ilpo Järvinen	-0/+7
	The Accurate ECN allows echoing back the sum of bytes for each IP ECN field value in the received packets using AccECN option. This change implements AccECN option tx & rx side processing without option send control related features that are added by a later change. Based on specification: https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt (Some features of the spec will be added in the later changes rather than in this one). A full-length AccECN option is always attempted but if it does not fit, the minimum length is selected based on the counters that have changed since the last update. The AccECN option (with 24-bit fields) often ends in odd sizes so the option write code tries to take advantage of some nop used to pad the other TCP options. The delivered_ecn_bytes pairs with received_ecn_bytes similar to how delivered_ce pairs with received_ce. In contrast to ACE field, however, the option is not always available to update delivered_ecn_bytes. For ACK w/o AccECN option, the delivered bytes calculated based on the cumulative ACK+SACK information are assigned to one of the counters using an estimation heuristic to select the most likely ECN byte counter. Any estimation error is corrected when the next AccECN option arrives. It may occur that the heuristic gets too confused when there are enough different byte counter deltas between ACKs with the AccECN option in which case the heuristic just gives up on updating the counters for a while. tcp_ecn_option sysctl can be used to select option sending mode for AccECN: TCP_ECN_OPTION_DISABLED, TCP_ECN_OPTION_MINIMUM, and TCP_ECN_OPTION_FULL. This patch increases the size of tcp_info struct, as there is no existing holes for new u32 variables. Below are the pahole outcomes before and after this patch: [BEFORE THIS PATCH] struct tcp_info { [...] __u32 tcpi_total_rto_time; /* 244 4 / / size: 248, cachelines: 4, members: 61 / } [AFTER THIS PATCH] struct tcp_info { [...] __u32 tcpi_total_rto_time; / 244 4 / __u32 tcpi_received_ce; / 248 4 / __u32 tcpi_delivered_e1_bytes; / 252 4 / __u32 tcpi_delivered_e0_bytes; / 256 4 / __u32 tcpi_delivered_ce_bytes; / 260 4 / __u32 tcpi_received_e1_bytes; / 264 4 / __u32 tcpi_received_e0_bytes; / 268 4 / __u32 tcpi_received_ce_bytes; / 272 4 / / size: 280, cachelines: 5, members: 68 / } This patch uses the existing 1-byte holes in the tcp_sock_write_txrx group for new u8 members, but adds a 4-byte hole in tcp_sock_write_rx group after the new u32 delivered_ecn_bytes[3] member. Therefore, the group size of tcp_sock_write_rx is increased from 96 to 112. Below are the pahole outcomes before and after this patch: [BEFORE THIS PATCH] struct tcp_sock { [...] u8 received_ce_pending:4; / 2522: 0 1 / u8 unused2:4; / 2522: 4 1 / / XXX 1 byte hole, try to pack / [...] u32 rcv_rtt_last_tsecr; / 2668 4 / [...] __cacheline_group_end__tcp_sock_write_rx[0]; / 2728 0 / [...] / size: 3200, cachelines: 50, members: 167 / } [AFTER THIS PATCH] struct tcp_sock { [...] u8 received_ce_pending:4;/ 2522: 0 1 / u8 unused2:4; / 2522: 4 1 / u8 accecn_minlen:2; / 2523: 0 1 / u8 est_ecnfield:2; / 2523: 2 1 / u8 unused3:4; / 2523: 4 1 / [...] u32 rcv_rtt_last_tsecr; / 2668 4 / u32 delivered_ecn_bytes[3];/ 2672 12 / / XXX 4 bytes hole, try to pack / [...] __cacheline_group_end__tcp_sock_write_rx[0]; / 2744 0 / [...] / size: 3200, cachelines: 50, members: 171 */ } Signed-off-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Neal Cardwell <ncardwell@google.com> Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916082434.100722-7-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-18	accel/ivpu: Add support for user-managed preemption buffer	Andrzej Kacprowski	-0/+11
	Allow user mode drivers to manage preemption buffers, enabling memory savings by sharing a single buffer across multiple command queues within the same memory context. Introduce DRM_IVPU_PARAM_PREEMPT_BUFFER_SIZE to report the required preemption buffer size as specified by the firmware. The preemption buffer is now passed from user space as an entry in the BO list of DRM_IVPU_CMDQ_SUBMIT. The buffer must be non-mappable and large enough to hold preemption data. For backward compatibility, the kernel will allocate an internal preemption buffer if user space does not provide one. User space can only provide a single preemption buffer, simplifying the ioctl interface and parameter validation. A separate secondary preemption buffer is only needed to save below 4GB address space on 37xx and only if preemption buffers are not shared. Signed-off-by: Andrzej Kacprowski <Andrzej.Kacprowski@intel.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Link: https://lore.kernel.org/r/20250915103437.830086-1-karol.wachowski@linux.intel.com
2025-09-17	crypto: ccp - Add AMD Seamless Firmware Servicing (SFS) driver	Ashish Kalra	-0/+87
	AMD Seamless Firmware Servicing (SFS) is a secure method to allow non-persistent updates to running firmware and settings without requiring BIOS reflash and/or system reset. SFS does not address anything that runs on the x86 processors and it can be used to update ASP firmware, modules, register settings and update firmware for other microprocessors like TMPM, etc. SFS driver support adds ioctl support to communicate the SFS commands to the ASP/PSP by using the TEE mailbox interface. The Seamless Firmware Servicing (SFS) driver is added as a PSP sub-device. For detailed information, please look at the SFS specifications: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/58604.pdf Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/cover.1758057691.git.ashish.kalra@amd.com
2025-09-17	HID: hidraw: tighten ioctl command parsing	Benjamin Tissoires	-0/+2
	The handling for variable-length ioctl commands in hidraw_ioctl() is rather complex and the check for the data direction is incomplete. Simplify this code by factoring out the various ioctls grouped by dir and size, and using a switch() statement with the size masked out, to ensure the rest of the command is correctly matched. Fixes: 9188e79ec3fd ("HID: add phys and name ioctls to hidraw") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2025-09-16	ASoC: codecs: pcm1754: add pcm1754 dac driver	Mark Brown	-2/+4
	Merge series from Stefan Kerkmann <s.kerkmann@pengutronix.de>: Add a CODEC driver for the TI PCM1754.
2025-09-16	io_uring/zcrx: allow synchronous buffer return	Pavel Begunkov	-0/+12
	Returning buffers via a ring is performant and convenient, but it becomes a problem when/if the user misconfigured the ring size and it becomes full. Add a synchronous way to return buffers back to the page pool via a new register opcode. It's supposed to be a reliable slow path for refilling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-15	mptcp: pm: nl: announce deny-join-id0 flag	Matthieu Baerts (NGI0)	-2/+4
	During the connection establishment, a peer can tell the other one that it cannot establish new subflows to the initial IP address and port by setting the 'C' flag [1]. Doing so makes sense when the sender is behind a strict NAT, operating behind a legacy Layer 4 load balancer, or using anycast IP address for example. When this 'C' flag is set, the path-managers must then not try to establish new subflows to the other peer's initial IP address and port. The in-kernel PM has access to this info, but the userspace PM didn't. The RFC8684 [1] is strict about that: (...) therefore the receiver MUST NOT try to open any additional subflows toward this address and port. So it is important to tell the userspace about that as it is responsible for the respect of this flag. When a new connection is created and established, the Netlink events now contain the existing but not currently used 'flags' attribute. When MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 is set, it means no other subflows to the initial IP address and port -- info that are also part of the event -- can be established. Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6 [1] Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Reported-by: Marek Majkowski <marek@cloudflare.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/532 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-2-40171884ade8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15	drm/amdgpu: add AMDGPU_IDS_FLAGS_GANG_SUBMIT	Christian König	-4/+5
	Add a UAPI flag indicating if gang submit is supported or not. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15	tee: add Qualcomm TEE driver	Amirreza Zarrabi	-0/+1
	Introduce qcomtee_object, which represents an object in both QTEE and the kernel. QTEE clients can invoke an instance of qcomtee_object to access QTEE services. If this invocation produces a new object in QTEE, an instance of qcomtee_object will be returned. Similarly, QTEE can request services from by issuing a callback request, which invokes an instance of qcomtee_object. Implement initial support for exporting qcomtee_object to userspace and QTEE, enabling the invocation of objects hosted in QTEE and userspace through the TEE subsystem. Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Harshal Dev <quic_hdev@quicinc.com> Acked-by: Sumit Garg <sumit.garg@oss.qualcomm.com> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2025-09-15	tee: increase TEE_MAX_ARG_SIZE to 4096	Amirreza Zarrabi	-1/+1
	Increase TEE_MAX_ARG_SIZE to accommodate worst-case scenarios where additional buffer space is required to pass all arguments to TEE. This change is necessary for upcoming support for Qualcomm TEE, which requires a larger buffer for argument marshaling. Reviewed-by: Sumit Garg <sumit.garg@oss.qualcomm.com> Tested-by: Harshal Dev <quic_hdev@quicinc.com> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2025-09-15	tee: add TEE_IOCTL_PARAM_ATTR_TYPE_OBJREF	Amirreza Zarrabi	-6/+35
	The TEE subsystem allows session-based access to trusted services, requiring a session to be established to receive a service. This is not suitable for an environment that represents services as objects. An object supports various operations that a client can invoke, potentially generating a result or a new object that can be invoked independently of the original object. Add TEE_IOCTL_PARAM_ATTR_TYPE_OBJREF_INPUT/OUTPUT/INOUT to represent an object. Objects may reside in either TEE or userspace. To invoke an object in TEE, introduce a new ioctl. Use the existing SUPPL_RECV and SUPPL_SEND to invoke an object in userspace. Reviewed-by: Sumit Garg <sumit.garg@oss.qualcomm.com> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Harshal Dev <quic_hdev@quicinc.com> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2025-09-15	tee: add TEE_IOCTL_PARAM_ATTR_TYPE_UBUF	Amirreza Zarrabi	-6/+16
	For drivers that can transfer data to the TEE without using shared memory from client, it is necessary to receive the user address directly, bypassing any processing by the TEE subsystem. Introduce TEE_IOCTL_PARAM_ATTR_TYPE_UBUF_INPUT/OUTPUT/INOUT to represent userspace buffers. Reviewed-by: Sumit Garg <sumit.garg@oss.qualcomm.com> Tested-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Harshal Dev <quic_hdev@quicinc.com> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2025-09-15	Merge tag 'tee-prot-dma-buf-for-v6.18' of ↵	Arnd Bergmann	-0/+31
	git://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee into soc/drivers TEE protected DMA-bufs for v6.18 - Allocates protected DMA-bufs from a DMA-heap instantiated from the TEE subsystem. - The DMA-heap uses a protected memory pool provided by the backend TEE driver, allowing it to choose how to allocate the protected physical memory. - Three use-cases (Secure Video Playback, Trusted UI, and Secure Video Recording) have been identified so far to serve as examples of what can be expected. - The use-cases have predefined DMA-heap names, "protected,secure-video", "protected,trusted-ui", and "protected,secure-video-record". The backend driver registers protected memory pools for the use-cases it supports. * tag 'tee-prot-dma-buf-for-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee: optee: smc abi: dynamic protected memory allocation optee: FF-A: dynamic protected memory allocation optee: support protected memory allocation tee: add tee_shm_alloc_dma_mem() tee: new ioctl to a register tee_shm from a dmabuf file descriptor tee: refactor params_from_user() tee: implement protected DMA-heap dma-buf: dma-heap: export declared functions optee: sync secure world ABI headers Link: https://lore.kernel.org/r/20250912101752.GA1453408@rayden Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-09-15	Input: add INPUT_PROP_HAPTIC_TOUCHPAD	Angela Czubak	-0/+1
	INPUT_PROP_HAPTIC_TOUCHPAD property is to be set for a device with simple haptic capabilities. Signed-off-by: Angela Czubak <aczubak@google.com> Co-developed-by: Jonathan Denose <jdenose@google.com> Signed-off-by: Jonathan Denose <jdenose@google.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2025-09-15	Input: add FF_HAPTIC effect type	Angela Czubak	-1/+21
	FF_HAPTIC effect type can be used to trigger haptic feedback with HID simple haptic usages. Signed-off-by: Angela Czubak <aczubak@google.com> Co-developed-by: Jonathan Denose <jdenose@google.com> Signed-off-by: Jonathan Denose <jdenose@google.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
2025-09-15	Merge tag 'v6.17-rc6' into drm-next	Dave Airlie	-3/+6
	This is a backmerge of Linux 6.17-rc6, needed for msm, also requested by misc. Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-09-13	x86/kexec: carry forward the boot DTB on kexec	Brian Mak	-0/+4
	Currently, the kexec_file_load syscall on x86 does not support passing a device tree blob to the new kernel. Some embedded x86 systems use device trees. On these systems, failing to pass a device tree to the new kernel causes a boot failure. To add support for this, we copy the behavior of ARM64 and PowerPC and copy the current boot's device tree blob for use in the new kernel. We do this on x86 by passing the device tree blob as a setup_data entry in accordance with the x86 boot protocol. This behavior is gated behind the KEXEC_FILE_FORCE_DTB flag. Link: https://lkml.kernel.org/r/20250805211527.122367-3-makb@juniper.net Signed-off-by: Brian Mak <makb@juniper.net> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13	mm/huge_memory: respect MADV_COLLAPSE with PR_THP_DISABLE_EXCEPT_ADVISED	David Hildenbrand	-1/+1
	Let's allow for making MADV_COLLAPSE succeed on areas that neither have VM_HUGEPAGE nor VM_NOHUGEPAGE when we have THP disabled unless explicitly advised (PR_THP_DISABLE_EXCEPT_ADVISED). MADV_COLLAPSE is a clear advice that we want to collapse. Note that we still respect the VM_NOHUGEPAGE flag, just like MADV_COLLAPSE always does. So consequently, MADV_COLLAPSE is now only refused on VM_NOHUGEPAGE with PR_THP_DISABLE_EXCEPT_ADVISED, including for shmem. Link: https://lkml.kernel.org/r/20250815135549.130506-4-usamaarif642@gmail.com Co-developed-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13	prctl: extend PR_SET_THP_DISABLE to optionally exclude VM_HUGEPAGE	David Hildenbrand	-0/+10
	Patch series "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised", v5. This will allow individual processes to opt-out of THP = "always" into THP = "madvise", without affecting other workloads on the system. This has been extensively discussed on the mailing list and has been summarized very well by David in the first patch which also includes the links to alternatives, please refer to the first patch commit message for the motivation for this series. Patch 1 adds the PR_THP_DISABLE_EXCEPT_ADVISED flag to implement this, along with the MMF changes. Patch 2 is a cleanup patch for tva_flags that will allow the forced collapse case to be transmitted to vma_thp_disabled (which is done in patch 3). Patch 4 adds documentation for PR_SET_THP_DISABLE/PR_GET_THP_DISABLE. Patches 6-7 implement the selftests for PR_SET_THP_DISABLE for completely disabling THPs (old behaviour) and only enabling it at advise (PR_THP_DISABLE_EXCEPT_ADVISED). This patch (of 7): People want to make use of more THPs, for example, moving from the "never" system policy to "madvise", or from "madvise" to "always". While this is great news for every THP desperately waiting to get allocated out there, apparently there are some workloads that require a bit of care during that transition: individual processes may need to opt-out from this behavior for various reasons, and this should be permitted without needing to make all other workloads on the system similarly opt-out. The following scenarios are imaginable: (1) Switch from "none" system policy to "madvise"/"always", but keep THPs disabled for selected workloads. (2) Stay at "none" system policy, but enable THPs for selected workloads, making only these workloads use the "madvise" or "always" policy. (3) Switch from "madvise" system policy to "always", but keep the "madvise" policy for selected workloads: allocate THPs only when advised. (4) Stay at "madvise" system policy, but enable THPs even when not advised for selected workloads -- "always" policy. Once can emulate (2) through (1), by setting the system policy to "madvise"/"always" while disabling THPs for all processes that don't want THPs. It requires configuring all workloads, but that is a user-space problem to sort out. (4) can be emulated through (3) in a similar way. Back when (1) was relevant in the past, as people started enabling THPs, we added PR_SET_THP_DISABLE, so relevant workloads that were not ready yet (i.e., used by Redis) were able to just disable THPs completely. Redis still implements the option to use this interface to disable THPs completely. With PR_SET_THP_DISABLE, we added a way to force-disable THPs for a workload -- a process, including fork+exec'ed process hierarchy. That essentially made us support (1): simply disable THPs for all workloads that are not ready for THPs yet, while still enabling THPs system-wide. The quest for handling (3) and (4) started, but current approaches (completely new prctl, options to set other policies per process, alternatives to prctl -- mctrl, cgroup handling) don't look particularly promising. Likely, the future will use bpf or something similar to implement better policies, in particular to also make better decisions about THP sizes to use, but this will certainly take a while as that work just started. Long story short: a simple enable/disable is not really suitable for the future, so we're not willing to add completely new toggles. While we could emulate (3)+(4) through (1)+(2) by simply disabling THPs completely for these processes, this is a step backwards, because these processes can no longer allocate THPs in regions where THPs were explicitly advised: regions flagged as VM_HUGEPAGE. Apparently, that imposes a problem for relevant workloads, because "not THPs" is certainly worse than "THPs only when advised". Could we simply relax PR_SET_THP_DISABLE, to "disable THPs unless not explicitly advised by the app through MAD_HUGEPAGE"? maybe, but this would change the documented semantics quite a bit, and the versatility to use it for debugging purposes, so I am not 100% sure that is what we want -- although it would certainly be much easier. So instead, as an easy way forward for (3) and (4), add an option to make PR_SET_THP_DISABLE disable less THPs for a process. In essence, this patch: (A) Adds PR_THP_DISABLE_EXCEPT_ADVISED, to be used as a flag in arg3 of prctl(PR_SET_THP_DISABLE) when disabling THPs (arg2 != 0). prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED). (B) Makes prctl(PR_GET_THP_DISABLE) return 3 if PR_THP_DISABLE_EXCEPT_ADVISED was set while disabling. Previously, it would return 1 if THPs were disabled completely. Now it returns the set flags as well: 3 if PR_THP_DISABLE_EXCEPT_ADVISED was set. (C) Renames MMF_DISABLE_THP to MMF_DISABLE_THP_COMPLETELY, to express the semantics clearly. Fortunately, there are only two instances outside of prctl() code. (D) Adds MMF_DISABLE_THP_EXCEPT_ADVISED to express "no THP except for VMAs with VM_HUGEPAGE" -- essentially "thp=madvise" behavior Fortunately, we only have to extend vma_thp_disabled(). (E) Indicates "THP_enabled: 0" in /proc/pid/status only if THPs are disabled completely Only indicating that THPs are disabled when they are really disabled completely, not only partially. For now, we don't add another interface to obtained whether THPs are disabled partially (PR_THP_DISABLE_EXCEPT_ADVISED was set). If ever required, we could add a new entry. The documented semantics in the man page for PR_SET_THP_DISABLE "is inherited by a child created via fork(2) and is preserved across execve(2)" is maintained. This behavior, for example, allows for disabling THPs for a workload through the launching process (e.g., systemd where we fork() a helper process to then exec()). For now, MADV_COLLAPSE will fail in regions without VM_HUGEPAGE and VM_NOHUGEPAGE. As MADV_COLLAPSE is a clear advise that user space thinks a THP is a good idea, we'll enable that separately next (requiring a bit of cleanup first). There is currently not way to prevent that a process will not issue PR_SET_THP_DISABLE itself to re-enable THP. There are not really known users for re-enabling it, and it's against the purpose of the original interface. So if ever required, we could investigate just forbidding to re-enable them, or make this somehow configurable. Link: https://lkml.kernel.org/r/20250815135549.130506-1-usamaarif642@gmail.com Link: https://lkml.kernel.org/r/20250815135549.130506-2-usamaarif642@gmail.com Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Usama Arif <usamaarif642@gmail.com> Tested-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yafang <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13	mempolicy: clarify what zone reclaim means	Joshua Hahn	-3/+9
	The zone_reclaim_mode API controls the reclaim behavior when a node runs out of memory. Contrary to its user-facing name, it is internally referred to as "node_reclaim_mode". This can be confusing. But because we cannot change the name of the API since it has been in place since at least 2.6, let's try to be more explicit about what the behavior of this API is. Change the description to clarify what zone reclaim entails, and be explicit about the RECLAIM_ZONE bit, whose purpose has led to some confusion in the past already [1] [2]. While at it, also soften the warning about changing these bits. [joshua.hahnjy@gmail.com: remove the reference to the vm.zone_reclaim_mode sysctl as an ABI] Link: https://lkml.kernel.org/r/20250806134404.2000234-1-joshua.hahnjy@gmail.com Link: https://lkml.kernel.org/r/20250805205048.1518453-1-joshua.hahnjy@gmail.com Link: https://lore.kernel.org/linux-mm/1579005573-58923-1-git-send-email-alex.shi@linux.alibaba.com/ [1] Link: https://lore.kernel.org/linux-mm/20200626003459.D8E015CA@viggo.jf.intel.com/ [2] Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Byungchul Park <byungchul@sk.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Rakie Kim <rakie.kim@sk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13	Merge tag 'v6.17-rc3' into togreg	Jonathan Cameron	-1/+2
	Linux 6.17-rc3
2025-09-13	iio: add power and energy measurement modifiers	Antoniu Miclaus	-0/+4
	Add new IIO modifiers to support power and energy measurement devices: Power modifiers: - IIO_MOD_ACTIVE: Real power consumed by the load - IIO_MOD_REACTIVE: Power that oscillates between source and load - IIO_MOD_APPARENT: Magnitude of complex power Signal quality modifiers: - IIO_MOD_RMS: Root Mean Square value Additionally adds: - IIO_CHAN_INFO_POWERFACTOR: Power factor channel info type for representing the ratio of active power to apparent power These modifiers enable proper representation of power measurement devices like energy meters and power analyzers. Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>