path: root/include
Age Commit message Author Lines
2026-02-04mm/slab: move [__]ksize and slab_ksize() to mm/slub.cHarry Yoo-1/+0
To access SLUB's internal implementation details beyond cache flags in ksize(), move __ksize(), ksize(), and slab_ksize() to mm/slub.c. [vbabka@suse.cz: also make __ksize() static and move its kerneldoc to ksize() ] Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113061845.159790-9-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-04mm/slab: allow specifying free pointer offset when using constructorHarry Yoo-14/+16
When a slab cache has a constructor, the free pointer is placed after the object because certain fields must not be overwritten even after the object is freed. However, some fields that the constructor does not initialize can safely be overwritten after free. Allow specifying the free pointer offset within the object, reducing the overall object size when some fields can be reused for the free pointer. Adjust the document accordingly. Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113061845.159790-3-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
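The size saving described above can be illustrated with a small user-space sketch. It is not SLUB code: the function name, its parameters and the pointer-size rounding are assumptions made for illustration, and it ignores the other reasons an allocator may keep the free pointer outside the object (debugging, poisoning, RCU-freed caches).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define SKETCH_ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/*
 * Hypothetical helper: bytes needed per slot.
 * With a constructor and no in-object offset, the free pointer is
 * appended after the object; with an offset, it reuses object bytes.
 */
static size_t sketch_slot_size(size_t object_size, bool has_ctor,
			       bool use_freeptr_offset)
{
	size_t size;

	assert(object_size > 0);
	size = SKETCH_ALIGN_UP(object_size, sizeof(void *));
	if (has_ctor && !use_freeptr_offset)
		size += sizeof(void *);	/* free pointer lives after the object */
	return size;
}
```

For a 24-byte object with a constructor, the sketch returns 24 plus one pointer without an offset, and 24 when the free pointer may overlay a field the constructor does not initialize.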
2026-02-04mshv: Update hv_stats_page definitionsNuno Das Neves-0/+7
hv_stats_page belongs in hvhdk.h, move it there. It does not require a union to access the data for different counters, just use a single u64 array for simplicity and to match the Windows definitions. While at it, correct the ARM64 value for VpRootDispatchThreadBlocked. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-03dt-bindings: clock: aspeed: Add VIDEO reset definitionJammy Huang-0/+1
ASPEED clock controller provides a couple of resets. Add the define of video to allow referring to it. Signed-off-by: Jammy Huang <jammy_huang@aspeedtech.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2026-02-04platform/chrome: lightbar: Use flexible array memberGwendal Grignou-1/+1
Variable arrays should be defined as [], not [0], otherwise the kernel complains: memcpy : detected field-spanning write (size 9) of single field "param->set_program_ex.data" at drivers/platform/chrome/cros_ec_lightbar.c:603 (size 0) Fixes: 9600b8bdbfe4 ("platform/chrome: lightbar: Add support for large sequence") Signed-off-by: Gwendal Grignou <gwendal@google.com> Link: https://lore.kernel.org/r/20260204034848.697033-1-gwendal@google.com Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
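As a minimal user-space sketch of the pattern this fix relies on, the structure below ends in a C99 flexible array member (`data[]`) whose real length comes from the allocation. The structure and function names are hypothetical and do not reproduce the EC lightbar definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message layout, not the real lightbar parameter struct. */
struct sketch_program_msg {
	uint8_t size;
	uint8_t data[];	/* flexible array member, declared with [] and not [0] */
};

/* Allocate a message with room for len payload bytes and copy them in. */
static struct sketch_program_msg *sketch_program_msg_new(const uint8_t *src,
							  uint8_t len)
{
	struct sketch_program_msg *msg;

	assert(src != NULL || len == 0);
	msg = malloc(sizeof(*msg) + len);
	if (!msg)
		return NULL;
	msg->size = len;
	if (len)
		memcpy(msg->data, src, len);	/* write targets a variable-length member */
	return msg;
}
```

Declaring the trailing member as `[]` tells the compiler and the bounds-checking helpers that its length is not fixed at zero, which is what the field-spanning write warning quoted above was complaining about.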
2026-02-03net: usb: introduce usbnet_mii_ioctl helper functionEthan Nelson-Moore-0/+1
Many USB network drivers use identical code to pass ioctl requests on to the MII layer. Reduce code duplication by refactoring this code into a helper function. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> (v1) Reviewed-by: Andrew Lunn <andrew@lunn.ch> (v3) Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20260203013517.26170-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-03lib/crypto: mldsa: Clarify the documentation for mldsa_verify() slightlyEric Biggers-1/+3
mldsa_verify() implements ML-DSA.Verify with ctx='', so document this more explicitly. Remove the one-liner comment above mldsa_verify() which was somewhat misleading. Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20260202221552.174341-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-03scsi: ufs: host: mediatek: Require CONFIG_PMArnd Bergmann-4/+0
The added print statement from a recent fix causes the driver to fail building when CONFIG_PM is disabled: drivers/ufs/host/ufs-mediatek.c: In function 'ufs_mtk_resume': drivers/ufs/host/ufs-mediatek.c:1890:40: error: 'struct dev_pm_info' has no member named 'request' 1890 | hba->dev->power.request, It seems unlikely that the driver can work at all without CONFIG_PM, so just add a dependency and remove the existing ifdef checks, rather than adding another ifdef. Fixes: 15ef3f5aa822 ("scsi: ufs: host: mediatek: Enhance recovery on resume failure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/20260202095052.1232703-1-arnd@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-02-03of: Add for_each_compatible_node_scoped() helperKrzysztof Kozlowski-0/+7
Just like looping through children and available children, add a scoped helper for for_each_compatible_node() so error paths can drop of_node_put() leading to simpler code. Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20260109-of-for-each-compatible-scoped-v3-1-c22fa2c0749a@oss.qualcomm.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
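The idea behind a scoped iterator can be sketched in user space with the GCC/Clang `cleanup` variable attribute. Everything here is hypothetical (the node type, the helpers and the macro name); it only shows why an early return no longer needs an explicit put, and it is not the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted node standing in for a device tree node. */
struct sketch_node {
	int refcount;
	int compatible;	/* non-zero if the node matches the search */
	int id;
};

static void sketch_node_get(struct sketch_node *n) { n->refcount++; }
static void sketch_node_put(struct sketch_node *n) { if (n) n->refcount--; }

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void sketch_node_put_cleanup(struct sketch_node **np)
{
	sketch_node_put(*np);
}

/* Drop the reference on prev and return the next match with a reference held. */
static struct sketch_node *sketch_next_compatible(struct sketch_node *nodes,
						   size_t n,
						   struct sketch_node *prev)
{
	size_t i = prev ? (size_t)(prev - nodes) + 1 : 0;

	sketch_node_put(prev);
	for (; i < n; i++) {
		if (nodes[i].compatible) {
			sketch_node_get(&nodes[i]);
			return &nodes[i];
		}
	}
	return NULL;
}

/* The iterator variable is released automatically when it leaves scope. */
#define sketch_for_each_compatible_scoped(nodes, n, it)			\
	for (struct sketch_node *it					\
		     __attribute__((cleanup(sketch_node_put_cleanup))) =	\
		     sketch_next_compatible(nodes, n, NULL);			\
	     it; it = sketch_next_compatible(nodes, n, it))

/* Early return inside the loop without any explicit put. */
static int sketch_find_by_id(struct sketch_node *nodes, size_t n, int id)
{
	assert(nodes != NULL);
	sketch_for_each_compatible_scoped(nodes, n, np) {
		if (np->id == id)
			return (int)(np - nodes);
	}
	return -1;
}
```

Whether the loop finishes normally or returns early, every reference taken by the iterator is dropped, which is the property that lets error paths skip of_node_put().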
2026-02-03dt-bindings: Remove unused includesRob Herring (Arm)-1462/+0
Remove includes which are not referenced by either DTS files or drivers. There's a few more which are new, so they are excluded for now. Reviewed-by: Linus Walleij <linusw@kernel.org> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20251212231203.727227-1-robh@kernel.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2026-02-03Merge tag 'i2c-host-6.20' of ↵Wolfram Sang-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow i2c-host for v6.20 - amd-mp2, designware, mlxbf, rtl9300, spacemit, tegra: cleanups - designware: use a dedicated algorithm for AMD Navi - designware: replace magic numbers with named constants - designware: replace min_t() with min() to avoid u8 truncation - designware: refactor core to enable mode switching - imx-lpi2c: add runtime PM support for IRQ and clock handling - lan9691-i2c: add new driver - rtl9300: use OF helpers directly and avoid fwnode handling - spacemit: add bus reset support - units: add HZ_PER_GHZ and use it in several i2c drivers
2026-02-03bpf: Clear singular ids for scalars in is_state_visited()Puranjay Mohan-2/+5
The verifier assigns ids to scalar registers/stack slots when they are linked through a mov or stack spill/fill instruction. These ids are later used to propagate newly found bounds from one register to all registers that share the same id. The verifier also compares the ids of these registers in current state and cached state when making pruning decisions. When an ID becomes singular (i.e., only a single register or stack slot has that ID), it can no longer participate in bounds propagation. During comparisons between current and cached states for pruning decisions, however, such stale IDs can prevent pruning of otherwise equivalent states. Find and clear all singular ids before caching a state in is_state_visited(). struct bpf_idset which is currently unused has been repurposed for this use case. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20260203165102.2302462-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
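A minimal sketch of the "clear singular ids" step, assuming the ids of all registers and stack slots have been gathered into a flat array and that 0 means "no id". The function name is hypothetical and the quadratic scan is chosen for clarity; the verifier itself uses struct bpf_idset rather than this loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Zero every non-zero id that occurs exactly once in ids[].
 * An id held by a single slot cannot propagate bounds to anything,
 * so dropping it makes otherwise equal states compare equal.
 */
static void sketch_clear_singular_ids(uint32_t *ids, size_t n)
{
	assert(ids != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		size_t count = 0;

		if (ids[i] == 0)
			continue;
		for (size_t j = 0; j < n; j++)
			if (ids[j] == ids[i])
				count++;
		if (count == 1)
			ids[i] = 0;	/* singular: no partner to share bounds with */
	}
}
```

Clearing a singular id never changes the count of any other id, so a single pass is enough.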
2026-02-03panic: add panic_force_cpu= parameter to redirect panic to a specific CPUPnina Feder-0/+9
Some platforms require panic handling to execute on a specific CPU for crash dump to work reliably. This can be due to firmware limitations, interrupt routing constraints, or platform-specific requirements where only a single CPU is able to safely enter the crash kernel. Add the panic_force_cpu= kernel command-line parameter to redirect panic execution to a designated CPU. When the parameter is provided, the CPU that initially triggers panic forwards the panic context to the target CPU via IPI, which then proceeds with the normal panic and kexec flow. The IPI delivery is implemented as a weak function (panic_smp_redirect_cpu) so architectures with NMI support can override it for more reliable delivery. If the specified CPU is invalid, offline, or a panic is already in progress on another CPU, the redirection is skipped and panic continues on the current CPU. [pnina.feder@mobileye.com: fix unused variable warning] Link: https://lkml.kernel.org/r/20260126122618.2967950-1-pnina.feder@mobileye.com Link: https://lkml.kernel.org/r/20260122102457.1154599-1-pnina.feder@mobileye.com Signed-off-by: Pnina Feder <pnina.feder@mobileye.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Baoquan He <bhe@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-03blk-mq: add a new queue sysfs attribute async_depthYu Kuai-0/+1
Add a new field async_depth to request_queue and related APIs, this is currently not used, following patches will convert elevators to use this instead of internal async_depth. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-03block: convert nr_requests to unsigned intYu Kuai-1/+1
This value represents the number of requests for elevator tags, or drivers tags if elevator is none. The max value for elevator tags is 2048, and in drivers at most 16 bits is used for tag. Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-03kthread: Honour kthreads preferred affinity after cpuset changesFrederic Weisbecker-0/+1
When cpuset isolated partitions get updated, unbound kthreads get indifferently affine to all non isolated CPUs, regardless of their individual affinity preferences. For example kswapd is a per-node kthread that prefers to be affine to the node it refers to. Whenever an isolated partition is created, updated or deleted, kswapd's node affinity is going to be broken if any CPU in the related node is not isolated because kswapd will be affine globally. Fix this with letting the consolidated kthread managed affinity code do the affinity update on behalf of cpuset. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org
2026-02-03sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAINFrederic Weisbecker-1/+1
Tasks that have all their allowed CPUs offline don't want their affinity to fallback on either nohz_full CPUs or on domain isolated CPUs. And since nohz_full implies domain isolation, checking the latter is enough to verify both. Therefore exclude domain isolation from fallback task affinity. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org
2026-02-03sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated()Frederic Weisbecker-2/+1
It doesn't make sense to use nohz_full without also isolating the related CPUs from the domain topology, either through the use of isolcpus= or cpuset isolated partitions. And now HK_TYPE_DOMAIN includes all kinds of domain isolated CPUs. This means that HK_TYPE_DOMAIN should always be a subset of HK_TYPE_KERNEL_NOISE (of which HK_TYPE_TICK is only an alias). Therefore if a CPU is not HK_TYPE_DOMAIN, it shouldn't be HK_TYPE_KERNEL_NOISE either. Testing the former is then enough. Simplify cpu_is_isolated() accordingly. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03cpuset: Remove cpuset_cpu_is_isolated()Frederic Weisbecker-9/+1
The set of cpuset isolated CPUs is now included in HK_TYPE_DOMAIN housekeeping cpumask. There is no usecase left interested in just checking what is isolated by cpuset and not by the isolcpus= kernel boot parameter. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org
2026-02-03cpuset: Propagate cpuset isolation update to workqueue through housekeepingFrederic Weisbecker-1/+1
Until now, cpuset would propagate isolated partition changes to workqueues so that unbound workers get properly reaffined. Since housekeeping now centralizes, synchronizes and propagates isolation cpumask changes, perform the work from that subsystem for consolidation and consistency purposes. For simplicity, the target function is adapted to take the new housekeeping mask instead of the isolated mask. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org
2026-02-03PCI: Flush PCI probe workqueue on cpuset isolated partition changeFrederic Weisbecker-0/+3
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In order to synchronize against PCI probe works and make sure that no asynchronous probing is still pending or executing on a newly isolated CPU, the housekeeping subsystem must flush the PCI probe works. However the PCI probe works can't be flushed easily since they are queued to the main per-CPU workqueue pool. Solve this with creating a PCI probe-specific pool and provide and use the appropriate flushing API. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: linux-pci@vger.kernel.org
2026-02-03sched/isolation: Flush vmstat workqueues on cpuset isolated partition changeFrederic Weisbecker-0/+2
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In order to synchronize against vmstat workqueue to make sure that no asynchronous vmstat work is still pending or executing on a newly made isolated CPU, the housekeeping subsystem must flush the vmstat workqueues. This involves flushing the whole mm_percpu_wq workqueue, shared with LRU drain, introducing here a welcome side effect. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: linux-mm@kvack.org
2026-02-03sched/isolation: Flush memcg workqueues on cpuset isolated partition changeFrederic Weisbecker-0/+4
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In order to synchronize against memcg workqueue to make sure that no asynchronous draining is still pending or executing on a newly made isolated CPU, the housekeeping subsystem must flush the memcg workqueues. However the memcg workqueues can't be flushed easily since they are queued to the main per-CPU workqueue pool. Solve this with creating a memcg specific pool and provide and use the appropriate flushing API. Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org
2026-02-03cpuset: Update HK_TYPE_DOMAIN cpumask from cpusetFrederic Weisbecker-0/+7
Until now, HK_TYPE_DOMAIN used to only include boot defined isolated CPUs passed through isolcpus= boot option. Users interested in also knowing the runtime defined isolated CPUs through cpuset must use different APIs: cpuset_cpu_is_isolated(), cpu_is_isolated(), etc... There are many drawbacks to that approach: 1) Most interested subsystems want to know about all isolated CPUs, not just those defined on boot time. 2) cpuset_cpu_is_isolated() / cpu_is_isolated() are not synchronized with concurrent cpuset changes. 3) Further cpuset modifications are not propagated to subsystems Solve 1) and 2) and centralize all isolated CPUs within the HK_TYPE_DOMAIN housekeeping cpumask. Subsystems can rely on RCU to synchronize against concurrent changes. The propagation mentioned in 3) will be handled in further patches. [Chen Ridong: Fix cpu_hotplug_lock deadlock and use correct static branch API] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org
2026-02-03cpuset: Provide lockdep check for cpuset lock heldFrederic Weisbecker-0/+2
cpuset modifies partitions, including isolated, while holding the cpuset mutex. This means that holding the cpuset mutex is safe to synchronize against housekeeping cpumask changes. Provide a lockdep check to validate that. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org Cc: linux-kernel@vger.kernel.org
2026-02-03cpu: Provide lockdep check for CPU hotplug lock write-heldFrederic Weisbecker-0/+2
cpuset modifies partitions, including isolated, while holding the cpu hotplug lock read-held. This means that write-holding the CPU hotplug lock is safe to synchronize against housekeeping cpumask changes. Provide a lockdep check to validate that. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: linux-kernel@vger.kernel.org
2026-02-03sched/isolation: Save boot defined domain flagsFrederic Weisbecker-0/+4
HK_TYPE_DOMAIN will soon integrate not only boot defined isolcpus= CPUs but also cpuset isolated partitions. Housekeeping still needs a way to record what was initially passed to isolcpus= in order to keep these CPUs isolated after a cpuset isolated partition is modified or destroyed while containing some of them. Create a new HK_TYPE_DOMAIN_BOOT to keep track of those. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_infoChia-Yu Chang-14/+23
Add 2-bit tcpi_ecn_mode field within tcp_info to indicate which ECN mode is negotiated: ECN_MODE_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN, or ECN_MODE_PENDING. This is done by utilizing available bits from tcpi_accecn_opt_seen (reduced from 16 bits to 2 bits) and tcpi_accecn_fail_mode (reduced from 16 bits to 4 bits). Also, an extra 24-bit tcpi_options2 field is identified to represent newer options and connection features, as all 8 bits of tcpi_options field have been used. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-14-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSISTChia-Yu Chang-1/+4
Detect spurious retransmission of a previously sent ACK carrying the AccECN option after the second retransmission. Since this might be caused by the middlebox dropping ACK with options it does not recognize, disable the sending of the AccECN option in all subsequent ACKs. This patch follows Section 3.2.3.2.2 of AccECN spec (RFC9768), and a new field (accecn_opt_sent_w_dsack) is added to indicate that an AccECN option was sent with duplicate SACK info. Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl: (TCP_ECN_OPTION_PERSIST), which ignores the AccECN fallback policy and persistently sends AccECN option once it fits into TCP option space. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-13-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: fallback outgoing half link to non-AccECNChia-Yu Chang-1/+3
According to Section 3.2.2.1 of AccECN spec (RFC9768), if the Server is in AccECN mode and in SYN-RCVD state, and if it receives a value of zero on a pure ACK with SYN=0 and no SACK blocks, for the rest of the connection the Server MUST NOT set ECT on outgoing packets and MUST NOT respond to AccECN feedback. Nonetheless, as a Data Receiver it MUST NOT disable AccECN feedback. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-12-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACKChia-Yu Chang-5/+15
For Accurate ECN, the first SYN/ACK sent by the TCP server shall set the ACE flag (Table 1 of RFC9768) and the AccECN option to complete the capability negotiation. However, if the TCP server needs to retransmit such a SYN/ACK (for example, because it did not receive an ACK acknowledging its SYN/ACK, or received a second SYN requesting AccECN support), the TCP server retransmits the SYN/ACK without the AccECN option. This is because the SYN/ACK may be lost due to congestion, or a middlebox may block the AccECN option. Furthermore, if this retransmission also times out, to expedite connection establishment, the TCP server should retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and without the AccECN option, while maintaining AccECN feedback mode. This complies with Section 3.2.3.2.2 of the AccECN spec RFC9768. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-10-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
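The retransmission ladder described above can be written as a small decision function. This is a sketch under the assumption that the attempt counter is 0 for the first SYN/ACK, 1 for the first retransmission and 2 or more afterwards; the type and function names are hypothetical, and the exact ACE flag values of the first SYN/ACK (which depend on the received SYN) are reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* What a given SYN/ACK transmission should carry. */
struct sketch_synack_plan {
	bool ace_flags;		/* false means (AE,CWR,ECE) = (0,0,0) */
	bool accecn_option;	/* include the AccECN TCP option */
};

static struct sketch_synack_plan sketch_plan_synack(unsigned int attempt)
{
	struct sketch_synack_plan plan = {
		/* Flags survive the first retransmission only. */
		.ace_flags = attempt < 2,
		/* The option is sent on the very first SYN/ACK only. */
		.accecn_option = attempt == 0,
	};

	return plan;
}
```

Dropping the option first and the flags second lets the server separate a middlebox that blocks the option from a path that rejects the ACE flags, while the connection itself stays in AccECN feedback mode as the description notes.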
2026-02-03tcp: add TCP_SYNACK_RETRANS synack_typeChia-Yu Chang-0/+1
Before this patch, retransmitted SYN/ACK did not have a specific synack_type; however, the upcoming patch needs to distinguish between retransmitted and non-retransmitted SYN/ACK for AccECN negotiation to transmit the fallback SYN/ACK during AccECN negotiation. Therefore, this patch introduces a new synack_type (TCP_SYNACK_RETRANS). Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-9-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: handle unexpected AccECN negotiation feedbackChia-Yu Chang-13/+31
This patch follows Sections 3.1.2 and 3.1.3 of the AccECN spec (RFC9768). Section 3.1.2 says that an AccECN implementation has no need to recognize or support the Server response labelled 'Nonce' or ECN-nonce feedback more generally, as RFC 3540 has been reclassified as Historic. AccECN is compatible with alternative ECN feedback integrity approaches to the nonce. The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is reserved for future use. A TCP Client (A) that receives such a SYN/ACK follows the procedure for forward compatibility given in Section 3.1.3. Section 3.1.3 then says that if a TCP Client has sent a SYN requesting AccECN feedback with (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently reserved combination (AE,CWR,ECE) = (1,0,1) but does not have logic specific to such a combination, the Client MUST enable AccECN mode as if the SYN/ACK confirmed that the Server supported AccECN and as if it fed back that the IP-ECN field on the SYN had arrived unchanged. Fixes: 3cae34274c79 ("tcp: accecn: AccECN negotiation"). Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-7-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: disable RFC3168 fallback identifier for CC modulesChia-Yu Chang-4/+19
When AccECN is not successfully negotiated for a TCP flow, it falls back to classic ECN (RFC3168) by default. However, L4S service will fall back to non-ECN. This patch enables congestion control module to control whether it should not fall back to classic ECN after unsuccessful AccECN negotiation. A new CA module flag (TCP_CONG_NO_FALLBACK_RFC3168) identifies this behavior expected by the CA. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-6-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiersChia-Yu Chang-7/+47
Two flags for congestion control (CC) module are added in this patch related to AccECN negotiation. First, a new flag (TCP_CONG_NEEDS_ACCECN) defines that the CC expects to negotiate AccECN functionality using the ECE, CWR and AE flags in the TCP header. Second, during ECN negotiation, ECT(0) in the IP header is used. This patch enables CC to control whether ECT(0) or ECT(1) should be used on a per-segment basis. A new flag (TCP_CONG_ECT_1_NEGOTIATION) defines the expected ECT value in the IP header by the CA when not-yet initialized for the connection. The detailed AccECN negotiation can be found in IETF RFC9768. Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com> Signed-off-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-5-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: try to avoid safer when ACKs are thinnedIlpo Järvinen-0/+1
Add newly acked pkts EWMA. When ACK thinning occurs, select between safer and unsafe cep delta in AccECN processing based on it. If the packets ACKed per ACK tends to be large, don't conservatively assume ACE field overflow. This patch uses the existing 2-byte holes in the rx group for new u16 variables without creating more holes. Below are the pahole outcomes before and after this patch: [BEFORE THIS PATCH] struct tcp_sock { [...] u32 delivered_ecn_bytes[3]; /* 2744 12 */ /* XXX 4 bytes hole, try to pack */ [...] __cacheline_group_end__tcp_sock_write_rx[0]; /* 2816 0 */ [...] /* size: 3264, cachelines: 51, members: 177 */ } [AFTER THIS PATCH] struct tcp_sock { [...] u32 delivered_ecn_bytes[3]; /* 2744 12 */ u16 pkts_acked_ewma; /* 2756 2 */ /* XXX 2 bytes hole, try to pack */ [...] __cacheline_group_end__tcp_sock_write_rx[0]; /* 2816 0 */ [...] /* size: 3264, cachelines: 51, members: 178 */ } Signed-off-by: Ilpo Järvinen <ij@kernel.org> Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-2-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
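A fixed-point moving average that fits in a u16 can be sketched as follows. The shift constants, the clamp and the comparison helper are assumptions chosen for illustration; the patch's actual weights and threshold are not stated in the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_EWMA_SHIFT	6	/* fractional bits (assumed) */
#define SKETCH_EWMA_WEIGHT	2	/* new sample weighs 1/4 (assumed) */
#define SKETCH_EWMA_MAX_PKTS	1023u	/* keeps the scaled sample within 16 bits */

/* Fold the number of packets newly ACKed by one ACK into the average. */
static uint16_t sketch_ewma_update(uint16_t ewma, uint32_t pkts_acked)
{
	uint32_t sample, next;

	if (pkts_acked > SKETCH_EWMA_MAX_PKTS)
		pkts_acked = SKETCH_EWMA_MAX_PKTS;
	sample = pkts_acked << SKETCH_EWMA_SHIFT;
	/* next = (3 * ewma + sample) / 4 in fixed point */
	next = (((uint32_t)ewma << SKETCH_EWMA_WEIGHT) - ewma + sample)
		>> SKETCH_EWMA_WEIGHT;
	assert(next <= UINT16_MAX);
	return (uint16_t)next;
}

/* True when the average number of packets per ACK exceeds pkts. */
static bool sketch_ewma_exceeds(uint16_t ewma, uint32_t pkts)
{
	return ewma > (pkts << SKETCH_EWMA_SHIFT);
}
```

A caller would compare the average against a threshold of its choosing and, when ACKs habitually cover many packets, pick the non-conservative delta instead of assuming the ACE counter wrapped.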
2026-02-03net: phy: remove modalias-based mdio bus matchingHeiner Kallweit-2/+0
Last user dsa_loop has been migrated away from modalias-based matching, so we can remove this feature now. It was the only user of MDIO_NAME_SIZE, so remove also this constant. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/ce1c6df0-4785-4b28-8322-32dc6bceea18@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03revocable: fix SRCU index corruption by requiring caller-provided storageTzung-Bi Shih-16/+38
The struct revocable handle stores the SRCU read-side index (idx) for the duration of a resource access. If multiple threads share the same struct revocable instance, they race on writing to the idx field, corrupting the SRCU state and potentially causing unsafe unlocks. Refactor the API to replace revocable_alloc()/revocable_free() with revocable_init()/revocable_deinit(). This change requires the caller to provide the storage for struct revocable. By moving storage ownership to the caller, the API ensures that concurrent users maintain their own private idx storage, eliminating the race condition. Reported-by: Johan Hovold <johan@kernel.org> Closes: https://lore.kernel.org/all/20260124170535.11756-4-johan@kernel.org/ Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Link: https://patch.msgid.link/20260129143733.45618-4-tzungbi@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
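The ownership change described above can be pictured with a small single-threaded model: each user owns a handle, so each has a private slot for the read-side index. All `sketch_*` names are hypothetical, and the alternating index is only a stand-in for what an SRCU read lock would return; this is not the revocable API itself.

```c
#include <assert.h>
#include <stddef.h>

/* Shared side: one per resource. */
struct sketch_provider {
	void *res;
	int next_idx;    /* stand-in for the index an SRCU read lock returns */
	int readers[2];  /* per-index reader counts */
};

/* Caller-provided storage: one per user, never shared between threads. */
struct sketch_handle {
	struct sketch_provider *rp;
	int idx;         /* private read-side index, -1 when not in use */
};

static void sketch_handle_init(struct sketch_handle *h, struct sketch_provider *rp)
{
	h->rp = rp;
	h->idx = -1;
}

/* Begin an access: record the read-side index in the caller's own handle. */
static void *sketch_try_access(struct sketch_handle *h)
{
	h->idx = h->rp->next_idx;
	h->rp->next_idx ^= 1;
	h->rp->readers[h->idx]++;
	return h->rp->res;
}

/* End an access: release exactly the index this handle acquired. */
static void sketch_withdraw_access(struct sketch_handle *h)
{
	assert(h->idx >= 0);
	h->rp->readers[h->idx]--;
	h->idx = -1;
}
```

Because `idx` lives in storage owned by the caller, two users of the same provider cannot overwrite each other's index, which is the race the commit message describes for a shared instance.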
2026-02-03revocable: Fix races in revocable_alloc() using RCUTzung-Bi Shih-5/+3
There are two race conditions when allocating a revocable instance: 1. After a struct revocable_provider is revoked, the caller might still hold a dangling pointer to it. A subsequent call to revocable_alloc() can trigger a use-after-free. 2. If revocable_provider_release() runs concurrently with revocable_alloc(), the memory of struct revocable_provider can be accessed during or after kfree(). To fix these: - Manage the lifetime of struct revocable_provider using RCU. Annotate pointers to it with __rcu and use kfree_rcu() for deallocation. - Update revocable_alloc() to safely acquire a reference using RCU primitives. - Update revocable_provider_revoke() to take a double pointer (`**rp`). It atomically NULLs out the caller's pointer before starting revocation. This prevents the caller from holding a dangling pointer. - Drop devm_revocable_provider_alloc(). The devm-managed model cannot support the required double-pointer semantic for safe pointer nulling. Reported-by: Johan Hovold <johan@kernel.org> Closes: https://lore.kernel.org/all/aXdy-b3GOJkzGqYo@hovoldconsulting.com/ Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Link: https://patch.msgid.link/20260129143733.45618-2-tzungbi@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
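The double-pointer idea in the list above can be sketched in user space with C11 atomics: the revoke path swaps the caller's pointer for NULL in one atomic step, so no dangling copy is left in the caller's slot. The names are hypothetical, and the sketch deliberately leaves out the RCU-deferred free (`kfree_rcu()`) that the commit relies on for lifetime management.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_provider {
	int revoked;
};

/*
 * Atomically take the provider out of the caller's slot and mark it revoked.
 * Returns the provider that was revoked, or NULL if the slot was already
 * empty (for example because a concurrent revoke won the race).
 */
static struct sketch_provider *
sketch_provider_revoke(_Atomic(struct sketch_provider *) *slot)
{
	struct sketch_provider *rp = atomic_exchange(slot, NULL);

	if (rp)
		rp->revoked = 1;
	return rp;
}
```

A second call on the same slot sees NULL and does nothing, which is the property a plain single-pointer interface cannot offer, since the callee would have no way to clear the caller's copy.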
2026-02-03Merge branch 'v6.19-rc8'Peter Zijlstra-483/+807
Update to avoid conflicts with /urgent patches. Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2026-02-03btrfs: allow mounting filesystems with remap-tree incompat flagMark Harmstone-1/+4
If we encounter a filesystem with the remap-tree incompat flag set, validate its compatibility with the other flags, and load the remap tree using the values that have been added to the superblock. The remap-tree feature depends on the free-space-tree, but no-holes and block-group-tree have been made dependencies to reduce the testing matrix. Similarly I'm not aware of any reason why mixed-bg and zoned would be incompatible with remap-tree, but this is blocked for the time being until it can be fully tested. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: add extended version of struct block_group_itemMark Harmstone-0/+8
Add a struct btrfs_block_group_item_v2, which is used in the block group tree if the remap-tree incompat flag is set. This adds two new fields to the block group item: `remap_bytes` and `identity_remap_count`. `remap_bytes` records the amount of data that's physically within this block group, but nominally in another, remapped block group. This is necessary because this data will need to be moved first if this block group is itself relocated. If `remap_bytes` > 0, this is an indicator to the relocation thread that it will need to search the remap-tree for backrefs. A block group must also have `remap_bytes` == 0 before it can be dropped. `identity_remap_count` records how many identity remap items are located in the remap tree for this block group. When relocation is begun for this block group, this is set to the number of holes in the free-space tree for this range. As identity remaps are converted into actual remaps by the relocation process, this number is decreased. Once it reaches 0, either because of relocation or because extents have been deleted, the block group has been fully remapped and its chunk's device extents are removed. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com>
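The rules stated above for the two new fields can be written as small predicates. The struct below is a host-endian stand-in, not the on-disk definition: the first three fields mirror the existing block group item, while the widths and order of `remap_bytes` and `identity_remap_count` are assumptions for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: simplified, host-endian stand-in for the on-disk item. */
struct sketch_block_group_item_v2 {
	uint64_t used;
	uint64_t chunk_objectid;
	uint64_t flags;
	uint64_t remap_bytes;          /* data physically here, nominally elsewhere */
	uint32_t identity_remap_count; /* identity remap items left in the remap tree */
};

/* Relocation must search the remap tree for backrefs when this is true. */
static int sketch_needs_backref_search(const struct sketch_block_group_item_v2 *bg)
{
	return bg->remap_bytes > 0;
}

/* Necessary (not sufficient) condition for dropping the block group. */
static int sketch_remap_bytes_allow_drop(const struct sketch_block_group_item_v2 *bg)
{
	return bg->remap_bytes == 0;
}

/* Only meaningful once relocation of this block group has begun. */
static int sketch_fully_remapped(const struct sketch_block_group_item_v2 *bg)
{
	return bg->identity_remap_count == 0;
}
```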
2026-02-03btrfs: add METADATA_REMAP chunk typeMark Harmstone-1/+3
Add a new METADATA_REMAP chunk type, which is a metadata chunk that holds the remap tree. This is needed for bootstrapping purposes: the remap tree can't itself be remapped, and must be relocated the existing way, by COWing every leaf. The remap tree can't go in the SYSTEM chunk as space there is limited, because a copy of the chunk item gets placed in the superblock. The changes in fs/btrfs/volumes.h are because we're adding a new block group type bit after the profile bits, and so can no longer rely on the const_ilog2 trick. The sizing to 32MB per chunk, matching the SYSTEM chunk, is an estimate; we can adjust it later if it proves to be too big or too small. This works out to be ~500,000 remap items, which for a 4KB block size covers ~2GB of remapped data in the worst case and ~500TB in the best case. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
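The sizing figures quoted above can be checked with a few lines of integer arithmetic. The helper names are made up for this sketch. The per-item size and the roughly 1 GB per item behind the best case are inferences from the quoted numbers, not values stated in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Average metadata bytes available per remap item in one chunk. */
static uint64_t sketch_bytes_per_item(uint64_t chunk_bytes, uint64_t items)
{
	assert(items > 0);
	return chunk_bytes / items;
}

/* Data covered when every remap item maps bytes_per_remap bytes. */
static uint64_t sketch_covered_bytes(uint64_t items, uint64_t bytes_per_remap)
{
	return items * bytes_per_remap;
}
```

With a 32 MiB chunk and 500,000 items, this gives about 67 bytes per item. In the worst case each item covers a single 4 KB block: 500,000 x 4,096 = 2,048,000,000 bytes, roughly 2 GB. The 500 TB best case corresponds to each item covering about 1 GB.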
2026-02-03btrfs: add definitions and constants for remap-treeMark Harmstone-0/+18
Add an incompat flag for the new remap-tree feature, and the constants and definitions needed to support it. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03platform/chrome: lightbar: Fix lightbar_program_ex alignmentGwendal Grignou-2/+2
Make sure the sub-command of the lightbar command starts with an 8-bit parameter to ensure alignment. Fixes: 9600b8bdbfe4 ("platform/chrome: lightbar: Add support for large sequence") Signed-off-by: Gwendal Grignou <gwendal@google.com> Link: https://lore.kernel.org/r/20260202100621.3608437-1-gwendal@google.com Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2026-02-02tcp: export tcp_splice_stateGeliang Tang-0/+11
Export struct tcp_splice_state and tcp_splice_data_recv() in net/tcp.h so that they can be used by MPTCP in the next patch. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-3-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ptp: vmclock: support device notificationsBabis Chalios-0/+5
Add optional support for device notifications in VMClock. When supported, the hypervisor will send a device notification every time it updates the seq_count to a new even value. Moreover, add support for poll() in VMClock as a means to propagate this notification to user space. poll() will return a POLLIN event to listeners every time seq_count changes to a value different than the one last seen (since open() or last read()/pread()). This means that when poll() returns a POLLIN event, listeners need to use read() to observe what has changed and update the reader's view of seq_count. In other words, after poll() has returned, all subsequent calls to poll() will immediately return with a POLLIN event until the listener calls read(). The device advertises support for the notification mechanism by setting flag VMCLOCK_FLAG_NOTIFICATION_PRESENT in vmclock_abi flags field. If the flag is not present the driver won't set up the ACPI notification handler and poll() will always immediately return POLLHUP. Signed-off-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Takahiro Itazuri <itazur@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-3-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
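From user space, the poll()/read() contract described above might be consumed with a helper along these lines. This is a sketch using only standard POSIX calls; the helper name and return convention are assumptions for this example, and the caller is expected to open the VMClock device node itself.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd reports new data.
 * Returns 1 on POLLIN, 0 on timeout, -1 on error or when the descriptor
 * reports POLLHUP/POLLERR without POLLIN (e.g. notifications unsupported).
 */
static int sketch_wait_for_update(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	if (pfd.revents & POLLIN)
		return 1; /* caller must now read() to consume the event */
	return -1;
}
```

After a return value of 1 the caller should read() from the descriptor before polling again; otherwise, as the commit message notes, every later poll() returns immediately with POLLIN. A return value of -1 on a descriptor that otherwise works is how this helper would surface the POLLHUP case for devices without notification support.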
2026-02-02ptp: vmclock: add vm generation counterBabis Chalios-0/+15
Similar to live migration, loading a VM from some saved state (aka snapshot) is also an event that calls for clock adjustments in the guest. However, guests might want to take more actions as a response to such events, such as discarding UUIDs, resetting network connections, reseeding entropy pools, etc. These are actions that guests don't typically take during live migration, so add a new field in the vmclock_abi called vm_generation_counter which informs the guest about such events. The hypervisor advertises support for vm_generation_counter through the VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT flag. Users need to check the presence of this bit in the vmclock_abi flags field before using this field. Signed-off-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Takahiro Itazur <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-2-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
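The "check the flag, then trust the counter" rule above can be sketched as follows. The struct is a reduced, host-endian stand-in for the shared page, and the flag's bit position is a placeholder, not the value defined by the ABI; the seq_count read protocol is also left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: reduced stand-in for the shared structure, host-endian. */
struct sketch_vmclock {
	uint64_t flags;
	uint64_t vm_generation_counter;
};

/* Assumption: placeholder bit, not the real flag value. */
#define SKETCH_FLAG_VM_GEN_COUNTER_PRESENT (1ull << 0)

/*
 * Returns -1 if the hypervisor does not advertise the counter,
 *          0 if the generation is unchanged since *last_seen,
 *          1 if it changed (and updates *last_seen).
 */
static int sketch_check_generation(const struct sketch_vmclock *clk,
				   uint64_t *last_seen)
{
	if (!(clk->flags & SKETCH_FLAG_VM_GEN_COUNTER_PRESENT))
		return -1; /* field contents are not meaningful */
	if (clk->vm_generation_counter == *last_seen)
		return 0;
	*last_seen = clk->vm_generation_counter;
	return 1;
}
```

A return value of 1 is where a guest would take the heavier actions listed in the commit message, such as reseeding entropy pools or resetting network connections.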
2026-02-02ipv6: colocate inet6_cork in inet_cork_fullEric Dumazet-12/+12
All inet6_cork users also use one inet_cork_full. Reduce number of parameters and increase data locality. This saves ~275 bytes of code on x86_64. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02inet: add dst4_mtu() and dst6_mtu() helpersEric Dumazet-0/+12
With CONFIG_MITIGATION_RETPOLINE=y dst_mtu() is a bit fat, because it is generic. Indeed, clang does not always inline it. Add dst4_mtu() and dst6_mtu() helpers for callers that expect either ipv4_mtu() or ip6_mtu() to be called. These helpers are always inlined. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>