summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorLines
2026-02-27timekeeping: Provide infrastructure for coupled clockeventsThomas Gleixner-0/+9
Some architectures have clockevent devices which are coupled to the system clocksource by implementing a less than or equal comparator which compares the programmed absolute expiry time against the underlying time counter. Well known examples are TSC/TSC deadline timer and the S390 TOD clocksource/comparator. While the concept is nice it has some downsides: 1) The clockevents core code is strictly based on relative expiry times as that's the most common case for clockevent device hardware. That requires to convert the absolute expiry time provided by the caller (hrtimers, NOHZ code) to a relative expiry time by reading and substracting the current time. The clockevent::set_next_event() callback must then read the counter again to convert the relative expiry back into a absolute one. 2) The conversion factors from nanoseconds to counter clock cycles are set up when the clockevent is registered. When NTP applies corrections then the clockevent conversion factors can deviate from the clocksource conversion substantially which either results in timers firing late or in the worst case early. The early expiry then needs to do a reprogam with a short delta. In most cases this is papered over by the fact that the read in the set_next_event() callback happens after the read which is used to calculate the delta. So the tendency is that timers expire mostly late. All of this can be avoided by providing support for these devices in the core code: 1) The timekeeping core keeps track of the last update to the clocksource by storing the base nanoseconds and the corresponding clocksource counter value. That's used to keep the conversion math for reading the time within 64-bit in the common case. This information can be used to avoid both reads of the underlying clocksource in the clockevents reprogramming path: delta = expiry - base_ns; cycles = base_cycles + ((delta * clockevent::mult) >> clockevent::shift); The resulting cycles value can be directly used to program the comparator. 2) As #1 does not longer provide the "compensation" through the second read the deviation of the clocksource and clockevent conversions caused by NTP become more prominent. This can be cured by letting the timekeeping core compute and store the reverse conversion factors when the clocksource cycles to nanoseconds factors are modified by NTP: CS::MULT (1 << NS_TO_CYC_SHIFT) --------------- = ---------------------- (1 << CS:SHIFT) NS_TO_CYC_MULT Ergo: NS_TO_CYC_MULT = (1 << (CS::SHIFT + NS_TO_CYC_SHIFT)) / CS::MULT The NS_TO_CYC_SHIFT value is calculated when the clocksource is installed so that it aims for a one hour maximum sleep time. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.944763521@kernel.org
2026-02-27timekeeping: Allow inlining clocksource::read()Thomas Gleixner-0/+2
On some architectures clocksource::read() boils down to a single instruction, so the indirect function call is just a massive overhead especially with speculative execution mitigations in effect. Allow architectures to enable conditional inlining of that read to avoid that by: - providing a static branch to switch to the inlined variant - disabling the branch before clocksource changes - enabling the branch after a clocksource change, when the clocksource indicates in a feature flag that it is the one which provides the inlined variant This is intentionally not a static call as that would only remove the indirect call, but not the rest of the overhead. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.675151545@kernel.org
2026-02-27clockevents: Remove redundant CLOCK_EVT_FEAT_KTIMEThomas Gleixner-1/+0
The only real usecase for this is the hrtimer based broadcast device. No point in using two different feature flags for this. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.609049777@kernel.org
2026-02-27hrtimer: Provide LAZY_REARM modePeter Zijlstra-0/+11
The hrtick timer is frequently rearmed before expiry and most of the time the new expiry is past the armed one. As this happens on every context switch it becomes expensive with scheduling heavy work loads especially in virtual machines as the "hardware" reprogamming implies a VM exit. Add a lazy rearm mode flag which skips the reprogamming if: 1) The timer was the first expiring timer before the rearm 2) The new expiry time is farther out than the armed time This avoids a massive amount of reprogramming operations of the hrtick timer for the price of eventually taking the alredy armed interrupt for nothing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.408524456@kernel.org
2026-02-27sched: Use hrtimer_highres_enabled()Thomas Gleixner-6/+0
Use the static branch based variant and thereby avoid following three pointers. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.203610956@kernel.org
2026-02-27hrtimer: Provide a static branch based hrtimer_hres_enabled()Thomas Gleixner-4/+9
The scheduler evaluates this via hrtimer_is_hres_active() every time it has to update HRTICK. This needs to follow three pointers, which is expensive. Provide a static branch based mechanism to avoid that. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.136503358@kernel.org
2026-02-27platform_data/mlxreg: mlxreg.h: fix all kernel-doc warningsRandy Dunlap-7/+7
Use the correct kernel-doc format & notation to eliminate kernel-doc warnings: Warning: include/linux/platform_data/mlxreg.h:24 Enum value 'MLX_WDT_TYPE1' not described in enum 'mlxreg_wdt_type' Warning: include/linux/platform_data/mlxreg.h:24 Enum value 'MLX_WDT_TYPE2' not described in enum 'mlxreg_wdt_type' Warning: include/linux/platform_data/mlxreg.h:24 Enum value 'MLX_WDT_TYPE3' not described in enum 'mlxreg_wdt_type' Warning: include/linux/platform_data/mlxreg.h:37 bad line: PHYs ready / unready state; Warning: include/linux/platform_data/mlxreg.h:153 struct member 'np' not described in 'mlxreg_core_data' Warning: include/linux/platform_data/mlxreg.h:153 struct member 'hpdev' not described in 'mlxreg_core_data' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260226051232.549537-1-rdunlap@infradead.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2026-02-27gpio: introduce a header for symbols shared by suppliers and consumersBartosz Golaszewski-3/+13
GPIO_LINE_DIRECTION_IN/OUT definitions are used both in supplier (GPIO controller drivers) as well as consumer code. In order to not force the consumers to include gpio/driver.h or - even worse - to redefine these values, create a new header file - gpio/defs.h - and move them over there. Include this header from both gpio/consumer.h and gpio/driver.h. Reviewed-by: Linus Walleij <linusw@kernel.org> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20260223172006.204268-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
2026-02-27gpio: generic: Don't use 'proxy' headersAndy Shevchenko-1/+7
Update header inclusions to follow IWYU (Include What You Use) principle. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260226092023.4096921-1-andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
2026-02-27drm/i915/overlay: Convert overlay to parent interfaceVille Syrjälä-0/+33
Convert the direct i915_overlay_*() calls from the display side to go over a new parent interface instead. v2: Correctly handle the ERR_PTR returned by i915_overlay_obj_lookup() (Jani) v3: Rebase due to the NULL check in intel_overlay_cleanup() Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20260226130150.16816-1-ville.syrjala@linux.intel.com
2026-02-26netmem: remove the pp fields from net_iovByungchul Park-37/+1
Now that the pp fields in net_iov have no users, remove them from net_iov and clean up. Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20260224061424.11219-1-byungchul@sk.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27dmaengine: fsl-edma: fix all kernel-doc warningsRandy Dunlap-2/+3
Use the correct kernel-doc format and struct member names to eliminate these kernel-doc warnings: Warning: include/linux/platform_data/dma-mcf-edma.h:35 struct member 'dma_channels' not described in 'mcf_edma_platform_data' Warning: include/linux/platform_data/dma-mcf-edma.h:35 struct member 'slave_map' not described in 'mcf_edma_platform_data' Warning: include/linux/platform_data/dma-mcf-edma.h:35 struct member 'slavecnt' not described in 'mcf_edma_platform_data' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260226051220.548566-1-rdunlap@infradead.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-02-27ASoC: remove snd_soc_pcm_subclassKuninori Morimoto-7/+1
enum snd_soc_pcm_subclass has added at v3.1 commit b8c0dab9bf337 ("ASoC: core - PCM mutex per rtd"), but has never been used during this 15 years. Let's remove it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/878qcfyogw.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2026-02-27SDCA ImprovementsMark Brown-1/+0
Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>: Another fairly mixed bag of small SDCA fixes/improvements. Fix one DisCo property that was treated as mandatory but is actually not present in the first version of the specification. Fix the counting of routes for SU/GE DAPM widgets, this currently makes assumptions that are not guaranteed to be true which can result in too many/few DAPM routes. Then finally a couple improvements to the volume controls, simplify the mapping between ALSA and SDCA volumes and pull the volume stuff back into the SDCA code. It just wasn't sitting right with me that it was being handled in the ASoC core given it is unlikely to ever see any reuse outside of SDCA.
2026-02-26Merge tag 'mm-hotfixes-stable-2026-02-26-14-14' of ↵Linus Torvalds-6/+18
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "12 hotfixes. 7 are cc:stable. 8 are for MM. All are singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-02-26-14-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: update Yosry Ahmed's email address mailmap: add entry for Daniele Alessandrelli mm: fix NULL NODE_DATA dereference for memoryless nodes on boot mm/tracing: rss_stat: ensure curr is false from kthread context mm/kfence: fix KASAN hardware tag faults during late enablement mm/damon/core: disallow non-power of two min_region_sz Squashfs: check metadata block offset is within range MAINTAINERS, mailmap: update e-mail address for Vlastimil Babka liveupdate: luo_file: remember retrieve() status mm: thp: deny THP for files on anonymous inodes mm: change vma_alloc_folio_noprof() macro to inline function mm/kfence: disable KFENCE upon KASAN HW tags enablement
2026-02-26Merge tag 'pm-7.0-rc2' of ↵Linus Torvalds-14/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two intel_pstate driver issues causing it to crash on sysfs attribute accesses when some CPUs in the system are offline, finalize changes related to turning pm_runtime_put() into a void function, and update Daniel Lezcano's contact information: - Fix two issues in the intel_pstate driver causing it to crash when its sysfs interface is used on a system with some offline CPUs (David Arcari, Srinivas Pandruvada) - Update the last user of the pm_runtime_put() return value to discard it and turn pm_runtime_put() into a void function (Rafael Wysocki) - Update Daniel Lezcano's contact information in MAINTAINERS and .mailmap (Daniel Lezcano)" * tag 'pm-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: MAINTAINERS: Update contact with the kernel.org address cpufreq: intel_pstate: Fix crash during turbo disable cpufreq: intel_pstate: Fix NULL pointer dereference in update_cpu_qos_request() PM: runtime: Change pm_runtime_put() return type to void pmdomain: imx: gpcv2: Discard pm_runtime_put() return value
2026-02-26drm/i915/dpt: pass opaque struct intel_dpt around instead of i915_address_spaceJani Nikula-5/+5
struct i915_address_space is used in an opaque fashion in the display parent interface, but it's just one include away from being non-opaque. And anyway the name is rather specific. Switch to using the struct intel_dpt instead, which embeds struct i915_address_space anyway. With the definition hidden in i915_dpt.c, this can't be accidentally made non-opaque, and the type seems rather more generic anyway. We do have to add a new helper i915_dpt_to_vm(), as there's one case in intel_fb_pin_to_dpt() that requires direct access to struct i915_address_space. But this just underlines the point about opacity. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://patch.msgid.link/daa39178c0b0305b010564952d691f06e3cd63ca.1772030909.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-02-26drm/i915/dpt: move suspend/resume to parent interfaceJani Nikula-0/+2
Add per-vm DPT suspend/resume calls to the display parent interface, and lift the generic code away from i915 specific code. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://patch.msgid.link/080945a49559ec1f5183ad409e1526736e828d90.1772030909.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-02-26drm/i915/dpt: move create/destroy to parent interfaceJani Nikula-0/+9
Move the DPT create/destroy calls to the display parent interface. With this, we can remove the dummy xe implementation. Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Link: https://patch.msgid.link/9753b21466c668872f468ccff827eab7be034b0c.1772030909.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-02-26ASoC: SDCA: Pull the Q7.8 volume helpers out of soc-opsCharles Keepax-1/+0
It is cleaner to keep the SDCA code contained and not update the core code for things that are unlikely to see reuse outside of SDCA. Move the Q7.8 volume helpers back into the SDCA core code. Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev> Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/20260225140118.402695-5-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2026-02-26kbuild: Split .modinfo out from ELF_DETAILSNathan Chancellor-1/+3
Commit 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and ELF_DETAILS was present in all architecture specific vmlinux linker scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and COMMON_DISCARDS may be used by other linker scripts, such as the s390 and x86 compressed boot images, which may not expect to have a .modinfo section. In certain circumstances, this could result in a bootloader failing to load the compressed kernel [1]. Commit ddc6cbef3ef1 ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with SecureBoot trailer") recently addressed this for the s390 bzImage but the same bug remains for arm, parisc, and x86. The presence of .modinfo in the x86 bzImage was the root cause of the issue worked around with commit d50f21091358 ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition. Split .modinfo out from ELF_DETAILS into its own macro and handle it in all vmlinux linker scripts. Discard .modinfo in the places where it was previously being discarded from being in COMMON_DISCARDS, as it has never been necessary in those uses. Cc: stable@vger.kernel.org Fixes: 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped") Reported-by: Ed W <lists@wildgooses.com> Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1] Tested-by: Ed W <lists@wildgooses.com> # x86_64 Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski-133/+183
Cross-merge networking fixes after downstream PR (net-7.0-rc2). Conflicts: tools/testing/selftests/drivers/net/hw/rss_ctx.py 19c3a2a81d2b ("selftests: drv-net: rss: Generate unique ports for RSS context tests") ce5a0f4612db ("selftests: drv-net: rss_ctx: test RSS contexts persist after ifdown/up") include/net/inet_connection_sock.h 858d2a4f67ff6 ("tcp: fix potential race in tcp_v6_syn_recv_sock()") fcd3d039fab69 ("tcp: make tcp_v{4,6}_send_check() static") https://lore.kernel.org/aZ8PSFLzBrEU3I89@sirena.org.uk drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c 69050f8d6d075 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") bf4afc53b77ae ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") 8a96b9144f18a ("net/mlx5e: Alloc xsk channel param out of mlx5e_open_xsk()") Adjacent changes: net/netfilter/ipvs/ip_vs_ctl.c c59bd9e62e06 ("ipvs: use more counters to avoid service lookups") bf4afc53b77a ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26Merge tag 'kmalloc_obj-v7.0-rc2' of ↵Linus Torvalds-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kmalloc_obj fixes from Kees Cook: - Fix pointer-to-array allocation types for ubd and kcsan - Force size overflow helpers to __always_inline - Bump __builtin_counted_by_ref to Clang 22.1 from 22.0 (Nathan Chancellor) * tag 'kmalloc_obj-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kcsan: test: Adjust "expect" allocation type for kmalloc_obj overflow: Make sure size helpers are always inlined init/Kconfig: Adjust fixed clang version for __builtin_counted_by_ref ubd: Use pointer-to-pointers for io_thread_req arrays
2026-02-26ACPI: TAD/x86: cmos_rtc: Consolidate address space handler setupRafael J. Wysocki-9/+0
On x86, as a rule the CMOS RTC address space handler is set up by the CMOS RTC ACPI scan handler attach callback, acpi_cmos_rtc_attach(), but if the ACPI namespace does not contain a CMOS RTC device object, the CMOS RTC address space handler installation is taken care of the ACPI TAD (Timer and Alarm Device) driver. This is not particularly straightforward and can be avoided by adding the ACPI TAD device ID to the CMOS RTC ACPI scan handler which will cause it to create a platform device for ACPI TAD after installing the CMOS RTC address space handler. One related detail that needs to be taken care of, though, is that the creation of an ACPI TAD platform device should not cause cmos_rtc_platform_device_present to be set, since this may cause add_rtc_cmos() to suppress the creation of a fallback CMOS RTC platform device which may not be the right thing to do (for instance, due to the fact that the ACPI TAD driver is missing an RTC class device interface). After doing the above, the CMOS RTC address space handler installation and removal can be dropped from the ACPI TAD driver (which allows it to be simplified quite a bit), acpi_remove_cmos_rtc_space_handler() can be dropped and acpi_install_cmos_rtc_space_handler() can be made static. Update the code as per the above. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/23028644.EfDdHjke4D@rafael.j.wysocki
2026-02-26ACPI: x86/rtc-cmos: Use platform device for driver bindingRafael J. Wysocki-0/+6
Modify the rtc-cmos driver to bind to a platform device on systems with ACPI via acpi_match_table and advertise the CMOST RTC ACPI device IDs for driver auto-loading. Note that adding the requisite device IDs to it and exposing them via MODULE_DEVICE_TABLE() is sufficient for this purpose. Since the ACPI device IDs in question are the same as for the CMOS RTC ACPI scan handler, put them into a common header file and use the definition from there in both places. Additionally, to prevent a PNP device from being created for the CMOS RTC if a platform one is present already, make is_cmos_rtc_device() check cmos_rtc_platform_device_present introduced previously. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://patch.msgid.link/13969123.uLZWGnKmhe@rafael.j.wysocki
2026-02-26ACPI: x86: cmos_rtc: Create a CMOS RTC platform deviceRafael J. Wysocki-0/+4
Make the CMOS RTC ACPI scan handler create a platform device that will be used subsequently by rtc-cmos for driver binding on x86 systems with ACPI and update add_rtc_cmos() to skip registering a fallback platform device for the CMOS RTC when the above one has been registered. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # x86 Link: https://patch.msgid.link/1962427.tdWV9SEqCh@rafael.j.wysocki
2026-02-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds-1/+1
Pull rdma fixes from Jason Gunthorpe: "Seems bigger than usual, a number of things were posted near/during the merg window: - Fix some compilation regressions related to the new DMABUF code - Close a race with ib_register_device() vs netdev events that causes GID table corruption - Compilation warnings with some compilers in bng_re - Correct error unwind in bng_re and the umem pinned dmabuf - Avoid NULL pointer crash in ionic during query_port() - Check the size for uAPI validation checks in EFA - Several system call stack leaks in drivers found with AI - Fix the new restricted_node_type so it works with wildcard listens too" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/uverbs: Import DMA-BUF module in uverbs_std_types_dmabuf file RDMA/umem: Fix double dma_buf_unpin in failure path RDMA/core: Check id_priv->restricted_node_type in cma_listen_on_dev() RDMA/ionic: Fix kernel stack leak in ionic_create_cq() RDMA/irdma: Fix kernel stack leak in irdma_create_user_ah() IB/mthca: Add missed mthca_unmap_user_db() for mthca_create_srq() RDMA/efa: Fix typo in efa_alloc_mr() RDMA/ionic: Fix potential NULL pointer dereference in ionic_query_port RDMA/bng_re: Unwind bng_re_dev_init properly RDMA/bng_re: Remove unnessary validity checks RDMA/core: Fix stale RoCE GIDs during netdev events at registration RDMA/uverbs: select CONFIG_DMA_SHARED_BUFFER
2026-02-26mm/slub: drop duplicate kernel-doc for ksize()Sanjay Chitroda-12/+0
The implementation of ksize() was updated with kernel-doc by commit fab0694646d7 ("mm/slab: move [__]ksize and slab_ksize() to mm/slub.c") However, the public header still contains a kernel-doc comment attached to the ksize() prototype. Having documentation both in the header and next to the implementation causes Sphinx to treat the function as being documented twice, resulting in the warning: WARNING: Duplicate C declaration, also defined at core-api/mm-api:521 Declaration is '.. c:function:: size_t ksize(const void *objp)' Kernel-doc guidelines recommend keeping the documentation with the function implementation. Therefore remove the redundant kernel-doc block from include/linux/slab.h so that the implementation in slub.c remains the canonical source for documentation. No functional change. Fixes: fab0694646d7 ("mm/slab: move [__]ksize and slab_ksize() to mm/slub.c") Signed-off-by: Sanjay Chitroda <sanjayembeddedse@gmail.com> Link: https://patch.msgid.link/20260226054712.3610744-1-sanjayembedded@gmail.com Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
2026-02-26mm/slab: mark alloc tags empty for sheaves allocated with __GFP_NO_OBJ_EXTSuren Baghdasaryan-0/+2
alloc_empty_sheaf() allocates sheaves from SLAB_KMALLOC caches using __GFP_NO_OBJ_EXT to avoid recursion, however it does not mark their allocation tags empty before freeing, which results in a warning when CONFIG_MEM_ALLOC_PROFILING_DEBUG is set. Fix this by marking allocation tags for such sheaves as empty. The problem was technically introduced in commit 4c0a17e28340 but only becomes possible to hit with commit 913ffd3a1bf5. Fixes: 4c0a17e28340 ("slab: prevent recursive kmalloc() in alloc_empty_sheaf()") Fixes: 913ffd3a1bf5 ("slab: handle kmalloc sheaves bootstrap") Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/all/20260223155128.3849-1-00107082@163.com/ Analyzed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Tested-by: Harry Yoo <harry.yoo@oracle.com> Tested-by: David Wang <00107082@163.com> Link: https://patch.msgid.link/20260225163407.2218712-1-surenb@google.com Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
2026-02-26Merge tag 'net-7.0-rc2' of ↵Linus Torvalds-8/+26
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: - bnxt_en: fix deleting of Ntuple filters - wan: farsync: fix use-after-free bugs caused by unfinished tasklets - xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: - mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - gve: fix incorrect buffer cleanup for QPL - team: avoid NETDEV_CHANGEMTU event when unregistering slave - usb: validate USB endpoints" * tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) netfilter: nf_conntrack_h323: fix OOB read in decode_choice() dpaa2-switch: validate num_ifs to prevent out-of-bounds write net: consume xmit errors of GSO frames vsock: document write-once behavior of the child_ns_mode sysctl vsock: lock down child_ns_mode as write-once selftests/vsock: change tests to respect write-once child ns mode net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query net/mlx5: Fix missing devlink lock in SRIOV enable error path net/mlx5: E-switch, Clear legacy flag when moving to switchdev net/mlx5: LAG, disable MPESW in lag_disable_change() net/mlx5: DR, Fix circular locking dependency in dump selftests: team: Add a reference count leak test team: avoid NETDEV_CHANGEMTU event when unregistering slave net: mana: Fix double destroy_workqueue on service rescan PCI path MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 tcp: re-enable acceptance of FIN packets when RWIN is 0 vsock: Use container_of() to get net namespace in sysctl handlers net: usb: kaweth: validate USB endpoints ...
2026-02-26netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequenceDavid Howells-1/+3
Fix netfslib such that when it's making an unbuffered or DIO write, to make sure that it sends each subrequest strictly sequentially, waiting till the previous one is 'committed' before sending the next so that we don't have pieces landing out of order and potentially leaving a hole if an error occurs (ENOSPC for example). This is done by copying in just those bits of issuing, collecting and retrying subrequests that are necessary to do one subrequest at a time. Retrying, in particular, is simpler because if the current subrequest needs retrying, the source iterator can just be copied again and the subrequest prepped and issued again without needing to be concerned about whether it needs merging with the previous or next in the sequence. Note that the issuing loop waits for a subrequest to complete right after issuing it, but this wait could be moved elsewhere allowing preparatory steps to be performed whilst the subrequest is in progress. In particular, once content encryption is available in netfslib, that could be done whilst waiting, as could cleanup of buffers that have been completed. Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/58526.1772112753@warthog.procyon.org.uk Tested-by: Steve French <sfrench@samba.org> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-26bonding: print churn state via netlinkHangbin Liu-0/+2
Currently, the churn state is printed only in sysfs. Add netlink support so users could get the state via netlink. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260224020215.6012-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-26pppoe: remove kernel-mode relay supportQingfang Deng-16/+0
The kernel-mode PPPoE relay feature and its two associated ioctls (PPPOEIOCSFWD and PPPOEIOCDFWD) are not used by any existing userspace PPPoE implementations. The most commonly-used package, RP-PPPoE [1], handles the relaying entirely in userspace. This legacy code has remained in the driver since its introduction in kernel 2.3.99-pre7 for over two decades, but has served no practical purpose. Remove the unused relay code. [1] https://dianne.skoll.ca/projects/rp-pppoe/ Signed-off-by: Qingfang Deng <dqfext@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20260224015053.42472-1-dqfext@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-26vsock: lock down child_ns_mode as write-onceBobby Eshleman-2/+14
Two administrator processes may race when setting child_ns_mode as one process sets child_ns_mode to "local" and then creates a namespace, but another process changes child_ns_mode to "global" between the write and the namespace creation. The first process ends up with a namespace in "global" mode instead of "local". While this can be detected after the fact by reading ns_mode and retrying, it is fragile and error-prone. Make child_ns_mode write-once so that a namespace manager can set it once and be sure it won't change. Writing a different value after the first write returns -EBUSY. This applies to all namespaces, including init_net, where an init process can write "local" to lock all future namespaces into local mode. Fixes: eafb64f40ca4 ("vsock: add netns to vsock core") Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Co-developed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-2-c0cde6959923@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-26kthread: consolidate kthread exit paths to prevent use-after-freeChristian Brauner-1/+20
Guillaume reported crashes via corrupted RCU callback function pointers during KUnit testing. The crash was traced back to the pidfs rhashtable conversion which replaced the 24-byte rb_node with an 8-byte rhash_head in struct pid, shrinking it from 160 to 144 bytes. struct kthread (without CONFIG_BLK_CGROUP) is also 144 bytes. With CONFIG_SLAB_MERGE_DEFAULT and SLAB_HWCACHE_ALIGN both round up to 192 bytes and share the same slab cache. struct pid.rcu.func and struct kthread.affinity_node both sit at offset 0x78. When a kthread exits via make_task_dead() it bypasses kthread_exit() and misses the affinity_node cleanup. free_kthread_struct() frees the memory while the node is still linked into the global kthread_affinity_list. A subsequent list_del() by another kthread writes through dangling list pointers into the freed and reused memory, corrupting the pid's rcu.func pointer. Instead of patching free_kthread_struct() to handle the missed cleanup, consolidate all kthread exit paths. Turn kthread_exit() into a macro that calls do_exit() and add kthread_do_exit() which is called from do_exit() for any task with PF_KTHREAD set. This guarantees that kthread-specific cleanup always happens regardless of the exit path - make_task_dead(), direct do_exit(), or kthread_exit(). Replace __to_kthread() with a new tsk_is_kthread() accessor in the public header. Export do_exit() since module code using the kthread_exit() macro now needs it directly. Reported-by: Guillaume Tucker <gtucker@gtucker.io> Tested-by: Guillaume Tucker <gtucker@gtucker.io> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: David Gow <davidgow@google.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/all/20260224-mittlerweile-besessen-2738831ae7f6@brauner Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 4d13f4304fa4 ("kthread: Implement preferred affinity") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-25netfilter: nf_tables: remove register tracking infrastructureFlorian Westphal-37/+0
This facility was disabled in commit 9e539c5b6d9c ("netfilter: nf_tables: disable expression reduction infra"), because not all nft_exprs guarantee they will update the destination register: some may set NFT_BREAK instead to cancel evaluation of the rule. This has been dead code ever since. There are no plans to salvage this at this time, so remove this. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-10-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25ipvs: no_cport and dropentry counters can be per-netJulian Anastasov-0/+2
Change the no_cport counters to be per-net and address family. This should reduce the extra conn lookups done during present NO_CPORT connections. By changing from global to per-net dropentry counters, one net will not affect the drop rate of another net. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-7-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25ipvs: use more counters to avoid service lookupsJulian Anastasov-6/+18
When new connection is created we can lookup for services multiple times to support fallback options. We already have some counters to skip specific lookups because it costs CPU cycles for hash calculation, etc. Add more counters for fwmark/non-fwmark services (fwm_services and nonfwm_services) and make all counters per address family. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-6-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25ipvs: use single svc tableJulian Anastasov-6/+2
fwmark based services and non-fwmark based services can be hashed in same service table. This reduces the burden of working with two tables. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-4-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25ipvs: make ip_vs_svc_table and ip_vs_svc_fwm_table per netnsJiejian Wu-0/+13
Current ipvs uses one global mutex "__ip_vs_mutex" to keep the global "ip_vs_svc_table" and "ip_vs_svc_fwm_table" safe. But when there are tens of thousands of services from different netns in the table, it takes a long time to look up the table, for example, using "ipvsadm -ln" from different netns simultaneously. We make "ip_vs_svc_table" and "ip_vs_svc_fwm_table" per netns, and we add "service_mutex" per netns to keep these two tables safe instead of the global "__ip_vs_mutex" in current version. To this end, looking up services from different netns simultaneously will not get stuck, shortening the time consumption in large-scale deployment. It can be reproduced using the simple scripts below. init.sh: #!/bin/bash for((i=1;i<=4;i++));do ip netns add ns$i ip netns exec ns$i ip link set dev lo up ip netns exec ns$i sh add-services.sh done add-services.sh: #!/bin/bash for((i=0;i<30000;i++)); do ipvsadm -A -t 10.10.10.10:$((80+$i)) -s rr done runtest.sh: #!/bin/bash for((i=1;i<4;i++));do ip netns exec ns$i ipvsadm -ln > /dev/null & done ip netns exec ns4 ipvsadm -ln > /dev/null Run "sh init.sh" to initiate the network environment. Then run "time ./runtest.sh" to evaluate the time consumption. Our testbed is a 4-core Intel Xeon ECS. The result of the original version is around 8 seconds, while the result of the modified version is only 0.8 seconds. Signed-off-by: Jiejian Wu <jiejian@linux.alibaba.com> Co-developed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-2-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25net: pppoe: avoid zero-length arrays in struct pppoe_hdrEric Woudstra-0/+4
Jakub Kicinski reported following issue in upcoming patches: W=1 C=1 GCC build gives us: net/bridge/netfilter/nf_conntrack_bridge.c: note: in included file (through ../include/linux/if_pppox.h, ../include/uapi/linux/netfilter_bridge.h, ../include/linux/netfilter_bridge.h): include/uapi/linux/if_pppox.h: 153:29: warning: array of flexible structures sparse doesn't like that hdr has a zero-length array which overlaps proto. The kernel code doesn't currently need those arrays. PPPoE connection is functional after applying this patch. Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Link: https://patch.msgid.link/20260224155030.106918-1-ericwouds@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25mshv: add arm64 support for doorbell & intercept SINTsAnirudh Rayabharam (Microsoft)-0/+2
On x86, the HYPERVISOR_CALLBACK_VECTOR is used to receive synthetic interrupts (SINTs) from the hypervisor for doorbells and intercepts. There is no such vector reserved for arm64. On arm64, the hypervisor exposes a synthetic register that can be read to find the INTID that should be used for SINTs. This INTID is in the PPI range. To better unify the code paths, introduce mshv_sint_vector_init() that either reads the synthetic register and obtains the INTID (arm64) or just uses HYPERVISOR_CALLBACK_VECTOR as the interrupt vector (x86). Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-25Merge tag 'vfs-7.0-rc2.fixes' of ↵Linus Torvalds-13/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix an uninitialized variable in file_getattr(). The flags_valid field wasn't initialized before calling vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse - Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is not enabled. sysctl_hung_task_timeout_secs is 0 in that case causing spurious "waiting for writeback completion for more than 1 seconds" warnings - Fix a null-ptr-deref in do_statmount() when the mount is internal - Add missing kernel-doc description for the @private parameter in iomap_readahead() - Fix mount namespace creation to hold namespace_sem across the mount copy in create_new_namespace(). The previous drop-and-reacquire pattern was fragile and failed to clean up mount propagation links if the real rootfs was a shared or dependent mount - Fix /proc mount iteration where m->index wasn't updated when m->show() overflows, causing a restart to repeatedly show the same mount entry in a rapidly expanding mount table - Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the inode number is out of range - Fix unshare(2) when CLONE_NEWNS is set and current->fs isn't shared. copy_mnt_ns() received the live fs_struct so if a subsequent namespace creation failed the rollback would leave pwd and root pointing to detached mounts. Always allocate a new fs_struct when CLONE_NEWNS is requested - fserror bug fixes: - Remove the unused fsnotify_sb_error() helper now that all callers have been converted to fserror_report_metadata - Fix a lockdep splat in fserror_report() where igrab() takes inode::i_lock which can be held in IRQ context. Replace igrab() with a direct i_count bump since filesystems should not report inodes that are about to be freed or not yet exposed - Handle error pointer in procfs for try_lookup_noperm() - Fix an integer overflow in ep_loop_check_proc() where recursive calls returning INT_MAX would overflow when +1 is added, breaking the recursion depth check - Fix a misleading break in pidfs * tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: pidfs: avoid misleading break eventpoll: Fix integer overflow in ep_loop_check_proc() proc: Fix pointer error dereference fserror: fix lockdep complaint when igrabbing inode fsnotify: drop unused helper unshare: fix unshare_fs() handling minix: Correct errno in minix_new_inode namespace: fix proc mount iteration mount: hold namespace_sem across copy in create_new_namespace() iomap: Describe @private in iomap_readahead() statmount: Fix the null-ptr-deref in do_statmount() writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK fs: init flags_valid before calling vfs_fileattr_get
2026-02-25mtd: spinand: Clean the flags sectionMiquel Raynal-2/+3
Mention that we are declaring the main SPI NAND flags with a comment. Align the values with tabs. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-02-25mtd: Add driver for concatenating devicesAmit Kumar Mahapatra-1/+50
Introducing CONFIG_MTD_VIRT_CONCAT to separate the legacy flow from the new approach, where only the concatenated partition is registered as an MTD device, while the individual partitions that form it are not registered independently, as they are typically not required by the user. CONFIG_MTD_VIRT_CONCAT is a boolean configuration option that depends on CONFIG_MTD_PARTITIONED_MASTER. When enabled, it allows flash nodes to be exposed as individual MTD devices along with the other partitions. The solution focuses on fixed-partitions description only as it depends on device boundaries. It supports multiple sets of concatenated devices, each comprising two or more partitions. flash@0 { reg = <0>; partitions { compatible = "fixed-partitions"; part0@0 { part-concat-next = <&flash0_part1>; label = "part0_0"; reg = <0x0 0x800000>; }; flash0_part1: part1@800000 { label = "part0_1"; reg = <800000 0x800000>; }; part2@1000000 { part-concat-next = <&flash1_part0>; label = "part0_2"; reg = <0x800000 0x800000>; }; }; }; flash@1 { reg = <1>; partitions { compatible = "fixed-partitions"; flash1_part0: part1@0 { label = "part1_0"; reg = <0x0 0x800000>; }; part1@800000 { label = "part1_1"; reg = <0x800000 0x800000>; }; }; }; The partitions that gets created are flash@0 part0_0-part0_1-concat flash@1 part1_1 part0_2-part1_0-concat Suggested-by: Bernhard Frauendienst <kernel@nospam.obeliks.de> Suggested-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-02-25mtd: Move struct mtd_concat definition to header fileAmit Kumar Mahapatra-0/+12
To enable a more generic approach for concatenating MTD devices, struct mtd_concat should be accessible beyond the mtdconcat driver. Therefore, the definition is being moved to a header file. Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-02-25sched/deadline: Add reporting of runtime left & abs deadline to ↵Tommaso Cucinotta-0/+3
sched_getattr() for DEADLINE tasks The SCHED_DEADLINE scheduler allows reading the statically configured run-time, deadline, and period parameters through the sched_getattr() system call. However, there is no immediate way to access, from user space, the current parameters used within the scheduler: the instantaneous runtime left in the current cycle, as well as the current absolute deadline. The `flags' sched_getattr() parameter, so far mandated to contain zero, now supports the SCHED_GETATTR_FLAG_DL_DYNAMIC=1 flag, to request retrieval of the leftover runtime and absolute deadline, converted to a CLOCK_MONOTONIC reference, instead of the statically configured parameters. This feature is useful for adaptive SCHED_DEADLINE tasks that need to modify their behavior depending on whether or not there is enough runtime left in the current period, and/or what is the current absolute deadline. Notes: - before returning the instantaneous parameters, the runtime is updated; - the abs deadline is returned shifted from rq_clock() to ktime_get_ns(), in CLOCK_MONOTONIC reference; this causes multiple invocations from the same period to return values that may differ for a few ns (showing some small drift), albeit the deadline doesn't move, in rq_clock() reference; - the abs deadline value returned to user-space, as unsigned 64-bit value, can represent nearly 585 years since boot time; - setting flags=0 provides the old behavior (retrieve static parameters). See also the notes from discussion held at OSPM 2025 on the topic "Making user space aware of current deadline-scheduler parameters". Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Matteo Martelli <matteo.martelli@codethink.co.uk> Link: https://patch.msgid.link/20250912053937.31636-2-tommaso.cucinotta@santannapisa.it
2026-02-25RDMA/core: Prepare create CQ path for API unificationLeon Romanovsky-2/+1
Ensure that .create_cq_umem() and .create_cq() follow the same API contract, allowing drivers to be gradually migrated to the umem-aware CQ management flow. Link: https://patch.msgid.link/20260213-refactor-umem-v1-7-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-25RDMA/core: Manage CQ umem in core codeLeon Romanovsky-0/+1
In the current implementation, CQ umem is handled both by ib_core and the driver. ib_core sometimes creates and destroys it, while the driver also destroys it. Store the umem in struct ib_cq and ensure that only ib_core manages its lifetime, relying solely on its internal reference counter. Link: https://patch.msgid.link/20260213-refactor-umem-v1-5-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-25RDMA/umem: Remove unnecessary includes and defines from ib_umem headerLeon Romanovsky-4/+0
The ib_umem header no longer requires the removed includes or forward declarations, so drop them to reduce clutter. Link: https://patch.msgid.link/20260213-refactor-umem-v1-3-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>