path: root/mm/damon
Age | Commit message | Author | Lines
2026-04-19 | Merge tag 'mm-stable-2026-04-18-02-14' of ↵ | Linus Torvalds | -40/+53
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song) Address the longstanding "dying memcg problem". A situation wherein a no-longer-used memory control group will hang around for an extended period pointlessly consuming memory - "fix unexpected type conversions and potential overflows" (Qi Zheng) Fix a couple of potential 32-bit/64-bit issues which were identified during review of the "Eliminate Dying Memory Cgroup" series - "kho: history: track previous kernel version and kexec boot count" (Breno Leitao) Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time - "liveupdate: prevent double preservation" (Pasha Tatashin) Teach LUO to avoid managing the same file across different active sessions - "liveupdate: Fix module unloading and unregister API" (Pasha Tatashin) Address an issue with how LUO handles module reference counting and unregistration during module unloading - "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar) Simplify and clean up the zswap crypto compression handling and improve the lifecycle management of zswap pool's per-CPU acomp_ctx resources - "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race" (SeongJae Park) Address unlikely but possible leaks and deadlocks in damon_call() and damon_walk() - "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park) Fix a couple of root-only wild pointer dereferences - "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race" (SeongJae Park) Update the DAMON documentation to warn operators about potential races which can occur if the commit_inputs parameter is altered at the wrong time - "Minor hmm_test fixes and cleanups" (Alistair Popple) Bugfixes and a cleanup for the HMM kernel selftests - "Modify memfd_luo code" (Chenghao Duan) Cleanups, 
simplifications and speedups to the memfd_luo code
- "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport) Support for userfaultfd in guest_memfd
- "selftests/mm: skip several tests when thp is not available" (Chunyu Hu) Fix several issues in the selftests code which were causing breakage when the tests were run on CONFIG_THP=n kernels
- "mm/mprotect: micro-optimization work" (Pedro Falcato) A couple of nice speedups for mprotect()
- "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav) Document upcoming changes in the maintenance of KHO, LUO, memfd_luo, kexec, crash, kdump and probably other kexec-based things - they are being moved out of mm.git and into a new git tree
* tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits)
  MAINTAINERS: add page cache reviewer
  mm/vmscan: avoid false-positive -Wuninitialized warning
  MAINTAINERS: update Dave's kdump reviewer email address
  MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE
  MAINTAINERS: drop include/linux/kho/abi/ from KHO
  MAINTAINERS: update KHO and LIVE UPDATE maintainers
  MAINTAINERS: update kexec/kdump maintainers entries
  mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd()
  selftests: mm: skip charge_reserved_hugetlb without killall
  userfaultfd: allow registration of ranges below mmap_min_addr
  mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update
  mm/hugetlb: fix early boot crash on parameters without '=' separator
  zram: reject unrecognized type= values in recompress_store()
  docs: proc: document ProtectionKey in smaps
  mm/mprotect: special-case small folios when applying permissions
  mm/mprotect: move softleaf code out of the main function
  mm: remove '!root_reclaim' checking in should_abort_scan()
  mm/sparse: fix comment for section map alignment
  mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete()
  selftests/mm: transhuge_stress: skip the test when thp not available
  ...
2026-04-18 | mm/damon/core: use time_in_range_open() for damos quota window start | SeongJae Park | -1/+2
damos_adjust_quota() uses time_after_eq() to check if it is time to start a new quota charge window, comparing the current jiffies and the scheduled next charge window start time. If it is, the next charge window start time is updated and the new charge window starts. The time check and the next window start time update are skipped while the scheme is deactivated by the watermarks.

Let's suppose the deactivation is kept for more than LONG_MAX jiffies (assuming CONFIG_HZ of 250, more than 99 days on 32-bit systems and more than one billion years on 64-bit systems), resulting in jiffies being larger than the next charge window start time + LONG_MAX. Then the time_after_eq() call can return false until another LONG_MAX jiffies have passed. This means the scheme can continue working after being reactivated by the watermarks. But, soon, the quota will be exceeded and the scheme will again effectively stop working until the next charge window starts. Because the current charge window is extended to up to LONG_MAX jiffies, however, it will look like it stopped unexpectedly and indefinitely, from the user's perspective.

Fix this by using !time_in_range_open() instead.

The issue was discovered [1] by sashiko.

Link: https://lore.kernel.org/20260329152306.45796-1-sj@kernel.org
Link: https://lore.kernel.org/20260324040722.57944-1-sj@kernel.org [1]
Fixes: ee801b7dd782 ("mm/damon/schemes: activate schemes based on a watermarks mechanism")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
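The wraparound described in the time_in_range_open() commit can be reproduced in userspace. The sketch below copies the kernel's jiffies comparison macros and wraps them in two helpers, window_expired_old() and window_expired_new(); those helper names and their parameters are illustrative assumptions, not the code the patch actually touches.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace copies of the kernel's wraparound-aware jiffies comparisons. */
#define time_after(a, b)     ((long)((b) - (a)) < 0)
#define time_before(a, b)    time_after(b, a)
#define time_after_eq(a, b)  ((long)((a) - (b)) >= 0)
#define time_in_range_open(a, b, c) \
	(time_after_eq(a, b) && time_before(a, c))

/* Old-style check (hypothetical helper): has 'now' reached start + len? */
static bool window_expired_old(unsigned long now, unsigned long start,
			       unsigned long len)
{
	return time_after_eq(now, start + len);
}

/* New-style check (hypothetical helper): is 'now' outside [start, start + len)? */
static bool window_expired_new(unsigned long now, unsigned long start,
			       unsigned long len)
{
	return !time_in_range_open(now, start, start + len);
}
```

With a window of [1000, 1250), both helpers agree for ordinary timestamps. For a timestamp more than LONG_MAX past the window end, the old check reports the window as still open, while the new one reports it as expired because the timestamp no longer compares as being at or after the window start.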
2026-04-18 | mm/damon/core: validate damos_quota_goal->nid for node_memcg_{used,free}_bp | SeongJae Park | -0/+7
Users can set damos_quota_goal->nid with an arbitrary value for node_memcg_{used,free}_bp. But DAMON core is using those for NODE_DATA() without a validation of the value. This can result in out of bounds memory access. The issue can actually be triggered using the DAMON user-space tool (damo), like below.

    $ sudo mkdir /sys/fs/cgroup/foo
    $ sudo ./damo start --damos_action stat --damos_quota_interval 1s \
          --damos_quota_goal node_memcg_used_bp 50% -1 /foo
    $ sudo dmesg
    [...]
    [ 524.181426] Unable to handle kernel paging request at virtual address 0000000000002c00

Fix this issue by adding the validation of the given node id. If an invalid node id is given, it returns 0% for used memory ratio, and 100% for free memory ratio.

Link: https://lore.kernel.org/20260329043902.46163-3-sj@kernel.org
Fixes: b74a120bcf50 ("mm/damon/core: implement DAMOS_QUOTA_NODE_MEMCG_USED_BP")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.19.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18 | mm/damon/core: validate damos_quota_goal->nid for node_mem_{used,free}_bp | SeongJae Park | -0/+12
Patch series "mm/damon/core: validate damos_quota_goal->nid".

node_mem[cg]_{used,free}_bp DAMOS quota goals receive the node id. The node id is used for si_meminfo_node() and NODE_DATA() without proper validation. As a result, privileged users can trigger an out of bounds memory access using DAMON_SYSFS. Fix the issues.

The issue was originally reported [1] with a fix by another author. The original author announced [2] that they will stop their work, including the fix that was still in the review stage. Hence I'm restarting this.

This patch (of 2):

Users can set damos_quota_goal->nid with an arbitrary value for node_mem_{used,free}_bp. But DAMON core is using those for si_meminfo_node() without a validation of the value. This can result in out of bounds memory access. The issue can actually be triggered using the DAMON user-space tool (damo), like below.

    $ sudo ./damo start --damos_action stat \
          --damos_quota_goal node_mem_used_bp 50% -1 \
          --damos_quota_interval 1s
    $ sudo dmesg
    [...]
    [ 65.565986] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000098

Fix this issue by adding the validation of the given node. If an invalid node id is given, it returns 0% for used memory ratio, and 100% for free memory ratio.

Link: https://lore.kernel.org/20260329043902.46163-2-sj@kernel.org
Link: https://lore.kernel.org/20260325073034.140353-1-objecting@objecting.org [1]
Link: https://lore.kernel.org/20260327040924.68553-1-sj@kernel.org [2]
Fixes: 0e1c773b501f ("mm/damon/core: introduce damos quota goal metrics for memory node utilization")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
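The fallback behaviour described in the two nid validation commits (0% used, 100% free for an invalid node id) might look roughly like the userspace sketch below. The node table, MAX_NODES, and all function names are hypothetical stand-ins; the kernel code works with si_meminfo_node() and NODE_DATA(), and its exact validity test is not shown in the commit message. Ratios are expressed in basis points, so 10000 corresponds to 100%.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4	/* hypothetical stand-in for the kernel's node limit */

/* Hypothetical per-node data: online flag, total pages, free pages. */
struct node_info {
	bool online;
	unsigned long total;
	unsigned long free;
};

static const struct node_info nodes[MAX_NODES] = {
	{ true, 1000, 250 },
	{ true, 2000, 2000 },
	/* remaining entries are zero-initialized, i.e. offline */
};

/* Reject negative, out-of-range, and offline node ids before any lookup. */
static bool nid_is_valid(int nid)
{
	return nid >= 0 && nid < MAX_NODES && nodes[nid].online;
}

/* Used-memory ratio in basis points; 0 for an invalid node id. */
static unsigned long node_mem_used_bp(int nid)
{
	if (!nid_is_valid(nid))
		return 0;
	return (nodes[nid].total - nodes[nid].free) * 10000 / nodes[nid].total;
}

/* Free-memory ratio in basis points; 10000 for an invalid node id. */
static unsigned long node_mem_free_bp(int nid)
{
	if (!nid_is_valid(nid))
		return 10000;
	return nodes[nid].free * 10000 / nodes[nid].total;
}
```

Passing -1, as in the damo reproducer, now takes the early-return path instead of indexing the table with an invalid id.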
2026-04-18 | mm/damon/stat: fix memory leak on damon_start() failure in damon_stat_start() | Jackie Liu | -1/+4
Destroy the DAMON context and reset the global pointer when damon_start() fails. Otherwise, the context allocated by damon_stat_build_ctx() is leaked, and the stale damon_stat_context pointer will be overwritten on the next enable attempt, making the old allocation permanently unreachable. Link: https://lore.kernel.org/20260331101553.88422-1-liu.yun@linux.dev Fixes: 369c415e6073 ("mm/damon: introduce DAMON_STAT module") Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> # 6.17.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18 | mm/damon/core: fix damos_walk() vs kdamond_fn() exit race | SeongJae Park | -7/+14
When the kdamond_fn() main loop is finished, the function cancels the remaining damos_walk() request and unsets damon_ctx->kdamond so that API callers and API functions themselves can know the context is terminated. damos_walk() adds the caller's request to the queue first. After that, it checks if the kdamond of the damon_ctx is still running (damon_ctx->kdamond is set). Only if the kdamond is running, damos_walk() starts waiting for the kdamond's handling of the newly added request.

The damos_walk() request registration and the damon_ctx->kdamond unset are protected by different mutexes, though. Hence, damos_walk() could race with the damon_ctx->kdamond unset, and result in deadlocks.

For example, let's suppose kdamond successfully finished the damos_walk() request cancelling. Right after that, damos_walk() is called for the context. It registers the new request, and sees the context as still running, because the damon_ctx->kdamond unset is not yet done. Hence the damos_walk() caller starts waiting for the handling of the request. However, the kdamond is already on the termination steps, so it never handles the new request. As a result, the damos_walk() caller thread waits forever.

Fix this by introducing another damon_ctx field, namely walk_control_obsolete. It is protected by damon_ctx->walk_control_lock, which protects damos_walk() request registration. Initialize (unset) it in kdamond_fn() before letting damon_start() return, and set it just before the cancelling of the remaining damos_walk() request is executed. damos_walk() reads the obsolete field under the lock and avoids adding a new request.

After this change, only requests that are guaranteed to be handled or cancelled are registered. Hence the after-registration DAMON context termination check is no longer needed. Remove it together.

The issue is found by sashiko [1].

Link: https://lore.kernel.org/20260327233319.3528-3-sj@kernel.org
Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org [1]
Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
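The core idea of the damos_walk() fix, checking an "obsolete" flag under the same lock that protects request registration, can be sketched in userspace with a pthread mutex. Everything here (struct layout, function names, the -ECANCELED and -EBUSY return values, and doing the cancel in the same critical section as setting the flag) is an assumption made for illustration; the commit message only states which lock protects the new field and when it is set.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct walk_request {
	bool canceled;
};

struct ctx {
	pthread_mutex_t walk_lock;	/* protects the two fields below */
	struct walk_request *walk_req;	/* at most one pending request */
	bool walk_obsolete;		/* set once the worker starts exiting */
};

/* Caller side: register a request only if the worker can still see it. */
static int register_walk(struct ctx *c, struct walk_request *req)
{
	int err = 0;

	pthread_mutex_lock(&c->walk_lock);
	if (c->walk_obsolete)
		err = -ECANCELED;	/* worker is exiting: never enqueue */
	else if (c->walk_req)
		err = -EBUSY;		/* another request is pending */
	else
		c->walk_req = req;
	pthread_mutex_unlock(&c->walk_lock);
	return err;
}

/* Worker side, on exit: mark obsolete, then cancel what is queued. */
static void worker_exit(struct ctx *c)
{
	pthread_mutex_lock(&c->walk_lock);
	c->walk_obsolete = true;
	if (c->walk_req) {
		c->walk_req->canceled = true;
		c->walk_req = NULL;
	}
	pthread_mutex_unlock(&c->walk_lock);
}
```

Because the flag and the queue share one lock, a request is either registered before the flag is set (and therefore cancelled by the exiting worker) or rejected outright, so no caller is left waiting on a request nobody will handle.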
2026-04-18 | mm/damon/core: fix damon_call() vs kdamond_fn() exit race | SeongJae Park | -31/+14
Patch series "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race".

damon_call() and damos_walk() can leak memory and/or deadlock when they race with kdamond terminations. Fix those.

This patch (of 2):

When the kdamond_fn() main loop is finished, the function cancels all remaining damon_call() requests and unsets damon_ctx->kdamond so that API callers and API functions themselves can know the context is terminated. damon_call() adds the caller's request to the queue first. After that, it checks if the kdamond of the damon_ctx is still running (damon_ctx->kdamond is set). Only if the kdamond is running, damon_call() starts waiting for the kdamond's handling of the newly added request.

The damon_call() request registration and the damon_ctx->kdamond unset are protected by different mutexes, though. Hence, damon_call() could race with the damon_ctx->kdamond unset, and result in deadlocks.

For example, let's suppose kdamond successfully finished the damon_call() requests cancelling. Right after that, damon_call() is called for the context. It registers the new request, and sees the context as still running, because the damon_ctx->kdamond unset is not yet done. Hence the damon_call() caller starts waiting for the handling of the request. However, the kdamond is already on the termination steps, so it never handles the new request. As a result, the damon_call() caller thread waits forever.

Fix this by introducing another damon_ctx field, namely call_controls_obsolete. It is protected by damon_ctx->call_controls_lock, which protects damon_call() request registration. Initialize (unset) it in kdamond_fn() before letting damon_start() return, and set it just before the cancelling of the remaining damon_call() requests is executed. damon_call() reads the obsolete field under the lock and avoids adding a new request.

After this change, only requests that are guaranteed to be handled or cancelled are registered. Hence the after-registration DAMON context termination check is no longer needed. Remove it together.

Note that the deadlock will not happen when damon_call() is called for a repeat mode request. In this case, damon_call() returns instead of waiting for the handling when the request registration succeeds and it sees the kdamond is running. However, if the request also has dealloc_on_cancel, the request memory would be leaked.

The issue is found by sashiko [1].

Link: https://lore.kernel.org/20260327233319.3528-1-sj@kernel.org
Link: https://lore.kernel.org/20260327233319.3528-2-sj@kernel.org
Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org [1]
Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-17 | Merge tag 'trace-v7.1' of ↵ | Linus Torvalds | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Fix printf format warning for bprintf sunrpc uses a trace_printk() that triggers a printf warning during the compile. Move the __printf() attribute around for when debugging is not enabled the warning will go away - Remove redundant check for EVENT_FILE_FL_FREED in event_filter_write() The FREED flag is checked in the call to event_file_file() and then checked again right afterward, which is unneeded - Clean up event_file_file() and event_file_data() helpers These helper functions played a different role in the past, but now with eventfs, the READ_ONCE() isn't needed. Simplify the code a bit and also add a warning to event_file_data() if the file or its data is not present - Remove updating file->private_data in tracing open All access to the file private data is handled by the helper functions, which do not use file->private_data. Stop updating it on open - Show ENUM names in function arguments via BTF in function tracing When showing the function arguments when func-args option is set for function tracing, if one of the arguments is found to be an enum, show the name of the enum instead of its number - Add new trace_call__##name() API for tracepoints Tracepoints are enabled via static_branch() blocks, where when not enabled, there's only a nop that is in the code where the execution will just skip over it. When tracing is enabled, the nop is converted to a direct jump to the tracepoint code. Sometimes more calculations are required to be performed to update the parameters of the tracepoint. In this case, trace_##name##_enabled() is called which is a static_branch() that gets enabled only when the tracepoint is enabled. This allows the extra calculations to also be skipped by the nop: if (trace_foo_enabled()) { x = bar(); trace_foo(x); } Where the x=bar() is only performed when foo is enabled. 
The problem with this approach is that there's now two static_branch() calls. One for checking if the tracepoint is enabled, and then again to know if the tracepoint should be called. The second one is redundant Introduce trace_call__foo() that will call the foo() tracepoint directly without doing a static_branch(): if (trace_foo_enabled()) { x = bar(); trace_call__foo(); } - Update various locations to use the new trace_call__##name() API - Move snapshot code out of trace.c Cleaning up trace.c to not be a "dump all", move the snapshot code out of it and into a new trace_snapshot.c file - Clean up some "%*.s" to "%*s" - Allow boot kernel command line options to be called multiple times Have options like: ftrace_filter=foo ftrace_filter=bar ftrace_filter=zoo Equal to: ftrace_filter=foo,bar,zoo - Fix ipi_raise event CPU field to be a CPU field The ipi_raise target_cpus field is defined as a __bitmask(). There is now a __cpumask() field definition. Update the field to use that - Have hist_field_name() use a snprintf() and not a series of strcat() It's safer to use snprintf() that a series of strcat() - Fix tracepoint regfunc balancing A tracepoint can define a "reg" and "unreg" function that gets called before the tracepoint is enabled, and after it is disabled respectively. 
But on error, after the "reg" func is called and the tracepoint is not enabled, the "unreg" function is not called to tear down what the "reg" function performed - Fix output that shows what histograms are enabled Event variables are displayed incorrectly in the histogram output Instead of "sched.sched_wakeup.$var", it is showing "$sched.sched_wakeup.var" where the '$' is in the incorrect location - Some other simple cleanups * tag 'trace-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (24 commits) selftests/ftrace: Add test case for fully-qualified variable references tracing: Fix fully-qualified variable reference printing in histograms tracepoint: balance regfunc() on func_add() failure in tracepoint_add_func() tracing: Rebuild full_name on each hist_field_name() call tracing: Report ipi_raise target CPUs as cpumask tracing: Remove duplicate latency_fsnotify() stub tracing: Preserve repeated trace_trigger boot parameters tracing: Append repeated boot-time tracing parameters tracing: Remove spurious default precision from show_event_trigger/filter formats cpufreq: Use trace_call__##name() at guarded tracepoint call sites tracing: Remove tracing_alloc_snapshot() when snapshot isn't defined tracing: Move snapshot code out of trace.c and into trace_snapshot.c mm: damon: Use trace_call__##name() at guarded tracepoint call sites btrfs: Use trace_call__##name() at guarded tracepoint call sites spi: Use trace_call__##name() at guarded tracepoint call sites i2c: Use trace_call__##name() at guarded tracepoint call sites kernel: Use trace_call__##name() at guarded tracepoint call sites tracepoint: Add trace_call__##name() API tracing: trace_mmap.h: fix a kernel-doc warning tracing: Pretty-print enum parameters in function arguments ...
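The redundant branch that the trace_call__##name() item in the tracing pull message describes can be made visible with a userspace mock. Nothing below is the kernel's tracepoint machinery: foo_enabled stands in for the static key, the counters exist only for the demonstration, and passing x to the mock trace_call__foo() is an assumption about how arguments are forwarded.

```c
#include <assert.h>
#include <stdbool.h>

static bool foo_enabled;	/* stand-in for the tracepoint's static key */
static int branch_checks;	/* how many times the key was tested */
static int events;		/* how many events were emitted */

/* Mock of trace_foo_enabled(): one test of the key. */
static bool trace_foo_enabled(void)
{
	branch_checks++;
	return foo_enabled;
}

/* Mock of trace_call__foo(): emits without testing the key again. */
static void trace_call__foo(int x)
{
	(void)x;
	events++;
}

/* Mock of trace_foo(): tests the key itself before emitting. */
static void trace_foo(int x)
{
	if (trace_foo_enabled())
		trace_call__foo(x);
}

/* Placeholder for the extra work needed to compute the argument. */
static int bar(void)
{
	return 42;
}

/* Old pattern: the key is tested by the guard and again inside trace_foo(). */
static void guarded_old(void)
{
	if (trace_foo_enabled()) {
		int x = bar();

		trace_foo(x);
	}
}

/* New pattern: the key is tested once, by the guard only. */
static void guarded_new(void)
{
	if (trace_foo_enabled()) {
		int x = bar();

		trace_call__foo(x);
	}
}
```

With the mock key enabled, guarded_old() tests it twice per event and guarded_new() tests it once; with the key disabled, both skip bar() entirely.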
2026-04-15 | Merge tag 'mm-stable-2026-04-13-21-45' of ↵ | Linus Torvalds | -232/+470
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton:
- "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly preparatory work for ongoing development but it does reduce stack usage and is an improvement.
- "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups.
- "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code
- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reporting to zswap
- "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn
- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code
- "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently
- "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel
- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing
- "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation
- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted when KHO restores a vmalloc area
- "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly updating references to "struct pagevec", which became folio_batch three years ago
- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau)
Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. 
- "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove 
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. 
* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-06 | mm/damon/stat: deallocate damon_call() failure leaking damon_ctx | SeongJae Park | -0/+7
damon_stat_start() always allocates the module's damon_ctx object (damon_stat_context). Meanwhile, if damon_call() in the function fails, the damon_ctx object is not deallocated. Hence, if damon_call() fails and the user writes Y to “enabled” again, the previously allocated damon_ctx object is leaked.

This cannot simply be fixed by deallocating the damon_ctx object when damon_call() fails. That's because a damon_call() failure doesn't guarantee that the kdamond main function, which accesses the damon_ctx object, is completely finished. In other words, if damon_stat_start() deallocates the damon_ctx object after a damon_call() failure, the not-yet-terminated kdamond could access the freed memory (use-after-free).

Fix the leak while avoiding the use-after-free: damon_stat_start() keeps returning without deallocating the damon_ctx object after a damon_call() failure, and deallocates it when the function is invoked again and the kdamond is completely terminated. If the kdamond is not yet terminated, simply return -EAGAIN, as the kdamond will soon be terminated.

The issue was discovered [1] by sashiko.

Link: https://lkml.kernel.org/r/20260402134418.74121-1-sj@kernel.org
Link: https://lore.kernel.org/20260401012428.86694-1-sj@kernel.org [1]
Fixes: 405f61996d9d ("mm/damon/stat: use damon_call() repeat mode instead of damon_callback")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
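The deferred deallocation described in the damon_stat_start() commit can be modelled in a few lines of userspace C. All names below (stat_ctx, worker_running, call_worker(), fail_call) are hypothetical, and the worker thread is reduced to a boolean; the sketch only shows the order of checks that avoids both the leak and the use-after-free.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct ctx {
	bool worker_running;	/* models "kdamond not yet terminated" */
};

static struct ctx *stat_ctx;	/* module-global context pointer */
static bool fail_call;		/* test knob: make call_worker() fail */

/* Hypothetical stand-in for the request that may fail after start. */
static int call_worker(struct ctx *c)
{
	(void)c;
	return fail_call ? -ECANCELED : 0;
}

static int stat_start(void)
{
	if (stat_ctx) {
		/* Leftover from a failed start: free only after the worker is done. */
		if (stat_ctx->worker_running)
			return -EAGAIN;
		free(stat_ctx);
		stat_ctx = NULL;
	}

	stat_ctx = calloc(1, sizeof(*stat_ctx));
	if (!stat_ctx)
		return -ENOMEM;
	stat_ctx->worker_running = true;	/* worker started */

	/* On failure, keep stat_ctx: the worker may still be using it. */
	return call_worker(stat_ctx);
}
```

A failed start leaves the context allocated, a retry while the worker is still winding down gets -EAGAIN, and the first retry after the worker has finished frees the stale context before allocating a fresh one.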
2026-04-06 | mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails | SeongJae Park | -1/+2
damon_call() for repeat_call_control of DAMON_SYSFS could fail if somehow the kdamond is stopped before the damon_call(). It could happen, for example, when the DAMON context was made for monitoring the virtual address space of a process, and the process is terminated immediately, before the damon_call() invocation. In that case, the dynamically allocated repeat_call_control is not deallocated and is leaked.

Fix the leak by deallocating the repeat_call_control on damon_call() failure.

This issue is discovered by sashiko [1].

Link: https://lkml.kernel.org/r/20260327003224.55752-1-sj@kernel.org
Link: https://lore.kernel.org/20260320020630.962-1-sj@kernel.org [1]
Fixes: 04a06b139ec0 ("mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 | mm/damon/sysfs: check contexts->nr in repeat_call_fn | Josh Law | -0/+3
damon_sysfs_repeat_call_fn() calls damon_sysfs_upd_tuned_intervals(), damon_sysfs_upd_schemes_stats(), and damon_sysfs_upd_schemes_effective_quotas() without checking contexts->nr. If nr_contexts is set to 0 via sysfs while DAMON is running, these functions dereference contexts_arr[0] and cause a NULL pointer dereference. Add the missing check. For example, the issue can be reproduced using DAMON sysfs interface and DAMON user-space tool (damo) [1] like below. $ sudo damo start --refresh_interval 1s $ echo 0 | sudo tee \ /sys/kernel/mm/damon/admin/kdamonds/0/contexts/nr_contexts Link: https://patch.msgid.link/20260320163559.178101-3-objecting@objecting.org Link: https://lkml.kernel.org/r/20260321175427.86000-4-sj@kernel.org Link: https://github.com/damonitor/damo [1] Fixes: d809a7c64ba8 ("mm/damon/sysfs: implement refresh_ms file internal work") Signed-off-by: Josh Law <objecting@objecting.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05 | mm/damon/sysfs: check contexts->nr before accessing contexts_arr[0] | Josh Law | -0/+3
Multiple sysfs command paths dereference contexts_arr[0] without first verifying that kdamond->contexts->nr == 1. A user can set nr_contexts to 0 via sysfs while DAMON is running, causing NULL pointer dereferences. In more detail, the issue can be triggered by privileged users like below. First, start DAMON and make contexts directory empty (kdamond->contexts->nr == 0). # damo start # cd /sys/kernel/mm/damon/admin/kdamonds/0 # echo 0 > contexts/nr_contexts Then, each of below commands will cause the NULL pointer dereference. # echo update_schemes_stats > state # echo update_schemes_tried_regions > state # echo update_schemes_tried_bytes > state # echo update_schemes_effective_quotas > state # echo update_tuned_intervals > state Guard all commands (except OFF) at the entry point of damon_sysfs_handle_cmd(). Link: https://lkml.kernel.org/r/20260321175427.86000-3-sj@kernel.org Fixes: 0ac32b8affb5 ("mm/damon/sysfs: support DAMOS stats") Signed-off-by: Josh Law <objecting@objecting.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [5.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
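The entry-point guard described in the contexts->nr commits can be sketched as follows. The struct fields mirror the names used in the commit message (contexts_arr, nr), but the command enum, the stats counter, the handle_cmd() function, and the -EINVAL return value are illustrative assumptions rather than the actual damon_sysfs_handle_cmd() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct context {
	int stats;	/* hypothetical payload touched by a command */
};

struct contexts {
	struct context **contexts_arr;
	int nr;
};

enum cmd {
	CMD_OFF,
	CMD_UPDATE_SCHEMES_STATS,
};

static int handle_cmd(enum cmd cmd, struct contexts *contexts)
{
	/* Every command except OFF dereferences contexts_arr[0]. */
	if (cmd != CMD_OFF && contexts->nr != 1)
		return -EINVAL;

	switch (cmd) {
	case CMD_OFF:
		return 0;
	case CMD_UPDATE_SCHEMES_STATS:
		contexts->contexts_arr[0]->stats++;
		return 0;
	}
	return -EINVAL;
}
```

Checking once at the entry point covers every command added later, whereas per-command checks are easy to forget.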
2026-04-05  mm/damon/sysfs: fix param_ctx leak on damon_sysfs_new_test_ctx() failure  Josh Law  -1/+3
Patch series "mm/damon/sysfs: fix memory leak and NULL dereference issues", v4. DAMON_SYSFS can leak memory under allocation failure, and dereference NULL pointers when a privileged user makes a wrong sequence of control operations. Fix those. This patch (of 3): When damon_sysfs_new_test_ctx() fails in damon_sysfs_commit_input(), param_ctx is leaked because the early return skips the cleanup at the out label. Destroy param_ctx before returning. Link: https://lkml.kernel.org/r/20260321175427.86000-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260321175427.86000-2-sj@kernel.org Fixes: f0c5118ebb0e ("mm/damon/sysfs: catch commit test ctx alloc failure") Signed-off-by: Josh Law <objecting@objecting.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
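The leak pattern and its fix can be sketched in user space as below. All names are hypothetical and malloc()/free() stand in for the DAMON context constructors and destructors; the sketch only mirrors the shape of the error path (an early return that must release the first allocation), not the real body of damon_sysfs_commit_input().

```c
#include <errno.h>
#include <stdlib.h>

/* Number of contexts currently allocated; lets a caller observe leaks. */
int sketch_live_ctxs;

/* Hypothetical allocator; a non-zero 'fail' simulates allocation failure. */
void *sketch_new_ctx(int fail)
{
	void *ctx = fail ? NULL : malloc(1);

	if (ctx)
		sketch_live_ctxs++;
	return ctx;
}

void sketch_destroy_ctx(void *ctx)
{
	if (!ctx)
		return;
	sketch_live_ctxs--;
	free(ctx);
}

/* Allocate two contexts; the second allocation may fail. */
int sketch_commit_input(int fail_test_ctx)
{
	void *param_ctx, *test_ctx;

	param_ctx = sketch_new_ctx(0);
	if (!param_ctx)
		return -ENOMEM;

	test_ctx = sketch_new_ctx(fail_test_ctx);
	if (!test_ctx) {
		/* The fix: release param_ctx before the early return. */
		sketch_destroy_ctx(param_ctx);
		return -ENOMEM;
	}

	/* ... the actual commit work would happen here ... */

	sketch_destroy_ctx(test_ctx);
	sketch_destroy_ctx(param_ctx);
	return 0;
}
```

After either path returns, sketch_live_ctxs is back to zero; without the marked line, the failure path would leave it at one.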
2026-04-05  mm/damon/core: document damos_commit_dests() failure semantics  Josh Law  -0/+17
Add a kernel-doc-like comment to damos_commit_dests() documenting its allocation failure contract: on -ENOMEM, the destination structure is left in a partially torn-down state that is safe to deallocate via damon_destroy_scheme(), but must not be reused for further commits. This was unclear from the code alone and led to a separate patch [1] attempting to reset nr_dests on failure. Make the intended usage explicit so future readers do not repeat the confusion. Link: https://lkml.kernel.org/r/20260320143648.91673-1-sj@kernel.org Link: https://lore.kernel.org/20260318214939.36100-1-objecting@objecting.org [1] Signed-off-by: Josh Law <objecting@objecting.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/lru_sort: respect addr_unit on default monitoring region setup  SeongJae Park  -6/+0
In the past, damon_set_region_biggest_system_ram_default(), which is the core function for setting the default monitoring target region of DAMON_LRU_SORT, didn't support addr_unit. Hence DAMON_LRU_SORT was silently ignoring the user input for addr_unit when the user doesn't explicitly set the monitoring target regions, and therefore the default target region is being used. No real problem from that ignorance was reported so far. But, the implicit rule is only making things confusing. Also, the default target region setup function is updated to support addr_unit. Hence there is no reason to keep ignoring it. Respect the user-passed addr_unit for the default target monitoring region use case. Link: https://lkml.kernel.org/r/20260311052927.93921-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/reclaim: respect addr_unit on default monitoring region setup  SeongJae Park  -6/+0
In the past, damon_set_region_biggest_system_ram_default(), which is the core function for setting the default monitoring target region of DAMON_RECLAIM, didn't support addr_unit. Hence DAMON_RECLAIM was silently ignoring the user input for addr_unit when the user doesn't explicitly set the monitoring target regions, and therefore the default target region is being used. No real problem from that ignorance was reported so far. But, the implicit rule is only making things confusing. Also, the default target region setup function is updated to support addr_unit. Hence there is no reason to keep ignoring it. Respect the user-passed addr_unit for the default target monitoring region use case. Link: https://lkml.kernel.org/r/20260311052927.93921-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: fix wrong damon_set_regions() argument  SeongJae Park  -1/+1
The third argument is the length of the second parameter. But addr_unit is wrongly being passed. Fix it. Link: https://lkml.kernel.org/r/20260314001854.79623-1-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: receive addr_unit on damon_set_region_biggest_system_ram_default()  SeongJae Park  -3/+6
damon_find_biggest_system_ram() was not supporting addr_unit in the past. Hence, its caller, damon_set_region_biggest_system_ram_default(), was also not supporting addr_unit. The previous commit has updated the inner function to support addr_unit. There is no more reason to not support addr_unit on damon_set_region_biggest_system_ram_default(). Rather, it makes unnecessary inconsistency on support of addr_unit. Update it to receive addr_unit and handle it inside. Link: https://lkml.kernel.org/r/20260311052927.93921-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: support addr_unit on damon_find_biggest_system_ram()  SeongJae Park  -11/+23
damon_find_biggest_system_ram() sets an 'unsigned long' variable with 'resource_size_t' value. This is fundamentally wrong. On environments such as ARM 32 bit machines having LPAE (Large Physical Address Extensions), which DAMON supports, the size of 'unsigned long' may be smaller than that of 'resource_size_t'. It is safe, though, since we restrict the walk to be done only up to ULONG_MAX. DAMON supports the address size gap using 'addr_unit'. We didn't add the support to the function, just to make the initial support change small. Now the support is reasonably settled. This kind of gap is only making the code inconsistent and easy to be confused. Add the support of 'addr_unit' to the function, by letting callers pass the 'addr_unit' and handling it in the function. All callers are passing 'addr_unit' 1, though, to keep the old behavior. [sj@kernel.org: verify found biggest system ram] Link: https://lkml.kernel.org/r/20260317144725.88524-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260311052927.93921-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
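To illustrate the address size gap, here is a minimal sketch that models a 32-bit 'unsigned long' with uint32_t and a wider 'resource_size_t' with uint64_t. It assumes that a core-layer address multiplied by 'addr_unit' gives the real address; that scaling rule and the helper names are assumptions of this sketch, and the kernel's actual arithmetic may differ in rounding and validation details.

```c
#include <stdint.h>

/*
 * Convert a real (e.g. LPAE physical) address to a core-layer address.
 * The result is truncated if real_addr / addr_unit exceeds 32 bits.
 */
uint32_t sketch_to_core_addr(uint64_t real_addr, uint32_t addr_unit)
{
	return (uint32_t)(real_addr / addr_unit);
}

/* Convert a core-layer address back to the real address. */
uint64_t sketch_to_real_addr(uint32_t core_addr, uint32_t addr_unit)
{
	return (uint64_t)core_addr * addr_unit;
}
```

With 'addr_unit' 1, an address of 8 GiB does not fit in 32 bits and is truncated; with 'addr_unit' 4096, the same address maps to a core-layer value that round-trips without loss.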
2026-04-05  mm/damon/core: fix wrong end address assignment on walk_system_ram()  SeongJae Park  -1/+1
Patch series "mm/damon: support addr_unit on default monitoring targets for modules". DAMON_RECLAIM and DAMON_LRU_SORT support 'addr_unit' parameters only when the monitoring target address range is explicitly set. This was intentional for making the initial 'addr_unit' support change small. Now 'addr_unit' support is being quite stabilized. Having the corner case of the support is only making the code inconsistent with implicit rules. The inconsistency makes it easy to confuse [1] readers. After all, there is no real reason to keep 'addr_unit' support incomplete. Add the support for the case to improve the readability and more completely support 'addr_unit'. This series is constructed with five patches. The first one (patch 1) fixes a small bug that mistakenly assigns inclusive end address to open end address, which was found from this work. The second and third ones (patches 2 and 3) extend the default monitoring target setting functions in the core layer one by one, to support the 'addr_unit' while making no visible changes. The final two patches (patches 4 and 5) update DAMON_RECLAIM and DAMON_LRU_SORT to support 'addr_unit' for the default monitoring target address ranges, by passing the user input to the core functions. This patch (of 5): 'struct damon_addr_range' and 'struct resource' represent different types of address ranges. 'damon_addr_range' is for end-open ranges ([start, end)). 'resource' is for fully-closed ranges ([start, end]). But walk_system_ram() is assigning resource->end to damon_addr_range->end without the inclusiveness adjustment. As a result, the function returns an address range that is missing the last one byte. The function is being used to find and set the biggest system ram as the default monitoring target for DAMON_RECLAIM and DAMON_LRU_SORT. Missing the last byte of the big range shouldn't be a real problem for the real use cases. That said, the loss is definitely an unintended behavior. Do the correct adjustment. 
Link: https://lkml.kernel.org/r/20260311052927.93921-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260311052927.93921-2-sj@kernel.org Link: https://lore.kernel.org/20260131015643.79158-1-sj@kernel.org [1] Fixes: 43b0536cb471 ("mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Yang yingliang <yangyingliang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
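The inclusiveness adjustment is a one-line conversion, sketched below with hypothetical stand-in structs (the real 'struct resource' and 'struct damon_addr_range' have different field types and more members). A fully-closed range [start, end] becomes the end-open range [start, end + 1).

```c
#include <stdint.h>

/* Stand-in for 'struct resource': fully-closed range [start, end]. */
struct sketch_resource {
	uint64_t start;
	uint64_t end;
};

/* Stand-in for 'struct damon_addr_range': end-open range [start, end). */
struct sketch_addr_range {
	uint64_t start;
	uint64_t end;
};

/* Convert a closed range to an end-open one without losing the last byte. */
struct sketch_addr_range sketch_to_open_range(struct sketch_resource res)
{
	struct sketch_addr_range range = {
		.start = res.start,
		.end = res.end + 1, /* the inclusiveness adjustment */
	};

	return range;
}

/* Size of an end-open range in bytes. */
uint64_t sketch_range_size(struct sketch_addr_range range)
{
	return range.end - range.start;
}
```

For a resource covering 0x1000 through 0x1fff, the converted range ends at 0x2000 and its size is 0x1000 bytes; copying 'end' unchanged would report 0xfff.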
2026-04-05  mm/damon/tests/core-kunit: test goal_tuner commit  SeongJae Park  -0/+3
Extend damos_commit_quota() kunit test for the newly added goal_tuner parameter. Link: https://lkml.kernel.org/r/20260310010529.91162-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/sysfs-schemes: implement quotas->goal_tuner file  SeongJae Park  -0/+58
Add a new DAMON sysfs interface file, namely 'goal_tuner' under the DAMOS quotas directory. It is connected to the damos_quota->goal_tuner field. Users can therefore select their favorite goal-based quotas tuning algorithm by writing the name of the tuner to the file. Reading the file returns the name of the currently selected tuner. Link: https://lkml.kernel.org/r/20260310010529.91162-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: introduce DAMOS_QUOTA_GOAL_TUNER_TEMPORAL  SeongJae Park  -5/+24
Introduce a new goal-based DAMOS quota auto-tuning algorithm, namely DAMOS_QUOTA_GOAL_TUNER_TEMPORAL (temporal in short). The algorithm aims to trigger the DAMOS action only temporarily, to achieve the goal as soon as possible. For that temporary period, it uses as much quota as allowed. Once the goal is achieved, it sets the quota to zero, which effectively deactivates the scheme. Link: https://lkml.kernel.org/r/20260310010529.91162-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
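The decision rule of the temporal tuner can be sketched as a small pure function. The function name, its parameters, and the assumption that a larger metric value means the goal is closer to being achieved are all hypothetical; the kernel implementation works on DAMOS quota and goal structures rather than three plain integers.

```c
/*
 * Return the effective quota for the next charge window.
 *
 * current_value: latest measurement of the goal metric (assumed:
 *                higher means closer to the goal)
 * target_value:  value at which the goal counts as achieved
 * max_quota:     largest quota the user-set limits allow
 */
unsigned long sketch_temporal_quota(unsigned long current_value,
				    unsigned long target_value,
				    unsigned long max_quota)
{
	/* Goal under-achieved: work as fast as the limits allow. */
	if (current_value < target_value)
		return max_quota;
	/* Goal achieved: a zero quota deactivates the scheme. */
	return 0;
}
```

There is no gradual adjustment between the two states, which is what makes the behavior easy to predict from user space.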
2026-04-05  mm/damon/core: allow quota goals set zero effective size quota  SeongJae Park  -5/+26
A zero value for a user-explicit quota (size or time quota) means the quota is unset. The effective size quota is set to the minimum of the explicit quotas. When quota goals are set, the goal-based quota tuner can make it lower. But the only existing tuner never sets the effective size quota to zero. Because of that, DAMON core assumes a zero effective quota means the user has set no quota. Multiple tuners are now allowed, though. In the future, some tuners might want to set a zero effective size quota. There is no reason to restrict that. With the current implementation, however, doing so would only deactivate all quotas and make the scheme work at its full speed. Introduce a dedicated function for checking if no quota is set. The function checks whether the user-set explicit quotas are all zero and no goal is installed. It is decoupled from the zero effective quota, and hence allows future tuners to set a zero effective quota to intentionally deactivate the scheme. Link: https://lkml.kernel.org/r/20260310010529.91162-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
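A check of this kind can be sketched as follows. The struct and its field names are hypothetical simplifications (the real quota structure keeps its goals in a list, for instance); the point is only that the check looks at the user inputs and never at the effective quota.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a DAMOS quota. */
struct sketch_quota {
	unsigned long ms;       /* user-set time quota, 0 means unset */
	unsigned long sz;       /* user-set size quota, 0 means unset */
	unsigned int nr_goals;  /* number of installed quota goals */
	unsigned long esz;      /* effective size quota, set by the tuner */
};

/* True only if the user set neither an explicit quota nor a goal. */
bool sketch_quota_is_unset(const struct sketch_quota *quota)
{
	return !quota->ms && !quota->sz && !quota->nr_goals;
}
```

A scheme with a goal installed and an effective quota of zero is therefore reported as having a quota, so the zero can mean "do nothing for now" instead of "no limit".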
2026-04-05  mm/damon/core: introduce damos_quota_goal_tuner  SeongJae Park  -0/+1
Patch series "mm/damon: support multiple goal-based quota tuning algorithms".

Aim-oriented DAMOS quota auto-tuning uses a single tuning algorithm. The algorithm is designed to find a quota value that should be consistently kept for achieving the aimed goal for long term. It is useful and reliable at automatically operating systems that have dynamic environments in the long term. As always, however, no single algorithm fits all. When the environment has static characteristics or there are control towers in not only the kernel space but also the user space, the algorithm shows some limitations. In such environments, users want the kernel to work in a more short term deterministic way. Actually there were at least two reports [1,2] of such cases.

Extend DAMOS quotas goal to support multiple quota tuning algorithms that users can select. Keep the current algorithm as the default one, to not break the old users. Also give it a name, "consist", as it is designed to "consistently" apply the DAMOS action. And introduce a new tuning algorithm, namely "temporal". It is designed to apply the DAMOS action only temporarily, in a deterministic way. In more detail, as long as the goal is under-achieved, it uses the maximum quota available. Once the goal is over-achieved, it sets the quota to zero.

Tests
=====

I confirmed the feature is working as expected using the latest version of DAMON user-space tool, like below.

    $ # start DAMOS for reclaiming memory aiming 30% free memory
    $ sudo ./damo/damo start --damos_action pageout \
        --damos_quota_goal_tuner temporal \
        --damos_quota_goal node_mem_free_bp 30% 0 \
        --damos_quota_interval 1s \
        --damos_quota_space 100M

Note that >=3.1.8 version of DAMON user-space tool supports this feature (--damos_quota_goal_tuner). As expected, DAMOS stops reclaiming memory as soon as the goal amount of free memory is made. When the 'consist' tuner is used, the reclamation continued even after the goal amount of free memory was made, resulting in more than the goal amount of free memory, as expected.

Patch Sequence
==============

First four patches implement the features. Patch 1 extends core API to allow multiple tuners and makes the current tuner the default and only available tuner, namely 'consist'. Patch 2 allows future tuners to set a zero effective quota. Patch 3 introduces the second tuner, namely 'temporal'. Patch 4 further extends DAMON sysfs API to let users use that. Three following patches (patches 5-7) update design, usage, and ABI documents, respectively. Final four patches (patches 8-11) are for adding tests. The eighth patch (patch 8) extends the kunit test for online parameters commit for validating the goal_tuner. The ninth and the tenth patches (patches 9-10) extend the testing-purpose DAMON sysfs control helper and DAMON status dumping tool to support the newly added feature. The final eleventh one (patch 11) extends the existing online commit selftest to cover the new feature.

This patch (of 11):

DAMOS quota goal feature utilizes a single feedback loop based algorithm for automatic tuning of the effective quota. It is useful in dynamic environments where systems are operated by only the kernel in the long term. But no single algorithm fits all. It is not very easy to control in environments having more controlled characteristics and user-space control towers. We actually got multiple reports [1,2] of use cases for which the algorithm is not optimal.

Introduce a new field of 'struct damos_quota', namely 'goal_tuner'. It specifies what tuning algorithm the given scheme should use, and allows DAMON API callers to set it as they want. Nonetheless, this commit introduces no new tuning algorithm but only the interface. This commit hence makes no behavioral change. A new algorithm will be added by the following commit.
Link: https://lkml.kernel.org/r/20260310010529.91162-2-sj@kernel.org Link: https://lore.kernel.org/CALa+Y17__d=ZsM1yX+MXx0ozVdsXnFqF4p0g+kATEitrWyZFfg@mail.gmail.com [1] Link: https://lore.kernel.org/20260204022537.814-1-yunjeong.mun@sk.com [2] Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: clarify damon_set_attrs() usages  SeongJae Park  -2/+10
damon_set_attrs() is called for multiple purposes from multiple places. Calling it in an unsafe context can make DAMON internal state polluted and results in unexpected behaviors. Clarify when it is safe, and where it is being called. Link: https://lkml.kernel.org/r/20260307195356.203753-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Acked-by: wang lian <lianux.mm@gmail.com> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/tests/core-kunit: add a test for damon_is_last_region()  SeongJae Park  -0/+23
There was a bug [1] in damon_is_last_region(). Add a kunit test to not reintroduce the bug. Link: https://lkml.kernel.org/r/20260307195356.203753-3-sj@kernel.org Link: https://lore.kernel.org/20260114152049.99727-1-sj@kernel.org/ [1] Signed-off-by: SeongJae Park <sj@kernel.org> Tested-by: wang lian <lianux.mm@gmail.com> Reviewed-by: wang lian <lianux.mm@gmail.com> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: use mult_frac()  SeongJae Park  -10/+10
Patch series "mm/damon: improve/fixup/update ratio calculation, test and documentation". Yet another batch of misc/minor improvements and fixups. Use mult_frac() instead of the worse open-coding for rate calculations (patch 1). Add a test for a previously found and fixed bug (patch 2). Improve and update comments and documentations for easier code review and up-to-date information (patches 3-6). Finally, fix an obvious typo (patch 7). This patch (of 7): There are multiple places in core code that do open-code rate calculations. Use mult_frac(), which is developed for doing that in a way more safe from overflow and precision loss. Link: https://lkml.kernel.org/r/20260307195356.203753-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260307195356.203753-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Acked-by: wang lian <lianux.mm@gmail.com> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: use time_after_eq() in kdamond_fn()  SeongJae Park  -5/+9
damon_ctx->passed_sample_intervals and damon_ctx->next_*_sis are unsigned long. Those are compared in kdamond_fn() using normal comparison operators. It is unsafe from overflow. Use time_after_eq(), which is safe from overflows when correctly used, instead. Link: https://lkml.kernel.org/r/20260307194915.203169-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: use time_before() for next_apply_sis  SeongJae Park  -3/+3
damon_ctx->passed_sample_intervals and damos->next_apply_sis are unsigned long, and compared via normal comparison operators. It is unsafe from overflow. Use time_before(), which is safe from overflow when correctly used, instead. Link: https://lkml.kernel.org/r/20260307194915.203169-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: remove damos_set_next_apply_sis() duplicates  SeongJae Park  -9/+2
Patch series "mm/damon/core: make passed_sample_intervals comparisons overflow-safe". DAMON accounts time using its own jiffies-like time counter, namely damon_ctx->passed_sample_intervals. The counter is incremented on each iteration of kdamond_fn() main loop, which sleeps at least one sample interval. Hence the name. DAMON has time-periodic operations including monitoring results aggregation and DAMOS action application. DAMON sets the next time to do each of such operations in the passed_sample_intervals unit. And it does the operation when the counter becomes equal to or larger than the pre-set value, and updates the next time for the operation. Note that the operation is done not only when the values exactly match but also when the time is passed, because the values can be updated for online-committed DAMON parameters. The counter is 'unsigned long' type, and the comparison is done using normal comparison operators. It is not safe from overflows. This can cause rare and limited but odd situations. Let's suppose there is an operation that should be executed every 20 sampling intervals, and the passed_sample_intervals value for next execution of the operation is ULONG_MAX - 3. Once the passed_sample_intervals reaches ULONG_MAX - 3, the operation will be executed, and the next time value for doing the operation becomes 16 (ULONG_MAX - 3 + 20, wrapped around), since overflow happens. In the next iteration of the kdamond_fn() main loop, passed_sample_intervals is larger than the next operation time value, so the operation will be executed again. It will continue executing the operation for each iteration, until the passed_sample_intervals also overflows. Note that this will not be common or problematic in the real world. The sampling interval, which is the time taken for each passed_sample_intervals increment, is 5 ms by default. And it is usually [auto-]tuned for hundreds of milliseconds. 
That means it takes about 248 days or 4,971 days to have the overflow on 32 bit machines when the sampling interval is 5 ms and 100 ms, respectively (1<<32 * sampling_interval_in_seconds / 3600 / 24). On 64 bit machines, the numbers become 2924712086.77536 and 58494241735.5072 years. So the real user impact is negligible. But still this is better to be fixed as long as the fix is simple and efficient. Fix this by simply replacing the overflow-unsafe native comparison operators with the existing overflow-safe time comparison helpers. The first patch only cleans up the next DAMOS action application time setup for consistency and reduced code. The second and the third patches update DAMOS action application time setup and rest, respectively. This patch (of 3): There is a function for damos->next_apply_sis setup. But some places are open-coding it. Consistently use the helper. Link: https://lkml.kernel.org/r/20260307194915.203169-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/tests/core-kunit: add a test for damon_commit_ctx()  SeongJae Park  -0/+22
Patch series "mm/damon: test and document power-of-2 min_region_sz requirement". Since commit c80f46ac228b ("mm/damon/core: disallow non-power of two min_region_sz"), min_region_sz is always restricted to be a power of two. Add a kunit test to confirm the functionality. Also, the change adds a restriction to addr_unit parameter. Clarify it on the document. This patch (of 2): Add a kunit test for confirming the change that is made on commit c80f46ac228b ("mm/damon/core: disallow non-power of two min_region_sz") functions as expected. Link: https://lkml.kernel.org/r/20260307194222.202075-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
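The power-of-two restriction is cheap to check with the usual bit trick, sketched here as a stand-alone function. The name is hypothetical; the kernel has its own helper for this, and the way damon_commit_ctx() reports a violation is not shown.

```c
#include <stdbool.h>

/*
 * A power of two has exactly one bit set, so clearing the lowest set
 * bit (n & (n - 1)) must leave zero. Zero itself is not a power of two.
 */
bool sketch_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

Values such as 1, 4096, and 65536 pass the check, while 0, 3, and 4097 do not.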
2026-04-05  mm/damon/tests/.kunitconifg: enable DAMON_DEBUG_SANITY  SeongJae Park  -0/+3
CONFIG_DAMON_DEBUG_SANITY is recommended for DAMON development and test setups. Enable it on the default configurations for DAMON kunit test run. Link: https://lkml.kernel.org/r/20260306152914.86303-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: add damon_reset_aggregated() debug_sanity check  SeongJae Park  -0/+18
At time of damon_reset_aggregated(), aggregation of the interval should be completed, and hence nr_accesses and nr_accesses_bp should match. I found a few bugs caused it to be broken in the past, from online parameters update and complicated nr_accesses handling changes. Add a sanity check for that under CONFIG_DAMON_DEBUG_SANITY. Link: https://lkml.kernel.org/r/20260306152914.86303-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: add damon_split_region_at() debug_sanity check  SeongJae Park  -0/+16
damon_split_region_at() should be called with the correct address to split on. Add a sanity check for that under CONFIG_DAMON_DEBUG_SANITY. Link: https://lkml.kernel.org/r/20260306152914.86303-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: add damon_merge_regions_of() debug_sanity check  SeongJae Park  -0/+15
damon_merge_regions_of() should be called only after aggregation is finished and therefore each region's nr_accesses and nr_accesses_bp match. There were bugs that broke the assumption, during development of online DAMON parameter updates and monitoring results handling changes. Add a sanity check for that under CONFIG_DAMON_DEBUG_SANITY. Link: https://lkml.kernel.org/r/20260306152914.86303-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: add damon_merge_two_regions() debug_sanity check  SeongJae Park  -0/+16
A data corruption could cause damon_merge_two_regions() creating zero length DAMON regions. Add a sanity check for that under CONFIG_DAMON_DEBUG_SANITY. Link: https://lkml.kernel.org/r/20260306152914.86303-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: add damon_nr_regions() debug_sanity check  SeongJae Park  -0/+19
damon_target->nr_regions is introduced to get the number quickly without having to iterate regions always. Add a sanity check for that under CONFIG_DAMON_DEBUG_SANITY. Link: https://lkml.kernel.org/r/20260306152914.86303-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: add damon_del_region() debug_sanity check  SeongJae Park  -0/+13
damon_del_region() should be called for targets that have one or more regions. Add a sanity check for that under CONFIG_DAMON_DEBUG_SANITY. Link: https://lkml.kernel.org/r/20260306152914.86303-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: add damon_new_region() debug_sanity check  SeongJae Park  -0/+12
damon_new_region() is supposed to be called with only valid address range arguments. Do the check under DAMON_DEBUG_SANITY. Link: https://lkml.kernel.org/r/20260306152914.86303-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon: add CONFIG_DAMON_DEBUG_SANITY  SeongJae Park  -0/+11
Patch series "mm/damon: add optional debugging-purpose sanity checks". DAMON code has a few assumptions that can be critical if violated. Validating the assumptions in code can be useful at finding such critical bugs. I was actually adding some such additional sanity checks in my personal tree, and those were useful at finding bugs that I made during the development of new patches. We also found [1] sometimes the assumptions are misunderstood. The validation can work as good documentation for such cases. Add some of such debugging purpose sanity checks. Because those additional checks can impose more overhead, make them optional via a new config, CONFIG_DAMON_DEBUG_SANITY, which is recommended only for development and test setups. And as recommended, enable it for DAMON kunit tests and selftests. Note that the verification only triggers WARN_ON() for each violation. The developer or tester may want to set panic_on_oops together, like damon-tests/corr did [2]. This patch (of 10): Add a new build config that will enable additional DAMON sanity checks. It is recommended to be enabled only on development and test setups, since it can impose additional overhead. Link: https://lkml.kernel.org/r/20260306152914.86303-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260306152914.86303-2-sj@kernel.org Link: https://lore.kernel.org/20251231070029.79682-1-sj@kernel.org [1] Link: https://github.com/damonitor/damon-tests/commit/a80fbee55e272f151b4e5809ee85898aea33e6ff [2] Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/test/core-kunit: add damon_apply_min_nr_regions() test  SeongJae Park  -0/+52
Add a kunit test for the functionality of damon_apply_min_nr_regions(). Link: https://lkml.kernel.org/r/20260228222831.7232-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/vaddr: do not split regions for min_nr_regions  SeongJae Park  -144/+2
The previous commit made DAMON core split regions at the beginning for min_nr_regions. The virtual address space operation set (vaddr) does similar work on its own, for a case user delegates entire initial monitoring regions setup to vaddr. It is unnecessary now, as DAMON core will do similar work for any case. Remove the duplicated work in vaddr. Also, remove a helper function that was being used only for the work, and the test code of the helper function. Link: https://lkml.kernel.org/r/20260228222831.7232-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05  mm/damon/core: split regions for min_nr_regions  SeongJae Park  -6/+39
Patch series "mm/damon: strictly respect min_nr_regions".

DAMON core respects min_nr_regions only at merge operation. DAMON API callers are therefore responsible for respecting or ignoring it. Only the vaddr ops set respects it, and only at the initial start time. The DAMON sysfs interface allows users to set up the initial regions, which DAMON core also respects. But, again, that works only for the initial time. Setting up the regions to satisfy min_nr_regions can be difficult and inefficient for users when the min_nr_regions value is high. There was actually a report [1] from a user. The use case was page granular access monitoring with a large aggregation interval.

Make the following three changes to resolve the issue. First (patch 1), make DAMON core split regions at the beginning and every aggregation interval, to respect min_nr_regions. Second (patch 2), drop vaddr's split operations and related code that are no longer needed. Third (patch 3), add a kunit test for the newly introduced function.

This patch (of 3):

The DAMON core layer respects the min_nr_regions parameter by setting the maximum size of each region to the total monitoring region size divided by the parameter value. The limit is applied by preventing merges of regions that would result in a region larger than the maximum size. The limit is updated every ops update interval, because vaddr updates the monitoring regions on the ops update callback.

It does nothing for the beginning state. That's because the users can set the initial monitoring regions as they want. That is, if the users really care about min_nr_regions, they are supposed to set the initial monitoring regions to have more than min_nr_regions regions.

The virtual address space operation set, vaddr, is an exceptional case. Users can ask the ops set to configure the initial regions on its own. In that case, vaddr sets up the initial regions to meet min_nr_regions. So vaddr has exceptional support, but basically users are required to set the regions on their own if they want min_nr_regions to be respected.

When min_nr_regions is high, such initial setup is difficult. If the DAMON sysfs interface is used for that, the memory for saving the initial setup is also a waste.

Even if the user forgoes the setup, DAMON will eventually make more than min_nr_regions regions through split operations. But it will take time. If the aggregation interval is long, the delay could be problematic. There was actually a report [1] of the case. The reporter wanted to do page granular monitoring with a large aggregation interval.

Also, DAMON does nothing for online changes to monitoring regions and min_nr_regions. For example, the user can remove a monitoring region or increase min_nr_regions while DAMON is running.

Split regions larger than the size limit at the beginning of the kdamond main loop, to fix the initial setup issue. Also do the split every aggregation interval, for online changes.

This means the behavior is slightly changed. It is difficult to imagine a use case that actually depends on the old behavior, though. So this change is arguably fine.

Note that the size limit is aligned by damon_ctx->min_region_sz and cannot be zero. That is, if min_nr_regions is larger than the total size of monitoring regions divided by ->min_region_sz, it cannot be respected.

Link: https://lkml.kernel.org/r/20260228222831.7232-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260228222831.7232-2-sj@kernel.org
Link: https://lore.kernel.org/CAC5umyjmJE9SBqjbetZZecpY54bHpn2AvCGNv3aF6J=1cfoPXQ@mail.gmail.com [1]
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
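The size limit described in the message above can be sketched in plain userspace C. This is only an illustration under stated assumptions: region_sz_limit() and nr_pieces() are hypothetical names, not the kernel's functions, and aligning the limit downward is an assumption (the message only says the limit is aligned by min_region_sz and cannot be zero).

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the kernel function: largest region size that
 * still yields at least min_nr_regions regions out of total_sz bytes.
 * min_region_sz must be non-zero.
 */
unsigned long region_sz_limit(unsigned long total_sz,
			      unsigned long min_nr_regions,
			      unsigned long min_region_sz)
{
	unsigned long limit;

	/* Treat 0 as "no minimum" to avoid dividing by zero. */
	if (!min_nr_regions)
		min_nr_regions = 1;
	limit = total_sz / min_nr_regions;

	/* Align down to min_region_sz (assumed direction). */
	limit -= limit % min_region_sz;

	/* The limit cannot be zero: fall back to the minimum region size. */
	if (limit < min_region_sz)
		limit = min_region_sz;
	return limit;
}

/* Number of pieces a region of sz bytes is cut into under the limit. */
unsigned long nr_pieces(unsigned long sz, unsigned long limit)
{
	return (sz + limit - 1) / limit;
}
```

With a 1 GiB region, min_nr_regions of 256, and a 4 KiB minimum region size, the limit is 4 MiB and the region is cut into 256 pieces. With an 8 KiB region and min_nr_regions of 10, the limit falls back to 4 KiB and only 2 pieces result, which is the "cannot be respected" case noted in the message.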
2026-04-05  mm/damon/core: do non-safe region walk on kdamond_apply_schemes()  SeongJae Park  -11/+11
kdamond_apply_schemes() is using damon_for_each_region_safe(), which is safe for deallocation of the region inside the loop. However, the loop internal logic does not deallocate regions. Hence it is only wasting the next pointer.

Also, it causes a problem. When an address filter is applied, and there is a region that intersects with the filter, the filter splits the region on the filter boundary. The intention is to let DAMOS apply the action to only filtered-in address ranges. However, damon_for_each_region_safe() sets the next region before the execution of the iteration. Hence, the region that was split off, which is now next to the current region, is simply skipped. As a result, DAMOS applies the action to target regions a bit slower than expected when the address filter is used. This shouldn't be a big problem, but it is definitely better to fix. damos_skip_charged_region() was working around the issue using a double pointer hack.

Use damon_for_each_region(), which is safe for this use case. And drop the workaround in damos_skip_charged_region().

Link: https://lkml.kernel.org/r/20260227170623.95384-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
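The skipped-region effect described above can be reproduced with a minimal userspace sketch. It assumes a plain singly linked list instead of the kernel's list helpers, and every name in it (struct region, split_at(), walk_safe(), walk_plain(), count_visits()) is hypothetical. The only point is that a walk which caches the next pointer before the loop body never sees a node inserted right after the current one.

```c
#include <stddef.h>

/* Hypothetical stand-in for a monitoring region on a singly linked list. */
struct region {
	unsigned long start, end;
	struct region *next;
};

/* Split r at addr: r keeps [start, addr), right takes [addr, end). */
static void split_at(struct region *r, struct region *right,
		     unsigned long addr)
{
	right->start = addr;
	right->end = r->end;
	right->next = r->next;
	r->end = addr;
	r->next = right;
}

/* "Safe" walk: the next pointer is cached before the body runs. */
static int walk_safe(struct region *head, struct region *spare,
		     unsigned long boundary)
{
	struct region *r, *next;
	int visits = 0;

	for (r = head; r; r = next) {
		next = r->next;
		if (spare && r->start < boundary && boundary < r->end) {
			split_at(r, spare, boundary);
			spare = NULL;	/* split only once */
		}
		visits++;
	}
	return visits;
}

/* Plain walk: the next pointer is read after the body runs. */
static int walk_plain(struct region *head, struct region *spare,
		      unsigned long boundary)
{
	struct region *r;
	int visits = 0;

	for (r = head; r; r = r->next) {
		if (spare && r->start < boundary && boundary < r->end) {
			split_at(r, spare, boundary);
			spare = NULL;
		}
		visits++;
	}
	return visits;
}

/* Walk [0, 100) -> [100, 200) with a filter boundary at 50. */
int count_visits(int use_safe)
{
	struct region b = { 100, 200, NULL };
	struct region a = { 0, 100, &b };
	struct region spare = { 0, 0, NULL };

	return use_safe ? walk_safe(&a, &spare, 50)
			: walk_plain(&a, &spare, 50);
}
```

Here count_visits(1) visits two regions and misses the split-off [50, 100), while count_visits(0) visits all three. In the kernel the missed region is presumably handled on a later pass, which matches the "a bit slower" wording in the message rather than a correctness bug.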
2026-04-05  mm/damon/core: set quota-score histogram with core filters  SeongJae Park  -0/+2
Patch series "mm/damon/core: improve DAMOS quota efficiency for core layer filters".

Improve the two problematic behaviors of DAMOS below, which make it less efficient when core layer filters are used.

DAMOS generates the under-quota regions prioritization-purpose access temperature histogram [1] with only the scheme target access pattern. The DAMOS filters are ignored on the histogram, and this can result in the scheme not being applied to eligible regions. To work around this, users had to use separate DAMON contexts. The memory tiering approaches are such examples.

DAMOS splits regions that intersect with address filters, so that only the filtered-out part of the region is skipped. But the implementation also skips the other part of the region, which is not filtered out. As a result, DAMOS can work slower than expected.

Improve the two inefficient behaviors with two patches, respectively. Read the patches for more details about the problems and how they are fixed.

This patch (of 2):

The histogram for under-quota region prioritization [1] is made for all regions that are eligible for the DAMOS target access pattern. When there are DAMOS filters, the prioritization-threshold access temperature that is generated from the histogram could be inaccurate.

For example, suppose there are three regions. Each region is 1 GiB. The access temperatures of the regions are 100, 50, and 0. And a DAMOS scheme that targets _any_ access temperature with quota 2 GiB is being used. The histogram will look like below:

    temperature    size of regions having >= temperature
    0              3 GiB
    50             2 GiB
    100            1 GiB

Based on the histogram and the quota (2 GiB), DAMOS applies the action to only the regions having >=50 temperature. This is all good.

Let's suppose the region of temperature 50 is excluded by a DAMOS filter. Regardless of the filter, DAMOS will try to apply the action on only regions having >=50 temperature. Because the region of temperature 50 is filtered out, the action is applied to only the region of temperature 100.

Worse yet, suppose the filter is excluding regions of temperature 50 and 100. Then no action is really applied to any region, while the region of temperature 0 is there.

People used to work around this by utilizing multiple contexts, instead of the core layer DAMOS filters. For example, DAMON-based memory tiering approaches including the quota auto-tuning based one [2] are using a DAMON context per NUMA node. If the above explained issue is effectively alleviated, those can be configured again to run with a single context and DAMOS filters for applying the promotion and demotion to only specific NUMA nodes.

Alleviate the problem by checking core DAMOS filters when generating the histogram. The reason to check only core filters is the overhead. While core filters are usually for coarse-grained filtering (e.g., target/address filters for process, NUMA, zone level filtering), operation layer filters are usually for fine-grained filtering (e.g., for anon page). Doing this for operation layer filters would cause significant overhead. There is no known use case that is affected by the operation layer filters-distorted histogram problem, though. Do this for only core filters for now. We will revisit this for operation layer filters in future. We might be able to apply a sort of sampling based operation layer filtering.

After this fix is applied, for the first case, in which a DAMOS filter excludes the region of temperature 50, the histogram will be like below:

    temperature    size of regions having >= temperature
    0              2 GiB
    100            1 GiB

And DAMOS will set the temperature threshold as 0, allowing the action to be applied to both regions of temperatures 0 and 100.

For the second case, in which a DAMOS filter excludes the regions of temperature 50 and 100, the histogram will be like below:

    temperature    size of regions having >= temperature
    0              1 GiB

And DAMOS will set the temperature threshold as 0, allowing the action to be applied to the region of temperature 0.

[1] 'Prioritization' section of Documentation/mm/damon/design.rst
[2] commit 0e1c773b501f ("mm/damon/core: introduce damos quota goal metrics for memory node utilization")

Link: https://lkml.kernel.org/r/20260227170623.95384-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260227170623.95384-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
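The three-region example in the message above can be replayed with a small userspace sketch. It is not the kernel implementation: the 0..100 temperature range, the GiB granularity, and all names (struct region, temperature_threshold(), applied_gib(), example_applied_gib()) are assumptions made for illustration. The threshold is found by accumulating region sizes from the hottest temperature downward until the quota is covered, and the honor_filters flag switches between the old behavior (filters ignored in the histogram) and the new one.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TEMP 100	/* assumed temperature range: 0..MAX_TEMP */

/* Hypothetical, simplified region: temperature must be <= MAX_TEMP. */
struct region {
	unsigned int temperature;
	unsigned long sz_gib;
	bool filtered_out;	/* excluded by a core-layer filter */
};

/* Lowest temperature that still receives the action under the quota. */
unsigned int temperature_threshold(const struct region *regions, size_t nr,
				   unsigned long quota_gib, bool honor_filters)
{
	unsigned long hist[MAX_TEMP + 1] = { 0 };
	unsigned long cumulated = 0;
	unsigned int t;
	size_t i;

	for (i = 0; i < nr; i++) {
		/* New behavior: filtered-out regions do not count. */
		if (honor_filters && regions[i].filtered_out)
			continue;
		hist[regions[i].temperature] += regions[i].sz_gib;
	}
	/* Walk from hottest to coldest until the quota is covered. */
	for (t = MAX_TEMP; t > 0; t--) {
		cumulated += hist[t];
		if (cumulated >= quota_gib)
			break;
	}
	return t;
}

/* Total size the action really reaches: filters always apply here. */
unsigned long applied_gib(const struct region *regions, size_t nr,
			  unsigned int threshold)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		if (!regions[i].filtered_out &&
		    regions[i].temperature >= threshold)
			sum += regions[i].sz_gib;
	return sum;
}

/* Regions of temperature 100, 50, 0; 1 GiB each; quota 2 GiB. */
unsigned long example_applied_gib(bool skip50, bool skip100,
				  bool honor_filters)
{
	struct region regions[] = {
		{ 100, 1, skip100 },
		{ 50, 1, skip50 },
		{ 0, 1, false },
	};
	unsigned int thres = temperature_threshold(regions, 3, 2,
						   honor_filters);

	return applied_gib(regions, 3, thres);
}
```

With the region of temperature 50 filtered out, the old histogram reaches 1 GiB and the filter-aware one reaches 2 GiB. With both 50 and 100 filtered out, the old histogram reaches nothing and the filter-aware one reaches the 1 GiB region of temperature 0, matching the cases walked through in the message.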
2026-04-05  mm/damon: remove unused target param of get_scheme_score()  Asier Gutierrez  -9/+7
The damon_target parameter is not used by the get_scheme_score operations, for either virtual or physical addresses. Remove the parameter.

Link: https://lkml.kernel.org/r/20260213145032.1740407-1-gutierrez.asier@huawei-partners.com
Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Quanmin Yan <yanquanmin1@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27  mm/damon/sysfs: check contexts->nr in repeat_call_fn  Josh Law  -0/+3
damon_sysfs_repeat_call_fn() calls damon_sysfs_upd_tuned_intervals(), damon_sysfs_upd_schemes_stats(), and damon_sysfs_upd_schemes_effective_quotas() without checking contexts->nr. If nr_contexts is set to 0 via sysfs while DAMON is running, these functions dereference contexts_arr[0] and cause a NULL pointer dereference. Add the missing check.

For example, the issue can be reproduced using DAMON sysfs interface and DAMON user-space tool (damo) [1] like below.

    $ sudo damo start --refresh_interval 1s
    $ echo 0 | sudo tee \
        /sys/kernel/mm/damon/admin/kdamonds/0/contexts/nr_contexts

Link: https://patch.msgid.link/20260320163559.178101-3-objecting@objecting.org
Link: https://lkml.kernel.org/r/20260321175427.86000-4-sj@kernel.org
Link: https://github.com/damonitor/damo [1]
Fixes: d809a7c64ba8 ("mm/damon/sysfs: implement refresh_ms file internal work")
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27  mm/damon/sysfs: check contexts->nr before accessing contexts_arr[0]  Josh Law  -0/+3
Multiple sysfs command paths dereference contexts_arr[0] without first verifying that kdamond->contexts->nr == 1. A user can set nr_contexts to 0 via sysfs while DAMON is running, causing NULL pointer dereferences.

In more detail, the issue can be triggered by privileged users like below.

First, start DAMON and make contexts directory empty (kdamond->contexts->nr == 0).

    # damo start
    # cd /sys/kernel/mm/damon/admin/kdamonds/0
    # echo 0 > contexts/nr_contexts

Then, each of below commands will cause the NULL pointer dereference.

    # echo update_schemes_stats > state
    # echo update_schemes_tried_regions > state
    # echo update_schemes_tried_bytes > state
    # echo update_schemes_effective_quotas > state
    # echo update_tuned_intervals > state

Guard all commands (except OFF) at the entry point of damon_sysfs_handle_cmd().

Link: https://lkml.kernel.org/r/20260321175427.86000-3-sj@kernel.org
Fixes: 0ac32b8affb5 ("mm/damon/sysfs: support DAMOS stats")
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [5.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
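The entry-point guard described above follows a common pattern, sketched below in userspace C. The structures, the command enum, and handle_cmd() are simplified hypothetical stand-ins rather than the actual sysfs code, and returning -EINVAL is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the sysfs-side structures. */
struct ctx {
	int nr_stat_updates;
};

struct contexts {
	struct ctx **contexts_arr;	/* NULL when nr == 0 */
	int nr;
};

enum cmd {
	CMD_OFF,
	CMD_UPDATE_SCHEMES_STATS,
};

/*
 * Reject every command except OFF unless exactly one context exists,
 * so the handlers below never touch contexts_arr[0] when it is absent.
 */
int handle_cmd(enum cmd cmd, struct contexts *contexts)
{
	if (cmd != CMD_OFF && contexts->nr != 1)
		return -EINVAL;

	switch (cmd) {
	case CMD_UPDATE_SCHEMES_STATS:
		/* Safe: the guard above ensured contexts_arr[0] exists. */
		contexts->contexts_arr[0]->nr_stat_updates++;
		return 0;
	case CMD_OFF:
	default:
		return 0;
	}
}
```

Checking once at the entry point, instead of in each handler, keeps new commands covered by default, and leaving OFF outside the guard lets a kdamond with an emptied contexts directory still be stopped.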