From c82389947d90c8b0dd206d9ef082191eb58df8fc Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Thu, 11 Apr 2024 12:20:57 +0200 Subject: tracing: Add sched_prepare_exec tracepoint Add "sched_prepare_exec" tracepoint, which is run right after the point of no return but before the current task assumes its new exec identity. Unlike the tracepoint "sched_process_exec", the "sched_prepare_exec" tracepoint runs before flushing the old exec, i.e. while the task still has the original state (such as original MM), but when the new exec either succeeds or crashes (but never returns to the original exec). Being able to trace this event can be helpful in a number of use cases: * allowing tracing eBPF programs access to the original MM on exec, before current->mm is replaced; * counting exec in the original task (via perf event); * profiling flush time ("sched_prepare_exec" to "sched_process_exec"). Example of tracing output: $ cat /sys/kernel/debug/tracing/trace_pipe <...>-379 [003] ..... 179.626921: sched_prepare_exec: interp=/usr/bin/sshd filename=/usr/bin/sshd pid=379 comm=sshd <...>-381 [002] ..... 180.048580: sched_prepare_exec: interp=/bin/bash filename=/bin/bash pid=381 comm=sshd <...>-385 [001] ..... 180.068277: sched_prepare_exec: interp=/usr/bin/tty filename=/usr/bin/tty pid=385 comm=bash <...>-389 [006] ..... 192.020147: sched_prepare_exec: interp=/usr/bin/dmesg filename=/usr/bin/dmesg pid=389 comm=bash Signed-off-by: Marco Elver Acked-by: Steven Rostedt (Google) Reviewed-by: Masami Hiramatsu (Google) Link: https://lore.kernel.org/r/20240411102158.1272267-1-elver@google.com Signed-off-by: Kees Cook --- fs/exec.c | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'fs/exec.c') diff --git a/fs/exec.c b/fs/exec.c index cf1df7f16e55..b3c40fbb325f 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1267,6 +1267,14 @@ int begin_new_exec(struct linux_binprm * bprm) if (retval) return retval; + /* + * This tracepoint marks the point before flushing the old exec where + * the current task is still unchanged, but errors are fatal (point of + * no return). The later "sched_process_exec" tracepoint is called after + * the current task has successfully switched to the new exec. + */ + trace_sched_prepare_exec(current, bprm); + /* * Ensure all future errors are fatal. */ -- cgit v1.2.3 From 3a9e567ca45fb5280065283d10d9a11f0db61d2b Mon Sep 17 00:00:00 2001 From: Jinjiang Tu Date: Thu, 28 Mar 2024 19:10:08 +0800 Subject: mm/ksm: fix ksm exec support for prctl Patch series "mm/ksm: fix ksm exec support for prctl", v4. commit 3c6f33b7273a ("mm/ksm: support fork/exec for prctl") inherits MMF_VM_MERGE_ANY flag when a task calls execve(). However, it doesn't create the mm_slot, so ksmd will not try to scan this task. The first patch fixes the issue. The second patch refactors to prepare for the third patch. The third patch extends the selftests of ksm to verfity the deduplication really happens after fork/exec inherits ths KSM setting. This patch (of 3): commit 3c6f33b7273a ("mm/ksm: support fork/exec for prctl") inherits MMF_VM_MERGE_ANY flag when a task calls execve(). Howerver, it doesn't create the mm_slot, so ksmd will not try to scan this task. To fix it, allocate and add the mm_slot to ksm_mm_head in __bprm_mm_init() when the mm has MMF_VM_MERGE_ANY flag. Link: https://lkml.kernel.org/r/20240328111010.1502191-1-tujinjiang@huawei.com Link: https://lkml.kernel.org/r/20240328111010.1502191-2-tujinjiang@huawei.com Fixes: 3c6f33b7273a ("mm/ksm: support fork/exec for prctl") Signed-off-by: Jinjiang Tu Reviewed-by: David Hildenbrand Cc: Johannes Weiner Cc: Kefeng Wang Cc: Nanyong Sun Cc: Rik van Riel Cc: Stefan Roesch Signed-off-by: Andrew Morton --- fs/exec.c | 11 +++++++++++ include/linux/ksm.h | 13 +++++++++++++ 2 files changed, 24 insertions(+) (limited to 'fs/exec.c') diff --git a/fs/exec.c b/fs/exec.c index cf1df7f16e55..0c5f06d08c35 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -67,6 +67,7 @@ #include #include #include +#include #include #include @@ -267,6 +268,14 @@ static int __bprm_mm_init(struct linux_binprm *bprm) goto err_free; } + /* + * Need to be called with mmap write lock + * held, to avoid race with ksmd. + */ + err = ksm_execve(mm); + if (err) + goto err_ksm; + /* * Place the stack at the largest stack address the architecture * supports. Later, we'll move this to an appropriate place. We don't @@ -288,6 +297,8 @@ static int __bprm_mm_init(struct linux_binprm *bprm) bprm->p = vma->vm_end - sizeof(void *); return 0; err: + ksm_exit(mm); +err_ksm: mmap_write_unlock(mm); err_free: bprm->vma = NULL; diff --git a/include/linux/ksm.h b/include/linux/ksm.h index 401348e9f92b..7e2b1de3996a 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -59,6 +59,14 @@ static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm) return 0; } +static inline int ksm_execve(struct mm_struct *mm) +{ + if (test_bit(MMF_VM_MERGE_ANY, &mm->flags)) + return __ksm_enter(mm); + + return 0; +} + static inline void ksm_exit(struct mm_struct *mm) { if (test_bit(MMF_VM_MERGEABLE, &mm->flags)) @@ -107,6 +115,11 @@ static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm) return 0; } +static inline int ksm_execve(struct mm_struct *mm) +{ + return 0; +} + static inline void ksm_exit(struct mm_struct *mm) { } -- cgit v1.2.3