Zero is a valid value for "preempt_dynamic_mode", namely
"preempt_dynamic_none".
Fix the off-by-one in preempt_model_str(), so that "preempt_dynamic_none"
is correctly formatted as PREEMPT(none) instead of PREEMPT(undef).
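A minimal sketch of the off-by-one pattern, assuming the enum layout in
kernel/sched/core.c where preempt_dynamic_undefined is -1 and
preempt_dynamic_none is 0 (illustrative, not the exact diff):
    /* buggy: "> none" makes mode 0 ("none") look undefined */
    if (preempt_dynamic_mode > preempt_dynamic_none)
            /* format PREEMPT(<mode>) */;
    /* fixed: ">= none", since zero is a valid mode */
    if (preempt_dynamic_mode >= preempt_dynamic_none)
            /* format PREEMPT(<mode>) */;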
Fixes: 8bdc5daaa01e ("sched: Add a generic function to return the preemption string")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250626-preempt-str-none-v2-1-526213b70a89@linutronix.de
|
|
Prerequisite for ARM[64] generic entry conversion
Merge it into the entry branch so further changes can be based on it.
|
|
Currently CONFIG_GENERIC_ENTRY enables both the generic exception
entry logic and the generic syscall entry logic, which are otherwise
loosely coupled.
Introduce separate config options for these so that architectures can
select the two independently. This will make it easier for
architectures to migrate to generic entry code.
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/20250213130007.1418890-2-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20250624-generic-entry-split-v1-1-53d5ef4f94df@linaro.org
[Linus Walleij: rebase onto v6.16-rc1]
|
|
Commit 3172fb986666 ("perf/core: Fix WARN in perf_cgroup_switch()") tried to
fix a concurrency problem between perf_cgroup_switch and
perf_cgroup_event_disable, but it did not move the WARN_ON_ONCE into the
lock-protected region, so the warning can still be triggered.
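A hedged sketch of the shape of the fix; the lock and the condition are
illustrative, the point is that the check must run under the lock that
serializes against perf_cgroup_switch():
    raw_spin_lock(&cpuctx->ctx.lock);
    WARN_ON_ONCE(cond);     /* 'cond' stands in for the racy check */
    /* ... state update serialized with perf_cgroup_switch() ... */
    raw_spin_unlock(&cpuctx->ctx.lock);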
Fixes: 3172fb986666 ("perf/core: Fix WARN in perf_cgroup_switch()")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250626135403.2454105-1-luogengkun@huaweicloud.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- Make sure an AUX perf event is really disabled when it overruns
* tag 'perf_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/aux: Fix pending disable flow when the AUX ring buffer overruns
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
- Fix possible UAF on error path in filter_free_subsystem_filters()
When freeing a subsystem filter, the filter for the subsystem is
passed in to be freed and all the events within the subsystem will
have their filter freed too. In order to free without waiting for RCU
synchronization, list items are allocated to hold what is going to be
freed, and the actual freeing is done via call_rcu(). If the allocation
of these items fails, the code falls back to synchronizing directly and
freeing after that (causing a bit of delay for the user).
The subsystem filter is first added to this list and then the filters
for all the events under the subsystem. The bug is that if one of the
allocations of the list items for the event filters fails, the code
jumps to the "free_now" label, which frees the subsystem filter, then
all the items on the allocated list, and then the event filters that
were not added to the list yet. But because the subsystem filter was
added first, it gets freed twice.
The solution is to add the subsystem filter after the events; then, if
any of the allocations fail, nothing is freed twice.
* tag 'trace-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix filter logic error
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes.
6 are cc:stable and the remainder address post-6.15 issues or aren't
considered necessary for -stable kernels. 5 are for MM"
* tag 'mm-hotfixes-stable-2025-06-27-16-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: add Lorenzo as THP co-maintainer
mailmap: update Duje Mihanović's email address
selftests/mm: fix validate_addr() helper
crashdump: add CONFIG_KEYS dependency
mailmap: correct name for a historical account of Zijun Hu
mailmap: add entries for Zijun Hu
fuse: fix runtime warning on truncate_folio_batch_exceptionals()
scripts/gdb: fix dentry_name() lookup
mm/damon/sysfs-schemes: free old damon_sysfs_scheme_filter->memcg_path on write
mm/alloc_tag: fix the kmemleak false positive issue in the allocation of the percpu variable tag->counters
lib/group_cpus: fix NULL pointer dereference from group_cpus_evenly()
mm/hugetlb: remove unnecessary holding of hugetlb_lock
MAINTAINERS: add missing files to mm page alloc section
MAINTAINERS: add tree entry to mm init block
mm: add OOM killer maintainer structure
fs/proc/task_mmu: fix PAGE_IS_PFNZERO detection for the huge zero folio
|
|
If the processing of the tr->events loop fails, the filter that has been
added to filter_head will be freed twice, once in free_filter_list(&head->rcu)
and again in __free_filter(filter).
Add the filter to filter_head only after the filters of tr->events have
been processed, to avoid triggering the use-after-free.
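A hedged sketch of the reordering (helper and variable names are
illustrative):
    list_for_each_entry(file, &tr->events, list) {
            if (add_to_free_list(head, file->filter))  /* may fail */
                    goto free_now;  /* nothing is on the list twice */
    }
    /* add the subsystem filter only after all event filters succeeded */
    add_to_free_list(head, system->filter);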
Link: https://lore.kernel.org/tencent_4EF87A626D702F816CD0951CE956EC32CD0A@qq.com
Fixes: a9d0aab5eb33 ("tracing: Fix regression of filter waiting a long time on RCU synchronization")
Reported-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=daba72c4af9915e9c894
Tested-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The function bpf_cgroup_read_xattr is defined in fs/bpf_fs_kfuncs.c,
which is compiled only when CONFIG_BPF_LSM is set. Add a CONFIG_BPF_LSM
check to the bpf_cgroup_read_xattr spec in common_btf_ids in
kernel/bpf/helpers.c to avoid build failures for configs without
CONFIG_BPF_LSM.
Build failure example:
BTF .tmp_vmlinux1.btf.o
btf_encoder__tag_kfunc: failed to find kfunc 'bpf_cgroup_read_xattr' in BTF
...
WARN: resolve_btfids: unresolved symbol bpf_cgroup_read_xattr
make[2]: *** [scripts/Makefile.vmlinux:91: vmlinux.unstripped] Error 255
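A hedged sketch of the guard (the KF_ flags shown are assumed):
    #ifdef CONFIG_BPF_LSM
    BTF_ID_FLAGS(func, bpf_cgroup_read_xattr, KF_RCU)
    #endif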
Fixes: 535b070f4a80 ("bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node")
Reported-by: Jake Hillion <jakehillion@meta.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250627175309.2710973-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Auxiliary clocks are disabled by default and attempts to access them
fail.
Provide an interface to enable/disable them at run-time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183758.444626478@linutronix.de
|
|
Update the auxiliary timekeepers periodically. For now this is tied to the system
timekeeper update from the tick. This might be revisited and moved out of the tick.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183758.382451331@linutronix.de
|
|
The behaviour is close to clock_adjtime(CLOCK_REALTIME) with the
following differences:
1) ADJ_SETOFFSET adjusts the auxiliary clock offset
2) ADJ_TAI is not supported
3) Leap seconds are not supported
4) PPS is not supported
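A hedged userspace sketch of point 1 above, using the standard
clock_adjtime(2) API; the auxiliary clock id (CLOCK_AUX here) is
assumed from this series:
    #include <sys/timex.h>

    struct timex tx = {
            .modes = ADJ_SETOFFSET,
            .time  = { .tv_sec = 1, .tv_usec = 500000 },  /* +1.5s */
    };
    if (clock_adjtime(CLOCK_AUX, &tx) < 0)
            perror("clock_adjtime");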
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183758.317946543@linutronix.de
|
|
Exclude ADJ_TAI, leap seconds and PPS functionality as they make no sense
in the context of auxiliary clocks and provide a time stamp based on the
actual clock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183758.253203783@linutronix.de
|
|
Split out the actual functionality of adjtimex() and make do_adjtimex() a
wrapper which feeds the core timekeeper into it and handles the result
including audit at the call site.
This allows the actual functionality to be reused for auxiliary clocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183758.187322876@linutronix.de
|
|
Redirect the relative offset adjustment to the auxiliary clock offset
instead of modifying CLOCK_REALTIME, which has no meaning in the context
of these clocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183758.124057787@linutronix.de
|
|
Split out the inner workings for auxiliary clock support and feed the
core timekeeper into it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183758.059934561@linutronix.de
|
|
Add clock_settime(2) support for auxiliary clocks. The function affects the
AUX offset which is added to the "monotonic" clock readout of these clocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183757.995688714@linutronix.de
|
|
Provide clock_getres(2) and clock_gettime(2) for auxiliary clocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183757.932220594@linutronix.de
|
|
Provide interfaces similar to the ktime_get*() family which provide access
to the auxiliary clocks.
These interfaces have a boolean return value, which indicates whether the
accessed clock is valid or not.
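A hedged in-kernel usage sketch; the accessor name is assumed to follow
the ktime_get*() pattern, only the boolean validity contract is taken
from this commit:
    ktime_t now;

    if (!ktime_get_aux(clockid, &now))
            return -ENODEV;  /* auxiliary clock not valid/enabled */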
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183757.868342628@linutronix.de
|
|
Propagate a system clocksource change to the auxiliary timekeepers so that
they can pick up the new clocksource.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250625183757.803890875@linutronix.de
|
|
Documentation comments for bpf_strnlen and bpf_strcspn contained
incorrect function names.
Fixes: e91370550f1f ("bpf: Add kfuncs for read-only string operations")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/bpf/20250627174759.3a435f86@canb.auug.org.au/T/#u
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/20250627082001.237606-1-vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Merge branch 'vfs-6.17.bpf' from vfs tree into bpf-next/master
and resolve conflicts.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
smp_call_function_many_cond() opencodes cpumask_any_but().
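A hedged sketch of the cleanup; the open-coded scan
    for_each_cpu(cpu, mask)
            if (cpu != this_cpu)
                    break;
collapses to
    cpu = cpumask_any_but(mask, this_cpu);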
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250623000010.10124-3-yury.norov@gmail.com
|
|
smp_call_function_any() tries to make a local call as it's the cheapest
option, or switches to a CPU in the same node. If that is not possible,
the algorithm gives up and searches for any CPU, in numerical order.
Instead, it can search for the best CPU based on NUMA locality, including
the 2nd nearest hop (a set of equidistant nodes), and higher.
sched_numa_find_nth_cpu() does exactly that, and also helps to drop most
of the housekeeping code.
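A hedged sketch of the replacement; sched_numa_find_nth_cpu() returns
the Nth CPU in the mask ordered by NUMA distance from the given node:
    cpu = sched_numa_find_nth_cpu(mask, 0, numa_node_id());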
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250623000010.10124-2-yury.norov@gmail.com
|
|
Currently swap is restricted before drivers have had a chance to do
their prepare() PM callbacks. Restricting swap this early means that if
a driver needs to evict some content from memory into swap in its
prepare() callback, it won't be able to.
On AMD dGPUs this can lead to failed suspends under memory pressure
situations as all VRAM must be evicted to system memory or swap.
Move the swap restriction to right after all devices have had a chance
to do the prepare() callback. If there is any problem with the sequence,
restore swap in the appropriate dpm resume callbacks or error handling
paths.
Closes: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Nat Wittstock <nat@fardog.io>
Tested-by: Lucian Langa <lucilanga@7pot.org>
Link: https://patch.msgid.link/20250613214413.4127087-1-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Cross-merge BPF, perf and other fixes after downstream PRs.
It restores BPF CI to green after the critical fix
commit bc4394e5e79c ("perf: Fix the throttle error of some clock events")
No conflicts.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
String operations are commonly used so this exposes the most common ones
to BPF programs. For now, we limit ourselves to operations which do not
copy memory around.
Unfortunately, most in-kernel implementations assume that strings are
%NUL-terminated, which is not necessarily true, and therefore we cannot
use them directly in the BPF context. Instead, we open-code them using
__get_kernel_nofault instead of plain dereference to make them safe and
limit the string length to XATTR_SIZE_MAX to make sure the functions
terminate. When __get_kernel_nofault fails, functions return -EFAULT.
Similarly, when the size bound is reached, the functions return -E2BIG.
In addition, we return -ERANGE when the passed strings are outside of
the kernel address space.
Note that thanks to these dynamic safety checks, no other constraints
are put on the kfunc args (they are marked with the "__ign" suffix to
skip any verifier checks for them).
All of the functions return integers, including functions which normally
(in kernel or libc) return pointers to the strings. The reason is that
since the strings are generally treated as unsafe, the pointers couldn't
be dereferenced anyway. So, instead, we return an index into the string
and let the user decide what to do with it. This also nicely fits with
returning various error codes when necessary (see above).
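A hedged BPF-side usage sketch; the extern declaration mirrors the
usual way kfuncs are pulled in, the authoritative prototype lives in
the kernel BTF:
    extern int bpf_strnlen(const char *s, size_t count) __ksym;

    int len = bpf_strnlen(str, 64);
    if (len < 0)
            return 0;  /* -EFAULT, -E2BIG or -ERANGE, as described above */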
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/4b008a6212852c1b056a413f86e3efddac73551c.1750917800.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
If an AUX event overruns, the event core layer intends to disable the
event by setting the 'pending_disable' flag. Unfortunately, the event
is not actually disabled afterwards.
In commit:
ca6c21327c6a ("perf: Fix missing SIGTRAPs")
the 'pending_disable' flag was changed to a boolean. However, the
AUX event code was not updated accordingly. The flag ends up holding a
CPU number. If this number is zero, the flag is taken as false and the
IRQ work is never triggered.
Later, with commit:
2b84def990d3 ("perf: Split __perf_pending_irq() out of perf_pending_irq()")
a new IRQ work 'pending_disable_irq' was introduced to handle event
disabling. The AUX event path was not updated to kick off the work queue.
To fix this bug, when an AUX ring buffer overrun is detected, call
perf_event_disable_inatomic() to initiate the pending disable flow.
Also update the outdated comment for setting the flag, to reflect the
boolean values (0 or 1).
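A hedged sketch of the fix in the AUX overrun path (the truncation
check is assumed from the overrun semantics):
    if (handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)
            perf_event_disable_inatomic(handle->event);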
Fixes: 2b84def990d3 ("perf: Split __perf_pending_irq() out of perf_pending_irq()")
Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Liang Kan <kan.liang@linux.intel.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20250625170737.2918295-1-leo.yan@arm.com
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix use-after-free in libbpf when map is resized (Adin Scannell)
- Fix verifier assumptions about 2nd argument of bpf_sysctl_get_name
(Jerome Marchand)
- Fix verifier assumption of nullness of d_inode in dentry (Song Liu)
- Fix global starvation of LRU map (Willem de Bruijn)
- Fix potential NULL dereference in btf_dump__free (Yuan Chen)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: adapt one more case in test_lru_map to the new target_free
libbpf: Fix possible use-after-free for externs
selftests/bpf: Convert test_sysctl to prog_tests
bpf: Specify access type of bpf_sysctl_get_name args
libbpf: Fix null pointer dereference in btf_dump__free on allocation failure
bpf: Adjust free target to avoid global starvation of LRU map
bpf: Mark dentry->d_inode as trusted_or_null
|
|
Pull mount fixes from Al Viro:
"Several mount-related fixes"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
userns and mnt_idmap leak in open_tree_attr(2)
attach_recursive_mnt(): do not lock the covering tree when sliding something under it
replace collect_mounts()/drop_collected_mounts() with a safer variant
|
|
sched_ext performed a kfunc renaming pass in 6.13 and kept the old names
around for compatibility with old binaries. These were scheduled for
cleanup in 6.15 but were missed. Submitting for cleanup in for-next.
Removed the kfuncs, their flags, and any references I could find to them
in doc comments. Left the entries in include/scx/compat.bpf.h as they're
still useful to make new binaries compatible with old kernels.
Tested by applying to my kernel. It builds and a modern version of
scx_lavd loads fine.
Signed-off-by: Jake Hillion <jake@hillion.co.uk>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The dm_crypt code fails to build without CONFIG_KEYS:
kernel/crash_dump_dm_crypt.c: In function 'restore_dm_crypt_keys_to_thread_keyring':
kernel/crash_dump_dm_crypt.c:105:9: error: unknown type name 'key_ref_t'; did you mean 'key_ref_put'?
There is a mix of 'select KEYS' and 'depends on KEYS' in Kconfig,
so there is no single obvious solution here, but generally using 'depends on'
makes more sense and is less likely to cause dependency loops.
Link: https://lkml.kernel.org/r/20250620112140.3396316-1-arnd@kernel.org
Fixes: 62f17d9df692 ("crash_dump: retrieve dm crypt keys in kdump kernel")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Coiby Xu <coxu@redhat.com>
Cc: Dave Vasilevsky <dave@vasilevsky.ca>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are places in BPF code which check if a BTF type is an integer
of particular size. This code can be made simpler by using helpers.
Add new btf_type_is_i{32,64} helpers, and simplify code in a few
files. (Suggested by Eduard for a patch which copy-pasted such a
check [1].)
v1 -> v2:
* export less generic helpers (Eduard)
* make subject less generic than in [v1] (Eduard)
[1] https://lore.kernel.org/bpf/7edb47e73baa46705119a23c6bf4af26517a640f.camel@gmail.com/
[v1] https://lore.kernel.org/bpf/20250624193655.733050-1-a.s.protopopov@gmail.com/
Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625151621.1000584-1-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce support for `bpf_rdonly_cast(v, 0)`, which casts the value
`v` to an untyped, untrusted pointer, logically similar to a `void *`.
The memory pointed to by such a pointer is treated as read-only.
As with other untrusted pointers, memory access violations on loads
return zero instead of causing a fault.
Technically:
- The resulting pointer is represented as a register of type
`PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED` with size zero.
- Offsets within such pointers are not tracked.
- The same load instruction is allowed to have both
  `PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED` and `PTR_TO_BTF_ID`
  as base pointer types.
In such cases, `bpf_insn_aux_data->ptr_type` is considered the
weaker of the two: `PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED`.
The following constraints apply to the new pointer type:
- can be used as a base for LDX instructions;
- can't be used as a base for ST/STX or atomic instructions;
- can't be used as parameter for kfuncs or helpers.
These constraints are enforced by existing handling of `MEM_RDONLY`
flag and `PTR_TO_MEM` of size zero.
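A hedged usage sketch following the constraints above ('v' is the
scalar from the description; the kfunc declaration is omitted):
    u64 x;
    char *p = bpf_rdonly_cast(v, 0);  /* untyped, read-only */

    x = *(u64 *)p;   /* LDX is fine; faulting loads read as zero */
    /* *(u64 *)p = 1;   would be rejected: no ST/STX through it */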
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625182414.30659-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This commit adds a kernel-side enum for use in conjunction with the BTF
CO-RE bpf_core_enum_value_exists. The goal of the enum is to assist
with detection of available BPF features. Intended usage looks as
follows:
if (bpf_core_enum_value_exists(enum bpf_features, BPF_FEAT_<f>))
... use feature f ...
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625182414.30659-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add range tracking for instruction BPF_NEG. Without this logic, a trivial
program like the following will fail
volatile bool found_value_b;
SEC("lsm.s/socket_connect")
int BPF_PROG(test_socket_connect)
{
if (!found_value_b)
return -1;
return 0;
}
with verifier log:
"At program exit the register R0 has smin=0 smax=4294967295 should have
been in [-4095, 0]".
This is because range information is lost in BPF_NEG:
0: R1=ctx() R10=fp0
; if (!found_value_b) @ xxxx.c:24
0: (18) r1 = 0xffa00000011e7048 ; R1_w=map_value(...)
2: (71) r0 = *(u8 *)(r1 +0) ; R0_w=scalar(smin32=0,smax=255)
3: (a4) w0 ^= 1 ; R0_w=scalar(smin32=0,smax=255)
4: (84) w0 = -w0 ; R0_w=scalar(range info lost)
Note that the log above is manually modified to highlight the relevant bits.
Fix this by maintaining proper range information with BPF_NEG, so that
the verifier will know:
4: (84) w0 = -w0 ; R0_w=scalar(smin32=-255,smax=0)
Also updated selftests based on the expected behavior.
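A hedged sketch of the interval rule behind the fix; register names are
illustrative and the real patch also deals with the overflow corner
cases:
    /* negating [smin, smax] yields [-smax, -smin] */
    dst_reg->smin_value = -src_smax;
    dst_reg->smax_value = -src_smin;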
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250625164025.3310203-2-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
For kernels built with CONFIG_PREEMPT_RT=y, running rcutorture
tests resulted in the following splat:
[ 68.797425] rcutorture_one_extend_check during change: Current 0x1 To add 0x1 To remove 0x0 preempt_count() 0x0
[ 68.797533] WARNING: CPU: 2 PID: 512 at kernel/rcu/rcutorture.c:1993 rcutorture_one_extend_check+0x419/0x560 [rcutorture]
[ 68.797601] Call Trace:
[ 68.797602] <TASK>
[ 68.797619] ? lockdep_softirqs_off+0xa5/0x160
[ 68.797631] rcutorture_one_extend+0x18e/0xcc0 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797646] ? local_clock+0x19/0x40
[ 68.797659] rcu_torture_one_read+0xf0/0x280 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797678] ? __pfx_rcu_torture_one_read+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797804] ? __pfx_rcu_torture_timer+0x10/0x10 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797815] rcu-torture: rcu_torture_reader task started
[ 68.797824] rcu-torture: Creating rcu_torture_reader task
[ 68.797824] rcu_torture_reader+0x238/0x580 [rcutorture 2466dbd2ff34dbaa36049cb323a80c3306ac997c]
[ 68.797836] ? kvm_sched_clock_read+0x15/0x30
Disabling BH does not change the SOFTIRQ bits in
preempt_count() on RT kernels, so this commit instead uses
softirq_count() to check whether BH is disabled.
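A hedged sketch of the check change; on RT, BH disable is kept out of
preempt_count() but softirq_count() still tracks it:
    - if (preempt_count() & SOFTIRQ_MASK)
    + if (softirq_count())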
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Trivial RCU is a textbook implementation that is not used in the
Linux kernel, but tested to keep textbooks (and presentations) honest.
It is so trivial that it cannot deal with either CPU hotplug or external
migration from one CPU to another. This commit therefore splats whenever
onoff_interval or shuffle_interval are non-zero, and then sets them to
zero in order to avoid false-positive failures.
Those wishing to set these module parameters in order to force failures
in Trivial RCU are free to revert this commit. Just don't expect me to
be sympathetic to any resulting bug reports!
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202505131651.af6e81d7-lkp@intel.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Given that the rcutorture_one_extend_check() function now uses
in_serving_softirq() and in_hardirq(), it is no longer necessary to pass
insoftirq flags down the function-call stack. This commit therefore
removes those flags, and, while in the area, does a bit of whitespace
cleanup.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
This commit prints the number of RCU up/down readers and the number
of such readers that migrated from one CPU to another, along
with the rest of the periodic rcu_torture_stats_print() output.
These statistics are currently used only by srcu_down_read{,_fast}()
and srcu_up_read{,_fast}().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
The design of testing of up/down readers such as srcu_down_read()
and srcu_up_read() assumes that these are tested only by the
rcu_torture_updown() kthread, and never by the rcu_torture_reader()
kthread. Because we all know which road is paved with good intentions,
this commit adds WARN_ON_ONCE() to verify that things are going to plan.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
This commit creates counters in the rcu_torture_one_read_state_updown
structure that check for a call to ->up_read() that lacks a matching
call to ->down_read().
While in the area, add end-of-run cleanup code that prevents calls to
rcu_torture_updown_hrt() from happening after the test has moved on. Yes,
the srcu_barrier() at the end of the test will wait for them, but this
could result in confusing states, statistics, and diagnostic information.
So explicitly wait for them before we get to the end-of-test output.
[ paulmck: Apply kernel test robot feedback. ]
[ joel: Apply Boqun's fix for counter increment ordering. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
The down/up SRCU reader testing uses an hrtimer handler to exit the SRCU
read-side critical section. This might be delayed, and if delayed for
too long, it can prevent the rcutorture run from completing. This commit
therefore complains if the hrtimer handler is delayed for more than
ten seconds.
[ paulmck, joel: Apply kernel test robot feedback to avoid
false-positive complaint of excessive ->up_read() delays by using
HRTIMER_MODE_HARD ]
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
This is strictly a code-movement commit, pulling that part of
the rcu_torture_updown() function's loop body that processes
one rcu_torture_one_read_state_updown structure into a new
rcu_torture_updown_one() function. The checks for the end of the
torture test and the current structure being in use remain in the
rcu_torture_updown() function.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
This commit adds a new rcutorture.n_up_down kernel boot parameter
that specifies the number of outstanding SRCU up/down readers, which
begin in kthread context and end in an hrtimer handler. There is a new
kthread ("rcu_torture_updown") that scans a per-reader array looking
for elements whose readers have ended. This kthread sleeps between one
and two milliseconds between consecutive scans.
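A usage sketch on the kernel command line (the value is illustrative):
    rcutorture.n_up_down=32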
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Apply Z qiang feedback. ]
[ joel: Fix build error: hrtimer_init is replaced by hrtimer_setup. ]
[ joel: Apply Boqun bug fix to drop extra up_read() call in
rcu_torture_updown()].
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
This commit retrospectively prepares for testing of RCU readers invoked
from hardware interrupt handlers (for example, HRTIMER_MODE_HARD hrtimer
handlers) in kernels built with CONFIG_RCU_TORTURE_TEST_CHK_RDR_STATE=y,
which is rarely used but sometimes extremely useful. This preparation
involves taking early exits if in_hardirq(), and, while we are in the
area, a very early exit if in_nmi().
This means that a number of insoftirq parameters are no longer needed,
but that is the subject of a later commit.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202505140917.8ee62cc6-lkp@intel.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
Testing of rcutorture's SRCU-P scenario on a large arm64 system resulted
in rcu_torture_writer() forward-progress failures, but these same tests
passed on x86. After some off-list discussion of possible memory-ordering
causes for these failures, Boqun showed that these were in fact due to
reordering, but by the scheduler, not by the memory system. On x86,
rcu_torture_writer() would have run quickly enough that by the time
the rcu_torture_updown() kthread started, the rcu_torture_current
variable would already be initialized, thus avoiding a bug in which
a NULL value would cause rcu_torture_updown() to do an extra call to
srcu_up_read_fast().
This commit therefore moves creation of the rcu_torture_writer() kthread
after that of the rcu_torture_reader() kthreads. This results in
deterministic failures on x86.
What about the double-srcu_up_read_fast() bug? Boqun has the fix.
But let's also fix the test while we are at it!
Reported-by: Joel Fernandes <joelagnelf@nvidia.com>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
The rcu_torture_writer() function scans the memory blocks after a stutter
(or forced idle) interval, complaining about any that have not passed
through ten grace periods since the start of the stutter interval.
But one splat suffices, so this commit stops at the first splat.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
RCU relies on the context tracking nesting counter in order to determine
if it is running in extended quiescent state.
However the context tracking nesting counter is not completely
synchronized with the actual context tracking state:
* The nesting counter is set to 1 or incremented further _after_ the
actual state is set to RCU watching.
* The nesting counter is set to 0 or decremented further _before_ the
actual state is set to RCU not watching.
Therefore it is safe to assume that if ct_nesting() > 0, RCU is
watching. But if ct_nesting() <= 0, RCU is not watching except for tiny
windows.
This hasn't been a problem so far because rcu_is_cpu_rrupt_from_idle()
has only been called from interrupts. However the code is confusing
and abuses the role of the context tracking nesting counter while there
are more accurate indicators available.
Clarify and robustify accordingly.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|
|
When a grace period is started, the ->expmask of each node is set up
from sync_exp_reset_tree(). Then later on each leaf node also initialize
its ->exp_tasks pointer.
This means that the initialization of the quiescent state of a node and
the initialization of its blocking tasks happen with an unlocked node
gap in-between.
It happens to be fine because nothing is expected to report an exp
quiescent state within this gap, since no IPIs have been issued yet and
every rdp's ->cpu_no_qs.b.exp should be false.
However if it were to happen by accident, the quiescent state could be
reported and propagated while ignoring tasks that blocked _before_ the
start of the grace period.
Prevent such trouble from happening in the future and initialize both the
quiescent states mask to report and the blocked tasks head from within the
same node-locked block.
If a task blocks within an RCU read side critical section before
sync_exp_reset_tree() is called and is then unblocked between
sync_exp_reset_tree() and __sync_rcu_exp_select_node_cpus(), the QS
won't be reported because no RCU exp IPI had been issued to request it
through the setting of rdp->cpu_no_qs.b.exp.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
|