<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/events/core.c, branch v5.15</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.15</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.15'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-10-01T11:57:54Z</updated>
<entry>
<title>perf/core: fix userpage-&gt;time_enabled of inactive events</title>
<updated>2021-10-01T11:57:54Z</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2021-09-29T19:43:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f792565326825ed806626da50c6f9a928f1079c1'/>
<id>urn:sha1:f792565326825ed806626da50c6f9a928f1079c1</id>
<content type='text'>
Users of rdpmc rely on the mmapped user page to calculate accurate
time_enabled. Currently, userpage-&gt;time_enabled is only updated when the
event is added to the pmu. As a result, inactive event (due to counter
multiplexing) does not have accurate userpage-&gt;time_enabled. This can
be reproduced with something like:

   /* open 20 task perf_event "cycles", to create multiplexing */

   fd = perf_event_open();  /* open task perf_event "cycles" */
   userpage = mmap(fd);     /* use mmap and rdmpc */

   while (true) {
     time_enabled_mmap = xxx; /* use logic in perf_event_mmap_page */
     time_enabled_read = read(fd).time_enabled;
     if (time_enabled_mmap &gt; time_enabled_read)
         BUG();
   }

Fix this by updating userpage for inactive events in merge_sched_in.

Suggested-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reported-and-tested-by: Lucian Grijincu &lt;lucian@fb.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20210929194313.2398474-1-songliubraving@fb.com
</content>
</entry>
<entry>
<title>events: Reuse value read using READ_ONCE instead of re-reading it</title>
<updated>2021-09-15T15:49:06Z</updated>
<author>
<name>Baptiste Lepers</name>
<email>baptiste.lepers@gmail.com</email>
</author>
<published>2021-09-06T01:53:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b89a05b21f46150ac10a962aa50109250b56b03b'/>
<id>urn:sha1:b89a05b21f46150ac10a962aa50109250b56b03b</id>
<content type='text'>
In perf_event_addr_filters_apply, the task associated with
the event (event-&gt;ctx-&gt;task) is read using READ_ONCE at the beginning
of the function, checked, and then re-read from event-&gt;ctx-&gt;task,
voiding all guarantees of the checks. Reuse the value that was read by
READ_ONCE to ensure the consistency of the task struct throughout the
function.

Fixes: 375637bc52495 ("perf/core: Introduce address range filtering")
Signed-off-by: Baptiste Lepers &lt;baptiste.lepers@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20210906015310.12802-1-baptiste.lepers@gmail.com
</content>
</entry>
<entry>
<title>Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux</title>
<updated>2021-09-04T18:35:47Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-09-04T18:35:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=49624efa65ac9889f4e7c7b2452b2e6ce42ba37d'/>
<id>urn:sha1:49624efa65ac9889f4e7c7b2452b2e6ce42ba37d</id>
<content type='text'>
Pull MAP_DENYWRITE removal from David Hildenbrand:
 "Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove
  VM_DENYWRITE.

  There are some (minor) user-visible changes:

   - We no longer deny write access to shared libaries loaded via legacy
     uselib(); this behavior matches modern user space e.g. dlopen().

   - We no longer deny write access to the elf interpreter after exec
     completed, treating it just like shared libraries (which it often
     is).

   - We always deny write access to the file linked via /proc/pid/exe:
     sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the
     file cannot be denied, and write access to the file will remain
     denied until the link is effectivel gone (exec, termination,
     sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file.

  Cross-compiled for a bunch of architectures (alpha, microblaze, i386,
  s390x, ...) and verified via ltp that especially the relevant tests
  (i.e., creat07 and execve04) continue working as expected"

* tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux:
  fs: update documentation of get_write_access() and friends
  mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
  mm: remove VM_DENYWRITE
  binfmt: remove in-tree usage of MAP_DENYWRITE
  kernel/fork: always deny write access to current MM exe_file
  kernel/fork: factor out replacing the current MM exe_file
  binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
</content>
</entry>
<entry>
<title>mm: remove VM_DENYWRITE</title>
<updated>2021-09-03T16:42:01Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2021-04-22T10:08:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8d0920bde5eb8ec7e567939b85e65a0596c8580d'/>
<id>urn:sha1:8d0920bde5eb8ec7e567939b85e65a0596c8580d</id>
<content type='text'>
All in-tree users of MAP_DENYWRITE are gone. MAP_DENYWRITE cannot be
set from user space, so all users are gone; let's remove it.

Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
</content>
</entry>
<entry>
<title>bpf: Fix NULL event-&gt;prog pointer access in bpf_overflow_handler</title>
<updated>2021-08-19T18:49:19Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2021-08-19T15:52:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=594286b7574c6e8217b1c233cc0d0650f2268a77'/>
<id>urn:sha1:594286b7574c6e8217b1c233cc0d0650f2268a77</id>
<content type='text'>
Andrii reported that libbpf CI hit the following oops when
running selftest send_signal:
  [ 1243.160719] BUG: kernel NULL pointer dereference, address: 0000000000000030
  [ 1243.161066] #PF: supervisor read access in kernel mode
  [ 1243.161066] #PF: error_code(0x0000) - not-present page
  [ 1243.161066] PGD 0 P4D 0
  [ 1243.161066] Oops: 0000 [#1] PREEMPT SMP NOPTI
  [ 1243.161066] CPU: 1 PID: 882 Comm: new_name Tainted: G           O      5.14.0-rc5 #1
  [ 1243.161066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [ 1243.161066] RIP: 0010:bpf_overflow_handler+0x9a/0x1e0
  [ 1243.161066] Code: 5a 84 c0 0f 84 06 01 00 00 be 66 02 00 00 48 c7 c7 6d 96 07 82 48 8b ab 18 05 00 00 e8 df 55 eb ff 66 90 48 8d 75 48 48 89 e7 &lt;ff&gt; 55 30 41 89 c4 e8 fb c1 f0 ff 84 c0 0f 84 94 00 00 00 e8 6e 0f
  [ 1243.161066] RSP: 0018:ffffc900000c0d80 EFLAGS: 00000046
  [ 1243.161066] RAX: 0000000000000002 RBX: ffff8881002e0dd0 RCX: 00000000b4b47cf8
  [ 1243.161066] RDX: ffffffff811dcb06 RSI: 0000000000000048 RDI: ffffc900000c0d80
  [ 1243.161066] RBP: 0000000000000000 R08: 0000000000000000 R09: 1a9d56bb00000000
  [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: 0000000000000000
  [ 1243.161066] R13: ffffc900000c0e00 R14: ffffc900001c3c68 R15: 0000000000000082
  [ 1243.161066] FS:  00007fc0be2d3380(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
  [ 1243.161066] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1243.161066] CR2: 0000000000000030 CR3: 0000000104f8e000 CR4: 00000000000006e0
  [ 1243.161066] Call Trace:
  [ 1243.161066]  &lt;IRQ&gt;
  [ 1243.161066]  __perf_event_overflow+0x4f/0xf0
  [ 1243.161066]  perf_swevent_hrtimer+0x116/0x130
  [ 1243.161066]  ? __lock_acquire+0x378/0x2730
  [ 1243.161066]  ? __lock_acquire+0x372/0x2730
  [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
  [ 1243.161066]  ? find_held_lock+0x2b/0x80
  [ 1243.161066]  ? lock_is_held_type+0xd5/0x130
  [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
  [ 1243.161066]  ? perf_event_groups_first+0x80/0x80
  [ 1243.161066]  __hrtimer_run_queues+0x1a3/0x460
  [ 1243.161066]  hrtimer_interrupt+0x110/0x220
  [ 1243.161066]  __sysvec_apic_timer_interrupt+0x8a/0x260
  [ 1243.161066]  sysvec_apic_timer_interrupt+0x89/0xc0
  [ 1243.161066]  &lt;/IRQ&gt;
  [ 1243.161066]  asm_sysvec_apic_timer_interrupt+0x12/0x20
  [ 1243.161066] RIP: 0010:finish_task_switch+0xaf/0x250
  [ 1243.161066] Code: 31 f6 68 90 2a 09 81 49 8d 7c 24 18 e8 aa d6 03 00 4c 89 e7 e8 12 ff ff ff 4c 89 e7 e8 ca 9c 80 00 e8 35 af 0d 00 fb 4d 85 f6 &lt;58&gt; 74 1d 65 48 8b 04 25 c0 6d 01 00 4c 3b b0 a0 04 00 00 74 37 f0
  [ 1243.161066] RSP: 0018:ffffc900001c3d18 EFLAGS: 00000282
  [ 1243.161066] RAX: 000000000000031f RBX: ffff888104cf4980 RCX: 0000000000000000
  [ 1243.161066] RDX: 0000000000000000 RSI: ffffffff82095460 RDI: ffffffff820adc4e
  [ 1243.161066] RBP: ffffc900001c3d58 R08: 0000000000000001 R09: 0000000000000001
  [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: ffff88813bd2bc80
  [ 1243.161066] R13: ffff8881002e8000 R14: ffff88810022ad80 R15: 0000000000000000
  [ 1243.161066]  ? finish_task_switch+0xab/0x250
  [ 1243.161066]  ? finish_task_switch+0x70/0x250
  [ 1243.161066]  __schedule+0x36b/0xbb0
  [ 1243.161066]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
  [ 1243.161066]  ? lockdep_hardirqs_on+0x79/0x100
  [ 1243.161066]  schedule+0x43/0xe0
  [ 1243.161066]  pipe_read+0x30b/0x450
  [ 1243.161066]  ? wait_woken+0x80/0x80
  [ 1243.161066]  new_sync_read+0x164/0x170
  [ 1243.161066]  vfs_read+0x122/0x1b0
  [ 1243.161066]  ksys_read+0x93/0xd0
  [ 1243.161066]  do_syscall_64+0x35/0x80
  [ 1243.161066]  entry_SYSCALL_64_after_hwframe+0x44/0xae

The oops can also be reproduced with the following steps:
  ./vmtest.sh -s
  # at qemu shell
  cd /root/bpf &amp;&amp; while true; do ./test_progs -t send_signal

Further analysis showed that the failure is introduced with
commit b89fbfbb854c ("bpf: Implement minimal BPF perf link").
With the above commit, the following scenario becomes possible:
    cpu1                        cpu2
                                hrtimer_interrupt -&gt; bpf_overflow_handler
    (due to closing link_fd)
    bpf_perf_link_release -&gt;
    perf_event_free_bpf_prog -&gt;
    perf_event_free_bpf_handler -&gt;
      WRITE_ONCE(event-&gt;overflow_handler, event-&gt;orig_overflow_handler)
      event-&gt;prog = NULL
                                bpf_prog_run(event-&gt;prog, &amp;ctx)

In the above case, the event-&gt;prog is NULL for bpf_prog_run, hence
causing oops.

To fix the issue, check whether event-&gt;prog is NULL or not. If it
is, do not call bpf_prog_run. This seems working as the above
reproducible step runs more than one hour and I didn't see any
failures.

Fixes: b89fbfbb854c ("bpf: Implement minimal BPF perf link")
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20210819155209.1927994-1-yhs@fb.com
</content>
</entry>
<entry>
<title>bpf: Allow to specify user-provided bpf_cookie for BPF perf links</title>
<updated>2021-08-16T22:45:07Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2021-08-15T07:05:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=82e6b1eee6a8875ef4eacfd60711cce6965c6b04'/>
<id>urn:sha1:82e6b1eee6a8875ef4eacfd60711cce6965c6b04</id>
<content type='text'>
Add ability for users to specify custom u64 value (bpf_cookie) when creating
BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
tracepoints).

This is useful for cases when the same BPF program is used for attaching and
processing invocation of different tracepoints/kprobes/uprobes in a generic
fashion, but such that each invocation is distinguished from each other (e.g.,
BPF program can look up additional information associated with a specific
kernel function without having to rely on function IP lookups). This enables
new use cases to be implemented simply and efficiently that previously were
possible only through code generation (and thus multiple instances of almost
identical BPF program) or compilation at runtime (BCC-style) on target hosts
(even more expensive resource-wise). For uprobes it is not even possible in
some cases to know function IP before hand (e.g., when attaching to shared
library without PID filtering, in which case base load address is not known
for a library).

This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given cgroup BPF programs
already use two 8-byte pointers for their needs and cgroup BPF programs don't
have (yet?) support for bpf_cookie, reuse that space through union of
cgroup_storage and new bpf_cookie field.

Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to this user-provided cookie value from inside a BPF program.
Generic perf_event BPF programs will access this value from perf_event itself
through passed in BPF program context.

Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org
</content>
</entry>
<entry>
<title>bpf: Implement minimal BPF perf link</title>
<updated>2021-08-16T22:45:07Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2021-08-15T07:05:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b89fbfbb854c9afc3047e8273cc3a694650b802e'/>
<id>urn:sha1:b89fbfbb854c9afc3047e8273cc3a694650b802e</id>
<content type='text'>
Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
the common BPF link infrastructure, allowing to list all active perf_event
based attachments, auto-detaching BPF program from perf_event when link's FD
is closed, get generic BPF link fdinfo/get_info functionality.

BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
are currently supported.

Force-detaching and atomic BPF program updates are not yet implemented, but
with perf_event-based BPF links we now have common framework for this without
the need to extend ioctl()-based perf_event interface.

One interesting consideration is a new value for bpf_attach_type, which
BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
define a single BPF_PERF_EVENT attach type for all of them and adjust
link_create()'s logic for checking correspondence between attach type and
program type.

The alternative would be to define three new attach types (e.g., BPF_KPROBE,
BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
libbpf. I chose to not do this to avoid unnecessary proliferation of
bpf_attach_type enum values and not have to deal with naming conflicts.

Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/bpf/20210815070609.987780-5-andrii@kernel.org
</content>
</entry>
<entry>
<title>bpf: Refactor perf_event_set_bpf_prog() to use struct bpf_prog input</title>
<updated>2021-08-16T22:45:07Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2021-08-15T07:05:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=652c1b17b85b9c195978c051aa283027529db1fe'/>
<id>urn:sha1:652c1b17b85b9c195978c051aa283027529db1fe</id>
<content type='text'>
Make internal perf_event_set_bpf_prog() use struct bpf_prog pointer as an
input argument, which makes it easier to re-use for other internal uses
(coming up for BPF link in the next patch). BPF program FD is not as
convenient and in some cases it's not available. So switch to struct bpf_prog,
move out refcounting outside and let caller do bpf_prog_put() in case of an
error. This follows the approach of most of the other BPF internal functions.

Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20210815070609.987780-4-andrii@kernel.org
</content>
</entry>
<entry>
<title>bpf: Refactor BPF_PROG_RUN into a function</title>
<updated>2021-08-16T22:45:07Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2021-08-15T07:05:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fb7dd8bca0139fd73d3f4a6cd257b11731317ded'/>
<id>urn:sha1:fb7dd8bca0139fd73d3f4a6cd257b11731317ded</id>
<content type='text'>
Turn BPF_PROG_RUN into a proper always inlined function. No functional and
performance changes are intended, but it makes it much easier to understand
what's going on with how BPF programs are actually get executed. It's more
obvious what types and callbacks are expected. Also extra () around input
parameters can be dropped, as well as `__` variable prefixes intended to avoid
naming collisions, which makes the code simpler to read and write.

This refactoring also highlighted one extra issue. BPF_PROG_RUN is both
a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
BPF_PROG_RUN into a function causes naming conflict compilation error. So
rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of
BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly.

Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org
</content>
</entry>
<entry>
<title>perf: Refactor permissions check into perf_check_permission()</title>
<updated>2021-07-16T16:46:38Z</updated>
<author>
<name>Marco Elver</name>
<email>elver@google.com</email>
</author>
<published>2021-07-05T08:44:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b068fc04de10fff8974f6ef32b861ad134d94ba4'/>
<id>urn:sha1:b068fc04de10fff8974f6ef32b861ad134d94ba4</id>
<content type='text'>
Refactor the permission check in perf_event_open() into a helper
perf_check_permission(). This makes the permission check logic more
readable (because we no longer have a negated disjunction). Add a
comment mentioning the ptrace check also checks the uid.

No functional change intended.

Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Link: https://lore.kernel.org/r/20210705084453.2151729-2-elver@google.com
</content>
</entry>
</feed>
