<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/trace/bpf_trace.c, branch v5.7</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.7</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.7'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-05-15T15:10:36Z</updated>
<entry>
<title>bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier</title>
<updated>2020-05-15T15:10:36Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2020-05-15T10:11:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b2a5212fb634561bb734c6356904e37f6665b955'/>
<id>urn:sha1:b2a5212fb634561bb734c6356904e37f6665b955</id>
<content type='text'>
Usage of plain %s conversion specifier in bpf_trace_printk() suffers from the
very same issue as bpf_probe_read{,str}() helpers, that is, it is broken on
archs with overlapping address ranges.

While the helpers have been addressed through work in 6ae08ae3dea2 ("bpf: Add
probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers"), we need
an option for bpf_trace_printk() as well to fix it.

Similarly as with the helpers, force users to make an explicit choice by adding
%pks and %pus specifier to bpf_trace_printk() which will then pick the corresponding
strncpy_from_unsafe*() variant to perform the access under KERNEL_DS or USER_DS.
The %pk* (kernel specifier) and %pu* (user specifier) can later also be extended
for other objects aside strings that are probed and printed under tracing, and
reused out of other facilities like bpf_seq_printf() or BTF based type printing.

Existing behavior of %s for current users is still kept working for archs where it
is not broken and therefore gated through CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
For archs not having this property we fall-back to pick probing under KERNEL_DS as
a sensible default.

Fixes: 8d3b7dce8622 ("bpf: add support for %s specifier to bpf_trace_printk()")
Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reported-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Brendan Gregg &lt;brendan.d.gregg@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200515101118.6508-4-daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: Restrict bpf_probe_read{, str}() only to archs where they work</title>
<updated>2020-05-15T15:10:36Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2020-05-15T10:11:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0ebeea8ca8a4d1d453ad299aef0507dab04f6e8d'/>
<id>urn:sha1:0ebeea8ca8a4d1d453ad299aef0507dab04f6e8d</id>
<content type='text'>
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.

To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").

Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.

However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).

For the remaining archs, they can workaround easily by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Reviewed-by: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Brendan Gregg &lt;brendan.d.gregg@gmail.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next</title>
<updated>2020-03-31T02:52:37Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2020-03-31T02:52:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ed52f2c608c9451fa2bad298b2ab927416105d65'/>
<id>urn:sha1:ed52f2c608c9451fa2bad298b2ab927416105d65</id>
<content type='text'>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: Introduce BPF_PROG_TYPE_LSM</title>
<updated>2020-03-29T23:34:00Z</updated>
<author>
<name>KP Singh</name>
<email>kpsingh@google.com</email>
</author>
<published>2020-03-29T00:43:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fc611f47f2188ade2b48ff6902d5cce8baac0c58'/>
<id>urn:sha1:fc611f47f2188ade2b48ff6902d5cce8baac0c58</id>
<content type='text'>
Introduce types and configs for bpf programs that can be attached to
LSM hooks. The programs can be enabled by the config option
CONFIG_BPF_LSM.

Signed-off-by: KP Singh &lt;kpsingh@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Brendan Jackman &lt;jackmanb@google.com&gt;
Reviewed-by: Florent Revest &lt;revest@google.com&gt;
Reviewed-by: Thomas Garnier &lt;thgarnie@google.com&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Acked-by: James Morris &lt;jamorris@linux.microsoft.com&gt;
Link: https://lore.kernel.org/bpf/20200329004356.27286-2-kpsingh@chromium.org
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2020-03-26T01:58:11Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2020-03-26T01:58:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9fb16955fb661945ddffce4504dcffbe55cd518a'/>
<id>urn:sha1:9fb16955fb661945ddffce4504dcffbe55cd518a</id>
<content type='text'>
Overlapping header include additions in macsec.c

A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c

Overlapping test additions in selftests Makefile

Overlapping PCI ID table adjustments in iwlwifi driver.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: Add bpf_xdp_output() helper</title>
<updated>2020-03-13T00:47:38Z</updated>
<author>
<name>Eelco Chaudron</name>
<email>echaudro@redhat.com</email>
</author>
<published>2020-03-06T08:59:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d831ee84bfc9173eecf30dbbc2553ae81b996c60'/>
<id>urn:sha1:d831ee84bfc9173eecf30dbbc2553ae81b996c60</id>
<content type='text'>
Introduce new helper that reuses existing xdp perf_event output
implementation, but can be called from raw_tracepoint programs
that receive 'struct xdp_buff *' as a tracepoint argument.

Signed-off-by: Eelco Chaudron &lt;echaudro@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Acked-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial
</content>
</entry>
<entry>
<title>bpf: Added new helper bpf_get_ns_current_pid_tgid</title>
<updated>2020-03-13T00:33:11Z</updated>
<author>
<name>Carlos Neira</name>
<email>cneirabustos@gmail.com</email>
</author>
<published>2020-03-04T20:41:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b4490c5c4e023f09b7d27c9a9d3e7ad7d09ea6bf'/>
<id>urn:sha1:b4490c5c4e023f09b7d27c9a9d3e7ad7d09ea6bf</id>
<content type='text'>
New bpf helper bpf_get_ns_current_pid_tgid,
This helper will return pid and tgid from current task
which namespace matches dev_t and inode number provided,
this will allows us to instrument a process inside a container.

Signed-off-by: Carlos Neira &lt;cneirabustos@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com
</content>
</entry>
<entry>
<title>bpf: Fix bpf_prog_test_run_tracing for !CONFIG_NET</title>
<updated>2020-03-05T23:14:58Z</updated>
<author>
<name>KP Singh</name>
<email>kpsingh@google.com</email>
</author>
<published>2020-03-05T22:01:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3e7c67d90e3ed2f34fce42699f11b150dd1d3999'/>
<id>urn:sha1:3e7c67d90e3ed2f34fce42699f11b150dd1d3999</id>
<content type='text'>
test_run.o is not built when CONFIG_NET is not set and
bpf_prog_test_run_tracing being referenced in bpf_trace.o causes the
linker error:

ld: kernel/trace/bpf_trace.o:(.rodata+0x38): undefined reference to
 `bpf_prog_test_run_tracing'

Add a __weak function in bpf_trace.c to handle this.

Fixes: da00d2f117a0 ("bpf: Add test ops for BPF_PROG_TYPE_TRACING")
Signed-off-by: KP Singh &lt;kpsingh@google.com&gt;
Reported-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Acked-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200305220127.29109-1-kpsingh@chromium.org
</content>
</entry>
<entry>
<title>bpf: Fix deadlock with rq_lock in bpf_send_signal()</title>
<updated>2020-03-05T22:02:22Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2020-03-04T19:11:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1bc7896e9ef44fd77858b3ef0b8a6840be3a4494'/>
<id>urn:sha1:1bc7896e9ef44fd77858b3ef0b8a6840be3a4494</id>
<content type='text'>
When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
   #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
   #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
   #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
   #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
   #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
  #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
  #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
  #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
  #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
  #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
  #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
  #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
  #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
  #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
  #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
  #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
  #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
  #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
  #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068

The above call stack is actually very similar to an issue
reported by Commit eac9153f2b58 ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
bpf_send_signal() helper instead of bpf_get_stack() helper.

The above deadlock is triggered with a perf_sw_event.
Similar to Commit eac9153f2b58, the below almost identical reproducer
used tracepoint point sched/sched_switch so the issue can be easily caught.
  /* stress_test.c */
  #include &lt;stdio.h&gt;
  #include &lt;stdlib.h&gt;
  #include &lt;sys/mman.h&gt;
  #include &lt;pthread.h&gt;
  #include &lt;sys/types.h&gt;
  #include &lt;sys/stat.h&gt;
  #include &lt;fcntl.h&gt;

  #define THREAD_COUNT 1000
  char *filename;
  void *worker(void *p)
  {
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd &lt; 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&amp;ts, NULL);
        }
        close(fd);
        return NULL;
  }

  int main(int argc, char *argv[])
  {
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc &lt; 2)
                return 0;

        filename = argv[1];

        for (i = 0; i &lt; THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i &lt; THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
  }
and the following command:
  1. run `stress_test /bin/ls` in one windown
  2. hack bcc trace.py with the following change:
     --- a/tools/trace.py
     +++ b/tools/trace.py
     @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
              __data.tgid = __tgid;
              __data.pid = __pid;
              bpf_get_current_comm(&amp;__data.comm, sizeof(__data.comm));
     +        bpf_send_signal(10);
      %s
      %s
              %s.perf_submit(%s, &amp;__data, sizeof(__data));
  3. in a different window run
     ./trace.py -p $(pidof stress_test) t:sched:sched_switch

The deadlock can be reproduced in our production system.

Similar to Song's fix, the fix is to delay sending signal if
irqs is disabled to avoid deadlocks involving with rq_lock.
With this change, my above stress-test in our production system
won't cause deadlock any more.

I also implemented a scale-down version of reproducer in the
selftest (a subsequent commit). With latest bpf-next,
it complains for the following potential deadlock.
  [   32.832450] -&gt; #1 (&amp;p-&gt;pi_lock){-.-.}:
  [   32.833100]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.833696]        task_rq_lock+0x2c/0xa0
  [   32.834182]        task_sched_runtime+0x59/0xd0
  [   32.834721]        thread_group_cputime+0x250/0x270
  [   32.835304]        thread_group_cputime_adjusted+0x2e/0x70
  [   32.835959]        do_task_stat+0x8a7/0xb80
  [   32.836461]        proc_single_show+0x51/0xb0
  ...
  [   32.839512] -&gt; #0 (&amp;(&amp;sighand-&gt;siglock)-&gt;rlock){....}:
  [   32.840275]        __lock_acquire+0x1358/0x1a20
  [   32.840826]        lock_acquire+0xc7/0x1d0
  [   32.841309]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.841916]        __lock_task_sighand+0x79/0x160
  [   32.842465]        do_send_sig_info+0x35/0x90
  [   32.842977]        bpf_send_signal+0xa/0x10
  [   32.843464]        bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
  [   32.844301]        trace_call_bpf+0x115/0x270
  [   32.844809]        perf_trace_run_bpf_submit+0x4a/0xc0
  [   32.845411]        perf_trace_sched_switch+0x10f/0x180
  [   32.846014]        __schedule+0x45d/0x880
  [   32.846483]        schedule+0x5f/0xd0
  ...

  [   32.853148] Chain exists of:
  [   32.853148]   &amp;(&amp;sighand-&gt;siglock)-&gt;rlock --&gt; &amp;p-&gt;pi_lock --&gt; &amp;rq-&gt;lock
  [   32.853148]
  [   32.854451]  Possible unsafe locking scenario:
  [   32.854451]
  [   32.855173]        CPU0                    CPU1
  [   32.855745]        ----                    ----
  [   32.856278]   lock(&amp;rq-&gt;lock);
  [   32.856671]                                lock(&amp;p-&gt;pi_lock);
  [   32.857332]                                lock(&amp;rq-&gt;lock);
  [   32.857999]   lock(&amp;(&amp;sighand-&gt;siglock)-&gt;rlock);

  Deadlock happens on CPU0 when it tries to acquire &amp;sighand-&gt;siglock
  but it has been held by CPU1 and CPU1 tries to grab &amp;rq-&gt;lock
  and cannot get it.

  This is not exactly the callstack in our production environment,
  but sympotom is similar and both locks are using spin_lock_irqsave()
  to acquire the lock, and both involves rq_lock. The fix to delay
  sending signal when irq is disabled also fixed this issue.

Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
</content>
</entry>
<entry>
<title>bpf: Add test ops for BPF_PROG_TYPE_TRACING</title>
<updated>2020-03-04T21:41:06Z</updated>
<author>
<name>KP Singh</name>
<email>kpsingh@google.com</email>
</author>
<published>2020-03-04T19:18:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=da00d2f117a08fbca262db5ea422c80a568b112b'/>
<id>urn:sha1:da00d2f117a08fbca262db5ea422c80a568b112b</id>
<content type='text'>
The current fexit and fentry tests rely on a different program to
exercise the functions they attach to. Instead of doing this, implement
the test operations for tracing which will also be used for
BPF_MODIFY_RETURN in a subsequent patch.

Also, clean up the fexit test to use the generated skeleton.

Signed-off-by: KP Singh &lt;kpsingh@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20200304191853.1529-7-kpsingh@chromium.org
</content>
</entry>
</feed>
