<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/tools/perf/util/thread-stack.c, branch v5.10</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.10</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.10'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-05-05T19:35:29Z</updated>
<entry>
<title>perf thread-stack: Add thread_stack__br_sample_late()</title>
<updated>2020-05-05T19:35:29Z</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2020-04-29T15:07:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3749e0bbdef24efbf1698bf0dbd9575fddb9ed22'/>
<id>urn:sha1:3749e0bbdef24efbf1698bf0dbd9575fddb9ed22</id>
<content type='text'>
Add a thread stack function to create a branch stack for hardware events
where the sample records get created some time after the event occurred.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: http://lore.kernel.org/lkml/20200429150751.12570-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Add branch stack support</title>
<updated>2020-05-05T19:35:29Z</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2020-04-29T15:07:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=86d67180b920d178ae1c2923f50a0759d6ce1a10'/>
<id>urn:sha1:86d67180b920d178ae1c2923f50a0759d6ce1a10</id>
<content type='text'>
Intel PT already has support for creating branch stacks for each context
(per-cpu or per-thread). In the more common per-cpu case, the branch stack
is not separated for different threads, instead being cleared in between
each sample.

That approach will not work very well for adding branch stacks to
regular events. The branch stacks really need to be accumulated
separately for each thread.

As a start to accomplishing that, this patch adds branch stack support
to the thread-stack. The advantages are:

1. the branches are accumulated separately for each thread
2. the branch stack is cleared only in between continuous traces

This helps pave the way for adding branch stacks to regular events, not
just synthesized events as at present.
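
For illustration only, the two advantages above can be sketched as a small
Python model. This is not the perf implementation: the class name, method
names and default capacity are hypothetical. Each thread id gets its own
bounded buffer, and a buffer is cleared only when that thread's trace is
interrupted.

```python
from collections import defaultdict, deque

# Hypothetical capacity; the real size is chosen by the tool, not by this sketch.
BR_STACK_SZ = 4


class BranchStacks:
    """Toy model: one bounded branch stack per thread id."""

    def __init__(self, size=BR_STACK_SZ):
        # deque(maxlen=size) silently drops the oldest branch once it is full.
        self._stacks = defaultdict(lambda: deque(maxlen=size))

    def record(self, tid, from_ip, to_ip):
        # Newest branch goes first, so a sample lists the most recent branch first.
        self._stacks[tid].appendleft((from_ip, to_ip))

    def trace_gap(self, tid):
        # Cleared only when the trace for this thread is not continuous.
        self._stacks[tid].clear()

    def sample(self, tid):
        # Taking a sample copies the stack and does not clear it.
        return list(self._stacks[tid])
```

Branches recorded for one thread never appear in another thread's sample,
and taking a sample leaves the accumulated branches in place.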

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: http://lore.kernel.org/lkml/20200429150751.12570-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Add thread_stack__sample_late()</title>
<updated>2020-04-16T15:19:15Z</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2020-04-01T10:16:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4fef41bfb1d8d2ada4a18eb3ab80c2682bcbae12'/>
<id>urn:sha1:4fef41bfb1d8d2ada4a18eb3ab80c2682bcbae12</id>
<content type='text'>
Add a thread stack function to create a call chain for hardware events
where the sample records get created some time after the event occurred.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: http://lore.kernel.org/lkml/20200401101613.6201-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread: Rename thread-&gt;mg to thread-&gt;maps</title>
<updated>2019-11-26T14:07:46Z</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-11-26T01:07:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fe87797dea79b59e97a4ea67441bf91f2905bf23'/>
<id>urn:sha1:fe87797dea79b59e97a4ea67441bf91f2905bf23</id>
<content type='text'>
One more step on the merge of 'struct maps' with 'struct map_groups'.

Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-69vcr8pubpym90skxhmbwhiw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf debug: Remove needless include directives from debug.h</title>
<updated>2019-08-31T22:10:19Z</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-08-29T19:18:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8520a98dbab61e9e340cdfb72dd17ccc8a98961e'/>
<id>urn:sha1:8520a98dbab61e9e340cdfb72dd17ccc8a98961e</id>
<content type='text'>
All we need there is a forward declaration for 'union perf_event', so
remove the include directives from debug.h and add the missing ones in
the places that were relying on this indirect include.

Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-7ftk0ztstqub1tirjj8o8xbl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>tools lib: Adopt zalloc()/zfree() from tools/perf</title>
<updated>2019-07-09T13:13:26Z</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-07-04T14:32:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7f7c536f23e6afaa5d5d4b0e0958b0be8922491f'/>
<id>urn:sha1:7f7c536f23e6afaa5d5d4b0e0958b0be8922491f</id>
<content type='text'>
Eroding a bit more of the tools/perf/util/util.h hodgepodge header.

Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf tools: Add missing headers, mostly stdlib.h</title>
<updated>2019-07-09T13:13:22Z</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-07-04T14:21:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=215a0d305c5651928eb67c96bcedd0a6c297dfce'/>
<id>urn:sha1:215a0d305c5651928eb67c96bcedd0a6c297dfce</id>
<content type='text'>
Part of the erosion of util/util.h, which will lose its stdlib.h include,
so we need to add it to the places that need it but were getting it
indirectly.

Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-1imnqezw99ahc07fjeb51qby@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Eliminate code duplicating thread_stack__pop_ks()</title>
<updated>2019-06-25T11:47:10Z</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-06-19T06:44:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eb5d854456f5a4ccec6f9681b7196cf056df8cfa'/>
<id>urn:sha1:eb5d854456f5a4ccec6f9681b7196cf056df8cfa</id>
<content type='text'>
Use new function thread_stack__pop_ks() in place of equivalent code.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: http://lkml.kernel.org/r/20190619064429.14940-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf thread-stack: Fix thread stack return from kernel for kernel-only case</title>
<updated>2019-06-25T11:47:10Z</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-06-19T06:44:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=97860b483c5597663a174ff7405be957b4838391'/>
<id>urn:sha1:97860b483c5597663a174ff7405be957b4838391</id>
<content type='text'>
Commit f08046cb3082 ("perf thread-stack: Represent jmps to the start of a
different symbol") had the side-effect of introducing more stack entries
before return from kernel space.

When user space is also traced, those entries are popped before entry to
user space, but when user space is not traced, they get stuck at the
bottom of the stack, making the stack grow progressively larger.

Fix by detecting a return-from-kernel branch type, and popping kernel
addresses from the stack then.

Note, the problem and fix affect the exported Call Graph / Tree but not
the callindent option used by "perf script --call-trace".
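
As a rough illustration of the fix described above (a sketch, not the code
in thread-stack.c), the Python model below pops kernel entries from the top
of a stack when a return-from-kernel branch is seen. The Entry type and the
function name are hypothetical, and each entry is assumed to already know
whether its address is a kernel address.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    symbol: str
    in_kernel: bool  # assumed to be known for every stack entry


def pop_kernel_entries(stack: List[Entry]):
    """Pop entries from the top of the stack while they are kernel entries.

    Returns the popped entries, most recently pushed first.
    """
    popped = []
    while stack and stack[-1].in_kernel:
        popped.append(stack.pop())
    return popped


# Kernel-only trace: every entry is a kernel entry, so all of them are popped
# and nothing is left stuck at the bottom of the stack.
stack = [Entry("entry_SYSCALL_64", True), Entry("do_syscall_64", True)]
pop_kernel_entries(stack)
```

In this model a kernel-only stack ends up empty after the return from
kernel, instead of growing with every system call or page fault.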

Example:

  perf-with-kcore record example -e intel_pt//k -- ls
  perf-with-kcore script example --itrace=bep -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls
  ~/libexec/perf-core/scripts/python/exported-sql-viewer.py example.db

  Menu option: Reports -&gt; Context-Sensitive Call Graph

  Before: (showing Call Path column only)

    Call Path
    ▶ perf
    ▼ ls
      ▼ 12111:12111
        ▶ setup_new_exec
        ▶ __task_pid_nr_ns
        ▶ perf_event_pid_type
        ▶ perf_event_comm_output
        ▶ perf_iterate_ctx
        ▶ perf_iterate_sb
        ▶ perf_event_comm
        ▶ __set_task_comm
        ▶ load_elf_binary
        ▶ search_binary_handler
        ▶ __do_execve_file.isra.41
        ▶ __x64_sys_execve
        ▶ do_syscall_64
        ▼ entry_SYSCALL_64_after_hwframe
          ▼ swapgs_restore_regs_and_return_to_usermode
            ▼ native_iret
              ▶ error_entry
              ▶ do_page_fault
              ▼ error_exit
                ▼ retint_user
                  ▶ prepare_exit_to_usermode
                  ▼ native_iret
                    ▶ error_entry
                    ▶ do_page_fault
                    ▼ error_exit
                      ▼ retint_user
                        ▶ prepare_exit_to_usermode
                        ▼ native_iret
                          ▶ error_entry
                          ▶ do_page_fault
                          ▼ error_exit
                            ▼ retint_user
                              ▶ prepare_exit_to_usermode
                              ▶ native_iret

  After: (showing Call Path column only)

    Call Path
    ▶ perf
    ▼ ls
      ▼ 12111:12111
        ▶ setup_new_exec
        ▶ __task_pid_nr_ns
        ▶ perf_event_pid_type
        ▶ perf_event_comm_output
        ▶ perf_iterate_ctx
        ▶ perf_iterate_sb
        ▶ perf_event_comm
        ▶ __set_task_comm
        ▶ load_elf_binary
        ▶ search_binary_handler
        ▶ __do_execve_file.isra.41
        ▶ __x64_sys_execve
        ▶ do_syscall_64
        ▶ entry_SYSCALL_64_after_hwframe
        ▶ page_fault
        ▼ entry_SYSCALL_64
          ▼ do_syscall_64
            ▶ __x64_sys_brk
            ▶ __x64_sys_access
            ▶ __x64_sys_openat
            ▶ __x64_sys_newfstat
            ▶ __x64_sys_mmap
            ▶ __x64_sys_close
            ▶ __x64_sys_read
            ▶ __x64_sys_mprotect
            ▶ __x64_sys_arch_prctl
            ▶ __x64_sys_munmap
            ▶ exit_to_usermode_loop
            ▶ __x64_sys_set_tid_address
            ▶ __x64_sys_set_robust_list
            ▶ __x64_sys_rt_sigaction
            ▶ __x64_sys_rt_sigprocmask
            ▶ __x64_sys_prlimit64
            ▶ __x64_sys_statfs
            ▶ __x64_sys_ioctl
            ▶ __x64_sys_getdents64
            ▶ __x64_sys_write
            ▶ __x64_sys_exit_group

Committer notes:

The first arg to perf-with-kcore needs to be the same for the 'record'
and 'script' lines; otherwise we record the perf.data file and
kcore_dir/ files in one directory ('example') and then try to use them
from the 'bep' directory. Fix the instructions above so that both use
'example'.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: stable@vger.kernel.org
Fixes: f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol")
Link: http://lkml.kernel.org/r/20190619064429.14940-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'perf-core-for-mingo-5.3-20190611' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core</title>
<updated>2019-06-17T18:48:14Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2019-06-17T18:48:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3ce5aceb5dee298b082adfa2baa0df5a447c1b0b'/>
<id>urn:sha1:3ce5aceb5dee298b082adfa2baa0df5a447c1b0b</id>
<content type='text'>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf record:

  Alexey Budankov:

  - Allow mixing --user-regs with --call-graph=dwarf, making sure that
    the minimal set of registers for DWARF unwinding is present in the
    set of user registers requested to be present in each sample, while
    warning the user that this may make callchains unreliable if more
    than the minimal set of registers is needed to unwind.

  yuzhoujian:

  - Add support to collect callchains from kernel or user space only,
    IOW allow setting the perf_event_attr.exclude_callchain_{kernel,user}
    bits from the command line.

perf trace:

  Arnaldo Carvalho de Melo:

  - Remove x86_64 specific syscall numbers from the augmented_raw_syscalls
    BPF in-kernel collector of augmented raw_syscalls:sys_{enter,exit}
    payloads, use instead the syscall numbers obtained either by the
    arch specific syscalltbl generators or from audit-libs.

  - Allow 'perf trace' to ask for the number of bytes to collect for
    string arguments, for now ask for PATH_MAX, i.e. the whole
    pathnames, which ends up being just a way to specify which syscall
    args are pathnames and thus should be read using bpf_probe_read_str().

  - Skip unknown syscalls when expanding strace like syscall groups.
    This helps the 'string' group of syscalls work on arm64, where some
    of the x86_64 syscalls that deal with strings, for instance
    'access', are deprecated and thus should not be requested for
    tracing.

  Leo Yan:

  - Exit when failing to build eBPF program.

perf config:

  Arnaldo Carvalho de Melo:

  - Bail out when a handler returns failure for a key-value pair. This
    helps with cases where processing a key-value pair is not just a
    matter of setting some tool specific knob, involving, for instance
    building a BPF program to then attach to the list of events 'perf
    trace' will use, e.g. augmented_raw_syscalls.c.

perf.data:

  Kan Liang:

  - Read and store die ID information available in new Intel processors
    in CPUID.1F in the CPU topology written in the perf.data header.

perf stat:

  Kan Liang:

  - Support per-die aggregation.

Documentation:

  Arnaldo Carvalho de Melo:

  - Update perf.data documentation about the CPU_TOPOLOGY, MEM_TOPOLOGY,
    CLOCKID and DIR_FORMAT headers.

  Song Liu:

  - Add description of headers HEADER_BPF_PROG_INFO and HEADER_BPF_BTF.

  Leo Yan:

  - Update default value for llvm.clang-bpf-cmd-template in 'man perf-config'.

JVMTI:

  Jiri Olsa:

  - Address gcc string overflow warning for strncpy()

core:

  - Remove superfluous nthreads system_wide setup in perf_evsel__alloc_fd().

Intel PT:

  Adrian Hunter:

  - Add support for samples to contain IPC ratio, collecting cycles
    information from CYC packets and showing the IPC info periodically.
    Because Intel PT does not update the cycle count on every branch or
    instruction, the incremental values will often be zero.  When there are
    values, they will be the number of instructions and number of cycles
    since the last update, and thus represent the average IPC since the
    last IPC value.

    E.g.:

    # perf record --cpu 1 -m200000 -a -e intel_pt/cyc/u sleep 0.0001
    rounding mmap pages size to 1024M (262144 pages)
    [ perf record: Woken up 0 times to write data ]
    [ perf record: Captured and wrote 2.208 MB perf.data ]
    # perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid
    #
    &lt;SNIP + add line numbering to make sense of IPC counts e.g.: (18/3)&gt;
    1   cc1 63501.650479626: 7f5219ac27bf _int_free+0x3f   jnz 0x7f5219ac2af0       IPC: 0.81 (36/44)
    2   cc1 63501.650479626: 7f5219ac27c5 _int_free+0x45   cmp $0x1f, %rbp
    3   cc1 63501.650479626: 7f5219ac27c9 _int_free+0x49   jbe 0x7f5219ac2b00
    4   cc1 63501.650479626: 7f5219ac27cf _int_free+0x4f   test $0x8, %al
    5   cc1 63501.650479626: 7f5219ac27d1 _int_free+0x51   jnz 0x7f5219ac2b00
    6   cc1 63501.650479626: 7f5219ac27d7 _int_free+0x57   movq  0x13c58a(%rip), %rcx
    7   cc1 63501.650479626: 7f5219ac27de _int_free+0x5e   mov %rdi, %r12
    8   cc1 63501.650479626: 7f5219ac27e1 _int_free+0x61   movq  %fs:(%rcx), %rax
    9   cc1 63501.650479626: 7f5219ac27e5 _int_free+0x65   test %rax, %rax
   10   cc1 63501.650479626: 7f5219ac27e8 _int_free+0x68   jz 0x7f5219ac2821
   11   cc1 63501.650479626: 7f5219ac27ea _int_free+0x6a   leaq  -0x11(%rbp), %rdi
   12   cc1 63501.650479626: 7f5219ac27ee _int_free+0x6e   mov %rdi, %rsi
   13   cc1 63501.650479626: 7f5219ac27f1 _int_free+0x71   shr $0x4, %rsi
   14   cc1 63501.650479626: 7f5219ac27f5 _int_free+0x75   cmpq  %rsi, 0x13caf4(%rip)
   15   cc1 63501.650479626: 7f5219ac27fc _int_free+0x7c   jbe 0x7f5219ac2821
   16   cc1 63501.650479626: 7f5219ac2821 _int_free+0xa1   cmpq  0x13f138(%rip), %rbp
   17   cc1 63501.650479626: 7f5219ac2828 _int_free+0xa8   jnbe 0x7f5219ac28d8
   18   cc1 63501.650479626: 7f5219ac28d8 _int_free+0x158  testb  $0x2, 0x8(%rbx)
   19   cc1 63501.650479628: 7f5219ac28dc _int_free+0x15c  jnz 0x7f5219ac2ab0       IPC: 6.00 (18/3)
    &lt;SNIP&gt;

  - Allow using time ranges with Intel PT, i.e. these features, already
    present but not optimally usable with Intel PT, should now work:

        Select the second 10% time slice:

        $ perf script --time 10%/2

        Select from 0% to 10% time slice:

        $ perf script --time 0%-10%

        Select the first and second 10% time slices:

        $ perf script --time 10%/1,10%/2

        Select from 0% to 10% and 30% to 40% slices:

        $ perf script --time 0%-10%,30%-40%
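
The arithmetic behind the percentage forms of --time shown above can be
sketched in Python. This is only an illustration, not perf's option parser:
the function names are hypothetical, and the start and end arguments stand
for the first and last sample times of the trace.

```python
def percent_slice(start, end, percent, index):
    """Time range of the index-th (1-based) slice of the given percent size.

    Models the 'N%/n' form, e.g. '10%/2' is the second 10% slice.
    """
    span = (end - start) * percent / 100.0
    lo = start + span * (index - 1)
    return lo, lo + span


def percent_range(start, end, from_pct, to_pct):
    """Time range covering from_pct to to_pct of the trace.

    Models the 'A%-B%' form, e.g. '0%-10%'.
    """
    total = end - start
    return start + total * from_pct / 100.0, start + total * to_pct / 100.0


# For a trace running from 0.0 to 100.0 seconds:
second_slice = percent_slice(0.0, 100.0, 10, 2)   # models '10%/2'
first_tenth = percent_range(0.0, 100.0, 0, 10)    # models '0%-10%'
```

A comma-separated list such as '10%/1,10%/2' would then be the list of the
individual ranges computed this way.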

cs-etm (ARM):

  Mathieu Poirier:

  - Add support for CPU-wide trace scenarios.

s390:

  Thomas Richter:

  - Fix missing kvm module load for s390.

  - Fix OOM error in TUI mode on s390.

  - Support s390 diag event display when doing analysis on !s390
    architectures.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
