linux/tools/perf/tests/shell/lib, branch for-next

perf test: Move attr files into shell directory where they are used

2024-10-17T20:17:36Z

Now that the attr tests are shell tests, move the associated Python and configuration files. Update the installation build rules for the new directories. Recycle the lib install rules for Python files, allowing the explicit attr.py install line to be dropped.

Signed-off-by: Ian Rogers
Tested-by: Athira Rajeev
Cc: Stephen Rothwell
Cc: zhaimingbing
Cc: Howard Chu
Cc: Ze Gao
Cc: Weilin Wang
Cc: James Clark
Cc: Leo Yan
Cc: Thomas Richter
Cc: Veronika Molnarova
Link: https://lore.kernel.org/r/20241015000158.871828-4-irogers@google.com
Signed-off-by: Namhyung Kim

perf stat: Add metric-threshold to json output

2024-10-17T19:44:26Z

When the threshold isn't unknown, add a value to the json like:

"metric-threshold" : "good"

A more complete example:
```
$ perf stat -a -j -I 1000
{"interval" : 1.001089747, "counter-value" : "16045.281449", "unit" : "msec", "event" : "cpu-clock", "event-runtime" : 16045355135, "pcnt-running" : 100.00, "metric-value" : "16.045281", "metric-unit" : "CPUs utilized"}
{"interval" : 1.001089747, "counter-value" : "10003.000000", "unit" : "", "event" : "context-switches", "event-runtime" : 16045314844, "pcnt-running" : 100.00, "metric-value" : "623.423156", "metric-unit" : "/sec"}
{"interval" : 1.001089747, "counter-value" : "328.000000", "unit" : "", "event" : "cpu-migrations", "event-runtime" : 16045321403, "pcnt-running" : 100.00, "metric-value" : "20.442147", "metric-unit" : "/sec"}
{"interval" : 1.001089747, "counter-value" : "20114.000000", "unit" : "", "event" : "page-faults", "event-runtime" : 16045355927, "pcnt-running" : 100.00, "metric-value" : "1.253577", "metric-unit" : "K/sec"}
{"interval" : 1.001089747, "counter-value" : "4066679471.000000", "unit" : "", "event" : "instructions", "event-runtime" : 16045369123, "pcnt-running" : 100.00, "metric-value" : "1.628330", "metric-unit" : "insn per cycle"}
{"interval" : 1.001089747, "counter-value" : "2497454658.000000", "unit" : "", "event" : "cycles", "event-runtime" : 16045374810, "pcnt-running" : 100.00, "metric-value" : "0.155650", "metric-unit" : "GHz"}
{"interval" : 1.001089747, "counter-value" : "914974294.000000", "unit" : "", "event" : "branches", "event-runtime" : 16045379877, "pcnt-running" : 100.00, "metric-value" : "57.024509", "metric-unit" : "M/sec"}
{"interval" : 1.001089747, "counter-value" : "9237201.000000", "unit" : "", "event" : "branch-misses", "event-runtime" : 16045375017, "pcnt-running" : 100.00, "metric-value" : "1.009559", "metric-unit" : "of all branches", "metric-threshold" : "good"}
{"interval" : 1.001089747, "event-runtime" : 16045397172, "pcnt-running" : 100.00, "metricgroup" : "TopdownL1"}
{"interval" : 1.001089747, "metric-value" : "22.036686", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"}
{"interval" : 1.001089747, "metric-value" : "7.610161", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"}
{"interval" : 1.001089747, "metric-value" : "36.729687", "metric-unit" : "% tma_frontend_bound", "metric-threshold" : "bad"}
{"interval" : 1.001089747, "metric-value" : "33.623465", "metric-unit" : "% tma_retiring"}
...
```

Signed-off-by: Ian Rogers
Cc: Yicong Yang
Cc: Weilin Wang
Cc: Will Deacon
Cc: James Clark
Cc: Mike Leach
Cc: Leo Yan
Cc: Sumanth Korikkar
Cc: Thomas Richter
Cc: Tim Chen
Cc: John Garry
Link: https://lore.kernel.org/r/20241017175356.783793-7-irogers@google.com
Signed-off-by: Namhyung Kim

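As an illustration of how a consumer might use the new field, here is a minimal sketch that reads `perf stat -j` output line by line and picks out the metrics flagged with a given threshold. The helper name `flagged_metrics` and the `SAMPLE` list are hypothetical; the sample lines are copied from the example output in the commit message above, and the sketch assumes each JSON object sits on its own line.

```python
import json


def flagged_metrics(lines, threshold="bad"):
    """Return (metric-unit, metric-value) pairs whose "metric-threshold" equals `threshold`.

    Hypothetical helper: assumes one JSON object per line, as in the
    example output quoted in the commit message.
    """
    hits = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and anything that is not a JSON object.
        if not line.startswith("{"):
            continue
        record = json.loads(line)
        # The key is absent when the threshold is unknown, so use .get().
        if record.get("metric-threshold") == threshold:
            hits.append((record["metric-unit"], float(record["metric-value"])))
    return hits


# Three lines taken from the commit's example output.
SAMPLE = [
    '{"interval" : 1.001089747, "metric-value" : "22.036686", "metric-unit" : "% tma_backend_bound", "metric-threshold" : "bad"}',
    '{"interval" : 1.001089747, "metric-value" : "7.610161", "metric-unit" : "% tma_bad_speculation", "metric-threshold" : "good"}',
    '{"interval" : 1.001089747, "metric-value" : "33.623465", "metric-unit" : "% tma_retiring"}',
]

bad = flagged_metrics(SAMPLE)
good = flagged_metrics(SAMPLE, "good")
```

Because records without a known threshold simply lack the key, the lookup uses `dict.get` rather than indexing, so lines such as the `tma_retiring` one are ignored instead of raising an error.
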
perf test: Speed up some tests using perf list

2024-10-17T16:55:58Z

On my system, perf list is very slow to print all the events. I think there's a performance issue in SDT and uprobes event listing. I noticed this while running perf test on x86, where it takes a long time to check for a CoreSight event in tests that should be skipped quickly.

Anyway, some tests use perf list to check whether the required event is available before running the test. The perf list command can take an argument to specify an event class or a (glob) pattern. But the glob pattern only suppresses output for unmatched events after all events have been checked. In this case, specifying the event class is better, as it reduces the number of events perf list checks and avoids buggy subsystems entirely.

No functional changes intended.

Reviewed-by: James Clark
Reviewed-by: Ian Rogers
Cc: German Gomez
Cc: Carsten Haitzler
Cc: Leo Yan
Link: https://lore.kernel.org/r/20241016065654.269994-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim

perf test shell probe_vfs_getname: Remove extraneous '=' from probe line number regex

2024-09-11T12:35:34Z

Thomas reported the vfs_getname perf tests failing on s/390. It seems it was just due to some extraneous '=' somehow getting into the regexp. Remove it, now:

  root@x1:~# perf test getname
   91: Add vfs_getname probe to get syscall args filenames            : Ok
   93: Use vfs_getname probe to get syscall args filenames            : FAILED!
  126: Check open filename arg using perf trace + vfs_getname         : Ok
  root@x1:~#

The second one remains a mystery; it will take some time to nail it down.

Reported-by: Thomas Richter
Tested-by: Thomas Richter
Cc: Adrian Hunter
Cc: Alexander Gordeev
Cc: Heiko Carstens
Cc: Ian Rogers
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Namhyung Kim
Cc: Vasily Gorbik
Link: https://lore.kernel.org/lkml/1d7f3b7b-9edc-4d90-955c-9345428563f1@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo

perf tests probe_vfs_getname.sh: Update to use 'perf check feature'

2024-09-04T19:19:52Z

In probe_vfs_getname.sh, we currently use "perf record --dry-run" to check for libtraceevent and skip the test if perf is not built with libtraceevent. Change the check to use "perf check feature" instead.

Signed-off-by: Athira Rajeev
Acked-by: Namhyung Kim
Cc: Disha Goel
Cc: Ian Rogers
Cc: Jiri Olsa
Cc: Kajol Jain
Cc: Madhavan Srinivasan
Cc: Namhyung Kim
Link: https://lore.kernel.org/r/20240904190132.415212-6-adityag@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo

perf test vfs_getname: Look for alternative line where to collect the pathname

2024-08-28T21:07:20Z

The getname_flags() routine changed recently, and thus the place where we were getting the pathname is not probeable anymore, albeit still present, so use the next line for that. Before:

  root@number:/home/acme/git/perf-tools-next# perf test vfs_getname
   91: Add vfs_getname probe to get syscall args filenames            : FAILED!
   93: Use vfs_getname probe to get syscall args filenames            : FAILED!
  126: Check open filename arg using perf trace + vfs_getname         : FAILED!
  root@number:/home/acme/git/perf-tools-next#

Now tests 91 and 126 are passing; more investigation is needed for test 93, which continues to fail.

Cc: Adrian Hunter
Cc: Ian Rogers
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Namhyung Kim
Signed-off-by: Arnaldo Carvalho de Melo

perf test: make metric validation test return early when there is no metric supported on the test system

2024-07-31T19:58:18Z

Add a check to return early from the metric validation test when perf list metric does not output any metric. This happens when NO_JEVENTS=1 is set or on a system where no metric is supported.

Signed-off-by: Weilin Wang
Tested-by: Ian Rogers
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Caleb Biggers
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Namhyung Kim
Cc: Perry Taylor
Cc: Peter Zijlstra
Cc: Samantha Alt
Link: https://lore.kernel.org/lkml/20240522204254.1841420-1-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo

perf test: Stat output per thread of just the parent process

2024-03-21T16:54:39Z

Per-thread mode requires either system-wide (-a), a pid (-p) or a tid (-t). The stat output tests were using system-wide mode but this is racy when threads are starting and exiting - something that happens a lot when running the tests in parallel (perf test -p). Avoid the race conditions by using pid mode with the pid of the parent process.

Signed-off-by: Ian Rogers
Cc: Adrian Hunter
Cc: Alexander Shishkin
Cc: Athira Rajeev
Cc: Christian Brauner
Cc: Disha Goel
Cc: Ingo Molnar
Cc: James Clark
Cc: Jiri Olsa
Cc: K Prateek Nayak
Cc: Kajol Jain
Cc: Kan Liang
Cc: Mark Rutland
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Song Liu
Cc: Tim Chen
Cc: Yicong Yang
Link: https://lore.kernel.org/r/20240301074639.2260708-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo
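
The idea of pinning per-thread mode to one known pid can be sketched as follows. This is only an illustration, not the test's actual code: the helper name `per_thread_stat_argv` is hypothetical, the `true` workload is an assumption, and the sketch only builds the command line rather than running perf.

```python
import os


def per_thread_stat_argv(pid, workload=("true",)):
    """Build a `perf stat` command line that reports per-thread counts for one pid.

    Hypothetical helper: --per-thread needs a target, which can be -a
    (system wide), -p (pid) or -t (tid). Using -p with a fixed pid avoids
    counting unrelated threads that start and exit during the run.
    """
    return ["perf", "stat", "--per-thread", "-p", str(pid), "--", *workload]


# Target the calling process, which will be the parent of the perf child.
argv = per_thread_stat_argv(os.getpid())
```

On a machine with perf installed, the resulting list could be handed to `subprocess.run(argv)`; the set of threads being counted is then limited to those of the calling process instead of every thread on the system.
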

perf tests: Avoid fork in perf_has_symbol test

2024-02-22T17:12:04Z

perf test -vv Symbols is used to identify symbols within the perf binary. Add the -F flag so that the test command doesn't fork the test before running. This removes a little overhead.

Acked-by: Adrian Hunter
Signed-off-by: Ian Rogers
Cc: James Clark
Cc: Justin Stitt
Cc: Bill Wendling
Cc: Nick Desaulniers
Cc: Yang Jihong
Cc: Nathan Chancellor
Cc: Kan Liang
Cc: Athira Jajeev
Cc: llvm@lists.linux.dev
Signed-off-by: Namhyung Kim
Link: https://lore.kernel.org/r/20240221034155.1500118-4-irogers@google.com

perf stat: Support per-cluster aggregation

2024-02-09T22:59:53Z

Some platforms have a 'cluster' topology, and CPUs in a cluster share resources like the L3 Cache Tag (for HiSilicon Kunpeng SoC) or the L2 cache (for Intel Jacobsville). Parsing and building the cluster topology has been supported since [1].

perf stat already supports aggregation for other topology levels such as die or socket. It is useful to aggregate per cluster to find problems like L3T bandwidth contention.

This patch adds support for the "--per-cluster" option for per-cluster aggregation. It also updates the docs and the related test. The output will look like:

  [root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5

   Performance counter stats for 'system wide':

  S56-D0-CLS158     4      1,321,521,570      LLC-load
  S56-D0-CLS594     4        794,211,453      LLC-load
  S56-D0-CLS1030    4             41,623      LLC-load
  S56-D0-CLS1466    4             41,646      LLC-load
  S56-D0-CLS1902    4             16,863      LLC-load
  S56-D0-CLS2338    4             15,721      LLC-load
  S56-D0-CLS2774    4             22,671      LLC-load
  [...]

On a legacy system without clusters or cluster support, the output will look like:

  [root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1

   Performance counter stats for 'system wide':

  S56-D0-CLS0      64         18,011,485      cycles
  S7182-D0-CLS0    64         16,548,835      cycles

Note that this patch doesn't mix the cluster information into the output of --per-core, to avoid breaking any tools/scripts using it.

Note that perf recently gained "--per-cache" aggregation, but it is not the same as cluster aggregation, although cluster CPUs may share some cache resources. For example, on my machine all clusters within a die share the same L3 cache:

  $ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
  0-31
  $ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
  0-3

[1] commit c5e22feffdd7 ("topology: Represent clusters of CPUs within a die")

Tested-by: Jie Zhan
Reviewed-by: Tim Chen
Reviewed-by: Ian Rogers
Signed-off-by: Yicong Yang
Cc: james.clark@arm.com
Cc: 21cnbao@gmail.com
Cc: prime.zeng@hisilicon.com
Cc: Jonathan.Cameron@huawei.com
Cc: fanghao11@huawei.com
Cc: linuxarm@huawei.com
Cc: tim.c.chen@intel.com
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Namhyung Kim
Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com
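
The difference between cache sharing and cluster membership in the sysfs example above can be checked with a small sketch. The helper name `parse_cpu_list` is hypothetical; it assumes the usual sysfs CPU list format of comma-separated numbers and ranges, and it uses the two values quoted in the commit message rather than reading sysfs.

```python
def parse_cpu_list(text):
    """Expand a sysfs CPU list string such as "0-3" or "0,2-3" into a set of ints.

    Hypothetical helper, assuming comma-separated entries that are either
    a single CPU number or an inclusive "first-last" range.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


# Values quoted in the commit message for cpu0.
l3_sharers = parse_cpu_list("0-31\n")  # cache/index3/shared_cpu_list
cluster = parse_cpu_list("0-3\n")      # topology/cluster_cpus_list

# On that machine the cluster is a strict subset of the CPUs sharing the L3.
cluster_is_smaller = cluster < l3_sharers
```

With those inputs the cluster holds 4 CPUs while 32 CPUs share the L3 cache, which is why per-cache and per-cluster aggregation group the counters differently on such a machine.
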