| perf-trace(1) |
| ============= |
| |
| NAME |
| ---- |
| perf-trace - strace inspired tool |
| |
| SYNOPSIS |
| -------- |
| [verse] |
| 'perf trace' |
| 'perf trace record' |
| |
| DESCRIPTION |
| ----------- |
| This command will show the events associated with the target, initially |
| syscalls, but other system events like pagefaults, task lifetime events, |
| scheduling events, etc. |
| |
| This is a live mode tool in addition to working with perf.data files like |
| the other perf tools. Files can be generated using the 'perf record' command |
| but the session needs to include the raw_syscalls events (-e 'raw_syscalls:*'). |
| Alternatively, 'perf trace record' can be used as a shortcut to |
| automatically include the raw_syscalls events when writing events to a file. |
| |
| The following options apply to perf trace; options to perf trace record are |
| found in the perf record man page. |
| |
| OPTIONS |
| ------- |
| |
| -a:: |
| --all-cpus:: |
| System-wide collection from all CPUs. |
| |
| -e:: |
| --expr:: |
| --event:: |
| List of syscalls and other perf events (tracepoints, HW cache events, |
| etc) to show. Globbing is supported, e.g.: "epoll_*", "*msg*", etc. |
| See 'perf list' for a complete list of events. |
| Prefixing with ! shows all syscalls but the ones specified. You may |
| need to escape it. |
| |
| -D msecs:: |
| --delay msecs:: |
| After starting the program, wait msecs before measuring. This is useful to |
| filter out the startup phase of the program, which is often very different. |
| |
| -o:: |
| --output=:: |
| Output file name. |
| |
| -p:: |
| --pid=:: |
| Record events on existing process ID (comma separated list). |
| |
| -t:: |
| --tid=:: |
| Record events on existing thread ID (comma separated list). |
| |
| -u:: |
| --uid=:: |
| Record events in threads owned by uid. Name or number. |
| |
| -G:: |
| --cgroup:: |
| Record events in threads in a cgroup. |
| |
| Look for cgroups to set at the /sys/fs/cgroup/perf_event directory, then |
| remove the /sys/fs/cgroup/perf_event/ part and try: |
| |
| perf trace -G A -e sched:*switch |
| |
| Will set all raw_syscalls:sys_{enter,exit}, pgfault, vfs_getname, etc |
| _and_ sched:sched_switch to the 'A' cgroup, while: |
| |
| perf trace -e sched:*switch -G A |
| |
| will only set the sched:sched_switch event to the 'A' cgroup, all the |
| other events (raw_syscalls:sys_{enter,exit}, etc are left "without" |
| a cgroup (on the root cgroup, sys wide, etc). |
| |
| Multiple cgroups: |
| |
| perf trace -G A -e sched:*switch -G B |
| |
| the syscall ones go to the 'A' cgroup, the sched:sched_switch goes |
| to the 'B' cgroup. |
| |
| --filter-pids=:: |
| Filter out events for these pids and for 'trace' itself (comma separated list). |
| |
| -v:: |
| --verbose=:: |
| Verbosity level. |
| |
| --no-inherit:: |
| Child tasks do not inherit counters. |
| |
| -m:: |
| --mmap-pages=:: |
| Number of mmap data pages (must be a power of two) or size |
| specification with appended unit character - B/K/M/G. The |
| size is rounded up to have nearest pages power of two value. |
| |
| -C:: |
| --cpu:: |
| Collect samples only on the list of CPUs provided. Multiple CPUs can be provided as a |
| comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. |
| In per-thread mode with inheritance mode on (default), Events are captured only when |
| the thread executes on the designated CPUs. Default is to monitor all CPUs. |
| |
| --duration:: |
| Show only events that had a duration greater than N.M ms. |
| |
| --sched:: |
| Accrue thread runtime and provide a summary at the end of the session. |
| |
| -i:: |
| --input:: |
| Process events from a given perf data file. |
| |
| -T:: |
| --time:: |
| Print full timestamp rather time relative to first sample. |
| |
| --comm:: |
| Show process COMM right beside its ID, on by default, disable with --no-comm. |
| |
| -s:: |
| --summary:: |
| Show only a summary of syscalls by thread with min, max, and average times |
| (in msec) and relative stddev. |
| |
| -S:: |
| --with-summary:: |
| Show all syscalls followed by a summary by thread with min, max, and |
| average times (in msec) and relative stddev. |
| |
| --tool_stats:: |
| Show tool stats such as number of times fd->pathname was discovered thru |
| hooking the open syscall return + vfs_getname or via reading /proc/pid/fd, etc. |
| |
| -f:: |
| --force:: |
| Don't complain, do it. |
| |
| -F=[all|min|maj]:: |
| --pf=[all|min|maj]:: |
| Trace pagefaults. Optionally, you can specify whether you want minor, |
| major or all pagefaults. Default value is maj. |
| |
| --syscalls:: |
| Trace system calls. This options is enabled by default, disable with |
| --no-syscalls. |
| |
| --call-graph [mode,type,min[,limit],order[,key][,branch]]:: |
| Setup and enable call-graph (stack chain/backtrace) recording. |
| See `--call-graph` section in perf-record and perf-report |
| man pages for details. The ones that are most useful in 'perf trace' |
| are 'dwarf' and 'lbr', where available, try: 'perf trace --call-graph dwarf'. |
| |
| Using this will, for the root user, bump the value of --mmap-pages to 4 |
| times the maximum for non-root users, based on the kernel.perf_event_mlock_kb |
| sysctl. This is done only if the user doesn't specify a --mmap-pages value. |
| |
| --kernel-syscall-graph:: |
| Show the kernel callchains on the syscall exit path. |
| |
| --max-stack:: |
| Set the stack depth limit when parsing the callchain, anything |
| beyond the specified depth will be ignored. Note that at this point |
| this is just about the presentation part, i.e. the kernel is still |
| not limiting, the overhead of callchains needs to be set via the |
| knobs in --call-graph dwarf. |
| |
| Implies '--call-graph dwarf' when --call-graph not present on the |
| command line, on systems where DWARF unwinding was built in. |
| |
| Default: /proc/sys/kernel/perf_event_max_stack when present for |
| live sessions (without --input/-i), 127 otherwise. |
| |
| --min-stack:: |
| Set the stack depth limit when parsing the callchain, anything |
| below the specified depth will be ignored. Disabled by default. |
| |
| Implies '--call-graph dwarf' when --call-graph not present on the |
| command line, on systems where DWARF unwinding was built in. |
| |
| --print-sample:: |
| Print the PERF_RECORD_SAMPLE PERF_SAMPLE_ info for the |
| raw_syscalls:sys_{enter,exit} tracepoints, for debugging. |
| |
| --proc-map-timeout:: |
| When processing pre-existing threads /proc/XXX/mmap, it may take a long time, |
| because the file may be huge. A time out is needed in such cases. |
| This option sets the time out limit. The default value is 500 ms. |
| |
| PAGEFAULTS |
| ---------- |
| |
| When tracing pagefaults, the format of the trace is as follows: |
| |
| <min|maj>fault [<ip.symbol>+<ip.offset>] => <addr.dso@addr.offset> (<map type><addr level>). |
| |
| - min/maj indicates whether fault event is minor or major; |
| - ip.symbol shows symbol for instruction pointer (the code that generated the |
| fault); if no debug symbols available, perf trace will print raw IP; |
| - addr.dso shows DSO for the faulted address; |
| - map type is either 'd' for non-executable maps or 'x' for executable maps; |
| - addr level is either 'k' for kernel dso or '.' for user dso. |
| |
| For symbols resolution you may need to install debugging symbols. |
| |
| Please be aware that duration is currently always 0 and doesn't reflect actual |
| time it took for fault to be handled! |
| |
| When --verbose specified, perf trace tries to print all available information |
| for both IP and fault address in the form of dso@symbol+offset. |
| |
| EXAMPLES |
| -------- |
| |
| Trace only major pagefaults: |
| |
| $ perf trace --no-syscalls -F |
| |
| Trace syscalls, major and minor pagefaults: |
| |
| $ perf trace -F all |
| |
| 1416.547 ( 0.000 ms): python/20235 majfault [CRYPTO_push_info_+0x0] => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0@0x61be0 (x.) |
| |
| As you can see, there was major pagefault in python process, from |
| CRYPTO_push_info_ routine which faulted somewhere in libcrypto.so. |
| |
| SEE ALSO |
| -------- |
| linkperf:perf-record[1], linkperf:perf-script[1] |