tools/perf/Documentation/perf-report.txt - linux - Git at Google

 perf-report(1)
 ==============

 NAME
 ----
 perf-report - Read perf.data (created by perf record) and display the profile

 SYNOPSIS
 --------
 [verse]
 'perf report' [-i <file> | --input=file]

 DESCRIPTION
 -----------
 This command displays the performance counter profile information recorded
 via perf record.

 OPTIONS
 -------
 -i::
 --input=::
         Input file name. (default: perf.data unless stdin is a fifo)

 -v::
 --verbose::
         Be more verbose. (show symbol address, etc)

 -n::
 --show-nr-samples::
 	Show the number of samples for each symbol

 --showcpuutilization::
         Show sample percentage for different cpu modes.

 -T::
 --threads::
 	Show per-thread event counters
 -c::
 --comms=::
 	Only consider symbols in these comms. CSV that understands
 	file://filename entries.  This option will affect the percentage of
 	the overhead column.  See --percentage for more info.
 -d::
 --dsos=::
 	Only consider symbols in these dsos. CSV that understands
 	file://filename entries.  This option will affect the percentage of
 	the overhead column.  See --percentage for more info.
 -S::
 --symbols=::
 	Only consider these symbols. CSV that understands
 	file://filename entries.  This option will affect the percentage of
 	the overhead column.  See --percentage for more info.

 --symbol-filter=::
 	Only show symbols that match (partially) with this filter.

 -U::
 --hide-unresolved::
         Only display entries resolved to a symbol.

 -s::
 --sort=::
 	Sort histogram entries by given key(s) - multiple keys can be specified
 	in CSV format.  Following sort keys are available:
 	pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight.

 	Each key has following meaning:

 	- comm: command (name) of the task which can be read via /proc/<pid>/comm
 	- pid: command and tid of the task
 	- dso: name of library or module executed at the time of sample
 	- symbol: name of function executed at the time of sample
 	- parent: name of function matched to the parent regex filter. Unmatched
 	entries are displayed as "[other]".
 	- cpu: cpu number the task ran at the time of sample
 	- srcline: filename and line number executed at the time of sample.  The
 	DWARF debugging info must be provided.
 	- weight: Event specific weight, e.g. memory latency or transaction
 	abort cost. This is the global weight.
 	- local_weight: Local weight version of the weight above.
 	- transaction: Transaction abort flags.
 	- overhead: Overhead percentage of sample
 	- overhead_sys: Overhead percentage of sample running in system mode
 	- overhead_us: Overhead percentage of sample running in user mode
 	- overhead_guest_sys: Overhead percentage of sample running in system mode
 	on guest machine
 	- overhead_guest_us: Overhead percentage of sample running in user mode on
 	guest machine
 	- sample: Number of sample
 	- period: Raw number of event count of sample

 	By default, comm, dso and symbol keys are used.
 	(i.e. --sort comm,dso,symbol)

 	If --branch-stack option is used, following sort keys are also
 	available:
 	dso_from, dso_to, symbol_from, symbol_to, mispredict.

 	- dso_from: name of library or module branched from
 	- dso_to: name of library or module branched to
 	- symbol_from: name of function branched from
 	- symbol_to: name of function branched to
 	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
 	- in_tx: branch in TSX transaction
 	- abort: TSX transaction abort.

 	And default sort keys are changed to comm, dso_from, symbol_from, dso_to
 	and symbol_to, see '--branch-stack'.

 -F::
 --fields=::
 	Specify output field - multiple keys can be specified in CSV format.
 	Following fields are available:
 	overhead, overhead_sys, overhead_us, overhead_children, sample and period.
 	Also it can contain any sort key(s).

 	By default, every sort keys not specified in -F will be appended
 	automatically.

 	If --mem-mode option is used, following sort keys are also available
 	(incompatible with --branch-stack):
 	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.

 	- symbol_daddr: name of data symbol being executed on at the time of sample
 	- dso_daddr: name of library or module containing the data being executed
 	on at the time of sample
 	- locked: whether the bus was locked at the time of sample
 	- tlb: type of tlb access for the data at the time of sample
 	- mem: type of memory access for the data at the time of sample
 	- snoop: type of snoop (if any) for the data at the time of sample
 	- dcacheline: the cacheline the data address is on at the time of sample

 	And default sort keys are changed to local_weight, mem, sym, dso,
 	symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.

 -p::
 --parent=<regex>::
         A regex filter to identify parent. The parent is a caller of this
 	function and searched through the callchain, thus it requires callchain
 	information recorded. The pattern is in the exteneded regex format and
 	defaults to "\^sys_|^do_page_fault", see '--sort parent'.

 -x::
 --exclude-other::
         Only display entries with parent-match.

 -w::
 --column-widths=<width[,width...]>::
 	Force each column width to the provided list, for large terminal
 	readability.  0 means no limit (default behavior).

 -t::
 --field-separator=::
 	Use a special separator character and don't pad with spaces, replacing
 	all occurrences of this separator in symbol names (and other output)
 	with a '.' character, that thus it's the only non valid separator.

 -D::
 --dump-raw-trace::
         Dump raw trace in ASCII.

 -g [type,min[,limit],order[,key]]::
 --call-graph::
         Display call chains using type, min percent threshold, optional print
 	limit and order.
 	type can be either:
 	- flat: single column, linear exposure of call chains.
 	- graph: use a graph tree, displaying absolute overhead rates.
 	- fractal: like graph, but displays relative rates. Each branch of
 		 the tree is considered as a new profiled object. +

 	order can be either:
 	- callee: callee based call graph.
 	- caller: inverted caller based call graph.

 	key can be:
 	- function: compare on functions
 	- address: compare on individual code addresses

 	Default: fractal,0.5,callee,function.

 --children::
 	Accumulate callchain of children to parent entry so that then can
 	show up in the output.  The output will have a new "Children" column
 	and will be sorted on the data.  It requires callchains are recorded.

 --max-stack::
 	Set the stack depth limit when parsing the callchain, anything
 	beyond the specified depth will be ignored. This is a trade-off
 	between information loss and faster processing especially for
 	workloads that can have a very long callchain stack.

 	Default: 127

 -G::
 --inverted::
         alias for inverted caller based call graph.

 --ignore-callees=<regex>::
         Ignore callees of the function(s) matching the given regex.
         This has the effect of collecting the callers of each such
         function into one place in the call-graph tree.

 --pretty=<key>::
         Pretty printing style.  key: normal, raw

 --stdio:: Use the stdio interface.

 --tui:: Use the TUI interface, that is integrated with annotate and allows
         zooming into DSOs or threads, among other features. Use of --tui
 	requires a tty, if one is not present, as when piping to other
 	commands, the stdio interface is used.

 --gtk:: Use the GTK2 interface.

 -k::
 --vmlinux=<file>::
         vmlinux pathname

 --kallsyms=<file>::
         kallsyms pathname

 -m::
 --modules::
         Load module symbols. WARNING: This should only be used with -k and
         a LIVE kernel.

 -f::
 --force::
         Don't complain, do it.

 --symfs=<directory>::
         Look for files with symbols relative to this directory.

 -C::
 --cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
 	be provided as a comma-separated list with no space: 0,1. Ranges of
 	CPUs are specified with -: 0-2. Default is to report samples on all
 	CPUs.

 -M::
 --disassembler-style=:: Set disassembler style for objdump.

 --source::
 	Interleave source code with assembly code. Enabled by default,
 	disable with --no-source.

 --asm-raw::
 	Show raw instruction encoding of assembly instructions.

 --show-total-period:: Show a column with the sum of periods.

 -I::
 --show-info::
 	Display extended information about the perf.data file. This adds
 	information which may be very large and thus may clutter the display.
 	It currently includes: cpu and numa topology of the host system.

 -b::
 --branch-stack::
 	Use the addresses of sampled taken branches instead of the instruction
 	address to build the histograms. To generate meaningful output, the
 	perf.data file must have been obtained using perf record -b or
 	perf record --branch-filter xxx where xxx is a branch filter option.
 	perf report is able to auto-detect whether a perf.data file contains
 	branch stacks and it will automatically switch to the branch view mode,
 	unless --no-branch-stack is used.

 --objdump=<path>::
         Path to objdump binary.

 --group::
 	Show event group information together.

 --demangle::
 	Demangle symbol names to human readable form. It's enabled by default,
 	disable with --no-demangle.

 --demangle-kernel::
 	Demangle kernel symbol names to human readable form (for C++ kernels).

 --mem-mode::
 	Use the data addresses of samples in addition to instruction addresses
 	to build the histograms.  To generate meaningful output, the perf.data
 	file must have been obtained using perf record -d -W and using a
 	special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See
 	'perf mem' for simpler access.

 --percent-limit::
 	Do not show entries which have an overhead under that percent.
 	(Default: 0).

 --percentage::
 	Determine how to display the overhead percentage of filtered entries.
 	Filters can be applied by --comms, --dsos and/or --symbols options and
 	Zoom operations on the TUI (thread, dso, etc).

 	"relative" means it's relative to filtered entries only so that the
 	sum of shown entries will be always 100%.  "absolute" means it retains
 	the original value before and after the filter is applied.

 --header::
 	Show header information in the perf.data file.  This includes
 	various information like hostname, OS and perf version, cpu/mem
 	info, perf command line, event list and so on.  Currently only
 	--stdio output supports this feature.

 --header-only::
 	Show only perf.data header (forces --stdio).

 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-annotate[1]
	perf-report(1)
	==============

	NAME
	----
	perf-report - Read perf.data (created by perf record) and display the profile

	SYNOPSIS
	--------
	[verse]
	'perf report' [-i <file> \| --input=file]

	DESCRIPTION
	-----------
	This command displays the performance counter profile information recorded
	via perf record.

	OPTIONS
	-------
	-i::
	--input=::
	Input file name. (default: perf.data unless stdin is a fifo)

	-v::
	--verbose::
	Be more verbose. (show symbol address, etc)

	-n::
	--show-nr-samples::
	Show the number of samples for each symbol

	--showcpuutilization::
	Show sample percentage for different cpu modes.

	-T::
	--threads::
	Show per-thread event counters
	-c::
	--comms=::
	Only consider symbols in these comms. CSV that understands
	file://filename entries. This option will affect the percentage of
	the overhead column. See --percentage for more info.
	-d::
	--dsos=::
	Only consider symbols in these dsos. CSV that understands
	file://filename entries. This option will affect the percentage of
	the overhead column. See --percentage for more info.
	-S::
	--symbols=::
	Only consider these symbols. CSV that understands
	file://filename entries. This option will affect the percentage of
	the overhead column. See --percentage for more info.

	--symbol-filter=::
	Only show symbols that match (partially) with this filter.

	-U::
	--hide-unresolved::
	Only display entries resolved to a symbol.

	-s::
	--sort=::
	Sort histogram entries by given key(s) - multiple keys can be specified
	in CSV format. Following sort keys are available:
	pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight.

	Each key has following meaning:

	- comm: command (name) of the task which can be read via /proc/<pid>/comm
	- pid: command and tid of the task
	- dso: name of library or module executed at the time of sample
	- symbol: name of function executed at the time of sample
	- parent: name of function matched to the parent regex filter. Unmatched
	entries are displayed as "[other]".
	- cpu: cpu number the task ran at the time of sample
	- srcline: filename and line number executed at the time of sample. The
	DWARF debugging info must be provided.
	- weight: Event specific weight, e.g. memory latency or transaction
	abort cost. This is the global weight.
	- local_weight: Local weight version of the weight above.
	- transaction: Transaction abort flags.
	- overhead: Overhead percentage of sample
	- overhead_sys: Overhead percentage of sample running in system mode
	- overhead_us: Overhead percentage of sample running in user mode
	- overhead_guest_sys: Overhead percentage of sample running in system mode
	on guest machine
	- overhead_guest_us: Overhead percentage of sample running in user mode on
	guest machine
	- sample: Number of sample
	- period: Raw number of event count of sample

	By default, comm, dso and symbol keys are used.
	(i.e. --sort comm,dso,symbol)

	If --branch-stack option is used, following sort keys are also
	available:
	dso_from, dso_to, symbol_from, symbol_to, mispredict.

	- dso_from: name of library or module branched from
	- dso_to: name of library or module branched to
	- symbol_from: name of function branched from
	- symbol_to: name of function branched to
	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
	- in_tx: branch in TSX transaction
	- abort: TSX transaction abort.

	And default sort keys are changed to comm, dso_from, symbol_from, dso_to
	and symbol_to, see '--branch-stack'.

	-F::
	--fields=::
	Specify output field - multiple keys can be specified in CSV format.
	Following fields are available:
	overhead, overhead_sys, overhead_us, overhead_children, sample and period.
	Also it can contain any sort key(s).

	By default, every sort keys not specified in -F will be appended
	automatically.

	If --mem-mode option is used, following sort keys are also available
	(incompatible with --branch-stack):
	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.

	- symbol_daddr: name of data symbol being executed on at the time of sample
	- dso_daddr: name of library or module containing the data being executed
	on at the time of sample
	- locked: whether the bus was locked at the time of sample
	- tlb: type of tlb access for the data at the time of sample
	- mem: type of memory access for the data at the time of sample
	- snoop: type of snoop (if any) for the data at the time of sample
	- dcacheline: the cacheline the data address is on at the time of sample

	And default sort keys are changed to local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.

	-p::
	--parent=<regex>::
	A regex filter to identify parent. The parent is a caller of this
	function and searched through the callchain, thus it requires callchain
	information recorded. The pattern is in the exteneded regex format and
	defaults to "\^sys_\|^do_page_fault", see '--sort parent'.

	-x::
	--exclude-other::
	Only display entries with parent-match.

	-w::
	--column-widths=<width[,width...]>::
	Force each column width to the provided list, for large terminal
	readability. 0 means no limit (default behavior).

	-t::
	--field-separator=::
	Use a special separator character and don't pad with spaces, replacing
	all occurrences of this separator in symbol names (and other output)
	with a '.' character, that thus it's the only non valid separator.

	-D::
	--dump-raw-trace::
	Dump raw trace in ASCII.

	-g [type,min[,limit],order[,key]]::
	--call-graph::
	Display call chains using type, min percent threshold, optional print
	limit and order.
	type can be either:
	- flat: single column, linear exposure of call chains.
	- graph: use a graph tree, displaying absolute overhead rates.
	- fractal: like graph, but displays relative rates. Each branch of
	the tree is considered as a new profiled object. +

	order can be either:
	- callee: callee based call graph.
	- caller: inverted caller based call graph.

	key can be:
	- function: compare on functions
	- address: compare on individual code addresses

	Default: fractal,0.5,callee,function.

	--children::
	Accumulate callchain of children to parent entry so that then can
	show up in the output. The output will have a new "Children" column
	and will be sorted on the data. It requires callchains are recorded.

	--max-stack::
	Set the stack depth limit when parsing the callchain, anything
	beyond the specified depth will be ignored. This is a trade-off
	between information loss and faster processing especially for
	workloads that can have a very long callchain stack.

	Default: 127

	-G::
	--inverted::
	alias for inverted caller based call graph.

	--ignore-callees=<regex>::
	Ignore callees of the function(s) matching the given regex.
	This has the effect of collecting the callers of each such
	function into one place in the call-graph tree.

	--pretty=<key>::
	Pretty printing style. key: normal, raw

	--stdio:: Use the stdio interface.

	--tui:: Use the TUI interface, that is integrated with annotate and allows
	zooming into DSOs or threads, among other features. Use of --tui
	requires a tty, if one is not present, as when piping to other
	commands, the stdio interface is used.

	--gtk:: Use the GTK2 interface.

	-k::
	--vmlinux=<file>::
	vmlinux pathname

	--kallsyms=<file>::
	kallsyms pathname

	-m::
	--modules::
	Load module symbols. WARNING: This should only be used with -k and
	a LIVE kernel.

	-f::
	--force::
	Don't complain, do it.

	--symfs=<directory>::
	Look for files with symbols relative to this directory.

	-C::
	--cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
	be provided as a comma-separated list with no space: 0,1. Ranges of
	CPUs are specified with -: 0-2. Default is to report samples on all
	CPUs.

	-M::
	--disassembler-style=:: Set disassembler style for objdump.

	--source::
	Interleave source code with assembly code. Enabled by default,
	disable with --no-source.

	--asm-raw::
	Show raw instruction encoding of assembly instructions.

	--show-total-period:: Show a column with the sum of periods.

	-I::
	--show-info::
	Display extended information about the perf.data file. This adds
	information which may be very large and thus may clutter the display.
	It currently includes: cpu and numa topology of the host system.

	-b::
	--branch-stack::
	Use the addresses of sampled taken branches instead of the instruction
	address to build the histograms. To generate meaningful output, the
	perf.data file must have been obtained using perf record -b or
	perf record --branch-filter xxx where xxx is a branch filter option.
	perf report is able to auto-detect whether a perf.data file contains
	branch stacks and it will automatically switch to the branch view mode,
	unless --no-branch-stack is used.

	--objdump=<path>::
	Path to objdump binary.

	--group::
	Show event group information together.

	--demangle::
	Demangle symbol names to human readable form. It's enabled by default,
	disable with --no-demangle.

	--demangle-kernel::
	Demangle kernel symbol names to human readable form (for C++ kernels).

	--mem-mode::
	Use the data addresses of samples in addition to instruction addresses
	to build the histograms. To generate meaningful output, the perf.data
	file must have been obtained using perf record -d -W and using a
	special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See
	'perf mem' for simpler access.

	--percent-limit::
	Do not show entries which have an overhead under that percent.
	(Default: 0).

	--percentage::
	Determine how to display the overhead percentage of filtered entries.
	Filters can be applied by --comms, --dsos and/or --symbols options and
	Zoom operations on the TUI (thread, dso, etc).

	"relative" means it's relative to filtered entries only so that the
	sum of shown entries will be always 100%. "absolute" means it retains
	the original value before and after the filter is applied.

	--header::
	Show header information in the perf.data file. This includes
	various information like hostname, OS and perf version, cpu/mem
	info, perf command line, event list and so on. Currently only
	--stdio output supports this feature.

	--header-only::
	Show only perf.data header (forces --stdio).

	SEE ALSO
	--------
	linkperf:perf-stat[1], linkperf:perf-annotate[1]