| Overhead calculation |
| -------------------- |
| The overhead can be shown in two columns as 'Children' and 'Self' when |
| perf collects callchains. The 'self' overhead is simply calculated by |
| adding all period values of the entry - usually a function (symbol). |
| This is the value that perf shows traditionally and the sum of all the |
| 'self' overhead values should be 100%. |
| |
| The 'children' overhead is calculated by adding all period values of |
| the child functions so that it can show the total overhead of the |
| higher level functions even if they don't directly execute much. |
| 'Children' here means functions that are called from another (parent) |
| function. |
| |
| It might be confusing that the sum of all the 'children' overhead |
| values exceeds 100% since each of them is already an accumulation of |
| 'self' overhead of its child functions. But with this enabled, users |
| can find which function has the most overhead even if samples are |
| spread over the children. |
| |
| Consider the following example; there are three functions like below. |
| |
| ----------------------- |
| void foo(void) { |
|     /* do something */ |
| } |
| |
| void bar(void) { |
|     /* do something */ |
|     foo(); |
| } |
| |
| int main(void) { |
|     bar(); |
|     return 0; |
| } |
| ----------------------- |
| |
| In this case 'foo' is a child of 'bar', and 'bar' is an immediate |
| child of 'main' so 'foo' also is a child of 'main'. In other words, |
| 'main' is a parent of 'foo' and 'bar', and 'bar' is a parent of 'foo'. |
| |
| Suppose all samples are recorded in 'foo' and 'bar' only. When they are |
| recorded with callchains, the usual (self-overhead-only) output of |
| perf report looks something like this: |
| |
| ---------------------------------- |
| Overhead  Symbol |
| ........  ..................... |
|   60.00%  foo |
|           | |
|           --- foo |
|               bar |
|               main |
|               __libc_start_main |
| |
|   40.00%  bar |
|           | |
|           --- bar |
|               main |
|               __libc_start_main |
| ---------------------------------- |
| |
| When the --children option is enabled, the 'self' overhead values of |
| child functions (i.e. 'foo' and 'bar') are added to the parents to |
| calculate the 'children' overhead. In this case the report could be |
| displayed as: |
| |
| ------------------------------------------- |
| Children      Self  Symbol |
| ........  ........  .................... |
|  100.00%     0.00%  __libc_start_main |
|           | |
|           --- __libc_start_main |
| |
|  100.00%     0.00%  main |
|           | |
|           --- main |
|               __libc_start_main |
| |
|  100.00%    40.00%  bar |
|           | |
|           --- bar |
|               main |
|               __libc_start_main |
| |
|   60.00%    60.00%  foo |
|           | |
|           --- foo |
|               bar |
|               main |
|               __libc_start_main |
| ------------------------------------------- |
| |
| In the above output, the 'self' overhead of 'foo' (60%) was added to the |
| 'children' overhead of 'bar', 'main' and '\_\_libc_start_main'. |
| Likewise, the 'self' overhead of 'bar' (40%) was added to the |
| 'children' overhead of 'main' and '\_\_libc_start_main'. |
| |
| So '\_\_libc_start_main' and 'main' are shown first since they have |
| the same (100%) 'children' overhead (even though they have zero 'self' |
| overhead) and they are the parents of 'foo' and 'bar'. |
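
The accumulation described above can be sketched in C. This is only an illustration of the arithmetic, not perf's actual implementation: the struct layouts, the fixed table sizes and the names lookup(), seen_before(), accumulate() and example_percent() are all hypothetical. The sketch also assumes that a symbol appearing more than once in a callchain (recursion) is counted only once per sample.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8   /* longest callchain this sketch handles */
#define MAX_SYMS  16  /* most distinct symbols this sketch handles */

/* One sample: chain[0] is the sampled function, followed by its callers.
 * Unused trailing slots are NULL. */
struct sample {
    const char *chain[MAX_DEPTH];
    unsigned long period;
};

/* Accumulated periods for one symbol (hypothetical layout). */
struct entry {
    const char *sym;
    unsigned long self;
    unsigned long children;
};

/* Find the entry for sym, creating it if needed; NULL if the table is full. */
struct entry *lookup(struct entry *tab, size_t *count, const char *sym)
{
    for (size_t i = 0; i < *count; i++)
        if (strcmp(tab[i].sym, sym) == 0)
            return &tab[i];
    if (*count == MAX_SYMS)
        return NULL;
    tab[*count].sym = sym;
    tab[*count].self = 0;
    tab[*count].children = 0;
    return &tab[(*count)++];
}

/* Did chain[depth] already appear closer to the sampled function? */
int seen_before(const struct sample *s, size_t depth)
{
    for (size_t k = 0; k < depth; k++)
        if (strcmp(s->chain[k], s->chain[depth]) == 0)
            return 1;
    return 0;
}

/* Fill tab from the samples and return the total period. */
unsigned long accumulate(const struct sample *samples, size_t nr,
                         struct entry *tab, size_t *count)
{
    unsigned long total = 0;

    for (size_t i = 0; i < nr; i++) {
        const struct sample *s = &samples[i];

        total += s->period;
        for (size_t d = 0; d < MAX_DEPTH && s->chain[d] != NULL; d++) {
            struct entry *e = lookup(tab, count, s->chain[d]);

            if (e == NULL)
                continue;
            if (d == 0)
                e->self += s->period;      /* only the sampled function */
            if (!seen_before(s, d))
                e->children += s->period;  /* the function and every caller */
        }
    }
    return total;
}

/* Percentage for the foo/bar/main example: 60 periods in foo, 40 in bar. */
unsigned long example_percent(const char *sym, int want_children)
{
    const struct sample samples[] = {
        { { "foo", "bar", "main", "__libc_start_main" }, 60 },
        { { "bar", "main", "__libc_start_main" }, 40 },
    };
    struct entry tab[MAX_SYMS];
    size_t count = 0;
    unsigned long total = accumulate(samples, 2, tab, &count);

    for (size_t i = 0; i < count; i++)
        if (strcmp(tab[i].sym, sym) == 0)
            return 100 * (want_children ? tab[i].children : tab[i].self) / total;
    return 0;
}
```

With these inputs, example_percent("bar", 1) gives 100 and example_percent("bar", 0) gives 40, matching the 'bar' row in the --children output above.
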
| |
| Since v3.16 the 'children' overhead is shown by default and the output |
| is sorted by its values. The 'children' overhead is disabled by |
| specifying the --no-children option on the command line or by adding |
| 'report.children = false' or 'top.children = false' in the perf config |
| file. |
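
As a sketch, the two config settings named above could be written as follows, assuming the usual section-style layout of a perf config file such as ~/.perfconfig:

```
[report]
	children = false

[top]
	children = false
```
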