| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html> |
| <!-- This file documents the gprof profiler of the GNU system. |
| |
| Copyright (C) 1988-2015 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.3 |
| or any later version published by the Free Software Foundation; |
| with no Invariant Sections, with no Front-Cover Texts, and with no |
| Back-Cover Texts. A copy of the license is included in the |
| section entitled "GNU Free Documentation License". |
| --> |
| <!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ --> |
| <head> |
| <title>GNU gprof: Internals</title> |
| |
| <meta name="description" content="GNU gprof: Internals"> |
| <meta name="keywords" content="GNU gprof: Internals"> |
| <meta name="resource-type" content="document"> |
| <meta name="distribution" content="global"> |
| <meta name="Generator" content="makeinfo"> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
| <link href="index.html#Top" rel="start" title="Top"> |
| <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> |
| <link href="Details.html#Details" rel="up" title="Details"> |
| <link href="Debugging.html#Debugging" rel="next" title="Debugging"> |
| <link href="File-Format.html#File-Format" rel="prev" title="File Format"> |
| <style type="text/css"> |
| <!-- |
| a.summary-letter {text-decoration: none} |
| blockquote.smallquotation {font-size: smaller} |
| div.display {margin-left: 3.2em} |
| div.example {margin-left: 3.2em} |
| div.indentedblock {margin-left: 3.2em} |
| div.lisp {margin-left: 3.2em} |
| div.smalldisplay {margin-left: 3.2em} |
| div.smallexample {margin-left: 3.2em} |
| div.smallindentedblock {margin-left: 3.2em; font-size: smaller} |
| div.smalllisp {margin-left: 3.2em} |
| kbd {font-style:oblique} |
| pre.display {font-family: inherit} |
| pre.format {font-family: inherit} |
| pre.menu-comment {font-family: serif} |
| pre.menu-preformatted {font-family: serif} |
| pre.smalldisplay {font-family: inherit; font-size: smaller} |
| pre.smallexample {font-size: smaller} |
| pre.smallformat {font-family: inherit; font-size: smaller} |
| pre.smalllisp {font-size: smaller} |
| span.nocodebreak {white-space:nowrap} |
| span.nolinebreak {white-space:nowrap} |
| span.roman {font-family:serif; font-weight:normal} |
| span.sansserif {font-family:sans-serif; font-weight:normal} |
| ul.no-bullet {list-style: none} |
| --> |
| </style> |
| |
| |
| </head> |
| |
| <body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> |
| <a name="Internals"></a> |
| <div class="header"> |
| <p> |
| Next: <a href="Debugging.html#Debugging" accesskey="n" rel="next">Debugging</a>, Previous: <a href="File-Format.html#File-Format" accesskey="p" rel="prev">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p> |
| </div> |
| <hr> |
| <a name="gprof_0027s-Internal-Operation"></a> |
| <h3 class="section">9.3 <code>gprof</code>’s Internal Operation</h3> |
| |
| <p>Like most programs, <code>gprof</code> begins by processing its options. |
| During this stage, it may building its symspec list |
| (<code>sym_ids.c:sym_id_add</code>), if |
| options are specified which use symspecs. |
| <code>gprof</code> maintains a single linked list of symspecs, |
| which will eventually get turned into 12 symbol tables, |
| organized into six include/exclude pairs—one |
| pair each for the flat profile (INCL_FLAT/EXCL_FLAT), |
| the call graph arcs (INCL_ARCS/EXCL_ARCS), |
| printing in the call graph (INCL_GRAPH/EXCL_GRAPH), |
| timing propagation in the call graph (INCL_TIME/EXCL_TIME), |
| the annotated source listing (INCL_ANNO/EXCL_ANNO), |
| and the execution count listing (INCL_EXEC/EXCL_EXEC). |
| </p> |
| <p>After option processing, <code>gprof</code> finishes |
| building the symspec list by adding all the symspecs in |
| <code>default_excluded_list</code> to the exclude lists |
| EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is specified, |
| EXCL_FLAT as well. |
| These default excludes are not added to EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC. |
| </p> |
| <p>Next, the BFD library is called to open the object file, |
| verify that it is an object file, |
| and read its symbol table (<code>core.c:core_init</code>), |
| using <code>bfd_canonicalize_symtab</code> after mallocing |
| an appropriately sized array of symbols. At this point, |
| function mappings are read (if the ‘<samp>--file-ordering</samp>’ option |
| has been specified), and the core text space is read into |
| memory (if the ‘<samp>-c</samp>’ option was given). |
| </p> |
| <p><code>gprof</code>’s own symbol table, an array of Sym structures, |
| is now built. |
| This is done in one of two ways, by one of two routines, depending |
| on whether line-by-line profiling (‘<samp>-l</samp>’ option) has been |
| enabled. |
| For normal profiling, the BFD canonical symbol table is scanned. |
| For line-by-line profiling, every |
| text space address is examined, and a new symbol table entry |
| gets created every time the line number changes. |
| In either case, two passes are made through the symbol |
| table—one to count the size of the symbol table required, |
| and the other to actually read the symbols. In between the |
| two passes, a single array of type <code>Sym</code> is created of |
| the appropriate length. |
| Finally, <code>symtab.c:symtab_finalize</code> |
| is called to sort the symbol table and remove duplicate entries |
| (entries with the same memory address). |
| </p> |
| <p>The symbol table must be a contiguous array for two reasons. |
| First, the <code>qsort</code> library function (which sorts an array) |
| will be used to sort the symbol table. |
| Also, the symbol lookup routine (<code>symtab.c:sym_lookup</code>), |
| which finds symbols |
| based on memory address, uses a binary search algorithm |
| which requires the symbol table to be a sorted array. |
| Function symbols are indicated with an <code>is_func</code> flag. |
| Line number symbols have no special flags set. |
| Additionally, a symbol can have an <code>is_static</code> flag |
| to indicate that it is a local symbol. |
| </p> |
| <p>With the symbol table read, the symspecs can now be translated |
| into Syms (<code>sym_ids.c:sym_id_parse</code>). Remember that a single |
| symspec can match multiple symbols. |
| An array of symbol tables |
| (<code>syms</code>) is created, each entry of which is a symbol table |
| of Syms to be included or excluded from a particular listing. |
| The master symbol table and the symspecs are examined by nested |
| loops, and every symbol that matches a symspec is inserted |
| into the appropriate syms table. This is done twice, once to |
| count the size of each required symbol table, and again to build |
| the tables, which have been malloced between passes. |
| From now on, to determine whether a symbol is on an include |
| or exclude symspec list, <code>gprof</code> simply uses its |
| standard symbol lookup routine on the appropriate table |
| in the <code>syms</code> array. |
| </p> |
| <p>Now the profile data file(s) themselves are read |
| (<code>gmon_io.c:gmon_out_read</code>), |
| first by checking for a new-style ‘<samp>gmon.out</samp>’ header, |
| then assuming this is an old-style BSD ‘<samp>gmon.out</samp>’ |
| if the magic number test failed. |
| </p> |
| <p>New-style histogram records are read by <code>hist.c:hist_read_rec</code>. |
| For the first histogram record, allocate a memory array to hold |
| all the bins, and read them in. |
| When multiple profile data files (or files with multiple histogram |
| records) are read, the memory ranges of each pair of histogram records |
| must be either equal, or non-overlapping. For each pair of histogram |
| records, the resolution (memory region size divided by the number of |
| bins) must be the same. The time unit must be the same for all |
| histogram records. If the above containts are met, all histograms |
| for the same memory range are merged. |
| </p> |
| <p>As each call graph record is read (<code>call_graph.c:cg_read_rec</code>), |
| the parent and child addresses |
| are matched to symbol table entries, and a call graph arc is |
| created by <code>cg_arcs.c:arc_add</code>, unless the arc fails a symspec |
| check against INCL_ARCS/EXCL_ARCS. As each arc is added, |
| a linked list is maintained of the parent’s child arcs, and of the child’s |
| parent arcs. |
| Both the child’s call count and the arc’s call count are |
| incremented by the record’s call count. |
| </p> |
| <p>Basic-block records are read (<code>basic_blocks.c:bb_read_rec</code>), |
| but only if line-by-line profiling has been selected. |
| Each basic-block address is matched to a corresponding line |
| symbol in the symbol table, and an entry made in the symbol’s |
| bb_addr and bb_calls arrays. Again, if multiple basic-block |
| records are present for the same address, the call counts |
| are cumulative. |
| </p> |
| <p>A gmon.sum file is dumped, if requested (<code>gmon_io.c:gmon_out_write</code>). |
| </p> |
| <p>If histograms were present in the data files, assign them to symbols |
| (<code>hist.c:hist_assign_samples</code>) by iterating over all the sample |
| bins and assigning them to symbols. Since the symbol table |
| is sorted in order of ascending memory addresses, we can |
| simple follow along in the symbol table as we make our pass |
| over the sample bins. |
| This step includes a symspec check against INCL_FLAT/EXCL_FLAT. |
| Depending on the histogram |
| scale factor, a sample bin may span multiple symbols, |
| in which case a fraction of the sample count is allocated |
| to each symbol, proportional to the degree of overlap. |
| This effect is rare for normal profiling, but overlaps |
| are more common during line-by-line profiling, and can |
| cause each of two adjacent lines to be credited with half |
| a hit, for example. |
| </p> |
| <p>If call graph data is present, <code>cg_arcs.c:cg_assemble</code> is called. |
| First, if ‘<samp>-c</samp>’ was specified, a machine-dependent |
| routine (<code>find_call</code>) scans through each symbol’s machine code, |
| looking for subroutine call instructions, and adding them |
| to the call graph with a zero call count. |
| A topological sort is performed by depth-first numbering |
| all the symbols (<code>cg_dfn.c:cg_dfn</code>), so that |
| children are always numbered less than their parents, |
| then making a array of pointers into the symbol table and sorting it into |
| numerical order, which is reverse topological |
| order (children appear before parents). |
| Cycles are also detected at this point, all members |
| of which are assigned the same topological number. |
| Two passes are now made through this sorted array of symbol pointers. |
| The first pass, from end to beginning (parents to children), |
| computes the fraction of child time to propagate to each parent |
| and a print flag. |
| The print flag reflects symspec handling of INCL_GRAPH/EXCL_GRAPH, |
| with a parent’s include or exclude (print or no print) property |
| being propagated to its children, unless they themselves explicitly appear |
| in INCL_GRAPH or EXCL_GRAPH. |
| A second pass, from beginning to end (children to parents) actually |
| propagates the timings along the call graph, subject |
| to a check against INCL_TIME/EXCL_TIME. |
| With the print flag, fractions, and timings now stored in the symbol |
| structures, the topological sort array is now discarded, and a |
| new array of pointers is assembled, this time sorted by propagated time. |
| </p> |
| <p>Finally, print the various outputs the user requested, which is now fairly |
| straightforward. The call graph (<code>cg_print.c:cg_print</code>) and |
| flat profile (<code>hist.c:hist_print</code>) are regurgitations of values |
| already computed. The annotated source listing |
| (<code>basic_blocks.c:print_annotated_source</code>) uses basic-block |
| information, if present, to label each line of code with call counts, |
| otherwise only the function call counts are presented. |
| </p> |
| <p>The function ordering code is marginally well documented |
| in the source code itself (<code>cg_print.c</code>). Basically, |
| the functions with the most use and the most parents are |
| placed first, followed by other functions with the most use, |
| followed by lower use functions, followed by unused functions |
| at the end. |
| </p> |
| <hr> |
| <div class="header"> |
| <p> |
| Next: <a href="Debugging.html#Debugging" accesskey="n" rel="next">Debugging</a>, Previous: <a href="File-Format.html#File-Format" accesskey="p" rel="prev">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p> |
| </div> |
| |
| |
| |
| </body> |
| </html> |