| <html lang="en"> |
| <head> |
| <title>Implementation - GNU gprof</title> |
| <meta http-equiv="Content-Type" content="text/html"> |
| <meta name="description" content="GNU gprof"> |
| <meta name="generator" content="makeinfo 4.13"> |
| <link title="Top" rel="start" href="index.html#Top"> |
| <link rel="up" href="Details.html#Details" title="Details"> |
| <link rel="next" href="File-Format.html#File-Format" title="File Format"> |
| <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage"> |
| <!-- |
| This file documents the gprof profiler of the GNU system. |
| |
| Copyright (C) 1988-2019 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.3 |
| or any later version published by the Free Software Foundation; |
| with no Invariant Sections, with no Front-Cover Texts, and with no |
| Back-Cover Texts. A copy of the license is included in the |
| section entitled ``GNU Free Documentation License''. |
| |
| --> |
| <meta http-equiv="Content-Style-Type" content="text/css"> |
| <style type="text/css"><!-- |
| pre.display { font-family:inherit } |
| pre.format { font-family:inherit } |
| pre.smalldisplay { font-family:inherit; font-size:smaller } |
| pre.smallformat { font-family:inherit; font-size:smaller } |
| pre.smallexample { font-size:smaller } |
| pre.smalllisp { font-size:smaller } |
| span.sc { font-variant:small-caps } |
| span.roman { font-family:serif; font-weight:normal; } |
| span.sansserif { font-family:sans-serif; font-weight:normal; } |
| --></style> |
| </head> |
| <body> |
| <div class="node"> |
| <a name="Implementation"></a> |
| <p> |
| Next: <a rel="next" accesskey="n" href="File-Format.html#File-Format">File Format</a>, |
| Up: <a rel="up" accesskey="u" href="Details.html#Details">Details</a> |
| <hr> |
| </div> |
| |
| <h3 class="section">9.1 Implementation of Profiling</h3> |
| |
| <p>Profiling works by changing how every function in your program is compiled |
| so that when it is called, it will stash away some information about where |
| it was called from. From this, the profiler can figure out what function |
| called it, and can count how many times it was called. This change is made |
| by the compiler when your program is compiled with the ‘<samp><span class="samp">-pg</span></samp>’ option, |
| which causes every function to call <code>mcount</code> |
| (or <code>_mcount</code>, or <code>__mcount</code>, depending on the OS and compiler) |
| as one of its first operations. |
| |
| <p>The <code>mcount</code> routine, included in the profiling library, |
| is responsible for recording in an in-memory call graph table |
| both its caller (the profiled function, or child) and its caller's caller (the parent). This is |
| typically done by examining the stack frame to find both |
| the address of the child and the return address in the original parent. |
| Since this is a very machine-dependent operation, <code>mcount</code> |
| itself is typically a short assembly-language stub routine |
| that extracts the required |
| information, and then calls <code>__mcount_internal</code> |
| (a normal C function) with two arguments: <code>frompc</code> (the return address in the parent) and <code>selfpc</code> (the address of the child). |
| <code>__mcount_internal</code> is responsible for maintaining |
| the in-memory call graph, which records <code>frompc</code>, <code>selfpc</code>, |
| and the number of times each of these call arcs was traversed. |
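<p>As a rough illustration of this bookkeeping, the following sketch keeps a small table of call arcs and counts each traversal. The names <code>arc_table</code>, <code>record_arc</code>, and <code>MAX_ARCS</code> are hypothetical and are not taken from the profiling library; a linear scan is used only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ARCS 64  /* hypothetical fixed capacity for this sketch */

struct arc {
    uintptr_t frompc;     /* return address in the parent (the call site) */
    uintptr_t selfpc;     /* address of the called function (the child)   */
    unsigned long count;  /* number of times this arc was traversed       */
};

static struct arc arc_table[MAX_ARCS];
static size_t arc_used;

/* Record one traversal of the arc frompc -> selfpc.
   Returns the updated count, or 0 if the table is full. */
unsigned long record_arc(uintptr_t frompc, uintptr_t selfpc)
{
    for (size_t i = 0; i < arc_used; i++) {
        if (arc_table[i].frompc == frompc && arc_table[i].selfpc == selfpc)
            return ++arc_table[i].count;
    }
    if (arc_used == MAX_ARCS)
        return 0;
    arc_table[arc_used].frompc = frompc;
    arc_table[arc_used].selfpc = selfpc;
    arc_table[arc_used].count = 1;
    arc_used++;
    return 1;
}
```

<p>Calling <code>record_arc</code> twice with the same pair of addresses yields a single arc with a count of 2, while a different <code>frompc</code> for the same <code>selfpc</code> creates a second arc.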
| |
| <p>GCC (Version 2 and later) provides a built-in function (<code>__builtin_return_address</code>), |
| which allows a generic <code>mcount</code> function to extract the |
| required information from the stack frame. However, on some |
| architectures, most notably the SPARC, using this builtin can be |
| very computationally expensive, and an assembly language version |
| of <code>mcount</code> is used for performance reasons. |
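<p>A minimal sketch of how the builtin can be used, assuming GCC or a compatible compiler. The names <code>sketch_mcount_hook</code>, <code>profiled_function</code>, and <code>last_return_address</code> are hypothetical; the hook only captures its own return address (level 0) and does not attempt to find the caller's caller as a real <code>mcount</code> must.

```c
#include <stdint.h>

/* Hypothetical storage for the most recently captured return address. */
static uintptr_t last_return_address;

/* Stand-in for a profiling hook. Level 0 asks for the address to which
   this function itself will return. */
__attribute__((noinline))
void sketch_mcount_hook(void)
{
    last_return_address = (uintptr_t)__builtin_return_address(0);
}

/* A function that calls the hook as one of its first operations,
   loosely mimicking what a -pg build arranges automatically. */
__attribute__((noinline))
void profiled_function(void)
{
    sketch_mcount_hook();
}
```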
| |
| <p>Number-of-calls information for library routines is collected by using a |
| special version of the C library. The routines in it are the same as in |
| the usual C library, but they were compiled with ‘<samp><span class="samp">-pg</span></samp>’. If you |
| link your program with ‘<samp><span class="samp">gcc ... -pg</span></samp>’, it automatically uses the |
| profiling version of the library. |
| |
| <p>Profiling also involves watching your program as it runs, and keeping a |
| histogram of where the program counter happens to be every now and then. |
| Typically the program counter is looked at around 100 times per second of |
| run time, but the exact frequency may vary from system to system. |
| |
| <p>This is done in one of two ways. Most UNIX-like operating systems |
| provide a <code>profil()</code> system call, which registers a memory |
| array with the kernel, along with a scale |
| factor that determines how the program's address space maps |
| into the array. |
| Typical scaling values cause every 2 to 8 bytes of address space |
| to map into a single array slot. |
| On every tick of the system clock |
| (assuming the profiled program is running), the value of the |
| program counter is examined and the corresponding slot in |
| the memory array is incremented. Since this is done in the kernel, |
| which had to interrupt the process anyway to handle the clock |
| interrupt, very little additional system overhead is required. |
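<p>The mapping from program counter to array slot can be sketched as follows. This is only an illustration of the arithmetic: <code>record_sample</code>, <code>BYTES_PER_SLOT</code>, and <code>NUM_SLOTS</code> are hypothetical names, and a plain divisor stands in for the scale factor that the real interface takes.

```c
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_SLOT 4    /* assumed scaling: 4 bytes of address space per slot */
#define NUM_SLOTS      1024 /* assumed histogram size for this sketch */

static unsigned short histogram[NUM_SLOTS];

/* Increment the slot that covers pc, where lowpc is the lowest profiled
   address. Samples outside the covered range are ignored.
   Returns 1 if the sample was counted, 0 otherwise. */
int record_sample(uintptr_t lowpc, uintptr_t pc)
{
    if (pc < lowpc)
        return 0;
    size_t slot = (size_t)((pc - lowpc) / BYTES_PER_SLOT);
    if (slot >= NUM_SLOTS)
        return 0;
    histogram[slot]++;
    return 1;
}
```

<p>With these assumed values, samples at offsets 0 through 3 from <code>lowpc</code> land in slot 0, and a sample at offset 4 lands in slot 1.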
| |
| <p>However, some operating systems, most notably Linux 2.0 (and earlier), |
| do not provide a <code>profil()</code> system call. On such a system, |
| arrangements are made for the kernel to periodically deliver |
| a signal to the process (typically via <code>setitimer()</code>), |
| which then performs the same operation of examining the |
| program counter and incrementing a slot in the memory array. |
| Since this method requires a signal to be delivered to |
| user space every time a sample is taken, it incurs considerably |
| more overhead than kernel-based profiling. The |
| added delay in delivering the signal also makes this method |
| less accurate. |
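<p>A minimal sketch of signal-based sampling on a POSIX system, using <code>setitimer()</code> with <code>ITIMER_PROF</code>. The names <code>start_sampling</code>, <code>stop_sampling</code>, and <code>sample_ticks</code> are hypothetical, and the 10 ms interval is an assumption chosen to match the rate of roughly 100 samples per second. Reading the interrupted program counter is machine-dependent, so the handler here only counts ticks.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Number of SIGPROF signals received so far. */
static volatile sig_atomic_t sample_ticks;

static void on_sigprof(int signo)
{
    (void)signo;
    /* A real profiler would locate the interrupted program counter here
       and increment the matching histogram slot. */
    sample_ticks++;
}

/* Install the handler and request a SIGPROF for every 10 ms of CPU time
   consumed by the process. Returns 0 on success, -1 on failure. */
int start_sampling(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = 10000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer. Returns 0 on success, -1 on failure. */
int stop_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}
```

<p>Because <code>ITIMER_PROF</code> counts CPU time used by the process, no samples arrive while the program is blocked or sleeping.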
| |
| <p>A special startup routine allocates memory for the histogram and |
| either calls <code>profil()</code> or sets up |
| a clock signal handler. |
| This routine (<code>monstartup</code>) can be invoked in several ways. |
| On Linux systems, a special profiling startup file <code>gcrt0.o</code>, |
| which invokes <code>monstartup</code> before <code>main</code>, |
| is used instead of the default <code>crt0.o</code>. |
| Use of this special startup file is one of the effects |
| of using ‘<samp><span class="samp">gcc ... -pg</span></samp>’ to link. |
| On SPARC systems, no special startup files are used. |
| Rather, the <code>mcount</code> routine, when it is invoked for |
| the first time (typically when <code>main</code> is called), |
| calls <code>monstartup</code>. |
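<p>The second arrangement, in which the first call to <code>mcount</code> triggers startup, can be sketched as a simple guard. All names here (<code>sketch_mcount</code>, <code>sketch_monstartup</code>, and the counters) are hypothetical stand-ins, and the sketch ignores threads and reentrancy.

```c
static int profiling_started;  /* set once startup has run */
static int startup_calls;      /* how many times startup was invoked */
static int mcount_calls;       /* how many times the hook was entered */

/* Stand-in for monstartup: a real version would allocate the histogram
   and arrange for program counter sampling. */
static void sketch_monstartup(void)
{
    startup_calls++;
}

/* Stand-in for mcount: performs startup on the first call only. */
void sketch_mcount(void)
{
    if (!profiling_started) {
        profiling_started = 1;
        sketch_monstartup();
    }
    mcount_calls++;  /* a real version would record the call arc here */
}
```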
| |
| <p>If the compiler's ‘<samp><span class="samp">-a</span></samp>’ option was used, basic-block counting |
| is also enabled. Each object file is then compiled with a static array |
| of counts, initially zero. |
| In the executable code, every time a new basic block begins |
| (e.g., at the body of an <code>if</code> statement), an extra instruction |
| is inserted to increment the corresponding count in the array. |
| At compile time, a paired array is constructed that records |
| the starting address of each basic block. Taken together, |
| the two arrays record the starting address of every basic block, |
| along with the number of times it was executed. |
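<p>The paired arrays can be pictured with a hand-instrumented function. This is a sketch only: the compiler inserts the increments itself, the addresses below are made-up illustrative values, and the names <code>bb_addresses</code>, <code>bb_counts</code>, <code>enter_block</code>, and <code>count_for_address</code> are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS 3  /* hypothetical number of basic blocks in this object file */

/* Starting address of each basic block (illustrative values only). */
static const uintptr_t bb_addresses[NUM_BLOCKS] = { 0x400100, 0x400118, 0x400130 };

/* Execution count of each basic block, initially zero. */
static unsigned long bb_counts[NUM_BLOCKS];

/* Stand-in for the increment instruction placed at the start of a block. */
static inline void enter_block(size_t block)
{
    bb_counts[block]++;
}

/* A function instrumented by hand at each basic block boundary. */
int clamp_to_zero(int x)
{
    enter_block(0);        /* function entry */
    if (x < 0) {
        enter_block(1);    /* body of the if statement */
        x = 0;
    }
    enter_block(2);        /* code following the if statement */
    return x;
}

/* Look up how often the block starting at addr was executed. */
unsigned long count_for_address(uintptr_t addr)
{
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        if (bb_addresses[i] == addr)
            return bb_counts[i];
    }
    return 0;
}
```

<p>After calls with one negative and two non-negative arguments, the entry and exit blocks each show a count of 3 while the body of the <code>if</code> shows a count of 1.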
| |
| <p>The profiling library also includes a function (<code>mcleanup</code>) which is |
| typically registered using <code>atexit()</code> to be called as the |
| program exits, and is responsible for writing the file <samp><span class="file">gmon.out</span></samp>. |
| Profiling is turned off, various headers are output, and the histogram |
| is written, followed by the call-graph arcs and the basic-block counts. |
| |
| <p>The output from <code>gprof</code> gives no indication of parts of your program that |
| are limited by I/O or swapping bandwidth. This is because samples of the |
| program counter are taken at fixed intervals of the program's run time. |
| Therefore, the |
| time measurements in <code>gprof</code> output say nothing about time that your |
| program was not running. For example, a part of the program that creates |
| so much data that it cannot all fit in physical memory at once may run very |
| slowly due to thrashing, but <code>gprof</code> will say it uses little time. On |
| the other hand, sampling by run time has the advantage that the amount of |
| load due to other users won't directly affect the output you get. |
| |
| </body></html> |
| |