| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html> |
| <!-- Copyright (C) 1988-2015 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.3 or |
| any later version published by the Free Software Foundation; with the |
| Invariant Sections being "Funding Free Software", the Front-Cover |
| Texts being (a) (see below), and with the Back-Cover Texts being (b) |
| (see below). A copy of the license is included in the section entitled |
| "GNU Free Documentation License". |
| |
| (a) The FSF's Front-Cover Text is: |
| |
| A GNU Manual |
| |
| (b) The FSF's Back-Cover Text is: |
| |
| You have freedom to copy and modify this GNU Manual, like GNU |
| software. Copies published by the Free Software Foundation raise |
| funds for GNU development. --> |
| <!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ --> |
| <head> |
| <title>Using the GNU Compiler Collection (GCC): Optimize Options</title> |
| |
| <meta name="description" content="Using the GNU Compiler Collection (GCC): Optimize Options"> |
| <meta name="keywords" content="Using the GNU Compiler Collection (GCC): Optimize Options"> |
| <meta name="resource-type" content="document"> |
| <meta name="distribution" content="global"> |
| <meta name="Generator" content="makeinfo"> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
| <link href="index.html#Top" rel="start" title="Top"> |
| <link href="Option-Index.html#Option-Index" rel="index" title="Option Index"> |
| <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> |
| <link href="Invoking-GCC.html#Invoking-GCC" rel="up" title="Invoking GCC"> |
| <link href="Preprocessor-Options.html#Preprocessor-Options" rel="next" title="Preprocessor Options"> |
| <link href="Debugging-Options.html#Debugging-Options" rel="prev" title="Debugging Options"> |
| <style type="text/css"> |
| <!-- |
| a.summary-letter {text-decoration: none} |
| blockquote.smallquotation {font-size: smaller} |
| div.display {margin-left: 3.2em} |
| div.example {margin-left: 3.2em} |
| div.indentedblock {margin-left: 3.2em} |
| div.lisp {margin-left: 3.2em} |
| div.smalldisplay {margin-left: 3.2em} |
| div.smallexample {margin-left: 3.2em} |
| div.smallindentedblock {margin-left: 3.2em; font-size: smaller} |
| div.smalllisp {margin-left: 3.2em} |
| kbd {font-style:oblique} |
| pre.display {font-family: inherit} |
| pre.format {font-family: inherit} |
| pre.menu-comment {font-family: serif} |
| pre.menu-preformatted {font-family: serif} |
| pre.smalldisplay {font-family: inherit; font-size: smaller} |
| pre.smallexample {font-size: smaller} |
| pre.smallformat {font-family: inherit; font-size: smaller} |
| pre.smalllisp {font-size: smaller} |
| span.nocodebreak {white-space:nowrap} |
| span.nolinebreak {white-space:nowrap} |
| span.roman {font-family:serif; font-weight:normal} |
| span.sansserif {font-family:sans-serif; font-weight:normal} |
| ul.no-bullet {list-style: none} |
| --> |
| </style> |
| |
| |
| </head> |
| |
| <body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> |
| <a name="Optimize-Options"></a> |
| <div class="header"> |
| <p> |
| Next: <a href="Preprocessor-Options.html#Preprocessor-Options" accesskey="n" rel="next">Preprocessor Options</a>, Previous: <a href="Debugging-Options.html#Debugging-Options" accesskey="p" rel="prev">Debugging Options</a>, Up: <a href="Invoking-GCC.html#Invoking-GCC" accesskey="u" rel="up">Invoking GCC</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p> |
| </div> |
| <hr> |
| <a name="Options-That-Control-Optimization"></a> |
| <h3 class="section">3.10 Options That Control Optimization</h3> |
| <a name="index-optimize-options"></a> |
| <a name="index-options_002c-optimization"></a> |
| |
| <p>These options control various sorts of optimizations. |
| </p> |
| <p>Without any optimization option, the compiler’s goal is to reduce the |
| cost of compilation and to make debugging produce the expected |
| results. Statements are independent: if you stop the program with a |
| breakpoint between statements, you can then assign a new value to any |
| variable or change the program counter to any other statement in the |
| function and get exactly the results you expect from the source |
| code. |
| </p> |
| <p>Turning on optimization flags makes the compiler attempt to improve |
| the performance and/or code size at the expense of compilation time |
| and possibly the ability to debug the program. |
| </p> |
| <p>The compiler performs optimization based on the knowledge it has of the |
| program. Compiling multiple files at once to a single output file allows |
| the compiler to use information gained from all of the files when compiling |
| each of them. |
| </p> |
| <p>Not all optimizations are controlled directly by a flag. Only |
| optimizations that have a flag are listed in this section. |
| </p> |
| <p>Most optimizations are only enabled if an <samp>-O</samp> level is set on |
| the command line. Otherwise they are disabled, even if individual |
| optimization flags are specified. |
| </p> |
| <p>Depending on the target and how GCC was configured, a slightly different |
| set of optimizations may be enabled at each <samp>-O</samp> level than |
| those listed here. You can invoke GCC with <samp>-Q --help=optimizers</samp> |
| to find out the exact set of optimizations that are enabled at each level. |
| See <a href="Overall-Options.html#Overall-Options">Overall Options</a>, for examples. |
| </p> |
| <dl compact="compact"> |
| <dt><code>-O</code></dt> |
| <dt><code>-O1</code></dt> |
| <dd><a name="index-O"></a> |
| <a name="index-O1"></a> |
| <p>Optimize. Optimizing compilation takes somewhat more time, and a lot |
| more memory for a large function. |
| </p> |
| <p>With <samp>-O</samp>, the compiler tries to reduce code size and execution |
| time, without performing any optimizations that take a great deal of |
| compilation time. |
| </p> |
| <p><samp>-O</samp> turns on the following optimization flags: |
| </p><div class="smallexample"> |
| <pre class="smallexample">-fauto-inc-dec |
| -fbranch-count-reg |
| -fcombine-stack-adjustments |
| -fcompare-elim |
| -fcprop-registers |
| -fdce |
| -fdefer-pop |
| -fdelayed-branch |
| -fdse |
| -fforward-propagate |
| -fguess-branch-probability |
| -fif-conversion2 |
| -fif-conversion |
| -finline-functions-called-once |
| -fipa-pure-const |
| -fipa-profile |
| -fipa-reference |
| -fmerge-constants |
| -fmove-loop-invariants |
| -fshrink-wrap |
| -fsplit-wide-types |
| -ftree-bit-ccp |
| -ftree-ccp |
| -fssa-phiopt |
| -ftree-ch |
| -ftree-copy-prop |
| -ftree-copyrename |
| -ftree-dce |
| -ftree-dominator-opts |
| -ftree-dse |
| -ftree-forwprop |
| -ftree-fre |
| -ftree-phiprop |
| -ftree-sink |
| -ftree-slsr |
| -ftree-sra |
| -ftree-pta |
| -ftree-ter |
| -funit-at-a-time |
| </pre></div> |
| |
| <p><samp>-O</samp> also turns on <samp>-fomit-frame-pointer</samp> on machines |
| where doing so does not interfere with debugging. |
| </p> |
| </dd> |
| <dt><code>-O2</code></dt> |
| <dd><a name="index-O2"></a> |
| <p>Optimize even more. GCC performs nearly all supported optimizations |
| that do not involve a space-speed tradeoff. |
| As compared to <samp>-O</samp>, this option increases both compilation time |
| and the performance of the generated code. |
| </p> |
| <p><samp>-O2</samp> turns on all optimization flags specified by <samp>-O</samp>. It |
| also turns on the following optimization flags: |
| </p><div class="smallexample"> |
| <pre class="smallexample">-fthread-jumps |
| -falign-functions -falign-jumps |
| -falign-loops -falign-labels |
| -fcaller-saves |
| -fcrossjumping |
| -fcse-follow-jumps -fcse-skip-blocks |
| -fdelete-null-pointer-checks |
| -fdevirtualize -fdevirtualize-speculatively |
| -fexpensive-optimizations |
| -fgcse -fgcse-lm |
| -fhoist-adjacent-loads |
| -finline-small-functions |
| -findirect-inlining |
| -fipa-cp |
| -fipa-cp-alignment |
| -fipa-sra |
| -fipa-icf |
| -fisolate-erroneous-paths-dereference |
| -flra-remat |
| -foptimize-sibling-calls |
| -foptimize-strlen |
| -fpartial-inlining |
| -fpeephole2 |
| -freorder-blocks -freorder-blocks-and-partition -freorder-functions |
| -frerun-cse-after-loop |
| -fsched-interblock -fsched-spec |
| -fschedule-insns -fschedule-insns2 |
| -fstrict-aliasing -fstrict-overflow |
| -ftree-builtin-call-dce |
| -ftree-switch-conversion -ftree-tail-merge |
| -ftree-pre |
| -ftree-vrp |
| -fipa-ra |
| </pre></div> |
| |
| <p>Please note the warning under <samp>-fgcse</samp> about |
| invoking <samp>-O2</samp> on programs that use computed gotos. |
| </p> |
| </dd> |
| <dt><code>-O3</code></dt> |
| <dd><a name="index-O3"></a> |
| <p>Optimize yet more. <samp>-O3</samp> turns on all optimizations specified |
| by <samp>-O2</samp> and also turns on the <samp>-finline-functions</samp>, |
| <samp>-funswitch-loops</samp>, <samp>-fpredictive-commoning</samp>, |
| <samp>-fgcse-after-reload</samp>, <samp>-ftree-loop-vectorize</samp>, |
| <samp>-ftree-loop-distribute-patterns</samp>, |
| <samp>-ftree-slp-vectorize</samp>, <samp>-fvect-cost-model</samp>, |
| <samp>-ftree-partial-pre</samp> and <samp>-fipa-cp-clone</samp> options. |
| </p> |
| </dd> |
| <dt><code>-O0</code></dt> |
| <dd><a name="index-O0"></a> |
| <p>Reduce compilation time and make debugging produce the expected |
| results. This is the default. |
| </p> |
| </dd> |
| <dt><code>-Os</code></dt> |
| <dd><a name="index-Os"></a> |
| <p>Optimize for size. <samp>-Os</samp> enables all <samp>-O2</samp> optimizations that |
| do not typically increase code size. It also performs further |
| optimizations designed to reduce code size. |
| </p> |
| <p><samp>-Os</samp> disables the following optimization flags: |
| </p><div class="smallexample"> |
| <pre class="smallexample">-falign-functions -falign-jumps -falign-loops |
| -falign-labels -freorder-blocks -freorder-blocks-and-partition |
| -fprefetch-loop-arrays |
| </pre></div> |
| |
| </dd> |
| <dt><code>-Ofast</code></dt> |
| <dd><a name="index-Ofast"></a> |
| <p>Disregard strict standards compliance. <samp>-Ofast</samp> enables all |
| <samp>-O3</samp> optimizations. It also enables optimizations that are not |
| valid for all standard-compliant programs. |
| It turns on <samp>-ffast-math</samp> and the Fortran-specific |
| <samp>-fno-protect-parens</samp> and <samp>-fstack-arrays</samp>. |
| </p> |
| </dd> |
| <dt><code>-Og</code></dt> |
| <dd><a name="index-Og"></a> |
| <p>Optimize debugging experience. <samp>-Og</samp> enables optimizations |
| that do not interfere with debugging. It should be the optimization |
| level of choice for the standard edit-compile-debug cycle, offering |
| a reasonable level of optimization while maintaining fast compilation |
| and a good debugging experience. |
| </p> |
| <p>If you use multiple <samp>-O</samp> options, with or without level numbers, |
| the last such option is the one that is effective. |
| </p></dd> |
| </dl> |
| |
| <p>Options of the form <samp>-f<var>flag</var></samp> specify machine-independent |
| flags. Most flags have both positive and negative forms; the negative |
| form of <samp>-ffoo</samp> is <samp>-fno-foo</samp>. In the table |
| below, only one of the forms is listed—the one you typically |
| use. You can figure out the other form by either removing ‘<samp>no-</samp>’ |
| or adding it. |
| </p> |
| <p>The following options control specific optimizations. They are either |
| activated by <samp>-O</samp> options or are related to ones that are. You |
| can use the following flags in the rare cases when you want to fine-tune |
| the optimizations to be performed. |
| </p> |
| <dl compact="compact"> |
| <dt><code>-fno-defer-pop</code></dt> |
| <dd><a name="index-fno_002ddefer_002dpop"></a> |
| <p>Always pop the arguments to each function call as soon as that function |
| returns. For machines that must pop arguments after a function call, |
| the compiler normally lets arguments accumulate on the stack for several |
| function calls and pops them all at once. |
| </p> |
| <p>Disabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fforward-propagate</code></dt> |
| <dd><a name="index-fforward_002dpropagate"></a> |
| <p>Perform a forward propagation pass on RTL. The pass tries to combine two |
| instructions and checks if the result can be simplified. If loop unrolling |
| is active, two passes are performed and the second is scheduled after |
| loop unrolling. |
| </p> |
| <p>This option is enabled by default at optimization levels <samp>-O</samp>, |
| <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-ffp-contract=<var>style</var></code></dt> |
| <dd><a name="index-ffp_002dcontract"></a> |
| <p><samp>-ffp-contract=off</samp> disables floating-point expression contraction. |
| <samp>-ffp-contract=fast</samp> enables floating-point expression contraction |
| such as forming of fused multiply-add operations if the target has |
| native support for them. |
| <samp>-ffp-contract=on</samp> enables floating-point expression contraction |
| if allowed by the language standard. This is currently not implemented |
| and treated equal to <samp>-ffp-contract=off</samp>. |
| </p> |
| <p>The default is <samp>-ffp-contract=fast</samp>. |
| </p> |
| </dd> |
| <dt><code>-fomit-frame-pointer</code></dt> |
| <dd><a name="index-fomit_002dframe_002dpointer"></a> |
| <p>Don’t keep the frame pointer in a register for functions that |
| don’t need one. This avoids the instructions to save, set up and |
| restore frame pointers; it also makes an extra register available |
| in many functions. <strong>It also makes debugging impossible on |
| some machines.</strong> |
| </p> |
| <p>On some machines, such as the VAX, this flag has no effect, because |
| the standard calling sequence automatically handles the frame pointer |
| and nothing is saved by pretending it doesn’t exist. The |
| machine-description macro <code>FRAME_POINTER_REQUIRED</code> controls |
| whether a target machine supports this flag. See <a href="http://gcc.gnu.org/onlinedocs/gccint/Registers.html#Registers">Register |
| Usage</a> in <cite>GNU Compiler Collection (GCC) Internals</cite>. |
| </p> |
| <p>The default setting (when not optimizing for |
| size) for 32-bit GNU/Linux x86 and 32-bit Darwin x86 targets is |
| <samp>-fomit-frame-pointer</samp>. You can configure GCC with the |
| <samp>--enable-frame-pointer</samp> configure option to change the default. |
| </p> |
| <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-foptimize-sibling-calls</code></dt> |
| <dd><a name="index-foptimize_002dsibling_002dcalls"></a> |
| <p>Optimize sibling and tail recursive calls. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
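| <p>As a hedged illustration, the following C sketch contains one |
| tail-recursive call and one sibling call of the kind this option targets. |
| The function names <code>sum_to</code> and <code>sum_first</code> are |
| hypothetical and chosen only for this example: |
| </p> |

```c
/* Tail-recursive sum of 1..n. The recursive call is the last action of
   the function, so the compiler may replace it with a jump that reuses
   the current stack frame. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* tail-recursive call */
}

/* Sibling call: the result of the callee is returned unchanged, so the
   call may be emitted as a jump instead of a call followed by a return. */
unsigned long sum_first(unsigned long n)
{
    return sum_to(n, 0);
}
```

| <p>Whether the calls are actually turned into jumps depends on the target |
| and on the optimization level in use. |
| </p> |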
| </dd> |
| <dt><code>-foptimize-strlen</code></dt> |
| <dd><a name="index-foptimize_002dstrlen"></a> |
| <p>Optimize various standard C string functions (e.g. <code>strlen</code>, |
| <code>strchr</code> or <code>strcpy</code>) and |
| their <code>_FORTIFY_SOURCE</code> counterparts into faster alternatives. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-fno-inline</code></dt> |
| <dd><a name="index-fno_002dinline"></a> |
| <p>Do not expand any functions inline apart from those marked with |
| the <code>always_inline</code> attribute. This is the default when not |
| optimizing. |
| </p> |
| <p>Single functions can be exempted from inlining by marking them |
| with the <code>noinline</code> attribute. |
| </p> |
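| <p>A minimal sketch of the two attributes mentioned above, using GCC |
| attribute syntax. The function names are hypothetical and serve only as |
| an illustration: |
| </p> |

```c
/* noinline: this function is exempted from inlining, whatever the
   optimization level. */
__attribute__((noinline))
int square_noinline(int x)
{
    return x * x;
}

/* always_inline: this function is still expanded inline when
   -fno-inline is given or when not optimizing. */
static inline __attribute__((always_inline))
int square_inline(int x)
{
    return x * x;
}

/* Calls one function of each kind. */
int sum_of_squares(int a, int b)
{
    return square_noinline(a) + square_inline(b);
}
```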
| </dd> |
| <dt><code>-finline-small-functions</code></dt> |
| <dd><a name="index-finline_002dsmall_002dfunctions"></a> |
| <p>Integrate functions into their callers when their body is smaller than the expected |
| function call code (so that the overall size of the program gets smaller). The compiler |
| heuristically decides which functions are simple enough to be worth integrating |
| in this way. This inlining applies to all functions, even those not declared |
| inline. |
| </p> |
| <p>Enabled at level <samp>-O2</samp>. |
| </p> |
| </dd> |
| <dt><code>-findirect-inlining</code></dt> |
| <dd><a name="index-findirect_002dinlining"></a> |
| <p>Also inline indirect calls that are discovered to be known at compile |
| time thanks to previous inlining. This option has an effect only |
| when inlining itself is turned on by the <samp>-finline-functions</samp> |
| or <samp>-finline-small-functions</samp> options. |
| </p> |
| <p>Enabled at level <samp>-O2</samp>. |
| </p> |
| </dd> |
| <dt><code>-finline-functions</code></dt> |
| <dd><a name="index-finline_002dfunctions"></a> |
| <p>Consider all functions for inlining, even if they are not declared inline. |
| The compiler heuristically decides which functions are worth integrating |
| in this way. |
| </p> |
| <p>If all calls to a given function are integrated, and the function is |
| declared <code>static</code>, then the function is normally not output as |
| assembler code in its own right. |
| </p> |
| <p>Enabled at level <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-finline-functions-called-once</code></dt> |
| <dd><a name="index-finline_002dfunctions_002dcalled_002donce"></a> |
| <p>Consider all <code>static</code> functions called once for inlining into their |
| caller even if they are not marked <code>inline</code>. If a call to a given |
| function is integrated, then the function is not output as assembler code |
| in its own right. |
| </p> |
| <p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp> and <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fearly-inlining</code></dt> |
| <dd><a name="index-fearly_002dinlining"></a> |
| <p>Inline functions marked by <code>always_inline</code> and functions whose body seems |
| smaller than the function call overhead early, before the |
| <samp>-fprofile-generate</samp> instrumentation and the real inlining pass. Doing so |
| makes profiling significantly cheaper and usually makes inlining faster on programs |
| having large chains of nested wrapper functions. |
| </p> |
| <p>Enabled by default. |
| </p> |
| </dd> |
| <dt><code>-fipa-sra</code></dt> |
| <dd><a name="index-fipa_002dsra"></a> |
| <p>Perform interprocedural scalar replacement of aggregates, removal of |
| unused parameters and replacement of parameters passed by reference |
| by parameters passed by value. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp> and <samp>-Os</samp>. |
| </p> |
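| <p>The effect of replacing a parameter passed by reference with one |
| passed by value can be sketched by hand at the source level. The |
| structure and function names below are hypothetical, and the second |
| function only approximates what the compiler may produce internally: |
| </p> |

```c
struct point { int x; int y; };

/* As written: the aggregate is passed by reference, but only the
   field x is ever read. */
static int get_x_by_ref(const struct point *p)
{
    return p->x;
}

/* Hand-written approximation of the transformed function: the single
   used field is passed by value and the unused part is dropped. */
static int get_x_by_value(int x)
{
    return x;
}

int demo_by_ref(void)
{
    struct point p = { 3, 4 };
    return get_x_by_ref(&p);
}

int demo_by_value(void)
{
    struct point p = { 3, 4 };
    return get_x_by_value(p.x);
}
```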
| </dd> |
| <dt><code>-finline-limit=<var>n</var></code></dt> |
| <dd><a name="index-finline_002dlimit"></a> |
| <p>By default, GCC limits the size of functions that can be inlined. This flag |
| allows coarse control of this limit. <var>n</var> is the size of functions that |
| can be inlined in number of pseudo instructions. |
| </p> |
| <p>Inlining is actually controlled by a number of parameters, which may be |
| specified individually by using <samp>--param <var>name</var>=<var>value</var></samp>. |
| The <samp>-finline-limit=<var>n</var></samp> option sets some of these parameters |
| as follows: |
| </p> |
| <dl compact="compact"> |
| <dt><code>max-inline-insns-single</code></dt> |
| <dd><p>is set to <var>n</var>/2. |
| </p></dd> |
| <dt><code>max-inline-insns-auto</code></dt> |
| <dd><p>is set to <var>n</var>/2. |
| </p></dd> |
| </dl> |
| |
| <p>See below for documentation of the individual |
| parameters controlling inlining and for the defaults of these parameters. |
| </p> |
| <p><em>Note:</em> there may be no value to <samp>-finline-limit</samp> that results |
| in default behavior. |
| </p> |
| <p><em>Note:</em> a pseudo instruction represents, in this particular context, an |
| abstract measurement of a function’s size. In no way does it represent a count |
| of assembly instructions, and as such its exact meaning might change from one |
| release to another. |
| </p> |
| </dd> |
| <dt><code>-fno-keep-inline-dllexport</code></dt> |
| <dd><a name="index-fno_002dkeep_002dinline_002ddllexport"></a> |
| <p>This is a more fine-grained version of <samp>-fkeep-inline-functions</samp>, |
| which applies only to functions that are declared using the <code>dllexport</code> |
| attribute or declspec (see <a href="Function-Attributes.html#Function-Attributes">Declaring Attributes of |
| Functions</a>). |
| </p> |
| </dd> |
| <dt><code>-fkeep-inline-functions</code></dt> |
| <dd><a name="index-fkeep_002dinline_002dfunctions"></a> |
| <p>In C, emit <code>static</code> functions that are declared <code>inline</code> |
| into the object file, even if the function has been inlined into all |
| of its callers. This switch does not affect functions using the |
| <code>extern inline</code> extension in GNU C90. In C++, emit any and all |
| inline functions into the object file. |
| </p> |
| </dd> |
| <dt><code>-fkeep-static-consts</code></dt> |
| <dd><a name="index-fkeep_002dstatic_002dconsts"></a> |
| <p>Emit variables declared <code>static const</code> when optimization isn’t turned |
| on, even if the variables aren’t referenced. |
| </p> |
| <p>GCC enables this option by default. If you want to force the compiler to |
| check if a variable is referenced, regardless of whether or not |
| optimization is turned on, use the <samp>-fno-keep-static-consts</samp> option. |
| </p> |
| </dd> |
| <dt><code>-fmerge-constants</code></dt> |
| <dd><a name="index-fmerge_002dconstants"></a> |
| <p>Attempt to merge identical constants (string constants and floating-point |
| constants) across compilation units. |
| </p> |
| <p>This option is the default for optimized compilation if the assembler and |
| linker support it. Use <samp>-fno-merge-constants</samp> to inhibit this |
| behavior. |
| </p> |
| <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fmerge-all-constants</code></dt> |
| <dd><a name="index-fmerge_002dall_002dconstants"></a> |
| <p>Attempt to merge identical constants and identical variables. |
| </p> |
| <p>This option implies <samp>-fmerge-constants</samp>. In addition to |
| <samp>-fmerge-constants</samp> this considers e.g. even constant initialized |
| arrays or initialized constant variables with integral or floating-point |
| types. Languages like C or C++ require each variable, including multiple |
| instances of the same variable in recursive calls, to have distinct locations, |
| so using this option results in non-conforming |
| behavior. |
| </p> |
| </dd> |
| <dt><code>-fmodulo-sched</code></dt> |
| <dd><a name="index-fmodulo_002dsched"></a> |
| <p>Perform swing modulo scheduling immediately before the first scheduling |
| pass. This pass looks at innermost loops and reorders their |
| instructions by overlapping different iterations. |
| </p> |
| </dd> |
| <dt><code>-fmodulo-sched-allow-regmoves</code></dt> |
| <dd><a name="index-fmodulo_002dsched_002dallow_002dregmoves"></a> |
| <p>Perform more aggressive SMS-based modulo scheduling with register moves |
| allowed. Setting this flag deletes certain anti-dependence edges, |
| which triggers the generation of reg-moves based on the |
| life-range analysis. This option is effective only with |
| <samp>-fmodulo-sched</samp> enabled. |
| </p> |
| </dd> |
| <dt><code>-fno-branch-count-reg</code></dt> |
| <dd><a name="index-fno_002dbranch_002dcount_002dreg"></a> |
| <p>Do not use “decrement and branch” instructions on a count register, |
| but instead generate a sequence of instructions that decrement a |
| register, compare it against zero, then branch based upon the result. |
| This option is only meaningful on architectures that support such |
| instructions, which include x86, PowerPC, IA-64 and S/390. |
| </p> |
| <p>The positive form, <samp>-fbranch-count-reg</samp>, is enabled by default at |
| <samp>-O1</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fno-function-cse</code></dt> |
| <dd><a name="index-fno_002dfunction_002dcse"></a> |
| <p>Do not put function addresses in registers; make each instruction that |
| calls a constant function contain the function’s address explicitly. |
| </p> |
| <p>This option results in less efficient code, but some strange hacks |
| that alter the assembler output may be confused by the optimizations |
| performed when this option is not used. |
| </p> |
| <p>The default is <samp>-ffunction-cse</samp>. |
| </p> |
| </dd> |
| <dt><code>-fno-zero-initialized-in-bss</code></dt> |
| <dd><a name="index-fno_002dzero_002dinitialized_002din_002dbss"></a> |
| <p>If the target supports a BSS section, GCC by default puts variables that |
| are initialized to zero into BSS. This can save space in the resulting |
| code. |
| </p> |
| <p>This option turns off this behavior because some programs explicitly |
| rely on variables going to the data section—e.g., so that the |
| resulting executable can find the beginning of that section and/or make |
| assumptions based on that. |
| </p> |
| <p>The default is <samp>-fzero-initialized-in-bss</samp>. |
| </p> |
| </dd> |
| <dt><code>-fthread-jumps</code></dt> |
| <dd><a name="index-fthread_002djumps"></a> |
| <p>Perform optimizations that check to see if a jump branches to a |
| location where another comparison subsumed by the first is found. If |
| so, the first branch is redirected to either the destination of the |
| second branch or a point immediately following it, depending on whether |
| the condition is known to be true or false. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
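| <p>A small C sketch of code where such a redirection is possible. The |
| function name <code>classify</code> is hypothetical. When the first test |
| fails, the outcome of the second test is already known: |
| </p> |

```c
/* If x > 5 is false, then x > 10 is also false, so the branch taken
   when the first test fails can skip the second comparison and go
   straight to the return. */
int classify(int x)
{
    int r = 0;
    if (x > 5)
        r += 1;
    if (x > 10)     /* known to be false whenever x > 5 was false */
        r += 2;
    return r;
}
```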
| </dd> |
| <dt><code>-fsplit-wide-types</code></dt> |
| <dd><a name="index-fsplit_002dwide_002dtypes"></a> |
| <p>When using a type that occupies multiple registers, such as <code>long |
| long</code> on a 32-bit system, split the registers apart and allocate them |
| independently. This normally generates better code for those types, |
| but may make debugging more difficult. |
| </p> |
| <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, |
| <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fcse-follow-jumps</code></dt> |
| <dd><a name="index-fcse_002dfollow_002djumps"></a> |
| <p>In common subexpression elimination (CSE), scan through jump instructions |
| when the target of the jump is not reached by any other path. For |
| example, when CSE encounters an <code>if</code> statement with an |
| <code>else</code> clause, CSE follows the jump when the condition |
| tested is false. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fcse-skip-blocks</code></dt> |
| <dd><a name="index-fcse_002dskip_002dblocks"></a> |
| <p>This is similar to <samp>-fcse-follow-jumps</samp>, but causes CSE to |
| follow jumps that conditionally skip over blocks. When CSE |
| encounters a simple <code>if</code> statement with no else clause, |
| <samp>-fcse-skip-blocks</samp> causes CSE to follow the jump around the |
| body of the <code>if</code>. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-frerun-cse-after-loop</code></dt> |
| <dd><a name="index-frerun_002dcse_002dafter_002dloop"></a> |
| <p>Re-run common subexpression elimination after loop optimizations are |
| performed. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fgcse</code></dt> |
| <dd><a name="index-fgcse"></a> |
| <p>Perform a global common subexpression elimination pass. |
| This pass also performs global constant and copy propagation. |
| </p> |
| <p><em>Note:</em> When compiling a program using computed gotos, a GCC |
| extension, you may get better run-time performance if you disable |
| the global common subexpression elimination pass by adding |
| <samp>-fno-gcse</samp> to the command line. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fgcse-lm</code></dt> |
| <dd><a name="index-fgcse_002dlm"></a> |
| <p>When <samp>-fgcse-lm</samp> is enabled, global common subexpression elimination |
| attempts to move loads that are only killed by stores into themselves. This |
| allows a loop containing a load/store sequence to be changed to a load outside |
| the loop, and a copy/store within the loop. |
| </p> |
| <p>Enabled by default when <samp>-fgcse</samp> is enabled. |
| </p> |
| </dd> |
| <dt><code>-fgcse-sm</code></dt> |
| <dd><a name="index-fgcse_002dsm"></a> |
| <p>When <samp>-fgcse-sm</samp> is enabled, a store motion pass is run after |
| global common subexpression elimination. This pass attempts to move |
| stores out of loops. When used in conjunction with <samp>-fgcse-lm</samp>, |
| loops containing a load/store sequence can be changed to a load before |
| the loop and a store after the loop. |
| </p> |
| <p>Not enabled at any optimization level. |
| </p> |
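| <p>The combined effect of load motion and store motion can be sketched |
| by hand at the source level. The function names are hypothetical, and |
| the two functions are equivalent only under the assumption that |
| <code>total</code> does not point into the array <code>v</code>, which the |
| compiler has to establish before transforming the loop: |
| </p> |

```c
/* As written: *total is loaded and stored on every iteration. */
void accumulate(int *total, const int *v, int n)
{
    for (int i = 0; i < n; i++)
        *total += v[i];
}

/* Hand-written equivalent of the transformed loop: one load before
   the loop and one store after it. */
void accumulate_hoisted(int *total, const int *v, int n)
{
    int t = *total;          /* load moved before the loop */
    for (int i = 0; i < n; i++)
        t += v[i];
    *total = t;              /* store moved after the loop */
}
```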
| </dd> |
| <dt><code>-fgcse-las</code></dt> |
| <dd><a name="index-fgcse_002dlas"></a> |
| <p>When <samp>-fgcse-las</samp> is enabled, the global common subexpression |
| elimination pass eliminates redundant loads that come after stores to the |
| same memory location (both partial and full redundancies). |
| </p> |
| <p>Not enabled at any optimization level. |
| </p> |
| </dd> |
| <dt><code>-fgcse-after-reload</code></dt> |
| <dd><a name="index-fgcse_002dafter_002dreload"></a> |
| <p>When <samp>-fgcse-after-reload</samp> is enabled, a redundant load elimination |
| pass is performed after reload. The purpose of this pass is to clean up |
| redundant spilling. |
| </p> |
| </dd> |
| <dt><code>-faggressive-loop-optimizations</code></dt> |
| <dd><a name="index-faggressive_002dloop_002doptimizations"></a> |
| <p>This option tells the loop optimizer to use language constraints to |
| derive bounds for the number of iterations of a loop. This assumes that |
| loop code does not invoke undefined behavior, for example by causing signed |
| integer overflows or out-of-bound array accesses. The bounds for the |
| number of iterations of a loop are used to guide loop unrolling and peeling |
| and loop exit test optimizations. |
| This option is enabled by default. |
| </p> |
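| <p>As a hedged example of a bound derived from language constraints, |
| consider the loop below. The names <code>table</code>, |
| <code>TABLE_SIZE</code> and <code>sum_table</code> are hypothetical. Because |
| reading past the end of the array would be undefined behavior, the loop |
| optimizer may assume that the loop body runs at most |
| <code>TABLE_SIZE</code> times: |
| </p> |

```c
#define TABLE_SIZE 8

static const int table[TABLE_SIZE] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Valid calls never read beyond table[TABLE_SIZE - 1], so the number
   of iterations is bounded by TABLE_SIZE even though n is not known
   at compile time. Calling this with n > TABLE_SIZE is undefined. */
int sum_table(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += table[i];
    return sum;
}
```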
| </dd> |
| <dt><code>-funsafe-loop-optimizations</code></dt> |
| <dd><a name="index-funsafe_002dloop_002doptimizations"></a> |
| <p>This option tells the loop optimizer to assume that loop indices do not |
| overflow, and that loops with a nontrivial exit condition are not |
| infinite. This enables a wider range of loop optimizations even if |
| the loop optimizer itself cannot prove that these assumptions are valid. |
| If you use <samp>-Wunsafe-loop-optimizations</samp>, the compiler warns you |
| if it finds this kind of loop. |
| </p> |
| </dd> |
| <dt><code>-fcrossjumping</code></dt> |
| <dd><a name="index-fcrossjumping"></a> |
| <p>Perform cross-jumping transformation. |
| This transformation unifies equivalent code and saves code size. The |
| resulting code may or may not perform better than without cross-jumping. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fauto-inc-dec</code></dt> |
| <dd><a name="index-fauto_002dinc_002ddec"></a> |
| <p>Combine increments or decrements of addresses with memory accesses. |
| This pass is always skipped on architectures that do not have |
| instructions to support this. Enabled by default at <samp>-O</samp> and |
| higher on architectures that support this. |
| </p> |
| </dd> |
| <dt><code>-fdce</code></dt> |
| <dd><a name="index-fdce"></a> |
| <p>Perform dead code elimination (DCE) on RTL. |
| Enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fdse</code></dt> |
| <dd><a name="index-fdse"></a> |
| <p>Perform dead store elimination (DSE) on RTL. |
| Enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fif-conversion</code></dt> |
| <dd><a name="index-fif_002dconversion"></a> |
| <p>Attempt to transform conditional jumps into branch-less equivalents. This |
| includes use of conditional moves, min, max, set flags and abs instructions, and |
| some tricks doable with standard arithmetic. The use of conditional execution |
| on chips where it is available is controlled by <samp>-fif-conversion2</samp>. |
| </p> |
| <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
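| <p>The following sketch shows, at the source level, the kind of rewrite that |
| if-conversion aims for. The function names are hypothetical, and the second |
| function is a hand-written arithmetic equivalent shown only for comparison; |
| the instructions GCC actually selects (for example a conditional move or a |
| max instruction) depend on the target. |
| </p> |

```c
/* Branching form: a conditional jump selects the larger value. */
int max_branch(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

/* Branch-free form using ordinary arithmetic.
 *
 * (a > b) evaluates to 1 or 0, so -(a > b) is a mask of all ones or all
 * zeros.  With an all-ones mask the expression is b ^ (a ^ b), which is a;
 * with an all-zeros mask it is b ^ 0, which is b. */
int max_branchless(int a, int b)
{
    return b ^ ((a ^ b) & -(a > b));
}
```

| <p>Both functions return the same value for every pair of arguments; the |
| second simply avoids the conditional jump. |
| </p> |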
| </dd> |
| <dt><code>-fif-conversion2</code></dt> |
| <dd><a name="index-fif_002dconversion2"></a> |
| <p>Use conditional execution (where available) to transform conditional jumps into |
| branch-less equivalents. |
| </p> |
| <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fdeclone-ctor-dtor</code></dt> |
| <dd><a name="index-fdeclone_002dctor_002ddtor"></a> |
| <p>The C++ ABI requires multiple entry points for constructors and |
| destructors: one for a base subobject, one for a complete object, and |
| one for a virtual destructor that calls operator delete afterwards. |
| For a hierarchy with virtual bases, the base and complete variants are |
| clones, which means two copies of the function. With this option, the |
| base and complete variants are changed to be thunks that call a common |
| implementation. |
| </p> |
| <p>Enabled by <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fdelete-null-pointer-checks</code></dt> |
| <dd><a name="index-fdelete_002dnull_002dpointer_002dchecks"></a> |
| <p>Assume that programs cannot safely dereference null pointers, and that |
| no code or data element resides there. This enables simple constant |
| folding optimizations at all optimization levels. In addition, other |
| optimization passes in GCC use this flag to control global dataflow |
| analyses that eliminate useless checks for null pointers; these assume |
| that if a pointer is checked after it has already been dereferenced, |
| it cannot be null. |
| </p> |
| <p>Note however that in some environments this assumption is not true. |
| Use <samp>-fno-delete-null-pointer-checks</samp> to disable this optimization |
| for programs that depend on that behavior. |
| </p> |
| <p>Some targets, especially embedded ones, disable this option at all levels. |
| Otherwise it is enabled at all levels: <samp>-O0</samp>, <samp>-O1</samp>, |
| <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. Passes that use the information |
| are enabled independently at different optimization levels. |
| </p> |
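| <p>A minimal sketch of the pattern described above, a null check that follows |
| a dereference of the same pointer. The function name is hypothetical. |
| </p> |

```c
#include <stddef.h>

/* Return the value pointed to by p, or -1 if p is null.
 *
 * The dereference happens before the test.  Under
 * -fdelete-null-pointer-checks the compiler may reason that p cannot be
 * null once *p has been evaluated, and may therefore delete the check
 * and the "return -1" path. */
int read_or_default(const int *p)
{
    int value = *p;
    if (p == NULL)
        return -1;
    return value;
}
```

| <p>Moving the test above the dereference keeps the check meaningful regardless |
| of this option. |
| </p> |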
| </dd> |
| <dt><code>-fdevirtualize</code></dt> |
| <dd><a name="index-fdevirtualize"></a> |
| <p>Attempt to convert calls to virtual functions to direct calls. This |
| is done both within a procedure and interprocedurally as part of |
| indirect inlining (<samp>-findirect-inlining</samp>) and interprocedural constant |
| propagation (<samp>-fipa-cp</samp>). |
| Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fdevirtualize-speculatively</code></dt> |
| <dd><a name="index-fdevirtualize_002dspeculatively"></a> |
| <p>Attempt to convert calls to virtual functions to speculative direct calls. |
| Based on the analysis of the type inheritance graph, determine for a given call |
| the set of likely targets. If the set is small, preferably of size 1, change |
| the call into a conditional deciding between direct and indirect calls. The |
| speculative calls enable more optimizations, such as inlining. When they seem |
| useless after further optimization, they are converted back into their original form. |
| </p> |
| </dd> |
| <dt><code>-fdevirtualize-at-ltrans</code></dt> |
| <dd><a name="index-fdevirtualize_002dat_002dltrans"></a> |
| <p>Stream extra information needed for aggressive devirtualization when running |
| the link-time optimizer in local transformation mode. |
| This option enables more devirtualization but |
| significantly increases the size of streamed data. For this reason it is |
| disabled by default. |
| </p> |
| </dd> |
| <dt><code>-fexpensive-optimizations</code></dt> |
| <dd><a name="index-fexpensive_002doptimizations"></a> |
| <p>Perform a number of minor optimizations that are relatively expensive. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-free</code></dt> |
| <dd><a name="index-free"></a> |
| <p>Attempt to remove redundant extension instructions. This is especially |
| helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit |
| registers after writing to their lower 32-bit half. |
| </p> |
| <p>Enabled for Alpha, AArch64 and x86 at levels <samp>-O2</samp>, |
| <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fno-lifetime-dse</code></dt> |
| <dd><a name="index-fno_002dlifetime_002ddse"></a> |
| <p>In C++ the value of an object is only affected by changes within its |
| lifetime: when the constructor begins, the object has an indeterminate |
| value, and any changes during the lifetime of the object are dead when |
| the object is destroyed. Normally dead store elimination will take |
| advantage of this; if your code relies on the value of the object |
| storage persisting beyond the lifetime of the object, you can use this |
| flag to disable this optimization. |
| </p> |
| </dd> |
| <dt><code>-flive-range-shrinkage</code></dt> |
| <dd><a name="index-flive_002drange_002dshrinkage"></a> |
| <p>Attempt to decrease register pressure through register live range |
| shrinkage. This is helpful for fast processors with small or moderate |
| size register sets. |
| </p> |
| </dd> |
| <dt><code>-fira-algorithm=<var>algorithm</var></code></dt> |
| <dd><a name="index-fira_002dalgorithm"></a> |
| <p>Use the specified coloring algorithm for the integrated register |
| allocator. The <var>algorithm</var> argument can be ‘<samp>priority</samp>’, which |
| specifies Chow’s priority coloring, or ‘<samp>CB</samp>’, which specifies |
| Chaitin-Briggs coloring. Chaitin-Briggs coloring is not implemented |
| for all architectures, but for those targets that do support it, it is |
| the default because it generates better code. |
| </p> |
| </dd> |
| <dt><code>-fira-region=<var>region</var></code></dt> |
| <dd><a name="index-fira_002dregion"></a> |
| <p>Use the specified regions for the integrated register allocator. The |
| <var>region</var> argument should be one of the following: |
| </p> |
| <dl compact="compact"> |
| <dt>‘<samp>all</samp>’</dt> |
| <dd><p>Use all loops as register allocation regions. |
| This can give the best results for machines with a small and/or |
| irregular register set. |
| </p> |
| </dd> |
| <dt>‘<samp>mixed</samp>’</dt> |
| <dd><p>Use all loops except for loops with small register pressure |
| as the regions. This value usually gives |
| the best results for most architectures, |
| and is enabled by default when compiling with optimization for speed |
| (<samp>-O</samp>, <samp>-O2</samp>, …). |
| </p> |
| </dd> |
| <dt>‘<samp>one</samp>’</dt> |
| <dd><p>Use all functions as a single region. |
| This typically results in the smallest code size, and is enabled by default for |
| <samp>-Os</samp> or <samp>-O0</samp>. |
| </p> |
| </dd> |
| </dl> |
| |
| </dd> |
| <dt><code>-fira-hoist-pressure</code></dt> |
| <dd><a name="index-fira_002dhoist_002dpressure"></a> |
| <p>Use IRA to evaluate register pressure in the code hoisting pass for |
| decisions to hoist expressions. This option usually results in smaller |
| code, but it can slow the compiler down. |
| </p> |
| <p>This option is enabled at level <samp>-Os</samp> for all targets. |
| </p> |
| </dd> |
| <dt><code>-fira-loop-pressure</code></dt> |
| <dd><a name="index-fira_002dloop_002dpressure"></a> |
| <p>Use IRA to evaluate register pressure in loops for decisions to move |
| loop invariants. This option usually results in generation |
| of faster and smaller code on machines with large register files (>= 32 |
| registers), but it can slow the compiler down. |
| </p> |
| <p>This option is enabled at level <samp>-O3</samp> for some targets. |
| </p> |
| </dd> |
| <dt><code>-fno-ira-share-save-slots</code></dt> |
| <dd><a name="index-fno_002dira_002dshare_002dsave_002dslots"></a> |
| <p>Disable sharing of stack slots used for saving call-used hard |
| registers living through a call. Each hard register gets a |
| separate stack slot, and as a result function stack frames are |
| larger. |
| </p> |
| </dd> |
| <dt><code>-fno-ira-share-spill-slots</code></dt> |
| <dd><a name="index-fno_002dira_002dshare_002dspill_002dslots"></a> |
| <p>Disable sharing of stack slots allocated for pseudo-registers. Each |
| pseudo-register that does not get a hard register gets a separate |
| stack slot, and as a result function stack frames are larger. |
| </p> |
| </dd> |
| <dt><code>-fira-verbose=<var>n</var></code></dt> |
| <dd><a name="index-fira_002dverbose"></a> |
| <p>Control the verbosity of the dump file for the integrated register allocator. |
| The default value is 5. If the value <var>n</var> is greater than or equal to 10, |
| the dump output is sent to stderr using the same format as <var>n</var> minus 10. |
| </p> |
| </dd> |
| <dt><code>-flra-remat</code></dt> |
| <dd><a name="index-flra_002dremat"></a> |
| <p>Enable CFG-sensitive rematerialization in LRA. Instead of loading |
| values of spilled pseudos, LRA tries to rematerialize (recalculate) |
| values if it is profitable. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fdelayed-branch</code></dt> |
| <dd><a name="index-fdelayed_002dbranch"></a> |
| <p>If supported for the target machine, attempt to reorder instructions |
| to exploit instruction slots available after delayed branch |
| instructions. |
| </p> |
| <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fschedule-insns</code></dt> |
| <dd><a name="index-fschedule_002dinsns"></a> |
| <p>If supported for the target machine, attempt to reorder instructions to |
| eliminate execution stalls due to required data being unavailable. This |
| helps machines that have slow floating point or memory load instructions |
| by allowing other instructions to be issued until the result of the load |
| or floating-point instruction is required. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-fschedule-insns2</code></dt> |
| <dd><a name="index-fschedule_002dinsns2"></a> |
| <p>Similar to <samp>-fschedule-insns</samp>, but requests an additional pass of |
| instruction scheduling after register allocation has been done. This is |
| especially useful on machines with a relatively small number of |
| registers and where memory load instructions take more than one cycle. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fno-sched-interblock</code></dt> |
| <dd><a name="index-fno_002dsched_002dinterblock"></a> |
| <p>Don’t schedule instructions across basic blocks. Scheduling across |
| basic blocks is normally enabled by default when scheduling before register |
| allocation, i.e. with <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fno-sched-spec</code></dt> |
| <dd><a name="index-fno_002dsched_002dspec"></a> |
| <p>Don’t allow speculative motion of non-load instructions. Such motion is normally |
| enabled by default when scheduling before register allocation, i.e. |
| with <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fsched-pressure</code></dt> |
| <dd><a name="index-fsched_002dpressure"></a> |
| <p>Enable register pressure sensitive insn scheduling before register |
| allocation. This only makes sense when scheduling before register |
| allocation is enabled, i.e. with <samp>-fschedule-insns</samp> or at |
| <samp>-O2</samp> or higher. Usage of this option can improve the |
| generated code and decrease its size by preventing register pressure |
| increase above the number of available hard registers and subsequent |
| spills in register allocation. |
| </p> |
| </dd> |
| <dt><code>-fsched-spec-load</code></dt> |
| <dd><a name="index-fsched_002dspec_002dload"></a> |
| <p>Allow speculative motion of some load instructions. This only makes |
| sense when scheduling before register allocation, i.e. with |
| <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fsched-spec-load-dangerous</code></dt> |
| <dd><a name="index-fsched_002dspec_002dload_002ddangerous"></a> |
| <p>Allow speculative motion of more load instructions. This only makes |
| sense when scheduling before register allocation, i.e. with |
| <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fsched-stalled-insns</code></dt> |
| <dt><code>-fsched-stalled-insns=<var>n</var></code></dt> |
| <dd><a name="index-fsched_002dstalled_002dinsns"></a> |
| <p>Define how many insns (if any) can be moved prematurely from the queue |
| of stalled insns into the ready list during the second scheduling pass. |
| <samp>-fno-sched-stalled-insns</samp> means that no insns are moved |
| prematurely, <samp>-fsched-stalled-insns=0</samp> means there is no limit |
| on how many queued insns can be moved prematurely. |
| <samp>-fsched-stalled-insns</samp> without a value is equivalent to |
| <samp>-fsched-stalled-insns=1</samp>. |
| </p> |
| </dd> |
| <dt><code>-fsched-stalled-insns-dep</code></dt> |
| <dt><code>-fsched-stalled-insns-dep=<var>n</var></code></dt> |
| <dd><a name="index-fsched_002dstalled_002dinsns_002ddep"></a> |
| <p>Define how many insn groups (cycles) are examined for a dependency |
| on a stalled insn that is a candidate for premature removal from the queue |
| of stalled insns. This has an effect only during the second scheduling pass, |
| and only if <samp>-fsched-stalled-insns</samp> is used. |
| <samp>-fno-sched-stalled-insns-dep</samp> is equivalent to |
| <samp>-fsched-stalled-insns-dep=0</samp>. |
| <samp>-fsched-stalled-insns-dep</samp> without a value is equivalent to |
| <samp>-fsched-stalled-insns-dep=1</samp>. |
| </p> |
| </dd> |
| <dt><code>-fsched2-use-superblocks</code></dt> |
| <dd><a name="index-fsched2_002duse_002dsuperblocks"></a> |
| <p>When scheduling after register allocation, use superblock scheduling. |
| This allows motion across basic block boundaries, |
| resulting in faster schedules. This option is experimental, as not all machine |
| descriptions used by GCC model the CPU closely enough to avoid unreliable |
| results from the algorithm. |
| </p> |
| <p>This only makes sense when scheduling after register allocation, i.e. with |
| <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fsched-group-heuristic</code></dt> |
| <dd><a name="index-fsched_002dgroup_002dheuristic"></a> |
| <p>Enable the group heuristic in the scheduler. This heuristic favors |
| the instruction that belongs to a schedule group. This is enabled |
| by default when scheduling is enabled, i.e. with <samp>-fschedule-insns</samp> |
| or <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fsched-critical-path-heuristic</code></dt> |
| <dd><a name="index-fsched_002dcritical_002dpath_002dheuristic"></a> |
| <p>Enable the critical-path heuristic in the scheduler. This heuristic favors |
| instructions on the critical path. This is enabled by default when |
| scheduling is enabled, i.e. with <samp>-fschedule-insns</samp> |
| or <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fsched-spec-insn-heuristic</code></dt> |
| <dd><a name="index-fsched_002dspec_002dinsn_002dheuristic"></a> |
| <p>Enable the speculative instruction heuristic in the scheduler. This |
| heuristic favors speculative instructions with greater dependency weakness. |
| This is enabled by default when scheduling is enabled, i.e. |
| with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> |
| or at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fsched-rank-heuristic</code></dt> |
| <dd><a name="index-fsched_002drank_002dheuristic"></a> |
| <p>Enable the rank heuristic in the scheduler. This heuristic favors |
| the instruction belonging to a basic block with greater size or frequency. |
| This is enabled by default when scheduling is enabled, i.e. |
| with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or |
| at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fsched-last-insn-heuristic</code></dt> |
| <dd><a name="index-fsched_002dlast_002dinsn_002dheuristic"></a> |
| <p>Enable the last-instruction heuristic in the scheduler. This heuristic |
| favors the instruction that is less dependent on the last instruction |
| scheduled. This is enabled by default when scheduling is enabled, |
| i.e. with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or |
| at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-fsched-dep-count-heuristic</code></dt> |
| <dd><a name="index-fsched_002ddep_002dcount_002dheuristic"></a> |
| <p>Enable the dependent-count heuristic in the scheduler. This heuristic |
| favors the instruction that has more instructions depending on it. |
| This is enabled by default when scheduling is enabled, i.e. |
| with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or |
| at <samp>-O2</samp> or higher. |
| </p> |
| </dd> |
| <dt><code>-freschedule-modulo-scheduled-loops</code></dt> |
| <dd><a name="index-freschedule_002dmodulo_002dscheduled_002dloops"></a> |
| <p>Modulo scheduling is performed before traditional scheduling. If a loop |
| is modulo scheduled, later scheduling passes may change its schedule. |
| Use this option to control that behavior. |
| </p> |
| </dd> |
| <dt><code>-fselective-scheduling</code></dt> |
| <dd><a name="index-fselective_002dscheduling"></a> |
| <p>Schedule instructions using the selective scheduling algorithm. Selective |
| scheduling runs instead of the first scheduler pass. |
| </p> |
| </dd> |
| <dt><code>-fselective-scheduling2</code></dt> |
| <dd><a name="index-fselective_002dscheduling2"></a> |
| <p>Schedule instructions using the selective scheduling algorithm. Selective |
| scheduling runs instead of the second scheduler pass. |
| </p> |
| </dd> |
| <dt><code>-fsel-sched-pipelining</code></dt> |
| <dd><a name="index-fsel_002dsched_002dpipelining"></a> |
| <p>Enable software pipelining of innermost loops during selective scheduling. |
| This option has no effect unless one of <samp>-fselective-scheduling</samp> or |
| <samp>-fselective-scheduling2</samp> is turned on. |
| </p> |
| </dd> |
| <dt><code>-fsel-sched-pipelining-outer-loops</code></dt> |
| <dd><a name="index-fsel_002dsched_002dpipelining_002douter_002dloops"></a> |
| <p>When pipelining loops during selective scheduling, also pipeline outer loops. |
| This option has no effect unless <samp>-fsel-sched-pipelining</samp> is turned on. |
| </p> |
| </dd> |
| <dt><code>-fsemantic-interposition</code></dt> |
| <dd><a name="index-fsemantic_002dinterposition"></a> |
| <p>Some object formats, like ELF, allow interposing of symbols by the |
| dynamic linker. |
| This means that for symbols exported from the DSO, the compiler cannot perform |
| interprocedural propagation, inlining and other optimizations in anticipation |
| that the function or variable in question may change. While this feature is |
| useful, for example, to replace memory allocation functions with a debugging |
| implementation, it is expensive in terms of code quality. |
| With <samp>-fno-semantic-interposition</samp> the compiler assumes that |
| if interposition happens for functions, the overwriting function will have |
| precisely the same semantics (and side effects). |
| Similarly if interposition happens |
| for variables, the constructor of the variable will be the same. The flag |
| has no effect for functions explicitly declared inline |
| (where it is never allowed for interposition to change semantics) |
| and for symbols explicitly declared weak. |
| </p> |
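| <p>As a sketch of the situation this option addresses, assume the two |
| hypothetical functions below are exported from a shared library. |
| </p> |

```c
/* Exported from the (hypothetical) shared library.  On ELF, another
 * definition of scale() could be interposed at dynamic-link time. */
int scale(int x)
{
    return x * 2;
}

/* Also exported.  With semantic interposition in effect, the compiler
 * has to treat the calls to scale() as calls to a function whose body
 * may be replaced, so it cannot inline them or propagate their results.
 * With -fno-semantic-interposition it may assume any replacement behaves
 * exactly like the definition above and optimize accordingly. */
int scaled_sum(int a, int b)
{
    return scale(a) + scale(b);
}
```

| <p>Declaring <code>scale</code> as <code>static</code> would remove it from the set of |
| exported symbols and sidestep the question entirely. |
| </p> |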
| </dd> |
| <dt><code>-fshrink-wrap</code></dt> |
| <dd><a name="index-fshrink_002dwrap"></a> |
| <p>Emit function prologues only before parts of the function that need it, |
| rather than at the top of the function. This flag is enabled by default at |
| <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fcaller-saves</code></dt> |
| <dd><a name="index-fcaller_002dsaves"></a> |
| <p>Enable allocation of values to registers that are clobbered by |
| function calls, by emitting extra instructions to save and restore the |
| registers around such calls. Such allocation is done only when it |
| seems to result in better code. |
| </p> |
| <p>This option is always enabled by default on certain machines, usually |
| those which have no call-preserved registers to use instead. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fcombine-stack-adjustments</code></dt> |
| <dd><a name="index-fcombine_002dstack_002dadjustments"></a> |
| <p>Track stack adjustments (pushes and pops) and stack memory references |
| and then try to find ways to combine them. |
| </p> |
| <p>Enabled by default at <samp>-O1</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fipa-ra</code></dt> |
| <dd><a name="index-fipa_002dra"></a> |
| <p>Use caller save registers for allocation if those registers are not used by |
| any called function. In that case it is not necessary to save and restore |
| them around calls. This is only possible if called functions are part of |
| the same compilation unit as the current function and they are compiled before it. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fconserve-stack</code></dt> |
| <dd><a name="index-fconserve_002dstack"></a> |
| <p>Attempt to minimize stack usage. The compiler attempts to use less |
| stack space, even if that makes the program slower. This option |
| implies setting the <samp>large-stack-frame</samp> parameter to 100 |
| and the <samp>large-stack-frame-growth</samp> parameter to 400. |
| </p> |
| </dd> |
| <dt><code>-ftree-reassoc</code></dt> |
| <dd><a name="index-ftree_002dreassoc"></a> |
| <p>Perform reassociation on trees. This flag is enabled by default |
| at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-pre</code></dt> |
| <dd><a name="index-ftree_002dpre"></a> |
| <p>Perform partial redundancy elimination (PRE) on trees. This flag is |
| enabled by default at <samp>-O2</samp> and <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-ftree-partial-pre</code></dt> |
| <dd><a name="index-ftree_002dpartial_002dpre"></a> |
| <p>Make partial redundancy elimination (PRE) more aggressive. This flag is |
| enabled by default at <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-ftree-forwprop</code></dt> |
| <dd><a name="index-ftree_002dforwprop"></a> |
| <p>Perform forward propagation on trees. This flag is enabled by default |
| at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-fre</code></dt> |
| <dd><a name="index-ftree_002dfre"></a> |
| <p>Perform full redundancy elimination (FRE) on trees. The difference |
| between FRE and PRE is that FRE only considers expressions |
| that are computed on all paths leading to the redundant computation. |
| This analysis is faster than PRE, though it exposes fewer redundancies. |
| This flag is enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-phiprop</code></dt> |
| <dd><a name="index-ftree_002dphiprop"></a> |
| <p>Perform hoisting of loads from conditional pointers on trees. This |
| pass is enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fhoist-adjacent-loads</code></dt> |
| <dd><a name="index-fhoist_002dadjacent_002dloads"></a> |
| <p>Speculatively hoist loads from both branches of an if-then-else if the |
| loads are from adjacent locations in the same structure and the target |
| architecture has a conditional move instruction. This flag is enabled |
| by default at <samp>-O2</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-copy-prop</code></dt> |
| <dd><a name="index-ftree_002dcopy_002dprop"></a> |
| <p>Perform copy propagation on trees. This pass eliminates unnecessary |
| copy operations. This flag is enabled by default at <samp>-O</samp> and |
| higher. |
| </p> |
| </dd> |
| <dt><code>-fipa-pure-const</code></dt> |
| <dd><a name="index-fipa_002dpure_002dconst"></a> |
| <p>Discover which functions are pure or constant. |
| Enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fipa-reference</code></dt> |
| <dd><a name="index-fipa_002dreference"></a> |
| <p>Discover which static variables do not escape the |
| compilation unit. |
| Enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fipa-pta</code></dt> |
| <dd><a name="index-fipa_002dpta"></a> |
| <p>Perform interprocedural pointer analysis and interprocedural modification |
| and reference analysis. This option can cause excessive memory and |
| compile-time usage on large compilation units. It is not enabled by |
| default at any optimization level. |
| </p> |
| </dd> |
| <dt><code>-fipa-profile</code></dt> |
| <dd><a name="index-fipa_002dprofile"></a> |
| <p>Perform interprocedural profile propagation. The functions called only from |
| cold functions are marked as cold. Also functions executed once (such as |
| <code>cold</code>, <code>noreturn</code>, static constructors or destructors) are identified. Cold |
| functions and loopless parts of functions executed once are then optimized for |
| size. |
| Enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fipa-cp</code></dt> |
| <dd><a name="index-fipa_002dcp"></a> |
| <p>Perform interprocedural constant propagation. |
| This optimization analyzes the program to determine when values passed |
| to functions are constants and then optimizes accordingly. |
| This optimization can substantially increase performance |
| if the application has constants passed to functions. |
| This flag is enabled by default at <samp>-O2</samp>, <samp>-Os</samp> and <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-fipa-cp-clone</code></dt> |
| <dd><a name="index-fipa_002dcp_002dclone"></a> |
| <p>Perform function cloning to make interprocedural constant propagation stronger. |
| When enabled, interprocedural constant propagation performs function cloning |
| when an externally visible function can be called with constant arguments. |
| Because this optimization can create multiple copies of functions, |
| it may significantly increase code size |
| (see <samp>--param ipcp-unit-growth=<var>value</var></samp>). |
| This flag is enabled by default at <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-fipa-cp-alignment</code></dt> |
| <dd><a name="index-_002dfipa_002dcp_002dalignment"></a> |
| <p>When enabled, this optimization propagates alignment of function |
| parameters to support better vectorization and string operations. |
| </p> |
| <p>This flag is enabled by default at <samp>-O2</samp> and <samp>-Os</samp>. It |
| requires that <samp>-fipa-cp</samp> is enabled. |
| </p> |
| </dd> |
| <dt><code>-fipa-icf</code></dt> |
| <dd><a name="index-fipa_002dicf"></a> |
| <p>Perform Identical Code Folding for functions and read-only variables. |
| The optimization reduces code size and may disturb unwind stacks by replacing |
| a function with an equivalent one that has a different name. The optimization works |
| more effectively with link-time optimization enabled. |
| </p> |
| <p>Although the behavior is similar to the Gold linker’s ICF optimization, GCC ICF |
| works on different levels, so the two optimizations are not the same: there are |
| equivalences that are found only by GCC and equivalences found only by Gold. |
| </p> |
| <p>This flag is enabled by default at <samp>-O2</samp> and <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fisolate-erroneous-paths-dereference</code></dt> |
| <dd><a name="index-fisolate_002derroneous_002dpaths_002ddereference"></a> |
| <p>Detect paths that trigger erroneous or undefined behavior due to |
| dereferencing a null pointer. Isolate those paths from the main control |
| flow and turn the statement with erroneous or undefined behavior into a trap. |
| This flag is enabled by default at <samp>-O2</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fisolate-erroneous-paths-attribute</code></dt> |
| <dd><a name="index-fisolate_002derroneous_002dpaths_002dattribute"></a> |
| <p>Detect paths that trigger erroneous or undefined behavior due to a null value |
| being used in a way forbidden by a <code>returns_nonnull</code> or <code>nonnull</code> |
| attribute. Isolate those paths from the main control flow and turn the |
| statement with erroneous or undefined behavior into a trap. This is not |
| currently enabled, but may be enabled by <samp>-O2</samp> in the future. |
| </p> |
| </dd> |
| <dt><code>-ftree-sink</code></dt> |
| <dd><a name="index-ftree_002dsink"></a> |
| <p>Perform forward store motion on trees. This flag is |
| enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-bit-ccp</code></dt> |
| <dd><a name="index-ftree_002dbit_002dccp"></a> |
| <p>Perform sparse conditional bit constant propagation on trees and propagate |
| pointer alignment information. |
| This pass only operates on local scalar variables and is enabled by default |
| at <samp>-O</samp> and higher. It requires that <samp>-ftree-ccp</samp> is enabled. |
| </p> |
| </dd> |
| <dt><code>-ftree-ccp</code></dt> |
| <dd><a name="index-ftree_002dccp"></a> |
| <p>Perform sparse conditional constant propagation (CCP) on trees. This |
| pass only operates on local scalar variables and is enabled by default |
| at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-fssa-phiopt</code></dt> |
| <dd><a name="index-fssa_002dphiopt"></a> |
| <p>Perform pattern matching on SSA PHI nodes to optimize conditional |
| code. This pass is enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-switch-conversion</code></dt> |
| <dd><a name="index-ftree_002dswitch_002dconversion"></a> |
| <p>Perform conversion of simple initializations in a switch to |
| initializations from a scalar array. This flag is enabled by default |
| at <samp>-O2</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-tail-merge</code></dt> |
| <dd><a name="index-ftree_002dtail_002dmerge"></a> |
| <p>Look for identical code sequences. When found, replace one with a jump to the |
| other. This optimization is known as tail merging or cross jumping. This flag |
| is enabled by default at <samp>-O2</samp> and higher. The compilation time |
| in this pass can be limited using the |
| <samp>max-tail-merge-comparisons</samp> and |
| <samp>max-tail-merge-iterations</samp> parameters. |
| </p> |
| </dd> |
| <dt><code>-ftree-dce</code></dt> |
| <dd><a name="index-ftree_002ddce"></a> |
| <p>Perform dead code elimination (DCE) on trees. This flag is enabled by |
| default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-builtin-call-dce</code></dt> |
| <dd><a name="index-ftree_002dbuiltin_002dcall_002ddce"></a> |
| <p>Perform conditional dead code elimination (DCE) for calls to built-in functions |
| that may set <code>errno</code> but are otherwise side-effect free. This flag is |
| enabled by default at <samp>-O2</samp> and higher if <samp>-Os</samp> is not also |
| specified. |
| </p> |
| </dd> |
| <dt><code>-ftree-dominator-opts</code></dt> |
| <dd><a name="index-ftree_002ddominator_002dopts"></a> |
| <p>Perform a variety of simple scalar cleanups (constant/copy |
| propagation, redundancy elimination, range propagation and expression |
| simplification) based on a dominator tree traversal. This also |
| performs jump threading (to reduce jumps to jumps). This flag is |
| enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-dse</code></dt> |
| <dd><a name="index-ftree_002ddse"></a> |
| <p>Perform dead store elimination (DSE) on trees. A dead store is a store into |
| a memory location that is later overwritten by another store without |
| any intervening loads. In this case the earlier store can be deleted. This |
| flag is enabled by default at <samp>-O</samp> and higher. |
| </p> |
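| <p>A minimal sketch of a dead store, using a hypothetical <code>struct point</code> |
| and function <code>set_point</code> that are not part of GCC: the first store to |
| <code>p-&gt;x</code> is overwritten before anything reads it, so it is a candidate |
| for removal. |
| </p> |

```c
struct point { int x; int y; };

void set_point(struct point *p, int v)
{
    p->x = 0;      /* dead store: overwritten below, never read in between */
    p->y = v;      /* store to a different location, not a load of p->x */
    p->x = v + 1;  /* the only store to p->x whose value is observable */
}
```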
| </dd> |
| <dt><code>-ftree-ch</code></dt> |
| <dd><a name="index-ftree_002dch"></a> |
| <p>Perform loop header copying on trees. This is beneficial since it increases |
| effectiveness of code motion optimizations. It also saves one jump. This flag |
| is enabled by default at <samp>-O</samp> and higher. It is not enabled |
| for <samp>-Os</samp>, since it usually increases code size. |
| </p> |
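| <p>The effect of loop header copying can be sketched by hand as follows. The |
| function names are hypothetical, and the second function only approximates |
| what the pass does: the loop test is duplicated in front of the loop so that |
| the loop itself becomes a bottom-tested loop. |
| </p> |

```c
/* Original form: the test sits in the loop header. */
int sum_while(const int *a, int n)
{
    int s = 0, i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* Hand-written approximation after header copying: one copy of the
   test guards entry, the other sits at the bottom of the loop. */
int sum_copied_header(const int *a, int n)
{
    int s = 0, i = 0;
    if (i < n) {
        do {
            s += a[i];
            i++;
        } while (i < n);
    }
    return s;
}
```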
| </dd> |
| <dt><code>-ftree-loop-optimize</code></dt> |
| <dd><a name="index-ftree_002dloop_002doptimize"></a> |
| <p>Perform loop optimizations on trees. This flag is enabled by default |
| at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-loop-linear</code></dt> |
| <dd><a name="index-ftree_002dloop_002dlinear"></a> |
| <p>Perform loop interchange transformations on trees. Same as |
| <samp>-floop-interchange</samp>. To use this code transformation, GCC has |
| to be configured with <samp>--with-isl</samp> to enable the Graphite loop |
| transformation infrastructure. |
| </p> |
| </dd> |
| <dt><code>-floop-interchange</code></dt> |
| <dd><a name="index-floop_002dinterchange"></a> |
| <p>Perform loop interchange transformations on loops. Interchanging two |
| nested loops switches the inner and outer loops. For example, given a |
| loop like: |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO J = 1, M |
| DO I = 1, N |
| A(J, I) = A(J, I) * C |
| ENDDO |
| ENDDO |
| </pre></div> |
| <p>loop interchange transforms the loop as if it were written: |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO I = 1, N |
| DO J = 1, M |
| A(J, I) = A(J, I) * C |
| ENDDO |
| ENDDO |
| </pre></div> |
| <p>which can be beneficial when <code>N</code> is larger than the caches, |
| because in Fortran, the elements of an array are stored in memory |
| contiguously by column, and the original loop iterates over rows, |
| potentially creating at each access a cache miss. This optimization |
| applies to all the languages supported by GCC and is not limited to |
| Fortran. To use this code transformation, GCC has to be configured |
| with <samp>--with-isl</samp> to enable the Graphite loop transformation |
| infrastructure. |
| </p> |
| </dd> |
| <dt><code>-floop-strip-mine</code></dt> |
| <dd><a name="index-floop_002dstrip_002dmine"></a> |
| <p>Perform loop strip mining transformations on loops. Strip mining |
| splits a loop into two nested loops. The outer loop has strides |
| equal to the strip size and the inner loop has strides of the |
| original loop within a strip. The strip length can be changed |
| using the <samp>loop-block-tile-size</samp> parameter. For example, |
| given a loop like: |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO I = 1, N |
| A(I) = A(I) + C |
| ENDDO |
| </pre></div> |
| <p>loop strip mining transforms the loop as if it were written: |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO II = 1, N, 51 |
| DO I = II, min (II + 50, N) |
| A(I) = A(I) + C |
| ENDDO |
| ENDDO |
| </pre></div> |
| <p>This optimization applies to all the languages supported by GCC and is |
| not limited to Fortran. To use this code transformation, GCC has to |
| be configured with <samp>--with-isl</samp> to enable the Graphite loop |
| transformation infrastructure. |
| </p> |
| </dd> |
| <dt><code>-floop-block</code></dt> |
| <dd><a name="index-floop_002dblock"></a> |
| <p>Perform loop blocking transformations on loops. Blocking strip mines |
| each loop in the loop nest such that the memory accesses of the |
| element loops fit inside caches. The strip length can be changed |
| using the <samp>loop-block-tile-size</samp> parameter. For example, given |
| a loop like: |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO I = 1, N |
| DO J = 1, M |
| A(J, I) = B(I) + C(J) |
| ENDDO |
| ENDDO |
| </pre></div> |
| <p>loop blocking transforms the loop as if it were written: |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO II = 1, N, 51 |
| DO JJ = 1, M, 51 |
| DO I = II, min (II + 50, N) |
| DO J = JJ, min (JJ + 50, M) |
| A(J, I) = B(I) + C(J) |
| ENDDO |
| ENDDO |
| ENDDO |
| ENDDO |
| </pre></div> |
| <p>which can be beneficial when <code>M</code> is larger than the caches, |
| because the innermost loop iterates over a smaller amount of data |
| which can be kept in the caches. This optimization applies to all the |
| languages supported by GCC and is not limited to Fortran. To use this |
| code transformation, GCC has to be configured with <samp>--with-isl</samp> |
| to enable the Graphite loop transformation infrastructure. |
| </p> |
| </dd> |
| <dt><code>-fgraphite-identity</code></dt> |
| <dd><a name="index-fgraphite_002didentity"></a> |
| <p>Enable the identity transformation for Graphite. For every SCoP we generate |
| the polyhedral representation and transform it back to GIMPLE. Using |
| <samp>-fgraphite-identity</samp> we can check the costs or benefits of the |
| GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations |
| are also performed by the code generator ISL, like index splitting and |
| dead code elimination in loops. |
| </p> |
| </dd> |
| <dt><code>-floop-nest-optimize</code></dt> |
| <dd><a name="index-floop_002dnest_002doptimize"></a> |
| <p>Enable the ISL based loop nest optimizer. This is a generic loop nest |
| optimizer based on the Pluto optimization algorithms. It calculates a loop |
| structure optimized for data-locality and parallelism. This option |
| is experimental. |
| </p> |
| </dd> |
| <dt><code>-floop-unroll-and-jam</code></dt> |
| <dd><a name="index-floop_002dunroll_002dand_002djam"></a> |
| <p>Enable unroll and jam for the ISL based loop nest optimizer. The unroll |
| factor can be changed using the <samp>loop-unroll-jam-size</samp> parameter. |
| The unrolled dimension (counting from the innermost one) can be changed |
| using the <samp>loop-unroll-jam-depth</samp> parameter. |
| </p> |
| </dd> |
| <dt><code>-floop-parallelize-all</code></dt> |
| <dd><a name="index-floop_002dparallelize_002dall"></a> |
| <p>Use the Graphite data dependence analysis to identify loops that can |
| be parallelized. Parallelize all the loops that can be analyzed to |
| not contain loop carried dependences without checking that it is |
| profitable to parallelize the loops. |
| </p> |
| </dd> |
| <dt><code>-fcheck-data-deps</code></dt> |
| <dd><a name="index-fcheck_002ddata_002ddeps"></a> |
| <p>Compare the results of several data dependence analyzers. This option |
| is used for debugging the data dependence analyzers. |
| </p> |
| </dd> |
| <dt><code>-ftree-loop-if-convert</code></dt> |
| <dd><a name="index-ftree_002dloop_002dif_002dconvert"></a> |
| <p>Attempt to transform conditional jumps in the innermost loops to |
| branch-less equivalents. The intent is to remove control-flow from |
| the innermost loops in order to improve the ability of the |
| vectorization pass to handle these loops. This is enabled by default |
| if vectorization is enabled. |
| </p> |
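| <p>As an illustration only (the function names are hypothetical), the first |
| loop below contains a conditional jump, and the second is a hand-written |
| branch-less equivalent of the kind if-conversion aims for. No conditional |
| memory write is involved here, because the update goes to a local variable. |
| </p> |

```c
/* Loop body with control flow. */
int count_above_branchy(const int *a, int n, int t)
{
    int c = 0;
    for (int i = 0; i < n; i++)
        if (a[i] > t)
            c++;
    return c;
}

/* Hand-written branch-less form: the condition becomes a value that
   is added unconditionally on every iteration. */
int count_above_branchless(const int *a, int n, int t)
{
    int c = 0;
    for (int i = 0; i < n; i++)
        c += (a[i] > t) ? 1 : 0;
    return c;
}
```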
| </dd> |
| <dt><code>-ftree-loop-if-convert-stores</code></dt> |
| <dd><a name="index-ftree_002dloop_002dif_002dconvert_002dstores"></a> |
| <p>Attempt to also if-convert conditional jumps containing memory writes. |
| This transformation can be unsafe for multi-threaded programs as it |
| transforms conditional memory writes into unconditional memory writes. |
| For example, |
| </p><div class="smallexample"> |
| <pre class="smallexample">for (i = 0; i < N; i++) |
| if (cond) |
| A[i] = expr; |
| </pre></div> |
| <p>is transformed to |
| </p><div class="smallexample"> |
| <pre class="smallexample">for (i = 0; i < N; i++) |
| A[i] = cond ? expr : A[i]; |
| </pre></div> |
| <p>potentially producing data races. |
| </p> |
| </dd> |
| <dt><code>-ftree-loop-distribution</code></dt> |
| <dd><a name="index-ftree_002dloop_002ddistribution"></a> |
| <p>Perform loop distribution. This flag can improve cache performance on |
| big loop bodies and allow further loop optimizations, like |
| parallelization or vectorization, to take place. For example, the loop |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO I = 1, N |
| A(I) = B(I) + C |
| D(I) = E(I) * F |
| ENDDO |
| </pre></div> |
| <p>is transformed to |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO I = 1, N |
| A(I) = B(I) + C |
| ENDDO |
| DO I = 1, N |
| D(I) = E(I) * F |
| ENDDO |
| </pre></div> |
| |
| </dd> |
| <dt><code>-ftree-loop-distribute-patterns</code></dt> |
| <dd><a name="index-ftree_002dloop_002ddistribute_002dpatterns"></a> |
| <p>Perform loop distribution of patterns that can be code generated with |
| calls to a library. This flag is enabled by default at <samp>-O3</samp>. |
| </p> |
| <p>This pass distributes the initialization loops and generates a call to |
| <code>memset</code> that zeroes the memory. For example, the loop |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO I = 1, N |
| A(I) = 0 |
| B(I) = A(I) + I |
| ENDDO |
| </pre></div> |
| <p>is transformed to |
| </p><div class="smallexample"> |
| <pre class="smallexample">DO I = 1, N |
| A(I) = 0 |
| ENDDO |
| DO I = 1, N |
| B(I) = A(I) + I |
| ENDDO |
| </pre></div> |
| <p>and the initialization loop is transformed into a call to <code>memset</code> that zeroes the array. |
| </p> |
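| <p>The same idea can be sketched in C. The function names are hypothetical, |
| the second function is written by hand rather than produced by GCC, and the |
| sketch assumes that <code>a</code> and <code>b</code> do not overlap. |
| </p> |

```c
#include <string.h>

/* Original form: initialization and computation share one loop. */
void init_loop(int *a, int *b, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = 0;
        b[i] = a[i] + i;
    }
}

/* Hand-written approximation of the distributed form: the
   zero-initialization loop becomes a single memset call. */
void init_distributed(int *a, int *b, int n)
{
    if (n <= 0)
        return;
    memset(a, 0, (size_t)n * sizeof *a);
    for (int i = 0; i < n; i++)
        b[i] = a[i] + i;
}
```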
| </dd> |
| <dt><code>-ftree-loop-im</code></dt> |
| <dd><a name="index-ftree_002dloop_002dim"></a> |
| <p>Perform loop invariant motion on trees. This pass moves only invariants that |
| are hard to handle at RTL level (function calls, operations that expand to |
| nontrivial sequences of insns). With <samp>-funswitch-loops</samp> it also moves |
| operands of conditions that are invariant out of the loop, so that we can use |
| just trivial invariantness analysis in loop unswitching. The pass also includes |
| store motion. |
| </p> |
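| <p>Invariant motion in general can be pictured with the hand-written sketch |
| below. The function names are hypothetical, and the example does not claim |
| that this particular expression is one the tree-level pass chooses to move; |
| it only shows what hoisting a loop-invariant computation means. |
| </p> |

```c
/* The subexpression scale * scale + offset does not change inside
   the loop, yet it is written as if recomputed on every iteration. */
void fill_naive(unsigned *a, int n, unsigned scale, unsigned offset)
{
    for (int i = 0; i < n; i++)
        a[i] = (unsigned)i + (scale * scale + offset);
}

/* Hand-hoisted form: the invariant is computed once before the loop. */
void fill_hoisted(unsigned *a, int n, unsigned scale, unsigned offset)
{
    const unsigned k = scale * scale + offset;
    for (int i = 0; i < n; i++)
        a[i] = (unsigned)i + k;
}
```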
| </dd> |
| <dt><code>-ftree-loop-ivcanon</code></dt> |
| <dd><a name="index-ftree_002dloop_002divcanon"></a> |
| <p>Create a canonical counter for the number of iterations in loops for which |
| determining the number of iterations requires complicated analysis. Later |
| optimizations may then determine the number easily. This is especially useful |
| in connection with unrolling. |
| </p> |
| </dd> |
| <dt><code>-fivopts</code></dt> |
| <dd><a name="index-fivopts"></a> |
| <p>Perform induction variable optimizations (strength reduction, induction |
| variable merging and induction variable elimination) on trees. |
| </p> |
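| <p>Strength reduction of an induction variable can be sketched by hand as |
| follows. The function names are hypothetical and the second function only |
| approximates the idea: the multiplication <code>i * stride</code> is replaced by |
| a second induction variable that is advanced by addition. |
| </p> |

```c
/* Original form: the index is recomputed with a multiplication. */
long sum_strided_mul(const int *a, int n, int stride)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i * stride];
    return s;
}

/* Hand-written strength-reduced form: idx tracks i * stride and is
   updated with an addition on each iteration. */
long sum_strided_add(const int *a, int n, int stride)
{
    long s = 0;
    int idx = 0;
    for (int i = 0; i < n; i++) {
        s += a[idx];
        idx += stride;
    }
    return s;
}
```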
| </dd> |
| <dt><code>-ftree-parallelize-loops=n</code></dt> |
| <dd><a name="index-ftree_002dparallelize_002dloops"></a> |
| <p>Parallelize loops, i.e., split their iteration space to run in <var>n</var> threads. |
| This is only possible for loops whose iterations are independent |
| and can be arbitrarily reordered. The optimization is only |
| profitable on multiprocessor machines, for loops that are CPU-intensive, |
| rather than constrained e.g. by memory bandwidth. This option |
| implies <samp>-pthread</samp>, and thus is only supported on targets |
| that have support for <samp>-pthread</samp>. |
| </p> |
| </dd> |
| <dt><code>-ftree-pta</code></dt> |
| <dd><a name="index-ftree_002dpta"></a> |
| <p>Perform function-local points-to analysis on trees. This flag is |
| enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-sra</code></dt> |
| <dd><a name="index-ftree_002dsra"></a> |
| <p>Perform scalar replacement of aggregates. This pass replaces structure |
| references with scalars to prevent committing structures to memory too |
| early. This flag is enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-copyrename</code></dt> |
| <dd><a name="index-ftree_002dcopyrename"></a> |
| <p>Perform copy renaming on trees. This pass attempts to rename compiler |
| temporaries to other variables at copy locations, usually resulting in |
| variable names which more closely resemble the original variables. This flag |
| is enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-coalesce-inlined-vars</code></dt> |
| <dd><a name="index-ftree_002dcoalesce_002dinlined_002dvars"></a> |
| <p>Tell the copyrename pass (see <samp>-ftree-copyrename</samp>) to attempt to |
| combine small user-defined variables too, but only if they are inlined |
| from other functions. It is a more limited form of |
| <samp>-ftree-coalesce-vars</samp>. This may harm debug information of such |
| inlined variables, but it keeps variables of the inlined-into |
| function apart from each other, such that they are more likely to |
| contain the expected values in a debugging session. |
| </p> |
| </dd> |
| <dt><code>-ftree-coalesce-vars</code></dt> |
| <dd><a name="index-ftree_002dcoalesce_002dvars"></a> |
| <p>Tell the copyrename pass (see <samp>-ftree-copyrename</samp>) to attempt to |
| combine small user-defined variables too, instead of just compiler |
| temporaries. This may severely limit the ability to debug an optimized |
| program compiled with <samp>-fno-var-tracking-assignments</samp>. In the |
| negated form, this flag prevents SSA coalescing of user variables, |
| including inlined ones. This option is enabled by default. |
| </p> |
| </dd> |
| <dt><code>-ftree-ter</code></dt> |
| <dd><a name="index-ftree_002dter"></a> |
| <p>Perform temporary expression replacement during the SSA->normal phase. Single |
| use/single def temporaries are replaced at their use location with their |
| defining expression. This results in non-GIMPLE code, but gives the expanders |
| much more complex trees to work on resulting in better RTL generation. This is |
| enabled by default at <samp>-O</samp> and higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-slsr</code></dt> |
| <dd><a name="index-ftree_002dslsr"></a> |
| <p>Perform straight-line strength reduction on trees. This recognizes related |
| expressions involving multiplications and replaces them by less expensive |
| calculations when possible. This is enabled by default at <samp>-O</samp> and |
| higher. |
| </p> |
| </dd> |
| <dt><code>-ftree-vectorize</code></dt> |
| <dd><a name="index-ftree_002dvectorize"></a> |
| <p>Perform vectorization on trees. This flag enables <samp>-ftree-loop-vectorize</samp> |
| and <samp>-ftree-slp-vectorize</samp> if not explicitly specified. |
| </p> |
| </dd> |
| <dt><code>-ftree-loop-vectorize</code></dt> |
| <dd><a name="index-ftree_002dloop_002dvectorize"></a> |
| <p>Perform loop vectorization on trees. This flag is enabled by default at |
| <samp>-O3</samp> and when <samp>-ftree-vectorize</samp> is enabled. |
| </p> |
| </dd> |
| <dt><code>-ftree-slp-vectorize</code></dt> |
| <dd><a name="index-ftree_002dslp_002dvectorize"></a> |
| <p>Perform basic block vectorization on trees. This flag is enabled by default at |
| <samp>-O3</samp> and when <samp>-ftree-vectorize</samp> is enabled. |
| </p> |
| </dd> |
| <dt><code>-fvect-cost-model=<var>model</var></code></dt> |
| <dd><a name="index-fvect_002dcost_002dmodel"></a> |
| <p>Alter the cost model used for vectorization. The <var>model</var> argument |
| should be one of ‘<samp>unlimited</samp>’, ‘<samp>dynamic</samp>’ or ‘<samp>cheap</samp>’. |
| With the ‘<samp>unlimited</samp>’ model the vectorized code-path is assumed |
| to be profitable while with the ‘<samp>dynamic</samp>’ model a runtime check |
| guards the vectorized code-path to enable it only for iteration |
| counts that will likely execute faster than when executing the original |
| scalar loop. The ‘<samp>cheap</samp>’ model disables vectorization of |
| loops where doing so would be cost prohibitive, for example due to |
| required runtime checks for data dependence or alignment but otherwise |
| is equal to the ‘<samp>dynamic</samp>’ model. |
| The default cost model depends on other optimization flags and is |
| either ‘<samp>dynamic</samp>’ or ‘<samp>cheap</samp>’. |
| </p> |
| </dd> |
| <dt><code>-fsimd-cost-model=<var>model</var></code></dt> |
| <dd><a name="index-fsimd_002dcost_002dmodel"></a> |
| <p>Alter the cost model used for vectorization of loops marked with the OpenMP |
| or Cilk Plus simd directive. The <var>model</var> argument should be one of |
| ‘<samp>unlimited</samp>’, ‘<samp>dynamic</samp>’, ‘<samp>cheap</samp>’. All values of <var>model</var> |
| have the same meaning as described in <samp>-fvect-cost-model</samp> and by |
| default a cost model defined with <samp>-fvect-cost-model</samp> is used. |
| </p> |
| </dd> |
| <dt><code>-ftree-vrp</code></dt> |
| <dd><a name="index-ftree_002dvrp"></a> |
| <p>Perform Value Range Propagation on trees. This is similar to the |
| constant propagation pass, but instead of values, ranges of values are |
| propagated. This allows the optimizers to remove unnecessary range |
| checks like array bound checks and null pointer checks. This is |
| enabled by default at <samp>-O2</samp> and higher. Null pointer check |
| elimination is only done if <samp>-fdelete-null-pointer-checks</samp> is |
| enabled. |
| </p> |
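| <p>A small sketch of a range check that becomes redundant once value ranges |
| are known. The function <code>pick</code> is hypothetical. After the first test, |
| <code>i</code> is known to lie in the range 0 to 7, so the second test can never |
| succeed and is a candidate for removal. |
| </p> |

```c
int pick(const int *table, unsigned i)
{
    if (i >= 8)
        return -1;      /* past this point, i is in [0, 7] */
    if (i > 100)
        return -2;      /* never true given the range above */
    return table[i];
}
```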
| </dd> |
| <dt><code>-fsplit-ivs-in-unroller</code></dt> |
| <dd><a name="index-fsplit_002divs_002din_002dunroller"></a> |
| <p>Enables expression of values of induction variables in later iterations |
| of the unrolled loop using the value in the first iteration. This breaks |
| long dependency chains, thus improving efficiency of the scheduling passes. |
| </p> |
| <p>A combination of <samp>-fweb</samp> and CSE is often sufficient to obtain the |
| same effect. However, that is not reliable in cases where the loop body |
| is more complicated than a single basic block. It also does not work at all |
| on some architectures due to restrictions in the CSE pass. |
| </p> |
| <p>This optimization is enabled by default. |
| </p> |
| </dd> |
| <dt><code>-fvariable-expansion-in-unroller</code></dt> |
| <dd><a name="index-fvariable_002dexpansion_002din_002dunroller"></a> |
| <p>With this option, the compiler creates multiple copies of some |
| local variables when unrolling a loop, which can result in superior code. |
| </p> |
| </dd> |
| <dt><code>-fpartial-inlining</code></dt> |
| <dd><a name="index-fpartial_002dinlining"></a> |
| <p>Inline parts of functions. This option has an effect only |
| when inlining itself is turned on by the <samp>-finline-functions</samp> |
| or <samp>-finline-small-functions</samp> options. |
| </p> |
| <p>Enabled at level <samp>-O2</samp>. |
| </p> |
| </dd> |
| <dt><code>-fpredictive-commoning</code></dt> |
| <dd><a name="index-fpredictive_002dcommoning"></a> |
| <p>Perform predictive commoning optimization, i.e., reusing computations |
| (especially memory loads and stores) performed in previous |
| iterations of loops. |
| </p> |
| <p>This option is enabled at level <samp>-O3</samp>. |
| </p> |
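| <p>The idea of reusing a load from the previous iteration can be sketched by |
| hand as follows. The function names are hypothetical, the second function |
| is not GCC output, and the sketch assumes that <code>a</code> and <code>b</code> do not |
| overlap. |
| </p> |

```c
/* Original form: a[i + 1] loaded here is loaded again as a[i] in
   the next iteration. */
void pairwise_sum(const double *a, double *b, int n)
{
    for (int i = 0; i < n - 1; i++)
        b[i] = a[i] + a[i + 1];
}

/* Hand-written form with the reuse made explicit: each element of a
   is loaded once and carried over in `prev`. */
void pairwise_sum_commoned(const double *a, double *b, int n)
{
    if (n < 2)
        return;
    double prev = a[0];
    for (int i = 0; i < n - 1; i++) {
        double next = a[i + 1];
        b[i] = prev + next;
        prev = next;
    }
}
```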
| </dd> |
| <dt><code>-fprefetch-loop-arrays</code></dt> |
| <dd><a name="index-fprefetch_002dloop_002darrays"></a> |
| <p>If supported by the target machine, generate instructions to prefetch |
| memory to improve the performance of loops that access large arrays. |
| </p> |
| <p>This option may generate better or worse code; results are highly |
| dependent on the structure of loops within the source code. |
| </p> |
| <p>Disabled at level <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fno-peephole</code></dt> |
| <dt><code>-fno-peephole2</code></dt> |
| <dd><a name="index-fno_002dpeephole"></a> |
| <a name="index-fno_002dpeephole2"></a> |
| <p>Disable any machine-specific peephole optimizations. The difference |
| between <samp>-fno-peephole</samp> and <samp>-fno-peephole2</samp> is in how they |
| are implemented in the compiler; some targets use one, some use the |
| other, a few use both. |
| </p> |
| <p><samp>-fpeephole</samp> is enabled by default. |
| <samp>-fpeephole2</samp> is enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fno-guess-branch-probability</code></dt> |
| <dd><a name="index-fno_002dguess_002dbranch_002dprobability"></a> |
| <p>Do not guess branch probabilities using heuristics. |
| </p> |
| <p>GCC uses heuristics to guess branch probabilities if they are |
| not provided by profiling feedback (<samp>-fprofile-arcs</samp>). These |
| heuristics are based on the control flow graph. If some branch probabilities |
| are specified by <code>__builtin_expect</code>, then the heuristics are |
| used to guess branch probabilities for the rest of the control flow graph, |
| taking the <code>__builtin_expect</code> info into account. The interactions |
| between the heuristics and <code>__builtin_expect</code> can be complex, and in |
| some cases, it may be useful to disable the heuristics so that the effects |
| of <code>__builtin_expect</code> are easier to understand. |
| </p> |
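| <p>For reference, a minimal use of <code>__builtin_expect</code> might look like |
| the sketch below. The function <code>checked_div</code> is hypothetical. The hint |
| marks the error path as unlikely and does not change the result of the |
| function. |
| </p> |

```c
/* Returns 0 and stores a / b in *out, or returns -1 if b is zero.
   The second argument of __builtin_expect is the expected value of
   the first, so the test is hinted to be false most of the time. */
int checked_div(int a, int b, int *out)
{
    if (__builtin_expect(b == 0, 0))
        return -1;
    *out = a / b;
    return 0;
}
```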
| <p>The default is <samp>-fguess-branch-probability</samp> at levels |
| <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-freorder-blocks</code></dt> |
| <dd><a name="index-freorder_002dblocks"></a> |
| <p>Reorder basic blocks in the compiled function in order to reduce the number of |
| taken branches and improve code locality. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-freorder-blocks-and-partition</code></dt> |
| <dd><a name="index-freorder_002dblocks_002dand_002dpartition"></a> |
| <p>In addition to reordering basic blocks in the compiled function in order |
| to reduce the number of taken branches, partition hot and cold basic blocks |
| into separate sections of the assembly and .o files, to improve |
| paging and cache locality performance. |
| </p> |
| <p>This optimization is automatically turned off in the presence of |
| exception handling, for linkonce sections, for functions with a user-defined |
| section attribute and on any architecture that does not support named |
| sections. |
| </p> |
| <p>Enabled for x86 at levels <samp>-O2</samp>, <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-freorder-functions</code></dt> |
| <dd><a name="index-freorder_002dfunctions"></a> |
| <p>Reorder functions in the object file in order to |
| improve code locality. This is implemented by using special |
| subsections <code>.text.hot</code> for the most frequently executed functions and |
| <code>.text.unlikely</code> for functions that are unlikely to be executed. Reordering is done by |
| the linker, so the object file format must support named sections and the linker must |
| place them in a reasonable way. |
| </p> |
| <p>Also profile feedback must be available to make this option effective. See |
| <samp>-fprofile-arcs</samp> for details. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fstrict-aliasing</code></dt> |
| <dd><a name="index-fstrict_002daliasing"></a> |
| <p>Allow the compiler to assume the strictest aliasing rules applicable to |
| the language being compiled. For C (and C++), this activates |
| optimizations based on the type of expressions. In particular, an |
| object of one type is assumed never to reside at the same address as an |
| object of a different type, unless the types are almost the same. For |
| example, an <code>unsigned int</code> can alias an <code>int</code>, but not a |
| <code>void*</code> or a <code>double</code>. A character type may alias any other |
| type. |
| </p> |
| <a name="Type_002dpunning"></a><p>Pay special attention to code like this: |
| </p><div class="smallexample"> |
| <pre class="smallexample">union a_union { |
| int i; |
| double d; |
| }; |
| |
| int f() { |
| union a_union t; |
| t.d = 3.0; |
| return t.i; |
| } |
| </pre></div> |
| <p>The practice of reading from a different union member than the one most |
| recently written to (called “type-punning”) is common. Even with |
| <samp>-fstrict-aliasing</samp>, type-punning is allowed, provided the memory |
| is accessed through the union type. So, the code above works as |
| expected. See <a href="Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation">Structures unions enumerations and bit-fields implementation</a>. However, this code might not: |
| </p><div class="smallexample"> |
| <pre class="smallexample">int f() { |
| union a_union t; |
| int* ip; |
| t.d = 3.0; |
| ip = &t.i; |
| return *ip; |
| } |
| </pre></div> |
| |
| <p>Similarly, access by taking the address, casting the resulting pointer |
| and dereferencing the result has undefined behavior, even if the cast |
| uses a union type, e.g.: |
| </p><div class="smallexample"> |
| <pre class="smallexample">int f() { |
| double d = 3.0; |
| return ((union a_union *) &d)->i; |
| } |
| </pre></div> |
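| <p>One commonly used alternative that does not depend on pointer casts is to |
| copy the object representation with <code>memcpy</code>. The sketch below uses a |
| hypothetical function <code>double_bits</code> and assumes that <code>double</code> and |
| <code>uint64_t</code> have the same size, which the static assertion checks. |
| </p> |

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "this sketch assumes a 64-bit double");

/* Return the object representation of d as an integer.  The bytes
   are copied, so no object is accessed through an lvalue of an
   incompatible type. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```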
| |
| <p>The <samp>-fstrict-aliasing</samp> option is enabled at levels |
| <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fstrict-overflow</code></dt> |
| <dd><a name="index-fstrict_002doverflow"></a> |
| <p>Allow the compiler to assume strict signed overflow rules, depending |
| on the language being compiled. For C (and C++) this means that |
| overflow when doing arithmetic with signed numbers is undefined, which |
| means that the compiler may assume that it does not happen. This |
| permits various optimizations. For example, the compiler assumes |
| that an expression like <code>i + 10 > i</code> is always true for |
| signed <code>i</code>. This assumption is only valid if signed overflow is |
| undefined, as the expression is false if <code>i + 10</code> overflows when |
| using two's complement arithmetic. When this option is in effect any |
| attempt to determine whether an operation on signed numbers |
| overflows must be written carefully to not actually involve overflow. |
| </p> |
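| <p>One way to write such a test without involving overflow is sketched below. |
| The function <code>add_would_overflow</code> is hypothetical. It compares against |
| the limits from <code>&lt;limits.h&gt;</code> before any addition takes place, so |
| no signed operation in it can overflow. |
| </p> |

```c
#include <limits.h>

/* Return nonzero if a + b would overflow the range of int.
   Only subtractions that cannot overflow are performed. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b is in range for b > 0 */
    return a < INT_MIN - b;       /* INT_MIN - b is in range for b <= 0 */
}
```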
| <p>This option also allows the compiler to assume strict pointer |
| semantics: given a pointer to an object, if adding an offset to that |
| pointer does not produce a pointer to the same object, the addition is |
| undefined. This permits the compiler to conclude that <code>p + u > |
| p</code> is always true for a pointer <code>p</code> and unsigned integer |
| <code>u</code>. This assumption is only valid because pointer wraparound is |
| undefined, as the expression is false if <code>p + u</code> overflows using |
| two's complement arithmetic. |
| </p> |
| <p>See also the <samp>-fwrapv</samp> option. Using <samp>-fwrapv</samp> means |
| that integer signed overflow is fully defined: it wraps. When |
| <samp>-fwrapv</samp> is used, there is no difference between |
| <samp>-fstrict-overflow</samp> and <samp>-fno-strict-overflow</samp> for |
| integers. With <samp>-fwrapv</samp> certain types of overflow are |
| permitted. For example, if the compiler gets an overflow when doing |
| arithmetic on constants, the overflowed value can still be used with |
| <samp>-fwrapv</samp>, but not otherwise. |
| </p> |
| <p>The <samp>-fstrict-overflow</samp> option is enabled at levels |
| <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-falign-functions</code></dt> |
| <dt><code>-falign-functions=<var>n</var></code></dt> |
| <dd><a name="index-falign_002dfunctions"></a> |
| <p>Align the start of functions to the next power-of-two greater than |
| <var>n</var>, skipping up to <var>n</var> bytes. For instance, |
| <samp>-falign-functions=32</samp> aligns functions to the next 32-byte |
| boundary, but <samp>-falign-functions=24</samp> aligns to the next |
| 32-byte boundary only if this can be done by skipping 23 bytes or less. |
| </p> |
| <p><samp>-fno-align-functions</samp> and <samp>-falign-functions=1</samp> are |
| equivalent and mean that functions are not aligned. |
| </p> |
| <p>Some assemblers only support this flag when <var>n</var> is a power of two; |
| in that case, it is rounded up. |
| </p> |
| <p>If <var>n</var> is not specified or is zero, use a machine-dependent default. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-falign-labels</code></dt> |
| <dt><code>-falign-labels=<var>n</var></code></dt> |
| <dd><a name="index-falign_002dlabels"></a> |
| <p>Align all branch targets to a power-of-two boundary, skipping up to |
| <var>n</var> bytes like <samp>-falign-functions</samp>. This option can easily |
| make code slower, because it must insert dummy operations for when the |
| branch target is reached in the usual flow of the code. |
| </p> |
| <p><samp>-fno-align-labels</samp> and <samp>-falign-labels=1</samp> are |
| equivalent and mean that labels are not aligned. |
| </p> |
| <p>If <samp>-falign-loops</samp> or <samp>-falign-jumps</samp> are applicable and |
| are greater than this value, then their values are used instead. |
| </p> |
| <p>If <var>n</var> is not specified or is zero, use a machine-dependent default |
| which is very likely to be ‘<samp>1</samp>’, meaning no alignment. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-falign-loops</code></dt> |
| <dt><code>-falign-loops=<var>n</var></code></dt> |
| <dd><a name="index-falign_002dloops"></a> |
| <p>Align loops to a power-of-two boundary, skipping up to <var>n</var> bytes |
| like <samp>-falign-functions</samp>. If the loops are |
| executed many times, this makes up for any execution of the dummy |
| operations. |
| </p> |
| <p><samp>-fno-align-loops</samp> and <samp>-falign-loops=1</samp> are |
| equivalent and mean that loops are not aligned. |
| </p> |
| <p>If <var>n</var> is not specified or is zero, use a machine-dependent default. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-falign-jumps</code></dt> |
| <dt><code>-falign-jumps=<var>n</var></code></dt> |
| <dd><a name="index-falign_002djumps"></a> |
| <p>Align branch targets to a power-of-two boundary, for branch targets |
| where the targets can only be reached by jumping, skipping up to <var>n</var> |
| bytes like <samp>-falign-functions</samp>. In this case, no dummy operations |
| need be executed. |
| </p> |
| <p><samp>-fno-align-jumps</samp> and <samp>-falign-jumps=1</samp> are |
| equivalent and mean that jump targets are not aligned. |
| </p> |
| <p>If <var>n</var> is not specified or is zero, use a machine-dependent default. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>-funit-at-a-time</code></dt> |
| <dd><a name="index-funit_002dat_002da_002dtime"></a> |
| <p>This option is left for compatibility reasons. <samp>-funit-at-a-time</samp> |
| has no effect, while <samp>-fno-unit-at-a-time</samp> implies |
| <samp>-fno-toplevel-reorder</samp> and <samp>-fno-section-anchors</samp>. |
| </p> |
| <p>Enabled by default. |
| </p> |
| </dd> |
| <dt><code>-fno-toplevel-reorder</code></dt> |
| <dd><a name="index-fno_002dtoplevel_002dreorder"></a> |
| <p>Do not reorder top-level functions, variables, and <code>asm</code> |
| statements. Output them in the same order that they appear in the |
| input file. When this option is used, unreferenced static variables |
| are not removed. This option is intended to support existing code |
| that relies on a particular ordering. For new code, it is better to |
| use attributes when possible. |
| </p> |
| <p>Enabled at level <samp>-O0</samp>. When disabled explicitly, it also implies |
| <samp>-fno-section-anchors</samp>, which is otherwise enabled at <samp>-O0</samp> on some |
| targets. |
| </p> |
| </dd> |
| <dt><code>-fweb</code></dt> |
| <dd><a name="index-fweb"></a> |
| <p>Construct webs, as commonly used for register allocation purposes, and assign |
| each web an individual pseudo register. This allows the register allocation pass |
| to operate on pseudos directly, but also strengthens several other optimization |
| passes, such as CSE, the loop optimizer and the trivial dead code remover. It can, |
| however, make debugging impossible, since variables no longer stay in a |
| “home register”. |
| </p> |
| <p>Enabled by default with <samp>-funroll-loops</samp>. |
| </p> |
| </dd> |
| <dt><code>-fwhole-program</code></dt> |
| <dd><a name="index-fwhole_002dprogram"></a> |
| <p>Assume that the current compilation unit represents the whole program being |
| compiled. All public functions and variables with the exception of <code>main</code> |
| and those marked with attribute <code>externally_visible</code> become static |
| and, as a result, are optimized more aggressively by interprocedural optimizers. |
| </p> |
| <p>This option should not be used in combination with <samp>-flto</samp>. |
| Relying on a linker plugin instead should provide safer and more precise |
| information. |
| </p> |
| </dd> |
| <dt><code>-flto[=<var>n</var>]</code></dt> |
| <dd><a name="index-flto"></a> |
| <p>This option runs the standard link-time optimizer. When invoked |
| with source code, it generates GIMPLE (one of GCC’s internal |
| representations) and writes it to special ELF sections in the object |
| file. When the object files are linked together, all the function |
| bodies are read from these ELF sections and instantiated as if they |
| had been part of the same translation unit. |
| </p> |
| <p>To use the link-time optimizer, <samp>-flto</samp> and optimization |
| options should be specified at compile time and during the final link. |
| For example: |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">gcc -c -O2 -flto foo.c |
| gcc -c -O2 -flto bar.c |
| gcc -o myprog -flto -O2 foo.o bar.o |
| </pre></div> |
| |
| <p>The first two invocations to GCC save a bytecode representation |
| of GIMPLE into special ELF sections inside <samp>foo.o</samp> and |
| <samp>bar.o</samp>. The final invocation reads the GIMPLE bytecode from |
| <samp>foo.o</samp> and <samp>bar.o</samp>, merges the two files into a single |
| internal image, and compiles the result as usual. Since both |
| <samp>foo.o</samp> and <samp>bar.o</samp> are merged into a single image, this |
| causes all the interprocedural analyses and optimizations in GCC to |
| work across the two files as if they were a single one. This means, |
| for example, that the inliner is able to inline functions in |
| <samp>bar.o</samp> into functions in <samp>foo.o</samp> and vice-versa. |
| </p> |
| <p>Another (simpler) way to enable link-time optimization is: |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">gcc -o myprog -flto -O2 foo.c bar.c |
| </pre></div> |
| |
| <p>The above generates bytecode for <samp>foo.c</samp> and <samp>bar.c</samp>, |
| merges them together into a single GIMPLE representation and optimizes |
| them as usual to produce <samp>myprog</samp>. |
| </p> |
| <p>The only important thing to keep in mind is that to enable link-time |
| optimizations you need to use the GCC driver to perform the link-step. |
| GCC then automatically performs link-time optimization if any of the |
| objects involved were compiled with the <samp>-flto</samp> command-line option. |
| You should generally specify the optimization options to be used for |
| link-time optimization, although GCC tries to guess an optimization |
| level from the options used at compile time if you do not specify |
| one at link time. You can always override |
| the automatic decision to do link-time optimization at link time |
| by passing <samp>-fno-lto</samp> to the link command. |
| </p> |
| <p>To make whole program optimization effective, it is necessary to make |
| certain whole program assumptions. The compiler needs to know |
| what functions and variables can be accessed by libraries and runtime |
| outside of the link-time optimized unit. When supported by the linker, |
| the linker plugin (see <samp>-fuse-linker-plugin</samp>) passes information |
| to the compiler about used and externally visible symbols. When |
| the linker plugin is not available, <samp>-fwhole-program</samp> should be |
| used to allow the compiler to make these assumptions, which leads |
| to more aggressive optimization decisions. |
| </p> |
| <p>When <samp>-fuse-linker-plugin</samp> is not enabled, a file |
| compiled with <samp>-flto</samp> produces an object file that is larger than |
| a regular object file because it contains GIMPLE bytecodes and the usual |
| final code (see <samp>-ffat-lto-objects</samp>). This means that |
| object files with LTO information can be linked as normal object |
| files; if <samp>-fno-lto</samp> is passed to the linker, no |
| interprocedural optimizations are applied. Note that when |
| <samp>-fno-fat-lto-objects</samp> is enabled the compile stage is faster, |
| but you cannot perform a regular, non-LTO link on the resulting object files. |
| </p> |
| <p>Additionally, the optimization flags used to compile individual files |
| are not necessarily related to those used at link time. For instance, |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">gcc -c -O0 -ffat-lto-objects -flto foo.c |
| gcc -c -O0 -ffat-lto-objects -flto bar.c |
| gcc -o myprog -O3 foo.o bar.o |
| </pre></div> |
| |
| <p>This produces individual object files with unoptimized assembler |
| code, but the resulting binary <samp>myprog</samp> is optimized at |
| <samp>-O3</samp>. If, instead, the final binary is generated with |
| <samp>-fno-lto</samp>, then <samp>myprog</samp> is not optimized. |
| </p> |
| <p>When producing the final binary, GCC only |
| applies link-time optimizations to those files that contain bytecode. |
| Therefore, you can mix and match object files and libraries with |
| GIMPLE bytecodes and final object code. GCC automatically selects |
| which files to optimize in LTO mode and which files to link without |
| further processing. |
| </p> |
| <p>There are some code generation flags preserved by GCC when |
| generating bytecodes, as they need to be used during the final link |
| stage. Generally options specified at link-time override those |
| specified at compile-time. |
| </p> |
| <p>If you do not specify an optimization level option <samp>-O</samp> at |
| link-time then GCC computes one based on the optimization levels |
| used when compiling the object files. The highest optimization |
| level wins here. |
| </p> |
| <p>Currently, the following options and their settings are taken from |
| the first object file that explicitly specified them: |
| <samp>-fPIC</samp>, <samp>-fpic</samp>, <samp>-fpie</samp>, <samp>-fcommon</samp>, |
| <samp>-fexceptions</samp>, <samp>-fnon-call-exceptions</samp>, <samp>-fgnu-tm</samp> |
| and all the <samp>-m</samp> target flags. |
| </p> |
| <p>Certain ABI changing flags are required to match in all compilation-units |
| and trying to override this at link-time with a conflicting value |
| is ignored. This includes options such as <samp>-freg-struct-return</samp> |
| and <samp>-fpcc-struct-return</samp>. |
| </p> |
| <p>Other options such as <samp>-ffp-contract</samp>, <samp>-fno-strict-overflow</samp>, |
| <samp>-fwrapv</samp>, <samp>-fno-trapv</samp> or <samp>-fno-strict-aliasing</samp> |
| are passed through to the link stage and merged conservatively for |
| conflicting translation units. Specifically |
| <samp>-fno-strict-overflow</samp>, <samp>-fwrapv</samp> and <samp>-fno-trapv</samp> take |
| precedence and for example <samp>-ffp-contract=off</samp> takes precedence |
| over <samp>-ffp-contract=fast</samp>. You can override them at link time. |
| </p> |
| <p>It is recommended that you compile all the files participating in the |
| same link with the same options and also specify those options at |
| link time. |
| </p> |
| <p>If LTO encounters objects with C linkage declared with incompatible |
| types in separate translation units to be linked together (undefined |
| behavior according to ISO C99 6.2.7), a non-fatal diagnostic may be |
| issued. The behavior is still undefined at run time. Similar |
| diagnostics may be raised for other languages. |
| </p> |
| <p>Another feature of LTO is that it is possible to apply interprocedural |
| optimizations on files written in different languages: |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">gcc -c -flto foo.c |
| g++ -c -flto bar.cc |
| gfortran -c -flto baz.f90 |
| g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran |
| </pre></div> |
| |
| <p>Notice that the final link is done with <code>g++</code> to get the C++ |
| runtime libraries and <samp>-lgfortran</samp> is added to get the Fortran |
| runtime libraries. In general, when mixing languages in LTO mode, you |
| should use the same link command options as when mixing languages in a |
| regular (non-LTO) compilation. |
| </p> |
| <p>If object files containing GIMPLE bytecode are stored in a library archive, say |
| <samp>libfoo.a</samp>, it is possible to extract and use them in an LTO link if you |
| are using a linker with plugin support. To create static libraries suitable |
| for LTO, use <code>gcc-ar</code> and <code>gcc-ranlib</code> instead of <code>ar</code> |
| and <code>ranlib</code>; |
| to show the symbols of object files with GIMPLE bytecode, use |
| <code>gcc-nm</code>. Those commands require that <code>ar</code>, <code>ranlib</code> |
| and <code>nm</code> have been compiled with plugin support. At link time, use the |
| flag <samp>-fuse-linker-plugin</samp> to ensure that the library participates in |
| the LTO optimization process: |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">gcc -o myprog -O2 -flto -fuse-linker-plugin a.o b.o -lfoo |
| </pre></div> |
| |
| <p>With the linker plugin enabled, the linker extracts the needed |
| GIMPLE files from <samp>libfoo.a</samp> and passes them on to the running GCC |
| to make them part of the aggregated GIMPLE image to be optimized. |
| </p> |
| <p>If you are not using a linker with plugin support and/or do not |
| enable the linker plugin, then the objects inside <samp>libfoo.a</samp> |
| are extracted and linked as usual, but they do not participate |
| in the LTO optimization process. In order to make a static library suitable |
| for both LTO optimization and usual linkage, compile its object files with |
| <samp>-flto</samp> <samp>-ffat-lto-objects</samp>. |
| </p> |
| <p>Link-time optimizations do not require the presence of the whole program to |
| operate. If the program does not require any symbols to be exported, it is |
| possible to combine <samp>-flto</samp> and <samp>-fwhole-program</samp> to allow |
| the interprocedural optimizers to use more aggressive assumptions which may |
| lead to improved optimization opportunities. |
| Use of <samp>-fwhole-program</samp> is not needed when the linker plugin is |
| active (see <samp>-fuse-linker-plugin</samp>). |
| </p> |
| <p>The current implementation of LTO makes no |
| attempt to generate bytecode that is portable between different |
| types of hosts. The bytecode files are versioned and there is a |
| strict version check, so bytecode files generated in one version of |
| GCC do not work with an older or newer version of GCC. |
| </p> |
| <p>Link-time optimization does not work well with generation of debugging |
| information. Combining <samp>-flto</samp> with |
| <samp>-g</samp> is currently experimental and may produce unexpected |
| results. |
| </p> |
| <p>If you specify the optional <var>n</var>, the optimization and code |
| generation done at link time is executed in parallel using <var>n</var> |
| parallel jobs by utilizing an installed <code>make</code> program. The |
| environment variable <code>MAKE</code> may be used to override the program |
| used. The default value for <var>n</var> is 1. |
| </p> |
| <p>You can also specify <samp>-flto=jobserver</samp> to use GNU make’s |
| job server mode to determine the number of parallel jobs. This |
| is useful when the Makefile calling GCC is already executing in parallel. |
| You must prepend a ‘<samp>+</samp>’ to the command recipe in the parent Makefile |
| for this to work. This option likely only works if <code>MAKE</code> is |
| GNU make. |
| </p> |
| </dd> |
| <dt><code>-flto-partition=<var>alg</var></code></dt> |
| <dd><a name="index-flto_002dpartition"></a> |
| <p>Specify the partitioning algorithm used by the link-time optimizer. |
| The value is either ‘<samp>1to1</samp>’ to specify a partitioning mirroring |
| the original source files, ‘<samp>balanced</samp>’ to specify partitioning |
| into equally sized chunks (whenever possible), or ‘<samp>max</samp>’ to create |
| a new partition for every symbol where possible. |
| The value ‘<samp>one</samp>’ specifies that exactly one partition should be |
| used, while the value ‘<samp>none</samp>’ disables partitioning and streaming |
| completely and executes the link-time optimization step directly from the |
| WPA phase. |
| The default value is ‘<samp>balanced</samp>’. While ‘<samp>1to1</samp>’ can be used |
| as a workaround for various code ordering issues, the ‘<samp>max</samp>’ |
| partitioning is intended for internal testing only. |
| </p> |
| </dd> |
| <dt><code>-flto-odr-type-merging</code></dt> |
| <dd><a name="index-flto_002dodr_002dtype_002dmerging"></a> |
| <p>Enable streaming of mangled type names of C++ types and their unification |
| at link time. This increases the size of LTO object files, but enables |
| diagnostics about One Definition Rule violations. |
| </p> |
| </dd> |
| <dt><code>-flto-compression-level=<var>n</var></code></dt> |
| <dd><a name="index-flto_002dcompression_002dlevel"></a> |
| <p>This option specifies the level of compression used for intermediate |
| language written to LTO object files, and is only meaningful in |
| conjunction with LTO mode (<samp>-flto</samp>). Valid |
| values are 0 (no compression) to 9 (maximum compression). Values |
| outside this range are clamped to either 0 or 9. If the option is not |
| given, a default balanced compression setting is used. |
| </p> |
| </dd> |
| <dt><code>-flto-report</code></dt> |
| <dd><a name="index-flto_002dreport"></a> |
| <p>Prints a report with internal details on the workings of the link-time |
| optimizer. The contents of this report vary from version to version. |
| It is meant to be useful to GCC developers when processing object |
| files in LTO mode (via <samp>-flto</samp>). |
| </p> |
| <p>Disabled by default. |
| </p> |
| </dd> |
| <dt><code>-flto-report-wpa</code></dt> |
| <dd><a name="index-flto_002dreport_002dwpa"></a> |
| <p>Like <samp>-flto-report</samp>, but only print for the WPA phase of Link |
| Time Optimization. |
| </p> |
| </dd> |
| <dt><code>-fuse-linker-plugin</code></dt> |
| <dd><a name="index-fuse_002dlinker_002dplugin"></a> |
| <p>Enables the use of a linker plugin during link-time optimization. This |
| option relies on plugin support in the linker, which is available in gold |
| or in GNU ld 2.21 or newer. |
| </p> |
| <p>This option enables the extraction of object files with GIMPLE bytecode out |
| of library archives. This improves the quality of optimization by exposing |
| more code to the link-time optimizer. The plugin also tells the compiler which |
| symbols can be accessed externally (by non-LTO objects or during dynamic |
| linking). Resulting code quality improvements on binaries (and shared |
| libraries that use hidden visibility) are similar to <samp>-fwhole-program</samp>. |
| See <samp>-flto</samp> for a description of the effect of this flag and how to |
| use it. |
| </p> |
| <p>This option is enabled by default when LTO support in GCC is enabled |
| and GCC was configured for use with |
| a linker supporting plugins (GNU ld 2.21 or newer or gold). |
| </p> |
| </dd> |
| <dt><code>-ffat-lto-objects</code></dt> |
| <dd><a name="index-ffat_002dlto_002dobjects"></a> |
| <p>Fat LTO objects are object files that contain both the intermediate language |
| and the object code. This makes them usable for both LTO linking and normal |
| linking. This option is effective only when compiling with <samp>-flto</samp> |
| and is ignored at link time. |
| </p> |
| <p><samp>-fno-fat-lto-objects</samp> improves compilation time over plain LTO, but |
| requires the complete toolchain to be aware of LTO. It requires a linker with |
| linker plugin support for basic functionality. Additionally, |
| <code>nm</code>, <code>ar</code> and <code>ranlib</code> |
| need to support linker plugins to allow a full-featured build environment |
| (capable of building static libraries etc). GCC provides the <code>gcc-ar</code>, |
| <code>gcc-nm</code>, <code>gcc-ranlib</code> wrappers to pass the right options |
| to these tools. With non-fat LTO, makefiles need to be modified to use them. |
| </p> |
| <p>The default is <samp>-fno-fat-lto-objects</samp> on targets with linker plugin |
| support. |
| </p> |
| </dd> |
| <dt><code>-fcompare-elim</code></dt> |
| <dd><a name="index-fcompare_002delim"></a> |
| <p>After register allocation and post-register allocation instruction splitting, |
| identify arithmetic instructions that compute processor flags similar to a |
| comparison operation based on that arithmetic. If possible, eliminate the |
| explicit comparison operation. |
| </p> |
| <p>This pass only applies to certain targets that cannot explicitly represent |
| the comparison operation before register allocation is complete. |
| </p> |
| <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fcprop-registers</code></dt> |
| <dd><a name="index-fcprop_002dregisters"></a> |
| <p>After register allocation and post-register allocation instruction splitting, |
| perform a copy-propagation pass to try to reduce scheduling dependencies |
| and occasionally eliminate the copy. |
| </p> |
| <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-fprofile-correction</code></dt> |
| <dd><a name="index-fprofile_002dcorrection"></a> |
| <p>Profiles collected using an instrumented binary for multi-threaded programs may |
| be inconsistent due to missed counter updates. When this option is specified, |
| GCC uses heuristics to correct or smooth out such inconsistencies. By |
| default, GCC emits an error message when an inconsistent profile is detected. |
| </p> |
| </dd> |
| <dt><code>-fprofile-dir=<var>path</var></code></dt> |
| <dd><a name="index-fprofile_002ddir"></a> |
| |
| <p>Set the directory in which to search for the profile data files to <var>path</var>. |
| This option affects only the profile data generated by |
| <samp>-fprofile-generate</samp>, <samp>-ftest-coverage</samp>, <samp>-fprofile-arcs</samp> |
| and used by <samp>-fprofile-use</samp> and <samp>-fbranch-probabilities</samp> |
| and its related options. Both absolute and relative paths can be used. |
| By default, GCC uses the current directory as <var>path</var>, thus the |
| profile data file appears in the same directory as the object file. |
| </p> |
| </dd> |
| <dt><code>-fprofile-generate</code></dt> |
| <dt><code>-fprofile-generate=<var>path</var></code></dt> |
| <dd><a name="index-fprofile_002dgenerate"></a> |
| |
| <p>Enable options usually used for instrumenting an application to produce |
| a profile useful for later recompilation with profile feedback-based |
| optimization. You must use <samp>-fprofile-generate</samp> both when |
| compiling and when linking your program. |
| </p> |
| <p>The following options are enabled: <samp>-fprofile-arcs</samp>, <samp>-fprofile-values</samp>, <samp>-fvpt</samp>. |
| </p> |
| <p>If <var>path</var> is specified, GCC looks at the <var>path</var> to find |
| the profile feedback data files. See <samp>-fprofile-dir</samp>. |
| </p> |
| </dd> |
| <dt><code>-fprofile-use</code></dt> |
| <dt><code>-fprofile-use=<var>path</var></code></dt> |
| <dd><a name="index-fprofile_002duse"></a> |
| <p>Enable profile feedback-directed optimizations, |
| and the following optimizations |
| which are generally profitable only with profile feedback available: |
| <samp>-fbranch-probabilities</samp>, <samp>-fvpt</samp>, |
| <samp>-funroll-loops</samp>, <samp>-fpeel-loops</samp>, <samp>-ftracer</samp>, |
| <samp>-ftree-vectorize</samp>, and <samp>-ftree-loop-distribute-patterns</samp>. |
| </p> |
| <p>By default, GCC emits an error message if the feedback profiles do not |
| match the source code. This error can be turned into a warning by using |
| <samp>-Wcoverage-mismatch</samp>. Note this may result in poorly optimized |
| code. |
| </p> |
| <p>If <var>path</var> is specified, GCC looks at the <var>path</var> to find |
| the profile feedback data files. See <samp>-fprofile-dir</samp>. |
| </p> |
| </dd> |
| <dt><code>-fauto-profile</code></dt> |
| <dt><code>-fauto-profile=<var>path</var></code></dt> |
| <dd><a name="index-fauto_002dprofile"></a> |
| <p>Enable sampling-based feedback-directed optimizations, |
| and the following optimizations |
| which are generally profitable only with profile feedback available: |
| <samp>-fbranch-probabilities</samp>, <samp>-fvpt</samp>, |
| <samp>-funroll-loops</samp>, <samp>-fpeel-loops</samp>, <samp>-ftracer</samp>, |
| <samp>-ftree-vectorize</samp>, |
| <samp>-finline-functions</samp>, <samp>-fipa-cp</samp>, <samp>-fipa-cp-clone</samp>, |
| <samp>-fpredictive-commoning</samp>, <samp>-funswitch-loops</samp>, |
| <samp>-fgcse-after-reload</samp>, and <samp>-ftree-loop-distribute-patterns</samp>. |
| </p> |
| <p><var>path</var> is the name of a file containing AutoFDO profile information. |
| If omitted, it defaults to <samp>fbdata.afdo</samp> in the current directory. |
| </p> |
| <p>Producing an AutoFDO profile data file requires running your program |
| with the <code>perf</code> utility on a supported GNU/Linux target system. |
| For more information, see <a href="https://perf.wiki.kernel.org/">https://perf.wiki.kernel.org/</a>. |
| </p> |
| <p>For example: |
| </p><div class="smallexample"> |
| <pre class="smallexample">perf record -e br_inst_retired:near_taken -b -o perf.data \ |
| -- your_program |
| </pre></div> |
| |
| <p>Then use the <code>create_gcov</code> tool to convert the raw profile data |
| to a format that can be used by GCC. You must also supply the |
| unstripped binary for your program to this tool. |
| See <a href="https://github.com/google/autofdo">https://github.com/google/autofdo</a>. |
| </p> |
| <p>For example: |
| </p><div class="smallexample"> |
| <pre class="smallexample">create_gcov --binary=your_program.unstripped --profile=perf.data \ |
| --gcov=profile.afdo |
| </pre></div> |
| </dd> |
| </dl> |
| |
| <p>The following options control compiler behavior regarding floating-point |
| arithmetic. These options trade off between speed and |
| correctness. All must be specifically enabled. |
| </p> |
| <dl compact="compact"> |
| <dt><code>-ffloat-store</code></dt> |
| <dd><a name="index-ffloat_002dstore"></a> |
| <p>Do not store floating-point variables in registers, and inhibit other |
| options that might change whether a floating-point value is taken from a |
| register or memory. |
| </p> |
| <a name="index-floating_002dpoint-precision"></a> |
| <p>This option prevents undesirable excess precision on machines such as |
| the 68000 where the floating registers (of the 68881) keep more |
| precision than a <code>double</code> is supposed to have. Similarly for the |
| x86 architecture. For most programs, the excess precision does only |
| good, but a few programs rely on the precise definition of IEEE floating |
| point. Use <samp>-ffloat-store</samp> for such programs, after modifying |
| them to store all pertinent intermediate computations into variables. |
| </p> |
| </dd> |
| <dt><code>-fexcess-precision=<var>style</var></code></dt> |
| <dd><a name="index-fexcess_002dprecision"></a> |
| <p>This option allows further control over excess precision on machines |
| where floating-point registers have more precision than the IEEE |
| <code>float</code> and <code>double</code> types and the processor does not |
| support operations rounding to those types. By default, |
| <samp>-fexcess-precision=fast</samp> is in effect; this means that |
| operations are carried out in the precision of the registers and that |
| it is unpredictable when rounding to the types specified in the source |
| code takes place. When compiling C, if |
| <samp>-fexcess-precision=standard</samp> is specified then excess |
| precision follows the rules specified in ISO C99; in particular, |
| both casts and assignments cause values to be rounded to their |
| semantic types (whereas <samp>-ffloat-store</samp> only affects |
| assignments). This option is enabled by default for C if a strict |
| conformance option such as <samp>-std=c99</samp> is used. |
| </p> |
| <a name="index-mfpmath"></a> |
| <p><samp>-fexcess-precision=standard</samp> is not implemented for languages |
| other than C, and has no effect if |
| <samp>-funsafe-math-optimizations</samp> or <samp>-ffast-math</samp> is |
| specified. On the x86, it also has no effect if <samp>-mfpmath=sse</samp> |
| or <samp>-mfpmath=sse+387</samp> is specified; in the former case, IEEE |
| semantics apply without excess precision, and in the latter, rounding |
| is unpredictable. |
| </p> |
| </dd> |
| <dt><code>-ffast-math</code></dt> |
| <dd><a name="index-ffast_002dmath"></a> |
| <p>Sets the options <samp>-fno-math-errno</samp>, <samp>-funsafe-math-optimizations</samp>, |
| <samp>-ffinite-math-only</samp>, <samp>-fno-rounding-math</samp>, |
| <samp>-fno-signaling-nans</samp> and <samp>-fcx-limited-range</samp>. |
| </p> |
| <p>This option causes the preprocessor macro <code>__FAST_MATH__</code> to be defined. |
| </p> |
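| <p>One way to observe the macro, sketched here under the assumption of a POSIX |
| shell with <code>grep</code> available, is to dump the predefined macros for |
| an empty input with and without the option. The variable names are |
| illustrative only. |
| </p> |

```shell
# -dM -E prints the macros defined after preprocessing; the input is empty.
# grep -c counts the lines that mention __FAST_MATH__.
fast_macro=$(echo | gcc -ffast-math -dM -E -x c - | grep -c '__FAST_MATH__')
plain_macro=$(echo | gcc -dM -E -x c - | grep -c '__FAST_MATH__')

echo "definitions with -ffast-math: $fast_macro, without: $plain_macro"
```

| <p>Source code can test <code>__FAST_MATH__</code> with <code>#ifdef</code> to |
| select code paths that do not depend on strict IEEE behavior. |
| </p> |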
| <p>This option is not turned on by any <samp>-O</samp> option besides |
| <samp>-Ofast</samp> since it can result in incorrect output for programs |
| that depend on an exact implementation of IEEE or ISO rules/specifications |
| for math functions. It may, however, yield faster code for programs |
| that do not require the guarantees of these specifications. |
| </p> |
| </dd> |
| <dt><code>-fno-math-errno</code></dt> |
| <dd><a name="index-fno_002dmath_002derrno"></a> |
| <p>Do not set <code>errno</code> after calling math functions that are executed |
| with a single instruction, e.g., <code>sqrt</code>. A program that relies on |
| IEEE exceptions for math error handling may want to use this flag |
| for speed while maintaining IEEE arithmetic compatibility. |
| </p> |
| <p>This option is not turned on by any <samp>-O</samp> option since |
| it can result in incorrect output for programs that depend on |
| an exact implementation of IEEE or ISO rules/specifications for |
| math functions. It may, however, yield faster code for programs |
| that do not require the guarantees of these specifications. |
| </p> |
| <p>The default is <samp>-fmath-errno</samp>. |
| </p> |
| <p>On Darwin systems, the math library never sets <code>errno</code>. There is |
| therefore no reason for the compiler to consider the possibility that |
| it might, and <samp>-fno-math-errno</samp> is the default. |
| </p> |
| </dd> |
| <dt><code>-funsafe-math-optimizations</code></dt> |
| <dd><a name="index-funsafe_002dmath_002doptimizations"></a> |
| |
| <p>Allow optimizations for floating-point arithmetic that (a) assume |
| that arguments and results are valid and (b) may violate IEEE or |
| ANSI standards. When used at link-time, it may include libraries |
| or startup files that change the default FPU control word or other |
| similar optimizations. |
| </p> |
| <p>This option is not turned on by any <samp>-O</samp> option since |
| it can result in incorrect output for programs that depend on |
| an exact implementation of IEEE or ISO rules/specifications for |
| math functions. It may, however, yield faster code for programs |
| that do not require the guarantees of these specifications. |
| Enables <samp>-fno-signed-zeros</samp>, <samp>-fno-trapping-math</samp>, |
| <samp>-fassociative-math</samp> and <samp>-freciprocal-math</samp>. |
| </p> |
| <p>The default is <samp>-fno-unsafe-math-optimizations</samp>. |
| </p> |
| </dd> |
| <dt><code>-fassociative-math</code></dt> |
| <dd><a name="index-fassociative_002dmath"></a> |
| |
| <p>Allow re-association of operands in series of floating-point operations. |
| This violates the ISO C and C++ language standard by possibly changing |
| computation result. NOTE: re-ordering may change the sign of zero as |
| well as ignore NaNs and inhibit or create underflow or overflow (and |
| thus cannot be used on code that relies on rounding behavior like |
| <code>(x + 2**52) - 2**52</code>). It may also reorder floating-point comparisons |
| and thus may not be used when ordered comparisons are required. |
| This option requires that both <samp>-fno-signed-zeros</samp> and |
| <samp>-fno-trapping-math</samp> be in effect. Moreover, it doesn’t make |
| much sense with <samp>-frounding-math</samp>. For Fortran the option |
| is automatically enabled when both <samp>-fno-signed-zeros</samp> and |
| <samp>-fno-trapping-math</samp> are in effect. |
| </p> |
| <p>The default is <samp>-fno-associative-math</samp>. |
| </p> |
| </dd> |
| <dt><code>-freciprocal-math</code></dt> |
| <dd><a name="index-freciprocal_002dmath"></a> |
| |
| <p>Allow the reciprocal of a value to be used instead of dividing by |
| the value if this enables optimizations. For example <code>x / y</code> |
| can be replaced with <code>x * (1/y)</code>, which is useful if <code>(1/y)</code> |
| is subject to common subexpression elimination. Note that this loses |
| precision and increases the number of flops operating on the value. |
| </p> |
| <p>The default is <samp>-fno-reciprocal-math</samp>. |
| </p> |
| </dd> |
| <dt><code>-ffinite-math-only</code></dt> |
| <dd><a name="index-ffinite_002dmath_002donly"></a> |
| <p>Allow optimizations for floating-point arithmetic that assume |
| that arguments and results are not NaNs or +-Infs. |
| </p> |
| <p>This option is not turned on by any <samp>-O</samp> option since |
| it can result in incorrect output for programs that depend on |
| an exact implementation of IEEE or ISO rules/specifications for |
| math functions. It may, however, yield faster code for programs |
| that do not require the guarantees of these specifications. |
| </p> |
| <p>The default is <samp>-fno-finite-math-only</samp>. |
| </p> |
| </dd> |
| <dt><code>-fno-signed-zeros</code></dt> |
| <dd><a name="index-fno_002dsigned_002dzeros"></a> |
| <p>Allow optimizations for floating-point arithmetic that ignore the |
| signedness of zero. IEEE arithmetic specifies the behavior of |
| distinct +0.0 and -0.0 values, which then prohibits simplification |
| of expressions such as x+0.0 or 0.0*x (even with <samp>-ffinite-math-only</samp>). |
| This option implies that the sign of a zero result isn’t significant. |
| </p> |
| <p>The default is <samp>-fsigned-zeros</samp>. |
| </p> |
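| <p>A minimal sketch of why these simplifications are unsafe under the default |
| setting. The function names are hypothetical; the code only observes what IEEE |
| arithmetic does with signed zeros in the default rounding mode. |
| </p> |

```c
#include <math.h>   /* signbit */

/* Not the identity under IEEE rules: (-0.0) + 0.0 is +0.0 in the
   default rounding mode, so the sign of a zero input is lost. */
double add_zero(double x)
{
    return x + 0.0;
}

/* Not a constant +0.0 either: 0.0 * x is -0.0 for negative x. */
double times_zero(double x)
{
    return 0.0 * x;
}

/* True only for the value -0.0. */
int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}
```

| <p>Replacing <code>add_zero (x)</code> with <code>x</code>, or |
| <code>times_zero (x)</code> with <code>0.0</code>, would change the sign of |
| the result for some inputs, which is the difference this option declares |
| insignificant. |
| </p> |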
| </dd> |
| <dt><code>-fno-trapping-math</code></dt> |
| <dd><a name="index-fno_002dtrapping_002dmath"></a> |
| <p>Compile code assuming that floating-point operations cannot generate |
| user-visible traps. These traps include division by zero, overflow, |
| underflow, inexact result and invalid operation. This option requires |
| that <samp>-fno-signaling-nans</samp> be in effect. Setting this option may |
| allow faster code if one relies on “non-stop” IEEE arithmetic, for example. |
| </p> |
| <p>This option should never be turned on by any <samp>-O</samp> option since |
| it can result in incorrect output for programs that depend on |
| an exact implementation of IEEE or ISO rules/specifications for |
| math functions. |
| </p> |
| <p>The default is <samp>-ftrapping-math</samp>. |
| </p> |
| </dd> |
| <dt><code>-frounding-math</code></dt> |
| <dd><a name="index-frounding_002dmath"></a> |
| <p>Disable transformations and optimizations that assume default floating-point |
| rounding behavior. This is round-to-zero for all floating point |
| to integer conversions, and round-to-nearest for all other arithmetic |
| truncations. This option should be specified for programs that change |
| the FP rounding mode dynamically, or that may be executed with a |
| non-default rounding mode. This option disables constant folding of |
| floating-point expressions at compile time (which may be affected by |
| rounding mode) and arithmetic transformations that are unsafe in the |
| presence of sign-dependent rounding modes. |
| </p> |
| <p>The default is <samp>-fno-rounding-math</samp>. |
| </p> |
| <p>This option is experimental and does not currently guarantee to |
| disable all GCC optimizations that are affected by rounding mode. |
| Future versions of GCC may provide finer control of this setting |
| using C99’s <code>FENV_ACCESS</code> pragma. This command-line option |
| will be used to specify the default state for <code>FENV_ACCESS</code>. |
| </p> |
| </dd> |
| <dt><code>-fsignaling-nans</code></dt> |
| <dd><a name="index-fsignaling_002dnans"></a> |
| <p>Compile code assuming that IEEE signaling NaNs may generate user-visible |
| traps during floating-point operations. Setting this option disables |
| optimizations that may change the number of exceptions visible with |
| signaling NaNs. This option implies <samp>-ftrapping-math</samp>. |
| </p> |
| <p>This option causes the preprocessor macro <code>__SUPPORT_SNAN__</code> to |
| be defined. |
| </p> |
| <p>The default is <samp>-fno-signaling-nans</samp>. |
| </p> |
| <p>This option is experimental and does not currently guarantee to |
| disable all GCC optimizations that affect signaling NaN behavior. |
| </p> |
| </dd> |
| <dt><code>-fsingle-precision-constant</code></dt> |
| <dd><a name="index-fsingle_002dprecision_002dconstant"></a> |
| <p>Treat floating-point constants as single precision instead of |
| implicitly converting them to double-precision constants. |
| </p> |
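| <p>The effect on a constant's value can be sketched without the option by |
| writing the single-precision constant explicitly. The function names are |
| illustrative assumptions, not part of GCC. |
| </p> |

```c
/* 0.1 as an ordinary double-precision constant. */
double tenth_as_double(void)
{
    return 0.1;
}

/* 0.1 rounded to single precision first and then widened: the value
   the constant carries when it is treated as single precision. */
double tenth_as_single(void)
{
    return (double)0.1f;
}
```

| <p>The two values differ by roughly 1.5e-9, so code that relies on |
| double-precision constants can change its results under this option. |
| </p> |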
| </dd> |
| <dt><code>-fcx-limited-range</code></dt> |
| <dd><a name="index-fcx_002dlimited_002drange"></a> |
| <p>When enabled, this option states that a range reduction step is not |
| needed when performing complex division. Also, there is no checking |
| whether the result of a complex multiplication or division is <code>NaN |
| + I*NaN</code>, with an attempt to rescue the situation in that case. The |
| default is <samp>-fno-cx-limited-range</samp>, but is enabled by |
| <samp>-ffast-math</samp>. |
| </p> |
| <p>This option controls the default setting of the ISO C99 |
| <code>CX_LIMITED_RANGE</code> pragma. Nevertheless, the option applies to |
| all languages. |
| </p> |
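| <p>The following sketch shows the kind of overflow that a range reduction |
| step guards against. It is an illustration only: <code>struct cplx</code> and |
| both functions are hypothetical, and the scaled version uses Smith's method as |
| a stand-in, which is not necessarily what GCC's runtime does. |
| </p> |

```c
struct cplx { double re, im; };

static double abs_d(double v)
{
    return v < 0 ? -v : v;
}

/* Textbook formula with no range reduction: c*c + d*d can overflow
   even when the true quotient is small. */
struct cplx div_naive(struct cplx x, struct cplx y)
{
    double den = y.re * y.re + y.im * y.im;
    struct cplx q;
    q.re = (x.re * y.re + x.im * y.im) / den;
    q.im = (x.im * y.re - x.re * y.im) / den;
    return q;
}

/* Smith's method: divide through by the larger component of the
   divisor first, which keeps the intermediates in range. */
struct cplx div_scaled(struct cplx x, struct cplx y)
{
    struct cplx q;
    if (abs_d(y.re) >= abs_d(y.im)) {
        double r = y.im / y.re;
        double den = y.re + y.im * r;
        q.re = (x.re + x.im * r) / den;
        q.im = (x.im - x.re * r) / den;
    } else {
        double r = y.re / y.im;
        double den = y.re * r + y.im;
        q.re = (x.re * r + x.im) / den;
        q.im = (x.im * r - x.re) / den;
    }
    return q;
}
```

| <p>Dividing <code>1e200 + 1e200i</code> by itself yields NaN with the naive |
| formula on IEEE double hardware, but exactly 1 with the scaled one. |
| </p> |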
| </dd> |
| <dt><code>-fcx-fortran-rules</code></dt> |
| <dd><a name="index-fcx_002dfortran_002drules"></a> |
| <p>Complex multiplication and division follow Fortran rules. Range |
| reduction is done as part of complex division, but there is no checking |
| whether the result of a complex multiplication or division is <code>NaN |
| + I*NaN</code>, with an attempt to rescue the situation in that case. |
| </p> |
| <p>The default is <samp>-fno-cx-fortran-rules</samp>. |
| </p> |
| </dd> |
| </dl> |
| |
| <p>The following options control optimizations that may improve |
| performance, but are not enabled by any <samp>-O</samp> options. This |
| section includes experimental options that may produce broken code. |
| </p> |
| <dl compact="compact"> |
| <dt><code>-fbranch-probabilities</code></dt> |
| <dd><a name="index-fbranch_002dprobabilities"></a> |
| <p>After running a program compiled with <samp>-fprofile-arcs</samp> |
| (see <a href="Debugging-Options.html#Debugging-Options">Options for Debugging Your Program or |
| <code>gcc</code></a>), you can compile it a second time using |
| <samp>-fbranch-probabilities</samp>, to improve optimizations based on |
| the number of times each branch was taken. When a program |
| compiled with <samp>-fprofile-arcs</samp> exits, it saves arc execution |
| counts to a file called <samp><var>sourcename</var>.gcda</samp> for each source |
| file. The information in this data file is very dependent on the |
| structure of the generated code, so you must use the same source code |
| and the same optimization options for both compilations. |
| </p> |
| <p>With <samp>-fbranch-probabilities</samp>, GCC puts a |
| ‘<samp>REG_BR_PROB</samp>’ note on each ‘<samp>JUMP_INSN</samp>’ and ‘<samp>CALL_INSN</samp>’. |
| These can be used to improve optimization. Currently, they are only |
| used in one place: in <samp>reorg.c</samp>, instead of guessing which path a |
| branch is most likely to take, the ‘<samp>REG_BR_PROB</samp>’ values are used to |
| exactly determine which path is taken more often. |
| </p> |
| </dd> |
| <dt><code>-fprofile-values</code></dt> |
| <dd><a name="index-fprofile_002dvalues"></a> |
| <p>If combined with <samp>-fprofile-arcs</samp>, it adds code so that some |
| data about values of expressions in the program is gathered. |
| </p> |
| <p>With <samp>-fbranch-probabilities</samp>, it reads back the data gathered |
| from profiling values of expressions for usage in optimizations. |
| </p> |
| <p>Enabled with <samp>-fprofile-generate</samp> and <samp>-fprofile-use</samp>. |
| </p> |
| </dd> |
| <dt><code>-fprofile-reorder-functions</code></dt> |
| <dd><a name="index-fprofile_002dreorder_002dfunctions"></a> |
| <p>Reorder functions based on profile instrumentation: the time at which |
| each function first executes is collected, and the functions are ordered |
| by that time in ascending order. |
| </p> |
| <p>Enabled with <samp>-fprofile-use</samp>. |
| </p> |
| </dd> |
| <dt><code>-fvpt</code></dt> |
| <dd><a name="index-fvpt"></a> |
| <p>If combined with <samp>-fprofile-arcs</samp>, this option instructs the compiler |
| to add code to gather information about values of expressions. |
| </p> |
| <p>With <samp>-fbranch-probabilities</samp>, it reads back the data gathered |
| and actually performs the optimizations based on them. |
| Currently the optimizations include specialization of division operations |
| using the knowledge about the value of the denominator. |
| </p> |
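| <p>A source-level sketch of what specializing a division on a profiled |
| denominator might look like. The divisor 16 and the function names are made up |
| for illustration; the real transformation happens inside the compiler and may |
| take a different form. |
| </p> |

```c
/* Generic division: the divisor is known only at run time. */
unsigned divide_generic(unsigned n, unsigned d)
{
    return n / d;
}

/* Specialized form, assuming a value profile showed that d is
   almost always 16 (a made-up value for this sketch). */
unsigned divide_specialized(unsigned n, unsigned d)
{
    if (d == 16)
        return n >> 4;   /* cheap path for the common divisor */
    return n / d;        /* fallback keeps every other case correct */
}
```

| <p>Both functions return the same value for every nonzero divisor; only |
| the cost of the common case changes. |
| </p> |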
| </dd> |
| <dt><code>-frename-registers</code></dt> |
| <dd><a name="index-frename_002dregisters"></a> |
| <p>Attempt to avoid false dependencies in scheduled code by making use |
| of registers left over after register allocation. This optimization |
| most benefits processors with lots of registers. Depending on the |
| debug information format adopted by the target, however, it can |
| make debugging impossible, since variables no longer stay in |
| a “home register”. |
| </p> |
| <p>Enabled by default with <samp>-funroll-loops</samp> and <samp>-fpeel-loops</samp>. |
| </p> |
| </dd> |
| <dt><code>-fschedule-fusion</code></dt> |
| <dd><a name="index-fschedule_002dfusion"></a> |
| <p>Perform a target-dependent pass over the instruction stream to schedule |
| instructions of the same type together, because the target machine can execute them |
| more efficiently if they are adjacent to each other in the instruction flow. |
| </p> |
| <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. |
| </p> |
| </dd> |
| <dt><code>-ftracer</code></dt> |
| <dd><a name="index-ftracer"></a> |
| <p>Perform tail duplication to enlarge superblock size. This transformation |
| simplifies the control flow of the function allowing other optimizations to do |
| a better job. |
| </p> |
| <p>Enabled with <samp>-fprofile-use</samp>. |
| </p> |
| </dd> |
| <dt><code>-funroll-loops</code></dt> |
| <dd><a name="index-funroll_002dloops"></a> |
| <p>Unroll loops whose number of iterations can be determined at compile time or |
| upon entry to the loop. <samp>-funroll-loops</samp> implies |
| <samp>-frerun-cse-after-loop</samp>, <samp>-fweb</samp> and <samp>-frename-registers</samp>. |
| It also turns on complete loop peeling (i.e. complete removal of loops with |
| a small constant number of iterations). This option makes code larger, and may |
| or may not make it run faster. |
| </p> |
| <p>Enabled with <samp>-fprofile-use</samp>. |
| </p> |
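| <p>As a rough picture of the transformation, the sketch below unrolls a |
| summation loop by hand. The factor of four and the function names are |
| assumptions chosen for illustration; the compiler picks its own factor and |
| works on its internal representation, not on source code. |
| </p> |

```c
#include <stddef.h>

/* Original loop: one addition and one loop test per element. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by four: one loop test per four elements, followed by a
   remainder loop for the last n % 4 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

| <p>The unrolled version is larger, which matches the code-size cost noted |
| above; whether it runs faster depends on the target. |
| </p> |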
| </dd> |
| <dt><code>-funroll-all-loops</code></dt> |
| <dd><a name="index-funroll_002dall_002dloops"></a> |
| <p>Unroll all loops, even if their number of iterations is uncertain when |
| the loop is entered. This usually makes programs run more slowly. |
| <samp>-funroll-all-loops</samp> implies the same options as |
| <samp>-funroll-loops</samp>. |
| </p> |
| </dd> |
| <dt><code>-fpeel-loops</code></dt> |
| <dd><a name="index-fpeel_002dloops"></a> |
| <p>Peels loops for which there is enough information that they do not |
| roll much (from profile feedback). It also turns on complete loop peeling |
| (i.e. complete removal of loops with small constant number of iterations). |
| </p> |
| <p>Enabled with <samp>-fprofile-use</samp>. |
| </p> |
| </dd> |
| <dt><code>-fmove-loop-invariants</code></dt> |
| <dd><a name="index-fmove_002dloop_002dinvariants"></a> |
| <p>Enables the loop invariant motion pass in the RTL loop optimizer. Enabled |
| at level <samp>-O1</samp>. |
| </p> |
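| <p>Loop invariant motion can be pictured at the source level as follows. |
| This is only an analogy with hypothetical function names: the pass itself |
| operates on RTL, not on C source. |
| </p> |

```c
#include <stddef.h>

/* The product a * b does not change inside the loop, yet as written
   it is evaluated on every iteration. */
void scale_naive(double *out, const double *in, size_t n, double a, double b)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * (a * b);
}

/* After hoisting: the invariant product is computed once, before the loop. */
void scale_hoisted(double *out, const double *in, size_t n, double a, double b)
{
    const double k = a * b;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```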
| </dd> |
| <dt><code>-funswitch-loops</code></dt> |
| <dd><a name="index-funswitch_002dloops"></a> |
| <p>Move branches with loop invariant conditions out of the loop, with duplicates |
| of the loop on both branches (modified according to result of the condition). |
| </p> |
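| <p>A hand-written before-and-after sketch of unswitching, using invented |
| function names. The compiler performs the equivalent rewrite internally. |
| </p> |

```c
#include <stddef.h>

/* Original shape: the test on the loop-invariant flag runs on
   every iteration. */
void fill_branchy(int *out, size_t n, int flag)
{
    for (size_t i = 0; i < n; i++) {
        if (flag)
            out[i] = (int)i * 2;
        else
            out[i] = (int)i + 1;
    }
}

/* Unswitched shape: the test runs once, and each branch gets its
   own copy of the loop. */
void fill_unswitched(int *out, size_t n, int flag)
{
    if (flag) {
        for (size_t i = 0; i < n; i++)
            out[i] = (int)i * 2;
    } else {
        for (size_t i = 0; i < n; i++)
            out[i] = (int)i + 1;
    }
}
```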
| </dd> |
| <dt><code>-ffunction-sections</code></dt> |
| <dt><code>-fdata-sections</code></dt> |
| <dd><a name="index-ffunction_002dsections"></a> |
| <a name="index-fdata_002dsections"></a> |
| <p>Place each function or data item into its own section in the output |
| file if the target supports arbitrary sections. The name of the |
| function or the name of the data item determines the section’s name |
| in the output file. |
| </p> |
| <p>Use these options on systems where the linker can perform optimizations |
| to improve locality of reference in the instruction space. Most systems |
| using the ELF object format and SPARC processors running Solaris 2 have |
| linkers with such optimizations. AIX may have these optimizations in |
| the future. |
| </p> |
| <p>Only use these options when there are significant benefits from doing |
| so. When you specify these options, the assembler and linker |
| create larger object and executable files and are also slower. |
| You cannot use <code>gprof</code> on all systems if you |
| specify these options, and you may have problems with debugging if |
| you specify both these options and <samp>-g</samp>. |
| </p> |
| </dd> |
| <dt><code>-fbranch-target-load-optimize</code></dt> |
| <dd><a name="index-fbranch_002dtarget_002dload_002doptimize"></a> |
| <p>Perform branch target register load optimization before prologue / epilogue |
| threading. |
| The use of target registers can typically be exposed only during reload, |
| thus hoisting loads out of loops and doing inter-block scheduling needs |
| a separate optimization pass. |
| </p> |
| </dd> |
| <dt><code>-fbranch-target-load-optimize2</code></dt> |
| <dd><a name="index-fbranch_002dtarget_002dload_002doptimize2"></a> |
| <p>Perform branch target register load optimization after prologue / epilogue |
| threading. |
| </p> |
| </dd> |
| <dt><code>-fbtr-bb-exclusive</code></dt> |
| <dd><a name="index-fbtr_002dbb_002dexclusive"></a> |
| <p>When performing branch target register load optimization, don’t reuse |
| branch target registers within any basic block. |
| </p> |
| </dd> |
| <dt><code>-fstack-protector</code></dt> |
| <dd><a name="index-fstack_002dprotector"></a> |
| <p>Emit extra code to check for buffer overflows, such as stack smashing |
| attacks. This is done by adding a guard variable to functions with |
| vulnerable objects. This includes functions that call <code>alloca</code>, and |
| functions with buffers larger than 8 bytes. The guards are initialized |
| when a function is entered and then checked when the function exits. |
| If a guard check fails, an error message is printed and the program exits. |
| </p> |
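| <p>The sketch below contrasts a function that meets the criteria above with |
| one that does not. The function names are hypothetical, and the comments |
| describe what the text above implies rather than output checked against a |
| particular GCC build. |
| </p> |

```c
#include <stdio.h>
#include <string.h>

/* Contains a 64-byte character buffer, larger than the 8-byte
   threshold described above, so this is the kind of function that
   would receive a guard variable. */
size_t format_greeting(const char *name, char *out, size_t out_size)
{
    char buf[64];
    snprintf(buf, sizeof buf, "hello, %s", name);   /* bounded write */
    size_t len = strlen(buf);
    if (len >= out_size)
        return 0;                                   /* does not fit */
    memcpy(out, buf, len + 1);
    return len;
}

/* No buffers and no alloca call: not a candidate under plain
   -fstack-protector. */
int add(int a, int b)
{
    return a + b;
}
```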
| </dd> |
| <dt><code>-fstack-protector-all</code></dt> |
| <dd><a name="index-fstack_002dprotector_002dall"></a> |
| <p>Like <samp>-fstack-protector</samp> except that all functions are protected. |
| </p> |
| </dd> |
| <dt><code>-fstack-protector-strong</code></dt> |
| <dd><a name="index-fstack_002dprotector_002dstrong"></a> |
| <p>Like <samp>-fstack-protector</samp> but includes additional functions to |
| be protected — those that have local array definitions, or have |
| references to local frame addresses. |
| </p> |
| </dd> |
| <dt><code>-fstack-protector-explicit</code></dt> |
| <dd><a name="index-fstack_002dprotector_002dexplicit"></a> |
| <p>Like <samp>-fstack-protector</samp> but only protects those functions that |
| have the <code>stack_protect</code> attribute. |
| </p> |
| </dd> |
| <dt><code>-fstdarg-opt</code></dt> |
| <dd><a name="index-fstdarg_002dopt"></a> |
| <p>Optimize the prologue of variadic argument functions with respect to usage of |
| those arguments. |
| </p> |
| </dd> |
| <dt><code>-fsection-anchors</code></dt> |
| <dd><a name="index-fsection_002danchors"></a> |
| <p>Try to reduce the number of symbolic address calculations by using |
| shared “anchor” symbols to address nearby objects. This transformation |
| can help to reduce the number of GOT entries and GOT accesses on some |
| targets. |
| </p> |
| <p>For example, the implementation of the following function <code>foo</code>: |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">static int a, b, c; |
| int foo (void) { return a + b + c; } |
| </pre></div> |
| |
| <p>usually calculates the addresses of all three variables, but if you |
| compile it with <samp>-fsection-anchors</samp>, it accesses the variables |
| from a common anchor point instead. The effect is similar to the |
| following pseudocode (which isn’t valid C): |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">int foo (void) |
| { |
| register int *xr = &x; |
| return xr[&a - &x] + xr[&b - &x] + xr[&c - &x]; |
| } |
| </pre></div> |
| |
| <p>Not all targets support this option. |
| </p> |
| </dd> |
| <dt><code>--param <var>name</var>=<var>value</var></code></dt> |
| <dd><a name="index-param"></a> |
| <p>In some places, GCC uses various constants to control the amount of |
| optimization that is done. For example, GCC does not inline functions |
| that contain more than a certain number of instructions. You can |
| control some of these constants on the command line using the |
| <samp>--param</samp> option. |
| </p> |
| <p>The names of specific parameters, and the meaning of the values, are |
| tied to the internals of the compiler, and are subject to change |
| without notice in future releases. |
| </p> |
| <p>In each case, the <var>value</var> is an integer. The allowable choices for |
| <var>name</var> are: |
| </p> |
| <dl compact="compact"> |
| <dt><code>predictable-branch-outcome</code></dt> |
| <dd><p>When a branch is predicted to be taken with probability lower than this threshold |
| (in percent), it is considered well predictable. The default is 10. |
| </p> |
| </dd> |
| <dt><code>max-crossjump-edges</code></dt> |
| <dd><p>The maximum number of incoming edges to consider for cross-jumping. |
| The algorithm used by <samp>-fcrossjumping</samp> is <em>O(N^2)</em> in |
| the number of edges incoming to each block. Increasing values mean |
| more aggressive optimization, making the compilation time increase with |
| probably small improvement in executable size. |
| </p> |
| </dd> |
| <dt><code>min-crossjump-insns</code></dt> |
| <dd><p>The minimum number of instructions that must be matched at the end |
| of two blocks before cross-jumping is performed on them. This |
| value is ignored in the case where all instructions in the block being |
| cross-jumped from are matched. The default value is 5. |
| </p> |
| </dd> |
| <dt><code>max-grow-copy-bb-insns</code></dt> |
| <dd><p>The maximum code size expansion factor when copying basic blocks |
| instead of jumping. The expansion is relative to a jump instruction. |
| The default value is 8. |
| </p> |
| </dd> |
| <dt><code>max-goto-duplication-insns</code></dt> |
| <dd><p>The maximum number of instructions to duplicate to a block that jumps |
| to a computed goto. To avoid <em>O(N^2)</em> behavior in a number of |
| passes, GCC factors computed gotos early in the compilation process, |
| and unfactors them as late as possible. Only computed jumps at the |
| end of basic blocks with no more than max-goto-duplication-insns |
| instructions are unfactored. The default value is 8. |
| </p> |
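| <p>For readers unfamiliar with computed gotos, here is a minimal sketch of |
| the source pattern this parameter concerns, using the GNU C labels-as-values |
| extension. The opcode set and the function <code>run</code> are invented for |
| illustration. Each handler ends in its own indirect jump, which is the |
| unfactored shape; factoring would route all of them through one shared jump. |
| </p> |

```c
/* Opcodes of a tiny accumulator machine (hypothetical). */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Interpret a byte-code sequence that must end with OP_HALT. */
int run(const unsigned char *code)
{
    /* Table of label addresses, indexed by opcode (GNU C extension). */
    static void *const dispatch[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    int acc = 0;

    goto *dispatch[*code++];
do_inc:
    acc++;
    goto *dispatch[*code++];
do_dec:
    acc--;
    goto *dispatch[*code++];
do_double:
    acc *= 2;
    goto *dispatch[*code++];
do_halt:
    return acc;
}
```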
| </dd> |
| <dt><code>max-delay-slot-insn-search</code></dt> |
| <dd><p>The maximum number of instructions to consider when looking for an |
| instruction to fill a delay slot. If more than this arbitrary number of |
| instructions are searched, the time savings from filling the delay slot |
| are minimal, so stop searching. Increasing values mean more |
| aggressive optimization, making the compilation time increase with probably |
| small improvement in execution time. |
| </p> |
| </dd> |
| <dt><code>max-delay-slot-live-search</code></dt> |
| <dd><p>When trying to fill delay slots, the maximum number of instructions to |
| consider when searching for a block with valid live register |
| information. Increasing this arbitrarily chosen value means more |
| aggressive optimization, increasing the compilation time. This parameter |
| should be removed when the delay slot code is rewritten to maintain the |
| control-flow graph. |
| </p> |
| </dd> |
| <dt><code>max-gcse-memory</code></dt> |
| <dd><p>The approximate maximum amount of memory that can be allocated in |
| order to perform the global common subexpression elimination |
| optimization. If more memory than specified is required, the |
| optimization is not done. |
| </p> |
| </dd> |
| <dt><code>max-gcse-insertion-ratio</code></dt> |
| <dd><p>If the ratio of expression insertions to deletions is larger than this value |
| for any expression, then RTL PRE does not insert or remove the expression and thus |
| leaves partially redundant computations in the instruction stream. The default value is 20. |
| </p> |
| </dd> |
| <dt><code>max-pending-list-length</code></dt> |
| <dd><p>The maximum number of pending dependencies scheduling allows |
| before flushing the current state and starting over. Large functions |
| with few branches or calls can create excessively large lists which |
| needlessly consume memory and resources. |
| </p> |
| </dd> |
| <dt><code>max-modulo-backtrack-attempts</code></dt> |
| <dd><p>The maximum number of backtrack attempts the scheduler should make |
| when modulo scheduling a loop. Larger values can exponentially increase |
| compilation time. |
| </p> |
| </dd> |
| <dt><code>max-inline-insns-single</code></dt> |
| <dd><p>Several parameters control the tree inliner used in GCC. |
| This number sets the maximum number of instructions (counted in GCC’s |
| internal representation) in a single function that the tree inliner |
| considers for inlining. This only affects functions declared |
| inline and methods implemented in a class declaration (C++). |
| The default value is 400. |
| </p> |
| </dd> |
| <dt><code>max-inline-insns-auto</code></dt> |
| <dd><p>When you use <samp>-finline-functions</samp> (included in <samp>-O3</samp>), |
| a lot of functions that would otherwise not be considered for inlining |
| by the compiler are investigated. To those functions, a different |
| (more restrictive) limit compared to functions declared inline can |
| be applied. |
| The default value is 40. |
| </p> |
| </dd> |
| <dt><code>inline-min-speedup</code></dt> |
| <dd><p>When the estimated performance improvement of caller + callee runtime exceeds this |
| threshold (in percent), the function can be inlined regardless of the limit on |
| <samp>--param max-inline-insns-single</samp> and <samp>--param |
| max-inline-insns-auto</samp>. |
| </p> |
| </dd> |
| <dt><code>large-function-insns</code></dt> |
| <dd><p>The limit specifying really large functions. For functions larger than this |
| limit after inlining, inlining is constrained by |
| <samp>--param large-function-growth</samp>. This parameter is useful primarily |
| to avoid extreme compilation time caused by non-linear algorithms used by the |
| back end. |
| The default value is 2700. |
| </p> |
| </dd> |
| <dt><code>large-function-growth</code></dt> |
| <dd><p>Specifies the maximal growth of a large function caused by inlining, in percent. |
| The default value is 100, which limits large function growth to 2.0 times |
| the original size. |
| </p> |
| </dd> |
| <dt><code>large-unit-insns</code></dt> |
| <dd><p>The limit specifying large translation unit. Growth caused by inlining of |
| units larger than this limit is limited by <samp>--param inline-unit-growth</samp>. |
| For small units this might be too tight. |
| For example, consider a unit consisting of function A |
| that is inline and B that just calls A three times. If B is small relative to |
| A, the growth of unit is 300% and yet such inlining is very sane. For very |
| large units consisting of small inlineable functions, however, the overall unit |
| growth limit is needed to avoid exponential explosion of code size. Thus for |
| smaller units, the size is increased to <samp>--param large-unit-insns</samp> |
| before applying <samp>--param inline-unit-growth</samp>. The default is 10000. |
| </p> |
| </dd> |
| <dt><code>inline-unit-growth</code></dt> |
| <dd><p>Specifies maximal overall growth of the compilation unit caused by inlining. |
| The default value is 20 which limits unit growth to 1.2 times the original |
| size. Cold functions (either marked cold via an attribute or by profile |
| feedback) are not accounted into the unit size. |
| </p> |
| </dd> |
| <dt><code>ipcp-unit-growth</code></dt> |
| <dd><p>Specifies maximal overall growth of the compilation unit caused by |
| interprocedural constant propagation. The default value is 10 which limits |
| unit growth to 1.1 times the original size. |
| </p> |
| </dd> |
| <dt><code>large-stack-frame</code></dt> |
| <dd><p>The limit specifying large stack frames. While inlining, the algorithm tries |
| not to grow past this limit too much. The default value is 256 bytes. |
| </p> |
| </dd> |
| <dt><code>large-stack-frame-growth</code></dt> |
| <dd><p>Specifies the maximal growth of large stack frames caused by inlining, in percent. |
| The default value is 1000, which limits large stack frame growth to 11 times |
| the original size. |
| </p> |
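| <p>The percentage-based growth parameters (<code>large-function-growth</code>, |
| <code>inline-unit-growth</code>, <code>large-stack-frame-growth</code>) all |
| follow the same arithmetic: the limit is the original size plus the given |
| percentage of it. The helper below is a hypothetical sketch of that arithmetic |
| as stated in the descriptions, not GCC's internal computation. |
| </p> |

```c
/* Size limit implied by a growth parameter given in percent:
   the original size plus pct percent of it. */
unsigned long growth_limit(unsigned long original, unsigned long pct)
{
    return original + original * pct / 100;
}
```

| <p>For instance, a growth of 100 percent doubles the limit, 20 percent gives |
| 1.2 times the original size, and 1000 percent gives 11 times. |
| </p> |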
| </dd> |
| <dt><code>max-inline-insns-recursive</code></dt> |
| <dt><code>max-inline-insns-recursive-auto</code></dt> |
| <dd><p>Specifies the maximum number of instructions an out-of-line copy of a |
| self-recursive inline |
| function can grow into by performing recursive inlining. |
| </p> |
| <p><samp>--param max-inline-insns-recursive</samp> applies to functions |
| declared inline. |
| For functions not declared inline, recursive inlining |
| happens only when <samp>-finline-functions</samp> (included in <samp>-O3</samp>) is |
| enabled; <samp>--param max-inline-insns-recursive-auto</samp> applies instead. The |
| default value is 450. |
| </p> |
| </dd> |
| <dt><code>max-inline-recursive-depth</code></dt> |
| <dt><code>max-inline-recursive-depth-auto</code></dt> |
| <dd><p>Specifies the maximum recursion depth used for recursive inlining. |
| </p> |
| <p><samp>--param max-inline-recursive-depth</samp> applies to functions |
| declared inline. For functions not declared inline, recursive inlining |
| happens only when <samp>-finline-functions</samp> (included in <samp>-O3</samp>) is |
| enabled; <samp>--param max-inline-recursive-depth-auto</samp> applies instead. The |
| default value is 8. |
| </p> |
| </dd> |
| <dt><code>min-inline-recursive-probability</code></dt> |
| <dd><p>Recursive inlining is profitable only for functions with deep recursion |
| on average, and can hurt functions with little recursion depth by |
| increasing the prologue size or the complexity of the function body for other |
| optimizers. |
| </p> |
| <p>When profile feedback is available (see <samp>-fprofile-generate</samp>), the actual |
| recursion depth can be guessed from the probability that the function recurses via a |
| given call expression. This parameter limits inlining only to call expressions |
| whose probability exceeds the given threshold (in percent). |
| The default value is 10. |
| </p> |
| </dd> |
| <dt><code>early-inlining-insns</code></dt> |
| <dd><p>Specify growth that the early inliner can make. In effect it increases |
| the amount of inlining for code having a large abstraction penalty. |
| The default value is 14. |
| </p> |
| </dd> |
| <dt><code>max-early-inliner-iterations</code></dt> |
| <dd><p>Limit of iterations of the early inliner. This basically bounds |
| the number of nested indirect calls the early inliner can resolve. |
| Deeper chains are still handled by late inlining. |
| </p> |
| </dd> |
| <dt><code>comdat-sharing-probability</code></dt> |
| <dd><p>Probability (in percent) that C++ inline functions with comdat visibility |
| are shared across multiple compilation units. The default value is 20. |
| </p> |
| </dd> |
| <dt><code>profile-func-internal-id</code></dt> |
| <dd><p>A parameter to control whether to use function internal id in profile |
| database lookup. If the value is 0, the compiler uses an id that |
| is based on function assembler name and filename, which makes old profile |
| data more tolerant to source changes such as function reordering etc. |
| The default value is 0. |
| </p> |
| </dd> |
| <dt><code>min-vect-loop-bound</code></dt> |
| <dd><p>The minimum number of iterations under which loops are not vectorized |
| when <samp>-ftree-vectorize</samp> is used. The number of iterations after |
| vectorization needs to be greater than the value specified by this option |
| to allow vectorization. The default value is 0. |
| </p> |
| </dd> |
| <dt><code>gcse-cost-distance-ratio</code></dt> |
| <dd><p>Scaling factor in calculation of maximum distance an expression |
| can be moved by GCSE optimizations. This is currently supported only in the |
| code hoisting pass. The bigger the ratio, the more aggressive code hoisting |
| is with simple expressions, i.e., the expressions that have cost |
| less than <samp>gcse-unrestricted-cost</samp>. Specifying 0 disables |
| hoisting of simple expressions. The default value is 10. |
| </p> |
| </dd> |
| <dt><code>gcse-unrestricted-cost</code></dt> |
| <dd><p>Cost, roughly measured as the cost of a single typical machine |
| instruction, at which GCSE optimizations do not constrain |
| the distance an expression can travel. This is currently |
| supported only in the code hoisting pass. The lesser the cost, |
| the more aggressive code hoisting is. Specifying 0 |
| allows all expressions to travel unrestricted distances. |
| The default value is 3. |
| </p> |
| </dd> |
| <dt><code>max-hoist-depth</code></dt> |
| <dd><p>The depth of search in the dominator tree for expressions to hoist. |
| This is used to avoid quadratic behavior in the hoisting algorithm. |
| A value of 0 does not limit the search, but may slow down compilation |
| of huge functions. The default value is 30. |
| </p> |
| </dd> |
| <dt><code>max-tail-merge-comparisons</code></dt> |
| <dd><p>The maximum number of similar basic blocks to compare a basic block with. This is used to |
| avoid quadratic behavior in tree tail merging. The default value is 10. |
| </p> |
| </dd> |
| <dt><code>max-tail-merge-iterations</code></dt> |
| <dd><p>The maximum number of iterations of the pass over the function. This is used to |
| limit compilation time in tree tail merging. The default value is 2. |
| </p> |
| </dd> |
| <dt><code>max-unrolled-insns</code></dt> |
| <dd><p>The maximum number of instructions that a loop may have to be unrolled. |
| If a loop is unrolled, this parameter also determines how many times |
| the loop code is unrolled. |
| </p> |
| </dd> |
| <dt><code>max-average-unrolled-insns</code></dt> |
| <dd><p>The maximum number of instructions biased by probabilities of their execution |
| that a loop may have to be unrolled. If a loop is unrolled, |
| this parameter also determines how many times the loop code is unrolled. |
| </p> |
| </dd> |
| <dt><code>max-unroll-times</code></dt> |
| <dd><p>The maximum number of unrollings of a single loop. |
| </p> |
| </dd> |
| <dt><code>max-peeled-insns</code></dt> |
| <dd><p>The maximum number of instructions that a loop may have to be peeled. |
| If a loop is peeled, this parameter also determines how many times |
| the loop code is peeled. |
| </p> |
| </dd> |
| <dt><code>max-peel-times</code></dt> |
| <dd><p>The maximum number of peelings of a single loop. |
| </p> |
| </dd> |
| <dt><code>max-peel-branches</code></dt> |
| <dd><p>The maximum number of branches on the hot path through the peeled sequence. |
| </p> |
| </dd> |
| <dt><code>max-completely-peeled-insns</code></dt> |
| <dd><p>The maximum number of insns of a completely peeled loop. |
| </p> |
| </dd> |
| <dt><code>max-completely-peel-times</code></dt> |
| <dd><p>The maximum number of iterations of a loop to be suitable for complete peeling. |
| </p> |
| </dd> |
| <dt><code>max-completely-peel-loop-nest-depth</code></dt> |
| <dd><p>The maximum depth of a loop nest suitable for complete peeling. |
| </p> |
| </dd> |
| <dt><code>max-unswitch-insns</code></dt> |
| <dd><p>The maximum number of insns of an unswitched loop. |
| </p> |
| </dd> |
| <dt><code>max-unswitch-level</code></dt> |
| <dd><p>The maximum number of branches unswitched in a single loop. |
| </p> |
| </dd> |
| <dt><code>lim-expensive</code></dt> |
| <dd><p>The minimum cost of an expensive expression in the loop invariant motion. |
| </p> |
| </dd> |
| <dt><code>iv-consider-all-candidates-bound</code></dt> |
| <dd><p>Bound on number of candidates for induction variables, below which |
| all candidates are considered for each use in induction variable |
| optimizations. If there are more candidates than this, |
| only the most relevant ones are considered to avoid quadratic time complexity. |
| </p> |
| </dd> |
| <dt><code>iv-max-considered-uses</code></dt> |
| <dd><p>The induction variable optimizations give up on loops that contain more |
| induction variable uses than this value. |
| </p> |
| </dd> |
| <dt><code>iv-always-prune-cand-set-bound</code></dt> |
| <dd><p>If the number of candidates in the set is smaller than this value, |
| always try to remove unnecessary ivs from the set |
| when adding a new one. |
| </p> |
| </dd> |
| <dt><code>scev-max-expr-size</code></dt> |
| <dd><p>Bound on size of expressions used in the scalar evolutions analyzer. |
| Large expressions slow the analyzer. |
| </p> |
| </dd> |
| <dt><code>scev-max-expr-complexity</code></dt> |
| <dd><p>Bound on the complexity of the expressions in the scalar evolutions analyzer. |
| Complex expressions slow the analyzer. |
| </p> |
| </dd> |
| <dt><code>omega-max-vars</code></dt> |
| <dd><p>The maximum number of variables in an Omega constraint system. |
| The default value is 128. |
| </p> |
| </dd> |
| <dt><code>omega-max-geqs</code></dt> |
| <dd><p>The maximum number of inequalities in an Omega constraint system. |
| The default value is 256. |
| </p> |
| </dd> |
| <dt><code>omega-max-eqs</code></dt> |
| <dd><p>The maximum number of equalities in an Omega constraint system. |
| The default value is 128. |
| </p> |
| </dd> |
| <dt><code>omega-max-wild-cards</code></dt> |
| <dd><p>The maximum number of wildcard variables that the Omega solver is |
| able to insert. The default value is 18. |
| </p> |
| </dd> |
| <dt><code>omega-hash-table-size</code></dt> |
| <dd><p>The size of the hash table in the Omega solver. The default value is |
| 550. |
| </p> |
| </dd> |
| <dt><code>omega-max-keys</code></dt> |
| <dd><p>The maximal number of keys used by the Omega solver. The default |
| value is 500. |
| </p> |
| </dd> |
| <dt><code>omega-eliminate-redundant-constraints</code></dt> |
| <dd><p>When set to 1, use expensive methods to eliminate all redundant |
| constraints. The default value is 0. |
| </p> |
| </dd> |
| <dt><code>vect-max-version-for-alignment-checks</code></dt> |
| <dd><p>The maximum number of run-time checks that can be performed when |
| doing loop versioning for alignment in the vectorizer. |
| </p> |
| </dd> |
| <dt><code>vect-max-version-for-alias-checks</code></dt> |
| <dd><p>The maximum number of run-time checks that can be performed when |
| doing loop versioning for alias in the vectorizer. |
| </p> |
| </dd> |
| <dt><code>vect-max-peeling-for-alignment</code></dt> |
| <dd><p>The maximum number of loop peels to enhance access alignment |
| for the vectorizer. Value -1 means ’no limit’. |
| </p> |
| </dd> |
| <dt><code>max-iterations-to-track</code></dt> |
| <dd><p>The maximum number of iterations of a loop the brute-force algorithm |
| for analysis of the number of iterations of the loop tries to evaluate. |
| </p> |
| </dd> |
| <dt><code>hot-bb-count-ws-permille</code></dt> |
| <dd><p>A basic block profile count is considered hot if it contributes to |
| the given permillage (i.e. 0...1000) of the entire profiled execution. |
| </p> |
| </dd> |
| <dt><code>hot-bb-frequency-fraction</code></dt> |
| <dd><p>Select the fraction of the function’s entry block execution frequency that a |
| basic block must reach to be considered hot. |
| </p> |
| </dd> |
| <dt><code>max-predicted-iterations</code></dt> |
| <dd><p>The maximum number of loop iterations we predict statically. This is useful |
| in cases where a function contains a single loop with known bound and |
| another loop with unknown bound. |
| The known number of iterations is predicted correctly, while |
| the unknown number of iterations average to roughly 10. This means that the |
| loop without bounds appears artificially cold relative to the other one. |
| </p> |
| </dd> |
| <dt><code>builtin-expect-probability</code></dt> |
| <dd><p>Control the probability of the expression having the specified value. This |
| parameter takes a percentage (i.e. 0 ... 100) as input. |
| The default probability of 90 is obtained empirically. |
| </p> |
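| <p>As a minimal sketch, the following C function marks a null-pointer check as |
| unlikely with <code>__builtin_expect</code>; this parameter sets the probability |
| GCC assumes for such a hint. The function name <code>checked_length</code> is |
| hypothetical and chosen only for illustration. |
| </p> |

```c
#include <stddef.h>

/* Returns the length of s, or -1 when s is a null pointer.
   The hint tells GCC that (s == NULL) is expected to be 0 (false);
   how strongly GCC believes the hint is what
   --param builtin-expect-probability=N adjusts.  */
long checked_length(const char *s)
{
    if (__builtin_expect(s == NULL, 0))
        return -1;              /* path marked as unlikely */

    long n = 0;
    while (s[n] != '\0')        /* path marked as likely */
        n++;
    return n;
}
```

| <p>The hint changes only block layout and branch prediction decisions, not the |
| result: the function returns the same values whatever probability is chosen. |
| </p> |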
| </dd> |
| <dt><code>align-threshold</code></dt> |
| <dd> |
| <p>Select the fraction of the maximal execution frequency of a basic block in |
| a function that a basic block must reach to be aligned. |
| </p> |
| </dd> |
| <dt><code>align-loop-iterations</code></dt> |
| <dd> |
| <p>A loop expected to iterate at least the selected number of iterations is |
| aligned. |
| </p> |
| </dd> |
| <dt><code>tracer-dynamic-coverage</code></dt> |
| <dt><code>tracer-dynamic-coverage-feedback</code></dt> |
| <dd> |
| <p>This value is used to limit superblock formation once the given percentage of |
| executed instructions is covered. This limits unnecessary code size |
| expansion. |
| </p> |
| <p>The <samp>tracer-dynamic-coverage-feedback</samp> parameter |
| is used only when profile |
| feedback is available. The real profiles (as opposed to statically estimated |
| ones) are much less balanced, allowing the threshold to be a larger value. |
| </p> |
| </dd> |
| <dt><code>tracer-max-code-growth</code></dt> |
| <dd><p>Stop tail duplication once code growth has reached given percentage. This is |
| a rather artificial limit, as most of the duplicates are eliminated later in |
| cross jumping, so it may be set to much higher values than is the desired code |
| growth. |
| </p> |
| </dd> |
| <dt><code>tracer-min-branch-ratio</code></dt> |
| <dd> |
| <p>Stop reverse growth when the reverse probability of the best edge is less than this |
| threshold (in percent). |
| </p> |
| </dd> |
| <dt><code>tracer-min-branch-probability</code></dt> |
| <dt><code>tracer-min-branch-probability-feedback</code></dt> |
| <dd> |
| <p>Stop forward growth if the best edge has probability lower than this |
| threshold. |
| </p> |
| <p>Similarly to <samp>tracer-dynamic-coverage</samp>, two values are present, one for |
| compilation for profile feedback and one for compilation without. The value |
| for compilation with profile feedback needs to be more conservative (higher) in |
| order to make the tracer effective. |
| </p> |
| </dd> |
| <dt><code>max-cse-path-length</code></dt> |
| <dd> |
| <p>The maximum number of basic blocks on a path that CSE considers. |
| The default is 10. |
| </p> |
| </dd> |
| <dt><code>max-cse-insns</code></dt> |
| <dd><p>The maximum number of instructions CSE processes before flushing. |
| The default is 1000. |
| </p> |
| </dd> |
| <dt><code>ggc-min-expand</code></dt> |
| <dd> |
| <p>GCC uses a garbage collector to manage its own memory allocation. This |
| parameter specifies the minimum percentage by which the garbage |
| collector’s heap should be allowed to expand between collections. |
| Tuning this may improve compilation speed; it has no effect on code |
| generation. |
| </p> |
| <p>The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when |
| RAM >= 1GB. If <code>getrlimit</code> is available, the notion of “RAM” is |
| the smallest of actual RAM and <code>RLIMIT_DATA</code> or <code>RLIMIT_AS</code>. If |
| GCC is not able to calculate RAM on a particular platform, the lower |
| bound of 30% is used. Setting this parameter and |
| <samp>ggc-min-heapsize</samp> to zero causes a full collection to occur at |
| every opportunity. This is extremely slow, but can be useful for |
| debugging. |
| </p> |
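| <p>As a worked illustration of the default described above, the following C |
| sketch evaluates the documented formula. The function name |
| <code>ggc_min_expand_default</code> and its <code>ram_gb</code> argument are |
| hypothetical; this is not GCC’s own code. |
| </p> |

```c
/* Documented default for ggc-min-expand, in percent:
   30% + 70% * (RAM / 1GB), with an upper bound of 100%.
   ram_gb is the usable RAM in gigabytes; pass 0 when it cannot be
   determined, which yields the documented lower bound of 30%.  */
double ggc_min_expand_default(double ram_gb)
{
    double pct = 30.0 + 70.0 * ram_gb;
    return pct > 100.0 ? 100.0 : pct;
}
```

| <p>Under this formula a host with 512MB of usable RAM gets 30 + 70 * 0.5 = 65 |
| percent, and any host with 1GB or more gets 100 percent. |
| </p> |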
| </dd> |
| <dt><code>ggc-min-heapsize</code></dt> |
| <dd> |
| <p>Minimum size of the garbage collector’s heap before it begins bothering |
| to collect garbage. The first collection occurs after the heap expands |
| by <samp>ggc-min-expand</samp>% beyond <samp>ggc-min-heapsize</samp>. Again, |
| tuning this may improve compilation speed, and has no effect on code |
| generation. |
| </p> |
| <p>The default is the smaller of RAM/8, RLIMIT_RSS, or a limit that |
| tries to ensure that RLIMIT_DATA or RLIMIT_AS are not exceeded, but |
| with a lower bound of 4096 (four megabytes) and an upper bound of |
| 131072 (128 megabytes). If GCC is not able to calculate RAM on a |
| particular platform, the lower bound is used. Setting this parameter |
| very large effectively disables garbage collection. Setting this |
| parameter and <samp>ggc-min-expand</samp> to zero causes a full collection |
| to occur at every opportunity. |
| </p> |
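| <p>The default and the point of the first collection can be sketched in C as |
| follows. This is a simplified model, not GCC’s implementation: it ignores the |
| <code>RLIMIT_DATA</code> and <code>RLIMIT_AS</code> adjustment, and the function |
| names and the convention that a limit of 0 means “no limit” are assumptions made |
| for the example. All sizes are in kilobytes. |
| </p> |

```c
/* Simplified default for ggc-min-heapsize (kilobytes):
   the smaller of RAM/8 and RLIMIT_RSS, clamped to [4096, 131072].
   rlimit_rss_kb <= 0 is treated here as "no RSS limit" (assumption).  */
long ggc_min_heapsize_default_kb(long ram_kb, long rlimit_rss_kb)
{
    long size = ram_kb / 8;
    if (rlimit_rss_kb > 0 && rlimit_rss_kb < size)
        size = rlimit_rss_kb;
    if (size < 4096)
        size = 4096;            /* lower bound: four megabytes */
    if (size > 131072)
        size = 131072;          /* upper bound: 128 megabytes */
    return size;
}

/* Heap size at which the first collection occurs: the heap has grown
   by ggc-min-expand percent beyond ggc-min-heapsize.  */
long first_collection_threshold_kb(long min_heapsize_kb, long min_expand_pct)
{
    return min_heapsize_kb + min_heapsize_kb * min_expand_pct / 100;
}
```

| <p>For instance, with a heap size of 131072 and an expansion of 100 percent, the |
| first collection in this model happens once the heap reaches 262144 kilobytes. |
| </p> |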
| </dd> |
| <dt><code>max-reload-search-insns</code></dt> |
| <dd><p>The maximum number of instructions reload should look backward for an equivalent |
| register. Increasing values mean more aggressive optimization, making the |
| compilation time increase with probably slightly better performance. |
| The default value is 100. |
| </p> |
| </dd> |
| <dt><code>max-cselib-memory-locations</code></dt> |
| <dd><p>The maximum number of memory locations cselib should take into account. |
| Increasing values mean more aggressive optimization, making the compilation time |
| increase with probably slightly better performance. The default value is 500. |
| </p> |
| </dd> |
| <dt><code>reorder-blocks-duplicate</code></dt> |
| <dt><code>reorder-blocks-duplicate-feedback</code></dt> |
| <dd> |
| <p>Used by the basic block reordering pass to decide whether to use unconditional |
| branch or duplicate the code on its destination. Code is duplicated when its |
| estimated size is smaller than this value multiplied by the estimated size of |
| unconditional jump in the hot spots of the program. |
| </p> |
| <p>The <samp>reorder-blocks-duplicate-feedback</samp> parameter |
| is used only when profile |
| feedback is available. It may be set to higher values than |
| <samp>reorder-blocks-duplicate</samp> since information about the hot spots is more |
| accurate. |
| </p> |
| </dd> |
| <dt><code>max-sched-ready-insns</code></dt> |
| <dd><p>The maximum number of instructions ready to be issued that the scheduler should |
| consider at any given time during the first scheduling pass. Increasing |
| values mean more thorough searches, making the compilation time increase |
| with probably little benefit. The default value is 100. |
| </p> |
| </dd> |
| <dt><code>max-sched-region-blocks</code></dt> |
| <dd><p>The maximum number of blocks in a region to be considered for |
| interblock scheduling. The default value is 10. |
| </p> |
| </dd> |
| <dt><code>max-pipeline-region-blocks</code></dt> |
| <dd><p>The maximum number of blocks in a region to be considered for |
| pipelining in the selective scheduler. The default value is 15. |
| </p> |
| </dd> |
| <dt><code>max-sched-region-insns</code></dt> |
| <dd><p>The maximum number of insns in a region to be considered for |
| interblock scheduling. The default value is 100. |
| </p> |
| </dd> |
| <dt><code>max-pipeline-region-insns</code></dt> |
| <dd><p>The maximum number of insns in a region to be considered for |
| pipelining in the selective scheduler. The default value is 200. |
| </p> |
| </dd> |
| <dt><code>min-spec-prob</code></dt> |
| <dd><p>The minimum probability (in percent) of reaching a source block |
| for interblock speculative scheduling. The default value is 40. |
| </p> |
| </dd> |
| <dt><code>max-sched-extend-regions-iters</code></dt> |
| <dd><p>The maximum number of iterations through the CFG to extend regions. |
| A value of 0 (the default) disables region extensions. |
| </p> |
| </dd> |
| <dt><code>max-sched-insn-conflict-delay</code></dt> |
| <dd><p>The maximum conflict delay for an insn to be considered for speculative motion. |
| The default value is 3. |
| </p> |
| </dd> |
| <dt><code>sched-spec-prob-cutoff</code></dt> |
| <dd><p>The minimal probability of speculation success (in percent), so that |
| speculative insns are scheduled. |
| The default value is 40. |
| </p> |
| </dd> |
| <dt><code>sched-spec-state-edge-prob-cutoff</code></dt> |
| <dd><p>The minimum probability an edge must have for the scheduler to save its |
| state across it. |
| The default value is 10. |
| </p> |
| </dd> |
| <dt><code>sched-mem-true-dep-cost</code></dt> |
| <dd><p>Minimal distance (in CPU cycles) between a store and a load targeting the same |
| memory location. The default value is 1. |
| </p> |
| </dd> |
| <dt><code>selsched-max-lookahead</code></dt> |
| <dd><p>The maximum size of the lookahead window of selective scheduling. It is a |
| depth of search for available instructions. |
| The default value is 50. |
| </p> |
| </dd> |
| <dt><code>selsched-max-sched-times</code></dt> |
| <dd><p>The maximum number of times that an instruction is scheduled during |
| selective scheduling. This is the limit on the number of iterations |
| through which the instruction may be pipelined. The default value is 2. |
| </p> |
| </dd> |
| <dt><code>selsched-max-insns-to-rename</code></dt> |
| <dd><p>The maximum number of best instructions in the ready list that are considered |
| for renaming in the selective scheduler. The default value is 2. |
| </p> |
| </dd> |
| <dt><code>sms-min-sc</code></dt> |
| <dd><p>The minimum value of stage count that the swing modulo scheduler |
| generates. The default value is 2. |
| </p> |
| </dd> |
| <dt><code>max-last-value-rtl</code></dt> |
| <dd><p>The maximum size, measured as the number of RTLs, that can be recorded in an expression |
| in the combiner for a pseudo register as the last known value of that register. The default |
| is 10000. |
| </p> |
| </dd> |
| <dt><code>max-combine-insns</code></dt> |
| <dd><p>The maximum number of instructions the RTL combiner tries to combine. |
| The default value is 2 at <samp>-Og</samp> and 4 otherwise. |
| </p> |
| </dd> |
| <dt><code>integer-share-limit</code></dt> |
| <dd><p>Small integer constants can use a shared data structure, reducing the |
| compiler’s memory usage and increasing its speed. This sets the maximum |
| value of a shared integer constant. The default value is 256. |
| </p> |
| </dd> |
| <dt><code>ssp-buffer-size</code></dt> |
| <dd><p>The minimum size of buffers (i.e. arrays) that receive stack smashing |
| protection when <samp>-fstack-protector</samp> is used. |
| </p> |
| </dd> |
| <dt><code>min-size-for-stack-sharing</code></dt> |
| <dd><p>The minimum size of variables taking part in stack slot sharing when not |
| optimizing. The default value is 32. |
| </p> |
| </dd> |
| <dt><code>max-jump-thread-duplication-stmts</code></dt> |
| <dd><p>Maximum number of statements allowed in a block that needs to be |
| duplicated when threading jumps. |
| </p> |
| </dd> |
| <dt><code>max-fields-for-field-sensitive</code></dt> |
| <dd><p>Maximum number of fields in a structure treated in |
| a field sensitive manner during pointer analysis. The default is zero |
| for <samp>-O0</samp> and <samp>-O1</samp>, |
| and 100 for <samp>-Os</samp>, <samp>-O2</samp>, and <samp>-O3</samp>. |
| </p> |
| </dd> |
| <dt><code>prefetch-latency</code></dt> |
| <dd><p>Estimate of the average number of instructions that are executed before |
| prefetch finishes. The distance prefetched ahead is proportional |
| to this constant. Increasing this number may also lead to fewer |
| streams being prefetched (see <samp>simultaneous-prefetches</samp>). |
| </p> |
| </dd> |
| <dt><code>simultaneous-prefetches</code></dt> |
| <dd><p>Maximum number of prefetches that can run at the same time. |
| </p> |
| </dd> |
| <dt><code>l1-cache-line-size</code></dt> |
| <dd><p>The size of cache line in L1 cache, in bytes. |
| </p> |
| </dd> |
| <dt><code>l1-cache-size</code></dt> |
| <dd><p>The size of L1 cache, in kilobytes. |
| </p> |
| </dd> |
| <dt><code>l2-cache-size</code></dt> |
| <dd><p>The size of L2 cache, in kilobytes. |
| </p> |
| </dd> |
| <dt><code>min-insn-to-prefetch-ratio</code></dt> |
| <dd><p>The minimum ratio between the number of instructions and the |
| number of prefetches to enable prefetching in a loop. |
| </p> |
| </dd> |
| <dt><code>prefetch-min-insn-to-mem-ratio</code></dt> |
| <dd><p>The minimum ratio between the number of instructions and the |
| number of memory references to enable prefetching in a loop. |
| </p> |
| </dd> |
| <dt><code>use-canonical-types</code></dt> |
| <dd><p>Whether the compiler should use the “canonical” type system. By |
| default, this should always be 1, which uses a more efficient internal |
| mechanism for comparing types in C++ and Objective-C++. However, if |
| bugs in the canonical type system are causing compilation failures, |
| set this value to 0 to disable canonical types. |
| </p> |
| </dd> |
| <dt><code>switch-conversion-max-branch-ratio</code></dt> |
| <dd><p>Switch initialization conversion refuses to create arrays that are |
| bigger than <samp>switch-conversion-max-branch-ratio</samp> times the number of |
| branches in the switch. |
| </p> |
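| <p>To make the limit concrete, the C sketch below shows a switch of the kind this |
| conversion targets, a hand-written array lookup with the same behavior, and the |
| size check the parameter describes. It is an illustration only: the function |
| names are hypothetical, and counting the array in elements and the branches in |
| case labels is an assumption of the example rather than a statement about GCC’s |
| internals. |
| </p> |

```c
/* A switch that only assigns constants: a candidate for
   switch initialization conversion.  */
int weight_switch(unsigned idx)
{
    int w;
    switch (idx) {
    case 0:  w = 10; break;
    case 1:  w = 20; break;
    case 2:  w = 40; break;
    case 3:  w = 80; break;
    default: w = 0;  break;
    }
    return w;
}

/* Hand-written equivalent: the constants live in an array and the
   switch becomes a bounds check plus an indexed load.  */
int weight_table(unsigned idx)
{
    static const int table[4] = { 10, 20, 40, 80 };
    return idx < 4 ? table[idx] : 0;
}

/* The documented limit: the array must not be bigger than
   max_branch_ratio times the number of branches.  */
int array_allowed(unsigned long array_elems, unsigned long branches,
                  unsigned long max_branch_ratio)
{
    return array_elems <= max_branch_ratio * branches;
}
```

| <p>A switch with a few widely spaced case values would need a large, mostly |
| empty array; the ratio check rejects such cases. |
| </p> |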
| </dd> |
| <dt><code>max-partial-antic-length</code></dt> |
| <dd><p>Maximum length of the partial antic set computed during the tree |
| partial redundancy elimination optimization (<samp>-ftree-pre</samp>) when |
| optimizing at <samp>-O3</samp> and above. For some sorts of source code |
| the enhanced partial redundancy elimination optimization can run away, |
| consuming all of the memory available on the host machine. This |
| parameter sets a limit on the length of the sets that are computed, |
| which prevents the runaway behavior. Setting a value of 0 for |
| this parameter allows an unlimited set length. |
| </p> |
| </dd> |
| <dt><code>sccvn-max-scc-size</code></dt> |
| <dd><p>Maximum size of a strongly connected component (SCC) during SCCVN |
| processing. If this limit is hit, SCCVN processing for the whole |
| function is not done and optimizations depending on it are |
| disabled. The default maximum SCC size is 10000. |
| </p> |
| </dd> |
| <dt><code>sccvn-max-alias-queries-per-access</code></dt> |
| <dd><p>Maximum number of alias-oracle queries we perform when looking for |
| redundancies for loads and stores. If this limit is hit the search |
| is aborted and the load or store is not considered redundant. The |
| number of queries is algorithmically limited to the number of |
| stores on all paths from the load to the function entry. |
| The default maximum number of queries is 1000. |
| </p> |
| </dd> |
| <dt><code>ira-max-loops-num</code></dt> |
| <dd><p>IRA uses regional register allocation by default. If a function |
| contains more loops than the number given by this parameter, only at most |
| the given number of the most frequently-executed loops form regions |
| for regional register allocation. The default value of the |
| parameter is 100. |
| </p> |
| </dd> |
| <dt><code>ira-max-conflict-table-size</code></dt> |
| <dd><p>Although IRA uses a sophisticated algorithm to compress the conflict |
| table, the table can still require excessive amounts of memory for |
| huge functions. If the conflict table for a function could be more |
| than the size in MB given by this parameter, the register allocator |
| instead uses a faster, simpler, and lower-quality |
| algorithm that does not require building a pseudo-register conflict table. |
| The default value of the parameter is 2000. |
| </p> |
| </dd> |
| <dt><code>ira-loop-reserved-regs</code></dt> |
| <dd><p>IRA can be used to evaluate more accurate register pressure in loops |
| for decisions to move loop invariants (see <samp>-O3</samp>). The number |
| of available registers reserved for some other purposes is given |
| by this parameter. The default value of the parameter is 2, which is |
| the minimal number of registers needed by typical instructions. |
| This value is the best found from numerous experiments. |
| </p> |
| </dd> |
| <dt><code>lra-inheritance-ebb-probability-cutoff</code></dt> |
| <dd><p>LRA tries to reuse values reloaded in registers in subsequent insns. |
| This optimization is called inheritance. EBB is used as a region to |
| do this optimization. The parameter defines a minimal fall-through |
| edge probability in percentage used to add BB to inheritance EBB in |
| LRA. The default value of the parameter is 40. The value was chosen |
| from numerous runs of SPEC2000 on x86-64. |
| </p> |
| </dd> |
| <dt><code>loop-invariant-max-bbs-in-loop</code></dt> |
| <dd><p>Loop invariant motion can be very expensive, both in compilation time and |
| in amount of needed compile-time memory, with very large loops. Loops |
| with more basic blocks than this parameter won’t have loop invariant |
| motion optimization performed on them. The default value of the |
| parameter is 1000 for <samp>-O1</samp> and 10000 for <samp>-O2</samp> and above. |
| </p> |
| </dd> |
| <dt><code>loop-max-datarefs-for-datadeps</code></dt> |
| <dd><p>Building data dependencies is expensive for very large loops. This |
| parameter limits the number of data references in loops that are |
| considered for data dependence analysis. These large loops are not |
| handled by the optimizations using loop data dependencies. |
| The default value is 1000. |
| </p> |
| </dd> |
| <dt><code>max-vartrack-size</code></dt> |
| <dd><p>Sets a maximum number of hash table slots to use during variable |
| tracking dataflow analysis of any function. If this limit is exceeded |
| with variable tracking at assignments enabled, analysis for that |
| function is retried without it, after removing all debug insns from |
| the function. If the limit is exceeded even without debug insns, var |
| tracking analysis is completely disabled for the function. Setting |
| the parameter to zero makes it unlimited. |
| </p> |
| </dd> |
| <dt><code>max-vartrack-expr-depth</code></dt> |
| <dd><p>Sets a maximum number of recursion levels when attempting to map |
| variable names or debug temporaries to value expressions. This trades |
| compilation time for more complete debug information. If this is set too |
| low, value expressions that are available and could be represented in |
| debug information may end up not being used; setting this higher may |
| enable the compiler to find more complex debug expressions, but compile |
| time and memory use may grow. The default is 12. |
| </p> |
| </dd> |
| <dt><code>min-nondebug-insn-uid</code></dt> |
| <dd><p>Use uids starting at this parameter for nondebug insns. The range below |
| the parameter is reserved exclusively for debug insns created by |
| <samp>-fvar-tracking-assignments</samp>, but debug insns may get |
| (non-overlapping) uids above it if the reserved range is exhausted. |
| </p> |
| </dd> |
| <dt><code>ipa-sra-ptr-growth-factor</code></dt> |
| <dd><p>IPA-SRA replaces a pointer to an aggregate with one or more new |
| parameters only when their cumulative size is less than or equal to |
| <samp>ipa-sra-ptr-growth-factor</samp> times the size of the original |
| pointer parameter. |
| </p> |
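| <p>The size test can be written as a one-line predicate. The sketch below is an |
| illustration under stated assumptions: the function name |
| <code>ipa_sra_would_split</code> is hypothetical, and <code>sizeof (void *)</code> |
| stands in for the size of the original pointer parameter. |
| </p> |

```c
#include <stddef.h>

/* Returns 1 when the replacement parameters, whose sizes add up to
   cumulative_size bytes, fit within growth_factor times the size of
   the pointer they replace; returns 0 otherwise.  */
int ipa_sra_would_split(size_t cumulative_size, size_t growth_factor)
{
    return cumulative_size <= growth_factor * sizeof (void *);
}
```

| <p>With a growth factor of 2, a pointer to a structure from which only two |
| pointer-sized fields are used passes the test, while one needing any more |
| bytes than that does not. |
| </p> |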
| </dd> |
| <dt><code>sra-max-scalarization-size-Ospeed</code></dt> |
| <dt><code>sra-max-scalarization-size-Osize</code></dt> |
| <dd><p>The two Scalar Replacement of Aggregates passes (SRA and IPA-SRA) aim to |
| replace scalar parts of aggregates with uses of independent scalar |
| variables. These parameters control the maximum size, in storage units, |
| of aggregate which is considered for replacement when compiling for |
| speed |
| (<samp>sra-max-scalarization-size-Ospeed</samp>) or size |
| (<samp>sra-max-scalarization-size-Osize</samp>) respectively. |
| </p> |
| </dd> |
| <dt><code>tm-max-aggregate-size</code></dt> |
| <dd><p>When making copies of thread-local variables in a transaction, this |
| parameter specifies the size in bytes after which variables are |
| saved with the logging functions as opposed to save/restore code |
| sequence pairs. This option only applies when using |
| <samp>-fgnu-tm</samp>. |
| </p> |
| </dd> |
| <dt><code>graphite-max-nb-scop-params</code></dt> |
| <dd><p>To avoid exponential effects in the Graphite loop transforms, the |
| number of parameters in a Static Control Part (SCoP) is bounded. The |
| default value is 10 parameters. A variable whose value is unknown at |
| compilation time and defined outside a SCoP is a parameter of the SCoP. |
| </p> |
| </dd> |
| <dt><code>graphite-max-bbs-per-function</code></dt> |
| <dd><p>To avoid exponential effects in the detection of SCoPs, the size of |
| the functions analyzed by Graphite is bounded. The default value is |
| 100 basic blocks. |
| </p> |
| </dd> |
| <dt><code>loop-block-tile-size</code></dt> |
| <dd><p>Loop blocking or strip mining transforms, enabled with |
| <samp>-floop-block</samp> or <samp>-floop-strip-mine</samp>, strip mine each |
| loop in the loop nest by a given number of iterations. The strip |
| length can be changed using the <samp>loop-block-tile-size</samp> |
| parameter. The default value is 51 iterations. |
| </p> |
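| <p>For readers unfamiliar with strip mining, the following C sketch applies it |
| by hand to a simple summation loop. It illustrates the shape of the |
| transformation only and is not output produced by GCC; the function names are |
| hypothetical, and the <code>tile</code> argument plays the role of the strip |
| length (51 by default, as stated above). |
| </p> |

```c
/* Original loop: one pass over all n elements.  */
long sum_plain(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Strip-mined loop: the outer loop steps from strip to strip, the
   inner loop walks at most `tile` consecutive elements.  The result
   is identical; only the iteration structure changes.  */
long sum_strip_mined(const int *a, int n, int tile)
{
    long s = 0;
    if (tile < 1)
        tile = 1;               /* guard against a non-advancing outer loop */
    for (int ii = 0; ii < n; ii += tile) {
        int end = ii + tile < n ? ii + tile : n;
        for (int i = ii; i < end; i++)
            s += a[i];
    }
    return s;
}
```

| <p>On its own the strip-mined form does the same work in the same order; its |
| value comes from later interchanging the strip loops of a loop nest so that |
| each strip of data is reused while it is still in cache. |
| </p> |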
| </dd> |
| <dt><code>loop-unroll-jam-size</code></dt> |
| <dd><p>Specify the unroll factor for the <samp>-floop-unroll-and-jam</samp> option. The |
| default value is 4. |
| </p> |
| </dd> |
| <dt><code>loop-unroll-jam-depth</code></dt> |
| <dd><p>Specify the dimension to be unrolled (counting from the innermost loop) |
| for the <samp>-floop-unroll-and-jam</samp> option. The default value is 2. |
| </p> |
| </dd> |
| <dt><code>ipa-cp-value-list-size</code></dt> |
| <dd><p>IPA-CP attempts to track all possible values and types passed to a function’s |
| parameter in order to propagate them and perform devirtualization. |
| <samp>ipa-cp-value-list-size</samp> is the maximum number of values and types it |
| stores per one formal parameter of a function. |
| </p> |
| </dd> |
| <dt><code>ipa-cp-eval-threshold</code></dt> |
| <dd><p>IPA-CP calculates its own score of cloning profitability heuristics |
| and performs those cloning opportunities with scores that exceed |
| <samp>ipa-cp-eval-threshold</samp>. |
| </p> |
| </dd> |
| <dt><code>ipa-cp-recursion-penalty</code></dt> |
| <dd><p>Percentage penalty the recursive functions will receive when they |
| are evaluated for cloning. |
| </p> |
| </dd> |
| <dt><code>ipa-cp-single-call-penalty</code></dt> |
| <dd><p>Percentage penalty functions containing a single call to another |
| function will receive when they are evaluated for cloning. |
| </p> |
| |
| </dd> |
| <dt><code>ipa-max-agg-items</code></dt> |
| <dd><p>IPA-CP can also propagate a number of scalar values passed |
| in an aggregate. <samp>ipa-max-agg-items</samp> controls the maximum |
| number of such values per one parameter. |
| </p> |
| </dd> |
| <dt><code>ipa-cp-loop-hint-bonus</code></dt> |
| <dd><p>When IPA-CP determines that a cloning candidate would make the number |
| of iterations of a loop known, it adds a bonus of |
| <samp>ipa-cp-loop-hint-bonus</samp> to the profitability score of |
| the candidate. |
| </p> |
| </dd> |
| <dt><code>ipa-cp-array-index-hint-bonus</code></dt> |
| <dd><p>When IPA-CP determines that a cloning candidate would make the index of |
| an array access known, it adds a bonus of |
| <samp>ipa-cp-array-index-hint-bonus</samp> to the profitability |
| score of the candidate. |
| </p> |
| </dd> |
| <dt><code>ipa-max-aa-steps</code></dt> |
| <dd><p>During its analysis of function bodies, IPA-CP employs alias analysis |
| in order to track values pointed to by function parameters. In order |
| not to spend too much time analyzing huge functions, it gives up and |
| considers all memory clobbered after examining |
| <samp>ipa-max-aa-steps</samp> statements modifying memory. |
| </p> |
| </dd> |
| <dt><code>lto-partitions</code></dt> |
| <dd><p>Specify desired number of partitions produced during WHOPR compilation. |
| The number of partitions should exceed the number of CPUs used for compilation. |
| The default value is 32. |
| </p> |
| </dd> |
| <dt><code>lto-minpartition</code></dt> |
| <dd><p>Size of minimal partition for WHOPR (in estimated instructions). |
| This avoids the expense of splitting very small programs into too many |
| partitions. |
| </p> |
| </dd> |
| <dt><code>cxx-max-namespaces-for-diagnostic-help</code></dt> |
| <dd><p>The maximum number of namespaces to consult for suggestions when C++ |
| name lookup fails for an identifier. The default is 1000. |
| </p> |
| </dd> |
| <dt><code>sink-frequency-threshold</code></dt> |
| <dd><p>The maximum relative execution frequency (in percent) of the target block |
| relative to a statement’s original block to allow statement sinking of a |
| statement. Larger numbers result in more aggressive statement sinking. |
| The default value is 75. A small positive adjustment is applied for |
| statements with memory operands as those are even more profitable to sink. |
| </p> |
| </dd> |
| <dt><code>max-stores-to-sink</code></dt> |
| <dd><p>The maximum number of conditional store pairs that can be sunk. Set to 0 |
| if either vectorization (<samp>-ftree-vectorize</samp>) or if-conversion |
| (<samp>-ftree-loop-if-convert</samp>) is disabled. The default is 2. |
| </p> |
| </dd> |
| <dt><code>allow-store-data-races</code></dt> |
| <dd><p>Allow optimizers to introduce new data races on stores. |
| Set to 1 to allow, otherwise to 0. This option is enabled by default |
| at optimization level <samp>-Ofast</samp>. |
| </p> |
| </dd> |
| <dt><code>case-values-threshold</code></dt> |
| <dd><p>The smallest number of different values for which it is best to use a |
| jump-table instead of a tree of conditional branches. If the value is |
| 0, use the default for the machine. The default is 0. |
| </p> |
| </dd> |
| <dt><code>tree-reassoc-width</code></dt> |
| <dd><p>Set the maximum number of instructions executed in parallel in |
| a reassociated tree. This parameter overrides the target-dependent |
| heuristics used by default if it has a nonzero value. |
| </p> |
| </dd> |
| <dt><code>sched-pressure-algorithm</code></dt> |
| <dd><p>Choose between the two available implementations of |
| <samp>-fsched-pressure</samp>. Algorithm 1 is the original implementation |
| and is the more likely to prevent instructions from being reordered. |
| Algorithm 2 was designed to be a compromise between the relatively |
| conservative approach taken by algorithm 1 and the rather aggressive |
| approach taken by the default scheduler. It relies more heavily on |
| having a regular register file and accurate register pressure classes. |
| See <samp>haifa-sched.c</samp> in the GCC sources for more details. |
| </p> |
| <p>The default choice depends on the target. |
| </p> |
| </dd> |
| <dt><code>max-slsr-cand-scan</code></dt> |
| <dd><p>Set the maximum number of existing candidates that are considered when |
| seeking a basis for a new straight-line strength reduction candidate. |
| </p> |
| </dd> |
| <dt><code>asan-globals</code></dt> |
| <dd><p>Enable buffer overflow detection for global objects. This kind |
| of protection is enabled by default if you are using |
| <samp>-fsanitize=address</samp> option. |
| To disable global objects protection use <samp>--param asan-globals=0</samp>. |
| </p> |
| </dd> |
| <dt><code>asan-stack</code></dt> |
| <dd><p>Enable buffer overflow detection for stack objects. This kind of |
| protection is enabled by default when using <samp>-fsanitize=address</samp>. |
| To disable stack protection use <samp>--param asan-stack=0</samp> option. |
| </p> |
| </dd> |
| <dt><code>asan-instrument-reads</code></dt> |
| <dd><p>Enable buffer overflow detection for memory reads. This kind of |
| protection is enabled by default when using <samp>-fsanitize=address</samp>. |
| To disable memory reads protection use |
| <samp>--param asan-instrument-reads=0</samp>. |
| </p> |
| </dd> |
| <dt><code>asan-instrument-writes</code></dt> |
| <dd><p>Enable buffer overflow detection for memory writes. This kind of |
| protection is enabled by default when using <samp>-fsanitize=address</samp>. |
| To disable memory writes protection use |
| <samp>--param asan-instrument-writes=0</samp> option. |
| </p> |
| </dd> |
| <dt><code>asan-memintrin</code></dt> |
| <dd><p>Enable detection for built-in functions. This kind of protection |
| is enabled by default when using <samp>-fsanitize=address</samp>. |
| To disable built-in functions protection use |
| <samp>--param asan-memintrin=0</samp>. |
| </p> |
| </dd> |
| <dt><code>asan-use-after-return</code></dt> |
| <dd><p>Enable detection of use-after-return. This kind of protection |
| is enabled by default when using <samp>-fsanitize=address</samp> option. |
| To disable use-after-return detection use |
| <samp>--param asan-use-after-return=0</samp>. |
| </p> |
| </dd> |
| <dt><code>asan-instrumentation-with-call-threshold</code></dt> |
| <dd><p>If the number of memory accesses in the function being instrumented |
| is greater than or equal to this number, use callbacks instead of inline checks. |
| E.g. to disable inline code use |
| <samp>--param asan-instrumentation-with-call-threshold=0</samp>. |
| </p> |
| </dd> |
| <dt><code>chkp-max-ctor-size</code></dt> |
| <dd><p>Static constructors generated by the Pointer Bounds Checker may become very |
| large and significantly increase compile time at optimization level |
| <samp>-O1</samp> and higher. This parameter is the maximum number of statements |
| in a single generated constructor. The default value is 5000. |
| </p> |
| </dd> |
| <dt><code>max-fsm-thread-path-insns</code></dt> |
| <dd><p>Maximum number of instructions to copy when duplicating blocks on a |
| finite state automaton jump thread path. The default is 100. |
| </p> |
| </dd> |
| <dt><code>max-fsm-thread-length</code></dt> |
| <dd><p>Maximum number of basic blocks on a finite state automaton jump thread |
| path. The default is 10. |
| </p> |
| </dd> |
| <dt><code>max-fsm-thread-paths</code></dt> |
| <dd><p>Maximum number of new jump thread paths to create for a finite state |
| automaton. The default is 50. |
| </p> |
| </dd> |
| </dl> |
| </dd> |
| </dl> |
| |
| <hr> |
| <div class="header"> |
| <p> |
| Next: <a href="Preprocessor-Options.html#Preprocessor-Options" accesskey="n" rel="next">Preprocessor Options</a>, Previous: <a href="Debugging-Options.html#Debugging-Options" accesskey="p" rel="prev">Debugging Options</a>, Up: <a href="Invoking-GCC.html#Invoking-GCC" accesskey="u" rel="up">Invoking GCC</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p> |
| </div> |
| |
| |
| |
| </body> |
| </html> |