| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html> |
| <!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ --> |
| <head> |
| <title>The GNU C Preprocessor Internals: Macro Expansion</title> |
| |
| <meta name="description" content="The GNU C Preprocessor Internals: Macro Expansion"> |
| <meta name="keywords" content="The GNU C Preprocessor Internals: Macro Expansion"> |
| <meta name="resource-type" content="document"> |
| <meta name="distribution" content="global"> |
| <meta name="Generator" content="makeinfo"> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
| <link href="index.html#Top" rel="start" title="Top"> |
| <link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index"> |
| <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> |
| <link href="index.html#Top" rel="up" title="Top"> |
| <link href="Token-Spacing.html#Token-Spacing" rel="next" title="Token Spacing"> |
| <link href="Hash-Nodes.html#Hash-Nodes" rel="prev" title="Hash Nodes"> |
| <style type="text/css"> |
| <!-- |
| a.summary-letter {text-decoration: none} |
| blockquote.smallquotation {font-size: smaller} |
| div.display {margin-left: 3.2em} |
| div.example {margin-left: 3.2em} |
| div.indentedblock {margin-left: 3.2em} |
| div.lisp {margin-left: 3.2em} |
| div.smalldisplay {margin-left: 3.2em} |
| div.smallexample {margin-left: 3.2em} |
| div.smallindentedblock {margin-left: 3.2em; font-size: smaller} |
| div.smalllisp {margin-left: 3.2em} |
| kbd {font-style:oblique} |
| pre.display {font-family: inherit} |
| pre.format {font-family: inherit} |
| pre.menu-comment {font-family: serif} |
| pre.menu-preformatted {font-family: serif} |
| pre.smalldisplay {font-family: inherit; font-size: smaller} |
| pre.smallexample {font-size: smaller} |
| pre.smallformat {font-family: inherit; font-size: smaller} |
| pre.smalllisp {font-size: smaller} |
| span.nocodebreak {white-space:nowrap} |
| span.nolinebreak {white-space:nowrap} |
| span.roman {font-family:serif; font-weight:normal} |
| span.sansserif {font-family:sans-serif; font-weight:normal} |
| ul.no-bullet {list-style: none} |
| --> |
| </style> |
| |
| |
| </head> |
| |
| <body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> |
| <a name="Macro-Expansion"></a> |
| <div class="header"> |
| <p> |
| Next: <a href="Token-Spacing.html#Token-Spacing" accesskey="n" rel="next">Token Spacing</a>, Previous: <a href="Hash-Nodes.html#Hash-Nodes" accesskey="p" rel="prev">Hash Nodes</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p> |
| </div> |
| <hr> |
| <a name="Macro-Expansion-Algorithm"></a> |
| <h2 class="unnumbered">Macro Expansion Algorithm</h2> |
| <a name="index-macro-expansion"></a> |
| |
| <p>Macro expansion is a tricky operation, fraught with nasty corner cases |
| and situations that render what you thought was a nifty way to |
| optimize the preprocessor’s expansion algorithm wrong in quite subtle |
| ways. |
| </p> |
| <p>I strongly recommend you have a good grasp of how the C and C++ |
| standards require macros to be expanded before diving into this |
| section, let alone the code!. If you don’t have a clear mental |
| picture of how things like nested macro expansion, stringification and |
| token pasting are supposed to work, damage to your sanity can quickly |
| result. |
| </p> |
| <a name="Internal-representation-of-macros"></a> |
| <h3 class="section">Internal representation of macros</h3> |
| <a name="index-macro-representation-_0028internal_0029"></a> |
| |
| <p>The preprocessor stores macro expansions in tokenized form. This |
| saves repeated lexing passes during expansion, at the cost of a small |
| increase in memory consumption on average. The tokens are stored |
| contiguously in memory, so a pointer to the first one and a token |
| count is all you need to get the replacement list of a macro. |
| </p> |
| <p>If the macro is a function-like macro the preprocessor also stores its |
| parameters, in the form of an ordered list of pointers to the hash |
| table entry of each parameter’s identifier. Further, in the macro’s |
| stored expansion each occurrence of a parameter is replaced with a |
| special token of type <code>CPP_MACRO_ARG</code>. Each such token holds the |
| index of the parameter it represents in the parameter list, which |
| allows rapid replacement of parameters with their arguments during |
| expansion. Despite this optimization it is still necessary to store |
| the original parameters to the macro, both for dumping with e.g., |
| <samp>-dD</samp>, and to warn about non-trivial macro redefinitions when |
| the parameter names have changed. |
| </p> |
| <a name="Macro-expansion-overview"></a> |
| <h3 class="section">Macro expansion overview</h3> |
| <p>The preprocessor maintains a <em>context stack</em>, implemented as a |
| linked list of <code>cpp_context</code> structures, which together represent |
| the macro expansion state at any one time. The <code>struct |
| cpp_reader</code> member variable <code>context</code> points to the current top |
| of this stack. The top normally holds the unexpanded replacement list |
| of the innermost macro under expansion, except when cpplib is about to |
| pre-expand an argument, in which case it holds that argument’s |
| unexpanded tokens. |
| </p> |
| <p>When there are no macros under expansion, cpplib is in <em>base |
| context</em>. All contexts other than the base context contain a |
| contiguous list of tokens delimited by a starting and ending token. |
| When not in base context, cpplib obtains the next token from the list |
| of the top context. If there are no tokens left in the list, it pops |
| that context off the stack, and subsequent ones if necessary, until an |
| unexhausted context is found or it returns to base context. In base |
| context, cpplib reads tokens directly from the lexer. |
| </p> |
| <p>If it encounters an identifier that is both a macro and enabled for |
| expansion, cpplib prepares to push a new context for that macro on the |
| stack by calling the routine <code>enter_macro_context</code>. When this |
| routine returns, the new context will contain the unexpanded tokens of |
| the replacement list of that macro. In the case of function-like |
| macros, <code>enter_macro_context</code> also replaces any parameters in the |
| replacement list, stored as <code>CPP_MACRO_ARG</code> tokens, with the |
| appropriate macro argument. If the standard requires that the |
| parameter be replaced with its expanded argument, the argument will |
| have been fully macro expanded first. |
| </p> |
| <p><code>enter_macro_context</code> also handles special macros like |
| <code>__LINE__</code>. Although these macros expand to a single token which |
| cannot contain any further macros, for reasons of token spacing |
| (see <a href="Token-Spacing.html#Token-Spacing">Token Spacing</a>) and simplicity of implementation, cpplib |
| handles these special macros by pushing a context containing just that |
| one token. |
| </p> |
| <p>The final thing that <code>enter_macro_context</code> does before returning |
| is to mark the macro disabled for expansion (except for special macros |
| like <code>__TIME__</code>). The macro is re-enabled when its context is |
| later popped from the context stack, as described above. This strict |
| ordering ensures that a macro is disabled whilst its expansion is |
| being scanned, but that it is <em>not</em> disabled whilst any arguments |
| to it are being expanded. |
| </p> |
| <a name="Scanning-the-replacement-list-for-macros-to-expand"></a> |
| <h3 class="section">Scanning the replacement list for macros to expand</h3> |
| <p>The C standard states that, after any parameters have been replaced |
| with their possibly-expanded arguments, the replacement list is |
| scanned for nested macros. Further, any identifiers in the |
| replacement list that are not expanded during this scan are never |
| again eligible for expansion in the future, if the reason they were |
| not expanded is that the macro in question was disabled. |
| </p> |
| <p>Clearly this latter condition can only apply to tokens resulting from |
| argument pre-expansion. Other tokens never have an opportunity to be |
| re-tested for expansion. It is possible for identifiers that are |
| function-like macros to not expand initially but to expand during a |
| later scan. This occurs when the identifier is the last token of an |
| argument (and therefore originally followed by a comma or a closing |
| parenthesis in its macro’s argument list), and when it replaces its |
| parameter in the macro’s replacement list, the subsequent token |
| happens to be an opening parenthesis (itself possibly the first token |
| of an argument). |
| </p> |
| <p>It is important to note that when cpplib reads the last token of a |
| given context, that context still remains on the stack. Only when |
| looking for the <em>next</em> token do we pop it off the stack and drop |
| to a lower context. This makes backing up by one token easy, but more |
| importantly ensures that the macro corresponding to the current |
| context is still disabled when we are considering the last token of |
| its replacement list for expansion (or indeed expanding it). As an |
| example, which illustrates many of the points above, consider |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">#define foo(x) bar x |
| foo(foo) (2) |
| </pre></div> |
| |
| <p>which fully expands to ‘<samp>bar foo (2)</samp>’. During pre-expansion |
| of the argument, ‘<samp>foo</samp>’ does not expand even though the macro is |
| enabled, since it has no following parenthesis [pre-expansion of an |
| argument only uses tokens from that argument; it cannot take tokens |
| from whatever follows the macro invocation]. This still leaves the |
| argument token ‘<samp>foo</samp>’ eligible for future expansion. Then, when |
| re-scanning after argument replacement, the token ‘<samp>foo</samp>’ is |
| rejected for expansion, and marked ineligible for future expansion, |
| since the macro is now disabled. It is disabled because the |
| replacement list ‘<samp>bar foo</samp>’ of the macro is still on the context |
| stack. |
| </p> |
| <p>If instead the algorithm looked for an opening parenthesis first and |
| then tested whether the macro were disabled it would be subtly wrong. |
| In the example above, the replacement list of ‘<samp>foo</samp>’ would be |
| popped in the process of finding the parenthesis, re-enabling |
| ‘<samp>foo</samp>’ and expanding it a second time. |
| </p> |
| <a name="Looking-for-a-function_002dlike-macro_0027s-opening-parenthesis"></a> |
| <h3 class="section">Looking for a function-like macro’s opening parenthesis</h3> |
| <p>Function-like macros only expand when immediately followed by a |
| parenthesis. To do this cpplib needs to temporarily disable macros |
| and read the next token. Unfortunately, because of spacing issues |
| (see <a href="Token-Spacing.html#Token-Spacing">Token Spacing</a>), there can be fake padding tokens in-between, |
| and if the next real token is not a parenthesis cpplib needs to be |
| able to back up that one token as well as retain the information in |
| any intervening padding tokens. |
| </p> |
| <p>Backing up more than one token when macros are involved is not |
| permitted by cpplib, because in general it might involve issues like |
| restoring popped contexts onto the context stack, which are too hard. |
| Instead, searching for the parenthesis is handled by a special |
| function, <code>funlike_invocation_p</code>, which remembers padding |
| information as it reads tokens. If the next real token is not an |
| opening parenthesis, it backs up that one token, and then pushes an |
| extra context just containing the padding information if necessary. |
| </p> |
| <a name="Marking-tokens-ineligible-for-future-expansion"></a> |
| <h3 class="section">Marking tokens ineligible for future expansion</h3> |
| <p>As discussed above, cpplib needs a way of marking tokens as |
| unexpandable. Since the tokens cpplib handles are read-only once they |
| have been lexed, it instead makes a copy of the token and adds the |
| flag <code>NO_EXPAND</code> to the copy. |
| </p> |
| <p>For efficiency and to simplify memory management by avoiding having to |
| remember to free these tokens, they are allocated as temporary tokens |
| from the lexer’s current token run (see <a href="Lexer.html#Lexing-a-line">Lexing a line</a>) using the |
| function <code>_cpp_temp_token</code>. The tokens are then re-used once the |
| current line of tokens has been read in. |
| </p> |
| <p>This might sound unsafe. However, tokens runs are not re-used at the |
| end of a line if it happens to be in the middle of a macro argument |
| list, and cpplib only wants to back-up more than one lexer token in |
| situations where no macro expansion is involved, so the optimization |
| is safe. |
| </p> |
| <hr> |
| <div class="header"> |
| <p> |
| Next: <a href="Token-Spacing.html#Token-Spacing" accesskey="n" rel="next">Token Spacing</a>, Previous: <a href="Hash-Nodes.html#Hash-Nodes" accesskey="p" rel="prev">Hash Nodes</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p> |
| </div> |
| |
| |
| |
| </body> |
| </html> |