| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html> |
| <!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ --> |
| <head> |
| <title>The GNU C Preprocessor Internals: Token Spacing</title> |
| |
| <meta name="description" content="The GNU C Preprocessor Internals: Token Spacing"> |
| <meta name="keywords" content="The GNU C Preprocessor Internals: Token Spacing"> |
| <meta name="resource-type" content="document"> |
| <meta name="distribution" content="global"> |
| <meta name="Generator" content="makeinfo"> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
| <link href="index.html#Top" rel="start" title="Top"> |
| <link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index"> |
| <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> |
| <link href="index.html#Top" rel="up" title="Top"> |
| <link href="Line-Numbering.html#Line-Numbering" rel="next" title="Line Numbering"> |
| <link href="Macro-Expansion.html#Macro-Expansion" rel="prev" title="Macro Expansion"> |
| <style type="text/css"> |
| <!-- |
| a.summary-letter {text-decoration: none} |
| blockquote.smallquotation {font-size: smaller} |
| div.display {margin-left: 3.2em} |
| div.example {margin-left: 3.2em} |
| div.indentedblock {margin-left: 3.2em} |
| div.lisp {margin-left: 3.2em} |
| div.smalldisplay {margin-left: 3.2em} |
| div.smallexample {margin-left: 3.2em} |
| div.smallindentedblock {margin-left: 3.2em; font-size: smaller} |
| div.smalllisp {margin-left: 3.2em} |
| kbd {font-style:oblique} |
| pre.display {font-family: inherit} |
| pre.format {font-family: inherit} |
| pre.menu-comment {font-family: serif} |
| pre.menu-preformatted {font-family: serif} |
| pre.smalldisplay {font-family: inherit; font-size: smaller} |
| pre.smallexample {font-size: smaller} |
| pre.smallformat {font-family: inherit; font-size: smaller} |
| pre.smalllisp {font-size: smaller} |
| span.nocodebreak {white-space:nowrap} |
| span.nolinebreak {white-space:nowrap} |
| span.roman {font-family:serif; font-weight:normal} |
| span.sansserif {font-family:sans-serif; font-weight:normal} |
| ul.no-bullet {list-style: none} |
| --> |
| </style> |
| |
| |
| </head> |
| |
| <body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> |
| <a name="Token-Spacing"></a> |
| <div class="header"> |
| <p> |
| Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p> |
| </div> |
| <hr> |
| <a name="Token-Spacing-1"></a> |
| <h2 class="unnumbered">Token Spacing</h2> |
| <a name="index-paste-avoidance"></a> |
| <a name="index-spacing"></a> |
| <a name="index-token-spacing"></a> |
| |
| <p>First, consider an issue that only concerns the stand-alone |
| preprocessor: there needs to be a guarantee that re-reading its preprocessed |
| output results in an identical token stream. Without taking special |
| measures, this might not be the case because of macro substitution. |
| For example: |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">#define PLUS + |
| #define EMPTY |
| #define f(x) =x= |
| +PLUS -EMPTY- PLUS+ f(=) |
| → + + - - + + = = = |
| <em>not</em> |
| → ++ -- ++ === |
| </pre></div> |
| |
| <p>One solution would be to simply insert a space between all adjacent |
| tokens. However, we would like to keep space insertion to a minimum, |
| both for aesthetic reasons and because it causes problems for people who |
| still try to abuse the preprocessor for things like Fortran source and |
| Makefiles. |
| </p> |
| <p>For now, just notice that when tokens are added (or removed, as shown by |
| the <code>EMPTY</code> example) from the original lexed token stream, we need |
| to check for accidental token pasting. We call this <em>paste |
| avoidance</em>. Token addition and removal can only occur because of macro |
| expansion, but accidental pasting can occur in many places: both before |
| and after each macro replacement, each argument replacement, and |
| additionally each token created by the ‘<samp>#</samp>’ and ‘<samp>##</samp>’ operators. |
| </p> |
| <p>Look at how the preprocessor gets whitespace output correct |
| normally. The <code>cpp_token</code> structure contains a flags byte, and one |
| of those flags is <code>PREV_WHITE</code>. This is flagged by the lexer, and |
| indicates that the token was preceded by whitespace of some form other |
| than a new line. The stand-alone preprocessor can use this flag to |
| decide whether to insert a space between tokens in the output. |
| </p> |
| <p>Now consider the result of the following macro expansion: |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">#define add(x, y, z) x + y +z; |
| sum = add (1,2, 3); |
| → sum = 1 + 2 +3; |
| </pre></div> |
| |
| <p>The interesting thing here is that the tokens ‘<samp>1</samp>’ and ‘<samp>2</samp>’ are |
| output with a preceding space, and ‘<samp>3</samp>’ is output without a |
| preceding space, but when lexed none of these tokens had that property. |
| Careful consideration reveals that ‘<samp>1</samp>’ gets its preceding |
| whitespace from the space preceding ‘<samp>add</samp>’ in the macro invocation, |
| <em>not</em> replacement list. ‘<samp>2</samp>’ gets its whitespace from the |
| space preceding the parameter ‘<samp>y</samp>’ in the macro replacement list, |
| and ‘<samp>3</samp>’ has no preceding space because parameter ‘<samp>z</samp>’ has none |
| in the replacement list. |
| </p> |
| <p>Once lexed, tokens are effectively fixed and cannot be altered, since |
| pointers to them might be held in many places, in particular by |
| in-progress macro expansions. So instead of modifying the two tokens |
| above, the preprocessor inserts a special token, which I call a |
| <em>padding token</em>, into the token stream to indicate that spacing of |
| the subsequent token is special. The preprocessor inserts padding |
| tokens in front of every macro expansion and expanded macro argument. |
| These point to a <em>source token</em> from which the subsequent real token |
| should inherit its spacing. In the above example, the source tokens are |
| ‘<samp>add</samp>’ in the macro invocation, and ‘<samp>y</samp>’ and ‘<samp>z</samp>’ in the |
| macro replacement list, respectively. |
| </p> |
| <p>It is quite easy to get multiple padding tokens in a row, for example if |
| a macro’s first replacement token expands straight into another macro. |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">#define foo bar |
| #define bar baz |
| [foo] |
| → [baz] |
| </pre></div> |
| |
| <p>Here, two padding tokens are generated with sources the ‘<samp>foo</samp>’ token |
| between the brackets, and the ‘<samp>bar</samp>’ token from foo’s replacement |
| list, respectively. Clearly the first padding token is the one to |
| use, so the output code should contain a rule that the first |
| padding token in a sequence is the one that matters. |
| </p> |
| <p>But what if a macro expansion is left? Adjusting the above |
| example slightly: |
| </p> |
| <div class="smallexample"> |
| <pre class="smallexample">#define foo bar |
| #define bar EMPTY baz |
| #define EMPTY |
| [foo] EMPTY; |
| → [ baz] ; |
| </pre></div> |
| |
| <p>As shown, now there should be a space before ‘<samp>baz</samp>’ and the |
| semicolon in the output. |
| </p> |
| <p>The rules we decided above fail for ‘<samp>baz</samp>’: we generate three |
| padding tokens, one per macro invocation, before the token ‘<samp>baz</samp>’. |
| We would then have it take its spacing from the first of these, which |
| carries source token ‘<samp>foo</samp>’ with no leading space. |
| </p> |
| <p>It is vital that cpplib get spacing correct in these examples since any |
| of these macro expansions could be stringified, where spacing matters. |
| </p> |
| <p>So, this demonstrates that not just entering macro and argument |
| expansions, but leaving them requires special handling too. I made |
| cpplib insert a padding token with a <code>NULL</code> source token when |
| leaving macro expansions, as well as after each replaced argument in a |
| macro’s replacement list. It also inserts appropriate padding tokens on |
| either side of tokens created by the ‘<samp>#</samp>’ and ‘<samp>##</samp>’ operators. |
| I expanded the rule so that, if we see a padding token with a |
| <code>NULL</code> source token, <em>and</em> that source token has no leading |
| space, then we behave as if we have seen no padding tokens at all. A |
| quick check shows this rule will then get the above example correct as |
| well. |
| </p> |
| <p>Now a relationship with paste avoidance is apparent: we have to be |
| careful about paste avoidance in exactly the same locations we have |
| padding tokens in order to get white space correct. This makes |
| implementation of paste avoidance easy: wherever the stand-alone |
| preprocessor is fixing up spacing because of padding tokens, and it |
| turns out that no space is needed, it has to take the extra step to |
| check that a space is not needed after all to avoid an accidental paste. |
| The function <code>cpp_avoid_paste</code> advises whether a space is required |
| between two consecutive tokens. To avoid excessive spacing, it tries |
| hard to only require a space if one is likely to be necessary, but for |
| reasons of efficiency it is slightly conservative and might recommend a |
| space where one is not strictly needed. |
| </p> |
| <hr> |
| <div class="header"> |
| <p> |
| Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p> |
| </div> |
| |
| |
| |
| </body> |
| </html> |