aarch64-linux-gnu-4.9/share/doc/cppinternals/Token-Spacing.html - toolchains/linux-x86/gcc - Git at Google

 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 <html>
 <!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
 <head>
 <title>The GNU C Preprocessor Internals: Token Spacing</title>

 <meta name="description" content="The GNU C Preprocessor Internals: Token Spacing">
 <meta name="keywords" content="The GNU C Preprocessor Internals: Token Spacing">
 <meta name="resource-type" content="document">
 <meta name="distribution" content="global">
 <meta name="Generator" content="makeinfo">
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <link href="index.html#Top" rel="start" title="Top">
 <link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
 <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
 <link href="index.html#Top" rel="up" title="Top">
 <link href="Line-Numbering.html#Line-Numbering" rel="next" title="Line Numbering">
 <link href="Macro-Expansion.html#Macro-Expansion" rel="prev" title="Macro Expansion">
 <style type="text/css">
 <!--
 a.summary-letter {text-decoration: none}
 blockquote.smallquotation {font-size: smaller}
 div.display {margin-left: 3.2em}
 div.example {margin-left: 3.2em}
 div.indentedblock {margin-left: 3.2em}
 div.lisp {margin-left: 3.2em}
 div.smalldisplay {margin-left: 3.2em}
 div.smallexample {margin-left: 3.2em}
 div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
 div.smalllisp {margin-left: 3.2em}
 kbd {font-style:oblique}
 pre.display {font-family: inherit}
 pre.format {font-family: inherit}
 pre.menu-comment {font-family: serif}
 pre.menu-preformatted {font-family: serif}
 pre.smalldisplay {font-family: inherit; font-size: smaller}
 pre.smallexample {font-size: smaller}
 pre.smallformat {font-family: inherit; font-size: smaller}
 pre.smalllisp {font-size: smaller}
 span.nocodebreak {white-space:nowrap}
 span.nolinebreak {white-space:nowrap}
 span.roman {font-family:serif; font-weight:normal}
 span.sansserif {font-family:sans-serif; font-weight:normal}
 ul.no-bullet {list-style: none}
 -->
 </style>


 </head>

 <body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
 <a name="Token-Spacing"></a>
 <div class="header">
 <p>
 Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
 </div>
 <hr>
 <a name="Token-Spacing-1"></a>
 <h2 class="unnumbered">Token Spacing</h2>
 <a name="index-paste-avoidance"></a>
 <a name="index-spacing"></a>
 <a name="index-token-spacing"></a>

 <p>First, consider an issue that only concerns the stand-alone
 preprocessor: there needs to be a guarantee that re-reading its preprocessed
 output results in an identical token stream.  Without taking special
 measures, this might not be the case because of macro substitution.
 For example:
 </p>
 <div class="smallexample">
 <pre class="smallexample">#define PLUS +
 #define EMPTY
 #define f(x) =x=
 +PLUS -EMPTY- PLUS+ f(=)
         &rarr; + + - - + + = = =
 <em>not</em>
         &rarr; ++ -- ++ ===
 </pre></div>

 <p>One solution would be to simply insert a space between all adjacent
 tokens.  However, we would like to keep space insertion to a minimum,
 both for aesthetic reasons and because it causes problems for people who
 still try to abuse the preprocessor for things like Fortran source and
 Makefiles.
 </p>
 <p>For now, just notice that when tokens are added (or removed, as shown by
 the <code>EMPTY</code> example) from the original lexed token stream, we need
 to check for accidental token pasting.  We call this <em>paste
 avoidance</em>.  Token addition and removal can only occur because of macro
 expansion, but accidental pasting can occur in many places: both before
 and after each macro replacement, each argument replacement, and
 additionally each token created by the &lsquo;<samp>#</samp>&rsquo; and &lsquo;<samp>##</samp>&rsquo; operators.
 </p>
 <p>Look at how the preprocessor gets whitespace output correct
 normally.  The <code>cpp_token</code> structure contains a flags byte, and one
 of those flags is <code>PREV_WHITE</code>.  This is flagged by the lexer, and
 indicates that the token was preceded by whitespace of some form other
 than a new line.  The stand-alone preprocessor can use this flag to
 decide whether to insert a space between tokens in the output.
 </p>
 <p>Now consider the result of the following macro expansion:
 </p>
 <div class="smallexample">
 <pre class="smallexample">#define add(x, y, z) x + y +z;
 sum = add (1,2, 3);
         &rarr; sum = 1 + 2 +3;
 </pre></div>

 <p>The interesting thing here is that the tokens &lsquo;<samp>1</samp>&rsquo; and &lsquo;<samp>2</samp>&rsquo; are
 output with a preceding space, and &lsquo;<samp>3</samp>&rsquo; is output without a
 preceding space, but when lexed none of these tokens had that property.
 Careful consideration reveals that &lsquo;<samp>1</samp>&rsquo; gets its preceding
 whitespace from the space preceding &lsquo;<samp>add</samp>&rsquo; in the macro invocation,
 <em>not</em> replacement list.  &lsquo;<samp>2</samp>&rsquo; gets its whitespace from the
 space preceding the parameter &lsquo;<samp>y</samp>&rsquo; in the macro replacement list,
 and &lsquo;<samp>3</samp>&rsquo; has no preceding space because parameter &lsquo;<samp>z</samp>&rsquo; has none
 in the replacement list.
 </p>
 <p>Once lexed, tokens are effectively fixed and cannot be altered, since
 pointers to them might be held in many places, in particular by
 in-progress macro expansions.  So instead of modifying the two tokens
 above, the preprocessor inserts a special token, which I call a
 <em>padding token</em>, into the token stream to indicate that spacing of
 the subsequent token is special.  The preprocessor inserts padding
 tokens in front of every macro expansion and expanded macro argument.
 These point to a <em>source token</em> from which the subsequent real token
 should inherit its spacing.  In the above example, the source tokens are
 &lsquo;<samp>add</samp>&rsquo; in the macro invocation, and &lsquo;<samp>y</samp>&rsquo; and &lsquo;<samp>z</samp>&rsquo; in the
 macro replacement list, respectively.
 </p>
 <p>It is quite easy to get multiple padding tokens in a row, for example if
 a macro&rsquo;s first replacement token expands straight into another macro.
 </p>
 <div class="smallexample">
 <pre class="smallexample">#define foo bar
 #define bar baz
 [foo]
         &rarr; [baz]
 </pre></div>

 <p>Here, two padding tokens are generated with sources the &lsquo;<samp>foo</samp>&rsquo; token
 between the brackets, and the &lsquo;<samp>bar</samp>&rsquo; token from foo&rsquo;s replacement
 list, respectively.  Clearly the first padding token is the one to
 use, so the output code should contain a rule that the first
 padding token in a sequence is the one that matters.
 </p>
 <p>But what if a macro expansion is left?  Adjusting the above
 example slightly:
 </p>
 <div class="smallexample">
 <pre class="smallexample">#define foo bar
 #define bar EMPTY baz
 #define EMPTY
 [foo] EMPTY;
         &rarr; [ baz] ;
 </pre></div>

 <p>As shown, now there should be a space before &lsquo;<samp>baz</samp>&rsquo; and the
 semicolon in the output.
 </p>
 <p>The rules we decided above fail for &lsquo;<samp>baz</samp>&rsquo;: we generate three
 padding tokens, one per macro invocation, before the token &lsquo;<samp>baz</samp>&rsquo;.
 We would then have it take its spacing from the first of these, which
 carries source token &lsquo;<samp>foo</samp>&rsquo; with no leading space.
 </p>
 <p>It is vital that cpplib get spacing correct in these examples since any
 of these macro expansions could be stringified, where spacing matters.
 </p>
 <p>So, this demonstrates that not just entering macro and argument
 expansions, but leaving them requires special handling too.  I made
 cpplib insert a padding token with a <code>NULL</code> source token when
 leaving macro expansions, as well as after each replaced argument in a
 macro&rsquo;s replacement list.  It also inserts appropriate padding tokens on
 either side of tokens created by the &lsquo;<samp>#</samp>&rsquo; and &lsquo;<samp>##</samp>&rsquo; operators.
 I expanded the rule so that, if we see a padding token with a
 <code>NULL</code> source token, <em>and</em> that source token has no leading
 space, then we behave as if we have seen no padding tokens at all.  A
 quick check shows this rule will then get the above example correct as
 well.
 </p>
 <p>Now a relationship with paste avoidance is apparent: we have to be
 careful about paste avoidance in exactly the same locations we have
 padding tokens in order to get white space correct.  This makes
 implementation of paste avoidance easy: wherever the stand-alone
 preprocessor is fixing up spacing because of padding tokens, and it
 turns out that no space is needed, it has to take the extra step to
 check that a space is not needed after all to avoid an accidental paste.
 The function <code>cpp_avoid_paste</code> advises whether a space is required
 between two consecutive tokens.  To avoid excessive spacing, it tries
 hard to only require a space if one is likely to be necessary, but for
 reasons of efficiency it is slightly conservative and might recommend a
 space where one is not strictly needed.
 </p>
 <hr>
 <div class="header">
 <p>
 Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
 </div>


 </body>
 </html>
	<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
	<html>
	<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
	<head>
	<title>The GNU C Preprocessor Internals: Token Spacing</title>

	<meta name="description" content="The GNU C Preprocessor Internals: Token Spacing">
	<meta name="keywords" content="The GNU C Preprocessor Internals: Token Spacing">
	<meta name="resource-type" content="document">
	<meta name="distribution" content="global">
	<meta name="Generator" content="makeinfo">
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	<link href="index.html#Top" rel="start" title="Top">
	<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
	<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
	<link href="index.html#Top" rel="up" title="Top">
	<link href="Line-Numbering.html#Line-Numbering" rel="next" title="Line Numbering">
	<link href="Macro-Expansion.html#Macro-Expansion" rel="prev" title="Macro Expansion">
	<style type="text/css">
	<!--
	a.summary-letter {text-decoration: none}
	blockquote.smallquotation {font-size: smaller}
	div.display {margin-left: 3.2em}
	div.example {margin-left: 3.2em}
	div.indentedblock {margin-left: 3.2em}
	div.lisp {margin-left: 3.2em}
	div.smalldisplay {margin-left: 3.2em}
	div.smallexample {margin-left: 3.2em}
	div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
	div.smalllisp {margin-left: 3.2em}
	kbd {font-style:oblique}
	pre.display {font-family: inherit}
	pre.format {font-family: inherit}
	pre.menu-comment {font-family: serif}
	pre.menu-preformatted {font-family: serif}
	pre.smalldisplay {font-family: inherit; font-size: smaller}
	pre.smallexample {font-size: smaller}
	pre.smallformat {font-family: inherit; font-size: smaller}
	pre.smalllisp {font-size: smaller}
	span.nocodebreak {white-space:nowrap}
	span.nolinebreak {white-space:nowrap}
	span.roman {font-family:serif; font-weight:normal}
	span.sansserif {font-family:sans-serif; font-weight:normal}
	ul.no-bullet {list-style: none}
	-->
	</style>


	</head>

	<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
	<a name="Token-Spacing"></a>
	<div class="header">
	<p>
	Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a>   [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
	</div>
	<hr>
	<a name="Token-Spacing-1"></a>
	<h2 class="unnumbered">Token Spacing</h2>
	<a name="index-paste-avoidance"></a>
	<a name="index-spacing"></a>
	<a name="index-token-spacing"></a>

	<p>First, consider an issue that only concerns the stand-alone
	preprocessor: there needs to be a guarantee that re-reading its preprocessed
	output results in an identical token stream. Without taking special
	measures, this might not be the case because of macro substitution.
	For example:
	</p>
	<div class="smallexample">
	<pre class="smallexample">#define PLUS +
	#define EMPTY
	#define f(x) =x=
	+PLUS -EMPTY- PLUS+ f(=)
	→ + + - - + + = = =
	<em>not</em>
	→ ++ -- ++ ===
	</pre></div>

	<p>One solution would be to simply insert a space between all adjacent
	tokens. However, we would like to keep space insertion to a minimum,
	both for aesthetic reasons and because it causes problems for people who
	still try to abuse the preprocessor for things like Fortran source and
	Makefiles.
	</p>
	<p>For now, just notice that when tokens are added (or removed, as shown by
	the <code>EMPTY</code> example) from the original lexed token stream, we need
	to check for accidental token pasting. We call this <em>paste
	avoidance</em>. Token addition and removal can only occur because of macro
	expansion, but accidental pasting can occur in many places: both before
	and after each macro replacement, each argument replacement, and
	additionally each token created by the ‘<samp>#</samp>’ and ‘<samp>##</samp>’ operators.
	</p>
	<p>Look at how the preprocessor gets whitespace output correct
	normally. The <code>cpp_token</code> structure contains a flags byte, and one
	of those flags is <code>PREV_WHITE</code>. This is flagged by the lexer, and
	indicates that the token was preceded by whitespace of some form other
	than a new line. The stand-alone preprocessor can use this flag to
	decide whether to insert a space between tokens in the output.
	</p>
	<p>Now consider the result of the following macro expansion:
	</p>
	<div class="smallexample">
	<pre class="smallexample">#define add(x, y, z) x + y +z;
	sum = add (1,2, 3);
	→ sum = 1 + 2 +3;
	</pre></div>

	<p>The interesting thing here is that the tokens ‘<samp>1</samp>’ and ‘<samp>2</samp>’ are
	output with a preceding space, and ‘<samp>3</samp>’ is output without a
	preceding space, but when lexed none of these tokens had that property.
	Careful consideration reveals that ‘<samp>1</samp>’ gets its preceding
	whitespace from the space preceding ‘<samp>add</samp>’ in the macro invocation,
	<em>not</em> replacement list. ‘<samp>2</samp>’ gets its whitespace from the
	space preceding the parameter ‘<samp>y</samp>’ in the macro replacement list,
	and ‘<samp>3</samp>’ has no preceding space because parameter ‘<samp>z</samp>’ has none
	in the replacement list.
	</p>
	<p>Once lexed, tokens are effectively fixed and cannot be altered, since
	pointers to them might be held in many places, in particular by
	in-progress macro expansions. So instead of modifying the two tokens
	above, the preprocessor inserts a special token, which I call a
	<em>padding token</em>, into the token stream to indicate that spacing of
	the subsequent token is special. The preprocessor inserts padding
	tokens in front of every macro expansion and expanded macro argument.
	These point to a <em>source token</em> from which the subsequent real token
	should inherit its spacing. In the above example, the source tokens are
	‘<samp>add</samp>’ in the macro invocation, and ‘<samp>y</samp>’ and ‘<samp>z</samp>’ in the
	macro replacement list, respectively.
	</p>
	<p>It is quite easy to get multiple padding tokens in a row, for example if
	a macro’s first replacement token expands straight into another macro.
	</p>
	<div class="smallexample">
	<pre class="smallexample">#define foo bar
	#define bar baz
	[foo]
	→ [baz]
	</pre></div>

	<p>Here, two padding tokens are generated with sources the ‘<samp>foo</samp>’ token
	between the brackets, and the ‘<samp>bar</samp>’ token from foo’s replacement
	list, respectively. Clearly the first padding token is the one to
	use, so the output code should contain a rule that the first
	padding token in a sequence is the one that matters.
	</p>
	<p>But what if a macro expansion is left? Adjusting the above
	example slightly:
	</p>
	<div class="smallexample">
	<pre class="smallexample">#define foo bar
	#define bar EMPTY baz
	#define EMPTY
	[foo] EMPTY;
	→ [ baz] ;
	</pre></div>

	<p>As shown, now there should be a space before ‘<samp>baz</samp>’ and the
	semicolon in the output.
	</p>
	<p>The rules we decided above fail for ‘<samp>baz</samp>’: we generate three
	padding tokens, one per macro invocation, before the token ‘<samp>baz</samp>’.
	We would then have it take its spacing from the first of these, which
	carries source token ‘<samp>foo</samp>’ with no leading space.
	</p>
	<p>It is vital that cpplib get spacing correct in these examples since any
	of these macro expansions could be stringified, where spacing matters.
	</p>
	<p>So, this demonstrates that not just entering macro and argument
	expansions, but leaving them requires special handling too. I made
	cpplib insert a padding token with a <code>NULL</code> source token when
	leaving macro expansions, as well as after each replaced argument in a
	macro’s replacement list. It also inserts appropriate padding tokens on
	either side of tokens created by the ‘<samp>#</samp>’ and ‘<samp>##</samp>’ operators.
	I expanded the rule so that, if we see a padding token with a
	<code>NULL</code> source token, <em>and</em> that source token has no leading
	space, then we behave as if we have seen no padding tokens at all. A
	quick check shows this rule will then get the above example correct as
	well.
	</p>
	<p>Now a relationship with paste avoidance is apparent: we have to be
	careful about paste avoidance in exactly the same locations we have
	padding tokens in order to get white space correct. This makes
	implementation of paste avoidance easy: wherever the stand-alone
	preprocessor is fixing up spacing because of padding tokens, and it
	turns out that no space is needed, it has to take the extra step to
	check that a space is not needed after all to avoid an accidental paste.
	The function <code>cpp_avoid_paste</code> advises whether a space is required
	between two consecutive tokens. To avoid excessive spacing, it tries
	hard to only require a space if one is likely to be necessary, but for
	reasons of efficiency it is slightly conservative and might recommend a
	space where one is not strictly needed.
	</p>
	<hr>
	<div class="header">
	<p>
	Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a>   [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
	</div>



	</body>
	</html>