aarch64-linux-gnu-5.1/share/doc/cpp/Tokenization.html - toolchains/linux-x86/gcc - Git at Google

 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 <html>
 <!-- Copyright (C) 1987-2015 Free Software Foundation, Inc.

 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.3 or
 any later version published by the Free Software Foundation.  A copy of
 the license is included in the
 section entitled "GNU Free Documentation License".

 This manual contains no Invariant Sections.  The Front-Cover Texts are
 (a) (see below), and the Back-Cover Texts are (b) (see below).

 (a) The FSF's Front-Cover Text is:

 A GNU Manual

 (b) The FSF's Back-Cover Text is:

 You have freedom to copy and modify this GNU Manual, like GNU
      software.  Copies published by the Free Software Foundation raise
      funds for GNU development. -->
 <!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
 <head>
 <title>The C Preprocessor: Tokenization</title>

 <meta name="description" content="The C Preprocessor: Tokenization">
 <meta name="keywords" content="The C Preprocessor: Tokenization">
 <meta name="resource-type" content="document">
 <meta name="distribution" content="global">
 <meta name="Generator" content="makeinfo">
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <link href="index.html#Top" rel="start" title="Top">
 <link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives">
 <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
 <link href="Overview.html#Overview" rel="up" title="Overview">
 <link href="The-preprocessing-language.html#The-preprocessing-language" rel="next" title="The preprocessing language">
 <link href="Initial-processing.html#Initial-processing" rel="prev" title="Initial processing">
 <style type="text/css">
 <!--
 a.summary-letter {text-decoration: none}
 blockquote.smallquotation {font-size: smaller}
 div.display {margin-left: 3.2em}
 div.example {margin-left: 3.2em}
 div.indentedblock {margin-left: 3.2em}
 div.lisp {margin-left: 3.2em}
 div.smalldisplay {margin-left: 3.2em}
 div.smallexample {margin-left: 3.2em}
 div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
 div.smalllisp {margin-left: 3.2em}
 kbd {font-style:oblique}
 pre.display {font-family: inherit}
 pre.format {font-family: inherit}
 pre.menu-comment {font-family: serif}
 pre.menu-preformatted {font-family: serif}
 pre.smalldisplay {font-family: inherit; font-size: smaller}
 pre.smallexample {font-size: smaller}
 pre.smallformat {font-family: inherit; font-size: smaller}
 pre.smalllisp {font-size: smaller}
 span.nocodebreak {white-space:nowrap}
 span.nolinebreak {white-space:nowrap}
 span.roman {font-family:serif; font-weight:normal}
 span.sansserif {font-family:sans-serif; font-weight:normal}
 ul.no-bullet {list-style: none}
 -->
 </style>


 </head>

 <body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
 <a name="Tokenization"></a>
 <div class="header">
 <p>
 Next: <a href="The-preprocessing-language.html#The-preprocessing-language" accesskey="n" rel="next">The preprocessing language</a>, Previous: <a href="Initial-processing.html#Initial-processing" accesskey="p" rel="prev">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
 </div>
 <hr>
 <a name="Tokenization-1"></a>
 <h3 class="section">1.3 Tokenization</h3>

 <a name="index-tokens"></a>
 <a name="index-preprocessing-tokens"></a>
 <p>After the textual transformations are finished, the input file is
 converted into a sequence of <em>preprocessing tokens</em>.  These mostly
 correspond to the syntactic tokens used by the C compiler, but there are
 a few differences.  White space separates tokens; it is not itself a
 token of any kind.  Tokens do not have to be separated by white space,
 but it is often necessary to avoid ambiguities.
 </p>
 <p>When faced with a sequence of characters that has more than one possible
 tokenization, the preprocessor is greedy.  It always makes each token,
 starting from the left, as big as possible before moving on to the next
 token.  For instance, <code>a+++++b</code> is interpreted as
 <code>a&nbsp;++&nbsp;++&nbsp;+&nbsp;b<!-- /@w --></code>, not as <code>a&nbsp;++&nbsp;+&nbsp;++&nbsp;b<!-- /@w --></code>, even though the
 latter tokenization could be part of a valid C program and the former
 could not.
 </p>
 <p>Once the input file is broken into tokens, the token boundaries never
 change, except when the &lsquo;<samp>##</samp>&rsquo; preprocessing operator is used to paste
 tokens together.  See <a href="Concatenation.html#Concatenation">Concatenation</a>.  For example,
 </p>
 <div class="smallexample">
 <pre class="smallexample">#define foo() bar
 foo()baz
      &rarr; bar baz
 <em>not</em>
      &rarr; barbaz
 </pre></div>

 <p>The compiler does not re-tokenize the preprocessor&rsquo;s output.  Each
 preprocessing token becomes one compiler token.
 </p>
 <a name="index-identifiers"></a>
 <p>Preprocessing tokens fall into five broad classes: identifiers,
 preprocessing numbers, string literals, punctuators, and other.  An
 <em>identifier</em> is the same as an identifier in C: any sequence of
 letters, digits, or underscores, which begins with a letter or
 underscore.  Keywords of C have no significance to the preprocessor;
 they are ordinary identifiers.  You can define a macro whose name is a
 keyword, for instance.  The only identifier which can be considered a
 preprocessing keyword is <code>defined</code>.  See <a href="Defined.html#Defined">Defined</a>.
 </p>
 <p>This is mostly true of other languages which use the C preprocessor.
 However, a few of the keywords of C++ are significant even in the
 preprocessor.  See <a href="C_002b_002b-Named-Operators.html#C_002b_002b-Named-Operators">C++ Named Operators</a>.
 </p>
 <p>In the 1999 C standard, identifiers may contain letters which are not
 part of the &ldquo;basic source character set&rdquo;, at the implementation&rsquo;s
 discretion (such as accented Latin letters, Greek letters, or Chinese
 ideograms).  This may be done with an extended character set, or the
 &lsquo;<samp>\u</samp>&rsquo; and &lsquo;<samp>\U</samp>&rsquo; escape sequences.  GCC only accepts such
 characters in the &lsquo;<samp>\u</samp>&rsquo; and &lsquo;<samp>\U</samp>&rsquo; forms.
 </p>
 <p>As an extension, GCC treats &lsquo;<samp>$</samp>&rsquo; as a letter.  This is for
 compatibility with some systems, such as VMS, where &lsquo;<samp>$</samp>&rsquo; is commonly
 used in system-defined function and object names.  &lsquo;<samp>$</samp>&rsquo; is not a
 letter in strictly conforming mode, or if you specify the <samp>-$</samp>
 option.  See <a href="Invocation.html#Invocation">Invocation</a>.
 </p>
 <a name="index-numbers"></a>
 <a name="index-preprocessing-numbers"></a>
 <p>A <em>preprocessing number</em> has a rather bizarre definition.  The
 category includes all the normal integer and floating point constants
 one expects of C, but also a number of other things one might not
 initially recognize as a number.  Formally, preprocessing numbers begin
 with an optional period, a required decimal digit, and then continue
 with any sequence of letters, digits, underscores, periods, and
 exponents.  Exponents are the two-character sequences &lsquo;<samp>e+</samp>&rsquo;,
 &lsquo;<samp>e-</samp>&rsquo;, &lsquo;<samp>E+</samp>&rsquo;, &lsquo;<samp>E-</samp>&rsquo;, &lsquo;<samp>p+</samp>&rsquo;, &lsquo;<samp>p-</samp>&rsquo;, &lsquo;<samp>P+</samp>&rsquo;, and
 &lsquo;<samp>P-</samp>&rsquo;.  (The exponents that begin with &lsquo;<samp>p</samp>&rsquo; or &lsquo;<samp>P</samp>&rsquo; are new
 to C99.  They are used for hexadecimal floating-point constants.)
 </p>
 <p>The purpose of this unusual definition is to isolate the preprocessor
 from the full complexity of numeric constants.  It does not have to
 distinguish between lexically valid and invalid floating-point numbers,
 which is complicated.  The definition also permits you to split an
 identifier at any position and get exactly two tokens, which can then be
 pasted back together with the &lsquo;<samp>##</samp>&rsquo; operator.
 </p>
 <p>It&rsquo;s possible for preprocessing numbers to cause programs to be
 misinterpreted.  For example, <code>0xE+12</code> is a preprocessing number
 which does not translate to any valid numeric constant, therefore a
 syntax error.  It does not mean <code>0xE&nbsp;+&nbsp;12<!-- /@w --></code>, which is what you
 might have intended.
 </p>
 <a name="index-string-literals"></a>
 <a name="index-string-constants"></a>
 <a name="index-character-constants"></a>
 <a name="index-header-file-names"></a>
 <p><em>String literals</em> are string constants, character constants, and
 header file names (the argument of &lsquo;<samp>#include</samp>&rsquo;).<a name="DOCF2" href="#FOOT2"><sup>2</sup></a>  String constants and character
 constants are straightforward: <tt>&quot;&hellip;&quot;</tt> or <tt>'&hellip;'</tt>.  In
 either case embedded quotes should be escaped with a backslash:
 <tt>'\''</tt> is the character constant for &lsquo;<samp>'</samp>&rsquo;.  There is no limit on
 the length of a character constant, but the value of a character
 constant that contains more than one character is
 implementation-defined.  See <a href="Implementation-Details.html#Implementation-Details">Implementation Details</a>.
 </p>
 <p>Header file names either look like string constants, <tt>&quot;&hellip;&quot;</tt>, or are
 written with angle brackets instead, <tt>&lt;&hellip;&gt;</tt>.  In either case,
 backslash is an ordinary character.  There is no way to escape the
 closing quote or angle bracket.  The preprocessor looks for the header
 file in different places depending on which form you use.  See <a href="Include-Operation.html#Include-Operation">Include Operation</a>.
 </p>
 <p>No string literal may extend past the end of a line.  Older versions
 of GCC accepted multi-line string constants.  You may use continued
 lines instead, or string constant concatenation.  See <a href="Differences-from-previous-versions.html#Differences-from-previous-versions">Differences from previous versions</a>.
 </p>
 <a name="index-punctuators"></a>
 <a name="index-digraphs"></a>
 <a name="index-alternative-tokens"></a>
 <p><em>Punctuators</em> are all the usual bits of punctuation which are
 meaningful to C and C++.  All but three of the punctuation characters in
 ASCII are C punctuators.  The exceptions are &lsquo;<samp>@</samp>&rsquo;, &lsquo;<samp>$</samp>&rsquo;, and
 &lsquo;<samp>`</samp>&rsquo;.  In addition, all the two- and three-character operators are
 punctuators.  There are also six <em>digraphs</em>, which the C++ standard
 calls <em>alternative tokens</em>, which are merely alternate ways to spell
 other punctuators.  This is a second attempt to work around missing
 punctuation in obsolete systems.  It has no negative side effects,
 unlike trigraphs, but does not cover as much ground.  The digraphs and
 their corresponding normal punctuators are:
 </p>
 <div class="smallexample">
 <pre class="smallexample">Digraph:        &lt;%  %&gt;  &lt;:  :&gt;  %:  %:%:
 Punctuator:      {   }   [   ]   #    ##
 </pre></div>

 <a name="index-other-tokens"></a>
 <p>Any other single character is considered &ldquo;other&rdquo;.  It is passed on to
 the preprocessor&rsquo;s output unmolested.  The C compiler will almost
 certainly reject source code containing &ldquo;other&rdquo; tokens.  In ASCII, the
 only other characters are &lsquo;<samp>@</samp>&rsquo;, &lsquo;<samp>$</samp>&rsquo;, &lsquo;<samp>`</samp>&rsquo;, and control
 characters other than NUL (all bits zero).  (Note that &lsquo;<samp>$</samp>&rsquo; is
 normally considered a letter.)  All characters with the high bit set
 (numeric range 0x7F&ndash;0xFF) are also &ldquo;other&rdquo; in the present
 implementation.  This will change when proper support for international
 character sets is added to GCC.
 </p>
 <p>NUL is a special case because of the high probability that its
 appearance is accidental, and because it may be invisible to the user
 (many terminals do not display NUL at all).  Within comments, NULs are
 silently ignored, just as any other character would be.  In running
 text, NUL is considered white space.  For example, these two directives
 have the same meaning.
 </p>
 <div class="smallexample">
 <pre class="smallexample">#define X^@1
 #define X 1
 </pre></div>

 <p>(where &lsquo;<samp>^@</samp>&rsquo; is ASCII NUL).  Within string or character constants,
 NULs are preserved.  In the latter two cases the preprocessor emits a
 warning message.
 </p>
 <div class="footnote">
 <hr>
 <h4 class="footnotes-heading">Footnotes</h4>

 <h3><a name="FOOT2" href="#DOCF2">(2)</a></h3>
 <p>The C
 standard uses the term <em>string literal</em> to refer only to what we are
 calling <em>string constants</em>.</p>
 </div>
 <hr>
 <div class="header">
 <p>
 Next: <a href="The-preprocessing-language.html#The-preprocessing-language" accesskey="n" rel="next">The preprocessing language</a>, Previous: <a href="Initial-processing.html#Initial-processing" accesskey="p" rel="prev">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
 </div>


 </body>
 </html>
	<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
	<html>
	<!-- Copyright (C) 1987-2015 Free Software Foundation, Inc.

	Permission is granted to copy, distribute and/or modify this document
	under the terms of the GNU Free Documentation License, Version 1.3 or
	any later version published by the Free Software Foundation. A copy of
	the license is included in the
	section entitled "GNU Free Documentation License".

	This manual contains no Invariant Sections. The Front-Cover Texts are
	(a) (see below), and the Back-Cover Texts are (b) (see below).

	(a) The FSF's Front-Cover Text is:

	A GNU Manual

	(b) The FSF's Back-Cover Text is:

	You have freedom to copy and modify this GNU Manual, like GNU
	software. Copies published by the Free Software Foundation raise
	funds for GNU development. -->
	<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
	<head>
	<title>The C Preprocessor: Tokenization</title>

	<meta name="description" content="The C Preprocessor: Tokenization">
	<meta name="keywords" content="The C Preprocessor: Tokenization">
	<meta name="resource-type" content="document">
	<meta name="distribution" content="global">
	<meta name="Generator" content="makeinfo">
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	<link href="index.html#Top" rel="start" title="Top">
	<link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives">
	<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
	<link href="Overview.html#Overview" rel="up" title="Overview">
	<link href="The-preprocessing-language.html#The-preprocessing-language" rel="next" title="The preprocessing language">
	<link href="Initial-processing.html#Initial-processing" rel="prev" title="Initial processing">
	<style type="text/css">
	<!--
	a.summary-letter {text-decoration: none}
	blockquote.smallquotation {font-size: smaller}
	div.display {margin-left: 3.2em}
	div.example {margin-left: 3.2em}
	div.indentedblock {margin-left: 3.2em}
	div.lisp {margin-left: 3.2em}
	div.smalldisplay {margin-left: 3.2em}
	div.smallexample {margin-left: 3.2em}
	div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
	div.smalllisp {margin-left: 3.2em}
	kbd {font-style:oblique}
	pre.display {font-family: inherit}
	pre.format {font-family: inherit}
	pre.menu-comment {font-family: serif}
	pre.menu-preformatted {font-family: serif}
	pre.smalldisplay {font-family: inherit; font-size: smaller}
	pre.smallexample {font-size: smaller}
	pre.smallformat {font-family: inherit; font-size: smaller}
	pre.smalllisp {font-size: smaller}
	span.nocodebreak {white-space:nowrap}
	span.nolinebreak {white-space:nowrap}
	span.roman {font-family:serif; font-weight:normal}
	span.sansserif {font-family:sans-serif; font-weight:normal}
	ul.no-bullet {list-style: none}
	-->
	</style>


	</head>

	<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
	<a name="Tokenization"></a>
	<div class="header">
	<p>
	Next: <a href="The-preprocessing-language.html#The-preprocessing-language" accesskey="n" rel="next">The preprocessing language</a>, Previous: <a href="Initial-processing.html#Initial-processing" accesskey="p" rel="prev">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a>   [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
	</div>
	<hr>
	<a name="Tokenization-1"></a>
	<h3 class="section">1.3 Tokenization</h3>

	<a name="index-tokens"></a>
	<a name="index-preprocessing-tokens"></a>
	<p>After the textual transformations are finished, the input file is
	converted into a sequence of <em>preprocessing tokens</em>. These mostly
	correspond to the syntactic tokens used by the C compiler, but there are
	a few differences. White space separates tokens; it is not itself a
	token of any kind. Tokens do not have to be separated by white space,
	but it is often necessary to avoid ambiguities.
	</p>
	<p>When faced with a sequence of characters that has more than one possible
	tokenization, the preprocessor is greedy. It always makes each token,
	starting from the left, as big as possible before moving on to the next
	token. For instance, <code>a+++++b</code> is interpreted as
	<code>a ++ ++ + b<!-- /@w --></code>, not as <code>a ++ + ++ b<!-- /@w --></code>, even though the
	latter tokenization could be part of a valid C program and the former
	could not.
	</p>
	<p>Once the input file is broken into tokens, the token boundaries never
	change, except when the ‘<samp>##</samp>’ preprocessing operator is used to paste
	tokens together. See <a href="Concatenation.html#Concatenation">Concatenation</a>. For example,
	</p>
	<div class="smallexample">
	<pre class="smallexample">#define foo() bar
	foo()baz
	→ bar baz
	<em>not</em>
	→ barbaz
	</pre></div>

	<p>The compiler does not re-tokenize the preprocessor’s output. Each
	preprocessing token becomes one compiler token.
	</p>
	<a name="index-identifiers"></a>
	<p>Preprocessing tokens fall into five broad classes: identifiers,
	preprocessing numbers, string literals, punctuators, and other. An
	<em>identifier</em> is the same as an identifier in C: any sequence of
	letters, digits, or underscores, which begins with a letter or
	underscore. Keywords of C have no significance to the preprocessor;
	they are ordinary identifiers. You can define a macro whose name is a
	keyword, for instance. The only identifier which can be considered a
	preprocessing keyword is <code>defined</code>. See <a href="Defined.html#Defined">Defined</a>.
	</p>
	<p>This is mostly true of other languages which use the C preprocessor.
	However, a few of the keywords of C++ are significant even in the
	preprocessor. See <a href="C_002b_002b-Named-Operators.html#C_002b_002b-Named-Operators">C++ Named Operators</a>.
	</p>
	<p>In the 1999 C standard, identifiers may contain letters which are not
	part of the “basic source character set”, at the implementation’s
	discretion (such as accented Latin letters, Greek letters, or Chinese
	ideograms). This may be done with an extended character set, or the
	‘<samp>\u</samp>’ and ‘<samp>\U</samp>’ escape sequences. GCC only accepts such
	characters in the ‘<samp>\u</samp>’ and ‘<samp>\U</samp>’ forms.
	</p>
	<p>As an extension, GCC treats ‘<samp>$</samp>’ as a letter. This is for
	compatibility with some systems, such as VMS, where ‘<samp>$</samp>’ is commonly
	used in system-defined function and object names. ‘<samp>$</samp>’ is not a
	letter in strictly conforming mode, or if you specify the <samp>-$</samp>
	option. See <a href="Invocation.html#Invocation">Invocation</a>.
	</p>
	<a name="index-numbers"></a>
	<a name="index-preprocessing-numbers"></a>
	<p>A <em>preprocessing number</em> has a rather bizarre definition. The
	category includes all the normal integer and floating point constants
	one expects of C, but also a number of other things one might not
	initially recognize as a number. Formally, preprocessing numbers begin
	with an optional period, a required decimal digit, and then continue
	with any sequence of letters, digits, underscores, periods, and
	exponents. Exponents are the two-character sequences ‘<samp>e+</samp>’,
	‘<samp>e-</samp>’, ‘<samp>E+</samp>’, ‘<samp>E-</samp>’, ‘<samp>p+</samp>’, ‘<samp>p-</samp>’, ‘<samp>P+</samp>’, and
	‘<samp>P-</samp>’. (The exponents that begin with ‘<samp>p</samp>’ or ‘<samp>P</samp>’ are new
	to C99. They are used for hexadecimal floating-point constants.)
	</p>
	<p>The purpose of this unusual definition is to isolate the preprocessor
	from the full complexity of numeric constants. It does not have to
	distinguish between lexically valid and invalid floating-point numbers,
	which is complicated. The definition also permits you to split an
	identifier at any position and get exactly two tokens, which can then be
	pasted back together with the ‘<samp>##</samp>’ operator.
	</p>
	<p>It’s possible for preprocessing numbers to cause programs to be
	misinterpreted. For example, <code>0xE+12</code> is a preprocessing number
	which does not translate to any valid numeric constant, therefore a
	syntax error. It does not mean <code>0xE + 12<!-- /@w --></code>, which is what you
	might have intended.
	</p>
	<a name="index-string-literals"></a>
	<a name="index-string-constants"></a>
	<a name="index-character-constants"></a>
	<a name="index-header-file-names"></a>
	<p><em>String literals</em> are string constants, character constants, and
	header file names (the argument of ‘<samp>#include</samp>’).<a name="DOCF2" href="#FOOT2"><sup>2</sup></a> String constants and character
	constants are straightforward: <tt>"…"</tt> or <tt>'…'</tt>. In
	either case embedded quotes should be escaped with a backslash:
	<tt>'\''</tt> is the character constant for ‘<samp>'</samp>’. There is no limit on
	the length of a character constant, but the value of a character
	constant that contains more than one character is
	implementation-defined. See <a href="Implementation-Details.html#Implementation-Details">Implementation Details</a>.
	</p>
	<p>Header file names either look like string constants, <tt>"…"</tt>, or are
	written with angle brackets instead, <tt><…></tt>. In either case,
	backslash is an ordinary character. There is no way to escape the
	closing quote or angle bracket. The preprocessor looks for the header
	file in different places depending on which form you use. See <a href="Include-Operation.html#Include-Operation">Include Operation</a>.
	</p>
	<p>No string literal may extend past the end of a line. Older versions
	of GCC accepted multi-line string constants. You may use continued
	lines instead, or string constant concatenation. See <a href="Differences-from-previous-versions.html#Differences-from-previous-versions">Differences from previous versions</a>.
	</p>
	<a name="index-punctuators"></a>
	<a name="index-digraphs"></a>
	<a name="index-alternative-tokens"></a>
	<p><em>Punctuators</em> are all the usual bits of punctuation which are
	meaningful to C and C++. All but three of the punctuation characters in
	ASCII are C punctuators. The exceptions are ‘<samp>@</samp>’, ‘<samp>$</samp>’, and
	‘<samp>`</samp>’. In addition, all the two- and three-character operators are
	punctuators. There are also six <em>digraphs</em>, which the C++ standard
	calls <em>alternative tokens</em>, which are merely alternate ways to spell
	other punctuators. This is a second attempt to work around missing
	punctuation in obsolete systems. It has no negative side effects,
	unlike trigraphs, but does not cover as much ground. The digraphs and
	their corresponding normal punctuators are:
	</p>
	<div class="smallexample">
	<pre class="smallexample">Digraph: <% %> <: :> %: %:%:
	Punctuator: { } [ ] # ##
	</pre></div>

	<a name="index-other-tokens"></a>
	<p>Any other single character is considered “other”. It is passed on to
	the preprocessor’s output unmolested. The C compiler will almost
	certainly reject source code containing “other” tokens. In ASCII, the
	only other characters are ‘<samp>@</samp>’, ‘<samp>$</samp>’, ‘<samp>`</samp>’, and control
	characters other than NUL (all bits zero). (Note that ‘<samp>$</samp>’ is
	normally considered a letter.) All characters with the high bit set
	(numeric range 0x7F–0xFF) are also “other” in the present
	implementation. This will change when proper support for international
	character sets is added to GCC.
	</p>
	<p>NUL is a special case because of the high probability that its
	appearance is accidental, and because it may be invisible to the user
	(many terminals do not display NUL at all). Within comments, NULs are
	silently ignored, just as any other character would be. In running
	text, NUL is considered white space. For example, these two directives
	have the same meaning.
	</p>
	<div class="smallexample">
	<pre class="smallexample">#define X^@1
	#define X 1
	</pre></div>

	<p>(where ‘<samp>^@</samp>’ is ASCII NUL). Within string or character constants,
	NULs are preserved. In the latter two cases the preprocessor emits a
	warning message.
	</p>
	<div class="footnote">
	<hr>
	<h4 class="footnotes-heading">Footnotes</h4>

	<h3><a name="FOOT2" href="#DOCF2">(2)</a></h3>
	<p>The C
	standard uses the term <em>string literal</em> to refer only to what we are
	calling <em>string constants</em>.</p>
	</div>
	<hr>
	<div class="header">
	<p>
	Next: <a href="The-preprocessing-language.html#The-preprocessing-language" accesskey="n" rel="next">The preprocessing language</a>, Previous: <a href="Initial-processing.html#Initial-processing" accesskey="p" rel="prev">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a>   [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
	</div>



	</body>
	</html>