
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 <html>
 <!-- Copyright (C) 1987-2015 Free Software Foundation, Inc.

 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.3 or
 any later version published by the Free Software Foundation.  A copy of
 the license is included in the
 section entitled "GNU Free Documentation License".

 This manual contains no Invariant Sections.  The Front-Cover Texts are
 (a) (see below), and the Back-Cover Texts are (b) (see below).

 (a) The FSF's Front-Cover Text is:

 A GNU Manual

 (b) The FSF's Back-Cover Text is:

 You have freedom to copy and modify this GNU Manual, like GNU
      software.  Copies published by the Free Software Foundation raise
      funds for GNU development. -->
 <!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
 <head>
 <title>The C Preprocessor: Character sets</title>

 <meta name="description" content="The C Preprocessor: Character sets">
 <meta name="keywords" content="The C Preprocessor: Character sets">
 <meta name="resource-type" content="document">
 <meta name="distribution" content="global">
 <meta name="Generator" content="makeinfo">
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <link href="index.html#Top" rel="start" title="Top">
 <link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives">
 <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
 <link href="Overview.html#Overview" rel="up" title="Overview">
 <link href="Initial-processing.html#Initial-processing" rel="next" title="Initial processing">
 <link href="Overview.html#Overview" rel="prev" title="Overview">
 <style type="text/css">
 <!--
 a.summary-letter {text-decoration: none}
 blockquote.smallquotation {font-size: smaller}
 div.display {margin-left: 3.2em}
 div.example {margin-left: 3.2em}
 div.indentedblock {margin-left: 3.2em}
 div.lisp {margin-left: 3.2em}
 div.smalldisplay {margin-left: 3.2em}
 div.smallexample {margin-left: 3.2em}
 div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
 div.smalllisp {margin-left: 3.2em}
 kbd {font-style:oblique}
 pre.display {font-family: inherit}
 pre.format {font-family: inherit}
 pre.menu-comment {font-family: serif}
 pre.menu-preformatted {font-family: serif}
 pre.smalldisplay {font-family: inherit; font-size: smaller}
 pre.smallexample {font-size: smaller}
 pre.smallformat {font-family: inherit; font-size: smaller}
 pre.smalllisp {font-size: smaller}
 span.nocodebreak {white-space:nowrap}
 span.nolinebreak {white-space:nowrap}
 span.roman {font-family:serif; font-weight:normal}
 span.sansserif {font-family:sans-serif; font-weight:normal}
 ul.no-bullet {list-style: none}
 -->
 </style>


 </head>

 <body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
 <a name="Character-sets"></a>
 <div class="header">
 <p>
 Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
 </div>
 <hr>
 <a name="Character-sets-1"></a>
 <h3 class="section">1.1 Character sets</h3>

 <p>Source code character set processing in C and related languages is
 rather complicated.  The C standard discusses two character sets, but
 there are really at least four.
 </p>
 <p>The files input to CPP might be in any character set at all.  CPP&rsquo;s
 very first action, before it even looks for line boundaries, is to
 convert the file into the character set it uses for internal
 processing.  That set is what the C standard calls the <em>source</em>
 character set.  It must be isomorphic with ISO 10646, also known as
 Unicode.  CPP uses the UTF-8 encoding of Unicode.
 </p>
 <p>The character sets of the input files are specified using the
 <samp>-finput-charset=</samp> option.
 </p>
 <p>All preprocessing work (the subject of the rest of this manual) is
 carried out in the source character set.  If you request textual
 output from the preprocessor with the <samp>-E</samp> option, it will be
 in UTF-8.
 </p>
 <p>After preprocessing is complete, string and character constants are
 converted again, into the <em>execution</em> character set.  This
 character set is under control of the user, through the
 <samp>-fexec-charset=</samp> option; the default is UTF-8,
 matching the source character set.  Wide string and character
 constants have their own character set, which is not called out
 specifically in the standard.  Again, it is under control of the user,
 through the <samp>-fwide-exec-charset=</samp> option.
 The default is UTF-16 or UTF-32, whichever fits in the target&rsquo;s
 <code>wchar_t</code> type, in the target machine&rsquo;s byte
 order.<a name="DOCF1" href="#FOOT1"><sup>1</sup></a>  Octal and hexadecimal escape sequences do not undergo
 conversion; <tt>'\x12'</tt> has the value 0x12 regardless of the currently
 selected execution character set.  All other escapes are replaced by
 the character in the source character set that they represent, then
 converted to the execution character set, just like unescaped
 characters.
 </p>
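 <p>The escape-sequence rules above can be checked with a small C sketch.
 It assumes the default UTF-8 execution character set and a GCC-compatible
 compiler; the function names are hypothetical and chosen only for
 illustration.
 </p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* A hexadecimal escape is taken literally: it is never converted to the
   execution character set. */
int hex_escape_value(void)
{
    return '\x12';
}

/* The equivalent octal escape (022 octal == 0x12) behaves the same way. */
int octal_escape_value(void)
{
    return '\022';
}

/* A \u escape names a character, which is then converted to the execution
   character set.  Assuming the default UTF-8, U+00E9 becomes the two
   bytes 0xC3 0xA9. */
size_t narrow_e_acute_bytes(void)
{
    return strlen("\u00e9");
}

/* In a wide string the same character is converted to the wide execution
   character set and occupies a single wchar_t element. */
size_t wide_e_acute_units(void)
{
    return wcslen(L"\u00e9");
}
```

 <p>If a different execution character set were selected, the byte count
 of the narrow string could change, but the values of the two numeric
 escapes would not.
 </p>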
 <p>In identifiers, characters outside the ASCII range can only be
 specified with the &lsquo;<samp>\u</samp>&rsquo; and &lsquo;<samp>\U</samp>&rsquo; escapes, not used
 directly.  If strict ISO C90 conformance is specified with an option
 such as <samp>-std=c90</samp>, or <samp>-fno-extended-identifiers</samp> is
 used, then those escapes are not permitted in identifiers.
 </p>
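 <p>As a minimal sketch of the rule above, the following C fragment spells
 an identifier with a &lsquo;<samp>\u</samp>&rsquo; escape.  It assumes a
 compiler mode in which extended identifiers are permitted (C99 or later,
 without <samp>-fno-extended-identifiers</samp>); the variable and function
 names are hypothetical.
 </p>

```c
#include <assert.h>

/* The local variable's name contains U+00E9, written with the \u escape
   rather than as a raw non-ASCII character. */
int add_one_to_count(void)
{
    int caf\u00e9_count = 3;

    /* Every later use of the identifier must spell the same character. */
    return caf\u00e9_count + 1;
}
```

 <p>Compiling the same fragment with <samp>-std=c90</samp> is expected to
 be rejected, since the escapes are not permitted in identifiers in that
 mode.
 </p>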
 <div class="footnote">
 <hr>
 <h4 class="footnotes-heading">Footnotes</h4>

 <h3><a name="FOOT1" href="#DOCF1">(1)</a></h3>
 <p>UTF-16 does not meet the requirements of the C
 standard for a wide character set, but the choice of 16-bit
 <code>wchar_t</code> is enshrined in some system ABIs so we cannot fix
 this.</p>
 </div>


 </body>
 </html>