| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
| <html> |
| <!-- Copyright (C) 1987-2015 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.3 or |
| any later version published by the Free Software Foundation. A copy of |
| the license is included in the |
| section entitled "GNU Free Documentation License". |
| |
| This manual contains no Invariant Sections. The Front-Cover Texts are |
| (a) (see below), and the Back-Cover Texts are (b) (see below). |
| |
| (a) The FSF's Front-Cover Text is: |
| |
| A GNU Manual |
| |
| (b) The FSF's Back-Cover Text is: |
| |
| You have freedom to copy and modify this GNU Manual, like GNU |
| software. Copies published by the Free Software Foundation raise |
| funds for GNU development. --> |
| <!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ --> |
| <head> |
| <title>The C Preprocessor: Character sets</title> |
| |
| <meta name="description" content="The C Preprocessor: Character sets"> |
| <meta name="keywords" content="The C Preprocessor: Character sets"> |
| <meta name="resource-type" content="document"> |
| <meta name="distribution" content="global"> |
| <meta name="Generator" content="makeinfo"> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> |
| <link href="index.html#Top" rel="start" title="Top"> |
| <link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives"> |
| <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> |
| <link href="Overview.html#Overview" rel="up" title="Overview"> |
| <link href="Initial-processing.html#Initial-processing" rel="next" title="Initial processing"> |
| <link href="Overview.html#Overview" rel="prev" title="Overview"> |
| <style type="text/css"> |
| <!-- |
| a.summary-letter {text-decoration: none} |
| blockquote.smallquotation {font-size: smaller} |
| div.display {margin-left: 3.2em} |
| div.example {margin-left: 3.2em} |
| div.indentedblock {margin-left: 3.2em} |
| div.lisp {margin-left: 3.2em} |
| div.smalldisplay {margin-left: 3.2em} |
| div.smallexample {margin-left: 3.2em} |
| div.smallindentedblock {margin-left: 3.2em; font-size: smaller} |
| div.smalllisp {margin-left: 3.2em} |
| kbd {font-style:oblique} |
| pre.display {font-family: inherit} |
| pre.format {font-family: inherit} |
| pre.menu-comment {font-family: serif} |
| pre.menu-preformatted {font-family: serif} |
| pre.smalldisplay {font-family: inherit; font-size: smaller} |
| pre.smallexample {font-size: smaller} |
| pre.smallformat {font-family: inherit; font-size: smaller} |
| pre.smalllisp {font-size: smaller} |
| span.nocodebreak {white-space:nowrap} |
| span.nolinebreak {white-space:nowrap} |
| span.roman {font-family:serif; font-weight:normal} |
| span.sansserif {font-family:sans-serif; font-weight:normal} |
| ul.no-bullet {list-style: none} |
| --> |
| </style> |
| |
| |
| </head> |
| |
| <body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> |
| <a name="Character-sets"></a> |
| <div class="header"> |
| <p> |
| Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p> |
| </div> |
| <hr> |
| <a name="Character-sets-1"></a> |
| <h3 class="section">1.1 Character sets</h3> |
| |
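| <p>Character set processing of source code in C and related languages is |
| rather complicated. The C standard discusses two character sets, but |
| in practice there are at least four: the character set of each input |
| file, the source character set, the execution character set, and the |
| character set used for wide string and character constants. |
| </p> |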
| <p>The files input to CPP might be in any character set at all. CPP’s |
| very first action, before it even looks for line boundaries, is to |
| convert the file into the character set it uses for internal |
| processing. That set is what the C standard calls the <em>source</em> |
| character set. It must be isomorphic with ISO 10646, also known as |
| Unicode. CPP uses the UTF-8 encoding of Unicode. |
| </p> |
| <p>The character sets of the input files are specified using the |
| <samp>-finput-charset=</samp> option. |
| </p> |
| <p>All preprocessing work (the subject of the rest of this manual) is |
| carried out in the source character set. If you request textual |
| output from the preprocessor with the <samp>-E</samp> option, it will be |
| in UTF-8. |
| </p> |
| <p>After preprocessing is complete, string and character constants are |
| converted again, into the <em>execution</em> character set. This |
| character set is under control of the user, through the |
| <samp>-fexec-charset=</samp> option; the default is UTF-8, |
| matching the source character set. Wide string and character |
| constants have their own character set, which is not called out |
| specifically in the standard. Again, it is under control of the user, |
| through the <samp>-fwide-exec-charset=</samp> option. |
| The default is UTF-16 or UTF-32, whichever fits in the target’s |
| <code>wchar_t</code> type, in the target machine’s byte |
| order.<a name="DOCF1" href="#FOOT1"><sup>1</sup></a> Octal and hexadecimal escape sequences do not undergo |
| conversion; <tt>'\x12'</tt> has the value 0x12 regardless of the currently |
| selected execution character set. All other escapes are replaced by |
| the character in the source character set that they represent, then |
| converted to the execution character set, just like unescaped |
| characters. |
| </p> |
| <p>In identifiers, characters outside the ASCII range can only be |
| specified with the ‘<samp>\u</samp>’ and ‘<samp>\U</samp>’ escapes, not used |
| directly. If strict ISO C90 conformance is specified with an option |
| such as <samp>-std=c90</samp>, or <samp>-fno-extended-identifiers</samp> is |
| used, then those escapes are not permitted in identifiers. |
| </p> |
| <div class="footnote"> |
| <hr> |
| <h4 class="footnotes-heading">Footnotes</h4> |
| |
| <h3><a name="FOOT1" href="#DOCF1">(1)</a></h3> |
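| <p>UTF-16 does not meet the requirements of the C |
| standard for a wide character set, because characters outside the |
| Basic Multilingual Plane need two 16-bit code units rather than a |
| single <code>wchar_t</code>. However, the choice of 16-bit |
| <code>wchar_t</code> is enshrined in some system ABIs, so we cannot fix |
| this.</p> |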
| </div> |
| |
| |
| |
| </body> |
| </html> |