Escape Sequences (since C++98)

Backslash-prefixed sequences in string and character literals that represent otherwise unrepresentable or syntactically ambiguous characters.

An escape sequence is a backslash followed by one or more characters within a string or character literal that the compiler translates — at compile time — into a single character value that would otherwise be syntactically reserved, non-printable, or platform-specific.

Overview

Escape sequences are converted at compile time, before any runtime execution (in translation phase 5 in the standards through C++20). They appear in character literals ('\n'), ordinary string literals ("\n"), and their prefixed variants (u8"\n", u"\n", U"\n", L"\n"). There are three categories: simple sequences with fixed semantics, numeric sequences that specify a code unit value directly, and universal character names that specify a Unicode code point independently of the source file's encoding.
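
As a rough illustration of the paragraph above, the sketch below writes the same line feed in several literal kinds and checks that the escape has already become a single character at compile time. The names narrow_nl, wide_nl, utf16_nl, utf32_nl, literal_length and check_narrow_and_wide are made up for this example, and the runtime check assumes an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <cstddef>

// One line feed in each literal kind; only the element type differs.
// The u/U prefixes and char16_t/char32_t assume C++11 or later.
constexpr char     narrow_nl = '\n';
constexpr wchar_t  wide_nl   = L'\n';
constexpr char16_t utf16_nl  = u'\n';
constexpr char32_t utf32_nl  = U'\n';

// "\n" is an array of two elements: the line feed and the terminating null.
static_assert(sizeof("\n") == 2, "one character plus the terminator");
static_assert(sizeof(U"\n") == 2 * sizeof(char32_t), "same count, wider elements");
static_assert(utf16_nl == 0x0A && utf32_nl == 0x0A, "LF is U+000A in UTF-16 and UTF-32");

// Hypothetical helper: number of characters in a literal, excluding the terminator.
template <typename CharT, std::size_t N>
constexpr std::size_t literal_length(const CharT (&)[N]) {
    return N - 1;
}

static_assert(literal_length("a\tb") == 3, "the tab escape counts as one character");
static_assert(literal_length(L"\\n") == 2, "an escaped backslash followed by 'n'");

// Runtime check for the narrow and wide forms; assumes an ASCII-compatible
// execution character set, which the standard does not guarantee.
inline void check_narrow_and_wide() {
    assert(narrow_nl == 0x0A);
    assert(wide_nl == L'\x0A');
}
```

Because most of the checks are static_assert declarations, a successful compile is already the result; check_narrow_and_wide can be called from main for the remaining two.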

Simple Escape Sequences

Sequence	Meaning	ASCII
`\\`	backslash	0x5C
`\'`	single quote	0x27
`\"`	double quote	0x22
`\?`	question mark	0x3F
`\a`	audible bell	0x07
`\b`	backspace	0x08
`\f`	form feed	0x0C
`\n`	line feed (newline)	0x0A
`\r`	carriage return	0x0D
`\t`	horizontal tab	0x09
`\v`	vertical tab	0x0B

\? exists so that a question mark can be written without forming a trigraph (for example, ??= would otherwise be replaced by # before tokenisation). Trigraphs were removed in C++17, so \? remains valid but is rarely needed in code targeting C++17 or later.

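The null character (\0) is technically an octal escape sequence with value zero, not a simple escape sequence, but it is conventionally written and read as if it were one. Because it is octal, any octal digits that follow are absorbed into it: "\012" is a single line feed, not a null followed by "12".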
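
One way to make the table above concrete is to map each character back to its escape spelling, for instance when printing control characters in a log. This is only a sketch: simple_escape_spelling, spell_out and check_spellings are hypothetical names, not standard library functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the source spelling of the simple escape for c,
// or an empty string when c has no simple escape.
inline std::string simple_escape_spelling(char c) {
    switch (c) {
        case '\\': return "\\\\";
        case '\'': return "\\'";
        case '\"': return "\\\"";
        case '\?': return "\\?";
        case '\a': return "\\a";
        case '\b': return "\\b";
        case '\f': return "\\f";
        case '\n': return "\\n";
        case '\r': return "\\r";
        case '\t': return "\\t";
        case '\v': return "\\v";
        default:   return "";
    }
}

// Returns text with every character from the table replaced by its spelling.
inline std::string spell_out(const std::string& text) {
    std::string out;
    for (char c : text) {
        const std::string spelling = simple_escape_spelling(c);
        if (spelling.empty()) {
            out += c;
        } else {
            out += spelling;
        }
    }
    return out;
}

// "\012" is one octal escape (line feed), not '\0' followed by "12".
static_assert(sizeof("\012") == 2, "one character plus the terminator");

inline void check_spellings() {
    assert(simple_escape_spelling('\n') == "\\n");
    assert(simple_escape_spelling('x').empty());
    assert(spell_out("a\tb\n") == "a\\tb\\n");
}
```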

Numeric Escape Sequences

Octal

cpp

char bell  = '\a';    // simple escape
char bell2 = '\007';  // same value via octal — 1 to 3 digits, each 0–7
char A     = '\101';  // octal 101 = decimal 65 = 'A'

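At most three octal digits are consumed; the escape ends earlier at the first character that is not an octal digit. The digits 8 and 9 are not octal digits, so '\8' and '\9' are not octal escapes. The standard treats such unrecognised escapes as conditionally-supported, and mainstream compilers diagnose them.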

cpp

char bad = '\9';   // not an octal escape; unrecognised, compilers warn
char ok  = '\07';  // BEL, same as '\a'
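
The digit-consumption rule can be mimicked with a small reader, which may help when reasoning about where an octal escape ends. The struct OctalDigits and the functions read_octal_digits and check_octal_examples are hypothetical names for this sketch, and the value check assumes ASCII.

```cpp
#include <cassert>
#include <cstddef>

struct OctalDigits {
    unsigned value;        // numeric value of the digits that were read
    std::size_t consumed;  // how many digits were read (0 to 3)
};

// Hypothetical helper mirroring the rule for \ooo: read at most three
// characters, stopping early at the first one that is not 0-7.
inline OctalDigits read_octal_digits(const char* text) {
    OctalDigits result{0, 0};
    while (result.consumed < 3) {
        const char c = text[result.consumed];
        if (c < '0' || c > '7') break;  // also stops at the terminating null
        result.value = result.value * 8 + static_cast<unsigned>(c - '0');
        ++result.consumed;
    }
    return result;
}

// The compiler applies the same rule: "\1012" is '\101' followed by '2'.
static_assert(sizeof("\1012") == 3, "'A', '2', and the terminator");
static_assert('\7' == '\07' && '\07' == '\007', "leading zeros do not change the value");

inline void check_octal_examples() {
    assert(read_octal_digits("1012").value == 65);    // octal 101
    assert(read_octal_digits("1012").consumed == 3);  // the fourth digit is left over
    assert(read_octal_digits("78").consumed == 1);    // stops before '8'
    assert(read_octal_digits("9").consumed == 0);     // no octal digit at all
}
```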

Hexadecimal

cpp

char A    = '\x41';   // 'A'
char nul  = '\x00';   // null character
char bel  = '\x07';   // audible bell

Unlike octal, the hex form is greedy: it consumes all subsequent hexadecimal characters. When the character immediately after the escape is a hex digit, split the literal explicitly:

cpp

const char* wrong = "\xabcdef";    // one escape with value 0xABCDEF: out of range for char, diagnosed by compilers
const char* ok    = "\xab" "cdef"; // 0xAB then the string "cdef"
const char* also  = "\x0a" "Blue"; // '\n' then "Blue" — 'B' is hex so split is mandatory
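
To see how far a hex escape would reach, it can help to count the run of hex digits that follows \x. The helper hex_digits_consumed and the function check_hex_examples below are hypothetical, written only to illustrate the greedy rule described above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical helper: counts how many characters after "\x" the compiler
// would treat as part of the escape, i.e. the leading run of hex digits.
inline std::size_t hex_digits_consumed(const char* text) {
    std::size_t count = 0;
    while (std::isxdigit(static_cast<unsigned char>(text[count]))) {
        ++count;
    }
    return count;
}

// 'g' is not a hex digit, so the escape ends after "41".
static_assert(sizeof("\x41g") == 3, "'A', 'g', and the terminator");
// Splitting the literal ends the escape before "cdef".
static_assert(sizeof("\xab" "cdef") == 6, "0xAB, four letters, and the terminator");
static_assert(static_cast<unsigned char>(("\x0a" "Blue")[0]) == 0x0A, "line feed comes first");

inline void check_hex_examples() {
    assert(hex_digits_consumed("0aBlue") == 3);  // '0', 'a', and 'B' would all be absorbed
    assert(hex_digits_consumed("0a") == 2);
    assert(hex_digits_consumed("41g") == 2);
}
```

A count larger than intended, as in the "0aBlue" case, signals that the literal needs to be split.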

C++23: Delimited Numeric Escapes

C++23 (P2290R3) introduces brace-delimited forms that solve the greedy problem and reject out-of-range values at compile time:

cpp

// C++23
char c = '\o{101}';         // octal — equivalent to '\101'
char d = '\x{41}';          // hex   — equivalent to '\x41'
const char* s = "\x{0a}face"; // unambiguous newline then "face"

char16_t bad = u'\x{FFFFFF}'; // ill-formed: value exceeds char16_t range

Universal Character Names

Universal character names (UCNs) specify Unicode code points in any string or character literal. The encoding used depends on the literal prefix.

cpp

// \u and \U exist since C++98; char32_t and the u8/u/U prefixes require C++11
char32_t euro    = U'\u20AC';          // U+20AC EURO SIGN: 4 hex digits, BMP only
char32_t snowman = U'\U00002603';      // U+2603 SNOWMAN: 8 hex digits, full Unicode range
const auto* utf8_euro = u8"\u20AC";    // UTF-8 bytes 0xE2 0x82 0xAC; const char* through C++17, const char8_t* since C++20

wchar_t wide_euro = L'\u20AC';         // platform-dependent encoding

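\uNNNN requires exactly 4 hex digits. \UNNNNNNNN requires exactly 8. Both work in char, char8_t, char16_t, char32_t, and wchar_t contexts; the compiler encodes the code point appropriately for the target type. The value must be a valid code point outside the surrogate range (U+D800 to U+DFFF); naming a surrogate is ill-formed.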
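
The encoding step the compiler performs can be approximated by hand for comparison. The sketch below checks a few encodings at compile time and includes a hypothetical encode_utf8 helper (not a standard function) that assumes a valid Unicode scalar value and does no validation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one code point as UTF-8. It assumes cp is a
// valid Unicode scalar value and performs no validation.
inline std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The same code point yields different code units depending on the prefix.
static_assert(U'\u20AC' == 0x20AC, "UTF-32 stores the code point directly");
static_assert(sizeof(u8"\u20AC") == 4, "three UTF-8 bytes plus the terminator");
static_assert(sizeof(u"\U0001F600") == 3 * sizeof(char16_t), "surrogate pair plus the terminator");
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00, "UTF-16 surrogates");

inline void check_utf8_examples() {
    assert(encode_utf8(U'\u20AC') == "\xE2\x82\xAC");  // matches the bytes of u8"\u20AC"
    assert(encode_utf8(U'\U0001F600').size() == 4);
    assert(encode_utf8(U'A') == "A");
}
```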

C++23: Named Character Escapes

C++23 (P2071R2) introduces \N{name} using the official Unicode character database name:

cpp

// C++23
char32_t snowman  = U'\N{SNOWMAN}';              // U+2603
char32_t smiley   = U'\N{GRINNING FACE}';        // U+1F600
const char8_t* euro = u8"\N{EURO SIGN}";         // UTF-8 encoded €; u8 literals are char8_t since C++20

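Names must match the Unicode Character Database spelling exactly, including case and spaces; an unknown name is a compile-time error. The lookup happens entirely at compile time, so there is no runtime cost. Prefer it over raw code points in any context where readability matters.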

C++23: Delimited Universal Character Escape

cpp

// C++23
char32_t emoji = U'\u{1F600}';   // avoids the 8-digit \U form for code points above U+FFFF

Raw String Literals (C++11)

When a string literal contains many backslashes or quotes, a raw string literal removes the need to escape them, because no escape sequences are processed inside it:

cpp

// C++11
const char* path_esc = "C:\\Users\\alice\\Documents\\notes.txt";
const char* path_raw = R"(C:\Users\alice\Documents\notes.txt)";

const char* json_esc = "{\"key\": \"value\", \"n\": 42}";
const char* json_raw = R"({"key": "value", "n": 42})";

const char* re_esc = "(\\w+)\\s+(\\d{1,3}\\.\\d{1,3})";
const char* re_raw = R"((\w+)\s+(\d{1,3}\.\d{1,3}))";

The syntax is R"delimiter(content)delimiter" where delimiter is any string of up to 16 characters (excluding space, \, (, )). Use a custom delimiter when the content contains )":

cpp

const char* tricky = R"delim(ends with )" here)delim"; // C++11
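
A quick way to confirm that an escaped literal and its raw counterpart hold the same characters is to compare them. The names same_text, check_raw_equivalents and the constants below are hypothetical and exist only for this sketch (C++11 or later).

```cpp
#include <cassert>
#include <cstring>

// Escaped and raw spellings that are expected to contain identical characters.
const char* const path_esc = "C:\\Users\\alice\\Documents\\notes.txt";
const char* const path_raw = R"(C:\Users\alice\Documents\notes.txt)";

const char* const json_esc = "{\"key\": \"value\", \"n\": 42}";
const char* const json_raw = R"({"key": "value", "n": 42})";

// A custom delimiter lets the content itself contain )".
const char* const tricky_esc = "ends with )\" here";
const char* const tricky_raw = R"delim(ends with )" here)delim";

// Inside a raw literal, \n is not an escape: it is a backslash and an 'n'.
const char* const not_a_newline = R"(\n)";

// Hypothetical helper: true when both strings hold the same characters.
inline bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

inline void check_raw_equivalents() {
    assert(same_text(path_esc, path_raw));
    assert(same_text(json_esc, json_raw));
    assert(same_text(tricky_esc, tricky_raw));
    assert(std::strlen(not_a_newline) == 2);
}
```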

Best Practices

Prefer '\n' over std::endl in output loops. std::endl flushes the stream buffer; '\n' does not. Flushing on every line is expensive when writing large amounts of output.

cpp

for (const auto& line : lines) {
    std::cout << line << '\n';    // fast
    // std::cout << line << std::endl; // flushes each iteration — avoid
}
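
The difference can be observed by counting flush requests. The sketch below uses a hypothetical CountingBuf stream buffer that records each call to sync(); the class and the two helper functions are illustrative names, not part of the standard library.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stream buffer that counts how often the stream asks it to flush.
class CountingBuf : public std::stringbuf {
public:
    int sync_calls = 0;

protected:
    int sync() override {
        ++sync_calls;
        return 0;  // report success
    }
};

// Writes every line followed by '\n' and returns the number of flushes seen.
inline int flushes_with_newline(const std::vector<std::string>& lines) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (const auto& line : lines) {
        out << line << '\n';
    }
    return buf.sync_calls;
}

// Same loop with std::endl, which writes '\n' and then flushes.
inline int flushes_with_endl(const std::vector<std::string>& lines) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (const auto& line : lines) {
        out << line << std::endl;
    }
    return buf.sync_calls;
}

inline void check_flush_counts() {
    const std::vector<std::string> lines = {"a", "b", "c"};
    assert(flushes_with_newline(lines) == 0);
    assert(flushes_with_endl(lines) == 3);
}
```

With a real file or terminal behind the buffer, each of those flushes can turn into a system call, which is where the cost comes from.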

Use raw string literals for paths and regular expressions. Double-escaping is error-prone and obscures intent.

Split hex escapes explicitly when the next character is a hexadecimal digit (0-9, a-f, A-F):

cpp

const char* ambiguous = "\x0dA";        // looks like CR + 'A', but 'A' is a hex digit: this is one character, 0xDA
const char* clear     = "\x0d" "A";     // unambiguous: CR then 'A'

Common Pitfalls

Windows file paths. The most common production mistake involving escape sequences:

cpp

const char* broken = "C:\new_folder\test.txt";  // \n and \t are escapes — BUG
const char* fixed  = "C:\\new_folder\\test.txt";
const char* clean  = R"(C:\new_folder\test.txt)"; // C++11 — preferred

Null bytes in std::string. std::string can contain embedded null bytes, but the constructor that takes only a const char* finds the length by scanning for the first null (as strlen does), so everything after it is dropped:

cpp

std::string s = "abc\0def";      // .size() == 3 — "def" is silently dropped
std::string t("abc\0def", 7);    // .size() == 7 — explicit length constructor
std::string u = "abc\0def"s;     // .size() == 7; needs using namespace std::string_literals (C++14)
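
The sizes can be checked directly. The sketch below also includes from_literal, a hypothetical helper (not a standard function) that takes the length from the array bound of the literal; it assumes C++14 for the s suffix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using namespace std::string_literals;  // enables the s suffix (C++14)

// Hypothetical helper: builds a std::string from a string literal and keeps
// embedded nulls by taking the length from the array type instead of strlen.
template <std::size_t N>
std::string from_literal(const char (&literal)[N]) {
    return std::string(literal, N - 1);  // N counts the terminator, so drop it
}

inline void check_embedded_nulls() {
    const std::string truncated = "abc\0def";              // const char* constructor
    const std::string counted("abc\0def", 7);              // pointer plus explicit length
    const std::string suffixed = "abc\0def"s;              // operator""s receives the length
    const std::string deduced = from_literal("abc\0def");  // length from the array bound

    assert(truncated.size() == 3);
    assert(counted.size() == 7);
    assert(suffixed.size() == 7);
    assert(deduced.size() == 7);
    assert(counted[3] == '\0');
}
```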

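Octal range errors. Digits 8 and 9 are not octal, so a backslash followed by one of them is not an octal escape, and an octal escape silently ends at the first non-octal digit:

cpp

char c = '\8';   // unrecognised escape: conditionally-supported, compilers warn
char d = '\08';  // not one escape: '\0' followed by '8', a multicharacter literal
char e = '\07';  // fine: BEL character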

Greedy hex in concatenated literals. Adjacent string literals are each processed independently, so cross-literal hex ambiguity does not occur — but ambiguity within a single literal's fragment still does:

cpp

const char* ok   = "\x61" "b";  // 'a' + 'b' — unambiguous
const char* trap = "\x61b";     // 0x61B, not 'a' + 'b' — single literal, greedy

\r\n vs \n in binary I/O. Text-mode streams on Windows translate \n to \r\n on write. When working with network protocols, binary file formats, or cross-platform data, open streams in binary mode and write \r\n explicitly rather than relying on platform translation.

cpp

// For CRLF-terminated HTTP headers:
std::string header = "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/plain\r\n"
                     "\r\n";

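A minimal sketch of writing explicit CRLF line endings to any output stream follows. The function names write_response_head and check_response_head are hypothetical; when the target is a file, the sketch assumes the caller opened it with std::ios::binary so that no extra '\r' is added on Windows.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes a minimal response head with explicit CRLF line
// endings. For a file, the caller is expected to open the stream with
// std::ios::binary so the bytes are written exactly as given.
inline void write_response_head(std::ostream& out, const std::string& content_type) {
    out << "HTTP/1.1 200 OK\r\n"
        << "Content-Type: " << content_type << "\r\n"
        << "\r\n";
}

inline void check_response_head() {
    std::ostringstream captured;
    write_response_head(captured, "text/plain");
    assert(captured.str() == "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n");
}
```

An in-memory std::ostringstream performs no newline translation, so it is a convenient way to inspect the exact bytes before they reach a socket or file.
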
See Also

String literal prefixes: u8, u, U, L and the s suffix (C++14)
char8_t (C++20), char16_t and char32_t (C++11) — Unicode character types
Raw string literals (C++11): R"(...)"
std::string_literals and operator""s (C++14)