Escape Sequences
Backslash-prefixed sequences in string and character literals that represent otherwise unrepresentable or syntactically ambiguous characters.
Since C++98
An escape sequence is a backslash followed by one or more characters within a string or character literal that the compiler translates, at compile time, into a single character value that would otherwise be syntactically reserved, non-printable, or platform-specific.
Overview
Escape sequences are resolved during translation phase 5, before any runtime execution. They appear in character literals ('\n'), narrow string literals ("\n"), and their prefixed variants (u8"\n", u"\n", U"\n", L"\n"). There are three categories: simple sequences with fixed semantics, numeric sequences that specify a code unit value directly, and universal character names for Unicode portability.
Simple Escape Sequences
| Sequence | Meaning | ASCII |
|---|---|---|
| \\ | backslash | 0x5C |
| \' | single quote | 0x27 |
| \" | double quote | 0x22 |
| \? | question mark | 0x3F |
| \a | audible bell | 0x07 |
| \b | backspace | 0x08 |
| \f | form feed | 0x0C |
| \n | line feed (newline) | 0x0A |
| \r | carriage return | 0x0D |
| \t | horizontal tab | 0x09 |
| \v | vertical tab | 0x0B |
\? exists to prevent trigraph interpretation (e.g., ??= would otherwise be replaced by #). Trigraphs were removed in C++17, so \? is rarely needed in code targeting C++17 or later, although it remains a valid escape.
The null character (\0) is technically the octal escape sequence with value zero, not a simple escape sequence, but it is universally treated as a named constant.
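As a quick check of the table and the note on \0, the sketch below verifies the listed values at compile time with static_assert. It assumes an ASCII-compatible ordinary literal encoding (the norm on mainstream compilers, but not guaranteed by the standard), and literal_length is a hypothetical helper introduced only for this illustration.

```cpp
#include <cstddef>

// Assumption: the ordinary literal encoding is ASCII-compatible.
static_assert('\\' == 0x5C && '\'' == 0x27 && '\"' == 0x22 && '\?' == 0x3F,
              "punctuation escapes");
static_assert('\a' == 0x07 && '\b' == 0x08 && '\f' == 0x0C,
              "bell, backspace, form feed");
static_assert('\n' == 0x0A && '\r' == 0x0D && '\t' == 0x09 && '\v' == 0x0B,
              "line feed, carriage return, tabs");

// '\0' is the octal escape with value zero, identical to '\x00'.
static_assert('\0' == 0 && '\0' == '\x00', "null character");

// Hypothetical helper: number of chars in a literal, excluding the terminator.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) { return N - 1; }

// Each escape sequence becomes exactly one char.
static_assert(literal_length("a\nb") == 3, "'a', LF, 'b'");

// Because \0 is octal, any octal digits that follow are absorbed into the escape.
static_assert(literal_length("\012") == 1, "one char: octal 12 (LF)");
static_assert(literal_length("\0" "12") == 3, "NUL, '1', '2'");
```

The last two assertions show why a null followed by digits is safer when written as two adjacent literals.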
Numeric Escape Sequences
Octal
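char bell = '\a'; // simple escape
char bell2 = '\007'; // same value via octal: 1 to 3 digits, each 0-7
char A = '\101'; // octal 101 = decimal 65 = 'A'
An octal escape consumes at most three octal digits. The digits 8 and 9 are not octal digits, so '\8' and '\9' are not octal escapes; the standard treats such unrecognised escapes as conditionally-supported with implementation-defined semantics, and compilers typically diagnose them.
char bad = '\9'; // not a valid octal escape; expect a diagnostic
char ok = '\07'; // BEL, same as '\a'
Hexadecimal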
char A = '\x41'; // 'A'
char nul = '\x00'; // null character
char bel = '\x07'; // audible bell
Unlike octal, the hex form is greedy: it consumes all subsequent hexadecimal digits. When the character immediately after the escape is a hex digit, split the literal explicitly:
const char* wrong = "\xabcdef"; // one escape with value 0xABCDEF: out of range for char, so compilers diagnose it
const char* ok = "\xab" "cdef"; // 0xAB then the string "cdef"
const char* also = "\x0a" "Blue"; // '\n' then "Blue": 'B' is a hex digit, so the split is required
C++23: Delimited Numeric Escapes
C++23 (P2290R3) introduces brace-delimited forms that solve the greedy problem and reject out-of-range values at compile time:
// C++23
char c = '\o{101}'; // octal, equivalent to '\101'
char d = '\x{41}'; // hex, equivalent to '\x41'
const char* s = "\x{0a}face"; // unambiguous newline then "face"
char16_t bad = u'\x{FFFFFF}'; // ill-formed: value exceeds char16_t range
Universal Character Names
Universal character names (UCNs) specify Unicode code points in any string or character literal. The encoding used depends on the literal prefix.
// UCNs date from C++98; char32_t and the u8/U prefixes require C++11
char32_t euro = U'\u20AC'; // U+20AC EURO SIGN: 4 hex digits, BMP only
char32_t snowman = U'\U00002603'; // U+2603 SNOWMAN: 8 hex digits, full Unicode range
const char* utf8_euro = u8"\u20AC"; // UTF-8 bytes 0xE2 0x82 0xAC (since C++20 the type is const char8_t*)
wchar_t wide_euro = L'\u20AC'; // platform-dependent encoding
\uNNNN requires exactly 4 hex digits. \UNNNNNNNN requires exactly 8. Both work in char, char8_t, char16_t, char32_t, and wchar_t contexts; the compiler encodes the code point appropriately for the target type.
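To make "encodes the code point appropriately for the target type" concrete, the following sketch counts the code units produced for the same code point under different prefixes. The helper code_units is hypothetical and exists only for this illustration; wchar_t is left out because its encoding is platform-dependent.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a literal, excluding the terminator.
// Templated on the character type so it accepts u8 literals both before C++20
// (const char[]) and since C++20 (const char8_t[]).
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+20AC EURO SIGN lies in the Basic Multilingual Plane.
static_assert(code_units(u8"\u20AC") == 3, "UTF-8: E2 82 AC");
static_assert(code_units(u"\u20AC") == 1, "UTF-16: one code unit");
static_assert(code_units(U"\u20AC") == 1, "UTF-32: one code unit");

// U+1F600 GRINNING FACE lies outside the BMP, so it needs the 8-digit \U form.
static_assert(code_units(u8"\U0001F600") == 4, "UTF-8: F0 9F 98 80");
static_assert(code_units(u"\U0001F600") == 2, "UTF-16: surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "UTF-32: one code unit");

// The UTF-16 surrogate pair for U+1F600 is D83D DE00.
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00,
              "high and low surrogates");
```

One universal character name can therefore expand to several code units, so a UCN outside the BMP does not fit in a single char16_t character literal.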
C++23: Named Character Escapes
C++23 (P2071R2) introduces \N{name} using the official Unicode character database name:
// C++23
char32_t snowman = U'\N{SNOWMAN}'; // U+2603
char32_t smiley = U'\N{GRINNING FACE}'; // U+1F600
const char8_t* euro = u8"\N{EURO SIGN}"; // UTF-8 encoded €; u8 literals have type char8_t since C++20
Names must exactly match the Unicode character database. This is strictly a compile-time feature. Prefer it over raw code points in any context where readability matters.
C++23: Delimited Universal Character Escape
// C++23
char32_t emoji = U'\u{1F600}'; // avoids the 8-digit \U form for code points above U+FFFF
Raw String Literals (C++11)
When a literal contains many backslashes or quotes, a raw string literal removes the need to escape them, because escape sequences are not processed inside it:
// C++11
const char* path_esc = "C:\\Users\\alice\\Documents\\notes.txt";
const char* path_raw = R"(C:\Users\alice\Documents\notes.txt)";
const char* json_esc = "{\"key\": \"value\", \"n\": 42}";
const char* json_raw = R"({"key": "value", "n": 42})";
const char* re_esc = "(\\w+)\\s+(\\d{1,3}\\.\\d{1,3})";
const char* re_raw = R"((\w+)\s+(\d{1,3}\.\d{1,3}))";
The syntax is R"delimiter(content)delimiter", where delimiter is any sequence of up to 16 characters (excluding spaces, \, (, and )). Use a custom delimiter when the content contains )":
const char* tricky = R"delim(ends with )" here)delim"; // C++11
Best Practices
Prefer '\n' over std::endl in output loops. std::endl flushes the stream buffer; '\n' does not. Flushing on every line is expensive when writing large amounts of output.
for (const auto& line : lines) {
std::cout << line << '\n'; // fast
// std::cout << line << std::endl; // flushes each iteration: avoid
}
Use raw string literals for paths and regular expressions. Double-escaping is error-prone and obscures intent.
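The equivalence of the escaped and raw forms can be checked directly. The sketch below is a minimal illustration using std::regex with its default ECMAScript grammar; the helper names same_literal and matches_reading are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the escaped and raw spellings denote the same characters.
inline bool same_literal() {
    const std::string escaped = "(\\w+)\\s+(\\d{1,3}\\.\\d{1,3})";
    const std::string raw = R"((\w+)\s+(\d{1,3}\.\d{1,3}))";
    return escaped == raw;
}

// Hypothetical helper: true if text is a word, whitespace, then a number
// with 1 to 3 digits on each side of a decimal point.
inline bool matches_reading(const std::string& text) {
    static const std::regex pattern(R"((\w+)\s+(\d{1,3}\.\d{1,3}))");
    return std::regex_match(text, pattern);
}
```

With these definitions, matches_reading("temp 21.5") is expected to succeed, while matches_reading("temp 21") fails because the pattern requires a decimal point.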
Split hex escapes explicitly when the next character is a hexadecimal digit:
const char* ambiguous = "\x0dA"; // looks like CR + 'A', but the escape is greedy: one char with value 0xDA
const char* clear = "\x0d" "A"; // unambiguous: CR then 'A'
Common Pitfalls
Windows file paths. The most common production mistake involving escape sequences:
const char* broken = "C:\new_folder\test.txt"; // \n and \t are escapes: BUG
const char* fixed = "C:\\new_folder\\test.txt";
const char* clean = R"(C:\new_folder\test.txt)"; // C++11, preferred
Null bytes in std::string. std::string can contain embedded null bytes, but the constructor that takes only a const char* determines the length as strlen would and stops at the first null:
std::string s = "abc\0def"; // .size() == 3: "def" is silently dropped
std::string t("abc\0def", 7); // .size() == 7: explicit length constructor
std::string u = "abc\0def"s; // .size() == 7: requires using namespace std::string_literals (C++14)
Octal range errors. Digits 8 and 9 are not octal digits, so a backslash followed by 8 or 9 is not an octal escape, and an octal escape ends at the first 8 or 9:
char c = '\8'; // not a valid escape; compilers diagnose it
char d = '\08'; // not octal 08: '\0' followed by '8', a multicharacter literal
char e = '\07'; // fine: BEL character
Greedy hex in concatenated literals. Escape sequences in each string literal are processed before adjacent literals are concatenated, so a hex escape cannot extend across a literal boundary. Within a single literal it is still greedy:
const char* ok = "\x61" "b"; // 'a' + 'b': unambiguous
const char* trap = "\x61b"; // one escape with value 0x61B (out of range for char), not 'a' + 'b'
\r\n vs \n in binary I/O. Text-mode streams on Windows translate \n to \r\n on write. When working with network protocols, binary file formats, or cross-platform data, open streams in binary mode and write \r\n explicitly rather than relying on platform translation.
// For CRLF-terminated HTTP headers:
std::string header = "HTTP/1.1 200 OK\r\n"
"Content-Type: text/plain\r\n"
"\r\n";
See Also
- String literal prefixes: u8, u, U, L and the s suffix (C++14)
- char8_t (C++20), char16_t and char32_t (C++11): Unicode character types
- Raw string literals (C++11): R"(...)"
- std::string_literals and operator""s (C++14)