Wide Strings
Character and string types for wide character sets, including wchar_t, std::wstring, and Unicode-aware character types for international text.
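Wide Strings (since C++98): strings built from wchar_t, or from the Unicode-oriented types char8_t, char16_t, and char32_t, used to represent text that does not fit a single-byte character set, particularly international scripts and Unicode. wchar_t and std::wstring date from C++98; char16_t and char32_t were added in C++11 and char8_t in C++20.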
Overview
Wide strings provide support for character sets larger than ASCII and other single-byte encodings, such as Japanese kanji, Cyrillic, and other international scripts. The fundamental wide character type is wchar_t, a distinct built-in integral type whose size is implementation-defined (typically 2 or 4 bytes). The Standard Library defines std::wstring as std::basic_string<wchar_t>.
Since C++11, character types with explicit Unicode encoding semantics have been added to address wchar_t's portability limitations:
- char8_t (C++20): UTF-8; std::u8string
- char16_t (C++11): UTF-16; std::u16string
- char32_t (C++11): UTF-32; std::u32string
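These types have standard-specified widths: char16_t and char32_t have the same size as std::uint_least16_t and std::uint_least32_t, and char8_t has the size of unsigned char. The u8, u, and U literal prefixes yield UTF-8, UTF-16, and UTF-32 data respectively. This makes these types preferable for portable Unicode handling. Legacy wchar_t varies. Windows uses 2 bytes (UTF-16 code units), while POSIX systems usually use 4 bytes (UTF-32). As a result, code that stores characters outside the Basic Multilingual Plane (BMP) behaves differently across platforms.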
Syntax
Character and String Literals
Wide character literals and wide string literals use the L prefix:
wchar_t ch = L'Ω'; // wide character literal
std::wstring str = L"Москва"; // wide string literal
C++11 and later introduce prefixes for explicit Unicode encoding:
auto u8str = u8"hello"; // const char8_t* (C++20) or const char* (C++11-17), not a string object
auto u16str = u"Москва"; // const char16_t*; use std::u16string for an owning string
auto u32str = U"hello"; // const char32_t*
String Type Definitions
#include <string>
std::wstring ws; // wchar_t-based string
std::wstring ws2 = L"wide string";
std::u8string u8s = u8"UTF-8"; // char8_t (C++20 only; before C++20, u8 literals are char arrays stored in std::string)
std::u16string u16s = u"UTF-16"; // char16_t (C++11)
std::u32string u32s = U"UTF-32"; // char32_t (C++11)
String Views
Since C++17, non-owning view types are available:
#include <string_view>
std::wstring_view wsv = L"wide"; // wchar_t
std::u16string_view u16sv = u"UTF-16"; // char16_t
std::u32string_view u32sv = U"UTF-32"; // char32_t
std::u8string_view u8sv = u8"UTF-8"; // C++20: char8_t
Examples
Basic Wide String Operations
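#include <clocale>
#include <iostream>
#include <string>
int main() {
// Adopt the environment's locale (e.g. a UTF-8 one) so std::wcout can encode
// non-ASCII output; without it, such output typically fails. How much this helps
// is platform-dependent, and Windows consoles need additional setup.
std::setlocale(LC_ALL, "");
std::wstring greeting = L"こんにちは"; // Japanese "hello"
std::wstring name = L"世界"; // "world"
std::wstring message = greeting + L" " + name;
std::wcout << message << std::endl;
return 0;
}
Numeric Conversions with Wide Strings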
#include <string>
#include <iostream>
int main() {
// to_wstring: convert numeric types to wide string (C++11)
int value = 42;
double pi = 3.14159;
std::wstring wstr_int = std::to_wstring(value); // L"42"
std::wstring wstr_double = std::to_wstring(pi); // L"3.141590"
// Reverse: string to number (C++11)
std::wstring wnum = L"123";
int result = std::stoi(wnum); // 123
double dval = std::stod(L"3.14"); // 3.14
return 0;
}
Wide String Streams
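#include <clocale>
#include <iostream>
#include <sstream>
#include <string>
int main() {
std::setlocale(LC_ALL, ""); // platform-dependent: lets std::wcout encode "°" for the terminal
std::wstringstream wss;
wss << L"Temperature: " << 25 << L"°C";
std::wstring result = wss.str();
std::wcout << result << std::endl; // Wide character output
return 0;
}
UTF-32 for Complete Unicode Coverage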
#include <string>
#include <iostream>
int main() {
// UTF-32 can represent any Unicode code point directly
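std::u32string emojis = U"🌍🌎🌏"; // each emoji is a single code point, so a single char32_t
int outside_bmp = 0;
for (char32_t ch : emojis) {
// Each iteration sees one whole code point (a user-perceived character can still
// span several code points, e.g. emoji joined by U+200D ZERO WIDTH JOINER)
if (ch > 0xFFFF) ++outside_bmp;
}
std::cout << outside_bmp << '\n'; // 3: all three lie outside the Basic Multilingual Plane
// With a 2-byte wchar_t (as on Windows), each of these code points is stored as a
// UTF-16 surrogate pair, so the equivalent std::wstring holds 6 units instead of 3
return 0;
}
String View for Zero-Copy Passing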
#include <string_view>
void process_unicode(std::u32string_view sv) {
// No allocation, no copy
for (char32_t ch : sv) {
// Process each code point
}
}
int main() {
std::u32string text = U"Hello";
process_unicode(text); // Implicit conversion to string_view
return 0;
}
Best Practices
- Prefer Fixed-Size Unicode Types for New Code: Use std::u8string, std::u16string, or std::u32string instead of std::wstring. Their encodings are explicit and their widths are portable. Reserve std::wstring for APIs that mandate wchar_t (e.g., Windows Unicode APIs).
- Use String Views for Function Parameters: Pass read-only strings as std::u32string_view or std::wstring_view (C++17 and later) to avoid copies and allocations, including when the caller passes a string literal. A view does not own its data, so it must not outlive the string it refers to.
- Choose the Right Encoding: Use UTF-32 (std::u32string) for algorithms that iterate over or index by code point. Use UTF-8 (std::u8string in C++20, or std::string) for storage and network transmission. Use UTF-16 only when interfacing with platform APIs that require it.
- Be Explicit with Locale: Character classification functions such as iswalpha and iswupper are locale-dependent. For consistent results across platforms, use Unicode property data from a dedicated library (e.g., ICU), or set the locale explicitly rather than relying on the default.
- Handle Conversion Explicitly: When converting between encodings, or between narrow and wide strings, use a real transcoding routine. Options include mbrtoc32 or mbrtowc under a known locale, a platform API, or a library such as ICU. A range constructor only copies code units one by one and does not transcode. Never silently truncate or mix encodings.
Common Pitfalls
- Assuming wchar_t Portability: The size and encoding of wchar_t are implementation-defined. Code that works on Windows (2 bytes, UTF-16 code units) may behave differently on Linux (typically 4 bytes, UTF-32), and vice versa. Use the fixed-size Unicode types for portable code.
- Silent Character Loss: There is no implicit conversion from std::string to std::wstring. The tempting workaround std::wstring(str.begin(), str.end()) widens each byte independently. That is correct only for ASCII: a multibyte UTF-8 sequence such as "é" (0xC3 0xA9) becomes two meaningless wide characters. Decode with a real transcoding routine instead.
- Surrogate Pair Handling: On systems where wchar_t is 2 bytes, characters outside the Basic Multilingual Plane (e.g., most emoji and some rare scripts) are stored as surrogate pairs. Indexing or iterating by wchar_t then yields half-characters, and size() overcounts code points. UTF-32 avoids surrogates entirely, although a user-perceived character may still consist of several code points.
- Incomplete Locale Support: Character classification and collation depend on the active locale, so the same character may classify differently on different systems. Do not assume classification is deterministic unless the locale is fixed.
- Performance Regression: For mostly-ASCII text, UTF-16 and UTF-32 use 2× and 4× the memory of UTF-8, and comparisons and scans touch correspondingly more bytes. For CJK-heavy text the balance shifts, because UTF-16 needs 2 bytes per character where UTF-8 needs 3. Use wide strings only when necessary, and prefer UTF-8 for storage.
See Also
- String Type: the std::basic_string template and its specializations
- String Conversions: std::to_wstring() and the std::sto*() family
- String View: non-owning string references (C++17 and later)