Undefined Behavior
C++ undefined behavior — what it is, how compilers exploit it, the most dangerous forms, and how to detect and eliminate it.
Since C++98
A program that executes an operation the C++ standard designates as undefined behavior gives up all guarantees: the compiler may generate any code it chooses, including code that silently corrupts data, eliminates safety checks, or introduces security vulnerabilities.
Overview
Undefined behavior is not merely a runtime failure mode; it is a contract violation that the optimizer exploits at compile time. The standard deliberately leaves certain operations undefined so that compilers can optimize without first proving that those operations cannot occur. When you write x + 1 where x is a signed int, the compiler is permitted to assume that the expression never overflows, and it will restructure your code around that assumption.
The core danger: UB does not produce "garbage output." It produces code whose behavior the optimizer derived from a false premise. The result can appear correct in debug builds, fail silently under -O2, and change behavior across compiler versions — without any diagnostic.
The standard defines four tiers below "defined":
| Term | Meaning |
|---|---|
| Undefined behavior | No requirements — anything may happen |
| Unspecified behavior | One of several valid outcomes; standard doesn't say which |
| Implementation-defined | Platform must pick one outcome and document it |
| Erroneous behavior (C++26) | Well-defined but incorrect behavior that implementations are encouraged to diagnose |
C++26 introduces erroneous behavior for operations such as reading an uninitialized automatic variable: the read yields an implementation-chosen erroneous value instead of invoking UB, and the implementation is allowed to diagnose the read or terminate the program. This closes part of the practical UB gap without the cost of making every such operation fully defined.
How compilers exploit UB
The optimizer operates under the as-if rule: it may rewrite any code that preserves observable behavior for a valid program. Because UB cannot occur in a valid program by definition, any path reachable only via UB is provably dead and may be eliminated.
// Overflow check the optimizer deletes entirely
bool will_overflow(int x) {
    return x + 1 <= x;  // signed overflow is UB, so the compiler assumes x + 1 > x
}
// -O2 output (GCC 14, Clang 18): return false;

// Null check after dereference: the optimizer proves it is dead code
void write(int* p) {
    *p = 42;            // dereference: lets the compiler assume p != nullptr
    if (p == nullptr)   // provably false, eliminated
        log_error();
}

// Signed loop counter: overflow is assumed never to occur, so the exit test is dropped
for (int i = 0; i >= 0; ++i) {  // UB once ++i executes with i == INT_MAX
    process(i);
}
// GCC -O3: may generate an infinite loop, because "i >= 0" is assumed to be always true

These are not compiler bugs. They are correct transformations of a program that has already violated the standard.
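An overflow test survives optimization only if the test itself cannot overflow. The following is a minimal sketch of that idea, not a definitive implementation; `add_would_overflow` and `checked_add` are hypothetical names introduced here for illustration.

```cpp
#include <climits>

// Hypothetical helper (not from the original text): reports whether a + b
// would overflow int, using only comparisons that cannot themselves overflow.
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;  // INT_MAX - b cannot overflow when b > 0
    if (b < 0) return a < INT_MIN - b;  // INT_MIN - b cannot overflow when b < 0
    return false;                       // adding zero never overflows
}

// Stores a + b in out and returns true only when the sum is representable.
// out is left untouched when the addition would overflow.
bool checked_add(int a, int b, int& out) {
    if (add_would_overflow(a, b)) return false;
    out = a + b;
    return true;
}
```

Because every intermediate expression stays within the range of int, the optimizer has no UB-based assumption it can use to remove the check. GCC and Clang also provide the `__builtin_add_overflow` intrinsic, which performs the same test in a single call.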
Examples
Signed integer overflow
Signed arithmetic is not modular in C++. The two's complement wrapping that C programmers sometimes rely on is undefined behavior. C++20 mandated two's complement representation, but signed overflow remains undefined there as well.
int x = INT_MAX;
int y = x + 1;  // UB: signed overflow (in every standard, including C++20 and later)

// Safe alternatives:
if (x > INT_MAX - 1) throw std::overflow_error("overflow");  // check before adding
long long wide = static_cast<long long>(x) + 1;              // widen before adding
int sat = std::add_sat(x, 1);                                // C++26: saturating arithmetic (<numeric>)

Unsigned integers wrap by definition; that is fully specified behavior:
unsigned u = UINT_MAX;
unsigned v = u + 1;  // 0: defined, guaranteed modular wrap

Out-of-bounds access
int arr[5];
int x = arr[5];  // UB: reads one past the end
arr[-1] = 0;     // UB: writes before the start

std::vector<int> v{1, 2, 3};
v[10];     // UB: operator[] is unchecked
v.at(10);  // C++98: throws std::out_of_range; use at trust boundaries

Dangling references and use-after-free
int& bad() {
    int x = 42;
    return x;  // dangling: x is destroyed at return, so any use of the result is UB
}

// Iterator invalidation: common and silent
std::vector<int> v{1, 2, 3};
auto it = v.begin();
v.push_back(4);  // may reallocate; the iterator `it` is now dangling
*it = 99;        // UB

// Fix: hold indices instead of iterators across mutations
// Or: v.reserve(N) before inserting, when the final size is known

Uninitialized reads
int x;              // indeterminate value, not zero-initialized
if (x > 0) { ... }  // UB: reading an indeterminate value
// C++26: reading an uninitialized automatic variable becomes erroneous
// behavior (not UB); the implementation may yield a fixed value such as 0, or trap

Strict aliasing violations
The strict aliasing rule lets the compiler assume that pointers to unrelated types do not alias, which enables load/store reordering that breaks programs violating the rule. The only pointer types that may be used to access the storage of any object are char*, unsigned char*, and (since C++17) std::byte*.
float f = 3.14f;
int* p = reinterpret_cast<int*>(&f);
*p = 0x3f800000;  // UB: accessing float storage through int*

// Correct type punning:
int bits;
std::memcpy(&bits, &f, sizeof(float));  // always defined when sizeof(int) == sizeof(float)
int bits2 = std::bit_cast<int>(f);      // C++20: preferred, constexpr-capable

std::bit_cast (C++20) is the canonical solution: it is constexpr, communicates intent, and typically generates the same code as memcpy in optimized builds.
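For code bases that cannot yet use C++20, the memcpy approach can be wrapped in a small helper that checks the same preconditions std::bit_cast enforces. This is a sketch under stated assumptions: `bit_copy`, `float_bits`, and `bits_to_float` are hypothetical names, and the example assumes a 32-bit IEEE 754 float.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of `from` into a
// new To. The static_asserts mirror the requirements of std::bit_cast.
template <class To, class From>
To bit_copy(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));  // defined: copies bytes, no aliasing access
    return to;
}

// Assumes float is a 32-bit IEEE 754 type on the target platform.
std::uint32_t float_bits(float f) { return bit_copy<std::uint32_t>(f); }
float bits_to_float(std::uint32_t b) { return bit_copy<float>(b); }
```

On such a platform, `float_bits(1.0f)` yields `0x3f800000`, the same constant the aliasing example above tried to store through an int*. Unlike std::bit_cast, this helper is not usable in constant expressions.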
Shift overflow
int x = 1;
x << 31;  // Not portable before C++20: UB in C++11 as published, implementation-defined result after CWG 1457
x << 32;  // UB in all versions: shift amount >= bit width
// C++20 change: left shift of signed integers is defined as
// truncation (two's complement), so shifting into the sign bit is no longer UB
// C++20: (1 << 31) is defined for 32-bit int as INT_MIN
uint32_t u = 1u << 31;  // Always defined: 0x80000000u

Data races
int counter = 0;
std::thread t1([&]{ ++counter; });  // C++11: std::thread introduced
std::thread t2([&]{ ++counter; });
t1.join(); t2.join();
// UB: concurrent non-atomic writes; the C++11 memory model defines a data race as UB
// Fix: std::atomic<int> counter{0};  // C++11

Detection
| Tool | Catches | Flag |
|---|---|---|
| UBSan | Signed overflow, null deref, invalid enum, OOB | -fsanitize=undefined |
| ASan | OOB access, use-after-free, stack use-after-scope | -fsanitize=address |
| MSan | Uninitialized reads | -fsanitize=memory |
| TSan | Data races | -fsanitize=thread |
| Valgrind | OOB, use-after-free, leaks | No recompile needed |
| clang-tidy | Static: uninitialized vars, aliasing patterns | Compile-time |
# Full sanitizer build: use this for CI on test suites
clang++ -fsanitize=address,undefined -fno-sanitize-recover=all \
    -g -O1 -o myapp main.cpp
# -fno-sanitize-recover=all: first UB hit aborts with stack trace
# -O1: enough optimization to trigger UB, cheap enough for CI

-fsanitize-trap=undefined (Clang) emits a hardware trap instead of a runtime call, which is useful for embedded targets that cannot link the sanitizer runtime.
Best Practices
- Enable ASan + UBSan in CI. Run every test suite under -fsanitize=address,undefined -fno-sanitize-recover=all. Most dynamic UB that the tests actually execute is caught here at near-zero maintenance cost.
- Test at -O2, not just debug. Many UB manifestations only appear when the optimizer runs. Debug builds mask the problem.
- Use std::bit_cast for type punning (C++20). It's constexpr, expresses intent, and compiles to zero overhead.
- Prefer .at() over [] at trust boundaries, where input size comes from external sources. The bounds check is a single branch.
- Use std::atomic for any variable shared between threads without other synchronization (C++11). Even reads of non-atomic data are UB when another thread writes concurrently.
- Initialize all variables at declaration. Modern optimizers eliminate redundant stores; the bug risk from skipping initialization is not worth it.
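The .at() recommendation above can be sketched as a small wrapper that turns an out-of-range index into an ordinary failure result instead of UB. The function name `read_untrusted_index` and its signature are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical function: `index` is assumed to come from an untrusted source
// such as a file or network message. Returns true and writes the element to
// `out` when the index is valid; returns false and leaves `out` unchanged
// otherwise.
bool read_untrusted_index(const std::vector<int>& values, std::size_t index, int& out) {
    try {
        out = values.at(index);  // bounds-checked: throws instead of invoking UB
        return true;
    } catch (const std::out_of_range&) {
        return false;            // invalid input is reported, not dereferenced
    }
}
```

Inside hot loops where the index is already known to be valid, operator[] remains the reasonable choice; the checked path is meant for the point where external data first enters the program.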
Common Pitfalls
Signed/unsigned comparison. int i = -1; if (i < v.size()): size() returns size_t (unsigned), so -1 is converted to SIZE_MAX and the comparison is false. Not UB, but almost always a bug. Compile with -Wsign-compare.
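To make the conversion visible, the sketch below contrasts what the mixed comparison actually computes with a mathematically correct version. Both function names are hypothetical; C++20 offers std::cmp_less in <utility> for the same purpose.

```cpp
#include <cstddef>

// Spells out what `value < size` does implicitly when value is a signed int:
// the signed operand is converted to the unsigned type first.
bool converted_less_than_size(int value, std::size_t size) {
    return static_cast<std::size_t>(value) < size;
}

// Hypothetical helper: compares by mathematical value instead.
bool signed_less_than_size(long long value, std::size_t size) {
    if (value < 0) return true;  // every negative value is below any size
    return static_cast<unsigned long long>(value) < size;
}
```

With a value of -1 and a size of 3, the first function returns false and the second returns true, which is the difference -Wsign-compare is warning about.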
std::optional access without checking (C++17):
std::optional<int> opt;  // C++17
int x = *opt;            // UB: accessing disengaged optional
int y = opt.value();     // throws std::bad_optional_access; safer at trust boundaries

Placement new without std::launder (C++17). Reusing storage via placement new and then accessing through the original pointer is UB:
alignas(int) char buf[sizeof(int)];
new (buf) int{42};
int* p = reinterpret_cast<int*>(buf);                // UB when dereferenced: stale pointer
int* q = std::launder(reinterpret_cast<int*>(buf));  // C++17: defined

Assuming sanitizers catch everything. MSan reports an uninitialized value only when it affects a branch, an address, or a system call, so uninitialized data that is merely copied goes unreported; UBSan does not detect strict aliasing violations (recent Clang releases ship a separate TypeSanitizer, -fsanitize=type, for that); neither tool catches logic errors. Sanitizers are a floor, not a ceiling.
Integer overflow in security-sensitive length arithmetic.
// Classic heap overflow via wrapping length
uint32_t len = attacker_controlled;
char* out = new char[len + 1];  // len = UINT_MAX → +1 wraps to 0
memcpy(out, src, len);          // copies UINT_MAX bytes: heap corruption

// Fix: validate the length before doing arithmetic on it
if (len > MAX_SAFE_SIZE) return error;
char* safe_out = new char[len + 1];
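The fix can be sketched as a complete function that validates the length before any arithmetic touches it. This is an illustration under assumptions: `kMaxSafeSize` stands in for the MAX_SAFE_SIZE constant, whose value the text does not give, and `copy_with_terminator` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical limit standing in for MAX_SAFE_SIZE; pick a value that fits
// the application.
constexpr std::uint32_t kMaxSafeSize = 1u << 20;

// Copies len bytes from src into a new NUL-terminated buffer.
// Returns nullptr when len exceeds the limit. The caller must guarantee that
// src points to at least len readable bytes.
std::unique_ptr<char[]> copy_with_terminator(const char* src, std::uint32_t len) {
    if (len > kMaxSafeSize) return nullptr;  // reject before any arithmetic
    // Cannot wrap: len <= kMaxSafeSize, and the sum is computed in size_t.
    const std::size_t total = static_cast<std::size_t>(len) + 1;
    std::unique_ptr<char[]> out(new char[total]);
    std::memcpy(out.get(), src, len);
    out[len] = '\0';
    return out;
}
```

Performing the check first means the later `len + 1` is evaluated only on values known to be small, so neither the allocation size nor the copy length can disagree with each other.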