std::memory_order
Controls how memory accesses around atomic operations are ordered across threads; the backbone of C++'s lock-free concurrency model.
std::memory_order (since C++11)
An enumeration that specifies the visibility and ordering constraints applied to memory operations surrounding an atomic load, store, or read-modify-write, governing how those operations synchronize between threads.
Overview
Modern CPUs and compilers reorder memory accesses for performance: a load issued before a store in source code may execute after it at runtime. On a single thread this is invisible, but across threads the effect is observable, and in lock-free code it can silently corrupt shared state. std::memory_order gives precise control over which reorderings are forbidden around each atomic operation.
The six values form a spectrum from cheapest to most restrictive:
| Value | Guarantees |
|---|---|
| memory_order_relaxed | Atomicity only; no ordering constraints |
| memory_order_consume | Orders later operations that carry a data dependency on the loaded value |
| memory_order_acquire | No subsequent loads/stores move before this load |
| memory_order_release | No preceding loads/stores move after this store |
| memory_order_acq_rel | Acquire + release semantics on a single read-modify-write (RMW) |
| memory_order_seq_cst | Acquire/release semantics plus a single total order across all seq_cst operations (default) |
In C++20, std::memory_order became a scoped enum. The unscoped names (memory_order_relaxed, etc.) remain as inline constexpr aliases, so existing code continues to compile unchanged. New code may use std::memory_order::relaxed, std::memory_order::acquire, etc.
Synchronizes-with
The central relationship is synchronizes-with: a release store on thread A synchronizes-with an acquire load on thread B that reads the stored value. Everything thread A wrote before the release is guaranteed visible to thread B after the acquire. This is how data ownership transfers between threads without a mutex.
Syntax
```cpp
#include <atomic>

// C++11: unscoped enum (names remain valid in all later standards)
enum memory_order {
    memory_order_relaxed,
    memory_order_consume,
    memory_order_acquire,
    memory_order_release,
    memory_order_acq_rel,
    memory_order_seq_cst
};

// C++20: scoped enum; unscoped names become inline constexpr aliases
namespace std {
    enum class memory_order : /* unspecified */ {
        relaxed, consume, acquire, release, acq_rel, seq_cst
    };
    inline constexpr memory_order memory_order_relaxed = memory_order::relaxed;
    // ... and so on for the remaining five values
}
```
Atomic member functions accept the order as an explicit parameter; the default is always seq_cst:
```cpp
std::atomic<int> x{0};
x.store(42, std::memory_order_release);
int v = x.load(std::memory_order_acquire);
int old = x.exchange(1, std::memory_order_acq_rel);
// compare_exchange takes separate success and failure orders
bool ok = x.compare_exchange_strong(
    old, 2,
    std::memory_order_release,   // applied on success
    std::memory_order_relaxed);  // applied on failure
```
Examples
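Compare-exchange loop: atomic maximum
compare_exchange is usually called in a retry loop rather than once. The sketch below keeps a running maximum that many threads update; update_max and peak_of_concurrent_updates are hypothetical names. Relaxed is assumed to be enough for both the success and the failure order here because the maximum is a standalone statistic that guards no other data.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: raise `peak` to at least `value`.
void update_max(std::atomic<int>& peak, int value) {
    int current = peak.load(std::memory_order_relaxed);
    // On failure (including a spurious failure of the weak form), compare_exchange_weak
    // writes the value it actually observed into `current`, so the loop re-checks
    // the condition without a separate reload.
    while (current < value &&
           !peak.compare_exchange_weak(current, value,
                                       std::memory_order_relaxed,    // success
                                       std::memory_order_relaxed)) { // failure
    }
}

// Four threads feed disjoint ranges of values; the largest value fed is 3 * 1000 + 999.
int peak_of_concurrent_updates() {
    std::atomic<int> peak{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t)
        pool.emplace_back([&peak, t] {
            for (int i = 0; i < 1000; ++i) update_max(peak, t * 1000 + i);
        });
    for (auto& th : pool) th.join();
    return peak.load();
}
```

If the maximum were later used as a signal that other data is ready, the success order would have to be at least release, with acquire on the reading side.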
Relaxed: independent counter
memory_order_relaxed is correct when you need only atomicity, not ordering. A hit counter accumulated from many threads and read once at shutdown is the canonical case:
```cpp
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<unsigned long long> total_hits{0};  // C++11

void worker(int n) {
    for (int i = 0; i < n; ++i)
        total_hits.fetch_add(1, std::memory_order_relaxed);
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i)
        threads.emplace_back(worker, 1'000'000);
    for (auto& t : threads) t.join();
    // join() already synchronizes with each worker, so any order works for this
    // final read; seq_cst (the default) is used for simplicity.
    std::cout << total_hits.load() << '\n';  // prints 8000000
}
```
Release / Acquire: producer-consumer handoff
The most pervasive pattern in lock-free code: one thread publishes data and signals readiness; another consumes it.
```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Payload { int x, y; };
Payload data{};                  // non-atomic shared data
std::atomic<bool> ready{false};  // C++11

void producer() {
    data = {42, 99};
    // The release ensures the 'data' write is visible before 'ready' is seen true.
    ready.store(true, std::memory_order_release);
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))
        ;
    // Synchronizes-with the release store: data is fully visible here.
    assert(data.x == 42 && data.y == 99);
}

int main() {
    std::thread t1{producer}, t2{consumer};
    t1.join(); t2.join();
}
```
Replacing either order with memory_order_relaxed breaks the guarantee: the compiler or CPU may move the data write after the flag store (or the data reads before the flag load). The program then has a data race on data, which is undefined behavior; in practice the consumer may read stale values.
Acquire-release: spinlock
```cpp
#include <atomic>

class Spinlock {
    // C++11 to C++17: ATOMIC_FLAG_INIT required; C++20: the default constructor clears the flag
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() noexcept {
        while (flag_.test_and_set(std::memory_order_acquire))
            ;
    }
    void unlock() noexcept {
        flag_.clear(std::memory_order_release);
    }
};
```
The acquire on test_and_set pairs with the release on clear: every thread that acquires the lock sees all writes made by the previous lock holder before they released it.
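A minimal usage sketch, reusing the Spinlock class above (repeated so the block compiles on its own). Because Spinlock provides lock() and unlock(), it meets the BasicLockable requirements and works with std::lock_guard. The helper count_with_spinlock is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Same Spinlock as above, repeated so this sketch compiles on its own.
class Spinlock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // deprecated (but still valid) in C++20
public:
    void lock() noexcept {
        while (flag_.test_and_set(std::memory_order_acquire))
            ;  // spin until the previous holder clears the flag
    }
    void unlock() noexcept {
        flag_.clear(std::memory_order_release);
    }
};

// Hypothetical helper: `threads` workers each increment a plain (non-atomic)
// counter `iters` times; the spinlock is the only thing preventing a data race.
long long count_with_spinlock(int threads, int iters) {
    Spinlock spin;
    long long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&spin, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<Spinlock> guard(spin);  // lock() here, unlock() at scope exit
                ++counter;
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

Every increment happens between an acquire and a release of the same flag, so each lock holder sees the previous holder's value of counter and the final total is exact.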
Sequential consistency: enforcing a global order
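memory_order_seq_cst is the order used by all std::atomic compound operators (++, +=, etc.), which cannot take an explicit order, and the default for every atomic member function. It imposes a single total order on all seq_cst operations across all threads. It is the only ordering that rules out the store-buffering reordering shown below, which is observable not only on weakly-ordered architectures (ARM, POWER) but also on x86, where a core's store buffer lets a later load complete before an earlier store becomes visible to other cores: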
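```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};  // C++11
int r1 = -1, r2 = -1;

void t1() {
    x.store(1, std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_seq_cst);
}

void t2() {
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
}

int main() {
    std::thread a{t1}, b{t2};
    a.join(); b.join();
    // With seq_cst: r1 == 0 && r2 == 0 is impossible.
    // The first of the four operations in the total order is a store, so the
    // other thread's later load of that variable must read 1.
    assert(!(r1 == 0 && r2 == 0));
}
```
With acquire/release alone, r1 == 0 && r2 == 0 is a permitted outcome, and it is observed in practice on x86 as well as on ARM and POWER: a release store followed by an acquire load of a different variable may be reordered, so each thread's load can complete while its own store is still invisible to the other thread.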
memory_order_consume: avoid in practice
memory_order_consume is theoretically weaker than acquire: it orders only operations that carry a data dependency on the loaded value. Major compilers do not implement the required dependency tracking; instead they promote consume to acquire, which the standard permits. Since C++17 the standard explicitly discourages its use while the specification is being revised. Avoid it in new code unless targeting a platform with guaranteed dependency ordering and a compiler that explicitly supports it.
Best Practices
- Start with seq_cst. It is the easiest to reason about. Profile under realistic contention before weakening anything.
- Use release/acquire for ownership transfer. This is the right tool whenever one thread finishes writing data and another thread needs to read it.
- Reserve relaxed for genuinely independent values: counters, statistics, and set-only flags that never guard other memory. Never use it to protect access to non-atomic shared state.
- Use acq_rel on RMW operations mid-chain. When a fetch_add or compare_exchange both consumes data published by an earlier thread and publishes data for a later one, it must acquire the writes released before it and release this thread's own earlier writes to whoever acquires next.
- Validate on weakly-ordered hardware or with ThreadSanitizer. x86's strong hardware memory model masks many acquire/release bugs (although the compiler can still reorder); ARM and POWER expose them. ThreadSanitizer reports data races caused by missing synchronization regardless of the host architecture.
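The acq_rel guideline is easiest to see in a reference count: every decrement publishes the owner's writes, and the last decrement must also observe everyone else's writes before destroying the object. A minimal sketch; Node, acquire_ref, release_ref, and demo_two_owners are hypothetical names, not a library API.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical intrusive reference-counted node (not a library type).
struct Node {
    std::atomic<int> refs{1};
    int a = 0, b = 0;   // written by different owners before they drop their references
    int* sum_out;       // the destructor reports a + b here, for the demo
    explicit Node(int* out) : sum_out(out) {}
    ~Node() { *sum_out = a + b; }  // reads both owners' writes
};

void acquire_ref(Node* n) {
    // The caller already holds a reference, so no ordering is needed here.
    n->refs.fetch_add(1, std::memory_order_relaxed);
}

void release_ref(Node* n) {
    // acq_rel: the release half publishes this owner's writes to *n; the acquire
    // half makes every earlier owner's writes visible if this is the last reference.
    if (n->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
        delete n;
}

// Two owners write disjoint fields, then each drops its reference.
// Whichever drops last runs the destructor, which must see both writes.
int demo_two_owners() {
    int sum = -1;
    Node* n = new Node(&sum);
    acquire_ref(n);   // refs == 2: one for this thread, one for the worker
    std::thread worker([n] {
        n->b = 2;
        release_ref(n);
    });
    n->a = 1;
    release_ref(n);
    worker.join();    // join() makes the destructor's write to sum visible here
    return sum;
}
```

A common alternative performs the decrement with memory_order_release and issues std::atomic_thread_fence(std::memory_order_acquire) only in the branch that deletes, so non-final decrements do not pay for the acquire.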
Common Pitfalls
Relaxed flag guarding non-atomic data. The store being atomic does not make guarded data visible. You need at least release on the write side and acquire on the read side.
Mixing seq_cst with weaker orderings on the same variable. The total-order guarantee applies only among seq_cst operations. A memory_order_release store is not part of the seq_cst total order, even if the load on the other side uses seq_cst.
Illegal memory order on compare_exchange failure. The failure order cannot be memory_order_release or memory_order_acq_rel, because a failed compare_exchange performs only a load. Before C++17 it also could not be stronger than the success order; C++17 removed that restriction:
```cpp
// Wrong: release on failure is undefined behavior
x.compare_exchange_strong(expected, desired,
                          std::memory_order_acq_rel,
                          std::memory_order_release);  // UB
// Correct
x.compare_exchange_strong(expected, desired,
                          std::memory_order_acq_rel,
                          std::memory_order_acquire);  // OK: acquire is a valid failure order
```
Assuming lock-free equals fast. On architectures that require full memory barriers for seq_cst, a single atomic increment can be 10 to 50 times slower than a non-atomic one under contention. Measure before replacing a mutex with hand-rolled atomics: a well-implemented mutex often wins at low thread counts.
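A rough measurement harness can serve as a starting point for that comparison. This is a sketch with hypothetical names (time_increments, compare_atomic_vs_mutex). The timings it reports depend entirely on hardware, compiler, and thread count, so no particular ratio should be expected; only the totals are checked for correctness.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Runs `threads` workers that each call inc() `iters` times; returns wall time in ms.
template <class Fn>
double time_increments(int threads, int iters, Fn inc) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&inc, iters] {
            for (int i = 0; i < iters; ++i) inc();
        });
    for (auto& th : pool) th.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

struct Comparison {
    long long atomic_total, mutex_total;  // both must equal threads * iters
    double atomic_ms, mutex_ms;           // machine-dependent timings
};

Comparison compare_atomic_vs_mutex(int threads, int iters) {
    std::atomic<long long> a{0};
    long long m = 0;
    std::mutex mtx;

    Comparison c{};
    c.atomic_ms = time_increments(threads, iters, [&a] { ++a; });  // seq_cst increment
    c.mutex_ms = time_increments(threads, iters, [&m, &mtx] {
        std::lock_guard<std::mutex> guard(mtx);
        ++m;
    });
    c.atomic_total = a.load();
    c.mutex_total = m;
    return c;
}
```

Running it at several thread counts (one, two, and the core count) gives a first impression; repeated runs in a dedicated benchmarking setup are a sounder basis for replacing a mutex.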
See Also
- std::atomic: the primary type parameterised by memory_order
- std::condition_variable: mutex-based synchronization that internalises ordering concerns
- std::async / std::future: task-based concurrency that eliminates explicit memory ordering in most cases