
std::memory_order

Controls how memory accesses around atomic operations are ordered across threads; the backbone of C++'s lock-free concurrency model.

std::memory_order (since C++11)

An enumeration that specifies the visibility and ordering constraints applied to memory operations surrounding an atomic load, store, or read-modify-write, governing how those operations synchronize between threads.

Overview

Modern CPUs and compilers reorder memory accesses for performance: a load issued before a store in source code may execute after it at runtime. On a single thread this is invisible, but across threads the effect is observable and catastrophic in lock-free code. std::memory_order gives precise control over which reorderings are forbidden around each atomic operation.

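The six values run roughly from weakest (cheapest) to strongest (most restrictive); acquire and release sit at the same level, applying to loads and stores respectively: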

| Value | Guarantees |
|---|---|
| memory_order_relaxed | Atomicity only; no ordering constraints |
| memory_order_consume | Data-dependency ordering on the consuming load |
| memory_order_acquire | No subsequent loads/stores move before this load |
| memory_order_release | No preceding loads/stores move after this store |
| memory_order_acq_rel | Acquire + release semantics on a single RMW |
| memory_order_seq_cst | Total order across all seq_cst operations (default) |

In C++20, std::memory_order became a scoped enum. The unscoped names (memory_order_relaxed, etc.) remain as inline constexpr aliases, so existing code continues to compile unchanged. New code may use std::memory_order::relaxed, std::memory_order::acquire, etc.
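
The two spellings can be checked against each other. The following is a minimal sketch, not a definitive test: store_then_load is a hypothetical helper name, and the scoped spelling is only compiled when the compiler reports C++20 or later.

```cpp
#include <atomic>

// Hypothetical helper: stores with the C++11 spelling and loads with the
// C++20 scoped spelling when the compiler provides it.
int store_then_load(int value) {
    std::atomic<int> a{0};
    a.store(value, std::memory_order_release);      // valid since C++11
#if __cplusplus >= 202002L
    // Both names have type std::memory_order and denote the same enumerator.
    static_assert(std::memory_order_acquire == std::memory_order::acquire,
                  "both spellings name the same enumerator");
    return a.load(std::memory_order::acquire);      // C++20 scoped spelling
#else
    return a.load(std::memory_order_acquire);       // pre-C++20 fallback
#endif
}
```

Calling store_then_load(7) returns 7 under either branch; the static_assert only documents that the old and new names are interchangeable.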

Synchronizes-with

The central relationship is synchronizes-with: a release store on thread A synchronizes-with an acquire load on thread B that reads the stored value. Everything thread A wrote before the release is guaranteed visible to thread B after the acquire. This is how data ownership transfers between threads without a mutex.

Syntax

cpp
#include <atomic>

// C++11: unscoped enum (names remain valid in all later standards)
enum memory_order {
    memory_order_relaxed,
    memory_order_consume,
    memory_order_acquire,
    memory_order_release,
    memory_order_acq_rel,
    memory_order_seq_cst
};

// C++20: scoped enum; unscoped names become inline constexpr aliases
namespace std {
    enum class memory_order : /* unspecified */ {
        relaxed, consume, acquire, release, acq_rel, seq_cst
    };
    inline constexpr memory_order memory_order_relaxed = memory_order::relaxed;
    // … and so on for the remaining five values
}

Atomic member functions accept the order as an optional parameter; when it is omitted, the default is always seq_cst:

cpp
std::atomic<int> x{0};

x.store(42, std::memory_order_release);
int v = x.load(std::memory_order_acquire);
int old = x.exchange(1, std::memory_order_acq_rel);

// compare_exchange takes separate success and failure orders
bool ok = x.compare_exchange_strong(
    old, 2,
    std::memory_order_release,   // applied on success
    std::memory_order_relaxed);  // applied on failure
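
The separate success and failure orders matter most in a retry loop. As a hedged sketch, the hypothetical helper atomic_store_max below raises an atomic integer to at least a candidate value; the name and the choice of release/relaxed are illustrative assumptions, not part of the library.

```cpp
#include <atomic>

// Hypothetical helper: raises 'target' to at least 'candidate'.
// Returns true if this call performed the update.
bool atomic_store_max(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    while (observed < candidate) {
        // On failure, compare_exchange_weak writes the current value into
        // 'observed', so the loop condition re-checks against fresh data.
        if (target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_release,   // success
                                         std::memory_order_relaxed))  // failure
            return true;
    }
    return false;  // another value at least as large was already stored
}
```

The failure order can stay relaxed here because a failed attempt only refreshes observed and retries; nothing is read through the value it loaded.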

Examples

Relaxed: independent counter

memory_order_relaxed is correct when you need only atomicity, not ordering. A hit counter accumulated from many threads and read once at shutdown is the canonical case:

cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<unsigned long long> total_hits{0};  // C++11

void worker(int n) {
    for (int i = 0; i < n; ++i)
        total_hits.fetch_add(1, std::memory_order_relaxed);
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i)
        threads.emplace_back(worker, 1'000'000);  // digit separators need C++14
    for (auto& t : threads) t.join();
    // One final read after join(); seq_cst (the default) is fine here.
    // Exit status 0 means no increment was lost.
    return total_hits.load() == 8'000'000ULL ? 0 : 1;
}

Release / Acquire: producer-consumer handoff

The most pervasive pattern in lock-free code: one thread publishes data and signals readiness; another consumes it.

cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Payload { int x, y; };

Payload data{};                    // non-atomic shared data
std::atomic<bool> ready{false};   // C++11

void producer() {
    data = {42, 99};
    // The release ensures the 'data' write is visible before 'ready' is seen true.
    ready.store(true, std::memory_order_release);
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))
        ;
    // Synchronizes-with the release store: data is fully visible here.
    assert(data.x == 42 && data.y == 99);
}

int main() {
    std::thread t1{producer}, t2{consumer};
    t1.join(); t2.join();
}

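Replacing either order with memory_order_relaxed removes the synchronizes-with relationship. The read of data then races with the write, which is undefined behavior; in practice the compiler or CPU may make the flag store visible before the data write, so the consumer can observe stale or partially written values.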

Acquire-release: spinlock

cpp
#include <atomic>

class Spinlock {
    // C++11–17: ATOMIC_FLAG_INIT required; C++20: default constructor clears the flag
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() noexcept {
        while (flag_.test_and_set(std::memory_order_acquire))
            ;
    }
    void unlock() noexcept {
        flag_.clear(std::memory_order_release);
    }
};

The acquire on test_and_set pairs with the release on clear: every thread that acquires the lock sees all writes made by the previous lock holder before they released it.
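
To see the pairing at work, the sketch below guards a plain (non-atomic) counter with the same Spinlock shape. The class is repeated so the block compiles on its own; count_with_spinlock is a hypothetical driver name, and the thread and iteration counts are arbitrary.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

class Spinlock {  // same shape as the class above, repeated for self-containment
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() noexcept {
        while (flag_.test_and_set(std::memory_order_acquire))
            ;
    }
    void unlock() noexcept { flag_.clear(std::memory_order_release); }
};

// Hypothetical driver: each thread increments a plain long while holding
// the lock; the total is read only after every thread has been joined.
long count_with_spinlock(int thread_count, int per_thread) {
    Spinlock lock;
    long counter = 0;  // non-atomic; protected only by 'lock'
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&lock, &counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // lock()/unlock() are enough for std::lock_guard (BasicLockable).
                std::lock_guard<Spinlock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

With 4 threads and 10000 iterations each, the function returns 40000; weakening either order to relaxed would turn the increments into a data race.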

Sequential consistency: enforcing a global order

memory_order_seq_cst is the default for every std::atomic member function and the only ordering available to the overloaded operators (++, +=, etc.), which take no order argument. It imposes a single total order on all seq_cst operations across all threads. It is the only ordering that rules out the store-buffering outcome in the example below, where each thread stores to one variable and then loads the other:

cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};  // C++11
int r1 = -1, r2 = -1;

void t1() {
    x.store(1, std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_seq_cst);
}

void t2() {
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
}

// With seq_cst: r1 == 0 && r2 == 0 is impossible.
// Either x.store or y.store is first in the total order.

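With acquire/release alone, r1 == 0 && r2 == 0 is a permitted outcome, and not only on weakly-ordered hardware: x86 also lets a store wait in a store buffer while a later load from a different location completes, so each thread can read the other's variable before its own store becomes visible.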
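
The seq_cst guarantee can be exercised with a small harness. This is a sketch under stated assumptions: count_both_zero is a hypothetical name, the round count is arbitrary, and a clean run is evidence rather than proof.

```cpp
#include <atomic>
#include <thread>

// Hypothetical harness: runs the two-thread test 'rounds' times and counts
// how often both loads returned 0. With seq_cst this count must stay 0.
int count_both_zero(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        a.join();
        b.join();  // join() makes r1 and r2 safe to read from this thread
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Changing the stores to release and the loads to acquire makes a nonzero count legal, although whether it actually shows up depends on the hardware, the scheduler, and timing.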

memory_order_consume: avoid in practice

memory_order_consume is theoretically weaker than acquire: it orders only operations that carry a data dependency on the loaded value. No major compiler implements the required dependency tracking; in practice they treat consume as acquire, which is permitted because acquire is strictly stronger. Since C++17 the standard itself discourages consume while its specification is being revised. Avoid it in new code unless targeting an embedded platform with guaranteed dependency ordering and a compiler that explicitly supports it.
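
Pointer publication is the case consume was designed for, and acquire covers it safely. The sketch below is illustrative only: Config and publish_and_read are hypothetical names, and the payload values are arbitrary.

```cpp
#include <atomic>
#include <thread>

struct Config { int timeout_ms; int retries; };  // hypothetical payload

// Publishes a heap-allocated Config through an atomic pointer and reads it
// on another thread. Acquire is used where consume would be the
// theoretical minimum.
int publish_and_read() {
    std::atomic<Config*> slot{nullptr};
    int seen = 0;
    std::thread writer([&] {
        Config* c = new Config{250, 3};
        slot.store(c, std::memory_order_release);  // publish after construction
    });
    std::thread reader([&] {
        Config* c;
        while ((c = slot.load(std::memory_order_acquire)) == nullptr)
            ;  // spin until the pointer is published
        seen = c->timeout_ms + c->retries;  // dependent reads, ordered by acquire
    });
    writer.join();
    reader.join();
    delete slot.load(std::memory_order_relaxed);  // both threads are done
    return seen;
}
```

The reader only dereferences the loaded pointer, so consume would suffice in theory; acquire costs little or nothing extra on most targets and does not rely on dependency tracking.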

Best Practices

  • Start with seq_cst. It is easiest to reason about. Profile under realistic contention before weakening anything.
  • Use release/acquire for ownership transfer. This is the right tool whenever one thread finishes writing data and another thread needs to read it.
  • Reserve relaxed for genuinely independent values. Counters, statistics, flags that are set-only and never guard other memory. Never use it to protect access to non-atomic shared state.
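  • Use acq_rel on RMW operations mid-chain. When a fetch_add or compare_exchange both consumes data published by another thread and publishes data of its own, it needs both halves: acquire to see the writes that preceded the value it reads, and release to make its own earlier writes visible to whoever reads the value it stores.
  • Validate on weakly-ordered hardware or with ThreadSanitizer. x86's strong hardware ordering masks many acquire/release bugs, although compiler reordering can still expose them; ARM and POWER expose them far more readily. ThreadSanitizer reports the resulting data races on any platform it supports, provided the racy path actually runs under test.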
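
The acq_rel advice is easiest to see in a reference-count decrement, where the last owner destroys the object. This is a minimal sketch assuming a hand-rolled count; Shared, release_ref, drop_all_refs, and the destroyed counter are all hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

struct Shared {                 // hypothetical ref-counted object
    std::atomic<int> refs;
    int payload;
    Shared(int r, int p) : refs(r), payload(p) {}
};

std::atomic<int> destroyed{0};  // counts deletions, for demonstration only

void release_ref(Shared* s) {
    // Release half: publishes this thread's earlier accesses to *s.
    // Acquire half: lets the last owner see every other owner's accesses
    // before it deletes the object.
    if (s->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete s;
        destroyed.fetch_add(1, std::memory_order_relaxed);
    }
}

// Creates one object with 'owners' references and drops each reference
// on its own thread; returns how many times the object was deleted.
int drop_all_refs(int owners) {
    destroyed.store(0, std::memory_order_relaxed);
    Shared* s = new Shared(owners, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < owners; ++i)
        threads.emplace_back(release_ref, s);
    for (auto& t : threads) t.join();
    return destroyed.load(std::memory_order_relaxed);
}
```

Exactly one thread observes the count drop from 1 to 0, so drop_all_refs(8) returns 1. The deletion counter itself is relaxed because it guards no other memory.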

Common Pitfalls

Relaxed flag guarding non-atomic data. The store being atomic does not make guarded data visible. You need at least release on the write side and acquire on the read side.

Mixing seq_cst with weaker orderings on the same variable. The total-order guarantee applies only among seq_cst operations. A memory_order_release store is not part of the seq_cst total order, even if the load on the other side uses seq_cst.

Illegal memory order on compare_exchange failure. The failure order cannot be memory_order_release or memory_order_acq_rel, because a failed compare-exchange performs only a load. Before C++17 it also could not be stronger than the success order; C++17 removed that second restriction:

cpp
// Wrong: release on failure is undefined behavior
x.compare_exchange_strong(expected, desired,
    std::memory_order_acq_rel,
    std::memory_order_release);  // UB

// Correct
x.compare_exchange_strong(expected, desired,
    std::memory_order_acq_rel,
    std::memory_order_acquire);  // OK: acquire is a valid order for the failed load

Assuming lock-free equals fast. On architectures that require full memory barriers for seq_cst, a single atomic increment can be 10–50× slower than a non-atomic one under contention. Measure before replacing a mutex with hand-rolled atomics; a well-implemented mutex often wins at low thread counts.
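
Measuring can start with something as small as the sketch below, which times the same increment workload with a seq_cst atomic and with a mutex-protected plain counter. The names time_threads, Comparison, and compare_counters are hypothetical, and the timings it reports are only meaningful on the machine and load where they were taken.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Runs 'body' on 'thread_count' threads and returns elapsed milliseconds.
template <typename Body>
double time_threads(int thread_count, Body body) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) threads.emplace_back(body);
    for (auto& th : threads) th.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

struct Comparison {  // hypothetical result type
    long atomic_total;
    long mutex_total;
    double atomic_ms;
    double mutex_ms;
};

Comparison compare_counters(int thread_count, int per_thread) {
    std::atomic<long> atomic_counter{0};
    long plain_counter = 0;  // protected by 'mu'
    std::mutex mu;
    Comparison c{};
    c.atomic_ms = time_threads(thread_count, [&] {
        for (int i = 0; i < per_thread; ++i)
            atomic_counter.fetch_add(1);  // seq_cst by default
    });
    c.mutex_ms = time_threads(thread_count, [&] {
        for (int i = 0; i < per_thread; ++i) {
            std::lock_guard<std::mutex> guard(mu);
            ++plain_counter;
        }
    });
    c.atomic_total = atomic_counter.load();
    c.mutex_total = plain_counter;  // all threads joined; safe to read
    return c;
}
```

Both totals should equal thread_count * per_thread; which of the two timings is lower depends on contention, core count, and the platform, which is the reason to measure rather than assume.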

See Also