What is std::memory_order in C++?

std::memory_order controls how memory accesses around atomic operations are ordered across threads; it is the backbone of C++'s lock-free concurrency model.

Which C++ standard introduced std::memory_order?

std::memory_order was introduced in C++11.

What is the difficulty level of std::memory_order?

std::memory_order is considered Expert-level C++ material.

std::memory_order

std::memory_order (since C++11)

An enumeration that specifies the visibility and ordering constraints applied to memory operations surrounding an atomic load, store, or read-modify-write, governing how those operations synchronize between threads.

Overview

Modern CPUs and compilers reorder memory accesses for performance — a load issued before a store in source code may execute after it at runtime. On a single thread this is invisible, but across threads the effect is observable and catastrophic in lock-free code. std::memory_order gives precise control over which reorderings are forbidden around each atomic operation.

The six values run roughly from least to most restrictive:

Value	Guarantees
`memory_order_relaxed`	Atomicity only; no ordering constraints
`memory_order_consume`	Data-dependency ordering on the consuming load
`memory_order_acquire`	No subsequent loads/stores move before this load
`memory_order_release`	No preceding loads/stores move after this store
`memory_order_acq_rel`	Acquire + release semantics on a single RMW
`memory_order_seq_cst`	Total order across all seq_cst operations (default)

In C++20, std::memory_order became a scoped enum. The unscoped names (memory_order_relaxed, etc.) remain as inline constexpr aliases, so existing code continues to compile unchanged. New code may use std::memory_order::relaxed, std::memory_order::acquire, etc.

Synchronizes-with

The central relationship is synchronizes-with: a release store on thread A synchronizes-with an acquire load on thread B that reads the stored value. Everything thread A wrote before the release is guaranteed visible to thread B after the acquire. This is how data ownership transfers between threads without a mutex.

Syntax

cpp

#include <atomic>

// C++11: unscoped enum (names remain valid in all later standards)
enum memory_order {
    memory_order_relaxed,
    memory_order_consume,
    memory_order_acquire,
    memory_order_release,
    memory_order_acq_rel,
    memory_order_seq_cst
};

// C++20: scoped enum; unscoped names become inline constexpr aliases
namespace std {
    enum class memory_order : /* unspecified */ {
        relaxed, consume, acquire, release, acq_rel, seq_cst
    };
    inline constexpr memory_order memory_order_relaxed = memory_order::relaxed;
    // … and so on for the remaining five values
}

Atomic member functions accept the order as an optional trailing parameter; when it is omitted, the order defaults to seq_cst:

cpp

#include <atomic>

int main() {
    std::atomic<int> x{0};

    x.store(42, std::memory_order_release);
    int v = x.load(std::memory_order_acquire);           // v == 42
    int old = x.exchange(1, std::memory_order_acq_rel);  // old == 42, x == 1

    // compare_exchange takes separate success and failure orders.
    // Here x (1) != old (42), so the call fails and copies 1 into old.
    bool ok = x.compare_exchange_strong(
        old, 2,
        std::memory_order_release,   // applied on success
        std::memory_order_relaxed);  // applied on failure
    return (v == 42 && !ok) ? 0 : 1;
}

Examples

Relaxed — independent counter

memory_order_relaxed is correct when you need only atomicity, not ordering. A hit counter accumulated from many threads and read once at shutdown is the canonical case:

cpp

#include <atomic>
#include <thread>
#include <vector>

std::atomic<unsigned long long> total_hits{0};  // C++11

void worker(int n) {
    for (int i = 0; i < n; ++i)
        total_hits.fetch_add(1, std::memory_order_relaxed);
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i)
        threads.emplace_back(worker, 1'000'000);
    for (auto& t : threads) t.join();
    // One final read; seq_cst (the default) is fine here
    return static_cast<int>(total_hits.load());
}

Release / Acquire — producer-consumer handoff

The most pervasive pattern in lock-free code: one thread publishes data and signals readiness; another consumes it.

cpp

#include <atomic>
#include <cassert>
#include <thread>

struct Payload { int x, y; };

Payload data{};                    // non-atomic shared data
std::atomic<bool> ready{false};   // C++11

void producer() {
    data = {42, 99};
    // The release ensures the 'data' write is visible before 'ready' is seen true.
    ready.store(true, std::memory_order_release);
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))
        ;
    // Synchronizes-with the release store: data is fully visible here.
    assert(data.x == 42 && data.y == 99);
}

int main() {
    std::thread t1{producer}, t2{consumer};
    t1.join(); t2.join();
}

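Replacing either order with memory_order_relaxed breaks the guarantee. Without the synchronizes-with edge, the accesses to data form a data race, which is undefined behavior; in practice the compiler or CPU may move the data write after the flag store, so the consumer can read stale or partially written values.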

Acquire-release — spinlock

cpp

#include <atomic>

class Spinlock {
    // C++11–17: ATOMIC_FLAG_INIT required; C++20: default constructor clears the flag
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() noexcept {
        while (flag_.test_and_set(std::memory_order_acquire))
            ;
    }
    void unlock() noexcept {
        flag_.clear(std::memory_order_release);
    }
};

The acquire on test_and_set pairs with the release on clear: every thread that acquires the lock sees all writes made by the previous lock holder before they released it.
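
As a usage sketch, the spinlock can guard a plain, non-atomic counter. Because the class exposes lock() and unlock(), it works with std::lock_guard. The helper name count_with_spinlock and the thread counts are illustrative assumptions, not part of the original example; the Spinlock class is repeated only so the snippet compiles on its own.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>    // std::lock_guard
#include <thread>
#include <vector>

// Same spinlock as above, repeated so this sketch is self-contained.
class Spinlock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() noexcept {
        while (flag_.test_and_set(std::memory_order_acquire))
            ;  // spin until the previous holder clears the flag
    }
    void unlock() noexcept {
        flag_.clear(std::memory_order_release);
    }
};

// Hypothetical helper: several threads increment a plain long under the lock.
long count_with_spinlock(int threads, int per_thread) {
    Spinlock lock;
    long counter = 0;  // non-atomic; protected only by the spinlock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<Spinlock> guard(lock);  // calls lock() / unlock()
                ++counter;
            }
        });
    for (auto& th : pool) th.join();
    assert(counter == static_cast<long>(threads) * per_thread);
    return counter;  // join() makes every increment visible to this thread
}
```

If the acquire or release were weakened to relaxed, the increments of counter would no longer be ordered between lock holders and the total could come up short.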

Sequential consistency — enforcing a global order

memory_order_seq_cst is the default for every std::atomic member function and the only ordering used by the overloaded operators (++, +=, etc.), which take no order argument. It imposes a single total order on all seq_cst operations across all threads. Among the six orderings, only seq_cst rules out the "store buffering" outcome shown here, in which each thread's load misses the other thread's store:

cpp

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};  // C++11
int r1 = -1, r2 = -1;

void t1() {
    x.store(1, std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_seq_cst);
}

void t2() {
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
}

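// With seq_cst: r1 == 0 && r2 == 0 is impossible.
// One store precedes the other in the total order, so the load that
// follows the later store must observe the earlier one.
int main() {
    std::thread a{t1}, b{t2};
    a.join(); b.join();
    assert(r1 == 1 || r2 == 1);  // at least one load sees the other thread's store
}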

With acquire/release alone, r1 == 0 && r2 == 0 is a permitted outcome. It is observable even on x86, not only on ARM or POWER: each core may hold its store in a store buffer while the following load of the other variable executes, so both loads can read 0.

`memory_order_consume` — avoid in practice

memory_order_consume is theoretically weaker than acquire: it orders only operations that carry a data dependency on the loaded value. In practice, major compilers do not track those dependencies and simply treat consume as acquire, which the standard permits. Since C++17 the standard has discouraged its use while the specification is revised. Avoid it in new code unless targeting an embedded platform with guaranteed dependency ordering and a compiler that explicitly supports it.

Best Practices

Start with seq_cst. It is easiest to reason about. Profile under realistic contention before weakening anything.
Use release/acquire for ownership transfer. This is the right tool whenever one thread finishes writing data and another thread needs to read it.
Reserve relaxed for genuinely independent values. Counters, statistics, flags that are set-only and never guard other memory. Never use it to protect access to non-atomic shared state.
Use acq_rel on RMW operations mid-chain. When a fetch_add or compare_exchange sits between a read and a write of associated data, it must do both jobs: acquire the writes other threads published before it, and release this thread's own earlier writes to whoever reads the result next.
Validate on weakly-ordered hardware or with ThreadSanitizer. x86's strong memory model masks many acquire/release bugs; ARM and POWER expose them. ThreadSanitizer reports data races at run time regardless of the host architecture, though only on the code paths a test actually exercises.
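
One common place where acq_rel on a read-modify-write is needed is dropping a reference count. The sketch below is illustrative only: Shared, acquire_ref, release_ref, demo_refcount, and the destroyed counter are hypothetical names introduced for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical intrusively reference-counted object.
struct Shared {
    std::atomic<int> refs{1};  // the creator holds the first reference
    int payload = 0;           // non-atomic data owned by the object
};

std::atomic<int> destroyed{0};  // counts deletions, for demonstration only

void acquire_ref(Shared* p) {
    // Taking a reference publishes nothing, so relaxed is enough here.
    p->refs.fetch_add(1, std::memory_order_relaxed);
}

void release_ref(Shared* p) {
    // release: publishes this thread's earlier writes to *p.
    // acquire: lets the last owner see every other owner's writes before deleting.
    if (p->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete p;
        destroyed.fetch_add(1, std::memory_order_relaxed);
    }
}

// Hands one reference to each worker thread, then drops the creator's own.
int demo_refcount(int threads) {
    destroyed.store(0);
    auto* obj = new Shared;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        acquire_ref(obj);  // taken while the creator still holds its reference
        pool.emplace_back([obj] { release_ref(obj); });
    }
    release_ref(obj);
    for (auto& th : pool) th.join();
    assert(destroyed.load() == 1);  // exactly one thread saw the count reach zero
    return destroyed.load();
}
```

Whichever thread performs the final decrement deletes the object, and the acq_rel ordering is what makes it safe for that thread to run the destructor over data other threads last touched.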

Common Pitfalls

Relaxed flag guarding non-atomic data. The store being atomic does not make guarded data visible. You need at least release on the write side and acquire on the read side.

Mixing seq_cst with weaker orderings on the same variable. The total-order guarantee applies only among seq_cst operations. A memory_order_release store is not part of the seq_cst total order, even if the load on the other side uses seq_cst.

Illegal memory order on compare_exchange failure. The failure order cannot be memory_order_release or memory_order_acq_rel, because a failed compare-exchange performs only a load. Before C++17 it also could not be stronger than the success order; C++17 removed that restriction:

cpp

// Wrong: release on failure is undefined behavior
x.compare_exchange_strong(expected, desired,
    std::memory_order_acq_rel,
    std::memory_order_release);  // UB

// Correct
x.compare_exchange_strong(expected, desired,
    std::memory_order_acq_rel,
    std::memory_order_acquire);  // OK: acquire is a valid load-only failure order
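
A typical place where both orders appear is a compare-exchange retry loop. The following is a minimal sketch of an "atomic maximum" under that pattern; raise_to_at_least is a hypothetical helper name, and the choice of acq_rel/acquire is an assumption that the value guards other data (a pure statistic could use relaxed for both).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically raises `target` to at least `value`
// and returns the value that was stored beforehand.
int raise_to_at_least(std::atomic<int>& target, int value) {
    int observed = target.load(std::memory_order_relaxed);
    while (observed < value &&
           !target.compare_exchange_weak(
               observed, value,
               std::memory_order_acq_rel,     // success: a full read-modify-write
               std::memory_order_acquire)) {  // failure: load only, so no release
        // On failure (including spurious failure), `observed` has been
        // refreshed with the current value and the loop re-checks it.
    }
    return observed;
}
```

compare_exchange_weak is used because the call already sits in a loop, so a spurious failure only costs one extra iteration.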

Assuming lock-free equals fast. On architectures that require full memory barriers for seq_cst, a single atomic increment can be 10–50× slower than a non-atomic one under contention. Measure before replacing a mutex with hand-rolled atomics — a well-implemented mutex often wins at low thread counts.
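
One way to measure rather than guess is a small timing harness along these lines. It is a rough sketch, not a rigorous benchmark: time_threads, compare_counters, and the Result struct are hypothetical names, and the numbers it produces depend heavily on the hardware, thread count, and optimization flags.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Runs `body` on `threads` threads and returns elapsed wall-clock milliseconds.
template <class Body>
double time_threads(int threads, Body body) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(body);
    for (auto& th : pool) th.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

struct Result {
    long long atomic_total, mutex_total;
    double atomic_ms, mutex_ms;
};

// Increments one counter with a relaxed atomic and another under a mutex.
Result compare_counters(int threads, int per_thread) {
    std::atomic<long long> atomic_counter{0};
    long long plain_counter = 0;  // protected by `guard_mutex`
    std::mutex guard_mutex;
    Result r{};
    r.atomic_ms = time_threads(threads, [&] {
        for (int i = 0; i < per_thread; ++i)
            atomic_counter.fetch_add(1, std::memory_order_relaxed);
    });
    r.mutex_ms = time_threads(threads, [&] {
        for (int i = 0; i < per_thread; ++i) {
            std::lock_guard<std::mutex> lock(guard_mutex);
            ++plain_counter;
        }
    });
    r.atomic_total = atomic_counter.load();
    r.mutex_total = plain_counter;
    assert(r.atomic_total == r.mutex_total);  // both variants count every increment
    return r;
}
```

Comparing atomic_ms and mutex_ms across several thread counts on the target machine gives a first indication of whether the lock-free version is actually worth its complexity.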