Thread-Safe Singleton

Guaranteeing exactly-once construction of a singleton across concurrent threads using function-local statics, std::call_once, or atomic DCLP.

A singleton initialization strategy that guarantees exactly one instance is constructed regardless of how many threads concurrently invoke the accessor, relying on C++11's memory model rather than platform-specific primitives or ad-hoc locking.

Overview

Prior to C++11, the C++ standard said nothing about threads. "Thread-safe initialization" was entirely the programmer's problem, and every common approach had a crack in it. A raw static T* p = nullptr that is checked before taking a mutex around the first assignment is race-prone: the unsynchronized load of p outside the lock is a data race in C++11's formal model and was exploitable on weakly-ordered CPUs even before that. The naive double-checked locking pattern (DCLP), which checks the pointer before and after locking, often compiled to working assembly on x86 by accident, but it was broken on any architecture where stores could reorder, and compilers were free to move the pointer store ahead of the constructor's writes.

C++11 resolved this at two levels. It introduced a formal memory model with sequenced-before, synchronizes-with, and happens-before relations. It also mandated in [stmt.dcl] that if control enters the declaration of a function-local static concurrently while the variable is being initialized, the concurrent execution waits for the initialization to complete. That one rule makes the Meyers singleton correct without any user-level locking.

There are three credible approaches in modern C++:

  1. Function-local static (Meyers singleton) — zero locking on the fast path, simplest code.
  2. std::call_once + std::once_flag — explicit, heap-compatible, retry-on-exception.
  3. Double-Checked Locking with std::atomic — heap-allocated, precise memory-ordering control.

Approach 1: Function-Local Static

cpp
#include <string_view> // C++17

class Logger {
public:
    static Logger& instance() {
        static Logger inst; // C++11: initialized exactly once, even under concurrency
        return inst;
    }

    void log(std::string_view msg); // C++17

    Logger(const Logger&)            = delete;
    Logger& operator=(const Logger&) = delete;
    Logger(Logger&&)                 = delete;
    Logger& operator=(Logger&&)      = delete;

private:
    Logger();
};

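The compiler inserts a hidden guard variable and the synchronization needed to initialize the object safely. Under the Itanium C++ ABI, for example, the guard is a small object whose first byte is the "initialized" flag, managed through __cxa_guard_acquire and __cxa_guard_release. The fast path after initialization is typically a single load of the guard and a well-predicted branch, which costs a few cycles on mainstream hardware. Destruction is registered automatically, as if by atexit, so the object is destroyed in reverse order of construction relative to other objects with static storage duration.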

This is the right default. Use it whenever the concrete type is known at compile time and the instance never needs to be replaced or torn down explicitly.
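The once-only guarantee can be observed with a small experiment. The following is a minimal sketch, not part of the Logger example: Counter, constructions() and hammer_instance() are hypothetical names introduced here, and the constructor simply counts how often it runs.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical singleton used only for this demonstration.
class Counter {
public:
    static Counter& instance() {
        static Counter inst; // initialization is guarded by the compiler since C++11
        return inst;
    }

    // Number of times the constructor has run in this process.
    static int constructions() { return constructions_.load(); }

    Counter(const Counter&)            = delete;
    Counter& operator=(const Counter&) = delete;

private:
    Counter() { constructions_.fetch_add(1); }

    inline static std::atomic<int> constructions_{0}; // C++17 inline static
};

// Starts n threads that all call instance() and returns true if every
// thread received the same address.
inline bool hammer_instance(std::size_t n) {
    std::vector<Counter*> seen(n); // value-initialized to nullptr
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&seen, i] { seen[i] = &Counter::instance(); });
    }
    for (auto& t : threads) t.join();

    for (Counter* p : seen) {
        if (p != seen[0]) return false;
    }
    return true;
}
```

Calling hammer_instance(8) and then reading Counter::constructions() should report a single construction no matter how the threads interleave. A passing run only illustrates the rule; the guarantee itself comes from the language, not from the experiment.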

Lifetime caveat: If another static's destructor calls Logger::instance() after Logger has been destroyed, the behavior is undefined — the guard flag is not reset. If cross-singleton dependencies during teardown are a concern, see the heap-leak mitigation in Common Pitfalls.

Approach 2: std::call_once

std::call_once (C++11, <mutex>) provides once-initialization with a heap-allocated instance, which unlocks runtime-polymorphic singletons and explicit lifetime control:

cpp
#include <memory>  // C++11
#include <mutex>   // C++11

class ConnectionPool {
public:
    static ConnectionPool& instance() {
        std::call_once(init_flag_, &ConnectionPool::create);
        return *instance_;
    }

    ConnectionPool(const ConnectionPool&)            = delete;
    ConnectionPool& operator=(const ConnectionPool&) = delete;

private:
    static void create() {
        // The constructor is private, so std::make_unique cannot call it;
        // construct with new from inside the class instead.
        instance_.reset(new ConnectionPool(128));
    }

    explicit ConnectionPool(int max_connections);

    // C++17 inline static removes the need for out-of-class definitions
    inline static std::unique_ptr<ConnectionPool> instance_;
    inline static std::once_flag init_flag_;
};

std::once_flag is a one-shot event. Once the callable passed to call_once returns normally, the flag is set and every subsequent call on the same flag becomes a no-op — no lock is acquired. The synchronization guarantee is strong: the end of the active (returning) call synchronizes-with the return from every passive call on the same flag, so all threads observe the constructed object's state without additional barriers.

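Exception semantics match the Meyers approach, although this is often misremembered. If the callable throws, the flag is not set, the exception propagates to the caller, and the next call to call_once with the same flag retries initialization. This lets you recover from transient failures (e.g., a database pool that can't connect on the first attempt). A function-local static behaves the same way: per [stmt.dcl], if its initialization exits by throwing an exception, the initialization is not complete and is attempted again the next time control enters the declaration. std::terminate is the outcome only when the dynamic initialization of a namespace-scope static throws, because no handler can be active at that point.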
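The retry behavior of call_once described above can be sketched as follows. This is an illustration under stated assumptions: connect_flag, attempts, connect_once() and init_with_retry() are hypothetical names, and the "transient failure" is simulated by throwing on the first attempt only.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical state for the demonstration.
inline std::once_flag connect_flag;
inline int attempts = 0;

// Simulated initializer: fails on the first attempt, succeeds afterwards.
inline void connect_once() {
    ++attempts;
    if (attempts == 1) {
        throw std::runtime_error("transient failure");
    }
}

// Calls std::call_once up to max_tries times. Returns the number of times
// the initializer actually ran, or -1 if every try threw.
inline int init_with_retry(int max_tries) {
    for (int i = 0; i < max_tries; ++i) {
        try {
            std::call_once(connect_flag, connect_once);
            return attempts;
        } catch (const std::runtime_error&) {
            // The flag is still unset, so the next call_once runs the
            // initializer again.
        }
    }
    return -1;
}
```

With this setup the initializer runs twice: once for the failed attempt and once for the successful one. Any later call finds the flag set and does not run it again.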

Prefer call_once over the Meyers singleton when:

  • The concrete type is determined at runtime (factory pattern, configuration-driven).
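  • You need to replace the instance in tests (by re-creating the once_flag and pointer together, since a once_flag itself cannot be reset).
  • Initialization is an arbitrary callable, such as a factory function or a multi-step setup, rather than a single constructor call.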
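The first case in the list, a concrete type chosen at runtime, might look like the sketch below. Transport, PlainTransport, TlsTransport and the use_tls parameter are hypothetical and stand in for whatever configuration drives the choice in a real program.

```cpp
#include <memory>
#include <mutex>
#include <string>

// Hypothetical interface, for illustration only.
class Transport {
public:
    virtual ~Transport() = default;
    virtual std::string name() const = 0;

    // Returns the process-wide transport. The concrete type is picked by
    // whichever value of use_tls reaches the first call; later values
    // are ignored because the flag is already set.
    static Transport& instance(bool use_tls);

private:
    static std::unique_ptr<Transport> instance_;
    static std::once_flag init_flag_;
};

// C++17 inline variables, defined once the class is complete.
inline std::unique_ptr<Transport> Transport::instance_;
inline std::once_flag Transport::init_flag_;

class PlainTransport final : public Transport {
public:
    std::string name() const override { return "plain"; }
};

class TlsTransport final : public Transport {
public:
    std::string name() const override { return "tls"; }
};

inline Transport& Transport::instance(bool use_tls) {
    std::call_once(init_flag_, [use_tls] {
        if (use_tls) {
            instance_ = std::make_unique<TlsTransport>();
        } else {
            instance_ = std::make_unique<PlainTransport>();
        }
    });
    return *instance_;
}
```

A function-local static cannot express this directly, because its type is fixed at compile time. Passing the selector through the accessor is only one option; reading it from configuration inside the callable would work equally well.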

Approach 3: Double-Checked Locking with std::atomic

DCLP is correct in C++11 when implemented with std::atomic and the appropriate memory orders. It is rarely the best choice today — reach for it only when profiling has identified call_once's internal bookkeeping as a bottleneck on a genuinely hot path.

cpp
#include <atomic>  // C++11
#include <mutex>   // C++11

class MetricsRegistry {
public:
    static MetricsRegistry* instance() {
        // Fast path: acquire-load pairs with the release-store after construction.
        MetricsRegistry* p = instance_.load(std::memory_order_acquire);
        if (!p) {
            std::lock_guard<std::mutex> lock(mutex_); // C++11
            // Re-read under the lock: another thread may have won the race.
            p = instance_.load(std::memory_order_relaxed);
            if (!p) {
                p = new MetricsRegistry();
                instance_.store(p, std::memory_order_release);
            }
        }
        return p;
    }

private:
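    // User-provided rather than "= default": before C++20 a defaulted private
    // constructor leaves this memberless class an aggregate, so
    // MetricsRegistry{} would still compile outside the class.
    MetricsRegistry() {}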

    inline static std::atomic<MetricsRegistry*> instance_{nullptr}; // C++17
    inline static std::mutex mutex_;
};

The acquire-load / release-store pair is load-bearing. The release-store ensures that the write to instance_ cannot be observed by any thread before the constructor body has finished executing. The acquire-load ensures that any thread reading a non-null pointer also sees all memory writes performed before that release-store. Relaxing either order breaks the guarantee on ARM, POWER, or any other weakly-ordered architecture.
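To see the pairing at work, the sketch below applies the same pattern to a hypothetical Registry type with one plain (non-atomic) field, and has several threads read that field through instance(). Registry, magic() and count_consistent_readers() are names invented for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical type following the same DCLP structure as MetricsRegistry.
class Registry {
public:
    static Registry* instance() {
        Registry* p = instance_.load(std::memory_order_acquire);
        if (!p) {
            std::lock_guard<std::mutex> lock(mutex_);
            p = instance_.load(std::memory_order_relaxed);
            if (!p) {
                p = new Registry();
                instance_.store(p, std::memory_order_release);
            }
        }
        return p;
    }

    int magic() const { return magic_; }

private:
    Registry() : magic_(42) {}

    int magic_; // plain field, published to readers by the release-store

    inline static std::atomic<Registry*> instance_{nullptr};
    inline static std::mutex mutex_;
};

// Starts n threads that read magic() through instance() and returns how
// many of them observed the value written by the constructor.
inline std::size_t count_consistent_readers(std::size_t n) {
    std::vector<int> observed(n);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&observed, i] {
            observed[i] = Registry::instance()->magic();
        });
    }
    for (auto& t : threads) t.join();

    std::size_t ok = 0;
    for (int v : observed) {
        if (v == 42) ++ok;
    }
    return ok;
}
```

Every reader should observe 42. Note that a passing run does not prove the memory orders are right, since a weakened version may also pass on x86; a race detector such as ThreadSanitizer (-fsanitize=thread in GCC and Clang) is a better check.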

Why DCLP failed before C++11: Without a formal memory model, nothing prevented compilers from reordering the pointer store before the constructor body. A racing thread would read a non-null pointer and dereference a partially-constructed object. volatile was never the answer — it disables certain compiler optimizations but provides zero inter-thread ordering guarantees and is explicitly not a synchronization mechanism in the C++ standard.

Best Practices

Default to the Meyers singleton. It is correct by construction in C++11 and later, requires no auxiliary types, and has the lowest overhead after initialization. Its only real constraint is that the type must be known statically.

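Delete the copy and move operations. Delete the copy constructor, copy-assignment, move constructor, and move-assignment operators. Declaring the copy operations, even as deleted, already suppresses the implicit move operations, so deleting all four is not strictly required. Spelling them out documents the intent and keeps the class non-movable if the copy declarations are later edited.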
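These properties can be checked at compile time. The sketch below uses a hypothetical Settings singleton and the standard type traits; the static_asserts fail to compile if any of the operations becomes available again.

```cpp
#include <type_traits>

// Hypothetical singleton with the copy and move operations deleted.
class Settings {
public:
    static Settings& instance() {
        static Settings inst;
        return inst;
    }

    Settings(const Settings&)            = delete;
    Settings& operator=(const Settings&) = delete;
    Settings(Settings&&)                 = delete;
    Settings& operator=(Settings&&)      = delete;

private:
    Settings() {} // private and user-provided
};

// Compile-time checks: the type can be neither copied nor moved, and
// code outside the class cannot default-construct it.
static_assert(!std::is_copy_constructible_v<Settings>);
static_assert(!std::is_move_constructible_v<Settings>);
static_assert(!std::is_copy_assignable_v<Settings>);
static_assert(!std::is_move_assignable_v<Settings>);
static_assert(!std::is_default_constructible_v<Settings>);
```

Keeping such assertions next to the class is cheap and turns an accidental edit into a build error.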

Keep the constructor private. This is the invariant that enforces single-instance semantics at compile time, not at runtime.

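Avoid singletons that depend on each other during teardown. Objects with static storage duration are destroyed in the reverse order in which their construction completed. For function-local statics that order is decided at run time by which accessor happens to be called first, and for namespace-scope statics in different translation units the initialization order is unspecified, so the teardown order is easy to get wrong. If A::~A() calls B::instance() and B has already been destroyed, the behavior is undefined.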

Use std::call_once in test harnesses. In production code the flag is set once per process. A once_flag cannot be reset, so in tests keep the once_flag and the unique_ptr together in an object that each test fixture constructs afresh, rather than fighting the static lifetime.
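One way to arrange this is a small holder that owns the flag and the instance together. The sketch below is an assumption about how such a mechanism could look, not a prescribed design; LazyInstance and FakePool are hypothetical names.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical helper: owns the once_flag and the instance together, so a
// test fixture can create a fresh pair instead of resetting a static flag.
template <class T>
class LazyInstance {
public:
    // Constructs T from args on the first call; later calls ignore their
    // arguments and return the existing object.
    template <class... Args>
    T& get(Args&&... args) {
        std::call_once(flag_, [&] {
            ptr_ = std::make_unique<T>(std::forward<Args>(args)...);
        });
        return *ptr_;
    }

private:
    std::once_flag flag_;
    std::unique_ptr<T> ptr_;
};

// Hypothetical dependency under test.
struct FakePool {
    explicit FakePool(int max) : max_connections(max) {}
    int max_connections;
};
```

Production code would keep a single LazyInstance with static storage duration behind the accessor, while each test fixture declares its own LazyInstance<FakePool> member and therefore starts from an unset flag.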

Common Pitfalls

Assuming C++03 safety. The function-local static thread-safety guarantee is a C++11 language rule. Code compiled as C++03 — or with a compiler that predates the rule — can race on the guard variable. Audit any pre-C++11 codebase carefully.

Using volatile as a synchronization primitive. volatile prevents the compiler from caching a memory location in a register; it has no effect on CPU reordering and establishes no happens-before relationship. DCLP with volatile T* instead of std::atomic<T*> is a data race under the C++11 memory model.

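Loading with memory_order_relaxed on the fast path of DCLP. The outer load must use memory_order_acquire. A relaxed load that returns a non-null pointer does not synchronize with the release-store, so nothing orders the constructor's writes before the reader's accesses through that pointer: the reader can see the published pointer and still read the object's fields as they were before construction.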

Reusing a once_flag for independent initializations. A once_flag is a one-shot event that cannot be reset. Using the same flag for two separate initialization tasks silently skips the second one after the first succeeds.
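The skipped initialization is easy to reproduce. In this sketch, Subsystems and the two functions are hypothetical; the flags are local variables only to keep the example self-contained.

```cpp
#include <mutex>

// Hypothetical record of which initializations actually ran.
struct Subsystems {
    bool cache_ready  = false;
    bool logger_ready = false;
};

// Pitfall: both initializations share one flag, so the second is skipped.
inline Subsystems init_with_shared_flag() {
    Subsystems s;
    std::once_flag shared;
    std::call_once(shared, [&s] { s.cache_ready = true; });
    std::call_once(shared, [&s] { s.logger_ready = true; }); // never runs
    return s;
}

// Fix: one flag per independent initialization.
inline Subsystems init_with_separate_flags() {
    Subsystems s;
    std::once_flag cache_flag;
    std::once_flag logger_flag;
    std::call_once(cache_flag,  [&s] { s.cache_ready = true; });
    std::call_once(logger_flag, [&s] { s.logger_ready = true; });
    return s;
}
```

The shared-flag version returns with logger_ready still false and reports no error, which is what makes the mistake hard to spot in review.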

The dead-reference problem. If destruction-time code (e.g., another static's destructor or a thread still running during shutdown) calls instance() after the Meyers singleton has been destroyed, the guard flag is not reset and the returned reference refers to a destroyed object. Alexandrescu's "Phoenix singleton" addresses this by re-creating the instance on demand. A simpler and more common mitigation for singletons with complex teardown dependencies is to new the instance and never delete it. The OS reclaims the memory at process exit and the destructor never runs, which rules out use-after-destroy but also means any cleanup in the destructor (flushing buffers, closing connections) is skipped. This is a deliberate and documented tradeoff, not a leak in the pejorative sense.
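A minimal sketch of the never-deleted variant, assuming a hypothetical AuditLog type with its real members omitted:

```cpp
// Hypothetical type, for illustration only.
class AuditLog {
public:
    static AuditLog& instance() {
        // The pointer is itself a function-local static, so its
        // initialization is still once-only and thread-safe. The pointee
        // is never deleted, so ~AuditLog never runs and late callers
        // cannot observe a destroyed object.
        static AuditLog* const inst = new AuditLog();
        return *inst;
    }

    AuditLog(const AuditLog&)            = delete;
    AuditLog& operator=(const AuditLog&) = delete;

private:
    AuditLog() {}
    ~AuditLog() = default; // private: code outside the class cannot delete it
};
```

Because the object stays reachable through the static pointer, leak checkers that distinguish reachable from lost memory generally do not report it as a leak.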

See Also