What is Semaphore, Latch, and Barrier (C++20) in C++?

C++20 counting_semaphore, binary_semaphore, latch, and barrier — efficient thread synchronization primitives with defined happens-before semantics.

Which C++ standard introduced Semaphore, Latch, and Barrier (C++20)?

Semaphore, Latch, and Barrier (C++20) was introduced in C++20.

What is the difficulty level of Semaphore, Latch, and Barrier (C++20)?

Semaphore, Latch, and Barrier (C++20) is considered Intermediate-level C++ material.

Semaphore, Latch, and Barrier (C++20)

std::latch / std::barrier / std::counting_semaphore (since C++20)

Three lightweight synchronization primitives: std::latch (one-shot countdown), std::barrier (reusable phase rendezvous), and std::counting_semaphore (bounded permit pool). Each expresses its coordination pattern more directly than a hand-written std::mutex + std::condition_variable combination, and implementations can often make it cheaper as well.

Overview

C++20 introduces three purpose-built synchronization primitives that cover common coordination patterns previously requiring manual mutex/condvar combinations. Each has a distinct behavioral contract:

std::latch (<latch>): a countdown counter that reaches zero exactly once. All threads blocking on wait() unblock permanently when the count hits zero. Non-reusable.
std::barrier (<barrier>): a reusable rendezvous for a fixed set of threads. After every participant arrives, an optional completion callback runs in one of the arriving threads, then all participants are released for the next phase.
std::counting_semaphore / std::binary_semaphore (<semaphore>): a permit counter. Threads block when the count is zero and proceed after acquiring a permit. Unlike std::mutex, semaphores carry no ownership — any thread may call release().

All three guarantee happens-before semantics: side effects visible before the signal operation (count_down, arrive, release) are visible to threads that observe the signal (wait, acquire).

Syntax

cpp

#include <latch>
#include <barrier>
#include <semaphore>

// --- std::latch (C++20) ---
std::latch gate{N};
gate.count_down(n = 1);        // decrement by n (does not block)
gate.arrive_and_wait(n = 1);   // count_down(n) then block until count == 0
gate.wait();                   // block until count == 0
bool done = gate.try_wait();   // non-blocking: returns true if count == 0

// --- std::barrier (C++20) ---
std::barrier sync{N};                          // N participants, no callback
std::barrier sync{N, []() noexcept { ... }};  // completion callback — MUST be noexcept

auto token = sync.arrive();       // decrement arrival count; return opaque phase token
sync.wait(std::move(token));      // block until the phase corresponding to token ends
sync.arrive_and_wait();           // arrive() + wait() combined (most common)
sync.arrive_and_drop();           // arrive, then permanently remove self from future phases

// --- std::counting_semaphore<LeastMaxValue> (C++20) ---
std::counting_semaphore<16> sem{initial};  // template arg is a lower-bound hint on max value
std::binary_semaphore flag{0};             // alias for counting_semaphore<1>

sem.acquire();                            // block until count > 0, then decrement
sem.release(n = 1);                       // increment by n, unblock waiters
bool ok = sem.try_acquire();             // non-blocking; returns false if count == 0
bool ok = sem.try_acquire_for(dur);      // timed variant
bool ok = sem.try_acquire_until(tp);     // timed variant

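The LeastMaxValue template parameter on counting_semaphore is a lower bound on the maximum count the semaphore must support. The implementation may allow larger values (the actual ceiling is reported by the static member function max()) and may choose a more compact representation when the bound is small. std::binary_semaphore, with a bound of 1, gives the implementation the most room for such optimization.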

Examples

One-shot startup gate with `std::latch`

Workers initialize themselves and signal readiness; the main thread proceeds only after all are ready.

cpp

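#include <latch>
#include <thread>
#include <vector>

// Application hooks assumed to be defined elsewhere.
void load_thread_local_state(int worker_id);
void run_event_loop(int worker_id);
void start_accepting_requests();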

void launch_workers(int n) {
    std::latch ready{n};
    std::vector<std::jthread> workers;
    workers.reserve(n);

    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&, i] {
            load_thread_local_state(i);  // expensive per-thread init
            ready.count_down();          // signal: I'm ready (does not block)
            run_event_loop(i);           // proceeds immediately
        });
    }

    ready.wait();  // main blocks until all n workers counted down
    start_accepting_requests();
}  // jthread destructors join

Everything a worker does before count_down() is visible to the main thread after ready.wait() returns — the happens-before guarantee makes this safe without additional synchronization.

Iterative parallel algorithm with `std::barrier`

The barrier completion callback is ideal for inter-phase state transitions: it runs exactly once after all threads arrive and before any thread proceeds.

cpp

#include <algorithm>
#include <barrier>
#include <thread>
#include <vector>

void jacobi_iterate(std::vector<double>& u, int rows, int cols, int iters) {
    std::vector<double> v(u);  // scratch buffer
    // hardware_concurrency() may return 0 when the value is unknown; use at least one thread
    const int n = static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));

    // Callback runs after all threads finish a sweep, so it is safe to swap buffers here
    std::barrier sync{n, [&]() noexcept { u.swap(v); }};

    auto worker = [&](int tid) {
        const int chunk = (rows - 2) / n;           // interior rows per thread
        const int lo = 1 + tid * chunk;             // skip boundary rows
        const int hi = (tid == n - 1) ? rows - 1    // last thread also takes the remainder
                                      : lo + chunk;

        for (int iter = 0; iter < iters; ++iter) {
            // Read from u (previous values), write to v (new values)
            for (int r = lo; r < hi; ++r)
                for (int c = 1; c < cols - 1; ++c)
                    v[r * cols + c] = 0.25 * (
                        u[(r-1)*cols+c] + u[(r+1)*cols+c] +
                        u[r*cols+c-1]  + u[r*cols+c+1]);

            sync.arrive_and_wait();
            // After this point: callback swapped u and v
            // u now holds the values just computed; next iter reads them
        }
    };

    std::vector<std::jthread> threads;
    for (int t = 0; t < n; ++t)
        threads.emplace_back(worker, t);
}

The completion callback strongly happens before arrive_and_wait() returns in any thread for that phase, so workers see the swapped buffers as soon as they resume.

Resource pool with `std::counting_semaphore`

cpp

#include <semaphore>
#include <utility>  // std::forward

// checkout_connection() is assumed to be declared elsewhere.

template<int MaxConns>
struct ConnectionPool {
    std::counting_semaphore<MaxConns> slots;

    explicit ConnectionPool(int initial) : slots{initial} {}

    template<typename F>
    auto with_connection(F&& fn) {
        slots.acquire();
        struct Guard {
            std::counting_semaphore<MaxConns>& s;
            ~Guard() { s.release(); }
        } g{slots};
        return std::forward<F>(fn)(checkout_connection());
    }
};

ConnectionPool<8> db_pool{3};  // at most 3 concurrent queries; storage hint: 8 max

Ping-pong signaling with `std::binary_semaphore`

cpp

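#include <semaphore>
#include <string>
#include <thread>

// data_source(), has_more(), and process() are assumed to be provided by the application.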

std::string shared_item;
std::binary_semaphore produced{0};  // consumer waits on this; starts unsignaled
std::binary_semaphore consumed{1};  // producer waits on this; starts signaled (slot free)

std::jthread producer([] {
    for (auto& item : data_source()) {
        consumed.acquire();  // wait for consumer to clear the slot
        shared_item = item;
        produced.release();  // notify consumer
    }
});

std::jthread consumer([] {
    while (has_more()) {
        produced.acquire();  // wait for producer
        process(shared_item);
        consumed.release();  // notify producer
    }
});

Two semaphores implement single-slot producer-consumer with back-pressure. No mutex, no condition variable, no spurious wakeups.

Best Practices

Use latch for one-shot startup, not for repeated synchronization. std::latch is non-reusable by design — once the count reaches zero it stays there permanently. For repeated synchronization between rounds, allocate a fresh latch per round or switch to barrier.

Put inter-phase state transitions in the barrier completion callback. It executes after all arrivals and before any thread proceeds, providing a sequenced window to swap buffers, advance phase counters, or update shared state — without additional locking or explicit coordination.

Wrap acquire()/release() pairs in RAII when a semaphore guards a resource. The standard provides no scope guard for semaphores, and a bare acquire()/release() pair leaks a permit if the code in between throws or returns early. Use a guard struct (as shown above) or a custom wrapper type.

Use arrive_and_drop() when a thread finishes early. In workloads where some threads complete their share before others, arrive_and_drop() permanently removes that thread from the expected count for all future phases. Without it, the remaining threads will block forever at the next barrier.

Prefer binary_semaphore over mutex for producer-consumer signaling. Semaphores allow release() from a different thread than acquire(), which mutexes do not. For one-directional notification (producer signals consumer), binary_semaphore is the right tool.

Common Pitfalls

Decrementing a latch below zero is undefined behavior. Calling count_down(n) when the internal count is less than n is UB. If multiple code paths can call count_down, guard against over-decrement — the standard provides no diagnostic.

Barrier completion callbacks must not throw. The completion function has to be nothrow-invocable, and an exception that escapes it ends in std::terminate. Wrap fallible logic in try/catch and stash the exception for later inspection.

Semaphores are not mutexes. Any thread can call release(), even one that never called acquire(). Calling release(n) when the resulting count would exceed max() is undefined behavior; max() is at least LeastMaxValue, but only the latter is portable. Semaphores provide no recursive-acquire, no priority-inversion protection, and no ownership tracking.

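Don't silently discard arrive()'s return value. The opaque arrival_token is the only way to later synchronize with that phase via wait(std::move(token)). Discarding it means the calling thread never synchronizes with phase completion, which is almost always a bug. If the thread should block until the phase ends, call arrive_and_wait(); if it is leaving the barrier for good, call arrive_and_drop(). If you really do want to arrive without waiting, discard the token explicitly and document the intent.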

std::latch and std::barrier are neither copyable nor movable. Manage them via reference, pointer, or std::shared_ptr. Attempting to pass or return by value will not compile.