Write Safe Concurrent C++ with Threads and Mutexes

By the end of this page, you will know how to launch multiple threads in C++11, share data between them without corruption, coordinate work using std::mutex and std::lock_guard, and recognize the two most dangerous mistakes beginners make: data races and deadlocks.

What and Why

Modern CPUs have many cores sitting idle while a single-threaded program runs. Concurrency lets you put those cores to work simultaneously — shortening total runtime, keeping a UI responsive while a network call is in flight, or processing a queue of tasks in parallel.

C++ gained a standard threading model in C++11. Before that, every platform had its own API (pthreads on Linux, Win32 threads on Windows). Today <thread>, <mutex>, and related headers give you a portable, well-defined model for concurrent execution.

The central promise, and the central danger, of concurrency is shared mutable state. Two threads reading the same variable is fine. If two threads access the same variable at the same time, at least one of them writes, and nothing synchronises the accesses, that is a data race, which is undefined behaviour in C++. Your job as the programmer is to protect every access to shared mutable data, reads as well as writes, so that no thread touches it while another is modifying it.

Step by Step

1. Launch a thread

std::thread takes any callable — a function, a lambda, a functor — and runs it concurrently.

cpp

#include <iostream>
#include <thread>

void greet(int id) {
    std::cout << "Hello from thread " << id << "\n";
}

int main() {
    std::thread t1(greet, 1);
    std::thread t2(greet, 2);

    t1.join();  // wait for t1 to finish
    t2.join();  // wait for t2 to finish
}

join() blocks the calling thread until the target thread completes. You must call either join() or detach() before a std::thread object is destroyed — failing to do so calls std::terminate().
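
The join-or-detach rule can be observed through joinable(). The sketch below is a minimal illustration, not part of the original example; the names JoinState and run_and_join are hypothetical.

```cpp
#include <thread>

// Hypothetical helper type: records what joinable() reported and what the
// worker thread computed.
struct JoinState {
    bool before_join;
    bool after_join;
    int  result;
};

JoinState run_and_join() {
    int result = 0;
    std::thread t([&result] { result = 42; });

    JoinState state{};
    state.before_join = t.joinable();  // true: t still owns a thread of execution
    t.join();                          // wait for the lambda to finish
    state.after_join = t.joinable();   // false: t can now be destroyed safely
    state.result = result;             // safe to read: join() has returned
    return state;
}
```

A std::thread that is still joinable when its destructor runs is what triggers std::terminate(); after join() or detach() the object no longer owns a thread.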

2. Understand the race

Share a counter between two threads with no protection and see what goes wrong:

cpp

#include <iostream>
#include <thread>

int counter = 0;  // shared, unprotected

void increment() {
    for (int i = 0; i < 100'000; ++i)
        ++counter;  // DATA RACE — undefined behaviour
}

int main() {
    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();
    std::cout << counter << "\n";  // will NOT reliably print 200000
}

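++counter looks like one operation but typically compiles to three: read the value, add one, write it back. When two threads interleave those steps, increments are silently lost. Because the race is undefined behaviour, the compiler is also free to transform the loop in ways you would not expect, so the printed value can vary between runs, compilers, and optimisation levels.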

3. Protect shared data with `std::mutex`

A mutex (mutual exclusion object) ensures only one thread executes a critical section at a time. std::lock_guard acquires the mutex on construction and releases it when it goes out of scope — no chance of forgetting to unlock.

cpp

#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;
std::mutex mtx;

void increment() {
    for (int i = 0; i < 100'000; ++i) {
        std::lock_guard<std::mutex> lock(mtx);
        ++counter;
    }
}

int main() {
    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();
    std::cout << counter << "\n";  // always 200000
}

In C++17 you can drop the template argument: std::lock_guard lock(mtx);. Either spelling works — choose whichever your compiler standard supports.

4. Prefer `std::unique_lock` when you need flexibility

lock_guard is simple and has minimal overhead, but it releases the mutex only when it is destroyed. std::unique_lock can be unlocked early and re-locked, and it is the lock type that std::condition_variable::wait requires (covered in /reference/library/concurrency/condition-variable).

cpp

#include <mutex>
#include <vector>

std::mutex mtx;
std::vector<int> shared_data;

void add_item(int value) {
    std::unique_lock<std::mutex> lock(mtx);
    shared_data.push_back(value);
    lock.unlock();          // release early: the work below does not need the lock
    // ...do something that does not need the lock...
}
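
Re-locking works the same way. The following sketch assumes a second access to the shared vector is needed after the unlocked work; data_mtx, items, and add_and_count are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

std::mutex data_mtx;      // hypothetical names for this sketch
std::vector<int> items;

// Appends value, does some unlocked work, then re-locks to append the result.
std::size_t add_and_count(int value) {
    std::unique_lock<std::mutex> lock(data_mtx);
    items.push_back(value);

    lock.unlock();                // other threads may use items now
    int squared = value * value;  // stand-in for work that needs no lock

    lock.lock();                  // re-acquire before touching items again
    items.push_back(squared);
    return items.size();          // the mutex is released when lock is destroyed
}
```

std::lock_guard has no unlock() or lock() members, so this pattern needs std::unique_lock.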

Common Patterns

Pattern 1 — Parallel work on independent data

When threads genuinely do not share state, concurrency is effortless. Partition the input so each thread owns a separate slice:

cpp

#include <cstddef>   // std::size_t
#include <thread>
#include <vector>

long long partial_sum(const std::vector<int>& v, std::size_t begin, std::size_t end) {
    long long total = 0;
    for (auto i = begin; i < end; ++i) total += v[i];
    return total;
}

int main() {
    std::vector<int> data(1'000'000, 1);
    long long result_a = 0, result_b = 0;

    std::thread t1([&]{ result_a = partial_sum(data, 0,       500'000); });
    std::thread t2([&]{ result_b = partial_sum(data, 500'000, 1'000'000); });

    t1.join();
    t2.join();

    long long total = result_a + result_b;  // safe: reads happen after join()
}

Each thread writes to its own variable (result_a, result_b). The final read of both variables happens after join(), which acts as a happens-before barrier — no mutex needed.
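
The same idea extends to any number of threads. This is a minimal sketch under the assumption that the input is split into equal chunks with the last worker taking the remainder; parallel_sum is a hypothetical name.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Sums v using num_threads workers; each worker writes only to its own slot.
long long parallel_sum(const std::vector<int>& v, std::size_t num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = v.size() / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        // The last worker also takes any leftover elements.
        const std::size_t end = (t + 1 == num_threads) ? v.size() : begin + chunk;
        workers.emplace_back([&v, &partial, t, begin, end] {
            long long local = 0;  // accumulate locally, write the slot once
            for (std::size_t i = begin; i < end; ++i) local += v[i];
            partial[t] = local;
        });
    }
    for (auto& w : workers) w.join();  // after join(), every slot is safe to read

    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

Accumulating into a local variable and writing the slot once also keeps threads from repeatedly writing to neighbouring elements of partial, which can otherwise cost performance on some hardware.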

Pattern 2 — A thread-safe task queue

A mutex-guarded queue is the backbone of many producer/consumer systems:

cpp

#include <mutex>
#include <optional>
#include <queue>

template<typename T>
class SafeQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mtx_);
        q_.push(std::move(value));
    }

    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (q_.empty()) return std::nullopt;
        T val = std::move(q_.front());
        q_.pop();
        return val;
    }

private:
    std::queue<T> q_;
    std::mutex    mtx_;
};
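
SafeQueue above relies on std::optional, which requires C++17. For older standards, one possible variant reports success through a bool instead. The sketch below shows that variant together with a small producer run; BasicSafeQueue and produce_and_drain are hypothetical names, not part of the original class.

```cpp
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical C++11-compatible variant: try_pop returns false when the
// queue is empty and otherwise moves the front element into out.
template <typename T>
class BasicSafeQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mtx_);
        q_.push(std::move(value));
    }

    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mtx_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

private:
    std::queue<T> q_;
    std::mutex    mtx_;
};

// Two producers push 1..per_producer each. The caller drains the queue after
// both have been joined and returns the sum of everything popped.
long long produce_and_drain(int per_producer) {
    BasicSafeQueue<int> queue;
    auto producer = [&queue, per_producer] {
        for (int i = 1; i <= per_producer; ++i) queue.push(i);
    };
    std::thread p1(producer);
    std::thread p2(producer);
    p1.join();
    p2.join();

    long long sum = 0;
    int value = 0;
    while (queue.try_pop(value)) sum += value;
    return sum;
}
```

Checking for emptiness and popping happen under one lock in try_pop. Separate empty() and pop() calls would leave a window in which another thread could take the last element.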

Pattern 3 — One-time initialisation with `std::call_once`

When expensive initialisation must happen exactly once regardless of how many threads race to trigger it, use std::once_flag and std::call_once:

cpp

#include <mutex>
#include <string>

std::once_flag config_flag;
std::string    config_value;

void ensure_config_loaded() {
    std::call_once(config_flag, []{
        config_value = "loaded_from_disk";  // runs exactly once
    });
}
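
To see the exactly-once guarantee with several threads racing, a sketch like the following counts how often the initialiser runs. The names init_flag, init_runs, and race_to_initialise are hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

std::once_flag init_flag;       // hypothetical names for this sketch
std::atomic<int> init_runs{0};  // counts how often the initialiser ran

void ensure_initialised() {
    std::call_once(init_flag, [] { ++init_runs; });
}

// Starts num_threads threads that all race to initialise, then returns how
// many times the initialiser has run in total.
int race_to_initialise(int num_threads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i)
        threads.emplace_back(ensure_initialised);
    for (auto& t : threads) t.join();
    return init_runs.load();
}
```

Threads that lose the race block inside std::call_once until the winning call has finished, so they never observe a half-initialised state.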

What Can Go Wrong

Mistake 1 — Forgetting `join()` or `detach()`

cpp

#include <thread>

void bad() {
    std::thread t([]{ /* work */ });
    // t goes out of scope here without join or detach → std::terminate()
}

Fix: use RAII. In C++20, std::jthread joins automatically on destruction (see /reference/library/concurrency/jthread). In C++11/14/17, join in a destructor or at the end of the owning scope.
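
For C++11/14/17, joining in a destructor might look like the minimal wrapper below. JoiningThread and run_scoped are hypothetical names, and the class is a sketch of the idea rather than a substitute for std::jthread.

```cpp
#include <thread>
#include <utility>

// Hypothetical minimal wrapper: joins the owned thread in its destructor.
class JoiningThread {
public:
    explicit JoiningThread(std::thread t) : t_(std::move(t)) {}

    ~JoiningThread() {
        if (t_.joinable()) t_.join();
    }

    // Non-copyable, like std::thread itself.
    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

private:
    std::thread t_;
};

int run_scoped() {
    int result = 0;
    {
        JoiningThread worker{std::thread([&result] { result = 7; })};
    }  // the destructor joins here, so the write to result is complete
    return result;
}
```

Because the join happens in the destructor, the thread is also joined when the scope is left through an exception or an early return.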

Mistake 2 — Locking the wrong mutex (or two mutexes in the wrong order)

cpp

#include <mutex>

std::mutex a, b;

// std::lock_guard without <std::mutex> relies on C++17 class template argument deduction.
void thread1() { std::lock_guard la(a); std::lock_guard lb(b); }
void thread2() { std::lock_guard lb(b); std::lock_guard la(a); }
// thread1 holds a, waits for b; thread2 holds b, waits for a → DEADLOCK

Fix: always acquire multiple mutexes in the same order throughout the codebase, or use std::scoped_lock (C++17), which acquires all of them with a deadlock-avoidance algorithm:

cpp

#include <mutex>

std::mutex a, b;

void safe_transfer() {
    std::scoped_lock lock(a, b);  // deadlock-free regardless of acquisition order
    // ... operate on resources guarded by a and b ...
}
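
Before C++17, the same effect is available through std::lock combined with std::adopt_lock. The sketch below assumes a hypothetical Account type with one mutex per account; transfer and run_opposing_transfers are illustrative names as well.

```cpp
#include <mutex>
#include <thread>

// Hypothetical account type used only for this sketch.
struct Account {
    std::mutex mtx;
    long long  balance = 0;
};

void transfer(Account& from, Account& to, long long amount) {
    if (&from == &to) return;     // never try to lock the same mutex twice
    std::lock(from.mtx, to.mtx);  // acquires both without risk of deadlock
    // adopt_lock: the guards take over mutexes that are already locked.
    std::lock_guard<std::mutex> lock_from(from.mtx, std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.mtx, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Runs transfers in opposite directions concurrently and returns the
// combined balance afterwards.
long long run_opposing_transfers(int rounds) {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 2); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

The two threads name the mutexes in opposite order, which is the situation that deadlocks with two plain lock_guard objects. With std::lock, the order of the arguments does not matter.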

Mistake 3 — Holding a lock too long

Locking around a slow operation (file I/O, network, std::cout) serialises all threads through that bottleneck. Keep critical sections as short as possible: do the computation outside the lock, then lock only for the final write.
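
A short critical section might look like the sketch below: the slow step runs with no lock held, and the mutex is taken only for the shared write. The names results_mtx, results, expensive_format, process, and process_all are hypothetical, and expensive_format merely stands in for slow work.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

std::mutex results_mtx;            // hypothetical names for this sketch
std::vector<std::string> results;

// Stand-in for slow work; it touches no shared state.
std::string expensive_format(int id) {
    return "item-" + std::to_string(id);
}

void process(int id) {
    std::string line = expensive_format(id);        // slow part: no lock held
    std::lock_guard<std::mutex> lock(results_mtx);  // lock only for the shared write
    results.push_back(std::move(line));
}

// Runs process() on count threads and returns the number of stored results.
std::size_t process_all(int count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i) threads.emplace_back(process, i);
    for (auto& t : threads) t.join();
    return results.size();
}
```

Calling expensive_format after taking the lock would produce the same results, but only one thread at a time could do the slow work.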

Quick Reference

| Need | Tool | Header |
| --- | --- | --- |
| Run code concurrently | `std::thread` | `<thread>` |
| Auto-join on destruction (C++20) | `std::jthread` | `<thread>` |
| Exclusive access, simple | `std::lock_guard` | `<mutex>` |
| Exclusive access, flexible | `std::unique_lock` | `<mutex>` |
| Lock several mutexes without deadlock (C++17) | `std::scoped_lock` | `<mutex>` |
| Run a callable once across all threads | `std::call_once` | `<mutex>` |
| Atomic operations on a single variable | `std::atomic<T>` | `<atomic>` |
| Async result / future value | `std::async` + `std::future` | `<future>` |

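Compile flags: always pass -pthread on Linux/macOS. The threading library itself needs only C++11, but the examples on this page need more: digit separators such as 100'000 require C++14, and std::optional, std::scoped_lock, and std::lock_guard without a template argument require C++17. Compiling with g++ -std=c++17 -pthread main.cpp covers all of them.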

What's Next

/reference/library/concurrency/atomic: when you just need a thread-safe counter, std::atomic is usually cheaper than a mutex.
/reference/library/concurrency/async-future: std::async can run a function on another thread (with the std::launch::async policy) and hands back its return value via std::future, avoiding manual thread management entirely.
/reference/library/concurrency/condition-variable: coordinate threads that need to wait for a condition (e.g., "queue is non-empty") rather than busy-looping.
/reference/library/concurrency/jthread: C++20's safer, self-joining thread that also supports cooperative cancellation via std::stop_token.
/reference/library/concurrency/execution: parallel execution policies for standard algorithms let you parallelise a std::sort or std::transform with a single extra argument.