Write Safe Concurrent C++ with Threads and Mutexes
Learn to launch threads, protect shared data with mutexes, and sidestep the classic pitfalls of concurrent C++ programs.
By the end of this page, you will know how to launch multiple threads in C++11, share data between them without corruption, coordinate work using std::mutex and std::lock_guard, and recognize the two most dangerous mistakes beginners make: data races and deadlocks.
What and Why
Modern CPUs have many cores sitting idle while a single-threaded program runs. Concurrency lets you put those cores to work simultaneously: shortening total runtime, keeping a UI responsive while a network call is in flight, or processing a queue of tasks in parallel.
C++ gained a standard threading model in C++11. Before that, every platform had its own API (pthreads on Linux, Win32 threads on Windows). Today <thread>, <mutex>, and related headers give you a portable, well-defined model for concurrent execution.
The central promise, and the central danger, of concurrency is shared mutable state. Two threads reading the same variable is fine. Two threads accessing the same variable at the same time, where at least one of them writes, is a data race, which is undefined behaviour in C++. Your job as the programmer is to ensure that every access to shared mutable data, reads included, is synchronised so that no thread touches it while another is writing.
Step by Step
1. Launch a thread
std::thread takes any callable (a function, a lambda, a functor) and runs it concurrently.
#include <iostream>
#include <thread>

void greet(int id) {
    std::cout << "Hello from thread " << id << "\n";
}

int main() {
    std::thread t1(greet, 1);
    std::thread t2(greet, 2);
    t1.join(); // wait for t1 to finish
    t2.join(); // wait for t2 to finish
}

join() blocks the calling thread until the target thread completes. You must call either join() or detach() before a std::thread object is destroyed; failing to do so calls std::terminate().
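The other callable kinds mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed pattern: the Adder functor and the run_callables function are hypothetical names introduced only for this example. It also shows that std::thread copies its arguments, so a reference parameter needs std::ref.

```cpp
#include <functional>
#include <thread>

// Hypothetical functor for illustration: adds `amount` to the int it is given.
struct Adder {
    int amount;
    void operator()(int& target) const { target += amount; }
};

// Runs one lambda and one functor on separate threads and returns the sum
// of what they produced. Each thread writes only to its own variable.
int run_callables() {
    int from_lambda = 0;
    int from_functor = 0;

    // A lambda that captures a local variable by reference.
    std::thread t1([&from_lambda] { from_lambda = 10; });

    // A functor. std::thread copies its arguments, so std::ref is needed
    // to hand from_functor over by reference.
    std::thread t2(Adder{5}, std::ref(from_functor));

    t1.join();
    t2.join();
    return from_lambda + from_functor; // safe: both threads have been joined
}
```

Calling run_callables() from main returns 15 once both threads have finished.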
2. Understand the race
Before reaching for a mutex, see what goes wrong when two threads update a shared variable with no protection at all:
#include <iostream>
#include <thread>

int counter = 0; // shared, unprotected

void increment() {
    for (int i = 0; i < 100'000; ++i)
        ++counter; // DATA RACE: undefined behaviour
}

int main() {
    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();
    std::cout << counter << "\n"; // will NOT reliably print 200000
}

++counter looks like one operation but typically compiles to three: read the value, add one, write it back. Two threads can interleave those steps and silently lose increments. Because the behaviour is undefined, the printed result can differ from run to run.
3. Protect shared data with std::mutex
A mutex (mutual exclusion object) ensures only one thread executes a critical section at a time. std::lock_guard acquires the mutex on construction and releases it when it goes out of scope, so there is no chance of forgetting to unlock.
#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;
std::mutex mtx;

void increment() {
    for (int i = 0; i < 100'000; ++i) {
        std::lock_guard<std::mutex> lock(mtx);
        ++counter;
    }
}

int main() {
    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();
    std::cout << counter << "\n"; // always 200000
}

In C++17 you can drop the template argument and write std::lock_guard lock(mtx); instead. Either spelling works; choose whichever your compiler standard supports.
4. Prefer std::unique_lock when you need flexibility
lock_guard is fast and simple but can only unlock when it destructs. std::unique_lock can be unlocked early and re-locked, which is required when working with condition variables (covered in /reference/library/concurrency/condition-variable).
#include <mutex>
#include <vector>

std::mutex mtx;
std::vector<int> shared_data;

void add_item(int value) {
    std::unique_lock<std::mutex> lock(mtx);
    shared_data.push_back(value);
    lock.unlock(); // release early: the work below does not need the lock
    // ...do something that does not need the lock...
}

Common Patterns
Pattern 1: Parallel work on independent data
When threads genuinely do not share state, concurrency is effortless. Partition the input so each thread owns a separate slice:
#include <cstddef>
#include <thread>
#include <vector>

long long partial_sum(const std::vector<int>& v, std::size_t begin, std::size_t end) {
    long long total = 0;
    for (auto i = begin; i < end; ++i) total += v[i];
    return total;
}

int main() {
    std::vector<int> data(1'000'000, 1);
    long long result_a = 0, result_b = 0;
    std::thread t1([&]{ result_a = partial_sum(data, 0, 500'000); });
    std::thread t2([&]{ result_b = partial_sum(data, 500'000, 1'000'000); });
    t1.join();
    t2.join();
    long long total = result_a + result_b; // safe: reads happen after join()
}

Each thread writes to its own variable (result_a, result_b). The final read of both variables happens after join(), and a thread's completion happens-before the corresponding join() returns, so no mutex is needed.
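The same partitioning idea extends to any number of threads. The following is a minimal sketch under stated assumptions: parallel_sum and its num_threads parameter are hypothetical names introduced for illustration, and num_threads is assumed to be at least 1. Each worker writes only to its own slot of a results vector, and the slots are read only after every worker has been joined.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Sums v using num_threads worker threads (assumed to be >= 1).
long long parallel_sum(const std::vector<int>& v, std::size_t num_threads) {
    std::vector<long long> partial(num_threads, 0); // one slot per thread
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    const std::size_t chunk = v.size() / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        // The last thread also takes any remainder.
        const std::size_t end = (t + 1 == num_threads) ? v.size() : begin + chunk;
        workers.emplace_back([&v, &partial, t, begin, end] {
            long long total = 0;
            for (std::size_t i = begin; i < end; ++i) total += v[i];
            partial[t] = total; // each thread writes only its own slot
        });
    }

    for (auto& w : workers) w.join();

    long long total = 0;
    for (long long p : partial) total += p; // safe: all workers have been joined
    return total;
}
```

One way to choose num_threads is std::thread::hardware_concurrency(), keeping in mind that it is only a hint and may return 0, in which case a fallback value is needed.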
Pattern 2: A thread-safe task queue
A mutex-guarded queue is the backbone of many producer/consumer systems:
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

template<typename T>
class SafeQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mtx_);
        q_.push(std::move(value));
    }

    // Returns the front element, or std::nullopt if the queue is empty.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mtx_);
        if (q_.empty()) return std::nullopt;
        T val = std::move(q_.front());
        q_.pop();
        return val;
    }

private:
    std::queue<T> q_;
    std::mutex mtx_;
};

Pattern 3: One-time initialisation with std::call_once
When expensive initialisation must happen exactly once regardless of how many threads race to trigger it, use std::once_flag and std::call_once:
#include <mutex>
#include <string>

std::once_flag config_flag;
std::string config_value;

void ensure_config_loaded() {
    std::call_once(config_flag, []{
        config_value = "loaded_from_disk"; // runs exactly once
    });
}

What Can Go Wrong
Mistake 1: Forgetting join() or detach()
void bad() {
    std::thread t([]{ /* work */ });
    // t goes out of scope here without join or detach: std::terminate()
}

Fix: use RAII. In C++20, std::jthread joins automatically on destruction (see /reference/library/concurrency/jthread). In C++11/14/17, join in a destructor or at the end of the owning scope.
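For pre-C++20 code, "join in a destructor" might look like the sketch below. ThreadJoiner and run_with_joiner are hypothetical names introduced for illustration and are not part of the standard library; the wrapper simply takes ownership of a std::thread and joins it when it goes out of scope, including when an exception unwinds the scope.

```cpp
#include <thread>
#include <utility>

// Hypothetical RAII helper: owns a std::thread and joins it on destruction.
class ThreadJoiner {
public:
    explicit ThreadJoiner(std::thread t) : t_(std::move(t)) {}

    ~ThreadJoiner() {
        if (t_.joinable()) t_.join();
    }

    // Non-copyable, like std::thread itself.
    ThreadJoiner(const ThreadJoiner&) = delete;
    ThreadJoiner& operator=(const ThreadJoiner&) = delete;

private:
    std::thread t_;
};

int run_with_joiner() {
    int result = 0;
    {
        ThreadJoiner guard{std::thread([&result] { result = 42; })};
        // ... other work; if it throws, guard's destructor still joins ...
    } // guard is destroyed here, which joins the thread
    return result; // safe: the thread has been joined
}
```

Declaring the guard after the data the thread uses matters: locals are destroyed in reverse order, so the thread is joined before result goes away.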
Mistake 2: Locking the wrong mutex (or two mutexes in the wrong order)
std::mutex a, b;
void thread1() { std::lock_guard la(a); std::lock_guard lb(b); }
void thread2() { std::lock_guard lb(b); std::lock_guard la(a); }
// thread1 holds a, waits for b; thread2 holds b, waits for a: DEADLOCK

Fix: always acquire multiple mutexes in the same order throughout the codebase, or use std::scoped_lock (C++17), which acquires all of them using a deadlock-avoidance algorithm:
#include <mutex>

std::mutex a, b;

void safe_transfer() {
    std::scoped_lock lock(a, b); // deadlock-free regardless of acquisition order
    // ... operate on resources guarded by a and b ...
}

Mistake 3: Holding a lock too long
Locking around a slow operation (file I/O, network, std::cout) serialises all threads through that bottleneck. Keep critical sections as short as possible: do the computation outside the lock, then lock only for the final write.
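A minimal sketch of that advice follows. The names results_mtx, results, expensive, and worker are hypothetical, and expensive is only a stand-in for whatever slow computation the real program performs. The slow part runs with no lock held, and the mutex is taken only for the final write into the shared container.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex results_mtx;          // guards results
std::vector<long long> results;  // shared between threads

// Stand-in for slow work (hypothetical): sum of squares of 0..n-1.
long long expensive(int n) {
    long long total = 0;
    for (int i = 0; i < n; ++i) total += static_cast<long long>(i) * i;
    return total;
}

void worker(int n) {
    const long long value = expensive(n); // slow part runs without the lock
    std::lock_guard<std::mutex> lock(results_mtx);
    results.push_back(value);             // lock held only for the write
}
```

Launching several threads on worker lets the calls to expensive overlap; only the push_back calls are serialised, and the order of elements in results depends on which thread finishes first.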
Quick Reference
| Need | Tool | Header |
|---|---|---|
| Run code concurrently | std::thread | <thread> |
| Auto-join on destruction (C++20) | std::jthread | <thread> |
| Exclusive access, simple | std::lock_guard | <mutex> |
| Exclusive access, flexible | std::unique_lock | <mutex> |
| Lock several mutexes without deadlock (C++17) | std::scoped_lock | <mutex> |
| Run a callable once across all threads | std::call_once | <mutex> |
| Atomic operations on a single value (often lock-free) | std::atomic<T> | <atomic> |
| Async result / future value | std::async + std::future | <future> |
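Compile flags: the threading headers need at least C++11, but the examples on this page also use C++14 digit separators and C++17 features such as std::optional and std::scoped_lock, so build them with -std=c++17. With GCC or Clang, also pass -pthread (needed on many Linux toolchains, harmless on macOS): g++ -std=c++17 -pthread main.cpp.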
What's Next
- /reference/library/concurrency/atomic: when you just need a thread-safe counter, std::atomic usually beats a mutex for performance.
- /reference/library/concurrency/async-future: std::async can run a function on a background thread and lets you retrieve its return value via std::future, avoiding manual thread management entirely.
- /reference/library/concurrency/condition-variable: coordinate threads that need to wait for a condition (e.g., "queue is non-empty") rather than busy-looping.
- /reference/library/concurrency/jthread: C++20's safer, self-joining thread that also supports cooperative cancellation via std::stop_token.
- /reference/library/concurrency/execution: parallel execution policies for standard algorithms let you parallelise a std::sort or std::transform with a single extra argument.