Multithreading & Job Systems in C++ Games
"Task-based parallelism for games: job queues, work stealing, fiber-based systems, and safe patterns for avoiding data races in game loops."
TL;DR
Modern games run CPU-bound logic on all cores via a job system — a work queue where threads pick up tasks and execute them. The game loop becomes a graph of dependent jobs rather than a sequential loop. C++17 parallel algorithms and std::jthread (C++20) are good starting points; production engines use custom fiber-based or work-stealing job schedulers.
The problem with naive multithreading
// Wrong: data races on shared state
std::vector<Entity> entities;
auto t1 = std::thread([&]{ for (auto& e : entities) updatePhysics(e); });
auto t2 = std::thread([&]{ for (auto& e : entities) updateAI(e); });
// Physics and AI access the same entities concurrently → undefined behavior
The solution isn't locking everything: locks serialize execution and kill performance. The solution is partitioning work so that threads don't share mutable state.
Partition-based parallelism
// Divide entities into N contiguous chunks, one per thread
void parallelUpdate(std::vector<Entity>& entities, int num_threads) {
    if (num_threads < 1) num_threads = 1;  // avoid dividing by zero
    size_t chunk = entities.size() / num_threads;
    std::vector<std::jthread> threads;
    for (int t = 0; t < num_threads; ++t) {
        size_t start = t * chunk;
        // The last thread also takes the remainder
        size_t end = (t == num_threads - 1) ? entities.size() : start + chunk;
        threads.emplace_back([&entities, start, end] {
            for (size_t i = start; i < end; ++i)
                updatePhysics(entities[i]);
        });
    }
    // Each jthread joins on destruction, so the function returns only after all chunks finish
}
This works when entities are independent (no entity reads or writes another entity's state during the update). Physics integration is embarrassingly parallel; collision detection is not, because it works on pairs of entities that may fall in different chunks.
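The chunk arithmetic above hands the whole remainder to the last thread. A more even split spreads the remainder one element at a time over the first chunks. The following is a minimal sketch of that idea; `chunkRange` is a hypothetical helper introduced here for illustration, not part of the original code.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: returns the half-open range [begin, end) of chunk
// `index` when `count` items are split into `parts` contiguous chunks.
// The first (count % parts) chunks receive one extra item.
std::pair<std::size_t, std::size_t> chunkRange(std::size_t count,
                                               std::size_t parts,
                                               std::size_t index) {
    if (parts == 0) return {0, 0};  // nothing to split into
    const std::size_t base  = count / parts;
    const std::size_t extra = count % parts;
    const std::size_t begin = index * base + (index < extra ? index : extra);
    const std::size_t end   = begin + base + (index < extra ? 1 : 0);
    return {begin, end};
}
```

With 10 entities and 4 threads this yields chunks of 3, 3, 2, and 2 entities, where the split in `parallelUpdate` yields 2, 2, 2, and 4.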
C++17 parallel algorithms
The simplest entry point to parallelism:
#include <algorithm>
#include <execution>
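// The backend is implementation-defined (libstdc++ uses oneTBB; MSVC uses its own thread pool)
std::transform(std::execution::par_unseq,
               positions.begin(), positions.end(),
               velocities.begin(),
               positions.begin(),
               [dt](const Vec3& pos, const Vec3& vel) {
                   return pos + vel * dt;
               });
// Parallel for_each. Use par rather than par_unseq unless update() is
// vectorization-safe (no locks, no memory allocation).
std::for_each(std::execution::par,
              entities.begin(), entities.end(),
              [](Entity& e) { e.update(); });
With GCC's libstdc++, the parallel policies require linking against Intel TBB (-ltbb); implementations are also allowed to fall back to sequential execution. Good for non-game code; production games usually need finer control over scheduling.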
Simple job queue
#include <algorithm>
#include <thread>
#include <functional>
#include <queue>
#include <vector>
#include <mutex>
#include <condition_variable>
#include <atomic>
class JobQueue {
public:
    using Job = std::function<void()>;
    explicit JobQueue(int num_workers) {
        for (int i = 0; i < num_workers; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }
    ~JobQueue() {
        {
            std::lock_guard lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
        // Join here: workers_ is declared first, so it would otherwise be
        // destroyed last, after mutex_ and cv_ are already gone.
        workers_.clear();
    }
    void submit(Job job) {
        {
            std::lock_guard lock(mutex_);
            queue_.push(std::move(job));
        }
        cv_.notify_one();
    }
    // Blocks until the queue is empty and no job is running
    void waitAll() {
        std::unique_lock lock(mutex_);
        cv_.wait(lock, [this] { return queue_.empty() && active_ == 0; });
    }
private:
    void workerLoop() {
        while (true) {
            Job job;
            {
                std::unique_lock lock(mutex_);
                cv_.wait(lock, [this] { return !queue_.empty() || done_; });
                if (done_ && queue_.empty()) return;
                job = std::move(queue_.front());
                queue_.pop();
                ++active_;
            }
            job();  // run outside the lock
            {
                std::lock_guard lock(mutex_);
                --active_;
            }
            cv_.notify_all();  // wakes waitAll() as well as idle workers
        }
    }
    std::vector<std::jthread> workers_;
    std::queue<Job> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    int active_ = 0;
    bool done_ = false;
};
// Usage: leave one core for the main thread, but keep at least one worker
// (hardware_concurrency() may return 0)
JobQueue jobs(static_cast<int>(std::max(2u, std::thread::hardware_concurrency())) - 1);
void gameLoop() {
    // Split physics update into jobs (num_chunks and updatePhysicsChunk are defined elsewhere)
    for (int chunk = 0; chunk < num_chunks; ++chunk) {
        jobs.submit([chunk] { updatePhysicsChunk(chunk); });
    }
    jobs.waitAll(); // sync point before rendering
    render();       // single-threaded render command recording
}
Dependency graph (job graph)
Real game engines use a dependency graph where jobs declare what they read/write:
Frame N:
[Input]──────────────────┐
[Physics]────────────────┤
[AI Decisions]───────────┤──[Collision]──[Apply Forces]──[Render]
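[Animation Sampling]─────┘
struct JobHandle { std::atomic<bool> done{false}; };
struct Job {
    std::function<void()> fn;
    std::vector<JobHandle*> dependencies;
    JobHandle* handle;
};
// Submit jobs in dependency order. A dependent that is dequeued before its
// dependency spins on a worker; if every worker spins, the queue deadlocks.
void submitWithDeps(JobQueue& q, Job job) {
    q.submit([j = std::move(job)] {
        // Spin until all dependencies have finished
        for (auto* dep : j.dependencies)
            while (!dep->done.load(std::memory_order_acquire))
                std::this_thread::yield();
        j.fn();
        // Publish this job's results to anything waiting on its handle
        j.handle->done.store(true, std::memory_order_release);
    });
}
Production engines (Naughty Dog's fiber-based job system, Unreal's Task Graph) implement this with techniques such as lock-free queues and fibers for better throughput, so that a waiting job does not occupy a worker thread.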
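Spinning on a dependency keeps a worker busy doing nothing. A common alternative is to give each job a count of unfinished dependencies and enqueue it only when that count reaches zero. The following is a minimal sketch of that idea, not how any particular engine implements it: `GraphNode` and `runGraph` are hypothetical names, and a mutex-protected queue stands in for the lock-free structures mentioned above. It assumes the graph is acyclic; a cycle would leave the workers waiting forever.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical node type for illustration.
struct GraphNode {
    std::function<void()> fn;
    std::vector<std::size_t> dependents;  // nodes that wait on this one
    int pending = 0;                      // number of unfinished dependencies
};

// Runs every node once, never starting a node before its dependencies finish.
void runGraph(std::vector<GraphNode>& nodes, int num_workers) {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::size_t> ready;  // nodes whose dependencies are all done
    std::size_t finished = 0;

    for (std::size_t i = 0; i < nodes.size(); ++i)
        if (nodes[i].pending == 0) ready.push(i);

    auto worker = [&] {
        for (;;) {
            std::size_t id;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] {
                    return !ready.empty() || finished == nodes.size();
                });
                if (ready.empty()) return;  // every node has finished
                id = ready.front();
                ready.pop();
            }
            nodes[id].fn();  // run the job outside the lock
            {
                std::lock_guard<std::mutex> lock(m);
                ++finished;
                // Release dependents whose last dependency just completed.
                for (std::size_t d : nodes[id].dependents)
                    if (--nodes[d].pending == 0) ready.push(d);
            }
            cv.notify_all();
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < num_workers; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
}
```

Because a job only enters the ready queue once it can run, submission order no longer matters and idle workers sleep on the condition variable instead of spinning.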
Thread-safe patterns
Read-many / write-once (frame flip)
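// Double-buffered state: the writer updates "back", readers use "front"
struct DoubleBuffered {
    EntityState state[2];
    std::atomic<int> front{0};
    EntityState& write() { return state[front.load(std::memory_order_acquire) ^ 1]; }
    const EntityState& read() const { return state[front.load(std::memory_order_acquire)]; }
    // Call from one thread only, at a sync point where no reader or writer is active
    void flip() { front.store(front.load(std::memory_order_relaxed) ^ 1, std::memory_order_release); }
};
Atomic counters for task completion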
std::atomic<int> remaining_jobs{num_jobs};
auto job = [&] {
    doWork();
    // fetch_sub returns the previous value, so 1 means this was the last job
    if (remaining_jobs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        completion_event.set();  // signal completion
    }
};
What to parallelize
| System | Parallelizable? | Notes |
|---|---|---|
| Physics integration | Yes | Embarrassingly parallel |
| Collision detection broad phase | Yes | Spatial partitioning per thread |
| Collision response | Careful | May need write ordering |
| AI state machines | Yes | Independent per entity |
| Animation sampling | Yes | No shared state |
| Render command recording | Yes | Per-object draw calls |
| Audio mixing | Yes | Mix per-source, sum at end |
| Asset streaming I/O | Yes | Async I/O + thread pool |
| Gameplay scripting | Risky | Usually keep single-threaded |
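
The "mix per-source, sum at end" note in the table describes a general pattern: each thread accumulates into its own private buffer, and one thread combines the buffers afterwards, so no locks or atomics are needed. The following is a minimal sketch under simplifying assumptions: mono float buffers, every source at least `frames` samples long, and a hypothetical function `mixSources`. Real mixers also handle gains, resampling, and real-time constraints, all of which are ignored here.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

using Buffer = std::vector<float>;

// Hypothetical helper: sums all sources into one buffer of `frames` samples.
Buffer mixSources(const std::vector<Buffer>& sources, std::size_t frames,
                  std::size_t num_threads) {
    num_threads = std::max<std::size_t>(1, num_threads);
    // One private accumulation buffer per thread: no shared mutable state.
    std::vector<Buffer> partial(num_threads, Buffer(frames, 0.0f));
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        threads.emplace_back([&sources, &partial, frames, num_threads, t] {
            // Thread t handles sources t, t + num_threads, t + 2 * num_threads, ...
            for (std::size_t s = t; s < sources.size(); s += num_threads)
                for (std::size_t i = 0; i < frames; ++i)
                    partial[t][i] += sources[s][i];
        });
    }
    for (auto& th : threads) th.join();
    // Single-threaded reduction after all workers have finished.
    Buffer out(frames, 0.0f);
    for (const Buffer& p : partial)
        for (std::size_t i = 0; i < frames; ++i)
            out[i] += p[i];
    return out;
}
```

The same shape applies to any per-entity work that ends in a shared total: keep the accumulation thread-local and pay for one short serial pass at the end.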