
Multithreading & Job Systems in C++ Games

"Task-based parallelism for games: job queues, work stealing, fiber-based systems, and safe patterns for avoiding data races in game loops."

TL;DR

Modern games run CPU-bound logic on all cores via a job system — a work queue where threads pick up tasks and execute them. The game loop becomes a graph of dependent jobs rather than a sequential loop. C++17 parallel algorithms and std::jthread (C++20) are good starting points; production engines use custom fiber-based or work-stealing job schedulers.

The problem with naive multithreading

cpp
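// Wrong: data race on shared state
std::vector<Entity> entities;

// std::jthread joins on destruction (a plain std::thread would call
// std::terminate if destroyed while still joinable)
std::jthread t1([&]{ for (auto& e : entities) updatePhysics(e); });
std::jthread t2([&]{ for (auto& e : entities) updateAI(e); });
// If both systems touch the same Entity fields and at least one writes,
// the unsynchronized accesses are a data race → undefined behavior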

The solution isn't locking everything: a lock around every shared access serializes the threads on that lock, so they spend their time waiting instead of working. The solution is partitioning work so that no two threads write the same data at the same time.

Partition-based parallelism

cpp
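#include <cstddef>
#include <thread>
#include <vector>

// Divide entities into N contiguous chunks, one per thread
void parallelUpdate(std::vector<Entity>& entities, int num_threads) {
    std::size_t chunk = entities.size() / num_threads;  // requires num_threads >= 1
    std::vector<std::jthread> threads;

    for (int t = 0; t < num_threads; ++t) {
        std::size_t start = t * chunk;
        // The last thread also takes the remainder
        std::size_t end = (t == num_threads - 1) ? entities.size() : start + chunk;

        threads.emplace_back([&entities, start, end] {
            for (std::size_t i = start; i < end; ++i)
                updatePhysics(entities[i]);
        });
    }
    // jthreads join when `threads` is destroyed. Spawning threads every
    // frame is costly, which is why engines reuse a pool of workers instead.
}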

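This works when entities are independent (each update writes only to its own entity). Physics integration is embarrassingly parallel; collision detection is not, because each pair test reads two entities and resolving a contact writes to both.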

C++17 parallel algorithms

The simplest entry point to parallelism:

cpp
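#include <algorithm>
#include <execution>

// Parallel transform; the thread backend is implementation-specific
// (libstdc++ uses oneTBB, MSVC ships its own)
std::transform(std::execution::par_unseq,
    positions.begin(), positions.end(),
    velocities.begin(),
    positions.begin(),
    [dt](const Vec3& pos, const Vec3& vel) {
        return pos + vel * dt;
    });

// Parallel for_each. Use `par`, not `par_unseq`, when the body may lock a
// mutex or otherwise synchronize: par_unseq forbids such
// vectorization-unsafe calls
std::for_each(std::execution::par,
    entities.begin(), entities.end(),
    [](Entity& e) { e.update(); });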

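With GCC's libstdc++ on Linux, the parallel policies are implemented on top of Intel oneTBB, so you link with -ltbb. Execution policies work well for bulk data transforms, but they expose no control over thread count, priorities, or scheduling, which is why production games usually need finer control.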

Simple job queue

cpp
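#include <algorithm>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class JobQueue {
public:
    using Job = std::function<void()>;

    explicit JobQueue(int num_workers) {
        for (int i = 0; i < num_workers; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~JobQueue() {
        {
            std::lock_guard lock(mutex_);
            done_ = true;
        }
        work_cv_.notify_all();
        // Join explicitly: members are destroyed in reverse declaration order,
        // so mutex_ and the condition variables would otherwise be destroyed
        // while the workers are still using them
        for (auto& w : workers_) w.join();
    }

    void submit(Job job) {
        {
            std::lock_guard lock(mutex_);
            queue_.push(std::move(job));
        }
        work_cv_.notify_one();  // only workers wait on work_cv_, so this can't be lost
    }

    // Blocks until the queue is empty and no job is running
    void waitAll() {
        std::unique_lock lock(mutex_);
        idle_cv_.wait(lock, [this] { return queue_.empty() && active_ == 0; });
    }

private:
    void workerLoop() {
        while (true) {
            Job job;
            {
                std::unique_lock lock(mutex_);
                work_cv_.wait(lock, [this] { return !queue_.empty() || done_; });
                if (done_ && queue_.empty()) return;  // drain remaining jobs, then exit
                job = std::move(queue_.front());
                queue_.pop();
                ++active_;
            }
            job();  // runs without holding the lock
            bool idle;
            {
                std::lock_guard lock(mutex_);
                --active_;
                idle = queue_.empty() && active_ == 0;
            }
            if (idle) idle_cv_.notify_all();  // wake waitAll() callers
        }
    }

    std::vector<std::jthread> workers_;
    std::queue<Job> queue_;
    std::mutex mutex_;
    std::condition_variable work_cv_;  // a job was queued, or shutdown began
    std::condition_variable idle_cv_;  // all submitted work has finished
    int active_ = 0;
    bool done_ = false;
};

// Usage: keep one core for the main thread. hardware_concurrency() may
// return 0, so clamp before subtracting to avoid unsigned wraparound
JobQueue jobs(static_cast<int>(std::max(2u, std::thread::hardware_concurrency()) - 1));

void gameLoop() {
    // Split physics update into jobs
    for (int chunk = 0; chunk < num_chunks; ++chunk) {
        jobs.submit([chunk] { updatePhysicsChunk(chunk); });
    }
    jobs.waitAll();  // sync point before rendering

    render();        // render command recording, single-threaded in this example
}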
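The usage above leaves updatePhysicsChunk(chunk) abstract. One way to map a chunk index to an entity range is sketched below with a hypothetical chunkRange helper. Unlike the partition example earlier, which gives the whole remainder to the last thread, it spreads the remainder one item at a time over the first chunks, so chunk sizes differ by at most one.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: the half-open range [begin, end) of items owned by
// chunk `c` when `n` items are split into `chunks` chunks (chunks >= 1).
// The first n % chunks chunks each get one extra item.
std::pair<std::size_t, std::size_t> chunkRange(std::size_t c, std::size_t n, std::size_t chunks) {
    std::size_t base = n / chunks;
    std::size_t extra = n % chunks;
    std::size_t begin = c * base + (c < extra ? c : extra);
    std::size_t end = begin + base + (c < extra ? 1 : 0);
    return {begin, end};
}

// Possible use inside the job (entities and updatePhysics as in the
// earlier examples):
//   auto [b, e] = chunkRange(chunk, entities.size(), num_chunks);
//   for (std::size_t i = b; i < e; ++i) updatePhysics(entities[i]);
```

Splitting 10 entities into 4 chunks this way yields ranges of 3, 3, 2 and 2 items.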

Dependency graph (job graph)

Real game engines use a dependency graph where jobs declare what they read/write:

text
Frame N:
  [Input]──────────────────┐
  [Physics]────────────────┤
  [AI Decisions]───────────┤──[Collision]──[Apply Forces]──[Render]
  [Animation Sampling]─────┘
cpp
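#include <atomic>
#include <functional>
#include <thread>
#include <vector>

struct JobHandle { std::atomic<bool> done{false}; };

struct Job {
    std::function<void()> fn;
    std::vector<JobHandle*> dependencies;
    JobHandle* handle;
};

// Caution: this spin-waits inside a worker. It is deadlock-free when jobs are
// submitted in dependency order to a FIFO queue; otherwise every worker can
// end up waiting on a dependency that is still sitting in the queue
void submitWithDeps(JobQueue& q, Job job) {
    q.submit([j = std::move(job)] {
        // Wait for all dependencies (acquire pairs with the release below)
        for (auto* dep : j.dependencies)
            while (!dep->done.load(std::memory_order_acquire))
                std::this_thread::yield();
        j.fn();
        j.handle->done.store(true, std::memory_order_release);
    });
}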

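Production engines implement this with lock-free queues and avoid the spin-wait: Naughty Dog's fiber-based job system parks a waiting job's fiber so its worker thread can run other jobs, while Unreal's Task Graph holds a task back until its prerequisites have completed.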

Thread-safe patterns

Read-many / write-once (frame flip)

cpp
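// Double-buffered state: the writer updates "back", readers use "front".
// flip() must run at a sync point (e.g. after jobs.waitAll()) when no reader
// still holds a reference to the old front, because it becomes the next back
struct DoubleBuffered {
    EntityState state[2];
    std::atomic<int> front{0};

    // Only the writer thread changes `front`, so a relaxed load suffices here
    EntityState& write() { return state[front.load(std::memory_order_relaxed) ^ 1]; }
    const EntityState& read() const { return state[front.load(std::memory_order_acquire)]; }

    // Release pairs with the acquire in read(): back-buffer writes become visible
    void flip() { front.store(front.load(std::memory_order_relaxed) ^ 1, std::memory_order_release); }
};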

Atomic counters for task completion

cpp
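std::atomic<int> remaining_jobs{num_jobs};

auto job = [&] {
    doWork();
    // acq_rel: each job releases its writes, and the last job acquires
    // all of them before signaling
    if (remaining_jobs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        // Last job: signal completion (completion_event stands for any wakeup primitive)
        completion_event.set();
    }
};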

What to parallelize

| System | Parallelizable? | Notes |
|---|---|---|
| Physics integration | Yes | Embarrassingly parallel |
| Collision detection (broad phase) | Yes | Spatial partitioning per thread |
| Collision response | Careful | May need write ordering |
| AI state machines | Yes | Independent per entity |
| Animation sampling | Yes | No shared state |
| Render command recording | Yes | Per-object draw calls |
| Audio mixing | Yes | Mix per-source, sum at end |
| Asset streaming I/O | Yes | Async I/O + thread pool |
| Gameplay scripting | Risky | Usually kept single-threaded |
Updated 2026-05-01