C++
Idiom
since C++11
Advanced

Executor

Design pattern decoupling callable work from its execution context; formalized in C++26 as the scheduler/sender/receiver model in std::execution.

An executor is an object that decouples the submission of callable work from the execution context that runs it, letting algorithms dispatch tasks uniformly to thread pools, event loops, GPU streams, or inline runners without encoding that choice at the call site.

Overview

Any non-trivial async codebase eventually asks the same question: who decides where work runs? Hard-wiring std::thread or std::async inside a library component couples it to a specific execution strategy, preventing reuse across contexts that differ in threading, scheduling policy, or hardware.

The executor idiom answers this by introducing an intermediary: a lightweight handle to an execution context that accepts callable work and arranges for it to run. The caller controls the executor; the algorithm only knows how to submit through it.
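
A minimal sketch of that split, assuming two hypothetical executors (InlineExecutor and ManualExecutor are illustrative names, not library types). The algorithm submit_squares submits through whatever executor it is handed and never learns where the work runs:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical executor: runs each task immediately on the calling thread.
struct InlineExecutor {
    template <typename F>
    void execute(F&& f) { std::forward<F>(f)(); }
};

// Hypothetical executor: queues tasks until drain() is called, standing in
// for an event loop that the caller pumps.
class ManualExecutor {
    std::queue<std::function<void()>> tasks_;

public:
    template <typename F>
    void execute(F&& f) { tasks_.emplace(std::forward<F>(f)); }

    std::size_t pending() const { return tasks_.size(); }

    void drain() {
        while (!tasks_.empty()) {
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop();
            task();
        }
    }
};

// The algorithm only knows how to submit work; the caller picks the executor.
template <typename Executor>
void submit_squares(Executor& ex, const std::vector<int>& in, std::vector<int>& out) {
    for (int v : in)
        ex.execute([v, &out] { out.push_back(v * v); });
}

// Same algorithm, two execution strategies.
void demo_executor_choice() {
    std::vector<int> eager;
    InlineExecutor inline_ex;
    submit_squares(inline_ex, {1, 2, 3}, eager);
    assert((eager == std::vector<int>{1, 4, 9}));  // ran during submission

    std::vector<int> deferred;
    ManualExecutor manual_ex;
    submit_squares(manual_ex, {1, 2, 3}, deferred);
    assert(deferred.empty() && manual_ex.pending() == 3);  // nothing ran yet
    manual_ex.drain();
    assert((deferred == std::vector<int>{1, 4, 9}));
}
```

Swapping the executor changes when and where the lambdas run without touching submit_squares, which is the property the idiom is after.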

The pattern appeared throughout the C++11–20 ecosystem, in Boost.Asio's io_context, Intel TBB's task_arena, and folly's Executor base class, but each framework defined its own interface. Code written against one could not be reused with another, and composing pipelines across framework boundaries required manual glue.

C++26 standardized the vocabulary through the sender/receiver model (adopted from proposal P2300). The standard introduces four interlocking concepts:

  • Scheduler: a lightweight, copyable handle to an execution context (thread pool, run_loop, GPU queue). A scheduler does not run work itself; it is a factory that produces senders.
  • Sender: a lazy description of async work. A sender does not begin executing when constructed. It describes what should happen; execution begins only when the sender is connected to a receiver and the resulting operation state is started.
  • Receiver: a generalized callback with three typed completion channels: set_value (success result), set_error (exception or error code), and set_stopped (cancellation signal).
  • Operation State: the heap- or stack-resident object produced by connecting a sender to a receiver. Calling start() on it begins the async operation; the operation state's address must remain stable until one of the three completion functions fires.

This replaces the "callable in, future out" interface of std::async (C++11) with a composable graph model where pipelines are built before any work begins.
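
The four roles can be modeled in a few lines of plain C++17. This is a toy sketch, not the std::execution API: every type below (Outcome, RecordingReceiver, JustOperation, JustSender, InlineScheduler) and the schedule_value member are hypothetical, and the real schedule() produces a sender that completes on the context rather than carrying a value. What the sketch does show faithfully is the ordering: describe, connect, then start.

```cpp
#include <cassert>
#include <exception>
#include <utility>

// Toy model of the four roles. The names echo the standard's vocabulary, but
// every type here is hypothetical and much simpler than std::execution.

struct Outcome {
    bool completed = false;
    int value = 0;
};

// Receiver: one member per completion channel.
struct RecordingReceiver {
    Outcome* out;
    void set_value(int v) noexcept { out->completed = true; out->value = v; }
    void set_error(std::exception_ptr) noexcept { out->completed = true; }
    void set_stopped() noexcept { out->completed = true; }
};

// Operation state: holds the receiver and the pending work; start() runs it.
template <typename Receiver>
struct JustOperation {
    int value;
    Receiver receiver;

    JustOperation(int v, Receiver r) : value(v), receiver(std::move(r)) {}
    JustOperation(JustOperation&&) = delete;  // immovable: address stays stable

    void start() noexcept { receiver.set_value(value); }
};

// Sender: a lazy description. Constructing or connecting it runs nothing.
struct JustSender {
    int value;

    template <typename Receiver>
    JustOperation<Receiver> connect(Receiver r) const {
        // Returning an immovable type relies on C++17 guaranteed copy elision.
        return JustOperation<Receiver>(value, std::move(r));
    }
};

// Scheduler: a cheap, copyable handle that manufactures senders.
struct InlineScheduler {
    JustSender schedule_value(int v) const { return JustSender{v}; }
};

void demo_lazy_start() {
    Outcome result;
    InlineScheduler sched;
    JustSender sender = sched.schedule_value(42);
    auto op = sender.connect(RecordingReceiver{&result});
    assert(!result.completed);  // only described and connected so far
    op.start();                 // the work happens here
    assert(result.completed && result.value == 42);
}
```

Deleting the move constructor is one way to make the address-stability rule a compile-time property instead of a convention.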

Syntax

Pre-standard executor (C++11 idiom)

Before C++26, an executor was any type satisfying an informal concept: it exposes an execute method that accepts a callable and arranges for it to run. A minimal thread-pool implementation:

cpp
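// Hand-rolled executor concept: the idiom dates to C++11; this listing uses C++17 CTAD for brevity
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>
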
class ThreadPoolExecutor {
    std::vector<std::thread>              workers_; // C++11
    std::queue<std::function<void()>>     tasks_;   // C++11
    std::mutex                            mu_;
    std::condition_variable               cv_;
    bool                                  stop_ = false;

public:
    explicit ThreadPoolExecutor(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock lk{mu_}; // C++17 CTAD
                        cv_.wait(lk, [this]{ return stop_ || !tasks_.empty(); });
                        if (stop_ && tasks_.empty()) return;
                        task = std::move(tasks_.front());
                        tasks_.pop();
                    }
                    task();
                }
            });
    }

    template<typename F>
    void execute(F&& f) {
        { std::lock_guard lk{mu_}; tasks_.emplace(std::forward<F>(f)); } // C++17 CTAD
        cv_.notify_one();
    }

    ~ThreadPoolExecutor() {
        { std::lock_guard lk{mu_}; stop_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
};

// Algorithm parameterized over any executor that satisfies the concept
template<typename Executor, typename F>
void schedule(Executor& ex, F&& task) {
    ex.execute(std::forward<F>(task));
}

The shortfall: no standard way to propagate errors, no cancellation contract, and callers must bolt on a return-value mechanism (std::promise/std::future, callbacks) ad hoc.
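
To make the ad hoc glue concrete, here is a hedged sketch of bolting a return value and error propagation onto an execute-style executor with std::packaged_task. The helper submit_with_future and the InlineExecutor stand-in are hypothetical names introduced for illustration; any type with a compatible execute member could take the executor's place.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in executor: runs work on the calling thread.
struct InlineExecutor {
    template <typename F>
    void execute(F&& f) { std::forward<F>(f)(); }
};

// Hypothetical helper: submit f through any executor and hand back a future.
template <typename Executor, typename F>
auto submit_with_future(Executor& ex, F f) -> std::future<decltype(f())> {
    using Result = decltype(f());
    // shared_ptr keeps the submitted lambda copyable, so it also fits
    // executors that store work as std::function<void()>.
    auto task = std::make_shared<std::packaged_task<Result()>>(std::move(f));
    std::future<Result> result = task->get_future();
    ex.execute([task] { (*task)(); });
    return result;
}

void demo_future_glue() {
    InlineExecutor ex;

    std::future<int> ok = submit_with_future(ex, [] { return 6 * 7; });
    assert(ok.get() == 42);

    // packaged_task stores a thrown exception; get() rethrows it.
    std::future<int> failed =
        submit_with_future(ex, []() -> int { throw std::runtime_error("boom"); });
    bool caught = false;
    try {
        failed.get();
    } catch (const std::runtime_error&) {
        caught = true;
    }
    assert(caught);
    (void)caught;
}
```

The glue works, but every call pays for a shared allocation and there is still no cancellation path, which is the gap the sender model closes.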

C++26 std::execution

The C++26 <execution> header provides the std::execution namespace, including the run_loop execution context, sender factories, and adaptors:

cpp
#include <execution>  // C++26
#include <stop_token> // C++20
#include <string>     // C++98
#include <thread>     // C++11; std::jthread is C++20
#include <utility>    // std::move

int main() {
    std::execution::run_loop loop;                      // C++26

    // Worker thread drives the run_loop; stop_token triggers finish()
    std::jthread worker([&](std::stop_token st) {       // C++20 jthread
        std::stop_callback cb{st, [&]{ loop.finish(); }}; // C++20
        loop.run();   // blocks until loop.finish() is called
    });

    std::execution::scheduler auto sched = loop.get_scheduler(); // C++26

    // Build the pipeline: no work executes yet
    std::execution::sender auto work =
        std::execution::just(std::string{"hello, world"})   // C++26: value sender
        | std::execution::then([](std::string s) {          // C++26: transform
              return static_cast<int>(s.size());
          });

    // Pin the pipeline to sched and block until done
    std::execution::sender auto pinned = std::execution::on(sched, std::move(work)); // C++26

    auto [n] = std::this_thread::sync_wait(std::move(pinned)).value(); // C++26
    // n == 12 ("hello, world" has 12 characters)
}

sync_wait returns std::optional<std::tuple<Ts...>> holding the value-channel results. It returns std::nullopt on cancellation and throws on error-channel completion.

Examples

Execution-context-agnostic algorithm

A function that computes a result asynchronously, with the scheduler supplied by the caller:

cpp
// C++26: caller chooses execution context; algorithm stays generic
#include <cstddef> // std::size_t
#include <execution>
#include <span>    // C++20

auto async_dot_product(std::execution::scheduler auto sched,
                       std::span<const float> a,  // C++20
                       std::span<const float> b) {
    return std::execution::on(
        sched,
        std::execution::just(a, b)
            | std::execution::then([](std::span<const float> x,
                                      std::span<const float> y) {
                  float sum = 0.f;
                  for (std::size_t i = 0; i < x.size(); ++i) // assumes x.size() == y.size()
                      sum += x[i] * y[i];
                  return sum;
              })
    );
}

// Usage: same algorithm, different contexts
// auto [r1] = std::this_thread::sync_wait(async_dot_product(pool_sched, a, b)).value();
// auto [r2] = std::this_thread::sync_wait(async_dot_product(gpu_sched,  a, b)).value();

Multi-stage pipeline with context transitions

A realistic pipeline: read on an IO scheduler, compute on a CPU pool, then continue on the originating context:

cpp
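// C++26: multi-stage pipeline composing two schedulers
// read_file and parse are application functions assumed to be declared elsewhere
#include <cstddef>   // std::byte
#include <execution>
#include <string>
#include <utility>
#include <vector>
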
auto io_then_compute(std::execution::scheduler auto io_sched,
                     std::execution::scheduler auto cpu_sched,
                     std::string path) {
    return std::execution::on(
               io_sched,
               std::execution::just(std::move(path))
                   | std::execution::then([](std::string p) {
                         return read_file(p); // returns std::vector<std::byte>
                     })
           )
           | std::execution::let_value([cpu_sched](std::vector<std::byte>& raw) { // let_value flattens the sender returned below; then would not
                 return std::execution::on(
                     cpu_sched,
                     std::execution::just(std::move(raw))
                         | std::execution::then([](std::vector<std::byte> data) {
                               return parse(data); // returns ParseResult
                           })
                 );
             });
}

Execution begins only when a consumer calls sync_wait, or connects a receiver and starts the resulting operation state. Constructing the pipeline is pure description with no side effects.

Best Practices

Parameterize over schedulers, not thread handles. A function accepting std::execution::scheduler auto works with any conforming execution context. A function accepting std::thread& or a pool pointer is locked to one implementation.

Keep senders lazy. A sender must not begin work in its constructor. Code that launches a thread or enqueues a task during sender construction breaks composition: an on() adaptor can no longer redirect that work to the correct scheduler, and the sender cannot be safely connected and started a second time.

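Handle all three completion channels. A receiver must accept every completion the connected sender can produce: set_value, set_error, and set_stopped. connect checks this, so a missing channel is normally rejected at compile time; the quieter bug is a handler that exists but does nothing. An empty set_stopped turns cancellation into a hang for whoever is waiting on the result, and an empty set_error silently discards the failure.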
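
The routing can be sketched without the standard library's machinery. The types below (ChannelLog, LoggingReceiver) and the run_and_complete function are hypothetical stand-ins for a receiver and a started operation; the point is only that each run ends on exactly one channel and that a thrown exception arrives through set_error, not set_value.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

struct ChannelLog {
    std::string channel;  // "value", "error", or "stopped"
    int value = 0;
    std::string message;
};

// Hypothetical receiver that records which channel fired.
struct LoggingReceiver {
    ChannelLog* log;

    void set_value(int v) noexcept {
        log->channel = "value";
        log->value = v;
    }
    void set_error(std::exception_ptr e) noexcept {
        log->channel = "error";
        try {
            std::rethrow_exception(e);
        } catch (const std::exception& ex) {
            log->message = ex.what();  // record the failure instead of dropping it
        } catch (...) {
            log->message = "unknown";
        }
    }
    void set_stopped() noexcept { log->channel = "stopped"; }
};

// Toy stand-in for a started operation: completes on exactly one channel.
template <typename Work, typename Receiver>
void run_and_complete(Work work, const std::atomic<bool>& stop_requested,
                      Receiver receiver) noexcept {
    if (stop_requested.load()) {
        receiver.set_stopped();
        return;
    }
    try {
        receiver.set_value(work());
    } catch (...) {
        receiver.set_error(std::current_exception());
    }
}

ChannelLog complete_with(bool request_stop, bool fail) {
    ChannelLog log;
    std::atomic<bool> stop{request_stop};
    run_and_complete(
        [fail]() -> int {
            if (fail) throw std::runtime_error("disk failure");
            return 7;
        },
        stop, LoggingReceiver{&log});
    return log;
}

void demo_channels() {
    assert(complete_with(false, false).channel == "value");
    assert(complete_with(false, true).channel == "error");
    assert(complete_with(true, false).channel == "stopped");
}
```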

Prefer run_loop for integration and testing. std::execution::run_loop is the reference single-threaded event loop: deterministic, easy to control, zero external dependencies. It is the right starting point when adapting a third-party async framework to the std::execution model, or when writing unit tests that need a real execution context without a thread pool.

Avoid std::async as a substitute. std::async (C++11) is not an executor. Its default launch policy (std::launch::async | std::launch::deferred) leaves the choice between a new thread and deferred execution to the implementation, the std::future it returns blocks in its destructor until the task finishes, and it provides no cancellation or composition model. Do not retrofit it into executor-parameterized code.
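
The blocking destructor is easy to miss because nothing in the call site mentions waiting. A small sketch (the function name future_destructor_waited is made up for this example) that lets the future go out of scope without calling get() or wait():

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Reports whether the task had finished by the time the future was destroyed.
bool future_destructor_waited() {
    std::atomic<bool> done{false};
    {
        std::future<void> fut = std::async(std::launch::async, [&done] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            done.store(true);
        });
        // No call to fut.get() or fut.wait() here.
    }  // ~future blocks here until the task completes
    return done.load();
}

void demo_async_blocking() {
    assert(future_destructor_waited());
}
```

Code that looks fire-and-forget is therefore synchronous at the closing brace, which is rarely what a caller parameterizing over executors expects.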

Common Pitfalls

Destroying an operation state before completion. Once start() is called on an operation state, its address must remain stable until a completion function fires. Storing the operation state in a temporary, moving it, or letting it go out of scope early is undefined behavior. This is the most common bug in hand-written sender implementations.

Confusing scheduler lifetime with execution-context lifetime. A scheduler is a lightweight handle: cheap to copy, trivial to pass by value. The execution context it points to (the thread pool, the run_loop) has a separate lifetime and must outlive every use of the handle. Storing a scheduler and then destroying the pool it refers to leaves a dangling handle, and scheduling through it is undefined behavior that may not crash immediately.

Blocking inside a sender on the owning scheduler's thread. Calling sync_wait from a thread that belongs to the scheduler you are waiting on can deadlock: with a single worker, or with every worker blocked the same way, no thread is left to dequeue the completion. The pattern is structurally identical to calling std::future::get inside a task submitted to the same thread pool.

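Assuming the value channel carries exceptions. Exceptions are routed through set_error, not set_value: adaptors such as then catch an exception thrown by the callable and deliver it as a std::exception_ptr on the error channel. A receiver whose set_error does nothing silently drops the failure, and because completion functions are noexcept, an exception that escapes a handler calls std::terminate. Always wire all three channels, even if the error handler only stores the exception for a later rethrow.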

Using type-erased std::function in hot paths. The pre-standard executor pattern typically stores work as std::function<void()>, which may allocate on the heap whenever the callable does not fit the implementation's small-buffer optimization. In tight dispatch loops this cost can dominate. The C++26 sender model avoids it by encoding the continuation type statically in the operation state, enabling allocation-free dispatch when the operation state lives on the stack or inside a parent operation.
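
A rough illustration of the difference, using hypothetical names (ErasedTask, StaticOperation, make_operation) rather than anything from std::execution. In the static version the callable is an ordinary member, so the capture is stored inside the operation object itself; whether std::function allocates for the same capture depends on the implementation's small-buffer size.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Type-erased task, as in the pre-standard executor: the concrete callable is
// hidden behind std::function, which may need heap storage for large captures.
using ErasedTask = std::function<int()>;

// Statically typed operation: the callable is a plain member, so the whole
// operation is a single object that can live on the stack.
template <typename F>
struct StaticOperation {
    F work;
    int result = 0;
    void start() { result = work(); }
};

template <typename F>
StaticOperation<std::decay_t<F>> make_operation(F&& f) {
    return StaticOperation<std::decay_t<F>>{std::forward<F>(f)};
}

int demo_static_dispatch() {
    std::array<int, 32> data{};
    data[0] = 5;
    auto op = make_operation([data] { return data[0] + 1; });
    // The 128-byte capture is part of the operation object, not a separate block.
    static_assert(sizeof(op) >= sizeof(data), "callable is stored inline");
    op.start();
    return op.result;
}

int demo_erased_dispatch() {
    std::array<int, 32> data{};
    data[0] = 5;
    ErasedTask task = [data] { return data[0] + 1; };  // may allocate
    return task();
}
```

Both paths compute the same result; only the static one gives the compiler the full type, which is what lets real sender implementations nest operation states without allocating.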

See Also

  • std::execution: C++26 namespace, declared in the <execution> header, containing the full sender/receiver vocabulary
  • std::jthread (C++20): cooperative thread with std::stop_token integration; the natural worker thread for run_loop-based execution contexts
  • std::future / std::promise (C++11): shared-state mechanism for one-shot async values; useful in isolation but does not compose into pipelines
  • reference/idioms/command: the command pattern is the conceptual ancestor of a submitted work unit; executors generalize command dispatch across heterogeneous execution resources