What Are Coroutines and Why They Exist
A coroutine is a function that can be suspended in the middle of its execution and resumed at a later point in time — without blocking a thread and without discarding its local state. C++20 formalises coroutines as a core language feature, enabling a style of programming that was previously only possible through callbacks, state machines, or third-party libraries.
The problem coroutines solve
Writing code that must do several things concurrently — or that must produce a sequence of values one at a time, pausing between each — has historically required one of two unpleasant compromises. You could launch additional threads, which introduces synchronization overhead and the risk of data races. Or you could restructure your logic as a callback chain or an explicit state machine, sacrificing the linear readability that makes programs easy to reason about.
Consider a generator that produces the next value in a Fibonacci sequence each time it is called. With a regular function you cannot pause between values and return to exactly where you left off — a function either runs to completion or it does not run. The classic workaround is a class with member state that remembers where it is, which is verbose and obscures the algorithm. The idiomatic C++20 solution is a coroutine that simply uses co_yield to hand a value to the caller and then waits to be asked for the next one.
Pre-coroutine: stateful class
#include <iostream>

// Generator state is hoisted into members so it survives between calls.
class FibGenerator {
    long long a = 0, b = 1;
public:
    long long next() {
        auto val = a;
        auto tmp = a + b;
        a = b;
        b = tmp;
        return val;
    }
};

int main() {
    FibGenerator gen;
    for (int i = 0; i < 10; ++i)
        std::cout << gen.next() << '\n';
}
// Works, but the algorithm is obscured by state management
C++23 coroutine: algorithm reads naturally
#include <generator>
#include <iostream>
#include <ranges>

std::generator<long long> fibonacci() {
    long long a = 0, b = 1;
    while (true) {
        co_yield a;  // hand a to the caller and pause here
        auto tmp = a + b;
        a = b;
        b = tmp;
    }
}

int main() {
    for (long long v : fibonacci() | std::views::take(10))
        std::cout << v << '\n';
}
// The algorithm reads exactly as you'd describe it
Suspension and resumption: how it works
When a coroutine is first called, the compiler-generated code allocates a structure called the coroutine frame (sometimes called the coroutine state). The frame usually lives on the heap, although the compiler may elide the allocation when it can prove the frame does not outlive the caller. It holds the coroutine's parameters, the local variables and temporaries that live across a suspension point, the promise object, and a record of the current suspension point. When the coroutine suspends, the calling function gets control back immediately, as if the coroutine had returned, but the frame persists. When the coroutine is resumed (by the caller or by some scheduler), execution continues from exactly the point where it paused, with all local state intact.
This is fundamentally different from a thread switch. A thread switch saves the running thread's registers, switches to another thread's stack, and goes through the OS scheduler; the cost is typically measured in microseconds. Suspending a coroutine only records the suspension point in its frame and returns to the caller or resumer on the same thread, with no OS involvement and a cost closer to that of an ordinary function call. Millions of coroutines can be suspended at once because each one costs only its frame rather than a full thread stack; on any one thread, only one of them runs at a time.
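To make the idea of a coroutine frame concrete, here is a hand-written approximation of what a compiler might generate for the Fibonacci coroutine: a struct that holds the locals plus a resume-point index. The names FibFrame, resume_point, and first_fib are hypothetical, and real compilers lay frames out differently, so treat this as a mental model rather than actual generated code.

```cpp
#include <vector>

// Hand-written stand-in for a coroutine frame (illustrative only).
struct FibFrame {
    long long a = 0, b = 1;   // locals that must survive a suspension
    long long yielded = 0;    // slot for the value handed to the caller
    int resume_point = 0;     // where to continue on the next resume()

    // Runs the body until the next "co_yield", then returns to the caller.
    void resume() {
        switch (resume_point) {
        case 0:               // first resume: start of the body
            break;
        case 1: {             // resumed just after the co_yield
            long long tmp = a + b;
            a = b;
            b = tmp;
            break;
        }
        }
        yielded = a;          // plays the role of "co_yield a"
        resume_point = 1;     // remember where we stopped
    }
};

// Collects the first n values by resuming the frame n times.
std::vector<long long> first_fib(int n) {
    FibFrame frame;
    std::vector<long long> out;
    for (int i = 0; i < n; ++i) {
        frame.resume();
        out.push_back(frame.yielded);
    }
    return out;
}
```

Each call to resume() is an ordinary function call on the caller's thread, which is why suspension and resumption can be so much cheaper than a thread switch.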
What happens at a suspension point
Coroutine hits co_yield / co_await
The expression is evaluated, and the coroutine frame records the suspension point alongside the local state that is still live.
Control returns to the caller
The caller gets the yielded value (or a handle) immediately. No blocking. No thread switch.
Coroutine frame stays alive
All local variables remain intact in heap memory until the coroutine is resumed or destroyed.
Resumption
When the caller asks for the next value (or an awaited result is ready), execution continues from the suspension point.
The three coroutine keywords
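A function becomes a coroutine the moment its body contains any of three new keywords: co_yield, co_await, or co_return. There is no special declaration syntax; the presence of one of these keywords is both necessary and sufficient for the compiler to treat the function as a coroutine. The return type must still cooperate: it has to expose a promise type (as std::generator<T> or a library task type does) that tells the compiler how to create the return object and how to handle yielded and returned values. A plain return statement is not allowed in a coroutine body; use co_return instead.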
co_yield expr
Yield a value to the caller and suspend
Passes the value of expr to the caller and suspends the coroutine. When the coroutine is resumed, execution continues immediately after the co_yield expression. Used to implement generators and lazy sequences.
// Yields 0, 1, 2, 3, … one at a time
std::generator<int> counting(int start = 0) {
    while (true) co_yield start++;
}
co_await expr
Suspend until an awaitable is ready
Evaluates expr to an awaitable object and asks it whether the result is already available (await_ready). If it is, execution continues immediately. If not, the coroutine suspends and its handle is passed to the awaitable (await_suspend), which arranges for the coroutine to be resumed once the operation completes. The value of the co_await expression is whatever the awaitable's await_resume returns.
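// Suspend until the async read completes
// (async_task and async_read are hypothetical; neither is part of the standard library)
async_task<std::string> readFile(std::string path) {
    auto content = co_await async_read(path);  // suspends here
    co_return content;                         // resumes here
}
co_return [expr]
Return a final value and terminate the coroutine
Ends the coroutine's execution. An optional expression provides the final result, which is handed to the promise object (return_value, or return_void when there is no expression); the promise typically stores it for the caller to retrieve. After co_return, the coroutine cannot be resumed. Coroutine handles are non-owning, so the frame is not reference-counted: it is destroyed when destroy() is called on a handle, usually by the destructor of the owning return object, or automatically when the coroutine runs past its final suspend point without suspending.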
// co_return with a value
async_task<int> compute() {
    // ... do work ...
    co_return 42;  // stores 42 in the promise; terminates
}
// co_return without a value (void coroutine)
async_task<void> setup() {
    co_await doSomethingAsync();
    co_return;  // explicit termination (optional at end of body)
}
Stackless coroutines: what C++ chose
There are two families of coroutine implementations in the industry: stackful and stackless. C++20 chose the stackless variant, and this choice has practical consequences.
Stackful coroutines
- Maintain a full shadow call stack
- Can suspend from any depth of nested calls
- Each coroutine costs kilobytes of stack memory
- Example: Go goroutines, Boost.Context fibers
Stackless coroutines (the C++20 model)
- Save only the frame of the coroutine itself
- Can suspend only from within the coroutine's own body, not from inside a function it calls
- Frame size depends on the locals that live across suspension points, plus the promise and some bookkeeping (often well under 1 KB)
- Millions can exist simultaneously with modest memory
The stackless design means you cannot call a regular function from inside a coroutine and have that regular function suspend the coroutine. Only the coroutine body itself contains suspension points. If a nested call needs to suspend, it must itself be a coroutine (and you must co_await it). This propagates up the call chain — in practice, async code tends to be coroutines all the way down. The upside is exceptional memory efficiency: a server can maintain a coroutine per connection with no more memory than a struct per connection.
Coroutines are not threads
This distinction trips up many developers. A coroutine, by itself, does not introduce concurrency. When a coroutine suspends at a co_await, another coroutine might run on the same thread — but only because a scheduler (often the event loop of a coroutine library) decides to resume it. Two coroutines on the same thread never run simultaneously; they cooperatively yield control.
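Concurrency
- Thread: multiple threads can run on different CPU cores simultaneously.
- Coroutine: not by default. One coroutine runs at a time on a thread; parallelism requires explicitly scheduling coroutines across threads.
Context switch cost
- Thread: roughly 1–10 µs (OS scheduler involvement, full register save and restore).
- Coroutine: roughly 10–100 ns (comparable to an indirect function call, no OS involvement).
Memory per unit
- Thread: a dedicated stack, commonly 8 MB reserved by default on Linux (1 MB on Windows), most of which usually goes unused.
- Coroutine: one frame sized for the locals that live across suspension points plus bookkeeping (often well under 1 KB).
Data sharing hazards
- Thread: data races if shared data is accessed without synchronization.
- Coroutine: no data races between coroutines confined to a single thread, so no locks are needed there. Shared state can still change while a coroutine is suspended, so invariants must be rechecked after each co_await.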
Where coroutines shine
Generators and lazy sequences
Produce a sequence of values on demand without buffering the entire sequence in memory. The Fibonacci example above is the archetypal case. std::generator (C++23) provides a ready-made coroutine type for this.
Asynchronous I/O
Issue an I/O operation, suspend the coroutine, and resume it when the data arrives — all without blocking a thread. A server using coroutines can handle thousands of concurrent connections on a single thread, with each connection's state living in a small coroutine frame.
Lazy computations
Compute only as much of a result as is needed. A coroutine that builds a large data structure can yield partial results; the consumer stops requesting more when it has what it needs — and the coroutine frame is simply destroyed.
Event-driven state machines
Express a state machine as a linear sequence of co_await calls rather than an explicit switch-on-state dispatch. Each co_await represents 'wait for this event', making the control flow readable even when the underlying protocol is complex.
Pipelines and stream processing
Chain coroutines together: one coroutine produces data, the next transforms it, the last consumes it — all suspended and resumed in lockstep without intermediate buffers.
Low-level machinery, high-level libraries
C++20 provides the language machinery for coroutines — the keywords, the coroutine frame, and a small set of customisation points (the promise type and awaitable protocol) that determine exactly how suspension and resumption behave. It does not provide much in the way of ready-made high-level types.
C++23 closes the most obvious gap by standardising std::generator<T> in <generator> — a ready-to-use coroutine type for the generator pattern. For everything else — async I/O, task types, schedulers — you will typically reach for a library such as cppcoro (Lewis Baker), Asio (Boost or standalone), or libunifex (Facebook) until the standard catches up. The dedicated lessons in this module build up from the language machinery to practical use of these libraries.
// std::generator<T>: the one ready-made coroutine type in C++23
// Defined in <generator>, requires C++23
#include <generator>
#include <iostream>
#include <ranges>

std::generator<int> sequence(int start, int count) {
    for (int i = start; i < start + count; ++i)
        co_yield i;  // yield i, suspend, resume on next iteration
}

int main() {
    // std::generator models a range, so it works with range-based for and views
    for (int v : sequence(10, 5))
        std::cout << v << ' ';  // prints: 10 11 12 13 14
}