Async Coroutines with `co_await`

The previous lessons built the coroutine machinery — promise types, the coroutine handle, generators with co_yield. This lesson focuses on the other half of the picture: co_await, which suspends a coroutine while waiting for some asynchronous result, then resumes it when the result is ready. The result is code that reads like straight-line sequential logic but executes asynchronously — no callbacks, no state machines, no threads required for concurrency.

The problem: I/O blocks or callbacks

Network I/O, file I/O, and timer waits are slow relative to CPU work — often microseconds to milliseconds when CPU instructions take nanoseconds. The classic approaches each have problems. Blocking I/O is the simplest to write but wastes a thread per connection. Callback-based async I/O scales but splits one logical operation across dozens of small functions, making error handling and control flow nearly unreadable. Thread-per-task avoids callbacks but burns memory (each thread needs a stack) and triggers costly context switches at high connection counts.

Coroutines with co_await offer a fourth path. A coroutine suspends at an await point, releases the thread back to the event loop, and resumes on the same or a different thread when the I/O completes. The code reads sequentially, and the coroutine's state lives in a coroutine frame (typically heap-allocated) rather than on a dedicated thread stack, so you can have millions of suspended coroutines at a small fraction of the memory cost of millions of threads.

Blocking I/O

// Simple, but burns one
// thread per connection
Socket s = accept();
request = s.read();
response = handle(request);
s.write(response);

1 thread per connection. Doesn't scale.

Callback-based

acceptor.async_accept(
 [](Socket s) {
   s.async_read(
    [=](req) {
      s.async_write(
        handle(req), []{});
    });
 });

Scales, but error handling is brutal.

Coroutines

// Reads like blocking,
// executes like callbacks
Socket s = co_await accept();
auto req = co_await s.read();
co_await s.write(handle(req));

Scales + readable sequential code.

How `co_await` suspends and resumes

When the compiler encounters co_await expr, it obtains an awaiter from expr (often expr itself) and calls three functions on it in sequence. First, await_ready(): if it returns true, the result is already available and the coroutine does not suspend at all. If it returns false, the coroutine is suspended and await_suspend(handle) is called with the handle to the current coroutine. This is where the awaitable stores the continuation; when the I/O completes, it calls handle.resume() to wake the coroutine up. Finally, await_resume() runs on resumption (or immediately, if no suspension took place) and provides the value that the entire co_await expression evaluates to.

// What the compiler generates for:   auto value = co_await some_task;
//
// Step 1: check if result is already ready
if (!some_task.await_ready()) {
    // Step 2: save the continuation and suspend
    //         the awaitable schedules resume when done
    some_task.await_suspend(coroutine_handle<>::from_address(frame));
    // --- coroutine is now suspended; thread is free to do other work ---
    // <resume point>
}
// Step 3: retrieve the value (called after resume)
auto value = some_task.await_resume();

The key insight is that await_suspend receives the coroutine handle and decides what to do with it. For a simple in-memory task this might be an immediate handle.resume(). For a real async I/O operation, it would store the handle in a completion callback registered with the OS (via io_uring, epoll, IOCP, etc.) so the coroutine resumes exactly when the data arrives.

Chaining async coroutines

Coroutines compose naturally: a coroutine can co_await another coroutine that itself uses co_await, building a chain of suspended computations. Each suspension yields the thread back to whatever is driving the execution — the polling loop, the event loop, or the runtime scheduler. This is the foundation of structured async code: individual async operations are small, testable coroutines that compose into larger workflows with no callback nesting.

#include <coroutine>
#include <print>      // std::println (C++23)

// Forward declaration of our task type (built in the promise-type lesson)
template <typename T = void> struct task;

// Leaf coroutine: produces a value asynchronously
task<int> get_answer()
{
    // In a real program, this would co_await a timer or I/O
    // Here we co_return directly for illustration
    co_return 42;
}

// Middle coroutine: awaits another coroutine's result, then uses it
task<> print_answer()
{
    // co_await yields the int result directly, not a task
    int value = co_await get_answer();   // suspends here until get_answer() is done
    std::println("the answer is {}", value);
}

// Driver: executes a coroutine by polling it from non-coroutine context
template <typename T>
void execute(T&& t)
{
    while (!t.is_ready())
        t.resume();
}

int main()
{
    // Pattern 1: get the value from a coroutine directly
    auto t = get_answer();
    execute(t);
    std::println("answer = {}", t.value());  // 42

    // Pattern 2: chain coroutines — print_answer drives get_answer internally
    execute(print_answer());
}

The execute() function is a synchronous driver: it resumes the coroutine repeatedly until it is done. This works for coroutines that suspend and resume synchronously (no OS I/O involved). The important thing to understand is that main() cannot itself be a coroutine — it is one of the functions explicitly excluded by the C++20 standard. So any async chain must be driven from non-coroutine code at its root.

Remember: The following cannot be coroutines: constructors, destructors, constexpr and consteval functions, C-style variadic functions (those declared with a trailing ...), functions with a deduced return type (auto), and main().

The canonical async pattern: a network server

The textbook demonstration of async coroutines is a network accept loop. Without coroutines, accepting connections, reading a request, and writing a response requires either blocking (one thread per client) or a chain of callbacks. With coroutines, the logic collapses to a loop that reads like synchronous code but suspends the coroutine at each I/O boundary — freeing the thread to handle other connections while this one waits for data.

// Pseudo-code: the coroutine reads like blocking I/O
// but each co_await yields the thread back to the event loop

task<> handle_connection(Socket socket)
{
    while (true) {
        auto request  = co_await socket.read();    // suspend until data arrives
        auto response = process(request);
        co_await socket.write(response);           // suspend until write completes
        if (request.is_close()) break;
    }
}

task<> accept_loop(Acceptor acceptor)
{
    while (true) {
        Socket socket = co_await acceptor.accept(); // suspend until connection arrives
        // Spawn handle_connection without co_await — fire and forget
        // (in cppcoro: schedule_on(pool, handle_connection(socket)))
        auto conn = handle_connection(std::move(socket));
        conn.detach();   // or schedule on a thread pool
    }
}

int main()
{
    io_context ctx;       // event loop (Asio, io_uring, etc.)
    Acceptor acceptor { ctx, 443 };
    ctx.run(accept_loop(acceptor));   // drives everything
}

The event loop (io_context in this sketch) is what makes the pattern truly non-blocking. When a coroutine hits co_await acceptor.accept(), it registers a completion callback with the OS and suspends. The event loop processes other ready events (timers, other connections, in-memory tasks) until the OS signals that a new connection has arrived, at which point the event loop resumes the suspended coroutine.

Exception propagation in async coroutines

When an exception propagates out of a coroutine body without being caught, the compiler-generated code calls promise.unhandled_exception(). In the minimal task<T> we built in the promise-type lesson, unhandled_exception() calls std::terminate(), which is fine for illustration but fatal in production. A robust implementation stores the exception with std::current_exception() and rethrows it from await_resume() when the caller retrieves the result. This means exceptions propagate across co_await boundaries exactly as they do in synchronous code: the awaiting coroutine sees the exception as if the co_await expression itself threw.

// Robust promise stores the exception for rethrow
struct promise_base {
    std::exception_ptr exception_;   // stores an uncaught exception

    void unhandled_exception() noexcept
    {
        exception_ = std::current_exception();  // capture, don't terminate
    }
};

// In await_resume: rethrow so the caller sees it
decltype(auto) await_resume()
{
    if (!handle_) throw std::runtime_error{"broken coroutine"};
    if (auto& ep = handle_.promise().exception_; ep)
        std::rethrow_exception(ep);         // propagates to the awaiting co_await
    return handle_.promise().get_value();
}

// Usage: exception propagates naturally across co_await
task<int> might_throw()
{
    throw std::runtime_error{"network error"};
    co_return 42;
}

task<> caller()
{
    try {
        int v = co_await might_throw();   // exception rethrown here
        std::println("{}", v);
    } catch (const std::exception& e) {
        std::println("caught: {}", e.what());  // "caught: network error"
    }
}

Bridging to `main()` — the sync_wait pattern

Since main() cannot be a coroutine, you need a synchronous bridge that blocks the current thread until a coroutine chain completes. The minimal version is the execute() polling loop shown earlier. Production code uses a library primitive — cppcoro calls it sync_wait — that integrates with the scheduler and avoids busy-polling. The pattern looks like:

// Minimal polling driver (no scheduler — busy-waits)
template <typename T>
auto sync_wait_poll(T&& task)
{
    while (!task.is_ready())
        task.resume();          // busy-polls until done
    return task.value();
}

// cppcoro's production version (blocks without busy-waiting)
// #include <cppcoro/sync_wait.hpp>
// auto result = cppcoro::sync_wait(my_coroutine());

int main()
{
    // With the minimal version:
    auto result = sync_wait_poll(get_answer());   // blocks until 42 is ready
    std::println("answer: {}", result);

    // With cppcoro (integrates with the scheduler, no busy-wait):
    // cppcoro::sync_wait(print_answer());
}

Important: The polling loop above spins the CPU while waiting. That is fine for unit tests and examples; it is unacceptable in production code that might wait on real I/O. Production code must use a library that integrates the coroutine resume callback with an OS-level I/O notification mechanism (epoll, kqueue, io_uring, IOCP) so the thread sleeps while waiting.

When async coroutines pay off

| Scenario | Recommendation |
| --- | --- |
| High-concurrency network servers (many connections, I/O-bound) | Coroutines + io_uring/IOCP: millions of suspended coroutines vs millions of threads |
| Composing multiple async operations sequentially | co_await chains: reads sequentially, executes asynchronously |
| Running a single async computation to completion from synchronous code | sync_wait() or a polling execute() loop; use a library for production |
| CPU-bound parallelism | std::async / thread pools are still the right tool; coroutines add no parallelism |
| Simple scripts or tools with one or two async operations | Blocking I/O is fine here; coroutines only pay off at scale |
