Read-Copy Update (RCU)
Lock-free reader synchronization for read-heavy workloads — readers never block; writers retire old data after a guaranteed grace period.
RCU (Read-Copy Update)
since C++26
A synchronization mechanism where readers access shared data inside a cheap, non-blocking read-side critical section (a region of RCU protection, opened by locking an rcu_domain) and load the current version with a single atomic load, while writers publish new versions atomically and defer reclamation of each old version until every read-side critical section that could still reference it has ended.
Overview
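RCU solves the same problem as a reader-writer lock (coordinating readers and writers on shared mutable data) but with fundamentally different performance characteristics. A reader-writer lock makes readers wait while a write is in progress, and even uncontended readers modify shared lock state; RCU readers never wait for writers. The read path is: enter a read-side critical section by locking the rcu_domain (it meets the Lockable requirements, so std::scoped_lock works), perform a memory_order_acquire load of the shared pointer, and dereference it normally. Entering and leaving the critical section never blocks and involves no retry loop; its exact cost is implementation-defined but is intended to be far below that of acquiring a mutex.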
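The read path described above can be sketched without `<rcu>` by templating on the domain type. This is a minimal illustration, not the library's code: `read_snapshot` and `CountingDomain` are hypothetical names, and `CountingDomain` is only a stand-in Lockable so the helper can be exercised on a toolchain that does not ship `<rcu>` yet. With C++26 the `Domain` argument would be a `std::rcu_domain`, for example the one returned by `std::rcu_default_domain()`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <optional>
#include <utility>

// Reads the currently published object inside a read-side critical section
// and returns a copy of whatever `project` extracts, so no pointer into the
// protected object escapes the guarded scope.
template <class Domain, class T, class Project>
auto read_snapshot(Domain& dom, const std::atomic<T*>& slot, Project project)
    -> std::optional<decltype(project(std::declval<const T&>()))> {
    std::lock_guard<Domain> guard(dom);                  // enter the section
    const T* p = slot.load(std::memory_order_acquire);   // the single atomic load
    if (!p) return std::nullopt;
    return project(*p);      // result is built before `guard` is destroyed
}

// Stand-in Lockable (hypothetical) that only counts entries and exits.
struct CountingDomain {
    int depth = 0;     // currently open critical sections
    int entries = 0;   // total number of lock() calls
    void lock() noexcept { ++depth; ++entries; }
    bool try_lock() noexcept { lock(); return true; }
    void unlock() noexcept { assert(depth > 0); --depth; }
};
```

Copying the projected value out, rather than returning the pointer, keeps the caller from accidentally using protected data after the critical section has ended.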
This makes RCU ideal for data that is read millions of times per second but updated infrequently: routing tables, DNS caches, security policy configurations, feature-flag snapshots, subscriber registries.
The cost is asymmetric write complexity. A writer cannot update in place. It must:
- Allocate a new version of the protected object.
- Atomically publish the new pointer (replacing the value held in the shared std::atomic<T*>).
- Retire the old pointer, scheduling its destruction after a grace period elapses.
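A grace period is an interval long enough that every read-side critical section already in progress when it began has ended; equivalently, every thread has passed through at least one quiescent state. Readers that start after the old pointer was replaced cannot obtain it, so only pre-existing readers matter. The RCU domain tracks these transitions and invokes each retired object's deleter only after a complete grace period has elapsed, guaranteeing no thread still holds a reference to the object being destroyed.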
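To make the grace-period rule concrete, here is a deliberately simple toy model. It is not how a real domain is implemented: `ToyDomain` is a hypothetical class that protects everything with a mutex (so its read path is not cheap), and it exists only to show when a retired object becomes reclaimable. It assumes the object was unpublished before `retire` is called.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <set>
#include <utility>
#include <vector>

class ToyDomain {
public:
    // Enter a read-side critical section; returns a ticket for leave().
    std::uint64_t enter() {
        std::lock_guard<std::mutex> g(m_);
        active_.insert(epoch_);
        return epoch_;
    }

    // Leave the critical section and run any deleters that became safe.
    void leave(std::uint64_t ticket) {
        std::vector<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> g(m_);
            auto it = active_.find(ticket);
            assert(it != active_.end());
            active_.erase(it);
            collect(ready);
        }
        for (auto& d : ready) d();   // run deleters outside the mutex
    }

    // Schedule `deleter` to run once every reader that entered before this
    // call has left. Readers that enter later get a newer epoch.
    void retire(std::function<void()> deleter) {
        std::vector<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> g(m_);
            pending_.push_back({epoch_, std::move(deleter)});
            ++epoch_;
            collect(ready);
        }
        for (auto& d : ready) d();
    }

private:
    struct Retired { std::uint64_t epoch; std::function<void()> deleter; };

    // Move every entry whose grace period has elapsed into `out`.
    void collect(std::vector<std::function<void()>>& out) {
        const std::uint64_t oldest = active_.empty() ? epoch_ : *active_.begin();
        std::vector<Retired> keep;
        for (auto& r : pending_) {
            if (r.epoch < oldest) out.push_back(std::move(r.deleter));
            else keep.push_back(std::move(r));
        }
        pending_ = std::move(keep);
    }

    std::mutex m_;
    std::uint64_t epoch_ = 0;
    std::multiset<std::uint64_t> active_;   // entry epochs of open sections
    std::vector<Retired> pending_;
};
```

In this model a reader that entered before `retire` delays the deleter, while a reader that entered afterwards does not, which is exactly the distinction the grace-period definition draws.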
RCU vs hazard pointers: both enable lock-free reads. Hazard pointers require each reader to register the exact pointer it is about to dereference in a shared hazard record before use — per-access write traffic, but immediate per-object protection. RCU requires no per-access registration; instead the domain tracks per-thread quiescent states, making reads cheaper but requiring that read-side critical sections be bounded in duration and not persist across thread sleep or synchronization boundaries.
C++26 provides RCU via the <rcu> header. The implementation strategy (epoch counters, quiescent-state barriers, periodic scans) is left unspecified; only the observable guarantees are standardized.
Syntax
// <rcu> (C++26)
namespace std {
    // Tracks read-side critical sections (regions of RCU protection) and
    // schedules deleters for retired objects. Meets the Lockable
    // requirements, so it works with std::scoped_lock and std::lock_guard.
    class rcu_domain {
    public:
        rcu_domain(const rcu_domain&) = delete;
        rcu_domain& operator=(const rcu_domain&) = delete;
        void lock() noexcept;      // enter a read-side critical section; never blocks
        bool try_lock() noexcept;  // same as lock(); always returns true
        void unlock() noexcept;    // leave the critical section
    };
    // Returns the process-wide default domain.
    rcu_domain& rcu_default_domain() noexcept;
    // CRTP mixin that adds a retire() member to derived types.
    template<class T, class D = std::default_delete<T>>
    class rcu_obj_base {
    public:
        // Schedule deletion of *this after a complete grace period.
        void retire(D d = D{},
                    rcu_domain& domain = rcu_default_domain()) noexcept;
    protected:
        rcu_obj_base() = default;
        rcu_obj_base(const rcu_obj_base&) = default;
        rcu_obj_base& operator=(const rcu_obj_base&) = default;
    };
    // Retire p on domain: call d(p) after a complete grace period.
    // Non-blocking: returns before deletion occurs.
    template<class T, class D = std::default_delete<T>>
    void rcu_retire(T* p, D d = D{},
                    rcu_domain& domain = rcu_default_domain());
    // Block the caller until every read-side critical section on domain that
    // was open at the time of the call has ended. Does not wait for deleters.
    void rcu_synchronize(rcu_domain& domain = rcu_default_domain()) noexcept;
    // Block until every deleter scheduled on domain before the call has run.
    void rcu_barrier(rcu_domain& domain = rcu_default_domain()) noexcept;
} // namespace std
rcu_retire is asynchronous: it schedules the deleter and returns immediately. rcu_synchronize blocks until every read-side critical section that was open when it was called has ended, so after it returns the caller may safely reclaim any object it unpublished beforehand; it does not guarantee that deleters scheduled through retire() or rcu_retire() have already run. rcu_barrier is the complement: it blocks until every deleter scheduled before the call has been invoked, and does not by itself imply a grace period.
Examples
Read-heavy configuration snapshot
#include <rcu> // C++26
#include <atomic>
#include <cstdint>
#include <mutex> // std::scoped_lock
#include <string>
#include <string_view>
#include <utility>

// Assumed to be provided elsewhere by the application.
void connect_and_send(const std::string& host, std::uint16_t port,
                      bool tls, std::string_view body);

struct ProxyConfig : std::rcu_obj_base<ProxyConfig> { // C++26
    std::string upstream_host;
    std::uint16_t upstream_port{443};
    int timeout_ms{5000};
    bool tls_enabled{true};
};

class ProxyRouter {
public:
    // Read path: a single acquire-load. No blocking, no allocation.
    // Call only inside a read-side critical section (see forward_request)
    // and do not use the pointer after that section ends.
    const ProxyConfig* config() const noexcept {
        return config_.load(std::memory_order_acquire);
    }
    // Write path: copy-update-swap-retire.
    void reconfigure(ProxyConfig next) {
        auto* incoming = new ProxyConfig(std::move(next));
        ProxyConfig* old = config_.exchange(incoming,
                                            std::memory_order_acq_rel);
        if (old) {
            old->retire(); // C++26: deleted after a grace period
        }
    }
    // Precondition: no thread is still reading through this router.
    ~ProxyRouter() {
        delete config_.exchange(nullptr, std::memory_order_acq_rel);
        std::rcu_barrier(); // C++26: wait for deleters scheduled by reconfigure()
    }
private:
    std::atomic<ProxyConfig*> config_{nullptr};
};

// Hot path, called from many threads simultaneously.
void forward_request(const ProxyRouter& router,
                     std::string_view request_body) {
    std::string host;
    std::uint16_t port{};
    bool tls{};
    {
        // Read-side critical section: cfg is valid only inside this block.
        std::scoped_lock guard(std::rcu_default_domain()); // C++26
        const ProxyConfig* cfg = router.config();
        if (!cfg) return;
        host = cfg->upstream_host;
        port = cfg->upstream_port;
        tls = cfg->tls_enabled;
    }
    // Blocking I/O happens outside the critical section.
    connect_and_send(host, port, tls, request_body);
}
Synchronous reclaim with rcu_synchronize
Use rcu_synchronize when the write side needs a hard guarantee before proceeding — for example, when the old object owns resources that must be released in order:
// audit_log is assumed to be an application-provided helper.
void ordered_reconfigure(ProxyRouter& router, ProxyConfig next) {
    auto* incoming = new ProxyConfig(std::move(next));
    // Swap atomically so new readers see `incoming` immediately.
    // swap_raw is your own atomic exchange; ProxyRouter above does not define it.
    ProxyConfig* old = router.swap_raw(incoming);
    if (!old) return; // nothing was published before
    // Block until every reader that could have loaded `old` has left its
    // read-side critical section.
    std::rcu_synchronize(); // C++26
    // Safe to inspect and clean up `old` here before releasing memory.
    audit_log("retired config", old->upstream_host);
    delete old;
}
Retiring types that cannot inherit rcu_obj_base
Types you cannot modify, such as third-party structs or C types, cannot be made to inherit from rcu_obj_base. Use the free function rcu_retire with a custom deleter:
#include <rcu> // C++26
#include <atomic>
#include <openssl/ssl.h> // SSL_CTX, SSL_CTX_free

struct ssl_ctx_wrapper {
    SSL_CTX* ctx{};
    int generation{};
};

std::atomic<ssl_ctx_wrapper*> g_ssl_ctx{nullptr};

void rotate_ssl_context(SSL_CTX* new_ctx, int gen) {
    auto* next = new ssl_ctx_wrapper{new_ctx, gen};
    ssl_ctx_wrapper* old = g_ssl_ctx.exchange(next,
                                              std::memory_order_acq_rel);
    if (old) {
        // C++26: free SSL_CTX, then delete the wrapper, after a grace period
        std::rcu_retire(old, [](ssl_ctx_wrapper* p) {
            SSL_CTX_free(p->ctx);
            delete p;
        });
    }
}
Dedicated domain for isolated subsystems
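#include <rcu> // C++26
#include <memory> // std::default_delete
// Assumption: the implementation allows user code to construct additional
// rcu_domain objects. The C++26 synopsis declares no public constructor,
// so portable code can only obtain a domain from std::rcu_default_domain().
// SslConfig is assumed to be a complete type defined elsewhere.
//
// A separate domain prevents TLS-subsystem retirements from delaying
// grace periods in the routing subsystem and vice versa.
inline std::rcu_domain& tls_rcu_domain() {
    static std::rcu_domain d; // implementation extension, see above
    return d;
}

void retire_ssl_config(SslConfig* old) {
    std::rcu_retire(old, std::default_delete<SslConfig>{}, tls_rcu_domain());
}

void flush_tls_retirements() {
    std::rcu_barrier(tls_rcu_domain()); // C++26
}
Best Practices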
Keep read-side critical sections short. A grace period cannot complete until every thread exits its current critical section. Long-running work inside a reader — blocking I/O, sleeping, waiting on a condition variable — stalls reclamation and causes retired objects to accumulate without bound.
Never use an RCU-protected pointer after leaving the read-side critical section in which it was loaded. Storing config() in a member variable and using it ten milliseconds later, after the guard on the domain has been released, breaks the safety contract because the object may already have been reclaimed. Obtain the pointer fresh per logical operation, confine its use to that critical section, and copy out any values needed afterwards.
Prefer retire() over rcu_synchronize() for throughput. rcu_synchronize blocks the writing thread until all threads quiesce — under high reader concurrency this can mean milliseconds of stall. Reserve it for cases requiring ordered cleanup. For routine pointer rotation, retire() is non-blocking and lets the domain batch reclamation efficiently.
Call rcu_barrier before destruction. Before destroying any data structure that owns RCU-protected objects, call std::rcu_barrier() on the relevant domain. Without it, domain-internal threads may invoke deleters that reference memory you have already freed.
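Partition heavy workloads with dedicated domains where available. The default domain is a process-wide singleton. A subsystem that retires thousands of objects per second can delay reclamation for unrelated subsystems sharing that domain. Every function in <rcu> takes an rcu_domain& parameter so that work can be partitioned, but the C++26 synopsis specifies no public constructor for rcu_domain, so a per-subsystem domain is only an option if your implementation provides a way to create one.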
Always publish through std::atomic. Updating the shared pointer via a plain (non-atomic) store is undefined behaviour under the C++ memory model regardless of hardware guarantees. Use memory_order_release for the publish store and memory_order_acquire for the reader load, or memory_order_acq_rel for atomic exchange on the write side.
Common Pitfalls
Dereferencing a retired pointer. After calling retire() or rcu_retire(), the pointer is owned by the domain. Any subsequent access — including in the retiring thread — is undefined behaviour. Treat retirement as an irrevocable ownership transfer.
Mutating data through a reader's raw pointer. The reader receives (conceptually) a const view of the published object. Writing through it is a data race with every other reader that is concurrently reading the same version, and the change is silently lost when the writer publishes a replacement built from its own copy. Always go through a full copy-update-retire cycle.
Assuming rcu_synchronize is free. On systems with many threads or threads in lengthy critical sections, rcu_synchronize may block for substantial wall time. It is not a lightweight fence; budget for it to be the bottleneck of any write-path that calls it.
Forgetting rcu_barrier at shutdown. The domain may schedule deleter invocations on background threads. If the program exits or a shared library unloads while callbacks are still pending, those callbacks fire into unmapped memory. rcu_barrier drains the queue before you proceed.
Allocating retired objects at unbounded rates. RCU defers deletion; under a sustained write storm, retired-but-not-yet-reclaimed objects accumulate in memory. Rate-limit writers or intersperse rcu_barrier calls to bound live memory. rcu_synchronize alone is not enough here, because it waits for readers but not for the scheduled deleters to run.
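One way to bound the backlog is to count retirements and force a drain every N of them. The following is a sketch under stated assumptions, not a prescribed pattern: `BoundedRetirer` is a hypothetical helper intended for a single writer thread, and its two callbacks are stand-ins. With C++26, the retire callback might wrap `std::rcu_retire` and the drain callback might call `std::rcu_barrier`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Counts retirements and forces a drain once `limit` have been scheduled
// since the last drain. The count is an upper bound on the real backlog,
// because the domain may already have reclaimed some objects on its own.
class BoundedRetirer {
public:
    BoundedRetirer(std::size_t limit, std::function<void()> drain_fn)
        : limit_(limit), drain_(std::move(drain_fn)) {
        assert(limit_ > 0);
    }

    // `retire_fn` schedules exactly one deferred deletion.
    template <class RetireFn>
    void retire(RetireFn&& retire_fn) {
        std::forward<RetireFn>(retire_fn)();
        if (++outstanding_ >= limit_) {
            drain_();            // expected to block until scheduled deleters ran
            outstanding_ = 0;
        }
    }

    std::size_t outstanding() const noexcept { return outstanding_; }

private:
    std::size_t limit_;
    std::function<void()> drain_;
    std::size_t outstanding_ = 0;
};
```

The trade-off is that every Nth write pays the full blocking cost of the drain, so N sets the balance between peak memory and worst-case write latency.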
See Also
std::hazard_pointer: per-pointer lock-free reclamation; higher read overhead but tighter reclamation latency
std::atomic: required for all RCU pointer publish/subscribe operations
std::jthread: cooperative cancellation model that pairs naturally with RCU-driven hot-reload patterns