Expression Templates
Encode arithmetic expression trees as nested types so chained operations compile to a single loop with no intermediate heap allocations.
Expression Templatessince C++98A metaprogramming technique that represents arithmetic sub-expressions as nested template types, deferring element-wise evaluation until assignment to eliminate heap allocations for intermediate results.
Overview
When you write result = a + b + c with a naive vector type, operator+ is called twice. The first call allocates storage, fills it, and returns a temporary t1 = a + b. The second call repeats this for result = t1 + c. For an n-element vector you pay two heap allocations, three full passes, and two destructions — where a hand-written loop would need one pass and zero allocations.
Expression templates solve this by making operator+ return a lightweight expression node — a struct holding const-references to its operands — instead of a materialised result. The type of a + b + c becomes Add<Add<Vec, Vec>, Vec>. No storage is allocated. The single element-wise loop only runs when the expression is assigned to a Vec. A good optimiser sees through the entire call chain and emits code identical to the hand-written loop.
This approach dates to the C++98/03 era. Every serious numerical library uses a variant of it: Eigen, Blaze, Armadillo, and xtensor all build their zero-overhead abstraction on this foundation.
Syntax
The canonical structure uses CRTP (C++98) to give every expression node a common base, enabling a single constrained operator= on the concrete type.
#include <vector>
#include <cstddef>
#include <initializer_list>
// C++98: CRTP expression base
template<typename Derived>
struct VecBase {
const Derived& derived() const { return static_cast<const Derived&>(*this); }
double operator[](std::size_t i) const { return derived()[i]; }
std::size_t size() const { return derived().size(); }
};
// Expression node: element-wise addition
template<typename L, typename R>
struct Add : VecBase<Add<L, R>> {
const L& lhs;
const R& rhs;
Add(const L& l, const R& r) : lhs(l), rhs(r) {}
double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
std::size_t size() const { return lhs.size(); }
};
// Expression node: element-wise multiplication
template<typename L, typename R>
struct Mul : VecBase<Mul<L, R>> {
const L& lhs;
const R& rhs;
Mul(const L& l, const R& r) : lhs(l), rhs(r) {}
double operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
std::size_t size() const { return lhs.size(); }
};
// Expression node: scalar multiplication
template<typename E>
struct Scale : VecBase<Scale<E>> {
double scalar;
const E& expr;
Scale(double s, const E& e) : scalar(s), expr(e) {}
double operator[](std::size_t i) const { return scalar * expr[i]; }
std::size_t size() const { return expr.size(); }
};
// Concrete storage type
class Vec : public VecBase<Vec> {
std::vector<double> data_;
public:
explicit Vec(std::size_t n, double fill = 0.0) : data_(n, fill) {}
Vec(std::initializer_list<double> il) : data_(il) {}
double operator[](std::size_t i) const { return data_[i]; }
double& operator[](std::size_t i) { return data_[i]; }
std::size_t size() const { return data_.size(); }
// Assign from any expression — one loop, no temporaries (C++98)
template<typename Expr>
Vec& operator=(const VecBase<Expr>& rhs) {
const Expr& e = static_cast<const Expr&>(rhs);
for (std::size_t i = 0; i < data_.size(); ++i)
data_[i] = e[i];
return *this;
}
// Construct from expression — size deduced from the tree (C++98)
template<typename Expr>
explicit Vec(const VecBase<Expr>& rhs) : data_(rhs.size()) { *this = rhs; }
};
// Operators return expression nodes — not Vec
template<typename L, typename R>
Add<L, R> operator+(const VecBase<L>& l, const VecBase<R>& r) { return {l.derived(), r.derived()}; }
template<typename L, typename R>
Mul<L, R> operator*(const VecBase<L>& l, const VecBase<R>& r) { return {l.derived(), r.derived()}; }
template<typename E>
Scale<E> operator*(double s, const VecBase<E>& e) { return {s, e.derived()}; }
template<typename E>
Scale<E> operator*(const VecBase<E>& e, double s) { return {s, e.derived()}; }Examples
Single-pass evaluation
Vec a{1.0, 2.0, 3.0};
Vec b{4.0, 5.0, 6.0};
Vec c{7.0, 8.0, 9.0};
// Deduced type: Add<Add<Vec,Vec>, Vec> — zero allocations
Vec result(a.size());
result = a + b + c; // one loop: result[i] = a[i] + b[i] + c[i]
// Mixed: 2*a[i] + b[i]*c[i] — still one loop
Vec result2(2.0 * a + b * c);At -O2 the compiler inlines every operator[] call in the tree; the emitted assembly is indistinguishable from a hand-written loop.
Eigen: expression templates in the wild
Eigen has used this pattern since its first release and extends it to matrix products, reductions, and block operations.
#include <Eigen/Dense>
Eigen::VectorXd a(1000), b(1000), c(1000);
// fills omitted
// Type: Eigen::CwiseBinaryOp<...> — no allocation until assignment
Eigen::VectorXd r = a + b + c;
// .eval() forces immediate materialisation — needed before aliasing ops
auto t = (a + b).eval();
// Avoid a defensive copy for non-aliasing matrix products (Eigen-specific API)
Eigen::MatrixXd A(100,100), B(100,100), C(100,100);
C.noalias() += A * B; // skips the default temporary that guards against aliasingC++20 Simplifications
C++20 concepts (C++20) replace CRTP as the constraint mechanism. The expression node types are unchanged; only the constraint syntax improves.
// C++20: concept models the VecBase protocol without inheritance
template<typename T>
concept VecExpr = requires(const T& e, std::size_t i) {
{ e[i] } -> std::convertible_to<double>;
{ e.size() } -> std::convertible_to<std::size_t>;
};
// Abbreviated function template — C++20
VecExpr auto operator+(const VecExpr auto& l, const VecExpr auto& r) {
return Add{l, r}; // CTAD deduces Add<L, R> — C++17
}
// Vec::operator= with concept constraint instead of CRTP base parameter
template<VecExpr Expr>
Vec& Vec::operator=(const Expr& rhs) {
for (std::size_t i = 0; i < data_.size(); ++i)
data_[i] = rhs[i];
return *this;
}When a type mismatch occurs, the compiler reports which VecExpr requirement the offending type fails to satisfy — far more readable than the multi-screen instantiation backtraces produced by unconstrained C++98/11 templates.
Best Practices
Expose .eval() explicitly. Any expression type should provide a method that forces materialisation:
template<typename Derived>
struct VecBase {
Vec eval() const; // defined after Vec is complete
};Callers use (a + b).eval() to state intent unambiguously — helpful at API boundaries and before storing an expression in a named variable.
Keep the element type generic. Replace double with a template parameter T on every node and Vec. The expression tree pattern is type-agnostic; locking it to double rules out float, int, and complex numbers for no reason.
Assert the CRTP invariant in C++17. Before C++20, add a static_assert in operator= to produce a readable diagnostic instead of a wall of substitution failures:
template<typename Expr>
Vec& operator=(const VecBase<Expr>& rhs) {
static_assert(std::is_base_of_v<VecBase<Expr>, Expr>, // C++17
"rhs must derive from VecBase<Expr>");
...
}Common Pitfalls
Dangling references
Every expression node stores const& to its operands. If an operand is a temporary returned by value, it is destroyed before the assignment loop runs:
// UB: temporaries from factory() are destroyed before Vec's constructor loop
Vec bad(makeVec(1) + makeVec(2));
// Safe: materialise each operand first
Vec tmp1 = makeVec(1);
Vec tmp2 = makeVec(2);
Vec good(tmp1 + tmp2);This is the most common production bug with expression templates. Libraries that need to support temporaries store operands by value when the operand is an rvalue — a technique Eigen applies via internal::ref_selector.
Aliasing in place-wise operations
For element-wise nodes each output index reads only its own input index, so v = v + v is safe: v[i] is read before it is overwritten. However, any operation where output index i depends on multiple input indices — matrix-matrix multiply, reductions, convolutions — requires a defensive copy unless the library can prove the expression is alias-free. This is why Eigen defaults to a temporary for C = A * B and requires .noalias() to opt out.
Template instantiation depth
Deep expression trees like a + b + c + d + ... + z produce deeply nested types that can exceed default template instantiation depth limits on some compilers (-ftemplate-depth on GCC/Clang). Real libraries keep trees shallow with N-ary expression nodes or cap expression depth via traits.
See Also
- CRTP — the static polymorphism mechanism the classic approach relies on
- Concepts (C++20) — the modern alternative to CRTP constraints
- Operator Overloading — prerequisite for building expression DSLs
- Template Metaprogramming — broader context for compile-time type computations