Skip to content
C++
Idiom
since C++98
Advanced

Expression Templates

Encode arithmetic expression trees as nested types so chained operations compile to a single loop with no intermediate heap allocations.

Expression Templatessince C++98

A metaprogramming technique that represents arithmetic sub-expressions as nested template types, deferring element-wise evaluation until assignment to eliminate heap allocations for intermediate results.

Overview

When you write result = a + b + c with a naive vector type, operator+ is called twice. The first call allocates storage, fills it, and returns a temporary t1 = a + b. The second call repeats this for result = t1 + c. For an n-element vector you pay two heap allocations, three full passes, and two destructions — where a hand-written loop would need one pass and zero allocations.

Expression templates solve this by making operator+ return a lightweight expression node — a struct holding const-references to its operands — instead of a materialised result. The type of a + b + c becomes Add<Add<Vec, Vec>, Vec>. No storage is allocated. The single element-wise loop only runs when the expression is assigned to a Vec. A good optimiser sees through the entire call chain and emits code identical to the hand-written loop.

This approach dates to the C++98/03 era. Every serious numerical library uses a variant of it: Eigen, Blaze, Armadillo, and xtensor all build their zero-overhead abstraction on this foundation.

Syntax

The canonical structure uses CRTP (C++98) to give every expression node a common base, enabling a single constrained operator= on the concrete type.

cpp
#include <vector>
#include <cstddef>
#include <initializer_list>

// C++98: CRTP expression base
template<typename Derived>
struct VecBase {
    const Derived& derived() const { return static_cast<const Derived&>(*this); }
    double      operator[](std::size_t i) const { return derived()[i]; }
    std::size_t size()                    const { return derived().size(); }
};

// Expression node: element-wise addition
template<typename L, typename R>
struct Add : VecBase<Add<L, R>> {
    const L& lhs;
    const R& rhs;
    Add(const L& l, const R& r) : lhs(l), rhs(r) {}
    double      operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size()                    const { return lhs.size(); }
};

// Expression node: element-wise multiplication
template<typename L, typename R>
struct Mul : VecBase<Mul<L, R>> {
    const L& lhs;
    const R& rhs;
    Mul(const L& l, const R& r) : lhs(l), rhs(r) {}
    double      operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
    std::size_t size()                    const { return lhs.size(); }
};

// Expression node: scalar multiplication
template<typename E>
struct Scale : VecBase<Scale<E>> {
    double   scalar;
    const E& expr;
    Scale(double s, const E& e) : scalar(s), expr(e) {}
    double      operator[](std::size_t i) const { return scalar * expr[i]; }
    std::size_t size()                    const { return expr.size(); }
};

// Concrete storage type
class Vec : public VecBase<Vec> {
    std::vector<double> data_;
public:
    explicit Vec(std::size_t n, double fill = 0.0) : data_(n, fill) {}
    Vec(std::initializer_list<double> il) : data_(il) {}

    double       operator[](std::size_t i) const { return data_[i]; }
    double&      operator[](std::size_t i)       { return data_[i]; }
    std::size_t  size()                    const { return data_.size(); }

    // Assign from any expression — one loop, no temporaries  (C++98)
    template<typename Expr>
    Vec& operator=(const VecBase<Expr>& rhs) {
        const Expr& e = static_cast<const Expr&>(rhs);
        for (std::size_t i = 0; i < data_.size(); ++i)
            data_[i] = e[i];
        return *this;
    }

    // Construct from expression — size deduced from the tree  (C++98)
    template<typename Expr>
    explicit Vec(const VecBase<Expr>& rhs) : data_(rhs.size()) { *this = rhs; }
};

// Operators return expression nodes — not Vec
template<typename L, typename R>
Add<L, R>   operator+(const VecBase<L>& l, const VecBase<R>& r) { return {l.derived(), r.derived()}; }

template<typename L, typename R>
Mul<L, R>   operator*(const VecBase<L>& l, const VecBase<R>& r) { return {l.derived(), r.derived()}; }

template<typename E>
Scale<E>    operator*(double s, const VecBase<E>& e) { return {s, e.derived()}; }
template<typename E>
Scale<E>    operator*(const VecBase<E>& e, double s) { return {s, e.derived()}; }

Examples

Single-pass evaluation

cpp
Vec a{1.0, 2.0, 3.0};
Vec b{4.0, 5.0, 6.0};
Vec c{7.0, 8.0, 9.0};

// Deduced type: Add<Add<Vec,Vec>, Vec>  — zero allocations
Vec result(a.size());
result = a + b + c;          // one loop: result[i] = a[i] + b[i] + c[i]

// Mixed: 2*a[i] + b[i]*c[i] — still one loop
Vec result2(2.0 * a + b * c);

At -O2 the compiler inlines every operator[] call in the tree; the emitted assembly is indistinguishable from a hand-written loop.

Eigen: expression templates in the wild

Eigen has used this pattern since its first release and extends it to matrix products, reductions, and block operations.

cpp
#include <Eigen/Dense>

Eigen::VectorXd a(1000), b(1000), c(1000);
// fills omitted

// Type: Eigen::CwiseBinaryOp<...> — no allocation until assignment
Eigen::VectorXd r = a + b + c;

// .eval() forces immediate materialisation — needed before aliasing ops
auto t = (a + b).eval();

// Avoid a defensive copy for non-aliasing matrix products  (Eigen-specific API)
Eigen::MatrixXd A(100,100), B(100,100), C(100,100);
C.noalias() += A * B;   // skips the default temporary that guards against aliasing

C++20 Simplifications

C++20 concepts (C++20) replace CRTP as the constraint mechanism. The expression node types are unchanged; only the constraint syntax improves.

cpp
// C++20: concept models the VecBase protocol without inheritance
template<typename T>
concept VecExpr = requires(const T& e, std::size_t i) {
    { e[i]     } -> std::convertible_to<double>;
    { e.size() } -> std::convertible_to<std::size_t>;
};

// Abbreviated function template — C++20
VecExpr auto operator+(const VecExpr auto& l, const VecExpr auto& r) {
    return Add{l, r};   // CTAD deduces Add<L, R>  — C++17
}

// Vec::operator= with concept constraint instead of CRTP base parameter
template<VecExpr Expr>
Vec& Vec::operator=(const Expr& rhs) {
    for (std::size_t i = 0; i < data_.size(); ++i)
        data_[i] = rhs[i];
    return *this;
}

When a type mismatch occurs, the compiler reports which VecExpr requirement the offending type fails to satisfy — far more readable than the multi-screen instantiation backtraces produced by unconstrained C++98/11 templates.

Best Practices

Expose .eval() explicitly. Any expression type should provide a method that forces materialisation:

cpp
template<typename Derived>
struct VecBase {
    Vec eval() const;   // defined after Vec is complete
};

Callers use (a + b).eval() to state intent unambiguously — helpful at API boundaries and before storing an expression in a named variable.

Keep the element type generic. Replace double with a template parameter T on every node and Vec. The expression tree pattern is type-agnostic; locking it to double rules out float, int, and complex numbers for no reason.

Assert the CRTP invariant in C++17. Before C++20, add a static_assert in operator= to produce a readable diagnostic instead of a wall of substitution failures:

cpp
template<typename Expr>
Vec& operator=(const VecBase<Expr>& rhs) {
    static_assert(std::is_base_of_v<VecBase<Expr>, Expr>,  // C++17
                  "rhs must derive from VecBase<Expr>");
    ...
}

Common Pitfalls

Dangling references

Every expression node stores const& to its operands. If an operand is a temporary returned by value, it is destroyed before the assignment loop runs:

cpp
// UB: temporaries from factory() are destroyed before Vec's constructor loop
Vec bad(makeVec(1) + makeVec(2));

// Safe: materialise each operand first
Vec tmp1 = makeVec(1);
Vec tmp2 = makeVec(2);
Vec good(tmp1 + tmp2);

This is the most common production bug with expression templates. Libraries that need to support temporaries store operands by value when the operand is an rvalue — a technique Eigen applies via internal::ref_selector.

Aliasing in place-wise operations

For element-wise nodes each output index reads only its own input index, so v = v + v is safe: v[i] is read before it is overwritten. However, any operation where output index i depends on multiple input indices — matrix-matrix multiply, reductions, convolutions — requires a defensive copy unless the library can prove the expression is alias-free. This is why Eigen defaults to a temporary for C = A * B and requires .noalias() to opt out.

Template instantiation depth

Deep expression trees like a + b + c + d + ... + z produce deeply nested types that can exceed default template instantiation depth limits on some compilers (-ftemplate-depth on GCC/Clang). Real libraries keep trees shallow with N-ary expression nodes or cap expression depth via traits.

See Also