Skip to content
C++
Library
since C++98
Intermediate

Scan Operations

Prefix-sum algorithms in <numeric>: partial_sum (C++98) and the parallel-capable inclusive_scan, exclusive_scan, and transform variants added in C++17.

Scan Operationsince C++98

A scan algorithm maps an input sequence to an output sequence where each output element is the accumulated result of all elements up to and including (inclusive) or strictly before (exclusive) the corresponding input position.

Overview

Scan algorithms live in <numeric> and come in two generations. C++98 introduced std::partial_sum, which computes a strict left-to-right fold at each position. C++17 added four new algorithms β€” std::inclusive_scan, std::exclusive_scan, std::transform_inclusive_scan, and std::transform_exclusive_scan β€” formulated to allow parallel execution.

The key difference between generations is not just parallelism. std::partial_sum applies the binary operation in guaranteed left-to-right order, so the operation need not be associative. The C++17 algorithms require an associative binary operation β€” not necessarily commutative β€” which gives the scheduler freedom to split and merge sub-ranges arbitrarily. Passing a non-associative operation to a C++17 scan with a parallel execution policy is undefined behaviour.

The inclusive/exclusive axis describes what element is counted at position i:

  • Inclusive: output[i] accumulates input[0..i] β€” the current element is included.
  • Exclusive: output[i] accumulates input[0..i-1] β€” the current element is excluded; output[0] takes an explicit init value.

For input {1, 2, 3, 4} with addition and init = 0:

cpp
inclusive: {1, 3, 6, 10}
exclusive: {0, 1, 3, 6}

The exclusive form shifts the scan forward by one slot. This makes it the natural choice for computing per-segment write offsets in parallel algorithms: each worker reads its starts[i] without a separate coordination pass.

Syntax

cpp
#include <numeric>
#include <execution>  // C++17: execution policies

// C++98 β€” sequential; op need not be associative
template<class InIt, class OutIt>
OutIt partial_sum(InIt first, InIt last, OutIt result);

template<class InIt, class OutIt, class BinaryOp>
OutIt partial_sum(InIt first, InIt last, OutIt result, BinaryOp op);

// C++17 β€” inclusive scan; op must be associative
template<class InIt, class OutIt>
OutIt inclusive_scan(InIt first, InIt last, OutIt result);

template<class InIt, class OutIt, class BinaryOp>
OutIt inclusive_scan(InIt first, InIt last, OutIt result, BinaryOp op);

template<class InIt, class OutIt, class BinaryOp, class T>
OutIt inclusive_scan(InIt first, InIt last, OutIt result, BinaryOp op, T init);

// C++17 β€” exclusive scan; init is mandatory
template<class InIt, class OutIt, class T>
OutIt exclusive_scan(InIt first, InIt last, OutIt result, T init);

template<class InIt, class OutIt, class T, class BinaryOp>
OutIt exclusive_scan(InIt first, InIt last, OutIt result, T init, BinaryOp op);

// C++17 β€” transform then inclusive scan
template<class InIt, class OutIt, class BinaryOp, class UnaryOp>
OutIt transform_inclusive_scan(InIt first, InIt last, OutIt result,
                                BinaryOp binary_op, UnaryOp unary_op);

template<class InIt, class OutIt, class BinaryOp, class UnaryOp, class T>
OutIt transform_inclusive_scan(InIt first, InIt last, OutIt result,
                                BinaryOp binary_op, UnaryOp unary_op, T init);

// C++17 β€” transform then exclusive scan; note: init precedes the ops
template<class InIt, class OutIt, class T, class BinaryOp, class UnaryOp>
OutIt transform_exclusive_scan(InIt first, InIt last, OutIt result,
                                T init, BinaryOp binary_op, UnaryOp unary_op);

Every C++17 algorithm has an overload family prefixed with an ExecutionPolicy argument (std::execution::seq, par, par_unseq).

Examples

Segment offset computation

cpp
#include <numeric>
#include <vector>
#include <iostream>

int main() {
    std::vector<int> sizes{3, 1, 4, 1, 5, 9};

    std::vector<int> ends(sizes.size());
    std::inclusive_scan(sizes.begin(), sizes.end(), ends.begin()); // C++17
    // ends: {3, 4, 8, 9, 14, 23}

    std::vector<int> starts(sizes.size());
    std::exclusive_scan(sizes.begin(), sizes.end(), starts.begin(), 0); // C++17
    // starts: {0, 3, 4, 8, 9, 14}

    for (std::size_t i = 0; i < sizes.size(); ++i)
        std::cout << "[" << starts[i] << ", " << ends[i] << ")\n";
}

The exclusive/inclusive pair together provides both endpoints of each half-open interval in a single pass β€” the canonical pattern for partitioning a flat buffer among variable-length segments.

Running product with partial_sum

cpp
#include <numeric>
#include <vector>
#include <functional>

// C++98: compute factorials in-place β€” [1!, 2!, 3!, 4!, 5!]
std::vector<int> v{1, 2, 3, 4, 5};
std::partial_sum(v.begin(), v.end(), v.begin(), std::multiplies<int>{});
// v: {1, 2, 6, 24, 120}

Because std::partial_sum is strictly sequential, it is safe for non-associative operations. Avoid substituting inclusive_scan here without verifying associativity.

Parallel prefix sum on large data

cpp
#include <numeric>
#include <execution>  // C++17
#include <vector>

std::vector<double> data(10'000'000);
// ... fill data ...

std::vector<double> prefix(data.size());
std::inclusive_scan(
    std::execution::par_unseq,   // C++17: parallel + vectorisable
    data.begin(), data.end(),
    prefix.begin()
);

With par_unseq, the runtime may reorder and interleave operations freely. For IEEE 754 doubles, this means results can diverge from a sequential scan at the ULP level due to floating-point non-associativity.

Fused transform scan: cumulative energy

cpp
#include <numeric>
#include <vector>

std::vector<double> signal{1.0, 2.0, 3.0, 4.0, 5.0};
std::vector<double> energy(signal.size());

// C++17: square each element, then accumulate β€” no intermediate buffer
std::transform_inclusive_scan(
    signal.begin(), signal.end(),
    energy.begin(),
    std::plus<double>{},                      // accumulation op
    [](double x) { return x * x; }           // per-element transform
);
// energy: {1, 5, 14, 30, 55}

Fusing the transform with the scan avoids a temporary allocation and improves cache utilisation relative to a separate std::transform followed by std::inclusive_scan.

Exclusive scan with a non-additive identity

cpp
#include <numeric>
#include <limits>
#include <vector>
#include <algorithm>

// Prefix-max: output[i] = max of input[0..i-1]
std::vector<int> vals{3, 1, 4, 1, 5, 9, 2, 6};
std::vector<int> prefix_max(vals.size());

std::exclusive_scan(                          // C++17
    vals.begin(), vals.end(),
    prefix_max.begin(),
    std::numeric_limits<int>::min(),          // identity for max
    [](int a, int b) { return std::max(a, b); }
);
// prefix_max: {INT_MIN, 3, 3, 4, 4, 5, 9, 9}

Best Practices

Prefer the C++17 algorithms over partial_sum for new code. inclusive_scan and exclusive_scan make the inclusive/exclusive distinction explicit, support execution policies, and compose cleanly with the rest of the C++17 parallel algorithm suite. Reserve partial_sum only when the operation is non-associative or when you must target C++14 and earlier.

Always supply a correct identity element for non-additive operations. For exclusive_scan and transform_exclusive_scan, init seeds the very first output slot. An incorrect identity silently corrupts every output element. For multiplication use 1; for bitwise AND use ~0; for max use std::numeric_limits<T>::min().

In-place writes are legal. All scan algorithms allow result == first when the range is contiguous, which avoids a second allocation:

cpp
std::inclusive_scan(v.begin(), v.end(), v.begin()); // C++17, in-place: valid

Match the execution policy to your operation's cost. par_unseq pays off on large contiguous ranges with cheap, branchless binary ops. For operations with significant per-element overhead (virtual calls, heap allocation), par without unseq is often the better choice.

Common Pitfalls

exclusive_scan has no overload without init. Unlike inclusive_scan, every exclusive_scan overload requires an explicit initial value. Failing to provide it or passing the wrong type is a compile error; passing a semantically wrong value is a silent correctness bug.

partial_sum and inclusive_scan are not interchangeable for floating-point. On a conforming implementation both are correct, but they are not required to produce the same bit pattern. More critically, passing a non-associative op to inclusive_scan with par or par_unseq is undefined behaviour β€” the standard is explicit on this.

Argument order diverges between transform_inclusive_scan and transform_exclusive_scan. In transform_inclusive_scan, init is the last argument (and optional). In transform_exclusive_scan, init precedes both the binary and unary operations. This asymmetry is a persistent source of compile errors; always consult the signature before writing.

Parallelism does not guarantee reproducibility. A parallel scan over double may produce results that differ from a sequential one by more than floating-point rounding noise, depending on the scheduler's choice of merge order. For reproducibility-critical domains, pass std::execution::seq explicitly.

See Also

  • std::reduce (C++17) β€” parallel-capable total reduction without prefix output
  • std::transform_reduce (C++17) β€” fused transform and fold over one or two ranges
  • std::adjacent_difference (C++98) β€” inverse operation; produces per-element differences from a prefix-sum sequence
  • std::inner_product (C++98) β€” dot product and generalised fold over two ranges simultaneously