Scan Operations
Prefix-sum algorithms in <numeric>: partial_sum (C++98) and the parallel-capable inclusive_scan, exclusive_scan, and transform variants added in C++17.
Scan Operationsince C++98A scan algorithm maps an input sequence to an output sequence where each output element is the accumulated result of all elements up to and including (inclusive) or strictly before (exclusive) the corresponding input position.
Overview
Scan algorithms live in <numeric> and come in two generations. C++98 introduced std::partial_sum, which computes a strict left-to-right fold at each position. C++17 added four new algorithms β std::inclusive_scan, std::exclusive_scan, std::transform_inclusive_scan, and std::transform_exclusive_scan β formulated to allow parallel execution.
The key difference between generations is not just parallelism. std::partial_sum applies the binary operation in guaranteed left-to-right order, so the operation need not be associative. The C++17 algorithms require an associative binary operation β not necessarily commutative β which gives the scheduler freedom to split and merge sub-ranges arbitrarily. Passing a non-associative operation to a C++17 scan with a parallel execution policy is undefined behaviour.
The inclusive/exclusive axis describes what element is counted at position i:
- Inclusive:
output[i]accumulatesinput[0..i]β the current element is included. - Exclusive:
output[i]accumulatesinput[0..i-1]β the current element is excluded;output[0]takes an explicitinitvalue.
For input {1, 2, 3, 4} with addition and init = 0:
inclusive: {1, 3, 6, 10}
exclusive: {0, 1, 3, 6}The exclusive form shifts the scan forward by one slot. This makes it the natural choice for computing per-segment write offsets in parallel algorithms: each worker reads its starts[i] without a separate coordination pass.
Syntax
#include <numeric>
#include <execution> // C++17: execution policies
// C++98 β sequential; op need not be associative
template<class InIt, class OutIt>
OutIt partial_sum(InIt first, InIt last, OutIt result);
template<class InIt, class OutIt, class BinaryOp>
OutIt partial_sum(InIt first, InIt last, OutIt result, BinaryOp op);
// C++17 β inclusive scan; op must be associative
template<class InIt, class OutIt>
OutIt inclusive_scan(InIt first, InIt last, OutIt result);
template<class InIt, class OutIt, class BinaryOp>
OutIt inclusive_scan(InIt first, InIt last, OutIt result, BinaryOp op);
template<class InIt, class OutIt, class BinaryOp, class T>
OutIt inclusive_scan(InIt first, InIt last, OutIt result, BinaryOp op, T init);
// C++17 β exclusive scan; init is mandatory
template<class InIt, class OutIt, class T>
OutIt exclusive_scan(InIt first, InIt last, OutIt result, T init);
template<class InIt, class OutIt, class T, class BinaryOp>
OutIt exclusive_scan(InIt first, InIt last, OutIt result, T init, BinaryOp op);
// C++17 β transform then inclusive scan
template<class InIt, class OutIt, class BinaryOp, class UnaryOp>
OutIt transform_inclusive_scan(InIt first, InIt last, OutIt result,
BinaryOp binary_op, UnaryOp unary_op);
template<class InIt, class OutIt, class BinaryOp, class UnaryOp, class T>
OutIt transform_inclusive_scan(InIt first, InIt last, OutIt result,
BinaryOp binary_op, UnaryOp unary_op, T init);
// C++17 β transform then exclusive scan; note: init precedes the ops
template<class InIt, class OutIt, class T, class BinaryOp, class UnaryOp>
OutIt transform_exclusive_scan(InIt first, InIt last, OutIt result,
T init, BinaryOp binary_op, UnaryOp unary_op);Every C++17 algorithm has an overload family prefixed with an ExecutionPolicy argument (std::execution::seq, par, par_unseq).
Examples
Segment offset computation
#include <numeric>
#include <vector>
#include <iostream>
int main() {
std::vector<int> sizes{3, 1, 4, 1, 5, 9};
std::vector<int> ends(sizes.size());
std::inclusive_scan(sizes.begin(), sizes.end(), ends.begin()); // C++17
// ends: {3, 4, 8, 9, 14, 23}
std::vector<int> starts(sizes.size());
std::exclusive_scan(sizes.begin(), sizes.end(), starts.begin(), 0); // C++17
// starts: {0, 3, 4, 8, 9, 14}
for (std::size_t i = 0; i < sizes.size(); ++i)
std::cout << "[" << starts[i] << ", " << ends[i] << ")\n";
}The exclusive/inclusive pair together provides both endpoints of each half-open interval in a single pass β the canonical pattern for partitioning a flat buffer among variable-length segments.
Running product with partial_sum
#include <numeric>
#include <vector>
#include <functional>
// C++98: compute factorials in-place β [1!, 2!, 3!, 4!, 5!]
std::vector<int> v{1, 2, 3, 4, 5};
std::partial_sum(v.begin(), v.end(), v.begin(), std::multiplies<int>{});
// v: {1, 2, 6, 24, 120}Because std::partial_sum is strictly sequential, it is safe for non-associative operations. Avoid substituting inclusive_scan here without verifying associativity.
Parallel prefix sum on large data
#include <numeric>
#include <execution> // C++17
#include <vector>
std::vector<double> data(10'000'000);
// ... fill data ...
std::vector<double> prefix(data.size());
std::inclusive_scan(
std::execution::par_unseq, // C++17: parallel + vectorisable
data.begin(), data.end(),
prefix.begin()
);With par_unseq, the runtime may reorder and interleave operations freely. For IEEE 754 doubles, this means results can diverge from a sequential scan at the ULP level due to floating-point non-associativity.
Fused transform scan: cumulative energy
#include <numeric>
#include <vector>
std::vector<double> signal{1.0, 2.0, 3.0, 4.0, 5.0};
std::vector<double> energy(signal.size());
// C++17: square each element, then accumulate β no intermediate buffer
std::transform_inclusive_scan(
signal.begin(), signal.end(),
energy.begin(),
std::plus<double>{}, // accumulation op
[](double x) { return x * x; } // per-element transform
);
// energy: {1, 5, 14, 30, 55}Fusing the transform with the scan avoids a temporary allocation and improves cache utilisation relative to a separate std::transform followed by std::inclusive_scan.
Exclusive scan with a non-additive identity
#include <numeric>
#include <limits>
#include <vector>
#include <algorithm>
// Prefix-max: output[i] = max of input[0..i-1]
std::vector<int> vals{3, 1, 4, 1, 5, 9, 2, 6};
std::vector<int> prefix_max(vals.size());
std::exclusive_scan( // C++17
vals.begin(), vals.end(),
prefix_max.begin(),
std::numeric_limits<int>::min(), // identity for max
[](int a, int b) { return std::max(a, b); }
);
// prefix_max: {INT_MIN, 3, 3, 4, 4, 5, 9, 9}Best Practices
Prefer the C++17 algorithms over partial_sum for new code. inclusive_scan and exclusive_scan make the inclusive/exclusive distinction explicit, support execution policies, and compose cleanly with the rest of the C++17 parallel algorithm suite. Reserve partial_sum only when the operation is non-associative or when you must target C++14 and earlier.
Always supply a correct identity element for non-additive operations. For exclusive_scan and transform_exclusive_scan, init seeds the very first output slot. An incorrect identity silently corrupts every output element. For multiplication use 1; for bitwise AND use ~0; for max use std::numeric_limits<T>::min().
In-place writes are legal. All scan algorithms allow result == first when the range is contiguous, which avoids a second allocation:
std::inclusive_scan(v.begin(), v.end(), v.begin()); // C++17, in-place: validMatch the execution policy to your operation's cost. par_unseq pays off on large contiguous ranges with cheap, branchless binary ops. For operations with significant per-element overhead (virtual calls, heap allocation), par without unseq is often the better choice.
Common Pitfalls
exclusive_scan has no overload without init. Unlike inclusive_scan, every exclusive_scan overload requires an explicit initial value. Failing to provide it or passing the wrong type is a compile error; passing a semantically wrong value is a silent correctness bug.
partial_sum and inclusive_scan are not interchangeable for floating-point. On a conforming implementation both are correct, but they are not required to produce the same bit pattern. More critically, passing a non-associative op to inclusive_scan with par or par_unseq is undefined behaviour β the standard is explicit on this.
Argument order diverges between transform_inclusive_scan and transform_exclusive_scan. In transform_inclusive_scan, init is the last argument (and optional). In transform_exclusive_scan, init precedes both the binary and unary operations. This asymmetry is a persistent source of compile errors; always consult the signature before writing.
Parallelism does not guarantee reproducibility. A parallel scan over double may produce results that differ from a sequential one by more than floating-point rounding noise, depending on the scheduler's choice of merge order. For reproducibility-critical domains, pass std::execution::seq explicitly.
See Also
std::reduce(C++17) β parallel-capable total reduction without prefix outputstd::transform_reduce(C++17) β fused transform and fold over one or two rangesstd::adjacent_difference(C++98) β inverse operation; produces per-element differences from a prefix-sum sequencestd::inner_product(C++98) β dot product and generalised fold over two ranges simultaneously