Skip to content
C++
Library
since C++98
Intermediate

Set and Heap Algorithms

Set algorithms operating on sorted ranges and heap operations — multiset semantics, complexity guarantees, and C++20 ranges versions.

Set and Heap Algorithmssince C++98

Six algorithms in <algorithm> that compute multiset relationships (union, intersection, difference, subset) over sorted ranges, plus five heap algorithms implementing priority-queue operations on any random-access range.

Overview

The set algorithms treat sorted ranges as mathematical multisets: element multiplicity is preserved according to well-defined rules. If an element appears M times in the first range and N times in the second:

AlgorithmOutput count per element
set_unionmax(M, N)
set_intersectionmin(M, N)
set_differencemax(0, M − N)
set_symmetric_difference|M − N|

All four produce a sorted output range using the same ordering relation as the inputs. The heap algorithms implement a binary max-heap stored in a flat array, guaranteeing the maximum element is always at v[0].

Critical preconditions for set algorithms:

  • Both input ranges must be sorted under the same comparator passed to the algorithm. Violating this is undefined behavior, not just wrong results.
  • The output range must not overlap either input range.
  • The comparator must be a strict weak ordering.

Syntax

cpp
// All set operation overloads follow this pattern (C++98):
template<class InputIt1, class InputIt2, class OutputIt>
OutputIt std::set_union(InputIt1 first1, InputIt1 last1,
                        InputIt2 first2, InputIt2 last2,
                        OutputIt d_first);

template<class InputIt1, class InputIt2, class OutputIt, class Compare>
OutputIt std::set_union(InputIt1 first1, InputIt1 last1,
                        InputIt2 first2, InputIt2 last2,
                        OutputIt d_first, Compare comp);

// Same pattern applies to: set_intersection, set_difference,
// set_symmetric_difference, includes.

// Heap operations (C++98):
template<class RandomIt> void std::make_heap(RandomIt first, RandomIt last);
template<class RandomIt> void std::push_heap(RandomIt first, RandomIt last);
template<class RandomIt> void std::pop_heap(RandomIt first, RandomIt last);
template<class RandomIt> void std::sort_heap(RandomIt first, RandomIt last);

// Heap validation (C++11):
template<class RandomIt> bool     std::is_heap(RandomIt first, RandomIt last);
template<class RandomIt> RandomIt std::is_heap_until(RandomIt first, RandomIt last);

Examples

Set union — all elements from A and B

cpp
#include <algorithm>
#include <vector>
#include <iterator>

std::vector<int> a{1, 2, 3, 3, 5};
std::vector<int> b{2, 3, 3, 3, 6};

std::vector<int> out;
out.reserve(a.size() + b.size());  // exact upper bound for union

std::set_union(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(out));
// out: {1, 2, 3, 3, 3, 5, 6}
// 3 appears max(2, 3) = 3 times — multiset semantics

Set intersection — elements present in both

cpp
std::vector<int> a{1, 2, 3, 3, 5};
std::vector<int> b{2, 3, 3, 3, 6};

std::vector<int> out;
out.reserve(std::min(a.size(), b.size()));

std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                      std::back_inserter(out));
// out: {2, 3, 3}
// 3 appears min(2, 3) = 2 times

Set difference — elements in A but not B

cpp
std::vector<int> a{1, 2, 3, 3, 5};
std::vector<int> b{3, 4};

std::vector<int> out;
out.reserve(a.size());

std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                    std::back_inserter(out));
// out: {1, 2, 3, 5}
// 3 appears max(0, 2-1) = 1 time

Symmetric difference — elements in exactly one of A or B

cpp
std::vector<int> a{1, 2, 3, 3};
std::vector<int> b{3, 3, 3, 5};

std::vector<int> out;
out.reserve(a.size() + b.size());

std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                               std::back_inserter(out));
// out: {1, 2, 3, 5}
// 3 appears |2-3| = 1 time

includes — subset test

cpp
std::vector<int> universe{1, 2, 3, 4, 5, 6};
std::vector<int> sub{2, 4, 6};

bool ok = std::includes(universe.begin(), universe.end(),
                        sub.begin(), sub.end());
// true: every element of sub appears in universe

// Empty range is always a subset of anything:
std::vector<int> empty;
std::includes(universe.begin(), universe.end(),
              empty.begin(), empty.end());  // always true

Custom comparator — must match the sort order

cpp
struct Event {
    int timestamp;
    std::string name;
};

auto by_time = [](const Event& x, const Event& y) {
    return x.timestamp < y.timestamp;
};

// Both ranges already sorted by timestamp:
std::vector<Event> log1 = {{1, "boot"}, {3, "login"}, {7, "logout"}};
std::vector<Event> log2 = {{2, "connect"}, {3, "auth"}, {9, "disconnect"}};

std::vector<Event> merged;
merged.reserve(log1.size() + log2.size());

// Pass the same comparator used for sorting:
std::set_union(log1.begin(), log1.end(),
               log2.begin(), log2.end(),
               std::back_inserter(merged), by_time);

C++20 ranges — projections eliminate the comparator boilerplate

cpp
// C++20
#include <algorithm>

struct Person { std::string name; int score; };

// Both sorted by name:
std::vector<Person> team_a{{"Alice", 92}, {"Bob", 78}, {"Carol", 85}};
std::vector<Person> team_b{{"Bob", 78}, {"Dave", 90}};

std::vector<Person> combined;
std::ranges::set_union(team_a, team_b,
                       std::back_inserter(combined),
                       std::less{},
                       &Person::name,   // projection for range 1
                       &Person::name);  // projection for range 2
// combined: Alice, Bob, Carol, Dave

Heap Operations

A binary max-heap stored in a flat array satisfies: for every index i, v[i] >= v[2i+1] and v[i] >= v[2i+2]. The standard only guarantees v[0] is maximum; all other positions are unspecified.

Building and modifying a heap

cpp
#include <algorithm>
#include <vector>
#include <cassert>

std::vector<int> v{3, 1, 4, 1, 5, 9, 2, 6};

// O(N) — build max-heap in-place (not O(N log N) — often misquoted):
std::make_heap(v.begin(), v.end());
// v[0] == 9; rest of order is unspecified

// O(log N) — insert: push_back first, then sift up:
v.push_back(10);
std::push_heap(v.begin(), v.end());
// v[0] == 10; heap property fully restored

// O(log N) — extract max: moves it to end, shrinks heap range:
std::pop_heap(v.begin(), v.end());
int max_val = v.back();  // 10
v.pop_back();            // must call separately — pop_heap does NOT remove
// v[0] == 9; heap restored over [begin, end)

// C++11: validate heap property:
assert(std::is_heap(v.begin(), v.end()));

Min-heap with comparator — comparator must be consistent

cpp
std::vector<int> v{3, 1, 4, 1, 5, 9};

std::make_heap(v.begin(), v.end(), std::greater<int>{});
// v[0] == 1 (minimum)

v.push_back(0);
// Pass the SAME comparator — mixing is undefined behavior:
std::push_heap(v.begin(), v.end(), std::greater<int>{});
// v[0] == 0

std::pop_heap(v.begin(), v.end(), std::greater<int>{});
v.pop_back();

sort_heap — heapsort

cpp
// Calling sort_heap destroys the heap property:
std::vector<int> v{3, 1, 4, 1, 5, 9};
std::make_heap(v.begin(), v.end());   // O(N)
std::sort_heap(v.begin(), v.end());   // O(N log N)
// v: {1, 1, 3, 4, 5, 9} — sorted ascending

Manual priority queue vs std::priority_queue

cpp
// Manual approach exposes the underlying vector — useful for bulk
// inspection or modifications (e.g., implementing decrease-key):
struct Task {
    int priority;
    std::string name;
    bool operator<(const Task& o) const { return priority < o.priority; }
};

std::vector<Task> pq;

auto push_task = [&](Task t) {
    pq.push_back(std::move(t));
    std::push_heap(pq.begin(), pq.end());
};

auto pop_task = [&]() -> Task {
    std::pop_heap(pq.begin(), pq.end());
    Task t = std::move(pq.back());
    pq.pop_back();
    return t;
};

push_task({3, "low"});
push_task({10, "urgent"});
push_task({5, "medium"});
Task next = pop_task();  // {10, "urgent"}

// For everything else, prefer the adaptor — it prevents heap invariant bugs:
// std::priority_queue<Task> wraps the same heap ops with a cleaner interface.

Best Practices

Reserve output capacity before writing. The bounds below are exact maximums — no reallocation will occur:

cpp
out.reserve(a.size() + b.size());              // set_union, set_symmetric_difference
out.reserve(std::min(a.size(), b.size()));     // set_intersection
out.reserve(a.size());                         // set_difference (a minus b)

Assert sorted preconditions in debug builds. The algorithms produce no diagnostic — silent wrong output is the failure mode:

cpp
assert(std::is_sorted(a.begin(), a.end(), comp));
assert(std::is_sorted(b.begin(), b.end(), comp));

Prefer std::ranges:: versions (C++20) when projections are involved. They eliminate the need to wrap field access in a lambda comparator.

Know when sorted-range algorithms beat std::set. For bulk one-shot operations on already-sorted data (merging sorted log files, diffing sorted ID lists), the linear-time set algorithms are faster than repeated std::set::insert. For repeated membership tests against a stable set, std::set or std::unordered_set is the right tool.


Common Pitfalls

Unsorted input is undefined behavior, not degraded output. There is no runtime check. The standard imposes this as a precondition.

Mixing comparators between sort and algorithm:

cpp
// BUG: sorted descending, algorithm uses default ascending:
std::sort(a.begin(), a.end(), std::greater<int>{});
std::sort(b.begin(), b.end(), std::greater<int>{});
std::set_union(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(out));  // UB — comparator mismatch
// Fix: pass std::greater<int>{} to set_union as well

Confusing std::merge with set_union. std::merge concatenates all elements (total count M + N, no deduplication). For a = {1,1} and b = {1}, merge produces {1,1,1} while set_union produces {1,1}.

Output overlapping input:

cpp
// BUG: writing back into one of the input vectors invalidates iterators:
std::set_union(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(a));  // UB

Forgetting pop_back() after pop_heap. pop_heap moves the maximum to v.back() and restores the heap over [begin, end-1). The element is still in the vector — you must call v.pop_back() yourself.

Inconsistent heap comparators across make_heap / push_heap / pop_heap are undefined behavior. Store the comparator alongside the vector when the type is not the default.


Complexity

AlgorithmTimeComparisons
set_unionO(M + N)at most 2(M+N) − 1
set_intersectionO(M + N)at most 2·min(M,N)
set_differenceO(M + N)at most 2M − 1
set_symmetric_differenceO(M + N)at most 2(M+N) − 1
includesO(M + N)at most 2N − 1
make_heapO(N)at most 3N comparisons
push_heapO(log N)
pop_heapO(log N)
sort_heapO(N log N)
is_heap / is_heap_untilO(N)C++11

See Also

  • std::merge — merges two sorted ranges retaining all duplicates (M + N total elements)
  • std::inplace_merge — merges two consecutive sorted sub-ranges in-place, O(N log N)
  • std::sort — prerequisite for all set algorithms
  • std::priority_queue — container adaptor wrapping the heap operations
  • std::set — ordered associative container with O(log N) per-element operations