Set and Heap Algorithms
Set algorithms operating on sorted ranges and heap operations — multiset semantics, complexity guarantees, and C++20 ranges versions.
Set and Heap Algorithmssince C++98Six algorithms in <algorithm> that compute multiset relationships (union, intersection, difference, subset) over sorted ranges, plus five heap algorithms implementing priority-queue operations on any random-access range.
Overview
The set algorithms treat sorted ranges as mathematical multisets: element multiplicity is preserved according to well-defined rules. If an element appears M times in the first range and N times in the second:
| Algorithm | Output count per element |
|---|---|
set_union | max(M, N) |
set_intersection | min(M, N) |
set_difference | max(0, M − N) |
set_symmetric_difference | |M − N| |
All four produce a sorted output range using the same ordering relation as the inputs. The heap algorithms implement a binary max-heap stored in a flat array, guaranteeing the maximum element is always at v[0].
Critical preconditions for set algorithms:
- Both input ranges must be sorted under the same comparator passed to the algorithm. Violating this is undefined behavior, not just wrong results.
- The output range must not overlap either input range.
- The comparator must be a strict weak ordering.
Syntax
// All set operation overloads follow this pattern (C++98):
template<class InputIt1, class InputIt2, class OutputIt>
OutputIt std::set_union(InputIt1 first1, InputIt1 last1,
InputIt2 first2, InputIt2 last2,
OutputIt d_first);
template<class InputIt1, class InputIt2, class OutputIt, class Compare>
OutputIt std::set_union(InputIt1 first1, InputIt1 last1,
InputIt2 first2, InputIt2 last2,
OutputIt d_first, Compare comp);
// Same pattern applies to: set_intersection, set_difference,
// set_symmetric_difference, includes.
// Heap operations (C++98):
template<class RandomIt> void std::make_heap(RandomIt first, RandomIt last);
template<class RandomIt> void std::push_heap(RandomIt first, RandomIt last);
template<class RandomIt> void std::pop_heap(RandomIt first, RandomIt last);
template<class RandomIt> void std::sort_heap(RandomIt first, RandomIt last);
// Heap validation (C++11):
template<class RandomIt> bool std::is_heap(RandomIt first, RandomIt last);
template<class RandomIt> RandomIt std::is_heap_until(RandomIt first, RandomIt last);Examples
Set union — all elements from A and B
#include <algorithm>
#include <vector>
#include <iterator>
std::vector<int> a{1, 2, 3, 3, 5};
std::vector<int> b{2, 3, 3, 3, 6};
std::vector<int> out;
out.reserve(a.size() + b.size()); // exact upper bound for union
std::set_union(a.begin(), a.end(), b.begin(), b.end(),
std::back_inserter(out));
// out: {1, 2, 3, 3, 3, 5, 6}
// 3 appears max(2, 3) = 3 times — multiset semanticsSet intersection — elements present in both
std::vector<int> a{1, 2, 3, 3, 5};
std::vector<int> b{2, 3, 3, 3, 6};
std::vector<int> out;
out.reserve(std::min(a.size(), b.size()));
std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
std::back_inserter(out));
// out: {2, 3, 3}
// 3 appears min(2, 3) = 2 timesSet difference — elements in A but not B
std::vector<int> a{1, 2, 3, 3, 5};
std::vector<int> b{3, 4};
std::vector<int> out;
out.reserve(a.size());
std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
std::back_inserter(out));
// out: {1, 2, 3, 5}
// 3 appears max(0, 2-1) = 1 timeSymmetric difference — elements in exactly one of A or B
std::vector<int> a{1, 2, 3, 3};
std::vector<int> b{3, 3, 3, 5};
std::vector<int> out;
out.reserve(a.size() + b.size());
std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
std::back_inserter(out));
// out: {1, 2, 3, 5}
// 3 appears |2-3| = 1 timeincludes — subset test
std::vector<int> universe{1, 2, 3, 4, 5, 6};
std::vector<int> sub{2, 4, 6};
bool ok = std::includes(universe.begin(), universe.end(),
sub.begin(), sub.end());
// true: every element of sub appears in universe
// Empty range is always a subset of anything:
std::vector<int> empty;
std::includes(universe.begin(), universe.end(),
empty.begin(), empty.end()); // always trueCustom comparator — must match the sort order
struct Event {
int timestamp;
std::string name;
};
auto by_time = [](const Event& x, const Event& y) {
return x.timestamp < y.timestamp;
};
// Both ranges already sorted by timestamp:
std::vector<Event> log1 = {{1, "boot"}, {3, "login"}, {7, "logout"}};
std::vector<Event> log2 = {{2, "connect"}, {3, "auth"}, {9, "disconnect"}};
std::vector<Event> merged;
merged.reserve(log1.size() + log2.size());
// Pass the same comparator used for sorting:
std::set_union(log1.begin(), log1.end(),
log2.begin(), log2.end(),
std::back_inserter(merged), by_time);C++20 ranges — projections eliminate the comparator boilerplate
// C++20
#include <algorithm>
struct Person { std::string name; int score; };
// Both sorted by name:
std::vector<Person> team_a{{"Alice", 92}, {"Bob", 78}, {"Carol", 85}};
std::vector<Person> team_b{{"Bob", 78}, {"Dave", 90}};
std::vector<Person> combined;
std::ranges::set_union(team_a, team_b,
std::back_inserter(combined),
std::less{},
&Person::name, // projection for range 1
&Person::name); // projection for range 2
// combined: Alice, Bob, Carol, DaveHeap Operations
A binary max-heap stored in a flat array satisfies: for every index i, v[i] >= v[2i+1] and v[i] >= v[2i+2]. The standard only guarantees v[0] is maximum; all other positions are unspecified.
Building and modifying a heap
#include <algorithm>
#include <vector>
#include <cassert>
std::vector<int> v{3, 1, 4, 1, 5, 9, 2, 6};
// O(N) — build max-heap in-place (not O(N log N) — often misquoted):
std::make_heap(v.begin(), v.end());
// v[0] == 9; rest of order is unspecified
// O(log N) — insert: push_back first, then sift up:
v.push_back(10);
std::push_heap(v.begin(), v.end());
// v[0] == 10; heap property fully restored
// O(log N) — extract max: moves it to end, shrinks heap range:
std::pop_heap(v.begin(), v.end());
int max_val = v.back(); // 10
v.pop_back(); // must call separately — pop_heap does NOT remove
// v[0] == 9; heap restored over [begin, end)
// C++11: validate heap property:
assert(std::is_heap(v.begin(), v.end()));Min-heap with comparator — comparator must be consistent
std::vector<int> v{3, 1, 4, 1, 5, 9};
std::make_heap(v.begin(), v.end(), std::greater<int>{});
// v[0] == 1 (minimum)
v.push_back(0);
// Pass the SAME comparator — mixing is undefined behavior:
std::push_heap(v.begin(), v.end(), std::greater<int>{});
// v[0] == 0
std::pop_heap(v.begin(), v.end(), std::greater<int>{});
v.pop_back();sort_heap — heapsort
// Calling sort_heap destroys the heap property:
std::vector<int> v{3, 1, 4, 1, 5, 9};
std::make_heap(v.begin(), v.end()); // O(N)
std::sort_heap(v.begin(), v.end()); // O(N log N)
// v: {1, 1, 3, 4, 5, 9} — sorted ascendingManual priority queue vs std::priority_queue
// Manual approach exposes the underlying vector — useful for bulk
// inspection or modifications (e.g., implementing decrease-key):
struct Task {
int priority;
std::string name;
bool operator<(const Task& o) const { return priority < o.priority; }
};
std::vector<Task> pq;
auto push_task = [&](Task t) {
pq.push_back(std::move(t));
std::push_heap(pq.begin(), pq.end());
};
auto pop_task = [&]() -> Task {
std::pop_heap(pq.begin(), pq.end());
Task t = std::move(pq.back());
pq.pop_back();
return t;
};
push_task({3, "low"});
push_task({10, "urgent"});
push_task({5, "medium"});
Task next = pop_task(); // {10, "urgent"}
// For everything else, prefer the adaptor — it prevents heap invariant bugs:
// std::priority_queue<Task> wraps the same heap ops with a cleaner interface.Best Practices
Reserve output capacity before writing. The bounds below are exact maximums — no reallocation will occur:
out.reserve(a.size() + b.size()); // set_union, set_symmetric_difference
out.reserve(std::min(a.size(), b.size())); // set_intersection
out.reserve(a.size()); // set_difference (a minus b)Assert sorted preconditions in debug builds. The algorithms produce no diagnostic — silent wrong output is the failure mode:
assert(std::is_sorted(a.begin(), a.end(), comp));
assert(std::is_sorted(b.begin(), b.end(), comp));Prefer std::ranges:: versions (C++20) when projections are involved. They eliminate the need to wrap field access in a lambda comparator.
Know when sorted-range algorithms beat std::set. For bulk one-shot operations on already-sorted data (merging sorted log files, diffing sorted ID lists), the linear-time set algorithms are faster than repeated std::set::insert. For repeated membership tests against a stable set, std::set or std::unordered_set is the right tool.
Common Pitfalls
Unsorted input is undefined behavior, not degraded output. There is no runtime check. The standard imposes this as a precondition.
Mixing comparators between sort and algorithm:
// BUG: sorted descending, algorithm uses default ascending:
std::sort(a.begin(), a.end(), std::greater<int>{});
std::sort(b.begin(), b.end(), std::greater<int>{});
std::set_union(a.begin(), a.end(), b.begin(), b.end(),
std::back_inserter(out)); // UB — comparator mismatch
// Fix: pass std::greater<int>{} to set_union as wellConfusing std::merge with set_union. std::merge concatenates all elements (total count M + N, no deduplication). For a = {1,1} and b = {1}, merge produces {1,1,1} while set_union produces {1,1}.
Output overlapping input:
// BUG: writing back into one of the input vectors invalidates iterators:
std::set_union(a.begin(), a.end(), b.begin(), b.end(),
std::back_inserter(a)); // UBForgetting pop_back() after pop_heap. pop_heap moves the maximum to v.back() and restores the heap over [begin, end-1). The element is still in the vector — you must call v.pop_back() yourself.
Inconsistent heap comparators across make_heap / push_heap / pop_heap are undefined behavior. Store the comparator alongside the vector when the type is not the default.
Complexity
| Algorithm | Time | Comparisons |
|---|---|---|
set_union | O(M + N) | at most 2(M+N) − 1 |
set_intersection | O(M + N) | at most 2·min(M,N) |
set_difference | O(M + N) | at most 2M − 1 |
set_symmetric_difference | O(M + N) | at most 2(M+N) − 1 |
includes | O(M + N) | at most 2N − 1 |
make_heap | O(N) | at most 3N comparisons |
push_heap | O(log N) | |
pop_heap | O(log N) | |
sort_heap | O(N log N) | |
is_heap / is_heap_until | O(N) | C++11 |
See Also
std::merge— merges two sorted ranges retaining all duplicates (M + N total elements)std::inplace_merge— merges two consecutive sorted sub-ranges in-place, O(N log N)std::sort— prerequisite for all set algorithmsstd::priority_queue— container adaptor wrapping the heap operationsstd::set— ordered associative container with O(log N) per-element operations