unordered_set and unordered_multiset
Hash-based sets with O(1) average lookup—custom hashing, rehashing, iterator invalidation, C++17/20 additions, and performance traps.
std::unordered_set / std::unordered_multisetsince C++11Hash-table–based set containers that provide O(1) average-case insert, erase, and lookup, at the cost of no ordering guarantee and O(n) worst-case performance when hash collisions are pathological.
Overview
Both containers live in <unordered_set> and use separate chaining: each bucket holds a singly-linked list of elements whose hashes map to that bucket. Elements are never physically moved when a bucket grows—only during a rehash, which invalidates all iterators, pointers, and references.
unordered_set enforces uniqueness. unordered_multiset allows duplicate keys and keeps equal elements adjacent within their bucket, making equal_range efficient.
Prefer unordered_set over set when:
- Membership tests dominate and sorted order is irrelevant.
- The key type has a cheap, well-distributed hash.
- The dataset is large enough that O(log n) comparisons matter (empirically ~10k+ elements).
Stay with set when:
- You need ordered iteration,
lower_bound, or range queries. - Worst-case guarantees matter—adversarial inputs that degenerate a hash table are real.
- The key type lacks a natural hash but has a cheap less-than comparison.
Syntax
template<
class Key,
class Hash = std::hash<Key>,
class KeyEqual = std::equal_to<Key>,
class Alloc = std::allocator<Key>
>
class unordered_set; // C++11The Hash callable must satisfy: equal keys produce equal hashes. It need not be injective. KeyEqual must be consistent with Hash—if a == b then hash(a) == hash(b). Violating this silently corrupts lookup.
Examples
Basic Operations
#include <unordered_set>
#include <print> // C++23
std::unordered_set<std::string> words{"apple", "banana", "cherry"};
words.insert("date");
words.emplace("elderberry"); // in-place construction, C++11
// Lookup
bool has = words.contains("apple"); // C++20; use count() != 0 before C++20
auto it = words.find("banana"); // O(1) avg; returns end() if absent
// insert returns pair<iterator, bool>
auto [pos, inserted] = words.insert("apple"); // C++17 structured bindings
if (!inserted) { /* already present */ }
// Erase by value (O(1) avg) or by iterator (O(1))
words.erase("cherry");
if (it != words.end()) words.erase(it);
// Range insert
std::vector<std::string> extras{"fig", "grape"};
words.insert(extras.begin(), extras.end());Custom Hash — Proper hash_combine
XOR-only combination (h1 ^ h2) is a common mistake: it is symmetric, so {1, 2} and {2, 1} hash identically, and it zero-cancels equal fields. Use a mixing function instead:
template<typename T>
void hash_combine(std::size_t& seed, const T& v) noexcept {
// Boost-style avalanche mixing; the magic constant is 2^32 / phi
seed ^= std::hash<T>{}(v) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
}
struct Point { int x, y; };
bool operator==(const Point& a, const Point& b) noexcept {
return a.x == b.x && a.y == b.y;
}
template<>
struct std::hash<Point> {
std::size_t operator()(const Point& p) const noexcept {
std::size_t seed = 0;
hash_combine(seed, p.x);
hash_combine(seed, p.y);
return seed;
}
};
std::unordered_set<Point> grid;
grid.insert({0, 0});
grid.emplace(3, 4);
std::println("{}", grid.contains({3, 4})); // true; C++20 + C++23Bucket API and Rehashing
std::unordered_set<int> s;
// Reserve before bulk insertion — avoids mid-insertion rehashes
s.reserve(10'000); // equivalent to rehash(ceil(10000 / max_load_factor()))
// Tune load factor before filling; default is 1.0
s.max_load_factor(0.75f); // rehash when 75% full; lower = fewer collisions, more memory
for (int i = 0; i < 10'000; ++i) s.insert(i);
std::println("buckets: {}", s.bucket_count());
std::println("load: {:.3f}", s.load_factor());
// Inspect a specific bucket
std::size_t b = s.bucket(42);
std::println("bucket {} size: {}", b, s.bucket_size(b));
// Force a rehash — ALL iterators, pointers, and references are invalidated
s.rehash(20'000); // request at least 20 000 bucketsreserve(n) is the idiomatic way to pre-size. rehash(n) directly sets the bucket count; use it when you want explicit control over memory layout after deletions shrink the set.
C++17: Node Extraction and Merge
std::unordered_set<std::string> src{"foo", "bar", "baz"};
std::unordered_set<std::string> dst{"qux"};
// Extract a node without allocation or copy — O(1) // C++17
auto node = src.extract("bar");
if (!node.empty()) {
node.value() += "_v2"; // mutate before reinserting
dst.insert(std::move(node));
}
// Splice all non-duplicate elements from src into dst // C++17
dst.merge(src);
// Elements moved to dst are removed from src.
// Duplicates stay in src (no allocation; just not transferred).extract is the correct way to rename or mutate a set element—the only alternative is erase-then-reinsert, which allocates.
C++20: Heterogeneous Lookup
Without a transparent hash, find("literal") constructs a temporary std::string. Opt into heterogeneous lookup to eliminate that allocation:
struct StringHash {
using is_transparent = void; // required marker; activates heterogeneous overloads
std::size_t operator()(std::string_view sv) const noexcept {
return std::hash<std::string_view>{}(sv);
}
// string_view is implicitly constructible from string, so one overload suffices
};
// Both Hash and KeyEqual must be transparent for heterogeneous lookup to activate // C++20
std::unordered_set<std::string, StringHash, std::equal_to<>> words{"alpha", "beta"};
words.contains(std::string_view{"alpha"}); // no string construction, C++20
words.find("beta"); // const char* — still no allocationstd::equal_to<> (C++14, no template argument) is the transparent equality comparator. The compiler requires both Hash::is_transparent and KeyEqual::is_transparent to exist before enabling the heterogeneous overloads.
unordered_multiset
std::unordered_multiset<int> bag{1, 2, 2, 3, 3, 3};
bag.insert(2);
std::println("count(2): {}", bag.count(2)); // 3
// equal_range: all copies of a key in O(1) avg
auto [lo, hi] = bag.equal_range(3); // C++17 structured bindings
std::println("copies of 3: {}", std::distance(lo, hi)); // 3
// Erase exactly one occurrence
auto it = bag.find(2);
if (it != bag.end()) bag.erase(it); // removes one '2'
// Erase ALL occurrences
bag.erase(3); // removes all three copiescount() on unordered_multiset is O(k) where k is the number of matches, not O(1)—budget for that in hot paths.
Best Practices
Pre-size with reserve: call reserve(expected_count) before bulk insertion. A single rehash on a 1M-element set recomputes every bucket index and relinks every node.
Write a distributing hash: verify distribution by inspecting max_bucket_size() after loading data. A hash that maps many keys to one bucket silently degrades all operations to O(n). If you control neither the key type nor its hash, wrap it.
Use heterogeneous lookup for string-keyed sets (C++20): every find("literal") on a plain unordered_set<string> allocates. In a search-heavy hot path this dominates.
Prefer emplace over insert for non-trivial types: emplace constructs the element directly inside the allocated node, avoiding a temporary.
Prefer contains over count() != 0 (C++20): more expressive, and for unordered_set equally fast; for unordered_multiset it short-circuits on the first match.
Common Pitfalls
Iterator invalidation on rehash: any insert that pushes the load factor past max_load_factor() triggers a rehash and invalidates all outstanding iterators.
auto it = s.find(key);
s.insert(other_key); // may rehash — 'it' is now dangling
*it; // undefined behaviourCapture values, not iterators, across insertions—or reserve enough capacity upfront.
Mutating an element through its iterator is undefined behaviour: elements are stored as const Key in a set. Casting away const and modifying the element corrupts the bucket structure without any diagnostic. Use extract (C++17) to remove, modify, and reinsert safely.
XOR-only hash for compound keys: h1 ^ h2 is commutative—hash({a,b}) == hash({b,a})—and zero-cancels when a == b. Always use hash_combine.
Worst-case O(n) is not theoretical: a carefully chosen or user-supplied key set can deliberately pack all elements into one bucket. For containers keyed on untrusted input, use a randomised seed or a hash-hardened third-party implementation (e.g., absl::flat_hash_set, which applies per-process randomisation by default).
unordered_multiset::erase(key) removes all copies: to drop a single occurrence, pass an iterator from find(), not the key value.
See Also
std::set— ordered alternative; O(log n), supports range queries, no hash requiredstd::unordered_map— key→value variant ofunordered_setstd::flat_set— C++23 sorted flat array; cache-friendly for read-heavy workloads, no hash- Custom Hash Functions — hash_combine, randomised hashing, writing quality hash functions