Skip to content
C++
Library
since C++11
Intermediate

unordered_set and unordered_multiset

Hash-based sets with O(1) average lookup—custom hashing, rehashing, iterator invalidation, C++17/20 additions, and performance traps.

std::unordered_set / std::unordered_multisetsince C++11

Hash-table–based set containers that provide O(1) average-case insert, erase, and lookup, at the cost of no ordering guarantee and O(n) worst-case performance when hash collisions are pathological.

Overview

Both containers live in <unordered_set> and use separate chaining: each bucket holds a singly-linked list of elements whose hashes map to that bucket. Elements are never physically moved when a bucket grows—only during a rehash, which invalidates all iterators, pointers, and references.

unordered_set enforces uniqueness. unordered_multiset allows duplicate keys and keeps equal elements adjacent within their bucket, making equal_range efficient.

Prefer unordered_set over set when:

  • Membership tests dominate and sorted order is irrelevant.
  • The key type has a cheap, well-distributed hash.
  • The dataset is large enough that O(log n) comparisons matter (empirically ~10k+ elements).

Stay with set when:

  • You need ordered iteration, lower_bound, or range queries.
  • Worst-case guarantees matter—adversarial inputs that degenerate a hash table are real.
  • The key type lacks a natural hash but has a cheap less-than comparison.

Syntax

cpp
template<
    class Key,
    class Hash     = std::hash<Key>,
    class KeyEqual = std::equal_to<Key>,
    class Alloc    = std::allocator<Key>
>
class unordered_set;   // C++11

The Hash callable must satisfy: equal keys produce equal hashes. It need not be injective. KeyEqual must be consistent with Hash—if a == b then hash(a) == hash(b). Violating this silently corrupts lookup.

Examples

Basic Operations

cpp
#include <unordered_set>
#include <print>   // C++23

std::unordered_set<std::string> words{"apple", "banana", "cherry"};

words.insert("date");
words.emplace("elderberry");          // in-place construction, C++11

// Lookup
bool has = words.contains("apple");   // C++20; use count() != 0 before C++20
auto it  = words.find("banana");      // O(1) avg; returns end() if absent

// insert returns pair<iterator, bool>
auto [pos, inserted] = words.insert("apple");   // C++17 structured bindings
if (!inserted) { /* already present */ }

// Erase by value (O(1) avg) or by iterator (O(1))
words.erase("cherry");
if (it != words.end()) words.erase(it);

// Range insert
std::vector<std::string> extras{"fig", "grape"};
words.insert(extras.begin(), extras.end());

Custom Hash — Proper hash_combine

XOR-only combination (h1 ^ h2) is a common mistake: it is symmetric, so {1, 2} and {2, 1} hash identically, and it zero-cancels equal fields. Use a mixing function instead:

cpp
template<typename T>
void hash_combine(std::size_t& seed, const T& v) noexcept {
    // Boost-style avalanche mixing; the magic constant is 2^32 / phi
    seed ^= std::hash<T>{}(v) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
}

struct Point { int x, y; };

bool operator==(const Point& a, const Point& b) noexcept {
    return a.x == b.x && a.y == b.y;
}

template<>
struct std::hash<Point> {
    std::size_t operator()(const Point& p) const noexcept {
        std::size_t seed = 0;
        hash_combine(seed, p.x);
        hash_combine(seed, p.y);
        return seed;
    }
};

std::unordered_set<Point> grid;
grid.insert({0, 0});
grid.emplace(3, 4);
std::println("{}", grid.contains({3, 4}));   // true; C++20 + C++23

Bucket API and Rehashing

cpp
std::unordered_set<int> s;

// Reserve before bulk insertion — avoids mid-insertion rehashes
s.reserve(10'000);            // equivalent to rehash(ceil(10000 / max_load_factor()))

// Tune load factor before filling; default is 1.0
s.max_load_factor(0.75f);     // rehash when 75% full; lower = fewer collisions, more memory

for (int i = 0; i < 10'000; ++i) s.insert(i);

std::println("buckets: {}",        s.bucket_count());
std::println("load: {:.3f}",       s.load_factor());

// Inspect a specific bucket
std::size_t b = s.bucket(42);
std::println("bucket {} size: {}", b, s.bucket_size(b));

// Force a rehash — ALL iterators, pointers, and references are invalidated
s.rehash(20'000);    // request at least 20 000 buckets

reserve(n) is the idiomatic way to pre-size. rehash(n) directly sets the bucket count; use it when you want explicit control over memory layout after deletions shrink the set.

C++17: Node Extraction and Merge

cpp
std::unordered_set<std::string> src{"foo", "bar", "baz"};
std::unordered_set<std::string> dst{"qux"};

// Extract a node without allocation or copy — O(1)              // C++17
auto node = src.extract("bar");
if (!node.empty()) {
    node.value() += "_v2";      // mutate before reinserting
    dst.insert(std::move(node));
}

// Splice all non-duplicate elements from src into dst            // C++17
dst.merge(src);
// Elements moved to dst are removed from src.
// Duplicates stay in src (no allocation; just not transferred).

extract is the correct way to rename or mutate a set element—the only alternative is erase-then-reinsert, which allocates.

C++20: Heterogeneous Lookup

Without a transparent hash, find("literal") constructs a temporary std::string. Opt into heterogeneous lookup to eliminate that allocation:

cpp
struct StringHash {
    using is_transparent = void;    // required marker; activates heterogeneous overloads

    std::size_t operator()(std::string_view sv) const noexcept {
        return std::hash<std::string_view>{}(sv);
    }
    // string_view is implicitly constructible from string, so one overload suffices
};

// Both Hash and KeyEqual must be transparent for heterogeneous lookup to activate  // C++20
std::unordered_set<std::string, StringHash, std::equal_to<>> words{"alpha", "beta"};

words.contains(std::string_view{"alpha"});   // no string construction, C++20
words.find("beta");                          // const char* — still no allocation

std::equal_to<> (C++14, no template argument) is the transparent equality comparator. The compiler requires both Hash::is_transparent and KeyEqual::is_transparent to exist before enabling the heterogeneous overloads.

unordered_multiset

cpp
std::unordered_multiset<int> bag{1, 2, 2, 3, 3, 3};

bag.insert(2);
std::println("count(2): {}", bag.count(2));   // 3

// equal_range: all copies of a key in O(1) avg
auto [lo, hi] = bag.equal_range(3);           // C++17 structured bindings
std::println("copies of 3: {}", std::distance(lo, hi));  // 3

// Erase exactly one occurrence
auto it = bag.find(2);
if (it != bag.end()) bag.erase(it);           // removes one '2'

// Erase ALL occurrences
bag.erase(3);    // removes all three copies

count() on unordered_multiset is O(k) where k is the number of matches, not O(1)—budget for that in hot paths.

Best Practices

Pre-size with reserve: call reserve(expected_count) before bulk insertion. A single rehash on a 1M-element set recomputes every bucket index and relinks every node.

Write a distributing hash: verify distribution by inspecting max_bucket_size() after loading data. A hash that maps many keys to one bucket silently degrades all operations to O(n). If you control neither the key type nor its hash, wrap it.

Use heterogeneous lookup for string-keyed sets (C++20): every find("literal") on a plain unordered_set<string> allocates. In a search-heavy hot path this dominates.

Prefer emplace over insert for non-trivial types: emplace constructs the element directly inside the allocated node, avoiding a temporary.

Prefer contains over count() != 0 (C++20): more expressive, and for unordered_set equally fast; for unordered_multiset it short-circuits on the first match.

Common Pitfalls

Iterator invalidation on rehash: any insert that pushes the load factor past max_load_factor() triggers a rehash and invalidates all outstanding iterators.

cpp
auto it = s.find(key);
s.insert(other_key);    // may rehash — 'it' is now dangling
*it;                    // undefined behaviour

Capture values, not iterators, across insertions—or reserve enough capacity upfront.

Mutating an element through its iterator is undefined behaviour: elements are stored as const Key in a set. Casting away const and modifying the element corrupts the bucket structure without any diagnostic. Use extract (C++17) to remove, modify, and reinsert safely.

XOR-only hash for compound keys: h1 ^ h2 is commutative—hash({a,b}) == hash({b,a})—and zero-cancels when a == b. Always use hash_combine.

Worst-case O(n) is not theoretical: a carefully chosen or user-supplied key set can deliberately pack all elements into one bucket. For containers keyed on untrusted input, use a randomised seed or a hash-hardened third-party implementation (e.g., absl::flat_hash_set, which applies per-process randomisation by default).

unordered_multiset::erase(key) removes all copies: to drop a single occurrence, pass an iterator from find(), not the key value.

See Also

  • std::set — ordered alternative; O(log n), supports range queries, no hash required
  • std::unordered_map — key→value variant of unordered_set
  • std::flat_set — C++23 sorted flat array; cache-friendly for read-heavy workloads, no hash
  • Custom Hash Functions — hash_combine, randomised hashing, writing quality hash functions