std::regex

C++ regular expressions — regex_match, regex_search, regex_replace, capture groups, iterators, and when to reach for a faster library.

std::regex encapsulates a compiled regular expression pattern and, paired with regex_match, regex_search, regex_replace, and iterator adapters, provides ECMAScript-flavored pattern matching over std::string, std::wstring, and raw character arrays.

Overview

<regex>, introduced in C++11, exposes a grammar-agnostic engine through std::basic_regex<CharT>. The default grammar is a subset of ECMAScript — specifically ES3, which predates lookbehind assertions and named captures. Alternate grammars (POSIX BRE, POSIX ERE, awk, grep, egrep) can be selected at construction time via std::regex_constants::syntax_option_type.

The library is correct but expensive: constructing a std::regex parses the pattern and compiles it into an internal state machine, which costs on the order of microseconds to milliseconds depending on pattern complexity. The common implementations match by backtracking, which can exhibit exponential worst-case behavior on pathological patterns (ReDoS).

Grammar limitations. C++ regex is based on ES3. Lookahead ((?=…), (?!…)) works. Lookbehind ((?<=…), (?<!…)) and named captures ((?<name>…)) are not supported — they remain absent from standard <regex> through C++23.
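
As a minimal sketch of what these limits mean in practice: the first function below relies on lookahead, which the default grammar accepts, and the second shows one common way to work without lookbehind, by matching the preceding text and capturing only the part of interest. The function names and patterns are illustrative assumptions, not part of <regex>.

```cpp
#include <regex>
#include <string>

// Lookahead is available: require at least one digit and one lowercase letter.
bool has_letter_and_digit(const std::string& s) {
    static const std::regex re(R"((?=.*\d)(?=.*[a-z]).+)");
    return std::regex_match(s, re);
}

// Lookbehind is not available, so instead of (?<=\$)\d+ this matches the '$'
// as ordinary text and captures only the digits that follow it.
std::string amount_after_dollar(const std::string& s) {
    static const std::regex re(R"(\$(\d+))");
    std::smatch m;
    return std::regex_search(s, m, re) ? m[1].str() : std::string();
}
```

The capture-group workaround changes what m[0] covers (it includes the '$'), so callers read m[1] rather than the whole match.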

Three core functions

cpp
// Full-string match — entire input must satisfy the pattern
bool std::regex_match(str, re);
bool std::regex_match(str, match_results, re);   // also fills captures

// Substring search — find first occurrence anywhere in input
bool std::regex_search(str, re);
bool std::regex_search(str, match_results, re);

// Replace occurrences — returns new std::string
std::string std::regex_replace(str, re, fmt);
std::string std::regex_replace(str, re, fmt, flags);
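
To make the difference between the three functions concrete, here is a small sketch that runs all of them against the same input with the same pattern. The Outcome struct and run_all_three are hypothetical names introduced only for this illustration.

```cpp
#include <regex>
#include <string>

struct Outcome {
    bool whole;            // regex_match: did the entire input match?
    bool somewhere;        // regex_search: did any substring match?
    std::string replaced;  // regex_replace: input with every match replaced
};

Outcome run_all_three(const std::string& input) {
    static const std::regex digits(R"(\d+)");
    return Outcome{
        std::regex_match(input, digits),
        std::regex_search(input, digits),
        std::regex_replace(input, digits, "#")
    };
}
```

For an input such as "id=42", the whole-string match fails because of the "id=" prefix, while the search succeeds and the replacement touches only the digits.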

Examples

Validating and extracting with regex_match

cpp
#include <regex>
#include <string>
#include <print>  // C++23

static const std::regex date_re{
    R"(^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$)"
};

std::string input = "2026-05-22";
std::smatch m;

if (std::regex_match(input, m, date_re)) {
    // m[0] — whole match; m[1..N] — capture groups (std::sub_match)
    std::println("year={} month={} day={}", m[1].str(), m[2].str(), m[3].str());

    // sub_match members: matched, length(), str(), first, second
    // Offsets live on match_results, not sub_match: m.position(n), m.length(n)
    std::println("day starts at offset {}", m.position(3));
}

regex_match requires the entire string to satisfy the pattern. Use it for validation. Use regex_search when the pattern may appear at any offset.
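
A validation routine usually wants typed values rather than sub-matches. The following sketch wraps the same date pattern in a function that converts the captures to integers; the Date struct and parse_date are assumed names for illustration, and the pattern only range-checks the fields (it would still accept 2026-02-31).

```cpp
#include <regex>
#include <string>

struct Date {
    bool valid;
    int year, month, day;
};

Date parse_date(const std::string& input) {
    // No ^ or $ needed: regex_match already requires the whole input to match.
    static const std::regex date_re(
        R"((\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))");
    std::smatch m;
    if (!std::regex_match(input, m, date_re)) {
        return Date{false, 0, 0, 0};
    }
    // The pattern guarantees each capture is all digits, so stoi cannot throw here.
    return Date{true, std::stoi(m[1].str()), std::stoi(m[2].str()), std::stoi(m[3].str())};
}
```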

Searching within a string

cpp
std::string log = "[2026-05-22 14:30:01] ERROR: disk full on /dev/sda1";
static const std::regex log_re{
    R"(\[(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\] (\w+): (.+))"
};
std::smatch m;

if (std::regex_search(log, m, log_re)) {
    std::println("date={} time={} level={} msg={}",
        m[1].str(), m[2].str(), m[3].str(), m[4].str());

    // prefix() and suffix() yield the text before and after the match
    std::println("before match: '{}'", m.prefix().str());
    std::println("after match:  '{}'", m.suffix().str());
}

Iterating all matches with sregex_iterator

cpp
// C++11: sregex_iterator wraps repeated regex_search calls
std::string src = "port=8080 timeout=30 retries=3";
static const std::regex kv_re{R"((\w+)=(\d+))"};

for (std::sregex_iterator it{src.begin(), src.end(), kv_re}, end;
     it != end; ++it) {
    const std::smatch& m = *it;
    std::println("{}={}", m[1].str(), m[2].str());
}
// port=8080
// timeout=30
// retries=3
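
To show roughly what the iterator does on your behalf, here is a hand-written loop over regex_search that resumes after each match. It is a simplified sketch: find_all is a hypothetical helper, and the real iterator also copes with empty matches and adjusts match flags between calls, which this version leaves out.

```cpp
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> find_all(const std::string& text, const std::regex& re) {
    std::vector<std::string> out;
    std::smatch m;
    auto cursor = text.cbegin();
    while (std::regex_search(cursor, text.cend(), m, re)) {
        out.push_back(m[0].str());         // copy the match out of the source
        if (m[0].length() == 0) break;     // sketch only: stop instead of handling empty matches
        cursor = m[0].second;              // resume right after the previous match
    }
    return out;
}
```

In real code the iterator is the better choice; the loop is mainly useful for seeing where m[0].second fits in.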

Splitting with regex_token_iterator

Submatch index -1 yields the tokens between matches — a split:

cpp
// C++11: regex_token_iterator with index -1 splits on the separator
std::string csv = "alpha,beta,,gamma,delta";
static const std::regex sep{","};

std::vector<std::string> fields{
    std::sregex_token_iterator{csv.begin(), csv.end(), sep, -1},
    std::sregex_token_iterator{}
};
// {"alpha", "beta", "", "gamma", "delta"}

Pass 0 to iterate over whole matches, or a positive index to extract a specific capture group from each match instead.
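
The sketch below illustrates the non-negative indices on a key=value string. The tokens helper is a hypothetical wrapper written for this example; it simply forwards the submatch index to sregex_token_iterator.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects one token per match: the whole match for index 0,
// or the text of capture group N for a positive index N.
std::vector<std::string> tokens(const std::string& text,
                                const std::regex& re,
                                int submatch) {
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), re, submatch),
        std::sregex_token_iterator());
}
```

With the pattern (\w+)=(\d+), index 1 collects the keys and index 2 collects the values.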

Replacing with back-references

cpp
std::string dates = "Born: 2026-05-22, Expires: 2027-01-01";
static const std::regex iso_re{R"((\d{4})-(\d{2})-(\d{2}))"};

// $1, $2, $3 reference capture groups in the replacement format string
std::string us = std::regex_replace(dates, iso_re, "$2/$3/$1");
// "Born: 05/22/2026, Expires: 01/01/2027"

// Replace only the first match
std::string first = std::regex_replace(
    dates, iso_re, "$2/$3/$1",
    std::regex_constants::format_first_only
);
// "Born: 05/22/2026, Expires: 2027-01-01"
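
Beyond numbered groups, the default (ECMAScript-style) format string also understands $& for the whole match and $$ for a literal dollar sign, and the format_no_copy flag drops the unmatched text from the result. A short sketch, with function names chosen only for illustration:

```cpp
#include <regex>
#include <string>

// $& inserts the whole match.
std::string bracket_numbers(const std::string& s) {
    static const std::regex num(R"(\d+)");
    return std::regex_replace(s, num, "[$&]");
}

// $$ inserts a literal '$', followed here by the whole match.
std::string price_tags(const std::string& s) {
    static const std::regex num(R"(\d+)");
    return std::regex_replace(s, num, "$$$&");
}

// format_no_copy keeps only the formatted matches and discards everything else.
std::string only_numbers(const std::string& s) {
    static const std::regex num(R"(\d+)");
    return std::regex_replace(s, num, "$&;",
                              std::regex_constants::format_no_copy);
}
```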

Syntax reference (ECMAScript grammar, default)

cpp
// Character classes
// [abc]    literal set         [a-z]   range         [^abc]  negated set
// \d       [0-9]               \D      [^0-9]
// \w       [a-zA-Z0-9_]        \W      [^\w]
// \s       [ \t\r\n\f\v]       \S      [^\s]
// .        any character except a line terminator (\n, \r)

// Anchors
// ^        start of string     $       end of string
// \b       word boundary       \B      non-word boundary

// Quantifiers — append ? for lazy (e.g. a*?)
// a?  0 or 1    a*  0 or more    a+  1 or more
// a{n}  exactly n    a{n,m}  n to m    a{n,}  at least n

// Groups and lookahead
// (abc)    capture group       (?:abc)  non-capturing group
// a|b      alternation
// (?=abc)  positive lookahead  (?!abc)  negative lookahead
// (?<=…)   lookbehind — NOT supported in C++ ECMAScript dialect
// (?<name>…) named captures — NOT supported in standard <regex>

Flags

cpp
namespace rc = std::regex_constants;

// Case-insensitive matching
std::regex re1{R"([a-z]+)", rc::icase};

// Multiline (added in C++17): ^ and $ also match at line boundaries (ECMAScript grammar only)
std::regex re2{R"(^\d+)", rc::multiline};

// nosubs — disable capture group recording (cheaper matching for bool tests)
std::regex re3{R"(foo(bar))", rc::nosubs};

// optimize — hint: spend more time at construction for faster repeated matching
std::regex re4{R"(\d{4}-\d{2}-\d{2})", rc::optimize};

// rc::extended selects POSIX ERE grammar — it does NOT enable verbose/comment mode
// There is no whitespace-as-comment mode in standard <regex>
std::regex re5{R"([0-9]+\.[0-9]+)", rc::extended};

// Flags combine with |
std::regex re6{R"(hello\s+world)", rc::icase | rc::optimize};
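
As a rough way to observe two of these flags, the sketch below checks a case-insensitive match and counts how many sub-matches a search stores with and without nosubs. Both helper names are hypothetical, and the grammar flag is spelled out explicitly alongside the others as a conservative choice.

```cpp
#include <cstddef>
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// icase: the literal pattern "hello" matches regardless of letter case.
bool matches_ignoring_case(const std::string& s) {
    static const std::regex re("hello", rc::ECMAScript | rc::icase);
    return std::regex_match(s, re);
}

// Returns how many entries the match_results holds after a successful search
// (entry 0 is the whole match), or 0 if the search fails.
std::size_t stored_submatches(const std::string& pattern,
                              rc::syntax_option_type flags,
                              const std::string& input) {
    const std::regex re(pattern, flags);
    std::smatch m;
    return std::regex_search(input, m, re) ? m.size() : 0;
}
```

With foo(bar) and plain ECMAScript flags the result holds the whole match plus one group; adding nosubs is expected to leave only the whole match.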

Best Practices

Declare patterns static const. Constructing a std::regex is usually far more expensive than running a single match, on the order of microseconds to milliseconds depending on pattern complexity. A pattern rebuilt on every call rules regex out for any hot path.

cpp
// BAD — NFA compiled on every invocation
void validate_bad(const std::string& s) {
    if (std::regex_match(s, std::regex{R"(\d{3}-\d{4})"})) { /* ... */ }
}

// GOOD — compiled once; static const regex is safe to read from multiple threads
void validate_good(const std::string& s) {
    static const std::regex re{R"(\d{3}-\d{4})"};
    if (std::regex_match(s, re)) { /* ... */ }
}

Use raw string literals. R"(\d+\.\d+)" vs "\\d+\\.\\d+" — the difference grows fast with complex patterns, and raw literals eliminate a common source of bugs.

Catch std::regex_error on untrusted patterns. Construction throws std::regex_error (a std::runtime_error) for invalid patterns. Never construct std::regex from user input without a try/catch.

cpp
try {
    std::regex re{user_pattern};
} catch (const std::regex_error& e) {
    // e.code() returns std::regex_constants::error_type
    report_error(e.what());
}
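
One way to package this, sketched under the assumption that callers prefer a result object over exceptions: try_compile and CompileResult are hypothetical names, and the error text is whatever the implementation's what() returns.

```cpp
#include <regex>
#include <string>

struct CompileResult {
    bool ok;            // true if the pattern compiled
    std::regex re;      // usable only when ok is true
    std::string error;  // implementation-defined message when ok is false
};

CompileResult try_compile(const std::string& user_pattern) {
    try {
        return CompileResult{true, std::regex(user_pattern), ""};
    } catch (const std::regex_error& e) {
        // e.code() carries the std::regex_constants::error_type if callers need it.
        return CompileResult{false, std::regex(), e.what()};
    }
}
```

Compiling is only the first hurdle for untrusted patterns; a pattern that compiles can still be pathologically slow to match.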

Prefer rc::nosubs for boolean-only matches. When you only need a yes/no answer and the pattern has no captures you use, nosubs skips sub-match bookkeeping.

Common Pitfalls

regex_match vs regex_search confusion. regex_match fails unless the pattern accounts for the entire string. Padding a pattern with a leading and trailing .* to make it work with regex_match is a sign you want regex_search.

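ReDoS with nested quantifiers. Patterns like (a+)+b on a non-matching string like "aaaaaaaaac" trigger exponential backtracking, and the cost roughly doubles with each additional a. Unlike RE2, std::regex provides no linear-time guarantee. Some implementations abort such a match by throwing std::regex_error with error_complexity or error_stack, but the standard does not require it. Avoid nested quantifiers on variable-length groups in security-sensitive or user-controlled inputs.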
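
Two mitigations that need no extra library are flattening the nested quantifier and capping the input length before matching. The sketch below combines them; bounded_match, is_as_then_b, and the 256-character limit are assumptions made for illustration, not recommendations from the standard.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical guard: reject oversized input before the regex engine sees it.
bool bounded_match(const std::string& input,
                   const std::regex& re,
                   std::size_t max_len) {
    if (input.size() > max_len) return false;
    return std::regex_match(input, re);
}

// (a+)+b and a+b accept exactly the same strings, but the flat form
// gives the matcher only one way to split a run of 'a' characters.
bool is_as_then_b(const std::string& input) {
    static const std::regex flat("a+b");
    return bounded_match(input, flat, 256);
}
```

A length cap does not make a bad pattern safe, it only bounds the damage; rewriting the pattern is the actual fix.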

Dangling iterator references. sregex_iterator and sregex_token_iterator hold iterators into the source string. The string must outlive all iterators.

cpp
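// Undefined behavior: begin and end come from two different temporaries,
// and both temporaries are destroyed at the end of the statement
auto it = std::sregex_iterator{
    std::string{"foo 42"}.begin(),   // temporary #1
    std::string{"foo 42"}.end(),     // temporary #2, a different string
    re                               // assumes a std::regex named re is in scope
};
// Fix: bind the text to a named std::string that outlives the iterator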

match_results invalidated on string mutation. std::sub_match objects inside smatch hold iterators into the original string. Modifying the string after a successful match invalidates all results.
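
The usual remedy is to copy whatever you need out of the match before touching the source. A minimal sketch of that ordering, using a hypothetical helper that deliberately mutates its argument:

```cpp
#include <regex>
#include <string>

// Returns the first run of digits in text, then clears text.
std::string take_number_and_clear(std::string& text) {
    static const std::regex num(R"(\d+)");
    std::smatch m;
    std::string number;
    if (std::regex_search(text, m, num)) {
        number = m[0].str();  // copy while text is still unchanged
    }
    text.clear();             // m must not be used past this point; number owns its data
    return number;
}
```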

rc::extended is not verbose mode. std::regex_constants::extended selects POSIX ERE grammar — it does not enable whitespace-as-comment mode. Standard <regex> has no equivalent to Perl's /x or Python's re.VERBOSE flag.

When not to use std::regex

For fixed-string operations, avoid <regex> entirely:

cpp
s.find("needle");           // substring search
s.starts_with("prefix");    // C++20
s.ends_with(".cpp");        // C++20
std::ranges::count(s, '\n');  // character counting  // C++20

For performance-critical or high-throughput workloads:

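  • RE2: linear-time matching that is safe against ReDoS; in exchange it drops backreferences and lookaround, and its API (RE2::FullMatch, RE2::PartialMatch, RE2::GlobalReplace) differs from <regex>
  • PCRE2: full PCRE syntax with optional JIT compilation; often reported as several times faster than std::regex, though the gap depends on pattern and input
  • Hyperscan: Intel's SIMD-accelerated engine for simultaneous multi-pattern matching on byte streams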

std::regex is appropriate when patterns are not on hot paths, external dependencies are unwelcome, and ES3-dialect ECMAScript syntax is sufficient.

See Also