String Algorithms

C++ string operations — searching, splitting, trimming, case conversion, replace-all, and fast number parsing with from_chars and string_view.

std::string algorithms (since C++98)

std::string provides member functions for searching, modifying, and comparing character sequences. C++17 introduced std::string_view for zero-copy string references and std::from_chars/std::to_chars for locale-independent, allocation-free number conversion; C++20 added starts_with and ends_with; C++23 added contains.

Overview

std::string (since C++98) is a mutable, owning sequence of char. Its member API covers most common operations, but several utilities (splitting, joining, trimming, replace-all) have no single built-in and require a few lines of composition. C++20 added starts_with, ends_with, and std::views::split; contains followed in C++23. For read-only string handling, prefer std::string_view (C++17): it accepts both std::string and const char* arguments without copying.


Searching

cpp
#include <string>

std::string s = "hello world hello";

// find / rfind — return position or std::string::npos (C++98)
s.find("world")        // 6
s.find("world", 7)     // npos (start search at offset 7)
s.rfind("hello")       // 12 (last occurrence)
s.find('o')            // 4 (char overload)

// Character-set scanning (C++98)
s.find_first_of("aeiou")      // 1 ('e')
s.find_last_of("aeiou")       // 16 ('o')
s.find_first_not_of("helo ") // 6 ('w')
s.find_last_not_of(" ")      // 16

// C++20: boolean prefix/suffix predicates
s.starts_with("hello")   // true
s.ends_with("hello")     // true

// C++23: substring predicate
s.contains("world")      // true
s.contains("xyz")        // false

All find_* methods are also available on std::string_view with identical semantics.
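
Because find accepts a start offset, it can be called in a loop to visit every match. The sketch below assumes a hypothetical helper name, find_all, and collects the offsets of non-overlapping occurrences; it is one reasonable way to compose the calls above, not a standard facility.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of the standard library): returns the start
// offset of every non-overlapping occurrence of `needle` in `haystack`.
std::vector<std::size_t> find_all(std::string_view haystack, std::string_view needle) {
    std::vector<std::size_t> hits;
    if (needle.empty()) return hits;  // an empty needle would match at every offset
    for (std::size_t pos = haystack.find(needle);
         pos != std::string_view::npos;
         pos = haystack.find(needle, pos + needle.size())) {  // resume after the match
        hits.push_back(pos);
    }
    return hits;
}
```

With the string used above, find_all("hello world hello", "hello") yields the offsets 0 and 12. To count overlapping matches instead, resume the search at pos + 1.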


Substrings and Modification

cpp
std::string s = "hello world";

// substr — always allocates (C++98); prefer string_view::substr for read-only work
s.substr(6)     // "world"
s.substr(6, 3)  // "wor"

// replace, insert, erase (C++98)
s.replace(6, 5, "C++");     // "hello C++"
s.insert(5, ",");            // "hello, C++"
s.erase(5, 1);               // "hello C++" (removes the comma)

// Append (C++98)
s += " rocks";
s.append(3, '!');            // append 3 '!' chars
s.push_back('?');

// In-place case transform — MUST cast through unsigned char (avoid UB)
#include <algorithm>
#include <cctype>
std::transform(s.begin(), s.end(), s.begin(),
               [](unsigned char c) { return static_cast<char>(std::toupper(c)); });  // C++11 lambda; toupper returns int

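std::toupper and UB: Passing a plain char to std::toupper is undefined behaviour when char is signed and the value is negative (byte values above 127, such as UTF-8 multi-byte sequences or Latin-1 letters). Always convert to unsigned char first, either with a cast or by taking the lambda parameter as unsigned char.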
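
The same rule applies to std::tolower and to comparisons. As a minimal sketch, assuming single-byte text in the default "C" locale, the hypothetical helpers to_lower_ascii and iequals_ascii below route every character through unsigned char before calling into <cctype>.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: lower-cased copy of `in` (single-byte characters only).
std::string to_lower_ascii(std::string_view in) {
    std::string out(in);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Hypothetical helper: case-insensitive equality without allocating.
bool iequals_ascii(std::string_view a, std::string_view b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}
```

Neither helper understands multi-byte encodings; case folding for UTF-8 text needs a Unicode-aware library.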


Split

The standard library has no split built-in before C++20's ranges.

cpp
#include <string_view>
#include <vector>

// Manual split — O(n), no allocation per token for view version (C++17)
std::vector<std::string_view> split(std::string_view sv, char delim) {
    std::vector<std::string_view> out;
    for (;;) {
        auto pos = sv.find(delim);
        out.push_back(sv.substr(0, pos));
        if (pos == std::string_view::npos) break;
        sv.remove_prefix(pos + 1);  // string_view::remove_prefix — C++17
    }
    return out;  // views into the original buffer; don't outlive it
}

// C++20 ranges split: lazy, produces contiguous subranges
#include <ranges>
using namespace std::string_view_literals;  // required for the "..."sv suffix
auto rng = std::views::split("one,two,three"sv, ',');
for (auto part : rng) {
    std::string_view sv{part.begin(), part.end()};  // C++20
    // C++23: std::string_view sv{part};  (range constructor)
}

// C++23: collect into vector without manual loop
// auto parts = std::views::split("a,b,c"sv, ',')
//            | std::ranges::to<std::vector<std::string>>();

The manual version can be faster for small inputs because it avoids the lazy-range machinery, and its tokens are string_views, so the only allocation is the vector's own storage. Measure on your own workload before choosing one over the other.
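
Note that the manual split keeps empty tokens: "a,,b" produces three entries and an empty input produces one empty entry. When that is not wanted, a small variation can drop them. The function name split_skip_empty is an assumption made for this sketch.

```cpp
#include <string_view>
#include <vector>

// Hypothetical variant of the manual split that discards empty tokens.
// The returned views point into the caller's buffer and must not outlive it.
std::vector<std::string_view> split_skip_empty(std::string_view sv, char delim) {
    std::vector<std::string_view> out;
    while (!sv.empty()) {
        const auto pos = sv.find(delim);
        const auto token = sv.substr(0, pos);   // pos == npos takes the remainder
        if (!token.empty()) out.push_back(token);
        if (pos == std::string_view::npos) break;
        sv.remove_prefix(pos + 1);              // skip the token and its delimiter
    }
    return out;
}
```

Which behaviour is right depends on the format: CSV-like data usually needs the empty fields, while whitespace-separated words usually do not.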


Join

cpp
#include <cstddef>
#include <span>         // C++20
#include <string>
#include <string_view>

// Loop-based join; the std::span parameter makes this C++20
std::string join(std::span<const std::string> parts, std::string_view sep) {
    if (parts.empty()) return {};
    std::string out = parts[0];
    for (std::size_t i = 1; i < parts.size(); ++i) {
        out += sep;
        out += parts[i];
    }
    return out;
}

// Reserve up-front for large ranges (avoids repeated reallocations)
std::string join_reserved(std::span<const std::string> parts, std::string_view sep) {
    if (parts.empty()) return {};
    std::size_t total = sep.size() * (parts.size() - 1);
    for (auto& p : parts) total += p.size();
    std::string out;
    out.reserve(total);
    out.append(parts[0]);  // append rather than assign: assignment is not required to keep the reserved capacity
    for (std::size_t i = 1; i < parts.size(); ++i) {
        out.append(sep);
        out.append(parts[i]);
    }
    return out;
}
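
Where std::span is not available, or the parts are string_views or C strings rather than std::string, the same loop can be written as a C++17 template. This is a sketch under that assumption; join_any is a hypothetical name, and the only requirement on the element type is that std::string::append accepts it.

```cpp
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical C++17 join over any iterable range of string-like elements
// (std::string, std::string_view, or const char*).
template <class Range>
std::string join_any(const Range& parts, std::string_view sep) {
    std::string out;
    auto it = std::begin(parts);
    const auto last = std::end(parts);
    if (it == last) return out;        // empty range gives an empty string
    out.append(*it);
    for (++it; it != last; ++it) {
        out.append(sep);               // separator only between elements
        out.append(*it);
    }
    return out;
}
```

For example, joining a std::vector<std::string> holding "a", "b", "c" with ", " gives "a, b, c".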

Trim

cpp
#include <string_view>
#include <cctype>

// Whitespace characters per the C locale
constexpr std::string_view kWS = " \t\r\n\v\f";

std::string_view ltrim(std::string_view sv) {
    auto pos = sv.find_first_not_of(kWS);
    return pos == std::string_view::npos ? "" : sv.substr(pos);
}

std::string_view rtrim(std::string_view sv) {
    auto pos = sv.find_last_not_of(kWS);
    return pos == std::string_view::npos ? "" : sv.substr(0, pos + 1);
}

std::string_view trim(std::string_view sv) {
    return ltrim(rtrim(sv));
}

find_first_not_of with an explicit whitespace set sidesteps the ::isspace UB trap and is locale-independent. The functions return string_view — zero allocation.
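
When the caller needs an owning result rather than a view, the same two searches can be applied to a std::string directly. The sketch below assumes the hypothetical names kSpaces (the same character set as kWS) and trimmed_copy.

```cpp
#include <string>
#include <string_view>

// Same whitespace set as kWS; renamed so this sketch stands alone.
constexpr std::string_view kSpaces = " \t\r\n\v\f";

// Hypothetical helper: takes the string by value, trims both ends, returns it.
std::string trimmed_copy(std::string s) {
    const auto last = s.find_last_not_of(kSpaces);
    if (last == std::string::npos) return {};   // empty or all whitespace
    s.erase(last + 1);                          // drop trailing whitespace
    s.erase(0, s.find_first_not_of(kSpaces));   // drop leading whitespace
    return s;
}
```

Taking the argument by value lets callers pass a temporary or std::move an existing string, so no extra copy is made in those cases.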


Replace All

cpp
// std::string has no replace-all member (C++98–C++23)
std::string replace_all(std::string s, std::string_view from, std::string_view to) {
    if (from.empty()) return s;
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // advance past replacement — handles "aa"→"a" correctly
    }
    return s;
}
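
The in-place version shifts the tail of the string on every replacement whose length differs from the match. For inputs with many matches, building the result in a second buffer is a common alternative. The following is a sketch of that approach under the hypothetical name replace_all_copy, not a drop-in standard function.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical single-pass variant: copies unmatched stretches and replacements
// into a new string instead of editing the source in place.
std::string replace_all_copy(std::string_view s, std::string_view from, std::string_view to) {
    if (from.empty()) return std::string(s);    // same guard as replace_all
    std::string out;
    out.reserve(s.size());                      // a starting guess, not an exact size
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = s.find(from, start)) != std::string_view::npos) {
        out.append(s.substr(start, pos - start));  // text before the match
        out.append(to);
        start = pos + from.size();                 // continue after the match
    }
    out.append(s.substr(start));                   // remaining tail
    return out;
}
```

For example, replace_all_copy("a.b.c", ".", "::") returns "a::b::c".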

Number Conversions

cpp
#include <charconv>   // C++17
#include <format>     // C++20

// Fast, no-alloc, locale-independent (C++17)
char buf[32];
auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), 3.14159, std::chars_format::fixed, 2);
// ptr points past the last written char; ec == std::errc{} on success
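std::string_view result(buf, static_cast<std::size_t>(ptr - buf));  // "3.14" (pointer + length works in C++17)

const char text[] = "42abc";  // one array: two separate literals need not share an address
int n{};
auto [p2, ec2] = std::from_chars(text, text + 5, n);
// n == 42, p2 points to 'a', ec2 == std::errc{}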

// Legacy — locale-sensitive, may throw, allocates (C++11)
int  i = std::stoi("42");
double d = std::stod("3.14");
std::string s = std::to_string(42);  // "42"

// Formatted (C++20)
std::string f = std::format("{:.2f}", 3.14159);  // "3.14"
std::string h = std::format("{:#010x}", 255);    // "0x000000ff"

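Prefer std::from_chars/std::to_chars in hot paths: they skip locale handling and allocation, never throw, and are usually reported as several times faster than stoi/stod, although the exact ratio depends on the standard library and the input.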
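
Since from_chars stops at the first character it cannot use and reports problems through an error code, callers normally check both the code and the end pointer. A minimal sketch follows; parse_int is a hypothetical wrapper name, and treating trailing characters as an error is a design choice of this example.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical strict parser: succeeds only if the whole input is a valid int.
std::optional<int> parse_int(std::string_view sv) {
    int value{};
    const char* first = sv.data();
    const char* last = first + sv.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{}) return std::nullopt;  // no digits, or value out of range
    if (ptr != last) return std::nullopt;        // trailing characters such as "42abc"
    return value;
}
```

Unlike stoi, from_chars does not skip leading whitespace and does not accept a leading '+', so parse_int(" 42") and parse_int("+5") both return std::nullopt.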


std::string_view Efficiency

cpp
#include <string_view>

// Accept any string-like input without allocation (C++17)
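void process(std::string_view sv) {
    auto word = sv.substr(0, sv.find(' '));  // view into sv, no heap
    sv.remove_prefix(word.size());           // slide window in-place
    // Guard the npos case: without it, a missing '\n' would empty the view
    if (auto nl = sv.rfind('\n'); nl != std::string_view::npos)
        sv.remove_suffix(sv.size() - nl - 1);  // drop everything after the last newline
}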

process("literal");             // no allocation
process(some_std_string);       // implicit conversion, no copy
process({buf, len});            // from raw buffer (pointer + length)

// PITFALL: string_view does not own — never return a view into a local
std::string_view bad() {
    std::string s = compute();
    return s;   // s is destroyed; returned view dangles — UB
}

// Safe: return std::string or ensure the backing string outlives the view

std::string_view also exposes remove_prefix and remove_suffix (C++17), making it ideal as a cursor when parsing tokenised input without extra allocation.
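
As an illustration of the cursor idea, the sketch below pops space-delimited words off the front of a view one at a time. The names next_word and count_words are assumptions made for this example, and only the space character is treated as a separator.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the next word and advances `cursor` past it.
// Returns an empty view once no words remain.
std::string_view next_word(std::string_view& cursor) {
    const auto start = cursor.find_first_not_of(' ');
    if (start == std::string_view::npos) {      // nothing but spaces left
        cursor = {};
        return {};
    }
    cursor.remove_prefix(start);                // skip leading spaces
    const auto len = std::min(cursor.find(' '), cursor.size());
    const auto word = cursor.substr(0, len);
    cursor.remove_prefix(len);                  // consume the word
    return word;
}

// Usage: count words by draining the cursor; no allocation takes place.
std::size_t count_words(std::string_view text) {
    std::size_t n = 0;
    while (!next_word(text).empty()) ++n;
    return n;
}
```

The cursor is a copy of the pointer and length only, so advancing it never modifies or copies the underlying characters.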


Best Practices

  • Prefer string_view parameters over const std::string& for read-only functions; it works with string, const char*, and string literals without copies.
  • Reserve before building large strings. If the final size is predictable, call s.reserve(n) before a loop of append or += to avoid repeated reallocations.
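  • Use std::from_chars/std::to_chars for number serialisation in performance-sensitive code or wherever the result must not depend on the locale; stoi/stod are specified in terms of strtol/strtod, which consult the current C locale, and locale access may involve synchronisation on some implementations.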
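  • C++20 predicates first. Replace s.find(x) == 0 with s.starts_with(x): it states the intent directly and inspects only the first x.size() characters, whereas find keeps scanning the rest of the string when the prefix does not match.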
  • Cast through unsigned char before passing to any <cctype> function (toupper, tolower, isspace, isdigit) to avoid UB on platforms where char is signed.
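
For code that cannot yet rely on C++20, the starts_with and ends_with recommendation can be approximated with string_view::compare, which is available from C++17. The helper names has_prefix and has_suffix are hypothetical.

```cpp
#include <string_view>

// Hypothetical C++17 stand-in for starts_with: compares only prefix.size() characters.
bool has_prefix(std::string_view s, std::string_view prefix) {
    return s.size() >= prefix.size() &&
           s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical C++17 stand-in for ends_with.
bool has_suffix(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

Both take string_view, so they accept std::string, string literals, and const char* without copying, in line with the first recommendation in the list.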

Common Pitfalls

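  • Storing the result of find in a narrower or signed type. s.find(...) returns std::size_t. Assigning it to an unsigned int truncates npos on typical 64-bit platforms, so a later comparison with std::string::npos is always false; assigning it to int draws sign-conversion warnings and cannot represent offsets in very large strings. Keep the result in std::size_t (or auto) and always compare to std::string::npos.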
  • std::string::substr copies. Each call is linear in the length of the result and may allocate; in inner loops over large strings, use string_view::substr instead, which is O(1) and allocates nothing.
  • Dangling string_view from temporaries. std::string_view sv = std::string{"temp"}; compiles and immediately dangles — the temporary is destroyed at the semicolon.
  • std::views::split in C++20 returns subranges, not string_views. The range constructor std::string_view{part} was not available until C++23. Code that compiles on C++23 may fail on C++20 without the explicit {part.begin(), part.end()} form.
  • replace_all with an empty from string. Without a guard, find("", pos) always succeeds at every position, producing an infinite loop.

See Also