String Algorithms
C++ string operations — searching, splitting, trimming, case conversion, replace-all, and fast number parsing with from_chars and string_view.
std::string algorithms (since C++98)
std::string provides member functions for searching, modifying, and comparing character sequences. C++17 introduced std::string_view for zero-copy string references and std::from_chars/std::to_chars for locale-independent, allocation-free number parsing. C++20 added starts_with and ends_with, and C++23 added contains.
Overview
std::string (since C++98) is a mutable, owning sequence of char. Its member API covers most common operations, but several utilities (splitting, joining, trimming, replace-all) have no single built-in and require a few lines of composition. C++20 added the predicates starts_with and ends_with along with std::views::split; contains followed in C++23. For read-only string handling, prefer std::string_view (C++17): it accepts both std::string and const char* arguments without copying.
Searching
#include <string>
std::string s = "hello world hello";
// find / rfind — return position or std::string::npos (C++98)
s.find("world") // 6
s.find("world", 7) // npos (start search at offset 7)
s.rfind("hello") // 12 (last occurrence)
s.find('o') // 4 (char overload)
// Character-set scanning (C++98)
s.find_first_of("aeiou") // 1 ('e')
s.find_last_of("aeiou") // 16 ('o')
s.find_first_not_of("helo ") // 6 ('w')
s.find_last_not_of(" ") // 16
// C++20: cleaner boolean predicates
s.starts_with("hello") // true
s.ends_with("hello") // true
// C++23: substring test
s.contains("world") // true
s.contains("xyz") // false
All find_* methods are also available on std::string_view with identical semantics.
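Because find accepts a start offset, repeated calls can walk through every match in a string. The following is a minimal sketch of that pattern; find_all is a hypothetical helper name (not a standard function), and it assumes non-overlapping matches are wanted.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: collect the start offset of every non-overlapping
// occurrence of `needle` in `haystack`.
std::vector<std::size_t> find_all(std::string_view haystack, std::string_view needle) {
    std::vector<std::size_t> hits;
    if (needle.empty()) return hits;  // an empty needle matches everywhere; report nothing
    for (std::size_t pos = haystack.find(needle);
         pos != std::string_view::npos;
         pos = haystack.find(needle, pos + needle.size())) {  // resume after the match
        hits.push_back(pos);
    }
    return hits;
}
```

To report overlapping matches instead, resume the search at pos + 1 rather than pos + needle.size().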
Substrings and Modification
std::string s = "hello world";
// substr — always allocates (C++98); prefer string_view::substr for read-only work
s.substr(6) // "world"
s.substr(6, 3) // "wor"
// replace, insert, erase (C++98)
s.replace(6, 5, "C++"); // "hello C++"
s.insert(5, ","); // "hello, C++"
s.erase(5, 1); // "hello C++" (removes the comma; erase(5, 2) would also eat the space)
// Append (C++98)
s += " rocks";
s.append(3, '!'); // append 3 '!' chars
s.push_back('?');
// In-place case transform — MUST cast through unsigned char (avoid UB)
#include <algorithm>
#include <cctype>
std::transform(s.begin(), s.end(), s.begin(),
[](unsigned char c) { return std::toupper(c); }); // C++11 lambda
::toupper and UB: Passing a char directly to std::toupper is undefined behaviour when char is signed and the value is negative (i.e., byte values above 127, such as the bytes of multi-byte UTF-8 sequences). Always cast through unsigned char.
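The same cast applies to any helper built on <cctype>. As a sketch, the two functions below (hypothetical names, not standard library functions) produce a lower-cased copy and compare two strings case-insensitively. Both work byte by byte in the current C locale, so they are only suitable for ASCII-style text, not for Unicode case folding.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: return a lower-cased copy of `in`.
std::string to_lower_copy(std::string_view in) {
    std::string out(in);
    std::transform(out.begin(), out.end(), out.begin(),
                   // Taking the parameter as unsigned char performs the required cast.
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Hypothetical helper: byte-wise case-insensitive equality.
bool iequals(std::string_view a, std::string_view b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}
```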
Split
The standard library has no split built-in before C++20's ranges.
#include <string_view>
#include <vector>
// Manual split — O(n), no allocation per token for view version (C++17)
std::vector<std::string_view> split(std::string_view sv, char delim) {
std::vector<std::string_view> out;
for (;;) {
auto pos = sv.find(delim);
out.push_back(sv.substr(0, pos));
if (pos == std::string_view::npos) break;
sv.remove_prefix(pos + 1); // string_view::remove_prefix — C++17
}
return out; // views into the original buffer; don't outlive it
}
// C++20 ranges split: lazy, produces contiguous subranges
#include <ranges>
using namespace std::literals; // enables the "..."sv suffix used below
auto rng = std::views::split("one,two,three"sv, ',');
for (auto part : rng) {
std::string_view sv{part.begin(), part.end()}; // C++20
// C++23: std::string_view sv{part}; (range constructor)
}
// C++23: collect into vector without manual loop
// auto parts = std::views::split("a,b,c"sv, ',')
// | std::ranges::to<std::vector<std::string>>();
The manual version is often faster for small inputs because it avoids the lazy-range machinery, and its string_view tokens need no per-token allocation; only the result vector allocates.
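The manual loop extends naturally to multi-character delimiters such as ", " or "\r\n". The sketch below shows one way to do it; split_by is a hypothetical name, and treating an empty delimiter as "do not split" is an assumption made here, not a standard convention.

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper: split `sv` on every occurrence of the string `delim`.
// The returned views point into the caller's buffer and must not outlive it.
std::vector<std::string_view> split_by(std::string_view sv, std::string_view delim) {
    std::vector<std::string_view> out;
    if (delim.empty()) {  // assumption: an empty delimiter means "no split"
        out.push_back(sv);
        return out;
    }
    for (;;) {
        auto pos = sv.find(delim);
        out.push_back(sv.substr(0, pos));  // pos == npos takes the remainder
        if (pos == std::string_view::npos) break;
        sv.remove_prefix(pos + delim.size());
    }
    return out;
}
```

Like the single-character version, this keeps empty tokens: splitting "a,,b" on "," yields three tokens, the middle one empty.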
Join
#include <span> // C++20
#include <string>
#include <string_view>
// Loop-based join; the std::span parameter requires C++20
std::string join(std::span<const std::string> parts, std::string_view sep) {
if (parts.empty()) return {};
std::string out = parts[0];
for (std::size_t i = 1; i < parts.size(); ++i) {
out += sep;
out += parts[i];
}
return out;
}
// Reserve up-front for large ranges (avoids repeated reallocations)
std::string join_reserved(std::span<const std::string> parts, std::string_view sep) {
if (parts.empty()) return {};
std::size_t total = sep.size() * (parts.size() - 1);
for (auto& p : parts) total += p.size();
std::string out;
out.reserve(total);
out.append(parts[0]); // append keeps the reserved buffer; assignment is not guaranteed to
for (std::size_t i = 1; i < parts.size(); ++i) {
out.append(sep);
out.append(parts[i]);
}
return out;
}
Trim
#include <string_view>
#include <cctype>
// Whitespace characters per the C locale
constexpr std::string_view kWS = " \t\r\n\v\f";
std::string_view ltrim(std::string_view sv) {
auto pos = sv.find_first_not_of(kWS);
return pos == std::string_view::npos ? "" : sv.substr(pos);
}
std::string_view rtrim(std::string_view sv) {
auto pos = sv.find_last_not_of(kWS);
return pos == std::string_view::npos ? "" : sv.substr(0, pos + 1);
}
std::string_view trim(std::string_view sv) {
return ltrim(rtrim(sv));
}
find_first_not_of with an explicit whitespace set sidesteps the ::isspace UB trap and is locale-independent. The functions return string_view, so they allocate nothing.
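A typical use of trimming is cleaning up the two halves of a "key = value" line. The sketch below is self-contained, so it carries its own compact trim (trim_view); both trim_view and parse_kv are hypothetical names, and rejecting lines with an empty key is an assumption of this example.

```cpp
#include <optional>
#include <string_view>
#include <utility>

constexpr std::string_view kSpace = " \t\r\n\v\f";

// Compact two-sided trim, equivalent in effect to ltrim(rtrim(sv)).
std::string_view trim_view(std::string_view sv) {
    auto first = sv.find_first_not_of(kSpace);
    if (first == std::string_view::npos) return {};  // all whitespace
    auto last = sv.find_last_not_of(kSpace);
    return sv.substr(first, last - first + 1);
}

// Hypothetical helper: split "key = value" at the first '=' and trim both sides.
// Returns nullopt when there is no '=' or the key is empty.
std::optional<std::pair<std::string_view, std::string_view>>
parse_kv(std::string_view line) {
    auto eq = line.find('=');
    if (eq == std::string_view::npos) return std::nullopt;
    auto key = trim_view(line.substr(0, eq));
    if (key.empty()) return std::nullopt;
    return std::pair{key, trim_view(line.substr(eq + 1))};
}
```

Both returned views point into the caller's line, so the line must stay alive for as long as the key and value are used.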
Replace All
// std::string has no replace-all member (C++98–C++23)
std::string replace_all(std::string s, std::string_view from, std::string_view to) {
if (from.empty()) return s;
std::size_t pos = 0;
while ((pos = s.find(from, pos)) != std::string::npos) {
s.replace(pos, from.size(), to);
pos += to.size(); // skip past the replacement so a `to` containing `from` (e.g. "a"→"aa") is not matched again
}
return s;
}
Number Conversions
#include <charconv> // C++17
#include <format> // C++20
// Fast, no-alloc, locale-independent (C++17)
char buf[32];
auto [ptr, ec] = std::to_chars(buf, buf + sizeof(buf), 3.14159, std::chars_format::fixed, 2);
// ptr points past the last written char; ec == std::errc{} on success
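std::string_view result{buf, static_cast<std::size_t>(ptr - buf)}; // "3.14" (pointer + length form works in C++17)
std::string_view input = "42abc"; // one buffer, so first and last refer to the same array
int n{};
auto [p2, ec2] = std::from_chars(input.data(), input.data() + input.size(), n);
// n == 42, p2 points to 'a', ec2 == std::errc{}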
// Legacy — locale-sensitive, may throw, allocates (C++11)
int i = std::stoi("42");
double d = std::stod("3.14");
std::string s = std::to_string(42); // "42"
// Formatted (C++20)
std::string f = std::format("{:.2f}", 3.14159); // "3.14"
std::string h = std::format("{:#010x}", 255); // "0x000000ff"
Prefer std::from_chars/std::to_chars in hot paths: they never throw, never allocate, and are typically several times faster than stoi/stod (5–10× in some benchmarks, though the gap varies by platform and input).
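Since std::from_chars stops at the first character it cannot parse, strict validation needs a check that the whole input was consumed. A minimal sketch of such a wrapper follows; parse_int is a hypothetical name, and rejecting trailing characters is a policy chosen for this example.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse the entire string as a base-10 int.
// Returns nullopt on empty input, non-numeric text, trailing characters,
// or a value outside the range of int.
std::optional<int> parse_int(std::string_view text) {
    int value{};
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```

Unlike std::stoi, std::from_chars does not skip leading whitespace and does not accept a leading '+', so " 42" and "+5" are both rejected by this wrapper.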
std::string_view Efficiency
#include <string_view>
// Accept any string-like input without allocation (C++17)
void process(std::string_view sv) {
auto word = sv.substr(0, sv.find(' ')); // view into sv, no heap
sv.remove_prefix(word.size()); // slide window in-place
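if (auto nl = sv.rfind('\n'); nl != std::string_view::npos)
sv.remove_suffix(sv.size() - nl - 1); // drop everything after the last newline; without the guard, npos would empty the view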
}
process("literal"); // no allocation
process(some_std_string); // implicit conversion, no copy
process({buf, len}); // from raw buffer (pointer + length)
// PITFALL: string_view does not own — never return a view into a local
std::string_view bad() {
std::string s = compute();
return s; // s is destroyed; returned view dangles — UB
}
// Safe: return std::string or ensure the backing string outlives the view
std::string_view also exposes remove_prefix and remove_suffix (C++17), making it ideal as a cursor when parsing tokenised input without extra allocation.
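The cursor idea can be sketched as a small tokenizer that consumes its input view as it goes. next_token is a hypothetical name, and the sketch assumes tokens are separated by one or more space characters only.

```cpp
#include <string_view>

// Hypothetical helper: return the next space-delimited token and advance
// `cursor` past it. Returns an empty view once the input is exhausted.
std::string_view next_token(std::string_view& cursor) {
    auto start = cursor.find_first_not_of(' ');
    if (start == std::string_view::npos) {  // only spaces (or nothing) left
        cursor = {};
        return {};
    }
    cursor.remove_prefix(start);            // skip leading spaces
    auto token = cursor.substr(0, cursor.find(' '));
    cursor.remove_prefix(token.size());     // slide the window past the token
    return token;
}
```

Each call does constant bookkeeping beyond the scan itself and never allocates; the tokens are views into whatever buffer the cursor was created from.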
Best Practices
- Prefer string_view parameters over const std::string& for read-only functions; it works with string, const char*, and string literals without copies.
- Reserve before building large strings. If the final size is predictable, call s.reserve(n) before a loop of append or += to avoid repeated reallocations.
- Use std::from_chars/std::to_chars for number serialisation in performance-sensitive or locale-sensitive contexts; stoi/stod respect the global locale and, on some implementations, may contend on a locale lock.
- C++20 predicates first. Replace s.find(x) == 0 with s.starts_with(x): it reads more clearly and inspects only the prefix, whereas find keeps scanning the rest of the string when the prefix does not match.
- Cast through unsigned char before passing to any <cctype> function (toupper, tolower, isspace, isdigit) to avoid UB on platforms where char is signed.
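For code bases that cannot use C++20 yet, the prefix and suffix checks can be approximated on std::string_view with compare. This is a sketch under that assumption; has_prefix and has_suffix are hypothetical names.

```cpp
#include <string_view>

// Hypothetical C++17 stand-in for starts_with: compares only the first
// prefix.size() characters, so the cost does not depend on the length of s.
constexpr bool has_prefix(std::string_view s, std::string_view prefix) {
    return s.size() >= prefix.size() &&
           s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical C++17 stand-in for ends_with.
constexpr bool has_suffix(std::string_view s, std::string_view suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

The size check comes first so that the offset passed to compare can never exceed the length of s.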
Common Pitfalls
- string::npos comparisons with narrower or signed integers. s.find(...) returns std::size_t. Storing the result in an unsigned int truncates npos, so a later comparison with std::string::npos is always false on typical 64-bit platforms; storing it in an int draws sign-comparison warnings and cannot represent large offsets. Keep the result in std::size_t (or auto) and always compare to std::string::npos.
- substr is O(n). std::string::substr copies the characters it returns and usually allocates. In inner loops over large strings, use string_view::substr instead; it's O(1) and allocates nothing.
- Dangling string_view from temporaries. std::string_view sv = std::string{"temp"}; compiles and immediately dangles: the temporary is destroyed at the semicolon.
- std::views::split in C++20 returns subranges, not string_views. The range constructor std::string_view{part} was not available until C++23. Code that compiles on C++23 may fail on C++20 without the explicit {part.begin(), part.end()} form.
- replace_all with an empty from string. Without a guard, find("", pos) succeeds at every position, so the loop never reaches npos and does not terminate.
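One way to sidestep npos comparisons entirely is to wrap find so that "not found" becomes an empty std::optional. This is only a sketch of that idea; index_of is a hypothetical name, not a standard function.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: position of the first occurrence of `needle`,
// or nullopt when it does not occur. The result keeps the full
// std::size_t width, so no truncation of npos can happen at the call site.
std::optional<std::size_t> index_of(std::string_view haystack, std::string_view needle) {
    auto pos = haystack.find(needle);
    if (pos == std::string_view::npos) return std::nullopt;
    return pos;
}
```

Callers then write if (auto i = index_of(s, "x")) { ... } and use *i inside the branch, which leaves no sentinel value to compare against.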
See Also
- std::string_view: lightweight non-owning string reference
- std::format: type-safe formatted string construction (C++20)
- std::regex: POSIX/ECMAScript regular expressions (C++11)
- std::from_chars / std::to_chars: fast number ↔ string conversion (C++17)
- std::ranges::split_view: lazy range-based splitting (C++20)