Small Buffer Optimization
Embed a fixed-size inline buffer inside an object to eliminate heap allocation for small payloads, trading object size for latency and cache locality.
Small Buffer Optimization (SBO) (since C++11)
An implementation technique that reserves a fixed-size inline storage buffer within an object, using placement new to construct contained values there when they fit, and falling back to heap allocation only for values that exceed the buffer.
Overview
Every call to operator new carries a cost: the allocator must locate a free block, update its bookkeeping, and return a pointer to memory that is unlikely to be in the CPU cache. For containers whose typical payload is small (a short string, a lightweight callable, a pointer-sized variant), that cost can dominate the runtime of the operation.
Small Buffer Optimization sidesteps allocation on the common path by embedding a raw byte array inside the containing object. The type maintains a discriminant, often a single pointer that is null on the inline path, to know whether live data resides in the buffer or on the heap. The public interface is identical in both cases; only the internal storage path differs.
Implementations of the C++ standard library use this technique widely. The standard does not mandate it for std::string, but all major implementations (libstdc++, libc++, and the MSVC STL) employ Short String Optimization (SSO), a direct specialization of SBO that typically stores up to 15 or 22 characters without touching the heap. Implementations of std::function (since C++11) generally keep an internal buffer for small callable objects, so storing or moving a lambda that captures a single pointer typically does not allocate; the standard encourages this but does not require it. Since C++17, the standard encourages std::any implementations to avoid dynamic allocation for small contained values, and permits that optimization only for types T where std::is_nothrow_move_constructible_v<T> is true.
The tradeoff is object size. The inline buffer inflates sizeof of the owning class regardless of whether the buffer is ever used. For objects stored densely in containers, oversizing the buffer hurts cache utilization. The correct buffer size is workload-specific and should be driven by profiling.
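The size cost can be made concrete with a compile-time sketch. WithBuffer and array_bytes below are hypothetical names used only for illustration; the exact sizes depend on the platform's padding and alignment rules, so the checks compare footprints rather than assert specific numbers.

```cpp
#include <cstddef>

// Hypothetical stand-in for an SBO type: an inline buffer plus a heap pointer.
template <std::size_t N>
struct WithBuffer {
    alignas(std::max_align_t) unsigned char buf[N];
    void* heap;
};

// Bytes occupied by `count` contiguous elements, e.g. inside a std::vector.
template <std::size_t N>
constexpr std::size_t array_bytes(std::size_t count) {
    return count * sizeof(WithBuffer<N>);
}

// The footprint is paid per element whether or not the buffer is ever used.
static_assert(sizeof(WithBuffer<32>) >= 32 + sizeof(void*));
static_assert(sizeof(WithBuffer<256>) >= 256 + sizeof(void*));
static_assert(array_bytes<256>(1000) > array_bytes<32>(1000));
```

A thousand elements with a 256-byte buffer occupy at least 256,000 bytes even if every stored value would have fit in 32.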
Syntax
The structural skeleton requires four components: an aligned raw buffer, an optional heap pointer that doubles as an inline/heap discriminant, a vtable or discriminant tag, and correct handling of copy, move, and destruction on both paths.
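
    #include <cstddef>
    #include <new>
    #include <type_traits>
    #include <utility>

    template <std::size_t InlineCapacity = 32>
    class SBOStorage {
    public:
        SBOStorage() noexcept = default;

        // Exclude SBOStorage itself so this template never acts as a copy constructor.
        template <typename T, typename = std::enable_if_t<
                      !std::is_same_v<std::decay_t<T>, SBOStorage>>>
        explicit SBOStorage(T&& value) {
            using U = std::decay_t<T>;
            if constexpr (fits_inline<U>()) {  // C++17 if constexpr
                ::new (static_cast<void*>(buf_)) U(std::forward<T>(value));
                destroy_ = [](void* p) noexcept { static_cast<U*>(p)->~U(); };
            } else {
                heap_ = new U(std::forward<T>(value));
                destroy_ = [](void* p) noexcept { delete static_cast<U*>(p); };
            }
        }

        // Copy and move are omitted from this skeleton; a complete type
        // must implement them for both storage paths.
        SBOStorage(const SBOStorage&) = delete;
        SBOStorage& operator=(const SBOStorage&) = delete;

        ~SBOStorage() {
            // Runs the stored type's destructor; the heap path also frees the block.
            if (destroy_) destroy_(heap_ ? heap_ : static_cast<void*>(buf_));
        }

    private:
        template <typename T>
        static constexpr bool fits_inline() noexcept {
            return sizeof(T) <= InlineCapacity &&
                   alignof(T) <= alignof(std::max_align_t);
        }

        // alignas makes placement new safe for any type without extended alignment
        alignas(std::max_align_t) char buf_[InlineCapacity]{};
        void* heap_ = nullptr;                       // null: inline path active
        void (*destroy_)(void*) noexcept = nullptr;  // null: storage is empty
    };

Using alignas(std::max_align_t) on the buffer satisfies the alignment requirement of every fundamental type. Types with extended alignment (e.g., __m256) fail the alignof(T) <= alignof(std::max_align_t) check in fits_inline and must either be rejected or allocated on the heap. std::aligned_storage was deprecated in C++23; prefer the alignas char array directly.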
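The size and alignment rule can be checked at compile time. The following is a minimal sketch, assuming a free-standing trait named fits_inline_v and two illustrative types, Overaligned and Big; none of these names come from a library.

```cpp
#include <cstddef>

// Hypothetical trait mirroring fits_inline(): size and alignment must both fit.
template <typename T, std::size_t Capacity>
inline constexpr bool fits_inline_v =
    sizeof(T) <= Capacity && alignof(T) <= alignof(std::max_align_t);

// Hypothetical over-aligned type, standing in for SIMD types such as __m256.
struct alignas(64) Overaligned {
    float lanes[8];
};

struct Big {
    char data[256];
};

static_assert(fits_inline_v<double, 32>);         // fundamental type: fits
static_assert(!fits_inline_v<Big, 32>);           // too large for the buffer
static_assert(!fits_inline_v<Overaligned, 128>);  // small enough, but over-aligned
```

Checking alignment separately from size matters because a type can be smaller than the buffer and still require stricter alignment than the buffer provides.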
Examples
Type-erased value container
The following sketch shows a common pattern: a static, per-type table of function pointers (a hand-rolled vtable) handles the type-specific operations, while the storage path is chosen at construction time via if constexpr.

    #include <cstddef>
    #include <new>
    #include <type_traits>
    #include <utility>

    template <std::size_t N = 48>
    class AnySmall {
        struct Ops {
            std::size_t size;  // sizeof(T), needed by the heap copy path
            void (*destroy)(void*) noexcept;
            void (*copy_to)(void* dst, const void* src);
            void (*move_to)(void* dst, void* src) noexcept;
        };

        template <typename T>
        static const Ops* ops_for() noexcept {
            static constexpr Ops kOps{
                sizeof(T),
                [](void* p) noexcept { static_cast<T*>(p)->~T(); },
                [](void* dst, const void* src) {
                    ::new (dst) T(*static_cast<const T*>(src));
                },
                // Only called on the inline path, where the move cannot throw.
                [](void* dst, void* src) noexcept {
                    ::new (dst) T(std::move(*static_cast<T*>(src)));
                    static_cast<T*>(src)->~T();
                },
            };
            return &kOps;
        }

        template <typename T>
        static constexpr bool inline_ok() noexcept {
            return sizeof(T) <= N && alignof(T) <= alignof(std::max_align_t) &&
                   std::is_nothrow_move_constructible_v<T>;  // _v form: C++17
        }

        alignas(std::max_align_t) char buf_[N]{};
        void* heap_ = nullptr;      // null: inline path active (or empty)
        const Ops* ops_ = nullptr;  // null: empty

        void* live() noexcept { return heap_ ? heap_ : static_cast<void*>(buf_); }
        const void* live() const noexcept {
            return heap_ ? heap_ : static_cast<const void*>(buf_);
        }

    public:
        AnySmall() = default;

        template <typename T, typename = std::enable_if_t<  // SFINAE guard
                      !std::is_same_v<std::decay_t<T>, AnySmall>>>
        explicit AnySmall(T&& v) {
            using U = std::decay_t<T>;
            static_assert(std::is_copy_constructible_v<U>);
            // Keeps plain ::operator new/delete valid on the heap path.
            static_assert(alignof(U) <= __STDCPP_DEFAULT_NEW_ALIGNMENT__,
                          "over-aligned types are not supported in this sketch");
            ops_ = ops_for<U>();
            if constexpr (inline_ok<U>()) {
                ::new (static_cast<void*>(buf_)) U(std::forward<T>(v));
            } else {
                heap_ = ::new U(std::forward<T>(v));
            }
        }

        AnySmall(AnySmall&& o) noexcept : ops_(o.ops_) {
            if (!ops_) return;
            if (o.heap_) {
                heap_ = o.heap_;  // steal the allocation
                o.heap_ = nullptr;
            } else {
                ops_->move_to(buf_, o.buf_);  // moves the payload itself; no pointer to swap
            }
            o.ops_ = nullptr;
        }

        AnySmall(const AnySmall& o) : ops_(o.ops_) {
            if (!ops_) return;
            if (o.heap_) {
                // Allocate raw storage, then copy-construct in place.
                void* p = ::operator new(ops_->size);
                try {
                    ops_->copy_to(p, o.heap_);
                } catch (...) {
                    ::operator delete(p);  // do not leak if the copy throws
                    throw;
                }
                heap_ = p;
            } else {
                ops_->copy_to(buf_, o.buf_);
            }
        }

        ~AnySmall() {
            if (ops_) ops_->destroy(live());
            ::operator delete(heap_);  // no-op when null
        }
    };

The size field in Ops lets the copy constructor call ::operator new with the correct size even though the concrete type is no longer known at that point. A complete implementation would also record the alignment, so that over-aligned types can be allocated and freed with the aligned forms of operator new and operator delete, and would add assignment operators. Libraries such as folly::Function combine inline storage with a per-type operations function in a similar way.
Observing SSO in std::string

    #include <iostream>
    #include <string>

    int main() {
        std::string small = "hello";  // fits the inline buffer on libstdc++, libc++, and MSVC
        std::string large(64, 'x');   // exceeds the inline buffer: heap allocated

        // The inline capacity is implementation-specific.
        // On libstdc++ (GCC) it is 15 chars.
        // On libc++ (Clang) it is typically 22 chars on 64-bit targets.
        std::cout << small.capacity() << '\n';  // often 15 or 22
        std::cout << large.capacity() << '\n';  // >= 64
    }

std::function and small callables (C++11)

    #include <functional>

    // Stand-in definition so the example links; a real registry would store f.
    void register_handler(std::function<int(int)> f) { (void)f(0); }

    int main() {
        int offset = 10;
        // Single captured int: stored inline on common implementations, no allocation.
        register_handler([offset](int x) { return x + offset; });

        // A large capture exceeds the internal buffer and forces the heap fallback.
        struct Big { char data[256]; };
        Big b{};
        register_handler([b](int x) { (void)b; return x; });  // heap allocated
    }

Best Practices
Size the buffer from data, not intuition. Profile the distribution of payload sizes before choosing N. A buffer of 48 to 64 bytes is a common starting point for event systems and callback registries, covering typical captures without excessive per-object overhead.
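Gate inline storage on is_nothrow_move_constructible. If moving the inline object can throw, moving the container can throw too, so the container's move constructor cannot be noexcept and standard containers fall back to copying it during reallocation. std::any applies the same gate: the standard permits its small-object optimization only for nothrow-move-constructible types.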
Extend the vtable with size and alignment. Entries for alloc_size and alloc_align let the copy path allocate correctly without knowing the concrete type. Because the table is shared by all objects holding the same type, no size has to be stored in each object.
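Mark move operations noexcept when the inline path cannot throw. Containers such as std::vector choose between move and copy during reallocation based on whether the move constructor is noexcept. If it is not, reallocation copies every element instead of moving it: push_back remains amortized O(1) in element operations, but each of those operations becomes a full copy, which on the heap path means a fresh allocation per element.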
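The effect of noexcept on reallocation can be observed directly. The sketch below uses a hypothetical element type, Tracked, whose move constructor is conditionally noexcept, and counts how std::vector relocates four existing elements when its capacity grows. Counters, grow_once, and check_reallocation_strategy are likewise illustrative names.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Counts copies and moves so the reallocation strategy becomes observable.
struct Counters {
    int copies = 0;
    int moves = 0;
};

// Hypothetical element type; NoexceptMove toggles the exception
// specification of the move constructor.
template <bool NoexceptMove>
struct Tracked {
    Counters* c;
    explicit Tracked(Counters* counters) : c(counters) {}
    Tracked(const Tracked& o) : c(o.c) { ++c->copies; }
    Tracked(Tracked&& o) noexcept(NoexceptMove) : c(o.c) { ++c->moves; }
};

static_assert(std::is_nothrow_move_constructible_v<Tracked<true>>);
static_assert(!std::is_nothrow_move_constructible_v<Tracked<false>>);

// Fills a vector without reallocating, then forces exactly one reallocation.
template <bool NoexceptMove>
Counters grow_once() {
    Counters counters;
    std::vector<Tracked<NoexceptMove>> v;
    v.reserve(4);
    for (int i = 0; i < 4; ++i) v.emplace_back(&counters);
    counters = Counters{};        // ignore anything counted before this point
    v.reserve(v.capacity() + 1);  // relocates the four existing elements
    return counters;
}

// Call from main() to check both strategies.
inline void check_reallocation_strategy() {
    assert(grow_once<true>().moves == 4);    // noexcept move: elements are moved
    assert(grow_once<true>().copies == 0);
    assert(grow_once<false>().copies == 4);  // throwing move: vector copies instead
    assert(grow_once<false>().moves == 0);
}
```

For an SBO type, each of those copies on the heap path would be a new allocation plus a copy construction, which is exactly the cost the optimization was meant to avoid.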
Common Pitfalls
Assuming moves are O(1). When the inline buffer is active, a move must memcpy or move-construct the buffer contents. There is no pointer to swap. Code that benchmarks moves expecting constant time will find linear-in-buffer-size performance on the inline path.
Alignment UB. A buffer declared as char buf_[N] without alignas is only guaranteed 1-byte alignment, so constructing a double (typically 8-byte aligned) in it is undefined behavior whenever the address is not suitably aligned. Always use alignas(std::max_align_t) for a general-purpose buffer, or alignas(T) when the stored type is known at compile time.
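Double-free on self-move. A naive operator=(AnySmall&& o) that frees heap_ and then takes o.heap_ is safe only if this != &o; on self-move it adopts a pointer it has just freed, and the destructor frees it again. Guard against self-move explicitly or implement assignment with the move-and-swap idiom.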
Forgetting to nullify after move. After transferring heap_ to the new owner, the source must set heap_ = nullptr. Otherwise its destructor will call ::operator delete on memory it no longer owns.
Buffer too large in hot paths. A 256-byte SBO buffer on a type that lives in a std::vector<T> means 256 bytes per element even when all values are small. The resulting cache pressure can erase the allocation savings entirely. Prefer the smallest buffer that covers the p95 payload.
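Several of these pitfalls meet in the move operations. The following is a minimal sketch, assuming a hypothetical byte container named SmallBytes that is not a library type: the inline path has to copy the payload, the heap path steals the pointer and nulls the source, and move assignment guards against self-move.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical byte container: payloads up to N bytes live in buf_,
// larger ones on the heap.
template <std::size_t N = 16>
class SmallBytes {
    unsigned char buf_[N]{};
    unsigned char* heap_ = nullptr;  // null means the inline path is active
    std::size_t size_ = 0;

    void steal(SmallBytes& o) noexcept {
        size_ = o.size_;
        if (o.heap_) {
            heap_ = o.heap_;    // heap path: transfer ownership
            o.heap_ = nullptr;  // the source must forget the block
        } else {
            std::memcpy(buf_, o.buf_, o.size_);  // inline path: copy the bytes
        }
        o.size_ = 0;
    }

public:
    SmallBytes() = default;
    SmallBytes(const void* src, std::size_t n) : size_(n) {
        if (n > N) heap_ = new unsigned char[n];
        if (n) std::memcpy(data(), src, n);
    }
    SmallBytes(SmallBytes&& o) noexcept { steal(o); }
    SmallBytes& operator=(SmallBytes&& o) noexcept {
        if (this != &o) {  // self-move guard
            delete[] heap_;
            heap_ = nullptr;
            steal(o);
        }
        return *this;
    }
    SmallBytes(const SmallBytes&) = delete;
    SmallBytes& operator=(const SmallBytes&) = delete;
    ~SmallBytes() { delete[] heap_; }

    bool on_heap() const noexcept { return heap_ != nullptr; }
    std::size_t size() const noexcept { return size_; }
    unsigned char* data() noexcept { return heap_ ? heap_ : buf_; }
    const unsigned char* data() const noexcept { return heap_ ? heap_ : buf_; }
};

// Usage sketch exercising both paths and the self-move guard.
inline void demo_small_bytes() {
    SmallBytes<16> a("hello", 5);    // inline
    SmallBytes<16> b(std::move(a));  // copies 5 bytes; there is nothing to steal
    assert(!b.on_heap() && b.size() == 5);

    unsigned char big[64] = {};
    SmallBytes<16> c(big, sizeof big);  // heap
    SmallBytes<16> d(std::move(c));     // steals the pointer
    assert(d.on_heap() && !c.on_heap());

    SmallBytes<16>& alias = d;
    d = std::move(alias);  // self-move leaves d intact
    assert(d.on_heap() && d.size() == 64);
}
```

Without the guard, the self-move would delete the block and then adopt the dangling pointer. Without nulling o.heap_ in steal, both objects would delete the same block.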
See Also
std::any (C++17): standard type-erased container, encouraged to use SBO internally
std::function (C++11): type-erases callables, typically with an internal fixed buffer
std::string: Short String Optimization is the canonical SBO specialization for character data
boost::container::small_vector / llvm::SmallVector: SBO applied to dynamic arrays, with inline storage for up to N elements before spilling to the heap