Hello and welcome to bnch_swt, or "Benchmark Suite". This is a collection of classes and functions for benchmarking CPU performance.
The following operating systems and compilers are officially supported:
This guide will walk you through setting up and running benchmarks using BenchmarkSuite.
To use BenchmarkSuite, include the necessary header files in your project. Ensure you have a C++23 (or later) compliant compiler.
#include <BnchSwt/BenchmarkSuite.hpp>
#include <vector>
#include <string>
#include <cstring>
The following example demonstrates how to set up and run a benchmark comparing two integer-to-string conversion functions:
template<size_t count, typename value_type, bnch_swt::string_literal testName>
BNCH_SWT_INLINE void testFunction() {
    std::vector<value_type> testValues{ generateRandomIntegers<value_type>(count, sizeof(value_type) == 4 ? 10 : 20) };
    std::vector<std::string> testValues00;
    std::vector<std::string> testValues01(count);
    for (size_t x = 0; x < count; ++x) {
        testValues00.emplace_back(std::to_string(testValues[x]));
    }
    bnch_swt::benchmark_stage<"old-vs-new-i-to-str" + testName>::template runBenchmark<"glz::to_chars", "CYAN">([&] {
        size_t bytesProcessed = 0;
        char newerString[30]{};
        for (size_t x = 0; x < count; ++x) {
            std::memset(newerString, '\0', sizeof(newerString));
            auto newPtr = to_chars(newerString, testValues[x]);
            bytesProcessed += testValues00[x].size();
            testValues01[x] = std::string{ newerString, static_cast<size_t>(newPtr - newerString) };
        }
        bnch_swt::doNotOptimizeAway(bytesProcessed);
        return bytesProcessed;
    });
    bnch_swt::benchmark_stage<"old-vs-new-i-to-str" + testName>::template runBenchmark<"jsonifier_internal::toChars", "CYAN">([&] {
        size_t bytesProcessed = 0;
        char newerString[30]{};
        for (size_t x = 0; x < count; ++x) {
            std::memset(newerString, '\0', sizeof(newerString));
            auto newPtr = jsonifier_internal::toChars(newerString, testValues[x]);
            bytesProcessed += testValues00[x].size();
            testValues01[x] = std::string{ newerString, static_cast<size_t>(newPtr - newerString) };
        }
        bnch_swt::doNotOptimizeAway(bytesProcessed);
        return bytesProcessed;
    });
    bnch_swt::benchmark_stage<"old-vs-new-i-to-str" + testName>::printResults(true, false);
}

int main() {
    testFunction<512, uint64_t, "-uint64">();
    testFunction<512, int64_t, "-int64">();
    return 0;
}
To create a benchmark:
- Generate or initialize test data.
- Use bnch_swt::benchmark_stage to define a benchmark. By setting the name of the bnch_swt::benchmark_stage using a string literal, you are instantiating a single "stage" within which to execute different benchmarks.
- Implement test functions with lambdas capturing your benchmark logic.
The benchmark_stage structure orchestrates each test:
- runBenchmark(): Executes a given lambda function, measuring performance. By setting the name of the benchmark 'run' using a string literal, you are instantiating a single benchmark "entity" or "library" to have its data collected and compared, within the given benchmark stage.
- printResults(): Displays detailed performance metrics and comparisons.
In the example above:
- runBenchmark: Executes a lambda function and tracks performance.
- "glz::to_chars": A label for the function being benchmarked.
- "jsonifier_internal::toChars": A label for the alternative implementation to compare against.
Use bnch_swt::doNotOptimizeAway to prevent the compiler from optimizing away results.
Compile and run your program. The output will look similar to the following:
Performance Metrics for: int-to-string-comparisons-1
Metrics for: jsonifier::internal::toChars
Total Iterations to Stabilize : 394
Measured Iterations : 20
Bytes Processed : 512.00
Nanoseconds per Execution : 5785.25
Frequency (GHz) : 4.83
Throughput (MB/s) : 84.58
Throughput Percentage Deviation (+/-%) : 8.36
Cycles per Execution : 27921.20
Cycles per Byte : 54.53
Instructions per Execution : 52026.00
Instructions per Cycle : 1.86
Instructions per Byte : 101.61
Branches per Execution : 361.45
Branch Misses per Execution : 0.73
Cache References per Execution : 97.03
Cache Misses per Execution : 74.68
----------------------------------------
Metrics for: glz::to_chars
Total Iterations to Stabilize : 421
Measured Iterations : 20
Bytes Processed : 512.00
Nanoseconds per Execution : 6480.30
Frequency (GHz) : 4.68
Throughput (MB/s) : 75.95
Throughput Percentage Deviation (+/-%) : 17.58
Cycles per Execution : 30314.40
Cycles per Byte : 59.21
Instructions per Execution : 51513.00
Instructions per Cycle : 1.70
Instructions per Byte : 100.61
Branches per Execution : 438.25
Branch Misses per Execution : 0.73
Cache References per Execution : 95.93
Cache Misses per Execution : 73.59
----------------------------------------
Library jsonifier::internal::toChars, is faster than library: glz::to_chars, by roughly: 11.36%.
This structured output helps you quickly identify which implementation is faster or more efficient.
Now you’re ready to start benchmarking with BenchmarkSuite!