Benchmark Suite

Welcome to bnch_swt ("Benchmark Suite"), a collection of classes and functions for benchmarking CPU performance.

The following operating systems and compilers are officially supported:

Compiler Support

Operating System Support

Quickstart Guide for BenchmarkSuite

This guide will walk you through setting up and running benchmarks using BenchmarkSuite.

Installation

To use BenchmarkSuite, include the necessary header files in your project. Ensure you have a C++23 (or later) compliant compiler.

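#include <BnchSwt/BenchmarkSuite.hpp>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>
// The basic example also needs the headers that declare glz::to_chars (glaze) and
// jsonifier_internal::toChars (jsonifier); neither is part of bnch_swt.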

Basic Example

The following example demonstrates how to set up and run a benchmark comparing two integer-to-string conversion functions:

// generateRandomIntegers<value_type>(count, maxDigits) is not part of bnch_swt; it is
// assumed to be a helper defined elsewhere in your project that returns random values.
template<size_t count, typename value_type, bnch_swt::string_literal testName>
BNCH_SWT_INLINE void testFunction() {
    // Input integers, their std::to_string forms (used for byte counts), and output slots.
    std::vector<value_type> testValues{ generateRandomIntegers<value_type>(count, sizeof(value_type) == 4 ? 10 : 20) };
    std::vector<std::string> testValues00;
    std::vector<std::string> testValues01(count);

    for (size_t x = 0; x < count; ++x) {
        testValues00.emplace_back(std::to_string(testValues[x]));
    }

    // First entity in the stage: glaze's integer-to-string conversion.
    bnch_swt::benchmark_stage<"old-vs-new-i-to-str" + testName>::template runBenchmark<"glz::to_chars", "CYAN">([&] {
        size_t bytesProcessed = 0;
        char newerString[30]{};
        for (size_t x = 0; x < count; ++x) {
            std::memset(newerString, '\0', sizeof(newerString));
            auto newPtr = glz::to_chars(newerString, testValues[x]);
            bytesProcessed += testValues00[x].size();
            testValues01[x] = std::string(newerString, static_cast<size_t>(newPtr - newerString));
        }
        // Keep the result observable so the loop is not optimized away.
        bnch_swt::doNotOptimizeAway(bytesProcessed);
        return bytesProcessed;
    });

    // Second entity in the same stage: jsonifier's conversion, compared against the first.
    bnch_swt::benchmark_stage<"old-vs-new-i-to-str" + testName>::template runBenchmark<"jsonifier_internal::toChars", "CYAN">([&] {
        size_t bytesProcessed = 0;
        char newerString[30]{};
        for (size_t x = 0; x < count; ++x) {
            std::memset(newerString, '\0', sizeof(newerString));
            auto newPtr = jsonifier_internal::toChars(newerString, testValues[x]);
            bytesProcessed += testValues00[x].size();
            testValues01[x] = std::string(newerString, static_cast<size_t>(newPtr - newerString));
        }
        bnch_swt::doNotOptimizeAway(bytesProcessed);
        return bytesProcessed;
    });

    // Print the metrics collected for every entity in this stage.
    bnch_swt::benchmark_stage<"old-vs-new-i-to-str" + testName>::printResults(true, false);
}

int main() {
    testFunction<512, uint64_t, "-uint64">();
    testFunction<512, int64_t, "-int64">();
    return 0;
}
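The example writes each converted string into testValues01 but never compares it with the std::to_string reference in testValues00. A minimal sketch of such a check follows, assuming you want to validate outputs outside the timed lambdas. It substitutes the standard std::to_chars from <charconv> for the glaze and jsonifier converters, and convertAll and matchesReference are hypothetical helper names, not part of bnch_swt.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: converts each value with std::to_chars (standing in for the
// converters benchmarked above) and stores the strings, like testValues01 in the example.
template<typename value_type>
std::vector<std::string> convertAll(const std::vector<value_type>& values) {
    std::vector<std::string> converted(values.size());
    char buffer[32]{}; // enough for any 64-bit integer plus a sign
    for (std::size_t x = 0; x < values.size(); ++x) {
        auto result = std::to_chars(buffer, buffer + sizeof(buffer), values[x]);
        assert(result.ec == std::errc{}); // the buffer is always large enough
        converted[x] = std::string(buffer, static_cast<std::size_t>(result.ptr - buffer));
    }
    return converted;
}

// Returns true when every converted string equals std::to_string of the same value,
// which is the role testValues00 plays in the example.
template<typename value_type>
bool matchesReference(const std::vector<value_type>& values, const std::vector<std::string>& converted) {
    if (values.size() != converted.size()) {
        return false;
    }
    for (std::size_t x = 0; x < values.size(); ++x) {
        if (converted[x] != std::to_string(values[x])) {
            return false;
        }
    }
    return true;
}
```

Running the check once after the stage completes keeps the validation cost out of the measured lambdas.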

Creating Benchmarks

To create a benchmark:

1. Generate or initialize the test data.
2. Define a stage with bnch_swt::benchmark_stage. The string literal passed as its template argument names the stage; every benchmark run under that name belongs to the same single stage and is compared with the others in it.
3. Implement each test as a lambda containing the benchmark logic, and pass it to runBenchmark.

Benchmark Stage

The benchmark_stage structure orchestrates each test:

Methods

runBenchmark(): Executes the given lambda and measures its performance. The string literal passed as its first template argument names the benchmarked entity (for example, a library or an implementation); the data for each named entity is collected and compared with the other entities in the same stage.
printResults(): Displays detailed performance metrics and comparisons.
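To make the stage/entity model concrete, here is a small self-contained sketch of the same idea. It is not bnch_swt's implementation: mini_stage, run, and fastest are hypothetical names, and each entity is simply timed with std::chrono::steady_clock keeping only its best run, whereas bnch_swt also reports hardware counters such as cycles, branches, and cache misses.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <map>
#include <string>

// Hypothetical model of a benchmark stage: each named entity maps to its best
// (lowest) observed wall-clock time in nanoseconds.
class mini_stage {
  public:
    // Runs `fn` `repetitions` times under the label `name` and records the fastest run.
    template<typename function_type>
    void run(const std::string& name, function_type&& fn, std::size_t repetitions = 10) {
        assert(repetitions > 0);
        double best = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < repetitions; ++i) {
            auto start = std::chrono::steady_clock::now();
            fn(); // a real harness would also keep fn's result alive
            auto stop = std::chrono::steady_clock::now();
            best = std::min(best, std::chrono::duration<double, std::nano>(stop - start).count());
        }
        results_[name] = best; // re-running a label overwrites its previous result
    }

    // Returns the label with the lowest recorded time (empty if nothing has run).
    std::string fastest() const {
        std::string winner;
        double best = std::numeric_limits<double>::max();
        for (const auto& [name, ns] : results_) {
            if (ns < best) {
                best = ns;
                winner = name;
            }
        }
        return winner;
    }

    // Number of distinct entities recorded in this stage.
    std::size_t size() const { return results_.size(); }

  private:
    std::map<std::string, double> results_;
};
```

Calling run again with an existing label replaces that entity's result, mirroring the idea that one label identifies one entity within a stage.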

Example Benchmark Definitions

In the basic example, runBenchmark is called twice within the same stage, once per label:
- "glz::to_chars": the label for the first implementation being benchmarked.
- "jsonifier_internal::toChars": the label for the alternative implementation it is compared against.

Avoiding Compiler Optimizations

Pass each benchmark's result to bnch_swt::doNotOptimizeAway so the compiler cannot discard the measured work as dead code; the basic example does this with bytesProcessed.
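A common way to build this kind of barrier on GCC and Clang is an empty inline-assembly statement that claims to read the value, the technique behind Google Benchmark's DoNotOptimize. The sketch below assumes that technique; keepAlive and sumWithBarrier are hypothetical names, and this is not necessarily how bnch_swt::doNotOptimizeAway is written. Other compilers fall back to a volatile store.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical optimization barrier (not bnch_swt's source). On GCC/Clang the empty asm
// statement claims to read `value` from a register or memory and to clobber memory, so
// the compiler must materialize the value instead of deleting the code that produced it.
template<typename value_type>
inline void keepAlive(const value_type& value) {
#if defined(__GNUC__) || defined(__clang__)
    asm volatile("" : : "r,m"(value) : "memory");
#else
    // Portable fallback for scalar types: a volatile store cannot be elided.
    static volatile value_type sink{};
    sink = value;
#endif
}

// Example workload that routes its result through the barrier,
// as the basic example does with bytesProcessed.
inline std::size_t sumWithBarrier(std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += i;
    }
    keepAlive(total);
    return total;
}
```

The fallback branch is only meant for scalar results such as byte counts; volatile assignment does not work for arbitrary class types.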

Running Benchmarks

Compile your program with a C++23-compliant compiler, then run it.

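Output and Results

Running the program prints a report for each stage. The sample below was captured from a stage named int-to-string-comparisons-1, so its stage and entity labels differ slightly from the basic example.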

Performance Metrics for: int-to-string-comparisons-1
Metrics for: jsonifier::internal::toChars
Total Iterations to Stabilize                               : 394
Measured Iterations                                         : 20
Bytes Processed                                             : 512.00
Nanoseconds per Execution                                   : 5785.25
Frequency (GHz)                                             : 4.83
Throughput (MB/s)                                           : 84.58
Throughput Percentage Deviation (+/-%)                      : 8.36
Cycles per Execution                                        : 27921.20
Cycles per Byte                                             : 54.53
Instructions per Execution                                  : 52026.00
Instructions per Cycle                                      : 1.86
Instructions per Byte                                       : 101.61
Branches per Execution                                      : 361.45
Branch Misses per Execution                                 : 0.73
Cache References per Execution                              : 97.03
Cache Misses per Execution                                  : 74.68
----------------------------------------
Metrics for: glz::to_chars
Total Iterations to Stabilize                               : 421
Measured Iterations                                         : 20
Bytes Processed                                             : 512.00
Nanoseconds per Execution                                   : 6480.30
Frequency (GHz)                                             : 4.68
Throughput (MB/s)                                           : 75.95
Throughput Percentage Deviation (+/-%)                      : 17.58
Cycles per Execution                                        : 30314.40
Cycles per Byte                                             : 59.21
Instructions per Execution                                  : 51513.00
Instructions per Cycle                                      : 1.70
Instructions per Byte                                       : 100.61
Branches per Execution                                      : 438.25
Branch Misses per Execution                                 : 0.73
Cache References per Execution                              : 95.93
Cache Misses per Execution                                  : 73.59
----------------------------------------
Library jsonifier::internal::toChars, is faster than library: glz::to_chars, by roughly: 11.36%.

This structured output helps you quickly identify which implementation is faster or more efficient.
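Most of the derived rows in this report follow from the raw per-execution counts by simple division. The sketch below recomputes them; the formulas are inferred from the sample numbers, not taken from bnch_swt's source, and sample_metrics and the helper functions are hypothetical. Throughput is left out because dividing 512 bytes by 5785.25 ns reproduces neither 84.58 MB/s nor 84.58 MiB/s exactly, so the library presumably derives it differently.

```cpp
#include <cassert>
#include <cmath>

// Raw per-execution values as printed in one "Metrics for:" block of the report.
struct sample_metrics {
    double bytesProcessed;
    double nanosecondsPerExecution;
    double cyclesPerExecution;
    double instructionsPerExecution;
};

// Derived rows, under the relationships inferred from the sample output.
inline double cyclesPerByte(const sample_metrics& m) { return m.cyclesPerExecution / m.bytesProcessed; }
inline double instructionsPerCycle(const sample_metrics& m) { return m.instructionsPerExecution / m.cyclesPerExecution; }
inline double instructionsPerByte(const sample_metrics& m) { return m.instructionsPerExecution / m.bytesProcessed; }
// Cycles per nanosecond is numerically the clock frequency in GHz.
inline double frequencyGHz(const sample_metrics& m) { return m.cyclesPerExecution / m.nanosecondsPerExecution; }

// "Faster by roughly X%": how much higher the winner's throughput is, in percent.
inline double percentFaster(double fasterThroughput, double slowerThroughput) {
    return (fasterThroughput / slowerThroughput - 1.0) * 100.0;
}

// Rounds to two decimals, matching the precision of the printed report.
inline double round2(double value) { return std::round(value * 100.0) / 100.0; }
```

With the jsonifier::internal::toChars figures (512 bytes, 5785.25 ns, 27921.20 cycles, 52026 instructions), these helpers round to 54.53 cycles per byte, 1.86 instructions per cycle, 101.61 instructions per byte, and 4.83 GHz, and percentFaster(84.58, 75.95) rounds to 11.36, matching the final line of the report.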

Now you’re ready to start benchmarking with BenchmarkSuite!
