5 stable releases
| new 1.1.2 | Mar 9, 2026 |
|---|---|
| 1.0.1 | Nov 16, 2025 |
| 1.0.0 | Nov 15, 2025 |
#1390 in Text processing
91KB
1.5K
SLoC
dash-em
Enterprise-Grade Em-Dash Removal Infrastructure — Leveraging Advanced SIMD Vectorization for Optimal Character Stream Processing
Overview
dash-em is an absurdly over-engineered, deliberately meme-grade, production-ready, enterprise-certified string manipulation library designed with singular, unwavering purpose—removing em-dashes (U+2014) from UTF-8 encoded text—with unprecedented obsession.
Building upon decades of accumulated wisdom in systems programming—combined with cutting-edge SIMD acceleration techniques—dash-em delivers truly unnecessary—yet deeply satisfying—performance characteristics in the em-dash elimination category.
🎭 MEME REPOSITORY DISCLOSURE — This project is a deliberately absurd, tongue-in-cheek exploration of over-engineering. The em-dash removal use case is intentionally ridiculous. This is not serious production software, despite being written with genuine engineering rigor. Enjoy the absurdity.
Key Value Propositions
- ⚡ SIMD-Accelerated Processing — Employing SSE4.2, AVX, AVX2, AVX-512F, and ARM NEON instruction sets for—optimal throughput
- 🚀 Extraordinary Performance — Up to 538x faster than byte-level iteration—in language bindings
- 🔒 Memory-Safe Architecture — Engineered with defensive programming—paradigms throughout
- 📦 Zero External Dependencies — Pure C implementation—no transitive dependency chains
- 🌍 True Cross-Platform Support — Linux, macOS, Windows—and ARM-based systems—all supported
- 🎯 Polyglot Language Support — 20+ language bindings—ensuring accessibility across heterogeneous technology stacks
- 🏢 Enterprise-Ready Infrastructure — Battle-tested, production-hardened, deployable—at scale
Features
What makes dash-em fast?
Instead of checking characters one by one—which is slow—dash-em uses SIMD instructions to process 16–64 bytes in parallel. Modern CPUs can do this crazy fast—we just have to tell them what to do.
Real-world speedups (measured across multiple architectures):
- Core C library: 5x-11x faster than scalar implementation—depending on CPU architecture
- Python bindings: 211x-538x faster than byte-level iteration
- JavaScript bindings: 2x-31x faster than byte-level Buffer manipulation
- Best case (no em-dashes—fast path): Up to 15.58 GB/s throughput on modern x86-64
How it works
The library auto-detects your CPU and picks the fastest path:
- AVX-512F (if available) — 64 bytes per iteration. Cutting edge. Stupid fast.
- AVX2 (fallback) — 32 bytes per iteration. Still very fast. Works on most modern CPUs.
- SSE4.2 (older systems) — 16 bytes per iteration. Slower but still beats naive approaches.
- ARM NEON (ARM/Apple Silicon) — 16 bytes per iteration. Works on servers and M-series Macs.
- Scalar (last resort) — One byte at a time. Works everywhere.
Optimizations under the hood
- Fast path for tiny strings — If you're only removing dashes from a few characters, we skip SIMD overhead
- Loop unrolling — Process multiple chunks per iteration to keep the CPU pipeline full
- Cache prefetching — Tell the CPU to load the next chunk early so it's ready when we need it
- Bitmask matching — Use clever bit tricks to find patterns instead of checking bytes individually
- Smart memory operations — Bulk copy unchanged regions instead of processing byte-by-byte
It's absurdly optimized. Maybe too optimized. But it works—and it's fast.
Performance
Core Library Performance
Multi-architecture SIMD performance (statistical benchmarks):
| Pattern | macos-14-aarch64 | windows-2022-msvc | ubuntu-22.04-gcc | ubuntu-22.04-clang |
|---|---|---|---|---|
| sparse | 3.89 GB/s (2.63x) | 12.33 GB/s (11.47x) | 12.78 GB/s (8.71x) | 9.67 GB/s (6.61x) |
| moderate | 2.73 GB/s (1.82x) | 7.23 GB/s (6.55x) | 7.55 GB/s (5.58x) | 6.93 GB/s (5.10x) |
| dense | 1.53 GB/s (0.67x) | 1.54 GB/s (1.06x) | 2.57 GB/s (1.07x) | 2.18 GB/s (1.09x) |
| alternating | 1.53 GB/s (0.67x) | 1.58 GB/s (1.09x) | 2.57 GB/s (1.07x) | 2.18 GB/s (0.94x) |
| boundary | 3.28 GB/s (2.16x) | 9.84 GB/s (9.24x) | 10.45 GB/s (7.45x) | 9.57 GB/s (6.88x) |
| no | 4.27 GB/s (2.90x) | 12.02 GB/s (11.10x) | 15.58 GB/s (10.51x) | 13.66 GB/s (9.21x) |
Language Bindings Performance
Comparing dash-em bindings against native byte-level implementations:
| Language | Test Pattern | Native (μs) | dash-em (μs) | Speedup |
|---|---|---|---|---|
| javascript | alternating | 66.1 | 19.0 | 3.48x |
| javascript | dense | 127.4 | 53.6 | 2.38x |
| javascript | moderate | 302.7 | 50.0 | 6.05x |
| javascript | no | 2932.3 | 94.1 | 31.14x |
| javascript | sparse | 2861.1 | 107.6 | 26.58x |
| python | alternating | 3624.9 | 15.5 | 234.20x |
| python | dense | 7577.0 | 31.2 | 243.19x |
| python | moderate | 18888.8 | 73.3 | 257.74x |
| python | no | 186188.6 | 345.9 | 538.21x |
| python | sparse | 193434.6 | 915.9 | 211.21x |
Installation
C/C++ Core Library
git clone https://github.com/Gaurav-Gosain/dash-em
cd dash-em
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
sudo make install
Language-Specific Bindings
JavaScript/TypeScript (Node.js)
npm install dash-em
String API (easy to use):
const dashem = require('dash-em');
const result = dashem.remove('Hello—world');
console.log(result); // Output: Helloworld
Buffer API (high-performance, zero-copy):
const dashem = require('dash-em');
// Process Buffer directly (10-26x faster than string API)
const input = Buffer.from('Hello—world', 'utf-8');
const output = dashem.removeBuffer(input);
console.log(output.toString('utf-8')); // Output: Helloworld
// In-place modification (fastest, modifies input)
const buffer = Buffer.from('Hello—world', 'utf-8');
const newLength = dashem.removeBufferInPlace(buffer);
console.log(buffer.slice(0, newLength).toString('utf-8')); // Output: Helloworld
Python
pip install dash-em
import dashem
result = dashem.remove('Hello—world')
print(result) # Output: Helloworld
Go
go get github.com/Gaurav-Gosain/dash-em/go
package main
import (
"fmt"
dashem "github.com/Gaurav-Gosain/dash-em/go"
)
func main() {
result, _ := dashem.Remove("Hello—world")
fmt.Println(result) // Output: Helloworld
}
Rust
[dependencies]
dash-em = "1.0"
fn main() {
let result = dash_em::remove("Hello—world").unwrap();
println!("{}", result); // Output: Helloworld
}
Java
public class Example {
public static void main(String[] args) {
String result = Dashem.remove("Hello—world");
System.out.println(result); // Output: Helloworld
}
}
C# / .NET
string result = Dashem.Remove("Hello—world");
Console.WriteLine(result); // Output: Helloworld
PHP
<?php
$result = dashem_remove('Hello—world');
echo $result; // Output: Helloworld
?>
Ruby
require 'dashem'
result = Dashem.remove('Hello—world')
puts result # Output: Helloworld
Swift
import Dashem
let result = removeEmDashes("Hello—world")
print(result) // Output: Helloworld
Additional Language Bindings
Comprehensive bindings are provided for—and thoroughly tested against—the following languages:
- Kotlin — Native interop with dash-em core
- R — Rcpp-based integration layer
- Dart — dart:ffi bindings for cross-platform applications
- Scala — Native compilation via Scala Native
- Perl — XS extension module—providing optimal performance characteristics
- Lua — Lightweight C API integration
- Haskell — Pure FFI bindings—maintaining functional purity
- Elixir — NIF-based native implementation—ensuring BEAM compatibility
- Zig — C ABI import with modern language ergonomics
- Objective-C — Direct C interoperability layer
WebAssembly
dash-em compiles to—high-performance WebAssembly modules supporting multiple target specifications:
# wasm32 (Emscripten)
cd bindings/wasm && ./build.sh
# WASI (WebAssembly System Interface)
WASI_SDK_PATH=/opt/wasi-sdk ./build.sh wasi
Architecture
The dispatch system checks your CPU once, then uses the best available implementation for all subsequent calls:
graph TD
A["dashem_remove()"] --> B{Input < 32 bytes?}
B -->|Yes| C["fast_small() scalar"]
B -->|No| D["Check CPU capabilities<br/>(cached after first call)"]
D --> E{AVX-512?}
E -->|Yes| F["dashem_remove_avx512"]
E -->|No| G{AVX2?}
G -->|Yes| H["dashem_remove_avx2_unrolled"]
G -->|No| I{SSE4.2?}
I -->|Yes| J["dashem_remove_sse42"]
I -->|No| K{ARM NEON?}
K -->|Yes| L["dashem_remove_neon"]
K -->|No| M["dashem_remove_scalar"]
C --> N["Output"]
F --> N
H --> N
J --> N
L --> N
M --> N
API Reference
C API
/**
* Remove em-dashes from UTF-8 string
*
* @param input Input UTF-8 string
* @param input_len Length of input in bytes
* @param output Output buffer
* @param output_cap Output buffer capacity
* @param output_len Output length (set on return)
* @return 0 on success, -1 on buffer overflow, -2 on invalid input
*/
int dashem_remove(
const char *input,
size_t input_len,
char *output,
size_t output_capacity,
size_t *output_len
);
/**
* Get library version
* @return Version string (e.g., "1.0.0")
*/
const char* dashem_version(void);
/**
* Get active implementation name
* @return Implementation name (e.g., "AVX2", "SSE4.2", "Scalar")
*/
const char* dashem_implementation_name(void);
/**
* Detect available CPU features
* @return Bitmask of DASHEM_CPU_* flags
*/
uint32_t dashem_detect_cpu_features(void);
JavaScript/Node.js API
/**
* Remove em-dashes from a string
* @param {string} input - Input string
* @returns {string} String with em-dashes removed
*/
dashem.remove(input);
/**
* Remove em-dashes from a Buffer (zero-copy, high-performance)
* @param {Buffer} buffer - Input Buffer
* @returns {Buffer} New Buffer with em-dashes removed
*/
dashem.removeBuffer(buffer);
/**
* Remove em-dashes from a Buffer in-place (ultra-fast, modifies input!)
* WARNING: This modifies the input Buffer
* @param {Buffer} buffer - Input Buffer (will be modified)
* @returns {number} New length of valid data in buffer
*/
dashem.removeBufferInPlace(buffer);
/**
* Get library version
* @returns {string} Version string
*/
dashem.version();
/**
* Get implementation name
* @returns {string} Implementation name (e.g., "AVX2", "SSE4.2")
*/
dashem.implementationName();
Performance Notes:
remove(): Easy to use but includes UTF-8 conversion overheadremoveBuffer(): 10-26x faster thanremove(), zero-copy operationremoveBufferInPlace(): Fastest option, modifies input buffer directly
Running Benchmarks
Performance benchmarks are automatically updated via GitHub Actions. See the Performance section above for the latest results across all architectures and language bindings.
To run benchmarks locally:
C/C++ Core Library:
cd build && ./bench_dashem
Multi-Language Benchmarks:
cd benchmarks
./run_all_benchmarks.sh
Results are generated in JSON format for easy integration with continuous performance monitoring systems.
Testing
Comprehensive test suites—validated across all supported platforms—ensure—correctness and reliability:
# C/C++ tests
cd build && ctest
# Language-specific tests
npm test # JavaScript
python -m pytest # Python
cargo test # Rust
go test ./... # Go
Continuous Integration
dash-em leverages GitHub Actions—to ensure—consistent quality across:
- ✓ Linux (x86_64, ARM64)—builds and tests
- ✓ macOS (Intel, Apple Silicon)—native execution
- ✓ Windows (MSVC, MinGW)—compatibility verification
- ✓ WebAssembly (Emscripten, WASI)—cross-compilation
- ✓ All language bindings—comprehensive integration testing
Contributing
Contributions are welcome! Please ensure:
- Code adheres to—professional C/C++ standards—with comprehensive documentation
- Commit messages are—descriptive and—reference relevant issues
- All tests pass—before submitting—pull requests
- Performance characteristics are—benchmarked against—baseline implementations
License
MIT License — See LICENSE file for details.
Citation
If dash-em is utilized in—academic or—commercial contexts, please reference:
@software{gosain2025dashem,
title={dash-em: Enterprise-Grade Em-Dash Removal Infrastructure},
author={Gosain, Gaurav},
year={2025},
url={https://github.com/Gaurav-Gosain/dash-em}
}
Acknowledgments
This project exists because—em-dashes matter—and they deserve—the most efficient, highly optimized—removal mechanism—available on modern computing platforms.
Building excellence—one em-dash at a time. —
Version: 1.0.1 | Status: Production-Ready | License: MIT | Repository: github.com/Gaurav-Gosain/dash-em
No runtime deps
~235KB