2 releases
Uses new Rust 2024
| new 0.1.1 | Mar 11, 2026 |
|---|---|
| 0.1.0 | Mar 11, 2026 |
#32 in #dictionary
50KB
800 lines
compact-dict
compact-dict is a highly customizable, open-addressing dictionary in Rust. It features linear probing and continuous memory layout for string keys, heavily inspired by Mojo's Dict.
Why compact-dict?
Modern CPUs hate pointer chasing. While traditional maps often fragment data, compact-dict ensures your data stays together.
Memory Layout vs Traditional Maps
graph TD
subgraph Separate_Chaining [Traditional Map: Separate Chaining]
NodeA[Bucket 1] -->|Pointer| NodeB[Heap Node]
NodeB -->|Pointer| NodeC[Heap Node]
style NodeB fill:#f96,stroke:#333
style NodeC fill:#f96,stroke:#333
end
subgraph Compact_Dict [compact-dict: Continuous Layout]
direction LR
Slot1[K1:V1] --- Slot2[K2:V2] --- Slot3[K3:V3] --- Slot4[K4:V4]
style Slot1 fill:#4CAF50,stroke:#333
style Slot2 fill:#4CAF50,stroke:#333
style Slot3 fill:#4CAF50,stroke:#333
style Slot4 fill:#4CAF50,stroke:#333
end
Note1[Pointer Chasing = Cache Misses] -.-> Separate_Chaining
Note2[Strict Contiguity = Prefetcher Heaven] -.-> Compact_Dict
Where we fit in
quadrantChart
title Hash Map Ecosystem Positioning
x-axis Low Memory Overhead --> High Performance
y-axis Complex Architecture --> Simple & Local
quadrant-1 High Speed & Cache Locality
quadrant-2 Pure Speed Complex SIMD
quadrant-3 Small Footprint
quadrant-4 General Purpose
"std::HashMap (SipHash)": [0.35, 0.4]
"hashbrown (AHash)": [0.65, 0.35]
"Separate Chaining Maps": [0.2, 0.8]
"compact-dict": [0.85, 0.9]
Wait, std::collections::HashMap is literally hashbrown under the hood! Why does the pure hashbrown crate perform better in benchmarks?
Because the standard library version uses a cryptographically secure hashing algorithm (SipHash / RandomState) by default to prevent DOS attacks. The raw hashbrown crate (and our benchmarks) typically defaults to AHash or allows swapping to faster, non-cryptographic hashers. Architecturally, they are the same SwissTable.
Features
- Zero-Copy Deserialization (
rkyv): Supportsrkyvfor blazing fast zero-copy lookups and memory-mapped files. Enable via therkyvfeature! - Customizable Layout: Generic integer types for key-count (
KC) and key-offset (KO) sizes. This flexibility allows you to perfectly balance memory usage (e.g.u16vsu32indices) depending on your scale requirements. - Cache Friendly: Linear probing design with optional cached hashes (
CACHING_HASHES). - SIMD Preparation: Includes preparations for SIMD-accelerated probing (via
portable_simd). - Configurable Mutability: Optional destructive operations via the
DESTRUCTIVEconstant generic parameter, which toggles bitsets to mask deleted operations cleanly payload. - Performant Hash Function: By default, it uses high-performance internal hasher variants built upon aHash (like
MojoAHashStrHash).
Requirements
- Rust Nightly: The crate strictly uses
#![feature(portable_simd)]and relies on thestd::simdfeatures, meaning you must compile it with the nightly Rust toolchain.
Usage
Below is a standard usage example:
use compact_dict::dict::Dict;
fn main() {
// Creating a new Dict: Values are of type `u32`
// Internally uses default Configs: MojoAHashStrHash, u32 for indices
let mut map: Dict<u32> = Dict::new(8);
// Insert elements
map.put("hello", 42);
map.put("world", 100);
// Retrieval
assert_eq!(map.get_or("hello", 0), 42);
assert_eq!(map.get_or("world", 0), 100);
// Missing key returns default
assert_eq!(map.get_or("missing", 0), 0);
// Metadata checks
assert_eq!(map.contains("hello"), true);
assert_eq!(map.len(), 2);
// Reset the map
map.clear();
assert_eq!(map.len(), 0);
}
Zero-Copy Memory Mapping (rkyv)
If you enable the rkyv feature, you can serialize and memory-map your dictionaries for instant access without any deserialization overhead (saving both time and RAM on massive structures).
[dependencies]
compact-dict = { version = "0.1.1", features = ["rkyv"] }
use compact_dict::dict::Dict;
use rkyv::ser::{serializers::AllocSerializer, Serializer};
// 1. Serialize
let mut dict = Dict::<u32>::new(16);
dict.put("zero-copy", 42);
let mut serializer = AllocSerializer::<256>::default();
serializer.serialize_value(&dict).unwrap();
let bytes = serializer.into_serializer().into_inner();
// 2. Instant Zero-Copy Lookup (e.g., mapped directly from a file via `mmap`)
// By utilizing relative pointer offsets, the `ArchivedDict` requires absolutely no allocations or deserialization CPU cycles to read.
let archived = unsafe { rkyv::archived_root::<Dict<u32>>(&bytes) };
assert_eq!(archived.get("zero-copy"), Some(42));
Running Tests
To test the package, run cargo test utilizing the nightly toolchain.
cargo +nightly test
Benchmarking:
There are two benchmarks in the repository: workload_bench (simulating real-world string ingestion and mutations) and dict_bench (using Criterion for pure primitive operations like get).
RUSTFLAGS="-C target-cpu=native" cargo +nightly bench --bench workload_bench
Workload Benchmark Results (workload_bench - Insertions + Mutations):
compact_dict_fx: 0.152 s ๐๐ฅ
fxhash: 0.182 s
std_hashmap: 0.267 s
hashbrown: 0.272 s
Pure Lookup Results (dict_bench - 10k random get operations):
RUSTFLAGS="-C target-cpu=native" cargo +nightly bench --bench dict_bench
hashbrown: ~74 ยตs ๐๐ฅ
compact_dict: ~143 ยตs
Scaling Benchmark (workload_bench by size):
To demonstrate the exact boundaries of Cache Locality vs SIMD pointer-chasing, we benchmarked the time to initialize, insert, and query varying dataset sizes.
| Dataset Size | Keys/Values Memory | compact-dict (AHash) | hashbrown | IndexMap | Winner |
|---|---|---|---|---|---|
| 1k | ~25 KB | 0.00006 s | 0.00005 s | 0.00005 s | Tie |
| 10k | ~250 KB | 0.00056 s | 0.00051 s | 0.00052 s | Tie |
| 50k | ~1.25 MB | 0.00332 s | 0.00294 s | 0.00387 s | hashbrown (~12% faster) |
| 100k | ~2.5 MB | 0.00670 s | 0.00744 s | 0.00716 s | compact-dict (~10% faster) |
| 500k | ~12.5 MB | 0.06847 s | 0.10188 s | 0.10495 s | compact-dict (~1.5x faster) ๐ |
| 1M | ~25 MB | 0.13251 s | 0.24084 s | 0.24170 s | compact-dict (~1.8x faster) ๐ |
| 5M | ~120 MB | 1.03868 s | 0.88833 s | 1.45756 s | hashbrown (~16% faster) |
| 10M | ~250 MB | 2.23751 s | 1.92807 s | 3.17355 s | hashbrown (~16% faster) |
| 20M | ~500 MB | 5.90096 s | 6.52020 s | 8.73994 s | compact-dict (~10% faster) ๐ |
| 50M | ~1.25 GB | 13.56649 s | 19.67217 s | 27.40097 s | compact-dict (~45% faster) ๐ |
Conclusion: The scaling behavior reveals an incredibly fascinating bimodal performance curve.
- L1/L2 Cache Zone (< 2 MB):
hashbrowncompetes fiercely through raw SIMD optimizations. - L3 Cache Bound (2MB - 50MB):
compact-dictutterly destroys pointing-chasing architectures, being nearly 2x faster due to perfect L3 prefetching. - DRAM Latency Zone (50MB - 300MB): Once the working dataset firmly spills into RAM (e.g. 10M), the continuous memory structure suffers from linear DRAM lookups, allowing
hashbrown's highly localized SIMD metadata scans to gracefully overtake it. - Memory Bandwidth Saturation (> 500MB): At ultra-large scales (50 million elements), the massive pointer graph and memory bloat of traditional maps causes severe cache thrashing array-wide.
compact-dictre-takes the lead by a massive 45% margin, simply because linear array iteration maximizes extreme DRAM memory bandwidth throughput, beating out random pointer lookups.
Diverse Benchmark Scenarios (1M Elements)
To explore the extreme edges of linear probing, we benchmarked specific scenarios using diverse_bench:
| Scenario | compact-dict (AHash) | hashbrown | Winner |
|---|---|---|---|
| Read-Heavy Workload (90% Reads) | 0.324 s | 0.629 s | compact-dict (~1.9x faster) ๐ |
| High Load Factor (~85%) | 0.073 s | 0.198 s | compact-dict (~2.7x faster) ๐ |
| Unsuccessful Lookups (100% Misses) | 0.021 s | 0.011 s | hashbrown (~1.9x faster) |
Takeaway:
compact-dictdominates read-heavy sequential workloads perfectly.- While high load factors theoretically degrade linear probing, the L3 Cache locality still powers through linear sequences faster than pointer-chasing at 1M scale.
- The absolute worst-case scenario for
compact-dictis Unsuccessful Lookups. Linear probing is forced to search sequentially until it hits an empty slot (which takes longer at higher load factors), making it roughly ~2x slower thanhashbrown's SIMD metadata scanning which immediately rejects misses.
Note: For the most accurate comparisons without allocation overhead skewing metrics, ensure to run tests natively with LTO enabled, as seen in the bench profile.
โ๏ธ Design Trade-offs & Philosophy
compact-dict isn't a drop-in replacement for every use case. It is a specialized tool built with specific constraints to achieve maximum throughput.
1. The "No-Deletion" Strategy
Currently, compact-dict is optimized for Append-Only or Static workloads.
- Why? Implementing deletions in a linear probing map usually requires either "Tombstones" (which pollute the cache and slow down lookups) or "Backward Shift Deletion" (which is expensive).
- Status: If you need frequent
remove()operations, stick tohashbrown. If you need raw lookup speed for datasets that are built once and read many times, this is for you.
2. Linear Probing vs. SwissTables (SIMD)
While hashbrown uses SIMD instructions to scan metadata buckets, compact-dict bets on the modern CPU's L1/L2 cache prefetcher.
- The Bet: For small to medium-sized maps, the overhead of setting up SIMD registers can be higher than just letting the CPU scan a contiguous block of memory. We prioritize minimal pointer chasing.
3. Memory Brutalism
We use std::ptr and raw memory layouts to bypass some of the overhead of high-level abstractions.
- Safety: The core logic is wrapped in
unsafeblocks where performance dictates it. While we strive for correctness, the primary goal is squeezing every nanosecond out of the hardware. - Audit: We welcome contributors to run
cargo miri testand help us refine the memory boundaries.
4. Load Factor & Clustering
Because we use Linear Probing, this map is sensitive to the load factor.
- To maintain peak performance, we recommend keeping the load factor below 0.7.
- Past this point, "Primary Clustering" can occur. We trade this risk for the benefit of extreme cache locality during successful lookups.
Performance Analysis & Honest Comparison
compact-dict is exceptionally fast for very specific workloads, beating out highly optimized SwissTable implementations like hashbrown and standard std::collections::HashMap. However, it makes severe trade-offs to achieve this speed. Here is an honest guide on when to use what:
โ
Where compact-dict Wins (Strengths)
- String-Heavy Initialization & Iteration: Instead of individually heap-allocating every
Stringkey like standard HashMaps do,compact-dictcopies all strings into a single densely-packed, continuousVec<u8>memory buffer (KeysContainer). - SIMD Vectorization: Lookups leverage
#![feature(portable_simd)]to compare up to 16 cachedu32hashes in exactly a single hardware instruction cycle. - Data Analysis & Short-lived Workloads: If you need to ingest millions of unique strings rapidly, perform some math/updates on their values, and then drop the map,
compact-dictwill significantly outpace its competitors.
โ Where compact-dict Loses (Weaknesses / Use hashbrown instead)
- Continuous Server Deletions: Deleting elements in
compact-dictonly marks a bit field as deleted (tombstoning). It never compacts or frees the physical string memory from the keys buffer. If you implement a long-running web server that constantly adds and removes strings,compact-dictwill act as a memory leak until the entire dictionary is dropped. - Generic Keys: The dictionary is hardcoded around
KeysContaineroffsets, heavily specializing in&str. You cannot drop inHashMap<Uuid, usize>or custom structs as keys easily. Standard map implementations are completely generic. - Ecosystem Stability: It relies on Nightly Rust explicitly for
std::simd.hashbrownhas zero unstable dependencies and runs practically everywhere perfectly optimized for all target architectures. - Pure Lookup Speed: In pure read-heavy workloads (e.g., retrieving 10,000 strings without inserting or scanning new ones), highly optimized SwissTables like
hashbrownstill outperformcompact-dict. As seen in the pure-getmicrobenchmark,hashbrowncan be about 2x faster thancompact-dictfor isolated random-access lookups. The performance strength ofcompact-dictrevolves around the combined speed of contiguous string ingestion, sequential iteration cache-locality, and mutation operations.
Dependencies
~2โ2.7MB
~52K SLoC