Crate embeddenator_testkit

§Embeddenator TestKit

Comprehensive testing utilities for embeddenator VSA operations, performance benchmarking, and large-scale data validation.

§Performance Optimization Insights (v0.20.0-alpha.1)

Based on extensive benchmarking across scales from 250MB to 20GB+:

§Current Optimizations (bt-phase-2 + SIMD)

  • Packed ternary operations: 10-20x speedup for dense vectors
  • SIMD cosine similarity: Platform-specific acceleration (AVX2/NEON)
  • Thread-local scratch buffers: Eliminates allocation overhead
  • Hybrid bundling: Adaptive selection between pairwise/sum-many modes
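
To illustrate why a packed representation helps dense ternary vectors, the sketch below stores each vector as two bitmasks (one for +1 entries, one for -1 entries) and computes dot product and cosine similarity with popcounts, processing 64 elements per machine word. `PackedTernary` and its layout are hypothetical and chosen for illustration only; the crate's actual packed format and SIMD kernels may differ.

```rust
/// Hypothetical packed ternary vector (not the crate's actual layout):
/// bit i of `pos` is set when element i is +1, bit i of `neg` when it is -1.
#[derive(Debug, Clone, PartialEq)]
pub struct PackedTernary {
    pos: Vec<u64>,
    neg: Vec<u64>,
}

impl PackedTernary {
    /// Build from a slice of trits (-1, 0, +1). Any other value is treated as 0.
    pub fn from_trits(trits: &[i8]) -> Self {
        let words = (trits.len() + 63) / 64;
        let mut pos = vec![0u64; words];
        let mut neg = vec![0u64; words];
        for (i, &t) in trits.iter().enumerate() {
            let bit = 1u64 << (i % 64);
            match t {
                1 => pos[i / 64] |= bit,
                -1 => neg[i / 64] |= bit,
                _ => {}
            }
        }
        Self { pos, neg }
    }

    /// Number of non-zero elements (the squared Euclidean norm of a ternary vector).
    pub fn nnz(&self) -> u32 {
        self.pos
            .iter()
            .chain(self.neg.iter())
            .map(|w| w.count_ones())
            .sum()
    }

    /// Dot product via popcount: matching signs contribute +1, opposing signs -1.
    pub fn dot(&self, other: &Self) -> i64 {
        let words = self.pos.len().min(other.pos.len());
        let mut acc = 0i64;
        for i in 0..words {
            let same = (self.pos[i] & other.pos[i]).count_ones()
                + (self.neg[i] & other.neg[i]).count_ones();
            let diff = (self.pos[i] & other.neg[i]).count_ones()
                + (self.neg[i] & other.pos[i]).count_ones();
            acc += i64::from(same) - i64::from(diff);
        }
        acc
    }

    /// Cosine similarity; returns 0.0 if either vector is all zeros.
    pub fn cosine(&self, other: &Self) -> f64 {
        let (a, b) = (f64::from(self.nnz()), f64::from(other.nnz()));
        if a == 0.0 || b == 0.0 {
            return 0.0;
        }
        self.dot(other) as f64 / (a.sqrt() * b.sqrt())
    }
}
```

Each loop iteration handles 64 elements with four AND operations and four popcounts, which is the kind of work that maps well onto AVX2 or NEON lanes.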

§Performance Baselines (Intel i7-14700K, 46GB RAM)

  • Bundle (pairwise): ~43ns (sparse), ~32µs (dense packed)
  • Bind: ~11ns (sparse), ~20µs (dense packed)
  • Cosine: ~7ns (sparse), ~14µs (dense packed)
  • Ingestion: ~15 MB/s (2GB dataset); ingestion time scales linearly with dataset size
  • Extraction: ~41 MB/s (2GB dataset), bit-perfect reconstruction

§Memory Scaling

  • Storage overhead: 2.8x (engram size vs input)
  • Peak memory: Bounded by hierarchical chunking
  • Large datasets: 20GB+ supported with linear scaling
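
The baseline and scaling figures above can be turned into a rough capacity estimate. The sketch below assumes the quoted rates (15 MB/s ingestion, 41 MB/s extraction, 2.8x storage overhead) hold linearly at other dataset sizes, as the lists above state; the `estimate` function and `Estimate` struct are hypothetical helpers, not part of the crate.

```rust
/// Rates quoted above for a 2GB dataset on the reference machine.
const INGEST_MB_PER_S: f64 = 15.0;
const EXTRACT_MB_PER_S: f64 = 41.0;
/// Engram size relative to input size.
const STORAGE_OVERHEAD: f64 = 2.8;

/// Hypothetical capacity estimate for one dataset.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Estimate {
    pub engram_mb: f64,
    pub ingest_secs: f64,
    pub extract_secs: f64,
}

/// Extrapolate linearly from the quoted baselines to `input_mb` megabytes.
pub fn estimate(input_mb: f64) -> Estimate {
    Estimate {
        engram_mb: input_mb * STORAGE_OVERHEAD,
        ingest_secs: input_mb / INGEST_MB_PER_S,
        extract_secs: input_mb / EXTRACT_MB_PER_S,
    }
}
```

For example, `estimate(1500.0)` gives roughly 100 seconds of ingestion and an engram of about 4200 MB. Numbers on other hardware will differ, so treat the constants as placeholders to replace with local measurements.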

§Future Optimizations (Planned)

  • GPU acceleration: CUDA/OpenCL backends for VSA operations
  • CPU-GPU coprocessing: Hybrid execution models
  • Memory-mapped I/O: For datasets > RAM capacity
  • Distributed processing: Multi-node VSA operations

§Testing Infrastructure

§Benchmark Categories

  • Micro-benchmarks: Individual VSA operations (ns scale)
  • Macro-benchmarks: End-to-end workflows (ms-seconds scale)
  • Scale benchmarks: 20GB-40GB dataset validation
  • Stress tests: Memory pressure, concurrent operations
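
As a rough picture of what a micro-benchmark in the first category involves, the sketch below times a closure repeatedly using only the standard library and summarizes the samples. `bench`, `summarize`, and `TimingSummary` are hypothetical names for illustration; they are not the crate's `TestMetrics` or `TimingStats` API.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical summary of a set of timing samples.
#[derive(Debug, Clone, Copy)]
pub struct TimingSummary {
    pub samples: usize,
    pub min: Duration,
    pub mean: Duration,
    pub max: Duration,
}

/// Summarize recorded durations; returns `None` when there are no samples.
pub fn summarize(samples: &[Duration]) -> Option<TimingSummary> {
    let min = *samples.iter().min()?;
    let max = *samples.iter().max()?;
    let total: Duration = samples.iter().sum();
    Some(TimingSummary {
        samples: samples.len(),
        min,
        mean: total / samples.len() as u32,
        max,
    })
}

/// Run `op` `iterations` times, timing each call individually.
/// `black_box` keeps the optimizer from discarding the result.
pub fn bench<T>(iterations: usize, mut op: impl FnMut() -> T) -> Option<TimingSummary> {
    let mut samples = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        black_box(op());
        samples.push(start.elapsed());
    }
    summarize(&samples)
}
```

For operations in the nanosecond range, the cost of reading the clock is comparable to the operation itself, so timing a batch of calls per sample (or using a dedicated benchmarking harness) is likely to give more trustworthy numbers than this per-call loop.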

§Dataset Generation

  • Synthetic data: Controlled patterns for reproducible testing
  • Realistic data: Varied file types, sizes, and content patterns
  • Scale patterns: Linear growth from KB to TB scales
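
Reproducible synthetic data usually comes down to seeding a deterministic generator. The sketch below is a minimal, standard-library-only illustration using the SplitMix64 mixing function; `SplitMix64` and `synthetic_bytes` are hypothetical names, and the crate's own fixtures (`create_test_data`, `TestDataPattern`) may generate data differently.

```rust
/// Small deterministic PRNG (SplitMix64), used here only for illustration.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state and return the next 64-bit output.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Generate `len` pseudo-random bytes that depend only on `seed`.
pub fn synthetic_bytes(seed: u64, len: usize) -> Vec<u8> {
    let mut rng = SplitMix64::new(seed);
    let mut out = Vec::with_capacity(len);
    while out.len() < len {
        let word = rng.next_u64().to_le_bytes();
        let take = (len - out.len()).min(word.len());
        out.extend_from_slice(&word[..take]);
    }
    out
}
```

Because the output depends only on the seed, a failing test can be reproduced by recording the seed alone, and a shorter buffer is always a prefix of a longer one generated from the same seed.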

§Usage Examples

use embeddenator_testkit::*;

// Generate a random sparse vector for testing
let mut rng = rand::thread_rng();
let vec = generators::random_sparse_vec(&mut rng, 10000, 200);

// Create a test dataset
let harness = TestHarness::new();
let _dataset = harness.create_dataset(100); // 100MB

// Time a single bind operation and print the summary
let mut metrics = TestMetrics::new("bind_operation");
metrics.start_timing();
let _result = vec.bind(&vec); // binds the vector with itself, purely to have something to time
metrics.stop_timing();
println!("{}", metrics.summary());

Re-exports§

pub use chaos::ChaosInjector;
pub use fixtures::create_test_data;
pub use fixtures::create_test_dataset;
pub use fixtures::TestDataPattern;
pub use generators::deterministic_sparse_vec;
pub use generators::mk_random_sparsevec;
pub use generators::random_sparse_vec;
pub use generators::sparse_dot;
pub use harness::TestHarness;
pub use integrity::IntegrityReport;
pub use integrity::IntegrityValidator;
pub use metrics::AccuracyMetrics;
pub use metrics::TestMetrics;
pub use metrics::TimingStats;
pub use metrics::VsaEvaluationMetrics;

Modules§

  • chaos: Chaos injection and resilience testing utilities
  • fixtures: Test data fixtures and dataset generation
  • generators: Test data generators for VSA vectors and test datasets
  • harness: Test harness for managing temporary directories and test datasets
  • integrity: Data integrity validation utilities
  • metrics: Performance metrics and timing utilities for testing

Structs§

  • SparseVec: Sparse ternary vector with positive and negative indices
  • VsaConfig: Configuration for VSA vector operations
  • VsaConfigSchema: User-provided schema for custom VSA configuration

Enums§

  • SparsityScaling: Sparsity scaling modes for dynamic dimension handling

Constants§

  • DIM: Default dimension of VSA vectors (10,000 dimensions)

Functions§

  • testkit_smoke: Smoke test for testkit functionality