Embeddenator — Holographic Computing Substrate
Version 0.22.0 | Rust implementation of sparse ternary Vector Symbolic Architecture (VSA) for holographic data encoding.
Embeddenator is an encoding method and data model. It is not a security implementation.
Author: Tyler Zervas tz-dev@vectorweight.com
License: MIT (see LICENSE file)
Component Architecture
Embeddenator is organized into 8 independent library crates; see Component Architecture for details.
Current Capabilities
Implemented Features
Engram Encoding/Decoding : Create holographic encodings (.engram files) of filesystems
Data Reconstruction : Rebuild files from engrams, backed by a correction store
VSA Operations : Bundle, bind, and other vector symbolic operations on sparse ternary vectors
Hierarchical Encoding : Multi-level chunking for handling larger datasets
SIMD Support : Optional AVX2/NEON optimizations (2-4x speedup on supported hardware)
CLI Tool : Command-line interface for ingest, extract, query, and update operations
Incremental Updates : Add, remove, modify files without full re-ingestion
Test Coverage : 160+ integration tests covering core functionality
Known Limitations
The following limitations are documented based on test results:
Large file reconstruction : Fidelity degrades for files over 1MB with default configuration
Deep path encoding : Path depths beyond 20 levels may produce incorrect output
Bind inverse : The bind inverse operation degrades for sparse key configurations
Storage overhead : VSA encoding produces larger output than input (approximately 2-3x)
These limitations are inherent to the VSA encoding model and are documented in the test suite.
Experimental/In Development
FUSE Filesystem : EmbrFS integration (partial implementation)
Query Performance : Similarity search and retrieval (basic implementation)
Large-Scale Testing : TB-scale validation (manual testing only)
Version History
Comprehensive test suite (unit + integration + e2e + doc tests)
Intelligent test runner with accurate counting and debug mode
Dual versioning strategy for OS builds (LTS + nightly)
Zero clippy warnings (29 fixes applied)
Extended OS support : Debian 12 LTS, Debian Testing/Sid, Ubuntu 24.04 LTS, Ubuntu Devel/Rolling
Native amd64 CI (required pre-merge check) + arm64 ready for self-hosted runners
Automated documentation with rustdoc and 9 doc tests
Core Concepts
Vector Symbolic Architecture (VSA)
Embeddenator uses sparse ternary vectors to represent data holographically:
Bundle (⊕) : Superposition operation for combining vectors
Bind (⊙) : Compositional operation with approximate self-inverse property
Cosine Similarity : Measure of vector similarity for retrieval
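As a rough sketch of these three operations (illustrative only; the `Vec<i8>` representation and function signatures here are assumptions, not the crate's actual API), a ternary vector can be modeled as a slice of values in {-1, 0, +1}:

```rust
// Illustrative sketch of bundle, bind, and cosine on ternary vectors.
// Not the crate's API: it uses a sparse pos/neg-index representation.

fn bundle(a: &[i8], b: &[i8]) -> Vec<i8> {
    // Superposition: elementwise sum, thresholded back to {-1, 0, +1}.
    a.iter().zip(b).map(|(&x, &y)| (x + y).signum()).collect()
}

fn bind(a: &[i8], k: &[i8]) -> Vec<i8> {
    // Composition: elementwise product. For a key with no zeros this is an
    // exact self-inverse, since k_i * k_i = 1.
    a.iter().zip(k).map(|(&x, &y)| x * y).collect()
}

fn cosine(a: &[i8], b: &[i8]) -> f64 {
    let dot: i32 = a.iter().zip(b).map(|(&x, &y)| (x as i32) * (y as i32)).sum();
    let na = (a.iter().map(|&x| (x as i32).pow(2)).sum::<i32>() as f64).sqrt();
    let nb = (b.iter().map(|&x| (x as i32).pow(2)).sum::<i32>() as f64).sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot as f64 / (na * nb) }
}

fn main() {
    let a = vec![1i8, -1, 0, 1];
    let k = vec![-1i8, 1, -1, 1]; // dense key: no zeros
    let recovered = bind(&bind(&a, &k), &k); // bind twice with the same key
    assert_eq!(recovered, a);
    println!("cos(a, a) = {}", cosine(&a, &a)); // identical vectors -> 1
}
```

Note that thresholded bundling acts as a majority rule: components where the inputs agree survive, while disagreements cancel toward zero.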
The ternary representation {-1, 0, +1} enables efficient computation:
39-40 trits can be encoded in a 64-bit register
Sparse representation reduces memory and computation requirements
Based on balanced ternary arithmetic
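One way the 39-40-trit figure can be realized is base-3 packing of trit digits into a single 64-bit word. This is a minimal sketch of the idea, not the crate's actual encoding: 3^40 ≈ 1.22e19 fits under u64::MAX ≈ 1.84e19, while a 41st trit would overflow.

```rust
// Sketch: pack up to 40 unsigned trits (digits 0..=2) into one u64 via base-3.
const TRITS_PER_WORD: usize = 40;

fn pack(trits: &[u8]) -> u64 {
    assert!(trits.len() <= TRITS_PER_WORD);
    // Horner's rule, most-significant trit first.
    trits.iter().fold(0u64, |acc, &t| {
        assert!(t < 3);
        acc * 3 + t as u64
    })
}

fn unpack(mut word: u64, n: usize) -> Vec<u8> {
    let mut out = vec![0u8; n];
    for i in (0..n).rev() {
        out[i] = (word % 3) as u8; // peel least-significant trit
        word /= 3;
    }
    out
}

fn main() {
    let trits: Vec<u8> = (0..40).map(|i| (i % 3) as u8).collect();
    let packed = pack(&trits);
    assert_eq!(unpack(packed, 40), trits); // lossless round trip
    println!("40 trits packed into one u64: {packed}");
}
```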
Current Configuration :
10,000 dimensions with ~1% sparsity (~100-200 non-zero elements per vector)
Provides balance between collision resistance and computational efficiency
Higher dimensions and sparsity configurations are under investigation
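A back-of-envelope intuition for the collision-resistance claim (this is a rough heuristic, not the project's formal analysis): two independent sparse vectors with k non-zeros out of D dimensions overlap in roughly k²/D positions on average, so random vectors at the current configuration barely interact.

```rust
// Expected number of overlapping non-zero positions between two independent
// sparse vectors: k non-zeros each, D dimensions. Heuristic estimate only.
fn expected_overlap(k: f64, d: f64) -> f64 {
    k * k / d
}

fn main() {
    // Current configuration: D = 10,000, k in roughly 100..200.
    println!("{:.1}", expected_overlap(100.0, 10_000.0)); // ~1 position
    println!("{:.1}", expected_overlap(200.0, 10_000.0)); // ~4 positions
}
```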
Engrams
An engram is a holographic encoding of an entire filesystem or dataset:
Single root vector containing superposition of all chunks
Codebook storing encoded vector representations of data chunks
Manifest tracking file structure and metadata
Data Encoding : The codebook stores encoded vector representations of data chunks:
Codebook is required for reconstruction
Uses sparse ternary vectors for holographic superposition
Supports deterministic encoding and decoding
Note: Embeddenator is an encoding method, not a security implementation. The codebook provides no cryptographic guarantees.
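The three pieces above can be pictured with the following hypothetical shapes. These type and field names are assumptions for illustration only, not the crate's actual serialized layout:

```rust
use std::collections::HashMap;

// Hypothetical shapes, for illustration only.
struct SparseVec {
    pos: Vec<u32>, // indices holding +1
    neg: Vec<u32>, // indices holding -1
}

struct Engram {
    root: SparseVec,                   // superposition of all chunk vectors
    codebook: HashMap<u64, SparseVec>, // chunk id -> encoded chunk vector
}

struct ManifestEntry {
    path: String,        // original file path
    chunk_ids: Vec<u64>, // ordered chunks that reconstruct the file
}

fn main() {
    let engram = Engram {
        root: SparseVec { pos: vec![3, 17], neg: vec![42] },
        codebook: HashMap::new(),
    };
    let entry = ManifestEntry { path: "src/main.rs".into(), chunk_ids: vec![0, 1] };
    println!("{} -> {} chunks, root nnz = {}",
        entry.path,
        entry.chunk_ids.len(),
        engram.root.pos.len() + engram.root.neg.len());
}
```

The split mirrors the description above: the engram carries the holographic state, while the manifest carries the metadata needed to map chunks back to files.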
Quick Start
Installation
# From crates.io
cargo install embeddenator-core
# Or clone the repository
git clone https://github.com/tzervas/embeddenator-core.git
cd embeddenator-core
# Build with Cargo
cargo build --release
# Or use the orchestrator
python3 orchestrator.py --mode build --verbose
As a Library
# Cargo.toml
[ dependencies ]
embeddenator-core = " 0.22"
// Rust imports use the lib name "embeddenator"
use embeddenator:: prelude:: * ;
Basic Usage
# Ingest a directory into an engram
cargo run --release -- ingest -i ./input_ws -e root.engram -m manifest.json -v
# Extract from an engram
cargo run --release -- extract -e root.engram -m manifest.json -o ./output -v
# Query similarity
cargo run --release -- query -e root.engram -q ./test_file.txt -v
Using the Orchestrator
The orchestrator provides unified build, test, and deployment workflows:
# Quick start: build, test, and package everything
python3 orchestrator.py --mode full --verbose -i
# Run integration tests
python3 orchestrator.py --mode test --verbose
# Build Docker image
python3 orchestrator.py --mode package --verbose
# Display system info
python3 orchestrator.py --mode info
# Clean all artifacts
python3 orchestrator.py --mode clean
CLI Reference
Embeddenator provides the following commands for working with holographic engrams:
embeddenator --help
Get comprehensive help information:
# Show main help with examples
embeddenator --help
# Show detailed help for a specific command
embeddenator ingest --help
embeddenator extract --help
embeddenator query --help
embeddenator query-text --help
embeddenator bundle-hier --help
ingest - Create Holographic Engram
Process one or more files and/or directories and encode them into a holographic engram.
embeddenator ingest [OPTIONS] --input <PATH>...
Required:
-i, --input <PATH>...    Input file(s) and/or directory(ies) to ingest
Options:
-e, --engram <FILE>      Output engram file [default: root.engram]
-m, --manifest <FILE>    Output manifest file [default: manifest.json]
-v, --verbose            Enable verbose output with progress and statistics
-h, --help               Print help information
Examples:
# Basic ingestion
embeddenator ingest -i ./myproject -e project.engram -m project.json
# Mix files and directories (repeat -i/--input)
embeddenator ingest -i ./src -i ./README.md -e project.engram -m project.json
# With verbose output
embeddenator ingest -i ~/Documents -e docs.engram -v
# Custom filenames
embeddenator ingest --input ./data --engram backup.engram --manifest backup.json
What it does:
Recursively scans any input directories
Ingests any input files directly
Chunks files (4KB default)
Encodes chunks using sparse ternary VSA
Creates holographic superposition in root vector
Saves engram (holographic data) and manifest (metadata)
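The pipeline shape of those steps can be sketched as follows. The encoder here is a toy hash-seeded ternary projection standing in for the crate's actual codebook encoding, which this sketch does not reproduce:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy ingest shape: chunk -> encode -> superpose into a root accumulator.
const CHUNK_SIZE: usize = 4096; // 4KB default
const DIM: usize = 1000;        // small dimensionality for the demo

fn encode_chunk(chunk: &[u8]) -> Vec<i8> {
    (0..DIM).map(|i| {
        let mut h = DefaultHasher::new();
        (chunk, i).hash(&mut h);
        match h.finish() % 100 { // ~1% +1, ~1% -1, rest 0 (sparse)
            0 => 1i8,
            1 => -1i8,
            _ => 0i8,
        }
    }).collect()
}

fn main() {
    let data = vec![7u8; 10_000];
    let mut root = vec![0i32; DIM];
    for chunk in data.chunks(CHUNK_SIZE) { // yields 4096, 4096, 1808 bytes
        for (r, t) in root.iter_mut().zip(encode_chunk(chunk)) {
            *r += t as i32; // superposition: accumulate before thresholding
        }
    }
    let nonzero = root.iter().filter(|&&x| x != 0).count();
    println!("root has {nonzero} non-zero components");
}
```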
extract - Reconstruct Files
Bit-perfect reconstruction of all files from an engram.
embeddenator extract [OPTIONS] --output-dir <DIR>
Required:
-o, --output-dir <DIR>   Output directory for reconstructed files
Options:
-e, --engram <FILE>      Input engram file [default: root.engram]
-m, --manifest <FILE>    Input manifest file [default: manifest.json]
-v, --verbose            Enable verbose output with progress
-h, --help               Print help information
Examples:
# Basic extraction
embeddenator extract -e project.engram -m project.json -o ./restored
# With default filenames
embeddenator extract -o ./output -v
# From backup
embeddenator extract --engram backup.engram --manifest backup.json --output-dir ~/restored
What it does:
Loads engram and manifest
Reconstructs directory structure
Algebraically unbinds chunks from root vector
Writes bit-perfect copies of all files
Preserves file hierarchy and metadata
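The "algebraically unbinds" step relies on bind's approximate self-inverse property. This sketch (illustrative, modeling bind as an elementwise product rather than the crate's actual operation) shows why a dense ternary key inverts exactly while a sparse key loses information, matching the documented bind-inverse limitation:

```rust
// With an elementwise-product bind and a dense ternary key (no zeros),
// binding twice with the same key is an exact inverse. Zeros in a sparse
// key erase those positions permanently.
fn bind(a: &[i8], k: &[i8]) -> Vec<i8> {
    a.iter().zip(k).map(|(&x, &y)| x * y).collect()
}

fn main() {
    let a = vec![1i8, -1, 1, 0, -1, 1];

    let dense_key = vec![-1i8, -1, 1, 1, -1, 1];
    assert_eq!(bind(&bind(&a, &dense_key), &dense_key), a); // exact recovery

    let sparse_key = vec![-1i8, 0, 1, 1, 0, 1]; // zeros at positions 1 and 4
    let recovered = bind(&bind(&a, &sparse_key), &sparse_key);
    assert_ne!(recovered, a); // information at the zeroed positions is lost
    println!("sparse key recovered {:?}", recovered);
}
```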
query - Similarity Search
Compute cosine similarity between a query file and engram contents.
embeddenator query [OPTIONS] --query <FILE>
Required:
-q, --query <FILE>               Query file or pattern to search for
Options:
-e, --engram <FILE>              Engram file to query [default: root.engram]
--hierarchical-manifest <FILE>   Optional hierarchical manifest (selective unfolding)
--sub-engrams-dir <DIR>          Directory of `.subengram` files (used with --hierarchical-manifest)
--k <K>                          Top-k results to print for codebook/hierarchical search [default: 10]
-v, --verbose                    Enable verbose output with similarity details
-h, --help                       Print help information
Examples:
# Query similarity
embeddenator query -e archive.engram -q search.txt
# With verbose output
embeddenator query -e data.engram -q pattern.bin -v
# Using default engram
embeddenator query --query testfile.txt -v
What it does:
Encodes query file using VSA
Computes cosine similarity with engram
Returns similarity score
If --hierarchical-manifest and --sub-engrams-dir are provided, it also runs a store-backed hierarchical query and prints the top hierarchical matches.
Similarity interpretation:
>0.75 : Strong match, likely contains similar content
0.3-0.75 : Moderate similarity, some shared patterns
<0.3 : Low similarity, likely unrelated content
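These bands can be applied mechanically when post-processing query output. The cutoffs below are the documented ones; the function itself is an illustrative helper, not part of the CLI:

```rust
// Map a cosine similarity score onto the documented interpretation bands.
fn interpret(score: f64) -> &'static str {
    if score > 0.75 {
        "strong match"
    } else if score >= 0.3 {
        "moderate similarity"
    } else {
        "low similarity"
    }
}

fn main() {
    for s in [0.9, 0.5, 0.1] {
        println!("{s}: {}", interpret(s));
    }
}
```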
query-text - Similarity Search (Text)
Encode a literal text string as a query vector and run the same retrieval path as query .
embeddenator query-text -e root.engram --text "search phrase" --k 10
# With hierarchical selective unfolding:
embeddenator query-text -e root.engram --text "search phrase" \
--hierarchical-manifest hier.json --sub-engrams-dir ./sub_engrams --k 10
bundle-hier - Build Hierarchical Retrieval Artifacts
Build a hierarchical manifest and a directory of sub-engrams from an existing flat root.engram + manifest.json. This enables store-backed selective unfolding queries.
embeddenator bundle-hier -e root.engram -m manifest.json \
--out-hierarchical-manifest hier.json \
--out-sub-engrams-dir ./sub_engrams
# Optional: deterministically shard large nodes (bounds per-node indexing cost)
embeddenator bundle-hier -e root.engram -m manifest.json \
--max-chunks-per-node 2000 \
--out-hierarchical-manifest hier.json \
--out-sub-engrams-dir ./sub_engrams
Docker Usage (Experimental)
Note: Docker support is in development and may not be fully functional.
Build the Image
docker build -f Dockerfile.tool -t embeddenator-tool:latest .
Run in Container
# Ingest data
docker run -v $(pwd)/input_ws:/input -v $(pwd)/workspace:/workspace \
embeddenator-tool:latest \
ingest -i /input -e /workspace/root.engram -m /workspace/manifest.json -v
# Extract data
docker run -v $(pwd)/workspace:/workspace -v $(pwd)/output:/output \
embeddenator-tool:latest \
extract -e /workspace/root.engram -m /workspace/manifest.json -o /output -v
Test Coverage
Embeddenator has comprehensive test coverage:
160+ integration tests across 23 test suites
97.6% pass rate (166/170 tests passing)
Test categories : Balanced ternary, codebook operations, VSA properties, error recovery, hierarchical operations, CLI integration
Continuous testing : All core functionality verified with each build
Verified Capabilities
Text file reconstruction : Byte-for-byte identical reconstruction verified
Binary file recovery : Exact binary reconstruction tested
VSA operations : Bundle, bind, and similarity operations tested
Hierarchical encoding : Multi-level chunking verified
Error recovery : Corruption and concurrency handling tested
In Development
Large-scale testing : TB-scale datasets not yet fully validated
Performance optimization : Benchmarking and tuning ongoing
Security audit : Cryptographic properties under research
Architecture
Core Components
SparseVec : Sparse ternary vector implementation
pos : Indices with +1 value
neg : Indices with -1 value
Efficient operations: bundle, bind, cosine similarity
Hardware-optimized: 39-40 trits per 64-bit register
EmbrFS : Holographic filesystem layer
Chunked encoding (4KB default)
Manifest for file metadata
Codebook for chunk storage
CLI : Command-line interface
Ingest: directory → engram
Extract: engram → directory
Query: similarity search
Architecture Decision Records (ADRs)
Comprehensive architectural documentation is available in docs/adr/ :
ADR-001 : Sparse Ternary VSA
Core VSA design and sparse ternary vectors
Balanced ternary mathematics and hardware optimization
64-bit register encoding (39-40 trits per register)
ADR-002 : Multi-Agent Workflow System
ADR-003 : Self-Hosted Runner Architecture
ADR-004 : Holographic OS Container Design
Configuration-driven builder for Debian/Ubuntu
Dual versioning strategy (LTS + nightly)
Package isolation capabilities
ADR-005 : Hologram-Based Package Isolation
Factoralization of holographic containers
Balanced ternary encoding for compact representation
Package-level granular updates
Hardware optimization strategy for 64-bit CPUs
ADR-006 : Dimensionality and Sparsity Scaling
Scaling holographic space to TB-scale datasets
Adaptive sparsity strategy (maintain constant computational cost)
Performance analysis and collision probability projections
Impact on 100% bit-perfect guarantee
Deep operation resilience for factoralization
ADR-007 : Codebook Security and Reversible Encoding
VSA-as-a-lens cryptographic primitive
Quantum-resistant encoding mechanism
Mathematically trivial with the key, computationally infeasible without (properties under research)
Bulk encryption with selective decryption
Integration with holographic indexing
See docs/adr/README.md for the complete ADR index.
File Formats
Engram (.engram):
Binary serialized format (bincode)
Contains root SparseVec and codebook
Self-contained holographic state
Manifest (.json):
Human-readable file listing
Chunk mapping and metadata
Required for extraction
Development
API Documentation
Comprehensive API documentation is available:
# Generate and open documentation locally
cargo doc --open
# Or use the automated script
./generate_docs.sh
# View online (after publishing)
# https://docs.rs/embeddenator
The documentation includes:
Module-level overviews with examples
Function documentation with usage patterns
9 runnable doc tests demonstrating API usage
VSA operation examples (bundle, bind, cosine)
Running Tests
# Recommended: everything Cargo considers testable (lib/bin/tests/examples/benches)
cargo test --workspace --all-targets
# Doc tests only
cargo test --doc
# Optimized build tests (useful before benchmarking)
cargo test --release --workspace --all-targets
# Feature-gated correctness/perf gates
cargo test --workspace --all-targets --features "bt-phase-2 proptest"
# Long-running/expensive tests are explicitly opt-in:
# - QA memory scaling (requires env var + ignored flag)
EMBEDDENATOR_RUN_QA_MEMORY=1 cargo test --features qa --test memory_scaled -- --ignored --nocapture
# - Multi-GB soak test (requires env var + ignored flag)
EMBEDDENATOR_RUN_SOAK=1 cargo test --release --features soak-memory --test soak_memory -- --ignored --nocapture
# Integration tests via orchestrator
python3 orchestrator.py --mode test --verbose
# Full test suite
python3 orchestrator.py --mode full --verbose
Notes:
Seeing many tests marked as "ignored" during cargo bench is expected: Cargo runs the unit test harness in libtest's --bench mode, which skips normal #[test] functions (it prints "i" for each). Use cargo test (commands above) to actually execute tests.
cargo test --workspace --all-targets will also compile/run Criterion benches in a fast "smoke" mode (they print "Testing ... Success"). This is intended to catch broken benches early.
CI/CD and Build Monitoring
The project uses separated CI/CD workflows for optimal performance and reliability:
# Test CI build locally with monitoring
./ci_build_monitor.sh linux/amd64 build 300
# Monitor for specific timeout (in seconds)
./ci_build_monitor.sh linux/amd64 full 900
CI Workflow Structure:
Three separate workflows eliminate duplication and provide clear responsibilities:
ci-pre-checks.yml - Fast validation (fmt, clippy, unit tests, doc tests)
ci-amd64.yml - Full AMD64 build and test (REQUIRED PRE-MERGE CHECK )
ci-arm64.yml - ARM64 build and test (configured for self-hosted runners)
CI Features:
Separated workflows prevent duplicate runs
AMD64 workflow is a required status check - PRs cannot merge until it passes
Parallel builds using all available cores
Intelligent timeout management (15min tests, 10min builds, 30min total)
Build artifact upload on failure
Performance metrics reporting
Automatic parallelization with CARGO_BUILD_JOBS
Architecture Support:
| Architecture | Status | Runner Type | Trigger | Notes |
| --- | --- | --- | --- | --- |
| amd64 (x86_64) | Production | GitHub-hosted (ubuntu-latest) | Every PR (required check) | Stable, 5-7min |
| arm64 (aarch64) | 🚧 Ready | Self-hosted (pending deployment) | Manual only | Will enable on merge to main |
ARM64 Deployment Roadmap:
✅ Phase 1 : Root cause analysis completed - GitHub doesn't provide standard ARM64 runners
✅ Phase 2 : Workflow configured for self-hosted runners with labels ["self-hosted", "linux", "ARM64"]
🚧 Phase 3 : Deploy self-hosted ARM64 infrastructure (in progress)
⏳ Phase 4 : Manual testing and validation
⏳ Phase 5 : Enable automatic trigger on merge to main only
Why Self-Hosted for ARM64?
GitHub Actions doesn't provide standard hosted ARM64 runners
Self-hosted provides native execution (no emulation overhead)
Cost-effective for frequent builds
Ready to deploy when infrastructure is available
See .github/workflows/README.md for complete CI/CD documentation and ARM64 setup guide.
Self-Hosted Runner Automation
Embeddenator includes a comprehensive Python-based automation system for managing GitHub Actions self-hosted runners with complete lifecycle management and multi-architecture support :
Features:
Automated registration with short-lived tokens
Complete lifecycle management (register → run → deregister)
Configurable auto-deregistration after idle timeout
Manual mode for persistent runners
Multi-runner deployment support
Multi-architecture support (x64, ARM64, RISC-V)
QEMU emulation for cross-architecture runners
Health monitoring and status reporting
🧹 Automatic cleanup of Docker resources
⚙️ Flexible configuration via .env file or CLI arguments
Supported Architectures:
x64 (AMD64) - Native x86_64 runners
ARM64 (aarch64) - ARM64 runners (native or emulated via QEMU)
RISC-V (riscv64) - RISC-V runners (native or emulated via QEMU)
Quick Start:
# 1. Copy and configure environment file
cp .env.example .env
# Edit .env and set GITHUB_REPOSITORY and GITHUB_TOKEN
# 2. Run in auto mode (registers, starts, monitors, auto-deregisters when idle)
python3 runner_manager.py run
# 3. Or use manual mode (keeps running until stopped)
RUNNER_MODE=manual python3 runner_manager.py run
Multi-Architecture Examples:
# Deploy ARM64 runners on x86_64 hardware (with emulation, auto-detect runtime)
RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run
# Deploy runners for all architectures
RUNNER_TARGET_ARCHITECTURES=x64,arm64,riscv64 RUNNER_COUNT=6 python3 runner_manager.py run
# Deploy with automatic QEMU installation (requires sudo)
RUNNER_EMULATION_AUTO_INSTALL=true RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run
# Use specific emulation method (docker, podman, or qemu)
RUNNER_EMULATION_METHOD=podman RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run
# Use Docker for emulation
RUNNER_EMULATION_METHOD=docker RUNNER_TARGET_ARCHITECTURES=arm64,riscv64 python3 runner_manager.py run
Individual Commands:
# Register runner(s)
python3 runner_manager.py register
# Start runner service(s)
python3 runner_manager.py start
# Monitor and manage lifecycle
python3 runner_manager.py monitor
# Check status
python3 runner_manager.py status
# Stop and deregister
python3 runner_manager.py stop
Advanced Usage:
# Deploy multiple runners
python3 runner_manager.py run --runner-count 4
# Custom labels
python3 runner_manager.py register --labels self-hosted,linux,ARM64,large
# Auto-deregister after 10 minutes of inactivity
RUNNER_IDLE_TIMEOUT=600 python3 runner_manager.py run
Configuration Options:
Key environment variables (see .env.example for full list):
GITHUB_REPOSITORY - Repository to register runners for (required)
GITHUB_TOKEN - Personal access token with repo scope (required)
RUNNER_MODE - Deployment mode: auto (default) or manual
RUNNER_IDLE_TIMEOUT - Auto-deregister timeout in seconds (default: 300)
RUNNER_COUNT - Number of runners to deploy (default: 1)
RUNNER_LABELS - Comma-separated runner labels
RUNNER_EPHEMERAL - Enable ephemeral runners (deregister after one job)
RUNNER_TARGET_ARCHITECTURES - Target architectures: x64 , arm64 , riscv64 (comma-separated)
RUNNER_ENABLE_EMULATION - Enable QEMU emulation for cross-architecture (default: true)
RUNNER_EMULATION_METHOD - Emulation method: auto , qemu , docker , podman (default: auto)
RUNNER_EMULATION_AUTO_INSTALL - Auto-install QEMU if missing (default: false, requires sudo)
See .env.example for complete configuration documentation.
Deployment Modes:
Auto Mode (default): Runners automatically deregister after being idle for a specified timeout
Perfect for cost optimization
Ideal for CI/CD pipelines with sporadic builds
Runners terminate when queue is empty
Manual Mode : Runners keep running until manually stopped
Best for development environments
Useful for persistent infrastructure
Explicit control over runner lifecycle
Project Structure
embeddenator/
├── Cargo.toml               # Rust dependencies
├── src/
│   └── main.rs              # Complete implementation
├── tests/
│   ├── e2e_regression.rs    # 6 E2E tests (includes critical engram modification test)
│   ├── integration_cli.rs   # 7 integration tests
│   └── unit_tests.rs        # 11 unit tests
├── Dockerfile.tool          # Static binary packaging
├── Dockerfile.holographic   # Holographic OS container
├── orchestrator.py          # Unified build/test/deploy
├── runner_manager.py        # Self-hosted runner automation entry point (NEW)
├── runner_automation/       # Runner automation package (NEW)
│   ├── __init__.py          # Package initialization (v1.1.0)
│   ├── config.py            # Configuration management
│   ├── github_api.py        # GitHub API client
│   ├── installer.py         # Runner installation
│   ├── runner.py            # Individual runner lifecycle
│   ├── manager.py           # Multi-runner orchestration
│   ├── emulation.py         # QEMU emulation for cross-arch (NEW)
│   ├── cli.py               # Command-line interface
│   └── README.md            # Package documentation
├── .env.example             # Runner configuration template (NEW)
├── ci_build_monitor.sh      # CI hang detection and monitoring
├── generate_docs.sh         # Documentation generation
├── .github/
│   └── workflows/
│       ├── ci-pre-checks.yml         # Pre-build validation (every PR)
│       ├── ci-amd64.yml              # AMD64 build (required for merge)
│       ├── ci-arm64.yml              # ARM64 build (self-hosted, pending)
│       ├── build-holographic-os.yml  # OS container builds
│       ├── build-push-images.yml     # Multi-OS image pipeline
│       ├── nightly-builds.yml        # Nightly bleeding-edge builds
│       └── README.md                 # Complete CI/CD documentation
├── input_ws/                # Example input (gitignored)
├── workspace/               # Build artifacts (gitignored)
└── README.md                # This file
Contributing
We welcome contributions to Embeddenator! Here's how you can help:
Getting Started
Fork the repository on GitHub
Clone your fork locally:
git clone https://github.com/YOUR_USERNAME/embeddenator.git
cd embeddenator
Create a feature branch :
git checkout -b feature/my-new-feature
Development Workflow
Make your changes with clear, focused commits
Add tests for new functionality:
Unit tests in src/ modules
Integration tests in tests/integration_*.rs
End-to-end tests in tests/e2e_*.rs
Run the full test suite :
# Run all Rust tests
cargo test
# Run integration tests via orchestrator
python3 orchestrator.py --mode test --verbose
# Run full validation suite
python3 orchestrator.py --mode full --verbose
Check code quality :
# Run Clippy linter (zero warnings required)
cargo clippy -- -D warnings
# Format code
cargo fmt
# Check Python syntax
python3 -m py_compile *.py
Test cross-platform (if applicable):
# Build Docker images
docker build -f Dockerfile.tool -t embeddenator-tool:test .
# Test on different architectures
python3 orchestrator.py --platform linux/arm64 --mode test
Pull Request Guidelines
Write clear commit messages describing what and why
Reference issues in commit messages (e.g., "Fixes #123")
Keep PRs focused - one feature or fix per PR
Update documentation if you change CLI options or add features
Ensure all tests pass before submitting
Maintain code coverage - aim for >80% test coverage
Code Style
Rust : Follow standard Rust conventions (use cargo fmt )
Python : Follow PEP 8 style guide
Comments : Document complex algorithms, especially VSA operations
Error handling : Use proper error types, avoid .unwrap() in library code
Areas for Contribution
We especially welcome contributions in these areas:
🔬 Performance optimizations for VSA operations
Benchmarking tools and performance analysis
Additional test cases covering edge cases
Documentation improvements and examples
🐛 Bug fixes and error handling improvements
Multi-platform support (Windows, macOS testing)
New features (incremental updates, compression options, etc.)
Reporting Issues
When reporting bugs, please include:
Embeddenator version (embeddenator --version)
Operating system and architecture
Rust version (rustc --version)
Minimal reproduction steps
Expected vs. actual behavior
Relevant log output (use --verbose flag)
Questions and Discussions
Issues : Bug reports and feature requests
Discussions : Questions, ideas, and general discussion
Pull Requests : Code contributions with tests
Code of Conduct
Be respectful and inclusive
Provide constructive feedback
Focus on the technical merits
Help others learn and grow
Thank you for contributing to Embeddenator!
Advanced Usage
Custom Chunk Size
Modify chunk_size in EmbrFS::ingest_file for different trade-offs:
let chunk_size = 8192; // Larger chunks = better compression, slower reconstruction
Hierarchical Encoding
For very large datasets, implement multi-level engrams:
// Level 1: Individual files
// Level 2: Directory summaries
// Level 3: Root engram of all directories
Algebraic Operations
Combine multiple engrams:
let combined = engram1.root.bundle(&engram2.root);
// Now combined contains both datasets holographically
Troubleshooting
Out of Memory
Reduce chunk size or process files in batches:
# Process directories separately
for dir in input_ws/*/; do
cargo run --release -- ingest -i "$dir" -e "engrams/$(basename $dir).engram"
done
Reconstruction Mismatches
Verify manifest and engram are from the same ingest:
# Check manifest metadata
jq '.total_chunks' workspace/manifest.json
# Re-ingest if needed
cargo run --release -- ingest -i ./input_ws -e root.engram -m manifest.json -v
Performance Tips
Use release builds : cargo build --release is 10-100x faster than debug builds
Enable SIMD acceleration : For query-heavy workloads, build with --features simd and RUSTFLAGS="-C target-cpu=native"
# Build with SIMD optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release --features simd
See docs/SIMD_OPTIMIZATION.md for details on 2-4x query speedup
Batch processing : Ingest multiple directories separately for parallel processing
SSD storage : Engram I/O benefits significantly from fast storage
Memory : Ensure sufficient RAM for large codebooks (~100 bytes per chunk)
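A back-of-envelope estimate follows directly from the figures above (the ~100 bytes/chunk number and the 4KB default chunk size). This is a rough planning aid, not a measured value:

```rust
// Estimate codebook RAM from input size, using the documented ~100 bytes
// per chunk and the 4KB default chunk size.
const CHUNK_SIZE: u64 = 4096;
const BYTES_PER_CHUNK_ENTRY: u64 = 100; // approximate codebook overhead

fn codebook_bytes(input_bytes: u64) -> u64 {
    let chunks = (input_bytes + CHUNK_SIZE - 1) / CHUNK_SIZE; // ceiling division
    chunks * BYTES_PER_CHUNK_ENTRY
}

fn main() {
    let gib = 1u64 << 30;
    // 1 GiB input -> 262,144 chunks -> ~25 MiB of codebook entries
    println!("~{} MiB", codebook_bytes(gib) / (1 << 20));
}
```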
License
MIT License - see LICENSE file for details
References
Vector Symbolic Architectures (VSA)
Vector Symbolic Architectures: Kanerva, P. (2009)
Sparse Distributed Representations
Holographic Reduced Representations (HRR)
Ternary Computing and Hardware Optimization
Balanced Ternary - Wikipedia overview
Ternary Computing - Historical and mathematical foundations
Three-Valued Logic and Quantum Computing
Optimal encoding: 39-40 trits in 64-bit registers (39 for signed, 40 for unsigned)
Architecture Documentation
Use Cases and Applications
Specialized AI Assistant Models - Architecture for deploying coding and research assistant LLMs with embeddenator-enhanced retrieval, multi-model parallel execution, and document-driven development workflows
Support
Getting Help
Common Questions
Q: What file types are supported?
A: All file types - text, binary, executables, images, etc. Embeddenator is file-format agnostic.
Q: Is the reconstruction really bit-perfect?
A: Yes, for files tested so far. We have 160+ tests verifying reconstruction accuracy. However, large-scale (TB) testing is still in progress.
Q: What's the project's development status?
A: This is alpha software (v0.22.0). Core functionality works and is tested, but APIs are unstable and it is not recommended for production use. See PROJECT_STATUS.md for details.
Q: Can I combine multiple engrams?
A: Yes! The bundle operation allows combining engrams. This is tested for basic cases but advanced algebraic operations are still experimental.
Q: What's the maximum data size?
A: Hierarchical encoding is designed for large datasets. Currently tested with MB-scale data; TB-scale testing is planned but not yet validated.
Q: How does this compare to compression?
A: Embeddenator is not primarily a compression tool. It creates holographic representations that enable algebraic operations on encoded data. Size characteristics vary by data type.
Documentation
Project Documentation
PROJECT_STATUS.md - Complete status: what works, what's experimental, what's planned
TESTING.md - Comprehensive testing guide and infrastructure documentation
LICENSE - MIT License terms
Technical Documentation
API Documentation
# Generate and view API documentation
cargo doc --open
License: MIT - See LICENSE file for full text
Copyright: 2025-2026 Tyler Zervas tz-dev@vectorweight.com
Built with Rust and Vector Symbolic Architecture principles.