Embeddenator — Holographic Computing Substrate
Version 0.22.0 | Rust implementation of sparse ternary Vector Symbolic Architecture (VSA) for holographic data encoding.
Embeddenator is an encoding method and data model. It is not a security implementation.
Author: Tyler Zervas (tz-dev@vectorweight.com)
License: MIT (see LICENSE file)
Component Architecture
Embeddenator is organized into 8 independent library crates:
| Crate | Description | crates.io |
|---|---|---|
| embeddenator-vsa | Sparse ternary VSA primitives | 0.21.0 |
| embeddenator-io | Codebook, manifest, engram I/O | 0.21.0 |
| embeddenator-obs | Observability and metrics | 0.21.0 |
| embeddenator-retrieval | Query engine with shift-sweep search | 0.21.0 |
| embeddenator-fs | FUSE filesystem integration | 0.23.0 |
| embeddenator-interop | Python/FFI bindings | 0.22.0 |
| embeddenator-cli | Command-line interface | 0.21.0 |
| embeddenator-core | Umbrella crate (re-exports) | 0.22.0 |
See Component Architecture for details.
Current Capabilities
Implemented Features
- Engram Encoding/Decoding: Create holographic encodings (`.engram` files) of filesystems
- Data Reconstruction: Reconstruction of files from engrams with correction store
- VSA Operations: Bundle, bind, and other vector symbolic operations on sparse ternary vectors
- Hierarchical Encoding: Multi-level chunking for handling larger datasets
- SIMD Support: Optional AVX2/NEON optimizations (2-4x speedup on supported hardware)
- CLI Tool: Command-line interface for ingest, extract, query, and update operations
- Incremental Updates: Add, remove, modify files without full re-ingestion
- Test Coverage: 160+ integration tests covering core functionality
Known Limitations
The following limitations are documented based on test results:
- Large file reconstruction: Fidelity degrades for files over 1MB with default configuration
- Deep path encoding: Path depths beyond 20 levels may produce incorrect output
- Bind inverse: The bind inverse operation degrades for sparse key configurations
- Storage overhead: VSA encoding produces larger output than input (approximately 2-3x)
These limitations are inherent to the VSA encoding model and are documented in the test suite.
Experimental/In Development
- FUSE Filesystem: EmbrFS integration (partial implementation)
- Query Performance: Similarity search and retrieval (basic implementation)
- Large-Scale Testing: TB-scale validation (manual testing only)
Version History
- Comprehensive test suite (unit + integration + e2e + doc tests)
- Intelligent test runner with accurate counting and debug mode
- Dual versioning strategy for OS builds (LTS + nightly)
- Zero clippy warnings (29 fixes applied)
- Extended OS support: Debian 12 LTS, Debian Testing/Sid, Ubuntu 24.04 LTS, Ubuntu Devel/Rolling
- Native amd64 CI (required pre-merge check) + arm64 ready for self-hosted runners
- Automated documentation with rustdoc and 9 doc tests
Core Concepts
Vector Symbolic Architecture (VSA)
Embeddenator uses sparse ternary vectors to represent data holographically:
- Bundle (⊕): Superposition operation for combining vectors
- Bind (⊙): Compositional operation with approximate self-inverse property
- Cosine Similarity: Measure of vector similarity for retrieval
The ternary representation {-1, 0, +1} enables efficient computation:
- 39-40 trits can be encoded in a 64-bit register
- Sparse representation reduces memory and computation requirements
- Based on balanced ternary arithmetic
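The 39/40-trit figures follow directly from integer capacity: 3^40 ≤ 2^64 − 1 < 3^41, while a signed 64-bit integer only reaches 3^39. A standalone check (illustrative only, not project code):

```rust
// Count the largest n such that 3^n fits within `limit`.
// Uses u128 internally so the multiply itself cannot overflow.
fn max_trits(limit: u128) -> u32 {
    let mut power: u128 = 1;
    let mut trits = 0;
    while power * 3 <= limit {
        power *= 3;
        trits += 1;
    }
    trits
}

fn main() {
    assert_eq!(max_trits(u64::MAX as u128), 40); // 3^40 <= u64::MAX < 3^41
    assert_eq!(max_trits(i64::MAX as u128), 39); // 3^39 <= i64::MAX < 3^40
    println!("u64 holds {} trits, i64 holds {}",
        max_trits(u64::MAX as u128), max_trits(i64::MAX as u128));
}
```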
Current Configuration:
- 10,000 dimensions with ~1% sparsity (~100-200 non-zero elements per vector)
- Provides balance between collision resistance and computational efficiency
- Higher dimensions and sparsity configurations are under investigation
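The three operations above can be sketched on a sparse ternary representation. This is a minimal illustration, not the crate's actual API: it assumes element-wise multiply for bind and majority-sign for bundle, which are common choices for ternary VSAs.

```rust
use std::collections::BTreeSet;

/// Minimal sketch of a sparse ternary vector (illustrative only; the real
/// SparseVec in embeddenator-vsa differs): indices holding +1 and -1.
#[derive(Clone, Debug, PartialEq)]
struct SparseVec {
    pos: BTreeSet<usize>, // indices with value +1
    neg: BTreeSet<usize>, // indices with value -1
}

impl SparseVec {
    fn new(pos: &[usize], neg: &[usize]) -> Self {
        Self {
            pos: pos.iter().copied().collect(),
            neg: neg.iter().copied().collect(),
        }
    }

    fn trit(&self, i: usize) -> i32 {
        if self.pos.contains(&i) { 1 } else if self.neg.contains(&i) { -1 } else { 0 }
    }

    /// Bundle (⊕): element-wise sum followed by sign (superposition).
    fn bundle(&self, other: &Self) -> Self {
        let (mut pos, mut neg) = (BTreeSet::new(), BTreeSet::new());
        for &i in self.pos.union(&other.pos).chain(self.neg.union(&other.neg)) {
            match self.trit(i) + other.trit(i) {
                v if v > 0 => { pos.insert(i); }
                v if v < 0 => { neg.insert(i); }
                _ => {}
            }
        }
        Self { pos, neg }
    }

    /// Bind (⊙): element-wise product; binding twice with the same key
    /// restores the original wherever the key is non-zero (self-inverse).
    fn bind(&self, other: &Self) -> Self {
        let pos = self.pos.intersection(&other.pos)
            .chain(self.neg.intersection(&other.neg)).copied().collect();
        let neg = self.pos.intersection(&other.neg)
            .chain(self.neg.intersection(&other.pos)).copied().collect();
        Self { pos, neg }
    }

    /// Cosine similarity: trit dot product over the geometric mean of supports.
    fn cosine(&self, other: &Self) -> f64 {
        let agree = self.pos.intersection(&other.pos).count()
            + self.neg.intersection(&other.neg).count();
        let clash = self.pos.intersection(&other.neg).count()
            + self.neg.intersection(&other.pos).count();
        let n = (self.pos.len() + self.neg.len()) * (other.pos.len() + other.neg.len());
        if n == 0 { 0.0 } else { (agree as f64 - clash as f64) / (n as f64).sqrt() }
    }
}

fn main() {
    let a = SparseVec::new(&[1, 3], &[5]);
    let key = SparseVec::new(&[1, 5], &[3]);
    let bound = a.bind(&key);
    assert_eq!(bound.bind(&key), a);                 // self-inverse on key support
    assert!((a.cosine(&a) - 1.0).abs() < 1e-12);     // identical vectors
    assert!(a.cosine(&bound) < 0.5);                 // bound vector is dissimilar
    let s = a.bundle(&SparseVec::new(&[2], &[]));
    assert_eq!(s, SparseVec::new(&[1, 2, 3], &[5])); // superposition keeps both
}
```

Note how bind distributes content away from the original (low cosine) while remaining invertible with the key, which is the property retrieval and unbinding rely on.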
Engrams
An engram is a holographic encoding of an entire filesystem or dataset:
- Single root vector containing superposition of all chunks
- Codebook storing encoded vector representations of data chunks
- Manifest tracking file structure and metadata
Data Encoding: The codebook stores encoded vector representations of data chunks:
- Codebook is required for reconstruction
- Uses sparse ternary vectors for holographic superposition
- Supports deterministic encoding and decoding
Note: Embeddenator is an encoding method, not a security implementation. The codebook provides no cryptographic guarantees.
Quick Start
Installation
# From crates.io
cargo install embeddenator-core
# Or clone the repository
git clone https://github.com/tzervas/embeddenator-core.git
cd embeddenator-core
# Build with Cargo
cargo build --release
# Or use the orchestrator
python3 orchestrator.py --mode build --verbose
As a Library
# Cargo.toml
[dependencies]
embeddenator-core = "0.22"
// Rust imports use the lib name "embeddenator"
use embeddenator::prelude::*;
Basic Usage
# Ingest a directory into an engram
cargo run --release -- ingest -i ./input_ws -e root.engram -m manifest.json -v
# Extract from an engram
cargo run --release -- extract -e root.engram -m manifest.json -o ./output -v
# Query similarity
cargo run --release -- query -e root.engram -q ./test_file.txt -v
Using the Orchestrator
The orchestrator provides unified build, test, and deployment workflows:
# Quick start: build, test, and package everything
python3 orchestrator.py --mode full --verbose -i
# Run integration tests
python3 orchestrator.py --mode test --verbose
# Build Docker image
python3 orchestrator.py --mode package --verbose
# Display system info
python3 orchestrator.py --mode info
# Clean all artifacts
python3 orchestrator.py --mode clean
CLI Reference
Embeddenator provides the following commands for working with holographic engrams:
embeddenator --help
Get comprehensive help information:
# Show main help with examples
embeddenator --help
# Show detailed help for a specific command
embeddenator ingest --help
embeddenator extract --help
embeddenator query --help
embeddenator query-text --help
embeddenator bundle-hier --help
ingest - Create Holographic Engram
Process one or more files and/or directories and encode them into a holographic engram.
embeddenator ingest [OPTIONS] --input <PATH>...
Required:
-i, --input <PATH>... Input file(s) and/or directory(ies) to ingest
Options:
-e, --engram <FILE> Output engram file [default: root.engram]
-m, --manifest <FILE> Output manifest file [default: manifest.json]
-v, --verbose Enable verbose output with progress and statistics
-h, --help Print help information
Examples:
# Basic ingestion
embeddenator ingest -i ./myproject -e project.engram -m project.json
# Mix files and directories (repeat -i/--input)
embeddenator ingest -i ./src -i ./README.md -e project.engram -m project.json
# With verbose output
embeddenator ingest -i ~/Documents -e docs.engram -v
# Custom filenames
embeddenator ingest --input ./data --engram backup.engram --manifest backup.json
What it does:
- Recursively scans any input directories
- Ingests any input files directly
- Chunks files (4KB default)
- Encodes chunks using sparse ternary VSA
- Creates holographic superposition in root vector
- Saves engram (holographic data) and manifest (metadata)
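The chunking step can be sketched as follows (a hypothetical illustration of the 4KB split, not the crate's internals):

```rust
// Split a byte stream into fixed-size chunks, the unit that gets VSA-encoded.
// 4096 bytes matches the documented default chunk size.
const CHUNK_SIZE: usize = 4096;

fn chunk(data: &[u8]) -> Vec<&[u8]> {
    data.chunks(CHUNK_SIZE).collect()
}

fn main() {
    let data = vec![0u8; 10_000];
    let chunks = chunk(&data);
    assert_eq!(chunks.len(), 3);                      // 4096 + 4096 + 1808 bytes
    assert_eq!(chunks[2].len(), 10_000 - 2 * 4096);   // final partial chunk
    println!("{} chunks", chunks.len());
}
```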
extract - Reconstruct Files
Bit-perfect reconstruction of all files from an engram.
embeddenator extract [OPTIONS] --output-dir <DIR>
Required:
-o, --output-dir <DIR> Output directory for reconstructed files
Options:
-e, --engram <FILE> Input engram file [default: root.engram]
-m, --manifest <FILE> Input manifest file [default: manifest.json]
-v, --verbose Enable verbose output with progress
-h, --help Print help information
Examples:
# Basic extraction
embeddenator extract -e project.engram -m project.json -o ./restored
# With default filenames
embeddenator extract -o ./output -v
# From backup
embeddenator extract --engram backup.engram --manifest backup.json --output-dir ~/restored
What it does:
- Loads engram and manifest
- Reconstructs directory structure
- Algebraically unbinds chunks from root vector
- Writes bit-perfect copies of all files
- Preserves file hierarchy and metadata
query - Similarity Search
Compute cosine similarity between a query file and engram contents.
embeddenator query [OPTIONS] --query <FILE>
Required:
-q, --query <FILE> Query file or pattern to search for
Options:
-e, --engram <FILE> Engram file to query [default: root.engram]
--hierarchical-manifest <FILE> Optional hierarchical manifest (selective unfolding)
--sub-engrams-dir <DIR> Directory of `.subengram` files (used with --hierarchical-manifest)
--k <K> Top-k results to print for codebook/hierarchical search [default: 10]
-v, --verbose Enable verbose output with similarity details
-h, --help Print help information
Examples:
# Query similarity
embeddenator query -e archive.engram -q search.txt
# With verbose output
embeddenator query -e data.engram -q pattern.bin -v
# Using default engram
embeddenator query --query testfile.txt -v
What it does:
- Encodes query file using VSA
- Computes cosine similarity with engram
- Returns similarity score
If --hierarchical-manifest and --sub-engrams-dir are provided, it also runs a store-backed hierarchical query and prints the top hierarchical matches.
Similarity interpretation:
- >0.75: Strong match, likely contains similar content
- 0.3-0.75: Moderate similarity, some shared patterns
- <0.3: Low similarity, likely unrelated content
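Scripting against these bands might look like the following hypothetical helper (the thresholds mirror the interpretation above; the function name is illustrative):

```rust
// Map a cosine similarity score to the documented interpretation bands.
fn interpret(similarity: f64) -> &'static str {
    if similarity > 0.75 {
        "strong match"
    } else if similarity >= 0.3 {
        "moderate similarity"
    } else {
        "low similarity"
    }
}

fn main() {
    assert_eq!(interpret(0.9), "strong match");
    assert_eq!(interpret(0.5), "moderate similarity");
    assert_eq!(interpret(0.1), "low similarity");
}
```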
query-text - Similarity Search (Text)
Encode a literal text string as a query vector and run the same retrieval path as query.
embeddenator query-text -e root.engram --text "search phrase" --k 10
# With hierarchical selective unfolding:
embeddenator query-text -e root.engram --text "search phrase" \
--hierarchical-manifest hier.json --sub-engrams-dir ./sub_engrams --k 10
bundle-hier - Build Hierarchical Retrieval Artifacts
Build a hierarchical manifest and a directory of sub-engrams from an existing flat root.engram + manifest.json. This enables store-backed selective unfolding queries.
embeddenator bundle-hier -e root.engram -m manifest.json \
--out-hierarchical-manifest hier.json \
--out-sub-engrams-dir ./sub_engrams
# Optional: deterministically shard large nodes (bounds per-node indexing cost)
embeddenator bundle-hier -e root.engram -m manifest.json \
--max-chunks-per-node 2000 \
--out-hierarchical-manifest hier.json \
--out-sub-engrams-dir ./sub_engrams
Docker Usage (Experimental)
Note: Docker support is in development and may not be fully functional.
Build Tool Image
docker build -f Dockerfile.tool -t embeddenator-tool:latest .
Run in Container
# Ingest data
docker run -v $(pwd)/input_ws:/input -v $(pwd)/workspace:/workspace \
embeddenator-tool:latest \
ingest -i /input -e /workspace/root.engram -m /workspace/manifest.json -v
# Extract data
docker run -v $(pwd)/workspace:/workspace -v $(pwd)/output:/output \
embeddenator-tool:latest \
extract -e /workspace/root.engram -m /workspace/manifest.json -o /output -v
Test Coverage
Embeddenator has comprehensive test coverage:
- 160+ integration tests across 23 test suites
- 97.6% pass rate (166/170 tests passing)
- Test categories: Balanced ternary, codebook operations, VSA properties, error recovery, hierarchical operations, CLI integration
- Continuous testing: All core functionality verified with each build
Verified Capabilities
- Text file reconstruction: Byte-for-byte identical reconstruction verified
- Binary file recovery: Exact binary reconstruction tested
- VSA operations: Bundle, bind, and similarity operations tested
- Hierarchical encoding: Multi-level chunking verified
- Error recovery: Corruption and concurrency handling tested
In Development
- Large-scale testing: TB-scale datasets not yet fully validated
- Performance optimization: Benchmarking and tuning ongoing
- Security audit: Cryptographic properties under research
Architecture
Core Components
- SparseVec: Sparse ternary vector implementation
  - pos: Indices with +1 value
  - neg: Indices with -1 value
  - Efficient operations: bundle, bind, cosine similarity
  - Hardware-optimized: 39-40 trits per 64-bit register
- EmbrFS: Holographic filesystem layer
  - Chunked encoding (4KB default)
  - Manifest for file metadata
  - Codebook for chunk storage
- CLI: Command-line interface
  - Ingest: directory → engram
  - Extract: engram → directory
  - Query: similarity search
Architecture Decision Records (ADRs)
Comprehensive architectural documentation is available in docs/adr/:
- ADR-001: Sparse Ternary VSA
  - Core VSA design and sparse ternary vectors
  - Balanced ternary mathematics and hardware optimization
  - 64-bit register encoding (39-40 trits per register)
- ADR-002: Multi-Agent Workflow System
- ADR-003: Self-Hosted Runner Architecture
- ADR-004: Holographic OS Container Design
  - Configuration-driven builder for Debian/Ubuntu
  - Dual versioning strategy (LTS + nightly)
  - Package isolation capabilities
- ADR-005: Hologram-Based Package Isolation
  - Factorization of holographic containers
  - Balanced ternary encoding for compact representation
  - Package-level granular updates
  - Hardware optimization strategy for 64-bit CPUs
- ADR-006: Dimensionality and Sparsity Scaling
  - Scaling holographic space to TB-scale datasets
  - Adaptive sparsity strategy (maintain constant computational cost)
  - Performance analysis and collision-probability projections
  - Impact on 100% bit-perfect guarantee
  - Deep operation resilience for factorization
- ADR-007: Codebook Security and Reversible Encoding
  - VSA-as-a-lens cryptographic primitive
  - Quantum-resistant encoding mechanism (under research)
  - Decoding trivial with the codebook, intended to be infeasible without it
  - Bulk encryption with selective decryption
  - Integration with holographic indexing
See docs/adr/README.md for the complete ADR index.
File Format
Engram (.engram):
- Binary serialized format (bincode)
- Contains root SparseVec and codebook
- Self-contained holographic state
Manifest (.json):
- Human-readable file listing
- Chunk mapping and metadata
- Required for extraction
Development
API Documentation
Comprehensive API documentation is available:
# Generate and open documentation locally
cargo doc --open
# Or use the automated script
./generate_docs.sh
# View online (after publishing)
# https://docs.rs/embeddenator
The documentation includes:
- Module-level overviews with examples
- Function documentation with usage patterns
- 9 runnable doc tests demonstrating API usage
- VSA operation examples (bundle, bind, cosine)
Running Tests
# Recommended: everything Cargo considers testable (lib/bin/tests/examples/benches)
cargo test --workspace --all-targets
# Doc tests only
cargo test --doc
# Optimized build tests (useful before benchmarking)
cargo test --release --workspace --all-targets
# Feature-gated correctness/perf gates
cargo test --workspace --all-targets --features "bt-phase-2 proptest"
# Long-running/expensive tests are explicitly opt-in:
# - QA memory scaling (requires env var + ignored flag)
EMBEDDENATOR_RUN_QA_MEMORY=1 cargo test --features qa --test memory_scaled -- --ignored --nocapture
# - Multi-GB soak test (requires env var + ignored flag)
EMBEDDENATOR_RUN_SOAK=1 cargo test --release --features soak-memory --test soak_memory -- --ignored --nocapture
# Integration tests via orchestrator
python3 orchestrator.py --mode test --verbose
# Full test suite
python3 orchestrator.py --mode full --verbose
Notes:
- Seeing many tests marked as "ignored" during `cargo bench` is expected: Cargo runs the unit test harness in libtest's `--bench` mode, which skips normal `#[test]` functions (it prints `i` for each). Use `cargo test` (commands above) to actually execute tests.
- `cargo test --workspace --all-targets` will also compile and run Criterion benches in a fast "smoke" mode (they print `Testing ... Success`). This is intended to catch broken benches early.
CI/CD and Build Monitoring
The project uses separated CI/CD workflows for optimal performance and reliability:
# Test CI build locally with monitoring
./ci_build_monitor.sh linux/amd64 build 300
# Monitor for specific timeout (in seconds)
./ci_build_monitor.sh linux/amd64 full 900
CI Workflow Structure:
Three separate workflows eliminate duplication and provide clear responsibilities:
- ci-pre-checks.yml - Fast validation (fmt, clippy, unit tests, doc tests)
- ci-amd64.yml - Full AMD64 build and test (REQUIRED PRE-MERGE CHECK)
- ci-arm64.yml - ARM64 build and test (configured for self-hosted runners)
CI Features:
- Separated workflows prevent duplicate runs
- AMD64 workflow is a required status check - PRs cannot merge until it passes
- Parallel builds using all available cores
- Intelligent timeout management (15min tests, 10min builds, 30min total)
- Build artifact upload on failure
- Performance metrics reporting
- Automatic parallelization with `CARGO_BUILD_JOBS`
Architecture Support:
| Architecture | Status | Runner Type | Trigger | Notes |
|---|---|---|---|---|
| amd64 (x86_64) | Production | GitHub-hosted (ubuntu-latest) | Every PR (required check) | Stable, 5-7min |
| arm64 (aarch64) | 🚧 Ready | Self-hosted (pending deployment) | Manual only | Will enable on merge to main |
ARM64 Deployment Roadmap:
- Phase 1: Root cause analysis completed - GitHub doesn't provide standard ARM64 runners
- Phase 2: Workflow configured for self-hosted runners with labels `["self-hosted", "linux", "ARM64"]`
- 🚧 Phase 3: Deploy self-hosted ARM64 infrastructure (in progress)
- ⏳ Phase 4: Manual testing and validation
- ⏳ Phase 5: Enable automatic trigger on merge to main only
Why Self-Hosted for ARM64?
- GitHub Actions doesn't provide standard hosted ARM64 runners
- Self-hosted provides native execution (no emulation overhead)
- Cost-effective for frequent builds
- Ready to deploy when infrastructure is available
See .github/workflows/README.md for complete CI/CD documentation and ARM64 setup guide.
Self-Hosted Runner Automation
Embeddenator includes a comprehensive Python-based automation system for managing GitHub Actions self-hosted runners with complete lifecycle management and multi-architecture support:
Features:
- Automated registration with short-lived tokens
- Complete lifecycle management (register → run → deregister)
- Configurable auto-deregistration after idle timeout
- Manual mode for persistent runners
- Multi-runner deployment support
- Multi-architecture support (x64, ARM64, RISC-V)
- QEMU emulation for cross-architecture runners
- Health monitoring and status reporting
- 🧹 Automatic cleanup of Docker resources
- ⚙️ Flexible configuration via .env file or CLI arguments
Supported Architectures:
- x64 (AMD64) - Native x86_64 runners
- ARM64 (aarch64) - ARM64 runners (native or emulated via QEMU)
- RISC-V (riscv64) - RISC-V runners (native or emulated via QEMU)
Quick Start:
# 1. Copy and configure environment file
cp .env.example .env
# Edit .env and set GITHUB_REPOSITORY and GITHUB_TOKEN
# 2. Run in auto mode (registers, starts, monitors, auto-deregisters when idle)
python3 runner_manager.py run
# 3. Or use manual mode (keeps running until stopped)
RUNNER_MODE=manual python3 runner_manager.py run
Multi-Architecture Examples:
# Deploy ARM64 runners on x86_64 hardware (with emulation, auto-detect runtime)
RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run
# Deploy runners for all architectures
RUNNER_TARGET_ARCHITECTURES=x64,arm64,riscv64 RUNNER_COUNT=6 python3 runner_manager.py run
# Deploy with automatic QEMU installation (requires sudo)
RUNNER_EMULATION_AUTO_INSTALL=true RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run
# Use specific emulation method (docker, podman, or qemu)
RUNNER_EMULATION_METHOD=podman RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run
# Use Docker for emulation
RUNNER_EMULATION_METHOD=docker RUNNER_TARGET_ARCHITECTURES=arm64,riscv64 python3 runner_manager.py run
Individual Commands:
# Register runner(s)
python3 runner_manager.py register
# Start runner service(s)
python3 runner_manager.py start
# Monitor and manage lifecycle
python3 runner_manager.py monitor
# Check status
python3 runner_manager.py status
# Stop and deregister
python3 runner_manager.py stop
Advanced Usage:
# Deploy multiple runners
python3 runner_manager.py run --runner-count 4
# Custom labels
python3 runner_manager.py register --labels self-hosted,linux,ARM64,large
# Auto-deregister after 10 minutes of inactivity
RUNNER_IDLE_TIMEOUT=600 python3 runner_manager.py run
Configuration Options:
Key environment variables (see .env.example for full list):
- `GITHUB_REPOSITORY` - Repository to register runners for (required)
- `GITHUB_TOKEN` - Personal access token with repo scope (required)
- `RUNNER_MODE` - Deployment mode: `auto` (default) or `manual`
- `RUNNER_IDLE_TIMEOUT` - Auto-deregister timeout in seconds (default: 300)
- `RUNNER_COUNT` - Number of runners to deploy (default: 1)
- `RUNNER_LABELS` - Comma-separated runner labels
- `RUNNER_EPHEMERAL` - Enable ephemeral runners (deregister after one job)
- `RUNNER_TARGET_ARCHITECTURES` - Target architectures: `x64`, `arm64`, `riscv64` (comma-separated)
- `RUNNER_ENABLE_EMULATION` - Enable QEMU emulation for cross-architecture (default: true)
- `RUNNER_EMULATION_METHOD` - Emulation method: `auto`, `qemu`, `docker`, `podman` (default: auto)
- `RUNNER_EMULATION_AUTO_INSTALL` - Auto-install QEMU if missing (default: false, requires sudo)
See .env.example for complete configuration documentation.
Deployment Modes:
- Auto Mode (default): Runners automatically deregister after being idle for a specified timeout
  - Perfect for cost optimization
  - Ideal for CI/CD pipelines with sporadic builds
  - Runners terminate when queue is empty
- Manual Mode: Runners keep running until manually stopped
  - Best for development environments
  - Useful for persistent infrastructure
  - Explicit control over runner lifecycle
Project Structure
embeddenator/
├── Cargo.toml # Rust dependencies
├── src/
│ └── main.rs # Complete implementation
├── tests/
│ ├── e2e_regression.rs # 6 E2E tests (includes critical engram modification test)
│ ├── integration_cli.rs # 7 integration tests
│ └── unit_tests.rs # 11 unit tests
├── Dockerfile.tool # Static binary packaging
├── Dockerfile.holographic # Holographic OS container
├── orchestrator.py # Unified build/test/deploy
├── runner_manager.py # Self-hosted runner automation entry point (NEW)
├── runner_automation/ # Runner automation package (NEW)
│ ├── __init__.py # Package initialization (v1.1.0)
│ ├── config.py # Configuration management
│ ├── github_api.py # GitHub API client
│ ├── installer.py # Runner installation
│ ├── runner.py # Individual runner lifecycle
│ ├── manager.py # Multi-runner orchestration
│ ├── emulation.py # QEMU emulation for cross-arch (NEW)
│ ├── cli.py # Command-line interface
│ └── README.md # Package documentation
├── .env.example # Runner configuration template (NEW)
├── ci_build_monitor.sh # CI hang detection and monitoring
├── generate_docs.sh # Documentation generation
├── .github/
│ └── workflows/
│ ├── ci-pre-checks.yml # Pre-build validation (every PR)
│ ├── ci-amd64.yml # AMD64 build (required for merge)
│ ├── ci-arm64.yml # ARM64 build (self-hosted, pending)
│ ├── build-holographic-os.yml# OS container builds
│ ├── build-push-images.yml # Multi-OS image pipeline
│ ├── nightly-builds.yml # Nightly bleeding-edge builds
│ └── README.md # Complete CI/CD documentation
├── input_ws/ # Example input (gitignored)
├── workspace/ # Build artifacts (gitignored)
└── README.md # This file
Contributing
We welcome contributions to Embeddenator! Here's how you can help:
Getting Started
- Fork the repository on GitHub
- Clone your fork locally:
  git clone https://github.com/YOUR_USERNAME/embeddenator.git
  cd embeddenator
- Create a feature branch:
  git checkout -b feature/my-new-feature
Development Workflow
- Make your changes with clear, focused commits
- Add tests for new functionality:
  - Unit tests in `src/` modules
  - Integration tests in `tests/integration_*.rs`
  - End-to-end tests in `tests/e2e_*.rs`
- Run the full test suite:
  # Run all Rust tests
  cargo test
  # Run integration tests via orchestrator
  python3 orchestrator.py --mode test --verbose
  # Run full validation suite
  python3 orchestrator.py --mode full --verbose
- Check code quality:
  # Run Clippy linter (zero warnings required)
  cargo clippy -- -D warnings
  # Format code
  cargo fmt
  # Check Python syntax
  python3 -m py_compile *.py
- Test cross-platform (if applicable):
  # Build Docker images
  docker build -f Dockerfile.tool -t embeddenator-tool:test .
  # Test on different architectures
  python3 orchestrator.py --platform linux/arm64 --mode test
Pull Request Guidelines
- Write clear commit messages describing what and why
- Reference issues in commit messages (e.g., "Fixes #123")
- Keep PRs focused - one feature or fix per PR
- Update documentation if you change CLI options or add features
- Ensure all tests pass before submitting
- Maintain code coverage - aim for >80% test coverage
Code Style
- Rust: Follow standard Rust conventions (use `cargo fmt`)
- Python: Follow PEP 8 style guide
- Comments: Document complex algorithms, especially VSA operations
- Error handling: Use proper error types, avoid `.unwrap()` in library code
Areas for Contribution
We especially welcome contributions in these areas:
- 🔬 Performance optimizations for VSA operations
- Benchmarking tools and performance analysis
- Additional test cases covering edge cases
- Documentation improvements and examples
- 🐛 Bug fixes and error handling improvements
- Multi-platform support (Windows, macOS testing)
- New features (incremental updates, compression options, etc.)
Reporting Issues
When reporting bugs, please include:
- Embeddenator version (`embeddenator --version`)
- Operating system and architecture
- Rust version (`rustc --version`)
- Minimal reproduction steps
- Expected vs. actual behavior
- Relevant log output (use the `--verbose` flag)
Questions and Discussions
- Issues: Bug reports and feature requests
- Discussions: Questions, ideas, and general discussion
- Pull Requests: Code contributions with tests
Code of Conduct
- Be respectful and inclusive
- Provide constructive feedback
- Focus on the technical merits
- Help others learn and grow
Thank you for contributing to Embeddenator!
Advanced Usage
Custom Chunk Size
Modify chunk_size in EmbrFS::ingest_file for different trade-offs:
let chunk_size = 8192; // Larger chunks = better compression, slower reconstruction
Hierarchical Encoding
For very large datasets, implement multi-level engrams:
// Level 1: Individual files
// Level 2: Directory summaries
// Level 3: Root engram of all directories
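The three levels above can be sketched with small dense ternary vectors (hypothetical illustration; real engrams use sparse vectors, and the sign-of-sum bundle here is one common choice):

```rust
// Multi-level bundling: file vectors -> directory summaries -> root engram.
fn bundle(vecs: &[Vec<i8>]) -> Vec<i8> {
    let dim = vecs[0].len();
    (0..dim)
        .map(|i| {
            let sum: i32 = vecs.iter().map(|v| v[i] as i32).sum();
            sum.signum() as i8 // majority sign; ties collapse to 0
        })
        .collect()
}

fn main() {
    // Level 1: per-file vectors
    let file_a = vec![1, -1, 0, 1];
    let file_b = vec![1, 1, -1, 0];
    let file_c = vec![-1, 0, -1, 1];
    // Level 2: directory summary
    let dir = bundle(&[file_a, file_b]);
    // Level 3: root engram over all directories
    let root = bundle(&[dir.clone(), file_c]);
    assert_eq!(dir, vec![1, 0, -1, 1]);
    assert_eq!(root, vec![0, 0, -1, 1]);
}
```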
Algebraic Operations
Combine multiple engrams:
let combined = engram1.root.bundle(&engram2.root);
// Now combined contains both datasets holographically
Troubleshooting
Out of Memory
Reduce chunk size or process files in batches:
# Process directories separately (give each its own engram and manifest;
# otherwise every iteration overwrites the default manifest.json)
for dir in input_ws/*/; do
  name=$(basename "$dir")
  cargo run --release -- ingest -i "$dir" -e "engrams/$name.engram" -m "engrams/$name.json"
done
Reconstruction Mismatches
Verify manifest and engram are from the same ingest:
# Check manifest metadata
jq '.total_chunks' workspace/manifest.json
# Re-ingest if needed
cargo run --release -- ingest -i ./input_ws -e root.engram -m manifest.json -v
Performance Tips
- Use release builds: `cargo build --release` is 10-100x faster
- Enable SIMD acceleration: For query-heavy workloads, build with `--features simd` and `RUSTFLAGS="-C target-cpu=native"`. See docs/SIMD_OPTIMIZATION.md for details on the 2-4x query speedup:
  # Build with SIMD optimizations
  RUSTFLAGS="-C target-cpu=native" cargo build --release --features simd
- Batch processing: Ingest multiple directories separately for parallel processing
- SSD storage: Engram I/O benefits significantly from fast storage
- Memory: Ensure sufficient RAM for large codebooks (~100 bytes per chunk)
License
MIT License - see LICENSE file for details
References
Vector Symbolic Architectures (VSA)
- Vector Symbolic Architectures: Kanerva, P. (2009)
- Sparse Distributed Representations
- Holographic Reduced Representations (HRR)
Ternary Computing and Hardware Optimization
- Balanced Ternary - Wikipedia overview
- Ternary Computing - Historical and mathematical foundations
- Three-Valued Logic and Quantum Computing
- Optimal encoding: 39-40 trits in 64-bit registers (39 for signed, 40 for unsigned)
Architecture Documentation
- ADR-001: Sparse Ternary VSA - Core design and hardware optimization
- ADR-005: Hologram Package Isolation - Balanced ternary implementation
- Complete ADR Index - All architecture decision records
Use Cases and Applications
- Specialized AI Assistant Models - Architecture for deploying coding and research assistant LLMs with embeddenator-enhanced retrieval, multi-model parallel execution, and document-driven development workflows
Support
Getting Help
- Documentation: This README and built-in help (`embeddenator --help`)
- Issues: Report bugs or request features at https://github.com/tzervas/embeddenator/issues
- Discussions: Ask questions and share ideas at https://github.com/tzervas/embeddenator/discussions
- Examples: See the `examples/` directory (coming soon) for usage patterns
Common Questions
Q: What file types are supported?
A: All file types - text, binary, executables, images, etc. Embeddenator is file-format agnostic.
Q: Is the reconstruction really bit-perfect?
A: Yes, for files tested so far. We have 160+ tests verifying reconstruction accuracy. However, large-scale (TB) testing is still in progress.
Q: What's the project's development status?
A: This is alpha software (v0.22.0). Core functionality works and is tested, but APIs are unstable and it is not recommended for production use. See PROJECT_STATUS.md for details.
Q: Can I combine multiple engrams?
A: Yes! The bundle operation allows combining engrams. This is tested for basic cases but advanced algebraic operations are still experimental.
Q: What's the maximum data size?
A: Hierarchical encoding is designed for large datasets. Currently tested with MB-scale data; TB-scale testing is planned but not yet validated.
Q: How does this compare to compression?
A: Embeddenator is not primarily a compression tool. It creates holographic representations that enable algebraic operations on encoded data. Size characteristics vary by data type.
Documentation
Project Documentation
- PROJECT_STATUS.md - Complete status: what works, what's experimental, what's planned
- TESTING.md - Comprehensive testing guide and infrastructure documentation
- LICENSE - MIT License terms
Technical Documentation
- Component Architecture - Modular crate structure
- Local Development - Development environment setup
- ADR Index - Architecture Decision Records
API Documentation
# Generate and view API documentation
cargo doc --open
License: MIT - See LICENSE file for full text
Copyright: 2025-2026 Tyler Zervas tz-dev@vectorweight.com
Built with Rust and Vector Symbolic Architecture principles.