Hexz Documentation
Hexz is a seekable, deduplicated archive format for ML model checkpoints. It reads safetensors and GGUF natively, chunks data at tensor boundaries, and stores fine-tuned models as XOR deltas against their base — so only what changed is written to disk.
Quick Navigation
I'm an ML Engineer
Goal: Store and load model checkpoints efficiently
- Start: Getting Started (10 min)
- Understand: Why Hexz for ML
- Deep dive: XOR Delta Compression
- Reference: Python API · CLI Reference
I'm a Contributor
Goal: Understand architecture and contribute
- Setup: CONTRIBUTING.md
- Architecture: System Architecture
- Roadmap: Development Roadmap
- Format: File Format Spec
Documentation Structure
This documentation follows the Diátaxis framework:
| Quadrant | Purpose |
|---|---|
| Tutorials | Learn by doing — step-by-step from zero |
| How-To Guides | Solve specific problems — practical recipes |
| Reference | Look up details — API and command specs |
| Explanation | Understand concepts — design and rationale |
Tutorials
- Getting Started — Pack your first model and load it back in 10 minutes
How-To Guides
ML Workflows:
- Store Fine-tuned Models — Checkpoint chains, delta storage, parent references
- Remote Access via S3 — Load tensors on demand from object storage
- Performance Tuning — Block size, compression level, CDC vs fixed chunking
Reference
- Python API Reference — Complete Python API (`hexz.checkpoint`, `hexz.open`, etc.)
- CLI Command Reference — `hexz store`, `hexz extract`, `hexz diff`, etc.
- Tensor Format Support — Safetensors and GGUF format details
- File Format Specification — `.hxz` binary format
- Compression Algorithms — lz4, zstd, XOR delta
- Version Compatibility — Python/PyTorch version matrix
Explanation
- System Architecture — How Hexz works internally
- Why Hexz for ML — Problem, solution, honest tradeoffs
- XOR Delta Compression — The delta algorithm explained
- Deduplication Deep Dive — BLAKE3, FastCDC, block dedup
- Block vs File Compression — Why block-level compression enables random access
- Zero-Copy I/O — Buffer protocol and memoryview paths
Architecture Decision Records (ADRs)
- ADR-0001: Rust for Core Engine
- ADR-0002: Block-Level Compression
- ADR-0003: BLAKE3 + FastCDC Deduplication
- ADR-0004: Storage Backend Abstraction
- ADR-0005: PyO3 Python Bindings
Key Concepts
.hxz archive
A .hxz file is an immutable, compressed archive with:
- Block-level compression — random access without full decompression
- BLAKE3 deduplication — identical blocks stored once, even across parent/child archives
- Seekable 2-level index — O(log N) lookup for any byte offset
- Tensor manifest — embedded map of tensor name → (offset, length, dtype, shape) for named-tensor access
- Multiple backends — local disk, S3, or HTTP with byte-range requests
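The first three properties can be illustrated with a toy, in-memory sketch. It is not the `.hxz` format or the Hexz implementation: `BlockArchive` and `BLOCK_SIZE` are hypothetical names, `hashlib.blake2b` stands in for BLAKE3, `zlib` stands in for lz4/zstd, and a single sorted list of block offsets stands in for the 2-level index.

```python
import bisect
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


class BlockArchive:
    """Toy archive: independently compressed, content-deduplicated blocks."""

    def __init__(self, data: bytes, block_size: int = BLOCK_SIZE):
        self.store = {}   # content hash -> compressed block (each stored once)
        self.starts = []  # logical start offset of each block, ascending
        self.hashes = []  # content hash of each block, parallel to `starts`
        self.length = len(data)
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            # blake2b is a stand-in for BLAKE3 (not in the standard library).
            digest = hashlib.blake2b(block, digest_size=32).digest()
            if digest not in self.store:
                # zlib is a stand-in for lz4/zstd.
                self.store[digest] = zlib.compress(block)
            self.starts.append(start)
            self.hashes.append(digest)

    def read(self, offset: int, size: int) -> bytes:
        """Return data[offset:offset + size], decompressing only the blocks touched."""
        end = min(offset + size, self.length)
        out = bytearray()
        while offset < end:
            # Binary search for the block that contains `offset`: O(log N).
            i = bisect.bisect_right(self.starts, offset) - 1
            block = zlib.decompress(self.store[self.hashes[i]])
            lo = offset - self.starts[i]
            hi = min(len(block), end - self.starts[i])
            out += block[lo:hi]
            offset = self.starts[i] + hi
        return bytes(out)


# Three identical all-zero blocks followed by one distinct block.
data = bytes(4096) * 3 + bytes(range(256)) * 16
archive = BlockArchive(data)
print(len(archive.starts), "logical blocks,", len(archive.store), "stored blocks")
print(archive.read(12000, 600) == data[12000:12600])
```

Because each block is compressed on its own, a read that spans two blocks decompresses exactly those two blocks and nothing else; the duplicate zero blocks share one stored entry.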
Tensor-level chunking
For safetensors and GGUF files, Hexz chunks at tensor boundaries rather than using content-defined chunking (CDC). The file header tells Hexz exactly where each tensor starts and ends — this is simpler than CDC, avoids the rolling-hash overhead, and means tensor-level deduplication is exact.
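As a rough illustration of how tensor boundaries can be read straight from a file header, the sketch below parses the public safetensors layout: an 8-byte little-endian header length, a JSON header, then the raw tensor bytes. The function names `tensor_chunks` and `make_example` are hypothetical, and this is not Hexz's own parser.

```python
import json
import struct


def tensor_chunks(blob: bytes):
    """Return (name, dtype, shape, start, end) per tensor, with absolute byte offsets."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data_start = 8 + header_len  # data_offsets are relative to this position
    chunks = []
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        begin, end = info["data_offsets"]
        chunks.append((name, info["dtype"], info["shape"],
                       data_start + begin, data_start + end))
    return sorted(chunks, key=lambda c: c[3])


def make_example() -> bytes:
    """Build a tiny safetensors-style blob in memory for the demo."""
    header = {
        "layer.weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]},
        "layer.bias": {"dtype": "F32", "shape": [2], "data_offsets": [16, 24]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    payload = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 0.5, -0.5)
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


blob = make_example()
chunks = tensor_chunks(blob)
for name, dtype, shape, start, end in chunks:
    print(name, dtype, shape, end - start, "bytes")
```

Each `(start, end)` pair is a chunk boundary, so two archives that contain a byte-identical tensor produce byte-identical chunks without any rolling hash.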
XOR delta compression
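When storing a fine-tuned model against its base, Hexz aligns tensors by name and XORs the corresponding raw byte buffers. Fine-tuning typically changes each weight only slightly, so the sign, exponent, and high mantissa bits usually match and the XOR result is dominated by zero bytes, which zstd compresses well. See XOR Delta Compression for details.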
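A minimal sketch of the idea, using only the standard library. It assumes both models expose the same tensor names with equal byte lengths, uses `zlib` as a stand-in for zstd, and the helper names (`xor_bytes`, `make_delta`, `apply_delta`) are hypothetical rather than part of the Hexz API.

```python
import random
import struct
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have equal length")
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return x.to_bytes(len(a), "little")


def make_delta(base: dict, tuned: dict) -> dict:
    """Align tensors by name and compress the XOR against the base.

    Tensors missing from `base` raise KeyError here; a real tool would
    have to store them in full instead.
    """
    return {name: zlib.compress(xor_bytes(base[name], buf))
            for name, buf in tuned.items()}


def apply_delta(base: dict, delta: dict) -> dict:
    """Rebuild the fine-tuned tensors: tuned = base XOR delta."""
    return {name: xor_bytes(base[name], zlib.decompress(blob))
            for name, blob in delta.items()}


rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
base = {
    "encoder.weight": struct.pack("<1000f", *weights),
    "encoder.frozen": struct.pack("<1000f", *weights),
}
tuned = {
    # Small perturbation, imitating a fine-tuning update.
    "encoder.weight": struct.pack(
        "<1000f", *(w + rng.uniform(-1e-4, 1e-4) for w in weights)),
    # Unchanged tensor: its XOR delta is all zero bytes.
    "encoder.frozen": base["encoder.frozen"],
}

delta = make_delta(base, tuned)
restored = apply_delta(base, delta)
print(restored == tuned)
for name, blob in delta.items():
    print(name, len(tuned[name]), "raw bytes ->", len(blob), "delta bytes")
```

XOR is its own inverse, so reconstruction needs only the base tensor and the stored delta, and the round trip is lossless.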
Implementation status: Tensor-level chunking (Phase 2) is complete. XOR delta compression (Phase 3) is in development. See ROADMAP.md.
Installation
pip install hexz # Python package
cargo install hexz-cli # CLI tool
Community & Support
- GitHub: hexz-org/hexz
- Issues: Report bugs or request features
- Contributing: See CONTRIBUTING.md
License
Apache License 2.0 — See LICENSE