Hexz Documentation

Hexz is a seekable, deduplicated archive format for ML model checkpoints. It reads safetensors and GGUF natively, chunks data at tensor boundaries, and stores fine-tuned models as XOR deltas against their base — so only what changed is written to disk.

I'm an ML Engineer

Goal: Store and load model checkpoints efficiently

Start: Getting Started (10 min)
Understand: Why Hexz for ML
Deep dive: XOR Delta Compression
Reference: Python API · CLI Reference

I'm a Contributor

Goal: Understand architecture and contribute

Documentation Structure

This documentation follows the Diátaxis framework:

Quadrant	Purpose
Tutorials	Learn by doing — step-by-step from zero
How-To Guides	Solve specific problems — practical recipes
Reference	Look up details — API and command specs
Explanation	Understand concepts — design and rationale

Tutorials

Getting Started — Pack your first model and load it back in 10 minutes

How-To Guides

ML Workflows:
- Store Fine-tuned Models: checkpoint chains, delta storage, parent references
- Remote Access via S3: load tensors on demand from object storage
- Performance Tuning: block size, compression level, CDC vs fixed chunking

Reference

Python API Reference — Complete Python API (hexz.checkpoint, hexz.open, etc.)
CLI Command Reference — hexz store, hexz extract, hexz diff, etc.
Tensor Format Support — Safetensors and GGUF format details
File Format Specification — .hxz binary format
Compression Algorithms — lz4, zstd, XOR delta
Version Compatibility — Python/PyTorch version matrix

Explanation

System Architecture — How Hexz works internally
Why Hexz for ML — Problem, solution, honest tradeoffs
XOR Delta Compression — The delta algorithm explained
Deduplication Deep Dive — BLAKE3, FastCDC, block dedup
Block vs File Compression — Why block-level compression enables random access
Zero-Copy I/O — Buffer protocol and memoryview paths

ADRs

Key Concepts

.hxz archive

A .hxz file is an immutable, compressed archive with:
- Block-level compression: random access without full decompression
- BLAKE3 deduplication: identical blocks stored once, even across parent/child archives
- Seekable 2-level index: O(log N) lookup for any byte offset
- Tensor manifest: embedded map of tensor name → (offset, length, dtype, shape) for named-tensor access
- Multiple backends: local disk, S3, or HTTP with byte-range requests
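The block deduplication and offset lookup described above can be illustrated with a small sketch. This is not the Hexz implementation or its on-disk layout: `pack`, `read_byte`, and `BLOCK_SIZE` are hypothetical names, the index is a flat sorted list rather than two levels, blocks are left uncompressed, and `hashlib.blake2b` stands in for BLAKE3, which is not in the Python standard library.

```python
import bisect
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen to keep the example readable


def pack(data: bytes):
    """Split data into fixed-size blocks and keep each distinct block once."""
    store = {}    # digest -> block bytes (one copy per unique block)
    starts = []   # logical start offset of every block, in ascending order
    digests = []  # digest of every block, parallel to `starts`
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        # Stand-in hash: the format described above uses BLAKE3.
        digest = hashlib.blake2b(block, digest_size=16).digest()
        store.setdefault(digest, block)  # a repeated block is not stored again
        starts.append(offset)
        digests.append(digest)
    return store, starts, digests


def read_byte(store, starts, digests, offset: int) -> int:
    """Binary-search the index for the block covering `offset`, then read from it."""
    if offset < 0:
        raise IndexError("offset must be non-negative")
    i = bisect.bisect_right(starts, offset) - 1  # O(log N) in the number of blocks
    return store[digests[i]][offset - starts[i]]


data = b"AAAABBBBAAAACCCC"
store, starts, digests = pack(data)
print(len(starts), "logical blocks,", len(store), "stored blocks")
```

The second `AAAA` block hashes to the same digest as the first, so the index gains an entry but the store does not grow. Reads touch only the block that covers the requested offset.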

Tensor-level chunking

For safetensors and GGUF files, Hexz chunks at tensor boundaries rather than using content-defined chunking (CDC). The file header tells Hexz exactly where each tensor starts and ends — this is simpler than CDC, avoids the rolling-hash overhead, and means tensor-level deduplication is exact.
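As a rough illustration of why the header makes chunking straightforward, the sketch below reads tensor boundaries from a safetensors-style file: an 8-byte little-endian header length, a JSON header, then the raw tensor bytes, with each `data_offsets` pair measured from the start of the data section. The helper names `build_safetensors` and `tensor_chunks` are hypothetical and are not part of Hexz. The builder exists only to produce a tiny in-memory file for the example.

```python
import json
import struct


def build_safetensors(tensors: dict) -> bytes:
    """Build a minimal safetensors-style blob: header length, JSON header, data."""
    header, payload = {}, b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            # Offsets are relative to the start of the data section.
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def tensor_chunks(blob: bytes) -> dict:
    """Return one chunk per tensor, cut exactly at the boundaries in the header."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len])
    data_start = 8 + header_len
    chunks = {}
    for name, meta in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        begin, end = meta["data_offsets"]
        chunks[name] = blob[data_start + begin:data_start + end]
    return chunks


w = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
b = struct.pack("<2f", 0.5, -0.5)
blob = build_safetensors({
    "layer.weight": ("F32", [2, 2], w),
    "layer.bias": ("F32", [2], b),
})
chunks = tensor_chunks(blob)
print({name: len(raw) for name, raw in chunks.items()})
```

Because each chunk is exactly one tensor's bytes, two checkpoints that share an unchanged tensor produce byte-identical chunks, with no rolling hash needed to find the boundary.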

XOR delta compression

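When storing a fine-tuned model against its base, Hexz aligns tensors by name and XORs the corresponding raw byte buffers. Weights that fine-tuning left unchanged XOR to zero, and weights that moved only slightly usually keep their sign and exponent bits, so the result is dominated by zero bytes that zstd compresses well. See XOR Delta Compression for details.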
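The idea can be tried on a toy buffer. The sketch below is an assumption-laden illustration rather than Hexz code: `xor_bytes` is a hypothetical helper, `zlib` stands in for zstd so that only the standard library is needed, and the "fine-tune" changes just every 50th value to mimic a run that leaves most weights untouched.

```python
import struct
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Toy "base" tensor: 1000 float32 values.
base_values = [i * 0.001 for i in range(1000)]
# Toy "fine-tuned" tensor: identical except for every 50th value.
tuned_values = [v + 0.5 if i % 50 == 0 else v for i, v in enumerate(base_values)]

base = struct.pack("<1000f", *base_values)
tuned = struct.pack("<1000f", *tuned_values)

delta = xor_bytes(base, tuned)      # zero wherever the two buffers agree
restored = xor_bytes(base, delta)   # XOR is its own inverse

full_size = len(zlib.compress(tuned))
delta_size = len(zlib.compress(delta))
print("compressed full tensor:", full_size, "bytes")
print("compressed XOR delta:  ", delta_size, "bytes")
```

Applying the same XOR to the base and the delta reproduces the fine-tuned bytes exactly, so the scheme is lossless. How much smaller the delta ends up depends on how many weights changed and by how much.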

Implementation status: Tensor-level chunking (Phase 2) is complete. XOR delta compression (Phase 3) is in development. See ROADMAP.md.

Installation

pip install hexz           # Python package
cargo install hexz-cli     # CLI tool

Community & Support

GitHub: hexz-org/hexz
Issues: Report bugs or request features
Contributing: See CONTRIBUTING.md

License

Apache License 2.0 — See LICENSE