Hexz Documentation

Hexz is a seekable, deduplicated archive format for ML model checkpoints. It reads safetensors and GGUF natively, chunks data at tensor boundaries, and stores fine-tuned models as XOR deltas against their base — so only what changed is written to disk.

Quick Navigation

I'm an ML Engineer

Goal: Store and load model checkpoints efficiently

  1. Start: Getting Started (10 min)
  2. Understand: Why Hexz for ML
  3. Deep dive: XOR Delta Compression
  4. Reference: Python API · CLI Reference

I'm a Contributor

Goal: Understand architecture and contribute

  1. Setup: CONTRIBUTING.md
  2. Architecture: System Architecture
  3. Roadmap: Development Roadmap
  4. Format: File Format Spec

Documentation Structure

This documentation follows the Diátaxis framework:

| Quadrant | Purpose |
|---|---|
| Tutorials | Learn by doing: step-by-step from zero |
| How-To Guides | Solve specific problems: practical recipes |
| Reference | Look up details: API and command specs |
| Explanation | Understand concepts: design and rationale |

Tutorials

How-To Guides

ML Workflows:

- Store Fine-tuned Models: checkpoint chains, delta storage, parent references
- Remote Access via S3: load tensors on demand from object storage
- Performance Tuning: block size, compression level, CDC vs fixed chunking

Reference

Explanation

ADRs


Key Concepts

.hxz archive

A .hxz file is an immutable, compressed archive with:

- Block-level compression: random access without full decompression
- BLAKE3 deduplication: identical blocks stored once, even across parent/child archives
- Seekable 2-level index: O(log N) lookup for any byte offset
- Tensor manifest: embedded map of tensor name → (offset, length, dtype, shape) for named-tensor access
- Multiple backends: local disk, S3, or HTTP with byte-range requests
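
To make the deduplication idea concrete, here is a minimal sketch of hash-keyed block storage. It is not Hexz's implementation: the names `dedup_blocks`, `restore`, and `BLOCK_SIZE` are hypothetical, the block size is tiny for readability, and `hashlib.blake2b` stands in for BLAKE3, which is not part of the Python standard library.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size for illustration only


def dedup_blocks(data: bytes, store: dict[bytes, bytes]) -> list[bytes]:
    """Split data into fixed-size blocks, keep each unique block once in
    `store`, and return the ordered list of block digests."""
    recipe = []
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        # Assumption: blake2b is used here only as a stand-in for BLAKE3.
        digest = hashlib.blake2b(block, digest_size=32).digest()
        store.setdefault(digest, block)  # an already-seen block is not stored again
        recipe.append(digest)
    return recipe


def restore(recipe: list[bytes], store: dict[bytes, bytes]) -> bytes:
    """Rebuild the original bytes from the digest list."""
    return b"".join(store[digest] for digest in recipe)


# A "parent" and a "child" that share their first two blocks.
store: dict[bytes, bytes] = {}
parent = dedup_blocks(b"AAAABBBBCCCC", store)
child = dedup_blocks(b"AAAABBBBDDDD", store)
```

In this sketch the two inputs contain six blocks in total, but `store` holds only the four distinct ones, which mirrors how identical blocks can be shared across parent and child archives.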

Tensor-level chunking

For safetensors and GGUF files, Hexz chunks at tensor boundaries rather than using content-defined chunking (CDC). The file header tells Hexz exactly where each tensor starts and ends, so this approach is simpler than CDC, avoids the rolling-hash overhead, and makes tensor-level deduplication exact.
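
As an illustration of how tensor boundaries can be read from a header, the sketch below parses the safetensors layout: an 8-byte little-endian header length, a JSON header, then the raw tensor data, with each tensor's `data_offsets` given relative to the start of the data section. The function name `tensor_chunks` is hypothetical and this is not Hexz's own reader; GGUF uses a different header and is not covered here.

```python
import json
import struct


def tensor_chunks(blob: bytes) -> list[tuple[str, int, int]]:
    """Return (name, start, end) absolute byte ranges, one per tensor."""
    # First 8 bytes: little-endian u64 giving the JSON header length.
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    data_start = 8 + header_len

    chunks = []
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        begin, end = info["data_offsets"]  # relative to the data section
        chunks.append((name, data_start + begin, data_start + end))
    return sorted(chunks, key=lambda chunk: chunk[1])


# Build a tiny in-memory file in the safetensors layout with two tensors.
header = {
    "a": {"dtype": "U8", "shape": [4], "data_offsets": [0, 4]},
    "b": {"dtype": "U8", "shape": [2], "data_offsets": [4, 6]},
}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + bytes([1, 2, 3, 4, 5, 6])

chunks = tensor_chunks(blob)
tensors = {name: blob[start:end] for name, start, end in chunks}
```

Each returned range covers exactly one tensor's bytes, so hashing a range yields a per-tensor fingerprint without any rolling hash.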

XOR delta compression

When storing a fine-tuned model against its base, Hexz aligns tensors by name and XORs corresponding raw byte buffers. The result is sparse low-magnitude data that zstd compresses well. See XOR Delta Compression for details.
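
The XOR step can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Hexz implementation: `xor_delta` and `apply_delta` are hypothetical names, the buffers are synthetic, and `zlib` from the standard library stands in for zstd.

```python
import zlib


def xor_delta(base: bytes, tuned: bytes) -> bytes:
    """XOR two equal-length buffers byte by byte."""
    if len(base) != len(tuned):
        raise ValueError("buffers must have the same length")
    return bytes(a ^ b for a, b in zip(base, tuned))


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """XOR is its own inverse, so applying the delta to the base restores the tuned buffer."""
    return xor_delta(base, delta)


# Synthetic "base" tensor bytes and a "fine-tuned" copy that differs in two bytes.
base = bytes(range(256)) * 16
tuned_buf = bytearray(base)
tuned_buf[100] ^= 0x01
tuned_buf[2000] ^= 0x03
tuned = bytes(tuned_buf)

delta = xor_delta(base, tuned)
# Assumption: zlib is used here only as a stand-in for zstd.
compressed = zlib.compress(delta)
restored = apply_delta(base, delta)
```

Here the delta is zero everywhere except the two changed positions, so it compresses to a small fraction of its size, and XORing it with the base recovers the fine-tuned bytes exactly. Real fine-tuned weights change far more bytes than this toy example, so actual savings depend on the model and dtype.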

Implementation status: Tensor-level chunking (Phase 2) is complete. XOR delta compression (Phase 3) is in development. See ROADMAP.md.


Installation

pip install hexz           # Python package
cargo install hexz-cli     # CLI tool

Community & Support

License

Apache License 2.0 — See LICENSE