Deduplicated checkpoint storage for ML model weights. Pack safetensors and GGUF files into seekable .hxz archives that chunk at tensor boundaries and store fine-tunes as XOR deltas against their base.
Documentation · PyPI · crates.io · Releases
```shell
pip install hexz         # Python library
cargo install hexz-cli   # CLI tool
```

```shell
# Pack a safetensors or GGUF model
hexz store base-model.safetensors base-model.hxz

# Store a fine-tune — delta against the base (XOR compression)
hexz store finetuned.safetensors finetuned.hxz --base base-model.hxz

# Export back to safetensors
hexz extract finetuned.hxz finetuned-out.safetensors

# Extract a single tensor
hexz extract finetuned.hxz --tensor lm_head.weight

# Inspect: tensor list, shapes, storage stats
hexz inspect finetuned.hxz

# Compare two archives — which tensors changed
hexz diff base-model.hxz finetuned.hxz

# List archives in a directory with chain info and unique bytes on disk
hexz ls ./checkpoints/

# Sign and verify
hexz keygen
hexz sign --key private.key finetuned.hxz
hexz verify --key public.key finetuned.hxz
```

Run `hexz --help` or `hexz COMMAND --help` for full usage.
```python
import hexz.checkpoint as ckpt

# Convert a safetensors file — chunks at tensor boundaries
ckpt.convert("base-model.safetensors", "base-model.hxz")

# Store a fine-tune as an XOR delta — only diffs stored
ckpt.convert("finetuned.safetensors", "finetuned.hxz", base="base-model.hxz")

# Export back to safetensors
ckpt.extract("finetuned.hxz", "finetuned-out.safetensors")

# Extract a single tensor to raw bytes
ckpt.extract("finetuned.hxz", tensor="lm_head.weight")
```

```python
import hexz.checkpoint as ckpt

# Save — stores XOR delta against parent
ckpt.save(model.state_dict(), "finetuned.hxz", parent="base-model.hxz")

# Load all tensors
state = ckpt.load("finetuned.hxz", device="cuda")

# Load only what you need — reads only those blocks from disk or S3
state = ckpt.load("finetuned.hxz", keys=["lm_head.weight", "embed_tokens.weight"])

# Inspect names and shapes without loading data
manifest = ckpt.manifest("finetuned.hxz")
```

```python
import hexz

# Only the requested blocks are downloaded
with hexz.open("s3://my-bucket/finetuned.hxz") as r:
    data = r.read(length, offset=tensor_offset)
```

When you pass `--base base.hxz` (CLI) or `base=` (Python):
- The safetensors header tells Hexz exactly where each tensor lives — no CDC rolling-hash scan needed
- For each tensor present in both files: `delta = base_tensor XOR fine_tensor`
- Fine-tuning perturbs weights across all layers without inserting or deleting bytes, so `delta` is dense but low-magnitude — zstd handles this well
- Tensors with no parent match (new adapter layers) are stored as-is
- Tensors byte-identical to the parent cost zero extra bytes via BLAKE3 block dedup
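The zero-cost case above follows from content addressing: blocks are stored under their hash, so a byte-identical block maps to a key that already exists. A minimal stand-in sketch (Hexz uses BLAKE3; `hashlib.sha256` substitutes here so the example is stdlib-only, and the in-memory dict is not Hexz's actual store):

```python
import hashlib

# Toy content-addressed block store: identical blocks are stored once,
# keyed by their hash.
store = {}

def put_block(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    if key not in store:      # byte-identical block already present?
        store[key] = data     # only new content costs bytes on disk
    return key                # the manifest records the key either way

block = b"\x00" * 4096        # e.g. an unchanged tensor's bytes
k1 = put_block(block)
k2 = put_block(bytes(block))  # same bytes, stored again by a second fine-tune
assert k1 == k2 and len(store) == 1  # the second copy cost zero extra bytes
```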
On load, Hexz reads the base tensor, decompresses the XOR delta, and XORs again to reconstruct. This is transparent to ckpt.load().
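The round trip can be sketched with NumPy (a minimal illustration of the XOR scheme described above, not Hexz's internals; the arrays and sizes are made up):

```python
import numpy as np

# Pretend base and fine-tuned weights: same shape, small perturbation
base = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
fine = base + 1e-3 * np.random.default_rng(1).standard_normal(1024).astype(np.float32)

# XOR operates on raw bytes, so reinterpret the float buffers as uint8
base_bytes = base.view(np.uint8)
fine_bytes = fine.view(np.uint8)

# Store: delta = base XOR fine (dense but low-magnitude, so it compresses well)
delta = np.bitwise_xor(base_bytes, fine_bytes)

# Load: XOR the delta back onto the base to reconstruct the fine-tune
restored = np.bitwise_xor(base_bytes, delta).view(np.float32)

assert np.array_equal(restored, fine)  # XOR is lossless: bit-exact reconstruction
```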
Note: XOR delta compression ratios on real model fine-tunes have not yet been benchmarked. The theoretical basis (ZipLLM, Hachiuma et al.) predicts significant savings; empirical numbers will be added once Phase 3 is complete. See ROADMAP.md.
50 fine-tunes of a 7B model (~14 GB each), stored against the same base:
| Approach | Storage |
|---|---|
| Raw file copies | ~700 GB |
| git-lfs | ~700 GB — tracks blobs, does not deduplicate content |
| DVC + S3 | ~700 GB — pointer tracking, not a content store |
| Hexz (XOR delta) | [UNTESTED — benchmark in progress] |
The CDC block dedup benchmark (validated) shows 92.4% deduplication on shifted data vs 0% for fixed-size blocks. See COMPETITIVE_COMPARISON.md for full benchmark details.
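The shift-resistance difference can be illustrated with a toy content-defined chunker (a simple rolling condition standing in for Hexz's actual CDC; the 92.4% figure comes from the project's benchmark, not this sketch):

```python
import hashlib
import random

def fixed_chunks(data: bytes, size: int = 64):
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data: bytes, mask: int = 0x3F):
    # Cut wherever a rolling hash of recent bytes hits a boundary pattern,
    # subject to a minimum chunk length.
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF
        if i - start >= 16 and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    chunks.append(data[start:])
    return chunks

def dedup_ratio(a, b):
    # Fraction of b's bytes that live in chunks already present in a
    seen = {hashlib.sha256(c).digest() for c in a}
    return sum(len(c) for c in b if hashlib.sha256(c).digest() in seen) / len(b and b"".join(b))

data = random.Random(0).randbytes(16384)
shifted = b"\x00" + data  # one inserted byte shifts every later offset

fixed = dedup_ratio(fixed_chunks(data), fixed_chunks(shifted))  # 0.0: every block misaligned
cdc = dedup_ratio(cdc_chunks(data), cdc_chunks(shifted))        # high: cut points re-synchronize
```

Fixed-size blocks never realign after the insertion, so nothing deduplicates; the content-defined cut points depend only on nearby bytes, so the chunk streams re-synchronize after the edit and most chunks match again.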
Licensed under Apache 2.0 or MIT, at your option.