Python API Reference

Complete reference for the Hexz Python package.

Installation

pip install hexz

Or build from source:

git clone https://github.com/Alethic-Systems/hexz.git
cd hexz
make develop

Opening Snapshots

The primary way to open snapshots is using hexz.open():

open

open(path: PathLike, *, mode: str = 'r', **options: Any) -> Union[Reader, Writer]

Open a Hexz snapshot for reading or writing.

Parameters:

- path (PathLike, required): Path to .hxz file. Supports local paths, HTTP/HTTPS URLs, and S3 URIs.
- mode (str): 'r' for reading, 'w' for writing. Default: 'r'
- **options (Any): Additional options for Reader or Writer. Default: {}

Keyword Arguments (Read Mode):

- cache_size (str): Block cache size (e.g., "512M", "1G", "2GB"). Default: ~4MB
- prefetch (bool): Enable background prefetching for sequential reads. Default: True
- s3_region (str): AWS region for S3 URLs
- endpoint_url (str): Custom S3 endpoint URL (for MinIO, Ceph, etc.)
- allow_restricted (bool): Allow connections to private/internal IPs. Default: False

Keyword Arguments (Write Mode):

- compression (str): Compression algorithm ('lz4' or 'zstd')
- block_size (int): Block size in bytes
- packing (str): Packing strategy ('fast', 'tight', etc.)

Returns:

- Union[Reader, Writer]: Reader or Writer instance

Example

import hexz

# Read with default settings (cache_size=default, prefetch=True)
with hexz.open("data.hxz") as reader:
    data = reader.read(4096)
    chunk = reader.read(100, offset=0)  # random access

# Read with custom cache and prefetch disabled
with hexz.open("data.hxz", cache_size="2G", prefetch=False) as reader:
    data = reader.read(4096)

# Write a new snapshot
with hexz.open("out.hxz", mode="w", packing="tight") as writer:
    writer.add("input.img")
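
The cache_size strings above ("512M", "1G", "2GB") suggest that the trailing "B" is optional. The following is a minimal sketch of how such strings could map to byte counts. parse_size is a hypothetical helper, not part of Hexz, and the use of binary multiples (1K = 1024 bytes) is an assumption, since this reference does not say which unit system Hexz applies.

```python
import re

# Assumption: binary multiples (1K = 1024 bytes). The reference does not state
# whether Hexz interprets these suffixes as binary or decimal units.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size string such as "512M" or "2GB" to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)B?\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


two_gib = parse_size("2GB")       # same value as parse_size("2G")
half_gib = parse_size("512M")
```

Under these assumptions "2G" and "2GB" resolve to the same number of bytes, which matches how the two spellings appear side by side in the option description.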


Reading Snapshots

The Reader class is returned by hexz.open(path, mode='r') and provides methods for reading data:

Reader

High-level reader for Hexz snapshots with a Pythonic interface.

Provides a file-like interface with additional random access capabilities. Supports context managers, pickle serialization, and slice notation.

Example

import hexz

with hexz.Reader("dataset.hxz") as reader:
    data = reader.read(4096)
    # Zero-copy into buffer
    buf = bytearray(4096)
    n = reader.read(buffer=buf)
    chunk = reader.read(100, offset=1000)  # random access
    # Or slice notation
    chunk = reader[1000:1100]

size property

size: int

Total size of the snapshot in bytes.

metadata property

metadata: Metadata

File metadata (version, compression, etc.).

read

read(size: int = -1, *, offset: Optional[int] = None, buffer: Optional[Union[bytearray, memoryview]] = None) -> Union[bytes, int]

Read bytes or fill a buffer. Single method for stream and random access.

From current position (default) or at a specific offset. With a buffer, fills it and returns the number of bytes read.

Parameters:

- size (int): Number of bytes to read (-1 for all remaining). Ignored when buffer is provided (then up to len(buffer) bytes are read). Default: -1
- offset (Optional[int]): If given, read from this byte offset without moving the cursor. If None (default), read from current position and advance the cursor. Default: None
- buffer (Optional[Union[bytearray, memoryview]]): If provided, fill this writable buffer and return bytes read (int). Use when reusing one buffer in a loop; combine with offset= for random access. Default: None

Returns:

- Union[bytes, int]: If buffer is None: bytes. If buffer is provided: int (bytes read).

Example

data = reader.read(4096)
chunk = reader.read(100, offset=1000)
n = reader.read(buffer=buf)
n = reader.read(buffer=buf, offset=0)
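
The cursor rules described for read() can be modelled with a small in-memory stand-in. This is a sketch of the documented semantics only: ToyReader is a hypothetical class, not the Hexz implementation, and it assumes that a buffer read without offset= advances the cursor in the same way as a plain stream read.

```python
from typing import Optional, Union


class ToyReader:
    """In-memory stand-in that mimics the read() semantics described above."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0  # stream cursor

    def read(self, size: int = -1, *, offset: Optional[int] = None,
             buffer: Optional[Union[bytearray, memoryview]] = None) -> Union[bytes, int]:
        start = self._pos if offset is None else offset
        if buffer is not None:
            size = len(buffer)  # size is ignored when a buffer is given
        end = len(self._data) if size < 0 else min(start + size, len(self._data))
        chunk = self._data[start:end]
        if offset is None:
            self._pos = start + len(chunk)  # only stream reads move the cursor
        if buffer is not None:
            buffer[:len(chunk)] = chunk
            return len(chunk)
        return chunk

    def tell(self) -> int:
        return self._pos

    def __getitem__(self, key: slice) -> bytes:
        return self._data[key]


reader = ToyReader(bytes(range(256)) * 8)  # 2048 bytes of sample data
head = reader.read(16)                     # stream read, cursor moves to 16
peek = reader.read(4, offset=1000)         # random access, cursor stays at 16
buf = bytearray(32)
filled = reader.read(buffer=buf)           # fills buf, cursor moves to 48
```

In this model reader[1000:1004] returns the same bytes as reader.read(4, offset=1000), which is the equivalence the slice notation in the Reader example relies on.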

seek

seek(offset: int, whence: int = 0) -> int

Seek to a position in the file.

Parameters:

- offset (int, required): Offset to seek to
- whence (int): 0 (absolute), 1 (relative to the current position), 2 (from end). Default: 0

Returns:

- int: New absolute position
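
The whence values listed for seek() are the same numbers as os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END in the standard library. The sketch below uses io.BytesIO as a stand-in to show the three modes; that Reader.seek() treats negative offsets exactly as io does is an assumption.

```python
import io
import os

stream = io.BytesIO(b"0123456789")      # stand-in for a 10-byte snapshot

pos_abs = stream.seek(4, os.SEEK_SET)   # whence=0: absolute offset
pos_rel = stream.seek(3, os.SEEK_CUR)   # whence=1: relative to the current position
pos_end = stream.seek(-2, os.SEEK_END)  # whence=2: relative to the end
tail = stream.read()                    # reads from the new position to the end
```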

tell

tell() -> int

Get current position in the file.

Returns:

- int: Current byte offset

analyze

analyze() -> AnalysisReport

Analyze snapshot for deduplication statistics.

Returns:

- AnalysisReport: Report with the dedup ratio and savings information

Example

import hexz

with hexz.open("snapshot.hxz") as reader:
    report = reader.analyze()
    print(f"Dedup savings: {report.savings_percent:.1f}%")
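
To give a feel for what a savings percentage measures, here is a minimal sketch that counts duplicate fixed-size blocks in a byte string. dedup_savings_percent is a hypothetical helper; the fixed 4096-byte blocks and SHA-256 hashing are assumptions chosen for illustration, and Hexz may chunk and hash differently.

```python
import hashlib


def dedup_savings_percent(data: bytes, block_size: int = 4096) -> float:
    """Percentage of bytes that repeat an earlier fixed-size block."""
    if not data:
        return 0.0
    seen = set()
    unique_bytes = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in seen:  # first time this content appears
            seen.add(digest)
            unique_bytes += len(block)
    return 100.0 * (len(data) - unique_bytes) / len(data)


# Three identical blocks plus one distinct block: half of the bytes are duplicates.
sample = b"A" * 4096 * 3 + b"B" * 4096
savings = dedup_savings_percent(sample)
```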


Async Reading

For async/await support, use AsyncReader:

AsyncReader

Async reader for Hexz snapshots.

Use as an async context manager; the snapshot is opened when you enter the context.

Example

# Run inside a coroutine, for example one started with asyncio.run()
async with hexz.AsyncReader("dataset.hxz") as reader:
    data = await reader.read(4096)
    chunk = await reader.read(100, offset=0)

size

size() -> int

Size of the primary stream in bytes.

read async

read(size: Optional[int] = None, *, offset: Optional[int] = None) -> bytes

Read bytes. From current position (default) or at a specific offset.

Parameters:

- size (Optional[int]): Number of bytes to read (None for all remaining). Default: None
- offset (Optional[int]): If given, read from this byte offset without moving the cursor. If None (default), read from current position and advance the cursor. Default: None

Returns:

- bytes: Bytes read from the snapshot

seek async

seek(offset: int, whence: int = 0) -> int

Seek to a position. Returns new position.

tell

tell() -> int

Current read position.
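
Because offset= reads do not move the cursor, several of them can in principle be awaited together. The sketch below shows the pattern with asyncio.gather against a hypothetical in-memory ToyAsyncReader that copies the read() signature above. Whether the real AsyncReader permits overlapping reads on one instance is not stated in this reference, so treat that as an assumption.

```python
import asyncio
from typing import List, Optional, Sequence


class ToyAsyncReader:
    """In-memory stand-in with the async read() signature documented above."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    async def __aenter__(self) -> "ToyAsyncReader":
        return self

    async def __aexit__(self, *exc_info) -> None:
        return None

    async def read(self, size: Optional[int] = None, *,
                   offset: Optional[int] = None) -> bytes:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        start = self._pos if offset is None else offset
        end = len(self._data) if size is None else start + size
        chunk = self._data[start:end]
        if offset is None:
            self._pos = start + len(chunk)
        return chunk


async def fetch_ranges(data: bytes, offsets: Sequence[int], size: int) -> List[bytes]:
    async with ToyAsyncReader(data) as reader:
        # offset= reads leave the shared cursor alone, so they can be gathered.
        return await asyncio.gather(*(reader.read(size, offset=o) for o in offsets))


chunks = asyncio.run(fetch_ranges(b"abcdefghij", [0, 4, 8], 2))
```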


Writing Snapshots

The Writer class is returned by hexz.open(path, mode='w'):

Writer

High-level writer for creating Hexz snapshots with a Pythonic interface.

Provides a fluent API for building snapshots with automatic finalization via context managers.

Example

import hexz

with hexz.Writer("output.hxz", compression="zstd") as writer:
    writer.add("disk.img")
    writer.add_metadata({"created": "2026-02-09"})
    # Automatically finalized on exit

bytes_written property

bytes_written: int

Total bytes written so far.

add

add(source: Any, *, kind: Optional[str] = None) -> Writer

Add any source to the snapshot (fluent API).

Dispatches to specific methods based on source type:

- str/Path: add_file()
- bytes: add_bytes()
- numpy array: add_array()

Parameters:

- source (Any, required): Source to add (file path, bytes, array, etc.)
- kind (Optional[str]): Optional hint about source type ("disk", "memory", etc.). Default: None

Returns:

- Writer: Self for method chaining
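
The type-based dispatch that add() performs can be pictured as a chain of isinstance checks. The following is a sketch under assumptions: pick_add_method is a hypothetical function that only returns the name of the method that would presumably be chosen, and the array check is duck-typed so the example runs without NumPy.

```python
from pathlib import Path


def pick_add_method(source) -> str:
    """Return the name of the Writer method that add() would presumably call."""
    if isinstance(source, (str, Path)):
        return "add_file"
    if isinstance(source, bytes):
        return "add_bytes"
    # Duck-typed stand-in for "is a NumPy array" so NumPy is not required here.
    if hasattr(source, "dtype") and hasattr(source, "tobytes"):
        return "add_array"
    raise TypeError(f"unsupported source type: {type(source).__name__}")


class FakeArray:
    """Minimal object exposing the two attributes the check above looks for."""
    dtype = "float32"

    def tobytes(self) -> bytes:
        return b"\x00" * 4


chosen = [pick_add_method(s) for s in ("disk.img", Path("disk.img"), b"raw", FakeArray())]
```

Note that under this ordering a str is always treated as a path, so raw text would need to be encoded to bytes before being added.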

add_file

add_file(path: PathLike, *, kind: Optional[str] = None, **kwargs: Any) -> Writer

Add a file to the snapshot.

Parameters:

- path (PathLike, required): Path to file
- kind (Optional[str]): Optional kind hint ("disk", "memory", etc.). Default: None
- **kwargs (Any): Extra arguments (e.g., name), currently ignored. Default: {}

Returns:

- Writer: Self for method chaining

add_bytes

add_bytes(data: bytes, **kwargs: Any) -> Writer

Add raw bytes to the snapshot.

Parameters:

- data (bytes, required): Bytes to add
- **kwargs (Any): Extra arguments (e.g., name), currently ignored. Default: {}

Returns:

- Writer: Self for method chaining

add_array

add_array(array: Any, *, offset: Optional[int] = None, name: Optional[str] = None, **kwargs: Any) -> Writer

Add a NumPy array to the snapshot.

Parameters:

- array (Any, required): NumPy array to add
- offset (Optional[int]): Optional byte offset (currently ignored). Default: None
- name (Optional[str]): Optional name for the array (currently ignored). Default: None
- **kwargs (Any): Extra arguments, currently ignored. Default: {}

Returns:

- Writer: Self for method chaining

Note

The current implementation converts the array to bytes. Named arrays and metadata require Rust support (TODO).

add_xor_delta

add_xor_delta(data, base_offset: int, base_length: int, element_size: int = 1) -> Writer

Add XOR delta bytes against the parent snapshot.

Parameters:

- data (required): Buffer-protocol object with the new tensor bytes.
- base_offset (int, required): Byte offset of the base tensor in the parent.
- base_length (int, required): Byte length of the base tensor in the parent.
- element_size (int): Dtype width in bytes (e.g. 4 for float32). Used for byte-shuffle pre-processing that dramatically improves compression of XOR deltas. Default: 1

Returns:

- Writer: Self for method chaining.
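
The combination of an XOR delta and a byte shuffle can be sketched in pure Python. The helpers below are hypothetical and only illustrate the idea. The shuffle layout used here (all first bytes of every element, then all second bytes, and so on) is the common one from shuffle filters, and that Hexz uses exactly this layout is an assumption.

```python
import struct


def xor_bytes(new: bytes, base: bytes) -> bytes:
    """XOR two equal-length buffers byte by byte."""
    if len(new) != len(base):
        raise ValueError("buffers must have the same length")
    return bytes(a ^ b for a, b in zip(new, base))


def byte_shuffle(data: bytes, element_size: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    if len(data) % element_size:
        raise ValueError("length must be a multiple of element_size")
    return b"".join(data[i::element_size] for i in range(element_size))


def byte_unshuffle(data: bytes, element_size: int) -> bytes:
    """Invert byte_shuffle by regrouping the bytes of each element."""
    count = len(data) // element_size
    return b"".join(data[i::count] for i in range(count))


base = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # parent tensor, float32
new = struct.pack("<4f", 1.0, 2.0, 3.0, 4.5)   # slightly changed tensor
delta = byte_shuffle(xor_bytes(new, base), element_size=4)

# Reading reverses the steps: unshuffle, then XOR with the base again.
restored = xor_bytes(byte_unshuffle(delta, element_size=4), base)
```

Unchanged elements XOR to zero, and the shuffle places bytes of equal significance next to each other, which tends to produce long runs that a general-purpose compressor handles well.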

add_xor_delta_from_buffers

add_xor_delta_from_buffers(data, base_data, element_size: int = 1) -> Writer

Add XOR delta from two explicit buffers (no parent file read).

Used when the parent's tensor is itself stored as an XOR delta, so the caller must reconstruct the actual parent bytes first.

Parameters:

- data (required): Buffer with the new tensor bytes.
- base_data (required): Buffer with the reconstructed parent tensor bytes.
- element_size (int): Dtype width in bytes (e.g. 4 for float32). Default: 1

Returns:

- Writer: Self for method chaining.
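
The reason the caller has to reconstruct the parent bytes first can be seen with three versions of a small tensor. This is an illustrative sketch with made-up byte values; xor_bytes is a hypothetical helper and no Hexz calls are involved.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length buffers byte by byte."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


v0 = bytes([10, 20, 30, 40])  # stored in full in the oldest snapshot
v1 = bytes([10, 21, 30, 40])  # parent version
v2 = bytes([11, 21, 30, 44])  # new version to be written

delta_v1 = xor_bytes(v1, v0)  # what the parent snapshot actually stores

# The parent holds only delta_v1, so v1 must be rebuilt before the next delta.
rebuilt_v1 = xor_bytes(delta_v1, v0)
delta_v2 = xor_bytes(v2, rebuilt_v1)
```

In terms of the method above, v2 plays the role of data and rebuilt_v1 the role of base_data.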

add_metadata

add_metadata(metadata: Dict[str, Any]) -> Writer

Add custom metadata to the snapshot.

Parameters:

- metadata (Dict[str, Any], required): Dictionary of metadata

Returns:

- Writer: Self for method chaining

write

write(data: bytes, *, offset: Optional[int] = None) -> int

Write bytes.

Parameters:

- data (bytes, required): Bytes to write
- offset (Optional[int]): Optional byte offset (currently ignored). Default: None

Returns:

- int: Number of bytes written

tell

tell() -> int

Get current write position.

merge_overlay

merge_overlay(*, base: PathLike, overlay: PathLike, thin: bool = False) -> Writer

Merge a copy-on-write overlay with a base snapshot.

Parameters:

- base (PathLike, required): Path to the base .hxz snapshot
- overlay (PathLike, required): Path to the overlay data file
- thin (bool): If True, create a thin snapshot that references the base. Default: False

Returns:

- Writer: Self for method chaining

Example

import hexz

with hexz.Writer("merged.hxz") as writer:
    writer.merge_overlay(base="base.hxz", overlay="overlay.img")

# Thin snapshot (references base for unmodified blocks)
with hexz.Writer("thin.hxz") as writer:
    writer.merge_overlay(base="base.hxz", overlay="overlay.img", thin=True)
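
The difference between a full merge and a thin snapshot can be pictured at block level. The sketch below is a toy model under assumptions: the overlay is represented as a mapping from block index to replacement block, and merge_full and read_thin are hypothetical helpers. The actual overlay file format is not described in this reference.

```python
from typing import Dict, List


def merge_full(base_blocks: List[bytes], overlay: Dict[int, bytes]) -> List[bytes]:
    """Materialize every block, taking modified ones from the overlay."""
    return [overlay.get(index, block) for index, block in enumerate(base_blocks)]


def read_thin(index: int, base_blocks: List[bytes], thin_blocks: Dict[int, bytes]) -> bytes:
    """A thin snapshot stores only modified blocks and falls back to the base."""
    return thin_blocks.get(index, base_blocks[index])


base_blocks = [b"aaaa", b"bbbb", b"cccc"]
overlay = {1: b"BBBB"}  # only block 1 was modified

merged = merge_full(base_blocks, overlay)
thin_view = [read_thin(i, base_blocks, overlay) for i in range(len(base_blocks))]
```

Both views return the same data. In this model the thin variant stores one block instead of three, at the cost of needing the base snapshot at read time.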

finalize

finalize() -> None

Finalize the snapshot and write all metadata.

This must be called to complete snapshot creation. It:

- Writes the master index
- Updates the header
- Flushes all buffers
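
The automatic finalization mentioned for the Writer context manager follows the usual __enter__/__exit__ pattern. Below is a minimal sketch with a hypothetical ToyWriter. Skipping finalize() when the block raises is an assumption made for this example, since the behaviour of Hexz on errors is not documented here.

```python
class ToyWriter:
    """Stand-in that shows finalize-on-exit and fluent chaining."""

    def __init__(self) -> None:
        self.chunks = []
        self.finalized = False

    def add_bytes(self, data: bytes) -> "ToyWriter":
        self.chunks.append(data)
        return self  # returning self enables method chaining

    def finalize(self) -> None:
        self.finalized = True

    def __enter__(self) -> "ToyWriter":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Assumption: only finalize when the block finished without an error.
        if exc_type is None:
            self.finalize()
        return False  # never swallow exceptions


with ToyWriter() as writer:
    writer.add_bytes(b"header").add_bytes(b"payload")
```

When a writer is used without a with block, the same effect requires an explicit finalize() call after the last add.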


Building Snapshots

build

build(source: PathLike, output: PathLike, *, profile: BuildProfile = 'generic', **overrides: Any) -> Metadata

Build a snapshot using a preset profile.

This is a convenience function that combines Writer configuration and common build patterns.

Parameters:

- source (PathLike, required): Source file, directory, or data
- output (PathLike, required): Output .hxz file path
- profile (BuildProfile): Build profile to use. Default: 'generic'
- **overrides (Any): Override any profile settings. Default: {}

Returns:

- Metadata: Metadata object with snapshot information

Example

import hexz

# ML dataset with defaults
meta = hexz.build("imagenet/", "imagenet.hxz", profile="ml")
print(f"Compressed to {meta.size_compressed / 1e9:.1f} GB")

# Archival with encryption
meta = hexz.build(
    "backup/",
    "backup.hxz",
    profile="archival",
    encrypt=True,
    password="secret",
)

PROFILES module-attribute

PROFILES: Dict[BuildProfile, Dict[str, Any]] = {
    'ml': {'mode': 'fast', 'block_size': 128 * 1024, 'dedup': True, 'compression': 'lz4'},
    'eda': {'mode': 'balanced', 'block_size': 64 * 1024, 'dedup': True, 'compression': 'lz4'},
    'embedded': {'mode': 'tight', 'block_size': 32 * 1024, 'dedup': True, 'compression': 'zstd'},
    'generic': {'mode': 'balanced', 'block_size': 64 * 1024, 'dedup': True, 'compression': 'lz4'},
    'archival': {'mode': 'tight', 'block_size': 256 * 1024, 'dedup': True, 'compression': 'zstd'},
}
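
One plausible way for profile settings and **overrides to combine is a dictionary merge in which overrides win. The sketch below copies the PROFILES values listed above. resolve_settings is a hypothetical helper, and the merge order is an assumption about how build() behaves.

```python
from typing import Any, Dict

# Values copied from the PROFILES attribute documented above.
PROFILES: Dict[str, Dict[str, Any]] = {
    "ml": {"mode": "fast", "block_size": 128 * 1024, "dedup": True, "compression": "lz4"},
    "eda": {"mode": "balanced", "block_size": 64 * 1024, "dedup": True, "compression": "lz4"},
    "embedded": {"mode": "tight", "block_size": 32 * 1024, "dedup": True, "compression": "zstd"},
    "generic": {"mode": "balanced", "block_size": 64 * 1024, "dedup": True, "compression": "lz4"},
    "archival": {"mode": "tight", "block_size": 256 * 1024, "dedup": True, "compression": "zstd"},
}


def resolve_settings(profile: str = "generic", **overrides: Any) -> Dict[str, Any]:
    """Return a new settings dict: profile defaults, then caller overrides."""
    try:
        base = PROFILES[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None
    return {**base, **overrides}  # later keys win, PROFILES itself is untouched


ml_settings = resolve_settings("ml")
custom = resolve_settings("archival", compression="lz4", block_size=64 * 1024)
```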

Inspection & Verification

inspect

inspect(path: PathLike) -> Metadata

Inspect a Hexz snapshot and return structured metadata.

Parameters:

- path (PathLike, required): Path to .hxz file

Returns:

- Metadata: Metadata object with snapshot information

Example

import hexz

meta = hexz.inspect("snapshot.hxz")
print(f"Version: {meta.version}")
print(f"Compression: {meta.compression}")
print(f"Size: {meta.primary_size:,} bytes")
print(meta)   # Human-readable output
meta.print()  # Same as above

verify

verify(path: PathLike, *, checksum: bool = True, structure: bool = True, public_key: Optional[PathLike] = None) -> bool

Verify snapshot integrity and optionally signature.

Performs structural validation, checksum verification, and optional cryptographic signature verification.

Parameters:

- path (PathLike, required): Path to snapshot to verify
- checksum (bool): Verify block checksums by reading entire file. Default: True
- structure (bool): Verify file structure (header and index). Default: True
- public_key (Optional[PathLike]): Optional path to public key for signature verification. Default: None

Returns:

- bool: True if all checks pass, False otherwise

Example

import hexz

# Basic integrity check
valid = hexz.verify("snapshot.hxz")

# With signature verification
valid = hexz.verify("snapshot.hxz", public_key="key.pub")
if not valid:
    print("Snapshot verification failed!")
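
To illustrate why checksum verification has to read the entire file, here is a minimal sketch of per-block checksum checking. block_checksums and verify_blocks are hypothetical helpers, and CRC32 from zlib is used purely for illustration; the checksum algorithm Hexz uses is not stated in this reference.

```python
import zlib
from typing import List, Sequence


def block_checksums(data: bytes, block_size: int) -> List[int]:
    """Compute one CRC32 per fixed-size block."""
    return [zlib.crc32(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def verify_blocks(data: bytes, block_size: int, expected: Sequence[int]) -> bool:
    """True only if every block matches its stored checksum."""
    return block_checksums(data, block_size) == list(expected)


payload = bytes(range(256)) * 16          # 4096 bytes of sample data
stored = block_checksums(payload, 1024)   # checksums recorded at write time

corrupted = bytearray(payload)
corrupted[2000] ^= 0xFF                   # flip one byte inside the second block

intact_ok = verify_blocks(payload, 1024, stored)
corrupt_ok = verify_blocks(bytes(corrupted), 1024, stored)
```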