# Python API Reference
Complete reference for the Hexz Python package.
## Installation

```bash
pip install hexz
```

Or build from source:

```bash
git clone https://github.com/Alethic-Systems/hexz.git
cd hexz
make develop
```
## Opening Snapshots

The primary way to open snapshots is using `hexz.open()`:

### open

```python
open(path: PathLike, *, mode: str = 'r', **options: Any) -> Union[Reader, Writer]
```

Open a Hexz snapshot for reading or writing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `PathLike` | Path to a `.hxz` file. Supports local paths, HTTP/HTTPS URLs, and S3 URIs. | required |
| `mode` | `str` | `'r'` for reading, `'w'` for writing. | `'r'` |
| `**options` | `Any` | Additional options for `Reader` or `Writer`. | `{}` |
Keyword Arguments (Read Mode):

- `cache_size` (`str`): Block cache size (e.g., `"512M"`, `"1G"`, `"2GB"`). Default: ~4 MB.
- `prefetch` (`bool`): Enable background prefetching for sequential reads. Default: `True`.
- `s3_region` (`str`): AWS region for S3 URLs.
- `endpoint_url` (`str`): Custom S3 endpoint URL (for MinIO, Ceph, etc.).
- `allow_restricted` (`bool`): Allow connections to private/internal IPs. Default: `False`.

Keyword Arguments (Write Mode):

- `compression` (`str`): Compression algorithm (`'lz4'` or `'zstd'`).
- `block_size` (`int`): Block size in bytes.
- `packing` (`str`): Packing strategy (`'fast'`, `'tight'`, etc.).
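Size strings such as `"512M"` or `"2GB"` follow the usual binary-unit shorthand. As a rough illustration of that convention (this is not the library's actual parser, and `parse_size` is a hypothetical helper), such a string maps to a byte count like this:

```python
# Illustration only: a hypothetical parser for size strings like "512M",
# "1G", or "2GB"; binary (1024-based) units assumed. Not part of the hexz API.
def parse_size(spec: str) -> int:
    spec = spec.strip().upper().removesuffix("B")
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    if spec and spec[-1] in units:
        return int(spec[:-1]) * units[spec[-1]]
    return int(spec)  # plain byte count, no unit suffix
```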
Returns:

| Type | Description |
|---|---|
| `Union[Reader, Writer]` | `Reader` or `Writer` instance |
Example

```python
# Read with default settings (cache_size=default, prefetch=True)
with hexz.open("data.hxz") as reader:
    data = reader.read(4096)
    chunk = reader.read(100, offset=0)  # random access

# Read with custom cache and prefetch disabled
with hexz.open("data.hxz", cache_size="2G", prefetch=False) as reader:
    data = reader.read(4096)

# Write a new snapshot
with hexz.open("out.hxz", mode="w", packing="tight") as writer:
    writer.add("input.img")
```
## Reading Snapshots

The `Reader` class is returned by `hexz.open(path, mode='r')` and provides methods for reading data:

### Reader

High-level reader for Hexz snapshots with a pythonic interface.

Provides a file-like interface with additional random-access capabilities. Supports context managers, pickle serialization, and slice notation.

Example

```python
with hexz.Reader("dataset.hxz") as reader:
    data = reader.read(4096)
    # Zero-copy into buffer
    buf = bytearray(4096)
    n = reader.read(buffer=buf)
    chunk = reader.read(100, offset=1000)  # random access
    # Or slice notation
    chunk = reader[1000:1100]
```
#### size

*property*

```python
size: int
```

Total size of the snapshot in bytes.

#### metadata

*property*

```python
metadata: Metadata
```

File metadata (version, compression, etc.).
#### read

```python
read(size: int = -1, *, offset: Optional[int] = None, buffer: Optional[Union[bytearray, memoryview]] = None) -> Union[bytes, int]
```

Read bytes or fill a buffer. A single method for both stream and random access.

Reads from the current position (default) or at a specific offset. When a buffer is given, fills it and returns the number of bytes read.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `size` | `int` | Number of bytes to read (`-1` for all remaining). Ignored when `buffer` is provided (then up to `len(buffer)` bytes are read). | `-1` |
| `offset` | `Optional[int]` | If given, read from this byte offset without moving the cursor. If `None` (default), read from the current position and advance the cursor. | `None` |
| `buffer` | `Optional[Union[bytearray, memoryview]]` | If provided, fill this writable buffer and return the number of bytes read (`int`). Use when reusing one buffer in a loop; combine with `offset=` for random access. | `None` |
Returns:

| Type | Description |
|---|---|
| `Union[bytes, int]` | If `buffer` is `None`: `bytes`. If `buffer` is provided: `int` (bytes read). |
Example

```python
data = reader.read(4096)
chunk = reader.read(100, offset=1000)
n = reader.read(buffer=buf)
n = reader.read(buffer=buf, offset=0)
```
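The `buffer=` form mirrors the standard `readinto` pattern: one preallocated buffer is reused across iterations instead of allocating a fresh `bytes` object per read. A sketch of the same loop shape using `io.BytesIO` as a stand-in for a `Reader`:

```python
import io

src = io.BytesIO(bytes(range(256)) * 16)  # 4 KiB stand-in for a snapshot
buf = bytearray(1024)                     # one buffer, reused every iteration
total = 0
while True:
    n = src.readinto(buf)                 # analogous to reader.read(buffer=buf)
    if n == 0:
        break                             # end of stream
    total += n                            # process buf[:n] here
```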
#### seek

```python
seek(offset: int, whence: int = 0) -> int
```

Seek to a position in the file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `offset` | `int` | Offset to seek to. | required |
| `whence` | `int` | `0` (absolute), `1` (relative), `2` (from end). | `0` |
Returns:

| Type | Description |
|---|---|
| `int` | New absolute position |
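The `whence` values follow standard Python file semantics (`os.SEEK_SET`, `os.SEEK_CUR`, `os.SEEK_END`), shown here with `io.BytesIO`, which behaves the same way:

```python
import io
import os

f = io.BytesIO(b"0123456789")       # 10-byte stand-in for a snapshot
abs_pos = f.seek(4)                 # whence=0 (default): absolute offset
rel_pos = f.seek(2, os.SEEK_CUR)    # whence=1: relative to current position
end_pos = f.seek(-3, os.SEEK_END)   # whence=2: relative to end of file
```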
#### tell

```python
tell() -> int
```

Get the current position in the file.

Returns:

| Type | Description |
|---|---|
| `int` | Current byte offset |
#### analyze

```python
analyze() -> AnalysisReport
```

Analyze the snapshot for deduplication statistics.

Returns:

| Type | Description |
|---|---|
| `AnalysisReport` | `AnalysisReport` with dedup ratio and savings information |

Example

```python
with hexz.open("snapshot.hxz") as reader:
    report = reader.analyze()
    print(f"Dedup savings: {report.savings_percent:.1f}%")
```
## Async Reading

For async/await support, use `AsyncReader`:

### AsyncReader

Async reader for Hexz snapshots.

Use as an async context manager; the snapshot is opened when you enter the context.

Example

```python
async with hexz.AsyncReader("dataset.hxz") as reader:
    data = await reader.read(4096)
    chunk = await reader.read(100, offset=0)
```
#### size

```python
size() -> int
```

Size of the primary stream in bytes.

#### read

*async*

```python
read(size: Optional[int] = None, *, offset: Optional[int] = None) -> bytes
```

Read bytes from the current position (default) or at a specific offset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `size` | `Optional[int]` | Number of bytes to read (`None` for all remaining). | `None` |
| `offset` | `Optional[int]` | If given, read from this byte offset without moving the cursor. If `None` (default), read from the current position and advance the cursor. | `None` |
Returns:

| Type | Description |
|---|---|
| `bytes` | Bytes read from the snapshot |

#### seek

*async*

```python
seek(offset: int, whence: int = 0) -> int
```

Seek to a position. Returns the new position.

#### tell

```python
tell() -> int
```

Current read position.
## Writing Snapshots

The `Writer` class is returned by `hexz.open(path, mode='w')`:

### Writer

High-level writer for creating Hexz snapshots with a pythonic interface.

Provides a fluent API for building snapshots, with automatic finalization via context managers.

Example

```python
with hexz.Writer("output.hxz", compression="zstd") as writer:
    writer.add("disk.img")
    writer.add_metadata({"created": "2026-02-09"})
    # Automatically finalized on exit
```
#### bytes_written

*property*

```python
bytes_written: int
```

Total bytes written so far.
#### add

```python
add(source: Any, *, kind: Optional[str] = None) -> Writer
```

Add any source to the snapshot (fluent API).

Dispatches to a specific method based on the source type:

- `str`/`Path`: `add_file()`
- `bytes`: `add_bytes()`
- numpy array: `add_array()`
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `Any` | Source to add (file path, bytes, array, etc.). | required |
| `kind` | `Optional[str]` | Optional hint about the source type (`"disk"`, `"memory"`, etc.). | `None` |
Returns:

| Type | Description |
|---|---|
| `Writer` | Self for method chaining |
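The type-based dispatch described above can be pictured in plain Python. This is a simplified model, not the package's implementation, and the returned method names are just labels for which `add_*` method would handle each source:

```python
from pathlib import Path

# Simplified model of Writer.add()'s dispatch logic (illustration only).
def dispatch(source) -> str:
    if isinstance(source, (str, Path)):
        return "add_file"
    if isinstance(source, (bytes, bytearray)):
        return "add_bytes"
    if hasattr(source, "__array_interface__"):  # duck-typed numpy check
        return "add_array"
    raise TypeError(f"unsupported source type: {type(source).__name__}")
```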
#### add_file

```python
add_file(path: PathLike, *, kind: Optional[str] = None, **kwargs: Any) -> Writer
```

Add a file to the snapshot.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `PathLike` | Path to the file. | required |
| `kind` | `Optional[str]` | Optional kind hint (`"disk"`, `"memory"`, etc.). | `None` |
| `**kwargs` | `Any` | Extra arguments (e.g., `name`); currently ignored. | `{}` |
Returns:

| Type | Description |
|---|---|
| `Writer` | Self for method chaining |
#### add_bytes

```python
add_bytes(data: bytes, **kwargs: Any) -> Writer
```

Add raw bytes to the snapshot.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `bytes` | Bytes to add. | required |
| `**kwargs` | `Any` | Extra arguments (e.g., `name`); currently ignored. | `{}` |
Returns:

| Type | Description |
|---|---|
| `Writer` | Self for method chaining |
#### add_array

```python
add_array(array: Any, *, offset: Optional[int] = None, name: Optional[str] = None, **kwargs: Any) -> Writer
```

Add a NumPy array to the snapshot.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `array` | `Any` | NumPy array to add. | required |
| `offset` | `Optional[int]` | Optional byte offset (currently ignored). | `None` |
| `name` | `Optional[str]` | Optional name for the array (currently ignored). | `None` |
| `**kwargs` | `Any` | Extra arguments; currently ignored. | `{}` |
|
Returns:
| Type | Description |
|---|---|
Writer
|
Self for method chaining |
Note
Current implementation converts array to bytes. Named arrays and metadata require Rust support (TODO).
#### add_xor_delta

```python
add_xor_delta(data, base_offset: int, base_length: int, element_size: int = 1) -> Writer
```

Add XOR delta bytes against the parent snapshot.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | | Buffer-protocol object with the new tensor bytes. | required |
| `base_offset` | `int` | Byte offset of the base tensor in the parent. | required |
| `base_length` | `int` | Byte length of the base tensor in the parent. | required |
| `element_size` | `int` | Dtype width in bytes (e.g. 4 for `float32`). Used for byte-shuffle pre-processing that dramatically improves compression of XOR deltas. | `1` |
Returns:

| Type | Description |
|---|---|
| `Writer` | Self for method chaining. |
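The effect of `element_size` is easiest to see in pure Python. XOR-ing nearly identical tensors yields mostly zero bytes; byte-shuffling the delta then regroups the few changed bytes together, leaving long zero runs that generic compressors handle far better. A conceptual sketch (not the library's Rust implementation):

```python
# Conceptual sketch of XOR delta + byte shuffle (illustration only).
def xor_delta(new: bytes, base: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(new, base))

def byte_shuffle(data: bytes, element_size: int) -> bytes:
    # Regroup byte 0 of every element, then byte 1 of every element, etc.
    n = len(data) // element_size
    return bytes(
        data[i * element_size + lane]
        for lane in range(element_size)
        for i in range(n)
    )

base = b"\x01\x02\x03\x04" * 4   # pretend 4-byte-wide tensor (element_size=4)
new = b"\x01\x02\x03\x05" * 4    # only the last byte of each element changed
delta = xor_delta(new, base)     # mostly zeros
shuffled = byte_shuffle(delta, 4)  # zeros collected into one long run
```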
#### add_xor_delta_from_buffers

```python
add_xor_delta_from_buffers(data, base_data, element_size: int = 1) -> Writer
```

Add an XOR delta from two explicit buffers (no parent file read).

Used when the parent's tensor is itself stored as an XOR delta, so the caller must reconstruct the actual parent bytes first.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | | Buffer with the new tensor bytes. | required |
| `base_data` | | Buffer with the reconstructed parent tensor bytes. | required |
| `element_size` | `int` | Dtype width in bytes (e.g. 4 for `float32`). | `1` |
Returns:

| Type | Description |
|---|---|
| `Writer` | Self for method chaining. |
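Because XOR is its own inverse, reconstructing a parent that is itself stored as a delta is just another XOR against its base. A hedged sketch of the reconstruct-then-diff pattern (`xor` here is a local helper, not hexz API):

```python
# Sketch of the reconstruct-then-diff pattern for chained XOR deltas.
def xor(a: bytes, b: bytes) -> bytes:  # local helper, not hexz API
    return bytes(x ^ y for x, y in zip(a, b))

grandparent = b"\x10\x20\x30\x40"
parent_delta = b"\x00\x00\x00\x01"      # parent stored as delta vs grandparent
parent = xor(grandparent, parent_delta)  # reconstruct actual parent bytes
new = b"\x10\x20\x31\x41"
delta = xor(new, parent)                 # what would be handed to the writer
```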
#### add_metadata

```python
add_metadata(metadata: Dict[str, Any]) -> Writer
```

Add custom metadata to the snapshot.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `metadata` | `Dict[str, Any]` | Dictionary of metadata. | required |
Returns:

| Type | Description |
|---|---|
| `Writer` | Self for method chaining |
#### write

```python
write(data: bytes, *, offset: Optional[int] = None) -> int
```

Write bytes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `bytes` | Bytes to write. | required |
| `offset` | `Optional[int]` | Optional byte offset (currently ignored). | `None` |
Returns:

| Type | Description |
|---|---|
| `int` | Number of bytes written |

#### tell

```python
tell() -> int
```

Get the current write position.
#### merge_overlay

```python
merge_overlay(*, base: PathLike, overlay: PathLike, thin: bool = False) -> Writer
```

Merge a copy-on-write overlay with a base snapshot.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `base` | `PathLike` | Path to the base `.hxz` snapshot. | required |
| `overlay` | `PathLike` | Path to the overlay data file. | required |
| `thin` | `bool` | If `True`, create a thin snapshot that references the base. | `False` |
|
Returns:
| Type | Description |
|---|---|
Writer
|
Self for method chaining |
Example

```python
with hexz.Writer("merged.hxz") as writer:
    writer.merge_overlay(base="base.hxz", overlay="overlay.img")

# Thin snapshot (references base for unmodified blocks)
with hexz.Writer("thin.hxz") as writer:
    writer.merge_overlay(base="base.hxz", overlay="overlay.img", thin=True)
```
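Conceptually, a non-thin merge takes each block from the overlay where one was written and from the base otherwise (with `thin=True`, unmodified blocks become references instead of copies). A toy block-level model of the non-thin case (illustration only; hexz operates on `.hxz` snapshots, not dicts):

```python
# Toy model of a copy-on-write merge at block granularity (illustration only).
base = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}   # block index -> block bytes
overlay = {1: b"bbbb"}                         # only block 1 was written

# Prefer the overlay's version of a block; fall back to the base.
merged = {idx: overlay.get(idx, data) for idx, data in base.items()}
```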
#### finalize

```python
finalize() -> None
```

Finalize the snapshot and write all metadata.

This must be called to complete snapshot creation. It:

- Writes the master index
- Updates the header
- Flushes all buffers
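The automatic finalization that `Writer`'s context manager provides follows the standard `__exit__` pattern. A minimal sketch of the mechanism (not the real `Writer` class):

```python
# Minimal sketch of context-managed finalization (not the real hexz Writer).
class SketchWriter:
    def __init__(self):
        self.finalized = False

    def finalize(self):
        # Real Writer: writes master index, updates header, flushes buffers.
        self.finalized = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.finalize()   # runs on normal exit and on exceptions alike
        return False      # do not swallow exceptions

with SketchWriter() as w:
    pass                  # writer.add(...) calls would go here
```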
## Building Snapshots

### build

```python
build(source: PathLike, output: PathLike, *, profile: BuildProfile = 'generic', **overrides: Any) -> Metadata
```

Build a snapshot using a preset profile.

This is a convenience function that combines `Writer` configuration with common build patterns.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `PathLike` | Source file, directory, or data. | required |
| `output` | `PathLike` | Output `.hxz` file path. | required |
| `profile` | `BuildProfile` | Build profile to use. | `'generic'` |
| `**overrides` | `Any` | Override any profile settings. | `{}` |
Returns:

| Type | Description |
|---|---|
| `Metadata` | `Metadata` object with snapshot information |
Example

```python
# ML dataset with defaults
meta = hexz.build("imagenet/", "imagenet.hxz", profile="ml")
print(f"Compressed to {meta.size_compressed / 1e9:.1f} GB")

# Archival with encryption
meta = hexz.build(
    "backup/",
    "backup.hxz",
    profile="archival",
    encrypt=True,
    password="secret",
)
```
### PROFILES

*module-attribute*

```python
PROFILES: Dict[BuildProfile, Dict[str, Any]] = {
    'ml':       {'mode': 'fast',     'block_size': 128 * 1024, 'dedup': True, 'compression': 'lz4'},
    'eda':      {'mode': 'balanced', 'block_size': 64 * 1024,  'dedup': True, 'compression': 'lz4'},
    'embedded': {'mode': 'tight',    'block_size': 32 * 1024,  'dedup': True, 'compression': 'zstd'},
    'generic':  {'mode': 'balanced', 'block_size': 64 * 1024,  'dedup': True, 'compression': 'lz4'},
    'archival': {'mode': 'tight',    'block_size': 256 * 1024, 'dedup': True, 'compression': 'zstd'},
}
```
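Overrides layer on top of the chosen profile's settings: any keyword you pass replaces the corresponding profile default. The resolution can be pictured as a plain dict merge (the profile values below are copied from `PROFILES` above; `resolve` itself is an illustration of the behavior, not `build()`'s source):

```python
# Profile defaults copied from PROFILES above; resolve() illustrates how
# build(..., profile="ml", compression="zstd") settles its settings.
PROFILES = {
    "ml": {"mode": "fast", "block_size": 128 * 1024, "dedup": True, "compression": "lz4"},
    "archival": {"mode": "tight", "block_size": 256 * 1024, "dedup": True, "compression": "zstd"},
}

def resolve(profile: str, **overrides):
    return {**PROFILES[profile], **overrides}  # overrides win on key collision

settings = resolve("ml", compression="zstd")
```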
## Inspection & Verification

### inspect

```python
inspect(path: PathLike) -> Metadata
```

Inspect a Hexz snapshot and return structured metadata.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `PathLike` | Path to a `.hxz` file. | required |
Returns:

| Type | Description |
|---|---|
| `Metadata` | `Metadata` object with snapshot information |
Example

```python
meta = hexz.inspect("snapshot.hxz")
print(f"Version: {meta.version}")
print(f"Compression: {meta.compression}")
print(f"Size: {meta.primary_size:,} bytes")
print(meta)   # Human-readable output
meta.print()  # Same as above
```
### verify

```python
verify(path: PathLike, *, checksum: bool = True, structure: bool = True, public_key: Optional[PathLike] = None) -> bool
```

Verify snapshot integrity and, optionally, its signature.

Performs structural validation, checksum verification, and optional cryptographic signature verification.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `PathLike` | Path to the snapshot to verify. | required |
| `checksum` | `bool` | Verify block checksums by reading the entire file. | `True` |
| `structure` | `bool` | Verify the file structure (header and index). | `True` |
| `public_key` | `Optional[PathLike]` | Optional path to a public key for signature verification. | `None` |
Returns:

| Type | Description |
|---|---|
| `bool` | `True` if all checks pass, `False` otherwise |
Example

```python
# Basic integrity check
valid = hexz.verify("snapshot.hxz")

# With signature verification
valid = hexz.verify("snapshot.hxz", public_key="key.pub")
if not valid:
    print("Snapshot verification failed!")
```