peirene

2 releases

0.1.1	Nov 7, 2025
0.1.0	Nov 7, 2025

#7 in #unidirectional

Custom license

135KB
652 lines

This crate provides a production-ready implementation of Luby Transform (LT) codes with secure verification—a rateless erasure coding framework applicable across distributed storage, reliable networking, and decentralized systems.

LT codes are rateless fountain codes that enable encoding arbitrary binary data into an unlimited stream of coded packets, any $k + O(\log k)$ of which can recover the original data. The SeF (Secure Fountain) extension adds cryptographic verification to detect and reject corrupted or maliciously-formed packets, making the scheme robust against adversarial environments.

Why LT/SeF?

LT/SeF codes excel in scenarios requiring:

Distributed Storage & Archival: Redundantly encode files across multiple unreliable nodes; recover data from any sufficient subset, tolerating node failures, packet loss, and Byzantine corruption
Reliable Multicast & Broadcasting: Broadcast large files to heterogeneous receivers over lossy channels; clients recover via any combination of received packets (no retransmission needed)
Peer-to-Peer Networks: Encode content for DHT/IPFS-like systems; peers reconstruct from any superset of received fragments
Wireless & Satellite Networks: Rateless transmission to mobile or intermittently-available nodes; adapts automatically to varying loss rates
Blockchain & Distributed Ledgers: Reduce full-node storage (100x–1000x compression) while enabling efficient bootstrap
Disaster Recovery & Censorship Resistance: Survive substantial data loss or targeted deletion; reconstruct from partial, scattered, or adversarial sources

Why LT/SeF Codes?

Challenge	Traditional Solution	LT/SeF Advantage
Storage Overhead	Replication (3x+)	Coded redundancy ($k/s$ ratio, tunable)
Network Efficiency	ARQ (retransmit on loss)	Rateless (stream codewords until success)
Computation	Reed-Solomon (O(k²))	XOR-based (O(k log k))
Adversarial Environments	Trust-all or trust-none	Cryptographic verification per packet
Adaptive Environments	Static rates	Unlimited codeword generation

This Crate

This Rust implementation provides:

Robust Soliton Distribution – Optimal degree distribution balancing recovery probability and encoding efficiency
Error-Resilient Peeling Decoder – Detects and rejects corrupted packets via configurable verification
Precomputed Distributions – $O(\log k)$ sampling via binary search on CDF
Parallel Encoding – Optional Rayon-based parallelization for large epochs
Parallel Decoding - WIP
Flexible Verification – SHA-256 by default; custom hash functions supported
Comprehensive Error Handling – Rich error types for debugging and resilience

Usage

Basic Encode-Decode

use peirene::{Block, SeFEncoder, SeFDecoder, SolitonParams};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create blocks
    let k = 100; // Epoch size
    let blocks: Vec<Block> = (0..k)
        .map(|i| Block::new(format!("Block {}", i).into_bytes(), None))
        .collect();

    // 2. Configure fountain codes
    let params = SolitonParams::new(k, 0.1, 0.1); // c=0.1, δ=0.1
    let encoder = SeFEncoder::new(params);

    // 3. Encode: generate droplets
    // Note: s must satisfy s >= k + sqrt(k)*ln²(k/δ)
    let s = 10000; // ~100x storage saving
    let droplets = encoder.encode(&blocks, s)?;
    println!("Generated {} droplets from {} blocks", droplets.len(), k);

    // 4. Extract header chain for verification
    let header_chain: Vec<_> = blocks
        .iter()
        .map(|b| (b.header_hash.clone(), b.merkle_root.clone()))
        .collect();

    // 5. Decode: recover original blocks
    let decoder = SeFDecoder::new(k, header_chain);
    let recovered = decoder.decode(droplets)?;
    println!("Successfully recovered {} blocks", recovered.len());

    // 6. Verify
    for (orig, recov) in blocks.iter().zip(recovered.iter()) {
        assert_eq!(orig.data, recov.data);
    }
    println!("✓ All blocks verified");

    Ok(())
}

Custom Merkle Implementation

use sef_codes::{Block, SeFEncoder};

fn custom_hash(data: &[u8]) -> Vec<u8> {
    // Use Blake3 or any other hash function
    use blake3;
    blake3::hash(data).as_bytes().to_vec()
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let blocks: Vec<Block> = (0..100)
        .map(|i| {
            let data = format!("Block {}", i).into_bytes();
            // Pass custom hash function
            Block::new(data, Some(custom_hash))
        })
        .collect();

    // Rest of encoding/decoding...
    Ok(())
}

Parallel Encoding

use sef_codes::{SeFEncoder, SolitonParams};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let k = 10000;
    let blocks = vec![/* ... */];
    let s = 100000; // Large scale

    let params = SolitonParams::new(k, 0.1, 0.1);
    let encoder = SeFEncoder::new(params);

    // Parallel encoding (if feature enabled)
    let droplets = encoder.encode_parallel(&blocks, s)?;
    println!("Parallel-encoded {} droplets", droplets.len());

    Ok(())
}

Handling Adversarial Droplets

use sef_codes::{Block, SeFEncoder, SeFDecoder, SolitonParams};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let k = 100;
    let blocks: Vec<Block> = (0..k)
        .map(|i| Block::new(format!("Block {}", i).into_bytes(), None))
        .collect();

    let params = SolitonParams::new(k, 0.1, 0.1);
    let encoder = SeFEncoder::new(params);
    let mut droplets = encoder.encode(&blocks, 1000)?;

    // Simulate adversary corrupting some droplets
    for i in 0..50 {
        droplets[i].data = vec![0xFF; droplets[i].data.len()];
    }

    let header_chain: Vec<_> = blocks
        .iter()
        .map(|b| (b.header_hash.clone(), b.merkle_root.clone()))
        .collect();

    let decoder = SeFDecoder::new(k, header_chain);
    let recovered = decoder.decode(droplets)?;

    // Decoder automatically rejected murky droplets via Merkle verification
    println!("✓ Decoded successfully despite {} corrupted droplets", 50);
    assert_eq!(recovered.len(), k);

    Ok(())
}

How many droplets do I need? (how to set $s$)

The bound equation

To recover all $k$ original packets with high (tunable) probability (roughly $1 - \delta$), you need:

$$ m \approx k + \sqrt{k} \cdot \ln^2(k/\delta) $$

where:

$k$: Number of original packets (epoch size)
$m$: Minimum packets to receive/store for reliable recovery
$\delta$: Acceptable failure probability (e.g., 0.1 = 10% chance of failure)
$\ln^2(k/\delta)$: Natural logarithm, squared

What This Means in Plain English

For small $k$ (e.g., $k=100$):

The $\sqrt{k} \cdot \ln^2(k/\delta)$ term dominates
You need 3–5x more packets than original data
Example: 100 original packets → ~400–500 encoded packets minimum

For large $k$ (e.g., $k=10,000$):

The overhead term grows much slower than $k$
You need ~1.3–1.5x more packets
Example: 10,000 original packets → ~13,000–15,000 encoded packets minimum

Intuition: Small datasets need high redundancy; large datasets achieve near-optimal compression.

How to Estimate $s$ in Your Code

use std::f64;

fn estimate_min_buckets(k: usize, delta: f64) -> usize {
    let overhead = (k as f64).sqrt() * (k as f64 / delta).ln().powi(2);
    (k as f64 + overhead).ceil() as usize
}

// Examples:
let m1 = estimate_min_buckets(100, 0.1);     // ~380
let m2 = estimate_min_buckets(500, 0.1);     // ~2,122
let m3 = estimate_min_buckets(10000, 0.1);   // ~35,000

Practical Guidelines

Scenario	$k$	$\delta$	Min $s$	Overhead	Use Case
Small/test	10	0.1	~60	6x	Experimentation
Medium	100	0.1	~380	3.8x	Distributed storage
Large	1,000	0.1	~4,000	4x	Production systems
Very large	10,000	0.1	~35,000	3.5x	High-scale deployment
High reliability	1,000	0.01	~8,000	8x	Mission-critical

Key Insight

The overhead is NOT "slightly more than $k$" as textbooks or papers suggest. For practical sizes:

$k < 1,000$: Expect 3–6x overhead (must generate 3–6x more packets)
This is the price you pay for ratelessness and adversarial resilience
Compression savings come from storing a fraction of packets per node, not total network overhead

In Your Code

The encoder automatically validates this bound:

let params = SolitonParams::new(k, 0.1, 0.1);
let encoder = SeFEncoder::new(params);

// This will error if s is too small:
let droplets = encoder.encode(&blocks, s)?;
// Err: InsufficientDroplets { provided: 1000, required: 4000 }

References

"SeF: A Secure Fountain Architecture for Slashing Storage Costs in Blockchains"
Swanand Kadhe, Jichan Chung, and Kannan Ramchandran (UC Berkeley)
[arXiv:1906.12140v1 cs.CR](https://arxiv.org/abs/1906.12140v1)

LT Codes, M. Luby The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.

Dependencies

~1–1.4MB
~30K SLoC