2 releases
| 0.1.1 | Nov 7, 2025 |
|---|---|
| 0.1.0 | Nov 7, 2025 |
#7 in #unidirectional
135KB
652 lines
This crate provides a production-ready implementation of Luby Transform (LT) codes with secure verification—a rateless erasure coding framework applicable across distributed storage, reliable networking, and decentralized systems.
LT codes are rateless fountain codes that enable encoding arbitrary binary data into an unlimited stream of coded packets, any $k + O(\log k)$ of which can recover the original data. The SeF (Secure Fountain) extension adds cryptographic verification to detect and reject corrupted or maliciously-formed packets, making the scheme robust against adversarial environments.
Why LT/SeF?
LT/SeF codes excel in scenarios requiring:
- Distributed Storage & Archival: Redundantly encode files across multiple unreliable nodes; recover data from any sufficient subset, tolerating node failures, packet loss, and Byzantine corruption
- Reliable Multicast & Broadcasting: Broadcast large files to heterogeneous receivers over lossy channels; clients recover via any combination of received packets (no retransmission needed)
- Peer-to-Peer Networks: Encode content for DHT/IPFS-like systems; peers reconstruct from any superset of received fragments
- Wireless & Satellite Networks: Rateless transmission to mobile or intermittently-available nodes; adapts automatically to varying loss rates
- Blockchain & Distributed Ledgers: Reduce full-node storage (100x–1000x compression) while enabling efficient bootstrap
- Disaster Recovery & Censorship Resistance: Survive substantial data loss or targeted deletion; reconstruct from partial, scattered, or adversarial sources
Why LT/SeF Codes?
| Challenge | Traditional Solution | LT/SeF Advantage |
|---|---|---|
| Storage Overhead | Replication (3x+) | Coded redundancy ($k/s$ ratio, tunable) |
| Network Efficiency | ARQ (retransmit on loss) | Rateless (stream codewords until success) |
| Computation | Reed-Solomon (O(k²)) | XOR-based (O(k log k)) |
| Adversarial Environments | Trust-all or trust-none | Cryptographic verification per packet |
| Adaptive Environments | Static rates | Unlimited codeword generation |
This Crate
This Rust implementation provides:
- Robust Soliton Distribution – Optimal degree distribution balancing recovery probability and encoding efficiency
- Error-Resilient Peeling Decoder – Detects and rejects corrupted packets via configurable verification
- Precomputed Distributions – $O(\log k)$ sampling via binary search on CDF
- Parallel Encoding – Optional Rayon-based parallelization for large epochs
- Parallel Decoding - WIP
- Flexible Verification – SHA-256 by default; custom hash functions supported
- Comprehensive Error Handling – Rich error types for debugging and resilience
Usage
Basic Encode-Decode
use peirene::{Block, SeFEncoder, SeFDecoder, SolitonParams};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// 1. Create blocks
let k = 100; // Epoch size
let blocks: Vec<Block> = (0..k)
.map(|i| Block::new(format!("Block {}", i).into_bytes(), None))
.collect();
// 2. Configure fountain codes
let params = SolitonParams::new(k, 0.1, 0.1); // c=0.1, δ=0.1
let encoder = SeFEncoder::new(params);
// 3. Encode: generate droplets
// Note: s must satisfy s >= k + sqrt(k)*ln²(k/δ)
let s = 10000; // ~100x storage saving
let droplets = encoder.encode(&blocks, s)?;
println!("Generated {} droplets from {} blocks", droplets.len(), k);
// 4. Extract header chain for verification
let header_chain: Vec<_> = blocks
.iter()
.map(|b| (b.header_hash.clone(), b.merkle_root.clone()))
.collect();
// 5. Decode: recover original blocks
let decoder = SeFDecoder::new(k, header_chain);
let recovered = decoder.decode(droplets)?;
println!("Successfully recovered {} blocks", recovered.len());
// 6. Verify
for (orig, recov) in blocks.iter().zip(recovered.iter()) {
assert_eq!(orig.data, recov.data);
}
println!("✓ All blocks verified");
Ok(())
}
Custom Merkle Implementation
use sef_codes::{Block, SeFEncoder};
fn custom_hash(data: &[u8]) -> Vec<u8> {
// Use Blake3 or any other hash function
use blake3;
blake3::hash(data).as_bytes().to_vec()
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let blocks: Vec<Block> = (0..100)
.map(|i| {
let data = format!("Block {}", i).into_bytes();
// Pass custom hash function
Block::new(data, Some(custom_hash))
})
.collect();
// Rest of encoding/decoding...
Ok(())
}
Parallel Encoding
use sef_codes::{SeFEncoder, SolitonParams};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let k = 10000;
let blocks = vec![/* ... */];
let s = 100000; // Large scale
let params = SolitonParams::new(k, 0.1, 0.1);
let encoder = SeFEncoder::new(params);
// Parallel encoding (if feature enabled)
let droplets = encoder.encode_parallel(&blocks, s)?;
println!("Parallel-encoded {} droplets", droplets.len());
Ok(())
}
Handling Adversarial Droplets
use sef_codes::{Block, SeFEncoder, SeFDecoder, SolitonParams};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let k = 100;
let blocks: Vec<Block> = (0..k)
.map(|i| Block::new(format!("Block {}", i).into_bytes(), None))
.collect();
let params = SolitonParams::new(k, 0.1, 0.1);
let encoder = SeFEncoder::new(params);
let mut droplets = encoder.encode(&blocks, 1000)?;
// Simulate adversary corrupting some droplets
for i in 0..50 {
droplets[i].data = vec![0xFF; droplets[i].data.len()];
}
let header_chain: Vec<_> = blocks
.iter()
.map(|b| (b.header_hash.clone(), b.merkle_root.clone()))
.collect();
let decoder = SeFDecoder::new(k, header_chain);
let recovered = decoder.decode(droplets)?;
// Decoder automatically rejected murky droplets via Merkle verification
println!("✓ Decoded successfully despite {} corrupted droplets", 50);
assert_eq!(recovered.len(), k);
Ok(())
}
How many droplets do I need? (how to set $s$)
The bound equation
To recover all $k$ original packets with high (tunable) probability (roughly $1 - \delta$), you need:
$$ m \approx k + \sqrt{k} \cdot \ln^2(k/\delta) $$
where:
- $k$: Number of original packets (epoch size)
- $m$: Minimum packets to receive/store for reliable recovery
- $\delta$: Acceptable failure probability (e.g., 0.1 = 10% chance of failure)
- $\ln^2(k/\delta)$: Natural logarithm, squared
What This Means in Plain English
For small $k$ (e.g., $k=100$):
- The $\sqrt{k} \cdot \ln^2(k/\delta)$ term dominates
- You need 3–5x more packets than original data
- Example: 100 original packets → ~400–500 encoded packets minimum
For large $k$ (e.g., $k=10,000$):
- The overhead term grows much slower than $k$
- You need ~1.3–1.5x more packets
- Example: 10,000 original packets → ~13,000–15,000 encoded packets minimum
Intuition: Small datasets need high redundancy; large datasets achieve near-optimal compression.
How to Estimate $s$ in Your Code
use std::f64;
fn estimate_min_buckets(k: usize, delta: f64) -> usize {
let overhead = (k as f64).sqrt() * (k as f64 / delta).ln().powi(2);
(k as f64 + overhead).ceil() as usize
}
// Examples:
let m1 = estimate_min_buckets(100, 0.1); // ~380
let m2 = estimate_min_buckets(500, 0.1); // ~2,122
let m3 = estimate_min_buckets(10000, 0.1); // ~35,000
Practical Guidelines
| Scenario | $k$ | $\delta$ | Min $s$ | Overhead | Use Case |
|---|---|---|---|---|---|
| Small/test | 10 | 0.1 | ~60 | 6x | Experimentation |
| Medium | 100 | 0.1 | ~380 | 3.8x | Distributed storage |
| Large | 1,000 | 0.1 | ~4,000 | 4x | Production systems |
| Very large | 10,000 | 0.1 | ~35,000 | 3.5x | High-scale deployment |
| High reliability | 1,000 | 0.01 | ~8,000 | 8x | Mission-critical |
Key Insight
The overhead is NOT "slightly more than $k$" as textbooks or papers suggest. For practical sizes:
- $k < 1,000$: Expect 3–6x overhead (must generate 3–6x more packets)
- This is the price you pay for ratelessness and adversarial resilience
- Compression savings come from storing a fraction of packets per node, not total network overhead
In Your Code
The encoder automatically validates this bound:
let params = SolitonParams::new(k, 0.1, 0.1);
let encoder = SeFEncoder::new(params);
// This will error if s is too small:
let droplets = encoder.encode(&blocks, s)?;
// Err: InsufficientDroplets { provided: 1000, required: 4000 }
References
"SeF: A Secure Fountain Architecture for Slashing Storage Costs in Blockchains"
Swanand Kadhe, Jichan Chung, and Kannan Ramchandran (UC Berkeley)
[arXiv:1906.12140v1 cs.CR](https://arxiv.org/abs/1906.12140v1)
LT Codes, M. Luby The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.
Dependencies
~1–1.4MB
~30K SLoC
