# parallel_bzip2_decoder

A high-performance, parallel bzip2 decoder for Rust.

This crate provides a `Bz2Decoder` that implements `std::io::Read`, allowing you to decompress bzip2 files in parallel using multiple CPU cores. It is designed to work efficiently with both single-stream (standard) and multi-stream (e.g., `pbzip2`) bzip2 files by scanning for block boundaries and decompressing the blocks concurrently.
## Features

- **Parallel decompression**: Utilizes `rayon` to decompress blocks in parallel.
- **Standard API**: Implements `std::io::Read` for easy integration.
- **Memory mapped**: Efficiently handles large files using memory mapping.
- **Flexible**: Supports opening files directly or working with in-memory buffers (via `Arc`).
- **Full bzip2 format support**: Handles both single-stream and multi-stream bzip2 files.
- **Error handling**: Comprehensive error reporting with `anyhow` integration.
- **Memory efficient**: Bounded channels and buffer reuse minimize memory usage.
## Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
parallel_bzip2_decoder = "0.1"
```
### Decompressing a File

The easiest way to use `parallel_bzip2_decoder` is `Bz2Decoder::open`, which handles memory mapping internally:

```rust
use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;

fn main() -> anyhow::Result<()> {
    let mut decoder = Bz2Decoder::open("input.bz2")?;
    let mut buffer = Vec::new();
    decoder.read_to_end(&mut buffer)?;
    println!("Decompressed {} bytes", buffer.len());
    Ok(())
}
```
### Decompressing from Memory

If you already have the data in memory (e.g., an `Arc<[u8]>` or `Arc<Mmap>`), you can use `Bz2Decoder::new`:

```rust
use parallel_bzip2_decoder::Bz2Decoder;
use std::io::Read;
use std::sync::Arc;

fn main() -> anyhow::Result<()> {
    let data: Vec<u8> = vec![/* ... bzip2 data ... */];
    let data_arc = Arc::new(data);
    let mut decoder = Bz2Decoder::new(data_arc);
    let mut buffer = Vec::new();
    decoder.read_to_end(&mut buffer)?;
    Ok(())
}
```
## Performance

Decompression throughput scales with the number of available CPU cores, making `parallel_bzip2_decoder` significantly faster than standard single-threaded decoders for large files.
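As a quick way to gauge the potential speedup on a given machine, the standard library can report the available core count directly:

```rust
use std::thread;

fn main() {
    // available_parallelism() reports how many threads the OS suggests
    // running concurrently, typically the logical core count, which
    // bounds the speedup a parallel decoder can achieve.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!("Up to {cores} blocks can be decompressed at once");
}
```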
## Benchmarking and Profiling

This crate includes comprehensive benchmarks and profiling tools:

- **Decode benchmarks**: Test decompression with various file sizes (1 MB, 10 MB, 50 MB)
- **Scanner benchmarks**: Measure block scanning performance
- **End-to-end benchmarks**: Test the full decompression pipeline
- **CPU profiling**: Generate flamegraphs to identify performance bottlenecks
- **Memory profiling**: Track memory usage and detect leaks
### Running Benchmarks

```sh
# Run all benchmarks
cargo bench

# Run a specific benchmark suite
cargo bench --bench decode_benchmark
cargo bench --bench scanner_benchmark
cargo bench --bench e2e_benchmark
```
### Profiling

```sh
# CPU profiling with flamegraphs
cd ../scripts
./profile_cpu.sh

# Memory profiling with valgrind
./profile_memory.sh
```

For detailed instructions, see `BENCHMARKING.md`.
## API Stability
This crate follows semantic versioning. Breaking changes will only occur with major version updates.
## License
MIT
## Contributing

See the main repository's `CONTRIBUTING.md` for details on how to contribute.
## Changelog

See `CHANGELOG.md` for a history of changes (when available).