Fast copy-forward compression for message threads.
Detects repeated substrings across messages and replaces them with references to earlier occurrences, reducing storage and bandwidth.
Copy-forward compression is particularly effective for:
- Chat logs and message threads with quoted replies
- Document version histories with incremental changes
- Any sequence of texts with repeated phrases or patterns
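To make the idea concrete, here is a minimal, deliberately naive sketch of copy-forward compression using only the standard library. It does not call this crate: the `Piece` type and the `compress` and `render` functions are hypothetical names invented for illustration, and the quadratic search is far slower than the hashed algorithms described on this page. It works on `char`s to sidestep UTF-8 boundary handling.

```rust
/// Hypothetical segment type for this sketch (not the crate's `Segment`).
#[derive(Debug, Clone, PartialEq)]
enum Piece {
    /// Text stored verbatim.
    Literal(String),
    /// Copy of `len` chars starting at `start` in earlier message `message`.
    Reference { message: usize, start: usize, len: usize },
}

/// Greedy, brute-force compressor: at each position, take the longest match
/// found in any earlier message if it is at least `min_len` chars long.
fn compress(messages: &[&str], min_len: usize) -> Vec<Vec<Piece>> {
    let texts: Vec<Vec<char>> = messages.iter().map(|m| m.chars().collect()).collect();
    let mut out = Vec::new();
    for (i, cur) in texts.iter().enumerate() {
        let mut pieces = Vec::new();
        let mut literal = String::new();
        let mut pos = 0;
        while pos < cur.len() {
            // Best (message, start, len) match among earlier messages.
            let mut best = (0, 0, 0);
            for (j, prev) in texts[..i].iter().enumerate() {
                for start in 0..prev.len() {
                    let len = cur[pos..]
                        .iter()
                        .zip(&prev[start..])
                        .take_while(|(a, b)| a == b)
                        .count();
                    if len > best.2 {
                        best = (j, start, len);
                    }
                }
            }
            // `max(1)` guards against a zero-length "match" stalling the loop.
            if best.2 >= min_len.max(1) {
                if !literal.is_empty() {
                    pieces.push(Piece::Literal(std::mem::take(&mut literal)));
                }
                pieces.push(Piece::Reference { message: best.0, start: best.1, len: best.2 });
                pos += best.2;
            } else {
                literal.push(cur[pos]);
                pos += 1;
            }
        }
        if !literal.is_empty() {
            pieces.push(Piece::Literal(literal));
        }
        out.push(pieces);
    }
    out
}

/// Rebuilds the original messages by resolving references in order.
fn render(compressed: &[Vec<Piece>]) -> Vec<String> {
    let mut rendered: Vec<Vec<char>> = Vec::new();
    for pieces in compressed {
        let mut text: Vec<char> = Vec::new();
        for piece in pieces {
            match piece {
                Piece::Literal(s) => text.extend(s.chars()),
                Piece::Reference { message, start, len } => {
                    text.extend_from_slice(&rendered[*message][*start..*start + *len]);
                }
            }
        }
        rendered.push(text);
    }
    rendered.iter().map(|t| t.iter().collect()).collect()
}
```

With the two Quick Start messages and `min_len = 4`, the second message becomes one reference to the whole first message followed by the literal `", how are you?"`, and `render` returns the original texts.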
§Quick Start
§Rust
use copyforward::{exact, Config, CopyForward};
let messages = &["Hello world", "Hello world, how are you?"];
let compressed = exact(messages, Config::default());
// Render back to original text
let original = compressed.render_with(|_, _, _, text| text.to_string());
assert_eq!(original, messages);
§Python
import copyforward
messages = ["Hello world", "Hello world, how are you?"]
# Exact mode (default) - perfect compression
cf = copyforward.CopyForward.from_texts(messages)
print(cf.compression_ratio())
§Algorithm Selection
Choose between two optimized algorithms:
- exact(): Perfect compression using binary search extension (O(n log m) time)
  - Best for: <1MB total text, when perfect compression is needed
  - Finds optimal substring matches, never misses opportunities
- approximate(): Fast compression with capped extension (~2x faster)
  - Best for: >1MB text, when speed matters more than perfect compression
  - May split long references into multiple shorter ones
  - Still achieves excellent compression ratios (typically 50-90% size reduction)
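The effect of capping match extension can be illustrated with a small standalone sketch. It is not the crate's implementation: `extend_match` and `cover_with_references` are hypothetical helpers, and the extension here is a plain linear scan rather than the binary search used by `exact()`. It only shows why a cap turns one long reference into several shorter ones.

```rust
/// Length of the common prefix of `a` and `b`, optionally capped at `cap`.
/// (Linear scan for clarity; an assumption of this sketch, not the crate's method.)
fn extend_match(a: &[u8], b: &[u8], cap: Option<usize>) -> usize {
    let limit = cap.unwrap_or(usize::MAX);
    a.iter()
        .zip(b)
        .take(limit)
        .take_while(|(x, y)| x == y)
        .count()
}

/// Lengths of the consecutive references needed to cover the shared prefix
/// of `a` and `b` when each single extension is limited by `cap`.
fn cover_with_references(a: &[u8], b: &[u8], cap: Option<usize>) -> Vec<usize> {
    let mut lens = Vec::new();
    let mut pos = 0;
    loop {
        let len = extend_match(&a[pos..], &b[pos..], cap);
        if len == 0 {
            break;
        }
        lens.push(len);
        pos += len;
    }
    lens
}
```

For the 11-byte shared prefix of `"Hello world"` and `"Hello world, how are you?"`, an uncapped extension yields a single reference of length 11, while a cap of 4 yields three references of lengths 4, 4, and 3. The same bytes are covered either way; the capped output simply carries more reference overhead.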
Re-exports§
pub use crate::core::Config;
pub use crate::core::CopyForward;
pub use crate::core::CopyForwardTokens;
pub use crate::core::Segment;
pub use crate::core::TokenSegment;
Modules§
- core
- fixture
- hashing
- Shared hashing utilities for polynomial rolling hashes used by hashed algorithms.
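For readers unfamiliar with the technique, a polynomial rolling hash can be sketched as follows. This is a generic textbook version, not the contents of the `hashing` module: the constants `BASE` and `MODULUS` and the function names are assumptions chosen for illustration.

```rust
// Illustrative constants; the crate's actual parameters are not documented here.
const BASE: u64 = 257;
const MODULUS: u64 = 1_000_000_007;

/// Polynomial hash of one window, computed from scratch.
fn hash_window(window: &[u8]) -> u64 {
    window
        .iter()
        .fold(0, |h, &b| (h * BASE + b as u64) % MODULUS)
}

/// Hashes of every length-`k` window of `data`, each derived from the
/// previous one in constant time.
fn rolling_hashes(data: &[u8], k: usize) -> Vec<u64> {
    if k == 0 || data.len() < k {
        return Vec::new();
    }
    // BASE^(k-1) mod MODULUS: the weight of the byte leaving the window.
    let mut high = 1u64;
    for _ in 1..k {
        high = high * BASE % MODULUS;
    }
    let mut h = hash_window(&data[..k]);
    let mut out = vec![h];
    for i in k..data.len() {
        // Remove the outgoing byte, shift, then add the incoming byte.
        let outgoing = data[i - k] as u64 * high % MODULUS;
        h = (h + MODULUS - outgoing) % MODULUS;
        h = (h * BASE + data[i] as u64) % MODULUS;
        out.push(h);
    }
    out
}
```

Equal windows always produce equal hashes, so candidate repeated substrings can be found by looking up window hashes in a table. Unequal windows can collide, so a match found this way still needs a byte-by-byte check.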
- tokenization
Structs§
- Approximate
- Text-mode wrapper for approximate algorithm routing through the token core.
- Exact
- Text-mode wrapper for exact algorithm routing through the token core.
Traits§
- MessageLike
- Trait for types that can be used as message inputs, supporting both regular strings and None values.
- TokenLike
- Trait for types that can be used as token inputs, supporting both regular tokens and None values.
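One way such a trait could be shaped is sketched below. The trait `MaybeText`, its method `as_text`, and the helper `total_len` are hypothetical names for this illustration; the actual definitions of `MessageLike` and `TokenLike` may differ.

```rust
/// Hypothetical stand-in for this sketch; the crate's `MessageLike` may differ.
trait MaybeText {
    /// Returns the text, or `None` for a missing message.
    fn as_text(&self) -> Option<&str>;
}

impl MaybeText for &str {
    fn as_text(&self) -> Option<&str> {
        Some(*self)
    }
}

impl MaybeText for String {
    fn as_text(&self) -> Option<&str> {
        Some(self.as_str())
    }
}

/// An optional message is present only if the inner value is.
impl<T: MaybeText> MaybeText for Option<T> {
    fn as_text(&self) -> Option<&str> {
        self.as_ref().and_then(|t| t.as_text())
    }
}

/// Generic consumer: sums the byte lengths of all present messages.
fn total_len<M: MaybeText>(messages: &[M]) -> usize {
    messages
        .iter()
        .filter_map(|m| m.as_text())
        .map(str::len)
        .sum()
}
```

With this shape, one generic function accepts both `&["ab", "cde"]` and `&[Some("ab"), None, Some("cde")]`, skipping the missing entry in the second case.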
Functions§
- approximate
- Create an approximate copy-forward compressor.
- approximate_tokens
- Create an approximate token-mode compressor over u32 token sequences.
- exact
- Create an exact copy-forward compressor.
- exact_tokens
- Create an exact token-mode compressor over u32 token sequences.
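Token mode operates on `u32` IDs rather than text. As a rough illustration of how such input might be prepared, the sketch below maps whitespace-separated words to IDs in first-seen order. The `tokenize` function is a hypothetical helper written for this example and is not the crate's `tokenization` module; real tokenizers are usually more elaborate.

```rust
use std::collections::HashMap;

/// Assigns each distinct whitespace-separated word a `u32` ID in first-seen
/// order. Returns the token sequences and the vocabulary (ID -> word).
fn tokenize(messages: &[&str]) -> (Vec<Vec<u32>>, Vec<String>) {
    let mut ids: HashMap<String, u32> = HashMap::new();
    let mut vocab: Vec<String> = Vec::new();
    let mut out = Vec::new();
    for msg in messages {
        let mut toks = Vec::new();
        for word in msg.split_whitespace() {
            let id = match ids.get(word) {
                Some(&id) => id,
                None => {
                    // New word: its ID is its position in the vocabulary.
                    let id = vocab.len() as u32;
                    ids.insert(word.to_string(), id);
                    vocab.push(word.to_string());
                    id
                }
            };
            toks.push(id);
        }
        out.push(toks);
    }
    (out, vocab)
}
```

For `["hello world", "hello world again"]` this produces `[[0, 1], [0, 1, 2]]`. The second sequence begins with the whole of the first, which is the kind of repetition a token-mode compressor can replace with a reference.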
Type Aliases§
- ApproximateTokens
- Approximate copy-forward compression with capped extension.
- ExactTokens
- Exact copy-forward compression for token sequences (u32 IDs).