Root-Scoped Provider Summaries using Golomb Coded Sets (GCS) for efficient DHT lookups and cache management in P2P networks.
In decentralized networks like IPFS, BitTorrent, and other DHT-based P2P systems, finding content is expensive. Traditional approaches require:
- Broadcasting provider records for every piece of content to the entire DHT network, consuming massive bandwidth
- Flooding the network with discovery requests when searching for related content
- Storing individual provider records for millions of Content IDs (CIDs), overwhelming DHT nodes with storage and lookup overhead
Saorsa RSPS solves this by introducing hierarchical content organization with ultra-compact summaries:
When you store a large dataset (like a website, software repository, or media collection) in a P2P network, each file chunk gets its own CID. A typical website might have thousands of CIDs, a software repository tens of thousands. Publishing provider records for each CID individually to the DHT:
- Creates millions of DHT messages for large content
- Overwhelms DHT nodes with storage requirements
- Makes content discovery slow due to network-wide searches
- Wastes bandwidth with redundant provider advertisements
Instead of advertising individual CIDs, RSPS lets you:
- Group related content under a single "root CID" (like a directory, repository, or collection)
- Create a compact summary using Golomb Coded Sets that represents thousands of CIDs in just a few KB
- Advertise only the summary to the DHT, reducing messages by 1000x or more
- Enable fast batch discovery - one lookup tells you if ANY of thousands of CIDs might be available
- Content Distribution: Efficiently advertise that you host an entire website/app without flooding the DHT
- Software Repositories: Let peers discover if you have specific versions/packages without individual lookups
- Media Collections: Advertise entire albums, movie series, or dataset collections as single summaries
- Version Control: Organize git-like repositories with hierarchical content discovery
- Caching Networks: Smart cache admission - only cache content that's part of advertised collections
- 20-30% more compact than Bloom filters for the same false positive rate
- 1000x reduction in DHT provider record messages for large content collections
- Sub-millisecond membership testing for thousands of CIDs
- Minimal memory overhead - entire summaries fit in L1 cache
- Network-efficient serialization - summaries transport in single UDP packets
- Golomb Coded Sets: Space-efficient Content ID (CID) summaries with configurable false positive rates
- Root-anchored Cache: Cache admission policies anchored to root CIDs for hierarchical data organization
- TTL Management: Sophisticated time-to-live management with hit tracking and witness receipts
- VRF Pseudonyms: Verifiable Random Function pseudonyms for witness receipt systems
- Async/Await Support: Full async/await support with Tokio
- High Performance: Optimized for P2P DHT operations with minimal memory overhead
Add this to your Cargo.toml:
[dependencies]
saorsa-rsps = "0.1.0"use saorsa_rsps::{Rsps, RspsConfig, Cid};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let root_cid = [1u8; 32];
let cids = vec![
[2u8; 32],
[3u8; 32],
[4u8; 32],
];
let config = RspsConfig::default();
// Create RSPS for a root with associated CIDs
let rsps = Rsps::new(root_cid, 1, &cids, &config)?;
// Check if a CID might be under this root
let test_cid = [2u8; 32];
if rsps.contains(&test_cid) {
println!("CID might be under this root");
}
// Get digest for DHT advertisement
let digest = rsps.digest();
println!("RSPS digest: {:?}", digest);
Ok(())
}Efficient probabilistic data structure for representing sets with configurable false positive rates. Optimized for P2P networks where bandwidth and storage efficiency are critical.
Root-anchored cache policies that organize data hierarchically under root CIDs, with sophisticated TTL management based on hit frequency and witness receipts.
Advanced time-to-live management with:
- Base TTL for new entries
- TTL extension per cache hit
- TTL extension per witness receipt
- Temporal bucketing for receipt aggregation
VRF-based witness receipts for distributed verification and reputation systems in P2P networks.
RSPS (Root-Scoped Provider Summaries) organize content hierarchically under root CIDs, enabling efficient DHT lookups for complex data structures. Each RSPS contains:
- Root CID: The anchor point for hierarchical organization
- Epoch: Version/time identifier for cache invalidation
- GCS: Space-efficient summary of CIDs under this root
- Salt: Deterministic salt for GCS construction
- Metadata: Creation timestamp and configuration
This crate is designed for use in DHT-based or gossip-based networks where providers advertise summaries of content clustered under a root CID. A typical flow:
-
Producer builds RSPS
- Select a
root_cidand gather the set of child CIDs. - Build an
Rspswith an epoch representing the dataset version. - Serialize or extract the
digest()for lightweight advertisement in the DHT.
- Select a
-
Advertisement
- Publish either the RSPS bytes or its
digestkeyed byroot_cid+epochin your routing layer. - Peers cache the announcement, possibly with TTL heuristics matching their local policy.
- Publish either the RSPS bytes or its
-
Discovery
- A client looking for a
cidfirst fetches the RSPS for the relevantroot_cid/epoch. - Use
rsps.contains(&cid)to check probabilistic membership. - On positive result, proceed to fetch the content from providers under that root.
- A client looking for a
-
Cache integration
- Register an
RspswithRootAnchoredCacheto gate cache admission: only items in the RSPS are eligible. - Use the
TtlEngineto manage lifetimes based on hits and witness receipts.
- Register an
-
Epoch rotation
- When the dataset changes, increment
epochand republish a new RSPS. - Consumers prefer the highest known epoch and drop stale ones per policy.
- When the dataset changes, increment
RspsConfig.target_fprsets the target false positive rate. This crate uses Golomb–Rice coding internally, selecting ap = 2^kconsistent with the requested FPR.- Trade-offs:
- Lower FPR → larger RSPS, more CPU to encode/decode, less cache pollution.
- Higher FPR → smaller RSPS, faster, but occasional extra fetches.
Golomb coding represents non-negative integers as a quotient (unary-coded) and a remainder (binary-coded) with respect to a parameter p. When p is a power of two (p = 2^k), the scheme is called Golomb–Rice coding. We choose Rice coding for RSPS because:
- Simpler and faster decode: the remainder is exactly
k = log2(p)bits; no truncated-binary logic is needed, which keeps parsing branch-light and cache-friendly. - Deterministic footprint vs. FPR: we pick
k = ceil(log2(1 / target_fpr))so the parameter ties directly to the requested false-positive rate. - Good practical compression for hashed, near-uniform deltas: the sorted hashed values modulo
n * pproduce geometric-like deltas that Rice handles efficiently.
In this crate, we enforce Rice coding by deriving p as a power of two and validating p during decode. This provides predictable encode/decode behavior and avoids edge cases that arise with general Golomb parameters.
GolombCodedSet::to_bytes/from_bytesserialize the GCS; an RSPS can be reconstructed from its components across nodes.- For transport, include:
root_cid,epoch,salt, andgcs.to_bytes().
- Nodes can issue witness receipts for successful retrievals under a root to extend TTLs and inform reputation systems.
- Uses production-ready ed25519-dalek v2 for signatures and RFC 9381 ECVRF on ristretto255 for VRF pseudonyms.
- Implements domain separation to prevent cross-protocol attacks.
- Ed25519 signatures: Uses
ed25519-dalekv2.x, a battle-tested, misuse-resistant implementation - VRF pseudonyms: RFC 9381 compliant ECVRF on ristretto255 via
vrf-r255crate - Domain separation: All cryptographic operations use distinct domain prefixes:
- VRF inputs:
b"saorsa-rsps:vrf:v1:" - Witness signatures:
b"saorsa-rsps:witness:v1:"
- VRF inputs:
- Key hygiene: Secret keys are automatically zeroized on drop
- Separate key domains: Ed25519 and VRF keys are kept completely separate
- No panics: All cryptographic operations return
Resulttypes with proper error handling - Strict validation: Input validation for all key sizes and proof lengths
- Memory safety: Built with Rust 2024 edition,
#![forbid(unsafe_code)]in crypto module - Audit trail: Uses well-audited cryptographic libraries with extensive test coverage
- GCS uses Golomb–Rice coding (power-of-two parameter) to ensure decode correctness and performance
- VRF keys are separate from Ed25519 signature keys (different ciphersuites)
- Domain separation prevents attacks where signatures/proofs from one context are replayed in another
- All cryptographic operations are deterministic for the same inputs
Saorsa RSPS is optimized for:
- Memory Efficiency: GCS provides space-efficient CID summaries
- Network Efficiency: Minimal bandwidth for DHT advertisements
- Lookup Speed: Fast probabilistic membership testing
- Cache Effectiveness: Smart TTL management with hit tracking
- Built with Rust 2024 edition for memory safety
- No
unwrap()orexpect()in production code - Comprehensive error handling with
thiserror - Security audit workflow with
cargo-audit - Cryptographically secure random number generation
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the AGPL-3.0 license. See LICENSE for details.
- saorsa-fec - Patent-free erasure coding
- saorsa-core - P2P networking foundation