2 releases
Uses new Rust 2024
| new 0.1.1 | Feb 22, 2026 |
|---|---|
| 0.1.0 | Feb 21, 2026 |
#654 in Database interfaces
616 downloads per month
Used in 10 crates
(3 directly)
4MB
81K
SLoC
fsqlite-mvcc
Multi-version concurrency control (MVCC) for fsqlite, enabling multiple concurrent writers with page-level versioning, serializable snapshot isolation (SSI), and cross-process coordination.
Overview
fsqlite-mvcc is the concurrency control layer that sits atop the pager, WAL, and B-tree crates. It implements page-level MVCC so that multiple transactions can read and write concurrently without global locking. Conflicts are detected at commit time using SSI validation and a First-Committer-Wins protocol. When possible, conflicting writes are resolved automatically through a deterministic merge ladder (intent replay, structured page patches, XOR deltas) instead of aborting.
The crate also provides the infrastructure for BEGIN CONCURRENT semantics, epoch-based reclamation (EBR) for safe garbage collection of old page versions, shared-memory coordination for cross-process operation, two-phase commit for multi-database transactions, time-travel snapshots, and differential privacy primitives.
Position in the dependency graph:
fsqlite-vfs
|
fsqlite-pager
/ \
fsqlite-wal fsqlite-btree
\ /
fsqlite-mvcc <-- you are here
|
fsqlite-core
Key Types
Transaction Lifecycle
TransactionManager-- Orchestrates begin/read/write/commit/abort for both serialized and concurrent modes.BeginKind-- Transaction modes:Deferred,Immediate,Exclusive,Concurrent.CommitResponse-- Result of the commit sequencer.Savepoint-- B-tree-level page state snapshot within a transaction.MvccError-- MVCC-specific error variants (busy, conflict, stale snapshot).
Core Runtime (Version Store)
VersionArena/VersionIdx-- Chunked arena allocator for page versions with generation-based ABA protection.VersionStore/ChainHeadTable-- Lock-free version chain storage. Each page has a chain of versions; readers walk the chain to find the version visible to their snapshot.InProcessPageLockTable-- Sharded page-level lock table for write conflict detection.CommitLog/CommitIndex/CommitRecord-- Append-only commit log mapping commit sequence numbers to transaction metadata.Transaction/TransactionState/TransactionMode-- Per-transaction runtime state.TxnManager/SerializedWriteMutex-- Transaction slot management and serialized-writer coordination.
BEGIN CONCURRENT
ConcurrentRegistry/ConcurrentHandle-- Registration and management of concurrent writer sessions.concurrent_commit_with_ssi-- Commit path for concurrent transactions with full SSI validation.validate_first_committer_wins-- First-Committer-Wins conflict check.ConcurrentSavepoint-- Savepoint support within concurrent transactions.
SSI Validation
SsiState-- Per-transaction SSI bookkeeping (read set, write set, dependency edges).ssi_validate_and_publish-- Core SSI validation: discovers rw-dependency edges and checks for dangerous structures.discover_incoming_edges/discover_outgoing_edges-- Dependency graph edge discovery.SsiAbortReason-- Why a transaction was aborted (rw-conflict, write-skew, etc.).
SSI Abort Policy
SsiEvidenceLedger/SsiDecisionCard-- Evidence-based abort policy using historical conflict data.select_victim-- Victim selection for multi-transaction conflicts using cost matrices.ConformalCalibrator-- Online calibration of abort thresholds using conformal prediction.
Conflict Resolution (Merge Ladder)
deterministic_rebase-- Replays intent-level operations against a newer base version.physical_merge/StructuredPagePatch-- Structured B-tree page diffing and merging.xor_delta/SparseXorDeltaObject-- Compact delta encoding for page versions using sparse XOR runs.
Witness System
WitnessSet/WitnessKey-- Tracks which pages a transaction read (read witnesses) and wrote (write witnesses) for SSI validation.HotWitnessIndex-- Cache-friendly index for high-frequency witness lookups.WitnessPublisher/ProofCarryingCommit-- Proof-carrying commit protocol where each commit publishes cryptographic evidence of its read/write set.witness_refinement-- Value-of-information-based refinement of coarse witness keys into precise ones.
Epoch-Based Reclamation (EBR) and GC
VersionGuardRegistry/VersionGuard-- Epoch-based reader registration for safe old-version reclamation.QsbrRegistry/QsbrHandle-- Quiescent-state-based reclamation (QSBR) for lock-free data structures.GcScheduler/gc_tick/prune_page_chain-- Background garbage collection of unreachable page versions.RcuCell/RcuPair/RcuTriple-- Read-copy-update primitives.
Concurrency Primitives
SeqLock-- Sequence lock for fast read-side access to small shared state.LeftRight-- Left-right concurrency primitive for wait-free reads.FlatCombiner-- Flat combining for serializing concurrent operations through a combiner thread.CacheAligned/TxnSlotArray/RecentlyCommittedReadersIndex-- Cache-line-aligned transaction slot management.
Shared Memory and IPC
SharedMemoryLayout/ShmSnapshot-- Cross-process shared memory layout for MVCC coordination.coordinator_ipc(module) -- Unix domain socket IPC for the write coordinator.WriteCoordinator/CoordinatorMode-- Centralized write coordination with spill-to-disk support for large write sets.
History Compression
CompressedPageHistory/compress_page_history-- Compresses version chains by merging independent operations.MergeCertificate/generate_merge_certificate-- Cryptographic certificates proving merge correctness.
Conflict Modeling
AmsSketch/NitroSketch-- Approximate sketches for estimating write-set overlap and conflict probability.birthday_conflict_probability_uniform/pairwise_conflict_probability-- Analytical conflict probability models.SpaceSavingSummary-- Heavy-hitter detection for hot pages.
Two-Phase Commit
TwoPhaseCoordinator/TwoPhaseState-- Coordinator for multi-database atomic commits.GlobalCommitMarker-- Durable commit markers for crash recovery of distributed transactions.
Time Travel
create_time_travel_snapshot/resolve_page_at_commit-- Read historical page versions at a given commit or timestamp.
Observability
CasMetricsSnapshot/SsiMetricsSnapshot/SnapshotReadMetricsSnapshot-- Atomic metric counters for CAS retries, SSI aborts, and snapshot read performance.BocpdMonitor-- Bayesian online changepoint detection for regime shifts in conflict rates.
Retry Policy
RetryController/RetryAction-- Adaptive retry policy using Gittins-index approximations and hazard models to decide between retry and fail-fast.
Differential Privacy
DpEngine/PrivacyBudget-- Differential privacy query engine with configurable noise mechanisms and privacy budget tracking.
Row ID Allocation
ConcurrentRowIdAllocator/LocalRowIdCache-- Lock-free row ID allocation with per-thread range reservations.
Usage
use fsqlite_mvcc::{
TransactionManager, BeginKind, CommitResponse,
Transaction, TransactionMode, TransactionState,
VersionStore, CommitLog, InProcessPageLockTable,
};
// The TransactionManager orchestrates the full lifecycle.
// In practice, it is constructed by fsqlite-core and wired to the
// pager, WAL, and B-tree layers. A simplified sketch:
// Begin a concurrent transaction.
// let txn = manager.begin(BeginKind::Concurrent)?;
// Read pages (recorded in the SSI read-witness set).
// let page = txn.read_page(page_no)?;
// Write pages (page locks acquired, write-witness recorded).
// txn.write_page(page_no, &new_data)?;
// Commit -- runs SSI validation, First-Committer-Wins, merge ladder.
// let response = manager.commit(txn)?;
Dependencies
fsqlite-types-- Shared type definitions (TxnId,CommitSeq,Snapshot,PageVersion,WitnessKey).fsqlite-error-- Unified error/result types.fsqlite-observability-- Global metric sinks.fsqlite-pager-- Page cache and pager traits.fsqlite-wal-- WAL frame operations and FEC repair symbols.crossbeam-epoch-- Epoch-based memory reclamation.parking_lot-- Fast mutexes and reader-writer locks.smallvec-- Stack-allocated small vectors.serde-- Serialization for snapshot export and IPC.xxhash-rust-- Fast hashing.blake3-- Cryptographic hashing for merge certificates and content addressing.nix-- Unix domain socket and process credential support.tracing-- Structured logging.
License
MIT
Dependencies
~29–46MB
~734K SLoC