Crate frankensearch_index

Vector index storage and loading for frankensearch.

This crate implements the FSVI binary format reader/writer plus exact brute-force top-k vector search, with optional HNSW ANN acceleration.
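As a rough illustration of what exact brute-force top-k search means, the sketch below scores every stored vector against the query by dot product and keeps the k best. It is not this crate's implementation: `dot`, `top_k`, and the flat `&[f32]` slab are hypothetical names and shapes chosen for the example, and a full sort is used for clarity where a bounded heap would use less memory.

```rust
/// Dot product of two equal-length slices (hypothetical helper).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Exact top-k by dot product over a flat, row-major slab of f32 vectors.
/// Returns (record index, score) pairs, highest score first.
/// Assumption: `vectors.len()` is a multiple of `dimension`.
fn top_k(query: &[f32], vectors: &[f32], dimension: usize, k: usize) -> Vec<(usize, f32)> {
    assert!(dimension > 0 && query.len() == dimension);
    // Score every record: this is the "brute-force" part.
    let mut scored: Vec<(usize, f32)> = vectors
        .chunks_exact(dimension)
        .enumerate()
        .map(|(i, v)| (i, dot(query, v)))
        .collect();
    // Sort by descending score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Because every record is scored, the result is exact; the cost grows linearly with the record count, which is the trade-off an ANN structure such as HNSW is meant to address.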

§FSVI File Layout

All multi-byte integers are little-endian. The vector slab is 64-byte aligned for cache-line / SIMD friendliness.

┌───────────────────────────────────────────┐
│ Header (variable length)                  │
│   magic: b"FSVI"              (4 bytes)   │
│   version: u16                (2 bytes)   │
│   embedder_id_len: u16        (2 bytes)   │
│   embedder_id: [u8]           (variable)  │
│   embedder_revision_len: u16  (2 bytes)   │
│   embedder_revision: [u8]     (variable)  │
│   dimension: u32              (4 bytes)   │
│   quantization: u8            (1 byte)    │
│   reserved: [u8; 3]           (3 bytes)   │
│   record_count: u64           (8 bytes)   │
│   vectors_offset: u64         (8 bytes)   │
│   header_crc32: u32           (4 bytes)   │
├───────────────────────────────────────────┤
│ Record Table                              │
│   record_count × 16 bytes each:           │
│     doc_id_hash: u64          (8 bytes)   │
│     doc_id_offset: u32        (4 bytes)   │
│     doc_id_len: u16           (2 bytes)   │
│     flags: u16                (2 bytes)   │
├───────────────────────────────────────────┤
│ String Table                              │
│   Concatenated UTF-8 doc_id strings       │
├───────────────────────────────────────────┤
│ Padding (to 64-byte alignment)            │
├───────────────────────────────────────────┤
│ Vector Slab                               │
│   record_count × dimension × elem_size    │
│   (2 bytes/elem for f16, 4 for f32)       │
└───────────────────────────────────────────┘
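The arithmetic implied by the layout above can be sketched as follows. This is a minimal illustration and not the crate's reader or writer: the function and type names (`header_len`, `align_up`, `vectors_offset`, `slab_len`, `RecordEntry`, `parse_record`) are hypothetical, and it assumes that `vectors_offset` is measured from the start of the file and that the 64-byte padding is computed relative to that same origin.

```rust
use std::convert::TryInto;

/// Header length in bytes: the fixed fields plus the two variable-length strings.
/// Fixed fields: magic 4 + version 2 + id_len 2 + revision_len 2 + dimension 4
/// + quantization 1 + reserved 3 + record_count 8 + vectors_offset 8 + crc32 4 = 38.
fn header_len(embedder_id_len: u16, embedder_revision_len: u16) -> u64 {
    38 + u64::from(embedder_id_len) + u64::from(embedder_revision_len)
}

/// Round `offset` up to the next multiple of `align` (align must be non-zero).
fn align_up(offset: u64, align: u64) -> u64 {
    (offset + align - 1) / align * align
}

/// Start of the vector slab: header, record table (16 bytes per record),
/// string table, then padding up to a 64-byte boundary.
/// Assumption: offsets are relative to the start of the file.
fn vectors_offset(header_len: u64, record_count: u64, string_table_len: u64) -> u64 {
    align_up(header_len + record_count * 16 + string_table_len, 64)
}

/// Slab size: record_count × dimension × elem_size (2 for f16, 4 for f32).
fn slab_len(record_count: u64, dimension: u32, elem_size: u64) -> u64 {
    record_count * u64::from(dimension) * elem_size
}

/// One 16-byte record-table entry (hypothetical struct mirroring the diagram).
#[derive(Debug, PartialEq)]
struct RecordEntry {
    doc_id_hash: u64,
    doc_id_offset: u32,
    doc_id_len: u16,
    flags: u16,
}

/// Decode a record entry; all integers are little-endian.
fn parse_record(b: &[u8; 16]) -> RecordEntry {
    RecordEntry {
        doc_id_hash: u64::from_le_bytes(b[0..8].try_into().unwrap()),
        doc_id_offset: u32::from_le_bytes(b[8..12].try_into().unwrap()),
        doc_id_len: u16::from_le_bytes(b[12..14].try_into().unwrap()),
        flags: u16::from_le_bytes(b[14..16].try_into().unwrap()),
    }
}
```

For example, with a 3-byte embedder id and a 2-byte revision the header is 43 bytes; 10 records add 160 bytes, and a 50-byte string table brings the total to 253, so under these assumptions the slab would start at offset 256. A production reader would also want checked arithmetic and validation of the header CRC, which this sketch omits.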

Re-exports§

pub use mrl::MrlConfig;
pub use mrl::MrlSearchStats;
pub use quantization::ScalarQuantizer;
pub use search::PARALLEL_CHUNK_SIZE;
pub use search::PARALLEL_THRESHOLD;
pub use search::SearchParams;
pub use simd::cosine_similarity_f16;
pub use simd::dot_product_f16_f32;
pub use simd::dot_product_f32_f32;
pub use two_tier::TwoTierIndex;
pub use two_tier::TwoTierIndexBuilder;
pub use two_tier::VECTOR_INDEX_FALLBACK_FILENAME;
pub use two_tier::VECTOR_INDEX_FAST_FILENAME;
pub use two_tier::VECTOR_INDEX_QUALITY_FILENAME;
pub use wal::CompactionStats;
pub use wal::WalConfig;
pub use wal::wal_path_for;
pub use warmup::AdaptiveConfig;
pub use warmup::HeatMap;
pub use warmup::WarmUpConfig;
pub use warmup::WarmUpResult;
pub use warmup::WarmUpStrategy;

Modules§

mrl: Matryoshka Representation Learning (MRL) adaptive dimensionality at search time.
quantization: Scalar quantization for vector compression.
search: Brute-force top-k vector search over an opened crate::VectorIndex.
simd: Portable SIMD dot-product helpers for vector search.
two_tier: Two-tier index wrapper for fast and quality vector indices.
wal: Write-ahead log for incremental FSVI index updates.
warmup: Index warm-up and adaptive page prefaulting for memory-mapped FSVI indices.

Structs§

VacuumStats: Statistics returned by VectorIndex::vacuum.
VectorIndex
VectorIndexWriter
VectorMetadata: Parsed metadata from an FSVI file header.

Enums§

Quantization: Vector element quantization stored in the FSVI slab.

Constants§

FSVI_MAGIC: Magic bytes at the start of every FSVI file.
FSVI_VERSION: Supported FSVI format version.
