# Feather DB 🪶

Fast, lightweight, context-aware vector database. Part of Hawky.ai, the AI Native Digital Marketing OS.

A fast, lightweight vector database built in C++ around the HNSW (Hierarchical Navigable Small World) algorithm for approximate nearest-neighbor search.
## Features (v0.3.0)

- 🪶 Multimodal Pockets: Store text, visual, and audio vectors under a single entity ID.
- 🕸️ Contextual Graph: Native `link(source, target)` support for modeling relationships.
- 🧠 Living Context: Adaptive "Sticky Memory" decay; frequently accessed items stay fresh.
- 🚀 High Performance: Built with C++ and an optimized HNSW algorithm (~0.05 ms multimodal search).
- 🔍 Filtered Search: Domain-logic filtering (by type, source, tags) during HNSW search.
- 🐍 Python Integration: Native Python bindings with `FilterBuilder` support.
- 🦀 Rust CLI: Enhanced CLI for metadata, linking, and filtered operations.
📖 Phase 3 Features Guide - Complete documentation for Multimodal & Graph capabilities.
## Quick Start

### Python Usage

```python
import feather_db
import numpy as np

# Open or create a database
db = feather_db.DB.open("my_vectors.feather", dim=768)

# Add vectors
vector = np.random.random(768).astype(np.float32)
db.add(id=1, vec=vector)

# Search for similar vectors
query = np.random.random(768).astype(np.float32)
ids, distances = db.search(query, k=5)

print(f"Found {len(ids)} similar vectors")
for i, (id, dist) in enumerate(zip(ids, distances)):
    print(f"  {i+1}. ID: {id}, Distance: {dist:.4f}")

# Save the database
db.save()
```
### Context Usage (Phase 3)

```python
from feather_db import DB, Metadata, ContextType

# 1. Add multimodal data
db.add(id=100, vec=img_vec, modality="visual")
db.add(id=100, vec=txt_vec, modality="text")  # Same ID!

# 2. Link records (graph)
db.link(source_id=100, target_id=999)

# 3. Search with context
results = db.search(query_vec, k=5, modality="visual")
print(f"Linked to: {results[0].metadata.links}")
```
### C++ Usage

```cpp
#include "include/feather.h"

#include <iostream>
#include <vector>

int main() {
    // Open database
    auto db = feather::DB::open("my_vectors.feather", 768);

    // Add a vector
    std::vector<float> vec(768, 0.1f);
    db->add(1, vec);

    // Search
    std::vector<float> query(768, 0.1f);
    auto results = db->search(query, 5);
    for (auto [id, distance] : results) {
        std::cout << "ID: " << id << ", Distance: " << distance << std::endl;
    }
    return 0;
}
```
### CLI Usage

```bash
# Create a new database
feather new my_db.feather --dim 768

# Add vectors from NumPy files
feather add my_db.feather 1 --npy vector1.npy
feather add my_db.feather 2 --npy vector2.npy

# Search for similar vectors
feather search my_db.feather --npy query.npy --k 10
```
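The `--npy` flags above consume NumPy `.npy` files; one quick way to produce them from Python (the file names here match the commands above):

```python
import numpy as np

# Write a float32 vector to a .npy file that `feather add` can ingest.
vector = np.random.random(768).astype(np.float32)
np.save("vector1.npy", vector)

# A query vector for `feather search` is produced the same way.
np.save("query.npy", np.random.random(768).astype(np.float32))
```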
### Rust CLI

The CLI is available as a native binary for fast database management.

```bash
# Add with metadata
feather add --npy vector.npy --content "Hello world" --source "cli" my_db 123

# Search with filters
feather search --npy query.npy --type-filter 0 --source-filter "cli" my_db
```
## Installation

### Python Package (Recommended)

```bash
pip install feather-db
```

### Build from Source

Prerequisites:

- C++17 compatible compiler
- Python 3.8+ (for Python bindings)
- Rust 1.70+ (for CLI tool)
- pybind11 (for Python bindings)

Steps:

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd feather
   ```

2. Install the Python package:

   ```bash
   pip install .
   ```

3. Build the Rust CLI (optional):

   ```bash
   cd feather-cli
   cargo build --release
   ```
## Architecture

### Core Components

- `feather::DB`: Main C++ class providing vector database functionality
- HNSW Index: Hierarchical Navigable Small World algorithm for fast ANN search
- Binary Format: Custom storage format with magic-number validation
- Multi-language Bindings: Python (pybind11) and Rust (FFI) interfaces

### File Format

Feather uses a custom binary format:

```text
[4 bytes]  Magic number: 0x46454154 ("FEAT")
[4 bytes]  Version: 1
[4 bytes]  Dimension
[Records]  ID (8 bytes) + vector data (dim * 4 bytes)
```
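As a sketch, the fixed-size header can be parsed with Python's `struct` module. The little-endian byte order here is an assumption; the layout above does not specify endianness, so adjust the format string if the on-disk order differs.

```python
import struct

FEATHER_MAGIC = 0x46454154  # "FEAT"

def read_feather_header(path):
    """Parse the 12-byte header (magic, version, dimension) of a database file.

    Assumes little-endian encoding of the three 32-bit fields; this is
    a guess about the format, not a documented guarantee.
    """
    with open(path, "rb") as f:
        magic, version, dim = struct.unpack("<III", f.read(12))
    if magic != FEATHER_MAGIC:
        raise ValueError(f"not a Feather database (magic 0x{magic:08X})")
    return version, dim
```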
## Performance Characteristics

- Index Type: HNSW with L2 distance
- Max Elements: 1,000,000 (configurable)
- Construction Parameters: `M=16`, `ef_construction=200`
- Memory Usage: ~4 bytes per dimension per vector, plus index overhead
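The memory figure above supports a quick back-of-the-envelope estimate. The per-link cost below (4-byte neighbor IDs, roughly 2*M link slots per vector) is a common HNSW rule of thumb, not a measured Feather number:

```python
def estimate_memory_bytes(num_vectors: int, dim: int, m: int = 16) -> int:
    """Rough HNSW memory estimate: float32 vector storage plus graph links.

    Assumes ~4 bytes per dimension per vector (float32) and approximately
    2 * M neighbor slots of 4 bytes each per vector for the index graph.
    """
    vector_bytes = num_vectors * dim * 4
    link_bytes = num_vectors * 2 * m * 4
    return vector_bytes + link_bytes

# 1M vectors at dim=768 with M=16: ~3.2 GB before allocator overhead.
print(estimate_memory_bytes(1_000_000, 768) / 1e9)
```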
## API Reference

### Python API

`feather_db.DB`:

- `DB.open(path: str, dim: int = 768)`: Open or create a database
- `add(id: int, vec: np.ndarray)`: Add a vector with the given ID
- `search(query: np.ndarray, k: int = 5)`: Search the k nearest neighbors
- `save()`: Persist the database to disk
- `dim()`: Get the vector dimension

### C++ API

`feather::DB`:

- `static std::unique_ptr<DB> open(path, dim)`: Factory method
- `void add(uint64_t id, const std::vector<float>& vec)`: Add a vector
- `auto search(const std::vector<float>& query, size_t k)`: Search vectors
- `void save()`: Save to disk
- `size_t dim() const`: Get the dimension

### CLI Commands

- `feather new <path> --dim <dimension>`: Create a new database
- `feather add <db> <id> --npy <file>`: Add a vector from a .npy file
- `feather search <db> --npy <query> --k <count>`: Search for similar vectors
## Examples

### Semantic Search with Embeddings

```python
import feather_db
import numpy as np

# Create database for sentence embeddings
db = feather_db.DB.open("sentences.feather", dim=384)

# Add document embeddings
documents = [
    "The quick brown fox jumps over the lazy dog",
    "Machine learning is a subset of artificial intelligence",
    "Vector databases enable semantic search capabilities",
]

for i, doc in enumerate(documents):
    # Assume get_embedding() returns a 384-dim float32 vector
    embedding = get_embedding(doc)
    db.add(i, embedding)

# Search for similar documents
query_embedding = get_embedding("What is machine learning?")
ids, distances = db.search(query_embedding, k=2)

for id, dist in zip(ids, distances):
    print(f"Document: {documents[id]}")
    print(f"Similarity: {1 - dist:.3f}\n")  # rough proxy; the raw value is an L2 distance
```
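The example above assumes a `get_embedding()` helper. For trying the snippet without a model, a deterministic stand-in (hypothetical, not semantically meaningful) could look like this:

```python
import hashlib

import numpy as np

def get_embedding(text: str, dim: int = 384) -> np.ndarray:
    """Deterministic placeholder embedding: seeds a RNG from the text.

    Only useful for exercising the API; swap in a real sentence-embedding
    model for actual semantic search.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")
    vec = np.random.default_rng(seed).random(dim, dtype=np.float32)
    return vec / np.linalg.norm(vec)  # unit-normalize for stable distances
```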
### Batch Processing

```python
import feather_db
import numpy as np

db = feather_db.DB.open("large_dataset.feather", dim=512)

# Batch add vectors
batch_size = 1000
for batch_start in range(0, 100000, batch_size):
    for i in range(batch_size):
        vector_id = batch_start + i
        vector = np.random.random(512).astype(np.float32)
        db.add(vector_id, vector)

    # Periodic save
    if batch_start % 10000 == 0:
        db.save()
        print(f"Processed {batch_start + batch_size} vectors")
```
## Performance Tips

- Batch Operations: Add vectors in batches and save periodically
- Memory Management: Consider vector dimension vs. memory-usage trade-offs
- Search Parameters: Adjust the `k` parameter based on your precision/recall needs
- File I/O: Use SSD storage for better performance with large databases
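One way to tune `k` and the other search parameters is to measure recall of the approximate results against exact brute-force search on a held-out sample; a minimal sketch:

```python
import numpy as np

def recall_at_k(approx_ids, query: np.ndarray, vectors: np.ndarray, k: int) -> float:
    """Fraction of the true k nearest neighbors (exact L2) found by the ANN search.

    `vectors` is the full (n, dim) matrix keyed by row index; `approx_ids`
    are the IDs returned by an approximate search such as `db.search`.
    """
    exact = np.argsort(np.linalg.norm(vectors - query, axis=1))[:k]
    return len(set(approx_ids) & set(exact.tolist())) / k
```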
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## License

[Add your license information here]
## Acknowledgments