sparse-ann

Introduction

The BigANN challenge encourages the development of indexing data structures and search algorithms for practical variants of the approximate nearest neighbor (ANN), or vector search, problem on commodity hardware.

This work-in-progress targets high-dimensional sparse approximate nearest neighbor search using ray tracing primitives and is built into FAISS.
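For orientation, the sketch below shows an exact brute-force baseline for sparse search, assuming inner-product similarity and vectors stored as {dimension: value} dictionaries. It is not the ray tracing method used in this repository, and the names sparse_dot and brute_force_mips are hypothetical; an approximate index is normally judged against a baseline of this kind.

```python
import heapq


def sparse_dot(a, b):
    """Inner product of two sparse vectors stored as {dimension: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the vector with fewer non-zeros
    return sum(value * b[dim] for dim, value in a.items() if dim in b)


def brute_force_mips(query, database, k):
    """Return the k (score, index) pairs with the largest inner product."""
    scores = ((sparse_dot(query, vec), idx) for idx, vec in enumerate(database))
    return heapq.nlargest(k, scores)


# Toy data: three sparse vectors in a ~30,000-dimensional space.
database = [
    {0: 1.0, 7: 2.0},
    {3: 0.5, 7: 1.0, 29999: 4.0},
    {1: 3.0},
]
query = {7: 1.0, 29999: 0.5}
top = brute_force_mips(query, database, k=2)
print(top)
```

Only dimensions that are non-zero in both vectors contribute to the score, which is why sparse search cost depends on the number of non-zeros rather than on the nominal dimensionality.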

Quick Start

Edit env.sh so that its paths point to the libraries on your system, then:

$ source env.sh

Create a data directory in the repository root to hold the datasets:

$ mkdir data

Download siftsmall.tar.gz into data (the source is listed under More Datasets) and unpack it:

$ cd data
$ tar -xzvf siftsmall.tar.gz
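To inspect the unpacked vectors, the sketch below reads the .fvecs layout used by the TEXMEX corpus: each vector is stored as a little-endian int32 dimension followed by that many float32 components. The helper names parse_fvecs and read_fvecs are hypothetical, and the path in the final comment assumes the archive unpacks to data/siftsmall; adjust it to your layout.

```python
import struct


def parse_fvecs(buf):
    """Decode a bytes object in .fvecs layout into a list of float tuples."""
    vectors = []
    offset = 0
    while offset < len(buf):
        # Each record starts with the vector dimension as a little-endian int32.
        (dim,) = struct.unpack_from("<i", buf, offset)
        offset += 4
        # The dimension is followed by `dim` little-endian float32 components.
        vectors.append(struct.unpack_from(f"<{dim}f", buf, offset))
        offset += 4 * dim
    return vectors


def read_fvecs(path):
    """Read a whole .fvecs file into memory (fine for small datasets)."""
    with open(path, "rb") as f:
        return parse_fvecs(f.read())


# Self-contained check on two synthetic 3-dimensional vectors.
sample = struct.pack("<i3f", 3, 1.0, 2.0, 3.0) + struct.pack("<i3f", 3, 4.0, 5.0, 6.0)
vectors = parse_fvecs(sample)
print(len(vectors), len(vectors[0]))

# Assumed path after unpacking:
# base = read_fvecs("data/siftsmall/siftsmall_base.fvecs")
```

This reads the entire file at once, so it suits siftsmall; for the larger datasets a memory-mapped reader would be a better fit.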

To run the sparse ray tracing implementation, from the repository root:

$ cd bin
$ sh ./run.sh

To build and run the other implementations (their scripts are in bin), from the repository root:

$ make all
$ cd bin
$ sh ./run_<impl-name>.sh

For a larger dataset (SIFT 1M), from the repository root:

$ cd data
$ wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz
$ tar -xzvf sift.tar.gz

References

An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors [Paper]
Billion-scale Similarity Search with GPUs [Paper] [Code]
DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node [Paper] [Video] [Slides 1] [Slides 2]
Product Quantization for Nearest Neighbor Search [Paper]
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs [Paper]
Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations [Paper]
big-ann-benchmarks
ann-benchmarks

More Datasets

Name	Link	# Datapoints	Dimensions	Data type
DEEP 1M	https://github.com/johnpzh/iQAN_AE/blob/master/scripts/get.deep1m.sh	1,000,000	96	float32
DEEP 10M	https://github.com/johnpzh/iQAN_AE/blob/master/scripts/get.deep10m.sh	10,000,000	96	float32
DEEP 100M	https://github.com/johnpzh/iQAN_AE/blob/master/scripts/get.deep100m.sh	100,000,000	96	float32
DEEP 1B	https://www.tensorflow.org/datasets/catalog/deep1b	1,000,000,000	96	float32

SIFT small	http://corpus-texmex.irisa.fr	10,000	128	float32
SIFT 1M	http://corpus-texmex.irisa.fr	1,000,000	128	float32
SIFT 100M	https://github.com/johnpzh/iQAN_AE/blob/master/scripts/get.sift100m.sh	100,000,000	128	float32
SIFT 1B	http://corpus-texmex.irisa.fr	1,000,000,000	128	uint8

GIST 1M	http://corpus-texmex.irisa.fr	1,000,000	960	float32

YFCC 10M		10,000,000	192	uint8
YFCC 100M	https://multimediacommons.wordpress.com/yfcc100m-core-dataset/	99,200,000	192	uint8
Yandex T2I 1B	https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search	1,000,000,000	200	float32

MS MARCO	https://microsoft.github.io/msmarco/	8,841,823	~30,000 (sparse)	float32
MS SPACEV 1B	https://github.com/microsoft/SPTAG/tree/main/datasets/SPACEV1B	1,402,020,720	100	int8
MS Turing 30M		30,000,000	100	float32
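
When choosing a dataset, it helps to estimate how much memory the raw vectors need before any index is built. For a dense dataset this is the number of datapoints times the dimension times the bytes per component. The sketch below applies that formula to a few table rows; the helper raw_size_bytes is hypothetical, and the estimate does not apply to sparse data such as MS MARCO, whose storage depends on the number of non-zeros.

```python
# Bytes per vector component for the data types used in the table above.
BYTES_PER_COMPONENT = {"float32": 4, "uint8": 1}


def raw_size_bytes(num_points, dimensions, dtype):
    """Size of the raw dense vectors, excluding any index or file headers."""
    return num_points * dimensions * BYTES_PER_COMPONENT[dtype]


datasets = [
    ("SIFT 1M", 1_000_000, 128, "float32"),
    ("SIFT 1B", 1_000_000_000, 128, "uint8"),
    ("DEEP 1B", 1_000_000_000, 96, "float32"),
]
for name, n, d, dtype in datasets:
    print(f"{name}: {raw_size_bytes(n, d, dtype) / 1e9:.1f} GB")
```

By this estimate SIFT 1M fits easily in the memory of a single commodity machine, while the billion-scale datasets need compression, sharding, or disk-based indexing.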

