The BigANN challenge aims to encourage the development of indexing data structures and search algorithms for practical variants of the Approximate Nearest Neighbor (ANN) or Vector search problem on commodity hardware.
This work in progress targets the high-dimensional sparse approximate nearest neighbor search problem using ray tracing primitives and is built into FAISS.
Edit env.sh so that its paths point to the libraries on your system, then load it into the current shell:
$ source env.sh
Put datasets in the data directory:
$ mkdir data
Unpack the siftsmall dataset into data (see below for source):
$ cd data
$ tar -xzvf siftsmall.tar.gz
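To sanity-check the unpacked files before running anything, a small reader like the one below can help. It is only a sketch and not part of this repository: the function name `read_fvecs` is hypothetical, and it assumes the standard TEXMEX `.fvecs` layout, in which every vector is stored as a little-endian 32-bit integer dimension followed by that many 32-bit floats.

```python
import struct


def read_fvecs(path):
    """Read an .fvecs file and return its vectors as a list of float lists.

    Assumed layout (TEXMEX .fvecs): for each vector, a little-endian int32
    dimension d, followed by d little-endian float32 components.
    """
    vectors = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) != 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            if dim <= 0:
                raise ValueError(f"invalid dimension {dim}")
            payload = f.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector payload")
            vectors.append(list(struct.unpack(f"<{dim}f", payload)))
    return vectors


# Example call (the path is an assumption about where the archive unpacks):
# base = read_fvecs("data/siftsmall/siftsmall_base.fvecs")
# print(len(base), len(base[0]))
```

If the archive unpacked as expected, the siftsmall base file should yield 10,000 vectors of dimension 128. The sketch uses only the standard library and loads everything into memory, so a NumPy-based reader would be a better fit for the larger dataset.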
To run the sparse ray tracing implementation:
$ cd bin
$ sh ./run.sh
To run other implementations (see bin):
$ make all
$ cd bin
$ sh ./run_<impl-name>.sh
For a larger dataset:
$ cd data
$ wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz
$ tar -xzvf sift.tar.gz
- An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors [Paper]
- DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node [Paper] [Video] [Slides 1] [Slides 2]
- Product Quantization for Nearest Neighbor Search [Paper]
- Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs [Paper]
- Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations [Paper]