Merge 0.5.0 changes to master #279

Merged · 73 commits · Jan 29, 2021 · Changes from 7 commits

Commits (73)
fa32338
currObj should be updated as the closest from all candidates.
orrorcol Jun 27, 2020
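
The point of this commit, per its message, is that the current object (currObj) should end up as the closest of all examined candidates during the greedy walk toward the query. A minimal plain-Python sketch of a greedy descent that maintains that invariant (all names here are hypothetical; hnswlib's actual implementation is in C++):

```python
# Illustration only: a plain-Python sketch of the greedy-descent idea the
# commit message refers to. Function and parameter names are hypothetical,
# not hnswlib's actual C++ interface.
def greedy_descend(query, entry_point, neighbors, dist):
    """Walk the graph from entry_point, always moving to the closest neighbor.

    curr is updated to the closest of *all* candidates examined in a pass,
    not just the first candidate that happens to improve the distance.
    """
    curr = entry_point
    curr_dist = dist(query, curr)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(curr):
            d = dist(query, cand)
            if d < curr_dist:  # keep the best candidate seen so far
                curr, curr_dist = cand, d
                improved = True
    return curr
```
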
b6b338e
1. Replace the template interface searchKnn with virtual interface
orrorcol Jun 27, 2020
8987188
minor fix
orrorcol Jun 27, 2020
a35fcb5
Adding cassert include in header to fix compilation error on Ubuntu 1…
Jul 9, 2020
4a4689c
Small patch to enable compilation with sign_compare and reorder warni…
Jul 9, 2020
40f31da
Merge pull request #231 from jwimberley/cassert_beaver_gcc730_fix
yurymalkov Jul 14, 2020
ab012ae
Merge pull request #233 from jwimberley/gcc_warning_fixes
yurymalkov Jul 15, 2020
6f2c3fb
L2SqrI: add fallback if the dimension is not a multiple of 4
fabiencastan Aug 19, 2020
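
L2SqrI is hnswlib's integer squared-L2 distance, whose main loop processes the vectors four components at a time, so a dimension that is not a multiple of 4 needs a fallback for the trailing components. A rough Python sketch of that structure (illustrative only; the real code is the C++ in hnswlib's space_l2.h):

```python
# Illustrative sketch of the "dimension not a multiple of 4" fallback idea;
# names are ours, not hnswlib's.
def l2_sqr_int(a, b):
    dim = len(a)
    res = 0
    # main loop: four components per iteration
    for i in range(0, dim - dim % 4, 4):
        res += (a[i] - b[i]) ** 2
        res += (a[i + 1] - b[i + 1]) ** 2
        res += (a[i + 2] - b[i + 2]) ** 2
        res += (a[i + 3] - b[i + 3]) ** 2
    # fallback: handle the remaining 1-3 components one at a time
    for i in range(dim - dim % 4, dim):
        res += (a[i] - b[i]) ** 2
    return res
```
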
21b54fe
Merge pull request #243 from alicevision/dev/l2sqrI4x
yurymalkov Aug 31, 2020
cb7b398
New methods loadIndexFromStream and saveIndexToStream expose de-/seri…
dbespalov Oct 12, 2020
e161db8
Implement __getstate__ and __setstate__ to allow pickling of hnswlib.…
dbespalov Oct 12, 2020
e0eacad
Verify knn_query results match before/after pickling hnswlib.Index ob…
dbespalov Oct 12, 2020
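
These commits add pickle support to hnswlib.Index via __getstate__/__setstate__. A short usage sketch of what that enables (assumes hnswlib 0.5.0 or later; the sizes and parameters below are arbitrary):

```python
import pickle

import numpy as np
import hnswlib

dim = 16
data = np.float32(np.random.random((1000, dim)))

p = hnswlib.Index(space='l2', dim=dim)
p.init_index(max_elements=1000, ef_construction=100, M=16)
p.add_items(data)

# Round-trip the index through pickle and check that queries still match,
# mirroring what the test commit above verifies.
p2 = pickle.loads(pickle.dumps(p))
labels1, dists1 = p.knn_query(data, k=1)
labels2, dists2 = p2.knn_query(data, k=1)
assert np.array_equal(labels1, labels2)
```
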
ec4f4b1
add documentation
dbespalov Oct 12, 2020
a3646cc
clean-up readme
dbespalov Oct 12, 2020
a1ba4e5
clean-up readme
dbespalov Oct 12, 2020
cf3846c
clean-up readme
dbespalov Oct 12, 2020
27471cd
clean-up readme
dbespalov Oct 12, 2020
4220956
Update bindings_test_pickle.py
dbespalov Oct 12, 2020
72b6501
Revert "New methods loadIndexFromStream and saveIndexToStream expose …
Oct 23, 2020
3a62b41
use python's buffer protocol to avoid making copies of ann data (stat…
Oct 23, 2020
fe6d2fa
replace tab characters with spaces
Oct 23, 2020
c9fb60d
test each space (ip/cosine/l2) as a separate unittest
Oct 23, 2020
3c4510d
return array_t pointers
dbespalov Oct 25, 2020
64c5154
expose static method of Index class as copy constructor in python
dbespalov Oct 25, 2020
7b445c8
do not waste space when returning serialized appr_alg->linkLists_
dbespalov Oct 25, 2020
c02f1dc
serialize element_lookup_ and element_level_ as array_t arrays; pass …
dbespalov Oct 26, 2020
1f25102
warn that serialization is not thread safe with add_items
dbespalov Nov 3, 2020
1165370
warn that serialization is not thread safe with add_items; add todo b…
dbespalov Nov 3, 2020
2c040e6
remove camel casing
dbespalov Nov 3, 2020
6298996
add static const int data member to class Index that stores serializa…
dbespalov Nov 6, 2020
c8276d8
add todo block to convert parameter tuple to dicts
dbespalov Nov 6, 2020
345f71d
add todo block to convert parameter tuple to dicts
dbespalov Nov 6, 2020
a64a001
Fixes of some typos in readme
dyashuni Nov 6, 2020
cee0e99
Merge pull request #251 from dbespalov/python_bindings_pickle_io
yurymalkov Nov 9, 2020
1c97b5d
Merge pull request #253 from dyashuni/patch-1
yurymalkov Nov 9, 2020
ec38db1
Rename space_name to space on the python side
Nov 18, 2020
a0c2076
Add gitignore file to ignore build folders
Nov 18, 2020
8cc442d
Merge pull request #255 from dyashuni/develop
yurymalkov Nov 22, 2020
ded26fc
use dict for Index serialization
dbespalov Nov 23, 2020
e845d8a
debugging; have to wrap state dict into a tuple
dbespalov Nov 25, 2020
6425deb
Move setup.py into root folder to fix bindings build when symlink doe…
Nov 28, 2020
376c8cd
Update gitignore
Nov 28, 2020
68a8a36
Update Makefile to clean tmp folder
Nov 28, 2020
19abf9b
Update readme
Nov 28, 2020
2799aab
clean assert error message
dbespalov Nov 30, 2020
4c002bc
fix compilation error on osx
dbespalov Nov 30, 2020
5b2585d
Revert symlink to hnswlib and add windows to build matrix
Dec 1, 2020
dda9b31
Fix symlink
Dec 1, 2020
b1994a5
Update travis
Dec 1, 2020
afd18d2
Update travis
Dec 1, 2020
334cc6c
Merge pull request #258 from dbespalov/python_bindings_state_dict
yurymalkov Dec 1, 2020
b4b7b86
Merge pull request #224 from uestc-lfs/fix-update-ep
yurymalkov Dec 7, 2020
6efa48c
Add symlink to setup.py instead of hnswlib
Dec 8, 2020
4cf279b
Merge remote-tracking branch 'upstream/develop' into fix-interface
orrorcol Dec 10, 2020
9fe639d
fix interface
orrorcol Dec 12, 2020
21c1ad7
minor fix
orrorcol Dec 13, 2020
52da3d2
Merge pull request #225 from uestc-lfs/fix-interface
yurymalkov Dec 14, 2020
21b908f
Update README.md
js1010 Jan 4, 2021
d2e5a18
Update README.md
js1010 Jan 4, 2021
6449e64
Merge pull request #273 from js1010/patch-2
yurymalkov Jan 5, 2021
6ae02a5
Run sift test from separate directory
Jan 6, 2021
6d3b29f
Merge pull request #260 from dyashuni/develop
yurymalkov Jan 7, 2021
68b6257
PEP-517 support
groodt Jan 10, 2021
e94c5dc
Simplify include_dirs
groodt Jan 10, 2021
467c98f
Remove deprecated `setup.py test`
groodt Jan 13, 2021
2248ab4
pybind11 isn't needed at runtime, only build time
groodt Jan 13, 2021
8fe02c0
Support for packaging sdist
groodt Jan 15, 2021
73134a7
https git clone in README
groodt Jan 15, 2021
a9153e9
Add license file to pypi package
Jan 16, 2021
65a5f28
Merge pull request #274 from groodt/groodt-pyproject-toml
yurymalkov Jan 17, 2021
215526c
Merge pull request #276 from dyashuni/develop
yurymalkov Jan 17, 2021
1469702
bump version
yurymalkov Jan 25, 2021
e03162b
Merge pull request #278 from nmslib/upd0.5
yurymalkov Jan 25, 2021
15 changes: 8 additions & 7 deletions .gitignore
@@ -1,7 +1,8 @@
-hnswlib.egg-info/
-build/
-dist/
-tmp/
-python_bindings/tests/__pycache__/
-*.pyd
-hnswlib.cpython*.so
+hnswlib.egg-info/
+build/
+dist/
+tmp/
+python_bindings/tests/__pycache__/
+*.pyd
+hnswlib.cpython*.so
+var/
5 changes: 2 additions & 3 deletions .travis.yml
@@ -30,9 +30,8 @@ jobs:
 
 install:
   - |
-    pip install -r requirements.txt
-    python setup.py install
+    python -m pip install .
 
 script:
   - |
-    python setup.py test
+    python -m unittest discover --start-directory python_bindings/tests --pattern "*_test*.py"
7 changes: 4 additions & 3 deletions Makefile
@@ -3,12 +3,13 @@ pypi: dist
 
 dist:
     -rm dist/*
-    python3 setup.py sdist
+    pip install build
+    python3 -m build --sdist
 
 test:
-    python3 setup.py test
+    python3 -m unittest discover --start-directory python_bindings/tests --pattern "*_test*.py"
 
 clean:
     rm -rf *.egg-info build dist tmp var tests/__pycache__ hnswlib.cpython*.so
 
-.PHONY: dist
+.PHONY: dist
5 changes: 3 additions & 2 deletions README.md
@@ -213,8 +213,9 @@ print("Recall for two batches:", np.mean(labels.reshape(-1) == np.arange(len(dat
 You can install from sources:
 ```bash
 apt-get install -y python-setuptools python-pip
-pip3 install pybind11 numpy setuptools
-python3 setup.py install
+git clone https://github.com/nmslib/hnswlib.git
+cd hnswlib
+pip install .
 ```
 
 or you can install via pip:
9 changes: 9 additions & 0 deletions pyproject.toml
@@ -0,0 +1,9 @@
+[build-system]
+requires = [
+    "setuptools>=42",
+    "wheel",
+    "numpy>=1.10.0",
+    "pybind11>=2.0",
+]
+
+build-backend = "setuptools.build_meta"
14 changes: 6 additions & 8 deletions python_bindings/tests/bindings_test.py
@@ -1,11 +1,13 @@
 import os
 import unittest
 
+import numpy as np
+
+import hnswlib
+
 
 class RandomSelfTestCase(unittest.TestCase):
     def testRandomSelf(self):
-        import hnswlib
-        import numpy as np
 
         dim = 16
         num_elements = 10000
@@ -41,7 +43,7 @@ def testRandomSelf(self):
 
         # Query the elements for themselves and measure recall:
         labels, distances = p.knn_query(data1, k=1)
-        self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))),1.0,3)
+        self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))), 1.0, 3)
 
         # Serializing and deleting the index:
         index_path = 'first_half.bin'
@@ -61,10 +63,6 @@ def testRandomSelf(self):
         # Query the elements for themselves and measure recall:
         labels, distances = p.knn_query(data, k=1)
 
-        self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))),1.0,3)
+        self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))), 1.0, 3)
 
         os.remove(index_path)
-
-
-if __name__ == "__main__":
-    unittest.main()
9 changes: 4 additions & 5 deletions python_bindings/tests/bindings_test_getdata.py
@@ -1,11 +1,13 @@
 import unittest
 
+import numpy as np
+
+import hnswlib
+
 
 class RandomSelfTestCase(unittest.TestCase):
     def testGettingItems(self):
         print("\n**** Getting the data by label test ****\n")
-        import hnswlib
-        import numpy as np
 
         dim = 16
         num_elements = 10000
@@ -42,6 +44,3 @@ def testGettingItems(self):
         # After adding them, all labels should be retrievable
         returned_items = p.get_items(labels)
         self.assertSequenceEqual(data.tolist(), returned_items)
-
-if __name__ == "__main__":
-    unittest.main()
240 changes: 118 additions & 122 deletions python_bindings/tests/bindings_test_labels.py
@@ -1,131 +1,127 @@
 import os
 import unittest
 
+import numpy as np
+
+import hnswlib
+
 
 class RandomSelfTestCase(unittest.TestCase):
     def testRandomSelf(self):
         for idx in range(16):
             print("\n**** Index save-load test ****\n")
-            import hnswlib
-            import numpy as np
 
             np.random.seed(idx)
             dim = 16
             num_elements = 10000
 
             # Generating sample data
             data = np.float32(np.random.random((num_elements, dim)))
 
             # Declaring index
             p = hnswlib.Index(space='l2', dim=dim) # possible options are l2, cosine or ip
 
             # Initing index
             # max_elements - the maximum number of elements, should be known beforehand
             # (probably will be made optional in the future)
             #
             # ef_construction - controls index search speed/build speed tradeoff
             # M - is tightly connected with internal dimensionality of the data
             # stronlgy affects the memory consumption
 
-            p.init_index(max_elements = num_elements, ef_construction = 100, M = 16)
+            p.init_index(max_elements=num_elements, ef_construction=100, M=16)
 
             # Controlling the recall by setting ef:
             # higher ef leads to better accuracy, but slower search
             p.set_ef(100)
 
             p.set_num_threads(4) # by default using all available cores
 
             # We split the data in two batches:
             data1 = data[:num_elements // 2]
             data2 = data[num_elements // 2:]
 
             print("Adding first batch of %d elements" % (len(data1)))
             p.add_items(data1)
 
             # Query the elements for themselves and measure recall:
             labels, distances = p.knn_query(data1, k=1)
 
             items=p.get_items(labels)
 
             # Check the recall:
-            self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))),1.0,3)
+            self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data1))), 1.0, 3)
 
             # Check that the returned element data is correct:
             diff_with_gt_labels=np.mean(np.abs(data1-items))
-            self.assertAlmostEqual(diff_with_gt_labels, 0, delta = 1e-4)
+            self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4)
 
             # Serializing and deleting the index.
             # We need the part to check that serialization is working properly.
 
             index_path = 'first_half.bin'
             print("Saving index to '%s'" % index_path)
             p.save_index(index_path)
             print("Saved. Deleting...")
             del p
             print("Deleted")
 
             print("\n**** Mark delete test ****\n")
             # Reiniting, loading the index
             print("Reiniting")
             p = hnswlib.Index(space='l2', dim=dim)
 
             print("\nLoading index from '%s'\n" % index_path)
             p.load_index(index_path)
             p.set_ef(100)
 
             print("Adding the second batch of %d elements" % (len(data2)))
             p.add_items(data2)
 
             # Query the elements for themselves and measure recall:
             labels, distances = p.knn_query(data, k=1)
             items=p.get_items(labels)
 
             # Check the recall:
-            self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))),1.0,3)
+            self.assertAlmostEqual(np.mean(labels.reshape(-1) == np.arange(len(data))), 1.0, 3)
 
             # Check that the returned element data is correct:
             diff_with_gt_labels=np.mean(np.abs(data-items))
-            self.assertAlmostEqual(diff_with_gt_labels, 0, delta = 1e-4) # deleting index.
+            self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-4) # deleting index.
 
             # Checking that all labels are returned correctly:
             sorted_labels=sorted(p.get_ids_list())
-            self.assertEqual(np.sum(~np.asarray(sorted_labels)==np.asarray(range(num_elements))),0)
+            self.assertEqual(np.sum(~np.asarray(sorted_labels) == np.asarray(range(num_elements))), 0)
 
             # Delete data1
             labels1, _ = p.knn_query(data1, k=1)
 
             for l in labels1:
                 p.mark_deleted(l[0])
             labels2, _ = p.knn_query(data2, k=1)
             items=p.get_items(labels2)
-            diff_with_gt_labels=np.mean(np.abs(data2-items))
-            self.assertAlmostEqual(diff_with_gt_labels, 0, delta = 1e-3) # console
-
+            diff_with_gt_labels = np.mean(np.abs(data2-items))
+            self.assertAlmostEqual(diff_with_gt_labels, 0, delta=1e-3) # console
 
             labels1_after, _ = p.knn_query(data1, k=1)
             for la in labels1_after:
                 for lb in labels1:
                     if la[0] == lb[0]:
                         self.assertTrue(False)
             print("All the data in data1 are removed")
-            import hnswlib
 
             # checking saving/loading index with elements marked as deleted
             del_index_path = "with_deleted.bin"
             p.save_index(del_index_path)
             p = hnswlib.Index(space='l2', dim=dim)
             p.load_index(del_index_path)
             p.set_ef(100)
 
             labels1_after, _ = p.knn_query(data1, k=1)
             for la in labels1_after:
                 for lb in labels1:
                     if la[0] == lb[0]:
                         self.assertTrue(False)
 
         os.remove(index_path)
         os.remove(del_index_path)
-
-
-if __name__ == "__main__":
-    unittest.main()