Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update hnswalg.h #324

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Update hnswalg.h #324

wants to merge 2 commits into from

Conversation

zeraph6
Copy link

@zeraph6 zeraph6 commented Jul 9, 2021

Collect_metrics@searchBaseLayer : We should not increment metric_distance_computation with the size of neighbors, since there may be some neighbors already visited.

yurymalkov and others added 2 commits June 29, 2021 22:17
Merge 0.5.2 changes into master
Collect_metrics@searchBaseLayer  : We should not increment metric_distance_computation with the size of neighbors, since there may be some neighbors already visited.
@yurymalkov
Copy link
Member

Hi @zeraph6. Thanks for the PR!
I wonder, have you tested if it affects the performance (e.g. due to doing branching on every step)?

@zeraph6
Copy link
Author

zeraph6 commented Jul 11, 2021 via email

@zeraph6
Copy link
Author

zeraph6 commented Jul 11, 2021

But the difference in the number of distance calculations is important (~5%)

@yurymalkov
Copy link
Member

Got it! I'll test it on small dim data and let you know.
Also, I wonder, were you using the statistics in the C++ interface?

@zeraph6
Copy link
Author

zeraph6 commented Jul 11, 2021 via email

@yurymalkov yurymalkov changed the base branch from master to develop August 1, 2021 02:29
@yurymalkov
Copy link
Member

@zeraph6 I've finally tested the change. Sorry for big delay. The change actually leads to a drop of performance for multi-threaded search (up to 2X on low-dim data). This is probably due to more frequent access to a shared variable which causes excessive locking.

There are several solutions for that, but the best is probably to accumulate metrics locally and increment them once search.
I wonder, can you update the PR with adding local increments (or a similar solution)?

@yurymalkov
Copy link
Member

I've used this code (python that calls C++ code):

import hnswlib
import numpy as np
import pickle
import time

dim = 4

# Generating sample data
for _ in range(10):
    print()
    for num_elements in[10000,100000,1000000]:
        for t in [1,24]:
            
            data = np.float32(np.random.random((num_elements, dim)))
            ids = np.arange(num_elements)

            # Declaring index
            p = hnswlib.Index(space = 'l2', dim = dim) # possible options are l2, cosine or ip

            # Initializing index - the maximum number of elements should be known beforehand
            p.init_index(max_elements = num_elements*2, ef_construction = 200, M = 16)

            # Element insertion (can be called several times):
            p.add_items(data)

            # Controlling the recall by setting ef:
            p.set_ef(50) # ef should always be > k

 
            t0=time.time()

            p.set_num_threads(t)
            labels, distances = p.knn_query(data, k = 1)
            print(f"num_elements: {num_elements:10}, threads:{t:2}, time:{time.time()-t0}")
        print()

For baseline it gives (on a 3900X):

num_elements:      10000, threads:24, time:0.02010512351989746

num_elements:     100000, threads: 1, time:1.32535719871521
num_elements:     100000, threads:24, time:0.21545171737670898

num_elements:    1000000, threads: 1, time:21.12733006477356
num_elements:    1000000, threads:24, time:2.674464464187622

For tested changes it gives:

num_elements:      10000, threads:24, time:0.044780731201171875

num_elements:     100000, threads: 1, time:1.1950428485870361
num_elements:     100000, threads:24, time:0.5131497383117676

num_elements:    1000000, threads: 1, time:21.29676127433777
num_elements:    1000000, threads:24, time:5.72478461265564

Let me know if plan to update the PR (if not I'll do this, but later)

@zeraph6
Copy link
Author

zeraph6 commented Aug 2, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants