Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect results in bruteforce with filter #514

Merged
merged 3 commits into from
Dec 11, 2023

Conversation

lukaszsmolinski
Copy link

There is a bug in the searchKnn function of BruteforceSearch. When it is used with a filter, in some cases, it returns fewer items than expected. This PR should fix it.

The bug can be reproduced using the following code:

import hnswlib

data = [[0], [1], [2]]
max_elements = len(data)
dim = len(data[0])

bf_index = hnswlib.BFIndex(space='l2', dim=dim)
bf_index.init_index(max_elements=max_elements)
bf_index.add_items(data, ids=range(max_elements))

filter_function = lambda id: id in [0, 2]
labels, _ = bf_index.knn_query(data[0], k=2, filter=filter_function)

print(labels)
# Should be: [[0 2]]
# Is: [[0 0]]

@dyashuni
Copy link
Contributor

@lukaszsmolinski Thank you for the PR!
Good catch. Looks good.

@dyashuni
Copy link
Contributor

For some reason, this code

std::priority_queue<std::pair<dist_t, hnswlib::labeltype >> result = alg->searchKnn(
    (void*)items.data(row), k, p_idFilter);
for (int i = k - 1; i >= 0; i--) {
    auto& result_tuple = result.top();
    data_numpy_d[row * k + i] = result_tuple.first;
    data_numpy_l[row * k + i] = result_tuple.second;
    result.pop();
}

from here works even if result.size() < k.

Could you please add an exception if result.size() != k to prevent unexpected results from being returned?
Something like this

if (result.size() != k)
  throw std::runtime_error(
      "Cannot return the results in a contiguous 2D array. Probably ef or M is too small");

from here

@lukaszsmolinski
Copy link
Author

Done. By the way, it seems like this function is also missing the if (normalize) check. I can fix that in this PR if you'd like.

@dyashuni
Copy link
Contributor

dyashuni commented Nov 1, 2023

Thank you, @lukaszsmolinski !
Yes, please add query normalization to BFIndex search to align it with Index

Copy link
Contributor

@dyashuni dyashuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@yurymalkov yurymalkov merged commit dbcef01 into nmslib:develop Dec 11, 2023
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants