Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big value of k and ef #96

Open
realsergii opened this issue Mar 6, 2019 · 2 comments
Open

Big value of k and ef #96

realsergii opened this issue Mar 6, 2019 · 2 comments

Comments

@realsergii
Copy link

Hello and thanks for this great product!
In my system I expect k to be around 10 usually, and ef, let's say 50. But at some point I expect to get items with very similar l2 distance (e.g. 0.1-0.3) and in that case I will need to retrieve ALL of them. So I can dynamically increase k, and ef (to always be more than k), and re-query until I start to get items with l2 distance more than I need (e.g. > 0.3).
My concern is, what if k will become 1000, or 10000 (and ef still > k)? Will hnswlib still perform optimally? Or maybe such algorithms are nor designed to work with such big k (and also frequent changes of ef)?

@yurymalkov
Copy link
Member

Hi @realsergii ,

So, you need epsilon-Nearest Neighbor search, not K-nearest neighbors. Right?
Hnswlib can operate with large number of K (i.e. it should be optimal), but at current implementation it relies on the knowledge that of K beforehand.

The search algorithm can be modified for epsilon-Nearest Neighbor search. It would be a nice feature if someone implements it :)

Also, ef is overrided to K is ef<K inside hnswlib.

@realsergii
Copy link
Author

realsergii commented Mar 6, 2019

Thanks @yurymalkov.

but at current implementation it relies on the knowledge that of K beforehand

If you mean that k is not calculated based on some conditions, I get it.

Do you have metrics of search time depending on k? Same search but with k = 1 ... x ... number of items. I feel that for my case, if the search speed is pretty much the same, I can apply the approach I described in the first post - to get the items in some distance range. If not, I should seek for decent implementation of epsilon ball NN (though after quick googling I didn't find one :) )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants