Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bruteforce] Fix bruteforce removePoint. #473

Merged
merged 2 commits into from
Jul 4, 2023

Conversation

ttsugriy
Copy link
Contributor

Previous implementation was not thread-safe - unlike addPoint it didn't acquire an index lock.

While at it, this change also switches to .find() for cur_c lookups, so that following erase wouldn't have to perform another lookup.

P.S.: I don't see how addPoint is thread-safe as well, since write to data_ is performed outside of critical section, but oh well...

@yurymalkov
Copy link
Member

@ttsugriy
Thank you!
Concerning the writing to data_ - agree. We probably need to move it inside.

@yurymalkov yurymalkov merged commit e7c66c8 into nmslib:develop Jul 4, 2023
@ttsugriy ttsugriy deleted the develop branch July 5, 2023 00:48
@ttsugriy
Copy link
Contributor Author

ttsugriy commented Jul 5, 2023

sounds good, @yurymalkov. Btw, unrelated to this change - have you considered using clangformat? Current formatting is somewhat inconsistent and makes reading code a bit harder. Also, if perf is important, I'd recommend considering moving away from interfaces to templates. Using interfaces for distance calculation and other strategy hooks prevents inlining, which even on its own adds overhead and branch mispredictions, but more importantly it prevents all useful compiler optimizations that rely on inlining.

@yurymalkov
Copy link
Member

Hi @ttsugriy,
I am not into strict formatting myself. I think @dyashuni did some formatting some time ago, maybe he has opinions on that.

As for interfaces/function pointers/templates I think I've tested inlining the distance computation manually long time ago and, as far as I recall, there was little if any penalty even for small dimensionalities. There is also a problem that there are several distance implementations depending on the vector length, if those are added as templates, I think the compile time will probably blow up.

@ttsugriy
Copy link
Contributor Author

ttsugriy commented Jul 5, 2023

I am not into strict formatting myself.

your call :)

I think I've tested inlining the distance computation manually long time ago and, as far as I recall, there was little if any penalty even for small dimensionalities.

This is super surprising since this overhead matters a lot even for much less hot code paths than distance computation. The only explanation for such results I can come up with is that the testing was performed using some terrible compiler, maybe visual studio?

There is also a problem that there are several distance implementations depending on the vector length, if those are added as templates, I think the compile time will probably blow up.

Adding templates would most definitely add some compile time overhead, but its cost is O(1) in terms of number of possible implementations - it's only O(N) in terms of the number of implementations actually used during compilation, which is likely to be just 1. In any case, I trust your judgement :)

@yurymalkov
Copy link
Member

Got it! I can check if it affects the performance now.
The compile time now matters for python, as every time people do pip install it compiles from scratch. We can fix it with pre-compiled binaries at some point though

@dyashuni
Copy link
Contributor

@ttsugriy for formatting I used https://github.com/cpplint/cpplint. Probably I missed some parts of the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants