mmap support and other features #227
base: develop
@alpinejoe Thank you! It will take some time to review this PR.
I have not. I wonder whether adding a condition, or moving those instructions to another loop, would give equal performance. Let me know if you want me to revert that change as-is.
@alpinejoe Thank you! I'll run the performance tests and let you know the results.
@alpinejoe I've started testing the branch. Some results should be in overnight.
Hm. So far it seems to be about 40% slower...
It is slower overall. I doubt this can be explained by the prefetches alone.
You mean output of knn_query? It should still be a numpy array, although it's a structured array now. This allows the python interface to create a numpy buffer and pass the reference to the algorithm, avoiding copying from vector/priority queue to numpy array.
I have also made other memory optimizations, but I can't imagine any of them slowing things down by 40%. Running the Python tests was about 3x faster on develop than on master, even though the develop tests include additional mmap tests. Which performance tests did you run? Was it SIFT 200M? If not, can you point me to a script or gist?
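To illustrate the structured-array output described above, here is a minimal NumPy sketch. It is not the code from this PR: the field names (`label`, `distance`), their dtypes, and the `fill_neighbors` helper are assumptions made for illustration. It only shows the general idea of allocating the result buffer on the Python side and letting the search write into it in place.

```python
import numpy as np

# Hypothetical record layout: one (label, distance) pair per neighbor.
# Field names and dtypes are assumptions, not taken from the PR.
NEIGHBOR_DTYPE = np.dtype([("label", np.int64), ("distance", np.float32)])


def fill_neighbors(out, query_index, neighbors):
    """Stand-in for the native search: write results straight into `out`."""
    for slot, (label, distance) in enumerate(neighbors):
        out[query_index, slot] = (label, distance)


num_queries, k = 2, 3
# The caller allocates the buffer once; the search fills it in place, so
# nothing has to be copied out of a vector or priority queue afterwards.
result = np.zeros((num_queries, k), dtype=NEIGHBOR_DTYPE)
fill_neighbors(result, 0, [(7, 0.1), (3, 0.4), (9, 0.8)])
fill_neighbors(result, 1, [(2, 0.2), (5, 0.3), (1, 0.9)])

labels = result["label"]        # view into the same memory, shape (2, 3)
distances = result["distance"]  # view into the same memory, shape (2, 3)
```

Accessing a field of a structured array returns a view rather than a copy, so callers that want separate label and distance arrays can still get them without extra allocation.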
@alpinejoe I ran ann-benchmarks.
Hm. The SIFT 200M test does not seem to work on the updated branch... Maybe it makes sense to just do a random data test using the Python interface. ann-benchmarks will probably do, but it is slow to run. I can share a faster script.
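A random data test of the kind suggested above could look roughly like the sketch below. This is not the script mentioned in the thread; the function names, default sizes, and the `query_fn(data, queries, k)` callback convention are all assumptions. The harness generates random vectors, computes exact neighbors by brute force, and reports recall and wall-clock time for whatever search function is passed in.

```python
import time

import numpy as np


def brute_force_knn(data, queries, k):
    """Exact k nearest neighbors by squared L2 distance."""
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * (queries @ data.T)
        + (data ** 2).sum(axis=1)[None, :]
    )
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(found, truth):
    """Fraction of true neighbors that also appear in `found`."""
    hits = sum(
        len(set(f.tolist()) & set(t.tolist())) for f, t in zip(found, truth)
    )
    return hits / truth.size


def run_random_test(query_fn, num_elements=2000, num_queries=100,
                    dim=16, k=10, seed=0):
    """Time query_fn(data, queries, k) and compare it to the exact answer.

    query_fn is expected to build its index from `data` and return an
    array of neighbor row indices with shape (num_queries, k), so the
    measured time covers both construction and search.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((num_elements, dim), dtype=np.float32)
    queries = rng.random((num_queries, dim), dtype=np.float32)
    truth = brute_force_knn(data, queries, k)

    start = time.perf_counter()
    found = np.asarray(query_fn(data, queries, k))
    elapsed = time.perf_counter() - start
    return recall_at_k(found, truth), elapsed
```

To compare branches, one would wrap each build's index construction and `knn_query` call in a small function with this signature and run the harness once per build with the same seed.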
- mmap support
- Accepts pandas Series values without recasting
- Support for non-integer labels (labeltype is a template parameter)
- Index is stored differently in terms of how element_levels_ were stored. Added magic number and version to index files to check.
- Other miscellaneous refactoring
- _mm_prefetch calls were being identified as memory leaks so they were removed.
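The magic number and version check mentioned in the list above can be sketched as follows. The actual values and on-disk layout used by this PR are not given in the thread, so `MAGIC`, `VERSION`, and the two-field little-endian header here are purely hypothetical. The sketch only shows the general pattern of validating a file before trying to load it as an index.

```python
import io
import struct

# Hypothetical values: the real magic number and version are not stated
# anywhere in this thread.
MAGIC = 0x484E5357
VERSION = 1
HEADER = struct.Struct("<II")  # little-endian: magic, then version


def write_header(stream):
    """Write the fixed-size header at the current stream position."""
    stream.write(HEADER.pack(MAGIC, VERSION))


def check_header(stream):
    """Read and validate the header; raise ValueError on any mismatch."""
    raw = stream.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("file too short to contain an index header")
    magic, version = HEADER.unpack(raw)
    if magic != MAGIC:
        raise ValueError("not an index file (bad magic number)")
    if version != VERSION:
        raise ValueError(f"unsupported index version {version}")
    return version


buf = io.BytesIO()
write_header(buf)
buf.seek(0)
loaded_version = check_header(buf)
```

A check like this lets a loader reject files written in the older layout with a clear error instead of misreading their contents.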
@yurymalkov you should be able to run the sift tests now. master branch:
develop without _mm_prefetch:
develop with _mm_prefetch:
Let me know if you have that faster script ready; I can try running that as well.
@alpinejoe and @yurymalkov could either of you point me to the exact place where the buffer overrun is happening?
@alpinejoe Ok. Thanks! I will need to find and revive that script.