Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmap support and other features #227

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

alpinejoe
Copy link

  • mmap support

  • Accepts pandas Series values without recasting

  • Support for non-integer labels (labeltype is a template parameter)

  • Index is stored differently in terms of how element_levels_ were stored. Added magic number and version to index files to check.

  • Other miscellaneous refactoring

  • _mm_prefetch calls were being identified as memory leaks so they were removed.

@yurymalkov
Copy link
Member

@alpinejoe Thank you! It would take some time to review this PR.
I wonder, did you check that performance didn't go down as _mm_prefetch were removed?

@alpinejoe
Copy link
Author

I have not.
From what I've read so far, those calls hint the processor to fetch the data to the cache. And by the looks of it, the call is made for the next iteration in the beginning of the current iteration. I'm assuming that's done so the relevant memory blocks could be fetched in parallel while the ALU is in busy with the current iteration. But that also means the last iteration of the loop fetches arbitrary memory.

I wonder if adding a condition or moving those instructions to another loop would give an equal performance.

Let me know if you want me to revert that change as-is.

@yurymalkov
Copy link
Member

@alpinejoe Thank you! I'll run the performance tests and will let you know the results.

@yurymalkov
Copy link
Member

@alpinejoe I've started testing the branch. Some results should be overnight.
I've noticed the python interface have changed, specifically, the output is not numpy, but python.
I wonder, what is the rationale for such behavior?

@yurymalkov
Copy link
Member

Hm. So far is seems to be about 40% slower...

@yurymalkov
Copy link
Member

It is overall slower. I doubt this can be explained only by prefetches.
What do you think might be the other reasons for the slowdown?

@alpinejoe
Copy link
Author

I've noticed the python interface have changed, specifically, the output is not numpy, but python.

You mean output of knn_query? It should still be a numpy array, although it's a structured array now. This allows the python interface to create a numpy buffer and pass the reference to the algorithm, avoiding copying from vector/priority queue to numpy array.
You can still extract the label/distance in the traditional python [] syntax or use a view like this:

dtype = np.dtype({
    'names': ['label'],
    'formats': ['<u8'],
    'offsets': [8],
    'itemsize': 16
})
labels = knn_results.view(dtype=dtype)  # labels is a non-contiguous 2D array

I have also done other memory optimizations, but can't imagine any of them making things slow by 40%.

Running the python tests was about 3x faster for develop vs master, even though develop tests include additional mmap tests. What are the performance tests you ran? Was it SIFT 200M? If not, can you point me to a script/gist?

@yurymalkov
Copy link
Member

@alpinejoe I ran ann benchmark.
But probably 200M sift would be a better test (1M subset should be more than enough), unless the slowdown is caused by python bindings. I can check the slowdown there.

@yurymalkov
Copy link
Member

Hm. The sift 200m does not seem to work on the updated branch... Maybe it makes sense to just do random a data test using the python interface. ann benchmarks will probably do, but they are slow to test. I can share a faster script.

- mmap support
- Accepts pandas Series values without recasting
- Support for non-integer labels (labeltype is a template parameter)
- Index is stored differently in terms of how element_levels_ were stored. Added magic number and version to index files to check.
- Other miscellaneous refactoring

- _mm_prefetch calls were being identified as memory leaks so they were removed.
@alpinejoe
Copy link
Author

@yurymalkov you should be able to run the sift tests now.
Absence of _mm_prefetch calls were indeed causing a slow-down, so I've added them back. The performance jump is worth the read buffer-overflow I guess. Last few lines of the sift tests:

master branch:

300	0.9965	819.016 us
340	0.9976	921.286 us
380	0.9979	988.705 us
420	0.9981	1080.55 us
460	0.9984	1176.99 us
Actual memory usage: 428 Mb 

develop without _mm_prefetch:

300	0.9965	960.721 us
340	0.9971	1071.55 us
380	0.9975	1171.02 us
420	0.9977	1290.54 us
460	0.9979	1400.13 us
Actual memory usage: 423 Mb 

develop with _mm_prefetch:

300	0.9963	774.196 us
340	0.9971	867.445 us
380	0.9976	953.785 us
420	0.9979	1041.6 us
460	0.9983	1125.08 us
Actual memory usage: 423 Mb 

Let me know if you have that faster script ready, I can try running that as well.

@searchivarius
Copy link
Member

@alpinejoe and @yurymalkov could anybody point me to the exact point where buffer overrun is happening?

@yurymalkov
Copy link
Member

@alpinejoe Ok. Thanks! I would need to find and revive that script.
@searchivarius The code prefetching the next elements before the current ones and also does it for the last element in the loop, so it can go out of bounds of the memory.
It does not seem like a big deal though, potentially it can lead to loss of performance (https://stackoverflow.com/questions/40128151/what-happens-if-an-invalid-address-is-prefetched) but not sure if it the case here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants