Fix heap buffer overflow caused by prefetch. #459
The prefetching goes past `size` since `datal` is initialized 1 index past `data`. I uncovered this issue with ASAN when SSE is enabled. It was also spotted earlier here: nmslib#107
hnswlib/hnswalg.h
- _mm_prefetch((char *) (visited_array + *(datal + j + 1)), _MM_HINT_T0);
- _mm_prefetch(getDataByInternalId(*(datal + j + 1)), _MM_HINT_T0);
+ _mm_prefetch((char *) (visited_array + *(datal + j)), _MM_HINT_T0);
+ _mm_prefetch(getDataByInternalId(*(datal + j)), _MM_HINT_T0);
I think this fix defeats the purpose of the prefetch, since the prefetch needs to be issued for the next element.
Is there any evidence that this prefetch affects the program's behavior?
If so, we should probably come up with a different fix, e.g. min(j+1, max_value).
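To make the suggested clamp concrete, here is a minimal sketch. It is not the hnswlib code: `visit_neighbors`, the `PREFETCH_T0` macro, and the vector-based neighbor list are hypothetical stand-ins, and the macro compiles to a no-op when SSE is unavailable. The point is only that clamping the look-ahead index keeps the prefetch on the next element while never reading past the end of the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define PREFETCH_T0(p) _mm_prefetch(reinterpret_cast<const char *>(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)(p))  // no-op fallback on targets without SSE
#endif

// Hypothetical stand-in for a neighbor list: ids[0..size-1] index into `visited`.
// Marks unvisited neighbors and returns how many were newly marked.
std::size_t visit_neighbors(const std::vector<unsigned> &ids,
                            std::vector<unsigned char> &visited) {
    const std::size_t size = ids.size();
    std::size_t newly_visited = 0;
    for (std::size_t j = 0; j < size; ++j) {
        assert(ids[j] < visited.size());
        // Clamp the look-ahead: on the last iteration this points at the
        // current entry again instead of reading ids[size].
        const std::size_t next = std::min(j + 1, size - 1);
        PREFETCH_T0(visited.data() + ids[next]);
        if (!visited[ids[j]]) {
            visited[ids[j]] = 1;
            ++newly_visited;
        }
    }
    return newly_visited;
}
```

The cost of the clamp is one redundant prefetch of an already-touched address on the final iteration; whether it changes measured performance would have to be confirmed by a benchmark.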
My understanding is that a cache line is at least 64 bytes, so does the extra +1 really matter? What matters is that the loop keeps moving forward as j increments, so we keep fetching ahead. We should be able to confirm this with a benchmark; I don't expect any degradation in performance.
There are actually a couple of other places the same issue happens, which I just noticed:
- https://github.com/nmslib/hnswlib/blob/master/hnswlib/hnswalg.h#L346
- https://github.com/nmslib/hnswlib/blob/master/hnswlib/hnswalg.h#L1010
(Edit: updated PR to cover these as well)
> Is there any evidence this prefetch affects the program behavior?
There is a buffer overflow here: the code reads past the end of the array. Whether that causes a segfault will depend on the hardware.
Also, I think the proper way to actually use _mm_prefetch is to fetch a full 64 bytes ahead, for example how it's done here by fetching "8 doubles ahead":
But this is quite tricky to implement and test because modern processors have gotten really good at prefetching without any hints.
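As an illustration of the "8 doubles ahead" idea for a purely sequential scan, here is a minimal sketch. It is not the code the comment links to; `sum_with_prefetch`, `PREFETCH_T0`, and the assumption of a 64-byte cache line are all illustrative.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define PREFETCH_T0(p) _mm_prefetch(reinterpret_cast<const char *>(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)(p))  // no-op fallback on targets without SSE
#endif

// Sums n doubles while hinting the cache one (assumed 64-byte) line ahead.
double sum_with_prefetch(const double *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    constexpr std::size_t kAhead = 8;  // 8 doubles * 8 bytes = 64 bytes
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Only form the look-ahead address while it stays inside the array.
        if (i + kAhead < n) {
            PREFETCH_T0(data + i + kAhead);
        }
        total += data[i];
    }
    return total;
}
```

For a linear scan like this the hardware prefetcher usually does the job already, which is the difficulty the comment describes; the pattern matters more when the next address is not predictable from the current one.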
_mm_prefetch((char *) (visited_array + *(datal + j + 1)), _MM_HINT_T0);
The +1 fetches the visited bit for the next item, which can be anywhere in memory relative to the previous element. It is almost certainly in a different cache line and page, and at least Skylake CPUs were struggling without the prefetches; that is why it ended up in the code despite the obvious issue.
Yeah, it would be great if you could profile the performance on a few different x64 platforms, maybe improve the prefetching performance, and hopefully fix this out-of-bounds issue that has made people uneasy.
I think the proper way to test the performance is to run search on low-dimensional data to focus on memory sparsity (I used d=4) and to keep two implementations of the search function (with and without the change), so that the result does not depend on random memory layout changes in the data. That should allow measuring the performance to within 1-2%.
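A rough sketch of the "two implementations side by side" setup described above. This is not the benchmark used in this PR and it does not call hnswlib; `gather_plain`, `gather_prefetch`, `best_ms`, and `make_indices` are hypothetical names, and the random gather is only a stand-in for the pointer-chasing access pattern of the search loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define PREFETCH_T0(p) _mm_prefetch(reinterpret_cast<const char *>(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)(p))  // no-op fallback on targets without SSE
#endif

// Variant A: indirect reads with no prefetch hint.
std::uint64_t gather_plain(const std::vector<std::uint32_t> &idx,
                           const std::vector<std::uint32_t> &vals) {
    std::uint64_t sum = 0;
    for (std::size_t j = 0; j < idx.size(); ++j) sum += vals[idx[j]];
    return sum;
}

// Variant B: same work, prefetching the next indirect target when one exists.
std::uint64_t gather_prefetch(const std::vector<std::uint32_t> &idx,
                              const std::vector<std::uint32_t> &vals) {
    std::uint64_t sum = 0;
    const std::size_t n = idx.size();
    for (std::size_t j = 0; j < n; ++j) {
        if (j + 1 < n) PREFETCH_T0(vals.data() + idx[j + 1]);
        sum += vals[idx[j]];
    }
    return sum;
}

// Runs fn `repeats` times, stores the last result, returns the best time in ms.
template <typename F>
double best_ms(F fn, int repeats, std::uint64_t &result) {
    double best = 1e300;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        result = fn();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) best = ms;
    }
    return best;
}

// Builds `count` pseudo-random indices in [0, range) from a fixed seed.
std::vector<std::uint32_t> make_indices(std::size_t count, std::uint32_t range,
                                        unsigned seed) {
    assert(range > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::uint32_t> dist(0, range - 1);
    std::vector<std::uint32_t> idx(count);
    for (auto &i : idx) i = dist(rng);
    return idx;
}
```

Both variants run in the same process over the same `idx` and `vals`, so a difference in `best_ms` reflects the prefetch rather than a change in data layout; comparing the two returned sums guards against benchmarking a variant that silently computes something else.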
Ah, sorry, I missed the extra pointer dereference in the code and assumed a sequential lookup. I benchmarked with and without prefetching by searching over 1M 4-dimensional points.
So, I've now updated the PR to use
Hi @kishorenc,
@kishorenc can you please share the benchmarking code?
Here's the code for the benchmark. I just ran this with/without the changes in the PR and compared.
Yes, those are out-of-bounds as well, but I didn't address them because I was not sure what their actual sizes were. In the places that I've changed, there was a clear
cc @dyashuni