-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support DataLevel0BlocksMemory data struct #577
base: develop
Are you sure you want to change the base?
Conversation
…memory doubling problem
Thank you for the update! I wonder if you've looked into if the speed is the same before and after the code change for flat memory? |
Ok, I'll update the test results later. |
Git commit:
Git commit:
For the accuracy of the test, the above set of test data is the average value taken after I tested three sets of data. I don't think adding an if judgment will significantly affect performance, because the cost of the if judgment is negligible relative to the computation of the vector itself. |
58b7344
to
2a4cab5
Compare
@yurymalkov Please take a look at the performance test data here |
Thank you so much for the test! After discussions with @dyashuni we think 1-1.5% performance hit should be fine for now (an alternative it to make it a template, but it might slow down the compilation and install time for the pip package). I am going to review the PR in much more detail (one thing that now is evident is different style of naming compared to the library). I also wonder if you can implement automatic allocation of a new block when the new element is added, but there is no more room, I think that is something many people asked for. I guess that is adding synchronization lock logic and might be even just calling resize at https://github.dev/Axlgrep/hnswlib/blob/2a4cab5dce364587b1f9c64dcd7532c4f47f9f24/hnswlib/hnswalg.h#L1391-L1393 (I guess that would require a global lock and a counter on the unfinished ops or a read-write lock). |
// load hnsw index | ||
hnswlib::AlgorithmInterface<float>* alg_hnsw_second = new hnswlib::HierarchicalNSW<float>(&space, false, | ||
0, allow_replace_deleted, hnsw_second_use_blocks_memory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we don't load index. We initialize empty index (max_elements=0). I think something is missing here
for (size_t i = 0; i < total_items; ++i) { | ||
alg_hnsw_first->addPoint(data.data() + dimension * i, i); | ||
} | ||
check_knn_closer(alg_hnsw_first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what we are supposed to check here. searchKnnCloserFirst
only calls searchKnn
and rearranges the found elements."
const size_t dimension = 1024; | ||
const size_t total_items = 100 * 10000; | ||
const size_t num_query = 500 * 10000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With such parameters, it takes a lot of time to run the test.
|
||
} // namespace | ||
|
||
int main() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a test to check the resizeIndex
function as well?
For example, we can call resizeIndex
multiple times and then verify that the recall is high and all elements' data is valid in the index.
@Axlgrep thank you for PR! |
support DataLevel0BlocksMemory data struct to solve the realloc huge memory doubling problem
Performance
Environment
All of the benchmarks are run on the same Tencent instance. Here are the details of the test setup:
Test data
Thread Num: 48
We constructed (randomly generated) 1 million data(vector dimension: 1024) for testing.
addPoint Performance
searchKnn Performance
@yurymalkov Please take a look at this MR, thank you...