
Efficient AVX512 implementation in 'InnerProductSIMD16ExtAVX512' Function #475

Merged
merged 2 commits on Jul 10, 2023

Conversation

aurora327
Contributor

The InnerProductSIMD16ExtAVX512 function is reimplemented using more efficient AVX512 instructions.

…on consider the size of a Vector that is not divisible by 4
@aurora327
Contributor Author

Hi @yurymalkov, can you please review the code?

@yurymalkov
Member

Hi @aurora327,

Thank you for the PR! I am slow to respond currently due to sickness, sorry.

I wonder, how much improvement did you see in the tests with the better implementation?

@aurora327
Contributor Author

Hi @yurymalkov,
The gain depends on the sizes of vector1 and vector2 passed in. On my own dataset, built on a 4th Generation Intel® Xeon® processor with 4 cores bound, I measured a 2% to 10% end-to-end improvement.
I hope you feel better soon :)

@yurymalkov
Member

Thanks again for the PR! I've also checked the query performance, it is up to 15% for 16-dim.
I also wonder if aligned/unaligned memory makes a difference for the current architectures?

@yurymalkov yurymalkov merged commit f30b6e1 into nmslib:develop Jul 10, 2023
@aurora327
Contributor Author

No difference was observed, since the AVX512 unaligned-load instructions used here handle both aligned and unaligned data well. That said, an aligned buffer is still highly recommended where possible.
