
Efficient AVX512 implementation in 'InnerProductSIMD16ExtAVX512' Function #475

Merged
merged 2 commits on Jul 10, 2023

Conversation

aurora327
Contributor

The InnerProductSIMD16ExtAVX512 function is reimplemented using more efficient AVX512 instructions.

…on consider the size of a Vector that is not divisible by 4
@aurora327
Contributor Author

Hi @yurymalkov, can you please review the code?

@yurymalkov
Member

Hi @aurora327,

Thank you for the PR! I am slow to respond currently due to sickness, sorry.

I wonder, how much improvement did you see in the tests with the better implementation?

@aurora327
Contributor Author

Hi @yurymalkov,
The gain depends on the sizes of vector1 and vector2 passed in. On my own dataset, built on a 4th Generation Intel® Xeon® processor with 4 cores bound, I measured a 2% to 10% end-to-end improvement.
I hope you feel better soon :)

@yurymalkov
Member

Thanks again for the PR! I've also checked the query performance, it is up to 15% for 16-dim.
I also wonder if aligned/unaligned memory makes a difference for the current architectures?

@yurymalkov yurymalkov merged commit f30b6e1 into nmslib:develop Jul 10, 2023
@aurora327
Contributor Author

No difference was observed, since the AVX512 unaligned-load instructions used here handle both aligned and unaligned data well. That said, an aligned buffer is still highly recommended where possible.
