Release v0.8.0 #523

Merged: 52 commits into master on Dec 3, 2023

yurymalkov (Member) commented on Nov 19, 2023:

Highlights:

  • Multi-vector document search and epsilon search (for now, only in C++)
  • Statistics aggregation is now disabled by default, which speeds up multi-threaded search (it does not seem to be widely used anyway; see issue #495).
  • Various bugfixes and improvements
  • get_items now has a return_type parameter, which can be either 'numpy' or 'list'.
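Epsilon search returns every point whose distance to the query falls within a radius epsilon, instead of a fixed number k of neighbors. The brute-force sketch below only illustrates those semantics under stated assumptions: the function names, the squared-L2 metric, and the (distance, id) result layout are hypothetical, and this is not hnswlib's graph-based implementation or API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Squared Euclidean distance between two vectors of equal dimension.
float l2_sqr(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical brute-force epsilon search: returns (distance, id) pairs for
// every point within `epsilon` (in squared-L2 units), nearest first.
std::vector<std::pair<float, std::size_t>> epsilon_search(
        const std::vector<std::vector<float>>& points,
        const std::vector<float>& query, float epsilon) {
    std::vector<std::pair<float, std::size_t>> result;
    for (std::size_t id = 0; id < points.size(); ++id) {
        float dist = l2_sqr(points[id], query);
        if (dist <= epsilon) result.emplace_back(dist, id);
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

The result size depends on the data: a query far from every point returns an empty vector rather than k poor matches.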

Huge thanks to the contributors:

  1. Dmitry Yashunin (dyashuni) - Led the addition of a custom stop condition, multi-vector search, and epsilon search, along with various code refinements and bug fixes.
  2. Alexander Vieth (alxvth) - Helped resolve global linkage issues and improved the macOS setup in CI.
  3. Taras Tsugrii (ttsugriy) - Fixed build warnings and the brute-force removePoint, and improved memory management.
  4. aurora327 - Implemented a more efficient AVX512 version of InnerProductSIMD16ExtAVX512.
  5. Étienne Mollier (emollier) - Added a cap on the 'M' parameter and fixed a typo.
  6. Johan Rade (jrade) - Addressed a linking error when compiling with Visual Studio.
  7. James Melville (jlmelville) - Addressed a member-reordering warning and a sign mismatch in loops, and standardized error output with an overridable macro (details in #508, "Provide a macro to override the use of std::cerr").
  8. stephematician - Resolved an initialization order warning in GNU compilers.
  9. moritz-h - Added CMake install targets and set CMake version range.
  10. drons - Implemented functions for precise memory footprint control and fixed memory leaks.
  11. Atsushi Tatsuma (yoshoku) - Added a missing virtual destructor to BaseFilterFunctor.

jlmelville and others added 30 commits March 5, 2023 14:10
* Add CMake install targets
* Set CMake version range
* Add multithread search for BF index
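The commit above adds multi-threaded search to the brute-force (BF) index. One common way to parallelize exact search is to split the queries across worker threads. The sketch below does that with std::thread; the function names and the query-striping scheme are hypothetical, and it does not describe how hnswlib actually schedules the work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Linear scan: index of the point closest to q (squared L2 distance).
std::size_t nearest(const std::vector<std::vector<float>>& points,
                    const std::vector<float>& q) {
    assert(!points.empty());
    std::size_t best = 0;
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t id = 0; id < points.size(); ++id) {
        float dist = 0.0f;
        for (std::size_t i = 0; i < q.size(); ++i) {
            float d = points[id][i] - q[i];
            dist += d * d;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = id;
        }
    }
    return best;
}

// Hypothetical multi-threaded brute-force search: queries are striped across
// threads, and each thread writes only its own slots of `out`, so no lock is needed.
std::vector<std::size_t> parallel_nearest(
        const std::vector<std::vector<float>>& points,
        const std::vector<std::vector<float>>& queries,
        unsigned num_threads) {
    std::vector<std::size_t> out(queries.size());
    num_threads = std::max(1u, num_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t qi = t; qi < queries.size(); qi += num_threads)
                out[qi] = nearest(points, queries[qi]);
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

Because the index is only read during search, the threads share `points` without synchronization; only the output slots are written.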
Add HierarchicalNSW::indexFileSize() function for precise memory footprint control
* Add macos into CI
* Fix mac setup
* Use macos-13 in CI
* Allow asserts, fix checkIntegrity
* Revert macos to latest in CI
* Fix CI
* Windows build warnings
Replace priority_queue::push with emplace.
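std::priority_queue::emplace constructs the element in place from its constructor arguments, whereas push takes an already-built element, so emplace avoids creating a temporary (distance, id) pair. The sketch below keeps the k closest candidates in a max-heap; the Candidate type and the function are illustrative assumptions, not hnswlib's internal types.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using Candidate = std::pair<float, std::size_t>;  // (distance, id)

// Keeps the k closest candidates: a max-heap whose top is the worst kept result.
std::priority_queue<Candidate> keep_k_closest(const std::vector<float>& dists,
                                              std::size_t k) {
    assert(k > 0);
    std::priority_queue<Candidate> top;
    for (std::size_t id = 0; id < dists.size(); ++id) {
        top.emplace(dists[id], id);     // builds the pair in place, no temporary
        if (top.size() > k) top.pop();  // drop the current farthest candidate
    }
    return top;
}
```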
…on consider the size of a Vector that is not divisible by 4
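The commit title above is truncated, but the problem it names, vector dimensions that are not a multiple of the SIMD width, is usually handled by processing full blocks of 4 and finishing the leftover elements with a scalar loop. The portable sketch below shows that split without intrinsics; it illustrates the technique and is not hnswlib's SIMD code.

```cpp
#include <cassert>
#include <cstddef>

// Inner product computed in blocks of 4 (the part a 4-wide SIMD kernel would
// handle), followed by a scalar loop over the 0-3 leftover elements.
float inner_product_blocked(const float* a, const float* b, std::size_t dim) {
    assert(a != nullptr && b != nullptr);
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    const std::size_t full = dim - dim % 4;  // largest multiple of 4 <= dim
    for (std::size_t i = 0; i < full; i += 4) {
        acc[0] += a[i] * b[i];
        acc[1] += a[i + 1] * b[i + 1];
        acc[2] += a[i + 2] * b[i + 2];
        acc[3] += a[i + 3] * b[i + 3];
    }
    float sum = acc[0] + acc[1] + acc[2] + acc[3];
    for (std::size_t i = full; i < dim; ++i)  // tail: dim not divisible by 4
        sum += a[i] * b[i];
    return sum;
}
```

Skipping the tail loop would silently ignore up to three components, which is the kind of error a fix like this guards against.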
[bruteforce] Fix bruteforce removePoint.
Use unique_ptr to manage visited_list_pool_
Efficient AVX512 implementation in 'InnerProductSIMD16ExtAVX512' function
Fix memory leak on loadIndex with non-empty HierarchicalNSW object
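Owning a resource through std::unique_ptr ties its lifetime to the owning object, which also makes reloading safe: assigning a fresh pointer frees whatever was held before. The sketch below uses hypothetical VisitedPool and Index types to show the idea; it is not the code of the commits above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a pool of per-search "visited" buffers.
struct VisitedPool {
    explicit VisitedPool(std::size_t n) : marks(n, 0) {}
    std::vector<unsigned short> marks;
};

// Hypothetical index that owns its pool through a unique_ptr, so the pool is
// freed automatically on destruction and when replaced by a later load.
class Index {
public:
    void load(std::size_t max_elements) {
        // Move-assigning a new unique_ptr deletes the previously owned pool,
        // if any, so loading into a non-empty object does not leak it.
        pool_ = std::make_unique<VisitedPool>(max_elements);
    }
    std::size_t capacity() const { return pool_ ? pool_->marks.size() : 0; }

private:
    std::unique_ptr<VisitedPool> pool_;
};
```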
This patch works around issue #467, also referenced as CVE-2023-37365,
by implementing Yury Malkov's suggestion about capping the M value,
coding the maximum number of outgoing connections in the graph, to a
reasonable enough value of the order of 100000.  For the record, the
documentation indicates reasonable values for M range from 2 to 100,
which are well within the cap; see ALGO_PARAMS.md.

The reproducer shown in issue #467 doesn't trigger the double free
condition anymore after this change is applied, but completes
successfully, although with the below warning popping up on purpose:

	warning: M parameter exceeds 100000 which may lead to adverse effects.
	         Cap to 100000 will be applied for the rest of the processing.

Signed-off-by: Étienne Mollier <[email protected]>
per comment in merge request discussion.
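The capping behavior described in the commit message above can be sketched as a small clamp-and-warn helper. The names cap_m and kMaxM are hypothetical; only the 100000 limit and the warning text come from the message, and the real patch may perform the check at a different point in the code.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>

// Hypothetical limit matching the cap described in the commit message.
constexpr std::size_t kMaxM = 100000;

// Clamps M to kMaxM and writes the warning to `err` when clamping happens.
std::size_t cap_m(std::size_t M, std::ostream& err = std::cerr) {
    if (M > kMaxM) {
        err << "warning: M parameter exceeds " << kMaxM
            << " which may lead to adverse effects.\n"
            << "         Cap to " << kMaxM
            << " will be applied for the rest of the processing.\n";
        return kMaxM;
    }
    return M;
}
```

Values in the documented range of 2 to 100 pass through unchanged and produce no output.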
dyashuni and others added 20 commits August 12, 2023 15:03
Bring back HNSWLIB_NO_NATIVE
python_bindings/bindings.cpp: fix typo.
Fixes linking error when compiling with Visual Studio and including hnswlib.h in several cpp files.

before:
2>Tools.lib(Pch.obj) : error LNK2005: "void __cdecl cpuid(int * const,int,int)" (?cpuid@@YAXQEAHHH@Z) already defined in Pch.obj
2>Tools.lib(NearestNeighbors.obj) : error LNK2005: "void __cdecl cpuid(int * const,int,int)" (?cpuid@@YAXQEAHHH@Z) already defined in Pch.obj
2>C:\Users\Rade\Documents\Data Analysis\Code\GenomicsLab Repo\x64\Release\ExpressionDemo.exe : fatal error LNK1169: one or more multiply defined symbols found
2>Done building project "ExpressionDemo.vcxproj" -- FAILED.

after:
2>ExpressionDemo.vcxproj -> C:\Users\Rade\Documents\Data Analysis\Code\GenomicsLab Repo\x64\Release\ExpressionDemo.exe
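LNK2005 (and the GCC/Clang "multiple definition" equivalent) appears when a non-inline function with external linkage is defined in a header that several .cpp files include: each translation unit emits its own definition. Marking the function static (internal linkage, one private copy per file) or inline (one shared definition allowed across files) removes the conflict. The sketch below uses a hypothetical header helper rather than hnswlib's cpuid wrapper, and does not claim which of the two keywords the commit used.

```cpp
// my_header.h (hypothetical): safe to include from many .cpp files.
#include <cassert>
#include <cstdint>

// Without `inline` (or `static`), every .cpp including this header would emit
// its own external definition and the linker would reject the duplicates.
inline std::uint32_t low_bits(std::uint32_t value, int n) {
    assert(n >= 0 && n < 32);
    return value & ((1u << n) - 1u);
}
```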
GNU compilers no longer warn with -Wall or -Wreorder that initialisation
order does not match declaration order in HierarchicalNSW
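GCC's -Wreorder (enabled by -Wall) fires when a constructor's initializer list names members in a different order from their declarations. Members are always initialized in declaration order, so a mismatched list is misleading and can read a member before it is set. A minimal sketch with a hypothetical Graph class, not hnswlib's HierarchicalNSW, shows the warning-free form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Members are initialized in declaration order regardless of the order written
// in the initializer list; keeping both orders identical avoids -Wreorder.
class Graph {
public:
    explicit Graph(std::size_t max_elements)
        : max_elements_(max_elements),     // declared first, initialized first
          links_(max_elements_ * 2, 0) {}  // safe: max_elements_ is already set
    std::size_t size() const { return links_.size(); }

private:
    std::size_t max_elements_;
    std::vector<int> links_;
};
```

If links_ were declared before max_elements_, the same initializer list would read an uninitialized value, which is exactly the hazard the warning points at.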
Provide a macro to override the use of std::cerr
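The pattern behind #508 lets an application redirect a header-only library's diagnostics without patching it: the library writes through a macro that falls back to std::cerr only if the application has not defined it first. The sketch uses the illustrative name MYLIB_ERR and a hypothetical report_bad_dim function; consult #508 for the macro hnswlib actually provides.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// --- application code, placed before including the library header ---
// Redirect the library's error output to an in-memory log.
std::ostringstream g_log;
#define MYLIB_ERR g_log

// --- hypothetical library header ---
// Errors go through MYLIB_ERR, which defaults to std::cerr only when the
// application has not defined it.
#ifndef MYLIB_ERR
#define MYLIB_ERR std::cerr
#endif

inline void report_bad_dim(int dim) {
    MYLIB_ERR << "invalid dimension: " << dim << '\n';
}
```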
Avoid sign mismatch in loop
Resolve initialisation order warning
* Add stop condition, multivector search, epsilon search

* Fix include

* Update readme

* Update multivector tests

* One header file

* Add bare_bone_search flag

* Fix epsilon search

* Refactoring

* Address comments

* Fix assert

* Add ef to multivector search, return vector, refactoring

* Refactoring

* Address comments

* Add bare bone search comment

Co-authored-by: Yury Malkov <[email protected]>

* Remove has_deletions flag

---------
Co-authored-by: Yury Malkov <[email protected]>
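The squashed commit above adds multi-vector search, in which several vectors share one document label and results are ranked per document rather than per vector. The brute-force sketch below only illustrates that ranking rule under assumptions: the min-distance aggregation, the LabeledVector type, and search_documents are hypothetical, and this is neither hnswlib's graph-based implementation nor its stop-condition API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

struct LabeledVector {
    std::vector<float> values;
    std::size_t doc_id;  // document this vector belongs to
};

// Hypothetical brute-force multi-vector document search: a document's distance
// is the smallest squared-L2 distance between the query and any of its vectors;
// the k closest distinct documents are returned, nearest first.
std::vector<std::size_t> search_documents(const std::vector<LabeledVector>& data,
                                          const std::vector<float>& query,
                                          std::size_t k) {
    std::unordered_map<std::size_t, float> best;  // doc_id -> best distance so far
    for (const auto& v : data) {
        assert(v.values.size() == query.size());
        float dist = 0.0f;
        for (std::size_t i = 0; i < query.size(); ++i) {
            float d = v.values[i] - query[i];
            dist += d * d;
        }
        auto it = best.find(v.doc_id);
        if (it == best.end() || dist < it->second) best[v.doc_id] = dist;
    }
    std::vector<std::pair<float, std::size_t>> ranked;
    for (const auto& entry : best) ranked.emplace_back(entry.second, entry.first);
    std::sort(ranked.begin(), ranked.end());
    std::vector<std::size_t> docs;
    for (std::size_t i = 0; i < ranked.size() && i < k; ++i)
        docs.push_back(ranked[i].second);
    return docs;
}
```

Ranking by documents means a document with many nearby vectors still occupies only one of the k result slots.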
yurymalkov requested a review from dyashuni on November 20, 2023 03:43
dyashuni (Contributor) left a comment:

Looks good

yurymalkov merged commit 3f34296 into master on Dec 3, 2023
30 checks passed