- New class: `HnswEuclidean`. This uses Euclidean distances internally and will be returned from `hnsw_build` when `distance = "euclidean"` is specified. This fixes an issue where, if you created an index with `hnsw_build` and `distance = "euclidean"` (the default), then after saving you would be unable to reload the index and have it find Euclidean distances. You would have to create it as an `HnswL2` object and take the square root of the distances yourself (#21).
- The `Hnsw` constructors now expose a `random_seed` parameter that you can use to set the random seed used in constructing the HNSW index. Internally, the `hnsw_build` and `hnsw_knn` functions will use a random seed based on R's RNG state. This means that if you want to reproduce results, you need to set the random seed in R via `set.seed` before calling those functions. Based on a request by Maciej Beręsewicz (#23).
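A minimal sketch of both changes, assuming RcppHNSW is installed and using the built-in `iris` data. The seed value and the temporary file are arbitrary choices for illustration. The index is built after `set.seed`, saved, and reloaded as an `HnswEuclidean` object, and both copies should then return the same neighbors and Euclidean distances.

```r
library(RcppHNSW)

data <- as.matrix(iris[, -5])

# hnsw_build() derives its seed from R's RNG state, so set.seed() makes
# index construction reproducible
set.seed(42)
ann <- hnsw_build(data, distance = "euclidean")  # an HnswEuclidean index

# Save, then reload as HnswEuclidean: the reloaded index reports Euclidean
# (not squared) distances, so no manual sqrt() is needed
index_file <- tempfile(fileext = ".hnsw")
ann$save(index_file)
ann2 <- new(HnswEuclidean, ncol(data), index_file)

res1 <- hnsw_search(data, ann, k = 4)
res2 <- hnsw_search(data, ann2, k = 4)
```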
- Updated hnswlib to version 0.8.0.
- Updated hnswlib to version 0.7.0. Note that I made some very minor changes to the code to silence some compiler warnings. These changes have been submitted upstream to the hnswlib project.
- For high-dimensional data, there can be a noticeable CPU overhead in copying data out of non-contiguous memory regions when row-wise data is used. If you wish to provide data where each column of the input matrix contains an item to be indexed or searched, see the following additions to the API:
  - For the class-based API: `addItemsCol`, `getAllNNsCol` and `getAllNNsListCol` are the column-based equivalents of `addItems`, `getAllNNs` and `getAllNNsList`, respectively. Note that the returned nearest neighbor data from `getAllNNsCol` and `getAllNNsListCol` are also stored by column, i.e. the matrices have dimensions `k x n`, where `k` is the number of neighbors and `n` the number of items in the data being searched.
  - For the function-based API, a new parameter `byrow` has been added to `hnsw_knn`, `hnsw_build` and `hnsw_search`. By default this is set to `TRUE` and indicates that the items in the input matrix are found in each row. To pass column-stored items, set `byrow = FALSE`. Any matrices returned by `hnsw_search` and `hnsw_knn` will now follow the convention provided by the value of `byrow`: i.e. if `byrow = FALSE`, the matrices contain nearest neighbor information in each column.
- For the class-based API:
  - New method: `getItems`, which returns a matrix of the data vectors in the index with the specified integer identifiers. From a feature request made by d4tum (#18).
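A minimal sketch of the row- versus column-oriented function API, assuming RcppHNSW is installed and using the `iris` data. The transposed matrix stores one item per column, so it is passed with `byrow = FALSE`, and the returned matrices are `k x n` instead of `n x k`. The `getItems` call at the end assumes that the integer identifiers are the same 1-based indices reported by the search functions.

```r
library(RcppHNSW)

data <- as.matrix(iris[, -5])  # 150 items, one per row
tdata <- t(data)               # the same 150 items, one per column

knn_rows <- hnsw_knn(data, k = 4, distance = "euclidean")
knn_cols <- hnsw_knn(tdata, k = 4, distance = "euclidean", byrow = FALSE)

dim(knn_rows$idx)  # 150 4: one row per item
dim(knn_cols$idx)  # 4 150: one column per item

# Class-based API: retrieve the stored vectors for items 1 and 2
ann <- hnsw_build(data, distance = "euclidean")
items <- ann$getItems(1:2)
```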
- The `progress` parameter in the functional interface no longer does anything. When `verbose = TRUE`, a progress bar is no longer shown.
- Due to a breaking change in roxygen2 7.0.0, there was a missing package alias in the documentation.
- Rolled back to hnswlib v0.4.0 due to valgrind problems in v0.6.2.
- Updated hnswlib to version 0.6.2.
- Minor future-proofing of licensing: RcppHNSW is now GPLv3 or later, rather than GPLv3 only.
- Multi-threading support is now available. Use the `setNumThreads` method if using the object-based API, and the `n_threads` parameter in the `hnsw_*` function API. For finer control, the `setGrainSize` method and the `grain_size` parameter are also available in the object and function interfaces, respectively. Thank you to Dmitriy Selivanov for a lot of the work on this.
- Updated hnswlib to version 0.4.0.
- Setting `verbose = TRUE` now incurs substantially less computational overhead associated with calculating the progress bar. Thank you to Samuel Granjeaud for spotting the problem and coming up with various solutions.
- New parameter: `progress`. By default this is set to `"bar"` and will show the progress bar when `verbose = TRUE`. If you want a more terse output, set `progress = NULL`. `progress = NULL` will eventually be the default setting: for now, `verbose = TRUE` will get you the progress bar by default for backwards compatibility.
- No progress bar will be shown if you have fewer than 50 items to process.
- Updated hnswlib to https://github.com/nmslib/hnswlib/commit/c5c38f0 (20 September 2019).
- A new method, `markDeleted`, that will prevent an item from being retrieved from the index.
- A new method, `resizeIndex`, that allows the capacity of the index to be increased without having to save and reload the index.
- A new method, `size`, is available for the index objects and reports the number of items added to the index.
- `hnsw_search` would `stop` if the number of rows in the input matrix was smaller than `k`. This check has been removed. Note that the correct behavior is to ensure that `k` is smaller than or equal to `index$size()`, where `index` is the index you are searching. Because the `size()` method is new to this version, this check hasn't been added to `hnsw_search`, to preserve compatibility with old indexes. If this matters to you, manually compare `index$size()` with `k` before running `hnsw_search`. An error will be thrown if `k` neighbors can't be found in the index. Thank you to Yuxing Liao for spotting this and for the pull request to remove the check.
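A minimal sketch of the new index-management methods together with the manual `k` check described above, assuming RcppHNSW is installed. Splitting `iris` into 100 + 50 rows only serves to show `resizeIndex`. The sketch assumes that each `addItems` call continues numbering from the items already in the index, and that `markDeleted(1)` uses the same 1-based identifiers the search functions return.

```r
library(RcppHNSW)

data <- as.matrix(iris[, -5])

# Create an index with room for only the first 100 items
ann <- new(HnswL2, ncol(data), 100, 16, 200)
ann$addItems(data[1:100, ])
n_initial <- ann$size()

# Grow the capacity in place instead of saving and reloading,
# then add the remaining 50 items
ann$resizeIndex(nrow(data))
ann$addItems(data[101:150, ])
n_final <- ann$size()

# Stop item 1 from being returned by future searches
ann$markDeleted(1)

# hnsw_search() does not check k against the index size, so do it manually
k <- 4
if (k > ann$size()) {
  stop("k = ", k, " exceeds the ", ann$size(), " items in the index")
}
res <- hnsw_search(data, ann, k = k)
```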
Initial release.