We need to find the best match in a gallery of about 40 million vectors (embedding size 128) with the best possible performance and accuracy. Could you please suggest a suitable distance type and suitable values for the ef and M parameters? We are having a hard time figuring these out. We hope your experience with very large datasets can help us refine the parameters and arrive at optimal results. Thanks in advance.
Hi @sujigrena,
The optimal parameters depend on the intrinsic dimensionality of the data, so it is hard to give exact values (unless you have an estimate, e.g. the clustering factor of the k-NN graph).
The distance type depends on the origin of the vectors. If they are the output of a neural network, I would recommend training directly on an objective for the chosen distance (by default a neural classifier is trained for inner product; this can be changed to L2 or cosine).
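To make the distance choice a bit more concrete: hnswlib exposes the spaces `l2`, `ip` and `cosine`. If the embeddings are normalized to unit length, squared L2 distance and inner product give the same neighbour ranking, because squared L2 equals 2 - 2 * (inner product) for unit vectors. Here is a minimal pure-Python sketch of that relationship; the helper names and the toy vectors are made up for illustration and are not part of hnswlib.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    """Squared Euclidean distance (smaller means closer)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner(a, b):
    """Inner product (larger means closer)."""
    return sum(x * y for x, y in zip(a, b))


def rank(query, gallery, score, descending):
    """Return gallery indices ordered from best to worst match."""
    return sorted(range(len(gallery)),
                  key=lambda i: score(query, gallery[i]),
                  reverse=descending)


# Toy data, purely illustrative.
gallery = [normalize(v) for v in ([3, 4], [1, 0], [0, 1], [-1, 1])]
query = normalize([1, 2])

by_l2 = rank(query, gallery, squared_l2, descending=False)
by_ip = rank(query, gallery, inner, descending=True)
print(by_l2, by_ip)  # the two orderings agree for unit vectors
```

So for normalized embeddings the choice between these spaces should mostly not change which neighbours come back; for unnormalized embeddings the rankings can differ, and the training objective should decide.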
I would start with M=16 and set up a benchmark that checks accuracy on the query set. Build an index, find the ef that gives high recall (e.g. 0.95), and set ef_construction to that value. As a rule of thumb, increase M if ef_construction is more than a thousand, and repeat. Also, please look at https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md
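The procedure above could be sketched roughly as follows. This is only a sketch under assumptions: `data` and `queries` are float32 NumPy arrays, `truth` holds exact top-k ids per query computed separately (e.g. by brute force on a sample), and the starting `ef_construction=200`, the ef candidate list, the doubling of M and the `max_M` cap are illustrative choices of mine, not recommendations from the library. The function names are hypothetical; only the `hnswlib.Index` calls are the library's API.

```python
def recall_at_k(found, truth):
    """Fraction of true neighbours that appear among the returned ones."""
    hits = sum(len(set(f) & set(t)) for f, t in zip(found, truth))
    total = sum(len(t) for t in truth)
    return hits / total


def smallest_ef_for_recall(query_at_ef, truth, target=0.95,
                           candidates=(16, 32, 64, 128, 256, 512, 1024, 2048)):
    """Return the first candidate ef whose recall reaches target, else None.

    query_at_ef(ef) must return one list of neighbour ids per query.
    """
    for ef in candidates:
        if recall_at_k(query_at_ef(ef), truth) >= target:
            return ef
    return None


def tune_hnsw(data, queries, truth, k=10, M=16, space="l2",
              target=0.95, max_M=64):
    """Build, measure, and increase M while the required ef exceeds 1000."""
    import hnswlib  # third-party; imported here so the helpers above stand alone

    while M <= max_M:
        index = hnswlib.Index(space=space, dim=data.shape[1])
        # ef_construction=200 is only a starting guess (assumption).
        index.init_index(max_elements=len(data), ef_construction=200, M=M)
        index.add_items(data)

        def query_at_ef(ef):
            index.set_ef(max(ef, k))  # ef should not be smaller than k
            labels, _ = index.knn_query(queries, k=k)
            return labels.tolist()

        ef = smallest_ef_for_recall(query_at_ef, truth, target)
        if ef is not None and ef <= 1000:
            # Rebuild the final index with ef_construction set to this ef.
            return {"M": M, "ef": ef, "ef_construction": ef}
        M *= 2  # rule of thumb from above: needed ef too large, so raise M
    return None
```

With 40 million vectors it is probably worth running this on a subsample first, since each pass rebuilds the index.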