Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add other data storage types to Python bindings. #364

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

psobot
Copy link
Contributor

@psobot psobot commented Jan 27, 2022

This is a bit of a big one - happy to break this into smaller PRs if that would be useful. This pull request:

  • Adds new Python bindings for indexes with different data storage types:
    • DoubleIndex (float64)
    • Int8Index
    • UInt8Index
    • Int16Index
    • UInt16Index
  • Adds another template argument to subclasses of SpaceInterface: data_t, to specify the data storage type used.
  • Changes get_items to return a Numpy array instead of a List[List[data_t]].
  • Adds templated InnerProduct and L2Sqr distance comparison functions that auto-unroll and auto-vectorize their inner loops (see Godbolt). This allows us to use different data types without manually having to write every comparison function. (This might make the manual SIMD functions obsolete, although I've left them in there for now out of an abundance of caution.)
  • Extends test coverage to cover these new classes.

I'm not 100% sure if this is the best way to architect this API; in particular, should the data type be a property of Index, set at creation time, rather than a separate class of Index? (For now, it's the latter.)

The existing tests seem to pass, and the new data types allow for smaller index files on disk (in situations where reduced-precision is acceptable):
image

The new int8 and uint8 data types even seem to perform about 60% faster in the best case (when using 1024 dimensions):
image

@psobot
Copy link
Contributor Author

psobot commented Jan 27, 2022

Hmm, it looks like this PR is failing on Windows but passing on Ubuntu (and on macOS, where I'm testing locally). I'll dig into that.

@psobot psobot changed the base branch from master to develop January 31, 2022 18:28
@patelprateek
Copy link

@psobot Curios if you are still working on this ?
With this change , do you see performance impact on the old fp32 distance computation ?

@patelprateek
Copy link

@psobot : IIUC for the new storage types (uint8 , uint16 ...) ,it seems we rely on compiler vectorization , we dont have support for explicit vectorized code like fp32 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants