RuntimeError: Cannot return the results in a contigious 2D array. Probably ef or M is too small #465

Open
AmanKishore opened this issue May 22, 2023 · 4 comments

Comments

@AmanKishore

Getting this error with the following code:

vectordb = Chroma.from_documents(results, embeddings)
relevant_docs = vectordb.similarity_search(query=item.question, k=min(len(vectordb.get()["ids"]), num_search_results))

Any ideas how to fix?

@yurymalkov
Member

Hi @AmanKishore,

Can you provide more details on the dataset? Are you using filtering?

@aramperes

aramperes commented Jun 3, 2023

I can reproduce this when the filter excludes enough items that k exceeds the number of elements passing it (k > filtered_element_count).

Say I have an index with 10 documents, but only 2 evaluate to true in _predicate(id):

# idx: ef_construction=200, M=64
docs, distance = idx.knn_query(vector, k=2, filter=_predicate, num_threads=1)
# works OK, returns 2 docs

# delete the first result
idx.mark_deleted(docs[0])

# run the same search again (expect 1 result: only one matching doc is left after the deletion)
docs, distance = idx.knn_query(vector, k=2, filter=_predicate, num_threads=1)
# RuntimeError: Cannot return the results in a contigious 2D array. Probably ef or M is too small

It would be nice if the knn_query() function contract allowed returning fewer than k items when the list of valid elements is exhausted.
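Until something like that exists, one caller-side workaround is to retry with a smaller k. The sketch below is only an illustration: `knn_query_at_most_k` is a hypothetical helper, not part of hnswlib, and it assumes the index raises `RuntimeError` with the message quoted above whenever fewer than k valid elements can be returned.

```python
def knn_query_at_most_k(index, vector, k, **kwargs):
    """Call index.knn_query, lowering k until the query succeeds.

    Hypothetical helper (not an hnswlib API). Assumes `index` exposes
    knn_query(vector, k=..., **kwargs) and raises RuntimeError with the
    "Cannot return the results" message when it cannot fill k results.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    last_error = None
    for current_k in range(k, 0, -1):
        try:
            return index.knn_query(vector, k=current_k, **kwargs)
        except RuntimeError as err:
            # Only retry on the specific failure discussed in this issue;
            # any other RuntimeError is re-raised unchanged.
            if "Cannot return the results" not in str(err):
                raise
            last_error = err

    # Even k=1 failed: no valid element was reachable.
    raise last_error
```

With the example above this would be called as `knn_query_at_most_k(idx, vector, 2, filter=_predicate, num_threads=1)`. It costs up to k queries in the worst case, so it is a stopgap rather than a fix.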

@yurymalkov
Member

Hi @aramperes,

Thanks for the example. We will fix it soon for non-batched queries.

I wonder whether it is also a problem for batched queries. There, each query could return a different number of nearest neighbors, which would require a flag to switch the output to lists, to pad the results, or to return the number of items found. I am not sure which would work best.
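To make the padding option concrete, here is a minimal sketch in plain Python. It is not how hnswlib behaves today; `pad_batch_results`, the `-1` label, and the infinite distance are all assumptions chosen for illustration. Each query's results are padded to length k and the real count is returned alongside.

```python
import math


def pad_batch_results(per_query, k, pad_label=-1, pad_distance=math.inf):
    """Pad ragged per-query results to a rectangular k-wide layout.

    per_query: one list per query of (label, distance) pairs, possibly
    shorter than k. Returns (labels, distances, counts), where counts[i]
    is the number of real (unpadded) results for query i.
    Hypothetical helper for illustration only.
    """
    labels, distances, counts = [], [], []
    for found in per_query:
        found = found[:k]            # never return more than k
        missing = k - len(found)     # slots to fill with padding
        labels.append([label for label, _ in found] + [pad_label] * missing)
        distances.append([dist for _, dist in found] + [pad_distance] * missing)
        counts.append(len(found))
    return labels, distances, counts
```

The caller would then read only the first `counts[i]` entries of row `i`. Returning plain lists instead avoids the sentinel values but gives up the contiguous 2D layout.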

@vinay-kasireddy-zacks

I tried several values for ef and M, but the problem persists. However, if I remove the where clause from the query, the error goes away. Perhaps the error message is misleading and the cause is not the tuning parameters at all, but a bug, possibly a memory issue?
