Adding elements to an index loaded in memory iteratively #79
Currently, from what I understand of the documentation, we need to load an index into memory in order to update its maximum number of elements and add new elements to it.

My question is: what if the index is already loaded in memory, some process is running iteratively, and we want to append the new elements yielded by that process to this already loaded index? At the moment it seems we have to follow this routine: build the index, save it, load it back (increasing max elements via the load argument), add elements, save, and so on.

This adds load time to the process. I was wondering if there is a way to do this without saving and reloading the index, just appending to the already-in-memory index over and over (and saving when we want).

Thank you in advance for any help on this!

Comments
Hi @mehrrsaa,
Hi @yurymalkov, and thank you for the fast response! Considering the example in the hnswlib docs, what I am wondering is whether there is a way to grow the index without saving it and loading it back into memory. If we keep the index in memory indefinitely, then when a new batch of data comes in and exceeds the previously set "max elements" limit, we would want to do something like:

`p.add_items(data3, new_max_element=num_elements + len(data3))`

I hope that is clearer this time. My guess is that this can't be done, but I want to make sure.
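For reference, the workaround available at the time was the save-and-reload pattern from the hnswlib README example: persist the index, then pass a larger `max_elements` when loading it back. A minimal sketch (the file name, sizes, and `data3` are illustrative):

```python
import hnswlib
import numpy as np

dim = 16
num_elements = 10000

# Build an index capped at num_elements.
p = hnswlib.Index(space='l2', dim=dim)
p.init_index(max_elements=num_elements, ef_construction=200, M=16)
p.add_items(np.float32(np.random.random((num_elements, dim))))

# Grow the capacity by saving, then reloading with a larger max_elements.
p.save_index("index.bin")
data3 = np.float32(np.random.random((5000, dim)))
p = hnswlib.Index(space='l2', dim=dim)
p.load_index("index.bin", max_elements=num_elements + len(data3))
p.add_items(data3)
```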
@mehrrsaa Yes,
Thank you, that would be awesome!
Hello, I was wondering whether there is still a plan to implement this functionality? Thank you.
Hi @mehrrsaa
Thank you for getting back to me @yurymalkov, I appreciate it!
@mehrrsaa Finally done it as a manual index resize.
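The resize referred to here is the `resize_index(new_size)` method, which changes the maximum capacity of an in-memory index without a save/load round trip. A minimal sketch of growing an index when a new batch would overflow it:

```python
import hnswlib
import numpy as np

dim = 16
p = hnswlib.Index(space='l2', dim=dim)
p.init_index(max_elements=1000, ef_construction=200, M=16)
p.add_items(np.float32(np.random.random((1000, dim))))

# A new batch arrives that would exceed the current capacity.
batch = np.float32(np.random.random((500, dim)))
needed = p.get_current_count() + len(batch)
if needed > p.get_max_elements():
    p.resize_index(needed)  # grow in place, no save/load needed
p.add_items(batch)
```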
@yurymalkov Hi, I have one question following on from the previous discussion: I want to build an index from 2 million samples, and to avoid memory problems I'm reading the data in chunks and adding them to the index chunk by chunk. I set max_elements to 2 million from the start. Currently I'm following the example code, saving and reloading the index between chunks, and it has been working fine:
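A minimal sketch of that save-and-reload loop, assuming the embeddings live in a CSV read with pandas; `DATA_PATH`, `INDEX_PATH`, `CHUNK_SIZE`, `DIM`, and the column names are placeholders:

```python
import hnswlib
import pandas as pd

DIM = 128                     # placeholder embedding dimensionality
MAX_ELEMENTS = 2_000_000      # capacity fixed from the start
CHUNK_SIZE = 100_000
DATA_PATH = "embeddings.csv"  # placeholder paths
INDEX_PATH = "index.bin"

# Create an empty index and persist it once.
index = hnswlib.Index(space='cosine', dim=DIM)
index.init_index(max_elements=MAX_ELEMENTS, ef_construction=200, M=16)
index.save_index(INDEX_PATH)

# The routine being described: reload, add a chunk, persist again.
for chunk in pd.read_csv(DATA_PATH, chunksize=CHUNK_SIZE):
    index = hnswlib.Index(space='cosine', dim=DIM)
    index.load_index(INDEX_PATH, max_elements=MAX_ELEMENTS)
    index.add_items(chunk['emb'].tolist(), chunk['vid'].tolist())
    index.save_index(INDEX_PATH)
```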
I would like to check whether I can skip the saving and loading part? Something like this:

```python
index_vemb = hnswlib.Index(space='cosine', dim=args.dim)
index_vemb.init_index(max_elements=args.vid_cnt, ef_construction=200, M=16)
for samples in pd.read_csv(path, chunksize=CHUNK_SIZE):
    index_vemb.add_items(samples['emb'].tolist(), samples['vid'].tolist())
index_vemb.save_index(args.model_path)
```

Thank you!

@Allenlaobai7
@yurymalkov Thank you for the quick reply. I followed the code from the README and therefore implemented the save and load part. I think it makes sense to continue adding items as long as the sample count does not exceed max_elements. Let me test it later to make sure it works.
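The capacity check can be made explicit: `get_current_count()` and `get_max_elements()` report how full the index is. A small sketch, reusing the `index_vemb` and `samples` names from the snippet above:

```python
# There must be room before adding the next chunk; add_items raises an
# error once the stored elements would exceed max_elements.
assert index_vemb.get_current_count() + len(samples) <= index_vemb.get_max_elements()
index_vemb.add_items(samples['emb'].tolist(), samples['vid'].tolist())
```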
Ok. Thanks for the feedback! Didn't think about it...