
Adding elements to an index loaded in memory iteratively #79

Open
mehrrsaa opened this issue Dec 6, 2018 · 12 comments

@mehrrsaa commented Dec 6, 2018

Currently, from what I understand of the documentation, we need to reload an index from disk in order to update its maximum number of elements before adding new elements to it.

My question is: what if the index is already loaded in memory, some process is running iteratively, and we want to append the new elements it yields to that already loaded index? At the moment it seems we have to follow this routine: build the index, save it, load it back (increasing max_elements in the load argument), add elements, save, and so on...

This adds load time to the process. Is there a way to do this without saving and loading the index, just appending to an already in-memory index over and over (and saving when we want)?

Thank you in advance for any help on this!

@yurymalkov (Member)

Hi @mehrrsaa,
I am not sure what you mean. There is no easy way to merge two indexes.
What can be done is an automatic extension of the maximum number of elements as the index grows.

@mehrrsaa (Author) commented Dec 7, 2018

Hi @yurymalkov, and thank you for the fast response!
Sorry, maybe I can make it clearer with an example.

Considering the example in the hnswlib docs, this is what is done now: we init p, load an already built index into it, and add new elements to it (which is an awesome capability to have, thank you!):
p = hnswlib.Index(space='l2', dim=dim)
p.load_index("first_half.bin", max_elements = num_elements)
p.add_items(data2)

What I am wondering is whether there is a way to grow the index without saving it and loading it back into memory. If we keep the index in memory indefinitely, then when a new batch of data comes in and exceeds the previously set max_elements limit, we would want to do something like:

p.add_items(data3, new_max_element=num_elements + len(data3))

I hope that is clearer this time. My guess is this can't be done, but I want to make sure.

@yurymalkov (Member)

@mehrrsaa Yes, p.add_items(data3, new_max_element=num_elements + len(data3)) is not available at the moment.
But implementing similar functionality is on the TODO list. It will probably be done within a few weeks.

@mehrrsaa (Author)

Thank you, that would be awesome!

mehrrsaa reopened this Mar 12, 2019
@mehrrsaa (Author)

Hello,

Is there still a plan in place to implement this functionality?

Thank you

@yurymalkov (Member)

Hi @mehrrsaa,
Yes, it is still in the plans. I am too busy right now, sorry...
I will start on it in two weeks.

@mehrrsaa (Author)

Thank you for getting back to me @yurymalkov, I appreciate it!

@yurymalkov (Member)

@mehrrsaa Finally done, as a manual index resize (resize_index). It is now in the develop branch.
Sorry it took so long.
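
A minimal usage sketch (untested; the dimension, capacity, and random data below are only for illustration):

import hnswlib
import numpy as np

dim = 16
num_elements = 1000

# Build an index and fill it to its initial capacity.
p = hnswlib.Index(space='l2', dim=dim)
p.init_index(max_elements=num_elements, ef_construction=200, M=16)
p.add_items(np.random.rand(num_elements, dim))

# Grow the in-memory index in place (no save/load round trip),
# then append a new batch.
new_batch = np.random.rand(500, dim)
p.resize_index(num_elements + len(new_batch))
p.add_items(new_batch)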

@Allenlaobai7 commented Oct 26, 2020

@yurymalkov hi, I have one question following on from the previous discussion:

I want to build an index of 2 million samples, and to avoid memory problems I am reading the data in chunks and adding them to the index one chunk at a time. I set max_elements to 2 million from the start. Currently I am following the example code, saving and loading on every chunk, and it has been working fine:

init = 1
for samples in pd.read_csv(path, chunksize=CHUNK_SIZE):
    index_vemb = hnswlib.Index(space='cosine', dim=args.dim)
    if init == 1:  # first chunk: create a fresh index
        index_vemb.init_index(max_elements=args.vid_cnt, ef_construction=200, M=16)
        init = 0
    else:  # later chunks: load the saved index and append
        index_vemb.load_index(args.model_path)
    index_vemb.add_items(samples['emb'].tolist(), samples['vid'].tolist())
    index_vemb.save_index(args.model_path)
    del index_vemb

I would like to check whether I can skip the saving and loading part, with something like this:

init = 1
for samples in pd.read_csv(path, chunksize=CHUNK_SIZE):
    index_vemb = hnswlib.Index(space='cosine', dim=args.dim)
    if init == 1:  # init only on the first chunk
        index_vemb.init_index(max_elements=args.vid_cnt, ef_construction=200, M=16)
        init = 0
    index_vemb.add_items(samples['emb'].tolist(), samples['vid'].tolist())
index_vemb.save_index(args.model_path)

Thank you!

@yurymalkov (Member)

@Allenlaobai7
I am not sure I fully understand. You do not need to reload the index in order to add elements.
I think something like this should work (though I have not tested the code):

index_vemb = hnswlib.Index(space='cosine', dim=args.dim)
index_vemb.init_index(max_elements=args.vid_cnt, ef_construction=200, M=16)
for samples in pd.read_csv(path, chunksize=CHUNK_SIZE):
    index_vemb.add_items(samples['emb'].tolist(), samples['vid'].tolist())
index_vemb.save_index(args.model_path)

@Allenlaobai7

@yurymalkov Thank you for the quick reply. I followed the code from the README, which is why I implemented the save and load part. It makes sense that we can keep adding items as long as the total number of samples does not exceed max_elements. Let me test it later to make sure it works.

@yurymalkov (Member)

Ok. Thanks for the feedback! I hadn't thought about it...
That code was there to demonstrate that you can add elements after loading the index (i.e. the index is fully dynamic).
Yes, you can safely add elements until the capacity is reached, and when it is reached you can use resize_index to increase it (though probably a more user-friendly way is needed). See the sketch below.
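
A rough sketch of that pattern, reusing the variables from the snippets above (untested; it assumes the get_current_count and get_max_elements helpers from the Python bindings):

# Grow the index on demand whenever a new chunk would exceed its capacity.
index_vemb = hnswlib.Index(space='cosine', dim=args.dim)
index_vemb.init_index(max_elements=args.vid_cnt, ef_construction=200, M=16)
for samples in pd.read_csv(path, chunksize=CHUNK_SIZE):
    embs, vids = samples['emb'].tolist(), samples['vid'].tolist()
    needed = index_vemb.get_current_count() + len(embs)
    if needed > index_vemb.get_max_elements():
        index_vemb.resize_index(needed)
    index_vemb.add_items(embs, vids)
index_vemb.save_index(args.model_path)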
