Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic graph without need of max_elements #172

Open
ashfaq92 opened this issue Nov 20, 2019 · 3 comments
Open

Dynamic graph without need of max_elements #172

ashfaq92 opened this issue Nov 20, 2019 · 3 comments

Comments

@ashfaq92
Copy link
Contributor

Hi, thanks for great work.
In some domains, the size of graph is not know in advance. Each time a node is created and should be added into graph. In other words, graph should dynamically adjust the new nodes. One solution can be: giving graph maximum affordable size i.e. max_elements= sys.maxint. But this costs huge time and space overhead in indexing process. While in real workflow the nodes can be only 1000. Let suppose we don't know the number of nodes (max_elements) in advance, then how to use this library to solve this problem? What are limitations that require that max_elements should be specified in advance? It would be charming if we don't have to specify max_elements in advance. If there are some programing limitations, please describe so that we can collaboratively find the solution. If it is impossible, please share some work_around/hacks to deal with this problem.

@yurymalkov
Copy link
Member

Hi @ashfaq92,

This can be automatized by extending the index once it hits the capacity by making a copy of index's working memory. Also, realloc can be used instead of copying to avoid doubling the memory consumption (but not guaranteed).

On the downside, it seems that doing so would require to halt the index during the operation with a dedicated shared lock. I think that should be possible to implement it via a wrapper of the python API.

C++ version might be a much better solution, but 1) it would require a shared lock, which is C++14 (not sure if it is a big problem). 2) might cause some slowdown (not sure if a big problem either, as the lock can be acquired only once per search/insertion).

@yurymalkov
Copy link
Member

I can guide through the process if you want to implement that.

@ashfaq92
Copy link
Contributor Author

Hi @yurymalkov,
Today I tried thershold (extending the index once it hits the capacity) idea with multiple configurations. After simulations on random data, I figured out that setting initial size of graph from 01-100000 takes less than 1 millisecond (on my machine). Therefore, it can be good idea to set initial graph size to 100000 (0.1 million). Then the problem arises, how to extend graph. The best configuration was doubling size after hitting threshold.
The simulations are performed on random numeric data.
But this can be considered as workaround not a proper solution.
Please guide me how to implement that. It shall be great to work with mutual collaboration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants