Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[question] intersection between a python set and a BloomFilter #51

Open
yvesx opened this issue Mar 22, 2014 · 5 comments
Open

[question] intersection between a python set and a BloomFilter #51

yvesx opened this issue Mar 22, 2014 · 5 comments

Comments

@yvesx
Copy link

yvesx commented Mar 22, 2014

What would be the best way to find out the intersection between a python set and a BloomFilter? I'm currently doing:

intersection = [item for item in set if item in bloom_filter]

Is it possible to support a bulk lookup method?

@ers81239
Copy link

ers81239 commented Dec 1, 2014

From my understanding that would not be possible. The only similar thing you could do is create a comparable bloom filter and put all of the items in your set into it. Now you could compare the two filters to see if all of the elements in the second one are also in the first one. But this won't give you the intersection, it will only tell you if the intersection of the sets exists.

@trbs
Copy link

trbs commented Dec 2, 2014

Alternatively with a bulk lookup method, I assume @yvesx is looking for this:

  intersection = bloom_filter.multi_lookup_return_keys(some_set)

Which would be functionally equivalent with the example. My initial feeling is that this might not be much faster unless Cython can iterate through and setup a new set/list far more efficiently then the list comprehension.

The case for bloom_filter.match_all(iterable) or bloom_filter.match_any(iterable) seems more logical in terms of getting a speed up compared to any(e in bloom_filter for e in iterable) and all(e in bloomfilter for e in iterable).

But maybe the overhead of the function call and call to _assert_open is worth it to create bulk operations
:)

I probably need all three use-cases in 2015 so I will benchmark them and implement them in my fork if their worth it.

@alexjironkin
Copy link

You can intersect and union bloom filters but only if they have the same bit array lengths right?!

Union is just bitwise or. If bit is set in one array, then it should also be set in the union. If both are unset then it should be 0 in the union too.

Similarly, intersection is bitwise and. Iff bit was set in both arrays then it will be in the intersecting set.

This should be quiet easy to implement. Again, obviously this only works for the same length filters. As long as that's OK, then it will work with the same error rate and everything.

@axiak
Copy link
Owner

axiak commented Feb 26, 2016

@alexjironkin as far as I know, this library already supports bitsize or/and for bloom filters of equal size

@alexjironkin
Copy link

Through an API method, or by accessing bit array and doing and/or manually?

But they way do you use inyernal counters for deletion etc? Because they would have to change appropriately as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants