DGIM Algorithm

The DGIM algorithm efficiently counts the number of 1's in a data stream using O(log²N) bits and provides an estimate with a maximum error of 50%. It organizes incoming bits into buckets based on specific rules, allowing for dynamic updates as new bits arrive. While the algorithm is advantageous for its space efficiency and ease of updates, it may incur significant errors if all 1's are located in the unknown region of the data stream.

Uploaded by

deepa

COUNTING THE NUMBER OF 1’s IN THE DATA STREAM

DGIM algorithm (Datar-Gionis-Indyk-Motwani Algorithm)

The DGIM algorithm is designed to count the number of 1’s in
a window of a bit stream. It uses O(log²N) bits to represent a
window of N bits and estimates the number of 1’s in the window
with an error of no more than 50%.

So the estimate is never off by more than half of the true count.

In the DGIM algorithm, each bit that arrives has a timestamp
for the position at which it arrives: the first bit has
timestamp 1, the second bit has timestamp 2, and so on.
The positions are recorded modulo the window size N, so each
timestamp needs only log₂N bits (the window size is usually
taken as a multiple of 2). The window is divided into buckets
consisting of 1’s and 0’s.

RULES FOR FORMING THE BUCKETS:

1. The right end of a bucket must always be a 1. (If it
would end in a 0, that 0 is left out of the bucket.)
E.g. · 1001011 → a bucket of size 4, having four 1’s
and a 1 at its right end.

2. Every bucket must contain at least one 1; otherwise no
bucket can be formed.

3. The size of every bucket (its number of 1’s) must be a
power of 2.

4. Bucket sizes cannot decrease as we move to the left
(sizes are in increasing order towards the left).
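
The four rules above can be checked mechanically. Here is a minimal Python sketch, assuming a bucket is written as a string of bits read left to right and that bucket sizes are listed from the rightmost (newest) bucket to the leftmost. The function names are illustrative and do not come from the original notes.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (rules 2 and 3: at least one 1, size a power of 2)."""
    return n >= 1 and (n & (n - 1)) == 0


def is_valid_bucket(segment):
    """Check rules 1-3 for one bucket given as a bit string such as '1001011'."""
    ones = segment.count("1")
    # Rule 1: the right end must be a 1. Rules 2-3: the count of 1's is a power of 2.
    return segment.endswith("1") and is_power_of_two(ones)


def sizes_are_ordered(sizes_right_to_left):
    """Check rule 4: sizes never decrease as we move to the left."""
    pairs = zip(sizes_right_to_left, sizes_right_to_left[1:])
    return all(newer <= older for newer, older in pairs)


print(is_valid_bucket("1001011"))       # the example bucket from rule 1
print(sizes_are_ordered([1, 1, 2, 4]))  # a legal sequence of bucket sizes
```

The example string 1001011 from rule 1 passes the check because it ends in 1 and holds four 1’s.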

Let us take an example to understand the algorithm.

Estimating the number of 1’s and counting the buckets in
the given data stream.

This picture shows how we can form the buckets based on
the number of ones by following the rules.

In the given data stream, let us assume the new bit arrives
from the right.

When the new bit = 0
After the new bit (0) arrives with timestamp 101, there
is no change in the buckets.

But if the new bit that arrives is 1, we need to make
changes:
· Create a new bucket with the current timestamp and size
1.

· If there are now at most two buckets of size 1, nothing
more needs to be done. However, DGIM allows at most two
buckets of any one size, so if there are now three buckets
of size 1 (the buckets with timestamps 100, 102 and 103 in
the second step in the picture), we fix the problem by
combining the leftmost (earliest) two buckets of size 1
(purple box).

To combine any two adjacent buckets of the same size,
replace them by one bucket of twice the size. The
timestamp of the new bucket is the timestamp of the
rightmost of the two buckets.

Now, sometimes combining two buckets of size 1 may
create a third bucket of size 2. If so, we combine the
leftmost two buckets of size 2 into a bucket of size 4. This
process may ripple through the bucket sizes.
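
The update and merge steps described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes buckets are stored as (timestamp, size) pairs with the newest bucket first, and that at most two buckets of each size exist before the new bit arrives. The function name and the starting bucket list are hypothetical; only the timestamps 87, 100, 102 and 103 appear in the text.

```python
def add_bit(buckets, timestamp, bit):
    """Return the bucket list after one new bit arrives.

    buckets: list of (timestamp, size) pairs, newest first (assumed layout).
    """
    if bit == 0:
        return list(buckets)          # a 0 never changes the buckets

    # A 1 always creates a new bucket of size 1 with the current timestamp.
    buckets = [(timestamp, 1)] + list(buckets)

    i = 0
    # While three consecutive buckets share a size, merge the two oldest of them.
    while i + 2 < len(buckets) and buckets[i][1] == buckets[i + 1][1] == buckets[i + 2][1]:
        newer = buckets[i + 1]
        # The merged bucket has twice the size and keeps the timestamp
        # of the rightmost (more recent) of the two merged buckets.
        buckets[i + 1:i + 3] = [(newer[0], newer[1] * 2)]
        i += 1                        # the merge may ripple to the next size
    return buckets


# Hypothetical state just after the bit with timestamp 100 has arrived.
state = [(100, 1), (98, 2), (95, 2), (87, 4)]
state = add_bit(state, 101, 0)   # no change
state = add_bit(state, 102, 1)   # two buckets of size 1, still fine
state = add_bit(state, 103, 1)   # three buckets of size 1, merges ripple upward
print(state)
```

In this run the third bucket of size 1 forces a merge into a bucket of size 2, which in turn creates a third bucket of size 2 and forces a merge into a bucket of size 4.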

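How long can you continue doing this?

You can continue as long as current timestamp - timestamp
of the leftmost bucket < N (= 24 here). E.g. 103 - 87 = 16 < 24,
so the leftmost bucket is still inside the window and we
continue. Once the difference is greater than or equal to N,
the leftmost bucket has left the window and is dropped.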
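
The window test above can be written as a small Python sketch. It assumes buckets are (timestamp, size) pairs stored newest first, so only the last entries can fall out of the window. The function name and the bucket list are hypothetical; N = 24 and the timestamps 87 and 103 are taken from the text.

```python
def drop_expired(buckets, current_timestamp, window_size):
    """Drop buckets whose timestamp is no longer inside the window.

    buckets: list of (timestamp, size) pairs, newest first (assumed layout).
    """
    kept = list(buckets)
    # The oldest bucket sits at the end; remove it while it is out of the window.
    while kept and current_timestamp - kept[-1][0] >= window_size:
        kept.pop()
    return kept


buckets = [(103, 1), (102, 2), (98, 4), (87, 4)]   # hypothetical state
print(drop_expired(buckets, 103, 24))   # 103 - 87 = 16 < 24, nothing is dropped
print(drop_expired(buckets, 111, 24))   # 111 - 87 = 24, the oldest bucket is dropped
```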

Finally, the answer to the query:

How many 1’s are there in the last 20 bits?

Adding up the sizes of the buckets in the last 20 bits, we
say there are 11 ones.
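
A query of this kind can be sketched in Python as below. The text adds up the sizes of all buckets in the last k bits; the usual DGIM estimate, which is not spelled out in the text, counts only half of the oldest such bucket because part of it may lie outside the last k bits. The sketch returns both numbers. The function name and the bucket list are hypothetical and do not reproduce the figure.

```python
def estimate_ones(buckets, current_timestamp, k):
    """Estimate the number of 1's among the last k bits.

    buckets: list of (timestamp, size) pairs, newest first (assumed layout).
    Returns (plain_sum, dgim_estimate).
    """
    # Keep the buckets whose timestamp lies within the last k positions.
    sizes = [size for ts, size in buckets if current_timestamp - ts < k]
    if not sizes:
        return 0, 0.0
    plain_sum = sum(sizes)
    # Standard DGIM rule: count only half of the oldest bucket in range.
    dgim_estimate = plain_sum - sizes[-1] / 2
    return plain_sum, dgim_estimate


# Hypothetical bucket list, not the one in the figure.
buckets = [(100, 1), (98, 1), (95, 2), (90, 4), (84, 4)]
print(estimate_ones(buckets, 100, 20))
```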

Advantages

• Stores only O(log²N) bits: O(log N) counts of log₂N bits each

• Easy update as more bits enter

• Error in count no greater than the number of 1’s in the unknown area
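
The space figure above can be illustrated with a rough worked calculation in Python. This is a sketch under stated assumptions, not part of the original notes: at most two buckets exist for each size, each bucket stores a timestamp modulo N, and each bucket stores its size as an exponent of 2. The function name is hypothetical.

```python
def dgim_space_bits(window_size):
    """Rough upper bound on the bits needed for a window of N bits."""
    timestamp_bits = (window_size - 1).bit_length()   # ceil(log2 N)
    distinct_sizes = window_size.bit_length()          # sizes 1, 2, 4, ... up to N
    max_buckets = 2 * distinct_sizes                   # assumed: two buckets per size
    # Bits to store which power of 2 a bucket size is.
    exponent_bits = max(1, (distinct_sizes - 1).bit_length())
    return max_buckets * (timestamp_bits + exponent_bits)


print(dgim_space_bits(24))     # the window size used in the example
print(dgim_space_bits(1024))
```

The result is the product of O(log N) buckets and O(log N) bits per bucket, which is where the O(log²N) total comes from.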

Drawbacks

• As long as the 1’s are fairly evenly distributed, the error due to the unknown region is small: no more than 50%

• But it could be that all the 1’s are in the unknown area (indicated by “?” in the figure below) at the end. In that case, the error is unbounded.
