Huffman Coding Notes
Huffman Coding
Introduction
Huffman coding is one approach to text compression. Text compression means reducing the space required to store a particular text.
Consider an example string in which each character takes 8 bits of memory. Since the string contains 15 characters in total, it consumes 15 × 8 = 120 bits. Let's try to reduce its size using the Huffman algorithm.
1. Begin by calculating the frequency of each character in the given string.
2. Sort the characters in ascending order of frequency and store them in a priority queue, say Q.
3. Treat each character as a separate leaf node.
4. Create an empty node, say z. Make the node with the minimum frequency the left child of z and the node with the second-minimum frequency its right child. The value of z is the sum of these two frequencies.
5. Remove the two nodes with the lowest frequencies from the priority queue Q, and insert their sum (the node z) back into Q.
6. Insert node z into the tree.
7. Repeat steps 4 to 6 until only one node, the root of the tree, remains in Q.
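8. For every non-leaf node, label the edge to its left child 0 and the edge to its right child 1. The code for each character is the sequence of labels on the path from the root to its leaf.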
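The steps above can be sketched in Python as follows. This is a minimal sketch, not the notes' own implementation: it swaps in the standard library's `heapq` module as the priority queue Q, and the function name `huffman_codes` is hypothetical. Ties between equal frequencies are broken by insertion order, so other equally valid Huffman codes are possible for the same text.

```python
import heapq
from itertools import count


def huffman_codes(text):
    """Return a {character: code} table built by steps 1-8 above (sketch)."""
    # Step 1: count how often each character occurs
    freq = {}
    for ch in text:
        freq[ch] = freq.get(ch, 0) + 1

    # Steps 2-3: every character becomes a leaf in the min-priority queue Q.
    # The counter breaks frequency ties so heapq never compares two nodes.
    tie = count()
    q = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(q)

    # Steps 4-7: merge the two lowest-frequency nodes into z until one node remains
    while len(q) > 1:
        f1, _, left = heapq.heappop(q)   # minimum frequency -> left child
        f2, _, right = heapq.heappop(q)  # second minimum -> right child
        heapq.heappush(q, (f1 + f2, next(tie), (left, right)))

    # Step 8: 0 for a left edge, 1 for a right edge; a code is the root-to-leaf path
    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):      # internal node z = (left, right)
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:                            # leaf: a single character
            codes[node] = path or "0"    # assumption: a one-symbol text gets code "0"

    if q:
        walk(q[0][2], "")
    return codes


print(huffman_codes("abracadabra"))
```

A heap makes each extraction and insertion O(log n), which is where the time bound discussed below comes from; re-sorting a list after every merge also works but does more work per step.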
Character   Frequency   Code   Size (bits)
B           1           100    1*3 = 3
C           6           0      6*1 = 6
D           3           101    3*3 = 9
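The last column multiplies each character's frequency by the length of its code, and summing that column gives the number of bits in the encoded text. A minimal sketch of that arithmetic, using only the three rows listed in the table; the names `encoded_size`, `freq`, and `codes` are illustrative, not from the notes:

```python
def encoded_size(freq, codes):
    """Bits needed for the encoded text: sum of frequency * code length."""
    return sum(freq[ch] * len(codes[ch]) for ch in freq)


# Rows taken from the table: character -> frequency, character -> code
freq = {"B": 1, "C": 6, "D": 3}
codes = {"B": "100", "C": "0", "D": "101"}

print(encoded_size(freq, codes))  # 18  (3 + 6 + 9)
```

The same ten characters would need 10 × 8 = 80 bits with a fixed 8-bit encoding, which shows where the savings come from: frequent characters such as C get the shortest codes.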
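To decode a code, traverse the tree starting from the root, following the left edge for each 0 and the right edge for each 1, until a leaf (a character) is reached. For example, to decode 101, go right (1), then left (0), then right (1); the leaf reached is D, matching D's code in the table.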
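Decoding can be sketched by rebuilding a binary tree from the code table and walking it bit by bit. This is an assumption-laden sketch: it reconstructs the tree as nested dictionaries from the three codes in the table rather than reusing the original Huffman tree, and the helper names `build_decoding_tree` and `decode` are hypothetical.

```python
def build_decoding_tree(codes):
    """Build a binary trie from a prefix-free code table: '0' = left, '1' = right."""
    root = {}
    for ch, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})  # create internal nodes as needed
        node[code[-1]] = ch                  # leaf: store the character
    return root


def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[bit]                 # step left (0) or right (1)
        if isinstance(node, str):        # reached a leaf
            out.append(node)
            node = root                  # restart at the root for the next character
    return "".join(out)


codes = {"B": "100", "C": "0", "D": "101"}
tree = build_decoding_tree(codes)
print(decode("101", tree))      # D
print(decode("1010100", tree))  # DCB
```

Because Huffman codes are prefix-free, no separator is needed between characters: reaching a leaf always marks the end of exactly one code.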
Time complexity:
Each insertion into or removal from the priority queue takes O(log n) time, where n is the number of distinct characters. Since building the tree performs O(n) such operations, the overall time complexity of encoding is O(n log n).
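Counting the queue operations makes the bound explicit. This assumes a binary-heap priority queue, where each insertion or extraction costs O(log n): n initial insertions for the leaves, then n - 1 merges, each with two extractions and one insertion.

```latex
T(n) = \underbrace{n \cdot O(\log n)}_{\text{initial inserts}}
     + \underbrace{(n-1)\bigl(2\,O(\log n) + O(\log n)\bigr)}_{\text{merges}}
     = O(n \log n)
```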
Python Code:
The following Python code implements the algorithm described above:
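class NodeTree:  # internal tree node; leaves are plain characters (str)
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
    def nodes(self):
        return (self.left, self.right)
    def __str__(self):
        return '%s_%s' % (self.left, self.right)

def huffman_code_tree(node, code=''):
    # Leaf reached: its code is the path taken (0 = left, 1 = right)
    if isinstance(node, str):
        return {node: code or '0'}
    left, right = node.nodes()
    codes = huffman_code_tree(left, code + '0')
    codes.update(huffman_code_tree(right, code + '1'))
    return codes

string = 'example string'  # hypothetical input text
# Calculating frequency
freq = {}
for c in string:
    freq[c] = freq.get(c, 0) + 1
# Merge the two lowest-frequency entries (kept at the end of the list) until one remains
nodes = sorted(freq.items(), key=lambda x: x[1], reverse=True)
while len(nodes) > 1:
    (key1, c1), (key2, c2) = nodes[-1], nodes[-2]
    nodes = nodes[:-2] + [(NodeTree(key1, key2), c1 + c2)]
    nodes.sort(key=lambda x: x[1], reverse=True)
huffmanCode = huffman_code_tree(nodes[0][0])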