Huffman Coding Scheme
Presented To:
Miss Syeda Hira Fatima Naqvi.
Presented By:
Sadaf Rasheed (2K15-CSE-72)
Huffman codes are an effective technique of lossless data compression, which means no
information is lost.
Huffman coding achieves compression by reducing the amount of redundancy in the
coding of symbols.
Huffman coding is a method for the compression of standard text documents.
It makes use of a binary tree to develop codes of varying lengths for the letters used in the
original message.
The algorithm was introduced by David Huffman in 1952 as part of a course assignment at
MIT.
EXAMPLE
DRAWBACKS OF FIXED-LENGTH CODES:
Wasted space:
- Unicode uses twice as much space as ASCII
- inefficient for plain-text messages containing only ASCII characters
Same number of bits used to represent all characters:
- a and e occur more frequently than q and z
Potential solution: use variable-length codes
- variable number of bits to represent characters when the frequency of occurrence is known
- short codes for characters that occur frequently
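As a rough illustration of the saving, the Python sketch below compares a fixed-length 8-bit encoding with a variable-length code; the message and the code table here are illustrative choices, not taken from these slides.

    # Fixed-length encoding: 8 bits per character, regardless of frequency.
    # Variable-length encoding: the frequent character 'e' gets the shortest code.
    message = "see a bee"                                   # illustrative message
    codes = {"e": "0", "s": "100", "a": "101", "b": "110", " ": "111"}

    fixed_bits = 8 * len(message)                           # 72 bits
    variable_bits = sum(len(codes[ch]) for ch in message)   # 4*1 + 5*3 = 19 bits
    print(fixed_bits, variable_bits)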
ADVANTAGES OF VARIABLE-LENGTH CODES:
The advantage of variable-length codes over fixed-length codes is that short codes can be given
to characters that occur frequently; on average, the length of the encoded message is less than
with fixed-length encoding.
Potential problem: how do we know where one character ends and another begins?
- this is not a problem if the number of bits is fixed!
PREFIX PROPERTY:
Prefix codes:
- Huffman codes are constructed in such a way that they can be unambiguously
translated back to the original data, yet still be an optimal character code.
- Huffman codes are really considered prefix codes.
- A code has the prefix property if no character code is the prefix (start of the
code) of another character code.
EXAMPLE (PREFIX):
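A sketch of such an example in Python, using a few of the code words that are derived for this presentation's text later in these slides; the helper function is my own.

    # These four code words have the prefix property:
    # no code word is the start of another code word.
    prefix_code = {"E": "0000", "space": "011", "e": "10", "r": "1100"}

    def has_prefix_property(codes):
        words = list(codes.values())
        return not any(a != b and b.startswith(a) for a in words for b in words)

    print(has_prefix_property(prefix_code))  # True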
CODE WITHOUT PREFIX PROPERTY:
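A sketch of what goes wrong without the prefix property, using a hypothetical three-character code in which "0" is a prefix of "01":

    bad_code = {"a": "0", "b": "01", "c": "10"}   # "0" is a prefix of "01"

    def encode(text):
        return "".join(bad_code[ch] for ch in text)

    # Two different messages produce the same bit string, so a receiver
    # cannot decode "010" without a separator between code words.
    print(encode("ac"))  # 010
    print(encode("ba"))  # 010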
CONTD:
A prefix code is a uniquely decodable code: a receiver can identify each word
without requiring a special marker between words.
CONTD:
Suppose we have two binary code words a and b, where a is k bits long,
b is n bits long, and k < n. If the first k bits of b are identical to a, then a is
called a prefix of b. The last n - k bits of b are called the dangling suffix.
For example, if
a = 010 and b = 01011,
then a is a prefix of b and the dangling suffix is 11.
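A minimal sketch of this definition in Python (the function name is illustrative):

    def dangling_suffix(a, b):
        # If a is a proper prefix of b, return the last len(b) - len(a) bits of b.
        if len(a) < len(b) and b.startswith(a):
            return b[len(a):]
        return None

    print(dangling_suffix("010", "01011"))  # "11", as in the example above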
PURPOSE OF HUFFMAN CODING:
THE BASIC ALGORITHM:
THE (REAL) BASIC ALGORITHM:
1. Scan the text to be compressed and tally the occurrences of all characters.
2. Sort or prioritize the characters based on their number of occurrences in the text.
3. Build a Huffman tree using the prioritized list.
4. Traverse the tree to determine the code word for each character.
5. Scan the text again and create the new file using the Huffman codes.
ALGORITHM:
HUFFMAN(C)
(C is the set of n characters, each character c having frequency f[c]; Q is a min-priority queue keyed on f.)
n <- |C|
Q <- C
for i <- 1 to n - 1
    do allocate a new node z
       left[z] <- x <- EXTRACT-MIN(Q)
       right[z] <- y <- EXTRACT-MIN(Q)
       f[z] <- f[x] + f[y]
       INSERT(Q, z)
return EXTRACT-MIN(Q)
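A sketch of this procedure in Python, assuming C is given as a list of (frequency, symbol) pairs; heapq plays the role of the min-priority queue Q, with heappop as EXTRACT-MIN and heappush as INSERT.

    import heapq
    import itertools

    class Node:
        def __init__(self, freq, symbol=None, left=None, right=None):
            self.freq = freq        # f[z]
            self.symbol = symbol    # None for internal nodes
            self.left = left        # left[z]
            self.right = right      # right[z]

    def huffman(C):
        # C is a list of (frequency, symbol) pairs; returns the root of the tree.
        order = itertools.count()   # tie-breaker so the heap never compares Nodes
        Q = [(f, next(order), Node(f, s)) for f, s in C]
        heapq.heapify(Q)
        for _ in range(len(C) - 1):                       # for i <- 1 to n - 1
            fx, _, x = heapq.heappop(Q)                   # x <- EXTRACT-MIN(Q)
            fy, _, y = heapq.heappop(Q)                   # y <- EXTRACT-MIN(Q)
            z = Node(fx + fy, left=x, right=y)            # f[z] <- f[x] + f[y]
            heapq.heappush(Q, (z.freq, next(order), z))   # INSERT(Q, z)
        return heapq.heappop(Q)[2]                        # return EXTRACT-MIN(Q)

    # Frequencies of the example text used later in these slides:
    C = [(1, "E"), (1, "i"), (1, "y"), (1, "l"), (1, "k"), (1, "."),
         (2, "r"), (2, "s"), (2, "n"), (2, "a"), (4, " "), (8, "e")]
    print(huffman(C).freq)   # 26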
ANALYSIS:
Time complexity
The time complexity of the Huffman algorithm is O(n log n): there are O(n) iterations,
and each iteration requires O(log n) time to extract the node of cheapest weight
from the priority queue.
BUILDING A TREE
SCAN THE ORIGINAL TEXT
The text to encode: Eerie eyes seen near lake.
Distinct characters in the text: E, e, r, i, space, y, s, n, a, l, k, .
BUILDING A TREE
PRIORITIZE CHARACTERS
Character:  E  i  y  l  k  .  r  s  n  a  sp  e
Frequency:  1  1  1  1  1  1  2  2  2  2  4   8
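Tallying and prioritizing the characters can be sketched with collections.Counter:

    from collections import Counter

    text = "Eerie eyes seen near lake."
    freq = Counter(text)
    print(freq["e"], freq[" "], freq["r"], freq["E"])    # 8 4 2 1
    print(sorted(freq.items(), key=lambda kv: kv[1]))    # lowest frequency first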
BUILDING A TREE
While the priority queue contains two or more nodes:
- Create a new node
- Dequeue a node and make it the left subtree
- Dequeue the next node and make it the right subtree
- The frequency of the new node equals the sum of the frequencies of its left and right children
- Enqueue the new node back into the queue
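A sketch of this loop on the frequencies above, tracking only the frequencies in the queue (how equal frequencies are paired depends on tie-breaking, so intermediate steps may differ from the original diagrams; the final frequency is always 26).

    import heapq

    queue = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 4, 8]   # E i y l k . r s n a sp e
    heapq.heapify(queue)
    while len(queue) > 1:                    # two or more nodes left
        left = heapq.heappop(queue)          # dequeue the two smallest nodes
        right = heapq.heappop(queue)
        heapq.heappush(queue, left + right)  # enqueue the merged node
        print(sorted(queue))
    # The last line printed is [26]: a single node holding the whole tree.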
(Tree-construction diagrams: at each step the two nodes with the lowest frequencies are dequeued, joined under a new parent node whose frequency is their sum, and the parent is enqueued again, until a single tree remains.)
BUILDING A TREE
After enqueueing this node there is only one node left in the priority queue.
BUILDING A TREE
Dequeue the single node left in the queue.
This tree contains the new code words for each character.
The frequency of the root node should equal the number of characters in the text:
Eerie eyes seen near lake. (26 characters)
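A quick check of that claim:

    text = "Eerie eyes seen near lake."
    print(len(text))                # 26 characters
    print(6 * 1 + 4 * 2 + 4 + 8)    # 26, the frequency of the root node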
ENCODING THE FILE
TRAVERSE TREE FOR CODES
Perform a traversal of the tree to obtain the new code words: going left adds a 0 and
going right adds a 1. A code word is only completed when a leaf node is reached.
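A self-contained Python sketch of this step: it rebuilds the tree for the example text and walks it, appending 0 on a left branch and 1 on a right branch. Tie-breaking among equal frequencies may produce code words that differ from the table below, but the total encoded length is the same.

    import heapq, itertools
    from collections import Counter

    def build_codes(text):
        order = itertools.count()
        # Heap entries are (frequency, tie-breaker, tree); a tree is either a
        # single character (a leaf) or a (left, right) pair (an internal node).
        heap = [(f, next(order), ch) for ch, f in Counter(text).items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
        codes = {}
        def walk(tree, path):
            if isinstance(tree, str):      # leaf reached: the code word is complete
                codes[tree] = path
            else:
                walk(tree[0], path + "0")  # going left is a 0
                walk(tree[1], path + "1")  # going right is a 1
        walk(heap[0][2], "")
        return codes

    print(build_codes("Eerie eyes seen near lake."))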
Char     Code
E        0000
i        0001
y        0010
l        0011
k        0100
.        0101
space    011
e        10
r        1100
s        1101
n        1110
a        1111
ENCODING THE FILE
Rescan the text and encode the file using the new code words (table above):

Eerie eyes seen near lake.

0000101100000110011100010
1011010111101101011100111110
1011111100011001111110100100101

Q. Why is there no need for a separator character?
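A sketch of both steps in Python, using the code table above; the prefix property answers the question: as soon as the accumulated bits match a code word, that code word is complete, so no separator is needed.

    codes = {"E": "0000", "i": "0001", "y": "0010", "l": "0011",
             "k": "0100", ".": "0101", " ": "011",  "e": "10",
             "r": "1100", "s": "1101", "n": "1110", "a": "1111"}
    decoding = {code: ch for ch, code in codes.items()}

    def encode(text):
        return "".join(codes[ch] for ch in text)

    def decode(bits):
        out, word = [], ""
        for bit in bits:
            word += bit                  # accumulate bits...
            if word in decoding:         # ...until they form a complete code word
                out.append(decoding[word])
                word = ""
        return "".join(out)

    bits = encode("Eerie eyes seen near lake.")
    print(len(bits))       # 84 bits, versus 26 * 8 = 208 bits in 8-bit ASCII
    print(decode(bits))    # Eerie eyes seen near lake.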
ENCODING THE FILE
RESULTS
APPLICATIONS OF HUFFMAN CODING:
CONCLUSION: