
Lecture 2: Huffman Coding

Huffman coding is a variable-length encoding technique that uses the frequencies of symbols to construct a prefix code with shorter codes assigned to more frequent symbols. It builds a binary tree from the frequency data and maps symbols to binary strings based on paths in the tree. This results in a code with the minimum possible expected code length, or bit rate, for a prefix code. An example shows how Huffman coding assigns shorter codes like '0' to more frequent symbols like 'a' compared to less frequent symbols like 'd' coded as '11'.


Huffman Coding
Huffman (1951) uses the frequencies of symbols in a string to build a variable rate prefix code: each symbol is mapped to a binary string, more frequent symbols get shorter codes, and no code is a prefix of another.

Example: a = 0, b = 100, c = 101, d = 11

Variable Rate Code Example


Example: a = 0, b = 100, c = 101, d = 11

Coding aabddcaa with a fixed-length code takes 8 symbols x 2 bits = 16 bits. With the variable rate code:

aabddcaa = 0 0 100 11 11 101 0 0 = 14 bits

The prefix property ensures unique decodability: since no codeword is a prefix of another, a left-to-right scan can emit each symbol as soon as its codeword is complete.
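To make this concrete, here is a minimal Python sketch (the table CODE and the helper names encode and decode are illustrative, not from the lecture) that encodes a string with this prefix code and decodes it back by a greedy left-to-right scan:

# Example prefix code from the slide above.
CODE = {"a": "0", "b": "100", "c": "101", "d": "11"}

def encode(message):
    # Concatenate the codeword of each symbol.
    return "".join(CODE[s] for s in message)

def decode(bits):
    # The prefix property guarantees that the first codeword matching
    # the front of the stream is the only possible one.
    inverse = {v: k for k, v in CODE.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

bits = encode("aabddcaa")
print(bits, len(bits))            # 00100111110100 14
print(decode(bits))               # aabddcaa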

Cost of a Huffman Tree


Let p1, p2, ..., pm be the probabilities for the symbols a1, a2, ..., am, respectively. Define the cost of the Huffman tree T to be

HC(T) = p1 x r1 + p2 x r2 + ... + pm x rm

where ri is the length of the path from the root to ai. HC(T) is the expected length of the codeword of a symbol coded by the tree T, i.e. the bit rate of the code.

Example of Cost
Example: a = 1/2, b = 1/8, c = 1/8, d = 1/4, with codeword lengths 1, 3, 3, 2 respectively.

HC(T) = 1 x 1/2 + 3 x 1/8 + 3 x 1/8 + 2 x 1/4 = 1.75 bits per symbol

[Figure: the corresponding tree, with leaves a, b, c, d]
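As a quick arithmetic check (the dictionaries probs and depths are just illustrative names for the example's data):

probs  = {"a": 1/2, "b": 1/8, "c": 1/8, "d": 1/4}
depths = {"a": 1, "b": 3, "c": 3, "d": 2}   # root-to-leaf path lengths

# HC(T) = sum of pi x ri over all symbols
print(sum(probs[s] * depths[s] for s in probs))   # 1.75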

Huffman Tree
Input: probabilities p1, p2, ..., pm for symbols a1, a2, ..., am, respectively.
Output: a tree that minimizes the average number of bits (bit rate) to code a symbol, that is, minimizes

HC(T) = p1 x r1 + p2 x r2 + ... + pm x rm

where ri is the length of the path from the root to ai. Such a tree is called a Huffman tree, and the resulting code a Huffman code.

Example of Huffman Tree


Algorithm: repeatedly take the two nodes of smallest probability, make them children of a new node whose probability is their sum, and repeat until a single tree remains. Input: P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1. The following slides trace the merges; a runnable sketch follows them.

Example of Huffman Tree Algorithm

[Figures: the successive merging steps. Consistent with the final code below, b and e merge first into a node of weight .2, that node then merges with d into .3, then with c into .6, and finally with a into the root of weight 1.0.]
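Below is a compact Python sketch of this greedy construction using the standard heapq module (the tuple-based tree representation and the tie-breaking counter are implementation choices, not part of the lecture). Ties among equal probabilities can be broken in different ways, so the codewords produced may differ from the slide's, but the average length is the same:

import heapq
from itertools import count

def huffman_code(probs):
    # Heap entries are (probability, tiebreaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node). The tiebreaker
    # keeps heapq from ever comparing two trees directly.
    ticket = count()
    heap = [(p, next(ticket), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)   # the two least probable nodes...
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(ticket), (t1, t2)))  # ...merge
    root = heap[0][2]

    code = {}
    def walk(tree, prefix):
        # A left edge appends 0, a right edge appends 1.
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"    # single-symbol alphabet edge case
    walk(root, "")
    return code

probs = {"a": .4, "b": .1, "c": .3, "d": .1, "e": .1}
code = huffman_code(probs)
print(code)
print(sum(p * len(code[s]) for s, p in probs.items()))  # 2.1, up to float rounding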

Huffman Code
The resulting code is

a = 0, b = 1110, c = 10, d = 110, e = 1111

so the average number of bits per symbol is .4 x 1 + .1 x 4 + .3 x 2 + .1 x 3 + .1 x 4 = 2.1.

Optimal Huffman Code vs. Entropy


P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1

Entropy: H = -(.4 x log2(.4) + .1 x log2(.1) + .3 x log2(.3) + .1 x log2(.1) + .1 x log2(.1)) ≈ 2.05 bits per symbol

Huffman code: HC = .4 x 1 + .1 x 4 + .3 x 2 + .1 x 3 + .1 x 4 = 2.1 bits per symbol

Pretty good: the entropy is a lower bound on the bit rate of any prefix code, and the Huffman code comes within .05 bits of it.
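The same comparison in a few lines of Python (the hard-coded lengths come from the slide's code above):

from math import log2

probs   = {"a": .4, "b": .1, "c": .3, "d": .1, "e": .1}
lengths = {"a": 1, "b": 4, "c": 2, "d": 3, "e": 4}   # |codeword| per symbol

H  = -sum(p * log2(p) for p in probs.values())       # entropy
HC = sum(probs[s] * lengths[s] for s in probs)       # Huffman bit rate
print(round(H, 2), round(HC, 2))                     # 2.05 2.1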
