Huffman Coding
BS(SE)-V (Evening)
1.1 BACKGROUND
This project was conducted for the Massey University 159.333 Individual
Programming Project paper. The main purpose of the project is to research and
implement data compression algorithms. The objectives of the project are:
1. Research and apply data compression algorithms using C++ and appropriate
data structures.
2. Implement compression algorithms and compare the effectiveness of the
implementations.
3. Extend current methodological approaches and test the designs of the
developed algorithms.
In this project, two lossless data compression algorithms were implemented and
discussed. Lossless data compression exploits statistical redundancy to
represent data more compactly without losing any information.
2. LITERATURE REVIEW
This section discusses compression-related research in the areas of information
theory and data compression algorithms. For this project, entropy and lossless
compression algorithms are the main topics; therefore, a brief introduction to
data compression and lossless compression is given below.
2.1 GENERAL IDEA OF DATA COMPRESSION
2.1.1 SCHEMA OF DATA COMPRESSION
In computer science and communication theory, data compression (or source
coding) is the art of encoding original data, by some specific mechanism, into
fewer bits (or other information-related units). For example, if the word
“compression” in the previous sentence is encoded as “comp”, then the sentence
can be stored in a text file using less data. A popular everyday example is the
ZIP file format, which is widely used on PCs. It not only provides compression
but also offers archiving tools (archivers) that can store many files in a
single archive file.
Data compression is possible because most real-world data contain a large
amount of statistical redundancy, which can exist in various forms. For
example, in an English text file the letter “e” appears far more frequently
than the letter “q”, and the probability that the letter “q” is followed by “z”
is dramatically small. Data redundancy also exists in multimedia information.
Broadly, data redundancy has the following types:
1. Spatial redundancy. (The static architectural background, blue sky, and lawn
of an image contain many identical pixels. If such an image is stored pixel by
pixel, a lot of space is wasted.)
2. Temporal redundancy. (In television and animation, adjacent frames are
likely to share the same background, with only slight changes in the positions
of moving objects, so it is only worth storing the portion that differs between
adjacent frames.)
3. Structural redundancy.
4. Knowledge redundancy.
5. Information entropy redundancy. (Also known as coding redundancy: the gap
between the bits actually used and the entropy that the data carry.)
Hence, one aspect of compression is redundancy removal. Characterizing
redundancy involves some form of modeling; for the English-letter example
above, the model of the redundancy is English text. This process is also known
as de-correlation. After the modeling process, the information needs to be
encoded into a binary representation; this encoding stage involves the coding
algorithms that are applied next.
2.1.2 CODING AND ALGORITHM
3. METHODOLOGY
This section will mainly present various methods which were used in this project.
Implementations and experimental results will be discussed in next sections. All
the methods were conducted for testing and making extensions on current data
compression algorithms, such as Huffman Algorithm. A series of programs were
written in C++ to make adaptation of already available device, algorithm and
design, in order to research and improve the productivity of current compression
algorithms. These programs are tested using Visual Studio 2017 IDE; on an Intel
Core i7-4790k 2.7GHz equipped PC machine under Windows 10 operation system.
HuffManCoding.h File
#ifndef HUFFMANCODING_H
#define HUFFMANCODING_H
#include <iostream>
#include <cstdlib>
using namespace std;
#define MAX_TREE_HT 100
// A node of the Huffman tree: a symbol, its frequency, and two children.
struct MinHeapNode {
    char data;
    unsigned freq;
    MinHeapNode *left, *right;
};
// A min-heap of tree nodes, ordered by frequency.
struct MinHeap {
    unsigned size;
    unsigned capacity;
    MinHeapNode** array;
};
class HuffManCoding
{
public:
    HuffManCoding();
    ~HuffManCoding();
    MinHeapNode* newNode(char data, unsigned freq);
    MinHeap* createMinHeap(unsigned capacity);
    void swapMinHeapNode(MinHeapNode** a, MinHeapNode** b);
    void minHeapify(MinHeap* minHeap, int idx);
    int isSizeOne(MinHeap* minHeap);
    MinHeapNode* extractMin(MinHeap* minHeap);
    void insertMinHeap(MinHeap* minHeap, MinHeapNode* minHeapNode);
    void buildMinHeap(MinHeap* minHeap);
    void printArr(int arr[], int n);
    int isLeaf(MinHeapNode* root);
    MinHeap* createAndBuildMinHeap(char data[], int freq[], int size);
    MinHeapNode* buildHuffmanTree(char data[], int freq[], int size);
    void printCodes(MinHeapNode* root, int arr[], int top);
    void HuffmanCodes(char data[], int freq[], int size);
};
#endif // HUFFMANCODING_H
HuffManCoding.cpp
#include "HuffManCoding.h"
HuffManCoding::HuffManCoding()
{
}
HuffManCoding::~HuffManCoding()
{
}
// Allocate a new tree node for a symbol and its frequency.
MinHeapNode* HuffManCoding::newNode(char data, unsigned freq)
{
    MinHeapNode* temp = new MinHeapNode();
    temp->left = temp->right = NULL;
    temp->data = data;
    temp->freq = freq;
    return temp;
}
// Allocate an empty min-heap with the given capacity.
MinHeap* HuffManCoding::createMinHeap(unsigned capacity)
{
    MinHeap* minHeap = new MinHeap();
    minHeap->size = 0;
    minHeap->capacity = capacity;
    minHeap->array = new MinHeapNode*[capacity];
    return minHeap;
}
void HuffManCoding::swapMinHeapNode(MinHeapNode** a, MinHeapNode** b)
{
    MinHeapNode* t = *a;
    *a = *b;
    *b = t;
}
// Restore the min-heap property at index idx by sifting down.
void HuffManCoding::minHeapify(MinHeap* minHeap, int idx)
{
    int smallest = idx;
    int left = 2 * idx + 1;
    int right = 2 * idx + 2;
    if (left < (int)minHeap->size
        && minHeap->array[left]->freq < minHeap->array[smallest]->freq)
        smallest = left;
    if (right < (int)minHeap->size
        && minHeap->array[right]->freq < minHeap->array[smallest]->freq)
        smallest = right;
    if (smallest != idx) {
        swapMinHeapNode(&minHeap->array[smallest], &minHeap->array[idx]);
        minHeapify(minHeap, smallest);
    }
}
int HuffManCoding::isSizeOne(MinHeap* minHeap)
{
    return minHeap->size == 1;
}
// Remove and return the node with the smallest frequency.
MinHeapNode* HuffManCoding::extractMin(MinHeap* minHeap)
{
    MinHeapNode* temp = minHeap->array[0];
    minHeap->array[0] = minHeap->array[minHeap->size - 1];
    --minHeap->size;
    minHeapify(minHeap, 0);
    return temp;
}
// Insert a node, sifting it up to keep the min-heap property.
void HuffManCoding::insertMinHeap(MinHeap* minHeap, MinHeapNode* minHeapNode)
{
    ++minHeap->size;
    int i = minHeap->size - 1;
    while (i && minHeapNode->freq < minHeap->array[(i - 1) / 2]->freq) {
        minHeap->array[i] = minHeap->array[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    minHeap->array[i] = minHeapNode;
}
// Heapify every internal node, bottom-up.
void HuffManCoding::buildMinHeap(MinHeap* minHeap)
{
    int n = minHeap->size - 1;
    int i;
    for (i = (n - 1) / 2; i >= 0; --i)
        minHeapify(minHeap, i);
}
void HuffManCoding::printArr(int arr[], int n)
{
    for (int i = 0; i < n; ++i)
        cout << arr[i];
    cout << endl;
}
int HuffManCoding::isLeaf(MinHeapNode* root)
{
    return !(root->left) && !(root->right);
}
// Build a heap containing one leaf node per symbol.
MinHeap* HuffManCoding::createAndBuildMinHeap(char data[], int freq[], int size)
{
    MinHeap* minHeap = createMinHeap(size);
    for (int i = 0; i < size; ++i)
        minHeap->array[i] = newNode(data[i], freq[i]);
    minHeap->size = size;
    buildMinHeap(minHeap);
    return minHeap;
}
// Repeatedly merge the two least-frequent nodes until one tree remains.
MinHeapNode* HuffManCoding::buildHuffmanTree(char data[], int freq[], int size)
{
    MinHeapNode *left, *right, *top;
    MinHeap* minHeap = createAndBuildMinHeap(data, freq, size);
    while (!isSizeOne(minHeap)) {
        left = extractMin(minHeap);
        right = extractMin(minHeap);
        // '$' marks an internal node; it is never printed as a symbol.
        top = newNode('$', left->freq + right->freq);
        top->left = left;
        top->right = right;
        insertMinHeap(minHeap, top);
    }
    return extractMin(minHeap);
}
// Walk the tree, recording 0 for left and 1 for right; print at each leaf.
void HuffManCoding::printCodes(MinHeapNode* root, int arr[], int top)
{
    if (root->left) {
        arr[top] = 0;
        printCodes(root->left, arr, top + 1);
    }
    if (root->right) {
        arr[top] = 1;
        printCodes(root->right, arr, top + 1);
    }
    if (isLeaf(root)) {
        cout << root->data << ": ";
        printArr(arr, top);
    }
}
// Build the Huffman tree and print the code of every symbol.
void HuffManCoding::HuffmanCodes(char data[], int freq[], int size)
{
    MinHeapNode* root = buildHuffmanTree(data, freq, size);
    int arr[MAX_TREE_HT], top = 0;
    printCodes(root, arr, top);
}
Main.cpp
#include <iostream>
#include "HuffManCoding.h"
using namespace std;
int main()
{
    HuffManCoding HMC;
    char arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
    int freq[] = { 5, 9, 12, 13, 16, 45 };
    int size = sizeof(arr) / sizeof(arr[0]);
    HMC.HuffmanCodes(arr, freq, size);
    return 0;
}
5. BIBLIOGRAPHY
A. Lesne. (2011). Shannon entropy: a rigorous mathematical notion at the
crossroads between probability, information theory, dynamical systems and
statistical physics.
Retrieved from: http://www.lptmc.jussieu.fr/user/lesne/MSCS-entropy.pdf
A. Rényi. (1961). On Measures of Entropy and Information.
Retrieved from: http://l.academicdirect.org/Horticulture/GAs/Refs/Renyi_1961.pdf
A. Shahbahrami, R. Bahrampour, M. S. Rostami, M. A. Mobarhan. (2011).
Evaluation of Huffman and Arithmetic Algorithms for Multimedia Compression
Standards.
Retrieved from: http://arxiv.org/ftp/arxiv/papers/1109/1109.0216.pdf