COMS3100/COMS7100

Introduction to Communications
Lecture 31: Source Coding
This lecture:
1. Information Theory.
2. Entropy.
3. Source Coding.
4. Huffman Coding.
Ref: CCR pp. 697–709, A Mathematical Theory of Communication.

Information Theory

Claude Shannon’s paper, A Mathematical Theory of Communication, showed that reliable communication is possible at non-zero rate.
- Shannon proposed the following model for point-to-point communication.

Information Theory (2)
- Shannon showed that there is a very widely applicable, quantitative definition of information.
- The source generates information at a certain rate.
- The noisy channel can be shown to have a capacity at which it can reliably transmit information.
- Reliable transmission is possible if and only if the source’s rate is not greater than the channel’s capacity.

The Mathematical Model

The source produces symbols from an alphabet.
- The alphabet is the (discrete) set of all possible symbols.
  E.g., the Roman alphabet, the Unicode character set or the range of possible pixel intensities from a camera sensor.
- The source produces these symbols in a discrete-time sequence.
- The channel has its own alphabet or, rather, alphabets: one at the input and one at the output.
  E.g., consider a channel including a polar NRZ modulator and demodulator, so the input and output alphabets are {−A, +A}.

The Mathematical Model (2)
- At each use of the channel, the channel output is random (because of noise) but dependent on the input.
- The rate at which the source generates symbols may be different to the rate at which the channel is used.
- Some mechanism is needed to translate between the source and the channel alphabets and back again.
- Shannon found that this can always be broken into two independent processes at the transmitter: source and channel coding.
  - Similarly, at the receiver, source and channel decoding.

The Mathematical Model (3)

- Without loss of generality, the common language of the source and channel (de)coder is a bitstream.
- Source symbol generation, intermediate bitstream and channel use may all be at different rates.

Information & Entropy
We measure information and entropy in terms of probability and random variables.

Self-Information
- In order to use random variables, map the source alphabet to the numbers {0, . . . , M − 1}, where M is the size of the alphabet.
- Let the source symbol selected for transmission at a certain time instant be represented as a discrete r.v. X.
- Let i represent one possible value for X and define p_i = P(X = i).

Self-Information
The amount of information (or surprise) at learning that X = i is log(1/p_i) = − log p_i.
- Hence, the rarer the event, i.e., the less probable, the more surprising and the more informative it is.
- Shannon calls this self-information.
- The base of the logarithm has not been specified, but it is usually taken to be 2, in which case the unit is bits.
  - If we take the natural logarithm, the unit is nats.
  - (Technically, self-information is dimensionless.)
- Measurement in bits is natural: if eight symbols are equally likely, it makes sense that they have 3 bits of information each.
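As a quick check of that last point, here is a minimal Python sketch (mine, not part of the original slides):

```python
import math

def self_information_bits(p):
    """Self-information ('surprise') of an outcome with probability p, in bits."""
    return -math.log2(p)

# Eight equally likely symbols: each carries log2(8) = 3 bits, as claimed above.
print(self_information_bits(1 / 8))   # 3.0

# Rarer events are more informative:
print(self_information_bits(0.01))    # ~6.64 bits
```
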
Entropy
The expected self-information is called the entropy of X:

    H(X) = E[log(1/p_X)] = − ∑_{i=0}^{M−1} p_i log p_i.

- Shannon used this name because of the similar expression that arises in statistical thermodynamics.
- It can be regarded as a measure of randomness or disorder.
- Degenerate r.v.s have zero entropy.
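A small Python sketch of this definition (my own illustration, not from the slides; the example distribution is the one used in the Huffman example later in the lecture):

```python
import math

def entropy_bits(pmf):
    """H(X) = -sum_i p_i log2 p_i, in bits. Terms with p_i = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy_bits([0.25, 0.25, 0.2, 0.15, 0.15]))  # ~2.29 bits
print(entropy_bits([1.0, 0.0, 0.0]))                # 0.0: a degenerate r.v.
```
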
Entropy (2)

- It can be shown that uniformly distributed r.v.s have the highest entropy for a given M, so that

    0 ≤ H(X) ≤ log M.

- We’ll use H_b(X) when we need to be explicit about using a logarithm to the base b.
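A quick numerical illustration of the bound and of the base-b notation (again a sketch, not from the slides):

```python
import math

def entropy(pmf, base=2):
    """Entropy H_b(X) of a pmf, using logarithms to the given base."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

uniform = [1 / 8] * 8
print(entropy(uniform))               # 3.0 = log2(8): uniform attains log M
print(entropy([0.7, 0.1, 0.1, 0.1]))  # ~1.36 bits, below log2(4) = 2

# Changing the base only rescales the value: H_e(X) = H_2(X) * ln 2.
print(entropy(uniform, base=math.e))  # ~2.079 = 3.0 * ln 2
```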

The Source Coding Theorem
If, each time the source emits a symbol, it is independent of previous symbols and identically distributed, we call it a discrete memoryless source (DMS).
- Suppose the DMS emits r symbols per second.
- Shannon showed that it is possible to use source coding to encode the symbols in a bitstream at r·H_2(X) bits per second.
  - Conversely, he showed it is impossible to have a uniquely decodable bitstream at a lower rate.
- This is Shannon’s source coding theorem (and converse).
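As a concrete (made-up) instance: a DMS with the entropy-2.29-bit distribution used later in the Huffman example, emitting 1000 symbols per second, needs a bitstream of roughly 2.29 kbit/s and, by the converse, no less:

```python
import math

pmf = [0.25, 0.25, 0.2, 0.15, 0.15]        # example source distribution
H2 = -sum(p * math.log2(p) for p in pmf)   # ~2.29 bits per symbol

r = 1000                                   # hypothetical symbol rate (symbols/s)
print(f"minimum bitstream rate ≈ {r * H2:.0f} bits/s")   # about 2.29 kbit/s
```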

Shannon’s Source Coding Procedure

In proving the source coding theorem, Shannon devised a simple but impractical source coding scheme.
- We group N symbols together into a block.
- All likely sequences (in a certain sense) are identified.
- These sequences are enumerated using binary words of N·H_2(X) bits (rounded down; ignoring some sequences if too many).
- This constitutes the codebook.

Coding Procedure (2)

- To perform source coding, compare a given symbol sequence against those in the codebook & output the code, if there is one.
- The probability that this scheme doesn’t work → 0 as N → ∞.
- This scheme is impractical because there is not necessarily much structure in the codebook.
  ⇒ The codebook may require massive storage space.
  ⇒ The codebook may need to be exhaustively searched.
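A toy realization of this procedure (my own sketch: for simplicity it keeps the 2^⌊N·H₂(X)⌋ most probable blocks rather than constructing the formal typical set):

```python
import math
from itertools import product

def block_codebook(pmf, N):
    """Map the ~2^(N*H2) most probable length-N symbol blocks to fixed-length binary words."""
    H2 = -sum(p * math.log2(p) for p in pmf if p > 0)
    n_bits = int(N * H2)                              # N*H2(X) bits, rounded down
    blocks = sorted(product(range(len(pmf)), repeat=N),
                    key=lambda b: -math.prod(pmf[s] for s in b))
    kept = blocks[:2 ** n_bits]                       # drop the least likely blocks
    return {block: format(i, f"0{n_bits}b") for i, block in enumerate(kept)}

def encode(block, codebook):
    """Compare the block against the codebook; None means 'no code' (the scheme fails)."""
    return codebook.get(block)

pmf = [0.25, 0.25, 0.2, 0.15, 0.15]
book = block_codebook(pmf, N=4)                       # 4 * 2.29 -> 9-bit words
print(encode((0, 1, 0, 2), book))                     # a likely block: gets a 9-bit word
print(encode((4, 4, 4, 4), book))                     # an unlikely block: None
```

Even for this tiny alphabet the codebook has 2⁹ = 512 unstructured entries; for realistic N it becomes astronomically large, which is exactly the impracticality noted above.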

Variable-Length Codes

An ideal source-coding scheme is theoretically easy but practically difficult.
- Also, for finite N, the scheme is unreliable (not all sequences have codes!).
- How to make the best possible codes for finite block sizes?
- We’ll start with codes for a single symbol, i.e., N = 1.
- Consider a variable-length source code where each symbol maps to a variable number of bits.

Variable-Length Codes (2)

- Suppose symbol i is assigned a code of n_i bits.
- It turns out that the code can be made uniquely decodable if and only if the Kraft inequality is satisfied:

    ∑_{i=0}^{M−1} 2^{−n_i} ≤ 1.
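A direct check of the inequality (a sketch; the first set of lengths is the one the Huffman example below produces):

```python
def kraft_sum(lengths):
    """Left-hand side of the Kraft inequality for a list of codeword lengths n_i."""
    return sum(2 ** -n for n in lengths)

print(kraft_sum([2, 2, 2, 3, 3]))   # 1.0  <= 1: a uniquely decodable code exists
print(kraft_sum([1, 1, 2]))         # 1.25 >  1: no uniquely decodable code has these lengths
```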

Huffman Coding

In 1952, David Huffman discovered a simple method of constructing an optimal variable-length, uniquely decodable source code.
- The method constructs a binary tree, i.e., a tree in which each node has at most two children.
- It proceeds from the bottom up, combining leaves into ‘twigs’, twigs into ‘branches’ and so on until the tree is built.
- Let’s call any partially assembled portion of the tree a twig.

Huffman Coding Technique

- To start with, there are M twigs, which consist only of the leaves themselves, the symbols.
- The probability of a twig is the sum of the probabilities of all of its leaves.
- The code construction algorithm is simply the following step, iterated (M − 1 times) until only one twig remains:
  - Choose the two twigs with the least probability and assemble them together to make a larger twig.
- To read off the codes, descend through the tree towards the symbol’s leaf.
  - Each time we take a left branch, output a ‘0’, otherwise a ‘1’.
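A compact Python sketch of this construction (my own illustration, not the lecture's code; it uses a heap to pick the two least probable twigs and then reads the codes off the finished tree with the left-'0'/right-'1' convention):

```python
import heapq

def huffman_code(pmf):
    """Construct a Huffman code for symbols 0..M-1 with probabilities pmf."""
    # A twig is either a bare symbol index (a leaf) or a [left, right] pair.
    # Heap entries carry a unique tie-breaker so twigs are never compared directly.
    heap = [(p, i, i) for i, p in enumerate(pmf)]
    heapq.heapify(heap)
    counter = len(pmf)
    while len(heap) > 1:                              # iterated M-1 times
        p1, _, t1 = heapq.heappop(heap)               # the two least probable twigs...
        p2, _, t2 = heapq.heappop(heap)
        counter += 1
        heapq.heappush(heap, (p1 + p2, counter, [t1, t2]))   # ...become one larger twig
    codes = {}
    def walk(twig, prefix):                           # descend towards each leaf
        if isinstance(twig, list):
            walk(twig[0], prefix + "0")               # left branch  -> '0'
            walk(twig[1], prefix + "1")               # right branch -> '1'
        else:
            codes[twig] = prefix
    walk(heap[0][2], "")
    return codes

print(huffman_code([0.25, 0.25, 0.2, 0.15, 0.15]))
# {2: '00', 0: '01', 1: '10', 3: '110', 4: '111'} -- ties are broken differently from
# the tree on the next slide, but the lengths (2, 2, 2, 3, 3) and hence the average
# of 2.3 bits are the same.
```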

Huffman Code Example
Consider a source with M = 5 for which the probabilities are
p_0 = p_1 = 0.25, p_2 = 0.2, p_3 = p_4 = 0.15.

[Figure: Huffman tree for this source. Step 1 combines p_3 and p_4 into a twig of probability 0.3; Step 2 combines p_1 and p_2 (probability 0.45); Step 3 combines p_0 with the 0.3 twig (probability 0.55); Step 4 joins the 0.55 and 0.45 twigs at the root, p = 1. Reading off the branches gives 2-bit codewords for symbols 0, 1 and 2, and 3-bit codewords for symbols 3 and 4.]

- Average code length is 2.3 bits and the entropy is 2.29 bits.
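A quick numerical check of those two figures (assuming the codeword lengths 2, 2, 2, 3, 3 read off the tree above):

```python
import math

pmf     = [0.25, 0.25, 0.2, 0.15, 0.15]
lengths = [2, 2, 2, 3, 3]                 # codeword lengths from the tree above

avg_len = sum(p * n for p, n in zip(pmf, lengths))
H       = -sum(p * math.log2(p) for p in pmf)
print(f"average length = {avg_len:.2f} bits, entropy = {H:.2f} bits")
# average length = 2.30 bits, entropy = 2.29 bits
```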

Developments of Source Coding

Source coding is also known as data compression.
- More particularly, lossless data compression, since the input and output symbols are identical.
- Our exposition required that the probability distribution be known in advance.
- If we don’t know it, we can use universal source coding.
  - Examples: Lempel-Ziv (LZ77) & Lempel-Ziv-Welch (LZW) algorithms & derivatives such as DEFLATE in ZIP & gzip software.
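For example, DEFLATE is exposed in Python's standard-library zlib module; a tiny demonstration of lossless round-tripping (the sample data is made up):

```python
import zlib

data = b"abracadabra " * 100                # made-up, highly repetitive sample text
compressed = zlib.compress(data)            # zlib wraps the DEFLATE algorithm

print(len(data), "->", len(compressed), "bytes")
assert zlib.decompress(compressed) == data  # lossless: the round trip is exact
```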

Source Coding Applications

- For sources like English text, lossless coding is very important.
- In other applications, like audio, images and video, we may be able to put up with some distortion for a lower bit rate.
- In 1959, Shannon developed rate-distortion theory, the basis of modern lossy data compression.
  - Examples: voice coding in mobile phones, MP3 for music, JPEG for images, MPEG for video.
