Lecture 3: Entropy, Relative Entropy, and Mutual Information
Lecture 3 - 01/13/2015
In this lecture, we will introduce certain key measures of information that play crucial roles in theoretical
and operational characterizations throughout the course. These include the entropy, the mutual information,
and the relative entropy. We will also present some key properties of these information measures.
1 Notation
U denotes a discrete random variable taking values in an alphabet \mathcal{U}, with PMF P_U(u), u \in \mathcal{U}.

2 Entropy
Definition 1. Surprise: The surprise associated with observing the outcome u is defined by
\log \frac{1}{P_U(u)}.   (1)
Definition 2. Entropy: Let U be a discrete random variable taking values in \mathcal{U}. The entropy of U is defined by
H(U) \triangleq E\left[\log \frac{1}{P_U(U)}\right] = \sum_{u \in \mathcal{U}} P_U(u) \log \frac{1}{P_U(u)}.   (2)
Note: The entropy H(U) is not a random variable. In fact, it is not a function of the object U, but
rather a functional (or property) of the underlying distribution P_U(u), u \in \mathcal{U}. An analogy is E[U], which is
also a number (the mean) corresponding to the distribution.
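For concreteness, here is a small Python sketch (not part of the original notes; the PMF and the helper name entropy are illustrative choices) that evaluates (2) for a PMF stored as a dictionary, using base-2 logarithms.

```python
import math

def entropy(pmf, base=2):
    """H(U) = sum_u P(u) * log(1 / P(u)); zero-probability outcomes contribute nothing."""
    return sum(p * math.log(1.0 / p, base) for p in pmf.values() if p > 0)

# Example: a biased coin with P(heads) = 0.9.
coin = {"heads": 0.9, "tails": 0.1}
print(entropy(coin))  # ~0.469 bits, less than the 1 bit of a fair coin
```

Note that the value depends only on the probabilities, not on the labels of the outcomes, consistent with H(U) being a functional of P_U rather than a function of U.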
Jensen's Inequality: Let Q denote a convex function, and X denote any random variable. Jensen's
inequality states that
E[Q(X)] \geq Q(E[X]).   (3)
If Q is instead concave, the inequality is reversed: E[Q(X)] \leq Q(E[X]).
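As a quick numerical sanity check (my own illustration, not from the notes), the sketch below compares E[Q(X)] with Q(E[X]) for the convex function Q(x) = x^2 and a small discrete X.

```python
# Jensen's inequality check for the convex function Q(x) = x**2.
xs = [(1, 0.2), (3, 0.5), (10, 0.3)]      # (value, probability) pairs for X
E_X = sum(x * p for x, p in xs)           # E[X] = 4.7
E_QX = sum((x ** 2) * p for x, p in xs)   # E[Q(X)] = 34.7
print(E_QX >= E_X ** 2)                   # True: 34.7 >= 22.09
```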
2.1 Properties of Entropy
1. H(U) \leq \log m, where m = |\mathcal{U}|, with equality iff P_U(u) = \frac{1}{m} for all u \in \mathcal{U} (i.e. uniform).
Proof:
H(U) = E\left[\log \frac{1}{P(U)}\right]   (6)
\leq \log E\left[\frac{1}{P(U)}\right]   (Jensen's inequality, since \log is concave)   (7)
= \log \sum_{u \in \mathcal{U}} P(u) \cdot \frac{1}{P(u)}   (8)
= \log m.   (9)
Equality holds in (7) iff \frac{1}{P(U)} is deterministic, i.e. P(u) = \frac{1}{m} for all u \in \mathcal{U}.

2. H(U) \geq 0:
H(U) = E\left[\log \frac{1}{P(U)}\right] \geq 0 \quad \text{since } \log \frac{1}{P(U)} \geq 0.   (10)
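The following sketch (my own; the alphabet size m = 4 and the PMFs are arbitrary examples) illustrates properties 1 and 2: every PMF satisfies 0 <= H(U) <= log m, with the maximum attained by the uniform distribution.

```python
import math

def entropy_bits(probs):
    # H(U) in bits; zero-probability symbols contribute nothing.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

m = 4
uniform = [1.0 / m] * m
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy_bits(uniform), math.log2(m))  # 2.0 2.0: equality at the uniform PMF
print(entropy_bits(skewed))                 # ~1.357, strictly between 0 and log m
```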
3. Suppose U has PMF p, but we incorrectly assume its PMF to be q, and define the corresponding expected surprise
H_q(U) \triangleq \sum_{u \in \mathcal{U}} p(u) \log \frac{1}{q(u)}.   (11)
Note that this is the expected surprise function, but instead of the surprise associated with p, it is the
surprise associated with q, averaged over U, which is distributed according to the PMF p but incorrectly
assumed to have the PMF q. The following result stipulates that we will (on average) be more surprised if we had
the wrong distribution in mind. This makes intuitive sense! Mathematically,
H(U) \leq H_q(U),   (12)
with equality iff q = p.
Proof:
H(U) - H_q(U) = E\left[\log \frac{1}{p(U)}\right] - E\left[\log \frac{1}{q(U)}\right]   (13)
= E\left[\log \frac{q(U)}{p(U)}\right].   (14)
By Jensen's inequality, we know that E\left[\log \frac{q(U)}{p(U)}\right] \leq \log E\left[\frac{q(U)}{p(U)}\right], so
H(U) - H_q(U) \leq \log E\left[\frac{q(U)}{p(U)}\right]   (15)
= \log \sum_{u \in \mathcal{U}} p(u) \frac{q(u)}{p(u)}   (16)
= \log \sum_{u \in \mathcal{U}} q(u)   (17)
= \log 1   (18)
= 0,   (19)
with equality iff \frac{q(U)}{p(U)} is deterministic, which happens iff q = p.
Note that property 3 is equivalent to saying that the relative entropy
D(p \| q) \triangleq \sum_{u \in \mathcal{U}} p(u) \log \frac{p(u)}{q(u)} = H_q(U) - H(U)
is always greater than or equal to 0, with equality iff q = p (convince yourself).
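As a numerical companion to property 3 (my own sketch; p and q are arbitrary example PMFs on a three-letter alphabet), the code below evaluates H(U), H_q(U), and D(p||q), and checks that H_q(U) - H(U) = D(p||q) >= 0.

```python
import math

p = {"a": 0.5, "b": 0.3, "c": 0.2}   # true PMF of U
q = {"a": 0.2, "b": 0.2, "c": 0.6}   # mistakenly assumed PMF

H = sum(p[u] * math.log2(1.0 / p[u]) for u in p)      # H(U)
H_q = sum(p[u] * math.log2(1.0 / q[u]) for u in p)    # H_q(U): surprise under q, averaged over p
D = sum(p[u] * math.log2(p[u] / q[u]) for u in p)     # relative entropy D(p||q)

print(H <= H_q, D >= 0)             # True True
print(abs((H_q - H) - D) < 1e-12)   # True: D(p||q) = H_q(U) - H(U)
```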
4. If X_1, X_2, \ldots, X_n are independent random variables, then
H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i).   (21)
Proof:
H(X_1, X_2, \ldots, X_n) = E\left[\log \frac{1}{p(X_1, X_2, \ldots, X_n)}\right]   (22)
= E\left[-\log p(X_1, X_2, \ldots, X_n)\right]   (23)
= E\left[-\log \prod_{i=1}^{n} p(X_i)\right]   (by independence)   (24)
= E\left[-\sum_{i=1}^{n} \log p(X_i)\right]   (25)
= \sum_{i=1}^{n} E\left[-\log p(X_i)\right]   (26)
= \sum_{i=1}^{n} H(X_i).   (27)
Therefore, the entropy of independent random variables is the sum of the individual entropies. This is
also intuitive: since the random variables are independent, their individual uncertainties (or surprises) simply add up.
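A small check of property 4 for n = 2 (my own sketch; the two marginal PMFs are arbitrary): under independence the joint PMF is the product of the marginals, and the joint entropy equals the sum of the marginal entropies.

```python
import math

def H(pmf):
    # Entropy of a PMF given as a dictionary of probabilities.
    return sum(p * math.log2(1.0 / p) for p in pmf.values() if p > 0)

p1 = {"0": 0.25, "1": 0.75}
p2 = {"a": 0.5, "b": 0.3, "c": 0.2}

# Joint PMF of (X1, X2) under independence: p(x1, x2) = p1(x1) * p2(x2).
joint = {(x1, x2): p1[x1] * p2[x2] for x1 in p1 for x2 in p2}

print(H(joint))       # ~2.296 bits ...
print(H(p1) + H(p2))  # ... equal to H(X1) + H(X2)
```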
The conditional entropy of X given Y is defined by
H(X|Y) \triangleq E\left[\log \frac{1}{P(X|Y)}\right]   (28)
= \sum_{x,y} P(x,y) \log \frac{1}{P(x|y)}   (29)
= \sum_{y} P(y) \sum_{x} P(x|y) \log \frac{1}{P(x|y)}   (30)
= \sum_{y} P(y) H(X|Y = y).   (31)
Note: The conditional entropy is a functional of the joint distribution of (X, Y ). Note that this is also
a number, and denotes the average surprise in X when we observe Y. Here, by definition, we also
average over the realizations of Y. Note that the conditional entropy is NOT a function of the random
variable Y. In this sense, it is very different from a familiar object in probability, the conditional
expectation E[X|Y ] which is a random variable (and a function of Y ).
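The sketch below (my own; the joint PMF is an arbitrary example) computes H(X|Y) directly from (28) as an average over the joint distribution, using P(x|y) = P(x, y)/P(y).

```python
import math

# Joint PMF P(x, y), keyed by the pair (x, y); an arbitrary example.
P = {(0, 0): 0.30, (0, 1): 0.20,
     (1, 0): 0.10, (1, 1): 0.40}

P_y = {}  # marginal PMF of Y
for (x, y), pxy in P.items():
    P_y[y] = P_y.get(y, 0.0) + pxy

# H(X|Y) = sum_{x,y} P(x,y) * log(1 / P(x|y)) = sum_{x,y} P(x,y) * log(P(y) / P(x,y)).
H_X_given_Y = sum(pxy * math.log2(P_y[y] / pxy) for (x, y), pxy in P.items() if pxy > 0)
print(H_X_given_Y)  # a single number (~0.875 bits here), not a random variable
```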
5. H(X|Y) \leq H(X), with equality iff X \perp Y (X and Y independent).
Proof:
H(X) - H(X|Y) = E\left[\log \frac{1}{P(X)}\right] - E\left[\log \frac{1}{P(X|Y)}\right]   (32)
= E\left[\log \frac{P(X|Y)}{P(X)}\right]   (33)
= E\left[\log \frac{P(X|Y) P(Y)}{P(X) P(Y)}\right] = E\left[\log \frac{P(X,Y)}{P(X) P(Y)}\right]   (34)
= \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x) P(y)}   (35)
= D(P_{X,Y} \| P_X P_Y)   (36)
\geq 0,
with equality iff X \perp Y. The last step follows from the non-negativity of relative entropy. Equality holds iff
P_{X,Y} = P_X P_Y, i.e. X and Y are independent.
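Using the same kind of toy joint PMF (my own sketch, continuing the example above), the following code verifies numerically that H(X) - H(X|Y) = D(P_{X,Y} || P_X P_Y) >= 0.

```python
import math

P = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}  # joint PMF P(x, y)

P_x, P_y = {}, {}  # marginals
for (x, y), pxy in P.items():
    P_x[x] = P_x.get(x, 0.0) + pxy
    P_y[y] = P_y.get(y, 0.0) + pxy

H_X = sum(p * math.log2(1.0 / p) for p in P_x.values())
H_X_given_Y = sum(pxy * math.log2(P_y[y] / pxy) for (x, y), pxy in P.items())
D = sum(pxy * math.log2(pxy / (P_x[x] * P_y[y])) for (x, y), pxy in P.items())

print(H_X - H_X_given_Y, D)  # both ~0.1245 bits, and nonnegative
```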
Definition 5. Joint Entropy of X and Y:
H(X,Y) \triangleq E\left[\log \frac{1}{P(X,Y)}\right]   (37)
= E\left[\log \frac{1}{P(X) P(Y|X)}\right]   (38)
= H(X) + H(Y|X)   (39)
= H(Y) + H(X|Y).   (40)
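A quick numerical check of (37)-(40) (my own sketch, reusing the same arbitrary joint PMF): the joint entropy equals H(Y) + H(X|Y).

```python
import math

P = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}  # joint PMF P(x, y)

P_y = {}  # marginal PMF of Y
for (x, y), pxy in P.items():
    P_y[y] = P_y.get(y, 0.0) + pxy

H_XY = sum(pxy * math.log2(1.0 / pxy) for pxy in P.values())                   # H(X, Y)
H_Y = sum(p * math.log2(1.0 / p) for p in P_y.values())                        # H(Y)
H_X_given_Y = sum(pxy * math.log2(P_y[y] / pxy) for (x, y), pxy in P.items())  # H(X|Y)

print(H_XY, H_Y + H_X_given_Y)  # both ~1.846 bits
```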
7. Sub-additivity of entropy:
H(X,Y) \leq H(X) + H(Y),   (41)
with equality iff X \perp Y (this follows from (39) together with the property that conditioning does not increase entropy).
The mutual information between X and Y is defined by
I(X;Y) \triangleq H(X) + H(Y) - H(X,Y)   (42)
= H(Y) - H(Y|X)   (43)
= H(X) - H(X|Y)   (44)
= D(P_{X,Y} \| P_X P_Y).   (45)
The mutual information is a canonical measure of the information conveyed by one random variable
about another. The definition tells us that it is the reduction in average surprise upon observing a
correlated random variable. The mutual information is again a functional of the joint distribution of
the pair (X, Y). It can also be viewed as the relative entropy between the joint distribution and the
product of the marginals.
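To tie the equivalent expressions together, here is a final sketch (my own, with the same arbitrary joint PMF) computing I(X;Y) in three ways: as H(X) + H(Y) - H(X,Y), as H(X) - H(X|Y), and as D(P_{X,Y} || P_X P_Y).

```python
import math

P = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}  # joint PMF P(x, y)

P_x, P_y = {}, {}  # marginals
for (x, y), pxy in P.items():
    P_x[x] = P_x.get(x, 0.0) + pxy
    P_y[y] = P_y.get(y, 0.0) + pxy

def H(pmf):
    # Entropy of a PMF given as a dictionary of probabilities.
    return sum(p * math.log2(1.0 / p) for p in pmf.values() if p > 0)

H_X, H_Y, H_XY = H(P_x), H(P_y), H(P)
H_X_given_Y = sum(pxy * math.log2(P_y[y] / pxy) for (x, y), pxy in P.items())
D = sum(pxy * math.log2(pxy / (P_x[x] * P_y[y])) for (x, y), pxy in P.items())

# All three expressions for the mutual information I(X;Y) coincide.
print(H_X + H_Y - H_XY, H_X - H_X_given_Y, D)  # each ~0.1245 bits
```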