Lecture Notes of Advanced Probability

Contents

1 Set Theory
  1.1 Sets
  1.2 Basic set operations
  1.3 Operations of sequence of sets
  1.4 Indicator functions
  1.5 Semirings, Fields and σ-Fields
  1.6 Minimal σ-fields
  1.7 Product spaces

2 Measure
  2.1 Definitions
  2.2 Some examples of measure
  2.3 Properties of a measure
  2.4 Arithmetic with ∞
  2.5 Extension of measures
    2.5.1 Complete measure spaces
    2.5.2 Caratheodory Extension Theorem
  2.6 Probability measures and distribution functions

3 Random Variables
  3.1 Mappings
  3.2 Measurable mapping
  3.3 Random Variables (Vectors)
    3.3.1 Random variables
    3.3.2 How to check a random variable?
    3.3.3 Random vectors
  3.4 Construction of random variables
  3.5 Approximations of r.v. by simple r.v.s
  3.6 σ-field generated by random variables
  3.7 Distributions and induced distribution functions
  3.8 Independence
  3.9 How to check independence
    3.9.1 Discrete random variables
    3.9.2 Absolutely continuous random variables
  3.10 Zero-One Law

4 Expectation and Integration
  4.1 Expectation
  4.2 Properties of Expectation
  4.3 Integration
    4.3.1 Definition
  4.4 How to compute expectation
    4.4.1 Expected values of absolutely continuous r.v.
    4.4.2 Expected values of discrete r.v.
  4.5 Moments
  4.6 Joint integrals

5 Convergence of Random Variables
  5.1 Types of convergence
  5.2 Relationship between types of convergences
  5.3 Partial converses
  5.4 Closed operations of convergences

6 The Law of Large Numbers
  6.1 Strong Law of Large Numbers
  6.2 Weak Law of Large Numbers

7 The Central Limit Theorem
  7.1 Characteristic functions
    7.1.1 Properties of characteristic functions
    7.1.2 Moments and derivatives
    7.1.3 Correspondence between characteristic functions and distributions
  7.2 The Central Limit Theorem

8 Martingales
  8.1 Conditional Expectation
    8.1.1 Examples
    8.1.2 Properties
  8.2 Martingales
Chapter 1

Set Theory
1.1 Sets

A set is a collection of objects (or elements), denoted by capital letters, e.g. A, B, C, ....

∅: the empty set. Ω: a space (a nonempty reference set).

ω ∈ A: ω is an element of A.

A ⊂ B: the set A is a subset of B, meaning that every element of A is an element of B. This includes the case when A and B are equal. One possible way to show A = B is to prove A ⊂ B and B ⊂ A.

A class is a collection of subsets of Ω, denoted by script letters A, B, C, D, E, F, G, ....

Convention: ∅ ⊂ A for any set A.

1.2 Basic set operations


Let A and B be subsets of Ω, define

A ∪ B = {ω : ω ∈ A or ω ∈ B}   (union)
A ∩ B = {ω : ω ∈ A and ω ∈ B}   (intersection)
A − B = {ω : ω ∈ A and ω ∉ B}   (difference)
A^c := Ω − A = {ω : ω ∉ A}   (complement)
A∆B = (A − B) ∪ (B − A)   (symmetric difference)

Definition If A ∩ B = ∅, then A and B are said to be disjoint.


1.3 Operations of sequence of sets


Let A_1, A_2, A_3, ... be subsets of Ω, and denote their union and intersection by

∪_{n=1}^∞ A_n = {ω : ω ∈ A_n for some n},
∩_{n=1}^∞ A_n = {ω : ω ∈ A_n for all n}.

If A_1, A_2, ... are disjoint, then we write

∪_{n=1}^∞ A_n = Σ_{n=1}^∞ A_n.

DeMorgan's Law:

(∪_{n=1}^∞ A_n)^c = ∩_{n=1}^∞ A_n^c,
(∩_{n=1}^∞ A_n)^c = ∪_{n=1}^∞ A_n^c.

Definition Let A_1, A_2, A_3, ... be subsets of Ω. Define

(a) Infinitely often (i.o.)

lim sup_n A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k
  = {ω : ∀n ≥ 1, ∃k ≥ n, s.t. ω ∈ A_k}
  = {ω : ω ∈ A_n for infinitely many values of n}
  = {A_n, i.o.}

(b) Ultimately (ult.)

lim inf_n A_n = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k
  = {ω : ∃n ≥ 1, ∀k ≥ n, ω ∈ A_k}
  = {ω : ω ∈ A_n for all but finitely many values of n}
  = {A_n, ult.}

(c) The sequence {A_n} converges to A, written as A = lim_{n→∞} A_n or simply A_n → A, if and only if

lim inf_n A_n = lim sup_n A_n = A.

Remark The upper limit and lower limit of a sequence of real numbers {a_n, n ≥ 1} are defined by

Upper limit: lim sup_n a_n = inf_{n≥1} sup_{k≥n} a_k;

Lower limit: lim inf_n a_n = sup_{n≥1} inf_{k≥n} a_k.

Example
You play a game with your friend by tossing a fair coin. You win the n-th round of the game if "Head" appears in that round.
Let A_n be the event that you win the n-th round. Then A_n^c represents the event that you lose the n-th round.

lim sup_n A_n means that ∀n ≥ 1, ∃k ≥ n such that you win the k-th round. This is equivalent to winning infinitely many rounds.

On the other hand, lim inf_n A_n^c means that ∃n ≥ 1 such that you lose all the rounds starting from the n-th round. This is equivalent to ultimately losing all of the rounds.

Theorem 1.3.1. We have

lim inf_n A_n ⊂ lim sup_n A_n.

Proof. By definition,

lim inf_n A_n = {ω : ω ∈ A_n for all but finitely many values of n}
  ⊂ {ω : ω ∈ A_n for infinitely many values of n}
  = lim sup_n A_n.

Example. (lim inf_n A_n and lim sup_n A_n may not be equal)

Specify lim inf_n A_n and lim sup_n A_n when A_{2j} = B and A_{2j−1} = C, j = 1, 2, ....
Solution
Clearly,

lim inf_n A_n = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k = ∪_{n=1}^∞ (B ∩ C) = B ∩ C,
lim sup_n A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k = ∩_{n=1}^∞ (B ∪ C) = B ∪ C.

Theorem 1.3.2. (Monotone sequences of sets converge)

(a) If A_1 ⊂ A_2 ⊂ A_3 ⊂ ..., then A_n → A = ∪_{n=1}^∞ A_n, written as A_n ↑ A.

(b) If A_1 ⊃ A_2 ⊃ A_3 ⊃ ..., then A_n → A = ∩_{n=1}^∞ A_n, written as A_n ↓ A.

Proof. We shall only prove (a). Let A = ∪_{n=1}^∞ A_n. Clearly,

lim inf_n A_n = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k = ∪_{n=1}^∞ A_n = A,
lim sup_n A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k = ∩_{n=1}^∞ (∪_{k=1}^∞ A_k) = ∩_{n=1}^∞ A = A.

Therefore, A_n → A = ∪_{n=1}^∞ A_n.

Corollary 1.3.1. Let A_1, A_2, ... be subsets of Ω. Then

{A_n, i.o.} = lim sup_n A_n = lim_{n→∞} ∪_{k=n}^∞ A_k,
{A_n, ult.} = lim inf_n A_n = lim_{n→∞} ∩_{k=n}^∞ A_k.

Proof. We shall only prove the first one. Let B_n = ∪_{k=n}^∞ A_k. Clearly, B_1 ⊃ B_2 ⊃ B_3 ⊃ ....
From Theorem 1.3.2,

lim_{n→∞} B_n = ∩_{n=1}^∞ B_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k = lim sup_n A_n.

1.4 Indicator functions


Let A ⊂ Ω, define

I_A(ω) = I{ω ∈ A} = 1 for ω ∈ A, and 0 for ω ∈ A^c.

Thus I_A "indicates" whether A occurs or not, depending on whether it is 1 or 0.

The indicator function transforms operations on sets into algebraic operations on 1's and 0's, which are often easier to deal with. Therefore, the next theorem is obvious.

Theorem 1.4.1. ∀A, B ⊂ Ω, we have


A = B ⇔ IA = IB ,
A ⊂ B ⇔ IA ≤ IB ,
A = ∅ or Ω ⇔ IA = 0 or 1.

Theorem 1.4.2.

I_{A∪B} = max{I_A, I_B}
I_{A∩B} = min{I_A, I_B} = I_A I_B
I_{A^c} = 1 − I_A
I_{A∆B} = |I_A − I_B|
I_{lim inf_n A_n} = lim inf_n I_{A_n}
I_{lim sup_n A_n} = lim sup_n I_{A_n}
I_{A∪B} ≤ I_A + I_B
I_{∪_{n=1}^∞ A_n} ≤ Σ_{n=1}^∞ I_{A_n}

Proof. We shall only prove some of the above relationships.

I_{A∪B} = max{I_A, I_B}: For all ω ∈ Ω,

If I_{A∪B}(ω) = 1 ⇒ ω ∈ A ∪ B
  ⇒ ω ∈ A or ω ∈ B
  ⇒ I_A(ω) = 1 or I_B(ω) = 1
  ⇒ max{I_A(ω), I_B(ω)} = 1 = I_{A∪B}(ω).

Similarly, we can show that this is true if I_{A∪B}(ω) = 0. Thus, we proved I_{A∪B} = max{I_A, I_B}.

I_{lim inf_n A_n} = lim inf_n I_{A_n}: For all ω ∈ Ω,

If I_{lim inf_n A_n}(ω) = 1
  ⇒ ω ∈ lim inf_n A_n
  ⇒ ∃n ≥ 1, ∀k ≥ n, ω ∈ A_k
  ⇒ ∃n ≥ 1, ∀k ≥ n, I_{A_k}(ω) = 1
  ⇒ lim inf_n I_{A_n}(ω) = sup_{n≥1} inf_{k≥n} I_{A_k}(ω) = 1.

On the other hand,

If I_{lim inf_n A_n}(ω) = 0
  ⇒ ω ∈ (lim inf_n A_n)^c = lim sup_n A_n^c   (by DeMorgan's law)
  ⇒ ∀n ≥ 1, ∃k ≥ n, ω ∈ A_k^c
  ⇒ ∀n ≥ 1, ∃k ≥ n, I_{A_k^c}(ω) = 1
  ⇒ lim sup_n I_{A_n^c}(ω) = inf_{n≥1} sup_{k≥n} I_{A_k^c}(ω) = 1
  ⇒ 1 = lim sup_n I_{A_n^c}(ω) = lim sup_n [1 − I_{A_n}(ω)] = 1 − lim inf_n I_{A_n}(ω)
  ⇒ lim inf_n I_{A_n}(ω) = 0.

IA∆B = |IA − IB |:

Consider 4 cases:

(a) IA = 1, IB = 1, then LHS = 0 = RHS.

(b) IA = 1, IB = 0, then LHS = 1 = RHS.

(c) IA = 0, IB = 1, then LHS = 1 = RHS.

(d) IA = 0, IB = 0, then LHS = 0 = RHS.

This completes the proof.

Example
Show (A∆B)∆C = A∆(B∆C).

Proof. Note

I_LHS = |I_{A∆B} − I_C| = ||I_A − I_B| − I_C|,
I_RHS = |I_A − I_{B∆C}| = |I_A − |I_B − I_C||.

If I_B = 0, then
I_LHS = |I_A − I_C| = I_RHS.

If I_B = 1, then
I_RHS = ||I_B − I_C| − I_A| = |1 − I_C − I_A| = I_LHS.
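Since each indicator takes only the values 0 and 1 at a fixed ω, the case analysis above can also be exhausted by brute force; a small sketch (the concrete sets A, B, C are illustrative):

```python
from itertools import product

# Exhaustive check of the identity ||a - b| - c| = |a - |b - c|| over all
# eight 0/1 assignments of (I_A(ω), I_B(ω), I_C(ω)).
for a, b, c in product((0, 1), repeat=3):
    assert abs(abs(a - b) - c) == abs(a - abs(b - c))

# The same associativity at the level of sets; Python's ^ on sets is
# exactly the symmetric difference ∆.
A, B, C = {1, 2}, {2, 3}, {3, 4}
assert (A ^ B) ^ C == A ^ (B ^ C) == {1, 4}
```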



Theorem 1.4.3.

I_{∪_{j=1}^n A_j} = Σ_{j=1}^n I_{A_j} − Σ_{1≤j_1<j_2≤n} I_{A_{j_1}∩A_{j_2}} + Σ_{1≤j_1<j_2<j_3≤n} I_{A_{j_1}∩A_{j_2}∩A_{j_3}} − ... + (−1)^{n−1} I_{A_1∩A_2∩...∩A_n}.

Proof. Set s_1 = Σ_{j=1}^n I_{A_j}, s_2 = Σ_{1≤j_1<j_2≤n} I_{A_{j_1}∩A_{j_2}}, ..., s_n = I_{A_1∩A_2∩...∩A_n}.
Then we need to show

I_{∪_{j=1}^n A_j} = s_1 − s_2 + s_3 − ... + (−1)^{n−1} s_n.   (1.1)

Fix ω ∈ Ω.

If I_{∪_{j=1}^n A_j}(ω) = 0 ⇒ ω ∉ A_j for all j
  ⇒ s_k(ω) = 0, 1 ≤ k ≤ n
  ⇒ (1.1) holds.

On the other hand,

If I_{∪_{j=1}^n A_j}(ω) = 1 ⇒ ω ∈ A_j for at least one j.

Suppose that ω belongs to exactly m of the sets A_1, A_2, ..., A_n. Then

s_1(ω) = C_1^m = m, s_2(ω) = C_2^m, ..., s_m(ω) = C_m^m = 1, s_{m+1}(ω) = ... = s_n(ω) = 0.

Whence

s_1 − s_2 + s_3 − ... + (−1)^{n−1} s_n = C_1^m − C_2^m + ... + (−1)^{m−1} C_m^m
  = C_0^m − (1 − 1)^m = 1 = I_{∪_{j=1}^n A_j}(ω).
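Theorem 1.4.3 can be spot-checked numerically on randomly chosen finite sets; a sketch (the universe {0,...,7}, the choice n = 4, and the random seed are illustrative):

```python
from itertools import combinations
import random

# Verify, for every ω, that I_{∪A_j}(ω) = s_1 - s_2 + ... + (-1)^{n-1} s_n.
random.seed(0)
n = 4
sets = [set(random.sample(range(8), 4)) for _ in range(n)]
union = set().union(*sets)

for w in range(8):
    lhs = 1 if w in union else 0
    # s_k(ω) counts the k-fold intersections containing ω
    rhs = sum((-1) ** (k - 1) *
              sum(1 for idx in combinations(range(n), k)
                  if all(w in sets[i] for i in idx))
              for k in range(1, n + 1))
    assert lhs == rhs
```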

1.5 Semirings, Fields and σ-Fields


Let Ω be a non-empty space.

Definition A class F of subsets of Ω is a semiring on Ω if

(i) ∅ ∈ F;

(ii) A, B ∈ F implies A ∩ B ∈ F;

(iii) if A, B ∈ F and A ⊂ B, then B − A is a finite disjoint union of sets in F, i.e.,

B − A = ∪_{k=1}^n C_k,

where C_i ∈ F and C_i ∩ C_j = ∅ if i ≠ j.

(i.e. the semiring F is closed under intersection.)

Definition A class F of subsets of Ω is a field (or algebra) on Ω if

(i) Ω ∈ F;

(ii) A ∈ F implies Ac ∈ F;

(iii) A, B ∈ F implies A ∪ B ∈ F.

(i.e. F is closed under the formation of complements and finite unions.)

Definition A class F of subsets of Ω is a σ-field (or σ-algebra) on Ω if

(i) Ω ∈ F;

(ii) A ∈ F implies Ac ∈ F;

(iii) A_1, A_2, ... ∈ F implies ∪_{n=1}^∞ A_n ∈ F.

(i.e. F is closed under the formation of complements and countable unions.)


The ordered pair (Ω, F) is called a measurable space. The sets of F are
called measurable sets or F-measurable.

Remark

(a) If F is a field (or a σ-field), then ∅ ∈ F.

Proof. By (i), Ω ∈ F. Use (ii), ∅ = Ωc ∈ F.

(b) For A, B ∈ F, by DeMorgan’s law, we have A ∩ B = (Ac ∪ B c )c . By


using (ii) and (iii), it can be shown that A ∩ B ∈ F. Therefore, a field is
closed under the formation of finite intersections. Similarly, the
σ-field is closed under the formation of countable intersections.

(c) A σ-field is a field (by taking Aj = A2 for j ≥ 3), but not vice versa.

(d) In probability, Ω is a sample space, and measurable sets are events.


Probability is a function defined on the σ-field F (Details will be given
in Chapter 2).

Example

(a) Let Ω = (−∞, ∞]. Let F be the collection of half open intervals in Ω,
i.e.,
F = {(a, b] : −∞ ≤ a ≤ b ≤ ∞}
Then F is a semiring, but NOT a field.

Proof. For the first part, note that (a, a] = ∅ ∈ F. It is easy to check that the intersection of two half open intervals is still a half open interval (possibly ∅), so F is closed under intersection. On the other hand, if A = (a_2, b_1], B = (a_1, b_2] ∈ F and a_1 < a_2 < b_1 < b_2, then A ⊂ B and

B − A = (a_1, a_2] ∪ (b_1, b_2].

So B − A is the union of two disjoint half open intervals. Thus, F is a semiring.
For the second part, note that (n, n + 1] ∈ F, but (0, 1] ∪ (2, 3] ∉ F. So F is not closed under finite unions, and thus is not a field.

(b) Let Ω = (0, 1]. Define

F = {∪_{i=1}^m (c_i, d_i] : 0 < c_1 ≤ d_1 ≤ c_2 ≤ d_2 ≤ ... ≤ d_m ≤ 1} ∪ {∅}.

Then F is a field, but NOT a σ-field.

Proof. The proof that F is a field is easy (left as an exercise). Now, we want to show that F is not a σ-field.
Note that (1/2 − 1/(n+2), 1/2] ∈ F for each n. Consider

∩_{n=1}^∞ (1/2 − 1/(n+2), 1/2] = {1/2} ∉ F.

So F is not closed under countable intersections; by remark (b), F is not a σ-field.

Example

(a) F = {∅, Ω} (trivial σ-field). This is the smallest σ-field.

(b) F = all subsets of Ω, denoted by P(Ω) (power set). This is the largest σ-field, often too big to define a probability on.

(c) For any A ⊂ Ω, {∅, A, A^c, Ω} is a σ-field (the smallest σ-field containing A).

1.6 Minimal σ-fields


Lemma 1.6.1. Let Γ be an index set and {Aγ : γ ∈ Γ} be a collection of
σ-fields. Then A = ∩γ∈Γ Aγ is also a σ-field.
[i.e., σ-fields are closed under uncountable intersection.]
Proof.
(1) Since Aγ is a σ-field, Ω ∈ Aγ for all γ ∈ Γ. Therefore, Ω ∈ A.
(2) A ∈ A implies A ∈ Aγ for all γ ∈ Γ, which in turn implies Ac ∈ Aγ for
all γ ∈ Γ. Therefore, Ac ∈ A. That is, A is closed under the formation
of complements.
(3) If A_i ∈ A, i ≥ 1, then A_i ∈ A_γ for all i ≥ 1 and for all γ ∈ Γ. Hence, ∪_{i=1}^∞ A_i ∈ A_γ for all γ ∈ Γ. So ∪_{i=1}^∞ A_i ∈ ∩_{γ∈Γ} A_γ = A. So A is closed under the formation of countable unions.

Theorem 1.6.1. For any class A, there exists a unique minimal σ-field
containing A, denoted by σ(A), called the σ-field generated by A. In other
words,
(a) A ⊂ σ(A),
(b) For any σ-field B with A ⊂ B, σ(A) ⊂ B,
and σ(A) is unique.
Proof. Existence.
Let QA = {B : B ⊃ A, B is a σ-field on Ω}. Clearly, QA is not empty since
it contains the power set P(Ω). Define
σ(A) = ∩B∈QA B.
From Lemma 1.6.1, σ(A) is a σ-field.
Uniqueness.
Let σ1 (A) be another σ-field satisfying (a) and (b). By definition, σ(A) ⊂
σ1 (A). By symmetry, σ1 (A) ⊂ σ(A). Thus, σ(A) = σ1 (A).
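On a finite Ω, σ(A) can be computed by brute force, since countable unions reduce to finite unions there; a sketch (the function name and example class are illustrative):

```python
# Close the class under complements and pairwise unions until it
# stabilizes; on a finite space this yields the minimal σ-field σ(A).
def generated_sigma_field(omega, cls):
    sigma = {frozenset(omega)} | {frozenset(s) for s in cls}
    while True:
        new = set(sigma)
        new |= {omega - s for s in sigma}             # complements
        new |= {a | b for a in sigma for b in sigma}  # unions
        if new == sigma:
            return sigma
        sigma = new

omega = frozenset({1, 2, 3, 4})
sigma = generated_sigma_field(omega, [{1}])
# matches Example (c): the smallest σ-field containing A = {1}
assert sigma == {frozenset(), frozenset({1}), frozenset({2, 3, 4}), omega}
```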

Definition
The σ-field generated by the collection of all finite open intervals on the real line R = (−∞, ∞) (or R = (−∞, ∞] or R = [−∞, ∞]) is called the Borel σ-field, denoted by B. The elements of B are called Borel sets. The ordered pair (R, B) is called the (1-dimensional) Borel measurable space.

Remark
(1) Every "reasonable" subset of R is a Borel set. Closed (open) intervals, half-open (closed) intervals and countable unions (intersections) of open (closed) intervals are examples of Borel sets. However, B ≠ P(R).
(2) For A ∈ B, let

B_A = {B ∩ A : B ∈ B} = B ∩ A.

Then (A, B_A) is a measurable space, and B_A is called the Borel σ-field on A.

1.7 Product spaces


For any measurable spaces (Ω_i, A_i), i = 1, 2, ..., define for n ≥ 2:

(1) n-dim rectangles of the product space ∏_{i=1}^n Ω_i:

∏_{i=1}^n A_i := A_1 × ... × A_n = {(ω_1, ..., ω_n) : ω_i ∈ A_i ⊂ Ω_i, 1 ≤ i ≤ n}.

Moreover, if A_i ∈ A_i, 1 ≤ i ≤ n, they are called measurable rectangles or rectangles with measurable sides.

(2) n-dim product σ-field:

∏_{i=1}^n A_i = σ({∏_{i=1}^n A_i : A_i ∈ A_i, 1 ≤ i ≤ n}).

(3) n-dim product measurable space:

∏_{i=1}^n (Ω_i, A_i) = (∏_{i=1}^n Ω_i, ∏_{i=1}^n A_i).

In the special case where (Ω_i, A_i) = (R, B), the n-dim Borel measurable space is given by (R^n, B^n).
Chapter 2

Measure
2.1 Definitions
Let Ω be a space, A a class of subsets of Ω, and µ : A → [0, ∞] a set function defined on A.

Definition
(i) µ is finite on A if µ(A) < ∞ for all A ∈ A.
(ii) µ is σ-finite on A if there is a sequence of sets A_1, A_2, ... in A with ∪_{n=1}^∞ A_n = Ω and µ(A_n) < ∞ for each n.

Definition Assume A_1, A_2, ... is a sequence of disjoint sets in A with Σ_{i=1}^∞ A_i ∈ A. µ is a measure if it is countably additive, which means

µ(Σ_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i).   (2.1)

Definition
(i) If µ is a measure on a σ-field A of subsets of Ω, the ordered triplet (Ω, A, µ) is a measure space. The elements of A are called measurable sets, or A-measurable.
[Note: (Ω, A) = measurable space ≠ measure space = (Ω, A, µ).]
(ii) A measure space (Ω, A, µ) is a probability space if µ(Ω) = 1. In that case µ is called a probability measure, usually written as P.

2.2 Some examples of measure


Example (Counting measure)
(a) Let Ω be a countable set. For any A ⊂ Ω, define µ(A) = n(A) := the number of elements in A (∞ if A contains infinitely many elements).
Then we can show that µ is a measure on the measurable space (Ω, P(Ω)), where P(Ω) is the power set of Ω; µ is called the counting measure.


Proof. Let A_1, A_2, ... be a sequence of disjoint sets in P(Ω). Then

µ(Σ_{i=1}^∞ A_i) = n(Σ_{i=1}^∞ A_i) = Σ_{i=1}^∞ n(A_i) = Σ_{i=1}^∞ µ(A_i).

(b) (Discrete probability space)

Let Ω be either finite or countably infinite. For any A ∈ P(Ω), define

P(A) = Σ_{ω∈A} p(ω), where p(ω) ≥ 0 and Σ_{ω∈Ω} p(ω) = 1.

P is a probability measure on (Ω, P(Ω)). Then (Ω, P(Ω), P) is called the discrete probability space.
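A minimal sketch of such a space (the point masses p(ω) are an illustrative choice; exact fractions avoid floating-point rounding):

```python
from fractions import Fraction

# A discrete probability space on Ω = {1, 2, 3}.
p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
assert sum(p.values()) == 1  # Σ_{ω∈Ω} p(ω) = 1

def P(A):
    return sum(p[w] for w in A)  # P(A) = Σ_{ω∈A} p(ω)

A, B = {1}, {2}
assert A & B == set()                              # disjoint events
assert P(A | B) == P(A) + P(B) == Fraction(3, 4)   # additivity
assert P(set(p)) == 1                              # P(Ω) = 1
```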

2.3 Properties of a measure


Theorem 2.3.1. Let µ be a measure on a σ-field A.

(a) µ(∅) = 0.

(b) (Finite additivity)
If A_1, ..., A_n ∈ A are disjoint, then

µ(Σ_{i=1}^n A_i) = Σ_{i=1}^n µ(A_i).

(c) (Monotonicity)
If A, B ∈ A and A ⊂ B, then µ(A) ≤ µ(B); if moreover µ(A) < ∞, then µ(B − A) = µ(B) − µ(A).

(d) (Continuity from below)
If {A_n, n ≥ 1} ⊂ A and A_n ↑ A, then

µ(A_n) → µ(A).

(e) (Continuity from above)
If {A_n, n ≥ 1} ⊂ A, A_n ↓ A and µ(A_m) < ∞ for some m, then

µ(A_n) → µ(A).

(f) (Finite Sub-Additivity)
If A_1, ..., A_n ∈ A, then

µ(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n µ(A_i).

(g) (Countable Sub-Additivity)
If {A_n, n ≥ 1} ⊂ A, then

µ(∪_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ µ(A_n).

Proof.

(a) Take B ∈ A with µ(B) < ∞, and take A_1 = B and A_2 = A_3 = ... = ∅ in (2.1). Then

µ(B) = µ(B) + µ(∅) + µ(∅) + ...,

so µ(∅) = 0.

(b) Take A_{n+1} = A_{n+2} = ... = ∅ in (2.1); (b) then follows.

(c) Note B = A ∪ (B − A) and A ∩ (B − A) = ∅. Both A and B − A are in A since A is a σ-field. Hence

µ(B) = µ(A ∪ (B − A)) = µ(A) + µ(B − A) ≥ µ(A).

If in addition µ(A) < ∞, we also deduce that

µ(B − A) = µ(B) − µ(A).

(d) If for some m, µ(A_m) = ∞, then monotonicity implies that

lim_{n→∞} µ(A_n) = ∞ = µ(A).

Now assume that for all m, µ(A_m) < ∞. Then

A = ∪_{i=1}^∞ A_i = A_1 + (A_2 − A_1) + (A_3 − A_2) + ...,

so

µ(A) = µ(A_1) + µ(A_2 − A_1) + µ(A_3 − A_2) + ...
  = µ(A_1) + Σ_{i=2}^∞ [µ(A_i) − µ(A_{i−1})]
  = µ(A_1) + lim_{n→∞} Σ_{i=2}^n [µ(A_i) − µ(A_{i−1})]
  = lim_{n→∞} µ(A_n).

(e) Assume that for some m, µ(A_m) < ∞. (There is nothing to prove if µ(A_m) = ∞ for all m ≥ 1.) Then A_m − A_n, n ≥ 1, forms an increasing sequence and A_m − A_n → A_m − A = A_m − ∩_{i=1}^∞ A_i. From (d), for n ≥ m,

µ(A_m − A_n) = µ(A_m) − µ(A_n) → µ(A_m) − µ(∩_{i=1}^∞ A_i).

Therefore,

lim_{n→∞} µ(A_n) = µ(∩_{i=1}^∞ A_i) = µ(A).

(f) Let B_1 = A_1 and B_k = A_k − ∪_{i=1}^{k−1} A_i.
The B_k are disjoint and ∪_{k=1}^n A_k = ∪_{k=1}^n B_k, so from (b),

µ(∪_{k=1}^n A_k) = µ(Σ_{k=1}^n B_k) = Σ_{k=1}^n µ(B_k).

Since µ(B_k) ≤ µ(A_k) by (c), we have

µ(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n µ(A_i).

(g) By taking the limit n → ∞ on both sides of the result in the proof of (f), we have

lim_{n→∞} µ(∪_{i=1}^n A_i) ≤ lim_{n→∞} Σ_{i=1}^n µ(A_i) = Σ_{i=1}^∞ µ(A_i).

Let B_n = ∪_{i=1}^n A_i. So, {B_n, n ≥ 1} forms an increasing sequence and lim_{n→∞} B_n = ∪_{i=1}^∞ A_i. From (d), the left-hand side becomes

µ(∪_{i=1}^∞ A_i).

Remark Let µ be the counting measure on the set {1, 2, 3, ...} and let A_n = {n, n + 1, n + 2, ...}. Then ∩_{n=1}^∞ A_n = ∅ but µ(A_n) = ∞ for n = 1, 2, 3, ....
This shows that the hypothesis

µ(A_m) < ∞ for some m

is not superfluous in Theorem 2.3.1(e).
Probability measure spaces satisfy this hypothesis automatically since probability measures are bounded above by 1.

2.4 Arithmetic with ∞

In this course we constantly need to handle ∞; for example, a measure can take the value ∞. So we need the arithmetic conventions for ∞.

(1) For any x ∈ (−∞, ∞): ∞ + x = ∞, x · ∞ = ∞ (x > 0), x · ∞ = −∞ (x < 0).

(2) 0 · ∞ = 0, ∞ + ∞ = ∞, ∞ · ∞ = ∞.

(3) x/∞ = x/(−∞) = 0.

(4) ∞ − ∞ and ∞/∞ are not defined.
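These conventions can be compared with IEEE floating-point arithmetic, which Python's math.inf follows. Note one deliberate difference: measure theory sets 0 · ∞ = 0, while float arithmetic leaves it undefined (NaN):

```python
import math

inf = math.inf
assert inf + 5.0 == inf                         # ∞ + x = ∞
assert 3.0 * inf == inf and -3.0 * inf == -inf  # x·∞ for x > 0 and x < 0
assert inf + inf == inf and inf * inf == inf    # ∞ + ∞ = ∞, ∞·∞ = ∞
assert 5.0 / inf == 0.0 and 5.0 / -inf == 0.0   # x/∞ = x/(−∞) = 0
assert math.isnan(inf - inf)                    # ∞ − ∞ is not defined
assert math.isnan(inf / inf)                    # ∞/∞ is not defined
assert math.isnan(0.0 * inf)                    # floats give NaN; in measure
                                                # theory we impose 0·∞ := 0
```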

2.5 Extension of measures


Definition Let A and B be two classes of subsets of Ω with A ⊂ B. If µ
and ν are two set functions (or measures) defined on A and B, respectively
such that
µ(A) = ν(A), for all A ∈ A,
ν is said to be an extension of µ to B, and µ is the restriction of ν to A.

2.5.1 Complete measure spaces


Consider a measure space (Ω, A, µ). If A ∈ A and µ(A) = 0, is µ(B) = 0 whenever B ⊂ A?

In general, the answer is NO, because B may not be in A. However, we can extend A to include all subsets of A-measurable sets which have zero measure.

Definition Let (Ω, A, µ) be a measure space, and N ⊂ Ω.

(i) N is a µ-null set iff ∃B ∈ A with µ(B) = 0 such that N ⊂ B.

(ii) (Ω, A, µ) is a complete measure space if every µ-null set N ∈ A.

Theorem 2.5.1. Given a measure space (Ω, A, µ), there exists a complete
measure space (Ω, Ā, µ̄) such that A ⊂ Ā and µ̄ = µ on A.

Proof. The proof is outside the scope of this course and is not required.
Because of Theorem 2.5.1, from now on, unless otherwise specified, we assume that all the measure spaces in this course are complete measure spaces.

2.5.2 Caratheodory Extension Theorem


Since a σ-field is closed under a set of "logical" operations such as complementation and countable unions, it is reasonable to consider a probability measure on a σ-field. However, a σ-field is sometimes difficult to describe.

Can we define a probability measure on a simple class, such as a semiring, and then extend it to the σ-field generated by that class?

The answer is "YES". Now, let's see how to do that.

Definition Let µ be a measure on a semiring A with Ω ∈ A. For any A ⊂ Ω, define

µ*(A) := inf{Σ_{n=1}^∞ µ(A_n) : A ⊂ ∪_{n=1}^∞ A_n, A_n ∈ A}

to be the outer measure of A. µ* is called the outer measure induced by the measure µ.

Remark

1. In the definition, we require Ω ∈ A in order to guarantee the existence of A_1, A_2, ... ∈ A with ∪_{n=1}^∞ A_n ⊃ A.

2. µ*(A) is defined for all A ⊂ Ω. So the domain of µ* is the power set P(Ω) and the range of µ* is [0, ∞].

Theorem 2.5.2. (Caratheodory Extension Theorem)

Let µ be a measure on a semiring A with Ω ∈ A.

(i) µ has an extension to σ(A), denoted by µ|_{σ(A)}, so (Ω, σ(A), µ|_{σ(A)}) is a measure space. Furthermore, µ|_{σ(A)} = µ*|_{σ(A)}, i.e., this extension is simply the restriction of µ* to σ(A).

(ii) If µ is σ-finite, then the extension in (i) is unique (i.e., if µ_1 and µ_2 are both extensions of µ to σ(A), then µ_1 = µ_2).

Proof. The proof of this theorem is not required in this course.

We are going to see how to apply the Caratheodory extension theorem to con-
struct an important measure on the Borel σ-field which is known as Lebesgue
measure.

Let Ω = (−∞, ∞] and F = {(a, b] : −∞ ≤ a ≤ b ≤ ∞}. Define a set function


λ on F as the length of the half open interval in F, i.e.,

λ((a, b]) = b − a.

In Chapter 1, we showed that F is a semiring. We want to show that λ is a


measure on F.

Proof. Let A_1, A_2, ... be a sequence of disjoint sets in F, and assume Σ_{i=1}^∞ A_i ∈ F. Then

λ(Σ_{i=1}^∞ A_i) = length of Σ_{i=1}^∞ A_i = Σ_{i=1}^∞ (length of A_i) = Σ_{i=1}^∞ λ(A_i).

Therefore, λ is a measure on the semiring F.

Define the outer measure induced by λ as

λ*(A) = inf{Σ_{n=1}^∞ λ(A_n) : A ⊂ ∪_{n=1}^∞ A_n, A_n ∈ F}, for any A ⊂ Ω.

From the Caratheodory extension theorem, λ* is a measure and an extension of λ from F to σ(F), which is the Borel σ-field B of Chapter 1. λ* is called the Lebesgue measure on σ(F).

Example
Find λ*(A) when

(a) A = (a_1, b_1] ∪ (a_2, b_2], where a_1 < b_1 < a_2 < b_2;

(b) A = {a}.

[Note: the sets A given in (a) and (b) are not in F.]

Solution

(a) Let A_1 = (a_1, b_1], A_2 = (a_2, b_2] and A_3 = A_4 = ... = ∅. This cover attains the infimum in the definition of λ*, so

λ*(A) = λ(A_1) + λ(A_2) = (b_1 − a_1) + (b_2 − a_2).

(b) For each n ≥ 1, take A_1 = (a − 1/n, a + 1/n] and A_2 = A_3 = ... = ∅; then A ⊂ ∪_{k=1}^∞ A_k and Σ_k λ(A_k) = 2/n. Hence

λ*(A) ≤ inf_n 2/n = 0, so λ*(A) = 0.
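For a set that is a finite union of half open intervals, as in part (a), the infimum is attained by merging overlapping intervals and summing lengths; a sketch (the helper name and the test intervals are illustrative):

```python
# λ* of a finite union of half open intervals (a, b]: sort the pairs,
# merge overlapping intervals, and sum the lengths of the merged blocks.
def lebesgue_measure(intervals):
    total = 0.0
    cur_a = cur_b = None
    for a, b in sorted(intervals):
        if cur_b is None or a > cur_b:  # starts a new disjoint block
            if cur_b is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:                           # overlaps: extend current block
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total

# part (a): disjoint intervals, lengths just add
assert lebesgue_measure([(0, 1), (2, 4)]) == (1 - 0) + (4 - 2)
# overlapping covers are merged rather than double counted
assert lebesgue_measure([(0, 2), (1, 3)]) == 3.0
# part (b): {a} is covered by (a - 1/n, a + 1/n] of length 2/n, so the
# infimum over covers is 0
assert min(2 / n for n in range(1, 1001)) < 0.01
```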

2.6 Probability measures and distribution functions
Definition A real-valued function F on R is a distribution function
(d.f.) if

(a) F (−∞) = limx→−∞ F (x) = 0,


F (∞) = limx→∞ F (x) = 1.

(b) F is nondecreasing, i.e., F (x) ≤ F (y) if x ≤ y.

(c) F is right continuous, i.e., F (y) ↓ F (x) if y ↓ x.

Theorem 2.6.1. (Correspondence Theorem) The relation

F (x) = P ((−∞, x]), x∈R (2.2)

establishes a 1 − 1 correspondence between all d.f.’s and all probability mea-


sures on (R, B).

To prove the Correspondence Theorem, we need to use the following theorem.

Theorem 2.6.2. Suppose that F is finite on (−∞, ∞) (i.e. |F(t)| < ∞ for |t| < ∞), and

(i) F is nondecreasing; (ii) F is right continuous.

Then there is a unique measure µ on (R, B) with

µ((a, b]) = F(b) − F(a), −∞ ≤ a ≤ b ≤ ∞.

(When a = b = ∞ or −∞, the right-hand side is understood to be 0.)


Proof. The proof of Theorem 2.6.2 is not required.

Remark
(1) The function F in Theorem 2.6.2 is called a Lebesgue-Stieltjes (L-S)
measure function.

(2) If F (x) = x, then µ is the Lebesgue measure which we have seen in


Section 2.5.2.

(3) Theorem 2.6.2 shows that F uniquely determines µ, but not vice versa,
since we can write µ((a, b]) = F (b) − F (a) = (F (b) + c) − (F (a) + c). So
there is no 1 − 1 correspondence between the L-S measure functions and
the L-S measures.

Proof. (Correspondence Theorem) (1). Given a probability measure P


on (R, B), we first want to show that F determined by (2.2) is a d.f.

To check (b), note that x ≤ y ⇒ (−∞, x] ⊂ (−∞, y]. By the monotonicity


of P , we get

F (x) = P ((−∞, x]) ≤ P ((−∞, y]) = F (y).

To check (c), for any sequence x_n ↓ x, it follows that (−∞, x_n] ↓ (−∞, x]. Since P is continuous from above, we have

F(x_n) = P((−∞, x_n]) ↓ P((−∞, x]) = F(x).

Thus, lim_{y↓x} F(y) = F(x), i.e. F is right continuous.

To check (a), for any sequence xn ↓ −∞, it follows that (−∞, xn ] ↓ ∅. Since
P is continuous from above, we have

F (xn ) = P ((−∞, xn ]) ↓ P (∅) = 0.

Thus, limx↓−∞ F (x) = 0. Similarly, limx↑∞ F (x) = 1.

(2) Given a d.f. F , we now show that there exists a unique probability
measure P on (R, B) satisfying (2.2).

Since F is a d.f., it is in particular an L-S measure function. From Theorem 2.6.2, we see that there exists a unique measure P on (R, B) satisfying

P((y, x]) = F(x) − F(y), y ≤ x.

In order to show that (2.2) is satisfied, note that

(i) lim_{y↓−∞} F(y) = 0 (as F is a d.f.);

(ii) lim_{y↓−∞} P((y, x]) = P((−∞, x]) (as P is a measure, thus continuous from below).

Thus letting y ↓ −∞, we get F(x) = P((−∞, x]); moreover, F(∞) = 1 gives P(R) = 1, so P is a probability measure. That is, (2.2) holds.

Remark
(1) The definition of F in Theorem 2.6.1 does not involve any random vari-
ables. The association of the d.f.’s with the random variables will be
discussed in the next chapter.

(2) Let F be a d.f. and P the corresponding probability measure. The following relations are equivalent:

(i) P ((a, b]) = F (b) − F (a),


(ii) P ([a, b]) = F (b) − F (a−),
(iii) P ([a, b)) = F (b−) − F (a−),
(iv) P ((a, b)) = F (b−) − F (a),
(v) P ((−∞, b)) = F (b−),
(vi) P ({a}) = F (a) − F (a−).
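Relations (i)-(vi) can be illustrated with the degenerate d.f. δ_t at t = 0, which puts an atom of mass 1 at 0; a sketch (F_left is a numerical stand-in for the left limit F(x−)):

```python
# The degenerate d.f. δ_0: F(x) = 0 for x < 0 and F(x) = 1 for x >= 0.
def F(x):
    return 1.0 if x >= 0 else 0.0

def F_left(x, eps=1e-12):
    return F(x - eps)  # numerical stand-in for the left limit F(x−)

assert F(0) - F(-1) == 1.0       # (i):  P((−1, 0]) contains the atom at 0
assert F_left(0) - F(-1) == 0.0  # (iv): P((−1, 0)) misses the atom
assert F(0) - F_left(0) == 1.0   # (vi): P({0}) = F(0) − F(0−) = 1
assert F(1) - F_left(1) == 0.0   # (vi): no atom at 1
```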

Examples - Types of distributions


(i) A d.f. δ_t is called degenerate at t if

δ_t(x) = 0 for x < t, and δ_t(x) = 1 for x ≥ t.

(ii) A d.f. F is called discrete if it can be represented in the form

F(x) = Σ_{n=1}^∞ b_n δ_{a_n}(x),

where {a_n, n ≥ 1} is a countable set of real numbers, b_j > 0 for all j ≥ 1, and Σ_{j=1}^∞ b_j = 1.

(iii) A d.f. F is called continuous if it is continuous everywhere.

(iv) A function F (not just a d.f.) is called absolutely continuous on (−∞, ∞) if and only if there exists a function f which satisfies ∫_{−∞}^∞ f(x)dx < ∞ such that

F(x) = ∫_{−∞}^x f(t)dt.

Here, f is called the density function of F.

Note: At this moment, we just treat the integral as the Riemann integral learnt in elementary calculus courses. We will give a more general definition of the integral later.
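A sketch of (iv) for a concrete density: the Exp(1) density f(x) = e^{−x} for x > 0, whose d.f. is F(x) = 1 − e^{−x}. The Riemann sum below (step count is an illustrative choice) recovers F from f:

```python
import math

# Density of the Exp(1) distribution.
def f(x):
    return math.exp(-x) if x > 0 else 0.0

# F(x) = ∫_{−∞}^x f(t) dt via a midpoint Riemann sum; f vanishes on
# (−∞, 0], so the integration effectively starts at 0.
def F(x, steps=20000):
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

assert abs(F(1.0) - (1 - math.exp(-1))) < 1e-6
assert abs(F(5.0) - (1 - math.exp(-5))) < 1e-6
assert F(-2.0) == 0.0  # no mass below 0
```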
Chapter 3

Random Variables
3.1 Mappings
Definition Let X : Ω_1 → Ω_2 be a mapping between two sets Ω_1 and Ω_2.

(i) For every subset B ⊂ Ω_2, the inverse image of B is

X^{−1}(B) = {ω : ω ∈ Ω_1, X(ω) ∈ B} =: {X ∈ B}.

(ii) For every class G of subsets of Ω_2, the inverse image of G is

X^{−1}(G) = {X^{−1}(B) : B ∈ G}.

(So A ∈ X^{−1}(G) means that ∃B ∈ G such that A = X^{−1}(B).)

Theorem 3.1.1. (Properties of the inverse image)

(1) Let X be a mapping from Ω_1 to Ω_2. Then

(i) X^{−1}(Ω_2) = Ω_1, X^{−1}(∅) = ∅.
(ii) X^{−1}(B^c) = [X^{−1}(B)]^c.
(iii) X^{−1}(∪_{γ∈Γ} B_γ) = ∪_{γ∈Γ} X^{−1}(B_γ) for B_γ ⊂ Ω_2, γ ∈ Γ.
(iv) X^{−1}(∩_{γ∈Γ} B_γ) = ∩_{γ∈Γ} X^{−1}(B_γ) for B_γ ⊂ Ω_2, γ ∈ Γ, where Γ is an index set, not necessarily countable.
(v) X^{−1}(B_1 − B_2) = X^{−1}(B_1) − X^{−1}(B_2) for B_1, B_2 ⊂ Ω_2.
(vi) B_1 ⊂ B_2 ⊂ Ω_2 implies that X^{−1}(B_1) ⊂ X^{−1}(B_2).

(2) If B is a σ-field in Ω_2, then X^{−1}(B) is a σ-field in Ω_1.

(3) If C is a nonempty class in Ω_2, then

X^{−1}(σ(C)) = σ(X^{−1}(C)).

Proof. (1) (i) Note that

X −1 (Ω2 ) = {ω ∈ Ω1 : X(ω) ∈ Ω2 } = Ω1 ,
X −1 (∅) = {ω ∈ Ω1 : X(ω) ∈ ∅} = ∅.


(ii)

X^{−1}(B^c) = {ω : X(ω) ∈ B^c} = {ω : X(ω) ∉ B}
  = {ω : ω ∉ X^{−1}(B)} = {ω : ω ∈ [X^{−1}(B)]^c} = [X^{−1}(B)]^c.

(iii) ω ∈ X^{−1}(∪_{γ∈Γ} B_γ) ⇔ X(ω) ∈ ∪_{γ∈Γ} B_γ ⇔ X(ω) ∈ B_γ for some γ ∈ Γ ⇔ ω ∈ X^{−1}(B_γ) for some γ ∈ Γ ⇔ ω ∈ ∪_{γ∈Γ} X^{−1}(B_γ).

(iv) Similar to (iii).

(v) X^{−1}(B_1 − B_2) = X^{−1}(B_1 ∩ B_2^c) = X^{−1}(B_1) ∩ X^{−1}(B_2^c) = X^{−1}(B_1) ∩ [X^{−1}(B_2)]^c = X^{−1}(B_1) − X^{−1}(B_2).

(vi) X^{−1}(B_1) = {ω : X(ω) ∈ B_1} ⊂ {ω : X(ω) ∈ B_2} = X^{−1}(B_2).

(2) First X −1 (B) is nonempty as Ω1 = X −1 (Ω2 ) ∈ X −1 (B). Next, let A ∈


X −1 (B). Then ∃B ∈ B such that A = X −1 (B). So Ac = X −1 (B c ) ∈
X −1 (B). So X −1 (B) is closed under complement. Similarly, it can be
shown that it is closed under countable union. Thus it is a σ-field.

(3) Clearly, X^{−1}(C) ⊂ X^{−1}(σ(C)), and the latter is a σ-field by (2). Thus σ(X^{−1}(C)) ⊂ X^{−1}(σ(C)). It remains to show that X^{−1}(σ(C)) ⊂ σ(X^{−1}(C)). Define

G = {B ⊂ Ω_2 : X^{−1}(B) ∈ σ(X^{−1}(C))}.

For any B ∈ C, we have X^{−1}(B) ∈ X^{−1}(C) ⊂ σ(X^{−1}(C)). Therefore, B ∈ G and so C ⊂ G.

Now, we want to show that G is a σ-field.
Let E ∈ G. Then X^{−1}(E) ∈ σ(X^{−1}(C)), so X^{−1}(E^c) = [X^{−1}(E)]^c ∈ σ(X^{−1}(C)), hence E^c ∈ G; that is, G is closed under complements. Similarly, G is closed under countable unions (and Ω_2 ∈ G), so G is a σ-field.
Hence σ(C) ⊂ G, which means that for every B ∈ σ(C), X^{−1}(B) ∈ σ(X^{−1}(C)); that is, X^{−1}(σ(C)) ⊂ σ(X^{−1}(C)).

Remark Clearly, X −1 (·) maps sets in Ω2 to sets in Ω1 and preserves all set operations.



3.2 Measurable mapping


Definition
(1) (Ω1 , A1 ) and (Ω2 , A2 ) are measurable spaces. X : Ω1 → Ω2 is a mea-
surable mapping if
X −1 (A2 ) ≡ {X ∈ A2 } ∈ A1 , ∀A2 ∈ A2 .

(2) X is a measurable function if (Ω2 , A2 ) = (Rn , B(Rn )) in (1).


(3) X is a Borel (measurable) function if (Ω1 , A1 ) = (Rm , B(Rm )) and
(Ω2 , A2 ) = (Rn , B(Rn )) in (1).
The next theorem is useful in checking if X is measurable or not.
Theorem 3.2.1. X : (Ω1 , A1 ) → (Ω2 , A2 ) is a measurable mapping if
A2 = σ(C) and X −1 (C) ∈ A1 for all C ∈ C.
Proof. From Theorem 3.1.1, we have
X −1 (A2 ) = X −1 (σ(C)) = σ(X −1 (C)) ⊂ σ(A1 ) = A1 .


Theorem 3.2.2. If X : (Ω1 , A1 ) → (Ω2 , A2 ) and f : (Ω2 , A2 ) → (Ω3 , A3 ) are measurable mappings, then the composition f (X) = f ∘ X : (Ω1 , A1 ) → (Ω3 , A3 ) is also a measurable mapping.
Proof. ∀A3 ∈ A3 ,

(f ∘ X)−1 (A3 ) = {ω : f (X(ω)) ∈ A3 }
= {ω : X(ω) ∈ f −1 (A3 )}
= {ω : ω ∈ X −1 (f −1 (A3 ))}.

Since f −1 (A3 ) ∈ A2 , {ω : ω ∈ X −1 (f −1 (A3 ))} ∈ A1 . Therefore, f ∘ X is a measurable mapping.

3.3 Random Variables (Vectors)


3.3.1 Random variables
Definition Let (Ω, A) be a measurable space and (R, B) be a Borel mea-
surable space. A random variable (r.v.) X is a measurable function from
(Ω, A) to (R, B). When we need to emphasize the σ-field, we will say that
X is A-measurable or write X ∈ A.

3.3.2 How to check a random variable?


To verify X is a r.v., we don’t need to check that {X ∈ B} ∈ A for all Borel
sets B. One only needs to check this for all intervals. This is justified by the
next theorem, which is a simple consequence of Theorem 3.2.1.

Theorem 3.3.1. X is a r.v. from (Ω, A) to (R, B)

⇔ {X ≤ x} = X −1 ([−∞, x]) ∈ A, ∀x ∈ R.
⇔ {X ≤ x} = X −1 ([−∞, x]) ∈ A, ∀x ∈ D which is a dense subset of R.

Proof. Take C = {[−∞, b] : b ∈ R} or C = {[−∞, b] : b ∈ D} in Theorem 3.2.1.

Remark

(i) We can take D to be all rational numbers.

(ii) {X ≤ x} in the theorem can be replaced by any of the following:

{X ≥ x}, {X < x}, {X > x}, {x < X < y}, etc.

3.3.3 Random vectors


Definition X = (X1 , · · · , Xn ) is an n-dimensional random vector if Xk is a r.v. on (Ω, A) for 1 ≤ k ≤ n.

Theorem 3.3.2. X = (X1 , · · · , Xn ) is an n-dimensional random vector ⇒ X is a measurable function from (Ω, A) to (Rn , B(Rn )).

Proof. Let Ik = (ak , bk ], −∞ ≤ ak ≤ bk ≤ ∞, 1 ≤ k ≤ n. Since {Xk ∈ Ik } ∈ A,

{X = (X1 , · · · , Xn ) ∈ I1 × · · · × In } = ∩nk=1 {Xk ∈ Ik } ∈ A.

The proof follows from this and Theorem 3.2.1 as B(Rn ) = σ({I1 × · · · × In }).

3.4 Construction of random variables


Theorem 3.4.1. X = (X1 , · · · , Xn ) is an n-dimensional random vector, f is a Borel function from Rn to Rm . Then f (X) is an m-dimensional random vector.

Proof. The proof follows directly from Theorems 3.3.2 and 3.2.2.

Remark Continuous functions are Borel functions.

Theorem 3.4.2. If X, Y are r.v.’s, so are


(i) aX where a ∈ R.

(ii) X + Y .

(iii) X 2 .

(iv) XY .
(v) 1/X provided that X(ω) ≠ 0 for all ω ∈ Ω.

(vi) X/Y provided that Y (ω) ≠ 0 for all ω ∈ Ω.
Proof. To prove the theorem, we can use Theorem 3.4.1 by choosing the
appropriate Borel function f .
(i) f (x) = ax.

(ii) f (x, y) = x + y.

(iii) f (x) = x2 .

(iv) f (x, y) = xy.


(v) f (x) = 1/x if x ≠ 0.

(vi) f (x, y) = x/y if y ≠ 0.

Theorem 3.4.3. If X, Y are r.v.’s, then X ∨ Y = max{X, Y } and X ∧ Y =


min{X, Y } are r.v.’s.
Proof. {X ∨ Y ≤ t} = {X ≤ t} ∩ {Y ≤ t}.
{X ∧ Y ≥ t} = {X ≥ t} ∩ {Y ≥ t}.

By Theorem 3.3.1, we have {X ≤ t}, {Y ≤ t}, {X ≥ t} and {Y ≥ t} ∈ A.


Since A is a σ-field, {X ∨ Y ≤ t} and {X ∧ Y ≥ t} ∈ A. By Theorem 3.3.1,
X ∨ Y and X ∧ Y are r.v.’s.

Definition The positive and negative parts of a function X : Ω → R are

X + = max{X, 0}, X − = − min{X, 0}.

It is clear that X = X + − X − and |X| = X + + X − .
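These identities can be checked mechanically pointwise; a minimal Python sketch (the test values are arbitrary):

```python
def parts(x):
    """Positive and negative parts of a real value: x+ = max(x, 0), x- = -min(x, 0)."""
    return max(x, 0), -min(x, 0)

# check the identities X = X+ - X- and |X| = X+ + X- pointwise
for x in [-3.5, -1, 0, 0.25, 7]:
    xp, xm = parts(x)
    assert xp >= 0 and xm >= 0    # both parts are nonnegative
    assert xp - xm == x           # X = X+ - X-
    assert xp + xm == abs(x)      # |X| = X+ + X-
```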



Corollary 3.4.1. If X is a r.v., so are X + , X − and |X|.


Theorem 3.4.4. X1 , X2 , · · · are random variables on (Ω, A).
(1) supn Xn , inf n Xn , lim supn Xn and lim inf n Xn are r.v.’s.

(2) If X(ω) = limn→∞ Xn (ω) for every ω, then X is a r.v.

(3) If S(ω) = Σ∞n=1 Xn (ω) exists for every ω, then S is a r.v.

Proof.
1. Note that

{supn Xn ≤ t} = ∩∞n=1 {Xn ≤ t},
{inf n Xn ≥ t} = ∩∞n=1 {Xn ≥ t},

so supn Xn and inf n Xn are r.v.’s by Theorem 3.3.1. Moreover,

lim supn Xn = inf k≥1 supm≥k Xm , lim inf n Xn = supk≥1 inf m≥k Xm ,

so lim supn Xn and lim inf n Xn are r.v.’s as well.

2. X(ω) = limn→∞ Xn (ω) = lim supn Xn (ω). So, limn→∞ Xn is a r.v.

3. S(ω) = limn→∞ Σnk=1 Xk (ω). By Theorem 3.4.2 and (2), S is a r.v.

3.5 Approximations of r.v. by simple r.v.s


Theorem 3.5.1.
(1) (Indicator r.v.) If A ∈ A, the indicator function IA is a r.v.
(Recall: IA (ω) = I{ω ∈ A} indicates whether A occurs or not.)
(2) (Simple r.v.) If Ω = Σni=1 Ai , where Ai ∈ A, then X = Σni=1 ai IAi , where the ai are some given real constants, is a r.v.
Proof.
1. ∀B ∈ B, note that

{IA ∈ B} = ∅ if 0 ∉ B and 1 ∉ B,
          = A if 0 ∉ B and 1 ∈ B,
          = Ac if 0 ∈ B and 1 ∉ B,
          = Ω if 0 ∈ B and 1 ∈ B.

In every case {IA ∈ B} ∈ A.


2. ∀B ∈ B, note that
{X ∈ B} = ∪{i:ai ∈B} Ai ∈ A.
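As an illustration of case 2, the preimage {X ∈ B} of a simple r.v. is obtained by collecting the Ai whose value ai falls in B; a short Python sketch on a hypothetical 8-point Ω:

```python
# A simple r.v. X = sum_i a_i I_{A_i} on a toy sample space Omega = {0, ..., 7}.
Omega = set(range(8))
partition = {0.0: {0, 1, 2}, 1.5: {3, 4}, -2.0: {5, 6, 7}}   # a_i -> A_i

def X(w):
    # X(w) = a_i for the unique cell A_i containing w
    return next(a for a, A in partition.items() if w in A)

B = {a for a in partition if a > 0}   # a "Borel set" restricted to X's values
preimage = {w for w in Omega if X(w) in B}
union = set().union(*(A for a, A in partition.items() if a in B))
assert preimage == union == {3, 4}    # {X in B} = union of the A_i with a_i in B
```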

Theorem 3.5.2. Given a r.v. X ≥ 0 on (Ω, A), there exists a sequence of


simple random variables {Xn , n ≥ 1} such that
(a) 0 ≤ X1 ≤ X2 ≤ · · · ≤ X, and
(b) Xn (ω) ↑ X(ω) for every ω ∈ Ω.
Proof. ∀n ≥ 1, let

Xn (ω) = Σ_{k=1}^{n2^n} ((k − 1)/2^n ) I{(k − 1)/2^n < X(ω) ≤ k/2^n } + nI{X(ω) > n}.

This sequence has the required properties. Now, let us give a rigorous proof of it. Clearly, Xn (ω) ≥ 0 for all n. Next, we show that {Xn (ω), n ≥ 1} is an increasing sequence for all ω ∈ Ω. For any n ≥ 1 and ω ∈ Ω, split the sum defining Xn+1 at k = n2^{n+1} :

Xn+1 (ω) = Σ_{k=1}^{(n+1)2^{n+1}} ((k − 1)/2^{n+1} ) I{(k − 1)/2^{n+1} < X(ω) ≤ k/2^{n+1} } + (n + 1)I{X(ω) > n + 1}
= [Σ_{k=1}^{n2^{n+1}} + Σ_{k=n2^{n+1}+1}^{(n+1)2^{n+1}} ] ((k − 1)/2^{n+1} ) I{(k − 1)/2^{n+1} < X(ω) ≤ k/2^{n+1} } + (n + 1)I{X(ω) > n + 1}
:= A + B + C.

For A: each interval ((k − 1)/2^n , k/2^n ], 1 ≤ k ≤ n2^n , is the union of the two halves ((2k − 2)/2^{n+1} , (2k − 1)/2^{n+1} ] and ((2k − 1)/2^{n+1} , 2k/2^{n+1} ], on which Xn+1 takes the values (2k − 2)/2^{n+1} = (k − 1)/2^n and (2k − 1)/2^{n+1} ≥ (k − 1)/2^n , respectively. Hence

A ≥ Σ_{k=1}^{n2^n} ((k − 1)/2^n ) I{(k − 1)/2^n < X(ω) ≤ k/2^n }.

For B + C: the terms of B cover the interval (n, n + 1] and each of their coefficients is at least n2^{n+1} /2^{n+1} = n, so

B + C ≥ nI{n < X(ω) ≤ n + 1} + (n + 1)I{X(ω) > n + 1} ≥ nI{X(ω) > n}.

Therefore,

Xn+1 (ω) = A + B + C ≥ Σ_{k=1}^{n2^n} ((k − 1)/2^n ) I{(k − 1)/2^n < X(ω) ≤ k/2^n } + nI{X(ω) > n} = Xn (ω) ≥ 0.

Thus, {Xn (ω), n ≥ 1} is an increasing sequence for every ω ∈ Ω. So limn→∞ Xn (ω) exists (maybe ∞).
It remains to show that limn→∞ Xn (ω) = X(ω). First, if X(ω) = ∞, then by definition, we have Xn (ω) = n → ∞ = X(ω). If X(ω) < ∞, then for n large enough (n ≥ X(ω)), we have

Xn (ω) = Σ_{k=1}^{n2^n} ((k − 1)/2^n ) I{(k − 1)/2^n < X(ω) ≤ k/2^n },

and there exists 1 ≤ k ≤ n2^n such that (k − 1)/2^n < X(ω) ≤ k/2^n , in which case Xn (ω) = (k − 1)/2^n , the left end point of the interval. Therefore,

0 ≤ X(ω) − Xn (ω) = X(ω) − (k − 1)/2^n ≤ 1/2^n → 0.
2 2
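The dyadic construction in this proof is easy to compute; a minimal Python sketch, where a fixed value x plays the role of X(ω):

```python
import math

def dyadic_approx(x, n):
    """n-th simple-function approximation of a nonnegative value x = X(w):
    equals (k-1)/2^n on {(k-1)/2^n < X <= k/2^n}, k = 1, ..., n*2^n, and n on {X > n}."""
    if x > n:
        return n
    if x <= 0:
        return 0.0
    k = math.ceil(x * 2 ** n)       # x lies in ((k-1)/2^n, k/2^n]
    return (k - 1) / 2 ** n

x = 2.7182818
vals = [dyadic_approx(x, n) for n in range(1, 30)]
assert all(a <= b for a, b in zip(vals, vals[1:]))     # X_n is increasing in n
assert all(0 <= x - dyadic_approx(x, n) <= 2 ** -n     # and converges up to x
           for n in range(3, 30))                      # once n >= x
```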

3.6 σ-field generated by random variables


Definition Let {Xλ : λ ∈ Λ} be a nonempty family of r.v.’s on (Ω, A) where
Λ is an index set. Define

σ(Xλ , λ ∈ Λ) := σ({Xλ ∈ B}, B ∈ B, λ ∈ Λ) = σ(Xλ−1 (B), λ ∈ Λ) = σ(∪λ∈Λ Xλ−1 (B)),

which is called the σ-field generated by Xλ , λ ∈ Λ. Here Xλ−1 (B) denotes the class {Xλ−1 (B) : B ∈ B}.

Example
(1) For Λ = {1}, if X1 (ω) ≡ a for all ω ∈ Ω, then for any B ∈ B

X1−1 (B) = Ω if a ∈ B
= ∅ if a ∉ B.

σ(X1 ) = σ({Ω, ∅}) = {Ω, ∅}.

(2) For Λ = {1, 2, · · · , n}, we have

σ(Xi ) = σ(Xi−1 (B)) = Xi−1 (B).
σ(X1 , · · · , Xn ) = σ(∪ni=1 Xi−1 (B)) = σ(∪ni=1 σ(Xi )).




(3) For Λ = {1, 2, · · · }, it is easy to check that

σ(X1 ) ⊂ σ(X1 , X2 ) ⊂ · · · ⊂ σ(X1 , · · · , Xn )


σ(X1 , X2 , · · · ) ⊃ σ(X2 , X3 , · · · ) ⊃ · · · ⊃ σ(Xn , Xn+1 , · · · )

The σ-field ∩∞n=1 σ(Xn , Xn+1 , · · · ) is referred to as the tail σ-field of X1 , X2 , · · · .
Theorem 3.6.1. Let X1 , · · · , Xn be r.v.’s on a measurable space (Ω, A). A real function Y on Ω is σ(X1 , · · · , Xn )-measurable if and only if Y = f (X1 , · · · , Xn ) where f is a Borel measurable function on Rn .
Proof. The proof is not required in this course.

3.7 Distributions and induced distribution functions

Every random variable is associated with a probability measure on R, as follows.

Theorem 3.7.1. A r.v. X on (Ω, A, P ) induces another probability space (R, B, PX ) through

PX (B) = P (X −1 (B)) = P (X ∈ B), ∀B ∈ B.

Proof. Clearly, PX (B) is nonnegative and PX (R) = P (Ω) = 1. Let {Bi , i ≥ 1} be a sequence of disjoint sets in B. Then

PX (Σ∞i=1 Bi ) = P (X −1 (Σ∞i=1 Bi )) = P (Σ∞i=1 X −1 (Bi )) = Σ∞i=1 PX (Bi ).

Definition X is a r.v.

(a) PX in Theorem 3.7.1 is called the distribution of X.

(b) The distribution function of X:

FX (x) = PX ([−∞, x]) = P (X ≤ x).

Definition A r.v. X is said to be absolutely continuous if its d.f. is


absolutely continuous.

Example
Let f (x) be the density function of the d.f. F (x).

(a) Uniform distribution

f (x) = 1/(b − a) if a ≤ x ≤ b,
      = 0 otherwise.

(b) Exponential distribution

f (x) = βe−βx if x ≥ 0,
      = 0 if x < 0,

where β > 0.

Similarly, we can define the distributions and distribution functions for ran-
dom vectors.

Definition X = (X1 , · · · , Xn ) is a random vector.

(a) The distribution of X:

PX (B) = P (X −1 (B)) = P (X ∈ B), B ∈ Bn .

(b) The (joint) distribution function of X:

FX (x) = P (X1 ≤ x1 , · · · , Xn ≤ xn ).

(c) The marginal distribution function of Xi :

FXi (x) = P (X1 ≤ ∞, · · · , Xi−1 ≤ ∞, Xi ≤ x, Xi+1 ≤ ∞, · · · , Xn ≤ ∞).

Definition X and Y are r.v.’s on (Ω, A, P ).

(i) X and Y are identically distributed (i.d.) if FX = FY , denoted by


X =d Y .

(ii) X and Y are equal almost surely (a.s.) if P (X = Y ) = 1, denoted


by X =a.s. Y .

Remark

(i) X and Y in Definition (i) do not have to be defined on the same probability space, while in (ii) they must be.

(ii) X =d Y is a much weaker concept than X =a.s. Y . One could have X =d Y even if P (X ≠ Y ) = 1. For example, X ∼ N (0, 1) and Y = −X. Clearly, P (X = Y ) = P (X = 0) = 0, but X =d Y .

Definition A random variable X on (Ω, A, P ) is discrete if ∃ a countable subset C of R such that P (X ∈ C) = 1.

Theorem 3.7.2.

X is discrete ⇔ PX is discrete ⇔ FX is discrete.



Proof. The first equivalence is by definition.


If FX is discrete, then FX (x) = Σ∞i=1 pi δai (x), where Σ∞i=1 pi = 1. Let C = {ai , i ≥ 1}; then we have

PX (C) = PX (∪∞i=1 {ai }) = Σ∞i=1 PX ({ai }) = Σ∞i=1 [FX (ai ) − FX (ai −)] = Σ∞i=1 pi = 1.

That is, PX is discrete.


On the other hand, if PX is discrete, then PX (C) = 1, where C = {ai , i ≥ 1}.
Then
FX (x) = P (X ∈ [−∞, x])
= P (X ∈ [−∞, x] ∩ C)
= Σai ∈[−∞,x] PX ({ai })
= Σ∞i=1 PX ({ai })I{ai ≤ x}
= Σ∞i=1 PX ({ai })δai (x)
=: Σ∞i=1 pi δai (x).

That is, FX is discrete.


Similarly, we have the associated definition and theorem for discrete random
vectors.

Definition A random vector X on (Ω, A, P ) is discrete if ∃ a countable


subset C of Rn such that P (X ∈ C) = 1.

Theorem 3.7.3. A random vector X = (X1 , · · · , Xn ) is discrete if and only


if Xk is discrete for each 1 ≤ k ≤ n.

Proof. Let C be a countable subset of Rn with P (X ∈ C) = 1, and define Ci = {xi : (x1 , · · · , xn ) ∈ C}, which is clearly countable. Enlarging C if necessary, we may assume C = C1 × · · · × Cn (the product is still countable and contains C). Then

P (X ∈ C) = 1 ⇔ P (∩ni=1 {Xi ∈ Ci }) = 1
⇔ P (∪ni=1 {Xi ∉ Ci }) = 0 ⇔ P ({Xi ∉ Ci }) = 0, 1 ≤ i ≤ n
⇔ P ({Xi ∈ Ci }) = 1, 1 ≤ i ≤ n, ⇔ Xi is discrete for all i.

3.8 Independence
Definition Let (Ω, A, P ) be a probability space.
(i) The events A1 , · · · , An ∈ A are said to be independent if and only if

P (∩i∈J Ai ) = Πi∈J P (Ai )

for every nonempty subset J of {1, 2, · · · , n}.


(ii) Classes A1 , · · · , An are said to be independent if and only if

P (∩i∈J Ai ) = Πi∈J P (Ai ) (3.1)

for every nonempty subset J of {1, 2, · · · , n}, and Ai ∈ Ai .

In particular, σ-fields A1 , · · · , An are said to be independent if

P (∩ni=1 Ai ) = Πni=1 P (Ai ) for any Ai ∈ Ai .

(Note we can choose some Ai = Ω ∈ Ai in (3.1).)


(iii) The random variables X1 , · · · , Xn are said to be independent if and only if the events {Xi ∈ Bi } are independent, i.e.,

P (∩i∈J {Xi ∈ Bi }) = Πi∈J P ({Xi ∈ Bi }) (3.2)

for every nonempty subset J of {1, 2, · · · , n} and any Bi ∈ B. This is also equivalent to

P (∩ni=1 {Xi ∈ Bi }) = Πni=1 P ({Xi ∈ Bi }) (3.3)

for any Borel sets Bi ∈ B (as one can take some Bi = R).

(iv) The random variables (events or classes) are said to be pairwise in-
dependent if and only if every two of them are independent.

(v) Random variables X1 , · · · , Xn that are independent and have the same distribution function are called independent and identically distributed (i.i.d.).

Remark Independent ⇒ Pairwise independent, but the converse is false


(see the following example).

Example
Let X1 , X2 , X3 be independent random variables with P (Xi = 0) = P (Xi = 1) = 1/2. Let A1 = {X2 = X3 }, A2 = {X3 = X1 }, A3 = {X1 = X2 }. Then the Ai ’s are pairwise independent but not independent.
Proof. First,

P (A1 ) = P (X2 = X3 )
= P (X2 = X3 = 1) + P (X2 = X3 = 0)
= P (X2 = 1, X3 = 1) + P (X2 = 0, X3 = 0)
= P (X2 = 1) P (X3 = 1) + P (X2 = 0) P (X3 = 0)
= 1/4 + 1/4 = 1/2.

Similarly, P (A2 ) = P (A3 ) = 1/2. Next, for i ≠ j,

P (Ai ∩ Aj ) = P (X1 = X2 = X3 )
= P (X1 = 1, X2 = 1, X3 = 1) + P (X1 = 0, X2 = 0, X3 = 0)
= P (X1 = 1) P (X2 = 1) P (X3 = 1) + P (X1 = 0) P (X2 = 0) P (X3 = 0)
= (1/2)3 + (1/2)3 = 1/4.

Thus, P (Ai ∩ Aj ) = P (Ai )P (Aj ) for i ≠ j, so the Ai ’s are pairwise independent. But they are not independent since

P (A1 ∩ A2 ∩ A3 ) = P (X1 = X2 = X3 ) = 1/4 ≠ 1/8 = P (A1 )P (A2 )P (A3 ).
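The computation above can be verified by brute-force enumeration of the 8 equally likely outcomes; a Python sketch using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product([0, 1], repeat=3))   # all values of (X1, X2, X3)
p = Fraction(1, 8)                           # each outcome has probability 1/8

def prob(event):
    return sum(p for w in outcomes if event(w))

A1 = lambda w: w[1] == w[2]   # {X2 = X3}
A2 = lambda w: w[2] == w[0]   # {X3 = X1}
A3 = lambda w: w[0] == w[1]   # {X1 = X2}

assert prob(A1) == prob(A2) == prob(A3) == Fraction(1, 2)
# pairwise independent: P(Ai ∩ Aj) = P(Ai)P(Aj) for i != j
for E, F in [(A1, A2), (A1, A3), (A2, A3)]:
    assert prob(lambda w: E(w) and F(w)) == prob(E) * prob(F) == Fraction(1, 4)
# but not independent: P(A1 ∩ A2 ∩ A3) = 1/4 != 1/8 = P(A1)P(A2)P(A3)
triple = prob(lambda w: A1(w) and A2(w) and A3(w))
assert triple == Fraction(1, 4) != prob(A1) * prob(A2) * prob(A3) == Fraction(1, 8)
```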

3.9 How to check independence


In order to check if X1 , · · · , Xn are independent, one only needs to verify
(3.3) for Bi = [−∞, ti ].

Theorem 3.9.1. The random variables X1 , · · · , Xn are independent if and


only if
FX1 ,··· ,Xn (t1 , · · · , tn ) = FX1 (t1 ) · · · FXn (tn ) (3.4)

for all t1 , · · · , tn ∈ R.

Proof. “⇒”: If X1 , · · · , Xn are independent, then taking Bi = [−∞, ti ] in (3.3) yields (3.4).
“⇐”: The proof of this part is not required.

3.9.1 Discrete random variables


Theorem 3.9.2. Discrete r.v.’s X1 , · · · , Xn , taking values in countable set
C, are independent if and only if

P (X1 = a1 , · · · , Xn = an ) = Πni=1 P (Xi = ai ) (3.5)

for all a1 , · · · , an ∈ C.

Proof. If X1 , · · · , Xn are independent, then (3.5) is obviously true. On the


other hand, if (3.5) is true, then

FX1 ,··· ,Xn (t1 , · · · , tn ) = P (X1 ≤ t1 , · · · , Xn ≤ tn )
= Σ{a1 ∈C:a1 ≤t1 } · · · Σ{an ∈C:an ≤tn } P (X1 = a1 , · · · , Xn = an )
= Σ{a1 ∈C:a1 ≤t1 } · · · Σ{an ∈C:an ≤tn } Πni=1 P (Xi = ai )
= Πni=1 Σ{ai ∈C:ai ≤ti } P (Xi = ai )
= Πni=1 P (Xi ≤ ti )
= FX1 (t1 ) · · · FXn (tn ).

Therefore, X1 , · · · , Xn are independent.
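Criterion (3.5) is directly checkable for finite joint pmfs; a Python sketch contrasting an independent pair (two fair dice) with a dependent one (Y = X), both hypothetical examples:

```python
from fractions import Fraction
from itertools import product

# Independent fair dice: joint pmf p(a, b) = 1/36 for all (a, b).
joint = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

def marginal(pmf, axis):
    m = {}
    for pair, p in pmf.items():
        m[pair[axis]] = m.get(pair[axis], Fraction(0)) + p
    return m

pX, pY = marginal(joint, 0), marginal(joint, 1)
# (3.5) holds: the joint pmf factorizes into the product of the marginals.
assert all(joint[a, b] == pX[a] * pY[b] for (a, b) in joint)

# Dependent pair (Y = X): same marginals, but (3.5) fails.
dep = {(a, a): Fraction(1, 6) for a in range(1, 7)}
qX, qY = marginal(dep, 0), marginal(dep, 1)
assert any(dep.get((a, b), Fraction(0)) != qX[a] * qY[b]
           for a, b in product(range(1, 7), repeat=2))
```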



3.9.2 Absolutely continuous random variables


Theorem 3.9.3. Let X = (X1 , · · · , Xn ) be an absolutely continuous random vector. Then X1 , · · · , Xn are independent if and only if

fX (y1 , · · · , yn ) = Πni=1 fXi (yi ), (3.6)

for all y1 , · · · , yn ∈ R.

Proof. If X1 , · · · , Xn are independent, then

Πni=1 ∫_{−∞}^{ti} fXi (yi )dyi = Πni=1 P (Xi ≤ ti ) = P (X1 ≤ t1 , · · · , Xn ≤ tn )
= ∫_{−∞}^{t1} · · · ∫_{−∞}^{tn} fX (y1 , · · · , yn )dy1 · · · dyn .

Hence,

∫_{−∞}^{t1} · · · ∫_{−∞}^{tn} (fX (y1 , · · · , yn ) − Πni=1 fXi (yi )) dy1 · · · dyn = 0.

Differentiating w.r.t. t1 , · · · , tn results in (3.6).
On the other hand, if (3.6) is true, then

P (X1 ≤ t1 , · · · , Xn ≤ tn ) = ∫_{−∞}^{t1} · · · ∫_{−∞}^{tn} fX (y1 , · · · , yn )dy1 · · · dyn
= Πni=1 ∫_{−∞}^{ti} fXi (yi )dyi = Πni=1 P (Xi ≤ ti ).

Therefore, X1 , · · · , Xn are independent by Theorem 3.9.1.

Some useful results related to the functions of independent random


variables are listed in the following theorem. The proofs of them are not
required in this course and hence are omitted.

Theorem 3.9.4.

(a) If X1 , · · · , Xn are independent random variables and g1 , · · · , gn are Borel measurable functions, then g1 (X1 ), · · · , gn (Xn ) are independent random variables.

(b) Let 0 = n0 < n1 < n2 < · · · < nk = n, and let gj be a Borel measurable function of nj − nj−1 variables. If X1 , · · · , Xn are independent random variables, then

g1 (X1 , · · · , Xn1 ), g2 (Xn1 +1 , · · · , Xn2 ), · · · , gk (Xnk−1 +1 , · · · , Xnk )

are independent.

(c) Let X, Y be independent and absolutely continuous random variables.


Then X + Y is absolutely continuous and
fX+Y (t) = ∫_{−∞}^{∞} fX (t − s)fY (s)ds, t ∈ R.

(d) Let X, Y be nonnegative, independent and integer-valued random variables. Then for each n ≥ 0,

P (X + Y = n) = Σnk=0 P (X = k)P (Y = n − k).

Remark In Theorem 3.9.4, (c) and (d) are known as convolution formulas.
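Formula (d) is immediate to implement; a Python sketch computing the pmf of the sum of two fair dice (a hypothetical example), with each pmf stored as a list indexed by the integer values:

```python
from fractions import Fraction

def convolve(pX, pY):
    """pmf of X + Y for independent, nonnegative integer-valued X and Y:
    P(X+Y = n) = sum_{k=0}^{n} P(X = k) P(Y = n - k)."""
    n_max = len(pX) + len(pY) - 2
    return [sum(pX[k] * pY[n - k]
                for k in range(n + 1)
                if k < len(pX) and 0 <= n - k < len(pY))
            for n in range(n_max + 1)]

die = [Fraction(0)] + [Fraction(1, 6)] * 6   # P(X = k) for k = 0, ..., 6
two_dice = convolve(die, die)                # pmf of the sum of two fair dice
assert sum(two_dice) == 1
assert two_dice[7] == Fraction(1, 6)         # P(sum = 7) = 6/36
assert two_dice[2] == two_dice[12] == Fraction(1, 36)
```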

3.10 Zero-One Law


Let {An , n ≥ 1} be a sequence of events on (Ω, A, P ). Recall

lim supn An = ∩∞n=1 ∪∞m=n Am = limn→∞ ∪∞m=n Am = {An , i.o.}.
lim inf n An = ∪∞n=1 ∩∞m=n Am = limn→∞ ∩∞m=n Am = {An , ult.}.
(lim inf n An )c = lim supn Acn .

Theorem 3.10.1. (Borel-Cantelli Lemma)


(a) P (An , i.o.) = 0 if Σ∞n=1 P (An ) < ∞.

(b) P (An , i.o.) = 1 if Σ∞n=1 P (An ) = ∞ and A1 , A2 , · · · are independent.

Proof.

(a)

P (An , i.o.) = P (limn→∞ ∪∞m=n Am ) = limn→∞ P (∪∞m=n Am ) ≤ limn→∞ Σ∞m=n P (Am ) = 0.

The inequality is due to the countable sub-additivity of the probability measure, and the last limit is 0 because the tail sums of the convergent series Σ∞n=1 P (An ) vanish.

(b) Noting 1 − x ≤ e−x for x ≥ 0 and the independence of the An , we have

0 ≤ 1 − P (An , i.o.) = P (Acn , ult.) = P (limn→∞ ∩∞m=n Acm )
= limn→∞ P (∩∞m=n Acm ) = limn→∞ limr→∞ P (∩rm=n Acm )
= limn→∞ limr→∞ Πrm=n [1 − P (Am )] (by independence)
≤ limn→∞ limr→∞ Πrm=n e−P (Am ) (as 1 − x ≤ e−x for x ≥ 0)
= limn→∞ limr→∞ e−Σrm=n P (Am )
= limn→∞ e−Σ∞m=n P (Am )
= limn→∞ 0 = 0.

Remark

(1) In Theorem 3.10.1, (a) does not require the independence assumption on A1 , A2 , · · · , while the independence in (b) cannot be removed in general. For instance, take An = A with 0 < P (A) < 1; then Σ∞n=1 P (An ) = ∞ but P (An , i.o.) = P (A) < 1.

(2) In Theorem 3.10.1, (b) also holds when A1 , A2 , · · · are only pairwise
independent.

Corollary 3.10.1. (Borel 0 − 1 Law) Let {An : n ≥ 1} be a sequence of (pairwise) independent events. Then

P (An , i.o.) = 0 if Σ∞n=1 P (An ) < ∞,
            = 1 if Σ∞n=1 P (An ) = ∞.
n=1

Corollary 3.10.2. If A1 , A2 , · · · are independent and An → A, then P (A) =


0 or 1.

Proof.

P (A) = P (limn→∞ An ) = P (lim supn An ) = P (An , i.o.).

Applying the Borel 0-1 Law, we have the result.


Example
You play a game with your friend by tossing a fair coin. You win the n-th round of the game if “Head” appears in that round.
Let An be the event that you win the n-th round. Then, Acn will represent
that you lose the n-th round.

A1 , A2 , · · · are independent and P (An ) = 1/2. Hence

Σ∞n=1 P (An ) = Σ∞n=1 1/2 = ∞.

By Theorem 3.10.1 (Borel-Cantelli lemma), we have P (lim supn An ) = P (An , i.o.) = 1, i.e., you will win infinitely many rounds with probability 1.
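The mechanism behind part (b) can be seen in exact arithmetic: for independent events, P(none of A1 , · · · , AN occurs) = ΠNn=1 (1 − P (An )), which tends to 0 exactly when Σ P (An ) = ∞. A Python sketch with two hypothetical choices of P (An ):

```python
from fractions import Fraction

def prob_none(p, N):
    """P(none of A_1, ..., A_N occurs) for independent events = prod_{n<=N} (1 - p(n))."""
    out = Fraction(1)
    for n in range(1, N + 1):
        out *= 1 - p(n)
    return out

# Divergent case: P(A_n) = 1/(n+1), so sum P(A_n) = infinity.
# The product telescopes to 1/(N+1) -> 0, consistent with P(A_n, i.o.) = 1.
assert prob_none(lambda n: Fraction(1, n + 1), 100) == Fraction(1, 101)

# Convergent case: P(A_n) = 1/(n+1)^2, so sum P(A_n) < infinity.
# The product stays bounded away from 0, consistent with P(A_n, i.o.) = 0.
assert prob_none(lambda n: Fraction(1, (n + 1) ** 2), 100) > Fraction(1, 2)
```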

Definition The tail σ-field (or remote future) of a sequence {Xn , n ≥ 1}


of random variables on (Ω, A, P ) is

∩∞n=1 σ(Xn , Xn+1 , · · · ).

The sets in the tail σ-field are called tail events.

Theorem 3.10.2. (Kolmogorov’s 0−1 Law) Suppose that X1 , X2 , · · · are


independent and that A ∈ ∩∞n=1 σ(Xn , Xn+1 , · · · ). Then either P (A) = 0 or P (A) = 1.

Proof. The proof of this theorem is not required.



Examples of Tail events

(i) If Bn ∈ B, then {Xn ∈ Bn , i.o.} is a tail event since

{Xn ∈ Bn , i.o.} = lim supn {Xn ∈ Bn }
= ∩∞n=1 ∪∞m=n {Xm ∈ Bm } ∈ ∩∞n=1 σ(Xn , Xn+1 , · · · ).

(ii) {An , i.o.} is a tail event since by taking Xn = IAn , Bn = {1} in (i)

{An , i.o.} = {IAn = 1, i.o.} = {Xn ∈ Bn , i.o.}.


Chapter 4

Expectation and Integration


4.1 Expectation
Let X be a r.v. on (Ω, A, P ).
Definition The expectation of a simple r.v. X = Σni=1 ai IAi , where Ai ∈ A, Σni=1 Ai = Ω and ai ∈ R, is

E[X] = Σni=1 ai P (Ai ).

Note: Σni=1 Ai = Ω has already included the assumption of A1 , · · · , An being disjoint (see P.2 of Chapter 1).
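On a finite sample space the definition is a direct computation; a Python sketch with a hypothetical fair-die example:

```python
from fractions import Fraction

# Omega = {1, ..., 6} with the uniform (fair-die) probability measure.
P = {w: Fraction(1, 6) for w in range(1, 7)}
partition = {1: {1, 2}, 5: {3}, 0: {4, 5, 6}}   # a_i -> A_i, disjoint, union = Omega
assert set().union(*partition.values()) == set(P)

# E[X] = sum_i a_i P(A_i)
EX = sum(a * sum(P[w] for w in A) for a, A in partition.items())
assert EX == Fraction(7, 6)   # = 1*(2/6) + 5*(1/6) + 0*(3/6)
```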

Definition X is a nonnegative r.v. on (Ω, A, P ) (i.e. X(ω) ≥ 0 for all


ω ∈ Ω).

(a) The expectation of X is

E[X] = limn→∞ E[Xn ] ≤ ∞,

where Xn ≥ 0 are simple r.v.’s and Xn ↑ X.

(b) The expectation of X over the event A ∈ A is EA [X] =: E[XIA ].

Remark

(1) The following notations are often used to denote E[X] and EA [X]:
E[X] = ∫Ω X(ω)P (dω) = ∫Ω XdP ;
EA [X] = ∫A X(ω)P (dω) = ∫A XdP.

(2) Theorem 3.5.2 guarantees the existence of the sequence of simple r.v.’s
for a nonnegative r.v. X in (a).

(3) (a) shows that if X ≥ 0, then E[X] ≥ 0.


(4) We can show that E[X] is well defined for both simple and nonnegative
r.v.’s in the following senses (the proofs are omitted):

(a) (Simple r.v.)
If X = Σni=1 ai IAi = Σmj=1 bj IBj with Ω = Σni=1 Ai = Σmj=1 Bj , then E[X] = Σni=1 ai P (Ai ) = Σmj=1 bj P (Bj ).
(b) (Nonnegative r.v.)
If Xn ≥ 0, Yn ≥ 0 are simple r.v.’s and Xn ↑ X, Yn ↑ X. then
E[X] = limn→∞ E[Xn ] = limn→∞ E[Yn ].

Recall
For a general r.v. X on (Ω, A, P ), we have

X = X + − X −, |X| = X + + X − . (4.1)

where
X + = max{X, 0} = XI{X≥0} ≥ 0 and X − = max{−X, 0} = −XI{X≤0} ≥ 0.

Definition Let X be a general r.v. on (Ω, A, P ).

(a) For a general r.v. X, if either E[X + ] < ∞ or E[X − ] < ∞ (or both), then the expectation of X is

E[X] = E[X + ] − E[X − ].

In this case, E[X] is said to exist and E[X] ∈ [−∞, ∞].

(b) If E [X + ] = E [X − ] = ∞, then E[X] is not defined (see Section 2.4).

(c) X is said to be integrable if E[|X|] =: E [X + ] + E [X − ] < ∞.

(d) If X is integrable and A ∈ A, the expectation of X over A is

EA [X] = E[XIA ].

(e) Define L1 = {X : E[|X|] < ∞}, the class of all integrable r.v.’s on
(Ω, A, P ).

4.2 Properties of Expectation


Theorem 4.2.1. Assume that X, Y, X1 , · · · , Xn below are r.v.’s on (Ω, A, P ).

(i) E[C] = C where C is a real constant.

(ii) E[IA ] = P (A) where A ∈ A.

(iii) |E[X]| ≤ E[|X|].

(iv) (Linearity) If X, Y ∈ L1 , and a, b ∈ (−∞, ∞), then aX + bY ∈ L1 ,


and
E[aX + bY ] = aE[X] + bE[Y ]. (4.2)

(v) If A ∈ A and P (A) = 1, then E[X] = EA [X].

(vi) Suppose that X ≥ 0. Then E[X] = 0 if and only if X =a.s. 0.

(vii) If X, Y ∈ L1 and X =a.s. Y , then E[X] = E[Y ].

(viii) If E[|X|] < ∞, then |X| < ∞ a.s.

(ix) (Absolute integrability)


E[X] is finite if and only if E[|X|] is finite.

(x) (Monotonicity)
If X1 ≤ X ≤ X2 a.s., then

E[X1 ] ≤ E[X] ≤ E[X2 ].

(xi) (Mean value theorem)


If a ≤ X ≤ b a.s. on A ∈ A, then

aP (A) ≤ EA [X] ≤ bP (A).

Proof.

(i) E[C] = E[CIΩ ] = CP (Ω) = C.

(ii) E[IA ] = E[1 × IA + 0 × IAc ] = P (A).

(iii) |E[X]| = |E[X + ] − E[X − ]| ≤ E[X + ] + E[X − ] = E[|X|].



(iv) Before we prove the linearity for X, Y ∈ L1 , we first show that (4.2) is
true for simple r.v.’s and then for nonnegative r.v.’s.
Suppose X and Y are simple r.v.’s and are given by

X = Σni=1 ai IAi and Y = Σmj=1 bj IBj ,

where Ai , Bj ∈ A, Σni=1 Ai = Σmj=1 Bj = Ω and ai , bj ∈ R. So,

aX + bY = Σni=1 Σmj=1 (aai + bbj )IAi ∩Bj

is a simple r.v. and therefore,

E[aX + bY ] = Σni=1 Σmj=1 (aai + bbj )P (Ai ∩ Bj )
= a Σni=1 ai P (Ai ) + b Σmj=1 bj P (Bj )
= aE[X] + bE[Y ].

Now, suppose X and Y are nonnegative r.v.’s, so there are two increas-
ing sequences of nonnegative simple r.v.’s {Xn , n ≥ 1} and {Yn , n ≥ 1}
such that Xn ↑ X and Yn ↑ Y .
For a, b ≥ 0,
E[aX + bY ] = limn→∞ E[aXn + bYn ]
= limn→∞ (aE[Xn ] + bE[Yn ])
= a limn→∞ E[Xn ] + b limn→∞ E[Yn ] = aE[X] + bE[Y ].

Suppose X, Y ∈ L1 .
By the triangle inequality: |aX + bY | ≤ |a||X| + |b||Y |. Note that
each term is nonnegative, so their expectations are well defined and
monotonicity of expectations for nonnegative r.v.’s implies
E[|aX + bY |] ≤ E[|a||X| + |b||Y |]
= |a|E[|X|] + |b|E[|Y |] (by the linearity of nonnegative r.v.’s)
< ∞.

Thus, aX + bY ∈ L1 .
To prove (4.2) for general r.v.’s, it is equivalent to show that
(a) E[X + Y ] = E[X] + E[Y ].
(b) E[aX] = aE[X].
Proof of (a).

By definition,
(X + Y )+ − (X + Y )− = X + Y = X + − X − + Y + − Y − .

Therefore,
(X + Y )+ + X − + Y − = (X + Y )− + X + + Y + . (4.3)

All the r.v.’s in (4.3) are nonnegative, so by the linearity of nonnegative


r.v.’s, we have
E[(X + Y )+ ] + E[X − ] + E[Y − ] = E[(X + Y )− ] + E[X + ] + E[Y + ]
E[(X + Y )+ ] − E[(X + Y )− ] = E[X + ] − E[X − ] + E[Y + ] − E[Y − ]
E[X + Y ] = E[X] + E[Y ]

Proof of (b).

If a ≥ 0, then
E[aX] = E[(aX)+ ] − E[(aX)− ]
= E[aX + ] − E[aX − ]
= a E[X + ] − E[X − ] = aE[X].


If a < 0, then
(aX)+ = aXI{aX≥0} = (−a)(−X)I{X≤0} = (−a)X − .
Similarly, (aX)− = (−a)X + . Therefore,
E[aX] = E[(aX)+ ] − E[(aX)− ]
= E[(−a)X − ] − E[(−a)X + ]
= −a E[X − ] − E[X + ] = aE[X].


Combining (a) and (b), we have

E[aX + bY ] = E[aX] + E[bY ] (by (a)) = aE[X] + bE[Y ] (by (b)).

(v)

E[X] = E[XIA + XIAc ]


= EA [X] + EAc [X]. (by linearity)

However,

0 ≤ |EAc [X]| ≤ EAc [|X|] ≤ ∞ × P (Ac ) = ∞ × 0 = 0.

Therefore, EAc [X] = 0 and E[X] = EA [X].

(vi) First assume E[X] = 0, we will show that X =a.s. 0. Suppose that
X =a.s. 0 is NOT true, i.e.,

P (X = 0) < 1 ⇒ FX (0) = P (X ≤ 0) = P (X = 0) < 1.
⇒ ∃ε > 0 s.t. FX (ε) < 1 (since FX is right continuous).
⇒ P (X > ε) > 0.

Therefore,

E[X] = E[XI{X>ε} ] + E[XI{X≤ε} ]


≥ E[XI{X>ε} ] (since XI{X≤ε} ≥ 0)
≥ εE[I{X>ε} ] = εP (X > ε).

Since E[X] is assumed to be 0, we get P (X > ε) = 0, which is a contradiction. This proves that X =a.s. 0.

Next assume X =a.s. 0, we will show E[X] = 0.


Consider,

0 ≤ E[X] = E[XI{X=0} ] + E[XI{X>0} ] = E[XI{X>0} ]
≤ (supω∈Ω X(ω)) E[I{X>0} ] = (supω∈Ω X(ω)) P (X > 0) ≤ ∞ × 0 = 0.

(vii) X =a.s. Y ⇒ 1 = P (X = Y ) = P (X − Y = 0) = P (|X − Y | = 0).



Define Z = X − Y .
P (|Z| = 0) = 1 ⇒ P (Z + + Z − = 0) = 1 ⇒ P (Z + = 0) = P (Z − = 0) =
1.
From (vi), we have E[Z] = E[Z + ] − E[Z − ] = 0.
By linearity, we have E[X] = E[Y ].

(viii) We prove it by contradiction.


Suppose that P (|X| = ∞) > 0. We have

E[|X|] = E[|X|I{|X|<∞} ] + E[|X|I{|X|=∞} ]


≥ E[|X|I{|X|=∞} ] = ∞ × P (|X| = ∞) = ∞.

This implies that E[|X|] = ∞, which contradicts our assumption that E[|X|] < ∞.

(ix) |E[X]| < ∞ ⇔ E[X + ] < ∞ and E[X − ] < ∞ ⇔ E[|X|] < ∞.

(x) It suffices to show that X ≤ Y a.s. implies E[X] ≤ E[Y ].


If both E[X] and E[Y ] are ∞ (or −∞), then the inequality clearly
holds.
Otherwise, since Y − X ≥ 0 a.s., by the linearity and positivity of expectations, the proof follows from

E[Y ] − E[X] = E[Y − X] ≥ 0.

(xi) a ≤ X ≤ b a.s. ⇒ aIA ≤ XIA ≤ bIA a.s.


The proof is done by taking expectation on both sides.
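Property (iv) can be checked exactly on a small finite space; a Python sketch with hypothetical simple r.v.’s:

```python
from fractions import Fraction

P = {w: Fraction(1, 4) for w in range(4)}   # uniform 4-point sample space
X = {0: 1, 1: 1, 2: -2, 3: 0}               # values X(w)
Y = {0: 3, 1: -1, 2: 0, 3: 5}

def E(Z):
    """E[Z] = sum_w Z(w) P({w}) on a finite probability space."""
    return sum(Fraction(Z[w]) * P[w] for w in P)

a, b = 2, -3
comb = {w: a * X[w] + b * Y[w] for w in P}
assert E(comb) == a * E(X) + b * E(Y)       # linearity (iv)
assert E(X) == 0 and E(Y) == Fraction(7, 4)
```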

Remark The linearity in Theorem 4.2.1 can be generalized to X, Y ∉ L1 provided that the RHS of (iv) is meaningful, namely not of the form ∞ − ∞ or −∞ + ∞. For example, (iv) is true when X, Y ≥ 0 and a, b ≥ 0.

Also, there are some useful limiting properties of the expectation. They are
stated in the following theorem and the proofs of them are omitted.

Theorem 4.2.2.

(i) (Fatou’s lemma)
If Xn ≥ 0 a.s. for n = 1, 2, · · · , then

E[lim inf n Xn ] ≤ lim inf n E[Xn ].

(ii) (Monotone convergence theorem)


Let X, X1 , X2 , · · · be nonnegative r.v.’s with Xn (ω) ↑ X(ω) for all ω ∈
Ω. Then limn→∞ E[Xn ] = E[limn→∞ Xn ] = E[X].

(iii) (Dominated convergence theorem)
If Xn → X a.s., |Xn | ≤ Y a.s. for all n, and E[Y ] < ∞, then

limn→∞ E[Xn ] = E[limn→∞ Xn ] = E[X].

4.3 Integration
4.3.1 Definition
In the previous section, we defined the integration of a random variable over
a probability space. Now, we can extend the corresponding definition to a
measurable function over a general measure space (Ω, A, µ). Here, µ is not necessarily a probability measure (i.e., possibly µ(Ω) ≠ 1).

Definition Let f be a measurable function on (Ω, A, µ) (i.e., f : (Ω, A, µ) → (Rn , B(Rn ))). The integral of f with respect to µ is denoted by

∫Ω f (ω)µ(dω) = ∫Ω f dµ. (4.4)

(a) If f is a simple function which is given as

f = Σni=1 ai IAi ,

where Ai ∈ A, Σni=1 Ai = Ω and ai ∈ R, then

∫Ω f dµ = Σni=1 ai µ(Ai ).

(b) If f ≥ 0, define

∫Ω f dµ = limn→∞ ∫Ω fn dµ,

where fn ≥ 0 are simple functions and fn ↑ f .


(c) For a general measurable function f = f + − f − , define

∫Ω f dµ =: ∫Ω f + dµ − ∫Ω f − dµ

if either ∫Ω f + dµ < ∞ or ∫Ω f − dµ < ∞. If ∫Ω f + dµ = ∞ and ∫Ω f − dµ = ∞, then ∫Ω f dµ is not defined.

Remark
(i) Some of the properties of the expectation in Theorems 4.2.1 and 4.2.2
are also valid for the integration of measurable functions.
Exercise Please list those properties which are valid for the integral of
f in (4.4).

(ii) In the case of (Ω, A, µ) = (R, B, µ), if we write x = ω ∈ R, then

∫R f (ω)µ(dω) = ∫R f (x)µ(dx)

is just the Lebesgue-Stieltjes integral of f with respect to µ.


(iii) In the case of (Ω, A, µ) = (R, B, λ), where λ is the Lebesgue measure introduced in Section 2.5.2,

∫R f (x)λ(dx) = ∫R f (x)dx

is just the Lebesgue integral of f with respect to λ.
A function f is called Lebesgue integrable if ∫R |f (x)|λ(dx) < ∞.

Although the Lebesgue integral looks similar to the Riemann integral which we learnt in our elementary calculus courses, they are different in general. The following result shows their relationship:
If a function is Riemann integrable, then it is Lebesgue integrable and its Lebesgue integral is equal to the Riemann integral. However, the converse is not true.

4.4 How to compute expectation


Theorem 4.4.1. (Change of variable formula) Assume the following
holds:

(i) Let X be a r.v. from (Ω, A, P ) to (R, B, PX ) where PX = P · X −1 is the probability measure induced by X (or the distribution of X).

(ii) g is a Borel function on (R, B).

(iii) Either g ≥ 0 or E[|g(X)|] < ∞.

Then

E[g(X)] = ∫R g(y)PX (dy).

Proof.
Case I: Indicator functions.
If g = IB with B ∈ B, then the relevant definitions show

E[g(X)] = E[IB (X)] = P (X ∈ B) = PX (B) = ∫R IB (y)PX (dy) = ∫R g(y)PX (dy).

Case II: Simple functions.
Let g = Σni=1 bi IBi with Bi ∈ B and bi ∈ R. The linearity of expected value, the result of Case I, and the linearity of integration imply

E[g(X)] = E[Σni=1 bi IBi (X)] = Σni=1 bi E[IBi (X)] = Σni=1 bi ∫R IBi (y)PX (dy)
= ∫R (Σni=1 bi IBi (y)) PX (dy) = ∫R g(y)PX (dy).

Case III: Nonnegative functions.
Now if g ≥ 0, then there exists a sequence of simple functions {gn , n ≥ 1} such that 0 ≤ gn ↑ g. From Case II and the Monotone Convergence Theorem, we get

E[g(X)] = limn→∞ E[gn (X)] = limn→∞ ∫R gn (y)PX (dy) = ∫R g(y)PX (dy).

Case IV: Integrable functions.



For the general case, we can write g(x) = g + (x) − g − (x). The condition that g(X) is integrable guarantees that E[g + (X)] < ∞ and E[g − (X)] < ∞. So from Case III for nonnegative functions and the linearity of expected value and integration,

E[g(X)] = E[g + (X)] − E[g − (X)]
= ∫R g + (y)PX (dy) − ∫R g − (y)PX (dy)
= ∫R g(y)PX (dy).

Remark The important application of the above theorem is that we can compute expected values of functions of random variables by performing a Lebesgue or Lebesgue-Stieltjes integral on the real line R instead of evaluating the original definition of expectation (or integral) over the abstract probability space (Ω, A, P ).

4.4.1 Expected values of absolutely continuous r.v.


Lemma 4.4.1. Let X be an absolutely continuous r.v. with density function f, i.e., F_X(x) = ∫_{−∞}^x f(t) dt. Let P_X be the unique probability measure corresponding to F_X. Then

P_X(B) = ∫_B f dλ = ∫_B f(x) dx, ∀B ∈ B, (4.5)

where λ is the Lebesgue measure.


Proof. Let A = {A ∈ B : P_X(A) = ∫_A f(x) dx}. It is easy to show that A is a σ-field, and A ⊃ S := {(−∞, x], x ∈ R}. Therefore, A ⊃ σ(S) = B. The proof is done.

Theorem 4.4.2. Let X be an absolutely continuous r.v. with density function f (i.e., F_X(x) = ∫_{−∞}^x f(t) dt) and let g be Borel. Then

E[g(X)] = ∫_R g(x) f(x) dx,

provided that ∫_R |g(x)| f(x) dx < ∞.

Proof. Let P_X be the unique probability measure corresponding to F_X such that

P_X((a, b]) = F_X(b) − F_X(a) = ∫_{(a,b]} f(t) dt. (4.6)

From Lemma 4.4.1, we have

P_X(B) = ∫_B f(x) dx, ∀B ∈ B.

From Theorem 4.4.1, we have E[g(X)] = ∫_R g(x) P_X(dx). To complete our proof, we only need to show that

E[g(X)] = ∫_R g(x) P_X(dx) = ∫_R g(x) f(x) dx. (4.7)

Proof of (4.7). We shall employ the same method used in Theorem 4.4.1.
Case I: Indicator functions.
If g = I_B with B ∈ B, then

LHS = ∫_R I_B(x) P_X(dx) = P(X ∈ B) = P_X(B) = ∫_R I_B(y) f(y) dy = RHS,

where the second last equality comes from (4.5).

Case II: Simple functions.
If g = Σ_{i=1}^n b_i I_{B_i} with B_i ∈ B and b_i ∈ R, then the linearity of expected value, the result of Case I, and the linearity of integration imply

LHS = ∫_R ( Σ_{i=1}^n b_i I_{B_i}(x) ) P_X(dx) = Σ_{i=1}^n b_i ∫_R I_{B_i}(x) P_X(dx)
    = Σ_{i=1}^n b_i ∫_R I_{B_i}(y) f(y) dy = ∫_R ( Σ_{i=1}^n b_i I_{B_i}(y) ) f(y) dy
    = ∫_R g(y) f(y) dy = RHS.

Case III: Nonnegative functions.
Now if g ≥ 0, then there exists a sequence of simple functions {g_n, n ≥ 1} such that 0 ≤ g_n ↑ g. From Case II and the Monotone Convergence Theorem, we get

LHS = lim_{n→∞} ∫_R g_n(y) P_X(dy) = lim_{n→∞} ∫_R g_n(y) f(y) dy = ∫_R g(y) f(y) dy = RHS.

Case IV: Integrable functions.
For the general case, we can write g(x) = g^+(x) − g^−(x). The assumption ∫_R |g(x)| f(x) dx < ∞ guarantees that E[g^+(X)] < ∞ and E[g^−(X)] < ∞. So from Case III for nonnegative functions and linearity of expected value and integration,

LHS = ∫_R g^+(x) P_X(dx) − ∫_R g^−(x) P_X(dx)
    = ∫_R g^+(y) f(y) dy − ∫_R g^−(y) f(y) dy = RHS.

This proves (4.7) and hence the theorem.

Remark

(i) For an absolutely continuous r.v. X, we have several equivalent expressions:

E[g(X)] = ∫_R g(x) P_X(dx) = ∫_R g(x) dF_X(x) = ∫_R g(x) f(x) dx.

(ii) The last integral in (i) is a Lebesgue integral, which equals the Riemann integral when the latter exists. This will make our calculations much easier.
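Since the last integral reduces to an ordinary Riemann integral when it exists, it can be checked numerically. A minimal sketch in Python, assuming for illustration that X ~ Exponential(1), so f(x) = e^{−x}, and taking g(x) = x², for which E[X²] = 2 (the helper name is ours, not from the notes):

```python
import math

# Approximate E[g(X)] = ∫ g(x) f(x) dx by a midpoint Riemann sum.
def expectation_via_density(g, f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of g(x)*f(x) over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) * f(lo + (i + 0.5) * h) for i in range(n)) * h

# X ~ Exponential(1): f(x) = e^{-x} on (0, ∞); E[X^2] = Var(X) + (E[X])^2 = 2.
approx = expectation_via_density(lambda x: x * x, lambda x: math.exp(-x), 0.0, 50.0)
```

Truncating the integral at 50 is harmless here because the exponential tail beyond that point is negligible.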

4.4.2 Expected values of discrete r.v.


Theorem 4.4.3. Let X be a discrete r.v. taking values x_1, x_2, · · ·, with probability density function P(X = x_k) = p_k for k ≥ 1, and let g be Borel. Then

E[g(X)] = Σ_{k=1}^∞ g(x_k) P(X = x_k) = Σ_{k=1}^∞ p_k g(x_k),

provided that Σ_{k=1}^∞ p_k |g(x_k)| < ∞.

Proof. Clearly, g(X) is a r.v. taking values g(x_1), g(x_2), · · ·, and we can write

g(X) = Σ_{k=1}^∞ g(x_k) I_{{X = x_k}}.

Case I: Nonnegative g.
If g(X) ≥ 0, then g(x_k) ≥ 0 for all k ≥ 1. Define

Z_n = Σ_{k=1}^n g(x_k) I_{{X = x_k}}, a form of truncated r.v.

Then clearly 0 ≤ Z_n ↑ Z_∞ ≡ g(X), and the Z_n are simple r.v.'s. Either by the definition of expectation for nonnegative r.v.'s, or simply by applying the Monotone Convergence Theorem, we get

E[g(X)] = lim_{n→∞} E[Z_n] = lim_{n→∞} Σ_{k=1}^n g(x_k) P(X = x_k) = Σ_{k=1}^∞ p_k g(x_k).

Case II: General g.
It follows from Case I and the assumption that E[|g(X)|] = Σ_{k=1}^∞ p_k |g(x_k)| < ∞. Therefore,

E[g(X)] = E[g^+(X)] − E[g^−(X)]
        = Σ_{k=1}^∞ p_k g^+(x_k) − Σ_{k=1}^∞ p_k g^−(x_k)
        = Σ_{k=1}^∞ p_k ( g^+(x_k) − g^−(x_k) )
        = Σ_{k=1}^∞ p_k g(x_k).
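Theorem 4.4.3 reduces expectation to a weighted sum, which can be sketched directly. A minimal example (the helper name and the die example are ours, not from the notes):

```python
# E[g(X)] = Σ_k p_k g(x_k) for a discrete r.v. with finite or countable support.
def discrete_expectation(g, support):
    """support: iterable of (x_k, p_k) pairs with Σ p_k = 1."""
    return sum(p * g(x) for x, p in support)

# Fair die: E[X] = 3.5 and E[X^2] = 91/6.
die = [(k, 1 / 6) for k in range(1, 7)]
mean = discrete_expectation(lambda x: x, die)
second_moment = discrete_expectation(lambda x: x * x, die)
```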

4.5 Moments

Definition Let X be a r.v. and r > 0.

(1) Define
    rth Moment: E[X^r].
    rth Absolute Moment: E[|X|^r].
    rth Central Moment: E[(X − E[X])^r].
    rth Absolute Central Moment: E[|X − E[X]|^r].

(2) L^r space = {X : E[|X|^r] < ∞}.

Remark If r = 2, the 2nd central moment is the variance of the random variable X, i.e., E[(X − E[X])^2] = Var(X).

The following theorem links the probability and moments of a random variable.

Theorem 4.5.1. (Chebyshev (Markov) inequality) If g is strictly increasing and positive on (0, ∞), g(x) = g(−x), and X is a r.v. such that E[g(X)] < ∞, then for each a > 0:

P(|X| ≥ a) ≤ E[g(X)] / g(a).

Proof.

E[g(X)] ≥ E[g(X) I_{{g(X) ≥ g(a)}}]
        ≥ g(a) E[I_{{g(X) ≥ g(a)}}]
        = g(a) P(g(X) ≥ g(a))
        = g(a) P(|X| ≥ a).

Example

(i) X ∈ L^1 ⇒ P(|X| ≥ a) ≤ E[|X|] / a.

(ii) X ∈ L^p ⇒ P(|X| ≥ a) ≤ E[|X|^p] / a^p.

(iii) X ∈ L^2 ⇒ P(|X − E[X]| ≥ a) ≤ Var(X) / a^2.
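Bound (iii) can be illustrated empirically. A small sketch (our own toy case, not from the notes): for X uniform on [0, 1], E[X] = 1/2, Var(X) = 1/12, and the exact tail P(|X − 1/2| ≥ 0.4) is 0.2, well below the Chebyshev bound:

```python
import random

random.seed(0)
# Chebyshev: P(|X - E[X]| >= a) <= Var(X)/a^2.  X ~ Uniform(0, 1), a = 0.4.
n, a = 100_000, 0.4
xs = [random.random() for _ in range(n)]
freq = sum(abs(x - 0.5) >= a for x in xs) / n   # empirical tail, ≈ 0.2
bound = (1 / 12) / a ** 2                       # Chebyshev bound, ≈ 0.52
```

The bound is often loose, as here; its value is that it holds for every distribution with finite variance.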

4.6 Joint integrals

By replacing (R, B, P_X) with (R^n, B(R^n), P_X) in Theorem 4.4.1, we obtain the change of variables formula for n-dimensional random vectors. Let X = (X_1, · · ·, X_n) be an n-dimensional random vector on (Ω, A, P), let P_X be the distribution of X, and assume the function g satisfies all the properties in Theorem 4.4.1. Theorem 4.4.1 shows that the expectation of g(X_1, · · ·, X_n) is given by

E[g(X_1, · · ·, X_n)] = ∫_{R^n} g(x) P_X(dx), (4.8)

where x = (x_1, · · ·, x_n) and dx = dx_1 × · · · × dx_n.

Definition If E[X], E[Y ] and E[XY ] are finite, then the covariance of X and Y is defined as

Cov(X, Y ) = E[XY ] − E[X]E[Y ].

X and Y are said to be uncorrelated if Cov(X, Y ) = 0.

Theorem 4.6.1. If X, Y are independent and integrable r.v.'s, then

E[XY ] = E[X]E[Y ].

That is, independence implies uncorrelatedness.


Proof. We shall divide the proof into several steps.

Step 1: Simple r.v.'s.
If X, Y are simple r.v.'s, i.e., X = Σ_{i=1}^n a_i I_{A_i} and Y = Σ_{j=1}^m b_j I_{B_j} where A_i, B_j ∈ A, ∪_{i=1}^n A_i = ∪_{j=1}^m B_j = Ω and a_i, b_j ∈ R, then

E[X] = Σ_{i=1}^n a_i P(A_i) and E[Y ] = Σ_{j=1}^m b_j P(B_j).

Note XY = Σ_{i=1}^n Σ_{j=1}^m a_i b_j I_{A_i ∩ B_j} is also a simple r.v. and

P(A_i ∩ B_j) = P({X = a_i}, {Y = b_j}) = P(X = a_i) P(Y = b_j) = P(A_i) P(B_j).

Therefore,

E[XY ] = Σ_{i=1}^n Σ_{j=1}^m a_i b_j P(A_i ∩ B_j) = Σ_{i=1}^n Σ_{j=1}^m a_i b_j P(A_i) P(B_j) = E[X]E[Y ].

Step 2: Nonnegative r.v.'s.
Let ⌊y⌋ be the integer part of y. If X ≥ 0 and Y ≥ 0, then take X_n = ⌊2^n X⌋/2^n and Y_n = ⌊2^n Y⌋/2^n, so that

(i) X_n and Y_n are both simple r.v.'s.

(ii) 0 ≤ X_n ↑ X and 0 ≤ Y_n ↑ Y , which in turn implies that

    (a) lim_{n→∞} E[X_n] = E[X] and lim_{n→∞} E[Y_n] = E[Y ], by the Monotone Convergence Theorem.

    (b) 0 ≤ XY − X_n Y_n = X(Y − Y_n) + Y_n(X − X_n) → 0 a.s., i.e., 0 ≤ X_n Y_n ↑ XY .

(iii) X_n and Y_n are independent, as Borel functions of X and Y respectively.

Now applying the Monotone Convergence Theorem and Step 1, we get

E[XY ] = lim_{n→∞} E[X_n Y_n] = lim_{n→∞} E[X_n]E[Y_n] = E[X]E[Y ].

Step 3: General integrable r.v.'s.
For general integrable r.v.'s, note that independence of X and Y implies that of X^+ and Y^+; X^− and Y^−; and so on. Therefore,

E[XY ] = E[(X^+ − X^−)(Y^+ − Y^−)]
       = E[X^+Y^+] − E[X^+Y^−] − E[X^−Y^+] + E[X^−Y^−]
       = E[X^+]E[Y^+] − E[X^+]E[Y^−] − E[X^−]E[Y^+] + E[X^−]E[Y^−]
       = (E[X^+] − E[X^−])(E[Y^+] − E[Y^−])
       = E[X]E[Y ].

In exactly the same manner, we can show the following.

Theorem 4.6.2. If X_1, · · ·, X_n are independent and all have finite expectations, then

E[X_1 · · · X_n] = E[X_1] · · · E[X_n].
Example (Uncorrelated ⇏ Independent)
Define X by

P(X = −1) = P(X = 0) = P(X = 1) = 1/3,

and let Y = X^2. From Theorem 4.4.3,

E[X] = −1 × 1/3 + 0 × 1/3 + 1 × 1/3 = 0;
E[Y ] = E[X^2] = 1 × 1/3 + 0 × 1/3 + 1 × 1/3 = 2/3;
E[XY ] = E[X^3] = −1 × 1/3 + 0 × 1/3 + 1 × 1/3 = 0.

By the definition of covariance, we have

Cov(X, Y ) = E[XY ] − E[X]E[Y ] = 0 − 0 × 2/3 = 0.

So, X and Y are uncorrelated. It is obvious that X and Y are dependent (for instance, P(X = 1, Y = 1) = 1/3 while P(X = 1)P(Y = 1) = 2/9).
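The example's moments can be replayed with Theorem 4.4.3 (the variable names below are ours, not from the notes):

```python
# X uniform on {-1, 0, 1}, Y = X^2: Cov(X, Y) = 0 yet X and Y are dependent.
support = [(-1, 1 / 3), (0, 1 / 3), (1, 1 / 3)]
EX = sum(p * x for x, p in support)         # E[X]   = 0
EY = sum(p * x * x for x, p in support)     # E[X^2] = 2/3
EXY = sum(p * x ** 3 for x, p in support)   # E[X^3] = 0
cov = EXY - EX * EY                         # covariance is 0
# Dependence: P(X = 1, Y = 1) = 1/3 differs from P(X = 1) P(Y = 1) = 2/9.
joint, product = 1 / 3, (1 / 3) * (2 / 3)
```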
Chapter 5

Convergence of Random Variables


5.1 Types of convergence

Definition Let X, X_1, X_2, · · · be r.v.'s on (Ω, A, P ). We say that

(a) X_n → X almost surely (a.s.) (with probability 1), written as X_n →^{a.s.} X, if

P( lim_{n→∞} X_n = X ) = P({ω ∈ Ω : lim_{n→∞} X_n(ω) = X(ω)}) = 1.

(b) X_n → X in rth mean, or in L^r, where 1 ≤ r < ∞, written as X_n →^{L^r} X, if

lim_{n→∞} E[|X_n − X|^r] = 0.

(c) X_n → X in probability, written as X_n →^{p} X, if

lim_{n→∞} P(|X_n − X| > ε) = 0, for any ε > 0.
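Convergence in probability can be made concrete by simulation. A minimal sketch (our own example, not from the notes): take X_n to be the sample mean of n fair coin flips, so X_n →^{p} 1/2, and estimate the tail probability P(|X_n − 1/2| > ε) for a small and a large n:

```python
import random

random.seed(1)
# Estimate P(|X_n - 1/2| > eps) where X_n is the mean of n fair coin flips.
def tail_prob(n, eps=0.05, trials=2000):
    hits = 0
    for _ in range(trials):
        xn = sum(random.random() < 0.5 for _ in range(n)) / n
        hits += abs(xn - 0.5) > eps
    return hits / trials

p_small_n = tail_prob(50)     # sizeable for small n
p_large_n = tail_prob(2000)   # essentially 0 for large n
```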

Theorem 5.1.1. X_n →^{a.s.} X if and only if for any ε > 0,

lim_{n→∞} P( ∩_{m=n}^∞ {|X_m − X| < ε} ) = 1. (5.1)

Proof. First note that

{ω : lim_{n→∞} X_n(ω) = X(ω)}
= ∩_{ε>0} ∪_{n=1}^∞ ∩_{m=n}^∞ {ω : |X_m(ω) − X(ω)| < ε}
  (i.e., ∀ε > 0, ∃n ≥ 1, s.t. |X_m(ω) − X(ω)| < ε ∀m ≥ n)
= ∩_{k=1}^∞ ∪_{n=1}^∞ ∩_{m=n}^∞ {ω : |X_m(ω) − X(ω)| < 1/k}
= ∩_{k=1}^∞ lim_{n→∞} ∩_{m=n}^∞ {ω : |X_m(ω) − X(ω)| < 1/k}    (i)
= lim_{k→∞} lim_{n→∞} ∩_{m=n}^∞ {|X_m − X| < 1/k}.


(Recall that if A_n ↑ A and B_n ↓ B, then A = lim_{n→∞} A_n = ∪_{n=1}^∞ A_n and B = lim_{n→∞} B_n = ∩_{n=1}^∞ B_n.)

If X_n →^{a.s.} X, then it follows from (i) that, for each k,

1 = P({ω : lim_{n→∞} X_n(ω) = X(ω)}) ≤ lim_{n→∞} P( ∩_{m=n}^∞ {ω : |X_m(ω) − X(ω)| < 1/k} ) ≤ 1,

which implies that (5.1) holds.

Conversely, if (5.1) holds, then since probability measure is continuous, we have

P({ω : lim_{n→∞} X_n(ω) = X(ω)}) = lim_{k→∞} lim_{n→∞} P( ∩_{m=n}^∞ {ω : |X_m(ω) − X(ω)| < 1/k} ) = 1, by (5.1),

which implies that X_n →^{a.s.} X.

Theorem 5.1.2. X_n →^{p} X if and only if

lim_{n→∞} E[ |X_n − X| / (1 + |X_n − X|) ] = 0.

Proof. There is no loss of generality in taking X = 0 (why?). Thus we want to show

X_n →^{p} 0 if and only if lim_{n→∞} E[ |X_n| / (1 + |X_n|) ] = 0.

First suppose that X_n →^{p} 0. Then for any ε > 0, lim_{n→∞} P(|X_n| > ε) = 0. Note that

0 ≤ |X_n| / (1 + |X_n|) ≤ I_{{|X_n| > ε}} + ( |X_n| / (1 + |X_n|) ) I_{{|X_n| ≤ ε}} ≤ I_{{|X_n| > ε}} + ε.

Therefore,

0 ≤ E[ |X_n| / (1 + |X_n|) ] ≤ E[I_{{|X_n| > ε}}] + ε = P(|X_n| > ε) + ε.

Taking limits and using the assumption lim_{n→∞} P(|X_n| > ε) = 0 yields

0 ≤ lim_{n→∞} E[ |X_n| / (1 + |X_n|) ] ≤ ε;
since ε is arbitrary, we have lim_{n→∞} E[ |X_n| / (1 + |X_n|) ] = 0.

Next suppose lim_{n→∞} E[ |X_n| / (1 + |X_n|) ] = 0. The function f(x) = x/(1 + x) is strictly increasing. Therefore

f(ε) I_{{|X_n| > ε}} ≤ f(|X_n|) I_{{|X_n| > ε}}
⇒ (ε/(1 + ε)) I_{{|X_n| > ε}} ≤ ( |X_n| / (1 + |X_n|) ) I_{{|X_n| > ε}} ≤ |X_n| / (1 + |X_n|).

Taking expectations and then limits yields

(ε/(1 + ε)) lim_{n→∞} P(|X_n| > ε) ≤ lim_{n→∞} E[ |X_n| / (1 + |X_n|) ] = 0.

Since ε > 0 is fixed, we conclude lim_{n→∞} P(|X_n| > ε) = 0.


Remark In the proof of Theorem 5.1.2, we use f(x) = x/(1 + x). However, we can generalize f to any function on [0, ∞) which is bounded, nondecreasing, continuous, and with f(0) = 0 and f(x) > 0 when x > 0. So, by taking f(x) = min{|x|, 1}, we have

X_n →^{p} X if and only if lim_{n→∞} E[min{|X_n − X|, 1}] = 0.

When no limit is specified, the next theorem is useful for checking whether a sequence of r.v.'s converges almost surely or not.

Theorem 5.1.3. (Cauchy Criterion for a.s. convergence) X_n converges a.s. if and only if

lim_{n→∞} P( |X_m − X_{m'}| > ε for some m > m' ≥ n ) = 0, for any ε > 0,

or equivalently,

lim_{M→∞} P( sup_{m,n≥M} |X_m − X_n| > ε ) = 0, for any ε > 0.

Proof. The proof of this theorem is not required.

5.2 Relationship between types of convergences

Theorem 5.2.1. Let {X_n, n ≥ 1} be a sequence of random variables.

(a) If X_n →^{L^r} X for some 1 ≤ r < ∞, then X_n →^{p} X.

(b) If X_n →^{a.s.} X, then X_n →^{p} X.

Proof.

(a) By the Chebyshev inequality (Theorem 4.5.1), we have, for any ε > 0,

0 ≤ P(|X_n − X| ≥ ε) ≤ E[|X_n − X|^r] / ε^r → 0, as n → ∞.

Thus, X_n →^{p} X.

(b) Since |X_n − X|/(1 + |X_n − X|) →^{a.s.} 0 and |X_n − X|/(1 + |X_n − X|) ≤ 1 always, by the Dominated Convergence Theorem (see Theorem 4.2.2 (iii)), we have

lim_{n→∞} E[ |X_n − X| / (1 + |X_n − X|) ] = E[ lim_{n→∞} |X_n − X| / (1 + |X_n − X|) ] = E[0] = 0.

The proof then follows from Theorem 5.1.2.

Remark The following examples show that the converses in Theorem 5.2.1 may not hold.

(i) (X_n →^{p} X ⇏ X_n →^{L^r} X)

Let P(X_n = 0) = 1 − n^{−1} and P(X_n = n) = n^{−1}. Consider E[|X_n|/(1 + |X_n|)]:

E[ |X_n| / (1 + |X_n|) ] = (0/(1 + 0)) × (1 − 1/n) + (n/(1 + n)) × (1/n)
                         = 1/(n + 1) → 0 as n → ∞.

From Theorem 5.1.2, we have X_n →^{p} 0. However,

lim_{n→∞} E[|X_n|] = lim_{n→∞} ( 0 × P(X_n = 0) + n × P(X_n = n) ) = 1 ≠ 0.

This case shows that X_n →^{p} 0 ⇏ X_n →^{L^1} 0.
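The two quantities in example (i) can be computed exactly (the helper names are ours): the bounded metric E[min{|X_n|, 1}] from the Remark after Theorem 5.1.2 equals 1/n and vanishes, while E|X_n| stays at 1 for every n.

```python
# For P(X_n = 0) = 1 - 1/n and P(X_n = n) = 1/n:
def bounded_metric(n):
    """E[min{|X_n|, 1}] = (1 - 1/n)*0 + (1/n)*min(n, 1) = 1/n."""
    return (1 / n) * min(n, 1)

def l1_distance(n):
    """E|X_n - 0| = (1 - 1/n)*0 + (1/n)*n = 1."""
    return (1 / n) * n

ns = (10, 100, 1000)
metrics = [bounded_metric(n) for n in ns]   # decreasing to 0
l1 = [l1_distance(n) for n in ns]           # constant at 1
```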

(ii) (X_n →^{p} X ⇏ X_n →^{a.s.} X)

Let P(X_n = 0) = 1 − n^{−1} and P(X_n = 1) = n^{−1}, with the X_n's independent. Since P(|X_n| > ε) ≤ 1/n → 0, we have X_n →^{p} 0. For any 0 < ε < 1, we have

P( ∩_{m=n}^∞ {|X_m − 0| < ε} ) = P( lim_{k→∞} ∩_{m=n}^k {|X_m| < ε} ) = lim_{k→∞} P( ∩_{m=n}^k {|X_m| < ε} )
= lim_{k→∞} ∏_{m=n}^k P(|X_m| < ε) = lim_{k→∞} ∏_{m=n}^k (1 − m^{−1})
= lim_{k→∞} ((n − 1)/n)(n/(n + 1)) · · · ((k − 1)/k) = lim_{k→∞} (n − 1)/k = 0.

By Theorem 5.1.1, we see that X_n does not converge to 0 a.s.

(iii) ("a.s. convergence" and "L^r convergence" do not imply each other)

(a) (X_n →^{a.s.} X ⇏ X_n →^{L^r} X)

Let P(X_n = 0) = 1 − n^{−2} and P(X_n = n^3) = n^{−2}. For any ε > 0, consider

Σ_{n=1}^∞ P(|X_n − 0| ≥ ε) = Σ_{n=1}^∞ n^{−2} < ∞
⇒ P( ∪_{m=n}^∞ {|X_m − 0| ≥ ε} ) ≤ Σ_{m=n}^∞ P(|X_m − 0| ≥ ε) → 0 as n → ∞.

So,

lim_{n→∞} P( ∩_{m=n}^∞ {|X_m − 0| < ε} ) = lim_{n→∞} ( 1 − P( ∪_{m=n}^∞ {|X_m − 0| ≥ ε} ) ) = 1.

By Theorem 5.1.1, we have X_n →^{a.s.} 0. Since E|X_n − 0| = n → ∞ as n → ∞, X_n does not converge to 0 in L^1.
(b) (X_n →^{L^r} X ⇏ X_n →^{a.s.} X)

Let P(X_n = 0) = 1 − n^{−1} and P(X_n = 1) = n^{−1}, with the X_n's independent. Then E|X_n − 0| = 1/n → 0 as n → ∞, so X_n →^{L^1} 0. From (ii), X_n does not converge to 0 a.s.

5.3 Partial converses

Although the converses of (a) and (b) in Theorem 5.2.1 may not be true, we have partial converses of them. The proofs of the following two partial converses are not required in this course.

Theorem 5.3.1. Suppose X_n →^{p} X. Then there exists a subsequence n_k such that lim_{k→∞} X_{n_k} = X almost surely.

Theorem 5.3.2. If X_n →^{p} X, |X_n| ≤ Y a.s. for all n, and E[Y^r] < ∞ for some r > 0, then X_n →^{L^r} X, which in turn implies that lim_{n→∞} E[X_n^r] = E[X^r].

5.4 Closed operations of convergences

Theorem 5.4.1.

(a) If X_n →^{mode} X and Y_n →^{mode} Y , then X_n ± Y_n →^{mode} X ± Y . Here, mode ∈ {p, L^r, a.s.}.

(b) Let X_1, X_2, · · · and X be k-dimensional random vectors, and let g : R^k → R be continuous. Then

X_n →^{mode} X ⇒ g(X_n) →^{mode} g(X).

Here, mode ∈ {p, a.s.}.

Proof. The proof of this theorem is omitted.


Chapter 6

The Law of Large Numbers


Let X, X1 , X2 , · · · be r.v.’s on (Ω, A, P ).

6.1 Strong Law of Large Numbers


Theorem 6.1.1 (Strong Law of Large Numbers). Let X_1, X_2, · · · be independent and identically distributed (i.i.d.) r.v.'s with finite mean and finite variance. Let S_n = Σ_{i=1}^n X_i. Then

S_n / n →^{mode} E[X_1], mode ∈ {L^2, a.s.}. (6.1)
Proof.
L^2 convergence.
Noting that

E[S_n / n] = (1/n) Σ_{i=1}^n E[X_i] = E[X_1] (since the X_i are identically distributed),

consider

E[ (S_n/n − E[X_1])^2 ] = E[ (S_n/n − E[S_n/n])^2 ]
                        = Var(S_n/n)
                        = (1/n^2) Σ_{i=1}^n Var(X_i) (since the X_i are independent)
                        = Var(X_1)/n (since the X_i are identically distributed)
                        → 0 as n → ∞ (since Var(X_1) is finite).

Therefore, we have S_n/n →^{L^2} E[X_1].


Almost sure convergence.

Case I: Nonnegative r.v.'s.
Suppose X_i ≥ 0 for i = 1, 2, · · · . Define Y_k = X_k I_{{X_k ≤ k}} and their partial sums S_n^* = Σ_{k=1}^n Y_k. For a fixed α > 1, let u_n = ⌊α^n⌋ (the integer part of α^n). First, we want to show

Σ_{n=1}^∞ P( |S_{u_n}^* − E[S_{u_n}^*]| / u_n > ε ) < ∞. (6.2)

Since the X_n are i.i.d.,

Var[S_n^*] = Σ_{k=1}^n Var[Y_k] ≤ Σ_{k=1}^n E[Y_k^2] = Σ_{k=1}^n E[X_1^2 I_{{X_1 ≤ k}}] ≤ n E[X_1^2 I_{{X_1 ≤ n}}].

It follows by the Chebyshev inequality (Theorem 4.5.1) that

Σ_{n=1}^∞ P( |S_{u_n}^* − E[S_{u_n}^*]| / u_n > ε ) ≤ Σ_{n=1}^∞ Var[S_{u_n}^*] / (ε^2 u_n^2)
                                                   ≤ (1/ε^2) E[ X_1^2 Σ_{n=1}^∞ (1/u_n) I_{{X_1 ≤ u_n}} ].

Let K = 2α/(α − 1), and suppose x > 0. Let N = min{n : u_n ≥ x}. Then α^N ≥ x, and since y ≤ 2⌊y⌋ for y ≥ 1,

Σ_{u_n ≥ x} u_n^{−1} ≤ 2 Σ_{n ≥ N} α^{−n} = K α^{−N} ≤ K x^{−1}.

Therefore Σ_{n=1}^∞ u_n^{−1} I_{{X_1 ≤ u_n}} ≤ K X_1^{−1} for X_1 > 0, and the sum in (6.2) is at most K ε^{−2} E[X_1] < ∞.

Let A_n = { |S_{u_n}^* − E[S_{u_n}^*]| / u_n > ε } := { ω ∈ Ω : |S_{u_n}^*(ω) − E[S_{u_n}^*]| / u_n > ε }.

By the Borel–Cantelli lemma (Theorem 3.10.1 (a)), we have

P(lim sup_n A_n) = 0
⇒ P( ∩_{n=1}^∞ ∪_{k=n}^∞ A_k ) = 0
⇒ P( ∪_{n=1}^∞ ∩_{k=n}^∞ A_k^c ) = 1
⇒ P( lim_{n→∞} ∩_{k=n}^∞ { |S_{u_k}^* − E[S_{u_k}^*]| / u_k ≤ ε } ) = 1
⇒ lim_{n→∞} P( ∩_{k=n}^∞ { |S_{u_k}^* − E[S_{u_k}^*]| / u_k ≤ ε } ) = 1.

By Theorem 5.1.1, we have

(S_{u_n}^* − E[S_{u_n}^*]) / u_n →^{a.s.} 0.

Because of Cesàro averages¹, we have

S_{u_n}^* / u_n →^{a.s.} E[X_1].
Since

Σ_{n=1}^∞ P(X_n ≠ Y_n) = Σ_{n=1}^∞ P(X_1 > n)
                       ≤ ∫_0^∞ P(X_1 > t) dt
                       = ∫_0^∞ (1 − F_{X_1}(t)) dt (where F_{X_1} is the d.f. of X_1)
                       = ∫_0^∞ t dF_{X_1}(t) = E[X_1] < ∞,

we can apply the Borel–Cantelli lemma (Theorem 3.10.1 (a)) again with A_n = {X_n ≠ Y_n} := {ω ∈ Ω : X_n(ω) ≠ Y_n(ω)}:

P({X_n ≠ Y_n} i.o.) = 0
⇒ P({X_n = Y_n} ultimately) = 1 − P({X_n ≠ Y_n} i.o.) = 1.

Thus, there exists a set C ∈ A with the following properties:

¹ Cesàro averaging states that if x_n → x, then n^{−1} Σ_{k=1}^n x_k → x.

(i) P(C) = 1, and

(ii) ∀ω ∈ C, ∃N(ω) ≥ 1 such that X_n(ω) = Y_n(ω) for all n ≥ N(ω).

With the above properties, we have, for all ω ∈ C and n ≥ N(ω),

(S_n^*(ω) − S_n(ω))/n = (1/n) Σ_{k=1}^{N(ω)−1} (Y_k(ω) − X_k(ω)) → 0 as n → ∞
⇒ ∀ω ∈ C, lim_{n→∞} S_{u_n}(ω)/u_n = lim_{n→∞} S_{u_n}^*(ω)/u_n = E[X_1]
⇒ S_{u_n}/u_n →^{a.s.} E[X_1] as n → ∞. (6.3)
If u_n ≤ k ≤ u_{n+1}, then since the X_i ≥ 0,

S_{u_n} ≤ S_k ≤ S_{u_{n+1}}
⇒ S_{u_n}/u_{n+1} ≤ S_k/k ≤ S_{u_{n+1}}/u_n
⇒ (u_n/u_{n+1}) (S_{u_n}/u_n) ≤ S_k/k ≤ (u_{n+1}/u_n) (S_{u_{n+1}}/u_{n+1}).

But u_{n+1}/u_n → α, and so it follows from (6.3) that

(1/α) E[X_1] ≤ lim inf_{k→∞} S_k/k ≤ lim sup_{k→∞} S_k/k ≤ α E[X_1]

with probability 1. Now, letting α ↓ 1, we have

S_k/k →^{a.s.} E[X_1].
Case II: General r.v.'s.
For the general case, we can write X_n = X_n^+ − X_n^−. Then

S_n/n = ( Σ_{k=1}^n X_k^+ )/n − ( Σ_{k=1}^n X_k^− )/n.

Applying Case I to ( Σ_{k=1}^n X_k^+ )/n and ( Σ_{k=1}^n X_k^− )/n, we have

S_n/n →^{a.s.} E[X_1^+] − E[X_1^−] = E[X_1].

Example
Let {X_j, j ≥ 1} be a sequence of i.i.d. Bernoulli r.v.'s with P(X_j = 1) = p and P(X_j = 0) = 1 − p. Then E[X_1] = E[X_2] = · · · = p. From the Strong Law of Large Numbers, we have

S_n/n →^{a.s.} E[X_1] = p.

The result shows that the fraction of successes in n trials converges almost surely to the probability of success of each trial in the long run.
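This long-run behaviour is visible on a single simulated sample path. A minimal sketch (our own simulation, not from the notes), tracking S_n/n at a few checkpoints for Bernoulli(p) trials with p = 0.3:

```python
import random

random.seed(2)
# One sample path of S_n/n for i.i.d. Bernoulli(0.3) trials.
p, N = 0.3, 200_000
s, path = 0, []
for n in range(1, N + 1):
    s += random.random() < p           # add one Bernoulli(p) trial
    if n in (100, 10_000, N):
        path.append(s / n)             # running fraction of successes
```

The recorded fractions settle near p = 0.3 as n grows, as the SLLN predicts for this path.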

6.2 Weak Law of Large Numbers

Since convergence almost surely (with probability 1) implies convergence in probability, it follows under the hypotheses of the Strong Law of Large Numbers (Theorem 6.1.1) that

S_n/n →^{p} E[X_1].

This is the weak law of large numbers.

The weak law of large numbers can also be proved by using the Chebyshev inequality (Theorem 4.5.1) as follows:

P( |S_n/n − E[X_1]| ≥ ε ) ≤ Var[S_n] / (n^2 ε^2) = Var(X_1) / (n ε^2) → 0 as n → ∞.

Therefore, we have S_n/n →^{p} E[X_1].
Chapter 7

The Central Limit Theorem


7.1 Characteristic functions

Definition Let X be a r.v. on (Ω, A, P ). The characteristic function of X is defined for t ∈ R by

ϕ(t) = E[e^{itX}] = E[cos(tX) + i sin(tX)] = E[cos(tX)] + iE[sin(tX)], (7.1)

where i = √(−1).

Remark

(a) Let P_X = P · X^{−1} and F_X(x) = P(X ≤ x) be the distribution and distribution function of X respectively; then ϕ(t) in (7.1) can be written as

ϕ(t) = ∫_R e^{itx} P_X(dx) = ∫_R e^{itx} dF_X(x). (7.2)

(b) If X is absolutely continuous and has density function f(x), then ϕ(t) in (7.1) can be written as

ϕ(t) = ∫_R e^{itx} f(x) dx. (7.3)

In this case, ϕ(t) in (7.3) is known as the Fourier transform of f(x).

(c) By extending the domain of ϕ(t) to the complex plane, i.e., allowing t to be a complex number, the moment generating function M(t) of X can be obtained as

M(t) = ϕ(−it) = E[e^{tX}].

7.1.1 Properties of characteristic functions

Theorem 7.1.1.

(i) ϕ(0) = 1.

(ii) ϕ(−t) = \overline{ϕ(t)}, where z̄ is the complex conjugate of a complex number z, i.e., z̄ = x − iy when z = x + iy.


(iii) If X_1 and X_2 are independent with characteristic functions ϕ_1(t) and ϕ_2(t) respectively, then X_1 + X_2 has characteristic function ϕ_1(t)ϕ_2(t).

(iv) ϕ(t) is bounded by 1, i.e., |ϕ(t)| ≤ 1 for all t ∈ R.

(v) ϕ(t) is uniformly continuous on (−∞, ∞).
Proof.

(i) ϕ(0) = E[e^{i·0·X}] = E[e^0] = E[1] = 1.

(ii)
ϕ(−t) = E[e^{−itX}]
      = E[cos(−tX) + i sin(−tX)]
      = E[cos(tX) − i sin(tX)] (since cos(−x) = cos(x) and sin(−x) = −sin(x))
      = E[cos(tX)] − iE[sin(tX)] = \overline{ϕ(t)}.

(iii) Let ϕ_{X_1+X_2}(t) be the characteristic function of X_1 + X_2. Then

ϕ_{X_1+X_2}(t) = E[e^{it(X_1+X_2)}] = E[e^{itX_1} e^{itX_2}]
              = E[e^{itX_1}] E[e^{itX_2}] (since X_1 and X_2 are independent)
              = ϕ_1(t) ϕ_2(t).

(iv)
|ϕ(t)| = |E[e^{itX}]|
       ≤ E[|e^{itX}|] (by Theorem 4.2.1 (iii))
       = E[1] = 1.

(v) For any h > 0, we have

|ϕ(t + h) − ϕ(t)| = |E[e^{i(t+h)X}] − E[e^{itX}]|
                  ≤ E[|e^{i(t+h)X} − e^{itX}|]
                  = E[|e^{itX}| |e^{ihX} − 1|] = E[|e^{ihX} − 1|].

Because |e^{ihX} − 1| →^{a.s.} 0 as h → 0 and |e^{ihX} − 1| ≤ |e^{ihX}| + 1 = 2, by the Dominated Convergence Theorem (Theorem 4.2.2 (iii)),

|ϕ(t + h) − ϕ(t)| ≤ E[|e^{ihX} − 1|] → 0, as h → 0.

Hence, ϕ(t) is uniformly continuous on (−∞, ∞).



7.1.2 Moments and derivatives

Lemma 7.1.1. For any real x,

| e^{ix} − Σ_{k=0}^n (ix)^k / k! | ≤ min{ |x|^{n+1}/(n + 1)!, 2|x|^n/n! }. (7.4)

Proof. Integration by parts gives

∫_0^x (x − s)^n e^{is} ds = x^{n+1}/(n + 1) + (i/(n + 1)) ∫_0^x (x − s)^{n+1} e^{is} ds. (7.5)

When n = 0,

∫_0^x e^{is} ds = x + i ∫_0^x (x − s) e^{is} ds,
so that e^{ix} = 1 + ix + i^2 ∫_0^x (x − s) e^{is} ds.

When n = 1,

∫_0^x (x − s) e^{is} ds = x^2/2 + (i/2) ∫_0^x (x − s)^2 e^{is} ds.

It follows by induction that

e^{ix} = Σ_{k=0}^n (ix)^k / k! + (i^{n+1}/n!) ∫_0^x (x − s)^n e^{is} ds, (7.6)

for n ≥ 0. Replace n by n − 1 in (7.5), solve for the integral on the right, and substitute this for the integral in (7.6); this gives

e^{ix} = Σ_{k=0}^n (ix)^k / k! + (i^n/(n − 1)!) ∫_0^x (x − s)^{n−1} (e^{is} − 1) ds. (7.7)

By considering the integral in (7.6), since |e^{is}| ≤ 1 for all s, it follows that

| (i^{n+1}/n!) ∫_0^x (x − s)^n e^{is} ds | ≤ (1/n!) | ∫_0^x (x − s)^n ds | = |x|^{n+1}/(n + 1)!. (7.8)

For the integral in (7.7), since |e^{is} − 1| ≤ 2 for all s, it follows that

| (i^n/(n − 1)!) ∫_0^x (x − s)^{n−1} (e^{is} − 1) ds | ≤ (2/(n − 1)!) | ∫_0^x (x − s)^{n−1} ds | = 2|x|^n/n!. (7.9)

Combining (7.8) and (7.9), we have

| e^{ix} − Σ_{k=0}^n (ix)^k / k! | ≤ min{ |x|^{n+1}/(n + 1)!, 2|x|^n/n! }, (7.10)

for n ≥ 0.

Remark Let

R(x) = ( |x|^{n+1}/(n + 1)! ) / ( 2|x|^n/n! ) = |x| / (2(n + 1)).

If n is fixed, then

(i) R(x) > 1 when x is large. So, when x is large,

| e^{ix} − Σ_{k=0}^n (ix)^k / k! | ≤ 2|x|^n/n!.

(ii) On the other hand, R(x) < 1 when x is small. So, when x is small,

| e^{ix} − Σ_{k=0}^n (ix)^k / k! | ≤ |x|^{n+1}/(n + 1)!.

Theorem 7.1.2. If X has a moment of order n (i.e., E[|X|^n] < ∞), then

(i)

| ϕ(t) − Σ_{k=0}^n (it)^k E[X^k] / k! | ≤ E[ min{ |tX|^{n+1}/(n + 1)!, 2|tX|^n/n! } ]. (7.11)

In particular, if E[X^2] < ∞, then

ϕ(t) = 1 + itE[X] − (1/2) t^2 E[X^2] + o(t^2), as t → 0.

[Recall: a function g is o(u) if lim_{u→0} g(u)/u = 0.]

(ii)

d^n ϕ(t)/dt^n = ϕ^{(n)}(t) = i^n E[X^n e^{itX}].

In particular, ϕ^{(n)}(0) = i^n E[X^n].

Proof.

(i) Replacing x with tX and taking expectations on both sides of (7.4), we have

E| e^{itX} − Σ_{k=0}^n (itX)^k / k! | ≤ E[ min{ |tX|^{n+1}/(n + 1)!, 2|tX|^n/n! } ].

Using the fact that |E[Y ]| ≤ E[|Y |] for any r.v. Y , (7.11) can then be proved. When n = 2, the RHS of (7.11) is given by

E[ min{ |tX|^3/3!, 2|tX|^2/2! } ] = (t^2/6) E[ min{ |t||X|^3, 6|X|^2 } ].

Noting that

(a) min{|t||X|^3, 6|X|^2} →^{a.s.} 0 as t → 0, and
(b) min{|t||X|^3, 6|X|^2} ≤ 6|X|^2,

with (a), (b) and E[|X|^2] < ∞, by the Dominated Convergence Theorem (Theorem 4.2.2 (iii)), we have

E[ min{ |tX|^3/3!, 2|tX|^2/2! } ] = o(t^2).

Therefore, we have

ϕ(t) = 1 + itE[X] − (1/2) t^2 E[X^2] + o(t^2), as t → 0.

(ii) When n = 1 and E[|X|] < ∞, for any h > 0, we have

(ϕ(t + h) − ϕ(t))/h − E[iXe^{itX}] = E[ e^{itX} (e^{ihX} − 1 − ihX)/h ]. (7.12)

Consider

| e^{itX} (e^{ihX} − 1 − ihX)/h | = (1/h) | e^{ihX} − 1 − ihX |
                                 ≤ (1/h) min{ (hX)^2/2, 2|hX| } (from Lemma 7.1.1)
                                 = min{ hX^2/2, 2|X| }.

Noting that

(a) min{hX^2/2, 2|X|} →^{a.s.} 0 as h → 0, and
(b) min{hX^2/2, 2|X|} ≤ 2|X|,

with (a), (b) and E[|X|] < ∞, by the Dominated Convergence Theorem (Theorem 4.2.2 (iii)) we can show that

lim_{h→0} E[ e^{itX} (e^{ihX} − 1 − ihX)/h ] = 0.

Hence, from (7.12),

ϕ'(t) = E[iXe^{itX}] and ϕ'(0) = iE[X].

Repeating this argument inductively gives, provided E[|X|^n] < ∞,

ϕ^{(n)}(t) = i^n E[X^n e^{itX}] and ϕ^{(n)}(0) = i^n E[X^n].

7.1.3 Correspondence between characteristic functions and distributions

A characteristic function ϕ(t) uniquely determines the distribution P_X of the random variable X. This fundamental fact will be derived by means of the following inversion formula, through which P_X can in principle be recovered from ϕ.

Theorem 7.1.3. (Inversion Theorem) Let ϕ(t) = ∫_R e^{itx} P_X(dx) be the characteristic function of the r.v. X with distribution P_X. If a < b, then

lim_{T→∞} (1/2π) ∫_{−T}^{T} ( (e^{−ita} − e^{−itb}) / (it) ) ϕ(t) dt = P_X((a, b)) + (1/2) P_X({a, b}).

Proof. The proof of this theorem is not required.

Example

(a) Poisson distribution:

The probability density function is given by

P(X = k) = p_k = e^{−λ} λ^k / k!, k = 0, 1, 2, · · · .

Therefore

ϕ(t) = E[e^{itX}] = Σ_{k=0}^∞ e^{itk} e^{−λ} λ^k / k!
     = e^{−λ} Σ_{k=0}^∞ (e^{it} λ)^k / k!
     = e^{−λ} e^{λ e^{it}} = e^{λ(e^{it} − 1)}.
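The closed form can be checked directly against the defining series with complex arithmetic. A small numeric sketch (our own verification, with arbitrary λ and t):

```python
import cmath
import math

# Compare φ(t) = exp(λ(e^{it} − 1)) with the series Σ_k e^{itk} e^{−λ} λ^k / k!.
lam, t = 2.0, 1.3
series = sum(cmath.exp(1j * t * k) * math.exp(-lam) * lam ** k / math.factorial(k)
             for k in range(60))          # truncated; λ^60/60! is negligible
closed_form = cmath.exp(lam * (cmath.exp(1j * t) - 1))
```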

(b) Other distributions:

   Distribution    | Probability density function | Interval      | ϕ(t)
1. Normal          | (1/√(2π)) e^{−x²/2}          | −∞ < x < ∞    | e^{−t²/2}
2. Uniform         | 1                            | 0 < x < 1     | (e^{it} − 1)/(it)
3. Exponential     | e^{−x}                       | 0 < x < ∞     | 1/(1 − it)
4. Cauchy          | (1/π) · 1/(1 + x²)           | −∞ < x < ∞    | e^{−|t|}

Remark Since e^{itX} ∈ L^1, characteristic functions always exist. However, moment generating functions may not exist. For example, the moment generating function of the Cauchy distribution does not exist.
Theorem 7.1.4. If ∫_{−∞}^∞ |ϕ(t)| dt < ∞, then P_X has a bounded continuous density

f(y) = (1/2π) ∫_{−∞}^∞ e^{−ity} ϕ(t) dt. (7.13)

Proof. First, note that

| (e^{−ita} − e^{−itb}) / (it) | = | ∫_a^b e^{−ity} dy | ≤ |b − a|. (7.14)

From the Inversion Theorem (Theorem 7.1.3),

0 ≤ P_X((a, b)) + (1/2) P_X({a, b})
  = (1/2π) ∫_{−∞}^∞ ( (e^{−ita} − e^{−itb}) / (it) ) ϕ(t) dt
  ≤ (1/2π) ∫_{−∞}^∞ | (e^{−ita} − e^{−itb}) / (it) | |ϕ(t)| dt
  ≤ ((b − a)/2π) ∫_{−∞}^∞ |ϕ(t)| dt < ∞.

Let b → a. Then P_X({a}) = 0.


Since a is arbitrary, P_X({x}) = 0 for all x ∈ R. Let F_X(x) be the distribution function of X. For any h > 0,

(F_X(x + h) − F_X(x)) / h
= ( P_X((−∞, x + h]) − P_X((−∞, x]) ) / h
= P_X((x, x + h]) / h
= (1/2πh) ∫_{−∞}^∞ ( (e^{−itx} − e^{−it(x+h)}) / (it) ) ϕ(t) dt (since P_X({x}) = 0 for all x ∈ R)
= (1/2πh) ∫_{−∞}^∞ ( ∫_x^{x+h} e^{−ity} dy ) ϕ(t) dt
= (1/h) ∫_x^{x+h} ( (1/2π) ∫_{−∞}^∞ e^{−ity} ϕ(t) dt ) dy
→ (1/2π) ∫_{−∞}^∞ e^{−itx} ϕ(t) dt ≡ f(x) as h → 0.

Thus

dF_X(x)/dx = f(x), or F_X(x) = ∫_{−∞}^x f(y) dy + C.

Letting x → −∞, F_X(−∞) = 0 ⇒ C = 0. Therefore, f(x) is the density function of F_X(x) or P_X.

Remark Equation (7.13) in Theorem 7.1.4 is known as the inverse Fourier transform of ϕ(t). Many mathematical software packages, such as Matlab, have built-in algorithms to evaluate it.
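A direct numerical evaluation of (7.13) is also easy to sketch. Taking the standard normal, whose characteristic function is ϕ(t) = e^{−t²/2} (entry 1 of the table above), a truncated midpoint sum should recover the normal density (the function name and truncation range are our choices):

```python
import cmath
import math

# f(y) = (1/2π) ∫ e^{−ity} φ(t) dt, approximated on [−12, 12] (the Gaussian
# tail of φ beyond that range is negligible).
def density_from_cf(y, T=12.0, n=20_000):
    h = 2 * T / n
    total = 0j
    for i in range(n):
        t = -T + (i + 0.5) * h
        total += cmath.exp(-1j * t * y) * math.exp(-t * t / 2) * h
    return (total / (2 * math.pi)).real

f0 = density_from_cf(0.0)   # should be close to 1/√(2π)
f1 = density_from_cf(1.0)   # should be close to e^{−1/2}/√(2π)
```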

Definition Let F and {F_n, n ≥ 1} be distribution functions. If, for all continuity points x of F ,

lim_{n→∞} F_n(x) = F(x),

we say that F_n converges weakly to F , denoted by F_n ⇒ F .

If P_X^{(n)} and P_X are the distributions corresponding to F_n and F , then P_X^{(n)} is said to converge weakly to P_X, denoted P_X^{(n)} ⇒ P_X, if and only if F_n ⇒ F .

Definition Let {X_n, n ≥ 1} be a sequence of random variables, F_n be the distribution function of X_n, and F be the distribution function of the r.v. X. X_n is said to converge to X in distribution, written X_n →^{d} X, if F_n ⇒ F .

Theorem 7.1.5. (Slutsky Theorem) Let C be a constant. If X_n →^{d} X and Y_n →^{a.s.} C, then

(a) X_n + Y_n →^{d} X + C.

(b) X_n Y_n →^{d} XC.

(c) X_n / Y_n →^{d} X/C, provided C ≠ 0.

Theorem 7.1.6. (Continuity Theorem) Let F, F_1, F_2, · · · be distribution functions with characteristic functions ϕ, ϕ_1, ϕ_2, · · · .

(i) If F_n ⇒ F , then ϕ_n(t) → ϕ(t) for all t.

(ii) If ϕ_n(t) → ϕ(t) for all t, and ϕ(t) is continuous at 0, then F_n ⇒ F .

Proof. The proof of this theorem is not required.



7.2 The Central Limit Theorem

In this section, we use N(0, 1) to represent the standard normal distribution.

Theorem 7.2.1. (Central Limit Theorem) Let X_1, X_2, · · · be i.i.d. with mean µ and finite positive variance σ^2. If S_n = Σ_{i=1}^n X_i, then

(S_n − nµ) / (σ√n) ⇒ N(0, 1),

or equivalently,

(S_n − nµ) / (σ√n) →^{d} X, where X ∼ N(0, 1).
Proof. Let X_j' = X_j − µ. It is sufficient to prove that

S_n' = (1/(σ√n)) Σ_{j=1}^n X_j' ⇒ N(0, 1).

By Theorem 7.1.2,

ϕ_{X_j'}(t) = E[e^{itX_1'}] = 1 − σ^2 t^2 / 2 + o(t^2).

Then

ϕ_{S_n'}(t) = E[ e^{it(1/(σ√n)) Σ_{j=1}^n X_j'} ] = ∏_{j=1}^n E[e^{it X_j'/(σ√n)}] (by independence)
            = ∏_{j=1}^n ϕ_{X_j'}( t/(σ√n) )
            = ( ϕ_{X_1'}( t/(σ√n) ) )^n
            = ( 1 − σ^2 t^2/(2σ^2 n) + o( t^2/(σ^2 n) ) )^n
            = ( 1 − t^2/(2n) + o(t^2/n) )^n → ϕ(t) ≡ e^{−t^2/2} as n → ∞.

It is obvious that ϕ(t) is continuous at 0. By Theorem 7.1.6,

S_n' ⇒ N(0, 1).

Example
Let X_1, X_2, · · · be i.i.d. r.v.'s with P(X_j = 1) = p and P(X_j = 0) = q = 1 − p. Then S_n = Σ_{j=1}^n X_j follows a binomial distribution with parameters n (number of trials) and p (probability of success), denoted S_n ∼ B(n, p).

E[X_i] = p and Var(X_i) = pq. By the Central Limit Theorem (Theorem 7.2.1), we have

(S_n − np)/√(npq) →^{d} X, where X ∼ N(0, 1),
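The approximation can be observed by simulation. A minimal sketch (our own experiment, not from the notes): standardize B(400, 0.5) counts and check that about 95% of them land in [−1.96, 1.96], as the standard normal predicts:

```python
import math
import random

random.seed(3)
# Standardized binomial: Z = (S_n - np)/sqrt(npq) should be ≈ N(0, 1).
n, p, trials = 400, 0.5, 20_000

def standardized():
    s = sum(random.random() < p for _ in range(n))   # one B(n, p) draw
    return (s - n * p) / math.sqrt(n * p * (1 - p))

coverage = sum(abs(standardized()) <= 1.96 for _ in range(trials)) / trials
```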
Chapter 8

Martingales
8.1 Conditional Expectation

Given a probability space (Ω, F_0, P ), a σ-field F ⊂ F_0, and a r.v. X ∈ F_0 with E[|X|] < ∞.

Definition Suppose the random variable Y satisfies the following conditions:

(i) Y ∈ F (i.e., Y is F-measurable).

(ii) ∀A ∈ F, ∫_A X dP = ∫_A Y dP .

Then Y is called the conditional expectation of X given F, and we denote Y by E[X|F].

Lemma 8.1.1. If (i) and (ii) hold, then E[|Y |] < ∞.

Proof. Let A = {Y > 0}. Then A ∈ F and A^c ∈ F, so

∫_A Y dP = ∫_A X dP ≤ ∫_A |X| dP, (8.1)
∫_{A^c} (−Y ) dP = ∫_{A^c} (−X) dP ≤ ∫_{A^c} |X| dP. (8.2)

Adding (8.1) and (8.2) gives

∫_Ω |Y | dP = ∫_A Y dP + ∫_{A^c} (−Y ) dP ≤ ∫_Ω |X| dP.

So, we have E[|Y |] ≤ E[|X|] < ∞.

Definition Let µ and ν be measures on (Ω, F). The measure ν is said to be absolutely continuous with respect to µ if, for A ∈ F,

µ(A) = 0 ⇒ ν(A) = 0.

Notation: ν ≪ µ.


Theorem 8.1.1. (Radon–Nikodym Theorem)
Let µ and ν be σ-finite measures on (Ω, F). If ν ≪ µ, there is a function f ∈ F such that for all A ∈ F,

∫_A f dµ = ν(A).

f is usually denoted dν/dµ and called the Radon–Nikodym derivative.

Proof. The proof of the Radon–Nikodym Theorem is not required and hence omitted.
Theorem 8.1.2. (a) Uniqueness: If Y' also satisfies (i) and (ii), then Y = Y' a.s.

(b) Existence: E[X|F] always exists.

Proof. (a) If Y' satisfies (i) and (ii), then ∫_A Y dP = ∫_A Y' dP for all A ∈ F. Let A = {Y − Y' ≥ ε > 0}. We have A ∈ F and

0 = ∫_A (Y − Y') dP ≥ ε P(A).

So, P(A) = 0. Let ε = 1/n and A_n = {Y − Y' ≥ 1/n}. Then

A_n ⊂ A_{n+1} ⊂ · · · with ∪_n A_n = A_0 := {Y − Y' > 0}
⇒ P(A_0) = lim_{n→∞} P(A_n) = 0.

Similarly, we have P(Y' − Y > 0) = 0. Thus,

P(Y = Y') = 1, i.e., Y = Y' a.s.
(b) Suppose first that X ≥ 0. Let µ = P and

ν(A) = ∫_A X dP for A ∈ F.

The Dominated Convergence Theorem implies ν is a measure (check it as your own exercise), and the definition of the integral implies ν ≪ µ. The Radon–Nikodym derivative dν/dµ ∈ F and, for any A ∈ F,

∫_A X dP = ν(A) = ∫_A (dν/dµ) dP.

Taking A = Ω, we see that dν/dµ ≥ 0 is integrable, and we have shown that dν/dµ is a version of E[X|F].

To treat the general case, write X = X^+ − X^−, and let Y_1 = E[X^+|F] and Y_2 = E[X^−|F]. Now Y_1 − Y_2 ∈ F is integrable, and for all A ∈ F we have

∫_A X dP = ∫_A X^+ dP − ∫_A X^− dP
         = ∫_A Y_1 dP − ∫_A Y_2 dP
         = ∫_A (Y_1 − Y_2) dP.

This shows Y_1 − Y_2 is a version of E[X|F] and completes the proof.

8.1.1 Examples

Intuitively, we think of F as describing the information we have at our disposal: for each A ∈ F, we know whether or not A has occurred. E[X|F] is then our "best guess" of the value of X given the information we have.

Example If X ∈ F, then E[X|F] = X; i.e., if we know X, then our "best guess" is X itself. A special case of this example is X = c, where c is a constant.

Example At the other extreme from perfect information is no information. Suppose X is independent of F, i.e., for all Borel sets B and all A ∈ F,

P({X ∈ B} ∩ A) = P(X ∈ B) P(A).

We want to show that E[X|F] = E[X]; i.e., if we don't know anything about X, then the best guess is the mean E[X].

Proof. Note that E[X] ∈ F, so (i) holds. To verify (ii), observe that if A ∈ F, then since X and I_A ∈ F are independent,

∫_A X dP = E[X I_A] = E[X] E[I_A] = ∫_A E[X] dP.

Example. In this example, we relate the new definition of conditional expectation to the first one taught in an undergraduate probability course. Suppose Ω_1, Ω_2, ··· is a finite or infinite partition of Ω into disjoint sets, each of which has positive probability, and let F = σ(Ω_1, Ω_2, ···) be the σ-field generated by these sets. Then

E[X|F] = E[X I_{Ω_i}] / P(Ω_i)  on Ω_i.

Proof. Observe that E[X I_{Ω_i}]/P(Ω_i) is constant on each Ω_i, so it is measurable with respect to F. To verify (ii), since every A ∈ F is a union of some of the Ω_i, it is enough to check the equality for A = Ω_i, but this is trivial:

∫_{Ω_i} [E[X I_{Ω_i}]/P(Ω_i)] dP = E[X I_{Ω_i}] = ∫_{Ω_i} X dP.

A degenerate but important special case is F = {∅, Ω}, the trivial σ-field. In this case, E[X|F] = E[X].
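As an illustration of the partition formula (a sketch, not part of the notes: the sample space, probabilities and partition below are made up), conditional expectation with respect to a partition-generated σ-field is just a block-by-block weighted average:

```python
from fractions import Fraction as F

# Hypothetical discrete example: Omega = {0,...,5} with uniform probability,
# X(w) = w, and the partition Omega_1 = {0,1,2}, Omega_2 = {3,4,5}.
omega = range(6)
p = {w: F(1, 6) for w in omega}          # P({w})
X = {w: F(w) for w in omega}             # the random variable
partition = [{0, 1, 2}, {3, 4, 5}]       # generates the sigma-field F

def cond_exp(X, p, partition):
    """E[X|F] as a function on Omega: E[X I_{Omega_i}] / P(Omega_i) on Omega_i."""
    Y = {}
    for block in partition:
        prob = sum(p[w] for w in block)          # P(Omega_i)
        num = sum(X[w] * p[w] for w in block)    # E[X I_{Omega_i}]
        for w in block:
            Y[w] = num / prob
    return Y

Y = cond_exp(X, p, partition)
# On {0,1,2} the value is (0+1+2)/3 = 1; on {3,4,5} it is (3+4+5)/3 = 4.
print(Y[0], Y[3])   # 1 4
```

A quick sanity check is E[E(X|F)] = E[X]: here both sides equal 5/2.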

To continue the connection with undergraduate notions, let

P(A|G) = E[I_A|G],

to be compared with the elementary definition

P(A|B) = P(A ∩ B)/P(B),

and observe that in the last example P(A|F) = P(A|Ω_i) on Ω_i.

Example. Let G ∈ G. Then

P(G|A) = ∫_G P(A|G) dP / ∫_Ω P(A|G) dP.

Proof. By the definition of conditional expectation, for all B ∈ G,

∫_B P(A|G) dP = ∫_B E[I_A|G] dP = ∫_B I_A dP = ∫_Ω I_{A∩B} dP = P(A ∩ B).

Let B = G:

∫_G P(A|G) dP = P(A ∩ G).

Let B = Ω:

∫_Ω P(A|G) dP = P(A).

Therefore, we have

P(G|A) = P(A ∩ G)/P(A) = ∫_G P(A|G) dP / ∫_Ω P(A|G) dP.

Let Ω = ∪_{i=1}^∞ G_i, where the G_i are disjoint, and let G = σ(G_1, G_2, G_3, ···). We can deduce the Bayes formula as follows:

P(A|G) = E[I_A|G] = E[I_{A∩G_i}]/P(G_i) = P(A ∩ G_i)/P(G_i)  on G_i,

so that

P(G_i|A) = ∫_{G_i} P(A|G) dP / ∫_Ω P(A|G) dP
         = P(A ∩ G_i) / Σ_{j=1}^∞ ∫_{G_j} P(A|G) dP
         = P(A ∩ G_i) / Σ_{j=1}^∞ P(A ∩ G_j)    (Bayes formula)
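Numerically, the Bayes formula is just "joint over sum of joints". A small Python sketch (the priors and likelihoods are invented purely for illustration):

```python
from fractions import Fraction as F

# Hypothetical three-part partition G1, G2, G3 with priors P(G_i) and
# likelihoods P(A|G_i); the numbers are made up.
prior = [F(1, 2), F(1, 3), F(1, 6)]      # P(G_i), summing to 1
lik   = [F(1, 10), F(3, 10), F(6, 10)]   # P(A|G_i)

joint = [p * l for p, l in zip(prior, lik)]    # P(A ∩ G_i) = P(A|G_i) P(G_i)
posterior = [j / sum(joint) for j in joint]    # Bayes: P(G_i|A)

print(posterior)   # [Fraction(1, 5), Fraction(2, 5), Fraction(2, 5)]
```

The posterior probabilities automatically sum to 1, since the denominator is P(A) = Σ_j P(A ∩ G_j).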

Definition. The conditional expectation of the r.v. X with respect to the r.v. Y is defined as

E[X|Y] = E[X|σ(Y)],

where σ(Y) is the σ-field generated by Y.

Example. Let f(x, y) be the joint density of X and Y, i.e., P((X, Y) ∈ B) = ∫_B f(x, y) dx dy for all B ∈ R². Suppose that ∫ f(x, y) dx > 0 for all y. We want to show that, if E[|g(X)|] < ∞, then

E[g(X)|Y] = ∫ g(x) f(x, Y) dx / ∫ f(x, Y) dx.
Proof. Let h(Y) = ∫ g(x) f(x, Y) dx / ∫ f(x, Y) dx. It is clear that h(Y) ∈ σ(Y), so (i) holds. To check (ii), if A ∈ σ(Y), then

A = {ω : Y(ω) ∈ B}  for some B ∈ R.

∫_A h(Y) dP = ∫_Ω h(Y) I_{(Y∈B)} dP = E[h(Y) I_{(Y∈B)}]
  = ∫_{−∞}^{∞} dx ∫_{−∞}^{∞} h(y) I_{(y∈B)} f(x, y) dy
  = ∫_B dy ∫_{−∞}^{∞} h(y) f(x, y) dx
  = ∫_B dy · [∫_{−∞}^{∞} g(x) f(x, y) dx / ∫_{−∞}^{∞} f(x, y) dx] · ∫_{−∞}^{∞} f(x, y) dx
  = ∫_B dy ∫_{−∞}^{∞} g(x) f(x, y) dx
  = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) f(x, y) I_{(y∈B)} dx dy
  = E[g(X) I_B(Y)] = ∫_A g(X) dP.

Therefore,

E[g(X)|Y] = h(Y) = ∫ g(x) f(x, Y) dx / ∫ f(x, Y) dx.

Remark. In particular, when g(x) = x,

E[X|Y] = ∫ x f(x, Y) dx / ∫ f(x, Y) dx.

When g(x) = I_{x∈A},

P(X ∈ A|Y) = E[I_{X∈A}|Y]
           = ∫ I_{x∈A} f(x, Y) dx / ∫ f(x, Y) dx
           = ∫_A [f(x, Y) / ∫ f(x, Y) dx] dx.

The conditional density function f(x|y) is defined by

f(x|y) = f(x, y) / ∫ f(x, y) dx.
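The ratio-of-integrals formula can be checked numerically. In the sketch below (not from the notes) we take the joint density f(x, y) = x + y on [0,1]², for which E[X|Y = y] = (1/3 + y/2)/(1/2 + y) in closed form, and approximate both integrals by a midpoint rule:

```python
# Midpoint-rule check of E[X|Y=y] = ∫ x f(x,y) dx / ∫ f(x,y) dx
# for the joint density f(x, y) = x + y on [0,1]^2 (an assumed example).

def f(x, y):
    return x + y

def cond_mean(y, n=10_000):
    """Approximate the ratio of integrals over x in [0,1] at a fixed y."""
    xs = [(i + 0.5) / n for i in range(n)]     # midpoint grid on [0,1]
    num = sum(x * f(x, y) for x in xs) / n     # ≈ ∫ x f(x, y) dx
    den = sum(f(x, y) for x in xs) / n         # ≈ ∫ f(x, y) dx
    return num / den

y = 0.25
exact = (1/3 + y/2) / (1/2 + y)
assert abs(cond_mean(y) - exact) < 1e-6
```

The midpoint rule has error O(1/n²) for this smooth integrand, so the agreement at n = 10,000 is far better than the tolerance used.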

Example. Suppose X and Y are independent, with distributions µ and ν respectively. Let ϕ be a function with E[|ϕ(X, Y)|] < ∞ and let g(x) = E[ϕ(x, Y)]. Then

E[ϕ(X, Y)|X] = g(X).

Proof. It is clear that g(X) ∈ σ(X). To check (ii), note that if A ∈ σ(X), then there exists C ∈ R so that A = {X ∈ C}. By independence and Fubini's theorem,

∫_A ϕ(X, Y) dP = ∫_{X∈C} ϕ(X, Y) dP = E[ϕ(X, Y) I_{X∈C}]
  = ∫∫ ϕ(x, y) I_{x∈C} ν(dy) µ(dx)
  = ∫ I_{x∈C} [∫ ϕ(x, y) ν(dy)] µ(dx)
  = ∫ I_{x∈C} g(x) µ(dx)
  = ∫_{X∈C} g(X) dP = ∫_A g(X) dP
⇒ g(X) = E[ϕ(X, Y)|X].
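When X and Y take finitely many values, the identity E[ϕ(X, Y)|X] = g(X) can be verified by direct enumeration. A hypothetical discrete example in Python (the distributions and ϕ are made up):

```python
from fractions import Fraction as F

# Assumed setup: X uniform on {0,1,2}, Y uniform on {0,1}, independent,
# and phi(x, y) = x*y + y.  Then E[phi(X,Y)|X] should equal g(X),
# where g(x) = E[phi(x, Y)].
Xvals, Yvals = [0, 1, 2], [0, 1]
pX, pY = F(1, 3), F(1, 2)
phi = lambda x, y: x * y + y

g = {x: sum(phi(x, y) * pY for y in Yvals) for x in Xvals}   # g(x) = E[phi(x, Y)]

# Direct conditional expectation on the atom {X = x} of sigma(X):
# E[phi(X,Y) I_{X=x}] / P(X = x), using independence for the joint weights.
for x in Xvals:
    atom_mean = sum(phi(x, y) * pX * pY for y in Yvals) / pX
    assert atom_mean == g[x]
```

The cancellation of P(X = x) in the atom average is exactly the Fubini step in the proof above.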

8.1.2 Properties
Theorem 8.1.3. (a) (Linearity)

E[aX + bY|F] = aE[X|F] + bE[Y|F].

(b) (Monotonicity) If X ≤ Y, then

E[X|F] ≤ E[Y|F]  a.s.

(c) (Monotone Convergence Theorem) If Xn ≥ 0 and Xn ↑ X with E[X] < ∞, then

E[Xn|F] ↑ E[X|F].

Proof. (a) For (i): since E[X|F], E[Y|F] ∈ F, we have aE[X|F] + bE[Y|F] ∈ F. To check (ii), if A ∈ F, then

∫_A (aX + bY) dP = a ∫_A X dP + b ∫_A Y dP
  = a ∫_A E[X|F] dP + b ∫_A E[Y|F] dP
  = ∫_A [aE[X|F] + bE[Y|F]] dP.

So, E[aX + bY|F] = aE[X|F] + bE[Y|F].


(b) For all A ∈ F,

∫_A E[X|F] dP = ∫_A X dP ≤ ∫_A Y dP = ∫_A E[Y|F] dP
⇒ ∫_A [E[X|F] − E[Y|F]] dP ≤ 0.

Let A = {E[X|F] − E[Y|F] ≥ ε} ∈ F for ε > 0. So,

ε P(A) ≤ ∫_A [E[X|F] − E[Y|F]] dP ≤ 0
⇒ P(A) = 0.

Let ε → 0. We have

P(E[X|F] − E[Y|F] > 0) = 0 ⇒ E[X|F] ≤ E[Y|F]  a.s.

(c) Let Yn = X − Xn. It suffices to show that

E[Yn|F] ↓ 0  a.s.

Since Yn is nonincreasing, Zn ≡ E[Yn|F] ↓ to a limit Z∞. If A ∈ F, then since 0 ≤ Yn ≤ X and E[X] < ∞,

∫_A Zn dP = ∫_A Yn dP → 0.

By the Dominated Convergence Theorem,

∫_A Z∞ dP = ∫_A lim_{n→∞} Zn dP = lim_{n→∞} ∫_A Zn dP = 0

for all A ∈ F. Therefore, Z∞ = 0 a.s.



Theorem 8.1.4 (Jensen's Inequality). If ϕ is convex and E[|X|], E[|ϕ(X)|] < ∞, then

ϕ(E[X|F]) ≤ E[ϕ(X)|F].

Proof. Omitted.

Theorem 8.1.5. (a) E[E(Y|F)] = E[Y].

(b) |E[X|F]|^p ≤ E[|X|^p |F] for p ≥ 1.

Proof. (a) For all A ∈ F,

∫_A E[Y|F] dP = ∫_A Y dP.

Let A = Ω:

E[E(Y|F)] = ∫_Ω E[Y|F] dP = ∫_Ω Y dP = E[Y].

(b) Use Jensen's inequality with ϕ(x) = |x|^p.
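Both parts of Theorem 8.1.5 can be checked on a finite example, with conditional expectation computed as a block average over a partition (a sketch; the space, variable, and partition are invented):

```python
from fractions import Fraction as F

# Assumed discrete space: Omega = {0,...,3}, uniform P, X(w) = w - 2
# (so X takes negative values), F generated by the blocks {0,1} and {2,3}.
omega = range(4)
p = F(1, 4)
X = {w: F(w - 2) for w in omega}
blocks = [{0, 1}, {2, 3}]

def cond_exp(Z):
    """E[Z|F]: under uniform P this is just the average over each block."""
    Y = {}
    for b in blocks:
        val = sum(Z[w] for w in b) / len(b)
        for w in b:
            Y[w] = val
    return Y

Y = cond_exp(X)
# (a) E[E(X|F)] = E[X]
assert sum(Y[w] * p for w in omega) == sum(X[w] * p for w in omega)
# (b) |E[X|F]|^2 <= E[|X|^2 | F] pointwise (conditional Jensen, phi(x) = x^2)
Y2 = cond_exp({w: X[w] ** 2 for w in omega})
assert all(Y[w] ** 2 <= Y2[w] for w in omega)
```

On the block {0,1} the inequality in (b) is strict: (−3/2)² = 9/4 < 5/2 = E[X²|F].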

Theorem 8.1.6. If F1 ⊂ F2, then

(a) E[E(X|F1)|F2] = E[X|F1],

(b) E[E(X|F2)|F1] = E[X|F1].

Proof. (a) Since E[X|F1] ∈ F1 ⊂ F2, it is obvious.

(b) For all A ∈ F1 ⊂ F2, we have

∫_A E[X|F2] dP = ∫_A X dP = ∫_A E[X|F1] dP.

So E[E(X|F2)|F1] = E[X|F1].

Theorem 8.1.7. If X ∈ F and E[|Y |], E[|XY |] < ∞, then

E[XY |F] = XE[Y |F]



Proof. To check (i): XE(Y|F) ∈ F.

To check (ii), take any A ∈ F.

Case 1: X = I_B with B ∈ F.

∫_A XY dP = ∫_A Y I_B dP = ∫_{A∩B} Y dP
  = ∫_{A∩B} E[Y|F] dP
  = ∫_A I_B E[Y|F] dP = ∫_A XE[Y|F] dP
⇒ E[XY|F] = XE[Y|F].

Case 2: If X, Y ≥ 0, let {Xn, n ≥ 1} be a sequence of simple r.v.'s with Xn ↑ X. Using the Monotone Convergence Theorem together with Case 1 and linearity,

∫_A XE[Y|F] dP = lim_{n→∞} ∫_A Xn E[Y|F] dP
  = lim_{n→∞} ∫_A E[Xn Y|F] dP
  = lim_{n→∞} ∫_A Xn Y dP
  = ∫_A XY dP = ∫_A E[XY|F] dP.

Therefore, E[XY|F] = XE[Y|F].

Case 3: If X, Y are general r.v.'s, then X = X^+ − X^− and Y = Y^+ − Y^−, so

E[XY|F] = E[X^+ Y^+|F] − E[X^− Y^+|F] − E[X^+ Y^−|F] + E[X^− Y^−|F]
  = X^+ E[Y^+|F] − X^− E[Y^+|F] − X^+ E[Y^−|F] + X^− E[Y^−|F]
  = XE[Y|F].

Theorem 8.1.8. Suppose E[X²] < ∞. Then E[X|F] is the random variable Y ∈ F that minimizes the "mean square error"

E[(X − Y)²].

Proof. For any Y ∈ F,

E[(X − Y)²] = E[(X − E(X|F)) − (Y − E(X|F))]²
  = E[X − E(X|F)]² − 2E{[X − E(X|F)][Y − E(X|F)]} + E[Y − E(X|F)]².

For the cross term, since Y − E(X|F) ∈ F,

E{[X − E(X|F)][Y − E(X|F)]} = E[E{[X − E(X|F)][Y − E(X|F)]|F}]
  = E{[Y − E(X|F)] · E[(X − E(X|F))|F]}
  = 0.

So

E[(X − Y)²] = E[X − E(X|F)]² + E[Y − E(X|F)]² ≥ E[X − E(X|F)]²,

and "=" holds only if Y = E(X|F) a.s.
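The least-squares characterization is easy to see numerically: on a two-block partition, an F-measurable Y is a pair of constants, and the block averages minimize the mean square error (a sketch with made-up values):

```python
from fractions import Fraction as F

# Assumed setup: Omega = {0,...,3}, uniform P, X(w) = w, and F generated
# by the partition {0,1}, {2,3}.  An F-measurable Y is constant on blocks.
omega = range(4)
p = F(1, 4)
X = {w: F(w) for w in omega}

def mse(c1, c2):
    """E[(X - Y)^2] for the F-measurable Y = c1 on {0,1} and c2 on {2,3}."""
    Y = {0: c1, 1: c1, 2: c2, 3: c2}
    return sum((X[w] - Y[w]) ** 2 * p for w in omega)

best = mse(F(1, 2), F(5, 2))     # block averages, i.e. Y = E[X|F]
# Any other choice of constants does at least as badly:
for c1 in [F(0), F(1, 2), F(1)]:
    for c2 in [F(2), F(5, 2), F(3)]:
        assert mse(c1, c2) >= best
```

Here the block averages are 1/2 on {0,1} and 5/2 on {2,3}, giving the minimal error E[(X − E(X|F))²] = 1/4.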

8.2 Martingales
Definition. Let {Fn : n ≥ 1} be a sequence of σ-fields. {Fn : n ≥ 1} is called a filtration if

F1 ⊆ F2 ⊆ F3 ⊆ ···

Definition. Let {Fn : n ≥ 1} be a filtration. A sequence of r.v.'s {Xn : n ≥ 1} is said to be adapted to {Fn : n ≥ 1} if Xn ∈ Fn for all n.

Definition. A sequence {Xn, n ≥ 1} is said to be a martingale with respect to {Fn : n ≥ 1} if

(i) E[|Xn|] < ∞ for all n;

(ii) {Xn : n ≥ 1} is adapted to {Fn : n ≥ 1};

(iii) E[Xn+1|Fn] = Xn for all n.

Definition. A sequence {Xn, n ≥ 1} is said to be a

(a) supermartingale with respect to {Fn : n ≥ 1} if (iii) is replaced by

E[Xn+1|Fn] ≤ Xn for all n;

(b) submartingale with respect to {Fn : n ≥ 1} if (iii) is replaced by

E[Xn+1|Fn] ≥ Xn for all n.

Example (Simple Random Walk). Toss a fair coin repeatedly and define

ξn = 1 if the nth toss is a head, ξn = −1 if the nth toss is a tail.

Let Xn = ξ1 + ··· + ξn and Fn = σ(ξ1, ξ2, ···, ξn) for n ≥ 1, with X0 = 0 and F0 = {∅, Ω}.

Show that {Xn, n ≥ 0} is a martingale with respect to {Fn, n ≥ 0}.

Proof. We observe that Xn ∈ Fn, E[|Xn|] < ∞, and ξn+1 is independent of Fn, so using the linearity of conditional expectation, we have

E[Xn+1|Fn] = E[Xn|Fn] + E[ξn+1|Fn]
  = Xn + E[ξn+1]
  = Xn,

since E[ξn+1] = 0 for a fair coin.

Note: In this example, Fn = σ(X1, ···, Xn), and {Fn, n ≥ 1} is the smallest filtration to which {Xn} is adapted.
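For small n the martingale property of the walk can be checked exactly by enumerating all coin sequences, since each sequence (ξ1, ···, ξn) determines an atom of Fn. A short Python sketch (not part of the notes):

```python
from fractions import Fraction as F
from itertools import product

# Exact check of E[X_{n+1} | F_n] = X_n for the fair-coin random walk:
# each atom of F_n fixes the first n tosses, and conditionally the two
# values of xi_{n+1} are equally likely.
n = 4
for atom in product([1, -1], repeat=n):      # an atom of F_n
    x_n = sum(atom)
    # average X_{n+1} over the two equally likely values of xi_{n+1}
    cond_mean = F(sum(x_n + xi for xi in (1, -1)), 2)
    assert cond_mean == x_n
```

The assertion is just ((x + 1) + (x − 1))/2 = x on every atom, which is the calculation in the proof above.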
Theorem 8.2.1. If {Xn, n ≥ 1} is a supermartingale, then for n > m, E[Xn|Fm] ≤ Xm.

Proof. The definition of a supermartingale gives the result for n = m + 1. Suppose n = m + k with k ≥ 2. By Theorem 8.1.6 and the monotonicity in Theorem 8.1.3,

E[Xn|Fm] = E[Xm+k|Fm] = E[E[Xm+k|Fm+k−1]|Fm] ≤ E[Xm+k−1|Fm] ≤ ··· ≤ E[Xm+1|Fm] ≤ Xm.

Theorem 8.2.2. (i) If {Xn, n ≥ 1} is a submartingale, then for n > m, E[Xn|Fm] ≥ Xm.

(ii) If {Xn, n ≥ 1} is a martingale, then for n > m, E[Xn|Fm] = Xm.

Proof. (i) Note that {−Xn, n ≥ 1} is a supermartingale and use Theorem 8.2.1.

(ii) Observe that {Xn, n ≥ 1} is both a supermartingale and a submartingale.

Theorem 8.2.3. If {Xn, n ≥ 1} is a martingale w.r.t. {Fn, n ≥ 1} and ϕ is a convex function with E[|ϕ(Xn)|] < ∞ for all n, then {ϕ(Xn), n ≥ 1} is a submartingale w.r.t. {Fn, n ≥ 1}. Consequently, if p ≥ 1 and E[|Xn|^p] < ∞ for all n, then {|Xn|^p, n ≥ 1} is a submartingale w.r.t. {Fn, n ≥ 1}.

Proof. By Jensen’s inequality and the definition

E[ϕ(Xn+1 )|Fn ] ≥ ϕ(E[Xn+1 |Fn ]) = ϕ(Xn )

Theorem 8.2.4. If {Xn, n ≥ 1} is a submartingale w.r.t. {Fn, n ≥ 1} and ϕ is an increasing convex function with E[|ϕ(Xn)|] < ∞ for all n, then {ϕ(Xn), n ≥ 1} is a submartingale w.r.t. {Fn, n ≥ 1}. Consequently,

(i) if {Xn, n ≥ 1} is a submartingale, then {(Xn − a)^+, n ≥ 1} is a submartingale;

(ii) if {Xn, n ≥ 1} is a supermartingale, then {min{Xn, a}, n ≥ 1} is a supermartingale.

Proof. By Jensen's inequality and the assumptions,

E[ϕ(Xn+1)|Fn] ≥ ϕ(E[Xn+1|Fn]) ≥ ϕ(Xn),

where the second inequality uses E[Xn+1|Fn] ≥ Xn and the monotonicity of ϕ.
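Theorems 8.2.3 and 8.2.4 can likewise be checked by enumeration for the fair-coin walk, which is a martingale and hence both a sub- and a supermartingale. The sketch below (not part of the notes) verifies the conditional inequalities for ϕ(x) = |x| and ϕ(x) = min(x, a) on every atom of Fn:

```python
from fractions import Fraction as F
from itertools import product

# For the fair-coin walk X_n we check, on every atom of F_n:
#  - |X_{n+1}| has conditional mean >= |X_n|   (Theorem 8.2.3, phi(x) = |x|),
#  - min(X_{n+1}, a) has conditional mean <= min(X_n, a)  (Theorem 8.2.4(ii)).
n, a = 4, 1
for atom in product([1, -1], repeat=n):
    x_n = sum(atom)
    nxt = [x_n + 1, x_n - 1]                   # equally likely successors
    assert F(sum(abs(x) for x in nxt), 2) >= abs(x_n)
    assert F(sum(min(x, a) for x in nxt), 2) <= min(x_n, a)
```

The first inequality is strict only on atoms with Xn = 0, where the conditional mean of |Xn+1| is 1; elsewhere it holds with equality, as expected for a convex ϕ evaluated away from its kink.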
