Lecture Notes of Advanced Probability
1 Set Theory 1
1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Basic set operations . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Operations of sequence of sets . . . . . . . . . . . . . . . . . . 2
1.4 Indicator functions . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Semirings, Fields and σ-Fields . . . . . . . . . . . . . . . . . . 7
1.6 Minimal σ-fields . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7 Product spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Measure 13
2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Some examples of measure . . . . . . . . . . . . . . . . . . . . 13
2.3 Properties of a measure . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Arithmetics with ∞ . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Extension of measures . . . . . . . . . . . . . . . . . . . . . . 17
2.5.1 Complete measure spaces . . . . . . . . . . . . . . . . . 17
2.5.2 Caratheodory Extension Theorem . . . . . . . . . . . . 18
2.6 Probability measures and distribution functions . . . . . . . . 20
3 Random Variables 25
3.1 Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Measurable mapping . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Random Variables (Vectors) . . . . . . . . . . . . . . . . . . . 27
3.3.1 Random variables . . . . . . . . . . . . . . . . . . . . . 27
3.3.2 How to check a random variable? . . . . . . . . . . . . 28
3.3.3 Random vectors . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Construction of random variables . . . . . . . . . . . . . . . . 28
3.5 Approximations of r.v. by simple r.v.s . . . . . . . . . . . . . 30
3.6 σ-field generated by random variables . . . . . . . . . . . . . . 33
3.7 Distributions and induced distribution functions . . . . . . . . 34
3.8 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.9 How to check independence . . . . . . . . . . . . . . . . . . . 38
3.9.1 Discrete random variables . . . . . . . . . . . . . . . . 39
3.9.2 Absolutely continuous random variables . . . . . . . . 40
3.10 Zero-One Law . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4 How to compute expectation . . . . . . . . . . . . . . . . . . . 54
4.4.1 Expected values of absolutely continuous r.v. . . . . . . 55
4.4.2 Expected values of discrete r.v. . . . . . . . . . . . . . 57
4.5 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.6 Joint integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
8 Martingales 87
8.1 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . 87
8.1.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . 93
8.2 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Chapter 1
Set Theory
1.1 Sets
ω ∈ A: ω is an element of A.
A ∪ B = {ω : ω ∈ A or ω ∈ B} (union)
A ∩ B = {ω : ω ∈ A and ω ∈ B} (intersection)
A − B = {ω : ω ∈ A and ω ∉ B} (difference)
A^c := Ω − A = {ω : ω ∉ A} (complement)
A∆B = (A − B) ∪ (B − A) (symmetric difference)
De Morgan's Laws:
(∪_{n=1}^∞ A_n)^c = ∩_{n=1}^∞ A_n^c
(∩_{n=1}^∞ A_n)^c = ∪_{n=1}^∞ A_n^c
Remark The upper and lower limits of a sequence of real numbers {a_n, n ≥ 1} are defined by
lim sup_n a_n = inf_{n≥1} sup_{k≥n} a_k,    lim inf_n a_n = sup_{n≥1} inf_{k≥n} a_k.
Example
You play a game with your friend by tossing a fair coin. You win the n-th round of the game if "Head" appears in that round.
Let A_n be the event that you win the n-th round. Then A_n^c represents the event that you lose the n-th round.
lim sup_n A_n means that for every n ≥ 1 there exists k ≥ n such that you win the k-th round; this is equivalent to winning infinitely many rounds.
On the other hand, lim inf_n A_n^c means that there exists n ≥ 1 such that you lose every round from the n-th round onwards; this is equivalent to ultimately losing all the rounds.
Proof. By definition,
lim inf_n A_n = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k = ∪_{n=1}^∞ (B ∩ C) = B ∩ C,
lim sup_n A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k = ∩_{n=1}^∞ (B ∪ C) = B ∪ C.
(b) If A_1 ⊃ A_2 ⊃ A_3 ⊃ · · ·, then A_n → A = ∩_{n=1}^∞ A_n, written as A_n ↓ A.
lim inf_n A_n = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k = ∪_{n=1}^∞ A = A,
lim sup_n A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k = ∩_{n=1}^∞ A_n = A.
Therefore, A_n → A = ∩_{n=1}^∞ A_n.
Theorem 1.4.2.
I_{A∪B} = max{I_A, I_B}
I_{A∩B} = min{I_A, I_B} = I_A I_B
I_{A^c} = 1 − I_A
I_{A∆B} = |I_A − I_B|
I_{lim inf_n A_n} = lim inf_n I_{A_n}
I_{lim sup_n A_n} = lim sup_n I_{A_n}
I_{A∪B} ≤ I_A + I_B
I_{∪_{n=1}^∞ A_n} ≤ Σ_{n=1}^∞ I_{A_n}
I_{A∪B} = max{I_A, I_B}:
For any ω ∈ Ω, if I_{A∪B}(ω) = 1, then ω ∈ A or ω ∈ B, so max{I_A(ω), I_B(ω)} = 1.
Similarly, we can show the equality holds if I_{A∪B}(ω) = 0. Thus, we proved I_{A∪B} = max{I_A, I_B}.
For any ω ∈ Ω,
⇒ 1 = lim sup_n I_{A_n^c}(ω) = lim sup_n [1 − I_{A_n}(ω)] = 1 − lim inf_n I_{A_n}(ω).
I_{A∆B} = |I_A − I_B|:
Consider 4 cases:
Example
Show (A∆B)∆C = A∆(B∆C).
Proof. Note
I_{(A∆B)∆C} = ||I_A − I_B| − I_C| and I_{A∆(B∆C)} = |I_A − |I_B − I_C||.
If I_B = 0, then
I_LHS = |I_A − I_C| = I_RHS.
If I_B = 1, then
I_LHS = |1 − I_A − I_C| = I_RHS.
Theorem 1.4.3.
I_{∪_{j=1}^n A_j} = Σ_{j=1}^n I_{A_j} − Σ_{1≤j1<j2≤n} I_{A_{j1}∩A_{j2}} + Σ_{1≤j1<j2<j3≤n} I_{A_{j1}∩A_{j2}∩A_{j3}} − · · · + (−1)^{n−1} I_{A_1∩A_2∩···∩A_n}.
Proof. Set s_1 = Σ_{j=1}^n I_{A_j}, s_2 = Σ_{1≤j1<j2≤n} I_{A_{j1}∩A_{j2}}, · · ·, s_n = I_{A_1∩A_2∩···∩A_n}.
Then we need to show
I_{∪_{j=1}^n A_j}(ω) = s_1(ω) − s_2(ω) + · · · + (−1)^{n−1} s_n(ω)
for every ω ∈ Ω.
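The indicator identities above lend themselves to a quick numerical check. The following sketch (assuming Python with numpy; the variable names and random sets are illustrative and not part of the notes) verifies the union, intersection, and symmetric-difference identities of Theorem 1.4.2 and the inclusion-exclusion formula of Theorem 1.4.3 on randomly chosen subsets of a finite sample space.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
size, n = 100, 4                                   # |Omega| and number of sets
A = rng.integers(0, 2, size=(n, size))             # A[j] is the indicator function of A_j

# Theorem 1.4.2
assert np.array_equal((A[0] + A[1] > 0).astype(int), np.maximum(A[0], A[1]))  # I_{A∪B} = max
assert np.array_equal(A[0] * A[1], np.minimum(A[0], A[1]))                    # I_{A∩B} = min = product
assert np.array_equal((A[0] + A[1]) % 2, np.abs(A[0] - A[1]))                 # I_{A∆B} = |I_A − I_B|

# Theorem 1.4.3: I_{∪ A_j} = s_1 − s_2 + s_3 − ... + (−1)^{n−1} s_n
union_ind = np.maximum.reduce(A)
total = np.zeros(size, dtype=int)
for r in range(1, n + 1):
    s_r = sum(np.prod(A[list(c)], axis=0) for c in combinations(range(n), r))
    total += (-1) ** (r - 1) * s_r
assert np.array_equal(union_ind, total)
print("indicator identities hold at every sample point")
```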
Definition A nonempty class F of subsets of Ω is a semiring if
(i) ∅ ∈ F;
(ii) A, B ∈ F implies A ∩ B ∈ F;
(iii) A, B ∈ F with A ⊂ B implies B − A is a finite union of disjoint sets in F.
Definition F is a field if
(i) Ω ∈ F;
(ii) A ∈ F implies A^c ∈ F;
(iii) A, B ∈ F implies A ∪ B ∈ F.
Definition F is a σ-field if
(i) Ω ∈ F;
(ii) A ∈ F implies A^c ∈ F;
(iii) A_1, A_2, · · · ∈ F implies ∪_{n=1}^∞ A_n ∈ F.
Remark
(c) A σ-field is a field (by taking Aj = A2 for j ≥ 3), but not vice versa.
Example
(a) Let Ω = (−∞, ∞]. Let F be the collection of half open intervals in Ω,
i.e.,
F = {(a, b] : −∞ ≤ a ≤ b ≤ ∞}
Then F is a semiring, but NOT a field.
Proof. For the first part, note that (a, a] = ∅ ∈ F. It is easy to check that the intersection of two half-open intervals is still a half-open interval (possibly ∅), so F is closed under intersection. On the other hand, if A = (a_2, b_1], B = (a_1, b_2] ∈ F and a_1 < a_2 < b_1 < b_2, then A ⊂ B and
B − A = (a_1, a_2] ∪ (b_1, b_2].
So B − A is the union of two disjoint half-open intervals. Thus, F is a semiring.
For the second part, note that (n, n + 1] ∈ F, but (0, 1] ∪ (2, 3] ∉ F. So F is not closed under finite unions, and thus not a field.
F = {∪_{i=1}^m (c_i, d_i] : 0 < c_1 ≤ d_1 ≤ c_2 ≤ d_2 ≤ · · · ≤ d_m ≤ 1} ∪ {∅}.
F is not a σ-field, since
∩_{n=1}^∞ (1/2 − 1/(n+2), 1/2] = {1/2} ∉ F.
Example
Theorem 1.6.1. For any class A, there exists a unique minimal σ-field
containing A, denoted by σ(A), called the σ-field generated by A. In other
words,
(a) A ⊂ σ(A),
(b) For any σ-field B with A ⊂ B, σ(A) ⊂ B,
and σ(A) is unique.
Proof. Existence.
Let QA = {B : B ⊃ A, B is a σ-field on Ω}. Clearly, QA is not empty since
it contains the power set P(Ω). Define
σ(A) = ∩B∈QA B.
From Lemma 1.6.1, σ(A) is a σ-field.
Uniqueness.
Let σ1 (A) be another σ-field satisfying (a) and (b). By definition, σ(A) ⊂
σ1 (A). By symmetry, σ1 (A) ⊂ σ(A). Thus, σ(A) = σ1 (A).
Definition
The smallest σ-field generated by the collection of all finite open intervals on the real line R = (−∞, ∞) (or R = (−∞, ∞] or R = [−∞, ∞]) is called the Borel σ-field, denoted by B. The elements of B are called Borel sets. The ordered pair (R, B) is called the (1-dimensional) Borel measurable space.
Remark
(1) Every "reasonable" subset of R is a Borel set. Closed (open) intervals, half-open (closed) intervals and countable unions (intersections) of open (closed) intervals are examples of Borel sets. However, B ≠ P(R).
(2) For A ∈ B, let
BA = {B ∩ A : B ∈ B} = B ∩ A.
Then (A, BA ) is a measurable space, and BA is called the Borel σ-field
on A.
∏_{i=1}^n A_i := A_1 × · · · × A_n = {(ω_1, · · ·, ω_n) : ω_i ∈ A_i ⊂ Ω_i, 1 ≤ i ≤ n}.
In the special case where (Ω_i, A_i) = (R, B), the n-dimensional Borel measurable space is given by (R^n, B^n).
Chapter 2
Measure
2.1 Definitions
Let Ω be a space, A a class of subsets of Ω, and µ : A → [0, ∞] a set function defined on A.
Definition
(i) µ is finite on A if µ(A) < ∞, ∀A ∈ A.
(ii) µ is σ-finite on A if there is a sequence of sets A_1, A_2, · · · in A with ∪_{n=1}^∞ A_n = Ω and µ(A_n) < ∞ for each n.
Definition
(i) If µ is a measure on a σ-field A of subsets of Ω, the ordered triplet
(Ω, A, µ) is a measure space. The elements of A are called measur-
able sets, or A-measurable.
[Note: (Ω, A) = measurable space ≠ measure space = (Ω, A, µ).]
(ii) A measure space (Ω, A, µ) is a probability space if µ(Ω) = 1. µ is
called a probability measure usually written as P .
(a) µ(∅) = 0.
(b) (Finite Additivity) If A_1, · · ·, A_n ∈ A are disjoint, then
µ(Σ_{i=1}^n A_i) = Σ_{i=1}^n µ(A_i).
(c) (Monotonicity)
If A, B ∈ A and A ⊂ B, then µ(A) ≤ µ(B); moreover, if µ(A) < ∞, then µ(B − A) = µ(B) − µ(A).
(d) (Continuity from Below) If A_n ∈ A and A_n ↑ A, then µ(A_n) → µ(A).
(e) (Continuity from Above) If A_n ∈ A, A_n ↓ A, and µ(A_m) < ∞ for some m, then µ(A_n) → µ(A).
(f) (Finite Sub-Additivity)
If {A_i, 1 ≤ i ≤ n} ⊂ A, then
µ(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n µ(A_i).
(g) (Countable Sub-Additivity)
If A_i ∈ A, i ≥ 1, then
µ(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ µ(A_i).
Proof.
(d) Write
A = ∪_{i=1}^∞ A_i = A_1 + (A_2 − A_1) + (A_3 − A_2) + · · ·,
so
µ(A) = µ(A_1) + Σ_{i=2}^∞ µ(A_i − A_{i−1}) = lim_{n→∞} µ(A_n).
(e) Assume that for some m, µ(A_m) < ∞. (There is no point of discussion if µ(A_m) = ∞ for all m ≥ 1.) Then A_m − A_n, n ≥ m, forms an increasing sequence and A_m − A_n → A_m − A = A_m − ∩_{i=1}^∞ A_i. Then from (d), for n ≥ m,
µ(A_m − A_n) → µ(A_m − A), i.e., µ(A_m) − µ(A_n) → µ(A_m) − µ(A).
Therefore,
lim_{n→∞} µ(A_n) = µ(∩_{i=1}^∞ A_i) = µ(A).
(f) Let B_1 = A_1 and B_k = A_k − ∪_{i=1}^{k−1} A_i.
The B_k are disjoint and ∪_{k=1}^n A_k = ∪_{k=1}^n B_k, so from (b),
µ(∪_{k=1}^n A_k) = µ(Σ_{k=1}^n B_k) = Σ_{k=1}^n µ(B_k) ≤ Σ_{k=1}^n µ(A_k),
where the last inequality follows from (c), since B_k ⊂ A_k.
(g) By letting n → ∞ on both sides of the result in the proof of (f), we have
lim_{n→∞} µ(∪_{i=1}^n A_i) ≤ lim_{n→∞} Σ_{i=1}^n µ(A_i) = Σ_{i=1}^∞ µ(A_i).
By (d), the left-hand side equals
µ(∪_{i=1}^∞ A_i).
(2) 0 · ∞ = 0, ∞ + ∞ = ∞, ∞ · ∞ = ∞.
(3) x/∞ = x/(−∞) = 0.
(4) ∞ − ∞ and ∞/∞ are not defined.
Theorem 2.5.1. Given a measure space (Ω, A, µ), there exists a complete
measure space (Ω, Ā, µ̄) such that A ⊂ Ā and µ̄ = µ on A.
Proof. The proof is outside the scope of this course and is not required.
Because of Theorem 2.5.1, from now on, unless otherwise specified, we assume that all the measure spaces in this course are complete measure spaces.
Remark
We are going to see how to apply the Caratheodory extension theorem to con-
struct an important measure on the Borel σ-field which is known as Lebesgue
measure.
λ((a, b]) = b − a.
Example
Find λ*(A) when
(b) A = {a}.
(b) Let A_1 = (a − 1/n, a + 1/n] and A_2 = A_3 = · · · = ∅, so that A ⊂ ∪_{n=1}^∞ A_n. So,
λ*(A) = inf{Σ_{n=1}^∞ λ(A_n) : A ⊂ ∪_{n=1}^∞ A_n, A_n ∈ F} ≤ lim_{n→∞} 2/n = 0, hence λ*(A) = 0.
Theorem 2.6.2. Suppose that F is finite on (−∞, ∞) (i.e. |F(t)| < ∞ for |t| < ∞), nondecreasing and right continuous. Then there exists a unique measure µ on (R, B) such that µ((a, b]) = F(b) − F(a) for all a < b.
Remark
(1) The function F in Theorem 2.6.2 is called a Lebesgue-Stieltjes (L-S)
measure function.
(3) Theorem 2.6.2 shows that F uniquely determines µ, but not vice versa,
since we can write µ((a, b]) = F (b) − F (a) = (F (b) + c) − (F (a) + c). So
there is no 1 − 1 correspondence between the L-S measure functions and
the L-S measures.
To check (c), for any sequence x_n ↓ x, it follows that (−∞, x_n] ↓ (−∞, x]. Since P is continuous from above, we have F(x_n) = P((−∞, x_n]) → P((−∞, x]) = F(x).
To check (a), for any sequence x_n ↓ −∞, it follows that (−∞, x_n] ↓ ∅. Since P is continuous from above, we have F(x_n) = P((−∞, x_n]) → P(∅) = 0.
(2) Given a d.f. F , we now show that there exists a unique probability
measure P on (R, B) satisfying (2.2).
(ii) limy↓−∞ P ((y, x]) = P ((−∞, x]), (as P is a probability measure thus
continuous from below).
Thus letting y ↓ −∞, we get F (x) = P ((−∞, x]). That is, (2.2) holds.
Remark
(1) The definition of F in Theorem 2.6.1 does not involve any random vari-
ables. The association of the d.f.’s with the random variables will be
discussed in the next chapter.
δ_t(x) = 0 for x < t,
δ_t(x) = 1 for x ≥ t.
A discrete distribution function can be written as F(x) = Σ_{j=1}^∞ b_j δ_{a_j}(x), where {a_n, n ≥ 1} is a countable set of real numbers, b_j > 0 for all j ≥ 1 and Σ_{j=1}^∞ b_j = 1.
Chapter 3
Random Variables
3.1 Mappings
Definition Let X : Ω1 → Ω2 be a mapping between two sets Ω1 and Ω2 .
X −1 (Ω2 ) = {ω ∈ Ω1 : X(ω) ∈ Ω2 } = Ω1 ,
X −1 (∅) = {ω ∈ Ω1 : X(ω) ∈ ∅} = ∅.
(ii)
X^{-1}(B^c) = {ω : X(ω) ∈ B^c} = {ω : X(ω) ∉ B}
= {ω : ω ∉ X^{-1}(B)} = {ω : ω ∈ [X^{-1}(B)]^c} = [X^{-1}(B)]^c.
(3) Clearly, X −1 (C) ⊂ X −1 (σ(C)), which is a σ-field from (2). Thus σ(X −1 (C)) ⊂
X −1 (σ(C)). It remains to show that X −1 (σ(C)) ⊂ σ(X −1 (C)). Define
G = {G : X^{-1}(G) ∈ σ(X^{-1}(C))}.
⇔ {X ≤ x} = X −1 ([−∞, x]) ∈ A, ∀x ∈ R.
⇔ {X ≤ x} = X −1 ([−∞, x]) ∈ A, ∀x ∈ D which is a dense subset of R.
Remark
{X = (X_1, · · ·, X_n) ∈ I_1 × · · · × I_n} = ∩_{k=1}^n {X_k ∈ I_k} ∈ A.
The proof follows from this and Theorem 3.2.1, since B(R^n) = σ(∏_{k=1}^n I_k).
Proof. The proof follows directly from Theorems 3.3.2 and 3.2.2.
(ii) X + Y .
(iii) X 2 .
(iv) XY .
(v) 1/X, provided that X(ω) ≠ 0 for all ω ∈ Ω.
(vi) X/Y, provided that Y(ω) ≠ 0 for all ω ∈ Ω.
Proof. To prove the theorem, we can use Theorem 3.4.1 by choosing the
appropriate Borel function f .
(i) f (x) = ax.
(ii) f (x, y) = x + y.
(iii) f (x) = x2 .
(3) If S(ω) = Σ_{n=1}^∞ X_n(ω) exists for every ω, then S is a r.v.
Proof.
1.
{sup_n X_n ≤ t} = ∩_{n=1}^∞ {X_n ≤ t},
{inf_n X_n ≥ t} = ∩_{n=1}^∞ {X_n ≥ t},
lim sup_n X_n = inf_{k≥1} sup_{m≥k} X_m.
2. ∀B ∈ B, note that
{X ∈ B} = ∪_{i: a_i ∈ B} A_i ∈ A.
and
B + C = Σ_{k=n2^{n+1}+1}^{(n+1)2^{n+1}} ((k−1)/2^{n+1}) I{(k−1)/2^{n+1} < X(ω) ≤ k/2^{n+1}} + (n+1) I{X(ω) > n+1}
≥ Σ_{k=n2^{n+1}+1}^{(n+1)2^{n+1}} n I{(k−1)/2^{n+1} < X(ω) ≤ k/2^{n+1}} + n I{X(ω) > n+1}
≥ n I{n < X(ω) ≤ n+1} + n I{X(ω) > n+1}
= n I{X(ω) > n}.
Therefore,
X_{n+1}(ω) = A + B + C
≥ Σ_{k=1}^{n2^n} ((k−1)/2^n) I{(k−1)/2^n < X(ω) ≤ k/2^n} + n I{X(ω) > n}
= X_n(ω) ≥ 0.
Moreover, for ω with X(ω) ≤ n and (k−1)/2^n < X(ω) ≤ k/2^n,
0 ≤ X(ω) − (k−1)/2^n = X(ω) − X_n(ω) ≤ 1/2^n → 0.
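The construction above is easy to try out numerically. Below is a minimal sketch (assuming Python with numpy; the helper name dyadic_approx is ours, not from the notes) of the simple r.v.'s X_n: it illustrates that X_n is nondecreasing in n and that X − X_n ≤ 2^{−n} on {X ≤ n}.

```python
import numpy as np

def dyadic_approx(x, n):
    """X_n = (k-1)/2^n on {(k-1)/2^n < X <= k/2^n}, k = 1, ..., n*2^n, and X_n = n on {X > n}."""
    x = np.asarray(x, dtype=float)
    k = np.ceil(x * 2**n)                          # for x > 0, x lies in ((k-1)/2^n, k/2^n]
    low = np.where(x > 0, (k - 1) / 2**n, 0.0)
    return np.where(x > n, float(n), low)

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=5)             # sampled values of a nonnegative r.v. X
prev = np.zeros_like(x)
for n in (1, 2, 4, 8):
    xn = dyadic_approx(x, n)
    assert np.all(xn >= prev)                      # X_n increases with n
    print(n, np.max(np.where(x <= n, x - xn, 0.0)))  # error on {X <= n} is at most 2^{-n}
    prev = xn
```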
Example
(1) For Λ = {1}, if X_1(ω) ≡ a for all ω ∈ Ω, then for any B ∈ B,
X_1^{-1}(B) = Ω if a ∈ B, and X_1^{-1}(B) = ∅ if a ∉ B.
Definition X is a r.v.
Example
Let f(x) be the density function of the d.f. F(x):
f(x) = βe^{−βx} if x ≥ 0, and f(x) = 0 if x < 0,
where β > 0.
Similarly, we can define the distributions and distribution functions for ran-
dom vectors.
FX (x) = P (X1 ≤ x1 , · · · , Xn ≤ xn ).
Remark
(i) X and Y in Definition (i) do not have to be defined on the same probability space, while they must be in (ii).
Theorem 3.7.2.
3.8 Independence
Definition Let (Ω, A, P ) be a probability space.
(i) The events A_1, · · ·, A_n ∈ A are said to be independent if and only if
P(∩_{i∈J} A_i) = ∏_{i∈J} P(A_i)
for every nonempty subset J of {1, · · ·, n}.
for any Borel sets Bi ∈ B (as one can take some Bi = R).
(iv) The random variables (events or classes) are said to be pairwise in-
dependent if and only if every two of them are independent.
(v) Random variables X_1, · · ·, X_n that are independent and have the same distribution function are called independent and identically distributed (i.i.d.).
Example
Let X_1, X_2, X_3 be independent random variables with P(X_i = 0) = P(X_i = 1) = 1/2. Let A_1 = {X_2 = X_3}, A_2 = {X_3 = X_1}, A_3 = {X_1 = X_2}. Then the A_i's are pairwise independent but not independent.
Proof. Note that for i ≠ j,
P (A1 ) = P (X2 = X3 )
= P (X2 = X3 = 1) + P (X2 = X3 = 0)
= P (X2 = 1, X3 = 1) + P (X2 = 0, X3 = 0)
= P (X2 = 1) P (X3 = 1) + P (X2 = 0) P (X3 = 0)
= 1/4 + 1/4 = 1/2.
P(A_i ∩ A_j) = P(X_1 = X_2 = X_3)
= P(X_1 = 1, X_2 = 1, X_3 = 1) + P(X_1 = 0, X_2 = 0, X_3 = 0)
= P(X_1 = 1)P(X_2 = 1)P(X_3 = 1) + P(X_1 = 0)P(X_2 = 0)P(X_3 = 0)
= (1/2)^3 + (1/2)^3 = 1/4 = P(A_i)P(A_j),
so the A_i's are pairwise independent. However, A_1 ∩ A_2 ∩ A_3 = {X_1 = X_2 = X_3}, so P(A_1 ∩ A_2 ∩ A_3) = 1/4 ≠ 1/8 = P(A_1)P(A_2)P(A_3), and the A_i's are not independent.
for all t1 , · · · , tn ∈ R.
P(X_1 = a_1, · · ·, X_n = a_n) = ∏_{i=1}^n P(X_i = a_i)    (3.5)
for all a_1, · · ·, a_n ∈ C.
= ∏_{i=1}^n P(X_i ≤ t_i)
= F_{X_1}(t_1) · · · F_{X_n}(t_n)
for all y1 , · · · , yn ∈ R.
Hence,
∫_{−∞}^{t_1} · · · ∫_{−∞}^{t_n} (f_X(y_1, · · ·, y_n) − ∏_{i=1}^n f_{X_i}(y_i)) dy_1 · · · dy_n = 0.
Theorem 3.9.4.
are independent.
Remark In Theorem 3.9.4, (c) and (d) are known as the convolution.
lim sup_n A_n = ∩_{n=1}^∞ ∪_{m=n}^∞ A_m = lim_{n→∞} ∪_{m=n}^∞ A_m = {A_n, i.o.}.
lim inf_n A_n = ∪_{n=1}^∞ ∩_{m=n}^∞ A_m = lim_{n→∞} ∩_{m=n}^∞ A_m = {A_n, utl.}.
(lim inf_n A_n)^c = lim sup_n A_n^c.
Proof.
(a)
P(A_n, i.o.) = P(lim_{n→∞} ∪_{m=n}^∞ A_m)
= lim_{n→∞} P(∪_{m=n}^∞ A_m)
≤ lim_{n→∞} Σ_{m=n}^∞ P(A_m) = 0,
since Σ_{n=1}^∞ P(A_n) < ∞ implies that the tail sums Σ_{m=n}^∞ P(A_m) converge to 0.
The inequality in the last row is due to the countable sub-additivity of the probability measure.
Remark
(1) In Theorem 3.10.1, (a) does not require the independence assumption on A_1, A_2, · · ·, while the independence assumption in (b) cannot be removed in general. For instance, take A_n = A with 0 < P(A) < 1; then Σ_{n=1}^∞ P(A_n) = ∞ but P(A_n, i.o.) = P(A) < 1.
(2) In Theorem 3.10.1, (b) also holds when A1 , A2 , · · · are only pairwise
independent.
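The contrast between the two parts of the Borel-Cantelli lemma can also be seen in a simulation. The rough sketch below (assuming Python with numpy; not part of the notes) uses independent events A_n = {U_n < p_n}: with p_n = 1/n² the series converges and only a handful of the A_n ever occur, while with p_n = 1/n the occurrences keep accumulating (roughly like log n), consistent with P(A_n, i.o.) = 1.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
n = np.arange(1, N + 1)
u = rng.random(N)                                  # U_1, ..., U_N i.i.d. Uniform(0, 1)

occ_sum = np.flatnonzero(u < 1.0 / n**2)           # A_n with P(A_n) = 1/n^2 (summable)
occ_div = np.flatnonzero(u < 1.0 / n)              # A_n with P(A_n) = 1/n   (divergent)

print("sum P(A_n) < inf :", len(occ_sum), "occurrences, last at n =", occ_sum[-1] + 1)
print("sum P(A_n) = inf :", len(occ_div), "occurrences, last at n =", occ_div[-1] + 1)
```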
Proof.
P(A) = P(lim_{n→∞} A_n) = P(lim sup_n A_n) = P(A_n, i.o.).
∩_{n=1}^∞ σ(X_n, X_{n+1}, · · ·).
(ii) {A_n, i.o.} is a tail event, as seen by taking X_n = I_{A_n}, B_n = {1} in (i).
Remark
(1) The following notations are often used to denote E[X] and EA [X]:
E[X] = ∫_Ω X(ω) P(dω) = ∫_Ω X dP;
E_A[X] = ∫_A X(ω) P(dω) = ∫_A X dP.
(2) Theorem 3.5.2 guarantees the existence of the sequence of simple r.v.’s
for a nonnegative r.v. X in (a).
(4) We can show that E[X] is well defined for both simple and nonnegative
r.v.’s in the following senses (the proofs are omitted):
Recall
For a general r.v. X on (Ω, A, P ), we have
X = X + − X −, |X| = X + + X − . (4.1)
where
X + = max{X, 0} = XI{X≥0} ≥ 0 and X − = max{−X, 0} = −XI{X≤0} ≥ 0.
(a) For a general r.v. X, if either E[X^+] < ∞ or E[X^−] < ∞ (that is, they are not both infinite), then the expectation of X is
E[X] = E[X^+] − E[X^−].
EA [X] = E[XIA ].
(e) Define L1 = {X : E[|X|] < ∞}, the class of all integrable r.v.’s on
(Ω, A, P ).
(x) (Monotonicity)
If X_1 ≤ X ≤ X_2 a.s., then E[X_1] ≤ E[X] ≤ E[X_2].
Proof.
(iv) Before we prove the linearity for X, Y ∈ L1 , we first show that (4.2) is
true for simple r.v.’s and then for nonnegative r.v.’s.
Suppose X and Y are simple r.v.'s given by
X = Σ_{i=1}^n a_i I_{A_i} and Y = Σ_{j=1}^m b_j I_{B_j},
where A_i, B_j ∈ A, Σ_{i=1}^n A_i = Σ_{j=1}^m B_j = Ω and a_i, b_j ∈ R.
So,
aX + bY = Σ_{i=1}^n Σ_{j=1}^m (a a_i + b b_j) I_{A_i ∩ B_j}
Now, suppose X and Y are nonnegative r.v.’s, so there are two increas-
ing sequences of nonnegative simple r.v.’s {Xn , n ≥ 1} and {Yn , n ≥ 1}
such that Xn ↑ X and Yn ↑ Y .
For a, b ≥ 0,
E[aX + bY] = lim_{n→∞} E[aX_n + bY_n]
= lim_{n→∞} (aE[X_n] + bE[Y_n])
= a lim_{n→∞} E[X_n] + b lim_{n→∞} E[Y_n] = aE[X] + bE[Y].
Suppose X, Y ∈ L1 .
By the triangle inequality: |aX + bY | ≤ |a||X| + |b||Y |. Note that
each term is nonnegative, so their expectations are well defined and
monotonicity of expectations for nonnegative r.v.’s implies
E[|aX + bY |] ≤ E[|a||X| + |b||Y |]
= |a|E[|X|] + |b|E[|Y |] (by the linearity of nonnegative r.v.’s)
< ∞.
Thus, aX + bY ∈ L1 .
To prove (4.2) for general r.v.’s, it is equivalent to show that
(a) E[X + Y ] = E[X] + E[Y ].
(b) E[aX] = aE[X].
Proof of (a).
By definition,
(X + Y )+ − (X + Y )− = X + Y = X + − X − + Y + − Y − .
Therefore,
(X + Y )+ + X − + Y − = (X + Y )− + X + + Y + . (4.3)
Proof of (b).
If a ≥ 0, then
E[aX] = E[(aX)^+] − E[(aX)^−]
= E[aX^+] − E[aX^−]
= a(E[X^+] − E[X^−]) = aE[X].
If a < 0, then
(aX)^+ = aX I{aX ≥ 0} = (−a)(−X) I{X ≤ 0} = (−a)X^−.
Similarly, (aX)^− = (−a)X^+. Therefore,
E[aX] = E[(aX)^+] − E[(aX)^−]
= E[(−a)X^−] − E[(−a)X^+]
= (−a)(E[X^−] − E[X^+]) = aE[X].
(v)
However,
(vi) First assume E[X] = 0 (with X ≥ 0 a.s.); we will show that X = 0 a.s. Suppose that X = 0 a.s. is NOT true, i.e.,
P(X = 0) < 1 ⇒ F_X(0) = P(X ≤ 0) = P(X = 0) < 1
⇒ ∃ε > 0 s.t. F_X(ε) < 1 (since F_X is right continuous)
⇒ P(X > ε) > 0.
Therefore,
E[X] ≥ E[X I{X > ε}] ≥ ε P(X > ε) > 0, contradicting E[X] = 0.
Define Z = X − Y .
P (|Z| = 0) = 1 ⇒ P (Z + + Z − = 0) = 1 ⇒ P (Z + = 0) = P (Z − = 0) =
1.
From (vi), we have E[Z] = E[Z + ] − E[Z − ] = 0.
By linearity, we have E[X] = E[Y ].
(ix) |E[X]| < ∞ ⇔ E[X + ] < ∞ and E[X − ] < ∞ ⇔ E[|X|] < ∞.
Also, there are some useful limiting properties of the expectation. They are
stated in the following theorem and the proofs of them are omitted.
Theorem 4.2.2.
4.3 Integration
4.3.1 Definition
In the previous section, we defined the integration of a random variable over
a probability space. Now, we can extend the corresponding definition to a
measurable function over a general measure space (Ω, A, µ). Here, µ is not necessarily a probability measure (i.e., we may have µ(Ω) ≠ 1).
(b) If f ≥ 0, define
∫_Ω f dµ = lim_{n→∞} ∫_Ω f_n dµ,
Remark
(i) Some of the properties of the expectation in Theorems 4.2.1 and 4.2.2
are also valid for the integration of measurable functions.
Exercise Please list those properties which are valid for the integral of
f in (4.4).
Then
E[g(X)] = ∫_R g(y) P_X(dy).
Proof.
Case I: Indicator functions.
If g = IB with B ∈ B, then the relevant definitions show
For the general case, we can write g(x) = g^+(x) − g^−(x). The condition that g is integrable guarantees that E[g^+(X)] < ∞ and E[g^−(X)] < ∞. So from Case III for nonnegative functions and the linearity of expected value and integration,
E[g(X)] = E[g^+(X)] − E[g^−(X)] = ∫_R g^+(y) P_X(dy) − ∫_R g^−(y) P_X(dy) = ∫_R g(y) P_X(dy).
Proof of (4.7). We shall employ the same method used in Theorem 4.4.1.
Case I: Indicator functions
If g = I_B with B ∈ B, then
LHS = ∫_R I_B(x) P_X(dx) = P(X ∈ B) = P_X(B) = ∫_R I_B(y) f(y) dy = RHS,
Case II: Simple functions
If g = Σ_{i=1}^n b_i I_{B_i} with B_i ∈ B and b_i ∈ R, then the linearity of expected value, the result of Case I, and the linearity of integration imply
LHS = ∫_R (Σ_{i=1}^n b_i I_{B_i}(x)) P_X(dx) = Σ_{i=1}^n b_i ∫_R I_{B_i}(x) P_X(dx)
= Σ_{i=1}^n b_i ∫_R I_{B_i}(y) f(y) dy = ∫_R Σ_{i=1}^n b_i I_{B_i}(y) f(y) dy
= ∫_R g(y) f(y) dy = RHS.
we get
LHS = lim_{n→∞} ∫_R g_n(y) P_X(dy) = lim_{n→∞} ∫_R g_n(y) f(y) dy
= ∫_R g(y) f(y) dy = RHS.
Remark
(ii) The last integral in (i) is Lebesgue integral, which equals Riemann
integral when the latter exists. This will make our calculations much
easier.
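As a quick numerical illustration of (4.7), the sketch below (assuming Python with numpy; the particular density and function g are our own choices, not from the notes) takes X exponential with rate β = 2 and g(y) = y², and compares the integral ∫ g(y) f(y) dy with a Monte Carlo average of g(X); both should be close to E[g(X)] = 2/β² = 0.5.

```python
import numpy as np

beta = 2.0
g = lambda y: y**2
f = lambda y: beta * np.exp(-beta * y)             # density of X (zero for y < 0)

y = np.linspace(0.0, 40.0, 400_001)                # truncate far into the tail
dy = y[1] - y[0]
integral = np.sum(g(y) * f(y)) * dy                # Riemann sum for ∫ g(y) f(y) dy

rng = np.random.default_rng(3)
mc = g(rng.exponential(scale=1.0 / beta, size=1_000_000)).mean()

print(integral, mc)                                # both close to 0.5
```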
Proof. Clearly, g(X) is a r.v. taking values g(x_1), g(x_2), · · ·, and we can write
g(X) = Σ_{k=1}^∞ g(x_k) I{X = x_k}.
Case I:
If g(X) ≥ 0, then g(x_k) ≥ 0 for all k ≥ 1. Define
Z_n = Σ_{k=1}^n g(x_k) I{X = x_k}, a form of truncated r.v.
Case II:
Let us consider general g. It follows from Case I and the assumption that
E[|g(X)|] = Σ_{k=1}^∞ p_k |g(x_k)| < ∞. Therefore,
4.5 Moments
Definition Let X be a r.v. and r > 0,
(1) Define
rth Moment: E[X r ].
rth Absolute Moment: E[|X|r ].
rth Central Moment: E[(X − E[X])r ].
rth Absolute Central Moment: E[|X − E[X]|r ].
The following theorem links the probability and moment of a random vari-
able.
Theorem 4.5.1. Chebyshev (Markov) inequality If g is strictly in-
creasing and positive on (0, ∞), g(x) = g(−x), and X is a r.v. such that
E[g(X)] < ∞, then for each a > 0:
P(|X| ≥ a) ≤ E[g(X)] / g(a).
Proof.
E[g(X)] ≥ E[g(X)I{g(X)≥g(a)} ]
≥ g(a)E[I{g(X)≥g(a)} ]
= g(a)P (g(X) ≥ g(a))
= g(a)P (|X| ≥ a).
Example
(i) X ∈ L^1 ⇒ P(|X| ≥ a) ≤ E[|X|] / a.
(ii) X ∈ L^p ⇒ P(|X| ≥ a) ≤ E[|X|^p] / a^p.
(iii) X ∈ L^2 ⇒ P(|X − E[X]| ≥ a) ≤ Var(X) / a^2.
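These bounds are easy to check empirically. A quick sketch (assuming Python with numpy; not part of the notes), using X standard normal so that Var(X) = 1:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)
for a in (1.0, 2.0, 3.0):
    tail = np.mean(np.abs(x - x.mean()) >= a)      # estimated P(|X - E[X]| >= a)
    print(f"a = {a}: empirical tail {tail:.4f} <= Chebyshev bound {1.0 / a**2:.4f}")
```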
Definition If E[X], E[Y] and E[XY] are finite, then the covariance of X and Y is defined as
Cov(X, Y) = E[XY] − E[X]E[Y].
If X and Y are independent integrable r.v.'s, then
E[XY] = E[X]E[Y].
Therefore,
E[XY] = Σ_{i=1}^n Σ_{j=1}^m a_i b_j P(A_i ∩ B_j) = Σ_{i=1}^n Σ_{j=1}^m a_i b_j P(A_i) P(B_j) = E[X]E[Y].
Step 2.
Let ⌊y⌋ be the integer part of y.
(a) limn→∞ E[Xn ] = E[X] and limn→∞ E[Yn ] = E[Y ], by the Monotone
Convergence Theorem.
(b) 0 ≤ XY − Xn Yn = X(Y − Yn ) + Yn (X − Xn ) → 0 a.s., i.e., 0 ≤
Xn Yn ↑ XY .
Step 3.
For general integrable r.v.’s, note that independence of X and Y implies that
of X + and Y + ; X − and Y − ; and so on. Therefore,
E[XY ] = E[(X + − X − )(Y + − Y − )]
= E[X + Y + ] − E[X + Y − ] − E[X − Y + ] + E[X − Y − ]
= E[X + ]E[Y + ] − E[X + ]E[Y − ] − E[X − ]E[Y + ] + E[X − ]E[Y − ]
= (E[X + ] − E[X − ])(E[Y + ] − E[Y − ])
= E[X]E[Y ].
(b) X_n → X in r-th mean, or in L^r, where 1 ≤ r < ∞, written as X_n →^{L^r} X, if
lim_{n→∞} E[|X_n − X|^r] = 0.
(c) X_n → X in probability, written as X_n →^p X, if
lim_{n→∞} P(|X_n − X| > ε) = 0, for any ε > 0.
Theorem 5.1.1. X_n →^{a.s.} X if and only if for any ε > 0,
lim_{n→∞} P(∩_{m=n}^∞ {|X_m − X| < ε}) = 1.    (5.1)
which implies that X_n →^{a.s.} X.
Theorem 5.1.2. X_n →^p X if and only if
lim_{n→∞} E[|X_n − X| / (1 + |X_n − X|)] = 0.
Therefore,
0 ≤ E[|X_n| / (1 + |X_n|)] ≤ E[I{|X_n| > ε}] + ε = P(|X_n| > ε) + ε.
Taking limits and using the assumption lim_{n→∞} P(|X_n| > ε) = 0 yield
0 ≤ lim_{n→∞} E[|X_n| / (1 + |X_n|)] ≤ ε;
since ε is arbitrary, we have lim_{n→∞} E[|X_n| / (1 + |X_n|)] = 0.
Next suppose lim_{n→∞} E[|X_n| / (1 + |X_n|)] = 0. The function f(x) = x/(1 + x) is strictly increasing. Therefore
When no limit is specified, the next theorem is useful to check whether the
sequence of r.v.’s converges almost surely or not.
Theorem 5.1.3. (Cauchy Criterion of a.s.) Xn converges a.s. if and
only if
or equivalently,
lim_{M→∞} P(sup_{m,n≥M} |X_m − X_n| > ε) = 0, for any ε > 0.
(b) If X_n →^{a.s.} X, then X_n →^p X.
Proof.
0 ≤ P(|X_n − X| ≥ ε) ≤ E[|X_n − X|^r] / ε^r → 0, as n → ∞.
Thus, X_n →^p X.
(b) Since |X_n − X|/(1 + |X_n − X|) →^{a.s.} 0 and |X_n − X|/(1 + |X_n − X|) ≤ 1 always, by the Dominated Convergence Theorem (see Theorem 4.2.2 (iii)), we have
lim_{n→∞} E[|X_n − X|/(1 + |X_n − X|)] = E[lim_{n→∞} |X_n − X|/(1 + |X_n − X|)] = E[0] = 0.
Remark The following examples show the converses in Theorem 5.2.1 may not hold.
(i) (X_n →^p X does not imply X_n →^{L^r} X)
From Theorem 5.1.2, we have X_n →^p 0.
However,
This case shows that X_n →^p 0 does not imply X_n →^{L^1} 0.
(ii) (X_n →^p X does not imply X_n →^{a.s.} X)
(iii) ("a.s. convergence" and "L^r convergence" do not imply each other)
(a) (X_n →^{a.s.} X does not imply X_n →^{L^r} X)
So,
lim_{n→∞} P(∩_{m=n}^∞ {|X_m − 0| < ε}) = lim_{n→∞} (1 − P(∪_{m=n}^∞ {|X_m − 0| ≥ ε})) = 1.
By Theorem 5.1.1, we have X_n →^{a.s.} 0.
Since E|X_n − 0| = n → ∞ as n → ∞, X_n does not converge to 0 in L^1.
(b) (X_n →^{L^r} X does not imply X_n →^{a.s.} X)
Consider
E[(S_n/n − E[X_1])^2] = E[(S_n/n − E[S_n/n])^2]
= Var(S_n/n)
= (1/n^2) Σ_{i=1}^n Var(X_i)    (since the X_i are independent)
= Var(X_1)/n    (since the X_i are identically distributed)
→ 0 as n → ∞.    (since Var(X_1) is finite)
Therefore, we have S_n/n →^{L^2} E[X_1].
Suppose X_i ≥ 0 for i = 1, 2, · · ·.
Define Y_k = X_k I{X_k ≤ k} and their partial sums S*_n = Σ_{k=1}^n Y_k.
For a fixed α > 1, let u_n = ⌊α^n⌋ (the integer part of α^n). First, we want to show
Σ_{n=1}^∞ P(|S*_{u_n} − E[S*_{u_n}]| / u_n > ε) < ∞.    (6.2)
Var[S*_n] = Σ_{k=1}^n Var[Y_k] ≤ Σ_{k=1}^n E[Y_k^2]
= Σ_{k=1}^n E[X_1^2 I{X_1 ≤ k}] ≤ n E[X_1^2 I{X_1 ≤ n}].
Σ_{n=1}^∞ P(|S*_{u_n} − E[S*_{u_n}]| / u_n > ε) ≤ Σ_{n=1}^∞ Var[S*_{u_n}] / (ε^2 u_n^2)
≤ (1/ε^2) E[X_1^2 Σ_{n=1}^∞ (1/u_n) I{X_1 ≤ u_n}].
Let K = 2α/(α − 1), and suppose x > 0. Let N = min{n : u_n ≥ x}. Then α^N ≥ x, and since y ≤ 2⌊y⌋ for y ≥ 1,
Σ_{u_n ≥ x} u_n^{-1} ≤ 2 Σ_{n ≥ N} α^{-n} = Kα^{-N} ≤ Kx^{-1}.
Therefore, Σ_{n=1}^∞ u_n^{-1} I{X_1 ≤ u_n} ≤ K X_1^{-1} for X_1 > 0, and the sum in (6.2) is at most Kε^{-2} E[X_1] < ∞.
Let A_n = {|S*_{u_n} − E[S*_{u_n}]| / u_n > ε} := {ω ∈ Ω : |S*_{u_n}(ω) − E[S*_{u_n}]| / u_n > ε}.
P({X_n ≠ Y_n}, i.o.) = 0
⇒ P({X_n = Y_n}, utl.) = 1 − P({X_n ≠ Y_n}, i.o.) = 1
S_{u_n} ≤ S_k ≤ S_{u_{n+1}}
⇒ S_{u_n}/k ≤ S_k/k ≤ S_{u_{n+1}}/k
⇒ S_{u_n}/u_{n+1} ≤ S_{u_n}/k ≤ S_k/k ≤ S_{u_{n+1}}/k ≤ S_{u_{n+1}}/u_n
⇒ (u_n/u_{n+1}) · S_{u_n}/u_n ≤ S_k/k ≤ (u_{n+1}/u_n) · S_{u_{n+1}}/u_{n+1}.
But u_{n+1}/u_n → α, and so it follows from (6.3) that
(1/α) E[X_1] ≤ lim_{k→∞} S_k/k ≤ α E[X_1]
Example
Let {X_j, j ≥ 1} be a sequence of i.i.d. Bernoulli r.v.'s with P(X_j = 1) = p and P(X_j = 0) = 1 − p. Then
S_n/n →^{a.s.} E[X_1] = p.
The result shows that, in the long run, the fraction of successes in n trials converges almost surely to the probability of success of each trial.
In particular,
S_n/n →^p E[X_1].
This is the weak law of large numbers.
The weak law of large numbers can also be proved by using the Chebyshev inequality (Theorem 4.5.1) as follows:
P(|S_n/n − E[X_1]| ≥ ε) ≤ Var[S_n/n] / ε^2 = Var(X_1) / (nε^2) → 0 as n → ∞.
Therefore, we have
S_n/n →^p E[X_1].
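A one-path simulation of the Bernoulli example (assuming Python with numpy; not part of the notes) shows S_n/n settling down to p:

```python
import numpy as np

rng = np.random.default_rng(5)
p, N = 0.3, 100_000
x = rng.random(N) < p                              # X_1, ..., X_N i.i.d. Bernoulli(p)
running = np.cumsum(x) / np.arange(1, N + 1)       # S_n / n along one sample path
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, running[n - 1])                       # approaches p = 0.3
```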
Chapter 7
The Central Limit Theorem
Remark
(b) If X is absolutely continuous and has the density function f (x), then
ϕ(t) in (7.1) can be written as
ϕ(t) = ∫_R e^{itx} f(x) dx.    (7.3)
(i) ϕ(0) = 1.
(iv)
|ϕ(t)| = |E[eitX ]|
≤ E[|eitX |] (by Theorem 4.2.1 (iii).)
= E[1] = 1
When n = 0,
∫_0^x e^{is} ds = x + i ∫_0^x (x − s) e^{is} ds
⇒ e^{ix} = 1 + ix + i^2 ∫_0^x (x − s) e^{is} ds.
When n = 1,
∫_0^x (x − s) e^{is} ds = x^2/2 + (i/2) ∫_0^x (x − s)^2 e^{is} ds.
By considering the integral in (7.6), since |e^{is}| ≤ 1 for all s, it follows that
|(i^{n+1}/n!) ∫_0^x (x − s)^n e^{is} ds| ≤ (1/n!) |∫_0^x (x − s)^n ds| = |x|^{n+1}/(n + 1)!.    (7.8)
For the integral in (7.7), since |e^{is} − 1| ≤ 2 for all s, it follows that
|(i^n/(n − 1)!) ∫_0^x (x − s)^{n−1} (e^{is} − 1) ds| ≤ (2/(n − 1)!) |∫_0^x (x − s)^{n−1} ds| = 2|x|^n/n!.    (7.9)
for n ≥ 0.
Remark Let
R(x) = (|x|^{n+1}/(n + 1)!) / (2|x|^n/n!) = |x| / (2(n + 1)).
If n is fixed, then
(i) R(x) > 1 when x is large, so in that case the bound 2|x|^n/n! is the sharper one.
(ii) On the other hand, R(x) < 1 when x is small. So, when x is small,
|e^{ix} − Σ_{k=0}^n (ix)^k/k!| ≤ |x|^{n+1}/(n + 1)!.
Theorem 7.1.2. If X has a moment of order n (i.e., E[|X|^n] < ∞), then
(i)
|ϕ(t) − Σ_{k=0}^n ((it)^k/k!) E[X^k]| ≤ E[min{|tX|^{n+1}/(n + 1)!, 2|tX|^n/n!}].    (7.11)
In particular, if E[X^2] < ∞, then
ϕ(t) = 1 + itE[X] − (t^2/2) E[X^2] + o(t^2), as t → 0.
[Recall: a function g is o(u) if lim_{u→0} g(u)/u = 0.]
(ii)
∂^n ϕ(t)/∂t^n = ϕ^{(n)}(t) = i^n E[X^n e^{itX}].
In particular, ϕ^{(n)}(0) = i^n E[X^n].
Proof.
Using the fact that |E[Y ]| ≤ E[|Y |] for any r.v. Y , (7.11) can then be
proved.
When n = 2, the RHS of (7.11) is given by
E[min{|tX|^3/3!, 2|tX|^2/2!}] = (t^2/6) E[min{|t||X|^3, 6|X|^2}].
Noting that,
(a) min{|t||X|^3, 6|X|^2} →^{a.s.} 0 as t → 0, and
(b) min{|t||X|^3, 6|X|^2} ≤ 6|X|^2.
With (a), (b) and E[|X|^2] < ∞, by the dominated convergence theorem (Theorem 4.2.2 (iii)), we have
E[min{|tX|^3/3!, 2|tX|^2/2!}] = o(t^2).
Therefore, we have
ϕ(t) = 1 + itE[X] − (t^2/2) E[X^2] + o(t^2), as t → 0.
Consider
|e^{itX} ((e^{ihX} − 1 − ihX)/h)| = |(e^{ihX} − 1 − ihX)/h|
≤ (1/|h|) min{(hX)^2/2, 2|hX|}    (from Lemma 7.1.1)
= min{hX^2/2, 2|X|}.
Noting that,
(a) min{hX^2/2, 2|X|} →^{a.s.} 0 as h → 0, and
(b) min{hX^2/2, 2|X|} ≤ 2|X|.
With (a), (b) and E[|X|] < ∞, by the dominated convergence theorem (Theorem 4.2.2 (iii)), we can show that
lim_{h→0} E[e^{itX} ((e^{ihX} − 1 − ihX)/h)] = 0.
Hence, from (7.12),
ϕ'(t) = E[iX e^{itX}] and ϕ'(0) = iE[X].
Theorem 7.1.3. (Inversion Theorem) Let ϕ(t) = ∫_R e^{itx} P_X(dx) be the characteristic function of the r.v. X with distribution P_X. If a < b, then
lim_{T→∞} (1/2π) ∫_{−T}^T ((e^{−ita} − e^{−itb})/(it)) ϕ(t) dt = P_X((a, b)) + (1/2) P_X({a, b}).
Example
P(X = k) = p_k = e^{−λ} λ^k / k!, k = 0, 1, 2, · · ·.
ϕ(t) = E[e^{itX}] = Σ_{k=0}^∞ e^{itk} e^{−λ} λ^k / k!
= e^{−λ} Σ_{k=0}^∞ (e^{it} λ)^k / k!
= e^{−λ} e^{λ e^{it}} = e^{λ(e^{it} − 1)}.
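The closed form just derived is easy to check numerically. In the sketch below (assuming Python with numpy; not part of the notes) the series Σ_k e^{itk} p_k, truncated far in the tail, is compared with e^{λ(e^{it} − 1)}:

```python
import numpy as np

lam = 3.0
K = 100                                            # truncation point of the series
pk = np.empty(K)
pk[0] = np.exp(-lam)
for k in range(1, K):
    pk[k] = pk[k - 1] * lam / k                    # Poisson p.m.f., computed recursively

for t in (0.5, 1.0, 2.0):
    series = np.sum(np.exp(1j * t * np.arange(K)) * pk)
    closed = np.exp(lam * (np.exp(1j * t) - 1.0))
    print(t, abs(series - closed))                 # differences are ~0
```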
2. Uniform: f(x) = 1 for 0 < x < 1; ϕ(t) = (e^{it} − 1)/(it).
4. Cauchy: f(x) = 1/(π(1 + x^2)) for −∞ < x < ∞; ϕ(t) = e^{−|t|}.
(F_X(x + h) − F_X(x)) / h
= (P_X((−∞, x + h]) − P_X((−∞, x])) / h
= P_X((x, x + h]) / h
= (1/(2πh)) ∫_{−∞}^∞ ((e^{−itx} − e^{−it(x+h)})/(it)) ϕ(t) dt    (since P_X({x}) = 0 for all x ∈ R)
= (1/(2πh)) ∫_{−∞}^∞ (∫_x^{x+h} e^{−ity} dy) ϕ(t) dt
= (1/h) ∫_x^{x+h} ((1/2π) ∫_{−∞}^∞ e^{−ity} ϕ(t) dt) dy
→ (1/2π) ∫_{−∞}^∞ e^{−itx} ϕ(t) dt ≡ f(x) as h → 0.
Thus
dF_X(x)/dx = f(x) or F_X(x) = ∫_{−∞}^x f(y) dy + C.
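The last display can be tried out numerically. The sketch below (assuming Python with numpy; not part of the notes) inverts the characteristic function ϕ(t) = e^{−t²/2} and recovers the standard normal density at a few points:

```python
import numpy as np

t = np.linspace(-40.0, 40.0, 80_001)
dt = t[1] - t[0]
phi = np.exp(-t**2 / 2.0)                          # characteristic function of N(0, 1)
for x in (0.0, 1.0, 2.0):
    f_x = np.real(np.sum(np.exp(-1j * t * x) * phi)) * dt / (2.0 * np.pi)
    exact = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
    print(x, f_x, exact)                           # the two values agree
```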
Theorem 7.1.5. (Slutsky Theorem) Let C be a constant. If X_n →^d X and Y_n →^{a.s.} C, then
(a) X_n + Y_n →^d X + C.
(b) X_n Y_n →^d XC.
(c) X_n / Y_n →^d X / C (provided C ≠ 0).
By Theorem 7.1.2,
ϕ_{X'_1}(t) = E[e^{itX'_1}] = 1 − σ^2 t^2/2 + o(t^2).
ϕ_{S'_n}(t) = E[e^{it (1/(σ√n)) Σ_{j=1}^n X'_j}] = ∏_{j=1}^n E[e^{it X'_j/(σ√n)}]
= ∏_{j=1}^n ϕ_{X'_1}(t/(σ√n))
= (ϕ_{X'_1}(t/(σ√n)))^n
= (1 − σ^2 t^2/(2σ^2 n) + o(t^2/(σ^2 n)))^n
= (1 − t^2/(2n) + o(t^2/n))^n → ϕ(t) ≡ e^{−t^2/2} as n → ∞.
It is obvious that ϕ(t) is continuous at 0. By Theorem 7.1.6,
S'_n ⇒ N(0, 1).
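A direct simulation of S'_n (assuming Python with numpy; the exponential summands are our own choice, not from the notes) confirms the normal limit:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 2_000, 50_000
x = rng.exponential(scale=1.0, size=(reps, n))     # E[X_1] = 1, Var(X_1) = 1
s = (x.sum(axis=1) - n) / np.sqrt(n)               # S_n' = (S_n − n E[X_1]) / (σ√n)
print("mean ≈ 0:", s.mean(), "  var ≈ 1:", s.var())
print("P(S_n' <= 1.96) ≈", np.mean(s <= 1.96), "(standard normal value ≈ 0.975)")
```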
Example
Let X_1, X_2, · · · be i.i.d. r.v.'s with P(X_j = 1) = p and P(X_j = 0) = q = 1 − p. Then S_n = Σ_{j=1}^n X_j follows a binomial distribution with parameters n (no. of trials) and p (prob. of success), denoted as S_n ∼ B(n, p).
Chapter 8
Martingales
8.1 Conditional Expectation
Given a probability space (Ω, F_0, P), a σ-field F ⊂ F_0 and a r.v. X ∈ F_0 with E[|X|] < ∞, the conditional expectation E[X|F] is defined to be any r.v. Y such that
(i) Y ∈ F;
(ii) ∀A ∈ F, ∫_A X dP = ∫_A Y dP.
Definition Let µ and ν be measures on (Ω, F). The measure ν is said to be absolutely continuous with respect to µ if, for A ∈ F,
µ(A) = 0 ⇒ ν(A) = 0.
Notation: ν ≪ µ.
Similarly, we have
P(Y' − Y > 0) = 0.
Thus,
P(Y = Y') = 1, i.e., Y = Y' a.s.
(b) Suppose first that X ≥ 0. Let µ = P and
ν(A) = ∫_A X dP for A ∈ F.
Taking A = Ω, we see that dν/dµ ≥ 0 is integrable, and we have shown that dν/dµ is a version of E[X|F].
To treat the general case, write X = X^+ − X^−, and let Y_1 = E[X^+|F] and Y_2 = E[X^−|F]. Now Y_1 − Y_2 ∈ F is integrable, and for all A ∈ F we have
∫_A X dP = ∫_A X^+ dP − ∫_A X^− dP
= ∫_A Y_1 dP − ∫_A Y_2 dP
= ∫_A (Y_1 − Y_2) dP.
8.1.1 Examples
Intuitively, we think of F as describing the information we have at our dis-
posal - for each A ∈ F, we know whether or not A has occurred. E[X|F] is
then our ”best guess” of the value of X given the information we have.
We want to show that E[X|F] = E[X]; i.e., if we don’t know anything about
X, then the best guess is the mean E[X].
E[X|F] = E[X I_{Ω_i}] / P(Ω_i) on Ω_i.
Proof. Observe that E[X I_{Ω_i}] / P(Ω_i) is constant on each Ω_i, so it is measurable with respect to F. To verify (ii), it is enough to check the equality for A = Ω_i, and this is trivial:
∫_{Ω_i} (E[X I_{Ω_i}] / P(Ω_i)) dP = E[X I_{Ω_i}] = ∫_{Ω_i} X dP.
A degenerate but important special case is F = {∅, Ω}, the trivial σ-field.
In this case, E[X|F] = E[X].
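The partition example translates directly into a simulation. In the sketch below (assuming Python with numpy; not part of the notes) the σ-field F is generated by a three-cell partition, E[X|F] is estimated cell by cell, and the identity E[E[X|F]] = E[X] is checked:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1_000_000
cell = rng.integers(0, 3, size=N)                  # which cell Ω_i of the partition ω falls in
x = rng.standard_normal(N) + cell                  # a r.v. X with E[X | Ω_i] = i

cond = np.zeros(N)
for i in range(3):
    idx = cell == i
    cond[idx] = x[idx].mean()                      # estimate of E[X I_{Ω_i}] / P(Ω_i) on Ω_i

print([round(x[cell == i].mean(), 3) for i in range(3)])   # ≈ [0, 1, 2]
print("E[E[X|F]] ≈ E[X]:", cond.mean(), x.mean())
```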
Let B = G. Then
∫_G P(A|G) dP = P(A ∩ G).
Let B = Ω. Then
∫_Ω P(A|G) dP = P(A).
Therefore, we have
P(G|A) = P(A ∩ G)/P(A) = ∫_G P(A|G) dP / ∫_Ω P(A|G) dP.
P(A|G) = E(I_A|G) = E(I_{A∩G_i}) / P(G_i) = P(A ∩ G_i) / P(G_i) on G_i.
P(G_i|A) = ∫_{G_i} P(A|G) dP / ∫_Ω P(A|G) dP
= P(A ∩ G_i) / Σ_{j=1}^∞ ∫_{G_j} P(A|G) dP
= P(A ∩ G_i) / Σ_{j=1}^∞ P(A ∩ G_j)    (Bayes formula)
∫_A h(Y) dP = ∫_Ω h(Y) I_{(Y∈B)} dP = E[h(Y) I_{(Y∈B)}]
= ∫_{−∞}^∞ dx ∫_{−∞}^∞ h(y) I_{(y∈B)} f(x, y) dy
= ∫_B dy ∫_{−∞}^∞ h(y) f(x, y) dx
= ∫_B dy · (∫_{−∞}^∞ g(x) f(x, y) dx / ∫_{−∞}^∞ f(x, y) dx) · ∫_{−∞}^∞ f(x, y) dx
= ∫_B dy ∫_{−∞}^∞ g(x) f(x, y) dx
= ∫_{−∞}^∞ ∫_{−∞}^∞ g(x) f(x, y) I_{(y∈B)} dx dy
= E[g(X) I_B(Y)] = ∫_A g(X) dP.
Therefore,
E[g(X)|Y] = h(Y) = ∫ g(x) f(x, Y) dx / ∫ f(x, Y) dx.
f(x|y) = f(x, y) / ∫ f(x, y) dx
Proof. It is clear that g(X) ∈ σ(X). To check (ii), note that if A ∈ σ(X), then there exists C ∈ B so that A = {X ∈ C}. Then
∫_A ϕ(X, Y) dP = ∫_{X∈C} ϕ(X, Y) dP = E[ϕ(X, Y) I{X∈C}]
= ∫∫ ϕ(x, y) I{x∈C} ν(dy) µ(dx)
= ∫ I{x∈C} [∫ ϕ(x, y) ν(dy)] µ(dx)
= ∫ I{x∈C} g(x) µ(dx)
= ∫_C g(x) µ(dx) = ∫_A g(X) dP
⇒ g(X) = E[ϕ(X, Y)|X].
8.1.2 Properties
Theorem 8.1.3. (a) Linearity
E[aX + bY|F] = aE[X|F] + bE[Y|F].
Proof. (a) For (i), since E[X|F], E[Y|F] ∈ F, we have aE[X|F] + bE[Y|F] ∈ F.
To check (ii), if A ∈ F, then
∫_A (aX + bY) dP = a ∫_A X dP + b ∫_A Y dP
= a ∫_A E[X|F] dP + b ∫_A E[Y|F] dP
= ∫_A (aE[X|F] + bE[Y|F]) dP.
Let ε → 0. We have
E(Y_n|F) ↓ 0.
Proof. Omitted.
(b)
|E[X|F]|^p ≤ E[|X|^p | F] for p ≥ 1.
Let A = Ω. Then
E[E(Y|F)] = ∫_Ω E[Y|F] dP = ∫_Ω Y dP = E[Y].
∫_A XY dP = ∫_{A∩B} Y dP = ∫_{A∩B} E[Y|F] dP
= ∫_A I_B E[Y|F] dP = ∫_A X E[Y|F] dP
⇒ E[XY|F] = XE[Y|F]
Case 2:
If X, Y ≥ 0, let {X_n, n ≥ 1} be a sequence of simple r.v.'s with X_n ↑ X. Using the Monotone Convergence Theorem,
∫_A X E[Y|F] dP = lim_{n→∞} ∫_A X_n E[Y|F] dP
= lim_{n→∞} ∫_A E[X_n Y|F] dP
= lim_{n→∞} ∫_A X_n Y dP
= ∫_A XY dP = ∫_A E[XY|F] dP
X = X+ − X− and Y =Y+−Y−
E[XY |F] = E[X + Y + |F] − E[X − Y + |F] − E[X + Y − |F] + E[X − Y − |F]
= X + E[Y + |F] − X − E[Y + |F] − X + E[Y − |F] + X − E[Y − |F]
= XE[Y |F]
E[(X − Y)^2]
Proof.
So
8.2 Martingales
Definition Let {F_n : n ≥ 1} be a sequence of σ-fields. {F_n : n ≥ 1} is called a filtration if
F_1 ⊆ F_2 ⊆ F_3 ⊆ · · ·
E[X_{m+k}|F_m] = E[E[X_{m+k}|F_{m+k−1}]|F_m] ≤ E[X_{m+k−1}|F_m] ≤ · · · ≤ E[X_m|F_m] = X_m
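To close, a small simulation (assuming Python with numpy; not part of the notes) of the martingale property for the symmetric random walk X_n = ξ_1 + · · · + ξ_n with F_n = σ(ξ_1, · · ·, ξ_n): conditioning on the value of X_n, the average of X_{n+1} comes out equal to that value.

```python
import numpy as np

rng = np.random.default_rng(8)
paths, n = 500_000, 10
steps = rng.choice([-1, 1], size=(paths, n + 1))   # i.i.d. steps ξ with P(ξ = ±1) = 1/2
walk = steps.cumsum(axis=1)                        # walk[:, k-1] = X_k on each path

xn, xnext = walk[:, n - 1], walk[:, n]             # X_n and X_{n+1}
for v in (-2, 0, 2):
    sel = xn == v
    print(v, round(xnext[sel].mean(), 3))          # ≈ v: the martingale property E[X_{n+1}|F_n] = X_n
```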