STAT2602 tutorial 2
Name: Ying Li, Sanyou Wu, Zhenghao Li
The University of Hong Kong
This week:
1. Types of convergence
2. Some examples
3. Exercises
4. Appendix (on almost sure convergence)
1 Types of convergence
Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables and $X$ be another random variable.
In statistics, we have 4 different types of convergence of random variables.
1. Almost sure convergence:
$$X_n \xrightarrow{a.s.} X \quad \text{if} \quad P\Big(\lim_{n\to\infty} X_n = X\Big) = 1. \tag{1}$$
2. Mean square convergence:
$$X_n \xrightarrow{m.s.} X \quad \text{if} \quad \lim_{n\to\infty} E\big[(X_n - X)^2\big] = 0. \tag{2}$$
3. Convergence in probability (PS: in STAT2602, we mainly use this type):
$$X_n \xrightarrow{p} X \quad \text{if for any } \epsilon > 0, \quad \lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0. \tag{3}$$
4. Convergence in distribution:
$$X_n \xrightarrow{d} X \quad \text{if} \quad \lim_{n\to\infty} F_n(x) = F(x) \tag{4}$$
for every point $x$ at which $F$ is continuous. Here $F_n$ and $F$ are the CDFs of $X_n$ and $X$, respectively.
Note that a sequence might converge in one sense but not another. Some of these convergence types are "stronger" than others and some are "weaker." By this we mean the following: if Type A convergence is stronger than Type B convergence, then Type A convergence implies Type B convergence.
Figure 1: The stronger types of convergence are on top and, as we move to the bottom, the
convergence becomes weaker.
Theorem 1.1 (Weak Law of Large Numbers, WLLN). Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of i.i.d. random variables with $E(X_i) = \mu$ and $\mathrm{var}(X_i) = \sigma^2$ for every $i$. Then
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{p} \mu. \tag{5}$$
Proof. By Chebyshev's inequality, for any $\epsilon > 0$,
$$P\big(|\bar{X}_n - \mu| > \epsilon\big) \le \frac{\mathrm{var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty.$$
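To see the WLLN and the Chebyshev bound numerically, here is a minimal simulation sketch (assuming Python with NumPy is available; the Exponential(1) distribution, with $\mu = \sigma^2 = 1$, is chosen purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 1.0, 0.1, 2000          # Exponential(1): mean 1, variance 1

for n in [10, 100, 1000, 10000]:
    # draw `reps` independent sample means, each based on n observations
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(xbar - mu) > eps)     # empirical P(|Xbar_n - mu| > eps)
    bound = 1.0 / (n * eps**2)                  # Chebyshev bound sigma^2 / (n eps^2)
    print(f"n={n:6d}  P(|Xbar-mu|>{eps}) ~ {prob:.4f}   Chebyshev bound = {bound:.4f}")

Both the empirical probability and the Chebyshev bound shrink to 0 as n grows, which is exactly the statement of (5).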
2 Some examples
Theorem 2.1 (Continuous mapping theorem). Let $X, X_1, X_2, \ldots$ be random variables and let $g : \mathbb{R} \to \mathbb{R}$ be continuous. Then we have
• $X_n \xrightarrow{a.s.} X \Rightarrow g(X_n) \xrightarrow{a.s.} g(X)$
• $X_n \xrightarrow{p} X \Rightarrow g(X_n) \xrightarrow{p} g(X)$
• $X_n \xrightarrow{d} X \Rightarrow g(X_n) \xrightarrow{d} g(X)$
Examples (a simulation sketch of items 2 and 4 follows the list):
1. $X_n \xrightarrow{p} X \Rightarrow X_n - X \xrightarrow{p} 0$
2. $X_n \xrightarrow{p} 1 \Rightarrow 1/X_n \xrightarrow{p} 1$
3. $X_n \xrightarrow{p} 0,\ Y_n \xrightarrow{p} 0 \Rightarrow X_n Y_n \xrightarrow{p} 0$
4. $X_n \xrightarrow{p} a,\ Y_n \xrightarrow{p} b \Rightarrow X_n Y_n \xrightarrow{p} ab$
5. $X_n \xrightarrow{p} X$ and $Y$ is a random variable $\Rightarrow X_n Y \xrightarrow{p} XY$
6. $X_n \xrightarrow{p} X,\ Y_n \xrightarrow{p} Y \Rightarrow X_n Y_n \xrightarrow{p} XY$ (by the continuous mapping theorem)
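As a quick numerical illustration of items 2 and 4 above, here is a simulation sketch (assuming Python with NumPy; the particular sequences $X_n = 1 + U_n/\sqrt{n}$ and $Y_n = 2 + V_n/\sqrt{n}$ with uniform noise are illustrative assumptions, not part of the theorem):

import numpy as np

rng = np.random.default_rng(1)
eps, reps = 0.05, 20000

for n in [10, 100, 1000]:
    # X_n -> 1 and Y_n -> 2 in probability as n grows
    xn = 1.0 + rng.uniform(-1, 1, reps) / np.sqrt(n)
    yn = 2.0 + rng.uniform(-1, 1, reps) / np.sqrt(n)
    p_inv  = np.mean(np.abs(1/xn - 1) > eps)     # item 2: 1/X_n -> 1
    p_prod = np.mean(np.abs(xn*yn - 2) > eps)    # item 4: X_n Y_n -> 1*2
    print(f"n={n:5d}  P(|1/Xn - 1|>eps) ~ {p_inv:.4f}   P(|XnYn - 2|>eps) ~ {p_prod:.4f}")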
3 Exercises
1. Suppose $X$ and $Y$ are random variables with joint density function $f(x, y) = \sin(x^2)$ for $0 < x < \sqrt{\pi}$ and $0 < y < x$. Define $W_n = I(X > nY)$ for every positive integer $n$.
a. Find the distribution of $W_n$.
b. Show that $W_n$ converges in probability to 0.
c. Show that $E(X)$ exists. (Hint: consider the existence of $E(X^2)$.)
d. Show that $E(X^2 W_n) = E(X^2)\,E(W_n)$. Determine whether $X^2$ and $W_n$ are independent.
Solution:
a. Since $W_n$ takes values only in $\{0, 1\}$, we first calculate $P(W_n = 1)$ and then use $P(W_n = 0) = 1 - P(W_n = 1)$.
$$P(W_n = 1) = P(X > nY) = P\Big(Y < \tfrac{1}{n}X\Big) \tag{6}$$
$$= \int_0^{\sqrt{\pi}}\!\!\int_0^{x/n} f(x, y)\,dy\,dx = \int_0^{\sqrt{\pi}}\!\!\int_0^{x/n} \sin(x^2)\,dy\,dx \tag{7}$$
$$= \int_0^{\sqrt{\pi}} \frac{x}{n}\,\sin(x^2)\,dx = \frac{1}{n}\Big[-\frac{\cos(x^2)}{2}\Big]_0^{\sqrt{\pi}} \tag{8}$$
$$= \frac{1}{2n}\big(\underbrace{-\cos\pi}_{=1} + 1\big) = \frac{1}{n}. \tag{9}$$
Thus, $P(W_n = 1) = \frac{1}{n}$ and $P(W_n = 0) = 1 - \frac{1}{n}$.
b. For any $\epsilon > 0$, $P(|W_n - 0| > \epsilon) \le P(W_n = 1) = \frac{1}{n} \to 0$ as $n \to \infty$, so $W_n \xrightarrow{p} 0$.
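A Monte Carlo sanity check for parts a and b (a sketch assuming Python with NumPy): the marginal density of $X$ is $f_X(x) = x\sin(x^2)$, so $F_X(x) = (1 - \cos(x^2))/2$ on $(0, \sqrt{\pi})$ and $X$ can be sampled by inversion; given $X = x$, $Y$ is uniform on $(0, x)$.

import numpy as np

rng = np.random.default_rng(2)
size = 200_000

# inverse-CDF sampling of X: F_X(x) = (1 - cos(x^2))/2  =>  x = sqrt(arccos(1 - 2U))
u = rng.uniform(size=size)
x = np.sqrt(np.arccos(1 - 2*u))
# given X = x, Y ~ Uniform(0, x)
y = x * rng.uniform(size=size)

for n in [1, 2, 5, 10, 50]:
    wn = (x > n*y)                               # W_n = I(X > nY)
    print(f"n={n:3d}  P(Wn = 1) ~ {wn.mean():.4f}   theory 1/n = {1/n:.4f}")

The empirical frequencies match 1/n, and in particular they decrease to 0, consistent with part b.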
c.
$$E(X^2) = \int_0^{\sqrt{\pi}}\!\!\int_0^{x} x^2 f(x, y)\,dy\,dx = \int_0^{\sqrt{\pi}}\!\!\int_0^{x} x^2 \sin(x^2)\,dy\,dx = \int_0^{\sqrt{\pi}} x^3 \sin(x^2)\,dx$$
$$= \frac{1}{2}\int_0^{\sqrt{\pi}} x^2 \sin(x^2)\,d(x^2) = \frac{1}{2}\int_0^{\pi} u \sin u\,du \quad (\text{let } u = x^2,\ du = 2x\,dx) \tag{10}$$
$$= \frac{1}{2}\Big([-u\cos u]_0^{\pi} + \int_0^{\pi}\cos u\,du\Big) \quad \Big(\text{PS: } \tfrac{d}{du}(-u\cos u) = -\cos u + u\sin u\Big)$$
$$= \frac{1}{2}\big(\pi + [\sin u]_0^{\pi}\big) = \frac{\pi}{2}.$$
By the Cauchy–Schwarz inequality ($E(XY) \le \sqrt{E(X^2)E(Y^2)}$ when both second moments exist), setting $Y = 1$,
$$E(X) \le \sqrt{E(X^2)}, \quad \text{i.e.,} \quad [E(X)]^2 \le E(X^2) = \frac{\pi}{2},$$
so $E(X)$ is finite, i.e., $E(X)$ exists.
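As a numerical cross-check of (10) (a sketch assuming Python with SciPy): the inner dy-integral contributes a factor $x$, so $E(X^2) = \int_0^{\sqrt{\pi}} x^3\sin(x^2)\,dx$ and $E(X) = \int_0^{\sqrt{\pi}} x^2\sin(x^2)\,dx$.

import numpy as np
from scipy.integrate import quad

ex2, _ = quad(lambda x: x**3 * np.sin(x**2), 0, np.sqrt(np.pi))   # E(X^2)
ex,  _ = quad(lambda x: x**2 * np.sin(x**2), 0, np.sqrt(np.pi))   # E(X)

print("E(X^2) ~", round(ex2, 6), "   pi/2 =", round(np.pi/2, 6))
print("E(X)   ~", round(ex, 6),  "   sqrt(pi/2) =", round(np.sqrt(np.pi/2), 6))

The first line confirms $E(X^2) = \pi/2$, and the second shows $E(X)$ is finite with $[E(X)]^2 \le \pi/2$, as required by Cauchy–Schwarz.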
d. Note that $X^2 W_n$ has a mixed distribution: since $X > 0$, $P(X^2 W_n = 0) = P(W_n = 0) = 1 - 1/n$, and $X^2 W_n > 0$ exactly when $W_n = 1$. We compute $E(X^2 W_n)$ directly:
$$E(X^2 W_n) = \int_0^{\sqrt{\pi}}\!\!\int_0^{x} x^2\, I\Big(y < \frac{x}{n}\Big)\sin(x^2)\,dy\,dx = \int_0^{\sqrt{\pi}}\!\!\int_0^{x/n} x^2 \sin(x^2)\,dy\,dx$$
$$= \int_0^{\sqrt{\pi}} x^2 \sin(x^2)\,[y]_0^{x/n}\,dx = \int_0^{\sqrt{\pi}} \frac{x}{n}\,x^2 \sin(x^2)\,dx = \frac{1}{2n}\int_0^{\sqrt{\pi}} x^2 \sin(x^2)\,2x\,dx$$
$$= \frac{1}{2n}\int_0^{\pi} Q \sin Q\,dQ \quad (\text{let } Q = x^2,\ dQ = 2x\,dx;\ x = 0 \Rightarrow Q = 0,\ x = \sqrt{\pi} \Rightarrow Q = \pi)$$
$$= \frac{1}{2n}\Big([-Q\cos Q]_0^{\pi} + \int_0^{\pi}\cos Q\,dQ\Big) \quad \Big(\text{PS: } \tfrac{d}{dQ}(-Q\cos Q) = -\cos Q + Q\sin Q\Big)$$
$$= \frac{1}{2n}\big(\pi + [\sin Q]_0^{\pi}\big) = \frac{\pi}{2n} = E(X^2)\,E(W_n),$$
since
$$E(W_n) = \frac{1}{n}\cdot 1 + \Big(1 - \frac{1}{n}\Big)\cdot 0 = \frac{1}{n}, \tag{11}$$
$$E(X^2) = \frac{\pi}{2}. \tag{12}$$
Although $E(X^2 W_n) = E(X^2)\,E(W_n)$, this alone does not tell us whether the two random variables are independent; to prove independence we have to look at the joint (or conditional) probability distribution. For this question the two random variables are in fact independent: given $X = x$, $Y$ is uniform on $(0, x)$, so $P(W_n = 1 \mid X = x) = \frac{x/n}{x} = \frac{1}{n}$ does not depend on $x$, and hence $W_n$ is independent of $X$ (and therefore of $X^2$).
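The same sampler can be reused to check part d numerically (a sketch assuming Python with NumPy; n = 5 is an arbitrary illustrative choice):

import numpy as np

rng = np.random.default_rng(3)
size = 400_000
u = rng.uniform(size=size)
x = np.sqrt(np.arccos(1 - 2*u))         # X sampled by inversion, as in the check for part a
y = x * rng.uniform(size=size)          # Y | X = x  ~  Uniform(0, x)

n = 5
wn = (x > n*y).astype(float)
print("E(X^2 Wn)   ~", round(float((x**2 * wn).mean()), 4), "   pi/(2n) =", round(np.pi/(2*n), 4))
print("E(X^2)E(Wn) ~", round(float((x**2).mean() * wn.mean()), 4))
med = np.median(x)
print("P(Wn=1 | X < median) ~", round(float(wn[x < med].mean()), 4),
      "   P(Wn=1 | X >= median) ~", round(float(wn[x >= med].mean()), 4))

The last line illustrates the independence argument: the conditional frequency of $\{W_n = 1\}$ is about $1/n = 0.2$ whether $X$ is small or large.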
2. Let $X$ be a positive random variable such that the probability density function of $X$ is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,x}\exp\Big(-\frac{(\ln x)^2}{2}\Big).$$
a. By using the fact that when $x$ is large and $t > 0$, $tx - (\ln x)^2/2 > \ln x$, show that the moment generating function of $X$, $M_X(t)$, does not exist for $t > 0$.
b. Show that $E(X^k) = \exp(k^2/2)$ for any positive integer $k$. Determine whether the existence of moments of every order implies that the moment generating function is infinitely differentiable at $t = 0$.
c. Let $Y$ be a positive random variable such that the probability density function of $Y$ is
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,y}\exp\Big(-\frac{(\ln y)^2}{2}\Big)\big(1 + \sin(2\pi\ln y)\big).$$
Show that $E(Y^k) = E(X^k)$ for any positive integer $k$. Does your answer contradict the one-to-one correspondence between moment generating functions and distributions?
Solution:
a.
$$E(e^{tX}) = \int_0^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}\,x}\exp\Big(-\frac{(\ln x)^2}{2}\Big)dx$$
$$= \int_0^{k} e^{tx}\,\frac{1}{\sqrt{2\pi}\,x}\exp\Big(-\frac{(\ln x)^2}{2}\Big)dx + \int_k^{\infty}\frac{1}{\sqrt{2\pi}\,x}\exp\Big(tx - \frac{(\ln x)^2}{2}\Big)dx. \tag{13}$$
Note that when $t > 0$ and $x > k$ (for a suitable $k$), we have
$$tx - \frac{(\ln x)^2}{2} > \ln x, \tag{14}$$
i.e.,
$$tx > \ln x + \frac{(\ln x)^2}{2}. \tag{15}$$
To see why such a $k$ exists, take the derivative with respect to $x$:
$$\text{derivative of the left-hand side} = t, \tag{16}$$
$$\text{derivative of the right-hand side} = \frac{1}{x} + \frac{\ln x}{x} = \frac{1 + \ln x}{x}. \tag{17}$$
Both derivatives are positive for $x > 1$, so both sides are increasing. However, the derivative of the right-hand side decreases to 0 as $x$ increases, while that of the left-hand side stays at $t > 0$, so from some point on the left-hand side increases strictly faster than the right-hand side. In other words, there must exist a $k$ such that $tx - \frac{(\ln x)^2}{2} > \ln x$ for all $x > k$.
Considering the second term,
$$\int_k^{\infty}\frac{1}{\sqrt{2\pi}\,x}\exp\Big(tx - \frac{(\ln x)^2}{2}\Big)dx > \int_k^{\infty}\frac{1}{\sqrt{2\pi}\,x}\exp(\ln x)\,dx \tag{18}$$
$$= \int_k^{\infty}\frac{1}{\sqrt{2\pi}\,x}\,x\,dx \tag{19}$$
$$= \int_k^{\infty}\frac{1}{\sqrt{2\pi}}\,dx \tag{20}$$
$$= +\infty. \tag{21}$$
Hence $M_X(t) = E(e^{tX}) = +\infty$ for every $t > 0$, i.e., the moment generating function does not exist for $t > 0$.
b.
$$E(X^k) = \int_0^{\infty} x^k\,\frac{1}{\sqrt{2\pi}\,x}\exp\Big(-\frac{(\ln x)^2}{2}\Big)dx$$
$$= \int_0^{\infty} \exp(k\ln x)\,\frac{1}{\sqrt{2\pi}\,x}\exp\Big(-\frac{(\ln x)^2}{2}\Big)dx \quad \Big(\text{let } u = \ln x,\ du = \frac{1}{x}dx\Big)$$
$$= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\Big(ku - \frac{u^2}{2}\Big)du \tag{22}$$
$$= \exp\Big(\frac{k^2}{2}\Big)\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{(u - k)^2}{2}\Big)du = \exp\Big(\frac{k^2}{2}\Big).$$
Now, if $M_X(t)$ were differentiable at $t = 0$, then both one-sided limits
$$\lim_{t\to 0^+}\frac{M_X(t) - M_X(0)}{t - 0} \quad \text{and} \quad \lim_{t\to 0^-}\frac{M_X(t) - M_X(0)}{t - 0} \tag{23}$$
would have to exist. However, for $t > 0$ the moment generating function $M_X(t)$ does not exist, so
$$\lim_{t\to 0^+}\frac{M_X(t) - M_X(0)}{t - 0} \tag{24}$$
does not exist. Hence the existence of moments of every order does not imply that the moment generating function is (infinitely) differentiable at $t = 0$.
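A numerical check of (22) (a sketch assuming Python with SciPy): after the substitution $u = \ln x$, the $k$-th moment is the integral of $\exp(ku - u^2/2)/\sqrt{2\pi}$ over the real line. (In fact $f_X$ is the standard lognormal density, whose MGF is well known not to exist for $t > 0$.)

import numpy as np
from scipy.integrate import quad

def moment(k):
    # E(X^k) = integral of exp(k u - u^2/2) / sqrt(2 pi) over (-inf, inf), after u = ln x
    val, _ = quad(lambda u: np.exp(k*u - u**2/2) / np.sqrt(2*np.pi), -np.inf, np.inf)
    return val

for k in [1, 2, 3]:
    print(f"k={k}:  E(X^k) ~ {moment(k):.4f}   exp(k^2/2) = {np.exp(k**2/2):.4f}")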
c.
$$E(Y^k) = \int_0^{\infty} y^k\,\frac{1}{\sqrt{2\pi}\,y}\exp\Big(-\frac{(\ln y)^2}{2}\Big)\big(1 + \sin(2\pi\ln y)\big)dy$$
$$= \underbrace{\int_0^{\infty} y^k\,\frac{1}{\sqrt{2\pi}\,y}\exp\Big(-\frac{(\ln y)^2}{2}\Big)dy}_{=\,E(X^k)} + \int_0^{\infty} y^k\,\frac{1}{\sqrt{2\pi}\,y}\exp\Big(-\frac{(\ln y)^2}{2}\Big)\sin(2\pi\ln y)\,dy$$
$$= \exp\Big(\frac{k^2}{2}\Big) + \int_0^{\infty}\frac{1}{\sqrt{2\pi}\,y}\exp\Big(k\ln y - \frac{(\ln y)^2}{2}\Big)\sin(2\pi\ln y)\,dy$$
$$= \exp\Big(\frac{k^2}{2}\Big) + \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\Big(ku - \frac{u^2}{2}\Big)\sin(2\pi u)\,du \quad \Big(\text{let } u = \ln y,\ du = \frac{1}{y}dy\Big)$$
$$= \exp\Big(\frac{k^2}{2}\Big) + \exp\Big(\frac{k^2}{2}\Big)\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{v^2}{2}\Big)\sin\big(2\pi(v + k)\big)dv \quad (\text{complete the square as in b and let } v = u - k)$$
$$= \exp\Big(\frac{k^2}{2}\Big) + \exp\Big(\frac{k^2}{2}\Big)\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{v^2}{2}\Big)\sin(2\pi v)\,dv \quad \big(k \in \mathbb{Z},\ \sin(2\pi v + 2\pi k) = \sin(2\pi v)\big)$$
$$= \exp\Big(\frac{k^2}{2}\Big), \tag{25}$$
since the last integrand is an odd function of $v$. Thus every moment of $Y$ equals the corresponding moment of $X$. This does not contradict the one-to-one correspondence between moment generating functions and distributions: that correspondence requires the MGF to exist (be finite) in a neighbourhood of 0, and the moment generating function of $X$ does not exist for $t > 0$.
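The same quadrature idea confirms (25) numerically (a sketch assuming Python with SciPy): for integer $k$, the extra $\sin(2\pi u)$ term integrates to (numerically) zero, so $Y$ reproduces every moment of $X$ even though $f_Y \ne f_X$.

import numpy as np
from scipy.integrate import quad

def sin_term(k):
    # integral of exp(k u - u^2/2)/sqrt(2 pi) * sin(2 pi u): equals 0 for integer k
    val, _ = quad(lambda u: np.exp(k*u - u**2/2) / np.sqrt(2*np.pi) * np.sin(2*np.pi*u),
                  -np.inf, np.inf)
    return val

for k in [1, 2, 3]:
    print(f"k={k}:  sin term ~ {sin_term(k):.2e}   so E(Y^k) ~ exp(k^2/2) = {np.exp(k**2/2):.4f}")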
3. Let $X$ be a discrete random variable such that
$$M_X(t) = \sum_{n=1}^{\infty}\frac{1}{2^n}\exp\Big(\frac{t}{2^n}\Big).$$
a. Show that $E(X^k) = \big(2^{k+1} - 1\big)^{-1}$ for any positive integer $k$.
b. Find the probability mass function of $X$.
c. Let $(Y_n;\, n \ge 1)$ be a sequence of independent random variables which have the same moment generating function as $X^2$, and let $\bar{Y} = (Y_1 + Y_2 + \cdots + Y_n)/n$. Estimate $P(|\bar{Y} - 1/7| > 0.1)$ when $n \to \infty$.
a.
$$\frac{d}{dt}M_X(t) = \sum_{n=1}^{\infty}\frac{1}{2^n}\cdot\frac{1}{2^n}\exp\Big(\frac{t}{2^n}\Big) = \sum_{n=1}^{\infty}\frac{1}{2^{2n}}\exp\Big(\frac{t}{2^n}\Big)$$
$$\frac{d^2}{dt^2}M_X(t) = \sum_{n=1}^{\infty}\frac{1}{2^{2n}}\cdot\frac{1}{2^n}\exp\Big(\frac{t}{2^n}\Big) = \sum_{n=1}^{\infty}\frac{1}{2^{3n}}\exp\Big(\frac{t}{2^n}\Big)$$
Continuing in this way (by induction), $\frac{d^k}{dt^k}M_X(t) = \sum_{n=1}^{\infty}\frac{1}{2^{(k+1)n}}\exp\big(\frac{t}{2^n}\big)$. Then,
$$E(X^k) = \frac{d^k}{dt^k}M_X(0) = \sum_{n=1}^{\infty}\Big(\frac{1}{2^{k+1}}\Big)^n = \frac{1}{1 - 1/2^{k+1}} - 1 = \frac{2^{k+1}}{2^{k+1} - 1} - 1 = \frac{1}{2^{k+1} - 1}. \tag{26}$$
PS:
$$\sum_{n=0}^{\infty} p^n = \frac{1}{1 - p} \quad \text{if } |p| < 1. \tag{27}$$
b.
$$M_X(t) = E(e^{tX}) = \sum_x e^{tx}\,P(X = x) = \sum_x P(X = x)\,e^{xt} = \sum_{n=1}^{\infty}\frac{1}{2^n}\,e^{\frac{1}{2^n}t}.$$
By matching the terms, we see that
$$P(X = x) = \begin{cases} x & \text{if } x = \frac{1}{2^n} \text{ with } n \in \mathbb{Z}^+ \\ 0 & \text{otherwise,} \end{cases}$$
i.e., $P\big(X = \frac{1}{2^n}\big) = \frac{1}{2^n}$ for $n = 1, 2, \ldots$
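A two-line check that this pmf reproduces the moments in part a (a short Python sketch; the infinite sum is truncated at 60 terms, which is already far below double precision):

# E(X^k) = sum_{n>=1} (1/2^n)^k * P(X = 1/2^n) = sum_{n>=1} (1/2)^{(k+1)n} = 1/(2^{k+1} - 1)
for k in [1, 2, 3, 4]:
    approx = sum((0.5**n)**k * 0.5**n for n in range(1, 61))
    print(f"k={k}: truncated sum = {approx:.10f}   1/(2^(k+1)-1) = {1/(2**(k+1)-1):.10f}")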
c. Since each $Y_i$ has the same moment generating function as $X^2$, each $Y_i$ has the same distribution as $X^2$. Since
$$E(X^k) = \frac{1}{2^{k+1} - 1},$$
we have
$$E(Y_i) = E(X^2) = \frac{1}{2^{2+1} - 1} = \frac{1}{7}.$$
By the Weak Law of Large Numbers,
$$\bar{Y}_n = \frac{1}{n}\sum_{i=1}^{n} Y_i \xrightarrow{p} E(Y_i) = \frac{1}{7},$$
i.e., for any $\epsilon > 0$, $P\big(\big|\bar{Y}_n - \frac{1}{7}\big| > \epsilon\big) \to 0$ as $n \to \infty$. In particular,
$$P\Big(\Big|\bar{Y}_n - \frac{1}{7}\Big| > 0.1\Big) \to 0 \quad \text{as } n \to \infty.$$
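To illustrate part c by simulation (a sketch assuming Python with NumPy): drawing $N \sim$ Geometric(1/2) on $\{1, 2, \ldots\}$ and setting $X = 2^{-N}$ gives exactly $P(X = 2^{-n}) = 2^{-n}$, and $Y = X^2$ then has the distribution of the $Y_i$ above.

import numpy as np

rng = np.random.default_rng(4)
reps = 5000

for n in [5, 20, 100, 500]:
    # N ~ Geometric(1/2) on {1,2,...}, X = 2^{-N}, Y = X^2 = 4^{-N}
    N = rng.geometric(0.5, size=(reps, n))
    ybar = (0.25 ** N).mean(axis=1)
    freq = np.mean(np.abs(ybar - 1/7) > 0.1)
    print(f"n={n:4d}  P(|Ybar_n - 1/7| > 0.1) ~ {freq:.4f}")

The empirical probability drops to 0 as n increases, in line with the WLLN conclusion.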
4 Appendix
(Note that you are not required to prove almost sure convergence in this course.)
By the definition, we have
$$X_n \xrightarrow{a.s.} X \iff \forall\,\epsilon > 0,\ P\Big(\bigcap_{N=1}^{\infty}\bigcup_{n=N}^{\infty}\{|X_n - X| \ge \epsilon\}\Big) = 0.$$
On the other hand, we can prove that
$$\forall\,\epsilon > 0,\ P\Big(\bigcap_{N=1}^{\infty}\bigcup_{n=N}^{\infty}\{|X_n - X| \ge \epsilon\}\Big) = 0 \iff \forall\,\epsilon > 0,\ \lim_{N\to\infty} P\Big(\bigcup_{n=N}^{\infty}\{|X_n - X| > \epsilon\}\Big) = 0.$$
This tells us that almost sure convergence is stronger than convergence in probability. Moreover, if the $X_n$ are (pairwise) independent, then by the Borel 0-1 law,
$$X_n \xrightarrow{a.s.} X \iff \sum_n P(|X_n - X| > \epsilon) < \infty \ \text{ for every } \epsilon > 0.$$
For example, let the $X_n$ be independent with $P(X_n = n) = \frac{1}{n}$ and $P(X_n = 0) = 1 - \frac{1}{n}$. We have, for $0 < \epsilon < 1$,
$$\sum_n P(|X_n - 0| > \epsilon) = \sum_n \frac{1}{n} = \infty.$$
Thus $X_n$ does not converge to 0 almost surely, even though $P(|X_n - 0| > \epsilon) = \frac{1}{n} \to 0$, i.e., $X_n \xrightarrow{p} 0$.
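A small simulation of this example (a sketch assuming Python with NumPy): even though each individual $P(|X_n| > \epsilon) = 1/n$ tends to 0, the number of indices $n \le N$ at which $|X_n| > \epsilon$ keeps growing, roughly like $\ln N$, so exceedances occur again and again along almost every sample path.

import numpy as np

rng = np.random.default_rng(5)
N = 100_000
n = np.arange(1, N + 1)
# X_n = n with probability 1/n, else 0, independently across n
exceed = rng.uniform(size=N) < 1.0 / n          # event {|X_n - 0| > eps} for small eps

for m in [100, 1000, 10_000, 100_000]:
    print(f"up to N={m:6d}: exceedances = {int(exceed[:m].sum()):4d}   ln N ~ {np.log(m):.1f}")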
Theorem 4.1 (Borel 0-1 law). Let $\{A_n\}$ be (pairwise) independent events. Then
$$P\Big(\lim_{N\to\infty}\bigcup_{n=N}^{\infty} A_n\Big) = 0 \iff \sum_n P(A_n) < \infty$$
and
$$P\Big(\lim_{N\to\infty}\bigcup_{n=N}^{\infty} A_n\Big) = 1 \iff \sum_n P(A_n) = \infty.$$