Introduction To Probability Theory
K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay
LECTURES 22-23
Recall that the moment generating function (mgf) of a random variable $X$ is
\[
M_X(t) = E[e^{tX}], \quad t \in I,
\]
where $I$ is the set of all $t$ for which the expectation is finite. For example, if $X \sim \mathrm{Bernoulli}(p)$, then
\[
M_X(t) = (1-p) + p e^{t}, \quad t \in \mathbb{R}.
\]
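The Bernoulli case can be checked numerically; the following sketch (with $p = 0.3$ an arbitrary choice) compares the defining expectation with the closed form:

```python
import math

def bernoulli_mgf(t, p):
    """E[e^{tX}] for X ~ Bernoulli(p): X = 0 w.p. 1-p, X = 1 w.p. p."""
    return (1 - p) * math.exp(t * 0) + p * math.exp(t * 1)

p = 0.3
for t in (-1.0, 0.0, 0.5, 2.0):
    # the expectation agrees with the closed form (1-p) + p e^t for every t
    assert math.isclose(bernoulli_mgf(t, p), (1 - p) + p * math.exp(t))
```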
Now we will state and indicate the proofs of various properties of moment
generating functions.
Here $M^{(k)}(t)$ denotes the $k$th derivative of $M_X(t)$ at $t$. The proof follows from
\[
\frac{d^k}{dt^k} E[e^{tX}] = E\Big[\frac{d^k}{dt^k} e^{tX}\Big], \quad t \in (-h, h).
\]
So the proof is all about justifying differentiation under the 'integral' sign.
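As an illustration of the theorem (moments from derivatives of the mgf at $0$), one can differentiate the Bernoulli mgf numerically; this is only a sketch, with $p = 0.3$ chosen arbitrarily:

```python
import math

def M(t, p=0.3):
    # mgf of Bernoulli(p): (1-p) + p e^t
    return (1 - p) + p * math.exp(t)

p, h = 0.3, 1e-5
# central differences approximate M'(0) = EX and M''(0) = EX^2
m1 = (M(h) - M(-h)) / (2 * h)
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2
assert math.isclose(m1, p, rel_tol=1e-6)   # EX = p
assert math.isclose(m2, p, rel_tol=1e-4)   # EX^2 = p, since X^2 = X for a 0/1 variable
```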
The above theorem is bad news for mgfs. What I mean is the following: the existence of the mgf in a neighbourhood of $0$ makes the random variable very nice. This implies an unpleasant property of mgfs, i.e. they won't exist around zero unless the random variable is nice. We will see some examples to illustrate this point.
Let $X$ be a Cauchy random variable, with density $\frac{1}{\pi(1+x^2)}$, $x \in \mathbb{R}$. We know that $X$ does not have finite mean (exercise). Hence Theorem 0.2 points to the non-existence of the mgf. In fact, we show that the mgf of $X$ exists only at $t = 0$. For $t > 0$, consider
\[
E[e^{tX}] = \frac{1}{\pi} \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{1+x^2}\,dx
\geq \frac{1}{\pi} \int_{0}^{\infty} e^{tx}\, \frac{1}{1+x^2}\,dx
\geq \frac{t}{\pi} \int_{0}^{\infty} \frac{x}{1+x^2}\,dx
= \frac{t}{2\pi} \int_{1}^{\infty} \frac{1}{y}\,dy,
\]
using $e^{tx} \geq tx$ for the second inequality and the substitution $y = 1+x^2$ for the last equality. The RHS integral diverges to $\infty$. Hence, by comparison of integrals, it follows that $E[e^{tX}]$ diverges to $\infty$, i.e. $M_X(t)$ does not exist for $t > 0$. Using a similar argument one can show that $M_X(t)$ does not exist for $t < 0$ (exercise). So $M_X(t)$ exists only at $0$.
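Numerically, the divergence is visible in the truncated integrals; a rough sketch (trapezoidal rule, with $t = 0.5$ and the truncation points arbitrary choices):

```python
import math

def truncated_cauchy_mgf(t, T, n=200000):
    """Trapezoidal approximation of (1/pi) * integral_{-T}^{T} e^{tx}/(1+x^2) dx."""
    h = 2 * T / n
    total = 0.0
    for i in range(n + 1):
        x = -T + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(t * x) / (1 + x * x)
    return total * h / math.pi

vals = [truncated_cauchy_mgf(0.5, T) for T in (10.0, 20.0, 40.0)]
# the truncations grow without bound, reflecting E[e^{tX}] = infinity for t > 0
assert vals[0] < vals[1] < vals[2]
assert vals[2] > 1e5
```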
Example 0.4 (Log-normal distribution) Let $X = e^{Y}$, $Y \sim N(\mu, \sigma^2)$. Then $X$ is said to be 'log-normally' distributed.
Note that, taking $\mu = 0$, $\sigma^2 = 1$,
\[
EX^n = E[e^{nY}] = M_Y(n) = e^{\frac{n^2}{2}}, \quad n \geq 1.
\]
Now, since $X \geq 0$, clearly $M_X(t)$ exists for $t \leq 0$. Now for $t > 0$,
\[
E[e^{tX}] = E[e^{t e^{Y}}]
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{t e^{y}}\, e^{-\frac{y^2}{2}}\,dy
\geq \frac{1}{\sqrt{2\pi}} \int_{K}^{\infty} e\,dy,
\]
for some $K > 0$ large enough. In the inequality we used the fact that there exists a $K > 0$ with
\[
t e^{y} - \frac{y^2}{2} \geq 1 \quad \text{for all } y \geq K.
\]
Now, since the last integral diverges to $\infty$, it follows that $E[e^{tX}]$ diverges to $\infty$. Hence $M_X(t)$ does not exist for $t > 0$, though $EX^n$ exists for all $n \geq 1$.
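The moment identity $EX^n = e^{n^2/2}$ (in the case $\mu = 0$, $\sigma = 1$) can be verified by numeric integration against the normal density; a sketch:

```python
import math

def lognormal_moment(n, L=12.0, steps=6000):
    """E[X^n] = E[e^{nY}], Y ~ N(0,1), by the trapezoidal rule on [-L, L]."""
    h = 2 * L / steps
    total = 0.0
    for i in range(steps + 1):
        y = -L + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(n * y - y * y / 2)
    return total * h / math.sqrt(2 * math.pi)

for n in (1, 2, 3):
    # matches the closed form e^{n^2/2} obtained from the normal mgf
    assert math.isclose(lognormal_moment(n), math.exp(n * n / 2), rel_tol=1e-4)
```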
Theorem 0.3 Let $X_1, X_2, \dots, X_n$ be independent random variables whose mgfs $M_{X_i}(t)$ exist on a common interval $I$. Then the mgf of the sum $S_n = X_1 + \cdots + X_n$ exists on $I$ and is given by
\[
M_{S_n}(t) = \prod_{k=1}^{n} M_{X_k}(t), \quad t \in I.
\]
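For instance, the sum of two independent $\mathrm{Bernoulli}(p)$ variables is $B(2, p)$, and its mgf is the square of the Bernoulli mgf; a quick numeric check ($p = 0.4$ is an arbitrary choice):

```python
import math

def bern_mgf(t, p):
    # mgf of Bernoulli(p)
    return (1 - p) + p * math.exp(t)

def sum_mgf(t, p):
    """mgf of S2 = X1 + X2 with X1, X2 iid Bernoulli(p), i.e. S2 ~ B(2, p)."""
    probs = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    return sum(q * math.exp(t * k) for k, q in probs.items())

p = 0.4
for t in (-1.0, 0.0, 0.7):
    # Theorem 0.3: M_{S2}(t) = M_{X1}(t) * M_{X2}(t)
    assert math.isclose(sum_mgf(t, p), bern_mgf(t, p) ** 2)
```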
For example, let $X \sim B(n, p)$, so that $M_X(t) = ((1-p) + p e^{t})^n$. Expanding by the binomial theorem,
\[
M_X(t) = (1-p)^n + \binom{n}{1}(1-p)^{n-1} p e^{t} + \binom{n}{2}(1-p)^{n-2} p^2 e^{2t} + \cdots + p^n e^{nt}.
\]
Differentiating term by term,
\[
M_X^{(1)}(t) = \binom{n}{1}(1-p)^{n-1} p e^{t} + 2\binom{n}{2}(1-p)^{n-2} p^2 e^{2t} + \cdots + n p^n e^{nt},
\]
\[
M_X^{(2)}(t) = \binom{n}{1}(1-p)^{n-1} p e^{t} + 2^2\binom{n}{2}(1-p)^{n-2} p^2 e^{2t} + \cdots + n^2 p^n e^{nt},
\]
and in general
\[
M_X^{(k)}(t) = \binom{n}{1}(1-p)^{n-1} p e^{t} + 2^k\binom{n}{2}(1-p)^{n-2} p^2 e^{2t} + \cdots + n^k p^n e^{nt}, \quad k = 3, 4, \dots.
\]
Therefore
\[
EX^k = \binom{n}{1}(1-p)^{n-1} p + 2^k\binom{n}{2}(1-p)^{n-2} p^2 + \cdots + n^k p^n, \quad k = 1, 2, \dots
\]
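The resulting moment formula can be checked against the standard expressions for the binomial mean and second moment; a sketch ($n = 10$, $p = 0.3$ are arbitrary choices):

```python
import math

def binom_moment(n, p, k):
    """EX^k = sum_j j^k C(n, j) p^j (1-p)^{n-j}, as in the formula above."""
    return sum(j ** k * math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(n + 1))

n, p = 10, 0.3
assert math.isclose(binom_moment(n, p, 1), n * p)                           # EX = np
assert math.isclose(binom_moment(n, p, 2), n * p * (1 - p) + (n * p) ** 2)  # EX^2 = Var X + (EX)^2
```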
Theorem 0.4 Let $X$ and $Y$ be two random variables such that $M_X(t) = M_Y(t)$ on some interval $I$. Then $X$ and $Y$ have the same distribution function.
The proof is beyond the scope of this course. Anyway, we will see a similar result for characteristic functions soon.
The proof follows from the Riesz criterion (a sufficient condition) for moment determinacy, given by
\[
\liminf_{k \to \infty} \left( \frac{\mu_{2k}}{(2k)!} \right)^{\frac{1}{2k}} < \infty.
\]
The growth of $(2k)!$ here can be estimated using Stirling's formula:
\[
\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi}\, n^{n+\frac{1}{2}}\, e^{-n}} = 1.
\]
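Stirling's approximation converges quite fast; a numeric sketch (working in logs to avoid overflow):

```python
import math

def stirling_ratio(n):
    """n! / (sqrt(2*pi) * n^{n + 1/2} * e^{-n}), which tends to 1."""
    log_ratio = (math.lgamma(n + 1) - 0.5 * math.log(2 * math.pi)
                 - (n + 0.5) * math.log(n) + n)
    return math.exp(log_ratio)

# the ratio approaches 1 (the error behaves like 1/(12n))
assert abs(stirling_ratio(10) - 1) < 0.01
assert abs(stirling_ratio(1000) - 1) < 1e-4
assert abs(stirling_ratio(1000) - 1) < abs(stirling_ratio(10) - 1)
```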
Remark 0.1 Theorem 0.5 gives a partial converse to Theorem 0.2, i.e., "if $\mu_k = EX^k$ exists for all $k$, then $M_X(t)$ is uniquely determined by the $\mu_k$'s" is not true in general. But with the extra condition that the $\mu_k$'s do not grow too rapidly (for example, if $\lim_{k \to \infty} \frac{1}{2k}\, \mu_{2k}^{\frac{1}{2k}} = 0$), the mgf is uniquely determined. This gives a partial answer to the question of when the moments determine a distribution uniquely, because mgfs determine distributions uniquely (Theorem 0.4), whose proof we will not give in this course.
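For the standard normal, $\mu_{2k} = (2k-1)!! = (2k)!/(2^k k!)$, and the quantity $\frac{1}{2k}\mu_{2k}^{1/(2k)}$ indeed tends to $0$ (roughly like $1/\sqrt{2ke}$); a numeric sketch:

```python
import math

def growth_check(k):
    """(1/(2k)) * mu_{2k}^{1/(2k)} for X ~ N(0,1), with mu_{2k} = (2k)!/(2^k k!)."""
    log_mu = math.lgamma(2 * k + 1) - k * math.log(2) - math.lgamma(k + 1)
    return math.exp(log_mu / (2 * k)) / (2 * k)

vals = [growth_check(k) for k in (5, 50, 500)]
# the sequence decreases toward 0, so the moment-growth condition holds
# and N(0,1) is determined by its moments
assert vals[0] > vals[1] > vals[2]
assert vals[2] < 0.05
```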
Let $X \sim B(n, p)$, with
\[
P\{X = k\} = \binom{n}{k} p^k (1-p)^{n-k} =: p_k, \quad k = 0, 1, \dots, n.
\]
Hence
\[
M_X(t) = \sum_{k=0}^{n} e^{tk} p_k, \quad t \in \mathbb{R}.
\]
Therefore
\[
M_X^{(m)}(t) = \sum_{k=0}^{n} k^m e^{tk} p_k, \quad t \in \mathbb{R}.
\]
0.2 Characteristic Functions

The characteristic function of a random variable $X$ is defined by
\[
\Phi_X(t) = E[e^{itX}], \quad t \in \mathbb{R}.
\]
A digression: Before going further, I will give some very brief working knowledge of complex-valued functions defined on $\mathbb{R}$.
Theorem 0.6 For any random variable $X$, its characteristic function $\Phi_X(\cdot)$ is uniformly continuous on $\mathbb{R}$ and satisfies
(i) $\Phi_X(0) = 1$;
(ii) $|\Phi_X(t)| \leq 1$;
(iii) $\Phi_X(-t) = \overline{\Phi_X(t)}$, where for a complex number $z$, $\bar{z}$ denotes the conjugate.
Proof:
We prove (iii); (i) and (ii) are exercises. Indeed,
\[
\Phi_X(-t) = E[\cos tX] - i\,E[\sin tX] = \overline{E[\cos tX] + i\,E[\sin tX]} = \overline{\Phi_X(t)}.
\]
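Properties (i)-(iii) can be illustrated numerically with the Bernoulli characteristic function $(1-p) + p e^{it}$ ($p = 0.3$ is an arbitrary choice):

```python
import cmath

def phi(t, p=0.3):
    """Characteristic function of Bernoulli(p): E[e^{itX}] = (1-p) + p e^{it}."""
    return (1 - p) + p * cmath.exp(1j * t)

assert cmath.isclose(phi(0.0), 1)                      # (i)
for t in (-2.0, 0.3, 7.0):
    assert abs(phi(t)) <= 1 + 1e-12                    # (ii)
    assert cmath.isclose(phi(-t), phi(t).conjugate())  # (iii)
```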
Theorem 0.7 If the random variable $X$ has finite moments up to order $n$, then $\Phi_X$ has continuous derivatives up to order $n$. Moreover,
\[
i^k\, EX^k = \Phi_X^{(k)}(0), \quad k = 1, 2, \dots, n.
\]
Proof.
Consider
\[
\frac{\Phi_X(t+h) - \Phi_X(t)}{h} = E\Big[ e^{itX}\, \frac{e^{ihX} - 1}{h} \Big].
\]
Since
\[
\Big| e^{itX}\, \frac{e^{ihX} - 1}{h} \Big| \leq |X|,
\]
and $E|X| < \infty$, the dominated convergence theorem gives
\[
\lim_{h \to 0} E\Big[ e^{itX}\, \frac{e^{ihX} - 1}{h} \Big] = E[iX e^{itX}].
\]
Therefore
\[
\Phi'_X(t) = E[iX e^{itX}].
\]
Putting $t = 0$, we get
\[
\Phi_X^{(1)}(0) = i\,EX.
\]
For higher order derivatives, repeat the above arguments.
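For the Bernoulli characteristic function, this first-derivative identity can be checked with a central difference; a sketch ($p = 0.3$ arbitrary):

```python
import cmath

def phi(t, p=0.3):
    # characteristic function of Bernoulli(p)
    return (1 - p) + p * cmath.exp(1j * t)

p, h = 0.3, 1e-6
deriv = (phi(h) - phi(-h)) / (2 * h)   # numerical Phi_X'(0)
# Theorem 0.7 with k = 1: Phi_X'(0) = i * EX = i * p
assert cmath.isclose(deriv, 1j * p, rel_tol=1e-4)
```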
Proof. Before proceeding to the sketch of the proof, a word about the integral on the rhs: it is interpreted as an improper Riemann integral, and the integrand need not, in general, be absolutely integrable. At this stage, the student need not worry about this.
Consider
\[
\frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-ita} - e^{-itb}}{it}\, \Phi_X(t)\,dt
= \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-ita} - e^{-itb}}{it}\, E[e^{itX}]\,dt
\]
\[
= E\Big[ \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-ita} - e^{-itb}}{it}\, e^{itX}\,dt \Big] \qquad (0.1)
\]
\[
= E\Big[ \int_{-\infty}^{\infty} \frac{e^{it(X-a)} - e^{it(X-b)}}{2\pi it}\,dt \Big].
\]
The second equality follows by interchanging the order of integration (this, in fact, requires one to consider the integrals on finite intervals, say $[-T, T]$ (i.e. proper integrals), use the change-of-variable formula there, and then let $T \to \infty$). Now
\[
\int_{-\infty}^{0} \frac{e^{it(X-a)} - e^{it(X-b)}}{2\pi it}\,dt
= \int_{0}^{\infty} \frac{e^{-it(X-b)} - e^{-it(X-a)}}{2\pi it}\,dt \qquad (0.2)
\]
The other integral is handled similarly. Combining (0.1), (0.3) and (0.4), we complete the proof.
Remark 0.2 In this remark, I will give the computation of the integral $\int_0^{\infty} \frac{\sin \alpha x}{x}\,dx$. It is enough to compute the Dirichlet integral $\int_0^{\infty} \frac{\sin x}{x}\,dx$, because for other values of $\alpha$ the integral follows from the Dirichlet integral easily. For example,
\[
\int_0^{\infty} \frac{\sin(-x)}{x}\,dx = -\int_0^{\infty} \frac{\sin x}{x}\,dx.
\]
Before even proceeding to compute this, I will show that $\int_0^{\infty} \frac{\sin x}{x}\,dx$ is not absolutely integrable, but is integrable as an improper integral. First recall that
\[
\int_0^{\infty} \frac{\sin x}{x}\,dx := \lim_{\varepsilon \to 0,\, T \to \infty} \int_{\varepsilon}^{T} \frac{\sin x}{x}\,dx.
\]
Consider
\[
\int_1^{T} \frac{\sin x}{x}\,dx = \int_1^{T} \frac{1}{x}\, (1 - \cos x)'\,dx
= \Big[ \frac{1-\cos x}{x} \Big]_1^{T} + \int_1^{T} \frac{1-\cos x}{x^2}\,dx,
\]
using integration by parts.
Now, since
\[
\lim_{x \downarrow 0} \frac{\sin x}{x} = 1,
\]
the function $f(x) = \frac{\sin x}{x}$ for $x > 0$, $f(0) = 1$, is Riemann integrable on $[0,1]$ and hence
\[
\lim_{\varepsilon \to 0} \int_{\varepsilon}^{1} \frac{\sin x}{x}\,dx \ \text{exists}.
\]
The last integral has two parts, of which the first is known to diverge to $\infty$ and the second converges by the same argument as for the Dirichlet integral. Hence the integral on the lhs also diverges.
Now we compute the Dirichlet integral. To this end, we first need the value of the following. Consider, for any given $u > 0$,
\[
\int_0^{\infty} e^{ix}\, e^{-xu}\,dx = \lim_{T \to \infty} \int_0^{T} e^{(i-u)x}\,dx
= \lim_{T \to \infty} \Big[ \frac{e^{(i-u)x}}{i-u} \Big]_0^{T}
= \frac{-1}{i-u} = \frac{u+i}{1+u^2}.
\]
Hence, equating the real and imaginary parts, we get the values of the following improper Riemann integrals:
\[
\int_0^{\infty} \cos x\, e^{-xu}\,dx = \frac{u}{1+u^2}, \qquad \int_0^{\infty} \sin x\, e^{-xu}\,dx = \frac{1}{1+u^2}.
\]
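Both closed forms can be confirmed by numeric integration; a rough trapezoidal sketch ($u = 0.7$ and the truncation $T$ are arbitrary choices):

```python
import math

def damped_integral(f, u, T=60.0, n=200000):
    """Trapezoidal approximation of integral_0^T f(x) e^{-xu} dx."""
    h = T / n
    total = 0.0
    for i in range(n + 1):
        x = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * f(x) * math.exp(-x * u)
    return total * h

u = 0.7
assert math.isclose(damped_integral(math.cos, u), u / (1 + u * u), rel_tol=1e-4)
assert math.isclose(damped_integral(math.sin, u), 1 / (1 + u * u), rel_tol=1e-4)
```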
0.2. CHARACTERISTIC FUNCTIONS 13
Now consider
\[
\int_0^{\infty} \frac{\sin x}{x}\,dx = \int_0^{\infty} \sin x \int_0^{\infty} e^{-xu}\,du\,dx
= \int_0^{\infty} \int_0^{\infty} \sin x\, e^{-xu}\,dx\,du
= \int_0^{\infty} \frac{1}{1+u^2}\,du = \frac{\pi}{2},
\]
using $\frac{1}{x} = \int_0^{\infty} e^{-xu}\,du$ and interchanging the order of integration.
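The value $\pi/2$ can also be checked directly on truncated integrals; a numeric sketch:

```python
import math

def dirichlet_partial(eps, T, n=400000):
    """Trapezoidal approximation of integral_eps^T (sin x / x) dx."""
    h = (T - eps) / n
    total = 0.0
    for i in range(n + 1):
        x = eps + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.sin(x) / x
    return total * h

# as eps -> 0 and T -> infinity the partial integrals approach pi/2
approx = dirichlet_partial(1e-6, 2000.0)
assert abs(approx - math.pi / 2) < 1e-2
```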
Proof:
Using the inversion theorem, we have
F1 (b) = F2 (b)