Random Variable Functions Explained
Proof Fix $B \in \mathcal{B}$. Note that $h^{-1}(B) = \{x \in \mathbb{R} : h(x) \in B\}$. Clearly $h(X(\omega)) \in B \Leftrightarrow X(\omega) \in h^{-1}(B)$ and therefore
\[
Z^{-1}(B) = \{\omega \in \Omega : Z(\omega) \in B\} = \{\omega \in \Omega : h(X(\omega)) \in B\} = \{\omega \in \Omega : X(\omega) \in h^{-1}(B)\} = X^{-1}(h^{-1}(B)).
\]
The following theorem provides conditions under which a function $h(X)$ of a random variable $X$ is itself a random variable.
Theorem 3 Under the notation of Lemma 1, Z is a random variable provided h is a
Borel function.
Proof Fix $B \in \mathcal{B}$. Since $h$ is a Borel function we have $h^{-1}(B) \in \mathcal{B}$, and since $X$ is a random variable it follows that $Z^{-1}(B) = X^{-1}(h^{-1}(B)) \in \mathcal{A}$. This proves the result. ♠
A random variable $X$ takes values in various Borel sets according to some probability law, called the probability distribution of $X$. Clearly the probability distribution of a random variable $X$ is described by its distribution function and/or by its p.d.f./p.m.f. In the following section we will derive the probability distribution of a function of a random variable, i.e., we will derive expressions for the p.m.f./p.d.f. of functions of random variables.
Theorem 5 Let X be a random variable of discrete type with support SX and p.m.f. fX .
Let h : R → R be a Borel function and let Z : Ω → R be defined by Z(ω) = h(X(ω)), ω ∈
Ω. Then Z is a random variable of discrete type with support SZ = {h(x) : x ∈ SX } and
p.m.f.
\[
f_Z(z) = \begin{cases} \sum_{x \in A_z} f_X(x), & \text{if } z \in S_Z \\ 0, & \text{otherwise} \end{cases}
\;=\; \begin{cases} P(\{X \in A_z\}), & \text{if } z \in S_Z \\ 0, & \text{otherwise,} \end{cases}
\]
where $A_z = \{x \in S_X : h(x) = z\}$, $z \in S_Z$.
Proof Since h is a Borel function, using Theorem 3, it follows that Z is a random variable.
Fix $z_0 \in S_Z$, so that $z_0 = h(x_0)$ for some $x_0 \in S_X$. Then $\{X = x_0\} = \{\omega \in \Omega : X(\omega) = x_0\} \subseteq \{\omega \in \Omega : h(X(\omega)) = h(x_0)\} = \{h(X) = h(x_0)\}$, and $\{X \in S_X\} = \{\omega \in \Omega : X(\omega) \in S_X\} \subseteq \{\omega \in \Omega : h(X(\omega)) \in S_Z\} = \{h(X) \in S_Z\}$. Therefore
\[
P(\{Z = z_0\}) = P(\{h(X) = h(x_0)\}) \ge P(\{X = x_0\}) > 0
\]
and
\[
P(\{Z \in S_Z\}) = P(\{h(X) \in S_Z\}) \ge P(\{X \in S_X\}) = 1.
\]
It follows that $P(\{Z = z\}) > 0, \ \forall z \in S_Z$, and $P(\{Z \in S_Z\}) = 1$, i.e., $Z$ is a discrete type random variable with support $S_Z$. Moreover, for $z \in S_Z$,
\[
f_Z(z) = P(\{Z = z\}) = P(\{X \in A_z\}) = \sum_{x \in A_z} f_X(x). \qquad \spadesuit
\]
A useful special case arises when the Borel function $h$ is one-one with inverse function $h^{-1} : D \to \mathbb{R}$, where $D = \{h(x) : x \in \mathbb{R}\}$. Then $Z$ is a discrete type random variable with support $S_Z = \{h(x) : x \in S_X\}$ and p.m.f.
\[
f_Z(z) = \begin{cases} f_X(h^{-1}(z)), & \text{if } z \in S_Z \\ 0, & \text{otherwise} \end{cases}
\;=\; \begin{cases} P(\{X = h^{-1}(z)\}), & \text{if } z \in S_Z \\ 0, & \text{otherwise.} \end{cases}
\]
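For concreteness, the following minimal sketch (in Python; the p.m.f. of $X$ used here is a made-up example, not one from the text) computes the p.m.f. of $Z = h(X)$ exactly as in Theorem 5, by accumulating $f_X$ over the pre-images $A_z$.

```python
# Sketch: p.m.f. of Z = h(X) for a discrete X, via f_Z(z) = sum over A_z of f_X(x).
# The p.m.f. of X below is a hypothetical example used only for illustration.
from collections import defaultdict

f_X = {-2: 0.1, -1: 0.2, 0: 0.3, 1: 0.25, 2: 0.15}   # hypothetical p.m.f. of X
h = lambda x: x * x                                   # Borel function h(x) = x^2

f_Z = defaultdict(float)
for x, p in f_X.items():
    f_Z[h(x)] += p            # accumulate f_X(x) over the pre-image A_z = {x : h(x) = z}

print(dict(f_Z))                                # {4: 0.25, 1: 0.45, 0: 0.3}
print(abs(sum(f_Z.values()) - 1.0) < 1e-12)     # f_Z is again a p.m.f.
```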
Show that $Z = X^2$ is a random variable and find its p.m.f. and distribution function.
We have $S_X = \{-2, -1, 0, 1, 2, 3\}$ and $S_Z = \{0, 1, 4, 9\}$. Moreover,
♠
Show that Z = |X| is a random variable and find its p.m.f., and distribution function.
where n is a positive integer and p ∈ (0, 1). Show that Y = n − X is a random variable
and find its p.m.f., and distribution function.
Proof Note that $S_X = S_Y = \{0, 1, \ldots, n\}$ and $h(x) = n - x$, $x \in \mathbb{R}$, is a continuous function. Therefore $Y = n - X$ is a random variable. For $y \in S_Y$,
\[
f_Y(y) = P(\{Y = y\}) = P(\{X = n - y\}) = \binom{n}{n-y} p^{\,n-y} (1-p)^{y} = \binom{n}{y} (1-p)^{y} p^{\,n-y},
\]
i.e., $Y$ has the binomial p.m.f. with parameters $n$ and $1 - p$. ♠
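A quick numerical check of this computation (a sketch assuming, as in the example, that $X$ has the binomial p.m.f. with parameters $n$ and $p$): relabelling the p.m.f. of $X$ by $y = n - x$ reproduces the binomial p.m.f. with parameters $n$ and $1 - p$.

```python
# Sketch: if X ~ Bin(n, p), the p.m.f. of Y = n - X equals the Bin(n, 1 - p) p.m.f.
from math import comb

n, p = 7, 0.3
f_X = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

f_Y = {n - x: prob for x, prob in f_X.items()}                            # p.m.f. of Y = n - X
f_bin = {y: comb(n, y) * (1 - p)**y * p**(n - y) for y in range(n + 1)}   # Bin(n, 1 - p)

print(all(abs(f_Y[y] - f_bin[y]) < 1e-12 for y in range(n + 1)))          # True
```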
The following theorem deals with probability distribution of absolutely continuous type
random variables.
\[
f_T(t) = \sum_{j=1}^{k} f_X\big(h_j^{-1}(t)\big)\,\Big|\frac{d}{dt} h_j^{-1}(t)\Big|\, I_{h(S_{j,X})}(t).
\]
Proof Let FT be the distribution function of T . For t ∈ R and ∆ ∈ R
Fix j ∈ {1, . . . , k}. First suppose that hj is strictly decreasing on Sj,X . Note that {X ∈
Sj,X } = {h(X) ∈ h(Sj,X )}. Thus, for t belonging to exterior (i.e., excluding the boundary
and interior points) of h(Sj,X ) and appropriately small ∆, we have P ({t < h(X) ≤
t + ∆, X ∈ Sj,X }) = 0. Also, for t belonging to interior of h(Sj,X ) and appropriately small
$\Delta$, we have $P(\{t < h(X) \le t + \Delta,\ X \in S_{j,X}\}) = P(\{h_j^{-1}(t + \Delta) \le X < h_j^{-1}(t)\})$. Thus
for all t ∈ R, except those on boundary of h(Sj,X ), we have
\[
\frac{P(\{t < h(X) \le t + \Delta,\ X \in S_{j,X}\})}{\Delta} \;\longrightarrow\; -f_X\big(h_j^{-1}(t)\big)\Big(\frac{d}{dt} h_j^{-1}(t)\Big) I_{h(S_{j,X})}(t), \qquad (11)
\]
and, if instead $h_j$ is strictly increasing on $S_{j,X}$,
\[
\frac{P(\{t < h(X) \le t + \Delta,\ X \in S_{j,X}\})}{\Delta} \;\longrightarrow\; f_X\big(h_j^{-1}(t)\big)\Big(\frac{d}{dt} h_j^{-1}(t)\Big) I_{h(S_{j,X})}(t), \qquad (12)
\]
as $\Delta \to 0$. It follows that the distribution function of $T$ is differentiable everywhere on $\mathbb{R}$ except possibly at the boundary points of $S_T$. Now the result follows from Remark 27 (vii)-(viii) of Chapter II and using (13). ♠
It may be worth mentioning here that, in view of Remark 27 (viii) of Chapter II, Theorem 10 and Corollary 14 can be applied even in situations where the function $h$ is differentiable everywhere on $S_X$ except possibly at a finite number of points.
Let $X$ be a random variable with p.d.f. $f_X(x) = e^{-x}$ for $x > 0$ (and $f_X(x) = 0$ otherwise), and let $T = X^2$.
(i) Show that T is a random variable of absolutely continuous type;
(ii) Find the distribution function of T and hence find its p.d.f.;
(iii) Find the p.d.f. of T directly (i.e., without finding the distribution function).
Solution (i) & (iii) Clearly $T = X^2$ is a random variable (being a continuous function of the random variable $X$). We have $S_X = S_T = (0, \infty)$. Also $h(x) = x^2$, $x \in S_X$, is a strictly increasing function on $S_X$ with inverse function $h^{-1}(x) = \sqrt{x}$, $x \in S_T$. Using Corollary 14, it follows that $T = X^2$ is a random variable of absolutely continuous type with p.d.f.
\[
f_T(t) = \begin{cases} f_X(\sqrt{t})\,\big|\frac{d}{dt}(\sqrt{t})\big|, & \text{if } t > 0 \\ 0, & \text{otherwise} \end{cases}
\;=\; \begin{cases} \dfrac{e^{-\sqrt{t}}}{2\sqrt{t}}, & \text{if } t > 0 \\ 0, & \text{otherwise.} \end{cases}
\]
(ii) We have $F_T(t) = P(\{X^2 \le t\})$, $t \in \mathbb{R}$. Clearly, for $t < 0$, $F_T(t) = P(\{X^2 \le t\}) = 0$. For $t \ge 0$,
\[
F_T(t) = P(\{-\sqrt{t} \le X \le \sqrt{t}\}) = \int_{-\sqrt{t}}^{\sqrt{t}} f_X(x)\,dx = \int_{0}^{\sqrt{t}} e^{-x}\,dx = 1 - e^{-\sqrt{t}}.
\]
Now let $X$ be a random variable with p.d.f.
\[
f_X(x) = \begin{cases} \dfrac{|x|}{2}, & \text{if } -1 < x < 1 \\[4pt] \dfrac{x}{3}, & \text{if } 1 \le x < 2 \\[2pt] 0, & \text{otherwise,} \end{cases}
\]
and let $T = X^2$.
(i) Show that T is a random variable of absolutely continuous type;
(ii) Find the distribution function of T and hence find its p.d.f.;
(iii) Find the p.d.f. of T directly (i.e., without finding the distribution function).
Solution (i) & (iii) Clearly $T = X^2$ is a random variable (being a continuous function of the random variable $X$). We have $S_X = (-1, 0) \cup (0, 2)$. Also $h(x) = x^2$, $x \in S_X$, is strictly decreasing on $S_{1,X} = (-1, 0)$, with inverse function $h_1^{-1}(t) = -\sqrt{t}$; $h(x) = x^2$, $x \in S_X$, is strictly increasing on $S_{2,X} = (0, 2)$, with inverse function $h_2^{-1}(t) = \sqrt{t}$; $S_X = S_{1,X} \cup S_{2,X}$;
h(S1,X ) = (0, 1) and h(S2,X ) = (0, 4). Using Theorem 10, it follows that T = X 2 is a
random variable of absolutely continuous type with p.d.f.
\[
f_T(t) = f_X(-\sqrt{t})\,\Big|\frac{d}{dt}(-\sqrt{t})\Big|\, I_{(0,1)}(t) + f_X(\sqrt{t})\,\Big|\frac{d}{dt}(\sqrt{t})\Big|\, I_{(0,4)}(t)
= \begin{cases} \dfrac{1}{2}, & \text{if } 0 < t < 1 \\[4pt] \dfrac{1}{6}, & \text{if } 1 < t < 4 \\[2pt] 0, & \text{otherwise.} \end{cases}
\]
(ii) We have $F_T(t) = P(\{X^2 \le t\})$, $t \in \mathbb{R}$. Since $P(\{X \in (-1, 2)\}) = 1$, we have
P ({T ∈ (0, 4)}) = 1. Therefore, for t < 0, FT (t) = P ({T ≤ t}) = 0 and, for t ≥ 4,
FT (t) = P ({T ≤ t}) = 1. For t ∈ [0, 4), we have
\[
F_T(t) = P(\{-\sqrt{t} \le X \le \sqrt{t}\}) = \int_{-\sqrt{t}}^{\sqrt{t}} f_X(x)\,dx
= \begin{cases} \displaystyle\int_{-\sqrt{t}}^{\sqrt{t}} \frac{|x|}{2}\,dx, & \text{if } 0 \le t < 1 \\[10pt] \displaystyle\int_{-1}^{1} \frac{|x|}{2}\,dx + \int_{1}^{\sqrt{t}} \frac{x}{3}\,dx, & \text{if } 1 \le t < 4. \end{cases}
\]
♠
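The two-branch density obtained above via Theorem 10 can be sanity-checked numerically; the sketch below (plain Riemann sums, claiming nothing beyond this particular example) compares the integral of the derived $f_T$ with $P(\{-\sqrt{t} \le X \le \sqrt{t}\})$ computed directly from $f_X$.

```python
# Sketch: check that the density f_T derived via Theorem 10 is consistent with
# F_T(t) = P(X^2 <= t) computed directly from f_X (both by simple midpoint sums).
import numpy as np

def f_X(x):
    return np.where((x > -1) & (x < 1), np.abs(x) / 2,
                    np.where((x >= 1) & (x < 2), x / 3, 0.0))

def f_T(t):
    return np.where((t > 0) & (t < 1), 0.5,
                    np.where((t > 1) & (t < 4), 1 / 6, 0.0))

def integral(f, a, b, n=400_000):
    # simple midpoint rule, good enough for a sanity check
    s = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    return float(np.sum(f(s)) * (b - a) / n)

for t in (0.3, 0.9, 1.7, 3.2):
    F_from_fT = integral(f_T, 0.0, t)                    # integral of the derived density
    F_direct = integral(f_X, -np.sqrt(t), np.sqrt(t))    # P(-sqrt(t) <= X <= sqrt(t))
    print(round(t, 2), round(F_from_fT, 4), round(F_direct, 4))   # the two columns agree
```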
Note that a Borel function of a discrete type random variable is a random variable
of discrete type (cf. Theorem 5). Theorem 10 provides sufficient conditions under which
a Borel function of an absolutely continuous type random variable is of absolutely con-
tinuous type. The following example illustrates that, in general, a Borel function of an
absolutely continuous type random variable may not be of absolutely continuous type. ♠
Let $X$ be a random variable with p.d.f. $f_X(x) = e^{-x}$ for $x > 0$ (and $f_X(x) = 0$ otherwise), and let $T = [X]$, where, for $x \in \mathbb{R}$, $[x]$ denotes the largest integer not exceeding $x$. Show that $T$ is a random variable of discrete type and find its p.m.f.
Proof For $a \in \mathbb{R}$, we have $T^{-1}((-\infty, a]) = \{\omega : X(\omega) < [a] + 1\} = X^{-1}((-\infty, [a]+1)) \in \mathcal{A}$. It follows that $T$ is a random variable. Also $S_X = (0, \infty)$. Since $P(\{X \in S_X\}) = 1$, we have $P(\{T \in \{0, 1, 2, \ldots\}\}) = 1$. Also, for $i \in \{0, 1, 2, \ldots\}$,
\[
P(\{T = i\}) = P(\{i \le X < i+1\}) = \int_{i}^{i+1} e^{-x}\,dx = e^{-i}\,(1 - e^{-1}).
\]
It follows that the random variable $T$ is of discrete type with support $S_T = \{0, 1, 2, \ldots\}$ and p.m.f.
\[
f_T(t) = \begin{cases} (1 - e^{-1})\,e^{-t}, & \text{if } t \in \{0, 1, 2, \ldots\} \\ 0, & \text{otherwise.} \end{cases}
\]
♠
Let $f_n(x)$ denote the number of times the value $x \in S_X$ is observed in $n$ repetitions of the random experiment. Then, in line with the axiomatic approach to probability, one may define the mean value (or expected value) of the random variable $X$ as
\[
E(X) = \lim_{n \to \infty} \sum_{x \in S_X} x\,\frac{f_n(x)}{n} = \sum_{x \in S_X} x \lim_{n \to \infty} \frac{f_n(x)}{n} = \sum_{x \in S_X} x\,P(\{X = x\}) = \sum_{x \in S_X} x f_X(x),
\]
provided the involved limits exist and the interchange of signs of summation and limit
is allowed. A similar discussion can be provided for defining the expected value of an
absolutely continuous type random variable, having p.d.f. fX , as
\[
E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx,
\]
provided the integral is defined. The above discussion leads to the following definitions.
Definition 18 (i) Let $X$ be a discrete type random variable with p.m.f. $f_X$ and support $S_X$. We say that the expected value of $X$ (denoted by $E(X)$) is finite and equals
\[
E(X) = \sum_{x \in S_X} x f_X(x),
\]
provided $\sum_{x \in S_X} |x| f_X(x) < \infty$.
(ii) Let $X$ be an absolutely continuous type random variable with p.d.f. $f_X$. We say that the expected value of $X$ (denoted by $E(X)$) is finite and equals
\[
E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx,
\]
provided $\int_{-\infty}^{\infty} |x| f_X(x)\,dx < \infty$. ♠
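As a numerical illustration of Definition 18 (a sketch; the two distributions used here are just convenient examples), $E(X)$ is a finite sum in the discrete case and a convergent integral in the absolutely continuous case.

```python
# Sketch: E(X) for a discrete p.m.f. (finite sum) and for a p.d.f. (numerical integral).
import numpy as np

# discrete example: X with p.m.f. f(x) = (1/2)^x on {1, 2, ..., 60} (the tail beyond 60 is negligible)
xs = np.arange(1, 61)
pmf = 0.5 ** xs
print(float(np.sum(xs * pmf)))        # ~ 2.0 (compare with the series example later in this section)

# absolutely continuous example: f(x) = e^{-x} on (0, inf), truncated at 50 for the grid
def integral(f, a, b, n=1_000_000):
    s = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)   # midpoint rule
    return float(np.sum(f(s)) * (b - a) / n)

print(integral(lambda x: x * np.exp(-x), 0.0, 50.0))               # ~ 1.0
```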
Remark 19 (i) Since
\[
\Big|\sum_{x \in S_X} x f_X(x)\Big| \le \sum_{x \in S_X} |x f_X(x)| = \sum_{x \in S_X} |x| f_X(x)
\quad\text{and}\quad
\Big|\int_{-\infty}^{\infty} x f_X(x)\,dx\Big| \le \int_{-\infty}^{\infty} |x f_X(x)|\,dx = \int_{-\infty}^{\infty} |x| f_X(x)\,dx,
\]
it follows that if the expected value of a random variable $X$ is finite then $|E(X)| < \infty$.
(ii) If $X$ is a random variable of discrete type with finite support $S_X$, then $\sum_{x \in S_X} |x| f_X(x) < \infty$. Consequently the expected value of $X$ is finite.
(iii) Suppose that $X$ is a random variable of absolutely continuous type with support $S_X \subseteq [-a, a]$, for some $a > 0$. Then
\[
\int_{-\infty}^{\infty} |x| f_X(x)\,dx \le a \int_{-\infty}^{\infty} f_X(x)\,dx = a,
\]
and consequently the expected value of $X$ is finite.
Let $X$ be a random variable with p.m.f. $f_X(x) = (1/2)^x$ for $x \in \{1, 2, 3, \ldots\}$ (and $f_X(x) = 0$ otherwise). Show that the expected value of $X$ is finite and find its value.
Solution We have
\[
E(X) = \sum_{x \in S_X} x f_X(x) = \sum_{j=1}^{\infty} \frac{j}{2^j} = \lim_{n \to \infty} S_n,
\]
where
\[
S_n = \sum_{j=1}^{n} \frac{j}{2^j} \qquad (21)
\]
\[
\Rightarrow\ \frac{S_n}{2} = \sum_{j=1}^{n} \frac{j}{2^{j+1}} = \sum_{j=2}^{n+1} \frac{j-1}{2^j}. \qquad (22)
\]
Subtracting (22) from (21) we get $S_n - \frac{S_n}{2} = \sum_{j=1}^{n} \frac{1}{2^j} - \frac{n}{2^{n+1}} \to 1$ as $n \to \infty$, so that $E(X) = \lim_{n \to \infty} S_n = 2$. In particular, since all the terms are non-negative, $\sum_{x \in S_X} |x| f_X(x) = 2 < \infty$ and the expected value of $X$ is finite.
♠
Example 23 Let X be a random variable with p.m.f.
\[
f_X(x) = \begin{cases} \dfrac{3}{\pi^2 x^2}, & \text{if } x \in \{\pm 1, \pm 2, \pm 3, \ldots\} \\[4pt] 0, & \text{otherwise.} \end{cases}
\]
Next, let $X$ be a random variable with p.d.f.
\[
f_X(x) = \frac{e^{-|x|}}{2}, \quad -\infty < x < \infty.
\]
Show that the expected value of X is finite and find its value.
Solution We have
\[
\int_{-\infty}^{\infty} |x| f_X(x)\,dx = \int_{-\infty}^{\infty} |x|\,\frac{e^{-|x|}}{2}\,dx = \int_{0}^{\infty} x e^{-x}\,dx = 1,
\]
so the expected value of $X$ is finite. Moreover, since $x f_X(x)$ is an odd function, $E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = 0$.
Example 25 Let $X$ be a random variable with p.d.f.
\[
f_X(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}, \quad -\infty < x < \infty.
\]
Show that the expected value of $X$ is not finite.
Solution We have
\[
\int_{-\infty}^{\infty} |x| f_X(x)\,dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{|x|}{1 + x^2}\,dx = \frac{2}{\pi} \int_{0}^{\infty} \frac{x}{1 + x^2}\,dx = \infty.
\]
Thus the expected value of $X$ is not finite. ♠
Theorem 26 Let $X$ be a random variable with finite expectation. Then
(i) $E(X) = \int_{0}^{\infty} P(\{X > t\})\,dt - \int_{-\infty}^{0} P(\{X < t\})\,dt$;
(ii) if $P(\{X \ge 0\}) = 1$, then $E(X) = \int_{0}^{\infty} P(\{X > t\})\,dt$;
(iii) if $P(\{X \in \{0, \pm 1, \pm 2, \ldots\}\}) = 1$, then $E(X) = \sum_{n=1}^{\infty} P(\{X \ge n\}) - \sum_{n=1}^{\infty} P(\{X \le -n\})$.
Proof (i)
Case I. $X$ is of absolutely continuous type. We have
\[
E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{-\infty}^{0} x f_X(x)\,dx + \int_{0}^{\infty} x f_X(x)\,dx
= -\int_{-\infty}^{0} \int_{x}^{0} f_X(x)\,dt\,dx + \int_{0}^{\infty} \int_{0}^{x} f_X(x)\,dt\,dx.
\]
Also (writing $x_1 < x_2 < \cdots$ for the support points of $X$ in the discrete case, with $x_i$ the largest non-positive one)
\[
\begin{aligned}
\int_{-\infty}^{0} P(\{X < t\})\,dt
&= \int_{-\infty}^{x_1} P(\{X < t\})\,dt + \sum_{j=1}^{i-1} \int_{x_j}^{x_{j+1}} P(\{X < t\})\,dt + \int_{x_i}^{0} P(\{X < t\})\,dt \\
&= 0 + \sum_{j=1}^{i-1} \int_{x_j}^{x_{j+1}} P(\{X \le x_j\})\,dt + \int_{x_i}^{0} P(\{X \le x_i\})\,dt \\
&= \sum_{j=1}^{i-1} (x_{j+1} - x_j) P(\{X \le x_j\}) - x_i P(\{X \le x_i\}) \\
&= \sum_{j=1}^{i-1} x_{j+1} P(\{X \le x_j\}) - \sum_{j=1}^{i-1} x_j P(\{X \le x_j\}) - x_i P(\{X \le x_i\}) \\
&= \sum_{j=2}^{i} x_j P(\{X \le x_{j-1}\}) - \sum_{j=1}^{i-1} x_j P(\{X \le x_j\}) - x_i P(\{X \le x_i\}) \\
&= -\sum_{j=1}^{i} x_j P(\{X = x_j\}).
\end{aligned}
\]
Therefore
\[
\int_{0}^{\infty} P(\{X > t\})\,dt - \int_{-\infty}^{0} P(\{X < t\})\,dt = \sum_{j=1}^{\infty} x_j P(\{X = x_j\}) = E(X).
\]
(ii) Suppose that $P(\{X \ge 0\}) = 1$. Then $P(\{X < t\}) = 0,\ \forall t \le 0$, and therefore
\[
E(X) = \int_{0}^{\infty} P(\{X > t\})\,dt - \int_{-\infty}^{0} P(\{X < t\})\,dt = \int_{0}^{\infty} P(\{X > t\})\,dt.
\]
(iii) Suppose that $P(\{X \in \{0, \pm 1, \pm 2, \ldots\}\}) = 1$. Then, for $m \in \mathbb{Z}$ (the set of integers) and $m - 1 < t < m$, we have $P(\{X > t\}) = P(\{X \ge m\})$ and $P(\{X < t\}) = P(\{X \le m - 1\})$. Therefore
\[
\int_{0}^{\infty} P(\{X > t\})\,dt = \sum_{n=1}^{\infty} \int_{n-1}^{n} P(\{X > t\})\,dt = \sum_{n=1}^{\infty} \int_{n-1}^{n} P(\{X \ge n\})\,dt = \sum_{n=1}^{\infty} P(\{X \ge n\}),
\]
\[
\int_{-\infty}^{0} P(\{X < t\})\,dt = \sum_{n=1}^{\infty} \int_{-n}^{-n+1} P(\{X < t\})\,dt = \sum_{n=1}^{\infty} \int_{-n}^{-n+1} P(\{X \le -n\})\,dt = \sum_{n=1}^{\infty} P(\{X \le -n\}),
\]
and the result follows from (i). ♠
Theorem 27 (i) Let $X$ be a random variable of discrete type with support $S_X$ and p.m.f. $f_X$. Let $h : \mathbb{R} \to \mathbb{R}$ be a Borel function and let $T = h(X)$. Then
\[
E(T) = \sum_{x \in S_X} h(x) f_X(x),
\]
provided it is finite.
(ii) Let $X$ be a random variable of absolutely continuous type with p.d.f. $f_X$. Let $h : \mathbb{R} \to \mathbb{R}$ be a Borel function and let $T = h(X)$. Then
\[
E(T) = \int_{-\infty}^{\infty} h(x) f_X(x)\,dx,
\]
provided it is finite.
Proof (i) By Theorem 5, $T = h(X)$ is a random variable of discrete type with support $S_T = \{h(x) : x \in S_X\}$ and p.m.f.
\[
f_T(t) = \begin{cases} \sum_{x \in A_t} P(\{X = x\}), & \text{if } t \in S_T \\ 0, & \text{otherwise,} \end{cases}
\]
where $A_t = \{x \in S_X : h(x) = t\}$. Therefore
\[
E(T) = \sum_{t \in S_T} t f_T(t) = \sum_{t \in S_T} \sum_{x \in A_t} h(x) P(\{X = x\}) = \sum_{x \in S_X} h(x) f_X(x),
\]
since the sets $A_t$, $t \in S_T$, form a partition of $S_X$.
(ii) Define At = {x ∈ SX : h(x) > t}, t ≥ 0 and Bs = {x ∈ SX : h(x) < s}, s ≤ 0. For
simplicity we will assume that, for every t ≥ 0 and s ≤ 0, At and Bs are intervals. Then,
using Theorem 26,
\[
\begin{aligned}
E(T) &= \int_{0}^{\infty} P(\{T > t\})\,dt - \int_{-\infty}^{0} P(\{T < s\})\,ds \\
&= \int_{0}^{\infty} \int_{A_t} f_X(x)\,dx\,dt - \int_{-\infty}^{0} \int_{B_s} f_X(x)\,dx\,ds \\
&= \int_{A_0} \int_{0}^{h(x)} f_X(x)\,dt\,dx - \int_{B_0} \int_{h(x)}^{0} f_X(x)\,ds\,dx,
\end{aligned}
\]
on interchanging the order of integration in the above integrals and using the following two immediate observations: (a) $t \in (0, \infty),\ x \in A_t \Leftrightarrow x \in A_0$ and $t \in (0, h(x))$; (b) $s \in (-\infty, 0),\ x \in B_s \Leftrightarrow x \in B_0$ and $s \in (h(x), 0)$. Therefore
\[
E(T) = \int_{A_0} h(x) f_X(x)\,dx + \int_{B_0} h(x) f_X(x)\,dx = \int_{S_X} h(x) f_X(x)\,dx = \int_{-\infty}^{\infty} h(x) f_X(x)\,dx,
\]
which proves the result. ♠
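Theorem 27 says that $E(h(X))$ can be computed directly from $f_X$ without first deriving the distribution of $T = h(X)$; the sketch below (hypothetical p.m.f.) checks that both routes give the same number.

```python
# Sketch: E(h(X)) computed (a) from the p.m.f. of T = h(X) and (b) directly as sum of h(x) f_X(x).
from collections import defaultdict

f_X = {-2: 0.15, -1: 0.25, 0: 0.2, 1: 0.25, 2: 0.15}   # hypothetical p.m.f.
h = lambda x: x * x + 1

f_T = defaultdict(float)
for x, p in f_X.items():
    f_T[h(x)] += p                                      # p.m.f. of T via Theorem 5

E_via_T = sum(t * p for t, p in f_T.items())
E_direct = sum(h(x) * p for x, p in f_X.items())
print(E_via_T, E_direct)    # identical (2.7 for this example)
```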
Some special kinds of expectations which are frequently used are defined below.
♠
The following theorem deals with some elementary properties of expectation.
are finite;
(vi) If $h_1, \ldots, h_m$ are Borel functions, then
\[
E\Big(\sum_{i=1}^{m} h_i(X)\Big) = \sum_{i=1}^{m} E(h_i(X)),
\]
Proof We will provide the proof for the situation when $X$ is of absolutely continuous type. The proof for the discrete case is analogous and is left as an exercise. Also, assertions (iv)-(vi) follow directly from the definition of the expectation of a random variable and elementary properties of integrals. Therefore we will provide the proofs of only the first three assertions.
(i) Since $P(\{h_1(X) \le h_2(X)\}) = 1$, without loss of generality we may take $S_X \subseteq \{x \in \mathbb{R} : h_1(x) \le h_2(x)\}$ (otherwise replace $S_X$ by $S_X^{*} = S_X \cap \{x \in \mathbb{R} : h_1(x) \le h_2(x)\}$, so that $P(\{X \in S_X^{*}\}) = 1$). Then $h_1(x) I_{S_X}(x) f_X(x) \le h_2(x) I_{S_X}(x) f_X(x),\ \forall x \in \mathbb{R}$, and therefore
\[
E(h_1(X)) = \int_{-\infty}^{\infty} h_1(x) I_{S_X}(x) f_X(x)\,dx \le \int_{-\infty}^{\infty} h_2(x) I_{S_X}(x) f_X(x)\,dx = E(h_2(X)).
\]
(ii) Since $P(\{a \le X \le b\}) = 1$, without loss of generality we may assume that $S_X \subseteq [a, b]$. Then $a I_{S_X}(x) f_X(x) \le x I_{S_X}(x) f_X(x) \le b I_{S_X}(x) f_X(x),\ \forall x \in \mathbb{R}$, and therefore
\[
a = \int_{-\infty}^{\infty} a I_{S_X}(x) f_X(x)\,dx \le \int_{-\infty}^{\infty} x I_{S_X}(x) f_X(x)\,dx \le \int_{-\infty}^{\infty} b I_{S_X}(x) f_X(x)\,dx = b,
\]
i.e., $a \le E(X) \le b$.
(iii) Since $P(\{X \ge 0\}) = 1$, without loss of generality we may take $S_X \subseteq [0, \infty)$. Then $(-\infty, 0) \subseteq S_X^{c} = \{x \in \mathbb{R} : f_X(x) = 0\}$ and therefore, for $n \in \{1, 2, \ldots\}$,
\[
0 = E(X) = \int_{-\infty}^{0} x f_X(x)\,dx + \int_{0}^{\infty} x f_X(x)\,dx = \int_{0}^{\infty} x f_X(x)\,dx
\ge \int_{1/n}^{\infty} x f_X(x)\,dx \ge \frac{1}{n} \int_{1/n}^{\infty} f_X(x)\,dx = \frac{1}{n}\, P\Big(\Big\{X \ge \frac{1}{n}\Big\}\Big)
\]
\[
\Rightarrow\ P\Big(\Big\{X \ge \frac{1}{n}\Big\}\Big) = 0,\ \forall n \in \{1, 2, \ldots\}
\ \Rightarrow\ \lim_{n \to \infty} P\Big(\Big\{X \ge \frac{1}{n}\Big\}\Big) = 0
\ \Rightarrow\ P\Big(\bigcup_{n=1}^{\infty} \Big\{X \ge \frac{1}{n}\Big\}\Big) = 0,
\]
i.e., $P(\{X > 0\}) = 0$ and hence $P(\{X = 0\}) = 1$.
♠
As a consequence of the above theorem we have the following corollary.
Corollary 30 Let X be a random variable with finite first two moments. Let E(X) = µ.
Then,
(i) Var(X) = E(X 2 ) − (E(X))2 ;
(ii) Var(X) ≥ 0. Moreover, Var(X) = 0 if, and only if, P ({X = µ}) = 1;
(iii) E(X 2 ) ≥ (E(X))2 , (Cauchy-Schwarz inequality);
(iv) for real constants a and b, Var(aX + b) = a2 Var(X).
Proof (i) Note that $\mu = E(X)$ is a fixed real number. Therefore, using Theorem 29 (v)-(vi), we have
\[
\mathrm{Var}(X) = E((X - \mu)^2) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - (E(X))^2.
\]
(ii) Since $P(\{(X - \mu)^2 \ge 0\}) = P(\Omega) = 1$, using Theorem 29 (i), we have $\mathrm{Var}(X) = E((X - \mu)^2) \ge 0$. Also, using Theorem 29 (iii), if $\mathrm{Var}(X) = E((X - \mu)^2) = 0$ then $P(\{(X - \mu)^2 = 0\}) = 1$, i.e., $P(\{X = \mu\}) = 1$. Conversely, if $P(\{X = \mu\}) = 1$, then $E(X) = \mu$ and $E(X^2) = \mu^2$, and using (i) we get
\[
\mathrm{Var}(X) = E(X^2) - (E(X))^2 = 0.
\]
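A small numerical check of Corollary 30 (i) and (iv), using a hypothetical p.m.f.:

```python
# Sketch: Var(X) = E((X - mu)^2) = E(X^2) - (E(X))^2, and Var(aX + b) = a^2 Var(X).
f_X = {0: 0.3, 1: 0.4, 3: 0.3}      # hypothetical p.m.f.

E = lambda g: sum(g(x) * p for x, p in f_X.items())
mu = E(lambda x: x)
var_centered = E(lambda x: (x - mu) ** 2)
var_shortcut = E(lambda x: x * x) - mu ** 2
print(mu, var_centered, var_shortcut)               # 1.3, 1.41, 1.41

a, b = -2.0, 5.0
var_linear = E(lambda x: (a * x + b) ** 2) - E(lambda x: a * x + b) ** 2
print(var_linear, a ** 2 * var_shortcut)            # equal: Var(aX + b) = a^2 Var(X)
```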
It follows that E(Y1 ) = 1, E(Y12 ) = 9/4 and Var(Y1 ) = E(Y12 ) − (E(Y1 ))2 = 5/4 (cf.
Corollary 30 (i)).
(ii) We have
\[
E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{-2}^{-1} \frac{x}{2}\,dx + \int_{0}^{3} \frac{x^2}{9}\,dx = \frac{1}{4}
\]
and
\[
E\big(e^{-\max(X, 0)}\big) = \int_{-\infty}^{\infty} e^{-\max(x, 0)} f_X(x)\,dx = \int_{-2}^{-1} \frac{1}{2}\,dx + \int_{0}^{3} \frac{x}{9}\,e^{-x}\,dx = \frac{11 - 8e^{-3}}{18}.
\]
Solution (i) Fix r ∈ {1, 2, . . . , n}. Using Theorem 27 (i), we have
(ii) Using (i) we get E(X) = E(X(1) ) = np and E(X(X − 1)) = E(X(2) ) = n(n − 1)p2 .
Therefore E(X 2 ) = E(X(X −1)+X) = n(n−1)p2 +np and Var(X) = E(X 2 )−(E(X))2 =
npq.
(iii) For t ∈ R, we have
\[
E(e^{tX}) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x q^{n-x} = (q + pe^t)^n.
\]
Therefore
We are familiar with the Laplace transform of a given real-valued function defined on $\mathbb{R}$ and also with the fact that, under certain conditions, the Laplace transform of a function determines the function uniquely. In probability theory the Laplace transform of the p.d.f./p.m.f. of a random variable $X$ plays an important role and is referred to as the moment generating function (of the probability distribution) of the random variable $X$.
(i) We call the function $M_X$ the moment generating function (m.g.f.) of (the probability distribution of) the random variable $X$.
(ii) We say that the m.g.f. of a random variable $X$ exists if there exists a positive real number $a$ such that $(-a, a) \subseteq A$ (i.e., if $M_X(t) = E(e^{tX})$ is finite in an interval containing $0$). ♠
\[
M_Y(t) = M_{cX+d}(t) = E\big(e^{t(cX+d)}\big) = e^{td} E\big(e^{ctX}\big) = e^{td} M_X(ct), \quad t \in \Big(-\frac{a}{|c|}, \frac{a}{|c|}\Big),
\]
Proof We will provide the proof for the case where $X$ is of absolutely continuous type. The proof for the case of discrete type $X$ follows in a similar fashion, with integral signs replaced by summation signs.
(i) We have $E(e^{tX}) < \infty, \ \forall t \in (-a, a)$. Therefore
\[
\int_{-\infty}^{0} e^{tx} f_X(x)\,dx < \infty \ \text{ and } \ \int_{0}^{\infty} e^{tx} f_X(x)\,dx < \infty, \quad \forall t \in (-a, a)
\]
\[
\Rightarrow\ \int_{-\infty}^{0} e^{-t|x|} f_X(x)\,dx < \infty \ \text{ and } \ \int_{0}^{\infty} e^{t|x|} f_X(x)\,dx < \infty, \quad \forall t \in (-a, a)
\]
\[
\Rightarrow\ \int_{-\infty}^{0} e^{|t||x|} f_X(x)\,dx < \infty \ \text{ and } \ \int_{0}^{\infty} e^{|t||x|} f_X(x)\,dx < \infty, \quad \forall t \in (-a, a)
\]
\[
\Rightarrow\ \int_{-\infty}^{\infty} e^{|tx|} f_X(x)\,dx < \infty, \quad \forall t \in (-a, a),
\]
i.e., $E(e^{|tX|}) < \infty, \ \forall t \in (-a, a)$. Fix $r \in \{1, 2, \ldots\}$ and $t \in (-a, a) \setminus \{0\}$. Then $\lim_{x \to \infty} \frac{|x|^r}{e^{|tx|}} = 0$ and therefore there exists a positive real number $A_{r,t}$ such that $|x|^r < e^{|tx|}$ whenever $|x| > A_{r,t}$. Thus we have
\[
\begin{aligned}
E(|X|^r) = \int_{-\infty}^{\infty} |x|^r f_X(x)\,dx
&= \int_{|x| \le A_{r,t}} |x|^r f_X(x)\,dx + \int_{|x| > A_{r,t}} |x|^r f_X(x)\,dx \\
&\le A_{r,t}^{r} \int_{|x| \le A_{r,t}} f_X(x)\,dx + \int_{|x| > A_{r,t}} e^{|tx|} f_X(x)\,dx \\
&\le A_{r,t}^{r} + \int_{-\infty}^{\infty} e^{|tx|} f_X(x)\,dx < \infty.
\end{aligned}
\]
Under the assumption that $M_X(t) = E(e^{tX}) < \infty, \ \forall t \in (-a, a)$, using arguments of advanced calculus, it can be shown that the derivative can be passed through the integral sign. Therefore
\[
M_X^{(r)}(t) = \frac{d^r}{dt^r} \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{d^r}{dt^r}\big(e^{tx}\big) f_X(x)\,dx = \int_{-\infty}^{\infty} x^r e^{tx} f_X(x)\,dx
\]
\[
\Rightarrow\ M_X^{(r)}(0) = \int_{-\infty}^{\infty} x^r f_X(x)\,dx = E(X^r).
\]
Under the assumption that MX (t) = E(etX ) < ∞, ∀t ∈ (−a, a), using arguments of
advanced calculus, it can be shown that the summation sign can be passed through the
integral sign, i.e.,
\[
M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!} \int_{-\infty}^{\infty} x^r f_X(x)\,dx = \sum_{r=0}^{\infty} \frac{t^r}{r!}\,\mu_r'.
\]
♠
As a consequence of the above theorem we have the following corollary.
Corollary 35 Under the notation and assumptions of Theorem 34, define $\psi_X : (-a, a) \to \mathbb{R}$ by $\psi_X(t) = \ln(M_X(t)),\ t \in (-a, a)$. Then $\mu_1' = E(X) = \psi_X^{(1)}(0)$ and $\mu_2 = \mathrm{Var}(X) = \psi_X^{(2)}(0)$, where $\psi_X^{(r)}$ denotes the $r$-th ($r \in \{1, 2, \ldots\}$) derivative of $\psi_X$.
Proof We have $\psi_X^{(1)}(t) = \frac{M_X^{(1)}(t)}{M_X(t)}$ and $\psi_X^{(2)}(t) = \frac{M_X^{(2)}(t) M_X(t) - (M_X^{(1)}(t))^2}{(M_X(t))^2}$, $t \in (-a, a)$. Using the facts that $M_X(0) = 1$ and $M_X^{(r)}(0) = E(X^r),\ r \in \{1, 2, \ldots\}$ (cf. Theorem 34 (ii)), we get
\[
\psi_X^{(1)}(0) = \frac{M_X^{(1)}(0)}{M_X(0)} = E(X)
\quad\text{and}\quad
\psi_X^{(2)}(0) = \frac{M_X^{(2)}(0) M_X(0) - (M_X^{(1)}(0))^2}{(M_X(0))^2} = E(X^2) - (E(X))^2 = \mathrm{Var}(X). \qquad \spadesuit
\]
Let $X$ be a random variable with p.m.f.
\[
f_X(x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!}, & \text{if } x \in \{0, 1, 2, \ldots\} \\[4pt] 0, & \text{otherwise,} \end{cases}
\]
where $\lambda > 0$.
(i) Find the m.g.f. $M_X(t)$, $t \in A = \{s \in \mathbb{R} : E(e^{sX}) < \infty\}$, of $X$. Show that $X$ possesses moments of all orders. Find the mean and variance of $X$;
(ii) Find $\psi_X(t) = \ln(M_X(t))$, $t \in A$. Hence find the mean and variance of $X$;
(iii) Find the first four terms in the power series expansion of $M_X$ centered at $0$.
Solution We have
\[
M_X(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx}\,\frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}, \quad \forall t \in \mathbb{R}.
\]
Since $A = \{s \in \mathbb{R} : E(e^{sX}) < \infty\} = \mathbb{R}$, by Theorem 34 (i), for every $r \in \{1, 2, \ldots\}$, $\mu_r'$ is finite. Clearly
\[
M_X^{(1)}(t) = \lambda e^t e^{\lambda(e^t - 1)} \quad\text{and}\quad M_X^{(2)}(t) = \lambda e^t e^{\lambda(e^t - 1)} + \lambda^2 e^{2t} e^{\lambda(e^t - 1)}, \quad t \in \mathbb{R}.
\]
Therefore $E(X) = M_X^{(1)}(0) = \lambda$, $E(X^2) = M_X^{(2)}(0) = \lambda + \lambda^2$ and $\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \lambda$.
(ii) We have $\psi_X(t) = \ln(M_X(t)) = \lambda(e^t - 1),\ t \in \mathbb{R}$, and $\psi_X^{(1)}(t) = \psi_X^{(2)}(t) = \lambda e^t,\ t \in \mathbb{R}$. Therefore $E(X) = \psi_X^{(1)}(0) = \lambda$ and $\mathrm{Var}(X) = \psi_X^{(2)}(0) = \lambda$.
(iii) We have $M_X^{(3)}(t) = \lambda^3 e^{3t} e^{\lambda(e^t - 1)} + 3\lambda^2 e^{2t} e^{\lambda(e^t - 1)} + \lambda e^t e^{\lambda(e^t - 1)},\ t \in \mathbb{R}$. Therefore $\mu_3' = E(X^3) = M_X^{(3)}(0) = \lambda^3 + 3\lambda^2 + \lambda$. Since $A = \{s \in \mathbb{R} : E(e^{sX}) < \infty\} = \mathbb{R}$, by Theorem 34
(iii), we have
\[
M_X(t) = 1 + \mu_1' t + \mu_2' \frac{t^2}{2!} + \mu_3' \frac{t^3}{3!} + \cdots
= 1 + \lambda t + \lambda(\lambda + 1)\frac{t^2}{2!} + \lambda(\lambda^2 + 3\lambda + 1)\frac{t^3}{3!} + \cdots, \quad t \in \mathbb{R}.
\]
Now let $X$ be a random variable with p.d.f. $f_X(x) = e^{-x}$ for $x > 0$ (and $f_X(x) = 0$ otherwise).
(i) Find the m.g.f. $M_X(t)$, $t \in A = \{s \in \mathbb{R} : E(e^{sX}) < \infty\}$, of $X$. Show that $X$ possesses moments of all orders. Find the mean and variance of $X$;
(ii) Find $\psi_X(t) = \ln(M_X(t))$, $t \in A$. Hence find the mean and variance of $X$;
(iii) Expand $M_X(t)$ as a power series centered at $0$ and hence find $E(X^r)$, $r \in \{1, 2, \ldots\}$.
Solution We have
\[
M_X(t) = E(e^{tX}) = \int_{0}^{\infty} e^{tx} e^{-x}\,dx = \int_{0}^{\infty} e^{-(1-t)x}\,dx = \frac{1}{1-t}, \quad \text{if } t < 1,
\]
and $M_X(t) = \infty$ if $t \ge 1$, so that $A = (-\infty, 1) \supseteq (-1, 1)$ and, by Theorem 34 (i), $X$ possesses moments of all orders. Clearly
\[
M_X^{(1)}(t) = (1-t)^{-2} \quad\text{and}\quad M_X^{(2)}(t) = 2(1-t)^{-3}, \quad t < 1.
\]
Therefore $E(X) = M_X^{(1)}(0) = 1$, $E(X^2) = M_X^{(2)}(0) = 2$ and $\mathrm{Var}(X) = E(X^2) - (E(X))^2 = 1$.
(ii) We have $\psi_X(t) = \ln(M_X(t)) = -\ln(1-t),\ t < 1$, $\psi_X^{(1)}(t) = (1-t)^{-1},\ t < 1$, and $\psi_X^{(2)}(t) = (1-t)^{-2},\ t < 1$. Therefore $E(X) = \psi_X^{(1)}(0) = 1$ and $\mathrm{Var}(X) = \psi_X^{(2)}(0) = 1$.
(iii) We have
\[
M_X(t) = (1-t)^{-1} = \sum_{r=0}^{\infty} t^r = \sum_{r=0}^{\infty} r!\,\frac{t^r}{r!}, \quad t \in (-1, 1).
\]
Comparing with Theorem 34 (iii), we get $E(X^r) = \mu_r' = r!$, $r \in \{1, 2, \ldots\}$.
Example 38 Let $X$ be a random variable with p.d.f.
\[
f_X(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}, \quad -\infty < x < \infty.
\]
Show that the m.g.f. of $X$ does not exist.
Solution From Example 25 we know that the expected value of $X$ is not finite. Therefore, using Theorem 34 (i), we conclude that the m.g.f. of $X$ does not exist. ♠
In the sequel we will see that, under certain conditions, a probability distribution is
uniquely determined by its m.g.f..
Definition 39 Two random variables $X$ and $Y$ are said to have the same distribution (written as $X \stackrel{d}{=} Y$) if they have the same distribution function, i.e., $F_X(x) = F_Y(x), \ \forall x \in \mathbb{R}$. ♠
It follows that if $X$ and $Y$ are of discrete type then $X \stackrel{d}{=} Y$ if, and only if, $X$ and $Y$ have the same p.m.f., i.e., $f_X(x) = f_Y(x), \ \forall x \in \mathbb{R}$. If $X \stackrel{d}{=} Y$ with common distribution function $G$, and $G$ is differentiable everywhere except possibly on a finite set $C$, then, using Remark 27 (viii) of Chapter II, it follows that both $X$ and $Y$ are of absolutely continuous type with a common (version of) p.d.f.
\[
g(x) = \begin{cases} G'(x), & \text{if } x \notin C \\ 0, & \text{otherwise.} \end{cases}
\]
It follows that if $X$ and $Y$ have distribution functions that are differentiable everywhere except possibly on finite sets, then $X \stackrel{d}{=} Y$ if, and only if, there exist versions $f_X$ and $f_Y$ of p.d.f.s of $X$ and $Y$, respectively, such that $f_X(x) = f_Y(x), \ \forall x \in \mathbb{R}$.
The following theorem is immediate from the above discussion.
Theorem 40 (i) Let $X$ and $Y$ be random variables of discrete type with p.m.f.s $f_X$ and $f_Y$, respectively. Then $X \stackrel{d}{=} Y$ if, and only if, $f_X(x) = f_Y(x), \ \forall x \in \mathbb{R}$;
(ii) Let $X$ and $Y$ be random variables having distribution functions that are differentiable everywhere except possibly on finite sets. Then both of them are of absolutely continuous type. Moreover, $X \stackrel{d}{=} Y$ if, and only if, there exist versions of p.d.f.s $f_X$ and $f_Y$ of $X$ and $Y$, respectively, such that $f_X(x) = f_Y(x), \ \forall x \in \mathbb{R}$. ♠
i.e., $P(\{\psi(X) \le a\}) = P(\{\psi(Y) \le a\})$. It follows that $\psi(X)$ and $\psi(Y)$ have the same distribution function, i.e., $\psi(X) \stackrel{d}{=} \psi(Y)$.
(ii) Suppose that $X \stackrel{d}{=} Y$, i.e., for each $x \in \mathbb{R}$, $F_X(x) = F_Y(x) = G(x)$, say. Since the common distribution function $G$ is differentiable everywhere except possibly on a finite set $C$ (say), we may take their common p.d.f. to be (cf. Remark 27 (viii) of Chapter II)
\[
f_X(x) = f_Y(x) = \begin{cases} G'(x), & \text{if } x \notin C \\ 0, & \text{otherwise.} \end{cases}
\]
Therefore
\[
E(h(X)) = \int_{-\infty}^{\infty} h(x) f_X(x)\,dx = \int_{-\infty}^{\infty} h(x) f_Y(x)\,dx = E(h(Y)).
\]
The proof for $\psi(X) \stackrel{d}{=} \psi(Y)$ follows on the lines of the proof of (i). ♠
(i) Let $X$ be a random variable with p.m.f. $f_X(x) = \binom{n}{x}\big(\tfrac{1}{2}\big)^n$ for $x \in \{0, 1, \ldots, n\}$ (and $f_X(x) = 0$ otherwise), where $n$ is a given positive integer. Let $Y = n - X$. Show that $Y \stackrel{d}{=} X$ and hence show that $E(X) = n/2$.
(ii) Let $X$ be a random variable with p.d.f.
\[
f_X(x) = \frac{e^{-|x|}}{2}, \quad -\infty < x < \infty,
\]
and let $Y = -X$. Show that $Y \stackrel{d}{=} X$ and hence show that $E(X) = 0$.
Proof (i) Clearly E(X) is finite. Using Example 9 it follows that the p.m.f. of Y = n − X
is given by
i.e., $Y \stackrel{d}{=} X$. Hence (using Corollary 41 (i))
\[
E(X) = E(Y) = E(n - X) = n - E(X) \ \Rightarrow\ E(X) = \frac{n}{2}.
\]
(ii) Using Corollary 14 it can be shown that the p.d.f. of the random variable $Y$ is
\[
f_Y(y) = \frac{e^{-|y|}}{2} = f_X(y), \quad -\infty < y < \infty.
\]
It follows that $Y \stackrel{d}{=} X$ and therefore (since $E(X)$ is finite)
\[
E(X) = E(Y) = E(-X) = -E(X) \ \Rightarrow\ E(X) = 0.
\]
Proof For simplicity we will assume that either X is of discrete type or X is of absolutely
continuous type. Moreover if X is of absolutely continuous type then we will assume that
its distribution function is differentiable everywhere except possibly on a finite set.
(i) Let Y1 = X − µ and Y2 = µ − X. The assertion follows from Theorem 40 on noting
that the p.d.f.s/p.m.f.s of Y1 and Y2 are given by fY1 (y) = fX (µ + y), y ∈ R and fY2 (y) =
fX (µ − y), y ∈ R.
(ii) Let $Y_1 = X - \mu$ and $Y_2 = \mu - X$. Then $Y_1$ and $Y_2$ have distribution functions $F_{Y_1}(x) = F_X(\mu + x),\ x \in \mathbb{R}$, and $F_{Y_2}(x) = 1 - F_X((\mu - x)-),\ x \in \mathbb{R}$. Therefore
\[
Y_1 \stackrel{d}{=} Y_2 \ \Leftrightarrow\ F_{Y_1}(x) = F_{Y_2}(x), \ \forall x \in \mathbb{R}
\ \Leftrightarrow\ F_X(\mu + x) + F_X((\mu - x)-) = 1, \ \forall x \in \mathbb{R}.
\]
(iii) Clearly
\[
\text{the distribution of } X \text{ is symmetric about } \mu \ \Leftrightarrow\ X - \mu \stackrel{d}{=} \mu - X = -(X - \mu) \ \Leftrightarrow\ Y \stackrel{d}{=} -Y.
\]
Let X and Y be random variables having the same distribution. Suppose that the
m.g.f. MX exists. Then there exists a positive real number a such that E(etX ) < ∞, ∀t ∈
(−a, a). Using Corollary 41, we conclude that MY exists and MY (t) = MX (t), ∀t ∈
(−a, a). Thus if two random variables have the same distribution then they have the
same m.g.f., provided it exists. The following theorem illustrates that the converse is also
true.
Proof We will provide the proof for the special case where $X$ and $Y$ are of discrete type with $S_X = S_Y \subseteq \{0, 1, 2, \ldots\}$, as the general proof is involved. We have, for every $t$ in an interval around $0$,
\[
M_X(t) = \sum_{k=0}^{\infty} e^{tk} P(\{X = k\}) = \sum_{k=0}^{\infty} e^{tk} P(\{Y = k\}) = M_Y(t),
\]
i.e., the power series $\sum_{k=0}^{\infty} P(\{X = k\}) s^k$ and $\sum_{k=0}^{\infty} P(\{Y = k\}) s^k$ (with $s = e^t$) agree on an interval.
We know that if two power series match over an interval then they have the same coeffi-
cients. It follows that P ({X = k}) = P ({Y = k}), k ∈ {0, 1, 2, . . .}, i.e., X and Y have
the same p.m.f.. Now the result follows using Theorem 40 (i). ♠
Example 46 Let µ ∈ R and σ > 0 be real constants and let Xµ,σ be a random variable
having p.d.f.
\[
f_{X_{\mu,\sigma}}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.
\]
(i) Show that fXµ,σ is a p.d.f.;
(ii) Show that the probability distribution function of Xµ,σ is symmetric about µ. Hence
find E(Xµ,σ );
(iii) Find the m.g.f. of Xµ,σ and hence find the mean and variance of Xµ,σ ;
(iv) Let $Y_{\mu,\sigma} = a X_{\mu,\sigma} + b$, where $a \neq 0$ and $b$ are real constants. Using the m.g.f. of $X_{\mu,\sigma}$, show that the p.d.f. of $Y_{\mu,\sigma}$ is
\[
f_{Y_{\mu,\sigma}}(y) = \frac{1}{|a|\sigma\sqrt{2\pi}}\, e^{-\frac{(y - (a\mu + b))^2}{2a^2\sigma^2}}, \quad -\infty < y < \infty.
\]
Proof (i) Clearly $f_{X_{\mu,\sigma}}(x) \ge 0, \ \forall x \in \mathbb{R}$. Also
\[
\int_{-\infty}^{\infty} f_{X_{\mu,\sigma}}(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\,dz \quad \Big(\text{on making the transformation } \tfrac{x-\mu}{\sigma} = z\Big)
= I, \text{ say.}
\]
Clearly $I \ge 0$ and
\[
I^2 = \Big(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy\Big)\Big(\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\,dz\Big)
= \frac{1}{2\pi} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{y^2 + z^2}{2}}\,dy\,dz.
\]
On transforming to polar coordinates ($y = r\cos\theta$, $z = r\sin\theta$), we get $I^2 = \frac{1}{2\pi} \int_{0}^{2\pi}\int_{0}^{\infty} r\,e^{-\frac{r^2}{2}}\,dr\,d\theta = 1$, and since $I \ge 0$ it follows that $I = 1$, i.e., $f_{X_{\mu,\sigma}}$ is a p.d.f.
Now using Theorem 44 (i) and (v) it follows that the distribution of Xµ,σ is symmetric
about µ and E(Xµ,σ ) = µ.
(iii) For $t \in \mathbb{R}$,
\[
M_{X_{\mu,\sigma}}(t) = E\big(e^{tX_{\mu,\sigma}}\big) = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = e^{\mu t + \frac{\sigma^2 t^2}{2}}.
\]
Also, for $t \in \mathbb{R}$,
\[
\psi_{X_{\mu,\sigma}}(t) = \ln\big(M_{X_{\mu,\sigma}}(t)\big) = \mu t + \frac{\sigma^2 t^2}{2}
\ \Rightarrow\ E(X_{\mu,\sigma}) = \psi_{X_{\mu,\sigma}}^{(1)}(0) = \mu \ \text{ and }\ \mathrm{Var}(X_{\mu,\sigma}) = \psi_{X_{\mu,\sigma}}^{(2)}(0) = \sigma^2.
\]
(iv) For $t \in \mathbb{R}$, using the m.g.f. of a linear function of a random variable,
\[
M_{Y_{\mu,\sigma}}(t) = M_{aX_{\mu,\sigma}+b}(t) = e^{bt} M_{X_{\mu,\sigma}}(at) = e^{bt} e^{\mu a t + \frac{\sigma^2 a^2 t^2}{2}} = e^{(a\mu + b)t + \frac{(|a|\sigma)^2 t^2}{2}} = M_{X_{a\mu+b,\, |a|\sigma}}(t)
\ \Rightarrow\ Y_{\mu,\sigma} \stackrel{d}{=} X_{a\mu+b,\, |a|\sigma} \quad (\text{using Theorem 45}).
\]
Now using Theorem 40 (ii) it follows that the p.d.f. of Yµ,σ is given by
♠
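A numerical cross-check of the normal m.g.f. derived in Example 46 (a sketch; the integral is approximated on a wide truncated grid, and $\mu$ and $\sigma^2$ are recovered from $\psi = \ln M$ by finite differences, which are exact here because $\psi$ is a quadratic):

```python
# Sketch: check that E(e^{tX}) for the N(mu, sigma^2) p.d.f. equals exp(mu*t + sigma^2*t^2/2),
# and recover mu and sigma^2 from psi(t) = ln M(t).
import numpy as np

mu, sigma = 1.5, 2.0

def mgf(t, n=1_000_000, half_width=60.0):
    # brute-force midpoint-rule integral of e^{tx} f(x); the truncated tails are negligible
    a, b = mu - half_width, mu + half_width
    x = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return float(np.sum(np.exp(t * x) * pdf) * (b - a) / n)

for t in (-0.5, 0.3, 1.0):
    print(round(mgf(t), 5), round(float(np.exp(mu * t + sigma ** 2 * t ** 2 / 2)), 5))

psi = lambda t: np.log(mgf(t))
h = 0.1   # psi is exactly quadratic, so central differences recover mu and sigma^2
print((psi(h) - psi(-h)) / (2 * h))                 # ~ mu      = 1.5
print((psi(h) - 2 * psi(0.0) + psi(-h)) / h ** 2)   # ~ sigma^2 = 4.0
```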
Proof From the solution of Example 32 (iii), it is clear that the m.g.f. of Xp is given
Therefore, for t ∈ R,
(ii) For t ∈ R
We often come across situations where the probability of a Borel set under a given probability distribution cannot be explicitly evaluated, and thus some estimate of the probability may be desired. For example, if a random variable $Z$ has the p.d.f.
\[
f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \quad -\infty < z < \infty,
\]
then
\[
P(\{Z > 2\}) = \int_{2}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\,dz \qquad (48)
\]
cannot be explicitly evaluated, and an estimate of this probability may be desired. To estimate this probability one has to either resort to numerical integration or use some other estimation procedure. Inequalities are popular estimation procedures and they play an important role in probability theory. The following theorem provides an inequality which can be used for estimating tail probabilities of the type $P(\{|X| > c\}),\ c \in \mathbb{R}$.
Proof We provide the proof for the case where $X$ is of absolutely continuous type. The proof for the discrete case follows in a similar fashion, with integral signs replaced by summation signs. Fix $c > 0$ such that $g(c) > 0$ and define $A = \{x \in \mathbb{R} : |x| \ge c\}$. Then
\[
\begin{aligned}
E(g(|X|)) = \int_{-\infty}^{\infty} g(|x|) f_X(x)\,dx
&\ge \int_{-\infty}^{\infty} g(|x|) I_A(x) f_X(x)\,dx \qquad (\text{since } g(|x|) \ge g(|x|) I_A(x),\ \forall x \in \mathbb{R}) \\
&\ge g(c) \int_{-\infty}^{\infty} I_A(x) f_X(x)\,dx \qquad (\text{since } g(|x|) I_A(x) \ge g(c) I_A(x),\ \forall x \in \mathbb{R}, \text{ as } g \uparrow) \\
&= g(c)\, P(\{X \in A\}) = g(c)\, P(\{|X| \ge c\})
\end{aligned}
\]
\[
\Rightarrow\ P(\{|X| \ge c\}) \le \frac{E(g(|X|))}{g(c)}.
\]
♠
As a consequence of the above theorem we have the following corollary.
\[
P(\{|X - \mu| \ge k\}) \le \frac{\sigma^2}{k^2}.
\]
Proof (i) Fix $c > 0$ and $r > 0$, and let $g(x) = x^r,\ x \ge 0$. Clearly $g$ is a non-negative and non-decreasing function. Using Theorem 49 we get
\[
P(\{|X| \ge c\}) \le \frac{E(|X|^r)}{c^r}.
\]
(ii) Taking $r = 2$ and applying the above inequality to the random variable $X - \mu$, we get
\[
P(\{|X - \mu| \ge k\}) \le \frac{E(|X - \mu|^2)}{k^2} = \frac{\sigma^2}{k^2}.
\]
Example 51 Let us revisit the problem of estimating $P(\{Z > 2\})$, given by (48). Using Example 46 (iii), we have $\mu = E(Z) = 0$ and $\sigma^2 = \mathrm{Var}(Z) = 1$. Moreover, using Example 46 (ii), we have $Z \stackrel{d}{=} -Z$. Consequently $P(\{Z > 2\}) = P(\{-Z > 2\}) = P(\{Z < -2\})$, i.e., $P(\{Z > 2\}) = P(\{|Z| > 2\})/2 = P(\{|Z| \ge 2\})/2$. Using the Chebyshev inequality we have
\[
P(\{|Z| \ge 2\}) \le \frac{1}{4} = 0.25,
\]
and therefore $P(\{Z > 2\}) \le 0.125$. The exact value of $P(\{Z > 2\})$, obtained using numerical integration, is $0.0228$. ♠
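Example 51 is easy to reproduce with the standard library (erfc gives the exact normal tail):

```python
# Sketch: Chebyshev bound vs. the exact standard normal tail probability P(Z > 2).
from math import erfc, sqrt

exact = 0.5 * erfc(2 / sqrt(2))        # P(Z > 2) for Z ~ N(0, 1)
chebyshev = (1 / 2 ** 2) / 2           # P(|Z| >= 2) <= 1/4, and P(Z > 2) = P(|Z| >= 2)/2
print(round(exact, 4), chebyshev)      # 0.0228 vs. 0.125
```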
The following example illustrates that the bounds provided in Theorem 49 and Corollary 50 are tight, i.e., the upper bounds provided therein may be attained.
Clearly $E(X^2) = 1/4$ and, therefore, using the Markov inequality we have
\[
P(\{|X| \ge 1\}) \le \frac{E(X^2)}{1^2} = \frac{1}{4}.
\]
On the other hand,
\[
P(\{|X| \ge 1\}) = P(\{X \in \{-1, 1\}\}) = \frac{1}{4}.
\]
It follows that the upper bounds provided in Theorem 49 and Corollary 50 may be attained. ♠
We call these measures descriptive measures. Four prominently used types of descriptive measures are: (i) measures of central tendency (or location), also referred to as averages; (ii) measures of dispersion; (iii) measures of skewness; and (iv) measures of kurtosis.
I (b) Median. A real number $m$ satisfying
\[
F_X(m-) \le \frac{1}{2} \le F_X(m), \quad\text{i.e.,}\quad P(\{X < m\}) \le \frac{1}{2} \le P(\{X \le m\}),
\]
is called the median (of the probability distribution of) X. Clearly if m is the median
of a probability distribution then, in the long run (i.e., when the random experiment
E is repeated a large number of times), the values of X on either side of m in SX are
observed with the same frequency. Thus the median of a probability distribution, in some
sense, divides SX into two equal parts each having the same probability of occurrence.
It is evident that if X is of absolutely continuous type then the median m is given by
FX (m) = 1/2. For some distributions (especially for distributions of discrete type random
variables) it may happen that {x ∈ R : FX (x) = 1/2} = [a, b), for some −∞ < a < b < ∞,
44
so that the median is not unique . In that case P ({X = x}) = 0, ∀x ∈ (a, b) and thus
we take the median to be m = a = inf{x ∈ R : FX (x) ≥ 1/2}. For random variables
having a symmetric probability distribution it is easy to verify that the mean and the
median coincide (Exercise 23). Unlike the mean, the median of a probability distribution
is always defined. Moreover the median is not affected by a few extreme values as it
takes into account only the probabilities with which different values occur and not their
numerical values. As a measure of central tendency the median is preferred over the mean
if the distribution is asymmetric and a few extreme observations are assigned positive
probabilities. However the fact that the median does not at all take into account the
numerical values of X is one of its demerits. Another disadvantage with median is that
for many probability distributions it is not easy to evaluate.
I (c) Mode. Roughly speaking the mode m0 of a probability distribution is the value
that occurs with highest probability and is defined by fX (m0 ) = sup{fX (x) : x ∈ SX }.
Clearly if $m_0$ is the mode of a probability distribution of $X$ then, in the long run, either $m_0$ or a value in the neighborhood of $m_0$ is observed with maximum frequency. The mode is easy to understand and easy to calculate; normally it can be found by inspection. Note that a probability distribution may have more than one mode, and these modes may be far apart. Moreover the mode does not take into account the numerical values of $X$, and it also does not take into account the probabilities associated with all the values of $X$. These are crucial deficiencies of the mode which make it less preferable than the mean and the median. A probability distribution with one (two/three) mode(s) is called a unimodal (bimodal/trimodal)
distribution. A distribution with multiple modes is called a multimodal distribution.
II (a) Variance and Standard Deviation. The variance of (the probability distribution of) $X$ is defined by $\mathrm{Var}(X) = \mu_2 = E((X - \mu)^2)$, where $\mu = E(X)$ is the mean (of the probability distribution) of $X$. The standard deviation (of the probability distribution) of $X$ is defined by $\sigma = \sqrt{\mu_2} = \sqrt{E((X - \mu)^2)}$. Clearly the variance and the standard deviation give us an idea about the average spread of the values of $X$ around the mean $\mu$. However, unlike the variance, the unit of measurement of the standard deviation is the same as that of $X$. Because of its simplicity and intuitive appeal, the standard deviation is the most widely used measure of dispersion. Some of the demerits of the standard deviation are that in many situations it may not be defined (distributions for which the second moment is not finite) and that it is sensitive to the presence of a few extreme values of $X$ which are different from the other values. A justification for having the mean $\mu$ in place of the median or any other average in the definition of $\sigma = \sqrt{E((X - \mu)^2)}$ is that $\sqrt{E((X - \mu)^2)} \le \sqrt{E((X - c)^2)}, \ \forall c \in \mathbb{R}$ (Exercise 24 (a)).
II (b) Mean Deviation. Let A be a suitable average. The mean deviation (of probability
distribution) of X around average A is defined by MD(A) = E(|X − A|). Among various
mean deviations, the mean deviation about the median m is more preferable than the
others. A reason for this preference is the fact that, for any random variable $X$, $\mathrm{MD}(m) = E(|X - m|) \le E(|X - c|) = \mathrm{MD}(c), \ \forall c \in \mathbb{R}$ (cf. Exercise 24 (b)). Since a natural
distance between X and m is |X − m|, as a measure of dispersion, the mean deviation
about median seems to be more appealing than the standard deviation. Although the
mean deviation about median (or mean) has more intuitive appeal than the standard
deviation, in most situations, they are not easy to evaluate. Some of the other demerits
of mean deviations are that in many situations they may not be defined and that they are
sensitive to presence of a few extreme values of X which are different from other values.
II (c) Quartile Deviation. A common drawback with the standard deviation and
mean deviations, as measures of dispersion, is that they are sensitive to presence of a
few extreme values of X. Quartile deviation measures the spread in the middle half of
the distribution and is therefore not influenced by extreme values. Let q1 and q3 be real
numbers such that
1 3
FX (q1 −) ≤ ≤ FX (q1 ) and FX (q3 −) ≤ ≤ FX (q3 )
4 4
1 3
i.e., P ({X < q1 }) ≤ ≤ P ({X ≤ q1 }) and P ({X < q3 }) ≤ ≤ P ({X ≤ q3 }).
4 4
The quantities q1 and q3 are called, respectively, the lower and upper quartiles of the
probability distribution of random variable X. Clearly if q1 , m and q3 are respectively
the lower quartile, the median and the upper quartile of a probability distribution then
they divide the probability distribution in four parts so that, in the long run (i.e., when
the random experiment E is repeated a large number of times), twenty five percent of the
observed values of X are expected to be less than q1 , fifty percent of the observed values of
X are expected to be less than m and seventy five percent of the observed values of X are
expected to be less than q3 . The quantity IQR = q3 −q1 is called the inter-quartile range of
the probability distribution of X and the quantity QD = (q3 − q1 )/2 is called the quartile
deviation or the semi-inter-quartile range of the probability distribution of X. It is evident
that if X is of absolutely continuous type then q1 and q3 are given by FX (q1 ) = 1/4 and
FX (q3 ) = 3/4. For some distributions (especially for distributions of discrete type random
variables) it may happen that {x ∈ R : FX (q1 ) = 1/4} = [a, b) and/or {x ∈ R : FX (q3 ) =
3/4} = [c, d) for some −∞ < a < b < ∞ and −∞ < a < b < ∞, so that q1 and/or q3
are not uniquely defined. In that case P ({X = x}) = 0, ∀x ∈ (a, b)/(c, d) and thus we
take q1 = a = inf{x ∈ R : FX (x) ≥ 1/4} and/or q3 = c = inf{x ∈ R : FX (x) ≥ 3/4}.
For random variables having a symmetric probability distribution it is easy to verify that
$m = (q_1 + q_3)/2$ (cf. Exercise 23). Although, unlike the standard deviation and the mean deviation, the quartile deviation is not sensitive to the presence of some extreme values of $X$, a major drawback with the quartile deviation is that it ignores the tails of the probability distribution (which constitute 50% of the probability distribution). Note that the quartile
deviation depends on the units of measurement of random variable X and thus it may
not be an appropriate measure for comparing dispersions of two probability distributions.
For comparing dispersion of two different probability distributions a normalized measure
such as
\[
\mathrm{CQD} = \frac{\frac{q_3 - q_1}{2}}{\frac{q_3 + q_1}{2}} = \frac{q_3 - q_1}{q_3 + q_1}
\]
seems to be more appropriate. The quantity CQD is called the coefficient of quartile
deviation of the probability distribution of X. Clearly the coefficient of quartile deviation
is independent of units and thus it can be used to compare dispersions of two different
probability distributions.
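The quartile-based quantities can be computed directly from a distribution function via $q_p = \inf\{x : F_X(x) \ge p\}$; the sketch below does this by bisection for the p.d.f. $f_X(x) = e^{-x}$, $x > 0$ (a convenient example, not one prescribed in this section).

```python
# Sketch: lower quartile, median, upper quartile, IQR, QD and CQD from a distribution
# function, using q_p = inf{x : F_X(x) >= p} found by bisection.
from math import exp

F = lambda x: 1 - exp(-x) if x > 0 else 0.0     # distribution function of the example

def quantile(p, lo=0.0, hi=50.0, iters=200):
    for _ in range(iters):                      # bisection for inf{x : F(x) >= p}
        mid = (lo + hi) / 2
        if F(mid) >= p:
            hi = mid
        else:
            lo = mid
    return hi

q1, m, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
print(round(q1, 4), round(m, 4), round(q3, 4))              # ln(4/3), ln 2, ln 4
print(round(q3 - q1, 4), round((q3 - q1) / 2, 4))           # IQR, QD
print(round((q3 - q1) / (q3 + q1), 4))                      # CQD
```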
II (d) Coefficient of Variation. Like quartile deviation, standard deviation σ also
depends on the units of measurement of random variable X and thus it is not an appro-
priate measure for comparing dispersions of two different probability distributions. Let
µ and σ, respectively, be the mean and the standard deviation of the distribution of X.
Suppose that $\mu \neq 0$. The coefficient of variation of the probability distribution of $X$ is defined by
\[
\mathrm{CV} = \frac{\sigma}{\mu}.
\]
Clearly the coefficient of variation measures the variation per unit of mean and is inde-
pendent of units. Therefore it seems to be an appropriate measure to compare dispersion
of two different probability distributions. A disadvantage with the coefficient of variation
is that when the mean µ is close to zero it is very sensitive to small changes in the mean.
Let $\mu$ and $\sigma$, respectively, be the mean and the standard deviation of the distribution of $X$, and let $Z = (X - \mu)/\sigma$ be the standardized variable. A measure of skewness of the probability distribution of $X$ is defined by
\[
\beta_1 = E(Z^3) = \frac{E((X - \mu)^3)}{\sigma^3} = \frac{\mu_3}{\mu_2^{3/2}}.
\]
The quantity β1 is simply called the coefficient of skewness. Clearly for symmetric dis-
tributions β1 = 0 (cf. Theorem 44 (vi)). However the converse may not be true, i.e.,
there are examples of skewed probability distributions for which β1 = 0. A large positive
value of β1 indicates that the distribution is positively skewed and a small negative value
of β1 indicates that the data is negatively skewed. A measure of skewness can also be
based on quartiles. Let q1 , m, q3 and µ denote respectively the lower quartile, the median,
the upper quartile and the mean of the probability distribution of X. We know that
for random variables having a symmetric probability distribution µ = m = (q1 + q3 )/2,
i.e., q3 − m = m − q1 . For positively (negatively) skewed distribution we will have
(q3 − m) > (<) (m − q1 ). Thus one may also define a measure of skewness based on
(q3 − m) − (m − q1 ) = q3 − 2m + q1 . To make this quantity independent of units one may
consider
\[
\beta_2 = \frac{q_3 - 2m + q_1}{q_3 - q_1}
\]
as a measure of skewness. The quantity β2 is called the Yule coefficient of skewness.
Let $Y_{\mu,\sigma}$ be a random variable having the p.d.f.
\[
f_{Y_{\mu,\sigma}}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.
\]
We have seen (cf. Example 46 (iii)) that µ and σ 2 are respectively the mean and the
variance of the distribution of Yµ,σ . We call the probability distribution corresponding to
p.d.f. fYµ,σ as the normal distribution with mean µ and variance σ 2 (denoted by N (µ, σ 2 )).
We know that N (µ, σ 2 ) distribution is symmetric about µ (cf. Example 46 (ii)). Also it
is easy to verify that N (µ, σ 2 ) distribution is unimodal with µ as the common value of
mean, median and mode. Kurtosis of the probability distribution of X is a measure of
peakedness and thickness of tails of p.d.f. of X relative to the peakedness and thickness
of tails of the p.d.f. of normal distribution. A distribution is said to have higher (lower)
kurtosis than the normal distribution if its p.d.f., in comparison with the p.d.f. of a
normal distribution, has a sharper (rounded) peak and longer, fatter (shorter, thinner)
tails. Let µ and σ, respectively, be the mean and the standard deviation of distribution
of X and let Z = (X − µ)/σ be the standardized variable. A measure of kurtosis of the
probability distribution of X is defined by
\[
\gamma_1 = E(Z^4) = \frac{E((X - \mu)^4)}{\sigma^4} = \frac{\mu_4}{\mu_2^{2}}.
\]
\[
\gamma_2 = \gamma_1 - 3
\]
is called the excess kurtosis of the distribution of X. It is clear that for normal distributions
the excess kurtosis is zero. Distributions with zero excess kurtosis are called mesokurtic.
A distribution with positive (negative) excess kurtosis is called leptokurtic (platykurtic).
Clearly a leptokurtic (platykurtic) distribution has sharper (rounded) peak and longer,
fatter (shorter, thinner) tails.
Example 53 For α ∈ [0, 1], let Xα be a random variable having the p.d.f.
\[
f_{\alpha}(x) = \begin{cases} \alpha e^{x}, & \text{if } x < 0 \\ (1 - \alpha)\, e^{-x}, & \text{if } x \ge 0. \end{cases}
\]
$(r-1)I_{r-1}$, $r \in \{2, 3, \ldots\}$. On successively using this relationship we get
\[
I_r = \int_{0}^{\infty} e^{-x} x^{r-1}\,dx = (r-1)!, \quad r \in \{1, 2, \ldots\}.
\]
(ii) Let $p \in (0, 1)$ and let $\xi_p$ be such that $F_{\alpha}(\xi_p) = p$. Note that
\[
F_{\alpha}(0) = \alpha \int_{-\infty}^{0} e^{x}\,dx = \alpha.
\]
i.e., $\xi_p = -\ln(\alpha/p)$. Combining the two cases we get
\[
\xi_p = \begin{cases} \ln\!\big(\frac{1-\alpha}{1-p}\big), & \text{if } 0 \le \alpha < p \\[4pt] -\ln\!\big(\frac{\alpha}{p}\big), & \text{if } p \le \alpha \le 1. \end{cases}
\]
(iii) We have
\[
q_1(\alpha) = \xi_{\frac{1}{4}} = \begin{cases} \ln\!\big(\frac{4(1-\alpha)}{3}\big), & \text{if } 0 \le \alpha < \frac{1}{4} \\[4pt] -\ln(4\alpha), & \text{if } \frac{1}{4} \le \alpha \le 1, \end{cases}
\qquad
m(\alpha) = \xi_{\frac{1}{2}} = \begin{cases} \ln(2(1-\alpha)), & \text{if } 0 \le \alpha < \frac{1}{2} \\[4pt] -\ln(2\alpha), & \text{if } \frac{1}{2} \le \alpha \le 1, \end{cases}
\]
and
\[
q_3(\alpha) = \xi_{\frac{3}{4}} = \begin{cases} \ln(4(1-\alpha)), & \text{if } 0 \le \alpha < \frac{3}{4} \\[4pt] -\ln\!\big(\frac{4\alpha}{3}\big), & \text{if } \frac{3}{4} \le \alpha \le 1. \end{cases}
\]
(iv) From (i), $\mu_1'(\alpha) = E(X_{\alpha}) = 1 - 2\alpha$. Clearly $\sup\{f_{\alpha}(x) : -\infty < x < \infty\} = \max\{\alpha, 1 - \alpha\}$, and this supremum is attained at (or arbitrarily near) the point $x = 0$; thus the mode of the distribution of $X_{\alpha}$ is $m_0(\alpha) = 0$.
(v) Using (i) we have $\mu_1'(\alpha) = E(X_{\alpha}) = 1 - 2\alpha$ and $\mu_2'(\alpha) = E(X_{\alpha}^2) = 2$. It follows that the standard deviation of the distribution of $X_{\alpha}$ is
\[
\sigma(\alpha) = \sqrt{\mathrm{Var}(X_{\alpha})} = \sqrt{\mu_2'(\alpha) - (\mu_1'(\alpha))^2} = \sqrt{1 + 4\alpha - 4\alpha^2}.
\]
Note that, for 0 ≤ α < 1/2, m(α) = ln(2(1−α)) ≥ 0 and, for α > 1/2, m(α) = − ln(2α) <
0. Thus for the evaluation of the mean deviation about the median the following cases
arise:
Case I. $0 \le \alpha < \frac{1}{2}$ (so that $m(\alpha) \ge 0$)
Case II. $\frac{1}{2} \le \alpha \le 1$ (so that $m(\alpha) \le 0$)
\[
-\frac{\ln 3}{\ln\!\big(\frac{16\alpha^2}{3}\big)}, \quad \text{if } \frac{3}{4} \le \alpha \le 1.
\]
(vi) We have
\[
\mu_3(\alpha) = E\big((X_{\alpha} - \mu_1')^3\big) = \mu_3'(\alpha) - 3\mu_1'(\alpha)\mu_2'(\alpha) + 2(\mu_1'(\alpha))^3 = 2(1 - 2\alpha)^3.
\]
Therefore
\[
\beta_1(\alpha) = \frac{\mu_3(\alpha)}{(\sigma(\alpha))^3} = \frac{2(1 - 2\alpha)^3}{(1 + 4\alpha - 4\alpha^2)^{3/2}}.
\]
Using (iii) the Yule coefficient of skewness is
Clearly, for 0 ≤ α < 1/2, βi (α) > 0, i = 1, 2 and, for 1/2 < α ≤ 1, βi (α) < 0, i = 1, 2.
It follows that the probability distribution of Xα is positively skewed if 0 ≤ α < 1/2 and
negatively skewed if 1/2 < α ≤ 1. For α = 1/2, fα (x) = fα (−x), ∀x ∈ R. Thus, for
α = 1/2, the probability distribution of Xα is symmetric about zero.
(vii) We have
Therefore
\[
\gamma_1(\alpha) = \frac{\mu_4(\alpha)}{(\mu_2(\alpha))^2} = \frac{24 - 12(1 - 2\alpha)^2 - 3(1 - 2\alpha)^4}{\big(2 - (1 - 2\alpha)^2\big)^2}
\]
and
\[
\gamma_2(\alpha) = \gamma_1(\alpha) - 3 = \frac{12 - 6(1 - 2\alpha)^4}{\big(2 - (1 - 2\alpha)^2\big)^2}.
\]
Clearly, for any α ∈ [0, 1], γ2 (α) > 0. It follows that, for any value of α ∈ [0, 1], the
distribution of Xα is leptokurtic. ♠
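The closed-form answers in Example 53 can be spot-checked numerically; the sketch below integrates $f_\alpha$ on a truncated grid for one value of $\alpha$ and compares the results with the formulas above (note that $\beta_1$ uses $\sigma^3$ in the denominator).

```python
# Sketch: numerical check of the mean, standard deviation, beta_1 and gamma_2 of X_alpha
# in Example 53, for alpha = 0.2, against the closed-form answers.
import numpy as np

alpha = 0.2

def f(x):
    return np.where(x < 0, alpha * np.exp(x), (1 - alpha) * np.exp(-x))

x = np.linspace(-40.0, 40.0, 2_000_000)
w = f(x) * (x[1] - x[0])                        # Riemann-sum integration weights

moment = lambda g: float(np.sum(g(x) * w))
mu = moment(lambda x: x)
mu2 = moment(lambda x: (x - mu) ** 2)
mu3 = moment(lambda x: (x - mu) ** 3)
mu4 = moment(lambda x: (x - mu) ** 4)

print(round(mu, 4), 1 - 2 * alpha)                                           # mean = 1 - 2*alpha
print(round(np.sqrt(mu2), 4), round(np.sqrt(1 + 4*alpha - 4*alpha**2), 4))   # sigma(alpha)
print(round(mu3 / mu2 ** 1.5, 4),
      round(2 * (1 - 2*alpha)**3 / (1 + 4*alpha - 4*alpha**2) ** 1.5, 4))    # beta_1(alpha)
print(round(mu4 / mu2 ** 2 - 3, 4),
      round((12 - 6*(1 - 2*alpha)**4) / (2 - (1 - 2*alpha)**2) ** 2, 4))     # gamma_2(alpha)
```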
Exercises
1. Let (Ω, A, P ) be a probability space and let X : Ω → R, and h : RX → R be given
functions, where $R_X = \{X(\omega) : \omega \in \Omega\}$. Assuming that there exists a set $S \subseteq \mathbb{R}$ such that $S \notin \mathcal{B}$, using appropriate examples, show that
(a) X may not be a random variable;
(b) if X is a random variable then h(X) may not be a random variable.
where $n$ is a positive integer and $p \in (0, 1)$. Find the p.m.f.s of the random variables $Y_1 = n - X$, $Y_2 = X^2$ and $Y_3 = \sqrt{X}$.
Find the distribution function of $Y = X/(X+1)$ and hence determine the p.m.f. of $Y$.
6. Let the random variable $X$ have the p.d.f. $f_X(\cdot)$. Find the distribution functions and hence the p.d.f.s (provided they exist) of $X^{+} = \max(X, 0)$, $X^{-} = \max(-X, 0)$, $Y_1 = |X|$ and $Y_2 = X^2$ in each of the following cases:
(a) $f_X(x) = \begin{cases} \dfrac{1+x}{2}, & \text{if } -1 < x < 1 \\[4pt] 0, & \text{otherwise;} \end{cases}$ \qquad (b) $f_X(x) = \begin{cases} \dfrac{1}{3}, & \text{if } -2 < x < -1 \\[4pt] \dfrac{x^2}{65}, & \text{if } -1 < x < 4 \\[4pt] \dfrac{2x}{27}, & \text{if } 4 < x < 5 \\[4pt] 0, & \text{otherwise.} \end{cases}$
Find the p.d.f. and the distribution function of the random variable $Y = -2\ln X^4$.
11. Let X be a random variable with pdf fX given in Problem 10 and let Y = min(X, 1/2).
(a) Is X of continuous type?
(b) Examine whether or not X is of discrete or absolutely continuous type.
Let the random variable $X$ have the p.d.f.
\[
f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}, \quad -\infty < x < \infty,
\]
and let $Y = 1 - X^2$.
(a) Find the distribution function of Y and hence find its p.d.f.;
(b) Find the p.d.f. of Y directly (i.e., without finding the distribution function).
16. Let the random variable X have the p.d.f.
\[
f_X(x) = \begin{cases} 6x(1-x), & \text{if } 0 < x < 1 \\ 0, & \text{otherwise,} \end{cases}
\]
18. In three independent tosses of a fair coin let $X$ denote the number of tails appearing. Let $Y = X^2$ and $Z = 2X^2 + 1$. Find the mean and variance of the random variables $Y$ and $Z$.
Find the expected value of $Y = X^2 - 5X + 3$.
22. Let E(|X|β ) < ∞ for some β > 0. Then show that E(|X|α ) < ∞, ∀ α ∈ (0, β].
23. Let X be an absolutely continuous random variable with p.d.f. fX (x) that is sym-
metric about µ (∈ R), i.e., fX (x + µ) = fX (µ − x), ∀ x ∈ (−∞, ∞). Show that
µ is the median of the probability distribution of X and µ = (q1 + q3 )/2, where q1
and q3 are respectively the lower and the upper quartiles of the distribution of X.
Further, if E(X) is finite, then show that E(X) = µ.
24. (a) For any random variable X having the mean µ and finite second moment, show
that E((X − µ)2 ) ≤ E((X − c)2 ), ∀c ∈ R;
25. (a) Let X be a random variable with finite expectation. Show that limx→−∞ xFX (x) =
limx→∞ [x(1 − FX (x))] = 0, where FX is the distribution function of X.
(b) Let X be a random variable with limx→∞ [xα P (|X| > x)] = 0, for some α > 0.
Show that E(|X|β ) < ∞, ∀ β ∈ (0, α). What about E(|X|α )?
26. (a) Let X be a non-negative random variable (i.e., SX ⊆ [0, ∞)) of absolutely
continuous type and let $h$ be a real-valued function defined on $(0, \infty)$. Define $\psi(x) = \int_{0}^{x} h(t)\,dt$, $x \ge 0$, and suppose that $h(x) \ge 0, \ \forall x \ge 0$. Show that
\[
E(\psi(X)) = \int_{0}^{\infty} h(y) P(X > y)\,dy.
\]
(b) Let $\alpha$ be a positive real number. Under the assumptions of (a), show that
\[
E(X^{\alpha}) = \alpha \int_{0}^{\infty} x^{\alpha - 1} P(X > x)\,dx.
\]
(c) Let F (0) = G(0) = 0 and let F (t) ≥ G(t), ∀ t > 0, where F and G are distribu-
tion functions of absolutely continuous type non-negative random variables X and
Y , respectively. Show that E(X k ) ≤ E(Y k ), ∀ k > 0, provided the expectations
exist.
27. Consider a target comprising three concentric circles of radii $1/\sqrt{3}$, $1$ and $\sqrt{3}$ feet. Shots within the inner circle earn 4 points, within the next ring 3 points and within the third ring 2 points. Shots outside the target do not earn any points. Let $X$ denote the distance (in feet) of the hit from the centre and suppose that $X$ has the p.d.f.
\[
f_X(x) = \begin{cases} \dfrac{2}{\pi(1 + x^2)}, & \text{if } x > 0 \\[4pt] 0, & \text{otherwise.} \end{cases}
\]
Find the expected score in a single shot.
28. (a) Find the moments of the random variable that has the m.g.f. $M(t) = (1 - t)^{-3}$, $t < 1$.
(b) Let the random variable $X$ have the m.g.f.
\[
M(t) = \frac{e^{-t}}{8} + \frac{e^{t}}{4} + \frac{e^{2t}}{8} + \frac{e^{3t}}{2}.
\]
Find the distribution function of $X$ and find $P(X = 1)$.
(c) If the m.g.f. of a random variable $X$ is
\[
M(t) = \frac{e^{t} - e^{-2t}}{3t}, \quad \text{for } t \neq 0,
\]
29. Let $X$ be a random variable with m.g.f. $M(t)$, $-h < t < h$. Prove that $P(X \ge a) \le e^{-at} M(t)$, $0 < t < h$, and $P(X \le a) \le e^{-at} M(t)$, $-h < t < 0$.
30. (a) Let $X$ be a random variable such that $P(X \le 0) = 0$ and suppose that $\mu = E(X)$ is finite. Show that $P(X \ge 2\mu) \le 0.5$.
(b) If $X$ is a random variable such that $E(X) = 3$ and $E(X^2) = 13$, determine a lower bound for $P(-2 < X < 8)$.
Using this p.m.f., show that the bound for Chebyshev’s inequality cannot be im-
proved (without additional assumptions)
32. Suppose that one is interested in estimating the proportion p ∈ (0, 1) of females in
a big population. To do so a random sample of size n is taken from the population
and the proportion of females in this sample is taken to be an estimate of p. If one
wants to be 90% sure that the estimate is within 0.1 units of the true proportion,
what should be the sample size?
33. Let $m$ and $n$ be positive integers and let $x_1, \ldots, x_n$ be positive real numbers. Using the fact that, for a random variable $X$, $E(X^2) \ge (E(X))^2$, show that
\[
\Big(\sum_{i=1}^{n} x_i^{m+1}\Big)^2 \le \Big(\sum_{i=1}^{n} x_i\Big)\Big(\sum_{i=1}^{n} x_i^{2m+1}\Big).
\]
34. For $\mu \in \mathbb{R}$ and $\lambda > 0$, let $X_{\mu,\lambda}$ be a random variable having the p.d.f.
\[
f_{\mu,\lambda}(x) = \begin{cases} \dfrac{1}{\lambda}\, e^{-\frac{x-\mu}{\lambda}}, & \text{if } x \ge \mu \\[4pt] 0, & \text{otherwise.} \end{cases}
\]
(a) Find $C_r(\mu, \lambda) = E((X_{\mu,\lambda} - \mu)^r)$, $r \in \{1, 2, \ldots\}$, and $\mu_r'(\mu, \lambda) = E(X_{\mu,\lambda}^{r})$, $r \in \{1, 2\}$;
(b) For p ∈ (0, 1), find the p-th quantile ξp ≡ ξp (µ, λ) of the distribution of Xµ,λ
(Fµ,λ (ξp ) = p, where Fµ,λ is the distribution function of Xµ,λ );
(c) Find the lower quartile q1 (µ, λ), the median m(µ, λ) and the upper quartile
q3 (µ, λ) of the distribution of Xµ,λ ;
(e) Find the standard deviation σ(µ, λ), the mean deviation about median MD(m(µ, λ)),
the inter-quartile range IQR(µ, λ), the quartile deviation (or semi-inter-quartile
range) QD(µ, λ), the coefficient of quartile deviation CQD(µ, λ) and the coefficient
of variation CV(µ, λ) of the distribution of Xµ,λ ;
(f) Find the coefficient of skewness β1 (µ, λ) and the Yule coefficient of skewness
β2 (µ, λ) of the distribution of Xµ,λ ;
(h) Based on the values of the measures of skewness and kurtosis of the distribution of $X_{\mu,\lambda}$, comment on the shape of $f_{\mu,\lambda}$.
35. For any values of $\mu \in \mathbb{R}$ and $\sigma > 0$, show that the kurtosis of the $N(\mu, \sigma^2)$ distribution is $3$.