
Chapter III

Function of a Random Variable and Its Distribution

1 Function of a Random Variable


Let (Ω, A, P ) be a probability space and let X be a random variable defined on the
probability space (Ω, A, P ). Let RX = {X(ω) : ω ∈ Ω}, h : RX → R be a given function
and let Z : Ω → R be a function of random variable X, defined by Z(ω) = h(X(ω)), ω ∈ Ω.
Sometimes we may be interested in studying probabilistic properties of Z, which is a
function of the random variable X. Since Z takes values in R, to study the probabilistic
properties of Z it is important that Z^{-1}(B) ∈ A, ∀ B ∈ B, i.e., that Z is a random
variable. Here B denotes the Borel sigma-field. The following lemma will be useful in
deriving conditions on the function h under which Z(ω) = h(X(ω)), ω ∈ Ω, is a random
variable.

Lemma 1 Let X : Ω → R and h : R → R be given functions. Define Z : Ω → R by


Z(ω) = h(X(ω)), ω ∈ Ω. Then, for any B ∈ B,

Z −1 (B) = X −1 (h−1 (B)).

Proof Fix B ∈ B. Note that h−1 (B) = {x ∈ R : h(x) ∈ B}. Clearly h(X(ω)) ∈ B ⇔
X(ω) ∈ h−1 (B) and therefore

Z −1 (B) = {ω ∈ Ω : Z(ω) ∈ B}
= {ω ∈ Ω : h(X(ω)) ∈ B}
= {ω ∈ Ω : X(ω) ∈ h−1 (B)}
= X −1 (h−1 (B)).

Definition 2 A function h : R → R is said to be a Borel function if h^{-1}(B) ∈ B, ∀ B ∈ B. ♠

The following theorem provides a condition under which a function h(X) of a random
variable X is a random variable.

Theorem 3 Under the notation of Lemma 1, Z is a random variable provided h is a
Borel function.

Proof Fix B ∈ B. Since h is a Borel function we have h^{-1}(B) ∈ B, and since X is a
random variable it follows that Z^{-1}(B) = X^{-1}(h^{-1}(B)) ∈ A. This proves the result. ♠

Remark 4 Let h : R → R be a continuous function. According to a standard result in
calculus, the inverse image of any open interval (a, b), −∞ ≤ a < b ≤ ∞, under the
continuous function h is an open set, and hence a countable union of disjoint open
intervals. Since B contains all open intervals and is closed under countable unions, it
follows that h^{-1}((a, b)) ∈ B whenever −∞ ≤ a < b ≤ ∞. Now employing the arguments
used in the proof of Theorem 4 in Chapter II we get h^{-1}(B) ∈ B, ∀ B ∈ B. It follows
that any continuous function
h : R → R is a Borel function and thus, in view of Theorem 3, any continuous function
of a random variable is a random variable. In particular if X is a random variable then
X 2 , |X|, max(X, 0), sin X and cos X are random variables. ♠

A random variable X takes values in various Borel sets according to some probability law
called the probability distribution of X. Clearly the probability distribution of a random
variable X is described by its distribution function and/or by its p.d.f./p.m.f.. In the
following section we will derive probability distribution of function of a random variable,
i.e., we will derive expressions for p.m.f./p.d.f. of functions of random variables.

2 Probability Distribution of a Function of a Random Variable
In our future discussions when we refer to a random variable, unless otherwise stated,
it will be either of discrete type or of absolutely continuous type. The probability dis-
tribution of a discrete type random variable will be referred to as discrete (probability)
distribution and the probability distribution of a random variable of absolutely continuous
type will be referred to as absolutely continuous (probability) distribution. The following
theorem deals with discrete probability distributions.

Theorem 5 Let X be a random variable of discrete type with support SX and p.m.f. fX .
Let h : R → R be a Borel function and let Z : Ω → R be defined by Z(ω) = h(X(ω)), ω ∈
Ω. Then Z is a random variable of discrete type with support SZ = {h(x) : x ∈ SX} and p.m.f.

fZ(z) = Σ_{x ∈ Az} fX(x) = P({X ∈ Az}), if z ∈ SZ, and fZ(z) = 0, otherwise,

where Az = {x ∈ SX : h(x) = z}.

Proof Since h is a Borel function, using Theorem 3, it follows that Z is a random variable.
Fix z0 ∈ SZ so that z0 = h(x0 ) for some x0 ∈ SX . Then {X = x0 } = {ω ∈ Ω : X(ω) =
x0 } ⊆ {ω ∈ Ω : h(X(ω)) = h(x0 )} = {h(X) = h(x0 )}, and {X ∈ SX } = {ω ∈ Ω : X(ω) ∈
SX } ⊆ {ω ∈ Ω : h(X(ω)) ∈ SZ } = {h(X) ∈ SZ }. Therefore,

P ({Z = z0 }) = P ({h(X) = h(x0 )})


≥ P ({X = x0 })
> 0 (since x0 ∈ SX ),

and

P ({Z ∈ SZ }) = P ({h(X) ∈ SZ })
≥ P ({X ∈ SX })
= 1.

It follows that P ({Z = z}) > 0, ∀z ∈ SZ , and P ({Z ∈ SZ }) = 1, i.e., Z is a discrete type
random variable with support SZ . Moreover, for z ∈ SZ ,

P({Z = z}) = P({ω ∈ Ω : h(X(ω)) = z}) = Σ_{x ∈ Az} P({X = x}) = Σ_{x ∈ Az} fX(x).

Hence the result follows. ♠
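Theorem 5 amounts to a simple computation: accumulate the masses fX(x) over each set Az = {x ∈ SX : h(x) = z}. The following Python sketch (an illustration added here, not part of the original notes) does exactly this; the p.m.f. used in the check is the one of Example 7 below, with h(x) = x^2.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_function(pmf_x, h):
    """p.m.f. of Z = h(X): sum f_X(x) over A_z = {x in S_X : h(x) = z} (Theorem 5)."""
    pmf_z = defaultdict(Fraction)
    for x, p in pmf_x.items():
        pmf_z[h(x)] += p          # the mass at x contributes to the value z = h(x)
    return dict(pmf_z)

# Quick check with the p.m.f. of Example 7 and h(x) = x^2:
fX = {-2: Fraction(1, 7), -1: Fraction(1, 7), 0: Fraction(1, 7),
      1: Fraction(1, 7), 2: Fraction(3, 14), 3: Fraction(3, 14)}
print(pmf_of_function(fX, lambda x: x * x))
# {4: Fraction(5, 14), 1: Fraction(2, 7), 0: Fraction(1, 7), 9: Fraction(3, 14)}
```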


The following corollary is an immediate consequence of the above theorem.

Corollary 6 Under the notation and assumptions of Theorem 5, suppose that h : R → R
is one-one with inverse function h^{-1} : D → R, where D = {h(x) : x ∈ R}. Then Z is a
discrete type random variable with support SZ = {h(x) : x ∈ SX} and p.m.f.

fZ(z) = fX(h^{-1}(z)) = P({X = h^{-1}(z)}), if z ∈ SZ, and fZ(z) = 0, otherwise.

Example 7 Let X be a random variable with p.m.f.

fX(x) = 1/7, if x ∈ {−2, −1, 0, 1}; 3/14, if x ∈ {2, 3}; 0, otherwise.

Show that Z = X 2 is a random variable and find its p.m.f., and distribution function.

Solution Since h(x) = x^2, x ∈ R, is a continuous function and X is a random variable,
using Remark 4, it follows that Z = h(X) = X^2 is a random variable. Clearly SX =
{−2, −1, 0, 1, 2, 3} and SZ = {0, 1, 4, 9}. Moreover,

P({Z = 0}) = P({X ∈ {x ∈ SX : x^2 = 0}}) = P({X = 0}) = 1/7,
P({Z = 1}) = P({X ∈ {x ∈ SX : x^2 = 1}}) = P({X ∈ {−1, 1}}) = 1/7 + 1/7 = 2/7,
P({Z = 4}) = P({X ∈ {x ∈ SX : x^2 = 4}}) = P({X ∈ {−2, 2}}) = 1/7 + 3/14 = 5/14,
P({Z = 9}) = P({X ∈ {x ∈ SX : x^2 = 9}}) = P({X = 3}) = 3/14.

Therefore, the p.m.f. of Z is

fZ(z) = 1/7, if z = 0; 2/7, if z = 1; 5/14, if z = 4; 3/14, if z = 9; 0, otherwise,

and the distribution function of Z is

FZ(z) = 0, if z < 0; 1/7, if 0 ≤ z < 1; 3/7, if 1 ≤ z < 4; 11/14, if 4 ≤ z < 9; 1, if z ≥ 9.


Example 8 Let X be a random variable with p.m.f.

fX(x) = |x|/2550, if x ∈ {±1, ±2, . . . , ±50}; 0, otherwise.

Show that Z = |X| is a random variable and find its p.m.f., and distribution function.

Solution As h(x) = |x|, x ∈ R, is a continuous function and X is a random variable,
using Remark 4, Z = |X| is a random variable. We have SX = {±1, ±2, . . . , ±50} and
SZ = {1, 2, . . . , 50}. Moreover, for z ∈ SZ,

P({Z = z}) = P({X ∈ {x ∈ SX : |x| = z}}) = P({X ∈ {−z, z}}) = |−z|/2550 + |z|/2550 = z/1275.

Therefore the p.m.f. of Z is

fZ(z) = z/1275, if z ∈ {1, 2, . . . , 50}; 0, otherwise,

and the distribution function of Z is

FZ(z) = 0, if z < 1; i(i + 1)/2550, if i ≤ z < i + 1, i = 1, 2, . . . , 49; 1, if z ≥ 50.

Example 9 Let X be a random variable with p.m.f.

fX(x) = (n choose x) p^x (1 − p)^{n−x}, if x ∈ {0, 1, . . . , n}; 0, otherwise,

where n is a positive integer and p ∈ (0, 1). Show that Y = n − X is a random variable
and find its p.m.f., and distribution function.

Proof Note that SX = SY = {0, 1, . . . , n} and h(x) = n − x, x ∈ R, is a continuous
function. Therefore Y = n − X is a random variable. For y ∈ SY,

P({Y = y}) = P({X ∈ {x ∈ SX : n − x = y}}) = P({X = n − y})
           = (n choose n−y) p^{n−y} (1 − p)^y = (n choose y) (1 − p)^y p^{n−y}.

Thus the p.m.f. of Y is

fY(y) = (n choose y) (1 − p)^y p^{n−y}, if y ∈ {0, 1, . . . , n}; 0, otherwise,

and the distribution function of Y is

FY(y) = 0, if y < 0; p^n, if 0 ≤ y < 1; Σ_{j=0}^{i} (n choose j) (1 − p)^j p^{n−j}, if i ≤ y < i + 1, i = 1, 2, . . . , n − 1; 1, if y ≥ n.

The following theorem deals with the probability distribution of a Borel function of an
absolutely continuous type random variable.

Theorem 10 Let X be a random variable of absolutely continuous type with support SX
and p.d.f. fX. Let S_{1,X}, S_{2,X}, . . . , S_{k,X} be intervals such that S_{i,X} ∩ S_{j,X} = φ, i ≠ j,
and ∪_{i=1}^{k} S_{i,X} = SX. Let h : R → R be a Borel function such that, on each S_{i,X}
(i = 1, . . . , k), h : S_{i,X} → R is strictly monotone and differentiable with inverse function
h_i^{-1}. Let h(S_{j,X}) = {h(x) : x ∈ S_{j,X}}, j = 1, . . . , k, so that each h(S_{j,X}) is an
interval. Then the random variable T = h(X) is of absolutely continuous type with p.d.f.

fT(t) = Σ_{j=1}^{k} fX(h_j^{-1}(t)) |d/dt h_j^{-1}(t)| I_{h(S_{j,X})}(t).

Proof Let FT be the distribution function of T. For t ∈ R and ∆ > 0,

(FT(t + ∆) − FT(t))/∆ = P({t < h(X) ≤ t + ∆})/∆ = Σ_{j=1}^{k} P({t < h(X) ≤ t + ∆, X ∈ S_{j,X}})/∆.

Fix j ∈ {1, . . . , k}. First suppose that h is strictly decreasing on S_{j,X}. Note that
{X ∈ S_{j,X}} = {h(X) ∈ h(S_{j,X})}. Thus, for t belonging to the exterior (i.e., excluding
the boundary and interior points) of h(S_{j,X}) and appropriately small ∆, we have
P({t < h(X) ≤ t + ∆, X ∈ S_{j,X}}) = 0. Also, for t belonging to the interior of h(S_{j,X})
and appropriately small ∆, we have P({t < h(X) ≤ t + ∆, X ∈ S_{j,X}}) = P({h_j^{-1}(t + ∆) ≤ X < h_j^{-1}(t)}).
Thus, for all t ∈ R except those on the boundary of h(S_{j,X}), we have

P({t < h(X) ≤ t + ∆, X ∈ S_{j,X}})/∆ = [P({h_j^{-1}(t + ∆) ≤ X < h_j^{-1}(t)})/∆] I_{h(S_{j,X})}(t)
  = [(1/∆) ∫_{h_j^{-1}(t+∆)}^{h_j^{-1}(t)} fX(z) dz] I_{h(S_{j,X})}(t)
  → −fX(h_j^{-1}(t)) (d/dt h_j^{-1}(t)) I_{h(S_{j,X})}(t),      (11)

as ∆ → 0. Similarly, if h is strictly increasing on S_{j,X}, then for all t ∈ R except those
on the boundary of h(S_{j,X}) we have

P({t < h(X) ≤ t + ∆, X ∈ S_{j,X}})/∆ = [P({h_j^{-1}(t) < X ≤ h_j^{-1}(t + ∆)})/∆] I_{h(S_{j,X})}(t)
  = [(1/∆) ∫_{h_j^{-1}(t)}^{h_j^{-1}(t+∆)} fX(z) dz] I_{h(S_{j,X})}(t)
  → fX(h_j^{-1}(t)) (d/dt h_j^{-1}(t)) I_{h(S_{j,X})}(t),      (12)

as ∆ → 0. Note that if h is strictly decreasing (increasing) on S_{j,X} then d/dt h_j^{-1}(t) < 0 (> 0).
On combining (11) and (12) we get, for all t ∈ R except those on the boundary of h(S_{j,X}),

P({t < h(X) ≤ t + ∆, X ∈ S_{j,X}})/∆ → fX(h_j^{-1}(t)) |d/dt h_j^{-1}(t)| I_{h(S_{j,X})}(t), as ∆ → 0,

and therefore

(FT(t + ∆) − FT(t))/∆ → Σ_{j=1}^{k} fX(h_j^{-1}(t)) |d/dt h_j^{-1}(t)| I_{h(S_{j,X})}(t),      (13)

as ∆ → 0. It follows that the distribution function of T is differentiable everywhere
on R except possibly at boundary points of ST. Now the result follows from Remark 27
(vii)-(viii) of Chapter II and (13). ♠

The following corollary to Theorem 10 is immediate.

Corollary 14 Let X be a random variable of absolutely continuous type with support SX
and p.d.f. fX. Let h : R → R be a Borel function such that h is differentiable and strictly
monotone on SX (i.e., either h′(x) < 0, ∀ x ∈ SX, or h′(x) > 0, ∀ x ∈ SX). Let
ST = {h(x) : x ∈ SX}. Then T = h(X) is a random variable of absolutely continuous type
with p.d.f.

fT(t) = fX(h^{-1}(t)) |d/dt h^{-1}(t)|, if t ∈ ST; 0, otherwise. ♠

It may be worth mentioning here that, in view of Remark 27 (viii) of Chapter II, Theorem
10 and Corollary 14 can be applied even in situations where the function h is differentiable
everywhere on SX except possibly at a finite number of points.
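Corollary 14 is easy to sanity-check numerically. The sketch below (Python, illustrative only; it anticipates Example 15) samples X with p.d.f. e^{−x}, x > 0, applies the strictly increasing map h(x) = x^2 and compares a probability computed from the density fT(t) = fX(√t)|d/dt √t| = e^{−√t}/(2√t) predicted by the corollary with a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)   # X has p.d.f. e^{-x}, x > 0
t_sample = x**2                                # T = h(X), h(x) = x^2 strictly increasing on (0, inf)

def f_T(t):
    # Corollary 14: f_T(t) = f_X(h^{-1}(t)) * |d/dt h^{-1}(t)|, with h^{-1}(t) = sqrt(t)
    return np.exp(-np.sqrt(t)) / (2.0 * np.sqrt(t))

# Compare P(a < T <= b) from the density (midpoint rule) with a Monte Carlo estimate.
a, b, n = 0.5, 2.0, 10_000
mid = a + (np.arange(n) + 0.5) * (b - a) / n
prob_formula = np.sum(f_T(mid)) * (b - a) / n
prob_mc = np.mean((t_sample > a) & (t_sample <= b))
print(prob_formula, prob_mc)   # both close to exp(-sqrt(0.5)) - exp(-sqrt(2)) ≈ 0.25
```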

Example 15 Let X be a random variable with p.d.f.

fX(x) = e^{−x}, if x > 0; 0, otherwise,

and let T = X 2 .
(i) Show that T is a random variable of absolutely continuous type;
(ii) Find the distribution function of T and hence find its p.d.f.;
(iii) Find the p.d.f. of T directly (i.e., without finding the distribution function).

Solution (i) & (iii) Clearly T = X^2 is a random variable (being a continuous function of
the random variable X). We have SX = ST = (0, ∞). Also h(x) = x^2, x ∈ SX, is a strictly
increasing function on SX with inverse function h^{-1}(t) = √t, t ∈ ST. Using Corollary
14, it follows that T = X^2 is a random variable of absolutely continuous type with p.d.f.

fT(t) = fX(√t) |d/dt(√t)| = e^{−√t}/(2√t), if t > 0; 0, otherwise.

(ii) We have FT(t) = P({X^2 ≤ t}), t ∈ R. Clearly, for t < 0, FT(t) = P({X^2 ≤ t}) = 0.
For t ≥ 0,

FT(t) = P({−√t ≤ X ≤ √t}) = ∫_{−√t}^{√t} fX(x) dx = ∫_{0}^{√t} e^{−x} dx = 1 − e^{−√t}.

Therefore the distribution function of T is

FT(t) = 0, if t < 0; 1 − e^{−√t}, if t ≥ 0.

Clearly FT is differentiable everywhere except at t = 0. Therefore, using Remark 27
(viii) of Chapter II, the random variable T is of absolutely continuous type with p.d.f.
fT(t) = FT′(t), t ≠ 0. At t = 0 we may assign any arbitrary non-negative value to fT(0).
Thus a p.d.f. of T is

fT(t) = e^{−√t}/(2√t), if t > 0; 0, otherwise.

Example 16 Let X be a random variable with p.d.f.

fX(x) = |x|/2, if −1 < x < 1; x/3, if 1 ≤ x < 2; 0, otherwise,

and let T = X 2 .
(i) Show that T is a random variable of absolutely continuous type;
(ii) Find the distribution function of T and hence find its p.d.f.;
(iii) Find the p.d.f. of T directly (i.e., without finding the distribution function).

Solution (i) & (iii) Clearly T = X^2 is a random variable (being a continuous function of
the random variable X). We have SX = (−1, 0) ∪ (0, 2). Also h(x) = x^2, x ∈ SX, is strictly
decreasing on S_{1,X} = (−1, 0), with inverse function h_1^{-1}(t) = −√t, and strictly
increasing on S_{2,X} = (0, 2), with inverse function h_2^{-1}(t) = √t; SX = S_{1,X} ∪ S_{2,X};
h(S_{1,X}) = (0, 1) and h(S_{2,X}) = (0, 4). Using Theorem 10, it follows that T = X^2 is a
random variable of absolutely continuous type with p.d.f.

fT(t) = fX(−√t) |d/dt(−√t)| I_{(0,1)}(t) + fX(√t) |d/dt(√t)| I_{(0,4)}(t)
      = 1/2, if 0 < t < 1; 1/6, if 1 < t < 4; 0, otherwise.

(ii) We have FT(t) = P({X^2 ≤ t}), t ∈ R. Since P({X ∈ (−1, 2)}) = 1, we have
P({T ∈ (0, 4)}) = 1. Therefore, for t < 0, FT(t) = P({T ≤ t}) = 0 and, for t ≥ 4,
FT(t) = P({T ≤ t}) = 1. For t ∈ [0, 4), we have

FT(t) = P({−√t ≤ X ≤ √t}) = ∫_{−√t}^{√t} fX(x) dx
      = ∫_{−√t}^{√t} (|x|/2) dx, if 0 ≤ t < 1;  ∫_{−1}^{1} (|x|/2) dx + ∫_{1}^{√t} (x/3) dx, if 1 ≤ t < 4.

Therefore the distribution function of T is

FT(t) = 0, if t < 0; t/2, if 0 ≤ t < 1; (t + 2)/6, if 1 ≤ t < 4; 1, if t ≥ 4.

Note that FT is differentiable everywhere except at the points 0, 1 and 4. Using Remark 27
(viii) of Chapter II it follows that the random variable T is of absolutely continuous type
with a p.d.f.

fT(t) = 1/2, if 0 < t < 1; 1/6, if 1 < t < 4; 0, otherwise.


Note that a Borel function of a discrete type random variable is a random variable
of discrete type (cf. Theorem 5). Theorem 10 provides sufficient conditions under which
a Borel function of an absolutely continuous type random variable is of absolutely con-
tinuous type. The following example illustrates that, in general, a Borel function of an

absolutely continuous type random variable may not be of absolutely continuous type. ♠

Example 17 Let X be a random variable with p.d.f.

fX(x) = e^{−x}, if x > 0; 0, otherwise,

and let T = [X], where, for x ∈ R, [x] denotes the largest integer not exceeding x. Show
that T is a random variable of discrete type and find its p.m.f..

Proof For a ∈ R, we have T^{-1}((−∞, a]) = X^{-1}((−∞, [a] + 1)) ∈ A. It follows that T is a
random variable. Also SX = (0, ∞). Since P({X ∈ SX}) = 1, we have P({T ∈ {0, 1, 2, . . .}}) = 1.
Also, for i ∈ {0, 1, 2, . . .},

P({T = i}) = P({i ≤ X < i + 1}) = ∫_{i}^{i+1} fX(x) dx = ∫_{i}^{i+1} e^{−x} dx = (1 − e^{−1}) e^{−i} > 0.

It follows that the random variable T is of discrete type with support ST = {0, 1, 2, . . .}
and p.m.f.

fT(t) = (1 − e^{−1}) e^{−t}, if t ∈ {0, 1, 2, . . .}; 0, otherwise.
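Example 17 can be verified by simulation (a Python sketch, not part of the notes): sample X with p.d.f. e^{−x}, take T = [X], and compare the empirical frequencies with the p.m.f. (1 − e^{−1})e^{−t}.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=500_000)     # p.d.f. e^{-x}, x > 0
t = np.floor(x).astype(int)                      # T = [X]

for i in range(5):
    empirical = np.mean(t == i)
    formula = (1.0 - math.exp(-1.0)) * math.exp(-i)   # p.m.f. from Example 17
    print(i, round(empirical, 4), round(formula, 4))
```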

3 Expectation and Moments of a Random Variable


Suppose that X is a discrete type random variable defined on a probability space (Ω, A, P )
associated with a random experiment. Let SX and fX denote, respectively, the support
and p.m.f. of X. Suppose that the random experiment is repeated a large number
of times. Let fn (x), x ∈ SX , denote the frequency of the event {X = x} in the first n
repetitions of the random experiment. Then, according to the relative frequency approach
to probability, P({X = x}) = lim_{n→∞} fn(x)/n, x ∈ SX. Note that Σ_{x ∈ SX} x fn(x)/n
represents the mean observed value (or expected value) of the random variable X in the first

n repetitions of the random experiment. Therefore, in line with axiomatic approach to
probability, one may define the mean value (or expected value) of random variable X as

E(X) = lim_{n→∞} Σ_{x ∈ SX} x fn(x)/n = Σ_{x ∈ SX} x lim_{n→∞} fn(x)/n
     = Σ_{x ∈ SX} x P({X = x}) = Σ_{x ∈ SX} x fX(x),

provided the involved limits exist and the interchange of signs of summation and limit
is allowed. A similar discussion can be provided for defining the expected value of an
absolutely continuous type random variable, having p.d.f. fX , as
E(X) = ∫_{−∞}^{∞} x fX(x) dx,

provided the integral is defined. The above discussion leads to the following definitions.

Definition 18 (i) Let X be a discrete type random variable with p.m.f. fX and support
SX. We say that the expected value of X (denoted by E(X)) is finite and equals

E(X) = Σ_{x ∈ SX} x fX(x),

provided Σ_{x ∈ SX} |x| fX(x) < ∞.

(ii) Let X be an absolutely continuous type random variable with p.d.f. fX. We say that
the expected value of X (denoted by E(X)) is finite and equals

E(X) = ∫_{−∞}^{∞} x fX(x) dx,

provided ∫_{−∞}^{∞} |x| fX(x) dx < ∞. ♠

The following observations on the above definitions are immediate.

Remark 19 (i) Since

|Σ_{x ∈ SX} x fX(x)| ≤ Σ_{x ∈ SX} |x fX(x)| = Σ_{x ∈ SX} |x| fX(x)

and

|∫_{−∞}^{∞} x fX(x) dx| ≤ ∫_{−∞}^{∞} |x fX(x)| dx = ∫_{−∞}^{∞} |x| fX(x) dx,

it follows that if the expected value of a random variable X is finite then |E(X)| < ∞.
(ii) If X is a random variable of discrete type with finite support SX, then Σ_{x ∈ SX} |x| fX(x) < ∞.
Consequently the expected value of X is finite.
(iii) Suppose that X is a random variable of absolutely continuous type with support
SX ⊆ [−a, a], for some a > 0. Then

∫_{−∞}^{∞} |x| fX(x) dx ≤ a ∫_{−∞}^{∞} fX(x) dx = a.

Consequently the expected value of X is finite. ♠
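For concreteness, both parts of Definition 18 can be evaluated directly; the short Python sketch below (illustrative only) uses the p.m.f. of Example 7 for the discrete case and the p.d.f. e^{−x}, x > 0 (whose mean is 1), for the absolutely continuous case, approximating the integral by a Riemann sum.

```python
import numpy as np

# Discrete case: E(X) = sum over the support of x * f_X(x)  (p.m.f. of Example 7)
pmf = {-2: 1/7, -1: 1/7, 0: 1/7, 1: 1/7, 2: 3/14, 3: 3/14}
mean_discrete = sum(x * p for x, p in pmf.items())
print(mean_discrete)                      # 11/14 ≈ 0.786

# Absolutely continuous case: E(X) = integral of x * f_X(x) dx  (here f_X(x) = e^{-x}, x > 0)
x = np.linspace(0.0, 50.0, 2_000_001)     # the tail beyond 50 is negligible
fx = np.exp(-x)
mean_continuous = np.sum(x * fx) * (x[1] - x[0])   # simple Riemann sum
print(mean_continuous)                    # close to 1
```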

Example 20 Let X be a random variable with p.m.f.

fX(x) = (1/2)^x, if x ∈ {1, 2, 3, . . .}; 0, otherwise.

Show that the expected value of X is finite and find its value.

Solution We have SX = {1, 2, 3, . . .} and

Σ_{x ∈ SX} |x| fX(x) = Σ_{n=1}^{∞} n/2^n = Σ_{n=1}^{∞} a_n, say.

Clearly a_n > 0, n = 1, 2, . . ., and a_{n+1}/a_n = (n + 1)/(2n) → 1/2 < 1, as n → ∞. By
the ratio test, Σ_{x ∈ SX} |x| fX(x) = Σ_{n=1}^{∞} a_n < ∞ and therefore the expected value
of X is finite. Moreover

E(X) = Σ_{x ∈ SX} x fX(x) = Σ_{j=1}^{∞} j/2^j = lim_{n→∞} S_n,

where

S_n = Σ_{j=1}^{n} j/2^j.      (21)

Then

S_n/2 = Σ_{j=1}^{n} j/2^{j+1} = Σ_{j=2}^{n+1} (j − 1)/2^j.      (22)

On subtracting (22) from (21) we get

S_n/2 = Σ_{j=1}^{n} 1/2^j − n/2^{n+1} = 1 − (1/2)^n − n/2^{n+1}
⇒ S_n = 2[1 − (1/2)^n − n/2^{n+1}]
⇒ E(X) = lim_{n→∞} S_n = 2.
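The value E(X) = 2 can also be seen numerically from the partial sums S_n (a one-off Python check, for illustration only):

```python
n = 59
partial = sum(x * 0.5**x for x in range(1, n + 1))
closed_form = 2 * (1 - 0.5**n - n / 2**(n + 1))   # S_n from the derivation above
print(partial, closed_form)                        # both ≈ 2
```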


Example 23 Let X be a random variable with p.m.f.

fX(x) = 3/(π^2 x^2), if x ∈ {±1, ±2, ±3, . . .}; 0, otherwise.

Show that the expected value of X is not finite.

Solution We have SX = {±1, ±2, ±3, . . .} and

Σ_{x ∈ SX} |x| fX(x) = (6/π^2) Σ_{n=1}^{∞} 1/n = ∞.

Thus the expected value of X is not finite. ♠


Example 24 Let X be a random variable with p.d.f.

fX(x) = e^{−|x|}/2, −∞ < x < ∞.

Show that the expected value of X is finite and find its value.

Solution We have

∫_{−∞}^{∞} |x| fX(x) dx = ∫_{−∞}^{∞} |x| (e^{−|x|}/2) dx = ∫_{0}^{∞} x e^{−x} dx = 1.

Thus the expected value of X is finite and

E(X) = ∫_{−∞}^{∞} x fX(x) dx = ∫_{−∞}^{∞} x (e^{−|x|}/2) dx = 0.

Example 25 Let X be a random variable with p.d.f.

fX(x) = (1/π) · 1/(1 + x^2), −∞ < x < ∞.

Show that the expected value of X is not finite.

Solution We have

∫_{−∞}^{∞} |x| fX(x) dx = (1/π) ∫_{−∞}^{∞} |x|/(1 + x^2) dx = (2/π) ∫_{0}^{∞} x/(1 + x^2) dx = ∞.

Therefore the expected value of X is not finite. ♠

Theorem 26 Let X be a random variable of absolutely continuous or discrete type with
finite expected value. Then
(i) E(X) = ∫_{0}^{∞} P({X > t}) dt − ∫_{−∞}^{0} P({X < t}) dt;
(ii) E(X) = ∫_{0}^{∞} P({X > t}) dt, provided P({X ≥ 0}) = 1;
(iii) E(X) = Σ_{n=1}^{∞} P({X ≥ n}) − Σ_{n=1}^{∞} P({X ≤ −n}), provided P({X ∈ {0, ±1, ±2, . . .}}) = 1;
(iv) E(X) = Σ_{n=1}^{∞} P({X ≥ n}), provided P({X ∈ {0, 1, 2, . . .}}) = 1.

Proof (i)
Case I. X is of absolutely continuous type:

E(X) = ∫_{−∞}^{∞} x fX(x) dx = ∫_{−∞}^{0} x fX(x) dx + ∫_{0}^{∞} x fX(x) dx
     = −∫_{−∞}^{0} ∫_{x}^{0} fX(x) dt dx + ∫_{0}^{∞} ∫_{0}^{x} fX(x) dt dx.

On interchanging the order of integration in the two integrals above we get

E(X) = −∫_{−∞}^{0} ∫_{−∞}^{t} fX(x) dx dt + ∫_{0}^{∞} ∫_{t}^{∞} fX(x) dx dt
     = −∫_{−∞}^{0} P({X < t}) dt + ∫_{0}^{∞} P({X > t}) dt.

Case II. X is of discrete type:

We will illustrate the idea of the proof by considering a special case where SX = {x_1, x_2, . . .},
−∞ < x_1 < x_2 < · · · < x_i < 0 < x_{i+1} < x_{i+2} < · · · < ∞ and lim_{n→∞} x_n = ∞. Under the
above situation

∫_{0}^{∞} P({X > t}) dt = ∫_{0}^{x_{i+1}} P({X > t}) dt + Σ_{j=i+1}^{∞} ∫_{x_j}^{x_{j+1}} P({X > t}) dt
  = ∫_{0}^{x_{i+1}} P({X ≥ x_{i+1}}) dt + Σ_{j=i+1}^{∞} ∫_{x_j}^{x_{j+1}} P({X ≥ x_{j+1}}) dt
  = x_{i+1} P({X ≥ x_{i+1}}) + Σ_{j=i+1}^{∞} (x_{j+1} − x_j) P({X ≥ x_{j+1}})
  = x_{i+1} P({X ≥ x_{i+1}}) + Σ_{j=i+1}^{∞} x_{j+1} P({X ≥ x_{j+1}}) − Σ_{j=i+1}^{∞} x_j P({X ≥ x_{j+1}})
  = x_{i+1} P({X ≥ x_{i+1}}) + Σ_{j=i+2}^{∞} x_j P({X ≥ x_j}) − Σ_{j=i+1}^{∞} x_j P({X ≥ x_{j+1}})
  = Σ_{j=i+1}^{∞} x_j P({X = x_j}).

Also

∫_{−∞}^{0} P({X < t}) dt = ∫_{−∞}^{x_1} P({X < t}) dt + Σ_{j=1}^{i−1} ∫_{x_j}^{x_{j+1}} P({X < t}) dt + ∫_{x_i}^{0} P({X < t}) dt
  = 0 + Σ_{j=1}^{i−1} ∫_{x_j}^{x_{j+1}} P({X ≤ x_j}) dt + ∫_{x_i}^{0} P({X ≤ x_i}) dt
  = Σ_{j=1}^{i−1} (x_{j+1} − x_j) P({X ≤ x_j}) − x_i P({X ≤ x_i})
  = Σ_{j=2}^{i} x_j P({X ≤ x_{j−1}}) − Σ_{j=1}^{i−1} x_j P({X ≤ x_j}) − x_i P({X ≤ x_i})
  = −Σ_{j=1}^{i} x_j P({X = x_j}).

Therefore

∫_{0}^{∞} P({X > t}) dt − ∫_{−∞}^{0} P({X < t}) dt = Σ_{j=1}^{∞} x_j P({X = x_j}) = E(X).

(ii) Suppose that P({X ≥ 0}) = 1. Then P({X < t}) = 0, ∀ t ≤ 0, and therefore

E(X) = ∫_{0}^{∞} P({X > t}) dt − ∫_{−∞}^{0} P({X < t}) dt = ∫_{0}^{∞} P({X > t}) dt.

(iii) Suppose that P({X ∈ {0, ±1, ±2, . . .}}) = 1. Then, for m ∈ Z (the set of integers)
and m − 1 < t < m, we have P({X > t}) = P({X ≥ m}) and P({X < t}) = P({X ≤ m − 1}). Therefore
∫_{0}^{∞} P({X > t}) dt = Σ_{n=1}^{∞} ∫_{n−1}^{n} P({X > t}) dt = Σ_{n=1}^{∞} ∫_{n−1}^{n} P({X ≥ n}) dt = Σ_{n=1}^{∞} P({X ≥ n}),

∫_{−∞}^{0} P({X < t}) dt = Σ_{n=1}^{∞} ∫_{−n}^{−n+1} P({X < t}) dt = Σ_{n=1}^{∞} ∫_{−n}^{−n+1} P({X ≤ −n}) dt = Σ_{n=1}^{∞} P({X ≤ −n}),

and the assertion follows on using (i).


(iv) Suppose that P ({X ∈ {0, 1, 2, . . .}}) = 1. Then P ({X ≤ −n}) = 0, ∀n ∈ {1, 2, . . .},
and the result follows from (iii). ♠
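Theorem 26 (iv) is easy to verify numerically for a concrete non-negative integer-valued distribution. The Python sketch below (illustrative, not part of the notes) uses the p.m.f. of Example 17, fT(t) = (1 − e^{−1})e^{−t} on {0, 1, 2, . . .}, and computes the mean both directly and via Σ_{n≥1} P({T ≥ n}).

```python
import math

q = math.exp(-1.0)                 # p.m.f. f_T(t) = (1 - q) q^t on {0, 1, 2, ...}
N = 200                            # truncation point; the tail beyond it is negligible

mean_direct = sum(t * (1 - q) * q**t for t in range(N))
mean_tail_sum = sum(sum((1 - q) * q**t for t in range(n, N)) for n in range(1, N))
print(mean_direct, mean_tail_sum)  # both ≈ q / (1 - q) ≈ 0.582
```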

Theorem 27 (i) Let X be a random variable of discrete type with support SX and p.m.f.
fX. Let h : R → R be a Borel function and let T = h(X). Then

E(T) = Σ_{x ∈ SX} h(x) fX(x),

provided it is finite.

(ii) Let X be a random variable of absolutely continuous type with p.d.f. fX. Let
h : R → R be a Borel function and let T = h(X). Then

E(T) = ∫_{−∞}^{∞} h(x) fX(x) dx,

provided it is finite.

Proof (i) By Theorem 5, T = h(X) is a random variable of discrete type with support
ST = {h(x) : x ∈ SX} and p.m.f.

fT(t) = Σ_{x ∈ At} P({X = x}), if t ∈ ST; 0, otherwise,

where At = {x ∈ SX : h(x) = t}, t ∈ ST, so that {At : t ∈ ST} forms a partition of SX
(As ∩ At = φ if s ≠ t, and ∪_{t ∈ ST} At = SX). Therefore

E(T) = Σ_{t ∈ ST} t P({T = t})
     = Σ_{t ∈ ST} t Σ_{x ∈ At} P({X = x})
     = Σ_{t ∈ ST} Σ_{x ∈ At} t P({X = x})
     = Σ_{t ∈ ST} Σ_{x ∈ At} h(x) P({X = x})   (since, for x ∈ At, t = h(x))
     = Σ_{x ∈ ∪_{t ∈ ST} At} h(x) P({X = x})   (since As ∩ At = φ if s ≠ t)
     = Σ_{x ∈ SX} h(x) P({X = x})   (since ∪_{t ∈ ST} At = SX)
     = Σ_{x ∈ SX} h(x) fX(x).

(ii) Define At = {x ∈ SX : h(x) > t}, t ≥ 0, and Bs = {x ∈ SX : h(x) < s}, s ≤ 0. For
simplicity we will assume that, for every t ≥ 0 and s ≤ 0, At and Bs are intervals. Then,
using Theorem 26,

E(T) = ∫_{0}^{∞} P({T > t}) dt − ∫_{−∞}^{0} P({T < s}) ds
     = ∫_{0}^{∞} ∫_{At} fX(x) dx dt − ∫_{−∞}^{0} ∫_{Bs} fX(x) dx ds
     = ∫_{A0} ∫_{0}^{h(x)} fX(x) dt dx − ∫_{B0} ∫_{h(x)}^{0} fX(x) ds dx,

on interchanging the order of integration in the above integrals and using the following
two immediate observations: (a) t ∈ (0, ∞), x ∈ At ⇔ x ∈ A0 and t ∈ (0, h(x)); (b)
s ∈ (−∞, 0), x ∈ Bs ⇔ x ∈ B0 and s ∈ (h(x), 0). Therefore

E(T) = ∫_{A0} h(x) fX(x) dx + ∫_{B0} h(x) fX(x) dx = ∫_{SX} h(x) fX(x) dx = ∫_{−∞}^{∞} h(x) fX(x) dx,

since A0 ∩ B0 = φ and SX = A0 ∪ B0 ∪ {x ∈ SX : h(x) = 0}. ♠
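Theorem 27 is the rule used implicitly whenever E(h(X)) is computed without first deriving the distribution of h(X). A small numerical check for the discrete case (Python, illustrative only): compute E(X^2) for the p.m.f. of Example 7 once via Σ h(x) fX(x) and once via the p.m.f. of Z = X^2 obtained in that example; the two answers agree.

```python
# E(h(X)) computed two ways for h(x) = x^2 and the p.m.f. of Example 7.
fX = {-2: 1/7, -1: 1/7, 0: 1/7, 1: 1/7, 2: 3/14, 3: 3/14}

lhs = sum(x**2 * p for x, p in fX.items())                  # Theorem 27 (i)

fZ = {0: 1/7, 1: 2/7, 4: 5/14, 9: 3/14}                     # p.m.f. of Z = X^2 from Example 7
rhs = sum(z * p for z, p in fZ.items())                      # direct definition of E(Z)

print(lhs, rhs)   # both equal 51/14 ≈ 3.64
```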

Some special kind of expectations which are frequently used are defined below.

Definition 28 Let X be a random variable defined on some probability space.


(i) µ′1 = E(X), provided it is finite, is called the mean of the (distribution of ) random
variable X;
(ii) For r ∈ {1, 2, . . .}, µ′r = E(X r ), provided it is finite, is called the r-th moment of the
(distribution of ) random variable X;
(iii) For r ∈ {1, 2, . . .}, E(|X|r ), provided it is finite, is called the r-th absolute moment
of the (distribution of ) random variable X;
(iv) For r ∈ {1, 2, . . .}, µr = E((X − µ′1 )r ), provided it is finite, is called the r-th central
moment of the (distribution of ) random variable X;
(v) µ2 = E((X − µ′1 )2 ), provided it is finite, is called the variance of the (distribution of )
random variable X;


The following theorem deals with some elementary properties of expectation.

Theorem 29 Let X be a random variable.


(i) If h1 and h2 are Borel functions such that P ({h1 (X) ≤ h2 (X)}) = 1, then E(h1 (X)) ≤
E(h2 (X)), provided the involved expectations are finite.
(ii) If, for real constants a and b with a ≤ b, P ({a ≤ X ≤ b}) = 1, then a ≤ E(X) ≤ b;
(iii) If P ({X ≥ 0}) = 1 and E(X) = 0 then P ({X = 0}) = 1;
(iv) If E(|X|) is finite then |E(X)| ≤ E(|X|);
(v) For real constants a and b, E(aX + b) = aE(X) + b, provided the involved expectations
are finite;
(vi) If h1, . . . , hm are Borel functions then

E(Σ_{i=1}^{m} hi(X)) = Σ_{i=1}^{m} E(hi(X)),

provided the involved expectations are finite.

Proof We will provide the proof for the situation where X is of absolutely continuous
type. The proof for the discrete case is analogous and is left as an exercise. Also,
assertions (iv)-(vi) follow directly from the definition of the expectation of a random
variable and elementary properties of integrals. Therefore we will provide the proofs of
only the first three assertions.
(i) Since P({h1(X) ≤ h2(X)}) = 1, without loss of generality we may take SX ⊆ {x ∈ R : h1(x) ≤ h2(x)}
(otherwise replace SX by S*X = SX ∩ {x ∈ R : h1(x) ≤ h2(x)}, so that P({X ∈ S*X}) = 1).
Then h1(x) I_{SX}(x) fX(x) ≤ h2(x) I_{SX}(x) fX(x), ∀ x ∈ R, and therefore

E(h1(X)) = ∫_{−∞}^{∞} h1(x) I_{SX}(x) fX(x) dx ≤ ∫_{−∞}^{∞} h2(x) I_{SX}(x) fX(x) dx = E(h2(X)).

(ii) Since P({a ≤ X ≤ b}) = 1, without loss of generality we may assume that SX ⊆ [a, b].
Then a I_{SX}(x) fX(x) ≤ x I_{SX}(x) fX(x) ≤ b I_{SX}(x) fX(x), ∀ x ∈ R, and therefore

a = ∫_{−∞}^{∞} a I_{SX}(x) fX(x) dx ≤ ∫_{−∞}^{∞} x I_{SX}(x) fX(x) dx ≤ ∫_{−∞}^{∞} b I_{SX}(x) fX(x) dx = b,

i.e., a ≤ E(X) ≤ b.

(iii) Since P({X ≥ 0}) = 1, without loss of generality we may take SX ⊆ [0, ∞). Then
(−∞, 0) ⊆ SX^c = {x ∈ R : fX(x) = 0} and therefore, for n ∈ {1, 2, . . .},

0 = E(X) = ∫_{−∞}^{0} x fX(x) dx + ∫_{0}^{∞} x fX(x) dx = ∫_{0}^{∞} x fX(x) dx
         ≥ ∫_{1/n}^{∞} x fX(x) dx ≥ (1/n) ∫_{1/n}^{∞} fX(x) dx = (1/n) P({X ≥ 1/n})

⇒ P({X ≥ 1/n}) = 0, ∀ n ∈ {1, 2, . . .}
⇒ lim_{n→∞} P({X ≥ 1/n}) = 0
⇒ P(∪_{n=1}^{∞} {X ≥ 1/n}) = 0,

using continuity of probability measures and the fact that An = {X ≥ 1/n}, n = 1, 2, . . .,
is an increasing sequence of sets. Since ∪_{n=1}^{∞} {X ≥ 1/n} = {X > 0}, it follows that
P({X > 0}) = 0 and therefore

P({X = 0}) = P({X ≥ 0}) − P({X > 0}) = 1.


As a consequence of the above theorem we have the following corollary.

Corollary 30 Let X be a random variable with finite first two moments. Let E(X) = µ.
Then,
(i) Var(X) = E(X 2 ) − (E(X))2 ;
(ii) Var(X) ≥ 0. Moreover, Var(X) = 0 if, and only if, P ({X = µ}) = 1;
(iii) E(X 2 ) ≥ (E(X))2 , (Cauchy-Schwarz inequality);
(iv) for real constants a and b, Var(aX + b) = a2 Var(X).

Proof (i) Note that µ = E(X) is a fixed real number. Therefore, using Theorem 29
(v)-(vi), we have

E((X − µ)2 ) = E(X 2 ) − 2µE(X) + µ2 = E(X 2 ) − µ2 = E(X 2 ) − (E(X))2 .

(ii) Since P({(X − µ)^2 ≥ 0}) = P(Ω) = 1, using Theorem 29 (i), we have Var(X) =
E((X − µ)^2) ≥ 0. Also, using Theorem 29 (iii), if Var(X) = E((X − µ)^2) = 0 then
P({(X − µ)^2 = 0}) = 1, i.e., P({X = µ}) = 1. Conversely, if P({X = µ}) = 1 then
E(X) = µ and E(X^2) = µ^2, and using (i) we get
Var(X) = E(X^2) − (E(X))^2 = 0.

(iii) Follows on using (i) and (ii).


(iv) Let Y = aX + b. Then E(Y ) = aE(X) + b (cf. Theorem 29 (v)), Y − E(Y ) =
a(X − E(X)) and

Var(Y ) = E((Y − E(Y ))2 ) = E(a2 (X − E(X))2 ) = a2 E((X − E(X))2 ) = a2 Var(X).

Example 31 Let X be a random variable with p.d.f.

fX(x) = 1/2, if −2 < x < −1; x/9, if 0 < x < 3; 0, otherwise.

(i) If Y1 = max(X, 0), find mean and variance of Y1 ;


(ii) If Y2 = 2X + 3e− max(X,0) + 4, find E(Y2 ).

Solution Using Theorem 27 (ii), for r > 0, we get

E(Y1^r) = E((max(X, 0))^r) = ∫_{−∞}^{∞} (max(x, 0))^r fX(x) dx = ∫_{0}^{3} x^{r+1}/9 dx = 3^r/(r + 2).

It follows that E(Y1) = 1, E(Y1^2) = 9/4 and Var(Y1) = E(Y1^2) − (E(Y1))^2 = 5/4 (cf.
Corollary 30 (i)).
(ii) We have

E(X) = ∫_{−∞}^{∞} x fX(x) dx = ∫_{−2}^{−1} (x/2) dx + ∫_{0}^{3} (x^2/9) dx = 1/4,

E(e^{−max(X,0)}) = ∫_{−∞}^{∞} e^{−max(x,0)} fX(x) dx = ∫_{−2}^{−1} (1/2) dx + ∫_{0}^{3} (x/9) e^{−x} dx = (11 − 8e^{−3})/18.

Now using Theorem 29 (v)-(vi) we get

E(Y2) = E(2X + 3e^{−max(X,0)} + 4) = 2E(X) + 3E(e^{−max(X,0)}) + 4 = (19 − 4e^{−3})/3.

Example 32 Let X be a random variable with p.m.f.

fX(x) = (n choose x) p^x q^{n−x}, if x ∈ {0, 1, . . . , n}; 0, otherwise,

where n ∈ {1, 2, . . .}, p ∈ (0, 1) and q = 1 − p.


(i) For r ∈ {1, 2, . . .}, find E(X(r) ), where X(r) = X(X − 1)(X − 2) · · · (X − r + 1);
(ii) Find mean and variance of X;
(iii) Let T = eX + 2e−X + 6X 2 + 3X + 4. Find E(T ).

Solution (i) Fix r ∈ {1, 2, . . . , n}. Using Theorem 27 (i), we have

E(X_(r)) = E(X(X − 1)(X − 2) · · · (X − r + 1))
  = Σ_{x=0}^{n} x(x − 1)(x − 2) · · · (x − r + 1) (n choose x) p^x q^{n−x}
  = Σ_{x=r}^{n} x(x − 1)(x − 2) · · · (x − r + 1) [n!/(x!(n − x)!)] p^x q^{n−x}
  = n(n − 1)(n − 2) · · · (n − r + 1) p^r Σ_{x=r}^{n} (n−r choose x−r) p^{x−r} q^{(n−r)−(x−r)}
  = n(n − 1)(n − 2) · · · (n − r + 1) p^r Σ_{x=0}^{n−r} (n−r choose x) p^x q^{(n−r)−x}
  = n(n − 1)(n − 2) · · · (n − r + 1) p^r (q + p)^{n−r}
  = n(n − 1)(n − 2) · · · (n − r + 1) p^r.

(For r > n every term in the sum vanishes, so E(X_(r)) = 0.)

(ii) Using (i) we get E(X) = E(X(1) ) = np and E(X(X − 1)) = E(X(2) ) = n(n − 1)p2 .
Therefore E(X 2 ) = E(X(X −1)+X) = n(n−1)p2 +np and Var(X) = E(X 2 )−(E(X))2 =
npq.
(iii) For t ∈ R, we have

E(e^{tX}) = Σ_{x=0}^{n} e^{tx} (n choose x) p^x q^{n−x} = Σ_{x=0}^{n} (n choose x) (pe^t)^x q^{n−x} = (q + pe^t)^n.

Therefore

E(T) = E(e^X + 2e^{−X} + 6X^2 + 3X + 4) = E(e^X) + 2E(e^{−X}) + 6E(X^2) + 3E(X) + 4
     = (q + pe)^n + 2(q + pe^{−1})^n + 6n(n − 1)p^2 + 3np + 4.
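The factorial-moment formula E(X_(r)) = n(n − 1) · · · (n − r + 1) p^r, and the resulting mean np and variance npq, can be checked by brute force over the binomial p.m.f. (a Python sketch, illustrative only; the parameters n = 10, p = 0.3 are arbitrary):

```python
from math import comb

n, p = 10, 0.3
q = 1.0 - p
pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

def falling(x, r):
    out = 1
    for k in range(r):
        out *= (x - k)
    return out

for r in (1, 2, 3):
    lhs = sum(falling(x, r) * prob for x, prob in pmf.items())   # E(X_(r)) by Theorem 27 (i)
    rhs = falling(n, r) * p**r                                    # formula from Example 32 (i)
    print(r, round(lhs, 10), round(rhs, 10))

mean = sum(x * prob for x, prob in pmf.items())
var = sum(x**2 * prob for x, prob in pmf.items()) - mean**2
print(mean, n * p, var, n * p * q)   # np = 3.0, npq = 2.1
```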

We are familiar with the Laplace transform of a given real-valued function defined
on R and also with the fact that, under certain conditions, the Laplace transform of a
function determines the function uniquely. In probability theory the Laplace transform
of the p.d.f./p.m.f. of a random variable X plays an important role and is referred to as
the moment generating function (of the probability distribution) of the random variable X.

Definition 33 Let X be a random variable and let A = {t ∈ R : E(|e^{tX}|) = E(e^{tX}) is finite}.
Define MX : A → R by

MX(t) = E(e^{tX}), t ∈ A.

(i) We call the function MX the moment generating function (m.g.f.) of (the probability
distribution of) the random variable X.
(ii) We say that the m.g.f. of a random variable X exists if there exists a positive real
number a such that (−a, a) ⊆ A (i.e., if MX(t) = E(e^{tX}) is finite in an interval
containing 0). ♠

Note that MX(0) = 1 and therefore A = {t ∈ R : E(e^{tX}) is finite} ≠ φ. Moreover, using
Theorem 29 (ii)-(iii), we have MX(t) > 0, ∀ t ∈ A. Also, if MX(t) = E(e^{tX}) exists and
is finite on an interval (−a, a), a > 0, then for any real constants c and d the m.g.f. of
Y = cX + d also exists and

MY(t) = M_{cX+d}(t) = E(e^{t(cX+d)}) = e^{td} E(e^{ctX}) = e^{td} MX(ct), t ∈ (−a/|c|, a/|c|),

with the convention that a/0 = ∞.


The name moment generating function to the transform MX is motivated by the fact
that MX can be used to generate moments of a random variable, as illustrated in the
following theorem.

Theorem 34 Let X be a random variable with m.g.f. MX that is finite on an interval
(−a, a), for some a > 0 (i.e., the m.g.f. of X exists). Then,
(i) for each r ∈ {1, 2, . . .}, µ′_r = E(X^r) is finite;
(ii) for r ∈ {1, 2, . . .}, µ′_r = E(X^r) = MX^{(r)}(0), where MX^{(r)}(0) = [d^r/dt^r MX(t)]_{t=0}, the
r-th derivative of MX at the point 0;
(iii) MX(t) = Σ_{r=0}^{∞} (t^r/r!) µ′_r, t ∈ (−a, a), where µ′_0 = 1.

Proof We will provide the proof for the case where X is of absolutely continuous type. The
proof for the case of discrete type X follows in a similar fashion with integral signs replaced
by summation signs.
(i) We have E(e^{tX}) < ∞, ∀ t ∈ (−a, a). Therefore

∫_{−∞}^{0} e^{tx} fX(x) dx < ∞ and ∫_{0}^{∞} e^{tx} fX(x) dx < ∞, ∀ t ∈ (−a, a)
⇒ ∫_{−∞}^{0} e^{−t|x|} fX(x) dx < ∞ and ∫_{0}^{∞} e^{t|x|} fX(x) dx < ∞, ∀ t ∈ (−a, a)
⇒ ∫_{−∞}^{0} e^{|t||x|} fX(x) dx < ∞ and ∫_{0}^{∞} e^{|t||x|} fX(x) dx < ∞, ∀ t ∈ (−a, a)
⇒ ∫_{−∞}^{∞} e^{|tx|} fX(x) dx < ∞, ∀ t ∈ (−a, a),

i.e., E(e^{|tX|}) < ∞, ∀ t ∈ (−a, a). Fix r ∈ {1, 2, . . .} and t ∈ (−a, a) − {0}. Then
lim_{x→∞} |x|^r / e^{|tx|} = 0 and therefore there exists a positive real number A_{r,t} such that
|x|^r < e^{|tx|} whenever |x| > A_{r,t}. Thus we have

E(|X|^r) = ∫_{−∞}^{∞} |x|^r fX(x) dx = ∫_{|x| ≤ A_{r,t}} |x|^r fX(x) dx + ∫_{|x| > A_{r,t}} |x|^r fX(x) dx
         ≤ A_{r,t}^r ∫_{|x| ≤ A_{r,t}} fX(x) dx + ∫_{|x| > A_{r,t}} e^{|tx|} fX(x) dx
         ≤ A_{r,t}^r + ∫_{−∞}^{∞} e^{|tx|} fX(x) dx < ∞.

(ii) Fix r ∈ {1, 2, . . .}. Then, for t ∈ (−a, a),

MX(t) = ∫_{−∞}^{∞} e^{tx} fX(x) dx  and  MX^{(r)}(t) = d^r/dt^r ∫_{−∞}^{∞} e^{tx} fX(x) dx.

Under the assumption that MX(t) = E(e^{tX}) < ∞, ∀ t ∈ (−a, a), using arguments of
advanced calculus, it can be shown that the derivative can be passed through the integral
sign. Therefore

MX^{(r)}(t) = ∫_{−∞}^{∞} d^r/dt^r (e^{tx}) fX(x) dx = ∫_{−∞}^{∞} x^r e^{tx} fX(x) dx
⇒ MX^{(r)}(0) = ∫_{−∞}^{∞} x^r fX(x) dx = E(X^r).

(iii) For t ∈ (−a, a),

MX(t) = ∫_{−∞}^{∞} e^{tx} fX(x) dx = ∫_{−∞}^{∞} (Σ_{r=0}^{∞} (tx)^r/r!) fX(x) dx.

Under the assumption that MX(t) = E(e^{tX}) < ∞, ∀ t ∈ (−a, a), using arguments of
advanced calculus, it can be shown that the summation sign can be passed through the
integral sign, i.e.,

MX(t) = Σ_{r=0}^{∞} (t^r/r!) ∫_{−∞}^{∞} x^r fX(x) dx = Σ_{r=0}^{∞} (t^r/r!) µ′_r.


As a consequence of the above theorem we have the following corollary.

Corollary 35 Under the notation and assumptions of Theorem 34, define ψX : (−a, a) → R
by ψX(t) = ln(MX(t)), t ∈ (−a, a). Then µ′_1 = E(X) = ψX^{(1)}(0) and µ_2 = Var(X) = ψX^{(2)}(0),
where ψX^{(r)} denotes the r-th (r ∈ {1, 2, . . .}) derivative of ψX.

Proof We have, for t ∈ (−a, a),

ψX^{(1)}(t) = MX^{(1)}(t)/MX(t)  and  ψX^{(2)}(t) = [MX^{(2)}(t) MX(t) − (MX^{(1)}(t))^2] / (MX(t))^2.

Using the facts that MX(0) = 1 and MX^{(r)}(0) = E(X^r), r ∈ {1, 2, . . .} (cf. Theorem 34
(ii)), we get

ψX^{(1)}(0) = MX^{(1)}(0)/MX(0) = E(X),
ψX^{(2)}(0) = [MX^{(2)}(0) MX(0) − (MX^{(1)}(0))^2] / (MX(0))^2 = E(X^2) − (E(X))^2 = Var(X).
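Corollary 35 can be sanity-checked numerically: take any m.g.f. available in closed form, differentiate ψX = ln MX at 0 by finite differences, and compare with the known mean and variance. The sketch below (Python, illustrative; the Bernoulli(1/2) m.g.f. M(t) = (1 + e^t)/2, with mean 1/2 and variance 1/4, is used only as a convenient example) assumes nothing beyond the corollary itself.

```python
import math

def M(t):
    # m.g.f. of a Bernoulli(1/2) random variable: E(e^{tX}) = (1 + e^t) / 2
    return 0.5 * (1.0 + math.exp(t))

def psi(t):
    return math.log(M(t))

h = 1e-5
psi1 = (psi(h) - psi(-h)) / (2 * h)              # ≈ psi'(0) = E(X) = 1/2
psi2 = (psi(h) - 2 * psi(0.0) + psi(-h)) / h**2  # ≈ psi''(0) = Var(X) = 1/4
print(psi1, psi2)
```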

Example 36 Let X be a random variable with p.m.f.

fX(x) = e^{−λ} λ^x / x!, if x ∈ {0, 1, 2, . . .}; 0, otherwise,

where λ > 0.
(i) Find the m.g.f. MX(t), t ∈ A = {s ∈ R : E(e^{sX}) < ∞}, of X. Show that X possesses
moments of all orders. Find the mean and variance of X;
(ii) Find ψX(t) = ln(MX(t)), t ∈ A. Hence find the mean and variance of X;
(iii) What are the first four terms in the power series expansion of MX centered at 0?

Solution (i) We have

MX(t) = E(e^{tX}) = Σ_{x=0}^{∞} e^{tx} e^{−λ} λ^x / x! = e^{−λ} Σ_{x=0}^{∞} (λe^t)^x / x! = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}, ∀ t ∈ R.

Since A = {s ∈ R : E(e^{sX}) < ∞} = R, by Theorem 34 (i), for every r ∈ {1, 2, . . .}, µ′_r is
finite. Clearly

MX^{(1)}(t) = λe^t e^{λ(e^t − 1)}  and  MX^{(2)}(t) = λe^t e^{λ(e^t − 1)} + λ^2 e^{2t} e^{λ(e^t − 1)}, t ∈ R.

Therefore E(X) = MX^{(1)}(0) = λ, E(X^2) = MX^{(2)}(0) = λ + λ^2 and Var(X) = E(X^2) − (E(X))^2 = λ.
(ii) We have ψX(t) = ln(MX(t)) = λ(e^t − 1), t ∈ R, and ψX^{(1)}(t) = ψX^{(2)}(t) = λe^t, t ∈ R.
Therefore E(X) = ψX^{(1)}(0) = λ and Var(X) = ψX^{(2)}(0) = λ.
(iii) We have MX^{(3)}(t) = λ^3 e^{3t} e^{λ(e^t − 1)} + 3λ^2 e^{2t} e^{λ(e^t − 1)} + λe^t e^{λ(e^t − 1)}, t ∈ R. Therefore
µ′_3 = E(X^3) = MX^{(3)}(0) = λ^3 + 3λ^2 + λ. Since A = {s ∈ R : E(e^{sX}) < ∞} = R, by Theorem 34
(iii), we have

MX(t) = 1 + µ′_1 t + µ′_2 t^2/2! + µ′_3 t^3/3! + · · ·
      = 1 + λt + λ(λ + 1) t^2/2! + λ(λ^2 + 3λ + 1) t^3/3! + · · · , t ∈ R.
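A quick way to gain confidence in these formulas (a Python check, not part of the notes) is to sum the Poisson p.m.f. directly and compare with µ′_1 = λ, µ′_2 = λ + λ^2 and µ′_3 = λ^3 + 3λ^2 + λ; the value λ = 2.5 below is arbitrary.

```python
import math

lam = 2.5
N = 60                                  # truncation; the Poisson tail beyond 60 is negligible here
pmf = [math.exp(-lam)]
for x in range(1, N):
    pmf.append(pmf[-1] * lam / x)       # recursion f(x) = f(x-1) * lambda / x

checks = [(1, lam), (2, lam + lam**2), (3, lam**3 + 3 * lam**2 + lam)]
for r, formula in checks:
    moment = sum(x**r * p for x, p in enumerate(pmf))
    print(r, round(moment, 6), round(formula, 6))
```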

Example 37 Let X be a random variable with p.d.f.

fX(x) = e^{−x}, if x > 0; 0, otherwise.

(i) Find the m.g.f. MX(t), t ∈ A = {s ∈ R : E(e^{sX}) < ∞}, of X. Show that X possesses
moments of all orders. Find the mean and variance of X;
(ii) Find ψX(t) = ln(MX(t)), t ∈ A. Hence find the mean and variance of X;
(iii) Expand MX(t) as a power series centered at 0 and hence find E(X^r), r ∈ {1, 2, . . .}.

Solution (i) We have

MX(t) = E(e^{tX}) = ∫_{0}^{∞} e^{tx} e^{−x} dx = ∫_{0}^{∞} e^{−(1−t)x} dx.

Clearly A = {s ∈ R : E(e^{sX}) < ∞} = (−∞, 1) ⊃ (−1, 1) and MX(t) = (1 − t)^{−1}, t < 1.
By Theorem 34 (i), for every r ∈ {1, 2, . . .}, µ′_r is finite. Clearly

MX^{(1)}(t) = (1 − t)^{−2}  and  MX^{(2)}(t) = 2(1 − t)^{−3}, t < 1.

Therefore E(X) = MX^{(1)}(0) = 1, E(X^2) = MX^{(2)}(0) = 2 and Var(X) = E(X^2) − (E(X))^2 = 1.
(ii) We have ψX(t) = ln(MX(t)) = −ln(1 − t), t < 1, ψX^{(1)}(t) = (1 − t)^{−1}, t < 1, and
ψX^{(2)}(t) = (1 − t)^{−2}, t < 1. Therefore E(X) = ψX^{(1)}(0) = 1 and Var(X) = ψX^{(2)}(0) = 1.
(iii) We have

MX(t) = (1 − t)^{−1} = Σ_{r=0}^{∞} t^r, t ∈ (−1, 1).

Since A = {s ∈ R : E(e^{sX}) < ∞} = (−∞, 1) ⊃ (−1, 1), using Theorem 34 (iii), we
conclude that µ′_r = (coefficient of t^r/r! in the power series expansion of MX(t) centered
at 0) = r!. ♠

Example 38 Let X be a random variable with p.d.f.

fX(x) = (1/π) · 1/(1 + x^2), −∞ < x < ∞.

Show that the m.g.f. of X does not exist.

Solution From Example 25 we know that the expected value of X is not finite. Therefore,
using Theorem 34 (i), we conclude that the m.g.f. of X does not exist. ♠

In the sequel we will see that, under certain conditions, a probability distribution is
uniquely determined by its m.g.f..

Definition 39 Two random variables X and Y are said to have the same distribution
(written as X =d Y) if they have the same distribution function, i.e., FX(x) = FY(x), ∀ x ∈ R. ♠

Let X and Y be two random variables with p.d.f.s/p.m.f.s fX and fY, respectively. If
fX(x) = fY(x), ∀ x ∈ R, then obviously FX(x) = FY(x), ∀ x ∈ R, i.e., X =d Y.
Conversely suppose that X =d Y. Then, for each x ∈ R, FX(x) = FY(x) = G(x), say,
and thus X and Y are of the same type (i.e., either both are of discrete type or both
are of absolutely continuous type). If X =d Y and both of them are of discrete type then
clearly SX = SY = {x ∈ R : G(x) − G(x−) > 0} = S, say, and

fX(x) = fY(x) = G(x) − G(x−), if x ∈ S; 0, otherwise.

It follows that if X and Y are of discrete type then X =d Y if, and only if, X and Y have
the same p.m.f., i.e., fX(x) = fY(x), ∀ x ∈ R. If X =d Y and G is differentiable everywhere
except possibly on a finite set C then, using Remark 27 (viii) of Chapter II, it follows that
both X and Y are of absolutely continuous type with a common (version of) p.d.f.

g(x) = G′(x), if x ∉ C; 0, otherwise.

It follows that if X and Y have distribution functions that are differentiable everywhere
except possibly on finite sets, then X =d Y if, and only if, there exist versions fX and fY
of the p.d.f.s of X and Y, respectively, such that fX(x) = fY(x), ∀ x ∈ R.
The following theorem is immediate from the above discussion.

Theorem 40 (i) Let X and Y be random variables of discrete type with p.m.f.s fX and
fY respectively. Then X =d Y if, and only if, fX(x) = fY(x), ∀ x ∈ R;
(ii) Let X and Y be random variables having distribution functions that are differentiable
everywhere except possibly on finite sets. Then both of them are of absolutely continuous
type. Moreover, X =d Y if, and only if, there exist versions of p.d.f.s fX and fY of X and
Y, respectively, such that fX(x) = fY(x), ∀ x ∈ R. ♠

As a consequence of the above theorem we have the following corollary.


Corollary 41 (i) Let X and Y be random variables of discrete type with X =d Y. Then,
for any Borel functions h and ψ, E(h(X)) = E(h(Y)) and ψ(X) =d ψ(Y);
(ii) Let X and Y be random variables of absolutely continuous type having distribution
functions that are differentiable everywhere except possibly on finite sets. Suppose that
X =d Y. Then, for any Borel functions h and ψ, E(h(X)) = E(h(Y)) and ψ(X) =d ψ(Y).
Proof (i) Suppose that X =d Y. Since X and Y are of discrete type this implies that they
have the same support and the same p.m.f., i.e., SX = SY = S (say) and, for each x ∈ R,
fX(x) = fY(x) = g(x) (say). Therefore

E(h(X)) = Σ_{x ∈ SX} h(x) fX(x) = Σ_{x ∈ S} h(x) g(x) = Σ_{x ∈ SY} h(x) fY(x) = E(h(Y)).

Fix a ∈ R. On taking h(x) = I_{(−∞,a]}(ψ(x)), x ∈ R, we get

E(I_{(−∞,a]}(ψ(X))) = E(I_{(−∞,a]}(ψ(Y))),

i.e., P({ψ(X) ≤ a}) = P({ψ(Y) ≤ a}). It follows that ψ(X) and ψ(Y) have the same
distribution function, i.e., ψ(X) =d ψ(Y).
(ii) Suppose that X =d Y, i.e., for each x ∈ R, FX(x) = FY(x) = G(x), say. Since the
common distribution function G is differentiable everywhere except possibly on a finite
set C (say), we may take their common p.d.f. to be (cf. Remark 27 (viii) of Chapter II)
fX(x) = fY(x) = G′(x), if x ∉ C; 0, otherwise.

Therefore

E(h(X)) = ∫_{−∞}^{∞} h(x) fX(x) dx = ∫_{−∞}^{∞} h(x) fY(x) dx = E(h(Y)).

The proof of ψ(X) =d ψ(Y) follows on the lines of the proof of (i). ♠

Example 42 (i) Let X be a random variable with p.m.f.

fX(x) = (n choose x) (1/2)^n, if x ∈ {0, 1, . . . , n}; 0, otherwise,

where n is a given positive integer. Let Y = n − X. Show that Y =d X and hence show
that E(X) = n/2.
(ii) Let X be a random variable with p.d.f.

fX(x) = e^{−|x|}/2, −∞ < x < ∞,

and let Y = −X. Show that Y =d X and hence show that E(X) = 0.

Proof (i) Clearly E(X) is finite. Using Example 9 it follows that the p.m.f. of Y = n − X
is given by

fY(y) = P({Y = y}) = (n choose y) (1/2)^n, if y ∈ {0, 1, . . . , n}; 0, otherwise,
      = fX(y), ∀ y ∈ R,

i.e., Y =d X. Hence (using Corollary 41 (i))

E(X) = E(Y) = E(n − X) = n − E(X)  ⇒  E(X) = n/2.

(ii) Using Corollary 14 it can be shown that the p.d.f. of the random variable Y = −X is

fY(y) = e^{−|y|}/2 = fX(y), −∞ < y < ∞.

It follows that Y =d X and therefore (since E(X) is finite)

E(X) = E(Y) = E(−X) = −E(X)  ⇒  E(X) = 0.

Definition 43 A random variable X is said to have a symmetric distribution about a
point µ ∈ R if X − µ =d µ − X. ♠

Theorem 44 Let X be a random variable having p.d.f./p.m.f. fX and distribution func-


tion FX . Let µ ∈ R. Then,
(i) the distribution of X is symmetric about µ if, and only if, fX (µ − x) = fX (µ + x), ∀x ∈
R;
(ii) the distribution of X is symmetric about µ if, and only if, FX (µ+x)+FX ((µ−x)−) =
1, ∀x ∈ R (i.e., if, and only if, P ({X ≤ µ + x}) = P ({X ≥ µ − x}), ∀x ∈ R);
(iii) the distribution of X is symmetric about µ if, and only if, the distribution of Y = X−µ
is symmetric about 0;
(iv) if the distribution of X is symmetric about µ, then FX(µ−) ≤ 1/2 ≤ FX(µ);
(v) if the distribution of X is symmetric about µ and the expected value of X is finite,
E(X) = µ;
(vi) if the distribution of X is symmetric about 0, E(X 2m+1 ) = 0, m ∈ {1, 2, . . .}, provided
the expectation exists.

Proof For simplicity we will assume that either X is of discrete type or X is of absolutely
continuous type. Moreover if X is of absolutely continuous type then we will assume that
its distribution function is differentiable everywhere except possibly on a finite set.
(i) Let Y1 = X − µ and Y2 = µ − X. The assertion follows from Theorem 40 on noting
that the p.d.f.s/p.m.f.s of Y1 and Y2 are given by fY1 (y) = fX (µ + y), y ∈ R and fY2 (y) =
fX (µ − y), y ∈ R.
(ii) Let Y1 = X − µ and Y2 = µ − X. Then Y1 and Y2 have distribution functions
FY1 (x) = FX (µ + x), x ∈ R and FY2 (x) = 1 − FX ((µ − x)−), x ∈ R. Therefore

Y1 =d Y2 ⇔ FY1(x) = FY2(x), ∀ x ∈ R ⇔ FX(µ + x) + FX((µ − x)−) = 1, ∀ x ∈ R.

(iii) Clearly, the distribution of X is symmetric about µ ⇔ X − µ =d µ − X = −(X − µ)
⇔ Y =d −Y, where Y = X − µ.

(iv) Using (ii), we have: the distribution of X is symmetric about µ ⇔ FX(µ + x) + FX((µ − x)−) = 1, ∀ x ∈ R
⇒ FX(µ) + FX(µ−) = 1 ⇒ FX(µ−) ≤ 1/2 ≤ FX(µ), since FX(µ−) ≤ FX(µ).


(v) Suppose that the distribution of X is symmetric about µ, i.e., suppose that X − µ =d µ − X,
and E(|X|) < ∞. Now, using Corollary 41, we conclude that E(X − µ) = E(µ − X), i.e., E(X) = µ.
(vi) Suppose that the distribution of X is symmetric about 0, i.e., suppose that X =d −X.
Fix m ∈ {1, 2, . . .}. Using Corollary 41 we conclude that E(X^{2m+1}) = E((−X)^{2m+1}),
provided E(|X|^{2m+1}) < ∞. It follows that E(X^{2m+1}) = 0. ♠

Let X and Y be random variables having the same distribution. Suppose that the
m.g.f. MX exists. Then there exists a positive real number a such that E(etX ) < ∞, ∀t ∈
(−a, a). Using Corollary 41, we conclude that MY exists and MY (t) = MX (t), ∀t ∈
(−a, a). Thus if two random variables have the same distribution then they have the

same m.g.f., provided it exists. The following theorem illustrates that the converse is also
true.

Theorem 45 Let X and Y be random variables having m.g.f.s MX and MY respectively.


Suppose that there exists a positive real number b such that MX (t) = MY (t), ∀t ∈ (−b, b).
Then X =d Y.

Proof We will provide the proof for the special case where X and Y are of discrete type
with SX = SY ⊆ {0, 1, 2, . . .}, as the general proof is involved. We have

MX(t) = MY(t), ∀ t ∈ (−b, b)
⇒ Σ_{k=0}^{∞} e^{tk} P({X = k}) = Σ_{k=0}^{∞} e^{tk} P({Y = k}), ∀ t ∈ (−b, b)
⇒ Σ_{k=0}^{∞} s^k P({X = k}) = Σ_{k=0}^{∞} s^k P({Y = k}), ∀ s ∈ (e^{−b}, e^{b}).

We know that if two power series match over an interval then they have the same coefficients.
It follows that P({X = k}) = P({Y = k}), k ∈ {0, 1, 2, . . .}, i.e., X and Y have
the same p.m.f.. Now the result follows using Theorem 40 (i). ♠

Example 46 Let µ ∈ R and σ > 0 be real constants and let Xµ,σ be a random variable
having p.d.f.

fXµ,σ(x) = (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)}, −∞ < x < ∞.

(i) Show that fXµ,σ is a p.d.f.;
(ii) Show that the probability distribution of Xµ,σ is symmetric about µ. Hence find E(Xµ,σ);
(iii) Find the m.g.f. of Xµ,σ and hence find the mean and variance of Xµ,σ;
(iv) Let Yµ,σ = aXµ,σ + b, where a ≠ 0 and b are real constants. Using the m.g.f. of Xµ,σ,
show that the p.d.f. of Yµ,σ is

fYµ,σ(y) = (1/(|a|σ√(2π))) e^{−(y−(aµ+b))^2/(2a^2σ^2)}, −∞ < y < ∞.

Proof (i) Clearly fXµ,σ(x) ≥ 0, ∀ x ∈ R. Also

∫_{−∞}^{∞} fXµ,σ(x) dx = ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)} dx
  = (1/√(2π)) ∫_{−∞}^{∞} e^{−z^2/2} dz   (on making the transformation z = (x − µ)/σ)
  = I, say.

Clearly I ≥ 0 and

I^2 = [(1/√(2π)) ∫_{−∞}^{∞} e^{−y^2/2} dy] [(1/√(2π)) ∫_{−∞}^{∞} e^{−z^2/2} dz]
    = (1/(2π)) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(y^2+z^2)/2} dy dz.

On making the transformation y = r cos θ, z = r sin θ, r > 0, 0 ≤ θ ≤ 2π (so that the
Jacobian of the transformation is r) we get

I^2 = (1/(2π)) ∫_{0}^{∞} ∫_{0}^{2π} r e^{−r^2/2} dθ dr = ∫_{0}^{∞} r e^{−r^2/2} dr = ∫_{0}^{∞} e^{−u} du = 1.

Since I ≥ 0, it follows that I = 1 and thus fXµ,σ is a p.d.f..


(ii) Clearly

fXµ,σ(µ − x) = fXµ,σ(µ + x) = (1/(σ√(2π))) e^{−x^2/(2σ^2)}, ∀ x ∈ R.
Now using Theorem 44 (i) and (v) it follows that the distribution of Xµ,σ is symmetric
about µ and E(Xµ,σ ) = µ.
(iii) For t ∈ R,

MXµ,σ(t) = E(e^{tXµ,σ}) = ∫_{−∞}^{∞} e^{tx} (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)} dx
  = (1/√(2π)) ∫_{−∞}^{∞} e^{t(µ+σz)} e^{−z^2/2} dz
  = e^{µt + σ^2 t^2/2} (1/√(2π)) ∫_{−∞}^{∞} e^{−(z−σt)^2/2} dz
  = e^{µt + σ^2 t^2/2},

since, by (i), ∫_{−∞}^{∞} e^{−(x−µ)^2/(2σ^2)} dx = σ√(2π), ∀ µ ∈ R and σ > 0. Also, for t ∈ R,

ψXµ,σ(t) = ln(MXµ,σ(t)) = µt + σ^2 t^2/2
⇒ E(Xµ,σ) = ψXµ,σ^{(1)}(0) = µ  and  Var(Xµ,σ) = ψXµ,σ^{(2)}(0) = σ^2.

(iv) Using the discussion following Definition 33 we have, for t ∈ R,

MYµ,σ(t) = M_{aXµ,σ+b}(t) = e^{tb} MXµ,σ(at) = e^{(aµ+b)t + a^2σ^2t^2/2} = MX_{aµ+b,|a|σ}(t)
⇒ Yµ,σ =d X_{aµ+b,|a|σ}   (using Theorem 45).

Now using Theorem 40 (ii) it follows that the p.d.f. of Yµ,σ is given by

fYµ,σ(y) = fX_{aµ+b,|a|σ}(y) = (1/(|a|σ√(2π))) e^{−(y−(aµ+b))^2/(2a^2σ^2)}, y ∈ R.


Example 47 Let p ∈ (0, 1) and let Xp be a random variable with p.m.f.

fXp(x) = (n choose x) p^x q^{n−x}, if x ∈ {0, 1, . . . , n}; 0, otherwise,

where n is a given positive integer and q = 1 − p.
(i) Find the m.g.f. of Xp and hence find the mean and variance of Xp, p ∈ (0, 1);
(ii) Let Yp = n − Xp, p ∈ (0, 1). Using the m.g.f. of Xp show that the p.m.f. of Yp is

fYp(y) = (n choose y) q^y (1 − q)^{n−y}, if y ∈ {0, 1, . . . , n}; 0, otherwise.

Proof (i) From the solution of Example 32 (iii), it is clear that the m.g.f. of Xp is given by

MXp(t) = (1 − p + pe^t)^n, t ∈ R.

Therefore, for t ∈ R,

ψXp(t) = ln(MXp(t)) = n ln(1 − p + pe^t),
ψXp^{(1)}(t) = npe^t/(1 − p + pe^t),
ψXp^{(2)}(t) = np [(1 − p + pe^t)e^t − pe^{2t}] / (1 − p + pe^t)^2
⇒ E(Xp) = ψXp^{(1)}(0) = np  and  Var(Xp) = ψXp^{(2)}(0) = np(1 − p).

(ii) For t ∈ R,

MYp(t) = E(e^{tYp}) = e^{nt} MXp(−t) = e^{nt} (1 − p + pe^{−t})^n = (p + (1 − p)e^t)^n = MX_{1−p}(t)
⇒ Yp =d X_{1−p}.

Now using Theorem 40 (i) it follows that the p.m.f. of Yp is given by

fYp(y) = fX_{1−p}(y) = (n choose y) q^y (1 − q)^{n−y}, if y ∈ {0, 1, . . . , n}; 0, otherwise. ♠

We often come across situations where the probability of a Borel set under a given
probability distribution cannot be explicitly evaluated and thus some estimate of the
probability may be desired. For example, if a random variable Z has the p.d.f.

fZ(z) = (1/√(2π)) e^{−z^2/2}, −∞ < z < ∞,

then

P({Z > 2}) = ∫_{2}^{∞} (1/√(2π)) e^{−z^2/2} dz      (48)

cannot be explicitly evaluated and an estimate of this probability may be desired. To
estimate this probability one has to either resort to numerical integration or use some
other estimation procedure. Inequalities are popular estimation tools and they play
an important role in probability theory. The following theorem provides an inequality
which can be used for estimating tail probabilities of the type P({|X| > c}), c ∈ R.

Theorem 49 Let X be a random variable and let g : [0, ∞) → R be a non-negative and
non-decreasing function such that the expected value of g(|X|) is finite. Then, for any
c > 0 for which g(c) > 0,

P({|X| ≥ c}) ≤ E(g(|X|))/g(c).

Proof We provide the proof for the case where X is of absolutely continuous type. The
proof for the discrete type follows in a similar fashion with integral signs replaced by
summation signs. Fix c > 0 such that g(c) > 0 and define A = {x ∈ R : |x| ≥ c}. Then

E(g(|X|)) = ∫_{−∞}^{∞} g(|x|) fX(x) dx
  ≥ ∫_{−∞}^{∞} g(|x|) I_A(x) fX(x) dx   (since g(|x|) ≥ g(|x|) I_A(x), ∀ x ∈ R)
  ≥ g(c) ∫_{−∞}^{∞} I_A(x) fX(x) dx   (since g(|x|) I_A(x) ≥ g(c) I_A(x), ∀ x ∈ R, as g is non-decreasing)
  = g(c) P({X ∈ A}) = g(c) P({|X| ≥ c})

⇒ P({|X| ≥ c}) ≤ E(g(|X|))/g(c).


As a consequence of the above theorem we have the following corollary.

Corollary 50 Let X be a random variable.
(i) (Markov Inequality) Suppose that E(|X|^r) < ∞ for some r > 0. Then, for any c > 0,

P({|X| ≥ c}) ≤ E(|X|^r)/c^r.

(ii) (Chebyshev Inequality) Suppose that X has finite first two moments. If µ = E(X)
and σ^2 = Var(X) (σ ≥ 0) then, for any k > 0,

P({|X − µ| ≥ k}) ≤ σ^2/k^2.

Proof (i) Fix c > 0 and r > 0 and let g(x) = x^r, x ≥ 0. Clearly g is a non-negative and
non-decreasing function. Using Theorem 49 we get

P({|X| ≥ c}) ≤ E(|X|^r)/c^r.

(ii) Using (i) with r = 2 and X replaced by X − µ, we get

P({|X − µ| ≥ k}) ≤ E(|X − µ|^2)/k^2 = σ^2/k^2.

Example 51 Let us revisit the problem of estimating P({Z > 2}), given by (48). Using
Example 46 (iii), we have µ = E(Z) = 0 and σ^2 = Var(Z) = 1. Moreover, using Example
46 (ii), we have Z =d −Z. Consequently P({Z > 2}) = P({−Z > 2}) = P({Z < −2}),
i.e., P({Z > 2}) = P({|Z| > 2})/2 = P({|Z| ≥ 2})/2. Using the Chebyshev inequality
we have

P({|Z| ≥ 2}) ≤ 1/4 = 0.25,

and therefore P({Z > 2}) ≤ 0.125. The exact value of P({Z > 2}), obtained using
numerical integration, is 0.0228. ♠
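The comparison in Example 51 is easy to reproduce (Python, illustrative only): the Chebyshev bound 0.125 against a numerical evaluation of the integral in (48) by the midpoint rule.

```python
import math

# Chebyshev: P(|Z| >= 2) <= Var(Z)/2^2 = 1/4, and by symmetry P(Z > 2) <= 1/8.
chebyshev_bound = 1.0 / 8.0

# Numerical value of P(Z > 2): integrate (1/sqrt(2*pi)) e^{-z^2/2} over [2, 10] (tail beyond 10 is negligible).
n, a, b = 200_000, 2.0, 10.0
dz = (b - a) / n
total = sum(math.exp(-(a + (i + 0.5) * dz) ** 2 / 2.0) for i in range(n)) * dz / math.sqrt(2 * math.pi)
print(chebyshev_bound, round(total, 4))   # 0.125 vs about 0.0228
```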

The following example illustrates that the bounds provided in Theorem 49 and Corollary
50 are tight, i.e., the upper bounds provided therein may be attained.

Example 52 Let X be a random variable with p.m.f.

fX(x) = 1/8, if x ∈ {−1, 1}; 3/4, if x = 0; 0, otherwise.

Clearly E(X^2) = 1/4 and, therefore, using the Markov inequality (with r = 2 and c = 1)
we have

P({|X| ≥ 1}) ≤ E(X^2) = 1/4.

The exact probability is

P({|X| ≥ 1}) = P({X ∈ {−1, 1}}) = 1/4.

It follows that the upper bounds provided in Theorem 49 and Corollary 50 may be attained. ♠

4 Descriptive Measures of Probability Distributions


Let X be a random variable defined on a probability space (Ω, A, P ), associated with
a random experiment E. Let FX and fX denote, respectively, the distribution function
and the p.d.f./p.m.f. of X. The probability distribution (i.e., the distribution func-
tion/p.d.f./p.m.f.) of X describes the manner in which the random variable X takes
values in different Borel sets. It may be desirable to have a set of numerical measures
that provide a summary of the prominent features of the probability distribution of X.

We call these measures descriptive measures. Four prominently used descriptive mea-
sures are: (i) Measures of central tendency (or location), also referred to as averages; (ii)
measures of dispersion; (iii) measures of skewness, and (iv) measures of kurtosis.

I. Measures of Central Tendency


A measure of central tendency or location (also called an average) gives us the idea
about the central value of the probability distribution around which values of the random
variable are clustered. Three commonly used measures of central tendency are mean,
median and mode.
I (a) Mean. Recall (Definition 28 (i)) that the mean of (probability distribution) of
random variable X is given by µ′1 = E(X). We have seen that the mean of a probability
distribution gives us the idea about the average observed value of X in the long run
(i.e., the average of observed values of X when the random experiment is repeated a
large number of times). Mean seems to be the best suited average if the distribution
is symmetric about a point µ (i.e., X − µ =d µ − X, in which case µ = E(X)), values
in the neighborhood of µ occur with high probabilities, and as we move away from µ
in either direction fX decreases. Because of its simplicity mean is the most commonly
used average (especially for symmetric or nearly symmetric distributions). Some of the
demerits of this measure are that in some situations this may not be defined (Examples
23, 25) and that it is very sensitive to presence of a few extreme values of X which
are different from other values of X (even though they may occur with small positive
probabilities). So this measure should be used with caution if probability distribution
assigns positive probabilities to a few Borel sets having some extreme values.
I (b) Median. A real number m satisfying

FX(m−) ≤ 1/2 ≤ FX(m), i.e., P({X < m}) ≤ 1/2 ≤ P({X ≤ m})

is called the median (of the probability distribution of) X. Clearly if m is the median
of a probability distribution then, in the long run (i.e., when the random experiment
E is repeated a large number of times), the values of X on either side of m in SX are
observed with the same frequency. Thus the median of a probability distribution, in some
sense, divides SX into two equal parts each having the same probability of occurrence.
It is evident that if X is of absolutely continuous type then the median m is given by
FX (m) = 1/2. For some distributions (especially for distributions of discrete type random
variables) it may happen that {x ∈ R : FX (x) = 1/2} = [a, b), for some −∞ < a < b < ∞,

so that the median is not unique. In that case P({X = x}) = 0, ∀ x ∈ (a, b), and thus
we take the median to be m = a = inf{x ∈ R : FX (x) ≥ 1/2}. For random variables
having a symmetric probability distribution it is easy to verify that the mean and the
median coincide (Exercise 23). Unlike the mean, the median of a probability distribution
is always defined. Moreover the median is not affected by a few extreme values as it
takes into account only the probabilities with which different values occur and not their
numerical values. As a measure of central tendency the median is preferred over the mean
if the distribution is asymmetric and a few extreme observations are assigned positive
probabilities. However the fact that the median does not at all take into account the
numerical values of X is one of its demerits. Another disadvantage with median is that
for many probability distributions it is not easy to evaluate.
I (c) Mode. Roughly speaking the mode m0 of a probability distribution is the value
that occurs with highest probability and is defined by fX (m0 ) = sup{fX (x) : x ∈ SX }.
Clearly if m0 is the mode of a probability distribution of X then, in the long run, either m0
or a value in the neighborhood of m0 is observed with maximum frequency. The mode is easy
to understand and easy to calculate; normally it can be found by inspection. Note
that a probability distribution may have more than one mode, and the modes may be far apart.
Moreover the mode takes into account neither the numerical values of X nor the probabilities
associated with all the values of X. These are crucial deficiencies, which make the mode less
preferable than the mean and the median. A probability distribution with one (two/three)
mode(s) is called a unimodal (bimodal/trimodal) distribution. A distribution with multiple
modes is called a multimodal distribution.

II. Measures of Dispersion


Measures of central tendency give us the idea about the location of only central part of the
distribution. Other measures are often needed to describe a probability distribution. The
values assumed by a random variable X usually differ from each other. The usefulness of
mean or median as an average is very much dependent on the variability (or dispersion)
of values of X around mean or median. A probability distribution (or the corresponding
random variable X) is said to have a high dispersion if its support contains many val-
ues that are significantly higher or lower than the mean or median value. Some of the
commonly used measures of dispersion are standard deviation, mean deviation, quartile
deviation (or semi-inter-quartile range) and coefficient of variation.
II (a) Standard Deviation. Recall (Definition 28 (v)) that the variance (of the prob-
ability distribution) of a random variable X is defined by σ 2 = E((X − µ)2 ), where

µ = E(X) is the mean (of the probability distribution) of X. The standard deviation (of
the probability distribution) of X is defined by σ = √(µ2) = √(E((X − µ)^2)). Clearly the
variance and the standard deviation give us the idea about the average spread of values
of X around the mean µ. However, unlike the variance, the unit of measurement of stan-
dard deviation is the same as that of X. Because of its simplicity and intuitive appeal,
standard deviation is the most widely used measure of dispersion. Some of the demerits
of standard deviation are that in many situations it may not be defined (distributions for
which second moment is not finite) and that it is sensitive to presence of a few extreme
values of X which are different from other values. A justification for having the mean µ
in place of the median or any other average in the definition of σ = √(E((X − µ)^2)) is that
√(E((X − µ)^2)) ≤ √(E((X − c)^2)), ∀c ∈ R (Exercise 24 (a)).
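The inequality above can be checked numerically; the sketch below (using a hypothetical p.m.f.) evaluates E((X − c)^2) over a grid of c values and confirms that the minimum is attained at c = µ.

    import numpy as np

    # Hypothetical p.m.f.: P(X=0)=0.5, P(X=1)=0.3, P(X=4)=0.2, so mu = 1.1.
    x = np.array([0.0, 1.0, 4.0])
    p = np.array([0.5, 0.3, 0.2])
    mu = float(np.sum(x * p))

    c_grid = np.linspace(-2.0, 5.0, 701)                         # candidate centres c
    mse = np.array([np.sum(p * (x - c) ** 2) for c in c_grid])   # E((X - c)^2) for each c

    print("mu                       :", mu)
    print("argmin of E((X - c)^2)   :", c_grid[np.argmin(mse)])  # equals mu (up to grid spacing)
    print("sd = sqrt(E((X - mu)^2)) :", float(np.sqrt(np.sum(p * (x - mu) ** 2))))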
II (b) Mean Deviation. Let A be a suitable average. The mean deviation (of probability
distribution) of X around average A is defined by MD(A) = E(|X − A|). Among various
mean deviations, the mean deviation about the median m is usually preferred over the
others. A reason for this preference is the fact that, for any random variable X, MD(m) =
E(|X − m|) ≤ E(|X − c|) = MD(c), ∀ c ∈ R (cf. Exercise 24 (b)). Since a natural
distance between X and m is |X − m|, as a measure of dispersion, the mean deviation
about median seems to be more appealing than the standard deviation. Although the
mean deviation about median (or mean) has more intuitive appeal than the standard
deviation, in most situations, they are not easy to evaluate. Some of the other demerits
of mean deviations are that in many situations they may not be defined and that they are
sensitive to presence of a few extreme values of X which are different from other values.
II (c) Quartile Deviation. A common drawback with the standard deviation and
mean deviations, as measures of dispersion, is that they are sensitive to presence of a
few extreme values of X. Quartile deviation measures the spread in the middle half of
the distribution and is therefore not influenced by extreme values. Let q1 and q3 be real
numbers such that

FX (q1−) ≤ 1/4 ≤ FX (q1) and FX (q3−) ≤ 3/4 ≤ FX (q3),
i.e., P ({X < q1}) ≤ 1/4 ≤ P ({X ≤ q1}) and P ({X < q3}) ≤ 3/4 ≤ P ({X ≤ q3}).

The quantities q1 and q3 are called, respectively, the lower and upper quartiles of the
probability distribution of random variable X. Clearly if q1 , m and q3 are respectively
the lower quartile, the median and the upper quartile of a probability distribution then

they divide the probability distribution in four parts so that, in the long run (i.e., when
the random experiment E is repeated a large number of times), twenty five percent of the
observed values of X are expected to be less than q1 , fifty percent of the observed values of
X are expected to be less than m and seventy five percent of the observed values of X are
expected to be less than q3 . The quantity IQR = q3 −q1 is called the inter-quartile range of
the probability distribution of X and the quantity QD = (q3 − q1 )/2 is called the quartile
deviation or the semi-inter-quartile range of the probability distribution of X. It is evident
that if X is of absolutely continuous type then q1 and q3 are given by FX (q1 ) = 1/4 and
FX (q3 ) = 3/4. For some distributions (especially for distributions of discrete type random
variables) it may happen that {x ∈ R : FX (x) = 1/4} = [a, b) and/or {x ∈ R : FX (x) =
3/4} = [c, d), for some −∞ < a < b < ∞ and −∞ < c < d < ∞, so that q1 and/or q3
are not uniquely defined. In that case P ({X = x}) = 0, ∀x ∈ (a, b) (respectively ∀x ∈ (c, d)) and thus we
take q1 = a = inf{x ∈ R : FX (x) ≥ 1/4} and/or q3 = c = inf{x ∈ R : FX (x) ≥ 3/4}.
For random variables having a symmetric probability distribution it is easy to verify that
m = (q1 + q3)/2 (cf. Exercise 23). Although, unlike the standard deviation and the mean
deviation, the quartile deviation is not sensitive to the presence of a few extreme values of X,
a major drawback of the quartile deviation is that it ignores the tails of the probability
distribution (which constitute 50% of the probability distribution). Note that the quartile
deviation depends on the units of measurement of random variable X and thus it may
not be an appropriate measure for comparing dispersions of two probability distributions.
For comparing dispersion of two different probability distributions a normalized measure
such as

CQD = ((q3 − q1)/2) / ((q3 + q1)/2) = (q3 − q1)/(q3 + q1)
seems to be more appropriate. The quantity CQD is called the coefficient of quartile
deviation of the probability distribution of X. Clearly the coefficient of quartile deviation
is independent of units and thus it can be used to compare dispersions of two different
probability distributions.
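The quartiles, and hence IQR, QD and CQD, can be computed with the same generalized-inverse convention used for the median, namely qp = inf{x : FX (x) ≥ p}. A minimal Python sketch for a discrete distribution with a hypothetical p.m.f.:

    import numpy as np

    def quantile_from_pmf(support, pmf, p):
        """Return inf{x : F_X(x) >= p} for a discrete distribution."""
        cdf = np.cumsum(pmf)
        return support[np.searchsorted(cdf, p)]

    # Hypothetical p.m.f. on {1, ..., 6} (probabilities chosen to be exact in binary).
    support = np.arange(1, 7)
    pmf = np.array([0.125, 0.125, 0.25, 0.25, 0.125, 0.125])

    q1 = quantile_from_pmf(support, pmf, 0.25)   # lower quartile
    q3 = quantile_from_pmf(support, pmf, 0.75)   # upper quartile
    iqr = q3 - q1                                # inter-quartile range
    qd = iqr / 2                                 # quartile deviation
    cqd = (q3 - q1) / (q3 + q1)                  # coefficient of quartile deviation
    print(q1, q3, iqr, qd, cqd)                  # 2 4 2 1.0 0.333...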
II (d) Coefficient of Variation. Like quartile deviation, standard deviation σ also
depends on the units of measurement of random variable X and thus it is not an appro-
priate measure for comparing dispersions of two different probability distributions. Let
µ and σ, respectively, be the mean and the standard deviation of the distribution of X.
Suppose that µ ≠ 0. The coefficient of variation of the probability distribution of X is
defined by
CV = σ/µ.

Clearly the coefficient of variation measures the variation per unit of mean and is inde-
pendent of units. Therefore it seems to be an appropriate measure to compare dispersion
of two different probability distributions. A disadvantage with the coefficient of variation
is that when the mean µ is close to zero it is very sensitive to small changes in the mean.

III. Measures of Skewness


Skewness of a probability distribution is a measure of asymmetry (lack of symmetry).
Recall that the probability distribution of random variable X is said to be symmetric
about a point µ if X − µ =ᵈ µ − X (i.e., X − µ and µ − X have the same distribution). In that case µ = E(X) (provided it exists) and
fX (µ + x) = fX (µ − x), ∀x ∈ R. Evidently, for symmetric distributions, the shape
of the p.d.f/p.m.f on the left of µ is the mirror image of that on the right side of µ.
It can be shown that, for symmetric distributions, the mean and the median coincide
(Exercise 23). We say that a probability distribution is positively skewed if the tail on
the right side of the p.d.f./p.m.f. is longer than that on the left side of the p.d.f./p.m.f.
and bulk of the values lie on the left side of the mean. Clearly a positively skewed
distribution indicates presence of a few high values of X which pull up the value of the
mean resulting in mean larger than the median and the mode. For unimodal positively
skewed distribution we normally have Mode < Median < Mean. Similarly we say that a
probability distribution is negatively skewed if the tail on the left side of the p.d.f./p.m.f.
is longer than that on the right side of the p.d.f./p.m.f. and bulk of the values lie on
the right side of the mean. Clearly a negatively skewed distribution indicates presence of
a few low values of X which pull down the value of the mean resulting in mean smaller
than the median and the mode. For unimodal negatively skewed distribution we normally
have Mean < Median < Mode. Let µ and σ, respectively, be the mean and the standard
deviation of X and let Z = (X −µ)/σ be the standardized variable (independent of units).
A measure of skewness of the probability distribution of X is defined by

β1 = E(Z^3) = E((X − µ)^3)/σ^3 = µ3/µ2^{3/2}.

The quantity β1 is simply called the coefficient of skewness. Clearly for symmetric dis-
tributions β1 = 0 (cf. Theorem 44 (vi)). However the converse may not be true, i.e.,
there are examples of skewed probability distributions for which β1 = 0. A large positive
value of β1 indicates that the distribution is positively skewed, and a large negative value
of β1 indicates that the distribution is negatively skewed. A measure of skewness can also be
based on quartiles. Let q1 , m, q3 and µ denote respectively the lower quartile, the median,

the upper quartile and the mean of the probability distribution of X. We know that
for random variables having a symmetric probability distribution µ = m = (q1 + q3 )/2,
i.e., q3 − m = m − q1 . For positively (negatively) skewed distribution we will have
(q3 − m) > (<) (m − q1 ). Thus one may also define a measure of skewness based on
(q3 − m) − (m − q1 ) = q3 − 2m + q1 . To make this quantity independent of units one may
consider
β2 = (q3 − 2m + q1)/(q3 − q1)
as a measure of skewness. The quantity β2 is called the Yule coefficient of skewness.
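Both measures are straightforward to compute from data. The sketch below (an exponential sample is a hypothetical stand-in for a positively skewed distribution) estimates β1 from central moments and β2 from sample quartiles; both come out positive, as expected.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=2.0, size=200_000)   # hypothetical positively skewed data

    mu, sigma = x.mean(), x.std()
    beta1 = np.mean((x - mu) ** 3) / sigma ** 3    # moment coefficient of skewness

    q1, m, q3 = np.quantile(x, [0.25, 0.5, 0.75])
    beta2 = (q3 - 2 * m + q1) / (q3 - q1)          # Yule coefficient of skewness

    print("beta1 ~", beta1)   # about 2 for an exponential distribution
    print("beta2 ~", beta2)   # about ln(4/3)/ln 3, roughly 0.26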

IV. Measures of Kurtosis


For real constants µ ∈ R and σ > 0, let Yµ,σ be a random variable having the p.d.f.

fYµ,σ(x) = (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)}, −∞ < x < ∞.

We have seen (cf. Example 46 (iii)) that µ and σ 2 are respectively the mean and the
variance of the distribution of Yµ,σ . We call the probability distribution corresponding to
p.d.f. fYµ,σ as the normal distribution with mean µ and variance σ 2 (denoted by N (µ, σ 2 )).
We know that N (µ, σ 2 ) distribution is symmetric about µ (cf. Example 46 (ii)). Also it
is easy to verify that N (µ, σ 2 ) distribution is unimodal with µ as the common value of
mean, median and mode. Kurtosis of the probability distribution of X is a measure of
peakedness and thickness of tails of p.d.f. of X relative to the peakedness and thickness
of tails of the p.d.f. of normal distribution. A distribution is said to have higher (lower)
kurtosis than the normal distribution if its p.d.f., in comparison with the p.d.f. of a
normal distribution, has a sharper (rounded) peak and longer, fatter (shorter, thinner)
tails. Let µ and σ, respectively, be the mean and the standard deviation of distribution
of X and let Z = (X − µ)/σ be the standardized variable. A measure of kurtosis of the
probability distribution of X is defined by

γ1 = E(Z^4) = E((X − µ)^4)/σ^4 = µ4/µ2^2.

The quantity γ1 is simply called the kurtosis of the probability distribution of X. It is


easy to show that, for any values of µ ∈ R and σ > 0, the kurtosis of the N (µ, σ 2 ) distribution
is γ1 = 3 (cf. Exercise 35). The quantity

γ2 = γ1 − 3

is called the excess kurtosis of the distribution of X. It is clear that for normal distributions
the excess kurtosis is zero. Distributions with zero excess kurtosis are called mesokurtic.
A distribution with positive (negative) excess kurtosis is called leptokurtic (platykurtic).
Clearly a leptokurtic (platykurtic) distribution has a sharper (flatter) peak and longer,
fatter (shorter, thinner) tails.
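A quick numerical check, with hypothetical simulated samples: a normal sample should give γ2 ≈ 0, a Laplace (double exponential) sample γ2 > 0, and a uniform sample γ2 < 0.

    import numpy as np

    rng = np.random.default_rng(2)

    def excess_kurtosis(sample):
        """Estimate gamma_2 = E(((X - mu)/sigma)^4) - 3 from a sample."""
        z = (sample - sample.mean()) / sample.std()
        return float(np.mean(z ** 4) - 3.0)

    n = 500_000
    print("normal :", excess_kurtosis(rng.normal(size=n)))    # ~ 0    (mesokurtic)
    print("Laplace:", excess_kurtosis(rng.laplace(size=n)))   # ~ 3    (leptokurtic)
    print("uniform:", excess_kurtosis(rng.uniform(size=n)))   # ~ -1.2 (platykurtic)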

Example 53 For α ∈ [0, 1], let Xα be a random variable having the p.d.f.
fα(x) = α e^x,          if x < 0;
      = (1 − α) e^{−x}, if x ≥ 0.

(i) Show that, for a positive integer r,


∫_0^∞ e^{−x} x^{r−1} dx = (r − 1)!.

Hence find µ′r (α) = E(Xαr ), r ∈ {1, 2, . . .};


(ii) For p ∈ (0, 1), find ξp ≡ ξp (α) such that Fα (ξp ) = p, where Fα is the distribution
function of Xα . The quantity ξp is called the p-th quantile of the distribution of Xα ;
(iii) Find the lower quartile q1 (α), the median m(α) and the upper quartile q3 (α) of the
distribution of Xα ;
(iv) Find the mode m0 (α) of the distribution of Xα ;
(v) Find the standard deviation σ(α), the mean deviation about median MD(m(α)), the
inter-quartile range IQR(α), the quartile deviation (or semi-inter-quartile range) QD(α),
the coefficient of quartile deviation CQD(α) and the coefficient of variation CV(α) of the
distribution of Xα ;
(vi) Find the coefficient of skewness β1 (α) and the Yule coefficient of skewness β2 (α) of the
distribution of Xα. According to the value of α, classify the distribution of Xα as symmetric,
positively skewed, or negatively skewed;
(vii) Find the excess kurtosis γ2 (α) of the distribution of Xα and hence comment on the
kurtosis of the distribution of Xα .

Solution (i) For r ∈ {1, 2, . . .}, let


Ir = ∫_0^∞ e^{−x} x^{r−1} dx,

so that I1 = 1. Performing integration by parts it is straightforward to see that Ir =

(r − 1)Ir−1 , r ∈ {2, 3, . . .}. On successively using this relationship we get
Ir = ∫_0^∞ e^{−x} x^{r−1} dx = (r − 1)!,  r ∈ {1, 2, . . .}.

Therefore, for a positive integer r,

µ′r(α) = E(Xα^r)
       = ∫_{−∞}^0 α x^r e^x dx + ∫_0^∞ (1 − α) x^r e^{−x} dx
       = ((−1)^r α + 1 − α) ∫_0^∞ x^r e^{−x} dx
       = ((−1)^r α + 1 − α) r!
       = (1 − 2α) r!, if r ∈ {1, 3, 5, . . .};   = r!, if r ∈ {2, 4, 6, . . .}.
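A quick numerical sanity check of this moment formula, integrating the two branches of fα separately (the value of α below is a hypothetical choice):

    import numpy as np
    from math import factorial
    from scipy.integrate import quad

    alpha = 0.3   # hypothetical choice of alpha in [0, 1]

    for r in range(1, 5):
        left, _ = quad(lambda x: x**r * alpha * np.exp(x), -np.inf, 0)          # x < 0 branch
        right, _ = quad(lambda x: x**r * (1 - alpha) * np.exp(-x), 0, np.inf)   # x >= 0 branch
        numeric = left + right
        formula = (1 - 2 * alpha) * factorial(r) if r % 2 == 1 else factorial(r)
        print(r, round(numeric, 6), formula)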

(ii) Let p ∈ (0, 1) and let ξp be such that Fα (ξp ) = p. Note that
Fα(0) = α ∫_{−∞}^0 e^x dx = α.

Thus, for evaluation of ξp , the following two cases arise.


Case I. 0 ≤ α < p
We have p = Fα (ξp ), i.e.,
p = ∫_{−∞}^0 α e^x dx + ∫_0^{ξp} (1 − α) e^{−x} dx
  = 1 − (1 − α) e^{−ξp},

i.e., ξp = ln((1 − α)/(1 − p)).


Case II. α ≥ p
In this case we have
p = ∫_{−∞}^{ξp} α e^x dx
  = α e^{ξp},

i.e., ξp = − ln(α/p). Combining the two cases we get
ξp = ln((1 − α)/(1 − p)), if 0 ≤ α < p;
   = − ln(α/p),           if p ≤ α ≤ 1.
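The quantile function just obtained can be coded directly and checked against the distribution function Fα; a small Python sketch (the α and p values are hypothetical test points):

    import numpy as np

    def F(x, a):
        """Distribution function of X_alpha."""
        return a * np.exp(x) if x < 0 else 1 - (1 - a) * np.exp(-x)

    def xi(p, a):
        """p-th quantile of X_alpha, from the two cases above."""
        return np.log((1 - a) / (1 - p)) if a < p else -np.log(a / p)

    for a in (0.2, 0.5, 0.8):            # hypothetical values of alpha
        for p in (0.25, 0.5, 0.75):      # hypothetical values of p
            assert abs(F(xi(p, a), a) - p) < 1e-12
    print("F_alpha(xi_p) = p holds on the test grid")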

(iii) We have

q1(α) = ξ_{1/4} = ln(4(1 − α)/3), if 0 ≤ α < 1/4;   = − ln(4α),   if 1/4 ≤ α ≤ 1,

m(α)  = ξ_{1/2} = ln(2(1 − α)),   if 0 ≤ α < 1/2;   = − ln(2α),   if 1/2 ≤ α ≤ 1,

and

q3(α) = ξ_{3/4} = ln(4(1 − α)),   if 0 ≤ α < 3/4;   = − ln(4α/3), if 3/4 ≤ α ≤ 1.
(iv) The p.d.f. fα is increasing on (−∞, 0) and decreasing on [0, ∞), with lim_{x→0−} fα(x) = α and fα(0) = 1 − α, so that sup{fα(x) : −∞ < x < ∞} = max{α, 1 − α}. For 0 ≤ α ≤ 1/2 this supremum is attained at the origin and the mode is m0(α) = 0. For 1/2 < α ≤ 1 the supremum α is not attained (fα(x) < α for every x), so, strictly speaking, the mode does not exist, although fα is maximized at points arbitrarily close to the origin from the left.

(v) Using (i) we have µ′1 (α) = E(Xα ) = 1 − 2α and µ′2 (α) = E(Xα2 ) = 2. It follows that
the standard deviation of the distribution of Xα is
σ(α) = √(Var(Xα)) = √(µ′2(α) − (µ′1(α))^2) = √(1 + 4α − 4α^2).

Note that, for 0 ≤ α < 1/2, m(α) = ln(2(1−α)) ≥ 0 and, for α > 1/2, m(α) = − ln(2α) <
0. Thus for the evaluation of the mean deviation about the median the following cases
arise:
Case I. 0 ≤ α < 1/2 (so that m(α) ≥ 0)

MD(m(α)) = E(|Xα − m(α)|)
         = α ∫_{−∞}^0 (m(α) − x) e^x dx + (1 − α) ∫_0^{m(α)} (m(α) − x) e^{−x} dx
           + (1 − α) ∫_{m(α)}^∞ (x − m(α)) e^{−x} dx
         = m(α) + 2α
         = ln(2(1 − α)) + 2α.

Case II. 1/2 ≤ α ≤ 1 (so that m(α) ≤ 0)

MD(m(α)) = E(|Xα − m(α)|)
         = α ∫_{−∞}^{m(α)} (m(α) − x) e^x dx + α ∫_{m(α)}^0 (x − m(α)) e^x dx
           + (1 − α) ∫_0^∞ (x − m(α)) e^{−x} dx
         = 2(1 − α) − m(α)
         = ln(2α) + 2(1 − α).

Combining the two cases we get


MD(m(α)) = ln(2(1 − α)) + 2α, if 0 ≤ α < 1/2;
         = ln(2α) + 2(1 − α), if 1/2 ≤ α ≤ 1.

Using (iii) the inter-quartile range of the distribution of Xα is

IQR(α) = q3(α) − q1(α)
       = ln 3,            if 0 ≤ α < 1/4;
       = ln(16α(1 − α)),  if 1/4 ≤ α < 3/4;
       = ln 3,            if 3/4 ≤ α ≤ 1.

The quartile deviation of the distribution of Xα is


 √
 ln 3, if 0 ≤ α < 14
q3 (α) − q1 (α)  p
QD(α) = = ln(4 α(1 − α)), if 14 ≤ α < 34 .
2  √
ln 3, if 34 ≤ α ≤ 1

The coefficient of quartile deviation of the distribution of Xα is



CQD(α) = (q3(α) − q1(α))/(q3(α) + q1(α))
       = ln 3 / ln(16(1 − α)^2/3),        if 0 ≤ α < 1/4;
       = ln(16α(1 − α)) / ln((1 − α)/α),  if 1/4 ≤ α < 3/4;
       = − ln 3 / ln(16α^2/3),            if 3/4 ≤ α ≤ 1.

For α ≠ 1/2, the coefficient of variation of the distribution of Xα is

CV(α) = σ(α)/µ′1(α) = √(1 + 4α − 4α^2)/(1 − 2α).

(vi) We have

µ3 (α) = E((Xα − µ′1 )3 ) = µ′3 (α) − 3µ′1 (α)µ′2 (α) + 2(µ′1 (α))3 = 2(1 − 2α)3 .

Therefore
β1(α) = µ3(α)/σ^3(α) = 2(1 − 2α)^3/(1 + 4α − 4α^2)^{3/2}.
Using (iii) the Yule coefficient of skewness is

β2(α) = (q3(α) − 2m(α) + q1(α))/(q3(α) − q1(α))
      = ln(4/3)/ln 3,                      if 0 ≤ α < 1/4;
      = − ln(4α(1 − α))/ln(16α(1 − α)),    if 1/4 ≤ α < 1/2;
      = ln(4α(1 − α))/ln(16α(1 − α)),      if 1/2 ≤ α < 3/4;
      = − ln(4/3)/ln 3,                    if 3/4 ≤ α ≤ 1.

Clearly, for 0 ≤ α < 1/2, βi (α) > 0, i = 1, 2 and, for 1/2 < α ≤ 1, βi (α) < 0, i = 1, 2.
It follows that the probability distribution of Xα is positively skewed if 0 ≤ α < 1/2 and
negatively skewed if 1/2 < α ≤ 1. For α = 1/2, fα (x) = fα (−x), ∀x ∈ R. Thus, for
α = 1/2, the probability distribution of Xα is symmetric about zero.
(vii) We have

µ4 (α) = E((Xα − µ′1 )4 )


= µ′4 (α) − 4µ′1 (α)µ′3 (α) + 6(µ′1 (α))2 µ′2 (α) − 3(µ′1 (α))4
= 24 − 12(1 − 2α)2 − 3(1 − 2α)4 .

Therefore
γ1(α) = µ4(α)/(µ2(α))^2 = (24 − 12(1 − 2α)^2 − 3(1 − 2α)^4)/(2 − (1 − 2α)^2)^2
and
γ2(α) = γ1(α) − 3 = (12 − 6(1 − 2α)^4)/(2 − (1 − 2α)^2)^2.
Clearly, for any α ∈ [0, 1], γ2 (α) > 0. It follows that, for any value of α ∈ [0, 1], the
distribution of Xα is leptokurtic. ♠
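Since Xα is easy to simulate (with probability α take the negative of a standard exponential variable, otherwise take a standard exponential variable), the formulas for β1(α) and γ2(α) can be checked by a Monte Carlo experiment; the α and sample size below are hypothetical choices.

    import numpy as np

    rng = np.random.default_rng(3)
    alpha, n = 0.2, 1_000_000                            # hypothetical alpha and sample size

    signs = np.where(rng.random(n) < alpha, -1.0, 1.0)   # -1 with prob. alpha, +1 with prob. 1 - alpha
    x = signs * rng.exponential(size=n)                  # a sample from f_alpha

    t = 1 - 2 * alpha
    beta1_formula = 2 * t**3 / (1 + 4 * alpha - 4 * alpha**2) ** 1.5
    gamma2_formula = (12 - 6 * t**4) / (2 - t**2) ** 2

    z = (x - x.mean()) / x.std()
    print("beta1 : simulated", np.mean(z**3), "  formula", beta1_formula)
    print("gamma2: simulated", np.mean(z**4) - 3, "  formula", gamma2_formula)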

Exercises
1. Let (Ω, A, P ) be a probability space and let X : Ω → R, and h : RX → R be given

functions, where RX = {X(ω) : ω ∈ Ω}. Assuming that there exists a set S ⊆ R
such that S ∉ B, using appropriate examples, show that
(a) X may not be a random variable;
(b) if X is a random variable then h(X) may not be a random variable.

2. Let the random variable X have the p.m.f.

   fX(x) = (n choose x) p^x (1 − p)^{n−x}, if x ∈ {0, 1, . . . , n};
         = 0,                              otherwise;
here n is a positive integer and p ∈ (0, 1). Find the p.m.f.s of random variables

Y1 = n − X, Y2 = X 2 and Y3 = X.

3. Let X be a random variable with p.m.f.

   fX(x) = e^{−1},            if x = 0;
         = e^{−1}/(2(|x|)!),  if x ∈ {±1, ±2, . . .};
         = 0,                 otherwise.
Find the p.m.f. and distribution function of Y = |X|.

4. Let X be a random variable with

   P ({X = −2}) = 1/21, P ({X = −1}) = 2/21, P ({X = 0}) = 1/7,
   P ({X = 1}) = 4/21,  P ({X = 2}) = 5/21,  P ({X = 3}) = 2/7.
Find the p.m.f. and distribution function of Y = X 2 .

5. Let X be a random variable with p.m.f.

   fX(x) = (1/3)(2/3)^x, if x ∈ {0, 1, 2, . . .};
         = 0,            otherwise.
Find the distribution function of Y = X/(X + 1) and hence determine the p.m.f.
of Y .

6. Let the random variable X have the p.d.f. fX (·). Find the distribution functions
and hence the p.d.f.s (provided they exist) of X + = max(X, 0), X − = max(−X, 0),
Y1 = |X| and Y2 = X 2 in each of the following cases:

(a) fX(x) = (1 + x)/2, if −1 < x < 1;   = 0, otherwise;

(b) fX(x) = 1/3,     if −2 < x < −1;
          = x^2/65,  if −1 < x < 4;
          = 2x/27,   if 4 < x < 5;
          = 0,       otherwise.
7. Let the random variable X have the p.d.f.

   fX(x) = 4x^3, if 0 < x < 1;
         = 0,    otherwise.

   Find the p.d.f. and the distribution function of the random variable Y = −2 ln X^4.

8. Let the random variable X have the p.d.f.

   fX(x) = 1/2, if −2 < x < −1;
         = 1/6, if 0 < x < 3;
         = 0,   otherwise.

Find the p.d.f. of (a) Y1 = |X|, (b) Y2 = X 2 .

9. Let X be a random variable with p.d.f.

   fX(x) = e^{−x}, if x > 0;
         = 0,      otherwise,
and let Y = sin X.


(a) Find the distribution function of Y and hence find its p.d.f.;
(b) Find the p.d.f. of Y directly (i.e., without finding the distribution function).

10. Let X be a random variable with p.d.f.

   fX(x) = 1, if 0 < x < 1;
         = 0, otherwise.
Find the p.d.f.s of the following random variables: (a) Y1 = X; (b) Y2 = X 2 ; (c)
Y3 = 2X + 3; (d) Y4 = − ln X.

11. Let X be a random variable with p.d.f. fX given in Problem 10 and let Y = min(X, 1/2).
(a) Is Y of continuous type?

(b) Examine whether or not Y is of discrete or absolutely continuous type.

12. Let the random variable X have the p.d.f.

   fX(x) = 1/2,       if 0 < x ≤ 1;
         = 1/(2x^2),  if x > 1;
         = 0,         otherwise,
and let Y = 1/X.


(a) Find the distribution function of Y and hence find its p.d.f.;
(b) Find the p.d.f. of Y directly (i.e., without finding the distribution function).

13. Let the random variable X have the p.d.f.

   fX(x) = (1/θ) e^{−x/θ}, if x > 0;
         = 0,              otherwise,
where θ > 0. Let Y = (X − θ)2 .


(a) Find the distribution function of Y and hence find its p.d.f.;
(b) Find the p.d.f. of Y directly (i.e., without finding the distribution function).

14. Let the random variable X have the p.d.f.

fX(x) = (1/√(2π)) e^{−x^2/2}, −∞ < x < ∞.

Find the p.m.f./p.d.f. of Y = g(X), where

   g(x) = −1,  if x < 0;
        = 1/2, if x = 0;
        = 1,   if x > 0.
15. Let the random variable X have the p.d.f.

   fX(x) = (3/8)(x + 1)^2, if −1 < x < 1;
         = 0,              otherwise,
and let Y = 1 − X 2 .
(a) Find the distribution function of Y and hence find its p.d.f.;
(b) Find the p.d.f. of Y directly (i.e., without finding the distribution function).

16. Let the random variable X have the p.d.f.

   fX(x) = 6x(1 − x), if 0 < x < 1;
         = 0,         otherwise,
and let Y = X 2 (3 − 2X).


(a) Find the distribution function of Y and hence find its p.d.f.;
(b) Find the p.d.f. of Y directly (i.e., without finding the distribution function).

17. Let X be a random variable with p.m.f.

   fX(x) = 1/n, if x ∈ {1, 2, . . . , n};
         = 0,   otherwise,
where n (≥ 2) is an integer. Find the mean and variance of X.

18. In three independent tosses of a fair coin let X denote the number of tails appearing.
Let Y = X 2 and Z = 2X 2 + 1. Find the mean and variance of random variables Y
and Z.

19. (a) From a box containing N identical tickets, numbered 1, 2, . . . , N , n (≤ N )


tickets are drawn with replacement. Let X = largest number drawn. Find E(X).
(b) Find the expected number of throws of a fair die required to obtain a 6.

20. (a) Let X be a random variable with p.m.f.

   fX(x) = c/x^{2+r}, if x ∈ {1, 2, . . .};
         = 0,         otherwise,

   where c^{−1} = Σ_{n=1}^∞ n^{−(2+r)} and r ≥ 0 is an integer. For what values of j ∈ {0, 1, 2, . . .}
   is E(X^j) finite?
   (b) Find a distribution for which no moments exist.

21. Let X be a random variable with p.d.f.

   fX(x) = x/2,       if 0 < x ≤ 1;
         = 1/2,       if 1 < x ≤ 2;
         = (3 − x)/2, if 2 < x < 3;
         = 0,         otherwise.
Find the expected value of Y = X 2 − 5X + 3.

22. Let E(|X|β ) < ∞ for some β > 0. Then show that E(|X|α ) < ∞, ∀ α ∈ (0, β].

23. Let X be an absolutely continuous random variable with p.d.f. fX (x) that is sym-
metric about µ (∈ R), i.e., fX (x + µ) = fX (µ − x), ∀ x ∈ (−∞, ∞). Show that
µ is the median of the probability distribution of X and µ = (q1 + q3 )/2, where q1
and q3 are respectively the lower and the upper quartiles of the distribution of X.
Further, if E(X) is finite, then show that E(X) = µ.

24. (a) For any random variable X having the mean µ and finite second moment, show
that E((X − µ)2 ) ≤ E((X − c)2 ), ∀c ∈ R;

(b) If X is an absolutely continuous random variable with median m, then show


that E(|X − m|) ≤ E(|X − c|), ∀ c ∈ (−∞, ∞);

25. (a) Let X be a random variable with finite expectation. Show that limx→−∞ xFX (x) =
limx→∞ [x(1 − FX (x))] = 0, where FX is the distribution function of X.
(b) Let X be a random variable with limx→∞ [xα P (|X| > x)] = 0, for some α > 0.
Show that E(|X|β ) < ∞, ∀ β ∈ (0, α). What about E(|X|α )?

26. (a) Let X be a non-negative random variable (i.e., SX ⊆ [0, ∞)) of absolutely
    continuous type and let h be a real-valued function defined on (0, ∞). Define
    ψ(x) = ∫_0^x h(t) dt, x ≥ 0, and suppose that h(x) ≥ 0, ∀ x ≥ 0. Show that

    E(ψ(X)) = ∫_0^∞ h(y) P (X > y) dy.

    (b) Let α be a positive real number. Under the assumptions of (a), show that

    E(X^α) = α ∫_0^∞ x^{α−1} P (X > x) dx.
(c) Let F (0) = G(0) = 0 and let F (t) ≥ G(t), ∀ t > 0, where F and G are distribu-
tion functions of absolutely continuous type non-negative random variables X and
Y , respectively. Show that E(X k ) ≤ E(Y k ), ∀ k > 0, provided the expectations
exist.
27. Consider a target comprising three concentric circles of radii 1/√3, 1, √3 feet.
Shots within the inner circle earn 4 points, within the next ring 3 points and within

the third ring 2 points. Shots outside the target do not earn any point. Let X
denote the distance (in feet) of the hit from the centre and suppose that X has the
p.d.f.

   fX(x) = 2/(π(1 + x^2)), if x > 0;
         = 0,              otherwise.
Find the expected score in a single shot.

28. (a) Find the moments of the random variable that has the m.g.f. M (t) = (1 − t)−3 ,
t < 1.
(b) Let the random variable X have the m.g.f.

   M(t) = e^{−t}/8 + e^t/4 + e^{2t}/8 + e^{3t}/2.

   Find the distribution function of X and find P (X = 1).
(c) If the m.g.f. of a random variable X is

   M(t) = (e^t − e^{−2t})/(3t), for t ≠ 0,
find the p.d.f. of Y = X 2 .

29. Let X be a random variable with m.g.f. M (t), −h < t < h. Prove that P (X ≥
a) ≤ e−at M (t), 0 < t < h and P (X ≤ a) ≤ e−at M (t), −h < t < 0.

30. (a) Let X be a random variable such that P (X ≤ 0) = 0 and let µ = E(X) be finite.
Show that P (X ≥ 2µ) ≤ 0.5.
(b) If X is a random variable such that E(X) = 3 and E(X 2 ) = 13, then determine
a lower bound for P (−2 < X < 8).

31. Let the random variable X have the p.m.f.

   fX(x) = 1/8, if x ∈ {−1, 1};
         = 6/8, if x = 0;
         = 0,   otherwise.
Using this p.m.f., show that the bound for Chebyshev’s inequality cannot be im-
proved (without additional assumptions).

32. Suppose that one is interested in estimating the proportion p ∈ (0, 1) of females in
a big population. To do so a random sample of size n is taken from the population

and the proportion of females in this sample is taken to be an estimate of p. If one
wants to be 90% sure that the estimate is within 0.1 units of the true proportion,
what should be the sample size?

33. Let m and n be positive integers and let x1 , . . . , xn be positive real numbers. Using
the fact that, for a random variable X, E(X 2 ) ≥ (E(X))2 , show that

(Σ_{i=1}^n x_i^{m+1})^2 ≤ (Σ_{i=1}^n x_i)(Σ_{i=1}^n x_i^{2m+1}).

34. For µ ∈ R and λ > 0, let Xµ,λ be a random variable having the p.d.f.

   fµ,λ(x) = (1/λ) e^{−(x−µ)/λ}, if x ≥ µ;
           = 0,                  otherwise.
(a) Find Cr(µ, λ) = E((Xµ,λ − µ)^r), r ∈ {1, 2, . . .}, and µ′r(µ, λ) = E(Xµ,λ^r), r ∈ {1, 2};

(b) For p ∈ (0, 1), find the p-th quantile ξp ≡ ξp (µ, λ) of the distribution of Xµ,λ
(Fµ,λ (ξp ) = p, where Fµ,λ is the distribution function of Xµ,λ );

(c) Find the lower quartile q1 (µ, λ), the median m(µ, λ) and the upper quartile
q3 (µ, λ) of the distribution of Xµ,λ ;

(d) Find the mode m0(µ, λ) of the distribution of Xµ,λ;

(e) Find the standard deviation σ(µ, λ), the mean deviation about median MD(m(µ, λ)),
the inter-quartile range IQR(µ, λ), the quartile deviation (or semi-inter-quartile
range) QD(µ, λ), the coefficient of quartile deviation CQD(µ, λ) and the coefficient
of variation CV(µ, λ) of the distribution of Xµ,λ ;

(f) Find the coefficient of skewness β1 (µ, λ) and the Yule coefficient of skewness
β2 (µ, λ) of the distribution of Xµ,λ ;

(g) Find the excess kurtosis γ2 (µ, λ) of the distribution of Xµ,λ ;

(h) Based on values of measures of skewness and the kurtosis of the distribution of
Xµ,λ, comment on the shape of fµ,λ.

35. For any values of µ ∈ R and σ > 0, show that the kurtosis of the N (µ, σ 2 ) distribution
is γ1 = 3.

