
Discrete Random Variables and Probabilities

1) The random variable X represents the length of the hypotenuse of a right-angled triangle with sides determined by rolling two 6-sided dice. X can take 21 possible values. 2) The probability mass function of a random variable X that represents whether two cards drawn from a deck are the same color or different colors is given. 3) The probability that 5 people selected at random from a group of 15 people with different blood types have a particular distribution of blood types is calculated to be 0.167.

Uploaded by

Ravi K
Copyright
© © All Rights Reserved
  • Week 2 Practice Assignment
  • Week 1 Practice Assignment Solutions
  • Week 2 Graded Assignment Solution
  • Week 3 Practice Assignment Solution
  • Week 3 Graded Assignment
  • Week 4 Practice Assignment
  • Week 4 Graded Assignment Solutions
  • Week 5 Practice Assignment Solutions
  • Week 5 Graded Assignment Solution
  • Week 6 Practice Assignment Solution
  • Week 6 Graded Assignment Solution
  • Week 7 Practice Assignment Solutions

Statistics for Data Science - 2

Week 2 practice Assignment


Discrete random variable

1. A random variable X is defined as the length of the hypotenuse of the right-angled
triangle whose other two sides are determined by the roll of two 6-sided dice. How many
values does X take? [1 mark]
Solution:
When two dice are rolled, there are 36 outcomes in total:
{(1, 1), (1, 2), ... , (1, 6),
(2, 1), (2, 2), ... , (2, 6),
...
...
(6, 1), (6, 2), ... , (6, 6)}

For an outcome (a, b), X = √(a^2 + b^2). Outcomes such as (1, 2) and (2, 1) give the same
length of the hypotenuse, so only the 6 + 15 = 21 unordered pairs need to be considered.
These 21 pairs give 21 different values of a^2 + b^2, hence X takes 21 values.
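The count can be checked by brute force. The following is a minimal sketch (the variable names are illustrative, not part of the assignment) that enumerates all 36 outcomes and collects the distinct squared hypotenuse lengths:

```python
from math import sqrt

# Squared hypotenuse a^2 + b^2 for every ordered outcome (a, b) of two dice.
squared_lengths = {a * a + b * b for a in range(1, 7) for b in range(1, 7)}

# Distinct squared lengths correspond one-to-one to distinct lengths.
hypotenuse_values = sorted(sqrt(s) for s in squared_lengths)
num_values = len(hypotenuse_values)
print(num_values)
```

Using a set removes both the symmetric duplicates such as (1, 2) and (2, 1) and any accidental coincidences between different pairs.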

2. Two cards are drawn from a well shuffled pack of 52 cards one after other without
replacement. A random variable is defined as:
X = 0 if both cards are of the same color,
X = 1 if the cards are of different colors.

Find the probability mass function of X. [1 mark]

(a)  x       0      1
     fX(x)   1/13   12/13

(b)  x       0      1
     fX(x)   1/2    1/2

(c)  x       0      1
     fX(x)   25/51  26/51

(d)  x       0      1
     fX(x)   12/25  13/25

Solution:
P(X = 0) = P(both cards are of the same color)
= P(first card is any one of the 52 cards) · P(second card has the same color as the first)
= 1 · 25/51 = 25/51

P(X = 1) = P(the cards are of different colors)
= P(first card is any one of the 52 cards) · P(second card has a different color from the first)
= 1 · 26/51 = 26/51

Hence, option (c) is right.

3. In a group of fifteen people, 8 people have blood group type O, 4 people have blood group
type A, and 3 people have blood group type B. If five people are selected randomly from
these fifteen people, then what is the probability that out of these five people 2 people
have blood group type O, 2 have blood group type A and one has blood group type B?
(Answer the question correct up to two decimal places.) [2 marks]

Solution:
Number of ways of selecting five people out of 15 = 15C5 = 3003
Number of ways of selecting 2 people of blood group type O out of 8 people of blood
group type O = 8C2 = 28
Number of ways of selecting 2 people of blood group type A out of 4 people of blood
group type A = 4C2 = 6
Number of ways of selecting 1 person of blood group type B out of 3 people of blood
group type B = 3C1 = 3
Therefore, required probability = (8C2 × 4C2 × 3C1) / 15C5
= (28 × 6 × 3) / 3003 = 504/3003 ≈ 0.168, i.e. 0.17 correct to two decimal places.
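As a quick numerical check of the counting argument, here is a small sketch using Python's standard `math.comb` (the variable names are illustrative):

```python
from math import comb

# Ways to pick 2 of 8 type-O, 2 of 4 type-A and 1 of 3 type-B people.
favourable = comb(8, 2) * comb(4, 2) * comb(3, 1)

# Ways to pick any 5 of the 15 people.
total = comb(15, 5)

probability = favourable / total
print(favourable, total, round(probability, 4))
```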

4. The probability mass function of a discrete random variable X is given as:

x       -2   -1    0    1     2
fX(x)   a    0.2   b    0.1   0.2

Table 2.1.P: PMF of X

If P(X ≤ 1 | X ≥ −1) = 3/4, then find the value of P(X = −2). [2 marks]
Solution:
We know that

Σ_{x∈TX} fX(x) = 1

⇒ a + 0.2 + b + 0.1 + 0.2 = 1
⇒ a + b = 0.5 ...(1)

From the given condition, we have

P(X ≤ 1 | X ≥ −1) = 3/4
⇒ P(X ≤ 1, X ≥ −1) / P(X ≥ −1) = 3/4
⇒ P({−1, 0, 1}) / P({−1, 0, 1, 2}) = 3/4
⇒ (b + 0.3) / (b + 0.5) = 3/4
⇒ 4b + 1.2 = 3b + 1.5
⇒ b = 0.3 ...(2)

From equations (1) and (2), we have

a = 0.2
b = 0.3

P (X = −2) = a
⇒P (X = −2) = 0.2

5. Siberian seagulls migrate to Ganga river to escape harsh winter weather in the months
of October to March. It is seen that the number of Siberian seagulls reaching Ganga
river on one day in January is Poisson distributed with an average of 1000. What is the
probability that 650 seagulls will arrive on a given day of January? [2 marks]
(a) e^(−650) (650)^1000 / 650!
(b) e^(−650) (650)^1000 / 1000!
(c) e^(−1000) (650)^1000 / 650!
(d) e^(−1000) (1000)^650 / 650!
Solution:
Let X be the number of Siberian seagulls reaching the Ganga river on a given day in January.
By the given condition, we have

X ∼ Poisson(1000)

P(X = x) = e^(−λ) λ^x / x!
⇒ P(X = 650) = e^(−1000) (1000)^650 / 650!

Hence, option (d) is correct.

6. Probability mass function of a discrete random variable X is given as:

x -1 0 1 2 3
fX (x) 0.1 0.3 0.2 0.1 0.3

Table 2.2.P: PMF of X

If another random variable Y is defined as Y = X(X − 1), then find the smallest value
of y in the range of Y such that P(Y ≤ y) > 1/2 and P(Y ≥ y) ≤ 1/2. [2 marks]
Solution:
Y is defined as Y = X(X − 1)

At X = −1, Y = −1(−2) = 2
At X = 0, Y = 0(−1) = 0
At X = 1, Y = 1(0) = 0
At X = 2, Y = 2(1) = 2
At X = 3, Y = 3(2) = 6

Therefore, TY = {0, 2, 6}

P (Y = 0) = P (X ∈ {0, 1}) = 0.3 + 0.2 = 0.5


P (Y = 2) = P (X ∈ {−1, 2}) = 0.1 + 0.1 = 0.2
P (Y = 6) = P (X = 3) = 0.3
Now,
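P(Y ≤ 0) = P(Y = 0) = 0.5

This is not greater than 1/2, so the first required condition is not satisfied at y = 0.

P(Y ≤ 2) = P(Y = 0) + P(Y = 2) = 0.5 + 0.2 = 0.7 > 1/2
P(Y ≥ 2) = P(Y = 2) + P(Y = 6) = 0.2 + 0.3 = 0.5 ≤ 1/2

Both the required conditions are satisfied at y = 2, so the required value is 2.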

7. Three friends toss three fair coins to decide who is going to pay for the dinner. The
person getting an outcome different from the other two outcomes will pay for the dinner. If
all three coins result in the same outcome, they will toss the coins again. If X denotes
the number of trials needed to decide who is going to pay, then what is the probability
that X is at most 3? (Answer the question correct up to two decimal places.) [2 marks]
Solution:
Let X be the number of trials needed to decide who is going to pay.
The sample space on tossing three coins is:
{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
P(a trial decides who is going to pay) = P({HHT, HTH, THH, HTT, THT, TTH}) = 6/8 = 3/4
P(a trial does not decide who is going to pay) = P({HHH, TTT}) = 2/8 = 1/4
X takes the values 1, 2, 3, 4, ...
and X ∼ Geometric(3/4).

P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3)
= 3/4 + (1/4)(3/4) + (1/4)^2 (3/4)
= 63/64
≈ 0.98
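The success probability and the geometric sum can be reproduced exactly with fractions. This is only an illustrative sketch; the names `decisive` and `p_at_most_3` are not from the assignment:

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of tossing three fair coins.
outcomes = list(product("HT", repeat=3))

# A trial is decisive when exactly one coin differs from the other two,
# i.e. both faces appear among the three coins.
decisive = [o for o in outcomes if len(set(o)) == 2]
p = Fraction(len(decisive), len(outcomes))

# Geometric distribution: P(X = k) = (1 - p)^(k - 1) * p.
p_at_most_3 = sum((1 - p) ** (k - 1) * p for k in range(1, 4))
print(p, p_at_most_3, float(p_at_most_3))
```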

8. Let X ∼ Uniform({1, 2, 3, ..., n}). If the probability that X is an odd number is 6/11,
then what can be the value of n? [2 marks]

(a) 11 only
(b) 12 only
(c) Any multiple of 11.
(d) Any odd multiple of 11.

Solution:
Since X ∼ Uniform({1, 2, 3, ..., n}), let A be the event that X takes an odd value.
Therefore,
P(A) = (number of outcomes in A) / (number of outcomes in S) ...(1)
where S = {1, 2, 3, ..., n}.
It is given that
P(A) = 6/11 ...(2)
If n is even, exactly half of the values are odd, so P(A) = 1/2 ≠ 6/11.
If n is odd, there are (n + 1)/2 odd values, so by equations (1) and (2),
(n + 1)/(2n) = 6/11 ⇒ 11n + 11 = 12n ⇒ n = 11.
This is possible only for n = 11, so option (a) is correct.

9. The number of customers arriving per day at a certain automobile service facility is
assumed to follow a Poisson distribution with an average of 50 customers arriving each
day. Assume that number of customers on different days are independent. What is the
probability that exactly 40 customers will come for at least 5 days over a 30 days period?
[3 marks]
(a) 1 − Σ_{x=0}^{4} 30Cx (e^(−50) (50)^40 / 40!)^x (1 − e^(−50) (50)^40 / 40!)^(30−x)
(b) Σ_{x=0}^{4} 30Cx (e^(−50) (50)^40 / 40!)^x (1 − e^(−50) (50)^40 / 40!)^(30−x)
(c) 30C5 (e^(−50) (50)^40 / 40!)^5 (1 − e^(−50) (50)^40 / 40!)^25
(d) 30C5 (1 − e^(−50) (50)^40 / 40!)^5 (e^(−50) (50)^40 / 40!)^25

Solution:
Let X be the number of customers arriving per day at the automobile service facility.
X ∼ Poisson(50)
P(X = 40) = e^(−50) (50)^40 / 40!
Let Y be the number of days in the next 30 days on which exactly 40 customers arrive
at the facility.
Then Y ∼ Binomial(30, e^(−50) (50)^40 / 40!)
Now,

P(Y ≥ 5) = 1 − P(Y < 5)
= 1 − Σ_{x=0}^{4} 30Cx (e^(−50) (50)^40 / 40!)^x (1 − e^(−50) (50)^40 / 40!)^(30−x)

Hence, option (a) is correct.

10. A biased coin with the probability of 0.4 of showing head is tossed until it shows either
two consecutive heads or two consecutive tails. If X denotes the number of tosses
required, what is the value of P (X = 5)? [3 marks]

(a) 0.03456
(b) 0.02304
(c) 0.01675
(d) 0.0576

Solution:
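The tossing stops at the fifth toss only if the first four tosses alternate and the fifth
toss repeats the fourth, so

P(X = 5) = P(HTHTT) + P(THTHH)
= (0.4)^2 (0.6)^3 + (0.4)^3 (0.6)^2
= 0.03456 + 0.02304
= 0.0576

Hence, option (d) is correct.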
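The same value can be obtained by enumerating all 32 sequences of five tosses and keeping those whose first repeated pair ends at toss 5. This is a sketch under the problem's assumptions; `stopping_time` is a hypothetical helper name:

```python
from itertools import product

P_HEAD = 0.4  # probability of a head, as given in the problem


def stopping_time(seq):
    """Return the 1-based toss at which two equal consecutive outcomes first appear."""
    for i in range(1, len(seq)):
        if seq[i] == seq[i - 1]:
            return i + 1
    return None


p_x5 = 0.0
for seq in product("HT", repeat=5):
    if stopping_time(seq) == 5:
        prob = 1.0
        for toss in seq:
            prob *= P_HEAD if toss == "H" else 1 - P_HEAD
        p_x5 += prob

print(round(p_x5, 4))
```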

Statistics for Data Science - 2

Week 1 Practice Assignment Solution


Events and probabilities

1. A customer will purchase a shirt with probability 0.5. The customer will purchase a
pant with probability 0.4 and will purchase both a shirt and a pant with probability 0.2.
What is the probability that the customer will purchase neither a shirt nor a pant?
Solution:
Let A be the event that the customer will purchase a shirt and B be the event that the
customer will purchase a pant.
Given that, P (A) = 0.5 and P (B) = 0.4.
Also given that the customer will purchase both a shirt and a pant with probability 0.2.
i.e. P (A ∩ B) = 0.2.
We have to find the probability that the customer will purchase neither a shirt nor a
pant, i.e. P(A^C ∩ B^C).
We know that P(A^C ∩ B^C) = P((A ∪ B)^C) = 1 − P(A ∪ B)
And, P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.5 + 0.4 − 0.2 = 0.7
⇒ P(A^C ∩ B^C) = 1 − P(A ∪ B) = 1 − 0.7 = 0.3

2. Suppose that we roll a pair of fair dice, so each of the 36 possible outcomes is equally
likely. Let A denote the event that the first die shows 5, B be the event such that the
sum of the outcomes of rolling the pair of dice is 10, and C be the event such that the
sum of the outcomes of rolling the pair of dice is 7. Then

a) Event A and event B are independent.


b) Event A and event B are not independent.
c) Event A and event C are independent.
d) Event A and event C are not independent.

Solution:
We are rolling a pair of fair dice and all 36 outcomes are equally likely, which means that
each outcome occurs with probability 1/36.
A is the event that the first die shows 5.
⇒ A = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)}
B is the event that the sum of the outcomes of rolling the pair of dice is 10.
⇒ B = {(4, 6), (5, 5), (6, 4)}
C is the event that the sum of the outcomes of rolling the pair of dice is 7.
⇒ C = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
Also, A ∩ B = {(5, 5)} and A ∩ C = {(5, 2)}
Since each outcome is equally likely,

P(A) = 6/36, P(B) = 3/36, P(C) = 6/36, P(A ∩ B) = 1/36 and P(A ∩ C) = 1/36

Since P(A ∩ B) = 1/36 ≠ 1/72 = P(A)P(B), events A and B are not independent.

Also, P(A ∩ C) = 1/36 = 1/6 × 1/6 = P(A)P(C), so events A and C are independent.
Hence, options (b) and (c) are correct.
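The independence checks can be repeated mechanically over the 36 outcomes. The sketch below uses exact fractions; the helper names `prob` and `independent` are illustrative only:

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def independent(e1, e2):
    """Two events are independent when P(E1 and E2) = P(E1) * P(E2)."""
    return prob(lambda o: e1(o) and e2(o)) == prob(e1) * prob(e2)


A = lambda o: o[0] == 5      # first die shows 5
B = lambda o: sum(o) == 10   # sum is 10
C = lambda o: sum(o) == 7    # sum is 7

a_b_independent = independent(A, B)
a_c_independent = independent(A, C)
print(a_b_independent, a_c_independent)
```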

3. Let A and B be two independent events of a random experiment. Then, which of the
following is/are always true?

a) P(A ∪ B) = P(A)P(B) + P(B)
b) P(A ∪ B) = P(A)P(B^C) + P(B)
c) P(A ∪ B) = P(A) + P(B)
d) P((A ∩ B) | A) = P(B)

Solution:
Given that A and B are two independent events ⇒ P (A ∩ B) = P (A)P (B).

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
= P(A) + P(B) − P(A)P(B)
= P(A)[1 − P(B)] + P(B)
= P(A)P(B^C) + P(B)

Therefore, option (b) is correct.

Consider
P((A ∩ B) | A) = P((A ∩ B) ∩ A) / P(A)
= P(A ∩ B) / P(A)
= P(A)P(B) / P(A)
= P(B)

This implies that option (d) is also correct.

Hence, options (b) and (d) are correct.

4. The probability that a student registered for IITM online degree program will pass the
qualifier exam is 0.6 independent of all other students. Find the probability that out of
10,000 registered students, 7,000 students will pass the qualifier exam.

a) (0.6)^3000 (0.4)^7000
b) (0.6)^7000 (0.4)^3000
c) 10000C7000 (0.6)^3000 (0.4)^7000
d) 10000C7000 (0.6)^7000 (0.4)^3000

Solution:
The probability p that a student registered for the IITM online degree program will pass
the qualifier exam is 0.6.
We have to find the probability that out of 10,000 registered students, 7,000 students will
pass the qualifier exam, where each student passes independently of the others.
So we can use the binomial distribution, where X is the number of students who pass
the exam, with p = 0.6, n = 10,000 and k = 7,000.
We know that for the binomial distribution P(X = k) = nCk p^k (1 − p)^(n−k)

⇒ P(X = 7000) = 10000C7000 (0.6)^7000 (1 − 0.6)^(10000−7000)

⇒ P(X = 7000) = 10000C7000 (0.6)^7000 (0.4)^3000

Hence, the probability that out of 10,000 registered students, 7,000 students will pass the
qualifier exam is 10000C7000 (0.6)^7000 (0.4)^3000, and option (d) is correct.

5. Assume that the probability of a defective computer component is 0.05. Components
are randomly selected for testing (assume that the testing is 100% accurate). Find
the probability that the first defect is observed when the sixth component is tested.

a) (0.05)^6 × 0.95
b) (0.95)^6 × 0.05
c) (0.95)^5 × 0.05
d) (0.05)^5 × 0.95

Solution:
We have to find the probability that the first defect is observed when the sixth
component is tested.
The probability of a defective computer component is 0.05.
Here we treat getting a defective component as a success. That means we have
to find the probability that the first success occurs on the 6th trial, with p = 0.05.
So we can use the geometric distribution, with X representing the number of
components tested up to and including the first defective one, and p = 0.05.
We know that for the geometric distribution P(X = k) = (1 − p)^(k−1) p.

⇒ P(X = 6) = (1 − 0.05)^(6−1) × 0.05

⇒ P(X = 6) = (0.95)^5 × 0.05

Hence the probability that the first defect is observed when the sixth component is tested
is (0.95)^5 × 0.05, and option (c) is correct.

6. If Aarushi and Ansh play a game of chess, Aarushi wins with probability 0.5 and Ansh
wins with probability 0.4 and the game ends in a draw with probability 0.1, independent
of all other games. They agree to play a match consisting of 5 games. Find the
probability that Aarushi wins 4-1 (win gives 1 pt to winner and draw gives 0.5 pts to both).
Enter your answer correct to 3 decimals accuracy.
Solution:
Let Ai be the event that Aarushi wins the ith game and Bj be the event that Ansh
wins the jth game.
From the given information we have P(Ai) = 0.5 and P(Bj) = 0.4, and each game is drawn
with probability 0.1.
There are two disjoint ways in which Aarushi wins 4-1.
i) Aarushi wins 4 games and Ansh wins one game.
The probability of this is 5C4 (0.5)^4 × 0.4 = 0.125
ii) Aarushi wins 3 games and 2 games are drawn.
The probability of this is 5C3 (0.5)^3 × (0.1)^2 = 0.0125
So, the probability that Aarushi wins 4-1 is 0.125 + 0.0125 = 0.1375 ≈ 0.138
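To confirm that these two cases are the only ways to reach a 4-1 score, one can enumerate all 3^5 possible matches. This is a minimal sketch; the labels "A", "B" and "D" (Aarushi wins, Ansh wins, draw) are chosen here for illustration:

```python
from itertools import product

# Per-game result probabilities and the points Aarushi earns for each result.
PROBS = {"A": 0.5, "B": 0.4, "D": 0.1}
POINTS = {"A": 1.0, "B": 0.0, "D": 0.5}

p_4_1 = 0.0
for match in product(PROBS, repeat=5):
    # Aarushi wins 4-1 exactly when she collects 4 of the 5 available points.
    if sum(POINTS[g] for g in match) == 4.0:
        prob = 1.0
        for g in match:
            prob *= PROBS[g]
        p_4_1 += prob

print(round(p_4_1, 4))
```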

7. The probability of someone catching flu in a particular winter when they have been
given the flu vaccine is 0.2. Without the vaccine, the probability of catching flu is 0.5. If
40% of the population has been given the vaccine, what is the probability that a person
chosen at random from the population will catch flu over that winter? Enter the answer
correct to 2 decimals accuracy.
Solution:
Let A be the event that the person will catch flu and V be the event that the person
has been given the vaccine.
Given that P(A | V) = 0.2, P(A | V^C) = 0.5 and P(V) = 0.4
We have to find the probability that a person chosen at random from the population
will catch flu over that winter, i.e. P(A).
By the law of total probability, P(A) = P(A | V)P(V) + P(A | V^C)P(V^C)
⇒ P(A) = 0.2 × 0.4 + 0.5 × (1 − 0.4)
⇒ P(A) = 0.38

8. Suppose you are playing a game of cards with your friend. Your friend is supposed
to give you 13 cards one by one. With a well-shuffled pack of 52 cards, what is the
probability that you are dealt a perfect hand(13 of one suit)?
a) 13! / 52!
b) (12! × 39!) / 51!
c) (13! × 39!) / 51!
d) (13! × 39!) / 52!
Solution:
Your friend is supposed to give you 13 cards one by one. We need to find the probability
that you are dealt a perfect hand, i.e. that you get 13 cards of one suit.
The first card can be any of the 52 cards, so the probability is 1.
Once the first card is given to you, it belongs to one particular suit, so the probability
that the second card is of the same suit is 12/51.
Similarly, for the third card the probability is 11/50.
Continuing like this, the probability that you are dealt a perfect hand is

1 × 12/51 × 11/50 × 10/49 × 9/48 × 8/47 × 7/46 × 6/45 × 5/44 × 4/43 × 3/42 × 2/41 × 1/40
= (12! × 39!) / 51!

Hence, option (b) is correct.

9. A person has bought a bed from an online furniture store. The seller delivers the
disassembled bed parts along with some screws to assemble it. The probability of a
screw being defective is 0.1 independent of all other screws. To compensate for the
manufacturing error, the seller sends two extra screws in the package where the bed
needs exactly 8 screws to assemble. What is the probability that the buyer will be able
to assemble the bed? (Enter the answer correct to 4 decimal accuracy)
Solution:
We need exactly 8 screws to assemble the bed and the seller sends two extra, i.e. the seller
sends ten screws.
Let X represent the number of non-defective screws among the ten screws sent.
The buyer will be able to assemble the bed if 8, 9 or 10 of the ten screws are
non-defective.
We can model this with the binomial distribution X ∼ Binomial(10, p), where p is the
probability of a screw being non-defective, so p = 1 − 0.1 = 0.9.
The buyer will be able to assemble the bed if at least 8 screws are non-defective.
So, the probability that the buyer will be able to assemble the bed is P(X ≥ 8).

P(X ≥ 8) = P(X = 8) + P(X = 9) + P(X = 10)
= 10C8 (0.9)^8 (0.1)^2 + 10C9 (0.9)^9 (0.1)^1 + 10C10 (0.9)^10 (0.1)^0
= (0.9)^8 [45 × (0.1)^2 + 10 × 0.9 × 0.1 + 0.81]
= (0.9)^8 × 2.16
= 0.9298

10. In a pizza shop 40% of the customers order medium size pizza, 50% order small size
pizza, and 10% order large size pizza. Of those ordering medium size pizza, 2/3 also ask to
add extra toppings. Of those ordering small size pizza, 1/5 also ask to add extra toppings,
and of those ordering large size pizza, 4/5 also ask to add extra toppings. Given that a
customer asked to add extra toppings, find the conditional probability that the customer
ordered a medium pizza.
a) 15/67
b) 40/67
c) 12/67
d) 52/67

Solution:
Let S, M and L denote the events that a customer orders a small, medium and large size
pizza, respectively.
Given that P(S) = 0.50, P(M) = 0.40 and P(L) = 0.10.
Also, let T be the event that a customer asks to add extra toppings.
This implies that P(T | S) = 1/5, P(T | M) = 2/3 and P(T | L) = 4/5.
We need to find P(M | T).

P(M | T) = P(M ∩ T) / P(T)
= P(T | M)P(M) / [P(T | S)P(S) + P(T | M)P(M) + P(T | L)P(L)]
= (2/3 × 0.40) / (1/5 × 0.50 + 2/3 × 0.40 + 4/5 × 0.10)
= (0.80/3) × (15/6.7)
= 40/67

Hence, option (b) is correct.
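The Bayes' rule computation can be reproduced exactly with fractions. This is an illustrative sketch; the dictionary names `prior` and `likelihood` are not from the assignment:

```python
from fractions import Fraction

# P(size) for small, medium and large pizzas.
prior = {"S": Fraction(1, 2), "M": Fraction(2, 5), "L": Fraction(1, 10)}

# P(extra toppings | size).
likelihood = {"S": Fraction(1, 5), "M": Fraction(2, 3), "L": Fraction(4, 5)}

# Law of total probability: P(T) = sum over sizes of P(T | size) * P(size).
p_topping = sum(prior[s] * likelihood[s] for s in prior)

# Bayes' rule: P(M | T) = P(T | M) * P(M) / P(T).
posterior_medium = prior["M"] * likelihood["M"] / p_topping
print(p_topping, posterior_medium)
```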

Statistics for Data Science - 2

Week 2 Graded Assignment Solution


Discrete random variable

1. Toss a coin 50 times. Let the random variable X be defined as the number of tails
observed. Find the average of the values in the range of the random variable.
Solution:
Random variable X is defined as the number of tails observed while tossing the coin 50
times.
So the possible values taken by X are 0, 1, 2, 3, ..., 48, 49, 50.
⇒ Range of X = {0, 1, 2, 3, ..., 48, 49, 50}
Average of range values = sum of all values in the range / total number of values

⇒ Average of range values = (0 + 1 + 2 + 3 + ... + 48 + 49 + 50) / 51 = 1275/51 = 25

2. Suppose that 5 fruits are randomly chosen from a basket containing 20 fruits, of which
16 are good and 4 are rotten. Let Y denote the number of rotten fruits chosen. Find
the possible values taken by Y .

a) {1, 2, 3, 4, 5}
b) {0, 1, 2, 3, 4, 5}
c) {1, 2, 3, 4}
d) {0, 1, 2, 3, 4}

Solution:
Random variable Y is defined as the number of rotten fruits chosen from the basket
while drawing 5 fruits. Since there are only 4 rotten fruits, so Y cannot take values more
than 4. Also there are 16 good fruits, so while drawing fruits there can be 0 rotten fruit
or 1 rotten fruit or 2 rotten fruits or 3 rotten fruits or 4 rotten fruits.
Hence, the possible values taken by Y i.e Range = {0, 1, 2, 3, 4}.

3. Let X be the number of candies present in a box. We have the following information:
There are at most four candies in the box.
The probability of having 2 candies in the box is the same as the probability of having
one candy.
The probability of having no candy in the box is the same as the probability of having
3 candies.
The probability of having four candies is twice of the probability of having three candies
and four times of having two candies.
What will be the PMF of X?

a)  X          0     1     2     3     4
    P(X = x)   1/10  1/10  2/10  2/10  4/10

b)  X          0     1     2     3     4
    P(X = x)   2/10  1/10  1/10  2/10  4/10

c)  X          0     1     2     3     4
    P(X = x)   1/10  2/10  1/10  2/10  4/10

d)  X          0     1     2     3     4
    P(X = x)   4/10  2/10  1/10  1/10  2/10

Solution:
Given that there are at most four candies in the box, so X cannot take values more than
4.
Also given that
P(X = 2) = P(X = 1), P(X = 0) = P(X = 3), P(X = 4) = 2P(X = 3) and
P(X = 4) = 4P(X = 2).
Let P(X = 2) = p and P(X = 0) = q, so that P(X = 1) = p, P(X = 3) = q and
P(X = 4) = 2q = 4p
⇒ q = 2p
And we know that P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 1
⇒ q + p + p + q + 2q = 1
⇒ 4q + 2p = 1
Substituting q = 2p, we get 4 × 2p + 2p = 1
⇒ p = 1/10 and hence q = 2/10.
So, P(X = 0) = 2/10, P(X = 1) = 1/10, P(X = 2) = 1/10, P(X = 3) = 2/10, and
P(X = 4) = 4/10.
Therefore, option (b) is the correct answer.

4. Let X be a discrete random variable with the following probability mass function

X          0   1   2    3    4    5       6
P(X = x)   0   k   4k   6k   4k   10k^2   6k^2

Table 2.1.G: PMF of X

Find the value of P(X ≤ 4). Enter your answer correct up to 4 decimals accuracy.
Solution:
We know that Σ_{x=0}^{6} P(X = x) = 1
⇒ P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) = 1
⇒ 0 + k + 4k + 6k + 4k + 10k^2 + 6k^2 = 1
⇒ 16k^2 + 15k − 1 = 0
⇒ (16k − 1)(k + 1) = 0
⇒ k = −1 or k = 1/16
Since k cannot take negative values, k must be 1/16.
Now,

P(X ≤ 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
= 0 + k + 4k + 6k + 4k
= 15k
= 15 × 1/16
= 0.9375

5. I roll two fair six sided dice and observe the two outcomes. Let the random variables
Y and Z denote the outcomes observed on the two dice and let X = Y + Z. Find
P (Y = 3|X = 6).
Solution:
Y and Z denote the outcomes observed on the two dice.
Given X = Y + Z, the outcomes (Y, Z) with X = 6 are {(1,5), (2,4), (3,3), (4,2), (5,1)},
and these five outcomes are equally likely.
In this reduced sample space, the only outcome with Y = 3 is (3,3).
Hence, P(Y = 3 | X = 6) = 1/5 = 0.2

6. Let X be a discrete random variable with the following probability mass function

P(X = k) = 0.2 for k = 0,
           0.3 for k = 1,
           0.4 for k = 2,
           0.1 for k = 3,
           0 otherwise.

Define Y = (X − 1)(X + 1)(X + 3). Find P (Y ≤ 32).


Solution:
Given that X is taking values 0, 1, 2 and 3 and Y = (X − 1)(X + 1)(X + 3).
Now we will calculate the values taken by Y corresponding to every value of X.
At X = 0
Y = (0 − 1)(0 + 1)(0 + 3) = −3
At X = 1
Y = (1 − 1)(1 + 1)(1 + 3) = 0
At X = 2
Y = (2 − 1)(2 + 1)(2 + 3) = 15

At X = 3
Y = (3 − 1)(3 + 1)(3 + 3) = 48
This implies that Y is taking values -3, 0, 15, and 48.
So,

P (Y ≤ 32) = P (Y = −3) + P (Y = 0) + P (Y = 15)


= P (X = 0) + P (X = 1) + P (X = 2)
= 0.2 + 0.3 + 0.4
= 0.9

7. A shopkeeper sells mobile phones. The demand for mobile phones follows a Poisson
distribution with mean 4.6 per week. The shopkeeper has 5 mobile phones in his shop at
the beginning of a week. Find the probability that this will not be enough to satisfy the
demand for mobile phones in that week. Enter your answer correct up to two decimals
accuracy.
Solution:
The shopkeeper has 5 mobile phones in his shop at the beginning of a week. The
shopkeeper will not be able to satisfy the demand for mobile phones in that week only if
the demand is for more than 5 phones. So, we need to find the value of
P(X > 5), where X is the weekly demand.
It is given that the demand for mobile phones follows a Poisson distribution with mean 4.6
per week, i.e. λ = 4.6

P(X > 5) = 1 − P(X ≤ 5)
= 1 − [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)]
= 1 − e^(−4.6) [(4.6)^0/0! + (4.6)^1/1! + (4.6)^2/2! + (4.6)^3/3! + (4.6)^4/4! + (4.6)^5/5!]
= 1 − e^(−4.6) [1 + 4.6 + 10.58 + 16.22 + 18.66 + 17.16]
= 1 − 0.6858
≈ 0.31
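Because the final answer is sensitive to rounding of the intermediate sum, it is worth evaluating the tail without rounding. A minimal sketch using only the standard library (`poisson_pmf` is a helper defined here, not a library function):

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


lam = 4.6  # mean weekly demand

# P(X > 5) = 1 - P(X <= 5): the 5 phones in stock are not enough.
p_not_enough = 1 - sum(poisson_pmf(k, lam) for k in range(6))
print(round(p_not_enough, 4))
```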

8. Suppose that in the end semester paper of Statistics there are 18 multiple-choice
questions (only one option is correct for each question). Each question has 4 possible options.
You know the answer to 8 questions, but you have no idea about the other 10 questions
and choose answers randomly and independently. Your score X of the exam is the total
number of correct answers. Find the value of P(X ≥ 12). Enter your answer correct up
to 2 decimals accuracy.
Solution:
Your score is the total number of correct answers, and you know the answer to 8
questions.
So, instead of finding the value of P(X ≥ 12) directly, define a new random variable Y as
the number of correct answers among the 10 questions for which you do not know the
answer, so that X = 8 + Y and P(X ≥ 12) = P(Y ≥ 4).
There are four options for each question and only one is correct. That means the
probability of getting an answer correct is 1/4, and the questions are independent of each other.
So we can use the binomial distribution with n = 10 and p = 0.25.
Now,

P(Y ≥ 4) = 1 − P(Y < 4)
= 1 − [P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)]
= 1 − [10C0 (1/4)^0 (3/4)^10 + 10C1 (1/4)^1 (3/4)^9 + 10C2 (1/4)^2 (3/4)^8 + 10C3 (1/4)^3 (3/4)^7]
= 1 − (3/4)^7 [(3/4)^3 + 10 (1/4)^1 (3/4)^2 + 45 (1/4)^2 (3/4)^1 + 120 (1/4)^3]
= 1 − (3/4)^7 [372/64]
= 1 − 0.78
= 0.22

This implies that P(X ≥ 12) = 0.22.
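The binomial tail can be evaluated directly as a check. This is a small sketch with a helper `binom_pmf` defined here for illustration:

```python
from math import comb


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Y counts correct guesses among the 10 unknown questions, each correct w.p. 1/4.
p_at_least_4 = 1 - sum(binom_pmf(k, 10, 0.25) for k in range(4))
print(round(p_at_least_4, 4))
```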

9. A fruit owner sells fruit in a lot that contains 50 fruits. A customer selects 5 fruits at
random from a lot and rejects the lot (will not purchase) if one of the 5 selected fruits
is rotten. What is the probability that the customer will purchase the lot if there are 4
rotten fruits in the lot? Enter your answer correct up to 2 decimals accuracy.
Solution:
Given that there are 4 rotten fruits in the lot that contains 50 fruits.
Customer will purchase the lot if out of 5 selected fruits there is no rotten fruit.
The probability that there is no rotten fruit among the 5 selected fruits is
(4C0 × 46C5) / 50C5 = 1370754/2118760 ≈ 0.647
Accepted range: 0.61 - 0.67

10. Suppose the probability that any given person will independently believe a tale about
the existence of a parallel universe is 0.6. What is the probability that the eighth person
to hear this tale about existence of a parallel universe is the fifth one to believe it?

a) 8C5 (0.6)^5 (0.4)^3
b) 7C4 (0.6)^5 (0.4)^3
c) 8C5 (0.6)^3 (0.4)^5
d) 7C4 (0.6)^3 (0.4)^5

Solution:
Given that the probability that any given person will believe a tale about the existence
of a parallel universe is 0.6.
We need to find the probability that the eighth person to hear this tale about the existence
of a parallel universe is the fifth one to believe it.
In other words, we need 4 successes in the first 7 trials and a success on the 8th trial.
(Here a success means that the person believes the tale about the existence of a parallel
universe.)
The probability of getting 4 successes out of 7 trials is 7C4 (0.6)^4 (0.4)^3.
Including the success on the 8th trial, the probability is 7C4 (0.6)^4 (0.4)^3 × 0.6.
This implies that the probability that the eighth person to hear this tale about the existence
of a parallel universe is the fifth one to believe it is 7C4 (0.6)^5 (0.4)^3, so option (b) is correct.

11. Suppose the number of visitors arriving at a zoo can be modeled as Poisson
distributed. On average, 20 visitors arrive per hour. Let X be the number of visitors
arriving from 2pm to 4pm. Then the probability that at least 35 visitors will arrive in
the given duration is
a) Σ_{k=35}^{∞} e^(−20) (20)^k / k!
b) 1 − Σ_{k=0}^{34} e^(−20) (20)^k / k!
c) Σ_{k=35}^{∞} e^(−40) (40)^k / k!
d) 1 − Σ_{k=0}^{34} e^(−40) (40)^k / k!

Solution:
Given that on average 20 visitors arrive per hour and X is the number of visitors
arriving from 2pm to 4pm, we have λ = 20 × 2 = 40.
Now we have to find the probability that at least 35 visitors will arrive in the given
duration, that is, from 2pm to 4pm.

P(X ≥ 35) = P(X = 35) + P(X = 36) + P(X = 37) + ...
= Σ_{k=35}^{∞} e^(−λ) λ^k / k!
= Σ_{k=35}^{∞} e^(−40) (40)^k / k!

Also we can write

P(X ≥ 35) = 1 − P(X < 35)
= 1 − [P(X = 0) + P(X = 1) + P(X = 2) + ... + P(X = 34)]
= 1 − Σ_{k=0}^{34} e^(−40) (40)^k / k!

Hence, options (c) and (d) are correct.

Statistics for Data Science - 2

Week 3 Practice Assignment Solution


Multiple random variables

1. Let X and Y be two random variables with joint distribution given in Table 3.1.P, where
a and b are two unknown values.

        X = 0   X = 1   X = 2
Y = 0   1/12    3/12    a
Y = 1   2/12    b       1/12
Y = 2   3/12    1/12    1/12

Table 3.1.P: Joint distribution of X and Y .

i) Find P(Y = 1).
a) 4/12
b) 3/12
c) 5/12
d) 1/12
Solution:
We know that Σ_{x∈TX, y∈TY} fXY(x, y) = 1
⇒ 1/12 + 3/12 + a + 2/12 + b + 1/12 + 3/12 + 1/12 + 1/12 = 1
⇒ a + b = 0
Since a and b cannot take negative values ⇒ a = b = 0.

Now,
P(Y = 1) = Σ_{x∈TX} fXY(x, 1)
= 2/12 + b + 1/12
= 3/12 + 0
= 3/12
ii) Find P(Y = 1 | X = 2).
a) 1/12
b) 1/4
c) 1/3
d) 1/2
Solution:

P(Y = 1 | X = 2) = P(Y = 1, X = 2) / P(X = 2)
= (1/12) / (a + 1/12 + 1/12)
= (1/12) / (2/12), since a = 0
= 1/2

iii) Find P(X = 0, Y ≥ 1).
a) 4/12
b) 3/12
c) 5/12
d) 1/12

Solution:

P(X = 0, Y ≥ 1) = P(X = 0, Y = 1) + P(X = 0, Y = 2)
= 2/12 + 3/12
= 5/12

2. Let X and Y be two independent discrete random variables with CDFs FX and FY ,
respectively. Define another random variable Z = min(X, Y ), then the CDF of Z is

a) min(FX , FY )
b) FX FY
c) FX + FY + FX FY
d) FX + FY − FX FY

Solution:

FZ (z) = P (Z ≤ z) = P (min(X, Y ) ≤ z)
= 1 − P (min(X, Y ) > z)
= 1 − P (X > z, Y > z)

Since X and Y are two independent discrete random variables,


P (X > z, Y > z) = P (X > z)P (Y > z)

⇒ FZ (z) = 1 − P (X > z)P (Y > z)


= 1 − [(1 − P (X ≤ z))(1 − P (Y ≤ z))]
= 1 − [(1 − FX (z))(1 − FY (z))]
= FX (z) + FY (z) − FX (z)FY (z)

Hence, option (d) is correct.

3. Let X and Y be two independent random variables with PMFs


fX(k) = fY(k) = 1/6 for k = 1, 2, 3, 4, 5, 6,
                0 otherwise.

Define Z = X − Y . Find the value of fZ (3).


a) 4/12
b) 3/12
c) 5/12
d) 1/12
Solution:

fZ (3) = P (Z = 3) = P (X − Y = 3)
= P (X = 4, Y = 1) + P (X = 5, Y = 2) + P (X = 6, Y = 3)

Given that X and Y are two independent random variables.


⇒ P (X = x, Y = y) = P (X = x)P (Y = y) for all (x, y).

fZ (3) = P (X = 4, Y = 1) + P (X = 5, Y = 2) + P (X = 6, Y = 3)
= P (X = 4)P (Y = 1) + P (X = 5)P (Y = 2) + P (X = 6)P (Y = 3)
= 1/6 × 1/6 + 1/6 × 1/6 + 1/6 × 1/6
= 3/36
= 1/12

4. Let X ∼ Geometric(p) and Y ∼ Geometric(p) be independent and let Z = X + Y .


Determine the values of p for which P (Z = 26) > P (Z = 25).

a) p > 0.02
b) p < 0.04
c) p > 0.15
d) p < 0.30
e) p = 0.05

Solution:
If X ∼ Geometric(p) and Y ∼ Geometric(p) are independent (each taking values in {1, 2, ...}) and Z = X + Y, then for n ≥ 2
$$P(Z = n) = \sum_{k=1}^{n-1} P(X = k)\,P(Y = n - k) = \sum_{k=1}^{n-1} (1-p)^{k-1}p\,(1-p)^{n-k-1}p = (n - 1)\,p^2 (1 - p)^{n-2}.$$
We have to find the values of p for which P(Z = 26) > P(Z = 25), where
$$P(Z = 26) = 25\,p^2(1 - p)^{24} \quad \text{and} \quad P(Z = 25) = 24\,p^2(1 - p)^{23}.$$
For 0 < p < 1, comparing the two gives
$$25\,p^2(1-p)^{24} > 24\,p^2(1-p)^{23} \;\Rightarrow\; 25(1-p) > 24 \;\Rightarrow\; 1 - p > \frac{24}{25} \;\Rightarrow\; p < 0.04$$
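
The closed form for P(Z = n) and the threshold p < 0.04 can be sanity-checked numerically. The sketch below assumes the Geometric(p) convention with support {1, 2, ...}; the function names are illustrative.

```python
def pmf_sum_of_two_geometrics(n, p):
    """Closed form (n - 1) p^2 (1 - p)^(n - 2) for Z = X + Y."""
    return (n - 1) * p**2 * (1 - p) ** (n - 2)


def pmf_by_convolution(n, p):
    """Same probability, summed directly over P(X = k) P(Y = n - k)."""
    def geom(k):
        return (1 - p) ** (k - 1) * p
    return sum(geom(k) * geom(n - k) for k in range(1, n))


for p in (0.02, 0.039, 0.041, 0.15):
    is_larger = pmf_sum_of_two_geometrics(26, p) > pmf_sum_of_two_geometrics(25, p)
    print(p, is_larger)
```

The comparison flips between p = 0.039 and p = 0.041, consistent with the bound derived above.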

5. Let X ∼ Uniform({1, 2, 3, 4, 5, 6}) and let Y be the number of times 2 occurs in X throws of a fair die. Choose the incorrect option(s) among the following.

a) $P(Y = 2 \mid X = 2) = \frac{1}{6}$
b) $P(Y = 2 \mid X = 4) = \frac{5^2}{6^3}$
c) $P(Y = 5 \mid X = 6) = \frac{5}{6^5}$
d) $P(Y = 6 \mid X = 5) = \frac{5}{6^6}$
Solution:

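Given X = n, the number of 2s in n throws is binomial: Y | X = n ∼ Bin(n, 1/6), so Y takes values in {0, 1, ..., n}.

$$P(Y = 2 \mid X = 2) = \binom{2}{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^0 = \frac{1}{36}$$

$$P(Y = 2 \mid X = 4) = \binom{4}{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^2 = \frac{5^2}{6^3}$$

$$P(Y = 5 \mid X = 6) = \binom{6}{5}\left(\frac{1}{6}\right)^5\left(\frac{5}{6}\right)^1 = \frac{5}{6^5}$$

$$P(Y = 6 \mid X = 5) = 0, \text{ since six 2s cannot occur in five throws.}$$

Hence, options (a) and (d) are incorrect.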

6. Let the random variables X and Y each have range {1, 2, 3}. The following formula gives the joint PMF
$$P(X = i, Y = j) = \frac{i + 2j}{c},$$
where c is an unknown value. Find P(1 ≤ X ≤ 3, 1 < Y ≤ 3).

a) $\frac{5}{9}$
b) $\frac{7}{9}$
c) $\frac{2}{9}$
d) $\frac{4}{9}$

Solution:
We know that $\sum_{x \in T_X,\, y \in T_Y} P(X = x, Y = y) = 1$. Writing out the nine terms for (i, j) = (1,1), (1,2), (1,3), (2,1), ..., (3,3):
$$\frac{3}{c} + \frac{5}{c} + \frac{7}{c} + \frac{4}{c} + \frac{6}{c} + \frac{8}{c} + \frac{5}{c} + \frac{7}{c} + \frac{9}{c} = 1 \;\Rightarrow\; c = 54$$
Now,
$$\begin{aligned} P(1 \le X \le 3, 1 < Y \le 3) &= P(X = 1, Y = 2) + P(X = 1, Y = 3) + P(X = 2, Y = 2) \\ &\quad + P(X = 2, Y = 3) + P(X = 3, Y = 2) + P(X = 3, Y = 3) \\ &= \frac{1}{c}\,[5 + 7 + 6 + 8 + 7 + 9] = \frac{42}{54} = \frac{7}{9} \end{aligned}$$
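
The normalisation constant and the requested probability can be reproduced in a few lines. This is a verification sketch using exact fractions; the names are illustrative.

```python
from fractions import Fraction

support = [(i, j) for i in (1, 2, 3) for j in (1, 2, 3)]

# The PMF (i + 2j)/c sums to 1, so c is the sum of the numerators.
c = sum(i + 2 * j for i, j in support)
pmf = {(i, j): Fraction(i + 2 * j, c) for i, j in support}

# P(1 <= X <= 3, 1 < Y <= 3): keep only the cells with j in {2, 3}.
prob = sum(p for (i, j), p in pmf.items() if 1 <= i <= 3 and 1 < j <= 3)
print(c, prob)
```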

7. The joint PMF of the random variables X and Y is given in Table 3.2.P.

          X = 1   X = 2   X = 3
  Y = 1     k       k      2k
  Y = 2    2k       0      4k
  Y = 3    3k       k      6k

Table 3.2.P: Joint distribution of X and Y.

Consider the random variable $Z = X^2 Y$.


i) Find the range of Z | Y = 2.
a) {1, 4, 9}

b) {4, 8, 18}
c) {1, 9}
d) {2, 18}
e) {2, 8, 18}

Solution:
We know that $\sum_{x \in T_X,\, y \in T_Y} P(X = x, Y = y) = 1$
$$\Rightarrow k + k + 2k + 2k + 0 + 4k + 3k + k + 6k = 1 \;\Rightarrow\; k = \frac{1}{20}$$
When Y = 2, P(X = 2, Y = 2) = 0, so the pair (2, 2) is not considered for the range.
Since $Z = X^2 Y$, the range of Z | Y = 2 is $\{1^2 \times 2,\ 3^2 \times 2\} = \{2, 18\}$.
ii) Find the value of P (Z = 18 | Y = 2).
a) $\frac{1}{3}$
b) $\frac{2}{3}$
c) $\frac{3}{4}$
d) $\frac{1}{4}$

Solution:

$$P(Z = 18 \mid Y = 2) = \frac{P(Z = 18, Y = 2)}{P(Y = 2)} = \frac{P(X = 3, Y = 2)}{P(X = 1, Y = 2) + P(X = 3, Y = 2)} = \frac{4k}{2k + 4k} = \frac{2}{3}$$

8. Each of the following options gives a joint PMF of the random variables X and Y. If X and Y are independent, which of the following option(s) can be the joint PMF of X and Y?

a)
          Y = 0   Y = 1   Y = 2
  X = 0   0.01    0       0
  X = 1   0.09    0.09    0
  X = 2   0       0       0.81

b)
          Y = 0   Y = 1   Y = 2
  X = 0   0.06    0.18    0.12
  X = 1   0.04    0.12    0.48

c)
          Y = 0   Y = 1   Y = 2
  X = 0   1/12    1/24    1/24
  X = 1   1/6     1/12    1/8
  X = 2   1/4     1/8     1/12

d)
          Y = 0   Y = 1   Y = 2
  X = 0   1/10    1/5     1/5
  X = 1   1/10    1/10    3/10

e)
          Y = 0   Y = 1
  X = 0   0.10    0.15
  X = 1   0.20    0.30
  X = 2   0.10    0.15

Solution:
In option a)
P(X = 0, Y = 1) = 0, but P(X = 0) = 0.01 + 0 + 0 = 0.01 and P(Y = 1) = 0 + 0.09 + 0 = 0.09
⇒ P(X = 0, Y = 1) ≠ P(X = 0)P(Y = 1)
Therefore, option (a) cannot be the joint PMF of X and Y.

In option b)
P(X = 0, Y = 0) = 0.06, but P(X = 0) = 0.06 + 0.18 + 0.12 = 0.36 and P(Y = 0) = 0.06 + 0.04 = 0.10
⇒ P(X = 0, Y = 0) = 0.06 ≠ 0.036 = P(X = 0)P(Y = 0)
Therefore, option (b) cannot be the joint PMF of X and Y.

In option c)
P(X = 1, Y = 0) = 1/6, but P(X = 1) = 1/6 + 1/12 + 1/8 = 3/8 and P(Y = 0) = 1/12 + 1/6 + 1/4 = 1/2
⇒ P(X = 1, Y = 0) = 1/6 ≠ 3/16 = P(X = 1)P(Y = 0)
Therefore, option (c) cannot be the joint PMF of X and Y.

In option d)
P(X = 0, Y = 1) = 1/5, but P(X = 0) = 1/10 + 1/5 + 1/5 = 1/2 and P(Y = 1) = 1/5 + 1/10 = 3/10
⇒ P(X = 0, Y = 1) = 1/5 ≠ 3/20 = P(X = 0)P(Y = 1)
Therefore, option (d) cannot be the joint PMF of X and Y.

In option e)
The marginals are P(X = 0) = 0.25, P(X = 1) = 0.50, P(X = 2) = 0.25 and P(Y = 0) = 0.40, P(Y = 1) = 0.60.
For every (x, y), P(X = x, Y = y) = P(X = x)P(Y = y); for example, P(X = 1)P(Y = 1) = 0.50 × 0.60 = 0.30.
Hence, option (e) can be the joint PMF of X and Y.
Answer: e
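
The cell-by-cell independence check used above is easy to automate. The helper below (`is_independent` is a hypothetical name introduced for this sketch) compares every cell with the product of its row and column marginals, using exact fractions to avoid rounding issues; only options (a) and (e) are encoded here.

```python
from fractions import Fraction


def is_independent(joint):
    """joint[i][j] = P(X = x_i, Y = y_j).

    Returns True if every cell equals the product of its marginals.
    """
    row = [sum(r) for r in joint]          # marginal PMF of X
    col = [sum(c) for c in zip(*joint)]    # marginal PMF of Y
    return all(joint[i][j] == row[i] * col[j]
               for i in range(len(row)) for j in range(len(col)))


F = Fraction
option_a = [[F(1, 100), 0, 0],
            [F(9, 100), F(9, 100), 0],
            [0, 0, F(81, 100)]]
option_e = [[F(10, 100), F(15, 100)],
            [F(20, 100), F(30, 100)],
            [F(10, 100), F(15, 100)]]

print(is_independent(option_a), is_independent(option_e))
```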

9. From a sack of fruits containing 3 mangoes, 2 kiwis, and 3 guavas, a random sample of 4 pieces of fruit is selected. If X is the number of mangoes and Y is the number of kiwis in the sample, then find the joint probability distribution of X and Y.

a)
          X = 0   X = 1   X = 2   X = 3
  Y = 0   0       3/70    9/70    3/70
  Y = 1   2/70    18/70   2/70    18/70
  Y = 2   3/70    9/70    3/70    0

b)
          X = 0   X = 1   X = 2   X = 3
  Y = 0   0       3/70    9/70    3/70
  Y = 1   2/70    18/70   18/70   2/70
  Y = 2   3/70    9/70    3/70    0

c)
          X = 0   X = 1   X = 2   X = 3
  Y = 0   0       3/70    9/70    3/70
  Y = 1   2/70    18/70   18/70   2/70
  Y = 2   9/70    3/70    3/70    0

d)
          X = 0   X = 1   X = 2   X = 3
  Y = 0   0       3/70    3/70    9/70
  Y = 1   2/70    18/70   18/70   2/70
  Y = 2   3/70    9/70    3/70    0
Solution:
X is the number of mangoes and Y is the number of kiwis in the sample. The sack contains 3 mangoes and 2 kiwis, so X takes values in {0, 1, 2, 3} and Y takes values in {0, 1, 2} when a random sample of 4 pieces is selected. The remaining pieces of the sample are guavas, and there are $\binom{8}{4} = 70$ possible samples.

P(X = 0, Y = 0) = P(no mango and no kiwi) = 0 (not possible, since there are only 3 guavas)

$$P(X = 0, Y = 1) = \frac{\binom{2}{1}\binom{3}{3}}{\binom{8}{4}} = \frac{2}{70} \qquad P(X = 0, Y = 2) = \frac{\binom{2}{2}\binom{3}{2}}{\binom{8}{4}} = \frac{3}{70}$$

$$P(X = 1, Y = 0) = \frac{\binom{3}{1}\binom{3}{3}}{\binom{8}{4}} = \frac{3}{70} \qquad P(X = 1, Y = 1) = \frac{\binom{3}{1}\binom{2}{1}\binom{3}{2}}{\binom{8}{4}} = \frac{18}{70}$$

$$P(X = 1, Y = 2) = \frac{\binom{3}{1}\binom{2}{2}\binom{3}{1}}{\binom{8}{4}} = \frac{9}{70} \qquad P(X = 2, Y = 0) = \frac{\binom{3}{2}\binom{3}{2}}{\binom{8}{4}} = \frac{9}{70}$$

$$P(X = 2, Y = 1) = \frac{\binom{3}{2}\binom{2}{1}\binom{3}{1}}{\binom{8}{4}} = \frac{18}{70} \qquad P(X = 2, Y = 2) = \frac{\binom{3}{2}\binom{2}{2}}{\binom{8}{4}} = \frac{3}{70}$$

The remaining values (X = 3) can be checked in the same way.
Answer: b
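
The whole joint table can be generated from the counting argument above. The following sketch uses `math.comb`; the constant names are illustrative. Note that `Fraction` prints reduced values (for example 18/70 appears as 9/35).

```python
from fractions import Fraction
from math import comb

MANGOES, KIWIS, GUAVAS, SAMPLE = 3, 2, 3, 4
total = comb(MANGOES + KIWIS + GUAVAS, SAMPLE)  # C(8, 4) = 70

joint = {}
for x in range(MANGOES + 1):
    for y in range(KIWIS + 1):
        g = SAMPLE - x - y  # guavas needed to complete the sample
        if 0 <= g <= GUAVAS:
            ways = comb(MANGOES, x) * comb(KIWIS, y) * comb(GUAVAS, g)
        else:
            ways = 0        # impossible combination
        joint[(x, y)] = Fraction(ways, total)

# One row per value of Y, columns X = 0..3.
for y in range(KIWIS + 1):
    print(y, [str(joint[(x, y)]) for x in range(MANGOES + 1)])
```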
10. Suppose you flip a fair coin. If the coin lands heads, you roll a fair six-sided die 50 times. If the coin lands tails, you roll the die 51 times. Let X be 1 if the coin lands heads and 0 if the coin lands tails. Let Y be the total number of times you get the number 5 while throwing the die. Find P(X = 1 | Y = 10).
a) $\frac{85}{157}$
b) $\frac{82}{167}$
c) $\frac{72}{157}$
d) $\frac{85}{167}$
Solution:

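$$\begin{aligned} \frac{P(X = 1 \mid Y = 10)}{P(X = 0 \mid Y = 10)} &= \frac{P(Y = 10 \mid X = 1)\,P(X = 1)}{P(Y = 10 \mid X = 0)\,P(X = 0)} \\ &= \frac{P(Y = 10 \mid X = 1)}{P(Y = 10 \mid X = 0)} \quad [\text{since } P(X = 1) = P(X = 0)] \\ &= \frac{\binom{50}{10}\left(\frac{1}{6}\right)^{10}\left(\frac{5}{6}\right)^{40}}{\binom{51}{10}\left(\frac{1}{6}\right)^{10}\left(\frac{5}{6}\right)^{41}} = \frac{\binom{50}{10}}{\binom{51}{10}} \times \frac{6}{5} = \frac{41}{51} \times \frac{6}{5} = \frac{246}{255} \end{aligned}$$

$$\Rightarrow P(X = 1 \mid Y = 10) = \frac{246}{255} \times P(X = 0 \mid Y = 10)$$

Also, P(X = 1 | Y = 10) + P(X = 0 | Y = 10) = 1

$$\Rightarrow P(X = 1 \mid Y = 10) + \frac{255}{246}\,P(X = 1 \mid Y = 10) = 1 \;\Rightarrow\; P(X = 1 \mid Y = 10) = \frac{246}{501} = \frac{82}{167}$$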
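
The same posterior follows from a direct application of Bayes' rule with exact arithmetic. In this sketch `binom_pmf` is a small helper written for the example, not a library function.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_five = Fraction(1, 6)
prior = Fraction(1, 2)                    # P(X = 1) = P(X = 0)
like_heads = binom_pmf(10, 50, p_five)    # P(Y = 10 | X = 1)
like_tails = binom_pmf(10, 51, p_five)    # P(Y = 10 | X = 0)

posterior = like_heads * prior / (like_heads * prior + like_tails * prior)
print(posterior)
```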
11. Three balls are selected at random from a box containing five red, four blue, three yellow and six green coloured balls. If X, Y and Z are the number of red balls, blue balls and green balls respectively, choose the correct option(s) among the following.

a) $P(X = 1, Y = 0, Z = 2) = \frac{25}{272}$
b) $P(X = 1, Y = 1, Z = 1) = \frac{5}{34}$
c) $P(X = 1, Y = 0 \mid Z = 2) = \frac{1}{4}$
d) $P(X = 0, Y = 0, Z = 3) = \frac{5}{204}$
Solution:
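$$P(X = 1, Y = 0, Z = 2) = P(\text{one red ball and two green balls}) = \frac{\binom{5}{1}\binom{6}{2}}{\binom{18}{3}} = \frac{25}{272}$$

$$P(X = 1, Y = 1, Z = 1) = P(\text{one red, one blue and one green ball}) = \frac{\binom{5}{1}\binom{4}{1}\binom{6}{1}}{\binom{18}{3}} = \frac{5}{34}$$

$$P(X = 0, Y = 0, Z = 3) = P(\text{three green balls}) = \frac{\binom{6}{3}}{\binom{18}{3}} = \frac{5}{204}$$

Given Z = 2, exactly two of the three balls are green, so the third ball is one of the 12 non-green balls (5 red, 4 blue, 3 yellow):

$$P(X = 1, Y = 0 \mid Z = 2) = \frac{P(X = 1, Y = 0, Z = 2)}{P(Z = 2)} = \frac{\binom{5}{1}\binom{6}{2}}{\binom{6}{2}\binom{12}{1}} = \frac{5}{12}$$

Hence, options (a), (b) and (d) are correct and option (c) is incorrect.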
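
These probabilities can be verified by counting. The sketch below computes the joint PMF of (X, Y, Z) from binomial coefficients and obtains the conditional probability by dividing by P(Z = 2), where exactly two of the three balls are green; under that reading the conditional probability comes out as 5/12. The helper name `joint` is illustrative.

```python
from fractions import Fraction
from math import comb

RED, BLUE, YELLOW, GREEN, DRAWS = 5, 4, 3, 6, 3
total = comb(RED + BLUE + YELLOW + GREEN, DRAWS)  # C(18, 3) = 816


def joint(x, y, z):
    """P(X = x, Y = y, Z = z); any remaining balls drawn are yellow."""
    w = DRAWS - x - y - z
    if w < 0 or w > YELLOW:
        return Fraction(0)
    ways = comb(RED, x) * comb(BLUE, y) * comb(GREEN, z) * comb(YELLOW, w)
    return Fraction(ways, total)


# P(Z = 2): sum the joint PMF over all (x, y) with z fixed at 2.
p_z2 = sum(joint(x, y, 2) for x in range(DRAWS + 1) for y in range(DRAWS + 1))
conditional = joint(1, 0, 2) / p_z2
print(joint(1, 0, 2), joint(1, 1, 1), joint(0, 0, 3), conditional)
```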

Statistics for Data Science - 2
Week 3 Graded Assignment
Multiple random variables

1. Joint distribution of two random variables X and Y is given as:

          X = 0   X = 1
  Y = 1   1/4     1/8
  Y = 2   1/4     k
  Y = 3   0       1/8

Table 3.1.G: Joint distribution of X and Y.

Find the value of $f_{Y|X=1}(2)$. [1 mark]


Solution:
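We know that
$$\sum_{x \in T_X,\, y \in T_Y} f_{XY}(x, y) = 1$$
$$\Rightarrow \frac{1}{4} + \frac{1}{8} + \frac{1}{4} + k + 0 + \frac{1}{8} = 1 \;\Rightarrow\; k = 1 - \frac{3}{4} = \frac{1}{4}$$
Now,
$$f_{Y|X=1}(2) = \frac{f_{XY}(1, 2)}{f_X(1)} = \frac{f_{XY}(1, 2)}{f_{XY}(1, 1) + f_{XY}(1, 2) + f_{XY}(1, 3)} = \frac{\frac{1}{4}}{\frac{1}{8} + \frac{1}{4} + \frac{1}{8}} = \frac{\frac{1}{4}}{\frac{1}{2}} = \frac{1}{2}$$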
2. Customers at a fast-food restaurant buy both sandwiches and drinks. The following
joint distribution summarizes the numbers of sandwiches (X) and drinks (Y ) purchased
by customers.

          X = 1   X = 2
  Y = 1   0.4     0.2
  Y = 2   0.1     0.25
  Y = 3   0       0.05

Table 3.2.G: Joint distribution of X and Y.

Find the probability that a customer will buy two sandwiches given that he has bought
three drinks. [1 mark]
Solution:
X denotes the number of sandwiches purchased by a customer and Y denotes the number of drinks purchased by a customer.
To find: $f_{X|Y=3}(2)$

Now,
$$f_{X|Y=3}(2) = \frac{f_{XY}(2, 3)}{f_Y(3)} = \frac{f_{XY}(2, 3)}{f_{XY}(1, 3) + f_{XY}(2, 3)} = \frac{0.05}{0 + 0.05} = 1$$

3. Consider an experiment of tossing a fair coin twice. Let X be the number of heads that
occurs in the two tosses and Y be the number of tails that occurs in the two tosses.
Choose the correct statements. [2 marks]
(a) X and Y are independent random variables.
(b) X and Y are dependent random variables.
(c) $f_{XY}(1, 1) = \frac{1}{2}$.
(d) $f_{Y|X=0}(1) = \frac{1}{4}$.

Solution:
X denotes the number of heads that occurs in the two tosses and Y denotes the number
of tails that occurs in the two tosses.
First, we make the table of the joint pmf of X and Y. Since X + Y = 2, only three cells are non-zero.

          X = 0   X = 1   X = 2
  Y = 0   0       0       1/4
  Y = 1   0       1/2     0
  Y = 2   1/4     0       0

Joint pmf of X and Y.

From the table, we have


$$f_X(0) = 0 + 0 + \frac{1}{4} = \frac{1}{4}, \qquad f_Y(0) = 0 + 0 + \frac{1}{4} = \frac{1}{4}, \qquad f_{XY}(0, 0) = 0$$

It is clear that
$$f_{XY}(0, 0) \ne f_X(0)\,f_Y(0)$$
It implies that X and Y are dependent random variables.
So, option (a) is incorrect and option (b) is correct.

Now, from the table,
$$f_{XY}(1, 1) = \frac{1}{2}$$
So, option (c) is correct.

$$f_{Y|X=0}(1) = \frac{f_{XY}(0, 1)}{f_X(0)} = 0 \quad (\text{since } f_{XY}(0, 1) = 0)$$
So, option (d) is incorrect.

4. A fair coin is tossed 4 times. Let X be the total number of heads and Y be the number
of heads before the first tail (If there is no tail in all the four tosses, then Y = 4). What
is the value of fY |X=2 (0)? [2 marks]
(a) $\frac{5}{16}$
(b) $\frac{1}{8}$
(c) $\frac{9}{16}$
(d) $\frac{1}{2}$
Solution:
A fair coin is tossed four times. X denotes the number of heads and Y denotes the number of heads before the first tail (if there is no tail in all the four tosses, then Y = 4).
Clearly, X ∼ Binomial(4, 1/2).

Now,
$$f_{Y|X=2}(0) = \frac{f_{XY}(2, 0)}{f_X(2)} = \frac{f_{X|Y=0}(2)\,f_Y(0)}{f_X(2)} \quad \ldots(1)$$

The event Y = 0 means that there is no head before the first tail, that is, the first outcome is a tail.
It implies that $f_Y(0) = \frac{1}{2}$.

$$f_{X|Y=0}(2) = P(\text{two heads in the next three tosses}) = \binom{3}{2}\left(\frac{1}{2}\right)^3 \qquad \text{and} \qquad f_X(2) = \binom{4}{2}\left(\frac{1}{2}\right)^4$$

Putting the values in equation (1), we get
$$f_{Y|X=2}(0) = \frac{\binom{3}{2}\left(\frac{1}{2}\right)^3 \cdot \frac{1}{2}}{\binom{4}{2}\left(\frac{1}{2}\right)^4} = \frac{3}{6} = \frac{1}{2}$$
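
Because there are only 16 equally likely toss sequences, the conditional probability can also be obtained by listing them. A small enumeration sketch (helper name is illustrative):

```python
from fractions import Fraction
from itertools import product


def heads_before_first_tail(tosses):
    """Number of leading heads; equals len(tosses) when there is no tail."""
    return len(tosses) if "T" not in tosses else tosses.index("T")


outcomes = list(product("HT", repeat=4))              # 16 equally likely sequences
x_is_2 = [t for t in outcomes if t.count("H") == 2]   # event X = 2
both = [t for t in x_is_2 if heads_before_first_tail(t) == 0]  # and Y = 0

print(Fraction(len(both), len(x_is_2)))
```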

5. Two fair dice are thrown simultaneously. Let X be the outcome on the first die and Y be the sum of the outcomes on both the dice. Find the value of P(Y − X ≥ 6). [2 marks]

(a) $\frac{1}{6}$
(b) $\frac{1}{12}$
(c) $\frac{5}{12}$
(d) $\frac{1}{24}$
Solution:
X denotes the outcome on the first die and Y denotes the sum of the outcomes on both
the dice.
Notice that Y − X will denote the outcome on the second die.
Let Z = Y − X; then Z ∼ Uniform({1, 2, 3, 4, 5, 6}), and

$$P(Y - X \ge 6) = P(Z \ge 6) = P(Z = 6) = \frac{1}{6}$$

6. Let X and Y denote the number of cars and number of bikes reaching a street corner during a certain 15-minute time period, respectively. Joint distribution of X and Y is given as
$$f_{XY}(x, y) = \frac{9}{16\,(4^{x+y})}$$
Choose the correct option(s). [2 marks]
(a) Marginal pmf of X is $f_X(x) = \frac{3}{4^{x+1}}$.
(b) Marginal pmf of X is $f_X(x) = \frac{3}{4^x}$.
(c) X and Y are independent random variables.
(d) X and Y are dependent random variables.

Solution:
X and Y denote the number of cars and number of bikes reaching a street corner during
a certain 15-minute time period, respectively.
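The range of both X and Y is $T_X = T_Y = \{0, 1, 2, \ldots\}$.

Joint distribution of X and Y is given as
$$f_{XY}(x, y) = \frac{9}{16\,(4^{x+y})}$$

Now, using the sum of a geometric series,
$$\begin{aligned} f_X(x) &= \sum_{y=0}^{\infty} f_{XY}(x, y) = \sum_{y=0}^{\infty} \frac{9}{16\,(4^{x+y})} = \frac{9}{16 \cdot 4^x} \sum_{y=0}^{\infty} \frac{1}{4^y} \\ &= \frac{9}{16 \cdot 4^x}\left(1 + \frac{1}{4} + \frac{1}{4^2} + \ldots\right) = \frac{9}{16 \cdot 4^x}\left(\frac{1}{1 - \frac{1}{4}}\right) = \frac{9}{4^2 \cdot 4^x}\left(\frac{4}{3}\right) = \frac{3}{4^{x+1}} \end{aligned}$$

Therefore, option (a) is correct and option (b) is incorrect.

Similarly, we can show that
$$f_Y(y) = \frac{3}{4^{y+1}}$$

Now, choose two arbitrary points x and y in the range of X and Y, respectively. Then
$$f_X(x)\,f_Y(y) = \frac{3}{4^{x+1}} \cdot \frac{3}{4^{y+1}} = \frac{9}{16\,(4^{x+y})} = f_{XY}(x, y)$$

Hence, X and Y are independent random variables.

Therefore, option (c) is correct and option (d) is incorrect.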

7. Which of the following option(s) is (are) always correct? [2 marks]

(a) $f_{XYZ}(x, y, z) = f_{X|(Y=y,Z=z)}(x)\,f_{YZ}(y, z)$
(b) $f_{XYZ}(x, y, z) = f_{X|(Y=y,Z=z)}(x)\,f_X(x)$
(c) $f_X(x) = \sum_{y \in R_Y} f_{XY}(x, y)$, where $R_Y$ is the range of Y.
(d) $f_{XY}(x, y) = f_X(x)\,f_Y(y)$

Solution:
We know that
$$f_{X|(Y=y,Z=z)}(x) = \frac{f_{XYZ}(x, y, z)}{f_{YZ}(y, z)} \;\Rightarrow\; f_{XYZ}(x, y, z) = f_{X|(Y=y,Z=z)}(x)\,f_{YZ}(y, z)$$
Hence, option (a) is correct and option (b) is incorrect.

We know by the definition of the marginal pmf that
$$f_X(x) = \sum_{y \in R_Y} f_{XY}(x, y), \text{ where } R_Y \text{ is the range of } Y.$$

Hence, option (c) is correct.

$f_{XY}(x, y) = f_X(x)\,f_Y(y)$ is true only when X and Y are independent. Therefore, option (d) is not always true.

8. Two random variables X and Y are jointly distributed with joint pmf
$$f_{XY}(x, y) = a(bx + y),$$
where x and y are integers in 0 ≤ x ≤ 2 and 0 ≤ y ≤ 3, such that $P(X \ge 1, Y \le 2) = \frac{4}{7}$.

Find the value of $f_{XY}(2, 1)$. [2 marks]
1. $\frac{1}{21}$
2. $\frac{5}{42}$
3. $\frac{1}{42}$
4. $\frac{9}{42}$

Solution: We know that
$$\sum_{x \in T_X,\, y \in T_Y} f_{XY}(x, y) = 1$$
$$\Rightarrow f_{XY}(0, 0) + f_{XY}(0, 1) + f_{XY}(0, 2) + f_{XY}(0, 3) + f_{XY}(1, 0) + \ldots + f_{XY}(2, 2) + f_{XY}(2, 3) = 1$$
$$\Rightarrow a + 2a + 3a + ab + (ab + a) + (ab + 2a) + (ab + 3a) + 2ab + (2ab + a) + (2ab + 2a) + (2ab + 3a) = 1$$
$$\Rightarrow 18a + 12ab = 1 \quad \ldots(1)$$

Now, using the given condition,
$$P(X \ge 1, Y \le 2) = \frac{4}{7}$$
$$\Rightarrow P(X = 1, Y = 0) + P(X = 1, Y = 1) + P(X = 1, Y = 2) + P(X = 2, Y = 0) + P(X = 2, Y = 1) + P(X = 2, Y = 2) = \frac{4}{7}$$
$$\Rightarrow ab + (ab + a) + (ab + 2a) + 2ab + (2ab + a) + (2ab + 2a) = \frac{4}{7}$$
$$\Rightarrow 6a + 9ab = \frac{4}{7} \quad \ldots(2)$$

Solving equations (1) and (2), we get
$$ab = \frac{1}{21} \text{ and } a = \frac{1}{42}, \text{ which implies } a = \frac{1}{42} \text{ and } b = 2.$$
Therefore, the joint pmf of X and Y is
$$f_{XY}(x, y) = \frac{1}{42}(2x + y)$$
Now, $f_{XY}(2, 1) = \frac{1}{42}(4 + 1) = \frac{5}{42}$.
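
The two linear equations in a and ab can be solved and checked with exact fractions. In the sketch below, `t` stands for the product ab (a name introduced only for this example).

```python
from fractions import Fraction

# Linear system in a and t = a*b:
#   18a + 12t = 1      (normalisation)
#    6a +  9t = 4/7    (P(X >= 1, Y <= 2) = 4/7)
# Eliminating t via 3*(first) - 4*(second) gives 30a = 3 - 16/7.
a = (3 - Fraction(16, 7)) / 30
t = (1 - 18 * a) / 12
b = t / a


def f(x, y):
    """Joint pmf f_XY(x, y) = a (b x + y)."""
    return a * (b * x + y)


total = sum(f(x, y) for x in range(3) for y in range(4))
condition = sum(f(x, y) for x in (1, 2) for y in (0, 1, 2))
print(a, b, f(2, 1), total, condition)
```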

9. Let $X_1, X_2, X_3$ and $X_4$ be four independent and identically distributed Poisson random variables with $\lambda_i = 4$ for all i. Find the probability that exactly one of the $X_i$ equals 0 and exactly one of the $X_i$ equals 1. [3 marks]

(a) $24e^{-8}(1 - 25e^{-8})$
(b) $24e^{-8}(1 - 5e^{-4})^2$
(c) $48e^{-8}(1 - e^{-8})$
(d) $48e^{-8}(1 - 5e^{-4})^2$

Solution:
First, we find the probability that $X_1 = 0$, $X_2 = 1$ and the other two random variables take neither the value 0 nor the value 1.

Since all four random variables are independent, we have
$$P(X_1 = 0, X_2 = 1, X_3 \notin \{0, 1\}, X_4 \notin \{0, 1\}) = P(X_1 = 0)\,P(X_2 = 1)\,P(X_3 \notin \{0, 1\})\,P(X_4 \notin \{0, 1\}) \quad \ldots(1)$$

Now,
$$P(X_1 = 0) = \frac{e^{-4} 4^0}{0!} = e^{-4}, \qquad P(X_2 = 1) = \frac{e^{-4} 4^1}{1!} = 4e^{-4}$$

$$P(X_3 \notin \{0, 1\}) = 1 - [P(X_3 = 0) + P(X_3 = 1)] = 1 - [e^{-4} + 4e^{-4}] = 1 - 5e^{-4}$$

$$P(X_4 \notin \{0, 1\}) = 1 - [P(X_4 = 0) + P(X_4 = 1)] = 1 - [e^{-4} + 4e^{-4}] = 1 - 5e^{-4}$$

Putting all these values in equation (1), we get
$$P(X_1 = 0, X_2 = 1, X_3 \notin \{0, 1\}, X_4 \notin \{0, 1\}) = e^{-4}(4e^{-4})(1 - 5e^{-4})^2$$

The ordered pair of indices (which $X_i$ equals 0, which $X_i$ equals 1) can be chosen in ${}^4P_2 = 12$ ways.
Therefore, the probability that exactly one of the $X_i$ equals 0 and exactly one of the $X_i$ equals 1 is
$${}^4P_2\; e^{-4}(4e^{-4})(1 - 5e^{-4})^2 = 48e^{-8}(1 - 5e^{-4})^2$$
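
The counting factor of 12 can be double-checked by enumerating all ways of labelling the four variables as "0", "1" or "other" and summing the probabilities of the labellings with exactly one "0" and one "1". This is a numerical sketch in floating point; the label names are illustrative.

```python
from itertools import product
from math import exp

lam = 4.0
p0 = exp(-lam)             # P(X_i = 0)
p1 = lam * exp(-lam)       # P(X_i = 1)
p_other = 1 - p0 - p1      # P(X_i >= 2)
prob = {"0": p0, "1": p1, "other": p_other}

# Sum over all label assignments with exactly one "0" and exactly one "1".
total = 0.0
for labels in product(prob, repeat=4):
    if labels.count("0") == 1 and labels.count("1") == 1:
        term = 1.0
        for label in labels:
            term *= prob[label]
        total += term

closed_form = 48 * exp(-8) * (1 - 5 * exp(-4)) ** 2
print(total, closed_form)
```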

10. Akshat draws a card randomly from a well-shuffled pack of 52 cards. If the drawn card
is a face card, then he draws two balls randomly from bag A which contains 5 Red, 6
Black and 4 Green balls. If the drawn card is not a face card, then he draws three balls
randomly from bag B which contains 7 Red, 8 Black and 5 Green balls. Let two random variables X and Y be defined as follows:
$$X = \begin{cases} 0 & \text{if the drawn card is a face card} \\ 1 & \text{if the drawn card is not a face card} \end{cases}$$
and Y is the number of Red balls drawn. Find the value of $f_Y(1)$. Write your answer correct up to two decimal places. [3 marks]
Solution:
Akshat draws a card randomly from a well-shuffled pack of 52 cards. Random variable X is defined as
$$X = \begin{cases} 0 & \text{if the drawn card is a face card} \\ 1 & \text{if the drawn card is not a face card} \end{cases}$$
If the drawn card is a face card, then he draws two balls randomly from bag A, which contains 5 Red, 6 Black and 4 Green balls. If the drawn card is not a face card, then he draws three balls randomly from bag B, which contains 7 Red, 8 Black and 5 Green balls. Random variable Y is the number of Red balls drawn.
To find: $f_Y(1)$
There are 12 face cards in the pack, so $f_X(0) = \frac{12}{52}$ and $f_X(1) = \frac{40}{52}$. We know that
$$\begin{aligned} f_Y(1) &= f_{XY}(0, 1) + f_{XY}(1, 1) \\ &= f_{Y|X=0}(1)\,f_X(0) + f_{Y|X=1}(1)\,f_X(1) \\ &= \frac{\binom{5}{1}\binom{10}{1}}{\binom{15}{2}} \cdot \frac{12}{52} + \frac{\binom{7}{1}\binom{13}{2}}{\binom{20}{3}} \cdot \frac{40}{52} \\ &\approx 0.1099 + 0.3684 = 0.4783 \approx 0.48 \end{aligned}$$
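
The total-probability computation can be carried out exactly to see how the two-decimal answer arises. In this sketch the variable names are illustrative; the exact value is about 0.4783, which rounds to 0.48.

```python
from fractions import Fraction
from math import comb

p_face = Fraction(12, 52)  # P(X = 0): 12 face cards in a 52-card pack

# f_{Y|X=0}(1): exactly one red among 2 balls from bag A (5 red, 10 non-red).
p_one_red_bag_a = Fraction(comb(5, 1) * comb(10, 1), comb(15, 2))
# f_{Y|X=1}(1): exactly one red among 3 balls from bag B (7 red, 13 non-red).
p_one_red_bag_b = Fraction(comb(7, 1) * comb(13, 2), comb(20, 3))

f_y_1 = p_one_red_bag_a * p_face + p_one_red_bag_b * (1 - p_face)
print(f_y_1, float(f_y_1))
```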

Statistics for Data Science - 2
Week 4 Practice Assignment
Expectation and variance

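1. If the expected value and variance of the Binomial random variable X are $\frac{5}{2}$ and $\frac{15}{8}$, respectively, then find the value of P(X = 10). [1 mark]

(a) $\left(\frac{3}{4}\right)^{10}$
(b) $10\left(\frac{3}{4}\right)^{10}$
(c) $\left(\frac{1}{4}\right)^{10}$
(d) $10\left(\frac{1}{4}\right)^{10}$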
Solution: If X ∼ Binomial(n, p), then the expected value and variance of X are given by np and np(1 − p), respectively.

Given that
$$E[X] = np = \frac{5}{2} \quad \ldots(1)$$
and
$$\mathrm{Var}(X) = np(1 - p) = \frac{15}{8} \quad \ldots(2)$$
Putting the value of np from equation (1) in equation (2), we get
$$(1 - p) = \frac{3}{4} \;\Rightarrow\; p = \frac{1}{4}.$$
Putting the value of p in equation (1), we get n = 10.
It implies that X ∼ Binomial(10, 1/4).

Therefore,
$$P(X = 10) = \binom{10}{10}\left(\frac{1}{4}\right)^{10}\left(\frac{3}{4}\right)^{0} = \left(\frac{1}{4}\right)^{10}$$

2. X and Y are two independent geometric random variables with parameters $\frac{1}{2}$ and $\frac{1}{4}$, respectively. Find the value of Var(X + 2Y). [1 mark]
Solution:
We know that if X ∼ Geometric(p), then $\mathrm{Var}(X) = \frac{1 - p}{p^2}$.
Therefore,
$$\mathrm{Var}(X) = \frac{1 - \frac{1}{2}}{\frac{1}{4}} = 2 \quad \ldots(1) \qquad \mathrm{Var}(Y) = \frac{1 - \frac{1}{4}}{\frac{1}{16}} = 12 \quad \ldots(2)$$

Now, since X and Y are independent, we have
$$\mathrm{Var}(X + 2Y) = \mathrm{Var}(X) + 2^2\,\mathrm{Var}(Y) = 2 + 48 = 50$$

3. The number of spam messages (X) sent to a server in a day has Poisson distribution with parameter λ = 21. Each spam message independently has a probability of p = 1/3 of not being detected by the spam filter. Let Y denote the number of spam messages detected by the filter in a day. Calculate the expected value of X + Y. [2 marks]

Solution:
X denotes the number of spam messages sent to the server in a day and

X ∼ Poisson(21)

Y denotes the number of spam messages detected by the filter in a day.

It is given that each spam message independently has a probability of 1/3 of not being detected, so it is detected with probability 2/3. It implies that

Y | X ∼ Binomial(X, 2/3)

Recall that if N ∼ Poisson(λ) and Z | N ∼ Binomial(N, p), then Z ∼ Poisson(λp).

Therefore, Y ∼ Poisson(21 × 2/3) = Poisson(14)

E[X] = 21 and E[Y] = 14

⇒ E[X + Y] = E[X] + E[Y] = 35

4. Two random variables X and Y are jointly distributed with the joint pmf
$$f_{XY}(x, y) = \frac{1}{9}(x + y),$$
where x and y are integers in 0 ≤ x ≤ 2 and 0 ≤ y ≤ 1. Let $Z = XY + Y^2$. Find the expected value of Z. [2 marks]
(a) $\frac{1}{3}$
(b) $\frac{4}{3}$
(c) $\frac{2}{3}$
(d) $\frac{14}{9}$
Solution:

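$$\begin{aligned} E[Z] &= E[XY + Y^2] = \sum_{0 \le x \le 2;\ 0 \le y \le 1} (xy + y^2)\,f_{XY}(x, y) \\ &= \frac{1}{9} \sum_{0 \le x \le 2;\ 0 \le y \le 1} (xy + y^2)(x + y) \\ &= \frac{1}{9}(1 + 4 + 9) = \frac{14}{9} \end{aligned}$$

Only the terms with y = 1 are non-zero, and they contribute $(x + 1)^2$ for x = 0, 1, 2.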
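
The expectation can be reproduced by summing over the six support points. A short verification sketch (names are illustrative):

```python
from fractions import Fraction

support = [(x, y) for x in range(3) for y in range(2)]
pmf = {(x, y): Fraction(x + y, 9) for x, y in support}

# E[Z] with Z = XY + Y^2, summed over the joint pmf.
expected_z = sum((x * y + y**2) * p for (x, y), p in pmf.items())
print(sum(pmf.values()), expected_z)
```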
5. The distribution of a certain company's employees' monthly salary has mean ₹60000 and standard deviation ₹20000. The probability that a randomly selected employee from that company has a salary either greater than or equal to ₹100000 or less than or equal to ₹20000 is: [2 marks]
(a) at least $\frac{1}{4}$
(b) at most $\frac{1}{4}$
(c) at least $\frac{1}{2}$
(d) at most $\frac{1}{2}$
Solution:
Let X denote an employee's monthly salary.
Given that E[X] = µ = 60000 and SD(X) = σ = 20000.

$$\begin{aligned} P(X \ge 100000 \text{ or } X \le 20000) &= P(X - 60000 \ge 40000 \text{ or } X - 60000 \le -40000) \\ &= P(|X - 60000| \ge 40000) \\ &= P(|X - \mu| \ge 2\sigma) \\ &\le \frac{1}{4} \quad \text{(by Chebyshev's inequality with } k = 2\text{)} \end{aligned}$$

Hence, the probability that a randomly selected employee from that company has a salary either greater than or equal to ₹100000 or less than or equal to ₹20000 is at most $\frac{1}{4}$.

6. Two random variables X and Y are jointly distributed with the joint pmf
$$f_{XY}(x, y) = \frac{1}{27}(xy + x + y + 1),$$
where x and y are integers in 0 ≤ x ≤ 1 and 1 ≤ y ≤ 3. Find the correlation coefficient of X and Y. [2 marks]
Solution:

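$$E[X] = \sum_{x \in T_X,\, y \in T_Y} x\,f_{XY}(x, y) = \frac{1}{27}\sum_{x \in T_X,\, y \in T_Y} x(xy + x + y + 1) = \frac{1}{27}(4 + 6 + 8) = \frac{18}{27} = \frac{2}{3}$$

$$E[Y] = \sum_{x \in T_X,\, y \in T_Y} y\,f_{XY}(x, y) = \frac{1}{27}\sum_{x \in T_X,\, y \in T_Y} y(xy + x + y + 1) = \frac{1}{27}(2 + 6 + 12 + 4 + 12 + 24) = \frac{60}{27} = \frac{20}{9}$$

$$E[XY] = \sum_{x \in T_X,\, y \in T_Y} xy\,f_{XY}(x, y) = \frac{1}{27}\sum_{x \in T_X,\, y \in T_Y} xy(xy + x + y + 1) = \frac{1}{27}(4 + 12 + 24) = \frac{40}{27}$$

$$\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] = \frac{40}{27} - \frac{2}{3} \cdot \frac{20}{9} = 0$$

We know that
$$\text{Correlation coefficient} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = 0$$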
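
The three expectations and the covariance can be checked by summing over the joint pmf. The helper `expect` below is written for this sketch and is not from the assignment.

```python
from fractions import Fraction

pmf = {(x, y): Fraction(x * y + x + y + 1, 27)
       for x in (0, 1) for y in (1, 2, 3)}


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_xy = expect(lambda x, y: x * y)
cov = e_xy - e_x * e_y
print(e_x, e_y, e_xy, cov)
```

The covariance is zero here because the pmf factorises as (x + 1)(y + 1)/27, so X and Y are in fact independent.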

7. Let X and Y be two independent random variables such that X ∼ Binomial(4, 1/2) and Y ∼ Uniform({1, 2, 3}). Find the value of $\mathrm{Cov}(2X + Y,\ X + Y^2 X)$. [2 marks]

(a) 16.67
(b) 6.67
(c) 13.37
(d) 0

Solution:

$$\begin{aligned} \mathrm{Cov}(2X + Y, X + Y^2X) &= \mathrm{Cov}(2X, X + Y^2X) + \mathrm{Cov}(Y, X + Y^2X) \\ &= \mathrm{Cov}(2X, X) + \mathrm{Cov}(2X, Y^2X) + \mathrm{Cov}(Y, X) + \mathrm{Cov}(Y, Y^2X) \\ &= 2\,\mathrm{Var}(X) + 2\,(E[X^2Y^2] - E[X]E[Y^2X]) + (E[XY] - E[X]E[Y]) \\ &\quad + (E[XY^3] - E[Y]E[Y^2X]) \end{aligned}$$

Since X and Y are independent random variables, the pairs $(X^2, Y^2)$, $(X, Y^2)$ and $(X, Y^3)$ are also independent. It implies that
$$E[X^2Y^2] = E[X^2]E[Y^2], \qquad E[Y^2X] = E[Y^2]E[X], \qquad E[XY^3] = E[X]E[Y^3]$$
and E[XY] − E[X]E[Y] = 0.

Therefore,
$$\mathrm{Cov}(2X + Y, X + Y^2X) = 2\,\mathrm{Var}(X) + 2\,(E[X^2]E[Y^2] - E[X]^2E[Y^2]) + E[X]E[Y^3] - E[Y]E[Y^2]E[X]$$

Now, X ∼ Binomial(4, 1/2). Therefore,
E[X] = np = 2
Var(X) = np(1 − p) = 1
$E[X^2] = \mathrm{Var}(X) + (E[X])^2 = 1 + 4 = 5$

And Y ∼ Uniform({1, 2, 3}), so
$E[Y] = \frac{1}{3}(1 + 2 + 3) = 2$
$E[Y^2] = \frac{1}{3}(1 + 4 + 9) = \frac{14}{3}$
$E[Y^3] = \frac{1}{3}(1 + 8 + 27) = 12$

Therefore,
$$\mathrm{Cov}(2X + Y, X + Y^2X) = 2(1) + 2\left(\frac{70}{3} - \frac{56}{3}\right) + 24 - \frac{56}{3} = 26 - \frac{28}{3} = \frac{50}{3} \approx 16.67$$
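
Since both variables have small finite supports, the covariance can also be computed directly from the definition Cov(U, V) = E[UV] − E[U]E[V] with U = 2X + Y and V = X + Y²X. This sketch is a cross-check, with helper names chosen for illustration.

```python
from fractions import Fraction
from math import comb

# Joint pmf of independent X ~ Binomial(4, 1/2) and Y ~ Uniform({1, 2, 3}).
pmf = {(x, y): Fraction(comb(4, x), 16) * Fraction(1, 3)
       for x in range(5) for y in (1, 2, 3)}


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


def u(x, y):
    return 2 * x + y


def v(x, y):
    return x + y**2 * x


cov = expect(lambda x, y: u(x, y) * v(x, y)) - expect(u) * expect(v)
print(cov, float(cov))
```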

8. The joint distribution of two random variables X and Y is given as:

X
0 1 2
Y
2 5
-1 0
17 17
1 2
0 0
17 17
3 4
1 0
17 17

Table 4.1.P: Joint distribution of X and Y .

Find the standard deviation of the product of the two random variables. (Write your
answer correct up to two decimal points.) [2 marks]
Solution:
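To find: SD(XY)

$$E[XY] = \sum_{x \in T_X,\, y \in T_Y} xy\,f_{XY}(x, y) = -1\left(\frac{2}{17}\right) - 2\left(\frac{5}{17}\right) + 2\left(\frac{4}{17}\right) = \frac{-4}{17}$$

$$E[(XY)^2] = \sum_{x \in T_X,\, y \in T_Y} x^2y^2\,f_{XY}(x, y) = 1\left(\frac{2}{17}\right) + 4\left(\frac{5}{17}\right) + 4\left(\frac{4}{17}\right) = \frac{38}{17}$$

$$\mathrm{Var}(XY) = E[(XY)^2] - (E[XY])^2 = \frac{38}{17} - \frac{16}{289} = \frac{630}{289}$$

Therefore,
$$SD(XY) = \sqrt{\mathrm{Var}(XY)} = \sqrt{\frac{630}{289}} \approx 1.476,$$
which is 1.48 when rounded to two decimal places.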

9. An ice-cream seller sells ice creams at three prices: ₹30, ₹40, and ₹50. A random customer will buy an ice cream of ₹30, ₹40 and ₹50 with probabilities of 0.5, 0.3, and 0.2, respectively. If the number of customers in a day follows Poisson distribution with λ = 60, what is the expected sales (in ₹) of the seller in a day? [3 marks]
Solution:
Let X denote the number of customers coming to the ice-cream seller in a day; then

X ∼ Poisson(60)

Let Y denote the price at which a customer buys an ice cream; then
E[Y] = 30(0.5) + 40(0.3) + 50(0.2) = 37

If X = x customers come to the shop, then the expected sales will be xE[Y], assuming each customer's purchase does not depend on the number of customers. Averaging over X gives

E[sales] = E[X]E[Y]

Since X ∼ Poisson(60), E[X] = 60, so the expected sales of the day will be

60E[Y] = 60(37) = 2220

10. An urn contains 10 balls numbered from 1 to 10. We remove six balls randomly and add up their numbers. Let X denote the sum of the numbers of the removed balls. Find the expected value of X. [3 marks]
(Hint: Suppose $X_i$ denotes the number of the i-th removed ball, then $X = \sum_{i=1}^{6} X_i$.)

Solution:
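Let $X_i$, i = 1, 2, ..., 6, denote the number on the i-th removed ball. Then
$$X = \sum_{i=1}^{6} X_i \;\Rightarrow\; E[X] = E\left[\sum_{i=1}^{6} X_i\right] = \sum_{i=1}^{6} E[X_i] = 6\,E[X_i] \quad \ldots(1)$$
since each removed ball is equally likely to carry any of the ten numbers, so all the $X_i$ have the same distribution.
Now, $E[X_i] = \frac{1}{10}[1 + 2 + 3 + \ldots + 10] = \frac{11}{2}$.
Putting the value in equation (1), we get
$$E[X] = 6 \times \frac{11}{2} = 33$$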
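
The result can be confirmed without using linearity, by averaging the sum over every possible set of six balls. A brute-force sketch:

```python
from fractions import Fraction
from itertools import combinations

balls = range(1, 11)
draws = list(combinations(balls, 6))  # all C(10, 6) equally likely sets

# Average of the sum of numbers over all possible draws.
expected_sum = Fraction(sum(sum(d) for d in draws), len(draws))
print(len(draws), expected_sum)
```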

Statistics for Data Science - 2
Week 4 Graded assignment solutions

1. Suppose 1 in 100 products that are coming out of a production line is defective. Suppose we randomly pick and keep aside products from the production line till the first defective item is obtained. Let the random variable X represent the number of products that are kept aside (assume that the first defective item is also kept aside). Find Var(X). [1 mark]
1
(a)
100
99
(b)
100
(c) 100
(d) 9900

Solution:
The random variable X represents the number of products that are kept aside until the first defective item is obtained (including the first defective item).
It is given that 1 out of 100 products is defective.
Therefore, X ∼ Geometric(1/100).
Now,
$$\mathrm{Var}(X) = \frac{1 - p}{p^2} = \frac{1 - \frac{1}{100}}{\left(\frac{1}{100}\right)^2} = 9900$$

Hence, the correct option is (d).

2. Two coins are tossed. The probabilities of occurrence of tail on the first and the second
coin are 0.6 and 0.4, respectively. If the random variable X represents the number of
heads obtained, find the expected value of X. (Enter the answer correct to 2 decimal
points). [1 mark]
Answer: 1
Solution:

Given,

P(tail occurs on the first coin) = 0.6 ...(1)
P(tail occurs on the second coin) = 0.4 ...(2)

so P(head on the first coin) = 0.4 and P(head on the second coin) = 0.6.
Random variable X denotes the number of heads obtained after tossing the two coins.
Therefore, X takes values in {0, 1, 2}.
Now,
$$E(X) = \sum_{x} x\,P(X = x) = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + 2 \cdot P(X = 2)$$

P(X = 1) = P(H on first coin and T on second coin) + P(T on first coin and H on second coin)
= (0.4 × 0.4) + (0.6 × 0.6)
= 0.52

P(X = 2) = P(H on both the coins) = (0.4 × 0.6) = 0.24

Therefore, E(X) = 0.52 + (2 × 0.24) = 1

3. Let the two random variables X and Y be independent with means equal to 10 and
20, and variances equal to 2 and 4, respectively. Find the value of Var(XY ).
Hint: If X and Y are independent, X 2 and Y 2 are also independent. [1 mark]
Answer: 1208
Solution:
Mean and variance of X is 10 and 2, respectively.
Mean and variance of Y is 20 and 4, respectively.

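$$\begin{aligned} \mathrm{Var}(XY) &= E[(XY)^2] - (E[XY])^2 \\ &= E[X^2Y^2] - (E[X]E[Y])^2 && (X \text{ and } Y \text{ are independent}) \\ &= E[X^2]E[Y^2] - E[X]^2E[Y]^2 && (X^2 \text{ and } Y^2 \text{ are independent}) \\ &= (\mathrm{Var}(X) + E[X]^2)(\mathrm{Var}(Y) + E[Y]^2) - E[X]^2E[Y]^2 \\ &= (2 + 10^2)(4 + 20^2) - 10^2 \cdot 20^2 \\ &= (102 \times 404) - 40000 \\ &= 41208 - 40000 \\ &= 1208 \end{aligned}$$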
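
The final formula depends only on the two means and variances, so it can be wrapped in a small helper. The function name below is hypothetical and the formula assumes X and Y are independent.

```python
def var_of_product(mean_x, var_x, mean_y, var_y):
    """Var(XY) for independent X and Y.

    Uses Var(XY) = E[X^2] E[Y^2] - (E[X] E[Y])^2 with E[X^2] = Var(X) + E[X]^2.
    """
    second_moment_x = var_x + mean_x**2
    second_moment_y = var_y + mean_y**2
    return second_moment_x * second_moment_y - (mean_x * mean_y) ** 2


print(var_of_product(10, 2, 20, 4))
```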

4. Let X and Y be two independent discrete random variables. Define random variables
U and V as

$$U = \frac{X - E(X)}{SD(X)}, \qquad V = \frac{Y - E(Y)}{SD(Y)}$$
Find Cov(U, V). [1 mark]
Answer: 0
Solution:
Cov(U, V) = E(UV) − E(U)E(V).
Since U and V are the standardized forms of the random variables X and Y, respectively,

E(U) = E(V) = 0 and Var(U) = Var(V) = 1

Now,
$$\begin{aligned} \mathrm{Cov}(U, V) &= E(UV) \\ &= E\left[\left(\frac{X - E(X)}{SD(X)}\right)\left(\frac{Y - E(Y)}{SD(Y)}\right)\right] \\ &= \frac{1}{SD(X)\,SD(Y)}\,E[(X - E(X))(Y - E(Y))] \\ &= \frac{1}{SD(X)\,SD(Y)}\,E[XY - X E(Y) - Y E(X) + E(X)E(Y)] \\ &= \frac{1}{SD(X)\,SD(Y)}\,\big(E[XY] - E[X]E[Y] - E[Y]E[X] + E[X]E[Y]\big) \end{aligned}$$

Since X and Y are independent, E(XY) = E(X)E(Y).

Therefore, $\mathrm{Cov}(U, V) = \frac{1}{SD(X)\,SD(Y)}\,\big(E[X]E[Y] - E[X]E[Y]\big) = 0$.
Use the following information to answer questions (5) and (6).
Number of people (X) who make a reservation in a restaurant a day is a random
variable with mean equal to 10 and variance equal to 2.

5. Using Markov's inequality, find a bound on the probability that on a particular day, the number of reservations will exceed 30. [1 mark]
(a) $P(X > 30) \le \frac{1}{4}$
(b) $P(X > 30) \ge \frac{1}{3}$
(c) $P(X > 30) \le \frac{10}{31}$
(d) $P(X > 30) > \frac{10}{31}$

Solution:
Random variable X represents the number of people who make a reservation in a restaurant. It is given that

E(X) = 10 ...(3)

Using Markov's inequality, we know that for a non-negative random variable with mean µ and any c > 0,
$$P(X \ge c) \le \frac{\mu}{c}$$
Since X is integer-valued, $P(X > 30) = P(X \ge 31) \le \frac{10}{31}$.
Therefore, the correct option is (c).

6. Find a bound on the probability that on a particular day, the number of reservations made will lie between 6 and 14 using Chebyshev's inequality. [2 marks]
(a) $P(6 < X < 14) \le \frac{7}{8}$
(b) $P(6 < X < 14) \ge \frac{7}{8}$
(c) $P(6 < X < 14) > \frac{7}{8}$
(d) $P(6 < X < 14) \le \frac{1}{8}$
Solution:
Using Chebyshev's inequality, we know that
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \quad \ldots(4)$$
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2} \quad \ldots(5)$$
Given µ = 10 and $\sigma^2 = 2$.
Now, using (5), we can write P(6 < X < 14) as
$$P(10 - k\sigma < X < 10 + k\sigma) \ge 1 - \frac{1}{k^2}$$
Now, let

10 − kσ = 6 ...(6)
10 + kσ = 14 ...(7)

Solving (6) and (7), we get kσ = 4
$$\Rightarrow k = \frac{4}{\sigma} \;\Rightarrow\; k^2 = \frac{16}{2} = 8$$
Therefore, $P(6 < X < 14) \ge 1 - \frac{1}{8} = \frac{7}{8}$.
Hence, the correct option is (b).

7. The joint probability mass function of three discrete random variables X, Y and Z is given as
$$p(0, 1, 2) = p(0, 2, 3) = p(1, 0, -2) = \frac{1}{3}$$
Calculate Var(XY + 2Z). [2 marks]
(a) $\frac{52}{9}$
(b) $\frac{32}{9}$
(c) $\frac{80}{3}$
(d) $\frac{56}{3}$
Solution:

  t1    t2    t3    t1·t2 + 2·t3    f_XYZ(t1, t2, t3)
  0     1     2     4               1/3
  0     2     3     6               1/3
  1     0     −2    −4              1/3

Joint PMF of X, Y and Z.

XY + 2Z takes the values in {−4, 6, 4} with probability $\frac{1}{3}$ each.

$$E(XY + 2Z) = \frac{1}{3}[-4 + 6 + 4] = \frac{6}{3} = 2$$

$$E[(XY + 2Z)^2] = \frac{1}{3}[(-4)^2 + 6^2 + 4^2] = \frac{1}{3}[16 + 36 + 16] = \frac{68}{3}$$

Now,
$$\mathrm{Var}(XY + 2Z) = E[(XY + 2Z)^2] - [E(XY + 2Z)]^2 = \frac{68}{3} - 2^2 = \frac{56}{3}$$
Hence, the correct option is (d).
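
With only three support points, the mean and variance of XY + 2Z can be verified in a few lines. A minimal sketch with illustrative names:

```python
from fractions import Fraction

# Support points (x, y, z), each with probability 1/3.
points = [(0, 1, 2), (0, 2, 3), (1, 0, -2)]
p = Fraction(1, 3)

values = [x * y + 2 * z for x, y, z in points]
mean = sum(v * p for v in values)
variance = sum(v**2 * p for v in values) - mean**2
print(values, mean, variance)
```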
8. An urn contains 5 white balls and 5 red balls. 2 balls are selected at random. Let
X denote the number of red balls drawn and let Y denote the number of white balls
drawn. Find the correlation coefficient between X and Y . [2 marks]
(a) ρ(X, Y ) = 1
(b) ρ(X, Y ) = −1
(c) ρ(X, Y ) = 0
(d) ρ(X, Y ) = −0.5
Solution:
Two balls are selected at random from the urn containing 5 white and 5 red balls.
Random variable X represent the number of red balls drawn.
Therefore, X will take values in {0, 1, 2}.
Random variable Y represent the number of white balls drawn.
Therefore, Y will take values in {0, 1, 2}.
Since exactly two balls are drawn, X + Y = 2. The joint probability distribution of X and Y is given by

          X = 0   X = 1   X = 2
  Y = 0   0       0       10/45
  Y = 1   0       25/45   0
  Y = 2   10/45   0       0

Joint distribution of X and Y.

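Now,
$$E(X) = \left(0 \times \frac{10}{45}\right) + \left(1 \times \frac{25}{45}\right) + \left(2 \times \frac{10}{45}\right) = 1$$

Similarly, E(Y) = 1.

$$E(X^2) = \left(0^2 \times \frac{10}{45}\right) + \left(1^2 \times \frac{25}{45}\right) + \left(2^2 \times \frac{10}{45}\right) = \frac{65}{45}$$

Similarly, $E(Y^2) = \frac{65}{45}$.

Now, $\mathrm{Var}(X) = \mathrm{Var}(Y) = \frac{65}{45} - (1)^2 = \frac{20}{45}$

$$E(XY) = \left(0 \times \frac{10}{45}\right) + \left(1 \times \frac{25}{45}\right) + (2 \times 0) = \frac{25}{45}$$

The correlation coefficient between X and Y is given by
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{SD(X)\,SD(Y)} = \frac{E(XY) - E(X)E(Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\frac{25}{45} - 1}{\sqrt{\frac{20}{45} \times \frac{20}{45}}} = -1$$

Therefore, the correct option is (b).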
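
The moments can be recomputed exactly from the hypergeometric counts. In the sketch below, the square root in the correlation is avoided by using the fact that Var(X) = Var(Y) in this problem; that shortcut is specific to this example.

```python
from fractions import Fraction
from math import comb

total = comb(10, 2)  # 45 ways to draw 2 balls out of 10
# X = number of red balls, Y = 2 - X = number of white balls.
pmf = {(x, 2 - x): Fraction(comb(5, x) * comb(5, 2 - x), total)
       for x in range(3)}


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - e_x**2
var_y = expect(lambda x, y: y * y) - e_y**2
cov = expect(lambda x, y: x * y) - e_x * e_y

# rho = cov / sqrt(var_x * var_y); here var_x == var_y, so sqrt(...) = var_x.
rho = cov / var_x
print(var_x, var_y, cov, rho)
```

A correlation of exactly −1 is expected, since Y = 2 − X is a decreasing linear function of X.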

9. Five students each from class 8, 9 and 10 have been nominated for the formation of the school committee. The number of boys and girls who are nominated from each of the classes is given in Table 4.1.A.

          Class 8   Class 9   Class 10
  Girls   2         2         3
  Boys    3         3         2

Table 4.1.A: Total number of boys and girls nominated.

If the committee comprises two students from each class, find the expected number of girls in the committee. (Enter the answer correct to 1 decimal point) [2 marks]
Answer: 2.8

Solution:
Let $X_1$ represent the number of girls from class eight in the school committee.
Let $X_2$ represent the number of girls from class nine in the school committee.
Let $X_3$ represent the number of girls from class ten in the school committee.
We need to find $E(X_1 + X_2 + X_3)$.
We know that $E(X_1 + X_2 + X_3) = E(X_1) + E(X_2) + E(X_3)$.

Since the total number of girls nominated from class eight is 2, the committee can contain 0, 1 or 2 girls from class eight, i.e. $X_1$ takes values in {0, 1, 2}.
Now,
$$P(X_1 = 0) = \frac{\binom{3}{2}}{\binom{5}{2}} = \frac{3}{10}, \qquad P(X_1 = 1) = \frac{\binom{3}{1} \times \binom{2}{1}}{\binom{5}{2}} = \frac{6}{10}, \qquad P(X_1 = 2) = \frac{\binom{2}{2}}{\binom{5}{2}} = \frac{1}{10}$$

Therefore, $E(X_1) = \left(0 \times \frac{3}{10}\right) + \left(1 \times \frac{6}{10}\right) + \left(2 \times \frac{1}{10}\right) = \frac{8}{10}$

Similarly, the total number of girls nominated from class nine is 2, so $X_2$ takes values in {0, 1, 2} with the same distribution, hence $E(X_2) = \frac{8}{10}$.

The total number of girls nominated from class ten is 3 and we have to select 2 students from each class, so the committee can contain 0, 1 or 2 girls from class ten, i.e. $X_3$ takes values in {0, 1, 2}.
Now,
$$P(X_3 = 0) = \frac{\binom{2}{2}}{\binom{5}{2}} = \frac{1}{10}, \qquad P(X_3 = 1) = \frac{\binom{3}{1} \times \binom{2}{1}}{\binom{5}{2}} = \frac{6}{10}, \qquad P(X_3 = 2) = \frac{\binom{3}{2}}{\binom{5}{2}} = \frac{3}{10}$$

Therefore, $E(X_3) = \left(0 \times \frac{1}{10}\right) + \left(1 \times \frac{6}{10}\right) + \left(2 \times \frac{3}{10}\right) = \frac{12}{10}$

Now,
$$E(X_1 + X_2 + X_3) = E(X_1) + E(X_2) + E(X_3) = \frac{8}{10} + \frac{8}{10} + \frac{12}{10} = 2.8$$

Hence, the expected number of girls in the school committee is 2.8.
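
Each class contributes a hypergeometric count, so the three expectations can be computed with one helper and added. The dictionary keys and function name below are illustrative.

```python
from fractions import Fraction
from math import comb

girls_per_class = {"Class 8": 2, "Class 9": 2, "Class 10": 3}
CLASS_SIZE, PICKED = 5, 2


def expected_girls(girls):
    """Expected number of girls among PICKED students, from the pmf itself."""
    boys = CLASS_SIZE - girls
    ways = comb(CLASS_SIZE, PICKED)
    return sum(k * Fraction(comb(girls, k) * comb(boys, PICKED - k), ways)
               for k in range(PICKED + 1))


total = sum(expected_girls(g) for g in girls_per_class.values())
print(total, float(total))
```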

10. A share of a company costs ₹1000 today. Suppose today's share price increases by 50% with probability 0.6 and decreases by 50% with probability 0.4. Independent of today, suppose that tomorrow's share price increases by 20% with probability 0.2, and decreases by 30% with probability 0.8. If you decide to buy 3 shares today, find the expected profit (in ₹) at the end of 2 days. [2 marks]

(a) -120
(b) 360
(c) 120
(d) -360

Solution:
The cost price of a share of the company is |1000.
Let the random variable X represent the price of the share at the end of 2 days.
Price can either go up by 50% with probability 0.6 or can go down by 50% with prob-
ability 0.4 on the first day.
Independent of today, the share price can either go up by 20% with probability 0.2 or
can go down by 30% with probability 0.8.

i.e. if the share price increases by 50% on the first day, the price of the share will
become ₹1500.
If the price then increases by 20% on the second day, the price at the end of two days
is ₹$\left(1500 + 1500 \times \dfrac{20}{100}\right)$ = ₹1800, with probability (0.6 × 0.2) = 0.12.
Similarly, if the price then decreases by 30%, the price at the end of two days
is ₹$\left(1500 - 1500 \times \dfrac{30}{100}\right)$ = ₹1050, with probability (0.6 × 0.8) = 0.48.

Again, if the share price decreases by 50% on the first day, the price of the share will
become ₹500.
If the price then increases by 20% on the second day, the price at the end of two days
is ₹$\left(500 + 500 \times \dfrac{20}{100}\right)$ = ₹600, with probability (0.4 × 0.2) = 0.08.
Similarly, if the price then decreases by 30%, the price at the end of two days
is ₹$\left(500 - 500 \times \dfrac{30}{100}\right)$ = ₹350, with probability (0.4 × 0.8) = 0.32.

Therefore, X will take values in {1800, 1050, 600, 350}, where

P (X = 1800) = 0.12
P (X = 1050) = 0.48
P (X = 600) = 0.08
P (X = 350) = 0.32
Now,

E(X) = (1800 × 0.12) + (1050 × 0.48) + (600 × 0.08) + (350 × 0.32) = 216 + 504 + 48 + 112 = 880

The expected gain at the end of two days if you buy one share is ₹(880 − 1000) = −₹120.
Therefore, if you buy 3 shares of the company, the expected gain will be −₹360.

Hence, the correct option is (d).
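
As a cross-check of the enumeration above, the four two-day outcomes can be generated programmatically. This is a minimal sketch; the variable names (`day1`, `day2`, `expected_price`) are illustrative and not part of the original solution.

```python
from itertools import product

start_price = 1000
# (price multiplier, probability) for each day, taken from the problem statement
day1 = [(1.5, 0.6), (0.5, 0.4)]
day2 = [(1.2, 0.2), (0.7, 0.8)]

# The two days are independent, so the joint probability is the product
expected_price = sum(
    start_price * m1 * m2 * p1 * p2
    for (m1, p1), (m2, p2) in product(day1, day2)
)
expected_profit = 3 * (expected_price - start_price)  # three shares bought
print(round(expected_price, 2), round(expected_profit, 2))
```

Up to floating-point rounding, this reproduces E(X) = 880 and an expected profit of −360.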

11. A lottery has 500 tickets out of which only 2 tickets contain prizes worth ₹500 and
₹1,000; the rest are worth ₹0. If one has bought 2 tickets, what will be his/her
expected gain (in ₹)? [2 marks]
Answer: 6

Solution:
In the lottery, only two tickets out of 500 contain prizes worth ₹500 and ₹1,000.
If one has bought two tickets, the total prize can be worth ₹0, ₹500, ₹1,000 or ₹1,500.
Let the random variable X represent the worth of the prizes of two tickets.
Therefore, X will take values in {0, 500, 1000, 1500}.

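$P(X = 0) = P(\text{both tickets are worth ₹0}) = \dfrac{{}^{498}C_{2}}{{}^{500}C_{2}}$
$P(X = 500) = P(\text{one ticket is worth ₹0 and the other is worth ₹500}) = \dfrac{{}^{498}C_{1} \times {}^{1}C_{1}}{{}^{500}C_{2}}$
$P(X = 1000) = P(\text{one ticket is worth ₹0 and the other is worth ₹1000}) = \dfrac{{}^{498}C_{1} \times {}^{1}C_{1}}{{}^{500}C_{2}}$
$P(X = 1500) = P(\text{one ticket is worth ₹500 and the other is worth ₹1000}) = \dfrac{{}^{2}C_{2}}{{}^{500}C_{2}}$

$E(X) = \left(0 \times \dfrac{{}^{498}C_{2}}{{}^{500}C_{2}}\right) + \left(500 \times \dfrac{{}^{498}C_{1} \times {}^{1}C_{1}}{{}^{500}C_{2}}\right) + \left(1000 \times \dfrac{{}^{498}C_{1} \times {}^{1}C_{1}}{{}^{500}C_{2}}\right) + \left(1500 \times \dfrac{{}^{2}C_{2}}{{}^{500}C_{2}}\right)$
$= \dfrac{1}{{}^{500}C_{2}} \left[500 \times {}^{498}C_{1} + 1000 \times {}^{498}C_{1} + 1 \times 1500\right]$
$= \dfrac{1}{{}^{500}C_{2}} \left[249000 + 498000 + 1500\right]$
$= \dfrac{748500}{124750} = 6$
Therefore, the expected gain is ₹6.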
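
The probability mass function and the expectation can be verified with exact rational arithmetic. The sketch below uses a dictionary named `pmf` purely for illustration; it also confirms that the four probabilities sum to 1.

```python
from fractions import Fraction
from math import comb

total = comb(500, 2)  # ways to choose 2 tickets out of 500

# Total prize value -> probability, following the solution above
pmf = {
    0: Fraction(comb(498, 2), total),
    500: Fraction(498, total),
    1000: Fraction(498, total),
    1500: Fraction(1, total),
}
assert sum(pmf.values()) == 1  # sanity check: probabilities sum to one

expected_gain = sum(value * p for value, p in pmf.items())
print(expected_gain)  # 6
```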

12. Number of cars (X) that visit Garage A each day is a random variable with mean 45
and variance 10 while the number of cars (Y ) that visit Garage B each day is a random
variable with mean 45 and variance 20. If the arrival of cars in garages A and B are
independent, find an upper bound on the probability that the difference in the number
of cars arriving in Garage A and Garage B on a particular day is greater than or equal
to 10. [3 marks]
(a) $\dfrac{3}{10}$
(b) $\dfrac{2}{10}$
(c) $\dfrac{1}{10}$
(d) $\dfrac{1}{4}$
Solution:
The random variable X represents the number of cars that arrive at Garage A each day.
Let the mean and variance of X be denoted by $\mu_X$ and $\sigma_X^2$ respectively.
Given $\mu_X = 45$, $\sigma_X^2 = 10$.
The random variable Y represents the number of cars that arrive at Garage B each day.
Let the mean and variance of Y be denoted by $\mu_Y$ and $\sigma_Y^2$ respectively.
Given $\mu_Y = 45$, $\sigma_Y^2 = 20$.
The arrivals of cars at Garages A and B are independent, which implies that X and Y are
independent.
The difference in the number of cars arriving at Garage A and Garage B is given by $|X - Y|$.
Let $\mu = E(X - Y) = E(X) - E(Y) = 45 - 45 = 0$
and $\sigma^2 = Var(X - Y) = Var(X) + Var(Y) = 10 + 20 = 30$ (since X and Y are
independent).

Using Chebyshev's inequality,
$P\{|(X - Y) - 0| \ge k\sigma\} \le \dfrac{1}{k^2}$    (8)
for k > 0.
Substituting $k\sigma = 10$, i.e. $k^2 = \dfrac{100}{30}$, in equation (8), we get
$P\{|X - Y| \ge 10\} \le \dfrac{1}{\left(\frac{100}{30}\right)}$

$\Rightarrow P\{|X - Y| \ge 10\} \le \dfrac{3}{10}$
Therefore, option (a) is correct.
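
The bound can be reproduced in a few lines. This is a sketch with illustrative variable names; it uses the equivalent form of Chebyshev's inequality, P(|W − μ| ≥ t) ≤ Var(W)/t², applied to W = X − Y.

```python
from fractions import Fraction

var_x, var_y = 10, 20
variance_diff = var_x + var_y  # X and Y independent, so variances add
threshold = 10                 # we bound P(|X - Y| >= 10)

# Chebyshev: P(|W - mean| >= t) <= Var(W) / t**2
bound = Fraction(variance_diff, threshold ** 2)
print(bound)  # 3/10
```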

Statistics for Data Science - 2

Week 5 Practice Assignment Solutions

1. The probability density function of a continuous random variable X is shown in Figure
5.1.P.

Figure 5.1.P: Probability Density Function graph of X

The PDF is defined as follows:

$f_X(x) = \begin{cases} e^{-x} & x \ge 0 \\ 0 & x < 0 \end{cases}$
Find $P(-\epsilon < X < 0)$, where $\epsilon$ is a very small positive number.

(a) e
(b) 0
(c) e−
(d) e−2

Answer: b
Solution:
We know that $P(-\epsilon < X < 0) = \int_{-\epsilon}^{0} f_X(x)\,dx$.
But the value of $f_X(x)$ is zero in the range $-\epsilon$ to zero.
Therefore, $P(-\epsilon < X < 0) = 0$.
Therefore, option b is the correct option.

2. Which of the following statements is/are true for a continuous random variable with
PDF fX (x)?

(a) If $f_X(2) = 2f_X(1)$, then $P(2 - \epsilon < X < 2 + \epsilon) = 2P(1 - \epsilon < X < 1 + \epsilon)$ for a small
$\epsilon$.
(b) If $f_X(2) = 2f_X(1)$, then $P(2 - \epsilon < X < 2 + \epsilon) \approx 2P(1 - \epsilon < X < 1 + \epsilon)$ for a small
$\epsilon$.
(c) P (X = x0 ) = 0 for any value of x0 .
(d) CDF FX (x) is continuous in the domain [−∞, ∞].

Answer: b, c, and d

Solution:
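Option a: We know that for small $\epsilon$, $P(x - \epsilon < X < x + \epsilon) \approx 2\epsilon f_X(x)$, i.e. it is approximately proportional to $f_X(x)$.
Therefore, $P(1 - \epsilon < X < 1 + \epsilon) \approx 2\epsilon f_X(1)$ and $P(2 - \epsilon < X < 2 + \epsilon) \approx 2\epsilon f_X(2)$.
But $P(x - \epsilon < X < x + \epsilon)$ is not an exact linear function of $f_X(x)$; the relation holds only approximately.
Therefore, when $f_X(2) = 2f_X(1)$, in general $P(2 - \epsilon < X < 2 + \epsilon) \ne 2P(1 - \epsilon < X < 1 + \epsilon)$,
but $P(2 - \epsilon < X < 2 + \epsilon) \approx 2P(1 - \epsilon < X < 1 + \epsilon)$.
Hence option a is wrong but option b is correct.
Option c: The probability at a single point, $P(X = x)$, for a continuous random variable is
zero, as there is no jump in the CDF at any value of x. Hence option
c is correct.
Option d: For a continuous random variable, the CDF is always continuous. Hence option d is correct.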

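3. If
$f_X(x) = \begin{cases} \dfrac{1}{18}(x^2 - 8x + 16) & 1 \le x \le 7 \\ 0 & \text{otherwise} \end{cases}$
What is the value of $P(X \le 4)$? Enter the answer correct to one decimal accuracy.
(Hint: $\int x^a\,dx = \dfrac{x^{a+1}}{a+1}$)
Answer: 0.5

Solution:
$P(X \le 4) = \int_{-\infty}^{4} f_X(x)\,dx$
$\Rightarrow P(X \le 4) = \int_{1}^{4} f_X(x)\,dx$, since $f_X(x) = 0$ for $x < 1$.
$\Rightarrow P(X \le 4) = \int_{1}^{4} \dfrac{1}{18}(x^2 - 8x + 16)\,dx$
$\Rightarrow P(X \le 4) = \dfrac{1}{18}\left[\dfrac{x^3}{3} - \dfrac{8x^2}{2} + 16x\right]_{1}^{4}$
$\Rightarrow P(X \le 4) = \dfrac{1}{18}\left(\dfrac{4^3}{3} - 4 \times 4^2 + 16 \times 4\right) - \dfrac{1}{18}\left(\dfrac{1^3}{3} - 4 \times 1^2 + 16 \times 1\right)$
$\Rightarrow P(X \le 4) = 0.5$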

4. If $X \sim \text{Normal}(10, 25)$, what is the value of $E[2X^2]$?
Answer: 250

Solution:
Given E[X] = 10, Var(X) = 25
We know that $Var(X) = E[X^2] - E[X]^2$
$\Rightarrow E[X^2] = Var(X) + E[X]^2$
$\Rightarrow E[X^2] = 25 + 10^2 = 125$
We know that $E[cX] = cE[X]$, where c is a constant.
$\Rightarrow E[2X^2] = 2E[X^2]$
$\Rightarrow E[2X^2] = 2 \times 125 = 250$

5. If X ∼ Normal(10, 4), then what is the value of P (X ≥ 8|X ≤ 9)? Use the standard
normal distribution tables if necessary. Enter the answer up to two decimals accuracy.
Use the following CDF values of standard normal distribution.
FZ (−2) = 0.02275, FZ (−1.5) = 0.06681, FZ (−1) = 0.15866, FZ (−0.5) = 0.30854, FZ (0) =
0.5, FZ (0.5) = 0.69146, and FZ (1) = 0.84134
Answer: 0.485 accepted range 0.48 to 0.49

Solution:
Given µ = 10, σ 2 = 4 ⇒ σ = 2
We need to find P (X ≥ 8|X ≤ 9).
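$P(X \ge 8 \mid X \le 9) = \dfrac{P(X \ge 8 \cap X \le 9)}{P(X \le 9)}$
$P(X \ge 8 \mid X \le 9) = \dfrac{F_X(9) - F_X(8)}{F_X(9)}$
Converting the given normal distribution to the standard normal distribution to get the values of $F_X(x)$:
For x = 8, $z = \dfrac{x - \mu}{\sigma} = \dfrac{8 - 10}{2} = -1 \Rightarrow F_X(8) = F_Z(-1)$
For x = 9, $z = \dfrac{x - \mu}{\sigma} = \dfrac{9 - 10}{2} = -0.5 \Rightarrow F_X(9) = F_Z(-0.5)$
$P(X \ge 8 \mid X \le 9) = \dfrac{F_X(9) - F_X(8)}{F_X(9)}$
$\Rightarrow P(X \ge 8 \mid X \le 9) = \dfrac{0.30854 - 0.15866}{0.30854} \approx 0.485$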
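
The table lookup can be replaced by a direct evaluation of the normal CDF through the error function. The following is a sketch, with `normal_cdf` as an illustrative helper name; it relies only on the identity Φ(z) = ½(1 + erf(z/√2)).

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """CDF of Normal(mean, sd**2) evaluated at x, via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


mean, sd = 10, 2  # Normal(10, 4) has standard deviation 2

# P(X >= 8 | X <= 9) = (F(9) - F(8)) / F(9)
p = (normal_cdf(9, mean, sd) - normal_cdf(8, mean, sd)) / normal_cdf(9, mean, sd)
print(round(p, 3))
```

The computed value falls inside the accepted range of 0.48 to 0.49.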

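6. A random variable X has the following PDF

$f_X(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$

Define $Y = e^X$. What is the PDF $f_Y(y)$ of Y?

(a) $f_Y(y) = \begin{cases} \dfrac{2\log(y)}{y} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$

(b) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{2ey} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$

(c) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{y} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$

(d) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{ey} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$

(e) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{2y} & 1 \le y \le e \\ 0 & \text{otherwise} \end{cases}$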

Answer: a

Solution:
Given $Y = g(X) = e^X$
$\Rightarrow \log y = x = g^{-1}(y)$
Therefore $g^{-1}(y) = \log(y)$
$g(x) = e^x \Rightarrow g'(x) = e^x$, since $\dfrac{d(e^x)}{dx} = e^x$
We know that in the range 0 to 1, $e^x$ is monotonic (an increasing function).
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|} f_X(g^{-1}(y))$
$g'(g^{-1}(y)) = g'(\log y) = e^{\log y} = y$
$|g'(g^{-1}(y))| = y$, since y is positive in the range [1, e]
$f_X(g^{-1}(y)) = f_X(\log y) = 2\log y$
Therefore, $f_Y(y) = \dfrac{1}{y} \times 2\log y$
$f_Y(y) = \dfrac{2\log y}{y}$ for $1 \le y \le e$
Hence option a is correct.

Use the following information to answer the questions 7 and 8.

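The CDF of random variable X is given below:

$F_X(x) = \begin{cases} 0 & x \le 0 \\ 2x^2 & 0 \le x \le \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \le x \le 1 \\ \frac{x}{2} & 1 \le x \le 2 \\ 1 & x \ge 2 \end{cases}$

Use the following derivative formula:

$\dfrac{d(x^a)}{dx} = ax^{a-1}$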
7. Which of the following statements is/are correct?

(a) X is a continuous random variable.


(b) X is a discrete random variable.
(c) The PDF of X is not defined as X is discrete random variable.


 0 x≤0
1

4x 0 ≤ x ≤ 2



1
(d) The PDF of random variable X is fX (x) = 0 2
≤x≤1
 x

 1≤x≤2
2



0 x>2


 0 x<0

2 1
2x 0 ≤ x ≤ 2



1
(e) The PDF of random variable X is fX (x) = 0 2
<x<1
 x
1≤x≤2




 4
0 x>2

Answer: a, d
Solution:
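We know that $f_X(x) = \dfrac{d(F_X(x))}{dx}$
Given
$F_X(x) = \begin{cases} 0 & x \le 0 \\ 2x^2 & 0 \le x \le \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \le x \le 1 \\ \frac{x}{2} & 1 \le x \le 2 \\ 1 & x \ge 2 \end{cases}$

$\Rightarrow f_X(x) = \begin{cases} \dfrac{d(0)}{dx} = 0 & x \le 0 \\ \dfrac{d(2x^2)}{dx} = 4x & 0 \le x \le \frac{1}{2} \\ \dfrac{d(\frac{1}{2})}{dx} = 0 & \frac{1}{2} < x \le 1 \\ \dfrac{d(\frac{x}{2})}{dx} = \frac{1}{2} & 1 < x \le 2 \\ \dfrac{d(1)}{dx} = 0 & x > 2 \end{cases}$

Therefore, $f_X(x) = \begin{cases} 0 & x \le 0 \\ 4x & 0 \le x \le \frac{1}{2} \\ 0 & \frac{1}{2} < x \le 1 \\ \frac{1}{2} & 1 < x \le 2 \\ 0 & x > 2 \end{cases}$
Since $F_X(x)$ is continuous in the given domain, X is a continuous random
variable.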
8. What is the value of P (X ≥ 1|X ≤ 1.5)? Enter the answer correct to two decimals
accuracy.
Answer: 0.33, accepted range 0.31 to 0.35
Solution:
$P(X \ge 1 \mid X \le 1.5) = \dfrac{F_X(1.5) - F_X(1)}{F_X(1.5)} = \dfrac{1.5/2 - 1/2}{1.5/2} = \dfrac{1}{3}$
9. The time taken by Rohith to complete a race follows the exponential distribution with
expected time of completion of 10 minutes. What is the probability that Rohith takes
less than 20 minutes but more than 10 minutes to complete the race? Enter the answer
correct to 2 decimals accuracy. (Hint: $\int e^{-ax}\,dx = \dfrac{e^{-ax}}{-a}$)
Answer: 0.2325, accepted range: 0.23 to 0.235
Solution:
Given E[X] = 10 minutes.
We know that for an exponential distribution $E[X] = \dfrac{1}{\lambda}$
$\Rightarrow \dfrac{1}{\lambda} = 10$, $\lambda = 0.1$
For the exponential distribution, $F_X(x) = 1 - e^{-\lambda x}$
The probability that Rohith takes at most 10 minutes is
$F_X(10) = 1 - e^{-0.1 \times 10} = 1 - e^{-1}$
The probability that Rohith takes at most 20 minutes is
$F_X(20) = 1 - e^{-0.1 \times 20} = 1 - e^{-2}$
The probability that Rohith takes more than 10 minutes but less than 20 minutes to
complete the race is $F_X(20) - F_X(10) = e^{-1} - e^{-2} \approx 0.2325$.
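
The interval probability is just a difference of two CDF values, which the sketch below evaluates numerically. The function name `exp_cdf` is an illustrative choice, not something defined in the assignment.

```python
from math import exp


def exp_cdf(x, rate):
    """CDF of the exponential distribution with the given rate (1 / mean)."""
    return 1 - exp(-rate * x) if x >= 0 else 0.0


rate = 1 / 10  # mean completion time is 10 minutes

# P(10 < X < 20) = F(20) - F(10)
p = exp_cdf(20, rate) - exp_cdf(10, rate)
print(round(p, 4))
```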

10. The PDFs of random variables X1, X2, X3, X4, and X5 are shown in Figure 5.2.P.
Based on the information, choose the correct option(s) from below.

Figure 5.2.P: PDF of Normal Distributions for different variables.

(a) E(X1) ≈ E(X5) < E(X2) < E(X4) < E(X3)


(b) E(X1) < E(X5) < E(X2) < E(X4) < E(X3)
(c) E(X1) < E(X5) = E(X2) < E(X4) < E(X3)
(d) Var(X1) < Var(X3) < Var(X4) < Var(X5)
(e) Var(X1) ≈ Var(X2) < Var(X3) < Var(X4) < Var(X5)

Answer: a, d, and e
Solution:
We know that in the PDF of normal distribution, the peak value occurs at mean.
E[X] = µ(mean)
Also, the value of the PDF at the mean is inversely proportional to the standard deviation,
since $f_X(\mu) = \dfrac{1}{\sqrt{2\pi}\,\sigma}$.
The peak of the PDF, which is located at the mean E[X], occurs approximately at
-10, 0, 20, 10, and -10 for X1, X2, X3, X4, and X5 respectively.
Therefore, E(X1) ≈ E(X5) < E(X2) < E(X4) < E(X3)
The peak values $f_X(\mu)$ for variables X1, X2, X3, X4, and X5 are such that $f_{X1}(\mu) \approx
f_{X2}(\mu) > f_{X3}(\mu) > f_{X4}(\mu) > f_{X5}(\mu)$.
Therefore, Var(X1) ≈ Var(X2) < Var(X3) < Var(X4) < Var(X5)
Hence, options a, d, and e are correct.

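11. The PDF of a continuous random variable is given as
$f_X(x) = \begin{cases} 4x^3 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
What is the value of Var(X)? (Hint: $\int x^a\,dx = \dfrac{x^{a+1}}{a+1}$)
(a) $\dfrac{1}{75}$
(b) $\dfrac{2}{75}$
(c) $\dfrac{3}{75}$
(d) $\dfrac{4}{75}$
Answer: b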
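Solution:
We know that $Var(X) = E[X^2] - E[X]^2$
$E[X] = \int x f_X(x)\,dx$
$E[X] = \int_{0}^{1} x \times 4x^3\,dx$
$\Rightarrow E[X] = \int_{0}^{1} 4x^4\,dx$
$\Rightarrow E[X] = \left.\dfrac{4x^5}{5}\right|_{0}^{1}$
$\Rightarrow E[X] = \dfrac{4}{5} - 0 = \dfrac{4}{5}$

$E[X^2] = \int x^2 f_X(x)\,dx$
$E[X^2] = \int_{0}^{1} x^2 \times 4x^3\,dx = \int_{0}^{1} 4x^5\,dx$
$\Rightarrow E[X^2] = \left.\dfrac{4x^6}{6}\right|_{0}^{1}$
$\Rightarrow E[X^2] = \dfrac{4}{6} - 0 = \dfrac{2}{3}$

Therefore, $Var(X) = E[X^2] - E[X]^2$
$Var(X) = \dfrac{2}{3} - \left(\dfrac{4}{5}\right)^2$
$Var(X) = \dfrac{2}{3} - \dfrac{16}{25}$
$Var(X) = \dfrac{2}{75}$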
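
For this density the n-th moment has a closed form, E[Xⁿ] = ∫₀¹ xⁿ · 4x³ dx = 4/(n + 4), which makes the variance easy to confirm exactly. The helper name `moment` below is illustrative only.

```python
from fractions import Fraction


def moment(n):
    """E[X**n] for the density 4x^3 on [0, 1]: the integral of 4x^(n+3) is 4/(n+4)."""
    return Fraction(4, n + 4)


variance = moment(2) - moment(1) ** 2
print(moment(1), moment(2), variance)  # 4/5 2/3 2/75
```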

12. Let $X \sim \text{Uniform}(a_1, b_1)$ and $Y \sim \text{Uniform}(a_2, b_2)$. Based on this information, choose
the correct option(s) from below.

(a) If b2 − a2 = b1 − a1 , then Var(X) = Var(Y ).
(b) If b2 + a2 = b1 + a1 , then Var(X) = Var(Y ).
(c) If b2 − a2 = b1 − a1 , then E(X) = E(Y ).
(d) If b2 − b1 = a1 − a2 , then E(X) = E(Y ).
Answer: a and d
Solution:
We know that the mean E(X) and variance Var(X) of a uniform random variable
$X \sim \text{Uniform}(a, b)$ are $\dfrac{a + b}{2}$ and $\dfrac{(b - a)^2}{12}$ respectively.
Given $X \sim \text{Uniform}(a_1, b_1)$ and $Y \sim \text{Uniform}(a_2, b_2)$,
$E(X) = \dfrac{a_1 + b_1}{2}$, $E(Y) = \dfrac{a_2 + b_2}{2}$. So, for E(X) to be equal to E(Y), we need $a_1 + b_1 = a_2 + b_2$,
or $b_2 - b_1 = a_1 - a_2$. Hence option d is correct and option c is incorrect.
Similarly, for Var(X) to be equal to Var(Y), we need $\dfrac{(b_1 - a_1)^2}{12} = \dfrac{(b_2 - a_2)^2}{12}$, or $b_1 - a_1 = b_2 - a_2$;
hence option a is correct and option b is incorrect.
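13. The CDF of a random variable X is given as:

$F_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x}{\ln 4} & 0 \le x \le \ln 2 \\ 1 - e^{-x} & \ln 2 \le x < \infty \end{cases}$

Derivative formulas required to solve the problem:

$\dfrac{d(ax)}{dx} = a$
$\dfrac{d(e^{-ax})}{dx} = -ae^{-ax}$
The PDF of the random variable X is:

(a) $f_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{1}{\ln 4} & 0 \le x < \ln 2 \\ e^{-x} & \ln 2 \le x < \infty \end{cases}$

(b) $f_X(x) = \begin{cases} 0 & x < 0 \\ 1 & 0 \le x < \ln 2 \\ e^{-x} & \ln 2 \le x < \infty \end{cases}$

(c) $f_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{1}{\ln 2} & 0 \le x \le \ln 2 \\ e^{-x} & \ln 2 < x < \infty \end{cases}$

(d) $f_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{1}{\ln 2} & 0 \le x < \ln 2 \\ e^{x} & \ln 2 \le x < \infty \end{cases}$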

Answer: a
Solution:
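We know that $f_X(x) = \dfrac{d(F_X(x))}{dx}$
Given,
$F_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x}{\ln 4} & 0 \le x \le \ln 2 \\ 1 - e^{-x} & \ln 2 \le x < \infty \end{cases}$
Therefore,
$f_X(x) = \begin{cases} \dfrac{d(0)}{dx} = 0 & x < 0 \\ \dfrac{d\left(\frac{x}{\ln 4}\right)}{dx} = \dfrac{1}{\ln 4} & 0 \le x \le \ln 2 \\ \dfrac{d(1 - e^{-x})}{dx} = e^{-x} & \ln 2 \le x < \infty \end{cases}$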
Hence option a is correct.

Statistics for Data Science - 2

Week 5 Graded Assignment solution


Continuous random variable

1. The CDF of a random variable X is

$F_X(x) = \begin{cases} 1 - e^{-3x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$

i) Find P(X > 4).

a) $e^{-3} - e^{-4}$
b) $e^{-12}$
c) $e^{-7}$
d) $e^{-3}e^{-4}$
Solution:

$P(X > 4) = 1 - P(X \le 4) = 1 - F_X(4)$
$= 1 - (1 - e^{-3 \times 4})$
$= e^{-12}$

ii) Find the value of $P(-5 < X \le 6)$.

a) $1 - e^{-18}$
b) $e^{-5} - e^{-18}$
c) $e^{-18}$
d) $e^{-9}$
Solution:

$P(-5 < X \le 6) = F_X(6) - F_X(-5)$
$= (1 - e^{-3 \times 6}) - 0$
$= 1 - e^{-18}$

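2. Let X be a continuous random variable with the following PDF:

$f_X(x) = \begin{cases} ke^{-x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$

i) Find the value of k.
Solution:
We know that for the PDF of a random variable
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
$\Rightarrow \int_{0}^{\infty} ke^{-x}\,dx = 1$
$\Rightarrow k\left[\dfrac{e^{-x}}{-1}\right]_{0}^{\infty} = 1$
$\Rightarrow k(0 + 1) = 1$ (as x approaches ∞, $e^{-x}$ approaches 0)
$\Rightarrow k = 1$
ii) Find P(3 < X < 4).

a) $e^{-1}$
b) $e^{-3}e^{-4}$
c) $e^{-3} - e^{-4}$
d) $e^{-4} - e^{-3}$
Hint: Use $\int_{a}^{b} e^{-x}\,dx = e^{-a} - e^{-b}$
Solution:

$P(3 < X < 4) = \int_{3}^{4} ke^{-x}\,dx$
$= 1 \times \left[\dfrac{e^{-x}}{-1}\right]_{3}^{4}$
$= \dfrac{e^{-4}}{-1} - \dfrac{e^{-3}}{-1}$
$= e^{-3} - e^{-4}$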

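3. Let X be a continuous random variable with PDF

$f_X(x) = \begin{cases} 5x^4 & 0 < x \le 1 \\ 0 & \text{otherwise} \end{cases}$
Find $P\left(X \le \frac{3}{4} \mid X > \frac{1}{4}\right)$.
a) $\dfrac{3}{16}$
b) $\dfrac{17}{86}$
c) $\dfrac{22}{93}$
d) $\dfrac{9}{22}$
Hint: Use $\int_{a}^{b} 5x^4\,dx = b^5 - a^5$

Solution:
$P\left(X \le \frac{3}{4} \mid X > \frac{1}{4}\right) = \dfrac{P\left(X \le \frac{3}{4} \text{ and } X > \frac{1}{4}\right)}{P\left(X > \frac{1}{4}\right)}$
$= \dfrac{\int_{1/4}^{3/4} 5x^4\,dx}{\int_{1/4}^{1} 5x^4\,dx}$
$= \dfrac{\left[x^5\right]_{1/4}^{3/4}}{\left[x^5\right]_{1/4}^{1}}$
$= \dfrac{\left(\frac{3}{4}\right)^5 - \left(\frac{1}{4}\right)^5}{1 - \left(\frac{1}{4}\right)^5}$
$= \dfrac{22}{93}$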

4. The lifespan (in hours) of an electronic component used in an electric car has the density
function
$f_X(x) = \begin{cases} \dfrac{1}{500}e^{-\frac{x}{500}} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$
Determine the probability that the component lasts more than 200 hours before it needs
to be replaced.

a) $e^{-0.4}$
b) $e^{200}$
c) 0.5
d) $e^{-2.5}$

Solution:
Let X denote the lifespan (in hours) of the electronic component. We have to find the
probability that the component lasts more than 200 hours before it needs to be replaced,
i.e.
$P(X > 200) = 1 - P(X \le 200)$
Also, the given density is that of the exponential distribution with $\lambda = \dfrac{1}{500}$.

$\Rightarrow P(X > 200) = 1 - P(X \le 200)$
$= 1 - F_X(200)$
$= 1 - \left(1 - e^{-\frac{200}{500}}\right)$
$= e^{-0.4}$

5. The number of days in advance by which airline tickets are purchased by travelers is
exponentially distributed with an average of 28 days. If there is an 80% chance that a
traveler will purchase tickets fewer than d days in advance, then what is the value of d?
Write your answer to the nearest integer.
Solution:
Let X be the number of days in advance by which airline tickets are purchased by
travelers.
We need to find the value of d.
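Given that the average is 28 days, $\lambda = \dfrac{1}{28}$, and there is an 80% chance that a traveler will
purchase tickets fewer than d days in advance.

$\Rightarrow P(X < d) = 0.80$
$\Rightarrow 1 - e^{-\frac{d}{28}} = 0.80$
$\Rightarrow e^{-\frac{d}{28}} = 0.20$
$\Rightarrow \dfrac{d}{28} = -\ln(0.20)$
$\Rightarrow d = 28 \times 1.609$
$\Rightarrow d = 45.052$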

Rounding off to the nearest integer, d = 45 days.
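
Solving F(d) = p for an exponential distribution gives the quantile d = −mean · ln(1 − p). A minimal sketch of that computation, with illustrative variable names:

```python
from math import log

mean_days = 28      # average advance purchase time, so rate = 1 / 28
target_prob = 0.80  # P(X < d) = 0.80

# Invert the exponential CDF: 1 - exp(-d / mean) = p  =>  d = -mean * ln(1 - p)
d = -mean_days * log(1 - target_prob)
print(round(d))  # 45
```

Using the unrounded logarithm gives d slightly above 45.05, which still rounds to 45 days.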

6. A firm produces machines with a lifespan whose distribution has a mean of 200 months
and a standard deviation of 50 months. The firm wishes to introduce a warranty scheme
in which it replaces all the dysfunctional machines with new ones within the
warranty period, but it does not wish to do so for more than 11.9% of the machines
it produces. If the lifespan of the machine is assumed to follow a normal distribution,
how long a guarantee period should be offered? (Answer is expected in months)
Hint: Use P(Z < −1.18) = 0.119, where Z represents the standard normal distribution.
Solution:
Let X denote the lifespan of the machines in months. Given that μ = 200 and σ = 50.
The firm does not wish to replace more than 11.9% of the machines it produces.
If m is the guarantee period (in months), then

$P(X \le m) = 0.119$
$\Rightarrow P\left(\dfrac{X - 200}{50} \le \dfrac{m - 200}{50}\right) = 0.119$

Comparing this equation with the given value of the standard normal distribution, we get

$\dfrac{m - 200}{50} = -1.18$
$\Rightarrow m = 141$

7. Let X be a continuous random variable with the following PDF:

$f_X(x) = \begin{cases} 3(1 - x)^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

Define $Y = (1 - X)^3$. Find the PDF of the random variable Y.

a) $f_Y(y) = \begin{cases} 1 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

b) $f_Y(y) = \begin{cases} (1 - y)^3 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

c) $f_Y(y) = \begin{cases} y^3 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

d) $f_Y(y) = \begin{cases} 3y^{2/3} & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

Hint:
Apply the monotonic, differentiable function theorem and $\dfrac{d}{dx}(1 - x)^3 = -3(1 - x)^2$

Solution:
We know that in the range (0, 1), $(1 - x)^3$ is monotonic (a decreasing function).
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|} f_X(g^{-1}(y))$
Let $Y = (1 - X)^3 = g(X)$
$\Rightarrow y^{1/3} = 1 - x \Rightarrow x = 1 - y^{1/3} = g^{-1}(y)$
Therefore $g^{-1}(y) = 1 - y^{1/3}$
$g(x) = (1 - x)^3 \Rightarrow g'(x) = -3(1 - x)^2$, since $\dfrac{d}{dx}(1 - x)^3 = -3(1 - x)^2$
And
$g'(g^{-1}(y)) = g'(1 - y^{1/3}) = -3(1 - (1 - y^{1/3}))^2 = -3y^{2/3}$
$|g'(g^{-1}(y))| = 3y^{2/3}$, since $y^{2/3}$ is positive in the range (0, 1).
$f_X(g^{-1}(y)) = f_X(1 - y^{1/3}) = 3(1 - (1 - y^{1/3}))^2 = 3y^{2/3}$
Therefore, $f_Y(y) = \dfrac{3y^{2/3}}{3y^{2/3}}$
$\Rightarrow f_Y(y) = 1$
Therefore
$f_Y(y) = \begin{cases} 1 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

8. Let X be a continuous random variable with the following PDF:

$f_X(x) = \begin{cases} x^2/81 & -6 < x < 3 \\ 0 & \text{otherwise} \end{cases}$

Define $Y = \frac{1}{3}(12 - X)$. Find the PDF of the random variable Y.

a) $f_Y(y) = \begin{cases} (12 - 3y)^2/27 & -6 < y < 3 \\ 0 & \text{otherwise} \end{cases}$

b) $f_Y(y) = \begin{cases} (12 - 3y)^2/27 & 3 < y < 6 \\ 0 & \text{otherwise} \end{cases}$

c) $f_Y(y) = \begin{cases} (12 - 3y)/27 & -6 < y < 3 \\ 0 & \text{otherwise} \end{cases}$

d) $f_Y(y) = \begin{cases} (12 - 3y)/27 & 3 < y < 6 \\ 0 & \text{otherwise} \end{cases}$

Solution:
We know that in the range (−6, 3), $\frac{1}{3}(12 - x)$ is monotonic (a decreasing function).
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|} f_X(g^{-1}(y))$
Let $Y = \frac{1}{3}(12 - X) = g(X)$
$\Rightarrow 3y = 12 - x \Rightarrow x = 12 - 3y = g^{-1}(y)$
Therefore $g^{-1}(y) = 12 - 3y$
$g(x) = \frac{1}{3}(12 - x) \Rightarrow g'(x) = -\frac{1}{3}$
And
$g'(g^{-1}(y)) = g'(12 - 3y) = -\frac{1}{3}$
$|g'(g^{-1}(y))| = \frac{1}{3}$
$f_X(g^{-1}(y)) = f_X(12 - 3y) = \dfrac{(12 - 3y)^2}{81}$
Therefore, $f_Y(y) = \dfrac{(12 - 3y)^2 / 81}{1/3}$
$\Rightarrow f_Y(y) = \dfrac{(12 - 3y)^2}{27}$
When x = −6, y = 6 and when x = 3, y = 3.
Therefore
$f_Y(y) = \begin{cases} \dfrac{(12 - 3y)^2}{27} & 3 < y < 6 \\ 0 & \text{otherwise} \end{cases}$

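9. Let X be a continuous random variable with the following PDF:

$f_X(x) = \begin{cases} x^3(6x^2 + 5x - 4) & 0 < x \le 1 \\ 0 & \text{otherwise} \end{cases}$

Find the value of E[X].

a) $\dfrac{523}{210}$
b) $\dfrac{23}{210}$
c) $\dfrac{173}{210}$
d) $\dfrac{187}{210}$
Hint: Use $\int_{a}^{b} x^n\,dx = \dfrac{1}{n+1}(b^{n+1} - a^{n+1})$

Solution:

$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$
$= \int_{0}^{1} x \times x^3(6x^2 + 5x - 4)\,dx$
$= \int_{0}^{1} (6x^6 + 5x^5 - 4x^4)\,dx$
$= \left[\dfrac{6x^7}{7}\right]_{0}^{1} + \left[\dfrac{5x^6}{6}\right]_{0}^{1} - \left[\dfrac{4x^5}{5}\right]_{0}^{1}$
$= \dfrac{6}{7} + \dfrac{5}{6} - \dfrac{4}{5}$
$= \dfrac{187}{210}$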
210

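10. Let X be a continuous random variable with the following PDF:

$f_X(x) = \begin{cases} x & 0 \le x \le 1 \\ 2 - x & 1 < x \le 2 \\ 0 & \text{otherwise} \end{cases}$

Define Y = 6X + 5. Find the variance of Y.

Answer: 6
Hint: Use $\int_{a}^{b} x^n\,dx = \dfrac{1}{n+1}(b^{n+1} - a^{n+1})$

Also, $\int_{a}^{b} x^n\,dx = \int_{a}^{c} x^n\,dx + \int_{c}^{b} x^n\,dx$ where a < c < b.
Solution:
Var(Y) = Var(6X + 5) = 36 Var(X)
And $Var(X) = E[X^2] - (E[X])^2$

$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$
$= \int_{0}^{2} x f_X(x)\,dx$
$= \int_{0}^{1} x f_X(x)\,dx + \int_{1}^{2} x f_X(x)\,dx$
$= \int_{0}^{1} x \cdot x\,dx + \int_{1}^{2} x(2 - x)\,dx$
$= \left[\dfrac{x^3}{3}\right]_{0}^{1} + \left[\dfrac{2x^2}{2}\right]_{1}^{2} - \left[\dfrac{x^3}{3}\right]_{1}^{2}$
$= \dfrac{1}{3} + (2^2 - 1^2) - \dfrac{(2^3 - 1^3)}{3}$
$= \dfrac{1}{3} + 3 - \dfrac{7}{3}$
$= 1$

$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx$
$= \int_{0}^{2} x^2 f_X(x)\,dx$
$= \int_{0}^{1} x^2 f_X(x)\,dx + \int_{1}^{2} x^2 f_X(x)\,dx$
$= \int_{0}^{1} x^2 \cdot x\,dx + \int_{1}^{2} x^2(2 - x)\,dx$
$= \left[\dfrac{x^4}{4}\right]_{0}^{1} + \left[\dfrac{2x^3}{3}\right]_{1}^{2} - \left[\dfrac{x^4}{4}\right]_{1}^{2}$
$= \dfrac{1}{4} + \dfrac{2}{3}(2^3 - 1^3) - \dfrac{1}{4}(2^4 - 1^4)$
$= \dfrac{1}{4} + \dfrac{14}{3} - \dfrac{15}{4}$
$= \dfrac{7}{6}$
Therefore,
$Var(X) = \dfrac{7}{6} - 1 = \dfrac{1}{6}$
$\Rightarrow Var(Y) = 36 \times \dfrac{1}{6} = 6$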
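
Because the density is piecewise polynomial, both moments can be computed exactly with rational arithmetic. The sketch below is one way to do that; the helpers `integrate_poly` and `moment` and the `{power: coefficient}` representation are illustrative assumptions, not part of the course material.

```python
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Integrate sum(c * x**k) over [a, b]; coeffs maps power k -> coefficient c."""
    return sum(
        Fraction(c, k + 1) * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1))
        for k, c in coeffs.items()
    )


# f(x) = x on [0, 1] and 2 - x on (1, 2], each stored as {power: coefficient}
pieces = [({1: 1}, 0, 1), ({0: 2, 1: -1}, 1, 2)]


def moment(n):
    """E[X**n]: multiplying the density by x**n shifts every power up by n."""
    return sum(
        integrate_poly({k + n: c for k, c in coeffs.items()}, a, b)
        for coeffs, a, b in pieces
    )


var_x = moment(2) - moment(1) ** 2
var_y = 36 * var_x  # Var(6X + 5) = 6**2 * Var(X)
print(moment(1), moment(2), var_x, var_y)  # 1 7/6 1/6 6
```

Calling `moment(0)` returns 1, which confirms that the density integrates to one.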

Statistics for Data Science - 2

Week 6 Practice Assignment Solution

1. Let $X \sim \text{Bernoulli}(0.6)$. Let $(Y \mid X = 0) \sim \text{Exp}(1)$ and $(Y \mid X = 1) \sim \text{Exp}(3)$. Find
the marginal of Y.

a) $0.6e^{-y} + 0.4e^{-3y}$
b) $0.4e^{-y} + 0.6e^{-3y}$
c) $0.6e^{-y} + 1.2e^{-3y}$
d) $0.4e^{-y} + 1.8e^{-3y}$

Solution:
Given that $X \sim \text{Bernoulli}(0.6)$, therefore $p_X(1) = 0.6$ and $p_X(0) = 0.4$.
The marginal density of Y is given by
$f_Y(y) = \sum_{x \in T_X} p_X(x) f_{Y|X=x}(y)$
$= p_X(1) f_{Y|X=1}(y) + p_X(0) f_{Y|X=0}(y)$
$= 0.6 \times 3e^{-3y} + 0.4e^{-y}$
$= 1.8e^{-3y} + 0.4e^{-y}$

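2. Let $X \sim \text{Uniform}\{1, 2, 3\}$. Let $(Y \mid X = 1) \sim \text{Exp}(1)$, $(Y \mid X = 2) \sim \text{Exp}(2)$ and
$(Y \mid X = 3) \sim \text{Normal}(0, 4)$. What is the marginal of Y?
a) $e^{-y} + 2e^{-2y} + \dfrac{1}{2\sqrt{2\pi}} e^{-y^2/8}$
b) $\dfrac{1}{3}\left[e^{-y} + 2e^{-2y} + \dfrac{1}{2\sqrt{2\pi}} e^{-y^2/8}\right]$
c) $\dfrac{1}{3}\left[e^{-y} + e^{-2y} + \dfrac{1}{\sqrt{2\pi}} e^{-y^2/4}\right]$
d) $e^{-y} + e^{-2y} + \dfrac{1}{2\sqrt{2\pi}} e^{-y^2/4}$
Solution:
Given that $X \sim \text{Uniform}\{1, 2, 3\}$, therefore $p_X(1) = p_X(2) = p_X(3) = \frac{1}{3}$.

The marginal density of Y is given by
$f_Y(y) = \sum_{x \in T_X} p_X(x) f_{Y|X=x}(y)$
$= p_X(1) f_{Y|X=1}(y) + p_X(2) f_{Y|X=2}(y) + p_X(3) f_{Y|X=3}(y)$
$= \dfrac{1}{3} \times e^{-y} + \dfrac{1}{3} \times 2e^{-2y} + \dfrac{1}{3} \times \dfrac{e^{-y^2/8}}{2\sqrt{2\pi}}$
$= \dfrac{1}{3}\left[e^{-y} + 2e^{-2y} + \dfrac{1}{2\sqrt{2\pi}} e^{-y^2/8}\right]$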

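3. Let $X \sim \text{Uniform}\{1, 2\}$. Let $(Y \mid X = 1) \sim \text{Exp}(2)$ and $(Y \mid X = 2) \sim \text{Exp}(4)$. Find
the value of $f_{X|Y=3}(2)$.

a) $\dfrac{2e^{-12}}{e^{-6} + 2e^{-12}}$
b) $\dfrac{e^{-6}}{e^{-6} + 2e^{-12}}$
c) $\dfrac{e^{-12}}{e^{-6} + e^{-12}}$
d) $\dfrac{e^{-6}}{e^{-6} + e^{-12}}$
Solution:
Given that $X \sim \text{Uniform}\{1, 2\}$, therefore $p_X(1) = p_X(2) = \frac{1}{2}$.
The marginal density of Y is given by
$f_Y(y) = \sum_{x \in T_X} p_X(x) f_{Y|X=x}(y)$
$= p_X(1) f_{Y|X=1}(y) + p_X(2) f_{Y|X=2}(y)$
$= \dfrac{1}{2} \times 2e^{-2y} + \dfrac{1}{2} \times 4e^{-4y}$
$= e^{-2y} + 2e^{-4y}$

And
$f_{X|Y=3}(2) = \dfrac{p_X(2) f_{Y|X=2}(3)}{f_Y(3)}$
$= \dfrac{\frac{1}{2} \times 4e^{-4 \times 3}}{e^{-2 \times 3} + 2e^{-4 \times 3}}$
$= \dfrac{2e^{-12}}{e^{-6} + 2e^{-12}}$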
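
The same Bayes-rule computation works for any discrete prior mixed with exponential conditionals. The sketch below encodes it with dictionaries; `prior`, `rate`, `exp_pdf` and `posterior` are illustrative names introduced here.

```python
from math import exp

prior = {1: 0.5, 2: 0.5}  # X ~ Uniform{1, 2}
rate = {1: 2.0, 2: 4.0}   # (Y | X = x) ~ Exp(rate[x])


def exp_pdf(y, lam):
    """Density of the exponential distribution with rate lam."""
    return lam * exp(-lam * y) if y >= 0 else 0.0


def posterior(x, y):
    """f_{X|Y=y}(x) = p_X(x) f_{Y|X=x}(y) / f_Y(y), with f_Y the mixture density."""
    evidence = sum(prior[k] * exp_pdf(y, rate[k]) for k in prior)
    return prior[x] * exp_pdf(y, rate[x]) / evidence


p = posterior(2, 3)
print(p)
```

The value agrees numerically with the closed form 2e⁻¹² / (e⁻⁶ + 2e⁻¹²), and the posterior probabilities over x = 1, 2 sum to one.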

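4. The joint density function of two continuous random variables X and Y is given as
$f_{XY}(x, y) = \begin{cases} kxy & 0 < x < 4,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

Find the value of k. Enter your answer correct to two decimals accuracy.
Solution:
We know that for a joint PDF, $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$
Since $f_{XY}(x, y)$ is nonzero only in the region 0 < x < 4, 0 < y < 1,
$\Rightarrow \int_{0}^{1} \int_{0}^{4} f_{XY}(x, y)\,dx\,dy = 1$
$\Rightarrow \int_{0}^{1} \int_{0}^{4} kxy\,dx\,dy = 1$
$\Rightarrow \int_{0}^{1} ky \left[\dfrac{x^2}{2}\right]_{0}^{4} dy = 1$
$\Rightarrow \int_{0}^{1} 8ky\,dy = 1$
$\Rightarrow 8k \left[\dfrac{y^2}{2}\right]_{0}^{1} = 1$
$\Rightarrow k = \dfrac{1}{4} = 0.25$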

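5. Let $(X, Y) \sim \text{Uniform}(D)$, where $D = \{(x, y) : x + y < 4, x > 0, y > 0\}$. Find the value
of P(2X + Y > 2).
a) $\dfrac{1}{8}$
b) $\dfrac{7}{8}$
c) $\dfrac{3}{4}$
d) $\dfrac{1}{4}$

Solution:

$(X, Y) \sim \text{Uniform}(D)$ and the area of D is $\frac{1}{2} \times 4 \times 4 = 8$, therefore

$f_{XY}(x, y) = \begin{cases} \dfrac{1}{8} & (x, y) \in D \\ 0 & \text{otherwise} \end{cases}$

The area of the lower shaded region (A), the triangle in D below the line 2x + y = 2, is $\frac{1}{2} \times 1 \times 2 = 1$.

$P(2X + Y > 2) = 1 - P(2X + Y \le 2)$
$= 1 - \dfrac{|A|}{|D|}$
$= 1 - \dfrac{1}{8}$
$= \dfrac{7}{8}$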

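6. The joint density function of the random variables X and Y is given by

$f_{XY}(x, y) = \begin{cases} x + y & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

Find the value of P(X + Y < 1).

a) $\dfrac{1}{3}$
b) $\dfrac{2}{3}$
c) $\dfrac{1}{6}$
d) $\dfrac{3}{4}$

Solution:

$P(X + Y < 1) = \int_{0}^{1} \int_{0}^{1-y} (x + y)\,dx\,dy$
$= \int_{0}^{1} \left[\dfrac{x^2}{2} + xy\right]_{0}^{1-y} dy$
$= \int_{0}^{1} \left(\dfrac{(1 - y)^2}{2} + (1 - y)y\right) dy$
$= \left[-\dfrac{(1 - y)^3}{6} + \dfrac{y^2}{2} - \dfrac{y^3}{3}\right]_{0}^{1}$
$= \left(\dfrac{1}{2} - \dfrac{1}{3}\right) - \left(-\dfrac{1}{6}\right)$
$= \dfrac{1}{3}$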
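7. The joint PDF of two continuous random variables X and Y is given by
$f_{XY}(x, y) = \begin{cases} \dfrac{2}{7}(5x + 2y) & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$
Find the marginal PDF of X.
a) $f_X(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
b) $f_X(x) = \begin{cases} \dfrac{2}{7}(5x + 1) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
c) $f_X(x) = \begin{cases} \dfrac{2}{7}(3x + 2) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
d) $f_X(x) = \begin{cases} \dfrac{2}{7}(5y + 1) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$

Solution:
For 0 ≤ x ≤ 1
$f_X(x) = \int_{0}^{1} \dfrac{2}{7}(5x + 2y)\,dy$
$= \dfrac{2}{7}\left[5xy + \dfrac{2y^2}{2}\right]_{0}^{1}$
$= \dfrac{2}{7}(5x + 1)$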

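8. Let X and Y be jointly continuous random variables with joint PDF

$f_{XY}(x, y) = \begin{cases} k(2 - y) & 0 < x < 4,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

Find the marginal PDF of Y.

a) $f_Y(y) = \begin{cases} \dfrac{3}{2}y(2 - y) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

b) $f_Y(y) = \begin{cases} 2y & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

c) $f_Y(y) = \begin{cases} \dfrac{3}{2}(1 - y^2) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

d) $f_Y(y) = \begin{cases} \dfrac{2}{3}(2 - y) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

Solution:
We know that for a joint PDF, $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$
Since $f_{XY}(x, y)$ is nonzero only in the region 0 < x < 4, 0 < y < 1,
$\Rightarrow \int_{0}^{1} \int_{0}^{4} f_{XY}(x, y)\,dx\,dy = 1$
$\Rightarrow \int_{0}^{1} \int_{0}^{4} k(2 - y)\,dx\,dy = 1$
$\Rightarrow \int_{0}^{1} k(2 - y)\Big[x\Big]_{0}^{4} dy = 1$
$\Rightarrow \int_{0}^{1} 4k(2 - y)\,dy = 1$
$\Rightarrow 4k\left[2y - \dfrac{y^2}{2}\right]_{0}^{1} = 1$
$\Rightarrow 4k \times \dfrac{3}{2} = 1$
$\Rightarrow k = \dfrac{1}{6}$
For 0 < y < 1
$f_Y(y) = \int_{0}^{4} \dfrac{1}{6}(2 - y)\,dx$
$= \dfrac{1}{6}(2 - y)\Big[x\Big]_{0}^{4}$
$= \dfrac{2}{3}(2 - y)$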

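9. Let X and Y be two independent continuous random variables with PDFs $f_X(x)$ and
$f_Y(y)$ given as
$f_X(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$
$f_Y(y) = \begin{cases} y/2 & 0 \le y < 2 \\ 0 & \text{otherwise} \end{cases}$
Find the value of P(2X + Y > 1).
a) $\dfrac{1}{24}$
b) $\dfrac{11}{12}$
c) $\dfrac{1}{12}$
d) $\dfrac{23}{24}$

Solution:
Given that X and Y are two independent continuous random variables,
therefore $f_{XY}(x, y) = f_X(x) f_Y(y)$.
$f_{XY}(x, y) = \begin{cases} y/2 & 0 \le x < 1,\ 0 \le y < 2 \\ 0 & \text{otherwise} \end{cases}$
We have to find the value of P(2X + Y > 1).
And
$P(2X + Y > 1) = 1 - P(2X + Y \le 1)$
$P(2X + Y \le 1) = \int_{0}^{1} \int_{0}^{\frac{1-y}{2}} \dfrac{y}{2}\,dx\,dy$
$= \int_{0}^{1} \dfrac{y}{2}\Big[x\Big]_{0}^{\frac{1-y}{2}} dy$
$= \int_{0}^{1} \dfrac{1}{4}y(1 - y)\,dy$
$= \dfrac{1}{4}\left[\dfrac{y^2}{2} - \dfrac{y^3}{3}\right]_{0}^{1}$
$= \dfrac{1}{24}$
$\Rightarrow P(2X + Y > 1) = 1 - \dfrac{1}{24} = \dfrac{23}{24}$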

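10. The joint density function of two random variables X and Y is given by
$f_{XY}(x, y) = \begin{cases} 8xy & 0 \le x \le 1,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}$

Are X and Y independent?

a) Yes
b) No

Solution:

For 0 ≤ x ≤ 1
$f_X(x) = \int_{0}^{x} 8xy\,dy$
$= 8x\left[\dfrac{y^2}{2}\right]_{0}^{x}$
$= 4x^3$

For 0 ≤ y ≤ 1, the joint density is nonzero only when y ≤ x ≤ 1, so
$f_Y(y) = \int_{y}^{1} 8xy\,dx$
$= 8y\left[\dfrac{x^2}{2}\right]_{y}^{1}$
$= 4y(1 - y^2)$

$f_X(x) f_Y(y) = 4x^3 \times 4y(1 - y^2) = 16x^3 y(1 - y^2) \ne f_{XY}(x, y)$.

Hence X and Y are not independent.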

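11. Let $(X, Y) \sim \text{Uniform}(D)$, where D = [3, 5] × [2, 4]. Are X and Y independent?

a) Yes
b) No

Solution:
$(X, Y) \sim \text{Uniform}(D)$, therefore

$f_{XY}(x, y) = \begin{cases} \dfrac{1}{4} & 3 \le x \le 5,\ 2 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$

For 3 ≤ x ≤ 5
$f_X(x) = \int_{2}^{4} \dfrac{1}{4}\,dy = \dfrac{1}{4}\Big[y\Big]_{2}^{4} = \dfrac{1}{2}$

For 2 ≤ y ≤ 4
$f_Y(y) = \int_{3}^{5} \dfrac{1}{4}\,dx = \dfrac{1}{4}\Big[x\Big]_{3}^{5} = \dfrac{1}{2}$
$f_X(x) f_Y(y) = \dfrac{1}{2} \times \dfrac{1}{2} = \dfrac{1}{4} = f_{XY}(x, y)$.
Hence X and Y are independent.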

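12. The joint PDF of two random variables X and Y is given by

$f_{XY}(x, y) = \begin{cases} 4xy & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$

Find the distribution of X | Y = 0.5, i.e. $f_{X|Y=0.5}(x)$.

a) $f_{X|Y=0.5}(x) = \begin{cases} 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

b) $f_{X|Y=0.5}(x) = \begin{cases} 3x^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

c) $f_{X|Y=0.5}(x) = \begin{cases} 4x^3 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

d) $f_{X|Y=0.5}(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$
Solution:
For 0 < y < 1
$f_Y(y) = \int_{0}^{1} 4xy\,dx$
$= 4y\left[\dfrac{x^2}{2}\right]_{0}^{1}$
$= 2y$
The distribution of X | Y = 0.5 for 0 < x < 1 is given by

$f_{X|Y=0.5}(x) = \dfrac{f_{XY}(x, 0.5)}{f_Y(0.5)}$
$= \dfrac{4x \times 0.5}{2 \times 0.5}$
$= 2x$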

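13. The joint PDF of two random variables X and Y is given by

$f_{XY}(x, y) = \begin{cases} x^2 + \dfrac{xy}{3} & 0 \le x \le 1,\ 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}$

Find the value of $P\left(\frac{1}{4} < X < \frac{1}{2} \mid Y = 1\right)$.
a) $\dfrac{83}{96}$
b) $\dfrac{13}{96}$
c) $\dfrac{13}{48}$
d) $\dfrac{35}{48}$

Solution:
For 0 ≤ y ≤ 2
$f_Y(y) = \int_{0}^{1} \left(x^2 + \dfrac{xy}{3}\right) dx$
$= \left[\dfrac{x^3}{3} + \dfrac{x^2 y}{6}\right]_{0}^{1}$
$= \dfrac{1}{3} + \dfrac{1}{6}y$

$f_{X|Y=1}(x) = \dfrac{f_{XY}(x, 1)}{f_Y(1)}$
$= \dfrac{x^2 + \frac{x \times 1}{3}}{\frac{1}{3} + \frac{1}{6} \times 1}$
$= 2\left(x^2 + \dfrac{x}{3}\right)$

$P\left(\frac{1}{4} < X < \frac{1}{2} \mid Y = 1\right) = \int_{1/4}^{1/2} 2\left(x^2 + \dfrac{x}{3}\right) dx$
$= 2\left[\dfrac{x^3}{3} + \dfrac{x^2}{6}\right]_{1/4}^{1/2}$
$= 2\left[\left(\dfrac{1}{24} + \dfrac{1}{24}\right) - \left(\dfrac{1}{192} + \dfrac{1}{96}\right)\right]$
$= 2\left[\dfrac{1}{12} - \dfrac{1}{64}\right]$
$= \dfrac{13}{96}$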

Statistics for Data Science - 2
Week 6 graded Assignment
Solution

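1. A person randomly chooses a battery from a store which has 40 batteries of type A and
60 batteries of type B. Battery life of type A and type B batteries are exponentially
distributed with average life of 4 years and 6 years, respectively. If the chosen battery
lasts for 5 years, what is the probability that the battery is of type A?
(a) $\dfrac{1}{1 + e^{\frac{5}{12}}}$
(b) $\dfrac{1}{1 + e^{-\frac{5}{12}}}$
(c) $\dfrac{e^{-\frac{4}{5}}}{1 + e^{-\frac{6}{5}}}$
(d) $\dfrac{e^{-\frac{6}{5}}}{1 + e^{-\frac{4}{5}}}$

Solution:
Define a random variable X as follows:
$X = \begin{cases} 1 & \text{if the chosen battery is of type A} \\ 0 & \text{if the chosen battery is of type B} \end{cases}$

Let Y denote the battery life of the chosen battery.

By the given information, we have
$Y \mid X = 1 \sim \text{Exp}\left(\frac{1}{4}\right)$ and
$Y \mid X = 0 \sim \text{Exp}\left(\frac{1}{6}\right)$

It implies that
$f_{Y|X=1}(y) = \dfrac{1}{4}e^{-\frac{y}{4}},\ y > 0$ and
$f_{Y|X=0}(y) = \dfrac{1}{6}e^{-\frac{y}{6}},\ y > 0$

Also given that
$P(X = 1) = \dfrac{40}{100} = \dfrac{2}{5}$ and
$P(X = 0) = \dfrac{60}{100} = \dfrac{3}{5}$

To find: $f_{X|Y=5}(1)$. Now,

$f_{X|Y=5}(1) = \dfrac{f_{Y|X=1}(5) \cdot P(X = 1)}{f_Y(5)}$

$= \dfrac{f_{Y|X=1}(5) \cdot P(X = 1)}{f_{Y|X=1}(5) \cdot P(X = 1) + f_{Y|X=0}(5) \cdot P(X = 0)}$

$= \dfrac{\frac{1}{4}e^{-\frac{5}{4}} \cdot \frac{2}{5}}{\frac{1}{4}e^{-\frac{5}{4}} \cdot \frac{2}{5} + \frac{1}{6}e^{-\frac{5}{6}} \cdot \frac{3}{5}}$

$= \dfrac{\frac{1}{10}e^{-\frac{5}{4}}}{\frac{1}{10}e^{-\frac{5}{4}} + \frac{1}{10}e^{-\frac{5}{6}}}$

$= \dfrac{e^{-\frac{5}{4}}}{e^{-\frac{5}{4}} + e^{-\frac{5}{6}}}$

$= \dfrac{1}{1 + e^{\frac{5}{12}}}$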
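
A numerical check of the Bayes computation above, written as a sketch with illustrative names (`prior`, `mean_life`, `exp_pdf`). It evaluates the posterior directly from the prior and the two exponential densities and compares it with the closed form.

```python
from math import exp

prior = {"A": 40 / 100, "B": 60 / 100}  # share of each battery type in the store
mean_life = {"A": 4.0, "B": 6.0}        # average life in years; rate = 1 / mean


def exp_pdf(y, mean):
    """Density of an exponential distribution with the given mean."""
    return exp(-y / mean) / mean if y >= 0 else 0.0


observed_life = 5.0
evidence = sum(prior[t] * exp_pdf(observed_life, mean_life[t]) for t in prior)
posterior_a = prior["A"] * exp_pdf(observed_life, mean_life["A"]) / evidence

closed_form = 1 / (1 + exp(5 / 12))
print(posterior_a, closed_form)
```

Both expressions evaluate to roughly 0.40, so observing a 5-year life leaves the type A probability close to its prior value of 0.4.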

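2. Let Y = XZ + X, where $X \sim \text{Uniform}\{1, 2, 3\}$ and $Z \sim \text{Normal}(1, 4)$ are independent.
Find the value of $f_{X|Y=2}(2)$.

(a) $\dfrac{3\exp\left(\frac{1}{8}\right)}{3\exp\left(\frac{1}{8}\right) + 6 + 2\exp\left(\frac{2}{9}\right)}$
(b) $\dfrac{3\exp\left(\frac{-1}{8}\right)}{3\exp\left(\frac{-1}{8}\right) + 6 + 2\exp\left(\frac{-2}{9}\right)}$
(c) $\dfrac{2\exp\left(\frac{-2}{9}\right)}{3\exp\left(\frac{-1}{8}\right) + 6 + 2\exp\left(\frac{-2}{9}\right)}$
(d) $\dfrac{6}{3\exp\left(\frac{-1}{32}\right) + 6 + 2\exp\left(\frac{-1}{18}\right)}$
Solution:
Given that $X \sim \text{Uniform}\{1, 2, 3\}$ and $Z \sim \text{Normal}(1, 4)$ are independent.
Y = XZ + X
It implies that

$Y \mid X = 1 = Z + 1 \sim \text{Normal}(2, 4)$
$Y \mid X = 2 = 2Z + 2 \sim \text{Normal}(4, 16)$
$Y \mid X = 3 = 3Z + 3 \sim \text{Normal}(6, 36)$

Therefore,
$f_{Y|X=1}(y) = \dfrac{1}{2\sqrt{2\pi}} \exp\left(\dfrac{-(y - 2)^2}{8}\right)$
$f_{Y|X=2}(y) = \dfrac{1}{4\sqrt{2\pi}} \exp\left(\dfrac{-(y - 4)^2}{32}\right)$
$f_{Y|X=3}(y) = \dfrac{1}{6\sqrt{2\pi}} \exp\left(\dfrac{-(y - 6)^2}{72}\right)$

To find: $f_{X|Y=2}(2)$.

$f_{X|Y=2}(2) = \dfrac{f_{Y|X=2}(2) \cdot f_X(2)}{f_{Y|X=2}(2) \cdot f_X(2) + f_{Y|X=1}(2) \cdot f_X(1) + f_{Y|X=3}(2) \cdot f_X(3)}$

$= \dfrac{\frac{1}{4\sqrt{2\pi}} \exp\left(\frac{-(2 - 4)^2}{32}\right) \cdot \frac{1}{3}}{\frac{1}{4\sqrt{2\pi}} \exp\left(\frac{-(2 - 4)^2}{32}\right) \cdot \frac{1}{3} + \frac{1}{2\sqrt{2\pi}} \exp\left(\frac{-(2 - 2)^2}{8}\right) \cdot \frac{1}{3} + \frac{1}{6\sqrt{2\pi}} \exp\left(\frac{-(2 - 6)^2}{72}\right) \cdot \frac{1}{3}}$

$= \dfrac{\frac{1}{4}\exp\left(\frac{-1}{8}\right)}{\frac{1}{4}\exp\left(\frac{-1}{8}\right) + \frac{1}{2}\exp(0) + \frac{1}{6}\exp\left(\frac{-2}{9}\right)}$

$= \dfrac{3\exp\left(\frac{-1}{8}\right)}{3\exp\left(\frac{-1}{8}\right) + 6 + 2\exp\left(\frac{-2}{9}\right)}$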

3. The joint pdf of two continuous random variables X and Y is given by

$f_{XY}(x, y) = \begin{cases} 4xy & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$

Are X and Y independent?

1. Yes
2. No

Solution:
First we will calculate the marginal densities of X and Y.

For 0 ≤ x ≤ 1
$f_X(x) = \int_{0}^{1} f_{XY}(x, y)\,dy$
$= \int_{0}^{1} 4xy\,dy$
$= \Big[2xy^2\Big]_{0}^{1}$
$= 2x$

For 0 ≤ y ≤ 1
$f_Y(y) = \int_{0}^{1} f_{XY}(x, y)\,dx$
$= \int_{0}^{1} 4xy\,dx$
$= \Big[2x^2 y\Big]_{0}^{1}$
$= 2y$

Therefore,
$f_X(x) \cdot f_Y(y) = 4xy = f_{XY}(x, y)$
It implies that X and Y are independent random variables.

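4. Let $(X, Y) \sim \text{Uniform}(D)$, where $D = \{(x, y) : (x - 1)^2 + (y - 1)^2 \le 1\}$. Calculate
$P(X \ge Y)$.
Solution:

[Figure: the circle D of radius 1 centred at (1, 1), cut by the line y = x]

The region X ≥ Y is the half of the circle lying below the line y = x.

Therefore,
$P(X \ge Y) = \dfrac{\text{Area of lower half circle}}{\text{Area of the circle}}$
$= \dfrac{\pi(1)^2/2}{\pi(1)^2}$
$= \dfrac{1}{2}$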

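5. Let $(X, Y) \sim \text{Uniform}(D)$, where $D = \{(x, y) : y \le 2x, 0 < x < 1, 0 < y < 2\} \cup [1, 2] \times
[0, 2]$. Find the marginal density of X.

(a) $f_X(x) = \begin{cases} \dfrac{2x}{3} + \dfrac{2}{3} & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$

(b) $f_X(x) = \begin{cases} \dfrac{2x}{3} + \dfrac{1}{3} & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$

(c) $f_X(x) = \begin{cases} \dfrac{2x}{3} & 0 \le x \le 1 \\ \dfrac{2}{3} & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$

(d) $f_X(x) = \begin{cases} \dfrac{2x}{3} & 0 \le x \le 1 \\ \dfrac{1}{3} & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$

Solution:

[Figure: the region D, bounded above by the line y = 2x for 0 < x < 1 and by y = 2 for 1 ≤ x ≤ 2]

D is the support of (X, Y).

Area of D = $\frac{1}{2} \times 1 \times 2 + 1 \times 2 = 3$
Since $(X, Y) \sim \text{Uniform}(D)$, it implies that
$f_{XY}(x, y) = \dfrac{1}{3},\ (x, y) \in D$
We know that $f_X(x) = \int f_{XY}(x, y)\,dy$

For 0 < x < 1
$f_X(x) = \int_{0}^{2x} \dfrac{1}{3}\,dy$
$= \dfrac{1}{3}\Big[y\Big]_{0}^{2x}$
$= \dfrac{2x}{3}$
For 1 < x < 2
$f_X(x) = \int_{0}^{2} \dfrac{1}{3}\,dy$
$= \dfrac{1}{3}\Big[y\Big]_{0}^{2}$
$= \dfrac{2}{3}$
Therefore, the marginal density of X is given by

$f_X(x) = \begin{cases} \dfrac{2x}{3} & 0 \le x \le 1 \\ \dfrac{2}{3} & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$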

6. The joint pdf of two random variables X and Y is given by
$f_{XY}(x, y) = \begin{cases} 24xy & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0 & \text{otherwise} \end{cases}$

Choose the correct option(s).

(a) $P\left(X + Y \le \frac{1}{4}\right) = \dfrac{1}{2}$
(b) $P\left(X + Y \le \frac{1}{2}\right) = \dfrac{1}{16}$
(c) X and Y are independent random variables.
(d) X and Y are dependent random variables.
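Solution:
Option (a)

[Figure: the support x + y ≤ 1 with the region x + y ≤ 1/4 shaded in orange]

The orange region denotes $X + Y \le \frac{1}{4}$.

Now,
$P\left(X + Y \le \frac{1}{4}\right) = \int_{y=0}^{1/4} \int_{x=0}^{1/4-y} f_{XY}(x, y)\,dx\,dy$
$= \int_{y=0}^{1/4} \int_{x=0}^{1/4-y} 24xy\,dx\,dy$
$= \int_{y=0}^{1/4} \Big[12x^2 y\Big]_{x=0}^{1/4-y} dy$
$= \int_{y=0}^{1/4} 12y\left(\dfrac{1}{4} - y\right)^2 dy$
$= \int_{y=0}^{1/4} \dfrac{12}{16}\,y(1 - 4y)^2\,dy$
$= \dfrac{3}{4}\int_{y=0}^{1/4} y(1 + 16y^2 - 8y)\,dy$
$= \dfrac{3}{4}\left[\dfrac{y^2}{2} + 4y^4 - \dfrac{8y^3}{3}\right]_{y=0}^{1/4}$
$= \dfrac{3}{4}\left(\dfrac{1}{32} + \dfrac{1}{64} - \dfrac{1}{24}\right)$
$= \dfrac{3}{4} \cdot \dfrac{1}{192} = \dfrac{1}{256}$

Hence, option (a) is wrong.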

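Option (b)

[Figure: the support x + y ≤ 1 with the region x + y ≤ 0.5 shaded in orange]

The orange region denotes $X + Y \le \frac{1}{2}$. Now,
$P\left(X + Y \le \frac{1}{2}\right) = \int_{y=0}^{1/2} \int_{x=0}^{1/2-y} f_{XY}(x, y)\,dx\,dy$
$= \int_{y=0}^{1/2} \int_{x=0}^{1/2-y} 24xy\,dx\,dy$
$= \int_{y=0}^{1/2} \Big[12x^2 y\Big]_{x=0}^{1/2-y} dy$
$= \int_{y=0}^{1/2} 12y\left(\dfrac{1}{2} - y\right)^2 dy$
$= \int_{y=0}^{1/2} \dfrac{12}{4}\,y(1 - 2y)^2\,dy$
$= 3\int_{y=0}^{1/2} y(1 + 4y^2 - 4y)\,dy$
$= 3\left[\dfrac{y^2}{2} + y^4 - \dfrac{4y^3}{3}\right]_{y=0}^{1/2}$
$= 3\left(\dfrac{1}{8} + \dfrac{1}{16} - \dfrac{1}{6}\right)$
$= 3 \times \dfrac{2}{96} = \dfrac{1}{16}$

Hence, option (b) is correct.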

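Options (c) and (d)

[Figure: the triangular support bounded by the axes and the line x + y = 1]

For 0 < x < 1
$f_X(x) = \int_{y=0}^{1-x} f_{XY}(x, y)\,dy$
$= \int_{y=0}^{1-x} 24xy\,dy$
$= \Big[12xy^2\Big]_{y=0}^{1-x}$
$= 12x(1 - x)^2$

For 0 < y < 1
$f_Y(y) = \int_{x=0}^{1-y} f_{XY}(x, y)\,dx$
$= \int_{x=0}^{1-y} 24xy\,dx$
$= \Big[12x^2 y\Big]_{x=0}^{1-y}$
$= 12y(1 - y)^2$

Therefore, $f_X(x) \cdot f_Y(y) = 144xy(1 - x)^2(1 - y)^2 \ne f_{XY}(x, y)$

Hence, X and Y are not independent, so option (c) is wrong and option (d) is correct.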

7. The joint pdf of two random variables X and Y is given by
\[
f_{XY}(x, y) =
\begin{cases}
3xy(1 - x) & 0 \le x \le 1,\ 0 \le y \le 2 \\
0 & \text{otherwise}
\end{cases}
\]

Calculate $P(X > \tfrac{1}{2} \mid Y = 1)$.


Solution:
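We know that
\[
P(a < X < b \mid Y = y) = \frac{\int_{a}^{b} f_{XY}(x, y)\,dx}{f_Y(y)}
\]
Now, for 0 ≤ y ≤ 2,
\[
\begin{aligned}
f_Y(y) &= \int_{0}^{1} 3xy(1 - x)\,dx \\
&= \int_{0}^{1} (3xy - 3x^2 y)\,dx \\
&= \left[\frac{3x^2 y}{2} - x^3 y\right]_{0}^{1} \\
&= \frac{3y}{2} - y = \frac{y}{2}
\end{aligned}
\]
Therefore, $f_Y(1) = \tfrac{1}{2}$.

Now,
\[
\begin{aligned}
P\left(X > \tfrac{1}{2} \mid Y = 1\right)
&= \frac{\int_{1/2}^{1} f_{XY}(x, 1)\,dx}{f_Y(1)} \\
&= 2 \int_{1/2}^{1} 3x(1 - x)\,dx \\
&= 6 \int_{1/2}^{1} (x - x^2)\,dx \\
&= 6 \left[\frac{x^2}{2} - \frac{x^3}{3}\right]_{1/2}^{1} \\
&= 6\left(\frac{1}{2} - \frac{1}{3}\right) - 6\left(\frac{1}{8} - \frac{1}{24}\right) = 1 - \frac{1}{2} = \frac{1}{2}
\end{aligned}
\]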
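The conditional probability can also be checked numerically. The following is a minimal sketch, assuming a simple midpoint rule is accurate enough here; the helper names `integrate`, `joint` and `cond_prob_x_given_y` are hypothetical.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def joint(x, y):
    """Joint density from the problem: 3xy(1 - x) on 0 <= x <= 1, 0 <= y <= 2."""
    return 3 * x * y * (1 - x) if 0 <= x <= 1 and 0 <= y <= 2 else 0.0

def cond_prob_x_given_y(a, b, y):
    """P(a < X < b | Y = y): integral of f_XY(x, y) over (a, b), divided by f_Y(y)."""
    f_y = integrate(lambda x: joint(x, y), 0.0, 1.0)
    return integrate(lambda x: joint(x, y), a, b) / f_y

print(cond_prob_x_given_y(0.5, 1.0, 1.0))  # close to 0.5
```

The result matches the symmetry of $x(1 - x)$ about $x = \tfrac{1}{2}$: given Y = 1, X is equally likely to fall on either side of 1/2.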

8. The amount of milk (in litres) in a shop at the beginning of any day is a random amount X, from which a random amount Y (in litres) is sold during that day. Assume that the joint density function of X and Y is given by
\[
f_{XY}(x, y) =
\begin{cases}
\dfrac{1}{50} & 0 \le x \le 10,\ 0 \le y \le x \\[4pt]
0 & \text{otherwise}
\end{cases}
\]
Find the probability that the amount of milk left at the end of the day is less than 5 litres. Write your answer correct to two decimal places.
Solution:
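[Figure: the triangular support below the line y = x for 0 ≤ x ≤ 10, split by the line x − y = 5 into a brown region (x − y < 5) and a blue region.]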

X denotes the amount of milk at the beginning of any day and Y denotes the amount
of milk which is sold during that day.
Therefore, amount of milk left at the end of the day will be denoted by X − Y .

To find: P (X − Y < 5)

In the diagram above, brown region denotes X −Y < 5 and brown + blue region denotes
the support of X and Y .

Area of the support of (X, Y) $= \tfrac{1}{2} \times 10 \times 10 = 50$.

Area of brown region = area of the support of (X, Y) − area of blue region

⇒ area of brown region $= 50 - \tfrac{1}{2} \times 5 \times 5 = \tfrac{75}{2}$

Therefore,
\[
P(X - Y < 5) = \frac{\text{area of brown region}}{\text{area of support}} = \frac{75/2}{50} = \frac{75}{100} = 0.75
\]
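The area argument can be cross-checked by integrating over x directly. This is a sketch only, with a hypothetical function name: for a given x, the sold amounts y that leave less than t litres form an interval of length min(x, t).

```python
def prob_left_less_than(t, steps=100_000):
    """P(X - Y < t) for (X, Y) uniform on 0 <= y <= x <= 10 (joint density 1/50)."""
    h = 10 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        # For this x, y ranges over (max(0, x - t), x], an interval of length min(x, t).
        total += min(x, t) * h / 50
    return total

print(round(prob_left_less_than(5), 2))
```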

9. The joint pdf of two continuous random variables X and Y is given by
\[
f_{XY}(x, y) =
\begin{cases}
k e^{-(x+y)} & x \ge 0,\ y \ge 0 \\
0 & \text{otherwise}
\end{cases}
\]

Find the value of $P(X \ge 5, Y \le 5)$.

(a) $e^{-10}$
(b) $(e^{-5} - 1)e^{-5}$
(c) $(1 - e^{-5})e^{-5}$
(d) $(e^{-5} + 1)e^{-5}$

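Solution:
We know that
\[
\iint_{\text{Supp}(X, Y)} f_{XY}\,dx\,dy = 1
\]

Therefore,
\[
\begin{aligned}
& \int_{y=0}^{\infty} \int_{x=0}^{\infty} k e^{-(x+y)}\,dx\,dy = 1 \\
\Rightarrow\ & k \int_{y=0}^{\infty} e^{-y} \int_{x=0}^{\infty} e^{-x}\,dx\,dy = 1 \\
\Rightarrow\ & k \int_{y=0}^{\infty} e^{-y} \left[-e^{-x}\right]_{0}^{\infty} dy = 1 \\
\Rightarrow\ & k \int_{y=0}^{\infty} e^{-y}(0 + 1)\,dy = 1 \\
\Rightarrow\ & k \int_{y=0}^{\infty} e^{-y}\,dy = 1 \\
\Rightarrow\ & k \left[-e^{-y}\right]_{0}^{\infty} = 1 \\
\Rightarrow\ & k(0 + 1) = 1 \\
\Rightarrow\ & k = 1
\end{aligned}
\]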

To find: P (X ≥ 5, Y ≤ 5)

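Now,
\[
\begin{aligned}
P(X \ge 5, Y \le 5) &= \int_{y=0}^{5} \int_{x=5}^{\infty} e^{-(x+y)}\,dx\,dy \\
&= \int_{y=0}^{5} e^{-y} \int_{x=5}^{\infty} e^{-x}\,dx\,dy \\
&= \int_{y=0}^{5} e^{-y} \left[-e^{-x}\right]_{5}^{\infty} dy \\
&= \int_{y=0}^{5} e^{-y}(0 + e^{-5})\,dy \\
&= e^{-5} \int_{y=0}^{5} e^{-y}\,dy \\
&= e^{-5} \left[-e^{-y}\right]_{0}^{5} \\
&= e^{-5}(-e^{-5} + 1) \\
&= e^{-5}(1 - e^{-5})
\end{aligned}
\]
Hence, option (c) is correct.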
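Because the density factors as $e^{-x} \cdot e^{-y}$, the double integral is a product of two one-dimensional exponential integrals. A small sketch of that observation (the function name is hypothetical):

```python
import math

def prob_x_ge_a_y_le_b(a, b):
    """P(X >= a, Y <= b) for the joint density e^{-(x+y)} on x, y >= 0 (k = 1)."""
    p_x = math.exp(-a)        # integral of e^{-x} over [a, infinity)
    p_y = 1 - math.exp(-b)    # integral of e^{-y} over [0, b]
    return p_x * p_y

value = prob_x_ge_a_y_le_b(5, 5)
print(value)  # equals e^{-5} * (1 - e^{-5}), roughly 0.0067
```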

10. The joint pdf of two random variables X and Y is given by
\[
f_{XY}(x, y) =
\begin{cases}
\dfrac{1}{8}(x + y) & 0 \le x \le 2,\ 0 \le y \le 2 \\[4pt]
0 & \text{otherwise}
\end{cases}
\]
Find the value of $P\left(\tfrac{1}{2} \le Y \le 1 \mid X = \tfrac{1}{2}\right)$. Write your answer correct to two decimal places.
Solution:
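We know that
\[
P(a < Y < b \mid X = x) = \frac{\int_{a}^{b} f_{XY}(x, y)\,dy}{f_X(x)}
\]
Now, for 0 ≤ x ≤ 2,
\[
\begin{aligned}
f_X(x) &= \int_{0}^{2} \frac{1}{8}(x + y)\,dy \\
&= \frac{1}{8}\left[xy + \frac{y^2}{2}\right]_{0}^{2} \\
&= \frac{2x + 2}{8} = \frac{x + 1}{4}
\end{aligned}
\]

Therefore, $f_X(\tfrac{1}{2}) = \tfrac{3}{8}$.

Now,
\[
\begin{aligned}
P\left(\tfrac{1}{2} \le Y \le 1 \mid X = \tfrac{1}{2}\right)
&= \frac{\int_{1/2}^{1} f_{XY}(\tfrac{1}{2}, y)\,dy}{f_X(\tfrac{1}{2})} \\
&= \int_{1/2}^{1} \frac{8}{3} \cdot \frac{1}{8}\left(\frac{1}{2} + y\right) dy \\
&= \int_{1/2}^{1} \frac{1}{3}\left(\frac{1}{2} + y\right) dy \\
&= \left[\frac{y}{6} + \frac{y^2}{6}\right]_{1/2}^{1} \\
&= \left(\frac{1}{6} + \frac{1}{6}\right) - \left(\frac{1}{12} + \frac{1}{24}\right) \\
&= \frac{1}{3} - \frac{1}{8} = \frac{5}{24} \approx 0.21
\end{aligned}
\]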
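The value 5/24 and its two-decimal rounding can be confirmed with exact rational arithmetic. This is an illustrative sketch; the function name `cond_prob_y_given_x` and the closed-form slice integral are choices made here.

```python
from fractions import Fraction as F

def cond_prob_y_given_x(a, b, x):
    """P(a <= Y <= b | X = x) for f(x, y) = (x + y)/8 on [0, 2] x [0, 2]."""
    def slice_integral(lo, hi):
        # Integral of (x + y)/8 with respect to y from lo to hi, in closed form.
        return (x * (hi - lo) + (hi**2 - lo**2) / 2) / 8
    # Denominator is f_X(x), the slice integral over the whole range of y.
    return slice_integral(a, b) / slice_integral(F(0), F(2))

p = cond_prob_y_given_x(F(1, 2), F(1), F(1, 2))
print(p, round(float(p), 2))
```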

Statistics for Data Science - 2
Week 7 practice Assignment
Statistics from samples and Limit theorems

1. If X, Y ∼ i.i.d. Normal(0, 4), what will be the variance of $\dfrac{X}{Y}$?
(a) 4
(b) 2
(c) 1
(d) Undefined

Solution:
We know that if X, Y ∼ i.i.d. Normal(0, σ²), then $\dfrac{X}{Y} \sim \text{Cauchy}(0, 1)$, and the variance of the Cauchy distribution is undefined.
Therefore, option (d) is correct.

2. A population has mean 60 and standard deviation 6. Random samples of size 100 from
this population are collected independently. Find the expected value of the sample mean.
Solution:
We know that the expected value of the sample mean $\bar{X}$ is given by
\[
E[\bar{X}] = \mu = 60
\]

3. Let $X_1, X_2, X_3, X_4$ and $X_5$ ∼ i.i.d. Normal(2, 25). Calculate $P(2X_1 + X_2 + 3X_3 + X_4 + X_5 \ge 10)$.

1. FZ (0.3)
2. 1 − FZ (0.3)
3. FZ (−0.3)
4. 1 − FZ (−0.3)

Solution:

We know that a linear combination of independent Normal random variables is again Normal.
Hence, $2X_1 + X_2 + 3X_3 + X_4 + X_5$ follows a Normal distribution.
Let $Y = 2X_1 + X_2 + 3X_3 + X_4 + X_5$.
\[
\begin{aligned}
E[Y] &= E[2X_1 + X_2 + 3X_3 + X_4 + X_5] = (2 + 1 + 3 + 1 + 1)\,E[X] = 8 \times 2 = 16 \\
\text{Var}(Y) &= \text{Var}(2X_1 + X_2 + 3X_3 + X_4 + X_5) = (4 + 1 + 9 + 1 + 1)\,\text{Var}(X) = 16 \times 25 = 400
\end{aligned}
\]

It implies that $Y \sim \text{Normal}(16, 20^2)$.

To find: $P(Y \ge 10)$

Now,
\[
\begin{aligned}
P(Y \ge 10) &= P(Y - 16 \ge -6) \\
&= P\left(\frac{Y - 16}{20} \ge \frac{-6}{20}\right) \\
&= P\left(\frac{Y - 16}{20} \ge -0.3\right) \\
&= P(Z \ge -0.3) \\
&= 1 - P(Z < -0.3) \\
&= 1 - F_Z(-0.3)
\end{aligned}
\]
Hence, option 4 is correct.
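The mean, variance and final probability can be reproduced with Python's standard library. This is a sketch for checking the arithmetic only; the variable names are illustrative.

```python
from statistics import NormalDist

weights = [2, 1, 3, 1, 1]   # coefficients of X1..X5 in Y
mu, var = 2, 25             # each Xi ~ Normal(2, 25)

mean_y = sum(weights) * mu                  # (2+1+3+1+1) * 2 = 16
var_y = sum(w * w for w in weights) * var   # (4+1+9+1+1) * 25 = 400

y = NormalDist(mean_y, var_y ** 0.5)        # NormalDist takes the standard deviation
p = 1 - y.cdf(10)                           # P(Y >= 10)
print(mean_y, var_y, p)
```

The printed probability equals $1 - F_Z(-0.3)$, which is the same number as $F_Z(0.3)$ by symmetry of the standard normal distribution.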

4. Random samples of size 100 are collected from a population of unknown parameters. If
the variance of the sample mean is 36, what will be the standard deviation of the actual
population?
Solution:
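We know that the variance of the sample mean is given by $\dfrac{\sigma^2}{n}$, where σ is the standard deviation of the actual population and n is the sample size.

By the given information, we have
\[
\begin{aligned}
& \frac{\sigma^2}{n} = 36 \\
\Rightarrow\ & \frac{\sigma^2}{100} = 36 \\
\Rightarrow\ & \sigma^2 = 3600 \\
\Rightarrow\ & \sigma = 60
\end{aligned}
\]

Therefore, the standard deviation of the actual population is 60.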

5. A random sample of size 50 is collected from a population with a standard deviation of 5. Find the upper bound on the probability that the sample mean will be at least 10 away from the actual mean using the weak law of large numbers. Write your answer correct to three decimal places.

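Solution:
Given: standard deviation of the population, σ = 5
Sample size, n = 50

To find: upper bound on $P(|\bar{X} - \mu| \ge 10)$, where $\bar{X}$ and µ are the sample mean and the population mean, respectively.

Now, by the weak law of large numbers, we have
\[
\begin{aligned}
& P(|\bar{X} - \mu| \ge \delta) \le \frac{\sigma^2}{n\delta^2} \\
\Rightarrow\ & P(|\bar{X} - \mu| \ge 10) \le \frac{25}{100 \times 50} \\
\Rightarrow\ & P(|\bar{X} - \mu| \ge 10) \le 0.005
\end{aligned}
\]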
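The bound is simple enough to wrap in a helper. The sketch below is illustrative (the name `wlln_bound` is hypothetical) and evaluates $\sigma^2 / (n\delta^2)$ exactly with fractions.

```python
from fractions import Fraction

def wlln_bound(variance, n, delta):
    """Upper bound sigma^2 / (n * delta^2) on P(|sample mean - mu| >= delta)."""
    return Fraction(variance) / (n * Fraction(delta) ** 2)

bound = wlln_bound(25, 50, 10)   # sigma = 5, so variance = 25
print(bound, float(bound))
```

Since the bound shrinks like 1/n, doubling the sample size halves it.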

6. A study shows that the average daily sleeping hours of teenagers is ten hours with a
standard deviation of two hours. If a sample of 100 teenagers is collected, what will be
the probability that the mean of the sleeping hours of these 100 teenagers is at least 0.4
hours away from the population mean? Assume that each observation in the sample is
independent. Assume that FZ denotes the CDF of standard normal distribution.

(a) 1 + FZ (−2) − FZ (2)


(b) 1 − FZ (−2) + FZ (2)
(c) FZ (2) − FZ (−2)
(d) FZ (2)

Solution:
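Let X denote the daily sleeping hours of a teenager.
Given: standard deviation of X, σ = 2
Sample size, n = 100

To find: $P(|\bar{X} - \mu| \ge 0.4)$, where $\bar{X}$ and µ are the sample mean and the population mean, respectively.

Let $S = X_1 + X_2 + \ldots + X_{100}$, where $X_i$ denotes the ith sample.

By the CLT, we know that $\dfrac{S - n\mu}{\sigma\sqrt{n}}$ is approximately Normal(0, 1) ⇒ $\dfrac{S - 100\mu}{20} \sim Z$ (standard normal).

Now,
\[
\begin{aligned}
P(|\bar{X} - \mu| \ge 0.4) &= P\left(\left|\frac{S}{n} - \mu\right| \ge 0.4\right) \\
&= P\left(\left|\frac{S - n\mu}{n}\right| \ge 0.4\right) \\
&= P\left(\left|\frac{S - n\mu}{\sigma\sqrt{n}}\right| \ge \frac{0.4\sqrt{n}}{\sigma}\right) \\
&= P(|Z| \ge 2) \\
&= P(Z \ge 2) + P(Z \le -2) \\
&= 1 - P(Z \le 2) + P(Z \le -2) \\
&= 1 - F_Z(2) + F_Z(-2)
\end{aligned}
\]
Hence, option (a) is correct.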

7. What is the fourth moment of the Normal(0, 4) distribution?


Solution:
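\[
\begin{aligned}
M_X(\lambda) = E[e^{\lambda X}] &= E\left[1 + \lambda X + \frac{\lambda^2 X^2}{2!} + \frac{\lambda^3 X^3}{3!} + \ldots\right] \\
&= 1 + \lambda E[X] + \frac{\lambda^2 E[X^2]}{2!} + \frac{\lambda^3 E[X^3]}{3!} + \ldots
\end{aligned}
\]
In the moment generating function, the coefficient of λ gives the first moment ($E[X]$), the coefficient of $\dfrac{\lambda^2}{2!}$ gives the second moment ($E[X^2]$) and, similarly, the coefficient of $\dfrac{\lambda^k}{k!}$ gives the kth moment ($E[X^k]$).

The moment generating function of Normal(0, σ²) is given by $e^{\lambda^2 \sigma^2 / 2}$.
Let $N \sim \text{Normal}(0, 2^2)$.
\[
\begin{aligned}
M_N(\lambda) &= e^{\lambda^2 2^2 / 2} \\
&= 1 + \frac{\lambda^2 2^2}{2} + \frac{\lambda^4 2^4}{2!\,(4)} + \ldots \\
&= 1 + \frac{\lambda^2 2^2}{2} + 48\,\frac{\lambda^4}{4!} + \ldots
\end{aligned}
\]

Therefore, the 4th moment of Normal(0, 2²) = coefficient of $\dfrac{\lambda^4}{4!}$ = 48.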
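The same coefficient extraction works for any even moment. The sketch below is an illustration with a hypothetical function name; it reads the coefficient of $\lambda^k$ from the series of $e^{\sigma^2 \lambda^2 / 2}$ and rescales it by $k!$.

```python
from fractions import Fraction
from math import factorial

def normal_even_moment(k, variance):
    """E[X^k] for X ~ Normal(0, variance) and even k, from the series of exp(variance * lam^2 / 2)."""
    m = k // 2
    # The term (variance * lam^2 / 2)^m / m! contributes the coefficient of lam^k.
    coeff = Fraction(variance) ** m / (2 ** m * factorial(m))
    # The k-th moment is the coefficient of lam^k / k!.
    return coeff * factorial(k)

print(normal_even_moment(2, 4), normal_even_moment(4, 4))
```

For k = 4 this reduces to the familiar $3\sigma^4 = 3 \times 16 = 48$.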

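8. Let $X \sim \text{Gamma}(2, \tfrac{1}{2})$ and $Y \sim \text{Gamma}(5, \tfrac{1}{2})$ be two independent random variables. What will be the expected value of $\dfrac{X}{X + Y}$? Write your answer correct to two decimal places.
Solution:
We know that if $X \sim \text{Gamma}(\alpha, k)$ and $Y \sim \text{Gamma}(\beta, k)$ are two independent random variables, then $\dfrac{X}{X + Y} \sim \text{Beta}(\alpha, \beta)$.

Given that $X \sim \text{Gamma}(2, \tfrac{1}{2})$ and $Y \sim \text{Gamma}(5, \tfrac{1}{2})$ are two independent random variables, it follows that
\[
\frac{X}{X + Y} \sim \text{Beta}(2, 5)
\]
Therefore, since the mean of Beta(α, β) is $\dfrac{\alpha}{\alpha + \beta}$,
\[
E\left[\frac{X}{X + Y}\right] = \frac{2}{2 + 5} = \frac{2}{7} \approx 0.29
\]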

9. A study says that the delivery time of pizzas has a standard deviation of 10 minutes. A pizza shop collected data on some deliveries and their delivery times. The probability that the mean delivery time of this sample is at least $\sqrt{5}$ minutes away from the actual mean delivery time is at most $\tfrac{1}{5}$ as per the weak law of large numbers. What is the size of the sample?
Solution:
Let X denote the delivery time of pizzas.
Given that σ = 10.
To find: the sample size n such that
\[
P(|\bar{X} - \mu| \ge \sqrt{5}) \le \frac{1}{5} \qquad \ldots(1)
\]
By the weak law of large numbers, we have
\[
\begin{aligned}
& P(|\bar{X} - \mu| \ge \delta) \le \frac{\sigma^2}{n\delta^2} \\
\Rightarrow\ & P(|\bar{X} - \mu| \ge \sqrt{5}) \le \frac{100}{n \times 5} \qquad \ldots(2)
\end{aligned}
\]

By equations (1) and (2), we have
\[
\frac{1}{5} = \frac{100}{5n} \ \Rightarrow\ n = 100
\]

10. A company sells eggs whose weights are normally distributed with a mean of 70g and a
standard deviation of 2g. Suppose that these eggs are sold in packages that each contain
four eggs. Assume that the weight of each egg is independent. What is the probability
that the mean weight of the four eggs in a package is greater than 68.5g? Write your
answer correct to two decimal places.
(Hint: Use the fact that linear combination of normal distributions is again a normal
distribution. FZ (−1.5) = 0.066)

Solution:
Let X denote the weight of an egg.
Given that E[X] = µ = 70
SD(X) = σ = 2
X ∼ Normal(70, 2²)
Let $X_1, X_2, X_3$ and $X_4$ denote the weights of the four eggs in a package.

Suppose that
\[
\bar{X} = \frac{X_1 + X_2 + X_3 + X_4}{4}
\]

To find: $P(\bar{X} > 68.5)$

We know that a linear combination of independent Normal random variables is again Normal.
It implies that $\bar{X}$ follows a Normal distribution.

\[
E[\bar{X}] = \mu = 70 \quad \text{and} \quad \text{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{4}{4} = 1
\]

It implies that $\bar{X} \sim \text{Normal}(70, 1)$ ⇒ $\bar{X} - 70 \sim \text{Normal}(0, 1)$

Now,
\[
\begin{aligned}
P(\bar{X} > 68.5) &= P(\bar{X} - 70 > -1.5) \\
&= P(Z > -1.5) \\
&= 1 - F_Z(-1.5) \\
&= 1 - 0.066 = 0.93
\end{aligned}
\]

11. Let $X_1, X_2, X_3, \ldots, X_n$ be i.i.d. Poisson(4). What should be the value of n such that $P(3.8 \le \bar{X} \le 4.2) \ge 0.95$? [2 marks]
(Hint: Use $F_Z(1.96) = 0.975$)

1. at least 200
2. at least 385
3. at least 450
4. at least 585

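Solution:
Given that $X_1, X_2, X_3, \ldots, X_n$ ∼ i.i.d. Poisson(4)

Mean of the distribution = µ = 4
Variance of the distribution = σ² = 4
Let $S = X_1 + X_2 + \ldots + X_n$ and
\[
\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n}
\]

To find: value of n such that $P(3.8 \le \bar{X} \le 4.2) \ge 0.95$

By the CLT, we know that
\[
\frac{S - n\mu}{\sigma\sqrt{n}} \sim \text{Normal}(0, 1)
\ \Rightarrow\ \frac{S - 4n}{2\sqrt{n}} \sim \text{Normal}(0, 1) \qquad \ldots(1)
\]

\[
\begin{aligned}
& P(3.8 \le \bar{X} \le 4.2) \ge 0.95 \\
\Rightarrow\ & P\left(3.8 \le \frac{S}{n} \le 4.2\right) \ge 0.95 \\
\Rightarrow\ & P\left(-0.2 \le \frac{S}{n} - 4 \le 0.2\right) \ge 0.95 \\
\Rightarrow\ & P\left(-0.2 \le \frac{S - 4n}{n} \le 0.2\right) \ge 0.95 \\
\Rightarrow\ & P\left(-0.1 \le \frac{S - 4n}{2n} \le 0.1\right) \ge 0.95 \\
\Rightarrow\ & P\left(-0.1\sqrt{n} \le \frac{S - 4n}{2\sqrt{n}} \le 0.1\sqrt{n}\right) \ge 0.95 \\
\Rightarrow\ & F_Z(0.1\sqrt{n}) - F_Z(-0.1\sqrt{n}) \ge 0.95 \\
\Rightarrow\ & F_Z(0.1\sqrt{n}) - (1 - F_Z(0.1\sqrt{n})) \ge 0.95 \\
\Rightarrow\ & 2F_Z(0.1\sqrt{n}) - 1 \ge 0.95 \\
\Rightarrow\ & F_Z(0.1\sqrt{n}) \ge 0.975 \\
\Rightarrow\ & 0.1\sqrt{n} \ge 1.96 \\
\Rightarrow\ & n \ge 384.16
\end{aligned}
\]
Since n must be an integer, n is at least 385, so option 2 is correct.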
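The same calculation can be scripted for any tolerance and confidence level. The following is a sketch under the CLT approximation used above; the function name `min_sample_size` is hypothetical, and `NormalDist().inv_cdf` plays the role of the hint $F_Z(1.96) = 0.975$.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma, half_width, confidence):
    """Smallest n with P(|sample mean - mu| <= half_width) >= confidence, using the CLT approximation."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96 for confidence = 0.95
    return math.ceil((z * sigma / half_width) ** 2)

print(min_sample_size(sigma=2, half_width=0.2, confidence=0.95))
```

Here sigma = 2 because a Poisson(4) variable has variance 4.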

12. Let the moment generating function of a random variable X be given by
\[
M_X(\lambda) = \frac{1}{8} e^{-4\lambda} + \frac{1}{6} e^{-2\lambda} + \frac{1}{6} e^{2\lambda} + \frac{1}{8} e^{4\lambda} + \frac{5}{12}
\]
Find the distribution of X. [1 mark]

1.  x         | −4   | −2  | 0    | 2   | 4
    P(X = x)  | 1/8  | 1/6 | 1/6  | 1/8 | 5/12

2.  x         | −4   | −2  | 0    | 2   | 4
    P(X = x)  | 5/12 | 1/8 | 1/6  | 1/6 | 1/8

3.  x         | −4   | −2  | 0    | 2   | 4
    P(X = x)  | 1/8  | 1/6 | 5/12 | 1/6 | 1/8

4.  x         | −4   | −2  | 0    | 2   | 4
    P(X = x)  | 1/8  | 1/6 | 5/12 | 1/8 | 1/6

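Solution:
The MGF of a discrete random variable X with the PMF $f_X(x) = P(X = x)$, $x \in T_X$, is given by
\[
M_X(\lambda) = E[e^{\lambda X}] = \sum_{x \in T_X} P(X = x)\,e^{\lambda x}
\]

Now, the MGF of the random variable X is given by
\[
M_X(\lambda) = \frac{1}{8} e^{-4\lambda} + \frac{1}{6} e^{-2\lambda} + \frac{1}{6} e^{2\lambda} + \frac{1}{8} e^{4\lambda} + \frac{5}{12}
\]
Matching the coefficient of each $e^{\lambda x}$ with $P(X = x)$ (the constant term $\tfrac{5}{12}$ corresponds to x = 0), the distribution of X is given by

    x         | −4   | −2  | 0    | 2   | 4
    P(X = x)  | 1/8  | 1/6 | 5/12 | 1/6 | 1/8

Hence, option 3 is correct.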

13. A fair die is rolled 3600 times. Use CLT to compute the probability that six appears at
most 630 times. Enter the answer correct to two decimal places.
(Hint: Use FZ (1.341) = 0.91)
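Solution:
Define a random variable X such that
\[
X =
\begin{cases}
1 & \text{if six appears on rolling a fair die} \\
0 & \text{otherwise}
\end{cases}
\]

Therefore, $E[X] = \mu = \dfrac{1}{6}$ and
\[
\text{Var}(X) = \sigma^2 = \frac{1}{6} \cdot \frac{5}{6} = \frac{5}{36}
\]

Let $X_1, X_2, \ldots, X_{3600}$ be the outcomes on rolling the fair die 3600 times.

Notice that $X_1 + X_2 + \ldots + X_{3600}$ denotes the number of times six appears in 3600 rolls.

Let $S = X_1 + X_2 + \ldots + X_{3600}$

To find: $P(S \le 630)$

By the CLT, we know that
\[
\frac{S - 3600\mu}{\sigma\sqrt{n}} \sim \text{Normal}(0, 1)
\ \Rightarrow\ \frac{S - 600}{10\sqrt{5}} \sim \text{Normal}(0, 1)
\]
Now,
\[
\begin{aligned}
P(S \le 630) &= P(S - 600 \le 30) \\
&= P\left(\frac{S - 600}{10\sqrt{5}} \le \frac{30}{10\sqrt{5}}\right) \\
&= P(Z \le 1.34) \\
&= 0.91
\end{aligned}
\]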
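The normal approximation can be evaluated directly instead of using the table value in the hint. A minimal sketch, with illustrative variable names and no continuity correction (matching the solution above):

```python
from statistics import NormalDist

n, p = 3600, 1 / 6
mean = n * p                      # expected number of sixes, 600
sd = (n * p * (1 - p)) ** 0.5     # sqrt(500) = 10 * sqrt(5)

prob = NormalDist(mean, sd).cdf(630)   # CLT approximation of P(S <= 630)
print(round(prob, 2))
```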

14. A fair die is rolled 1000 times. Let X denote the number of times six is obtained. Find a bound for the probability that $\dfrac{X}{1000}$ differs from $\dfrac{1}{6}$ by more than 0.2 using the weak law of large numbers.
1. at least $\dfrac{5}{1440}$
2. at least $\dfrac{1436}{1440}$
3. at most $\dfrac{5}{1440}$
4. at most $\dfrac{1436}{1440}$
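Solution:
X denotes the number of times six is obtained on rolling the die 1000 times.
Let $X_1, X_2, \ldots, X_{1000}$ be 1000 i.i.d. samples such that
\[
X_i =
\begin{cases}
1 & \text{if six appears on rolling a fair die} \\
0 & \text{otherwise}
\end{cases}
\]

\[
E[X_i] = \mu = \frac{1}{6} \quad \text{and} \quad \text{Var}(X_i) = \sigma^2 = \frac{5}{36}
\]
Notice that $X = X_1 + X_2 + X_3 + \ldots + X_{1000}$, so $\dfrac{X}{1000}$ is the sample mean.

To find: a bound on $P\left(\left|\dfrac{X}{1000} - \dfrac{1}{6}\right| > 0.2\right)$.

By the weak law of large numbers, we have
\[
\begin{aligned}
& P\left(\left|\frac{X}{n} - \mu\right| > \delta\right) \le \frac{\sigma^2}{n\delta^2} \\
\Rightarrow\ & P\left(\left|\frac{X}{1000} - \frac{1}{6}\right| > 0.2\right) \le \frac{5}{36 \times 1000 \times 0.04} \\
\Rightarrow\ & P\left(\left|\frac{X}{1000} - \frac{1}{6}\right| > 0.2\right) \le \frac{5}{1440}
\end{aligned}
\]
Hence, the probability is at most $\dfrac{5}{1440}$ and option 3 is correct.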

15. Consider the following PDF curves and match them with the correct distribution. [1 mark]

[Figure: four PDF curves, labelled Graph 1, Graph 2, Graph 3 and Graph 4.]
(a) Graph 1 → Gamma, Graph 2 → Normal, Graph 3 → Gamma, Graph 4 → Beta.
(b) Graph 1 → Beta, Graph 2 → Gamma, Graph 3 → Normal, Graph 4 → Gamma.
(c) Graph 1 → Beta, Graph 2 → Normal, Graph 3 → Normal, Graph 4 → Gamma.
(d) Graph 1 → Gamma, Graph 2 → Normal, Graph 3 → Normal, Graph 4 → Beta.

Solution:
Graph 1: The range of the distribution is [0, 1] and the shape of the graph resembles the Beta distribution.

Graph 2: The PDF curve is not symmetric about the mean and the shape of the graph resembles the Gamma distribution.

Graph 3: The PDF curve is symmetric about the mean and the shape of the graph resembles the Normal distribution.

Graph 4: The PDF curve is not symmetric about the mean and the shape of the graph resembles the Gamma distribution.

Therefore, Graph 1 → Beta, Graph 2 → Gamma, Graph 3 → Normal, Graph 4 → Gamma, and option (b) is correct.

16. Let $X_1, X_2$ and $X_3$ ∼ i.i.d. X, where X has the following probability mass function:

    x         | −1  | 2
    f_X(x)    | 2/3 | 1/3

Table 7.1.P: PMF of X

Find the distribution of $Y = X_1 + X_2 + X_3$. [1 mark]

(a) y         | −3   | 0    | 3    | 6
    P(Y = y)  | 1/6  | 1/6  | 1/3  | 1/3

(b) y         | −3   | 0    | 3    | 6
    P(Y = y)  | 8/27 | 4/9  | 2/9  | 1/27

(c) y         | −3   | 0    | 3    | 6
    P(Y = y)  | 8/27 | 1/27 | 4/9  | 2/9

(d) y         | −3   | 0    | 3    | 6
    P(Y = y)  | 2/9  | 8/27 | 1/27 | 4/9
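Solution:
The PMF of X is given by

    x         | −1  | 2
    f_X(x)    | 2/3 | 1/3

Given that $Y = X_1 + X_2 + X_3$, where $X_1, X_2$ and $X_3$ ∼ i.i.d. X.

To find: the distribution of Y.

We will find the distribution of Y by finding the MGF of Y.
\[
\begin{aligned}
M_Y(\lambda) &= E[e^{\lambda Y}] \\
&= E[e^{\lambda(X_1 + X_2 + X_3)}] \\
&= E[e^{\lambda X_1} e^{\lambda X_2} e^{\lambda X_3}] \\
&= E[e^{\lambda X_1}]\,E[e^{\lambda X_2}]\,E[e^{\lambda X_3}] && \text{(since } X_1, X_2 \text{ and } X_3 \text{ are independent)} \\
&= E[e^{\lambda X}]\,E[e^{\lambda X}]\,E[e^{\lambda X}] && \text{(since } X_1, X_2 \text{ and } X_3 \sim \text{i.i.d. } X\text{)} \\
&= [M_X(\lambda)]^3 && \ldots(1)
\end{aligned}
\]

Now,
\[
\begin{aligned}
M_X(\lambda) &= E[e^{\lambda X}] \\
&= e^{-\lambda}\,P(X = -1) + e^{2\lambda}\,P(X = 2) \\
&= \frac{2e^{-\lambda}}{3} + \frac{e^{2\lambda}}{3} && \ldots(2)
\end{aligned}
\]
From equations (1) and (2), we have
\[
\begin{aligned}
M_Y(\lambda) &= \left(\frac{2e^{-\lambda}}{3} + \frac{e^{2\lambda}}{3}\right)^3 \\
&= \frac{1}{27}\left(2e^{-\lambda} + e^{2\lambda}\right)^3 \\
&= \frac{1}{27}\left(8e^{-3\lambda} + e^{6\lambda} + 12e^{-2\lambda}e^{2\lambda} + 6e^{-\lambda}e^{4\lambda}\right) && \text{(since } (a + b)^3 = a^3 + b^3 + 3a^2 b + 3ab^2\text{)} \\
&= \frac{8}{27}e^{-3\lambda} + \frac{1}{27}e^{6\lambda} + \frac{4}{9} + \frac{2}{9}e^{3\lambda}
\end{aligned}
\]
Therefore, the distribution of Y is given by

    y         | −3   | 0    | 3    | 6
    P(Y = y)  | 8/27 | 4/9  | 2/9  | 1/27

Hence, option (b) is correct.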
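The distribution of a sum of independent discrete variables can also be obtained by convolving PMFs directly, which is what multiplying the MGFs does term by term. The sketch below is an illustration with a hypothetical helper `convolve`; it is not the method used in the solution, only a cross-check.

```python
from collections import defaultdict
from fractions import Fraction

def convolve(p, q):
    """PMF of the sum of two independent discrete random variables with PMFs p and q."""
    out = defaultdict(Fraction)   # missing keys start at probability 0
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] += pa * qb
    return dict(out)

pmf_x = {-1: Fraction(2, 3), 2: Fraction(1, 3)}
pmf_y = convolve(convolve(pmf_x, pmf_x), pmf_x)   # Y = X1 + X2 + X3

for y in sorted(pmf_y):
    print(y, pmf_y[y])
```

Each probability is the coefficient of the matching $e^{\lambda y}$ term in $M_Y(\lambda)$.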
