Discrete Random Variables and Probabilities
1. A random variable X is defined as the length of the hypotenuse of the right-angled
triangle whose other two sides are determined by the roll of two 6-sided dice. How many
values does X take? [1 mark]
Solution:
When two dice are rolled then there are a total of 36 outcomes.
The outcomes are:
{(1, 1) , (1, 2), ... , (1, 6),
(2, 1), (2, 2), ... , (2, 6),
...
...
(6, 1), (6, 2), ... , (6, 6)}
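Ordered outcomes such as (1, 2) and (2, 1) give the same length of the hypotenuse, so only
the 21 unordered pairs {a, b} with 1 ≤ a ≤ b ≤ 6 matter. No two of these pairs share the
same value of a² + b², hence a total of 21 values are possible for the random variable X.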
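The count can be checked by brute force. The following is a minimal sketch (the variable names are illustrative, not part of the original solution) that collects the distinct values of a² + b² over all 36 rolls; distinct squared lengths correspond one-to-one with distinct hypotenuse lengths.

```python
from itertools import product

# Squared hypotenuse length a^2 + b^2 for every ordered roll (a, b) of two dice.
squared_lengths = {a * a + b * b for a, b in product(range(1, 7), repeat=2)}

# Each distinct squared length is a distinct value of X.
num_values = len(squared_lengths)
print(num_values)
```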
2. Two cards are drawn from a well shuffled pack of 52 cards one after other without
replacement. A random variable is defined as:
X = 0 if both cards are of the same color
X = 1 if the two cards are of different colors
(a) x        0      1
    f_X(x)   1/13   12/13
(b) x        0      1
    f_X(x)   1/2    1/2
(c) x        0      1
    f_X(x)   25/51  26/51
(d) x        0      1
    f_X(x)   12/25  13/25
Solution:
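$P(X = 0) = P(\text{both cards are of the same color})$
$= P(\text{first card is any one of the 52 cards}) \cdot P(\text{second card is of the same color as the first})$
$= 1 \cdot \frac{25}{51} = \frac{25}{51}$
Hence $P(X = 1) = 1 - \frac{25}{51} = \frac{26}{51}$, which is option (c).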
3. In a group of fifteen people, 8 people have blood group type O, 4 people have blood group
type A, and 3 people have blood group type B. If five people are selected randomly from
these fifteen people, then what is the probability that out of these five people 2 people
have blood group type O, 2 have blood group type A and one has blood group type B?
(Answer the question correct up to two decimal places.) [2 mark]
Solution:
Number of ways of selecting five people out of 15 = 15 C5
Number of ways of selecting 2 people of blood group of type O out of 8 people of blood
group of type O= 8 C2
Number of ways of selecting 2 people of blood group of type A out of 4 people of blood
group of type A= 4 C2
Number of ways of selecting 1 people of blood group of type B out of 3 people of blood
group of type B= 3 C1
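Therefore, required probability $= \frac{{}^{8}C_{2} \cdot {}^{4}C_{2} \cdot {}^{3}C_{1}}{{}^{15}C_{5}} = \frac{28 \times 6 \times 3}{3003} = \frac{504}{3003} \approx 0.1678$, that is, 0.17 correct up to two decimal places.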
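The counting argument above can be reproduced numerically. This is a small sketch using Python's `math.comb`; the variable names are assumptions chosen for illustration.

```python
from math import comb

# Ways to choose 2 of 8 type-O, 2 of 4 type-A and 1 of 3 type-B people.
favourable = comb(8, 2) * comb(4, 2) * comb(3, 1)

# Ways to choose any 5 of the 15 people.
total = comb(15, 5)

prob = favourable / total
print(favourable, total, round(prob, 2))
```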
x        -2    -1    0    1     2
f_X(x)   a     0.2   b    0.1   0.2
If $P(X \le 1 \mid X \ge -1) = \frac{3}{4}$, then find the value of $P(X = -2)$. [2 marks]
Solution:
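We know that
$\sum_{x \in T_X} f_X(x) = 1$
$\Rightarrow a + b + 0.5 = 1$ ...(1)
It is given that
$P(X \le 1 \mid X \ge -1) = \frac{3}{4}$
$\Rightarrow \frac{P(X \le 1, X \ge -1)}{P(X \ge -1)} = \frac{3}{4}$
$\Rightarrow \frac{P(\{-1, 0, 1\})}{P(\{-1, 0, 1, 2\})} = \frac{3}{4}$
$\Rightarrow \frac{b + 0.3}{b + 0.5} = \frac{3}{4}$
$\Rightarrow 4b + 1.2 = 3b + 1.5$
$\Rightarrow b = 0.3$ ...(2)
From (1) and (2), a = 0.2 and b = 0.3.
$P(X = -2) = a$
$\Rightarrow P(X = -2) = 0.2$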
5. Siberian seagulls migrate to Ganga river to escape harsh winter weather in the months
of October to March. It is seen that the number of Siberian seagulls reaching Ganga
river on one day in January is Poisson distributed with an average of 1000. What is the
probability that 650 seagulls will arrive on a given day of January? [2 marks]
(a) $\frac{e^{-650} (650)^{1000}}{650!}$
(b) $\frac{e^{-650} (650)^{1000}}{1000!}$
(c) $\frac{e^{-1000} (650)^{1000}}{650!}$
(d) $\frac{e^{-1000} (1000)^{650}}{650!}$
Solution:
Let X be the number of Siberian seagulls reaching Ganga river on a given day in January.
By the given condition, we have
$X \sim \text{Poisson}(1000)$
$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$
$\Rightarrow P(X = 650) = \frac{e^{-1000} (1000)^{650}}{650!}$
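Evaluating this expression directly overflows ordinary floating point, because 1000^650 and 650! are both enormous. As a hedged sketch, the PMF can instead be evaluated through logarithms; `poisson_pmf` is an illustrative helper name, not something defined in the assignment.

```python
from math import exp, lgamma, log

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    # log P = -lam + k*log(lam) - log(k!), and lgamma(k + 1) equals log(k!).
    return exp(-lam + k * log(lam) - lgamma(k + 1))

p_650 = poisson_pmf(650, 1000.0)
print(p_650)
```

The result is a positive but extremely small number, which is expected because 650 lies far below the mean of 1000.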
x        -1    0     1     2     3
f_X(x)   0.1   0.3   0.2   0.1   0.3
If another random variable Y is defined as Y = X(X − 1), then find the smallest value
of y in the range of Y such that $P(Y \le y) > \frac{1}{2}$ and $P(Y \ge y) \le \frac{1}{2}$. [2 marks]
Solution:
Y is defined as Y = X(X − 1)
At X = −1, Y = −1(−2) = 2
At X = 0, Y = 0(−1) = 0
At X = 1, Y = 1(0) = 0
At X = 2, Y = 2(1) = 2
At X = 3, Y = 3(2) = 6
Therefore, TY = {0, 2, 6}
First required condition is not satisfied at Y = 0.
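The remaining values in T_Y can be checked mechanically. The sketch below (names are illustrative) builds the PMF of Y from the PMF of X with exact fractions and then tests both conditions for each y in the range of Y.

```python
from collections import defaultdict
from fractions import Fraction

# PMF of X from the table above, in exact arithmetic.
pmf_x = {-1: Fraction(1, 10), 0: Fraction(3, 10), 1: Fraction(2, 10),
         2: Fraction(1, 10), 3: Fraction(3, 10)}

# PMF of Y = X(X - 1): add up the probabilities of all x mapping to the same y.
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[x * (x - 1)] += p

half = Fraction(1, 2)
candidates = [
    y for y in sorted(pmf_y)
    if sum(p for v, p in pmf_y.items() if v <= y) > half
    and sum(p for v, p in pmf_y.items() if v >= y) <= half
]
smallest_y = candidates[0]
print(dict(pmf_y), smallest_y)
```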
7. Three friends toss three fair coins to decide who is going to pay for the dinner. The per-
son getting an outcome different from the other two outcomes will pay for the dinner. If
all three coins result in the same outcome, they will toss the coins again. If X denotes
the number of trials needed to decide who is going to pay, then what is the probability
that X is at most 3? (Answer the question correct up to two decimal places.) [2 marks]
Solution:
Let X be the number of trials to decide who is going to pay.
Sample space on tossing three coins are:
{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT }
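$P(\text{they decide who is going to pay}) = P(\{HHT, HTH, THH, HTT, THT, TTH\}) = \frac{6}{8} = \frac{3}{4}$
$P(\text{they do not decide who is going to pay}) = P(\{HHH, TTT\}) = \frac{2}{8} = \frac{1}{4}$
X takes the values 1, 2, 3, 4, ...
and $X \sim \text{Geometric}\left(\frac{3}{4}\right)$
$P(X \le 3) = P(X = 1) + P(X = 2) + P(X = 3)$
$= \frac{3}{4} + \frac{1}{4} \cdot \frac{3}{4} + \left(\frac{1}{4}\right)^{2} \cdot \frac{3}{4}$
$= \frac{63}{64} \approx 0.98$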
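As a quick check of the geometric sum, here is a minimal sketch in exact arithmetic; the variable names are assumptions for illustration.

```python
from fractions import Fraction

p = Fraction(3, 4)  # probability that a single round of tosses decides the payer

# P(X <= 3) = sum over k = 1, 2, 3 of (1 - p)^(k - 1) * p
prob_at_most_3 = sum((1 - p) ** (k - 1) * p for k in range(1, 4))
print(prob_at_most_3, round(float(prob_at_most_3), 2))
```

The same value follows from the complement: 1 − (1/4)³, the probability that not all three rounds are undecided.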
8. Let X ∼ Uniform({1, 2, 3, ..., n}). If the probability that X is an odd number is $\frac{6}{11}$,
then what can be the value of n? [2 marks]
(a) 11 only
(b) 12 only
(c) Any multiple of 11.
(d) Any odd multiple of 11.
Solution:
Since X ∼ Uniform({1, 2, 3, ..., n}), let A be the event that X takes an odd value.
Therefore,
$P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } S}$ ...(1)
where S = {1, 2, 3, ..., n}.
It is given that
$P(A) = \frac{6}{11}$ ...(2)
If n is even, exactly half of the values are odd, so P(A) = 1/2, which is not 6/11.
If n is odd, there are (n + 1)/2 odd values, so by equations (1) and (2),
$\frac{n + 1}{2n} = \frac{6}{11} \Rightarrow 11n + 11 = 12n \Rightarrow n = 11$
This is possible only for n = 11.
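The claim that n = 11 is the only solution can be spot-checked by a finite search. This sketch assumes an arbitrary search bound of 1000 and uses an illustrative helper `prob_odd`.

```python
from fractions import Fraction

def prob_odd(n: int) -> Fraction:
    """P(X is odd) for X uniform on {1, ..., n}."""
    odd_count = (n + 1) // 2  # number of odd integers in 1..n
    return Fraction(odd_count, n)

# The upper bound 1000 is an arbitrary choice for this check.
matches = [n for n in range(1, 1001) if prob_odd(n) == Fraction(6, 11)]
print(matches)
```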
9. The number of customers arriving per day at a certain automobile service facility is
assumed to follow a Poisson distribution with an average of 50 customers arriving each
day. Assume that number of customers on different days are independent. What is the
probability that exactly 40 customers will come for at least 5 days over a 30 days period?
[3 marks]
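(a) $1 - \sum_{x=0}^{4} {}^{30}C_{x} \left(\frac{e^{-50} (50)^{40}}{40!}\right)^{x} \left(1 - \frac{e^{-50} (50)^{40}}{40!}\right)^{30-x}$
(b) $\sum_{x=0}^{4} {}^{30}C_{x} \left(\frac{e^{-50} (50)^{40}}{40!}\right)^{x} \left(1 - \frac{e^{-50} (50)^{40}}{40!}\right)^{30-x}$
(c) ${}^{30}C_{5} \left(\frac{e^{-50} (50)^{40}}{40!}\right)^{5} \cdot \left(1 - \frac{e^{-50} (50)^{40}}{40!}\right)^{25}$
(d) ${}^{30}C_{5} \left(1 - \frac{e^{-50} (50)^{40}}{40!}\right)^{5} \cdot \left(\frac{e^{-50} (50)^{40}}{40!}\right)^{25}$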
Solution:
Let X be the number of customers arriving per day at the automobile service facility.
$X \sim \text{Poisson}(50)$
$P(X = 40) = \frac{e^{-50} 50^{40}}{40!}$
Let Y be the number of days in the 30 days period on which exactly 40 customers arrive
at the facility.
Then, $Y \sim \text{Binomial}\left(30, \frac{e^{-50} 50^{40}}{40!}\right)$
Now,
$P(Y \ge 5) = 1 - P(Y < 5)$
$= 1 - \sum_{x=0}^{4} {}^{30}C_{x} \left(\frac{e^{-50} (50)^{40}}{40!}\right)^{x} \left(1 - \frac{e^{-50} (50)^{40}}{40!}\right)^{30-x}$
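The two-stage structure (a Poisson probability used as the success probability of a binomial) can be evaluated numerically. The following is a sketch under the same assumptions as the solution; the variable names are illustrative.

```python
from math import comb, exp, lgamma, log

# Stage 1: p = P(exactly 40 customers on one day) for a Poisson(50) count,
# computed in log space for numerical safety.
p = exp(-50 + 40 * log(50) - lgamma(41))

# Stage 2: Y ~ Binomial(30, p) counts the days with exactly 40 customers.
prob_at_least_5 = 1 - sum(
    comb(30, x) * p**x * (1 - p) ** (30 - x) for x in range(5)
)
print(p, prob_at_least_5)
```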
10. A biased coin with the probability of 0.4 of showing head is tossed until it shows either
two consecutive heads or two consecutive tails. If X denotes the number of tosses
required, what is the value of P (X = 5)? [3 marks]
(a) 0.03456
(b) 0.02304
(c) 0.01675
(d) 0.0576
Solution:
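The tossing stops at the fifth toss only if the first four tosses alternate and the fifth
toss repeats the fourth. Hence
$P(X = 5) = P(HTHTT) + P(THTHH)$
$= (0.4)^{2} (0.6)^{3} + (0.4)^{3} (0.6)^{2}$
$= 0.03456 + 0.02304 = 0.0576$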
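The same value can be obtained by enumerating all 32 sequences of five tosses and keeping those whose first repeated pair ends exactly at toss 5. This is a minimal sketch; `stopping_time` is a hypothetical helper introduced here.

```python
from itertools import product

P = {"H": 0.4, "T": 0.6}  # biased coin from the problem statement

def stopping_time(seq):
    """Toss number at which two equal consecutive outcomes first appear, else None."""
    for i in range(1, len(seq)):
        if seq[i] == seq[i - 1]:
            return i + 1
    return None

prob_x5 = 0.0
for seq in product("HT", repeat=5):
    if stopping_time(seq) == 5:
        weight = 1.0
        for toss in seq:
            weight *= P[toss]
        prob_x5 += weight
print(prob_x5)
```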
Statistics for Data Science - 2
1. A customer will purchase a shirt with probability 0.5. The customer will purchase a
pant with probability 0.4 and will purchase both a shirt and a pant with probability 0.2.
What is the probability that the customer will purchase neither a shirt nor a pant?
Solution:
Let A be the event that the customer will purchase a shirt and B be the event that the
customer will purchase a pant.
Given that, P (A) = 0.5 and P (B) = 0.4.
Also given that the customer will purchase both a shirt and a pant with probability 0.2.
i.e. P (A ∩ B) = 0.2.
We have to find the probability that the customer will purchase neither a shirt nor a
pant, i.e. $P(A^{C} \cap B^{C})$.
We know that $P(A^{C} \cap B^{C}) = P((A \cup B)^{C}) = 1 - P(A \cup B)$
And, $P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.5 + 0.4 - 0.2 = 0.7$
$\Rightarrow P(A^{C} \cap B^{C}) = 1 - P(A \cup B) = 1 - 0.7 = 0.3$
2. Suppose that we roll a pair of fair dice, so each of the 36 possible outcomes is equally
likely. Let A denote the event that the first die shows 5, B be the event such that the
sum of the outcomes of rolling the pair of dice is 10, and C be the event such that the
sum of the outcomes of rolling the pair of dice is 7. Then
Solution:
We are rolling a pair of fair dice and all the 36 outcomes is equally likely that means
probability of occurring each outcome is same i.e. 1/36.
A is the event that the first die shows 5.
⇒ A = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)}
B is the event that the sum of the outcomes of rolling the pair of dice is 10.
⇒ B = {(4, 6), (5, 5), (6, 4)}
C is the event that the sum of the outcomes of rolling the pair of dice is 7.
⇒ C = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
Also, A ∩ B = {(5, 5)} and A ∩ C = {(5, 2)}
Since each outcome is equally likely,
$P(A) = \frac{6}{36}$, $P(B) = \frac{3}{36}$, $P(C) = \frac{6}{36}$, $P(A \cap B) = \frac{1}{36}$ and $P(A \cap C) = \frac{1}{36}$
Since $P(A \cap B) = \frac{1}{36} \ne \frac{1}{6} \times \frac{1}{12} = P(A)P(B)$, events A and B are not independent.
Also, $P(A \cap C) = \frac{1}{36} = \frac{1}{6} \times \frac{1}{6} = P(A)P(C)$, so events A and C are independent.
Hence, options (b) and (c) are correct.
3. Let A and B be two independent events of a random experiment. Then, which of the
following is/are always true?
Solution:
Given that A and B are two independent events ⇒ P (A ∩ B) = P (A)P (B).
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= P (A) + P (B) − P (A)P (B)
= P (A)[1 − P (B)] + P (B)
$= P(A)P(B^{C}) + P(B)$
4. The probability that a student registered for IITM online degree program will pass the
qualifier exam is 0.6 independent of all other students. Find the probability that out of
10,000 registered students, 7,000 students will pass the qualifier exam.
a) $(0.6)^{3000} (0.4)^{7000}$
b) $(0.6)^{7000} (0.4)^{3000}$
c) ${}^{10000}C_{7000} (0.6)^{3000} (0.4)^{7000}$
d) ${}^{10000}C_{7000} (0.6)^{7000} (0.4)^{3000}$
Solution:
Probability(p) that the student registered for IITM online degree program will pass the
qualifier exam is 0.6.
We have to find the probability that out of 10,000 registered students, 7,000 students will
pass the qualifier exam and passing qualifier exam for any student will be independent
of the other.
So we can use the binomial distribution, where X is the number of students who pass
the exam, with p = 0.6, n = 10,000, and k = 7,000.
For the binomial distribution, $P(X = k) = {}^{n}C_{k} \, p^{k} (1 - p)^{n-k}$.
Hence, the probability that out of 10,000 registered students, 7,000 students will pass the
qualifier exam is ${}^{10000}C_{7000} (0.6)^{7000} (0.4)^{3000}$.
a) $(0.05)^{6} \times 0.95$
b) $(0.95)^{6} \times 0.05$
c) $(0.95)^{5} \times 0.05$
d) $(0.05)^{5} \times 0.95$
Solution:
We have to find the probability that the first defect is observed when the sixth compo-
nent is tested.
The probability of a defective computer component is 0.05.
Here we can assume that getting a defective component is success. That means we have
to find the probability of first success at 6th trials with p given as 0.05.
So here we can use geometric distribution with X representing the number of compo-
nents tested along with p = 0.05
And we know that for the geometric distribution, $P(X = k) = (1 - p)^{k-1} p$.
$\Rightarrow P(X = 6) = (1 - 0.05)^{6-1} \times 0.05$
$\Rightarrow P(X = 6) = (0.95)^{5} \times 0.05$
Hence the probability that the first defect is observed when the sixth component is tested
is (0.95)5 × 0.05.
6. If Aarushi and Ansh play a game of chess, Aarushi wins with probability 0.5 and Ansh
wins with probability 0.4 and the game ends in a draw with probability 0.1, independent
of all other games. They agree to play a match consisting of 5 games. Find the proba-
bility that Aarushi wins 4-1 (win gives 1 pt to winner and draw gives 0.5 pts to both).
Enter your answer correct to 3 decimals accuracy.
Solution:
Let Ai be the event that Aarushi will win the ith game and Bj be the event that Ansh
will win the jth game.
From given information we have P (Ai ) = 0.5, P (Bj ) = 0.4
There are two disjoint ways in which Aarushi wins 4-1.
i) Aarushi wins 4 games and Ansh wins one game.
The probability of this is ${}^{5}C_{4} (0.5)^{4} \times 0.4 = 0.125$
ii) Aarushi wins 3 games and 2 games are drawn.
The probability of this is ${}^{5}C_{3} (0.5)^{3} \times (0.1)^{2} = 0.0125$
So, the probability that Aarushi wins 4-1 is 0.125 + 0.0125 = 0.1375
7. The probability of someone catching flu in a particular winter when they have been
given the flu vaccine is 0.2. Without the vaccine, the probability of catching flu is 0.5. If
40% of the population has been given the vaccine, what is the probability that a person
chosen at random from the population will catch flu over that winter? Enter the answer
correct to 2 decimals accuracy.
Solution:
Let A be the event that the person will catch flu and V be the event that the person
has been given the vaccine.
Given that $P(A \mid V) = 0.2$, $P(A \mid V^{C}) = 0.5$ and $P(V) = 0.4$
We have to find the probability that a person chosen at random from the population
will catch flu over that winter, i.e. P(A).
By the law of total probability, $P(A) = P(A \mid V)P(V) + P(A \mid V^{C})P(V^{C})$
$\Rightarrow P(A) = 0.2 \times 0.4 + 0.5 \times (1 - 0.4)$
$\Rightarrow P(A) = 0.38$
8. Suppose you are playing a game of cards with your friend. Your friend is supposed
to give you 13 cards one by one. With a well-shuffled pack of 52 cards, what is the
probability that you are dealt a perfect hand(13 of one suit)?
a) $\frac{13!}{52!}$
b) $\frac{12! \times 39!}{51!}$
c) $\frac{13! \times 39!}{51!}$
d) $\frac{13! \times 39!}{52!}$
Solution:
Your friend is supposed to give you 13 cards one by one. Need to find the probability
that you are dealt a perfect hand i.e. you have gotten 13 cards of one suit.
The first card can be any of the 52 cards, so the probability is 1.
Once the first card is given to you, it fixes the suit, and the probability that the second
card is of the same suit is $\frac{12}{51}$, because 12 of the remaining 51 cards belong to that suit.
Similarly, for the third card the probability is $\frac{11}{50}$.
Continuing like this, the probability that you are dealt a perfect hand is
$1 \times \frac{12}{51} \times \frac{11}{50} \times \frac{10}{49} \times \frac{9}{48} \times \frac{8}{47} \times \frac{7}{46} \times \frac{6}{45} \times \frac{5}{44} \times \frac{4}{43} \times \frac{3}{42} \times \frac{2}{41} \times \frac{1}{40}$
$= \frac{12! \times 39!}{51!}$
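The sequential product can be cross-checked against the closed form and against a direct counting argument (4 perfect hands out of all 13-card hands). A minimal sketch in exact arithmetic, with illustrative variable names:

```python
from fractions import Fraction
from math import comb, factorial

# Sequential argument: after the first card, the k-th further card must match the suit.
sequential = Fraction(1)
for k in range(1, 13):
    sequential *= Fraction(13 - k, 52 - k)

# Closed form stated in the solution.
closed_form = Fraction(factorial(12) * factorial(39), factorial(51))

# Counting argument: 4 single-suit hands among all C(52, 13) possible hands.
counting = Fraction(4, comb(52, 13))

print(sequential == closed_form == counting, float(sequential))
```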
9. A person has bought a bed from an online furniture store. The seller delivers the
disassembled bed parts along with some screws to assemble it. The probability of a
screw being defective is 0.1 independent of all other screws. To compensate for the
manufacturing error, the seller sends two extra screws in the package where the bed
needs exactly 8 screws to assemble. What is the probability that the buyer will be able
to assemble the bed? (Enter the answer correct to 4 decimal accuracy)
Solution:
Let X represent the number of non-defective screws among the screws that the seller sends with the bed.
We need exactly 8 screws to assemble the bed and the seller sends two extra i.e. seller
sends ten screws.
The buyer will be able to assemble the bed if 8 screws are non - defective or 9 screws
are non - defective or 10 screws are non - defective out of the ten screws.
We can relate this with binomial distribution as X ∼ Binomial(10, p) where p is the
probability of a screw being non - defective and value of p will be 1 - 0.1 = 0.9
The buyer will be able to assemble the bed if at least 8 screws are non - defective.
So, the probability that the buyer will be able to assemble the bed is P (X ≥ 8).
And
$P(X \ge 8) = P(X = 8) + P(X = 9) + P(X = 10)$
$= {}^{10}C_{8} (0.9)^{8} (0.1)^{2} + {}^{10}C_{9} (0.9)^{9} (0.1)^{1} + {}^{10}C_{10} (0.9)^{10} (0.1)^{0}$
$= (0.9)^{8} \left[ 45 \times (0.1)^{2} + 10 \times 0.9 \times 0.1 + 0.81 \right]$
$= (0.9)^{8} \times 2.16$
$= 0.9298$
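The upper-tail sum can be computed directly. This sketch defines an illustrative helper, `binomial_tail` (a name assumed here, not taken from the assignment), and applies it with n = 10, p = 0.9.

```python
from math import comb

def binomial_tail(n: int, p: float, k_min: int) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# At least 8 of the 10 screws must be non-defective.
prob_assemble = binomial_tail(10, 0.9, 8)
print(round(prob_assemble, 4))
```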
10. In a pizza shop 40% of the customers order medium size pizza, 50% order small size
pizza, and 10% order large size pizza. Of those ordering medium size pizza, 2/3 also ask to
add extra toppings. Of those ordering small size pizza, 1/5 also ask to add extra toppings,
and of those ordering large size pizza, 4/5 also ask to add extra toppings. Given that a
customer asked to add extra toppings, find the conditional probability that the customer
ordered a medium pizza.
a) $\frac{15}{67}$
b) $\frac{40}{67}$
c) $\frac{12}{67}$
d) $\frac{52}{67}$
Solution:
Let S, M and L denote the event that customer will order small, medium and large size
pizza, respectively.
Given that P (S) = 0.50, P (M ) = 0.40 and P (L) = 0.10.
Also, let T be the event that customer will ask to add extra toppings.
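It is given that $P(T \mid S) = \frac{1}{5}$, $P(T \mid M) = \frac{2}{3}$ and $P(T \mid L) = \frac{4}{5}$.
We need to find $P(M \mid T)$.
And
$P(M \mid T) = \frac{P(M \cap T)}{P(T)}$
$= \frac{P(T \mid M)P(M)}{P(T \mid S)P(S) + P(T \mid M)P(M) + P(T \mid L)P(L)}$
$= \frac{\frac{2}{3} \times 0.40}{\frac{1}{5} \times 0.50 + \frac{2}{3} \times 0.40 + \frac{4}{5} \times 0.10}$
$= \frac{0.80}{3} \times \frac{15}{6.7}$
$= \frac{40}{67}$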
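Bayes' rule as used above can be sketched with exact fractions. The dictionary keys and variable names below are illustrative choices.

```python
from fractions import Fraction as F

# Prior probabilities of each pizza size and P(extra toppings | size).
prior = {"S": F(1, 2), "M": F(2, 5), "L": F(1, 10)}
topping_given_size = {"S": F(1, 5), "M": F(2, 3), "L": F(4, 5)}

# Law of total probability for P(T), then Bayes' rule for P(M | T).
p_topping = sum(prior[s] * topping_given_size[s] for s in prior)
posterior_medium = prior["M"] * topping_given_size["M"] / p_topping
print(p_topping, posterior_medium)
```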
Statistics for Data Science - 2
1. Toss a coin 50 times. Let the random variable X be defined as the number of tails
observed. Find the average of the values in the range of the random variable.
Solution:
Random variable X is defined as the number of tails observed while tossing the coin 50
times.
So the possible values taken by X is 0, 1, 2, 3 ....48, 49, 50.
⇒ Range of X = {0, 1, 2, 3....., 48, 49, 50}
Average of range values = (sum of all values in the range) / (total number of values)
$\Rightarrow$ Average of range values $= \frac{0 + 1 + 2 + 3 + \dots + 48 + 49 + 50}{51} = \frac{1275}{51} = 25$
2. Suppose that 5 fruits are randomly chosen from a basket containing 20 fruits, of which
16 are good and 4 are rotten. Let Y denote the number of rotten fruits chosen. Find
the possible values taken by Y .
a) {1, 2, 3, 4, 5}
b) {0, 1, 2, 3, 4, 5}
c) {1, 2, 3, 4}
d) {0, 1, 2, 3, 4}
Solution:
Random variable Y is defined as the number of rotten fruits chosen from the basket
while drawing 5 fruits. Since there are only 4 rotten fruits, so Y cannot take values more
than 4. Also there are 16 good fruits, so while drawing fruits there can be 0 rotten fruit
or 1 rotten fruit or 2 rotten fruits or 3 rotten fruits or 4 rotten fruits.
Hence, the possible values taken by Y i.e Range = {0, 1, 2, 3, 4}.
3. Let X be the number of candies present in a box. We have the following information:
There are at most four candies in the box.
The probability of having 2 candies in the box is the same as the probability of having
one candy.
The probability of having no candy in the box is the same as the probability of having
3 candies.
The probability of having four candies is twice of the probability of having three candies
and four times of having two candies.
What will be the PMF of X?
a) X          0     1     2     3     4
   P(X = x)   1/10  1/10  2/10  2/10  4/10
b) X          0     1     2     3     4
   P(X = x)   2/10  1/10  1/10  2/10  4/10
c) X          0     1     2     3     4
   P(X = x)   1/10  2/10  1/10  2/10  4/10
d) X          0     1     2     3     4
   P(X = x)   4/10  2/10  1/10  1/10  2/10
Solution:
Given that there are at most four candies in the box, so X cannot take values more than
4.
Also given that
P (X = 2) = P (X = 1), P (X = 0) = P (X = 3), P (X = 4) = 2P (X = 3) and
P (X = 4) = 4P (X = 2).
Let P (X = 2) = p and P (X = 0) = q
⇒ 2q = 4p
⇒ q = 2p
And we know that P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3) + P (X = 4) = 1
⇒ q + p + p + q + 2q = 1
⇒ 4q + 2p = 1
Using the above relation, we will get 4 × 2p + 2p = 1
⇒ p = 1/10 and hence q = 2/10.
So, P (X = 0) = 2/10, P (X = 1) = 1/10, P (X = 2) = 1/10, P (X = 3) = 2/10, and
P (X = 4) = 4/10.
Therefore, option b is the correct answer.
X          0    1    2    3    4    5       6
P(X = x)   0    k    4k   6k   4k   10k²    6k²
Find the value of P (X ≤ 4). Enter your answer correct up to 4 decimals accuracy.
Solution:
We know that $\sum_{x=0}^{6} P(X = x) = 1$
$\Rightarrow P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) = 1$
$\Rightarrow 0 + k + 4k + 6k + 4k + 10k^{2} + 6k^{2} = 1$
$\Rightarrow 16k^{2} + 15k - 1 = 0$
$\Rightarrow (16k - 1)(k + 1) = 0$
$\Rightarrow k = -1$ or $k = 1/16$
Since k cannot take negative values, k must be 1/16.
Now,
P (X ≤ 4) = P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3) + P (X = 4)
= 0 + k + 4k + 6k + 4k
= 15k
$= 15 \times \frac{1}{16}$
$= 0.9375$
5. I roll two fair six sided dice and observe the two outcomes. Let the random variables
Y and Z denote the outcomes observed on the two dice and let X = Y + Z. Find
P (Y = 3|X = 6).
Solution:
Y and Z denotes the outcomes observed on the two dice.
Given X = Y + Z, so the favourable outcomes for X = 6 will be {(1,5),(2,4),(3,3),(4,2),
(5,1)}.
From the reduced sample space the favourable outcomes for (Y = 3|X = 6) will be
{(3,3)}.
Hence, $P(Y = 3 \mid X = 6) = \frac{1}{5} = 0.2$
At X = 3
Y = (3 − 1)(3 + 1)(3 + 3) = 48
This implies that Y is taking values -3, 0, 15, and 48.
So,
7. A shopkeeper sells mobile phones. The demand for mobile phone follows a Poisson dis-
tribution with mean 4.6 per week. The shopkeeper has 5 mobile phones in his shop at
the beginning of a week. Find the probability that this will not be enough to satisfy the
demand for mobile phones in that week. Enter your answer correct up to two decimals
accuracy.
Solution:
The shopkeeper has 5 mobile phones in his shop at the beginning of a week. The shop-
keeper will not be able to satisfy the demand for mobile phones in that week only if
the demand of mobile phone is more than 5 phones. So, we need to find the value of
P (X > 5).
Also given that demand for mobile phone follows a Poisson distribution with mean 4.6
per week. i.e. λ = 4.6
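$P(X > 5) = 1 - P(X \le 5)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)]$
$= 1 - \left[ \frac{e^{-4.6} (4.6)^{0}}{0!} + \frac{e^{-4.6} (4.6)^{1}}{1!} + \frac{e^{-4.6} (4.6)^{2}}{2!} + \frac{e^{-4.6} (4.6)^{3}}{3!} + \frac{e^{-4.6} (4.6)^{4}}{4!} + \frac{e^{-4.6} (4.6)^{5}}{5!} \right]$
$= 1 - e^{-4.6} [1 + 4.6 + 10.58 + 16.22 + 18.66 + 17.16]$
$= 1 - 0.6858$
$= 0.3142 \approx 0.31$
Rounding the intermediate value 0.6858 to 0.68 before subtracting would give 0.32, so the subtraction should be carried out before rounding to two decimals.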
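The tail probability is sensitive to where rounding happens, so it is worth evaluating at full precision. A minimal sketch, with `poisson_cdf` as an illustrative helper name:

```python
from math import exp, factorial

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

# Demand exceeds the 5 phones in stock.
prob_shortage = 1 - poisson_cdf(5, 4.6)
print(round(prob_shortage, 4))
```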
8. Suppose that in the end semester paper of Statistics there are 18 multiple-choice ques-
tions (only one option is correct for each question). Each question has 4 possible options.
You know the answer to 8 questions, but you have no idea about the other 10 questions
and choose answers randomly and independently. Your score X of the exam is the total
number of correct answers. Find the value of P (X ≥ 12). Enter your answer correct up
to 2 decimals accuracy.
Solution:
Your score is the total number of correct answers, and you already know the answers to 8
questions.
So, instead of finding the value of P(X ≥ 12) directly, define a new random variable Y as
the number of correct answers among the 10 questions for which you do not know the
answer, and find the value of P(Y ≥ 4).
Also there are four options to each question and only one is correct. That means prob-
ability of getting an answer correct is 1/4 and each question is independent of other.
So we can use binomial distribution with n = 10 and p = 0.25
Now,
P (Y ≥ 4) = 1 − P (Y < 4)
= 1 − [P (Y = 0) + P (Y = 1) + P (Y = 2) + P (Y = 3)]
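$= 1 - \left[ {}^{10}C_{0} \left(\frac{1}{4}\right)^{0} \left(\frac{3}{4}\right)^{10} + {}^{10}C_{1} \left(\frac{1}{4}\right)^{1} \left(\frac{3}{4}\right)^{9} + {}^{10}C_{2} \left(\frac{1}{4}\right)^{2} \left(\frac{3}{4}\right)^{8} + {}^{10}C_{3} \left(\frac{1}{4}\right)^{3} \left(\frac{3}{4}\right)^{7} \right]$
$= 1 - \left(\frac{3}{4}\right)^{7} \left[ \left(\frac{3}{4}\right)^{3} + 10 \left(\frac{1}{4}\right)^{1} \left(\frac{3}{4}\right)^{2} + 45 \left(\frac{1}{4}\right)^{2} \left(\frac{3}{4}\right)^{1} + 120 \left(\frac{1}{4}\right)^{3} \right]$
$= 1 - \left(\frac{3}{4}\right)^{7} \left[ \frac{372}{64} \right]$
$= 1 - 0.78$
$= 0.22$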
9. A fruit owner sells fruit in a lot that contains 50 fruits. A customer selects 5 fruits at
random from a lot and rejects the lot (will not purchase) if one of the 5 selected fruits
is rotten. What is the probability that the customer will purchase the lot if there are 4
rotten fruits in the lot? Enter your answer correct up to 2 decimals accuracy.
Solution:
Given that there are 4 rotten fruits in the lot that contains 50 fruits.
Customer will purchase the lot if out of 5 selected fruits there is no rotten fruit.
Probability that there will not be any rotten fruit in 5 selected fruits will be
$\frac{{}^{4}C_{0} \cdot {}^{46}C_{5}}{{}^{50}C_{5}} = \frac{1370754}{2118760} \approx 0.647$, that is, 0.65 correct up to two decimals.
Accepted range: 0.61 - 0.67
10. Suppose the probability that any given person will independently believe a tale about
the existence of a parallel universe is 0.6. What is the probability that the eighth person
to hear this tale about existence of a parallel universe is the fifth one to believe it?
a) ${}^{8}C_{5} (0.6)^{5} (0.4)^{3}$
b) ${}^{7}C_{4} (0.6)^{5} (0.4)^{3}$
c) ${}^{8}C_{5} (0.6)^{3} (0.4)^{5}$
d) ${}^{7}C_{4} (0.6)^{3} (0.4)^{5}$
Solution:
Given that the probability that any given person will believe a tale about the existence
of parallel universe is 0.6.
We need to find the probability that the eighth person to hear this tale about existence
of parallel universe is the fifth one to believe it.
We can put this into other words as out of 7 trials we need 4 successes and 8th trial also
a success.(Here success is considered as the probability that the person will believe the
tale about the existence of parallel universe)
The probability of getting 4 successes out of 7 trials is ${}^{7}C_{4} (0.6)^{4} (0.4)^{3}$.
Including the success on the 8th trial, the probability is ${}^{7}C_{4} (0.6)^{4} (0.4)^{3} \times 0.6$.
This implies that the probability that the eighth person to hear this tale about the existence
of a parallel universe is the fifth one to believe it is ${}^{7}C_{4} (0.6)^{5} (0.4)^{3}$.
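The "r-th success on the n-th trial" pattern used here generalises beyond this question. The sketch below wraps it in an illustrative helper (the function name is an assumption) and evaluates it for n = 8, r = 5, p = 0.6.

```python
from math import comb

def prob_rth_success_on_trial(n: int, r: int, p: float) -> float:
    """P(the r-th success occurs exactly on trial n) for independent trials."""
    # r - 1 successes among the first n - 1 trials, then a success on trial n.
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

prob = prob_rth_success_on_trial(8, 5, 0.6)
print(prob)
```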
11. Suppose the number of visitors arriving at a zoo can be modeled to be Poisson dis-
tributed. On an average 20 visitors arrive per hour. Let X be the number of visitors
arriving from 2pm to 4pm. Then the probability that at least 35 visitors will arrive in
the given duration is
a) $\sum_{k=35}^{\infty} \frac{e^{-20} (20)^{k}}{k!}$
b) $1 - \sum_{k=0}^{34} \frac{e^{-20} (20)^{k}}{k!}$
c) $\sum_{k=35}^{\infty} \frac{e^{-40} (40)^{k}}{k!}$
d) $1 - \sum_{k=0}^{34} \frac{e^{-40} (40)^{k}}{k!}$
Solution:
Given that on an average 20 visitors arrive per hour and X is the number of visitors
arriving from 2pm to 4pm. So, here λ = 20 × 2 = 40
Now we have to find the probability that at least 35 visitors will arrive in the given
duration, that is from 2pm to 4pm.
Also we can write
Statistics for Data Science - 2
1. Let X and Y be two random variables with joint distribution given in Table 3.1.P, where
a and b are two unknown values.
         X = 0   X = 1   X = 2
Y = 0    1/12    3/12    a
Y = 1    2/12    b       1/12
Y = 2    3/12    1/12    1/12
i) Find P (Y = 1).
a) $\frac{4}{12}$
b) $\frac{3}{12}$
c) $\frac{5}{12}$
d) $\frac{1}{12}$
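Solution:
We know that $\sum_{x \in T_X, \, y \in T_Y} f_{XY}(x, y) = 1$
$\Rightarrow \frac{1}{12} + \frac{3}{12} + a + \frac{2}{12} + b + \frac{1}{12} + \frac{3}{12} + \frac{1}{12} + \frac{1}{12} = 1$
$\Rightarrow a + b = 0$
Since a and b cannot take negative values, a = b = 0.
Now,
$P(Y = 1) = \sum_{x \in T_X} f_{XY}(x, 1)$
$= \frac{2}{12} + b + \frac{1}{12}$
$= \frac{3}{12} + 0$
$= \frac{3}{12}$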
ii) Find P (Y = 1 | X = 2).
a) $\frac{1}{12}$
b) $\frac{1}{4}$
c) $\frac{1}{3}$
d) $\frac{1}{2}$
Solution:
$P(Y = 1 \mid X = 2) = \frac{P(Y = 1, X = 2)}{P(X = 2)}$
$= \frac{\frac{1}{12}}{a + \frac{1}{12} + \frac{1}{12}}$
$= \frac{1}{2}$
Solution:
$P(X = 0, Y \ge 1) = P(X = 0, Y = 1) + P(X = 0, Y = 2)$
$= \frac{2}{12} + \frac{3}{12}$
$= \frac{5}{12}$
2. Let X and Y be two independent discrete random variables with CDFs FX and FY ,
respectively. Define another random variable Z = min(X, Y ), then the CDF of Z is
a) min(FX , FY )
b) FX FY
c) FX + FY + FX FY
d) FX + FY − FX FY
Solution:
FZ (z) = P (Z ≤ z) = P (min(X, Y ) ≤ z)
= 1 − P (min(X, Y ) > z)
= 1 − P (X > z, Y > z)
$f_Z(3) = P(Z = 3) = P(X - Y = 3)$
$= P(X = 4, Y = 1) + P(X = 5, Y = 2) + P(X = 6, Y = 3)$
$= P(X = 4)P(Y = 1) + P(X = 5)P(Y = 2) + P(X = 6)P(Y = 3)$
$= \frac{1}{6} \times \frac{1}{6} + \frac{1}{6} \times \frac{1}{6} + \frac{1}{6} \times \frac{1}{6}$
$= \frac{3}{36}$
$= \frac{1}{12}$
a) p > 0.02
b) p < 0.04
c) p > 0.15
d) p < 0.30
e) p = 0.05
Solution:
If X ∼ Geometric(p) and Y ∼ Geometric(p) are two independent random variables and
Z = X + Y , then
$P(Z = n) = (n - 1) p^{2} (1 - p)^{n-2}$ (try the derivation yourself)
We have to find the value of p for which P(Z = 26) > P(Z = 25).
$P(Z = 26) = (26 - 1) p^{2} (1 - p)^{26-2}$ and $P(Z = 25) = (25 - 1) p^{2} (1 - p)^{25-2}$
Comparing both, we get
$25 p^{2} (1 - p)^{24} > 24 p^{2} (1 - p)^{23}$
$\Rightarrow 25(1 - p) > 24$
$\Rightarrow 1 - p > \frac{24}{25}$
$\Rightarrow p < 0.04$
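The PMF of Z can be checked by convolving the two geometric PMFs directly, which also illustrates the derivation left as an exercise. A sketch with illustrative function names:

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k) for X ~ Geometric(p) on k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def pmf_sum_by_convolution(n: int, p: float) -> float:
    """P(X + Y = n) for independent X, Y ~ Geometric(p), summing over X = k."""
    return sum(geometric_pmf(k, p) * geometric_pmf(n - k, p) for k in range(1, n))

def pmf_sum_closed_form(n: int, p: float) -> float:
    return (n - 1) * p**2 * (1 - p) ** (n - 2)

# The two agree, and the inequality flips around p = 0.04.
print(pmf_sum_by_convolution(26, 0.03), pmf_sum_closed_form(26, 0.03))
print(pmf_sum_closed_form(26, 0.03) > pmf_sum_closed_form(25, 0.03))
print(pmf_sum_closed_form(26, 0.05) > pmf_sum_closed_form(25, 0.05))
```

Each of the n − 1 terms in the convolution equals p²(1 − p)^(n−2), which is where the factor (n − 1) comes from.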
5. Let X ∼ Uniform({1, 2, 3, 4, 5, 6}) and let Y be the number of times 2 occurs in X throws
of a fair die. Choose the incorrect option(s) among the following.
a) $P(Y = 2 \mid X = 2) = \frac{1}{6}$
b) $P(Y = 2 \mid X = 4) = \frac{5^{2}}{6^{3}}$
c) $P(Y = 5 \mid X = 6) = \frac{5}{6^{5}}$
d) $P(Y = 6 \mid X = 5) = \frac{5}{6^{6}}$
Solution:
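Given X = 4, Y ∼ Bin(4, 1/6), so
$P(Y = 2 \mid X = 4) = {}^{4}C_{2} \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{2} = \frac{5^{2}}{6^{3}}$
Given X = 6, Y ∼ Bin(6, 1/6), so
$P(Y = 5 \mid X = 6) = {}^{6}C_{5} \left(\frac{1}{6}\right)^{5} \left(\frac{5}{6}\right)^{1} = \frac{5}{6^{5}}$
Given X = 5, Y ∼ Bin(5, 1/6), so Y cannot exceed 5 and
$P(Y = 6 \mid X = 5) = 0$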
6. Let the random variables X and Y each have range {1, 2, 3}. The following formula
gives the joint PMF
$P(X = i, Y = j) = \frac{i + 2j}{c}$,
where c is an unknown value. Find $P(1 \le X \le 3, 1 < Y \le 3)$.
a) $\frac{5}{9}$
b) $\frac{7}{9}$
c) $\frac{2}{9}$
d) $\frac{4}{9}$
Solution:
We know that $\sum_{x \in T_X, \, y \in T_Y} P(X = x, Y = y) = 1$
$\Rightarrow P(X = 1, Y = 1) + P(X = 1, Y = 2) + P(X = 1, Y = 3) + P(X = 2, Y = 1) + P(X = 2, Y = 2) + P(X = 2, Y = 3) + P(X = 3, Y = 1) + P(X = 3, Y = 2) + P(X = 3, Y = 3) = 1$
$\Rightarrow \frac{3}{c} + \frac{5}{c} + \frac{7}{c} + \frac{4}{c} + \frac{6}{c} + \frac{8}{c} + \frac{5}{c} + \frac{7}{c} + \frac{9}{c} = 1$
$\Rightarrow c = 54$
Now,
$P(1 \le X \le 3, 1 < Y \le 3) = P(X = 1, Y = 2) + P(X = 1, Y = 3) + P(X = 2, Y = 2) + P(X = 2, Y = 3) + P(X = 3, Y = 2) + P(X = 3, Y = 3)$
$\Rightarrow P(1 \le X \le 3, 1 < Y \le 3) = \frac{1}{c} [5 + 7 + 6 + 8 + 7 + 9]$
$= \frac{42}{54}$
$= \frac{7}{9}$
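Both the normalising constant and the requested probability follow from summing i + 2j over the relevant pairs. A minimal sketch in exact arithmetic (variable names are illustrative):

```python
from fractions import Fraction

values = range(1, 4)  # common range {1, 2, 3} of X and Y

# The joint PMF must sum to 1, so c is the sum of all numerators i + 2j.
c = sum(i + 2 * j for i in values for j in values)

# Event: 1 <= X <= 3 (always true here) and 1 < Y <= 3.
prob = Fraction(sum(i + 2 * j for i in values for j in values if 1 < j <= 3), c)
print(c, prob)
```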
7. The joint PMF of the random variables X and Y is given in Table 3.2.P.
         X = 1   X = 2   X = 3
Y = 1    k       k       2k
Y = 2    2k      0       4k
Y = 3    3k      k       6k
b) {4, 8, 18}
c) {1, 9}
d) {2, 18}
e) {2, 8, 18}
Solution:
We know that $\sum_{x \in T_X, \, y \in T_Y} P(X = x, Y = y) = 1$
$\Rightarrow k + k + 2k + 2k + 0 + 4k + 3k + k + 6k = 1$
$\Rightarrow k = \frac{1}{20}$
When Y = 2, P(X = 2, Y = 2) = 0. So for the range we will not consider the pair (2, 2).
Since $Z = X^{2} Y$, the range of Z | Y = 2 will be $\{1^{2} \times 2, 3^{2} \times 2\}$, which is equal to {2, 18}.
ii) Find the value of P (Z = 18 | Y = 2).
a) $\frac{1}{3}$
b) $\frac{2}{3}$
c) $\frac{3}{4}$
d) $\frac{1}{4}$
Solution:
$P(Z = 18 \mid Y = 2) = \frac{P(Z = 18, Y = 2)}{P(Y = 2)}$
$= \frac{P(X = 3, Y = 2)}{P(X = 1, Y = 2) + P(X = 3, Y = 2)}$
$= \frac{4k}{2k + 4k}$
$= \frac{2}{3}$
8. The following options gives the joint PMF of the random variables X and Y . If the
random variables X and Y are independent, then which of the following option(s) can
be the joint PMF of X and Y ?
         Y = 0   Y = 1   Y = 2
X = 0    0.01    0       0
X = 1    0.09    0.09    0
X = 2    0       0       0.81
a)
Y
0 1 2
X
b)
         Y = 0   Y = 1   Y = 2
X = 0    1/12    1/24    1/24
X = 1    1/6     1/12    1/8
X = 2    1/4     1/8     1/12
c)
         Y = 0   Y = 1   Y = 2
X = 0    1/10    1/5     1/5
X = 1    1/10    1/10    3/10
d)
         Y = 0   Y = 1
X = 0    0.10    0.15
X = 1    0.20    0.30
X = 2    0.10    0.15
e)
Solution:
In option a)
P(X = 0, Y = 1) = 0 but P(X = 0) = 0.01 + 0 + 0 = 0.01 and P(Y = 1) = 0 + 0.09 + 0 = 0.09
⇒ P(X = 0, Y = 1) ≠ P(X = 0)P(Y = 1)
Therefore, option (a) cannot be the joint PMF of X and Y.
In option b)
P(X = 0, Y = 0) = 0.06 but P(X = 0) = 0.06 + 0.18 + 0.12 = 0.36 and P(Y = 0) = 0.06 + 0.04 = 0.10
⇒ P(X = 0, Y = 0) = 0.06 ≠ 0.036 = P(X = 0)P(Y = 0)
Therefore, option (b) cannot be the joint PMF of X and Y.
In option c)
P(X = 1, Y = 0) = 1/6 but P(X = 1) = 1/6 + 1/12 + 1/8 = 3/8 and P(Y = 0) = 1/12 + 1/6 + 1/4 = 1/2
⇒ P(X = 1, Y = 0) = 1/6 ≠ 3/16 = P(X = 1)P(Y = 0)
Therefore, option (c) cannot be the joint PMF of X and Y.
In option d)
P(X = 0, Y = 1) = 1/5 but P(X = 0) = 1/10 + 1/5 + 1/5 = 1/2 and P(Y = 1) = 1/5 + 1/10 = 3/10
⇒ P(X = 0, Y = 1) = 1/5 ≠ 3/20 = P(X = 0)P(Y = 1)
Therefore, option (d) cannot be the joint PMF of X and Y.
In option e)
For every (x, y), P (X = x, Y = y) = P (X = x)P (Y = y) (check yourself)
Hence option (e) is the joint PMF of X and Y.
Answer: e
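The "check yourself" step for option (e) amounts to comparing every joint entry with the product of its marginals. The sketch below does this in exact arithmetic; `is_independent` is a hypothetical helper, and the two tables are the ones given for options (e) and (a).

```python
from fractions import Fraction as F

def is_independent(joint):
    """True if every entry of a joint PMF {(x, y): p} equals P(X = x) * P(Y = y)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(joint[(x, y)] == px[x] * py[y] for x in xs for y in ys)

option_e = {
    (0, 0): F(10, 100), (0, 1): F(15, 100),
    (1, 0): F(20, 100), (1, 1): F(30, 100),
    (2, 0): F(10, 100), (2, 1): F(15, 100),
}
option_a = {
    (0, 0): F(1, 100), (0, 1): F(0), (0, 2): F(0),
    (1, 0): F(9, 100), (1, 1): F(9, 100), (1, 2): F(0),
    (2, 0): F(0), (2, 1): F(0), (2, 2): F(81, 100),
}
print(is_independent(option_e), is_independent(option_a))
```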
9. From a sack of fruits containing 3 mangoes, 2 kiwis, and 3 guavas, a random sample of
4 pieces of fruit is selected. If X is the number of mangoes and Y is the number of kiwis
in the sample, then find the joint probability distribution of X and Y .
a)
         X = 0   X = 1   X = 2   X = 3
Y = 0    0       3/70    9/70    3/70
Y = 1    2/70    18/70   2/70    18/70
Y = 2    3/70    9/70    3/70    0
b)
         X = 0   X = 1   X = 2   X = 3
Y = 0    0       3/70    9/70    3/70
Y = 1    2/70    18/70   18/70   2/70
Y = 2    3/70    9/70    3/70    0
c)
         X = 0   X = 1   X = 2   X = 3
Y = 0    0       3/70    9/70    3/70
Y = 1    2/70    18/70   18/70   2/70
Y = 2    9/70    3/70    3/70    0
d)
         X = 0   X = 1   X = 2   X = 3
Y = 0    0       3/70    3/70    9/70
Y = 1    2/70    18/70   18/70   2/70
Y = 2    3/70    9/70    3/70    0
Solution:
X is the number of mangoes and Y is the number of kiwis in the sample. The number
of mangoes and kiwis in the sack is 3 and 2,respectively.
So X will take values in {0, 1, 2, 3} and Y will take values in {0, 1, 2} when the random
sample of 4 pieces is selected.
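P(X = 0, Y = 0) = P(no mango and no kiwi) = 0 (not possible since the number of guavas is 3)
$P(X = 0, Y = 1) = P(\text{no mango and one kiwi}) = \frac{{}^{2}C_{1} \cdot {}^{3}C_{3}}{{}^{8}C_{4}} = \frac{2}{70}$
$P(X = 0, Y = 2) = P(\text{no mango and two kiwis}) = \frac{{}^{2}C_{2} \cdot {}^{3}C_{2}}{{}^{8}C_{4}} = \frac{3}{70}$
$P(X = 1, Y = 0) = P(\text{one mango and no kiwi}) = \frac{{}^{3}C_{1} \cdot {}^{3}C_{3}}{{}^{8}C_{4}} = \frac{3}{70}$
$P(X = 1, Y = 1) = P(\text{one mango and one kiwi}) = \frac{{}^{3}C_{1} \cdot {}^{2}C_{1} \cdot {}^{3}C_{2}}{{}^{8}C_{4}} = \frac{18}{70}$
$P(X = 1, Y = 2) = P(\text{one mango and two kiwis}) = \frac{{}^{3}C_{1} \cdot {}^{2}C_{2} \cdot {}^{3}C_{1}}{{}^{8}C_{4}} = \frac{9}{70}$
$P(X = 2, Y = 0) = P(\text{two mangoes and no kiwi}) = \frac{{}^{3}C_{2} \cdot {}^{3}C_{2}}{{}^{8}C_{4}} = \frac{9}{70}$
$P(X = 2, Y = 1) = P(\text{two mangoes and one kiwi}) = \frac{{}^{3}C_{2} \cdot {}^{2}C_{1} \cdot {}^{3}C_{1}}{{}^{8}C_{4}} = \frac{18}{70}$
$P(X = 2, Y = 2) = P(\text{two mangoes and two kiwis}) = \frac{{}^{3}C_{2} \cdot {}^{2}C_{2}}{{}^{8}C_{4}} = \frac{3}{70}$
Similarly, you can check the other values.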
Answer: b
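The remaining entries can be generated rather than computed one by one. This sketch builds the whole joint table from the counting formula used above; the constant and variable names are illustrative.

```python
from fractions import Fraction
from math import comb

MANGOES, KIWIS, GUAVAS, SAMPLE = 3, 2, 3, 4
total = comb(MANGOES + KIWIS + GUAVAS, SAMPLE)  # 8 choose 4 = 70

joint = {}
for x in range(MANGOES + 1):
    for y in range(KIWIS + 1):
        g = SAMPLE - x - y  # guavas needed to complete the sample
        if 0 <= g <= GUAVAS:
            ways = comb(MANGOES, x) * comb(KIWIS, y) * comb(GUAVAS, g)
        else:
            ways = 0  # impossible combination
        joint[(x, y)] = Fraction(ways, total)

for y in range(KIWIS + 1):
    print([str(joint[(x, y)]) for x in range(MANGOES + 1)])
```

Note that `Fraction` prints values in lowest terms, so 18/70 appears as 9/35.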
10. Suppose you flip a fair coin. If the coin lands heads, you roll a fair six-sided die 50 times.
If the coin lands tails, you roll the die 51 times. Let X be 1 if the coin lands heads and
0 if the coin lands tails. Let Y be the total number of times you get the number 5 while
throwing the die. Find P(X = 1 | Y = 10).
a) $\frac{85}{157}$
b) $\frac{82}{167}$
c) $\frac{72}{157}$
d) $\frac{85}{167}$
Solution:
$\Rightarrow P(X = 1 \mid Y = 10) = \frac{246}{255} \times P(X = 0 \mid Y = 10)$
$\Rightarrow P(X = 1 \mid Y = 10) + \frac{255}{246} P(X = 1 \mid Y = 10) = 1$
$\Rightarrow P(X = 1 \mid Y = 10) = \frac{246}{501} = \frac{82}{167}$
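The ratio 246/255 comes from comparing the two binomial likelihoods of observing ten 5s in 50 rolls and in 51 rolls. As a hedged sketch of that calculation with Bayes' rule in exact arithmetic (`likelihood` is an illustrative helper name):

```python
from fractions import Fraction
from math import comb

def likelihood(n_rolls: int, fives: int = 10) -> Fraction:
    """P(Y = fives) when a fair die is rolled n_rolls times."""
    p = Fraction(1, 6)
    return comb(n_rolls, fives) * p**fives * (1 - p) ** (n_rolls - fives)

prior_heads = Fraction(1, 2)
numerator = prior_heads * likelihood(50)                 # heads: 50 rolls
denominator = numerator + (1 - prior_heads) * likelihood(51)  # tails: 51 rolls
posterior_heads = numerator / denominator
print(posterior_heads)
```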
11. Three balls are selected at random from a box containing five red, four blue, three yellow
and six green coloured balls. If X, Y and Z are the number of red balls, blue balls and
green balls respectively, choose the correct option(s) among the following.
a) $P(X = 1, Y = 0, Z = 2) = \frac{25}{272}$
b) $P(X = 1, Y = 1, Z = 1) = \frac{5}{34}$
c) $P(X = 1, Y = 0 \mid Z = 2) = \frac{1}{4}$
d) $P(X = 0, Y = 0, Z = 3) = \frac{5}{204}$
Solution:
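$P(X = 1, Y = 0, Z = 2) = P(\text{one red ball and 2 green balls}) = \dfrac{{}^{5}C_{1}\,{}^{6}C_{2}}{{}^{18}C_{3}} = \dfrac{25}{272}$
$P(X = 1, Y = 1, Z = 1) = P(\text{one red ball, one blue ball and 1 green ball}) = \dfrac{{}^{5}C_{1}\,{}^{4}C_{1}\,{}^{6}C_{1}}{{}^{18}C_{3}} = \dfrac{5}{34}$
$P(X = 0, Y = 0, Z = 3) = P(\text{3 green balls}) = \dfrac{{}^{6}C_{3}}{{}^{18}C_{3}} = \dfrac{5}{204}$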
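And, given that exactly two of the three balls are green, the third ball is one of the 12 non-green balls (5 red, 4 blue, 3 yellow), so
$P(X = 1, Y = 0 \mid Z = 2) = \dfrac{P(X = 1, Y = 0, Z = 2)}{P(Z = 2)} = \dfrac{{}^{5}C_{1}\,{}^{6}C_{2}}{{}^{12}C_{1}\,{}^{6}C_{2}} = \dfrac{5}{12}$
which is not equal to $\dfrac{1}{4}$.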
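The joint and conditional probabilities for the ball-drawing question can be cross-checked by brute-force enumeration of all ${}^{18}C_{3} = 816$ equally likely draws. This is only a sketch; the helper names `prob` and `counts` are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

# 5 red, 4 blue, 3 yellow, 6 green balls, as in the question.
balls = ["R"] * 5 + ["B"] * 4 + ["Y"] * 3 + ["G"] * 6
draws = list(combinations(range(len(balls)), 3))  # all unordered draws of 3


def prob(event):
    """Exact probability of an event over equally likely draws."""
    hits = sum(1 for d in draws if event([balls[i] for i in d]))
    return Fraction(hits, len(draws))


def counts(colours):
    """(X, Y, Z) = (number of red, blue, green) in a draw."""
    return colours.count("R"), colours.count("B"), colours.count("G")


p_102 = prob(lambda c: counts(c) == (1, 0, 2))
p_111 = prob(lambda c: counts(c) == (1, 1, 1))
p_003 = prob(lambda c: counts(c) == (0, 0, 3))
p_z2 = prob(lambda c: c.count("G") == 2)
p_cond = p_102 / p_z2  # P(X = 1, Y = 0 | Z = 2)

print(p_102, p_111, p_003, p_cond)
```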
Statistics for Data Science - 2
Week 3 Graded Assignment
Multiple random variables
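          X = 0    X = 1
Y = 1      1/4      1/8
Y = 2      1/4       k
Y = 3       0       1/8

$\Rightarrow \dfrac{1}{4} + \dfrac{1}{8} + \dfrac{1}{4} + k + 0 + \dfrac{1}{8} = 1$
$\Rightarrow k = 1 - \dfrac{3}{4} = \dfrac{1}{4}$
Now,
$f_{Y|X=1}(2) = \dfrac{f_{XY}(1, 2)}{f_X(1)}$
$= \dfrac{f_{XY}(1, 2)}{f_{XY}(1, 1) + f_{XY}(1, 2) + f_{XY}(1, 3)}$
$= \dfrac{\frac{1}{4}}{\frac{1}{8} + \frac{1}{4} + \frac{1}{8}}$
$= \dfrac{\frac{1}{4}}{\frac{1}{2}} = \dfrac{1}{2}$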
2. Customers at a fast-food restaurant buy both sandwiches and drinks. The following
joint distribution summarizes the numbers of sandwiches (X) and drinks (Y ) purchased
by customers.
          X = 1    X = 2
Y = 1      0.4      0.2
Y = 2      0.1      0.25
Y = 3       0       0.05
Find the probability that a customer will buy two sandwiches given that he has bought
three drinks. [1 mark]
Solution:
X denotes the number of sandwiches purchased by a customer and Y denotes the num-
ber of drinks purchased by a customer.
To find: $f_{X|Y=3}(2)$
Now,
$f_{X|Y=3}(2) = \dfrac{f_{XY}(2, 3)}{f_Y(3)}$
$= \dfrac{f_{XY}(2, 3)}{f_{XY}(1, 3) + f_{XY}(2, 3)}$
$= \dfrac{0.05}{0 + 0.05}$
$= 1$
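The same conditional pmf can be read off the joint table programmatically. The sketch below stores the sandwich/drink table as a dictionary keyed by (x, y); the function name `conditional_x_given_y` is illustrative.

```python
from fractions import Fraction

# Joint pmf f_XY(x, y): x = sandwiches, y = drinks (values from the table).
joint = {
    (1, 1): Fraction("0.4"), (2, 1): Fraction("0.2"),
    (1, 2): Fraction("0.1"), (2, 2): Fraction("0.25"),
    (1, 3): Fraction("0"),   (2, 3): Fraction("0.05"),
}


def conditional_x_given_y(joint, x, y):
    """f_{X|Y=y}(x) = f_XY(x, y) / f_Y(y), with f_Y(y) the row sum."""
    f_y = sum(p for (_, yi), p in joint.items() if yi == y)
    return joint[(x, y)] / f_y


print(conditional_x_given_y(joint, 2, 3))  # 1
```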
3. Consider an experiment of tossing a fair coin twice. Let X be the number of heads that
occurs in the two tosses and Y be the number of tails that occurs in the two tosses.
Choose the correct statements. [2 marks]
(a) X and Y are independent random variables.
(b) X and Y are dependent random variables.
(c) $f_{XY}(1, 1) = \dfrac{1}{2}$.
(d) $f_{Y|X=0}(1) = \dfrac{1}{4}$.
Solution:
X denotes the number of heads that occurs in the two tosses and Y denotes the number
of tails that occurs in the two tosses.
First we will make the table of the joint pmf of X and Y .
          X = 0    X = 1    X = 2
Y = 0       0        0       1/4
Y = 1       0       1/2       0
Y = 2      1/4       0        0
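It is clear that
$f_{XY}(0, 0) = 0 \neq f_X(0)\,f_Y(0) = \dfrac{1}{4} \cdot \dfrac{1}{4}$
It implies that X and Y are dependent random variables.
So, option (a) is incorrect and option (b) is correct.
From the table, $f_{XY}(1, 1) = \dfrac{1}{2}$, so option (c) is correct.
$f_{Y|X=0}(1) = \dfrac{f_{XY}(0, 1)}{f_X(0)} = 0$ (since $f_{XY}(0, 1) = 0$)
So, option (d) is incorrect.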
4. A fair coin is tossed 4 times. Let X be the total number of heads and Y be the number
of heads before the first tail (If there is no tail in all the four tosses, then Y = 4). What
is the value of fY |X=2 (0)? [2 marks]
(a) $\dfrac{5}{16}$
(b) $\dfrac{1}{8}$
(c) $\dfrac{9}{16}$
(d) $\dfrac{1}{2}$
Solution:
A fair coin is tossed four times. X denotes the number of heads and Y denotes the
number of heads before first tail (If there is no tail in all the four tosses, then Y = 4).
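Clearly, $X \sim \text{Binomial}(4, \frac{1}{2})$.
Now,
$f_{Y|X=2}(0) = \dfrac{f_{XY}(2, 0)}{f_X(2)} = \dfrac{f_{X|Y=0}(2)\,f_Y(0)}{f_X(2)}$ ...(1)
Now, the event Y = 0 means that there is no head before the first tail, that is, the first outcome is a tail.
It implies that $f_Y(0) = \frac{1}{2}$.
Given Y = 0, the event X = 2 requires exactly two heads in the remaining three tosses, so $f_{X|Y=0}(2) = {}^{3}C_{2}\left(\frac{1}{2}\right)^{3}$.
And $f_X(2) = {}^{4}C_{2}\left(\frac{1}{2}\right)^{4}$.
Putting the values in equation (1), we get
$f_{Y|X=2}(0) = \dfrac{{}^{3}C_{2}\left(\frac{1}{2}\right)^{3} \cdot \frac{1}{2}}{{}^{4}C_{2}\left(\frac{1}{2}\right)^{4}} = \dfrac{3}{6} = \dfrac{1}{2}$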
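Because there are only 16 equally likely toss sequences, the conditional probability can also be confirmed by enumeration. A small sketch, with `heads_before_first_tail` as an illustrative helper name:

```python
from fractions import Fraction
from itertools import product


def heads_before_first_tail(seq):
    """Y: number of heads before the first tail (4 if there is no tail)."""
    count = 0
    for toss in seq:
        if toss == "T":
            break
        count += 1
    return count


outcomes = list(product("HT", repeat=4))            # 16 equally likely sequences
x2 = [s for s in outcomes if s.count("H") == 2]     # event X = 2
x2_y0 = [s for s in x2 if heads_before_first_tail(s) == 0]
cond = Fraction(len(x2_y0), len(x2))                # f_{Y|X=2}(0)

print(cond)  # 1/2
```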
5. Two fair dice are thrown simultaneously. Let X be the outcome on the first die and Y
be the sum of the outcomes on both the dice. Find the value of P (Y − X ≥ 6). [2 marks]
(a) $\dfrac{1}{6}$
(b) $\dfrac{1}{12}$
(c) $\dfrac{5}{12}$
(d) $\dfrac{1}{24}$
Solution:
X denotes the outcome on the first die and Y denotes the sum of the outcomes on both
the dice.
Notice that Y − X will denote the outcome on the second die.
Let Z = Y − X, then Z ∼ Uniform({1, 2, 3, 4, 5, 6})
$P(Y - X \geq 6) = P(Z \geq 6) = P(Z = 6) = \dfrac{1}{6}$
6. Let X and Y denote the number of cars and number of bikes reaching a street corner
during a certain 15-minute time period, respectively. Joint distribution of X and Y is
given as
$f_{XY}(x, y) = \dfrac{9}{16\,(4^{x+y})}$
Choose the correct option(s). [2 marks]
(a) Marginal pmf of X is $f_X(x) = \dfrac{3}{4^{x+1}}$.
(b) Marginal pmf of X is $f_X(x) = \dfrac{3}{4^{x}}$.
(c) X and Y are independent random variables.
(d) X and Y are dependent random variables.
Solution:
X and Y denote the number of cars and number of bikes reaching a street corner during
a certain 15-minute time period, respectively.
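The range of X and Y will be $T_X = T_Y = \{0, 1, 2, \ldots\}$.
Now,
$f_X(x) = \sum_{y=0}^{\infty} f_{XY}(x, y)$
$= \sum_{y=0}^{\infty} \dfrac{9}{16\,(4^{x+y})}$
$= \dfrac{9}{16 \cdot 4^{x}} \sum_{y=0}^{\infty} \dfrac{1}{4^{y}}$
$= \dfrac{9}{16 \cdot 4^{x}} \left(1 + \dfrac{1}{4} + \dfrac{1}{4^{2}} + \ldots\right)$
$= \dfrac{9}{16 \cdot 4^{x}} \cdot \dfrac{1}{1 - \frac{1}{4}}$
$= \dfrac{9}{4^{2} \cdot 4^{x}} \cdot \dfrac{4}{3}$
$= \dfrac{3}{4^{x+1}}$
By symmetry, $f_Y(y) = \dfrac{3}{4^{y+1}}$.
Now, choose two arbitrary points x and y in the range of X and Y, respectively. Then
$f_X(x)\,f_Y(y) = \dfrac{3}{4^{x+1}} \cdot \dfrac{3}{4^{y+1}}$
$\Rightarrow f_X(x)\,f_Y(y) = \dfrac{9}{16\,(4^{x+y})}$
$\Rightarrow f_X(x)\,f_Y(y) = f_{XY}(x, y)$
Hence X and Y are independent random variables.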
(d) fXY (x, y) = fX (x).fY (y)
Solution:
We know that $f_{X|(Y=y,Z=z)}(x) = \dfrac{f_{XYZ}(x, y, z)}{f_{YZ}(y, z)}$
$\Rightarrow f_{XYZ}(x, y, z) = f_{X|(Y=y,Z=z)}(x)\,f_{YZ}(y, z)$
Hence, option (a) is correct and option (b) is incorrect.
$f_{XY}(x, y) = f_X(x)\,f_Y(y)$ is true only when X and Y are independent. Therefore, option (d) need not always be true.
8. Two random variables X and Y are jointly distributed with joint pmf
⇒fXY (0, 0) + fXY (0, 1) + fXY (0, 2) + fXY (0, 3) + fXY (1, 0) + fXY (1, 1) + fXY (1, 2)
+ fXY (1, 3) + fXY (2, 0) + fXY (2, 1) + fXY (2, 2) + fXY (2, 3) = 1
Now, using the given condition,
$P(X \geq 1, Y \leq 2) = \dfrac{4}{7}$
$\Rightarrow P(X = 1, Y = 0) + P(X = 1, Y = 1) + P(X = 1, Y = 2) + P(X = 2, Y = 0) + P(X = 2, Y = 1) + P(X = 2, Y = 2) = \dfrac{4}{7}$
$\Rightarrow ab + ab + a + ab + 2a + 2ab + 2ab + a + 2ab + 2a = \dfrac{4}{7}$
$\Rightarrow 6a + 9ab = \dfrac{4}{7}$ ....(2)
Solution:
First we will find the probability such that X1 = 0, X2 = 1 and other two random
variables do not take value 0 and 1.
Now,
$P(X_1 = 0) = \dfrac{e^{-4}\,4^{0}}{0!} = e^{-4}$
$P(X_2 = 1) = \dfrac{e^{-4}\,4^{1}}{1!} = 4e^{-4}$
We can choose such pairs of Xi for which exactly one Xi equals 0 and exactly one Xi
equals 1 in 4P2 ways.
Therefore,
probability that exactly one of the Xi equals 0 and exactly one of the Xi equals 1 is
given by
${}^{4}P_{2}\;e^{-4}\,(4e^{-4})\,(1 - 5e^{-4})^{2} = 48e^{-8}\,(1 - 5e^{-4})^{2}$
10. Akshat draws a card randomly from a well-shuffled pack of 52 cards. If the drawn card
is a face card, then he draws two balls randomly from bag A which contains 5 Red, 6
Black and 4 Green balls. If the drawn card is not a face card, then he draws three balls
randomly from bag B which contains 7 Red, 8 Black and 5 Green balls. Let two random variables X and Y be defined as follows:
$X = \begin{cases} 0 & \text{if the drawn card is a face card} \\ 1 & \text{if the drawn card is not a face card} \end{cases}$
and Y be the number of Red balls drawn. Find the value of fY (1). Write your answer
correct up to two decimal places. [3 marks]
Solution:
Akshat draws a card randomly from a well-shuffled pack of 52 cards. Random variable
X is defined as
$X = \begin{cases} 0 & \text{if the drawn card is a face card} \\ 1 & \text{if the drawn card is not a face card} \end{cases}$
If the drawn card is a face card, then he draws two balls randomly from bag A which
contains 5 Red, 6 Black and 4 Green balls. If the drawn card is not a face card, then
he draws three balls randomly from bag B which contains 7 Red, 8 Black and 5 Green
balls. Random variable Y is the number of Red balls drawn.
To find: fY (1)
We know that
Statistics for Data Science - 2
Week 4 Practice Assignment
Expectation and variance
1. If the expected value and variance of the Binomial random variable X are $\dfrac{5}{2}$ and $\dfrac{15}{8}$, respectively, then find the value of P (X = 10). [1 mark]
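(a) $\left(\dfrac{3}{4}\right)^{10}$
(b) $10\left(\dfrac{3}{4}\right)^{10}$
(c) $\left(\dfrac{1}{4}\right)^{10}$
(d) $10\left(\dfrac{1}{4}\right)^{10}$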
Solution: If X ∼ Binomial(n, p), then expected value and variance of X is given by np
and np(1 − p), respectively.
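Given that
$E[X] = np = \dfrac{5}{2}$ ...(1)
And
$\text{Var}(X) = np(1 - p) = \dfrac{15}{8}$ ...(2)
Putting the value of np from equation (1) into equation (2), we get
$(1 - p) = \dfrac{3}{4} \Rightarrow p = \dfrac{1}{4}$.
Putting the value of p in equation (1), we get
$n = 10$
It implies that $X \sim \text{Binomial}\left(10, \frac{1}{4}\right)$, so $P(X = 10) = \left(\dfrac{1}{4}\right)^{10}$.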
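The two moment equations determine n and p uniquely, which is easy to confirm in code. A minimal sketch, assuming the mean is nonzero; `binomial_params` is an illustrative name.

```python
from fractions import Fraction


def binomial_params(mean, var):
    """Recover (n, p) of a Binomial from its mean np and variance np(1 - p)."""
    p = 1 - var / mean   # since var / mean = 1 - p
    n = mean / p
    return n, p


n, p = binomial_params(Fraction(5, 2), Fraction(15, 8))
prob_all_successes = p ** int(n)   # P(X = n) = p^n

print(n, p, prob_all_successes)  # 10 1/4 1/1048576
```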
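2. X and Y are two independent geometric random variables with parameters $\dfrac{1}{2}$ and $\dfrac{1}{4}$, respectively. Find the value of Var(X + 2Y ). [1 mark]
Solution:
We know that if X ∼ Geometric(p), then $\text{Var}(X) = \dfrac{1 - p}{p^{2}}$
Therefore, $\text{Var}(X) = \dfrac{1 - \frac{1}{2}}{\frac{1}{4}} = 2$ ...(1)
$\text{Var}(Y) = \dfrac{1 - \frac{1}{4}}{\frac{1}{16}} = 12$ ...(2)
Since X and Y are independent, $\text{Var}(X + 2Y) = \text{Var}(X) + 4\,\text{Var}(Y) = 2 + 48 = 50$.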
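3. The number of spam messages (X) sent to a server in a day has Poisson distribution
with parameter λ = 21. Each spam message independently has a probability of $p = \frac{1}{3}$
of not being detected by the spam filter. Let Y denote the number of spam messages
detected by the filter in a day. Calculate the expected value of X + Y . [2 marks]
Solution:
X denotes the number of spam messages sent to the server in a day and
X ∼ Poisson(21)
Each message is detected independently with probability $1 - \frac{1}{3} = \frac{2}{3}$, so the number of detected messages is Poisson with parameter $21 \times \frac{2}{3} = 14$.
Therefore, Y ∼ Poisson(14)
By linearity of expectation, $E[X + Y] = E[X] + E[Y] = 21 + 14 = 35$.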
4. Two random variables X and Y are jointly distributed with the joint pmf
$f_{XY}(x, y) = \dfrac{1}{9}(x + y)$,
where x and y are integers in 0 ≤ x ≤ 2 and 0 ≤ y ≤ 1. Let $Z = XY + Y^{2}$. Find the
expected value of Z. [2 marks]
(a) $\dfrac{1}{3}$
(b) $\dfrac{4}{3}$
(c) $\dfrac{2}{3}$
(d) $\dfrac{14}{9}$
Solution:
$E[Z] = E[XY + Y^{2}]$
$= \sum_{0 \leq x \leq 2;\; 0 \leq y \leq 1} (xy + y^{2})\,f_{XY}(x, y)$
$= \dfrac{1}{9} \sum_{0 \leq x \leq 2;\; 0 \leq y \leq 1} (xy + y^{2})(x + y)$
$= \dfrac{1}{9}(1 + 4 + 9)$
$= \dfrac{14}{9}$
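The expectation of a function of two discrete random variables is a finite sum over the support, so it can be verified directly. A short sketch; `pmf` and `support` are illustrative names.

```python
from fractions import Fraction


def pmf(x, y):
    """Joint pmf f_XY(x, y) = (x + y) / 9 on 0 <= x <= 2, 0 <= y <= 1."""
    return Fraction(x + y, 9)


support = [(x, y) for x in range(3) for y in range(2)]
total = sum(pmf(x, y) for x, y in support)                      # should be 1
e_z = sum((x * y + y * y) * pmf(x, y) for x, y in support)      # E[XY + Y^2]

print(total, e_z)  # 1 14/9
```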
5. The distribution of a certain company’s employees’ monthly salary has mean ₹60000 and
standard deviation ₹20000. The probability that a randomly selected employee from that
company has a salary either greater than or equal to ₹100000 or less than or equal to
₹20000 is: [2 marks]
(a) at least $\dfrac{1}{4}$
(b) at most $\dfrac{1}{4}$
(c) at least $\dfrac{1}{2}$
(d) at most $\dfrac{1}{2}$
Solution:
Let X denote the employees’ monthly salary.
Given that E[X] = µ = 60000 and SD= σ = 20000.
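The event {X ≥ 100000 or X ≤ 20000} is the event {|X − µ| ≥ 2σ}. By Chebyshev's inequality, $P(|X - \mu| \geq k\sigma) \leq \dfrac{1}{k^{2}}$, and here k = 2.
Hence, the probability that a randomly selected employee from that company has a salary
either greater than or equal to ₹100000 or less than or equal to ₹20000 is at most $\dfrac{1}{4}$.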
6. Two random variables X and Y are jointly distributed with the joint pmf
$f_{XY}(x, y) = \dfrac{1}{27}(xy + x + y + 1)$,
where x and y are integers in 0 ≤ x ≤ 1 and 1 ≤ y ≤ 3. Find the correlation coefficient
of X and Y . [2 marks]
Solution:
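$E[X] = \sum_{x \in T_X,\, y \in T_Y} x\,f_{XY}(x, y)$
$= \dfrac{1}{27} \sum_{x \in T_X,\, y \in T_Y} x(xy + x + y + 1)$
$= \dfrac{1}{27}(4 + 6 + 8)$
$= \dfrac{18}{27} = \dfrac{2}{3}$
$E[Y] = \sum_{x \in T_X,\, y \in T_Y} y\,f_{XY}(x, y)$
$= \dfrac{1}{27} \sum_{x \in T_X,\, y \in T_Y} y(xy + x + y + 1)$
$= \dfrac{1}{27}(2 + 6 + 12 + 4 + 12 + 24)$
$= \dfrac{60}{27} = \dfrac{20}{9}$
$E[XY] = \sum_{x \in T_X,\, y \in T_Y} xy\,f_{XY}(x, y)$
$= \dfrac{1}{27} \sum_{x \in T_X,\, y \in T_Y} xy(xy + x + y + 1)$
$= \dfrac{1}{27}(4 + 12 + 24)$
$= \dfrac{40}{27}$
$\text{Cov}(X, Y) = E[XY] - E[X]\,E[Y]$
$= \dfrac{40}{27} - \dfrac{2}{3} \cdot \dfrac{20}{9}$
$= 0$
We know that
$\text{Correlation coefficient} = \dfrac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}} = 0$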
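The moments used for the correlation coefficient can be recomputed exactly from the joint pmf. This is a sketch under the stated support (x in {0, 1}, y in {1, 2, 3}); the helper `expect` is an illustrative name.

```python
from fractions import Fraction
from math import sqrt


def pmf(x, y):
    """Joint pmf f_XY(x, y) = (xy + x + y + 1) / 27."""
    return Fraction(x * y + x + y + 1, 27)


support = [(x, y) for x in (0, 1) for y in (1, 2, 3)]


def expect(g):
    """E[g(X, Y)] as a finite sum over the support."""
    return sum(g(x, y) * pmf(x, y) for x, y in support)


e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - e_x * e_y
var_x = expect(lambda x, y: x * x) - e_x**2
var_y = expect(lambda x, y: y * y) - e_y**2
rho = float(cov) / sqrt(float(var_x * var_y))

print(e_x, e_y, cov, rho)  # 2/3 20/9 0 0.0
```

The covariance vanishes because the pmf factorises as (x + 1)(y + 1)/27, so X and Y are in fact independent.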
7. Let X and Y be two independent random variables such that $X \sim \text{Binomial}(4, \frac{1}{2})$ and
Y ∼ Uniform({1, 2, 3}). Find the value of $\text{Cov}(2X + Y,\; X + Y^{2}X)$. [2 marks]
(a) 16.67
(b) 6.67
(c) 13.37
(d) 0
Solution:
Since X and Y are independent random variables, the pairs $(X^{2}, Y^{2})$, $(X, Y^{2})$ and $(X, Y^{3})$ are also
independent. It implies that
$E[X^{2}Y^{2}] = E[X^{2}]\,E[Y^{2}]$
$E[Y^{2}X] = E[Y^{2}]\,E[X]$
$E[XY^{3}] = E[X]\,E[Y^{3}]$
Therefore,
Now, $X \sim \text{Binomial}(4, \frac{1}{2})$
Therefore, E[X] = np = 2
Var(X) = np(1 − p) = 1
$E[X^{2}] = \text{Var}(X) + (E[X])^{2} = np(1 - p) + (np)^{2} = 1 + 4 = 5$
Therefore,
$\text{Cov}(2X + Y,\; X + Y^{2}X) = 2(1) + 2\left(\dfrac{70}{3} - \dfrac{56}{3}\right) + 24 - \dfrac{56}{3}$
$= 26 - \dfrac{28}{3}$
$\approx 16.67$
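Since both supports are small, the covariance can be double-checked by summing over the product distribution. A sketch assuming, as stated, that X and Y are independent; the names `px`, `py`, `u`, `v` are illustrative.

```python
from fractions import Fraction
from math import comb

px = {x: Fraction(comb(4, x), 16) for x in range(5)}   # Binomial(4, 1/2)
py = {y: Fraction(1, 3) for y in (1, 2, 3)}            # Uniform({1, 2, 3})


def expect(g):
    """E[g(X, Y)] under independence: weights are px[x] * py[y]."""
    return sum(g(x, y) * px[x] * py[y] for x in px for y in py)


def u(x, y):
    return 2 * x + y


def v(x, y):
    return x + y * y * x


cov = expect(lambda x, y: u(x, y) * v(x, y)) - expect(u) * expect(v)

print(cov, round(float(cov), 2))  # 50/3 16.67
```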
X
0 1 2
Y
2 5
-1 0
17 17
1 2
0 0
17 17
3 4
1 0
17 17
Find the standard deviation of the product of the two random variables. (Write your
answer correct up to two decimal points.) [2 marks]
Solution:
To find: SD(XY )
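$E[XY] = \sum_{x \in T_X,\, y \in T_Y} xy\,f_{XY}(x, y)$
$= -1\left(\dfrac{2}{17}\right) - 2\left(\dfrac{5}{17}\right) + 2\left(\dfrac{4}{17}\right)$
$= \dfrac{-4}{17}$
$E[(XY)^{2}] = \sum_{x \in T_X,\, y \in T_Y} x^{2}y^{2}\,f_{XY}(x, y)$
$= 1\left(\dfrac{2}{17}\right) + 4\left(\dfrac{5}{17}\right) + 4\left(\dfrac{4}{17}\right)$
$= \dfrac{38}{17}$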
9. An ice-cream seller sells ice creams at three prices: ₹30, ₹40, and ₹50. A random customer
will buy an ice cream of ₹30, ₹40 and ₹50 with probabilities of 0.5, 0.3, and
0.2, respectively. If the number of customers in a day follows Poisson distribution with
λ = 60, what is the expected sales (in ₹) of the seller in a day? [3 marks]
Solution:
Let X denote the number of customers coming to the ice-cream seller in a day, then
X ∼ Poisson(60)
Let Y denote the price at which the customer buys the ice-cream, then
E[Y ] = 30(0.5) + 40(0.3) + 50(0.2) = 37
But since X ∼ Poisson(60), on average 60 customers come to the ice-cream seller in
a day. It means that the expected sales of the day will be
$E[X] \times E[Y] = 60 \times 37 = 2220$ (in ₹).
10. An urn contains 10 balls numbered from 1 to 10. We remove six balls randomly and add
up their numbers. Let X denote the sum of the numbers of the removed balls. Find the
expected value of X. [3 marks]
(Hint: Suppose $X_i$ denotes the number of the ith removed ball, then $X = \sum_{i=1}^{6} X_i$)
Solution:
Let $X_i$, i = 1, 2, ..., 6 denote the number on the ith ball, then
$X = \sum_{i=1}^{6} X_i$
$\Rightarrow E[X] = E\left[\sum_{i=1}^{6} X_i\right]$
$\Rightarrow E[X] = \sum_{i=1}^{6} E(X_i)$
$\Rightarrow E[X] = 6\,E(X_i)$ ...(1)
Now, $E[X_i] = \dfrac{1}{10}[1 + 2 + 3 + \ldots + 10] = \dfrac{11}{2}$
Putting the value in equation (1), we get
$E[X] = 6 \times \dfrac{11}{2} = 33$
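The linearity argument can be confirmed by averaging the sum over all ${}^{10}C_{6} = 210$ equally likely sets of six balls. A minimal sketch:

```python
from fractions import Fraction
from itertools import combinations

# Every unordered choice of 6 balls out of those numbered 1..10 is equally likely.
sums = [sum(chosen) for chosen in combinations(range(1, 11), 6)]
expected_sum = Fraction(sum(sums), len(sums))

print(len(sums), expected_sum)  # 210 33
```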
Statistics for Data Science - 2
Week 4 Graded assignment solutions
1. Suppose 1 in 100 products that are coming out of a production line is defective. Sup-
pose we randomly pick and keep aside products from the production line till the first
defective item is obtained. Let the random variable X represent the number of prod-
ucts that are kept aside (Assume that the first defective item is also kept aside). Find
Var(X). [1 mark]
(a) $\dfrac{1}{100}$
(b) $\dfrac{99}{100}$
(c) 100
(d) 9900
Solution:
The random variable X represents the number of products that are kept aside until
the first defective item is obtained (including the first defective item).
It is given that 1 out of 100 products is defective.
Therefore, $X \sim \text{Geometric}\left(\dfrac{1}{100}\right)$
Now,
$\text{Var}(X) = \dfrac{1 - p}{p^{2}}$
$= \dfrac{1 - \frac{1}{100}}{\left(\frac{1}{100}\right)^{2}} = 9900$
2. Two coins are tossed. The probabilities of occurrence of tail on the first and the second
coin are 0.6 and 0.4, respectively. If the random variable X represents the number of
heads obtained, find the expected value of X. (Enter the answer correct to 2 decimal
points). [1 mark]
Answer: 1
Solution:
Given,
Random variable X denotes the number of heads obtained after the tossing of two coins.
Therefore, X will take the values in {0, 1, 2}.
Now,
$E(X) = \sum_{x} x\,P(X = x)$
$= 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + 2 \cdot P(X = 2)$
$P(X = 1) = P(\text{H on first coin and T on second coin}) + P(\text{T on first coin and H on second coin})$
$= (0.4 \times 0.4) + (0.6 \times 0.6)$
$= 0.52$
3. Let the two random variables X and Y be independent with means equal to 10 and
20, and variances equal to 2 and 4, respectively. Find the value of Var(XY ).
Hint: If X and Y are independent, $X^{2}$ and $Y^{2}$ are also independent. [1 mark]
Answer: 1208
Solution:
Mean and variance of X is 10 and 2, respectively.
Mean and variance of Y is 20 and 4, respectively.
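One way to reach the stated answer is the identity Var(XY) = E[X²]E[Y²] − (E[X]E[Y])², which holds for independent X and Y by the hint. The sketch below applies it; the function name `var_of_product` is illustrative and not from the assignment.

```python
def var_of_product(mean_x, var_x, mean_y, var_y):
    """Var(XY) for independent X and Y, from their means and variances."""
    e_x2 = var_x + mean_x**2          # E[X^2] = Var(X) + E[X]^2
    e_y2 = var_y + mean_y**2          # E[Y^2] = Var(Y) + E[Y]^2
    return e_x2 * e_y2 - (mean_x * mean_y) ** 2


print(var_of_product(10, 2, 20, 4))  # 1208
```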
4. Let X and Y be two independent discrete random variables. Define random variables
U and V as
$U = \dfrac{X - E(X)}{SD(X)}, \quad V = \dfrac{Y - E(Y)}{SD(Y)}$
Find Cov(U, V ). [1 mark]
Answer: 0
Solution:
Cov(U, V ) = E(U V ) − E(U )E(V ).
Since U and V are the standardized form of random variables X and Y , respectively,
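E(U) = E(V) = 0, so
$\text{Cov}(U, V) = E(UV)$
$= E\left[\dfrac{X - E(X)}{SD(X)} \cdot \dfrac{Y - E(Y)}{SD(Y)}\right]$
$= \dfrac{1}{SD(X)\,SD(Y)}\,E[(X - E(X))(Y - E(Y))]$
$= \dfrac{1}{SD(X)\,SD(Y)}\,E[XY - X E(Y) - Y E(X) + E(X)E(Y)]$
$= \dfrac{1}{SD(X)\,SD(Y)}\,\big(E[XY] - E[X]E(Y) - E[Y]E(X) + E(X)E(Y)\big)$
$= 0$, since $E[XY] = E[X]\,E[Y]$ for independent X and Y.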
5. Using Markov’s inequality, find a bound on the probability that on a particular day,
the number of reservations will exceed 30. [1 mark]
(a) $P(X > 30) \leq \dfrac{1}{4}$
(b) $P(X > 30) \geq \dfrac{1}{3}$
(c) $P(X > 30) \leq \dfrac{10}{31}$
(d) $P(X > 30) > \dfrac{10}{31}$
Solution:
Random variable X represents the number of people who make reservation in a restau-
rant. It is given that
E(X) = 10 (3)
6. Find a bound on the probability that on a particular day, number of reservations made
will lie in between 6 and 14 using Chebyshev’s inequality. [2 marks]
(a) $P(6 < X < 14) \leq \dfrac{7}{8}$
(b) $P(6 < X < 14) \geq \dfrac{7}{8}$
(c) $P(6 < X < 14) > \dfrac{7}{8}$
(d) $P(6 < X < 14) \leq \dfrac{1}{8}$
Solution:
Using the Chebyshev’s inequality, we know that
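$P(|X - \mu| \geq k\sigma) \leq \dfrac{1}{k^{2}}$ (4)
$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \dfrac{1}{k^{2}}$ (5)
Given µ = 10 and $\sigma^{2} = 2$
Now, we can write P (6 < X < 14) as
$P(10 - k\sigma < X < 10 + k\sigma) \geq 1 - \dfrac{1}{k^{2}}$, using (5).
Now, let
$10 - k\sigma = 6$ (6)
$10 + k\sigma = 14$ (7)
$\Rightarrow k = \dfrac{4}{\sigma} \Rightarrow k^{2} = \dfrac{16}{2} = 8$
Therefore, $P(6 < X < 14) \geq 1 - \dfrac{1}{8} = \dfrac{7}{8}$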
Hence, the correct option is (b).
7. The joint probability mass function of three discrete random variables X, Y and Z is
given as
$p(0, 1, 2) = p(0, 2, 3) = p(1, 0, -2) = \dfrac{1}{3}$
Calculate Var(XY + 2Z). [2 marks]
(a) $\dfrac{52}{9}$
(b) $\dfrac{32}{9}$
(c) $\dfrac{80}{3}$
(d) $\dfrac{56}{3}$
Solution:
XY + 2Z will take the values in {−4, 6, 4} with probability $\dfrac{1}{3}$ each.
$E(XY + 2Z) = \dfrac{1}{3}[-4 + 6 + 4] = \dfrac{6}{3} = 2$
$E[(XY + 2Z)^{2}] = \dfrac{1}{3}[(-4)^{2} + 6^{2} + 4^{2}]$
$= \dfrac{1}{3}[16 + 36 + 16]$
$= \dfrac{68}{3}$
Now,
$\text{Var}(XY + 2Z) = E[(XY + 2Z)^{2}] - [E(XY + 2Z)]^{2}$
$= \dfrac{68}{3} - 2^{2}$
$= \dfrac{56}{3}$
Hence, the correct option is (d).
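The variance can be reproduced directly from the three support points of the joint pmf. A brief sketch; the dictionary `pmf` and the helper `w` are illustrative names.

```python
from fractions import Fraction

# Joint pmf p(x, y, z): three equally likely points, as given in the question.
pmf = {(0, 1, 2): Fraction(1, 3), (0, 2, 3): Fraction(1, 3), (1, 0, -2): Fraction(1, 3)}


def w(x, y, z):
    """The transformed variable W = XY + 2Z."""
    return x * y + 2 * z


mean_w = sum(w(*pt) * p for pt, p in pmf.items())
mean_w2 = sum(w(*pt) ** 2 * p for pt, p in pmf.items())
var_w = mean_w2 - mean_w**2

print(mean_w, mean_w2, var_w)  # 2 68/3 56/3
```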
8. An urn contains 5 white balls and 5 red balls. 2 balls are selected at random. Let
X denote the number of red balls drawn and let Y denote the number of white balls
drawn. Find the correlation coefficient between X and Y . [2 marks]
(a) ρ(X, Y ) = 1
(b) ρ(X, Y ) = −1
(c) ρ(X, Y ) = 0
(d) ρ(X, Y ) = −0.5
Solution:
Two balls are selected at random from the urn containing 5 white and 5 red balls.
Random variable X represents the number of red balls drawn.
Therefore, X will take values in {0, 1, 2}.
Random variable Y represents the number of white balls drawn.
Therefore, Y will take values in {0, 1, 2}.
The joint probability distribution of X and Y is given by

          X = 0    X = 1    X = 2
Y = 0       0        0      10/45
Y = 1       0      25/45      0
Y = 2     10/45      0        0
Now,
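$E(X) = 0 \times \dfrac{10}{45} + 1 \times \dfrac{25}{45} + 2 \times \dfrac{10}{45} = 1$
Similarly, E(Y) = 1.
$E(X^{2}) = 0^{2} \times \dfrac{10}{45} + 1^{2} \times \dfrac{25}{45} + 2^{2} \times \dfrac{10}{45} = \dfrac{65}{45}$
Similarly, $E(Y^{2}) = \dfrac{65}{45}$.
Now, $\text{Var}(X) = \text{Var}(Y) = \dfrac{65}{45} - (1)^{2} = \dfrac{20}{45}$
$E(XY) = (0 \times 2)\dfrac{10}{45} + (1 \times 1)\dfrac{25}{45} + (2 \times 0)\dfrac{10}{45} = \dfrac{25}{45}$
$\rho(X, Y) = \dfrac{\text{Cov}(X, Y)}{SD(X)\,SD(Y)}$
$= \dfrac{E(XY) - E(X)E(Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}}$
$= \dfrac{\frac{25}{45} - 1}{\sqrt{\frac{20}{45} \times \frac{20}{45}}}$
$= -1$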
9. Five students each from class 8, 9 and 10 have been nominated for the formation of
the school committee. The number of boys and girls who are selected from each of the
classes is given in Table 4.1.A.
If the committee comprises of two students from each class, find the expected number
of girls in the committee. (Enter the answer correct to 1 decimal point) [2 marks]
Answer: 2.8
Solution:
Let X1 represent the number of girls from class eight in the school committee.
Let X2 represent the number of girls from class nine in the school committee.
Let X3 represent the number of girls from class ten in the school committee.
We need to find E(X1 + X2 + X3 ).
We know that E(X1 + X2 + X3 ) = E(X1 ) + E(X2 ) + E(X3 ).
Since total number of girls selected from class eight is 2, therefore, the committee can
comprise of either 0 girl or 1 girl or 2 girls from class eight.
i.e. X1 will take values in {0, 1, 2}.
Now
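$P(X_1 = 0) = \dfrac{{}^{3}C_{2}}{{}^{5}C_{2}} = \dfrac{3}{10}$
$P(X_1 = 1) = \dfrac{{}^{3}C_{1} \times {}^{2}C_{1}}{{}^{5}C_{2}} = \dfrac{6}{10}$
$P(X_1 = 2) = \dfrac{{}^{2}C_{2}}{{}^{5}C_{2}} = \dfrac{1}{10}$
Therefore, $E(X_1) = 0 \times \dfrac{3}{10} + 1 \times \dfrac{6}{10} + 2 \times \dfrac{1}{10} = \dfrac{8}{10}$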
Similarly, total number of girls selected from class nine is 2, therefore, the committee
can comprise of either 0 girl or 1 girl or 2 girls from class nine.
i.e. $X_2$ will take values in {0, 1, 2}, hence $E(X_2) = \dfrac{8}{10}$.
Total number of girls selected from class ten is 3 and we have to select 2 students from
each class, therefore, the committee can comprise of either 0 girl or 1 girl or 2 girls
from class ten.
i.e. X3 will take values in {0, 1, 2}.
Now
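$P(X_3 = 0) = \dfrac{{}^{2}C_{2}}{{}^{5}C_{2}} = \dfrac{1}{10}$
$P(X_3 = 1) = \dfrac{{}^{3}C_{1} \times {}^{2}C_{1}}{{}^{5}C_{2}} = \dfrac{6}{10}$
$P(X_3 = 2) = \dfrac{{}^{3}C_{2}}{{}^{5}C_{2}} = \dfrac{3}{10}$
Therefore, $E(X_3) = 0 \times \dfrac{1}{10} + 1 \times \dfrac{6}{10} + 2 \times \dfrac{3}{10} = \dfrac{12}{10}$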
Now
10. A share of a company costs ₹1000 today. Suppose today’s share price increases by
50% with probability 0.6 and decreases by 50% with probability 0.4. Independent of
today, suppose that tomorrow’s share price increases by 20% with probability 0.2, and
decreases by 30% with probability 0.8. If you decide to buy 3 shares today, find the
expected profit (in ₹) at the end of 2 days. [2 marks]
(a) -120
(b) 360
(c) 120
(d) -360
Solution:
The cost price of a share of the company is ₹1000.
Let the random variable X represent the price of the share at the end of 2 days.
On the first day, the price can either go up by 50% with probability 0.6 or go down by 50% with probability 0.4.
Independent of the first day, on the second day the price can either go up by 20% with probability 0.2 or
go down by 30% with probability 0.8.
If the share price increases by 50% on the first day, the price of the share will become ₹1500.
If it then increases by 20%, the price at the end of two days is
₹$\left(1500 + 1500 \times \dfrac{20}{100}\right)$ = ₹1800 with probability (0.6 × 0.2) = 0.12.
If it instead decreases by 30%, the price at the end of two days is
₹$\left(1500 - 1500 \times \dfrac{30}{100}\right)$ = ₹1050 with probability (0.6 × 0.8) = 0.48.
Again, if the share price decreases by 50% on the first day, the price of the share will become ₹500.
If it then increases by 20%, the price at the end of two days is
₹$\left(500 + 500 \times \dfrac{20}{100}\right)$ = ₹600 with probability (0.4 × 0.2) = 0.08.
If it instead decreases by 30%, the price at the end of two days is
₹$\left(500 - 500 \times \dfrac{30}{100}\right)$ = ₹350 with probability (0.4 × 0.8) = 0.32.
P (X = 1800) = 0.12
P (X = 1050) = 0.48
P (X = 600) = 0.08
P (X = 350) = 0.32
Now,
$E[X] = 1800(0.12) + 1050(0.48) + 600(0.08) + 350(0.32) = 216 + 504 + 48 + 112 = 880$
The expected gain at the end of two days if you buy one share is ₹(880 − 1000) = −₹120.
Therefore, if you buy 3 shares of the company, the expected gain will be −₹360.
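The four-branch computation can be written as a sum over the product of the two days' moves. A minimal sketch with exact fractions; the lists `day1` and `day2` hold hypothetical (multiplier, probability) pairs derived from the question.

```python
from fractions import Fraction as F
from itertools import product

# (price multiplier, probability) for each day, taken from the problem statement.
day1 = [(F("1.5"), F("0.6")), (F("0.5"), F("0.4"))]
day2 = [(F("1.2"), F("0.2")), (F("0.7"), F("0.8"))]
cost = 1000
shares = 3

# Expected price after two independent daily moves.
expected_price = sum(cost * m1 * m2 * p1 * p2
                     for (m1, p1), (m2, p2) in product(day1, day2))
expected_profit = shares * (expected_price - cost)

print(expected_price, expected_profit)  # 880 -360
```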
11. A lottery has 500 tickets out of which only 2 tickets contain prizes worth ₹500 and
₹1,000; the rest are worth ₹0. If one has bought 2 tickets, what will be his/her expected
gain (in ₹)? [2 marks]
Answer: 6
Solution:
In the lottery, only two tickets out of 500 contain prizes worth ₹500 and ₹1,000.
If one has bought two tickets, one can get prizes worth ₹0, ₹500, ₹1,000 and ₹1,500.
Let the random variable X represent the worth of the prizes of two tickets.
Therefore, X will take values in {0, 500, 1000, 1500}.
$P(X = 0) = P(\text{both the tickets are worth ₹0}) = \dfrac{{}^{498}C_{2}}{{}^{500}C_{2}}$
$P(X = 500) = P(\text{one ticket is worth ₹0 and the other is worth ₹500}) = \dfrac{{}^{498}C_{1}\,{}^{1}C_{1}}{{}^{500}C_{2}}$
$P(X = 1000) = P(\text{one ticket is worth ₹0 and the other is worth ₹1000}) = \dfrac{{}^{498}C_{1}\,{}^{1}C_{1}}{{}^{500}C_{2}}$
$P(X = 1500) = P(\text{one ticket is worth ₹500 and the other is worth ₹1000}) = \dfrac{{}^{2}C_{2}}{{}^{500}C_{2}}$
$E(X) = 0 \times \dfrac{{}^{498}C_{2}}{{}^{500}C_{2}} + 500 \times \dfrac{{}^{498}C_{1}\,{}^{1}C_{1}}{{}^{500}C_{2}} + 1000 \times \dfrac{{}^{498}C_{1}\,{}^{1}C_{1}}{{}^{500}C_{2}} + 1500 \times \dfrac{{}^{2}C_{2}}{{}^{500}C_{2}}$
$= \dfrac{1}{{}^{500}C_{2}}\left[500 \times {}^{498}C_{1} + 1000 \times {}^{498}C_{1} + 1 \times 1500\right]$
$= \dfrac{1}{{}^{500}C_{2}}\left[249000 + 498000 + 1500\right]$
$= \dfrac{748500}{124750} = 6$
Therefore, the expected gain is ₹6.
12. Number of cars (X) that visit Garage A each day is a random variable with mean 45
and variance 10 while the number of cars (Y ) that visit Garage B each day is a random
variable with mean 45 and variance 20. If the arrival of cars in garages A and B are
independent, find an upper bound on the probability that the difference in the number
of cars arriving in Garage A and Garage B on a particular day is greater than or equal
to 10. [3 marks]
(a) $\dfrac{3}{10}$
(b) $\dfrac{2}{10}$
(c) $\dfrac{1}{10}$
(d) $\dfrac{1}{4}$
Solution:
The random variable X represents the number of cars that come each day to Garage A.
Let the mean and variance of X be denoted by $\mu_X$ and $\sigma_X^{2}$ respectively.
Given $\mu_X = 45$, $\sigma_X^{2} = 10$.
The random variable Y represents the number of cars that come each day to Garage B.
Let the mean and variance of Y be denoted by $\mu_Y$ and $\sigma_Y^{2}$ respectively.
Given $\mu_Y = 45$, $\sigma_Y^{2} = 20$.
Arrivals of cars in Garages A and B are independent. That implies X and Y are independent.
The difference in the number of cars arriving in Garage A and Garage B is given by |X − Y|.
Let $\mu = E(X - Y) = E(X) - E(Y) = 45 - 45 = 0$
and $\sigma^{2} = \text{Var}(X - Y) = \text{Var}(X) + \text{Var}(Y) = 10 + 20 = 30$ (since X and Y are
independent).
Using Chebyshev's inequality,
$P\{|(X - Y) - 0| \geq k\sigma\} \leq \dfrac{1}{k^{2}}$ (8)
for k > 0.
Substituting kσ = 10, that is $k^{2} = \dfrac{100}{30}$, in equation (8), we get
$P\{|X - Y| \geq 10\} \leq \dfrac{1}{\left(\frac{100}{30}\right)}$
$\Rightarrow P\{|X - Y| \geq 10\} \leq \dfrac{3}{10}$
Therefore, option (a) is correct.
Statistics for Data Science - 2
(a) e
(b) 0
(c) e−
(d) e−2
Answer: b
Solution:
We know that $P(-\epsilon < X < 0) = \int_{-\epsilon}^{0} f_X(x)\,dx$
But the value of $f_X(x)$ is zero in the range $-\epsilon$ to zero.
Therefore, $P(-\epsilon < X < 0) = 0$.
Therefore, option b is the correct option.
2. Which of the following statements is/are true for a continuous random variable with
PDF fX (x)?
(a) If $f_X(2) = 2f_X(1)$, then $P(2 - \epsilon < X < 2 + \epsilon) = 2P(1 - \epsilon < X < 1 + \epsilon)$ for a small $\epsilon$.
(b) If $f_X(2) = 2f_X(1)$, then $P(2 - \epsilon < X < 2 + \epsilon) \approx 2P(1 - \epsilon < X < 1 + \epsilon)$ for a small $\epsilon$.
(c) P (X = x0 ) = 0 for any value of x0 .
(d) CDF FX (x) is continuous in the domain [−∞, ∞].
Answer: b, c, and d
Solution:
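Option a: We know that for small $\epsilon$, $P(x - \epsilon < X < x + \epsilon) \propto f_X(x)$ approximately.
Therefore, $P(1 - \epsilon < X < 1 + \epsilon) \propto f_X(1)$ and $P(2 - \epsilon < X < 2 + \epsilon) \propto f_X(2)$.
But $P(x - \epsilon < X < x + \epsilon)$ is not an exact linear function of $f_X(x)$.
Therefore, when $f_X(2) = 2f_X(1)$, in general $P(2 - \epsilon < X < 2 + \epsilon) \neq 2P(1 - \epsilon < X < 1 + \epsilon)$
but $P(2 - \epsilon < X < 2 + \epsilon) \approx 2P(1 - \epsilon < X < 1 + \epsilon)$.
Hence option a is wrong but option b is correct.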
Option c: The probability at an instant (PX (x)) for a continuous random variable is
zero as there is no sudden spike in the CDF function for any value of x. Hence option
c is correct.
Option d: For a continuous random variable CDF is always continuous.
3. If
$f_X(x) = \begin{cases} \dfrac{1}{18}(x^{2} - 8x + 16) & 1 \leq x \leq 7 \\ 0 & \text{otherwise} \end{cases}$
What is the value of P (X ≤ 4)? Enter the answer correct to one decimal accuracy.
(Hint: $\int x^{a}\,dx = \dfrac{x^{a+1}}{a+1}$)
Answer: 0.5
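Solution:
$P(X \leq 4) = \int_{-\infty}^{4} f_X(x)\,dx$
$\Rightarrow P(X \leq 4) = \int_{1}^{4} f_X(x)\,dx$, since $f_X(x) = 0$ for x < 1.
$\Rightarrow P(X \leq 4) = \int_{1}^{4} \dfrac{1}{18}(x^{2} - 8x + 16)\,dx$
$\Rightarrow P(X \leq 4) = \dfrac{1}{18}\left[\dfrac{x^{3}}{3} - \dfrac{8x^{2}}{2} + 16x\right]_{1}^{4}$
$\Rightarrow P(X \leq 4) = \dfrac{1}{18}\left(\dfrac{4^{3}}{3} - 4 \cdot 4^{2} + 16 \cdot 4\right) - \dfrac{1}{18}\left(\dfrac{1^{3}}{3} - 4 \cdot 1^{2} + 16 \cdot 1\right)$
$\Rightarrow P(X \leq 4) = 0.5$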
4. If X ∼ Normal(10, 25), what is the value of $E[2X^{2}]$?
Answer: 250
Solution:
Given E[X] = 10, Var(X) = 25
We know that $\text{Var}(X) = E[X^{2}] - E[X]^{2}$
$\Rightarrow E[X^{2}] = \text{Var}(X) + E[X]^{2}$
$\Rightarrow E[X^{2}] = 25 + 10^{2} = 125$
We know that E[cX] = cE[X], where c is a constant.
$\Rightarrow E[2X^{2}] = 2E[X^{2}]$
$\Rightarrow E[2X^{2}] = 2 \times 125 = 250$
5. If X ∼ Normal(10, 4), then what is the value of P (X ≥ 8|X ≤ 9)? Use the standard
normal distribution tables if necessary. Enter the answer up to two decimals accuracy.
Use the following CDF values of standard normal distribution.
FZ (−2) = 0.02275, FZ (−1.5) = 0.06681, FZ (−1) = 0.15866, FZ (−0.5) = 0.30854, FZ (0) =
0.5, FZ (0.5) = 0.69146, and FZ (1) = 0.84134
Answer: 0.485 accepted range 0.48 to 0.49
Solution:
Given µ = 10, σ 2 = 4 ⇒ σ = 2
We need to find P (X ≥ 8|X ≤ 9).
$P(X \geq 8 \mid X \leq 9) = \dfrac{P(X \geq 8 \cap X \leq 9)}{P(X \leq 9)} = \dfrac{F_X(9) - F_X(8)}{F_X(9)}$
Converting the given normal distribution to the standard normal distribution to get the values of $F_X(x)$:
For x = 8, $z = \dfrac{x - \mu}{\sigma} = \dfrac{8 - 10}{2} = -1 \Rightarrow F_X(8) = F_Z(-1)$
For x = 9, $z = \dfrac{x - \mu}{\sigma} = \dfrac{9 - 10}{2} = -0.5 \Rightarrow F_X(9) = F_Z(-0.5)$
$\Rightarrow P(X \geq 8 \mid X \leq 9) = \dfrac{0.30854 - 0.15866}{0.30854} \approx 0.486$
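The table lookup can be replaced by the standard library's `statistics.NormalDist`, which exposes the normal CDF. A short sketch; note that `NormalDist` takes the standard deviation (2), not the variance (4).

```python
from statistics import NormalDist

x_dist = NormalDist(mu=10, sigma=2)   # Normal(10, 4): sigma is the square root of 4

# P(X >= 8 | X <= 9) = (F(9) - F(8)) / F(9)
p_cond = (x_dist.cdf(9) - x_dist.cdf(8)) / x_dist.cdf(9)

print(round(p_cond, 3))
```

The result agrees with the table-based value to within the accepted range of 0.48 to 0.49.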
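(a) $f_Y(y) = \begin{cases} \dfrac{2\log(y)}{y} & 1 \leq y \leq e \\ 0 & \text{otherwise} \end{cases}$
(b) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{2ey} & 1 \leq y \leq e \\ 0 & \text{otherwise} \end{cases}$
(c) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{y} & 1 \leq y \leq e \\ 0 & \text{otherwise} \end{cases}$
(d) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{ey} & 1 \leq y \leq e \\ 0 & \text{otherwise} \end{cases}$
(e) $f_Y(y) = \begin{cases} \dfrac{\log(y)}{2y} & 1 \leq y \leq e \\ 0 & \text{otherwise} \end{cases}$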
Answer: a
Solution:
Given $Y = g(X) = e^{X}$
$\Rightarrow \log y = x = g^{-1}(y)$
Therefore $g^{-1}(y) = \log(y)$
$g(x) = e^{x} \Rightarrow g'(x) = e^{x}$, since $\dfrac{d(e^{x})}{dx} = e^{x}$
We know that in the range 0 to 1, $e^{x}$ is monotonic (increasing function).
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|}\,f_X(g^{-1}(y))$
$g'(g^{-1}(y)) = g'(\log y) = e^{\log y} = y$
$|g'(g^{-1}(y))| = y$ since y is positive in the range [1, e]
$f_X(g^{-1}(y)) = f_X(\log y) = 2\log y$
Therefore, $f_Y(y) = \dfrac{1}{y} \cdot 2\log y = \dfrac{2\log y}{y}$
Hence option a is correct.
The CDF of random variable X is given below:
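$F_X(x) = \begin{cases} 0 & x \leq 0 \\ 2x^{2} & 0 \leq x \leq \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \leq x \leq 1 \\ \frac{x}{2} & 1 \leq x \leq 2 \\ 1 & x \geq 2 \end{cases}$
(Hint: $\dfrac{d(x^{a})}{dx} = a x^{a-1}$)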
7. Which of the following statements is/are correct?
Answer: a, d
Solution:
d(FX (x))
We know that fX (x) =
dx
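Given
$F_X(x) = \begin{cases} 0 & x \leq 0 \\ 2x^{2} & 0 \leq x \leq \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \leq x \leq 1 \\ \frac{x}{2} & 1 \leq x \leq 2 \\ 1 & x \geq 2 \end{cases}$
$\Rightarrow f_X(x) = \begin{cases} \dfrac{d(0)}{dx} = 0 & x \leq 0 \\ \dfrac{d(2x^{2})}{dx} = 4x & 0 \leq x \leq \frac{1}{2} \\ \dfrac{d(\frac{1}{2})}{dx} = 0 & \frac{1}{2} < x \leq 1 \\ \dfrac{d(\frac{x}{2})}{dx} = \dfrac{1}{2} & 1 < x \leq 2 \\ \dfrac{d(1)}{dx} = 0 & x > 2 \end{cases}$
Therefore, $f_X(x) = \begin{cases} 0 & x \leq 0 \\ 4x & 0 \leq x \leq \frac{1}{2} \\ 0 & \frac{1}{2} < x \leq 1 \\ \frac{1}{2} & 1 < x \leq 2 \\ 0 & x > 2 \end{cases}$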
Since, FX (x) is continuous in the given domain, hence X is a continuous random
variable.
8. What is the value of P (X ≥ 1|X ≤ 1.5)? Enter the answer correct to two decimals
accuracy.
Answer: 0.33, accepted range 0.31 to 0.35
Solution:
$P(X \geq 1 \mid X \leq 1.5) = \dfrac{F_X(1.5) - F_X(1)}{F_X(1.5)} = \dfrac{1.5/2 - 1/2}{1.5/2} = \dfrac{1}{3}$
9. The time taken by Rohith to complete a race follows the exponential distribution with
expected time of completion of 10 minutes. What is the probability that Rohith takes
less than 20 minutes but more than 10 minutes to complete the race? Enter the answer
correct to 2 decimals accuracy. (Hint: $\int e^{-ax}\,dx = \dfrac{e^{-ax}}{-a}$)
Answer: 0.2325, accepted range: 0.23 to 0.235
Solution:
Given E[X] = 10 minutes.
We know that for an exponential distribution $E[X] = \dfrac{1}{\lambda}$
$\Rightarrow \dfrac{1}{\lambda} = 10$, λ = 0.1
For the exponential distribution, $F_X(x) = 1 - e^{-\lambda x}$
The probability that the athlete takes at most 10 minutes is
$F_X(10) = 1 - e^{-0.1 \times 10} = 1 - e^{-1}$
The probability that the athlete takes at most 20 minutes is
$F_X(20) = 1 - e^{-0.1 \times 20} = 1 - e^{-2}$
The probability that the athlete takes more than 10 minutes but less than 20 minutes to
complete the race is $F_X(20) - F_X(10) = e^{-1} - e^{-2} \approx 0.2325$.
10. The PDFs of random variables X1, X2, X3, X4, and X5 are shown in Figure 5.2.P.
Based on the information, choose the correct option(s) from below.
Answer: a, d, and e
Solution:
We know that in the PDF of normal distribution, the peak value occurs at mean.
E[X] = µ(mean)
Also, the value of the PDF at the mean is inversely proportional to the standard deviation,
since $f_X(\mu) = \dfrac{1}{\sqrt{2\pi}\,\sigma}$.
The peak value, which is mean or E[X], of PDF occurs approximately for X1, X2, X3, X4,
and X5 at -10, 0, 20, 10, and -10 respectively.
Therefore, E(X1) ≈ E(X5) < E(X2) < E(X4) < E(X3)
The peak value (fX (µ)) for variables X1, X2, X3, X4, and X5 are such that fX1 (µ) ≈
fX2 (µ) > fX3 (µ) > fX4 (µ) > fX5 (µ).
Therefore, Var(X1) ≈ Var(X2) < Var(X3) < Var(X4) < Var(X5)
Hence, options a, d, and e are correct.
11. The PDF of a continuous random variable is given as
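$f_X(x) = \begin{cases} 4x^{3} & 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}$
What is the value of Var(X)? (Hint: $\int x^{a}\,dx = \dfrac{x^{a+1}}{a+1}$)
(a) $\dfrac{1}{75}$
(b) $\dfrac{2}{75}$
(c) $\dfrac{3}{75}$
(d) $\dfrac{4}{75}$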
Answer: b
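We know that $\text{Var}(X) = E[X^{2}] - E[X]^{2}$
$E[X] = \int x\,f_X(x)\,dx$
$E[X] = \int_{0}^{1} x \cdot 4x^{3}\,dx$
$\Rightarrow E[X] = \int_{0}^{1} 4x^{4}\,dx$
$\Rightarrow E[X] = \left.\dfrac{4x^{5}}{5}\right|_{0}^{1}$
$\Rightarrow E[X] = \dfrac{4}{5} - 0 = \dfrac{4}{5}$
$E[X^{2}] = \int x^{2} f_X(x)\,dx$
$E[X^{2}] = \int_{0}^{1} x^{2} \cdot 4x^{3}\,dx = \int_{0}^{1} 4x^{5}\,dx$
$\Rightarrow E[X^{2}] = \left.\dfrac{4x^{6}}{6}\right|_{0}^{1}$
$\Rightarrow E[X^{2}] = \dfrac{4}{6} - 0 = \dfrac{2}{3}$
$\text{Var}(X) = \dfrac{2}{3} - \left(\dfrac{4}{5}\right)^{2}$
$\text{Var}(X) = \dfrac{2}{3} - \dfrac{16}{25}$
$\text{Var}(X) = \dfrac{2}{75}$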
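For this density the k-th moment has the closed form $\int_0^1 x^k \cdot 4x^3\,dx = \frac{4}{k+4}$, which makes the variance easy to verify exactly. A minimal sketch; `moment` is an illustrative helper name.

```python
from fractions import Fraction


def moment(k):
    """E[X^k] for the pdf f(x) = 4x^3 on [0, 1]: integral of 4x^(k+3) is 4/(k+4)."""
    return Fraction(4, k + 4)


mean = moment(1)
variance = moment(2) - mean**2

print(mean, moment(2), variance)  # 4/5 2/3 2/75
```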
(a) If b2 − a2 = b1 − a1 , then Var(X) = Var(Y ).
(b) If b2 + a2 = b1 + a1 , then Var(X) = Var(Y ).
(c) If b2 − a2 = b1 − a1 , then E(X) = E(Y ).
(d) If b2 − b1 = a1 − a2 , then E(X) = E(Y ).
Answer: a and d
Solution:
We know that the mean (E(X)) and variance (Var(X)) of a uniform random variable
X ∼ Uniform(a, b) are $\dfrac{a + b}{2}$ and $\dfrac{(b - a)^{2}}{12}$ respectively.
Given X ∼ Uniform($a_1$, $b_1$) and Y ∼ Uniform($a_2$, $b_2$),
$E(X) = \dfrac{a_1 + b_1}{2}$, $E(Y) = \dfrac{a_2 + b_2}{2}$. So, for E(X) to be equal to E(Y), $a_1 + b_1 = a_2 + b_2$
or $b_2 - b_1 = a_1 - a_2$. Hence option d is correct and option c is incorrect.
Similarly, for Var(X) to be equal to Var(Y), $\dfrac{(b_1 - a_1)^{2}}{12} = \dfrac{(b_2 - a_2)^{2}}{12}$ or $b_1 - a_1 = b_2 - a_2$,
hence option a is correct and option b is incorrect.
13. The CDF of a random variable X is given as:
$F_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x}{\ln 4} & 0 \leq x \leq \ln 2 \\ 1 - e^{-x} & \ln 2 \leq x < \infty \end{cases}$
(d) $f_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{1}{\ln 2} & 0 \leq x < \ln 2 \\ e^{x} & \ln 2 \leq x < \infty \end{cases}$
Answer: a
Solution:
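We know that $f_X(x) = \dfrac{d(F_X(x))}{dx}$
Given,
$F_X(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x}{\ln 4} & 0 \leq x \leq \ln 2 \\ 1 - e^{-x} & \ln 2 \leq x < \infty \end{cases}$
Therefore,
$f_X(x) = \begin{cases} \dfrac{d(0)}{dx} = 0 & x < 0 \\ \dfrac{d\left(\frac{x}{\ln 4}\right)}{dx} = \dfrac{1}{\ln 4} & 0 \leq x \leq \ln 2 \\ \dfrac{d(1 - e^{-x})}{dx} = e^{-x} & \ln 2 \leq x < \infty \end{cases}$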
Hence option a is correct.
Statistics for Data Science - 2
$P(X > 4) = 1 - P(X \leq 4) = 1 - F_X(4)$
$= 1 - (1 - e^{-3 \times 4})$
$= e^{-12}$
a) $1 - e^{-18}$
b) $e^{-5} - e^{-18}$
c) $e^{-18}$
d) $e^{-9}$
Solution:
i) Find the value of k.
Solution:
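We know that for the PDF of a random variable
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
$\Rightarrow \int_{0}^{\infty} k e^{-x}\,dx = 1$
$\Rightarrow k \left[\dfrac{e^{-x}}{-1}\right]_{0}^{\infty} = 1$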
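a) $e^{-1}$
b) $e^{-3}\,e^{-4}$
c) $e^{-3} - e^{-4}$
d) $e^{-4} - e^{-3}$
Hint: Use $\int_{a}^{b} e^{-x}\,dx = e^{-a} - e^{-b}$
Solution:
$P(3 < X < 4) = \int_{3}^{4} k e^{-x}\,dx$
$= 1 \times \left[\dfrac{e^{-x}}{-1}\right]_{3}^{4}$
$= \dfrac{e^{-4}}{-1} - \dfrac{e^{-3}}{-1}$
$= e^{-3} - e^{-4}$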
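$P\left(X \leq \dfrac{3}{4} \,\middle|\, X > \dfrac{1}{4}\right) = \dfrac{P\left(X \leq \frac{3}{4} \text{ and } X > \frac{1}{4}\right)}{P\left(X > \frac{1}{4}\right)}$
$= \dfrac{\int_{1/4}^{3/4} 5x^{4}\,dx}{\int_{1/4}^{1} 5x^{4}\,dx}$
$= \dfrac{\left[\frac{5x^{5}}{5}\right]_{1/4}^{3/4}}{\left[\frac{5x^{5}}{5}\right]_{1/4}^{1}}$
$\Rightarrow P\left(X \leq \dfrac{3}{4} \,\middle|\, X > \dfrac{1}{4}\right) = \dfrac{\left[x^{5}\right]_{1/4}^{3/4}}{\left[x^{5}\right]_{1/4}^{1}}$
$= \dfrac{\left(\frac{3}{4}\right)^{5} - \left(\frac{1}{4}\right)^{5}}{1 - \left(\frac{1}{4}\right)^{5}}$
$= \dfrac{22}{93}$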
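The final simplification to 22/93 is easy to get wrong by hand, so here is an exact check. The sketch assumes, as the integrals above indicate, that the density is $5x^{4}$ on [0, 1], so the CDF is $x^{5}$ there; `cdf` is an illustrative name.

```python
from fractions import Fraction


def cdf(x):
    """CDF F(x) = x^5 on [0, 1], the antiderivative of the pdf 5x^4."""
    return Fraction(x) ** 5


lo, hi = Fraction(1, 4), Fraction(3, 4)
p_cond = (cdf(hi) - cdf(lo)) / (1 - cdf(lo))   # P(X <= 3/4 | X > 1/4)

print(p_cond)  # 22/93
```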
4. The lifespan (in hours) of an electronic component used in an electric car has the density
function
$f_X(x) = \begin{cases} \dfrac{1}{500}\,e^{-\frac{x}{500}} & x \geq 0 \\ 0 & \text{otherwise} \end{cases}$
Determine the probability that the component lasts more than 200 hours before it needs
to be replaced.
a) $e^{-0.4}$
b) $e^{200}$
c) 0.5
d) $e^{-2.5}$
Solution:
Let X denote the lifespan (in hours) of the electronic component. We have to find the
probability that the component lasts more than 200 hours before it needs to be replaced
i.e.
$P(X > 200) = 1 - P(X \leq 200)$
Also, we can relate the given density with the exponential distribution with $\lambda = \dfrac{1}{500}$.
5. The number of days in advance by which airline tickets are purchased by travelers is
exponentially distributed with an average of 28 days. If there is an 80% chance that a
traveler will purchase tickets fewer than d days in advance, then what is the value of d?
Write your answer to the nearest integer.
Solution:
Let X be the number of days in advance by which airline tickets are purchased by
travelers.
We need to find the value of d.
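Given that the average is 28 days, $\lambda = \frac{1}{28}$, and there is an 80% chance that a traveler will purchase tickets fewer than d days in advance.
$\Rightarrow P(X < d) = 0.80$
$\Rightarrow 1 - e^{-d/28} = 0.80$
$\Rightarrow e^{-d/28} = 0.20$
$\Rightarrow \dfrac{d}{28} = -\ln(0.20)$
$\Rightarrow d = 28 \times (1.609)$
$\Rightarrow d = 45.052$, that is, 45 to the nearest integer.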
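The inversion of the exponential CDF used above can be checked numerically. The following is a minimal sketch; the function name `exponential_quantile` is an illustrative choice and not part of the assignment.

```python
import math

def exponential_quantile(p: float, mean: float) -> float:
    """Return d such that P(X < d) = p when X ~ Exp(rate = 1/mean).

    Solves 1 - exp(-d/mean) = p for d.
    """
    return -mean * math.log(1.0 - p)

# Average of 28 days, 80% of travelers buy fewer than d days in advance.
d = exponential_quantile(0.80, 28.0)
print(round(d, 2), round(d))  # about 45.06, which rounds to 45
```

The small difference from 45.052 comes from the solution rounding ln(0.20) to 1.609; the nearest integer is unaffected.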
6. A firm produces machines with a lifespan, whose distribution has a mean of 200 months
and standard deviation of 50 months. The firm wishes to introduce a warranty scheme
in which it would like to replace all the dysfunctional machines with new ones within
warranty period. But they do not wish to do so for more than 11.9% of the machines
they produce. If the lifespan of the machine is assumed to follow a normal distribution,
how long a guarantee period should be offered? (Answer is expected in months)
Hint: Use P (Z < −1.18) = 0.119, where Z represents the standard normal distribution.
Solution:
Let X denote the lifespan of the machines in months. Given that µ = 200 and σ = 50.
The firm did not wish to replace more than 11.9% of the machines they produce.
If m be the guarantee period (in months), then
P (X ≤ m) = 0.119
$$\Rightarrow P\left(\frac{X - 200}{50} \le \frac{m - 200}{50}\right) = 0.119$$
Comparing this equation with the given value of the standard normal distribution, we get
$$\frac{m - 200}{50} = -1.18 \;\Rightarrow\; m = 141$$
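As a quick check of the standardisation step, the sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

# Lifespan model from the problem: mean 200 months, standard deviation 50 months.
lifespan = NormalDist(mu=200, sigma=50)

# Guarantee period obtained from the hint P(Z < -1.18) = 0.119.
m = lifespan.mean + (-1.18) * lifespan.stdev

# Fraction of machines expected to fail within m months.
fraction_replaced = lifespan.cdf(m)

# The same period recovered directly from the target fraction.
m_from_quantile = lifespan.inv_cdf(0.119)

print(m, round(fraction_replaced, 3), round(m_from_quantile, 1))
```

Both routes should agree with m = 141 up to the rounding in the hint.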
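a) $f_Y(y) = \begin{cases} 1 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$
b) $f_Y(y) = \begin{cases} (1 - y)^3 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$
c) $f_Y(y) = \begin{cases} y^3 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$
d) $f_Y(y) = \begin{cases} 3y^{2/3} & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$
Hint:
Apply the monotonic, differentiable function theorem and $\dfrac{d}{dx}(1 - x)^3 = -3(1 - x)^2$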
Solution:
We know that in the range (0, 1), $(1 - x)^3$ is monotonic (a decreasing function).
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|} f_X(g^{-1}(y))$
Given $Y = (1 - X)^3 = g(X)$ (let)
$\Rightarrow y^{1/3} = 1 - x \;\Rightarrow\; x = 1 - y^{1/3} = g^{-1}(y)$
Therefore $g^{-1}(y) = 1 - y^{1/3}$
$g(x) = (1 - x)^3 \Rightarrow g'(x) = -3(1 - x)^2$, since $\dfrac{d}{dx}(1 - x)^3 = -3(1 - x)^2$
And
$g'(g^{-1}(y)) = g'(1 - y^{1/3}) = -3(1 - (1 - y^{1/3}))^2 = -3y^{2/3}$
$|g'(g^{-1}(y))| = 3y^{2/3}$, since $y^{2/3}$ is positive in the range (0, 1).
$f_X(g^{-1}(y)) = f_X(1 - y^{1/3}) = 3(1 - (1 - y^{1/3}))^2 = 3y^{2/3}$
Therefore, $f_Y(y) = \dfrac{3y^{2/3}}{3y^{2/3}}$
$\Rightarrow f_Y(y) = 1$
Therefore
$$f_Y(y) = \begin{cases} 1 & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$
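The change-of-variables formula above can be evaluated pointwise as a sanity check. This sketch assumes $f_X(x) = 3(1 - x)^2$ on (0, 1), which is the density implied by the solution's evaluation of $f_X(1 - y^{1/3})$; the function names are illustrative.

```python
def f_X(x: float) -> float:
    """Assumed density of X: 3(1 - x)^2 on (0, 1), zero elsewhere."""
    return 3 * (1 - x) ** 2 if 0 < x < 1 else 0.0

def g_inverse(y: float) -> float:
    """Inverse of g(x) = (1 - x)^3 on (0, 1)."""
    return 1 - y ** (1 / 3)

def g_prime(x: float) -> float:
    """Derivative of g(x) = (1 - x)^3."""
    return -3 * (1 - x) ** 2

def f_Y(y: float) -> float:
    """Density of Y = g(X) via f_Y(y) = f_X(g^{-1}(y)) / |g'(g^{-1}(y))|."""
    if not 0 < y < 1:
        return 0.0
    x = g_inverse(y)
    return f_X(x) / abs(g_prime(x))

# The density should be constant (equal to 1) across (0, 1).
values = [f_Y(y) for y in (0.1, 0.5, 0.9)]
print(values)
```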
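a) $f_Y(y) = \begin{cases} (12 - 3y)^2/27 & -6 < y < 3 \\ 0 & \text{otherwise} \end{cases}$
b) $f_Y(y) = \begin{cases} (12 - 3y)^2/27 & 3 < y < 6 \\ 0 & \text{otherwise} \end{cases}$
c) $f_Y(y) = \begin{cases} (12 - 3y)/27 & -6 < y < 3 \\ 0 & \text{otherwise} \end{cases}$
d) $f_Y(y) = \begin{cases} (12 - 3y)/27 & 3 < y < 6 \\ 0 & \text{otherwise} \end{cases}$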
Solution:
We know that in the range (-6, 3), $\frac{1}{3}(12 - x)$ is monotonic (a decreasing function).
Therefore, we can use the formula $f_Y(y) = \dfrac{1}{|g'(g^{-1}(y))|} f_X(g^{-1}(y))$
Given $Y = \frac{1}{3}(12 - X) = g(X)$ (let)
$\Rightarrow 3y = 12 - x \;\Rightarrow\; x = 12 - 3y = g^{-1}(y)$
Therefore $g^{-1}(y) = 12 - 3y$
$g(x) = \frac{1}{3}(12 - x) \Rightarrow g'(x) = -\frac{1}{3}$
And
$g'(g^{-1}(y)) = g'(12 - 3y) = -\frac{1}{3}$
$|g'(g^{-1}(y))| = \frac{1}{3}$
$f_X(g^{-1}(y)) = f_X(12 - 3y) = \dfrac{(12 - 3y)^2}{81}$
Therefore, $f_Y(y) = \dfrac{(12 - 3y)^2/81}{1/3}$
$\Rightarrow f_Y(y) = \dfrac{(12 - 3y)^2}{27}$
When x = −6, y = 6 and when x = 3, y = 3.
Therefore
$$f_Y(y) = \begin{cases} \dfrac{(12 - 3y)^2}{27} & 3 < y < 6 \\ 0 & \text{otherwise} \end{cases}$$
Solution:
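$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^1 x \times x^3(6x^2 + 5x - 4)\,dx = \int_0^1 (6x^6 + 5x^5 - 4x^4)\,dx$$
$$= \left[\frac{6x^7}{7}\right]_0^1 + \left[\frac{5x^6}{6}\right]_0^1 - \left[\frac{4x^5}{5}\right]_0^1 = \frac{6}{7} + \frac{5}{6} - \frac{4}{5} = \frac{187}{210}$$
Also, $\int_a^b x^n\,dx = \int_a^c x^n\,dx + \int_c^b x^n\,dx$ where $a < c < b$.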
Solution:
Var(Y ) = Var(6X + 5) = 36Var(X)
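And $\text{Var}(X) = E[X^2] - (E[X])^2$
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^2 x f_X(x)\,dx = \int_0^1 x f_X(x)\,dx + \int_1^2 x f_X(x)\,dx$$
$$= \int_0^1 x \cdot x\,dx + \int_1^2 x(2 - x)\,dx = \left[\frac{x^3}{3}\right]_0^1 + \left[\frac{2x^2}{2}\right]_1^2 - \left[\frac{x^3}{3}\right]_1^2$$
$$= \frac{1}{3} + (2^2 - 1^2) - \frac{(2^3 - 1^3)}{3} = \frac{1}{3} + 3 - \frac{7}{3} = 1$$
$$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \int_0^2 x^2 f_X(x)\,dx = \int_0^1 x^2 f_X(x)\,dx + \int_1^2 x^2 f_X(x)\,dx$$
$$= \int_0^1 x^2 \cdot x\,dx + \int_1^2 x^2(2 - x)\,dx = \left[\frac{x^4}{4}\right]_0^1 + \left[\frac{2x^3}{3}\right]_1^2 - \left[\frac{x^4}{4}\right]_1^2$$
$$= \frac{1}{4} + \frac{2}{3}(2^3 - 1^3) - \frac{1}{4}(2^4 - 1^4) = \frac{1}{4} + \frac{14}{3} - \frac{15}{4} = \frac{7}{6}$$
Therefore,
$\text{Var}(X) = \frac{7}{6} - 1 = \frac{1}{6}$
$\Rightarrow \text{Var}(Y) = 36 \times \frac{1}{6} = 6$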
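The two moments above can be reproduced exactly with rational arithmetic. This is a minimal sketch assuming the piecewise density used in the solution ($f_X(x) = x$ on [0, 1] and $2 - x$ on [1, 2]); the helper names are illustrative.

```python
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum(coeffs[k] * x**k for k in range(len(coeffs)))."""
    a, b = Fraction(a), Fraction(b)
    return sum(
        Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
        for k, c in enumerate(coeffs)
    )

# Each piece is (polynomial coefficients in increasing degree, lower limit, upper limit).
pieces = [([0, 1], 0, 1),    # f(x) = x      on [0, 1]
          ([2, -1], 1, 2)]   # f(x) = 2 - x  on [1, 2]

def moment(n: int) -> Fraction:
    """E[X^n]: multiply each piece by x^n (shift coefficients) and integrate."""
    return sum(integrate_poly([0] * n + c, a, b) for c, a, b in pieces)

mean = moment(1)
second_moment = moment(2)
var_x = second_moment - mean ** 2
var_y = 36 * var_x  # Var(6X + 5) = 36 Var(X)
print(mean, second_moment, var_x, var_y)
```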
Statistics for Data Science - 2
a) $0.6e^{-y} + 0.4e^{-3y}$
b) $0.4e^{-y} + 0.6e^{-3y}$
c) $0.6e^{-y} + 1.2e^{-3y}$
d) $0.4e^{-y} + 1.8e^{-3y}$
Solution:
Given that, X ∼ Bernoulli(0.6), therefore pX (1) = 0.6 and pX (0) = 0.4.
The marginal density of Y is given by
$$f_Y(y) = \sum_{x \in T_X} p_X(x) f_{Y|X=x}(y)$$
a) $\dfrac{2e^{-12}}{e^{-6} + 2e^{-12}}$
b) $\dfrac{e^{-6}}{e^{-6} + 2e^{-12}}$
c) $\dfrac{e^{-12}}{e^{-6} + e^{-12}}$
d) $\dfrac{e^{-6}}{e^{-6} + e^{-12}}$
Solution:
Given that X ∼ Uniform{1, 2}, therefore $p_X(1) = p_X(2) = \frac{1}{2}$.
The marginal density of Y is given by
$$f_Y(y) = \sum_{x \in T_X} p_X(x) f_{Y|X=x}(y)$$
And
$$f_{X|Y=3}(2) = \frac{p_X(2) f_{Y|X=2}(3)}{f_Y(3)} = \frac{\frac{1}{2} \times 4e^{-4 \times 3}}{e^{-2 \times 3} + 2e^{-4 \times 3}} = \frac{2e^{-12}}{e^{-6} + 2e^{-12}}$$
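The Bayes computation above can be sketched in a few lines. The conditional model $Y \mid X = x \sim \text{Exp}(2x)$ is an assumption inferred from the terms $4e^{-4 \times 3}$ and $e^{-2 \times 3} + 2e^{-4 \times 3}$ in the solution (the problem statement is not shown here), and the function names are illustrative.

```python
import math

def exp_pdf(y: float, rate: float) -> float:
    """Density of an exponential distribution with the given rate."""
    return rate * math.exp(-rate * y) if y >= 0 else 0.0

# X ~ Uniform{1, 2}; assumed conditional model Y | X = x ~ Exp(rate = 2x).
prior = {1: 0.5, 2: 0.5}

def posterior(x: int, y: float) -> float:
    """P(X = x | Y = y) = p_X(x) f_{Y|X=x}(y) / f_Y(y)."""
    evidence = sum(p * exp_pdf(y, 2 * k) for k, p in prior.items())
    return prior[x] * exp_pdf(y, 2 * x) / evidence

p = posterior(2, 3.0)
closed_form = 2 * math.exp(-12) / (math.exp(-6) + 2 * math.exp(-12))
print(p, closed_form)
```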
4. The joint density function of two continuous random variables X and Y is given as
$$f_{XY}(x, y) = \begin{cases} kxy & 0 < x < 4,\ 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$
Find the value of k. Enter your answer correct to two decimals accuracy.
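Solution:
We know that for a joint PDF, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$.
Since $f_{XY}(x, y)$ is nonzero only in the region 0 < x < 4, 0 < y < 1,
$$\Rightarrow \int_0^1\int_0^4 f_{XY}(x, y)\,dx\,dy = 1$$
$$\Rightarrow \int_0^1\int_0^4 kxy\,dx\,dy = 1$$
$$\Rightarrow \int_0^1 ky\left[\frac{x^2}{2}\right]_0^4 dy = 1$$
$$\Rightarrow \int_0^1 8ky\,dy = 1$$
$$\Rightarrow 8k\left[\frac{y^2}{2}\right]_0^1 = 1$$
$$\Rightarrow k = \frac{1}{4} = 0.25$$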
5. Let (X, Y ) ∼ Uniform(D), where D = {(x, y) : x + y < 4, x > 0, y > 0}. Find the value
of P (2X + Y > 2).
a) $\frac{1}{8}$
b) $\frac{7}{8}$
c) $\frac{3}{4}$
d) $\frac{1}{4}$
Solution:
Area of the lower shaded region (A) will be $\frac{1}{2} \times 1 \times 2 = 1$
Solution:
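$$P(X + Y < 1) = \int_0^1\int_0^{1-y} (x + y)\,dx\,dy = \int_0^1 \left[\frac{x^2}{2} + xy\right]_0^{1-y} dy = \int_0^1 \left(\frac{(1 - y)^2}{2} + (1 - y)y\right) dy$$
$$= \left[-\frac{(1 - y)^3}{6} + \frac{y^2}{2} - \frac{y^3}{3}\right]_0^1 = \left(\frac{1}{2} - \frac{1}{3}\right) - \left(-\frac{1}{6}\right) = \frac{1}{3}$$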
7. The joint PDF of two continuous random variables X and Y is given by
$$f_{XY}(x, y) = \begin{cases} \dfrac{2}{7}(5x + 2y) & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Find the marginal PDF of X.
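a) $f_X(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
b) $f_X(x) = \begin{cases} \frac{2}{7}(5x + 1) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
c) $f_X(x) = \begin{cases} \frac{2}{7}(3x + 2) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
d) $f_X(x) = \begin{cases} \frac{2}{7}(5y + 1) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$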
Solution:
For 0 ≤ x ≤ 1
$$f_X(x) = \int_0^1 \frac{2}{7}(5x + 2y)\,dy = \frac{2}{7}\left[5xy + \frac{2y^2}{2}\right]_0^1 = \frac{2}{7}(5x + 1)$$
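a) $f_Y(y) = \begin{cases} \frac{3}{2}y(2 - y) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$
b) $f_Y(y) = \begin{cases} 2y & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$
c) $f_Y(y) = \begin{cases} \frac{3}{2}(1 - y^2) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$
d) $f_Y(y) = \begin{cases} \frac{2}{3}(2 - y) & 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$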
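Solution:
We know that for a joint PDF, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$.
Since $f_{XY}(x, y)$ is nonzero only in the region 0 < x < 4, 0 < y < 1,
$$\Rightarrow \int_0^1\int_0^4 f_{XY}(x, y)\,dx\,dy = 1$$
$$\Rightarrow \int_0^1\int_0^4 k(2 - y)\,dx\,dy = 1$$
$$\Rightarrow \int_0^1 k(2 - y)\Big[x\Big]_0^4 dy = 1$$
$$\Rightarrow \int_0^1 4k(2 - y)\,dy = 1$$
$$\Rightarrow 4k\left[2y - \frac{y^2}{2}\right]_0^1 = 1$$
$$\Rightarrow 4k \times \frac{3}{2} = 1 \;\Rightarrow\; k = \frac{1}{6}$$
For 0 < y < 1,
$$f_Y(y) = \int_0^4 \frac{1}{6}(2 - y)\,dx = \frac{1}{6}(2 - y)\Big[x\Big]_0^4 = \frac{2}{3}(2 - y)$$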
9. Let X and Y be two independent continuous random variables with PDFs fX (x) and
fY (y) given as
$$f_X(x) = \begin{cases} 1 & 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$$
$$f_Y(y) = \begin{cases} y/2 & 0 \le y < 2 \\ 0 & \text{otherwise} \end{cases}$$
Find the value of P (2X + Y > 1).
a) $\frac{1}{24}$
b) $\frac{11}{12}$
c) $\frac{1}{12}$
d) $\frac{23}{24}$
Solution:
Given that X and Y be two independent continuous random variables,
therefore fXY (x, y) = fX (x)fY (y).
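$$f_{XY}(x, y) = \begin{cases} y/2 & 0 \le x < 1,\ 0 \le y < 2 \\ 0 & \text{otherwise} \end{cases}$$
We have to find the value of P(2X + Y > 1).
And
$P(2X + Y > 1) = 1 - P(2X + Y \le 1)$
$$P(2X + Y \le 1) = \int_0^1\int_0^{\frac{1-y}{2}} \frac{y}{2}\,dx\,dy = \int_0^1 \frac{y}{2}\Big[x\Big]_0^{\frac{1-y}{2}} dy = \int_0^1 \frac{1}{4}y(1 - y)\,dy = \frac{1}{4}\left[\frac{y^2}{2} - \frac{y^3}{3}\right]_0^1 = \frac{1}{24}$$
$\Rightarrow P(2X + Y > 1) = 1 - \frac{1}{24} = \frac{23}{24}$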
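A crude numerical integration gives an independent check of the value 23/24. This is a sketch using a midpoint rule on a regular grid; the grid size and function names are arbitrary illustrative choices, and the result is only accurate to a few decimal places.

```python
def joint_pdf(x: float, y: float) -> float:
    """f_XY(x, y) = f_X(x) f_Y(y) = y/2 on [0, 1) x [0, 2), zero elsewhere."""
    return y / 2 if (0 <= x < 1 and 0 <= y < 2) else 0.0

def prob_2x_plus_y_gt_1(n: int = 400) -> float:
    """Midpoint-rule estimate of P(2X + Y > 1) over an n-by-n grid."""
    hx, hy = 1 / n, 2 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        for j in range(n):
            y = (j + 0.5) * hy
            if 2 * x + y > 1:
                total += joint_pdf(x, y) * hx * hy
    return total

estimate = prob_2x_plus_y_gt_1()
print(round(estimate, 3), round(23 / 24, 3))
```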
10. The joint density function of two random variables X and Y is given by
$$f_{XY}(x, y) = \begin{cases} 8xy & 0 \le x \le 1,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$
a) Yes
b) No
Solution:
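$$f_X(x) = \int_0^x 8xy\,dy = 8x\left[\frac{y^2}{2}\right]_0^x = 4x^3$$
For a fixed y, the density is nonzero only for $y \le x \le 1$, so
$$f_Y(y) = \int_y^1 8xy\,dx = 8y\left[\frac{x^2}{2}\right]_y^1 = 4y(1 - y^2)$$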
11. Let (X, Y ) ∼ Uniform(D), where D = [3, 5] × [2, 4]. Are X and Y independent?
a) Yes
b) No
Solution:
(X, Y ) ∼ Uniform(D), therefore
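$$f_{XY}(x, y) = \begin{cases} \dfrac{1}{4} & 3 \le x \le 5,\ 2 \le y \le 4 \\ 0 & \text{otherwise} \end{cases}$$
$$f_X(x) = \int_2^4 \frac{1}{4}\,dy = \frac{1}{4}\Big[y\Big]_2^4 = \frac{1}{2}$$
$$f_Y(y) = \int_3^5 \frac{1}{4}\,dx = \frac{1}{4}\Big[x\Big]_3^5 = \frac{1}{2}$$
$f_X(x) f_Y(y) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4} = f_{XY}(x, y)$.
Hence X and Y are independent.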
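a) $f_{X|Y=0.5}(x) = \begin{cases} 2x & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$
b) $f_{X|Y=0.5}(x) = \begin{cases} 3x^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$
c) $f_{X|Y=0.5}(x) = \begin{cases} 4x^3 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$
d) $f_{X|Y=0.5}(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$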
Solution:
For 0 < y < 1
$$f_Y(y) = \int_0^1 4xy\,dx = 4y\left[\frac{x^2}{2}\right]_0^1 = 2y$$
The distribution of X | Y = 0.5, (0 < x < 1) is given by
Solution:
For 0 < y < 1
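$$f_Y(y) = \int_0^1 \left(x^2 + \frac{xy}{3}\right) dx = \left[\frac{x^3}{3} + \frac{x^2 y}{6}\right]_0^1 = \frac{1}{3} + \frac{1}{6}y$$
$$f_{X|Y=1}(x) = \frac{f_{XY}(x, 1)}{f_Y(1)} = \frac{x^2 + \frac{x \times 1}{3}}{\frac{1}{3} + \frac{1}{6} \times 1} = 2\left(x^2 + \frac{x}{3}\right)$$
$$P\left(\tfrac{1}{4} < X < \tfrac{1}{2} \,\middle|\, Y = 1\right) = \int_{1/4}^{1/2} 2\left(x^2 + \frac{x}{3}\right) dx = 2\left[\frac{x^3}{3} + \frac{x^2}{6}\right]_{1/4}^{1/2}$$
$$= 2\left[\left(\frac{1}{24} + \frac{1}{24}\right) - \left(\frac{1}{192} + \frac{1}{96}\right)\right] = 2\left(\frac{1}{12} - \frac{1}{64}\right) = \frac{13}{96}$$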
Statistics for Data Science - 2
Week 6 graded Assignment
Solution
1. A person randomly chooses a battery from a store which has 40 batteries of type A and
60 batteries of type B. Battery life of type A and type B batteries are exponentially
distributed with average life of 4 years and 6 years, respectively. If the chosen battery
lasts for 5 years, what is the probability that the battery is of type A?
(a) $\dfrac{1}{1 + e^{5/12}}$
(b) $\dfrac{1}{1 + e^{-5/12}}$
(c) $\dfrac{e^{-4/5}}{1 + e^{-6/5}}$
(d) $\dfrac{e^{-6/5}}{1 + e^{-4/5}}$
Solution:
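Define a random variable X as follows:
$$X = \begin{cases} 1 & \text{if the chosen battery is of type A} \\ 0 & \text{if the chosen battery is of type B} \end{cases}$$
Let Y denote the life (in years) of the chosen battery. Since the average lives are 4 and 6 years,
$Y|X = 1 \sim \text{Exp}(\frac{1}{4})$ and $Y|X = 0 \sim \text{Exp}(\frac{1}{6})$
It implies that
$f_{Y|X=1}(y) = \frac{1}{4}e^{-y/4};\ y > 0$ and
$f_{Y|X=0}(y) = \frac{1}{6}e^{-y/6};\ y > 0$
$P(X = 1) = \dfrac{40}{100} = \dfrac{2}{5}$ and
$P(X = 0) = \dfrac{60}{100} = \dfrac{3}{5}$
$$f_{X|Y=5}(1) = \frac{f_{Y|X=1}(5) \cdot P(X = 1)}{f_Y(5)} = \frac{f_{Y|X=1}(5) \cdot P(X = 1)}{f_{Y|X=1}(5) \cdot P(X = 1) + f_{Y|X=0}(5) \cdot P(X = 0)}$$
$$= \frac{\frac{1}{4}e^{-5/4} \cdot \frac{2}{5}}{\frac{1}{4}e^{-5/4} \cdot \frac{2}{5} + \frac{1}{6}e^{-5/6} \cdot \frac{3}{5}} = \frac{\frac{1}{10}e^{-5/4}}{\frac{1}{10}e^{-5/4} + \frac{1}{10}e^{-5/6}} = \frac{e^{-5/4}}{e^{-5/4} + e^{-5/6}} = \frac{1}{1 + e^{5/12}}$$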
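The posterior above can be evaluated numerically as a check on the algebra. This is a minimal sketch; the variable and function names are illustrative.

```python
import math

def exp_pdf(t: float, mean: float) -> float:
    """Density at t of an exponential distribution with the given mean."""
    return math.exp(-t / mean) / mean

# Prior probabilities from the stock: 40 type A and 60 type B batteries.
prior_a, prior_b = 40 / 100, 60 / 100

# Likelihood of a 5-year life under each type (mean lives 4 and 6 years).
like_a = exp_pdf(5, 4)
like_b = exp_pdf(5, 6)

posterior_a = prior_a * like_a / (prior_a * like_a + prior_b * like_b)
closed_form = 1 / (1 + math.exp(5 / 12))
print(round(posterior_a, 4), round(closed_form, 4))
```

The two printed values should agree, which corresponds to option (a).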
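(a) $\dfrac{3\exp(\frac{1}{8})}{3\exp(\frac{1}{8}) + 6 + 2\exp(\frac{2}{9})}$
(b) $\dfrac{3\exp(\frac{-1}{8})}{3\exp(\frac{-1}{8}) + 6 + 2\exp(\frac{-2}{9})}$
(c) $\dfrac{2\exp(\frac{-2}{9})}{3\exp(\frac{-1}{8}) + 6 + 2\exp(\frac{-2}{9})}$
(d) $\dfrac{6}{3\exp(\frac{-1}{32}) + 6 + 2\exp(\frac{-1}{18})}$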
Solution:
Given that X ∼ Uniform{1, 2, 3} and Z ∼ Normal(1, 4) are independent.
Y = XZ + X
It implies that
Y |X = 1 = Z + 1 ∼ Normal(2, 4)
Y |X = 2 = 2Z + 2 ∼ Normal(4, 16)
Y |X = 3 = 3Z + 3 ∼ Normal(6, 36)
Therefore,
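$f_{Y|X=1}(y) = \dfrac{1}{2\sqrt{2\pi}} \exp\left(\dfrac{-(y - 2)^2}{8}\right)$
$f_{Y|X=2}(y) = \dfrac{1}{4\sqrt{2\pi}} \exp\left(\dfrac{-(y - 4)^2}{32}\right)$
$f_{Y|X=3}(y) = \dfrac{1}{6\sqrt{2\pi}} \exp\left(\dfrac{-(y - 6)^2}{72}\right)$
$$f_{X|Y=2}(2) = \frac{\frac{1}{4\sqrt{2\pi}} \exp\left(\frac{-(2 - 4)^2}{32}\right) \cdot \frac{1}{3}}{\frac{1}{4\sqrt{2\pi}} \exp\left(\frac{-(2 - 4)^2}{32}\right) \cdot \frac{1}{3} + \frac{1}{2\sqrt{2\pi}} \exp\left(\frac{-(2 - 2)^2}{8}\right) \cdot \frac{1}{3} + \frac{1}{6\sqrt{2\pi}} \exp\left(\frac{-(2 - 6)^2}{72}\right) \cdot \frac{1}{3}}$$
$$= \frac{\frac{1}{4}\exp\left(\frac{-1}{8}\right)}{\frac{1}{4}\exp\left(\frac{-1}{8}\right) + \frac{1}{2}\exp(0) + \frac{1}{6}\exp\left(\frac{-2}{9}\right)} = \frac{3\exp(\frac{-1}{8})}{3\exp(\frac{-1}{8}) + 6 + 2\exp(\frac{-2}{9})}$$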
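The same posterior can be computed directly from the conditional normal densities. The sketch below assumes, as in the solution, that $Y = XZ + X$ with X uniform on {1, 2, 3} and Z ∼ Normal(1, 4), and that the quantity of interest is the posterior of X = 2 given Y = 2; the names are illustrative.

```python
import math
from statistics import NormalDist

Z_MEAN, Z_SD = 1.0, 2.0  # Z ~ Normal(1, 4), i.e. standard deviation 2

def posterior(x: int, y: float, support=(1, 2, 3)) -> float:
    """P(X = x | Y = y) when Y | X = k is Normal(k*Z_MEAN + k, (k*Z_SD)^2)."""
    weights = {
        k: NormalDist(mu=k * Z_MEAN + k, sigma=k * Z_SD).pdf(y) / len(support)
        for k in support
    }
    return weights[x] / sum(weights.values())

p = posterior(2, 2.0)
closed_form = 3 * math.exp(-1 / 8) / (
    3 * math.exp(-1 / 8) + 6 + 2 * math.exp(-2 / 9)
)
print(round(p, 4), round(closed_form, 4))
```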
1. Yes
2. No
Solution:
First we will calculate the marginal densities of X and Y .
For $0 \le x \le 1$,
$$f_X(x) = \int_0^1 f_{XY}(x, y)\,dy = \int_0^1 4xy\,dy = \Big[2xy^2\Big]_0^1 = 2x$$
For $0 \le y \le 1$,
$$f_Y(y) = \int_0^1 f_{XY}(x, y)\,dx = \int_0^1 4xy\,dx = \Big[2x^2 y\Big]_0^1 = 2y$$
Therefore,
$f_X(x) \cdot f_Y(y) = 4xy = f_{XY}(x, y)$
It implies that X and Y are independent random variables.
[Figure: the circular support region cut by the line y = x.]
The region X ≥ Y will be the lower half part of the circle.
Therefore,
$$P(X \ge Y) = \frac{\text{Area of lower half circle}}{\text{Area of the circle}} = \frac{\pi(1)^2/2}{\pi(1)^2} = \frac{1}{2}$$
5. Let (X, Y ) ∼ Uniform(D), where D = {(x, y) : y ≤ 2x, 0 < x < 1, 0 < y < 2} ∪ [1, 2] ×
[0, 2]. Find the marginal density of X.
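(a) $f_X(x) = \begin{cases} \dfrac{2x}{3} + \dfrac{2}{3} & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$
(b) $f_X(x) = \begin{cases} \dfrac{2x}{3} + \dfrac{1}{3} & 0 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$
(c) $f_X(x) = \begin{cases} \dfrac{2x}{3} & 0 \le x \le 1 \\ \dfrac{2}{3} & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$
(d) $f_X(x) = \begin{cases} \dfrac{2x}{3} & 0 \le x \le 1 \\ \dfrac{1}{3} & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$
[Figure: the region D below the line y = 2x, with x from 0 to 2 and y from 0 to 2.]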
6. The joint pdf of two random variables X and Y is given by
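$$f_{XY}(x, y) = \begin{cases} 24xy & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
[Figure: the line x + y = 1, with axis ticks at 0.25 and 1.]
$$P\left(X + Y \le \tfrac{1}{4}\right) = \int_{y=0}^{1/4}\int_{x=0}^{1/4-y} 24xy\,dx\,dy = \int_{y=0}^{1/4} \Big[12x^2 y\Big]_{x=0}^{1/4-y} dy = \int_{y=0}^{1/4} 12y\left(\frac{1}{4} - y\right)^2 dy$$
$$= \int_{y=0}^{1/4} \frac{12}{16}\,y(1 - 4y)^2\,dy = \frac{3}{4}\int_{y=0}^{1/4} y(1 + 16y^2 - 8y)\,dy = \frac{3}{4}\left[\frac{y^2}{2} + 4y^4 - \frac{8y^3}{3}\right]_{y=0}^{1/4}$$
$$= \frac{3}{4}\left(\frac{1}{32} + \frac{1}{64} - \frac{1}{24}\right) = \frac{3}{4} \cdot \frac{1}{192} = \frac{1}{256}$$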
Option (b)
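[Figure: the lines x + y = 0.5 and x + y = 1, with axis ticks at 0.5 and 1.]
The orange region denotes $X + Y \le \frac{1}{2}$. Now,
$$P\left(X + Y \le \tfrac{1}{2}\right) = \int_{y=0}^{1/2}\int_{x=0}^{1/2-y} f_{XY}(x, y)\,dx\,dy = \int_{y=0}^{1/2}\int_{x=0}^{1/2-y} 24xy\,dx\,dy = \int_{y=0}^{1/2} \Big[12x^2 y\Big]_{x=0}^{1/2-y} dy$$
$$= \int_{y=0}^{1/2} 12y\left(\frac{1}{2} - y\right)^2 dy = \int_{y=0}^{1/2} \frac{12}{4}\,y(1 - 2y)^2\,dy = 3\int_{y=0}^{1/2} y(1 + 4y^2 - 4y)\,dy$$
$$= 3\left[\frac{y^2}{2} + y^4 - \frac{4y^3}{3}\right]_{y=0}^{1/2} = 3\left(\frac{1}{8} + \frac{1}{16} - \frac{1}{6}\right) = 3 \times \frac{2}{96} = \frac{1}{16}$$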
7. The joint pdf of two random variables X and Y is given by
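$$f_{XY}(x, y) = \begin{cases} 3xy(1 - x) & 0 \le x \le 1,\ 0 \le y \le 2 \\ 0 & \text{otherwise} \end{cases}$$
Here $f_Y(1) = \int_0^1 3x(1 - x)\,dx = \frac{1}{2}$, so
$$P\left(X > \tfrac{1}{2} \,\middle|\, Y = 1\right) = \frac{\int_{1/2}^{1} f_{XY}(x, 1)\,dx}{f_Y(1)} = 2\int_{1/2}^{1} f_{XY}(x, 1)\,dx = \int_{x=1/2}^{1} 2(3x(1 - x))\,dx = 6\int_{1/2}^{1} (x - x^2)\,dx$$
$$= 6\left[\frac{x^2}{2} - \frac{x^3}{3}\right]_{1/2}^{1} = 6\left(\frac{1}{2} - \frac{1}{3}\right) - 6\left(\frac{1}{8} - \frac{1}{24}\right) = 1 - \frac{1}{2} = \frac{1}{2}$$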
8. The amount of milk (in litres) in a shop at the beginning of any day is a random amount X from which a random amount Y (in litres) is sold during that day. Assume that the joint density function of X and Y is given by
$$f_{XY}(x, y) = \begin{cases} \dfrac{1}{50} & 0 \le x \le 10,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$
Find the probability that amount of milk left at the end of day is less than 5 litres. Write
your answer correct to two decimal points.
Solution:
[Figure: the lines y = x and x − y = 5 on axes running from 0 to 10, with ticks at 5 and 10.]
X denotes the amount of milk at the beginning of any day and Y denotes the amount
of milk which is sold during that day.
Therefore, amount of milk left at the end of the day will be denoted by X − Y .
To find: P (X − Y < 5)
In the diagram above, brown region denotes X −Y < 5 and brown + blue region denotes
the support of X and Y .
Area of the support(X, Y) = $\frac{1}{2} \times 10 \times 10 = 50$.
Therefore,
$$P(X - Y < 5) = \frac{\text{area of brown region}}{\text{area of support}} = \frac{75/2}{50} = \frac{75}{100}$$
9. The joint pdf of two continuous random variables X and Y is given by
$$f_{XY}(x, y) = \begin{cases} ke^{-(x+y)} & x \ge 0,\ y \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
(a) $e^{-10}$
(b) $(e^{-5} - 1)e^{-5}$
(c) $(1 - e^{-5})e^{-5}$
(d) $(e^{-5} + 1)e^{-5}$
Solution:
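We know that
$$\iint_{\text{Supp}(X,Y)} f_{XY}\,dx\,dy = 1$$
Therefore,
$$\int_{y=0}^{\infty}\int_{x=0}^{\infty} (ke^{-(x+y)})\,dx\,dy = 1$$
$$\Rightarrow k\int_{y=0}^{\infty} e^{-y}\int_{x=0}^{\infty} e^{-x}\,dx\,dy = 1$$
$$\Rightarrow k\int_{y=0}^{\infty} e^{-y}\Big[-e^{-x}\Big]_0^{\infty} dy = 1$$
$$\Rightarrow k\int_{y=0}^{\infty} e^{-y}(0 + 1)\,dy = 1$$
$$\Rightarrow k\int_{y=0}^{\infty} e^{-y}\,dy = 1$$
$$\Rightarrow k\Big[-e^{-y}\Big]_0^{\infty} = 1$$
$$\Rightarrow k(0 + 1) = 1 \;\Rightarrow\; k = 1$$
To find: $P(X \ge 5, Y \le 5)$
Now,
$$P(X \ge 5, Y \le 5) = \int_{y=0}^{5}\int_{x=5}^{\infty} (e^{-(x+y)})\,dx\,dy = \int_{y=0}^{5} e^{-y}\int_{x=5}^{\infty} e^{-x}\,dx\,dy = \int_{y=0}^{5} e^{-y}\Big[-e^{-x}\Big]_5^{\infty} dy$$
$$= \int_{y=0}^{5} e^{-y}(0 + e^{-5})\,dy = (e^{-5})\int_{y=0}^{5} e^{-y}\,dy = (e^{-5})\Big[-e^{-y}\Big]_0^5 = (e^{-5})(-e^{-5} + 1) = (e^{-5})(1 - e^{-5})$$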
Therefore, $f_X\left(\frac{1}{2}\right) = \frac{3}{8}$
Now,
1 1 fXY (X = 12 , 12 ≤ Y ≤ 1)
P ( ≤ Y ≤ 1|X = ) =
2 2 fX ( 12 )
Z 1
8 1 1
= + y dy
1/2 3 8 2
Z 1
1 1
= + y dy
1/2 3 2
1
y y2
= +
6 6
1/2
1 1 1 1
= + − +
6 6 12 24
1 1 5
= − = = 0.20
3 8 24
Statistics for Data Science - 2
Week 7 practice Assignment
Statistics from samples and Limit theorems
1. If X, Y ∼ i.i.d. Normal(0, 4), what will be the variance of $\frac{X}{Y}$?
(a) 4
(b) 2
(c) 1
(d) Undefined
Solution:
We know that if X, Y ∼ i.i.d. Normal(0, $\sigma^2$), then $\frac{X}{Y}$ ∼ Cauchy(0, 1), and the variance of the Cauchy distribution is undefined.
Therefore, option(d) is correct.
2. A population has mean 60 and standard deviation 6. Random samples of size 100 from
this population are collected independently. Find the expected value of the sample mean.
Solution:
We know that the expected value of the sample mean $\bar{X}$ is given by
$E[\bar{X}] = \mu = 60$
1. FZ (0.3)
2. 1 − FZ (0.3)
3. FZ (−0.3)
4. 1 − FZ (−0.3)
Solution:
Now,
$$P(Y \ge 10) = P(Y - 16 \ge -6) = P\left(\frac{Y - 16}{20} \ge \frac{-6}{20}\right) = P\left(\frac{Y - 16}{20} \ge -0.3\right)$$
$$= P(Z \ge -0.3) = 1 - P(Z < -0.3) = 1 - F_Z(-0.3)$$
4. Random samples of size 100 are collected from a population of unknown parameters. If
the variance of the sample mean is 36, what will be the standard deviation of the actual
population?
Solution:
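We know that the variance of the sample mean is given by $\dfrac{\sigma^2}{n}$, where σ is the standard deviation of the actual population and n is the sample size.
$$\frac{\sigma^2}{n} = 36 \;\Rightarrow\; \frac{\sigma^2}{100} = 36 \;\Rightarrow\; \sigma^2 = 3600 \;\Rightarrow\; \sigma = 60$$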
Solution:
Given: standard deviation of the population, σ = 5
Sample size, n = 50
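To find: upper bound on $P(|\bar{X} - \mu| \ge 10)$, where $\bar{X}$ and µ are the sample mean and population mean, respectively.
$$P(|\bar{X} - \mu| \ge \delta) \le \frac{\sigma^2}{n\delta^2}$$
$$\Rightarrow P(|\bar{X} - \mu| \ge 10) \le \frac{25}{100 \times 50}$$
$$\Rightarrow P(|\bar{X} - \mu| \ge 10) \le 0.005$$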
6. A study shows that the average daily sleeping hours of teenagers is ten hours with a
standard deviation of two hours. If a sample of 100 teenagers is collected, what will be
the probability that the mean of the sleeping hours of these 100 teenagers is at least 0.4
hours away from the population mean? Assume that each observation in the sample is
independent. Assume that FZ denotes the CDF of standard normal distribution.
Solution:
let X denote the average daily sleeping hours of teenagers.
Given: standard deviation of X, σ = 2
Sample size, n = 100
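To find: $P(|\bar{X} - \mu| \ge 0.4)$, where $\bar{X}$ and µ are the sample mean and population mean, respectively.
Now, writing S for the sum of the n observations,
$$P(|\bar{X} - \mu| \ge 0.4) = P\left(\left|\frac{S}{n} - \mu\right| \ge 0.4\right) = P\left(\left|\frac{S - n\mu}{n}\right| \ge 0.4\right) = P\left(\left|\frac{S - n\mu}{\sigma\sqrt{n}}\right| \ge \frac{0.4\sqrt{n}}{\sigma}\right)$$
$$= P(|Z| \ge 2) = P(Z \ge 2) + P(Z \le -2) = 1 - P(Z \le 2) + P(Z \le -2) = 1 - F_Z(2) + F_Z(-2)$$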
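The final expression can be evaluated numerically under the normal approximation. This is a small sketch using the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma, n, delta = 2.0, 100, 0.4

# Standardised threshold: delta * sqrt(n) / sigma.
z = delta * math.sqrt(n) / sigma

std_normal = NormalDist()  # mean 0, standard deviation 1
prob = 1 - std_normal.cdf(z) + std_normal.cdf(-z)
print(z, round(prob, 4))  # z is 2; the probability is roughly 0.0455
```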
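The moment generating function of Normal(0, $\sigma^2$) is given by $e^{\lambda^2\sigma^2/2}$.
Let N ∼ Normal(0, $2^2$)
$$M_N(\lambda) = e^{\lambda^2 2^2/2} = 1 + \frac{\lambda^2 2^2}{2} + \frac{\lambda^4 2^4}{2!\,(4)} + \dots = 1 + \frac{\lambda^2 2^2}{2} + 48\,\frac{\lambda^4}{4!} + \dots$$
Therefore, the 4th moment of Normal(0, $2^2$) = coefficient of $\dfrac{\lambda^4}{4!}$ = 48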
places.
Solution:
We know that if X ∼ Gamma(α, k) and Y ∼ Gamma(β, k) are two independent random variables, then $\dfrac{X}{X + Y}$ ∼ Beta(α, β).
X +Y
9. A study says that the delivery time of pizzas has a standard deviation of 10 minutes. A pizza shop collected the data of some deliveries and their delivery time. The probability that the mean delivery time of this sample is at least $\sqrt{5}$ minutes away from the actual mean delivery time is at most $\frac{1}{5}$ as per the weak law of large numbers. What is the size of the sample?
Solution:
Let X denote the delivery time of pizzas.
Given that σ = 10
To find: size of the sample such that $P(|\bar{X} - \mu| \ge \sqrt{5}) \le \frac{1}{5}$ ...(1).
By the weak law of large numbers, we have
$$P(|\bar{X} - \mu| \ge \delta) \le \frac{\sigma^2}{n\delta^2}$$
$$\Rightarrow P(|\bar{X} - \mu| \ge \sqrt{5}) \le \frac{100}{n \times 5} \quad ...(1)$$
10. A company sells eggs whose weights are normally distributed with a mean of 70g and a
standard deviation of 2g. Suppose that these eggs are sold in packages that each contain
four eggs. Assume that the weight of each egg is independent. What is the probability
that the mean weight of the four eggs in a package is greater than 68.5g? Write your
answer correct to two decimal places.
(Hint: Use the fact that linear combination of normal distributions is again a normal
distribution. FZ (−1.5) = 0.066)
Solution:
Let X denote the weight of an egg.
Given that E[X] = µ = 70
SD(X) = σ = 2
X ∼ Normal(70, $2^2$)
Let $X_1, X_2, X_3$ and $X_4$ denote the weights of four eggs in a package.
Suppose that
$$\bar{X} = \frac{X_1 + X_2 + X_3 + X_4}{4}$$
$E[\bar{X}] = \mu = 70$ and
$\text{Var}(\bar{X}) = \dfrac{\sigma^2}{n} = \dfrac{4}{4} = 1$
Now,
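The probability asked for in this problem follows from the distribution of the sample mean derived above. The sketch below evaluates it with the standard library, assuming the mean of four independent Normal(70, 2²) weights is Normal(70, 1); the names are illustrative.

```python
from statistics import NormalDist

mu, sigma, n = 70.0, 2.0, 4

# The mean of n independent Normal(mu, sigma^2) weights is Normal(mu, sigma^2 / n).
sample_mean = NormalDist(mu=mu, sigma=sigma / n ** 0.5)

# P(sample mean > 68.5) = 1 - F_Z(-1.5).
prob = 1 - sample_mean.cdf(68.5)
print(round(prob, 2))  # 0.93
```

This agrees with the hint, since 1 − 0.066 is approximately 0.93.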
11. Let $X_1, X_2, X_3, \dots, X_n$ be i.i.d. Poisson(4). What should be the value of n such that $P(3.8 \le \bar{X} \le 4.2) \ge 0.95$? [2 marks]
(Hint: Use FZ (1.96) = 0.975)
1. at least 200
2. at least 385
3. at least 450
4. at least 585
Solution:
Given that $X_1, X_2, X_3, \dots, X_n$ ∼ i.i.d. Poisson(4)
Mean of the distribution = µ = 4
Variance of the distribution = $\sigma^2$ = 4
Let $S = X_1 + X_2 + \dots + X_n$ and
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$
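Under the CLT approximation, the requirement $P(|\bar{X} - \mu| \le 0.2) \ge 0.95$ becomes $0.2\sqrt{n}/\sigma \ge z$, where z is the 0.975 quantile of the standard normal. The sketch below solves this for n; the function name is an illustrative assumption.

```python
import math
from statistics import NormalDist

def min_sample_size(variance: float, half_width: float, confidence: float) -> int:
    """Smallest n with P(|sample mean - mu| <= half_width) >= confidence (CLT approximation)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    return math.ceil(variance * (z / half_width) ** 2)

n = min_sample_size(variance=4.0, half_width=0.2, confidence=0.95)
print(n)  # 385
```

Using the rounded value z = 1.96 from the hint gives n ≥ 384.16, which leads to the same answer of at least 385.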
1.
x           −4     −2     0      2      4
P(X = x)    1/8    1/6    1/6    1/8    5/12
2.
x           −4     −2     0      2      4
P(X = x)    5/12   1/8    1/6    1/6    1/8
3.
x           −4     −2     0      2      4
P(X = x)    1/8    1/6    5/12   1/6    1/8
4.
x           −4     −2     0      2      4
P(X = x)    1/8    1/6    5/12   1/8    1/6
Solution:
The MGF of a discrete random variable X with the PMF $f_X(x) = P(X = x)$, $x \in T_X$, is given by
$$M_X(\lambda) = E[e^{\lambda X}] = \sum_{x \in T_X} P(X = x) \cdot e^{\lambda x}$$
x           −4     −2     0      2      4
P(X = x)    1/8    1/6    5/12   1/6    1/8
13. A fair die is rolled 3600 times. Use CLT to compute the probability that six appears at
most 630 times. Enter the answer correct to two decimal places.
(Hint: Use FZ (1.341) = 0.91)
Solution:
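Define a random variable X such that
$$X = \begin{cases} 1 & \text{if six appears on rolling a fair die} \\ 0 & \text{otherwise} \end{cases}$$
Therefore, $E[X] = \mu = \dfrac{1}{6}$ and
$\text{Var}(X) = \sigma^2 = \dfrac{1}{6} \cdot \dfrac{5}{6} = \dfrac{5}{36}$
Let $S = X_1 + X_2 + \dots + X_{3600}$
To find: $P(S \le 630)$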
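The CLT step for this problem can be sketched as follows. It standardises S using mean nµ and variance nσ², without a continuity correction (an assumption chosen because it reproduces the hinted value 1.341); the names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 3600, 1 / 6

mean_s = n * p                      # E[S] = 600
sd_s = math.sqrt(n * p * (1 - p))   # sqrt(500), about 22.36

z = (630 - mean_s) / sd_s           # standardised value of 630
prob = NormalDist().cdf(z)          # P(S <= 630) under the normal approximation
print(round(z, 3), round(prob, 2))
```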
14. A fair die is rolled 1000 times. Let X denote the number of times six is obtained. Find a bound for the probability that $\dfrac{X}{1000}$ differs from $\dfrac{1}{6}$ by more than 0.2 using the weak law of large numbers.
1. at least $\dfrac{5}{1440}$
2. at least $\dfrac{1436}{1440}$
3. at most $\dfrac{5}{1440}$
4. at most $\dfrac{1436}{1440}$
Solution:
X denotes the number of times six is obtained on rolling the die 1000 times.
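Let $X_1, X_2, \dots, X_{1000}$ be 1000 i.i.d. samples such that
$$X_i = \begin{cases} 1 & \text{if six appears on rolling a fair die} \\ 0 & \text{otherwise} \end{cases}$$
$E[X_i] = \mu = \dfrac{1}{6}$ and
$\text{Var}(X_i) = \sigma^2 = \dfrac{5}{36}$
Notice that $X = X_1 + X_2 + X_3 + \dots + X_{1000}$
To find: Bound on $P\left(\left|\dfrac{X}{1000} - \dfrac{1}{6}\right| > 0.2\right)$.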
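The weak-law (Chebyshev) bound $\sigma^2 / (n\delta^2)$ used in these problems can be evaluated exactly with rational arithmetic. This is a minimal sketch; the function name `chebyshev_bound` is an illustrative choice.

```python
from fractions import Fraction

def chebyshev_bound(variance, n: int, delta) -> Fraction:
    """Upper bound sigma^2 / (n * delta^2) on P(|sample mean - mu| >= delta)."""
    return Fraction(variance) / (n * Fraction(delta) ** 2)

# Indicator of rolling a six: variance 5/36, n = 1000 rolls, delta = 0.2 = 1/5.
bound = chebyshev_bound(Fraction(5, 36), 1000, Fraction(1, 5))
print(bound)  # 1/288, which equals 5/1440
```

So the probability is at most 5/1440, matching the "at most" form of the options.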
15. Consider the following PDF curves and match them with the correct distribution. [1 mark]
[Figure: four PDF curves, labelled Graph 1, Graph 2, Graph 3 and Graph 4.]
(a) Graph 1 → Gamma, Graph 2 → Normal, Graph 3 → Gamma, Graph 4 → Beta.
(b) Graph 1 → Beta, Graph 2 → Gamma, Graph 3 → Normal, Graph 4 → Gamma.
(c) Graph 1 → Beta, Graph 2 → Normal, Graph 3 → Normal, Graph 4 → Gamma.
(d) Graph 1 → Gamma, Graph 2 → Normal, Graph 3 → Normal, Graph 4 → Beta.
Solution:
Graph 1: The range of the distribution is [0, 1] and the shape of the graph resembles the Beta distribution.
Graph 2: The PDF curve is not symmetric about the mean and the shape of the graph resembles the Gamma distribution.
Graph 3: The PDF curve is symmetric about the mean and the shape of the graph resembles the Normal distribution.
Graph 4: The PDF curve is not symmetric about the mean and the shape of the graph resembles the Gamma distribution.
Therefore, Graph 1 → Beta, Graph 2 → Gamma, Graph 3 → Normal, Graph 4 → Gamma.
16. Let $X_1, X_2$ and $X_3$ ∼ i.i.d. X where X has the following probability mass function:
x           −1     2
f_X(x)      2/3    1/3
(a)
y           −3     0      3      6
P(Y = y)    1/6    1/6    1/3    1/3
(b)
y           −3     0      3      6
P(Y = y)    8/27   4/9    2/9    1/27
(c)
y           −3     0      3      6
P(Y = y)    8/27   1/27   4/9    2/9
(d)
y           −3     0      3      6
P(Y = y)    2/9    8/27   1/27   4/9
Solution:
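The PMF of X is given by
x           −1     2
f_X(x)      2/3    1/3
$M_Y(\lambda) = E[e^{\lambda Y}]$
$= E[e^{\lambda(X_1 + X_2 + X_3)}]$
$= E[e^{\lambda X_1} e^{\lambda X_2} e^{\lambda X_3}]$
$= E[e^{\lambda X_1}]E[e^{\lambda X_2}]E[e^{\lambda X_3}]$ (since $X_1, X_2$ and $X_3$ are independent)
$= E[e^{\lambda X}]E[e^{\lambda X}]E[e^{\lambda X}]$ (since $X_1, X_2$ and $X_3$ ∼ i.i.d. X)
$= [M_X(\lambda)]^3$ ...(1)
Now,
$M_X(\lambda) = E[e^{\lambda X}]$
$= e^{-1\lambda} \cdot P(X = -1) + e^{2\lambda} \cdot P(X = 2)$
$= \dfrac{2e^{-\lambda}}{3} + \dfrac{e^{2\lambda}}{3}$ ...(2)
From equations (1) and (2), we have
$$M_Y(\lambda) = \left(\frac{2e^{-\lambda}}{3} + \frac{e^{2\lambda}}{3}\right)^3 = \frac{1}{27}(2e^{-\lambda} + e^{2\lambda})^3$$
$$= \frac{1}{27}(8e^{-3\lambda} + e^{6\lambda} + 12e^{-2\lambda}e^{2\lambda} + 6e^{-\lambda}e^{4\lambda}) \quad (\text{since } (a + b)^3 = a^3 + b^3 + 3a^2 b + 3ab^2)$$
$$= \frac{8}{27}e^{-3\lambda} + \frac{1}{27}e^{6\lambda} + \frac{4}{9} + \frac{2}{9}e^{3\lambda}$$
Therefore, the distribution of Y is given by
y           −3     0      3      6
P(Y = y)    8/27   4/9    2/9    1/27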
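The same distribution can be obtained by convolving the PMF of X with itself, which is what multiplying the MGFs amounts to. This sketch assumes $Y = X_1 + X_2 + X_3$, as in the MGF derivation; the helper name `convolve` is illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def convolve(p: dict, q: dict) -> dict:
    """PMF of the sum of two independent discrete random variables with PMFs p and q."""
    out = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] += pa * pb
    return dict(out)

pmf_x = {-1: Fraction(2, 3), 2: Fraction(1, 3)}

# Y = X1 + X2 + X3 with the Xi independent and distributed as X.
pmf_y = convolve(convolve(pmf_x, pmf_x), pmf_x)
for y in sorted(pmf_y):
    print(y, pmf_y[y])
```

Each printed probability is the coefficient of $e^{\lambda y}$ in the expansion of $[M_X(\lambda)]^3$.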