4/22/24, 11:02 AM Practical 2
MGV's Loknete Vyankatrao Hiray Arts, Science
and Commerce College Nashik
Department of Mathematics
M. Sc. 1 Data Science
Practical 2 Probability Distributions of Discrete
Random Variables
a. Binomial Random Variable
1. Example. Simulate 10 data points, each the number of
heads in 10 tosses of a fair coin. The parameters are:
n - number of trials; p - probability of success on each
trial (0.5 for a fair coin); size - the shape of the
returned array.
In [1]: from numpy import random
x = random.binomial(n=10, p=0.5, size=10)
print(x)
[3 3 3 6 3 3 5 6 7 6]
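Each value in the output above is a count of successes in n trials. The sketch below makes that counting explicit using only the Python standard library instead of numpy. The helper name `binomial_draw`, the alias `pyrandom`, and the seed 42 are illustrative assumptions, not part of the practical.

```python
import random as pyrandom  # standard-library module, aliased to avoid clashing with numpy.random


def binomial_draw(n, p, rng):
    """Count successes in n independent trials, each succeeding with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


rng = pyrandom.Random(42)  # fixed seed (arbitrary choice) so the run is repeatable

# Ten data points, each the number of heads in 10 tosses of a fair coin.
sample = [binomial_draw(10, 0.5, rng) for _ in range(10)]
print(sample)

# With many draws the sample mean should settle near n * p = 5.
large = [binomial_draw(10, 0.5, rng) for _ in range(10_000)]
sample_mean = sum(large) / len(large)
print(sample_mean)
```

The exact numbers depend on the seed, so they will differ from the numpy output above. What matters is that every value lies between 0 and 10 and the long-run average is close to 5.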
In [5]: # Visualization of Binomial Distribution
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
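# discrete=True centres one bar on each integer count; kde=True overlays a smoothed curve
sns.histplot(random.binomial(n=10, p=0.5, size=1000), discrete=True, kde=True)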
plt.show()
localhost:8888/notebooks/Desktop/LVH Academic/Data Science/practical exercis/Practical 2.ipynb 1/7
2. Example. Consider a random experiment of tossing a
biased coin 6 times, where the probability of getting a
head is 0.6. If ‘getting a head’ is counted as a
‘success’, the binomial distribution table contains
the probability of x successes for each possible value
of x (0 to 6).
In [6]: from scipy.stats import binom
# setting the values
# of n and p
n = 6
p = 0.6
# defining the list of r values
r_values = list(range(n + 1))
# obtaining the mean and variance
mean, var = binom.stats(n, p)
# list of pmf values
dist = [binom.pmf(r, n, p) for r in r_values ]
# printing the table
print("r\tp(r)")
for i in range(n + 1):
    print(str(r_values[i]) + "\t" + str(dist[i]))
# printing mean and variance
print("mean = "+str(mean))
print("variance = "+str(var))
r p(r)
0 0.0040960000000000015
1 0.03686400000000002
2 0.1382400000000001
3 0.2764800000000001
4 0.3110400000000001
5 0.1866240000000001
6 0.04665599999999999
mean = 3.5999999999999996
variance = 1.44
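The table above can be checked by hand from the binomial formula P(X = r) = C(n, r) p^r (1 - p)^(n - r), with mean np and variance np(1 - p). The following is a minimal sketch using only the standard library. The names `binomial_pmf`, `pmf_by_hand`, `mean_by_hand` and `var_by_hand` are illustrative and do not come from scipy.

```python
from math import comb

n, p = 6, 0.6


def binomial_pmf(r, n, p):
    """P(X = r) = C(n, r) * p**r * (1 - p)**(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


pmf_by_hand = [binomial_pmf(r, n, p) for r in range(n + 1)]
mean_by_hand = n * p            # 3.6 up to floating-point rounding
var_by_hand = n * p * (1 - p)   # 1.44 up to floating-point rounding

for r, prob in enumerate(pmf_by_hand):
    print(f"{r}\t{prob:.6f}")
print("total probability =", round(sum(pmf_by_hand), 12))
```

The printed mean of 3.5999999999999996 in the notebook output is the floating-point representation of 3.6, not a different value.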
In [7]: from scipy.stats import binom
import matplotlib.pyplot as plt
# setting the values
# of n and p
n = 6
p = 0.6
# defining list of r values
r_values = list(range(n + 1))
# list of pmf values
dist = [binom.pmf(r, n, p) for r in r_values ]
# plotting the graph
plt.bar(r_values, dist)
plt.show()
b. Poisson Random Variable
1. Example. If someone eats twice a day on average, what
is the probability that they will eat three times on a
given day? The Poisson sampler takes two parameters:
lam - the rate, or known average number of occurrences
(2 for the problem above); size - the shape of the
returned array. Generate 10 random values from a Poisson
distribution with rate 2.
In [8]: from numpy import random
x = random.poisson(lam=2, size=10)
print(x)
[4 2 1 0 2 2 2 1 4 1]
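The random sample above does not by itself answer the question in the example. Assuming the number of meals per day follows a Poisson distribution with rate 2, the probability of exactly three meals comes from P(X = k) = e^(-lam) lam^k / k!. A minimal sketch with the standard library follows. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) = exp(-lam) * lam**k / k!"""
    return exp(-lam) * lam**k / factorial(k)


p_three = poisson_pmf(3, lam=2)
print(round(p_three, 4))  # 0.1804
```

So under this model there is roughly an 18% chance of eating three times on a given day.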
In [10]: # Visualization of Poisson Distribution
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
In [11]: sns.histplot(random.poisson(lam=2, size=1000), discrete=True)
plt.show()
2. Example: frequency of hurricanes. Suppose we have
data on hurricanes observed over a period of 20 years
and find that the average number of hurricanes per year
is 7. The number of hurricanes in a year can then be
modelled as a Poisson random variable with mean 7.
In [1]: import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson
k = np.arange(0, 21)
print(k)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
In [16]: # Poisson PMF (Probability mass function)
In [2]: pmf = poisson.pmf(k, mu=7)
pmf = np.round(pmf, 5)
print(pmf)
[9.1000e-04 6.3800e-03 2.2340e-02 5.2130e-02 9.1230e-02 1.2772e-01
1.4900e-01 1.4900e-01 1.3038e-01 1.0140e-01 7.0980e-02 4.5170e-02
2.6350e-02 1.4190e-02 7.0900e-03 3.3100e-03 1.4500e-03 6.0000e-04
2.3000e-04 9.0000e-05 3.0000e-05]
In [3]: for val, prob in zip(k, pmf):
    print(f"k-value {val} has probability = {prob}")
k-value 0 has probability = 0.00091
k-value 1 has probability = 0.00638
k-value 2 has probability = 0.02234
k-value 3 has probability = 0.05213
k-value 4 has probability = 0.09123
k-value 5 has probability = 0.12772
k-value 6 has probability = 0.149
k-value 7 has probability = 0.149
k-value 8 has probability = 0.13038
k-value 9 has probability = 0.1014
k-value 10 has probability = 0.07098
k-value 11 has probability = 0.04517
k-value 12 has probability = 0.02635
k-value 13 has probability = 0.01419
k-value 14 has probability = 0.00709
k-value 15 has probability = 0.00331
k-value 16 has probability = 0.00145
k-value 17 has probability = 0.0006
k-value 18 has probability = 0.00023
k-value 19 has probability = 9e-05
k-value 20 has probability = 3e-05
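The table shows the same probability, 0.149, for k = 6 and k = 7. This follows from the Poisson recurrence p(k + 1) = p(k) * mu / (k + 1): with mu = 7 the factor from k = 6 to k = 7 is 7/7 = 1. The sketch below rebuilds the table from that recurrence using only the standard library. The list name `pmf_rec` is illustrative.

```python
from math import exp

mu = 7
pmf_rec = [exp(-mu)]  # p(0) = e^(-mu)
for k in range(20):
    # p(k + 1) = p(k) * mu / (k + 1)
    pmf_rec.append(pmf_rec[k] * mu / (k + 1))

# The ratio mu / (k + 1) exceeds 1 while k + 1 < mu, equals 1 at k + 1 = mu,
# and falls below 1 afterwards, so the probabilities rise, plateau, then fall.
print(round(pmf_rec[6], 5), round(pmf_rec[7], 5))
```

This is why a Poisson distribution with an integer mean has two adjacent modes, at mu - 1 and mu.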
In [15]: plt.plot(k, pmf, marker='o')
plt.xlabel('k')
plt.ylabel('Probability')
plt.show()
In [17]: # Poisson CDF (Cumulative Distribution function)
In [4]: cdf = poisson.cdf(k, mu=7)
cdf = np.round(cdf, 3)
print(cdf)
[0.001 0.007 0.03 0.082 0.173 0.301 0.45 0.599 0.729 0.83 0.901 0.947
0.973 0.987 0.994 0.998 0.999 1. 1. 1. 1. ]
In [19]: for val, prob in zip(k, cdf):
    print(f"k-value {val} has probability = {prob}")
k-value 0 has probability = 0.001
k-value 1 has probability = 0.007
k-value 2 has probability = 0.03
k-value 3 has probability = 0.082
k-value 4 has probability = 0.173
k-value 5 has probability = 0.301
k-value 6 has probability = 0.45
k-value 7 has probability = 0.599
k-value 8 has probability = 0.729
k-value 9 has probability = 0.83
k-value 10 has probability = 0.901
k-value 11 has probability = 0.947
k-value 12 has probability = 0.973
k-value 13 has probability = 0.987
k-value 14 has probability = 0.994
k-value 15 has probability = 0.998
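k-value 16 has probability = 0.999
k-value 17 has probability = 1.0
k-value 18 has probability = 1.0
k-value 19 has probability = 1.0
k-value 20 has probability = 1.0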
In [20]: plt.plot(k, cdf, marker='o')
plt.xlabel('k')
plt.ylabel('Cumulative Probability')
plt.show()
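The CDF is the running total of the PMF, so cdf[k] = P(X <= k), and tail probabilities follow by subtraction. The sketch below illustrates this for the hurricane example with mu = 7, using only the standard library. The names `pmf_vals`, `cdf_vals`, `p_at_most_7` and `p_more_than_10` are illustrative.

```python
from itertools import accumulate
from math import exp, factorial

mu = 7
pmf_vals = [exp(-mu) * mu**k / factorial(k) for k in range(21)]
cdf_vals = list(accumulate(pmf_vals))  # cdf[k] = pmf[0] + ... + pmf[k]

p_at_most_7 = cdf_vals[7]          # P(X <= 7): at most 7 hurricanes in a year
p_more_than_10 = 1 - cdf_vals[10]  # P(X > 10): more than 10 hurricanes in a year
print(round(p_at_most_7, 3), round(p_more_than_10, 3))
```

The first value matches the 0.599 shown for k = 7 in the CDF table, and the second is one minus the 0.901 shown for k = 10, up to rounding.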
c. Hypergeometric Random Variable
1. Example. Aces in a Five-Card Poker Hand. The
number of aces in a five-card poker hand has the
hypergeometric distribution with population size 52,
four good elements in the population, and a simple
random sample of size 5.
In [22]: import scipy.stats as stats
In [23]: k = np.arange(5) # possible numbers of aces: 0 to 4
N = 52 # population size
G = 4 # number of good elements in population
n = 5 # simple random sample size
stats.hypergeom.pmf(k, N, G, n)
Out[23]: array([6.58841998e-01, 2.99473636e-01, 3.99298181e-02, 1.73607905e-03,
1.84689260e-05])
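These values can be reproduced by counting hands directly: P(X = k) = C(G, k) C(N - G, n - k) / C(N, n), that is, the number of ways to pick k aces and n - k non-aces divided by the number of five-card hands. The following is a minimal sketch with the standard library. The names `hypergeom_pmf`, `aces_pmf` and `expected_aces` are illustrative and are not scipy functions.

```python
from math import comb

N, G, n = 52, 4, 5  # population size, good elements, sample size


def hypergeom_pmf(k, N, G, n):
    """P(X = k) = C(G, k) * C(N - G, n - k) / C(N, n)."""
    return comb(G, k) * comb(N - G, n - k) / comb(N, n)


aces_pmf = [hypergeom_pmf(k, N, G, n) for k in range(5)]
expected_aces = n * G / N  # mean of the hypergeometric distribution, here 5/13

for k, prob in enumerate(aces_pmf):
    print(f"{k} aces: {prob:.6e}")
print("expected number of aces =", round(expected_aces, 4))
```

For instance, the probability of no aces is C(48, 5) / C(52, 5) = 1712304 / 2598960, which matches the first entry of the scipy output.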
In [25]: a = np.round(stats.hypergeom.pmf(k, N, G, n), 3)
In [27]: a
Out[27]: array([0.659, 0.299, 0.04 , 0.002, 0. ])
In [26]: plt.plot(k, a, marker='o')
plt.show()