
Probability Distributions in Data Science

Uploaded by Om Bachhav
Copyright © All Rights Reserved

4/22/24, 11:02 AM Practical 2

MGV's Loknete Vyankatrao Hiray Arts, Science and Commerce College Nashik

Department of Mathematics

M. Sc. 1 Data Science

Practical 2: Probability Distributions of Discrete Random Variables

a. Binomial Random Variable

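1. Example. Simulate tossing a fair coin 10 times and record the number of heads; repeat this to generate 10 data points. The parameters of random.binomial are: n, the number of trials; p, the probability of success on each trial (e.g. 0.5 for each toss of a fair coin); and size, the shape of the returned array.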
In [1]: from numpy import random
x = random.binomial(n=10, p=0.5, size=10)
print(x)

[3 3 3 6 3 3 5 6 7 6]
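Each entry of x is the number of heads observed in 10 tosses. The sketch below makes that explicit by building each binomial value from individual Bernoulli trials, using Python's standard random module so it runs without NumPy. The helper name binomial_by_tosses is hypothetical (it is not a NumPy function); random.binomial draws from the same distribution directly.

```python
import random

def binomial_by_tosses(n, p, size, seed=None):
    """Draw `size` Binomial(n, p) values by counting successes in n Bernoulli trials."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        # Each toss is a success (a head) with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        draws.append(successes)
    return draws

# 10 data points, each the number of heads in 10 fair-coin tosses.
x = binomial_by_tosses(n=10, p=0.5, size=10, seed=1)
print(x)

# With many draws the sample mean should be close to n * p = 5.
big = binomial_by_tosses(n=10, p=0.5, size=10_000, seed=2)
print(sum(big) / len(big))
```

The fixed seeds only make the sketch reproducible; the printed values will differ from the notebook output above.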

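In [5]: # Visualization of Binomial Distribution
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
# sns.distplot is deprecated since seaborn 0.11; histplot with stat="density"
# and kde=True draws the equivalent normalized histogram with a density curve
sns.histplot(random.binomial(n=10, p=0.5, size=1000), discrete=True,
             stat="density", kde=True)
plt.show()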

localhost:8888/notebooks/Desktop/LVH Academic/Data Science/practical exercis/Practical 2.ipynb 1/7



2. Example. Consider a random experiment of tossing a biased coin 6 times, where the probability of getting a head is 0.6. If 'getting a head' is considered a 'success', then the binomial distribution table contains the probability of r successes for each possible value r = 0, 1, ..., 6.
In [6]: from scipy.stats import binom
# setting the values
# of n and p
n = 6
p = 0.6
# defining the list of r values
r_values = list(range(n + 1))
# obtaining the mean and variance
mean, var = binom.stats(n, p)
# list of pmf values
dist = [binom.pmf(r, n, p) for r in r_values ]
# printing the table
print("r\tp(r)")
for i in range(n + 1):
    print(str(r_values[i]) + "\t" + str(dist[i]))
# printing mean and variance
print("mean = "+str(mean))
print("variance = "+str(var))

r p(r)
0 0.0040960000000000015
1 0.03686400000000002
2 0.1382400000000001
3 0.2764800000000001
4 0.3110400000000001
5 0.1866240000000001
6 0.04665599999999999
mean = 3.5999999999999996
variance = 1.44
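The values in this table follow from the binomial PMF, P(X = r) = C(n, r) p^r (1 - p)^(n - r), whose mean is np and variance is np(1 - p). For n = 6 and p = 0.6 that gives a mean of 6 × 0.6 = 3.6 and a variance of 3.6 × 0.4 = 1.44, matching binom.stats above. The sketch below recomputes the table using only math.comb (Python 3.8+) in place of SciPy; the helper name binomial_pmf is illustrative.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Binomial(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 6, 0.6
table = [binomial_pmf(r, n, p) for r in range(n + 1)]

print("r\tp(r)")
for r, prob in enumerate(table):
    print(f"{r}\t{prob:.6f}")

# Mean and variance from the definition of expectation, compared with the formulas.
mean = sum(r * prob for r, prob in enumerate(table))
var = sum((r - mean) ** 2 * prob for r, prob in enumerate(table))
print("mean =", round(mean, 4), " np =", n * p)
print("variance =", round(var, 4), " np(1-p) =", round(n * p * (1 - p), 4))
```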


In [7]: from scipy.stats import binom
import matplotlib.pyplot as plt
# setting the values
# of n and p
n = 6
p = 0.6
# defining list of r values
r_values = list(range(n + 1))
# list of pmf values
dist = [binom.pmf(r, n, p) for r in r_values ]
# plotting the graph
plt.bar(r_values, dist)
plt.show()

b. Poisson Random Variable

1. Example. If someone eats twice a day on average, what is the probability that they eat three times in a day? random.poisson takes two parameters: lam, the rate or known number of occurrences (2 for the problem above), and size, the shape of the returned array. Generate a random 1x10 sample with rate 2.
In [8]: from numpy import random
x = random.poisson(lam=2, size=10)
print(x)

[4 2 1 0 2 2 2 1 4 1]
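The random sample simulates daily counts but does not answer the question posed. Assuming the number of meals per day is Poisson with λ = 2, the exact answer comes from the Poisson PMF, P(X = k) = e^(-λ) λ^k / k!, so P(X = 3) = e^(-2) · 8 / 6 ≈ 0.1804. A minimal standard-library sketch (poisson_pmf is an illustrative helper name, not a library function):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 2  # average number of meals per day
p_three = poisson_pmf(3, lam)  # probability of eating exactly three times
print(f"P(X = 3) = {p_three:.4f}")

# The distribution for small k, to compare with the simulated counts above.
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
```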

In [10]: # Visualization of Poisson Distribution
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns


In [11]: # histplot replaces the deprecated sns.distplot; discrete=True centres one bar on each count
sns.histplot(random.poisson(lam=2, size=1000), discrete=True)
plt.show()

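2. Example: frequency of hurricanes. Assume we have data from observing hurricanes over a period of 20 years, and find that the average number of hurricanes per year is 7. Modelling the number of hurricanes in a year as a Poisson random variable with mean 7, we compute its probabilities for k = 0, 1, ..., 20.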
In [1]: import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson
k = np.arange(0, 21)
print(k)

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]

In [16]: # Poisson PMF (probability mass function)

In [2]: pmf = poisson.pmf(k, mu=7)
pmf = np.round(pmf, 5)
print(pmf)

[9.1000e-04 6.3800e-03 2.2340e-02 5.2130e-02 9.1230e-02 1.2772e-01
1.4900e-01 1.4900e-01 1.3038e-01 1.0140e-01 7.0980e-02 4.5170e-02
2.6350e-02 1.4190e-02 7.0900e-03 3.3100e-03 1.4500e-03 6.0000e-04
2.3000e-04 9.0000e-05 3.0000e-05]


In [3]: for val, prob in zip(k, pmf):
    print(f"k-value {val} has probability = {prob}")

k-value 0 has probability = 0.00091
k-value 1 has probability = 0.00638
k-value 2 has probability = 0.02234
k-value 3 has probability = 0.05213
k-value 4 has probability = 0.09123
k-value 5 has probability = 0.12772
k-value 6 has probability = 0.149
k-value 7 has probability = 0.149
k-value 8 has probability = 0.13038
k-value 9 has probability = 0.1014
k-value 10 has probability = 0.07098
k-value 11 has probability = 0.04517
k-value 12 has probability = 0.02635
k-value 13 has probability = 0.01419
k-value 14 has probability = 0.00709
k-value 15 has probability = 0.00331
k-value 16 has probability = 0.00145
k-value 17 has probability = 0.0006
k-value 18 has probability = 0.00023
k-value 19 has probability = 9e-05
k-value 20 has probability = 3e-05
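The equal values at k = 6 and k = 7 are not a rounding accident. Dividing consecutive terms of the Poisson PMF gives P(X = k) / P(X = k - 1) = λ / k, so the probabilities rise while k < λ and fall once k > λ; when λ is an integer (here λ = 7) the ratio at k = λ is exactly 1, so P(X = 6) = P(X = 7) and the distribution has two modes. A short sketch that builds the PMF from this recurrence and checks it against the closed form:

```python
from math import exp, factorial

lam = 7  # average number of hurricanes per year

# Closed form: P(X = k) = exp(-lam) * lam**k / k!
direct = [exp(-lam) * lam**k / factorial(k) for k in range(21)]

# Recurrence: start from P(X = 0) = exp(-lam) and multiply by lam / k at each step.
recursive = [exp(-lam)]
for k in range(1, 21):
    recursive.append(recursive[-1] * lam / k)

# Both constructions agree up to floating-point error.
for k in range(21):
    assert abs(direct[k] - recursive[k]) < 1e-12

# The two largest probabilities sit at k = lam - 1 and k = lam.
print(round(direct[6], 5), round(direct[7], 5))
```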

In [15]: plt.plot(k, pmf, marker='o')
plt.xlabel('k')
plt.ylabel('Probability')
plt.show()

In [17]: # Poisson CDF (cumulative distribution function)

In [4]: cdf = poisson.cdf(k, mu=7)
cdf = np.round(cdf, 3)
print(cdf)

[0.001 0.007 0.03 0.082 0.173 0.301 0.45 0.599 0.729 0.83 0.901 0.947
0.973 0.987 0.994 0.998 0.999 1. 1. 1. 1. ]


In [19]: for val, prob in zip(k, cdf):
    print(f"k-value {val} has probability = {prob}")

k-value 0 has probability = 0.001
k-value 1 has probability = 0.007
k-value 2 has probability = 0.03
k-value 3 has probability = 0.082
k-value 4 has probability = 0.173
k-value 5 has probability = 0.301
k-value 6 has probability = 0.45
k-value 7 has probability = 0.599
k-value 8 has probability = 0.729
k-value 9 has probability = 0.83
k-value 10 has probability = 0.901
k-value 11 has probability = 0.947
k-value 12 has probability = 0.973
k-value 13 has probability = 0.987
k-value 14 has probability = 0.994
k-value 15 has probability = 0.998
k-value 16 has probability = 0.999

In [20]: plt.plot(k, cdf, marker='o')
plt.xlabel('k')
plt.ylabel('Cumulative Probability')
plt.show()
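The CDF is what answers interval questions about the hurricane model. Under the same Poisson(7) assumption, P(X ≤ 10) ≈ 0.901 from the table above, so the probability of more than 10 hurricanes in a year is 1 - F(10), about 0.0985. The sketch below builds the CDF by accumulating PMF terms with the standard library only; in SciPy one would typically call poisson.cdf, or poisson.sf for the upper tail. The interval 5 to 9 is an arbitrary illustration.

```python
from itertools import accumulate
from math import exp, factorial

lam = 7
pmf_terms = [exp(-lam) * lam**k / factorial(k) for k in range(21)]
cdf_terms = list(accumulate(pmf_terms))  # cdf_terms[k] = P(X <= k)

p_at_most_10 = cdf_terms[10]
p_more_than_10 = 1 - cdf_terms[10]        # upper tail, P(X > 10)
p_5_to_9 = cdf_terms[9] - cdf_terms[4]    # P(5 <= X <= 9) = F(9) - F(4)

print(f"P(X <= 10)     = {p_at_most_10:.4f}")
print(f"P(X > 10)      = {p_more_than_10:.4f}")
print(f"P(5 <= X <= 9) = {p_5_to_9:.4f}")
```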

c. Hypergeometric Random Variable

1. Example: aces in a five-card poker hand. The number of aces in a five-card poker hand has the hypergeometric distribution with population size N = 52, G = 4 good elements (the aces) in the population, and a simple random sample of size n = 5.
In [22]: import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats


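In [23]: k = np.arange(5)
N = 52 # population size
G = 4 # number of good elements in population
n = 5 # simple random sample size
# SciPy's signature is hypergeom.pmf(k, M, n, N) with M = population size,
# n = number of good elements, N = sample size; the arguments below map in that order
stats.hypergeom.pmf(k, N, G, n)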

Out[23]: array([6.58841998e-01, 2.99473636e-01, 3.99298181e-02, 1.73607905e-03,
1.84689260e-05])

In [25]: a = np.round(stats.hypergeom.pmf(k, N, G, n), 3)

In [27]: a

Out[27]: array([0.659, 0.299, 0.04 , 0.002, 0. ])

In [26]: plt.plot(k, a, marker='o')
plt.xlabel('Number of aces')
plt.ylabel('Probability')
plt.show()
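The array above can be recomputed from the hypergeometric PMF, P(X = k) = C(G, k) C(N - G, n - k) / C(N, n): choose k of the G aces and n - k of the N - G other cards, then divide by the number of possible five-card hands. The sketch below uses only math.comb from the standard library; hypergeom_pmf is an illustrative helper name, not a SciPy function. The mean of this distribution is nG/N = 20/52 ≈ 0.385 aces per hand.

```python
from math import comb

def hypergeom_pmf(k, N, G, n):
    """P(X = k): k good elements in a simple random sample of size n
    drawn without replacement from N items, G of which are good."""
    return comb(G, k) * comb(N - G, n - k) / comb(N, n)

N, G, n = 52, 4, 5  # deck size, aces in the deck, cards in the hand
probs = [hypergeom_pmf(k, N, G, n) for k in range(n + 1)]  # k = 5 is impossible, gives 0

for k, prob in enumerate(probs):
    print(k, f"{prob:.6e}")

# The probabilities sum to 1, and the mean equals n * G / N.
print(sum(probs), sum(k * prob for k, prob in enumerate(probs)), n * G / N)
```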
