Lecture Slides - Inferential Statistics

Uploaded by

smorshed03

Statistics for Data Science

Inferential Statistics
Agenda – Inferential Statistics
1. Inferential Statistics
2. Some fundamental terms first
a. Random Variables
b. Distribution and its types
3. Binomial Distribution
4. Uniform Distribution
5. Normal Distribution
6. Sampling and Inference
a. Simple Random Samples
b. Sampling Distribution
c. Central Limit Theorem
7. Estimation
a. Point Estimation
b. Interval Estimation

Descriptive vs. Inferential Statistics

Summaries from data
● Summaries give a sense of central tendency
● Tell a lot about ‘what’s happening’
● Mean, Std Dev, IQRs etc.

Is that enough?
● Typically, we work with only ‘samples’ of data
● A sample summary alone is not enough to make generalized statements about the population
● Challenge - how to learn about the population from the sample?

Inferential Statistics
● Inferential statistics helps tackle that challenge
● There are powerful methods to draw reasonable conclusions about the population from an observed sample
● This becomes extremely critical in business decision making

Role of distributions in inferential statistics

Real World Problems

Quality Testing: Is the new manufacturing process better/more reliable than the old process?
Meteorology: What is the likelihood that the temperature will be more than 20 degrees Celsius on a specific day?
Human Resources: Does training the workforce improve sales?
Digital Marketing: What is the likelihood that the conversion rate of the website will be above x% next month?
...

We will learn about...
1. Some famous distributions & their characteristics
2. Central Limit Theorem - Powerful idea that enables a lot of inferential statistics
3. Estimations and Confidence Intervals
4. Hypothesis Testing

Some fundamental terms first

Random Variable
Suppose there are 1000 students in the university. What is the probability that 500
students will pass the upcoming exam?

There is a 50-50 chance that each student will pass or fail

The total number of students who pass can range from 0 to 1000

A random variable assigns a numerical value to each outcome of an experiment. It assumes different values with different probabilities.
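The student example can be simulated to see a random variable take different values from run to run. This is only a sketch: it assumes each student passes independently with probability 0.5, and the function name and seed are illustrative choices that are not part of the slides.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable


def simulate_passes(n_students: int = 1000, p_pass: float = 0.5) -> int:
    """Return one realisation of X = number of students who pass."""
    return sum(random.random() < p_pass for _ in range(n_students))


# Five repetitions of the same "experiment" give five values of X.
draws = [simulate_passes() for _ in range(5)]
print(draws)
```

Each entry of `draws` is one observed value of the random variable; the values cluster around 500 but are rarely exactly 500.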

Discrete Random Variable

You work for an Auto insurance company. Suppose the number of insurance claims filed by
a driver in a month is a random variable (X) described by

$$
X = \begin{cases}
0, & \text{with prob } 0.95 \\
1, & \text{with prob } 0.04 \\
2, & \text{with prob } 0.008 \\
3, & \text{with prob } 0.002
\end{cases}
$$

All probabilities must be non-negative and sum to 1.

When all possible values the random variable can take can be listed,
we call it a discrete random variable
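The insurance-claims distribution can be written as a small lookup table and checked against the two rules stated here. A minimal sketch: the variable and function names are illustrative, and the expected-value line is an extra step not shown on the slide.

```python
import math

# Probability of each possible number of monthly claims, from the slide.
claims_pmf = {0: 0.95, 1: 0.04, 2: 0.008, 3: 0.002}


def is_valid_pmf(pmf: dict) -> bool:
    """Check that all probabilities are non-negative and sum to 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0)
    return non_negative and sums_to_one


# Expected number of claims per driver per month: sum of value * probability.
expected_claims = sum(x * p for x, p in claims_pmf.items())

# Probability that a driver files at least one claim in a month.
p_at_least_one = 1 - claims_pmf[0]

print(is_valid_pmf(claims_pmf), expected_claims, p_at_least_one)
```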

Continuous Random Variable
Suppose the volume of soda in a bottle is described by a random variable.

Can we list all possible values?

498 mL, 499 mL, 500 mL, … … … What about 499.2129415 mL?

Sometimes it is just not possible to list all values a random variable can take

If the random variable can take any value in a given range, we call it a
continuous random variable

Probability Distribution

Describes the values that a random variable can take, along with the probabilities of those values

Discrete Probability Distribution
● Arises from discrete random variables
● Has an associated probability mass function, which gives the probability with which the random variable takes a particular value

Continuous Probability Distribution
● Arises from continuous random variables
● Has an associated probability density function, which helps determine the probability with which the random variable lies between two given numbers

Probability Distribution: Example
A company tracks the number of sales new employees make each day during a 100-day
probationary period. The results for one new employee are shown. Construct and plot a
probability distribution.

| Sales | #Days | Relative Frequency |
|-------|-------|--------------------|
| 0     | 16    | 0.16               |
| 1     | 18    | 0.18               |
| 2     | 15    | 0.15               |
| 3     | 21    | 0.21               |
| 4     | 11    | 0.11               |
| 5     | 10    | 0.10               |
| 6     | 9     | 0.09               |
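The relative frequencies in the sales example are simply the day counts divided by the 100 days observed. A short sketch of that calculation, with a text bar chart standing in for the plot (the names are illustrative):

```python
# Number of days on which the employee made 0, 1, ..., 6 sales.
days_by_sales = {0: 16, 1: 18, 2: 15, 3: 21, 4: 11, 5: 10, 6: 9}

total_days = sum(days_by_sales.values())

# Relative frequency = days with that many sales / total days.
distribution = {sales: days / total_days for sales, days in days_by_sales.items()}

# Crude text plot: one '#' per day.
for sales, days in days_by_sales.items():
    print(f"{sales}  {distribution[sales]:.2f}  {'#' * days}")
```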
Distributions around us (commonly occurring)

Bernoulli: A company has introduced a new drug to cure a disease; it either cures the disease (it's successful) or it doesn't (it's a failure)

Binomial: The number of defective products in a production run

Uniform: The number of microwave ovens sold daily at an electronics store

Normal: Income distribution of a country: the middle-class population is a bit higher than the rich and poor populations in the country

Basic distributions - Binomial

Bernoulli Distribution
It has only two possible outcomes, namely 1 (success) and 0 (failure), of one single trial.

$$
X = \begin{cases}
1, & \text{with prob } p \\
0, & \text{with prob } 1-p
\end{cases}
$$

Success and failure are non-judgemental. Any one outcome may be termed as success.

Very useful in many scenarios:
● Manufacturing defective parts
● Outcome of medical test

Binomial Distribution

Suppose we ask any adult who uses the app TikTok if he/she has ever posted a video on the app

The answer can be Yes or No (success or failure)

We can use the Bernoulli distribution to model this scenario

Now let us extend this into a survey of 25 adults chosen at random

We can define a random variable X which counts the number of successes (say, the number of adults who responded Yes)

Binomial Distribution

In many situations an experiment may have only two outcomes - success and failure

These experiments can be modelled using the Binomial probability distribution.

Bernoulli Distribution is a special case of Binomial Distribution with a single trial.

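Probability Mass Function

$$
P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \dots, n
$$

where n is the number of trials and p is the probability of success in each trial.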
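A minimal sketch of the binomial probability mass function applied to the survey of 25 adults. The slides do not give the probability of a Yes answer, so `p_yes = 0.3` is a purely hypothetical value; the last line checks that a single trial reduces to the Bernoulli case.

```python
from math import comb, isclose


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_adults = 25
p_yes = 0.3  # hypothetical value, not from the slides

# Probability of every possible count of Yes answers, 0 through 25.
pmf = [binomial_pmf(k, n_adults, p_yes) for k in range(n_adults + 1)]
total = sum(pmf)  # should be 1 up to rounding

# With a single trial the binomial reduces to the Bernoulli distribution.
bernoulli_success = binomial_pmf(1, 1, p_yes)

print(round(total, 6), bernoulli_success)
```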

Binomial Distribution : Assumptions

Number of trials (n) is fixed.

Each trial is independent of the other trials.

There are only two possible outcomes (success or failure) for each trial.

The probability of a success (p) is the same for each trial.

What happens if these assumptions are violated?
In a month of 30 days, what is the probability that it will rain on more than 10 days, if on
average the chance of rain on a given day is 20%?

If we assume that:

1. The event of rain on a particular day is independent of it raining on the previous day.
2. The chance of rain does not increase or decrease over the duration of the month.

Then we can use the binomial distribution with n=30 and p=0.2 to calculate the probability.

Assumptions 1 and 2 in the example are not strictly valid, but they allow for a direct calculation
that may be good enough for practical purposes.
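Under the two simplifying assumptions above, the rain question is a binomial tail probability with n = 30 and p = 0.2. A sketch of the direct calculation (the helper name is illustrative):

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_days = 30
p_rain = 0.2

# "More than 10 days" means 11, 12, ..., 30 rainy days.
p_more_than_10 = sum(binomial_pmf(k, n_days, p_rain) for k in range(11, n_days + 1))

print(f"P(rain on more than 10 days) = {p_more_than_10:.4f}")
```

The result is a small probability of a few percent, which is plausible given that only 6 rainy days are expected on average.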

Basic distributions - Uniform

Uniform Distribution
Suppose we roll a die. The outcomes of this experiment can be 1, 2, 3, 4, 5, 6

All of the outcomes have an equal probability of occurrence and are mutually exclusive

(Image source: https://pixabay.com/vectors/dice-games-play-1294902/)

We can say that the probabilities of occurrence are uniformly distributed.

This is referred to as the Uniform Distribution

Useful when we are interested in unbiased selection

Uniform Distribution
There are two types of Uniform Distribution

Discrete Uniform Distribution: Can take a finite number (m) of values and each value has equal probability of selection.
For example: The number of books sold by a bookseller per day can be uniformly distributed between 100 and 300.

Continuous Uniform Distribution: Can take any value within a specified range.
For example: Tomorrow’s temperature in the United States can be uniformly distributed between 12 degrees Celsius and 17 degrees Celsius.
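Both examples can be turned into simple probability calculations. This is a sketch under stated assumptions: book sales are taken to be whole numbers from 100 to 300 inclusive, and the 13 to 15 degree interval is an arbitrary illustration.

```python
from fractions import Fraction

# Discrete uniform: books sold per day, each value from 100 to 300 equally likely.
book_values = range(100, 301)
m = len(book_values)          # number of possible values
p_each = Fraction(1, m)       # probability of any single value

# Continuous uniform: temperature between 12 and 17 degrees Celsius.
def uniform_interval_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo <= X <= hi) for X uniform on [a, b]: overlap length / total length."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


p_13_to_15 = uniform_interval_prob(13, 15, 12, 17)

print(m, p_each, p_13_to_15)
```

For the continuous case the probability of any single exact temperature is zero, so only intervals carry probability.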

Basic distributions - Normal

Normal Distribution : Introduction
Normal distribution is the most common and useful continuous distribution

It is characterized by a symmetric bell-shaped curve having two parameters - mean (μ) and standard deviation (𝜎).

Normal Distribution : Why Normal

Why is it called the normal distribution?

It is commonly found everywhere, from nature to industry

Many useful datasets are approximately normally distributed

For example, the heights and weights of adults, IQ scores, measurement errors, quality control test results etc.

Normal Distribution : Properties

1. The graph of the normal distribution is called the normal curve
2. Normal curve is symmetric around the mean
3. Mean, Median and Mode of the normal distribution are equal
4. Total area under the normal curve is 1

1. About 68% of the data fall within one standard deviation from the mean
2. About 95% of the data fall within two standard deviations from the mean
3. About 99.7% of the data fall within three standard deviations from the mean

[Figure: normal curve with the 68%, 95% and 99.7% regions marked between 𝜇-3𝜎 and 𝜇+3𝜎]

Normal Distribution : Example
Assume that food delivery service provider A has approximately normally distributed delivery times, with a mean of 40 minutes and a standard deviation of 10 minutes.

Using the Empirical Rule, we can determine that

● About 68% of the delivery times are between 30 and 50 minutes (40 ± 10)
● About 95% of the delivery times are between 20 and 60 minutes (40 ± 2(10))
● About 99.7% of the delivery times are between 10 and 70 minutes (40 ± 3(10))

[Figure: normal curve of delivery times from 10 to 70 minutes with the 68%, 95% and 99.7% regions marked]

This property is known as the Empirical Rule.
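The three percentages can be checked numerically for the delivery-time example. A minimal sketch using `statistics.NormalDist` from the Python standard library, assuming delivery times are exactly normal with mean 40 and standard deviation 10:

```python
from statistics import NormalDist

MEAN_MINUTES = 40
SD_MINUTES = 10
delivery = NormalDist(mu=MEAN_MINUTES, sigma=SD_MINUTES)


def prob_within(k: int) -> float:
    """Probability that a delivery time falls within k standard deviations of the mean."""
    upper = MEAN_MINUTES + k * SD_MINUTES
    lower = MEAN_MINUTES - k * SD_MINUTES
    return delivery.cdf(upper) - delivery.cdf(lower)


for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```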

Normal Distribution : Area under Density Curve

As with any continuous probability distribution, the area under the density curve between
two points indicates the probability that the variable will fall within that interval.

𝜇 and 𝜎 are the parameters that decide the center and spread of the normal curve

To find the area, we need calculus.

But there is an easier way to do it in Python (or other software), which provides the necessary functions to calculate the area.
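The two routes to the area can be compared side by side. The sketch below integrates the density numerically (the calculus route, approximated with the trapezoid rule) and then uses the cumulative distribution function directly. The delivery-time parameters (mean 40, SD 10) are reused from the earlier example, and the interval 35 to 55 minutes is an arbitrary choice.

```python
from statistics import NormalDist

delivery = NormalDist(mu=40, sigma=10)


def trapezoid_area(f, a: float, b: float, steps: int = 1000) -> float:
    """Approximate the integral of f from a to b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width


# Route 1: numerically integrate the density curve between 35 and 55 minutes.
area_numeric = trapezoid_area(delivery.pdf, 35, 55)

# Route 2: difference of the cumulative distribution function at the endpoints.
area_cdf = delivery.cdf(55) - delivery.cdf(35)

# A one-sided area: probability that a delivery takes longer than 60 minutes.
p_over_60 = 1 - delivery.cdf(60)

print(f"{area_numeric:.4f} {area_cdf:.4f} {p_over_60:.4f}")
```

The two routes agree closely, which is why the `cdf` function is usually all that is needed in practice.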

Normal Distribution : Standard Normal Distribution
A standard normal distribution is used to compare two normal distributions with different
parameters (μ, σ)

The standard normal variable is denoted by Z and the distribution is also known as Z
distribution

It always has a mean of 0 and standard deviation of 1

[Figure: Normal Distribution → Standardize → Standard Normal Distribution]

Normal Distribution : Z-Score

A normal variable can be converted to a standard normal variable by subtracting the mean (𝜇) and dividing by the standard deviation (𝜎):

$$
Z = \frac{X - \mu}{\sigma}
$$

where,
● X is the observed data point
● Z (Z-Score/Standard score) is the number of standard deviations above or below the mean at which the data point falls
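A short sketch of standardization used to compare observations from two different normal distributions. Provider A's parameters (mean 40, SD 10) come from the earlier delivery example; provider B and both observed times are hypothetical values made up for illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


# Provider A: mean 40 min, SD 10 min; an observed delivery of 55 min.
z_a = z_score(55, mu=40, sigma=10)

# Hypothetical provider B: mean 30 min, SD 4 min; an observed delivery of 38 min.
z_b = z_score(38, mu=30, sigma=4)

print(z_a, z_b)
```

Although 55 minutes is the longer delivery in absolute terms, the 38-minute delivery is more unusual relative to its own provider because its Z-score is larger.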

Sampling and Inference

Revisiting the need for sampling..

In many situations, what we have available to us is a sample of data.

The data we have is finite.

Until now, the goal was to find ways of describing, summarizing and visualising the sample data only

Moving ahead, we want to make inferences about the “entire” population using the sample data.

Sampling : Simple Random Sampling

A sampling technique where every item in the population has an equal chance
of being selected

Why are simple random samples important?

They allow all the entities in the population to have an equal chance of being selected, and so the sample is likely to be representative of the population
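Drawing a simple random sample without replacement can be sketched with the standard library. The population of 1000 numbered IDs, the sample size of 50 and the seed are illustrative assumptions.

```python
import random

rng = random.Random(42)  # seeded generator so the illustration is repeatable

# Hypothetical population: 1000 items identified by the numbers 1..1000.
population = list(range(1, 1001))

# Every item has the same chance of selection; no item is picked twice.
sample = rng.sample(population, k=50)

print(sorted(sample)[:10])
```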

Sampling Distribution
The sampling distribution of a statistic is the probability distribution of that
statistic when we draw many samples
For example sampling distribution of the mean, sampling distribution of variance etc.
To a great extent, statistical inference techniques are based on sampling distribution of a statistic

[Figure: Population → samples of size n → Sampling Distribution (distribution of means)]

Some Important Points

Suppose we are sampling from a population with mean μ and standard deviation σ. Let X̅ be the random variable representing the sample mean of n independent observations.

The mean of X̅ is equal to μ

The standard deviation of X̅ is equal to σ/√n (also called the ‘standard error’ of X̅)

Even if the population is not normally distributed, for sufficiently large n, X̅ is approximately normally distributed.
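The first two points can be checked by simulation. This sketch uses fair die rolls as a stand-in population (an assumption made for illustration), draws many samples of size 30, and compares the mean and spread of the sample means with μ and σ/√n.

```python
import random
from math import sqrt
from statistics import fmean, pstdev, stdev

rng = random.Random(0)

# Population: outcomes of a fair die, so mu = 3.5 and sigma = sqrt(35/12).
faces = [1, 2, 3, 4, 5, 6]
mu = fmean(faces)
sigma = pstdev(faces)

n = 30              # observations per sample
num_samples = 5000  # number of repeated samples

# One sample mean per repeated sample.
sample_means = [sum(rng.choices(faces, k=n)) / n for _ in range(num_samples)]

mean_of_means = fmean(sample_means)   # should be close to mu
sd_of_means = stdev(sample_means)     # should be close to sigma / sqrt(n)
standard_error = sigma / sqrt(n)

print(f"{mean_of_means:.3f} vs {mu:.3f}")
print(f"{sd_of_means:.3f} vs {standard_error:.3f}")
```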

Central Limit Theorem

The sampling distribution of the sample means will approach a normal distribution as the sample size gets bigger, no matter what the shape of the population distribution is.

Assumptions
● Data must be randomly sampled
● Sample values must be independent of each other
● Samples should come from the same distribution
● Sample size must be sufficiently large (a common rule of thumb is n ≥ 30)

Central Limit Theorem

A larger sample size provides a better estimate of the population mean.

For sample size n = 5, the sample means pile up around the population mean.

For sample size n = 30, the sample means are much closer to the population mean.
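The effect of sample size can be illustrated with a deliberately skewed population. The sketch below assumes an exponential population with mean 1 (a choice made only for illustration) and compares the sample means for n = 5 and n = 30: their spread shrinks and their skewness moves towards 0, the value for a normal distribution.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(7)
NUM_SAMPLES = 4000


def sample_mean(n: int) -> float:
    """Mean of n draws from an exponential population with mean 1 (right-skewed)."""
    return fmean([rng.expovariate(1.0) for _ in range(n)])


def skewness(values: list) -> float:
    """Average cubed standardized deviation; 0 for a symmetric distribution."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


spread = {}
skew = {}
for n in (5, 30):
    means = [sample_mean(n) for _ in range(NUM_SAMPLES)]
    spread[n] = pstdev(means)   # theory: 1 / sqrt(n)
    skew[n] = skewness(means)   # shrinks as n grows

for n in (5, 30):
    print(f"n={n}: spread={spread[n]:.3f} (theory {1 / sqrt(n):.3f}), skewness={skew[n]:.2f}")
```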

Estimations

Estimation

Make inference about a population parameter based on a sample statistic

Point Estimation
● Single point estimate of the population parameter
● E.g. Population mean as estimated from the sample mean is $40

Interval Estimation
● A range of values within which the population parameter lies with some (x%) confidence
● E.g. Population mean should lie between $38 and $42, with 95% confidence (x = 95)

Point Estimation
A point estimate of a population parameter is a single value of a statistic

For example: The sample mean X̅ is a point estimate of the population mean μ. Similarly, the sample
standard deviation s is a point estimate of the population standard deviation σ.

Point estimates vary from sample to sample. Often an interval is used to provide
a range of values the parameter can take, instead of a single point estimate
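Computing the two point estimates mentioned above takes one line each. The order values below are hypothetical data invented for this sketch; `statistics.stdev` uses the n - 1 denominator, which is the usual sample standard deviation s.

```python
from statistics import mean, stdev

# Hypothetical sample of order values in dollars (not from the slides).
order_values = [38.5, 41.0, 39.2, 42.3, 40.1, 37.9, 41.6, 39.4]

x_bar = mean(order_values)   # point estimate of the population mean
s = stdev(order_values)      # point estimate of the population standard deviation

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
```

A different sample of eight orders would give slightly different values of `x_bar` and `s`, which is the sample-to-sample variation that motivates interval estimates.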

Interval estimation - Confidence interval

Confidence interval provides an interval, or a range of values, which is expected to cover the true unknown parameter.

The upper and lower limits of the interval are determined using the distribution of the sample mean and a multiplier which specifies the ‘confidence’.

[Figure: an estimate with confidence limits on either side, the true (unknown) value, and the confidence level]
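One common way to build such an interval for a mean is sample mean ± multiplier × standard error, with the multiplier taken from the standard normal distribution. The sketch below assumes a large sample so that this normal approximation is reasonable; the sample size of 100 and standard deviation of $10 are hypothetical values chosen so the result resembles the $38 to $42 example.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar: float, s: float, n: int, confidence: float = 0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Multiplier: the z value leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * s / sqrt(n)  # multiplier times the standard error
    return x_bar - margin, x_bar + margin


# Hypothetical summary statistics: mean $40, SD $10, sample of 100 orders.
low, high = mean_confidence_interval(x_bar=40.0, s=10.0, n=100)

print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Raising the confidence level makes the multiplier larger and the interval wider. For small samples a t-based multiplier is normally used instead, which is outside what this sketch covers.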

Confidence Interval for Mean 𝜇

Interpretation of 95% Confidence Interval

- The interpretation of a 95% confidence interval is that, if the process is repeated a large number of times, then the intervals so constructed will contain the true population parameter about 95% of the time.

Why not 100% Confidence Interval?

- A 100% confidence interval will include all possible values.

- Hence there will be no insight into the problem.
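The repeated-sampling interpretation can be illustrated by simulation. This sketch assumes a normal population with mean 40 and standard deviation 10 (hypothetical values), builds a normal-approximation 95% interval from each of 2000 samples of size 50, and counts how often the interval covers the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

rng = random.Random(1)

TRUE_MEAN = 40.0   # known here only because the population is simulated
TRUE_SD = 10.0
n = 50
repetitions = 2000

z = NormalDist().inv_cdf(0.975)  # multiplier for 95% confidence

covered = 0
for _ in range(repetitions):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    x_bar = fmean(sample)
    margin = z * stdev(sample) / sqrt(n)
    # Does this interval contain the true population mean?
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        covered += 1

coverage = covered / repetitions
print(f"Share of intervals covering the true mean: {coverage:.3f}")
```

The share comes out close to 0.95. Any single interval either covers the true mean or it does not; the 95% describes the procedure over many repetitions.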
