Lecture On Sampling Distributions
12/08/21 3
The Population
[Figure: the population, made up of 2’s, 4’s, and 5’s]
Characteristics of the Population
For this population, we have the parameters:
$$\mu = (2)(.4) + (4)(.5) + (5)(.1) = .8 + 2 + .5 = 3.3$$
$$\sigma^2 = (2 - 3.3)^2(.4) + (4 - 3.3)^2(.5) + (5 - 3.3)^2(.1) = 1.21$$
$$\sigma = (1.21)^{1/2} = 1.1$$
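The parameter arithmetic above can be checked with a short Python sketch. The dictionary name `population` (value mapped to probability) is introduced here for illustration only.

```python
import math

# Population probability function: value -> probability (from the slide).
population = {2: 0.4, 4: 0.5, 5: 0.1}

# Mean: sum of value * probability.
mu = sum(y * p for y, p in population.items())

# Variance: probability-weighted squared deviations from the mean.
sigma2 = sum((y - mu) ** 2 * p for y, p in population.items())

# Standard deviation: square root of the variance.
sigma = math.sqrt(sigma2)

print(f"mu = {mu:.2f}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.2f}")
```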
[Figure: probability histogram of the population distribution]
Possible Outcomes of Sampling Process
Now, consider the sampling process of taking n = 2
observations (with replacement) from this population or
distribution. Below is a table of possibilities.
Some Points about the Preceding Table
Since we are sampling with replacement, to obtain the
probability of each possible sample, we simply
multiply the probabilities of each of the observations
(Think of a tree diagram!).
The 9 possible samples represent the elementary
events of the experiment of taking a sample of size 2
from the population or distribution.
The sample mean ($\bar X$) is obtained in the usual way.
The sample variance is computed in the usual way. For
example, for the second sample, (2, 4), whose mean is 3, we have
$$S^2 = \frac{(2-3)^2 + (4-3)^2}{2-1} = \frac{1 + 1}{1} = 2.$$
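The 9 possible samples of size 2 described above can be enumerated in a minimal Python sketch, assuming the population p(2) = .4, p(4) = .5, p(5) = .1. Each sample's probability is the product of its observations' probabilities (the tree-diagram rule), and `statistics.variance` uses the n − 1 divisor, so it matches $S^2$ as computed on the slide. The names `population` and `samples` are illustrative.

```python
from itertools import product
from statistics import mean, variance

population = {2: 0.4, 4: 0.5, 5: 0.1}

# All ordered samples of size n = 2 drawn with replacement.
samples = []
for pair in product(population, repeat=2):
    # Independent draws: multiply the probabilities (tree-diagram rule).
    prob = population[pair[0]] * population[pair[1]]
    # statistics.variance divides by n - 1, matching S^2 on the slide.
    samples.append((pair, prob, mean(pair), variance(pair)))

for pair, prob, xbar, s2 in samples:
    print(pair, f"P = {prob:.2f}", f"mean = {xbar}", f"S^2 = {s2}")
```

The second sample produced is (2, 4), with mean 3 and $S^2 = 2$, as in the worked example.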
Sample Statistics as Random Variables
Since the sample mean and the sample variance are
numerical characteristics of each of the possible
samples, they can be viewed as random variables in
this sampling experiment.
Therefore, we could obtain the probability
distributions of the sample mean and sample
variance.
These probability distributions are called sampling
distributions.
Thus we will have the sampling distribution of the
sample mean, as well as that of the sample variance.
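As a sketch of what the sampling distribution of the sample variance looks like for n = 2 from this population, the code below groups the 9 samples by their value of $S^2$ and adds up their probabilities. The name `dist_s2` is illustrative. For this population the resulting expected value of $S^2$ works out to 1.21, the population variance.

```python
from collections import defaultdict
from itertools import product
from statistics import variance

population = {2: 0.4, 4: 0.5, 5: 0.1}
n = 2

# Accumulate P(S^2 = s) over every possible sample of size n.
dist_s2 = defaultdict(float)
for sample in product(population, repeat=n):
    prob = 1.0
    for x in sample:
        prob *= population[x]          # independent draws
    dist_s2[variance(sample)] += prob  # S^2 with the n - 1 divisor

for s2 in sorted(dist_s2):
    print(f"S^2 = {s2}: P = {dist_s2[s2]:.2f}")

# Expected value of S^2 under this sampling distribution.
expected_s2 = sum(s2 * p for s2, p in dist_s2.items())
print(f"E[S^2] = {expected_s2:.2f}")
```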
Sampling Distribution of the Sample Mean
From the earlier table, we could construct the
probability distribution of the sample mean, now
called the sampling distribution of the sample mean.
This is given by the following table.
X̄      P(X̄)                 X̄·P(X̄)    (X̄ − 3.3)²·P(X̄)
2.0     .160                  .32        .2704
3.0     .20 + .20 = .40       1.20       .0360
3.5     .04 + .04 = .080      .28        .0032
4.0     .25                   1.00       .1225
4.5     .05 + .05 = .100      .45        .1440
5.0     .010                  .05        .0289
Sums    1.00                  3.3        .6050
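The table above can be reproduced with a minimal Python sketch that groups all samples of size 2 by their mean. The names `dist_mean`, `mean_of_xbar`, and `var_of_xbar` are illustrative. The variance column sums to .6050, which equals $\sigma^2/n = 1.21/2$.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

population = {2: 0.4, 4: 0.5, 5: 0.1}
mu = 3.3
n = 2

# P(Xbar = value): add up probabilities of all samples sharing that mean.
dist_mean = defaultdict(float)
for sample in product(population, repeat=n):
    prob = 1.0
    for x in sample:
        prob *= population[x]
    dist_mean[mean(sample)] += prob

print("xbar   P(xbar)  xbar*P   (xbar-3.3)^2*P")
for xbar in sorted(dist_mean):
    p = dist_mean[xbar]
    print(f"{xbar:<6} {p:<8.2f} {xbar * p:<8.2f} {(xbar - mu) ** 2 * p:.4f}")

mean_of_xbar = sum(x * p for x, p in dist_mean.items())
var_of_xbar = sum((x - mu) ** 2 * p for x, p in dist_mean.items())
print(f"Sums: {sum(dist_mean.values()):.2f}  {mean_of_xbar:.2f}  {var_of_xbar:.4f}")
```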
Graph of the Sampling Distribution of
the Sample Mean
[Figure: Sampling Distribution of the Sample Mean Based on a Sample of Size n = 2; P(X̄) plotted against X̄]
Sampling Distribution of the Sample Mean
Some Methods for Obtaining Sampling
Distributions of Statistics
Complete enumeration, if possible.
Computer simulation, via the Monte Carlo method.
In this method the computer generates many, many
samples, and then constructs the probability
histogram of the values of the statistic of interest. This
will provide an empirical approximation.
Using theoretical results such as, for instance, when
sampling from a Bernoulli population the number of
successes is binomially-distributed.
Using theoretical approximations such as the Central
Limit Theorem or the de Moivre approximation.
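To illustrate how complete enumeration and a theoretical result agree, here is a minimal Python sketch for a Bernoulli population. The success probability p = 0.3 and sample size n = 4 are hypothetical values chosen for illustration; the code lists every sequence of n outcomes and compares the resulting distribution of the number of successes with the binomial probabilities.

```python
from itertools import product
from math import comb

p = 0.3  # hypothetical success probability, chosen for illustration
n = 4    # hypothetical sample size

# Complete enumeration: every sequence of n Bernoulli outcomes (1 = success).
enumerated = [0.0] * (n + 1)
for outcome in product((0, 1), repeat=n):
    k = sum(outcome)  # number of successes in this sequence
    enumerated[k] += p**k * (1 - p) ** (n - k)

# Theoretical result: the number of successes is Binomial(n, p).
binomial = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

for k in range(n + 1):
    print(k, round(enumerated[k], 6), round(binomial[k], 6))
```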
Illustrating the Monte Carlo Method
First 10 of the 500 Generated Samples
The table below shows the first 10 samples of size n
= 10 that were generated from the population.
Also included are their corresponding sample means.
Population:
y    p(y)
2    0.4
4    0.5
5    0.1

Samples:
x1  x2  x3  x4  x5  x6  x7  x8  x9  x10   Sample Mean
4   2   2   5   4   2   4   2   2   4     3.1
4   4   2   2   5   4   4   4   2   2     3.3
4   2   2   2   2   2   4   4   4   5     3.1
2   2   4   2   2   2   2   2   2   2     2.2
2   5   2   4   4   4   2   2   5   2     3.2
4   4   4   2   2   4   2   2   2   4     3.0
4   4   4   4   2   2   5   2   2   4     3.3
2   2   2   4   2   2   4   2   2   2     2.4
2   4   2   2   4   5   5   2   2   2     3.0
2   5   4   4   2   2   4   4   4   2     3.3
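A Monte Carlo run like the one behind this table can be sketched in Python with `random.choices`, which draws with replacement according to the given weights. The seed is an arbitrary assumption for reproducibility, so the summary numbers from this sketch will be close to, but not identical with, the values reported for the lecture's 500 samples.

```python
import random
import statistics

population = {2: 0.4, 4: 0.5, 5: 0.1}
values = list(population)
weights = list(population.values())

random.seed(1)      # arbitrary fixed seed so the run is reproducible
num_samples = 500   # number of simulated samples, as on the slide
n = 10              # observations per sample

# Generate each sample with replacement and record its mean.
sample_means = [
    statistics.mean(random.choices(values, weights=weights, k=n))
    for _ in range(num_samples)
]

print("mean of sample means:", round(statistics.mean(sample_means), 4))
print("median of sample means:", statistics.median(sample_means))
print("std. dev. of sample means:", round(statistics.stdev(sample_means), 4))

# Fraction of simulated means below 2.5 or above 4.5.
tail = sum(m < 2.5 or m > 4.5 for m in sample_means) / num_samples
print("fraction below 2.5 or above 4.5:", tail)
```

A relative frequency histogram of `sample_means` gives the empirical approximation to the sampling distribution discussed next.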
Relative Frequency Histogram of the 500
Sample Means
[Figure: Simulated Sampling Distribution of the Sample Mean Based on 10 Observations when Sampling from the Population p(2) = .4, p(4) = .5, and p(5) = .1; relative frequency (in %) plotted against the sample mean]
Points to Ponder
This relative frequency histogram of the simulated
sample means serves as an approximation to the
sampling distribution of the sample mean when n =
10 and when sampling from the given population.
Notice that the values of the sample means are now
clustered around the population mean of 3.3, and
furthermore, the histogram is almost bell-shaped.
The histogram also shows that the
chance of getting a sample of size n = 10 whose
sample mean is less than 2.5 or greater than 4.5 is
rather small.
When the mean of the 500 sample means is
computed, it turns out to be 3.3094. [Their median is
exactly 3.30!]
Recall that the population mean is 3.30.
The standard deviation of the 500 sample means turns
out to be 0.3497.
Recall that the population standard deviation is
$(1.21)^{1/2} = 1.1$, so
$$\frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{10}} = \frac{1.1}{3.1623} \approx .3478.$$
We therefore note that the mean of the simulated
sample means is very close to the population mean,
and
the standard deviation of the simulated sample
means is also very close to the population standard
deviation divided by the square root of the sample
size.
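Indeed, when the observations are independent (as when
sampling with replacement), we always have the theoretical results:
$$\mu_{\bar X} = \mu \quad (\text{mean of } \bar X)$$
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} \quad (\text{standard error of } \bar X)$$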
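These two results can be checked exactly, without simulation, by complete enumeration for several sample sizes. The sketch below assumes the lecture's population p(2) = .4, p(4) = .5, p(5) = .1; the helper name `exact_mean_and_se` is hypothetical. For each n it computes the mean and standard deviation of $\bar X$ from all $3^n$ samples and prints them next to $\sigma/\sqrt{n}$.

```python
import math
from itertools import product

population = {2: 0.4, 4: 0.5, 5: 0.1}
mu = sum(y * p for y, p in population.items())
sigma = math.sqrt(sum((y - mu) ** 2 * p for y, p in population.items()))

def exact_mean_and_se(n):
    """Mean and standard deviation of Xbar by complete enumeration."""
    m1 = m2 = 0.0  # first and second moments of Xbar
    for sample in product(population, repeat=n):
        prob = math.prod(population[x] for x in sample)
        xbar = sum(sample) / n
        m1 += prob * xbar
        m2 += prob * xbar**2
    return m1, math.sqrt(m2 - m1**2)

for n in (1, 2, 3, 4, 5):
    mean_xbar, se_xbar = exact_mean_and_se(n)
    print(n, round(mean_xbar, 6), round(se_xbar, 6), round(sigma / math.sqrt(n), 6))
```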
An Important Result About the Sampling
Distribution of the Sample Mean
When the population being sampled is a
normal population with mean $\mu$ and standard
deviation $\sigma$, then the sampling distribution of
the sample mean is also normal with mean $\mu$
and standard error $\sigma/\sqrt{n}$, for any sample
size n.
When the population is not normal, however,
the sampling distribution of the sample
mean need not be normal. But we have:
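A quick simulation is consistent with the normal-population result even for a small sample size. In this sketch the population N(100, 15²), the sample size n = 4, the seed, and the number of repetitions are all hypothetical choices. It checks the mean and standard deviation of the simulated $\bar X$ values and whether about 95% of them fall within 1.96 standard errors of $\mu$, as a normal distribution would imply. This is an illustration, not a proof.

```python
import random
import statistics

random.seed(2)            # arbitrary seed for reproducibility
mu, sigma = 100.0, 15.0   # hypothetical normal population parameters
n = 4                     # a small sample size
reps = 20_000             # number of simulated samples

# Sample means from repeated samples of size n from N(mu, sigma^2).
means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

se = sigma / n**0.5
print("mean of Xbar:", round(statistics.fmean(means), 2))
print("sd of Xbar:", round(statistics.stdev(means), 2), "vs sigma/sqrt(n) =", se)

# Rough normality check: about 95% of means within 1.96 standard errors of mu.
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print("fraction within 1.96 SE:", round(inside, 3))
```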
Central Limit Theorem
If a random sample of size n is taken from a
population or distribution with mean $\mu$ and standard
deviation $\sigma$, and if the sample size is large (n > 30),
then the sampling distribution of the sample mean is
approximately normal with mean $\mu$ and standard
deviation (or standard error) $\sigma/\sqrt{n}$. That is,
$$\bar X \text{ is approx. } N\left(\mu, \frac{\sigma^2}{n}\right).$$
Uses of the Central Limit Theorem
Because of this approximation, when computing
probabilities associated with the sample mean, we
can use the approximation given below which uses
the standard normal distribution.
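Note: $Z \sim N(0,1)$, the standard normal variable.
$$P(a \le \bar X \le b) \approx P\left(\frac{a-\mu}{\sigma/\sqrt{n}} \le Z \le \frac{b-\mu}{\sigma/\sqrt{n}}\right).$$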
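The approximation can be wrapped in a small Python helper built on the standard library's `statistics.NormalDist`. The function name `clt_prob_between` is hypothetical. Passing `float("-inf")` or `float("inf")` as a bound gives one-sided probabilities.

```python
from math import sqrt
from statistics import NormalDist

def clt_prob_between(a, b, mu, sigma, n):
    """Approximate P(a <= Xbar <= b) via the CLT normal approximation."""
    se = sigma / sqrt(n)      # standard error of Xbar
    z = NormalDist()          # standard normal Z ~ N(0, 1)
    return z.cdf((b - mu) / se) - z.cdf((a - mu) / se)

# Example: with mu = 0, sigma = 1, n = 1 this is P(-1.96 <= Z <= 1.96).
print(round(clt_prob_between(-1.96, 1.96, 0, 1, 1), 4))
```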
Applications of the CLT
Situation 1: Suppose we take a sample of
size n = 30 from the population described by
the probability function p(2) = 0.4, p(4) = 0.5,
p(5) = 0.1. This is the population we were
using earlier.
Question 1: We seek the approximate
probability that the sample mean is between
3.1 and 3.5.
Question 2: Find the approximate probability
that the sample mean is less than 2.6.
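Both questions can be worked with the CLT approximation in a short Python sketch, assuming $\mu = 3.3$ and $\sigma = 1.1$ from the population computed earlier and n = 30. The standard error is $1.1/\sqrt{30} \approx 0.2008$; Question 1 comes out to roughly 0.68, and Question 2 to well under 0.001.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.3, 1.1, 30   # population from earlier, sample size 30
se = sigma / sqrt(n)          # standard error of the sample mean
Z = NormalDist()              # standard normal

# Question 1: P(3.1 < Xbar < 3.5)
q1 = Z.cdf((3.5 - mu) / se) - Z.cdf((3.1 - mu) / se)

# Question 2: P(Xbar < 2.6)
q2 = Z.cdf((2.6 - mu) / se)

print(f"SE = {se:.4f}")
print(f"Q1 approx. {q1:.4f}")
print(f"Q2 approx. {q2:.6f}")
```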
Applications … continued
Situation 2: The systolic blood pressure
population data set has mean $\mu$ = 114.58 and
standard deviation $\sigma$ = 14.06. Its
distribution is not normal, as it is right-skewed.
Suppose we take a random sample of n = 50
people, and obtain the sample mean of their
systolic blood pressures.
Question 1: What is the approximate
probability that this sample mean will exceed
120?
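Applying the same approximation to Question 1 gives the sketch below, assuming only the stated $\mu$, $\sigma$, and n = 50. The standard error is $14.06/\sqrt{50} \approx 1.988$, the cutoff 120 standardizes to $z \approx 2.73$, and the approximate probability is about 0.003.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 114.58, 14.06, 50  # blood pressure population, sample size
se = sigma / sqrt(n)              # standard error of Xbar
z = (120 - mu) / se               # standardized cutoff
prob = 1 - NormalDist().cdf(z)    # P(Xbar > 120) approx. P(Z > z)
print(f"SE = {se:.4f}, z = {z:.4f}, P approx. {prob:.4f}")
```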
Continued ...
Sampling a Bernoulli Population
A Bernoulli population is one where there are only
two possible values or outcomes, called a “Success,”
denoted by the value of X = 1, and a “Failure,”
denoted by a value of X = 0. The probability of a
“Success” is denoted by p.
For such a population we have:
Mean = $\mu = p$;
Variance = $\sigma^2 = p(1-p)$.
Consider now taking a sample of size n from this
population and letting p̂ equal the proportion of
“successes” in the sample. That is,
Sample Proportion
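Because the Bernoulli observations are either
0 or 1 (with 1 representing “success”), the
sample proportion could be defined via:
$$\hat p = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar X,$$
so, by the Central Limit Theorem, for large n,
$$\hat p \text{ is approx. } N\left(\mu_{\hat p} = p,\ \sigma^2_{\hat p} = \frac{pq}{n}\right),$$
where $q = 1 - p$.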
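Since $\hat p$ is just the sample mean of 0/1 observations, its mean and variance can be checked by simulation. In this sketch the success probability p = 0.3, sample size n = 50, seed, and number of repetitions are hypothetical choices; the simulated mean and variance of $\hat p$ should land close to p and $pq/n$.

```python
import random
import statistics

random.seed(3)   # arbitrary seed for reproducibility
p, n = 0.3, 50   # hypothetical success probability and sample size
reps = 20_000    # number of simulated samples

# Each observation is 1 ("success") with probability p, else 0;
# the sample proportion is the sample mean of these 0/1 values.
p_hats = [
    statistics.fmean(1 if random.random() < p else 0 for _ in range(n))
    for _ in range(reps)
]

q = 1 - p
print("mean of p-hat:", round(statistics.fmean(p_hats), 4), "vs p =", p)
print("var of p-hat:", round(statistics.variance(p_hats), 5), "vs pq/n =", round(p * q / n, 5))
```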
An Application
Situation: One of the ways most Americans relieve
stress is to reward themselves with sweets.
According to one study, 46% admit to overeating
sweet foods when stressed. Suppose that the 46%
figure is correct and we take a random sample of size
n = 100 Americans and ask them if they overeat
sweets when they are stressed out.
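Under the stated assumption that the 46% figure is correct, the approximate sampling distribution of $\hat p$ for n = 100 can be computed in a couple of lines. The sketch gives mean 0.46 and standard error $\sqrt{(.46)(.54)/100} \approx 0.0498$.

```python
from math import sqrt

p, n = 0.46, 100            # assumed true proportion and sample size
q = 1 - p
mean_p_hat = p              # center of the sampling distribution of p-hat
se_p_hat = sqrt(p * q / n)  # standard error of p-hat
print(f"p-hat is approx. N(mean = {mean_p_hat}, SE = {se_p_hat:.4f})")
```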