Lecture On Sampling Distributions

The document discusses sampling distributions and how they relate to making statistical inferences about population parameters. It explains that sampling from a population can be viewed as a random experiment, with possible samples as outcomes. Sample statistics like the mean are then random variables with their own probability distributions called sampling distributions. These distributions have means and variances that can be used to infer properties of the population being sampled from. The document provides an example to illustrate these concepts.

Uploaded by shahidanahmad
Copyright: © Attribution Non-Commercial (BY-NC)

Sampling Distributions

Stat 515 Lecture


Inching Towards Inference
 Recall that one of our main goals is to make inferences
about the unknown parameters of the population or the
distribution, such as the mean $\mu$, the standard
deviation $\sigma$, or some other summary measure such
as the median.
 We now have possible models for the population,
which are provided by the probability distributions
(Binomial, Poisson, Normal, Uniform, others).
 We also know how to compute sample statistics such
as the sample mean and the sample standard deviation.
These sample statistics will be used to make
inferences about the parameters.
12/08/21 2
Sampling as a Random Experiment

 To understand the notion of a sampling distribution of
a sample statistic, it is important to realize that the
process of taking a sample from a population could be
viewed as a random experiment.
 To illustrate this idea, consider a population taking 3
values: 2, 4, 5 according to the following probability
distribution.
 Probability Function: p(2) = .4, p(4) = .5, p(5) = .1
 You may imagine that 40% of all the values in the
population equal 2; 50% equal 4; and 10% equal 5.

The Population

[Figure: the population, divided into 2's, 4's, and 5's]

Characteristics of the Population
 For this population, we have the parameters:
 $\mu = (2)(.4) + (4)(.5) + (5)(.1) = .8 + 2 + .5 = 3.3$
 $\sigma^2 = (2 - 3.3)^2(.4) + (4 - 3.3)^2(.5) + (5 - 3.3)^2(.1) = 1.21$
 $\sigma = (1.21)^{1/2} = 1.1$

 Its shape is given by the bar graph below:

[Figure: bar graph of the probability function, with bars at 2, 4, and 5; vertical axis from 0 to 0.6]
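The parameter calculations above can be checked with a few lines of Python. This is only a sketch: the dictionary `population` and the helper names `mean_of` and `variance_of` are illustrative choices, not part of the lecture.

```python
# Population from the slides, stored as {value: probability}.
population = {2: 0.4, 4: 0.5, 5: 0.1}


def mean_of(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def variance_of(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    m = mean_of(dist)
    return sum((value - m) ** 2 * prob for value, prob in dist.items())


mu = mean_of(population)
sigma2 = variance_of(population)
sigma = sigma2 ** 0.5

print("mean:", round(mu, 4))
print("variance:", round(sigma2, 4))
print("standard deviation:", round(sigma, 4))
```

The same two helpers work for any discrete distribution written as a value-to-probability mapping.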

Possible Outcomes of Sampling Process
 Now, consider the sampling process of taking n = 2
observations (with replacement) from this population or
distribution. Below is a table of possibilities.

Possible Samples   Probability of Sample   Sample Mean   Sample Variance
(2, 2)             (.4)(.4) = .16          2             0
(2, 4)             (.4)(.5) = .20          3             2
(2, 5)             (.4)(.1) = .04          3.5           4.5
(4, 2)             (.5)(.4) = .20          3             2
(4, 4)             (.5)(.5) = .25          4             0
(4, 5)             (.5)(.1) = .05          4.5           .5
(5, 2)             (.1)(.4) = .04          3.5           4.5
(5, 4)             (.1)(.5) = .05          4.5           .5
(5, 5)             (.1)(.1) = .01          5             0
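A table like this one can be generated by enumerating every ordered sample. The sketch below assumes draws are independent (sampling with replacement), so each sample's probability is the product of the probabilities of its values. The variable names are illustrative.

```python
from itertools import product
from statistics import mean, variance

# Population from the slides, stored as {value: probability}.
population = {2: 0.4, 4: 0.5, 5: 0.1}
n = 2  # sample size

rows = []
for sample in product(population, repeat=n):
    # Independent draws: multiply the probabilities of the observations.
    prob = 1.0
    for value in sample:
        prob *= population[value]
    # statistics.variance uses the n - 1 divisor, as in the slides.
    rows.append((sample, prob, mean(sample), variance(sample)))

for sample, prob, xbar, s2 in rows:
    print(sample, round(prob, 2), xbar, s2)
```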

Some Points about the Preceding Table
 Since we are sampling with replacement, to obtain the
probability of each possible sample, we simply
multiply the probabilities of each of the observations
(Think of a tree diagram!).
 The 9 possible samples represent the elementary
events of the experiment of taking a sample of size 2
from the population or distribution.
 The sample mean ($\bar{X}$) is obtained the usual way.
 The sample variance is computed the usual way. For
example, for the second sample, we have
 $S^2 = [(2-3)^2 + (4-3)^2]/(2-1) = [1 + 1]/1 = 2$
Sample Statistics as Random Variables
 Since the sample mean and the sample variance are
numerical characteristics of each of the possible
samples, they can be viewed as random variables in
this sampling experiment.
 Therefore, we could obtain the probability
distributions of the sample mean and sample
variance.
 These probability distributions are called sampling
distributions.
 Thus we will have the sampling distribution of the
sample mean, as well as the sample variance.
Sampling Distribution of the Sample Mean
 From the earlier table, we could construct the
probability distribution of the sample mean, now
called the sampling distribution of the sample mean.
 This is given by the following table.

$\bar{X}$    $P(\bar{X})$         $\bar{X} \cdot P(\bar{X})$    $(\bar{X} - 3.3)^2 P(\bar{X})$
2            .16                  0.32                          .2704
3            .20 + .20 = .40      1.20                          .0360
3.5          .04 + .04 = .08      0.28                          .0032
4            .25                  1.00                          .1225
4.5          .05 + .05 = .10      0.45                          .1440
5            .01                  0.05                          .0289
Sums         1.00                 3.3                           .6050
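The sampling distribution in this table can be obtained by adding up the probabilities of all samples that share the same sample mean. The following is a minimal sketch assuming independent draws; `sampling_dist`, `mean_xbar`, and `var_xbar` are illustrative names.

```python
from collections import defaultdict
from itertools import product

# Population from the slides, stored as {value: probability}.
population = {2: 0.4, 4: 0.5, 5: 0.1}
n = 2  # sample size

# Accumulate P(sample mean = x) over all ordered samples.
sampling_dist = defaultdict(float)
for sample in product(population, repeat=n):
    prob = 1.0
    for value in sample:
        prob *= population[value]
    sampling_dist[sum(sample) / n] += prob

# Mean and variance of the sampling distribution itself.
mean_xbar = sum(x * p for x, p in sampling_dist.items())
var_xbar = sum((x - mean_xbar) ** 2 * p for x, p in sampling_dist.items())

for x in sorted(sampling_dist):
    print(x, round(sampling_dist[x], 2))
print("mean:", round(mean_xbar, 4), "variance:", round(var_xbar, 4))
```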

Graph of the Sampling Distribution of
the Sample Mean
[Figure: sampling distribution of the sample mean based on a sample of size n = 2; horizontal axis XBar from 2 to 5, vertical axis P(XBar) from 0.0 to 0.4]

 Note that it has become more concentrated near the
population mean of 3.3, compared to the original
distribution.
Parameters of the Sampling Distribution
 Because the sampling distribution is just like any
other probability distribution, we are also able to
obtain its mean, variance, and standard deviation.
 Thus, for the sampling distribution of the sample
mean, we find the mean to be 3.3, which coincides
with the original population mean; while
 the variance of the sampling distribution of the
sample mean turns out to be equal to .605, which is
equal to (1.21)/2, the population variance divided by
the sample size.
 The standard deviation of the sample mean, now
called the standard error (SE), is $(.605)^{1/2} = .7778$.
Recapitulation
 Sampling from a probability distribution or population
could be viewed as a random experiment, and the
elementary outcomes are the possible samples.
 Sample statistics, such as the sample mean, could
be viewed as random variables, and as such have
their associated probability distributions, which are
called sampling distributions.
 The sampling distribution also has a mean.
 And it also has a variance.
 The standard deviation of the sampling distribution is
called the standard error (SE).

Sampling Distribution of the Sample Mean

 The mean of the sampling distribution of the sample
mean equals the population mean.

 The variance of the sampling distribution of the
sample mean equals the population variance divided
by the sample size.

 These two characteristics are always true for the
sampling distribution of the sample mean when
sampling with replacement.
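A short derivation shows why these two facts hold. It is a sketch under the assumption that $X_1, \ldots, X_n$ are independent draws (as in sampling with replacement), each with mean $\mu$ and variance $\sigma^2$.

```latex
% Assumes X_1, ..., X_n are independent, each with mean \mu and variance \sigma^2.
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}(n\mu) = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.
\end{align*}
```

The variance step uses independence, which lets the variance of the sum be written as the sum of the variances. For the example population with $\sigma^2 = 1.21$ and $n = 2$, this gives $1.21/2 = .605$.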
Obtaining Sampling Distributions
 In the example considered, we obtained the sampling
distribution of the sample mean by enumerating all
the possible samples that could arise.

 However, such a method is not feasible if the sample
size is large. For instance, if n = 10, then there will
be a total of (3)(3)(3)…(3) = $3^{10}$ = 59049 possible
samples, and complete enumeration is no longer
practical.

 How do we obtain sampling distributions?

Some Methods for Obtaining Sampling
Distributions of Statistics
 Complete enumeration, if possible.
 Computer simulation or via the Monte Carlo method.
In this method the computer generates many, many
samples, and then constructs the probability
histogram of the values of the statistic of interest. This
will provide an empirical approximation.
 Using theoretical results such as, for instance, when
sampling from a Bernoulli population the number of
successes is binomially-distributed.
 Using theoretical approximations such as the Central
Limit Theorem or the de Moivre approximation.
Illustrating the Monte Carlo Method

 We illustrate the use of the simulation or Monte Carlo
method by approximating the sampling distribution of
the sample mean based on n = 10 observations from
the population considered earlier which has:
 p(2) = .4, p(4) = .5, p(5) = .1
 We generate 500 samples of size n = 10 from this
population, and for each sample we compute the
sample mean.
 This simulation was done using Minitab.
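The lecture's simulation was run in Minitab. As a rough equivalent, the sketch below uses Python's standard library instead. The seed value is an arbitrary assumption made only for reproducibility, so the simulated numbers will not match the Minitab run exactly.

```python
import random
from statistics import mean, stdev

random.seed(515)  # arbitrary seed (an assumption), used only for reproducibility

values = [2, 4, 5]
probs = [0.4, 0.5, 0.1]
n = 10             # observations per sample
num_samples = 500  # number of simulated samples

# Draw each sample with replacement and record its sample mean.
sample_means = [
    mean(random.choices(values, weights=probs, k=n))
    for _ in range(num_samples)
]

print("mean of the sample means:", round(mean(sample_means), 4))
print("std. dev. of the sample means:", round(stdev(sample_means), 4))
```

A histogram of `sample_means` then serves as an empirical approximation to the sampling distribution.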

First 10 of the 500 Generated Samples
 The table below shows the first 10 samples of size n
= 10 that were generated from the population.
 Also included are their corresponding sample means.
Population:
y    p(y)
2    0.4
4    0.5
5    0.1

Sample   x1  x2  x3  x4  x5  x6  x7  x8  x9  x10   SampleMean
1        4   2   2   5   4   2   4   2   2   4     3.1
2        4   4   2   2   5   4   4   4   2   2     3.3
3        4   2   2   2   2   2   4   4   4   5     3.1
4        2   2   4   2   2   2   2   2   2   2     2.2
5        2   5   2   4   4   4   2   2   5   2     3.2
6        4   4   4   2   2   4   2   2   2   4     3.0
7        4   4   4   4   2   2   5   2   2   4     3.3
8        2   2   2   4   2   2   4   2   2   2     2.4
9        2   4   2   2   4   5   5   2   2   2     3.0
10       2   5   4   4   2   2   4   4   4   2     3.3
Relative Frequency Histogram of the 500
Sample Means
[Figure: simulated sampling distribution of the sample mean based on 10 observations when sampling from the population p(2) = .4, p(4) = .5, and p(5) = .1; horizontal axis: Sample Mean (2 to 4); vertical axis: Relative Frequency (in %)]

Points to Ponder
 This relative frequency histogram of the simulated
sample means serves as an approximation to the
sampling distribution of the sample mean when n =
10 and when sampling from the given population.
 Notice that the values of the sample means are now
clustered around the population mean of 3.3, and
furthermore, the shape of the histogram is almost
bell-shaped.
 This histogram also shows that the chance of getting
a sample of size n = 10 whose sample mean is less
than 2.5 or greater than 4.5 is rather small.
 When the mean of the 500 sample means is
computed, it turns out to be 3.3094. [Their median is
exactly 3.30!]
 Recall that the population mean is 3.30.
 The standard deviation of the 500 sample means turns
out to be 0.3497.
 Recall that the population standard deviation is
$(1.21)^{1/2} = 1.1$, so

$$\frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{10}} = \frac{1.1}{3.1622} = .3478.$$
 We therefore note that the mean of the simulated
sample means is very close to the population mean,
and
 the standard deviation of the simulated sample
means is also very close to the population standard
deviation divided by the square root of the sample
size.
 Indeed, we always have the theoretical results:

$$\mu_{\bar{X}} = \text{Mean of } \bar{X} = \mu$$

$$\sigma_{\bar{X}} = \text{Std. Error of } \bar{X} = \frac{\sigma}{\sqrt{n}}$$
An Important Result About the Sampling
Distribution of the Sample Mean
 When the population being sampled is a
normal population with mean $\mu$ and standard
deviation $\sigma$, then the sampling distribution of
the sample mean is also normal with mean $\mu$
and standard error of $\sigma/n^{1/2}$, for any sample
size n.
 When the population is not normal, however,
then the sampling distribution of the sample
mean need not be normal. But we have:
Central Limit Theorem
 If a random sample of size n is taken from a
population or distribution with mean $\mu$ and standard
deviation $\sigma$, and if the sample size is large (n > 30),
then the sampling distribution of the sample mean is
approximately normal with mean $\mu$ and standard
deviation (or standard error) of $\sigma/n^{1/2}$. That is,

$$\bar{X} \text{ is approx. } N\left(\mu, \frac{\sigma^2}{n}\right).$$
Uses of the Central Limit Theorem
 Because of this approximation, when computing
probabilities associated with the sample mean, we
can use the approximation given below which uses
the standard normal distribution.
 Note: $Z \sim N(0,1)$, the standard normal variable.

$$P(a < \bar{X} < b) \approx P\left(\frac{a - \mu}{\sigma/\sqrt{n}} < Z < \frac{b - \mu}{\sigma/\sqrt{n}}\right).$$

Applications of the CLT
 Situation 1: Suppose we take a sample of
size n = 30 from the population described by
the probability function p(2) = 0.4, p(4) = 0.5,
p(5) = 0.1. This is the population we were
using earlier.
 Question 1: We seek the approximate
probability that the sample mean is between
3.1 and 3.5.
 Question 2: Find the approximate probability
that the sample mean is less than 2.6.
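One way to carry out these two calculations is with the normal approximation from the Central Limit Theorem. The sketch below uses the parameters computed earlier for this population (mean 3.3, standard deviation 1.1). The variable names are illustrative, and the results are approximations, not exact probabilities.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.3, 1.1, 30  # population mean, population std. dev., sample size

# CLT approximation: the sample mean is roughly N(mu, sigma / sqrt(n)).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# Question 1: P(3.1 < sample mean < 3.5)
p_between = xbar_dist.cdf(3.5) - xbar_dist.cdf(3.1)

# Question 2: P(sample mean < 2.6)
p_below = xbar_dist.cdf(2.6)

print("Question 1:", round(p_between, 4))
print("Question 2:", round(p_below, 6))
```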
Applications … continued
 Situation 2: The systolic blood pressure
population data set has mean $\mu$ = 114.58 and
standard deviation of $\sigma$ = 14.06. Its
distribution is not normal as it is right-skewed.
Suppose we take a random sample of n = 50
people, and obtain the sample mean of their
systolic blood pressures.
 Question 1: What is the approximate
probability that this sample mean will exceed
120?
Continued ...

 Question 2: What would be the value of A
such that the probability that the sample
mean of the systolic blood pressures of a
sample of size 50 is greater than A is 0.95?
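Both blood pressure questions can be approached with the same normal approximation. This is a sketch only: it relies on n = 50 being large enough for the Central Limit Theorem to apply to this right-skewed population, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 114.58, 14.06, 50  # values stated in Situation 2

# CLT approximation: the sample mean is roughly N(mu, sigma / sqrt(n)).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# Question 1: P(sample mean > 120)
p_exceeds_120 = 1 - xbar_dist.cdf(120)

# Question 2: P(sample mean > A) = 0.95 means A is the 5th percentile.
cutoff_a = xbar_dist.inv_cdf(0.05)

print("Question 1:", round(p_exceeds_120, 4))
print("Question 2: A =", round(cutoff_a, 2))
```

For Question 2, the value A sits below the mean, because 95% of the distribution must lie above it.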

Sampling a Bernoulli Population
 A Bernoulli population is one where there are only
two possible values or outcomes, called a “Success,”
denoted by the value of X = 1, and a “Failure,”
denoted by a value of X = 0. The probability of a
“Success” is denoted by p.
 For such a population we have:
 Mean = $\mu$ = p;
 Variance = $\sigma^2$ = p(1-p).
 Consider now taking a sample of size n from this
population and letting p̂ equal the proportion of
“successes” in the sample. That is,

Sample Proportion
 Because the Bernoulli observations are either
0 or 1 (with 1 representing “success”), the
sample proportion can be defined via:

$$\hat{p} = \frac{\text{Number of ``Successes''}}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}.$$
Sampling Distribution of the Sample
Proportion
 Since the sample proportion is the sample mean of
the observations from a Bernoulli population, by the
Central Limit Theorem, it follows that the sampling
distribution of the sample proportion, when the
sample size is large (that is n > 30), is approximately
normal with mean of $p$ and SE of $[p(1-p)/n]^{1/2}$.

$$\hat{p} \text{ is approx. } N\left(\mu_{\hat{p}} = p,\ \sigma^2_{\hat{p}} = \frac{pq}{n}\right), \text{ where } q = 1 - p.$$
An Application
 Situation: One of the ways most Americans relieve
stress is to reward themselves with sweets.
According to one study, 46% admit to overeating
sweet foods when stressed. Suppose that the 46%
figure is correct and we take a random sample of size
n = 100 Americans and ask them if they overeat
sweets when they are stressed out.

 Question 1: What is the probability that the
proportion who overeat sweets in this sample
exceeds 0.50?
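A possible way to answer this question is to treat the sample proportion as approximately normal with mean p and standard error $[p(1-p)/n]^{1/2}$. The sketch below does that with no continuity correction, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.46, 100  # assumed true proportion and sample size from the situation

# Normal approximation to the sampling distribution of the sample proportion.
standard_error = sqrt(p * (1 - p) / n)
phat_dist = NormalDist(p, standard_error)

# P(sample proportion > 0.50)
p_exceeds_half = 1 - phat_dist.cdf(0.50)

print("standard error:", round(standard_error, 4))
print("P(p-hat > 0.50):", round(p_exceeds_half, 4))
```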

