TOAE201-LecturerNotes-Chapter 4. Sample Theoretical Basis


CHAPTER 4

SAMPLE THEORETICAL BASIS

4.1 SAMPLING FROM A POPULATION
4.2. SAMPLING DISTRIBUTIONS OF SAMPLE MEANS
4.3. SAMPLING DISTRIBUTION OF THE PROPORTION
4.4. SAMPLING DISTRIBUTIONS OF SAMPLE VARIANCES

Textbook: Paul Newbold, William L. Carlson, Betty Thorne, 2010, Statistics for Business
and Economics, 7th edition, Pearson.

4.1 SAMPLING FROM A POPULATION

Reasons for Sampling


There are several practical reasons for studying a sample, rather than the whole
population, in order to estimate a characteristic of that population:

* Time
It may take too much time to contact the whole population. Even if you could contact
everyone, the results might be out of date and therefore meaningless, e.g. if it took
2 years to poll the voting population during the 1-year run-up to an election.

* Cost
The cost may be prohibitive, e.g. polling all 90 million people who vote in a general
election would be astronomically expensive.

* May be Physically Impossible
It may be impossible to track down the whole population, e.g. it would be difficult to
find all 90 million voters in the run-up to a general election.

* Testing May Destroy Population
Crash-testing every motor vehicle to see how well it stands up would leave no motor
vehicles to drive.

Chapter 4 – TOAE201 Lecture Notes by Vuong Thi Thao Binh
(Some contents of this slide are based on Lecture Notes by Panayiotis Skordi,
Fullerton University)

* Results from the Sample are Adequate
The additional accuracy gained from testing a whole population rather than a sample may
not justify the additional time, cost or effort expended in doing so.

Sampling Methods
When sampling we must ensure that we choose a sample which is representative of the
whole population.

The sampling methods that follow are just a few ways in which sampling can be carried
out. Other methods also exist which are not discussed here.

Simple Random Sampling


A simple random sample of size n from a finite population of size N is a sample selected
such that each possible sample of size n has the same probability of being selected.

A sample is selected so that each item or person in the population has the same chance of
being selected.

Example
For example, market-research groups may use random numbers to select telephone
numbers to call and ask about preferences for a product. Various statistical computer
packages and spreadsheets have routines for obtaining random numbers, and these are
used for sampling studies.
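
As a minimal sketch of such a routine, the Python snippet below draws a simple random sample with the standard library's `random.sample`, which selects without replacement so that every subset of size n is equally likely. The numbered sampling frame, the sizes N = 2000 and n = 100, and the seed are illustrative assumptions, not part of the notes.

```python
import random

# Hypothetical sampling frame: units labelled 1..N (an assumption for illustration).
N = 2000
n = 100
population = list(range(1, N + 1))

rng = random.Random(42)          # fixed seed only so the sketch is reproducible
srs = rng.sample(population, n)  # simple random sample, drawn without replacement

print(sorted(srs)[:10])          # the first few selected unit labels
```

Because the draw is without replacement, no unit can appear twice in the sample.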

Systematic Random Sampling


A random starting point is selected, and then every kth member of the population is
selected.

The starting point is a random number between 1 and k. We then pick every kth member
after that.

k is calculated as the population size N divided by the sample size n. If k is not a
whole number, round it down to the nearest whole number.

Example
If we had a population of size 2,000 and we wanted to choose a sample of size 100, then

$$k = \frac{2{,}000}{100} = 20.$$

We would then choose a random number between 1 and 20 as our starting point.

If we choose 18 as our random starting point, then starting with the 18th observation,
every 20th observation (18, 38, 58, ...) would be chosen. We would end up with a
sample of 100 observations.
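
The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_sample` is hypothetical, and positions are taken to be 1-based labels in an ordered list of the population.

```python
import random

def systematic_sample(N, n, rng=random):
    """Return the 1-based positions chosen by systematic sampling."""
    k = N // n                      # sampling interval, rounded down
    start = rng.randint(1, k)       # random start between 1 and k inclusive
    return [start + i * k for i in range(n)]

# Reproducing the example: start = 18 and k = 20 give 18, 38, 58, ...
positions = [18 + i * 20 for i in range(100)]
print(positions[:3], positions[-1], len(positions))

# A fresh draw with a randomly chosen starting point.
print(systematic_sample(2000, 100)[:5])
```

With a start of 18 the last selected position is 1,998, so all 100 observations fall inside the population of 2,000.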

Stratified Random Sampling


The population is divided into subgroups called strata, and a simple random sample is
selected from each stratum. The best results are obtained when the elements within each
stratum are much alike (homogeneous).

This is used to guarantee that each group is represented in the sample.

Example
Consider the advertising expenditure for the largest 352 companies in the United States.
Suppose we wanted to study whether firms with high returns on equity, spent more of
each sales dollar on advertising than firms with a low return or deficit.

Stratum   Profitability (return on equity)   Number of Firms   Relative Frequency   Number Sampled
1         30% and over                         8                 0.02                  1
2         20% up to 30%                       35                 0.10                  5
3         10% up to 20%                      189                 0.54                 27
4         0% up to 10%                       115                 0.33                 16
5         Deficit                              5                 0.01                  1
Total                                        352                 1.00                 50


If we used simple random sampling, about 87% of the sample would be expected to come
from the 3rd and 4th strata (0.54 + 0.33 = 0.87), whereas firms in the 1st and 5th
strata would have little chance of appearing and may well not be chosen at all.

If we want a sample of 50 firms, we can guarantee representation from each group by
randomly choosing the following numbers of firms (rounded to whole firms, using the
unrounded relative frequencies where the rounded ones fall on a half):
50 x 0.02 = 1 from the 1st stratum
50 x 0.10 = 5 from the 2nd stratum
50 x 0.54 = 27 from the 3rd stratum
50 x 0.33 ≈ 16 from the 4th stratum
50 x 0.01 ≈ 1 from the 5th stratum
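
The proportional allocation can be checked with a short Python sketch. It works from the stratum counts in the table rather than the rounded relative frequencies; the variable names are illustrative. Note that rounding each stratum separately happens to sum to 50 here, which is not guaranteed in general.

```python
# Number of firms in each stratum, taken from the table in this example.
stratum_sizes = {1: 8, 2: 35, 3: 189, 4: 115, 5: 5}
sample_size = 50

total = sum(stratum_sizes.values())            # 352 firms in the population
allocation = {
    stratum: round(sample_size * count / total)
    for stratum, count in stratum_sizes.items()
}

print(allocation)
print(sum(allocation.values()))
```

A simple random sample of the allocated size would then be drawn within each stratum.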

Cluster Sampling
A population is divided into clusters using naturally occurring geographic or other
boundaries. Ideally each cluster is a representative small scale version of the population
(i.e. heterogeneous). A simple random sample of the clusters is then chosen. All elements
within each sampled (chosen) cluster form the sample.

So here, we will not have all clusters (groups) represented in our sample.

Sampling “Error”
Sampling error is the difference between a sample statistic and its corresponding
population parameter. In the case of the mean, it is

$$\bar{X} - \mu$$

Where:
$\bar{X}$ = the mean of the sample
$\mu$ = the population mean
Samples are used to estimate population characteristics. For example the mean of a
sample is used to estimate the mean of the population. However, since the sample is only
part of the population, it is unlikely that the sample mean will be exactly equal to the
population mean.

Likewise, the sample standard deviation is unlikely to be exactly the same as the
population standard deviation.

We would not be surprised then if the sample statistic is different from the corresponding
population parameter.

This difference is called the sampling error.

Example
Consider the population of 5 employees at Spurs Industries.
Last week the output for each employee was 97, 103, 96, 99 and 105 units.

Suppose we select two employees whose output was 97 and 105. The mean of this sample is

$$\frac{97 + 105}{2} = 101.$$

Suppose we select another two employees whose output was 103 and 96. The mean of this
sample is

$$\frac{103 + 96}{2} = 99.5.$$

We know, however, that the mean of the population is

$$\frac{97 + 103 + 96 + 99 + 105}{5} = 100.$$

The sampling error for the first sample is 101 - 100 = 1. This was found from
$\bar{X} - \mu$, where $\bar{X}$ = sample mean and $\mu$ = population mean.
The sampling error for the second sample is 99.5 - 100 = -0.5.

Each of these differences, 1.0 and -0.5, is the sampling error made in estimating the
population mean based on the sample mean.

Each of the possible samples of size 2 has an equal chance of selection. Each sample may
have a different sample mean and hence, sampling error. The value of the sampling error
is based on the random selection of the sample. Therefore, sampling errors are random
and occur by chance.
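
To see how the sampling error varies from sample to sample, the sketch below lists every possible sample of size 2 from the five outputs and its sampling error. It is an illustration only, and the variable names are assumptions. Averaged over all ten equally likely samples, the sampling errors cancel out.

```python
from itertools import combinations
from statistics import mean

outputs = [97, 103, 96, 99, 105]        # weekly output of the 5 employees
mu = mean(outputs)                       # population mean = 100

# Every possible sample of size 2 and its sampling error (sample mean - mu).
errors = {pair: mean(pair) - mu for pair in combinations(outputs, 2)}

for pair, err in errors.items():
    print(pair, err)

print("average sampling error:", mean(list(errors.values())))
```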

4.2. SAMPLING DISTRIBUTIONS OF SAMPLE MEANS

Here our random variable will be a mean. Each observation represents the average of a
sample of size n.

Organizing the means of all possible samples of size n, into a probability distribution
would result in us obtaining the sampling distribution of the sample mean.

Example-Constructing Sampling Distribution of Sample Mean
Yerani Industries has seven production employees (the population). The hourly earnings
for each employee are given in the table below.

Employee Hourly Earnings

Joe $7
Sam $7
Sue $8
Bob $8
Jan $7
Art $8
Ted $9

What is the sampling distribution of the sample mean of the samples of size 2?

To arrive at the sampling distribution of the sample mean, all possible samples of size 2
need to be selected without replacement from the population, and their means computed.

There are 21 possible samples:

$$_7C_2 = \frac{7!}{2!\,5!} = 21.$$

Listed below are the 21 sample means from all samples of size 2.

Sample   Employees   Earnings   Mean
1        Joe, Sam    7, 7       7.00
2        Joe, Sue    7, 8       7.50
3        Joe, Bob    7, 8       7.50
4        Joe, Jan    7, 7       7.00
5        Joe, Art    7, 8       7.50
6        Joe, Ted    7, 9       8.00
7        Sam, Sue    7, 8       7.50
8        Sam, Bob    7, 8       7.50
9        Sam, Jan    7, 7       7.00
10       Sam, Art    7, 8       7.50
11       Sam, Ted    7, 9       8.00
12       Sue, Bob    8, 8       8.00
13       Sue, Jan    8, 7       7.50
14       Sue, Art    8, 8       8.00
15       Sue, Ted    8, 9       8.50
16       Bob, Jan    8, 7       7.50
17       Bob, Art    8, 8       8.00
18       Bob, Ted    8, 9       8.50
19       Jan, Art    7, 8       7.50
20       Jan, Ted    7, 9       8.00
21       Art, Ted    8, 9       8.50

Sampling Distribution of the Sample Mean for n=2

Sample Mean   Number of Means   Probability
$7.00          3                0.1429
$7.50          9                0.4286
$8.00          6                0.2857
$8.50          3                0.1429
Total         21                1.0000
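
The same sampling distribution can be built by brute force. The sketch below enumerates all 21 samples with `itertools.combinations` and tallies the sample means, using exact fractions to avoid rounding; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

earnings = {"Joe": 7, "Sam": 7, "Sue": 8, "Bob": 8, "Jan": 7, "Art": 8, "Ted": 9}

# All samples of size 2 drawn without replacement: C(7, 2) = 21 of them.
samples = list(combinations(earnings, 2))
counts = Counter(Fraction(earnings[a] + earnings[b], 2) for a, b in samples)

for value in sorted(counts):
    prob = Fraction(counts[value], len(samples))
    print(f"${float(value):.2f}  {counts[value]:2d}  {float(prob):.4f}")

# The mean of the sampling distribution equals the population mean (54/7).
mean_of_means = sum(v * c for v, c in counts.items()) / len(samples)
print(mean_of_means)
```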
Sampling Distribution of the Mean of Two Dice

Let us consider rolling a fair die an infinite number of times. We know that the possible
outcomes are 1, 2, 3, 4, 5, 6. The probability distribution of the random variable X is:

X 1 2 3 4 5 6
P(X) 1/6 1/6 1/6 1/6 1/6 1/6

The mean of this population is 3.5, from:

$$\mu = \sum x P(x) = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = 3.5$$

The variance is 2.92, from:

$$\sigma^2 = \sum (x - \mu)^2 P(x) = (1 - 3.5)^2\left(\tfrac{1}{6}\right) + (2 - 3.5)^2\left(\tfrac{1}{6}\right) + (3 - 3.5)^2\left(\tfrac{1}{6}\right) + (4 - 3.5)^2\left(\tfrac{1}{6}\right) + (5 - 3.5)^2\left(\tfrac{1}{6}\right) + (6 - 3.5)^2\left(\tfrac{1}{6}\right) = 2.92$$

[Figure: The distribution of X]

We can create the sampling distribution of the mean of two dice by drawing samples of
size 2 from the population. For each sample of two dice we add their scores and divide by
2, that is, take the average. This constructs a new random variable $\bar{x}$.
Each sample mean can then be recorded. Using classical probability, this leads to the
following table:

Sample # Sample $\bar{x}$
1 1, 1 1.0
2 1, 2 1.5
3 1, 3 2.0
4 1, 4 2.5
5 1, 5 3.0
6 1, 6 3.5
7 2, 1 1.5
8 2, 2 2.0
9 2, 3 2.5
10 2, 4 3.0
11 2, 5 3.5
12 2, 6 4.0
13 3, 1 2.0
14 3, 2 2.5
15 3, 3 3.0
16 3, 4 3.5
17 3, 5 4.0
18 3, 6 4.5
19 4, 1 2.5
20 4, 2 3.0
21 4, 3 3.5
22 4, 4 4.0
23 4, 5 4.5
24 4, 6 5.0
25 5, 1 3.0
26 5, 2 3.5
27 5, 3 4.0
28 5, 4 4.5
29 5, 5 5.0
30 5, 6 5.5
31 6, 1 3.5
32 6, 2 4.0
33 6, 3 4.5
34 6, 4 5.0
35 6, 5 5.5
36 6, 6 6.0

There are 36 possible samples of size 2. Each sample outcome is equally likely and has a
probability of 1/36 of occurring. $\bar{x}$ can assume only 11 different possible values
(1.0, 1.5, ..., 6.0), with certain values of $\bar{x}$ occurring more frequently than others.

We can construct the sampling distribution of our new random variable $\bar{x}$.


Sampling Distribution of $\bar{x}$

$\bar{x}$ P($\bar{x}$)
1.0 1/36
1.5 2/36
2.0 3/36
2.5 4/36
3.0 5/36
3.5 6/36
4.0 5/36
4.5 4/36
5.0 3/36
5.5 2/36
6.0 1/36

The value $\bar{x}$ = 1.0 occurs only once, so its probability is 1/36. The value
$\bar{x}$ = 3.5 occurs 6 times, so its probability is 6/36.

The mean of the sampling distribution of $\bar{x}$ is 3.5, from:

$$\mu_{\bar{x}} = \sum \bar{x} P(\bar{x}) = 1.0\left(\tfrac{1}{36}\right) + 1.5\left(\tfrac{2}{36}\right) + 2.0\left(\tfrac{3}{36}\right) + \cdots + 5.0\left(\tfrac{3}{36}\right) + 5.5\left(\tfrac{2}{36}\right) + 6.0\left(\tfrac{1}{36}\right) = 3.5$$

The variance is 1.46, from:

$$\sigma^2_{\bar{x}} = \sum (\bar{x} - \mu_{\bar{x}})^2 P(\bar{x}) = (1.0 - 3.5)^2\left(\tfrac{1}{36}\right) + (1.5 - 3.5)^2\left(\tfrac{2}{36}\right) + \cdots + (6.0 - 3.5)^2\left(\tfrac{1}{36}\right) = 1.46$$

Note that the mean of 3.5 is the same as the mean of the population of tossing a die.

Further, note that the variance of the sampling distribution of $\bar{x}$, where n = 2,
is 1.46, which is exactly half the variance of the population of the toss of a die (2.92).
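
These values can be verified exactly. The following sketch enumerates the 36 ordered pairs and computes the mean and variance of both distributions with exact fractions; the helper name `mean_and_variance` is illustrative, not from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# Population: one roll of a fair die.
population = {Fraction(x): Fraction(1, 6) for x in faces}

# Sampling distribution of the mean of two rolls (36 equally likely ordered pairs).
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))
sampling = {xbar: Fraction(c, 36) for xbar, c in counts.items()}

mu, var = mean_and_variance(population)
mu_xbar, var_xbar = mean_and_variance(sampling)
print(mu, var)            # 7/2 35/12
print(mu_xbar, var_xbar)  # 7/2 35/24
```

Here 35/12 is about 2.92 and 35/24 is about 1.46, so the variance of the mean of two rolls is exactly half the population variance.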

[Figure: Sampling distribution of $\bar{x}$ for n = 2]

Compare this to the distribution of X. They are quite different distributions.

[Figure: The distribution of X]
Repeating the experiment with larger sample sizes n, the sampling distribution tends to
resemble a normal probability distribution.

[Figure: Sampling distribution of $\bar{x}$ for n = 5]
Mean of $\bar{x}$ = 3.5 and variance of $\bar{x}$ = 2.92/5

[Figure: Sampling distribution of $\bar{x}$ for n = 10]
Mean of $\bar{x}$ = 3.5 and variance of $\bar{x}$ = 2.92/10

[Figure: Sampling distribution of $\bar{x}$ for n = 25]
Mean of $\bar{x}$ = 3.5 and variance of $\bar{x}$ = 2.92/25

For each value of n, the mean of the sampling distribution is exactly the same as the
population mean.

That is:

Mean of the Population = Mean of the Sampling Distribution of sample mean

And after further investigation we see that the variance of the sampling distribution of
the sample mean is

$$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$$

and the standard deviation is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$

This is known as the standard error of the mean.

Where:
$\sigma$ = the standard deviation of the population
$n$ = the size of the sample

Also as the sample size n, increases, the sample means tend to cluster around the true
population mean.
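
A small simulation illustrates both properties for the die example. This is a hedged sketch rather than part of the original notes: the number of repetitions, the seed, and the function name `simulate_sample_means` are arbitrary choices. For each n, the simulated mean should be close to 3.5 and the simulated variance close to 2.92/n.

```python
import random
from statistics import fmean, pvariance

def simulate_sample_means(n, reps, rng):
    """Simulate `reps` sample means, each from n rolls of a fair die."""
    return [fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]

rng = random.Random(0)            # seed fixed for reproducibility (arbitrary choice)
sigma2 = 35 / 12                  # population variance of one die, about 2.92

results = {}
for n in (2, 5, 10, 25):
    means = simulate_sample_means(n, 20_000, rng)
    results[n] = (fmean(means), pvariance(means))
    # columns: n, simulated mean, simulated variance, theoretical variance
    print(n, round(results[n][0], 3), round(results[n][1], 3), round(sigma2 / n, 3))
```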

We now develop important properties of the sampling distribution of the sample mean.
Our analysis begins with a random sample of n observations from a very large population with
mean $\mu$ and variance $\sigma^2$; the sample observations are random variables
$X_1, X_2, \ldots, X_n$. Before the sample is observed, there is uncertainty about the outcomes.
This uncertainty is modeled by viewing the individual observations as random variables from a
population with mean $\mu$ and variance $\sigma^2$. Our primary interest is in making
inferences about the population mean $\mu$. An obvious starting point is the sample mean,

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

At this point we cannot determine the shape of the sampling distribution, but we can determine
the mean and variance of the sampling distribution from basic definitions we learned in
Chapters 2 and 3.
In Chapters 2 and 3 we saw that the expectation of a linear combination of random variables is
the linear combination of the expectations:

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu}{n} = \mu$$

In Chapters 2 and 3 we also saw that the variance of a linear combination of independent random
variables is the sum of the squared linear coefficients times the variances of the random
variables. It follows that

$$\mathrm{Var}(\bar{X}) = \sum_{i=1}^{n} \left(\frac{1}{n}\right)^2 \sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Central Limit Theorem
The sampling distribution of the mean of a random sample drawn from any population is
approximately normal for a sufficiently large sample size, typically taken to be at least 30
observations. The larger the sample size, the more closely the sampling distribution will
resemble a normal distribution.

This means that as the sample size, n, gets larger, the sample means tend to follow a
normal probability distribution and tend to cluster around the true population mean. This
holds regardless of the distribution of the population from which the sample was drawn.

In summary, whatever the distribution from which a random sample is drawn, the sampling
distribution of the sample mean will be normal, or approximately normal, under the
following conditions:

1. If the population distribution is normal, $N(\mu, \sigma^2)$, the sampling distribution
is normal, $N(\mu, \sigma^2/n)$, regardless of sample size.
2. If the population distribution is approximately normal, the sampling distribution is
approximately normal.
3. If the population is not normal, the sampling distribution is approximately
normal if the sample is large enough, typically taken to be at least 30.
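
The third condition can be illustrated with a simulation. The sketch below is an assumption-laden illustration, not part of the notes: it uses an exponential population (strongly right-skewed, chosen arbitrarily), a fixed seed, and sample skewness as a rough measure of departure from the symmetric normal shape. The skewness of the sample means shrinks toward zero as n grows.

```python
import random
from statistics import fmean

def skewness(values):
    """Sample skewness: the third standardized moment."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

rng = random.Random(1)            # arbitrary seed, for reproducibility only
skew_by_n = {}
for n in (1, 5, 30):
    # Each sample mean comes from n draws of an exponential population (mean 1).
    means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(10_000)]
    skew_by_n[n] = skewness(means)
    print(n, round(skew_by_n[n], 2))
```

For this population the theoretical skewness of the sample mean is 2 divided by the square root of n, so it falls from 2 at n = 1 to roughly 0.37 at n = 30.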

Example
Here we have an underlying normal distribution of x, with mean = $\mu$ and
standard deviation = $\sigma$.

[Figure: Normal distribution of x, with mean = $\mu$ and standard deviation = $\sigma$]

We will generate the distribution of $\bar{x}$ (sample size n), with mean = $\mu$ and
standard deviation = $\sigma/\sqrt{n}$. This is also a normal distribution; here that holds
exactly, because the underlying population is normal.

[Figure: Normal distribution of $\bar{x}$, with mean = $\mu$ and standard deviation = $\sigma/\sqrt{n}$]

Finally we will standardize the distribution of $\bar{x}$, giving us a standard normal
distribution with mean = 0 and standard deviation = 1.

[Figure: Standard normal distribution of z, with mean = 0 and standard deviation = 1]

The same approach applies for any underlying distribution of x. Even if the underlying
distribution of x is not normal, the distribution of $\bar{x}$ is still approximately normal,
with mean = $\mu$ and standard deviation = $\sigma/\sqrt{n}$. This is the result of the
central limit theorem, and we must bear in mind that the sample size n should be at least 30.

Standard Deviation of Sample Means: Standard Error of the Mean

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$ when the population standard deviation, $\sigma$, is KNOWN

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$ when the population standard deviation, $\sigma$, is UNKNOWN

Where:
$\sigma_{\bar{x}}$ = the standard deviation of the sample means
$\sigma$ = the standard deviation of the population
$s$ = the standard deviation of the sample
$s_{\bar{x}}$ = estimate of the standard deviation of the sample mean
$n$ = sample size
Mean of the Sample Means - with Known Population Mean

$$\mu_{\bar{x}} = \mu$$

$\mu_{\bar{x}}$ = the mean of the sample means
$\mu$ = the mean of the population

Mean of the Sample Means - with Unknown Population Mean

Here we take the average of the sample means and use that as an approximation to the
population mean. It is denoted by $\bar{\bar{x}}$.

Solving Sample Mean Probability Problems


Since the sample means tend to follow a normal probability distribution (as the Central
Limit Theorem tells us), we can use the ideas discussed earlier to compute the probability
that a sample mean will fall within a certain range.

We will want to convert any normal distribution to a STANDARD normal distribution.

NOTE: We will convert using the following formulas.

When the population standard deviation and population mean are both KNOWN:

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

When the population standard deviation is UNKNOWN and the population mean is KNOWN:

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

Where:
$\bar{X}$ = variable for sample mean
$\mu$ = mean of the population
$\sigma$ = the standard deviation of the population
$s$ = the standard deviation of the sample
$n$ = sample size

Example
The foreman of a bottling plant has observed that the amount of soda in each 32-ounce
bottle is actually a normally distributed random variable, with mean of 32.2 ounces and a
standard deviation of 0.3 ounces.

a) If a customer buys one bottle, what is the probability that the bottle will contain more
than 32 ounces?

b) If a customer buys a carton of four bottles, what is the probability that the mean
amount of the four bottles will be greater than 32 ounces?

Solution a.

(The solution uses Table 1. You may also use Excel to solve. The methods to achieve this
were covered at length in the previous chapter.)

Let X be the random variable representing the amount of soda in one bottle.
It is normally distributed with mean = 32.2 and SD = 0.3.

$$P(X > 32) = P\left(\frac{X - \text{mean}}{\text{SD}} > \frac{32 - 32.2}{0.3}\right) = P(Z > -0.67)$$

[Figure: Standard normal distribution (mean = 0, SD = 1); the required area is the region where z is greater than -0.67]

$$P(Z > -0.67) = P(-0.67 < Z < 0) + 0.5 = 0.2486 + 0.5 = 0.7486$$

Solution b.

(The solution uses Table B1. You may also use Excel to solve. The methods to achieve this
were covered at length in the previous chapter.)

Let $\bar{X}$ be the random variable representing the average amount of soda in four bottles.
It is normally distributed with mean = 32.2 and

$$\text{SD} = \frac{0.3}{\sqrt{4}} = 0.15$$

$$P(\bar{X} > 32) = P\left(\frac{\bar{X} - \text{mean}}{\text{SD}} > \frac{32 - 32.2}{0.15}\right) = P(Z > -1.33)$$

[Figure: Standard normal distribution (mean = 0, SD = 1); the required area is the region where z is greater than -1.33]

$$P(Z > -1.33) = P(-1.33 < Z < 0) + 0.5 = 0.4082 + 0.5 = 0.9082$$
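
Both parts can also be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed table; the function name `prob_mean_exceeds` is an illustrative assumption. Because z is not rounded to two decimals here, the results differ slightly from the table-based answers (about 0.7475 and 0.9088 rather than 0.7486 and 0.9082).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3          # population mean and standard deviation (ounces)

def prob_mean_exceeds(threshold, n):
    """P(sample mean of n bottles > threshold) for a normal population."""
    standard_error = sigma / sqrt(n)
    z = (threshold - mu) / standard_error
    return 1 - NormalDist().cdf(z)

p_one = prob_mean_exceeds(32, n=1)    # part a: a single bottle
p_four = prob_mean_exceeds(32, n=4)   # part b: mean of a carton of four
print(round(p_one, 4), round(p_four, 4))
```

The mean of four bottles is more likely to exceed 32 ounces than a single bottle because its standard error is half as large.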

Example

Scores on a real estate exam are normally distributed with mean 430 and standard deviation 20.
If we randomly select 50 exams, what is the probability that the sample mean of these 50
exams exceeds a score of 458?

(The solution uses Table 1. You may also use Excel to solve. The methods to achieve this
were covered at length in the previous chapter.)

The distribution of X has mean $\mu$ = 430 and standard deviation $\sigma$ = 20.

The distribution of $\bar{X}$ is normal, with mean $\mu$ = 430 and standard deviation

$$\frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{50}} = 2.828.$$

$\bar{X}$ can be standardized so that we can use the tables:

$$P(\bar{X} > 458) = P\left(Z > \frac{458 - 430}{20/\sqrt{50}}\right) = P\left(Z > \frac{28}{2.828}\right) = P(Z > 9.899)$$

[Figure: Standard normal distribution (mean = 0, SD = 1); the required area is the region where z is greater than 9.899]

This probability is approximately zero.

4.3. SAMPLING DISTRIBUTION OF THE PROPORTION

We may be interested in testing measures other than the sample mean. We may be
interested in measuring the percentage of people in the work force that would opt for
early retirement. Each person has two choices of either agreeing with early retirement or
not. This experiment follows a binomial probability distribution.

The sample proportion can be calculated by

$$\bar{p} = \frac{\text{number of successes in the sample}}{n}$$

where n is the total number of observations in the sample. This is just the probability of
success based on the sample.

As we do not know the proportion of people in the population of the workforce that
would opt for early retirement, we can take samples and calculate the approximate
population proportion.

If the samples are large enough we may use the normal distribution as an approximation
to the binomial.

The condition that must hold for this to be the case is

$$n\,p(1 - p) > 5$$

where:
p = probability of a single success in the population
n = the number of items in the sample in question

Example

Suppose we take 10 sample groups of 150 people each, and record the number of people in
each group who agree with early retirement. The following are the results:

Sample Group   Number of Successes   Sample Proportion
1              26                    26/150 = 0.173
2              18                    18/150 = 0.120
3              21                    21/150 = 0.140
4              30                    30/150 = 0.200
5              24                    24/150 = 0.160
6              21                    21/150 = 0.140
7              16                    16/150 = 0.107
8              28                    28/150 = 0.187
9              35                    35/150 = 0.233
10             27                    27/150 = 0.180

Averaging these gives an approximation for the population proportion:

$$\frac{0.173 + 0.12 + 0.14 + 0.2 + 0.16 + 0.14 + 0.107 + 0.187 + 0.233 + 0.18}{10} = 0.164$$

The standard error of estimate of this sampling distribution is:

$$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.164(1 - 0.164)}{150}} = \sqrt{0.000914} = 0.030$$

Now we can answer such questions as, “What is the probability that 20% or less of the
workforce will agree with early retirement?” We already have the mean and standard
error and we know we can use the normal distribution to approximate the binomial.

That is, $P(\bar{p} < 0.20)$.

[Figure: Sampling distribution of the proportion, showing the area below $\bar{p}$ = 0.2]

Standardizing gives

$$P\left(\frac{\bar{p} - p}{\sigma_{\bar{p}}} < \frac{0.20 - 0.164}{0.030}\right) = P(z < 1.20) = 0.8849$$

[Figure: Standard normal distribution, showing the area of 0.8849 to the left of z = 1.2]

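The early-retirement calculation from Section 4.3 can be reproduced with a few lines of Python. This is a sketch under the same normal approximation; the variable names are illustrative. Because the standard error is not rounded to 0.030 before dividing, z comes out near 1.19 and the probability near 0.883, slightly below the table-based 0.8849.

```python
from math import sqrt
from statistics import NormalDist

successes = [26, 18, 21, 30, 24, 21, 16, 28, 35, 27]   # per group of 150 people
n = 150

proportions = [s / n for s in successes]
p_hat = sum(proportions) / len(proportions)     # estimated proportion, about 0.164
std_error = sqrt(p_hat * (1 - p_hat) / n)       # about 0.030

# Check the rule of thumb for using the normal approximation.
print(n * p_hat * (1 - p_hat) > 5)

z = (0.20 - p_hat) / std_error
prob = NormalDist().cdf(z)                      # P(sample proportion < 0.20)
print(round(p_hat, 3), round(std_error, 3), round(z, 2), round(prob, 4))
```
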
Exercises
1. A normal population has a mean of 60 and a standard deviation of 12. You select a
random sample of 9. Compute the probability the sample mean is:
a. Greater than 63.
b. Less than 56.
c. Between 56 and 63.
2. A population of unknown shape has a mean of 75. You select a sample of 40. The
standard deviation of the sample is 5. Compute the probability the sample mean is:
a. Less than 74.
b. Between 74 and 76.
c. Between 76 and 77.
d. Greater than 77.
3. In a certain section of Southern California, the distribution of monthly rent for a one-
bedroom apartment has a mean of $2,200 and a standard deviation of $250. The
distribution of the monthly rent does not follow the normal distribution. In fact, it is
positively skewed. What is the probability of selecting a sample of 50 one-bedroom
apartments and finding the mean to be at least $1,950 per month?

4. According to an IRS study, it takes an average of 330 minutes for taxpayers to prepare,
copy, and electronically file a 1040 tax form, and the standard deviation of this time
is 80 minutes. A consumer watchdog agency selects a random sample of 40 taxpayers.
a. What assumption or assumptions do you need to make about the shape of the
population?
b. What is the standard error of the mean?
c. What is the likelihood the sample mean is greater than 320 minutes?
d. What is the likelihood the sample mean is between 320 and 350 minutes?
e. What is the likelihood the sample mean is greater than 350 minutes?
Paul Newbold: Exercises 6.5-6.12 (pp. 262-263) and 6.26-6.32 (pp. 268-269)

4.4. SAMPLING DISTRIBUTIONS OF SAMPLE VARIANCES
Now that sampling distributions for sample means and proportions have been developed,
we consider sampling distributions of sample variances.
We begin by considering a random sample of n observations drawn from a population
with unknown mean $\mu$ and unknown variance $\sigma^2$. Denote the sample members as
$x_1, x_2, \ldots, x_n$. The sample variance is

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

and its expected value is the population variance, $E(s^2) = \sigma^2$.

The conclusion that the expected value of the sample variance is the population variance is quite
general. But for statistical inference we would like to know more about the sampling distribution.
If we can assume that the underlying population distribution is normal, then it can be shown that
the sample variance and the population variance are related through a probability distribution
known as the chi-square distribution:

$$\chi^2_{n-1} = \frac{(n - 1)s^2}{\sigma^2}$$

follows a chi-square distribution with n - 1 degrees of freedom.

Thus, if we have a random sample from a population with a normal distribution, we can make
inferences about the population variance $\sigma^2$ by using the sample variance $s^2$ and
the chi-square distribution.

Exercise
A random sample of size n = 16 is obtained from a normally distributed population
with a population mean of $\mu$ = 100 and a variance of $\sigma^2$ = 25.
a. What is the probability that $\bar{x} > 101$?
b. What is the probability that the sample variance is greater than 45?
c. What is the probability that the sample variance is greater than 60?

Solution
a) Let X be the random variable representing the population, $X \sim N(\mu, \sigma^2)$.
The sample mean has standard error $\sigma/\sqrt{n} = 5/\sqrt{16} = 1.25$, so

$$P(\bar{x} > 101) = P\left(Z > \frac{101 - 100}{5/\sqrt{16}}\right) = P(Z > 0.8) = 0.2119$$

b) $$P(S^2 > 45) = P\left(\frac{(n - 1)S^2}{\sigma^2} > \frac{(16 - 1)(45)}{25}\right) = P(\chi^2_{15} > 27) = 0.029$$

c) $$P(S^2 > 60) = P\left(\frac{(n - 1)S^2}{\sigma^2} > \frac{(16 - 1)(60)}{25}\right) = P(\chi^2_{15} > 36) = 0.002$$
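
The three probabilities can be cross-checked numerically. The sketch below is illustrative only: the Python standard library has no chi-square distribution, so the upper-tail probability is approximated here by integrating the chi-square density with Simpson's rule (the helper names `chi2_pdf` and `chi2_upper_tail` are assumptions, and a statistics package would normally be used instead). Part (a) uses the standard error of the mean, $\sigma/\sqrt{n} = 5/4$.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 0.0              # adequate for df > 2, where the density vanishes at 0
    half = df / 2
    return exp((half - 1) * log(x) - x / 2 - half * log(2) - lgamma(half))

def chi2_upper_tail(x, df, steps=2000):
    """P(chi-square > x), by Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = chi2_pdf(0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return 1 - total * h / 3

n, mu, sigma2 = 16, 100, 25

# (a) the sample mean has standard error sigma / sqrt(n) = 5 / 4
z = (101 - mu) / (sqrt(sigma2) / sqrt(n))
p_a = 1 - NormalDist().cdf(z)

# (b), (c) (n - 1) * s^2 / sigma^2 is chi-square with n - 1 degrees of freedom
p_b = chi2_upper_tail((n - 1) * 45 / sigma2, n - 1)
p_c = chi2_upper_tail((n - 1) * 60 / sigma2, n - 1)
print(round(z, 2), round(p_a, 4), round(p_b, 3), round(p_c, 3))
```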

Exercises: 6.48-6.51 (Paul Newbold)
