
Limit Theorem

The lecture slides cover the Central Limit Theorem (CLT) and sampling distributions, illustrating how sample means approximate the population mean as sample size increases. Through examples, they demonstrate that while the means of small samples may not follow a normal distribution, the distribution of sample means becomes closer to normal as the sample size grows. The slides also discuss the importance of sample size in applying the CLT and provide exercises related to estimating probabilities in non-normal distributions.

Uploaded by

Sunil Rathee

STAT 22000 Lecture Slides

Variability in Estimates & Central Limit Theorem

Yibi Huang
Department of Statistics
University of Chicago
Outline

This set of slides covers Sections 4.1 and 4.4 in the text, which include

• Central Limit Theorem (CLT)
• Sampling distribution

Example — Rating of a Movie

Suppose a certain movie has a polarized distribution of ratings: on a 1 to 10 scale, of those who have watched the movie, 1/3 gave 9 points, 1/3 gave 2 points, and the remaining 1/3 gave 1 point.

So the population distribution is

X       1     2     9
P(X)    1/3   1/3   1/3

[Figure: Population Distribution, a bar plot over the ratings 1 to 10]

Histogram of the Sample

In practice, since the population is difficult (or impossible) to examine completely, we take a sample to learn about the population. Will the makeup of the sample mimic the makeup of the population?

First, the sampling method must be appropriate. A biased sample won’t give us the correct information about the population.

Suppose we take a simple random sample of size n (say n = 400) from the population. What will the histogram of the ratings of the movie given by subjects in the sample look like?

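# Possible ratings in the population; each has probability 1/3
popratings = c(1,2,9)
# Random sample of size 400, drawn with replacement
s400 = sample(popratings, size = 400, replace=TRUE, prob=c(1/3,1/3,1/3))
hist(s400, breaks=0:10+.5, xlab="Ratings", main="Sample Size = 400")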

[Figure: histograms of the ratings in four samples (x-axis: Ratings, y-axis: Frequency). Sample size = 10, sample mean = 3.6; sample size = 25, sample mean = 4.72; sample size = 100, sample mean = 4.01; sample size = 400, sample mean = 3.86.]

The histogram of the sample looks somewhat like the histogram of the population. The larger the sample size, the closer the resemblance.

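One way to check this resemblance numerically is to compare the proportion of each rating in a sample with the population probability of 1/3. The following is a minimal sketch, assuming the same three-point population as above; the helper name `sample_props`, the extra sample size of 40000, and the seed are illustrative choices that do not appear in the slides.

```r
# Population of ratings from the example: 1, 2 and 9, each with probability 1/3
popratings <- c(1, 2, 9)

# Hypothetical helper: draw a sample of size n and return the
# proportion of the sample equal to each population value
sample_props <- function(n) {
  s <- sample(popratings, size = n, replace = TRUE, prob = c(1/3, 1/3, 1/3))
  sapply(popratings, function(v) mean(s == v))
}

set.seed(1)  # arbitrary seed, used only for reproducibility
sizes <- c(10, 100, 400, 40000)
props <- sapply(sizes, sample_props)  # one column per sample size
dimnames(props) <- list(rating = as.character(popratings),
                        n = as.character(sizes))
print(round(props, 3))
```

Each column sums to 1, and the columns for larger n should sit closer to 1/3 in every row, which is the numerical counterpart of the histograms looking more like the population distribution.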
Estimation of the Population Mean

In practice, the population distribution is usually unknown. We are often interested in population parameters, like the population mean.

• As all we know about the population is the sample, we can only use the sample to estimate the population parameter of interest. A quantity computed from the sample is called a statistic.
• A commonly used estimate of the population mean is the sample mean. Thus the sample mean is one such statistic.
• Sample statistics vary from sample to sample.
• How close is the sample mean to the population mean?

Variability of the Sample Means

To know the variability of the sample mean of a sample of size n = 25, we pretend that we know the population

X       1     2     9
P(X)    1/3   1/3   1/3

and then do the following simulation.

1. We take a random sample of size n = 25 from the population, compute and record the sample mean, and then put the sample back.
2. We repeat the previous step 10000 times, and then obtain 10000 sample means.

What will the histogram of the 10000 sample means look like?

# Simulate 10000 sample means, each from a random sample of size 25
samplemean25 = vector("numeric", 10000)
for(i in 1:10000){
  samplemean25[i] = mean(sample(popratings, size = 25, replace=TRUE,
                                prob=c(1/3,1/3,1/3)))
}
hist(samplemean25, breaks=seq(1.5,7.02,by=0.04),
     xlab="sample mean",
     main="Histogram of the Means of 10000 Samples of Size 25")
abline(v=4, col=2)   # red vertical line at the population mean

[Figure: Histogram of the Means of 10000 Samples of Size 25 (x-axis: sample mean, y-axis: Frequency)]
The red vertical line marks the position of the population mean = 4.

When we take a sample of size 25, the distribution of the sample means is not very normal, with a number of hills and valleys.

# Simulate 10000 sample means, each from a random sample of size 100
samplemean100 = vector("numeric", 10000)
for(i in 1:10000){
  samplemean100[i] = mean(sample(popratings, size = 100, replace=TRUE,
                                 prob=c(1/3,1/3,1/3)))
}
hist(samplemean100, breaks=seq(2.51,5.51,by=0.02),
     xlab="sample mean",
     main="Histogram of the Means of 10000 Samples of Size 100")
abline(v=4, col=2)   # red vertical line at the population mean

[Figure: Histogram of the Means of 10000 Samples of Size 100 (x-axis: sample mean, y-axis: Frequency)]
The red vertical line marks the position of the population mean = 4.

# Simulate 10000 sample means, each from a random sample of size 400
samplemean400 = vector("numeric", 10000)
for(i in 1:10000){
  samplemean400[i] = mean(sample(popratings, size = 400, replace=TRUE,
                                 prob=c(1/3,1/3,1/3)))
}
hist(samplemean400, breaks=seq(3.3,4.7,by=0.01),
     xlab="sample mean",
     main="Histogram of the Means of 10000 Samples of Size 400")
abline(v=4, col=2)   # red vertical line at the population mean

[Figure: Histogram of the Means of 10000 Samples of Size 400 (x-axis: sample mean, y-axis: Frequency)]
The red vertical line marks the position of the population mean = 4.

When the sample size increases to 400, the distribution of the sample means looks very normal.
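
The three simulations above differ only in the sample size, so they can be wrapped in one function. The sketch below is one possible way to do that, not the code used for the slides: the helper name `sim_sample_means` and the seed are assumptions made for illustration, and `replicate()` stands in for the explicit `for` loop.

```r
# Population of ratings: 1, 2 and 9, equally likely
popratings <- c(1, 2, 9)

# Hypothetical helper: simulate `reps` sample means,
# each computed from a random sample of size n drawn with replacement.
# sample() without `prob` draws each value with equal probability (1/3 here).
sim_sample_means <- function(n, reps = 10000) {
  replicate(reps, mean(sample(popratings, size = n, replace = TRUE)))
}

set.seed(22000)  # arbitrary seed, used only for reproducibility
means_by_n <- lapply(c(25, 100, 400), sim_sample_means)
names(means_by_n) <- c("n=25", "n=100", "n=400")

# Center and spread of each simulated sampling distribution
summary_tbl <- sapply(means_by_n, function(m) c(mean = mean(m), sd = sd(m)))
print(round(summary_tbl, 3))
```

Calling `hist(means_by_n[["n=400"]])` would draw the corresponding histogram. The simulated means should all be close to 4, while the simulated SDs shrink as n grows.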
Sampling Distribution

• The probability distribution of a statistic is called the sampling distribution of the statistic.
• What we just constructed is the sampling distribution of the sample mean.

Observations for the Simulations Above

• The sampling distribution of the sample mean may not be normal when the sample size is small, but it gets closer to normal as the sample size gets larger.
• The sample mean may not be equal to the population mean, but its distribution is centered at the population mean.
• With a larger sample, the variability of the sample mean around the population mean gets smaller.
• What are the SDs of the sample means?

> mean(samplemean25)
[1] 3.99808
> sd(samplemean25)
[1] 0.7073244
> mean(samplemean100)
[1] 4.001438
> sd(samplemean100)
[1] 0.3577802
> mean(samplemean400)
[1] 3.99929
> sd(samplemean400)
[1] 0.1770972

Expected Value and SD of the Sample Mean

For i.i.d. random variables $X_1, X_2, \ldots, X_n$ from a population with mean $\mu$ and SD $\sigma$, the expected value and SD of the sample mean $\bar{X}_n = (X_1 + X_2 + \cdots + X_n)/n$ are respectively

$$E(\bar{X}_n) = \mu, \qquad \mathrm{SD}(\bar{X}_n) = \sigma/\sqrt{n}.$$

• Here, “i.i.d.” means “independent and identically distributed”: $X_1, \ldots, X_n$ are independent and have identical probability distributions.
• Observations in a simple random sample are nearly i.i.d. if the sample size is less than 10% of the population size.
• The SD of the sample mean is specifically called the standard error.

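Both formulas follow from the i.i.d. assumption. A short derivation, using only the definitions above (the expectation step needs only linearity, while the variance step uses independence):

```latex
\begin{aligned}
E(\bar{X}_n) &= \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\cdot n\mu = \mu, \\
\mathrm{Var}(\bar{X}_n) &= \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i)
  = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \\
\mathrm{SD}(\bar{X}_n) &= \sqrt{\mathrm{Var}(\bar{X}_n)} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```
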
For the movie rating example, recall the population distribution is

X       1     2     9
P(X)    1/3   1/3   1/3

The mean, variance and SD of the population distribution are respectively

$$\mu = 1\cdot\frac{1}{3} + 2\cdot\frac{1}{3} + 9\cdot\frac{1}{3} = 4$$

$$\sigma = \sqrt{(1-4)^2\cdot\frac{1}{3} + (2-4)^2\cdot\frac{1}{3} + (9-4)^2\cdot\frac{1}{3}} = \sqrt{\frac{38}{3}} \approx 3.56.$$

sample size n   expected value of sample mean   SD of sample mean       simulated SD
25              4                               3.56/√25 ≈ 0.712        sd(samplemean25) = 0.7073244
100             4                               3.56/√100 ≈ 0.356       sd(samplemean100) = 0.3577802
400             4                               3.56/√400 ≈ 0.178       sd(samplemean400) = 0.1770972

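The same numbers can be computed directly from the population distribution. This is a small sketch rather than part of the original slides; the variable names (`x`, `p`, `mu`, `sigma`, `se`) are chosen here for illustration.

```r
# Population distribution of the ratings
x <- c(1, 2, 9)          # possible values
p <- c(1/3, 1/3, 1/3)    # their probabilities

mu    <- sum(x * p)                   # population mean
sigma <- sqrt(sum((x - mu)^2 * p))    # population SD, sqrt(38/3)

# Theoretical SD of the sample mean (standard error) for each sample size
n  <- c(25, 100, 400)
se <- sigma / sqrt(n)

print(data.frame(n = n, expected_value = mu, standard_error = round(se, 3)))
```

The standard errors can then be compared with the simulated values of `sd(samplemean25)`, `sd(samplemean100)` and `sd(samplemean400)` shown above.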
Central Limit Theorem (CLT)

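Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables (discrete or continuous) with mean $\mu$ and variance $\sigma^2$. Then, when $n$ is large,

• the distribution of the sample mean
$$\bar{X}_n = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$
is approximately
$$N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right).$$
• the distribution of the sum $S_n = X_1 + X_2 + \cdots + X_n$ is approximately
$$N\!\left(n\mu, \sqrt{n}\,\sigma\right).$$

Here the second argument of $N(\cdot, \cdot)$ is the SD, not the variance.
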
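Since the sum is n times the sample mean, its SD is n · σ/√n = √n · σ. The sketch below checks this by simulation for the movie rating population, under the assumption of n = 100 and 10000 repetitions; the seed and variable names are illustrative only.

```r
# Population of ratings: 1, 2 and 9, equally likely
popratings <- c(1, 2, 9)
mu    <- mean(popratings)                    # 4, since the values are equally likely
sigma <- sqrt(mean((popratings - mu)^2))     # population SD, sqrt(38/3)

n <- 100
set.seed(1)  # arbitrary seed, used only for reproducibility
# 10000 simulated sums, each of n ratings drawn with replacement
sums <- replicate(10000, sum(sample(popratings, size = n, replace = TRUE)))

# Compare the simulated center and spread with the CLT values n*mu and sqrt(n)*sigma
print(c(simulated_mean = mean(sums), clt_mean = n * mu))
print(c(simulated_sd = sd(sums), clt_sd = sqrt(n) * sigma))
```

The simulated SD should land near √100 · 3.56 ≈ 35.6, far below 100 · 3.56 = 356.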
Example

The $X_i$'s are i.i.d., with the distribution

X_i      1     2     9
P(X_i)   1/3   1/3   1/3

Recall that $\mu = 4$, $\sigma \approx 3.56$. So the sampling distribution of $\bar{X}_{100}$ is approximately

$$N(\mu, \sigma/\sqrt{100}) = N(4, 0.356).$$

So

$$P(\bar{X}_{100} > 4.5) \approx P\!\left(Z > \frac{4.5 - 4}{0.356}\right) \approx P(Z > 1.40) \approx 0.08.$$

In the simulation, 804 of the 10000 simulated values of $\bar{X}_{100}$ exceed 4.5, which agrees with the CLT approximation that $\bar{X}_{100}$ exceeds 4.5 about 8% of the time.

> sum(samplemean100 > 4.5)
[1] 804

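The normal-table lookup can also be done with `pnorm()`. The following sketch assumes the unrounded σ = √(38/3) rather than 3.56, so the result may differ from the hand calculation in the third decimal place; the variable names are illustrative.

```r
mu    <- 4
sigma <- sqrt(38 / 3)      # population SD of the ratings, about 3.56
n     <- 100
se    <- sigma / sqrt(n)   # standard error of the sample mean, about 0.356

# CLT approximation to P(sample mean > 4.5): upper tail of N(mu, se)
p_clt <- pnorm(4.5, mean = mu, sd = se, lower.tail = FALSE)
print(p_clt)
```

If the simulated `samplemean100` from earlier is available, `mean(samplemean100 > 4.5)` gives the simulated proportion to compare against `p_clt`.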
Sample Size Required to Use CLT?

• Provided the sample size is large enough, the sampling distribution of the sample mean will be approximately normal, even when the population distribution is not normal.
• If the population distribution is normal, then so is the sampling distribution of the sample mean, regardless of the sample size.
• If the population distribution is symmetric, then n should be at least 30 or so.
• If the population distribution is skewed or has outliers, then the sample size n should be moderately large (at least 100 or so), or even larger depending on how skewed or irregular the population distribution is.

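One rough way to see why skewed populations need larger samples: for i.i.d. observations, the skewness of the sample mean equals the population skewness divided by √n, so it fades only slowly as n grows. The sketch below applies this to the movie rating population. It is an illustration added here, not part of the slides, and skewness captures only one aspect of non-normality (it does not describe the hills and valleys seen at n = 25).

```r
# Population distribution of the ratings
x <- c(1, 2, 9)
p <- c(1/3, 1/3, 1/3)

mu    <- sum(x * p)
sigma <- sqrt(sum((x - mu)^2 * p))

# Population skewness: third central moment divided by sigma^3
pop_skew <- sum((x - mu)^3 * p) / sigma^3

# Skewness of the sample mean for several sample sizes (0 for a normal distribution)
n <- c(1, 25, 100, 400)
skew_mean <- pop_skew / sqrt(n)
print(data.frame(n = n, skewness_of_sample_mean = round(skew_mean, 3)))
```
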
Exercise 4.35 – Housing Prices (p.214)

A housing survey was conducted to determine the price of a typical home in Topanga, CA. The mean price of a house was roughly $1.3 million with an SD of $0.3 million. There were no houses listed below $0.3 million but a few houses above $3 million.

Can we find an approximate probability that a randomly chosen house in Topanga costs more than $1.4 million using the normal distribution?

No, because the population does not follow a normal distribution (it is right skewed), and a sample of size 1 is too small to use the CLT.

Exercise 4.35 – Housing Prices (p.214)

Can we find an approximate probability that the mean of 60 randomly chosen houses in Topanga is more than $1.4 million using the normal distribution? If yes, compute the approximate probability.

Yes, if the population distribution is not too skewed, the sampling distribution of the sample mean of a sample of size 60 should be approximately normal by the CLT.

$$\bar{X}_{60} \sim N\!\left(\mu = 1.3,\ \mathrm{SE} = \frac{\sigma}{\sqrt{60}} = \frac{0.3}{\sqrt{60}}\right) = N(1.3, 0.0387).$$

So,

$$P(\bar{X}_{60} > 1.4) \approx P\!\left(Z > \frac{1.4 - 1.3}{0.0387}\right) \approx P(Z > 2.58) \approx 0.0049.$$

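The same calculation can be reproduced with `pnorm()`. This is a minimal sketch under the exercise's stated values (mean 1.3, SD 0.3, n = 60, all in millions of dollars); the variable names are chosen here for illustration.

```r
mu    <- 1.3   # population mean price, in millions of dollars
sigma <- 0.3   # population SD, in millions of dollars
n     <- 60    # number of randomly chosen houses

se <- sigma / sqrt(n)    # standard error of the sample mean
z  <- (1.4 - mu) / se    # z-score of a sample mean of 1.4

# CLT approximation to P(sample mean > 1.4): upper tail of the standard normal
p_clt <- pnorm(z, lower.tail = FALSE)
print(c(se = se, z = z, p = p_clt))
```

Because `z` is not rounded to 2.58 before the lookup, the probability may differ slightly from the table-based value in the last digit.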
What Does the CLT Say?

True or False and explain: The central limit theorem says that as you
take larger and larger samples from a population, the histogram of
the sample values looks more and more normal.

False. As you take larger and larger samples, the histogram of the sample values looks more and more like the histogram of the population.

What is the thing that becomes more and more normal as the sample size gets larger and larger?

It is the distribution of the sample mean that gets more and more normal.
