Estimation and Hypothesis Testing Guide

The document outlines the objectives and key concepts of estimation and hypothesis testing in statistics, including parameter estimations, hypothesis testing methods such as Z-tests and T-tests, and the Chi-Square test. It discusses the importance of understanding sampling distributions and the Central Limit Theorem, as well as the properties and types of estimators. Additionally, it explains the concepts of point and interval estimation, confidence intervals, and the factors affecting their width.

Uploaded by

Taonaishe Hastings Muzavazi

Estimation and Hypothesis Testing

University of Zimbabwe
APRIL 2025
Objectives
 After completing this session you will be able to perform:
🞑 Parameter estimation
 Point estimate
 Confidence interval

🞑 Hypothesis testing
 Z-test
 T-test

🞑 Testing associations
 Chi-Square test
Introduction # 1
 Statistical inference is the process of generalizing or drawing conclusions about the target population on the basis of results obtained from a sample.
Introduction #2
 Before beginning statistical analyses, it is essential to examine the distribution of the variable for:
🞑 skewness (tails),
🞑 kurtosis (peaked or flat distribution),
🞑 spread (range of the values), and
🞑 outliers (data values separated from the rest of the data).
 Information about each of these characteristics determines which statistical analyses to choose and how their results can be accurately explained and interpreted.
Sampling Distribution
 If we draw repeated samples of the same size from a population and compute a statistic for each one, the frequency distribution of those values forms the sampling distribution of the sample statistic.
Sampling distribution .......
 Three characteristics of the sampling distribution of a statistic:
🞑 its mean
🞑 its variance
🞑 its shape

 Due to random variation, different samples from the same population will have different sample means.
 If we repeatedly take samples of the same size n from a population, the means of the samples form a sampling distribution of means, and the mean of that distribution is equal to the population mean.
 In practice we do not take repeated samples from a population, i.e. we do not encounter sampling distributions empirically, but it is necessary to know their properties in order to draw statistical inferences.
The Central Limit Theorem
 Regardless of the shape of the frequency distribution of a characteristic in the parent population,
🞑 the means of a large number of samples (of independent observations) from the population will approximately follow a normal distribution, with the mean of the means approaching the population mean μ and a standard deviation of σ/√n.
 Inferential statistical techniques have various assumptions that must be met before valid conclusions can be obtained:
 Samples must be randomly selected.
 The sample size must be large (n ≥ 30), or
 the population must be normally or approximately normally distributed if the sample size is less than 30.
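The Central Limit Theorem statement above can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the exponential parent population (mean 1, standard deviation 1), the sample size of 30 and the helper name `sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(draw, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n))
            for _ in range(num_samples)]

# Hypothetical skewed parent population: exponential with mean 1 and SD 1.
def draw_exponential(rng):
    return rng.expovariate(1.0)

n = 30
means = sample_means(draw_exponential, n, num_samples=5000)
mean_of_means = statistics.fmean(means)   # should be close to the population mean
sd_of_means = statistics.stdev(means)     # should be close to sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean = 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {1 / n ** 0.5:.3f})")
```

Although the parent population is strongly skewed, the sample means cluster around the population mean with a spread close to σ/√n.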
Standard deviation and Standard error

Standard deviation is a measure of variability between individual observations (a descriptive index of spread around the mean).
Standard error refers to the variability of summary statistics (e.g. the variability of the sample mean or a sample proportion).
Standard error is a measure of uncertainty in a sample statistic, i.e. the precision of the estimate produced by the estimator.
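The distinction between the two quantities can be sketched numerically. The measurements below are hypothetical values invented for illustration, and `standard_error_of_mean` is an assumed helper name.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical measurements, for illustration only.
values = [12.1, 9.8, 11.4, 10.6, 10.9, 9.5, 11.8, 10.3]

sd = statistics.stdev(values)          # spread of individual observations
se = standard_error_of_mean(values)    # uncertainty of the sample mean

print(f"standard deviation = {sd:.3f}")
print(f"standard error     = {se:.3f}")
```

The standard error is always smaller than the standard deviation for n > 1, and it shrinks as the sample size grows while the standard deviation does not.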
Parameter Estimations
 In parameter estimation, we generally assume that the underlying (unknown) distribution of the variable of interest is adequately described by one or more (unknown) parameters, referred to as population parameters.

 As it is usually not possible to make measurements on every individual in a population, parameters cannot usually be determined exactly.

 Instead we estimate parameters by calculating the corresponding characteristics from a random sample; these are called estimates.
 Estimation is the process of estimating the value of a parameter from information obtained from a sample.
Estimation
Estimation is a procedure in which we use the information included in a sample to make inferences about the true parameter of interest.

 An estimator is a sample statistic that is used to estimate the population parameter, while an estimate is one of the possible values that a given estimator can assume.
Properties of a good estimator
Sample statistic                 Corresponding population parameter
x̄ (sample mean)                  μ (population mean)
S² (sample variance)             σ² (population variance)
S (sample standard deviation)    σ (population standard deviation)
p̂ (sample proportion)            P (population proportion)
The desirable properties of a good estimator are the following:
 It should be unbiased: the expected value of the estimator must be equal to the parameter to be estimated.
 It should be consistent: as the sample size increases, the value of the estimator should approach the value of the parameter estimated.
 It should be efficient: the variance of the estimator is the smallest.
 It should be sufficient: the sample statistic from which the estimator is calculated must contain the maximum possible information about the population.
Types of Estimation
There are two types of estimation:

1. Point estimation: It uses the information in the sample to arrive at a single number (called an estimate) that is intended to be close to the true value of the parameter.

2. Interval estimation: It uses the information in the sample to arrive at an interval (i.e. to construct 2 endpoints) that is intended to enclose the true value of the parameter.
Point Estimation
 Sample proportion: p̂ = x / n
Some BLUE estimators
Interval Estimation
 However, the value of the sample statistic will vary from sample to sample; therefore, simply obtaining a single-value estimate of the parameter is not generally acceptable.
🞑 We also need a measure of how precise our estimate is likely to be.
🞑 We need to take into account the sample-to-sample variation of the statistic.

 A confidence interval defines an interval within which the true population parameter is likely to fall (interval estimate).
Confidence Intervals…
 A confidence interval therefore takes into account the sample-to-sample variation of the statistic and gives a measure of precision.

 The general formula used to calculate a confidence interval is
Estimate ± k × Standard Error, where k is called the reliability coefficient.

 Confidence intervals express the inherent uncertainty in any medical study by expressing upper and lower bounds for the anticipated true underlying population parameter.

 The confidence level is the probability that the interval estimate will contain the parameter, assuming that a large number of samples are selected and that the estimation process on the same parameter is repeated.
Confidence intervals…
 Most commonly 95% confidence intervals are calculated; however, 90% and 99% confidence intervals are sometimes used.

 The probability that the interval contains the true population parameter is (1-α)100%.

 If we were to select 100 random samples from the population and calculate a 95% confidence interval for each, approximately 95 of them would include the true population mean μ (and 5 would not).
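The repeated-sampling interpretation can be checked with a short simulation. This is a sketch under stated assumptions: a hypothetical normal population with mean 50 and known standard deviation 10, samples of size 25, and the assumed helper name `coverage`.

```python
import random
import statistics
from math import sqrt

def coverage(mu, sigma, n, reps, z=1.96, seed=1):
    """Fraction of z-based confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        half_width = z * sigma / sqrt(n)   # k x standard error with known sigma
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

# Hypothetical population parameters, for illustration only.
rate = coverage(mu=50.0, sigma=10.0, n=25, reps=2000)
print(f"proportion of intervals containing the true mean: {rate:.3f}")
```

The printed proportion should land near 0.95, which is what "95% confidence" means in the repeated-sampling sense.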
Confidence interval ……
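A (1-α)100% confidence interval for the unknown population mean and population proportion is given as follows (x̄ is the sample mean, σ the standard deviation, p̂ the sample proportion and n the sample size):

\[ \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] \]

\[ \left[\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n},\ \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right] \]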
Confidence intervals…
 The 95% confidence interval is calculated in such a way that, under the conditions assumed for the underlying distribution, the interval will contain the true population parameter 95% of the time.

 Loosely speaking, you might interpret a 95% confidence interval as one which you are 95% confident contains the true parameter.

 A 90% CI is narrower than a 95% CI, since we are only 90% certain that the interval includes the population parameter.

 On the other hand, a 99% CI will be wider than a 95% CI; the extra width means that we can be more certain that the interval will contain the population parameter. To obtain a higher confidence from the same sample, we must be willing to accept a larger margin of error (a wider interval).
Confidence intervals…
 For a given confidence level (e.g. 90%, 95%, 99%) the width of the confidence interval depends on the standard error of the estimate, which in turn depends on the
🞑 1. Sample size: The larger the sample size, the narrower the confidence interval (that is, the sample statistic will approach the population parameter) and the more precise our estimate. Lack of precision means that in repeated sampling the values of the sample statistic are spread out or scattered, so the result of sampling is not repeatable.
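The effect of confidence level and sample size on interval width can be sketched as follows. The standard deviation of 12 and the sample sizes are hypothetical, and `ci_width` is an assumed helper name; the critical value comes from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Full width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

sigma = 12.0  # hypothetical standard deviation, for illustration only
for n in (25, 100, 400):
    widths = {level: round(ci_width(sigma, n, level), 2)
              for level in (0.90, 0.95, 0.99)}
    print(n, widths)
```

For a fixed sample, the width grows with the confidence level; for a fixed level, quadrupling the sample size halves the width.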
Confidence interval for a single mean

\[ CI = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

Most commonly we compute a 95% confidence interval; however, it is also possible to compute 90% and 99% confidence intervals.
Table 1: Normal distribution
Area between 0 and z

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
Confidence interval ……
 If the population standard deviation is unknown and the sample size is small (<30), the formula for the confidence interval for the population mean is:

\[ \bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \]

🞑 x̄ is the sample mean
🞑 s is the sample standard deviation
🞑 n is the sample size
🞑 t is the value from the t-distribution with (n-1) degrees of freedom
The t Distribution
df    t0.100  t0.050  t0.025  t0.010  t0.005
---   ------  ------  ------  ------  ------
1     3.078   6.314   12.706  31.821  63.657
2     1.886   2.920   4.303   6.965   9.925
3     1.638   2.353   3.182   4.541   5.841
4     1.533   2.132   2.776   3.747   4.604
5     1.476   2.015   2.571   3.365   4.032
6     1.440   1.943   2.447   3.143   3.707
7     1.415   1.895   2.365   2.998   3.499
8     1.397   1.860   2.306   2.896   3.355
9     1.383   1.833   2.262   2.821   3.250
10    1.372   1.812   2.228   2.764   3.169
11    1.363   1.796   2.201   2.718   3.106
12    1.356   1.782   2.179   2.681   3.055
13    1.350   1.771   2.160   2.650   3.012
14    1.345   1.761   2.145   2.624   2.977
15    1.341   1.753   2.131   2.602   2.947
16    1.337   1.746   2.120   2.583   2.921
17    1.333   1.740   2.110   2.567   2.898
18    1.330   1.734   2.101   2.552   2.878
19    1.328   1.729   2.093   2.539   2.861
20    1.325   1.725   2.086   2.528   2.845
21    1.323   1.721   2.080   2.518   2.831
22    1.321   1.717   2.074   2.508   2.819
23    1.319   1.714   2.069   2.500   2.807
24    1.318   1.711   2.064   2.492   2.797
25    1.316   1.708   2.060   2.485   2.787
26    1.315   1.706   2.056   2.479   2.779
27    1.314   1.703   2.052   2.473   2.771
28    1.313   1.701   2.048   2.467   2.763
29    1.311   1.699   2.045   2.462   2.756
30    1.310   1.697   2.042   2.457   2.750
40    1.303   1.684   2.021   2.423   2.704
60    1.296   1.671   2.000   2.390   2.660
120   1.289   1.658   1.980   2.358   2.617
∞     1.282   1.645   1.960   2.326   2.576

[Figure: t distribution with df = 10, f(t) against t; area = 0.10 in each tail beyond ±1.372 and area = 0.025 in each tail beyond ±2.228]

Whenever σ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.
Point and Interval Estimation of the Population
Proportion (p)

We will now consider the method for estimating the binomial proportion p of successes, that is, the proportion of elements in a population that have a certain characteristic.
A logical candidate for a point estimate of the population proportion p is the sample proportion p̂ = x/n, where x is the number of observations in a sample of size n that have the characteristic of interest. As we have seen in the sampling distribution of proportions, the sample proportion is the best point estimate of the population proportion.
Proportion…
 The shape is approximately normal provided n is sufficiently large; in this case, nP > 5 and nQ > 5 are the requirements for sufficiently large n (central limit theorem for proportions).
 The point estimate for the population proportion π is given by p̂.
 A (1-α)100% confidence interval estimate for the unknown population proportion π is given by:
\[ CI = \left[\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n},\ \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right] \]
 If the sample size is small, i.e. np < 5 and nq < 5, and the population standard deviations for the proportion are not given, then the confidence interval estimation will use the t-distribution instead of z:
\[ CI = \hat{p} \pm t_{\alpha/2,\,n-1}\sqrt{\hat{p}(1-\hat{p})/n} \]
Example 1:
 A SRS of 16 apparently healthy subjects yielded the following values of urine excreted (milligrams per day):
0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005, 0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008
Compute the point estimate of the population mean.
If x1, x2, ..., xn are n observed values, then
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0.295}{16} = 0.01844 \]
Construct 90%, 95% and 98% confidence intervals for the mean (the standard deviation is 0.0123 and √n = 4):
90%: (0.01844 - 1.65×0.0123/4, 0.01844 + 1.65×0.0123/4) = (0.0134, 0.0235)
95%: (0.01844 - 1.96×0.0123/4, 0.01844 + 1.96×0.0123/4) = (0.0124, 0.0245)
98%: (0.01844 - 2.33×0.0123/4, 0.01844 + 2.33×0.0123/4) = (0.0113, 0.0256)
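The arithmetic of Example 1 can be reproduced in a few lines. This is a sketch that follows the slide's approach (z multipliers 1.65, 1.96 and 2.33 with the sample standard deviation); the helper name `z_interval` is an assumption.

```python
import statistics
from math import sqrt

# Data from Example 1.
values = [0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005,
          0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008]

xbar = statistics.fmean(values)     # point estimate of the population mean
s = statistics.stdev(values)        # sample standard deviation
se = s / sqrt(len(values))          # standard error of the mean

def z_interval(center, standard_error, z):
    """Estimate +/- z x standard error."""
    return center - z * standard_error, center + z * standard_error

print(f"mean = {xbar:.5f}, s = {s:.4f}")
for level, z in ((90, 1.65), (95, 1.96), (98, 2.33)):
    lower, upper = z_interval(xbar, se, z)
    print(f"{level}% CI: ({lower:.4f}, {upper:.4f})")

ci95 = z_interval(xbar, se, 1.96)
```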
Example 2
The mean diastolic blood pressure for 225 randomly selected individuals is 75 mmHg with a standard deviation of 12.0 mmHg. Construct a 95% confidence interval for the mean.
Solution
n = 225
mean = 75 mmHg
standard deviation = 12 mmHg
confidence level = 95%
The 95% confidence interval for the unknown population mean is given by
95% CI = 75 ± 1.96 × 12/√225 = 75 ± 1.96 × 12/15 = (73.432, 76.568)
Example 2:
A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of x̄ = 10.37 and a standard deviation of s = 3.5. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.

df    t0.100  t0.050  t0.025  t0.010  t0.005
1     3.078   6.314   12.706  31.821  63.657
...
13    1.350   1.771   2.160   2.650   3.012
14    1.345   1.761   2.145   2.624   2.977
15    1.341   1.753   2.131   2.602   2.947
...

The critical value of t for df = (n - 1) = (15 - 1) = 14 and a right-tail area of 0.025 is t0.025 = 2.145.

The corresponding confidence interval or interval estimate is:
\[ \bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}} = 10.37 \pm 2.145\frac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = [8.43, 12.31] \]
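The t-based interval for the stock return example can be verified with a short sketch. The critical value 2.145 is read from the t table (df = 14, right-tail area 0.025) rather than computed, and `t_interval` is an assumed helper name.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """x-bar +/- t x s / sqrt(n), with the critical value supplied from a t table."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Values from the stock return example; t_crit taken from the table (df = 14).
lower, upper = t_interval(xbar=10.37, s=3.5, n=15, t_crit=2.145)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```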
Example 3:
 In a survey of 300 automobile drivers in one city, 123 reported that they wear seat belts regularly. Estimate the seat belt use rate of the city and a 95% confidence interval for the true population proportion.
 Answer: n = 300, p = 123/300 = 0.41 = 41%
Estimate of the seat belt use rate of the city at 95% CI = p ± z × √(p(1-p)/n)
= 0.41 ± 1.96 × √(0.41 × 0.59/300)
= (0.35, 0.47)
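The seat belt calculation can be sketched as a small function. The name `proportion_ci` is an assumption introduced for illustration; the multiplier 1.96 is the value used in the example.

```python
from math import sqrt

def proportion_ci(x, n, z):
    """Sample proportion and z-based confidence limits: p +/- z x sqrt(p(1-p)/n)."""
    p = x / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# Example 3: 123 of 300 drivers wear seat belts regularly.
p, lower, upper = proportion_ci(123, 300, 1.96)
print(f"p = {p:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```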
Example 4:
In a sample of 400 people who were questioned regarding their participation in sports, 160 said that they did participate. Construct a 98% confidence interval for P, the proportion of people in the population who participate in sports.
Solution:
Let X be the number of people who participate in sports.
X = 160, n = 400, α = 0.02. Hence
\[ z_{\alpha/2} = z_{0.01} = 2.33, \qquad \hat{P} = \frac{X}{n} = \frac{160}{400} = 0.4, \qquad \sqrt{\frac{\hat{P}(1-\hat{P})}{n}} = \sqrt{\frac{0.4(0.6)}{400}} = 0.0245 \]

As a result, an approximate 98% confidence interval for P is given by:
\[ \left(\hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}},\ \hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\right) = (0.4 - 2.33 \times 0.0245,\ 0.4 + 2.33 \times 0.0245) = (0.343, 0.457) \]

Hence, we can be about 98% confident that the true proportion of people in the population who participate in sports is between 34.3% and 45.7%.
HYPOTHESIS TESTING
Introduction
🞑 Researchers are interested in answering many types of questions. For example, a physician might want to know whether a new medication will lower a person’s blood pressure.

🞑 These types of questions can be addressed through statistical hypothesis testing, which is a decision-making process for evaluating claims about a population.
Hypothesis Testing
 The formal process of hypothesis testing provides us with a means of answering research questions.

 A hypothesis is a testable statement that describes the nature of the proposed relationship between two or more variables of interest.

 In hypothesis testing, the researcher must define the population under study, state the particular hypotheses that will be investigated, give the significance level, select a sample from the population, collect the data, perform the calculations required for the statistical test, and reach a conclusion.
Types of hypotheses
 The null hypothesis (represented by H0) is a statement about the value of the population parameter. That is, the null hypothesis postulates that ‘there is no difference between factor and outcome’ or ‘there is no intervention effect’.
 The alternative hypothesis (represented by HA) states the ‘opposing’ view that ‘there is a difference between factor and outcome’ or ‘there is an intervention effect’.
Methods of hypothesis testing
 Hypotheses are statements about parameters, which may or may not be true.

 Examples

• The mean GPA of this class is 3.5.

• The mean height of the Gondar College of Medical Sciences (GCMS) students is 1.63 m.

• There is no difference between the distribution of Pf and Pv malaria in Ethiopia (they are distributed in equal proportions).
Steps in hypothesis testing
1. Identify the null hypothesis H0 and the alternate hypothesis HA.
2. Choose α. The value should be small, usually less than 10%. It is important to consider the consequences of both types of errors.
3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic. Remember that a t statistic is usually appropriate for a small number of samples; for a larger number of samples, a z statistic can work well if data are normally distributed.
4. Compare the observed value of the statistic to the critical value obtained for the chosen α.
5. Make a decision.
6. Conclusion
Test Statistics
 Because of random variation, even an unbiased sample may not accurately represent the population as a whole.
 As a result, it is possible that any observed differences or associations may have occurred by chance.
 A test statistic is a value we can compare with the known distribution of what we expect when the null hypothesis is true.
 The general formula of the test statistic is:

Test statistic = (Observed value - Hypothesized value) / Standard error

 The known distributions are the Normal distribution, Student’s t distribution, the Chi-square distribution, ….
Critical value
 The critical value separates the critical region from the noncritical region for a given level of significance.
Decision making
 Accept or reject the null hypothesis
 There are 2 types of errors

Type of decision    H0 true                   H0 false
Reject H0           Type I error (α)          Correct decision (1-β)
Accept H0           Correct decision (1-α)    Type II error (β)

 A type I error is the more serious error, and its probability (α) is the level of significance.
 Power is the probability of rejecting a false null hypothesis, and it is given by 1-β.
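Types of tests
One-tailed test (left tail): H0: μ = μ0 (μ ≥ μ0), H1: μ < μ0
One-tailed test (right tail): H0: μ = μ0 (μ ≤ μ0), H1: μ > μ0
Two-tailed test: H0: μ = μ0, H1: μ ≠ μ0
[Figure: rejection regions and critical value(s) for one-tailed and two-tailed tests; in the two-tailed test each rejection region has area α/2]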
Hypothesis testing about a Population mean (μ)
Two-tailed test:

The large sample (n ≥ 30) test of hypothesis about a population mean μ is as follows:
1. H0: μ = μ0
   HA: μ ≠ μ0
\[ z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
z_tabulated = z_{α/2} for a two-tailed test

Decision:
if |z_cal| > z_tab, reject H0
if |z_cal| ≤ z_tab, do not reject H0
Steps in hypothesis testing…..
54

If the test statistic does not fall in the

If the test statistic falls in the critical
critical region:
region:
Conclude that there is not enough
Reject H0 in favour of HA.
evidence to reject H0.
One tailed tests
2. H0: μ = μ0 (μ ≥ μ0)
   HA: μ < μ0
\[ z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
z_tabulated = z_α for a one-tailed test
Decision:
if z_cal < -z_tab, reject H0
if z_cal ≥ -z_tab, do not reject H0

3. H0: μ = μ0 (μ ≤ μ0)
   HA: μ > μ0
Decision:
if z_cal > z_tab, reject H0
if z_cal ≤ z_tab, do not reject H0
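The two-tailed and one-tailed decision rules can be sketched as a single function. This is an illustrative sketch, not the slides' own procedure: the name `z_test_decision` and the `tail` labels are assumptions, and the tabulated value is computed from the standard normal distribution instead of being read from a table.

```python
from statistics import NormalDist

def z_test_decision(z_cal, alpha, tail):
    """Return (z_tab, reject) for tail in {'two', 'left', 'right'}."""
    std_normal = NormalDist()
    if tail == "two":
        z_tab = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
        reject = abs(z_cal) > z_tab
    elif tail == "left":
        z_tab = std_normal.inv_cdf(1 - alpha)       # z_alpha
        reject = z_cal < -z_tab
    elif tail == "right":
        z_tab = std_normal.inv_cdf(1 - alpha)       # z_alpha
        reject = z_cal > z_tab
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z_tab, reject

# Hypothetical calculated z values, for illustration only.
for z_cal, tail in ((2.0, "two"), (-1.5, "left"), (1.7, "right")):
    z_tab, reject = z_test_decision(z_cal, 0.05, tail)
    print(f"{tail}-tailed: z_cal = {z_cal}, z_tab = {z_tab:.3f}, reject H0: {reject}")
```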
The P- Value
 In most applications, the outcome of performing a hypothesis test is to produce a p-value.
 The p-value measures how likely the observed difference would be if it were due to chance alone, that is, if the null hypothesis were true.

 A small p-value implies that the probability of the value observed occurring just by chance is low when the null hypothesis is true.

 That is, a small p-value suggests that there might be sufficient evidence for rejecting the null hypothesis.
 The p-value is defined as the probability of observing the computed test statistic value or a larger one, if the H0 hypothesis is true. For example, P[Z ≥ Zcal | H0 true].
P-value……
 A p-value is the probability of getting the observed difference, or one more extreme, in the sample purely by chance from a population where the true difference is zero.

 If the p-value is greater than 0.05 then, by convention, we conclude that the observed difference could have occurred by chance and there is no statistically significant evidence (at the 5% level) for a difference between the groups in the population.
How to calculate P-value
o Use statistical software like SPSS, SAS, ...
o Hand calculations
- obtain the test statistic (Z calculated or t calculated)
- find the probability for the test statistic from the standard normal table (the area between 0 and z)
- subtract the probability from 0.5
- the result is the one-tailed p-value
Note: if the test is two-tailed, multiply the result by 2.
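The hand calculation above can be mirrored in code. In this sketch the "area between 0 and z" that the normal table provides is obtained from the standard normal CDF; `p_value_from_z` is an assumed helper name. For z = 1.96 the table gives 0.4750, so the one-tailed p-value is 0.5 - 0.4750 = 0.025 and the two-tailed p-value is 0.05.

```python
from statistics import NormalDist

def p_value_from_z(z_cal, two_tailed=True):
    """p-value from a calculated z, following the table-based steps."""
    area_0_to_z = NormalDist().cdf(abs(z_cal)) - 0.5   # the table entry
    one_tail = 0.5 - area_0_to_z                       # subtract from 0.5
    return 2 * one_tail if two_tailed else one_tail    # double for two tails

p_two = p_value_from_z(1.96)
p_one = p_value_from_z(1.96, two_tailed=False)
print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```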
P-value and confidence interval
 Confidence intervals and p-values are based upon the same theory and mathematics and will lead to the same conclusion about whether a population difference exists.

 Confidence intervals are preferable because they give information about the size of any difference in the population, and they also (very usefully) indicate the amount of uncertainty remaining about the size of the difference.

 When the null hypothesis is rejected in a hypothesis-testing situation, the confidence interval for the mean using the same level of significance will not contain the hypothesized mean.
The P- Value …..
 But for what values of the p-value should we reject the null hypothesis?
🞑 By convention, a p-value of 0.05 or smaller is considered sufficient evidence for rejecting the null hypothesis.
🞑 By using a p-value threshold of 0.05, we are allowing a 5% chance of wrongly rejecting the null hypothesis when it is in fact true.

 When the p-value is less than 0.05, we often say that the result is statistically significant.
Hypothesis testing for single population mean
EXAMPLE 5: A researcher claims that the mean IQ of a sample of 16 students is 110, while the expected value for the whole population is 100 with a standard deviation of 10. Test the hypothesis.
 Solution
1. H0: µ = 100 vs HA: µ ≠ 100
2. Assume α = 0.05
3. Test statistic: z = (110 - 100)/(10/√16) = (110 - 100) × 4/10 = 4
4. The critical value of z for a tail area of 0.025 is 1.96.
5. Decision: reject the null hypothesis since 4 ≥ 1.96
6. Conclusion: the mean IQ of the population is different from 100 at the 5% level of significance.
Example 6:
Suppose that we have a hypothesized population mean of 3.1 and that, for a sample of n = 20 people, we found x̄ = 4.5 and s = 5.5. Our test proceeds as follows:
1. H0: μ = 3.1
   HA: μ ≠ 3.1
2. α = 0.05 (95% confidence level)
3. \[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{4.5 - 3.1}{5.5/\sqrt{20}} = 1.14, \qquad t_{0.025,19} = 2.09 \]
4. The observed value of the test statistic falls within the range of the critical values (-2.09, 2.09).
5. We do not reject H0 and conclude that there is not enough evidence to reject the null hypothesis.
Cont….
A 95% confidence interval for the mean is

\[ \bar{x} \pm t_{0.025,19}\frac{s}{\sqrt{n}} = 4.5 \pm 2.09\left(\frac{5.5}{\sqrt{20}}\right) = (1.93, 7.07) \]

Note that this interval includes the hypothesized value of 3.1.
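The t test and the matching confidence interval in Example 6 can be checked together. This sketch takes the critical value 2.09 from the t table (df = 19, right-tail area 0.025) instead of computing it; the variable names are assumptions.

```python
from math import sqrt

# Values from Example 6; t_crit is read from the t table for df = 19.
xbar, s, n, mu0 = 4.5, 5.5, 20, 3.1
t_crit = 2.09

se = s / sqrt(n)                      # standard error of the mean
t_cal = (xbar - mu0) / se             # observed test statistic
reject = abs(t_cal) > t_crit          # two-tailed decision rule
ci = (xbar - t_crit * se, xbar + t_crit * se)

print(f"t = {t_cal:.2f}, reject H0: {reject}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The test does not reject H0 and the interval contains 3.1, which shows the agreement between the two approaches.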
Hypothesis testing for single proportions
Example 7: In a study of childhood abuse in psychiatric patients, Brown found that 166 in a sample of 947 patients reported histories of physical or sexual abuse.
a) Construct a 95% confidence interval.
b) Test the hypothesis that the true population proportion is 30%.
 Solution (a)
🞑 The 95% CI for P is given by

\[ p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 0.175 \pm 1.96\sqrt{\frac{0.175 \times 0.825}{947}} = 0.175 \pm 1.96 \times 0.0123 = [0.151, 0.199] \]
Example……
 To test the hypothesis we need to follow these steps:
Step 1: State the hypothesis
H0: P = P0 = 0.3
Ha: P ≠ P0 (P ≠ 0.3)
Step 2: Fix the level of significance (α = 0.05)
Step 3: Compute the calculated and tabulated values of the test statistic

\[ z_{cal} = \frac{p - P_0}{\sqrt{P_0(1-P_0)/n}} = \frac{0.175 - 0.3}{\sqrt{0.3(0.7)/947}} = \frac{-0.125}{0.0149} = -8.39 \]

z_tab = 1.96
Example……
 Step 4: Comparison of the calculated and tabulated values of the test statistic
 Step 5: Decision: since the tabulated value (1.96) is smaller than the absolute calculated value of the test statistic (8.39), we reject the null hypothesis.
 Step 6: Conclusion
 Hence we conclude that the proportion of childhood abuse in psychiatric patients is different from 0.3.

 If the sample size is small (np < 5 and n(1-p) < 5), then use Student’s t-statistic for the tabulated value of the test statistic.
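The one-proportion z test of Example 7 can be sketched as below. The function name `one_proportion_z` is an assumption. The code uses the unrounded sample proportion 166/947, so the statistic comes out close to, but not exactly at, the magnitude 8.39 that the worked solution obtains from rounded inputs.

```python
from math import sqrt

def one_proportion_z(x, n, p0):
    """Sample proportion and z statistic, with the standard error computed under H0."""
    p = x / n
    se0 = sqrt(p0 * (1 - p0) / n)
    return p, (p - p0) / se0

# Example 7: 166 of 947 patients, hypothesized proportion 0.3.
p, z_cal = one_proportion_z(166, 947, 0.3)
z_tab = 1.96
print(f"p = {p:.3f}, z = {z_cal:.2f}, reject H0: {abs(z_cal) > z_tab}")
```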
Chi-square test
 In recent years, the use of specialized statistical methods for categorical data has increased dramatically, particularly for applications in the biomedical and social sciences.
 Categorical scales occur frequently in the health sciences, for measuring responses.
 E.g.
 patient survives an operation (yes, no),
 severity of an injury (none, mild, moderate, severe), and
 stage of a disease (initial, advanced).

 Studies often collect data on categorical variables that can be summarized as a series of counts and commonly arranged in a tabular format known as a contingency table.
Chi-square Test Statistic cont’d…
 As with the t distribution, there is a different chi-square distribution for each possible value of degrees of freedom.

 Chi-square distributions with a small number of degrees of freedom are highly skewed; however, this skewness is attenuated as the number of degrees of freedom increases.

 The chi-squared distribution is concentrated over nonnegative values. It has mean equal to its degrees of freedom (df), and its standard deviation equals √(2df). As df increases, the distribution concentrates around larger values and is more spread out.

 The distribution is skewed to the right, but it becomes more bell-shaped (normal) as df increases.

 The degrees of freedom for tests of hypothesis that involve an r×c contingency table are equal to (r-1)×(c-1).
Test of Association
 The chi-squared (χ²) test statistic is widely used in the analysis of contingency tables.

 It compares the actual observed frequency in each group with the expected frequency (the latter is based on theory, experience or comparison groups).

 The chi-squared test (Pearson’s χ²) allows us to test for association between categorical (nominal) variables.

 The null hypothesis for this test is that there is no association between the variables. Consequently a significant p-value implies association.
Test of Association
 It is a requirement that a chi-squared test be applied to discrete data. Counts are appropriate; continuous measurements are not. Assuming continuity in the underlying distribution distorts the p-value and may make false positives more likely.

 Additionally, the chi-squared test should not be used when the expected values in a cell are <5. It is at times not inappropriate to pad an empty cell with a small value, though, as one can only assume the result would be more significant with no value there.
Test Statistic: χ²-test with d.f. = (r-1)x(c-1)

\[ \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

\[ E_{ij} = \frac{i\text{th row total} \times j\text{th column total}}{\text{grand total}} = \frac{R_i \times C_j}{n} \]

Oij = observed frequency, Eij = expected frequency of the cell at the juncture of the ith row and jth column
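The test statistic and the expected frequencies can be computed directly from a table of observed counts. This is a minimal sketch: the function name `chi_square_statistic` is an assumption, and the 2x2 counts are hypothetical numbers chosen so the arithmetic is easy to follow (every expected count is 25).

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic, degrees of freedom and expected counts
    for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # E_ij = (row total i x column total j) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

# Hypothetical 2x2 counts, for illustration only.
table = [[20, 30],
         [30, 20]]
chi2, df, expected = chi_square_statistic(table)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The statistic is then compared with the tabulated χ² value for the same degrees of freedom and the chosen α.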
Chi-square test...
Consider the following 3 by 2 contingency table
Procedures of Hypothesis Testing
1. State the hypothesis
2. Fix the level of significance
3. Find the critical value χ²(df, α)
4. Compute the test statistic
5. Decision rule: reject the null hypothesis if the test statistic > table value.
Example 11:
Consider the following 3x2 contingency table

Chi-square test...
χ² = 153.40
Chi-square table
Right tail areas for the Chi-square Distribution

df\area .995 .990 .975 .950 .900 .750 .500 .250 .100 .050 .025 .010 .005

1 0.00004 0.00016 0.00098 0.00393 0.01579 0.10153 0.45494 1.32330 2.70554 3.84146 5.02389 6.63490 7.87944

2 0.01003 0.02010 0.05064 0.10259 0.21072 0.57536 1.38629 2.77259 4.60517 5.99146 7.37776 9.21034 10.5966

3 0.07172 0.11483 0.21580 0.35185 0.58437 1.21253 2.36597 4.10834 6.25139 7.81473 9.34840 11.3448 12.8381

4 0.20699 0.29711 0.48442 0.71072 1.06362 1.92256 3.35669 5.38527 7.77944 9.48773 11.1432 13.2767 14.8602

5 0.41174 0.55430 0.83121 1.14548 1.61031 2.67460 4.35146 6.62568 9.23636 11.0705 12.8325 15.0862 16.7496

6 0.67573 0.87209 1.23734 1.63538 2.20413 3.45460 5.34812 7.84080 10.6446 12.5915 14.4493 16.811 18.5475

7 0.98926 1.23904 1.68987 2.16735 2.83311 4.25485 6.34581 9.03715 12.0170 14.0671 16.0127 18.4753 20.2777

8 1.34441 1.64650 2.17973 2.73264 3.48954 5.07064 7.34412 10.2188 13.3615 15.5073 17.5345 20.0902 21.9549
Assumptions of the χ²-test
The chi-squared test assumes that
 The data must be categorical.
 The data must be frequency data.
🞑 The numbers in each cell are ‘not too small’. No expected frequency should be less than 1, and
🞑 no more than 20% of the expected frequencies should be less than 5.
 If this does not hold, row or column variable categories can sometimes be combined (re-categorized) to make the expected frequencies larger, or Yates’ continuity correction can be used.
Example 12:
80

Consider hypothetical example on smoking and symptoms of asthma.

The study involved 150 individuals and the result is given in the following
table:
Solution
Hypothesis:
🞑 H0: there is no association between smoking and symptoms of asthma
🞑 H1: there is an association between smoking and symptoms of asthma

The critical value is given by χ2(0.05, 1) = 3.841.

Test statistic: χ2 = 5.36

The p-value corresponding to 5.36 at 1 degree of freedom is approximately 0.02.
Since 5.36 > 3.841 (equivalently, p < 0.05), the decision is to reject the null hypothesis in favour of the alternative hypothesis.
Conclusion: there is an association between smoking and symptoms of asthma.
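The p-value quoted in this solution can be reproduced without a table. The sketch below assumes only the standard fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable; the function name is hypothetical, and the statistic 5.36 and critical value 3.841 are the figures given in the example.

```python
import math


def chi2_sf_df1(x):
    """Right-tail area of the chi-square distribution with 1 degree of freedom."""
    # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))


statistic = 5.36        # test statistic reported in the example
critical_value = 3.841  # tabulated value for right-tail area .050, df = 1

p_value = chi2_sf_df1(statistic)
reject_h0 = statistic > critical_value
print(round(p_value, 2), reject_h0)
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 are two forms of the same decision rule, so they always agree.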
Example 13:

Consider the data on the assessment of the effectiveness of an antidepressant. The data are given below:
Solution

 Hypothesis
🞑 H0: there is no association between the treatment and relapse
🞑 H1: there is an association between the treatment and relapse
 The degrees of freedom for this table are df = (3-1)(2-1) = 2. Thus the critical value from the chi-square distribution is 9.21 (the tabulated value for df = 2 and a right-tail area of .010).
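The degrees of freedom and the critical value in this solution can be checked with a few lines of code. This is a sketch under one stated assumption: for df = 2 the right-tail area of the chi-square distribution is exp(-x/2), so the critical value has the closed form -2 ln(α). That shortcut holds only for df = 2, and the function names are hypothetical.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_critical_df2(alpha):
    """Critical value for df = 2 only: solves exp(-x / 2) = alpha for x."""
    return -2 * math.log(alpha)


df = degrees_of_freedom(3, 2)
print(df)
print(round(chi2_critical_df2(0.01), 2))   # compare with the .010 column, df = 2
print(round(chi2_critical_df2(0.05), 3))   # compare with the .050 column, df = 2
```

The 0.01 result agrees with the value 9.21 used in the solution, which indicates that the example is tested at the 1% level.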
Quiz

 You randomly sampled 286 sexually active individuals and collected information on their HIV status and history of STDs. At the .05 level, is there evidence of a relationship between them?

                 HIV
STDs Hx     No    Yes   Total
No          84     32     116
Yes         48    122     170
Total      132    154     286
Summary

Characteristics of the χ2 distribution
1. Every χ2 distribution extends indefinitely to the right from 0.
2. Every χ2 distribution has only one (right) tail.
3. As df increases, the χ2 curves become more bell-shaped and approach the normal curve in appearance (but remember that a chi-square curve starts at 0, not at -∞).
4. If the value of χ2 is zero, there is perfect agreement between the observed and the expected frequencies. The greater the discrepancy between the observed and expected frequencies, the larger the value of χ2.
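
Point 4 can be illustrated numerically. The sketch below assumes the usual Pearson statistic, the sum of (observed - expected)^2 / expected over all cells, with expected counts computed under independence and no continuity correction. The three 2×2 tables are hypothetical, chosen to share the same row and column totals.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table (no Yates correction)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            total += (o - e) ** 2 / e
    return total


# Hypothetical tables, all with row totals (50, 50) and column totals (50, 50).
perfect = [[25, 25], [25, 25]]   # observed equals expected in every cell
mild = [[30, 20], [20, 30]]      # small departure from independence
strong = [[40, 10], [10, 40]]    # large departure from independence

for table in (perfect, mild, strong):
    print(chi_square_statistic(table))
```

The statistic is zero for the first table and grows as the observed counts move further from the expected counts.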
