Lecture: Chi-Square Non-Parametric Test
Note that chi-square tests can only be used on actual counts (frequencies),
not on percentages, proportions, etc.
A study of 667 drivers who were using a cell phone when they were involved
in a collision on a weekday examined the relationship between these
accidents and the day of the week.
Are the accidents equally likely to occur on any day of the working week?
To answer this question we use the chi-square goodness-of-fit test.
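Decision Rule:
If $\chi^2_{STAT} > \chi^2_{\alpha}$, reject H0;
otherwise, do not reject H0.
[Figure: chi-square density; "do not reject H0" region from 0 up to $\chi^2_{\alpha}$, "reject H0" region of area α to the right of $\chi^2_{\alpha}$]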
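The expected count for each of the five days is $np_i = 667(1/5) = 133.4$.
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \sum_{\text{day}} \frac{(\text{count}_{\text{day}} - 133.4)^2}{133.4} = 8.49$$
Under H0 this statistic follows the chi-square distribution with 5 − 1 = 4 degrees of freedom.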
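The arithmetic behind the statistic can be sketched in a few lines of Python. The daily counts are not listed in this section, so the five counts below are ASSUMED for illustration only (they sum to 667 and reproduce the reported value); the function name is also hypothetical.

```python
def goodness_of_fit_statistic(observed, probabilities):
    """Return (statistic, expected counts, df) for a chi-square goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probabilities]  # n * p_i for each category
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, len(observed) - 1


# ASSUMED weekday counts (Mon..Fri); they are not given in the lecture text.
assumed_counts = [133, 126, 159, 136, 113]
stat, expected, df = goodness_of_fit_statistic(assumed_counts, [1 / 5] * 5)
print(round(stat, 2), round(expected[0], 1), df)
```

With any other set of counts summing to 667 the expected value stays 133.4 per day; only the statistic changes.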
Chi-square critical values (upper-tail probability p):
df   0.25   0.2    0.15   0.1    0.05   0.025  0.02   0.01   0.005  0.0025 0.001  0.0005
1    1.32   1.64   2.07   2.71   3.84   5.02   5.41   6.63   7.88   9.14   10.83  12.12
2    2.77   3.22   3.79   4.61   5.99   7.38   7.82   9.21   10.60  11.98  13.82  15.20
3    4.11   4.64   5.32   6.25   7.81   9.35   9.84   11.34  12.84  14.32  16.27  17.73
4    5.39   5.99   6.74   7.78   9.49   11.14  11.67  13.28  14.86  16.42  18.47  20.00
5    6.63   7.29   8.12   9.24   11.07  12.83  13.39  15.09  16.75  18.39  20.51  22.11
6    7.84   8.56   9.45   10.64  12.59  14.45  15.03  16.81  18.55  20.25  22.46  24.10
7    9.04   9.80   10.75  12.02  14.07  16.01  16.62  18.48  20.28  22.04  24.32  26.02
8    10.22  11.03  12.03  13.36  15.51  17.53  18.17  20.09  21.95  23.77  26.12  27.87
9    11.39  12.24  13.29  14.68  16.92  19.02  19.68  21.67  23.59  25.46  27.88  29.67
10   12.55  13.44  14.53  15.99  18.31  20.48  21.16  23.21  25.19  27.11  29.59  31.42
11   13.70  14.63  15.77  17.28  19.68  21.92  22.62  24.72  26.76  28.73  31.26  33.14
12   14.85  15.81  16.99  18.55  21.03  23.34  24.05  26.22  28.30  30.32  32.91  34.82
13   15.98  16.98  18.20  19.81  22.36  24.74  25.47  27.69  29.82  31.88  34.53  36.48
Since the value 8.49 of the test statistic is less than the table value of 9.49 (df = 4, p = 0.05), we do not reject H0.
➔ There is no significant evidence of different car accident rates for different weekdays when the driver was using a cell phone.
Car accidents and day of the week
(bounds on P-value)
H0 specifies that all days are equally likely for
car accidents ➔ each $p_i = 1/5$.
The expected count for each of the five days is $np_i = 667(1/5) = 133.4$.
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \sum_{\text{day}} \frac{(\text{count}_{\text{day}} - 133.4)^2}{133.4} = 8.49$$
Under H0 this statistic follows the chi-square distribution with 5 − 1 = 4 degrees of freedom.
Chi-square table row for df = 4:
p       0.25   0.2    0.15   0.1    0.05   0.025  0.02   0.01   0.005  0.0025 0.001  0.0005
df = 4  5.39   5.99   6.74   7.78   9.49   11.14  11.67  13.28  14.86  16.42  18.47  20.00
7.78 < $\chi^2$ = 8.49 < 9.49. Thus the bounds on the P-value are 0.05 < P-value < 0.1.
We don't know the exact P-value, but we DO know that P-value > 0.05, thus we conclude that …
➔ There is no significant evidence of different car accident rates for different weekdays when the driver was using a cell phone.
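The table only brackets the P-value. For an even number of degrees of freedom the chi-square upper-tail probability has a closed form, so the bracket can be checked with the Python standard library alone. This is a sketch that is not part of the lecture, and the helper name is hypothetical.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to even, positive df")
    half = x / 2
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


p_value = chi2_upper_tail_even_df(8.49, 4)
print(round(p_value, 3))
```

The printed value should fall inside the bracket 0.05 < P-value < 0.1 read from the table, and evaluating the same function at 9.49 should give roughly 0.05.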
Example 2: M & M Colors
$\chi^2 = 9.316$; degrees of freedom = 6 − 1 = 5
Here, $\chi^2 = 9.316 < \chi^2_{0.05} = 11.070$,
so we do not reject H0 and conclude that there is not
sufficient evidence that Mars has changed the color
proportions.
[Figure: chi-square density with 5 d.f.; "do not reject H0" region from 0 up to $\chi^2_{0.05} = 11.070$, "reject H0" region of area 0.05 to the right]
Example: Hand Preference
Sample size n = 300:
120 females, 12 were left-handed
180 males, 24 were left-handed

Gender   Left   Right  Total
Female   12     108    120
Male     24     156    180
Total    36     264    300
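Testing for independence
$$f_e = \frac{\text{row total} \times \text{column total}}{n}$$
Where:
row total = sum of all frequencies in the row
column total = sum of all frequencies in the column
n = overall sample size
df = (r − 1)(c − 1), where r is the number of rows and c the number of columns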
Computing the Average Proportion
The average proportion is:
$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n}$$
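Applied to the hand-preference data (12 of 120 females and 24 of 180 males left-handed), the average proportion works out as follows; multiplying it by each row total gives the expected number of left-handed people in that row if H0 is true.

```latex
\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{12 + 24}{120 + 180} = \frac{36}{300} = 0.12,
\qquad 0.12 \times 120 = 14.4, \qquad 0.12 \times 180 = 21.6
```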
Hand Preference (observed and expected frequencies)
Gender   Left                             Right                              Total
Female   Observed = 12, Expected = 14.4   Observed = 108, Expected = 105.6   120
Male     Observed = 24, Expected = 21.6   Observed = 156, Expected = 158.4   180
Total    36                               264                                300
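Each expected frequency is row total × column total / n. A minimal Python sketch (the function name is hypothetical) reproduces the expected values from the observed counts alone:

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Female, Male; columns: Left, Right (observed counts from the example).
hand_preference = [[12, 108], [24, 156]]
expected = expected_counts(hand_preference)
print(expected)
```

The same function works for any r x c table of counts, since it only uses the row and column totals.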
The Chi-Square Test Statistic
The test statistic is:
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$
Decision Rule
The test statistic is $\chi^2_{STAT} = 0.7576$; $\chi^2_{0.05}$ with 1 d.f. = 3.841
Decision Rule:
If $\chi^2_{STAT} > 3.841$, reject H0;
otherwise, do not reject H0.
Here, $\chi^2_{STAT} = 0.7576 < \chi^2_{0.05} = 3.841$,
so we do not reject H0 and conclude that there is not
sufficient evidence that the two proportions are different at α = 0.05.
[Figure: chi-square density with 1 d.f.; "do not reject H0" region from 0 up to $\chi^2_{0.05} = 3.841$, "reject H0" region of area 0.05 to the right]
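The decision can also be expressed through a P-value. With 1 degree of freedom a chi-square variable is the square of a standard normal variable, so the upper-tail probability is available through `math.erfc` in the Python standard library. This is a sketch outside the lecture's table-based approach; the variable names are illustrative.

```python
import math

chi2_stat = 0.7576      # test statistic from the example
critical_value = 3.841  # chi-square critical value for alpha = 0.05, 1 d.f.

# With 1 d.f., P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2))

reject_h0 = chi2_stat > critical_value
print(reject_h0, round(p_value, 2))
```

Both routes agree: the statistic is below the critical value, and the P-value is well above 0.05.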
Example 2: meal plan selection
• The meal plan selected by 200 students is shown below:
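$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$
Decision Rule:
If $\chi^2_{STAT} > 12.592$, reject H0; otherwise,
do not reject H0.
Here, $\chi^2_{STAT} = 0.709 < \chi^2_{0.05} = 12.592$,
so do not reject H0.
Conclusion: there is not sufficient evidence that meal
plan and class standing are related.
[Figure: chi-square density; "do not reject H0" region from 0 up to $\chi^2_{0.05} = 12.592$, "reject H0" region of area 0.05 to the right]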
$\chi^2$ Test of Independence
The chi-square test statistic is:
$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
where:
f_o = observed frequency in a particular cell of the r x c table
f_e = expected frequency in a particular cell if H0 is true
$$f_e = \frac{\text{row total} \times \text{column total}}{n}$$
Where:
row total = sum of all frequencies in the row
column total = sum of all frequencies in the column
n = overall sample size
Decision Rule
• The decision rule is
If $\chi^2_{STAT} > \chi^2_{\alpha}$, reject H0;
otherwise, do not reject H0.
Where $\chi^2_{\alpha}$ is from the chi-squared distribution
with (r – 1)(c – 1) degrees of freedom
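The statistic and its degrees of freedom for a general r x c table can be sketched as one small Python function. The function name is hypothetical, and the check uses the hand-preference counts from the earlier example rather than any new data.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected frequency under H0
            statistic += (f_o - f_e) ** 2 / f_e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hand-preference table from the earlier example (rows: Female, Male).
stat, df = chi_square_independence([[12, 108], [24, 156]])
print(round(stat, 4), df)
```

The returned statistic is then compared with the table value $\chi^2_{\alpha}$ for the returned degrees of freedom, as in the decision rule.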
Example
Suppose you have the following categorical data for 3
types of flu in three different regions