Chi square test
Chi Square ($\chi^2$)
The chi square test is used to investigate whether distributions of categorical variables differ from one another. Categorical data may be displayed in contingency tables. The test may be used to test the hypothesis of no association between two or more groups, or between the row and column classifications of a contingency table.
Chi square tests can only be used on actual counts (frequencies) and not on percentages, proportions, means, etc.
There are several types of Chi Square test used depending on the way the data was collected and the hypothesis being tested.
2x2 Contingency Table
Chi Square Distribution Table
Probability level (alpha)
Df    0.5      0.10     0.05     0.02     0.01     0.001
1     0.455    2.706    3.841    5.412    6.635    10.827
2     1.386    4.605    5.991    7.824    9.210    13.815
3     2.366    6.251    7.815    9.837    11.345   16.268
4     3.357    7.779    9.488    11.668   13.277   18.465
5     4.351    9.236    11.070   13.388   15.086   20.517
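For later calculations, the distribution table can be held in a small lookup structure. The sketch below simply copies the tabulated values; the names `CRITICAL_VALUES` and `critical_value` are illustrative and not part of the original material, and only the df and alpha combinations listed in the table are available.

```python
# Critical values copied from the chi square distribution table.
ALPHAS = (0.5, 0.10, 0.05, 0.02, 0.01, 0.001)
ROWS = {
    1: (0.455, 2.706, 3.841, 5.412, 6.635, 10.827),
    2: (1.386, 4.605, 5.991, 7.824, 9.210, 13.815),
    3: (2.366, 6.251, 7.815, 9.837, 11.345, 16.268),
    4: (3.357, 7.779, 9.488, 11.668, 13.277, 18.465),
    5: (4.351, 9.236, 11.070, 13.388, 15.086, 20.517),
}

# CRITICAL_VALUES[df][alpha] -> tabulated critical value
CRITICAL_VALUES = {df: dict(zip(ALPHAS, row)) for df, row in ROWS.items()}


def critical_value(df, alpha):
    """Return the tabulated critical value; raises KeyError if not in the table."""
    return CRITICAL_VALUES[df][alpha]


print(critical_value(1, 0.05))  # 3.841
```

A computed chi square statistic larger than the looked-up value leads to rejecting the null hypothesis at that probability level.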
Table 1. General notation for a 2 x 2 contingency table.
Variable 2    Data type 1    Data type 2    Totals
Category 1    a              b              a+b
Category 2    c              d              c+d
Total         a+c            b+d            a+b+c+d=N
$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$
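The 2 x 2 formula can be sketched as a short function. This is a minimal illustration that assumes the standard shortcut form, N(ad - bc)^2 divided by the product of the four marginal totals, with cells laid out as in Table 1; the function name `chi_square_2x2` and the sample counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi square statistic for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be non-zero")
    return numerator / denominator


# Hypothetical counts, chosen only to exercise the function.
print(round(chi_square_2x2(10, 20, 30, 40), 3))  # 0.794
```

When ad equals bc the rows are exactly proportional and the statistic is zero, which matches the idea of no association.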
EXAMPLE:
Suppose you conducted a drug trial on a group of animals and hypothesized that the animals receiving the drug would show increased heart rates compared to those that did not receive the drug. You conduct the study and collect the data shown in Table 2.
$H_0$: The proportion of animals whose heart rate increased is independent of drug treatment.
$H_a$: The proportion of animals whose heart rate increased is associated with drug treatment.
Table 2. Hypothetical drug trial results.
                          Treated    Not treated    Total
Heart Rate Increased      36         30             66
No Heart Rate Increase    14         25             39
Total                     50         55             105
$\chi^2 = \frac{105[(36)(25) - (30)(14)]^2}{(36+30)(14+25)(36+14)(30+25)} = 3.418$
After the chi square statistic is obtained, the degrees of freedom are computed next, using the formula df = (number of columns - 1) x (number of rows - 1). For a 2 x 2 table, df = (2 - 1) x (2 - 1) = 1.
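The drug trial calculation can be checked end to end with a few lines of code. The sketch below uses the counts from Table 2 and the degrees of freedom rule just stated. The significance level of 0.05 is an assumption, since the text does not fix one; the critical value 3.841 is the df = 1 entry of the distribution table for that level.

```python
# Table 2 counts: rows = heart rate increased / no increase,
# columns = treated / not treated.
a, b = 36, 30
c, d = 14, 25
n = a + b + c + d

# Shortcut formula for a 2 x 2 table.
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

rows, columns = 2, 2
df = (columns - 1) * (rows - 1)

# Assumption: alpha = 0.05, so the tabulated critical value for df = 1 is 3.841.
critical = 3.841
reject_null = chi_square > critical

print(round(chi_square, 3), df, reject_null)  # 3.418 1 False
```

Because 3.418 is smaller than 3.841, the null hypothesis of independence would not be rejected at the assumed 0.05 level.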
Chi Square Test of Independence
For a contingency table that has r rows and c columns, the chi square test can be thought of as a test of independence. In a test of independence, the null and alternative hypotheses are:
$H_0$: The two categorical variables are independent.
$H_a$: The two categorical variables are related.
Formula to be used:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is the observed count and E is the expected count in each cell.
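The summation can be written as a one-line reduction over paired observed and expected counts. This is a minimal sketch; the function name `chi_square_statistic` and the counts used to call it are hypothetical and serve only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over paired observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for illustration only.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]
print(chi_square_statistic(observed, expected))
```

Each cell contributes its squared deviation scaled by the expected count, so a cell that matches its expectation exactly contributes nothing.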
                 Category I    Category II    Category III    Row Totals
Sample A         a             b              c               a+b+c
Sample B         d             e              f               d+e+f
Sample C         g             h              i               g+h+i
Column Totals    a+d+g         b+e+h          c+f+i           a+b+c+d+e+f+g+h+i=N
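The row and column totals in this notation are what the expected counts are built from. The sketch below assumes the usual rule for a test of independence, expected count = (row total x column total) / N, and also derives the degrees of freedom from the table shape. The helper names and the 3 x 3 counts are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in column_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 3 x 3 table (Samples A-C by Categories I-III).
table = [
    [10, 20, 30],
    [20, 30, 10],
    [30, 10, 20],
]
print(expected_counts(table))
print(degrees_of_freedom(table))
```

In this made-up table every row and column total is 60, so every expected count is the same even though the observed counts vary.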
Effects of Learning Difficulties on High School Students' Mathematical Proficiency
                                 Below Average    Average         Above Average    Total
                                 Mathematical     Mathematical    Mathematical
                                 Proficiency      Proficiency     Proficiency
With Learning Difficulties       42               62              29               133
Without Learning Difficulties    34               50              32               116
Total                            76               112             61               249
$H_0$: Learning difficulties and mathematical proficiency are independent of each other.
$H_a$: Learning difficulties and mathematical proficiency are related to each other.
Computing for the expected values: $E = \frac{(\text{row total})(\text{column total})}{N}$

Observed       42        62        29         34        50        32
Expected       40.59     59.82     32.58      35.41     52.18     28.42
|O-E|          1.41      2.18      3.58       1.41      2.18      3.58
(O-E)^2        1.9881    4.7524    12.8164    1.9881    4.7524    12.8164
(O-E)^2 / E    0.05      0.08      0.39       0.06      0.09      0.45

$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.12$
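The whole calculation for the learning difficulties example can be reproduced in code. The sketch below takes the observed counts from the table, computes each expected count as (row total x column total) / N, and sums the contributions. The 0.05 significance level is an assumption, since the text does not state one; 5.991 is the tabulated critical value for df = 2 at that level.

```python
observed = [
    [42, 62, 29],  # with learning difficulties
    [34, 50, 32],  # without learning difficulties
]

row_totals = [sum(row) for row in observed]           # 133, 116
column_totals = [sum(col) for col in zip(*observed)]  # 76, 112, 61
n = sum(row_totals)                                   # 249

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * column_totals[j] / n  # expected count for the cell
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: alpha = 0.05, so the tabulated critical value for df = 2 is 5.991.
critical = 5.991
reject_null = chi_square > critical

print(round(chi_square, 2), df, reject_null)  # 1.12 2 False
```

Since 1.12 is well below 5.991, the data give no grounds at the assumed 0.05 level to reject the hypothesis that learning difficulties and mathematical proficiency are independent.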