Title: Regression and Correlation: Mathematics Support Centre
MATHEMATICS
SUPPORT CENTRE
Eight students took two mathematics tests. We would like to know if we could predict the result of test 2 from test 1. The percentage results are given below:

Test 1   54   72   32   68   55   80   45   77
Test 2   31   38   16   34   27   41   22   37

These results can be plotted on a scatter diagram.

[Scatter Diagram: Test 2 (vertical axis, 0 to 50) against Test 1 (horizontal axis, 0 to 100)]

We can see that those students with the higher marks in test 1 get higher marks in test 2 and the points almost lie on a straight line. If we can find this line then we can use it to predict test 2 marks. We want to make sure that the line is as close to all the points as possible. This line is called the regression line and is found by the … y = a + bx (if you are not sure about this see graph …

In this example the number of pairs of values is 8, i.e. n = 8. We will put the calculations in a table:

        Test 1   Test 2
           x        y       x²       y²       xy
          54       31     2916      961     1674
          72       38     5184     1444     2736
          32       16     1024      256      512
          68       34     4624     1156     2312
          55       27     3025      729     1485
          80       41     6400     1681     3280
          45       22     2025      484      990
          77       37     5929     1369     2849
Total    483      246    31127     8080    15838
          Σx       Σy      Σx²      Σy²      Σxy

b = (8 × 15838 − 483 × 246) / (8 × 31127 − 483²) = 0·5014

b = 0·50 correct to 2 decimal places

a = (246 − 0·5014 × 483) / 8 = 0·48 to 2 d.p.

so y = 0·48 + 0·50x

This is the equation of the regression line and can be drawn on the scatter diagram:

[Scatter Diagram: Test 2 against Test 1 with the regression line drawn]
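The arithmetic in the worked example can be checked with a short program. The sketch below is only an illustration: it assumes the usual least-squares formulas b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = (Σy − bΣx) / n, which agree with the numbers substituted above. The function name `regression_line` and the test 1 mark of 60 used for the prediction are made up for the example.

```python
# Percentage results from the worked example.
test1 = [54, 72, 32, 68, 55, 80, 45, 77]  # x
test2 = [31, 38, 16, 34, 27, 41, 22, 37]  # y


def regression_line(x, y):
    """Return (a, b) for the line y = a + b*x.

    Assumes the standard least-squares formulas:
        b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2)
        a = (Sy - b*Sx) / n
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(v * v for v in x)
    sum_xy = sum(u * v for u, v in zip(x, y))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = regression_line(test1, test2)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 0.48, b = 0.50

# Predict the test 2 mark for a hypothetical test 1 mark of 60.
predicted = a + b * 60
print(f"predicted test 2 mark: {predicted:.1f}")
```

The program keeps b unrounded when it works out a, so the two values agree with the hand calculation once they are rounded to 2 decimal places.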
(Answers:
a = 4, b = 3, y = 4 + 3x
a = 59, b = 7, y = 59 + 7x
a = -5·56, b = 0·58, y = -5·56 + 0·58x. For each 1g increase in weight there will be an extra 0·58 of a sultana. 15 sultanas)

Correlation
In all the examples above the scatter diagrams show that there seems to be a linear relationship connecting the variables so we are justified in finding the regression line. We can also calculate the correlation coefficient, r, to give a measure of how good this relationship is. If the points lie exactly on a straight line then we have perfect correlation and r = 1 or -1. Generally r lies between these values.

[Diagrams: example scatter plots of y against x for r = 1, r = -1, 0 < r < 1, -1 < r < 0 and r = 0]

If the correlation is positive then as x increases y increases, i.e. the gradient is positive. If y decreases as x increases then the correlation is negative (negative slope).

Exercise
Find the correlation coefficients for the data in the previous exercise. (Answers: 1, 0·91, 0·99)

The coefficient of determination is r². This tells us how much of the variation in y is explained by the variation in x. In the above example r² = 0·98² = 0·96, so we can say that 96% of the variation in y is explained by the variation in x.

Exercise
1–3. Find the coefficients of determination for the data in the previous exercise and interpret them.
4. The following table gives the price of a particular item and the number bought.

Price (£)   100   120   140   160   180
Quantity     15    10     9     8     5

Plot a scatter diagram and find the equation of the regression line if appropriate. Interpret the coefficient of regression. Calculate the correlation coefficient and coefficient of determination. Estimate how many items are bought if the price is a) £130 b) £210. Which is the more reliable estimate and why?

(Answers:
1. 1, 100% of the variation in y is explained by the variation in x.
2. 0·83, 83% of the variation in y is explained by the variation in x.
3. 0·97, 97% of the variation in the number of sultanas is explained by the variation in the weight.
4. q = 24·8 - 0·11p; for each £1 increase in price 0·11 fewer items are bought; -0·95; 0·91; 11; 2. a) is more reliable as it is within the given range of the data.)

Mathematics Support Centre, Coventry University, 2001
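The correlation coefficient and the coefficient of determination can be checked in the same way. The formula for r is not shown in this section, so the sketch below assumes the standard product-moment formula r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]. The function name `correlation` is made up for the example. It is applied to the test results and to the price and quantity data from exercise 4.

```python
import math


def correlation(x, y):
    """Return the correlation coefficient r for paired values x and y.

    Assumes the standard product-moment formula:
        r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx**2) * (n*Syy - Sy**2))
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(v * v for v in x)
    sum_yy = sum(v * v for v in y)
    sum_xy = sum(u * v for u, v in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    return numerator / denominator


# Test results from the worked example.
test1 = [54, 72, 32, 68, 55, 80, 45, 77]
test2 = [31, 38, 16, 34, 27, 41, 22, 37]
r = correlation(test1, test2)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")  # r = 0.98, r squared = 0.96

# Price and quantity data from exercise 4.
price = [100, 120, 140, 160, 180]
quantity = [15, 10, 9, 8, 5]
r_price = correlation(price, quantity)
print(f"r = {r_price:.2f}, r squared = {r_price ** 2:.2f}")  # r = -0.95, r squared = 0.91
```

The negative value of r for the price data reflects the negative slope: as the price increases, the quantity bought decreases.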