Title: Regression and Correlation: Mathematics Support Centre

This document provides information about calculating a regression line from sample data to predict outcomes. It includes: 1) An example of test score data from 8 students and calculating the regression line equation to predict Test 2 scores from Test 1. 2) Formulas for calculating the regression coefficient (b), representing the change in y for a one unit change in x, and the y-intercept (a). 3) How to interpret the regression coefficient and use the line equation to predict scores. 4) Definitions of correlation coefficient (r) and coefficient of determination (r²) as measures of the strength of the linear relationship.

S12

MATHEMATICS
SUPPORT CENTRE

Title: Regression and Correlation

Target: On completion of this worksheet you should be able to calculate the equation of a regression line, the correlation coefficient and the coefficient of determination.

Eight students took two mathematics tests. We would like to know if we could predict the result of test 2 from test 1. The percentage results are given below:

Test 1   54  72  32  68  55  80  45  77
Test 2   31  38  16  34  27  41  22  37

These results can be plotted on a scatter diagram.

[Scatter Diagram: Test 2 (vertical axis, 0 to 50) against Test 1 (horizontal axis, 0 to 100)]

We can see that those students with the higher marks in test 1 get higher marks in test 2 and the points almost lie on a straight line. If we can find this line then we can use it to predict test 2 marks. We want to make sure that the line is as close to all the points as possible. This line is called the regression line and is found by the Method of Least Squares. We will use formulae to find the equation of the regression line.

Suppose the equation of the regression line is y = a + bx (if you are not sure about this see graph sheet G6). If we call the test 1 results ‘x’ and the test 2 results ‘y’ then a and b are found from the formulae:

$$b = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{n\Sigma x^2 - (\Sigma x)^2} \qquad a = \frac{\Sigma y - b \times \Sigma x}{n}$$

(n is the number of pairs of values)
Note: We must find b first then use it to find a.

In this example the number of pairs of values is 8, i.e. n = 8. We will put the calculations in a table:

         Test 1   Test 2
           x        y       x²      y²      xy
           54       31     2916     961    1674
           72       38     5184    1444    2736
           32       16     1024     256     512
           68       34     4624    1156    2312
           55       27     3025     729    1485
           80       41     6400    1681    3280
           45       22     2025     484     990
           77       37     5929    1369    2849
Total     483      246    31127    8080   15838
           Σx       Σy      Σx²     Σy²     Σxy

$$b = \frac{8 \times 15838 - 483 \times 246}{8 \times 31127 - 483^2} = 0{\cdot}5014$$

b = 0·50 correct to 2 decimal places

$$a = \frac{246 - 0{\cdot}5014 \times 483}{8} = 0{\cdot}48 \text{ to 2 d.p.}$$

so y = 0·48 + 0·50x

This is the equation of the regression line and can be drawn on the scatter diagram:

[Scatter Diagram: Test 2 against Test 1 (horizontal axis, 0 to 100), with the regression line drawn]

If another student achieved a score of 40 in test 1 then we can predict that this student will get (0·48 + 0·50 × 40) = 20·48, i.e. about 20, in test 2.

The value of b (the gradient of the line) is called the regression coefficient and shows that for each extra mark in test 1 the mark in test 2 goes up by 0·50 on average.
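The formulae for b and a can be checked with a few lines of Python. This is only a sketch using the test data from the example; the names `regression_line` and `predicted` are our own and are not part of the worksheet. Because the code keeps b unrounded, the prediction for a test 1 mark of 40 comes out at about 20·5 rather than the 20·48 obtained from the rounded coefficients.

```python
# Test scores from the worked example: x = test 1, y = test 2.
x = [54, 72, 32, 68, 55, 80, 45, 77]
y = [31, 38, 16, 34, 27, 41, 22, 37]


def regression_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)  # number of pairs of values
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # b must be found first, then used to find a.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = regression_line(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")

# Predicted test 2 mark for a student scoring 40 in test 1.
predicted = a + b * 40
print(f"predicted test 2 mark = {predicted:.1f}")
```

The same function can be applied to any of the exercise data sets by passing in different lists for x and y.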
Mathematics Support Centre,Coventry University, 2001
Exercise
Plot a scatter diagram and find the values of a and b and the equation of the line of best fit for the following sets of data:

1.
x   0   1   2
y   4   7   10

2.
x   1    2    3    4
y   70   70   80   100

3. Five fruit buns were weighed and the number of sultanas in each was noted:
Weight (g)     22  38  47  50  53
No. sultanas    7  18  20  24  26
Give an interpretation of b and predict how many sultanas there would be in a bun weighing 35g.

(Answers:
1. a = 4, b = 3, y = 4 + 3x
2. a = 55, b = 10, y = 55 + 10x
3. a = -5·56, b = 0·58, y = -5·56 + 0·58x. For each 1g increase in weight there will be 0·58 more of a sultana on average. 15 sultanas.)

Correlation
In all the examples above the scatter diagrams show that there seems to be a linear relationship connecting the variables so we are justified in finding the regression line. We can also calculate the correlation coefficient, r, to give a measure of how good this relationship is. If the points lie exactly on a straight line then we have perfect correlation and r = 1 or –1. Generally r lies between these values.

[Five scatter diagrams of y against x, labelled r = 1, r = -1, 0 < r < 1, -1 < r < 0 and r = 0]

If the correlation is positive then as x increases y increases, i.e. the gradient is positive. If y decreases as x increases then the correlation is negative (negative slope).

Example
Using the data from the previous example calculate the correlation coefficient, r.

$$r = \frac{n\Sigma xy - \Sigma x\Sigma y}{\sqrt{(n\Sigma x^2 - (\Sigma x)^2)(n\Sigma y^2 - (\Sigma y)^2)}}$$

We have already calculated these totals:
Σx = 483   Σx² = 31127   Σxy = 15838
Σy = 246   Σy² = 8080    n = 8

$$r = \frac{8 \times 15838 - 483 \times 246}{\sqrt{(8 \times 31127 - 483^2)(8 \times 8080 - 246^2)}}$$

r = 0·98

This is close to 1 which is to be expected as we can see from the scatter diagram that the points lie very close to the regression line.

Exercise
Find the correlation coefficients for the data in the previous exercise. (Answers: 1, 0·91, 0·99)

The coefficient of determination is r². This tells us how much of the variation in y is explained by the variation in x. In the above example r² = 0·98², r² = 0·96, so we can say that 96% of the variation in y is explained by the variation in x.

Exercise
1–3. Find the coefficients of determination for the data in the previous exercise and interpret them.
4. The following table gives the price of a particular item and the number bought.
Price (£)   100  120  140  160  180
Quantity     15   10    9    8    5
Plot a scatter diagram and find the equation of the regression line if appropriate. Interpret the coefficient of regression. Calculate the correlation coefficient and coefficient of determination. Estimate how many items are bought if the price is a) £130 b) £210. Which is the more reliable estimate and why?

(Answers:
1. 1, 100% of the variation in y is explained by the variation in x.
2. 0·83, 83% of the variation in y is explained by the variation in x.
3. 0·97, 97% of the variation in the number of sultanas is explained by the variation in the weight.
4. q = 24·8 - 0·11p; for each £1 increase in price 0·11 fewer items are bought; r = -0·95; r² = 0·91; a) 11, b) 2. a) is more reliable as £130 is within the given range of the data.)
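The correlation coefficient and the coefficient of determination for the test score example can be checked in the same way. This is a sketch only; the function name `correlation` and the variable `r_squared` are our own choices, and the code simply follows the formula for r given in the example.

```python
from math import sqrt

# Test scores from the worked example: x = test 1, y = test 2.
x = [54, 72, 32, 68, 55, 80, 45, 77]
y = [31, 38, 16, 34, 27, 41, 22, 37]


def correlation(x, y):
    """Return the correlation coefficient r for paired data x and y."""
    n = len(x)  # number of pairs of values
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation(x, y)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.2f}, r squared = {r_squared:.2f}")
```

Passing the price and quantity data from question 4 to `correlation` gives a negative value of r, as expected when y decreases as x increases.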
