Regression Analysis
Correlation is the degree of relationship between variables. Correlation analysis
seeks to determine how well a linear or other equation describes or explains the
relationship between variables. It also implies an “association” between two variables.
$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}} $$
where:
$\sum x$ = sum of the values of x
$\sum y$ = sum of the values of y
$\sum x^2$ = sum of the squares of the values of x
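As an illustrative sketch only, the formula for r can be written as a short Python function. The function name `pearson_r` and the toy data are assumptions made for this example; the toy points lie exactly on a rising straight line, so the result should be 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed with the raw-score formula shown in the text."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical toy data lying exactly on the line y = 2x + 1.
toy_r = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
print(toy_r)
```

The function divides by zero when either variable is constant, which matches the fact that r is undefined in that case.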
Figure 7: Scatter plot Diagram
LINEAR REGRESSION
Y = a + bx
where
Y = predicted value
a = y-intercept
b = slope of the regression line
x = the value of the independent variable used to make the prediction
$$ b = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - (\sum x)^2} \qquad a = \bar{y} - b\bar{x} $$
where:
$\bar{y}$ = mean value of Y
$\bar{x}$ = mean value of X
$\sum x$ = sum of the values of x
$\sum y$ = sum of the values of y
$\sum x^2$ = sum of the squares of the values of x
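The two formulas for b and a translate directly into code. The sketch below is an illustration, not part of the original module; the name `fit_line` and the toy data (points on the line y = 2x + 1) are assumptions.

```python
def fit_line(xs, ys):
    """Return (a, b) for the regression line Y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Slope: b = (N*sum(xy) - sum(x)*sum(y)) / (N*sum(x^2) - (sum(x))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(y) - b * mean(x)
    a = sum_y / n - b * (sum_x / n)
    return a, b

# Hypothetical toy data lying exactly on y = 2x + 1.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

For these toy points the function should recover an intercept of 1 and a slope of 2.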
Below are the scores of 12 college students in Mathematics and Physics tests of
80 items each.
Mathematics (x) 65 63 67 64 68 62 70 66 68 67 69 71
Physics (y) 68 66 68 65 69 66 68 65 71 67 68 70
Solution
Step 1: Draw a scatter plot. If the scatter plot does not show any (linear) trend,
stop the analysis and conclude “no relationship”. Otherwise, proceed to Step 2.
[Scatter plot of the Physics scores (y) against the Mathematics scores (x)]
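When no plotting software is at hand, Step 1 can be approximated with a rough text plot. The sketch below is only an illustration; the helper name `ascii_scatter` and the layout are assumptions, and a spreadsheet or plotting library would normally be used instead.

```python
math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

def ascii_scatter(xs, ys):
    """Return a text scatter plot: one row per y value, one column per x value."""
    points = set(zip(xs, ys))
    x_values = range(min(xs), max(xs) + 1)
    lines = []
    for y in range(max(ys), min(ys) - 1, -1):  # highest y on the top row
        row = "".join(" * " if (x, y) in points else " . " for x in x_values)
        lines.append(f"{y:>3} |{row}")
    lines.append("    +" + "---" * len(x_values))
    lines.append("     " + "".join(f"{x:>3}" for x in x_values))
    return "\n".join(lines)

plot = ascii_scatter(math_scores, physics_scores)
print(plot)
```

The stars drift upward from left to right, which is the linear trend that justifies moving on to the computation of r.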
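Step 2: Tabulate x², y², and xy for each student, total each column, and compute r.
Number   Mathematics (x)   Physics (y)   x²     y²     xy
1        65                68            4225   4624   4420
2        63                66            3969   4356   4158
3        67                68            4489   4624   4556
4        64                65            4096   4225   4160
5        68                69            4624   4761   4692
6        62                66            3844   4356   4092
7        70                68            4900   4624   4760
8        66                65            4356   4225   4290
9        68                71            4624   5041   4828
10       67                67            4489   4489   4489
11       69                68            4761   4624   4692
12       71                70            5041   4900   4970
N = 12, $\sum x = 800$, $\sum y = 811$, $\sum x^2 = 53418$, $\sum y^2 = 54849$, $\sum xy = 54107$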
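The column totals can be reproduced from the raw scores. The following sketch is an illustration only (the variable names are assumptions); it builds the x², y², and xy columns and sums each one.

```python
math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

# One row per student: x, y, x^2, y^2, xy.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(math_scores, physics_scores)]
n = len(rows)

# zip(*rows) turns the rows into columns so each column can be summed.
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*rows))

print(f"{'No.':>3} {'x':>4} {'y':>4} {'x^2':>6} {'y^2':>6} {'xy':>6}")
for number, (x, y, x2, y2, xy) in enumerate(rows, start=1):
    print(f"{number:>3} {x:>4} {y:>4} {x2:>6} {y2:>6} {xy:>6}")
print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
```

The last line should agree with the totals in the text: N = 12, 800, 811, 53418, 54849, and 54107.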
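$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}} $$
$$ r = \frac{(12)(54107) - (800)(811)}{\sqrt{\left[(12)(53418) - (800)^2\right]\left[(12)(54849) - (811)^2\right]}} $$
$$ r = 0.70 $$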
Referring to the arbitrary scale for interpreting r, the value r = 0.70 indicates
a strong (high) positive relationship between the scores of the
students in Mathematics and Physics.
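The substitution for r can be checked step by step from the totals. This is a minimal sketch with assumed variable names; it only repeats the arithmetic of the formula.

```python
import math

n = 12
sum_x, sum_y = 800, 811
sum_x2, sum_y2, sum_xy = 53418, 54849, 54107

numerator = n * sum_xy - sum_x * sum_y   # (12)(54107) - (800)(811)
x_part = n * sum_x2 - sum_x ** 2         # (12)(53418) - (800)^2
y_part = n * sum_y2 - sum_y ** 2         # (12)(54849) - (811)^2
r = numerator / math.sqrt(x_part * y_part)

print(numerator, x_part, y_part)
print(round(r, 2))
```

The three intermediate quantities come out to 484, 1016, and 467, and r rounds to 0.70, in agreement with the hand computation.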
Step 3: Formulate the regression line equation by first solving for the values of
the coefficients b and a.
Solving for b
$$ b = \frac{(12)(54107) - (800)(811)}{(12)(53418) - (800)^2} $$
$$ b = 0.48 $$
Solving for a
Y = a + bx
y = 35.59 + 0.48x (regression line equation)
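The coefficients can be recomputed from the totals as a check. The sketch below is an illustration with assumed variable names. It suggests that the intercept is sensitive to rounding: carrying b at full precision gives an intercept of about 35.82, while rounding b to 0.48 before computing a gives about 35.58, which is close to the 35.59 used in the text.

```python
n = 12
sum_x, sum_y, sum_x2, sum_xy = 800, 811, 53418, 54107

# Slope and intercept at full precision.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

# Intercept obtained when the slope is rounded to two decimals first.
a_rounded_slope = sum_y / n - round(b, 2) * (sum_x / n)

print(f"b = {b:.4f}, a = {a:.4f}, a with rounded b = {a_rounded_slope:.2f}")
```

Either version leads to nearly the same estimates within the range of the data, but software output such as a spreadsheet regression will report the full-precision values.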
We can now estimate scores in Physics (y) using the regression line
equation by substituting a value or score in Mathematics (x). For instance, if x
is equal to 75, then solving for y gives 71.59.
y = 35.59 + 0.48(75)
y = 71.59
Therefore, if the score in Mathematics is 75, the estimated score in Physics is
71.59, or approximately 72. The regression line equation can now be used to
estimate y for any other value of x by substitution.
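The substitution can be wrapped in a small function so that any Mathematics score can be tried. This is a sketch only; the function name `predict_physics` is an assumption, and the default coefficients are the rounded values from the regression line equation in the text.

```python
def predict_physics(math_score, a=35.59, b=0.48):
    """Estimate a Physics score from a Mathematics score using Y = a + b*x."""
    return a + b * math_score

estimate = predict_physics(75)
print(round(estimate, 2))
```

With x = 75 the function reproduces the estimate of about 71.59 obtained by hand.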
To solve Correlation and Linear Regression problems with the Data Analysis
function on the Data menu, we first need to install the Analysis ToolPak, which is
found among the add-in programs.
1. Click the File tab, click Options, and then click the Add-Ins category. This
screen will appear.
2. Select Add-ins. In the Manage box, select Excel Add-ins and then click Go.
3. In the Add-Ins available box, select the Analysis ToolPak check box, and then
click OK. Click Data on the menu and notice that Data Analysis now appears at the
far right of the Excel screen, which confirms that it was successfully installed.
4. If you are prompted that the Analysis ToolPak is not currently installed on
your computer, click Yes to install it.
5. After installing the Analysis ToolPak, solve the example for correlation and
linear regression using the same given data (scores in Mathematics and
Physics).
Mathematics (x) 65 63 67 64 68 62 70 66 68 67 69 71
Physics (y) 68 66 68 65 69 66 68 65 71 67 68 70
6. Open a new blank MS Excel worksheet, then encode the given data vertically,
that is, one variable per column. Number the respondents' data; there are
12 respondents in all.
7. Next, select Data in the menu, then click Data Analysis. A dialog box will
appear; scroll down until you find Regression.
9. Select Input X Range, then highlight all the Mathematics data; in Input Y
Range, highlight all the Physics data. Remember that x represents the
Mathematics scores and y the Physics scores. Select Confidence Level; the preset
value is 95%, which corresponds to a significance level of 0.05.
For the output, select New Worksheet. For the residuals, select Line Fit Plots to
visualize the graph of the data.
a. Look for the Regression Statistics. With a single predictor, the value of
Multiple R is the magnitude of Pearson r. Referring to the arbitrary scale,
the r-value of 0.702651 indicates a strong (high) relationship between
the Mathematics and the Physics scores. Note also that this result is the
basis for continuing with regression, since it indicates that the variables
are significantly related to each other. If no relationship is found,
stop the process and conclude that there is no significant relationship.
e. The last part of the result in the worksheet presents the residual
outputs after applying the regression equation that we had previously
derived. The predicted values of y are given in the predicted y column.
f. The Line Fit Plot now shows the graphical representation of the
variables with their actual observed values of Y and the predicted value
of Y using the previously derived linear regression equation.
EXERCISES for Chapter 5
1. Test scores of nine (9) students are shown below. What can you say about the
strength of the correlation between these sets of scores in Trigonometry and
Geometry?
Trigonometry 43 41 50 47 35 33 50 33 54
Geometry 48 45 47 43 33 28 48 31 57
2. Calculate the degree of linear relationship for the following number of minutes
consumed in studying and score in the examination.
Number of minutes 27 50 57 15 18 48 52 55 28 32
Score in examination 40 53 52 24 21 35 40 39 47 36
3. The number of hours spent per week viewing television (y) and the number of
years of education (x) were recorded for ten randomly selected individuals.
The results are given below:
x 12 14 11 16 16 18 12 20 10 12
y 10 9 15 8 5 4 20 4 16 15
a. Draw the scatter diagram.
b. Find the correlation coefficient of x and y and interpret your answer.
c. Find the regression line equation.
d. What are the predicted values of y if x is 15, 17, and 19?
Subject 1 2 3 4 5 6 7 8 9 10
Estrone in saliva (x) 7.4 7.5 8.5 9.0 9.0 11.0 13.0 14.0 14.5 16.0
Estrone in free plasma (y) 30.0 25.0 31.5 27.5 39.5 38.0 43.2 49.0 55.0 48.5
a. Compute and interpret the correlation coefficient for estrone in saliva
and estrone in free plasma.
b. Estimate the line of regression of estrone in free plasma (y) on estrone
in saliva (x).
c. If the estrone level in saliva is 12.1, predict the level of estrone in free plasma.
5. Compute the correlation ratio between test scores and teaching method.
Teaching Method 54 61 75 63 82 52 63 50
Test scores 76 80 89 80 88 83 79 82
6. Calculate the degree of linear relationship for the following number of years
of experiences and yearly salary.
7. A study was conducted to examine the association between adult immunity
and juvenile mortality in southern fur seals. Therefore, researchers determined
the percentage of adult southern fur seals on different island populations that
contained a certain antibody in their blood and they also determined the
mortality rate for seal pups on those same islands. Is there a significant
relationship between adult southern seal immunity and seal pup mortality on
these islands?
Antibody presence 35 58 69 43 94 26 7 9 12 45 11 66 51
Pup mortality 115 98 109 63 24 226 357 339 112 145 111 36 54
8. A researcher believes that a person who works in the academe and
spends years in it receives a yearly increment in salary. The researcher
therefore gathered data and sought to create a linear regression
equation to represent this claim. Below are the gathered data.