Regression Analysis

This document discusses correlation and regression analysis. Correlation determines the relationship between variables and can range from +1 to -1, where 0 indicates no association. Pearson's correlation coefficient (r) specifically measures the strength of a linear relationship between interval/ratio variables. Linear regression finds the best fit line to estimate the relationship between two variables, with the equation Y = a + bx. The document provides an example of computing correlation and regression using data on students' math and physics test scores.

CORRELATION AND REGRESSION ANALYSIS
Correlation is the degree of relationship between variables. Correlation
analysis seeks to determine how well a linear or other equation describes or
explains that relationship. It also implies an “association” between two variables.

PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT

The Pearson product-moment correlation coefficient (or Pearson r for
short) is a measure of the strength of a linear association between two variables
measured on an interval or ratio scale.

\[ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}} \]

where:
Σx = sum of the values of x
Σy = sum of the values of y
Σx² = sum of the squares of the values of x
Σy² = sum of the squares of the values of y
Σxy = sum of the products of x and y
N = total number of pairs
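
As a rough illustration, the computational formula above can be sketched in Python. The function name `pearson_r` and the three-point toy data are assumptions made for this sketch and are not part of the original text; the toy values were chosen only so that the result can be checked by hand.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the raw-score (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and hold at least two values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = n * sum_xy - sum_x * sum_y
    spread = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if spread <= 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / math.sqrt(spread)


# Toy data (hypothetical): points that lie exactly on a straight line.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly increasing line
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfectly decreasing line
```

The two calls return 1.0 and -1.0, which are the two ends of the range that r can take.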

The Pearson correlation coefficient, r, can take a range of values from +1
to -1. A value of 0 indicates that there is no linear association between the two
variables. This is shown in Figure 7.

Figure 7: Scatter plot Diagram

The arbitrary scale for the interpretation of r is given below.

Range of computed r     Interpretation
± 1.0                   Perfect Relationship
± 0.70 to 0.99          Strong/High Relationship
± 0.40 to 0.69          Moderate Relationship
± 0.10 to 0.39          Slight/Low Relationship
0                       No Correlation
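
The scale can be expressed as a small lookup function. This is only a sketch: the name `interpret_r` is hypothetical, and because the scale leaves gaps (for example between 0 and 0.10, or between 0.99 and 1.0), the function assumes that a value falling in a gap belongs to the band just below it. That choice is an assumption of this sketch, not something the scale states.

```python
def interpret_r(r):
    """Map a computed r to the verbal label of the arbitrary scale."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # the scale is symmetric, so only the magnitude matters
    if size == 1.0:
        return "Perfect Relationship"
    if size >= 0.70:
        return "Strong/High Relationship"
    if size >= 0.40:
        return "Moderate Relationship"
    if size >= 0.10:
        return "Slight/Low Relationship"
    # Assumption: values below 0.10 are grouped with 0.
    return "No Correlation"


print(interpret_r(0.70))
print(interpret_r(-0.45))
```

The sign of r gives the direction of the relationship (positive or negative) and is reported separately from the label.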

LINEAR REGRESSION

Regression is a term used to describe the process of estimating the
relationship between two variables. The relationship is estimated by fitting a
straight line through the given data. The method of least squares permits us to
find a line of best fit, called the regression line, which keeps the errors of
prediction to a minimum.

The equation for a fitted line is:

Y = a + bx

where:
Y = predicted value
a = y-intercept
b = slope of the regression line
x = the value of x used to predict Y

To find the slope b:

\[ b = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - (\sum x)^2} \]

To find the value of a:

\[ a = \bar{y} - b\bar{x} \]

where:
\bar{y} = mean value of Y
\bar{x} = mean value of X
Σx = sum of the values of x
Σy = sum of the values of y
Σx² = sum of the squares of the values of x
Σxy = sum of the products of x and y
N = total number of pairs
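
For readers who want to see where these two formulas come from, the following is a brief sketch of the standard least-squares derivation. The symbol S for the sum of squared prediction errors and the subscript i are notation introduced here and do not appear in the original text.

```latex
\begin{align*}
S(a, b) &= \sum_{i=1}^{N} (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} = 0 \;&\Rightarrow\; \sum y = N a + b \sum x \\
\frac{\partial S}{\partial b} = 0 \;&\Rightarrow\; \sum xy = a \sum x + b \sum x^2 \\
a &= \frac{\sum y}{N} - b\,\frac{\sum x}{N} = \bar{y} - b\bar{x} \\
b &= \frac{N \sum xy - \sum x \sum y}{N \sum x^2 - (\sum x)^2}
\end{align*}
```

The first condition shows that the fitted line always passes through the point of means; substituting it into the second condition gives the slope formula.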
Example

Below are the scores of 12 college students in Mathematics and Physics tests of
80 items each.

Mathematics (x) 65 63 67 64 68 62 70 66 68 67 69 71
Physics (y) 68 66 68 65 69 66 68 65 71 67 68 70

a. Draw a scatter diagram.
b. Find the correlation coefficient of the Mathematics and Physics scores and
interpret it.
c. Find the regression line equation.
d. Predict the score in Physics (y) if the score in Mathematics (x) of the student
is 75.

Solution

Step 1: Draw a scatter plot. If the scatter plot does not show any (linear) trend,
stop the analysis and conclude “no relationship”. Otherwise, proceed to Step 2.

[Figure: scatter plot of the Physics scores (y) against the Mathematics scores (x)]

The scatter plot indicates an upward linear trend between Mathematics
and Physics proficiency. Thus, “there is a reason to believe that they are
related.”

Step 2: Compute Pearson r by arranging the given data in columns.

Number   Mathematics (x)   Physics (y)   x²       y²       xy
1        65                68            4225     4624     4420
2        63                66            3969     4356     4158
3        67                68            4489     4624     4556
4        64                65            4096     4225     4160
5        68                69            4624     4761     4692
6        62                66            3844     4356     4092
7        70                68            4900     4624     4760
8        66                65            4356     4225     4290
9        68                71            4624     5041     4828
10       67                67            4489     4489     4489
11       69                68            4761     4624     4692
12       71                70            5041     4900     4970
N = 12   Σx = 800          Σy = 811      Σx² = 53418   Σy² = 54849   Σxy = 54107

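\[ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}} \]

\[ r = \frac{(12)(54107) - (800)(811)}{\sqrt{\left[(12)(53418) - (800)^2\right]\left[(12)(54849) - (811)^2\right]}} \]

\[ r = 0.70 \]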
On the arbitrary scale for the interpretation of r, a value of r = 0.70
indicates a strong/high positive relationship between the scores of the
students in Mathematics and Physics.
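
The column totals and the value of r can be double-checked with a few lines of Python. This is a minimal sketch that uses only the twelve score pairs given in the example; the variable names are chosen here for illustration.

```python
import math

math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]     # x
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]  # y

# Rebuild the totals row of the computation table.
n = len(math_scores)
sum_x = sum(math_scores)
sum_y = sum(physics_scores)
sum_x2 = sum(x * x for x in math_scores)
sum_y2 = sum(y * y for y in physics_scores)
sum_xy = sum(x * y for x, y in zip(math_scores, physics_scores))

# Substitute the totals into the Pearson r formula.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(f"r = {r:.4f}")
```

The totals printed by the sketch match the last row of the table, and r comes out at about 0.7027 before rounding to two decimal places.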

Step 3: Formulate the regression line equation by first solving for the values of
the coefficients b and a.

Solving for b

\[ b = \frac{(12)(54107) - (800)(811)}{(12)(53418) - (800)^2} = 0.48 \]

Solving for a

\[ a = 67.58 - (0.48)(66.67) = 35.58 \]

Substitute the computed values of b and a into the regression line equation:

Y = a + bx
y = 35.58 + 0.48x (regression line equation)

We can now estimate scores in Physics (y) using the regression line
equation by substituting a value or score in Mathematics (x). For instance, if x
is equal to 75, then solving for y gives 71.58.

y = 35.58 + 0.48(75)
y = 71.58

Therefore, the estimated score in Physics is 71.58, or approximately 72,
if the score in Mathematics is 75. The regression line equation may now be used
to estimate y for any given value of x. Note that b was rounded to 0.48 before
a was computed; carrying more decimal places (b = 0.4764) gives a = 35.82.
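
The same coefficients can be computed without intermediate rounding. The sketch below plugs the column totals from Step 2 into the formulas for b and a; the helper name `predict_physics` is hypothetical and is introduced only for this illustration.

```python
# Column totals taken from the computation table in Step 2.
n = 12
sum_x, sum_y = 800, 811
sum_x2, sum_xy = 53418, 54107

# Slope first, then the intercept from the two means.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
mean_x = sum_x / n
mean_y = sum_y / n
intercept = mean_y - slope * mean_x


def predict_physics(math_score):
    """Estimated Physics score for a given Mathematics score."""
    return intercept + slope * math_score


print(f"b = {slope:.6f}")
print(f"a = {intercept:.6f}")
print(f"predicted y at x = 75: {predict_physics(75):.2f}")
```

Carried to full precision, the sketch gives b of about 0.4764, a of about 35.82, and a predicted score of about 71.55. These differ slightly from the two-decimal hand values because b was rounded early, but the prediction still rounds to 72.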

COMPUTER-BASED SOLUTION OF CORRELATION AND LINEAR REGRESSION USING MS EXCEL

To solve correlation and linear regression problems with the Data Analysis
command on the Data menu, we first need to install the Analysis ToolPak, which
is an Excel add-in.

1. Click the File tab, click Options, and then click the Add-Ins category. This
screen will appear.

2. Select Add-ins. In the Manage box, select Excel Add-ins and then click Go.

3. In the Add-Ins available box, select the Analysis ToolPak check box, and then
click OK. Click Data on the menu; Data Analysis should now appear at the far
right of the Excel screen, which confirms that the add-in was installed.

4. If you are prompted that the Analysis ToolPak is not currently installed on
your computer, click Yes to install it.

5. After installing the Analysis ToolPak, we can start solving the example
for correlation and linear regression using the same data (scores in
Mathematics and Physics).

Mathematics (x) 65 63 67 64 68 62 70 66 68 67 69 71
Physics (y) 68 66 68 65 69 66 68 65 71 67 68 70

6. Open a new blank MS Excel worksheet, then encode the given data vertically
(by column). Number the respondents' data; there are 12 respondents in all.

7. Next, select Data in the menu, then click Data Analysis. A dialog box will
appear; scroll down until you find Regression.

8. Click OK. Another dialog box will appear.

9. Select Input X Range, then highlight all the Mathematics data. Then, in
Input Y Range, highlight all the Physics data. Remember that x represents the
Mathematics scores and y represents the Physics scores. Select Confidence
Level; the preset value is 95% (a 0.05 level of significance). For the output,
select New Worksheet. Under Residuals, select Line Fit Plots to visualize the
graph of the data.

10. Click OK. The results will appear in a new worksheet.

11. For the interpretations, refer to the following:

a. Look for the Regression Statistics. With a single predictor, the value of
Multiple R is the absolute value of Pearson r. Referring to the arbitrary
scale, the r-value of 0.702651 indicates a strong/high relationship between
the Mathematics and the Physics scores. Note also that this is the basis for
continuing with regression: the variables must be significantly related to
each other. If no relationship is found, stop the process and conclude that
there is no significant relationship.

b. The value of R Square (R²), 0.4937193 or 49.37%, is the proportion of the
variation in the Physics scores that is explained by the Mathematics scores.

c. The ANOVA table presents the significance of the relationship (Significance F),
which is 0.01082225. Because this value is below 0.05 but slightly above 0.01,
the relationship between the variables is significant at the 0.05 level of
significance.
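
As a cross-check on the reported significance value, the correlation can be converted to a test statistic by hand. The sketch below uses the standard t test for a correlation coefficient with N − 2 degrees of freedom. The critical values quoted are the usual two-tailed t-table entries for 10 degrees of freedom and are not taken from the original text.

```latex
\begin{align*}
t &= \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}
   = \frac{0.702651\sqrt{10}}{\sqrt{1-0.4937193}} \approx 3.12,
   \qquad F = t^2 \approx 9.75 \\
t_{0.05,\,10} &= 2.228 \;<\; 3.12 \;<\; t_{0.01,\,10} = 3.169
\end{align*}
```

The statistic exceeds the 0.05 critical value but falls just short of the 0.01 critical value, which agrees with a significance value of about 0.0108.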

d. For the constant and coefficient of the linear regression equation
y = a + bx, refer to the third table in the worksheet. The first value in
the Intercept row is the value of the constant a. The first value in the
X Variable row is the value of the regression coefficient b. Substituting
these values into the regression equation gives
y = 35.824803 + 0.476378x.

e. The last part of the result in the worksheet presents the residual
output obtained by applying the regression equation derived above.
The predicted values of y are given in the Predicted Y column.

If x = 75, then y = 35.824803 + 0.476378(75), so y = 71.55 or approximately 72.
A residual is the error in a predicted value: the difference between the
actual observed value and the predicted value.
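
The residual output can be reproduced from the regression equation. The following is a minimal sketch, assuming the intercept and slope reported in the worksheet (35.824803 and 0.476378) and the twelve score pairs from the example; the variable names are illustrative only.

```python
math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]     # x
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]  # y

# Coefficients as reported in the worksheet output.
intercept, slope = 35.824803, 0.476378

# Predicted y for each x, and residual = observed y - predicted y.
predicted = [intercept + slope * x for x in math_scores]
residuals = [y - y_hat for y, y_hat in zip(physics_scores, predicted)]

# R Square as 1 - (unexplained variation / total variation).
mean_y = sum(physics_scores) / len(physics_scores)
ss_error = sum(e * e for e in residuals)
ss_total = sum((y - mean_y) ** 2 for y in physics_scores)
r_squared = 1 - ss_error / ss_total

for x, y, y_hat, e in zip(math_scores, physics_scores, predicted, residuals):
    print(f"x={x}  observed={y}  predicted={y_hat:.2f}  residual={e:+.2f}")
print(f"R Square = {r_squared:.4f}")
```

The residuals sum to approximately zero, as expected for a least-squares line, and the R Square value agrees with the 0.4937 reported in the Regression Statistics.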

f. The Line Fit Plot now shows the graphical representation of the
variables with their actual observed values of Y and the predicted value
of Y using the previously derived linear regression equation.

EXERCISES for Chapter 5

Name:_________________ Score: _____________


Course & Year: _________ Date: ______________

1. Test scores of nine (9) students are shown below. What can you say about the
strength of the correlation between these sets of scores in Trigonometry and
Geometry?

Trigonometry 43 41 50 47 35 33 50 33 54
Geometry 48 45 47 43 33 28 48 31 57

2. Calculate the degree of linear relationship between the number of minutes
spent studying and the score in the examination.

Number of minutes 27 50 57 15 18 48 52 55 28 32
Score in examination 40 53 52 24 21 35 40 39 47 36

3. The number of hours spent per week viewing television (y) and the number of
years of education (x) were recorded for ten randomly selected individuals.
The results are given below:

x 12 14 11 16 16 18 12 20 10 12
y 10 9 15 8 5 4 20 4 16 15

a. Draw the scatter diagram.
b. Find the correlation coefficient of x and y and interpret your answer.
c. Find the regression line equation.
d. What are the predicted values of y if x is 15, 17, and 19?

4. An experiment was completed to study the relationship between
concentrations of estrone in saliva and in free plasma. The following data were
obtained:

Subject                      1    2    3    4    5    6    7    8    9    10
Estrone in saliva (x)        7.4  7.5  8.5  9.0  9.0  11.0 13.0 14.0 14.5 16.0
Estrone in free plasma (y)   30.0 25.0 31.5 27.5 39.5 38.0 43.2 49.0 55.0 48.5

a. Compute and interpret the correlation coefficient for estrone in saliva
and estrone in free plasma.
b. Estimate the regression line of estrone in free plasma (y) on estrone in
saliva (x).
c. If the estrone level in saliva is 12.1, predict the level of estrone in free
plasma.

5. Compute the correlation ratio between test scores and teaching method.

Teaching Method 54 61 75 63 82 52 63 50
Test scores 76 80 89 80 88 83 79 82

6. Calculate the degree of linear relationship between the number of years
of experience and the yearly salary.

No. of years of experience 7 11 33 24 5 18 35 19 10
Yearly salary 18 19 27 26 16 22 28 23 21

7. A study was conducted to examine the association between adult immunity
and juvenile mortality in southern fur seals. Therefore, researchers determined
the percentage of adult southern fur seals on different island populations that
contained a certain antibody in their blood and they also determined the
mortality rate for seal pups on those same islands. Is there a significant
relationship between adult southern seal immunity and seal pup mortality on
these islands?

Antibody presence 35 58 69 43 94 26 7 9 12 45 11 66 51
Pup mortality 115 98 109 63 24 226 357 339 112 145 111 36 54

8. A researcher believes that a person who works in the academe receives a
salary increment for each year spent in it. The researcher therefore gathered
data and sought to create a linear regression equation to represent this
allegation. The gathered data are shown below.

Yrs. of     Monthly Salary   Yrs. of     Monthly Salary   Yrs. of     Monthly Salary   Yrs. of     Monthly Salary
Experience  (in Thousands)   Experience  (in Thousands)   Experience  (in Thousands)   Experience  (in Thousands)
7           18               2           14               19          30               35          24
11          19               9           18               10          26               6           17
33          28               12          20               9           24               5           17
24          25               13          25               12          20               7           18
5           19               7           18               13          25               11          16
18          23               10          18               7           18               33          25
35          30               3           12               10          18               20          25
19          21               4           15               3           12               13          25
10          16               18          25               4           15               7           18
8           20               35          28               18          23               10          18

a. Is there a basis for the researcher’s allegation?
b. Determine the regression line equation.
c. If one of the male respondents has 22 years of experience, predict his
monthly salary.
