Correlation & Simple Regression
Bivariate Data/Distribution
Bivariate data are statistical data obtained by measuring two variables
simultaneously on each item. Thus, for n items we have n pairs of measurements
or observations:
(x1, y1), (x2, y2), ..., (xn, yn).
Example
The following bivariate data show the height (cm) and weight (kg) of 10
students.
Height (cm) 130 126 120 124 125 127 126 123 130 124
Weight (kg) 40 32 23 35 34 34 32 28 38 30
In the above data the height (cm) and weight (kg) of the first student are paired
as (130, 40), for the second student as (126, 32) and so on.
Scatter Diagram
For a bivariate distribution (xi, yi), i = 1, 2, ..., n, the diagram of dots
obtained by plotting the values of the variates x and y along the x-axis and
y-axis respectively in the xy-plane is called the scatter diagram. From a scatter
diagram one can readily see whether or not any correlation exists between the
variates.
In the example plotted in Figure-1, the cumulative GPA (the dependent variable)
is placed on the vertical or Y-axis and the entrance examination score (the
independent variable) on the horizontal or X-axis. Figure-1 shows the completed
scatter diagram.
[Figure-1: scatter diagram of cumulative GPA (Y-axis, scale 2 to 4) against
entrance examination scores (X-axis, scale 50 to 95).]
Example: With a rise in price, the demand for a commodity goes down; with a
better monsoon, the output of agricultural produce increases.
Correlation Analysis
Correlation analysis is a group of statistical techniques used to measure the
strength of the relationship (correlation) between two variables. Its basic
purpose is to find how strong the relationship between the two variables is.
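Measures of correlation
The sample correlation coefficient r is computed as
\[ r = \frac{\mathrm{S.P.}(x, y)}{\sqrt{\mathrm{S.S.}(x) \cdot \mathrm{S.S.}(y)}} \qquad \text{(i)} \]
where
\[ \mathrm{S.P.}(x, y) = \sum xy - \frac{(\sum x)(\sum y)}{n}, \qquad
   \mathrm{S.S.}(x) = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad
   \mathrm{S.S.}(y) = \sum y^2 - \frac{(\sum y)^2}{n} \]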
The value of the correlation coefficient always lies in the range -1 to 1; that
is,
-1 ≤ ρ ≤ 1 and -1 ≤ r ≤ 1,
where ρ denotes the population correlation coefficient and r the sample
correlation coefficient.
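As a rough illustration of formula (i), the sketch below computes r from the shortcut sums S.P.(x, y), S.S.(x) and S.S.(y) for the height and weight data given earlier. The function name pearson_r and the variable names are hypothetical, chosen only for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation coefficient from the shortcut sums in formula (i)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # S.P.(x, y) = sum(xy) - (sum x)(sum y) / n
    sp_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    # S.S.(x) = sum(x^2) - (sum x)^2 / n, and likewise for y
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    return sp_xy / sqrt(ss_x * ss_y)


# Height (cm) and weight (kg) of the 10 students from the bivariate data example
height_cm = [130, 126, 120, 124, 125, 127, 126, 123, 130, 124]
weight_kg = [40, 32, 23, 35, 34, 34, 32, 28, 38, 30]

r = pearson_r(height_cm, weight_kg)
print(f"r = {r:.3f}")
```

For these ten students the sketch gives r of about 0.91, which lies inside the range -1 to 1 and suggests a fairly strong positive linear relationship between height and weight.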
[Figure_C1 Linear correlation between two variables: panels for r = 1 and r = -1.]
[Figure_C3 Linear correlation between two variables: panel for r ≈ 0; (a) strong
positive linear correlation (r is close to 1).]
[Figure_C5 Linear correlation between variables: (b) weak positive linear
correlation (r is positive but close to 0); (c) strong negative linear
correlation (r is close to -1).]
[Figure_C7 Linear correlation between variables: (d) weak negative linear
correlation (r is negative and close to 0).]
Example
Rising Hills Manufacturing Inc. wishes to study the relationship between the
number of workers, X, and the number of tables, Y, produced per hour in its
Redwood Falls plant. It has obtained a random sample of 10 hours of production.
The following (x, y) combinations of points were obtained:
(12, 20) (30, 60) (15, 27) (24, 50) (14, 21)
(18, 30) (28, 61) (26, 54) (19, 32) (27, 57)
Compute the covariance and correlation coefficient. Discuss briefly the
relationship between the number of workers and the number of tables produced
per hour.
Solution
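The computations are set out in the table below, with x̄ = 213/10 = 21.3 and
ȳ = 412/10 = 41.2.
xi     yi     (xi - x̄)   (xi - x̄)²   (yi - ȳ)   (yi - ȳ)²   (xi - x̄)(yi - ȳ)
12     20       -9.3       86.49      -21.2      449.44        197.16
30     60        8.7       75.69       18.8      353.44        163.56
15     27       -6.3       39.69      -14.2      201.64         89.46
24     50        2.7        7.29        8.8       77.44         23.76
14     21       -7.3       53.29      -20.2      408.04        147.46
18     30       -3.3       10.89      -11.2      125.44         36.96
28     61        6.7       44.89       19.8      392.04        132.66
26     54        4.7       22.09       12.8      163.84         60.16
19     32       -2.3        5.29       -9.2       84.64         21.16
27     57        5.7       32.49       15.8      249.64         90.06
Total
213    412                378.1                 2505.6         962.4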
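Thus
\[ \mathrm{Cov}(x, y) = s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{962.4}{9} = 106.93 \]
\[ s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{378.1}{9} = 42.01 \]
\[ s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{2505.6}{9} = 278.4 \]
And the correlation is
\[ r = \frac{\mathrm{Cov}(x, y)}{s_x s_y} = \frac{106.93}{\sqrt{42.01}\,\sqrt{278.4}} = 0.989 \]
Since r is close to 1, there is a strong positive linear relationship between the
number of workers and the number of tables produced per hour.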
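The hand computation above can be cross-checked with a short script. This is a minimal sketch using only the ten (x, y) pairs from the example; the variable names are illustrative and not part of the original notes.

```python
from math import sqrt

# (workers, tables) pairs from the Rising Hills example
workers = [12, 30, 15, 24, 14, 18, 28, 26, 19, 27]
tables = [20, 60, 27, 50, 21, 30, 61, 54, 32, 57]

n = len(workers)
x_bar = sum(workers) / n  # 21.3
y_bar = sum(tables) / n   # 41.2

# Sums of squared deviations and of cross-products of deviations
sum_dx_dy = sum((x - x_bar) * (y - y_bar) for x, y in zip(workers, tables))
sum_dx2 = sum((x - x_bar) ** 2 for x in workers)
sum_dy2 = sum((y - y_bar) ** 2 for y in tables)

# Sample covariance and variances use the divisor n - 1
cov_xy = sum_dx_dy / (n - 1)
var_x = sum_dx2 / (n - 1)
var_y = sum_dy2 / (n - 1)

r = cov_xy / (sqrt(var_x) * sqrt(var_y))
print(f"cov = {cov_xy:.2f}, var_x = {var_x:.2f}, var_y = {var_y:.2f}, r = {r:.3f}")
```

Because the divisor n - 1 appears in both the covariance and the two standard deviations, it cancels in r, so the same value follows directly from 962.4 / sqrt(378.1 × 2505.6).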
Problem: 1
Calculate the coefficient of correlation between the number of sales calls and
the number of units sold and comment on the result.
Sales Representative R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
No. of Sales Calls (x) 14 35 22 29 6 15 17 20 12 29
No. Units Sold (y) 28 66 38 70 22 27 28 47 14 68
Problem: 2
A department store gives in-service training to its salesmen, which is followed
by a test. It is considering whether it should terminate the services of any
salesman who does not do well in the test. The following data give the test
scores and the sales made by the salesmen during a certain period.
Test Scores 15 20 25 22 27 23 16 21 20
Sales (Thousand Tk) 32 37 49 38 51 46 33 41 39
Compute the correlation coefficient between the test scores and the sales.
Problem: 3
From the following data find the association or the correlation in the value of
two currencies, the German mark and the Japanese yen, from 1988 to 1997.
Exchange rate of the German mark and the Japanese yen in U.S. dollars
Year 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997
G_Mark 1.76 1.88 1.62 1.66 1.56 1.65 1.62 1.50 1.54 1.80
J_Yen 128.1 138 145 134.6 126.8 111.2 102.2 103.4 115.9 130.4
Regression Analysis
It is often necessary to develop an equation that expresses the relationship
between two variables and to estimate the value of the dependent variable Y
based on a selected value of the independent variable X. The technique used to
develop the equation for the straight line and make these predictions is called
regression analysis. So, regression is a statistical method to estimate (or
predict) the unknown values of one variable (Y) for specified values of the
other variable (X).
Definition
A regression model is a mathematical equation that describes the relationship
between two or more variables. A simple regression model includes only two
variables: one independent and one dependent. The dependent variable is the
one being explained, and the independent variable is the one used to explain the
variation in the dependent variable.
Simple Linear Regression Analysis
y = A + Bx
[Figure: food expenditure (vertical axis) plotted against income (horizontal
axis): (a) linear relationship, (b) nonlinear relationship.]
[Figure_R2 Plotting a linear equation: the line y = 50 + 5x, passing through the
points (x = 0, y = 50) and (x = 10, y = 100).]
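The Least Squares Line
For the least squares regression line ŷ = a + bx,
\[ b = \frac{SS_{xy}}{SS_{xx}} \qquad \text{and} \qquad a = \bar{y} - b\bar{x} \]
where
\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \qquad \text{and} \qquad
   SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \]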
Example
Find the least squares regression line for the data on incomes and food
expenditures of the seven households given in Table_R1. Use income as the
independent variable and food expenditure as the dependent variable.
Solution
Table_R2
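\[ \sum x = 212, \qquad \sum y = 64 \]
\[ \bar{x} = \sum x / n = 212 / 7 = 30.2857 \]
\[ \bar{y} = \sum y / n = 64 / 7 = 9.1429 \]
\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 2150 - \frac{(212)(64)}{7} = 211.7143 \]
\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 7222 - \frac{(212)^2}{7} = 801.4286 \]
\[ b = \frac{SS_{xy}}{SS_{xx}} = \frac{211.7143}{801.4286} = .2642 \]
\[ a = \bar{y} - b\bar{x} = 9.1429 - (.2642)(30.2857) = 1.1414 \]
Thus,
ŷ = 1.1414 + .2642x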
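The same line can be recovered from the column totals alone. The sketch below assumes only the sums quoted in the solution (n = 7, Σx = 212, Σy = 64, Σxy = 2150, Σx² = 7222). The helper predict and the input value 35 are hypothetical, added purely for illustration.

```python
# Column totals quoted in the solution
n = 7
sum_x, sum_y = 212, 64
sum_xy, sum_x2 = 2150, 7222

# SS_xy and SS_xx from the shortcut formulas
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

# Slope and intercept of the least squares line y_hat = a + b x
b = ss_xy / ss_xx
a = sum_y / n - b * (sum_x / n)


def predict(x):
    """Predicted food expenditure for a given income x (hypothetical helper)."""
    return a + b * x


print(f"SS_xy = {ss_xy:.4f}, SS_xx = {ss_xx:.4f}")
print(f"b = {b:.4f}, a = {a:.4f}")
print(f"predicted y at x = 35 (arbitrary illustrative income): {predict(35):.4f}")
```

Carrying full precision through the calculation gives an intercept that differs from 1.1414 in the third decimal place, because the hand solution rounds b to .2642 before computing a.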
Problems
1. The Bradford Electric Illuminating Company is studying the relationship
between kilowatt-hours (thousands) and number of rooms in a private
single-family residence. A random sample of 10 homes yielded the
following.
Number of Rooms 12 9 14 6 10 8 10 10 5 7
Kilowatt-hours (thousands) 9 7 10 5 8 6 8 10 4 7