Module-4 (Correlation & Regression)
CHAPTER 4
STATISTICAL METHOD
HIGHLIGHTS
Introduction to correlation
Rank correlation
Properties of rank correlation coefficient
Introduction to regression lines
4.2 Engineering Mathematics - III
If two variables vary in such a way that their ratio is always constant, then the
correlation is said to be perfect.
Uses of correlation
(i) It is used in deriving precisely the degree and direction of the relationship between
variables such as price and demand, advertising expenditure and sales, rainfall and
crop yield, etc.
(ii) It is used in developing the concepts of regression and ratio of variation, which help
in estimating the value of one variable for a given value of another variable.
(iii) It is used in reducing the range of uncertainty in prediction.
(iv) It is used in presenting the average relationship between any two variables
through a single value, the coefficient of correlation.
(v) In the field of economics it is used in understanding the economic behaviour and
locating the important variables on which the others depend.
(vi) In the field of business it is used advantageously to estimate the cost of sales,
volume of sales, sales-prices and any other value on the basis of some other
variables which are financially related to each other.
(vii) In the field of science and philosophy, also the methods of correlation are profusely
used in making progressive developments in the respective lines.
(viii) In the field of nature also, it is used in observing the multiplicity of the inter-
related forces.
Correlation
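Let $x_1, x_2, \ldots, x_n$ be $n$ values of $x$ and $y_1, y_2, \ldots, y_n$ be the corresponding values of $y$. Then
$$\bar{x} = \frac{1}{n}\sum x_i \quad\text{and}\quad \bar{y} = \frac{1}{n}\sum y_i$$
are called the means of the x-series and the y-series.
$$\sigma_x^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 \quad\text{and}\quad \sigma_y^2 = \frac{1}{n}\sum (y_i - \bar{y})^2$$
are called the variances of the x-series and the y-series, and $\sigma_x$ and $\sigma_y$ are called the standard
deviations (S.D.) of the x-series and the y-series. Equivalently,
$$\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 \quad\text{and}\quad \sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 .$$
With $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$ given as above,
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x\,\sigma_y}$$
is called the coefficient of correlation between x and y.
The coefficient of correlation between x and y satisfies $-1 \le r \le 1$.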
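The definition above can be turned directly into a small routine. The following is only a minimal sketch: the function name `pearson_r` and the sample data are illustrative assumptions, not part of the text, and variances are divided by n as in the definitions above.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation r = sum((x - xbar)(y - ybar)) / (n * sx * sy)."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Population variances (divide by n, as in the text, not n - 1).
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one series is constant")
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    return cov / math.sqrt(var_x * var_y)

# Illustrative (made-up) data: y is an exact increasing linear function of x.
r_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
# Reversing y makes it an exact decreasing linear function of x.
r_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])
```

For exactly linear data the routine returns the extreme values +1 and -1, in line with the bound $-1 \le r \le 1$.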
Problem 1
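Prove the following formulas for the coefficient of correlation, where $x_i$ and $y_i$ are measured
from their respective means:
$$\text{(i)}\ \ r = 1 - \frac{1}{2n}\sum\left(\frac{x_i}{\sigma_x} - \frac{y_i}{\sigma_y}\right)^2 \qquad \text{(ii)}\ \ r = -1 + \frac{1}{2n}\sum\left(\frac{x_i}{\sigma_x} + \frac{y_i}{\sigma_y}\right)^2$$
Hence deduce that $-1 \le r \le 1$.
Solution
Since $x_i$ and $y_i$ are deviations from the means, $\sum x_i^2 = n\sigma_x^2$, $\sum y_i^2 = n\sigma_y^2$ and
$\sum x_i y_i = r\,n\,\sigma_x\sigma_y$. Then
$$\sum\left(\frac{x_i}{\sigma_x} - \frac{y_i}{\sigma_y}\right)^2 = \sum\left(\frac{x_i^2}{\sigma_x^2} - \frac{2x_iy_i}{\sigma_x\sigma_y} + \frac{y_i^2}{\sigma_y^2}\right) = \frac{1}{\sigma_x^2}\sum x_i^2 - \frac{2}{\sigma_x\sigma_y}\sum x_iy_i + \frac{1}{\sigma_y^2}\sum y_i^2$$
$$= \frac{1}{\sigma_x^2}\left(n\sigma_x^2\right) - \frac{2}{\sigma_x\sigma_y}\left(r\,n\,\sigma_x\sigma_y\right) + \frac{1}{\sigma_y^2}\left(n\sigma_y^2\right) = n - 2nr + n = 2n(1 - r) \qquad (1)$$
$$\Rightarrow\ r = 1 - \frac{1}{2n}\sum\left(\frac{x_i}{\sigma_x} - \frac{y_i}{\sigma_y}\right)^2$$
Similarly,
$$\sum\left(\frac{x_i}{\sigma_x} + \frac{y_i}{\sigma_y}\right)^2 = 2n(1 + r) \qquad (2)$$
$$\Rightarrow\ r = -1 + \frac{1}{2n}\sum\left(\frac{x_i}{\sigma_x} + \frac{y_i}{\sigma_y}\right)^2$$
Since the left-hand sides of (1) and (2) are non-negative and n > 0, it follows that
$1 - r \ge 0$ and $1 + r \ge 0$, i.e., $1 \ge r$ and $r \ge -1$ ⇒ $-1 \le r \le 1$.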
If $r = \pm 1$, we say that x and y are perfectly correlated.
If $r = 0$, we say that x and y are uncorrelated.
Taking $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$,
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x\,\sigma_y} \quad\text{can be rewritten as}\quad r = \frac{\sum X_iY_i}{n\,\sigma_x\,\sigma_y}.$$
Also, $\sigma_x^2 = \frac{1}{n}\sum X_i^2$, $\sigma_y^2 = \frac{1}{n}\sum Y_i^2$, and
$$r = \frac{\sum X_iY_i}{\sqrt{\left(\sum X_i^2\right)\left(\sum Y_i^2\right)}}.$$
This is known as the product-moment formula for r.
Problem 2
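If r is the correlation coefficient between x and y, and z = ax + by, show that
$$r = \frac{\sigma_z^2 - \left(a^2\sigma_x^2 + b^2\sigma_y^2\right)}{2ab\,\sigma_x\sigma_y} \qquad (1)$$
Solution
Since z = ax + by, we have $\bar{z} = a\bar{x} + b\bar{y}$, where $\bar{x}$, $\bar{y}$ and $\bar{z}$ are the means of the x, y and z series.
Let $z_i = ax_i + by_i$, i = 1, 2, 3, ..., n. Then
$$z_i - \bar{z} = (ax_i + by_i) - (a\bar{x} + b\bar{y}) = a(x_i - \bar{x}) + b(y_i - \bar{y})$$
so that
$$(z_i - \bar{z})^2 = a^2(x_i - \bar{x})^2 + b^2(y_i - \bar{y})^2 + 2ab(x_i - \bar{x})(y_i - \bar{y})$$
Summing over i,
$$\sum(z_i - \bar{z})^2 = a^2\sum(x_i - \bar{x})^2 + b^2\sum(y_i - \bar{y})^2 + 2ab\sum(x_i - \bar{x})(y_i - \bar{y}) \qquad (2)$$
Dividing (2) by n and using $\sigma_x^2 = \frac{1}{n}\sum(x_i - \bar{x})^2$, $\sigma_y^2 = \frac{1}{n}\sum(y_i - \bar{y})^2$, $\sigma_z^2 = \frac{1}{n}\sum(z_i - \bar{z})^2$
and $r = \dfrac{\sum(x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x\sigma_y}$, we get
$$\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y$$
$$\Rightarrow\ r = \frac{\sigma_z^2 - \left(a^2\sigma_x^2 + b^2\sigma_y^2\right)}{2ab\,\sigma_x\sigma_y}$$
which is the required relation.
As particular cases of the relation (1) proved above, we find by taking a = 1, b = 1, and
a = 1, b = –1 respectively, that
$$r = \frac{\sigma_{x+y}^2 - \left(\sigma_x^2 + \sigma_y^2\right)}{2\sigma_x\sigma_y} \qquad (3)$$
$$\text{and}\quad r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y} \qquad (4)$$
Adding the above expressions we get
$$r = \frac{\sigma_{x+y}^2 - \sigma_{x-y}^2}{4\sigma_x\sigma_y}$$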
Problem 3
Find the correlation coefficient for the following data
x 1 2 3 4 5
y 2 5 3 8 7
Solution
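With $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$,
$$\therefore\ r = \frac{\sum X_iY_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = \frac{13}{\sqrt{10 \times 26}} = \frac{13}{16.1245} = 0.806$$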
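The sums used in this solution can be checked with a few lines of code. This is a minimal sketch using the product-moment formula on the data of Problem 3; the variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
n = len(x)

x_bar = sum(x) / n          # 3.0
y_bar = sum(y) / n          # 5.0

# Deviations from the means.
X = [xi - x_bar for xi in x]
Y = [yi - y_bar for yi in y]

sum_XY = sum(a * b for a, b in zip(X, Y))
sum_X2 = sum(a * a for a in X)
sum_Y2 = sum(b * b for b in Y)

# Product-moment formula for r.
r = sum_XY / math.sqrt(sum_X2 * sum_Y2)
print(round(r, 3))  # 0.806
```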
Problem 4
The following table gives the ages (in years) of 10 married couples. Calculate the
coefficient of correlation between these ages.
Age of husband (x) 23 27 28 29 30 31 33 35 36 39
Age of wife (y) 18 22 23 24 25 26 28 29 30 32
Solution
Here n = 10 pairs of values (x, y) are given.
The mean of the x-series is $\bar{x} = \frac{1}{n}\sum x_i = \frac{311}{10} = 31.1$
The mean of the y-series is $\bar{y} = \frac{1}{n}\sum y_i = \frac{257}{10} = 25.7$
$x_i$    $X_i = x_i - \bar{x}$    $X_i^2$    $y_i$    $Y_i = y_i - \bar{y}$    $Y_i^2$    $X_iY_i$
Correlation is almost perfect, i.e., in the given data the ages of husbands and wives are
almost perfectly correlated.
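The conclusion can be checked numerically from the ages given in the problem. The sketch below uses the raw-sum form of r, which is algebraically equivalent to the definition used in this chapter; it is chosen here only for convenience, because all intermediate sums stay integers.

```python
import math

husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]   # x
wife    = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]   # y
n = len(husband)

sum_x = sum(husband)
sum_y = sum(wife)
sum_x2 = sum(v * v for v in husband)
sum_y2 = sum(v * v for v in wife)
sum_xy = sum(a * b for a, b in zip(husband, wife))

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 4))
```

The value comes out very close to 1, which agrees with the statement that the correlation is almost perfect.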
Problem 5
Employ the formula $r = \dfrac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$ to determine r for the following data
x 92 89 87 86 83 77 71 63 53 50
y 86 83 91 77 68 85 52 82 37 57
Solution
n = 10 pairs of values (x, y) are given.
$$\bar{x} = \frac{1}{10}\sum x_i = \frac{751}{10} = 75.1 \quad\text{and}\quad \bar{y} = \frac{1}{10}\sum y_i = \frac{718}{10} = 71.8$$
$$\therefore\ \sigma_x^2 = \frac{1}{10}\sum x_i^2 - \bar{x}^2 = \frac{1}{10}\left(92^2 + 89^2 + 87^2 + 86^2 + 83^2 + 77^2 + 71^2 + 63^2 + 53^2 + 50^2\right) - (75.1)^2$$
$$= \frac{58487}{10} - (75.1)^2 = 208.69$$
$$\text{and}\quad \sigma_y^2 = \frac{1}{10}\sum y_i^2 - \bar{y}^2 = \frac{1}{10}\left(86^2 + 83^2 + 91^2 + 77^2 + 68^2 + 85^2 + 52^2 + 82^2 + 37^2 + 57^2\right) - (71.8)^2$$
$$= \frac{54390}{10} - (71.8)^2 = 283.76$$
Let $z_i = x_i - y_i$; then the mean of the z-series is
$$\bar{z} = \frac{1}{10}\sum z_i = \frac{1}{10}\sum (x_i - y_i) = \frac{1}{10}\left[6 + 6 - 4 + 9 + 15 - 8 + 19 - 19 + 16 - 7\right] = \frac{33}{10} = 3.3$$
$$\text{Also,}\quad \sigma_{x-y}^2 = \sigma_z^2 = \frac{1}{10}\sum z_i^2 - \bar{z}^2$$
$$= \frac{1}{10}\left[6^2 + 6^2 + (-4)^2 + 9^2 + 15^2 + (-8)^2 + 19^2 + (-19)^2 + 16^2 + (-7)^2\right] - (3.3)^2 = \frac{1485}{10} - (3.3)^2 = 137.61$$
$$\text{Using}\quad r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y} = \frac{208.69 + 283.76 - 137.61}{2\sqrt{208.69 \times 283.76}} = \frac{354.84}{2 \times 243.347} = 0.729$$
which is the required correlation coefficient.
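The same computation can be reproduced in code as a check on the arithmetic. This is a minimal sketch; the helper name `variance` is an assumption, and it uses the divide-by-n variance of this chapter.

```python
import math

x = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
y = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]

def variance(values):
    """Population variance: (1/n) * sum(v^2) - mean^2."""
    n = len(values)
    mean = sum(values) / n
    return sum(v * v for v in values) / n - mean * mean

var_x = variance(x)
var_y = variance(y)
# Variance of the difference series z = x - y.
var_d = variance([a - b for a, b in zip(x, y)])

# r = (var_x + var_y - var_(x-y)) / (2 * sx * sy)
r = (var_x + var_y - var_d) / (2 * math.sqrt(var_x * var_y))
print(round(r, 3))  # 0.729
```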
Problem 6
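If the variables x and y are such that (i) x + y has variance 15, (ii) x – y has variance
11 and (iii) 2x + y has variance 29, find $\sigma_x$, $\sigma_y$ and the coefficient of correlation
between x and y.
Solution
If z = ax + by, we have $\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y$
$$\Rightarrow\ \sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y$$
$$\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y$$
$$\sigma_{2x+y}^2 = 4\sigma_x^2 + \sigma_y^2 + 4r\sigma_x\sigma_y$$
Since $\sigma_{x+y}^2 = 15$, $\sigma_{x-y}^2 = 11$ and $\sigma_{2x+y}^2 = 29$, we get
$$\sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y = 15 \qquad (1)$$
$$\sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y = 11 \qquad (2)$$
$$4\sigma_x^2 + \sigma_y^2 + 4r\sigma_x\sigma_y = 29 \qquad (3)$$
Adding (1) and (2) gives $\sigma_x^2 + \sigma_y^2 = 13$, and subtracting (2) from (1) gives $r\sigma_x\sigma_y = 1$.
Using these in (3), $4\sigma_x^2 + \sigma_y^2 = 25$, so that $3\sigma_x^2 = 12$, i.e., $\sigma_x^2 = 4$ and $\sigma_y^2 = 9$.
Hence $\sigma_x = 2$, $\sigma_y = 3$, and from (2)
$$r = \frac{\sigma_x^2 + \sigma_y^2 - 11}{2\sigma_x\sigma_y} = \frac{4 + 9 - 11}{2 \times (2)(3)} = \frac{2}{12} = \frac{1}{6}$$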
Problem 7
While calculating the correlation coefficient between x and y from 25 pairs of
observations, a person obtained the following values: $\sum x_i = 125$, $\sum x_i^2 = 650$,
$\sum y_i = 100$, $\sum y_i^2 = 460$, $\sum x_iy_i = 508$. It was later discovered that he had copied down
the pairs (8, 12) and (6, 8) as (6, 12) and (8, 6) respectively. Obtain the correct
value of the correlation coefficient.
Solution
In the computation of the correct value of the correlation coefficient, the pairs (6, 12)
and (8, 6) are to be changed to (8, 12) and (6, 8) respectively.
Correct $\sum x_i = 125 - 6 - 8 + 8 + 6 = 125$
Correct $\sum x_i^2 = 650 - 6^2 - 8^2 + 8^2 + 6^2 = 650$
Correct $\sum y_i = 100 - 12 - 6 + 12 + 8 = 102$
Correct $\sum y_i^2 = 460 - 12^2 - 6^2 + 12^2 + 8^2 = 488$
Correct $\sum x_iy_i = 508 - (6 \times 12) - (8 \times 6) + (8 \times 12) + (6 \times 8) = 532$
Since the number of observations is n = 25, we have
$$\bar{x} = \frac{\sum x_i}{n} = \frac{125}{25} = 5, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{102}{25} = 4.08$$
$$\sigma_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2 = \frac{650}{25} - 25 = 1 \quad\text{so that}\quad \sigma_x = 1$$
$$\sigma_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2 = \frac{488}{25} - (4.08)^2 = 2.8736 \quad\text{so that}\quad \sigma_y = 1.6952$$
∴ The correct correlation coefficient is
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x\sigma_y} = \frac{\sum x_iy_i - \bar{x}\sum y_i - \bar{y}\sum x_i + n\bar{x}\bar{y}}{n\,\sigma_x\sigma_y}$$
$$= \frac{1}{\sigma_x\sigma_y}\left[\frac{\sum x_iy_i}{n} - \bar{x}\,\frac{\sum y_i}{n} - \bar{y}\,\frac{\sum x_i}{n} + \bar{x}\bar{y}\right]$$
$$= \frac{1}{\sigma_x\sigma_y}\left[\frac{\sum x_iy_i}{n} - \bar{x}\bar{y} - \bar{x}\bar{y} + \bar{x}\bar{y}\right] = \frac{1}{\sigma_x\sigma_y}\left[\frac{\sum x_iy_i}{n} - \bar{x}\bar{y}\right]$$
$$= \frac{1}{1 \times 1.6952}\left[\frac{532}{25} - 5 \times 4.08\right] = 0.5191$$
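The correction procedure (remove the contribution of each wrongly copied pair, add the contribution of each correct pair, then recompute r) can be sketched as follows. The dictionary layout and the variable names are illustrative assumptions.

```python
import math

n = 25
# Sums as originally (wrongly) obtained.
sums = {"x": 125, "x2": 650, "y": 100, "y2": 460, "xy": 508}

wrong_pairs = [(6, 12), (8, 6)]    # as copied down
right_pairs = [(8, 12), (6, 8)]    # as they should have been

def contribution(x, y):
    """Contribution of one (x, y) pair to each running sum."""
    return {"x": x, "x2": x * x, "y": y, "y2": y * y, "xy": x * y}

for pair in wrong_pairs:
    for key, value in contribution(*pair).items():
        sums[key] -= value
for pair in right_pairs:
    for key, value in contribution(*pair).items():
        sums[key] += value

x_bar = sums["x"] / n
y_bar = sums["y"] / n
sigma_x = math.sqrt(sums["x2"] / n - x_bar ** 2)
sigma_y = math.sqrt(sums["y2"] / n - y_bar ** 2)

# r = (sum(xy)/n - xbar*ybar) / (sx * sy)
r = (sums["xy"] / n - x_bar * y_bar) / (sigma_x * sigma_y)
print(round(r, 4))  # 0.5191
```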
Exercises
1. In each of the following cases, calculate the coefficient of correlation between x
and y
(i)   x 10 14 18 22 26 30
      y 18 12 24 6 30 36
Ans.: 0.6
(ii)  x 1 2 3 4 5 6 7 8 9 10
      y 10 12 16 28 25 36 41 49 40 50
Ans.: 0.96
(iii) x 28 45 40 38 35 33 40 32 36 33
      y 23 34 33 34 30 26 28 31 36 35
Ans.: 0.518
(i)   x 1 2 3 4 5 6 7
      y 4 6 9 10 12 14 15
Ans.: 0.99
(ii)  x 21 23 30 54 57 58 72 78 87 90
      y 60 71 72 83 110 84 100 92 113 135
Ans.: 0.876
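Let $x_i$ and $y_i$ (i = 1, 2, ..., n) be the ranks of n individuals in two characteristics, so that each
series consists of the numbers 1, 2, ..., n in some order and $\bar{x} = \bar{y} = \frac{n+1}{2}$. If $X_i = x_i - \bar{x}$ and
$Y_i = y_i - \bar{y}$ are the deviations from their means, then
$$\sum X_i^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 + n\bar{x}^2 - 2\bar{x}\sum x_i$$
$$= \sum_{k=1}^{n} k^2 + \frac{n(n+1)^2}{4} - 2\cdot\frac{n+1}{2}\cdot\sum_{k=1}^{n} k$$
$$= \frac{n(n+1)(2n+1)}{6} + \frac{n(n+1)^2}{4} - \frac{n(n+1)^2}{2} = \frac{1}{12}\left(n^3 - n\right)$$
Similarly $\sum Y_i^2 = \frac{1}{12}\left(n^3 - n\right)$.
Let $d_i = x_i - y_i$, so that $d_i = (x_i - \bar{x}) - (y_i - \bar{y})$ since $\bar{x} = \bar{y}$, i.e.,
$d_i = X_i - Y_i$
$$\therefore\ \sum d_i^2 = \sum X_i^2 + \sum Y_i^2 - 2\sum X_iY_i$$
$$\text{or}\quad \sum X_iY_i = \frac{1}{2}\left(\sum X_i^2 + \sum Y_i^2 - \sum d_i^2\right) = \frac{1}{12}\left(n^3 - n\right) - \frac{1}{2}\sum d_i^2$$
Hence the correlation coefficient between these variables is
$$r = \frac{\sum X_iY_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = \frac{\frac{1}{12}\left(n^3 - n\right) - \frac{1}{2}\sum d_i^2}{\frac{1}{12}\left(n^3 - n\right)} = 1 - \frac{6\sum d_i^2}{n^3 - n}$$
This is called the rank correlation coefficient and is denoted by ρ. Thus
$$\rho = 1 - \frac{6\sum d_i^2}{n^3 - n}$$
where ρ = rank coefficient of correlation,
$\sum d_i^2$ = sum of the squares of the differences of the two ranks,
n = number of paired observations.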
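The formula for ρ translates into a very short routine. The following is a minimal sketch that assumes the inputs are already ranks with no ties, as in the derivation above; the function name `spearman_rho` and the sample rankings are illustrative, not taken from the text.

```python
def spearman_rho(rank_x, rank_y):
    """Rank correlation rho = 1 - 6*sum(d^2) / (n^3 - n), assuming untied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rank lists of equal length n >= 2")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Illustrative (made-up) rankings of n = 5 items.
same_order = spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])      # complete agreement
opposite_order = spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])  # complete disagreement
```

Identical rankings give ρ = 1 and exactly reversed rankings give ρ = –1, the two extremes of the coefficient.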
Problem 1
Following are the ranks obtained by 10 students in two subjects, statistics and
mathematics. To what extent is the knowledge of the students in the two subjects
related?
Statistics 1 2 3 4 5 6 7 8 9 10
Mathematics 2 4 1 5 3 9 7 10 6 8
Solution
Rank in statistics (x)   Rank in mathematics (y)   d = x – y   d²
        1                         2                   –1        1
        2                         4                   –2        4
        3                         1                    2        4
        4                         5                   –1        1
        5                         3                    2        4
        6                         9                   –3        9
        7                         7                    0        0
        8                        10                   –2        4
        9                         6                    3        9
       10                         8                    2        4
                                                 $\sum d_i^2 = 40$
Rank correlation: with n = 10,
$$\rho = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 40}{1000 - 10} = 0.757576$$
Problem 2
Ten participants in a contest are ranked by two judges as follows
x 1 6 5 10 3 2 4 9 7 8
y 6 4 9 8 1 2 3 10 5 7
Calculate the rank correlation coefficient ρ. (VTU 2002)
Solution
If $d_i = x_i - y_i$ then $d_i$ = –5, 2, –4, 2, 2, 0, 1, –1, 2, 1
$\sum d_i^2 = 25 + 4 + 16 + 4 + 4 + 0 + 1 + 1 + 4 + 1 = 60$
$$\therefore\ \text{Rank correlation}\quad \rho = 1 - \frac{6\sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 60}{990} = 0.6364$$
Problem 3
Three judges A, B, C give the following ranks. Find which pair of judges has the nearest
common approach.
A 1 6 5 10 3 2 4 9 7 8
B 3 5 8 4 7 10 2 1 6 9
C 6 4 9 8 1 2 3 10 5 7
Solution
Here n = 10
Ranks by
A (= x)  B (= y)  C (= z)  d1 = x – y  d2 = y – z  d3 = z – x   d1²   d2²   d3²
   1        3        6        –2          –3           5          4     9    25
   6        5        4         1           1          –2          1     1     4
   5        8        9        –3          –1           4          9     1    16
  10        4        8         6          –4          –2         36    16     4
   3        7        1        –4           6          –2         16    36     4
   2       10        2        –8           8           0         64    64     0
   4        2        3         2          –1          –1          4     1     1
   9        1       10         8          –9           1         64    81     1
   7        6        5         1           1          –2          1     1     4
   8        9        7        –1           2          –1          1     4     1
                                                      Total     200   214    60
From the table we have $\sum d_1^2 = 200$, $\sum d_2^2 = 214$, $\sum d_3^2 = 60$
$$\therefore\ \rho(x, y) = 1 - \frac{6\sum d_1^2}{n^3 - n} = 1 - \frac{6 \times 200}{10 \times 99} \approx -0.21$$
$$\rho(y, z) = 1 - \frac{6\sum d_2^2}{n^3 - n} = 1 - \frac{6 \times 214}{10 \times 99} \approx -0.30$$
$$\rho(z, x) = 1 - \frac{6\sum d_3^2}{n^3 - n} = 1 - \frac{6 \times 60}{10 \times 99} \approx 0.64$$
Since ρ(z, x) is maximum, the pair of judges A and C have the nearest common approach.
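Comparing every pair of judges is a natural loop. The sketch below recomputes the three coefficients from the ranks in the problem and picks the pair with the largest ρ; the helper name `spearman_rho` and the dictionary layout are illustrative assumptions, and untied ranks are assumed.

```python
from itertools import combinations

ranks = {
    "A": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "B": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "C": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

def spearman_rho(rank_x, rank_y):
    """Rank correlation rho = 1 - 6*sum(d^2) / (n^3 - n), assuming untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# One coefficient per unordered pair of judges: (A, B), (A, C), (B, C).
rho = {
    (p, q): spearman_rho(ranks[p], ranks[q])
    for p, q in combinations(sorted(ranks), 2)
}

# The pair with the largest rho has the nearest common approach.
closest_pair = max(rho, key=rho.get)
for pair, value in rho.items():
    print(pair, round(value, 4))
print("closest:", closest_pair)
```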
Exercise
1. The ranks of 16 students in mathematics and statistics are as follows:
(1, 1) (2, 10) (3, 3) (4, 4) (5, 5) (6, 7) (7, 2) (8, 6) (9, 8) (10, 11) (11, 15) (12, 9)
(13, 14) (14, 12) (15, 16) (16, 13).
Calculate the rank correlation coefficient for proficiencies of this group in
mathematics and statistics
Ans. ρ = 0.8
2. Calculate the rank correlation coefficient from the following data showing
ranks of 10 students in two subjects.
Maths 3 8 9 2 7 10 4 6 1 5
Physics 5 9 10 1 8 7 3 4 2 6
Ans. 0.8545
4.4 Regression
The term regression means 'going back' or 'stepping down'. Regression analysis is a
statistical tool for measuring the average relationship between two or more closely
related (positively or negatively) variables in terms of the original units of the data. This
technique is extensively used in almost all the sciences, viz., natural science, physical
science and social science. It is used for studying the relationship between two or more
related variables, e.g., price, demand and supply; production and consumption;
expenditure on advertisement and volume of sales; cost, volume and profit. The technique
was developed by the British biometrician Sir Francis Galton in 1877 in the course of
studying the relationship between the heights of fathers and the heights of sons.
The essential characteristics of regression analysis are:
(i) It consists of mathematical devices that are used to measure the average
relationship between two or more closely related variables.
(ii) It is used for estimating the unknown values of a dependent variable with
reference to the known values of its related independent variables.
(iii) It provides a mechanism for prediction or forecast of the values of one variable in
terms of the values of the other variable.
(iv) It involves two regression equations, viz., the equation of x on y and the equation of y on x.
When the regression curve is a straight line, it is called a line of regression. A line of regression is
the straight line which gives the best fit in the least-squares sense to the given data.
Regression lines
Suppose we are given n pairs of values $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ of two variables x
and y. If we fit a straight line to this data by taking x as the independent variable and y as the
dependent variable, then the straight line obtained is called the line of regression of y on x. Its
slope is called the regression coefficient of y on x. Similarly, if we fit a straight line to the data by
taking y as the independent variable and x as the dependent variable, the line obtained is the line
of regression of x on y; the reciprocal of its slope is called the regression coefficient of x on y.
The equation of the line of regression of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$. Its slope $b_{yx}$, the regression
coefficient of y on x, is given by
$$b_{yx} = \frac{\sum X_iY_i}{\sum X_i^2}, \qquad X_i = x_i - \bar{x},\ \ Y_i = y_i - \bar{y}.$$
Using $r = \dfrac{\sum X_iY_i}{n\sigma_x\sigma_y}$, $\sigma_x^2 = \frac{1}{n}\sum X_i^2$ and $\sigma_y^2 = \frac{1}{n}\sum Y_i^2$, we get
$$b_{yx} = \frac{\sum X_iY_i}{\sum X_i^2} = \frac{r\,n\,\sigma_x\sigma_y}{n\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x} \qquad (1)$$
Similarly, the equation of the line of regression of x on y is given by $x - \bar{x} = b_{xy}(y - \bar{y})$,
$$\text{where}\quad b_{xy} = \frac{\sum X_iY_i}{\sum Y_i^2} = r\,\frac{\sigma_x}{\sigma_y}.$$
The slope of this line is $\dfrac{1}{b_{xy}}$; the reciprocal of this slope, namely
$$b_{xy} = r\cdot\frac{\sigma_x}{\sigma_y} \qquad (2)$$
is the regression coefficient of x on y. From (1) and (2), $r^2 = b_{yx}\cdot b_{xy}$, i.e., $|r| = \sqrt{b_{yx}\,b_{xy}}$ is the
geometric mean of $b_{xy}$ and $b_{yx}$; r has the same sign as the two regression coefficients.
Since $|r| \le 1$, it follows that $b_{yx}\,b_{xy} \le 1$, so if one of the regression coefficients is numerically
greater than 1, the other must be numerically less than 1.
The lines of regression always pass through the point $(\bar{x}, \bar{y})$.
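The two regression coefficients and their link to r can be illustrated with a short routine. This is a minimal sketch: the function name `regression_coefficients` is an assumption, and the data are those of Problem 3 of the correlation section (x = 1, ..., 5; y = 2, 5, 3, 8, 7).

```python
import math

def regression_coefficients(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_XY = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_X2 = sum((x - x_bar) ** 2 for x in xs)
    sum_Y2 = sum((y - y_bar) ** 2 for y in ys)
    b_yx = sum_XY / sum_X2   # slope of the line of regression of y on x
    b_xy = sum_XY / sum_Y2   # regression coefficient of x on y
    return x_bar, y_bar, b_yx, b_xy

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
x_bar, y_bar, b_yx, b_xy = regression_coefficients(x, y)

# |r| is the geometric mean of the two coefficients; r takes their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(b_yx, b_xy, round(r, 4))
```

Both lines pass through the point of means, here (3, 5), so each line is fixed by its coefficient alone.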
Problem 1
Calculate the coefficient of correlation and obtain the lines of regression for the
following data
x 1 2 3 4 5 6 7 8 9
y 9 8 10 12 11 13 14 16 15
Obtain an estimate for y which corresponds to x = 6.2
Solution
n = 9 in each of the x and y series. Their means are
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{45}{9} = 5; \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{108}{9} = 12$$
 $x_i$   $X_i = x_i - \bar{x}$   $X_i^2$   $y_i$   $Y_i = y_i - \bar{y}$   $Y_i^2$   $X_iY_i$
  1      –4      16      9     –3       9     12
  2      –3       9      8     –4      16     12
  3      –2       4     10     –2       4      4
  4      –1       1     12      0       0      0
  5       0       0     11     –1       1      0
  6       1       1     13      1       1      1
  7       2       4     14      2       4      4
  8       3       9     16      4      16     12
  9       4      16     15      3       9     12
 45              60    108             60     57
From the table we have $\sum X_i^2 = 60$, $\sum Y_i^2 = 60$, $\sum X_iY_i = 57$
$$\text{Correlation coefficient is}\quad r = \frac{\sum X_iY_i}{\sqrt{\left(\sum X_i^2\right)\left(\sum Y_i^2\right)}} = \frac{57}{\sqrt{(60)(60)}} = 0.95$$
$$\sigma_x^2 = \frac{\sum X_i^2}{n} = \frac{60}{9}; \qquad \sigma_y^2 = \frac{\sum Y_i^2}{n} = \frac{60}{9}$$
$\sigma_x = \sigma_y = 2.582$
∴ The regression coefficients are
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.95 \quad\text{and}\quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.95$$
The line of regression of y on x, $y - \bar{y} = b_{yx}(x - \bar{x})$, is y – 12 = (0.95)(x – 5)
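The problem also asks for an estimate of y at x = 6.2, which the line of regression of y on x provides. The following sketch recomputes the slope from the data and evaluates the line at that point; the helper name `predict_y` is an illustrative assumption.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 10, 12, 11, 13, 14, 16, 15]
n = len(x)

x_bar = sum(x) / n   # 5.0
y_bar = sum(y) / n   # 12.0

# Regression coefficient of y on x: sum(X*Y) / sum(X^2) with deviations X, Y.
b_yx = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)

def predict_y(x_new):
    """Evaluate the line of regression of y on x: y = y_bar + b_yx * (x - x_bar)."""
    return y_bar + b_yx * (x_new - x_bar)

y_at_6_2 = predict_y(6.2)
print(round(y_at_6_2, 2))  # 13.14
```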
Problem 2
The following table gives the stopping distance y in metres of a motor bike
moving at a speed of x km/hour when the brakes are applied.
x 16 24 32 40 48 56
y 0.39 0.75 1.23 1.91 2.77 3.81
Find the correlation coefficient between the speed and the stopping distance, and
estimate the maximum speed at which the motor bike could be driven if the
stopping distance is not to exceed 5 m.
Solution
n = 6 pairs of values (x, y) are given.
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{216}{6} = 36, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{10.86}{6} = 1.81$$
With $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$, the variances of x and y are
$$\sigma_x^2 = \frac{\sum X_i^2}{n} = \frac{1120}{6} = 186.667, \qquad \sigma_y^2 = \frac{\sum Y_i^2}{n} = \frac{8.408}{6} = 1.4013$$
$$\sigma_x = \sqrt{186.667} = 13.663, \qquad \sigma_y = \sqrt{1.4013} = 1.1838$$
Since $\sum X_iY_i = 95.36$, the correlation coefficient is
$$r = \frac{\sum X_iY_i}{n\,\sigma_x\sigma_y} = \frac{95.36}{6 \times 13.663 \times 1.1838} = 0.983$$
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.983 \times \frac{13.663}{1.1838} = 11.345$$
∴ The equation of the line of regression of x on y, namely $x - \bar{x} = b_{xy}(y - \bar{y})$, is
x – 36 = (11.345)(y – 1.81)
x = 11.345 y + 15.466. (1)
When y = 5, Eqn. (1) gives x = 72.19.
⇒ For the stopping distance not to exceed 5 m,
the speed must not exceed 72 km/hour.
Problem 3
Obtain the lines of regression and hence find the coefficient of correlation for
the following data.
x 1 3 4 2 5 8 9 10 13 15
y 8 6 10 8 12 16 16 10 32 32
Solution
n = 10 items in each of the x and y series, with $\bar{x} = \frac{70}{10} = 7$ and $\bar{y} = \frac{150}{10} = 15$.
  x     y    $X = x - \bar{x}$   $Y = y - \bar{y}$    X²     Y²     XY
  1     8     –6     –7     36     49     42
  3     6     –4     –9     16     81     36
  4    10     –3     –5      9     25     15
  2     8     –5     –7     25     49     35
  5    12     –2     –3      4      9      6
  8    16      1      1      1      1      1
  9    16      2      1      4      1      2
 10    10      3     –5      9     25    –15
 13    32      6     17     36    289    102
 15    32      8     17     64    289    136
 70   150                  204    818    360
Using the entries in the table, $\sum X_i^2 = 204$, $\sum Y_i^2 = 818$ and $\sum X_iY_i = 360$
$$b_{yx} = \frac{\sum X_iY_i}{\sum X_i^2} = \frac{360}{204} = 1.7647, \qquad b_{xy} = \frac{\sum X_iY_i}{\sum Y_i^2} = \frac{360}{818} = 0.4401$$
∴ The equations of the regression lines, namely
$y - \bar{y} = b_{yx}(x - \bar{x})$ and $x - \bar{x} = b_{xy}(y - \bar{y})$
are y – 15 = 1.7647 (x – 7) and (x – 7) = (0.4401)(y – 15)
which simplify to y = 1.7647 x + 2.6471
x = 0.4401 y + 0.3985
Since $b_{xy}$ and $b_{yx}$ are both positive, the coefficient of correlation is positive and is given by
$$r = \sqrt{b_{yx}\,b_{xy}} = \sqrt{1.7647 \times 0.4401} = 0.8813$$
Problem 4
The equations of regression lines of two variables x and y are
y = 0.516 x + 33.73,  x = 0.512 y + 32.52
Find the correlation coefficient and the means of x and y.
Solution
The equation y = 0.516 x + 33.73 represents the regression line of y on x and the second
equation, x = 0.512 y + 32.52, that of x on y.
$b_{yx} = 0.516$, $b_{xy} = 0.512$
∴ The correlation coefficient is $r = \sqrt{(0.516)(0.512)} = 0.514$
Since the lines of regression pass through the point $(\bar{x}, \bar{y})$, the given equations are
satisfied by $x = \bar{x}$, $y = \bar{y}$, i.e., $\bar{y} = 0.516\,\bar{x} + 33.73$ and $\bar{x} = 0.512\,\bar{y} + 32.52$.
Solving these, $\bar{x} \approx 67.67$ and $\bar{y} \approx 68.65$.
Problem 5
The two regression equations of the variables x and y are x = 19.13 – 0.87y and
y = 11.64 – 0.50x. Find (i) the means of x and y, (ii) the correlation coefficient
between x and y. (VTU 204)
Solution
Since $(\bar{x}, \bar{y})$ satisfies the regression lines,
we have $\bar{x} = 19.13 - 0.87\,\bar{y}$ (1)
$\bar{y} = 11.64 - 0.50\,\bar{x}$ (2)
Substituting (2) in (1),
we have $[1 - (0.87)(0.50)]\,\bar{x} = 19.13 - (11.64)(0.87)$
$0.565\,\bar{x} = 9.0032$ or $\bar{x} = 15.93$
$\bar{y} = 11.64 - 0.50\,(15.93) = 3.67$
Here $b_{xy} = -0.87$ and $b_{yx} = -0.50$ are both negative, so r is negative:
$r = -\sqrt{(0.87)(0.50)} = -0.66$
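Recovering the means and r from two regression equations is a small linear-algebra exercise. The sketch below assumes the equations are supplied in the forms x = c_x + b_xy·y and y = c_y + b_yx·x; the function name `means_and_r` and its parameter names are illustrative assumptions.

```python
import math

def means_and_r(c_x, b_xy, c_y, b_yx):
    """Given x = c_x + b_xy*y (x on y) and y = c_y + b_yx*x (y on x),
    return (x_bar, y_bar, r)."""
    if b_xy * b_yx < 0:
        raise ValueError("the two regression coefficients must have the same sign")
    det = 1 - b_xy * b_yx
    if det == 0:
        raise ValueError("the lines coincide; the means are not determined")
    # Substitute the second equation into the first and solve for x_bar.
    x_bar = (c_x + b_xy * c_y) / det
    y_bar = c_y + b_yx * x_bar
    # r^2 = b_yx * b_xy, and r carries the common sign of the coefficients.
    r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)
    return x_bar, y_bar, r

# Equations of Problem 5: x = 19.13 - 0.87 y and y = 11.64 - 0.50 x.
x_bar, y_bar, r = means_and_r(19.13, -0.87, 11.64, -0.50)
print(round(x_bar, 2), round(y_bar, 2), round(r, 2))
```

Carrying full precision through the division by 0.565 matters here: rounding that factor early shifts the means noticeably in the second decimal place.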
Problem 6
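If θ is the angle between the two regression lines, show that
$$\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
Explain the significance when r = 0 and r = ±1. (VTU 2007)
Solution
The equation of the line of regression of y on x is
$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$
and that of x on y is
$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$
$$\therefore\ \text{their slopes are}\quad m_1 = r\,\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad m_2 = \frac{\sigma_y}{r\,\sigma_x}$$
We know that the angle between two straight lines is given by $\tan\theta = \dfrac{m_2 - m_1}{1 + m_1m_2}$
$$\tan\theta = \frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y}{r\sigma_x}\times\dfrac{r\sigma_y}{\sigma_x}} = \frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \frac{\dfrac{(1 - r^2)\,\sigma_y}{r\,\sigma_x}}{\dfrac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}} = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
$$\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
When r = 0, $\tan\theta = \infty$ or $\theta = \dfrac{\pi}{2}$,
i.e., when the variables are uncorrelated, the two lines of regression are perpendicular
to each other.
When r = ±1, tanθ = 0,
i.e., θ = 0 or π; the lines of regression coincide,
i.e., there is perfect correlation between the two variables.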
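The result can be checked numerically by computing tanθ in two ways: from the slopes of the two lines and from the closed formula. This is only a sketch; the function names and the sample values r = 0.5, σx = 1, σy = 2 are illustrative assumptions.

```python
def tan_theta_from_slopes(r, sigma_x, sigma_y):
    """tan(theta) = (m2 - m1) / (1 + m1*m2) for the two regression lines."""
    m1 = r * sigma_y / sigma_x        # slope of the line of y on x
    m2 = sigma_y / (r * sigma_x)      # slope of the line of x on y
    return (m2 - m1) / (1 + m1 * m2)

def tan_theta_closed_form(r, sigma_x, sigma_y):
    """tan(theta) = ((1 - r^2) / r) * sx*sy / (sx^2 + sy^2)."""
    return (1 - r * r) / r * sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)

# Illustrative values (r must be non-zero for the slope m2 to exist).
t_slopes = tan_theta_from_slopes(0.5, 1.0, 2.0)
t_closed = tan_theta_closed_form(0.5, 1.0, 2.0)

# As r approaches 1 the two lines close up and tan(theta) tends to 0.
t_near_one = tan_theta_closed_form(0.999, 1.0, 2.0)
```

Both routes give the same value, and the value shrinks towards 0 as |r| approaches 1, matching the remark that the lines coincide under perfect correlation.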
Problem 7
In a partially destroyed laboratory record, only the lines of regression of y on
x and x on y are available as 4x – 5y + 33 = 0 and 20x – 9y = 107 respectively.
Calculate $\bar{x}$, $\bar{y}$ and the coefficient of correlation between x and y.
(VTU 2005)
Solution
Since the regression lines pass through $(\bar{x}, \bar{y})$, we have
$4\bar{x} - 5\bar{y} + 33 = 0$ (1)
$20\bar{x} - 9\bar{y} = 107$ (2)
Multiply Eqn. (1) by 5 and subtract from (2).
We get $16\bar{y} = 272$ ⇒ $\bar{y} = \dfrac{272}{16} = 17$
Put $\bar{y} = 17$ in (1): $4\bar{x} - 5(17) + 33 = 0$
$4\bar{x} - 85 + 33 = 0$
$4\bar{x} - 52 = 0$ ⇒ $\bar{x} = \dfrac{52}{4} = 13$
∴ $\bar{x} = 13$ and $\bar{y} = 17$
The line of regression of y on x can be written as $y = \dfrac{4}{5}x + \dfrac{33}{5}$,
so the regression coefficient of y on x is $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{4}{5}$
The line of regression of x on y can be written as $x = \dfrac{9}{20}y + \dfrac{107}{20}$,
so the regression coefficient of x on y is $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = \dfrac{9}{20}$
$$\therefore\ b_{yx} \times b_{xy} = r\,\frac{\sigma_y}{\sigma_x} \times r\,\frac{\sigma_x}{\sigma_y} = \frac{4}{5} \times \frac{9}{20} \ \Rightarrow\ r^2 = \frac{36}{100} = 0.36$$
∴ r = 0.6 [∵ the positive sign is taken as $b_{yx}$ and $b_{xy}$ are both positive].
Problem 8
The following table records the test scores made by
salesmen on an intelligence test and their weekly sales:
Salesman 1 2 3 4 5 6 7 8 9 10
Test scores 40 70 50 60 80 50 90 40 60 60
Sales ('000) 2.5 6.0 4.5 5.0 4.5 2.0 5.5 3.0 4.5 3.0
Calculate the regression line of sales on test scores and estimate the most
probable weekly sales volume if a salesman makes a score of 70.
Solution
Here dx is the deviation of x from the assumed mean (= 60) and dy is the deviation of y from the
assumed average (= 4.5).
Test scores x   Sales y    dx     dy     dx × dy    dx²     dy²
     40           2.5     –20    –2        40       400     4
     70           6.0      10     1.5      15       100     2.25
     50           4.5     –10     0         0       100     0
     60           5.0       0     0.5       0         0     0.25
     80           4.5      20     0         0       400     0
     50           2.0     –10    –2.5      25       100     6.25
     90           5.5      30     1        30       900     1.00
     40           3.0     –20    –1.5      30       400     2.25
     60           4.5       0     0         0         0     0
     60           3.0       0    –1.5       0         0     2.25
   Total                    0    –4.5     140      2400    18.25
From the table we have $\sum d_x = 0$, $\sum d_y = -4.5$, $\sum d_xd_y = 140$, $\sum d_x^2 = 2400$, $\sum d_y^2 = 18.25$
$$\bar{x} = \text{mean of x (test scores)} = 60 + \frac{0}{10} = 60$$
$$\bar{y} = \text{mean of y (sales)} = 4.5 + \frac{(-4.5)}{10} = 4.05$$
The regression line of sales (y) on scores (x) is given by $y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$
$$\text{where}\quad r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum d_xd_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_x^2 - \dfrac{\left(\sum d_x\right)^2}{n}} = \frac{140 - \dfrac{0 \times (-4.5)}{10}}{2400 - \dfrac{0^2}{10}} = \frac{140}{2400} = 0.0583 \approx 0.06$$
The required regression line is
y – 4.05 = 0.06 (x – 60) or y = 0.06 x + 0.45
For x = 70, y = 0.06 × 70 + 0.45 = 4.65.
Thus the most probable weekly sales volume for a score of 70 is 4.65 (in thousands).
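The assumed-mean (step-deviation) computation above can be sketched in code as follows. The variable names are illustrative. Note that the code keeps the unrounded slope 140/2400, so its estimate for a score of 70 differs slightly from the 4.65 obtained in the text with the slope rounded to 0.06.

```python
scores = [40, 70, 50, 60, 80, 50, 90, 40, 60, 60]            # x
sales = [2.5, 6.0, 4.5, 5.0, 4.5, 2.0, 5.5, 3.0, 4.5, 3.0]   # y, in thousands
n = len(scores)

A, B = 60, 4.5   # assumed means for x and y

# Deviations from the assumed means.
dx = [s - A for s in scores]
dy = [v - B for v in sales]

sum_dx = sum(dx)
sum_dy = sum(dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)

# Regression coefficient of y on x from deviations about assumed means.
b_yx = (sum_dxdy - sum_dx * sum_dy / n) / (sum_dx2 - sum_dx ** 2 / n)

# Actual means recovered from the assumed means.
x_bar = A + sum_dx / n
y_bar = B + sum_dy / n

estimate_at_70 = y_bar + b_yx * (70 - x_bar)
print(round(b_yx, 4), round(estimate_at_70, 2))
```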
Problem 9
Find the correlation coefficient and the regression lines of y on x and x on y for
the following data.
x 1 2 3 4 5
y 2 5 3 8 7
(VTU 2010)
Solution
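There are n = 5 items in each of the x and y series.
$$\text{The means are}\quad \bar{x} = \frac{\sum x_i}{n} = \frac{15}{5} = 3, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{25}{5} = 5$$
 $x_i$   $X_i = x_i - \bar{x}$   $X_i^2$   $y_i$   $Y_i = y_i - \bar{y}$   $Y_i^2$   $X_iY_i$
  1      –2      4      2     –3      9      6
  2      –1      1      5      0      0      0
  3       0      0      3     –2      4      0
  4       1      1      8      3      9      3
  5       2      4      7      2      4      4
 15             10     25            26     13
From the table we have $\sum X_i^2 = 10$, $\sum Y_i^2 = 26$, $\sum X_iY_i = 13$
$$\text{The correlation coefficient is}\quad r = \frac{\sum X_iY_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = \frac{13}{\sqrt{10 \times 26}} = 0.8062$$
$$\text{Also,}\quad \sigma_x^2 = \frac{\sum X_i^2}{n} = \frac{10}{5} = 2; \qquad \sigma_x = \sqrt{2} = 1.4142$$
$$\sigma_y^2 = \frac{\sum Y_i^2}{n} = \frac{26}{5} = 5.2; \qquad \sigma_y = \sqrt{5.2} = 2.2804$$
The regression coefficients are
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8062 \times \frac{2.2804}{1.4142} = 1.3$$
$$\text{and}\quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.8062 \times \frac{1.4142}{2.2804} = 0.5$$
The line of regression of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$
y – 5 = 1.3 (x – 3)
y = 1.3 x + 5 – 3 (1.3)
y = 1.3 x + 1.1
The line of regression of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$
x – 3 = 0.5 (y – 5)
x = 0.5 y + 3 – 5 (0.5)
x = 0.5 y + 0.5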
Problem 10
The regression equations of two variables x and y are x = 0.7y + 5.2,
y = 0.3x + 2.8. Find the means of the variables and the coefficient of correlation
between them.
Solution
The equation y = 0.3x + 2.8 represents the regression line of y on x, so $b_{yx} = 0.3$
The equation x = 0.7y + 5.2 represents the regression line of x on y, so $b_{xy} = 0.7$
∴ The correlation coefficient is $r = \sqrt{b_{xy} \times b_{yx}}$
$r = \sqrt{0.7 \times 0.3} = \sqrt{0.21} = 0.4583$
Since the lines of regression pass through the point $(\bar{x}, \bar{y})$, the given equations are
satisfied by $x = \bar{x}$, $y = \bar{y}$, i.e., $\bar{x} = 0.7\,\bar{y} + 5.2$ and $\bar{y} = 0.3\,\bar{x} + 2.8$.
Solving these, $0.79\,\bar{x} = 7.16$, so that $\bar{x} \approx 9.06$ and $\bar{y} \approx 5.52$.
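$$\text{Variance,}\quad \sigma_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2 = \frac{60}{18} - \left(\frac{2}{3}\right)^2$$
$$\sigma_x^2 = \frac{10}{3} - \frac{4}{9} = \frac{30 - 4}{9} = \frac{26}{9}$$
∴ $\sigma_x = 1.6997$
$$\text{and}\quad \sigma_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2 = \frac{96}{18} - (1)^2 = \frac{96 - 18}{18} = \frac{78}{18}$$
$\sigma_y = 2.0817$
$$\text{The coefficient of correlation}\quad r = \frac{1}{\sigma_x\sigma_y}\left[\frac{\sum x_iy_i}{n} - \bar{x}\bar{y}\right]$$
$$r = \frac{1}{1.6997 \times 2.0817}\left[\frac{48}{18} - \frac{2}{3}(1)\right] = \frac{1}{1.6997 \times 2.0817}\left[\frac{8}{3} - \frac{2}{3}\right] = \frac{2}{1.6997 \times 2.0817}$$
r = 0.5652
$$\text{The regression coefficient of x on y is}\quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.5652 \times \frac{1.6997}{2.0817} = 0.4615$$
and the regression coefficient of y on x is
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.5652 \times \frac{2.0817}{1.6997} = 0.6922$$
The regression lines are
$y - \bar{y} = b_{yx}(x - \bar{x})$
$$y - 1 = 0.6922\left(x - \frac{2}{3}\right)$$
$$y = 0.6922\,x + 1 - \frac{2 \times 0.6922}{3}$$
y = 0.6922 x + 0.5385
and $x - \bar{x} = b_{xy}(y - \bar{y})$
$$x - \frac{2}{3} = 0.4615\,(y - 1)$$
$$x = 0.4615\,y - 0.4615 + \frac{2}{3}$$
x = 0.4615 y + 0.2052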
Problem 12
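If the coefficient of correlation between two variables x and y is 0.5 and the acute
angle between their lines of regression is $\tan^{-1}\left(\dfrac{3}{5}\right)$, show that $\sigma_x = \dfrac{1}{2}\sigma_y$
(VTU 2004)
Solution
Given r = 0.5, $\theta = \tan^{-1}\left(\dfrac{3}{5}\right)$
$$\text{We have}\quad \tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
$$\frac{3}{5} = \frac{1 - (0.5)^2}{0.5} \times \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
$$\frac{3}{5} = \frac{1 - 0.25}{0.5} \times \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
$$\frac{3}{5} = \frac{3}{2} \times \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
$2(\sigma_x^2 + \sigma_y^2) = 5\sigma_x\sigma_y$,
$2\sigma_x^2 - 5\sigma_x\sigma_y + 2\sigma_y^2 = 0$,
$2\sigma_x^2 - 4\sigma_x\sigma_y - \sigma_x\sigma_y + 2\sigma_y^2 = 0$,
$2\sigma_x(\sigma_x - 2\sigma_y) - \sigma_y(\sigma_x - 2\sigma_y) = 0$
$(\sigma_x - 2\sigma_y)(2\sigma_x - \sigma_y) = 0$
$\sigma_x - 2\sigma_y = 0$ or $2\sigma_x - \sigma_y = 0$
The second factor gives $\sigma_x = \dfrac{1}{2}\sigma_y$ (the first gives $\sigma_x = 2\sigma_y$).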
Problem 13
If the tangent of the angle θ between the lines of regression of y on x and x on y is
0.6 and the standard deviation of y is twice the standard deviation of x, find the
coefficient of correlation between x and y.
Solution
It is given that tanθ = 0.6 and $\sigma_y = 2\sigma_x$
$$\therefore\ \tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} \quad\text{gives}$$
$$0.6 = \frac{1 - r^2}{r}\cdot\frac{2\sigma_x^2}{\sigma_x^2 + 4\sigma_x^2} \qquad [\because\ \sigma_y = 2\sigma_x]$$
$$0.6 = \frac{1 - r^2}{r}\cdot\frac{2\sigma_x^2}{5\sigma_x^2}$$
$3r = 2 - 2r^2$ ⇒ $2r^2 + 3r - 2 = 0$
(2r – 1)(r + 2) = 0. This gives the correlation coefficient as $r = \dfrac{1}{2}$ [since r ≠ –2].
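The same steps work for any given tanθ and any ratio k = σy/σx: the formula for tanθ rearranges to the quadratic k r² + tanθ (1 + k²) r – k = 0. The sketch below solves it and keeps only roots with |r| ≤ 1; the function name `r_from_angle` and the parameter `k` are illustrative assumptions.

```python
import math

def r_from_angle(tan_theta, k):
    """Solve tan(theta) = ((1 - r^2)/r) * k / (1 + k^2) for r, where k = sy/sx > 0.

    Rearranged: k*r^2 + tan_theta*(1 + k^2)*r - k = 0.
    Returns the admissible roots, i.e. those with |r| <= 1.
    """
    if k <= 0:
        raise ValueError("k = sigma_y / sigma_x must be positive")
    a, b, c = k, tan_theta * (1 + k * k), -k
    disc = b * b - 4 * a * c          # always positive, since -4ac = 4k^2 > 0
    roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]
    return [root for root in roots if abs(root) <= 1]

# Problem 13: tan(theta) = 0.6 and sigma_y = 2 * sigma_x, i.e. k = 2.
admissible = r_from_angle(0.6, 2)
print(admissible)
```

For these values the quadratic is 2r² + 3r – 2 = 0, whose roots are 1/2 and –2; only the first is a valid correlation coefficient.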
Problem 14
Two variables x and y are connected by the relation ax + by + c = 0. Show that the
correlation coefficient between them is –1 if the signs of a and b are the same
and +1 if they are different.
Solution
The given relation between x and y can be written as $y = -\dfrac{a}{b}x - \dfrac{c}{b}$, $x = -\dfrac{b}{a}y - \dfrac{c}{a}$
These equations represent the regression lines.
$$\therefore\ b_{yx} = -\frac{a}{b} \quad\text{and}\quad b_{xy} = -\frac{b}{a}$$
$r^2 = b_{yx}\,b_{xy} = 1$, so that r = ±1
Suppose a and b are of the same sign. Then $b_{xy}$ and $b_{yx}$ are both negative and hence r
is negative. In this case r = –1. If a and b are of opposite signs, then both $b_{yx}$ and $b_{xy}$ are
positive and consequently r = 1.
Problem 15
The following information is available in respect of the price of a certain consumer
item in two cities A and B. Average price in city A is Rs. 65, average price in
city B is Rs. 67; standard deviation in city A is 2.5; standard deviation in city B
is 3.5. The coefficient of correlation between the prices in the two cities is 0.8.
Find the most likely price in city B corresponding to the price of Rs. 70 in city A.
Solution
Let x denote the price in city A and y denote the price in city B. Given that
$\bar{x} = 65$, $\bar{y} = 67$, $\sigma_x = 2.5$, $\sigma_y = 3.5$, r = 0.8. These give $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = 0.8 \times \dfrac{3.5}{2.5} = 1.12$.
Therefore, the equation of the line of regression of y on x is y – 67 = (1.12)(x – 65)
or y = (1.12)x – 5.8
When x = 70, y = 1.12(70) – 5.8 = 78.4 – 5.8 = 72.6
Thus the most likely price in city B corresponding to the price of Rs. 70 in city A is
Rs. 72.6.
Problem 16
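For two random variables x and y with the same mean, the two regression lines
are y = ax + b and x = αy + β. Show that $\dfrac{b}{\beta} = \dfrac{1 - a}{1 - \alpha}$. Find also the common mean.
Solution
The regression lines of y on x and x on y are
y = ax + b (1)
and x = αy + β (2)
Since the regression lines pass through $(\bar{x}, \bar{y})$,
the equations are satisfied for $x = \bar{x}$ and $y = \bar{y}$.
∴ We have $\bar{y} = a\bar{x} + b$ (3)
and $\bar{x} = \alpha\bar{y} + \beta$ (4)
Substituting (4) in (3),
$$(1 - a\alpha)\,\bar{y} = b + a\beta \ \Rightarrow\ \bar{y} = \frac{b + a\beta}{1 - a\alpha} \qquad (5)$$
$$\text{Eqn. (4) gives}\quad \bar{x} = \alpha\,\frac{b + a\beta}{1 - a\alpha} + \beta$$
$$\bar{x} = \frac{\beta + b\alpha}{1 - a\alpha} \qquad (6)$$
It is given that the means of x and y are the same,
$$\text{i.e.,}\quad \bar{x} = \bar{y} \ \Rightarrow\ \frac{\beta + b\alpha}{1 - a\alpha} = \frac{b + a\beta}{1 - a\alpha}$$
⇒ β + bα = b + aβ
b – bα = β – aβ
b(1 – α) = β(1 – a)
$$\Rightarrow\ \frac{b}{\beta} = \frac{1 - a}{1 - \alpha}$$
Putting $\bar{x} = \bar{y}$ in (3), the common mean is $\bar{x} = \bar{y} = \dfrac{b}{1 - a} = \dfrac{\beta}{1 - \alpha}$.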
Exercise
1. Obtain the lines of regression and hence find the coefficient of correlation for
the following data
x 1 2 3 4 5 6 7
y 9 8 10 12 11 13 14
Ans.: y = 0.93x + 7.28, x = 0.93y – 6.23, r = 0.93
2. For the following data, find (i) the regression equations, (ii) the coefficient of
correlation, and (iii) the most likely value of y when x = 30.
x 25 28 35 32 31 36 29 38 34 32
y 43 46 49 41 36 32 31 30 33 39
Ans.: x = –0.234 y + 40.892, y = –0.664x + 59.248, r = –0.39 and y = 39.
3. The equations of regression lines of two variables x and y are 3x + 2y = 26 and
6x + y = 31. Find the mean values of x and y and the correlation coefficient.
Ans.: $\bar{x} = 4$, $\bar{y} = 7$, r = –0.5
4. In a partially destroyed laboratory record, only the following regression equations
were available: 7x – 16y + 9 = 0, 5y – 4x – 3 = 0. Find the means of x and y and
the coefficient of correlation between x and y.
Ans.: $\bar{x} = -0.1034$, $\bar{y} = 0.5172$, r = 0.7395