Module-4 (Correlation & Regression)

The document provides an overview of statistical methods focusing on correlation and regression analysis. It explains different types of correlations, their properties, and the significance of correlation in various fields such as economics and business. Additionally, it includes formulas for calculating correlation coefficients and practical examples to illustrate these concepts.

Uploaded by

KOUSHIK S
Copyright
© © All Rights Reserved

MODULE

3 CHAPTER

STATISTICAL METHOD
4
HIGHLIGHTS
• Introduction to correlation
• Rank correlation
• Properties of the rank correlation coefficient
• Introduction to regression lines
4.2 Engineering Mathematics - III

4.1 Correlation Introduction


Two variables are said to be correlated if a change in the value of one variable is
accompanied by a change in the value of the other variable.
For example, the yield of crop varies with the amount of rainfall.
Types of correlations
(i) Positive correlation: If an increase in the value of one variable X results, on
average, in a corresponding increase in the value of the other variable Y, or if a
decrease in the value of X results, on average, in a corresponding decrease in the
value of Y, the correlation is said to be positive.
(ii) Negative correlation: If an increase in the values of one variable X results in a
corresponding decrease in the values of the other variable Y, or if a decrease in the
values of X results in a corresponding increase in the values of Y, the correlation
between X and Y is said to be negative.
(iii) Linear correlation: When all the plotted points lie approximately on a straight
line, the correlation is said to be linear.

[Figure: scatter diagram with axes x and y]

(iv) Perfect correlation: If the deviation of one variable X is proportional to the
deviation in the other variable Y, the correlation is said to be perfect. In this case
the plotted points on a graph lie exactly on a straight line.

(v) Positive perfect correlation: If an increase in one variable X is proportional to the
increase in the other variable Y, the correlation is perfect and positive. The graph
will be exactly a straight line.

(vi) Negative perfect correlation: If an increase in one variable is proportional to the
decrease in the other variable, the correlation is perfect and negative. The graph
will be exactly a straight line.

If two variables vary in such a way that their ratio is always constant, the
correlation is said to be perfect.

Uses of correlation
(i) It is used in deriving precisely the degree and direction of the relationship between
variables such as price and demand, advertising expenditure and sales, rainfall and
crop yield, etc.
(ii) It is used in developing the concepts of regression and ratio of variation, which help
in estimating the value of one variable for a given value of another variable.
(iii) It is used in reducing the range of uncertainty in prediction.
(iv) It is used in presenting the average relationship between any two variables
through a single value, the coefficient of correlation.
(v) In the field of economics it is used in understanding economic behaviour and
locating the important variables on which the others depend.
(vi) In the field of business it is used to estimate the cost of sales, volume of sales,
sales prices and any other value on the basis of other variables that are
financially related to it.
(vii) In the fields of science and philosophy, the methods of correlation are also widely
used in making progress along the respective lines.
(viii) In the study of nature it is used in observing the multiplicity of interrelated
forces.

Correlation
Let $x_1, x_2, x_3, \ldots, x_n$ be n values of x and $y_1, y_2, \ldots, y_n$ be the corresponding values of y. Then

$$\bar{x} = \frac{1}{n}\sum x_i \quad\text{and}\quad \bar{y} = \frac{1}{n}\sum y_i$$

are called the means of the x-series and the y-series.

$$\sigma_x^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 \quad\text{and}\quad \sigma_y^2 = \frac{1}{n}\sum (y_i - \bar{y})^2$$

are called the variances of the x-series and the y-series; $\sigma_x$ and $\sigma_y$ are called the standard deviations (S.D.) of the x-series and the y-series.

Alternative forms for $\sigma_x$ and $\sigma_y$

$$\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 \quad\text{and}\quad \sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2$$

With $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$ given as above,

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x\,\sigma_y}$$

is called the coefficient of correlation between x and y.
For the coefficient of correlation between x and y, $-1 \le r \le 1$.
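The definitions above translate directly into a short computation. The following is a minimal Python sketch, assuming the data are given as two equal-length lists; the function name `pearson_r` is an illustrative choice, not something from the text. It uses the divide-by-n variances defined above.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation r = sum((x - mean_x)(y - mean_y)) / (n * sd_x * sd_y)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Variances and covariance divided by n, as in the text (not n - 1).
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return cov / math.sqrt(var_x * var_y)

# y is exactly twice x, so the correlation is perfect and positive.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(r, 4))  # 1.0
```

Because the factor n cancels, the same value is obtained whether the variances are divided by n or by n - 1, provided both are treated alike.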

Problem 1
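Prove the following formulas for the coefficient of correlation, where $x_i$ and $y_i$ denote deviations from the respective means:

(i) $r = 1 - \dfrac{1}{2n}\sum\left(\dfrac{x_i}{\sigma_x} - \dfrac{y_i}{\sigma_y}\right)^2$   (ii) $r = -1 + \dfrac{1}{2n}\sum\left(\dfrac{x_i}{\sigma_x} + \dfrac{y_i}{\sigma_y}\right)^2$

Hence deduce that $-1 \le r \le 1$.
Solution

$$\sum\left(\frac{x_i}{\sigma_x} - \frac{y_i}{\sigma_y}\right)^2 = \sum\left(\frac{x_i^2}{\sigma_x^2} - \frac{2x_i y_i}{\sigma_x\sigma_y} + \frac{y_i^2}{\sigma_y^2}\right) = \frac{1}{\sigma_x^2}\sum x_i^2 - \frac{2}{\sigma_x\sigma_y}\sum x_i y_i + \frac{1}{\sigma_y^2}\sum y_i^2$$

Since $\sum x_i^2 = n\sigma_x^2$, $\sum y_i^2 = n\sigma_y^2$ and $\sum x_i y_i = r\,n\,\sigma_x\sigma_y$,

$$\sum\left(\frac{x_i}{\sigma_x} - \frac{y_i}{\sigma_y}\right)^2 = \frac{1}{\sigma_x^2}(n\sigma_x^2) - \frac{2}{\sigma_x\sigma_y}(r\,n\,\sigma_x\sigma_y) + \frac{1}{\sigma_y^2}(n\sigma_y^2) = n - 2nr + n = 2n(1 - r) \qquad (1)$$

$$\Rightarrow\; r = 1 - \frac{1}{2n}\sum\left(\frac{x_i}{\sigma_x} - \frac{y_i}{\sigma_y}\right)^2$$

Similarly,

$$\sum\left(\frac{x_i}{\sigma_x} + \frac{y_i}{\sigma_y}\right)^2 = 2n(1 + r) \qquad (2)$$

$$\Rightarrow\; r = -1 + \frac{1}{2n}\sum\left(\frac{x_i}{\sigma_x} + \frac{y_i}{\sigma_y}\right)^2$$

Since the L.H.S. of (1) and (2) are non-negative and n > 0, it follows that
1 – r ≥ 0 and 1 + r ≥ 0, i.e., 1 ≥ r and r ≥ –1 ⇒ –1 ≤ r ≤ 1.
If r = ±1, we say that x and y are perfectly correlated.
If r = 0, we say that x and y are uncorrelated.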
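Taking $x_i - \bar{x} = X_i$ and $y_i - \bar{y} = Y_i$,

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x\,\sigma_y} \quad\text{can be rewritten as}\quad r = \frac{\sum X_i Y_i}{n\,\sigma_x\,\sigma_y}$$

Also, $\sigma_x^2 = \dfrac{1}{n}\sum X_i^2$, $\sigma_y^2 = \dfrac{1}{n}\sum Y_i^2$, and

$$r = \frac{\sum X_i Y_i}{\sqrt{\left(\sum X_i^2\right)\left(\sum Y_i^2\right)}}$$

This is known as the product-moment formula for r.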
Problem 2
If r is the correlation coefficient between x and y, and z = ax + by, show that

$$r = \frac{\sigma_z^2 - \left(a^2\sigma_x^2 + b^2\sigma_y^2\right)}{2ab\,\sigma_x\,\sigma_y} \qquad (1)$$

Solution
Since z = ax + by, we have $\bar{z} = a\bar{x} + b\bar{y}$, where $\bar{x}$, $\bar{y}$ and $\bar{z}$ are the means of the x, y and z series.
Let $z_i = ax_i + by_i$, i = 1, 2, 3, ..., n. Then

$$z_i - \bar{z} = (ax_i + by_i) - (a\bar{x} + b\bar{y}) = a(x_i - \bar{x}) + b(y_i - \bar{y})$$

so that

$$(z_i - \bar{z})^2 = a^2(x_i - \bar{x})^2 + b^2(y_i - \bar{y})^2 + 2ab\,(x_i - \bar{x})(y_i - \bar{y})$$

Summing over i = 1, ..., n, this yields

$$\sum (z_i - \bar{z})^2 = a^2\sum (x_i - \bar{x})^2 + b^2\sum (y_i - \bar{y})^2 + 2ab\sum (x_i - \bar{x})(y_i - \bar{y}) \qquad (2)$$

By using $\sigma_x^2 = \dfrac{1}{n}\sum (x_i - \bar{x})^2$, $\sigma_y^2 = \dfrac{1}{n}\sum (y_i - \bar{y})^2$ and $r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x\,\sigma_y}$, equation (2) gives

$$n\sigma_z^2 = a^2 n\sigma_x^2 + b^2 n\sigma_y^2 + 2ab\,n\,r\,\sigma_x\sigma_y$$

$$\Rightarrow\; r = \frac{\sigma_z^2 - \left(a^2\sigma_x^2 + b^2\sigma_y^2\right)}{2ab\,\sigma_x\,\sigma_y}$$

which is the required relation.
As particular cases of the relation (1) proved above, we find by taking a = 1, b = 1 and a = 1, b = –1 respectively that

$$r = \frac{\sigma_{x+y}^2 - \left(\sigma_x^2 + \sigma_y^2\right)}{2\sigma_x\sigma_y} \qquad (3)$$

$$r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y} \qquad (4)$$

Adding the above expressions we get

$$r = \frac{\sigma_{x+y}^2 - \sigma_{x-y}^2}{4\sigma_x\sigma_y}$$

Problem 3
Find the correlation coefficient for the following data
x   1   2   3   4   5
y   2   5   3   8   7

Solution

n = 5. The mean of the x-series is $\bar{x} = \dfrac{1}{n}\sum x_i = \dfrac{1}{5}(1 + 2 + 3 + 4 + 5) = \dfrac{15}{5} = 3$

The mean of the y-series is $\bar{y} = \dfrac{1}{n}\sum y_i = \dfrac{1}{5}(2 + 5 + 3 + 8 + 7) = \dfrac{25}{5} = 5$

x_i     X_i = x_i – x̄     X_i²     y_i     Y_i = y_i – ȳ     Y_i²     X_iY_i
1       1 – 3 = –2        4        2       2 – 5 = –3        9        6
2       2 – 3 = –1        1        5       5 – 5 = 0         0        0
3       3 – 3 = 0         0        3       3 – 5 = –2        4        0
4       4 – 3 = 1         1        8       8 – 5 = 3         9        3
5       5 – 3 = 2         4        7       7 – 5 = 2         4        4
Total   15                10       25                        26       13

Here we have $\sum X_i^2 = 10$, $\sum Y_i^2 = 26$, $\sum X_i Y_i = 13$

$$\therefore\; r = \frac{\sum X_i Y_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = \frac{13}{\sqrt{10 \times 26}} = \frac{13}{16.1245} = 0.806$$

Problem 4
The following table gives the ages (in years) of 10 married couples. Calculate the
coefficient of correlation between these ages.
Age of husband (x)   23   27   28   29   30   31   33   35   36   39
Age of wife (y)      18   22   23   24   25   26   28   29   30   32

Solution
Here n = 10 values of (x, y) are given.
The mean of the x-series is $\bar{x} = \dfrac{1}{n}\sum x_i = \dfrac{311}{10} = 31.1$
The mean of the y-series is $\bar{y} = \dfrac{1}{n}\sum y_i = \dfrac{257}{10} = 25.7$

x_i     X_i = x_i – x̄     X_i²      y_i     Y_i = y_i – ȳ     Y_i²      X_iY_i
23      –8.1              65.61     18      –7.7              59.29     62.37
27      –4.1              16.81     22      –3.7              13.69     15.17
28      –3.1              9.61      23      –2.7              7.29      8.37
29      –2.1              4.41      24      –1.7              2.89      3.57
30      –1.1              1.21      25      –0.7              0.49      0.77
31      –0.1              0.01      26      0.3               0.09      –0.03
33      1.9               3.61      28      2.3               5.29      4.37
35      3.9               15.21     29      3.3               10.89     12.87
36      4.9               24.01     30      4.3               18.49     21.07
39      7.9               62.41     32      6.3               39.69     49.77
Total   311               202.90    257                       158.10    178.30

From the table we have $\sum X_i^2 = 202.90$, $\sum Y_i^2 = 158.10$, $\sum X_i Y_i = 178.30$

The coefficient of correlation between x and y is

$$r = \frac{\sum X_i Y_i}{\sqrt{\left(\sum X_i^2\right)\left(\sum Y_i^2\right)}} = \frac{178.3}{\sqrt{202.9 \times 158.1}} = \frac{178.3}{179.105} = 0.9955$$

The correlation is almost perfect, i.e. in the given data the ages of husbands and wives are
almost perfectly correlated.

Problem 5
Employ the formula $r = \dfrac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$ to determine r for the following data

x   92   89   87   86   83   77   71   63   53   50
y   86   83   91   77   68   85   52   82   37   57
Solution
n = 10 values of (x, y) are given.

$$\bar{x} = \frac{1}{10}\sum x_i = \frac{751}{10} = 75.1 \quad\text{and}\quad \bar{y} = \frac{1}{10}\sum y_i = \frac{718}{10} = 71.8$$

$$\therefore\; \sigma_x^2 = \frac{1}{10}\sum x_i^2 - \bar{x}^2 = \frac{1}{10}\left(92^2 + 89^2 + 87^2 + 86^2 + 83^2 + 77^2 + 71^2 + 63^2 + 53^2 + 50^2\right) - (75.1)^2$$

$$= \frac{58487}{10} - (75.1)^2 = 208.69$$

$$\text{and}\quad \sigma_y^2 = \frac{1}{10}\sum y_i^2 - \bar{y}^2 = \frac{1}{10}\left(86^2 + 83^2 + 91^2 + 77^2 + 68^2 + 85^2 + 52^2 + 82^2 + 37^2 + 57^2\right) - (71.8)^2$$

$$= \frac{54390}{10} - (71.8)^2 = 283.76$$

Let $z_i = x_i - y_i$. Then the mean of the z-series is

$$\bar{z} = \frac{1}{10}\sum z_i = \frac{1}{10}\sum (x_i - y_i) = \frac{1}{10}\left[6 + 6 - 4 + 9 + 15 - 8 + 19 - 19 + 16 - 7\right] = \frac{33}{10} = 3.3$$

Also,

$$\sigma_{x-y}^2 = \sigma_z^2 = \frac{1}{10}\sum z_i^2 - \bar{z}^2 = \frac{1}{10}\left[6^2 + 6^2 + (-4)^2 + 9^2 + 15^2 + (-8)^2 + 19^2 + (-19)^2 + 16^2 + (-7)^2\right] - (3.3)^2$$

$$= \frac{1485}{10} - (3.3)^2 = 137.61$$

Using

$$r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y} = \frac{208.69 + 283.76 - 137.61}{2\sqrt{208.69 \times 283.76}} = \frac{354.84}{2 \times 243.347} = 0.729$$

which is the required correlation coefficient.
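The variance-of-differences formula used in this problem can be checked numerically. The sketch below is a minimal Python illustration that assumes the data are held in two lists; the helper names `variance` and `r_from_difference_variance` are hypothetical and introduced only for this example.

```python
import math

def variance(values):
    """Variance in the alternative form: mean of the squares minus square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(v * v for v in values) / n - mean * mean

def r_from_difference_variance(xs, ys):
    """r = (var_x + var_y - var_(x-y)) / (2 * sd_x * sd_y)."""
    var_x = variance(xs)
    var_y = variance(ys)
    var_diff = variance([x - y for x, y in zip(xs, ys)])
    return (var_x + var_y - var_diff) / (2 * math.sqrt(var_x * var_y))

xs = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
ys = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]
r = r_from_difference_variance(xs, ys)
print(round(r, 3))  # 0.729
```

The formula needs only three variances, which is convenient when the differences x - y are small numbers that are easy to square by hand.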
Problem 6
If the variables x and y are such that (i) x + y has variance 15, (ii) x – y has variance
11 and (iii) 2x + y has variance 29, find σx, σy and the coefficient of correlation
between x and y.
Solution
If z = ax + by we have $\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y$

$$\Rightarrow\; \sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y$$

$$\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y$$

$$\sigma_{2x+y}^2 = 4\sigma_x^2 + \sigma_y^2 + 4r\sigma_x\sigma_y$$

Given $\sigma_{x+y}^2 = 15$, $\sigma_{x-y}^2 = 11$, $\sigma_{2x+y}^2 = 29$, we get

$$\sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y = 15 \qquad (1)$$

$$\sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y = 11 \qquad (2)$$

$$4\sigma_x^2 + \sigma_y^2 + 4r\sigma_x\sigma_y = 29 \qquad (3)$$

Adding (1) and (2) and dividing by 2 ⇒ $\sigma_x^2 + \sigma_y^2 = 13$   (4)
Also, 2 × (2) + (3), divided by 3, gives $2\sigma_x^2 + \sigma_y^2 = 17$   (5)
Solving (4) and (5) we get $\sigma_x^2 = 4$ and $\sigma_y^2 = 9$, so σx = 2 and σy = 3.
Equation (2) ⇒ the coefficient of correlation is

$$r = \frac{\sigma_x^2 + \sigma_y^2 - 11}{2\sigma_x\sigma_y} = \frac{4 + 9 - 11}{2 \times (2)(3)} = \frac{2}{12} = \frac{1}{6}$$

Problem 7
While calculating the correlation coefficient between x and y from 25 pairs of
observations, a person obtained the following values: Σxi = 125, Σxi² = 650,
Σyi = 100, Σyi² = 460, Σxiyi = 508. It was later discovered that he had copied down
the pairs (8, 12) and (6, 8) as (6, 12) and (8, 6) respectively. Obtain the correct
value of the correlation coefficient.
Solution
In the computation of the correct value of the correlation coefficient, the pairs (6, 12)
and (8, 6) are to be replaced by (8, 12) and (6, 8) respectively.
Correct Σxi = 125 – 6 – 8 + 8 + 6 = 125
Correct Σxi² = 650 – 6² – 8² + 8² + 6² = 650
Correct Σyi = 100 – 12 – 6 + 12 + 8 = 102
Correct Σyi² = 460 – 12² – 6² + 12² + 8² = 488
Correct Σxiyi = 508 – (6 × 12) – (8 × 6) + (8 × 12) + (6 × 8) = 532
Since the number of observations is n = 25, we have

$$\bar{x} = \frac{\sum x_i}{n} = \frac{125}{25} = 5, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{102}{25} = 4.08$$

$$\sigma_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2 = \frac{650}{25} - 25 = 1 \quad\text{so that}\quad \sigma_x = 1$$

$$\sigma_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2 = \frac{488}{25} - (4.08)^2 = 2.8736 \quad\text{so that}\quad \sigma_y = 1.6952$$

∴ The correct correlation coefficient is

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x\sigma_y} = \frac{\sum x_i y_i - \bar{x}\sum y_i - \bar{y}\sum x_i + n\,\bar{x}\,\bar{y}}{n\,\sigma_x\sigma_y}$$

$$= \frac{1}{\sigma_x\sigma_y}\left[\frac{\sum x_i y_i}{n} - \bar{x}\,\frac{\sum y_i}{n} - \bar{y}\,\frac{\sum x_i}{n} + \bar{x}\,\bar{y}\right] = \frac{1}{\sigma_x\sigma_y}\left[\frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y} - \bar{x}\,\bar{y} + \bar{x}\,\bar{y}\right] = \frac{1}{\sigma_x\sigma_y}\left[\frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y}\right]$$

$$= \frac{1}{1 \times 1.6952}\left[\frac{532}{25} - 5 \times 4.08\right] = 0.5191$$
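The correction procedure in this problem (subtract the contribution of each miscopied pair, add that of each true pair, then recompute r from the sums) can be sketched as follows. This is a minimal Python illustration; the dictionary keys and the function names `apply_corrections` and `r_from_sums` are assumptions made for the example.

```python
import math

def apply_corrections(sums, wrong_pairs, right_pairs):
    """Remove the miscopied pairs from the running sums and add the true pairs."""
    fixed = dict(sums)
    for sign, pairs in ((-1, wrong_pairs), (1, right_pairs)):
        for x, y in pairs:
            fixed["x"] += sign * x
            fixed["y"] += sign * y
            fixed["xx"] += sign * x * x
            fixed["yy"] += sign * y * y
            fixed["xy"] += sign * x * y
    return fixed

def r_from_sums(n, s):
    """r = (sum_xy/n - mean_x*mean_y) / (sd_x * sd_y), with variances divided by n."""
    mean_x, mean_y = s["x"] / n, s["y"] / n
    var_x = s["xx"] / n - mean_x ** 2
    var_y = s["yy"] / n - mean_y ** 2
    cov = s["xy"] / n - mean_x * mean_y
    return cov / math.sqrt(var_x * var_y)

reported = {"x": 125, "y": 100, "xx": 650, "yy": 460, "xy": 508}
fixed = apply_corrections(
    reported,
    wrong_pairs=[(6, 12), (8, 6)],
    right_pairs=[(8, 12), (6, 8)],
)
r = r_from_sums(25, fixed)
print(fixed)
print(round(r, 4))
```

Working from the five sums avoids needing the 25 original pairs, which is the point of the exercise.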
Exercises
1. In each of the following cases, calculate the coefficient of correlation between x
and y.

(i)
x   10   14   18   22   26   30
y   18   12   24   6    30   36
Ans.: 0.67

(ii)
x   1    2    3    4    5    6    7    8    9    10
y   10   12   16   28   25   36   41   49   40   50
Ans.: 0.96

(iii)
x   28   45   40   38   35   33   40   32   36   33
y   23   34   33   34   30   26   28   31   36   35
Ans.: 0.518

2. Using the formula $r = \dfrac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$, find r in the following cases:

(i)
x   1   2   3   4    5    6    7
y   4   6   9   10   12   14   15
Ans.: 0.99

(ii)
x   21   23   30   54   57    58   72    78   87    90
y   60   71   72   83   110   84   100   92   113   135
Ans.: 0.876

4.2 Rank Correlation


The British psychologist Charles Edward Spearman devised the method of finding the
coefficient of correlation by ranks. This method is based on ranks and is useful in dealing
with qualitative characteristics such as morality, character, intelligence and beauty, which
cannot be measured quantitatively as Pearson's coefficient of correlation requires. It is based
on the ranks given to the observations. Rank correlation is applicable only to individual
observations.
A group of n individuals may be arranged in order of merit with respect to some characteristic.
The same group would give different orders for different characteristics. Considering the
orders corresponding to two characteristics A and B, the correlation between these n
pairs of ranks is called the rank correlation in the characteristics A and B for that group of
individuals.
Let $x_i$, $y_i$ be the ranks of the i-th individual in A and B respectively. Assuming that no two
individuals are bracketed equal in either case, each of the variables takes the values 1, 2,
3, ..., n, and we have

$$\bar{x} = \bar{y} = \frac{1 + 2 + 3 + \cdots + n}{n} = \frac{n(n+1)}{2n} = \frac{n+1}{2}$$

If $X_i$, $Y_i$ are the deviations of $x_i$, $y_i$ from their means, then

$$\sum X_i^2 = \sum (x_i - \bar{x})^2 = \sum x_i^2 + n\,\bar{x}^2 - 2\bar{x}\sum x_i$$

$$= \sum n^2 + \frac{n(n+1)^2}{4} - 2\cdot\frac{n+1}{2}\cdot\sum n$$

$$= \frac{n(n+1)(2n+1)}{6} + \frac{n(n+1)^2}{4} - \frac{n(n+1)^2}{2} = \frac{1}{12}\left(n^3 - n\right)$$

Similarly, $\sum Y_i^2 = \dfrac{1}{12}\left(n^3 - n\right)$.

Let $d_i = x_i - y_i$, so that $d_i = (x_i - \bar{x}) - (y_i - \bar{y})$, since $\bar{x} = \bar{y}$; that is, $d_i = X_i - Y_i$.

$$\therefore\; \sum d_i^2 = \sum X_i^2 + \sum Y_i^2 - 2\sum X_i Y_i$$

$$\text{or}\quad \sum X_i Y_i = \frac{1}{2}\left(\sum X_i^2 + \sum Y_i^2 - \sum d_i^2\right) = \frac{1}{12}\left(n^3 - n\right) - \frac{1}{2}\sum d_i^2$$

Hence the correlation coefficient between these variables is

$$r = \frac{\sum X_i Y_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = \frac{\frac{1}{12}\left(n^3 - n\right) - \frac{1}{2}\sum d_i^2}{\frac{1}{12}\left(n^3 - n\right)} = 1 - \frac{6\sum d_i^2}{n^3 - n}$$

This is called the rank correlation coefficient and is denoted by ρ. Thus

$$\rho = 1 - \frac{6\sum d_i^2}{n^3 - n}$$

where ρ = rank coefficient of correlation,
$\sum d_i^2$ = sum of the squares of the differences of the two ranks,
n = number of paired observations.
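The formula for ρ is simple enough to evaluate directly. The following minimal Python sketch assumes the two inputs are already ranks 1, 2, ..., n with no ties, as the derivation does; the function name `spearman_rho` is chosen for illustration.

```python
def spearman_rho(ranks_x, ranks_y):
    """Rank correlation rho = 1 - 6 * sum(d^2) / (n^3 - n), assuming untied ranks 1..n."""
    n = len(ranks_x)
    if n < 2 or n != len(ranks_y):
        raise ValueError("need two rank lists of equal length, n >= 2")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

same_order = spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reverse_order = spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(same_order, reverse_order)  # 1.0 -1.0
```

Identical rankings give every d equal to zero, hence ρ = 1; fully reversed rankings give the largest possible sum of squared differences, hence ρ = -1.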

4.3 Properties of Rank Correlation Coefficient


1. The value of ρ lies between +1 and –1.
2. If ρ = 1, there is complete agreement in the order of the ranks and the direction of
the ranks is the same.
3. If ρ = –1, there is complete disagreement in the order of the ranks and they
are in opposite directions.

Working Rule to Solve Problems


1. When the ranks are given:
step 1. Compute the difference of the two ranks and denote it by d.
step 2. Square d and compute Σd².
step 3. Obtain ρ by substituting the above values in the formula.
2. When the ranks are not given: if actual data are given, we must first assign ranks.
Ranks can be assigned by taking either the highest or the lowest value as 1, the next
highest (lowest) as 2, and so on, following the same procedure for both variables.
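When only raw data are given, the ranking step of the working rule can be sketched as below. This is a minimal Python illustration that assumes all values are distinct; tied values need an averaging rule that the working rule above does not describe, so the sketch refuses them. The function name `assign_ranks` and the sample marks are hypothetical.

```python
def assign_ranks(values, highest_first=True):
    """Give rank 1 to the highest (or lowest) value, 2 to the next, and so on."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values, reverse=highest_first)
    position = {value: index + 1 for index, value in enumerate(ordered)}
    # Return the ranks in the original order of the observations.
    return [position[value] for value in values]

marks = [56, 72, 36, 63]
print(assign_ranks(marks))                       # [3, 1, 4, 2]
print(assign_ranks(marks, highest_first=False))  # [2, 4, 1, 3]
```

Whichever convention is chosen, it must be applied to both variables, otherwise the sign of ρ is reversed.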

Problem 1
The following are the ranks obtained by 10 students in two subjects, statistics and
mathematics. To what extent is the knowledge of the students in the two subjects
related?
Statistics    1   2   3   4   5   6   7   8    9   10
Mathematics   2   4   1   5   3   9   7   10   6   8

Solution
Rank in statistics (x)   Rank in mathematics (y)   d = x – y   d²
1                        2                         –1          1
2                        4                         –2          4
3                        1                         2           4
4                        5                         –1          1
5                        3                         2           4
6                        9                         –3          9
7                        7                         0           0
8                        10                        –2          4
9                        6                         3           9
10                       8                         2           4
                                                               Σd² = 40

Rank correlation, with n = 10:

$$\rho = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 40}{1000 - 10} = 0.757576$$
Problem 2
Ten participants in a contest are ranked by two judges as follows
x   1   6   5   10   3   2   4   9    7   8
y   6   4   9   8    1   2   3   10   5   7

(VTU 2002)
Calculate the rank correlation coefficient ρ
Solution
If $d_i = x_i - y_i$ then $d_i$ = –5, 2, –4, 2, 2, 0, 1, –1, 2, 1

$\sum d_i^2$ = 25 + 4 + 16 + 4 + 4 + 0 + 1 + 1 + 4 + 1 = 60

$$\therefore\; \text{Rank correlation}\quad \rho = 1 - \frac{6\sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 60}{990} = 0.6364$$
Problem 3
Three judges A, B, C give the following ranks. Find which pair of judges has the nearest
common approach.
A   1   6   5   10   3   2    4   9    7   8
B   3   5   8   4    7   10   2   1    6   9
C   6   4   9   8    1   2    3   10   5   7

Solution
Here n = 10

A (= x)   B (= y)   C (= z)   d1 = x – y   d2 = y – z   d3 = z – x   d1²   d2²   d3²
1         3         6         –2           –3           5            4     9     25
6         5         4         1            1            –2           1     1     4
5         8         9         –3           –1           4            9     1     16
10        4         8         6            –4           –2           36    16    4
3         7         1         –4           6            –2           16    36    4
2         10        2         –8           8            0            64    64    0
4         2         3         2            –1           –1           4     1     1
9         1         10        8            –9           1            64    81    1
7         6         5         1            1            –2           1     1     4
8         9         7         –1           2            –1           1     4     1
Total                                                                200   214   60

From the table we have $\sum d_1^2 = 200$, $\sum d_2^2 = 214$, $\sum d_3^2 = 60$

$$\therefore\; \rho(x, y) = 1 - \frac{6\sum d_1^2}{n^3 - n} = 1 - \frac{6 \times 200}{10 \times 99} = -0.21$$

$$\rho(y, z) = 1 - \frac{6\sum d_2^2}{n^3 - n} = 1 - \frac{6 \times 214}{10 \times 99} = -0.30$$

$$\rho(z, x) = 1 - \frac{6\sum d_3^2}{n^3 - n} = 1 - \frac{6 \times 60}{10 \times 99} = 0.64$$

Since ρ(z, x) is maximum, the pair of judges A and C have the nearest common approach.
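The pairwise comparison of the three judges can be automated. The sketch below is a minimal Python illustration that applies the rank correlation formula to every pair and picks the largest value; the names `spearman_rho`, `scores` and `closest` are assumptions made for the example.

```python
from itertools import combinations

def spearman_rho(ranks_x, ranks_y):
    """rho = 1 - 6 * sum(d^2) / (n^3 - n), assuming untied ranks 1..n."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

judges = {
    "A": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "B": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "C": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

# One rho per unordered pair of judges.
scores = {
    (first, second): spearman_rho(judges[first], judges[second])
    for first, second in combinations(judges, 2)
}
closest = max(scores, key=scores.get)

for pair, rho in scores.items():
    print(pair, round(rho, 4))
print("nearest common approach:", closest)
```

Since ρ depends only on squared differences, the order within a pair does not matter, so three pairs cover all comparisons.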

Exercise
1. The ranks of 16 students in mathematics and statistics are as follows:
(1, 1) (2, 10) (3, 3) (4, 4) (5, 5) (6, 7) (7, 2) (8, 6) (9, 8) (10, 11) (11, 15) (12, 9)
(13, 14) (14, 12) (15, 16) (16, 13).
Calculate the rank correlation coefficient for proficiencies of this group in
mathematics and statistics
Ans. ρ = 0.8
2. Calculate the rank correlation coefficient from the following data showing
ranks of 10 students in two subjects.
Maths 3 8 9 2 7 10 4 6 1 5
Physics 5 9 10 1 8 7 3 4 2 6
Ans. 0.8545

3. Find the rank correlation for the following data:


x 56 42 72 36 63 47 55 49 38 42 68 60
y 147 125 160 118 149 128 150 145 115 140 152 155
Ans. 0.932

4.4 Regression
The term regression means "going back" or "stepping down". Regression analysis is a
statistical tool for measuring the average relationship between two or more closely
related (positively or negatively) variables in terms of the original units of their data. This
technique is used extensively in almost all the sciences, viz., natural science, physical
science and social science. It is used for studying the relationship between two or more
related variables, such as price, demand and supply; production and consumption;
expenditure on advertisement and volume of sales; cost, volume and profit; etc. The
technique was developed by the British biometrician Sir Francis Galton in 1877 in the
course of his study of the relationship between the heights of fathers and the heights of
sons.
The essential characteristics of regression analysis are:
(i) It consists of mathematical devices that are used to measure the average
relationship between two or more closely related variables.
(ii) It is used for estimating the unknown values of a dependent variable with
reference to the known values of its related independent variables.
(iii) It provides a mechanism for prediction or forecast of the values of one variable in
terms of the values of the other variable.
(iv) It consists of two regression equations, viz., the equation of x on y and the equation of y on x.
When the curve is a straight line, it is called a line of regression. A line of regression is the
straight line which gives the best fit in the least-squares sense to the given data.

Regression lines
Suppose we are given n pairs of values (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn) of two variables x
and y. If we fit a straight line to this data by taking x as the independent variable and y as the
dependent variable, the straight line obtained is called the line of regression of y on x. Its slope
is called the regression coefficient of y on x. Similarly, if we fit a straight line to the data by taking y as
the independent variable and x as the dependent variable, the line obtained is the line of regression
of x on y; the reciprocal of its slope is called the regression coefficient of x on y.

Write $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$. The equation of the line of regression of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$. Its slope

$$b_{yx} = \frac{\sum X_i Y_i}{\sum X_i^2}$$

is the regression coefficient of y on x. Using $r = \dfrac{\sum X_i Y_i}{n\,\sigma_x\sigma_y}$, $\sigma_x^2 = \dfrac{1}{n}\sum X_i^2$ and $\sigma_y^2 = \dfrac{1}{n}\sum Y_i^2$, we get

$$b_{yx} = \frac{\sum X_i Y_i}{\sum X_i^2} = \frac{r\,n\,\sigma_x\sigma_y}{n\,\sigma_x^2} = r\left(\frac{\sigma_y}{\sigma_x}\right) \qquad (1)$$

Similarly, the equation of the line of regression of x on y is given by $x - \bar{x} = b_{xy}(y - \bar{y})$, where

$$b_{xy} = \frac{\sum X_i Y_i}{\sum Y_i^2} = r\,\frac{\sigma_x}{\sigma_y}$$

The slope of this line is $\dfrac{1}{b_{xy}}$; the reciprocal of this slope, namely

$$b_{xy} = r\cdot\frac{\sigma_x}{\sigma_y} \qquad (2)$$

is the regression coefficient of x on y. From (1) and (2), $r^2 = b_{yx}\cdot b_{xy}$, i.e. |r| is the geometric
mean of $b_{xy}$ and $b_{yx}$, and r takes the common sign of the two regression coefficients.
Since |r| ≤ 1, the product $b_{yx}\,b_{xy}$ cannot exceed 1; hence if one regression coefficient is
numerically greater than 1, the other must be numerically less than 1.
The lines of regression always pass through the point $(\bar{x}, \bar{y})$.
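The two regression coefficients and the relation r² = b_yx · b_xy can be illustrated numerically. The following minimal Python sketch assumes the data are given as two lists; the function name `regression_coefficients` and the sample data are choices made for this example.

```python
import math

def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy, r) using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sum_xy / sum_xx  # slope of the line of y on x
    b_xy = sum_xy / sum_yy  # regression coefficient of x on y
    # |r| is the geometric mean of the coefficients; r carries their common sign.
    r = math.copysign(math.sqrt(b_yx * b_xy), sum_xy)
    return mean_x, mean_y, b_yx, b_xy, r

mean_x, mean_y, b_yx, b_xy, r = regression_coefficients(
    [1, 2, 3, 4, 5], [2, 5, 3, 8, 7]
)
print(f"y - {mean_y} = {b_yx:.4f} (x - {mean_x})")
print(f"x - {mean_x} = {b_xy:.4f} (y - {mean_y})")
print(round(r, 4))
```

Both lines are written in the point-slope form through the means, which makes the fact that they intersect at that point explicit.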
Problem 1
Calculate the coefficient of correlation and obtain the lines of regression for the
following data
x   1   2   3    4    5    6    7    8    9
y   9   8   10   12   11   13   14   16   15
Obtain an estimate for y which corresponds to x = 6.2
Solution
n = 9 in each of the x and y series. Their means are

$$\bar{x} = \frac{1}{n}\sum x_i = \frac{45}{9} = 5; \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{108}{9} = 12$$

x_i     X_i = x_i – x̄     X_i²     y_i     Y_i = y_i – ȳ     Y_i²     X_iY_i
1       –4                16       9       –3                9        12
2       –3                9        8       –4                16       12
3       –2                4        10      –2                4        4
4       –1                1        12      0                 0        0
5       0                 0        11      –1                1        0
6       1                 1        13      1                 1        1
7       2                 4        14      2                 4        4
8       3                 9        16      4                 16       12
9       4                 16       15      3                 9        12
Total   45                60       108                       60       57

From the table we have $\sum X_i^2 = 60$, $\sum Y_i^2 = 60$, $\sum X_i Y_i = 57$

The correlation coefficient is

$$r = \frac{\sum X_i Y_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = \frac{57}{\sqrt{(60)(60)}} = 0.95$$

$$\sigma_x^2 = \frac{\sum X_i^2}{n} = \frac{60}{9}; \qquad \sigma_y^2 = \frac{\sum Y_i^2}{n} = \frac{60}{9}$$

$$\sigma_x = \sigma_y = 2.582$$

∴ The regression coefficients are

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.95 \quad\text{and}\quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.95$$

The line of regression of y on x, $y - \bar{y} = b_{yx}(x - \bar{x})$, is y – 12 = (0.95)(x – 5), or

y = 0.95x + 7.25   (1)

The line of regression of x on y, $x - \bar{x} = b_{xy}(y - \bar{y})$, is x – 5 = (0.95)(y – 12), or

x = 0.95y – 6.4   (2)

When x = 6.2, (1) gives y = 0.95(6.2) + 7.25 = 5.89 + 7.25, i.e.

y = 13.14.

Problem 2
The following table gives the stopping distance y in metres of a motor bike
moving at a speed of x km/hour when the brakes are applied.
x   16     24     32     40     48     56
y   0.39   0.75   1.23   1.91   2.77   3.81
Find the correlation coefficient between the speed and the stopping distance, and
estimate the maximum speed at which the motor bike could be driven if the
stopping distance is not to exceed 5 metres.
Solution
n = 6 values of (x, y) are given.

$$\bar{x} = \frac{1}{n}\sum x_i = \frac{216}{6} = 36, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{10.86}{6} = 1.81$$

x_i     X_i = x_i – x̄     X_i²     y_i      Y_i = y_i – ȳ     Y_i²      X_iY_i
16      –20               400      0.39     –1.42             2.0164    28.40
24      –12               144      0.75     –1.06             1.1236    12.72
32      –4                16       1.23     –0.58             0.3364    2.32
40      4                 16       1.91     0.10              0.0100    0.40
48      12                144      2.77     0.96              0.9216    11.52
56      20                400      3.81     2.00              4.0000    40.00
Total   216               1120     10.86                      8.4080    95.36

From the table we have $\sum X_i^2 = 1120$, $\sum Y_i^2 = 8.408$, $\sum X_i Y_i = 95.36$

The required coefficient of correlation is

$$r = \frac{\sum X_i Y_i}{\sqrt{\left(\sum X_i^2\right)\left(\sum Y_i^2\right)}} = \frac{95.36}{\sqrt{1120 \times 8.408}} = \frac{95.36}{97.041} = 0.983$$

The variances of x and y are

$$\sigma_x^2 = \frac{\sum X_i^2}{n} = \frac{1120}{6} = 186.667, \qquad \sigma_y^2 = \frac{\sum Y_i^2}{n} = \frac{8.408}{6} = 1.4013$$

$$\sigma_x = \sqrt{186.667} = 13.663, \qquad \sigma_y = \sqrt{1.4013} = 1.1838$$

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.983 \times \frac{13.663}{1.1838} = 11.345$$

∴ The equation of the line of regression of x on y, namely $x - \bar{x} = b_{xy}(y - \bar{y})$, is
x – 36 = (11.345)(y – 1.81)
x = 11.345y + 15.466.   (1)
When y = 5, Eqn. (1) gives x = 72.19.
⇒ For the stopping distance not to exceed 5 metres,
the speed must not exceed 72 km/hour.
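Because the speed is being estimated from a given stopping distance, the line of regression of x on y is the one to use. The following minimal Python sketch works from the tabulated data; `speed_for_distance` is a hypothetical helper name. It computes b_xy directly as ΣXY / ΣY², so its result can differ in the second decimal place from a hand calculation that rounds r and the standard deviations first.

```python
speeds = [16, 24, 32, 40, 48, 56]                 # x, km/hour
distances = [0.39, 0.75, 1.23, 1.91, 2.77, 3.81]  # y, metres

n = len(speeds)
mean_x = sum(speeds) / n
mean_y = sum(distances) / n

# Sums of products and squares of the deviations from the means.
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(speeds, distances))
sum_yy = sum((y - mean_y) ** 2 for y in distances)

b_xy = sum_xy / sum_yy  # regression coefficient of x on y

def speed_for_distance(distance):
    """Estimate x from y with the line x - mean_x = b_xy * (y - mean_y)."""
    return mean_x + b_xy * (distance - mean_y)

limit = speed_for_distance(5.0)
print(round(b_xy, 3), round(limit, 1))
```

A stopping distance of 5 metres lies outside the observed range (the largest is 3.81), so the estimate of roughly 72 km/hour is an extrapolation of the fitted line.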

Problem 3
Obtain the lines of regression and hence find the coefficient of correlation for
the following data.
x   1   3   4    2   5    8    9    10   13   15
y   8   6   10   8   12   16   16   10   32   32

Solution
n = 10 items in each of the x and y series.

The means are $\bar{x} = \dfrac{1}{n}\sum x_i = \dfrac{70}{10} = 7$, $\bar{y} = \dfrac{1}{n}\sum y_i = \dfrac{150}{10} = 15$

x       y       X = x – x̄     Y = y – ȳ     X²      Y²      XY
1       8       –6            –7            36      49      42
3       6       –4            –9            16      81      36
4       10      –3            –5            9       25      15
2       8       –5            –7            25      49      35
5       12      –2            –3            4       9       6
8       16      1             1             1       1       1
9       16      2             1             4       1       2
10      10      3             –5            9       25      –15
13      32      6             17            36      289     102
15      32      8             17            64      289     136
Total   70      150                         204     818     360

Using the entries in the table, $\sum X_i^2 = 204$, $\sum Y_i^2 = 818$ and $\sum X_i Y_i = 360$

$$b_{yx} = \frac{\sum X_i Y_i}{\sum X_i^2} = \frac{360}{204} = 1.7647, \qquad b_{xy} = \frac{\sum X_i Y_i}{\sum Y_i^2} = \frac{360}{818} = 0.4401$$

∴ The equations of the regression lines, namely

$$y - \bar{y} = b_{yx}(x - \bar{x}) \quad\text{and}\quad x - \bar{x} = b_{xy}(y - \bar{y})$$

are y – 15 = 1.7647 (x – 7) and (x – 7) = (0.4401)(y – 15),
which simplify to y = 1.7647 x + 2.6471
x = 0.4401 y + 0.3985
Since $b_{xy}$ and $b_{yx}$ are both positive, the coefficient of correlation is positive and is given by

$$r = \sqrt{b_{yx}\,b_{xy}} = \sqrt{1.7647 \times 0.4401} = 0.8813$$

Problem 4
The equations of regression lines of two variables x and y are
y = 0.516 x + 33.73 and x = 0.512 y + 32.52
Find the correlation coefficient and the means of x and y.

Solution
The equation y = 0.516 x + 33.73 represents the regression line of y on x, and the second
equation x = 0.512 y + 32.52 represents that of x on y.
$b_{yx}$ = 0.516, $b_{xy}$ = 0.512
∴ The correlation coefficient is $r = \sqrt{(0.516)(0.512)} = 0.514$
Since the lines of regression pass through the point $(\bar{x}, \bar{y})$, the given equations are
satisfied by $x = \bar{x}$ and $y = \bar{y}$. We have

$$\bar{y} = 0.516\,\bar{x} + 33.73 \qquad (1)$$

$$\bar{x} = 0.512\,\bar{y} + 32.52 \qquad (2)$$

Solving (1) and (2), $\bar{x} = 67.67$ and $\bar{y} = 68.65$

Problem 5
The two regression equations of the variables x and y are x = 19.13 – 0.87y and
y = 11.64 – 0.50x. Find (i) the means of x and y and (ii) the correlation coefficient
between x and y. (VTU 204)
Solution
Since $(\bar{x}, \bar{y})$ satisfies the regression lines,
we have $\bar{x} = 19.13 - 0.87\,\bar{y}$   (1)
$\bar{y} = 11.64 - 0.50\,\bar{x}$   (2)
Multiplying (2) by 0.87 and subtracting from (1),
we have $[1 - (0.87)(0.50)]\,\bar{x} = 19.13 - (11.64)(0.87)$
$0.565\,\bar{x} = 9.0032$ or $\bar{x} = 15.935$

$\bar{y} = 11.64 - 0.50\,(15.935) = 3.67$

$\bar{x}$ = mean of x's = 15.93 (to two decimal places)

$\bar{y}$ = mean of y's = 3.67

∴ The regression coefficient of y on x is –0.50, and the regression coefficient of x on y is –0.87.
Since the coefficient of correlation is the geometric mean between the two regression
coefficients,

$$|r| = \sqrt{b_{yx} \times b_{xy}} = \sqrt{(-0.50) \times (-0.87)} = \sqrt{0.435} = 0.66$$

r = –0.66 [∵ the negative sign is taken since the regression coefficients are negative]
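Problems of this type always follow the same two steps: solve the pair of regression equations for the means, then take the signed geometric mean of the slopes. The sketch below is a minimal Python illustration, assuming the lines are supplied in the form y = b_yx·x + c_y and x = b_xy·y + c_x; the function name `means_and_r` and its parameter names are hypothetical.

```python
import math

def means_and_r(b_yx, c_y, b_xy, c_x):
    """Lines y = b_yx*x + c_y and x = b_xy*y + c_x -> (mean_x, mean_y, r)."""
    product = b_yx * b_xy
    if product < 0 or product > 1:
        raise ValueError("regression coefficients must share a sign, product <= 1")
    if product == 1:
        raise ValueError("the two lines coincide; the means are not determined")
    # Substitute the first line into the second: x (1 - b_xy*b_yx) = b_xy*c_y + c_x.
    mean_x = (b_xy * c_y + c_x) / (1 - product)
    mean_y = b_yx * mean_x + c_y
    # r has the common sign of the two regression coefficients.
    r = math.copysign(math.sqrt(product), b_yx)
    return mean_x, mean_y, r

# Lines with positive slopes: y = 0.516x + 33.73 and x = 0.512y + 32.52.
print(means_and_r(0.516, 33.73, 0.512, 32.52))
# Lines with negative slopes: y = 11.64 - 0.50x and x = 19.13 - 0.87y.
print(means_and_r(-0.50, 11.64, -0.87, 19.13))
```

The check on the product of the slopes also catches the case where the two given equations have been assigned to the wrong roles, since swapping them can give a value of r² greater than 1.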


Problem 6
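If θ is the angle between the two regression lines, show that $\tan\theta = \dfrac{1 - r^2}{r}\cdot\dfrac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$.
Explain the significance when r = 0 and r = ±1. (VTU 2007)
Solution
The equation of the line of regression of y on x is

$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

and that of x on y is

$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

∴ their slopes are $m_1 = \dfrac{r\sigma_y}{\sigma_x}$ and $m_2 = \dfrac{\sigma_y}{r\sigma_x}$.
We know that the angle between two straight lines is given by $\tan\theta = \dfrac{m_2 - m_1}{1 + m_1 m_2}$, so

$$\tan\theta = \frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y}{r\sigma_x}\times\dfrac{r\sigma_y}{\sigma_x}} = \frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}}$$

$$= \frac{\dfrac{1 - r^2}{r}\left(\dfrac{\sigma_y}{\sigma_x}\right)}{\dfrac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}} = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$

$$\therefore\; \tan\theta = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$

When r = 0, tan θ → ∞, or θ = π/2,
i.e., when the variables are uncorrelated, the two lines of regression are perpendicular
to each other.
When r = ±1, tan θ = 0,
i.e., θ = 0 or π, and the lines of regression coincide,
i.e., there is perfect correlation between the two variables.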
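The angle formula and its two limiting cases can be checked numerically. The sketch below is a minimal Python illustration; the function name `angle_between_regression_lines` is hypothetical, and using |r| so that the acute angle is returned for negative r is an assumption of this sketch rather than part of the derivation above.

```python
import math

def angle_between_regression_lines(r, sd_x, sd_y):
    """Acute angle in radians from tan(theta) = ((1 - r^2)/|r|) * sx*sy/(sx^2 + sy^2)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return math.pi / 2  # tan(theta) is unbounded: the lines are perpendicular
    tan_theta = (1 - r * r) / abs(r) * (sd_x * sd_y) / (sd_x ** 2 + sd_y ** 2)
    return math.atan(tan_theta)

theta = angle_between_regression_lines(0.5, 1.0, 2.0)

# Cross-check against the slopes m1 = r*sy/sx and m2 = sy/(r*sx).
m1, m2 = 0.5 * 2.0 / 1.0, 2.0 / (0.5 * 1.0)
print(round(math.tan(theta), 4), round((m2 - m1) / (1 + m1 * m2), 4))
print(angle_between_regression_lines(1.0, 1.0, 2.0))  # 0.0, the lines coincide
```

For r = 0.5 and σy = 2σx both expressions give tan θ = 0.6, which agrees with evaluating the formula by hand.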

Problem 7
In a partially destroyed laboratory record, only the lines of regression of y on
x and x on y are available as 4x – 5y + 33 = 0 and 20x – 9y = 107 respectively.
Calculate $\bar{x}$, $\bar{y}$ and the coefficient of correlation between x and y.
(VTU 2005)
Solution
Since the regression lines pass through $(\bar{x}, \bar{y})$, we have

$$4\bar{x} - 5\bar{y} + 33 = 0 \qquad (1)$$

$$20\bar{x} - 9\bar{y} = 107 \qquad (2)$$

Multiply Eqn. (1) by 5 and subtract from (2).
We get $16\bar{y} = 272 \Rightarrow \bar{y} = \dfrac{272}{16} = 17$
Put $\bar{y} = 17$ in (1): $4\bar{x} - 5(17) + 33 = 0$

$4\bar{x} - 85 + 33 = 0$

$4\bar{x} - 52 = 0 \Rightarrow \bar{x} = \dfrac{52}{4} = 13$
∴ $\bar{x} = 13$ and $\bar{y} = 17$
The line of regression of y on x can be written as $y = \dfrac{4}{5}x + \dfrac{33}{5}$.
The regression coefficient of y on x is $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{4}{5}$
The line of regression of x on y can be written as $x = \dfrac{9}{20}y + \dfrac{107}{20}$.
The regression coefficient of x on y is $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = \dfrac{9}{20}$

$$\therefore\; b_{yx}\times b_{xy} = r\,\frac{\sigma_y}{\sigma_x}\times r\,\frac{\sigma_x}{\sigma_y} = \frac{4}{5}\times\frac{9}{20} \;\Rightarrow\; r^2 = \frac{36}{100} = 0.36$$

∴ r = 0.6 [∵ the positive sign is taken as $b_{yx}$ and $b_{xy}$ are both positive]

Problem 8
The following table records the test scores made by salesmen on an
intelligence test and their weekly sales:
Salesman      1     2     3     4     5     6     7     8     9     10
Test score    40    70    50    60    80    50    90    40    60    60
Sales (000)   2.5   6.0   4.5   5.0   4.5   2.0   5.5   3.0   4.5   3.0
Calculate the regression line of sales on test scores and estimate the most
probable weekly sales volume if a salesman makes a score of 70.
Solution
Here dx is the deviation of x from the assumed mean 60 and dy is the deviation of y from the
assumed mean 4.5.

Test score x   Sales y   dx = x – 60   dy = y – 4.5   dx·dy   dx²     dy²
40             2.5       –20           –2             40      400     4
70             6.0       10            1.5            15      100     2.25
50             4.5       –10           0              0       100     0
60             5.0       0             0.5            0       0       0.25
80             4.5       20            0              0       400     0
50             2.0       –10           –2.5           25      100     6.25
90             5.5       30            1              30      900     1.00
40             3.0       –20           –1.5           30      400     2.25
60             4.5       0             0              0       0       0
60             3.0       0             –1.5           0       0       2.25
Total                    0             –4.5           140     2400    18.25

From the table we have Σdx = 0, Σdy = –4.5, Σdxdy = 140, Σdx² = 2400, Σdy² = 18.25

$\bar{x}$ = mean of x (test scores) = $60 + \dfrac{0}{10} = 60$

$\bar{y}$ = mean of y (sales) = $4.5 + \dfrac{(-4.5)}{10} = 4.05$

The regression line of sales (y) on scores (x) is given by $y - \bar{y} = r\left(\dfrac{\sigma_y}{\sigma_x}\right)(x - \bar{x})$, where, with X and Y denoting deviations from the true means,

$$r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum XY}{n\,\sigma_x\sigma_y}\times\frac{\sigma_y}{\sigma_x} = \frac{\sum XY}{n\,\sigma_x^2} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{n}}{\sum d_x^2 - \dfrac{\left(\sum d_x\right)^2}{n}}$$

$$= \frac{140 - \dfrac{0\times(-4.5)}{10}}{2400 - \dfrac{0^2}{10}} = \frac{140}{2400} = 0.0583 \approx 0.06$$

∴ the required regression line is
y – 4.05 = 0.06 (x – 60) or y = 0.06 x + 0.45
For x = 70, y = 0.06 × 70 + 0.45 = 4.65.
Thus the most probable weekly sales volume for a score of 70 is 4.65 (in thousands).
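The assumed-mean shortcut used in this solution can be sketched in code. The following minimal Python illustration takes the assumed means as parameters; the function name `slope_with_assumed_means` is hypothetical. It keeps the unrounded slope 140/2400, so its estimate for a score of 70 comes out near 4.63 rather than the value obtained with the slope rounded to 0.06.

```python
def slope_with_assumed_means(xs, ys, assumed_x, assumed_y):
    """Regression of y on x from deviations taken about assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    # The correction terms remove the effect of the assumed means not being the true means.
    numerator = sum(u * v for u, v in zip(dx, dy)) - sum(dx) * sum(dy) / n
    denominator = sum(u * u for u in dx) - sum(dx) ** 2 / n
    mean_x = assumed_x + sum(dx) / n
    mean_y = assumed_y + sum(dy) / n
    return numerator / denominator, mean_x, mean_y

scores = [40, 70, 50, 60, 80, 50, 90, 40, 60, 60]
sales = [2.5, 6.0, 4.5, 5.0, 4.5, 2.0, 5.5, 3.0, 4.5, 3.0]

b_yx, mean_x, mean_y = slope_with_assumed_means(scores, sales, 60, 4.5)
estimate = mean_y + b_yx * (70 - mean_x)
print(round(b_yx, 4), mean_x, mean_y, round(estimate, 2))
```

Any assumed means give the same slope, because the correction terms compensate exactly; convenient round values are chosen only to keep the hand arithmetic small.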

Problem 9
Find the correlation coefficient and the regression lines of y on x and x on y for
the following data.
x   1   2   3   4   5
y   2   5   3   8   7

(VTU 2010)

Solution
There are n = 5 items in each of the x and y series.
The means are $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{15}{5} = 3$, $\bar{y} = \dfrac{\sum y_i}{n} = \dfrac{25}{5} = 5$

x_i     X_i = x_i – x̄     X_i²     y_i     Y_i = y_i – ȳ     Y_i²     X_iY_i
1       –2                4        2       –3                9        6
2       –1                1        5       0                 0        0
3       0                 0        3       –2                4        0
4       1                 1        8       3                 9        3
5       2                 4        7       2                 4        4
Total   15                10       25                        26       13

From the table we have $\sum X_i^2 = 10$, $\sum Y_i^2 = 26$, $\sum X_i Y_i = 13$

The coefficient of correlation is given by

$$r = \frac{\sum X_i Y_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = \frac{13}{\sqrt{10 \times 26}} = \frac{13}{\sqrt{260}} = 0.8062$$

Also, $\sigma_x^2 = \dfrac{\sum X_i^2}{n} = \dfrac{10}{5} = 2$; $\sigma_x = \sqrt{2} = 1.4142$

$\sigma_y^2 = \dfrac{\sum Y_i^2}{n} = \dfrac{26}{5} = 5.2$; $\sigma_y = \sqrt{5.2} = 2.2806$

The regression coefficients are

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8062 \times \frac{2.2806}{1.4142} = 1.3001$$

$$\text{and}\quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.8062 \times \frac{1.4142}{2.2806} = 0.4999$$

The line of regression of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$:
y – 5 = 1.3001 (x – 3)

y = 1.3001 x + 5 – 3 (1.3001)

y = 1.3001 x + 1.0997

The line of regression of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$:
x – 3 = 0.4999 (y – 5)

x = 0.4999 y + 3 – 5 (0.4999)

x = 0.4999 y + 0.5005

Problem 10
The regression equations of two variables x and y are x = 0.7y + 5.2 and
y = 0.3x + 2.8. Find the means of the variables and the coefficient of correlation
between them.
Solution
The equation y = 0.3x + 2.8 represents the regression line of y on x ∴ $b_{yx}$ = 0.3
The equation x = 0.7y + 5.2 represents the regression line of x on y ∴ $b_{xy}$ = 0.7
∴ The correlation coefficient is $r = \sqrt{b_{xy} \times b_{yx}}$

$$r = \sqrt{0.7 \times 0.3} = \sqrt{0.21} = 0.4583$$

Since the lines of regression pass through the point $(\bar{x}, \bar{y})$, the given equations are
satisfied by $x = \bar{x}$ and $y = \bar{y}$. We have

$$\bar{y} = 0.3\,\bar{x} + 2.8 \qquad (1)$$

$$\bar{x} = 0.7\,\bar{y} + 5.2 \qquad (2)$$

Substituting (2) in (1), $\bar{y} = 0.3\,(0.7\,\bar{y} + 5.2) + 2.8 = 0.21\,\bar{y} + 4.36$, so that

$$-0.79\,\bar{y} + 4.36 = 0 \;\Rightarrow\; \bar{y} = \frac{4.36}{0.79} = 5.5190$$

$\bar{x} = 0.7\,(5.5190) + 5.2$

$\bar{x} = 9.0633$
Problem 11
Find the lines of regression and coefficient of correlation for the data given
below:
n = 18, ∑x = 12, ∑y = 18, ∑x2 = 60, ∑y2 = 96, ∑xy = 48.
Solution
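Means: $\bar{x} = \frac{\sum x_i}{n} = \frac{12}{18} = \frac{2}{3}$; $\bar{y} = \frac{\sum y_i}{n} = \frac{18}{18} = 1$

Variance:

$$\sigma_x^2 = \frac{\sum x_i^2}{n} - (\bar{x})^2 = \frac{60}{18} - \left(\frac{2}{3}\right)^2 = \frac{10}{3} - \frac{4}{9} = \frac{30 - 4}{9} = \frac{26}{9}$$

∴ $\sigma_x = 1.6997$

and

$$\sigma_y^2 = \frac{\sum y_i^2}{n} - (\bar{y})^2 = \frac{96}{18} - (1)^2 = \frac{96 - 18}{18} = \frac{78}{18}$$

∴ $\sigma_y = 2.0817$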
The coefficient of correlation is

$$r = \frac{1}{\sigma_x \sigma_y}\left(\frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y}\right)$$

$$r = \frac{1}{1.6997 \times 2.0817}\left(\frac{48}{18} - \frac{2}{3}(1)\right) = \frac{1}{1.6997 \times 2.0817}\left(\frac{8}{3} - \frac{2}{3}\right) = \frac{2}{1.6997 \times 2.0817}$$

r ≈ 0.5653

The regression coefficient of x on y is

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{2}{\sigma_y^2} = 2 \times \frac{18}{78} = \frac{6}{13} \approx 0.4615$$

and the regression coefficient of y on x is

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{2}{\sigma_x^2} = 2 \times \frac{9}{26} = \frac{9}{13} \approx 0.6923$$

The regression lines are

$y - \bar{y} = b_{yx}(x - \bar{x})$
$y - 1 = 0.6923\left(x - \frac{2}{3}\right)$
$y = 0.6923x - \frac{2 \times 0.6923}{3} + 1$
y = 0.6923x + 0.5385

and $x - \bar{x} = b_{xy}(y - \bar{y})$
$x - \frac{2}{3} = 0.4615(y - 1)$
$x = 0.4615y - 0.4615 + \frac{2}{3}$
x = 0.4615y + 0.2051
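
When only the sums n, ∑x, ∑y, ∑x², ∑y², ∑xy are given, the whole computation reduces to a few arithmetic steps, and a short script is a useful guard against slips such as mis-evaluating $\frac{48}{18} - \frac{2}{3}\cdot 1$ (which equals 2). The following Python sketch is illustrative only; the function name `regression_from_sums` is an assumption.

```python
import math

def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Means, r and regression coefficients from summary sums."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    var_x = sum_x2 / n - mean_x ** 2        # sigma_x squared
    var_y = sum_y2 / n - mean_y ** 2        # sigma_y squared
    cov_xy = sum_xy / n - mean_x * mean_y   # (sum of x*y)/n - mean_x*mean_y
    r = cov_xy / math.sqrt(var_x * var_y)
    b_yx = cov_xy / var_x                   # equals r * sigma_y / sigma_x
    b_xy = cov_xy / var_y                   # equals r * sigma_x / sigma_y
    return mean_x, mean_y, r, b_yx, b_xy

mean_x, mean_y, r, b_yx, b_xy = regression_from_sums(18, 12, 18, 60, 96, 48)
print(f"r = {r:.4f}, b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"y on x: y = {b_yx:.4f} x + {mean_y - b_yx * mean_x:.4f}")
print(f"x on y: x = {b_xy:.4f} y + {mean_x - b_xy * mean_y:.4f}")
```

A quick consistency check on any such result is that $b_{yx} \times b_{xy}$ must equal $r^2$.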

Problem 12
If the coefficient of correlation between two variables x and y is 0.5 and the acute
angle between their lines of regression is $\tan^{-1}\left(\frac{3}{5}\right)$, show that $\sigma_x = \frac{1}{2}\sigma_y$.
(VTU 2004)
Solution
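Given r = 0.5, $\theta = \tan^{-1}\left(\frac{3}{5}\right)$

We have

$$\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$

$$\frac{3}{5} = \frac{1 - (0.5)^2}{0.5} \times \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$

$$\frac{3}{5} = \frac{1 - 0.25}{0.5} \times \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$

$$\frac{3}{5} = \frac{3}{2} \times \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$

$2(\sigma_x^2 + \sigma_y^2) = 5\sigma_x\sigma_y$,
$2\sigma_x^2 - 5\sigma_x\sigma_y + 2\sigma_y^2 = 0$,
$2\sigma_x^2 - 4\sigma_x\sigma_y - \sigma_x\sigma_y + 2\sigma_y^2 = 0$,
$2\sigma_x(\sigma_x - 2\sigma_y) - \sigma_y(\sigma_x - 2\sigma_y) = 0$
$(\sigma_x - 2\sigma_y)(2\sigma_x - \sigma_y) = 0$
$\sigma_x - 2\sigma_y = 0$ or $2\sigma_x - \sigma_y = 0$
The second factor gives the required result, $\sigma_x = \frac{1}{2}\sigma_y$.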
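
As a numerical check of the angle formula, the short Python sketch below evaluates tan θ for given r, σx and σy. It is an illustration, not part of the original text: the function name `tan_acute_angle` is an assumption, and |r| is used in the denominator so that the result corresponds to the acute angle when r is negative.

```python
def tan_acute_angle(r, sigma_x, sigma_y):
    """tan(theta) = ((1 - r^2) / |r|) * sigma_x*sigma_y / (sigma_x^2 + sigma_y^2)."""
    if r == 0:
        raise ValueError("r = 0: the regression lines are perpendicular")
    return (1 - r * r) / abs(r) * (sigma_x * sigma_y) / (sigma_x ** 2 + sigma_y ** 2)

# With r = 0.5, both roots of the quadratic reproduce tan(theta) = 3/5,
# because the formula is symmetric in sigma_x and sigma_y.
print(tan_acute_angle(0.5, sigma_x=1.0, sigma_y=2.0))
print(tan_acute_angle(0.5, sigma_x=2.0, sigma_y=1.0))
```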
Problem 13
If the tangent of the angle θ between the lines of regression of y on x and x on y is
0.6 and the standard deviation of y is twice the standard deviation of x, find the
coefficient of correlation between x and y.
Solution
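It is given that tan θ = 0.6 and $\sigma_y = 2\sigma_x$.

∴ $\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$ gives,

$$0.6 = \frac{1 - r^2}{r} \cdot \frac{2\sigma_x^2}{\sigma_x^2 + 4\sigma_x^2} \qquad [\because \sigma_y = 2\sigma_x]$$

$$0.6 = \frac{1 - r^2}{r} \cdot \frac{2\sigma_x^2}{5\sigma_x^2}$$

$3r = 2 - 2r^2 \Rightarrow 2r^2 + 3r - 2 = 0$
(2r – 1)(r + 2) = 0. This gives the correlation coefficient as $r = \frac{1}{2}$ [since r ≠ –2, as |r| ≤ 1].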
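
The same rearrangement works for any ratio k = σy/σx: the angle formula becomes the quadratic $k r^2 + \tan\theta\,(1 + k^2)\, r - k = 0$, and only roots with |r| ≤ 1 are admissible. A minimal Python sketch under that assumption follows; the function name `correlation_from_angle` is illustrative.

```python
import math

def correlation_from_angle(tan_theta, k):
    """Admissible roots r of k*r^2 + tan_theta*(1 + k^2)*r - k = 0, with k = sigma_y/sigma_x > 0."""
    a = k
    b = tan_theta * (1 + k * k)
    c = -k
    disc = math.sqrt(b * b - 4 * a * c)  # always real, since a*c < 0
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    # A correlation coefficient must lie in [-1, 1].
    return [r for r in roots if -1 <= r <= 1]

print(correlation_from_angle(0.6, 2))
```

For tan θ = 0.6 and k = 2 the quadratic is $2r^2 + 3r - 2 = 0$, whose roots are 1/2 and –2, so only the first survives the filter.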
Problem 14
Two variables x and y are connected by the relation ax + by + c = 0. Show that the
correlation coefficient between them is –1 if the signs of a and b are the same
and +1 if they are different.

Solution
The given relation between x and y can be written as $y = -\frac{a}{b}x - \frac{c}{b}$, $x = -\frac{b}{a}y - \frac{c}{a}$.
These equations represent the regression lines.
∴ $b_{yx} = -\frac{a}{b}$ and $b_{xy} = -\frac{b}{a}$

$r^2 = b_{yx}\, b_{xy} = 1 \Rightarrow r = \pm 1$

Suppose a and b are of the same sign. Then $b_{xy}$ and $b_{yx}$ are both negative and hence r
is negative. In this case r = –1. If a and b are of opposite signs, then both $b_{yx}$ and $b_{xy}$ are
positive and consequently r = +1.
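
This result can be confirmed numerically by generating points that lie exactly on ax + by + c = 0 and computing their correlation coefficient. The Python sketch below is illustrative; the helper names `pearson_r` and `r_on_line` and the sample x values are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def r_on_line(a, b, c, xs=(0, 1, 2, 3, 4)):
    """Correlation of points lying exactly on a*x + b*y + c = 0 (a and b nonzero)."""
    ys = [-(a * x + c) / b for x in xs]
    return pearson_r(list(xs), ys)

print(r_on_line(2, 3, 1))    # a and b have the same sign
print(r_on_line(2, -3, 1))   # a and b have opposite signs
```

Up to floating-point rounding, the first call gives –1 and the second +1.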
Problem 15
The following information is available in respect of the price of a certain consumer
item in two cities, A and B. Average price in city A is Rs. 65, average price in
city B is Rs. 67; standard deviation in city A is 2.5; standard deviation in city B
is 3.5. The coefficient of correlation between the prices in the two cities is 0.8.
Find the most likely price in city B corresponding to the price of Rs. 70 in city A.
Solution
Let x denote the price in city A and y denote the price in city B. Given that
$\bar{x} = 65$, $\bar{y} = 67$, $\sigma_x = 2.5$, $\sigma_y = 3.5$, r = 0.8. These give $b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8 \times \frac{3.5}{2.5} = 1.12$.
Therefore, the equation of the line of regression of y on x is y – 67 = (1.12)(x – 65)
or y = (1.12)x – 5.8
When x = 70, y = 1.12(70) – 5.8 = 78.4 – 5.8 = 72.6
Thus the most likely price in city B corresponding to the price of Rs. 70 in city A is
Rs. 72.6.
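
The prediction step uses only the means, the standard deviations and r. A minimal Python sketch is given below; the function name `predict_y` and its argument names are illustrative assumptions.

```python
def predict_y(x, mean_x, mean_y, sigma_x, sigma_y, r):
    """Most likely y for a given x, from the line of regression of y on x."""
    b_yx = r * sigma_y / sigma_x          # regression coefficient of y on x
    return mean_y + b_yx * (x - mean_x)   # y - mean_y = b_yx * (x - mean_x)

price_b = predict_y(70, mean_x=65, mean_y=67, sigma_x=2.5, sigma_y=3.5, r=0.8)
print(f"{price_b:.1f}")  # prints 72.6
```

To estimate the price in city A from a price in city B, the line of x on y (with $b_{xy} = r\,\sigma_x/\sigma_y$) would be needed instead; the line of y on x should not simply be inverted.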
Problem 16
For two random variables x and y with the same mean, the two regression lines
are y = ax + b and x = αy + β. Show that $\frac{b}{\beta} = \frac{1 - a}{1 - \alpha}$. Find also the common mean.
Solution
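The regression lines of y on x and x on y are
y = ax + b (1)
and x = αy + β (2)
Since the regression lines pass through $(\bar{x}, \bar{y})$, the equations are satisfied for $x = \bar{x}$ and $y = \bar{y}$.
∴ We have $\bar{y} = a\bar{x} + b$ (3)
and $\bar{x} = \alpha\bar{y} + \beta$ (4)
Substituting Eqn. (4) into Eqn. (3) to eliminate $\bar{x}$:

$$(1 - a\alpha)\,\bar{y} = b + a\beta \Rightarrow \bar{y} = \frac{b + a\beta}{1 - a\alpha} \qquad (5)$$

Eqn. (4) gives $\bar{x} = \alpha\left(\frac{b + a\beta}{1 - a\alpha}\right) + \beta$

$$\bar{x} = \frac{\beta + b\alpha}{1 - a\alpha} \qquad (6)$$

It is given that the means of x and y are the same,

i.e., $\bar{x} = \bar{y} \Rightarrow \frac{\beta + b\alpha}{1 - a\alpha} = \frac{b + a\beta}{1 - a\alpha}$
⇒ β + bα = b + aβ
b – bα = β – aβ
b(1 – α) = β(1 – a)
⇒ $\frac{b}{\beta} = \frac{1 - a}{1 - \alpha}$
For the common mean m, put $\bar{x} = \bar{y} = m$ in Eqn. (3): m = am + b, so $m = \frac{b}{1 - a}$, which by the result just proved also equals $\frac{\beta}{1 - \alpha}$.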
Exercise
1. Obtain the lines of regression and hence find the coefficient of correlation for
the following data
x 1 2 3 4 5 6 7
y 9 8 10 12 11 13 14
Ans.: y = 0.93x + 7.28, x = 0.93y – 6.23, r = 0.93
2. For the following data, find (i) the regression equations, (ii) the coefficient of
correlation, and (iii) most likely value of y when x = 30.
x 25 28 35 32 31 36 29 38 34 32
y 43 46 49 41 36 32 31 30 33 39
Ans.: x = –0.234y + 40.892, y = –0.664x + 59.248, r = –0.39 and y ≈ 39.
3. The equations of regression lines of two variables x and y are 3x + 2y = 26 and
6x + y = 31. Find the mean values of x and y and the correlation coefficient.
Ans.: $\bar{x} = 4$, $\bar{y} = 7$, r = –0.5
4. In a partially destroyed laboratory record, only the following regression equations
were available: 7x – 16y + 9 = 0, 5y – 4x – 3 = 0. Find the means of x and y and
the coefficient of correlation between x and y.
Ans.: $\bar{x} = -0.1034$, $\bar{y} = 0.5172$, r = 0.7395

