Correlation and Regression
deenanathlamichhane@gmail.com
Types of correlation
Examples:

X :  1    2    3    4    5
Y : 100   50  33⅓   25   20

X : -2   -1    0    1    2
Y :  4    1    0    1    4
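The second example (Y = X², a perfect but nonlinear relationship) is a classic case where Pearson's r comes out as zero even though the variables are exactly related. A minimal sketch in Python (the helper name `pearson_r` is ours), computing r from the raw-sum formula:

```python
# Pearson's r via the computational formula:
# r = [n*Σxy − (Σx)(Σy)] / (sqrt(n*Σx² − (Σx)²) * sqrt(n*Σy² − (Σy)²))
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    return num / den

# Y = X^2: perfectly related, yet linearly uncorrelated
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
print(pearson_r(x, y))  # 0.0
```

Every positive deviation product is cancelled by a negative one, so the covariance (and hence r) is exactly zero.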
The correlation between two linearly associated variables (say X and Y) is called
simple linear correlation.
r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} \quad \ldots (i)

Where,

\mathrm{Var}(X) = \text{Variance of } X = \sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{1}{n^2}\left[n \sum x^2 - \left(\sum x\right)^2\right]

\mathrm{Var}(Y) = \text{Variance of } Y = \sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{1}{n^2}\left[n \sum y^2 - \left(\sum y\right)^2\right]

r = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y} \quad \ldots (ii)

where \sigma_{xy} = \mathrm{Cov}(X, Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{\sum xy}{n} - \bar{x}\,\bar{y} \quad \ldots (iii)

Or,

r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}} \quad \ldots (iv)
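As a sanity check, the definitional form (i) and the computational form (iv) should give the same value. A minimal Python sketch; the x, y values here are purely illustrative:

```python
# Verifying that form (i) (covariance over standard deviations) and
# form (iv) (raw-sum computational formula) of r agree numerically.
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# form (i): covariance over the product of standard deviations
cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n
var_x = sum((a - xbar) ** 2 for a in x) / n
var_y = sum((b - ybar) ** 2 for b in y) / n
r1 = cov / (sqrt(var_x) * sqrt(var_y))

# form (iv): computational formula using raw sums only
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)
r2 = (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

print(round(r1, 4), round(r2, 4))  # 0.7746 0.7746
```

Form (iv) is preferred for hand computation because it avoids computing the means and deviations separately.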
Note that −1 ≤ r ≤ +1, and r tells us the degree and direction of association between the two variables X and Y, as described below:
Probable error of the correlation coefficient:

PE(r) = 0.6745 \, \frac{(1 - r^2)}{\sqrt{n}}
Interpretations:
PE(r) = 0.6745 \, \frac{(1 - r^2)}{\sqrt{n}} = 0.6745 \, \frac{(1 - 0.8^2)}{\sqrt{10}} = 0.0768, \text{ which is not greater than } |r|
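The arithmetic in the example can be checked directly; a short sketch using the r = 0.8, n = 10 values above (the helper name `probable_error` is ours):

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of r: PE(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

pe = probable_error(0.8, 10)
print(round(pe, 4))   # 0.0768
print(pe < abs(0.8))  # True: PE(r) is not greater than |r|
```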
Spearman's rank correlation coefficient:

r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \quad \ldots (i)
Where, 𝑑 = 𝑅1 − 𝑅2
R1= rank of values of first variable (say 'X')
R2= rank of values of second variable (say 'Y')
n = number of pairs of ranks.
Rank correlation coefficient 𝑟𝑠 is interpreted in the same way as 𝑟𝑥𝑦 and −1 ≤ 𝑟𝑠 ≤
+1.
Ranking and adjustment for tied (repeated) ranks: The highest value is ranked '1', the second highest value is ranked '2', and so on. When two or more values are the same (repeated), they are each given the same rank, namely the average of the ranks they would have received had they been different. In this case, an adjustment for Σd² is required: for each group of m tied values, m(m² − 1)/12 is added to Σd².
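The ranking scheme and the standard tie correction (adding m(m² − 1)/12 to Σd² per group of m tied values) can be sketched as follows; the data and helper names are illustrative:

```python
# Spearman's rank correlation with average ranks for ties.

def ranks(values):
    """Rank values so the highest gets rank 1; tied values share the
    average of the ranks they would otherwise occupy."""
    order = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        positions = [i + 1 for i, o in enumerate(order) if o == v]
        rank_of[v] = sum(positions) / len(positions)
    return [rank_of[v] for v in values]

def spearman(x, y):
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # adjustment for ties: add m(m^2 - 1)/12 for each group of m equal values
    for vals in (x, y):
        for v in set(vals):
            m = vals.count(v)
            if m > 1:
                d2 += m * (m * m - 1) / 12
    return 1 - 6 * d2 / (n * (n * n - 1))

# illustrative marks by two judges; 85 appears twice in x,
# so both copies get rank (1 + 2)/2 = 1.5
x = [85, 74, 85, 50, 65]
y = [78, 60, 72, 40, 50]
print(spearman(x, y))  # 0.95
```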
Regression Analysis
It is a statistical technique for estimating the value of one (dependent) variable when the value of another (independent) variable is known. There are always two lines/equations of regression: the line of X on Y and the line of Y on X. These two regression equations are irreversible because the assumptions underlying their construction differ, i.e. they are obtained by minimising the errors in X and by minimising the errors in Y respectively (using the method of least squares).
y = a + b_{yx} \, x \quad \ldots (i)
Where, y = dependent variable
x = independent variable
byx = regression coefficient of Y on X.
a = Value of Y when X = 0.
By using given data, we compute constants of regression equation (i) as follows:
b_{yx} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}

a = \frac{\sum y}{n} - b_{yx} \, \frac{\sum x}{n} = \bar{y} - b_{yx} \, \bar{x}
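The two formulas above can be applied directly. A small sketch with made-up data (the x, y values and the helper name `fit_y_on_x` are purely illustrative):

```python
def fit_y_on_x(x, y):
    """Constants of the regression equation y = a + b_yx * x,
    computed from the raw-sum formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b_yx * sx / n
    return a, b_yx

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b_yx = fit_y_on_x(x, y)
print(round(a, 4), round(b_yx, 4))  # 2.2 0.6

# estimate y for a new x = 6
print(round(a + b_yx * 6, 4))  # 5.8
```

The last line is the point of regression analysis: once a and b_yx are known, y can be estimated for any given x.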
x = a + b_{xy} \, y \quad \ldots (ii)

Where, x = dependent variable
y = independent variable
b_{xy} = regression coefficient of X on Y.
a = Value of X when Y = 0.

By using the given data, we compute the constants of regression equation (ii) as follows:

b_{xy} = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum y^2 - (\sum y)^2}

a = \frac{\sum x}{n} - b_{xy} \, \frac{\sum y}{n} = \bar{x} - b_{xy} \, \bar{y}
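Both coefficient formulas can be computed on the same dataset. A minimal sketch with made-up values, which also illustrates the standard identity that r is the geometric mean of the two regression coefficients:

```python
from math import sqrt

# illustrative (made-up) data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # coefficient of Y on X
b_xy = (n * sxy - sx * sy) / (n * syy - sy ** 2)  # coefficient of X on Y
r = (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

print(round(b_yx, 4), round(b_xy, 4))            # 0.6 1.0
# r equals the geometric mean of the two coefficients
print(round(r, 4), round(sqrt(b_yx * b_xy), 4))  # 0.7746 0.7746
```

Both numerators are the same quantity n Σxy − (Σx)(Σy), which is why the two coefficients always share the sign of r.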
y - \bar{y} = b_{yx}(x - \bar{x}) \quad \ldots (iii)

The regression coefficients can also be written in terms of r and the standard deviations:

b_{xy} = r \, \frac{\sigma_x}{\sigma_y} \qquad b_{yx} = r \, \frac{\sigma_y}{\sigma_x}
Note that
i. Two lines of regression intersect at (\bar{x}, \bar{y}).
ii. Both regression coefficients and correlation coefficient have same sign.
iii. Correlation coefficient is the geometric mean between two regression
coefficients. i.e.
r_{xy} = \sqrt{b_{xy} \cdot b_{yx}}
iv. Regression coefficients are independent of change of origin but not of scale:
Let,
A = assumed mean of X-series