
Correlation and Regression


Correlation is a statistical device used to measure the degree and direction of
association between two (or more) variables.

Types of correlation

1. Positive / Negative correlation.
2. Linear / Non-linear correlation.
3. Simple / Partial / Multiple correlation.

Examples:

1. Let X = number of copies and Y = price; then,

   X : 1    2    3    4    5
   Y : 50   100  150  200  250

There is positive correlation between X and Y.


2. Let X = number of soldiers and Y = number of days for which the food lasts for them; then,

   X : 100   200   300    400   500
   Y : 100   50    33.3   25    20

There is negative correlation between X and Y.


3. Let X and Y be related as y = x²; then,

   X : -2   -1   0   1   2
   Y :  4    1   0   1   4

There is non-linear correlation between X and Y (Pearson's r works out to 0 here, even though Y is completely determined by X).


The correlation between two linearly associated variables (say X and Y) is called
simple linear correlation.

Methods of studying correlation

A. Karl Pearson's coefficient of correlation between two variables X and Y is denoted by r_xy (or simply r) and is defined by

$$ r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} \quad \ldots (i) $$

Where,

$$ \mathrm{Var}(X) = \sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2 = \frac{1}{n^2}\left[n \sum x^2 - \left(\sum x\right)^2\right] $$

$$ \mathrm{Var}(Y) = \sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2 = \frac{1}{n^2}\left[n \sum y^2 - \left(\sum y\right)^2\right] $$

And Cov(X, Y), the joint variation between X and Y, is given by

$$ \mathrm{Cov}(X, Y) = \sigma_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{\sum xy}{n} - \left(\frac{\sum x}{n}\right)\left(\frac{\sum y}{n}\right) = \frac{1}{n^2}\left[n \sum xy - \left(\sum x\right)\left(\sum y\right)\right] $$

Putting these values in equation (i), we get

$$ r = \frac{\sigma_{xy}}{\sigma_x\,\sigma_y} \quad \ldots (ii) $$

$$ \text{or,} \quad r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \quad \ldots (iii) $$

$$ \text{or,} \quad r = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n \sum x^2 - \left(\sum x\right)^2}\,\sqrt{n \sum y^2 - \left(\sum y\right)^2}} \quad \ldots (iv) $$


Similarly, for a bivariate frequency distribution,

$$ r = \frac{N \sum fxy - \left(\sum fx\right)\left(\sum fy\right)}{\sqrt{N \sum fx^2 - \left(\sum fx\right)^2}\,\sqrt{N \sum fy^2 - \left(\sum fy\right)^2}} \quad \ldots (v) $$
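As a quick check of formula (iv), here is a minimal Python sketch (pearson_r is an illustrative helper, not part of the original notes) applied to Examples 1 and 3 above:

```python
from math import sqrt

def pearson_r(x, y):
    # Karl Pearson's r by the computational formula (iv):
    # r = [n*Sxy - Sx*Sy] / [sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2)]
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

# Example 1: Y rises in step with X, so r = +1 (perfect positive correlation)
print(pearson_r([1, 2, 3, 4, 5], [50, 100, 150, 200, 250]))   # 1.0

# Example 3: y = x**2, so r = 0 even though Y is completely determined by X
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))          # 0.0
```

The second call illustrates that r measures only the degree of linear association.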

Effect of change of origin and/or scale on the correlation coefficient (r_xy):

Let, A = assumed mean of the X-series,
B = assumed mean of the Y-series,
h = non-zero scale factor in the X-series,
k = non-zero scale factor in the Y-series.

Define $u = \frac{x - A}{h}$ and $v = \frac{y - B}{k}$; then

$$ r_{xy} = r_{uv} = \frac{N \sum fuv - \left(\sum fu\right)\left(\sum fv\right)}{\sqrt{N \sum fu^2 - \left(\sum fu\right)^2}\,\sqrt{N \sum fv^2 - \left(\sum fv\right)^2}} \quad \ldots (vi) $$

i.e. the correlation coefficient is independent of change of origin and/or scale.
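This invariance is easy to check numerically; the sketch below uses numpy.corrcoef for convenience, with arbitrary illustrative choices A = 3, h = 1, B = 150, k = 50 and made-up data:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([55, 95, 160, 195, 255], dtype=float)

# change of origin and scale: u = (x - A)/h, v = (y - B)/k
A, h = 3.0, 1.0
B, k = 150.0, 50.0
u = (x - A) / h
v = (y - B) / k

r_xy = np.corrcoef(x, y)[0, 1]
r_uv = np.corrcoef(u, v)[0, 1]
print(r_xy, r_uv)              # the two values coincide
assert np.isclose(r_xy, r_uv)  # r is unchanged by the transformation
```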

Note that −1 ≤ r ≤ +1, and r tells us about the degree and direction of association between two variables X and Y, as described below:

1. If r = ±1, there is perfect positive/negative correlation.
2. If r is close to ±1, there is a high degree of positive/negative correlation.
3. If r is close to 0, there is a low degree of positive/negative correlation.
4. If r = 0, there is no correlation between X and Y.

Probable error of the correlation coefficient:

$$ PE(r) = 0.6745\,\frac{(1 - r^2)}{\sqrt{n}} $$

Interpretations:

1. If PE(r) > |r|, the correlation is insignificant.
2. If 6 PE(r) < |r|, the correlation is significant and the limits of the population correlation coefficient are r ± PE(r).


3. In other cases, nothing can be said.

Example: The correlation coefficient between 10 pairs of observations of demand and supply was found to be 0.8. Test the significance of the result.

Solution: To test the significance of the calculated correlation coefficient, we compute the probable error:

$$ PE(r) = 0.6745\,\frac{(1 - r^2)}{\sqrt{n}} = 0.6745\,\frac{(1 - 0.8^2)}{\sqrt{10}} = 0.0768, \quad \text{which is not greater than } |r| $$

$$ \therefore \; 6\,PE(r) = 6 \times 0.0768 = 0.4607 < |r| = 0.8 $$

So the correlation is significant, and the limits of the population correlation coefficient are

$$ r \pm PE(r) = 0.8 \pm 0.0768 = [0.8 - 0.0768,\; 0.8 + 0.0768] = [0.7232,\; 0.8768] $$
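The same test can be written as a short Python sketch (probable_error is an illustrative helper name):

```python
from math import sqrt

def probable_error(r, n):
    # PE(r) = 0.6745 * (1 - r^2) / sqrt(n)
    return 0.6745 * (1 - r ** 2) / sqrt(n)

r, n = 0.8, 10
pe = probable_error(r, n)                            # ~0.0768
if 6 * pe < abs(r):
    print("significant; limits:", r - pe, r + pe)    # ~0.7232 and ~0.8768
elif pe > abs(r):
    print("insignificant")
else:
    print("nothing can be said")
```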

B. Spearman's coefficient of rank correlation is denoted by r_s (or simply r) and is given by

$$ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \quad \ldots (i) $$

Where, d = R1 − R2,
R1 = rank of the values of the first variable (say X),
R2 = rank of the values of the second variable (say Y),
n = number of pairs of ranks.

The rank correlation coefficient r_s is interpreted in the same way as r_xy, and −1 ≤ r_s ≤ +1.
Ranking and adjustment for tied (repeated) ranks: The highest value is ranked '1', the second highest is ranked '2', and so on. When two or more values are equal, they are all given the same rank, namely the average of the ranks they would receive if they were different. In this case, an adjustment to Σd² is made in the formula for Spearman's rank correlation coefficient, as given below:

$$ r_s = 1 - \frac{6\left[\sum d^2 + \frac{m_1(m_1^2 - 1)}{12} + \frac{m_2(m_2^2 - 1)}{12} + \frac{m_3(m_3^2 - 1)}{12} + \cdots\right]}{n(n^2 - 1)} \quad \ldots (ii) $$

Where, m_1 = number of values sharing the first tied rank,
m_2 = number of values sharing the second tied rank,
m_3 = number of values sharing the third tied rank, and so on.
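Below is a minimal Python sketch of this ranking rule and of the tie-corrected formula (ii); the helpers average_ranks and spearman_rho are illustrative names, not part of the original notes:

```python
def average_ranks(values):
    # Rank '1' for the highest value; tied values share the average of the
    # ranks they would occupy if they were (slightly) different.
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # tie adjustment: add m*(m^2 - 1)/12 for every group of m equal values
    # appearing in either series
    cf = sum(m * (m * m - 1) / 12
             for vals in (x, y)
             for m in (vals.count(v) for v in set(vals)) if m > 1)
    return 1 - 6 * (d2 + cf) / (n * (n * n - 1))

# X has no ties; Y has the value 2 repeated twice
print(spearman_rho([1, 2, 3, 4], [2, 2, 3, 4]))   # 0.9
```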

Regression Analysis
It is a statistical technique for estimating the value of one (dependent) variable when the value of another (independent) variable is known. There are always two lines/equations of regression: the line of X on Y and the line of Y on X. These two regression equations are not interchangeable, because they are constructed under different assumptions: they are obtained by minimising the errors in X and by minimising the errors in Y, respectively (using the method of least squares).

A. Line/equation of Y on X: It is used to estimate the value of Y when the value of X is given/known. The equation of Y on X is written as

$$ y = a + b_{yx}\,x \quad \ldots (i) $$

Where, y = dependent variable,
x = independent variable,
b_yx = regression coefficient of Y on X,
a = value of Y when X = 0.

Using the given data, we compute the constants of regression equation (i) as follows:

$$ b_{yx} = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{n \sum x^2 - \left(\sum x\right)^2} $$

$$ a = \frac{\sum y}{n} - b_{yx}\,\frac{\sum x}{n} $$

B. Line/equation of X on Y: It is used to estimate the value of X when the value of Y is given/known. The equation of X on Y is written as

$$ x = a + b_{xy}\,y \quad \ldots (ii) $$

Where, x = dependent variable,
y = independent variable,
b_xy = regression coefficient of X on Y,
a = value of X when Y = 0.

Using the given data, we compute the constants of regression equation (ii) as follows:

$$ b_{xy} = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{n \sum y^2 - \left(\sum y\right)^2} $$

$$ a = \frac{\sum x}{n} - b_{xy}\,\frac{\sum y}{n} $$

DEVIATION FROM MEAN METHOD

A. Line/equation of Y on X: It is used to estimate the value of Y when the value of X is given/known. The equation of Y on X is written as

$$ y - \bar{y} = b_{yx}\,(x - \bar{x}) \quad \ldots (i) $$

B. Line/equation of X on Y: It is used to estimate the value of X when the value of Y is given/known. The equation of X on Y is written as

$$ x - \bar{x} = b_{xy}\,(y - \bar{y}) \quad \ldots (ii) $$


Also,

$$ b_{xy} = r\,\frac{\sigma_x}{\sigma_y}, \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x} $$

Note that

i. The two lines of regression intersect at (x̄, ȳ).
ii. Both regression coefficients and the correlation coefficient have the same sign.
iii. The correlation coefficient is the geometric mean of the two regression coefficients, i.e.

$$ r_{xy} = \sqrt{b_{xy} \cdot b_{yx}} $$

iv. Regression coefficients are independent of change of origin but not of scale:

Let, A = assumed mean of the X-series,
B = assumed mean of the Y-series,
h = non-zero scale factor in the X-series,
k = non-zero scale factor in the Y-series.

Define $u = \frac{x - A}{h}$ and $v = \frac{y - B}{k}$; then

$$ b_{xy} = b_{uv} \times \frac{h}{k} = \frac{n \sum uv - \left(\sum u\right)\left(\sum v\right)}{n \sum v^2 - \left(\sum v\right)^2} \times \frac{h}{k} $$

$$ b_{yx} = b_{vu} \times \frac{k}{h} = \frac{n \sum uv - \left(\sum u\right)\left(\sum v\right)}{n \sum u^2 - \left(\sum u\right)^2} \times \frac{k}{h} $$

v. Coefficient of determination: The square of the correlation coefficient, r², is called the coefficient of determination. If r = 0.9 then r² = 0.81, meaning that 81% of the total variation in the dependent variable is explained by the independent variable, and the remaining 19% is unexplained (see the sketch below for a numerical check of these notes).
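Finally, a short numerical check of notes (i), (iii) and (v), again with illustrative data; numpy is used for convenience, and np.std (with its default ddof = 0) matches the σ defined in these notes:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

r = np.corrcoef(x, y)[0, 1]
b_yx = r * y.std() / x.std()       # regression coefficient of Y on X
b_xy = r * x.std() / y.std()       # regression coefficient of X on Y

# (iii) r is the geometric mean of the two regression coefficients
assert np.isclose(abs(r), np.sqrt(b_xy * b_yx))

# (i) the two regression lines intersect at (x-bar, y-bar): solve
#   -b_yx * x + y = y_bar - b_yx * x_bar     (line of Y on X)
#    x - b_xy * y = x_bar - b_xy * y_bar     (line of X on Y)
A_mat = np.array([[-b_yx, 1.0], [1.0, -b_xy]])
rhs = np.array([y.mean() - b_yx * x.mean(), x.mean() - b_xy * y.mean()])
xi, yi = np.linalg.solve(A_mat, rhs)
assert np.isclose(xi, x.mean()) and np.isclose(yi, y.mean())

# (v) coefficient of determination: share of variation in Y explained by X
print("r =", round(r, 4), " r^2 =", round(r ** 2, 4))
```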
