
Correlation

Correlation refers to the relationship between two variables, indicating how the average value of one changes with the other. It can be positive, negative, or absent, but correlation does not imply causation: just because two variables are correlated does not mean one causes the other. Various methods, such as scatter plots, Pearson's coefficient and Spearman's rank correlation, are used to detect correlation and measure its degree; the correlation coefficient ranges from -1 to +1.

Uploaded by mihaelk99hl
Copyright © All Rights Reserved
UNIT-II

CORRELATION
Correlation refers to the association between variables. When an association exists between two variables, the average value of one variable changes as the value of the other variable changes. A correlation is the simplest type of association. When a correlation is weak, the average value of one variable changes only slightly in response to changes in the other variable. If there is no association, the value of one variable does not change in response to changes in the other variable. A correlation may be positive or negative. A positive correlation means that as one variable increases, the other variable also increases, e.g. the height of a child and the age of the child. A negative correlation means that as one variable increases, the other variable decreases, e.g. the value of a car and the age of the car.
CORRELATION AND CAUSATION
The correlation between two variables measures the strength of the relationship between them, but it does not indicate a cause-and-effect relationship between the variables. Correlation measures co-variation, not causation. Causation means that changes in one variable cause changes in the other variable. In other words, just because two events or things occur together does not imply that one is the cause of the other. A positive linear correlation between two variables, say X and Y, implies that high values of X are associated with high values of Y, and that low values of X are associated with low values of Y. It does not imply that X causes Y. For example, a high degree of positive correlation may be obtained between the arm length of children and their reasoning ability, i.e. children with longer arms reason better than those with shorter arms, but there is no causal connection here. Children with longer arms reason better because they are older! In this example the common third factor, age, is responsible for the high correlation between arm length and reasoning ability. This is an example of spurious correlation.
Similarly, a researcher found a high degree of positive correlation between the number of temple goers and the number of burglaries committed in different towns. Explaining this by saying that more temple goers means more empty houses, or that attending temple makes people want to rob, would be a logical fallacy. Instead, a third factor, population, is causing this relationship: highly populated areas tend to have both more temple goers and more burglaries.
Further, a high positive correlation has been found between the quantity of oranges imported and road accidents, i.e. as the quantity of imported oranges increases, so do traffic fatalities. However, logic alone suggests that there is unlikely to be any causal relationship between the two: importing oranges does not cause traffic fatalities. Conversely, if we stopped importing oranges, we would not expect the number of traffic fatalities to decline. The high correlation between them may be sheer coincidence.
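The arm-length example can be made concrete with a small simulation. This is a sketch under loudly stated assumptions: the data are entirely hypothetical (generated with Python's `random` module, seed 42), and "holding age constant" is done by subtracting each age group's mean before correlating, which is one simple way to remove the third factor rather than a method taken from this unit. The helper names `pearson_r` and `within_group_deviations` are illustrative.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def within_group_deviations(values, groups):
    """Subtract each group's own mean, i.e. compare children of the same age only."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

rng = random.Random(42)
# Hypothetical children aged 5 to 14: age drives BOTH variables, neither drives the other.
ages = [rng.randint(5, 14) for _ in range(1000)]
arm_length = [3.0 * a + rng.gauss(0, 1) for a in ages]  # grows with age, plus noise
reasoning = [2.0 * a + rng.gauss(0, 1) for a in ages]   # also grows with age, independent noise

overall_r = pearson_r(arm_length, reasoning)
same_age_r = pearson_r(within_group_deviations(arm_length, ages),
                       within_group_deviations(reasoning, ages))
print(f"r ignoring age:      {overall_r:.2f}")
print(f"r holding age fixed: {same_age_r:.2f}")
```

The first figure comes out high while the second is close to zero: once the common factor is held fixed, the apparent link disappears.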
DEGREES OF CORRELATION
Through the coefficient of correlation, we can measure the degree or extent of the correlation
between two variables. On the basis of the coefficient of correlation we can also determine
whether the correlation is positive or negative and also its degree or extent.
1. Perfect correlation: If two variables change in the same direction and in the same proportion, the correlation between the two is perfect positive. According to Karl Pearson, the coefficient of correlation in this case is +1. On the other hand, if the variables change in opposite directions and in the same proportion, the correlation is perfect negative, and its coefficient of correlation is −1. In practice we rarely come across these types of correlation.
2. Absence of correlation: If two variables exhibit no relationship, i.e. a change in one variable does not lead to a change in the other variable, then we can say that there is no correlation (zero correlation) between the two variables. In such a case the coefficient of correlation is 0.
3. Limited degrees of correlation: If two variables are neither perfectly correlated nor completely uncorrelated, we term the correlation limited correlation.
Thus correlation may be positive, negative or zero, but it lies within the limits ±1, i.e. the value of r is such that −1 ≤ r ≤ +1. The + and − signs are used for positive and negative linear correlation, respectively.

• If x and y have a strong positive linear correlation, r is close to +1. An r value of exactly
+1 indicates a perfect positive correlation.
• If x and y have a strong negative linear correlation, r is close to -1. An r value of exactly
-1 indicates a perfect negative correlation.
• If there is no linear correlation or a weak linear correlation, r is close to 0.
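A quick numerical check of these bullets, as a sketch: the hypothetical helper `pearson_r` computes r from deviations about the means, and the three made-up series give a perfect rise, a perfect fall and a curved pattern. The last case is worth noting: y = x² is completely determined by x, yet r = 0, because r measures only linear correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r: sum of deviation products over the root of the squared-deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = pearson_r(x, [2, 4, 6, 8, 10])    # all points on a rising straight line
falling = pearson_r(x, [10, 8, 6, 4, 2])   # all points on a falling straight line
u = [-2, -1, 0, 1, 2]
curved = pearson_r(u, [v * v for v in u])  # U-shaped: related, but not linearly
print(rising, falling, curved)             # 1.0 -1.0 0.0
```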

METHODS OF DETERMINING CORRELATION


We shall consider the following three commonly used methods:
1. Scatter Plot
2. Karl Pearson’s coefficient of correlation
3. Spearman’s Rank-correlation coefficient.
Scatter Plot (Scatter diagram or dot diagram)
Scatter plots (also called scatter diagrams) are used to graphically investigate the possible relationship between two variables without calculating any numerical value. In this method, the values of the two variables are plotted on graph paper, one along the horizontal axis (X-axis) and the other along the vertical axis (Y-axis). Plotting the data gives points (dots) on the graph which are generally scattered, hence the name ‘scatter plot’.
The manner in which these points are scattered suggests the degree and the direction of correlation. The degree of correlation is denoted by ‘r’ and its direction is given by its sign, positive or negative.
Correlation Analysis MODULE - 4
Statistical Tools
(i) If all points lie on a rising straight line, the correlation is perfectly positive and r = +1 (see fig. a).
(ii) If all points lie on a falling straight line, the correlation is perfectly negative and r = −1 (see fig. d).
(iii) If the points lie in a narrow strip rising upwards, the correlation is positive and of a high degree (see fig. b).
(iv) If the points lie in a narrow strip falling downwards, the correlation is negative and of a high degree (see fig. e).
(v) If the points are spread widely over a broad strip rising upwards, the correlation is positive and of a low degree (see fig. c).
(vi) If the points are spread widely over a broad strip falling downwards, the correlation is negative and of a low degree (see fig. f).
(vii) If the points are scattered without any specific pattern, correlation is absent, i.e. r = 0 (see fig. g).
Though this method is simple and gives a rough idea about the existence and the degree of correlation, it is not reliable. As it is not a mathematical method, it cannot measure the degree of correlation precisely.
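Where graph paper or a plotting library is not at hand, a rough scatter plot can be drawn in plain text. The sketch below is illustrative only (the function `text_scatter` and its default grid size are assumptions, not part of this unit): each (X, Y) pair is scaled onto a character grid with larger Y values nearer the top, so the rising or falling strips described in (i) to (vii) become visible.

```python
def text_scatter(xs, ys, width=30, height=10):
    """Return the rows of a character-grid scatter plot; the top row holds the largest Y values."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a grid position; a constant series is placed in column/row 0.
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1)) if x_hi > x_lo else 0
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1)) if y_hi > y_lo else 0
        grid[height - 1 - row][col] = "*"
    # Left border stands in for the Y-axis, bottom border for the X-axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]

advertising = [165, 166, 167, 167, 168, 169, 170, 172]  # X values from Example 1
sales = [167, 168, 165, 168, 172, 172, 169, 171]        # Y values from Example 1
for line in text_scatter(advertising, sales):
    print(line)
```

Points that fall on the same cell overlap, so this is only as reliable as the graphical method itself.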

10.7.2 Karl Pearson’s coefficient of correlation


It gives a precise numerical measure of correlation. It is denoted by ‘r’. The value of ‘r’ gives the magnitude of correlation and its sign denotes its direction. The mathematical formula for computing r is:

$$r = \frac{\sum xy}{N\sigma_X\sigma_Y} \qquad \ldots(1)$$

where $x = X - \bar{X}$, $y = Y - \bar{Y}$, $\sigma_X$ = s.d. of X, $\sigma_Y$ = s.d. of Y and N = number of pairs of observations.

Since $\sigma_X = \sqrt{\frac{\sum x^2}{N}}$ and $\sigma_Y = \sqrt{\frac{\sum y^2}{N}}$, equation (1) can be rewritten as:

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
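The rewriting is a direct substitution of the two standard deviations into the denominator of (1); the factors of N cancel:

$$N\sigma_X\sigma_Y = N\sqrt{\frac{\sum x^2}{N}}\,\sqrt{\frac{\sum y^2}{N}} = N \cdot \frac{\sqrt{\sum x^2 \times \sum y^2}}{N} = \sqrt{\sum x^2 \times \sum y^2}$$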

ECONOMICS 187
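By using actual mean:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \times \sum (Y - \bar{Y})^2}} \qquad \ldots(2)$$

By assumed mean method:

$$r = \frac{\sum d_x d_y - \dfrac{\sum d_x \cdot \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}} \times \sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}} \qquad \ldots(3)$$

By direct method:

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2} \times \sqrt{N\sum Y^2 - (\sum Y)^2}} \qquad \ldots(4)$$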
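Formulas (2), (3) and (4) are algebraically equivalent, so they should give the same r. A minimal sketch to check this, assuming a small made-up data set and hypothetical function names; `ax` and `ay` play the role of the assumed means and can be any convenient values.

```python
from math import sqrt

def r_actual_mean(X, Y):
    """Formula (2): deviations from the actual means."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / sqrt(sxx * syy)

def r_assumed_mean(X, Y, ax, ay):
    """Formula (3): deviations dx = X - ax, dy = Y - ay from assumed means."""
    n = len(X)
    dx = [a - ax for a in X]
    dy = [b - ay for b in Y]
    num = sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy) / n
    den = (sqrt(sum(p * p for p in dx) - sum(dx) ** 2 / n)
           * sqrt(sum(q * q for q in dy) - sum(dy) ** 2 / n))
    return num / den

def r_direct(X, Y):
    """Formula (4): raw sums of X, Y, XY, X^2 and Y^2."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    num = n * sum(a * b for a, b in zip(X, Y)) - sx * sy
    den = sqrt(n * sum(a * a for a in X) - sx ** 2) * sqrt(n * sum(b * b for b in Y) - sy ** 2)
    return num / den

X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 1, 4, 3, 5]
# All three agree (0.8 here, up to floating-point rounding).
print(r_actual_mean(X, Y), r_assumed_mean(X, Y, ax=2, ay=2), r_direct(X, Y))
```

The assumed-mean and direct forms exist to ease hand calculation; on a computer the deviation form (2) is usually the most numerically stable choice.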

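Now the covariance of X and Y is defined as

$$\operatorname{cov}(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{N}$$

$$\therefore r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where N is the number of pairs of data. In formula (3), $d_x = X - A_X$ and $d_y = Y - A_Y$, where $A_X$ and $A_Y$ are the assumed means of X and Y.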
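The covariance form of r translates directly into code. In this sketch (hypothetical helper names, made-up data) both the covariance and the standard deviations use the divisor N, as in the definitions above; because σ_X and σ_Y are positive, r always carries the sign of the covariance.

```python
from math import sqrt

def covariance(X, Y):
    """cov(X, Y) = sum of (Xi - mean X)(Yi - mean Y), divided by N."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    return sum((a - mx) * (b - my) for a, b in zip(X, Y)) / n

def std_dev(V):
    """Standard deviation with divisor N; equals sqrt(cov(V, V))."""
    return sqrt(covariance(V, V))

def r_from_covariance(X, Y):
    return covariance(X, Y) / (std_dev(X) * std_dev(Y))

X = [1, 2, 3, 4, 5]  # hypothetical data
Y = [2, 1, 4, 3, 5]
print(covariance(X, Y))         # 1.6
print(r_from_covariance(X, Y))  # approximately 0.8
```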

INTEXT QUESTIONS 10.5


1. Positive values of covariance indicate
(a) a positive variance of the X values
(b) a positive variance of the Y values
(c) the standard deviation is positive
(d) positive relation between two variables
Example 1: Calculate the coefficient of correlation between the expenditure on advertising and sales of the company from the following data.
Advertising expenditure (₹ '000): 165 166 167 168 167 169 170 172
Sales (₹ lakh):                   167 168 165 172 168 172 169 171
Solution: N = 8 (pairs of observations)

Table 10.3: Calculation of coefficient of correlation
(Xi = advertising expenditure in ₹ '000; Yi = sales in ₹ lakh)

Xi            Yi            x = Xi − X̄   y = Yi − Ȳ   xy         x²         y²
165           167           −3            −2            6          9          4
166           168           −2            −1            2          4          1
167           165           −1            −4            4          1          16
167           168           −1            −1            1          1          1
168           172            0             3            0          0          9
169           172            1             3            3          1          9
170           169            2             0            0          4          0
172           171            4             2            8          16         4
ΣXi = 1344    ΣYi = 1352    Σx = 0        Σy = 0        Σxy = 24   Σx² = 36   Σy² = 44

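Calculation:

$$\bar{X} = \frac{\sum X_i}{N} = \frac{1344}{8} = 168 \quad\text{and}\quad \sigma_X = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{36}{8}}$$

$$\bar{Y} = \frac{\sum Y_i}{N} = \frac{1352}{8} = 169 \quad\text{and}\quad \sigma_Y = \sqrt{\frac{\sum y^2}{N}} = \sqrt{\frac{44}{8}}$$

Now,

$$r = \frac{\sum xy}{N\sigma_X\sigma_Y} = \frac{24}{8 \times \sqrt{\frac{36}{8}} \times \sqrt{\frac{44}{8}}} = \frac{24}{\sqrt{36 \times 44}} \approx +0.603$$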

Since r is positive and about 0.6, the correlation is positive and moderate (i.e. direct and reasonably good).
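The arithmetic of Table 10.3 can be reproduced in a few lines. This sketch simply re-does the deviation method on the data of Example 1; the variable names are illustrative.

```python
from math import sqrt

adv = [165, 166, 167, 167, 168, 169, 170, 172]    # Xi, advertising (₹ '000)
sales = [167, 168, 165, 168, 172, 172, 169, 171]  # Yi, sales (₹ lakh)

n = len(adv)
x_bar = sum(adv) / n    # 1344 / 8 = 168
y_bar = sum(sales) / n  # 1352 / 8 = 169
x = [a - x_bar for a in adv]    # deviation columns of Table 10.3
y = [s - y_bar for s in sales]
sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)
sum_y2 = sum(q * q for q in y)
sigma_x = sqrt(sum_x2 / n)
sigma_y = sqrt(sum_y2 / n)
r = sum_xy / (n * sigma_x * sigma_y)  # formula (1)
print(sum_xy, sum_x2, sum_y2)  # 24.0 36.0 44.0
print(round(r, 4))             # 0.603
```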
Example 2: From the following data compute the coefficient of correlation
between X and Y.

                                       X      Y
No. of items                           15     15
Arithmetic mean                        25     18
Σ(Xi − X̄)² and Σ(Yi − Ȳ)²             136    138
Σ(Xi − X̄)(Yi − Ȳ)                     122
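Solution: Given N = 15, X̄ = 25, Ȳ = 18,

$$\sum (X_i - \bar{X})^2 = \sum x^2 = 136, \qquad \sum (Y_i - \bar{Y})^2 = \sum y^2 = 138$$

and

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum xy = 122$$

Using

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$

we get

$$r = \frac{122}{\sqrt{136 \times 138}} = \frac{122}{137.0} \approx 0.891$$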

Example 3: If the covariance between X and Y is 12.3 and the variances of X and Y are 16.4 and 13.8 respectively, find the coefficient of correlation between them.
Solution: Given: covariance = cov(X, Y) = 12.3
Variance of X ($\sigma_X^2$) = 16.4
Variance of Y ($\sigma_Y^2$) = 13.8
Now,

$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{12.3}{\sqrt{16.4} \times \sqrt{13.8}} = \frac{12.3}{4.05 \times 3.71} \approx 0.82$$

Calculation:

$$r_{xy} = r_{uv} = \frac{\sum uv - \dfrac{(\sum u)(\sum v)}{N}}{\sqrt{\sum u^2 - \dfrac{(\sum u)^2}{N}} \times \sqrt{\sum v^2 - \dfrac{(\sum v)^2}{N}}}$$

$$= \frac{53 - \dfrac{(5)(5)}{10}}{\sqrt{85 - \dfrac{(5)^2}{10}} \times \sqrt{85 - \dfrac{(5)^2}{10}}} = \frac{53 - 2.5}{\sqrt{82.5 \times 82.5}} = \frac{50.5}{82.5} \approx 0.61$$
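The calculation above works with u and v instead of X and Y, which relies on r being independent of change of origin and scale: replacing X by u = (X − A)/h and Y by v = (Y − B)/k leaves r unchanged when h and k are positive. The sketch below checks this on made-up data with A = 30, h = 10, B = 7, k = 2 (all assumptions for illustration), and re-does the final step from the column totals used above (N = 10, Σu = Σv = 5, Σuv = 53, Σu² = Σv² = 85). Function names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r_from_sums(n, su, sv, suv, su2, sv2):
    """r from column totals of u and v, as in the calculation above."""
    num = suv - su * sv / n
    den = sqrt(su2 - su ** 2 / n) * sqrt(sv2 - sv ** 2 / n)
    return num / den

X = [10, 20, 30, 40, 50]  # hypothetical data
Y = [5, 3, 9, 7, 11]
u = [(x - 30) / 10 for x in X]  # change of origin (30) and scale (10)
v = [(y - 7) / 2 for y in Y]    # change of origin (7) and scale (2)
print(pearson_r(X, Y), pearson_r(u, v))  # both 0.8
print(round(r_from_sums(10, 5, 5, 53, 85, 85), 2))  # 0.61
```

A negative scale factor for only one of the variables would flip the sign of r while keeping its magnitude.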

10.6.3 Spearman’s Rank Correlation Coefficient


This method is based on the ranks of the items rather than on their actual values. Its advantage over the other methods is that it can be used even when the actual values of the items are unknown. For example, if you want to know the correlation between the honesty and wisdom of the boys in your class, you can use this method by giving ranks to the boys. It can also be used to find the degree of agreement between the judgements of two examiners or two judges. The formula is:

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where R = rank correlation coefficient,
D = difference between the two ranks of each item (D = R1 − R2),
N = number of observations (pairs of ranks).

Note: −1 ≤ R ≤ 1.
(i) When R = +1 ⇒ perfect positive correlation, or complete agreement in the same direction.
(ii) When R = −1 ⇒ perfect negative correlation, or complete agreement in the opposite direction (one ranking is the exact reverse of the other).
(iii) When R = 0 ⇒ no correlation.

Computation:
(i) Give ranks to the values of items. Generally the item with the highest value
is ranked 1 and then the others are given ranks 2, 3, 4 ... according to their
values in the decreasing order.
(ii) Find the difference D = R1 – R2
where R1 = Rank of X and R2 = Rank of Y
Note that ΣD = 0 (always)
(iii) Calculate D2 and then find ΣD2
(iv) Apply the formula.
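Steps (i) to (iv) can be sketched as follows. The function names are hypothetical, `ranks_descending` assumes there are no tied values (ties are handled in the note that follows), and the usage line feeds in the ranks of Example 6.

```python
def ranks_descending(values):
    """Step (i): rank 1 for the highest value, 2 for the next, ... (assumes no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_R(r1, r2):
    """Steps (ii)-(iv): D = R1 - R2, sum of D², then R = 1 - 6ΣD² / (N(N² - 1))."""
    n = len(r1)
    d = [a - b for a, b in zip(r1, r2)]  # ΣD = 0 for complete rankings
    sum_d2 = sum(di * di for di in d)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

maths = [1, 3, 7, 5, 4, 6, 2, 10, 9, 8]   # ranks from Example 6
stats = [3, 1, 4, 5, 6, 9, 7, 8, 10, 2]
print(round(spearman_R(maths, stats), 3))  # 0.418
print(ranks_descending([40, 42, 45, 35]))  # [3, 2, 1, 4]
```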

Note: In some cases there is a tie between two or more items. For example, if two items are tied at the 4th rank, each is given rank (4 + 5)/2 = 4.5. If three items are tied at the 4th rank, each is given rank (4 + 5 + 6)/3 = 5. If m is the number of items with equal ranks, the factor (1/12)(m³ − m) is added to ΣD². If there is more than one such tie, this factor is added once for each tie, so that

$$R = 1 - \frac{6\left\{\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right\}}{N(N^2 - 1)}$$
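A sketch of the tie rule, assuming rank 1 goes to the highest value: tied values share the average of the positions they occupy, and (m³ − m)/12 is added to ΣD² for every group of m tied values in either series. Function names are illustrative; the usage reproduces Example 7.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = highest value; tied values share the mean of the ranks they occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1         # first position occupied by v
        m = order.count(v)                 # size of v's tie group
        ranks.append(first + (m - 1) / 2)  # mean of positions first .. first + m - 1
    return ranks

def spearman_R_ties(x, y):
    n = len(x)
    r1, r2 = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    # Correction (m³ - m)/12 for every group of m tied values, in either series.
    correction = sum((m ** 3 - m) / 12
                     for series in (x, y)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

stats_marks = [40, 42, 45, 35, 36, 39]    # Example 7
english_marks = [46, 43, 44, 39, 40, 43]
print(average_ranks(english_marks))       # [1.0, 3.5, 2.0, 6.0, 5.0, 3.5]
print(round(spearman_R_ties(stats_marks, english_marks), 3))  # 0.771
```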
Example 6: Calculate ‘R’ from the following data.
Student No. : 1 2 3 4 5 6 7 8 9 10
Rank in Maths : 1 3 7 5 4 6 2 10 9 8
Rank in Stats : 3 1 4 5 6 9 7 8 10 2

Solution:
Table 10.5: Calculation of rank correlation

Student No.   Rank in Maths (R1)   Rank in Stats (R2)   D = R1 − R2   D²
1             1                    3                    −2            4
2             3                    1                     2            4
3             7                    4                     3            9
4             5                    5                     0            0
5             4                    6                    −2            4
6             6                    9                    −3            9
7             2                    7                    −5            25
8             10                   8                     2            4
9             9                    10                   −1            1
10            8                    2                     6            36
N = 10                                                  ΣD = 0        ΣD² = 96

Calculation of R:

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(96)}{10(100 - 1)} = 1 - \frac{6 \times 96}{10 \times 99} \approx 0.418$$
Example 7: Calculate ‘R’ for 6 students from the following data.
Marks in Stats:   40 42 45 35 36 39
Marks in English: 46 43 44 39 40 43
Solution:
Table 10.6: Calculation of rank correlation

Marks in Stats   R1   Marks in English   R2    D = R1 − R2   D²
40               3    46                 1      2            4
42               2    43                 3.5   −1.5          2.25
45               1    44                 2     −1            1
35               6    39                 6      0            0
36               5    40                 5      0            0
39               4    43                 3.5    0.5          0.25
N = 6                                          ΣD = 0        ΣD² = 7.5

Here m = 2, since the value 43 occurs twice in the English marks.

$$R = 1 - \frac{6\left\{\sum D^2 + \frac{1}{12}(2^3 - 2)\right\}}{N(N^2 - 1)} = 1 - \frac{6\left\{7.5 + \frac{1}{12}(8 - 2)\right\}}{6(36 - 1)}$$

$$R = 1 - \frac{6(7.5 + 0.5)}{210} \approx 0.771$$

Example 8: The value of Spearman’s rank correlation coefficient for a certain number of pairs of observations was found to be 2/3. The sum of the squares of the differences between the corresponding ranks was 55. Find the number of pairs.

Solution: We have

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}, \qquad R = \frac{2}{3} \quad\text{and}\quad \sum D^2 = 55$$

$$\therefore \frac{2}{3} = 1 - \frac{6 \times 55}{N(N^2 - 1)}$$

$$\therefore -\frac{1}{3} = -\frac{6 \times 55}{N(N^2 - 1)}$$

$$\therefore N(N^2 - 1) = 3 \times 6 \times 55 = 990$$

$$\therefore N(N^2 - 1) = 10 \times 99 = 10(10^2 - 1) \;\Rightarrow\; N = 10$$

Therefore, there were 10 pairs of observations.
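Example 8 amounts to finding a whole number N with N(N² − 1) = 990. A brute-force search is a simple sketch of this (the function name is hypothetical); `Fraction` keeps 2/3 exact so the equality test is not spoiled by floating-point rounding.

```python
from fractions import Fraction

def pairs_from_spearman(R, sum_d2):
    """Solve 1 - 6ΣD²/(N(N² - 1)) = R for a whole number N; return None if none exists.

    Assumes R < 1 (otherwise the equation has no finite solution).
    """
    target = 6 * sum_d2 / (1 - Fraction(R))  # N(N² - 1) must equal this
    n = 2
    while n * (n * n - 1) <= target:  # N(N² - 1) increases with N, so stop once past target
        if n * (n * n - 1) == target:
            return n
        n += 1
    return None

print(pairs_from_spearman(Fraction(2, 3), 55))  # 10
```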


INTEXT QUESTIONS 10.6


1. The marks awarded by two judges in a certain beauty contest are given below:
Judge I:  56 75 45 71 61 64 58 80 76 61
Judge II: 66 70 40 60 65 56 59 77 67 63
Using the rank correlation method, determine whether the two judges have a common taste in judging beauty.

WHAT YOU HAVE LEARNT


• Correlation measures the association between variables. Correlation can be positive or negative, and linear or non-linear. It is denoted by r.
• The value of r lies between −1 and +1, i.e. −1 ≤ r ≤ +1.
• The correlation coefficient r is independent of change of origin and change of scale.
• The important methods of measuring correlation are (i) the scatter plot, (ii) Karl Pearson’s coefficient of correlation and (iii) Spearman’s rank correlation coefficient.
• Scatter plots are used to graphically investigate the possible relationship between two variables without calculating any numerical value.
• The mathematical formula for computing r by Karl Pearson’s method is:

$$r = \frac{\sum xy}{N\sigma_X\sigma_Y} \qquad \ldots(1)$$

where $x = X - \bar{X}$, $y = Y - \bar{Y}$, $\sigma_X$ = s.d. of X, $\sigma_Y$ = s.d. of Y and N = number of pairs of observations.
• Correlation (r) can also be calculated from the actual values of the two variables X and Y as follows:

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2} \times \sqrt{N\sum Y^2 - (\sum Y)^2}}$$

• The covariance of two variables, say X and Y, is defined as:

$$\operatorname{cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$$
