Regression
Concept of Regression:
If two variables are significantly correlated, and if there is some theoretical
basis for doing so, it is possible to predict values of one variable from the
other. This observation leads to a very important concept known as
‘Regression Analysis’.
Regression analysis, in a general sense, means the estimation or
prediction of the unknown value of one variable from the known value of the
other variable. It is one of the most important statistical tools and is
used extensively in almost all sciences: natural, social and physical. It is
especially used in business and economics to study the relationship between
two or more causally related variables and to estimate
demand and supply curves, cost functions, production and consumption
functions and so on.
Regression analysis was explained by M. M. Blair as follows:
“Regression analysis is a mathematical measure of the average relationship
between two or more variables in terms of the original units of the data.”
Some examples of pairs of related variables that may be studied by regression:
(i) Hours spent studying Vs Marks scored by students
(ii) Amount of rainfall Vs Agricultural yield
(iii) Electricity usage Vs Electricity bill
(iv) Suicide rates Vs Number of stressful people
(v) Years of experience Vs Salary
(vi) Demand Vs Product price
(vii) Age Vs Beauty
(viii) Age Vs Health issues
(ix) Number of Degrees Vs Salary
(x) Number of Degrees Vs Education expenditure
Review/summary of objectives of regression:
[1] To determine whether a relationship exists between two variables
[2] To describe the nature of the relationship, should one exist, in the form of
a mathematical equation
[3] To assess the degree of accuracy of description or prediction achieved by
the regression equation.
Assumptions of Linear Regression:
[1] The relationship is approximately linear (the scatter plot of Y against X
approximates a straight line).
[2] For each value of X there is a probability distribution of independent
values of Y, and from each of these Y distributions one or more values are
sampled at random.
[3] The means of the Y distributions fall on the regression line.
Lines of Regression and Equation:
Simple regression:
It is used to examine the relationship between one dependent and one
independent variable. After performing an analysis, the regression statistics
can be used to predict the dependent variable when the independent variable
is known.
The regression line (known as the least squares line):
It is a plot of the expected value of the dependent variable for all values of the
independent variable. Technically, it is the line that "minimizes the squared
residuals". The regression line is the one that best fits the data on a
scatterplot.
Using the regression equation, the dependent variable may be
predicted from the independent variable. The slope of the regression line (b)
is defined as the rise divided by the run, i.e., the change in y per unit change in x.
The y intercept (a) is the value of y at which the regression line crosses the
y axis, i.e., the predicted value of y when x = 0.
The slope and y intercept are incorporated into the regression equation. The
intercept is usually called the constant, and the slope is referred to as the
coefficient. Since the regression model is usually not a perfect predictor,
there is also an error term in the equation.
Here is a way to mathematically describe a linear regression model:
y = a + bx + e
If the slope is significantly different from zero, then we can use the
regression model to predict the dependent variable for any value of the
independent variable.
If the slope is zero, the model has no predictive ability, because for every value of the
independent variable the prediction for the dependent variable would be the
same. Knowing the value of the independent variable would not improve our
ability to predict the dependent variable. Thus, if the slope is not significantly
different from zero, do not use the model to make predictions.
The standard error of the estimate for regression measures the amount of
variability in the points around the regression line. It is the standard deviation
of the data points as they are distributed around the regression line. The
standard error of the estimate can be used to develop confidence intervals
around a prediction.
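A minimal Python sketch of the standard error of the estimate, assuming the common convention $s_e = \sqrt{\mathrm{SSE}/(n-2)}$ (the same n - 2 divisor used for the mean sum of squares due to error in the formulae list). The function name and the small data set are illustrative only.

```python
import math


def standard_error_of_estimate(x, y):
    """Fit y = a + b*x by least squares and return (a, b, s_e).

    s_e = sqrt(SSE / (n - 2)) measures the spread of the observed
    points around the fitted regression line.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx                 # slope
    a = y_bar - b * x_bar           # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(sse / (n - 2))


a, b, s_e = standard_error_of_estimate([1, 2, 3, 4], [2, 4, 5, 4])
print(round(a, 4), round(b, 4), round(s_e, 4))  # 2.0 0.7 1.0724
```

If all points lie exactly on a line, SSE = 0 and the standard error of the estimate is 0.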
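The line that minimizes the sum of squares of the differences between the observed values and the values given by the
straight line is chosen. This principle is called the least square principle, and the
equation so obtained is called the least square regression line.
Derivation of Linear Regression Model of Y on X:
For n pairs of observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$, write the model as $y_i = a + b x_i + e_i$ ... (1).
Symbolically we write the sum of squares of errors as
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
We find the point of minimum using calculus: the solution of the equations $\partial S/\partial a = 0$ and $\partial S/\partial b = 0$ gives the extreme point.
$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0$$
$$\sum y_i - na - b\sum x_i = 0 \quad\Rightarrow\quad \sum y_i = na + b\sum x_i \qquad \ldots (2)$$
Similarly,
$$\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0$$
$$\sum x_i y_i - a\sum x_i - b\sum x_i^2 = 0 \quad\Rightarrow\quad \sum x_i y_i = a\sum x_i + b\sum x_i^2 \qquad \ldots (3)$$
From (2),
$$na = \sum y_i - b\sum x_i \quad\Rightarrow\quad a = \frac{\sum y_i}{n} - b\,\frac{\sum x_i}{n} = \bar{y} - b\bar{x} \qquad \ldots (4)$$
Substituting (4) in (3), and using $\sum x_i = n\bar{x}$,
$$\sum x_i y_i = (\bar{y} - b\bar{x})\, n\bar{x} + b\sum x_i^2$$
$$\sum x_i y_i - n\bar{x}\bar{y} = b\left(\sum x_i^2 - n\bar{x}^2\right)$$
Dividing by n we get
$$\frac{\sum x_i y_i}{n} - \bar{x}\bar{y} = b\left(\frac{\sum x_i^2}{n} - \bar{x}^2\right)$$
But $\mathrm{Cov}(x,y) = \frac{\sum x_i y_i}{n} - \bar{x}\bar{y}$ and $\mathrm{Var}(x) = \sigma_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2$, so
$$\mathrm{Cov}(x,y) = b\,\sigma_x^2 \quad\Rightarrow\quad b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} \qquad \ldots (5)$$
Substituting (4) and (5) in the regression equation $y = a + bx$, we get
$$y = \bar{y} - b_{yx}\bar{x} + b_{yx}x \quad\Rightarrow\quad y - \bar{y} = b_{yx}(x - \bar{x}) = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2}(x - \bar{x})$$
This is the regression line of Y on X.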
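To see that equations (2) to (5) agree, the following Python sketch solves the two normal equations directly (by Cramer's rule, a choice made here for brevity) and compares the result with $b_{yx} = \mathrm{Cov}(x,y)/\sigma_x^2$, $a = \bar{y} - b\bar{x}$. The function names and the data are illustrative.

```python
def fit_by_normal_equations(x, y):
    """Solve the normal equations (2) and (3) for a and b by Cramer's rule.

    (2)  sum(y)   = n*a      + b*sum(x)
    (3)  sum(x*y) = a*sum(x) + b*sum(x**2)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx          # zero only when all x values are equal
    if det == 0:
        raise ValueError("x values must not all be equal")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


def fit_by_covariance(x, y):
    """Use b_yx = Cov(x, y) / Var(x) from (5) and a = y_bar - b*x_bar from (4)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar
    var_x = sum(xi * xi for xi in x) / n - x_bar ** 2
    b = cov / var_x
    return y_bar - b * x_bar, b


x, y = [1, 2, 3, 4], [2, 4, 5, 4]
print(fit_by_normal_equations(x, y))  # (2.0, 0.7)
print(fit_by_covariance(x, y))        # the same values, up to rounding
```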
Derivation of Linear Regression Model of X on Y:
Suppose $(x_i, y_i)$, $i = 1, 2, \ldots, n$, are n pairs of observations on variables X, Y.
We treat X as the dependent variable, to be expressed in terms of
Y. The simplest form is the linear relation X = a + bY. However,
when we observe the numerical values of x and y, the relation may not be
observed exactly, so an error term is included.
We assume the model X = a + bY + e ... (1)
with E(e) = 0 and Var(e) = σ².
Equation (1) contains three unknown quantities, a, b and e. We estimate a and b by
the least square principle; since e is a random variable, we estimate its
parameters E(e) and Var(e) instead.
We estimate a and b so that the sum of squared errors is minimum, where
e = x - a - by.
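By the principle of least squares, we write the sum of squares of errors as
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (x_i - a - b y_i)^2$$
We find the point of minimum using calculus: the solution of the equations $\partial S/\partial a = 0$ and $\partial S/\partial b = 0$ gives the extreme point.
$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n} (x_i - a - b y_i) = 0$$
$$\sum x_i - na - b\sum y_i = 0 \quad\Rightarrow\quad \sum x_i = na + b\sum y_i \qquad \ldots (2)$$
Similarly,
$$\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} y_i (x_i - a - b y_i) = 0$$
$$\sum x_i y_i - a\sum y_i - b\sum y_i^2 = 0 \quad\Rightarrow\quad \sum x_i y_i = a\sum y_i + b\sum y_i^2 \qquad \ldots (3)$$
From (2),
$$na = \sum x_i - b\sum y_i \quad\Rightarrow\quad a = \frac{\sum x_i}{n} - b\,\frac{\sum y_i}{n} = \bar{x} - b\bar{y} \qquad \ldots (4)$$
Substituting $a = \bar{x} - b\bar{y}$ in equation (3), and using $\sum y_i = n\bar{y}$, we get
$$\sum x_i y_i = (\bar{x} - b\bar{y})\, n\bar{y} + b\sum y_i^2$$
$$\sum x_i y_i - n\bar{x}\bar{y} = b\left(\sum y_i^2 - n\bar{y}^2\right)$$
Dividing by n we get
$$\frac{\sum x_i y_i}{n} - \bar{x}\bar{y} = b\left(\frac{\sum y_i^2}{n} - \bar{y}^2\right)$$
But $\mathrm{Cov}(x,y) = \frac{\sum x_i y_i}{n} - \bar{x}\bar{y}$ and $\mathrm{Var}(y) = \sigma_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2$, so
$$\mathrm{Cov}(x,y) = b\,\sigma_y^2 \quad\Rightarrow\quad b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} \qquad \ldots (5)$$
Substituting (4) and (5) in the regression equation $x = a + by$, we get
$$x = \bar{x} - b_{xy}\bar{y} + b_{xy}y \quad\Rightarrow\quad x - \bar{x} = b_{xy}(y - \bar{y}) = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2}(y - \bar{y})$$
This is the regression line of X on Y.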
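Proof:
$$b_{yx} \times b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2}\cdot\frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \left(\frac{\mathrm{Cov}(x,y)}{\sigma_x\sigma_y}\right)^2 = r^2$$
Hence $r = \pm\sqrt{b_{yx} \times b_{xy}}$, where r takes the common sign of the two regression coefficients.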
The regression coefficients are invariant to a change of origin. This property
makes the computation of regression coefficients simple: we can subtract a
constant from each observation before computing.
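[7] Regression coefficients are invariant to a change of origin but not of
scale.
Proof:
Let
$$u = \frac{x - a}{h} \quad\text{and}\quad v = \frac{y - b}{k}$$
Then
$$\mathrm{Cov}(u, v) = \mathrm{Cov}\left(\frac{x-a}{h}, \frac{y-b}{k}\right) = \frac{1}{hk}\,\mathrm{Cov}(x, y)$$
$$\sigma_u^2 = \frac{1}{h^2}\,\sigma_x^2 \quad\text{and}\quad \sigma_v^2 = \frac{1}{k^2}\,\sigma_y^2$$
Therefore
$$b_{uv} = \frac{\mathrm{Cov}(u,v)}{\sigma_v^2} = \frac{\mathrm{Cov}(x,y)/(hk)}{\sigma_y^2/k^2} = \frac{k}{h}\cdot\frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{k}{h}\, b_{xy}$$
$$b_{vu} = \frac{\mathrm{Cov}(u,v)}{\sigma_u^2} = \frac{\mathrm{Cov}(x,y)/(hk)}{\sigma_x^2/h^2} = \frac{h}{k}\cdot\frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{h}{k}\, b_{yx}$$
The origins a and b do not appear in the results, but the scales h and k do.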
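A small numerical check of property [7], written as a Python sketch. The origins (10, 20) and scales (h = 2, k = 4) are arbitrary illustrative choices, and variances and covariances use divisor n as in the derivations above.

```python
import math


def coefficients(x, y):
    """Return (b_yx, b_xy) = (Cov/Var(x), Cov/Var(y)), divisor n."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n
    var_y = sum((yi - y_bar) ** 2 for yi in y) / n
    return cov / var_x, cov / var_y


x = [7, 6, 10, 14, 13]
y = [22, 18, 20, 26, 24]
b_yx, b_xy = coefficients(x, y)

# Hypothetical origins and scales: u = (x - 10)/2, v = (y - 20)/4
origin_x, h, origin_y, k = 10, 2, 20, 4
u = [(xi - origin_x) / h for xi in x]
v = [(yi - origin_y) / k for yi in y]
b_vu, b_uv = coefficients(u, v)   # v on u, and u on v

print(math.isclose(b_uv, (k / h) * b_xy), math.isclose(b_vu, (h / k) * b_yx))  # True True

# Change of origin only (h = k = 1): the coefficients are unchanged
shifted = coefficients([xi - origin_x for xi in x], [yi - origin_y for yi in y])
print(all(math.isclose(s, o) for s, o in zip(shifted, (b_yx, b_xy))))  # True
```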
[8] If r = ±1, then the regression coefficients are reciprocals of each other.
Proof: We have,
$$b_{yx} \times b_{xy} = r^2 = 1$$
$$\Rightarrow\quad b_{xy} = \frac{1}{b_{yx}} \quad\text{or}\quad b_{yx} = \frac{1}{b_{xy}}$$
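Proof:
We have
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{r\sigma_y}{\sigma_x} \quad\text{and}\quad b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y}$$
but if $\sigma_x = \sigma_y$, then $b_{yx} = r = b_{xy}$.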
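The sum of squares of the deviations of the observed values $y_i$ from their mean $\bar{y}$ is called the total variation. It is denoted by SST and is given by
$$\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
The sum of squares of the deviations of the observed values $y_i$ from the values $\hat{y}_i$ estimated by the regression line is called the unexplained (error) variation. It is denoted by SSE and is given by
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \text{Average sum of squares due to error} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
The sum of squares of the deviations of the estimated values $\hat{y}_i$ from $\bar{y}$ is called the explained variation (variation due to regression). It is denoted by SSR and is given by
$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad \text{Average sum of squares due to regression} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
For the least squares line, SST = SSR + SSE.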
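The decomposition SST = SSR + SSE can be checked numerically with the following Python sketch. It assumes the fitted line is the least squares line of y on x (for other lines the identity does not hold in general); the function name and data are illustrative.

```python
def variation_decomposition(x, y):
    """Return (SST, SSR, SSE) for the least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
    return sst, ssr, sse


sst, ssr, sse = variation_decomposition([1, 2, 3, 4], [2, 4, 5, 4])
print(round(sst, 4), round(ssr, 4), round(sse, 4))   # 4.75 2.45 2.3
print(round(ssr / sst, 4), round(1 - sse / sst, 4))  # 0.5158 0.5158
```

The last line shows that SSR/SST and 1 - SSE/SST give the same coefficient of determination.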
Geometrical Interpretation:-
X (Hours studied)    2   9   5   5   3   7   1   8   6   2
Y (Grade on exam)   69  98  82  77  71  84  55  94  84  64
[Figure: scatter plot of Grade on Exam (Y-axis) against Hours studied (X-axis), with the fitted line y = 4.7426x + 55.036, R² = 0.9505]
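The fitted line and R² shown for this data set can be reproduced with a few lines of plain Python (a sketch; no plotting library is assumed):

```python
x = [2, 9, 5, 5, 3, 7, 1, 8, 6, 2]            # hours studied
y = [69, 98, 82, 77, 71, 84, 55, 94, 84, 64]  # grade on exam

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx                        # slope of the line of Y on X
a = y_bar - b * x_bar                  # intercept
r_squared = s_xy ** 2 / (s_xx * s_yy)  # coefficient of determination

print(f"y = {b:.4f}x + {a:.3f}, R^2 = {r_squared:.4f}")
# y = 4.7426x + 55.036, R^2 = 0.9505
```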
Coefficient of determination:-
Definition: The coefficient of determination is the square of the coefficient
of correlation, r², and is calculated to interpret the value of the correlation. It
is useful because it gives the proportion of the variance in the dependent variable
that is explained by its linear relationship with the independent variable.
The coefficient of determination measures the proportion of explained
variation, i.e., the relative reduction in variance obtained by predicting from the regression
equation rather than from the mean of the dependent variable. For example,
if r = 0.8, then r² = 0.64, which means that 64% of the
variation in the dependent variable is explained by the independent variable
while 36% remains unexplained.
Thus, the coefficient of determination is the ratio of explained variance to
total variance, and it indicates the strength of the linear association between
the variables, say X and Y. The value of r² lies between 0 and 1 and has
the following relationship with r: as |r| decreases from
its maximum value of 1, r² decreases much more rapidly. The absolute value
of r is always greater than r² unless r² = 0 or 1. The coefficient of
determination also indicates how well the regression line fits the
data: the closer the points plotted on a scatter
diagram lie to the regression line, the more of the variation it explains, and the farther the points lie
from the line, the less of the variance it explains.
Understanding the P Value
The P value is another statistic, on a scale from 0 to 1, that
you'll see after a regression analysis. Unlike R-squared, which measures how much of the
variation is explained, the P value tells you how likely it would be to observe a relationship
at least as strong as the one in your sample if there were in fact no correlation at all. A high P
value means the data are consistent with zero correlation (it does not prove there is none), whereas a low P
value indicates that the observed correlation is unlikely to be due to chance alone.
If the dependent variable truly does depend on the
independent variable, the P value will tend to be low. If the two variables are unrelated (you are
comparing apples to oranges), the P value will tend to be high.
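Regression Formulae
[1] $\bar{x} = \frac{1}{n}\sum x_i$    [2] $\bar{y} = \frac{1}{n}\sum y_i$
[3] $\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$    [4] $\sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2$
[5] $\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y}$    [6] $\mathrm{Corr}(x,y) = \frac{\mathrm{Cov}(x,y)}{\sigma_x\sigma_y}$
[7] Equation of a line of X on Y is
X = a + bY
The regression line of X on Y is
$(x - \bar{x}) = b_{xy}(y - \bar{y})$, i.e., $(x - \bar{x}) = \frac{r\sigma_x}{\sigma_y}(y - \bar{y})$, with $a = \bar{x} - b\bar{y}$
[8] Equation of a line of Y on X is
y = a + bx
The regression line of Y on X is
$(y - \bar{y}) = b_{yx}(x - \bar{x})$, i.e., $(y - \bar{y}) = \frac{r\sigma_y}{\sigma_x}(x - \bar{x})$, with $a = \bar{y} - b\bar{x}$
[9] Regression coefficient of X on Y:
$b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y}$
[10] Regression coefficient of Y on X:
$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{r\sigma_y}{\sigma_x}$
[15] Mean sum of squares due to error (MSSE) $= \frac{\sum (y_i - \hat{y}_i)^2}{n - 2}$
[16] Coefficient of determination $= r^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{\mathrm{SST} - \mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$
Coefficient of determination $= r^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = \frac{\text{Explained variation}}{\text{Total variation}}$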
Numerical Examples:
[1] A panel of two examiners, A and B, assessed seven candidates independently and
awarded the following marks:
Candidate   1   2   3   4   5   6   7
Marks A    40  34  28  30  44  38  31
Marks B    32  39  26  30  38  34  28
An eighth candidate was awarded 36 marks by examiner A. Using a regression
line, estimate the marks that would have been awarded by examiner B.
Candidate   Marks by A (X)   Marks by B (Y)   X²      Y²      XY
1           40               32               1600    1024    1280
2           34               39               1156    1521    1326
3           28               26               784     676     728
4           30               30               900     900     900
5           44               38               1936    1444    1672
6           38               34               1444    1156    1292
7           31               28               961     784     868
Total       245              227              8781    7505    8066
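$\bar{x} = \frac{1}{n}\sum x_i = \frac{245}{7} = 35$
$\bar{y} = \frac{1}{n}\sum y_i = \frac{227}{7} = 32.43$
Working with sums of squares and products about the means (the common factor 1/n cancels in the ratio):
$\sum x_i^2 - n\bar{x}^2 = 8781 - 7(35)^2 = 8781 - 8575 = 206$
$\sum x_i y_i - n\bar{x}\bar{y} = 8066 - 7(35)\left(\frac{227}{7}\right) = 8066 - 7945 = 121$
Since the marks of B (Y) are to be estimated from the marks of A (X), we use the regression line of Y on X:
$b_{yx} = \frac{121}{206} = 0.5874$
$\therefore Y - \bar{y} = b_{yx}(X - \bar{x})$
$\therefore Y - 32.43 = 0.5874(X - 35)$
Putting X = 36:
$\therefore Y = 32.43 + 0.5874(36 - 35) = 33.02$
$\therefore Y \approx 33$
Examiner B would be expected to award about 33 marks to the eighth candidate.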
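A quick numerical check of Example [1], as a plain Python sketch (variable names are illustrative). It recomputes the column totals and the estimate from the regression line of Y on X.

```python
x = [40, 34, 28, 30, 44, 38, 31]   # marks by examiner A
y = [32, 39, 26, 30, 38, 34, 28]   # marks by examiner B
n = len(x)

sum_y2 = sum(v * v for v in y)                   # column total of Y^2
sum_xy = sum(u * v for u, v in zip(x, y))        # column total of XY
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares/products about the means; the 1/n factors cancel in b_yx.
s_xx = sum(u * u for u in x) - n * x_bar ** 2
s_xy = sum_xy - n * x_bar * y_bar
b_yx = s_xy / s_xx

estimate = y_bar + b_yx * (36 - x_bar)           # line of Y on X at X = 36
print(sum_y2, sum_xy, round(b_yx, 4), round(estimate, 2))
# 7505 8066 0.5874 33.02
```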
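[2] The following data relate to the ages (in years) of husbands and wives at the time
of marriage:
Age of husband (X)   23  24  25  26  27
Age of wife (Y)      19  19  20  21  22
Estimate the age of the husband if the age of the wife is 20 years.
Solution:-
n = 5, $\sum x_i = 125$, $\sum y_i = 101$, $\sum y_i^2 = 2047$, $\sum x_i y_i = 2533$
$\bar{x} = \frac{125}{5} = 25$, $\bar{y} = \frac{101}{5} = 20.2$
$\sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{2047}{5} - (20.2)^2 = 409.4 - 408.04 = 1.36$
$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = \frac{2533}{5} - (25)(20.2) = 506.6 - 505 = 1.6$
$b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{1.6}{1.36} = 1.1765$
Regression line of X on Y:
$x - 25 = 1.1765(y - 20.2)$
$x = 1.1765y + 1.2353$
Putting y = 20:
$x = 1.1765(20) + 1.2353 = 24.76$
$x \approx 25$
The estimated age of the husband is about 24.8 years (approximately 25 years) when the age of the wife is 20 years.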
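The corrected figures for Example [2] can be checked with this Python sketch (illustrative variable names; divisor n as in the formulae list):

```python
x = [23, 24, 25, 26, 27]   # age of husband
y = [19, 19, 20, 21, 22]   # age of wife
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n                        # 25, 20.2
var_y = sum(v * v for v in y) / n - y_bar ** 2               # 1.36
cov = sum(u * v for u, v in zip(x, y)) / n - x_bar * y_bar   # 1.6
b_xy = cov / var_y                                           # coefficient of X on Y

husband = x_bar + b_xy * (20 - y_bar)    # line of X on Y at y = 20
print(round(b_xy, 4), round(husband, 2))  # 1.1765 24.76
```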
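Here $\bar{x} = 120.5$, $\bar{y} = 10.37$, $\sigma_x = 12.7$, $\sigma_y = 2.39$ and r = 0.93, where x is the height of a boy and y is his age.
$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$
$x - 120.5 = 0.93 \times \frac{12.7}{2.39}(y - 10.37)$
$x - 120.5 = 4.9418(y - 10.37)$
$x - 120.5 = 4.9418y - 51.246$
$x = 4.9418y + 69.254$ ......(1)
(ii) Now we have to find x, i.e., the height of a boy whose age y is 12 years:
$x = 4.9418(12) + 69.254$
$x = 128.56$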
(i) Obtain the regression lines. (ii) Estimate y for x = 3 and estimate x for y = 3.
Solution:
Here n = 20, $\bar{x} = 4$, $\bar{y} = 2$.
$\sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{320}{20} - 4 = 12$, $\sigma_y = \sqrt{12} = 3.464$
$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = \frac{480}{20} - (4)(2) = 24 - 8 = 16$
(ii)
$Y = 0.2353X + 1.0588$
$Y = 0.2353(3) + 1.0588 = 0.7059 + 1.0588$
$Y = 1.7647$
$X = 1.3333Y + 1.3333$
$X = 1.3333(3) + 1.3333$
$X = 5.3333$
Putting Y = 20,000:
$X = 0.3241(20{,}000) + 50.786 = 6532.79$
[6] Determine the two regression lines from the following data
X 7 6 10 14 13
Y 22 18 20 26 24
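Solution:-
Here n = 5, $\bar{x} = 10$, $\bar{y} = 22$, $\sigma_x^2 = 10$, $\sigma_y^2 = 8$.
$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = \frac{1138}{5} - (10)(22) = 227.6 - 220 = 7.6$
Regression coefficient of X on Y is given by
$b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{7.6}{8} = 0.95$
Regression coefficient of Y on X is given by
$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{7.6}{10} = 0.76$
Regression line of Y on X is
$y - \bar{y} = b_{yx}(x - \bar{x})$: $y - 22 = 0.76(x - 10)$
$y = 0.76x - 7.6 + 22$, i.e., $y = 0.76x + 14.4$
Regression line of X on Y is
$x - \bar{x} = b_{xy}(y - \bar{y})$: $x - 10 = 0.95(y - 22)$
$x = 0.95y - 20.9 + 10$, i.e., $x = 0.95y - 10.9$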
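Formulae [1] to [10] can be collected into one small Python function; the sketch below (the function name is hypothetical) applies them to the data of Example [6] and returns both regression lines as (slope, intercept) pairs together with r.

```python
import math


def regression_lines(x, y):
    """Both least squares lines from formulae [1]-[10] (divisor n)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    var_x = sum(v * v for v in x) / n - x_bar ** 2
    var_y = sum(v * v for v in y) / n - y_bar ** 2
    cov = sum(u * v for u, v in zip(x, y)) / n - x_bar * y_bar
    b_yx, b_xy = cov / var_x, cov / var_y
    return {
        "y_on_x": (b_yx, y_bar - b_yx * x_bar),   # y = b_yx*x + intercept
        "x_on_y": (b_xy, x_bar - b_xy * y_bar),   # x = b_xy*y + intercept
        "r": cov / math.sqrt(var_x * var_y),
    }


res = regression_lines([7, 6, 10, 14, 13], [22, 18, 20, 26, 24])
print([round(v, 4) for v in res["y_on_x"]],
      [round(v, 4) for v in res["x_on_y"]],
      round(res["r"], 4))
# [0.76, 14.4] [0.95, -10.9] 0.8497
```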
[7] The following data relate to marks in Mathematics (X) and marks in
Statistics (Y) of 10 candidates.
$\bar{U} = \frac{1}{n}\sum U_i = \frac{10}{10} = 1$ and $\bar{V} = \frac{1}{n}\sum V_i = \frac{-2}{10} = -0.2$
$\bar{x} = a + \bar{U} = 66 + 1 = 67$
$\sigma_U^2 = \frac{1}{n}\sum U_i^2 - \bar{U}^2 = \frac{40}{10} - (1)^2 = 3$
$\sigma_X^2 = \sigma_U^2 = 3$ (variance is independent of change of origin)
$\sigma_V^2 = \frac{1}{n}\sum V_i^2 - \bar{V}^2 = \frac{24}{10} - (-0.2)^2 = 2.36$
$\sigma_Y^2 = \sigma_V^2 = 2.36$ (variance is independent of change of origin)
Marks in Mathematics is 62
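[8] The following data give the retail food price index (X) and the wholesale food price
index (Y) for 10 years. Find the regression lines and hence find the correlation
coefficient.
X  89  86    74  65  65    63  66    67  72    79
Y  92  91.5  84  75  73.5  72  70.5  75  77.5  84
Taking U = X - 65 and V = Y - 73.5: $\sum U_i = 76$, $\sum V_i = 60$, so $\bar{U} = 7.6$, $\bar{V} = 6$, $\bar{x} = 72.6$, $\bar{y} = 79.5$.
$\sigma_U^2 = \frac{1}{n}\sum U_i^2 - \bar{U}^2 = \frac{1352}{10} - (7.6)^2 = 77.44 = \sigma_X^2$
$\sigma_V^2 = \frac{1}{n}\sum V_i^2 - \bar{V}^2 = \frac{918.50}{10} - (6)^2 = 55.85 = \sigma_Y^2$
$\mathrm{Cov}(x,y) = \mathrm{Cov}(U,V) = \frac{1}{n}\sum U_i V_i - \bar{U}\bar{V} = \frac{1094.5}{10} - (7.6)(6) = 63.85$
$b_{XY} = \frac{\mathrm{Cov}(x,y)}{\sigma_Y^2} = \frac{63.85}{55.85} = 1.1432$
$b_{YX} = \frac{\mathrm{Cov}(x,y)}{\sigma_X^2} = \frac{63.85}{77.44} = 0.8245$
Regression line of x on y is
$x - \bar{x} = b_{XY}(y - \bar{y})$: $x - 72.6 = 1.1432(y - 79.5)$, i.e., $x = 1.1432y - 18.29$
Regression line of y on x is
$y - \bar{y} = b_{YX}(x - \bar{x})$: $y - 79.5 = 0.8245(x - 72.6)$
$y = 0.8245x - 59.859 + 79.5$, i.e., $y = 0.8245x + 19.641$
Correlation coefficient is
$r^2 = b_{XY} \times b_{YX} = 1.1432 \times 0.8245 = 0.9426$, so $r = 0.9709$ (positive, since both regression coefficients are positive)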
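The following Python sketch recomputes Example [8] from the raw data and confirms that the change of origin U = X - 65, V = Y - 73.5 used above leaves the covariance and variances unchanged (the helper name `moments` is illustrative).

```python
import math


def moments(x, y):
    """Means, covariance and variances with divisor n."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((p - x_bar) * (q - y_bar) for p, q in zip(x, y)) / n
    var_x = sum((p - x_bar) ** 2 for p in x) / n
    var_y = sum((q - y_bar) ** 2 for q in y) / n
    return x_bar, y_bar, cov, var_x, var_y


x = [89, 86, 74, 65, 65, 63, 66, 67, 72, 79]
y = [92, 91.5, 84, 75, 73.5, 72, 70.5, 75, 77.5, 84]
x_bar, y_bar, cov, var_x, var_y = moments(x, y)

# Same Cov and variances after the change of origin U = X - 65, V = Y - 73.5
_, _, cov_uv, var_u, var_v = moments([p - 65 for p in x], [q - 73.5 for q in y])
print(math.isclose(cov, cov_uv), math.isclose(var_x, var_u), math.isclose(var_y, var_v))

b_yx, b_xy = cov / var_x, cov / var_y
r = math.copysign(math.sqrt(b_yx * b_xy), cov)
print(round(cov, 2), round(var_x, 2), round(var_y, 2))   # 63.85 77.44 55.85
print(round(b_yx, 4), round(y_bar - b_yx * x_bar, 3))    # 0.8245 19.641
print(round(b_xy, 4), round(x_bar - b_xy * y_bar, 2))    # 1.1432 -18.29
print(round(r, 4))                                       # 0.9709
```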
[9] The following are the results of an examination over the last 10
years, where X is the number of candidates who appeared and Y the number of successful candidates.
Taking U = X - 200 and V = Y - 164:
$\bar{U} = \frac{1}{n}\sum U_i = \frac{1074}{10} = 107.4$
$\bar{V} = \frac{1}{n}\sum V_i = \frac{866}{10} = 86.6$
$\bar{X} = 200 + \bar{U} = 307.4$, $\bar{Y} = 164 + \bar{V} = 250.6$
$\mathrm{Cov}(U,V) = \frac{1}{n}\sum U_i V_i - \bar{U}\bar{V} = \frac{237293}{10} - (107.4)(86.6) = 23729.3 - 9300.84 = 14428.46$
$\mathrm{Cov}(X,Y) = \mathrm{Cov}(U,V) = 14428.46$ (covariance is independent of change of origin)
$\sigma_U^2 = \frac{1}{n}\sum U_i^2 - \bar{U}^2 = \frac{218206}{10} - (107.4)^2 = 21820.6 - 11534.76 = 10285.84$
$\sigma_X^2 = \sigma_U^2 = 10285.84$ (variance is independent of change of origin)
Regression coefficient of Y on X is
$b_{YX} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X^2} = \frac{14428.46}{10285.84} = 1.4027498$
Regression line of Y on X is
$Y - \bar{Y} = b_{YX}(X - \bar{X})$: $Y - 250.6 = 1.4027498(X - 307.4)$
$Y - 250.6 = 1.4027498X - 431.2052$, i.e., $Y = 1.4027498X - 180.6052$ ....(1)
To estimate the number of successful candidates for the year 1996 if 400 candidates
appear for the examination, put X = 400 in equation (1):
$Y = 1.4027498(400) - 180.6052$
$Y = 380.49$
$Y \approx 380$
The estimated number of successful candidates for the year 1996 is about 380.
[10] The following data give the sales and expenses of 10 firms:
Firm no.         1   2   3   4   5   6   7   8   9   10
Sales (in 000)   45  70  65  30  90  40  50  75  85  60
Expenses         35  90  70  40  95  40  60  80  80  50
Obtain the least squares regression line of expenses on sales and estimate the expenses
if sales are Rs. 75,000. Also draw the residual plot and find the residual sum of squares.
X      Y      X²      Y²      XY      Regression estimate ŷ   Residual y - ŷ   (y - ŷ)²
45     35     2025    1225    1575    47.79                   -12.79           163.584
70     90     4900    8100    6300    73.110                  16.890           285.272
65     70     4225    4900    4550    68.046                  1.954            3.8181
30     40     900     1600    1200    32.598                  7.402            54.7896
90     95     8100    9025    8550    93.366                  1.634            2.6699
40     40     1600    1600    1600    42.726                  -2.726           7.43107
50     60     2500    3600    3000    52.854                  7.146            51.0653
75     80     5625    6400    6000    78.174                  1.826            3.3342
85     80     7225    6400    6800    88.302                  -8.302           68.9232
60     50     3600    2500    3000    62.982                  -12.982          168.532
610    640    40700   45350   42575   639.948                 0.0520           809.4196
$\bar{x} = \frac{1}{n}\sum x_i = \frac{610}{10} = 61$
$\bar{y} = \frac{1}{n}\sum y_i = \frac{640}{10} = 64$
$\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{40700}{10} - (61)^2 = 4070 - 3721 = 349$, $\sigma_x = 18.68$
$\sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{45350}{10} - (64)^2 = 4535 - 4096 = 439$, $\sigma_y = 20.95$
$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = \frac{42575}{10} - (61)(64) = 4257.5 - 3904 = 353.5$
Regression coefficient of y on x is
$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{353.5}{349} = 1.0128$
Regression line of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$: $y - 64 = 1.0128(x - 61)$
$y = 1.0128x - 61.7808 + 64$, i.e., $y = 1.0128x + 2.2192$ ....(1)
Since sales are recorded in thousands, sales of Rs. 75,000 correspond to X = 75. From equation (1),
$Y = 1.0128(75) + 2.2192 = 75.96 + 2.2192$
$Y = 78.18$ (estimated expenses)
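$\mathrm{SST} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\bar{y}^2 = 45350 - 40960 = 4390$
$\mathrm{SSE} = 809.4196$
$r^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{809.4196}{4390} = \frac{4390 - 809.4196}{4390} = \frac{3580.58}{4390}$
$r^2 = 0.81562$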
Residual Plot:-
[Figure: residual plot of the residuals y - ŷ (vertical axis, about -15 to 20) against sales X (horizontal axis, 0 to 100)]
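A Python sketch that recomputes Example [10] from the raw data: the line of expenses on sales, the estimate at X = 75 (sales in thousands), the residual sum of squares and r². The residual list is what the residual plot displays.

```python
x = [45, 70, 65, 30, 90, 40, 50, 75, 85, 60]   # sales (in 000)
y = [35, 90, 70, 40, 95, 40, 60, 80, 80, 50]   # expenses
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((p - x_bar) * (q - y_bar) for p, q in zip(x, y))   # 3535
s_xx = sum((p - x_bar) ** 2 for p in x)                       # 3490
b = s_xy / s_xx
a = y_bar - b * x_bar

residuals = [q - (a + b * p) for p, q in zip(x, y)]   # points of the residual plot
sse = sum(e * e for e in residuals)
sst = sum((q - y_bar) ** 2 for q in y)

print(round(b, 4), round(a, 4))                               # 1.0129 2.2135
print(round(a + b * 75, 2))                                   # 78.18
print(round(sse, 2), round(sst, 2), round(1 - sse / sst, 4))  # 809.42 4390.0 0.8156
```

The intercept differs slightly from the hand calculation (2.2192) only because the slope was truncated to 1.0128 there; the estimate at X = 75 and the residual sum of squares agree to two decimals.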
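Solution:- Given
$x + 2y - 5 = 0$ ………….(1)
$2x + 3y - 8 = 0$ ………….(2)
(i) To compute the correlation between x and y, assume that regression equation
(1) is the line of X on Y:
$x + 2y - 5 = 0 \Rightarrow x = 5 - 2y$
Comparing with $x = a + by$, $b_{xy} = -2$.
Then equation (2) is the line of Y on X:
$3y = 8 - 2x \Rightarrow y = \frac{8}{3} - \frac{2}{3}x$
Comparing with $y = a + bx$, $b_{yx} = -\frac{2}{3}$.
As we know,
$r^2 = b_{xy}\, b_{yx} = (-2)\left(-\frac{2}{3}\right) = \frac{4}{3} = 1.33$, which gives $|r| = 1.15$; but $|r| \le 1$, so this assumption is wrong.
Let regression equation (1) be the line of Y on X instead:
$x + 2y - 5 = 0 \Rightarrow 2y = 5 - x \Rightarrow y = \frac{5}{2} - \frac{1}{2}x$
Hence $b_{yx} = -\frac{1}{2}$.
Now let regression equation (2) be the line of X on Y:
$2x + 3y - 8 = 0 \Rightarrow 2x = 8 - 3y \Rightarrow x = 4 - \frac{3}{2}y$
Hence $b_{xy} = -\frac{3}{2}$.
We know,
$r^2 = b_{xy} \times b_{yx} = \left(-\frac{3}{2}\right)\left(-\frac{1}{2}\right) = \frac{3}{4} = 0.75$
$r = -0.866$ (r takes the common sign of the regression coefficients, which are both negative)
(ii) Estimate x when y = 2.5, using the line of X on Y, equation (2):
$2x + 3(2.5) - 8 = 0 \Rightarrow 2x - 0.5 = 0 \Rightarrow 2x = 0.5 \Rightarrow x = 0.25$
[12] For bivariate data the regression equations are 4x - 5y + 33 = 0 and 20x - 9y = 107.
Find the means of x and y, find the correlation coefficient between x and y, and also estimate
y when x = 10.
Solution:- Given equations are
$4x - 5y + 33 = 0$ ...........[i]
$20x - 9y = 107$ ..........[ii]
i] To find the means of x and y: the two regression lines intersect at $(\bar{x}, \bar{y})$,
therefore from equations [i] and [ii]
$4\bar{x} - 5\bar{y} = -33$ ...............[iii]
$20\bar{x} - 9\bar{y} = 107$ ............[iv]
Multiplying equation [iii] by 5, we get
$20\bar{x} - 25\bar{y} = -165$ .........[v]
Subtracting equation [iv] from [v], we get
$(20\bar{x} - 25\bar{y}) - (20\bar{x} - 9\bar{y}) = -165 - 107$
$-16\bar{y} = -272$
$\bar{y} = \frac{272}{16} = 17$
Putting the value of $\bar{y}$ in equation [i]:
$4\bar{x} - 5(17) + 33 = 0 \Rightarrow 4\bar{x} - 85 + 33 = 0 \Rightarrow 4\bar{x} = 52 \Rightarrow \bar{x} = 13$
ii] To find the correlation coefficient between x and y:
Let regression equation [i] be the line of Y on X:
$4x - 5y + 33 = 0 \Rightarrow 5y = 33 + 4x \Rightarrow y = \frac{33}{5} + \frac{4}{5}x$
Comparing with $y = a + b_{YX}x$, $b_{YX} = \frac{4}{5}$
Then regression equation [ii] is the line of X on Y:
$20x - 9y = 107 \Rightarrow 20x = 107 + 9y \Rightarrow x = \frac{107}{20} + \frac{9}{20}y$, so $b_{XY} = \frac{9}{20}$
As we know,
$r^2 = b_{XY} \cdot b_{YX} = \frac{9}{20} \times \frac{4}{5} = \frac{36}{100}$ (which is at most 1, so this assignment is valid)
$r = 0.6$
iii] For estimating y when x = 10:
$y = \frac{33}{5} + \frac{4}{5}(10) = 6.6 + 8$
$y = 14.6$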
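The trial-and-error reasoning used in Examples [11] and [12] (try one assignment of the lines, reject it if $b_{xy} b_{yx} > 1$) can be automated. The sketch below is a hypothetical helper, not a standard library function: each line p·x + q·y + c = 0 is passed as a tuple (p, q, c) with p, q nonzero, `fractions.Fraction` keeps the coefficients exact, and the means come from the intersection of the two lines.

```python
from fractions import Fraction
import math


def identify_regression_lines(line1, line2):
    """Each line is (p, q, c) for p*x + q*y + c = 0 with p, q nonzero.

    Tries both assignments (which line is Y on X) and keeps the one with
    0 <= b_yx * b_xy <= 1, as required since the product equals r**2.
    Returns (x_bar, y_bar, b_yx, b_xy, r).
    """
    (p1, q1, c1), (p2, q2, c2) = [tuple(map(Fraction, ln)) for ln in (line1, line2)]
    det = p1 * q2 - p2 * q1
    if det == 0:
        raise ValueError("lines are parallel or coincide; means cannot be found")
    # Both regression lines pass through (x_bar, y_bar): solve by Cramer's rule.
    x_bar = (q1 * c2 - q2 * c1) / det
    y_bar = (p2 * c1 - p1 * c2) / det
    for (py, qy), (px, qx) in (((p1, q1), (p2, q2)), ((p2, q2), (p1, q1))):
        b_yx = -py / qy   # p*x + q*y + c = 0  ->  y = -c/q - (p/q) x
        b_xy = -qx / px   # p*x + q*y + c = 0  ->  x = -c/p - (q/p) y
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = math.copysign(math.sqrt(product), b_yx)
            return x_bar, y_bar, b_yx, b_xy, r
    raise ValueError("the two equations cannot both be regression lines")


for name, l1, l2 in [
    ("[11]", (1, 2, -5), (2, 3, -8)),
    ("[12]", (4, -5, 33), (20, -9, -107)),
    ("[13]", (-1, 4, -19), (9, -1, -39)),
]:
    x_bar, y_bar, b_yx, b_xy, r = identify_regression_lines(l1, l2)
    print(name, x_bar, y_bar, b_yx, b_xy, round(r, 4))
# [11] 1 2 -1/2 -3/2 -0.866
# [12] 13 17 4/5 9/20 0.6
# [13] 5 6 1/4 1/9 0.1667
```

Applied to Example [15] (lines (3, -2, 1) and (3, -8, 13)), the sketch selects 3x - 8y + 13 = 0 as the line of Y on X, with means (1, 2), $b_{yx} = 3/8$, $b_{xy} = 2/3$ and r = 0.5; it then predicts y = 25/8 = 3.125 at x = 4 and x = 5/3 ≈ 1.667 at y = 3.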
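[13] For certain bivariate data the least squares lines of regression are 4y - x = 19
and 9x - y = 39. Obtain
(i) Regression coefficient of x on y
(ii) Regression coefficient of y on x
(iii) Correlation coefficient between x and y
Answer:- Given equations are,
$4y - x = 19$ ………… [i]
$9x - y = 39$ ………….[ii]
Let equation [i] be the regression line of x on y:
$4y - x = 19 \Rightarrow x = -19 + 4y$; comparing with $x = a + b_{XY}y$, $b_{XY} = 4$
Let equation [ii] be the regression line of y on x:
$9x - y = 39 \Rightarrow y = -39 + 9x$; comparing with $y = a + b_{YX}x$, $b_{YX} = 9$
Then $r^2 = b_{XY} b_{YX} = 36 > 1$, which is impossible, so this assumption is wrong. Hence [i] is the line of y on x and [ii] is the line of x on y:
[i]: $y = \frac{19}{4} + \frac{1}{4}x$, so (ii) $b_{YX} = \frac{1}{4}$
[ii]: $x = \frac{39}{9} + \frac{1}{9}y$, so (i) $b_{XY} = \frac{1}{9}$
(iii) $r^2 = b_{XY} \cdot b_{YX} = \frac{1}{9} \times \frac{1}{4} = \frac{1}{36}$, so $r = \frac{1}{6} = 0.167$ (positive, since both regression coefficients are positive)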
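(a) As we know,
$r^2 = b_{XY} \cdot b_{YX} = \frac{7}{5} \times \frac{2}{3} = \frac{14}{15} = 0.933$
$r = \pm 0.966$, taking the common sign of $b_{XY}$ and $b_{YX}$
(b) To find $\frac{\sigma_x}{\sigma_y}$: as we know,
$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2}$ …….(1)
$b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2}$ ……(2)
Dividing equation (1) by (2), we get
$\frac{b_{yx}}{b_{xy}} = \frac{\mathrm{Cov}(x,y)/\sigma_x^2}{\mathrm{Cov}(x,y)/\sigma_y^2} = \frac{\sigma_y^2}{\sigma_x^2} \quad\Rightarrow\quad \frac{\sigma_x^2}{\sigma_y^2} = \frac{b_{xy}}{b_{yx}}$
$\frac{\sigma_x^2}{\sigma_y^2} = \frac{7}{5} \times \frac{3}{2} = \frac{21}{10}$
$\frac{\sigma_x}{\sigma_y} = \sqrt{2.1} = 1.449$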
[15] For bivariate data on x and y, the two lines of regression are
3x - 2y + 1 = 0 and 3x - 8y + 13 = 0. Predict the value of y for x = 4 and the
value of x for y = 3.
Solution:- Given equations are
$3x - 2y + 1 = 0$ ……(1)
$3x - 8y + 13 = 0$ …….(2)
$\sigma_X^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{4400}{8} - (-5)^2 = 550 - 25 = 525$
$\sigma_Y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{167432}{8} - (35)^2 = 20929 - 1225 = 19704$
$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = \frac{21680}{8} - (-5)(35) = 2710 + 175 = 2885$
$b_{XY} = \frac{\mathrm{Cov}(x,y)}{\sigma_Y^2} = \frac{2885}{19704} = 0.1464$
$b_{YX} = \frac{\mathrm{Cov}(x,y)}{\sigma_X^2} = \frac{2885}{525} = 5.4952$
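[18] From the following data, obtain the yield when the rainfall is 29 inches.
        Rainfall (inches)   Yield (quintals per acre)
A.M.    27                  40
S.D.    3                   6
The correlation coefficient between rainfall and yield is 0.8.
Answer:-
Let x be rainfall and y be yield: $\bar{x} = 27$, $\bar{y} = 40$, $\sigma_x = 3$, $\sigma_y = 6$, r = 0.8
Regression equation of x on y:
$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$: $x - 27 = 0.8 \times \frac{3}{6}(y - 40)$
$x = 11 + 0.4y$ ..............(i)
Regression equation of y on x:
$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$: $y - 40 = 0.8 \times \frac{6}{3}(x - 27)$
$y = 1.6x - 43.2 + 40$
$y = 1.6x - 3.2$ ..................(ii)
Putting x = 29 in equation (ii):
$y = 1.6(29) - 3.2 = 46.4 - 3.2 = 43.2$
The expected yield is 43.2 quintals per acre when the rainfall is 29 inches.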
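When only means, standard deviations and r are given, as in Example [18], the regression coefficients follow from $b_{yx} = r\sigma_y/\sigma_x$ and $b_{xy} = r\sigma_x/\sigma_y$. A minimal Python sketch (the function name is illustrative):

```python
x_bar, y_bar = 27, 40      # mean rainfall (inches), mean yield (quintals per acre)
sd_x, sd_y, r = 3, 6, 0.8

b_yx = r * sd_y / sd_x     # 1.6, slope of the line of y on x
b_xy = r * sd_x / sd_y     # 0.4, slope of the line of x on y


def yield_for(rainfall):
    """Estimate the yield from the regression line of y on x."""
    return y_bar + b_yx * (rainfall - x_bar)


print(round(yield_for(29), 2))          # 43.2
print(round(x_bar - b_xy * y_bar, 2))   # 11.0, intercept of the line of x on y
```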
[ii] Regression line of x on y is
$(x - \bar{x}) = b_{XY}(y - \bar{y})$
$(x - 53) = -0.2(y - 28)$
$x = -0.2y + 58.6$ .............(i)
Putting y = 30 in equation (i):
$x = 58.6 - 0.2(30)$
$x = 52.6$
[iii] Regression line of y on x is
$(y - \bar{y}) = b_{YX}(x - \bar{x})$
$(y - 28) = -1.5(x - 53)$
$y = -1.5x + 107.5$
$y = 107.5 - 1.5x$ ................(ii)
Putting x = 60 in equation (ii), we get
$y = 107.5 - 1.5(60)$
$y = 17.5$
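[20] Obtain the coefficient of correlation and the regression lines from the
following data.
                                           X      Y
No. of observations                        15     15
Sum of squares of deviations from mean     136    138
Sum of products of deviations from mean       122
Solution:-
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{122}{\sqrt{136 \times 138}} = 0.8905$, $r^2 = 0.7931$
$\sigma_x = \sqrt{136/15} = 3.0111$, $\sigma_y = \sqrt{138/15} = 3.0332$
Regression line of x on y is
$(x - \bar{x}) = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$, where $r\frac{\sigma_x}{\sigma_y} = \frac{122}{138} = 0.8841$
$(x - \bar{x}) = 0.8841(y - \bar{y})$
Regression line of y on x is
$(y - \bar{y}) = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$, where $r\frac{\sigma_y}{\sigma_x} = \frac{122}{136} = 0.8971$
$(y - \bar{y}) = 0.8971(x - \bar{x})$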
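Because the factor n cancels, the regression coefficients in Example [20] can be computed straight from the sums of squares and products of deviations. A short Python sketch of that calculation (variable names are illustrative):

```python
import math

s_xx, s_yy, s_xy = 136, 138, 122   # sums of squares/products of deviations (n = 15)

r = s_xy / math.sqrt(s_xx * s_yy)
b_xy = s_xy / s_yy    # equals r * sigma_x / sigma_y, since n cancels
b_yx = s_xy / s_xx    # equals r * sigma_y / sigma_x

print(round(r, 4), round(r * r, 4))     # 0.8905 0.7931
print(round(b_xy, 4), round(b_yx, 4))   # 0.8841 0.8971
```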
(a) $r^2 = 1 - \frac{\text{unexplained variance}}{\text{total variance}}$
(b) $r^2 = \frac{\text{unexplained variance}}{\text{total variance}}$
(c) both
(d) none
Answer: (a) $r^2 = 1 - \frac{\text{unexplained variance}}{\text{total variance}}$
[15] If the line $Y = \frac{13}{2} - \frac{3}{2}X$ is the regression equation of y on x, then $b_{yx}$ is
(a) 2/3 (b) -2/3
(c) 3/2 (d) -3/2
Answer: (d) -3/2
[16] The line $X = 19 - \frac{5}{2}Y$ is the regression equation of x on y; then $b_{xy}$ is
(a) 19/2 (b) 5/2
(c) -5/2 (d) -2/5
Answer: (c) -5/2
[17] The line $X = \frac{31}{6} - \frac{1}{6}Y$ is the regression equation of
(a) Y on X (b) X on Y
(c) both (d) we can not say
Answer: (d) we can not say
[18] In the regression equation of x on y, $X = \frac{35}{8} - \frac{2}{5}Y$, $b_{xy}$ is equal to
(a) -2/5 (b) 35/8 (c) 2/5 (d) 5/2
Answer: (a) -2/5
[19] The correlation coefficient is +1 if the points of a scatter diagram lie on a
straight line whose slope is
(a) positive (b) negative (c) zero (d) none
Answer: (a) positive
[20] The correlation coefficient is -1 if the points of a scatter diagram lie on a
straight line whose slope is
(a) positive (b) negative (c) zero (d) none
Answer: (b) negative
[21] The more scattered the points are around a straight line in a scatter
diagram, the ……………. is the correlation coefficient.
(a) zero (b) more (c) less (d) none
Answer: (c) less
[22] If the values of y are not affected by changes in the values of x, the
variables are said to be
(a) correlated (b) uncorrelated
(c) both (d) zero
Answer: (b) uncorrelated
[23] If the amount of change in one variable tends to bear a constant ratio to
the amount of change in the other variable, then the correlation is said to be
(a) non-linear (b) linear (c) both (d) none
Answer: (b) linear
[24] The two regression lines coincide when r is equal to
(a) 0 (b) 2
(c) 1 (d) none
Answer: (c) 1
[27] When the variables are not independent, the correlation coefficient may
be zero.
(a) true (b) false
(c) both (d) none
Answer: (a) true
[28] bxy is called regression coefficient of
(a) x on y (b)y on x
(c) both (d)none
Answer: (a) x on y
[29] byx is called regression coefficient of
(a) x on y (b) y on x
(c) both (d)none
Answer: (b) y on x
[30] The slope of the regression line of y on x is denoted by
(a) byx (b) bxy (c) bxx (d) byy
Answer: (a) byx
[31] The slope of the regression line of x on y is denoted by
(a) byx (b) bxy (c) bxx (d) byy
Answer: (b) bxy
[32] The angle between the regression lines depends on
(a) correlation coefficient (b) regression coefficient
(c) both (d) none
Answer: (a) correlation coefficient
[33] If x and y satisfy the relationship y= -5+7x, the value of r is
(a) 0 (b) -1
(c) +1 (d) none
Answer: (c) +1
(c) $b_{yx} = r\frac{\sigma_x}{\sigma_y^2}$ (d) none
Answer: (b) $b_{yx} = r\frac{\sigma_y}{\sigma_x}$
Answer:[b]
[43] If the relationship between two variables x and y is given by
2x + 3y + 4 = 0, then the value of the correlation between x and y is
(a) 0 (b) 1
(c ) -1 (d) negative.
Answer:[c]
[44] If there are two variables x and y, the number of regression equations
could be
(a) 1 (b) 2 (c) Any other (d) 3
Answer:[b]
[45] Since the blood pressure of a person depends on age, we need to consider
(a) The regression equation of Blood Pressure on age
(b) The regression equation of age on Blood Pressure
(c) Both (a) and (b)
(d) Either (a) or (b)
Answer: [a]
[46] The method applied for deriving the regression equations is known as
(a) Least square (b) Concurrent deviation
(c) Product moment (d) Normal equation
Answer:[a]
[47] The difference between the observed value and the estimated value in
regression analysis is known as
(a) Error (b) Residual
(c) Deviation (d) (a) or (b)
Answer:[d]
[48] The errors in regression equations may be
(a) Positive (b) Negative
(c) Zero (d) All the above
Answer:[d]
[49] The regression line of y on x is derived by
(a) The minimization of vertical distances in the scatter diagram
(b) The minimization of horizontal distance in the scatter diagram
(c) Both (a) and (b)
(d) (a) or (b)
Answer:[a]
[50] The two lines of regression become identical when
(a) r = 1 (b ) r = -1
(c) r = 0 (d) (a) or (b)
Answer:[d]
[51] What are the limits of the two regression coefficients?
(a) No limit
(b) Must be positive
(c) one positive and the other negative
(d) The product of the regression coefficients must be numerically less than or
equal to unity.
Answer: [d]
[52] The regression coefficients remain unchanged due to a
(a) Shift of origin (b) Shift of scale
(c) Both (a) and (b) (d) (a) or (b).
Answer: (a) Shift of origin
[53] If the coefficient of correlation between two variables is -0.9, then the
coefficient of determination is
(a) 0.9 (b) 0.81 (c) 0.1 (d) 0.19
Answer: (b) 0.81
[54] If the coefficient of correlation between two variables is 0.7 then the
percentage of variation unaccounted for is
(a) 70% (b) 30% (c) 51% (d) 49%
Answer: (c) 51%
[55] If y = a + bx, then coefficient of correlation between x and y?
(a) 1 (b) -1
(c) 1 or -1 according as b > 0 or b < 0 (d) none of these.
Answer: (c)
[56] If u + 5x = 6 and 3y – 7v =20 and the correlation coefficient between x
and y is 0.58 then what would be the correlation coefficient between u and v?
(a) 0.58 (b) -0.58
(c) -0.84 (d) 0.84
Answer: (b) -0.58
[57] If the relation between x and u is 3x + 4u +7 = 0 and the correlation
coefficient between x and y is -0.6, then what is the correlation coefficient
between u and y?
(a) -0.6 (b) 0.8
(c) 0.6 (d) -0.8
Answer: (c) 0.6
[58] Following are the two normal equations obtained for deriving the
regression line of y on x: 5a + 10b = 40 and 10a + 25b = 95. The regression
line of y on x is given by
(a) 2x+3y=5 (b) 2y+3x=5
(c) y =2+3x (d) y=3+5x
Answer: (c) y =2+3x
[59] If the regression line of y on x and of x on y is given by 2x+3y=-1 and
5x+6y=-1 then the arithmetic means of x and y are given by
(a) (1, -1) (b) (-1, 1) (c) (-1, -1) (d) (2, 3)
Answer: (a) (1, -1)
[60] Given the regression equations as 3x+y=13 and 2x+5y=20, which one is
the regression equation of y on x?
(a) 3x+y=13 (b) 2x+5y=20
(c) both (a) and (b) (d) none of these
Answer: (b) 2x+5y=20
[61] Given the following equations: 2x - 3y = 10 and 3x + 4y = 15, which one is the
regression equation of x on y ?
(a) 2x-3y=10 (b) 3x+4y=15
(c) both the equation (d) none of these
Answer: (d) none of these
[62] If u=2x+5 and v=-3y-6 and regression coefficient of y on x is 2.4, what
is the regression coefficient of v on u?
(a) 3.6 (b) -3.6 (c) 2.4 (d) -2.4
Answer: (b) -3.6
[63] If 4y-5x=15 is the regression line of y on x and coefficient of correlation
between x and y is 0.75, what is the value of the regression coefficient of x on
y?
(a) 0.75 (b) 0.9375
(c) 0.6 (d) none of these
Answer: (d) none of these ($b_{yx} = \frac{5}{4}$, so $b_{xy} = \frac{r^2}{b_{yx}} = \frac{0.5625}{1.25} = 0.45$)
[64] If the regression line of y on x and that of x on y are given by y=-2x+3
and 8x=-y+3 respectively, what is the coefficient of correlation between x
and y?
(a) 0.5 (b) -1/√2
(c) -0.5 (d) none of these
Answer: (c) -0.5
[65] If the regression coefficient of y on x, the coefficient of correlation
between x and y and the variance of y are -3/4, $-\frac{\sqrt{3}}{2}$ and 4 respectively, what is
the variance of x?
(a) $\frac{2}{\sqrt{3}}$ (b) 16/3 (c) 4/3 (d) 4
Answer: (b) 16/3
[66] If y=3x+4 is the regression line of y on x and the arithmetic mean of x is
-1, what is the arithmetic mean of y ?
(a) 1 (b) -1
(c) 7 (d) none of these
Answer: (a) 1
[67] The regression equation of y on x for the following data:
X  41  82  62  37  58  96  127  74  123  100
Y  28  56  35  17  42  85  105  61  98   73
(a) Y = 1.2x - 15 (b) Y = 1.2x + 15
(c) Y = 0.93x - 14.64 (d) Y = 1.5x - 10.89
Answer: (c)
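A quick Python check of question [67] (a sketch using the reconstructed data table above):

```python
x = [41, 82, 62, 37, 58, 96, 127, 74, 123, 100]
y = [28, 56, 35, 17, 42, 85, 105, 61, 98, 73]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n                          # 80, 60
s_xy = sum((p - x_bar) * (q - y_bar) for p, q in zip(x, y))    # 8338
s_xx = sum((p - x_bar) ** 2 for p in x)                        # 8932
b = s_xy / s_xx
a = y_bar - b * x_bar

print(round(b, 4), round(a, 2))   # 0.9335 -14.68
```

Option (c) corresponds to rounding the slope to 0.933 before computing the intercept (60 - 0.933 × 80 = -14.64).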
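[68] The following data relate to the heights of 10 pairs of fathers and sons:
(175, 173), (172, 172), (167, 171), (168, 171), (172, 173), (171, 170),
(174, 173), (176, 175), (169, 170), (170, 173)
The regression equation of height of son (y) on that of father (x) is given by
(a) y = 100 + 5x (b) y = 99.708 + 0.405x
(c) y = 89.653 + 0.582x (d) y = 88.758 + 0.562x
Answer: (b) (its slope, 0.405, agrees with these data; however, the least squares intercept computed from them is about 102.60, since the line must pass through the means $\bar{x} = 171.4$, $\bar{y} = 172.1$)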
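A short Python sketch that fits the line of son's height on father's height for question [68], useful for checking the answer options:

```python
fathers = [175, 172, 167, 168, 172, 171, 174, 176, 169, 170]
sons = [173, 172, 171, 171, 173, 170, 173, 175, 170, 173]
n = len(fathers)

x_bar, y_bar = sum(fathers) / n, sum(sons) / n                          # 171.4, 172.1
s_xy = sum((p - x_bar) * (q - y_bar) for p, q in zip(fathers, sons))   # 32.6
s_xx = sum((p - x_bar) ** 2 for p in fathers)                          # 80.4
b = s_xy / s_xx
a = y_bar - b * x_bar

print(round(b, 3), round(a, 2))   # 0.405 102.6
```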
[69] The two regression coefficients for the following data ;
X 38 23 43 33 28
Y 28 23 43 38 8
Answer: (b) 8
[73] Given below is information about the capital employed and profit
earned by a company over the last twenty five years:
                                   Mean   S.D.
Capital employed ('0000 Rs.)       62     5
Profit earned ('000 Rs.)           25     6
The coefficient of correlation between capital employed and profit is 0.92. The
sum of the regression coefficients for the above data would be:
[2] Determine the two regression lines from the following data:
X 2 4 5 8 10
Y 4 16 25 64 100
Find
(i) Correlation coefficient between X and Y.
(ii) Estimate of Y when X = 60
(iii) Estimate of X when Y = 30
[6] The two regression equations of variables X and Y are 3X - Y - 5 = 0 and
4X - 3Y = 0. Find (i) the arithmetic means of X and Y, (ii) the coefficients of
variation of X and Y, if $\sigma_X = 2$, (iii) the correlation coefficient between X and
Y.
[7] The two regression equations of variables X and Y are 8X-10Y = -66 and
40X-18Y = 214. Find (i) Arithmetic mean of X and Y. (ii) Correlation
coefficient between X and Y.
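[8] Find the regression line of Y on X from the following data:
n = 10, $\sum x_i^2 = 385$, $\sum y_i^2 = 192$, $\bar{x} = 5.5$, $\bar{y} = 4$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 185$
[9] Find the regression line of Y on X from the following data. Also, estimate
Y when X = 0.
n = 100, $\sum x_i = 25$, $\sum y_i = 68$, $\sum x_i^2 = 167$, $\sum y_i^2 = 162$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 130$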