
Regression

Regression analysis is a statistical tool used to predict the value of one variable based on the known value of another, particularly in fields like business and economics. It involves determining the relationship between dependent and independent variables through a mathematical equation, typically represented as y = a + bx + e, where 'a' is the y-intercept, 'b' is the slope, and 'e' is the error term. The least squares principle is used to minimize the sum of the squares of the errors, allowing for the estimation of the regression coefficients.

Uploaded by

samridhikamakshi
Copyright
© © All Rights Reserved

REGRESSION LINE (FITTING OF CURVE)

Concept of Regression:
If two variables are significantly correlated, and if there is some theoretical
basis for doing so, it is possible to predict values of one variable from the
other. This observation leads to a very important concept known as
‘Regression Analysis’.
Regression analysis, in the general sense, means the estimation or
prediction of the unknown value of one variable from the known value of the
other variable. It is one of the most important statistical tools and is
used extensively in almost all sciences: natural, social and physical. It is
especially used in business and economics to study the relationship between
two or more variables that are causally related, and to estimate
demand and supply curves, cost functions, production and consumption
functions and so on.
Regression analysis was explained by M. M. Blair as follows:
“Regression analysis is a mathematical measure of the average relationship
between two or more variables in terms of the original units of the data.”
Some examples of pairs of related variables (one dependent, one independent):
(i) Hours spent studying Vs Marks scored by students
(ii) Amount of rainfall Vs Agricultural yield
(iii) Electricity usage Vs Electricity bill
(iv) Suicide rates Vs Number of people under stress
(v) Years of experience Vs Salary
(vi) Demand Vs Product price
(vii) Age Vs Beauty
(viii) Age Vs Health issues
(ix) Number of Degrees Vs Salary
(x) Number of Degrees Vs Education expenditure
Review/summary of objectives of regression:
[1] To determine whether a relationship exists between two variables
[2] To describe the nature of the relationship, should one exist, in the form of
a mathematical equation
[3] To assess the degree of accuracy of description or prediction achieved by
the regression equation.
Assumptions of Linear Regression:
[1] Relationship is approximately linear (approximates a straight line in
scatter plot of Y, X)
[2] For each value of X there is a probability distribution of independent
values of Y, and from each of these Y distributions one or more values are
sampled at random.
[3] The means of the Y distributions fall on the regression line.
Lines of Regression and Equation:
Simple regression:
It is used to examine the relationship between one dependent and one
independent variable. After performing an analysis, the regression statistics
can be used to predict the dependent variable when the independent variable
is known.
The regression line (known as the least squares line):
It is a plot of the expected value of the dependent variable for all values of the
independent variable. Technically, it is the line that "minimizes the squared
residuals". The regression line is the one that best fits the data on a
scatterplot.
Using the regression equation, the dependent variable may be
predicted from the independent variable. The slope of the regression line (b)
is defined as the rise divided by the run. The y intercept (a) is the point on the
y axis where the regression line would intercept the y axis.
The slope and y intercept are incorporated into the regression equation. The
intercept is usually called the constant, and the slope is referred to as the
coefficient. Since the regression model is usually not a perfect predictor,
there is also an error term in the equation.
Here is a way to mathematically describe a linear regression model:
y = a + bx + e
If the slope is significantly different from zero, then we can use the
regression model to predict the dependent variable for any value of the
independent variable.
If the slope is zero, the model has no predictive ability, because for every
value of the independent variable the prediction for the dependent variable
would be the same. Knowing the value of the independent variable would not
improve our ability to predict the dependent variable. Thus, if the slope is
not significantly different from zero, do not use the model to make predictions.
The standard error of the estimate for regression measures the amount of
variability in the points around the regression line. It is the standard deviation
of the data points as they are distributed around the regression line. The
standard error of the estimate can be used to develop confidence intervals
around a prediction.
Among all straight lines, the one that minimizes the sum of squares of the
differences between the observed values and the values given by the line is
chosen. This is called the least squares principle, and the equation so
obtained is called the least squares regression line.
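The least squares principle can be illustrated with a short sketch. The function names `fit_line` and `sum_sq_errors` and the sample data are hypothetical, chosen only for illustration; the code solves the two normal equations for a and b and lets the reader check that any other line gives a larger sum of squared errors.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum((y - a - b*x)**2) via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b


def sum_sq_errors(xs, ys, a, b):
    """Sum of squared differences between observed y and the line a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data, not taken from the text
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
best = sum_sq_errors(xs, ys, a, b)
# Shifting the line away from (a, b) can only increase the sum of squares
shifted = sum_sq_errors(xs, ys, a + 0.5, b)
print(best <= shifted)
```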

Regression as Prediction Model:-


Suppose we have a sample of size ‘n’ and it has two sets of measures,
denoted by x and y. We can predict the values of ‘y’ given the values of ‘x’
by using the equation, called the Regression Equation.
Y = a + bX
where,
Y is the dependent variable, measured in units of the dependent variable.
X is the independent variable, measured in units of the independent variable.
‘a’ is the Y-intercept, that is, the value of Y when X = 0. ‘b’ is the slope
of the line, known as the regression coefficient; it is the change in Y
associated with a one-unit change in X.
The greater the slope or regression coefficient, the more influence the
independent variable has on the dependent variable, and the more change in
Y associated with a change in X.
The regression coefficient is typically more important than the
intercept from a policy researcher perspective as we are usually interested in
the effect of one variable on another.
Coming back to the equation, we also have a term to capture the error
in our estimating equation, denoted by ε or e. Here ei is the difference
between the observed and the estimated value of Y and is called the error or
residual. It reflects the unexplained variation in Y, and its magnitude
reflects the goodness of fit of the regression line. The smaller the errors,
the closer the points are to our line.
So our general equation describing a line is:
Y = a + bX + e
Note: While deriving the regression model following things are
important.
[i] Paired values of X, Y
[ii] Regression equation
[iii] Apply the principle of least squares to the regression equation.
[iv] Take the partial derivatives w.r.t. a and b.
[v] We get normal equations.
[vi] Determine constant a from first normal equation.
[vii] Determine constant b from second normal equation.
[viii] Substitute values of the constants a and b in regression equation as
mentioned in point number [ii].
Derivation of Linear Regression Model of Y on X:
Suppose $(x_i, y_i)$, $i = 1, 2, \dots, n$, are n pairs of observations on the variables X and Y.
We take Y as the dependent variable, which can be expressed in terms of
X. The simplest form is the linear relation Y = a + bX. However,
when we observe numerical values of x and y, the relation may not
hold exactly.
We therefore assume the model
$$Y = a + bX + e \qquad (1)$$
with $E(e) = 0$ and $\mathrm{Var}(e) = \sigma^2$.
Equation (1) contains three unknown quantities: the constants a and b, and
the variance of the random error e. Our main aim is to estimate a and b by
the least squares principle.
We choose a and b so that the errors
$$e_i = y_i - a - b x_i$$
are as small as possible. By the principle of least squares we minimize the sum of squares of errors
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
The minimum is found by calculus: the solution of the equations $\partial S/\partial a = 0$ and $\partial S/\partial b = 0$ gives the extreme point.
$$\frac{\partial S}{\partial a} = \sum_{i=1}^{n} \frac{\partial}{\partial a}(y_i - a - b x_i)^2 = -2\sum (y_i - a - b x_i) = 0$$
$$\Rightarrow \sum (y_i - a - b x_i) = 0 \Rightarrow \sum y_i - na - b\sum x_i = 0$$
$$\Rightarrow \sum y_i = na + b\sum x_i \qquad (2)$$
Similarly,
$$\frac{\partial S}{\partial b} = \sum_{i=1}^{n} \frac{\partial}{\partial b}(y_i - a - b x_i)^2 = -2\sum (y_i - a - b x_i)\,x_i = 0$$
$$\Rightarrow \sum x_i y_i - a\sum x_i - b\sum x_i^2 = 0$$
$$\Rightarrow \sum x_i y_i = a\sum x_i + b\sum x_i^2 \qquad (3)$$
Equations (2) and (3) are referred to as the normal equations.
Solving equations (2) and (3) simultaneously, we get a and b. From (2),
$$na = \sum y_i - b\sum x_i \Rightarrow a = \frac{\sum y_i}{n} - b\,\frac{\sum x_i}{n}$$
$$a = \bar{y} - b\bar{x} \qquad (4)$$
Substituting $a = \bar{y} - b\bar{x}$ in equation (3) and using $\sum x_i = n\bar{x}$, we get
$$\sum x_i y_i = (\bar{y} - b\bar{x})\,n\bar{x} + b\sum x_i^2$$
$$\sum x_i y_i - n\bar{x}\bar{y} = b\left(\sum x_i^2 - n\bar{x}^2\right)$$
Dividing by n we get
$$\frac{\sum x_i y_i}{n} - \bar{x}\bar{y} = b\left(\frac{\sum x_i^2}{n} - \bar{x}^2\right)$$
But
$$\mathrm{Cov}(x,y) = \frac{\sum x_i y_i}{n} - \bar{x}\bar{y} \quad \text{and} \quad \mathrm{Var}(x) = \sigma_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2$$
so that $\mathrm{Cov}(x,y) = b\,\mathrm{Var}(x)$ and
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} \qquad (5)$$
Substituting equations (4) and (5) in the regression equation y = a + bx, we get
$$y = \bar{y} - b_{yx}\bar{x} + b_{yx}x$$
$$(y - \bar{y}) = b_{yx}(x - \bar{x}) = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2}(x - \bar{x})$$
Therefore, $(y - \bar{y}) = b_{yx}(x - \bar{x})$ represents the least squares regression equation of Y on X.
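The closed-form results of this derivation, b = Cov(x,y)/Var(x) and a = ȳ − b·x̄, can be checked numerically. The following is a minimal sketch; the function name `regression_y_on_x` is hypothetical, and the data are those of numerical example [6] in these notes.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b_yx) for the least squares line y = a + b_yx * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Population (divide-by-n) covariance and variance, as in the derivation
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    b_yx = cov_xy / var_x
    a = y_bar - b_yx * x_bar
    return a, b_yx


xs = [7, 6, 10, 14, 13]
ys = [22, 18, 20, 26, 24]
a, b_yx = regression_y_on_x(xs, ys)
print(f"y = {a:.2f} + {b_yx:.2f} x")  # y = 14.40 + 0.76 x
```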
Derivation of Linear Regression Model of X on Y:
Suppose $(x_i, y_i)$, $i = 1, 2, \dots, n$, are n pairs of observations on the variables X and Y.
We take X as the dependent variable, which can be expressed in terms of
Y. The simplest form is the linear relation X = a + bY. However,
when we observe numerical values of x and y, the relation may not
hold exactly.
We therefore assume the model
$$X = a + bY + e \qquad (1)$$
with $E(e) = 0$ and $\mathrm{Var}(e) = \sigma^2$.
Equation (1) contains three unknown quantities: the constants a and b, and
the variance of the random error e. Our main aim is to estimate a and b by
the least squares principle.
We choose a and b so that the errors
$$e_i = x_i - a - b y_i$$
are as small as possible. By the principle of least squares we minimize the sum of squares of errors
$$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (x_i - a - b y_i)^2$$
The minimum is found by calculus: the solution of the equations $\partial S/\partial a = 0$ and $\partial S/\partial b = 0$ gives the extreme point.
$$\frac{\partial S}{\partial a} = \sum_{i=1}^{n} \frac{\partial}{\partial a}(x_i - a - b y_i)^2 = -2\sum (x_i - a - b y_i) = 0$$
$$\Rightarrow \sum x_i - na - b\sum y_i = 0$$
$$\Rightarrow \sum x_i = na + b\sum y_i \qquad (2)$$
Similarly,
$$\frac{\partial S}{\partial b} = \sum_{i=1}^{n} \frac{\partial}{\partial b}(x_i - a - b y_i)^2 = -2\sum (x_i - a - b y_i)\,y_i = 0$$
$$\Rightarrow \sum x_i y_i - a\sum y_i - b\sum y_i^2 = 0$$
$$\Rightarrow \sum x_i y_i = a\sum y_i + b\sum y_i^2 \qquad (3)$$
Equations (2) and (3) are referred to as the normal equations.
Solving equations (2) and (3) simultaneously, we get a and b. From (2),
$$na = \sum x_i - b\sum y_i \Rightarrow a = \frac{\sum x_i}{n} - b\,\frac{\sum y_i}{n}$$
$$a = \bar{x} - b\bar{y} \qquad (4)$$
Substituting $a = \bar{x} - b\bar{y}$ in equation (3) and using $\sum y_i = n\bar{y}$, we get
$$\sum x_i y_i = (\bar{x} - b\bar{y})\,n\bar{y} + b\sum y_i^2$$
$$\sum x_i y_i - n\bar{x}\bar{y} = b\left(\sum y_i^2 - n\bar{y}^2\right)$$
Dividing by n we get
$$\frac{\sum x_i y_i}{n} - \bar{x}\bar{y} = b\left(\frac{\sum y_i^2}{n} - \bar{y}^2\right)$$
But
$$\mathrm{Cov}(x,y) = \frac{\sum x_i y_i}{n} - \bar{x}\bar{y} \quad \text{and} \quad \mathrm{Var}(y) = \sigma_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2$$
so that $\mathrm{Cov}(x,y) = b\,\mathrm{Var}(y)$ and
$$b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} \qquad (5)$$
Substituting equations (4) and (5) in the regression equation x = a + by, we get
$$x = \bar{x} - b_{xy}\bar{y} + b_{xy}y$$
$$(x - \bar{x}) = b_{xy}(y - \bar{y}) = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2}(y - \bar{y})$$
Therefore, $(x - \bar{x}) = b_{xy}(y - \bar{y})$ represents the least squares regression equation of X on Y.
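The line of X on Y is a different line from the line of Y on X rearranged for x, because it minimizes horizontal rather than vertical errors. A minimal sketch of this point follows; the function name `regression_x_on_y` is hypothetical, and the data are those of numerical example [6] in these notes.

```python
def regression_x_on_y(xs, ys):
    """Return (a, b_xy) for the least squares line x = a + b_xy * y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_y = sum(y * y for y in ys) / n - y_bar ** 2
    b_xy = cov_xy / var_y
    a = x_bar - b_xy * y_bar
    return a, b_xy


xs = [7, 6, 10, 14, 13]
ys = [22, 18, 20, 26, 24]
a, b_xy = regression_x_on_y(xs, ys)
print(f"x = {a:.2f} + {b_xy:.2f} y")  # x = -10.90 + 0.95 y

# Rearranging the Y-on-X line y = 14.4 + 0.76 x for x gives slope 1/0.76,
# which is not the same as b_xy unless the correlation is perfect.
slope_from_inverting = 1 / 0.76
print(abs(slope_from_inverting - b_xy) > 0.1)  # True
```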

Slope of the line:-


Slope is the ratio of the rise to the run. It is given by
$$\text{Slope} = \frac{\text{Rise}}{\text{Run}} = \frac{y_2 - y_1}{x_2 - x_1}$$
Rise means how far the line goes up or down, and run means how far it goes
from side to side.
The sign of the slope depends on the rise and the run, i.e.
If the line rises from left to right, the slope is positive.
If the line falls from left to right, the slope is negative.
If the line is parallel to the X-axis, the rise is zero and the slope of the line is zero:
$$\text{Slope} = \frac{0}{\text{Run}} = 0$$
If the line is parallel to the Y-axis, the run is zero and the slope of the line is undefined:
$$\text{Slope} = \frac{\text{Rise}}{0} \quad (\text{undefined})$$
Interpretation of Regression Coefficient:-
Definition: The Regression Coefficient is the constant ‘b’ in the regression
equation that tells about the change in the value of dependent variable
corresponding to the unit change in the independent variable. If there are two
regression equations, then there will be two regression coefficients:
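Regression Coefficient of X on Y: The regression coefficient of X on Y is
represented by the symbol $b_{xy}$ and measures the change in X for a unit
change in Y. Symbolically, it can be represented as:
$$b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2}, \quad \text{but } r = \frac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y} \;\Rightarrow\; b_{xy} = \frac{r\,\sigma_x \sigma_y}{\sigma_y^2} \;\Rightarrow\; b_{xy} = \frac{r\,\sigma_x}{\sigma_y}$$

Regression Coefficient of Y on X: The symbol $b_{yx}$ is used; it measures the
change in Y corresponding to a unit change in X. Symbolically, it can be
represented as:
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2}, \quad \text{but } r = \frac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y} \;\Rightarrow\; b_{yx} = \frac{r\,\sigma_x \sigma_y}{\sigma_x^2} \;\Rightarrow\; b_{yx} = \frac{r\,\sigma_y}{\sigma_x}$$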

The Regression Coefficient is also called as a slope coefficient because it


determines the slope of the line i.e. the change in the dependent variable for
the unit change in the independent variable.
Interpretation of Regression coefficients:
Regression coefficients are estimates of the
unknown population parameters and describe the relationship between
a predictor variable and the response. In linear regression, coefficients are the
values that multiply the predictor values. Suppose you have the following
regression equation: y = 3X + 5. In this equation, +3 is the coefficient, X is
the predictor, and +5 is the constant.
The sign of each coefficient indicates the direction of the relationship
between a predictor variable and the response variable.
A positive sign indicates that as the predictor variable increases, the response
variable also increases.
A negative sign indicates that as the predictor variable increases, the response
variable decreases.
The coefficient value represents the mean change in the response given a
one unit change in the predictor. For example, if a coefficient is +3, the mean
response value increases by 3 for every one unit change in the predictor.

Properties of Regression Coefficients:


[1] The correlation coefficient and the regression coefficients have the same
algebraic sign.
Proof:
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2}; \quad b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} \quad \text{and} \quad r = \frac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y}$$
Clearly, the numerator of each coefficient is the same and the denominator of
each coefficient is positive. Hence the numerator decides the algebraic sign,
and all three coefficients have the same algebraic sign. Hence,
If r > 0, then $b_{yx} > 0$ and $b_{xy} > 0$.
If r = 0, then $b_{yx} = 0 = b_{xy}$.
If r < 0, then $b_{yx} < 0$ and $b_{xy} < 0$.

[2] The correlation coefficient is the square root of the product of the
regression coefficients, i.e. $r = \pm\sqrt{b_{yx} \times b_{xy}}$; in other
words, the correlation coefficient is the geometric mean of the regression
coefficients.

Proof:
$$b_{yx} \times b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} \times \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \left(\frac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y}\right)^2 = r^2$$
$$\therefore r = \pm\sqrt{b_{yx} \times b_{xy}}$$
Note: Choose the positive square root if the regression coefficients are
positive, otherwise the negative one.
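Property [2], together with the sign rule in the note, can be verified numerically. This is a minimal sketch; the function names are hypothetical, and the first data set is that of numerical example [6] in these notes.

```python
import math


def regression_summary(xs, ys):
    """Return (b_yx, b_xy, r) using population (divide-by-n) moments."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return cov / var_x, cov / var_y, cov / math.sqrt(var_x * var_y)


def r_from_coefficients(b_yx, b_xy):
    """Geometric mean of the two coefficients, carrying their common sign."""
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)


b_yx, b_xy, r = regression_summary([7, 6, 10, 14, 13], [22, 18, 20, 26, 24])
print(round(r, 4), round(r_from_coefficients(b_yx, b_xy), 4))
```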

[3] Both regression coefficients cannot exceed unity simultaneously.
Proof: If possible, let us assume $b_{yx} > 1$ and $b_{xy} > 1$.
Then $b_{yx} \times b_{xy} > 1$, and hence $r^2 > 1$,
which is impossible since $r^2 \le 1$. Thus our assumption is incorrect.
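[4] The regression coefficients can be expressed in terms of the correlation
coefficient, i.e.
$$b_{yx} = \frac{r\,\sigma_y}{\sigma_x} \quad \text{and} \quad b_{xy} = \frac{r\,\sigma_x}{\sigma_y}$$
Proof:
We have
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2}, \quad \text{but } r = \frac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y} \Rightarrow \mathrm{Cov}(x,y) = r\,\sigma_x \sigma_y$$
$$\therefore b_{yx} = \frac{r\,\sigma_x \sigma_y}{\sigma_x^2} \Rightarrow b_{yx} = \frac{r\,\sigma_y}{\sigma_x}$$
Similarly,
$$b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{r\,\sigma_x \sigma_y}{\sigma_y^2} \Rightarrow b_{xy} = \frac{r\,\sigma_x}{\sigma_y}$$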
[6] Regression coefficients are invariant to the change of origin.
Note that Cov(x, y), $\sigma_x$ and $\sigma_y$ are invariant to the change of origin, hence
the regression coefficients are invariant to the change of origin. This property
makes the computation of regression coefficients simple: we can subtract a
constant from each observation before computing.
[7] Regression coefficients are invariant to the change of origin but not of
scale.
Proof:
Let
$$u = \frac{x - a}{h} \quad \text{and} \quad v = \frac{y - b}{k}$$
Then
$$\mathrm{Cov}(u, v) = \mathrm{Cov}\left(\frac{x-a}{h}, \frac{y-b}{k}\right) = \frac{1}{hk}\,\mathrm{Cov}(x, y)$$
$$\sigma_u^2 = \frac{1}{h^2}\,\sigma_x^2 \quad \text{and} \quad \sigma_v^2 = \frac{1}{k^2}\,\sigma_y^2$$
$$b_{uv} = \frac{\mathrm{Cov}(u, v)}{\sigma_v^2} = \frac{\mathrm{Cov}(x, y)/hk}{\sigma_y^2/k^2} = \frac{k}{h}\,\frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{k}{h}\,b_{xy}$$
$$b_{vu} = \frac{\mathrm{Cov}(u, v)}{\sigma_u^2} = \frac{\mathrm{Cov}(x, y)/hk}{\sigma_x^2/h^2} = \frac{h}{k}\,\frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{h}{k}\,b_{yx}$$
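The effect of a change of origin and scale on the regression coefficients can be checked with a few lines of code. This is a minimal sketch: the helper `coef` and the constants a, h, b, k are hypothetical choices for illustration, and the data are those of numerical example [6] in these notes.

```python
def coef(dep, indep):
    """Regression coefficient of `dep` on `indep`: Cov(dep, indep) / Var(indep)."""
    n = len(dep)
    d_bar, i_bar = sum(dep) / n, sum(indep) / n
    cov = sum((d - d_bar) * (i - i_bar) for d, i in zip(dep, indep)) / n
    var = sum((i - i_bar) ** 2 for i in indep) / n
    return cov / var


xs = [7, 6, 10, 14, 13]
ys = [22, 18, 20, 26, 24]
a, h = 10, 2   # hypothetical origin and scale for x
b, k = 22, 4   # hypothetical origin and scale for y
us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]

b_yx, b_xy = coef(ys, xs), coef(xs, ys)
b_vu, b_uv = coef(vs, us), coef(us, vs)
print(b_vu, (h / k) * b_yx)   # equal up to rounding
print(b_uv, (k / h) * b_xy)   # equal up to rounding
```

With h = k = 1 (a pure change of origin) the coefficients are unchanged, which is property [6].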
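[8] If $r = \pm 1$, then the regression coefficients are reciprocals of each other.

Proof: We have
$$b_{yx} \times b_{xy} = r^2 \Rightarrow b_{yx} \times b_{xy} = 1$$
$$\therefore b_{xy} = \frac{1}{b_{yx}} \quad \text{or} \quad b_{yx} = \frac{1}{b_{xy}}$$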

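[9] If $\sigma_x = \sigma_y$, then the regression coefficients are equal.

Proof:
We have
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{r\,\sigma_y}{\sigma_x} \quad \text{and} \quad b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{r\,\sigma_x}{\sigma_y}$$
but $\sigma_x = \sigma_y \Rightarrow b_{yx} = r = b_{xy}$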

[10] The product of the regression coefficients cannot exceed unity.

Proof:
$b_{yx} \times b_{xy} = r^2$, but $r^2 \le 1$.
Hence, $b_{yx} \times b_{xy} \le 1$.

[11] The acute angle (θ) between the regression lines is given by
$$\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$
Note:
(i) We see from the above expression that the larger the $r^2$, the smaller the
angle between the lines.
(ii) The point of intersection of the two regression lines is $(\bar{x}, \bar{y})$.
(iii) When $r = \pm 1$, then tan θ = 0, therefore θ = 0.
When the angle θ = 0, there are two possibilities: either the lines are
coincident or the lines are parallel. However, the regression
lines intersect at $(\bar{x}, \bar{y})$, so the second possibility is ruled out. Therefore,
for r = ± 1, the regression lines are coincident. In other words, if there is
perfect correlation, then the regression lines coincide.
(iv) If r = 0, then tan θ = ∞, therefore θ = π/2. Hence, the lines are
perpendicular to each other. The points on the scatter diagram will show
maximum spread. In other words, if the variables are uncorrelated, then the
regression lines are perpendicular to each other.
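The angle between the two regression lines can be computed directly. The sketch below assumes the standard textbook expression tan θ = ((1 − r²)/|r|) · σxσy/(σx² + σy²) and cross-checks it against the angle obtained from the two slopes in the (x, y) plane; the function names and the sample values of r, σx and σy are hypothetical.

```python
import math


def angle_between_regression_lines(r, sd_x, sd_y):
    """Acute angle (radians) between the regression lines of Y on X and X on Y."""
    if r == 0:
        return math.pi / 2  # uncorrelated variables: perpendicular lines
    tan_theta = (1 - r * r) / abs(r) * (sd_x * sd_y) / (sd_x ** 2 + sd_y ** 2)
    return math.atan(tan_theta)


def angle_from_slopes(r, sd_x, sd_y):
    """Same angle from the slopes m1 = b_yx and m2 = 1 / b_xy (requires r != 0)."""
    m1 = r * sd_y / sd_x          # slope of the line of Y on X
    m2 = sd_y / (r * sd_x)        # slope of the line of X on Y, drawn in the x-y plane
    return math.atan(abs((m2 - m1) / (1 + m1 * m2)))


theta = angle_between_regression_lines(0.5, 2.0, 3.0)
print(math.degrees(theta))
```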
Different Types of Variation in Regression of Y on X:-
Residual: The difference between the observed value of Y and its predicted
or estimated value is called the residual or error in prediction. It is denoted
by e and is given by
$$e_i = y_i - \hat{y}_i$$
It is examined in two ways: the residual plot and the residual sum of squares.
Residual plot:
It is obtained by plotting the residuals $(y_i - \hat{y}_i)$ on the y-axis against the $x_i$ values.
Total Variation (Total sum of squares):-
The variation of the observed values of the dependent variable y about their
mean is called total variation. It is denoted by SST or $\sum (y_i - \bar{y})^2$, and its average is given by
$$\text{Average total sum of squares} = \frac{1}{n}\sum (y_i - \bar{y})^2$$
The total variation splits into an explained and an unexplained part,
$$SST = SSR + SSE$$
because
$$(y_i - \bar{y}) = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$$
Unexplained Variation (Residual sum of squares):-
Variation not explained by the regression of y on x is called unexplained
variation. It is denoted by SSE or $\sum (y_i - \hat{y}_i)^2$, and its average is given by
$$\text{Average sum of squares due to error} = \frac{1}{n}\sum (y_i - \hat{y}_i)^2$$
Explained Variation (Sum of squares due to regression):-
Variation explained by the regression of y on x is called explained variation.
It is denoted by SSR or $\sum (\hat{y}_i - \bar{y})^2$, and its average is given by
$$\text{Average sum of squares due to regression} = \frac{1}{n}\sum (\hat{y}_i - \bar{y})^2$$
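Geometrical Interpretation:-

| X (hours studied) | 2 | 9 | 5 | 5 | 3 | 7 | 1 | 8 | 6 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Y (grade on exam) | 69 | 98 | 82 | 77 | 71 | 84 | 55 | 94 | 84 | 64 |

[Figure: scatter plot of grade on exam (Y-axis) against hours studied (X-axis) with the fitted least squares line y = 4.7426x + 55.036, R² = 0.9505]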
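The fitted line and R² quoted for the hours-studied data can be reproduced, and the decomposition SST = SSR + SSE checked, with a short script. This is a minimal sketch in plain Python; the variable names are illustrative choices, not part of the original notes.

```python
xs = [2, 9, 5, 5, 3, 7, 1, 8, 6, 2]             # hours studied
ys = [69, 98, 82, 77, 71, 84, 55, 94, 84, 64]   # grade on exam

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

b = s_xy / s_xx            # slope of the line of Y on X
a = y_bar - b * x_bar      # intercept
y_hat = [a + b * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))    # unexplained variation
r2 = ssr / sst

print(f"y = {b:.4f}x + {a:.3f}, R^2 = {r2:.4f}")
```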

Coefficient of determination:-
Definition: The coefficient of determination is the square of the coefficient
of correlation, r², and is calculated to interpret the value of the correlation. It
is useful because it gives the proportion of the variance in the dependent variable
that is explained by its relationship with the independent variable.
The coefficient of determination is the proportion of explained
variation, that is, the relative reduction in variance when the values of the
dependent variable are measured about the regression line rather than about
their mean. For example,
if the value of r = 0.8, then r² will be 0.64, which means that 64% of the
variation in the dependent variable is explained by the independent variable
while 36% remains unexplained.
Thus, the coefficient of determination is the ratio of the explained variance to
the total variance, and it tells us about the strength of the linear association between
the variables, say X and Y. The value of r² lies between 0 and 1 and is related
to ‘r’ as follows. As the value of ‘r’ decreases from
its maximum value of 1, ‘r²’ decreases much more rapidly. The magnitude
of ‘r’ is always greater than ‘r²’ unless r² = 0 or 1. The coefficient of
determination also indicates how well the regression line fits the
data. The closer the regression line is to the points plotted on a scatter
diagram, the more of the variation it explains; the farther the line is
from the points, the less of the variance it explains.
Understanding the P Value
The P value is another statistic, lying between 0 and 1, that
you will see after a regression analysis. Unlike R-squared, the P value does
not measure the strength of the relationship. It is the probability of
obtaining a relationship at least as strong as the one observed in the sample
if there were in fact no relationship (zero slope) in the population. A low P
value is evidence against the hypothesis of zero correlation, whereas a high P
value means that the data are consistent with zero correlation.
If the dependent variable truly does depend on the
independent variable and the sample is large enough, the P value will tend to
be low. If you are way off base and comparing apples to oranges, the P value
will usually be high.
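Regression Formulae
[1] $\bar{x} = \frac{1}{n}\sum x_i$
[2] $\bar{y} = \frac{1}{n}\sum y_i$
[3] $\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$
[4] $\sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2$
[5] $\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y}$
[6] $\mathrm{Corr}(x,y) = r = \frac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y}$
[7] Equation of a line of X on Y is X = a + bY. The regression line of X on Y is
$$(x - \bar{x}) = b_{xy}(y - \bar{y}) \Rightarrow (x - \bar{x}) = \frac{r\,\sigma_x}{\sigma_y}(y - \bar{y}), \quad \text{where } a = \bar{x} - b\bar{y}$$
[8] Equation of a line of Y on X is Y = a + bX. The regression line of Y on X is
$$(y - \bar{y}) = b_{yx}(x - \bar{x}) \Rightarrow (y - \bar{y}) = \frac{r\,\sigma_y}{\sigma_x}(x - \bar{x}), \quad \text{where } a = \bar{y} - b\bar{x}$$
[9] Regression coefficient of X on Y: $b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{r\,\sigma_x}{\sigma_y}$
[10] Regression coefficient of Y on X: $b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{r\,\sigma_y}{\sigma_x}$
[11] Sum of squares due to regression: $SSR = \sum (\hat{y}_i - \bar{y})^2$
[12] Sum of squares due to error: $SSE = \sum (y_i - \hat{y}_i)^2$
[13] Total sum of squares: $SST = \sum (y_i - \bar{y})^2$
[14] SST = SSR + SSE
[15] Mean sum of squares due to error: $MSSE = \frac{\sum (y_i - \hat{y}_i)^2}{n-2}$
[16] Coefficient of determination:
$$r^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = \frac{\text{Explained variation}}{\text{Total variation}}$$
[17] Adjusted r²:
$$\text{Adjusted } r^2 = 1 - \frac{MSSE}{MSST} = 1 - \frac{SSE/(n-2)}{SST/(n-1)}$$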
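Formulae [16] and [17] translate directly into code. The following is a minimal sketch for simple (one-predictor) regression; the function names and the sample values of SSE, SST and n are hypothetical.

```python
def r_squared(sse, sst):
    """Coefficient of determination: 1 - SSE / SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n):
    """Adjusted r^2 for one predictor: 1 - (SSE / (n - 2)) / (SST / (n - 1))."""
    return 1 - (sse / (n - 2)) / (sst / (n - 1))


# Hypothetical values: SSE = 20, SST = 100 from n = 12 observations
print(r_squared(20, 100))
print(adjusted_r_squared(20, 100, 12))
```

The adjusted value is never larger than r², because the error sum of squares is divided by the smaller number of degrees of freedom.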

Numerical Examples:
[1] Two examiners, A and B, independently examined seven candidates and
awarded the following marks:

| Candidate | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Marks by A | 40 | 34 | 28 | 30 | 44 | 38 | 31 |
| Marks by B | 32 | 39 | 26 | 30 | 38 | 34 | 28 |

An eighth candidate was awarded 36 marks by examiner A. Using the regression
line, estimate the marks awarded by examiner B.

Solution:- Let X be the marks given by A and Y the marks given by B. Since Y is
to be estimated from a known value of X, we need the regression line of Y on X.

| Candidate | Marks by A (X) | Marks by B (Y) | X² | Y² | XY |
|---|---|---|---|---|---|
| 1 | 40 | 32 | 1600 | 1024 | 1280 |
| 2 | 34 | 39 | 1156 | 1521 | 1326 |
| 3 | 28 | 26 | 784 | 676 | 728 |
| 4 | 30 | 30 | 900 | 900 | 900 |
| 5 | 44 | 38 | 1936 | 1444 | 1672 |
| 6 | 38 | 34 | 1444 | 1156 | 1292 |
| 7 | 31 | 28 | 961 | 784 | 868 |
| Total | 245 | 227 | 8781 | 7505 | 8066 |

$$\bar{x} = \frac{1}{n}\sum x_i = \frac{245}{7} = 35, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{227}{7} = 32.43$$
$$\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{8781}{7} - (35)^2 = 1254.43 - 1225 = 29.43$$
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = \frac{8066}{7} - (35)(32.43) = 1152.29 - 1135 = 17.29$$
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{17.29}{29.43} = 0.5874$$
∴ Y − ȳ = b_yx (X − x̄)
∴ Y − 32.43 = 0.5874 (X − 35)
∴ Y = 0.5874X + 11.87
For X = 36:
∴ Y = 0.5874(36) + 11.87 = 33.02
∴ Y ≈ 33

The marks given by examiner B are estimated to be 33.
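The estimate in example [1] can be cross-checked from the raw marks. This is a minimal sketch using the regression of Y (marks by B) on X (marks by A); the function name `predict_y_from_x` is hypothetical.

```python
marks_a = [40, 34, 28, 30, 44, 38, 31]   # X: marks by examiner A
marks_b = [32, 39, 26, 30, 38, 34, 28]   # Y: marks by examiner B


def predict_y_from_x(xs, ys, x_new):
    """Predict y at x_new from the least squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_yx = s_xy / s_xx
    return y_bar + b_yx * (x_new - x_bar)


estimate = predict_y_from_x(marks_a, marks_b, 36)
print(round(estimate))  # 33
```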

[2] The following data relate to the ages of husband and wife (in years) at the
time of marriage:

| Age of husband | 23 | 24 | 25 | 26 | 27 |
|---|---|---|---|---|---|
| Age of wife | 19 | 19 | 20 | 21 | 22 |

Estimate the age of the husband if the age of the wife is 20 years.

Solution:- Let x be the age of the husband and y the age of the wife. Since x is
to be estimated from y, we need the regression line of x on y.

| Age of husband (x) | Age of wife (y) | x² | y² | xy |
|---|---|---|---|---|
| 23 | 19 | 529 | 361 | 437 |
| 24 | 19 | 576 | 361 | 456 |
| 25 | 20 | 625 | 400 | 500 |
| 26 | 21 | 676 | 441 | 546 |
| 27 | 22 | 729 | 484 | 594 |
| Total: 125 | 101 | 3135 | 2047 | 2533 |

$$\bar{x} = \frac{1}{n}\sum x_i = \frac{125}{5} = 25, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{101}{5} = 20.2$$
$$\sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{2047}{5} - (20.2)^2 = 409.4 - 408.04 = 1.36$$
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = \frac{2533}{5} - (25)(20.2) = 506.6 - 505 = 1.6$$
$$b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{1.6}{1.36} = 1.1765$$
∴ x − x̄ = b_xy (y − ȳ)
∴ x − 25 = 1.1765 (y − 20.2)
∴ x = 1.1765y + 1.2353
For y = 20:
∴ x = 1.1765(20) + 1.2353 = 24.76
∴ x ≈ 25
The estimated age of the husband is about 25 years when the age of the wife is 20 years.

[3] Given the following information:
Mean height (x̄) = 120.5 cm, mean age (ȳ) = 10.37 years, S.D. of x = 12.7 cm,
S.D. of y = 2.39 years, correlation coefficient between x and y = 0.93.
(i) Fit the regression lines. (ii) Estimate the height of a boy of 12 years.
Solution:
Given,
$$\bar{x} = 120.5 \text{ cm}, \quad \bar{y} = 10.37 \text{ years}, \quad \sigma_x = 12.7 \text{ cm}, \quad \sigma_y = 2.39 \text{ years}, \quad r = 0.93$$
$$\sigma_x^2 = (12.7)^2 = 161.29, \qquad \sigma_y^2 = (2.39)^2 = 5.712$$
(i) The regression line of x on y is
$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$
$$x - 120.5 = 0.93\left(\frac{12.7}{2.39}\right)(y - 10.37)$$
$$x - 120.5 = 4.9418\,(y - 10.37)$$
$$x - 120.5 = 4.9418y - 51.246$$
$$x = 4.9418y + 69.254 \qquad (1)$$
The regression line of y on x is
$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$
$$y - 10.37 = 0.93\left(\frac{2.39}{12.7}\right)(x - 120.5)$$
$$y - 10.37 = 0.1750x - 21.089$$
$$y = 0.1750x - 10.719 \qquad (2)$$
(ii) Now we have to find x, i.e. the height of the boy, when y, i.e. the age, is 12 years.
From equation (1),
x = 4.9418(12) + 69.254
x = 128.56
The estimated height of a boy of 12 years is about 128.56 cm.

[4] Following is the information about a bivariate frequency distribution:
$$\sum x = 80, \quad \sum y = 40, \quad \sum x^2 = 1680, \quad \sum y^2 = 320, \quad \sum xy = 480, \quad n = 20$$
(i) Obtain the regression lines. (ii) Estimate y for x = 3 and estimate x for y = 3.

Solution:
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{80}{20} = 4, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{40}{20} = 2$$
$$\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{1680}{20} - (4)^2 = 84 - 16 = 68 \Rightarrow \sigma_x = 8.246$$
$$\sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{320}{20} - (2)^2 = 16 - 4 = 12 \Rightarrow \sigma_y = 3.464$$
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = \frac{480}{20} - (4)(2) = 24 - 8 = 16$$
(i) The regression coefficients are
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{16}{68} = 0.2353, \qquad b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{16}{12} = 1.3333$$
The regression line of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 2 = 0.2353(x - 4) \Rightarrow y = 0.2353x + 1.0588$$
The regression line of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x - 4 = 1.3333(y - 2) \Rightarrow x = 1.3333y + 1.3333$$
(ii) For x = 3:
Y = 0.2353(3) + 1.0588
Y = 0.7059 + 1.0588
Y = 1.7647
For y = 3:
X = 1.3333(3) + 1.3333
X = 5.3333

[5] Following is the information about a bivariate distribution:
The results for capital employed and profit earned by a firm in ten successive
years are summarised below.

| | Mean | S.D. |
|---|---|---|
| Capital employed (000’Rs.) | 55 | 28.7 |
| Profit earned (000’Rs.) | 13 | 85 |

Coefficient of correlation = 0.96. Estimate the amount of capital to be
employed to earn a profit of Rs. 20,000.

Solution:- Given, r = 0.96
Let capital employed = X (000’Rs.) and profit earned = Y (000’Rs.).
$$\bar{x} = 55, \quad \sigma_x = 28.7, \quad \bar{y} = 13, \quad \sigma_y = 85, \quad n = 10$$
The regression line of x on y is
$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \Rightarrow x - 55 = 0.96\left(\frac{28.7}{85}\right)(y - 13)$$
$$x - 55 = 0.3241\,(y - 13) \Rightarrow x - 55 = 0.3241y - 4.2138$$
$$x = 0.3241y + 50.786$$
We need the capital to be employed to earn a profit of Rs. 20,000. Since Y is
measured in thousands of rupees, Y = 20.
X = 0.3241(20) + 50.786
X = 57.268
The estimated capital to be employed is about Rs. 57,268.

[6] Determine the two regression lines from the following data:

| X | 7 | 6 | 10 | 14 | 13 |
|---|---|---|---|---|---|
| Y | 22 | 18 | 20 | 26 | 24 |

Solution:-

| Xi | Yi | Xi² | Yi² | XiYi |
|---|---|---|---|---|
| 7 | 22 | 49 | 484 | 154 |
| 6 | 18 | 36 | 324 | 108 |
| 10 | 20 | 100 | 400 | 200 |
| 14 | 26 | 196 | 676 | 364 |
| 13 | 24 | 169 | 576 | 312 |
| Total: 50 | 110 | 550 | 2460 | 1138 |

n = number of pairs of observations = 5
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{50}{5} = 10, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{110}{5} = 22$$
$$\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{550}{5} - (10)^2 = 110 - 100 = 10$$
$$\sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{2460}{5} - (22)^2 = 492 - 484 = 8$$
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\bar{y} = \frac{1138}{5} - (10)(22) = 227.6 - 220 = 7.6$$
The regression coefficient of X on Y is given by
$$b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} = \frac{7.6}{8} = 0.95$$
The regression coefficient of Y on X is given by
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{7.6}{10} = 0.76$$
The regression line of Y on X is
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 22 = 0.76(x - 10) \Rightarrow y = 0.76x - 7.6 + 22 \Rightarrow y = 0.76x + 14.4$$
The regression line of X on Y is
$$x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x - 10 = 0.95(y - 22) \Rightarrow x = 0.95y - 20.9 + 10 \Rightarrow x = 0.95y - 10.9$$

[7] The following data relate to marks in Mathematics (X) and marks in
Statistics (Y) of 10 candidates. Taking deviations U = X − 66 and V = Y − 68:

| X | Y | U = X − 66 | V = Y − 68 | U² | V² | UV |
|---|---|---|---|---|---|---|
| 66 | 68 | 0 | 0 | 0 | 0 | 0 |
| 65 | 67 | -1 | -1 | 1 | 1 | 1 |
| 68 | 67 | 2 | -1 | 4 | 1 | -2 |
| 68 | 70 | 2 | 2 | 4 | 4 | 4 |
| 67 | 65 | 1 | -3 | 1 | 9 | -3 |
| 66 | 68 | 0 | 0 | 0 | 0 | 0 |
| 70 | 70 | 4 | 2 | 16 | 4 | 8 |
| 64 | 66 | -2 | -2 | 4 | 4 | 4 |
| 69 | 68 | 3 | 0 | 9 | 0 | 0 |
| 67 | 69 | 1 | 1 | 1 | 1 | 1 |
| Total | | 10 | -2 | 40 | 24 | 13 |

$$\bar{U} = \frac{1}{n}\sum U_i = \frac{10}{10} = 1, \qquad \bar{V} = \frac{1}{n}\sum V_i = \frac{-2}{10} = -0.2$$
$$\bar{x} = a + \bar{U} = 66 + 1 = 67, \qquad \bar{y} = b + \bar{V} = 68 + (-0.2) = 67.8$$
$$\mathrm{Cov}(U,V) = \frac{1}{n}\sum U_i V_i - \bar{U}\bar{V} = \frac{13}{10} - (1)(-0.2) = 1.3 + 0.2 = 1.5$$
Covariance is independent of change of origin, so Cov(X, Y) = Cov(U, V) = 1.5.
$$\sigma_U^2 = \frac{1}{n}\sum U_i^2 - \bar{U}^2 = \frac{40}{10} - (1)^2 = 3$$
$$\sigma_V^2 = \frac{1}{n}\sum V_i^2 - \bar{V}^2 = \frac{24}{10} - (-0.2)^2 = 2.36$$
Variance is independent of change of origin, so σ²X = σ²U = 3 and σ²Y = σ²V = 2.36.

i] The regression coefficient of X on Y is
$$b_{xy} = \frac{\mathrm{Cov}(X,Y)}{\sigma_Y^2} = \frac{1.5}{2.36} = 0.6356$$
The regression coefficient of Y on X is
$$b_{yx} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X^2} = \frac{1.5}{3} = 0.5$$
$$r^2 = b_{xy}\,b_{yx} = (0.6356)(0.5) = 0.3178 \Rightarrow r = 0.56$$
ii] To estimate the marks in Statistics (Y) when the marks in Mathematics are X = 76,
use the regression line of Y on X:
$$Y - \bar{y} = b_{yx}(X - \bar{x}) \Rightarrow Y - 67.8 = 0.5(X - 67) \Rightarrow Y = 0.5X + 34.3$$
For X = 76: Y = 0.5(76) + 34.3 = 72.3
The estimated marks in Statistics are 72.3.
iii] To estimate the marks in Mathematics (X) when the marks in Statistics are Y = 60,
use the regression line of X on Y:
$$X - \bar{x} = b_{xy}(Y - \bar{y}) \Rightarrow X - 67 = 0.6356(Y - 67.8) \Rightarrow X = 0.6356Y + 23.906$$
For Y = 60: X = 0.6356(60) + 23.906 = 62.04

The estimated marks in Mathematics are 62.
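The shortcut used in example [7], working with deviations U = X − 66 and V = Y − 68 instead of the raw marks, relies on the regression coefficients being unchanged by a change of origin. A minimal sketch of that check follows; the helper name `coef` is hypothetical.

```python
maths = [66, 65, 68, 68, 67, 66, 70, 64, 69, 67]   # X
stats = [68, 67, 67, 70, 65, 68, 70, 66, 68, 69]   # Y


def coef(dep, indep):
    """Regression coefficient of `dep` on `indep`: Cov(dep, indep) / Var(indep)."""
    n = len(dep)
    d_bar, i_bar = sum(dep) / n, sum(indep) / n
    cov = sum((d - d_bar) * (i - i_bar) for d, i in zip(dep, indep)) / n
    var = sum((i - i_bar) ** 2 for i in indep) / n
    return cov / var


us = [x - 66 for x in maths]   # change of origin for X
vs = [y - 68 for y in stats]   # change of origin for Y

b_yx_raw, b_yx_dev = coef(stats, maths), coef(vs, us)
b_xy_raw, b_xy_dev = coef(maths, stats), coef(us, vs)
print(round(b_yx_raw, 4), round(b_yx_dev, 4))   # 0.5 0.5
print(round(b_xy_raw, 4), round(b_xy_dev, 4))   # 0.6356 0.6356
```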

[8] Following are data of retail food price index(x) &whole sale food price
index (y) for 10 years. Find the regression lines hence find correlation
coefficient.
X 89 86 74 65 65 63 66 67 72 79
Y 92 91.5 84 75 73.5 72 70.5 75 77.5 84

Solution:- Let X = retail food price index and Y = wholesale food price index. Take U = X - 65 and V = Y - 73.5.

X     Y      U=X-65   V=Y-73.5   U²     V²       UV
89    92     24       18.5       576    342.25   444
86    91.5   21       18         441    324      378
74    84     9        10.5       81     110.25   94.5
65    75     0        1.5        0      2.25     0
65    73.5   0        0          0      0        0
63    72     -2       -1.5       4      2.25     3
66    70.5   1        -3         1      9        -3
67    75     2        1.5        4      2.25     3
72    77.5   7        4          49     16       28
79    84     14       10.5       196    110.25   147
Total        76       60         1352   918.50   1094.5

$$\bar{U} = \frac{1}{n}\sum U_i = \frac{76}{10} = 7.6, \qquad \bar{V} = \frac{1}{n}\sum V_i = \frac{60}{10} = 6$$
$$\bar{X} = 65 + \bar{U} = 72.6, \qquad \bar{Y} = 73.5 + \bar{V} = 79.5$$
$$\mathrm{Cov}(U,V) = \frac{1}{n}\sum U_i V_i - \bar{U}\,\bar{V} = \frac{1094.5}{10} - (7.6)(6) = 109.45 - 45.6 = 63.85$$
Covariance is independent of change of origin, so $\mathrm{Cov}(X,Y) = \mathrm{Cov}(U,V) = 63.85$.
$$\sigma_U^2 = \frac{1}{n}\sum U_i^2 - \bar{U}^2 = \frac{1352}{10} - (7.6)^2 = 135.2 - 57.76 = 77.44$$
$$\sigma_V^2 = \frac{1}{n}\sum V_i^2 - \bar{V}^2 = \frac{918.50}{10} - (6)^2 = 91.85 - 36 = 55.85$$
Variance is independent of change of origin, so $\sigma_X^2 = \sigma_U^2 = 77.44$ and $\sigma_Y^2 = \sigma_V^2 = 55.85$.

Regression coefficients of x on y and of y on x:
$$b_{XY} = \frac{\mathrm{Cov}(x,y)}{\sigma_Y^2} = \frac{63.85}{55.85} = 1.1432$$
$$b_{YX} = \frac{\mathrm{Cov}(x,y)}{\sigma_X^2} = \frac{63.85}{77.44} = 0.8245$$

Regression line of x on y:
$$x - \bar{x} = b_{XY}(y - \bar{y}) \Rightarrow x - 72.6 = 1.1432(y - 79.5)$$
$$x = 1.1432y - 90.88 + 72.6 \Rightarrow x = 1.1432y - 18.28$$

Regression line of y on x:
$$y - \bar{y} = b_{YX}(x - \bar{x}) \Rightarrow y - 79.5 = 0.8245(x - 72.6)$$
$$y = 0.8245x - 59.859 + 79.5 \Rightarrow y = 0.8245x + 19.641$$

Correlation coefficient:
$$r^2 = b_{XY}\, b_{YX} = (1.1432)(0.8245) \approx 0.9426 \Rightarrow r \approx 0.971$$
(r is positive because both regression coefficients are positive.)
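The change-of-origin calculation above can be cross-checked numerically. The following is a minimal Python sketch, not part of the original notes; the function name `regression_from_shifted` and its argument names are assumptions made for illustration.

```python
# Regression coefficients via the change-of-origin shortcut used in problem [8].
# The helper name and signature are illustrative assumptions.

def regression_from_shifted(xs, ys, a, b):
    """Return (x_bar, y_bar, b_xy, b_yx) using U = X - a and V = Y - b."""
    n = len(xs)
    us = [x - a for x in xs]
    vs = [y - b for y in ys]
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    # Covariance and variances are unchanged by a shift of origin.
    cov = sum(u * v for u, v in zip(us, vs)) / n - u_bar * v_bar
    var_u = sum(u * u for u in us) / n - u_bar ** 2
    var_v = sum(v * v for v in vs) / n - v_bar ** 2
    return a + u_bar, b + v_bar, cov / var_v, cov / var_u

retail = [89, 86, 74, 65, 65, 63, 66, 67, 72, 79]
wholesale = [92, 91.5, 84, 75, 73.5, 72, 70.5, 75, 77.5, 84]

x_bar, y_bar, b_xy, b_yx = regression_from_shifted(retail, wholesale, 65, 73.5)
r = (b_xy * b_yx) ** 0.5  # positive root: both coefficients are positive here
print(round(x_bar, 1), round(y_bar, 1), round(b_xy, 4), round(b_yx, 4), round(r, 3))
```

Any other choice of the assumed origins `a` and `b` should give the same coefficients, which is the point of the shortcut.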

[9] Following are the results of [Link] examination in a certain for the last 10
years

Year   No. of candidates appeared   No. of successful candidates
1981   120                          100
1982   150                          137
1983   200                          164
1984   350                          302
1985   371                          356
1986   385                          379
1987   400                          375
1988   386                          381
1989   362                          331
1990   350                          350
Using the regression line, estimate the number of successful candidates for the year 1996 if
400 candidates appear for the examination.
Solution:- Let X = number of candidates appeared and Y = number of successful
candidates. Take U = X - 200 and V = Y - 164.

X     Y     U=X-200   V=Y-164   U²       V²       UV
120   100   -80       -64       6400     4096     5120
150   137   -50       -27       2500     729      1350
200   164   0         0         0        0        0
350   302   150       138       22500    19044    20700
371   356   171       192       29241    36864    32832
385   379   185       215       34225    46225    39775
400   375   200       211       40000    44521    42200
386   381   186       217       34596    47089    40362
362   331   162       167       26244    27889    27054
350   350   150       186       22500    34596    27900
Total       1074      1235      218206   261053   237293

$$\bar{U} = \frac{1}{n}\sum U_i = \frac{1074}{10} = 107.4, \qquad \bar{V} = \frac{1}{n}\sum V_i = \frac{1235}{10} = 123.5$$
$$\bar{X} = 200 + \bar{U} = 307.4, \qquad \bar{Y} = 164 + \bar{V} = 287.5$$
$$\mathrm{Cov}(U,V) = \frac{1}{n}\sum U_i V_i - \bar{U}\,\bar{V} = \frac{237293}{10} - (107.4)(123.5) = 23729.3 - 13263.9 = 10465.4$$
Covariance is independent of change of origin, so $\mathrm{Cov}(X,Y) = 10465.4$.
$$\sigma_U^2 = \frac{1}{n}\sum U_i^2 - \bar{U}^2 = \frac{218206}{10} - (107.4)^2 = 21820.6 - 11534.76 = 10285.84$$
Variance is independent of change of origin, so $\sigma_X^2 = 10285.84$.

Regression coefficient of Y on X:
$$b_{YX} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X^2} = \frac{10465.4}{10285.84} = 1.0175$$
Regression line of Y on X:
$$Y - \bar{Y} = b_{YX}(X - \bar{X}) \Rightarrow Y - 287.5 = 1.0175(X - 307.4)$$
$$Y = 1.0175X - 312.78 + 287.5 \Rightarrow Y = 1.0175X - 25.28 \quad \ldots(1)$$

To estimate the number of successful candidates for 1996 if 400 candidates
appear, put X = 400 in equation (1):
$$Y = 1.0175(400) - 25.28 = 381.72 \approx 382$$
So about 382 candidates are expected to be successful.
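As a check on the hand computation, the line of Y on X can also be fitted directly from the raw table, without any change of origin. This is a minimal Python sketch; the function name `fit_y_on_x` is an assumption made for illustration.

```python
# Least squares line of Y on X computed from raw data (problem [9]).
# fit_y_on_x is an illustrative helper, not something defined in the notes.

def fit_y_on_x(xs, ys):
    """Return (intercept, slope) of the regression line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    slope = cov / var_x               # b_yx = Cov(x, y) / var(x)
    return y_bar - slope * x_bar, slope

appeared = [120, 150, 200, 350, 371, 385, 400, 386, 362, 350]
passed = [100, 137, 164, 302, 356, 379, 375, 381, 331, 350]

intercept, slope = fit_y_on_x(appeared, passed)
estimate = intercept + slope * 400    # predicted successes when 400 appear
print(round(slope, 4), round(intercept, 2), round(estimate, 2))
```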
[10] The following data give the sales and expenses of 10 firms.
Firm no.           1    2    3    4    5    6    7    8    9    10
Sales (in 000)     45   70   65   30   90   40   50   75   85   60
Expenses           35   90   70   40   95   40   60   80   80   50

Obtain the least squares regression line of expenses on sales and estimate the expenses
if sales are Rs. 75000. Also draw the residual plot and find the residual sum of squares.
X     Y     X²      Y²      XY      Regression    Residual   (y - ŷ)²
                                    estimate ŷ    y - ŷ
45    35    2025    1225    1575    47.79         -12.79     163.584
70    90    4900    8100    6300    73.110        16.890     285.272
65    70    4225    4900    4550    68.046        1.954      3.8181
30    40    900     1600    1200    32.598        7.402      54.7896
90    95    8100    9025    8550    93.366        1.634      2.6699
40    40    1600    1600    1600    42.726        -2.726     7.43107
50    60    2500    3600    3000    52.854        7.146      51.0653
75    80    5625    6400    6000    78.174        1.826      3.3342
85    80    7225    6400    6800    88.302        -8.302     68.9232
60    50    3600    2500    3000    62.982        -12.982    168.532
610   640   40700   45350   42575   639.948       0.0520     809.4196

First we fit the regression line of y on x, $y - \bar{y} = b_{yx}(x - \bar{x})$.

$$\bar{x} = \frac{1}{n}\sum x_i = \frac{610}{10} = 61, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{640}{10} = 64$$
$$\sigma_x^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{40700}{10} - (61)^2 = 4070 - 3721 = 349 \Rightarrow \sigma_x = 18.68$$
$$\sigma_y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{45350}{10} - (64)^2 = 4535 - 4096 = 439 \Rightarrow \sigma_y = 20.95$$
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y} = \frac{42575}{10} - (61)(64) = 4257.5 - 3904 = 353.5$$
Regression coefficient of y on x:
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} = \frac{353.5}{349} \approx 1.0128$$
Regression line of y on x:
$$y - 64 = 1.0128(x - 61) \Rightarrow y = 1.0128x - 61.7808 + 64 \Rightarrow y = 1.0128x + 2.2192 \quad \ldots(1)$$
To estimate expenses when sales are Rs. 75000, note that sales are recorded in thousands, so X = 75. From equation (1):
$$Y = 1.0128(75) + 2.2192 = 75.96 + 2.2192 \approx 78.18$$
that is, about 78.18 in the units of the expenses column (about Rs. 78,180 if expenses are also recorded in thousands).

Substituting each X in equation (1) gives the estimate ŷ, and hence y - ŷ and (y - ŷ)²; this
completes columns (6), (7) and (8) of the table above.

Residual sum of squares:
$$SSE = \sum (y - \hat{y})^2 = 809.4196$$
Total sum of squares:
$$SST = \sum (y - \bar{y})^2 = \sum y^2 - n\bar{y}^2 = 45350 - 10(64)^2 = 45350 - 40960 = 4390$$
$$r^2 = 1 - \frac{SSE}{SST} = 1 - \frac{809.4196}{4390} = \frac{3580.58}{4390} = 0.81562$$
Residual Plot:-

[Figure: scatter plot of the residuals (series labelled "Y-Values"); horizontal axis from 0 to 100, vertical axis from -15 to 20.]

Interpretation:-There is no pattern seen on residual plot.
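The residual column, the residual sum of squares and r² for problem [10] can be reproduced with a few lines of Python. This is a sketch under the assumption that the fitted values are computed with unrounded coefficients, so individual residuals may differ from the table in the third decimal place; all variable names are illustrative.

```python
# Residuals, SSE, SST and r^2 for the sales/expenses data of problem [10].
sales = [45, 70, 65, 30, 90, 40, 50, 75, 85, 60]
expenses = [35, 90, 70, 40, 95, 40, 60, 80, 80, 50]

n = len(sales)
x_bar = sum(sales) / n
y_bar = sum(expenses) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sales, expenses))
s_xx = sum((x - x_bar) ** 2 for x in sales)

slope = s_xy / s_xx                   # b_yx
intercept = y_bar - slope * x_bar     # line passes through (x_bar, y_bar)

fitted = [intercept + slope * x for x in sales]
residuals = [y - f for y, f in zip(expenses, fitted)]

sse = sum(e * e for e in residuals)               # residual sum of squares
sst = sum((y - y_bar) ** 2 for y in expenses)     # total sum of squares
r_squared = 1 - sse / sst

print(round(slope, 4), round(intercept, 4))
print(round(sse, 2), round(sst, 2), round(r_squared, 4))
```

Plotting `residuals` against `sales` gives the residual plot; with unrounded coefficients the residuals sum to zero up to floating-point error.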

[11] The two lines of regression are x+2y-5=0 and 2x+3y-8=0.

(i) Compute the correlation between X and Y

(ii) Estimate X when y = 2.5

Solution:- Given

x+2y-5=0 ………….(1)

2x+3y-8=0 ……….(2)

(i) To compute the correlation between x and y, first assume that regression equation
(1) is X on Y:
$$x + 2y - 5 = 0 \Rightarrow x = 5 - 2y$$
Comparing with $x = a + by$ gives $b_{xy} = -2$.

Now let regression equation (2) be Y on X:
$$2x + 3y - 8 = 0 \Rightarrow 3y = 8 - 2x \Rightarrow y = \frac{8}{3} - \frac{2}{3}x$$
Comparing with $y = a + bx$ gives $b_{yx} = -\frac{2}{3}$.

Then
$$r^2 = b_{xy}\, b_{yx} = (-2)\left(-\frac{2}{3}\right) = \frac{4}{3} \approx 1.33 \Rightarrow |r| = 1.15$$
But $|r| \le 1$, hence our supposition is wrong.

Let regression equation (1) be y on x:
$$x + 2y - 5 = 0 \Rightarrow 2y = 5 - x \Rightarrow y = \frac{5}{2} - \frac{1}{2}x$$
Hence $b_{yx} = -\frac{1}{2}$.

Now let regression equation (2) be x on y:
$$2x + 3y - 8 = 0 \Rightarrow 2x = 8 - 3y \Rightarrow x = 4 - \frac{3}{2}y$$
Hence $b_{xy} = -\frac{3}{2}$.

We know
$$r^2 = b_{xy}\, b_{yx} = \left(-\frac{3}{2}\right)\left(-\frac{1}{2}\right) = \frac{3}{4} = 0.75 \Rightarrow r = -0.866$$
(r is negative because both regression coefficients are negative.)

(ii) To estimate x when y = 2.5, use the line of x on y, equation (2):
$$2x + 3(2.5) - 8 = 0 \Rightarrow 2x - 0.5 = 0 \Rightarrow x = \frac{0.5}{2} = 0.25$$

[12] For bivariate data the regression equations are 4x-5y+33=0 and 20x-9y=107.
Find the means of x and y and the correlation coefficient between x and y; also estimate
y when x=10.
Solution:- Given equations are
4x-5y+33=0 ...........[i]
20x-9y=107 ..........[ii]
i] To find the means of x and y: the two regression lines intersect at $(\bar{x}, \bar{y})$,
so we solve equations [i] and [ii] simultaneously.
4x-5y=-33 ...............[iii]
20x-9y=107 ............[iv]
Multiplying equation [iii] by 5, we get
20x-25y=-165 .........[v]
Subtracting equation [iv] from [v], we get
$$(20x - 25y) - (20x - 9y) = -165 - 107 \Rightarrow -16y = -272 \Rightarrow y = \frac{272}{16} = 17$$
Putting this value of y in equation [i]:
$$4x - 5(17) + 33 = 0 \Rightarrow 4x - 85 + 33 = 0 \Rightarrow 4x = 52 \Rightarrow x = 13$$
Hence $\bar{x} = 13$ and $\bar{y} = 17$.
ii] To find the correlation coefficient between x and y, let regression equation [i] be y on x:
$$4x - 5y + 33 = 0 \Rightarrow 5y = 33 + 4x \Rightarrow y = \frac{33}{5} + \frac{4}{5}x$$
Comparing with $y = a + b_{YX}x$ gives $b_{YX} = \frac{4}{5}$.
Then regression equation [ii] is x on y:
$$20x - 9y = 107 \Rightarrow 20x = 107 + 9y \Rightarrow x = \frac{107}{20} + \frac{9}{20}y$$
so $b_{XY} = \frac{9}{20}$.
As we know
$$r^2 = b_{XY}\, b_{YX} = \frac{9}{20} \times \frac{4}{5} = \frac{36}{100} \Rightarrow r = 0.6$$
Since $r^2 \le 1$, the assumption is admissible, and r is positive because both coefficients are positive.
iii] To estimate y when x = 10, use the line of y on x:
$$y = \frac{33}{5} + \frac{4}{5}x = \frac{33}{5} + \frac{4}{5}(10) = 6.6 + 8 = 14.6$$
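The trial-and-error step used in problems [11] and [12] (assume one line is y on x, check whether the product of the regression coefficients exceeds 1, and swap if it does) can be written as a short routine. This is a minimal Python sketch; the function names `means` and `classify` and the `(a, b, c)` encoding of a line a·x + b·y + c = 0 are assumptions made for illustration.

```python
from fractions import Fraction

def means(line1, line2):
    """Intersection of a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0 (Cramer's rule).
    The two regression lines meet at (x_bar, y_bar)."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    return Fraction(b1 * c2 - b2 * c1, det), Fraction(a2 * c1 - a1 * c2, det)

def classify(line1, line2):
    """Return (b_yx, b_xy, r) for the assignment whose product b_yx * b_xy lies in [0, 1].
    The two possible products are reciprocals, so at most one assignment passes
    unless the lines coincide."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        a1, b1, _ = y_on_x
        a2, b2, _ = x_on_y
        b_yx = Fraction(-a1, b1)      # solve the first line for y: coefficient of x
        b_xy = Fraction(-b2, a2)      # solve the second line for x: coefficient of y
        product = b_yx * b_xy
        if 0 <= product <= 1:
            sign = 1 if b_yx > 0 else -1   # r shares the sign of the coefficients
            return b_yx, b_xy, sign * float(product) ** 0.5
    raise ValueError("the two lines cannot be a pair of regression lines")

# Problem [11]: x + 2y - 5 = 0 and 2x + 3y - 8 = 0
print(classify((1, 2, -5), (2, 3, -8)))
# Problem [12]: 4x - 5y + 33 = 0 and 20x - 9y - 107 = 0
print(means((4, -5, 33), (20, -9, -107)), classify((4, -5, 33), (20, -9, -107)))
```

Exact fractions are used for the coefficients so that the comparison with 1 is not affected by rounding.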
[13] For a certain bivariate data the least squares lines of regression are 4y-x=19
and 9x-y=39. Obtain
(i) Regression coefficient of x on y
(ii) Regression coefficient of y on x
(iii) Correlation coefficient between x and y
Answer:- Given equations are,
4y-x=19 ………… [i]
9x-y=39 ………….[ii]
Let equation [i] be the regression line of x on y:
$$4y - x = 19 \Rightarrow x = 4y - 19$$
Comparing with $x = a + b_{XY}y$ gives $b_{XY} = 4$.
Let equation [ii] be the regression line of y on x:
$$9x - y = 39 \Rightarrow y = 9x - 39$$
Comparing with $y = a + b_{YX}x$ gives $b_{YX} = 9$.
As we know
$$r^2 = b_{XY}\, b_{YX} = 4 \times 9 = 36 \Rightarrow |r| = 6$$
But $|r| \le 1$, hence our assumption was wrong, and we interchange the roles of the
equations.
Regression equation [ii] is x on y:
$$9x - y = 39 \Rightarrow 9x = 39 + y \Rightarrow x = \frac{39}{9} + \frac{1}{9}y \Rightarrow b_{XY} = \frac{1}{9}$$
Regression equation [i] is y on x:
$$4y - x = 19 \Rightarrow 4y = 19 + x \Rightarrow y = \frac{19}{4} + \frac{1}{4}x \Rightarrow b_{YX} = \frac{1}{4}$$
Correlation coefficient:
$$r^2 = b_{XY}\, b_{YX} = \frac{1}{9} \times \frac{1}{4} = \frac{1}{36} \Rightarrow r = \frac{1}{6} \approx 0.167$$
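[14] The equations of the two regression lines are 2x+3y-6=0 and 5x+7y-12=0.
Obtain (a) the correlation coefficient between x and y, (b) $\sigma_x / \sigma_y$.
Solution:- Let 2x+3y-6=0 …….[i]
5x+7y-12=0 ……[ii]
(a) Assume regression equation [i] is y on x:
$$2x + 3y = 6 \Rightarrow 3y = 6 - 2x \Rightarrow y = 2 - \frac{2}{3}x$$
Comparing with $y = a + b_{YX}x$ gives $b_{YX} = -\frac{2}{3}$.
Let regression equation [ii] be x on y:
$$5x + 7y = 12 \Rightarrow 5x = 12 - 7y \Rightarrow x = \frac{12}{5} - \frac{7}{5}y$$
Comparing with $x = a + b_{XY}y$ gives $b_{XY} = -\frac{7}{5}$.

As we know
$$r^2 = b_{XY}\, b_{YX} = \left(-\frac{7}{5}\right)\left(-\frac{2}{3}\right) = \frac{14}{15} \approx 0.933$$
Since $r^2 \le 1$ the assumption is admissible, and because both regression coefficients are negative,
$$r = -\sqrt{0.933} \approx -0.966$$
(b) To find $\sigma_x / \sigma_y$, recall that
$$b_{yx} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} \quad \ldots(1) \qquad b_{xy} = \frac{\mathrm{Cov}(x,y)}{\sigma_y^2} \quad \ldots(2)$$
Dividing equation (1) by (2), we get
$$\frac{b_{yx}}{b_{xy}} = \frac{\mathrm{Cov}(x,y)}{\sigma_x^2} \cdot \frac{\sigma_y^2}{\mathrm{Cov}(x,y)} = \frac{\sigma_y^2}{\sigma_x^2} \Rightarrow \frac{\sigma_x^2}{\sigma_y^2} = \frac{b_{xy}}{b_{yx}}$$
$$\frac{\sigma_x^2}{\sigma_y^2} = \frac{7}{5} \times \frac{3}{2} = \frac{21}{10} = 2.1 \Rightarrow \frac{\sigma_x}{\sigma_y} = \sqrt{2.1} \approx 1.449$$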
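The two identities used in problem [14], r² = b_xy · b_yx and σ_x²/σ_y² = b_xy / b_yx, are easy to evaluate numerically. The following Python lines are a small sketch; the variable names are illustrative.

```python
from math import sqrt

# Regression coefficients read off the two lines in problem [14].
b_yx = -2 / 3      # from y = 2 - (2/3) x
b_xy = -7 / 5      # from x = 12/5 - (7/5) y

r_squared = b_xy * b_yx                 # product of two same-signed coefficients
r = -sqrt(r_squared)                    # negative root: both coefficients are negative
sd_ratio = sqrt(b_xy / b_yx)            # sigma_x / sigma_y

print(round(r_squared, 4), round(r, 4), round(sd_ratio, 4))
```

Because the two coefficients always share a sign, their product and their ratio are both non-negative, so both square roots are defined.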

[15] For a bivariate data on x and y the equations of the two lines of
regression are 3x-2y+1=0 and 3x-8y+13=0. Predict the value of y for x = 4 and
the value of x for y = 3.
Solution:- Given equations are
3x-2y+1=0 ……(1)
3x-8y+13=0 …….(2)

Assume regression equation (1) is x on y:
$$3x - 2y + 1 = 0 \Rightarrow 3x = 2y - 1 \Rightarrow x = \frac{2}{3}y - \frac{1}{3} \quad \ldots(3)$$
so $b_{XY} = \frac{2}{3}$.
Let regression equation (2) be y on x:
$$3x - 8y + 13 = 0 \Rightarrow 8y = 13 + 3x \Rightarrow y = \frac{13}{8} + \frac{3}{8}x \quad \ldots(4)$$
Comparing with $y = a + b_{YX}x$ gives $b_{YX} = \frac{3}{8}$.
We know,
$$r^2 = b_{YX}\, b_{XY} = \frac{3}{8} \times \frac{2}{3} = \frac{1}{4} \Rightarrow r = 0.5$$
Since $r^2 \le 1$, the assumption is admissible.
Equation (1) is x on y, so put y = 3 in equation (3) to get x:
$$x = \frac{2}{3}(3) - \frac{1}{3} = \frac{5}{3} \approx 1.67$$
Put x = 4 in equation (4) to get y:
$$y = \frac{13}{8} + \frac{3}{8}(4) = \frac{25}{8} = 3.125$$
[16] You are given the following information about two variables x and y:
$$n = 10,\quad \bar{x} = 5.5,\quad \bar{y} = 4,\quad \sum x^2 = 385,\quad \sum y^2 = 192,\quad \sum xy = 185$$
Find (i) the regression line of y on x, (ii) the regression line of x on y.

Solution:- The regression line of x on y is
$$(x - \bar{x}) = b_{XY}(y - \bar{y}), \qquad b_{XY} = r\,\frac{\sigma_X}{\sigma_Y} = \frac{\mathrm{Cov}(x,y)}{\sigma_Y^2}$$
The regression line of y on x is
$$(y - \bar{y}) = b_{YX}(x - \bar{x}), \qquad b_{YX} = r\,\frac{\sigma_Y}{\sigma_X} = \frac{\mathrm{Cov}(x,y)}{\sigma_X^2}$$
Here $\bar{x} = 5.5$ and $\bar{y} = 4$.
$$\sigma_X^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{385}{10} - 5.5^2 = 38.5 - 30.25 = 8.25$$
$$\sigma_Y^2 = \frac{1}{n}\sum y_i^2 - \bar{y}^2 = \frac{192}{10} - 4^2 = 19.2 - 16 = 3.2$$
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y} = \frac{185}{10} - (5.5)(4) = 18.5 - 22 = -3.5$$
$$b_{XY} = \frac{-3.5}{3.2} = -1.0937, \qquad b_{YX} = \frac{-3.5}{8.25} = -0.4242$$
(i) Regression line of Y on X:
$$y - 4 = -0.4242(x - 5.5) = -0.4242x + 2.3331 \Rightarrow y = 6.3331 - 0.4242x$$
(ii) Regression line of X on Y:
$$x - 5.5 = -1.0937(y - 4) = -1.0937y + 4.3748 \Rightarrow x = 9.8748 - 1.0937y$$
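[17] Compute the regression coefficients from the following data:
$$n = 8,\ \sum(x-45) = -40,\ \sum(x-45)^2 = 4400,\ \sum(y-150) = 280,\ \sum(y-150)^2 = 167432,\ \sum(x-45)(y-150) = 21680$$
Answer: Let u = x - 45 and v = y - 150. Variances and covariance are independent of change of origin, so they can be computed from u and v.
$$\bar{u} = \frac{1}{n}\sum u_i = \frac{-40}{8} = -5, \qquad \bar{v} = \frac{1}{n}\sum v_i = \frac{280}{8} = 35$$
$$\sigma_X^2 = \frac{1}{n}\sum u_i^2 - \bar{u}^2 = \frac{4400}{8} - (-5)^2 = 550 - 25 = 525$$
$$\sigma_Y^2 = \frac{1}{n}\sum v_i^2 - \bar{v}^2 = \frac{167432}{8} - (35)^2 = 20929 - 1225 = 19704$$
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum u_i v_i - \bar{u}\,\bar{v} = \frac{21680}{8} - (-5)(35) = 2710 + 175 = 2885$$
$$b_{XY} = \frac{\mathrm{Cov}(x,y)}{\sigma_Y^2} = \frac{2885}{19704} = 0.1464$$
$$b_{YX} = \frac{\mathrm{Cov}(x,y)}{\sigma_X^2} = \frac{2885}{525} = 5.4952$$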
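Problems [16] and [17] both start from summary totals rather than raw observations. The calculation can be sketched in Python as below; the function name `coefficients_from_sums` and its parameter names are assumptions made for illustration, and the sums may be taken about any assumed origins (zero in [16], 45 and 150 in [17]).

```python
def coefficients_from_sums(n, sum_u, sum_v, sum_uu, sum_vv, sum_uv):
    """Return (b_xy, b_yx) from totals of u = x - a and v = y - b.
    The assumed origins a and b do not affect variances or covariance."""
    u_bar = sum_u / n
    v_bar = sum_v / n
    cov = sum_uv / n - u_bar * v_bar
    var_u = sum_uu / n - u_bar ** 2
    var_v = sum_vv / n - v_bar ** 2
    return cov / var_v, cov / var_u

# Problem [17]: deviations taken from 45 and 150.
b_xy, b_yx = coefficients_from_sums(8, -40, 280, 4400, 167432, 21680)
print(round(b_xy, 4), round(b_yx, 4))

# Problem [16]: raw sums, with sum x = n * x_bar = 55 and sum y = n * y_bar = 40.
b_xy16, b_yx16 = coefficients_from_sums(10, 55, 40, 385, 192, 185)
print(round(b_xy16, 4), round(b_yx16, 4))
```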

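[18] From the following data obtain the yield when the rainfall is 29 inches.
        Rainfall (inches)   Yield (per acre)
A.M.    27                  40 quintal
S.D.    3                   6 quintal
The correlation coefficient between rainfall and yield is 0.8.
Answer:-
Let x = rainfall and y = yield, so $\bar{x} = 27$, $\bar{y} = 40$, $\sigma_x = 3$, $\sigma_y = 6$, $r = 0.8$.
Regression equation of x on y:
$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \Rightarrow x - 27 = 0.8 \times \frac{3}{6}(y - 40) \Rightarrow x = 11 + 0.4y \quad \ldots(i)$$
Regression equation of y on x:
$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \Rightarrow y - 40 = 0.8 \times \frac{6}{3}(x - 27)$$
$$y = 1.6x - 43.2 + 40 \Rightarrow y = 1.6x - 3.2 \quad \ldots(ii)$$

Putting x = 29 in equation (ii), we get
$$y = 1.6(29) - 3.2 = 43.2$$
The estimated yield is 43.2 quintal per acre when the rainfall is 29 inches.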
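When only the means, standard deviations and r are known, both lines follow from b_yx = r·σ_y/σ_x and b_xy = r·σ_x/σ_y. A minimal Python sketch follows; the helper names `line_y_on_x` and `line_x_on_y` are assumptions made for illustration.

```python
def line_y_on_x(x_bar, y_bar, sd_x, sd_y, r):
    """Return (intercept, slope) of y = intercept + slope * x."""
    slope = r * sd_y / sd_x
    return y_bar - slope * x_bar, slope

def line_x_on_y(x_bar, y_bar, sd_x, sd_y, r):
    """Return (intercept, slope) of x = intercept + slope * y."""
    slope = r * sd_x / sd_y
    return x_bar - slope * y_bar, slope

# Problem [18]: rainfall x (mean 27, S.D. 3), yield y (mean 40, S.D. 6), r = 0.8.
a_yx, b_yx = line_y_on_x(27, 40, 3, 6, 0.8)
a_xy, b_xy = line_x_on_y(27, 40, 3, 6, 0.8)

yield_at_29 = a_yx + b_yx * 29   # use y on x because yield is being predicted
print(round(a_yx, 2), round(b_yx, 2), round(a_xy, 2), round(b_xy, 2), round(yield_at_29, 2))
```

The same two helpers apply to problem [19]-style questions once r has been obtained from the regression coefficients.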
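[19] For a bivariate data we have
$\bar{x} = 53$, $\bar{y} = 28$, $b_{YX} = -1.5$, $b_{XY} = -0.2$. Find
i] Correlation coefficient between x and y
ii] Estimate of y for x=60
iii] Estimate of x for y=30
Solution:-
[i] $r^2 = b_{XY} \times b_{YX} = (-0.2)(-1.5) = 0.3 \Rightarrow r = -0.5477$
(r is negative because both regression coefficients are negative.)

[ii] The regression line of y on x is
$$(y - \bar{y}) = b_{YX}(x - \bar{x}) \Rightarrow (y - 28) = -1.5(x - 53)$$
$$y = -1.5x + 79.5 + 28 \Rightarrow y = 107.5 - 1.5x \quad \ldots(i)$$
Putting x = 60 in equation (i), we get
$$y = 107.5 - 1.5(60) = 17.5$$
[iii] The regression line of x on y is
$$(x - \bar{x}) = b_{XY}(y - \bar{y}) \Rightarrow (x - 53) = -0.2(y - 28)$$
$$x = -0.2y + 5.6 + 53 \Rightarrow x = 58.6 - 0.2y \quad \ldots(ii)$$
Putting y = 30 in equation (ii), we get
$$x = 58.6 - 0.2(30) = 52.6$$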

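[20] Obtain the coefficient of correlation and the regression lines from the
following data.

                                          X     Y
No. of observations                       15    15
Sum of squares of deviations from mean    136   138
Sum of products of deviations from mean      122

Solution:-

Here $\sum(x_i - \bar{x})^2 = 136$, $\sum(y_i - \bar{y})^2 = 138$, $n = 15$, $\sum(x_i - \bar{x})(y_i - \bar{y}) = 122$.

Now,
$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum(x_i - \bar{x})(y_i - \bar{y}) = \frac{122}{15} = 8.1333$$
$$\sigma_X^2 = \frac{1}{n}\sum(x_i - \bar{x})^2 = \frac{136}{15} = 9.0667 \Rightarrow \sigma_X = 3.0111$$
$$\sigma_Y^2 = \frac{1}{n}\sum(y_i - \bar{y})^2 = \frac{138}{15} = 9.2 \Rightarrow \sigma_Y = 3.0331$$
Regression coefficients of x on y and of y on x:
$$b_{XY} = \frac{\mathrm{Cov}(x,y)}{\sigma_Y^2} = \frac{8.1333}{9.2} = 0.8840$$
$$b_{YX} = \frac{\mathrm{Cov}(x,y)}{\sigma_X^2} = \frac{8.1333}{9.0667} = 0.8971$$
$$r^2 = b_{XY}\, b_{YX} = 0.8840 \times 0.8971 = 0.7930 \Rightarrow r = 0.8905$$
The regression line of x on y is
$$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}(y - \bar{y}) = 0.8905 \times \frac{3.0111}{3.0331}(y - \bar{y}) \Rightarrow (x - \bar{x}) = 0.884\,(y - \bar{y})$$
The regression line of y on x is
$$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x}) = 0.8905 \times \frac{3.0331}{3.0111}(x - \bar{x}) \Rightarrow (y - \bar{y}) = 0.897\,(x - \bar{x})$$
The means are not given, so the lines are left in deviation form.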
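In problem [20] the factor 1/n cancels in every ratio, so the regression coefficients and r can be computed straight from the sums of squares and products of deviations. A short Python sketch of that observation, with illustrative variable names:

```python
from math import sqrt

# Sums of squares and products of deviations from the means (problem [20]).
s_xx = 136   # sum of (x - x_bar)^2
s_yy = 138   # sum of (y - y_bar)^2
s_xy = 122   # sum of (x - x_bar)(y - y_bar)

b_xy = s_xy / s_yy                 # regression coefficient of x on y
b_yx = s_xy / s_xx                 # regression coefficient of y on x
r = s_xy / sqrt(s_xx * s_yy)       # correlation coefficient

print(round(b_xy, 4), round(b_yx, 4), round(r, 4))
```

The number of observations is needed only if the variances or the covariance themselves are required.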

QUESTION BANK ON REGRESSION


[1] ________ gives the mathematical relations of the variables
(a) correlation (b) regression
(c) both (d) none
Answer: (b) regression
[2] Under Algebraic Method we get ________ linear equations.
(a) one (b) two
(c) three (d) none
Answer: (b) two
[3] In linear equations Y= a+bX and X=a+bY ‘a’ is the
(a) intercept of the line (b) slope
(c) both (d) none
Answer: (a) intercept of the line
[4] In linear equation Y=a+bX and X=a+bY ‘b’ is the
(a) intercept of the line (b) slope of the line
(c) both (d) none
Answer: (b) slope of the line
[5] The regression equations Y=a+bX and X=a+bY are based on the
method of
(a) greatest squares (b) least squares
(c) both (d) none
Answer: (b) least squares
[6] The line Y= a + bX represents the regressions equations of
(a) Y on X (b) X on Y
(c) both (d) none
Answer: (a) Y on X
[7] The lines X= a+bY represents the regression equation of
(a) Y on X (b) X on Y
(c) both (d) none
Answer: (b) X on Y
[8] Two regression lines always intersect at the means
(a) true (b) false
(c) none (d) both
Answer: (a) true
[9] $r$, $b_{xy}$, $b_{yx}$ all have the ______ sign
(a) different (b) same
(c) both (d) none
Answer: (b) same
[10] The regression coefficients are zero if r is equal to
(a) 2 (b) -1
(c) 1 (d) 0
Answer: (d) 0
[11] The regression lines are identical if r is equal to
(a) +1 (b) -1
(c) ±1 (d) 0
Answer: (c) ±1
[12] The regression lines are perpendicular to each other if r is equal to
(a) 0 (b) +1
(c) -1 (d) ±1
Answer: (a) 0
[13] A feature of least squares regression lines is that the sum of the
deviations of the Y's or the X's from their regression lines is zero
(a) true (b) false
(c) both (d) none
Answer: (a) true
[14] The coefficient of determination is defined by the formula

(a) $r^2 = 1 - \dfrac{\text{unexplained variance}}{\text{total variance}}$

(b) $r^2 = \dfrac{\text{unexplained variance}}{\text{total variance}}$

(c) both

(d) none

Answer: (a) $r^2 = 1 - \dfrac{\text{unexplained variance}}{\text{total variance}}$
[15] If the line $Y = \frac{13}{2} - \frac{3X}{2}$ is the regression equation of y on x, then $b_{yx}$ is
(a) 2/3 (b) -2/3
(c) 3/2 (d) -3/2
Answer: (d) -3/2
[16] The line $X = 19 - \frac{5}{2}Y$ is the regression equation of x on y; then $b_{xy}$ is
(a) 19/2 (b) 5/2
(c) -5/2 (d) -2/5
Answer: (c) -5/2
[17] The line $X = \frac{31}{6} - \frac{1}{6}Y$ is the regression equation of
(a) Y on X (b) X on Y
(c) both (d) we cannot say
Answer: (d) we cannot say
[18] In the regression equation of x on y, $X = \frac{35}{8} - \frac{2}{5}Y$, $b_{xy}$ is equal to
(a) -2/5 (b) 35/8 (c) 2/5 (d) 5/2
Answer: (a) -2/5
[19] The correlation coefficient is +1 if the slope of the straight line in a
scatter diagram is
(a) positive (b) negative (c) zero (d) none
Answer: (a) positive
[20] The correlation coefficient being -1 if the slope of the straight line in a
scatter diagram is
(a) positive (b) negative (c) zero (d) none
Answer: (b) negative

[21] The more scattered the points are around a straight line in a scatter
diagram, the ……………. is the correlation coefficient.
(a) zero (b) more (c) less (d) none
Answer: (c) less
[22] If the values of y are not affected by changes in the values of x, the
variables are said to be
(a) correlated (b) uncorrelated
(c) both (d) zero
Answer: (b) uncorrelated
[23] If the amount of change in one variable tends to bear a constant ratio to
the amount of change in the other variable, then the correlation is said to be
(a) non -linear (b) linear (c) both (d) none
Answer: (b) linear
[24] Two regression lines coincide when r is equal to
(a) 0 (b) 2
(c) ±1 (d) none
Answer: (c) ±1

[25] Neither y nor x can be estimated by a linear function of the other


variable when r is equal to
(a) +1 (b) -1
(c) 0 (d) none
Answer: (c) 0
[26] When r = 0 then cov (x, y) is equal to
(a) +1 (b) -1
(c) 0 (d) none
Answer: (c) 0

[27] When the variables are not independent, the correlation coefficient may
be zero.
(a) true (b) false
(c) both (d) none
Answer: (a) true
[28] bxy is called regression coefficient of
(a) x on y (b)y on x
(c) both (d)none
Answer: (a) x on y
[29] byx is called regression coefficient of
(a) x on y (b) y on x
(c) both (d)none
Answer: (b) y on x
[30] The slopes of the regression line of y on x is denoted by
(a) byx (b) bxy (c) bxx (d) byy
Answer: (a) byx
[31] The slopes of the regression line of x on y is denoted by
(a) byx (b) bxy (c) bxx (d) byy
Answer: (b) bxy
[32] The angle between the regression lines depends on
(a) correlation coefficient (b) regression coefficient
(c) both (d) none
Answer: (a) correlation coefficient
[33] If x and y satisfy the relationship y= -5+7x, the value of r is
(a) 0 (b) -1
(c) +1 (d) none
Answer: (c) +1

[34] If byx and bxy are negative the r is


(a) positive (b) negative
(c) zero (d)none
Answer: (b) negative
[35] Correlation coefficient r lie between the regression coefficients byx and
bxy
(a) true (b)false (c) both (d)none
Answer: (a) true
[36] Since the correlation coefficient r cannot be greater than 1 numerically,
the product of the regression coefficient must
(a) not exceed 1 (b) exceed 1
(c) be zero (d) none
Answer: (a) not exceed 1
[37] The correlation coefficient r is the ------- of the two regression coefficients
$b_{yx}$ and $b_{xy}$
(a) A.M. (b) G.M.
(c) H.M. (d) none
Answer: (b) G.M.
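[38] Which is true?
(a) $b_{yx} = r\,\dfrac{\sigma_x}{\sigma_y}$ (b) $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$

(c) $b_{yx} = r\,\dfrac{\sigma_x}{\sigma_y^2}$ (d) none

Answer: (b) $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$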

[39] Maximum value of rank correlation coefficient is


(a) -1 (b) +1 (c) 0 (d) none
Answer: (b) +1
[40] The partial correlation coefficient lies between
(a) -1 and +1inclusive of these two value (b) 0 and +1
(c) -1 and (d) none
Answer: (a) -1 and +1inclusive of these two value
[41] Regression analysis is concerned with
[a] Establishing a mathematical relationship between two variables
[b] Measuring the extent of association between two variables
[c] Predicting the value of the dependent variable for a given value of
the independent variable.
[d] Both (a) and (c).
Answer: [d]
[42] In case the correlation coefficient between two variables is 1, the
relationship between the two variables would be
(a) y = a + bx (b) y = a + bx, b>0

(c) y = a + bx, b < 0 (d) y = a + bx , both a and b being positive

Answer:[b]
[43] If the relationship between two variables x and y is given by
2x+3y+4=0, then the value of the correlation between x and y is
(a) 0 (b) 1
(c ) -1 (d) negative.
Answer:[c]
[44] If there are two variables x and y, the number of regression equation
could be
(a) 1 (b) 2 (c) Any other (d) 3
Answer:[b]
[45] Since Blood Pressure of a person depends on age, we need consider
(a) The regression equation of Blood Pressure on age
(b) The regression equation of age on Blood Pressure
(c) Both (A) and (b)
(d) Either (a) or (b)
Answer: [a]
[46] The method applied for deriving the regression equations is known as
(a) Least square (b) Concurrent deviation
(c) Product moment (d) Normal equation
Answer:[a]
[47] The difference between the observed value and the estimated value in
regression analysis is known as
(a) Error (b) Residue
(c) Deviation (d) (a) or (b)
Answer:[d]
[48] The error in case of regression equations are
(a) Positive (b) Negative
(c) Zero (d) All the above
Answer:[d]
[49] The regression line of y on x is derived by
(a) The minimization of vertical distances in the scatter diagram
(b) The minimization of horizontal distance in the scatter diagram
(c) Both (a) and (b)
(d) (a) or (B)
Answer:[a]
[50] The two lines of regression become identical when
(a) r = 1 (b ) r = -1
(c) r = 0 (d) (a) or (b)
Answer:[d]
[51] What are the limits of the two regression coefficients?
(a) No limit
(b) Must be positive
(c) one positive and the other negative
(d)Product of the regression coefficient must be numerically less than
unit.
Answer: [d]
[52] The regression coefficients remain unchanged due to a
(a) Shift of origin (b) Shift of scale
(c) Both (a) and (b) (d) (a) or (b).
Answer: (a) Shift of origin
[53] If the coefficient of correlation between two variables is -0.9, then the
coefficient of determination is
(a) 0.9 (b) 0.81 (c) 0.1 (d) 0.19
Answer: (b) 0.81
[54] If the coefficient of correlation between two variables is 0.7 then the
percentage of variation unaccounted for is
(a) 70% (b) 30% (c) 51% (d) 49%
Answer: (c) 51%
[55] If y = a + bx, then coefficient of correlation between x and y?
(a) 1 (b) -1
(c) 1 or -1 according as b > 0 or b < 0 (d) none of these.
Answer: (c)
[56] If u + 5x = 6 and 3y – 7v =20 and the correlation coefficient between x
and y is 0.58 then what would be the correlation coefficient between u and v?
(a) 0.58 (b) -0.58
(c) -0.84 (d) 0.84
Answer: (b) -0.58
[57] If the relation between x and u is 3x + 4u +7 = 0 and the correlation
coefficient between x and y is -0.6, then what is the correlation coefficient
between u and y?
(a) -0.6 (b) 0.8
(c) 0.6 (d) -0.8
Answer: (c) 0.6
[58] Following are the two normal equation obtained for deriving the
regression line of y and x: 5a+10b=40 and 10a+25b=95. The regression
line of y on x is given by
(a) 2x+3y=5 (b) 2y+3x=5
(c) y =2+3x (d) y=3+5x
Answer: (c) y =2+3x
[59] If the regression line of y on x and of x on y is given by 2x+3y=-1 and
5x+6y=-1 then the arithmetic means of x and y are given by
(a) (1, -1) (b) (-1, 1) (c) (-1, -1) (d) (2, 3)
Answer: (a) (1, -1)
[60] Given the regression equations as 3x+y=13 and 2x+5y=20, which one is
the regression equation of y on x?
(a) 3x+y=13 (b) 2x+5y=20
(c) both (a) and (b) (d) none of these
Answer: (b) 2x+5y=20
[61] Given the following equation: 2x-3y=10 and 3x+4y=15, which one is the
regression equation of x on y ?
(a) 2x-3y=10 (b) 3x+4y=15
(c) both the equation (d) none of these
Answer: (d) none of these
[62] If u=2x+5 and v=-3y-6 and regression coefficient of y on x is 2.4, what
is the regression coefficient of v on u?
(a) 3.6 (b) -3.6 (c) 2.4 (d) -2.4
Answer: (b) -3.6
[63] If 4y-5x=15 is the regression line of y on x and coefficient of correlation
between x and y is 0.75, what is the value of the regression coefficient of x on
y?
(a) 0.75 (b) 0.9375
(c) 0.6 (d) none of these
Answer: (d) none of these
[64] If the regression line of y on x and that of x on y are given by y=-2x+3
and 8x=-y+3 respectively, what is the coefficient of correlation between x
and y?
(a) 0.5 (b) -1/√2
(c) -0.5 (d) none of these
Answer: (c) -0.5
[65] If the regression coefficient of y on x, the coefficient of correlation
between x and y and the variance of y are $-3/4$, $-\frac{\sqrt{3}}{2}$ and 4 respectively, what is
the variance of x?
(a) $2/\sqrt{3/2}$ (b) 16/3 (c) 4/3 (d) 4
Answer: (b) 16/3
[66] If y=3x+4 is the regression line of y on x and the arithmetic mean of x is
-1, what is the arithmetic mean of y ?
(a) 1 (b) -1
(c) 7 (d) none of these
Answer: (a) 1
[67] The regression equation of y on x for the following data
X   41   82   62   37   58   96   127   74   123   100
Y   28   56   35   17   42   85   105   61   98    73
(a) Y=1.2x-15 (b) Y=1.2x+15
(c) Y=0.93x-14.64 (d) Y=1.5x-10.89
Answer: (c)

[68] The following data relate to the heights of 10 pairs of fathers and sons;
(175,173) , (172, 172), (167, 171), (168,171), (172,173 ), (171,170),
(174,173), (176,175), (169,170) (170,173)
The regression equation of height of son on that of father is given by
(a) y=100+5x (b) y=99.708+0.405x
(c) y=89.653 +0.582 x (d) y=88.758+0.562x
Answer: (b)
[69] The two regression coefficients for the following data ;
X 38 23 43 33 28
Y 28 23 43 38 8

(a) 1.2 and 0.4 (b) 1.6and0.8


(c) 1.7 and 0.8 (d) 1.8 and 0.3
Answer: (a)
[70] For y =25, what is the estimated value of x , from the following data:
X 11 12 15 16 18 19 21
Y 21 15 13 12 11 10 9

(a) 15 (b) 13.926


(c) 13.588 (d) 14.986
Answer: (c)
[71] Given the following data
Variable X Y
Mean 80 98
Variance 4 9
Coefficient of correlation=0.6. What is the most likely value of y when x =
90?
(a) 90 (b) 103 (c) 104 (d) 107
Answer: (d) 107
[72] The two lines of regression are 8x+10y=25 and 16x+5y=12
respectively; If the variance of x is 25, what is the standard deviation of y ?

(a) 16 (b) 8 (c) 64 (d) 4

Answer: (b) 8
[73] Given below is the information about the capital employed and profit
earned by a company over the last twenty-five years:
                               Mean   S.D.
Capital employed (0000 Rs.)    62     5
Profit earned (000 Rs.)        25     6
Coefficient of correlation between capital employed and profit = 0.92. The
sum of the regression coefficients for the above data would be;

(a) 1.871 (b) 2.358 (c) 1.968 (d) 2.346

Answer: (a) 1.871

[74] The coefficient of correlation between cost of advertisement and sales of


a product on the basis of the following data;
Ad cost (000 Rs.) 75 81 85 105 93 113 121 125
Sales (000 Rs. ) 35 45 59 75 43 79 87 95

(a) 0.85 (b) 0.89 (c) 0.95 (d) 0.98


Answer: (c) 0.95

[A] THEORY QUESTIONS:


[1] Define the term ‘regression’ in details.
[2] State utility of regression lines.
[3] Define regression coefficients and state its properties.
[4] How would you interpret regression coefficients?
[5] State the situations where regression analysis is used
[6] Derive the expression for regression lines of Y on X.
[7] Derive the expression for regression lines of X on Y.
[8] Derive standard error of regression estimate
[9] Explain the following terms:
[i] Explained variation of dependent variable
[ii] Unexplained variation of dependent variable
[iii] Coefficient of determination
[10] Show that the regression lines intersect at $(\bar{x}, \bar{y})$.

[11] Show that $r$, $b_{YX}$, $b_{XY}$ have the same algebraic sign.


[B] Numerical Problems:
[1] Determine the two regression lines from the following data:
X 1 2 3 4 5
Y 5 4 3 2 1

[2] Determine the two regression lines from the following data:
X 2 4 5 8 10
Y 4 16 25 64 100

[3] Following are the data of marks in Statistics and Mathematics of 5
students
Statistics    78   82   88   90   95
Mathematics   71   76   80   88   100
(i) Calculate Correlation coefficients.
(ii) Calculate regression coefficients.
(iii) Estimate marks in Mathematics when he has scored 93 marks in
Statistics.
(iv) Estimate marks in Statistics when he has scored 85 marks in
Mathematics.
[4] From the following data, correlation coefficient between rainfall and yield
is 0.8. Obtain the yield when the rainfall is 30 inches.
Rainfall (inches) Yield (per acre)
Arithmetic mean 28 40
Standard deviation 4 6
[5] For a bivariate data:
Arithmetic means X = 53 Y = 28
Regression coefficient b YX = -1.5 b XY = -0.2

Find
(i) Correlation coefficient between X and Y.
(ii) Estimate of Y when X = 60
(iii) Estimate of X when Y = 30
[6] The two regression equations of variables X and Y are 3X-Y-5 = 0 and
4X-3Y = 0. Find (i) Arithmetic means of X and Y. (ii) Coefficients of
variation of X and Y, if $\sigma_X = 2$. (iii) Correlation coefficient between X and
Y.
[7] The two regression equations of variables X and Y are 8X-10Y = -66 and
40X-18Y = 214. Find (i) Arithmetic mean of X and Y. (ii) Correlation
coefficient between X and Y.
[8] Find the regression line of Y on X from the following data:
$$n = 10,\ \sum x_i^2 = 385,\ \sum y_i^2 = 192,\ \bar{x} = 5.5,\ \bar{y} = 4,\ \sum (x_i - \bar{x})(y_i - \bar{y}) = 185$$

[9] Find the regression line of Y on X from the following data. Also, estimate
Y when X = 0
$$n = 100,\ \sum x_i = 25,\ \sum y_i = 68,\ \sum x_i^2 = 167,\ \sum y_i^2 = 162,\ \sum (x_i - \bar{x})(y_i - \bar{y}) = 130$$

[10] Find the regression line of X on Y from the following data:


$$n = 20,\ \sum x_i^2 = 285,\ \sum y_i^2 = 172,\ \bar{x} = 4.5,\ \bar{y} = 3,\ \sum (x_i - \bar{x})(y_i - \bar{y}) = 40$$
