Simple Regression and Correlation
DEFINITION
REGRESSION ANALYSIS : the process of estimating a functional relationship between a random variable Y (dependent variable) and one or more variables X (independent/explanatory variables = predictors); that is, we estimate the parameters of the regression equation to predict a value of Y for given values of the X's.
One variable X : Simple Regression
More than one variable X : Multiple Regression
CORRELATION ANALYSIS : the study of measuring the direction (positive/negative) and strength (strong/weak) of the relationship between two random variables.
FRAMEWORK OF REGRESSION ANALYSIS

THEORY → MODEL → DATA → PARAMETER ESTIMATION → STATISTICAL TEST (t-TEST, F-TEST)
SAMPLE REGRESSION MODELS

Linear :
Simple : Y = a + b1X1 + e
Multiple : Y = a + b1X1 + … + bnXn + e

Non-linear, examples :
Polynomial : Y = a + b1X + b2X² + b3X³ + e
Reciprocal : Y = a + b1/X + e
Interaction : Y = a + b1X1 + b2X2 + b3X1X2 + e
Semi-log : Y = a + b1 ln X + e
Double-log : log Y = a + b1 log X + e  (see the sketch below)
Multiplicative : Y = a·K^b1·L^b2·e  (linearized by taking logs, as in the double-log form)
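Several of these non-linear forms can still be estimated with OLS after a transformation. As a minimal sketch (not from the original slides; numpy and the data points are assumed purely for illustration), the double-log model is fit by regressing log Y on log X:

```python
import numpy as np

# Hypothetical data roughly following Y = a * X^b1 (double-log form).
X = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
Y = np.array([2.1, 2.9, 4.2, 5.9, 8.4])

# Taking logs linearizes the model: log Y = log a + b1 * log X,
# so ordinary least squares on (log X, log Y) recovers the parameters.
b1, log_a = np.polyfit(np.log(X), np.log(Y), deg=1)
print(f"a = {np.exp(log_a):.3f}, b1 = {b1:.3f}")
```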
PARAMETER ESTIMATION

Method : Ordinary Least Squares (OLS), which minimizes the sum of the squared residuals, $\sum e^2$.

Normal equations :

$$\sum Y = n\,a + b \sum X$$
$$\sum XY = a \sum X + b \sum X^2$$

Short-cut :

$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}, \qquad a = \frac{\sum Y - b \sum X}{n} = \bar{Y} - b\bar{X}$$
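The two normal equations above can also be solved directly as a 2×2 linear system; a small sketch with toy data (numpy and the data are assumed for illustration):

```python
import numpy as np

# Toy data, purely for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
n = len(X)

# Normal equations in matrix form:
#   [ n       sum(X)   ] [a]   [ sum(Y)  ]
#   [ sum(X)  sum(X^2) ] [b] = [ sum(XY) ]
A = np.array([[n, X.sum()], [X.sum(), (X ** 2).sum()]])
rhs = np.array([Y.sum(), (X * Y).sum()])
a, b = np.linalg.solve(A, rhs)
print(f"a = {a:.4f}, b = {b:.4f}")
```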
Year Consumption ($) Income ($)
1997 70 80
1998 65 100
1999 90 120
2000 95 140
2001 110 160
2002 115 180
2003 120 200
2004 140 220
2005 155 240
2006 150 260
[Worksheet columns : Year, Cons (Y), Inc (X), XY, Y², X², Ŷ, Ȳ, (Y − Ȳ)², (Y − Ŷ)², (Ŷ − Ȳ)²]
a = 24.4545, R = 0.9808
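Applying the short-cut formulas to the consumption/income table reproduces these estimates; a small pure-Python check:

```python
# Consumption (Y) and income (X) data from the table above.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2 = sum(x * x for x in X)
sy2 = sum(y * y for y in Y)

# Short-cut OLS formulas and the Pearson correlation coefficient.
b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
a = (sy - b * sx) / n
R = (n * sxy - sx * sy) / ((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)) ** 0.5
print(f"a = {a:.4f}, b = {b:.4f}, R = {R:.4f}")
# -> a = 24.4545, b = 0.5091, R = 0.9808
```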
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.9808
R Square 0.9621
Adjusted R Square 0.9573
Standard Error 6.4930
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 8552.7273 8552.7273 202.8679 0.0000006
Residual 8 337.2727 42.1591
Total 9 8890
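Each figure in this output follows from the ANOVA sums of squares and the formulas used later in this section; a short check:

```python
# Sums of squares as reported in the ANOVA table above.
SSR, SSE, SST = 8552.7273, 337.2727, 8890.0
n, k = 10, 1  # observations, number of independent variables

r_square = SSR / SST                                       # 0.9621
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 1 - k)  # 0.9573
std_error = (SSE / (n - 1 - k)) ** 0.5                     # 6.4930
F = (SSR / k) / (SSE / (n - 1 - k))                        # 202.8679
print(r_square, adj_r_square, std_error, F)
```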
[Figure : scatter plot of consumption (Y) against income (X) with fitted line y = 0.5091x + 24.455, R² = 0.9621]
CORRELATION
Pearson product-moment correlation coefficient (for interval-scaled and ratio-scaled data) :

$$R = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[\,n \sum X^2 - (\sum X)^2\,\right]\left[\,n \sum Y^2 - (\sum Y)^2\,\right]}}$$

or

$$R = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n-1)\,s_x s_y}$$

where $s$ denotes the sample standard deviation.
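Both forms give the same value; a quick check on the consumption/income data from the example (the statistics module supplies the sample standard deviations):

```python
import statistics as st

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n = len(X)

# Sum-based form.
sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2, sy2 = sum(x * x for x in X), sum(y * y for y in Y)
r1 = (n * sxy - sx * sy) / ((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)) ** 0.5

# Deviation form with sample standard deviations.
xbar, ybar = st.mean(X), st.mean(Y)
r2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / (
    (n - 1) * st.stdev(X) * st.stdev(Y))
print(f"{r1:.4f}  {r2:.4f}")  # both 0.9808
```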
[Figure : scatter plot, Perfect Negative Correlation (Y vs X)]
[Figure : scatter plot, Perfect Positive Correlation (Y vs X)]
[Figure : scatter plot, Zero Correlation (Y vs X)]
Correlation matrix :
        Inc     Cons
Inc     1.000
Cons    0.981   1.000
n = 10 (sample size)

Adjusted R² :

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-1-k}$$
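Plugging in the example's values (n = 10, k = 1, R² = 0.9621):

$$\bar{R}^2 = 1 - (1 - 0.9621)\,\frac{10 - 1}{10 - 1 - 1} = 1 - 0.0379 \times 1.125 \approx 0.9573$$

which matches the Adjusted R Square in the summary output above.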
MEASURES OF VARIATION : SST = SSR + SSE

Total sum of squares :
$$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

Regression sum of squares :
$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$

Error sum of squares :
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

where $\hat{Y}_i = b_0 + b_1 X_i$.

[Figure : at a given $X_i$, the deviation $Y_i - \bar{Y}$ splits into the explained part $\hat{Y}_i - \bar{Y}$ (SSR) and the residual $Y_i - \hat{Y}_i$ (SSE)]
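With the ANOVA values from the example, the decomposition checks out:

$$SST = SSR + SSE : \quad 8890 = 8552.7273 + 337.2727, \qquad R^2 = \frac{SSR}{SST} = \frac{8552.7273}{8890} \approx 0.9621$$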
STATISTICAL TEST

t-TEST (partial test of significance) :

$$t = \frac{b - \beta}{S_b}$$

where $\beta$ is the hypothesized value of the coefficient (0 under the usual null hypothesis) and $S_b$ is the standard error of the parameter :

$$S_b = SE(b) = \frac{S_{y \cdot x}}{\sqrt{\sum X^2 - \dfrac{(\sum X)^2}{n}}}$$

$S_{y \cdot x}$ is the standard error of the estimate :

$$S_{y \cdot x} = \sqrt{\frac{SSE}{n-1-k}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-1-k}} = \sqrt{\frac{\sum Y^2 - a \sum Y - b \sum XY}{n-1-k}}$$

d.f. = n − 1 − k
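A sketch of the partial test on the example data, testing H0 : β = 0 for the income coefficient (pure Python, using the estimates found earlier):

```python
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
n, k = len(X), 1
a, b = 24.4545, 0.5091  # OLS estimates from the example

# Standard error of the estimate, then of the slope, then the t statistic.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
s_yx = (sse / (n - 1 - k)) ** 0.5
sxx = sum(x * x for x in X) - sum(X) ** 2 / n
s_b = s_yx / sxx ** 0.5
t = b / s_b
print(f"s_yx = {s_yx:.4f}, s_b = {s_b:.4f}, t = {t:.2f}")  # t ~ 14.24
```

For simple regression t² = F; here 14.24² ≈ 202.9, matching the F statistic in the ANOVA table.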
Confidence Interval

$$Y' \pm t\,S_{y \cdot x} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum X^2 - \dfrac{(\sum X)^2}{n}}}$$

where :
Y' is the predicted value for the selected X value
X is the selected value of X
X̄ is the mean of the X's
n is the number of observations
S_{y·x} is the standard error of the estimate
t is the value of t at n − 2 degrees of freedom
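A sketch of the interval at X = 200 (an arbitrary illustration value, not from the slides), with scipy supplying the t critical value:

```python
from scipy import stats

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
n = len(X)
a, b, s_yx = 24.4545, 0.5091, 6.4930  # estimates from the example
x0 = 200                              # chosen X value

xbar = sum(X) / n
sxx = sum(x * x for x in X) - sum(X) ** 2 / n
y_pred = a + b * x0
t_crit = stats.t.ppf(0.975, df=n - 2)  # two-sided 95%, n - 2 d.f.
half_width = t_crit * s_yx * (1 / n + (x0 - xbar) ** 2 / sxx) ** 0.5
print(f"{y_pred:.2f} +/- {half_width:.2f}")  # about 126.27 +/- 5.34
```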
F-TEST (simultaneous test of significance) :

$$F = \frac{SSR/k}{SSE/(n-1-k)} = \frac{R^2/k}{(1-R^2)/(n-1-k)}$$

d.f.1 (numerator) : k
d.f.2 (denominator) : n − 1 − k
(k : number of independent variables)
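With the example's values (R² = 0.9621, n = 10, k = 1):

$$F = \frac{0.9621 / 1}{(1 - 0.9621)/(10 - 1 - 1)} = \frac{0.9621}{0.00474} \approx 203$$

which agrees with the ANOVA F of 202.8679 up to the rounding of R².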
Testing the significance of the correlation coefficient :

$$t = \frac{r\sqrt{n-1-k}}{\sqrt{1-r^2}}, \qquad \text{d.f.} = n - 1 - k$$
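For the example's r = 0.9808 with n = 10 and k = 1:

$$t = \frac{0.9808\sqrt{10 - 1 - 1}}{\sqrt{1 - 0.9808^2}} = \frac{0.9808 \times 2.8284}{0.1950} \approx 14.2$$

the same value (up to rounding) as the t statistic for the slope, since in simple regression the two tests are equivalent and t² = F.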