
Simple Regression

This document defines and compares regression and correlation analysis. Regression analysis estimates the relationship between a dependent variable and one or more independent variables, while correlation analysis measures the strength and direction of the relationship between two variables. The document outlines the framework of regression analysis including theory, model, data, parameter estimation using ordinary least squares, and statistical tests like t-tests and F-tests. It provides examples of linear and non-linear regression models and discusses estimating parameters, goodness of fit, and testing the significance of estimated parameters.


REGRESSION & CORRELATION

DEFINITION
• REGRESSION ANALYSIS: the process of estimating a functional relationship between a random variable Y (the dependent variable) and one or more variables X (independent/explanatory variables = predictors); that is, we estimate the parameters of a regression equation in order to predict a value of Y for given values of the X's.
  • One variable X: Simple Regression
  • More than one variable X: Multiple Regression
• CORRELATION ANALYSIS: the study of measuring the direction (positive/negative) and strength (strong/weak) of the relationship between two random variables.
FRAMEWORK OF REGRESSION ANALYSIS

THEORY → MODEL → DATA → PARAMETER ESTIMATION → STATISTICAL TEST (t-TEST, F-TEST)
SAMPLE REGRESSION MODELS
• Linear
  • Simple: Y = a + b1·X1 + e
  • Multiple: Y = a + b1·X1 + … + bn·Xn + e
• Non-linear, examples (several of these are linear in the parameters after a transformation, as sketched below):
  • Polynomial: Y = a + b1·X + b2·X^2 + b3·X^3 + e
  • Reciprocal: Y = a + b1/X + e
  • Interaction: Y = a + b1·X1 + b2·X2 + b3·X1·X2 + e
  • Semi-log: Y = a + b1·ln X + e
  • Double-log: log Y = a + b1·log X + e
  • Multiplicative (Cobb-Douglas form): Y = a·K^b1·L^b2·e
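As a minimal Python sketch (the data here is illustrative, not from the slides): the semi-log and reciprocal forms above become ordinary straight-line fits once X is transformed, so least squares still applies.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative values only
y = np.array([2.1, 3.9, 5.2, 6.1, 6.8])

# Semi-log: Y = a + b1 ln X + e  ->  regress Y on ln(X)
b1, a = np.polyfit(np.log(x), y, deg=1)    # polyfit returns highest degree first

# Reciprocal: Y = a + b1/X + e   ->  regress Y on 1/X
b1_r, a_r = np.polyfit(1.0 / x, y, deg=1)

print(a, b1)      # semi-log estimates
print(a_r, b1_r)  # reciprocal estimates
```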
PARAMETER ESTIMATION
• Method: Ordinary Least Squares (OLS)
• Minimize the sum of the squared residuals: $\min \sum e^2$
• Normal equations:

$$\sum Y = n\,a + b \sum X$$
$$\sum XY = a \sum X + b \sum X^2$$

• Short-cut formulas:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} \qquad a = \frac{\sum Y - b\sum X}{n}$$
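A minimal Python sketch of these short-cut formulas (the function name ols_simple is mine, not from the slides), checked against the consumption-income example that follows:

```python
def ols_simple(x, y):
    """Return (a, b) for Y = a + bX via the OLS short-cut formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

# Data from the table below:
income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
consumption = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
a, b = ols_simple(income, consumption)
print(a, b)   # about 24.4545 and 0.5091
```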
Year   Consumption ($)   Income ($)
1997        70               80
1998        65              100
1999        90              120
2000        95              140
2001       110              160
2002       115              180
2003       120              200
2004       140              220
2005       155              240
2006       150              260
Year   Y     X     XY      Y^2     X^2     Y'      Ȳ       (Y−Ȳ)^2   (Y−Y')^2   (Y'−Ȳ)^2
1997   70    80    5600    4900    6400    65.18   111.00  1681.00    23.21     2099.31
1998   65    100   6500    4225    10000   75.36   111.00  2116.00   107.40     1269.95
1999   90    120   10800   8100    14400   85.55   111.00   441.00    19.84      647.93
2000   95    140   13300   9025    19600   95.73   111.00   256.00     0.53      233.26
2001   110   160   17600   12100   25600   105.91  111.00     1.00    16.74       25.92
2002   115   180   20700   13225   32400   116.09  111.00    16.00     1.19       25.92
2003   120   200   24000   14400   40000   126.27  111.00    81.00    39.35      233.26
2004   140   220   30800   19600   48400   136.45  111.00   841.00    12.57      647.93
2005   155   240   37200   24025   57600   146.64  111.00  1936.00    69.95     1269.95
2006   150   260   39000   22500   67600   156.82  111.00  1521.00    46.49     2099.31
Σ      1110  1700  205500  132100  322000  (n = 10)        8890.00   337.27     8552.73

b = 0.5091    R^2 = 0.9621    Sy.x = 6.493
a = 24.4545   R   = 0.9808
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9808
R Square            0.9621
Adjusted R Square   0.9573
Standard Error      6.4930
Observations        10

ANOVA
             df   SS          MS          F          Significance F
Regression    1   8552.7273   8552.7273   202.8679   0.0000006
Residual      8    337.2727     42.1591
Total         9   8890

            Coefficients   Standard Error   t Stat    P-value
Intercept   24.4545        6.4138            3.8128   0.0051422
Income       0.5091        0.0357           14.2432   0.0000006
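The slides show this table as spreadsheet output; assuming a Python environment with statsmodels installed, roughly the same summary can be reproduced as a sketch:

```python
import numpy as np
import statsmodels.api as sm

income = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260])
consumption = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150])

X = sm.add_constant(income)          # adds the intercept column
fit = sm.OLS(consumption, X).fit()   # ordinary least squares
print(fit.summary())                 # R² ≈ 0.9621, F ≈ 202.87, slope ≈ 0.5091
```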
[Figure: Estimation of Regression Model — scatter of Y against X (X from 0 to 300, Y from 0 to 180) with fitted line y = 0.5091x + 24.455, R² = 0.9621]
CORRELATION
• Pearson product-moment correlation coefficient (for interval-scaled and ratio-scaled data):

$$R = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$$

or

$$R = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n-1)\,s_X\,s_Y} \qquad (s = \text{standard deviation})$$

• Coefficient of correlation (R): −1 ≤ R ≤ 1
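A minimal Python check of the product-moment formula against NumPy's built-in corrcoef, using the consumption-income data above:

```python
import numpy as np

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n = len(x)

num = n * (x * y).sum() - x.sum() * y.sum()
den = np.sqrt((n * (x**2).sum() - x.sum()**2) * (n * (y**2).sum() - y.sum()**2))
print(num / den)                # ≈ 0.9808
print(np.corrcoef(x, y)[0, 1])  # same value
```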


[Figures: four example scatter plots of Y against X illustrating Perfect Negative Correlation, Perfect Positive Correlation, Zero Correlation, and Strong Positive Correlation]


Interpretation scale for the correlation coefficient:

  −1                 −0.5                  0                  0.5                  1
  Perfect negative   Moderate negative   No correlation   Moderate positive   Perfect positive

Moving from 0 toward −1, the negative correlation strengthens from weak through moderate to strong; moving from 0 toward +1, the positive correlation does the same.
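As a small illustrative helper (my own, with rough cut-offs read from the scale above, which does not pin them down exactly):

```python
def correlation_label(r):
    """Map a correlation coefficient to the verbal scale on the slide."""
    if r == 0:
        return "no correlation"
    if abs(r) == 1:
        strength = "perfect"
    elif abs(r) >= 0.5:
        strength = "strong"
    else:
        strength = "weak to moderate"
    sign = "positive" if r > 0 else "negative"
    return f"{strength} {sign} correlation"

print(correlation_label(0.9808))   # strong positive correlation
```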
Correlation Matrix

        Inc     Cons
Inc    1.000
Cons    .981   1.000

n = 10 (sample size)
Critical value at α = .05 (two-tail): ±.632
Critical value at α = .01 (two-tail): ±.765
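A sketch of where those critical values come from, assuming they are derived from the t distribution with n − 2 degrees of freedom (by inverting the correlation t test given later in these slides):

```python
from scipy.stats import t

n = 10
df = n - 2
for alpha in (0.05, 0.01):
    t_crit = t.ppf(1 - alpha / 2, df)           # two-tail critical t
    r_crit = t_crit / (t_crit**2 + df) ** 0.5   # solve t = r·√df / √(1−r²) for r
    print(alpha, round(r_crit, 3))              # 0.632 and 0.765
```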
COEFFICIENT OF DETERMINATION
• The proportion of the total variation in the dependent variable Y that is explained by the variation in the independent variable(s) X.
• The summary measure that tells how well the sample regression line fits the data → the 'goodness of fit' assessment of the regression model.
• Coefficient of determination (R²): 0 ≤ R² ≤ 1

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad SST = SSE + SSR$$

where

$$SST = \sum (Y - \bar{Y})^2, \quad SSE = \sum (Y - \hat{Y})^2, \quad SSR = \sum (\hat{Y} - \bar{Y})^2$$

• Adjusted R²:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - 1 - k}$$
Measures of Variation: SST = SSR + SSE

[Figure: scatter of Y against X showing, for an observation Yᵢ, its deviations from the fitted line Ŷᵢ = b₀ + b₁Xᵢ and from the mean Ȳ]

• Error sum of squares: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
• Total sum of squares: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
• Regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
STATISTICAL TEST
• t-TEST → partial test of significance

$$t = \frac{b}{S_b}, \qquad S_b = SE(b) = \frac{S_{y.x}}{\sqrt{\sum X^2 - \dfrac{(\sum X)^2}{n}}} \quad \text{(standard error of the parameter)}$$

$$S_{y.x} = \sqrt{\frac{SSE}{n-1-k}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-1-k}} \quad \text{(standard error of the estimate)}$$

or equivalently

$$S_{y.x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n-1-k}}, \qquad \text{d.f.} = n - 1 - k$$
Confidence Interval

$$Y' \pm t \cdot S_{y.x}\,\sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum X^2 - \dfrac{(\sum X)^2}{n}}}$$

where
Y' is the predicted value for any selected X value
X is a selected value of X
X̄ is the mean of the X's
n is the number of observations
Sy.x is the standard error of the estimate
t is the value of t at n − 2 degrees of freedom
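A minimal sketch of this interval for one chosen X value (x0 = 150 is my illustrative choice, not from the slides):

```python
import numpy as np
from scipy.stats import t as t_dist

x = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
n = len(x)
a, b, s_yx = 24.4545, 0.5091, 6.493

x0 = 150.0                            # an illustrative X value
y0 = a + b * x0                       # predicted Y'
half = t_dist.ppf(0.975, n - 2) * s_yx * np.sqrt(
    1 / n + (x0 - x.mean()) ** 2 / ((x**2).sum() - x.sum() ** 2 / n)
)
print(y0 - half, y0 + half)           # 95% confidence interval around Y'
```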
• F-TEST → simultaneous test of significance

$$F = \frac{SSR/k}{SSE/(n-1-k)} = \frac{R^2/k}{(1-R^2)/(n-1-k)}$$

• d.f.₁ (numerator): k
• d.f.₂ (denominator): n − 1 − k
(k: number of independent variables)
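A minimal check of the R² form of the F statistic against the ANOVA table above (the small gap comes from using the rounded R²):

```python
r2, n, k = 0.9621, 10, 1
f = (r2 / k) / ((1 - r2) / (n - 1 - k))
print(f)   # ≈ 203, matching the table's 202.87 up to rounding
```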
Testing the significance of the correlation coefficient:

$$t = \frac{r\sqrt{n-1-k}}{\sqrt{1-r^2}}, \qquad \text{d.f.} = n - 1 - k$$
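A minimal check of this test on the example (with k = 1, so d.f. = n − 2 = 8); as expected for simple regression, it reproduces the slope's t statistic:

```python
r, n, k = 0.9808, 10, 1
t_stat = r * (n - 1 - k) ** 0.5 / (1 - r**2) ** 0.5
print(t_stat)   # ≈ 14.24
```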
