File4-Session3-Introduction To Regression
Regression
Introduction to linear regression
$y = \beta_0 + \beta_1 x + u$
y = dependent variable
x = independent variable
$\beta_0$ = y-intercept of the line (constant), where the line crosses the y-axis
$\beta_1$ = slope of the line (unknown parameter)
u = random error component
Terminology for simple regression
y                        x
Dependent variable       Independent variable
Explained variable       Explanatory variable
Response variable        Control variable
Predicted variable       Predictor variable
Regressand               Regressor
Examples
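$\text{Sleepinghours} = \beta_0 + \beta_1 \text{Sporthours} + u$
$\text{Turnover} = \beta_0 + \beta_1 \text{Advertising} + u$
$\text{Score} = \beta_0 + \beta_1 \text{Attend} + u$
$y_i = \beta_0 + \beta_1 x_i + u_i$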
File: dataspss-s4.1
DETERMINING THE EQUATION OF THE REGRESSION LINE
Deterministic Regression Model: a mathematical model that produces an "exact" output for a given input
$\hat{y} = \beta_0 + \beta_1 x$
Probabilistic Regression Model: a model that includes an error term, so that various values of the output can occur for a given value of the input
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
ŷ = predicted value of y
$x_i$ = value of the independent variable for the ith observation
$y_i$ = actual value of the dependent variable for the ith observation
$\beta_1$ = population slope
$\beta_0$ = population intercept
$\varepsilon_i$ = error of prediction for the ith observation
Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
[Figure: regression line with intercept $\beta_0$ and slope $\beta_1$; at $X_i$ it marks the observed value of Y, the predicted value of Y, and the random error $\varepsilon_i$ between them]
Sample Regression Function (SRF)
SRF: $\hat{y}_i = b_0 + b_1 x_i$
ŷi = estimated value of Y for observation i
xi = value of X for observation i
b0 = Y-intercept: the estimated value of Y when X is zero
b1 = slope of the regression line: the estimated change in Y for a one-unit increase in X
b1 > 0: the line slopes upward; positive relationship between X and Y
b1 < 0: the line slopes downward; negative relationship between X and Y
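SIMPLE LINEAR REGRESSION MODEL (sample)
SRF: $\hat{y}_i = b_0 + b_1 x_i$
where
$b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$
$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$
$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$
$SS_{yy} = \sum (y - \bar{y})^2$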
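The SS formulas above can be checked with a short Python sketch. It assumes the five observations read off the scatter plot in the simple example, (1, 1), (2, 1), (3, 2), (4, 2), (5, 4); these points are an assumption, chosen because they reproduce the reported fit ŷ = -0.1 + 0.7x, SSE = 1.1 and S_b1 = 0.1914854.

```python
# Least-squares estimates for simple regression using the SS formulas.
# Assumption: these five (x, y) points are read off the simple example's scatter plot.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

def simple_ols(x, y):
    """Return (b0, b1) with b1 = SSxy / SSxx and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    # Computational forms: SSxy = sum(xy) - (sum x)(sum y)/n, SSxx = sum(x^2) - (sum x)^2/n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum(y) / n - b1 * sum(x) / n
    return b0, b1

b0, b1 = simple_ols(x, y)
print(f"y-hat = {b0:.1f} + {b1:.1f}x")  # y-hat = -0.1 + 0.7x
```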
Sample Regression Function (SRF) (continued)
b0 and b1 are the values that minimize the sum of the squared residuals (the squared prediction errors). This procedure is called least squares analysis:
$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
Yi = actual value of Y for observation i
Ŷi = predicted value of Y for observation i
ei = residual (error)
b0 provides an estimate of $\beta_0$
b1 provides an estimate of $\beta_1$
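A quick numerical illustration of the least-squares idea, assuming the same five points read off the simple example's scatter plot: the fitted values give the smallest sum of squared residuals, and nudging b0 or b1 away from them raises it. This is a sanity check, not a proof.

```python
# Assumption: the five illustrative points from the simple example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

def sse(b0, b1):
    """Sum of squared residuals e_i = Y_i - (b0 + b1 * X_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

best = sse(-0.1, 0.7)  # least-squares estimates for these points
# Every perturbed (b0, b1) pair below yields a strictly larger SSE.
for d0, d1 in [(0.05, 0), (-0.05, 0), (0, 0.05), (0, -0.05), (0.05, -0.05)]:
    assert sse(-0.1 + d0, 0.7 + d1) > best
print(round(best, 4))  # 1.1
```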
RESIDUAL ANALYSIS
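Residual: $e_i = Y_i - \hat{Y}_i$
For the least-squares line, the residuals sum to zero: $\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (Y_i - \hat{Y}_i) = 0$
Absolute residual: $|Y_i - \hat{Y}_i| = |e_i|$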
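The residual properties can be verified in a few lines, again assuming the five illustrative points from the simple example and the fitted line ŷ = -0.1 + 0.7x.

```python
# Assumption: the five illustrative points and the reported fit y-hat = -0.1 + 0.7x.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e_i = Y_i - Y-hat_i
abs_residuals = [abs(e) for e in residuals]                # |Y_i - Y-hat_i|
print(abs(sum(residuals)) < 1e-9)                          # True: residuals sum to zero
print(round(sum(e * e for e in residuals), 4))             # 1.1 (SSE)
```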
Simple example
[Figure: scatter plot of five observations for x = 1 to 5 with the fitted line $\hat{y} = -0.1 + 0.7x$]
Result of estimation by SPSS
Degrees of freedom (ANOVA table): regression = k, residual = n - k - 1, total = n - 1
$\hat{y} = -0.1 + 0.7x$
Meaning of b0 and b1
Y-Intercept (b0)
• The estimated average individual income (Y) is -0.1 (× 10 million VND) when experience (X) is 0 years
Slope (b1)
• Income (Y) is expected to increase by 0.7 (× 10 million VND) for each additional year of experience (X)
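[Figure: variation of Y around $\bar{Y}$ at $X_i$, split into the part explained by the regression line and the unexplained part]
$SST = SSR + SSE$, where $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$, $SSE = \sum (Y_i - \hat{Y}_i)^2$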
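A sketch of the sum-of-squares decomposition, assuming the five illustrative points from the simple example; SSR and SSE should add up to SST.

```python
# Assumption: the five illustrative points and the fit y-hat = -0.1 + 0.7x.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # unexplained (error)
print(round(sst, 4), round(ssr, 4), round(sse, 4))        # 6.0 4.9 1.1
```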
Result of estimation by SPSS
Coefficient of Determination, r²
The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable.
The coefficient of determination is also called r-squared and is denoted r².
Note: $0 \le r^2 \le 1$
The Coefficient of Determination r² and the Coefficient of Correlation r
$r^2 = \frac{SSR}{SS_{yy}} = \frac{b_1^2 SS_{xx}}{SS_{yy}}, \qquad 0 \le r^2 \le 1$
r² = coefficient of determination
Measures the % of variation in Y that is explained by the independent variable X in the regression model
$r = \pm\sqrt{r^2}, \qquad -1 \le r \le 1$
r = coefficient of correlation
Measures the strength of the linear relationship between X and Y
r > 0 if b1 > 0
r < 0 if b1 < 0
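r² and r for the same assumed five points, computed from the formulas above; `math.copysign` gives r the sign of b1.

```python
import math

# Assumption: the five illustrative points from the simple example.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx

r2 = b1 ** 2 * ss_xx / ss_yy           # r^2 = SSR / SSyy with SSR = b1^2 * SSxx
r = math.copysign(math.sqrt(r2), b1)   # r takes the sign of b1
print(round(r2, 4), round(r, 4))       # 0.8167 0.9037
```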
Examples of Approximate r² Values
[Figure: scatter plots illustrating r² = 1]
r² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.
[Figure: scatter plot illustrating r² = 0]
r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
Examples of Approximate r² Values (continued)
[Figure: scatter plot illustrating 0 < r² < 1]
0 < r² < 1: weaker linear relationships between X and Y.
Standard Error of the Estimate
The standard deviation of the observations around the regression line is estimated by:
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
where
SSE = error sum of squares
n = sample size
Result of estimation by SPSS
$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{1.1}{5-2}} = 0.60553$
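The arithmetic for S_YX, as a minimal sketch that uses only SSE = 1.1 and n = 5 from the SPSS output:

```python
import math

sse, n = 1.1, 5                   # SSE and sample size from the SPSS output
s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate, n - 2 degrees of freedom
print(round(s_yx, 5))             # 0.60553
```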
Inferences About the Slope
The standard error of the regression slope coefficient (b1) is estimated by:
$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$
For the example: $S_{b_1} = 0.1914854$
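S_b1 follows from S_YX and SSX. This sketch assumes the X values of the simple example are 1 to 5 (so SSX = 10), which is consistent with the reported S_b1.

```python
import math

x = [1, 2, 3, 4, 5]                         # assumption: X values of the simple example
sse, n = 1.1, len(x)                        # SSE from the SPSS output
x_bar = sum(x) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)    # SSX = sum (X_i - X_bar)^2
s_yx = math.sqrt(sse / (n - 2))             # standard error of the estimate
s_b1 = s_yx / math.sqrt(ssx)                # standard error of the slope
print(round(s_b1, 7))                       # 0.1914854
```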
Inference about the Slope: t Test
t test for a population slope:
• Is there a linear relationship between X and Y?
Result of estimation by SPSS
$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.7 - 0}{0.1914854} = 3.66$
Inferences about the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
Test statistic: t = 3.66
d.f. = 5 - 2 = 3, α/2 = 0.025 in each tail
Critical values: t = ±3.182 (from t tables)
Decision: Reject H0, since 3.66 > 3.182
Conclusion: There is sufficient evidence of a linear relationship between X and Y (the slope differs significantly from zero).
[Figure: t distribution with rejection regions below -3.182 and above 3.182; the test statistic 3.66 falls in the upper rejection region]
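The t-test decision as a small sketch. The critical value 3.182 is hard-coded from the t table quoted above rather than computed, which keeps the example free of external dependencies.

```python
b1, beta1_h0, s_b1 = 0.7, 0.0, 0.1914854   # estimate, hypothesized slope, standard error
n, k = 5, 1
t_stat = (b1 - beta1_h0) / s_b1
df = n - k - 1                              # = 3
t_crit = 3.182                              # two-tailed, alpha = 0.05, df = 3 (t table)
reject = abs(t_stat) > t_crit
print(round(t_stat, 2), df, reject)         # 3.66 3 True
```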
F Test for Significance
F test statistic:
$F = \frac{MSR}{MSE}$, where $MSR = \frac{SSR}{k}$ and $MSE = \frac{SSE}{n-k-1}$
(k = number of independent variables)
Result of estimation by SPSS
$F = \frac{MSR}{MSE} = \frac{4.9}{0.3666667} = 13.36$
F Test for Significance Example
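H0: β1 = 0
H1: β1 ≠ 0
α = 0.05, df1 = k = 1, df2 = n - k - 1 = 5 - 1 - 1 = 3
Test statistic: $F = \frac{MSR}{MSE} = 13.36$
Critical value: $F_{0.05} = 10.128$ (df1 = 1, df2 = 3)
Decision: Reject H0 at α = 0.05, since 13.36 > 10.128
Conclusion: There is sufficient evidence of a linear relationship between X and Y.
[Figure: F distribution with the rejection region (α = 0.05) to the right of $F_{0.05} = 10.128$; do not reject H0 to the left]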
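The F-test arithmetic, with the table value 10.128 hard-coded. The sketch also shows a general fact for simple regression (k = 1): the F statistic equals the square of the t statistic for the slope.

```python
ssr, sse = 4.9, 1.1           # sums of squares from the SPSS output
n, k = 5, 1
msr = ssr / k                 # mean square regression
mse = sse / (n - k - 1)       # mean square error
f_stat = msr / mse
f_crit = 10.128               # F(0.05; 1, 3) from the F table
print(round(f_stat, 2), f_stat > f_crit)   # 13.36 True

# With one independent variable, F = t^2 for the slope test.
t_stat = 0.7 / 0.1914854
print(round(t_stat ** 2, 2))  # 13.36
```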
Introduction to SPSS (file: dataspss-s4.1)
Result of estimation by SPSS
Interpreting the result
R-squared ranges in value between 0 and 1
R² = 0: the model explains none of the variation in y
R² = 1: all points lie on the estimated regression line
Example: R² = 0.93 implies that the regression equation explains 93% of the variation in the dependent variable
Multiple regression
Finds relationships between a dependent variable and several independent variables
Dummy variables can be included
Solution: SPSS
Linear regression:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
File: dataspss-s4.2
Which is the dependent variable?
Which are the independent variables?
Run the model in SPSS
Estimate and discuss the results
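Outside SPSS, multiple-regression coefficients can be sketched by solving the normal equations (X'X)b = X'y. The data below are hypothetical (not dataspss-s4.2), built so that y = 2 + 3x1 - x2 holds exactly; the helper names `solve` and `ols` are illustrative, and a real analysis would rely on a tested statistics library.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(xs, y):
    """Return [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0, *values] for values in zip(*xs)]  # design matrix with an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data: y = 2 + 3*x1 - 1*x2 exactly, so OLS recovers these coefficients.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 3 * a - b for a, b in zip(x1, x2)]
coefs = ols([x1, x2], y)
print([round(c, 6) for c in coefs])  # [2.0, 3.0, -1.0]
```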
Think:
A survey collected the following variables:
Income
Age
Years in experience working
Education
Gender
………
Which are the dependent and independent variables?
Regression with dummy independent variables
Independent variable: Gender (1 = female, 0 = male)
If the estimated coefficient of gender is positive, the dependent variable is higher on average for females than for males, holding the other independent variables constant.
If the estimated coefficient of gender is negative, the dependent variable is higher on average for males than for females.
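A dummy-variable sketch with hypothetical incomes (not survey data). When gender is the only regressor, the least-squares slope equals the female mean minus the male mean and the intercept equals the male mean, which makes the sign interpretation concrete.

```python
# Hypothetical data; gender coded 1 = female, 0 = male.
gender = [1, 1, 1, 0, 0, 0]
income = [12.0, 14.0, 13.0, 10.0, 11.0, 9.0]

n = len(gender)
g_bar, y_bar = sum(gender) / n, sum(income) / n
b1 = (sum((g - g_bar) * (y - y_bar) for g, y in zip(gender, income))
      / sum((g - g_bar) ** 2 for g in gender))
b0 = y_bar - b1 * g_bar

mean_female = sum(y for g, y in zip(gender, income) if g == 1) / gender.count(1)
mean_male = sum(y for g, y in zip(gender, income) if g == 0) / gender.count(0)
print(b0, b1)                              # 10.0 3.0
print(mean_male, mean_female - mean_male)  # 10.0 3.0
```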
Samples of hypotheses
Adjusted coefficient of determination:
$r^2_{adj} = 1 - (1 - r^2)\frac{n-1}{n-k-1}$
(where n = sample size, k = number of independent variables)
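Adjusted r² for the simple example, as a minimal sketch assuming r² = SSR / SST = 4.9 / (4.9 + 1.1) from the SPSS output; the factor (n - 1)/(n - k - 1) pulls it below r².

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 4.9 / (4.9 + 1.1)   # SSR / SST, with SST = SSR + SSE from the SPSS output
print(round(adjusted_r2(r2, n=5, k=1), 4))   # 0.7556
```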
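Measuring Collinearity: Variance Inflationary Factor
The variance inflationary factor $VIF_j$ can be used to measure collinearity:
$VIF_j = \frac{1}{1 - R_j^2}$, where $R_j^2$ is the coefficient of determination from regressing $X_j$ on all other independent variables
Select Collinearity diagnostics
Typical SPSS output
Dependent variable: Satisfaction (Hài lòng)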
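A sketch of VIF for the special case of exactly two independent variables, where R_j² reduces to the r² of a simple regression of one predictor on the other (so both predictors share the same VIF). The age and experience values are hypothetical; with more predictors, R_j² must come from a multiple regression.

```python
def r_squared_simple(x, y):
    """r^2 from the simple regression of y on x (the squared sample correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical predictors (not from the SPSS file).
age = [25, 30, 35, 40, 45, 50]
experience = [2, 6, 9, 15, 18, 26]

r2_j = r_squared_simple(experience, age)  # regress one predictor on the other
vif = 1 / (1 - r2_j)                      # VIF_j = 1 / (1 - R_j^2)
print(round(vif, 2))
```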