Lecture - Regression - Compatibility Mode
Regression Analysis
A Graphical Illustration
Simple Linear Regression Model
Yi = β0 + β1xi + εi
Where
Y = Dependent variable
X = Independent variable
β0 = Model parameter that represents the mean value of the dependent variable (Y) when the independent variable (X) is zero
β1 = Model parameter that represents the slope: the change in the mean value of the dependent variable associated with a one-unit increase in the independent variable
εi = Error term that describes the effects on Yi of all factors other than the value of Xi
SSE = Σ ei² = Σ (yi − ŷi)² = Σ (yi − (b0 + b1xi))²
SSM = Reduction in the sum of squared prediction error that has been accomplished by using x in predicting y
Ŷi = Predicted value of Yi
Explained variance: Σ (Ŷi − Ȳ)²
Σ (Yi − Ȳ)² = Σ (Yi − Ŷi)² + Σ (Ŷi − Ȳ)²
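The sum-of-squares quantities above can be checked numerically. The sketch below is only an illustration: the x and y values are made up (they are not lecture data), and it assumes the usual least-squares formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄.

```python
# Hypothetical data, for illustration only (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Predicted values and the three sums of squares.
y_hat = [b0 + b1 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
sst = sum((yi - y_bar) ** 2 for yi in y)               # total

r_squared = ssm / sst
print(b0, b1, sse, ssm, sst, r_squared)
```

For a least-squares line with an intercept, the total sum of squares equals SSE plus SSM up to floating-point rounding, and R-square is the explained share SSM / SST.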
Regression Analysis
--Example
Predict how the store traffic would change (e.g. increase by how many people) with a $1,000 increase in advertising?
-- If we can observe the store traffic (y) and ad spending (x) for all 200 stores: (x1,y1), (x2,y2), …, (x200, y200), we could fit a regression line for the entire population. The linear equation might be hypothesized as:
Yi = β0 + β1 * Xi + εi
Yi = # of consumers visiting store i (dependent variable)
β0 = Store traffic without ad
β1 = Change of store traffic given $1,000 increase of ad
Xi = Ad spending ($1000) for store i (independent variable)
εi = Other factors affecting y
Regression Analysis -- Quantify the degree of linear association
[Figure: Store Traffic Versus Advertising. Scatter plot of Store Traffic (y-axis) against Advertising Dollars (x-axis).]
But instead we have data from the sample of 20 stores, from which we can perform sample linear regression.
Identify the straight line on which most of the sample points fall!
Plot option for predicted values and residuals
Residual = Observed – Predicted
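The residual definition translates directly into code. The following is a minimal sketch with made-up observed and predicted values (assumptions for illustration, not the lecture's store data); it also picks out the observation with the largest absolute residual, which is the kind of case worth inspecting in a residual plot.

```python
def residuals(observed, predicted):
    """Return observed minus predicted for each pair of values."""
    return [o - p for o, p in zip(observed, predicted)]

# Hypothetical values, for illustration only.
observed = [520.0, 610.0, 480.0, 900.0]
predicted = [500.0, 600.0, 510.0, 700.0]

res = residuals(observed, predicted)

# Index of the observation with the largest absolute residual.
worst = max(range(len(res)), key=lambda i: abs(res[i]))
print(res, worst)
```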
Check samples 14 and 17.
Yi = 148.64 + 1.5 * Xi + ei
Estimate of β0: Without any ad spending, on average, about 149 people would visit a store on Saturday.
Estimate of β1: On average, about one and a half more consumers would visit a store if ad spending goes up by $1,000.
Use the estimate b1 from the sample to infer the value of β1 in the population
• H0: β1 = 0 (No ad effect on store traffic in the population)
• Ha: β1 ≠ 0
Tstat = (b1 − 0) / Sb1 = 1.5408 / 0.2130 = 7.234
Alternative way of judgment: P-value
E.g. if you set α=0.05, you can reject H0 here.
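The t statistic for the slope can be reproduced from the two reported numbers. This sketch makes one assumption that the slides do not state explicitly: the regression uses the sample of 20 stores and estimates 2 parameters, giving 18 degrees of freedom. The value 2.101 is the standard two-sided 5% critical value of the t distribution with 18 degrees of freedom.

```python
b1 = 1.5408     # slope estimate reported in the lecture output
se_b1 = 0.2130  # standard error of b1 reported in the lecture output

# Test H0: beta1 = 0, so the hypothesized value is 0.
t_stat = (b1 - 0.0) / se_b1

# Assumption: n = 20 stores, 2 estimated parameters, so df = 18.
t_crit = 2.101  # two-sided critical value at alpha = 0.05, df = 18

reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 3), reject_h0)
```

Comparing |t| with a critical value leads to the same decision as comparing the P-value with α.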
Multiple Regression
A linear combination of predictor factors is used to predict the outcome or response factors
The general form of the multiple regression model:
Y = β0 + β1X1 + β2X2 + .... + βkXk + ε
The prediction equation
Y = b0 + b1X1 + b2X2 + .... + bkXk
Evaluating the Importance of Independent Variables
Step 1: Consider t-value for βi's
Step 2: Use standardized beta coefficients when independent variables are ...
Check for multi-collinearity: “collinearity diagnostics” in SPSS
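One common collinearity diagnostic is the variance inflation factor (VIF). The sketch below covers only the special case of exactly two predictors, where VIF = 1 / (1 − r²) and r is the correlation between the two predictors. The data and the function names are hypothetical, chosen for illustration.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictor values, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

vif = vif_two_predictors(x1, x2)
print(round(vif, 3))
```

A VIF near 1 suggests little overlap between the predictors; larger values mean the coefficient estimates become less stable.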
Stepwise Method
Y = β0 + β1D1 + β2D2 + β3D3 + β4X
5  380  5    0
6  450  6.5  1
7  420  4.5  0
8  550  5    1
• For Rational buyer, Ŷi = a + b4X
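The dummy-variable model can be sketched in code as follows. Only "Rational" appears in the lecture; the other three category labels, the function names, and the coefficient values in the usage lines are hypothetical. The sketch assumes the rational buyer is the baseline category (all dummies equal to 0), which is what makes its prediction reduce to intercept plus b4·X.

```python
# Hypothetical labels; only "Rational" comes from the lecture.
CATEGORIES = ["Rational", "TypeA", "TypeB", "TypeC"]
BASELINE = "Rational"

def dummy_code(category):
    """Return (D1, D2, D3): one 0/1 indicator per non-baseline category."""
    if category not in CATEGORIES:
        raise ValueError("unknown category: " + category)
    return tuple(1 if category == c else 0
                 for c in CATEGORIES if c != BASELINE)

def predict(category, x, b0, b_dummies, b4):
    """Prediction b0 + b1*D1 + b2*D2 + b3*D3 + b4*x."""
    d = dummy_code(category)
    return b0 + sum(bj * dj for bj, dj in zip(b_dummies, d)) + b4 * x

# Made-up coefficients, for illustration only.
print(dummy_code("Rational"))                                # all zeros
print(predict("Rational", 5.0, 100.0, (10.0, 20.0, 30.0), 2.0))
print(predict("TypeB", 5.0, 100.0, (10.0, 20.0, 30.0), 2.0))
```

Four categories need only three dummies; the omitted category is absorbed into the intercept.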
Adjust Original Regression Model if Necessary
Yi = β0 + β1 * x1i + β2 * x2i + εi
Yi = sales, x1i = unit price, x2i = advertising
• R Square (adjusted) = 0.470
• Significance of model = 0.002
• P-value for Unit Price = 0.004
• P-value for Advertising = 0.004
Therefore, the original linear model seems fine, suggesting that the change of unit sales is mainly subject to the negative influence from price, and the positive influence from advertising.
Summary of Multiple Regression Output
Diagnostic outputs (R-square, Adjusted R-Square, Significance F) indicate the general fitness of the linear equation model to the sample data;
Key outputs (coefficients, t Stat, P-value) indicate the significance of each independent variable in the regression model, and its marginal effect on changing the dependent variable in the population under study;
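Adjusted R-square penalizes R-square for the number of predictors. The sketch below uses the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the sample size and k the number of independent variables. The input values are hypothetical; the lecture reports only the adjusted value, not R-square or n.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-square for n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical inputs, for illustration only.
value = adjusted_r_squared(0.60, n=23, k=2)
print(round(value, 3))
```

The adjustment is larger when the sample is small relative to the number of predictors, so adjusted R-square is always at most R-square.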
Summary
T-test: compares means between two independent groups