Part IIA Paper 3 Econometrics
Lecture 5:
Testing Model Specification
Oleg I. Kitov
Faculty of Economics and Selwyn College
Michaelmas Term 2021
Lecture outline
I F-test for overall regression significance.
I Examples of joint hypothesis testing.
I Ramsey RESET test [model specification].
I Logarithmic variables in regression.
I Moment-generating functions.
I Jensen’s inequality.
F-test revisited
I Test a subset of regressors for joint significance.
I Unrestricted: $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + u_i$, with $SSR_u$ and $R_u^2$.
I Test $H_0: \beta_1 = \beta_2 = \cdots = \beta_q = 0$ with q restrictions.
I Restricted: $Y_i = \pi_0 + \pi_{q+1} X_{i,q+1} + \cdots + \pi_k X_{ik} + v_i$, with $SSR_r$ and $R_r^2$.
$$W = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n-k-1)} = \frac{(R_u^2 - R_r^2)/q}{(1 - R_u^2)/(n-k-1)} \sim F_{q,\,n-k-1}$$
I Reject $H_0$ if $w > F_{1-\alpha,\,q,\,n-k-1}$, the $(1-\alpha)$ quantile of the $F_{q,\,n-k-1}$ distribution.
I Consider: wagei = β0 + β1 educi + β2 IQi + β3 experi + ui .
I H0 : (β1 = 0) ∩ (β2 = 0), with q = 2 restrictions.
- [wage2.dta] STATA: reg wage educ IQ exper; test educ IQ
- F (2, 931) = 89.98, p = 0, reject H0 at any conceivable α.
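The equality of the SSR form and the R² form of W can be checked numerically. The following is a minimal Python sketch on simulated data (not wage2.dta); the helper `ols_ssr`, the sample size, and the coefficient values are illustrative assumptions rather than part of the lecture.

```python
import random

def ols_ssr(X, y):
    """OLS of y on the columns of X (first column is the constant); returns (beta, SSR)."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    beta = [A[a][p] / A[a][a] for a in range(p)]
    resid = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]
    return beta, sum(e * e for e in resid)

random.seed(0)
n, k, q = 200, 3, 2  # assumed sample size, regressors, restrictions
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * a + 0.3 * b + 0.2 * c + random.gauss(0, 1)
     for a, b, c in zip(x1, x2, x3)]

X_u = [[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)]  # unrestricted model
X_r = [[1.0, c] for c in x3]                            # restricted: beta1 = beta2 = 0

_, ssr_u = ols_ssr(X_u, y)
_, ssr_r = ols_ssr(X_r, y)
ybar = sum(y) / n
sst = sum((v - ybar) ** 2 for v in y)
r2_u, r2_r = 1 - ssr_u / sst, 1 - ssr_r / sst

f_ssr = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
f_r2 = ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))
print(f_ssr, f_r2)  # the two forms agree up to rounding error
```

Both forms use the same SST because the dependent variable is identical in the restricted and unrestricted regressions, which is why the R² version is valid here.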
F-test for overall regression significance
I Overall regression significance: test all regressors for joint significance.
I Unrestricted: $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + u_i$, with $SSR_u$ and $R_u^2$.
I H0 : β1 = β2 = · · · = βk = 0 with q = k restrictions.
I Restricted: Yi = π0 + vi [no explanatory variables in restricted model].
I Prediction $\hat{Y}_i = \hat{\pi}_0 = \bar{Y}$ for all i, so $SSE_r = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = 0$.
I Since $SSR_r + SSE_r = SST$, we have $SSR_r = SST$ and $R_r^2 = SSE_r/SST = 0$.
$$W = \frac{(SST - SSR_u)/k}{SSR_u/(n-k-1)} = \frac{SSE_u/k}{SSR_u/(n-k-1)} = \frac{R_u^2/k}{(1 - R_u^2)/(n-k-1)}.$$
I Under $H_0$, $W \sim F_{k,\,n-k-1}$. For overall significance, only the unrestricted model needs to be estimated.
I Consider: wagei = β0 + β1 educi + β2 IQi + β3 experi + ui .
I H0 : (β1 = 0) ∩ (β2 = 0) ∩ (β3 = 0), with q = 3 restrictions.
- [wage2.dta] STATA: reg wage educ IQ exper
- F (3, 931) = 59.99, p = 0, reject H0 at any conceivable α.
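Because the overall F statistic depends only on the unrestricted R², n and k, the relationship can be inverted. The sketch below is a hedged illustration: the function names `overall_f` and `implied_r2` are hypothetical, n = 935 is inferred from the reported denominator degrees of freedom (931 = n - k - 1), and the R² it prints is only the value implied by the rounded F statistic above, not a figure reported on the slide.

```python
def overall_f(r2, n, k):
    """Overall-significance F statistic from the unrestricted R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def implied_r2(f_stat, n, k):
    """Invert overall_f: the R-squared consistent with a given overall F statistic."""
    return k * f_stat / (k * f_stat + n - k - 1)

n, k = 935, 3                      # n inferred from the df 931 = n - k - 1
r2 = implied_r2(59.99, n, k)       # implied by the rounded F, not reported in the source
print(round(r2, 3))                # roughly 0.162
print(round(overall_f(r2, n, k), 2))
```

The round trip through `overall_f` recovers 59.99, which confirms that the two functions are inverses of each other.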
F-test for multiple linear restrictions
I The F-test can be used to test general linear restrictions. Consider:
wagei = β0 + β1 educi + β2 IQi + β3 experi + β4 hoursi + ui
I [wage2.dta] STATA: reg wage educ IQ exper hours
I H0 : (β1 = 8β2 ) ∩ (β3 = 20) ∩ (β4 = −2), q = 3.
- test (educ = 8*IQ) (exper = 20) (hours = -2)
- F (3, 930) = 1.17, p = 0.32, do not reject H0 at α = 0.05.
I H0 : (β1 = 8β2 ) ∩ (β1 = 3β3 ) ∩ (β2 = 5) ∩ (β2 = −2β4 ), q = 4.
- test (educ = 8*IQ) (educ = 3*exper) (IQ = 5) (IQ = -2*hours)
- F (4, 930) = 2.36, p = 0.05, do not reject H0 at α = 0.01.
wage     Coefficient   Std. err.       t   P>|t|    [95% conf. interval]
educ         58.5538    7.060856    8.29   0.000    44.69674    72.41086
IQ          5.109458    .9408177    5.43   0.000    3.263086    6.955829
exper       17.31687    3.115021    5.56   0.000    11.20358    23.43015
hours      -2.286972    1.686849   -1.36   0.176   -5.597444    1.023499
_cons      -447.9619    134.7606   -3.32   0.001   -712.4319   -183.4918
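As a rough cross-check on the regression output, each single restriction with a numeric target can be examined on its own with a t statistic, t = (estimate - hypothesised value) / std. err. The Python sketch below uses only the coefficients and standard errors from the table; the dictionary names are illustrative. It cannot reproduce the joint F statistics, because those also depend on the covariances between the estimators, which the table does not report.

```python
# Coefficients and standard errors copied from the regression table.
coef = {"exper": 17.31687, "hours": -2.286972}
se = {"exper": 3.115021, "hours": 1.686849}
# Hypothesised values from H0: beta3 = 20 and beta4 = -2.
null = {"exper": 20.0, "hours": -2.0}

t_stats = {name: (coef[name] - null[name]) / se[name] for name in coef}
for name, t in t_stats.items():
    print(name, round(t, 3))
```

Both statistics are small in absolute value, in line with the hypothesised values 20 and -2 lying inside the respective 95% confidence intervals.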
Non-linear regressors
I Consider: $wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 exper_i^2 + u_i$.
I [wage2.dta] STATA commands to estimate regression:
- gen exper2 = exper^2
- reg wage educ exper exper2
I Partial effect of experience on wage depends on individual’s experience:
$$\frac{\partial wage_i}{\partial exper_i} = \beta_2 + 2\beta_3\, exper_i$$
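I Suppose $\beta_2 > 0$ and $\beta_3 < 0$; then wage is a concave function of experience:
$$\left.\frac{\partial wage_i}{\partial exper_i}\right|_{exper_i = 0} = \beta_2 > 0, \qquad \frac{\partial^2 wage_i}{\partial exper_i^2} = 2\beta_3 < 0$$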
I H0 : β2 = β3 = 0: experi has no impact on wage [controlling for educi ].
I STATA : test exper exper2; F = 15.55, p = 0, reject H0 at any α.
I H0 : β3 = 0: experi has linear impact on wage [controlling for educi ].
I STATA : test exper2; F = 0.01, p = 0.94, do not reject H0 at any α < 0.94.
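The partial effect in a quadratic specification can be evaluated at different experience levels. The Python sketch below uses hypothetical coefficient values (they are not the wage2.dta estimates) and hypothetical function names; the turning point -β2/(2β3) follows from setting the partial effect to zero.

```python
def partial_effect(b2, b3, exper):
    """Marginal effect of experience on wage: beta2 + 2 * beta3 * exper."""
    return b2 + 2 * b3 * exper

def turning_point(b2, b3):
    """Experience level at which the marginal effect is zero (requires beta3 != 0)."""
    return -b2 / (2 * b3)

b2, b3 = 30.0, -0.5   # hypothetical values with beta2 > 0 and beta3 < 0
for exper in (0, 10, 20):
    print(exper, partial_effect(b2, b3, exper))
print(turning_point(b2, b3))
```

With these illustrative values the marginal effect falls from 30 at zero experience to 10 at twenty years, and wage peaks at thirty years of experience.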
Interaction of regressors
I Consider: wagei = β0 + β1 educi + β2 IQi + β3 educi × IQi + ui .
I Partial effect of education on wage depends on individual’s IQ:
$$\frac{\partial wage_i}{\partial educ_i} = \beta_1 + \beta_3\, IQ_i$$
I If β3 > 0, then education and IQ are complements.
I If β3 < 0, then education and IQ are substitutes.
I H0 : β1 = β3 = 0. Education has no effect on wage [controlling for IQ].
I [wage2.dta] STATA commands:
- gen educIQ = educ*IQ
- reg wage educ IQ educIQ
- test educ educIQ
I Estimated coefficient on the cross-product: $\hat{\beta}_3 = 0.508$, with p-value 0.195.
I F = 21.47, p = 0, reject H0 at any α. Education has an effect on wage.
Ramsey Regression Specification-Error Test (RESET) [1/3]
I Consider: wagei = β0 + β1 educi + β2 IQi + β3 experi + ui .
I Is this model specification correct or is it misspecified [omitted variables]?
I Should we include higher powers of regressors [squares, cubes, fourth powers]?
I Add $educ_i^2$, $IQ_i^2$, $exper_i^2$, $educ_i^3$, $IQ_i^3$, $exper_i^3$, $educ_i^4$, $IQ_i^4$, $exper_i^4$.
I Test if coefficients on these additional regressors are jointly zero: q = 9.
I [wage2.dta] STATA manual test procedure:
- gen educ2 = educ^2
- gen educ3 = educ^3
- gen educ4 = educ^4
- gen IQ2 = IQ^2
- gen IQ3 = IQ^3
- gen IQ4 = IQ^4
- gen exper2 = exper^2
- gen exper3 = exper^3
- gen exper4 = exper^4
- reg wage educ educ2 educ3 educ4 IQ IQ2 IQ3 IQ4 exper exper2 exper3 exper4
- test educ2 educ3 educ4 IQ2 IQ3 IQ4 exper2 exper3 exper4
- F (9, 922) = 0.99, p = 0.45, do not reject H0 , higher powers irrelevant.
I STATA built-in test procedure [Ramsey RESET test with powers only]:
- reg wage educ IQ exper
- estat ovtest, rhs
Ramsey Regression Specification-Error Test (RESET) [2/3]
I Should we include higher powers of regressors and cross-terms?
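I Include $educ_i^2$, $IQ_i^2$, $exper_i^2$, $educ_i \times IQ_i$, $educ_i \times exper_i$ etc. Too many!
I Short-cut: use powers of predicted values from original OLS regression
$$\widehat{wage}_i = \hat{\beta}_0 + \hat{\beta}_1 educ_i + \hat{\beta}_2 IQ_i + \hat{\beta}_3 exper_i$$
$$\widehat{wage}_i^{\,2} = (\hat{\beta}_0 + \hat{\beta}_1 educ_i + \hat{\beta}_2 IQ_i + \hat{\beta}_3 exper_i)^2$$
$$\widehat{wage}_i^{\,3} = (\hat{\beta}_0 + \hat{\beta}_1 educ_i + \hat{\beta}_2 IQ_i + \hat{\beta}_3 exper_i)^3$$
$$\widehat{wage}_i^{\,4} = (\hat{\beta}_0 + \hat{\beta}_1 educ_i + \hat{\beta}_2 IQ_i + \hat{\beta}_3 exper_i)^4$$
I $\widehat{wage}_i^{\,2}$ contains squares and cross-terms of regressors.
I $\widehat{wage}_i^{\,3}$ contains cubes and cross-terms of squares with regressors.
I $\widehat{wage}_i^{\,4}$ contains fourth powers and cross-terms of cubes with regressors...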
I Ramsey RESET tests whether non-linear combinations of the fitted values
can help explain the response variable.
Ramsey Regression Specification-Error Test (RESET) [3/3]
I Augment model with powers of predicted values as regressors:
$$wage_i = \gamma_0 + \gamma_1 educ_i + \gamma_2 IQ_i + \gamma_3 exper_i + \delta_1 \widehat{wage}_i^{\,2} + \delta_2 \widehat{wage}_i^{\,3} + \delta_3 \widehat{wage}_i^{\,4} + v_i$$
I Ramsey RESET test: H0 : δ1 = δ2 = δ3 = 0.
I Use the F-test statistic with q = 3 restrictions.
I [wage2.dta] STATA built-in procedure [RESET test with cross-terms]:
- reg wage educ IQ exper
- estat ovtest
I F (3, 928) = 1.80, p = 0.1456, do not reject H0 at α < 0.1456.
I No evidence that model is misspecified and needs non-linear terms.
I The rhs option of estat ovtest uses powers of the “right-hand side” variables instead of powers of the fitted values [powers, no cross-terms].
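The RESET procedure with powers of the fitted values can be sketched by hand. The Python example below is a minimal illustration on simulated data with a deliberately omitted non-linear term; it is not the Stata implementation, and the helper `ols_ssr`, the data-generating process, and the sample size are all assumptions. The fitted values are standardised before taking powers: because the fitted values lie in the span of the constant and the regressors, this leaves the F statistic unchanged and only improves numerical conditioning.

```python
import random

def ols_ssr(X, y):
    """OLS of y on the columns of X (first column is the constant); returns (beta, SSR)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
    beta = [A[a][p] / A[a][a] for a in range(p)]
    resid = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]
    return beta, sum(e * e for e in resid)

random.seed(1)
n = 300
x1 = [random.uniform(-1, 1) for _ in range(n)]
x2 = [random.uniform(-1, 1) for _ in range(n)]
# Assumed data-generating process: y is quadratic in a linear index of x1 and x2.
index = [1.0 + a + 0.5 * b for a, b in zip(x1, x2)]
y = [v + v * v + random.gauss(0, 0.3) for v in index]

# Step 1: original linear regression and its fitted values.
X_r = [[1.0, a, b] for a, b in zip(x1, x2)]
beta, ssr_r = ols_ssr(X_r, y)
fitted = [sum(b * v for b, v in zip(beta, row)) for row in X_r]

# Step 2: standardise the fitted values, then add their 2nd, 3rd and 4th powers.
m = sum(fitted) / n
s = (sum((f - m) ** 2 for f in fitted) / n) ** 0.5
z = [(f - m) / s for f in fitted]
X_u = [row + [v ** 2, v ** 3, v ** 4] for row, v in zip(X_r, z)]
_, ssr_u = ols_ssr(X_u, y)

# Step 3: F statistic for H0: delta1 = delta2 = delta3 = 0.
q, k_u = 3, len(X_u[0]) - 1
reset_f = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u - 1))
print(round(reset_f, 2))
```

In this simulated setting the linear model is misspecified by construction, so the statistic is large; on a correctly specified model it would typically be small.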
Logarithmic variables [1/2]: Coefficient interpretation
I Suppose Xi is a continuous random variable.
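I $\ln Y_i = \beta_0 + \beta_1 X_i + u_i$:
$\beta_1$ is the proportional change in $Y_i$ when $X_i$ goes up by 1 unit [approximately a $100\beta_1$% change]:
$$\frac{\partial \ln Y_i}{\partial X_i} = \frac{1}{Y_i}\frac{\partial Y_i}{\partial X_i} = \frac{\partial Y_i}{Y_i}\frac{1}{\partial X_i} = \beta_1$$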
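I $Y_i = \beta_0 + \beta_1 \ln X_i + u_i$:
$\beta_1$ is the change in $Y_i$ per unit proportional change in $X_i$, so a 1% change in $X_i$ changes $Y_i$ by approximately $\beta_1/100$:
$$\frac{\partial Y_i}{\partial X_i} = \beta_1 \frac{1}{X_i} \implies \beta_1 = \partial Y_i \Big/ \frac{\partial X_i}{X_i}$$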
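I $\ln Y_i = \beta_0 + \beta_1 \ln X_i + u_i$:
$\beta_1$ is the elasticity of $Y_i$ with respect to $X_i$:
$$\frac{\partial \ln Y_i}{\partial \ln X_i} = \frac{\partial Y_i / Y_i}{\partial X_i / X_i} = \beta_1$$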
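The derivative-based interpretations are approximations for small changes. The short Python sketch below compares them with exact changes, using hypothetical coefficient values chosen purely for illustration.

```python
import math

# Log-level model: X rises by one unit.
b1_loglevel = 0.08                                 # hypothetical slope
approx_pct = 100 * b1_loglevel                     # derivative-based approximation
exact_pct = 100 * (math.exp(b1_loglevel) - 1)      # exact percentage change in Y

# Level-log model: X rises by 1%.
b1_levellog = 250.0                                # hypothetical slope
approx_change = b1_levellog / 100                  # approximation: beta1 / 100
exact_change = b1_levellog * math.log(1.01)        # exact change in Y

# Log-log model: X rises by 1%.
elasticity = 0.5                                   # hypothetical elasticity
exact_pct_loglog = 100 * (1.01 ** elasticity - 1)  # exact percentage change in Y

print(approx_pct, round(exact_pct, 4))
print(approx_change, round(exact_change, 4))
print(elasticity, round(exact_pct_loglog, 4))
```

The gap between the approximate and exact figures widens as the coefficient or the size of the change in X grows.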
Moment-generating function [1/2]
I The moment-generating function (mgf) of a rv X with density $f_X(x)$ is
$$M_X(t) = E\left[e^{tX}\right] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx.$$
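I Consider an exponential rv, $X \sim \exp(\beta)$, with $f_X(x) = \frac{1}{\beta} e^{-x/\beta}$, for $x > 0$:
$$E\left[e^{tX}\right] = \int_0^{\infty} e^{tx}\,\frac{1}{\beta} e^{-x/\beta}\,dx = \frac{1}{\beta}\int_0^{\infty} e^{x(t - 1/\beta)}\,dx = \frac{1}{\beta}\,\frac{1}{-\frac{1}{\beta} + t}\left[e^{x(t - 1/\beta)}\right]_0^{\infty}$$
$$= \frac{1}{(-1 + \beta t)}\,[0 - 1] = (1 - \beta t)^{-1}$$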
I Note that $M_X(t)$ exists only for $t < 1/\beta$, since the integral converges only in that case.
I Use moment-generating function to compute n-th raw moment of X :
$$E[X^n] = M_X^{(n)}(0) = \left.\frac{d^n M_X(t)}{dt^n}\right|_{t=0}$$
I First raw moment of X ∼ exp (β):
$$E[X] = M_X^{(1)}(0) = \left.\frac{d}{dt}(1 - \beta t)^{-1}\right|_{t=0} = \left.\beta\,(1 - \beta t)^{-2}\right|_{t=0} = \beta$$
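The closed-form mgf of the exponential distribution and the first-moment result can be checked numerically. The following Python sketch approximates the defining integral with a midpoint rule and the derivative at t = 0 with a central difference; the function names, the truncation point of the integral, and the values β = 2 and t = 0.2 are illustrative assumptions.

```python
import math

def exp_mgf_numeric(t, beta, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of E[exp(tX)] for X ~ exp(beta); needs t < 1/beta."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += math.exp(t * x) * math.exp(-x / beta) / beta
    return total * h

def exp_mgf_closed(t, beta):
    """Closed form (1 - beta * t)^(-1), valid for t < 1/beta."""
    return 1.0 / (1.0 - beta * t)

beta, t = 2.0, 0.2   # illustrative values with t < 1/beta = 0.5
numeric = exp_mgf_numeric(t, beta)
closed = exp_mgf_closed(t, beta)

# First raw moment as the derivative of the mgf at t = 0 (central difference).
h = 1e-4
first_moment = (exp_mgf_closed(h, beta) - exp_mgf_closed(-h, beta)) / (2 * h)
print(numeric, closed, first_moment)
```

The numerical integral matches the closed form closely, and the finite-difference derivative at zero is approximately β, in line with E[X] = β.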
Moment-generating function [2/2]
I Use mgf to derive the distribution of sums of random variables.
I Suppose $X_1, \dots, X_n$ are iid random variables with mgf $M_{X_i}(t)$. The sum of these random variables, $S = \sum_{i=1}^{n} X_i$, has mgf $M_S(t)$ such that
$$M_S(t) = \prod_{i=1}^{n} M_{X_i}(t) = M_{X_1}(t) \times \cdots \times M_{X_n}(t).$$
I A normal random variable $X \sim N(\mu, \sigma^2)$ has
$$M_X(t) = e^{t\mu + \frac{1}{2}\sigma^2 t^2}.$$
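I Suppose $X_i \sim \text{iid } N(\mu, \sigma^2)$ for $i = 1, \dots, n$ and that $S = \sum_{i=1}^{n} X_i$, then
$$M_S(t) = \prod_{i=1}^{n} M_{X_i}(t) = \left(e^{t\mu + \frac{1}{2}\sigma^2 t^2}\right)^{n} = e^{tn\mu + \frac{1}{2} n\sigma^2 t^2}.$$
I Conclude that $S \sim N(n\mu, n\sigma^2)$: the sum of independent normal rvs is normal.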
I If $X_i \sim \chi^2_{d_i}$ independently, for $i = 1, \dots, n$, then $S = \sum_{i=1}^{n} X_i \sim \chi^2_d$, where $d = \sum_{i=1}^{n} d_i$.
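The product rule for the mgf of a sum of iid normal variables can be verified at a single point. This Python sketch is only an arithmetic check of the formulas above; the function name `normal_mgf` and the values of μ, σ², n and t are illustrative assumptions.

```python
import math

def normal_mgf(t, mu, sigma2):
    """mgf of N(mu, sigma2) evaluated at t."""
    return math.exp(t * mu + 0.5 * sigma2 * t ** 2)

mu, sigma2, n, t = 1.5, 4.0, 5, 0.3   # illustrative values

# mgf of the sum as the product of n identical mgfs ...
product_of_mgfs = normal_mgf(t, mu, sigma2) ** n
# ... compared with the mgf of N(n * mu, n * sigma2).
mgf_of_sum = normal_mgf(t, n * mu, n * sigma2)

print(product_of_mgfs, mgf_of_sum)
```

The two numbers coincide up to floating-point error, as the identity holds for every t.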
Logarithmic variables [2/2]: Predictions
I Consider: $\ln wage_i = \beta_0 + \beta_1 educ_i + \beta_2 IQ_i + u_i$, with $u_i \sim \text{iid } N(0, \sigma^2)$.
I OLS estimators $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\sigma}^2 = SSR/(n - 3)$.
I Predicted value of $\ln wage_i$ given $educ_i = 15$ and $IQ_i = 100$:
$$\widehat{\ln wage}_i = \hat{\beta}_0 + 15\hat{\beta}_1 + 100\hat{\beta}_2.$$
I Prediction of wagei from regression for ln wagei is not straightforward:
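$$wage_i = e^{\beta_0 + 15\beta_1 + 100\beta_2 + u_i} = e^{\beta_0 + 15\beta_1 + 100\beta_2}\, e^{u_i}$$
$$E[wage_i] = E\left[e^{\beta_0 + 15\beta_1 + 100\beta_2}\, e^{u_i}\right] = e^{\beta_0 + 15\beta_1 + 100\beta_2}\, E[e^{u_i}]$$
$$\widehat{wage}_i = e^{\hat{\beta}_0 + 15\hat{\beta}_1 + 100\hat{\beta}_2}\, \widehat{E}[e^{u_i}]$$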
I Recall $M_{u_i}(t) = E[e^{t u_i}] = e^{\sigma^2 t^2/2}$, and so $M_{u_i}(1) = E[e^{u_i}] = e^{\sigma^2/2}$.
I Note that $E[e^{u_i}] = e^{\sigma^2/2} \neq e^{E[u_i]} = e^{0} = 1$, as $\sigma^2 \neq 0$.
I Prediction of wagei from regression for ln wagei :
$$\widehat{wage}_i = e^{\hat{\beta}_0 + 15\hat{\beta}_1 + 100\hat{\beta}_2}\, e^{\hat{\sigma}^2/2} = e^{\hat{\beta}_0 + 15\hat{\beta}_1 + 100\hat{\beta}_2}\, e^{SSR/(2(n-3))}$$
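The corrected prediction can be written as a small function. The Python sketch below is illustrative only: the function name `predict_wage_from_log`, the coefficient values, and the SSR are hypothetical and are not estimates from wage2.dta. It returns both the naive prediction, which simply exponentiates the predicted log wage, and the prediction scaled by exp(σ̂²/2).

```python
import math

def predict_wage_from_log(b0, b1, b2, educ, iq, ssr, n):
    """Return (naive, corrected) wage predictions from a log-wage regression.

    Assumes normal errors, so that E[exp(u)] = exp(sigma^2 / 2), and two
    regressors plus a constant, so that sigma2_hat = SSR / (n - 3).
    """
    sigma2_hat = ssr / (n - 3)
    naive = math.exp(b0 + b1 * educ + b2 * iq)
    corrected = naive * math.exp(sigma2_hat / 2)
    return naive, corrected

# Hypothetical inputs for illustration.
naive, corrected = predict_wage_from_log(
    b0=5.0, b1=0.06, b2=0.005, educ=15, iq=100, ssr=150.0, n=935
)
print(round(naive, 2), round(corrected, 2))
```

Since exp(σ̂²/2) > 1 whenever σ̂² > 0, the naive prediction systematically understates the expected wage under these assumptions.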
Jensen’s inequality