MODEL SPECIFICATION AND DATA PROBLEMS

Hüseyin Taştan
Yıldız Technical University, Department of Economics

These presentation notes are based on Introductory Econometrics: A Modern Approach (2nd ed.) by J. Wooldridge.

31 December 2012

Model Specification and Data Problems

- In the previous class we analyzed one failure of the Gauss-Markov assumptions: MLR.5 (constant variance). Heteroscedasticity does not cause bias or inconsistency in the OLS estimators, but it does make them inefficient. We learned that it is relatively easy to adjust standard errors and test statistics.
- Now we want to analyze a more serious problem, namely violation of the exogeneity assumption (MLR.3). We will examine the case where "the error term u is correlated with one or more of the explanatory variables" (i.e., endogeneity).
- Recall that if an x variable is correlated with the error term, it is called an endogenous variable.
- Recall that when a relevant variable is omitted from the model, the OLS estimators are biased and inconsistent.
- In the special case where the omitted variable is a function of an explanatory variable already in the model, the model suffers from functional form misspecification.

Model Specification and Data Problems

- In this chapter, we will first discuss functional form misspecification and how to test for it.
- Then, we will discuss how to use proxy variables to mitigate omitted variable bias.
- We will also discuss problems caused by measurement errors in the dependent and explanatory variables.
- We will discuss the problems caused by endogenous variables within the context of OLS estimators. In most cases, the endogeneity problem cannot be solved within the OLS framework. We will need consistent estimation methods such as Instrumental Variables and Two Stage Least Squares (2SLS).

Functional Form Misspecification

- A multiple regression model suffers from functional form misspecification when it does not properly account for the relationship between the dependent and the observed explanatory variables.
- For example, if we fit a level-level model instead of a log-log model (which is the true specification), or if we omit a quadratic term where one should have been added, then the model suffers from functional form misspecification. This, of course, leads to biased and inconsistent β̂j.
- Another example: suppose that the return to an additional year of education changes with gender, implying that the model should contain an interaction term. If we omit, for some reason, this interaction term, female × educ, then the functional form will be misspecified.
Functional Form Misspecification

- How can we detect a misspecified functional form? We can always use the F test for joint exclusion restrictions, such as the joint significance of quadratic terms, interaction terms, etc.
- We can use the usual statistical inference procedures to mitigate the functional form misspecification problem.
- Significant quadratic terms may be symptomatic of other functional form problems, such as using the level of a variable when the log is more appropriate.
- In fact, using a log transformation, where appropriate, may work well in practice.

A Test for Functional Form Misspecification

- Is there a general test that can detect functional form misspecification?
- Yes, there are many misspecification tests. We will only examine one of them.
- We will learn the Regression Specification Error Test, or RESET test, of Ramsey (1969).
- Ramsey's RESET test is designed to detect any neglected nonlinearities in the model.

Ramsey's RESET Test

- Suppose that in the multiple linear regression model

    y = β0 + β1 x1 + β2 x2 + . . . + βk xk + u

  the assumption MLR.3 (exogenous x's) is satisfied.
- This implies that no nonlinear functions of the independent variables (such as squares and cubes of the xj's) should be significant when added to the model.
- But, as in the White heteroscedasticity test, adding squares, cubes and cross-products uses up many degrees of freedom. This is a drawback.
- Instead, we can add squares and cubes of the fitted values, ŷ² and ŷ³, to the model and test for the joint significance of the added terms using an F or LM test.

RESET Test

- The auxiliary regression for the RESET test statistic can be written as follows:

    y = β0 + β1 x1 + β2 x2 + . . . + βk xk + δ1 ŷ² + δ2 ŷ³ + u

- The null hypothesis of the RESET test says that the model is correctly specified:

    H0: δ1 = 0, δ2 = 0

- In large samples and under the Gauss-Markov assumptions, the usual F test of these restrictions follows the F(2, n − k − 3) distribution.
- If the F statistic is greater than the critical value at a given significance level, we reject the null hypothesis of correct specification. This indicates a functional form misspecification.
- We can also use the LM test statistic, which follows the χ²₂ distribution.
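The mechanics of the RESET test can be sketched in pure Python (the slides use GRETL for the actual examples). This is a minimal illustration with simulated data: the true model is quadratic, we deliberately fit a linear one, and then test the joint significance of ŷ² and ŷ³ with the F statistic above. All data and parameter values here are invented for the illustration.

```python
import random

def ols(X, y):
    """Least squares via the normal equations (Gaussian elimination).
    X is a list of rows; each row already includes the intercept 1.0."""
    k, n = len(X[0]), len(y)
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         for a in range(k)]                      # X'X
    v = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]  # X'y
    for col in range(k):                         # forward elimination w/ pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    beta = [0.0] * k                             # back substitution
    for r in range(k - 1, -1, -1):
        beta[r] = (v[r] - sum(A[r][c] * beta[c]
                              for c in range(r + 1, k))) / A[r][r]
    return beta

def ssr(X, y, beta):
    """Sum of squared residuals."""
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(len(beta)))) ** 2
               for i in range(len(y)))

random.seed(1)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
# true model is quadratic; we deliberately fit a linear one
y = [1 + 2 * x[i] + 1.5 * x[i] ** 2 + random.gauss(0, 1) for i in range(n)]

X_r = [[1.0, x[i]] for i in range(n)]            # restricted (misspecified) model
b_r = ols(X_r, y)
yhat = [b_r[0] + b_r[1] * x[i] for i in range(n)]

X_ur = [X_r[i] + [yhat[i] ** 2, yhat[i] ** 3] for i in range(n)]
b_ur = ols(X_ur, y)

ssr_r, ssr_ur = ssr(X_r, y, b_r), ssr(X_ur, y, b_ur)
k = 1                                            # one original regressor
F = ((ssr_r - ssr_ur) / 2) / (ssr_ur / (n - k - 3))
print("RESET F statistic: %.1f" % F)             # large F -> reject correct form
```

Because the quadratic term was omitted, the F statistic comes out far above any conventional F(2, n − k − 3) critical value, so the test correctly flags the misspecification.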
RESET Test Example: House Prices, hprice1.gdt

- Level-level model:

    price = β0 + β1 lotsize + β2 sqrft + β3 bdrms + u

- Level-level estimation results:

    price-hat = −21.77 + 0.002 lotsize + 0.123 sqrft + 13.85 bdrms
               (29.475)  (0.0006)        (0.013)       (9.010)
    n = 88, R² = 0.672

- We form the test regression by adding the squares and cubes of ŷ to the model above. In GRETL, from the menu within the estimation results window: Tests > Ramsey's RESET > squares and cubes.

RESET Test Example: Level-Level Model

    Auxiliary regression for RESET specification test
    OLS, using observations 1-88
    Dependent variable: price

                coefficient    std. error    t-ratio   p-value
    ------------------------------------------------------------
    const       166.097        317.433        0.5233   0.6022
    lotsize       0.000153723    0.00520304   0.02954  0.9765
    sqrft         0.0175988      0.299251     0.05881  0.9532
    bdrms         2.17490       33.8881       0.06418  0.9490
    yhat^2        0.000353426    0.00709894   0.04979  0.9604
    yhat^3        1.54557e-06    6.55431e-06  0.2358   0.8142

    Test statistic: F = 4.668205,
    with p-value = P(F(2,82) > 4.66821) = 0.012

- RESET test result: at the 5% significance level we reject the null hypothesis that the functional form is correctly specified. Thus, there is functional form misspecification.

RESET Test Example: hprice1.gdt

- Alternative functional form: log-log model (except bdrms):

    lprice = β0 + β1 llotsize + β2 lsqrft + β3 bdrms + u

- Log-log estimation results:

    lprice-hat = −1.297 + 0.17 llotsize + 0.70 lsqrft + 0.037 bdrms
                (0.651)  (0.038)          (0.093)       (0.028)
    n = 88, R² = 0.643

- Now let us calculate the RESET test statistic.

RESET Test Example: hprice1.gdt

    Auxiliary regression for RESET specification test
    OLS, using observations 1-88
    Dependent variable: lprice

                coefficient   std. error   t-ratio   p-value
    -------------------------------------------------------
    const        87.8849      240.974       0.3647   0.7163
    llotsize     −4.18098      12.5952     −0.3319   0.7408
    lsqrft      −17.3491       52.4899     −0.3305   0.7418
    bdrms        −0.925329      2.76975    −0.3341   0.7392
    yhat^2        3.91024      13.0143      0.3005   0.7646
    yhat^3       −0.192763      0.752080   −0.2563   0.7984

    Test statistic: F = 2.565042,
    with p-value = P(F(2,82) > 2.56504) = 0.0831

- RESET test result: at the 5% significance level, we fail to reject the null hypothesis of correct specification. This indicates that the functional form is correct. We prefer the log-log specification.
RESET Test

- A drawback of the RESET test is that it provides no real direction on how to proceed if the model is rejected.
- Some have argued that RESET is a very general test for model misspecification, including unobserved omitted variables and heteroscedasticity.
- This conclusion is misguided. If the omitted variable is linearly related to the included variables, the RESET test has no power to detect it.
- Also, if the functional form is correct, the RESET test has no power for detecting heteroscedasticity.
- The RESET test is just a functional form test. It should not be used for other purposes.

Tests Against Nonnested Alternatives

- There are several tests for functional form misspecification. Consider the following two models:

    y = β0 + β1 x1 + β2 x2 + u
    y = β0 + β1 log(x1) + β2 log(x2) + u

- These are nonnested models: we cannot write one of them as a special case of the other.
- In this case we cannot use the F test.
- As long as the dependent variable is the same, two different approaches have been suggested.
- We can form a bigger model which includes both models as special cases and use an F test. This method was suggested by Mizon and Richard.

Tests Against Nonnested Alternatives

- The other method is known as the Davidson-MacKinnon test. This test is based on including the fitted values ŷ from one model in the other model as an additional regressor and conducting a t test.
- We will not examine these tests in detail.
- There are several drawbacks associated with nonnested tests.
- First, these tests may not choose a correct specification: both models could be rejected, or neither model could be rejected.
- If neither model can be rejected, we can use the adjusted R² to choose between them.
- Second, rejecting one model does not automatically mean that the alternative is correct. The true model may have a completely different specification.
- Third, if the dependent variables differ (for example, if one model has y and the other has log(y) as the dependent variable), these tests cannot be used. We would need more complex testing procedures, which we will not discuss here.

Using Proxy Variables for Unobserved Explanatory Variables

- Can we use a proxy variable for an omitted unobserved explanatory variable?
- We know that if the unobserved variable is an important, relevant variable, then the OLS estimators are biased and inconsistent.
- The question can be rephrased as follows: can we solve, or at least mitigate, the omitted variable bias using proxy variables?
- A proxy variable is a variable that is related to the unobserved variable that we would like to control for.
- Example: recall that in the wage equation we could not observe innate ability. Can we use the intelligence quotient (IQ) as a proxy for ability?
- IQ does not have to be the same thing as ability; we know they are not. What we need is for IQ to be correlated with ability.
Using Proxy Variables

- Consider the following model:

    y = β0 + β1 x1 + β2 x2 + β3 x3* + u

  y: log(wage), x1: educ, x2: exper, x3*: ability (unobserved)
- x3* is unobserved; x3 is a proxy for the unobserved variable.
- The proxy variable must be related to the unobserved variable, as represented by the following simple regression:

    x3* = δ0 + δ3 x3 + ν3

- We need the error term ν3 because these variables are not exactly related.
- Typically, these variables are positively correlated, so that δ3 > 0.
- If δ3 = 0 then x3 cannot be a suitable proxy.

Using Proxy Variables

- How can we use x3 to get unbiased, or at least consistent, estimators?
- We can just pretend that x3* and x3 are the same and run the regression of y on x1, x2, x3. This is called the plug-in solution to the omitted variables problem.
- How does this approach produce consistent estimators?
- To show this we need to make some assumptions about the error terms u and ν3.
- The error term u is uncorrelated with x1, x2 and x3*. This is the standard MLR.3 assumption.
- In addition, u must be uncorrelated with x3. Since x3 is the proxy variable, it is irrelevant in the population model: it is x3* that affects y, not x3.

    E(u|x1, x2, x3*, x3) = E(u|x1, x2, x3*) = 0

Using Proxy Variables

- The error term ν3 is uncorrelated with x1, x2 and x3.
- This can be stated as follows:

    E(x3*|x1, x2, x3) = E(x3*|x3) = δ0 + δ3 x3

- This says that once x3 is controlled for, the expected value of x3* does not depend on x1 and x2.
- For example, in the wage equation where IQ is the proxy variable for ability, this condition becomes

    E(ability|educ, exper, IQ) = E(ability|IQ) = δ0 + δ3 IQ

- This implies that the average level of ability changes only with IQ, not with educ and exper. Is this a reasonable assumption?

Using Proxy Variables

- Plugging x3* = δ0 + δ3 x3 + ν3 into the model and rearranging, we obtain

    y = (β0 + β3 δ0) + β1 x1 + β2 x2 + β3 δ3 x3 + u + β3 ν3

- Let the composite error term be e = u + β3 ν3:

    y = α0 + β1 x1 + β2 x2 + α3 x3 + e

  where α0 = β0 + β3 δ0 and α3 = β3 δ3.
- If the proxy-variable assumptions are all satisfied, then the composite error term e will be uncorrelated with the explanatory variables included in the model. Thus, the OLS estimators of α0, β1, β2, α3 will be consistent.
- The coefficient on IQ, α3, measures the impact of a one-point change in the IQ test score on wage.
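The plug-in solution can be checked with a small simulation. This is a sketch with invented numbers, not the wage2.gdt data: ability depends on IQ as in the proxy equation, education is correlated with ability only through IQ, and we compare the regression that omits ability entirely with the one that plugs in the IQ proxy. The partialling-out helper uses the standard Frisch-Waugh-Lovell shortcut to get the coefficient on educ while controlling for IQ.

```python
import random

def slope(x, y):
    """Simple OLS slope of y on x: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
           sum((a - mx) ** 2 for a in x)

def resid(dep, reg):
    """Residuals from a simple regression of dep on (const, reg)."""
    b = slope(reg, dep)
    a = sum(dep) / len(dep) - b * sum(reg) / len(reg)
    return [d - a - b * r for d, r in zip(dep, reg)]

def partial_slope(y, x, z):
    """Coefficient on x in a regression of y on (const, x, z),
    via Frisch-Waugh-Lovell partialling-out of z."""
    return slope(resid(x, z), resid(y, z))

random.seed(2)
n = 20000
beta_educ, beta_abil = 0.06, 0.8      # hypothetical structural coefficients
delta0, delta3 = -5.0, 0.05           # proxy equation: abil = d0 + d3*IQ + v3

iq    = [random.gauss(100, 15) for _ in range(n)]
abil  = [delta0 + delta3 * q + random.gauss(0, 0.5) for q in iq]
educ  = [6 + 0.08 * q + random.gauss(0, 1) for q in iq]   # correlated with abil via IQ
lwage = [1 + beta_educ * e + beta_abil * a + random.gauss(0, 0.5)
         for e, a in zip(educ, abil)]

b_naive = slope(educ, lwage)               # ability omitted -> biased upward
b_proxy = partial_slope(lwage, educ, iq)   # plug-in solution: IQ as proxy
print("omitting ability: %.3f   with IQ proxy: %.3f   true: %.2f"
      % (b_naive, b_proxy, beta_educ))
```

With the proxy included, the educ coefficient lands near its true value, while the naive regression absorbs much of the ability effect into educ.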
Using Proxy Variables: wage2.gdt

- This data set contains information on monthly wages, education, experience, tenure, IQ scores, and several demographic characteristics for a sample of 935 working men in 1980.
- Adding IQ test scores, we obtain the following results:

    Model 1: OLS, using observations 1–935
    Dependent variable: lwage

               Coefficient    Std. Error     t-ratio   p-value
    const       5.17644       0.128001       40.4407   0.0000
    educ        0.0544106     0.00692849      7.8532   0.0000
    exper       0.0141458     0.00316510      4.4693   0.0000
    tenure      0.0113951     0.00243938      4.6713   0.0000
    married     0.199764      0.0388025       5.1482   0.0000
    south      −0.0801695     0.0262529      −3.0537   0.0023
    urban       0.181946      0.0267929       6.7908   0.0000
    black      −0.143125      0.0394925      −3.6241   0.0003
    IQ          0.00355910    0.000991808     3.5885   0.0004

    Mean dependent var  6.779004    S.D. dependent var  0.421144
    Sum squared resid   122.1203    S.E. of regression  0.363152
    R²                  0.262809    Adjusted R²         0.256441

Using Lagged Dependent Variables as Proxy Variables

- In some applications (e.g., the wage example) we have at least a vague idea about which unobserved factor we want to control for.
- In other applications, we suspect that one or more of the independent variables is correlated with an omitted variable, but we have no idea how to obtain a proxy for that omitted variable.
- In such cases, we can include the value of the dependent variable from an earlier time period, y−1.
- To do this we need the lagged value of the dependent variable. This provides a way of controlling for historical factors that cause current differences in the dependent variable.
- For example, some cities have had high crime rates in the past. Many of the unobserved factors contribute to both high current and past crime rates. Slowly moving components of the dependent variable (inertial effects) can be captured by the lagged value.

Using Lagged Dependent Variables as Proxy Variables

- Example: CRIME2.gdt, 1987 crime data for 46 cities; information from 1982 is also available.
- The model without the lagged crime rate:

    lcrmrte87-hat = 3.34 − 0.029 unem87 + 0.203 llawexpc87
                   (1.251) (0.032)        (0.173)
    n = 46, R² = 0.057

- The model with the lagged crime rate:

    lcrmrte87-hat = 0.076 + 0.009 unem87 − 0.140 llawexpc87 + 1.194 lcrmrte82
                   (0.821) (0.02)          (0.109)            (0.132)
    n = 46, R² = 0.680

- In the first model, the crime rate decreases as unemployment increases. This is counterintuitive.
- After controlling for the crime rate in 1982 (5 years earlier), the coefficient on unem is positive but insignificant.
- What is the elasticity of the current crime rate with respect to the crime rate in the previous period?
Measurement Errors

- In some applications, it may be difficult or impossible to collect data on the actual values of variables.
- If the true value is not observed (in other words, we have an imprecise measure of a variable), then the observed value will contain measurement error.
- For example, income and consumption reported by households may differ from the actual values. Households may tend to underreport their income.
- In this section, we are interested in the properties of OLS estimators under measurement error.
- We will examine measurement errors in two parts: (1) measurement errors in the dependent variable and (2) measurement errors in the explanatory variables.
- We will learn under what conditions measurement errors lead to inconsistency in OLS estimators.

Measurement Errors in the Dependent Variable

- Let y* be the actual value of the dependent variable that we attempt to explain. For concreteness, suppose that y* is the actual savings of households:

    y* = β0 + β1 x1 + β2 x2 + . . . + βk xk + u

- y is the observed (or reported) value. The difference between the observed value and the actual value is the measurement error in the population:

    e0 = y − y*

- From this we have y* = y − e0. Plugging this into the model we obtain:

    y = β0 + β1 x1 + β2 x2 + . . . + βk xk + u + e0

- Now the error term in the new model is u + e0: the measurement error has become part of the regression error term. Does OLS produce consistent estimators?

Measurement Errors in the Dependent Variable

- The model is:

    y = β0 + β1 x1 + β2 x2 + . . . + βk xk + u + e0

- If the measurement error e0 is uncorrelated with each xj, then consistent estimation is possible. If the measurement error is independent of the explanatory variables, then the OLS estimators are unbiased and consistent.
- If the error term u and the measurement error e0 are independent (as is usually assumed), then we have:

    Var(u + e0) = Var(u) + Var(e0) = σu² + σ0² > σu²

- This means that measurement error in the dependent variable results in a larger error variance than when no error occurs.
- As a result, the OLS estimators will have larger variances and standard errors. In this case, we may try to collect higher-quality data.

Measurement Errors in the Dependent Variable: Example

- Consider the following savings model:

    sav* = β0 + β1 inc + β2 size + β3 educ + β4 age + u

  sav*: actual household savings; sav: reported (observed) household savings; inc: annual household income; size: number of individuals in the household; educ: education level of the household head; age: age of the household head.
- When does the measurement error (sav − sav*) create a problem?
- We can assume that the measurement error is uncorrelated with income, size, education and age.
- On the other hand, we may think that families with higher incomes, or more education, report their savings more accurately.
- Since we cannot observe the measurement error, we may never be able to determine whether it is correlated with income or education.
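The two claims above (slopes stay consistent, error variance inflates from σu² to σu² + σ0²) can be verified in a few lines. This is a simulation sketch with invented parameters, not the savings data: here σu² = σ0² = 1, so the residual variance should roughly double while the slope stays near its true value of 2.

```python
import random

def ols_simple(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def resid_var(x, y, a, b):
    """Estimated error variance, SSR / (n - 2)."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (len(x) - 2)

random.seed(3)
n = 20000
x      = [random.gauss(0, 1) for _ in range(n)]
y_star = [1 + 2 * xi + random.gauss(0, 1) for xi in x]   # actual y, sigma_u = 1
y_obs  = [ys + random.gauss(0, 1) for ys in y_star]      # reported y, sigma_0 = 1

a1, b1 = ols_simple(x, y_star)
a2, b2 = ols_simple(x, y_obs)
v_star = resid_var(x, y_star, a1, b1)
v_obs  = resid_var(x, y_obs, a2, b2)
print("slope (actual y): %.3f   slope (mismeasured y): %.3f" % (b1, b2))
print("error variance: %.2f vs %.2f" % (v_star, v_obs))
```

The slope estimated from the mismeasured y is still close to 2; only its precision suffers, through the larger error variance.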
Measurement Error in an Explanatory Variable

- Measurement error in x can lead to more serious problems than measurement error in y.
- To determine the conditions under which OLS estimators become inconsistent, consider the simple regression model:

    y = β0 + β1 x1* + u

  Suppose that the first four Gauss-Markov assumptions hold.
- Here, x1* is the unobserved actual value and x1 is the observed value.
- Then the measurement error is

    e1 = x1 − x1*

- Assume that the expected value of the measurement error is zero: E(e1) = 0.

Measurement Error in an Explanatory Variable

- Assume that the error term u is uncorrelated with both x1* and x1, so that:

    E(y|x1*, x1) = E(y|x1*)

- This means that after controlling for x1*, we no longer need x1 in the model.
- If we use x1 instead of x1*, what are the properties of the OLS estimators? Are they still consistent?
- This depends on the assumption we make about the measurement error.
- There are two possible assumptions: (1) the measurement error is uncorrelated with the observed value x1; (2) the measurement error is uncorrelated with the unobserved actual value x1*.

(1) e1 and x1 Are Uncorrelated

- This assumption can be written as

    Cov(x1, e1) = 0

- Since e1 = x1 − x1*, it must then be the case that e1 and x1* are correlated.
- Under this assumption, substituting x1* = x1 − e1 into the model we obtain:

    y = β0 + β1 x1 + (u − β1 e1)

- Expected value and variance of the composite error term:

    E(u − β1 e1) = 0,   Var(u − β1 e1) = σu² + β1² σe1²

- The OLS estimators are consistent because the error term and x1 are uncorrelated. But the error variance will be higher.

(2) e1 and x1* Are Uncorrelated (CEV Assumption)

- This is known as the "Classical Errors-in-Variables (CEV)" assumption. In the econometrics literature, measurement error in an explanatory variable usually means CEV.
- The CEV assumption can be written as:

    Cov(x1*, e1) = 0

- The observed value can be written as the sum of the actual value and the measurement error:

    x1 = x1* + e1

- Obviously, if x1* and e1 are uncorrelated, then x1 and e1 must be correlated:

    Cov(x1, e1) = E(x1 e1) = E(x1* e1) + E(e1²) = 0 + σe1² = σe1²

- Under the CEV assumption, the covariance between x1 and e1 equals the variance of the measurement error.
(2) CEV Assumption: Cov(x1*, e1) = 0

- Recall that the model was written as:

    y = β0 + β1 x1 + (u − β1 e1)

- Since e1 is included in the composite error term, its covariance with x1 will create a problem.
- The covariance between the composite error term and x1 is

    Cov(x1, u − β1 e1) = −β1 Cov(x1, e1) = −β1 σe1²

- Because this covariance is not 0, the OLS estimators will be biased and inconsistent under the CEV assumption.
- We can calculate the amount of inconsistency in OLS.

(2) CEV Assumption: Cov(x1*, e1) = 0

- In the simple regression model, the probability limit of the OLS estimator of the slope parameter is:

    plim(β̂1) = β1 + Cov(x1, u − β1 e1) / Var(x1)
             = β1 − β1 σe1² / (σx1*² + σe1²)
             = β1 (1 − σe1² / (σx1*² + σe1²))
             = β1 (σx1*² / (σx1*² + σe1²))

(2) CEV Assumption: Cov(x1*, e1) = 0

- Probability limit of the OLS estimator:

    plim(β̂1) = β1 (σx1*² / (σx1*² + σe1²)) ≠ β1

- The term in parentheses is always less than or equal to 1; it equals 1 if and only if σe1² = 0.
- This means that β̂1 is always closer to 0 than the true value β1 is. This is called attenuation bias.
- If β1 > 0 then β̂1 will approach a value smaller than the true value in the limit (underestimation). Otherwise, it will approach a bigger value (overestimation).

(2) CEV Assumption: Cov(x1*, e1) = 0

- If the variance of x1* is large compared to the variance of e1, then the ratio Var(x1*)/Var(x1) will be close to 1. In this case the amount of inconsistency may not be large. But it is almost impossible to determine this.
- Things are more complicated when we add more explanatory variables.
- But we can say that measurement errors generally lead to inconsistency of all OLS estimators.
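The attenuation result is easy to confirm by simulation. In this sketch (all values hypothetical) the variances of x1* and e1 are both 1, so the attenuation factor σx1*²/(σx1*² + σe1²) is 1/2 and the estimated slope should converge to half the true β1 = 2:

```python
import random

def slope(x, y):
    """Simple OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
           sum((a - mx) ** 2 for a in x)

random.seed(4)
n = 20000
beta1 = 2.0
var_xstar, var_e1 = 1.0, 1.0                  # equal variances -> factor 1/2

x_star = [random.gauss(0, var_xstar ** 0.5) for _ in range(n)]
e1     = [random.gauss(0, var_e1 ** 0.5) for _ in range(n)]  # CEV: e1 indep. of x*
x_obs  = [xs + e for xs, e in zip(x_star, e1)]               # x1 = x1* + e1
y      = [1 + beta1 * xs + random.gauss(0, 1) for xs in x_star]

b_hat      = slope(x_obs, y)
atten_pred = beta1 * var_xstar / (var_xstar + var_e1)        # plim from the formula
print("estimated slope: %.3f   predicted plim: %.3f   true beta1: %.1f"
      % (b_hat, atten_pred, beta1))
```

The estimate lands near 1.0 rather than 2.0, matching the plim formula: the slope is pulled toward zero, exactly the attenuation bias described above.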
(2) CEV Assumption: Cov(x1*, e1) = 0

- Consider the following model for college success:

    colGPA = β0 + β1 faminc* + β2 hsGPA + β3 SAT + u

  faminc: family income; hsGPA: high school GPA; SAT: Scholastic Aptitude Test score.
- faminc* is the actual family income. If a questionnaire method is used to collect the data, the student will be asked to report family income.
- We can collect data on hsGPA and SAT scores from student records, but we cannot do this for family income levels.
- If the reported income differs from the actual income, and if the CEV assumption is valid (i.e., actual income and measurement error are uncorrelated), then the OLS estimator of β1 will be biased and inconsistent.
- As a result, the impact of family income on college success will be underestimated (downward bias).

Data Problems

- Measurement error can be viewed as a data problem because we cannot obtain data on the actual variables of interest.
- Another data problem that we saw before is multicollinearity among the explanatory variables. When two independent variables are highly correlated, it can be difficult to estimate the partial effect of each, which is reflected in high standard errors. Remember that no assumption is violated in this case.
- There may be several other data problems:
  - Missing data
  - Nonrandom samples
  - Outliers (extreme observations)

Missing Data

- The missing data problem can arise in a variety of forms. For example, in surveys, respondents may not answer some of the questions.
- If data are missing for an observation on either the dependent variable or one of the independent variables, then the observation cannot be used in estimation. Econometric software packages usually ignore observations with missing data. As a result, the sample size decreases.
- Are there any serious statistical consequences of missing data? The answer depends on why the data are missing. If the data are missing at random, then this does not cause any bias; the only consequence is that the sample is smaller and the OLS estimates will be less precise.
- If the data are missing in a systematic way, the OLS estimators may be biased. For example, in the birthweight example, if the probability that education is missing is higher for people with a lower than average level of education, then we have systematic missing data, and MLR.2 (random sampling) is violated.

Nonrandom Sampling

- Violation of MLR.2 (random sampling): if the missing data result in a nonrandom sample, then we have a more serious problem.
- For example, in the wage equation, suppose we want to include IQ scores as an explanatory variable.
- If obtaining an IQ score is easier for those with higher IQs, then the sample is not representative of the population. Workers with high IQs will be over-represented in the sample.
- In this case MLR.2 may not hold, and thus the OLS estimators may be biased.
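The missing-at-random claim above (no bias, only a loss of precision) can be illustrated with a small Monte Carlo sketch (all numbers invented): repeatedly estimate the slope on a full sample and on a sample where about half the observations are dropped completely at random, then compare the spread of the two sets of estimates.

```python
import random

def slope(x, y):
    """Simple OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
           sum((a - mx) ** 2 for a in x)

def stdev(v):
    """Sample standard deviation."""
    m = sum(v) / len(v)
    return (sum((x - m) ** 2 for x in v) / (len(v) - 1)) ** 0.5

random.seed(5)
beta1, reps, n = 2.0, 300, 200
full_est, mar_est = [], []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [1 + beta1 * xi + random.gauss(0, 1) for xi in x]
    full_est.append(slope(x, y))
    # drop roughly half the observations completely at random
    keep = [(xi, yi) for xi, yi in zip(x, y) if random.random() < 0.5]
    mar_est.append(slope([p[0] for p in keep], [p[1] for p in keep]))

print("mean estimate, full: %.3f   missing-at-random: %.3f"
      % (sum(full_est) / reps, sum(mar_est) / reps))
print("sd of estimates, full: %.3f   missing-at-random: %.3f"
      % (stdev(full_est), stdev(mar_est)))
```

Both means sit near the true slope of 2, but the estimates from the reduced samples are visibly more variable: random missingness costs precision, not unbiasedness.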
Nonrandom Sampling

- Certain types of nonrandom sampling do not cause bias or inconsistency.
- The sample can be chosen on the basis of the independent variables without causing any statistical problems.
- This is called exogenous sample selection.
- For example, consider the following saving equation:

    saving = β0 + β1 income + β2 age + β3 size + u

- If our data set is based on a survey of people over 35 years of age, then we have exogenous sample selection, a type of nonrandom sampling.
- If the other assumptions are satisfied, then OLS is still unbiased and consistent. The reason is that the conditional expectation

    E(saving|income, age, size)

  is the same for any subset of the population defined by income, age, or size.

Nonrandom Sampling

- If the sample selection is based on the dependent variable y, MLR.2 will not be satisfied, which will cause bias in OLS.
- This is called endogenous sample selection.
- Consider the following wealth equation:

    wealth = β0 + β1 educ + β2 exper + β3 age + u

- Suppose that only people with wealth below $250,000 are included in the sample. This is a kind of endogenous sample selection and will result in biased and inconsistent estimators.
- This is because the population regression

    E(wealth|educ, exper, age)

  is not the same as the expected value conditional on wealth being less than $250,000.
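The contrast between exogenous and endogenous selection can be shown in a simulation sketch (all numbers hypothetical; a cutoff on simulated wealth plays the role of the $250,000 rule). Selecting on the explanatory variable leaves the slope intact; selecting on the dependent variable pulls it toward zero:

```python
import random

def slope(x, y):
    """Simple OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
           sum((a - mx) ** 2 for a in x)

random.seed(6)
n = 20000
educ   = [random.gauss(12, 2) for _ in range(n)]
wealth = [10 + 2 * e + random.gauss(0, 5) for e in educ]   # true slope = 2

b_full = slope(educ, wealth)

# exogenous selection: keep observations based on an explanatory variable
sel = [(e, w) for e, w in zip(educ, wealth) if e > 12]
b_exog = slope([p[0] for p in sel], [p[1] for p in sel])

# endogenous selection: keep observations based on the dependent variable
sel = [(e, w) for e, w in zip(educ, wealth) if w < 34]     # "wealth below cutoff"
b_endog = slope([p[0] for p in sel], [p[1] for p in sel])

print("full sample: %.2f   selected on educ: %.2f   selected on wealth: %.2f"
      % (b_full, b_exog, b_endog))
```

The slope survives selection on educ but is severely attenuated when the sample is truncated on wealth, which is exactly the bias from endogenous sample selection.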

Outliers - Influential Observations

- In some applications (usually, but not only, in small data sets) the OLS estimates are sensitive to the inclusion of one or several observations.
- An observation is an influential observation if dropping it from the analysis changes the key OLS estimates by a practically large amount.
- An outlier is an unusually large or small value for some observation.
- OLS can be sensitive to outliers because, in minimizing the SSR, large residuals receive a lot of weight.
- How can we determine whether an observation is an outlier or an influential observation?

Outliers - Influential Observations

- Outliers can occur for two reasons in practice: (1) a mistake has been made in collecting or entering the data (e.g., adding a zero by mistake or misplacing a decimal point), or (2) the outlier is a genuine feature of the distribution of the variable.
- In practical applications, it is a good idea to examine summary statistics of the variables: mean, median, mode, minimum, maximum, standard deviation, etc.
- It is not very clear what should be done if the outlier is a feature of the distribution.
- Outlying observations can provide important information by increasing the variation in the explanatory variables, resulting in reduced standard errors.
- The usual practice is to report OLS results both with and without the outlying observations.
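A tiny deterministic example (invented data, not the RDCHEM set) shows how heavily a single bad point can swing OLS: ten points lying exactly on the line y = 2x, plus one mis-entered observation.

```python
def ols_simple(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# ten well-behaved points on y = 2x, plus one badly mis-entered point
x = [float(v) for v in range(1, 11)] + [100.0]
y = [2.0 * v for v in range(1, 11)] + [0.0]

_, b_with = ols_simple(x, y)
_, b_without = ols_simple(x[:-1], y[:-1])
print("slope with outlier: %.3f   without: %.3f" % (b_with, b_without))
```

Because squared residuals give the far-out point enormous leverage, one observation is enough to destroy (here, even flip the sign of) an otherwise exact relationship, which is why results are reported with and without such points.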
Outliers: Example

- Research and Development (R&D) intensity and firm performance:

    rdintens = β0 + β1 sales + β2 profmarg + u

  rdintens: R&D expenditures as a percentage of sales; sales: firm sales (in millions of $); profmarg: profits as a percentage of sales.
- Data set: RDCHEM.gdt. Estimation results:

    rdintens-hat = 2.62 + 0.00005 sales + 0.045 profmarg
                  (0.585) (0.00004)       (0.046)
    n = 32, R² = 0.076

- Neither sales nor profmarg is statistically significant at even the 10% level.
- Are there any outliers? Let us examine the scatter diagram.

Outliers: Example

- Of the 32 firms, 31 have annual sales of less than $20 billion. One firm has annual sales of nearly $40 billion.
- This may be an outlier. Estimation results without the outlier:

    rdintens-hat = 2.297 + 0.000186 sales + 0.0478 profmarg
                  (0.592)  (0.000084)       (0.0445)
    n = 31, R² = 0.1728

- When the largest firm is dropped from the regression, the coefficient on sales more than triples, and it now has a t statistic over 2.
- There is a statistically significant relationship between R&D intensity and sales.
- The profit margin is still insignificant, and its coefficient has not changed much.
Outliers

- Certain functional forms may be less sensitive to outlying observations. A logarithmic transformation significantly narrows the range of the data, which can mitigate the problems created by outliers. For example, consider the following model:

    log(rd) = β0 + β1 log(sales) + β2 profmarg + u

  rd: R&D expenditures, in millions of $
- n = 32, with the outlier:

    log(rd)-hat = −4.378 + 1.084 log(sales) + 0.023 profmarg
                 (0.468)  (0.060)             (0.013)
    n = 32, R² = 0.918

- n = 31, without the outlier:

    log(rd)-hat = −4.404 + 1.088 log(sales) + 0.0218 profmarg
                 (0.511)  (0.067)             (0.013)
    n = 31, R² = 0.9037

- The results are practically the same. Can we reject the null hypothesis of unit elasticity?
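The unit-elasticity question can be answered with a one-line calculation from the reported n = 32 estimates: a t test of H0: β1 = 1 against a two-sided alternative.

```python
# Test H0: beta1 = 1 (unit elasticity) using the reported log-log estimates
# with the outlier included (n = 32, k = 2 regressors).
b1, se1 = 1.084, 0.060
t = (b1 - 1.0) / se1
df = 32 - 2 - 1                     # n - k - 1 = 29
print("t statistic: %.2f (df = %d)" % (t, df))
# The 5% two-sided critical value for 29 df is about 2.045.
```

The t statistic of 1.40 is well below 2.045, so we fail to reject unit elasticity: R&D spending appears to rise roughly proportionally with sales.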
