Multiple Linear Regression
Multiple Linear Regression (1)
The Simple Linear Regression appears to be an intuitive and rather simple statistical tool, but its
limitation is quite evident: there are usually other factors that could affect "Y" and that may also be
related to "X".
A simple regression does not allow for a “ceteris paribus” interpretation of the effect of X on Y!!
- The regression framework can be easily generalized to the inclusion of a certain (fixed) number of
observed covariates.
What is the effect of X on Y, keeping constant all the other (observed) conditional factors of Y?
This kind of question can be addressed with a Multiple Linear Regression.
- First, we will present the mechanics and interpretation of the OLS estimator for Multiple Linear Regression
(i.e. a regression that includes several explanatory variables at the same time).
- Second, we will define the underlying assumptions under which the OLS estimator represents a good
approximation of the statistical relationship under investigation and the properties that OLS has if such
conditions are satisfied.
- Third, we will derive the tools that can be employed to carry out statistical inference from the model’s
estimate (which are also valid under some “additional” assumptions).
Multiple Linear Regression (2)
Definitions and notation:
- yi = dependent (endogenous) variable
- Xi = (1, x1i, x2i, x3i, …, xki) = vector of explanatory (exogenous) variables (k + 1 elements, including the intercept)
- β = (α, β1, β2, β3, ...., βk) = vector of parameters to be estimated (intercept and slopes for each X)
- εi = error term (random disturbance)
Matrix notation: $y_i = \beta'X_i + \varepsilon_i$
The OLS estimator chooses the coefficients that minimize the Sum of Squared Residuals (SSR):
$SSR = \sum_{i=1}^{n}\big(Y_i - \hat{\alpha} - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i} - \hat{\beta}_3 x_{3i} - \dots - \hat{\beta}_k x_{ki}\big)^2 = \sum_{i=1}^{n}\hat{\varepsilon}_i^2 = (Y - X\hat{\beta})'(Y - X\hat{\beta})$
Multiple Linear Regression (5)
What is really done by OLS in a multiple regression? Graphical evidence with simulated data
$y_i = 2 + 0.5\,x_{1i} + 1.2\,x_{2i} + \varepsilon_i;\quad \varepsilon_i \sim N(0,1);\quad x_{1i} \sim N(5,2);\quad x_{2i} \sim U(1,50)$
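As an added illustration (not part of the original slides), a minimal numpy sketch that simulates the data-generating process above and recovers the coefficients with the OLS formula $\hat{\beta} = (X'X)^{-1}X'Y$ discussed below; the seed, the sample size and all variable names are arbitrary choices, and N(5, 2) is read as mean 5 with standard deviation 2:

```python
import numpy as np

rng = np.random.default_rng(42)                  # arbitrary seed
n = 1000                                         # arbitrary sample size

x1 = rng.normal(5, 2, n)                         # x1 ~ N(5, 2), '2' read as the standard deviation
x2 = rng.uniform(1, 50, n)                       # x2 ~ U(1, 50)
eps = rng.normal(0, 1, n)                        # eps ~ N(0, 1)
y = 2 + 0.5 * x1 + 1.2 * x2 + eps

X = np.column_stack([np.ones(n), x1, x2])        # design matrix with a constant for the intercept

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # (X'X)^{-1} X'y
print(beta_hat)                                  # roughly [2, 0.5, 1.2]
```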
Multiple Linear Regression (7)
The solution of the OLS optimization problem provides, by construction, the best linear
approximation of "Y" given $x_1, \dots, x_k$ plus a constant (i.e. $\hat{\beta} = (X'X)^{-1}X'Y$).
- We start by considering that the (linear) relationship between Y and the Xs ($y_i = \beta'X_i + \varepsilon_i$) is valid for any
well-defined population of interest (i.e. all households of a country, all firms of a given industry, etc.).
- This linear equation is assumed to be valid for any possible observation, while we only observe a sample of
“n” observations that is a single possible realization of all possible samples of the same size that could have
been drawn from the same population:
In this case, Y, X and ε are random variables and $y_i = \beta'X_i + \varepsilon_i$ becomes a statistical model!
β is a vector of unknown parameters that characterize the population of interest.
In this setting, the statistical model is tautological without further assumptions (i.e. for any value of β, we
can always define ε such that the linear statistical model that explains Y holds).
Multiple Linear Regression (8)
The coefficients' vector obtained by OLS ($\hat{\beta} = (X'X)^{-1}X'Y$) represents the estimate of the
true population parameter vector (β).
OLS is thus an estimator: it represents the rule that says how a given sample is translated into an
approximate value of β.
The OLS estimator is itself a random variable (i.e. a new sample means a new estimate):
- Because the sample is randomly drawn from a larger population.
- Because the data are generated by some random process.
How well the OLS estimator represents the true value of the unknown "betas" depends on
the assumptions that we are willing to make.
Given the assumptions that will be made (see next), we can evaluate the quality of OLS as an
estimator based on the properties that it has (which ultimately depend on the validity of the underlying
assumptions).
Multiple Linear Regression (9)
Classical OLS Assumptions (Gauss-Markov Assumptions):
1) Conditional Exogeneity (independence between the explanatory variables and the error term):
$E[\varepsilon_i \mid X_i] = 0 \;\Rightarrow\; E[\varepsilon_i] = 0;\ E[X_i\varepsilon_i] = 0$
This assumption states that the expected value of ε is 0 for any value of the Xs.
It implies that the expected value of ε is zero and that ε is not correlated with the Xs (notice that the converse
does not necessarily hold).
When is this assumption not satisfied? Three general cases (see "Cases when OLS fails" at the end of this section).
Intuitively, taken together, assumptions 1-3 mean that the matrix of regressor values X does not provide any
information about the first and second moments of the distribution of the unobservables (ε).
Multiple Linear Regression (10)
Classical OLS Assumptions (Gauss-Markov Assumptions):
4) Linearity of the conditional expectation of Y (linear function of the parameters):
$E[y_i \mid X_i] = \beta'X_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki}$
Assumption 4 together with assumption 1 indicate that the multiple linear regression has a “ceteris paribus”
interpretation.
- In general, we would be able to test the validity of assumptions 2, 3 and 5. The failure of assumptions 2 and 3 can be
generally accommodated (using alternative estimators, e.g. Instrumental Variables/Two Stages Least Squares).
- Assumption 4 is generally valid, or at least can be taken as a valid approximation (although non-linear methods also exist).
- Assumption 1 is generally untestable (although an imperfect test can be done when comparing OLS with alternative
estimators).
Multiple Linear Regression (11)
What are the properties of the OLS estimators under the Gauss-Markov Hypothesis?
i.e. how well does OLS describe reality when assumptions 1-5 are satisfied?
Let’s consider first the properties of OLS for a small (finite) sample.
$E[\hat{\beta}] = E[(X'X)^{-1}X'Y] = E[(X'X)^{-1}X'(X\beta + \varepsilon)] = \beta + E[(X'X)^{-1}X'\varepsilon] = \beta + E[(X'X)^{-1}X'\,E(\varepsilon \mid X)]$
$\Rightarrow E[\hat{\beta}] = \beta$ if $E[\varepsilon \mid X] = 0$ (under hypothesis 1).
If $E[\varepsilon \mid X] = 0$ (which implies $E[\varepsilon] = 0$ and $E[X'\varepsilon] = 0$), the OLS estimator of the unknown parameters β is unbiased, which means that,
on average, in repeated sampling, the OLS estimator equals the population parameters.
Notice that assumptions 2 and 3 play no role in the "correctness" of the OLS estimator. This means that OLS
is unbiased even when assumptions 2 and 3 fail.
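A small Monte Carlo sketch (an added illustration reusing the simulated DGP from above) of what unbiasedness means in repeated sampling: the average of the OLS estimates across many samples is close to the true parameter vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
true_beta = np.array([2.0, 0.5, 1.2])

estimates = np.empty((reps, 3))
for r in range(reps):
    # a new sample from the same population => a new estimate
    x1 = rng.normal(5, 2, n)
    x2 = rng.uniform(1, 50, n)
    y = true_beta @ np.vstack([np.ones(n), x1, x2]) + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))                    # close to [2, 0.5, 1.2]
```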
Multiple Linear Regression (12)
In addition to knowing that, on average, the OLS estimator is correct, we would also like to
make statements about how (un)likely it is to be far off in a given sample.
How can we characterize the distribution of the OLS estimator?
What is the degree of precision of 𝛽መ in providing a “correct” estimation of the true relationship of interest?
$Var(\hat{\beta} \mid X) = Var\big((X'X)^{-1}X'\varepsilon \mid X\big) = (X'X)^{-1}X'\,E[\varepsilon\varepsilon' \mid X]\,X\,(X'X)^{-1} = \sigma^2(X'X)^{-1}X'X(X'X)^{-1} = \sigma^2(X'X)^{-1}$ (under hypotheses 2-3)
Multiple Linear Regression (13)
Notice that the previous result involves the variance of the error term (ε), which is by
definition unknown (since the errors are unobservable).
In order to obtain an (unbiased) estimate of the variance of the OLS coefficients (under assumptions 1-5),
we use the sample variance of the residuals ($\hat{\varepsilon}$):
$\widehat{Var}(\varepsilon_i \mid X_i) = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n-(k+1)} = \frac{(Y - X\hat{\beta})'(Y - X\hat{\beta})}{n-(k+1)} = \frac{\sum_{i=1}^{n}\hat{\varepsilon}_i^2}{n-(k+1)} = \hat{\sigma}^2$
- Notice that the estimator of the error term's variance has a "degrees of freedom" correction, since the
standard formula of the variance would provide a biased estimate of $\sigma^2$.
Substituting the estimated variance of the errors into the formula of the coefficients’ variance yields:
$\widehat{Var}(\hat{\beta} \mid X) = \hat{\sigma}^2(X'X)^{-1} = \frac{\sum_{i=1}^{n}\hat{\varepsilon}_i^2}{n-(k+1)}\,(X'X)^{-1} = \frac{SSR}{n-(k+1)}\,(X'X)^{-1}$
- This is the so-called Variance-Covariance Matrix of the OLS Coefficients.
- The square roots of the elements on the main diagonal of this matrix are the Coefficients' Standard Errors.
- The elements outside the main diagonal represent the covariances between different coefficients.
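A minimal added sketch (it rebuilds the simulated example from above; all names are illustrative) of how $\hat{\sigma}^2$, the variance-covariance matrix and the standard errors are computed:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x1, x2 = rng.normal(5, 2, n), rng.uniform(1, 50, n)
y = 2 + 0.5 * x1 + 1.2 * x2 + rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

resid = y - X @ beta_hat
sigma2_hat = (resid @ resid) / (n - X.shape[1])  # SSR / (n - (k+1))
vcov = sigma2_hat * np.linalg.inv(X.T @ X)       # variance-covariance matrix of the coefficients
std_errors = np.sqrt(np.diag(vcov))              # square roots of the diagonal elements
print(std_errors)
```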
Multiple Linear Regression (14)
The components of the OLS Coefficients’ Variance.
It is possible to analyze the factors that affect the variance of the estimated coefficients:
$Var(\hat{\beta} \mid X) = \hat{\sigma}^2(X'X)^{-1} \;\Rightarrow\; Var(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{SST_j\,(1-R_j^2)} = \frac{\hat{\sigma}^2}{\big(\sum_{i=1}^{N}(x_{ij}-\bar{x}_j)^2\big)(1-R_j^2)}$
- The variance of $\hat{\beta}_j$ increases with the estimated variance of the error term ($\hat{\sigma}^2$).
The more “noise” in the equation that explains y, the more imprecision in the estimation of the coefficients.
- The variance of $\hat{\beta}_j$ decreases with the amount of variation in $x_j$ ($SST_j$).
More variability in the explanatory variable(s) and larger samples (i.e. more degrees of freedom, n-(k+1)) reduce the
variance of the coefficients.
- The variance of $\hat{\beta}_j$ increases with the degree of (linear) relationship between $x_j$ and the other explanatory variables ($R_j^2$, the R-squared of a regression of $x_j$ on the other regressors).
If the explanatory variables are excessively correlated (multicollinearity), their coefficients are imprecisely estimated.
Overall, an excessive variance of the estimated coefficients means less precise estimators, which
translates into larger confidence intervals and less accurate hypothesis testing (see later).
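A short continuation of the previous sketch (it assumes X, sigma2_hat and vcov from that block are still in scope) showing that $\hat{\sigma}^2 / \big(SST_j(1-R_j^2)\big)$ reproduces the corresponding diagonal element of the variance-covariance matrix:

```python
import numpy as np

# continues the previous sketch: X, sigma2_hat and vcov are assumed to be in scope
j = 1                                            # look at the coefficient on x1
xj = X[:, j]
X_others = np.delete(X, j, axis=1)               # the constant and the other regressor(s)

# R_j^2: R-squared of a regression of x_j on the other explanatory variables
gamma = np.linalg.solve(X_others.T @ X_others, X_others.T @ xj)
Rj2 = 1 - np.sum((xj - X_others @ gamma) ** 2) / np.sum((xj - xj.mean()) ** 2)

SSTj = np.sum((xj - xj.mean()) ** 2)
var_bj = sigma2_hat / (SSTj * (1 - Rj2))
print(var_bj, vcov[j, j])                        # the two numbers coincide
```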
Multiple Linear Regression (15)
Overall, what can be said about the OLS estimator under the Gauss-Markov assumptions?
- In finite samples (i.e. regardless of sample size), the Gauss-Markov Theorem states that the OLS estimator is
BLUE: the Best Linear Unbiased Estimator.
$\hat{\beta}$ is the most accurate estimator, since it has the lowest possible variance (i.e. it is the most precise)
among the estimators that are linear and unbiased.
This result is quite useful, especially because it can be used as a benchmark for a) cases in which any of the
underlying hypotheses fail or b) other estimators that are alternatives to OLS.
The estimated model can then be used to compute fitted (predicted) values:
$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i \;\Rightarrow\; \hat{y}_i = \widehat{E}[y_i \mid X_i] = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + E[\varepsilon_i] = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i}$
Multiple Linear Regression (17)
Can we use the regression model to “predict” the values of the dependent variable under
different scenarios?
- As for the estimated coefficients, it is possible to evaluate the precision with which the model predicts the
outcome for specific values of the regressors by deriving the variance of the prediction.
$Var(\hat{y}^*) = Var(\hat{\beta}'X^*) = X^{*\prime}\,Var(\hat{\beta})\,X^* = \sigma^2\,X^{*\prime}(X'X)^{-1}X^*$
- However, the variance of the prediction only reflects the variation in the predictor that would arise if different
samples were drawn (i.e. the variation in the predictor owing to the variation in $\hat{\beta}$).
In order to better appreciate how accurate the prediction is, we need to compute the variance of the
prediction error ($\hat{y}^* - y^*$).
$\hat{y}^* - y^* = \hat{\beta}'X^* - \beta'X^* - \varepsilon^* = X^{*\prime}(\hat{\beta} - \beta) - \varepsilon^* \;\Rightarrow\; Var(\hat{y}^* - y^*) = \sigma^2 + \sigma^2\,X^{*\prime}(X'X)^{-1}X^*$
Notice that in a simple regression model the last formula simplifies to:
$Var(\hat{y}^* - y^*) = \sigma^2 + \sigma^2\left(\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_i(x_i - \bar{x})^2}\right)$
Intuition: the further the value of x* is from its sample mean, the larger the variance of the prediction error.
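A sketch of the prediction and prediction-error variances, again continuing the simulated example (X, beta_hat and sigma2_hat are assumed to be in scope; the values in x_star are purely illustrative):

```python
import numpy as np

# continues the simulated example: X, beta_hat and sigma2_hat are assumed to be in scope
x_star = np.array([1.0, 5.0, 25.0])              # [const, x1*, x2*]; illustrative values

y_hat_star = x_star @ beta_hat                   # point prediction
XtX_inv = np.linalg.inv(X.T @ X)

var_pred = sigma2_hat * (x_star @ XtX_inv @ x_star)   # Var(y_hat*): sampling variation of beta_hat
var_pred_error = sigma2_hat + var_pred                # Var(y_hat* - y*): adds the error variance
print(y_hat_star, np.sqrt(var_pred_error))
```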
Multiple Linear Regression (18)
An additional hypothesis (useful but not trivial): Normality of the Error Term
- So far we have "only" assumed that the error terms (ε_i) are (mean) independent of the Xs, are mutually uncorrelated and
have constant variance. Therefore, we have not established any assumption about the "shape" of the error
terms' distribution.
- For finite samples, an additional assumption is needed for the purpose of carrying out statistical inference from the
regression model: Joint Normality of the Error Terms.
$\varepsilon_i \sim NID(0, \sigma^2)$
Notice that if the error terms are normally distributed, this means that also the conditional distribution of y
(given Xs) follows a normal distribution.
- What is the implication of this hypothesis in terms of the distribution of the beta coefficients?
$\varepsilon_i \sim NID(0, \sigma^2) \;\Rightarrow\; \hat{\beta} \sim N\big(\beta,\ \sigma^2(X'X)^{-1}\big)$
This result actually provides the basis for carrying out statistical inference (i.e. hypothesis testing) from the
Multiple Linear Regression Model (see later).
Multiple Linear Regression (18)
Asymptotic properties of the OLS estimator
- We now know the properties of the OLS estimator in finite (i.e. small) samples, which essentially depend
on the assumptions we made about the error term (ε).
What happens to OLS when the sample size grows, hypothetically, infinitely large (i.e. $n \to \infty$)?
- Let's consider the so-called Asymptotic Properties of the OLS estimator, which are derived using standard results
from asymptotic theory.
Specifically, it is interesting to analyze whether it is possible to “relax” some of the underlying assumptions (1-5)
when sample size goes to infinity.
- Consistency of the OLS estimator:
$\lim_{n\to\infty} P\big(|\hat{\beta}_k - \beta_k| > \delta\big) = 0 \;\; \text{for all } \delta > 0 \quad \equiv \quad \operatorname*{plim}_{n\to\infty}\hat{\beta}_k = \beta_k$
It is possible to show that the consistency property is satisfied if $E[X'\varepsilon] = 0$, which is a weaker hypothesis than (1)
$E[\varepsilon \mid X] = 0$ (i.e. the latter implies the former, but the opposite is not necessarily true).
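A quick added sketch of consistency: re-estimating the same simulated model with growing sample sizes, the OLS estimates concentrate around the true values (the grid of sample sizes is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)

for n in (50, 500, 5000, 50000):
    x1 = rng.normal(5, 2, n)
    x2 = rng.uniform(1, 50, n)
    y = 2 + 0.5 * x1 + 1.2 * x2 + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), x1, x2])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(n, b)                                  # estimates get closer to (2, 0.5, 1.2) as n grows
```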
Statistical Inference (2)
Hypothesis testing for single coefficients.
- Given that, if the error terms are normally distributed, the estimated coefficients are also normally
distributed ($\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1})$), it is possible to construct a t-Statistic for each single unknown population
parameter $\beta_k$.
$t\text{-statistic}(\hat{\beta}_k) = t_k = \frac{\hat{\beta}_k - \beta_k^0}{s.e.(\hat{\beta}_k)}$
- Two-sided test:
$H_0: \beta_k = \beta_k^0$ (null hypothesis, assumed to be true) vs. $H_1: \beta_k \neq \beta_k^0$ (alternative hypothesis) $\;\Rightarrow\; Prob\big(|t_k| > t_{n-(k+1);\,\alpha/2}\big) = \alpha$ (significance level).
Reject $H_0$ if $|t_k| > t_{n-(k+1);\,\alpha/2}$ or P-value < α (α usually 0.1, 0.05 or 0.01, but always chosen by the researcher).
One-sided test:
$H_0: \beta_k \leq \beta_k^0$ (null hypothesis, assumed to be true) vs. $H_1: \beta_k > \beta_k^0$ $\;\Rightarrow\; Prob\big(t_k > t_{n-(k+1);\,\alpha}\big) = \alpha$ (significance level).
Reject $H_0$ if $t_k > t_{n-(k+1);\,\alpha}$ or P-value < α (α usually 0.1, 0.05 or 0.01, but always chosen by the researcher).
Statistical Inference (3)
Hypothesis testing for single coefficients.
- The most common (two-sided) test involves the statistical significance of the estimated βk
Reject $H_0$ (i.e. $\hat{\beta}_k$ is statistically different from 0, or is said to be significant) if $|t_k| > t_{n-(k+1);\,\alpha/2}$ or P-value < α.
- Example:
Estimated model (s.e. in parentheses), n = 100: $y_i = 2 + \underset{(0.15)}{0.6}\,x_{1i} + \underset{(0.9)}{1.3}\,x_{3i} + \hat{\varepsilon}_i$
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0 \;\Rightarrow\; t\text{-statistic}(\hat{\beta}_1) = \frac{0.6}{0.15} = 4 > 1.98472 = t_{100-3;\,0.05/2} \;\Rightarrow\; \text{Reject } H_0$
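A quick check of this example with scipy (an added sketch; the critical value and p-value follow from the numbers reported above):

```python
from scipy import stats

n, k = 100, 2                        # two slope coefficients plus the intercept
df = n - (k + 1)                     # degrees of freedom = 97

t_stat = 0.6 / 0.15                  # coefficient / standard error = 4
t_crit = stats.t.ppf(1 - 0.05 / 2, df)           # two-sided 5% critical value, about 1.9847
p_value = 2 * stats.t.sf(abs(t_stat), df)        # two-sided p-value
print(t_stat, t_crit, p_value)                   # p-value < 0.05 => reject H0
```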
Statistical Inference (3)
Confidence Intervals of estimated coefficients
The Confidence Interval (CI) provides the interval of all values of $\beta_k^0$ for which the null hypothesis $H_0: \beta_k = \beta_k^0$
would not be rejected (i.e. the range of values of the true $\beta_k$ that are not unlikely given the data).
$-t_{n-(k+1);\,\alpha/2} < \frac{\hat{\beta}_k - \beta_k}{s.e.(\hat{\beta}_k)} < t_{n-(k+1);\,\alpha/2} \;\Rightarrow\; \underbrace{\hat{\beta}_k - t_{n-(k+1);\,\alpha/2} \cdot s.e.(\hat{\beta}_k)}_{\text{Lower Bound of the CI}} < \beta_k < \underbrace{\hat{\beta}_k + t_{n-(k+1);\,\alpha/2} \cdot s.e.(\hat{\beta}_k)}_{\text{Upper Bound of the CI}}$
Similarly, the Confidence Interval of the Prediction is: $CI(\hat{y}^*) = \hat{\beta}'X^* \pm t_{n-(k+1);\,\alpha/2} \cdot \hat{\sigma}\sqrt{1 + X^{*\prime}(X'X)^{-1}X^*}$
- Example:
Estimated model (s.e. in parentheses), n = 100: $y_i = 2 + \underset{(0.15)}{0.6}\,x_{1i} + \underset{(0.9)}{1.3}\,x_{3i} + \hat{\varepsilon}_i$
$\Rightarrow 95\%\ CI(\beta_1) = 0.6 \pm 1.98472 \cdot 0.15 \approx [0.30,\ 0.90]$
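The same interval computed with scipy (an added sketch, using the numbers of the example):

```python
from scipy import stats

n, k = 100, 2
df = n - (k + 1)
beta_hat, se = 0.6, 0.15

t_crit = stats.t.ppf(1 - 0.05 / 2, df)
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
print(ci)                            # roughly (0.30, 0.90)
```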
Statistical Inference (7)
Hypothesis testing for linear combinations of coefficients.
- We are often interested in testing the statistical validity of restrictions based on linear combinations of
coefficients, for example:
$H_0: \beta_1 = \beta_2$ ($\equiv \beta_1 - \beta_2 = 0$) (null hypothesis, assumed to be true) vs. $H_1: \beta_1 \neq \beta_2$ ($\equiv \beta_1 - \beta_2 \neq 0$) (alternative hypothesis).
We would need $Cov(\hat{\beta}_1, \hat{\beta}_2)$, since $s.e.(\hat{\beta}_1 - \hat{\beta}_2) = \sqrt{Var(\hat{\beta}_1) + Var(\hat{\beta}_2) - 2\,Cov(\hat{\beta}_1, \hat{\beta}_2)}$; the covariance can be retrieved from the variance-covariance matrix of the coefficients:
$Var(\hat{\beta}) = \hat{\sigma}^2(X'X)^{-1} = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n-(k+1)}(X'X)^{-1} = \begin{pmatrix} Var(\hat{\beta}_0) & \cdots & Cov(\hat{\beta}_0, \hat{\beta}_k) \\ \vdots & \ddots & \vdots \\ Cov(\hat{\beta}_k, \hat{\beta}_0) & \cdots & Var(\hat{\beta}_k) \end{pmatrix}$
Statistical Inference (9)
- Example: $y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
$H_0: \beta_1 = \beta_2$ ($\equiv \beta_1 - \beta_2 = 0$) vs. $H_1: \beta_1 \neq \beta_2$ ($\equiv \beta_1 - \beta_2 \neq 0$). Defining $\theta = \beta_1 - \beta_2$, the test becomes $H_0: \theta = 0$ vs. $H_1: \theta \neq 0$.
- In order to reparametrize the model and obtain an expression that enables testing $H_0: \theta = 0$, we create a new
variable $z_i = x_{1i} + x_{2i}$, i.e. z is the sum of x1 and x2.
- The new model to be estimated becomes:
$y_i = \alpha + \theta x_{1i} + \beta_2 z_i + \varepsilon_i = \alpha + \theta x_{1i} + \beta_2(x_{1i} + x_{2i}) + \varepsilon_i$
From this new model, testing $H_0: \theta = 0$ is equivalent to testing the null hypothesis $H_0: \beta_1 = \beta_2$ in the original
model, because $\beta_1 = \theta + \beta_2$; therefore:
$y_i = \alpha + \theta x_{1i} + \beta_2 z_i + \varepsilon_i = \alpha + \theta x_{1i} + \beta_2(x_{1i} + x_{2i}) + \varepsilon_i = \alpha + \theta x_{1i} + \beta_2 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
$\Rightarrow y_i = \alpha + (\theta + \beta_2)x_{1i} + \beta_2 x_{2i} + \varepsilon_i \equiv \alpha + (\beta_1 - \beta_2)x_{1i} + \beta_2(x_{1i} + x_{2i}) + \varepsilon_i$
Therefore, the t-statistic for $H_0: \beta_1 = \beta_2$ using method 1) is equivalent to the t-statistic for $H_0: \theta = 0$ in the
equation $y_i = \alpha + \theta x_{1i} + \beta_2 z_i + \varepsilon_i$, with $z_i = x_{1i} + x_{2i}$.
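A sketch of the reparametrized regression (an added illustration; it re-simulates x1, x2 and y as in the earlier example, where the true $\beta_1 - \beta_2 = 0.5 - 1.2 = -0.7$):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
x1, x2 = rng.normal(5, 2, n), rng.uniform(1, 50, n)
y = 2 + 0.5 * x1 + 1.2 * x2 + rng.normal(0, 1, n)

z = x1 + x2                                      # z = x1 + x2
Xr = np.column_stack([np.ones(n), x1, z])        # regress y on [const, x1, z]

b = np.linalg.solve(Xr.T @ Xr, Xr.T @ y)         # b[1] estimates theta = beta1 - beta2
resid = y - Xr @ b
s2 = resid @ resid / (n - Xr.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(Xr.T @ Xr)))

t_theta = b[1] / se[1]                           # t-statistic for H0: theta = 0
print(b[1], t_theta)                             # theta_hat close to -0.7, H0 rejected
```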
Statistical Inference (10)
Hypothesis testing for linear combinations of coefficients.
- We are often interested in testing the statistical validity of restrictions based on linear combinations of coefficients,
for example:
$H_0: \beta_1 = \beta_2$ ($\equiv \beta_1 - \beta_2 = 0$) (null hypothesis, assumed to be true) vs. $H_1: \beta_1 \neq \beta_2$ ($\equiv \beta_1 - \beta_2 \neq 0$) (alternative hypothesis).
There are several ways to approach this kind of test:
3) Constructing an F-Statistic for (multiple) linear hypotheses:
- Obtain the Sum of Squared Residuals (or the R²) from the Unrestricted Model ($SSR_{UR}$/$R^2_{UR}$):
$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$ (Unrestricted model) $\;\Rightarrow\; SSR_{UR}/R^2_{UR}$
- Obtain the Sum of Squared Residuals (or the R²) from the Restricted Model ($SSR_R$/$R^2_R$) that incorporates the
restriction $\beta_1 = \beta_2 = \beta$:
$y_i = \alpha + \beta x_{1i} + \beta x_{2i} + \varepsilon_i \;\Rightarrow\; y_i = \alpha + \beta(x_{1i} + x_{2i}) + \varepsilon_i = \alpha + \beta z_i + \varepsilon_i$ (Restricted model) $\;\Rightarrow\; SSR_R/R^2_R$
$\Rightarrow F = \frac{(SSR_R - SSR_{UR})/q}{SSR_{UR}/(n-(k+1))} = \frac{(R^2_{UR} - R^2_R)/q}{(1 - R^2_{UR})/(n-(k+1))} \;\Rightarrow\; \text{Reject } H_0 \text{ if } F > F_{q,\,n-(k+1);\,\alpha} \text{ or P-value} < \alpha$
where q = number of restrictions, n = number of observations and k+1 = number of coefficients to be estimated.
This F-statistic is equivalent to the square of the t-statistic obtained from methods 1) or 2) (i.e. in general, F = t² when
the test involves a single restriction).
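A sketch of the F-test for $H_0: \beta_1 = \beta_2$ built from the restricted and unrestricted SSR (an added illustration; it assumes the simulated x1, x2 and y from the previous sketch are still in scope):

```python
import numpy as np

# assumes the simulated x1, x2, y from the previous sketch are in scope
n = len(y)
X_ur = np.column_stack([np.ones(n), x1, x2])     # unrestricted: separate slopes
X_r = np.column_stack([np.ones(n), x1 + x2])     # restricted: common slope on z = x1 + x2

def ssr(Xm):
    b = np.linalg.solve(Xm.T @ Xm, Xm.T @ y)
    e = y - Xm @ b
    return e @ e

q, k_plus_1 = 1, X_ur.shape[1]                   # one restriction; k+1 = 3 coefficients
F = ((ssr(X_r) - ssr(X_ur)) / q) / (ssr(X_ur) / (n - k_plus_1))
print(F)                                         # compare with F_{1, n-(k+1); alpha}; here F = t^2
```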
Statistical Inference (11)
Joint Significance of the Estimated Coefficients.
- The F test can be also used to analyze the joint significance of all (or a subset of) the estimated coefficients:
$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i$
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$ (null hypothesis, assumed to be true) vs. $H_1$: at least one $\beta_j \neq 0$ (alternative hypothesis).
Unrestricted model: $y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i \;\Rightarrow\; SSR_{UR}/R^2_{UR}$
Restricted model: $y_i = \alpha + u_i \;\Rightarrow\; SSR_R$ (notice that $R^2_R = 0$)
$\Rightarrow F = \frac{(SSR_R - SSR_{UR})/q}{SSR_{UR}/(n-(k+1))} = \frac{R^2_{UR}/q}{(1 - R^2_{UR})/(n-(k+1))} \;\Rightarrow\; \text{Reject } H_0 \text{ if } F > F_{q,\,n-(k+1);\,\alpha} \text{ or P-value} < \alpha$
If the null hypothesis is rejected, the model is said to be jointly significant.
- Notice that the same approach can be used to test the joint significance of a subset of coefficients (but the
simplification for the R-squared is no longer valid).
- Moreover, it is possible to show that the F-Statistic for a single coefficient is equal to the square of the
corresponding t-Statistic.
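A sketch of the overall-significance F statistic computed from $R^2_{UR}$ (an added illustration, again assuming the simulated x1, x2 and y are in scope):

```python
import numpy as np

# assumes the simulated x1, x2, y are in scope; H0: beta1 = beta2 = 0
n = len(y)
X_ur = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.solve(X_ur.T @ X_ur, X_ur.T @ y)
e = y - X_ur @ b

R2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)   # R^2 of the unrestricted model
q, k_plus_1 = 2, X_ur.shape[1]                   # 2 restrictions (the two slope coefficients)
F = (R2 / q) / ((1 - R2) / (n - k_plus_1))
print(F)                                         # compare with F_{2, n-(k+1); alpha}
```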
Statistical Inference (12)
- Example 1:
Estimated unrestricted model (n = 1000): $y_i = 15 - 0.7x_{1i} + 1.2x_{2i} + 2.2x_{3i} + \hat{\varepsilon}_i \;\Rightarrow\; SSR_{UR} = 123$
Alternatively: P-value$(F_{2,\,996} = 125.51) = 3.10652 \times 10^{-49} < 0.05 \;\Rightarrow\; \text{Reject } H_0$
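The reported p-value can be reproduced with scipy (an added sketch):

```python
from scipy import stats

# p-value of the reported F statistic with (2, 996) degrees of freedom
p_value = stats.f.sf(125.51, 2, 996)
print(p_value)                       # on the order of 1e-49, far below 0.05 => reject H0
```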
Cases when OLS fails
When can we expect that $E[\varepsilon_i \mid X_i] \neq 0$ and/or $E[X_i\varepsilon_i] \neq 0$?
2) Omitted Variable Bias (elements that affect Y and are related to X are not controlled for)
In all these situations we should apply an alternative estimator: the Instrumental Variables
Estimator/Two-Stage Least Squares (or other methods to achieve "identification").
However, as you will see in Econometrics II, this is not always possible.