Econometrics

Notation
Theoretical Linear Regressions (level 1 - in population)
Definition
General Case
Particular Case: Simple Linear Regression (SLR)
Particular Case: SLR without constant
Particular Case: SLR with only constant
Conditional Expectations
Linear regressions as projections
Ordinary Least Squares (level 2 - in sample)
Definition
Empirical operators
Properties of the OLS estimator
Goodness-of-fit
Particular case of SLR
Links between simple and multiple linear regressions
Frisch-Waugh theorem
Algebraic link between “short” and “long” regressions

Notation
This is some core notation that the course uses:

$(\Omega, \mathcal{A}, P)$ is the underlying probability space, where $\Omega$ is the sample space, $\mathcal{A}$ is the event space and $P$ is the probability measure.

E.g.: toss a coin; the sample space is $\Omega = \{H, T\}$ and the event space is $\mathcal{A} = \{\emptyset, \{H\}, \{T\}, \{H, T\}\}$.

Theoretical Linear Regressions (level 1 - in population)


(Section 2.2)

Linear regression at the level of the whole population.

Definition
General Case

Let $Y \in \mathbb{R}^\Omega$ and $X \in (\mathbb{R}^d)^\Omega$, for some $d \in \mathbb{N}$. If

(i) $E[|Y|^2] < +\infty$;

(ii) $E[\|X\|_e^2] < +\infty$, where $\|\cdot\|_e$ denotes the Euclidean norm, that is $\|X\|_e := \left(\sum_{j=1}^d |X_j|^2\right)^{1/2}$;

(iii) $E[XX']$ is invertible;

then

$$\exists!\, (\beta_0, \varepsilon) \in \mathbb{R}^d \times \mathbb{R}^\Omega : \quad Y = X'\beta_0 + \varepsilon, \quad E[X\varepsilon] = 0.$$

Besides, that unique $\beta_0$ is defined as

$$\beta_0 := E[XX']^{-1} E[XY].$$

We call $X'\beta_0 \in \mathbb{R}^\Omega$ the theoretical linear regression of Y on X. We call $\beta_0 \in \mathbb{R}^d$ the coefficients of that theoretical linear regression, or the theoretical coefficients of the linear regression of Y on X.
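To see where this formula comes from (a short derivation, using only the conditions above): substituting $\varepsilon = Y - X'\beta_0$ into the orthogonality condition $E[X\varepsilon] = 0$ gives the normal equations

$$E[X(Y - X'\beta_0)] = 0 \;\Longleftrightarrow\; E[XY] = E[XX']\,\beta_0 \;\Longleftrightarrow\; \beta_0 = E[XX']^{-1} E[XY],$$

where the last step uses the invertibility of $E[XX']$ from condition (iii).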

Particular Case: Simple Linear Regression (SLR)


SLR is any regression with $X = (1, D)'$, with $D \in \mathbb{R}^\Omega$.

When $X = (1, D)'$, we have:

$$\alpha_0 := E[Y] - \frac{\mathrm{Cov}(Y, D)}{V[D]}\, E[D]; \qquad \beta_D^S := \frac{\mathrm{Cov}(Y, D)}{V[D]},$$

with $\beta_0 = (\alpha_0, \beta_D^S)'$.

Special Case: SLR with a binary regressor: Support(D) = {0, 1}

With $X = (1, D)'$ and $D \in \{0, 1\}$ (binary variable):

$$\alpha_0 = E[Y|D = 0]; \qquad \beta_D^S = E[Y|D = 1] - E[Y|D = 0].$$
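This follows from the general SLR formulas (a quick check; $p := P(D = 1)$ is a symbol introduced here for brevity): with D binary,

$$V[D] = p(1 - p), \qquad \mathrm{Cov}(Y, D) = E[YD] - E[Y]\,p = p(1 - p)\big(E[Y|D = 1] - E[Y|D = 0]\big),$$

so $\beta_D^S = \mathrm{Cov}(Y, D)/V[D] = E[Y|D = 1] - E[Y|D = 0]$, and $\alpha_0 = E[Y] - \beta_D^S\, E[D] = E[Y|D = 0]$.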

Particular Case: SLR without constant

SLR with $X = D$:

$$\beta_0 = \frac{E[DY]}{E[D^2]} \in \mathbb{R}.$$

If D is centred (zero expectation):

$$\frac{\mathrm{Cov}(Y, D)}{V[D]} = \frac{E[DY]}{E[D^2]}.$$

In words: in a simple linear regression, if the regressor has zero expectation, the theoretical slope coefficient is the same whether or not a constant is included.
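A one-line check of that claim: if $E[D] = 0$, then

$$V[D] = E[D^2] - E[D]^2 = E[D^2] \qquad \text{and} \qquad \mathrm{Cov}(Y, D) = E[DY] - E[Y]E[D] = E[DY],$$

so the two slope formulas coincide.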

Particular Case: SLR with only constant


SLR with $X = 1$:

$$\beta_0 = E[Y].$$

Conditional Expectations
Let $Y \in \mathbb{R}^\Omega$ and $X \in (\mathbb{R}^d)^\Omega$. If $E[Y^2] < +\infty$, then the conditional expectation of Y given/knowing X is (almost surely) uniquely defined by:

$E[Y|X]$ is a random variable.

$E[Y|X = x]$ is a real number, the realized best prediction of Y by arbitrary functions of X, knowing that "X equals x".

Properties:

Conditional expectation is linear.

Conditioning on X, for any function g:

$$E[g(X)|X] = g(X).$$

If $X \perp\!\!\!\perp Y$, that is, X and Y are independent, then:

$$E[Y|X] = E[Y].$$

Satisfies (conditional) Jensen's inequality: for any convex function f,

$$E[f(X)|Y] \geq f(E[X|Y]).$$

Law of Iterated Expectation (or tower property):

$$E[Y] = E(E[Y|X]).$$

If g(X) is any function of X, we have

$$E[Y|g(X)] = E(E[Y|X]\,|\,g(X)).$$
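For instance (a small illustration with a binary conditioning variable, as in the SLR special case above), the tower property reads

$$E[Y] = P(D = 1)\,E[Y|D = 1] + P(D = 0)\,E[Y|D = 0].$$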

Linear regressions as projections


Consider the simplest case (SLR without constant): Let $Y \in \mathbb{R}^\Omega$ and $X \in (\mathbb{R}^d)^\Omega$ with $E[|Y|^2] < +\infty$ and $E[\|X\|_e^2] < +\infty$.

We define the best linear prediction of Y by linear functions of X as $X'\beta_P \in \mathbb{R}^\Omega$, denoted $E_{\mathrm{lin}}[Y|X]$ or $L[Y|X]$, with $\beta_P \in \mathbb{R}^d$ defined by:

$$\beta_P := \operatorname*{argmin}_{b \in \mathbb{R}^d} E[(Y - X'b)^2].$$

$L[Y|X] = X'\beta_P$ is the projection of Y on the finite-dimensional subspace

$$L^2_{\mathrm{lin}(X)} := \{X'\beta : \beta \in \mathbb{R}^d\}$$

of $L^2$ made of all linear functions of X.

The theoretical linear regression of Y on X and the orthogonal projection of Y on $L^2_{\mathrm{lin}(X)}$ coincide:

$$E[XX']^{-1} E[XY] =: \beta_0 = \beta_P := \operatorname*{argmin}_{b \in \mathbb{R}^d} E[(Y - X'b)^2].$$
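Why the two coincide (a short sketch of the first-order condition): the objective $b \mapsto E[(Y - X'b)^2]$ is convex and differentiable, so setting its gradient to zero,

$$\nabla_b\, E[(Y - X'b)^2] = -2\,E[X(Y - X'b)] = 0,$$

recovers exactly the orthogonality condition $E[X\varepsilon] = 0$ that defines $\beta_0$; hence $\beta_P = \beta_0$.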

Ordinary Least Squares (level 2 - in sample)


SETTING:

$Y \in \mathbb{R}^\Omega$ (real random variable) and $X = (X_1, X_2, \ldots, X_d)' \in (\mathbb{R}^d)^\Omega$ (column vector). $(X, Y)$ is drawn from some joint distribution $P_{(X,Y)}$.

Goal: predict/explain Y (outcome) using X (regressors/covariates).

=> To do so, we use $L[Y|X] =: X'\beta_0$, the theoretical linear regression/projection of Y on X.

In practice, $\beta_0$ is unknown. We need to estimate it to form our predictions. We will use data for that. For a fixed $P_{(Y,X)}$, we assume to have access to

$$(Y_i, X_i)_{i=1,\ldots,n} \overset{\text{i.i.d.}}{\sim} P_{(Y,X)},$$

a cross-sectional sample of $n \in \mathbb{N}$ observations assumed independent and identically distributed (i.i.d.) from the distribution of interest (and thus representative of the population of interest).

We consider the case of $d := \dim(X) \in \mathbb{N}$, and will thus see the case of a single scalar regressor and a constant, namely $X = (1, D)'$, $D \in \mathbb{R}^\Omega$, as a special case.

Definition
The OLS estimator in the (empirical) linear regression of $Y \in \mathbb{R}^\Omega$ on $X \in (\mathbb{R}^d)^\Omega$ (also known as the (empirical) coefficients in the empirical linear regression of Y on X) is defined as:

$$\hat\beta := \operatorname*{argmin}_{b \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n (Y_i - X_i'b)^2 \in (\mathbb{R}^d)^\Omega.$$

Note: the key idea (method of moments, or plug-in) for estimation is to replace unknown theoretical expectations $E[(Y - X'b)^2]$ with empirical means $\frac{1}{n}\sum_{i=1}^n (Y_i - X_i'b)^2$.

Applied to the definition of $\beta_0$:

$$\hat\beta = \left(\frac{1}{n}\sum_{i=1}^n X_i X_i'\right)^{-1} \left(\frac{1}{n}\sum_{i=1}^n X_i Y_i\right) = \hat{E}[XX']^{-1} \hat{E}[XY].$$

Sample invertibility condition:

$$\frac{1}{n}\sum_{i=1}^n X_i X_i' \ \text{is invertible}.$$
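As a concrete illustration, a minimal NumPy sketch of this plug-in formula (the function name `ols` and its interface are mine, not from the notes):

```python
import numpy as np

def ols(X, Y):
    """Plug-in OLS estimator: (E-hat[XX'])^{-1} E-hat[XY].

    X: (n, d) array of regressors (include a column of ones for a constant).
    Y: (n,) array of outcomes.
    """
    n = len(Y)
    XX = X.T @ X / n   # empirical counterpart of E[XX']
    XY = X.T @ Y / n   # empirical counterpart of E[XY]
    # solving the normal equations assumes the sample invertibility condition
    return np.linalg.solve(XX, XY)
```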

Empirical operators

In the formulas below, $\hat{E}$ and $\hat{V}$ denote the empirical counterparts of E and V, e.g. $\hat{E}[Z] := \frac{1}{n}\sum_{i=1}^n Z_i$ and $\hat{V}[Z] := \hat{E}[Z^2] - \hat{E}[Z]^2$.

Properties of the OLS estimator

If the sample invertibility condition holds, then:

Goodness-of-fit

Answers the question "Does (a linear combination of) X predict Y accurately?" with the (empirical) $\hat{R}^2$:

$$\hat{R}^2 := \frac{\hat{V}[\hat{Y}]}{\hat{V}[Y]} \in \mathbb{R}^\Omega.$$

It is thus the part of the empirical variance of the outcome/target Y that is explained by the predicted/fitted value $\hat{Y}$.

If we add a new explanatory variable ($\dim(X) \to d + 1$), the $\hat{R}^2$ cannot decrease; in other words, it always weakly increases.
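A minimal sketch of this definition, reusing the `ols` function above (again, the names are illustrative):

```python
def r_squared(X, Y):
    """Empirical R^2: V-hat[Y-hat] / V-hat[Y] for the OLS fit of Y on X."""
    Y_hat = X @ ols(X, Y)              # fitted values
    return np.var(Y_hat) / np.var(Y)   # np.var is the empirical (biased) variance
```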

Particular case of SLR


We now consider the particular case when $X = (1, D)'$, with $D \in \mathbb{R}^\Omega$: an intercept and one scalar (1-by-1) regressor.

Property of the OLS estimator in SLR
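Following the "simply adding the hats" remark below, the OLS coefficients in SLR are the plug-in versions of $\alpha_0$ and $\beta_D^S$; a minimal sketch (function name mine):

```python
def slr(D, Y):
    """Plug-in SLR coefficients: the 'hatted' alpha_0 and beta_D^S."""
    beta_D = np.cov(Y, D, bias=True)[0, 1] / np.var(D)  # Cov-hat(Y, D) / V-hat[D]
    alpha_0 = Y.mean() - beta_D * D.mean()              # E-hat[Y] - beta * E-hat[D]
    return alpha_0, beta_D
```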

Links between simple and multiple linear regressions


Frisch-Waugh theorem
Coefficient is used here to refer either to a theoretical coefficient or an OLS estimator.

The particular case of a simple linear regression provides a simple and quite intuitive expression of the OLS estimator $\hat\beta_D \in \mathbb{R}^\Omega$, a real random variable (a stochastic estimator), and of the theoretical coefficient $\beta_D^S \in \mathbb{R}$, a non-stochastic real number (a non-stochastic parameter); the same equality holds with the empirical counterparts (symbolically, simply adding the hats) for $\hat\beta_D^S$.

Theorem 3 (Frisch-Waugh (in population - level 1)):

Theorem 3 operates at level 1, in population. The same result holds with the empirical, in-sample counterparts. We write it directly with the $X = (1, D, G')'$ notation:
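In that notation, the standard statement is that the coefficient on D in the long regression of Y on (1, D, G) equals the slope coefficient from regressing Y on the residual of D after partialling out (1, G). A numerical sanity check (a sketch reusing the `ols` function above; the simulated design and all names are mine):

```python
rng = np.random.default_rng(0)
n = 10_000
G = rng.normal(size=n)                      # the extra regressor ("control")
D = 0.5 * G + rng.normal(size=n)            # D correlated with G
Y = 1.0 + 2.0 * D - 3.0 * G + rng.normal(size=n)

X_long = np.column_stack([np.ones(n), D, G])
beta_long = ols(X_long, Y)                  # long regression; slope on D is beta_long[1]

X_ctrl = np.column_stack([np.ones(n), G])
D_resid = D - X_ctrl @ ols(X_ctrl, D)       # residual of D after partialling out (1, G)
beta_fw = (D_resid @ Y) / (D_resid @ D_resid)  # SLR of Y on the residual, no constant

print(beta_long[1], beta_fw)                # agree up to floating-point error
```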

Algebraic link between “short” and “long” regressions


This is also called the "Omitted Variable Bias" (OVB) formula.

Let $Y \in \mathbb{R}^\Omega$, $X = (1, D, G')'$ with $D \in \mathbb{R}^\Omega$ and $G \in (\mathbb{R}^{\dim(X)-2})^\Omega$, where $\dim(X) > 2$.

The multiple linear regression of Y on X is well defined and we write it:

$$L[Y|X] = L[Y|1, D, G] = \alpha_0 + \beta_D D + G'\beta_G.$$

The SLR of Y on (1, D) is well defined and we write it:

$$L[Y|1, D] = \alpha_0^S + \beta_D^S D.$$

For each component $G_j$ of $G = (G_1, \ldots, G_p)'$, with $p := \dim(X) - 2 = \dim(G) \geq 1$, the SLR of $G_j$ on (1, D) is well defined and we write it:

$$L[G_j|1, D] = \alpha_j + \lambda_j D. \quad \text{(omitted on included)}$$

For any $j \in \{1, \ldots, p\}$, $\lambda_j$ is thus the slope coefficient in the linear regression of $G_j$ on D (and a constant). We define $\lambda := (\lambda_1, \ldots, \lambda_p)' \in \mathbb{R}^p$, the (column) vector collecting those slope coefficients. Then, we have the equality

$$\beta_D^S = \beta_D + \lambda'\beta_G.$$

The same holds in sample at level 2 with OLS estimators instead of theoretical coefficients. In a nutshell, it says:

$$\text{short} = \text{long} + \text{omitted} \times \text{coefficients of omitted on included}.$$
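Continuing the simulated example from the Frisch-Waugh sketch above (same illustrative variables; G is scalar there, so $\lambda'\beta_G$ reduces to $\lambda \cdot \beta_G$), the formula can be checked numerically:

```python
beta_short = ols(np.column_stack([np.ones(n), D]), Y)   # short: Y on (1, D)
lam = ols(np.column_stack([np.ones(n), D]), G)[1]       # omitted on included: G on (1, D)
# short slope = long slope + lambda * (long coefficient on the omitted G)
print(beta_short[1], beta_long[1] + lam * beta_long[2])
```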
