
Simultaneous equations modelling

Sections 7.5–7.9 of the printed material


What is the relationship between inflation and stock returns? Clearly, they ought to be simultaneously related given that the rate of inflation will affect the discount rate applied to cashflows and therefore the value of equities, but the performance of the stock market may also affect consumer demand and therefore inflation through its impact on householder wealth (perceived or actual).
This simple example employs the same macroeconomic data as used previously to estimate this
relationship simultaneously. Suppose (without justification) that we wish to estimate the following
model, which does not allow for dynamic effects or partial adjustments and does not distinguish
between expected and unexpected inflation:

inflationt = α0 + α1 returnst + α2 dcreditt + α3 dprodt + α4 dmoneyt + u1t (5)

returnst = β0 + β1 dprodt + β2 dspreadt + β3 inflationt + β4 rtermt + u2t (6)
where ’returns’ are stock returns.
It is evident that there is feedback between the two equations since the inflation variable appears
in the stock returns equation and vice versa. Are the equations identified? Since there are two equa-
tions, each will be identified if one variable is missing from that equation. Equation (5), the inflation
equation, omits two variables. It does not contain the default spread or the term spread, and so is
over-identified. Equation (6), the stock returns equation, omits two variables as well -- the consumer
credit and money supply variables, and so is over-identified too. Two-stage least squares (2SLS) is
therefore the appropriate technique to use.
To do this we need to specify a list of instruments, which would be all of the variables from the reduced form equations. In this case, the reduced form equations would be:

inflation = f (constant, dprod, dspread, rterm, dcredit, qrev, dmoney) (7)

returns = g(constant, dprod, dspread, rterm, dcredit, qrev, dmoney) (8)


For this example we will be using the ’macro.pickle’ file. To perform a 2SLS regression, we need to
import a third-party package, linearmodels.30 This module provides a variety of built-in functions for advanced econometric applications. For our purposes, we can directly import the function IV2SLS.

In [1]: import pickle
from linearmodels import IV2SLS
import statsmodels.api as sm
import pandas as pd

abspath = 'C:/Users/tao24/OneDrive - University of Reading/PhD/' \
          'QMF Book/book Ran/data files new/Book4e_data/'

with open(abspath + 'macro.pickle', 'rb') as handle:
    data = pickle.load(handle)
30 To install the linearmodels package, you need to press START and search for the 'Anaconda Prompt'. Once the window opens, type the command pip install linearmodels and hit ENTER.

The steps for constructing regression specifications remain similar, albeit via a different library. The first step is to add a constant term to the data using the add_constant function from statsmodels. Subsequently, the 2SLS regression model instance is created by calling the function IV2SLS. Within its brackets, we specify four parameters: the dependent variable, the exogenous variables, the endogenous variable and the instruments. Specifically, the first parameter, dependent, is defined as the series 'inflation'; exog comprises 'const', 'dprod', 'dcredit' and 'dmoney', while the variable 'rsandp' is set as endog. Last but not least, the list of instruments comprises the variables 'rterm' and 'dspread'. The res_2sls1 regression result instance is then generated by the function fit. In this case, we pass the argument cov_type='unadjusted' to the function.

In [2]: # 2SLS, specification 1
data = sm.add_constant(data)
ivmod = IV2SLS(dependent=data.inflation,
               exog=data[['const','dprod','dcredit','dmoney']],
               endog=data.rsandp,
               instruments=data[['rterm','dspread']])
res_2sls1 = ivmod.fit(cov_type='unadjusted')
print(res_2sls1)

IV-2SLS Estimation Summary


==============================================================================
Dep. Variable: inflation R-squared: -1.8273
Estimator: IV-2SLS Adj. R-squared: -1.8571
No. Observations: 384 F-statistic: 21.784
Date: Fri, Nov 09 2018 P-value (F-stat) 0.0002
Time: 11:39:19 Distribution: chi2(4)
Cov. Estimator: unadjusted

Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 0.2129 0.0369 5.7777 0.0000 0.1407 0.2852
dprod 0.0309 0.0500 0.6172 0.5371 -0.0671 0.1289
dcredit -0.0052 0.0019 -2.7214 0.0065 -0.0089 -0.0015
dmoney -0.0028 0.0011 -2.6408 0.0083 -0.0049 -0.0007
rsandp 0.1037 0.0333 3.1092 0.0019 0.0383 0.1690
==============================================================================

Endogenous: rsandp
Instruments: rterm, dspread
Unadjusted Covariance (Homoskedastic)
Debiased: False

Similarly, the inputs for the 'rsandp' equation would be specified as in the following code cell, and the output for the returns equation is shown below.

In [3]: # 2SLS, specification 2
ivmod = IV2SLS(dependent=data.rsandp,
               exog=data[['const','dprod','dcredit','dmoney']],
               endog=data.inflation,
               instruments=data[['rterm','dspread']])
res_2sls2 = ivmod.fit(cov_type='unadjusted')
print(res_2sls2)

IV-2SLS Estimation Summary


==============================================================================
Dep. Variable: rsandp R-squared: -0.1795
Estimator: IV-2SLS Adj. R-squared: -0.1920
No. Observations: 384 F-statistic: 8.4611
Date: Fri, Nov 09 2018 P-value (F-stat) 0.0761
Time: 11:39:19 Distribution: chi2(4)
Cov. Estimator: unadjusted

Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const -1.1624 0.6697 -1.7357 0.0826 -2.4750 0.1502
dprod -0.2366 0.4376 -0.5406 0.5888 -1.0942 0.6211
dcredit 0.0368 0.0186 1.9851 0.0471 0.0005 0.0732
dmoney 0.0185 0.0108 1.7087 0.0875 -0.0027 0.0397
inflation 6.3039 2.2728 2.7736 0.0055 1.8493 10.759
==============================================================================

Endogenous: inflation
Instruments: rterm, dspread
Unadjusted Covariance (Homoskedastic)
Debiased: False

The results show that the stock index returns are a positive and significant determinant of inflation
(changes in the money supply negatively affect inflation), while inflation also has a positive effect on
the stock returns, albeit less significantly so.

The Generalised method of moments for instrumental variables

Section 7.8 of the printed material and the Appendix attached at the end (The Generalised Method of Moments)
Apart from 2SLS, there are other ways to address the endogeneity issue in a system of equations.
Following the previous section using inflation and stock returns, we apply the same "macro.pickle"
Python workfile to explore a different technique: the Generalised Method of Moments (GMM). First,
recall the estimated models as follows:

inflationt = α0 + α1 returnst + α2 dcreditt + α3 dprodt + α4 dmoneyt + u1t (9)

returnst = β0 + β1 dprodt + β2 dspreadt + β3 inflationt + β4 rtermt + u2t (10)

where 'returns' are stock returns.
Clearly, there is feedback between the two equations since the inflation variable appears in the stock returns equation and vice versa. Therefore, GMM is employed with a list of instruments to be tested. To perform the GMM estimation, we again use the third-party package linearmodels; this time, we can directly import the function IVGMM.

In [1]: import pickle
from linearmodels.system import IVSystemGMM
from linearmodels.iv import IVGMM
import statsmodels.api as sm
import pandas as pd

abspath = 'C:/Users/tao24/OneDrive - University of Reading/PhD/' \
          'QMF Book/book Ran/data files new/Book4e_data/'

with open(abspath + 'macro.pickle', 'rb') as handle:
    data = pickle.load(handle)

The steps for constructing regression specifications are the same as before, albeit via a different regression function. The first step is writing the regression formula. Within the GMM setting, the formula lists the dependent variable followed by the exogenous variables, with the endogenous variables and instruments added afterwards in square brackets. Specifically, the formula statement is as follows:

’inflation ~ 1 + dprod + dcredit + dmoney + [rsandp ~ rterm + dspread]’

where inflation is the dependent variable; 1 is the constant term; dprod, dcredit, and dmoney are exogenous variables; rsandp is the endogenous variable; and rterm and dspread are instruments. Subsequently, the GMM regression model instance is created by calling the function IVGMM.from_formula. Within its brackets, we input the specified formula, the data and the weighting scheme. Next, the covariance type is set to 'robust' in the fit function. Executing the cell displays the regression results in the output window.

In [2]: # GMM, specification 1
formula = 'inflation ~ 1 + dprod + dcredit + dmoney + [rsandp ~ rterm + dspread]'
mod = IVGMM.from_formula(formula, data, weight_type='unadjusted')
res1 = mod.fit(cov_type='robust')
print(res1.summary)

IV-GMM Estimation Summary
==============================================================================
Dep. Variable: inflation R-squared: -1.8273
Estimator: IV-GMM Adj. R-squared: -1.8571
No. Observations: 384 F-statistic: 20.058
Date: Sun, Nov 11 2018 P-value (F-stat) 0.0005
Time: 22:16:12 Distribution: chi2(4)
Cov. Estimator: robust

Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
Intercept 0.2129 0.0422 5.0416 0.0000 0.1302 0.2957
dprod 0.0309 0.0699 0.4413 0.6590 -0.1062 0.1679
dcredit -0.0052 0.0017 -2.9732 0.0029 -0.0086 -0.0018
dmoney -0.0028 0.0011 -2.5944 0.0095 -0.0049 -0.0007
rsandp 0.1037 0.0419 2.4768 0.0133 0.0216 0.1857
==============================================================================

Endogenous: rsandp
Instruments: rterm, dspread
GMM Covariance
Debiased: False
Robust (Heteroskedastic)

Similarly, the second specification for the ’rsandp’ equation would be written as in the following
code cell and the output for the returns equation is shown below.

In [3]: # GMM, specification 2
formula = 'rsandp ~ 1 + dprod + dcredit + dmoney + [inflation ~ rterm + dspread]'
mod = IVGMM.from_formula(formula, data, weight_type='unadjusted')
res2 = mod.fit(cov_type='robust')
print(res2.summary)

IV-GMM Estimation Summary


==============================================================================
Dep. Variable: rsandp R-squared: -0.1795
Estimator: IV-GMM Adj. R-squared: -0.1920
No. Observations: 384 F-statistic: 5.6843
Date: Sun, Nov 11 2018 P-value (F-stat) 0.2240
Time: 22:16:12 Distribution: chi2(4)
Cov. Estimator: robust

Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
Intercept -1.1624 0.8919 -1.3033 0.1925 -2.9105 0.5857
dprod -0.2366 0.6369 -0.3714 0.7103 -1.4849 1.0117
dcredit 0.0368 0.0185 1.9929 0.0463 0.0006 0.0731
dmoney 0.0185 0.0104 1.7849 0.0743 -0.0018 0.0388
inflation 6.3039 3.1875 1.9777 0.0480 0.0565 12.551
==============================================================================

Endogenous: inflation
Instruments: rterm, dspread
GMM Covariance
Debiased: False
Robust (Heteroskedastic)

As with the 2SLS estimates, the results show that the stock index returns are a positive and significant determinant of inflation (and changes in the money supply negatively affect inflation), while inflation also has a positive effect on stock returns, albeit less significantly so.
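Since each equation here is over-identified (two instruments for a single endogenous regressor), the validity of the over-identifying restriction can also be inspected; a minimal sketch, assuming that the fitted IVGMM result objects expose a j_stat attribute (check the installed linearmodels version):

# Sargan-Hansen J test of the over-identifying restriction in each equation
# (the null hypothesis is that the moment conditions are valid)
print(res1.j_stat)   # inflation equation
print(res2.j_stat)   # stock returns equation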

VAR estimation
Section 7.10 of the printed material
In this section, a VAR is estimated in order to examine whether there are lead–lag relationships between the returns to three exchange rates against the US dollar: the euro, the British pound and the Japanese yen. The data are daily and run from 14 December 1998 to 3 July 2018, giving a total of 7,142 observations. The data are contained in the Excel file 'currencies.xls'.
First, we import the dataset into the notebook. Next, we construct a set of continuously compounded percentage returns called 'reur', 'rgbp' and 'rjpy' using a custom function LogDiff. Moreover, these new variables are saved in a workfile currencies.pickle for future use.

In [1]: import pandas as pd
import numpy as np
import statsmodels.tsa.api as smt
import pickle

abspath = 'C:/Users/tao24/OneDrive - University of Reading/PhD/' \
          'QMF Book/book Ran/data files new/Book4e_data/'

data = pd.read_excel(abspath + 'currencies.xls', index_col=[0])

def LogDiff(x):
    x_diff = 100*np.log(x/x.shift(1))
    x_diff = x_diff.dropna()
    return x_diff

data = pd.DataFrame({'reur': LogDiff(data['EUR']),
                     'rgbp': LogDiff(data['GBP']),
                     'rjpy': LogDiff(data['JPY'])})

with open(abspath + 'currencies.pickle', 'wb') as handle:
    pickle.dump(data, handle)

VAR estimation in Python can be accomplished by importing the function VAR from the library
statsmodels.tsa.api. The VAR specification appears as in the code cell [2]. We define the dependent
variables to be ’reur’, ’rgbp’ and ’rjpy’. Next we need to specify the number of lags to be included
for each of these variables. In this case, the maximum number of lags is two, i.e., the first lag and
the second lag. Let us write the argument maxlags=2 for the regression instance and estimate this
VAR(2) model. The regression output appears as below.

In [2]: # VAR
model = smt.VAR(data)
res = model.fit(maxlags=2)
print(res.summary())

Summary of Regression Results


==================================
Model: VAR
Method: OLS
Date: Mon, 20, Aug, 2018
Time: 15:35:25
--------------------------------------------------------------------
No. of Equations: 3.00000 BIC: -5.41698
Nobs: 7139.00 HQIC: -5.43024
Log likelihood: -10960.3 FPE: 0.00435167
AIC: -5.43720 Det(Omega_mle): 0.00433889
--------------------------------------------------------------------
Results for equation reur
==========================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------
const 0.000137 0.005444 0.025 0.980
L1.reur 0.147497 0.015678 9.408 0.000
L1.rgbp -0.018356 0.017037 -1.077 0.281
L1.rjpy -0.007098 0.012120 -0.586 0.558
L2.reur -0.011808 0.015663 -0.754 0.451
L2.rgbp 0.006623 0.017032 0.389 0.697
L2.rjpy -0.005427 0.012120 -0.448 0.654
==========================================================================

Results for equation rgbp


==========================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------
const 0.002826 0.004882 0.579 0.563
L1.reur -0.025271 0.014058 -1.798 0.072
L1.rgbp 0.221362 0.015277 14.490 0.000
L1.rjpy -0.039016 0.010868 -3.590 0.000
L2.reur 0.046927 0.014045 3.341 0.001
L2.rgbp -0.067794 0.015272 -4.439 0.000
L2.rjpy 0.003287 0.010868 0.302 0.762
==========================================================================

Results for equation rjpy


==========================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------
const -0.000413 0.005524 -0.075 0.940
L1.reur 0.041061 0.015908 2.581 0.010
L1.rgbp -0.070846 0.017287 -4.098 0.000
L1.rjpy 0.132457 0.012298 10.771 0.000
L2.reur -0.018892 0.015893 -1.189 0.235
L2.rgbp 0.024908 0.017282 1.441 0.150
L2.rjpy 0.014957 0.012298 1.216 0.224
==========================================================================

Correlation matrix of residuals


reur rgbp rjpy
reur 1.000000 0.634447 0.270764
rgbp 0.634447 1.000000 0.164311
rjpy 0.270764 0.164311 1.000000

At the top of the table, we find information for the model as a whole, including values of the infor-
mation criteria, while further down we find coefficient estimates and goodness-of-fit measures for
each of the equations separately. Each regression equation is separated by a horizontal line.
We will shortly discuss the interpretation of the output, but the example so far has assumed that
we know the appropriate lag length for the VAR. However, in practice, the first step in the construction
of any VAR model, once the variables that will enter the VAR have been decided, will be to determine
the appropriate lag length. This can be achieved in a variety of ways, but one of the easiest is to employ a multivariate information criterion. In Python, this can be done by calling the built-in function select_order on the regression model instance. In the function's brackets, we specify the maximum number of lags to consider including in the model; for this example, we arbitrarily enter 10. Executing the code cell produces the following output.

In [3]: res = model.select_order(maxlags=10)


print(res.summary())

VAR Order Selection (* highlights the minimums)


==================================================
AIC BIC FPE HQIC
--------------------------------------------------
0 -5.346 -5.343 0.004769 -5.345
1 -5.433 -5.422* 0.004368 -5.429
2 -5.437 -5.417 0.004351 -5.430*
3 -5.438 -5.409 0.004350 -5.428
4 -5.439* -5.401 0.004344* -5.426
5 -5.438 -5.392 0.004346 -5.423
6 -5.437 -5.382 0.004351 -5.418
7 -5.437 -5.373 0.004353 -5.415
8 -5.436 -5.364 0.004358 -5.411
9 -5.435 -5.354 0.004360 -5.407
10 -5.434 -5.344 0.004367 -5.403
--------------------------------------------------

Python presents the values of various information criteria and other methods for determining the lag order. In this case, the Akaike (AIC) and Akaike's Final Prediction Error (FPE) criteria both select a lag length of four as optimal, while Schwarz's (SBIC) criterion chooses a VAR(1) and the Hannan-Quinn (HQIC) criterion selects a VAR(2). Let us estimate a VAR(1) and examine the results. Does the model look as if it fits the data well? Why or why not?
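As a starting point for that exercise, a VAR(1) can be fitted by re-using the model instance created above; a minimal sketch (the result name res1lag is introduced here purely for illustration):

res1lag = model.fit(maxlags=1)   # single lag of each variable
print(res1lag.summary())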
Next, we run a Granger causality test. We call the statsmodels.tsa.api function VAR again and construct a VAR regression instance. Next, we run the built-in function test_causality and specify the parameters as follows: the causing variable 'rgbp', the caused variable 'reur', the test type 'wald' and the significance level for computing critical values, 0.05. Unfortunately, Granger causality can only be tested between pairs of variables in Python, whereas we want to run the test among all variables. However, this can be done by separately estimating each of the pairwise combinations and noting down the statistics in each case (Table 1).31

In [4]: model = smt.VAR(data)


res = model.fit(maxlags=2)

#--------------------------------------------------
# Equation reur, Excluded rgbp
resCausality = res.test_causality(causing=['rgbp'],
caused=['reur'],
kind='wald',signif=0.05 )

31 In the code cell, we only demonstrate the command for the equation reur due to limited space. Users can modify it for other equations, as in the sketch below.
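Following the footnote, the remaining entries of Table 1 can be collected with a short loop over the pairwise combinations; a minimal sketch, assuming the causality result object exposes test_statistic and pvalue attributes (statsmodels' CausalityTestResults):

# Collect Wald causality statistics for every combination in Table 1
variables = ['reur', 'rgbp', 'rjpy']
for caused in variables:
    # individual exclusion tests
    for causing in [v for v in variables if v != caused]:
        test = res.test_causality(causing=[causing], caused=[caused],
                                  kind='wald', signif=0.05)
        print(caused, 'excluding', causing,
              round(test.test_statistic, 3), round(test.pvalue, 3))
    # joint test that both other variables can be excluded (the 'All' rows)
    joint = res.test_causality(causing=[v for v in variables if v != caused],
                               caused=[caused], kind='wald', signif=0.05)
    print(caused, 'excluding all',
          round(joint.test_statistic, 3), round(joint.pvalue, 3))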

Table 1: Granger Causality Wald tests
Equation Excluded chi2 Critical value p-value df
reur rgbp 1.186 5.991 0.553 2
reur rjpy 0.6260 5.991 0.731 2
reur All 1.764 9.488 0.779 4
rgbp reur 12.88 5.991 0.002 2
rgbp rjpy 12.92 5.991 0.002 2
rgbp All 28.99 9.488 0.000 4
rjpy reur 7.320 5.991 0.026 2
rjpy rgbp 17.12 5.991 0.000 2
rjpy All 17.38 9.488 0.002 4

The results show only modest evidence of lead-lag interactions between the series. Since we have
estimated a tri-variate VAR, three panels are displayed, with one for each dependent variable in the
system. There is causality from EUR to GBP and from JPY to GBP that is significant at the 1% level.
We also find significant causality at the 5% level from EUR to JPY and GBP to JPY, but no causality
from any of the currencies to EUR. These results might be interpreted as suggesting that information
is incorporated slightly more quickly in the pound-dollar rate and yen-dollar rates than into the
euro-dollar rate.
It is preferable to visualise the impact of changes in one variable on the others at different horizons. One way to achieve this is to obtain the impulse responses for the estimated model. To do so, we first re-define the dependent variables (reur, rgbp, rjpy) and select a VAR model with one lag of each variable. This can be done by inputting the argument maxlags=1. We then call irf, the built-in function of the VAR result instance, to compute the impulse responses. Finally, we need to select the number of periods over which we want to generate the IRFs; we arbitrarily select 20 and feed it to irf. Typing the command irf.plot() produces the impulse response graphs (Figure 27) below.

In [5]: model = smt.VAR(data)


res = model.fit(maxlags=1)

# Impulse Response Analysis


irf = res.irf(20)
irf.plot()

Out[5]:

Figure 27: Impulse Responses

As one would expect given the parameter estimates and the Granger causality test results, only a few
linkages between the series are established here. The responses to the shocks are very small, except
for the response of a variable to its own shock, and they die down to almost nothing after the first
lag.
Note that plots of the variance decompositions (also known as forecast error variance decompo-
sitions, or fevd in Python) can also be generated using the fevd function. Instead of plotting the IRFs
by the irf function, we choose fevd – that is, the forecast-error variance decompositions. Bar charts
for the variance decompositions would appear as follows (see Figure 28).

In [6]: # Forecast Error Variance Decomposition (FEVD)
fevd = res.fevd(20)
fevd.plot()

Out[6]:

Figure 28: Variance Decompositions

To illustrate how to interpret the FEVDs, let us have a look at the effect that a shock to the euro rates
has on the other two rates and on later values of the euro series itself, which are shown in the first
row of the FEVD plot. Interestingly, while the percentage of the errors that is attributable to own
shocks is 100% in the case of the euro rate (dark black bar), for the pound, the euro series explains
around 40% of the variation in returns (top middle graph), and for the yen, the euro series explains
around 7% of the variation.

We should remember that the ordering of the variables has an effect on the impulse responses and
variance decompositions, and when, as in this case, theory does not suggest an obvious ordering of
the series, some sensitivity analysis should be undertaken. Let us assume we would like to test how sensitive the FEVDs are to a different ordering. We first generate a new DataFrame data1 with the columns in the reverse of the previous order, i.e., rjpy, rgbp, reur. To inspect and compare the FEVDs for this ordering and the previous one, we can create graphs of the FEVDs by implementing the VAR regression and the fevd function again. We can then compare the FEVDs of the reverse ordering (Figure 29) with those of the previous one.

In [7]: data1 = data[['rjpy','rgbp','reur']] # reverse the columns

model = smt.VAR(data1)
res = model.fit(maxlags=1)

# Forecast Error Variance Decomposition (FEVD)


fevd = res.fevd(20)
fevd.plot()

Out[7]:

Figure 29: Variance Decompositions for Different Orderings


Appendix

14.4 The Generalised Method of Moments

14.4.1 Introduction to the Method of Moments


In Chapters 3-5, we have discussed how the method of least squares can be used to estimate the parameters of a model by setting up a loss function (the residual sum of squares) and minimising it. While least squares has many advantages, including its tractability and our depth of knowledge about how and when it works (and how and when it doesn't), there are two further broad approaches to model parameter estimation that are available and widely used. One of these is maximum likelihood, which was discussed in detail in Section 9.9 of Chapter 9 with some mathematical results covered in the Appendix to that chapter; the final estimation technique is known as the method of moments, and this will now be discussed in detail in the remainder of this section.
The generalised method of moments (GMM), as the name suggests, provides a generalisation of the conventional method of moments estimator which has found widespread applicability for finance in areas as diverse as asset pricing (including factor models and utility functions), interest rate models, and market microstructure - see Jaganathan, Skoulakis and Wang (2002) for a high-level survey. GMM can be applied in the context of time-series, cross-sectional or panel data. In fact, many other estimators that we have seen at various points in this book are special cases of the GMM estimator: OLS, GLS, instrumental variables, two-stage least squares, and maximum likelihood.

The method of moments technique dates back to Pearson (1895) and in essence it works by computing the moments of the sample data and setting them equal to their corresponding population values based on an assumed probability distribution for the latter. If we have k parameters to estimate, we need k sample moments. So, for example, if the observed data (y) are assumed to follow a normal distribution, there are two parameters we would need to estimate: the mean and the variance. To estimate the population mean (call this μ0), we know that E[y_t] − μ0 = 0. We also know that the sample moments will converge to their population counterparts asymptotically by the law of large numbers. So, as the number of data points T increases, we have that

(1/T) Σ_{t=1}^T y_t − μ0 → 0 as T → ∞

Thus the first sample moment condition is found by taking the usual sample average of y_t, ȳ:

(1/T) Σ_{t=1}^T y_t − μ̂0 = 0        (14.51)

We would then adopt the same approach to match the second moment

E[(y_t − μ0)²] − σ0² = 0        (14.52)

and thus

(1/T) Σ_{t=1}^T (y_t − μ̂0)² − σ̂0² = 0        (14.53)

and so we have

σ̂0² = (1/T) Σ_{t=1}^T (y_t − μ̂0)²        (14.54)

If we had a more complex distribution with more than two parameters, we would simply continue to compute the third, fourth, ... moments until we had the same number as parameters to estimate.
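To make the idea concrete, the short sketch below (not part of the printed text) computes method of moments estimates of the mean and variance for a simulated normal sample by matching the two sample moments to their population counterparts:

import numpy as np

rng = np.random.default_rng(seed=1)
y = rng.normal(loc=1.5, scale=2.0, size=100_000)   # simulated data with mu0 = 1.5, sigma0^2 = 4

mu_hat = y.mean()                        # first moment condition, as in (14.51)
sigma2_hat = ((y - mu_hat)**2).mean()    # second moment condition, as in (14.54)
print(mu_hat, sigma2_hat)                # estimates should be close to 1.5 and 4.0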
In the context of estimation of the parameters in a regression model, the method of moments relies principally on the assumption that the explanatory variables are orthogonal to the disturbances in the model

E[u_t x_t] = 0        (14.55)

for all t, where x is a T × k matrix of observations on the explanatory variables and there are k + 1 unknowns including an intercept term, to keep the notation consistent with the previous chapters.
Given this assumption, if we let β* denote the true value of β, a vector of parameters, then we can write a moment condition as

E[(y_t − x_t′β*) x_t] = 0        (14.56)

Solving these moment conditions would lead to the familiar OLS estimator for β given in equation (4.8) of Chapter 4. Again, in practice we use the sample analogue of the moments of E[y].
In terms of its properties, the method of moments is a consistent estimator but sometimes not an efficient one. The maximum likelihood technique (see Chapter 9 of this book for details) uses information from the entire assumed distribution function, and OLS requires an assumption about the independence of the error terms, while the method of moments (and GMM) uses only information on specific moments; thus the latter is more flexible and less restrictive.

14.4.2 The Generalised Method of Moments


The main disadvantage of the conventional method of moments estimator is that it is only applicable in situations where we have exactly the same number of moment conditions (i.e., equations) as unknowns (parameters to estimate) - in other words, we could say that the system is exactly identified.14 However, in most situations we will have more moment conditions than unknowns, in which case the system would be overidentified; GMM was developed by Hansen (1982) precisely for this purpose.

14 See Chapter 7 for a detailed discussion of this concept in a different context.

If the number of moment conditions is the same as the number of unknowns, then there will be a unique solution that optimises the moment conditions and, in the case of the moment condition in equation (14.56) above, it will be exactly satisfied. However, if the number of moment conditions exceeds the number of unknowns, there will be multiple solutions and it is necessary to select the 'best' from among them. A natural way to do this would be to choose the parameter estimates that minimise the variance of the moment conditions. Effectively, via a weighting matrix W, this gives higher weight to moment conditions with a lower variance (in other words, those that are closer to being satisfied).
To establish some more general notation, suppose that we have l = 1, ..., L moment conditions and we wish to estimate k parameters in a model, and all of these parameters are stacked into a vector β. We would write the moment conditions as

E[m_l(y_t, x_t; β)] = 0        (14.57)

The sample analogue of this equation is effectively the mean of each moment condition

m̄_l(y_t, x_t; β) = Σ_{t=1}^T m_l(y_t, x_t; β) = 0        (14.58)
Note that it does not matter here whether we divide by 1/T or not since this term would cancel out anyway. As discussed above, if L = k, these L equations will have a unique solution and thus for such exactly identified systems, all of the moment conditions in the equation above will be exactly zero, but there will be more than one solution when L > k. In such cases we would choose the parameters that come as near as possible to solving this, which would mean that the sample moment vector is as close to zero as possible. This would be written as

β̂_GMM = argmin_β m(β)′ W m(β)        (14.59)

where m(β) = (m_1, ..., m_L)′ are the L moment conditions (which will be a function of the estimated parameters, β), and W is the weighting matrix, which must be positive definite. It is possible to show that the optimal W is the inverse of the variance-covariance matrix of the moment conditions

W = [ (1/T) Σ_{t=1}^T m(β) m(β)′ ]⁻¹        (14.60)

The necessity to choose a weighting matrix is a disadvantage of GMM. Although, as stated, the optimal weighting matrix will be the inverse of the covariance of the moment equations, this depends on the true but unknown parameter vector. The most common approach to dealing with this problem is to use a two-step estimation procedure, where in the first stage the weighting matrix is substituted by an arbitrary choice that does not depend on the parameters (such as the identity matrix of appropriate order) and then in the second stage it is substituted by an estimate of the variance based on the parameter estimates given in the first stage. If the weighting matrix is the identity matrix, then the minimisation has OLS as a special case. More generally, it can be seen that the form of equation (14.59) is redolent of the GLS approach.
A more sophisticated variant of this technique employs these steps repeatedly, continually updating the parameter estimates and the variance of the moment conditions until the collective change in the parameter estimates from one iteration to the next falls below some pre-specified threshold.
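To illustrate the two-step procedure just described, the sketch below (not part of the printed text) implements it for a linear model whose moment conditions are m_t = z_t(y_t − x_t′β), i.e., the instrumental variables case; the weighting matrix is the identity in the first step and the inverse of the estimated moment covariance in the second:

import numpy as np

def two_step_gmm(y, X, Z):
    # Two-step GMM for y = X beta + u with a T x L instrument matrix Z (illustrative sketch)
    T = len(y)

    def gmm_beta(W):
        # closed-form minimiser of gbar(b)' W gbar(b), where gbar(b) = Z'(y - X b)/T
        A = X.T @ Z @ W @ Z.T @ X
        c = X.T @ Z @ W @ Z.T @ y
        return np.linalg.solve(A, c)

    beta1 = gmm_beta(np.eye(Z.shape[1]))            # step 1: identity weighting matrix
    u = y - X @ beta1
    S = (Z * u[:, None]).T @ (Z * u[:, None]) / T   # estimated covariance of the moments
    return gmm_beta(np.linalg.inv(S))               # step 2: efficient weighting matrix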
For overidentified systems where there are more moment conditions than parameters to estimate, we can use these degrees of freedom to test the overidentifying restrictions through what is known as the Sargan-Hansen J-test, or sometimes just the Sargan J-test. The null hypothesis is that all of the moment conditions are exactly satisfied, so if the null is rejected it would be indicative that the estimated parameters are not supported by the data. The test statistic is given by

J = T m(β̂)′ [EAV]⁻¹ m(β̂)

where EAV is the estimated asymptotic variance, and it is asymptotically distributed as a chi-squared with L − k degrees of freedom.
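Continuing the sketch above, the J statistic can be computed from the second-step estimates and compared with a chi-squared distribution with L − k degrees of freedom (again an illustration, not code from the printed text):

import numpy as np
from scipy import stats

def j_test(y, X, Z, beta):
    # Sargan-Hansen test of the over-identifying restrictions (illustrative sketch)
    T = len(y)
    u = y - X @ beta
    gbar = Z.T @ u / T                               # sample moment vector at beta
    EAV = (Z * u[:, None]).T @ (Z * u[:, None]) / T  # estimated asymptotic variance of the moments
    J = T * gbar @ np.linalg.inv(EAV) @ gbar
    pval = 1 - stats.chi2.cdf(J, df=Z.shape[1] - X.shape[1])
    return J, pval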
The sampling theory that lies behind GMM, and the test for over-identifying restrictions, are only valid asymptotically, and this might present particular issues when the number of observations available is small. Monte Carlo simulation evidence in Ferson and Foerster (1994) has suggested that GMM estimators may be oversized for modest numbers of data points.
Under some assumptions, it is possible to show that the GMM estimator is asymptotically normal with mean equal to the true parameter vector and a variance that is an inverse function of the sample size and of the partial derivatives of the moments with respect to the parameters - see Hansen (1982).

14.4.3 GMM in the Asset Pricing Context


One of the most common uses of GMM is in the context of asset pricing models that seek to simultaneously estimate the exposures of the returns on stocks to a set of risk factors and the risk premium per unit of each source of risk. We therefore briefly discuss the setup in this context, loosely following the description and notation in Jaganathan, Skoulakis and Wang (2010). We define R_t as an N × 1 vector of excess returns (over the risk-free rate) on N stocks at time t, Λ as a K × 1 vector of risk premia, and B as a K × N matrix of factor loadings on a K × 1 vector f_t of K risk factors. In the context of the empirical arbitrage pricing model of Chen, Roll and Ross (1986), these would be broad economic factors such as market risk, unexpected changes in inflation or oil prices or GDP, etc.

Each element in B, which we might term B_{k,n}, defines the amount of exposure to factor k that each stock n has. Then a straightforward linear pricing model that defines the expected returns follows as

E[R_t] = B′Λ        (14.61)

The Fama-MacBeth procedure involves two steps to implement the model (see Section 14.2 earlier in this chapter): first, a set of time-series regressions to estimate the factor exposures, B, and second, a set of cross-sectional regressions to estimate the risk premia, Λ. If we further define μ to be a K × 1 vector of means of each of the factors, the first of these stages to estimate the factor loadings would involve the regressions

R_t = A + B′f_t + u_t        (14.62)

where A is an N × 1 vector of intercept terms, and u_t is an N × 1 vector of disturbances.

However, the GMM approach would be able to estimate both B and Λ in a single stage. We could define the moment restrictions as

E[R_t − B′(Λ − μ + f_t)] = 0        (14.63)

E[(R_t − B′(Λ − μ + f_t)) f_t′] = 0        (14.64)

E[f_t − μ] = 0        (14.65)

Equation (14.63) comprises N moment restrictions, equation (14.64) has N × K restrictions and equation (14.65) has K restrictions; there will be a total of N − K degrees of freedom which can be used as overidentifying restrictions in a J-test. A natural extension of this framework is to allow either the factor exposures B or the risk premia Λ to be time-varying - see Jaganathan et al. (2010) for further details.
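To show how the stacked moment conditions (14.63)-(14.65) might be written down numerically, the sketch below (not part of the printed text, with B of dimension K × N as defined above) returns the sample analogue of the full moment vector for given parameter values; a GMM routine would then minimise m′Wm over B, Λ and μ:

import numpy as np

def stacked_moments(B, lam, mu, R, f):
    # R: T x N excess returns, f: T x K factors, B: K x N loadings,
    # lam and mu: length-K vectors of risk premia and factor means (illustrative sketch)
    resid = R - (lam - mu + f) @ B                                  # pricing errors, eq. (14.63)
    m1 = resid.mean(axis=0)                                         # N restrictions
    m2 = (resid[:, :, None] * f[:, None, :]).mean(axis=0).ravel()   # N*K restrictions, eq. (14.64)
    m3 = (f - mu).mean(axis=0)                                      # K restrictions, eq. (14.65)
    return np.concatenate([m1, m2, m3])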

14.4.4 A GMM Application to the Link Between Financial Markets and Economic Growth
We now discuss an application of GMM in the context of the link between financial markets and economic growth by Beck and Levine (2004). Their key research question is to what extent the development of the banking sector and the stock market can positively affect the level of economic growth. The theoretical literature proposes that effectively functioning financial intermediation can help the flow of information regarding the quality of investment projects and can reduce transactions costs between investors/savers on the one hand and borrowers/issuers on the other. This would support higher economic growth by ensuring an optimal allocation of resources. Yet there also exist contrary arguments suggesting that greater financial development may harm long-run economic growth, and thus the link between the two is a live issue to be tested empirically.
Beck and Levine examine the roles of both bank lending and the stock market, since they represent quite different forms of financing for firms and may therefore help to overcome different forms of information deficiencies or transactions costs. They establish a 40-country panel of data measured using non-overlapping five-year averages over the 1976 to 1998 period, thus comprising a total of 146 data points. Five-year averages are used rather than annual data to enable the authors to focus on the long run and because several of their variables do not show much variation from one year to the next for each given country.
The variables employed in the model are as follows. Stock market development is proxied by the turnover ratio, which is the total value of shares traded divided by the total value of shares listed on the exchange. The higher this ratio, the deeper is the market and the more frequently the stock is turned over, suggesting higher liquidity and lower transactions costs. Banking sector development is proxied by the ratio of total loans to the private sector divided by gross domestic product (GDP). Several control variables are also employed in the model: the initial level of GDP is included to allow for the 'catching up effect', where countries with lower GDP tend to grow faster and GDP figures converge cross-sectionally; average years of schooling (measuring the country's stock of investment in human capital); government consumption; the ratio of imports and exports to GDP (a measure of trade openness); the inflation rate; and the 'black market premium'.15 The dependent variable in all of their specifications is real per capita GDP growth (or, in some specifications, its first difference).

15 Neither this variable nor government consumption appear to be defined in the paper.

The basic model is

y_{i,t} − y_{i,t−1} = α y_{i,t−1} + β′x_{i,t} + η_i + u_{i,t}        (14.66)

where y_{i,t} represents the log of real GDP per capita in country i at time t, x_{i,t} includes all of the explanatory variables except the previous level of GDP per capita (which is separated out), β is a vector of slope parameters and u_{i,t} is a disturbance term. The additional term, η_i, has an i subscript but no t subscript, indicating that it varies by country and not over time. This is a vector of parameters that allows the intercept to be different for each country. These are known as country fixed effects, and are discussed in detail in Chapter 11. The authors turn equation (14.66) into a first difference form

(y_{i,t} − y_{i,t−1}) − (y_{i,t−1} − y_{i,t−2}) = α(y_{i,t−1} − y_{i,t−2}) + β′(x_{i,t} − x_{i,t−1}) + (u_{i,t} − u_{i,t−1})        (14.67)

In this equation, the country-specific effects (η_i) have dropped out when using a difference form since they do not vary over time. GMM is employed as the core estimation approach rather than OLS. If we write the error term in this equation (14.67) as v_{i,t} = (u_{i,t} − u_{i,t−1}) for simplicity, then we could use the following moment conditions

E[y_{i,t−s} v_{i,t}] = 0        (14.68)

E[x_{i,t−s} v_{i,t}] = 0        (14.69)

for s ≥ 2; t = 3, ..., T in both cases.


Beck and Levine use several specifications, but for brevity I only report in Table 14.5 here the results from their Table 5, which are based on the above GMM differences specification.
In the differences regression presented above, neither the bank credit variable nor the turnover ratio have consistently positive signs, and they are not statistically significant in any of the five specifications, although they are consistently so in the levels GMM regression (not reported here), where the dependent variable is the level of GDP growth rather than the change in GDP growth. In the levels regression, the authors give the example of Mexico, whose stock market turnover ratio and bank lending to the private sector were particularly low, but if the values of the two variables had instead been at the OECD's average level then GDP would have been expected to grow by 0.6 and 0.8 percentage points more per year, respectively.
The log of initial GDP per capita parameter estimates are statistically significant and negative for all five specifications, indicating a 'regression to the mean' due to a convergence effect where countries with already high GDP grow more slowly, as expected; the parameter estimate on the trade openness variable also has a positive sign and is statistically significant in model (3) in the table where it is included.

Table 14.5 GMM estimates of the effect of stock markets and bank lending on economic growth

                               (1)          (2)          (3)          (4)          (5)
Constant                       2.089        2.067        1.536        2.028        2.06
                               (0.014)**    (0.001)***   (0.008)***   (0.054)*     (0.005)***
Lagged log(GDP)               -13.59       -8.517       -7.374      -15.956      -10.547
                               (0.001)***   (0.001)***   (0.019)**    (0.001)***   (0.001)***
Av. years of school            1.554       -1.395      -10.605        2.557        3.76
                               (0.717)      (0.690)      (0.012)**    (0.495)      (0.271)
Government consumption                      2.992
                                            (0.229)
Trade openness                                           5.676
                                                         (0.001)***
Inflation rate                                                        0.866
                                                                      (0.336)
Black market premium                                                              -0.788
                                                                                   (0.738)
Bank credit                    0.749        0.683       -0.471        0.370        0.626
                               (0.388)      (0.426)      (0.644)      (0.656)      (0.552)
Turnover ratio                -0.36        -0.145        0.699       -0.225       -0.496
                               (0.674)      (0.803)      (0.129)      (0.828)      (0.506)

Sargan test                    0.259        0.120        0.315        0.305        0.155
Serial correlation test        0.859        0.530        0.102        0.710        0.800
Wald joint significance test   0.361        0.483        0.189        0.787        0.323

Notes: The dependent variable is the change in the growth of GDP. The first column states the explanatory variables while the numbered columns give the parameter estimates with p-values in parentheses. *, ** and *** denote significance at the 10%, 5% and 1% levels, respectively. Av. years of school is measured as ln(1 + number of years). All regressions are conducted with data spanning 40 countries and with a total of 146 observations. The numbers presented in the second panel for the diagnostic tests are all p-values.
Source: Beck and Levine (2004). Reprinted with permission from Elsevier.
The second panel of Table 14.5 reports p-values for tests of three summary diagnostic measures for the model. The first of these is the Sargan J-test and, since the p-value for all five models is greater than 0.1, we would conclude that the overidentifying restrictions are satisfied and that the moments are close to zero, and therefore the proposed models are adequate. Likewise, for the autocorrelation test reported in the second row of that panel, the p-values are all greater than 0.1 (albeit only marginally in specification (3)), and therefore there is no evidence of autocorrelation in the residuals from the fitted model. The final row presents the p-values from Wald tests, akin to the regression F-statistic, which measures the joint significance of all parameters in the model. In this case, we would want to reject the null hypothesis that all of the parameters are zero, but we are unable to do so in any of the specifications, and thus the model in differences form displayed here fails this test. This is not the case, however, for the models in other forms or for the more complex hybrid between the levels and differences forms not presented here - see Beck and Levine (2004) for further details.
The main conclusion from the study is that both stock market depth and bank lending - and thus overall financial development - enhance economic growth, as both have positive and statistically significant parameter estimates in the majority of specifications that Beck and Levine examine. As is often the case, while GMM is demonstrably superior from an econometric perspective, as the authors note, the conclusions are mostly not qualitatively altered compared with the case where OLS is used.

14.4.5 Additional Further Reading


In addition to the in-text citations above, the core reference for GMM as a technique is the book by Hall (2005); a further mathematical treatment is available in Hamilton (1994, Chapter 14). Within the finance area specifically, Cochrane's (2005) book is also very useful.
