SSR2 Final YC
Article:
Seo, Myung Hwan and Shin, Yongcheol (2016) Dynamic panels with threshold effect and
endogeneity. Journal of Econometrics. pp. 169-186. ISSN 0304-4076
Reuse
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs
(CC BY-NC-ND) licence. This licence only allows you to download this work and share it with others as long
as you credit the authors, but you can’t change the article in any way or use it commercially. More
information and the full terms of the licence here: [Link]
Dynamic Panels with Threshold Effect and Endogeneity
Myung Hwan Seo
Department of Economics, Seoul National University, Kwan-Ak Ro 1, Kwan-Ak Gu, Seoul, Korea
Yongcheol Shin
Department of Economics and Related Studies, University of York, York YO105DD, UK
Abstract
Corresponding author.
1 Introduction
The econometric literature on dynamic models has long been interested in the implications of nonlinear asymmetric dynamics. Examples include Markov-switching, smooth transition and threshold autoregression models. These models are popular because they allow us to draw inferences about the underlying data generating process, or to produce reliable forecasts, in a manner that is not possible with linear models. Until recently, however, most econometric analysis has stopped short of studying nonlinear asymmetric mechanisms explicitly within a dynamic panel data context. Hansen (1999) develops a static panel threshold model where regression coefficients can take on a small number of different values, depending on the value of an exogenous stationary variable. González et al. (2005) generalize this approach and develop a panel smooth transition regression model which allows the coefficients to change gradually from one regime to another.$^{1}$ In a broad context these models are a specific example of the panel approach that allows coefficients to vary randomly over time and across cross-section units, as surveyed by Hsiao (2003, Chapter 6).
These approaches are static, and their validity has not yet been established in dynamic panels, although the increasing availability of large panel data sets has prompted more rigorous econometric analyses of dynamic heterogeneous panels. Surprisingly, there has been almost no rigorous study of nonlinear asymmetric mechanisms in dynamic panels, especially when the time dimension is short, even though there is a huge literature on GMM estimation of linear dynamic panels with heterogeneous individual effects, e.g., Holtz-Eakin et al. (1988), Arellano and Bond (1991), Ahn and Schmidt (1995), Arellano and Bover (1995), Blundell and Bond (1998), Alvarez and Arellano (2003), Bun and Windmeijer (2010), Hayakawa (2015) and Hsiao and Zhang (2015).
Another limitation is the maintained assumption of exogeneity of the regressors and/or the threshold variable. While the endogenous transition in the Markov-switching model has been studied by Kim et al. (2008), little progress has been made in the threshold regression literature. The standard least squares approach, such as Hansen (2000) and Seo and Linton (2007), requires exogeneity of all the covariates. Caner and Hansen (2004) relax this requirement by allowing for endogenous regressors, but they assume the threshold variable to be exogenous. See also Hansen (2011) for an extensive survey.
In the dynamic panel context, Dang et al. (2012) have proposed a generalized GMM estimator applicable to dynamic panel threshold models, which can provide consistent estimates of heterogeneous speeds of adjustment as well as a valid testing procedure for threshold effects in short dynamic panels with unobserved individual effects. Ramirez-Rondan (2013) has extended Hansen's (1999) work to allow for the threshold mechanism in dynamic panels, and proposed maximum likelihood estimation techniques, following the approach of Hsiao et al. (2002). In order to allow for endogenous regressors, Kremer et al. (2013) have considered a hybrid dynamic version that combines the forward orthogonal deviations transformation of Arellano and Bover (1995) with the instrumental variable estimation of the cross-section model by Caner and Hansen (2004). However, the crucial assumption in all of these studies is that either the regressors or the transition variable or both are exogenous.$^{2}$

$^{1}$ See Fok et al. (2005) for a large $T$ treatment of smooth transition regression, thus not requiring the fixed effect or first difference transformation to estimate the model.
We aim to fill this gap by explicitly addressing the important issue of how best to model nonlinear asymmetric dynamics and unobserved individual heterogeneity simultaneously. To this end we extend the approaches of Hansen (1999, 2000) and Caner and Hansen (2004) to the dynamic panel data model with an endogenous threshold variable and endogenous regressors. Specifically, following the main literature on GMM, we consider the asymptotic experiment with a large number of cross-section units and a fixed number of time periods.
We propose a general GMM approach based on the first-difference (FD) transformation. As we allow both the threshold variable and the regressors to be endogenous, the FD-GMM approach is expected to overcome the main limitation in the existing literature, namely, the assumption of exogeneity of the regressors and/or the transition variable, which may hamper the usefulness of threshold regression models in a general context. We develop the asymptotic theory through the diminishing threshold and the standard fixed threshold asymptotics (e.g. Hansen, 2000), and show that the FD-GMM estimator follows a normal distribution asymptotically. More importantly, the asymptotic normality holds true irrespective of whether the regression function is continuous or not. Hence, the standard inference on the threshold and other parameters based on the Wald statistic can be carried out. This is in contrast to the least squares approach, in which the discontinuity of the regression function changes the asymptotic distribution in a dramatic way.
Next, we examine the special case where the threshold variable is strictly exogenous, and propose a more efficient two-step least squares (FD-2SLS) estimator. This generalizes Caner and Hansen's (2004) approach for cross-section data to dynamic panel data with a fixed effect. We establish that the FD-2SLS estimator satisfies the oracle property because the threshold and the slope estimates are asymptotically independent. Furthermore, the FD-2SLS estimator of the threshold parameter is shown to be super-consistent. Though its inference is non-standard, we show that a properly weighted LR statistic follows the same pivotal asymptotic distribution as in Hansen (2000).

$^{2}$ Recently, Yu and Phillips (2014) and Kourtellos et al. (2015) have also addressed the issue of an endogenous threshold variable in the single equation context.
We provide testing procedures for identifying the threshold effect, based on the supremum statistics, which follow non-standard asymptotic distributions due to the loss of identification under the null of no threshold effect. The critical values or the p-values of the tests can be easily evaluated by the bootstrap. Furthermore, we develop a Hausman type testing procedure for the validity of the null hypothesis that the threshold variable is exogenous.
The finite sample properties of the FD-GMM estimator are examined through Monte Carlo studies. Specifically, we evaluate its bias and mean squared error, and the coverage probability of the confidence interval constructed by the asymptotic normal approximation. The overall results support our theoretical predictions. Given that there are many different ways to compute the weight matrix in the first step, we propose an averaging of a class of two-step FD-GMM estimators that are obtained by randomizing the weight matrix in the first step. This turns out to be successful in significantly reducing the sampling errors.
Using UK company panel data, we demonstrate the usefulness of the proposed dynamic threshold panel data modelling through an empirical application investigating the asymmetric sensitivity of investment to cash flows. We consider three firm-specific variables as endogenous threshold variables that potentially affect the investment dynamics. Employing a panel dataset of 560 UK firms over the period 1973-1987, we find that the cash flow sensitivity of investment is significantly stronger for cash-constrained, high-growth and high-leveraged firms, a finding consistent with the original hypothesis of Fazzari et al. (1988) that the sensitivity of investment to cash flows is an indicator of the degree of financial constraints.
The plan of the paper is as follows: Section 2 describes the model. Section 3 presents the detailed estimation steps for FD-GMM. Section 4 develops an asymptotic theory, including consistent and efficient estimation of the threshold parameter. Section 5 provides the inference for threshold effects and endogeneity of the transition variable. The finite sample performance of the FD-GMM estimator is examined in Section 6. The empirical application is presented in Section 7. Section 8 concludes. We provide two Appendices. Appendix A presents the estimation theory for FD-2SLS, which is shown to be more efficient in the special case where the threshold variable is exogenous. All the mathematical proofs are collected in Appendix B.
2 The Model
Consider the following dynamic panel threshold regression model:
$$ y_{it} = (1, x_{it}')\,\phi_1\, 1\{q_{it} \le \gamma\} + (1, x_{it}')\,\phi_2\, 1\{q_{it} > \gamma\} + \varepsilon_{it}, \quad i = 1, \dots, n;\ t = 1, \dots, T, \tag{1} $$
where $y_{it}$ is a scalar stochastic variable of interest, $x_{it}$ the $k_1 \times 1$ vector of time-varying regressors, which may include the lagged dependent variable, $1\{\cdot\}$ an indicator function, and $q_{it}$ the transition variable. $\gamma$ is the threshold parameter, and $\phi_1$ and $\phi_2$ are the slope parameters associated with the different regimes. The error, $\varepsilon_{it}$, consists of the error components:
$$ \varepsilon_{it} = \alpha_i + v_{it}, \tag{2} $$
where $\alpha_i$ is an unobserved individual fixed effect and $v_{it}$ is a zero mean idiosyncratic random disturbance. In particular, $v_{it}$ is assumed to be a martingale difference sequence,
$$ E(v_{it} \mid \mathcal{F}_{t-1}) = 0, $$
where $\mathcal{F}_t$ is a natural filtration at time $t$. It is worthwhile to mention that we do not assume $x_{it}$ or $q_{it}$ to be measurable with respect to $\mathcal{F}_{t-1}$, so that $E(v_{it} x_{it}) \neq 0$ or $E(v_{it} q_{it}) \neq 0$ is permitted, thus allowing endogeneity in both the regressor, $x_{it}$, and the threshold variable, $q_{it}$.
The estimation of dynamic panel data with a large number of individuals but a fixed number of time periods has been commonplace, e.g. Holtz-Eakin et al. (1988), Arellano and Bond (1991), Arellano and Bover (1995), Ahn and Schmidt (1995), Blundell and Bond (1998) and Alvarez and Arellano (2003). Following this tradition, we aim to extend the static panel threshold modelling advanced by Hansen (1999), and generalize the Arellano and Bond (1991) FD-GMM estimator to a new estimation approach applicable to dynamic panel threshold models. Specifically, we consider the asymptotic experiment under large $n$ with a fixed $T$,$^{3}$ in which case the martingale difference assumption is just for expositional simplicity. The sample is generated from random sampling across $i$.
A leading example of interest is the self-exciting threshold autoregressive (SETAR) model popularized by Tong (1990), in which case $x_{it}$ consists of the lagged $y_{it}$'s and $q_{it} = y_{i,t-d}$ for any $d \geq 1$.$^{4}$
It is well established in the linear dynamic panel data literature that the fixed effects estimator of the autoregressive parameters is biased downward (e.g. Nickell, 1981). To deal with the correlation of regressors with individual effects in (1) and (2), we consider the first-difference transformation of (1) as follows (e.g. Arellano and Bond, 1991):$^{5}$
$$ \Delta y_{it} = \beta' \Delta x_{it} + \delta' X_{it}'\, 1_{it}(\gamma) + \Delta \varepsilon_{it}, \tag{3} $$

$^{3}$ On the other hand, if $T/N \to c$ as $N \to \infty$, we conjecture (e.g. Alvarez and Arellano, 2003; Hsiao and Zhang, 2015) that our proposed FD-GMM estimator is asymptotically biased of order $\sqrt{c}$.
$^{4}$ We note in passing that all the results go through when $q_{it} = y_{i,t-d}$ for any $d \geq 1$, which covers the delayed SETAR mechanism. It is sufficient to check if the moment conditions hold with the particular choice of $q_{it}$.
$^{5}$ From (2), $\Delta\varepsilon_{it} = \Delta v_{it}$. For convenience we use $\Delta\varepsilon_{it}$ instead of $\Delta v_{it}$ throughout the paper. Here, we decompose the parameters, $\phi_1 = (\phi_{11}, \phi_{12}')'$, $\phi_2 = (\phi_{21}, \phi_{22}')'$ and $\delta = (\delta_1, \delta_2')'$, conformable with $(1, x_{it}')$.
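where $\Delta$ is the first difference operator, $\beta = (\phi_{12}, \dots, \phi_{1,k_1+1})'$ is $k_1 \times 1$, $\delta = \phi_2 - \phi_1$ is $(k_1+1) \times 1$, and
$$ \underset{2 \times (1+k_1)}{X_{it}} = \begin{pmatrix} (1, x_{it}') \\ (1, x_{i,t-1}') \end{pmatrix} \quad \text{and} \quad \underset{2 \times 1}{1_{it}(\gamma)} = \begin{pmatrix} 1\{q_{it} > \gamma\} \\ -1\{q_{i,t-1} > \gamma\} \end{pmatrix}. $$
Let $\theta = (\beta', \delta', \gamma)'$ and assume that $\theta$ belongs to a compact set, $\Theta = \Phi \times \Gamma \subset \mathbb{R}^k$, with $k = 2k_1 + 2$. Following convention, we let $\Gamma = [\underline{\gamma}, \overline{\gamma}]$, where $\underline{\gamma}$ and $\overline{\gamma}$ are two percentiles of the threshold variable. Typically, they are the lower and upper tenth or fifteenth percentiles.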
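The first-differenced threshold term can be checked numerically. The following is a minimal sketch, assuming NumPy and made-up data for a single individual; the function name `fd_threshold_terms` and all numerical values are illustrative and not part of the paper. It builds $X_{it}$ and $1_{it}(\gamma)$ for $t = 2, \dots, T$ and confirms that $\delta' X_{it}' 1_{it}(\gamma)$ equals the first difference of $(1, x_{it}')\delta\, 1\{q_{it} > \gamma\}$.

```python
import numpy as np

def fd_threshold_terms(x, q, gamma):
    """Build X_it (2 x (1+k1)) and 1_it(gamma) (2 x 1) for t = 2..T.

    x: (T, k1) regressors and q: (T,) threshold variable of one individual.
    Returns X with shape (T-1, 2, 1+k1) and ind with shape (T-1, 2).
    """
    T = x.shape[0]
    w = np.hstack([np.ones((T, 1)), x])            # rows (1, x_it')
    X = np.stack([w[1:], w[:-1]], axis=1)          # (1, x_it') over (1, x_{i,t-1}')
    ind = np.stack([(q[1:] > gamma).astype(float),
                    -(q[:-1] > gamma).astype(float)], axis=1)
    return X, ind

# Illustrative (hypothetical) data and parameter values.
rng = np.random.default_rng(0)
T, k1 = 6, 2
x = rng.normal(size=(T, k1))
q = rng.normal(size=T)
delta = np.array([0.5, -1.0, 2.0])
gamma = 0.1

X, ind = fd_threshold_terms(x, q, gamma)
# delta' X_it' 1_it(gamma), one value per t = 2..T
fd_term = np.einsum("tj,tjk,k->t", ind, X, delta)
# Direct first difference of (1, x_it') delta 1{q_it > gamma}
level = (np.hstack([np.ones((T, 1)), x]) @ delta) * (q > gamma)
direct = np.diff(level)
print(np.allclose(fd_term, direct))
```

The negative sign on the lagged indicator is what lets a single $\delta$ multiply both the current and the lagged regressors.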
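We allow for both "fixed threshold effect" and "diminishing or small threshold effect" for statistical inference for the threshold parameter, by defining (e.g. Hansen, 2000):
$$ \delta = \delta_n = \delta_0\, n^{-\alpha} \quad \text{for some } 0 \leq \alpha < 1/2. \tag{4} $$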
The OLS estimator obtained from (3) is biased since the transformed regressors are correlated with $\Delta\varepsilon_{it}$. To fix this problem we need to find an $l \times 1$ vector of instrumental variables, $(z_{it_0}', \dots, z_{iT}')'$ for $2 < t_0 \leq T$ with $l \geq k$, such that either
$$ E\left( z_{it_0}' \Delta\varepsilon_{it_0}, \dots, z_{iT}' \Delta\varepsilon_{iT} \right)' = 0, \tag{5} $$
or
$$ E(\Delta\varepsilon_{it} \mid z_{it}) = 0, \quad \text{for each } t = t_0, \dots, T. \tag{6} $$
Notice that $z_{it}$ may include lagged values of $(x_{it}, q_{it})$ and lagged dependent variables, and that the number of instruments may be different for each time $t$.$^{6}$
3 FD-GMM Estimation
We allow for the threshold variable $q_{it}$ to be endogenous, $E(q_{it} \Delta\varepsilon_{it}) \neq 0$, such that $q_{it}$ does not belong to the set of instrumental variables, $\{z_{it}\}_{t=t_0}^{T}$. We consider the following $l$-dimensional column vector of the sample moment conditions:
$$ g_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} g_i(\theta), $$
where
$$ \underset{l \times 1}{g_i(\theta)} = \begin{pmatrix} z_{it_0} \left( \Delta y_{it_0} - \beta' \Delta x_{it_0} - \delta' X_{it_0}'\, 1_{it_0}(\gamma) \right) \\ \vdots \\ z_{iT} \left( \Delta y_{iT} - \beta' \Delta x_{iT} - \delta' X_{iT}'\, 1_{iT}(\gamma) \right) \end{pmatrix}. \tag{7} $$

$^{6}$ In practice, the choice of instruments is important. In Section 4.1 we present the order and the rank conditions (see Assumption 3 below) for practitioners to check with their own choice of instruments.
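Assume that $E g_i(\theta) = 0$ if and only if $\theta = \theta_0$, and let $g_i = g_i(\theta_0) = (z_{it_0}' \Delta\varepsilon_{it_0}, \dots, z_{iT}' \Delta\varepsilon_{iT})'$ and $\Omega = E(g_i g_i')$, where $\Omega$ is assumed to be positive definite. For a positive definite matrix $W_n$ such that $W_n \overset{p}{\to} \Omega^{-1}$, let
$$ J_n(\theta) = g_n(\theta)'\, W_n\, g_n(\theta). \tag{8} $$
The GMM estimator $\hat\theta = (\hat\beta', \hat\delta', \hat\gamma)'$ minimizes $J_n(\theta)$ over $\Theta$. Strictly speaking, $\hat\gamma$ is given by an interval, but we let $\hat\gamma$ be the minimum of the interval.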
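Since the model is linear in the slope parameters $(\beta', \delta')'$ for each $\gamma \in \Gamma$, and the objective function $J_n(\theta)$ is not continuous in $\gamma$ with $\theta = (\beta', \delta', \gamma)'$, the grid search algorithm is practical: for a fixed $\gamma$, let
$$ g_{1n} = \frac{1}{n} \sum_{i=1}^{n} g_{1i}, \quad \text{and} \quad g_{2n}(\gamma) = \frac{1}{n} \sum_{i=1}^{n} g_{2i}(\gamma), $$
where
$$ \underset{l \times 1}{g_{1i}} = \begin{pmatrix} z_{it_0} \Delta y_{it_0} \\ \vdots \\ z_{iT} \Delta y_{iT} \end{pmatrix}, \quad \underset{l \times (k-1)}{g_{2i}(\gamma)} = \begin{pmatrix} z_{it_0} \left( \Delta x_{it_0}',\ 1_{it_0}(\gamma)' X_{it_0} \right) \\ \vdots \\ z_{iT} \left( \Delta x_{iT}',\ 1_{iT}(\gamma)' X_{iT} \right) \end{pmatrix}. $$
Then, the GMM estimator of $\beta$ and $\delta$, for a given $\gamma$, is given by
$$ \left( \hat\beta(\gamma)', \hat\delta(\gamma)' \right)' = \left( g_{2n}(\gamma)'\, W_n\, g_{2n}(\gamma) \right)^{-1} g_{2n}(\gamma)'\, W_n\, g_{1n}. $$
Denoting the objective function evaluated at $\hat\beta(\gamma)$ and $\hat\delta(\gamma)$ by $\hat J_n(\gamma)$, we obtain the GMM estimator of $\theta$ by
$$ \hat\gamma = \underset{\gamma \in \Gamma}{\operatorname{argmin}}\ \hat J_n(\gamma), \quad \text{and} \quad \left( \hat\beta', \hat\delta' \right)' = \left( \hat\beta(\hat\gamma)', \hat\delta(\hat\gamma)' \right)'. $$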
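The grid search described above can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not in the paper: NumPy is used, every period has the same number $m$ of instruments, the weight matrix is supplied by the caller, and the names `threshold_terms` and `fd_gmm_grid` are hypothetical. The demonstration data are noise-free and exogenous, so the instruments built from lagged levels are valid only in this toy design.

```python
import numpy as np

def threshold_terms(x, q, gamma):
    """Return 1_it(gamma)' X_it for t = 2..T; x is (n, T, k1), q is (n, T)."""
    n, T, _ = x.shape
    w = np.concatenate([np.ones((n, T, 1)), x], axis=2)    # rows (1, x_it')
    cur = w[:, 1:] * (q[:, 1:] > gamma)[:, :, None]
    lag = w[:, :-1] * (q[:, :-1] > gamma)[:, :, None]
    return cur - lag                                        # (n, T-1, k1+1)

def fd_gmm_grid(dy, dx, x, q, z, W, grid):
    """For each gamma, solve for (beta, delta) in closed form, then minimise J."""
    n, S, m = z.shape
    g1 = (z * dy[:, :, None]).reshape(n, S * m).mean(axis=0)
    best = None
    for gamma in grid:                                      # grid assumed ascending
        R = np.concatenate([dx, threshold_terms(x, q, gamma)], axis=2)
        g2 = np.einsum("nsm,nsk->nsmk", z, R).reshape(n, S * m, -1).mean(axis=0)
        coef = np.linalg.solve(g2.T @ W @ g2, g2.T @ W @ g1)
        u = g1 - g2 @ coef
        J = float(u @ W @ u)
        if best is None or J < best[0]:                     # strict '<' keeps the smallest minimiser
            best = (J, gamma, coef)
    return best

# Toy, noise-free design (all values hypothetical).
rng = np.random.default_rng(1)
n, T = 300, 5
x = rng.normal(size=(n, T, 1))
q = rng.normal(size=(n, T))
grid = np.linspace(-1.0, 1.0, 21)
gamma0 = grid[12]
coef0 = np.array([0.5, -1.0, 2.0])                          # (beta, delta_1, delta_2)
dx = np.diff(x, axis=1)
dy = np.concatenate([dx, threshold_terms(x, q, gamma0)], axis=2) @ coef0
x_lag, q_lag = x[:, :-1], q[:, :-1, None]
z = np.concatenate([np.ones((n, T - 1, 1)), x_lag, q_lag, q_lag**2, x_lag * q_lag], axis=2)
W = np.eye(z.shape[1] * z.shape[2])                         # first-step style weight
J_hat, gamma_hat, coef_hat = fd_gmm_grid(dy, dx, x, q, z, W, grid)
print(gamma_hat, coef_hat, J_hat)
```

Because the slope estimates are available in closed form for each $\gamma$, the only numerical search is one-dimensional, which is why a plain grid is adequate despite the discontinuity of the objective in $\gamma$.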
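2. Estimate the parameter $\theta$ by minimizing $J_n(\theta)$ with
$$ W_n = \left( \frac{1}{n} \sum_{i=1}^{n} \hat g_i \hat g_i' - \frac{1}{n^2} \sum_{i=1}^{n} \hat g_i \sum_{i=1}^{n} \hat g_i' \right)^{-1}, \tag{11} $$
where $\hat g_i = \left( \widehat{\Delta\varepsilon}_{it_0} z_{it_0}', \dots, \widehat{\Delta\varepsilon}_{iT} z_{iT}' \right)'$.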
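As a small illustration of the weight matrix in (11), the sketch below assumes NumPy and that the stacked vectors $\hat g_i$ have already been arranged as the rows of an $n \times l$ array; the function name `second_step_weight` and the random input are hypothetical. The matrix being inverted is the sample covariance of $\hat g_i$ with divisor $n$.

```python
import numpy as np

def second_step_weight(g_hat):
    """Weight matrix of (11); g_hat is (n, l) with i-th row equal to g_hat_i'."""
    n = g_hat.shape[0]
    total = g_hat.sum(axis=0)
    centred = g_hat.T @ g_hat / n - np.outer(total, total) / n**2
    return np.linalg.inv(centred)

# Hypothetical input, used only to exercise the function.
rng = np.random.default_rng(0)
g_demo = rng.normal(size=(500, 4)) + 0.3
W_demo = second_step_weight(g_demo)
print(W_demo.shape)
```

The inverse exists only when $n$ is comfortably larger than the number of moment conditions $l$, which is worth checking when many lags are used as instruments.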
Remark 1 In the linear dynamic panel data literature a number of initial conditions on $y_{i0}$ have been proposed to improve the efficiency of the FD-GMM estimator, e.g. Ahn and Schmidt (1995), Arellano and Bover (1995) and Blundell and Bond (1998).$^{7}$ In this paper we consider dynamic panels with the length of the time period not too small relative to the number of individuals, as in our empirical application of firms' investment decisions. In this regard, we adopt a more robust specification in which the distribution of $y_{i0}$ given $\alpha_i$ is left unrestricted.
4 Asymptotic Theory
This section develops an asymptotic theory for the FD-GMM estimator. There are two frameworks in the literature. One is the fixed threshold assumption (Chan, 1993) and the other the diminishing threshold assumption (Hansen, 2000). We also discuss the estimation of unknown quantities in the asymptotic distributions, such as the asymptotic variances and the normalizing factors when an estimator is not asymptotically normal.
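Partition $\theta = (\theta_1', \gamma)'$, where $\theta_1 = (\beta', \delta')'$. As the true value of $\delta$ is $\delta_n$, the true values of $\theta$ and $\theta_1$ are denoted by $\theta_n$ and $\theta_{1n}$, respectively. Define
$$ \underset{l \times k_1}{G_\beta} = \begin{bmatrix} E\left( z_{it_0} \Delta x_{it_0}' \right) \\ \vdots \\ E\left( z_{iT} \Delta x_{iT}' \right) \end{bmatrix}, \quad \underset{l \times (k_1+1)}{G_\delta(\gamma)} = \begin{bmatrix} E\left( z_{it_0} 1_{it_0}(\gamma)' X_{it_0} \right) \\ \vdots \\ E\left( z_{iT} 1_{iT}(\gamma)' X_{iT} \right) \end{bmatrix}, $$
and
$$ \underset{l \times 1}{G_\gamma(\gamma)} = \begin{bmatrix} \left\{ E_{t_0-1}\left[ z_{it_0} (1, x_{it_0-1}') \mid \gamma \right] p_{t_0-1}(\gamma) - E_{t_0}\left[ z_{it_0} (1, x_{it_0}') \mid \gamma \right] p_{t_0}(\gamma) \right\} \delta_0 \\ \vdots \\ \left\{ E_{T-1}\left[ z_{iT} (1, x_{iT-1}') \mid \gamma \right] p_{T-1}(\gamma) - E_{T}\left[ z_{iT} (1, x_{iT}') \mid \gamma \right] p_{T}(\gamma) \right\} \delta_0 \end{bmatrix}, $$
where $E_t[\cdot \mid \gamma]$ denotes the conditional expectation given $q_{it} = \gamma$ and $p_t(\cdot)$ the density of $q_{it}$.

$^{7}$ Bun and Windmeijer (2010) show for the covariance stationary AR(1) panel model that the system GMM estimator has a smaller bias and root mean square error than the FD-GMM when the series are persistent, but that this bias increases with increasing $\sigma_\alpha^2 / \sigma_v^2$ and can become substantial.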
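Assumption 1 The true value of $\beta$ is fixed at $\beta_0$ while that of $\delta$ depends on $n$ such that $\delta_n = \delta_0 n^{-\alpha}$ for some $0 \leq \alpha < 1/2$ and $\delta_0 \neq 0$. $\theta_n$ are interior points of $\Theta$. $\Omega$ is finite and positive definite.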
This assumption allows for both the standard setup, $\delta_n = \delta_0 \neq 0$ for all $n$, and the diminishing setup, $\delta_n \to 0$ as $n \to \infty$. The latter has been widely used in the threshold model (without an endogenous regressor) to obtain a tractable asymptotic distribution for the least squares estimator of $\gamma$; see Hansen (2000). As shown below, however, the GMM estimate $\hat\gamma$ is asymptotically normal whether or not $\delta_n \to 0$, implying that the inferential procedure is the same for any $0 \leq \alpha < 1/2$. Therefore, we do not need to consider the diminishing setup, though we keep it for internal consistency of the exposition.
Assumption 2 (i) The threshold variable, $q_{it}$, has a continuous and bounded density, $p_t(\cdot)$, such that $p_t(\gamma_0) > 0$ for all $t = 1, \dots, T$; (ii) $E_t\left[ z_{it} (x_{it}', x_{i,t-1}') \mid \gamma \right]$ is continuous at $\gamma_0$, where $E_t(\cdot \mid \gamma) = E(\cdot \mid q_{it} = \gamma)$.
The smoothness assumptions on the distribution of the threshold variable and the conditional moments are standard. Notice, however, that the distribution of the GMM estimator of the unknown threshold is invariant to the continuity of the regression function at the change point, because our model does not require its discontinuity at the change point. This is a novel feature of the GMM. As a consequence, we do not need prior knowledge of the continuity of the model to make inference for the threshold model.$^{8}$
This is a standard rank condition in GMM for identification. Typically, the lagged variables are employed as instruments. For instance, if the model is the SETAR, the lagged dependent variables, $y_{i,t-d}$'s for $d > 1$, are valid instruments and thus the dimension of the moment conditions grows quickly as $T$ increases to satisfy the required number of moments for identification.$^{9}$

$^{8}$ The GMM criterion function can be viewed as an extreme form of smoothing in the sense of Seo and Linton (2007). The smoothed least squares implies moment conditions that include one of the type $E\left[ e_t(\gamma)\, p\left( \frac{q_t - \gamma}{h_n} \right) \right] = 0$, where $e_t(\gamma)$ is the error for a given $\gamma$, $p(\cdot)$ is a density function, and $h_n$ is a smoothing parameter that goes to zero. The diminishing rate of $h_n$ determines the degree of smoothing and the convergence rate of the threshold estimate, which is $(n h_n^{-1})^{1/2}$. The slower the rate is, the more smoothing it implies. The GMM criterion corresponds to the case where $h_n$ is fixed, yielding the convergence rate of $n^{1/2}$.
$^{9}$ Our moment conditions in (7) utilize the moments related to $\Delta\varepsilon_{it}$ only, but not those related to the level $\varepsilon_{it}$ as in Blundell and Bond (1998).
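Theorem 1 Under Assumptions 1-3, as $n \to \infty$,
$$ \begin{pmatrix} \sqrt{n} \begin{pmatrix} \hat\beta - \beta_0 \\ \hat\delta - \delta_n \end{pmatrix} \\ n^{1/2 - \alpha} \left( \hat\gamma - \gamma_0 \right) \end{pmatrix} \overset{d}{\longrightarrow} N\left( 0, \left( G' \Omega^{-1} G \right)^{-1} \right). $$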
Remark 2 Theorem 1 establishes that the FD-GMM estimator always follows the normal distribution asymptotically, irrespective of whether $\alpha = 0$ or not. It can be argued that such a normality result can be derived simply by applying the standard GMM asymptotics. However, for our models with non-smooth criterion functions, we still need to verify certain stochastic differentiability conditions, which is nontrivial and is achieved by applying empirical process theory, e.g. van der Vaart and Wellner (1996). We can also allow for $\alpha = 0$, unlike in the least squares estimation of threshold regression (e.g. Hansen, 2000). Furthermore, our result does not require us to know a priori whether the regression function is continuous or not, the validity of which is confirmed by the Monte Carlo studies below.
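The asymptotic variance matrix contains $\delta_0$, and the convergence rate of $\hat\gamma$ hinges on the unknown quantity, $\alpha$. These two quantities cannot be consistently estimated separately, but they cancel out in the construction of the t-statistic. Thus, confidence intervals for $\theta$ can be constructed in the standard manner. Let
$$ \hat\Omega = \frac{1}{n} \sum_{i=1}^{n} \hat g_i \hat g_i' - \left( \frac{1}{n} \sum_{i=1}^{n} \hat g_i \right) \left( \frac{1}{n} \sum_{i=1}^{n} \hat g_i' \right). $$
$G_\gamma$ may be estimated by the standard Nadaraya-Watson kernel estimator: for some kernel $K$ and bandwidth $h$ (e.g. the Gaussian kernel and Silverman's rule of thumb)$^{10}$, let
$$ \hat G_\gamma = \begin{bmatrix} \frac{1}{nh} \sum_{i=1}^{n} z_{it_0} \left[ (1, x_{it_0-1}')\, K\left( \frac{\hat\gamma - q_{it_0-1}}{h} \right) - (1, x_{it_0}')\, K\left( \frac{\hat\gamma - q_{it_0}}{h} \right) \right] \hat\delta \\ \vdots \\ \frac{1}{nh} \sum_{i=1}^{n} z_{iT} \left[ (1, x_{iT-1}')\, K\left( \frac{\hat\gamma - q_{iT-1}}{h} \right) - (1, x_{iT}')\, K\left( \frac{\hat\gamma - q_{iT}}{h} \right) \right] \hat\delta \end{bmatrix}. \tag{12} $$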
Remark 3 As $n \to \infty$, the consistency of $\hat\Omega$ and $\hat G$ follows from the standard uniform law of large numbers (ULLN) for iid data across $i$, and the consistency of the Nadaraya-Watson and the kernel density estimators. The existence of the absolute moment is sufficient to get the ULLN. The convergence rate for $\hat G$ follows the standard nonparametric rate for the Nadaraya-Watson and the kernel density estimators. See Härdle and Linton (1994) for more details on the choice of kernel and bandwidth.
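A kernel estimate of one period's block of $G_\gamma$ can be sketched as follows. This is an illustration only, assuming NumPy, a Gaussian kernel, and one common version of Silverman's rule ($1.06\,\hat\sigma\, n^{-1/5}$); the sign convention (lagged term minus current term) and the function names `silverman_bandwidth` and `g_gamma_block` are assumptions of this sketch rather than the paper's code.

```python
import numpy as np

def silverman_bandwidth(q):
    """One common version of Silverman's rule of thumb (an assumption here)."""
    q = np.ravel(q)
    return 1.06 * q.std(ddof=1) * q.size ** (-0.2)

def g_gamma_block(z, w_lag, w_cur, q_lag, q_cur, gamma_hat, delta_hat, h):
    """Kernel estimate of one period's block of G_gamma.

    z: (n, m) instruments; w_lag, w_cur: (n, k1+1) rows (1, x');
    q_lag, q_cur: (n,) lagged and current threshold variable.
    """
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    inner = (w_lag * K((gamma_hat - q_lag) / h)[:, None]
             - w_cur * K((gamma_hat - q_cur) / h)[:, None]) @ delta_hat
    return (z * inner[:, None]).sum(axis=0) / (z.shape[0] * h)

# Hypothetical check: with z = w = 1 and delta = 1, the block reduces to a
# kernel density estimate of q_lag at gamma minus that of q_cur at gamma.
rng = np.random.default_rng(0)
n = 20000
q_lag = rng.normal(size=n)                 # density about 0.399 at zero
q_cur = rng.normal(loc=5.0, size=n)        # density negligible at zero
ones = np.ones((n, 1))
h = silverman_bandwidth(q_lag)
est = g_gamma_block(ones, ones, ones, q_lag, q_cur, 0.0, np.array([1.0]), h)
print(est)
```

Stacking such blocks over $t = t_0, \dots, T$ gives the full column; the bandwidth can be rescaled to check how sensitive the resulting standard errors are.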
converges to the standard normal distribution. Hence, the confidence intervals can be constructed in the standard manner. Alternatively, the nonparametric bootstrap can be employed to construct the confidence intervals; see Section 5.1 for details.
5 Testing
5.1 Testing for Linearity
The asymptotic results provide ways to make inference for unknown parameters and their functions. However, it is well established that the test for linearity or threshold effects requires a different asymptotic theory due to the presence of unidentified parameters under the null (e.g. Davies, 1977). Specifically, recall the model specification (3) and consider the hypotheses:
$$ H_0 : \delta_0 = 0, \ \text{for any } \gamma \in \Gamma, \tag{14} $$
$$ H_1 : \delta_0 \neq 0, \ \text{for some } \gamma \in \Gamma. $$
We consider the supremum-type statistic $\sup W = \sup_{\gamma \in \Gamma} W_n(\gamma)$, where $W_n(\gamma)$ is the standard Wald statistic for each fixed $\gamma$, that is,
$$ W_n(\gamma) = n\, \hat\delta(\gamma)'\, \hat\Sigma_\delta(\gamma)^{-1}\, \hat\delta(\gamma), $$
where $\hat\delta(\gamma)$ is the FD-GMM estimate of $\delta$, given $\gamma$, and $\hat\Sigma_\delta(\gamma)$ is the consistent asymptotic variance estimator for $\hat\delta(\gamma)$, given by $\hat\Sigma_\delta(\gamma) = R \left( \hat V_s(\gamma)' \hat V_s(\gamma) \right)^{-1} R'$, where $\hat V_s(\gamma)$ is computed as in Section 4 with $\hat\gamma = \gamma$ and $R = \left( 0_{(k_1+1) \times k_1}, I_{k_1+1} \right)$. The supremum statistic is an application of the union-intersection principle commonly used in the literature, e.g. Hansen (1996) and Lee et al. (2011).
We present the limiting distribution of the supW statistic below.
where $Z \sim N\left( 0, \Omega^{-1} \right)$.
Although the limiting distribution of supW is derived as the supremum of the square of a
Gaussian process with a simpler covariance kernel, it is not straightforward to pivotalize the
statistic and tabulate the critical values. Hence, we follow Hansen (1996) and bootstrap or
simulate the asymptotic critical values or p-values as follows:
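Let $\hat\theta$ be the FD-GMM estimator and construct:
$$ \widehat{\Delta\varepsilon}_{it} = \Delta y_{it} - \Delta x_{it}' \hat\beta - \hat\delta' X_{it}'\, 1_{it}(\hat\gamma). $$
1. Let $i^*$ be a random draw from $\{1, \dots, n\}$, and $X_{it}^* = X_{i^*t}$, $q_{it}^* = q_{i^*t}$, $z_{it}^* = z_{i^*t}$ and $\Delta\varepsilon_{it}^* = \widehat{\Delta\varepsilon}_{i^*t}$. Then, generate $\Delta y_{it}^* = \Delta x_{it}^{*\prime} \hat\beta + \Delta\varepsilon_{it}^*$ for $t = t_0, \dots, T$.
2. Repeat step 1 $n$ times, and collect $\{ (\Delta y_{it}^*, X_{it}^*, q_{it}^*, z_{it}^*) : i = 1, \dots, n;\ t = t_0, \dots, T \}$.
3. Construct the supW statistic, say $\sup W^*$, from the bootstrap sample using the same estimation method as for $\hat\theta$.
4. Repeat steps 1-3 $B$ times, and evaluate the bootstrap p-value by the frequency of $\sup W^*$ that exceeds the sample statistic, supW.
Note that when simulating the bootstrap samples, the null model is imposed in step 1.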
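The resampling steps above can be sketched as follows, assuming NumPy. The sketch covers only the mechanics: drawing individuals with replacement, imposing the null ($\delta = 0$) when regenerating $\Delta y^*_{it}$, and computing the p-value. The function names, the dictionary layout of the data, and the `sup_wald` callable (which the user would have to supply) are all hypothetical; the toy statistic in the demonstration is a placeholder and is not a supW statistic.

```python
import numpy as np

def null_bootstrap_sample(data, resid, beta_hat, rng):
    """Steps 1-2: resample n individuals with replacement and impose the null.

    data: dict of arrays with individuals on the first axis; must contain
    'dx' of shape (n, S, k1). resid: (n, S) fitted first-differenced residuals.
    """
    n = resid.shape[0]
    idx = rng.integers(0, n, size=n)
    star = {name: arr[idx] for name, arr in data.items()}
    star["dy"] = star["dx"] @ beta_hat + resid[idx]      # null model: delta = 0
    return star

def bootstrap_pvalue(sup_wald, data, resid, beta_hat, stat_obs, B=199, seed=0):
    """Steps 3-4: recompute the statistic B times and report the exceedance frequency."""
    rng = np.random.default_rng(seed)
    stats = np.array([sup_wald(null_bootstrap_sample(data, resid, beta_hat, rng))
                      for _ in range(B)])
    return float(np.mean(stats > stat_obs))

# Hypothetical inputs; toy_stat is a placeholder, NOT a sup-Wald statistic.
rng = np.random.default_rng(0)
n, S = 50, 4
data = {"dx": rng.normal(size=(n, S, 1)), "q": rng.normal(size=(n, S + 1))}
resid = rng.normal(size=(n, S))
toy_stat = lambda d: float(np.mean(d["dy"] ** 2))
p_low = bootstrap_pvalue(toy_stat, data, resid, np.array([0.5]), stat_obs=-1.0)
p_high = bootstrap_pvalue(toy_stat, data, resid, np.array([0.5]), stat_obs=1e9)
print(p_low, p_high)
```

Resampling whole individuals keeps each unit's time series intact, so the within-individual dependence of the regressors, instruments and residuals is preserved in every bootstrap sample.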
5.2 Testing for Exogeneity
In this section we describe how to test for the exogeneity of the threshold variable. Recently, Kapetanios (2010) developed an exogeneity test of the regressors in threshold models, following the general principle of the Hausman (1978) test. Similarly, we can develop a Hausman type testing procedure for the validity of the null hypothesis that the threshold variable, $q_{it}$, is exogenous. Indeed, this is a straightforward by-product obtained by combining the FD-GMM and FD-2SLS estimators and their asymptotic results.
Specifically, we propose the following t-statistic for the null hypothesis that the FD-GMM estimate $\hat\gamma$ of the unknown threshold is equal to the FD-2SLS estimate, $\hat\gamma_{FD\text{-}2SLS}$ (see Appendix A for details of the FD-2SLS estimator):
$$ t_H = \frac{\sqrt{n} \left( \hat\gamma - \hat\gamma_{FD\text{-}2SLS} \right)}{\left( \hat V_\gamma' \hat V_\gamma - \hat V_\gamma' \hat V_s \left( \hat V_s' \hat V_s \right)^{-1} \hat V_s' \hat V_\gamma \right)^{-1/2}}, $$
where the denominator is derived as in Section 4. Note that this t-statistic is identical to the t-statistic in (13) except that $\gamma_0$ is replaced by $\hat\gamma_{FD\text{-}2SLS}$. However,
$$ \hat\gamma_{FD\text{-}2SLS} = \gamma_0 + o_p\left( n^{-1/2} \left( \hat V_\gamma' \hat V_\gamma - \hat V_\gamma' \hat V_s \left( \hat V_s' \hat V_s \right)^{-1} \hat V_s' \hat V_\gamma \right)^{-1/2} \right) $$
due to its super-consistency. Then, it is easily seen that the asymptotic distribution of the t-statistic is the standard normal under the null hypothesis of strict exogeneity of $q_{it}$.
$$ y_{it} = (0.7 - 0.5\, y_{it-1})\, 1\{y_{it-1} \leq 0\} + (-1.8 + 0.7\, y_{it-1})\, 1\{y_{it-1} > 0\} + \sigma_1 u_{it}, \tag{16} $$
$$ y_{it} = (0.52 + 0.6\, y_{it-1})\, 1\{y_{it-1} \leq 0.8\} + (1.48 - 0.6\, y_{it-1})\, 1\{y_{it-1} > 0.8\} + \sigma_2 u_{it}, \tag{17} $$
for $t = 1, \dots, 10$ and $i = 1, \dots, n$, where $u_{it}$ are iid $N(0, 1)$. The first model, from Tong (1990), allows a jump in the regression function at the threshold point. The second is the continuous model considered by Chan and Tsay (1998). In both models the threshold is located around the center of the distribution of the threshold variable. In terms of the previous notation in (3), the unknown true parameter values are $\beta = -0.5$ and $(\delta_1, \delta_2)' = (-2.5, 1.2)'$ in the first model and $\beta = 0.6$ and $(\delta_1, \delta_2)' = (0.96, -1.2)'$ in the second. All the past levels of $y_{it}$ are used as the instrumental variables.$^{11}$
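The two designs can be simulated with a few lines of code. The sketch below assumes NumPy; the initial condition (start at zero with a burn-in of 50 periods) is an assumption of this illustration, since the text does not state how $y_{i0}$ is generated, and the function name `simulate_setar_panel` is hypothetical. It also recomputes $\delta = \phi_2 - \phi_1$ for both designs, checks the continuity of (17) at the threshold, and counts the instruments when all past levels are used.

```python
import numpy as np

def simulate_setar_panel(n, T, lower, upper, gamma, sigma, burn=50, seed=0):
    """Simulate y_it for n individuals; lower/upper are (intercept, slope) pairs.

    Assumption of this sketch: y starts at 0 and the first `burn` periods are dropped.
    """
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    out = np.empty((n, T))
    for t in range(burn + T):
        u = rng.standard_normal(n)
        mean = np.where(y <= gamma, lower[0] + lower[1] * y, upper[0] + upper[1] * y)
        y = mean + sigma * u
        if t >= burn:
            out[:, t - burn] = y
    return out

jump = dict(lower=(0.7, -0.5), upper=(-1.8, 0.7), gamma=0.0, sigma=1.0)    # design (16)
cont = dict(lower=(0.52, 0.6), upper=(1.48, -0.6), gamma=0.8, sigma=0.5)   # design (17)
y_jump = simulate_setar_panel(200, 10, **jump)
y_cont = simulate_setar_panel(200, 10, **cont)

# delta = phi_2 - phi_1 in each design
delta_jump = np.subtract(jump["upper"], jump["lower"])
delta_cont = np.subtract(cont["upper"], cont["lower"])
# Gap between the two regimes of (17) at the threshold (zero if continuous)
gap = (0.52 + 0.6 * 0.8) - (1.48 - 0.6 * 0.8)
# One instrument at t = 3, two at t = 4, ..., eight at t = 10
n_iv = sum(t - 2 for t in range(3, 11))
print(delta_jump, delta_cont, gap, n_iv)
```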
In addition we consider an averaging of a class of FD-GMM estimators, which is expected to be particularly relevant in finite samples. There are many different ways to compute the weight matrix, $W_n$, in the first step, though there is no way to tell which is optimal. Provided that the first step estimators are consistent, all the second step estimators are asymptotically equivalent, suggesting that the averaging does not change the first order asymptotic distribution.$^{12}$ In this regard, we propose to randomize the weight matrix, $W_n$, in the first step as follows: we compute $W_n$ in (11) with $\hat g_i = \left( \tilde\varepsilon_{it_0} z_{it_0}', \dots, \tilde\varepsilon_{iT} z_{iT}' \right)'$, where the $\tilde\varepsilon_{it}$'s are randomly generated from $N(0, 1)$. In our experiments we do this 100 times and take the average of the second step estimators. Our proposal follows a similar idea in Chamberlain and Imbens (2004) and Sun (2014), who demonstrate that randomizing initial draws is able to improve coverage rates, leading to more accurate inference. Consistent with these expectations, the subsequent simulation results demonstrate that the variance of the averaging estimator is greatly reduced in small samples.
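The randomization and averaging can be sketched as follows, assuming NumPy and, for simplicity, the same number $m$ of instruments in every period. The function names are hypothetical, and `two_step` stands for a user-supplied routine that maps a first-step weight matrix to a second-step FD-GMM estimate; the toy routine in the demonstration is a placeholder, not an estimator.

```python
import numpy as np

def random_first_step_weight(z, rng):
    """Weight matrix of the form (11) with residuals replaced by N(0, 1) draws.

    z: (n, S, m) instruments; returns an (S*m, S*m) matrix.
    """
    n, S, m = z.shape
    eps = rng.standard_normal((n, S))
    g = (z * eps[:, :, None]).reshape(n, S * m)
    gc = g - g.mean(axis=0)
    return np.linalg.inv(gc.T @ gc / n)      # equals (sum gg'/n - sum g sum g'/n^2)^{-1}

def averaged_estimator(z, two_step, draws=100, seed=0):
    """Average the second-step estimates over randomized first-step weights."""
    rng = np.random.default_rng(seed)
    estimates = [two_step(random_first_step_weight(z, rng)) for _ in range(draws)]
    return np.mean(estimates, axis=0)

# Hypothetical demonstration with a placeholder in place of a real estimator.
rng = np.random.default_rng(0)
z_demo = rng.normal(size=(200, 3, 2))
toy_two_step = lambda W: np.array([np.trace(W), 1.0])
avg = averaged_estimator(z_demo, toy_two_step, draws=100)
W_one = random_first_step_weight(z_demo, np.random.default_rng(1))
print(avg)
```

Each draw costs one full two-step estimation, so with a grid search over $\gamma$ the total cost is roughly 100 times that of a single FD-GMM fit.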
We examine the bias, standard error and mean square error (MSE) of the FD-GMM estimator with 1,000 iterations. For $n$ = 50, 100 and 200, we set $\sigma_1 = 1$ and $\sigma_2 = 0.5$. The simulation results are reported in Tables 1 - 3. First, looking at the MSEs in Table 1, those of the FD-GMM for each parameter generally decrease as the sample size rises, but some parameters, particularly $\delta_1$ and $\delta_2$, are estimated with much larger MSEs. The continuous design yields higher MSEs for the regression coefficients, because it has a smaller change than the discontinuous design. When we compare the MSEs of the FD-GMM with those of the averaging estimator, we find that the averaging significantly reduces MSEs. In some cases the gains are so large that the MSEs of the FD-GMM estimator are twice those of the averaging estimator. As a rule of thumb, the reduction in MSEs by averaging becomes larger when the original MSEs are relatively large, though this gain becomes smaller as the sample size increases. Turning to the biases and standard errors reported in Tables 2 and 3, we observe that the averaging always reduces standard errors, but it has a mixed effect on biases. In particular, when the bias of the FD-GMM is large (those of $\delta_1$ and $\delta_2$), the averaging reduces it, and vice versa. As a result, the average bias of the FD-GMM is almost the same as that of the averaging whilst the standard deviation of the former is always larger than that of the latter. This implies that the averaging has positive MSE reduction effects on the FD-GMM estimator.

$^{11}$ We have one IV for $t = 3$, two IVs for $t = 4$, and thus a total of 36 IVs for $T = 10$.
$^{12}$ Alternatively, we may consider the continuous updating GMM estimator (CUE) proposed by Hansen et al. (1996), which is supposed to be invariant to the initial weighting matrix. However, its evaluation goes beyond the scope of the current paper, mainly due to the computational complexity and time. Furthermore, Hausman et al. (2011) show that the CUE does not always perform well due to its no-moment problem, which leads to wide dispersion of the estimates.
We have also performed the same experiment by fixing the intercepts across the regimes:
$$ y_{it} = 0.7 - 0.5\, y_{it-1}\, 1\{y_{it-1} \leq 1.5\} + 0.7\, y_{it-1}\, 1\{y_{it-1} > 1.5\} + \sigma_1 u_{it}, $$
$$ y_{it} = 0.52 + 0.6\, y_{it-1}\, 1\{y_{it-1} \leq 0.4\} - 0.6\, y_{it-1}\, 1\{y_{it-1} > 0.4\} + \sigma_2 u_{it}, $$
where the threshold values are reset to stay in the middle of the distribution. From Tables 4 - 6, we find that the averaging reduces MSEs and standard errors more substantially. Furthermore, biases are greatly reduced by the averaging in more than 70% of the cases.
This section explores the coverage probability of the confidence intervals obtained by inverting the t-statistic. We focus on the first two data generating processes, (16) and (17). Table 7 reports the empirical coverage probabilities of the 95% confidence intervals for the FD-GMM estimator and its averaging. In the averaging, both the estimator and the asymptotic variance estimator are averaged. We select the bandwidth for the asymptotic variance by Silverman's rule of thumb multiplied by $h$, and report the results for $h = (0.5, 1, 1.5)$. Not surprisingly, as $h$ rises, the coverage frequency inflates. The bandwidth with $h = 0.5$ yields too low coverage for the continuous design and that with $h = 1.5$ yields excessive coverage, especially for the threshold parameter, $\gamma$. Thus, we follow Silverman's rule.
The results for $h = 1$ appear to be more promising than the existing studies that document rather poor empirical coverage probabilities for $\gamma$ and $\delta_2$, e.g. Hansen (2000) and Caner and Hansen (2004). Importantly, the averaging results in much improved coverage, especially when $n$ is small, in which case the FD-GMM tends to exhibit very poor coverage. Thus, the subsequent discussion is focussed on the averaging results. For $\gamma$, the coverage improves steadily to the nominal 95% level as the sample size rises for both jump and continuous designs, from 99% at $n = 50$ to 98% or 95% at $n = 200$. For $\delta_2$, we observe somewhat lower coverages in the continuous design, which improve as the sample size increases and look reasonable for $n = 200$. Finally, the results for $\beta$ and $\delta_1$ are better than those for $\gamma$ and $\delta_2$.$^{13}$
where $I_{it}$ is investment, $CF_{it}$ cash flows, $Q_{it}$ Tobin’s Q, and $\varepsilon_{it}$ consists of the one-way error
components, $\varepsilon_{it} = \alpha_i + v_{it}$.14 The coefficient on cash flow represents the cash flow sensitivity of
investment. If firms are not financially constrained, external finance can be raised to fund future
investments without the use of internal finance. In this case, cash flows are least relevant to
investment spending and this coefficient is expected to be close to zero. In contrast, if firms were to face
certain financial constraints, it would be expected to be significantly positive. Extensions of
this Tobin’s Q model involve additional financing variables such as leverage to control for the
effect of capital structure on investment (Lang et al., 1996) as well as lagged investment to
capture the accelerator effect of investment in which past investments have a positive effect
13
More excessive coverage probabilities for are reported in Hansen (2000) and Caner and Hansen (2004),
showing more than 98% coverage even for 90% nominal level. They also reported the lower coverages for 2 .
14
We have also estimated the model with the two-way error components by including the time dummies. The
results, available upon request, are qualitatively similar.
on future investments (Aivazian et al., 2005). Therefore, we consider the following augmented
dynamic investment model:
where $1\{\cdot\}$ is an indicator function, $q_{it}$ the transition variable and $\gamma$ the threshold parameter.
We estimate (20) by the proposed FD-GMM, which allows for both (contemporaneous)
regressors and the transition variable to be endogenous. In contrast, existing studies
(e.g. Hansen, 1999; González et al., 2005) employ the lagged values of CF, Q and L to avoid
the potential problem of endogenous regressors and transition variable, which is a common
practice in empirical corporate finance, e.g. Dang et al. (2012).
Table 8 summarizes the estimation results for the dynamic threshold model of investment,
(20), with cash flow, leverage and Tobin’s Q used in turn as the transition variable, each of which
is expected to proxy the degree of financial constraints. This choice of the transition variable is
broader than that of Hansen (1999), who considers only leverage, and González et al. (2005), who
employ leverage and Tobin’s Q. The FD-GMM estimation results are reported separately for
the low and the high regimes.
When cash flow is used as the transition variable, the results for (20) show that the threshold
estimate is 0.36, such that about 80% of observations fall into the lower cash-flow regime. The
coefficient on lagged investment is significantly higher for firms with low cash flows, suggesting
that the accelerator effect of investment is stronger for cash-constrained firms. The coefficient
on Tobin’s Q reveals an expected finding that firms respond to growth opportunities more
quickly when they are cash-unconstrained than when they are constrained. Next, we find the
more negative impacts of the leverage when firms are cash-constrained. This is consistent with
our expectation that leverage should have a stronger negative impact on investment for
the constrained firms, which is in line with the overinvestment hypothesis about the role of
leverage as a disciplining device that prevents firms from over-investing in negative net present
value projects (e.g. Jensen, 1986). Finally and importantly, the sensitivity of investment to
cash flow is significantly higher for cash-constrained firms than for cash-rich firms. Firms
with limited cash resources are likely to face some forms of financial constraints (Kaplan and
Zingales, 1997). Hence, this finding supports the role of financial constraints in
the investment-cash flow sensitivity.
When leverage is used as the transition variable, the threshold parameter is estimated
at 0.10, lower than the mean leverage (0.24), with more than 73% of observations falling into
the high-leverage regime. We find that past investment has a much higher positive impact on
current investment for highly-levered firms, suggesting that firms with high leverage attempt
to respond to growth options quickly, hence a higher accelerator effect. The effect of Tobin’s
Q on investment is higher for lowly-levered firms, which supports the argument
that by lowering the risky "debt overhang" to control underinvestment incentives ex ante, firms
are able to take more growth opportunities and make more investments ex post, though these
impacts are rather small. We also find more negative impacts of leverage when firms are
highly levered. The coefficient on cash flow is significantly higher for firms in the high-leverage
regime, a finding consistent with the prediction that cash flow should be more relevant and
have a stronger effect on the level of investment for financially constrained firms.15
When using Tobin’s Q as the transition variable, the threshold is estimated at 0.56 with
59% of observations falling into the higher growth regime. We find that past investment has
a slightly stronger positive effect on current investment for firms with low Tobin’s Q, but the
differential impacts are statistically insignificant. The coefficient on Tobin’s Q in the low regime
is significantly higher, indicating that firms with low growth options respond more strongly to
changes in their investment opportunities. Surprisingly, we find a negative relationship between
leverage and investment only in the lower growth regime. The sensitivity of investment to cash
flow is also relatively higher for high-growth firms than for low-growth firms. This, therefore,
supports the hypothesis that cash flow should be more relevant for firms with potentially high
financial constraints.16
15
Notice, however, that the non-dynamic threshold model of investment developed by Hansen (1999) fails to
find conclusive evidence in favor of this prediction.
16
When comparing our results with those reported in González et al. (2005), who apply the static panel
smooth transition regression model, we find that their results are qualitatively similar to ours regarding the
impacts on investment of both Tobin’s Q and leverage. However, they document the opposite evidence that the
In order to check the validity of the final specifications employed above, we also report the
test results for the null of no threshold effects and the validity of the overidentifying moment
conditions in Table 8. First, we find that the bootstrap p-values of the supW test are all close to
zero, providing strong evidence in favour of threshold effects. Next, the J-test results indicate
that the null of valid instruments is not rejected for the cases with leverage and Tobin’s
Q used as the transition variable, though it is rejected at the 1% significance level for the case
with cash flow used as the transition variable. Given that the number of instruments rises
quadratically with T, this evidence is relatively satisfactory.17
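For readers less familiar with bootstrap testing, the last step behind a reported bootstrap p-value can be sketched as below. This shows only the final counting step under the usual convention (share of bootstrap statistics at least as large as the observed one); it does not reproduce the resampling scheme of Section 5.1, and the function name is hypothetical.

```python
def bootstrap_p_value(observed, bootstrap_stats):
    """Share of bootstrap test statistics that are at least as large as the observed statistic."""
    if not bootstrap_stats:
        raise ValueError("need at least one bootstrap replication")
    exceed = sum(1 for s in bootstrap_stats if s >= observed)
    return exceed / len(bootstrap_stats)
```

A p-value close to zero, as reported for the supW test, means that almost no bootstrap replication generated under the null produces a statistic as large as the one observed in the data.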
In sum, when estimating a dynamic threshold panel data version of the Tobin’s Q model
of investment using Tobin’s Q, leverage and cash flow in turn as the transition variable,
we find that the results on the relationships between investment and past investment,
as well as cash flow, Tobin’s Q and leverage, are generally consistent with theoretical predictions.
More importantly, the cash flow sensitivity of investment is significantly stronger for
cash-constrained, high-growth and highly-levered firms, a finding consistent with the original
hypothesis of Fazzari et al. (1988) that the sensitivity of investment to cash flows is an indicator
of the degree of financial constraints facing the firms. Methodologically, our results clearly
demonstrate the usefulness of the proposed dynamic panel data estimation with threshold effects,
despite the caveat that the transition variables used in the current study
are imperfect measures of financial constraints.18
8 Conclusion
The investigation of nonlinear asymmetric dynamic modelling has recently assumed a promi-
nent role. Increasing availability of the large and complex panel data sets has also prompted
more rigorous econometric analyses of dynamic heterogeneous panels, especially when the time
coefficient on the (lagged) cash flow is positive but considerably smaller for the higher regime.
17
To avoid the potential issue related to weak instruments or overfitting, we set the maximum lag order of y
and x to be used as instruments to 4 (e.g. Roodman, 2009).
18
Kaplan and Zingales (1997) find that the relationship between cash flows and investment is not monotonic
with financial constraints. Consequently, a large body of the literature seeks to address the question of what
measures can be used to classify firms as ‘financially constrained’ and ‘unconstrained’. Several criteria have been
suggested, including size, age, leverage, financial slack, dividend payout and bond rating (e.g. Hovakimian and
Titman, 2006). An alternative approach would be to use indices computed to control for financial constraints,
e.g. Whited and Wu (2006).
period is short. In this paper we have explicitly addressed this challenging issue by developing
the dynamic threshold panel data model, which allows both regressors and threshold effect to
be endogenous. We have proposed the FD-GMM estimation on the basis of the FD transformation
for removing unobserved individual effects, and derived its asymptotic properties by
employing the diminishing threshold effect asymptotics and empirical process theory. In
the special case where the threshold variable is strictly exogenous, we have also proposed a more
efficient FD-2SLS estimation.
We note several avenues for further research. First, it is uncertain whether the FD-GMM is
most efficient in the presence of an endogenous threshold variable, especially with respect
to alternative initial conditions and potentially many weak instruments. Similarly, an
extension to the large n, large T case would make an interesting future research topic. Next,
given that estimation can be significantly affected by the presence of cross-sectionally correlated
errors (e.g., Pesaran, 2006; Bai, 2009), it would be desirable to explicitly control for
cross-section dependence in dynamic threshold panels. Furthermore, research to develop
similar estimation algorithms for models with multivariate covariates, with multiple threshold
variables and regimes, and with alternative nonlinear mechanisms is under way.
A Appendix: the FD-2SLS Estimator
This Appendix considers the special case where the threshold variable, $q_{it}$ in (3), is exogenous
and the conditional moment restriction (6) holds. That is, $z_{it}$ includes $q_{it}$ and $q_{i,t-1}$. In this
case, the threshold estimate, $\hat\gamma$, can achieve the efficient rate of convergence, as obtained in
the classical regression model (e.g. Hansen, 2000), and the slope estimate can achieve the
semi-parametric efficiency bound (Chamberlain, 1987) under conditional homoskedasticity as
if the true threshold value, $\gamma_0$, were known. This strong result can be obtained since the two sets
of estimators are shown to be asymptotically independent.
A.1 Estimation
We consider two cases for the reduced form regression, that is, the regression of endogenous regressors
on the instrumental variables. The first type is a general non-linear regression where unknown
parameters can be estimated at the standard $\sqrt{n}$ rate, and the second type is the threshold
regression with a common threshold.
The second case was also considered by Caner and Hansen (2004), albeit in the cross-sectional
regression. Their approach consists of three steps: the first two steps yield an estimate
of the threshold value and the third step performs the standard GMM for the linear
regression within each subsample divided by the estimated threshold. However, this split-sample
GMM approach does not work with panel data with a time varying threshold
variable, $q_{it}$, because it generates multiple regimes with cross-regime restrictions. Importantly,
we demonstrate below that the first step estimation error affects the asymptotic distribution
of the threshold estimate in the second step. In this context, we will develop a new consistent
estimation algorithm for the threshold estimate.
We consider general non-linear regressions for the reduced form and provide the asymptotic
variance formula that corrects the estimation error stemming from the reduced form regression.
This is practically relevant since the linear projection in the reduced form invalidates the
consistency of the estimator when the structural form is the threshold regression, e.g. Yu (2013).
Under the conditional moment condition in (6) and the exogeneity of q, the first-differenced
model in (3) implies the following regression of $\Delta y_{it}$ on $z_{it}$:
0
E ( yit jzit ) = E ( xit jzit ) + 0 E Xit0 jzit 1it ( ) : (21)
Assume for each t that the reduced form regressions are given by
! !
1; x0it 0 (z ; b )
1; F1t it 1t
E jzit = = Ft (zit ; bt ); (22)
1; x0it 1 0 (z ; b )
1; F2t it 2t 2 (1+k1 )
where bt = (b01t ; b02t )0 is an unknown parameter vector and Ft is a known function. Also let
A few remarks are in order: (i) since not all the elements of $x_{it}$ or $x_{it-1}$ are endogenous, some
elements of $F_t$ would be fully known; (ii) we need to run two regressions for $x_{it}$, $E(x_{it} \mid z_{it})$ and
$E(x_{it} \mid z_{it+1})$, as the instruments $z_{it}$ are different for each t. This is due to the FD transformation
and the fact that $z_{it}$ varies over time; and (iii) it is not sufficient to consider the regression
$E(\Delta x_{it} \mid z_{it})$ only, due to the last term in the structural form (21).
The representation in (21) and (22) motivates the following two-step estimation procedure:
1. For each t, estimate the reduced form (22) by least squares, and obtain the parameter
estimates, $\hat b_t$, $t = t_0, \ldots, T$, and the fitted values, $\hat F_{it} = F_t(z_{it}; \hat b_t)$ and
$\hat H_{it} = H_t(z_{it}; \hat b_t)$.
2. Estimate $\theta$ by
\[ \min_{\theta \in \Theta} \hat M_n(\theta) = \frac{1}{n} \sum_{t=t_0}^{T} \sum_{i=1}^{n} e_{it}(\theta; \hat b_t)^2, \tag{23} \]
where
\[ e_{it}(\theta; b_t) = \Delta y_{it} - \beta' H_t(z_{it}; b_t) - \delta' F_t(z_{it}; b_t)' 1_{it}(\gamma). \]
This step can be done by grid search, as the model is linear in $\beta$ and $\delta$ for a fixed $\gamma$.
Thus, $\hat\beta(\gamma)$ and $\hat\delta(\gamma)$ can be obtained from the pooled OLS of $\Delta y_{it}$ on $\hat H_{it}$ and $\hat F_{it}' 1_{it}(\gamma)$,
which are constructed in step 1. Finally, $\hat\gamma$ is defined as the smallest of the minimizers
of the profiled sum of squared errors, $\hat M_n(\hat\beta(\gamma), \hat\delta(\gamma), \gamma)$.
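The grid search in step 2 can be illustrated with a deliberately simplified sketch. It is not the FD-2SLS estimator itself: it assumes a single fitted regressor `h`, a single regime regressor `f` switched on by one indicator 1{q > gamma}, and no first-differencing, so all names and the simulated design are hypothetical. The point is only the profiling logic: for each candidate threshold run OLS, record the sum of squared residuals, and keep the smallest minimizer.

```python
import random

def ols_two(y, x1, x2):
    """OLS of y on (x1, x2) without intercept; returns (b1, b2, ssr) or None if singular."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        return None
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    ssr = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(x1, x2, y))
    return b1, b2, ssr

def profile_threshold(y, h, f, q, grid):
    """Profile out the slopes for each gamma in an ascending grid; return (gamma, beta, delta, ssr)."""
    best = None
    for gamma in grid:
        regime = [fi if qi > gamma else 0.0 for fi, qi in zip(f, q)]
        fit = ols_two(y, h, regime)
        if fit is None:
            continue
        beta, delta, ssr = fit
        if best is None or ssr < best[3]:  # strict "<" keeps the smallest minimizer
            best = (gamma, beta, delta, ssr)
    return best

def simulate(n, seed=0, beta=0.5, delta=2.0, gamma=0.3, noise=0.1):
    """Hypothetical toy design: y = beta*h + delta*1{q > gamma} + noise."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(n)]
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    f = [1.0] * n
    y = [beta * hi + delta * fi * (qi > gamma) + noise * rng.gauss(0.0, 1.0)
         for hi, fi, qi in zip(h, f, q)]
    return y, h, f, q

grid = [round(-0.8 + 0.05 * k, 2) for k in range(33)]  # candidate thresholds from -0.8 to 0.8
```

Calling `profile_threshold(*simulate(500, seed=1), grid)` returns the threshold on the grid with the lowest profiled sum of squared residuals together with the slope estimates at that threshold.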
This produces a rate-optimal estimator for $\gamma$, implying that $\beta$ and $\delta$ can be estimated as
if $\gamma_0$ were known. In the special case with $T = t_0$, we end up estimating a linear regression
model with a conditional moment restriction. This two-step estimation yields the optimal
estimate for $\beta$ and $\delta$ if the model is conditionally homoskedastic, i.e., $E(\varepsilon_{it}^2 \mid z_{it}) = \sigma^2$, see
Chamberlain (1987). While one needs to estimate the conditional heteroskedasticity to fully
exploit the implications of the conditional moment restriction, (6), in practice it is reasonable
to employ the two-step estimator and robustify the standard errors for heteroskedasticity. We
will provide heteroskedasticity-robust standard errors for $\hat\beta$ and $\hat\delta$. Note that these standard
errors are also corrected for the estimation error stemming from the first step estimation of $b$.
A.1.2 Threshold Regression in Reduced Form
E( it jzit ) = 0; (24)
0
where zit = 1; x0it 1 , and 1t and 2t are unknown parameters. This results in the following
structural threshold regression:
0 0 0 0
yit = 1t zit 1 fqit g+ 2t zit 1 fqit > g 3t zit 1 fqit 1 g 4t zit 1 fqit 1 > g + eit ;
(25)
E (eit jzit ) = 0;
0 0 0 0 0 0 0 0
where 1t = 0; 1t , 2t = 1; 22 2t , 3t zit = xit 1, 4t zit = 1 + 22 xit 1 and
eit = "it + 0 ( + 1 fqit > g ).19 Since the estimates of and are asymptotically
it 2
independent of each other, we do not need to impose any restrictions on to estimate .
Thus, we estimate the model as follows:
1. Estimate by the pooled least square of (25), which can be done by the grid search,20
and denote the estimate by e:
Remark 4 Our approach is crucially different from that of Caner and Hansen (2004), who
estimate the threshold parameter separately in the reduced and the structural form. Such an
approach introduces dependency between separate threshold estimates, which violates the validity
of their asymptotic results.21 Intuitively, the estimation error in the first step will affect
the second step estimation of $\gamma$ since the true threshold is restricted to be the same in both reduced
and structural forms. On the other hand, our FD-2SLS estimator is designed to remove
asymptotic correlation between the threshold estimator and the first step estimator.
19
See footnote 5 for the de…nition of parameters.
20
That is, …x and obtain eeit ( ) and ejt ( ), j = 1; :::; 4 by the OLS for each t. Then, e is the minimizer of
P
the pro…led sum of squared errors, i;t ee2it ( ) and ejt = ejt (e) ; j = 1; :::; 4:
21
Lemma 1 in Caner and Hansen (2004) requires more restrictions. Specifically, their (A.7) is true only when
the threshold estimate is n-consistent, which cannot be obtained under the maintained diminishing threshold
parameter setup. Accordingly, the high-level assumption (17) in their Assumption 2 is no longer satisfied.
Remark 5 We consider the common threshold case mainly because we highlight an important
misspecification issue in Caner and Hansen (2004), namely that the first step estimation of the threshold
affects the second step estimation, which was not recognized properly in the literature. But it
would be more general to allow different thresholds in the structural and reduced-form equations.
In principle, we may consider multiple scenarios: the structural regression follows the
threshold regression and the reduced form regression is symmetric, or both structural and
reduced regressions follow the threshold regression, either with the same threshold parameter or with
different threshold parameters. Such an extension would allow us to develop the framework of
multiple thresholds with multiple threshold variables. Recently, in the single regression context,
Chen et al. (2012) develop a threshold autoregressive model which contains two threshold
variables. However, due to the more complicated specification issues associated with the dynamic
heterogeneous panel structure, we leave this important issue for future studies; see also Chong
and Yan (2015) for a number of related technical issues.
This section presents the asymptotic theory for the FD-2SLS only under the diminishing threshold
framework (Hansen, 2000). It is worthwhile to note that the transformed model, (3), consists
of 4 regimes, which are generated by two threshold variables, $q_{it}$ and $q_{it-1}$, while the threshold
parameter is restricted to be the same. This change in the model characteristic from the
original 2-regime threshold model complicates the estimation and statistical inference.
Since some elements of $x_{it}$ may belong to $z_{it}$, in which case the reduced form is the identity, and
some elements of $E(x_{it} \mid z_{it})$ may be identical to $E(x_{it} \mid z_{it+1})$ for some t, we collect all distinct
reduced form regression functions, $F_t$, $t = t_0, \ldots, T$, that are not identities, and denote them as
$F(z_i; b)$, where $z_i$ and $b$ are the collections of all distinct elements of $z_{it}$ and $b_t$, $t = t_0, \ldots, T$.
We denote the collection of the corresponding elements of the $x_{it}$’s by $x_i$, and write the reduced
form as the multivariate cross section regression as follows:
Let $\hat b$ denote the least squares estimate, and define $F_i(b) = F(z_i; b)$, $F_i = F(z_i; b_0)$ and
$\hat F_i = F(z_i; \hat b)$, where $b_0$ indicates the true value of $b$.
We first consider the case in which the reduced form is the regular nonlinear regression and
the reduced form parameter estimate, $\hat b$, is asymptotically normal.
Assumption 4 The estimator bb is consistent. F is twice continuously di¤erentiable in b in a
neighborhood of b0 almost surely and its …rst derivative matrix at b0 , a kb 2k1 (T t0 + 1)
4 4
matrix-valued function, is denoted as Fi = F (zi ). E jFi j and E j i j are …nite, where jAj
denotes the Euclidean norm if A is a vector and the vector-induced norm if A is a matrix.
where i is given in (26) : Here we illustrate how the estimation error in the …rst step a¤ects the
asymptotic distribution of the estimator of , and in the second step. Recall the functions
introduced in Section A.1.1 and let
" #
Hit (bt )
it ( ; bt ) = for each t; i ( ; b) =( it0 ( ; bt0 ) ; :::; iT ( ; bT )) .
(2k1 +1) 1 Fit (bt )0 1it ( ) (2k1 +1) (T t0 +1)
(27)
0 T
Let ei be the vector stacking "it + 0( xit E ( xit jzit )) t=t0
: Then, de…ne
M1 ( ) =E i( ) i( )0 ; and V1 ( ) = A( ) ( ; ) A ( )0 ;
(2k1 +1) (2k1 +1) (2k1 +1) (2k1 +1)
where
" ! #
i ( 1 ) ei ;
( 1; 2) =E e0i 0i ( 2 ) ; 0i F0i ;
((2k1 +1)+kb ) ((2k1 +1)+kb ) Fi i
" T
# !
@ X 1
A( ) = I(2k1 +1) ; E Hit0 0 it ( ) EFi F0i :
(2k1 +1) ((2k1 +1)+kb ) @b0 t=t
0
Assumption 5 The true value of $\beta$ is fixed at $\beta_0$ while that of $\delta$ depends on n such that
$\delta_n = \delta_0 n^{-\alpha}$ for some $0 < \alpha < 1/2$ and $\delta_0 \neq 0$.
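To give a feel for the diminishing threshold framework of Hansen (2000) invoked in this assumption, the sketch below evaluates a threshold effect that shrinks as delta_0 * n^(-alpha) and the normalisation n^(1-2*alpha) that appears in the limit result (29). The function names and the numerical values are purely illustrative.

```python
def shrinking_threshold_effect(delta0, n, alpha):
    """Threshold effect delta_n = delta0 * n^(-alpha), with 0 < alpha < 1/2."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie strictly between 0 and 1/2")
    return delta0 * n ** (-alpha)

def threshold_rate(n, alpha):
    """Normalising rate n^(1 - 2*alpha) for the threshold estimate."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie strictly between 0 and 1/2")
    return n ** (1.0 - 2.0 * alpha)
```

For example, with alpha = 0.25 and n = 10,000 the threshold effect is one tenth of delta_0 and the rate is 100, which lies between the root-n rate of the slopes and the rate n obtained with a fixed threshold effect.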
Assumption 6 (i) The threshold variable, qit has a continuous and bounded density, pt , such
that pt ( 0) > 0 for all t = 1; :::; T ; (ii) Et (wit j ) is continuous at 0 for all t, and non-zero
2 2
for some t, where wit is either eit 0
1; F1;it + eit+1 0
1; F2;it+1 , 0
1; F1;it , or
0 0 0
2
0
1; F2;it 0 ; (iii) E vec ( i( ; b)) vec ( i( ; b))0 is continuously di¤erentiable in b for all
in a neighborhood of 0.
Assumption 7 For some > 0 and > 0, E supt T;jb b0 j< jeit Ft (zit ; bt )j2+ < 1. For all
> 0; E supt T;jb b0 j< jeit (Ft (zit ; bt ) Ft (zit ))j2+ =O 2+ .
The asymptotic confidence intervals can be constructed by inverting a test statistic. In particular,
Hansen (2000) advocates the LR inversion for the construction of confidence intervals
for the threshold value, $\gamma_0$, for which we define the LR statistic as
\[ LR_n(\gamma) = n\, \frac{\hat M_n(\gamma) - \hat M_n(\hat\gamma)}{\hat M_n(\hat\gamma)}. \]
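The LR inversion can be sketched as follows, reading the statistic as n times the relative increase of the profiled criterion over its minimum. The critical value is an assumption: the code uses the closed form -2 log(1 - sqrt(level)) that Hansen (2000) reports for the pivotal homoskedastic case (about 7.35 at the 95% level) and ignores the scale factor in the limit distribution, which would have to be estimated and applied in practice. Function names are hypothetical.

```python
import math

def hansen_critical_value(level=0.95):
    """Critical value -2*log(1 - sqrt(level)) from Hansen (2000), homoskedastic case (assumption)."""
    return -2.0 * math.log(1.0 - math.sqrt(level))

def lr_confidence_set(grid, criterion, n, level=0.95):
    """Thresholds whose LR statistic n*(M(gamma) - min M)/min M does not exceed the critical value."""
    m_min = min(criterion)
    crit = hansen_critical_value(level)
    return [g for g, m in zip(grid, criterion) if n * (m - m_min) / m_min <= crit]
```

For instance, `lr_confidence_set([0.1, 0.2, 0.3, 0.4], [1.2, 1.01, 1.0, 1.5], n=100)` keeps the thresholds 0.2 and 0.3, whose criterion values are within the tolerated distance of the minimum.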
We present the main asymptotic results for the 2SLS estimator and the LR statistic below.
and
M22 d jrj
n1 2
(b 0) ! argmin W (r) ; (29)
V2 r2R 2
where W (r) denotes the standard two-sided Brownian motion independent of the normal vari-
ate in (28). Furthermore, for 2 = E e2it ,
e
M2 2
e d
LR ( 0) ! inf (jrj 2W (r)) :
V2 r2R
Theorem 3 yields the asymptotic independence between $\hat\gamma$ and the other estimates. The
first step estimation error does not affect the asymptotic distribution of $\hat\gamma$, though it affects the
asymptotic variance of $\hat\beta$ and $\hat\delta$ through $V_1$. However, estimation of the asymptotic variances
of $\hat\beta$ and $\hat\delta$ is standard, i.e. the same as in the linear regression, due to the aforementioned
asymptotic independence.
Recall that $W(r) = W_1(-r)\, 1\{r \le 0\} + W_2(r)\, 1\{r > 0\}$, where $W_1$ and $W_2$ are two independent
Wiener processes. The asymptotic distribution for $\hat\gamma$ in (29) is symmetric around zero
with distribution function
\[ 1 + \sqrt{\frac{x}{2\pi}} \exp\left(-\frac{x}{8}\right) + \frac{3}{2} \exp(x)\, \Phi\left(-\frac{3\sqrt{x}}{2}\right) - \frac{x+5}{2}\, \Phi\left(-\frac{\sqrt{x}}{2}\right) \quad \text{for } x \ge 0, \]
where $\Phi$ is the standard normal distribution function, see Bhattacharya and Brockwell (1976).
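The distribution function attributed to Bhattacharya and Brockwell (1976) can be evaluated numerically as in the sketch below. It assumes the formula reads 1 + sqrt(x/(2*pi)) exp(-x/8) + (3/2) exp(x) Phi(-3 sqrt(x)/2) - ((x+5)/2) Phi(-sqrt(x)/2) for x >= 0 and extends it to negative arguments by symmetry; the function names are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def threshold_limit_cdf(x):
    """Distribution function of the limit variate; symmetric around zero.

    Note: math.exp(x) overflows for x above roughly 700, far beyond any useful quantile.
    """
    if x < 0:
        return 1.0 - threshold_limit_cdf(-x)
    root = math.sqrt(x)
    return (1.0
            + math.sqrt(x / (2.0 * math.pi)) * math.exp(-x / 8.0)
            + 1.5 * math.exp(x) * std_normal_cdf(-1.5 * root)
            - 0.5 * (x + 5.0) * std_normal_cdf(-0.5 * root))
```

At x = 0 the expression equals one half, as symmetry requires, and quantiles for confidence intervals can be obtained by a simple bisection on this function.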
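The unknown normalizing factor, $n^{-2\alpha} V_2^{-1} M_2^2$, can be consistently estimated by $\hat V_2^{-1} \hat M_2^2$, where
\[ \hat M_2 = \sum_{t=t_0}^{T} \frac{1}{nh} \sum_{i=1}^{n} \left[ \left( \hat\delta' (1, \hat F_{1,it}')' \right)^2 K\left( \frac{q_{it} - \hat\gamma}{h} \right) + \left( \hat\delta' (1, \hat F_{2,it}')' \right)^2 K\left( \frac{q_{it-1} - \hat\gamma}{h} \right) \right], \]
\[ \hat V_2 = \sum_{t=t_0}^{T} \frac{1}{nh} \sum_{i=1}^{n} \left[ \left( \hat e_{it}\, \hat\delta' (1, \hat F_{1,it}')' \right)^2 K\left( \frac{q_{it} - \hat\gamma}{h} \right) + \left( \hat e_{it}\, \hat\delta' (1, \hat F_{2,it}')' \right)^2 K\left( \frac{q_{it-1} - \hat\gamma}{h} \right) \right] \]
\[ \qquad + 2 \sum_{t=t_0}^{T-1} \frac{1}{nh} \sum_{i=1}^{n} \hat e_{it}\, \hat e_{it+1}\, \hat\delta' (1, \hat F_{1,it}')' \; \hat\delta' (1, \hat F_{2,it+1}')' \; K\left( \frac{q_{it} - \hat\gamma}{h} \right). \]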
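The estimators of the normalizing factor are built from kernel-weighted sums of the form (1/(nh)) times the sum over i of w_i K((q_i - gamma_hat)/h), where w_i stands for a squared fitted term. A minimal sketch of that single building block is given below. The Gaussian kernel is an assumption (the kernel K is not specified in this part of the text), and the function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as an assumed choice for the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_weighted_sum(weights, q, gamma_hat, h):
    """(1/(n*h)) * sum_i weights[i] * K((q[i] - gamma_hat) / h)."""
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(q)
    total = sum(w * gaussian_kernel((qi - gamma_hat) / h) for w, qi in zip(weights, q))
    return total / (n * h)
```

Observations whose threshold variable lies close to the estimated threshold receive the largest weight, so the sum approximates a conditional moment at the threshold multiplied by the density of the threshold variable there.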
Now, consider the case where the reduced form is the threshold regression, (24), which can
be estimated via the three-step procedure described in Section A.1.2. It turns out that the
asymptotic distribution of the estimator can be presented by a slight modification of Theorem 3. Thus,
we state its asymptotic distribution as a corollary. Interestingly, the way in which the covariance
kernels are characterized in this case is illuminating. If we estimated the common threshold
separately by the two-step approach as in Theorem 3, then the estimation error in the first
step would affect the asymptotic distribution of the threshold estimate in the second step.
0 0 0
Corollary 4 Let j = jt0 ; :::; jT ; j = 1; :::; 4, and assume that 1 2 = n 1 for
some non-zero vector 1. Let Assumptions, 5, 6 and 8 hold with F1;it = 1t zit 1 fqit g+
4
2t zit 1 fqit > g, F2;it = xit 1 ; E jzit j < 1 and Ee4it < 1. Then, the asymptotic distribution
of b estimated from (24) is the same as in Theorem 3.
Notice that it would be desirable to relax certain conditions in Corollary 4 such as the
common threshold across the reduced form and the structural form (see also Remark 5) or the
same to control the magnitude of the threshold e¤ect.
We present the asymptotic distribution of the supW statistic de…ned in (15), which tests the
validity of the null hypothesis of no threshold e¤ect (see (14)). If were estimated by the FD-
2SLS, as is well-known in the literature, the limit is the supremum of the square of a Gaussian
process with unknown covariance kernel, yielding non-pivotal asymptotic distribution.
Theorem 5 Suppose that Assumptions, 6(i) ; 7, 8, and 4 hold. Then, under the null (14) ;
h i 1
d
supW ! sup B ( )0 M1 ( ) 1 R0 RM1 ( ) 1 V1 ( ) M1 ( ) 1 R0 RM1 ( ) 1 B ( ) ;
2
0
where B ( ) is a mean-zero Gaussian process with the covariance kernel, A ( 1) ( 1; 2) A ( 2) .
The p-values can be simulated following the same bootstrap steps as in Section 5.1. When
the reduced form is a threshold regression, our test can be performed more efficiently based on
the model, (25). In this case both reduced form and structural equations are linear under the
null:
H00 : 1t 2t = 3t 4t = 0; for all 2 and t = t0 ; :::; T: (30)
As discussed earlier, the model, (25) can be estimated by the pooled OLS for each , and
therefore, the construction of supW statistic is standard (e.g. Hansen, 1996).
where the transition variable, $q_{it}$, is now randomly drawn from Uniform[-1,1], and independent
of all $u_{it}$, $t = 1, 2, \ldots, T$. We also consider its restricted version with the common intercept
(DGP 2):
\[ y_{it} = 0.7 - 0.5\, y_{it-1} 1\{q_{it} \le 0\} + 0.7\, y_{it-1} 1\{q_{it} > 0\} + \sigma_1 u_{it}. \]
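A simulation of the restricted design (DGP 2) might look like the sketch below. Several ingredients are assumptions rather than statements from the text: the design is read as y_t = 0.7 - 0.5 y_{t-1} 1{q_t <= 0} + 0.7 y_{t-1} 1{q_t > 0} + sigma * u_t, the innovations u_t are taken to be standard normal, the series starts from zero with a burn-in period, and individual effects are omitted. The function name and default values are hypothetical.

```python
import random

def simulate_dgp2(T, sigma=1.0, seed=0, burn_in=50):
    """Simulate one unit's series (y, q) from the assumed reading of DGP 2."""
    rng = random.Random(seed)
    y_prev = 0.0  # assumed initial condition
    ys, qs = [], []
    for t in range(burn_in + T):
        q = rng.uniform(-1.0, 1.0)   # exogenous transition variable
        u = rng.gauss(0.0, 1.0)      # assumed standard normal innovation
        slope = -0.5 if q <= 0 else 0.7
        y = 0.7 + slope * y_prev + sigma * u
        if t >= burn_in:
            ys.append(y)
            qs.append(q)
        y_prev = y
    return ys, qs
```

Repeating the call with different seeds for each cross-section unit gives a panel on which threshold estimators can be compared.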
The FD-2SLS estimator is estimated by the 3-step procedure described in Section A.1.2, em-
ploying the following 4-regime threshold regression model:
0 0 0 0
yit = 1 zit 1 fqit g+ 2 zit 1 fqit > g 3 zit 1 fqit 1 g 4 zit 1 fqit 1 > g+ uit ;
Let 0 1 0 1
0
xit0 zit Xit0 0 1it0 ( ) zit
0
0 0
B .. C B .. C
i =B
@ . C
A and i ( ) = B
@ . C:
A
0
xiT ziT 0
XiT 1iT ( ) ziT 0
0 0 0
gi ( ) = gi i( 0) i( n) ( i( ) i) ; (31)
0 0 0 0 0
E (gi ( )) = E xit zit ( 0) ; E 1it ( 0 ) Xit zit ( n) 6= 0:
t=t0 ;:::;T
0 0 0
If 6= 0 and 6= 0; the rank condition is su¢cient since ( 0) ; ( n) ; 6= 0:
Next, given the linearity in the slope parameters for a …xed ; we can write
! n
!
b( ) 1 1X
0
= An ( )0 Wn An ( ) An ( )0 Wn gn + ( i i ( )) n ; (32)
b( ) n n
i=1
Pn 1 Pn
where An ( ) = 1
n i=1
0
i; i ( )0 and gn ( ) = n i=1 gi ( ). As convention, gn = gn ( n ) :
p 1 p 0 0
Since Wn ! and An ( ) ! A( ) = E i ; i ( ) uniformly, which follows from the
standard uniform law of large numbers (ULLN),
!
b( ) p 1
0
n ! A ( )0 1
A( ) A ( )0 1
(E i E i ( )) 0 ;
b( ) n
as gn = Op n 1=2 due to the CLT: Since gn ( ) is continuous in and for any given ; the
continuous mapping theorem and standard algebra yield that
p 1
n gn b ( ) ; b ( ) ; ! I + A ( ) A ( )0 1
A( ) A ( )0 1
(E i E i ( )) 0:
The term in the …rst brackets in the right hand side is positive de…nite and E i ( ) = E i if
and only if = 0 : Therefore, p limn!1 n2 Jn b ( ) ; b ( ) ; is continuous and uniquely
minimized at = 0 and the convergence is uniform, which implies the consistency of :
Convergence rate and asymptotic normality: Recall the de…nition of Jn ( ) in (8)
and let Jn ( ) = E (gi ( ))0 Wn E (gi ( )) : Also recall Assumption 3 and the de…nition of G in it
and note that G0 1G is nonsingular and …nite and that
@ 0
G = (G ; G ; G ) = E 0i ; E 0i ; E i( 0) n :
l k @
And let Dn = 2 1 G0 W
n n gn ; where n is a 2k1 + 2 dimensional diagonal matrix whose …rst
2k1 + 1 diagonals are ones and the other element is n . We …rst claim that for any hn ! 0
p
nR ( )
sup p n = op (1) ; (33)
j n j hn
1 + nj nj
where
Rn ( ) = Jn ( ) Jn ( n) Jn ( ) Dn0 ( n) :
1=2 0 1 0 1
Note that n Dn = Op n from CLT and Jn ( ) = 2 ( n ) n G Wn G n ( n) +
o j nj
2
: Then, using 1 b n instead of
b 0 , the same line of argument as in the
n
proof of Theorem 7.1 in Newey and McFadden (1994) yields that 1 b n = Op n 1=2 :
n
Proof of (33) De…ne a centered empirical process
p
"n ( ) = n (gn ( ) Egi ( ) gn )
and decompose Rn to obtain a bound (see the proof of Theorem 7.2 of Newey and McFadden
for details) such that
p 5
X
nR ( )
p n rjn ( ) ;
1 + nj nj j=1
where
p p
r1n ( ) = 2 + j n "n ( )0 Wn "n ( ) = 1 +
nj = nj nj
0 p p
r2n ( ) = Egi ( ) G n 1 ( n ) Wn ngn = j nj 1+ nj nj
p p
r3n ( ) = n (Egi ( ) + gn )0 Wn "n ( ) = 1 + n j nj
Let hn ! 0 be any arbitrary sequence. First, note that supj n j hn j"n ( )j = op (1) if the
p
empirical process n (gn ( ) Egi ( )) is stochastically equicontinuous. However, gi ( ) is a
sum of four terms and the …rst is free of and the next two are linear in and , leaving only
the last term to check for the stochastic equicontinuity. Since is bounded and each element
in i( ) is of the type, it 1 fqit > g ; we need to show that the empirical process indexed
by the type is stochastically equicontinuous. However, the indicator functions of half intervals
constitute a Vapnik-Chervonenkis (VC) class and Theorem 2.14.1 of van der Vaart and Wellner
(1996) yields the desired result by choosing an envelope function, j it j 1 fjqit 0j hn g :
Next, note that
p p
sup nEgi ( ) = 1 + nj nj sup jEgi ( )j = j nj = O (1) ;
j nj hn j nj hn
1( =j
due to the di¤erentiability of Egi ( ). For the same reason, supj nj hn Egi ( ) G n n) nj =
o (1) : Therefore, these and the Cauchy-Schwarz inequality yield that supj nj hn jrjn ( )j =
op (1) for all j.
B.2 2SLS
In this section, many variables and processes are indexed by two di¤erent types of parameters,
the reduced form parameter b and the structural form parameter , for instance, eit ( ; b),
Hit ( ; b), Mn ( ; b), Mn ( ; b), and so on. As in previous sections, we make the following
notational convention, where we write for instance eit = eit ( 0 ; b0 ), eit ( ) = eit ( ; b0 ), and
ebit ( ) = eit ; bb and the same for the other terms.
Now, we turn to the proof of the main theorem.
Proof of Theorem 3. We follow the standard three-step approach of establishing consistency,
convergence rate, and asymptotic distribution in sequence.
Consistency We show that b = n + op (1) : Recall that
0 0
eit ( ) = eit ( 0 ) Hit ( n) Fit0 1it [1it ( ) 1it ]0 Fit ; (34)
and let
\[ M_n(\theta) = \sum_{t=t_0}^{T} E\, e_{it}^2(\theta). \]
bn( ) p
Then, it is su¢cient to show (i) sup 2 M Mn ( ) ! 0 and (ii) Mn ( ) is continuous
and has a unique minimum at n. For (ii), note that Mn ( ) is continuous everywhere, twice
di¤erentiable everywhere but = 0, and the second derivative with respect to and is
positive de…nite uniformly in by Assumption 8. Furthermore, direct calculation reveals that
@Mn ( ) =@ is positive if > 0 and negative if < 0 in a neighborhood of n: Since the
conditional mean is the minimizer of the mean squared errors, n becomes the unique minimizer
of Mn ( ) in the compact set, . For (i), note that
bn( ) bn( ) p
sup M Mn ( ) sup M Mn ( ) + sup jMn ( ) Mn ( )j ! 0
2 2 2
Convergence of the first term following the inequality is delegated to the proof on convergence
rate below, while the convergence of the second is a standard ULLN, e.g. Newey and
McFadden (1994, Lemma 2.4). Thus, the consistency proof is complete.
Convergence rate We verify the conditions of Theorem 3.4.1 in van der Vaart and
Wellner (1996) with the distance function de…ned by
1=(2 4 )
dn ( ; n) =j 0j +j nj +j 0j :
In particular, in terms of maximization, we need to show that (using their notation), for
n < <
sup fn ( )
M fn (
M n)
2
; (35)
=2<dn ( ; n)
and
p h i
E sup n Men fn ( )
M en
M fn (
M n) C n( )
=2<dn ( ; n)
1
for functions n such that ! n ( ) = is decreasing on ( n ; ) :Then, for rn C n and
p
rn2 n rn 1 n; and for any b such that
en b
M en(
M n) + Op rn 2 ;
p
d b; n = Op rn 1 : For our case, we set rn = n, n = n 1=2 , and n( ) = . Because
any estimator b satisfying M
en b e n ( n ) + Op r 2 has the convergence rate of r 1 and
M n n
p b n ( ) such that M en( ) Mb n ( ) = Op n 1
rn = n; the maximizer of M has the same
convergence rate of rn 1 in terms of the distance dn :
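Define $r_{it}(\theta, b)$ as the difference between the residual computed with the reduced-form parameter $b$ and $e_{it}(\theta)$; then,
$$\widehat{M}_n(\theta) - \mathbb{M}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \sum_{t=t_0}^{T} \left[ r_{it}^2(\theta, \hat b) + 2 e_{it}(\theta)\, r_{it}(\theta, \hat b) \right].$$
However, the first term $\frac{1}{n}\sum_{i=1}^{n}\sum_{t=t_0}^{T} r_{it}^2(\theta, \hat b) = O_p(n^{-1})$ uniformly in $\theta$ in a neighborhood of $\theta_0$ by applying the ULLN, the $\sqrt{n}$-consistency of $\hat b$ and the differentiability of $F$ in Assumption 4. For the second term, note that, proceeding similarly by expansion of $F$ and $H$ and applying the CLT and ULLN, $\frac{1}{n}\sum_{i=1}^{n}\sum_{t=t_0}^{T} e_{it}\, r_{it}(\theta, \hat b) = O_p(n^{-1})$ uniformly in $\theta$ in a neighborhood of $\theta_0$, where $e_{it}$ is the first term in the expansion of $e_{it}(\theta)$ in (34). Then,
$$\widehat{M}_n(\theta) = \mathbb{M}_n(\theta) - R_n(\theta, \hat b) + O_p(n^{-1}), \qquad (36)$$
where
$$R_n(\theta, b) = \frac{2}{n} \sum_{i=1}^{n} \sum_{t=t_0}^{T} r_{it}(\theta, b) \left\{ (\beta - \beta_0)' H_{it} + (\delta - \delta_n)' (F_{it}' 1_{it}) + [1_{it}(\gamma) - 1_{it}]' F_{it}\,\delta \right\}.$$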
Since bb is square root n consistent, we may consider the process over the expanded pa-
rameter space 2 Bn ; where = f : dn ( ; g for some 1=2
n n n) >n and Bn =
p 0 0 0
fb : jb b0 j K= ng for some K < 1: Note that n = n ; b0 should correspond to n in
van der Vaart and Wellner’s Theorem 3.4.1. Accordingly, from (36) we de…ne
$$\tilde{\mathbb{M}}_n(\theta, b) = -\mathbb{M}_n(\theta) + R_n(\theta, b), \qquad (37)$$
and check the first condition (35). Note that $\widetilde{M}_n(\theta_n) = -M_n(\theta_n)$, and
$$\widetilde{M}_n(\theta) = -M_n(\theta) + 2 E \sum_{t=t_0}^{T} r_{it}(\theta, b) \left\{ (\beta - \beta_0)' H_{it} + (\delta - \delta_n)' F_{it}' 1_{it} + [1_{it}(\gamma) - 1_{it}]' F_{it}\,\delta \right\},$$
whose last term is $O(n^{-1/2})$ due to Assumption 4 and the fact that $|b - b_0| \le K/\sqrt{n}$. Thus, it is enough to consider $M_n(\theta)$. However, as shown in the consistency proof, $M_n(\theta)$ is quadratic around $\theta_n$ in terms of the distance $d_n$ and it satisfies the condition (35).
The maximal inequality for the empirical process $\sqrt{n}\,[(\tilde{\mathbb{M}}_n - \widetilde{M}_n)(\theta) - (\tilde{\mathbb{M}}_n - \widetilde{M}_n)(\theta_n)]$ is the second condition to check. Consider $\mathbb{M}_n(\theta)$, the first term of $\tilde{\mathbb{M}}_n$ in (37). Then, we need to check the maximal inequality for the centered empirical process:
$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \sum_{t=t_0}^{T} \left[ e_{it}^2(\theta) - e_{it}^2 - E e_{it}^2(\theta) + E e_{it}^2 \right].$$
The function $e_{it}^2(\theta) - e_{it}^2$ is the sum of linear and quadratic functions of $\beta$ and $\delta$ multiplied by $[1_{it}(\gamma) - 1_{it}]$. This is a VC class of functions. In this case, a maximal inequality bound is given by the $L_2$ norm of an envelope. We choose the following envelope:
for some $C < \infty$. The first two terms are clearly $O(\eta)$ in $L_2$ norm. As the last two terms can be treated in a similar way, we only need to show that
$$E^{1/2} \left\{ |e_{it}|^2 |F_{it}|^2 \left[ 1\{|q_{it} - \gamma_0| \le \eta^{2-4\alpha}\} + 1\{|q_{it-1} - \gamma_0| \le \eta^{2-4\alpha}\} \right] \right\} (|\delta_n| + \eta) = O(\eta).$$
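$$\tilde{\mathbb{M}}_n(\hat\theta, \hat b) \ge \tilde{\mathbb{M}}_n(\theta_n, b_0) + O_p(n^{-1}).$$
But, for $|\hat b - b_0| \le K/\sqrt{n}$, we have
$$\tilde{\mathbb{M}}_n(\hat\theta, \hat b) = -\widehat{M}_n(\hat\theta; \hat b) + O_p(n^{-1}) \ge -\widehat{M}_n(\theta_n; \hat b) + O_p(n^{-1}) = \tilde{\mathbb{M}}_n(\theta_n, \hat b) + O_p(n^{-1}) = \tilde{\mathbb{M}}_n(\theta_n, b_0) + O_p(n^{-1}),$$
where we have shown the first and third equalities in (36), the second inequality holds by construction, and the last equality follows because $\tilde{\mathbb{M}}_n(\theta, b)$ does not depend on $b$ for $\theta = \theta_n$. Thus,
$$\sqrt{n}\, d_n(\hat\theta, \theta_0) = \sqrt{n} \left( |\hat\theta_1 - \theta_{10}| + |\hat\gamma - \gamma_0|^{1/(2-4\alpha)} \right) = O_p(1).$$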
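The bound above can be unpacked component by component. The following short derivation is a sketch under the assumption that the threshold effect shrinks as $\delta_n = \delta_0 n^{-\alpha}$ with $0 \le \alpha < 1/2$; the symbols $\hat\theta_1$ (slope estimates) and $\hat\gamma$ (threshold estimate) are used as in the surrounding proof.

```latex
\sqrt{n}\, d_n = O_p(1)
\;\Longrightarrow\;
|\hat\theta_1 - \theta_{10}| = O_p\!\left(n^{-1/2}\right)
\quad\text{and}\quad
|\hat\gamma - \gamma_0|^{1/(2-4\alpha)} = O_p\!\left(n^{-1/2}\right),
\\[4pt]
\text{so that}\quad
|\hat\gamma - \gamma_0| = O_p\!\left(n^{-(2-4\alpha)/2}\right) = O_p\!\left(n^{-(1-2\alpha)}\right).
```

For instance, $\alpha = 0$ (a fixed threshold effect) gives the rate $n^{-1}$ for the threshold estimate, while $\alpha = 1/4$ gives $n^{-1/2}$, the same rate as the slope estimates.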
$$n \left[ \widehat{M}_n(\theta_n + h ./ r_n; \hat b) - \widehat{M}_n(\theta_n; \hat b) \right] \qquad (38)$$
on $\{h : |h| \le K\}$ for an arbitrary $K < \infty$, where $./$ is the elementwise division. Then, the argmax continuous mapping theorem (e.g. van der Vaart and Wellner, 1996) will yield the desired result.
Let ei = (eit0 ; :::; eiT )0 ; hn = h:=rn ; and 2i h n2 1; b denote the bottom k1 + 1 rows of
i( ; b) evaluated at = 0 + h n2 1, and de…ne
p
mni (h; b) = n [ei ( n + hn ; b) ei (b)]
0 p 0 p
= i (b) hc n 2i h n2 1
;b 2i (b) n +h = n :
Writing $\hat e_i = e_i(\hat b)$, $\hat m_{ni}(h) = m_{ni}(h, \hat b)$, and $e_i = e_i(b_0)$ as before, we have:
n n
b b 1X 2 X 0
n Mn n + hn ; b Mn n; b = b ni (h)j2
jm p b ni (h) :
ebi m (39)
n n
i=1 i=1
Consider the last term in (39). By Assumption 4 and (26) ; we apply the mean value theorem
to get an expansion:
n n
1 X 1 X
p b ni (h)0 ebi = p
m b ni (h)0
m "i
n n
i=1 i=1
n eb
0
n
!
1X @ i 10 E (Fi F0i ) 1 X
+ b ni (h)0
m p Fi i + op (1) : (40)
n @b0 n
i=1 i=1
Next, expand its …rst term in (40):
n
1 X
p b ni (h)0
m "i
n
i=1
n
1 X 0 bb 1
=p hc i n2 ( 0 + o (1))0 2i h n2 1
; bb 2i
bb "i
n
i=1
Xn n
1 1 X 1
=p h0c i "i p n2 0
0 2i h n2 1
2i "i + op (1) ; (41)
n n
i=1 i=1
where the last equality is due to the asymptotic normality of $\hat b$. The CLT applies for the first term in (41). For the weak convergence of the second term, we need to consider a sequence of classes of functions:
n 1
o
0
Gn = gn (h ) = n 2 0 2i h n2 1
2i "i : jh j < K ;
and apply Theorem 2.11.22 of van der Vaart and Wellner (1996). Recall that 2i h n2 1 is
the collection of Fit0 1it h n2 1 over all t: As the indicator functions (and those multiplied by a
random variable) constitute a VC class of functions, they satisfy the uniform entropy condition
of Theorem 2.11.22. Since 2i ( ) has continuous …rst and second moments, it remains to verify
the conditions on the envelope Gn : It is clear that EG2n = O (1) and the Lindeberg condition
is satis…ed since
p
E G2n 1 jGn j > n
T
X
E2n1 2
j 0 j2 1 jqit 0j h n 1+2
t=t0 1
n
j "i j2 jF (zi )j2 1 j "i j jF (zi )j >
2 (T + 1) j 0 j
O n = o (1) :
due to Assumption 7. We will specify the covariance kernel below after noting that the second
term in (40) expands by the standard Taylor series expansion to yield
n
2 X 0
p b ni (h)
ebi m
n
i=1
n
" #
0 0 1 2 X e ni (h)0
m "i
= I e ni (h) [IT
Em ( 0 )] Fi E Fi F0i p + op (1) ;
n Fi i
i=1
1
0
e ni (h) = h0c
where m i n2 0 2i h n2 1
2i . Turning back to the covariance kernel
of the empirical process indexed by Gn above and the covariance between the process indexed
by hc and the process indexed by h ;we note that the latter vanishes due to the di¤erence
in the convergence rates. For this, it is enough to observe that each element in the matrix
E 2i h n2 1
2iis bounded by, up to a constant,
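$$E\, 1\{|q_{it} - \gamma_0| \le h_\gamma n^{2\alpha-1}\} = \int 1\{|q| \le 1\}\, p(h_\gamma n^{2\alpha-1} q + \gamma_0)\, h_\gamma n^{2\alpha-1}\, dq = O(n^{2\alpha-1}),$$
due to Assumption 2, where the change-of-variable is applied for the first equality. By the same reasoning,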
0
@ i
eb 10 0
@ i 10
e ni (h)
Em = h0c E i + o (1) ;
@b0 @b0
and the limit of $\frac{1}{n}\sum_{i=1}^{n} |\hat m_{ni}(h)|^2$ is the sum of a quadratic function of $h_c$ and a function of $h_\gamma$ without any interaction term. This implies the asymptotic independence between $(\hat\beta', \hat\delta')'$ and $\hat\gamma$. For the former, note that $g_n(h_\gamma)\, g_n(\tilde h_\gamma) = 0$ unless $h_\gamma$ and $\tilde h_\gamma$ have the same sign. For $h_\gamma > \tilde h_\gamma > 0$,
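$$n^{-1+2\alpha} E\, g_n(h_\gamma)\, g_n(\tilde h_\gamma) = \delta_0' \sum_{r,t=t_0}^{T} E \left\{ \Delta\varepsilon_{it} \Delta\varepsilon_{ir}\, F_{it}' \left[ 1_{it}(\gamma_0 + h_\gamma n^{2\alpha-1}) - 1_{it} \right] \left[ 1_{ir}(\gamma_0 + \tilde h_\gamma n^{2\alpha-1}) - 1_{ir} \right]' F_{ir} \right\} \delta_0. \qquad (42)$$
The evaluation of the expectation can be done in the same way as above. Thus, those expectations involving the products of indicators of $q_{it}$ and $q_{it'}$ with $t \ne t'$ will vanish. After some algebra, we can show that the limit of (42) is $\delta_0' V_2(\gamma_0) \delta_0\, (h_\gamma \wedge \tilde h_\gamma)$, and more generally
$$\delta_0' V_2(\gamma_0) \delta_0\, (h_\gamma \wedge \tilde h_\gamma)\, 1\{\mathrm{sgn}(h_\gamma) = \mathrm{sgn}(\tilde h_\gamma)\},$$
where $V_2(\gamma)$ is given in Section 4. This functional form of the covariance kernel implies that the limit Gaussian process is a two-sided Brownian motion originating from zero.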
Now, applying a standard ULLN to $\frac{1}{n}\sum_{i=1}^{n}\sum_{t=t_0}^{T} m_{it}(h, b)^2$, and using the consistency of $\hat b$ and the same line of arguments as above, we may conclude that
n
1X p
b ni (h)j2 ! h0c E
jm 0
i i hc + M2 ( 0 ) jh j:
n
i=1
representation in the main body of the theorem follows from Hansen (2000), in which it is shown for a two-sided standard Brownian motion $W$ and for any positive constants $c_1$ and $c_2$ that
$$\arg\min_{r \in \mathbb{R}} \left[ c_1 |r| - 2\sqrt{c_2}\, W(r) \right] = \frac{c_2}{c_1^2}\, \arg\min_{r \in \mathbb{R}} \left[ \frac{|r|}{2} - W(r) \right].$$
Furthermore, the same line of proof as in Theorem 2 of Hansen (2000) applies to the convergence of $LR_n(\gamma)$ given the results obtained above about $\hat\theta_1$ and $\hat\gamma$. This completes the proof.
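The rescaling identity from Hansen (2000) rests on the Brownian scaling property: $W(kr)$ has the same law as $\sqrt{k}\,W(r)$, so the equality is one in distribution. A minimal numerical sketch of that argument follows. It builds one discretised two-sided path, constructs the rescaled path explicitly with $k = c_2/c_1^2$, and confirms that the two argmins then agree on that path. The function names, grid width and step count are illustrative assumptions, not part of the paper.

```python
import numpy as np


def two_sided_brownian_motion(half_width, n_steps, rng):
    """Simulate W on a symmetric grid over [-half_width, half_width], W(0) = 0.

    The two half-lines are independent Gaussian random walks (a discretised
    approximation, assumed here for illustration only).
    """
    dt = half_width / n_steps
    steps = np.sqrt(dt) * rng.standard_normal((2, n_steps))
    left, right = np.cumsum(steps, axis=1)
    offsets = dt * np.arange(1, n_steps + 1)
    grid = np.concatenate([-offsets[::-1], [0.0], offsets])
    path = np.concatenate([left[::-1], [0.0], right])
    return grid, path


def rescaling_check(c1, c2, seed=0, half_width=50.0, n_steps=20_000):
    """Return both sides of the argmin identity on one coupled path."""
    rng = np.random.default_rng(seed)
    s, w_tilde = two_sided_brownian_motion(half_width, n_steps, rng)
    k = c2 / c1 ** 2
    # Brownian scaling: W(k s) has the law of sqrt(k) * W_tilde(s).
    r = k * s
    w = np.sqrt(k) * w_tilde
    # Left side: argmin over r of c1|r| - 2 sqrt(c2) W(r).
    lhs = r[np.argmin(c1 * np.abs(r) - 2.0 * np.sqrt(c2) * w)]
    # Right side: (c2 / c1^2) * argmin over s of |s|/2 - W_tilde(s).
    rhs = k * s[np.argmin(np.abs(s) / 2.0 - w_tilde)]
    return lhs, rhs


if __name__ == "__main__":
    print(rescaling_check(c1=2.0, c2=3.0))
```

On the coupled path the first objective equals $(2c_2/c_1)$ times the second, which is why the minimisers coincide after multiplying by $c_2/c_1^2$.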
Proof of Corollary 4. The consistency proof is almost identical to that of Theorem 3, and thus omitted. For the convergence rate of the estimator, recall that we need to verify two conditions: one on the limit criterion function and the other on the maximal inequality of the empirical process part. The latter is identical to that in Theorem 3 since the sum of two VC classes of functions is VC. For the former, note that the current case has one more component in the regression function than in Theorem 3, namely $1\{q_{it-1} > \gamma\}$. This generates a kink in the limit criterion function at $\gamma_0$, as $1\{q_{it} > \gamma\}$ does. Therefore, the limit criterion function has the same feature as the one in Theorem 3. Thus, we get the same rate of convergence as in Theorem 3.
Finally, turning to the asymptotic distribution, we note that the argument for the stochastic equicontinuity of the rescaled criterion function is the same as in Theorem 3. To get the covariance kernel of the limit Gaussian process note that, as discussed in (42), the covariances between two terms involving two indicators of $q_{it}$ and $q_{it'}$ with $t \ne t'$ vanish, yielding the covariance kernel as desired. Details are omitted to avoid repetition.
B.3 Testing
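Proof of Theorem 2. Applying the standard ULLN and the continuous mapping theorem to (32), we have:
$$W_n(\gamma) \Rightarrow Z' \Omega^{-1/2} G(\gamma) \left[ G(\gamma)' \Omega^{-1} G(\gamma) \right]^{-1} R' \left[ R \left( G(\gamma)' \Omega^{-1} G(\gamma) \right)^{-1} R' \right]^{-1} R \left[ G(\gamma)' \Omega^{-1} G(\gamma) \right]^{-1} G(\gamma)' \Omega^{-1/2} Z,$$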
show the stochastic equicontinuity of the process. Recall the expression from (23) that
$$\begin{pmatrix} \hat\beta(\gamma) \\ \hat\delta(\gamma) \end{pmatrix} = \left( \frac{1}{n} \sum_{i=1}^{n} \sum_{t=t_0}^{T} X_{it}(\hat b_t, \gamma)\, X_{it}(\hat b_t, \gamma)' \right)^{-1} \left( \frac{1}{n} \sum_{i=1}^{n} \sum_{t=t_0}^{T} X_{it}(\hat b_t, \gamma)\, \Delta y_{it} \right),$$
where $X_{it}(b_t, \gamma) = \left( H_t(z_{it}; b_t)',\ (F_t(z_{it}; b_t)' 1_{it}(\gamma))' \right)'$. The uniform convergence of the first sum can be derived as in the proof of Theorem 3 using the ULLN and the consistency of $\hat b$ in Assumption 4. Thus, the stochastic equicontinuity of $\sqrt{n}\,(\hat\beta(\gamma)', \hat\delta(\gamma)')'$ implies that of $\sqrt{n}\,\hat\delta(\gamma)$. Since the functions $H_t$ and $F_t$ are twice continuously differentiable in $b_t$, it remains to verify the stochastic equicontinuity of the empirical process of functions of the type $f(z_{it})\, 1\{q_{it} > \gamma\}$, where $f$ is some known transformation of $z_{it}$. However, this is a VC class of functions, which implies the stochastic equicontinuity of the empirical process of this class of functions; see e.g. van der Vaart and Wellner (1996), Section 2.6.
Acknowledgement
We are most grateful to the editor, Han Hong, the associate editor and three anonymous referees for their helpful comments. We are also grateful to Mini Ahn, Heather Anderson, Mehmet Caner, Jinseo Cho, In Choi, Viet Anh Dang, Robert Faff, Matthew Greenwood-Nimmo, Jinwook Jeong, Taehwan Kim, Jay Lee, Myungjae Lee, James Morley, Joon Park, Kevin Reilly, Laura Serlenga, seminar participants at Universities of Canterbury, Korea, Leeds, Melbourne, New South Wales, Queensland, Sogang, Sung Kyun Kwan and Yonsei, and conference delegates at the AMES at Korea University, Seoul, August 2011, the 20th Panel Data Conference at Hitotsubashi University, Tokyo, July 2014 and the ESEM, Toulouse, August 2014 for their helpful comments. We would like to thank Minjoo Kim for excellent research assistance. The first author acknowledges support by Promising-Pioneering Researcher Program by Seoul National University (SNU) in 2015 and partial support by Jewon research institute. The second author acknowledges partial financial support from the ESRC (Grant No. RES-000-22-3161). The usual disclaimer applies.
References
[1] Ahn, S.C. and P. Schmidt, 1995, Efficient Estimation of Models for Dynamic Panel Data.
Journal of Econometrics 68, 5-27.
[2] Aivazian, V.A., Y. Ge and J. Qiu, 2005, The Impact of Leverage on Firm Investment:
Canadian Evidence. Journal of Corporate Finance 11, 277-291.
[3] Alvarez, J. and M. Arellano, 2003, The Time Series and Cross-section Asymptotics of
Dynamic Panel Data Estimators. Econometrica 71, 1121-1159.
[5] Arellano, M. and S. Bond, 1991, Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations. Review of Economic Studies 58,
277-297.
[6] Arellano, M. and O. Bover, 1995, Another Look at the Instrumental Variable Estimation
of Error Components Models. Journal of Econometrics 68, 29-51.
[7] Bai, J., 2009, Panel Data Models with Interactive Fixed Effects. Econometrica 77, 1229-1279.
[8] Blundell, R. and S. Bond, 1998, Initial Conditions and Moment Restrictions in Dynamic
Panel Data Models. Journal of Econometrics 87, 115-143.
[9] Bun, M.J.G. and F. Windmeijer, 2010, The Weak Instrument Problem of the System
GMM Estimator in Dynamic Panel Data Models. Econometrics Journal, 95–126.
[10] Caner, M. and B.E. Hansen, 2004, Instrumental Variable Estimation of a Threshold Model.
Econometric Theory 20, 813-843.
[11] Chamberlain, G., 1987, Asymptotic Efficiency in Estimation with Conditional Moment
Restrictions. Journal of Econometrics 34, 305–334.
[12] Chamberlain, G. and G. Imbens, 2004, Random Effects Estimators with Many Instrumental
Variables. Econometrica 72, 295-306.
[13] Chan, K.S., 1993, Consistency and Limiting Distribution of the Least Squares Estimator
of a Threshold Autoregressive Model. Annals of Statistics 21, 520-33.
[14] Chen, H., T.T.L. Chong and J. Bai, 2012, Theory and Applications of TAR Model with
Two Threshold Variables. Econometric Reviews 31, 142–170.
[15] Chong, T.T.L. and I.K.M. Yan, 2015, A New Threshold Regression Approach to Predict
Currency Crises. mimeo., Chinese University of Hong Kong.
[16] Dang, V.A., M. Kim and Y. Shin, 2012, Asymmetric Capital Structure Adjustments:
New Evidence from Dynamic Panel Threshold Models. Journal of Empirical Finance 19,
465-482.
[17] Davies, R.B., 1977, Hypothesis Testing when a Nuisance Parameter is Present only under
the Alternative. Biometrika 64, 247-254.
[18] Fazzari, S.M., R.G. Hubbard and B.C. Petersen, 1988, Financing Constraints and Corpo-
rate Investment. Brookings Papers on Economic Activity 1, 141–195.
[19] Fok, D., D. van Dijk and P.H. Franses, 2005, A Multi-Level Panel STAR model for US
Manufacturing Sectors. Journal of Applied Econometrics 20, 811-827.
[20] González, A., T. Teräsvirta and D. van Dijk, 2005, Panel Smooth Transition Model and an
Application to Investment Under Credit Constraints. Working Paper, Stockholm School
of Economics.
[21] Hansen, B.E., 1996, Inference when a Nuisance Parameter is not Identified under the Null
Hypothesis. Econometrica 64, 414-30.
[22] Hansen, B.E., 1999, Threshold Effects in Non-dynamic Panels: Estimation, Testing and
Inference. Journal of Econometrics 93, 345-368.
[23] Hansen, B.E., 2000, Sample Splitting and Threshold Estimation. Econometrica 68, 575-
603.
[24] Hansen, B.E., 2011, Threshold Autoregression in Economics. Statistics and Its Interface
4, 123-127.
[25] Hansen, L., J. Heaton and A. Yaron, 1996, Finite-sample Properties of Some Alternative
GMM Estimators. Journal of Business and Economic Statistics 14, 262–280.
[26] Hausman, J.A., 1978, Specification Tests in Econometrics. Econometrica 46, 1251–1271.
[27] Hausman, J., R. Lewis, K. Menzel and W. Newey, 2011, Properties of the CUE Estimator
and a Modification with Moments. Journal of Econometrics 165, 45–57.
[28] Hayakawa, K., 2015, The Asymptotic Properties of the System GMM Estimator in Dynamic
Panel Data Models when Both N and T are Large. Econometric Theory 31, 647–667.
[29] Hovakimian, G. and S. Titman, 2006, Corporate Investment with Financial Constraints:
Sensitivity of Investment to Funds from Voluntary Asset Sales. Journal of Money, Credit,
and Banking 38, 357-374.
[30] Hsiao, C., 2003, Analysis of Panel Data. Cambridge: Cambridge University Press.
[31] Hsiao, C., M.H. Pesaran and K. Tahmiscioglu, 2002, Maximum Likelihood Estimation
of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods. Journal of
Econometrics 109, 107-150.
[32] Hsiao, C. and J. Zhang, 2015, IV, GMM or Likelihood Approach to Estimate Dynamic
Panel Models when either N or T or both are Large. Journal of Econometrics 187, 312–322.
[33] Holtz-Eakin, D., W.K. Newey and H.S. Rosen, 1988, Estimating Vector Autoregressions
with Panel Data. Econometrica 56, 1371–1395.
[34] Jensen, M., 1986, Agency Costs of Free Cash Flow, Corporate Finance and Takeovers.
American Economic Review 76, 323-339.
[35] Kapetanios, G., 2010, Testing for Exogeneity in Threshold Models. Econometric Theory
26, 231-259.
[36] Kaplan, S. and L. Zingales, 1997, Do Financing Constraints Explain Why Investment is
Correlated with Cash Flow? Quarterly Journal of Economics 112, 169-216.
[37] Kim, C.J., J. Piger and R. Startz, 2008, Estimation of Markov Regime-switching
Regression Models with Endogenous Switching. Journal of Econometrics 143, 263-273.
[38] Kourtellos, A., T. Stengos and C.M. Tan, 2015, Structural Threshold Regression.
Forthcoming in Econometric Theory.
[39] Kremer, S., A. Bick and D. Nautz, 2013, Inflation and Growth: New Evidence from a
Dynamic Panel Threshold Analysis. Empirical Economics 44, 861-878.
[40] Lang, L., E. Ofek and R.M. Stulz, 1996, Leverage, Investment, and Firm Growth. Journal
of Financial Economics 40, 3-29.
[41] Lee, S., M.H. Seo, and Y. Shin, 2011, Testing for Threshold Effects in Regression Models.
Journal of the American Statistical Association 106, 220-231.
[42] Newey, W. and D.L. McFadden, 1994, Large Sample Estimation and Hypothesis Testing.
in Handbook of Econometrics IV, eds by R.F. Engle and D.L. McFadden, 2111-2245,
Elsevier.
[43] Nickell, S., 1981, Biases in Dynamic Models with Fixed Effects. Econometrica 49, 1417-1426.
[44] Pesaran, M.H., 2006, Estimation and Inference in Large Heterogeneous Panels with a
Multifactor Error Structure. Econometrica 74, 967-1012.
[45] Seo, M.H. and O. Linton, 2007, A Smoothed Least Squares Estimator for Threshold
Regression Models. Journal of Econometrics 141, 704-735.
[46] Sun, Y., 2014, Fixed-smoothing Asymptotics in a Two-step Generalized Method of
Moments Framework. Econometrica 82, 2327–2370.
[47] Tong, H., 1990, Nonlinear Time Series: A Dynamical System Approach. Oxford: Oxford
University Press.
[48] van der Vaart, A.W. and J.A. Wellner, 1996, Weak Convergence and Empirical Processes.
New York: Springer.
[49] Yu, P., 2013, Inconsistency of 2SLS Estimators in Threshold Regression with Endogeneity.
Economics Letters 120, 532-536.
[50] Yu, P. and P.C.B. Phillips, 2014, Threshold Regression with Endogeneity. Cowles
Foundation Discussion Paper No. 1966.
[51] Whited, T.M. and G. Wu, 2006, Financial Constraints Risk. Review of Financial Studies
19, 531-559.
[52] Ziliak, J., 1997, Efficient Estimation with Panel Data When Instruments Are
Predetermined: An Empirical Comparison of Moment-Condition Estimators. Journal of Business
and Economic Statistics 15, 419-431.
Table 1: MSE of FD-GMM estimators
FD-GMM Averaging
DGP n 1 2 1 2
Jump 50 0.063 0.077 0.179 0.498 0.115 0.096 0.185 0.566
100 0.089 0.075 0.207 0.600 0.087 0.066 0.172 0.517
200 0.066 0.068 0.174 0.536 0.067 0.056 0.144 0.474
Cont. 50 0.077 0.320 0.588 0.863 0.009 0.112 0.292 0.273
100 0.079 0.383 0.677 1.002 0.041 0.203 0.439 0.591
200 0.083 0.383 0.662 0.963 0.060 0.289 0.542 0.743
FD-GMM Averaging
DGP n 1 2 1 2
Jump 50 0.041 0.005 0.044 0.100 0.269 0.199 0.151 0.390
100 0.047 0.007 0.044 0.095 0.106 0.073 0.070 0.093
200 0.029 0.011 0.018 0.098 0.060 0.016 0.034 0.033
Cont. 50 0.057 0.180 -0.288 0.184 0.055 0.105 -0.198 0.163
100 0.064 0.145 -0.271 0.199 0.057 0.099 -0.231 0.210
200 0.074 0.190 -0.298 0.162 0.067 0.158 -0.270 0.170
FD-GMM Averaging
DGP n 1 2 1 2
Jump 50 0.247 0.277 0.421 0.699 0.207 0.238 0.402 0.644
100 0.294 0.273 0.452 0.769 0.275 0.246 0.409 0.713
200 0.255 0.261 0.417 0.726 0.252 0.236 0.377 0.688
Cont. 50 0.272 0.537 0.711 0.911 0.080 0.317 0.503 0.497
100 0.274 0.601 0.777 0.981 0.194 0.440 0.621 0.739
200 0.279 0.589 0.757 0.968 0.236 0.514 0.685 0.845
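The three panels above can be cross-checked against one another through the identity MSE = bias² + variance. The sketch below does this for the first FD-GMM column, assuming that the first panel reports mean squared error, the second bias and the third standard deviation (only the first panel's caption is available, so the other two labels are an assumption). The function name is illustrative. Squaring removes the sign of the bias, so only magnitudes are needed.

```python
def mse_from_bias_sd(bias, sd):
    """Mean squared error implied by a bias and a standard deviation."""
    return bias ** 2 + sd ** 2


# (DGP, n, reported MSE, |bias|, standard deviation) for the first FD-GMM column.
rows = [
    ("Jump", 50, 0.063, 0.041, 0.247),
    ("Jump", 100, 0.089, 0.047, 0.294),
    ("Cont.", 50, 0.077, 0.057, 0.272),
]

for dgp, n, reported, bias, sd in rows:
    implied = mse_from_bias_sd(bias, sd)
    print(f"{dgp:5s} n={n:3d} reported={reported:.3f} implied={implied:.3f}")
```

The implied values agree with the reported ones to three decimals for these rows, which supports reading the panels in that order.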
Table 4: MSE of FD-GMM estimators (restricted)
FD-GMM Averaging
DGP n
Jump 50 0.105 0.102 0.124 0.050 0.095 0.132
100 0.106 0.116 0.142 0.075 0.097 0.122
200 0.095 0.080 0.102 0.076 0.070 0.088
Cont. 50 0.033 0.075 0.155 0.019 0.067 0.143
100 0.039 0.094 0.192 0.030 0.085 0.177
200 0.039 0.082 0.170 0.034 0.080 0.168
FD-GMM Averaging
DGP n
Jump 50 0.009 0.051 -0.008 -0.029 -0.082 0.143
100 0.012 0.064 -0.047 0.021 0.031 -0.010
200 0.028 0.052 -0.047 0.025 0.041 -0.035
Cont. 50 0.013 -0.049 0.103 0.092 -0.008 0.038
100 0.021 -0.081 0.144 0.052 -0.053 0.098
200 0.014 -0.064 0.116 0.028 -0.051 0.094
FD-GMM Averaging
DGP n
Jump 50 0.324 0.315 0.352 0.222 0.297 0.335
100 0.325 0.334 0.374 0.273 0.310 0.350
200 0.307 0.278 0.316 0.275 0.261 0.295
Cont. 50 0.182 0.270 0.380 0.102 0.259 0.376
100 0.196 0.295 0.414 0.164 0.286 0.409
200 0.197 0.279 0.396 0.183 0.278 0.399
Table 7: Coverage Frequency of FD-GMM estimators
FD-GMM Averaging
DGP h n 1 2 1 2
Jump 1/2 50 0.876 0.731 0.736 0.647 0.878 0.641 0.753 0.705
100 0.931 0.895 0.897 0.847 0.942 0.884 0.907 0.871
200 0.937 0.917 0.950 0.897 0.939 0.930 0.956 0.914
1 50 0.960 0.857 0.886 0.716 0.995 0.821 0.894 0.778
100 0.978 0.962 0.971 0.899 0.991 0.946 0.973 0.928
200 0.979 0.963 0.967 0.933 0.983 0.969 0.979 0.947
3/2 50 0.986 0.882 0.928 0.814 1.000 0.805 0.910 0.867
100 0.995 0.968 0.977 0.936 0.998 0.969 0.980 0.954
200 1.000 0.971 0.982 0.971 1.000 0.973 0.985 0.974
Cont. 1/2 50 0.427 0.473 0.621 0.518 0.904 0.700 0.744 0.694
100 0.525 0.716 0.804 0.698 0.772 0.819 0.857 0.798
200 0.585 0.796 0.894 0.798 0.691 0.847 0.926 0.839
1 50 0.811 0.592 0.745 0.624 0.990 0.780 0.871 0.799
100 0.898 0.795 0.916 0.806 0.980 0.881 0.947 0.876
200 0.900 0.862 0.947 0.868 0.947 0.905 0.965 0.894
3/2 50 0.965 0.680 0.810 0.669 0.999 0.847 0.904 0.865
100 0.997 0.892 0.944 0.843 1.000 0.937 0.970 0.916
200 1.000 0.917 0.969 0.889 1.000 0.941 0.980 0.914
Note: These are empirical coverage frequencies of 95% nominal confidence intervals. The
bandwidth for the asymptotic variance estimation in equation (12) is selected by h times
Silverman's rule of thumb.
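The bandwidth choice in the note can be sketched as follows. This is a minimal illustration, assuming the common form of Silverman's rule of thumb, 0.9 · min(sd, IQR/1.34) · n^(-1/5); the note does not say which variant of the rule the authors used, and the function name and the sample below are hypothetical.

```python
import numpy as np


def scaled_silverman_bandwidth(x, h=1.0):
    """Return h times Silverman's rule-of-thumb bandwidth for the sample x.

    Assumed variant: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    # Fall back to the standard deviation when the IQR is degenerate.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return h * 0.9 * spread * n ** (-1 / 5)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal(200)  # hypothetical threshold-variable sample
    for h in (0.5, 1.0, 1.5):     # the multipliers reported in Table 7
        print(h, scaled_silverman_bandwidth(q, h))
```

Because the multiplier enters linearly, the three rows of Table 7 for a given design differ only in how wide a window around the threshold is used for the variance estimate.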
Table 8: A dynamic threshold panel data model of investment
Table 9: Efficiency Comparison of FD-GMM and FD-2SLS Estimators
FD-GMM FD-2SLS
n bias ln(RMSE) bias ln(RMSE)
DGP 1 50 -0.002 -2.6 0.002 -4.7
100 0.003 -2.6 0.001 -4.9
200 0.002 -2.7 0.0 -5.0
DGP 2 50 0.007 -1.7 0.009 -3.2
100 -0.001 -1.8 0.003 -4.0
200 -0.006 -1.9 0.002 -4.6