0% found this document useful (0 votes)

10 views13 pages

FPCR Jasa

This document introduces functional versions of Principal Component Regression (PCR) and Partial Least Squares (PLS) to address the challenges of regressing scalar responses on high-dimensional signal predictors, such as NIR spectra. The authors develop two functional PCR methods that incorporate roughness penalties and demonstrate their advantages over existing methods through simulation studies and real data analyses. The paper also discusses parameter selection techniques and asymptotic convergence properties for these new methodologies.

Uploaded by

reissphil

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views13 pages

FPCR Jasa

Uploaded by

reissphil

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Functional Principal Component Regression and

Functional Partial Least Squares

Philip T. R EISS and R. Todd O GDEN

Regression of a scalar response on signal predictors, such as near-infrared (NIR) spectra of chemical samples, presents a major challenge
when, as is typically the case, the dimension of the signals far exceeds their number. Most solutions to this problem reduce the dimen-
sion of the predictors either by regressing on components [e.g., principal component regression (PCR) and partial least squares (PLS)] or
by smoothing methods, which restrict the coefficient function to the span of a spline basis. This article introduces functional versions of
PCR and PLS, which combine both of the foregoing dimension-reduction approaches. Two versions of functional PCR are developed, both
using B-splines and roughness penalties. The regularized-components version applies such a penalty to the construction of the principal
components (i.e., it uses functional principal components), whereas the regularized-regression version incorporates a penalty in the regres-
sion. For the latter form of functional PCR, the penalty parameter may be selected by generalized cross-validation, restricted maximum
likelihood (REML), or a minimum mean integrated squared error criterion. Proceeding similarly, we develop two versions of functional
PLS. Asymptotic convergence properties of regularized-regression functional PCR are demonstrated. A simulation study and split-sample
validation with several NIR spectroscopy data sets indicate that functional PCR and functional PLS, especially the regularized-regression
versions with REML, offer advantages over existing methods in terms of both estimation of the coefficient function and prediction of future
observations.
KEY WORDS: B-splines; Functional linear model; Linear mixed model; Multivariate calibration; Signal regression; SIMPLS.

1. INTRODUCTION chemometrics. In a typical application, the signals are near-

infrared (NIR) spectra representing reflectance of radiation by
The signal regression problem begins with a vector y =
a chemical sample as a function of the wavelength of the ra-
(y1 , . . . , yn )T of scalar responses for n samples and an n × N
diation, and the responses are analytical properties of the sam-
matrix X = (x1 , . . . , xn )T , whose ith row is an N -dimensional
ple, such as water content. The popular methods of principal
signal (i.e., a vector of predictors corresponding to sites along a
component regression (PCR) and partial least squares (PLS) re-
continuum) for the ith sample. The columns of X are assumed
duce the number of predictors by extracting a limited number
to have mean 0. We seek an intercept α and an N -dimensional of components (which, it is hoped, are the most useful ones for
vector ω, sometimes called the coefficient function or weight the regression) and discarding the rest of the predictor data.
function, to make This article introduces a set of flexible methods that com-
"y − α1 − Xω"2 (1) bine the smoothing and dimension-reduction approaches. As
√ we hope to demonstrate, some of these methods compare very
“small,” where "v" = vT v. For the methods that we dis- favorably with existing ones.
cuss herein, α̂ = ȳ; our interest centers on estimating ω. For More specifically, the contributions of the article are as fol-
large N , the problem suffers from multicollinearity. Two main lows. First, we develop two functional data analytic (Ramsay
dimension-reduction approaches have been explored to circum- and Silverman 2005) formulations of PCR and PLS. Both of
vent this difficulty: smoothing and component selection. these formulations depend on roughness penalty smoothing;
A smoothing-based formulation of signal regression is the they differ in that the first applies a roughness penalty in the
functional linear model (see Ramsay and Silverman 2005, construction of principal components (PCs) or PLS compo-
chap. 15, and references therein). The general spline problem nents, whereas the second incorporates it in the regression on
(Wahba 1990) encompasses signal regression as well as non- these components. The penalized-components functional ver-
parametric regression. Here we observe yi = Li f + εi for i = sions of PCR and PLS are based on ideas of Silverman (1996)
1, . . . , n for given functionals Li , and we seek the unknown and Goutis and Fearn (1996). Penalized-regression functional
function f . When the Li are evaluation functionals, Li f = PCR/PLS, on the other hand, is a novel methodology. Second,
f (ti ) for given sites t1 , . . . , tn , we have the special spline prob- we extend (in Secs. 5.2–5.4) three approaches to smoothing
lem, that is, the standard problem of nonparametric regression parameter selection from the nonparametric regression to the
!
by splines. When the Li are given by Li f = xi f for known signal regression context: generalized cross-validation (Wahba
functions x1 , . . . , xn , we have signal regression. 1990); the mixed-model approach exploited by Ruppert, Wand,
The component selection approach is exemplified by stan- and Carroll (2003); and the optimal smoothing ideas of Wand
dard approaches to the multivariate calibration problem in (1999). Third, we compare the new methodologies with some
existing signal regression approaches by means of a simulation
study and real data analyses.
Philip T. Reiss is Assistant Professor, New York University Child Study Cen- Section 2 presents an existing smoothing-based approach to
ter, New York, NY 10016. R. Todd Ogden is Associate Professor, Department signal regression, as well as PCR and PLS, the two standard
of Biostatistics, Columbia University, New York, NY 10032. The authors thank
Chung Chang, Martin Lindquist, Marianthi Markatou, Ian McKeague, Martina
component-selection methods. These serve as building blocks
Pavlicova, Eva Petkova, Ioana Schiopu-Kratina, and Hongtu Zhu for informa- for our hybrid methods, which are “functional” versions of PCR
tive discussions; Hervé Cardot, Sijmen de Jong, Paul Eilers, and Brian Marx
for graciously answering questions about their related work; Phil Hopke for
making the spectroscopy data publicly available; and the joint editors, asso- © 2007 American Statistical Association
ciate editor, and referees, whose comments significantly improved the article. Journal of the American Statistical Association
The first author gratefully acknowledges support from the National Institute of September 2007, Vol. 102, No. 479, Theory and Methods
Mental Health through grant 1 F31 MH73379-01A1. DOI 10.1198/016214507000000527

984
Reiss and Ogden: Functional PCR and PLS 985

(Sec. 3) and PLS (Sec. 4). Section 5 discusses how smoothing elements of D in descending order. Let UA , DA , and VA be
parameters are selected, and Section 6 gives asymptotic conver- the truncated-at-A versions of the three SVD matrices; that is,
gence results for a version of functional PCR. Section 7 com- UA and VA consist of the first A columns of U and V, whereas
pares the performance of the various methods, and Section 8 DA is the A × A upper left submatrix of D, which we assume to
concludes with some discussion and directions for future re- be nonsingular. The columns of VA are the first A eigenvectors
search. of the covariance matrix XT X, that is, the loadings of the first A
PCs of the signal data. Thus if we restrict ω in (1) to the column
2. BUILDING BLOCKS space of VA or, equivalently, find the unconstrained minimizing
2.1 Penalized B –Spline Expansion ζ ∈ RA of

Marx and Eilers (1999) proposed overcoming the multi- "y − α1 − XVA ζ "2 , (3)
collinearity problem by projecting ω onto a B-spline basis and then our new design matrix XVA has as its columns the first A
adding a roughness penalty to the criterion to be minimized, PCs of the original regressors. The unconstrained minimization
which then becomes of (3) is therefore referred to as A-component PCR.
"y − α1 − XBβ"2 + λβ T PT Pβ. (2) The A-component PCR model that we have described in-
cludes the A PCs with the largest variances—what is sometimes
Here B is an N × K B-spline basis design matrix with (i, j ) called choosing PCs “from the top.” Thus PCs are chosen with-
entry Bj (ti ), where Bj is the j th basis function, ti is the ith in- out regard to how well they predict the response. Such a prac-
dex value or site, P is a full-rank r × K matrix with r ≤ K tice might be justified on the grounds that “fortunately, there
such that β T PT Pβ provides a measure of the roughness of is often a tendency. . . for the components with the largest vari-
the function ω = Bβ, and λ is a parameter controlling the ex- ances to best explain the dependent variables” (Mardia, Kent,
tent to which roughness is penalized (i.e., the higher the value and Bibby 1979). Nevertheless, many authors have been un-
of λ, the smoother the fitted function). Marx and Eilers took P willing to assume that matters will work out so fortuitously and
to be a (K − d) × K differencing matrix Pd such that Pd w have proposed ways to take the responses into account when
gives the dth-order differences of w. The resulting difference choosing which PCs to include in the regression. One such
penalty yields a method that Marx and Eilers termed P -spline method (Massy 1965) includes those PCs that are most highly
signal regression
! (PSR). Alternatively, if P is chosen such that correlated with the response and thus explain the most variation
PT P = ( Bi&& (t)Bj&& (t) dt)1≤i,j ≤K (or at least a discrete approx- therein. Such alternatives to choosing PCs from the top are not
imation to that integral), then β T PT Pβ equals (approximately) considered further here.
the integrated squared second derivative of the weight function
2.3 Partial Least Squares
ω = Bβ. The latter roughness penalty was used by, for exam-
ple, Cardot, Ferraty, and Sarda (2003) and is used herein. We PLS regression may be thought of as a counterpart method
refer to the resulting model as the penalized B-spline expan- to PCR that seeks to improve on the latter. A potential draw-
sion (PBSE). back of PCR is that the regressed-on components are chosen
Whereas PBSE yields consistent estimates of the true coef- based solely on how much of the predictor variance they ex-
ficient function (Cardot et al. 2003), Marx and Eilers (2002) plain, without reference to how well they explain the responses.
reported that their original version of PSR (i.e., PBSE with a PLS, in contrast, seeks components that are most relevant to
difference penalty) yielded coefficient function estimates that predicting the outcome.
were “too enthusiastic, with magnitude too large for success- PLS is often presented as an iterative algorithm that approx-
ful stability.” They consequently added a ridge penalty to their imately decomposes both X and y in terms of latent variables
model to shrink the coefficients toward 0. Although this appears or score vectors ta . The following version of the algorithm was
to improve the performance of PSR, it might be argued that un- given by Goutis and Fearn (1996). Suppose that we have an
like the squared second derivative, this ridge term has no natural n × N predictor matrix X and an n-dimensional response vec-
interpretation as a measure of roughness, and necessitates op- tor y, both of which are centered. Set E0 = X and f0 = y, and
timizing over an additional continuous parameter—something let M+ denote the Moore–Penrose inverse of matrix M. For
that we might wish to avoid, especially with very large data a = 1, . . . , A, let
sets. The methods developed in Sections 3 and 4, in contrast, ETa−1 fa−1
require optimizing over one discrete parameter and one contin- pa = ,
uous parameter. "ETa−1 fa−1 "
ta = Ea−1 pa ,
2.2 Principal Component Regression
Ea = Ea−1 − ta pTa ,
In this section and the following one, we consider the two
major component-selection approaches to minimizing "y − and
α1 − Xω"2 over the N -dimensional weight function ω. The
fa = fa−1 − ta tTa (Ea−1 ETa−1 )+ fa−1 .
first such approach, PCR (Massy 1965), regresses not on the
N columns of X, but rather on a small number of regressors From the loadings pa and scores ta (a = 1, . . . , A), we can
accounting for most of the variability of the signal data. derive weight vectors r1 , . . . , rA of unit length, such that
To give this a more precise formulation, we consider UDVT , cov(y, Xri ) is maximized successively subject to orthogonal-
the singular value decomposition (SVD) of X, with the diagonal ity of the Xri . The PLS solution with A components then
986 Journal of the American Statistical Association, September 2007

minimizes "y − α1 − XRA ζ "2 over ζ ∈ RA [cf. (3)], where The two methods can be thought of as differing in terms of
RA = (r1 , . . . , rA ). The SIMPLS algorithm of de Jong (1993), what the adjective “functional” modifies. In the first method,
which is equivalent to PLS (for univariate y, the only case that the principal components are “functional” in the sense that their
we consider here) but more computationally efficient, derives construction uses a roughness penalty (as in Silverman 1996).
the weight vectors r1 , . . . , rA directly. In the second method, it is the regression that incorporates a
roughness penalty and thus shares the “functional” character
3. FUNCTIONAL PRINCIPAL COMPONENT of Ramsay and Silverman’s (2005, chap. 15) functional regres-
REGRESSION WITH B –SPLINES sion. Hence the two methods correspond to two ways of pars-
ing the term “functional principal component regression.” The
3.1 XB as the Design Matrix; B –Spline Principal
first can be viewed as {functional principal component} regres-
Component Regression
sion; the second, as functional {principal component regres-
Although PCR often alleviates the ill-posed nature of the sion}. In what follows we denote the two methods by FPCRC
original regression problem, it is invariant under permutation and FPCRR , to indicate, respectively, application of a rough-
of the regressors—in other words, it fails to take their ordering ness penalty to the components and to the regression.
into account—and produces a nonsmooth weight function esti-
mate ω̂. A proposal of Cardot et al. (2003) works around these 3.3 FPCRC
difficulties by projecting ω̂ onto a B-spline basis, that is, replac- 3.3.1 Functional Principal Components. The first FPCR
ing it with B(BT B)−1 BT ω̂, where B is as in (2). Whereas this method, FPCRC , incorporates the roughness penalty into the
device does produce an at least minimally smooth weight func- construction of the PCs. We seek to minimize "y − α1 −
tion, it assumes that the eigendecomposition or SVD used for XBṼA ζ "2 , which is the same as criterion (4) except with VA
PCR is trustworthy despite the large number N of predictors. replaced by a new matrix, ṼA , the construction of which we
Yet part of the reason for smoothing is our reluctance to trust now describe.
such a decomposition for large N . Indeed, Cardot et al. (2003) In (4), the columns of VA are the loadings of the first A
found this method somewhat less effective than PBSE. PCs of the regressor data XB. Equivalently, they are the first A
Instead of projecting ω̂ onto a B-spline basis only after PCR eigenvalues of the covariance matrix BT XT XB and thus are the
fitting, we might consider fitting PCR after projection, that is, successive maximizers of vT BT XT XBv/vT v subject to ortho-
with design matrix XB taking the place of X. (Note that the ap- normality. The columns of ṼA , on the other hand, are functional
plication of PCR with design matrix XB makes sense because, PC loading vectors (Silverman 1996; Ramsay and Silverman
assuming the size of the basis to be large, this new design matrix 2005). These are the orthonormal vectors v that successively
again presents an ill-posed problem, and moreover it is readily maximize
seen to have mean-0 columns.) The “B-spline PCR” solution
with A components will minimize vT BT XT XBv
,
vT (I + λPT P)v
"y − α1 − XBVA ζ "2 (4)
where P is chosen so that vT PT Pv measures the roughness of v
over ζ ∈ RA , where VA is now derived from the SVD XB = [cf. (2)].
UDVT . 3.3.2 Choice of λ. In their discussion of functional princi-
pal component analysis, Ramsay and Silverman (2005,
3.2 Two Approaches to Functional Principal sec. 9.3.3) proposed choosing the smoothing parameter λ by
Component Regression cross-validation (CV). Essentially, given A, we minimize the
The degree of smoothing achieved by the foregoing B-spline sum of squared L2 distances between the signals xi (i =
PCR method depends on the richness of the B-spline basis and 1, . . . , n) and the projections of each signal on the first A func-
thus may be quite minimal. We would prefer a method that lets tional PCs that would be formed if the signal matrix X were
the data choose the level of smoothing through an appropriate replaced with X(−i) , the (n − 1) × N matrix of all signals ex-
roughness penalty. This goal accords with the functional data cept the ith.
analysis paradigm of Ramsay and Silverman (2005), and thus The foregoing CV procedure was proposed in the context of
such a method might be called functional PCR (FPCR). Such an functional principal component analysis, and accordingly does
approach—first projecting onto a B-spline basis, then smooth- not relate to prediction of external variables. For principal com-
ing further through truncation by PCs and the use of a roughness ponent regression, we might instead choose λ based on how
penalty—can be carried out in at least two ways: well the PCs that it determines predict the outcome of our re-
gression model. " To do this, we would choose λ to minimize the
1. Apply a roughness penalty to the PCs themselves, then CV criterion ni=1 (yi − ŷ(i) )2 , where ŷ(i) is the predicted value
choose an appropriate number of these functional princi- for the ith response, based on a model constructed with all but
pal components (Silverman 1996; Ramsay and Silverman the ith observation. Our implementation uses the more compu-
2005) for the weight function. tationally efficient m-fold CV, in which the data are divided into
2. Add a roughness penalty to criterion (4) and minimize the m equal parts, each of which takes a turn serving as a validation
resulting penalized sum of squares. set for the model derived from the rest of the data.
Reiss and Ogden: Functional PCR and PLS 987

3.4 FPCRR 5. CHOICE OF SMOOTHING PARAMETERS

Earlier we considered minimizing the penalized sum of Either form of FPCR or FPLS requires that we optimally
squares "y − α1 − XBβ"2 + λβ T PT Pβ [eq. (2)]. Suppose now choose (1) the number K of basis functions, (2) the number A
that the number of columns K of the design matrix XB (the di- of components, and (3) the roughness penalty tuning parame-
mension of the B-spline basis) is still too large to remedy the ter λ. The first and third of these must also be chosen for PBSE.
ill-posed nature of the problem. We can further reduce the de- It is often assumed (e.g., Marx and Eilers 1999; Cardot et al.
sign matrix XB through PCR. The resulting method, FPCRR , 2003) that the precise value of K has little impact as long as
minimizes there are sufficient knots to capture the variation in the weight
"y − α1 − XBVA ζ "2 + λζ T VTA PT PVA ζ (5) function. Thus our discussion focuses on A and λ.

over ζ ∈ RA , where VA consists of the first A columns of V 5.1 Number of Components

from the SVD XB = UDVT . The choice of λ for FPCRR is The number of components A for FPCR and FPLS can be
considered in Section 5. chosen by multifold CV. The choice of λ is nested within this
Note that if n ≥ K, then for A = K, VA = V is a nonsin-
choice. For FPCRC and FPLSC , as discussed earlier, λ is also
gular K × K matrix, and any β ∈ RK equals VA ζ for some
chosen by CV, resulting in the “double-cross” (Stone 1974). For
ζ ∈ RA . Thus minimizing (5) is equivalent to minimizing (2).
each A and each training set, we find the λ with the lowest
In other words, PBSE is a special case of FPCRR in which all
SSEpred for the validation set. We then choose that A for which
K components are retained.
the sum of SSEpred over all validation sets is lowest, and finally
4. FUNCTIONAL PARTIAL LEAST SQUARES choose the CV-minimizing λ for that A. For PBSE, FPCRR ,
and FPLSR , the three methods described next can be used to
4.1 Partial Least Squares With Smooth Factors
choose λ.
Like ordinary PCR, ordinary PLS does not take the spa-
tial location of the signals into account. Aside from loss of 5.2 Finding λ by Generalized Cross-Validation
information, this may make the factors difficult to interpret. For FPCRR and FPLSR , instead of the double-cross, there
Consequently, several attempts have been made to incorporate are computational and theoretical advantages to taking, for
smoothing into the PLS factors. In one such procedure (Goutis each A and each training set, the λ that minimizes the general-
and Fearn 1996), the first step within the loop of the PLS algo- ized cross-validation (GCV) criterion n1 "(I − Hλ )y"2 /[ n1 tr(I −
rithm given earlier is replaced by the following: Hλ )]2 (Wahba 1990), where Hλ is the smoother or “hat” matrix
Let pa be the unit vector in the row space of Ea−1 that max- such that
imizes
ŷ = Hλ y (6)
pTa ETa−1 fa−1 fTa−1 Ea−1 pa − λpTa VT Vpa ,
for the given method. GCV can also be used to choose λ for
where V is chosen so that the second term of this expression
PBSE. It seems to work similarly to CV but with much lower
constitutes a penalty on the roughness of pa . When Ea−1 has
computational expense.
rank p, the number of predictors, the restriction to the row space
of Ea−1 is no restriction, and pa is simply the eigenvector cor- 5.3 Finding λ by Fitting a Linear Mixed Model
responding to the largest eigenvalue of ETa−1 fa−1 fTa−1 Ea−1 −
Let U∗ D∗ UT∗ be the SVD of VTA PT PVA , with
λVT V. The smoothing parameter λ can be chosen by m-fold
CV. D∗ = diag(d1 , . . . , ds , 0, . . . , 0),
√ √
4.2 Functional Partial Least Squares where d1 ≥ · · · ≥ ds > 0. Let D1 = diag( d1 , . . . , ds , 1, . . . ,
1). Then
Our development for smooth PLS parallels that for functional # $
PCR. The first step is to replace the design matrix X with XB. Is 0
U∗ D∗ U∗ = U∗ D1
T
D1 UT∗ .
Unpenalized B-spline PLS regression with A components con- 0 0
structs the SIMPLS weight matrix RA with the new design ma-
Letting ζ 1 = D1 UT∗ ζ , we obtain that (5) is equal to
trix and minimizes "y − α1 − XBRA ζ "2 over ζ ∈ RA [cf. (4)]. # $
Again as with PCR, we can then envision two forms of PLS −1 2 Is 0
that are “functional” in that they use both a smooth basis and "y − α1 − XBVA U∗ D1 ζ 1 " + λζ 1 T
ζ 1.
0 0
a roughness penalty. One of these forms incorporates a penalty
into the construction of the components (analogous to FPCRC ), Taking ζ 1 = (uT , bT )T , where u has length s, and BVA U∗ ×
whereas the other uses a penalty in the regression (analogous D−1
1 = (Z F), where Z has s columns, we find that the fore-
to FPCRR ). A method of the first type, which we refer to as going is equal to "y − α1 − XZu − XFb"2 + λuT u. This last
FPLSC , can be derived by applying the procedure of Goutis and expression divided by −2σε2 yields the best linear unbiased pre-
Fearn (1996) with XB as the design matrix. A functional PLS diction (BLUP) criterion for the linear mixed model
method of the second type, which we call FPLSR , minimizes y|u ∼ N(α1 + XFb + XZu, σε2 ), u ∼ N(0, (σε2 /λ)I).
2
"y − α1 − XBRA ζ " + λζ T
RTA PT PRA ζ
Thus minimizing (5) is equivalent to maximizing the BLUP
over ζ ∈ RA [cf. (5)], where RA is derived by SIMPLS with criterion. As emphasized by Ruppert et al. (2003), this corre-
design matrix XB. spondence allows us to take advantage of the well-developed
988 Journal of the American Statistical Association, September 2007

methodology for mixed models. In particular, we can estimate estimate of ω. Thus it seems appropriate to plug the minimally
λ through restricted maximum likelihood (REML) estimation smooth estimate ω̂0 into (8). An explicit equality defining the
of the variance parameters. Moreover, because the variance ma- plug-in value of λ was given by Reiss (2006).
trix of the random-effect coefficient vector u is a multiple of the
6. ASYMPTOTIC RESULTS FOR FPCRR
identity, this estimation can be carried out with standard mixed-
model software; compare model 3 of Wang (1998), which sim- To derive asymptotic results for FPCRR , we begin with sig-
ilarly reduces the curve fitting problem to a mixed model with nals x∗i = (xi1 , . . . , xiN )T , i = 1, 2, . . . , which are iid ran-
dom vectors with E(x1j ∗ ) = 0 and E(x ∗4 ) < ∞ for each
variance proportional to I. 1j
If n ≥ K, then, as noted in Section 3.4, PBSE can be seen j = 1, . . . , N . Throughout this section, we assume that the
as FPCRR with all K components. Thus the foregoing mixed- outcomes are generated by the model y = α1 + X∗ Bβ + ε,
model formulation can be used to find λ for PBSE. (See Reiss where X∗ = (x∗1 , . . . , x∗n )T ; ε is a vector of iid errors with
2006 for a mixed-model formulation for PBSE that remains mean 0 and finite variance, independent of X∗ ; and B is a
valid even for n < K.) Finally, by considering the SVD of fixed N × K B-spline basis matrix. For estimation, we replace
RTA PT PRA rather than that of VTA PT PVA , we can use the same X∗ with X = (I − n−1 11T )X∗ , which has mean-0 columns as
argument to find λ for FPLS by REML estimation. before. For given n, λ, and A, the FPCRR estimate of β is
β̂ n = BVA (VTA BT XT XBVA + λVTA PT PVA )−1 VTA BT XT y. In
5.4 The Minimum Mean Integrated Squared Error this section we use the notation λn to emphasize that λ may
Value of λ vary with n.
Let V∗A be the K × A matrix whose columns are the eigen-
For PBSE and FPCRR , we can alternatively choose λ by
expressing the mean integrated squared error (MISE) of ω̂, vectors of E(BT x1 xT1 B) corresponding to its leading eigenval-
"N ues ξ1 > · · · > ξA > 0 (i.e., we assume the first A eigenvalues
E[( ω̂ − ωi )2 ], as a function of λ and finding the min-
to be distinct and positive). Then V∗A can be seen as a popu-
i=1 i
imizer of this function. The following proposition, proved by
lation version of VA . Let %A = diag(ξ1 , . . . , ξA ). Some theory
Reiss (2006), gives a general formula encompassing PBSE and
on eigenvalues and eigenvectors of random matrices leads to
FPCRR as special cases.
the following result. Detailed proofs of this and the two theo-
Proposition 1. Suppose that y − ȳ1 ∈ Rn has mean Xω and rems that follow are available on the American Statistical As-
variance σ 2 I, where 1T X = 0, and that ω is estimated by sociation website at [Link]
supplemental_materials.
ω̂ = ω̂(λ) = GT−1
λ G X y,
T T
(7)
Theorem 1. Suppose that
where G is an N × A matrix of rank A < N and Tλ =
GT XT XG + λQ for some A × A symmetric penalty matrix Q, β = V∗A ζ for some ζ = (ζ1 , . . . , ζA )T . (9)
with G and Q not depending on y. Then the MISE of ω̂ attains If β̂ n denotes an A-component FPCRR estimate for which λn
a critical point at any value of λ satisfying d
is chosen to be oP (n1/2 ), then n1/2 (β̂ n − β) → Z1 + Z2 , where
−1
ωT [(I − WTλ )(Wλ − W2λ )]ω = σ 2 tr[GT−1 2
λ G (Wλ − Wλ )],
T Z1 ∼ NK (0, σ 2 V∗A %A V∗T A ) and Z2 ∼ NK (0, W) for a K × K
matrix W not depending on σ 2 .
(8)
The variance of Z1 in Theorem 1, σ 2 V∗A %−1 ∗T
A VA , is what
where Wλ = GT−1
λ G X X.
T T
the asymptotic variance of n1/2 (β̂ n − β) would be if we could
The critical point referred to in Proposition 1 is not necessar- substitute the population-based V∗A for the sample-based VA
ily the global minimum. Ordinarily, however, we would expect in calculating the estimate. The added variance associated with
the MISE to be either a U-shaped function of λ (in which case Z2 represents the price of having to use VA rather than V∗A .
the proposition allows us to find the global minimum) or an Note that because W does not depend on σ 2 , this price be-
increasing function of λ ∈ [0, ∞) (so that it is minimized by comes smaller (in a relative sense) as σ 2 increases. More details
taking λ = 0). about Z2 are given in the proof of Theorem 1.
Proposition 1 applies to PBSE with G = B, Q = PT P, and Under mild assumptions, the λn = oP (n1/2 ) condition im-
to FPCRR with G = BVA , Q = VTA PT PVA . In simulation set- posed in Theorem 1 is met if λn is chosen by GCV or REML;
tings for which the true ω and σ 2 are known, we can find the indeed, a stronger condition then holds.
MISE-minimizing λ by simply solving (8). We can then use Theorem 2. Let λn be the GCV or REML value associ-
this λ to construct an “oracle” estimator of ω against which ated with the A-component FPCRR estimate. Assume that
data-dependent estimators can be compared. V∗T ∗ ∗T
A P PVA is nonsingular and that VA β -= 0. Then there ex-
T
In real data settings, if it is reasonable to assume that ω̂0 , the ists M > 0 such that P (λn > M) → 0 as n → ∞.
estimate derived by setting λ = 0, is close to the true ω, then
Next, consider the choice of the number of components by
Proposition 1 allows us to construct a plug-in estimate of ω by
any multifold CV scheme in which we form Dn divisions of
(a) calculating ω̂0 , (b) estimating the error variance by σ̂02 =
the n observations into training and validation sets of size nt
"y − ȳ1 − Xω̂0 "2 /(n − A − 1), (c) substituting ω̂0 , σ̂02 into (8), and nv = n − nt ; sum (over the Dn divisions) the prediction
and (d) inserting the resulting root λ̂ into (7). Using ω̂0 as a errors obtained by applying each training-set model to the cor-
“stand-in” for the true ω is motivated by the empirical result responding validation set; and choose the number of compo-
that when an estimate of ω and the associated estimate of σ 2 nents yielding the smallest sum. Using Theorem 1 to derive the
are substituted into (8), the resulting λ̂ gives rise to a smoother asymptotic prediction error leads to the following result.
Reiss and Ogden: Functional PCR and PLS 989

Theorem 3. Assume that (9) holds with ζ A -= 0. Suppose that 7.1 Simulation Studies
the number of FPCRR components is chosen by multifold CV,
Both the simulations and the real data validation described
as described in the previous paragraph, and that λn = oP (n1/2 ).
later were based on spectroscopic data sets described by Kali-
If nt , nv → ∞ and Dn = oP [(min{nt , nv })1/2 ], then for any
vas (1997) and publicly available at Phil Hopke’s ftp site
positive integer A1 < A, the A-component model will be cho-
([Link] ). The
sen over the A1 -component model with probability tending to 1
wheat data set consists of NIR spectra of 100 wheat samples,
as n → ∞.
measured in 2-nm intervals from 1,100 nm to 2,500 nm, and two
This result ensures that a “too-small” model will not be cho- response variables: the samples’ moisture content and protein
sen in the limit, but leaves open the possibility of a “larger- content. The gasoline data set consists of spectra of 60 gasoline
than-needed” model (cf. Shao 1993). We would argue that in samples, measured in 2-nm intervals from 900 to 1,700 nm, and
the context of presenting FPCRR as an alternative to the PBSE a response variable, octane number, available for each sample.
model, which is equivalent to FPCRR with all components (see To correct for a baseline shift observed in the wheat spectra
Sec. 5.3), ruling out “too-small” models is the more pressing [Fig. 1(a)], we used the once-differenced spectra [Fig. 1(b)].
concern. As is common practice for PCR/PLS with predictor data (such
Informally, the foregoing theorems say that FPCRR using as signals) with uniform units of measurement, the predictors
GCV or REML produces a consistent estimate of the spline were not scaled to unit variance.
coefficients if enough components are used, and that the latter Each set of simulations was conducted four times, with
condition will be met in the limit if the number of components each of the two data sets and each of two true coefficient
is chosen by multifold CV. functions. The two true coefficient functions were chosen to
represent different degrees of roughness. The first of these
7. COMPARISON OF MODELS
was obtained from the relatively smooth function f1 (t) =
Three sets of models were tested: (a) PBSE models with λ 2 sin(.5πt) + 4 sin(1.5πt) + 5 sin(2.5πt), t ∈ [0, 1], used in
chosen by GCV, with λ chosen by REML and with MISE- the simulations of Cardot et al. (2003), by transforming its do-
minimizing (oracle) λ (the plug-in model was excluded due to main to that of the spectra. The second, more “bumpy” func-
computational problems and very poor performance); (b) PCR tion was obtained
" by appropriately transforming the domain
models: ordinary (unsmoothed) PCR, the unpenalized B-spline of f2 (t) = 4j =1 aj exp[bj (t − cj )2 ], t ∈ [0, 1], a sum of four
PCR of Section 3.1, FPCRC , and FPCRR with λ chosen by Gaussian curves differing significantly from 0 on disjoint do-
GCV, with λ chosen by REML, and with true (oracle) and esti- mains (similar to a function used for simulations in Cardot
mated (plug-in) MISE-minimizing λ; and (c) PLS models: or- 2002). This function was constructed to have two peaks in re-
dinary PLS; unpenalized B-spline PLS, FPLSC , and FPLSR gions where the variance was high for the gasoline signals and
with λ set by GCV and by REML. low for the wheat signals and two troughs in regions where the

(a) (b)

Figure 1. Wheat spectra and estimated ω with moisture as outcome. (a) Raw wheat spectra; (b) differenced wheat spectra. Plots (c) and (d)
overlay the estimates obtained for the five training data sets by PCR and FPCRR –REML, respectively.
990 Journal of the American Statistical Association, September 2007

reverse was true. This was intended to facilitate comparisons of Table 2. Mean of L2 norm of error (root integrated squared error)
estimation accuracy at wavelengths of high versus low signal in estimating ω
variance.
Smooth function Bumpy function
To test the methods with both high and low signal-to-noise
ratios, two sets of responses, y = Xω + ε, were created in each Wheat Gasoline Wheat Gasoline
simulation by first generating iid standard normal error vectors .9 .6 .9 .6 .9 .6 .9 .6
and then multiplying these by error standard deviations σε cho-
PBSE–GCV .97 2.04 1.07 2.17 2.87 4.34 2.54 3.90
sen so that R 2 = var(Xω)/(var(Xω) + σε2 ) (i.e., the squared
PBSE–REML .65 .72 .36 .59 2.45 3.10 1.17 1.21
multiple correlation coefficient of the true model) would equal PBSE-oracle .42 .67 .33 .48 2.17 2.51 1.06 1.07
.9 and .6.
PCR 1.01 1.16 .81 1.01 .92 1.24 1.00 1.10
For each combination of the three factors (data set, true co-
B-spline PCR .98 1.14 .77 1.27 .91 1.36 1.20 1.63
efficient function, and R 2 ), we carried out 300 simulations FPCRC 1.08 1.43 1.57 2.76 1.71 2.82 2.03 3.36
for each method except FPLSC (by far the slowest method), FPCRR –GCV .80 1.01 .53 .73 1.08 1.50 1.12 1.19
for which 100 simulations were done. Spline-based methods FPCRR –REML .66 .75 .29 .42 .98 1.06 .95 .98
used cubic B-splines with 40 equally spaced internal knots. FPCRR -oracle .85 .86 .54 .65 .66 .72 .92 .94
For PCR- and PLS-type methods, the candidates for number FPCRR -plug-in .94 1.07 .69 1.06 .80 1.03 1.09 1.40
of components were 1–10, 12, and 15–40 at intervals of 5. PLS 1.01 1.18 .78 .94 .92 1.18 .99 1.12
We chose the number of components by eightfold CV. For the B-spline PLS .94 1.04 .74 1.15 .83 1.08 1.18 1.52
plug-in version of FPCRR , however, we simply set the number FPLSC 1.11 1.51 .82 1.46 1.00 1.22 1.36 1.80
of components to the number chosen by unpenalized B-spline FPLSR –GCV .87 1.04 .59 1.05 1.14 1.56 1.24 1.54
PCR, on the assumption that ω̂0 for this number of components FPLSR –REML .68 .74 .31 .44 .97 1.06 .95 .98
should serve as a reasonable surrogate for the true ω. The sim- NOTE: Scaled by the L2 norm of the true coefficient function.
ulations were programmed in R version 2.1.0 (R Development
Core Team 2005).
performers, except with the wheat spectra and smooth ω, for
7.2 Simulation Results which they were bested by PBSE–REML. The latter method
also did very well with the gasoline spectra and smooth ω, but
7.2.1 Prediction. Table 1 presents the average over all sim-
only moderately well with bumpy ω. FPCRR and FPLSR with
ulations of the empirical mean squared error of prediction
λ chosen by GCV did less well than the REML variants but
(1/n)(ω̂ − ω)T XT X(ω̂ − ω) for each method, true ω, data set,
consistently outperformed FPCRC and FPLSC . PCR and PLS
and signal-to-noise ratio. To make the columns comparable,
were always among the six worst nonoracle methods.
each was scaled by the appropriate value of var(Xω). Not sur-
prisingly, both oracle methods performed well. The PBSE ora- 7.2.2 Estimation. Table 2 presents the mean L2 norm of
cle method consistently had the lowest value with smooth ω, but the difference between the true and estimated coefficient func-
did poorly with bumpy ω. The FPCRR oracle method had one tions (mean root integrated squared error), scaled by the L2
1 "M
of the five lowest values in each column. Among the nonoracle norm of the true function, that is, M m=1 ((ω̂ − ω)T (ω̂ −
methods, FPCRR and FPLSR with REML were always the top ω)/(ωT ω))1/2 , where M is the number of simulations. Rela-

Table 1. Average scaled MSE of prediction

Smooth function Bumpy function

Wheat Gasoline Wheat Gasoline
.9 .6 .9 .6 .9 .6 .9 .6
PBSE–GCV .0065 .0327 .0127 .0703 .0103 .0445 .0161 .0774
PBSE–REML .0050 .0178 .0095 .0528 .0091 .0364 .0144 .0669
PBSE-oracle .0049 .0152 .0085 .0426 .0086 .1784 .0228 .0918
PCR .0124 .0492 .0279 .1176 .0102 .0535 .0154 .0687
B-spline PCR .0089 .0354 .0148 .0958 .0076 .0437 .0131 .0680
FPCRC .0094 .0334 .0200 .1029 .0113 .0483 .0156 .0756
FPCRR –GCV .0065 .0271 .0111 .0651 .0069 .0348 .0113 .0543
FPCRR –REML .0055 .0214 .0088 .0515 .0060 .0288 .0098 .0470
FPCRR -oracle .0063 .0211 .0097 .0608 .0075 .0323 .0126 .0641
FPCRR -plug-in .0088 .0338 .0129 .0824 .0347 .0712 .0118 .0600
PLS .0135 .0497 .0278 .1070 .0100 .0542 .0154 .0691
B-spline PLS .0081 .0308 .0149 .0945 .0069 .0398 .0136 .0706
FPLSC .0112 .0526 .0190 .1746 .0086 .1997 .0160 .0830
FPLSR –GCV .0072 .0281 .0128 .0750 .0075 .0369 .0130 .0643
FPLSR –REML .0057 .0203 .0091 .0511 .0063 .0290 .0101 .0483
NOTE: Based on 100 simulations for FPLSC , 300 simulations for all other methods.
Reiss and Ogden: Functional PCR and PLS 991

tive MISE (the foregoing expression without taking the square than the peaks. In this case, the plug-in version of FPCRR was
root) is sometimes used for comparisons of this type, but we the only method shown whose 90% confidence limits surround
took square roots to reduce the skewness of the distributions. the troughs quite closely.
The PBSE oracle method did very well with the smooth ω, and These figures indicate that all of the methods have difficulty
the FPCRR oracle method was always the best method with with estimation. Sections 8.2 and 8.3 provide more discussion
the bumpy ω. That the latter method did better at estimation focusing on PBSE and FPCRR .
than at prediction (see Table 1) is unsurprising, because it is ex-
pressly designed for optimal estimation. Among the 13 nonor-
7.3 Application to Real Data
acle methods, FPCRR –REML and FPLSR –REML appeared to
do well most consistently; they were always among the best 7.3.1 Split-Sample Validation. Split-sample validation was
three, except for the wheat spectra with the bumpy ω, for which conducted for each of the nonoracle methods with the gasoline
they fell in the middle. As shown in Table 1, PBSE–REML did and wheat signals and associated outcome measures. The in-
very well with the smooth coefficient function but less well with dices of the samples were divided into five sets of equal size
the bumpy one. FPCRC and FPLSC were generally relatively (samples 1, 6, 11, . . . ; samples 2, 7, 12, . . . ; and so on). For
unsuccessful, as were PCR and PLS. each such set V and each method, the sum of squared errors
Figures 2 and 3 display 90% empirical pointwise confidence "
of prediction i∈V (yi − yˆi )2 was calculated based on a model
intervals, based on the 300 simulations with the wheat spec-
fitted with the remaining samples as a training set. This quan-
tra, for four methods: PBSE–REML, ordinary PCR, FPCRR –
tity was then averaged over the five validation sets. The results
REML, and FPCRR with plug-in estimate of λ. These figures
are displayed in Table 3; to facilitate comparisons, each entry is
clearly show the instability of the PCR estimates. Whereas Ta-
bles 1 and 2 show that PBSE–REML fared relatively well for divided by the column minimum.
the wheat spectra with smooth function, the confidence inter- For the gasoline data, the REML versions of FPCRR and
vals (CIs) in Figure 2 do not indicate consistently accurate es- FPLSR had the best results. For prediction of protein content
timation of this function. For the bumpy ω, the other three from the wheat spectra, the top performers were B-spline PLS
methods did a much better job of estimating the bumps than and FPLSR –GCV. The protein values are known to have poor
PBSE–REML. As noted earlier, the bumpy ω has two troughs precision (Centner et al. 2000) and to be less closely related
in regions of high variance and two peaks in regions of low vari- to the spectra compared with the moisture values (Brenchley,
ance for the wheat spectra. Accordingly, as shown in Figure 3, Hörchner, and Kalivas 1997). These properties evidently made
the methods were much more effective at detecting the troughs it difficult to model protein by any of the methods.

Figure 2. Estimates of smooth ω based on simulations with wheat data, R 2 = .9 (—–, true ω; - - - - - -, empirical median; · · · · · ·, pointwise
90% CIs).
992 Journal of the American Statistical Association, September 2007

Figure 3. Estimates of bumpy ω based on simulations with wheat data, R 2 = .9 (—–, true ω; - - - - - -, empirical median; · · · · · ·, pointwise
90% CIs).

7.3.2 Moisture Content of Wheat. Because high moisture Section 7.1—in contrast to the other two validations, for which
content can lead to storage problems for wheat, the ability to FPCRR –REML markedly outperformed PCR—Figure 1 illus-
predict moisture in a wheat sample by spectroscopic meth- trates why FPCRR –REML represents an advance over PCR
ods, as opposed to the much more time-consuming methods of even for the moisture analysis. Figures 1(c) and 1(d) display
traditional “wet” chemistry, is particularly valuable. Whereas overlaid plots of the five training set estimates of ω for standard
FPCRR –REML is seen in Table 3 to have 9% higher prediction PCR [Fig. 1(c)] and for FPCRR –REML [Fig. 1(d)]. FPCRR
error than ordinary PCR for the moisture outcome described in produced estimates of ω that are more stable across training sets
and also more interpretable; a trough around 2,040 nm, which
Table 3. Split-sample validation results
seems to coincide with one of the minor peaks appearing in the
differenced spectra in Figure 1(b), emerges clearly as the coef-
Wheat–Moisture Wheat–Protein Gasoline–Octane ficient function’s most prominent feature. FPCRR may be said
to attain a more parsimonious representation for ω, in that 9-
PBSE–GCV 1.11 1.12
to 12-component models are selected for the five training sets,
PBSE–REML 1.24 2.35 1.12
versus 30- to 40-component models for PCR.
PCR 1 2.02 1.44
B-spline PCR 1.35 1.08 1.40 8. DISCUSSION
FPCRC 1.36 1.30 1.31
FPCRR –GCV 1.29 1.08 1.19 8.1 Regularized Components versus
FPCRR –REML 1.09 1.66 1 Regularized Regression
FPCRR -plug-in 2.66 2.76 1.29 A major goal of this study was to evaluate the relative
PLS 1.09 1.90 1.44 merits of regularized-component versus regularized-regression
B-spline PLS 1.24 1 1.14 versions of FPCR and FPLS. Regularized-component meth-
FPLSC 1.09 1.13 1.16 ods tend to represent ω̂ more parsimoniously in the sense of
FPLSR –GCV 1.27 1 1.17
choosing fewer components. However, this advantage appears
FPLSR –REML 1.11 1.52 1.06
to be offset by several advantages of regularized-regression
NOTE: Each data set was split into five equal subsets; for each, SSE of prediction was methods. The latter are faster, because λ can be chosen with-
computed based on a model fit with the remaining data. The mean SSE values are shown,
expressed as ratios with respect to the column minimum. The PBSE–GCV prediction er- out recourse to the double-cross (see Sec. 5.1). Moreover,
ror for moisture is based on only four training set models, because for one training set, regularized-regression methods offer improved performance for
the GCV criterion chose λ = 0, resulting in a computational singularity. The same error
occurred for all five training sets for PBSE–GCV with the protein data; hence the missing both estimation of ω and prediction of y, especially with λ cho-
entry. sen by REML. Thus, although FPCRC is related to the trun-
Reiss and Ogden: Functional PCR and PLS 993

cated Karhunen–Loève expansion estimator, for which conver- hand, FPCRR /FPLSR failed to consistently improve on PBSE
gence rates have been derived for both prediction error (Cai for smooth ω; evidently, the span of the leading components
and Hall 2006) and estimation error (Hall and Horowitz 2007), was not always sufficiently rich to approximate the smooth
FPCRC appears to not be the most successful method for our function well.
small samples. Figures 4 and 5 provide some insight into the variation in
8.2 Building on the Penalized B –Spline Expansion relative performance of PBSE and FPCRR . These plots com-
pare the true coefficient functions used in the simulations with
Cardot et al. (2003) showed that under reasonable assump- oracle estimates—by FPCRR with various numbers of compo-
tions, the MISE of the PBSE estimator is of order n−2p/(4p+1) nents and by PBSE—derived from a set of random outcomes.
in probability, where p depends on the specific assumptions but Also shown are the projections of the true functions on the span
is bounded above by the degree of the B-splines used. One of of the FPCR components, or on the span of the B-spline basis
the assumptions is that λ grows at a certain rate as n → ∞.
for PBSE. Such a projection represents the most accurate pos-
In our simulations, however, λ often was chosen to be essen-
sible estimate given the subspace of RN to which the estimate
tially infinite, in the sense that the resulting estimator was in-
is restricted.
distinguishable from a straight line. This occurred more often
with REML than with GCV, but the latter method’s tendency to For the bumpy function, Figure 4 shows that the troughs are
undersmooth in some cases caused it to fare less well overall. well estimated with as few as two components because, as men-
Improved methods for the choice of λ (perhaps along the lines tioned earlier, these occur in regions with high variation in the
of Kou and Efron 2002) may help optimize the performance of signals. The peaks, occurring in regions with little variation in
PBSE. the signals, are recovered to some extent only with a large num-
FPCRR /FPLSR with REML performed better than PBSE in ber of components or by PBSE, which comes at the price of less
the simulations for bumpy ω. Intuitively, this may be because accurate estimation of the troughs. But evidently, because the
PBSE with λ = 0 leads to an extremely bumpy function, so that trough regions are more important for prediction, estimating the
if some bumps are real, then the roughness penalty has diffi- troughs well is preferable to estimating both the troughs and the
culty distinguishing these from spurious bumps. On the other peaks somewhat well. On the other hand, Figure 5 shows that a

Figure 4. Estimating the bumpy coefficient function by PBSE and FPCRR . The solid line in each plot represents the true coefficient function;
the dashed line is the projection of this function on the span of the FPCR components (or of the B-spline basis, for PBSE); and the dotted line
is the oracle estimate (i.e., the estimate using the MISE-minimizing value of λ) based on a set of random outcomes generated using the wheat
spectra with R 2 = .9.
994 Journal of the American Statistical Association, September 2007

Figure 5. Estimating the smooth coefficient function by PBSE and FPCRR . See Figure 4 for an explanation.

large number of components is needed to estimate the smooth smooth ω), and correspondingly, FPCRR –REML outperformed
function well, but apparently, exceeding the required number PBSE-oracle. However, for the wheat data with smooth ω,
of components (as PBSE does) necessitates a very conservative FPCRR –REML usually chose a small number of components,
amount of smoothing to counter the risk of overfitting, resulting notwithstanding the large-sample result of Theorem 3. This
in inferior estimation. suboptimal choice may explain why FPCRR –REML did less
Figure 6 shows plots of f (x) = MISE(FPCRR with x compo- well than PBSE-oracle in this case.
nents)/MISE(PBSE-oracle) where λ for FPCRR was chosen The idea that CV-type criteria cannot always be counted on
to give the same degrees of freedom [trace of the hat matrix to choose the optimal number of components is reinforced by
defined in (6)] as the PBSE-oracle fit. (By definition, the ora- some preliminary findings with a positron emission tomogra-
cle choice of λ for FPCRR would be more advantageous for phy (PET) data set. Parsey et al. (2006) measured binding po-
FPCRR , but the equal-degrees-of-freedom choice allows for a tential (BP) of serotonin 1A receptors, using PET studies with
cleaner comparison.) Within each of the subfigures, f is plot- the radioligand [carbonyl-11 C]WAY 100635, in 28 depressed
ted for R 2 = .3, .6, and .9. The shape of f seems to depend subjects and 43 controls. BP is an index of the density of sero-
primarily on ω, secondarily on the data set, and least on R 2 . In tonin receptors, which are believed to play a key role in de-
agreement with Figures 4 and 5, these plots suggest that with pression. It is of interest to use such BP maps as predictors of
bumpy (smooth) ω, FPCRR tends to do best with a small (large) depression-related outcomes, such as the Hamilton depression
number of components. score. Marx and Eilers (2005) have extended their implementa-
tion of PBSE to multidimensional signals or images. With this
8.3 Cross-Validation-Type Criteria May Choose Too
data set, the number of images (n) is much smaller than the
Few Components
number of basis elements (K) needed to capture a reasonable
Although in practice the degrees of freedom are not the level of detail, whereas the PBSE convergence result of Car-
same for FPCRR –REML as for PBSE-oracle, Figure 6 may dot et al. (2003) assumes that K/n → 0. Partly for this reason,
shed some light on the varying performance of FPCRR –REML we expected FPCRR –REML to be more suitable than PBSE–
in the simulations. For three of the four data set/true coef- REML. To test this expectation, we carried out a simulation
ficient function combinations, CV usually chose the number study with outcomes generated using two-dimensional slices
of components well in the aforementioned sense for FPCRR – obtained from 68 of the 71 BP maps along with the true co-
REML (a small number for bumpy ω, and a large number for efficient function described by Reiss (2006). With the number
Reiss and Ogden: Functional PCR and PLS 995

Figure 6. Comparing MISE for PBSE-oracle and FPCRR . The MISE for FPCRR divided by the MISE for PBSE-oracle is plotted as a
function of number of components used for FPCRR , with R 2 = .3 (1), .6 (!), and .9 (P).

of components chosen by GCV rather than by CV, FPCRR re- Cardot, H. (2002), “Local Roughness Penalties for Regression Splines,” Com-
quired only about 25% more computation time than PBSE. putational Statistics, 17, 89–102.
Cardot, H., Ferraty, F., and Sarda, P. (2003), “Spline Estimators for the Func-
The relative performance of signal regression methods de- tional Linear Model,” Statistica Sinica, 13, 571–591.
pends in a nontrivial way on the eigenstructure of the signals Centner, V., Verdú-Andrés, J., Walczak, B., Jouan-Rimbaud, D., Despagne, F.,
Pasti, L., Poppi, R., Massart, D.-L., and de Noord, O. E. (2000), “Comparison
(Cardot et al. 2003; Hall and Horowitz 2007). In view of this, a of Multivariate Calibration Techniques Applied to Experimental NIR Data
key difference between the spectra studied earlier and the PET Sets,” Applied Spectroscopy, 54, 608–623.
images is that for the latter, a much larger number of PCs is de Jong, S. (1993), “SIMPLS: An Alternative Approach to Partial Least Squares
Regression,” Chemometrics and Intelligent Laboratory Systems, 18, 251–263.
needed to account for most of the variation. Thus we would ex- Goutis, C., and Fearn, T. (1996), “Partial Least Squares Regression on Smooth
pect that FPCRR would need a large number of components to Factors,” Journal of the American Statistical Association, 91, 627–632.
work well; but nevertheless, GCV often chose a small number Hall, P., and Horowitz, J. L. (2007), “Methodology and Convergence Rates for
Functional Linear Regression,” The Annals of Statistics, 35, 70–91.
of components. Imposing a minimum of 30 components im- Kalivas, J. H. (1997), “Two Data Sets of Near-Infrared Spectra,” Chemometrics
proved the results. Thus FPCRR had lower prediction error than and Intelligent Laboratory Systems, 37, 255–259.
PBSE in 135 out of 200 simulations, but with the 30-component Kou, S., and Efron, B. (2002), “Smoothers and the Cp , GML, and EE Criteria:
A Geometric Approach,” Journal of the American Statistical Association, 97,
minimum, this number increased to 156. Similarly, FPCRR had 766–782.
lower estimation error than PBSE in 160 of the simulations Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979), Multivariate Analysis, New
York: Academic Press.
without, and 186 with, the 30-component minimum. Marx, B. D., and Eilers, P. H. C. (1999), “Generalized Linear Regression on
We conclude that FPCRR /FPLSR with REML may often Sampled Signals and Curves: A P –Spline Approach,” Technometrics, 41,
outperform not only other forms of FPCR/FPLS, but also ex- 1–13.
(2002), “Multivariate Calibration Stability: A Comparison of Meth-
isting approaches to signal regression, such as PBSE and un- ods,” Journal of Chemometrics, 16, 129–140.
smoothed PCR/PLS. At the same time, further research is (2005), “Multidimensional Penalized Signal Regression,” Technomet-
needed on the optimal choice of both the number of compo- rics, 47, 13–22.
Massy, W. F. (1965), “Principal Components Regression in Exploratory Statis-
nents and the smoothing parameter. tical Research,” Journal of the American Statistical Association, 60, 234–256.
Parsey, R. V., Oquendo, M. A., Ogden, R. T., Olvet, D. M., Simpson, N., Huang,
[Received October 2005. Revised March 2007.] Y., Van Heertum, R. L., Arango, V., and Mann, J. J. (2006), “Altered Serotonin
1A Binding in Major Depression: A [Carbonyl-C-11]WAY100635 Positron
REFERENCES Emission Tomography Study,” Biological Psychiatry, 59, 106–113.
R Development Core Team (2005), R: A Language and Environment for Sta-
Brenchley, J. M., Hörchner, U., and Kalivas, J. H. (1997), “Wavelength Selec- tistical Computing, Vienna, Austria: R Foundation for Statistical Computing,
tion Characterization for NIR Spectra,” Applied Spectroscopy, 51, 689–699. [Link]
Cai, T. T., and Hall, P. (2006), “Prediction in Functional Linear Regression,” Ramsay, J. O., and Silverman, B. W. (2005), Functional Data Analysis
The Annals of Statistics, 34, 2159–2179. (2nd ed.), New York: Springer-Verlag.
996 Journal of the American Statistical Association, September 2007

Reiss, P. T. (2006), “Regression With Signals and Images as Predictors,” un- Stone, M. (1974), “Cross-Validatory Choice and Assessment of Statistical Pre-
published doctoral dissertation, Columbia University, Dept. of Biostatistics. dictions,” Journal of the Royal Statistical Society, Ser. B, 36, 111–147.
Ruppert, D., Wand, M. P., and Carroll, R. J. (2003), Semiparametric Regression, Wahba, G. (1990), Spline Models for Observational Data, Philadelphia: Society
Cambridge, U.K.: Cambridge University Press. for Industrial and Applied Mathematics.
Shao, J. (1993), “Linear Model Selection by Cross-Validation,” Journal of the Wand, M. P. (1999), “On the Optimal Amount of Smoothing in Penalised Spline
American Statistical Association, 88, 486–494. Regression,” Biometrika, 86, 936–940.
Silverman, B. W. (1996), “Smoothed Functional Principal Components Analy- Wang, Y. (1998), “Smoothing Spline Models With Correlated Random Errors,”
sis by Choice of Norm,” The Annals of Statistics, 24, 1–24. Journal of the American Statistical Association, 93, 341–348.

A Simulation Study On Comparison of Prediction Methods When Only A Few Components Are Relevant
No ratings yet
A Simulation Study On Comparison of Prediction Methods When Only A Few Components Are Relevant
21 pages
Iterative Predictor Weighting (IPW) PLS: A Technique For The Elimination of Useless Predictors in Regression Problems
No ratings yet
Iterative Predictor Weighting (IPW) PLS: A Technique For The Elimination of Useless Predictors in Regression Problems
21 pages
Jurnal Internasional 4
100% (1)
Jurnal Internasional 4
9 pages
Principal Components Regression
No ratings yet
Principal Components Regression
14 pages
FERRÉ (2006) - A - Multilayer Preceptón With Functional Inputs
No ratings yet
FERRÉ (2006) - A - Multilayer Preceptón With Functional Inputs
17 pages
BBL 016
No ratings yet
BBL 016
13 pages
User Friendly Multivariate Calibration GP
100% (4)
User Friendly Multivariate Calibration GP
354 pages
Overview and Recent Advances in Partial Least Squares: Lecture Notes in Computer Science November 2005
No ratings yet
Overview and Recent Advances in Partial Least Squares: Lecture Notes in Computer Science November 2005
19 pages
Methods For Scalar-On-Function-Regression
No ratings yet
Methods For Scalar-On-Function-Regression
26 pages
The Geometry of Partial Least Squares
No ratings yet
The Geometry of Partial Least Squares
28 pages
Big Data Challenges in Life Sciences
No ratings yet
Big Data Challenges in Life Sciences
94 pages
Rsimpls
No ratings yet
Rsimpls
37 pages
Boulesteix, Strimmer: Partial Least Squares: A Versatile Tool For The Analysis of High-Dimensional Genomic Data
No ratings yet
Boulesteix, Strimmer: Partial Least Squares: A Versatile Tool For The Analysis of High-Dimensional Genomic Data
30 pages
Ojowoldhwoldsetcpls Wold Siam
No ratings yet
Ojowoldhwoldsetcpls Wold Siam
9 pages
An Overview of Methods in Linear Least-Squares Regression
No ratings yet
An Overview of Methods in Linear Least-Squares Regression
69 pages
Orthogonal Signal Correction of Near-Infrared Spectra: Svante Wold, Henrik Antti, Fredrik Lindgren, Jerker Ohman
No ratings yet
Orthogonal Signal Correction of Near-Infrared Spectra: Svante Wold, Henrik Antti, Fredrik Lindgren, Jerker Ohman
11 pages
Partial Least Squares Regression
100% (1)
Partial Least Squares Regression
448 pages
PLS Modeling of Input/Output Data
No ratings yet
PLS Modeling of Input/Output Data
18 pages
PACE: Functional Data Analysis Overview
No ratings yet
PACE: Functional Data Analysis Overview
37 pages
Terbraak 1998
No ratings yet
Terbraak 1998
14 pages
An Introduction To Partial Least Squares Regression: Randall D. Tobias, SAS Institute Inc., Cary, NC
No ratings yet
An Introduction To Partial Least Squares Regression: Randall D. Tobias, SAS Institute Inc., Cary, NC
8 pages
A Review of Variable Selection Methods in Partial Least Squares Regression
No ratings yet
A Review of Variable Selection Methods in Partial Least Squares Regression
8 pages
A Universal Selection Method in Linear Regression Models: Eckhard Liebscher
No ratings yet
A Universal Selection Method in Linear Regression Models: Eckhard Liebscher
10 pages
How To Make Pls Consistent
No ratings yet
How To Make Pls Consistent
6 pages
Partial Least Squares Regression and Projection On Latent Structure Regression (PLS Regression)
No ratings yet
Partial Least Squares Regression and Projection On Latent Structure Regression (PLS Regression)
10 pages
pls Package: PCR & PLSR in R
No ratings yet
pls Package: PCR & PLSR in R
23 pages
Averaged and Weighted Average Partial Least Squares: M.H. Zhang, Q.S. Xu, D.L. Massart
No ratings yet
Averaged and Weighted Average Partial Least Squares: M.H. Zhang, Q.S. Xu, D.L. Massart
11 pages
Soderstrom T., Stoica P. System Identification (PH 1989) (ISBN S
100% (6)
Soderstrom T., Stoica P. System Identification (PH 1989) (ISBN S
637 pages
Mallows - Some Comments On CP
No ratings yet
Mallows - Some Comments On CP
16 pages
Dynamic Behavior of Chemical Processes
No ratings yet
Dynamic Behavior of Chemical Processes
45 pages
Partial Least Square
No ratings yet
Partial Least Square
29 pages
PLS Algorithms in Multivariate Calibration
No ratings yet
PLS Algorithms in Multivariate Calibration
17 pages
Aos 1681
No ratings yet
Aos 1681
28 pages
1 s2.0 S0169743902000692 Main PDF
No ratings yet
1 s2.0 S0169743902000692 Main PDF
20 pages
Principal Component Regression, Partial Least Squares, Linear Classification
No ratings yet
Principal Component Regression, Partial Least Squares, Linear Classification
19 pages
Statistical Methods for Chemical Engineers
No ratings yet
Statistical Methods for Chemical Engineers
61 pages
Dejong 1993
No ratings yet
Dejong 1993
13 pages
2020 21 Report
No ratings yet
2020 21 Report
52 pages
University of Minnesota and Facultad de Ingenier Ia Qu Imica, UNL. Researcher of CONICET
No ratings yet
University of Minnesota and Facultad de Ingenier Ia Qu Imica, UNL. Researcher of CONICET
21 pages
(Monographs On Statistics and Applied Probability (Series) 161) Li, Bing - Sufficient Dimension Reduction - Methods and Applications With R-CRC Press (2018)
100% (1)
(Monographs On Statistics and Applied Probability (Series) 161) Li, Bing - Sufficient Dimension Reduction - Methods and Applications With R-CRC Press (2018)
307 pages
System Identification: Prediction-Error Methods
No ratings yet
System Identification: Prediction-Error Methods
53 pages
10.3934 Math.2021633
No ratings yet
10.3934 Math.2021633
17 pages
Linear Fitting with Noisy Data
No ratings yet
Linear Fitting with Noisy Data
38 pages
Roumen,+Bolboaca Biomath
No ratings yet
Roumen,+Bolboaca Biomath
11 pages
Unit II ML
No ratings yet
Unit II ML
14 pages
Partial Inverse Regression: Biometrika (2007), 94, 3, Pp. 615-625 Printed in Great Britain
No ratings yet
Partial Inverse Regression: Biometrika (2007), 94, 3, Pp. 615-625 Printed in Great Britain
12 pages
Intronumericalrecipes v01 Chapter02 Regress
No ratings yet
Intronumericalrecipes v01 Chapter02 Regress
15 pages
PLS Tutorial PDF
No ratings yet
PLS Tutorial PDF
12 pages
Partial Least Squares A Tutorial
No ratings yet
Partial Least Squares A Tutorial
12 pages
Intro to Simple Linear Regression
No ratings yet
Intro to Simple Linear Regression
11 pages
From Dummy Regression To Prior Probabili
No ratings yet
From Dummy Regression To Prior Probabili
8 pages
Outline and Equation Sheet For M E 345: Every Additive Term in An Equation Must Have The Same Dimensions
No ratings yet
Outline and Equation Sheet For M E 345: Every Additive Term in An Equation Must Have The Same Dimensions
7 pages
The Annals of Statistics 10.1214/009053606000000830 Institute of Mathematical Statistics
No ratings yet
The Annals of Statistics 10.1214/009053606000000830 Institute of Mathematical Statistics
22 pages
Acs Analchem 0c02178
No ratings yet
Acs Analchem 0c02178
9 pages
PLS Applications in Industrial RDP
No ratings yet
PLS Applications in Industrial RDP
44 pages
REG2022
No ratings yet
REG2022
313 pages
Regression Basics for Epidemiologists
No ratings yet
Regression Basics for Epidemiologists
18 pages
Stepwise Regression Techniques Explained
No ratings yet
Stepwise Regression Techniques Explained
15 pages
Q4 Statistics and Probability 11 - Module 2
100% (2)
Q4 Statistics and Probability 11 - Module 2
16 pages
Silva Dias Gaspar Brito 2013
No ratings yet
Silva Dias Gaspar Brito 2013
10 pages
General Examples Using The Crow Model
No ratings yet
General Examples Using The Crow Model
10 pages
First-Grade Classroom Behavior - Its Short - and Long-Term Consequences For School Performance
No ratings yet
First-Grade Classroom Behavior - Its Short - and Long-Term Consequences For School Performance
15 pages
Lecture 8 Application of VAR Model
100% (1)
Lecture 8 Application of VAR Model
22 pages
Model Question Paper Business Statistics BCC 104: Option A Option B Option C Option D Correct Answer Key
No ratings yet
Model Question Paper Business Statistics BCC 104: Option A Option B Option C Option D Correct Answer Key
4 pages
Inferential Statistics: Sampling & Estimation
No ratings yet
Inferential Statistics: Sampling & Estimation
39 pages
Solved Data Mining Warehousing Paper
No ratings yet
Solved Data Mining Warehousing Paper
4 pages
Unit 1
No ratings yet
Unit 1
50 pages
Chocolate Spending Analysis
No ratings yet
Chocolate Spending Analysis
5 pages
CS 4407 Unit 3 Graded Quiz Review
No ratings yet
CS 4407 Unit 3 Graded Quiz Review
11 pages
GRS
No ratings yet
GRS
9 pages
Hypothesis Testing in Python
No ratings yet
Hypothesis Testing in Python
149 pages
For Event Studies
No ratings yet
For Event Studies
54 pages
3.practice Assignment 3.1 - Not Graded
No ratings yet
3.practice Assignment 3.1 - Not Graded
16 pages
Methods Ecol Evol - 2012 - Nakagawa - A General and Simple Method For Obtaining R2 From Generalized Linear Mixed Effects
No ratings yet
Methods Ecol Evol - 2012 - Nakagawa - A General and Simple Method For Obtaining R2 From Generalized Linear Mixed Effects
10 pages
Statistics Assignment Analysis
No ratings yet
Statistics Assignment Analysis
22 pages
Xi Unit Test-3 Economics
No ratings yet
Xi Unit Test-3 Economics
4 pages
Probability & Statistics Facts and Formulae: Guides To Statistical Information 1
No ratings yet
Probability & Statistics Facts and Formulae: Guides To Statistical Information 1
4 pages
Demand Forecasting Essentials
No ratings yet
Demand Forecasting Essentials
2 pages
Business Analytics Syllabus
No ratings yet
Business Analytics Syllabus
6 pages
Bow - Stat Quarter Iii Sy 2023 2024
100% (1)
Bow - Stat Quarter Iii Sy 2023 2024
3 pages
Lesson 3 and Lesson 4
No ratings yet
Lesson 3 and Lesson 4
20 pages
Forecasting by Machine Learning Techniques and Econometrics A Review
No ratings yet
Forecasting by Machine Learning Techniques and Econometrics A Review
7 pages
MMW L10
No ratings yet
MMW L10
2 pages
Test For Heteroskedasticity in Logit - Probit Models - Statalist
No ratings yet
Test For Heteroskedasticity in Logit - Probit Models - Statalist
3 pages
Statistical Tools Used in Quantitative Research
No ratings yet
Statistical Tools Used in Quantitative Research
3 pages
Ch5 Big Data and Analytics Definitions
No ratings yet
Ch5 Big Data and Analytics Definitions
2 pages
IFM Chapter 09 - For Spring 2024
No ratings yet
IFM Chapter 09 - For Spring 2024
23 pages
Simple Linear Regression Theory Answers
No ratings yet
Simple Linear Regression Theory Answers
2 pages

FPCR Jasa

Uploaded by

FPCR Jasa

Uploaded by

Functional Principal Component Regression and

Functional Partial Least Squares

1. INTRODUCTION chemometrics. In a typical application, the signals are near-

3.4 FPCRR 5. CHOICE OF SMOOTHING PARAMETERS

over ζ ∈ RA , where VA consists of the first A columns of V 5.1 Number of Components

Table 1. Average scaled MSE of prediction

Smooth function Bumpy function

You might also like