LectureNotes RBF
by Stefano De Marchi
Department of Mathematics
University of Padua
These lecture notes were inspired mainly by two seminal books on the topic, by Holger
Wendland [13] and by Gregory E. Fasshauer [6]. The first presents the more theoretical aspects,
while the second also provides useful Matlab functions for better understanding the theory
and all the applications discussed. The notes were then used during a short teaching
visit of the author to the University of Antwerp, for the Erasmus Teaching Staff Mobility.
People interested in radial basis functions can refer to the wide literature available,
which has grown very fast, especially in the last two decades. The popularity of radial basis
functions can be understood by means of the following analogy. In many cooking
recipes parsley is used to give flavour and colour to dishes. Radial basis functions
can be considered a mathematical parsley, since they have been used in all mathematical
problems requiring a powerful, i.e. efficient and stable, approximation tool.
These four lectures were designed for students without a strong background in functional
analysis, so in presenting the topics I deliberately avoid, whenever possible,
introducing functional analysis concepts. This is a serious omission, but I hope that the readers
of these notes will not be too critical of me.
Moreover, these are only four introductory lectures on the topic, and many important
aspects and applications are not considered for lack of time. Every lecture also provides a
set of exercises solvable with Matlab. This choice was made with the aim of making
the discussion more interesting from both the numerical and the geometrical point of view.
I do hope that after this brief introduction, interested students will be encouraged
to explore this fascinating mathematical tool.
Stefano De Marchi
Antwerp October 14, 2013.
Contents
1.1 Motivations
1.4 Exercises
2.2 Exercises
3.1 Exercises
Stefano De Marchi Four lectures on radial basis functions
4 Error estimates
4.7 Exercises
List of Figures
1.2 A typical basis function for the Euclidean distance matrix fit, $B_k(x) = \|x - x_k\|_2$ with $x_k = 0$ and $d = 2$.
1.3 Tensor products of equally spaced points and tensor products of Chebyshev points.
2.3 First row: G-L when $d = 1$ and $n = 1, 2$. Second row: G-L for $d = 2$ and $n = 1, 2$.
2.5 Matérn RBF for different $\beta$, respectively. As before the shape parameter is $\varepsilon = 10$.
4.2 Power function for the Gaussian kernel with $\varepsilon = 6$ on a grid of 81 uniform, Chebyshev and Halton points, respectively.
4.3 Trial and error strategy for the interpolation of the 1-dimensional sinc function by the Gaussian for $\varepsilon \in [0, 20]$, taking 100 values of $\varepsilon$ and for different equispaced data points.
4.4 The sup-norm of the 2D power function on uniform points for the Gaussian kernel, varying $\varepsilon \in [0, 20]$ for different values of $N$.
4.5 LOOCV 1d: sup norm of the error on Chebyshev points for the Gaussian kernel, varying $\varepsilon \in [0, 20]$ for different values of $N$.
Lecture 1
1.1 Motivations
In many cases the data are scattered, that is, they have no special structure, and they
come in large amounts (several millions of points). Moreover, in several applications the data sites
lie in a high-dimensional space. Hence, for a unifying approach, methods have been developed
in the last decades with the aim of handling all these (new) situations.
We start from the univariate setting. We suppose that the data sites are ordered as follows
$$X : \quad a < x_1 < x_2 < \cdots < x_N < b \qquad (1.1)$$
and that we have some data values $f_1, \ldots, f_N$ to be interpolated at the data set $X$. What we
want to do, mathematically speaking, is to find $s : [a, b] \to \mathbb{R}$ with the property $s(x_j) = f_j$
for all $j = 1, \ldots, N$.
Notice that the data values $f_j$ do not necessarily stem from a function $f$, but we shall
keep this possibility in mind for reasons that will become clearer later.
In the univariate setting, a simple solution of the above problem consists in taking
$s$ as a polynomial $p$ of degree at most $N - 1$. However, as we will see later, this solution
does not work in higher dimensions. Remaining in the univariate case, no one with
Let us briefly review the main properties of univariate splines, especially in the case of
cubic splines. The set of cubic splines corresponding to the subdivision (1.1) is the space
$$S_3(X) = \{ s \in C^2[a, b] : s|_{[x_i, x_{i+1}]} \in \mathbb{P}_3(\mathbb{R}),\ 0 \le i \le N \}$$
with $a = x_0$ and $x_{N+1} = b$. The space $S_3(X)$ has dimension $N + 4$, so that the interpolation
conditions $s(x_i) = f_i$, $1 \le i \le N$, are not sufficient to guarantee a unique interpolant. To
enforce uniqueness, we restrict to natural splines, i.e. the set
$$N_3(X) = \{ s \in S_3(X) : s|_{[a, x_1]},\ s|_{[x_N, b]} \in \mathbb{P}_1(\mathbb{R}) \}$$
that consists of all cubic splines that are linear polynomials on the outer intervals $[a, x_1]$ and
$[x_N, b]$. It is easy to see that a cubic spline $s$ is a natural spline if and only if it satisfies
$s''(x_1) = s^{(3)}(x_1) = 0$ and $s''(x_N) = s^{(3)}(x_N) = 0$. With this choice we have imposed 4
additional conditions on the space, so it is natural to expect that $\dim(N_3(X)) = N$.
Even more, it can be shown that the initial interpolation problem has a unique solution in
$N_3(X)$.
2. An interpolating natural cubic spline satisfies a minimal norm property. Assume that
$f$ comes from the Sobolev space $H^2[a, b]$, i.e. $f$ is continuous in $[a, b]$ and has weak
first and second order derivatives in $L_2[a, b]$ (a more precise definition will be given
later or can be found in any book on functional analysis). Assume further that $f$ is
such that $f(x_j) = f_j$, $1 \le j \le N$. If $s_{f,X}$ denotes the natural cubic spline interpolant
(at the data set $X$) then
$$\|s_{f,X}''\|_{L_2[a,b]} \le \|f''\|_{L_2[a,b]} ,$$
indicating that the natural cubic spline interpolant is the function from $H^2[a, b]$ that
minimizes the semi-norm $\|f''\|^2_{L_2[a,b]}$ under the conditions $f(x_j) = f_j$, $1 \le j \le N$.
3. They possess a local basis called B-splines. This basis, which is more stable than any
other, can be defined by recursion, by divided differences of the truncated cubic power
$p(x; t) = (x - t)^3_+$, or by convolution. Here $x_+$ takes the value of $x$ for nonnegative $x$
and zero otherwise.
Readers interested in splines and their many properties can refer to the
fundamental books by Schumaker [12] or de Boor [3].
Remarks
Property 1, combined with the local basis, not only allows the efficient computation
and evaluation of splines but is also the key ingredient for a simple error analysis.
Hence, the natural way of extending splines to the multivariate setting is
based on this property. To this end, a bounded region $\Omega \subset \mathbb{R}^d$ is partitioned
into essentially disjoint regions $\{\Omega_j\}_{j=1}^{N}$ (patches). Then the spline space consists of
those functions $s$ that are piecewise polynomials on each $\Omega_j$ and that have smooth
connections on the boundaries of two adjacent patches. In two dimensions the most
popular subdivision of a polygonal region is by triangles. It is interesting to note
that even in this simple case, the dimension of the spline space is in general unknown.
In higher dimensions it is not at all clear what an appropriate replacement
for the triangulation would be. Hence, even if great progress has been made in the
2-dimensional setting, the method is not suited for general dimensions.
Another possible generalization to the multivariate setting is based on property
3. In particular, a construction based on convolution led to the so-called box splines.
Again, while the problem can be handled in 2 dimensions, for higher dimensions it is
still an open problem.
Property 2 is the motivation for a general framework in higher dimensions. This
approach has led to a beautiful theory in which all space dimensions can be
handled in the same way. The resulting approximation spaces no longer consist of piecewise
polynomials, so they cannot be called splines. The new functions are known under the
fashionable name of Radial Basis Functions (in what follows we refer to them simply as
RBF).
To get a better idea, let us recall that the set $S_3(X)$ has the basis of truncated powers
$(\cdot - x_j)^3_+$, $1 \le j \le N$, plus an arbitrary basis for $\mathbb{P}_3(\mathbb{R})$. Hence every $s \in N_3(X)$ can be
represented in the form
$$s(x) = \sum_{j=1}^{N} a_j (x - x_j)^3_+ + \sum_{j=0}^{3} b_j x^j, \qquad x \in [a, b] . \qquad (1.4)$$
Because $s$ is a natural spline we have the additional information that $s$ is linear on the two
outer intervals. That is, on $[a, x_1]$ the spline is simply $s(x) = b_0 + b_1 x$ (since $b_2 = b_3 = 0$).
Thus (1.4) becomes
$$s(x) = \sum_{j=1}^{N} a_j (x - x_j)^3_+ + b_0 + b_1 x, \qquad x \in [a, b] . \qquad (1.5)$$
$$s(x) = \sum_{l=0}^{3} \binom{3}{l} (-1)^{3-l} \sum_{j=1}^{N} a_j x_j^{3-l} x^l + b_0 + b_1 x, \qquad x \in [x_N, b] . \qquad (1.6)$$
$$\sum_{j=1}^{N} a_j = \sum_{j=1}^{N} a_j x_j = 0 . \qquad (1.7)$$
where $\phi(r) = r^3$, $r \ge 0$, and $p \in \mathbb{P}_1(\mathbb{R})$. The coefficients $\{a_j\}$ have to satisfy the relations
(1.7). Conversely, for every set $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}$ of pairwise distinct points and
for every $f \in \mathbb{R}^N$ there exists a function $s$ of the form (1.8), with (1.7), that interpolates
the data, i.e. $s(x_j) = f_j$, $1 \le j \le N$.
This is the starting point for understanding the origin of RBF. The resulting interpolant
is, up to a low-degree polynomial, a linear combination of shifts of a radial function $\Phi =
\phi(|\cdot|)$. The function is called radial because it is the composition of a univariate function
with the Euclidean norm on $\mathbb{R}$.
The generalization to $\mathbb{R}^d$ is straightforward, and there the name radial becomes even more
evident. In fact
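$$s(x) = \sum_{j=1}^{N} a_j \phi(\|x - x_j\|_2) + p(x), \qquad x \in \mathbb{R}^d , \qquad (1.9)$$
where $\phi : [0, \infty) \to \mathbb{R}$ is a fixed univariate function and $p \in \mathbb{P}_{m-1}(\mathbb{R}^d)$ is a low degree
$d$-variate polynomial. The additional conditions on the coefficients (corresponding to (1.7))
become
$$\sum_{j=1}^{N} a_j q(x_j) = 0, \qquad \forall\, q \in \mathbb{P}_{m-1}(\mathbb{R}^d) . \qquad (1.10)$$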
In many cases (see Lecture 2), we can avoid the side conditions on the coefficients
(1.10). In these cases the interpolation problem has a solution if the matrix
$$A_{\phi,X} := \big( \phi(\|x_i - x_j\|_2) \big)_{1 \le i,j \le N}$$
is invertible. Do functions $\phi$ exist for which this holds for every set $X$ of pairwise distinct points?
The answer is affirmative. Examples of functions that lead to nonsingular matrices
are: the Gaussians $\phi(r) = e^{-\alpha r^2}$, $\alpha > 0$, the inverse multiquadric $\phi(r) = (c^2 + r^2)^{-1/2}$
and the multiquadric $\phi(r) = (c^2 + r^2)^{1/2}$, $c > 0$. In the first two cases it is even true that
the matrix $A_{\phi,X}$ is always positive definite (and so invertible).
Remark. In what follows, in the context of RBFs, instead of $A_{\phi,X}$ we shall simply write
$A$, meaning the interpolation matrix with radial basis functions.
In many disciplines one faces the following problem: we are given a set of data (measurements,
and the locations at which these measurements are taken, ...) and we want to find a rule
which allows us to obtain information about the process we are studying also at locations different
from those at which the measurements are taken (or provided).
The main reasons why we are interested in such a problem in our setting are:
Solving the interpolation problem under this assumption leads to a system of linear
equations of the form
Ac = y,
where the entries of the interpolation matrix $A$ are given by $A_{jk} = B_k(x_j)$, $j, k = 1, \ldots, N$,
$c = [c_1, \ldots, c_N]^T$, and $y = [y_1, \ldots, y_N]^T$.
The scattered data fitting problem will be well-posed, that is a solution to the problem
will exist and be unique, if and only if the matrix A is non-singular.
Definition 1. Let the finite-dimensional linear function space $\mathcal{B} \subseteq C(\Omega)$ have a basis
$\{B_1, \ldots, B_N\}$. Then $\mathcal{B}$ is a Haar space on $\Omega$ if
$$\det(A) \neq 0$$
for any set of distinct points $x_1, \ldots, x_N \in \Omega$. Here, $A$ is the matrix with entries $A_{i,j} =
B_j(x_i)$.
The existence of a Haar space guarantees the invertibility of the matrix $A$. In the
univariate setting it is well known that one can interpolate arbitrary data at $N$ distinct
data sites using a polynomial of degree $N - 1$. This means that the polynomials of degree
$N - 1$ form an $N$-dimensional Haar space for the set of distinct points $X = \{x_1, \ldots, x_N\}$.
Theorem 1. (Haar-Mairhuber-Curtis)
If $\Omega \subset \mathbb{R}^d$, $d \ge 2$, contains an interior point, then there exist no Haar spaces of continuous
functions except for the 1-dimensional case.
Proof. Let $d \ge 2$ and assume that $\mathcal{B}$ is a Haar space with basis $\{B_1, \ldots, B_N\}$ with
$N \ge 2$. We show that this leads to a contradiction. In fact, let $x_1, \ldots, x_N$ be a set of $N$
distinct points in $\Omega \subset \mathbb{R}^d$ and $A$ the matrix such that $A_{j,k} = B_k(x_j)$, $j, k = 1, \ldots, N$. By the
above definition of Haar space $\det(A) \neq 0$. Now, consider a closed path $P$ in $\Omega$ connecting
only $x_1$ and $x_2$. This is possible since by assumption $\Omega$ contains an interior point. We
can then exchange $x_1$ and $x_2$ by moving them continuously along $P$ (without interfering
with any other point $x_j$). This means that rows 1 and 2 of the matrix $A$ have been exchanged,
and so the determinant has changed sign. Since the determinant is a continuous function
of $x_1$ and $x_2$ we must have had $\det(A) = 0$ at some point along $P$. This contradicts the
fact that $\det(A) \neq 0$.
For example $f_d(x) = 4^d \prod_{k=1}^{d} x_k (1 - x_k)$, $x = (x_1, \ldots, x_d) \in [0, 1]^d$, which is zero on the
boundary of the unit cube in $\mathbb{R}^d$ and has a maximum value of one at the center of the
$d$-dimensional cube.
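The two stated properties of the test function (zero on the boundary, maximum one at the center) can be checked with a minimal Python sketch. The function name `f_d` and the sample points are chosen here for illustration only; they are not part of the Matlab material accompanying the notes.

```python
from math import prod

def f_d(x):
    """Test function f_d(x) = 4^d * prod_k x_k (1 - x_k) on the unit cube [0, 1]^d."""
    d = len(x)
    return 4 ** d * prod(xk * (1.0 - xk) for xk in x)

center = [0.5, 0.5, 0.5]            # center of the unit cube for d = 3
boundary_point = [0.0, 0.3, 0.7]    # one coordinate lies on the boundary

print(f_d(center))           # value at the center
print(f_d(boundary_point))   # value on the boundary
```

Any dimension can be tried by changing the length of the input list.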
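Assume for now that $d = 1$. We have already seen that for small $N$ one can use
univariate polynomials; if $N$ is relatively large it is better to use splines (the simplest
approach is that of the $C^0$ piecewise linear splines, which "connect the dots"), with basis
$$\{B_k = |\cdot - x_k|, \ k = 1, \ldots, N\} .$$
The interpolant then takes the form
$$P_f(x) = \sum_{k=1}^{N} c_k |x - x_k|, \qquad x \in [0, 1],$$
with the coefficients determined by the interpolation conditions
$$P_f(x_j) = f_1(x_j), \qquad j = 1, \ldots, N .$$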
Some observations.
The points xk to which the basic function is shifted to form the basis functions, are
usually referred to as centers or knots.
Technically, one could choose these centers different from the data sites. However,
usually centers coincide with the data sites. This simplifies the analysis of the method,
and is sufficient for many applications. In fact, relatively little is known about the
case when centers and data sites differ.
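Now the coefficients $c_k$ in the scattered data interpolation problem are found by solving
the linear system
$$\begin{pmatrix}
|x_1 - x_1| & |x_1 - x_2| & \cdots & |x_1 - x_N| \\
|x_2 - x_1| & |x_2 - x_2| & \cdots & |x_2 - x_N| \\
\vdots & \vdots & \ddots & \vdots \\
|x_N - x_1| & |x_N - x_2| & \cdots & |x_N - x_N|
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{pmatrix}
=
\begin{pmatrix} f_1(x_1) \\ f_1(x_2) \\ \vdots \\ f_1(x_N) \end{pmatrix}
\qquad (1.12)$$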
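To make the linear system concrete, here is a small self-contained Python sketch of the univariate distance matrix fit. It is only an illustration under stated assumptions: the helper `solve` (plain Gaussian elimination with partial pivoting), the choice of $N = 9$ equispaced centers and the names `f1` and `Pf` are introduced here and are not the Matlab routines used in the exercises.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]      # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]               # row swap (the diagonal of A is zero)
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def f1(x):
    """Univariate test function f_1(x) = 4 x (1 - x)."""
    return 4.0 * x * (1.0 - x)

N = 9                                                 # assumed number of centers
centers = [j / (N - 1) for j in range(N)]             # equispaced data sites in [0, 1]
A = [[abs(xj - xk) for xk in centers] for xj in centers]
c = solve(A, [f1(xj) for xj in centers])

def Pf(x):
    """Interpolant P_f(x) = sum_k c_k |x - x_k|."""
    return sum(ck * abs(x - xk) for ck, xk in zip(c, centers))

max_residual = max(abs(Pf(xj) - f1(xj)) for xj in centers)
print(max_residual)   # interpolation residual at the data sites
```

Between two consecutive centers the interpolant is linear, so in one dimension this fit coincides with the piecewise linear spline through the data.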
Distance matrices have been studied in geometry and analysis in the context of iso-
metric embeddings of metric spaces for a long time.
It is known that the distance matrix based on the Euclidean distance between a set
of distinct points in Rd is always non-singular (see below).
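Since distance matrices are non-singular for Euclidean distances in any space dimension
$d$, we have an immediate generalization: for the scattered data interpolation problem on
$[0, 1]^d$ we can take
$$P_f(x) = \sum_{k=1}^{N} c_k \|x - x_k\|_2, \qquad x \in [0, 1]^d , \qquad (1.13)$$
and find the $c_k$ by solving
$$\begin{pmatrix}
\|x_1 - x_1\|_2 & \|x_1 - x_2\|_2 & \cdots & \|x_1 - x_N\|_2 \\
\|x_2 - x_1\|_2 & \|x_2 - x_2\|_2 & \cdots & \|x_2 - x_N\|_2 \\
\vdots & \vdots & \ddots & \vdots \\
\|x_N - x_1\|_2 & \|x_N - x_2\|_2 & \cdots & \|x_N - x_N\|_2
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{pmatrix}
=
\begin{pmatrix} f_d(x_1) \\ f_d(x_2) \\ \vdots \\ f_d(x_N) \end{pmatrix} .$$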
Piecewise linear splines in higher space dimensions are usually constructed differently
(via a cardinal basis on an underlying computational mesh).
For $d > 1$ the space $\mathrm{span}\{\|\cdot - x_k\|_2, \ k = 1, \ldots, N\}$ is not the same as piecewise linear
splines.
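In order to show the non-singularity of our distance matrices we use the Courant-Fischer
theorem (see for example the book by Meyer [9]):
Theorem 2. Let $A$ be a real symmetric $N \times N$ matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$,
then
$$\lambda_k = \max_{\dim \mathcal{V} = k} \ \min_{x \in \mathcal{V},\ \|x\| = 1} x^T A x \qquad \text{and} \qquad \lambda_k = \min_{\dim \mathcal{V} = N - k + 1} \ \max_{x \in \mathcal{V},\ \|x\| = 1} x^T A x .$$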
Figure 1.2: A typical basis function for the Euclidean distance matrix fit, $B_k(x) = \|x - x_k\|_2$
with $x_k = 0$ and $d = 2$.
$$\sum_{j=1}^{N} \sum_{k=1}^{N} c_j c_k A_{jk} < 0 \qquad (1.14)$$
for all $c = [c_1, \ldots, c_N]^T \neq 0 \in \mathbb{R}^N$ that satisfy $\sum_{j=1}^{N} c_j = 0$.
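Now, by the Courant-Fischer theorem with $k = 2$ and (1.14), we have
$$\lambda_2 = \min_{\dim \mathcal{V} = N - 1} \ \max_{x \in \mathcal{V},\ \|x\| = 1} x^T A x \ \le \max_{\sum_k c_k = 0,\ \|c\| = 1} c^T A c < 0 ,$$
so that $A$ has at least $N - 1$ negative eigenvalues. But since $\mathrm{tr}(A) = \sum_{k=1}^{N} \lambda_k \ge 0$, $A$ also
must have at least one positive eigenvalue.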
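As a sanity check of this eigenvalue count, the smallest case $N = 2$ can be worked out by hand. The symbol $\delta$ for the distance between the two points is notation introduced here for illustration only.

```latex
\[
A = \begin{pmatrix} 0 & \delta \\ \delta & 0 \end{pmatrix},
\qquad \delta = \|x_1 - x_2\|_2 > 0 .
\]
For $c = (c_1, -c_1)^T \neq 0$, so that $c_1 + c_2 = 0$, we get
\[
c^T A c = 2 \delta c_1 c_2 = -2 \delta c_1^2 < 0 ,
\]
and the eigenvalues are $\lambda_1 = \delta > 0$ and $\lambda_2 = -\delta < 0$,
with $\operatorname{tr}(A) = 0$ and $\det(A) = -\delta^2 \neq 0$.
```

So the distance matrix of two distinct points has exactly one positive and $N - 1 = 1$ negative eigenvalue, and it is non-singular.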
Depending on the type of approximation problem we are given, we may or may not be able
to select where the data is collected, i.e., the location of the data sites or design.
Standard choices in low space dimensions are depicted in Fig. 1.3. In higher space di-
mensions it is important to have space-filling (or low-discrepancy) quasi-random point sets.
Examples include: Halton points, Sobol points, lattice designs, Latin hypercube
designs and quite a few others (digital nets, Faure, Niederreiter, etc.).
Figure 1.3: Tensor products of equally spaced points and tensor products of Chebyshev
points.
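Here we show how one can generate Halton points in every spatial dimension. Halton
points are uniformly distributed quasi-random points in $(0, 1)^d$ generated from van der Corput
sequences. We start by generating van der Corput sequences. Let a prime number $p$ be chosen.
(i) Every $n \in \mathbb{N}$ can be written as
$$n = \sum_{i=0}^{k} a_i p^i ,$$
where the coefficients $a_i$ are integers such that $0 \le a_i < p$. For example taking $n = 10$
and $p = 3$
$$10 = 1 \cdot 3^0 + 0 \cdot 3^1 + 1 \cdot 3^2$$
giving $k = 2$, $a_0 = a_2 = 1$, $a_1 = 0$.
(ii) We define the function $h_p : \mathbb{N} \to [0, 1)$ as $h_p(n) = \sum_{i=0}^{k} \frac{a_i}{p^{i+1}}$. For example
$$h_3(10) = \frac{1}{3} + \frac{1}{3^3} = \frac{10}{27} .$$
(iii) The van der Corput sequence is $h_{p,N} = \{h_p(n) : n = 0, 1, \ldots, N\}$. In our example
$$h_{3,10} = \left\{0, \frac{1}{3}, \frac{2}{3}, \frac{1}{9}, \frac{4}{9}, \frac{7}{9}, \frac{2}{9}, \frac{5}{9}, \frac{8}{9}, \frac{1}{27}, \frac{10}{27}\right\} .$$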
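Starting from the van der Corput sequence, the Halton sequence is generated as follows:
take $d$ distinct primes $p_1, \ldots, p_d$ and generate $h_{p_1,N}, \ldots, h_{p_d,N}$, which we use as
coordinates of $d$-dimensional points, so that
$$H_{d,N} = \{ (h_{p_1}(n), \ldots, h_{p_d}(n)) : n = 0, \ldots, N \}$$
is the set of the first $N + 1$ Halton points in $[0, 1)^d$.
Proposition 2. Halton points form a nested sequence, that is if $M < N$ then $H_{d,M} \subset H_{d,N}$.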
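The construction can be sketched in a few lines of Python. This is an illustrative sketch, not the code distributed with the notes: the function names `van_der_corput` and `halton` are chosen here, and exact rational arithmetic (`fractions.Fraction`) is used only so that the worked example with $p = 3$ can be reproduced exactly.

```python
from fractions import Fraction

def van_der_corput(n, p):
    """h_p(n): mirror the base-p digits of n about the radix point."""
    h, denom = Fraction(0), p
    while n > 0:
        n, a = divmod(n, p)        # a is the next digit a_i, 0 <= a < p
        h += Fraction(a, denom)    # contributes a_i / p^(i+1)
        denom *= p
    return h

def halton(N, primes):
    """Points (h_{p_1}(n), ..., h_{p_d}(n)) for n = 0, ..., N."""
    return [tuple(van_der_corput(n, p) for p in primes) for n in range(N + 1)]

print(van_der_corput(10, 3))       # 10/27, as in the worked example

H5 = halton(5, (2, 3))             # Halton points in dimension d = 2
H10 = halton(10, (2, 3))
print(set(H5) <= set(H10))         # True: the point sets are nested
```

Because the points are generated one index at a time, enlarging $N$ only appends new points, which is the nestedness stated in the proposition.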
These points can be constructed sequentially. Similar to these points are Leja
sequences [4]. As a final observation, for Halton points we have
$$D_N(H_{d,N}) \le C \frac{(\log N)^d}{N} .$$
The difference between the standard (tensor product) designs and the quasi-random
designs shows especially in higher space dimensions, as shown in Fig. 1.6.
1.4 Exercises
In practice, using different colours, plot the Halton points for different values of $M$
and $N$.
3. Again, by using the function DistanceMatrix.m on different sets of Halton points of
dimension $d = 2$, verify that the corresponding distance matrix, say $A$, is ill-conditioned,
by computing its condition number in the 2-norm (in Matlab cond(A)).
5. Repeat the previous exercise by using the Gaussian radial basis function, $\Phi(x) =
e^{-\varepsilon^2 \|x\|^2}$, $\varepsilon > 0$, again for $d = 2$. To accomplish this interpolation, use the function
RBFInterpolation2D.m, which generalizes DistanceMatrixFit.m.
http://www.math.unipd.it/demarchi/TAA2010
Lecture 2
In the first lecture we saw that the scattered data interpolation problem with RBFs leads
to the solution of a linear system
Ac = y
with $A_{i,j} = \phi(\|x_i - x_j\|_2)$ and $y_i$ the $i$-th data value. The solution of the system requires
that the matrix $A$ is non-singular. The situation is favourable if we know in advance that
the matrix is positive definite. Moreover, we would like to characterize the class of functions $\phi$
for which the matrix is positive definite.
A real symmetric matrix $A$ is called positive semi-definite if its associated quadratic form
is nonnegative, that is
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j A_{i,j} \ge 0 \qquad (2.1)$$
for $c \in \mathbb{R}^N$. If the quadratic form (2.1) is zero only for $c = 0$ then $A$ is called positive
definite (sometimes we will use the shorthand notation PD).
The most important property of such matrices is that their eigenvalues are positive, and
so is their determinant. The converse is not true, that is, a matrix with positive determinant
is not necessarily positive definite. Just consider as a trivial example the matrix (in Matlab
notation) A=[-1 0 ; 0 -3] whose determinant is positive, but A is not PD.
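The counterexample is easy to verify numerically. The following short Python sketch (the helper name `quad_form` is introduced here for illustration) evaluates the determinant and the quadratic form for one particular vector.

```python
def quad_form(A, c):
    """Quadratic form c^T A c for a real matrix A given as nested lists."""
    n = len(c)
    return sum(c[i] * A[i][j] * c[j] for i in range(n) for j in range(n))

A = [[-1.0, 0.0],
     [0.0, -3.0]]

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(det_A)                      # 3.0: the determinant is positive
print(quad_form(A, [1.0, 0.0]))   # -1.0: the quadratic form is negative, so A is not PD
```

Both eigenvalues of this matrix are negative; their product (the determinant) is positive, which is why the determinant alone cannot detect positive definiteness.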
Hence, if in (1.11) the basis $B_k$ generates a positive definite interpolation matrix, then
we would always have a well-defined interpolation problem. In order to obtain such a property,
we need to introduce the class of positive definite functions.
is nonnegative. The function $\Phi$ is then called positive definite if the quadratic form
above is positive for $c \in \mathbb{C}^N$, $c \neq 0$.
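Example 4. The function $\Phi(x) = e^{i x^T y}$, for a fixed $y \in \mathbb{R}^d$, is PD on $\mathbb{R}^d$. In fact,
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i \bar{c}_j \Phi(x_i - x_j)
= \sum_{i=1}^{N} \sum_{j=1}^{N} c_i \bar{c}_j e^{i (x_i - x_j)^T y}
= \left( \sum_{i=1}^{N} c_i e^{i x_i^T y} \right) \left( \sum_{j=1}^{N} \bar{c}_j e^{-i x_j^T y} \right)
= \left| \sum_{i=1}^{N} c_i e^{i x_i^T y} \right|^2 \ge 0 .$$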
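The identity used in this example can be checked numerically. The sketch below works in dimension $d = 1$ with randomly drawn points and complex coefficients; the values of `N`, `y` and the random seed are arbitrary choices made here for illustration.

```python
import cmath
import random

random.seed(0)
N, y = 6, 1.7                       # illustrative values, d = 1
x = [random.uniform(-1, 1) for _ in range(N)]
c = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

def Phi(t):
    """Phi(t) = exp(i t y) for the fixed y above."""
    return cmath.exp(1j * t * y)

# Quadratic form: sum_i sum_j c_i conj(c_j) Phi(x_i - x_j)
quad = sum(c[i] * c[j].conjugate() * Phi(x[i] - x[j])
           for i in range(N) for j in range(N))

# Closed form: | sum_i c_i exp(i x_i y) |^2
closed = abs(sum(ci * Phi(xi) for ci, xi in zip(c, x))) ** 2

print(abs(quad - closed))           # difference at rounding level
```

The quadratic form is real and nonnegative for every choice of coefficients, as the closed form shows.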
Remark. From the previous definition and the discussion in Lecture 1, we
should use PD functions as basis functions, i.e. $B_i(x) = \Phi(x - x_i)$, that is, an interpolant of the form
$P_f(x) = \sum_{i=1}^{N} c_i B_i(x)$. Moreover, at this point we do not need $P_f$ to be a radial function, but
simply translation invariant (that is, the interpolant to translated data is the same as the
translated interpolant to the original data). We will characterize PD and radial functions in $\mathbb{R}^d$ later in this lecture.
These functions have some properties that we summarize in the following theorem.
Theorem 4. Suppose $\Phi$ is a positive semi-definite function. Then
Proof. If $\Phi$ is PD and real valued, then it is even by the previous theorem (by property
2.). Letting $c_k = a_k + i\, b_k$ then
$$\sum_{i,j=1}^{N} c_i \bar{c}_j \Phi(x_i - x_j) = \sum_{i,j=1}^{N} (a_i a_j + b_i b_j) \Phi(x_i - x_j)
+ i \sum_{i,j=1}^{N} a_j b_i \left[ \Phi(x_i - x_j) - \Phi(x_j - x_i) \right] .$$
As $\Phi$ is even, the second sum on the right hand side is zero. The first sum is nonnegative
because of the assumption, vanishing only if $a_i = b_i = 0$.
Example 5. The cosine function is PD on $\mathbb{R}$. In fact, for all $x \in \mathbb{R}$, $\cos x = \frac{e^{ix} + e^{-ix}}{2}$. By
property 4. of Theorem 4 and the fact that the exponential is PD (see Example 4), we
conclude.
When we are dealing with radial functions, i.e. $\Phi(x) = \phi(\|x\|)$, it will be
convenient to refer to the univariate function $\phi$ as a positive definite radial function. A
consequence of this notational convention is the following Lemma.
The classical way to characterize PD and radial functions is by the following theorem, due to
Schoenberg [11].
where
$$\Omega_d(r) = \begin{cases} \cos(r) & d = 1 \\[4pt] \Gamma\!\left(\frac{d}{2}\right) \left(\frac{2}{r}\right)^{(d-2)/2} J_{(d-2)/2}(r) & d \ge 2 \end{cases}$$
Here $J_\nu$ is the classical Bessel function of the first kind of order $\nu$, that is the solution of
the Bessel differential equation. For example, around $x = 0$, this function can be expressed
as the Taylor series
$$J_\nu(x) = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\, \Gamma(k + \nu + 1)} \left(\frac{x}{2}\right)^{2k + \nu} .$$
In particular, as already seen, $\phi(x) = \cos(x)$ can be seen as a fundamental PD and radial
function on $\mathbb{R}$.
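The series for $J_\nu$ is simple enough to evaluate directly, which gives a quick way to experiment with the functions $\Omega_d$. The Python sketch below is an illustration under assumptions made here: the names `bessel_j` and `omega`, the truncation at 40 terms, and the comparison with the elementary closed form $\Omega_3(r) = \sin(r)/r$ (which follows from $J_{1/2}(r) = \sqrt{2/(\pi r)}\,\sin r$).

```python
from math import gamma, sin

def bessel_j(nu, x, terms=40):
    """Partial sum of the power series of J_nu(x) around x = 0 (x > 0 assumed)."""
    return sum((-1) ** k / (gamma(k + 1) * gamma(k + nu + 1)) * (x / 2) ** (2 * k + nu)
               for k in range(terms))

def omega(d, r):
    """Omega_d(r) for d >= 2 and r > 0, evaluated through the series above."""
    nu = (d - 2) / 2
    return gamma(d / 2) * (2 / r) ** nu * bessel_j(nu, r)

r = 1.3
print(omega(3, r), sin(r) / r)   # the two values agree up to rounding
```

The truncated series is only meant for moderate arguments; for large $x$ a library routine such as Matlab's besselj is the better tool.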
Observation. Since these lectures are intended for people with no preparation or
mathematical background in measure theory and functional analysis, we prefer to avoid
introducing more results characterizing PD functions by means of (Fourier) transforms. We prefer
to use another, more accessible approach based on the definition of completely
monotone and multiply monotone functions. Interested readers can satisfy their hunger for
knowledge by looking at the books [1, 6, 13].
In the framed Theorem 6 and the subsequent remark, we observed that we avoid characterizing
the positive definiteness of RBF using Fourier transforms, also because Fourier transforms
are not always easy to compute.
This definition allows us to verify when a function is positive definite and radial for all
dimensions $d$.
Here we enumerate some of the most important PD functions showing that they are
CM.
3. The function $\phi(r) = \frac{1}{(1+r)^{\beta}}$, $\beta \ge 0$, is CM on $[0, \infty)$ since
There are now two theorems to quote that give more information.
Hence, as a consequence of these two theorems, the functions $\phi(r) = e^{-cr}$, $c > 0$, and
$\phi(r) = \frac{1}{(1+r)^{\beta}}$, $\beta \ge 0$, are CM on $[0, \infty)$, and since they are not constant, in $\mathbb{R}^d$ for all $d$
we have
1. $\Phi(x) = e^{-c^2 \|x\|^2}$, $c > 0$, is positive definite and radial on $\mathbb{R}^d$ for all $d$. This is
the family of Gaussians. The parameter $c$ is a shape parameter that changes the
shape of the function, making it more spiky when $c \to \infty$ and flatter when $c \to 0$.
This characterization allows us to check when a function is PD and radial on $\mathbb{R}^d$ for some
fixed $d$.
In what follows we sometimes use the shorthand notation MM for multiply monotone
functions.
$$\phi_k(r) = (1 - r)^k_+$$
These functions (we will see later) lead to radial functions that are PD on $\mathbb{R}^d$ provided
$k \ge \lfloor d/2 \rfloor + 1$.
Figure 2.2: Left: truncated power with $k = 1$. Right: truncated power with $k = 2$.
$$\Phi(x) = e^{-\|x\|^2} L_n^{d/2}(\|x\|^2)$$
where $L_n^{d/2}$ indicates the Laguerre polynomial of degree $n$ and order $d/2$, that is
$$L_n^{d/2}(t) = \sum_{k=0}^{n} \frac{(-1)^k}{k!} \binom{n + d/2}{n - k} t^k .$$
d    n = 1                              n = 2
1    $(3/2 - x^2)\, e^{-x^2}$           $(15/8 - 5/2\, x^2 + 1/2\, x^4)\, e^{-x^2}$
2    $(2 - \|x\|^2)\, e^{-\|x\|^2}$     $(3 - 3\|x\|^2 + 1/2\, \|x\|^4)\, e^{-\|x\|^2}$
Figure 2.3: First row: G-L when $d = 1$ and $n = 1, 2$. Second row: G-L for $d = 2$ and
$n = 1, 2$.
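The table entries follow directly from the explicit sum for the Laguerre polynomials. The Python sketch below recomputes them from that formula; it is an illustration only, with the generalized binomial coefficient expressed through the Gamma function and the names `gen_binom`, `laguerre` and `gauss_laguerre` introduced here.

```python
from math import exp, factorial, gamma

def gen_binom(a, k):
    """Generalized binomial coefficient 'a choose k' for real a and integer k >= 0."""
    return gamma(a + 1) / (gamma(k + 1) * gamma(a - k + 1))

def laguerre(n, alpha, t):
    """L_n^alpha(t) from the explicit sum used in the text."""
    return sum((-1) ** k / factorial(k) * gen_binom(n + alpha, n - k) * t ** k
               for k in range(n + 1))

def gauss_laguerre(n, d, r):
    """Radial profile exp(-r^2) * L_n^{d/2}(r^2), with r = ||x||."""
    return exp(-r * r) * laguerre(n, d / 2, r * r)

r = 0.8
# Compare with the table entry for d = 1, n = 2
print(gauss_laguerre(2, 1, r), (15 / 8 - 5 / 2 * r**2 + 1 / 2 * r**4) * exp(-r**2))
```

Since the functions are radial, evaluating the profile at $r = \|x\|$ is enough to plot them in any dimension.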
the cases $d = 2, 3, 4$.) Notice: $J_p$ in Matlab can be computed using the function
besselj(p, z) (where z is an array of evaluation points).
3. Matérn functions. They are defined as
$$\Phi(x) = \frac{K_{\beta - d/2}(\|x\|)\, \|x\|^{\beta - d/2}}{2^{\beta - 1} \Gamma(\beta)}, \qquad d < 2\beta ,$$
where $K_p$ is the modified Bessel function of the second kind of order $p$, which (for non-integer $p$) can be
defined in terms of the modified Bessel function of the first kind $I_p$ as follows
$$K_p(x) = \frac{\pi}{2}\, \frac{I_{-p}(x) - I_p(x)}{\sin(\pi p)} .$$
In Table 2.3 we present the Matérn functions for three values of $\beta$, indicated as
$\beta_i$, $i = 1, 2, 3$. Notice that the Matérn function for $\beta_1$ is not differentiable at the
origin, while for $\beta_2$ it is $C^2(\mathbb{R}^d)$ and for $\beta_3$ it is $C^4(\mathbb{R}^d)$.
Figure 2.4: Poisson RBF for $d = 2, 3, 4$, respectively. Here the shape parameter is $\varepsilon = 10$.
5. Whittaker functions. The idea of these functions is based on the following
construction. Let $f \in C[0, \infty)$ be non-negative and not identically equal to zero, and define
the function
$$\phi(r) = \int_0^{\infty} (1 - rt)_+^{k-1} f(t)\, dt . \qquad (2.3)$$
$$\sum_{i,j=1}^{N} c_i c_j \phi(\|x_i - x_j\|) = \int_0^{\infty} \sum_{i,j=1}^{N} c_i c_j\, \phi_{k-1}(t \|x_i - x_j\|)\, f(t)\, dt$$
Figure 2.5: Matérn RBF for different $\beta$, respectively. As before the shape parameter is
$\varepsilon = 10$.
Figure 2.6: Inverse multiquadric with shape parameter $\varepsilon = 3$. On the left $\beta = 1/2$, which
corresponds to the Hardy inverse multiquadric. On the right $\beta = 1$, which corresponds to the
inverse quadric.
2.2 Exercises
1. Plot some of the positive definite radial functions (centered at the origin). When $d = 1$
take $x \in [-1, 1]$ while for $d = 2$ consider $x \in [-1, 1]^2$.
- The Gaussian-Laguerre functions for $n = 1, 2$ and $d = 1, 2$. See Table 2.1 for the
corresponding definitions.
- The Poisson functions for $d = 2, 3, 4$ in $[-1, 1]^2$ using as shape parameter $\varepsilon = 10$,
see Table 2.2.
- The Matérn functions in $[-1, 1]^2$, for three different values of $\beta$ and shape
parameter $\varepsilon = 10$, as defined in Table 2.3.
- The generalized inverse multiquadrics $\Phi(x) = (1 + \|x\|^2)^{-\beta}$, $s < 2\beta$, in $[-1, 1]^2$,
in these two (simple) cases (with $\varepsilon = 5$): $\beta = 1/2$ (which corresponds to the
so-called Hardy inverse multiquadrics) and $\beta = 1$ (which is the inverse quadrics).
- The truncated powers $\Phi(x) = (1 - \|x\|)^l_+$ for $l = 2, 4$ (in $[-1, 1]^2$).
- The Whittaker potentials in the square $[-1, 1]^2$ for different values of the
parameters $\alpha$, $k$ and $\beta$, as defined in Table 2.4. For these last plots use $\varepsilon = 1$.
$$f(x_1, x_2) = 0.75 \exp[-((9x_1 - 2)^2 + (9x_2 - 2)^2)/4] + 0.75 \exp[-(9x_1 + 1)^2/49 - (9x_2 + 1)/10]$$
$$\qquad + 0.5 \exp[-((9x_1 - 7)^2 + (9x_2 - 3)^2)/4] - 0.2 \exp[-(9x_1 - 4)^2 - (9x_2 - 7)^2]$$
on a grid of $20 \times 20$ Chebyshev points in $[0, 1]^2$ with Poisson and Matérn functions.
Compute also the corresponding RMSE.
Lecture 3
Multivariate polynomials are not suitable for solving the scattered data interpolation
problem. Only data sites in special locations can guarantee well-posedness of the
interpolation problem. In order to get a flavour of this setting, we introduce the notion of
unisolvency.
Definition 8. A set $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^d$ is called $m$-unisolvent if the only
polynomial of total degree at most $m$ interpolating the zero data on $X$ is the zero polynomial.
Then, to get a unique solution of the polynomial interpolation problem with a $d$-variate
polynomial of degree $\le m$ to some given data, we need to find a subset of $X \subset \mathbb{R}^d$ with
cardinality $M = \binom{m+d}{m} = \dim(\mathbb{P}_m(\mathbb{R}^d))$ which is $m$-unisolvent.
Example 7. These are examples of points that form unisolvent sets.
A simple example is this one. Take $N = 5$ points in $\mathbb{R}^2$. There is no unique way to use
bivariate linear or quadratic interpolation. In fact, bivariate linear polynomials form a space of dimension
$M = 3$, bivariate quadratic polynomials one of dimension $M = 6$.
The Padua points on the square $[-1, 1]^2$ form the first complete example of
unisolvent points whose Lebesgue constant has optimal growth (cf. [10]).
In $\mathbb{R}^2$ the Chung and Yao points. The construction of these points is based on
lattices. In practice, a lattice $\Lambda \subset \mathbb{R}^d$ has the form $\Lambda = \left\{ \sum_{i=1}^{d} h_i v_i, \ h_i \in \mathbb{Z} \right\}$ with
$\{v_1, \ldots, v_d\}$ a basis for $\mathbb{R}^d$. For details see the paper [2].
The $m$-unisolvency of the set $X = \{x_1, \ldots, x_N\}$ is equivalent to the fact that the
matrix $P$ such that
$$P_{i,j} = p_j(x_i), \qquad i = 1, \ldots, N, \ j = 1, \ldots, M,$$
for any polynomial basis $p_j$, has full (column)-rank. For $N = M$ this is the classical
polynomial interpolation matrix.
This observation can be easily checked when in $\mathbb{R}^2$ we take 3 collinear points: they are
not 1-unisolvent, since a linear interpolant, that is a plane through three arbitrary heights at
these three collinear points, is not uniquely determined. Otherwise such a set is 1-unisolvent.
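For three points in $\mathbb{R}^2$ the rank condition reduces to a $3 \times 3$ determinant, which can be checked with a short Python sketch. The helper names `det3` and `is_1_unisolvent`, the tolerance, and the sample points are assumptions made here for illustration, using the basis $p_1 = 1$, $p_2 = x$, $p_3 = y$.

```python
def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def is_1_unisolvent(points, tol=1e-12):
    """Three points in R^2 are 1-unisolvent iff the matrix with rows [1, x_i, y_i] is non-singular."""
    P = [[1.0, x, y] for x, y in points]
    return abs(det3(P)) > tol

print(is_1_unisolvent([(0, 0), (1, 1), (2, 2)]))   # False: the points are collinear
print(is_1_unisolvent([(0, 0), (1, 0), (0, 1)]))   # True
```

For more points or higher degree one would test the column rank of the rectangular matrix $P$ instead of a determinant.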
Remark. This problem arises when we want to construct interpolants with polynomial
precision, that is, interpolants that reproduce polynomials.
Example 8. Take the function $f(x, y) = (x + y)/2$ on the unit square $[0, 1]^2$. Using the Gaussian
RBF (with $\varepsilon = 6$), interpolate it on a grid of $N = 1089 = 33 \times 33$ uniformly distributed
points. This will lead to an interpolant which is not a linear function as $f$ is (i.e. the
interpolant does not reproduce $f$).
$$P_f(\mathbf{x}) = \sum_{i=1}^{N} c_i e^{-\varepsilon^2 \|\mathbf{x} - \mathbf{x}_i\|^2} + \underbrace{c_{N+1} + c_{N+2}\, x + c_{N+3}\, y}_{\text{polynomial part}} . \qquad (3.1)$$
how can we find the remaining three conditions so that the resulting system will be square?
As we shall see later, the solution is
$$\sum_{i=1}^{N} c_i = 0, \qquad \sum_{i=1}^{N} c_i x_i = 0, \qquad \sum_{i=1}^{N} c_i y_i = 0 . \qquad (3.3)$$
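Combining the interpolation conditions (3.2) with the side conditions (3.3), the resulting
linear system becomes
$$\begin{pmatrix} A & P \\ P^T & O \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix} \qquad (3.4)$$
where the matrix $A$ and the vectors $c$ and $y$ are the same as in Lecture 1, while $P$ is an
$N \times 3$ matrix with entries $P_{i,j} = p_j(\mathbf{x}_i)$ with $p_1(\mathbf{x}) = 1$, $p_2(\mathbf{x}) = x$, $p_3(\mathbf{x}) = y$. Finally the
matrix $O$ is the $3 \times 3$ zero matrix.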
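$$\sum_{i=1}^{N} c_i p_j(x_i) = 0, \qquad j = 1, \ldots, M , \qquad (3.6)$$
$$\begin{pmatrix} A & P \\ P^T & O \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix} \qquad (3.7)$$
where now the matrices and arrays have the following dimensions: $A$ is $N \times N$, $P$ is $N \times M$,
$O$ is $M \times M$; $c$, $y$ are $N \times 1$ and $d$, $0$ are $M \times 1$.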
Remark. It may seem awkward to formulate the general setting for the
polynomial space $\mathbb{P}_{m-1}(\mathbb{R}^d)$ instead of, as before, $\mathbb{P}_m(\mathbb{R}^d)$. The reason will be explained
later and, as we will see, it will be quite natural.
In order to prove that the augmented system (3.7) is non-singular we start with the easiest
case of $m = 1$ (in any dimension $d$). This is equivalent to the reproduction of the polynomials
of degree $m - 1 = 0$, i.e. the constants.
The following theorem shows that the augmented system is uniquely solvable.
Proof. Assume that $(c, d)^T \in \mathbb{R}^{N+1}$ is a solution of the homogeneous
system, i.e. with $y = 0$. We then prove that $c = 0$ and $d = 0$. From the first block,
multiplying by $c^T$, we have
\[ c^T A c + d\, c^T P = 0 \]
and using the second equation $P^T c = c^T P = 0$ we get that $c^T A c = 0$. Since $A$ is CPD of
order 1, by assumption we get $c = 0$. Moreover, from the first block, $Ac + dP = 0$, which
implies $d = 0$.
Corollary 1. For reproducing constant functions in $\mathbb{R}^d$, the system to be solved has the
form (3.9).
Hence the natural question is to look for the smallest possible order $m$. When we
speak of the order of such a class of functions we will always refer to the minimal possible $m$.
Here we list the most important conditionally positive definite radial functions
Generalized multiquadrics.
Radial powers
\[ \Phi(x) = (-1)^{\lceil \beta/2 \rceil} \|x\|^{\beta}, \qquad x \in \mathbb{R}^s, \quad 0 < \beta \notin 2\mathbb{N}, \]
which are CPD of order $m = \lceil \beta/2 \rceil$ (and higher).
For $\beta = 3$ we get a function which is CPD of order 2 and for $\beta = 5$ a function which
is CPD of order 3. A property of these functions is that they are shape parameter
free. This has the advantage that the user need not worry about finding a good (or
the best) value for $\varepsilon$.
Thin-plate splines
As for the PD case, the definition reduces to real coefficients and polynomials if the
basis function is real-valued and even. This is the case when the function is radial.
The special case $m = 1$ appears already in the linear algebra literature, where it is dealt
with the reversed inequality $\le$ and is then referred to as conditionally negative definite.
The constraints are simply $\sum_{i=1}^{N} c_i = 0$. Since the matrix $A$ is CPD of order 1, it is PD in a
subspace of dimension $N - 1$ (or in general $N - M$ where $M = \dim(\mathbb{P}_{m-1}(\mathbb{R}^d))$). That is, only
$N - 1$ (or $N - M$) eigenvalues are positive.
Theorem 11. Suppose $\Phi$ is CPD of order 1 and that $\Phi(0) \le 0$. Then the matrix
$A \in \mathbb{R}^{N \times N}$, i.e. $A_{i,j} = \Phi(x_i - x_j)$, has one negative and $N - 1$ positive eigenvalues. In
particular it is invertible.
For example, the generalized multiquadrics $\Phi(x) = (-1)^{\lceil \beta \rceil} (1 + \|x\|^2)^{\beta}$, $0 < \beta < 1$ (which
include Hardy's multiquadric, $\beta = 1/2$) satisfy the previous theorem.
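The eigenvalue count of Theorem 11 can be checked numerically on a small example. The Python sketch
below is only an illustration under our own assumptions: instead of computing eigenvalues it counts
the signs of the pivots of symmetric elimination without row exchanges, which by Sylvester's law of
inertia agree with the signs of the eigenvalues whenever no pivot vanishes. The function names and
the five test points are ours.

```python
import math

def multiquadric_matrix(points, beta=0.5):
    """A[i][j] = (-1)^ceil(beta) * (1 + ||x_i - x_j||^2)^beta."""
    sign = (-1) ** math.ceil(beta)
    return [[sign * (1.0 + math.dist(p, q) ** 2) ** beta for q in points]
            for p in points]

def pivot_signs(A):
    """Count (negative, positive) pivots of elimination without row exchanges."""
    n = len(A)
    M = [row[:] for row in A]
    negative = positive = 0
    for k in range(n):
        pivot = M[k][k]
        if abs(pivot) < 1e-12:
            raise ZeroDivisionError("zero pivot: the sign count is not valid")
        if pivot < 0:
            negative += 1
        else:
            positive += 1
        for i in range(k + 1, n):
            factor = M[i][k] / pivot
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
    return negative, positive

# Four corners of the unit square plus its centre, Hardy multiquadric (beta = 1/2)
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
signs = pivot_signs(multiquadric_matrix(points))
```

For these points the count is one negative and four positive pivots, in agreement with the theorem.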
Proof. Assume that $(c, d)^T \in \mathbb{R}^{N+M}$ is a solution of the homogeneous
system, i.e. with $y = 0$. We then prove that $c = 0$ and $d = 0$. From the first block,
multiplying by $c^T$, we have
\[ c^T A c + c^T P d = 0 \]
and using the second equation $P^T c = 0$ (equivalently $c^T P = 0^T$) we get that $c^T A c = 0$. Since $A$ is CPD
of order $m$, by assumption we get $c = 0$. The unisolvency of the data sites, i.e. $P$ has
linearly independent columns, and the fact that $c = 0$ guarantee $d = 0$ from the top block
$Ac + Pd = 0$.
3.1 Exercises
1. Plot the most important conditionally positive definite functions in the square $[-1, 1]^2$.
Among the thin-plate splines, the most important is the one for $\beta = 1$, which is CPD of
order 2. For $\beta = 2$ we get a function which is CPD of order 3. Verify that these functions
are also shape parameter free.
2. For a CPD function of order 1 (such as the Hardy multiquadric) verify Theorem 11
numerically.
Lecture 4
Error estimates
In evaluating the error between the interpolant $P_f$ and the data values at some set $\Xi =
\{\xi_1, \dots, \xi_M\} \subset \mathbb{R}^d$ of evaluation points we can compute the root-mean-square error,
that is
\[
RMSE := \sqrt{\frac{1}{M} \sum_{j=1}^{M} \left(P_f(\xi_j) - f(\xi_j)\right)^2} = \frac{1}{\sqrt{M}} \|P_f - f\|_2 . \tag{4.1}
\]
The root-mean-square error (RMSE) is a frequently used measure of the differences between
values predicted by a model or an estimator and the values actually observed. These
individual differences are called residuals when the calculations are performed over the
data sample that was used for estimation, and are called prediction errors when computed
out-of-sample. The RMSE serves to aggregate the magnitudes of the errors in predictions
for various times into a single measure of predictive power. RMSE is a good measure of
accuracy, but only to compare forecasting errors of different models for a particular variable
and not between variables, as it is scale-dependent. In practice, (4.1) is simply a quantitative
error estimator.
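Formula (4.1) translates directly into code. A minimal Python sketch (the function name rmse and the
sample numbers are ours, chosen only for illustration) is:

```python
import math

def rmse(predicted, exact):
    """Root-mean-square error (4.1) between two lists of values."""
    m = len(predicted)
    return math.sqrt(sum((p - f) ** 2 for p, f in zip(predicted, exact)) / m)

# Values of an interpolant and of f at three evaluation points (made-up numbers)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Here only the third value differs, by 2, so the result is the 2-norm of the difference, 2, divided
by the square root of M = 3.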
Our goal is to provide error estimates for scattered data interpolation with (condition-
ally) positive definite functions. We start by considering the PD case.
The measure that is commonly used in approximation theory is the fill-distance or mesh size
$h = h_{X,\Omega} = \sup_{x \in \Omega} \min_{x_j \in X} \|x - x_j\|_2$,
which represents how well the data in $X$ fill out the domain $\Omega$ and corresponds to the radius
of the largest empty ball that can be placed among the data sites inside $\Omega$.
In Matlab the fill distance can be determined by the line h=max(min(DM_eval)), where
DM_eval is the matrix consisting of the mutual distances between the evaluation points (for
example a uniform grid in $\Omega$) and the data set $X$, stored with one column per evaluation
point since min acts column-wise. In Fig. 4.1 we show 25 Halton points and
the corresponding fill distance computed on a grid of $11 \times 11$ evaluation points of $[0, 1]^2$.
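The same computation can be sketched in Python. This is only an illustration with our own names and
data (four corner points instead of Halton points); as in the Matlab line, the supremum over the
domain is approximated by a maximum over a finite evaluation grid.

```python
import math

def fill_distance(data_sites, eval_points):
    """Largest distance from an evaluation point to its nearest data site."""
    return max(min(math.dist(e, x) for x in data_sites) for e in eval_points)

# Four corner data sites and an 11 x 11 evaluation grid on [0, 1]^2
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
h = fill_distance(corners, grid)   # attained at the centre (0.5, 0.5)
```

For this configuration the largest empty ball is centred at (0.5, 0.5) and has radius sqrt(1/2).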
What we want to analyze is whether the error $\|f - P_f^{(h)}\|_\infty \to 0$ as $h \to 0$; here
$P_f^{(h)}$ indicates the interpolant depending on the fill-distance $h$. To understand the speed
of convergence to zero, one has to understand the so-called approximation order of the
interpolation process.
Definition 11. We say that the process has approximation order $k$ if
\[ \|f - P_f^{(h)}\|_p = O(h^k), \qquad h \to 0 ; \]
here the norm is taken for $1 \le p \le \infty$.
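In experiments the approximation order is usually estimated from the errors measured at two different
fill distances: if the error behaves like $C h^k$, then $k \approx \ln(e_1/e_2) / \ln(h_1/h_2)$. The Python sketch below
implements this standard estimate; the function name and the sample numbers are ours.

```python
import math

def observed_order(h_coarse, err_coarse, h_fine, err_fine):
    """Estimate k in error = O(h^k) from two (fill distance, error) pairs."""
    return math.log(err_coarse / err_fine) / math.log(h_coarse / h_fine)

# Synthetic errors behaving exactly like h^2 (made-up data for illustration)
k = observed_order(0.2, 0.04, 0.1, 0.01)
```

With errors that scale exactly like $h^2$, as in this synthetic example, the estimate returns 2.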
For completeness, another quantity often used for stability analysis purposes (or for finding
good interpolation points) is the separation distance of the data sites $X$,
\[ q_X := \frac{1}{2} \min_{i \ne j} \|x_i - x_j\|_2 , \]
which represents the radius of the largest ball that can be placed around every point of
$X$ such that there are no overlaps. In Matlab qX=min(min(DM_data+eye(size(DM_data))))/2,
where DM_data is the matrix of distances among the data sites. We added the identity
matrix in order to avoid that this minimum is 0, which is the distance of
every point from itself.
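A Python sketch of the same quantity follows (the function name and the sample points are ours).
Looping over pairs of distinct points plays the role of the identity-matrix trick, since the zero
distance of a point from itself is never formed.

```python
import math
from itertools import combinations

def separation_distance(data_sites):
    """Half of the smallest distance between two distinct data sites."""
    return 0.5 * min(math.dist(p, q) for p, q in combinations(data_sites, 2))

# Four corners of the unit square plus its centre
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
q_X = separation_distance(X)
```

The closest pair is the centre and any corner, at distance sqrt(1/2), so q_X is half of that.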
This idea goes back to Wu and Schaback [14] and consists in expressing the interpolant by
means of cardinal functions as in the polynomial case.
Theorem 13. Let $\Phi$ be a positive definite kernel on $\mathbb{R}^d$. Then, for any set of distinct
points $x_1, \dots, x_N$, there exist functions $u_j^* \in \mathrm{span}\{\Phi(\cdot - x_j),\ j = 1, \dots, N\}$ such that
$u_j^*(x_i) = \delta_{i,j}$.
Here we enumerate a few facts in analogy with univariate fundamental Lagrange polyno-
mials.
The $u_j^*$ do not depend on the data values (i.e. the $f_j$). In fact, once the data sites
$X$ and the kernel $\Phi$ are chosen, they can be determined by solving the system
(4.3). That is, they do depend on the data sites and the kernel.
An important aspect, related to stability issues, is the choice of the data points. As
proved in [5], the quasi-uniform points are always a good choice for RBF interpolation.
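The cardinal functions can be computed pointwise by solving the linear system $A u = b(x)$, where
$b(x) = [\Phi(x - x_1), \dots, \Phi(x - x_N)]^T$. The following Python sketch does this for the Gaussian in one
dimension. It is an illustration under our own assumptions (five equispaced sites, $\varepsilon = 5$, a plain
Gaussian elimination solver, function names of our choosing) and not the implementation used in [6].

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def cardinal_values(sites, x, eps):
    """Values u_1*(x), ..., u_N*(x), obtained by solving A u = b(x)."""
    A = [[math.exp(-(eps * (xi - xj)) ** 2) for xj in sites] for xi in sites]
    b = [math.exp(-(eps * (x - xj)) ** 2) for xj in sites]
    return solve(A, b)

def interpolate(sites, values, x, eps):
    """Interpolant in cardinal form: P_f(x) = sum_j f_j u_j*(x)."""
    return sum(f * u for f, u in zip(values, cardinal_values(sites, x, eps)))

sites = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [math.sin(x) for x in sites]
```

Evaluating cardinal_values at a data site returns, up to rounding, a unit vector, which is the
Lagrange property $u_j^*(x_i) = \delta_{i,j}$; the data values enter only in interpolate.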
Another ingredient for understanding the error estimates is the power function. The starting
point to define this function is the following quadratic form
where the vector $b \in \mathbb{R}^N$ is defined as in the previous section and $u$ is any $N$-dimensional
vector.
Using formula (4.4), combined with the system that defines the cardinal functions (4.3),
we get two alternative ways to compute the power function:
\[
P_{\Phi,X}(x) = \sqrt{\Phi(0) - (u^*(x))^T b(x)} = \sqrt{\Phi(0) - (u^*(x))^T A\, u^*(x)} , \quad \text{(first)} \tag{4.6}
\]
\[
P_{\Phi,X}(x) = \sqrt{\Phi(0) - (b(x))^T A^{-1} b(x)} , \quad \text{(second)}. \tag{4.7}
\]
Notice that when $\Phi$ is a PD kernel then so is $A$; therefore we get immediately the following
bounds:
\[ 0 \le P_{\Phi,X}(x) \le \sqrt{\Phi(0)} . \]
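Both ways of computing the power function can be compared numerically. The Python sketch below does
so for the one-dimensional Gaussian, for which $\Phi(0) = 1$. It is an illustration under our own
assumptions (equispaced sites, $\varepsilon = 5$, our own function names); tiny negative values caused by
rounding are clipped to zero before taking the square root.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def power_function(sites, x, eps):
    """Power function at x from the quadratic form in u* and from b^T A^{-1} b."""
    phi = lambda r: math.exp(-(eps * r) ** 2)        # Gaussian, phi(0) = 1
    n = len(sites)
    A = [[phi(xi - xj) for xj in sites] for xi in sites]
    b = [phi(x - xj) for xj in sites]
    u = solve(A, b)                                  # u*(x) = A^{-1} b(x)
    u_A_u = sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))
    b_Ainv_b = sum(ui * bi for ui, bi in zip(u, b))  # b^T A^{-1} b = u*^T b
    first = math.sqrt(max(phi(0.0) - u_A_u, 0.0))
    second = math.sqrt(max(phi(0.0) - b_Ainv_b, 0.0))
    return first, second

sites = [0.0, 0.25, 0.5, 0.75, 1.0]
p_first, p_second = power_function(sites, 0.6, 5.0)   # between two data sites
z_first, z_second = power_function(sites, 0.5, 5.0)   # at a data site
```

The two values agree up to rounding, lie between 0 and $\sqrt{\Phi(0)} = 1$, and vanish (numerically) at
the data sites.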
Figure 4.2: Power function for the Gaussian kernel with $\varepsilon = 6$ on a grid of 81 uniform,
Chebyshev and Halton points, respectively.
Proof. Consider formula (4.4): the minimum of this quadratic form is given by
the solution of the linear system $Au = b(x)$, which yields precisely the cardinal functions
$u = u^*(x)$.
The power function, by definition, is a non-negative function, vanishing at the data sites and
decreasing to zero as the number of data points increases. In particular, if we take two data
sets such that $Y \subset X$ then $P_{Y,\Phi} \ge P_{X,\Phi}$. This is referred to as the maximality property of the
power function.
As a final remark, the power function is defined similarly for conditionally positive
definite functions.
The error bounds come rather naturally once we associate with each radial basic function
a certain space of functions called native space. This space is connected to the so-called
Reproducing Kernel Hilbert Space (RKHS). The theory of RKHS is beyond our aims, but
for understanding a little better the error estimates that we will present, it is necessary to
introduce some very basic notions of RKHS.
Definition 13. A space of functions is called a Hilbert space if it is a real or complex
inner product space that is also a complete metric space w.r.t. the distance induced by the
inner product.
Here the inner product between two functions $f$ and $g$ is thought of as
\[ (f, g) = \int_a^b f(x)\, g(x)\, dx \]
in the real case or
\[ (f, g) = \int_a^b f(x)\, \overline{g(x)}\, dx \]
in the complex case, which has many of the familiar properties of the Euclidean (discrete) dot product.
Examples of Hilbert spaces are: any finite dimensional inner product space (for example
$\mathbb{R}^n$, $\mathbb{C}^n$ equipped with the dot product of two vectors); the Lebesgue space $L^2$; the Sobolev
spaces $H^k$. The space $C([a, b])$ is an incomplete inner product space dense in $L^2([a, b])$, which is
complete.
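Definition 14. Let $H$ be a real Hilbert space of functions $f : \Omega \to \mathbb{R}$ with inner product
$(\cdot, \cdot)_H$. A function $K : \Omega \times \Omega \to \mathbb{R}$ is called a reproducing kernel for $H$ if
(i) $K(\cdot, x) \in H$ for all $x \in \Omega$;
(ii) $f(x) = (f, K(\cdot, x))_H$ for all $f \in H$ and all $x \in \Omega$.
The second is the reproducing property. In particular, if $f = K(\cdot, y)$ then we get the kernel
$K$, since $K(x, y) = (K(\cdot, y), K(\cdot, x))_H$ for all $x, y \in \Omega$.
Notice: the RKHS is known to be unique and the kernel $K$ is positive definite.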
We now show that every positive definite radial basis function can be associated
with a RKHS: its native space.
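provided $x_j \in \Omega$. Moreover
\begin{align*}
\|f\|_H^2 = (f, f)_H &= \Big( \sum_{i=1}^{N} c_i K(\cdot, x_i), \sum_{j=1}^{N} c_j K(\cdot, x_j) \Big)_H \\
&= \sum_{i,j=1}^{N} c_i c_j \left( K(\cdot, x_i), K(\cdot, x_j) \right)_H \\
&= \sum_{i,j=1}^{N} c_i c_j K(x_i, x_j) .
\end{align*}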
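\[ H_K(\Omega) := \mathrm{span}\{K(\cdot, y) : y \in \Omega\} \tag{4.8} \]
and the associated bilinear form
\[
\Big( \sum_{i=1}^{N_K} c_i K(\cdot, x_i), \sum_{j=1}^{N_K} d_j K(\cdot, y_j) \Big)_K = \sum_{i,j=1}^{N_K} c_i d_j K(x_i, y_j) .
\]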
The last observation is that this bilinear form defines an inner product on $H_K(\Omega)$, so
that $H_K(\Omega)$ is a pre-Hilbert space, which means that it need not be complete.
The native space for the kernel $K$, indicated as $N_K(\Omega)$ (or, if no confusion arises, simply
$N$), is the completion of $H_K(\Omega)$ w.r.t. the $K$-norm $\|\cdot\|_K$, so that $\|f\|_K = \|f\|_N$ for all
$f \in H_K(\Omega)$. For details, please refer to the book by Wendland [13].
We quote here two results that give a flavour of the topic. The first theorem gives a
pointwise estimate based on the power function, whose proof can be found in [6, p. 117-118],
while the second uses the fill-distance (the proof is again in [6, p. 121-122]).
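and
\[ C_\Phi(x) = \max_{|\beta| = 2k}\ \max_{w, z \in \Omega \cap B(x, c_2 h_{X,\Omega})} |D_2^{\beta} \Phi(w, z)| . \]
Comparing (4.9) and (4.10) we get a bound of the power function in terms of the
fill-distance
\[ P_{\Phi,X}(x) \le C h_{X,\Omega}^{k} \sqrt{C_\Phi(x)} . \]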
Moreover, Theorem 16 says that interpolation with a $C^{2k}$ kernel has approximation
order $k$. This means that for infinitely smooth kernels, such as Gaussians, Laguerre-
Gaussians, Poisson and generalized inverse multiquadrics, the approximation order is arbitrarily
high. On the contrary, Matern and Whittaker radial functions have approximation order limited
by the smoothness of the basic function $\Phi$.
One more observation is that the above estimates consider $f \in N_\Phi(\Omega)$. There exist
similar estimates for $f \notin N_\Phi(\Omega)$ (see e.g. [6, §15.3]).
This last section aims to give some insight into the problem of the choice of the shape parameter
$\varepsilon$ in order to get the smallest (possible) interpolation error. In the recent literature
various strategies have been exploited. Here, we present only three of them, which turn out
to be the most used by practitioners.
In all that follows, we assume that the same kernel $\Phi$ is used and that only one value $\varepsilon$ scales
all basis functions uniformly. The number of data points may be varied in order to compare
the results.
Figure 4.3: Trial and error strategy for the interpolation of the 1-dimensional sinc function
by the Gaussian for $\varepsilon \in [0, 20]$, taking 100 values of $\varepsilon$ and for different equispaced data
points.
This is simply connected to the error analysis presented in the previous section (cf. formula
(4.9)). Once we have decided which $\Phi$ and data set $X$ to use, we calculate the power
function on scaled versions of the kernel in order to optimize the error component that is
independent of $f$. This approach is similar to the Trial and Error strategy and has the limitation
of ignoring the second part of the error, i.e. the one that depends on the basis function via
the native space norm of $f$. In Figure 4.4 we plot the sup-norm of the power function for
500 values of $\varepsilon \in [0, 20]$ for the Gaussian kernel and the set of uniform data points $X$ with
$N = 9, 25, 81, 289$. This strategy is implemented in the M-file Powerfunction2D.m in the
Matlab codes provided in [6].
Figure 4.4: The sup-norm of the 2D power function on uniform points for the Gaussian
kernel, varying $\varepsilon \in [0, 20]$ for different values of $N$.
This method is popular in the statistics literature and, in the case the 2-norm is used, is known
as PRESS (Predictive REsidual Sum of Squares). The optimal value of $\varepsilon$ is obtained by
minimizing the (least-squares) error for a fit of the data values based on an interpolant for
which one of the centers is left out. In detail, we start by constructing $P_{f,k}$, the radial basis
function interpolant to the data with the $k$-th point left out,
such that
\[ P_{f,k}(x_i) = f_i, \qquad i = 1, \dots, k-1, k+1, \dots, N . \]
Then we compute the error at the point $x_k$, the one not used by $P_{f,k}$,
\[ E_k = f_k - P_{f,k}(x_k) . \]
Then the quality of the fit is determined by the (sup) norm of the vector $E = [E_1, \dots, E_N]^T$.
In the experiments, people add a loop on $\varepsilon$ in order to compare the error norms for
different values of the shape parameter, choosing for $\varepsilon$ the one that yields the minimum
error norm. This method in general is quite expensive from the computational point of
view (it has a complexity of $O(N^4)$ flops). There is a way to accelerate the method, by
computing $E_k$ as
\[ E_k = \frac{c_k}{A^{-1}_{k,k}} \]
where $c_k$ is the $k$-th coefficient in the interpolant $P_f$ based on all the data points and $A^{-1}_{k,k}$
is the $k$-th diagonal element of the inverse of the corresponding collocation matrix. Since
both $c_k$ and $A^{-1}$ will be computed once for each value of $\varepsilon$, this results in $O(N^3)$ flops.
This strategy is implemented in the M-file LOOCV.m in the Matlab codes provided in [6].
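To make the shortcut concrete, the following Python sketch compares the accelerated formula with the
direct leave-one-out computation for a single value of $\varepsilon$. It is not the LOOCV.m code of [6]: the
one-dimensional Gaussian, the six equispaced sites, the test function and all function names are our
own assumptions, and the inverse is formed explicitly by Gauss-Jordan elimination only for clarity.

```python
import math

def invert(A):
    """Inverse of A by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]
        for i in range(n):
            if i != k:
                factor = M[i][k]
                M[i] = [vi - factor * vk for vi, vk in zip(M[i], M[k])]
    return [row[n:] for row in M]

def gaussian_matrix(sites, eps):
    return [[math.exp(-(eps * (xi - xj)) ** 2) for xj in sites] for xi in sites]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def loocv_fast(sites, values, eps):
    """E_k = c_k / (A^{-1})_{kk}, from a single inversion of A."""
    Ainv = invert(gaussian_matrix(sites, eps))
    c = matvec(Ainv, values)
    return [c[k] / Ainv[k][k] for k in range(len(sites))]

def loocv_naive(sites, values, eps):
    """E_k = f_k - P_{f,k}(x_k), refitting the interpolant N times."""
    errors = []
    for k in range(len(sites)):
        s = sites[:k] + sites[k + 1:]          # data sites without x_k
        f = values[:k] + values[k + 1:]
        ck = matvec(invert(gaussian_matrix(s, eps)), f)
        prediction = sum(cj * math.exp(-(eps * (sites[k] - xj)) ** 2)
                         for cj, xj in zip(ck, s))
        errors.append(values[k] - prediction)
    return errors

sites = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
values = [math.sin(2 * math.pi * x) for x in sites]
fast = loocv_fast(sites, values, eps=5.0)
naive = loocv_naive(sites, values, eps=5.0)
sup_norm = max(abs(e) for e in fast)           # cost to be minimized over eps
```

The two error vectors coincide up to rounding. To select the shape parameter one would wrap
loocv_fast in a loop over candidate values of $\varepsilon$ and keep the one with the smallest norm.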
Figure 4.5: LOOCV 1d: sup norm of the error on Chebyshev points for the Gaussian kernel,
varying $\varepsilon \in [0, 20]$ for different values of $N$.
4.7 Exercises
1. Find the optimal shape parameter, $\varepsilon_{opt}$, by means of the trial & error strategy for the
following univariate functions:
(a)
\[ f_1(x) = \mathrm{sinc}(x) = \frac{\sin x}{x} . \]
(b) variant of the Franke function
\[
f_2(x) = \frac{3}{4} \left( e^{-(9x-2)^2/4} + e^{-(9x+1)^2/49} \right) + \frac{1}{2} e^{-(9x-7)^2/4} - \frac{1}{10} e^{-(9x-4)^2} ,
\]
For each of the $f_i$, $i = 1, 2$, produce a table of the form

N     $\|P_{f_i} - f_i\|_\infty$     $\varepsilon_{opt}$
3
5
9
17
33
65

where, for each $N$, $\varepsilon_{opt}$ corresponds to the minimum of the error curve in the sup-
norm, computed by varying the shape parameter $\varepsilon \in [0, 20]$. As radial basis function
for the $P_{f_i}$ take the Gaussian.
2. Plot the power function in 2 dimensions for the Gaussian kernel with $\varepsilon = 6$ on a grid
of $N = 9^2 = 81$ equispaced, Chebyshev and Halton points in $[-1, 1]^2$. We will see that
the power function depends on the chosen points.
Verify that $P_{X,\Phi}(x_i) = 0$ for all $x_i \in X$. Show how the maximum value of the
power function varies as $N$ increases.
Use the M-file Powerfunction2D.m.
3. Plot $\|P_{\Phi,X}\|_\infty$ for the Gaussian kernel in 2 dimensions, by using for the power function
the formula
\[ P_{\Phi,X}(x) = \sqrt{\Phi(0) - (b(x))^T A^{-1} b(x)} \]
with $A$ representing the interpolation matrix and $b(x) = [\Phi(x - x_1), \dots, \Phi(x - x_N)]^T$,
by varying $\varepsilon \in [0, 20]$, and $N = 9, 25, 81, 289$. Take equispaced points both as centers
and evaluation points of the power function.
Make a table similar to the one of the previous exercise, adding one column for the
condition number of $A$ corresponding to the optimal shape parameter. Use the
function Powerfunction2D.m for computing the 2-dimensional power function.
Bibliography
[2] Chung K. C. and Yao, T. H.: On lattices admitting unique Lagrange interpolations,
SIAM J. Numer. Anal. 14 (1977), 735-743.
[3] C. de Boor, A Practical Guide to Splines, revised edition, Springer, New York 2001.
[4] S. De Marchi: On Leja sequences: some results and applications, Appl. Math. Comput.
152(3) (2004), 621-647.
[6] Gregory E. Fasshauer, Meshfree Approximation Methods with Matlab, World Scientific
Publishing, Interdisciplinary Mathematical Sciences - Vol 6, 2007.
[7] Gregory E. Fasshauer, Meshfree Approximation Methods with Matlab, Lecture 1, slides.
Dolomites Res. Notes Approx. - Vol 1, 2008.
[8] Armin Iske, Multiresolution Methods in Scattered Data Modelling, Lecture Notes in
Computational Science and Engineering Vol. 37, Springer (2004)
[9] Meyer, C. D., Matrix Analysis and Applied Linear Algebra. SIAM (Philadelphia), 2000.
[10] L. Bos, M. Caliari, S. De Marchi, M. Vianello and Y. Xu: Bivariate Lagrange inter-
polation at the Padua points: the generating curve approach, J. Approx. Theory 143
(2006), 15-25.
[11] Schoenberg I. J.: Metric spaces and completely monotone functions, Ann. of Math. 39
(1938), 811-841.
[12] L. L. Schumaker, Spline Functions - Basic Theory, Wiley-Interscience, New York 1981.
[14] Wu Z. and Schaback R.: Local error estimates for radial basis function interpolation
of scattered data, IMA J. Numer. Anal. 13 (1993), 13-27.