LectureNotes RBF

Four lectures on
Radial Basis Functions
by Stefano De Marchi
Department of Mathematics
University of Padua
Antwerp, October 14-21, 2013

These lecture notes were inspired mainly by two seminal books on the topic, by Holger
Wendland [13] and by Gregory E. Fasshauer [6]. The first presents more theoretical aspects,
while the second also provides useful Matlab functions for better understanding the theory
and all the applications discussed. The notes were then used during a short teaching
visit of the author to the University of Antwerp, within the Erasmus Teaching Staff Mobility.
Readers interested in radial basis functions can refer to the wide literature available,
which has grown very fast, especially in the last two decades. The popularity of radial basis
functions can be understood by means of the following parallel. In many cooking
recipes parsley is used to give flavour and colour to dishes. Radial basis functions
can be considered a mathematical parsley, since they have been used in all mathematical
problems requiring a powerful, i.e. efficient and stable, approximation tool.
These four lectures were conceived for students without a strong background in
functional analysis, so in presenting the topics I deliberately avoid, when possible,
introducing functional analysis concepts. This is a serious omission, but I hope that the readers
of these notes will not be too critical of me.
Moreover, these are only four introductory lectures on the topic, and many important
aspects and applications are not considered for lack of time. Every lecture also provides a
set of exercises solvable with Matlab. This choice was made with the aim of making
the discussion more interesting from both the numerical and the geometrical point of view.
I do hope that after this brief introduction, interested students will be encouraged
to explore this fascinating mathematical tool further.
Stefano De Marchi
Antwerp October 14, 2013.
Contents
1 Learning from splines
1.1 Motivations
1.2 From cubic splines to RBF
1.3 The scattered data interpolation problem
1.3.1 The Haar-Mairhuber-Curtis theorem
1.3.2 Distance matrices
1.3.3 Data sets
1.3.4 Halton points
1.4 Exercises
2 Positive definite functions
2.1 Positive definite matrices and functions
2.1.1 Completely monotone functions
2.1.2 Multiply monotone functions
2.1.3 Other positive definite radial functions
2.2 Exercises
3 Conditionally positive definite functions
3.0.1 Conditionally positive definite matrices and functions
3.1 Exercises
4 Error estimates
4.1 Fill distance
4.2 Lagrange form of the interpolant
4.3 The power function
4.4 Native space
4.5 Generic error estimates
4.6 Strategies in reducing the interpolation error
4.6.1 Trial and Error
4.6.2 Power function
4.6.3 Cross validation
4.7 Exercises
List of Figures
1.1 Data points, data values and data function
1.2 A typical basis function for the Euclidean distance matrix fit, $B_k(x) = \|x - x_k\|_2$ with $x_k = 0$ and $d = 2$
1.3 Tensor products of equally spaced points and tensor products of Chebyshev points
1.4 Halton and Sobol points
1.5 Lattice and Latin points
1.6 Projections in 2D of different point sets
2.1 Left: Gaussian with c = 3. Right: inverse multiquadric with = 1.
2.2 Left: truncated power with k = 1. Right: truncated power with k = 2.
2.3 First row: G-L when d = 1 and n = 1, 2. Second row: G-L for d = 2 and n = 1, 2.
2.4 Poisson RBF for d = 2, 3, 4, respectively. Here the shape parameter is $\varepsilon = 10$
2.5 Matern RBF for different , respectively. As before the shape parameter is $\varepsilon = 10$
2.6 Inverse multiquadric with shape parameter $\varepsilon = 3$. On the left = 1/2 that corresponds to the Hardy multiquadric. On the right with = 1 that corresponds to the inverse quadric.
2.7 Whittaker functions with = 0, k = 2 and = 1, k = 2.
3.1 Multiquadrics. Left: Hardy multiquadric. Right: with = 5/2.
3.2 Radial powers. Left: with = 3, = 3. Right: with = 5, = 3.
3.3 Thin plate splines. Left: for = 1, = 1. Right: for = 2, = 1.
4.1 The fill distance of 25 Halton points, $h \approx 0.2667$
4.2 Power function for the Gaussian kernel with $\varepsilon = 6$ on a grid of 81 uniform, Chebyshev and Halton points, respectively
4.3 Trial and error strategy for the interpolation of the 1-dimensional sinc function by the Gaussian for $\varepsilon \in [0, 20]$, taking 100 values of $\varepsilon$ and for different equispaced data points
4.4 The sup-norm of the 2D power function on uniform points for the Gaussian kernel, varying $\varepsilon \in [0, 20]$ for different values of $N$
4.5 LOOCV 1d: sup norm of the error on Chebyshev points for the Gaussian kernel, varying $\varepsilon \in [0, 20]$ for different values of $N$
List of Tables
2.1 Gaussians-Laguerre functions for different d and n
2.2 Poisson functions for various d
2.3 Matern functions for different values of the parameter
2.4 Whittaker functions for various choices of k and
Lecture 1
Learning from splines
1.1 Motivations
In practical applications we have to face the problem of reconstructing an unknown function
$f$ from a (usually small) set of data. These data consist of two sets: the data sites
$X = \{x_1, \dots, x_N\}$ and the data values $f_j = f(x_j)$, $j = 1, \dots, N$. The reconstruction has to
approximate the data values at the data sites. In practice we are looking for a function $s$
that either interpolates the data, i.e. satisfies the conditions $s(x_j) = f_j$, $1 \le j \le N$, or
approximates the data, i.e. $s(x_j) \approx f_j$. The latter is important, for example, when the data
come from some measurement or contain noise.
In many cases the data are scattered, that is they have no special structure, and they
come in large amounts (several millions). Moreover, in several applications the data sites
lie in a high-dimensional space. Hence, for a unifying approach, methods have been developed
in the last decades with the aim of meeting all these (new) situations.

We start from the univariate setting. We suppose that the data sites are ordered as follows
\[ X : \; a < x_1 < x_2 < \dots < x_N < b \tag{1.1} \]
and we have some data values $f_1, \dots, f_N$ to be interpolated at the data set $X$. What we
want to do, mathematically speaking, is to find $s : [a, b] \to \mathbb{R}$ with the property $s(x_j) = f_j$
for all $j = 1, \dots, N$.
Notice that the data values $f_j$ do not necessarily stem from a function $f$, but we shall
keep this possibility in mind for reasons that will become clearer later.
In the univariate setting, a simple solution of the above problem consists in taking
$s$ as a polynomial $p$ of degree at most $N - 1$. However, as we shall see later, this solution
does not work in higher dimensions. Remaining in the univariate case, no one with
experience in approximation theory would even try to interpolate a hundred
thousand points with a polynomial. Indeed, it is a well-established fact that a large
data set is better dealt with by splines than by polynomials. One aspect to notice is that, in contrast
to polynomials, the accuracy of the interpolation process using splines is not based on the
polynomial degree but on the spacing of the data sites.
Let us briefly review the main properties of univariate splines, especially in the case of
cubic splines. The set of cubic splines corresponding to the subdivision (1.1) is the space
\[ S_3(X) = \{ s \in C^2[a, b] : s|_{[x_i, x_{i+1}]} \in \mathbb{P}_3(\mathbb{R}), \; 0 \le i \le N \} \tag{1.2} \]
with $a = x_0$ and $x_{N+1} = b$. The space $S_3(X)$ has dimension $N + 4$, so that the interpolation
conditions $s(x_i) = f_i$, $1 \le i \le N$, are not sufficient to guarantee a unique interpolant. To
enforce uniqueness one restricts the problem to natural splines, i.e. to the set
\[ N_3(X) = \{ s \in S_3(X) : s|_{[a, x_1]}, \; s|_{[x_N, b]} \in \mathbb{P}_1(\mathbb{R}) \} \tag{1.3} \]
that consists of all cubic splines that are linear polynomials on the outer intervals $[a, x_1]$ and
$[x_N, b]$. It is easy to see that a cubic spline $s$ is a natural spline if and only if it satisfies
$s''(x_1) = s^{(3)}(x_1) = 0$ and $s''(x_N) = s^{(3)}(x_N) = 0$. With this choice we have imposed 4
additional conditions on the space, so it is natural to expect that $\dim(N_3(X)) = N$.
Even more, it can be shown that the initial interpolation problem has a unique solution in
$N_3(X)$.
Here are some important properties of splines that are worth mentioning.
1. They are piecewise polynomials.
2. An interpolating natural cubic spline satisfies a minimal norm property. Assume that
$f$ comes from the Sobolev space $H^2[a, b]$, i.e. $f$ is continuous in $[a, b]$ and has weak
first and second order derivatives in $L_2[a, b]$ (a more precise definition will be given
later, or can be found in any book on functional analysis). Assume further that $f$ is
such that $f(x_j) = f_j$, $1 \le j \le N$. If $s_{f,X}$ denotes the natural cubic spline interpolant
(at the data set $X$) then
\[ (f'' - s''_{f,X}, \, s''_{f,X})_{L_2[a,b]} = 0 . \]
This leads to the Pythagorean equation
\[ \|f'' - s''_{f,X}\|^2_{L_2[a,b]} + \|s''_{f,X}\|^2_{L_2[a,b]} = \|f''\|^2_{L_2[a,b]} , \]
indicating that the natural cubic spline interpolant is the function from $H^2[a, b]$ that
minimizes the semi-norm $\|f''\|^2_{L_2[a,b]}$ under the conditions $f(x_j) = f_j$, $1 \le j \le N$.
3. They possess a local basis called B-splines. This basis, which is more stable than any
other, can be defined by recursion, by divided differences of the truncated cubic power
$p(x; t) = (x - t)^3_+$, or by convolution. Here $x_+$ takes the value $x$ for nonnegative $x$
and zero otherwise.
Readers interested in splines and their many properties can refer to the fundamental
books by Schumaker [12] or de Boor [3].
Remarks
• Property 1, combined with the local basis, not only allows the efficient computation
and evaluation of splines but is also the key ingredient for a simple error analysis.
Hence, the natural way of extending splines to the multivariate setting is
based on this property. To this end, a bounded region $\Omega \subset \mathbb{R}^d$ is partitioned
into essentially disjoint regions $\{\Omega_j\}_{j=1}^N$ (patches). Then the spline space consists of
those functions $s$ that are piecewise polynomials on each $\Omega_j$ and that have smooth
connections on the boundaries of two adjacent patches. In two dimensions the most
popular subdivision of a polygonal region is by triangles. It is interesting to note
that even in this simple case the dimension of the spline space is in general unknown.
In higher dimensions it is not at all clear what an appropriate replacement
for the triangulation would be. Hence, even if great progress has been made in the
2-dimensional setting, the method is not suited for general dimensions.
• Another possible generalization to the multivariate setting is based on Property 3.
In particular, a construction based on convolution led to the so-called box splines.
Again, while the problem can be handled in 2 dimensions, for higher dimensions it is
still an open problem.
Property 2 is the motivation for a general framework in higher dimensions. This
approach has made it possible to develop a beautiful theory where all space dimensions can be
handled in the same way. The resulting approximation spaces no longer consist of piecewise
polynomials, so they cannot be called splines. The new functions are known by the
fashionable name of Radial Basis Functions (in what follows we refer to them simply as
RBF).
1.2 From cubic splines to RBF
To get a better idea, let us recall that the set $S_3(X)$ has the basis of truncated powers
$(\cdot - x_j)^3_+$, $1 \le j \le N$, plus an arbitrary basis for $\mathbb{P}_3(\mathbb{R})$. Hence every $s \in N_3(X)$ can be
represented in the form
\[ s(x) = \sum_{j=1}^{N} a_j (x - x_j)^3_+ + \sum_{j=0}^{3} b_j x^j, \quad x \in [a, b] . \tag{1.4} \]
Because $s$ is a natural spline we have the additional information that $s$ is linear on the two
outer intervals. On $[a, x_1]$ all truncated powers vanish, so there the spline is simply
$s(x) = b_0 + b_1 x$ (that is, $b_2 = b_3 = 0$). Thus (1.4) becomes
\[ s(x) = \sum_{j=1}^{N} a_j (x - x_j)^3_+ + b_0 + b_1 x, \quad x \in [a, b] . \tag{1.5} \]
To derive the representation of $s$ in $[x_N, b]$ we simply have to remove all subscripts $+$ on
the functions $(\cdot - x_j)^3_+$ in (1.5). Expanding these cubics and rearranging the sums we get
\[ s(x) = \sum_{l=0}^{3} \binom{3}{l} (-1)^{3-l} \Big( \sum_{j=1}^{N} a_j x_j^{3-l} \Big) x^l + b_0 + b_1 x, \quad x \in [x_N, b] . \tag{1.6} \]
Thus, for $s$ to be a natural spline (linear on $[x_N, b]$), the coefficients of $x^3$ and $x^2$ must vanish, that is
\[ \sum_{j=1}^{N} a_j = \sum_{j=1}^{N} a_j x_j = 0 . \tag{1.7} \]
This is a first characterization of natural cubic splines.

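One more step. Using the identity $x^3_+ = \frac{|x|^3 + x^3}{2}$, and thanks to the relations (1.7), we
get
\[
\begin{aligned}
s(x) &= \sum_{j=1}^{N} \frac{a_j}{2} |x - x_j|^3 + \sum_{j=1}^{N} \frac{a_j}{2} (x - x_j)^3 + b_0 + b_1 x \\
     &= \sum_{j=1}^{N} \frac{a_j}{2} |x - x_j|^3 + \sum_{l=0}^{3} \frac{1}{2} \binom{3}{l} (-1)^{3-l} \Big( \sum_{j=1}^{N} a_j x_j^{3-l} \Big) x^l + b_0 + b_1 x \\
     &= \sum_{j=1}^{N} \tilde{a}_j |x - x_j|^3 + \tilde{b}_0 + \tilde{b}_1 x ,
\end{aligned}
\]
where $\tilde{a}_j = a_j / 2$, $1 \le j \le N$, $\tilde{b}_0 = b_0 - \frac{1}{2} \sum_{j=1}^{N} a_j x_j^3$ and $\tilde{b}_1 = b_1 + \frac{3}{2} \sum_{j=1}^{N} a_j x_j^2$.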
Proposition 1. Every natural spline $s$ has the representation
\[ s(x) = \sum_{j=1}^{N} a_j \varphi(|x - x_j|) + p(x), \quad x \in \mathbb{R} \tag{1.8} \]
where $\varphi(r) = r^3$, $r \ge 0$, and $p \in \mathbb{P}_1(\mathbb{R})$. The coefficients $\{a_j\}$ have to satisfy the relations
(1.7). Conversely, for every set $X = \{x_1, \dots, x_N\} \subset \mathbb{R}$ of pairwise distinct points and
for every $f \in \mathbb{R}^N$ there exists a function $s$ of the form (1.8), with (1.7), that interpolates
the data, i.e. $s(x_j) = f_j$, $1 \le j \le N$.
This is the starting point for understanding the origin of RBF. The resulting interpolant
is, up to a low-degree polynomial, a linear combination of shifts of a radial function
$\Phi = \varphi(|\cdot|)$. The function is called radial because it is the composition of a univariate function
with the Euclidean norm on $\mathbb{R}$.
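Proposition 1 can be tried out numerically. The following is a minimal Python sketch, not taken from the notes or from the Matlab files they mention: it assumes the interpolation conditions and the side conditions (1.7) are collected in one square linear system of size $N + 2$, and the helper names `solve` and `natural_spline` and the data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def natural_spline(sites, values):
    """Build s(x) = sum_j a_j |x - x_j|^3 + b0 + b1 x as in (1.8) with (1.7)."""
    N = len(sites)
    # Interpolation rows: unknowns are a_1..a_N, b0, b1.
    A = [[abs(xi - xj) ** 3 for xj in sites] + [1.0, xi] for xi in sites]
    A.append([1.0] * N + [0.0, 0.0])      # side condition: sum_j a_j = 0
    A.append(list(sites) + [0.0, 0.0])    # side condition: sum_j a_j x_j = 0
    coef = solve(A, list(values) + [0.0, 0.0])
    a, b0, b1 = coef[:N], coef[N], coef[N + 1]

    def s(x):
        return sum(aj * abs(x - xj) ** 3 for aj, xj in zip(a, sites)) + b0 + b1 * x

    return s, a


# Illustrative data (arbitrary choice, not from the notes).
sites = [0.0, 0.3, 0.5, 0.9, 1.0]
values = [1.0, 2.0, 0.5, -1.0, 0.0]
s, a = natural_spline(sites, values)
```

Under these assumptions `s` reproduces the data at the sites, and because of (1.7) it is a linear polynomial to the left of the first site and to the right of the last one.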
The generalization to $\mathbb{R}^d$ is straightforward, and there the name radial becomes even more
evident. In fact
\[ s(x) = \sum_{j=1}^{N} a_j \varphi(\|x - x_j\|_2) + p(x), \quad x \in \mathbb{R}^d , \tag{1.9} \]
where $\varphi : [0, \infty) \to \mathbb{R}$ is a fixed univariate function and $p \in \mathbb{P}_{m-1}(\mathbb{R}^d)$ is a low-degree
$d$-variate polynomial. The additional conditions on the coefficients (corresponding to (1.7))
become
\[ \sum_{j=1}^{N} a_j q(x_j) = 0, \quad \text{for all } q \in \mathbb{P}_{m-1}(\mathbb{R}^d) . \tag{1.10} \]
In many cases (see Lecture 2) we can avoid the side conditions (1.10) on the coefficients.
In these cases the interpolation problem has a solution if the matrix
\[ A_{\varphi,X} := \big( \varphi(\|x_i - x_j\|_2) \big)_{1 \le i,j \le N} \]
is invertible. To be more precise we ask

Problem 1. Does there exist a function $\varphi : [0, \infty) \to \mathbb{R}$ such that for all $d, N \in \mathbb{N}$ and all
pairwise distinct $x_1, \dots, x_N \in \mathbb{R}^d$ the matrix $A_{\varphi,X}$ is nonsingular?
The answer is affirmative. Examples of functions that lead to nonsingular matrices
are: the Gaussians $\varphi(r) = e^{-\alpha r^2}$, $\alpha > 0$, the inverse multiquadric $\varphi(r) = (c^2 + r^2)^{-1/2}$
and the multiquadric $\varphi(r) = (c^2 + r^2)^{1/2}$, $c > 0$. In the first two cases it is even true that
the matrix $A_{\varphi,X}$ is always positive definite (and so invertible).
Remark. In what follows, in the context of RBFs, instead of $A_{\varphi,X}$ we shall simply write
$A$ for the interpolation matrix with radial basis functions.
1.3 The scattered data interpolation problem
In many disciplines one faces the following problem: we are given a set of data (measurements,
locations at which these measurements are taken, ...) and we want to find a rule
which allows us to get information about the process we are studying also at locations different
from those at which the measurements are taken (or provided).
The main reasons why we are interested in such a problem in our setting are:
• Scattered data fitting is a fundamental problem in approximation theory and data
modeling in general.
• Mathematical challenge: we want a well-posed problem formulation.
• This will naturally lead to distance matrices.
• Later we generalize to radial basis functions or positive definite kernels.
Problem 2. Given data $(x_j, y_j)$, $j = 1, \dots, N$, with $x_j \in \mathbb{R}^d$, $y_j \in \mathbb{R}$, find a (continuous)
function $P_f$ (depending on $f$) such that $P_f(x_j) = y_j$, $j = 1, \dots, N$.
Figure 1.1: Data points, data values and data function
Now, assume $P_f$ is a linear combination of certain basis functions $B_k$, that is
\[ P_f(x) = \sum_{k=1}^{N} c_k B_k(x), \quad x \in \mathbb{R}^d . \tag{1.11} \]
Solving the interpolation problem under this assumption leads to a system of linear
equations of the form
\[ A c = y , \]
where the entries of the interpolation matrix $A$ are given by $A_{jk} = B_k(x_j)$, $j, k = 1, \dots, N$,
$c = [c_1, \dots, c_N]^T$, and $y = [y_1, \dots, y_N]^T$.
The scattered data fitting problem will be well-posed, that is a solution to the problem
will exist and be unique, if and only if the matrix $A$ is non-singular.
1.3.1 The Haar-Mairhuber-Curtis theorem
We need to introduce the following definition.
Definition 1. Let the finite-dimensional linear function space $B \subseteq C(\Omega)$ have a basis
$\{B_1, \dots, B_N\}$. Then $B$ is a Haar space on $\Omega$ if
\[ \det(A) \ne 0 \]
for any set of distinct points $x_1, \dots, x_N \in \Omega$. Here, $A$ is the matrix with entries $A_{i,j} =
B_j(x_i)$.
The existence of a Haar space guarantees the invertibility of the matrix $A$. In the
univariate setting it is well known that one can interpolate arbitrary data at $N$ distinct
data sites using a polynomial of degree $N - 1$. This means that the polynomials of degree
$N - 1$ form an $N$-dimensional Haar space for the set of distinct points $X = \{x_1, \dots, x_N\}$.
The following counterexample helps to understand why an approach different from
polynomials is needed.
Example 1. It is not possible to perform unique interpolation with (multivariate)
polynomials of degree $N$ to data given at arbitrary locations in $\mathbb{R}^2$.
The Haar-Mairhuber-Curtis theorem tells us that if we want to have a well-posed
multivariate scattered data interpolation problem we can no longer fix in advance the set
of basis functions we plan to use for interpolation of arbitrary scattered data.
Instead, the basis should depend on the data points.
Theorem 1. (Haar-Mairhuber-Curtis)
If $\Omega \subset \mathbb{R}^d$, $d \ge 2$, contains an interior point, then there exist no Haar spaces of continuous
functions except for the 1-dimensional case.
Proof. Let $d \ge 2$ and assume that $B$ is a Haar space with basis $\{B_1, \dots, B_N\}$ with
$N \ge 2$. We show that this leads to a contradiction. In fact, let $x_1, \dots, x_N$ be a set of $N$
distinct points in $\Omega \subset \mathbb{R}^d$ and $A$ the matrix such that $A_{j,k} = B_k(x_j)$, $j, k = 1, \dots, N$. By the
above definition of Haar space $\det(A) \ne 0$. Now, consider a closed path $P$ in $\Omega$ connecting
only $x_1$ and $x_2$. This is possible since by assumption $\Omega$ contains an interior point. We
can then exchange $x_1$ and $x_2$ by moving them continuously along $P$ (without interfering
with any other point $x_j$). This means that rows 1 and 2 of the matrix $A$ have been exchanged,
and so the determinant has changed sign. Since the determinant is a continuous function
of $x_1$ and $x_2$, we must have had $\det(A) = 0$ at some point along $P$. This contradicts the
fact that $\det(A) \ne 0$.
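The swapping argument in the proof can be illustrated with a small Python sketch. It assumes, purely for illustration, the two-dimensional space spanned by $B_1(x) = 1$ and $B_2(x) =$ first coordinate of $x$ on the unit square, and a circular path along which two points exchange their positions; the function names are hypothetical.

```python
import math


def det2(p, q):
    """Determinant of [[B1(p), B2(p)], [B1(q), B2(q)]] with B1 = 1, B2 = first coordinate."""
    return 1.0 * q[0] - p[0] * 1.0


def positions(t):
    """Two distinct points on a circle around (0.5, 0.5); at t = 1 they have swapped places."""
    angle = math.pi * t
    p = (0.5 + 0.25 * math.cos(angle), 0.5 + 0.25 * math.sin(angle))
    q = (0.5 - 0.25 * math.cos(angle), 0.5 - 0.25 * math.sin(angle))
    return p, q


# Determinant of the interpolation matrix sampled along the path, t = 0, 0.1, ..., 1.
dets = [det2(*positions(k / 10)) for k in range(11)]
```

The determinant has opposite signs at the two ends of the path and is (numerically) zero halfway, where the two points are distinct but share the same first coordinate. This is the singular configuration whose existence the proof guarantees.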
1.3.2 Distance matrices
We want to construct a (continuous) function $P_f$ that interpolates samples obtained from
a test function $f_d$ at data sites $x_j \in [0, 1]^d$, that is we want
\[ P_f(x_j) = f_d(x_j), \quad x_j \in [0, 1]^d . \]
For example $f_d(x) = 4^d \prod_{k=1}^{d} x_k (1 - x_k)$, $x = (x_1, \dots, x_d) \in [0, 1]^d$, which is zero on the
boundary of the unit cube in $\mathbb{R}^d$ and has a maximum value of one at the center of the
$d$-dimensional cube.
Assume for now that $d = 1$. We have already seen that for small $N$ one can use
univariate polynomials; if $N$ is relatively large it is better to use splines (the simplest
approach is the $C^0$ piecewise linear spline that "connects the dots").
A basis for the space of piecewise linear interpolating splines is
\[ \{ B_k = |\cdot - x_k| , \; k = 1, \dots, N \} . \]
Hence our spline interpolant can be written as
\[ P_f(x) = \sum_{k=1}^{N} c_k |x - x_k|, \quad x \in [0, 1] \]
and the coefficients $c_k$ will be determined by the interpolation conditions
\[ P_f(x_j) = f_1(x_j), \quad j = 1, \dots, N . \]
Some observations.
• The basis functions $B_k = |\cdot - x_k|$ depend on the data sites $x_k$, as suggested by
Haar-Mairhuber-Curtis.
• $B(x) = |x|$ is called the basic function.
• The points $x_k$ to which the basic function is shifted to form the basis functions are
usually referred to as centers or knots.
• Technically, one could choose these centers different from the data sites. However,
usually centers coincide with the data sites. This simplifies the analysis of the method,
and is sufficient for many applications. In fact, relatively little is known about the
case when centers and data sites differ.
• The $B_k$ are (radially) symmetric about their centers $x_k$, hence the name radial basis function.
Now the coefficients $c_k$ in the scattered data interpolation problem are found by solving
the linear system
\[
\begin{pmatrix}
|x_1 - x_1| & |x_1 - x_2| & \dots & |x_1 - x_N| \\
|x_2 - x_1| & |x_2 - x_2| & \dots & |x_2 - x_N| \\
\vdots & \vdots & \ddots & \vdots \\
|x_N - x_1| & |x_N - x_2| & \dots & |x_N - x_N|
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{pmatrix}
=
\begin{pmatrix} f_1(x_1) \\ f_1(x_2) \\ \vdots \\ f_1(x_N) \end{pmatrix}
\tag{1.12}
\]
• The matrix in (1.12) is an example of a distance matrix.
• Distance matrices have been studied in geometry and analysis in the context of
isometric embeddings of metric spaces for a long time.
• It is known that the distance matrix based on the Euclidean distance between a set
of distinct points in $\mathbb{R}^d$ is always non-singular (see below).
• Therefore, our scattered data interpolation problem is well-posed.
Since distance matrices are non-singular for Euclidean distances in any space dimension
$d$, we have an immediate generalization: for the scattered data interpolation problem on
$[0, 1]^d$ we can take
\[ P_f(x) = \sum_{k=1}^{N} c_k \|x - x_k\|_2 , \quad x \in [0, 1]^d , \tag{1.13} \]
and find the $c_k$ by solving
\[
\begin{pmatrix}
\|x_1 - x_1\|_2 & \|x_1 - x_2\|_2 & \dots & \|x_1 - x_N\|_2 \\
\|x_2 - x_1\|_2 & \|x_2 - x_2\|_2 & \dots & \|x_2 - x_N\|_2 \\
\vdots & \vdots & \ddots & \vdots \\
\|x_N - x_1\|_2 & \|x_N - x_2\|_2 & \dots & \|x_N - x_N\|_2
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{pmatrix}
=
\begin{pmatrix} f_d(x_1) \\ f_d(x_2) \\ \vdots \\ f_d(x_N) \end{pmatrix} .
\]
• Note that the basis is again data dependent.
• Piecewise linear splines in higher space dimensions are usually constructed differently
(via a cardinal basis on an underlying computational mesh).
• For $d > 1$ the space $\mathrm{span}\{\|\cdot - x_k\|_2 , \; k = 1, \dots, N\}$ is not the same as piecewise linear
splines.
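The distance matrix fit (1.13) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the test function $f_d$ with $d = 2$ on a $5 \times 5$ tensor grid (an arbitrary choice), a small hand-written Gaussian elimination instead of a library solver, and hypothetical helper names that are not those of the Matlab files mentioned in the notes.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def f_test(x):
    """Test function f_d(x) = 4^d * prod_k x_k (1 - x_k)."""
    return 4 ** len(x) * math.prod(xk * (1 - xk) for xk in x)


def distance_matrix_fit(centers, values):
    """Return P_f(x) = sum_k c_k ||x - x_k||_2 interpolating the values at the centers."""
    A = [[math.dist(xi, xj) for xj in centers] for xi in centers]
    c = solve(A, list(values))
    return lambda x: sum(ck * math.dist(x, xk) for ck, xk in zip(c, centers))


# Data sites coincide with the centers: a 5 x 5 grid in [0, 1]^2.
centers = [(i / 4, j / 4) for i in range(5) for j in range(5)]
values = [f_test(x) for x in centers]
Pf = distance_matrix_fit(centers, values)
residual = max(abs(Pf(x) - fx) for x, fx in zip(centers, values))
```

Here `residual` measures how well the interpolation conditions are met at the data sites. The accuracy away from the sites is a separate question, which the notes address through the RMSE in the exercises.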
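In order to show the non-singularity of our distance matrices we use the Courant-Fischer
theorem (see for example the book by Meyer [9]):
Theorem 2. Let $A$ be a real symmetric $N \times N$ matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$,
then
\[ \lambda_k = \max_{\dim V = k} \; \min_{x \in V, \, \|x\| = 1} x^T A x \quad \text{and} \quad \lambda_k = \min_{\dim V = N - k + 1} \; \max_{x \in V, \, \|x\| = 1} x^T A x . \]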
Figure 1.2: A typical basis function for the Euclidean distance matrix fit, $B_k(x) = \|x - x_k\|_2$
with $x_k = 0$ and $d = 2$.
Definition 2. A real symmetric matrix $A$ is called conditionally negative definite of
order one (or almost negative definite) if its associated quadratic form is negative, that is
\[ \sum_{j=1}^{N} \sum_{k=1}^{N} c_j c_k A_{jk} < 0 \tag{1.14} \]
for all $c = [c_1, \dots, c_N]^T \ne 0 \in \mathbb{R}^N$ that satisfy $\sum_{j=1}^{N} c_j = 0$.
Now we have
Theorem 3. An $N \times N$ matrix $A$ which is almost negative definite and has a non-negative
trace possesses one positive and $N - 1$ negative eigenvalues.
Proof. Let $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$ denote the eigenvalues of $A$. From the Courant-Fischer
theorem we get
\[ \lambda_2 = \min_{\dim V = N - 1} \; \max_{x \in V, \, \|x\| = 1} x^T A x \le \max_{c : \sum c_k = 0, \, \|c\| = 1} c^T A c < 0 , \]
so that $A$ has at least $N - 1$ negative eigenvalues. But since $\mathrm{tr}(A) = \sum_{k=1}^{N} \lambda_k \ge 0$, $A$ also
must have at least one positive eigenvalue.
Example 2. It is known that $\varphi(r) = r$ is a strictly conditionally negative definite function of
order one, i.e., the matrix $A$ with $A_{jk} = \|x_j - x_k\|_2$ is almost negative definite. Moreover,
since $A_{jj} = \varphi(0) = 0$, $j = 1, \dots, N$, we have $\mathrm{tr}(A) = 0$. Therefore, our distance matrix is
non-singular by the above theorem.
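The claim of Example 2 can be probed numerically, which is of course no proof. The Python sketch below assumes a handful of pseudo-random points in the unit square and random coefficient vectors projected onto the hyperplane $\sum_j c_j = 0$; all names and parameters are illustrative.

```python
import math
import random

random.seed(0)
points = [(random.random(), random.random()) for _ in range(8)]
N = len(points)

# Euclidean distance matrix A_jk = ||x_j - x_k||_2.
A = [[math.dist(p, q) for q in points] for p in points]


def quad_form(c):
    """Quadratic form sum_j sum_k c_j c_k A_jk from (1.14)."""
    return sum(c[j] * c[k] * A[j][k] for j in range(N) for k in range(N))


forms = []
for _ in range(200):
    c = [random.uniform(-1.0, 1.0) for _ in range(N)]
    mean = sum(c) / N
    c = [cj - mean for cj in c]      # enforce the side condition sum_j c_j = 0
    forms.append(quad_form(c))

trace = sum(A[j][j] for j in range(N))
```

In this experiment every sampled value of the quadratic form is negative and the trace vanishes, in agreement with the hypotheses of Theorem 3.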
1.3.3 Data sets
Depending on the type of approximation problem we are given, we may or may not be able
to select where the data is collected, i.e., the location of the data sites or design.
Standard choices in low space dimensions are depicted in Fig. 1.3. In higher space di-
mensions it is important to have space-filling (or low-discrepancy) quasi-random point sets.
Examples include: Halton points, Sobol points, lattice designs, Latin hypercube
designs and quite a few others (digital nets, Faure, Niederreiter, etc.).
Figure 1.3: tensor products of equally spaced points and tensor products of Chebyshev
points
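Definition 3. Given a sequence $X = \{x_1, \dots, x_N\}$, its discrepancy is
\[ D_N(X) := \sup_{B \in J} \left| \frac{\#(X, B)}{N} - \lambda_d(B) \right| \tag{1.15} \]
where
• $J$ is the family of $d$-dimensional intervals $\prod_{i=1}^{d} [a_i, b_i) = \{x \in \mathbb{R}^d : a_i \le x_i < b_i\}$ with $0 \le a_i < b_i < 1$,
• $\#(X, B)$ is the number of points of $X$ in $B$,
• $\lambda_d$ is the Lebesgue measure.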
When $\#(X, B)/N \approx \lambda_d(B)$ for every $B \in J$, so that $D_N(X)$ is small, the sequence $X$ is said to have low discrepancy.
Low-discrepancy sequences are also known as quasi-random sequences, because they
are often used in place of uniformly distributed random numbers.
Example 3. A typical application of such sequences is numerical quadrature (or cubature).
For example, in the one-dimensional case
\[ \int_0^1 f(t) \, dt \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i) . \tag{1.16} \]
• If in (1.16) $x_i = i/N$ then we get the rectangle formula.
• If the $x_i$ are random numbers, then (1.16) is the Monte Carlo method.
• If the $x_i$ form a low-discrepancy sequence, then (1.16) is a quasi-Monte Carlo method.
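The three choices of nodes in (1.16) can be compared with a short Python sketch. It assumes the integrand $f(t) = t^2$ (exact integral $1/3$), $N = 1000$ nodes, and the base-2 Van der Corput sequence of Section 1.3.4 as the low-discrepancy sequence; the function names are illustrative.

```python
import random


def van_der_corput(n, p=2):
    """Radical inverse of n in base p (the Van der Corput value h_p(n))."""
    h, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, p)
        denom *= p
        h += digit / denom
    return h


def mean_rule(f, nodes):
    """Equal-weight quadrature (1/N) * sum_i f(x_i) from (1.16)."""
    return sum(f(x) for x in nodes) / len(nodes)


def f(t):
    return t * t        # exact integral over [0, 1] is 1/3


N = 1000
rect = mean_rule(f, [i / N for i in range(1, N + 1)])             # rectangle formula
random.seed(1)
mc = mean_rule(f, [random.random() for _ in range(N)])            # Monte Carlo
qmc = mean_rule(f, [van_der_corput(n) for n in range(1, N + 1)])  # quasi-Monte Carlo

errors = {name: abs(val - 1.0 / 3.0)
          for name, val in (("rect", rect), ("mc", mc), ("qmc", qmc))}
```

The dictionary `errors` holds the three absolute errors. The Monte Carlo error depends on the seed, so a single run should be read as an illustration rather than as a comparison of convergence rates.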
1.3.4 Halton points
Here we show how one can generate Halton points in every spatial dimension. Halton
points are quasi-random, uniformly distributed points in $[0, 1)^d$ generated from Van der Corput
sequences. We start by generating Van der Corput sequences. Let an integer base $p \ge 2$ be chosen.
(i) Every $n \in \mathbb{N}$ can be written as
\[ n = \sum_{i=0}^{k} a_i p^i \]
where the coefficients $a_i$ are integers such that $0 \le a_i < p$. For example, taking $n = 10$
and $p = 3$,
\[ 10 = 1 \cdot 3^0 + 0 \cdot 3^1 + 1 \cdot 3^2 \]
giving $k = 2$, $a_0 = a_2 = 1$, $a_1 = 0$.
(ii) We define the function $h_p : \mathbb{N} \to [0, 1)$ as $h_p(n) = \sum_{i=0}^{k} \frac{a_i}{p^{i+1}}$. For example
\[ h_3(10) = \frac{1}{3} + \frac{1}{3^3} = \frac{10}{27} . \]
(iii) The Van der Corput sequence is then
\[ h_{p,N} = \{ h_p(n) : n = 0, 1, \dots, N \} . \]
In our example
\[ h_{3,10} = \left\{ 0, \frac{1}{3}, \frac{2}{3}, \frac{1}{9}, \frac{4}{9}, \frac{7}{9}, \frac{2}{9}, \frac{5}{9}, \frac{8}{9}, \frac{1}{27}, \frac{10}{27} \right\} . \]
Starting from the Van der Corput sequence, the Halton sequence is generated as follows:
take $d$ distinct primes $p_1, \dots, p_d$ and generate $h_{p_1,N}, \dots, h_{p_d,N}$, which we use as
coordinates of $d$-dimensional points, so that
\[ H_{d,N} = \{ (h_{p_1}(n), \dots, h_{p_d}(n)) : n = 0, \dots, N \} \]
is the set of $N + 1$ Halton points in $[0, 1)^d$.
Proposition 2. Halton points form a nested sequence, that is if $M < N$ then $H_{d,M} \subset H_{d,N}$.
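Steps (i)-(iii) and the Halton construction translate almost literally into code. The following Python sketch uses exact rational arithmetic so that the worked example with $p = 3$ can be reproduced digit by digit; the function names are illustrative and unrelated to haltonseq.m.

```python
from fractions import Fraction


def radical_inverse(n, p):
    """h_p(n) = sum_i a_i / p^(i+1), where n = sum_i a_i p^i is the base-p expansion."""
    h, denom = Fraction(0), 1
    while n > 0:
        n, a = divmod(n, p)      # peel off the lowest base-p digit a_i
        denom *= p
        h += Fraction(a, denom)
    return h


def van_der_corput(p, N):
    """The sequence h_{p,N} = {h_p(n) : n = 0, ..., N}."""
    return [radical_inverse(n, p) for n in range(N + 1)]


def halton(primes, N):
    """The N + 1 Halton points H_{d,N}, one coordinate per prime."""
    return [tuple(radical_inverse(n, p) for p in primes) for n in range(N + 1)]


h3 = van_der_corput(3, 10)       # the example h_{3,10} from the text
H2 = halton((2, 3), 10)          # 11 Halton points in [0, 1)^2
```

Since each point depends only on its index $n$, the first $M + 1$ points of `halton(primes, N)` are exactly `halton(primes, M)`, which is the nestedness stated in Proposition 2.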
These points can be constructed sequentially. Similar to these points are Leja
sequences [4]. As a final observation, for Halton points we have
\[ D_N(H_{d,N}) \le C \, \frac{(\log N)^d}{N} . \]

1. In Matlab the program haltonseq.m by Daniel Dougherty, downloadable at the Matlab
Central File Exchange, generates Halton points in every space dimension.
The call is haltonseq(numpts, dim). Notice that in this implementation the point
$0 = (0, \dots, 0)^T$ is not part of the point set, that is the points are generated starting from
$n = 1$ instead of $n = 0$ as described above. In recent Matlab versions, the call
P=haltonset(d) constructs a d-dimensional Halton point set, and net(P,n) returns its first n points.
2. Analogously, Sobol points can be generated in Matlab by P=sobolset(d). This call
constructs a d-dimensional point set P of the sobolset class, with default property
settings. For example, if d = 2, P is an array of size intmax × 2. If one wishes different
properties, the call becomes P=sobolset(d, p1, var1, p2, var2, ...), which specifies
property name/value pairs used to construct P.
3. Similarly, the Matlab function X = lhsdesign(n,p) returns an n × p matrix, X,
containing a Latin hypercube sample of n values on each of p variables. For each
column of X, the n values are randomly distributed with one from each interval
(0, 1/n), (1/n, 2/n), ..., (1 − 1/n, 1), and they are randomly permuted.
Figure 1.4: Halton and Sobol points
The difference between the standard (tensor product) designs and the quasi-random
designs shows especially in higher space dimensions, as shown in Fig. 1.6
Figure 1.5: Lattice and Latin points
1.4 Exercises
The Haar-Mairhuber-Curtis theorem told us that it is not possible to interpolate scattered
data in dimension $d \ge 2$ by multivariate polynomials of degree $N \ge 2$. This suggested
taking a basis of functions localized at the so-called data sites.
The tools that we need for these exercises are
• The Halton points on the hypercube $[0, 1]^d$, $d \ge 1$.
• The fill distance (or mesh size) $h_{X,\Omega}$ of a set $X \subset \Omega \subseteq \mathbb{R}^d$, that is
\[ h_{X,\Omega} = \sup_{x \in \Omega} \; \min_{x_j \in X} \|x - x_j\|_2 . \tag{1.17} \]
In Matlab, this distance can be easily computed by the command hX=max(min(DME)),
where DME is the distance matrix, generated by the function DistanceMatrix.m, evaluated
at a set of target (or evaluation) points (for example an equispaced grid
finer than that of the data sites $X$).
• The functions to be approximated are
\[ f_s(x) = 4^s \prod_{k=1}^{s} x_k (1 - x_k), \quad x = (x_1, \dots, x_s) \in [0, 1]^s \tag{1.18} \]
\[ \mathrm{sinc}(x) = \prod_{k=1}^{s} \frac{\sin(\pi x_k)}{\pi x_k} . \tag{1.19} \]
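The fill distance (1.17) can be approximated by replacing the supremum over $\Omega$ with a maximum over a fine evaluation grid, in the same spirit as the Matlab command hX=max(min(DME)). The Python sketch below assumes $\Omega = [0, 1]^2$ and five data sites chosen only for illustration (the corners and the center of the square), for which the exact value $1/2$ is attained at the edge midpoints; the function name is hypothetical.

```python
import math


def fill_distance(sites, eval_points):
    """Approximate h_{X,Omega} = sup_x min_j ||x - x_j||_2 on a finite evaluation set."""
    return max(min(math.dist(x, xj) for xj in sites) for x in eval_points)


# Data sites: the four corners and the center of the unit square (illustrative choice).
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]

# Evaluation points: a 41 x 41 equispaced grid, finer than the data sites.
m = 41
grid = [(i / (m - 1), j / (m - 1)) for i in range(m) for j in range(m)]

h = fill_distance(sites, grid)
```

The computed value is always a lower bound for the true fill distance, since the maximum is taken over finitely many points. Here the grid contains the edge midpoints, so the bound is attained.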
1. By means of the Matlab function haltonseq.m compute $N = 5^d$ Halton points in
dimensions $d = 1, 2, 3$. Compute for each set the corresponding fill distance $h_{X,\Omega}$.
2. Verify graphically the nested property of Halton points, that is
\[ H_{d,M} \subset H_{d,N}, \quad M < N . \tag{1.20} \]
In practice, using different colours, plot the Halton points for different values of $M$
and $N$.
3. Again, by using the function DistanceMatrix.m on different sets of Halton points of
dimension $d = 2$, verify that the corresponding distance matrix, say $A$, is ill-conditioned,
by computing its condition number in the 2-norm (in Matlab cond(A)).
4. In dimension $d = 2$, using the function DistanceMatrixFit.m, build the RBF
interpolant of the functions (1.18) and (1.19) using the basis $\varphi_k(x) = \|x - x_k\|_2$ (that is,
the translates at $x_k$ of the basic function $\varphi(r) = r$), and compute the Root Mean
Square Error, RMSE (see Lecture 4 for its definition). Verify that as $N$ increases, the
error decreases. Notice: the RMSE has to be evaluated on a finer grid of evaluation
points.
5. Repeat the previous exercise by using the Gaussian radial basis function, $\varphi(x) =
e^{-\varepsilon^2 \|x\|^2}$, $\varepsilon > 0$, again for $d = 2$. To carry out this interpolation, use the function
RBFInterpolation2D.m that generalizes DistanceMatrixFit.m.
The Matlab files can be downloaded at the link
http://www.math.unipd.it/demarchi/TAA2010
Figure 1.6: Projections in 2D of different point sets
Lecture 2
Positive definite functions
In the first lecture we saw that the scattered data interpolation problem with RBFs leads
to the solution of a linear system
\[ A c = y \]
with $A_{i,j} = \varphi(\|x_i - x_j\|_2)$ and $y_i$ the $i$-th data value. The solution of the system requires
that the matrix $A$ is non-singular. The situation is favourable if we know in advance that
the matrix is positive definite. Moreover, we would like to characterize the class of functions $\varphi$
for which the matrix is positive definite.
2.1 Positive definite matrices and functions
We start with the definition of positive definite matrices.
Definition 4. A real symmetric matrix A is called positive semi-definite if its associated quadratic form satisfies $c^T A c \ge 0$, that is
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j A_{i,j} \ge 0 \qquad (2.1)$$
for all $c \in \mathbb{R}^N$. If the quadratic form (2.1) is zero only for c = 0 then A is called positive definite (sometimes we will use the shorthand notation PD).
The most important property of such matrices is that their eigenvalues are positive, and so is their determinant. The converse is not true, that is, a matrix with positive determinant is not necessarily positive definite. Just consider as a trivial example the matrix (in Matlab notation) A=[-1 0 ; 0 -3], whose determinant is positive, but A is not PD.
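The counterexample can be checked by hand, or with a few lines of code. The following is a minimal sketch in Python (an assumption made for illustration, since the notes work in Matlab); the helper name `quadratic_form` is hypothetical.

```python
def quadratic_form(A, c):
    """Return c^T A c for a square matrix A given as a list of rows."""
    n = len(c)
    return sum(c[i] * A[i][j] * c[j] for i in range(n) for j in range(n))

# The matrix of the example above
A = [[-1.0, 0.0], [0.0, -3.0]]

# 2x2 determinant: positive
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Quadratic form along the first unit vector: negative, so A is not PD
q = quadratic_form(A, [1.0, 0.0])

print(det_A, q)
```

A single vector c with a negative quadratic form is enough to rule out positive definiteness, whatever the sign of the determinant.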
Hence, if in (1.11) the basis $B_k$ generates a positive definite interpolation matrix, then we would always have a well-defined interpolation problem. In order to get such a property, we need to introduce the class of positive definite functions.
Definition 5. A continuous complex valued function $\Phi : \mathbb{R}^d \to \mathbb{C}$ is called positive semi-definite if, for all $N \in \mathbb{N}$, all sets of pairwise distinct points $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^d$ and all $c \in \mathbb{C}^N$, the quadratic form
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i \bar{c}_j \Phi(x_i - x_j) \qquad (2.2)$$
is nonnegative. The function $\Phi$ is then called positive definite if the quadratic form above is positive for all $c \in \mathbb{C}^N$, $c \ne 0$.
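Example 4. The function $\Phi(x) = e^{i x^T y}$, for a fixed $y \in \mathbb{R}^d$, is positive semi-definite on $\mathbb{R}^d$. In fact,
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i \bar{c}_j \Phi(x_i - x_j) = \sum_{i=1}^{N} \sum_{j=1}^{N} c_i \bar{c}_j e^{i (x_i - x_j)^T y} = \left( \sum_{i=1}^{N} c_i e^{i x_i^T y} \right) \left( \sum_{j=1}^{N} \bar{c}_j e^{-i x_j^T y} \right) = \left| \sum_{i=1}^{N} c_i e^{i x_i^T y} \right|^2 \ge 0.$$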
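The identity used in Example 4 can be checked numerically. The sketch below, in Python with scalar points (d = 1), compares the double sum with the squared modulus; the function names and the sample data are illustrative assumptions, not part of the notes.

```python
import cmath

def quad_form_exponential(points, coeffs, y):
    """Double sum of c_i * conj(c_j) * exp(i (x_i - x_j) y) for scalar points."""
    total = 0j
    for xi, ci in zip(points, coeffs):
        for xj, cj in zip(points, coeffs):
            total += ci * cj.conjugate() * cmath.exp(1j * (xi - xj) * y)
    return total

def squared_modulus(points, coeffs, y):
    """|sum_i c_i exp(i x_i y)|^2, the closed form of the quadratic form."""
    s = sum(ci * cmath.exp(1j * xi * y) for xi, ci in zip(points, coeffs))
    return abs(s) ** 2

# Arbitrary pairwise distinct points and complex coefficients
points = [0.0, 0.3, 1.1, 2.5]
coeffs = [1 + 2j, -0.5j, 0.7 + 0j, -1.2 + 0.4j]
y = 1.7

lhs = quad_form_exponential(points, coeffs, y)
rhs = squared_modulus(points, coeffs, y)
print(lhs, rhs)
```

Up to rounding, the double sum is real and coincides with the squared modulus, hence it cannot be negative.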
Remark. From the previous definition and the discussion in Lecture 1, we should use PD functions as basis, i.e. $B_i(x) = \Phi(x - x_i)$, that is an interpolant of the form $P_f(x) = \sum_{i=1}^{N} c_i B_i(x)$. Moreover, at this point we do not need $P_f$ to be a radial function, but simply translation invariant (that is, $P_f$ is the same as the translated interpolant to the original data). We will characterize PD and radial functions in $\mathbb{R}^d$ later in this lecture.

These functions have some properties that we summarize in the following theorem.
Theorem 4. Suppose $\Phi$ is a positive semi-definite function. Then
1. $\Phi(0) \ge 0$ and $\Phi(0) = 0$ iff $\Phi \equiv 0$.
2. $\Phi(-x) = \overline{\Phi(x)}$ for all $x \in \mathbb{R}^d$.
3. $|\Phi(x)| \le \Phi(0)$ for all $x \in \mathbb{R}^d$ (boundedness).
4. If $\{\Phi_j, j = 1, \ldots, m\}$ are positive semi-definite and $\alpha_j \ge 0$, $j = 1, \ldots, m$, then $\Phi = \sum_{j=1}^{m} \alpha_j \Phi_j$ is also positive semi-definite. If one of the $\Phi_j$ is positive definite and the corresponding coefficient $\alpha_j$ is positive, then $\Phi$ is also positive definite.
5. The product of two positive definite functions is positive definite.
Proof. The proof can be found in [13, p. 65-66] or [6, p. 29-30].
Remark. From property 2. of Theorem 4, it is clear that a positive semi-definite function $\Phi$ is real valued iff $\Phi$ is even. But we can also restrict to real coefficient vectors $c \in \mathbb{R}^N$ in the quadratic form. In fact, the following theorem holds.
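Theorem 5. Suppose $\Phi : \mathbb{R}^d \to \mathbb{R}$ is continuous. Then $\Phi$ is positive definite if and only if $\Phi$ is even and we have, for all $N \in \mathbb{N}$, all $c \in \mathbb{R}^N \setminus \{0\}$ and all pairwise distinct $x_1, \ldots, x_N$,
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j \Phi(x_i - x_j) > 0.$$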
Proof. If $\Phi$ is PD and real valued, then it is even by the previous theorem (by property 2.). Letting $c_k = a_k + i\,b_k$ then
$$\sum_{i,j=1}^{N} c_i \bar{c}_j \Phi(x_i - x_j) = \sum_{i,j=1}^{N} (a_i a_j + b_i b_j) \Phi(x_i - x_j) + i \sum_{i,j=1}^{N} a_j b_i \left[ \Phi(x_i - x_j) - \Phi(x_j - x_i) \right].$$
As $\Phi$ is even, the second sum on the right hand side is zero. The first sum is nonnegative because of the assumption, vanishing only if $a_i = b_i = 0$.
Example 5. The cosine function is positive semi-definite on $\mathbb{R}$. In fact, for all $x \in \mathbb{R}$, $\cos x = \frac{e^{ix} + e^{-ix}}{2}$. By property 4. of Theorem 4 and the fact that the exponential is positive semi-definite (see Example 4), we conclude.
When we are dealing with radial functions, i.e. $\Phi(x) = \phi(\|x\|)$, it will be convenient to refer to the univariate function $\phi$ as a positive definite radial function. A consequence of this notational convention is the following Lemma.
Lemma 1. If $\Phi(x) = \phi(\|x\|)$ is PD (or positive semi-definite) and radial on $\mathbb{R}^d$, then $\Phi$ is also PD (or positive semi-definite) and radial on $\mathbb{R}^\delta$ for any $\delta \le d$.
The Schoenberg characterization of PD and radial functions
The classical way to characterize PD and radial functions is by the following theorem, due to Schoenberg [11].
Theorem 6. A continuous function $\phi : [0, \infty) \to \mathbb{R}$ is PD and radial on $\mathbb{R}^d$ if and only if it is the Bessel transform of a finite nonnegative Borel measure $\mu$ on $[0, \infty)$, i.e.
$$\phi(r) = \int_0^\infty \Omega_d(rt)\, d\mu(t)$$
where
$$\Omega_d(r) = \begin{cases} \cos(r) & d = 1 \\ \Gamma\left(\frac{d}{2}\right) \left(\frac{2}{r}\right)^{(d-2)/2} J_{(d-2)/2}(r) & d \ge 2 \end{cases}$$
Here $J_\nu$ is the classical Bessel function of the first kind of order $\nu$, that is, the solution of the Bessel differential equation. For example, around x = 0, this function can be expressed as the series
$$J_\nu(x) = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\, \Gamma(k + \nu + 1)} \left(\frac{x}{2}\right)^{2k + \nu}.$$
In particular, as already seen, $\phi(x) = \cos(x)$ can be seen as a fundamental PD and radial function on $\mathbb{R}$.
Observation. Since these lectures are intended for people with no preparation and mathematical background in measure theory and functional analysis, we prefer to avoid introducing more results characterizing PD functions by using (Fourier) transforms. We prefer to use another, more comprehensible approach based on the definition of completely monotone and multiply monotone functions. Interested readers can satisfy their hunger for knowledge by looking at the books [1, 6, 13].
2.1.1 Completely monotone functions
As observed in Theorem 6 and the subsequent remark, we avoid characterizing the positive definiteness of RBFs using Fourier transforms, also because Fourier transforms are not always easy to compute.
Definition 6. A function $\phi : [0, \infty) \to \mathbb{R}$ that is in $C[0, \infty) \cap C^\infty(0, \infty)$ and satisfies
$$(-1)^k \phi^{(k)}(r) \ge 0, \quad r > 0, \; k = 0, 1, 2, \ldots$$
is called Completely Monotone (shortly CM).
This definition allows us to verify when a function is positive definite and radial for all dimensions d.
Here we enumerate some of the most important PD functions showing that they are CM.
1. The function $\phi(r) = c$, $c \ge 0$, is CM on $[0, \infty)$.
2. The function $\phi(r) = e^{-c r}$, $c \ge 0$, is CM on $[0, \infty)$ since
$$(-1)^k \phi^{(k)}(r) = c^k e^{-cr} \ge 0, \quad k = 0, 1, 2, \ldots$$
3. The function $\phi(r) = \frac{1}{(1+r)^\beta}$, $\beta \ge 0$, is CM on $[0, \infty)$ since
$$(-1)^k \phi^{(k)}(r) = (-1)^{2k} \beta (\beta + 1) \cdots (\beta + k - 1)(1 + r)^{-\beta - k} \ge 0, \quad k = 0, 1, 2, \ldots$$
There are now two theorems to quote that give more information.
Theorem 7. A function $\phi$ is CM on $[0, \infty)$ if and only if $\Phi = \phi(\|\cdot\|^2)$ is positive semi-definite and radial on $\mathbb{R}^d$ for all d.
Notice that now $\Phi$ is defined via the square of the norm.
Theorem 8. A function $\phi$ is CM on $[0, \infty)$ but not constant if and only if $\Phi = \phi(\|\cdot\|^2)$ is PD and radial on $\mathbb{R}^d$ for all d.
Hence, as a consequence of these two theorems, the functions $\phi(r) = e^{-cr}$, $c > 0$, and $\phi(r) = \frac{1}{(1+r)^\beta}$, $\beta > 0$, are CM on $[0, \infty)$ and not constant, so that in $\mathbb{R}^d$, for all d, we have
1. $\Phi(x) = e^{-c^2 \|x\|^2}$, $c > 0$, is positive definite and radial on $\mathbb{R}^d$ for all d. This is the family of Gaussians. The parameter c is a shape parameter that changes the shape of the function, making it more spiky when $c \to \infty$ and flatter when $c \to 0$.
2. $\Phi(x) = (1 + \|x\|^2)^{-\beta}$, $\beta > 0$, is positive definite and radial on $\mathbb{R}^d$ for all d. This is the family of inverse multiquadrics.
2.1.2 Multiply monotone functions
This characterization allows us to check when a function is PD and radial on $\mathbb{R}^d$ for some fixed d.
Figure 2.1: Left: Gaussian with c = 3. Right: inverse multiquadric with $\beta = 1$.
Definition 7. A function $\phi : (0, \infty) \to \mathbb{R}$ which is in $C^{s-2}(0, \infty)$, $s \ge 2$, and for which $(-1)^k \phi^{(k)}(r)$ is non-negative, non-increasing and convex for $k = 0, 1, \ldots, s - 2$ is called s times monotone on $(0, \infty)$. In case s = 1 we only require $\phi \in C(0, \infty)$ to be non-negative and non-increasing.
In what follows we sometimes use the shorthand notation MM for multiply monotone functions.
Convexity can be expressed by the inequality $\phi'' \ge 0$ if $\phi''$ exists. Hence, a MM function is essentially a CM function with truncated monotonicity.
Example 6. Here we introduce two new families of radial basis functions.
1. The truncated power function
$$\phi_k(r) = (1 - r)_+^k$$
is k-times monotone for any k. Indeed
$$(-1)^s \phi_k^{(s)}(r) = k(k - 1) \cdots (k - s + 1)(1 - r)_+^{k - s} \ge 0, \quad s = 0, 1, \ldots, k.$$
These functions (as we will see later) lead to radial functions that are PD on $\mathbb{R}^d$ provided $k \ge \lfloor d/2 \rfloor + 1$.
2. Consider the integral operator I defined as
$$(If)(r) = \int_r^\infty f(t)\, dt, \quad r \ge 0,$$
with an f which is k-times monotone. Then If is (k+1)-times monotone. This follows from the fundamental theorem of calculus. This operator I plays an important role in the construction of compactly supported radial basis functions (which, for lack of time, we will not discuss in these lectures).
Figure 2.2: Left: truncated power with k = 1. Right: truncated power with k = 2.
2.1.3 Other positive definite radial functions
1. Gaussians-Laguerre. These are defined as
$$\Phi(x) = e^{-\|x\|^2} L_n^{d/2}(\|x\|^2)$$
where $L_n^{d/2}$ indicates the Laguerre polynomial of degree n and order d/2, that is
$$L_n^{d/2}(t) = \sum_{k=0}^{n} \frac{(-1)^k}{k!} \binom{n + d/2}{n - k} t^k.$$
For example, for n = 1, 2 and d = 1, 2 see Table 2.1.
d    n = 1                                  n = 2
1    $(3/2 - x^2)\, e^{-x^2}$               $(15/8 - 5/2\, x^2 + 1/2\, x^4)\, e^{-x^2}$
2    $(2 - \|x\|^2)\, e^{-\|x\|^2}$         $(3 - 3\|x\|^2 + 1/2\, \|x\|^4)\, e^{-\|x\|^2}$
Table 2.1: Gaussians-Laguerre functions for different d and n
Notice that the Gaussians-Laguerre functions depend on the space dimension d. Therefore they are PD and radial on $\mathbb{R}^d$ (and therefore also on $\mathbb{R}^\delta$ for $\delta \le d$).
2. Poisson functions. These functions have the definition
$$\Phi(x) = \frac{J_{d/2 - 1}(\|x\|)}{\|x\|^{d/2 - 1}}, \quad d \ge 2,$$
where $J_p$ is the Bessel function of the first kind and order p. While these functions are not defined at the origin, they can be extended to be $C^\infty(\mathbb{R}^d)$ (see Table 2.2 for the cases d = 2, 3, 4). Notice: $J_p$ in Matlab can be computed using the function besselj(p, z) (where z is an array of evaluation points).
d = 2              d = 3                                              d = 4
$J_0(\|x\|)$       $\sqrt{\frac{2}{\pi}}\, \frac{\sin(\|x\|)}{\|x\|}$       $\frac{J_1(\|x\|)}{\|x\|}$
Table 2.2: Poisson functions for various d
Figure 2.3: First row: G-L when d = 1 and n = 1, 2. Second row: G-L for d = 2 and n = 1, 2.
3. Matérn functions. They are defined as
$$\Phi(x) = \frac{K_{\beta - d/2}(\|x\|)\, \|x\|^{\beta - d/2}}{2^{\beta - 1} \Gamma(\beta)}, \quad d < 2\beta,$$
where $K_p$ is the modified Bessel function of the second kind of order p, which for non-integer p can be expressed through the modified Bessel function of the first kind $I_p$ as follows
$$K_p(x) = \frac{\pi}{2}\, \frac{I_{-p}(x) - I_p(x)}{\sin(\pi p)}.$$
In Table 2.3 we present the Matérn functions for three values of $\beta$, indicated as $\beta_i$, i = 1, 2, 3. Notice that the Matérn function for $\beta_1$ is not differentiable at the origin, while for $\beta_2$ it is $C^2(\mathbb{R}^d)$ and for $\beta_3$ it is $C^4(\mathbb{R}^d)$.
Figure 2.4: Poisson RBF for d = 2, 3, 4, respectively. Here the shape parameter is $\varepsilon = 10$.
$\beta_1 = \frac{d+1}{2}$      $\beta_2 = \frac{d+3}{2}$                $\beta_3 = \frac{d+5}{2}$
$e^{-\|x\|}$                   $(1 + \|x\|)\, e^{-\|x\|}$               $(3 + 3\|x\| + \|x\|^2)\, e^{-\|x\|}$
Table 2.3: Matérn functions for different values of the parameter $\beta$
4. The generalized inverse multiquadrics
$$\Phi(x) = (1 + \|x\|^2)^{-\beta}, \quad d < 2\beta.$$
5. Whittaker functions. The idea of these functions is based on the following construction. Let $f \in C[0, \infty)$ be non-negative and not identically equal to zero, and define the function
$$\phi(r) = \int_0^\infty (1 - rt)_+^{k-1} f(t)\, dt. \qquad (2.3)$$
Then $\Phi(x) = \phi(\|x\|)$ is positive definite and radial on $\mathbb{R}^d$ provided $k \ge \lfloor d/2 \rfloor + 2$.
In fact the quadratic form is
$$\sum_{i,j=1}^{N} c_i c_j \phi(\|x_i - x_j\|) = \int_0^\infty \sum_{i,j=1}^{N} c_i c_j \phi_{k-1}(t \|x_i - x_j\|) f(t)\, dt,$$
which is non-negative since the truncated power $\phi_{k-1}(\|\cdot\|)$ is positive semi-definite (see above) and f is non-negative. It is easy to see that it is indeed positive definite, and so $\phi$ is PD itself.
Figure 2.5: Matérn RBF for different $\beta$, respectively. As before the shape parameter is $\varepsilon = 10$.
The choice of f then gives different Whittaker functions. For example, as described in [6, p. 44], taking $f(t) = t^\alpha e^{-\beta t}$, $\alpha \ge 0$, $\beta > 0$, we get a function with a long expression (see [6, p. 44, formula (4.10)]) that uses the so-called Whittaker-M function.
Special cases when = 1 are provided in Table 2.4.
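$\alpha$    k = 2                                                                          k = 3
0    $\frac{\beta - \|x\| + \|x\| e^{-\beta/\|x\|}}{\beta^2}$                        $\frac{\beta^2 - 2\beta\|x\| + 2\|x\|^2 - 2\|x\|^2 e^{-\beta/\|x\|}}{\beta^3}$
1    $\frac{\beta - 2\|x\| + (\beta + 2\|x\|) e^{-\beta/\|x\|}}{\beta^3}$            $\frac{\beta^2 - 4\beta\|x\| + 6\|x\|^2 - (2\beta\|x\| + 6\|x\|^2) e^{-\beta/\|x\|}}{\beta^4}$
Table 2.4: Whittaker functions for various choices of k and $\alpha$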
Figure 2.6: Inverse multiquadric with shape parameter $\varepsilon = 3$. On the left $\beta = 1/2$, which corresponds to the Hardy multiquadric. On the right $\beta = 1$, which corresponds to the inverse quadric.
2.2 Exercises
1. Plot some of the positive definite radial functions (centered at the origin). When d = 1 take $x \in [-1, 1]$ while for d = 2 consider $x \in [-1, 1]^2$.
- The Gaussians-Laguerre for n = 1, 2 and d = 1, 2. See Table 2.1 for the corresponding definitions.
- The Poisson functions for d = 2, 3, 4 in $[-1, 1]^2$ using as shape parameter $\varepsilon = 10$, see Table 2.2.
- The Matérn functions in $[-1, 1]^2$, for three different values of $\beta$ and shape parameter $\varepsilon = 10$, as defined in Table 2.3.
- The generalized inverse multiquadrics $\Phi(x) = (1 + \|x\|^2)^{-\beta}$, $d < 2\beta$, in $[-1, 1]^2$, in these two (simple) cases (with $\varepsilon = 5$): $\beta = 1/2$ (which corresponds to the so-called Hardy inverse multiquadrics) and $\beta = 1$ (which is the inverse quadrics).
- The truncated powers $\Phi(x) = (1 - \|x\|)_+^l$ for l = 2, 4 (in $[-1, 1]^2$).
- The Whittaker potentials in the square $[-1, 1]^2$ for different values of the parameters $\alpha$, k and $\beta$, as defined in Table 2.4. For these last plots use $\varepsilon = 1$.
2. Interpolate the Franke function
$$f(x_1, x_2) = 0.75 \exp[-((9x_1 - 2)^2 + (9x_2 - 2)^2)/4] + 0.75 \exp[-(9x_1 + 1)^2/49 - (9x_2 + 1)/10] + 0.5 \exp[-((9x_1 - 7)^2 + (9x_2 - 3)^2)/4] - 0.2 \exp[-(9x_1 - 4)^2 - (9x_2 - 7)^2]$$
on a grid of $20 \times 20$ Chebyshev points in $[0, 1]^2$ with Poisson and Matérn functions. Compute also the corresponding RMSE.
Figure 2.7: Whittaker functions with $\alpha = 0$, k = 2 and $\alpha = 1$, k = 2.
Lecture 3
Conditionally positive definite functions
The multivariate polynomials are not suitable for solving the scattered data interpolation
problem. Only data sites in special locations can guarantee well-posedness of the inter-
polation problem. In order to have a flavour of this setting, we introduce the notion of
unisolvency.
Definition 8. A set $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^d$ is called m-unisolvent if the only polynomial of total degree at most m interpolating the zero data on X is the zero polynomial.
Similarly, X is unisolvent for $\mathbb{P}_m(\mathbb{R}^d)$ (i.e. polynomials in d variables of degree at most m) if there exists a unique polynomial in $\mathbb{P}_m(\mathbb{R}^d)$ of lowest possible degree which interpolates the data on X.
Then, to get a unique solution of the polynomial interpolation problem with a d-variate polynomial of degree $\le m$ to some given data, we need to find a subset of $X \subset \mathbb{R}^d$ with cardinality $M = \binom{m+d}{m} = \dim(\mathbb{P}_m(\mathbb{R}^d))$ which is m-unisolvent.
Example 7. These are examples of points that form unisolvent sets.
- A simple example is this one. Take N = 5 points in $\mathbb{R}^2$. There is no unique way to use bivariate linear or quadratic interpolation. In fact, the space of bivariate linear polynomials has dimension M = 3, that of bivariate quadratic polynomials M = 6.
- The Padua points on the square $[-1, 1]^2$ form the first complete example of unisolvent points whose Lebesgue constant has optimal growth (cf. [10]).
- In $\mathbb{R}^2$ the Chung and Yao points. The construction of these points is based on lattices. In practice, a lattice $\Lambda \subset \mathbb{R}^d$ has the form $\Lambda = \left\{ \sum_{i=1}^{d} h_i v_i, \; h_i \in \mathbb{Z} \right\}$ with $\{v_1, \ldots, v_d\}$ a basis for $\mathbb{R}^d$. For details see the paper [2].
The m-unisolvency of the set $X = \{x_1, \ldots, x_N\}$ is equivalent to the fact that the matrix P such that
$$P_{i,j} = p_j(x_i), \quad i = 1, \ldots, N, \; j = 1, \ldots, M$$
for any polynomial basis $p_j$, has full (column) rank. For N = M this is the classical polynomial interpolation matrix.
This observation can be easily checked when in $\mathbb{R}^2$ we take 3 collinear points: they are not 1-unisolvent, since a linear interpolant, that is a plane through three arbitrary heights at these three collinear points, is not uniquely determined. Otherwise such a set is 1-unisolvent.
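The rank condition for three points in the plane can be verified directly. The sketch below is written in Python (an illustrative assumption; the helper names `poly_space_dim`, `det3` and `linear_interp_matrix` are hypothetical). It computes the dimension M and the determinant of the matrix P for the basis 1, x, y.

```python
from math import comb

def poly_space_dim(m, d):
    """Dimension of the space of d-variate polynomials of total degree <= m."""
    return comb(m + d, m)

def det3(P):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = P
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def linear_interp_matrix(points):
    """Rows (1, x, y): the matrix P_{i,j} = p_j(x_i) for the basis 1, x, y."""
    return [[1.0, x, y] for (x, y) in points]

collinear = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]   # three points on the diagonal
generic = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]     # vertices of a triangle

det_collinear = det3(linear_interp_matrix(collinear))
det_generic = det3(linear_interp_matrix(generic))
print(poly_space_dim(1, 2), poly_space_dim(2, 2), det_collinear, det_generic)
```

For N = M = 3 full rank is the same as a nonzero determinant: it vanishes for the collinear points and not for the triangle.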
Remark. This problem arises when we want to construct interpolants with polynomial precision, that is, interpolants that reproduce polynomials.
Example 8. Take the function f(x, y) = (x+y)/2 on the unit square $[0, 1]^2$. Using Gaussian RBFs (with $\varepsilon = 6$), interpolate it on a grid of $N = 1089 = 33 \times 33$ uniformly distributed points. This will lead to an interpolant which is not a linear function as f is (i.e. the interpolant does not reproduce f).
Hence, instead of using an interpolant of the form $P_f(x) = \sum_{k=1}^{N} c_k e^{-\varepsilon^2 \|x - x_k\|^2}$, we use the following
$$P_f(x) = \sum_{i=1}^{N} c_i e^{-\varepsilon^2 \|x - x_i\|^2} + \underbrace{c_{N+1} + c_{N+2}\, x + c_{N+3}\, y}_{\text{polynomial part}}. \qquad (3.1)$$
Having only N interpolation conditions, namely
$$P_f(x_i) = (x_i + y_i)/2, \quad i = 1, \ldots, N, \qquad (3.2)$$
how can we find the remaining three conditions so that the resulting system will be square? As we shall see later, the solution is
$$\sum_{i=1}^{N} c_i = 0, \quad \sum_{i=1}^{N} c_i x_i = 0, \quad \sum_{i=1}^{N} c_i y_i = 0. \qquad (3.3)$$
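Combining the interpolation conditions (3.2) with the side conditions (3.3), the resulting linear system becomes
$$\begin{pmatrix} A & P \\ P^T & O \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix} \qquad (3.4)$$
where the matrix A and the vectors c and y are the same as in Lecture 1, while P is an $N \times 3$ matrix with entries $P_{i,j} = p_j(x_i)$ with $p_1(x) = 1$, $p_2(x) = x$, $p_3(x) = y$, and $d = (c_{N+1}, c_{N+2}, c_{N+3})^T$ collects the coefficients of the polynomial part. Finally, the matrix O is a $3 \times 3$ matrix of zeros.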
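To see the augmented system at work, here is a minimal sketch in Python (an assumption for illustration; the notes use Matlab). It uses a small 3 x 3 grid instead of the 33 x 33 grid of Example 8 and a hand-written Gaussian elimination; the names `solve`, `gaussian` and `interpolant` are hypothetical helpers.

```python
import math

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(M, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[r][col] -= factor * aug[k][col]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(aug[k][col] * z[col] for col in range(k + 1, n))
        z[k] = (aug[k][n] - s) / aug[k][k]
    return z

def gaussian(r2, eps):
    """Gaussian RBF as a function of the squared distance r2."""
    return math.exp(-eps ** 2 * r2)

eps = 6.0
sites = [(i / 2, j / 2) for i in range(3) for j in range(3)]  # 3 x 3 grid in [0,1]^2
N = len(sites)
f = lambda x, y: (x + y) / 2

# Blocks of the augmented matrix [[A, P], [P^T, O]] and the right hand side (y, 0)
A = [[gaussian((xi - xj) ** 2 + (yi - yj) ** 2, eps) for (xj, yj) in sites]
     for (xi, yi) in sites]
P = [[1.0, x, y] for (x, y) in sites]
M = [A[i] + P[i] for i in range(N)]
M += [[P[i][k] for i in range(N)] + [0.0] * 3 for k in range(3)]
rhs = [f(x, y) for (x, y) in sites] + [0.0] * 3

z = solve(M, rhs)
c, d = z[:N], z[N:]

def interpolant(x, y):
    """Evaluate the RBF part plus the linear polynomial part."""
    rbf = sum(ci * gaussian((x - xi) ** 2 + (y - yi) ** 2, eps)
              for ci, (xi, yi) in zip(c, sites))
    return rbf + d[0] + d[1] * x + d[2] * y

print(d, interpolant(0.3, 0.7))
```

Since f is itself linear, the polynomial part alone already satisfies all the conditions, so one expects c close to zero and d close to (0, 1/2, 1/2).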
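The general form of (3.1) is
$$P_f(x) = \sum_{i=1}^{N} c_i \phi(\|x - x_i\|) + \sum_{k=1}^{M} d_k p_k(x), \quad x \in \mathbb{R}^d, \qquad (3.5)$$
where $p_1, \ldots, p_M$ is a basis for the d-variate polynomials of degree $\le m - 1$, whose dimension is $M = \binom{m-1+d}{m-1}$. The side conditions become
$$\sum_{i=1}^{N} c_i p_j(x_i) = 0, \quad j = 1, \ldots, M, \qquad (3.6)$$
ensuring a unique solution for the system
$$\begin{pmatrix} A & P \\ P^T & O \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix} \qquad (3.7)$$
where now the matrices and arrays have the following dimensions: A is $N \times N$, P is $N \times M$, O is $M \times M$; c, y are $N \times 1$ and d, 0 are $M \times 1$.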
Remark. It may seem unnatural to formulate the general setting for the polynomial space $\mathbb{P}_{m-1}(\mathbb{R}^d)$ instead of $\mathbb{P}_m(\mathbb{R}^d)$, as before. The reason will be explained later and, as we will see, it will be quite natural.
3.0.1 Conditionally positive definite matrices and functions
In order to prove that the augmented system (3.7) is non-singular we start with the easiest case of m = 1 (in any dimension d). This is equivalent to the reproduction of the polynomials of degree $m - 1 = 0$, i.e. the constants.
Definition 9. A real symmetric $N \times N$ matrix A is called conditionally positive semi-definite of order m = 1 if its associated quadratic form satisfies $c^T A c \ge 0$ for all $c = (c_1, \ldots, c_N)^T \in \mathbb{R}^N$ that satisfy
$$\sum_{i=1}^{N} c_i = 0. \qquad (3.8)$$
If $c \ne 0$ implies strict inequality, i.e. $c^T A c > 0$, then A is called conditionally positive definite of order one.
Notice: the condition (3.8) can be viewed as a condition of orthogonality w.r.t. constant functions.
The following theorem shows that the augmented system is uniquely solvable.
Theorem 9. Let A be a real symmetric matrix of order N that is conditionally positive definite (CPD) of order one. Let $P = (1, \ldots, 1)^T$ be an $N \times 1$ array. Then the system
$$\begin{pmatrix} A & P \\ P^T & O \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix} \qquad (3.9)$$
is uniquely solvable.
Proof. Assume that $(c, d)^T \in \mathbb{R}^{N+1}$ is a solution of the homogeneous system, i.e. with y = 0. We then prove that c = 0 and d = 0. From the first block, multiplying by $c^T$, we have
$$c^T A c + d\, c^T P = 0$$
and using the second equation $P^T c = c^T P = 0$ we get that $c^T A c = 0$. Since A is CPD of order 1, by assumption we get c = 0. Moreover, from the first block, Ac + dP = 0, which implies d = 0.
Corollary 1. For reproducing constant functions in $\mathbb{R}^d$, the system to be solved has the form (3.9).
We are now ready to introduce conditionally positive definite functions of order m.
Definition 10. A continuous function $\Phi : \mathbb{R}^d \to \mathbb{C}$ is said to be conditionally positive semi-definite of order m in $\mathbb{R}^d$ if
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i \bar{c}_j \Phi(x_i - x_j) \ge 0 \qquad (3.10)$$
for any set $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^d$ of N pairwise distinct points, and any $c = (c_1, \ldots, c_N)^T \in \mathbb{C}^N$ such that
$$\sum_{k=1}^{N} c_k p(x_k) = 0$$
for any complex-valued polynomial p of degree at most $m - 1$. The function $\Phi$ is then called conditionally positive definite of order m on $\mathbb{R}^d$ if the quadratic form (3.10) vanishes only when $c \equiv 0$.
The first important fact concerning conditionally positive (semi-)definite functions is their order. In this respect the following important result holds.
Proposition 3. A function which is conditionally positive (semi-)definite of order m is also conditionally positive (semi-)definite of any order $s \ge m$. Moreover, a function that is conditionally positive (semi-)definite of order m in $\mathbb{R}^d$ is also conditionally positive (semi-)definite of order m on $\mathbb{R}^k$ with $k \le d$.
Hence the natural question is to look for the smallest possible order m. When we speak of the order of such a class of functions we will always refer to the minimal possible m.
Examples of CPD functions
Here we list the most important conditionally positive definite radial functions.
- Generalized multiquadrics
$$\Phi(x) = (1 + \|x\|^2)^\beta, \quad x \in \mathbb{R}^d, \; \beta \in \mathbb{R} \setminus \mathbb{N}_0,$$
which are CPD of order $m = \lceil \beta \rceil$ (and higher). For $\beta > 0$ the function is understood with the sign factor $(-1)^{\lceil \beta \rceil}$, as in the example following Theorem 11.
Notice that in the definition we have to exclude positive integer values for $\beta$, otherwise we are led to polynomials of even degree.
Two cases: for $\beta = 1/2$ we obtain the well-known Hardy multiquadric, which is CPD of order 1; for $\beta = 5/2$ we have a function which is CPD of order 3.
Figure 3.1: Multiquadrics. Left: Hardy multiquadric. Right: with $\beta = 5/2$.
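- Radial powers
$$\Phi(x) = \|x\|^\beta, \quad x \in \mathbb{R}^d, \; 0 < \beta \notin 2\mathbb{N},$$
which are CPD of order $m = \lceil \beta/2 \rceil$ (and higher).
For $\beta = 3$ we get a function which is CPD of order 2 and for $\beta = 5$ a function which is CPD of order 3. A property of these functions is that they are shape parameter free. This has the advantage that the user need not worry about finding a good (or the best) value for $\varepsilon$.
- Thin-plate splines
$$\Phi(x) = \|x\|^{2\beta} \log \|x\|, \quad x \in \mathbb{R}^d, \; \beta \in \mathbb{N},$$
which are CPD of order $m = \beta + 1$.
The classical TPS is the one for $\beta = 1$, which is CPD of order 2. For $\beta = 2$ we get a CPD function of order 3. Also TPS are shape parameter free.
The stated orders hold up to a sign factor: $(-1)^{\lceil \beta/2 \rceil}$ for the radial powers and $(-1)^{\beta + 1}$ for the thin-plate splines.
Figure 3.2: Radial powers. Left: with $\beta = 3$, $\varepsilon = 3$. Right: with $\beta = 5$, $\varepsilon = 3$.
Figure 3.3: Thin plate splines. Left: for $\beta = 1$, $\varepsilon = 1$. Right: for $\beta = 2$, $\varepsilon = 1$.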

As for the PD case, the definition reduces to real coefficients and polynomials if the basis function is real-valued and even. This is the case when the function is radial.
Theorem 10. A continuous function $\Phi : \mathbb{R}^d \to \mathbb{R}$ is said to be conditionally positive definite of order m in $\mathbb{R}^d$ if
$$\sum_{i=1}^{N} \sum_{j=1}^{N} c_i c_j \Phi(x_i - x_j) > 0 \qquad (3.11)$$
for any set $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^d$ of N pairwise distinct points, and any $c = (c_1, \ldots, c_N)^T \in \mathbb{R}^N \setminus \{0\}$ such that
$$\sum_{k=1}^{N} c_k p(x_k) = 0$$
for any real-valued polynomial p of degree at most $m - 1$.
The special case m = 1 appears already in the linear algebra literature, where it is dealt with using $\le$ and is then referred to as conditionally negative definite. The constraints are simply $\sum_{i=1}^{N} c_i = 0$. Since the matrix A is CPD of order 1, it is PD on a subspace of dimension N - 1 (or in general N - M, where $M = \dim(\mathbb{P}_{m-1}(\mathbb{R}^d))$). That is, at least N - 1 (or N - M) eigenvalues are positive.
For conditionally positive definite functions of order m = 1 the following important property holds (which is Example 2 of Lecture 1).
Theorem 11. Suppose $\Phi$ is CPD of order 1 and that $\Phi(0) \le 0$. Then the matrix $A \in \mathbb{R}^{N \times N}$, i.e. $A_{i,j} = \Phi(x_i - x_j)$, has one negative and N - 1 positive eigenvalues. In particular it is invertible.
Proof. From the Courant-Fischer theorem, we conclude that it has N - 1 positive eigenvalues. Now, since $0 \ge N \Phi(0) = \mathrm{tr}(A) = \sum_{i=1}^{N} \lambda_i$, A must have at least one negative eigenvalue.
For example, the generalized multiquadrics $\Phi(x) = (-1)^{\lceil \beta \rceil} (1 + \|x\|^2)^\beta$, $0 < \beta < 1$ (which include Hardy's one, $\beta = 1/2$) satisfy the previous Theorem.
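The hypotheses of Theorem 11 can be spot-checked for the Hardy multiquadric with the sign factor, $\Phi(x) = -\sqrt{1 + \|x\|^2}$. The Python sketch below is an illustration under assumed data (five equispaced sites in [0, 1], random coefficient vectors with zero sum), not a proof; all names are hypothetical.

```python
import math
import random

def phi(r):
    """Hardy multiquadric with the sign that makes it CPD of order 1."""
    return -math.sqrt(1.0 + r * r)

sites = [0.0, 0.25, 0.5, 0.75, 1.0]
N = len(sites)
A = [[phi(abs(xi - xj)) for xj in sites] for xi in sites]

def quadratic_form(A, c):
    """Return c^T A c."""
    return sum(c[i] * A[i][j] * c[j] for i in range(len(c)) for j in range(len(c)))

rng = random.Random(0)           # fixed seed for reproducibility
forms = []
for _ in range(200):
    c = [rng.uniform(-1.0, 1.0) for _ in range(N)]
    mean = sum(c) / N
    c = [ci - mean for ci in c]  # enforce the side condition sum(c) = 0
    forms.append(quadratic_form(A, c))

trace_A = sum(A[i][i] for i in range(N))   # equals N * phi(0)
print(min(forms), trace_A)
```

All sampled quadratic forms are positive on the constrained vectors, while the trace is negative, which is the mechanism behind the single negative eigenvalue.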
Finally we can prove a Theorem similar to Theorem 9.
Theorem 12. Let A be a real symmetric matrix of order N that is conditionally positive definite (CPD) of order m on $\mathbb{R}^d$ and let the points $x_1, \ldots, x_N$ form an (m - 1)-unisolvent set. Then the system
$$\begin{pmatrix} A & P \\ P^T & O \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix} \qquad (3.12)$$
is uniquely solvable.
Proof. Assume that $(c, d)^T \in \mathbb{R}^{N+M}$ is a solution of the homogeneous system, i.e. with y = 0. We then prove that c = 0 and d = 0. From the first block, multiplying by $c^T$, we have
$$c^T A c + c^T P d = 0$$
and using the second equation $P^T c = 0$, i.e. $c^T P = 0^T$, we get that $c^T A c = 0$. Since A is CPD of order m, by assumption we get c = 0. The unisolvency of the data sites, i.e. P has linearly independent columns, and the fact that c = 0 guarantee d = 0 from the top block Ac + Pd = 0.
3.1 Exercises
1. Plot the most important conditionally positive definite functions in the square $[-1, 1]^2$.
(a) generalized multiquadrics
$$\Phi(x) = (1 + \|x\|^2)^\beta, \quad x \in \mathbb{R}^2, \; \beta \in \mathbb{R} \setminus \mathbb{N}_0;$$
in particular, for $\beta = 1/2$ we have the Hardy multiquadric, which is of (minimum) order 1, and for $\beta = 5/2$ a function which is of (minimum) order 3.
(b) radial powers
$$\Phi(x) = \|x\|^\beta, \quad x \in \mathbb{R}^2, \; 0 < \beta \notin 2\mathbb{N}.$$
For example take $\beta = 3$ (which is CPD of order 2) and $\beta = 5$ (CPD of order 3). Verify furthermore that the power functions are shape parameter free.
(c) thin-plate splines
$$\Phi(x) = \|x\|^{2\beta} \log \|x\|, \quad x \in \mathbb{R}^2, \; \beta \in \mathbb{N}.$$
The most important is the one for $\beta = 1$, which is CPD of order 2. For $\beta = 2$ we get a function which is CPD of order 3. Verify that also these functions are shape parameter free.
2. For a CPD function of order 1 (such as the Hardy multiquadric) check Theorem 11.
Lecture 4
Error estimates
To evaluate the error between the interpolant $P_f$ and the data values at some set $\Xi = \{\xi_1, \ldots, \xi_M\} \subset \mathbb{R}^d$ of evaluation points we can compute the root-mean-square error, that is
$$\mathrm{RMSE} := \sqrt{\frac{1}{M} \sum_{j=1}^{M} \left( P_f(\xi_j) - f(\xi_j) \right)^2} = \frac{1}{\sqrt{M}} \|P_f - f\|_2. \qquad (4.1)$$
The root-mean-square error (RMSE) is a frequently used measure of the differences between
values predicted by a model or an estimator and the values actually observed. These
individual differences are called residuals when the calculations are performed over the
data sample that was used for estimation, and are called prediction errors when computed
out-of-sample. The RMSE serves to aggregate the magnitudes of the errors in predictions
for various times into a single measure of predictive power. RMSE is a good measure of
accuracy, but only to compare forecasting errors of different models for a particular variable
and not between variables, as it is scale-dependent. In practice, (4.1) is simply a quantitative
error estimator.
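Formula (4.1) translates into a couple of lines of code. The following is a minimal sketch in Python (an assumption for illustration; the function name `rmse` and the sample values are hypothetical).

```python
import math

def rmse(predicted, exact):
    """Root-mean-square error between two equally long sequences of values."""
    if len(predicted) != len(exact):
        raise ValueError("sequences must have the same length")
    M = len(exact)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, exact)) / M)

# Values of an interpolant and of f at three evaluation points
err = rmse([1.0, 2.0, 5.0], [1.0, 2.0, 3.0])
print(err)
```

In this toy case only the last point contributes, with a squared difference of 4 averaged over M = 3 points.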
Our goal is to provide error estimates for scattered data interpolation with (condition-
ally) positive definite functions. We start by considering the PD case.
4.1 Fill distance
The measure that is always used in approximation theory is the fill distance or mesh size
$$h = h_{X,\Omega} := \sup_{x \in \Omega} \min_{x_j \in X} \|x - x_j\|_2, \qquad (4.2)$$
which represents how well the data in X fill out the domain $\Omega$ and corresponds to the radius of the largest empty ball that can be placed among the data sites inside $\Omega$.
Figure 4.1: The fill distance of 25 Halton points, $h \approx 0.2667$
In Matlab the fill distance can be determined by the line h=max(min(DM_eval)), where DM_eval is the matrix consisting of the mutual distances between the evaluation points (for example a uniform grid in $\Omega$) and the data set X. In Fig. 4.1 we show 25 Halton points and the corresponding fill distance computed on a grid of $11 \times 11$ evaluation points of $[0, 1]^2$.
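The same discrete approximation of the fill distance can be sketched in Python (an assumption for illustration; the function name `fill_distance` and the single data site are hypothetical). As in the Matlab line, the supremum over the domain is replaced by a maximum over a finite evaluation grid.

```python
import math

def fill_distance(data_sites, eval_points):
    """Max over evaluation points of the distance to the nearest data site."""
    return max(min(math.dist(x, xj) for xj in data_sites) for x in eval_points)

# 11 x 11 uniform evaluation grid on [0, 1]^2 and a single data site at the centre
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
h = fill_distance([(0.5, 0.5)], grid)
print(h)
```

With one site at the centre the maximum is attained at the corners of the square, at distance $\sqrt{0.5}$. The result depends on how fine the evaluation grid is, so it is only a lower estimate of the true supremum.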
What we want to analyze is whether the error $\|f - P_f^{(h)}\|_\infty \to 0$ as $h \to 0$; here $P_f^{(h)}$ indicates the interpolant depending on the fill distance h. To understand the speed of convergence to zero, one has to understand the so-called approximation order of the interpolation process.
Definition 11. We say that the process has approximation order k if
$$\|f - P_f^{(h)}\|_p = O(h^k), \quad k \ge 0;$$
here the norm is taken for $1 \le p \le \infty$.
For completeness, another quantity often used for stability analysis purposes (or for finding good interpolation points) is the separation distance of the data sites X
$$q_X := \frac{1}{2} \min_{i \ne j} \|x_i - x_j\|,$$
which represents the radius of the largest ball that can be placed around every point of X such that there are no overlaps. In Matlab qX=min(min(DM_data+eye(size(DM_data))))/2, where DM_data is the matrix of distances among the data sites. We added the identity matrix in order to prevent this minimum from being 0, which is the distance of every point from itself.
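A possible Python counterpart of this computation is sketched below (an illustrative assumption; the function name `separation_distance` and the sample points are hypothetical). Iterating over pairs of distinct points avoids the zero diagonal altogether, so no identity matrix needs to be added.

```python
import math
from itertools import combinations

def separation_distance(data_sites):
    """Half the smallest distance between two distinct data sites."""
    return 0.5 * min(math.dist(p, q) for p, q in combinations(data_sites, 2))

# The closest pair is (0, 0) and (1, 0), at distance 1
q_X = separation_distance([(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)])
print(q_X)
```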
Finally, the ratio
$$\rho_{X,\Omega} := \frac{q_X}{h_{X,\Omega}}$$
is known as uniformity, which can be identified as $\rho_{X,\Omega} = \lim_{Y \in X_\Omega} \rho_{Y,\Omega}$ among all point sets $Y \in X_\Omega$, with $X_\Omega$ consisting, in some cases, of the Voronoi vertices used to decompose $\mathbb{R}^d$ with Voronoi tiles. Therefore, if $\rho_{X,\Omega} \approx 1$ the data points are nearly equispaced in the Euclidean norm. It has been proved in [5] that, in contrast with polynomial interpolation, radial basis interpolants behave better when the points are nearly equispaced.
4.2 Lagrange form of the interpolant
This idea goes back to Wu and Schaback [14] and consists in expressing the interpolant by means of cardinal functions, as in the polynomial case.
Instead of solving the system Ac = y as we did in the previous lectures, we consider the (new) system
$$A u^*(x) = b(x) \qquad (4.3)$$
where A is the $N \times N$ positive definite matrix (invertible!) $A_{i,j} = \Phi(x_i - x_j)$, $i, j = 1, \ldots, N$, $u^* = (u_1^*, \ldots, u_N^*)^T$, with $u_j^*(x_i) = \delta_{i,j}$ (i.e. cardinal functions), and $b = (\Phi(\cdot - x_1), \ldots, \Phi(\cdot - x_N))^T$.
Why is this possible? Thanks to the following theorem.
Theorem 13. If $\Phi$ is a positive definite kernel on $\mathbb{R}^d$, then for any set of distinct points $x_1, \ldots, x_N$ there exist functions $u_j^* \in \mathrm{span}\{\Phi(\cdot - x_j), j = 1, \ldots, N\}$ such that $u_j^*(x_i) = \delta_{i,j}$.
Here we enumerate a few facts in analogy with univariate fundamental Lagrange polynomials.
- The $u_j^*$ can be determined as a ratio of determinants, as for the fundamental Lagrange polynomials. Letting $V = \det(\Phi(x_i - x_j))$ and $V_{i,x}$ the same determinant when the evaluation point $x_i$ (that is, the i-th row) is substituted by a general point $x \in \Omega$, then $u_i^*(x) = V_{i,x}/V$.
- The $u_j^*$ do not depend on the data values (i.e. the $f_j$). In fact, once the data sites X and the kernel $\Phi$ are chosen, they can be determined by solving the system (4.3). That is, they depend only on the data sites and the kernel.
- An important aspect, related to stability issues, is the choice of the data points. As proved in [5], quasi-uniform points are always a good choice for RBF interpolation.
4.3 The power function
Another ingredient for understanding the error estimates is the power function. The starting point to define this function is the following quadratic form
$$Q(u) = \Phi(0) - 2 u^T b + u^T A u \qquad (4.4)$$
where the vector $b \in \mathbb{R}^N$ is defined as in the previous section and u is any N-dimensional vector.
Definition 12. On $\mathbb{R}^d$ let us consider a subset $\Omega$ and a continuous kernel $\Phi$ which we assume PD. For any set $X = \{x_1, \ldots, x_N\} \subseteq \Omega$ the power function is defined as follows
$$P_{\Phi,X}(x) = \sqrt{Q(u^*(x))}, \qquad (4.5)$$
where $u^*$ is the vector of the cardinal functions in Theorem 13.
Using formula (4.4), combined with the system (4.3) that defines the cardinal functions, we get two alternative ways to compute the power function
$$P_{\Phi,X}(x) = \sqrt{\Phi(0) - (u^*(x))^T b(x)} = \sqrt{\Phi(0) - (u^*(x))^T A u^*(x)}, \quad \text{(first)} \qquad (4.6)$$
$$P_{\Phi,X}(x) = \sqrt{\Phi(0) - (b(x))^T A^{-1} b(x)}, \quad \text{(second)}. \qquad (4.7)$$
Notice that when $\Phi$ is a PD kernel then A is PD as well, therefore we get immediately the following bounds:
$$0 \le P_{\Phi,X}(x) \le \sqrt{\Phi(0)}.$$
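The cardinal functions and the power function can be evaluated pointwise by solving the system (4.3). The sketch below, in Python, does this in one dimension for a Gaussian kernel with five equispaced sites; the language, the helper names `solve`, `cardinal_values`, `power_function` and the data are assumptions made for illustration.

```python
import math

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(M, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[r][col] -= factor * aug[k][col]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(aug[k][col] * z[col] for col in range(k + 1, n))
        z[k] = (aug[k][n] - s) / aug[k][k]
    return z

def phi(r, eps=6.0):
    """Gaussian kernel as a function of the distance r."""
    return math.exp(-(eps * r) ** 2)

sites = [0.0, 0.25, 0.5, 0.75, 1.0]
A = [[phi(abs(xi - xj)) for xj in sites] for xi in sites]

def cardinal_values(x):
    """u*(x): solution of A u = b(x), with b_j(x) = phi(|x - x_j|)."""
    b = [phi(abs(x - xj)) for xj in sites]
    return solve(A, b), b

def power_function(x):
    """Power function via Phi(0) - u*(x)^T b(x), as in (4.6)."""
    u, b = cardinal_values(x)
    q = phi(0.0) - sum(ui * bi for ui, bi in zip(u, b))
    return math.sqrt(max(q, 0.0))   # clamp tiny negative rounding noise

u_at_site, _ = cardinal_values(sites[2])
print(u_at_site, power_function(sites[2]), power_function(0.1))
```

At a data site the vector u*(x) reduces to a unit vector and the power function vanishes up to rounding, while between the sites it stays within the bounds stated above.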
Figure 4.2: Power function for the Gaussian kernel with $\varepsilon = 6$ on a grid of 81 uniform, Chebyshev and Halton points, respectively.
An interesting characterization of the power function is given in the next Theorem.

Theorem 14. Let $\Omega \subseteq \mathbb{R}^d$ and $\Phi$ a PD kernel on $\mathbb{R}^d$. Let X be, as usual, a set of N pairwise distinct points in $\Omega$. The minimum of the quadratic form Q(u) is attained at $u = u^*(x)$, that is
$$Q(u^*(x)) \le Q(u), \quad \text{for all } u \in \mathbb{R}^N.$$
Proof. Consider the formula (4.4): since A is PD, the minimum of this quadratic form is given by the solution of the linear system Au = b(x), which yields the cardinal functions $u = u^*(x)$.
The power function is, by definition, a non-negative function, vanishing at the data sites and decreasing to zero as the number of data points increases. Therefore, if we take two data sets such that $Y \subseteq X$ then $P_{Y,\Omega} \ge P_{X,\Omega}$. This is referred to as the maximality property of the power function.
As a final remark, the power function is defined similarly for conditionally positive
definite functions.
4.4 Native space
The error bounds come rather naturally once we associate with each radial basic function a certain space of functions called native space. This space is connected to the so-called Reproducing Kernel Hilbert Space (RKHS). The theory of RKHS is beyond our aims, but for understanding a little better the error estimates that we will present, it is necessary to introduce some very basic notions of RKHS.
Definition 13. A space of functions is called an Hilbert space if it is a real or complex
inner product space that is also a complete metric space w.r.t. the distance induced by the
inner product.
Here the inner product between two functions $f$ and $g$ is thought of as $(f, g) = \int_a^b f(x) g(x)\,dx$ in the real case, or $(f, g) = \int_a^b f(x) \overline{g(x)}\,dx$ in the complex case, which has many of the familiar properties of the Euclidean (discrete) dot product.
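Examples of Hilbert spaces are: any finite dimensional inner product space (for example $\mathbb{R}^n$, $\mathbb{C}^n$ equipped with the dot product of two vectors); the Lebesgue space $L^2$ (among the spaces $L^p$, only $p = 2$ gives a Hilbert space); the Sobolev spaces $H^k = W^{k,2}$. The space $C([a, b])$, equipped with the $L^2$ inner product, is an incomplete inner product space, dense in $L^2([a, b])$, which is complete.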
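The incompleteness of $C([a, b])$ with respect to the $L^2$ norm can be seen on a standard example, stated here for $[a, b] = [-1, 1]$ (a choice made only for illustration): a sequence of continuous ramps that converges in $L^2$ to a discontinuous step function.

```latex
f_n(x) = \begin{cases} 0, & -1 \le x \le 0,\\ n x, & 0 < x < 1/n,\\ 1, & 1/n \le x \le 1, \end{cases}
\qquad
H(x) = \begin{cases} 0, & -1 \le x \le 0,\\ 1, & 0 < x \le 1, \end{cases}
\qquad
\|f_n - H\|_{L^2}^2 = \int_0^{1/n} (1 - n x)^2\,dx = \frac{1}{3n} \longrightarrow 0 .
```

Hence $(f_n)$ is a Cauchy sequence of continuous functions for the $L^2$ norm, but its $L^2$ limit $H$ has no continuous representative.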
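Definition 14. Let $H$ be a real Hilbert space of functions $f : \Omega \to \mathbb{R}$ with inner product $(\cdot, \cdot)_H$. A function $K : \Omega \times \Omega \to \mathbb{R}$ is called a reproducing kernel for $H$ if
(i) $K(\cdot, x) \in H$ for all $x \in \Omega$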

(ii) $f(x) = (f, K(\cdot, x))_H$ for all $f \in H$ and all $x \in \Omega$.
The second is the reproducing property. In particular, taking $f = K(\cdot, y)$ we recover the kernel itself, since $K(x, y) = (K(\cdot, y), K(\cdot, x))_H$ for all $x, y \in \Omega$.
Notice: the reproducing kernel of a RKHS is known to be unique, and the kernel $K$ is positive definite.
We now show that every positive definite radial basis function can be associated
with a RKHS: its native space.
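From Definition 14, the space $H$ should contain functions of the form
$$f(\cdot) = \sum_{i=1}^{N} c_i K(\cdot, x_i)$$
provided $x_i \in \Omega$. Moreover
$$\begin{aligned}
\|f\|_H^2 = (f, f)_H &= \Big( \sum_{i=1}^{N} c_i K(\cdot, x_i), \sum_{j=1}^{N} c_j K(\cdot, x_j) \Big)_H \\
&= \sum_{i,j=1}^{N} c_i c_j \,(K(\cdot, x_i), K(\cdot, x_j))_H \\
&= \sum_{i,j=1}^{N} c_i c_j K(x_i, x_j) .
\end{aligned}$$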
Hence we define the (infinite dimensional) space
$$H_K(\Omega) := \mathrm{span}\{K(\cdot, y) : y \in \Omega\} \tag{4.8}$$
with the associated bilinear form
$$\Big( \sum_{i=1}^{N_K} c_i K(\cdot, x_i), \sum_{j=1}^{N_K} d_j K(\cdot, y_j) \Big)_K = \sum_{i,j=1}^{N_K} c_i d_j K(x_i, y_j)$$
where $N_K = \infty$ is also possible.
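For finite sums the bilinear form is just a double sum of kernel values, which makes the reproducing property easy to check numerically. The following is a minimal sketch in Python, assuming a univariate Gaussian kernel with $\varepsilon = 1$ as an example; the coefficients, centers and function names are hypothetical.

```python
import math

def K(x, y, eps=1.0):
    # Gaussian kernel K(x, y) = exp(-eps^2 (x - y)^2), used only as an example.
    return math.exp(-(eps * (x - y)) ** 2)

def inner(c, xs, d, ys):
    # Bilinear form (sum_i c_i K(., x_i), sum_j d_j K(., y_j))_K.
    return sum(ci * dj * K(xi, yj)
               for ci, xi in zip(c, xs)
               for dj, yj in zip(d, ys))

def evaluate(c, xs, x):
    # Point value f(x) of f = sum_i c_i K(., x_i).
    return sum(ci * K(x, xi) for ci, xi in zip(c, xs))

c, xs = [1.0, -1.0], [0.0, 1.0]        # hypothetical coefficients and centers
norm_sq = inner(c, xs, c, xs)          # ||f||_K^2 = sum_ij c_i c_j K(x_i, x_j)
x = 0.3
reproduced = inner(c, xs, [1.0], [x])  # (f, K(., x))_K
print(norm_sq, reproduced, evaluate(c, xs, x))
```

Here `norm_sq` equals $2 - 2e^{-1}$, and the last two printed values coincide: taking the inner product of $f$ with $K(\cdot, x)$ returns $f(x)$.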
The last observation is that this bilinear form defines an inner product on $H_K(\Omega)$, so that $H_K(\Omega)$ is a pre-Hilbert space, which means that it need not be complete.
The native space for the kernel $K$, indicated as $\mathcal{N}_K(\Omega)$ (or, if no confusion arises, simply $\mathcal{N}$), is the completion of $H_K(\Omega)$ w.r.t. the $K$-norm $\|\cdot\|_K$, so that $\|f\|_K = \|f\|_{\mathcal{N}}$ for all $f \in H_K(\Omega)$. For details, please refer to the book by Wendland [13].
Remark. In dealing with positive definite (translation invariant) functions $\Phi$, we will write
$$\Phi(x - y) = K(x, y) ,$$
so that $K$ is defined on $\Omega \times \Omega$ instead of simply on $\Omega$.
4.5 Generic error estimates
We quote here two results that give a flavour of the topic. The first theorem gives a pointwise estimate based on the power function, whose proof can be found in [6, pp. 117-118], while the second uses the fill distance (the proof is again in [6, pp. 121-122]).
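Theorem 15. Let $\Omega \subseteq \mathbb{R}^d$ and $\Phi \in C(\Omega \times \Omega)$ be PD on $\mathbb{R}^d$. Let $X = \{x_1, \dots, x_n\}$ be a set of distinct points. Take a function $f \in \mathcal{N}_\Phi(\Omega)$ and denote by $P_f$ its interpolant on $X$. Then, for every $x \in \Omega$,
$$|f(x) - P_f(x)| \le P_{\Phi,X}(x)\, \|f\|_{\mathcal{N}_\Phi(\Omega)} , \tag{4.9}$$
where the norm of $f$ is the native space norm.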
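The idea behind (4.9) can be sketched in a few lines; the complete proof is in [6]. The sketch assumes the Lagrange form $P_f(x) = \sum_j u_j^*(x) f(x_j)$ of Section 4.2 and writes $K(x, y) = \Phi(x - y)$ for the reproducing kernel of the native space.

```latex
\begin{aligned}
f(x) - P_f(x) &= \Big( f,\; K(\cdot,x) - \sum_{j=1}^{n} u_j^*(x)\,K(\cdot,x_j) \Big)_{\mathcal{N}_\Phi(\Omega)}
  && \text{(reproducing property)}\\
|f(x) - P_f(x)| &\le \|f\|_{\mathcal{N}_\Phi(\Omega)}\,
  \Big\| K(\cdot,x) - \sum_{j=1}^{n} u_j^*(x)\,K(\cdot,x_j) \Big\|_{\mathcal{N}_\Phi(\Omega)}
  && \text{(Cauchy-Schwarz)}\\
\Big\| K(\cdot,x) - \sum_{j=1}^{n} u_j^*(x)\,K(\cdot,x_j) \Big\|^2_{\mathcal{N}_\Phi(\Omega)}
  &= \Phi(0) - 2\,(u^*(x))^T b(x) + (u^*(x))^T A\,u^*(x) = P_{\Phi,X}^2(x) .
\end{aligned}
```

The last equality follows from $A u^*(x) = b(x)$ and formula (4.6). The error thus splits into a factor depending only on $\Phi$ and $X$ (the power function) and a factor depending only on $f$ (its native space norm).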
We can express such error by means of the fill distance.
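Theorem 16. Let $\Omega \subseteq \mathbb{R}^d$ and $\Phi \in C^{2k}(\Omega \times \Omega)$ be symmetric and positive definite. Let $X = \{x_1, \dots, x_n\}$ be a set of distinct points. Take a function $f \in \mathcal{N}_\Phi(\Omega)$ and its interpolant $P_f$ on $X$. Then there exist positive constants $h_0$ and $C$ (independent of $x$, $f$ and $\Phi$) such that, whenever $h_{X,\Omega} \le h_0$,
$$|f(x) - P_f(x)| \le C\, h_{X,\Omega}^{k} \sqrt{C_\Phi(x)}\; \|f\|_{\mathcal{N}_\Phi(\Omega)} , \tag{4.10}$$
where
$$C_\Phi(x) = \max_{|\beta| = 2k}\ \max_{w, z \in \Omega \cap B(x, c_2 h_{X,\Omega})} |D_2^{\beta} \Phi(w, z)| .$$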
Comparing (4.9) and (4.10) we get a bound on the power function in terms of the fill distance:
$$P_{\Phi,X}(x) \le C\, h_{X,\Omega}^{k} \sqrt{C_\Phi(x)} .$$
Moreover, Theorem 16 says that interpolation with a $C^{2k}$ kernel $\Phi$ has approximation order $k$. This means that for infinitely smooth kernels, such as Gaussians, Laguerre-Gaussians, Poisson and generalized inverse multiquadrics, the approximation order is arbitrarily high. On the contrary, Matérn and Whittaker radial functions have approximation order limited by the smoothness of the basic function $\varphi$.
One more observation is that the above estimates consider $f \in \mathcal{N}_\Phi(\Omega)$. There exist similar estimates for $f \notin \mathcal{N}_\Phi(\Omega)$ (see e.g. [6, §15.3]).
4.6 Strategies in reducing the interpolation error
This last section aims to give some insight into the problem of choosing the shape parameter $\varepsilon$ so as to get the smallest (possible) interpolation error. Various strategies have been explored in the recent literature. Here we present only three of them, which turn out to be the most used by practitioners.
In all that follows, we assume that the same kernel $\Phi$ is used and that a single value of $\varepsilon$ scales all basis functions uniformly. The number of data points may be varied in order to compare results.
4.6.1 Trial and Error
This is the simplest approach. It consists in performing various interpolation experiments with different values of the shape parameter. The best parameter, say $\varepsilon^*$, will be the one that minimizes the interpolation error. In Figure 4.3 we plot the interpolation max-error as $\varepsilon$ varies, for different numbers of data points, using the Gaussian kernel in the univariate case. The minimum of every curve gives the optimal value.
Figure 4.3: Trial and error strategy for the interpolation of the 1-dimensional sinc function by the Gaussian for $\varepsilon \in [0, 20]$, taking 100 values of $\varepsilon$ and for different numbers of equispaced data points.
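The trial and error strategy amounts to a loop over candidate values of $\varepsilon$. Below is a minimal sketch in Python (not the Matlab code of [6]), assuming the Gaussian $\varphi(r) = e^{-(\varepsilon r)^2}$, the function $\mathrm{sinc}(x) = \sin x / x$, and equispaced points in $[-1, 1]$; the interval, the grid of $\varepsilon$ values and the helper names are hypothetical. The grid starts at $\varepsilon = 1$ because for $\varepsilon = 0$ the interpolation matrix is singular.

```python
import math

def gaussian(r, eps):
    # Gaussian basic function phi(r) = exp(-(eps*r)^2).
    return math.exp(-(eps * r) ** 2)

def sinc(x):
    # sin(x)/x, extended by continuity at x = 0.
    return 1.0 if x == 0.0 else math.sin(x) / x

def solve(A, rhs):
    # Solve A z = rhs by Gaussian elimination with partial pivoting.
    n = len(rhs)
    M = [A[i][:] + [rhs[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def max_error(eps, n_data, n_eval=200):
    # Interpolate sinc at n_data equispaced points of [-1, 1] and
    # return the max error on a finer equispaced evaluation grid.
    X = [-1.0 + 2.0 * i / (n_data - 1) for i in range(n_data)]
    A = [[gaussian(abs(xi - xj), eps) for xj in X] for xi in X]
    c = solve(A, [sinc(xi) for xi in X])
    err = 0.0
    for k in range(n_eval + 1):
        x = -1.0 + 2.0 * k / n_eval
        Pf = sum(ci * gaussian(abs(x - xi), eps) for ci, xi in zip(c, X))
        err = max(err, abs(sinc(x) - Pf))
    return err

eps_values = [1.0 + 0.5 * i for i in range(39)]  # 1.0, 1.5, ..., 20.0
errors = [max_error(eps, n_data=9) for eps in eps_values]
best_err, best_eps = min(zip(errors, eps_values))
print("best eps:", best_eps, "max error:", best_err)
```

The selected value depends on the evaluation grid and on the conditioning of the interpolation matrix for small $\varepsilon$, so it should be read as an estimate of $\varepsilon^*$ rather than an exact optimum.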
4.6.2 Power function
This strategy is directly connected to the error analysis presented in the previous section (cf. formula (4.9)). Once we have decided which kernel $\Phi$ and data set $X$ to use, we compute the power function for scaled versions of the kernel, in order to optimize the error component that is independent of $f$. This approach is similar to the Trial and Error strategy; its limitation is that it ignores the second part of the error, i.e. the one that depends on the basis function via the native space norm of $f$. In Figure 4.4 we plot the sup-norm of the power function for 500 values of $\varepsilon \in [0, 20]$, for the Gaussian kernel and sets of uniform data points $X$ with $N = 9, 25, 81, 289$. This strategy is implemented in the M-file Powerfunction2D.m in the Matlab codes provided in [6].
Figure 4.4: The sup-norm of the 2D power function on uniform points for the Gaussian kernel, varying $\varepsilon \in [0, 20]$ for different values of $N$.
4.6.3 Cross validation
This method is popular in the statistics literature where, when the 2-norm is used, it is known as PRESS (Predictive REsidual Sum of Squares). The optimal value $\varepsilon^*$ is obtained by minimizing the (least-squares) error for a fit of the data values based on an interpolant for which one of the centers is left out. In detail, we start by constructing $P_{f,k}$, the radial basis function interpolant to the data $\{f_1, \dots, f_{k-1}, f_{k+1}, \dots, f_N\}$, that is
$$P_{f,k}(x) = \sum_{i=1,\, i \ne k}^{N} c_{i,k}\, \Phi(x - x_i)$$
such that
$$P_{f,k}(x_i) = f_i , \quad i = 1, \dots, k-1, k+1, \dots, N .$$
Then we compute the error at the point $x_k$, the one not used by $P_{f,k}$:
$$E_k = f_k - P_{f,k}(x_k) .$$
Then the quality of the fit is determined by the (sup) norm of the vector of errors $E = [E_1, \dots, E_N]^T$.
In the experiments, one adds a loop on $\varepsilon$ in order to compare the error norms for different values of the shape parameter, choosing for $\varepsilon^*$ the value that yields the minimum error norm. In general this method is quite expensive from the computational point of view: it has a complexity of $O(N^4)$ flops, since each of the $N$ interpolants $P_{f,k}$ requires the solution of a linear system at a cost of $O(N^3)$. There is a way to accelerate the method, by computing $E_k$ as
$$E_k = \frac{c_k}{A^{-1}_{k,k}}$$
where $c_k$ is the $k$-th coefficient of the interpolant $P_f$ based on all the data points and $A^{-1}_{k,k}$ is the $k$-th diagonal element of the inverse of the corresponding collocation matrix. Since both $c_k$ and $A^{-1}$ need to be computed only once for each value of $\varepsilon$, this results in $O(N^3)$ flops. This strategy is implemented in the M-file LOOCV.m in the Matlab codes provided in [6].
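The shortcut formula can be compared with the definition of $E_k$ on a small example. The following Python sketch is an illustration only (it is not the file LOOCV.m of [6]); it assumes the univariate Gaussian $\varphi(r) = e^{-(\varepsilon r)^2}$, the sinc function as data, and hypothetical helper names. The inverse is formed explicitly by Gauss-Jordan elimination for clarity.

```python
import math

def gaussian(r, eps):
    # Gaussian basic function phi(r) = exp(-(eps*r)^2).
    return math.exp(-(eps * r) ** 2)

def invert(A):
    # Inverse of A by Gauss-Jordan elimination with partial pivoting.
    n = len(A)
    M = [A[i][:] + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]
        for i in range(n):
            if i != k:
                m = M[i][k]
                M[i] = [vi - m * vk for vi, vk in zip(M[i], M[k])]
    return [row[n:] for row in M]

def interpolation_matrix(X, eps):
    return [[gaussian(abs(xi - xj), eps) for xj in X] for xi in X]

def loocv_fast(X, f, eps):
    # Shortcut: E_k = c_k / (A^{-1})_{kk}, with a single inverse for all k.
    Ainv = invert(interpolation_matrix(X, eps))
    c = [sum(a * fj for a, fj in zip(row, f)) for row in Ainv]
    return [c[k] / Ainv[k][k] for k in range(len(X))]

def loocv_direct(X, f, eps):
    # Definition: E_k = f_k - P_{f,k}(x_k), with P_{f,k} built without x_k.
    E = []
    for k in range(len(X)):
        Xk, fk = X[:k] + X[k + 1:], f[:k] + f[k + 1:]
        Binv = invert(interpolation_matrix(Xk, eps))
        ck = [sum(a * fj for a, fj in zip(row, fk)) for row in Binv]
        Pk = sum(ci * gaussian(abs(X[k] - xi), eps) for ci, xi in zip(ck, Xk))
        E.append(f[k] - Pk)
    return E

X = [-1.0 + 2.0 * i / 6 for i in range(7)]             # 7 equispaced points
f = [math.sin(x) / x if x != 0.0 else 1.0 for x in X]  # sinc data
fast, direct = loocv_fast(X, f, 3.0), loocv_direct(X, f, 3.0)
print("max difference:", max(abs(a - b) for a, b in zip(fast, direct)))
print("sup norm of E :", max(abs(e) for e in fast))
```

The two error vectors agree up to rounding, while `loocv_direct` performs $N$ matrix inversions and `loocv_fast` only one. Wrapping `loocv_fast` in a loop over $\varepsilon$ and keeping the value with the smallest norm of $E$ gives the cross validation choice of the shape parameter.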
Figure 4.5: LOOCV 1d: sup norm of the error on Chebyshev points for the Gaussian kernel, varying $\varepsilon \in [0, 20]$ for different values of $N$.
4.7 Exercises
1. Find the optimal shape parameter, $\varepsilon_{opt}$, by means of the trial & error strategy for the following univariate functions:
(a) $f_1(x) = \mathrm{sinc}(x) = \dfrac{\sin x}{x}$.
(b) a variant of the Franke function
$$f_2(x) = \frac{3}{4}\left( e^{-(9x-2)^2/4} + e^{-(9x+1)^2/49} \right) + \frac{1}{2}\, e^{-(9x-7)^2/4} - \frac{1}{10}\, e^{-(9x-4)^2} .$$
For each of the $f_i$, $i = 1, 2$, produce a table of the form

N  | $\|P_{f_i} - f_i\|_\infty$ | $\varepsilon_{opt}$
3  |                            |
5  |                            |
9  |                            |
17 |                            |
33 |                            |
65 |                            |

where, for each $N$, $\varepsilon_{opt}$ corresponds to the minimum of the error curve in the sup-norm, computed by varying the shape parameter $\varepsilon \in [0, 20]$. As radial basis function for the $P_{f_i}$ take the Gaussian.
2. Plot the power function in 2 dimensions for the Gaussian kernel with $\varepsilon = 6$ on a grid of $N = 9^2 = 81$ equispaced, Chebyshev and Halton points in $[-1, 1]^2$. We will see that the power function depends on the chosen points.
Verify that $P_{\Phi,X}(x_i) = 0$ for all $x_i \in X$. Show how the maximum value of the power function varies as $N$ increases.
Use the M-file Powerfunction2D.m.
3. Plot $\|P_{\Phi,X}\|_\infty$ for the Gaussian kernel in 2 dimensions, using for the power function the formula
$$P_{\Phi,X}(x) = \sqrt{\Phi(0) - (b(x))^T A^{-1} b(x)}$$
with $A$ representing the interpolation matrix and $b(x) = [\Phi(x - x_1), \dots, \Phi(x - x_N)]^T$, by varying $\varepsilon \in [0, 20]$, and $N = 9, 25, 81, 289$. Take equispaced points both as centers and as evaluation points of the power function.
Make a table similar to the one of Exercise 1, adding one column for the condition number of $A$ corresponding to the optimal shape parameter. Use the function Powerfunction2D.m for computing the 2-dimensional power function.
Bibliography
[1] M. D. Buhmann, Radial Basis Functions: Theory and Implementations, Cambridge University Press, 2003.
[2] K. C. Chung and T. H. Yao, On lattices admitting unique Lagrange interpolations, SIAM J. Numer. Anal. 14 (1977), 735-743.
[3] C. de Boor, A Practical Guide to Splines, revised edition, Springer, New York, 2001.
[4] S. De Marchi, On Leja sequences: some results and applications, Appl. Math. Comput. 152(3) (2004), 621-647.
[5] S. De Marchi, R. Schaback and H. Wendland, Near-Optimal Data-independent Point Locations for Radial Basis Function Interpolation, Adv. Comput. Math. 23(3) (2005), 317-330.
[6] G. E. Fasshauer, Meshfree Approximation Methods with Matlab, Interdisciplinary Mathematical Sciences, Vol. 6, World Scientific Publishing, 2007.
[7] G. E. Fasshauer, Meshfree Approximation Methods with Matlab, Lecture 1, slides, Dolomites Res. Notes Approx., Vol. 1, 2008.
[8] A. Iske, Multiresolution Methods in Scattered Data Modelling, Lecture Notes in Computational Science and Engineering, Vol. 37, Springer, 2004.
[9] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, 2000.
[10] L. Bos, M. Caliari, S. De Marchi, M. Vianello and Y. Xu, Bivariate Lagrange interpolation at the Padua points: the generating curve approach, J. Approx. Theory 143 (2006), 15-25.
[11] I. J. Schoenberg, Metric spaces and completely monotone functions, Ann. of Math. 39 (1938), 811-841.
[12] L. L. Schumaker, Spline Functions - Basic Theory, Wiley-Interscience, New York, 1981.
[13] H. Wendland, Scattered Data Approximation, Cambridge Monographs on Applied and Computational Mathematics, Cambridge Univ. Press, 2005.
[14] Z. Wu and R. Schaback, Local error estimates for radial basis function interpolation of scattered data, IMA J. Numer. Anal. 13 (1993), 13-27.
