Wavelets
February 3, 1999
– Can’t you look for some money somewhere? Dilly said.
Mr Dedalus thought and nodded.
– I will, he said gravely. I looked all along the gutter in O’Connell street. I’ll
try this one now.
James Joyce, Ulysses
Preface
In addition, there have appeared a few books on the subject aimed at the general
public.
Contents

1 Introduction
  1.1 The Haar Wavelet and Approximation
  1.2 An Example of a Wavelet Transform
  1.3 Fourier vs Wavelet
  1.4 Fingerprints and Image Compression
  1.5 Noise Reduction
  1.6 Notes

I Theory

2 Signal Processing
  2.1 Signals and Filters
  2.2 The z-transform
  2.3 The Fourier Transform
  2.4 Linear Phase and Symmetry
  2.5 Vector Spaces
  2.6 Two-Dimensional Signal Processing
  2.7 Sampling

3 Filter Banks
  3.1 Discrete-Time Bases
  3.2 The Discrete-Time Haar Basis
  3.3 The Subsampling Operators
  3.4 Perfect Reconstruction
  3.5 Design of Filter Banks
  3.6 Notes

4 Multiresolution Analysis
  4.1 Projections and Bases in L2(R)
  4.2 Scaling Functions and Approximation

II Applications

8 Wavelet Bases: Examples
  8.1 Regularity and Vanishing Moments
  8.2 Orthogonal Bases
  8.3 Biorthogonal Bases
  8.4 Wavelets without Compact Support
  8.5 Notes
Chapter 1
Introduction
We ask the reader to think of, for example, a sound signal as recorded by a microphone. We do this since, for most people, it is helpful to have a specific concrete application in mind when getting acquainted with unfamiliar mathematical concepts.

In this introduction, we attempt a first approximate answer to the following questions, making some comparisons with the classical Fourier methods and presenting a few basic examples. In subsequent chapters, we answer them in greater detail.
where ψj,k(t) = 2^{j/2} ψ(2^j t − k) are all translations and dilations of the same function ψ.

In general, the function ψ is more or less localized both in time and in frequency, and satisfies ∫ψ(t) dt = 0: a cancellation/oscillation requirement. If ψ is well localized in time, it has to be less so in frequency, due to the following inequality (coupled to Heisenberg's uncertainty relation in quantum mechanics).
$$(1.1)\qquad \int|\psi(t)|^2\,dt \;\le\; 2\left(\int|t\,\psi(t)|^2\,dt\right)^{1/2}\left((2\pi)^{-1}\int|\omega\,\hat\psi(\omega)|^2\,d\omega\right)^{1/2}$$
from the frequency (octave) band Ij = {ω : 2^{j−1} + 2^{j−2} < |ω/π| < 2^j + 2^{j−1}}. In the latter sum, each term is localized around t = 2^{−j}k if ψ(t) is localized around t = 0. The frequency content of ψj,k is localized around ω = 2^j π if the function ψ has its frequency content mainly in a neighbourhood of ω = π. (Recall that ∫ψ(t) dt = 0 means ψ̂(0) = 0.) For the Haar wavelet, shown in Figure 1.1, the modulus of the Fourier transform is 4(sin ω/4)²/ω, ω > 0, illustrating this.
In contrast, the harmonic constituents eiωt in the Fourier representation
have a sharp frequency ω, and no localization in time at all.
Thus, if hf, eiω· i is perturbed for some ω in the given frequency band Ij ,
then this will influence the behaviour at all times.
Conversely, if hf, ψj,k i is perturbed, then this will influence the behaviour
in the given frequency band Ij mainly, and in a neighbourhood of t = 2−j k
with a size comparable to 2−j mainly.
Exercises 1.0
1.1. Prove inequality (1.1), using the identity
where ‖f‖ is defined by

$$\|f\| = \left(\int_{-\infty}^{\infty}|f(t)|^2\,dt\right)^{1/2}.$$
Now ϕ and ψ are orthogonal, as are {ϕ(t − k)}k and {ψ(t − k)}k, in the scalar product ⟨f, g⟩ := ∫f(t)g(t) dt.
Let f (t) = t2 , 0 < t < 1, and f (t) = 0 elsewhere. We may approximate
f by its mean values over the dyadic intervals (k 2−j , (k + 1)2−j ). For j = 2
this is shown in Figure 1.2.
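The dyadic means just described can be computed in closed form; a small sketch (not from the book), using the fact that the integral of t² from a to b is (b³ − a³)/3:

```python
# A sketch (not from the book) of the approximation above: the mean of
# f(t) = t^2 over the dyadic interval (k 2^-j, (k+1) 2^-j) has a closed
# form, since the integral of t^2 from a to b is (b^3 - a^3)/3.

def dyadic_means(j):
    """Means of f(t) = t^2 over (k/2^j, (k+1)/2^j), k = 0, ..., 2^j - 1."""
    n = 2 ** j
    return [(((k + 1) / n) ** 3 - (k / n) ** 3) / 3 * n for k in range(n)]

means = dyadic_means(2)      # the j = 2 step function of Figure 1.2
print(means)
# the average of the means equals the integral of t^2 over (0, 1)
assert abs(sum(means) / len(means) - 1 / 3) < 1e-12
```

For j = 2 the means are 1/48, 7/48, 19/48 and 37/48, and their average recovers the exact integral 1/3.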
[Figure 1.1: the Haar scaling function and the Haar wavelet.]
where the dilated scaling function is normalized to have its square integral
equal to 1.
[Figure 1.2: the approximation of f by its mean values over the dyadic intervals, j = 2.]
express the approximating step function with its mean values over the dyadic
intervals with twice the length 2−(j−1) = 2−1 and, at the same time, record
the difference between the two approximating step functions. Note that the
difference will be expressed in terms of the Haar wavelet on the same doubled
scale in Figure 1.3.
Figure 1.3: The first two successive means and differences at doubled scales.
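The splitting step just described can be sketched in code (normalizing factor suppressed, as in the text): mean values at one scale are turned into means and differences at the doubled scale, and the pair reconstructs the original exactly.

```python
# A sketch of the splitting step (normalization suppressed, as in the
# text): from means at scale 2^-j, form means and differences at the
# doubled scale 2^-(j-1); mean +/- difference recovers the fine means.

def split(means):
    coarse = [(a + b) / 2 for a, b in zip(means[0::2], means[1::2])]
    detail = [(a - b) / 2 for a, b in zip(means[0::2], means[1::2])]
    return coarse, detail

fine = [1/48, 7/48, 19/48, 37/48]      # dyadic means of t^2 for j = 2
coarse, detail = split(fine)
recon = []
for c, d in zip(coarse, detail):
    recon += [c + d, c - d]            # invert the step
assert all(abs(r - f) < 1e-12 for r, f in zip(recon, fine))
print(coarse, detail)
```

No information is lost: the coarse means plus the Haar details carry exactly the content of the finer step function.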
The function is then decomposed in the plot to the left in Figure 1.6 into
independent (orthogonal) parts, a Multi-Resolution Decomposition, each part
essentially (but not exactly) in a frequency octave. In the plot to the right,
the wavelet coefficients are shown correspondingly.
[Figure 1.6: the Multi-Resolution Decomposition (left) and the corresponding wavelet coefficients (right); horizontal axes are time t, vertical axis is the dyad.]
The horizontal axes are time, and the lowest graph represents the uppermost half of the available frequency range, the next graph represents the upper half of the remaining lower half of the frequency range, etc.
Note that the sharp changes in the function are clearly visible in the
resolutions. A corresponding plot of the Fourier spectrum only reveals fre-
quency peaks. Using time windows and doing Fourier transforms for each
window will reveal the same features as the multiresolution decomposition,
but choosing the right window size requires additional information in general.
Moreover, the number of operations in the multiresolution algorithm is linear in the number of samples of the signal, whereas the fast Fourier transform has an additional logarithmic factor.
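The linear operation count can be made concrete with a sketch of the repeated means-and-differences algorithm (normalization suppressed; the sample sequence is the four-point example used in this section):

```python
# A sketch of the full multiresolution algorithm (normalization
# suppressed): repeated means and differences.  Each level touches half
# as many values as the previous one, so the total work is proportional
# to n, in contrast to the n log n of the FFT.

def haar_transform(x):
    x = list(x)
    details = []
    ops = 0
    while len(x) > 1:
        means = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
        diffs = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
        ops += len(x)                 # values processed at this level
        details = diffs + details     # finest details end up last
        x = means
    return x + details, ops

coeffs, ops = haar_transform([0, 0, 1, 0])
print(coeffs)   # overall mean first, then coarse-to-fine differences
assert ops <= 2 * 4                   # linear in the number of samples
```

For the sequence (0, 0, 1, 0) this returns the mean 1/4 followed by the differences −1/4, 0, 1/2, which are exactly the values appearing in the example below.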
where the discrete Fourier sequence thus is (1/4, −1/4, 1/4, −1/4), the first element, 1/4, being the mean of the original sequence. These are calculated in the standard way (xn is the original sequence with x2 = 1):

$$X_k = \frac{1}{4}\sum_{n=0}^{3} x_n e^{-2\pi i k n/4} \qquad (k = 0, 1, 2, 3)$$
where ϕ(t) = 1 for 0 < t < 1 and ϕ(t) = 0 elsewhere. In Figure 1.7, the sample values are thus depicted to the right of their respective indices on the horizontal t axis.
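The Fourier coefficients quoted above can be checked directly; a sketch for the samples with x₂ = 1 and the others zero:

```python
# Checking the stated discrete Fourier coefficients for the four samples
# with x_2 = 1 (the others zero), via X_k = (1/4) sum_n x_n e^{-2 pi i kn/4}.
import cmath

x = [0, 0, 1, 0]
X = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4)) / 4
     for k in range(4)]
print([round(c.real, 10) for c in X])   # the sequence (1/4, -1/4, 1/4, -1/4)
```

Only one sample is nonzero, so X_k = (1/4)e^{−iπk} = (1/4)(−1)^k, in agreement with the text.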
We will now show how the different frequency bands contribute to the
(different) functions.
Figure 1.7: The Fourier (left) and the Haar (right) representations.
The Fourier component with highest frequency is 1/4 cos πt. The Haar
wavelet component with highest frequency is obtained as follows.1
$$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
This measures the difference between the adjacent elements taken pairwise,
and the resulting sequence is encoded in the function
Gx(t) = 0 ψ(t/2) + 1 ψ(t/2 − 1)
which is a high-frequency part, and where
ψ(t) = 1 ϕ(2t) − 1 ϕ(2t − 1)
is the Haar wavelet. Here the two coefficients ±1 are the non-zero entries in
the filter matrix. These two components are shown in Figure 1.8, where the
localization is obvious in the Haar component, but it is not clear from the
Fourier component.
The corresponding means are calculated in the analogous way.
$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
This is encoded in the function
Hx(t) = 0 ϕ(t/2) + 1/2 ϕ(t/2 − 1)
¹For notational simplicity, we have suppressed a normalizing factor 2^{1/2}. See Exercise 1.2.
Figure 1.8: The Fourier (left) and the Haar (right) components.
Denoting the filter matrices above by the same letters G, H, and their respective adjoints by G∗, H∗, it is easy to verify that (I is the identity matrix)

G∗G + H∗H = 2I
GH∗ = HG∗ = 0
HH∗ = GG∗ = 2I
The first equation shows that we may recover the original sequence from
the sequences encoded in Gx and Hx, and the middle two express that the
functions Gx and Hx are orthogonal. (Note that the encoded sequences are
not orthogonal as such, but that this is an orthogonality relation between
the columns of G and the columns of H).
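These matrix identities can be verified mechanically; a small sketch with plain lists (not the book's code):

```python
# Verifying the three identities above for the unnormalized Haar filter
# matrices used in this example.

H = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
G = [[1, -1, 0, 0],
     [0, 0, 1, -1]]

def T(A):                      # adjoint = transpose (real entries)
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

GtG = matmul(T(G), G)
HtH = matmul(T(H), H)
total = [[s + r for s, r in zip(rs, rr)] for rs, rr in zip(GtG, HtH)]
assert total == [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
assert matmul(G, T(H)) == [[0, 0], [0, 0]] == matmul(H, T(G))
assert matmul(H, T(H)) == [[2, 0], [0, 2]] == matmul(G, T(G))
print("G*G + H*H = 2I, GH* = HG* = 0, HH* = GG* = 2I: verified")
```

The asserts confirm exactly the three identities, including the orthogonality between the columns of G and the columns of H.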
For the next highest frequency, the Fourier component is −1/2 cos πt/2.
(Since the sequence is real-valued, X1 is the complex conjugate to X−1 = X3 .)
The corresponding Haar wavelet component is calculated from the previous
level means:
$$\begin{pmatrix} 1 & -1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = -1$$
This measures the difference between (adjacent pairs of) means. The resulting sequence is encoded in the function (normalizing factor suppressed; see Exercise 1.2)

−1/4 ψ(t/4)
Figure 1.10: The next highest Fourier (left) and Haar (right) components.
The corresponding means are also calculated from the previous level
means:
$$\begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = 1$$

This is encoded in the function 1/4 ϕ(t/4), where ϕ(t/4) ≡ 1 on the interval 0 < t < 4, and which thus represents the mean value 1/4 (see Exercise 1.2) of the original sequence, shown in Figure 1.11.
Exercises 1.3
1.2. Verify that the correct values appear when choosing the normalized functions in the encoding representations: for example, choosing the normalized 2^{1/2} ψ(2t − 1) instead of ψ(2t − 1). 'Normalized' means that the integral of the squared function equals 1 (the L² norm is 1).
1.3. Compare the fast Fourier transform algorithm (FFT) and the above
Haar wavelet transform, with regard to how the non-locality and the locality
of the respective transform appear. This will already be apparent in the
four-point case.
1.4. Work out what happens if another segment is chosen, that is, the four
samples are (0, 0, 0, 1) and x0 = x1 = x2 = 0, x3 = 1. Compare the influence
on the Fourier and the wavelet representations.
1.6 Notes
Wavelet analysis has been used in signal/image processing practice for less
than two decades. Most of the mathematical ideas distinguishing wavelet
analysis from classical Fourier analysis are less than a century old.
http://www.wavelet.org
These suggestions are not meant to be exhaustive.
Part I
Theory
Chapter 2
Signal Processing
In most cases our signals will be real-valued, but for the sake of generality we assume that they are complex-valued. Mathematically, a signal is then a function x : Z → C. Moreover, a signal x ∈ ℓ²(Z) if it has finite energy, that is, Σk |xk|² < ∞. In the next chapter we will have more to say about the space ℓ²(Z) and discrete-time bases.
A filter H is an operator mapping an input signal x into an output signal
y = Hx, and is often illustrated by a block diagram as in Figure 2.1. A filter
H is linear if it satisfies the following two conditions for all input signals x
and y, and real numbers a
(2.1a) H(x + y) = Hx + Hy,
(2.1b) H(ax) = aHx.
[Figure 2.1: block diagram of a filter H with input x and output y.]
A simple example of a linear filter is the delay operator D, which delays the input signal x one step. A delay of n steps is written D^n and is defined as

y = D^n x ⇔ y_k = x_{k−n}, for all k ∈ Z.

For a time-invariant (or shift-invariant) filter, a delay in the input produces a corresponding delay in the output, so that for all input signals x

H(Dx) = D(Hx).

That is, the operator H commutes with the delay operator, HD = DH. From this it also follows that the filter is invariant to an arbitrary delay of n steps, H(D^n x) = D^n(Hx).
Now, suppose H is a linear and time-invariant (LTI) filter. Let h be the
output, or response, when the input is a unit impulse
$$\delta_k = \begin{cases} 1 & \text{for } k = 0, \\ 0 & \text{otherwise}, \end{cases}$$

that is, h = Hδ. The sequence (h_k) is called the impulse response of the filter. Since the filter is time-invariant we have

$$D^n h = H(D^n\delta).$$
So, if we write the input signal x of the filter as

$$x = \cdots + x_{-1}D^{-1}\delta + x_0\delta + x_1 D\delta + \cdots = \sum_n x_n D^n\delta,$$
and use the fact that the filter is linear, we can write the output signal
y = Hx as
$$y = H\Big(\sum_n x_n D^n\delta\Big) = \sum_n x_n H(D^n\delta) = \sum_n x_n D^n h =: h * x.$$
From this we conclude that a LTI filter is uniquely determined by its impulse
response, and that the output y always can be written as the convolution
between the input x and the impulse response h,
(2.3) y = Hx = h ∗ x.
This demonstrates how the properties of linearity and time invariance motivate the definition of convolution. In the literature, it is common to restrict the meaning of the word filter to an operator that is both linear and time-invariant, and to use the word operator in the more general case.
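A sketch of (2.3) in code (not from the book): the output of an LTI filter is the convolution of the input with the impulse response.

```python
# A sketch of (2.3): y_k = sum_n h_n x_{k-n}; the filter h = (1/2, 1/2)
# is the averaging filter discussed in the text.

def convolve(h, x):
    """Convolution of two finitely supported sequences indexed from 0."""
    y = [0.0] * (len(h) + len(x) - 1)
    for n, hn in enumerate(h):
        for m, xm in enumerate(x):
            y[n + m] += hn * xm
    return y

h = [0.5, 0.5]
# feeding in a unit impulse returns the impulse response itself
assert convolve(h, [1.0]) == h
print(convolve(h, [1.0, 2.0, 4.0]))   # running pairwise means
```

The first assert is exactly the statement h = Hδ: the filter is completely determined by its impulse response.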
A finite impulse response (FIR) filter only has a finite number of coeffi-
cients different from zero. If a filter is not FIR, it is called infinite impulse
response (IIR).
A LTI filter is causal if it satisfies hk = 0 for k < 0. Non-causal filters are
often referred to as non-realizable, since they require the knowledge of future
values of the input signal. This is not necessarily a problem in applications,
where the signal values might already be stored on a physical medium such
as a CD-ROM. Further, FIR non-causal filters can always be delayed to make
them causal. Later, when we develop the theory of filter banks and wavelets,
it is convenient to work with non-causal filters.
The correlation x ⋆ y between two signals x and y is another sequence
defined by
$$(2.4)\qquad (x \star y)_k = \sum_n x_n y_{n-k} = \sum_n x_{n+k}\,y_n.$$
Exercises 2.1
2.1. Verify that the up- and downsampling operators are linear but not time-
invariant.
2.2. A filter is stable if all bounded input signals produce a bounded output signal. A signal x is bounded if |x_k| < C for all k and for some constant C. Prove that a LTI filter H is stable if and only if Σ_k |h_k| < ∞.
Let us now see how different operations in the time domain are translated into the z-domain. A delay of n steps corresponds to a multiplication by z^{−n}:

$$(2.6)\qquad x \supset X(z) \quad\Leftrightarrow\quad D^n x \supset z^{-n}X(z).$$
We will use the notation x∗ to denote the time reverse of the signal x, x∗_k = x_{−k}, and we have

$$(2.7)\qquad x \supset X(z) \quad\Leftrightarrow\quad x^* \supset X(z^{-1}).$$
The usefulness of the z-transform is largely contained in the convolution
theorem. It states that convolution in the time domain corresponds to a
simple multiplication in the z-domain.
Theorem 2.1 (The convolution theorem).

$$(2.8)\qquad y = h * x \quad\Leftrightarrow\quad Y(z) = H(z)X(z).$$
The transform H(z) of the impulse response of the filter is called the
transfer function of the filter. This means that we can compute the output
of a LTI filter by a simple multiplication in the z-domain. Often this is
easier than directly computing the convolution. To invert the z-transform one
usually uses tables, partial fraction expansion, and theorems. The correlation
also has a corresponding relation on the transform side
$$(2.9)\qquad y = x_1 \star x_2 \quad\Leftrightarrow\quad Y(z) = X_1(z)X_2(z^{-1}).$$
Example 2.4. Let us again consider the averaging filter from Example 2.1
given by h0 = h1 = 1/2. If we now compute the output in the z-domain we
proceed as follows
$$H(z) = \frac12 + \frac12 z^{-1},$$
$$Y(z) = H(z)X(z) = \frac12 X(z) + \frac12 z^{-1}X(z) \;\Rightarrow\; y_k = \frac12(x_k + x_{k-1}),$$

which is the same result as we obtained in the time domain.
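The convolution theorem in miniature (a sketch, not from the book): multiplying the coefficient sequences of H(z) and X(z), viewed as polynomials in z⁻¹, reproduces the time-domain output.

```python
# Multiplying coefficient sequences of H(z) and X(z) (polynomials in
# z^-1) reproduces the time-domain output y_k = (x_k + x_{k-1}) / 2.

def poly_mul(a, b):
    """Coefficient sequences indexed by the power of z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

h = [0.5, 0.5]                      # H(z) = 1/2 + (1/2) z^-1
x = [1.0, 3.0, 5.0]
y = poly_mul(h, x)
direct = [(x[k] + (x[k - 1] if k > 0 else 0.0)) / 2 for k in range(len(x))]
assert y[:len(x)] == direct         # same result as in the time domain
print(y)
```

Polynomial multiplication and convolution are the same coefficient operation, which is what Theorem 2.1 expresses.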
Exercises 2.2
2.3. Verify relations (2.6) and (2.7).
2.4. Prove that the correlation between x and y can be written as the convolution between x and the time reverse y∗ of y, that is, x ⋆ y = x ∗ y∗. Then prove relation (2.9).
It follows that X(ω) is 2π-periodic. Note that we, with an abuse of notation,
use the same letter X to denote both the Fourier and z-transform of a signal
x. From the context, and the different letters ω and z for the argument,
it should be clear what we refer to. To obtain the signal values from the
transform we use the inversion formula
$$(2.11)\qquad x_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\omega)e^{i\omega k}\,d\omega.$$
The Parseval formula tells us that the Fourier transform conserves the energy
in the following sense
$$(2.12)\qquad \sum_k |x_k|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|X(\omega)|^2\,d\omega,$$

or more generally

$$(2.13)\qquad \langle x, y\rangle = \sum_k x_k\overline{y_k} = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\omega)\overline{Y(\omega)}\,d\omega.$$
From the definitions of the Fourier and z-transform, we see that we obtain the
Fourier transform X(ω) from the z-transform X(z) through the substitution
z = eiω . The convolution theorem for the z-transform therefore gives us a
corresponding theorem in the frequency domain
with a different amplitude and phase. Let us see why this is so. If the input
xk = eiωk , where |ω| ≤ π, the output y is
$$y_k = \sum_n h_n x_{k-n} = \sum_n h_n e^{i\omega(k-n)} = e^{i\omega k}\sum_n h_n e^{-i\omega n} = e^{i\omega k}H(\omega).$$
Writing the complex number H(ω) in polar form, H(ω) = |H(ω)|e^{iφ(ω)}, we get

$$y_k = |H(\omega)|\,e^{i(\omega k + \phi(\omega))}.$$
Thus, the output is also a pure frequency, but with amplitude |H(ω)| and a phase delay of −φ(ω). By plotting the magnitude response |H(ω)| and the phase function φ(ω) for |ω| ≤ π, we see how the filter affects different frequency components of the signal. This is the reason for using the word filter in the first place; it filters out certain frequency components of the input signal. A filter with magnitude response constantly equal to one, |H(ω)| = 1, is therefore called an allpass filter: all frequency components of the input signal are unaffected in magnitude (but not in phase).
Example 2.5. For the averaging (or lowpass) filter h0 = h1 = 1/2 we have
$$H(\omega) = \frac12(1 + e^{-i\omega}) = e^{-i\omega/2}(e^{i\omega/2} + e^{-i\omega/2})/2 = e^{-i\omega/2}\cos(\omega/2).$$
From this we see that the magnitude |H(ω)| = cos(ω/2) for |ω| < π. To
the left in Figure 2.2 the magnitude response is plotted. We see that high
frequencies, near ω = π, are multiplied by a factor close to zero and low
frequencies, near ω = 0, by a factor close to one. For the differencing (or
highpass) filter
$$g_k = \begin{cases} 1/2 & \text{for } k = 0, \\ -1/2 & \text{for } k = 1, \\ 0 & \text{otherwise}, \end{cases}$$
we have G(ω) = (1 − e−iω )/2. The magnitude response is plotted to the right
in Figure 2.2. These two filters are the simplest possible examples of low-
and highpass filters, respectively.
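The two frequency responses can be evaluated numerically; a sketch (not from the book) confirming |H(ω)| = cos(ω/2) and |G(ω)| = sin(ω/2) on 0 < ω < π:

```python
# Evaluating the frequency responses of Example 2.5 numerically.
import cmath, math

def freq_resp(flt, w):
    return sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(flt))

H = lambda w: freq_resp([0.5, 0.5], w)    # lowpass (averaging)
G = lambda w: freq_resp([0.5, -0.5], w)   # highpass (differencing)

for w in (0.1, 1.0, 2.0, 3.0):
    assert abs(abs(H(w)) - math.cos(w / 2)) < 1e-12
    assert abs(abs(G(w)) - math.sin(w / 2)) < 1e-12
print(abs(H(0.0)), abs(G(math.pi)))  # H passes w = 0, G passes w = pi
```

The printed values show the complementary behaviour of the pair: the lowpass filter passes ω = 0 with factor one, and the highpass filter passes ω = π with factor one.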
[Figure 2.2: the magnitude responses of the lowpass filter H (left) and the highpass filter G (right).]
Example 2.6. The ideal lowpass filter suppresses frequencies above the cut-off at ω = π/2 completely, and frequencies below this cut-off pass through unaffected. This filter is defined by the frequency response function

$$H(\omega) = \begin{cases} 1, & |\omega| < \pi/2, \\ 0, & \pi/2 < |\omega| < \pi. \end{cases}$$
It follows from the inversion formula (2.11) that the filter coefficients are
samples of a sinc function
$$(2.15)\qquad h_k = \frac12\,\mathrm{sinc}(k/2) = \frac12\cdot\frac{\sin(\pi k/2)}{\pi k/2}.$$
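Formula (2.15) can be checked against a direct numerical inverse transform (2.11) over the pass-band; a sketch, not from the book:

```python
# Checking (2.15) against a numerical inverse Fourier transform over the
# pass-band |w| < pi/2; the value h_0 = 1/2 comes from sinc(0) = 1.
import math

def h_formula(k):
    """h_k = (1/2) sinc(k/2) = sin(pi k / 2) / (pi k), with h_0 = 1/2."""
    if k == 0:
        return 0.5
    return math.sin(math.pi * k / 2) / (math.pi * k)

def h_numeric(k, steps=20000):
    """Midpoint rule for (1/2pi) * integral of e^{iwk} over |w| < pi/2."""
    dw = math.pi / steps
    total = 0.0
    for i in range(steps):
        w = -math.pi / 2 + (i + 0.5) * dw
        total += math.cos(w * k) * dw   # imaginary part integrates to zero
    return total / (2 * math.pi)

for k in range(-4, 5):
    assert abs(h_formula(k) - h_numeric(k)) < 1e-6
print([round(h_formula(k), 4) for k in range(4)])
```

Note that the coefficients vanish for even k ≠ 0 and decay only like 1/k, so the ideal lowpass filter is IIR.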
Exercises 2.3
2.5. First, show that the filter coefficients of the ideal lowpass filter are given
by (2.15). Then, compute the filter coefficients (gk ) of the ideal highpass filter
$$G(\omega) = \begin{cases} 0, & |\omega| < \pi/2, \\ 1, & \pi/2 < |\omega| < \pi. \end{cases}$$
frequency components an equal amount. Later, we will also see how linear phase in a filter bank corresponds to symmetric (and non-orthogonal) wavelets.
Example 2.7. From Example 2.5 we see that the phase of the lowpass filter h0 = h1 = 1/2 is φ(ω) = −ω/2.

A filter that is symmetric around zero, hk = h−k, has a real and even frequency response function. The filter then has zero phase, φ(ω) = 0. A filter that is antisymmetric around zero will have an imaginary, and odd, frequency response function with phase π/2 or −π/2. If hk = −h−k we have h0 = 0 and

$$H(\omega) = -2i\,(h_1\sin\omega + h_2\sin 2\omega + \cdots).$$

Note that the sign of the factor (h1 sin ω + h2 sin 2ω + · · · ) determines whether the phase is π/2 or −π/2, and that this depends on the frequency ω. (−2i = 2e^{−iπ/2}.)
A causal filter cannot have zero or constant phase. A causal filter can, on the other hand, be symmetric or antisymmetric, but not around zero. Causal filters have linear phase when they are symmetric or antisymmetric: hk = hN−k or hk = −hN−k, respectively. Here we have assumed that we have a FIR filter with nonzero coefficients h0, h1, . . . , hN. We will then have a factor e^{−iNω/2} in H(ω), and we see the linear term −Nω/2 in the phase.
Example 2.8. The FIR filter with nonzero coefficients h0 = h2 = 1/2 and
h1 = 1 is symmetric, and
$$H(\omega) = \frac12 + e^{-i\omega} + \frac12 e^{-2i\omega} = e^{-i\omega}(1 + \cos\omega).$$
The filter has linear phase, φ(ω) = −ω, since 1 + cos ω ≥ 0 for all ω.
H(z) = ±H(z^{−1}).

Conclusion: for symmetric and antisymmetric filters, the zeros of H(z) must come in pairs as z_i and z_i^{−1}. When we construct wavelets later in this book, we will see that symmetric wavelets correspond to symmetric filters in a filter bank.
The group delay of a filter is defined as

$$\tau(\omega) = -\frac{d\phi}{d\omega},$$
where φ(ω) is the phase of the filter. The group delay measures the delay at
the frequency ω.
Example 2.9. Suppose that the input x of the linear phase filter in Example
2.8 equals the sum of two pure frequencies, xk = e^{iω1k} + e^{iω2k}. Since φ(ω) = −ω, the group delay τ(ω) = 1 and the output y then equals

$$y_k = H(\omega_1)e^{i\omega_1 k} + H(\omega_2)e^{i\omega_2 k} = |H(\omega_1)|e^{-i\omega_1}e^{i\omega_1 k} + |H(\omega_2)|e^{-i\omega_2}e^{i\omega_2 k} = |H(\omega_1)|e^{i\omega_1(k-1)} + |H(\omega_2)|e^{i\omega_2(k-1)}.$$
We see that the two oscillations are both delayed one step.
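Example 2.9 can be played out numerically; a sketch (the two frequencies w1, w2 are arbitrary choices, not from the book):

```python
# The linear-phase filter of Example 2.8, h = (1/2, 1, 1/2), delays a sum
# of two pure frequencies by exactly one step, since its group delay is 1.
import cmath, math

h = [0.5, 1.0, 0.5]
w1, w2 = 0.7, 2.1          # arbitrary test frequencies in (0, pi)

def x(k):                  # two-tone input x_k = e^{i w1 k} + e^{i w2 k}
    return cmath.exp(1j * w1 * k) + cmath.exp(1j * w2 * k)

for k in range(5):
    y_k = sum(hn * x(k - n) for n, hn in enumerate(h))
    # predicted: each tone scaled by |H(w)| = 1 + cos(w), delayed one step
    pred = ((1 + math.cos(w1)) * cmath.exp(1j * w1 * (k - 1))
            + (1 + math.cos(w2)) * cmath.exp(1j * w2 * (k - 1)))
    assert abs(y_k - pred) < 1e-12
print("both tones delayed by one step")
```

Because the phase is exactly linear, both tones are delayed by the same single step; a nonlinear phase would delay them by different amounts and distort the waveform.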
Exercises 2.4
2.6. Show that the frequency response of the highpass filter g0 = 1/2 and g1 = −1/2 can be written as

$$G(\omega) = ie^{-i\omega/2}\sin(\omega/2).$$

Then compute and plot the magnitude and phase of G(ω). Note that the factor sin(ω/2) is not positive for all |ω| < π.
$$(2.16)\qquad \langle x, y\rangle = x_1\overline{y_1} + \cdots + x_n\overline{y_n},$$

and the vectors are orthogonal if ⟨x, y⟩ = 0. The vectors ϕ^(1), . . . , ϕ^(n) form a basis of R^n (or C^n) if every vector x ∈ R^n (or C^n) can be written uniquely as

$$x = a_1\varphi^{(1)} + \cdots + a_n\varphi^{(n)}.$$
(2.18) hx, yi = a1 b1 + · · · + an bn .
Example 2.10. In R2 the natural basis is given by two vectors: δ (1) = (1, 0)t
and δ (2) = (0, 1)t . Another orthonormal basis of R2 is the natural basis
rotated 45 degrees counter-clockwise and has the basis vectors
$$\varphi^{(1)} = \frac{1}{\sqrt 2}\begin{pmatrix}1\\1\end{pmatrix}, \qquad \varphi^{(2)} = \frac{1}{\sqrt 2}\begin{pmatrix}-1\\1\end{pmatrix}.$$
If we take the inner product of both sides of this equation with the basis
vector ϕ(j) we obtain an expression for the coordinate aj ,
$$a_j = \sum_{k=1}^{n} \tilde a_k\,\langle\tilde\varphi^{(k)}, \varphi^{(j)}\rangle \quad\Leftrightarrow\quad a = P\tilde a.$$
The equivalence follows from the definition of matrix multiplication, and the matrix P has elements P_{jk} = ⟨ϕ̃^(k), ϕ^(j)⟩. The matrix P is orthogonal since PP^t = P^tP = I. For a vector x ∈ C^n the above also holds true, but the matrix P is then unitary, PP∗ = P∗P = I. Here P∗ is the adjoint, or complex conjugate transpose, of P: (P∗)_{jk} = P̄_{kj}. Note that for a unitary, or orthogonal, matrix we have
To make this transformation orthogonal we should replace 1/n by 1/√n, but the convention is to define it as we have. We obtain x from X through the inversion formula

$$x_k = \sum_{j=0}^{n-1} X_j W^{jk}.$$
$$\mathbf{Z}^2 := \{(k_x, k_y)^t : k_x, k_y \in \mathbf{Z}\}.$$
that is, h = Hδ. Just as for signals we can now write the output g = Hf as
a convolution (in two dimensions):

$$g = \sum_{n\in\mathbf{Z}^2} f_n S^n h =: h * f.$$
2.7 Sampling
Here we will briefly explain the mathematical basis of exchanging a function
for its sample values at, say, the integers. The fundamental result is Poisson’s
summation formula.
For a continuous-time signal f (t) we define its Fourier transform as
$$\hat f(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt,$$
Now, if the function f has been lowpass filtered with cut-off frequency π, that is, f̂(ω) = 0 for |ω| ≥ π, then in the pass-band |ω| < π we have

$$\hat f(\omega) = \sum_l f(l)e^{-il\omega}.$$
Here we have complete information about the Fourier transform f̂(ω) in terms of the sample values f(l). Thus, applying the inverse Fourier transform, we get a formula, the Sampling Theorem, reconstructing the function from its sample values:

$$f(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\sum_l f(l)e^{-il\omega}e^{i\omega t}\,d\omega = \sum_l f(l)\,\frac{\sin\pi(t-l)}{\pi(t-l)}.$$
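The reconstruction formula can be tried numerically; a sketch (not from the book) with an arbitrarily chosen band-limited test signal, truncating the sample sum at |l| ≤ L:

```python
# A band-limited signal (frequencies 1.5 and 0.8, both below pi) is
# recovered from its integer samples by the truncated cardinal series;
# the slow sinc decay makes the truncation error of order 1/L.
import math

def f(t):
    return math.cos(1.5 * t) + 0.5 * math.sin(0.8 * t)

def sinc(u):
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(t, L=5000):
    return sum(f(l) * sinc(t - l) for l in range(-L, L + 1))

for t in (0.3, 1.7, -2.4):
    assert abs(reconstruct(t) - f(t)) < 1e-2
print(round(reconstruct(0.3), 4), round(f(0.3), 4))
```

The large truncation length L needed for modest accuracy illustrates the poor time localization of the sinc kernel, one motivation for wavelet expansions.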
Exercises 2.7
2.7. Consider the function f (t) = sin πt. What happens when this is sam-
pled at the integers? Relate this to the conditions in the Sampling Theorem.
2.8. Prove Poisson’s summation formula (2.19), noting that the left-hand
side has period 2π, and that the right-hand side is a Fourier series with the
same period.
2.9. Work out Poisson’s summation formula (2.19) when the function f (t)
is sampled at the points t = 2−J k. What should be the maximum cut-off
frequency in this case? (ω = 2J π)
Chapter 3
Filter Banks
The study of filter banks in signal processing was one of the paths that led to wavelets. Signal processing has a long tradition of developing different systems of basis functions to represent signals. In many applications it is desirable to have basis functions that are well localized in time and in frequency. For computational purposes these functions should also have a simple structure and allow for fast computations. Wavelets satisfy all these criteria.
The fast computation and representation of functions in wavelet bases is
intimately connected to filter banks. As a matter of fact, the so-called fast
wavelet transform is performed as a repeated application of the low- and
highpass filters in a filter bank. Filter banks operate in discrete time and
wavelets in continuous time. Wavelets are discussed in the next chapter.
for some numbers (cn ). Here we encounter a problem not present in finite
dimensions. The series expansion involves an infinite number of terms, and
this series must converge. It turns out that we can not find a basis for all
signals, and we will here restrict ourselves to those with finite energy. Then,
we can carry over all the concepts from finite-dimensional vector spaces in a
fairly straightforward way.
Accordingly, the equality in (3.1) means that the sequence of partial sums

$$s^{(N)} = \sum_{n=-N}^{N} c_n\varphi^{(n)},$$

is convergent with limit x, that is,

$$\big\|s^{(N)} - x\big\| \to 0, \quad\text{as } N\to\infty.$$
is also complete. In this chapter we will assume that all signals are contained
in this space.
The basis functions (ϕ^(n)) are now formed as the even translates of these two prototypes (see Figure 3.1):

$$\varphi^{(2n)}_k = \varphi_{k-2n}, \quad\text{and}\quad \varphi^{(2n+1)}_k = \psi_{k-2n}.$$
$$(3.4)\qquad y^{(0)}_n := c_{2n} = \langle x, \varphi^{(2n)}\rangle = \frac{1}{\sqrt2}\,(x_{2n} + x_{2n+1}),$$
$$\phantom{(3.4)}\qquad y^{(1)}_n := c_{2n+1} = \langle x, \varphi^{(2n+1)}\rangle = \frac{1}{\sqrt2}\,(x_{2n} - x_{2n+1}),$$

in other words, weighted averages and differences of pairwise values of x. Another way of interpreting this is to say that we take pairwise values of x, and then rotate the coordinate system in the plane (R²) 45 degrees counter-clockwise. Here, we have also introduced the sequences y^(0) and y^(1), consisting of the even- and odd-indexed coordinates, respectively. The basis functions form an orthonormal basis of ℓ²(Z), and we can therefore reconstruct x from its coordinates:

$$(3.5)\qquad x = \sum_n \langle x, \varphi^{(n)}\rangle\,\varphi^{(n)}.$$
[Figure 3.1: the basis functions ϕ^(2n) and ϕ^(2n+1), even translates of ϕ^(0) and ϕ^(1).]
Analysis
We will now show how we can use a filter bank to compute the coordinates
in (3.4). If we define the impulse responses of two filters H and G as
h_k = ϕ_k and g_k = ψ_k,

and if we let h∗ and g∗ denote the time-reverses of these filters, we can write the inner products in (3.4) as a convolution:

$$y^{(0)}_n = \sum_k x_k\,\varphi^{(2n)}_k = \sum_k x_k h_{k-2n} = \sum_k x_k h^*_{2n-k} = (x * h^*)_{2n}.$$

Similarly, we get y^{(1)}_n = (x ∗ g∗)_{2n}.
The conclusion is that we can compute y (0) and y (1) by filtering x with H ∗
and G∗ , respectively, and then downsampling the output of these two filters
(see Figure 3.2). Downsampling removes all odd-indexed values of a signal,
and we define the downsampling operator (↓ 2) as
(↓ 2)x = (. . . , x−2 , x0 , x2 , . . . ).
Thus, we have

$$y^{(0)} = (\downarrow 2)(H^*x), \quad\text{and}\quad y^{(1)} = (\downarrow 2)(G^*x).$$
Here, H ∗ and G∗ denote the low- and highpass filters with impulse responses
h∗ and g ∗, respectively. These two filters are non-causal since their impulse
responses are the time-reverses of causal filters. This is not necessarily a
problem in applications, where the filters can always be made causal by
delaying them a certain number of steps; the output is then delayed an equal
number of steps.
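The analysis step can be sketched in code (not the book's program): computing the coordinates (3.4) as correlation with the Haar filters followed by downsampling.

```python
# A sketch of the analysis step: y(0)_n = (x * h*)_{2n} and
# y(1)_n = (x * g*)_{2n} for the Haar filters h and g.
import math

r = 1 / math.sqrt(2)
h = [r, r]        # lowpass:  h_k = phi_k
g = [r, -r]       # highpass: g_k = psi_k

def correlate_down(x, flt):
    """(x * flt*)_{2n} = sum_k x_k flt_{k-2n}, kept only at even shifts 2n."""
    out = []
    for n in range(len(x) // 2):
        out.append(sum(x[2 * n + j] * flt[j] for j in range(len(flt))
                       if 2 * n + j < len(x)))
    return out

x = [4.0, 2.0, 5.0, 5.0]
y0, y1 = correlate_down(x, h), correlate_down(x, g)
# agrees with the weighted means and differences of (3.4)
assert all(abs(a - b) < 1e-12 for a, b in
           zip(y0, [r * (x[0] + x[1]), r * (x[2] + x[3])]))
assert all(abs(a - b) < 1e-12 for a, b in
           zip(y1, [r * (x[0] - x[1]), r * (x[2] - x[3])]))
# the basis is orthonormal, so energy is preserved
assert abs(sum(v * v for v in x) - sum(v * v for v in y0 + y1)) < 1e-10
print(y0, y1)
```

The last assert is Parseval's relation for this orthonormal basis: the filter bank redistributes, but does not change, the energy of the signal.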
[Figure 3.2: the analysis part of the filter bank: the input x is filtered with H∗ and G∗, and the outputs are downsampled to give y^(0) and y^(1).]
Synthesis
So far we have seen how we can analyze a signal, or compute its coordinates,
in the Haar basis using a filter bank. Let us now demonstrate how we can
synthesize, or reconstruct, a signal from the knowledge of its coordinates.
From the definition of the filters H and G, and from the reconstruction
formula (3.5) we get
$$x_k = \sum_n y^{(0)}_n\,\varphi_{k-2n} + \sum_n y^{(1)}_n\,\psi_{k-2n} = \sum_n y^{(0)}_n h_{k-2n} + \sum_n y^{(1)}_n g_{k-2n} = (v^{(0)} * h)_k + (v^{(1)} * g)_k,$$

where v^(0) = (↑ 2)y^(0) and v^(1) = (↑ 2)y^(1), see Exercise 3.1. Here the upsampling operator (↑ 2) is defined as

$$(\uparrow 2)y = (\ldots, y_{-1}, 0, y_0, 0, y_1, \ldots).$$

In more compact notation,

$$x = v^{(0)} * h + v^{(1)} * g = H(\uparrow 2)y^{(0)} + G(\uparrow 2)y^{(1)} =: x^{(0)} + x^{(1)}.$$
The signals x(0) and x(1) are obtained by first upsampling y (0) and y (1) , and
then filtering the result with H and G, respectively; see Figure 3.3. On the
other hand, if we look back at the reconstruction formula (3.5), we can write
x as
$$x = x^{(0)} + x^{(1)} = \sum_n \langle x, \varphi^{(2n)}\rangle\varphi^{(2n)} + \sum_n \langle x, \varphi^{(2n+1)}\rangle\varphi^{(2n+1)},$$
which means that x(0) and x(1) are the orthogonal projections of x onto the
subspaces spanned by the even and odd basis functions, respectively.
[Figure 3.3: the synthesis part of the filter bank: y^(0) and y^(1) are upsampled, filtered with H and G, and added to give x.]
Exercises 3.2
3.1. Show that (v ∗ h)_k = Σ_n y_n h_{k−2n}, where v = (↑ 2)y. Observe that v_{2n} = y_n.
Downsampling
The downsampling operator (↓ 2) removes all odd-indexed values of a signal,
and is consequently defined as
(↓ 2)x = (. . . , x−2 , x0 , x2 , . . . ).
If we let y = (↓ 2)x we have y_k = x_{2k}, and in the z-domain we get (see Exercise 3.2)

$$(3.6)\qquad Y(z) = \frac12\Big(X\big(z^{1/2}\big) + X\big(-z^{1/2}\big)\Big).$$
Upsampling
The upsampling operator (↑ 2) inserts a zero between every value of a signal,

$$(\uparrow 2)y = (\ldots, y_{-1}, 0, y_0, 0, y_1, \ldots).$$

If u = (↑ 2)y we have, in the z-domain (see Exercise 3.2),

$$(3.7)\qquad U(z) = Y(z^2),$$

and in the frequency domain U(ω) = Y(2ω).
Exercises 3.3
3.2. Prove relations (3.6) and (3.7).
3.3. Show that if Y(z) = X(z^{1/2}) then Y(ω) = X(ω/2), and if U(z) = Y(z²) then U(ω) = Y(2ω).
3.4. Assume that a filter H has the transfer function H(z). Show that the time-reverse H∗ of the filter has the transfer function H∗(z) = H(z^{−1}).
[Figure 3.4: the complete filter bank: analysis with H̃∗ and G̃∗ followed by downsampling, then upsampling and synthesis with H and G, producing x̂ = x^(0) + x^(1).]
Reconstruction Conditions
The results of the previous section for the down- and upsampling give us the following expressions for the z-transforms of x^(0) and x^(1):

$$X^{(0)}(z) = \frac12\,H(z)\Big[X(z)\widetilde H^*(z) + X(-z)\widetilde H^*(-z)\Big],$$
$$X^{(1)}(z) = \frac12\,G(z)\Big[X(z)\widetilde G^*(z) + X(-z)\widetilde G^*(-z)\Big].$$

Adding these together, we obtain an expression for the z-transform of x̂:

$$\widehat X(z) = \frac12\Big[H(z)\widetilde H^*(z) + G(z)\widetilde G^*(z)\Big]X(z) + \frac12\Big[H(z)\widetilde H^*(-z) + G(z)\widetilde G^*(-z)\Big]X(-z),$$

where we have grouped the terms with the factors X(z) and X(−z) together, respectively.
From this we see that we get perfect reconstruction, that is x̂ = x, if the factor in front of X(z) equals one, and the factor in front of X(−z) equals zero. We thus get the following two conditions on the filters:

$$(3.8)\qquad H(z)\widetilde H^*(z) + G(z)\widetilde G^*(z) = 2,$$
$$(3.9)\qquad H(z)\widetilde H^*(-z) + G(z)\widetilde G^*(-z) = 0.$$
The first condition ensures that there is no distortion of the signal, and the
second condition that the alias component X(−z) is cancelled. These two
conditions will appear again when we study wavelets. This is the key to the
connection between filter banks and wavelets. Due to different normaliza-
tions, the right-hand side in the no distortion condition equals 1 for wavelets
though.
$$(3.10)\qquad G(z) = -z^{-L}\widetilde H^*(-z), \quad\text{and}\quad \widetilde G(z) = -z^{-L}H^*(-z),$$
where L is an arbitrary odd integer. In Exercise 3.6 below you are to verify
that this choice cancels the alias, and then you will also see why L has
to be odd. In this book we will in most cases choose L = 1. Choosing the highpass filters in this way, the no distortion condition becomes

$$H(z)\widetilde H^*(z) + H(-z)\widetilde H^*(-z) = 2.$$
Writing P(z) := H(z)H̃∗(z) for the product filter, we conclude that all even powers in P(z) must be zero, except the constant term, which should equal one. The odd powers all cancel and are the design variables in a filter bank.
The design of a perfect reconstruction filter bank is then a question of finding a product filter P(z) satisfying condition (3.11). Once such a product filter has been found, it is factored in some way as P(z) = H(z)H̃∗(z). The highpass filters are then given by equation (3.10).
Example 3.1. Let us see if the discrete-time Haar basis satisfies the perfect
reconstruction condition. The filters are given by
H(z) = H̃(z) = (1/√2)(1 + z^{-1}),   G(z) = G̃(z) = (1/√2)(1 − z^{-1}).

The product filter becomes

P(z) = H(z)H̃∗(z) = H(z)H(z^{-1}) = ½ (z + 2 + z^{-1}).
This product filter indeed satisfies the perfect reconstruction condition, since
all even powers equal zero except the constant term p0 = 1.
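This check is easy to carry out numerically. The following sketch (Python with NumPy; the coefficient arrays and variable names are ours, not the book's) forms P(z) = H(z)H(z^{-1}) as the autocorrelation of the Haar filter and inspects its even powers.

```python
import numpy as np

# Haar lowpass filter H(z) = (1 + z^{-1}) / sqrt(2): coefficients of z^0, z^{-1}.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)

# P(z) = H(z) H(z^{-1}) is the autocorrelation of h; the entries of p are
# the coefficients of z^{1}, z^{0}, z^{-1}, in this order.
p = np.convolve(h, h[::-1])
print(p)  # [0.5 1.  0.5]  ->  P(z) = (z + 2 + z^{-1}) / 2

# Perfect reconstruction: the constant term is 1 and every other even power
# of P(z) vanishes (trivially so here, since there are no other even powers).
center = len(h) - 1
assert abs(p[center] - 1.0) < 1e-12
```

The same two lines (`np.convolve` plus a look at the even-indexed entries around the center) work for any candidate product filter.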
Biorthogonal Bases
Recall that an orthogonal filter bank could be seen as a realization of the expansion of a signal into a special type of discrete-time basis. This basis was formed by the even translates of two basis functions ϕ and ψ, where ϕ_k = h_k and ψ_k = g_k. And we had

x = Σ_n ⟨x, ϕ^{(2n)}⟩ ϕ^{(2n)} + Σ_n ⟨x, ϕ^{(2n+1)}⟩ ϕ^{(2n+1)},

where ϕ^{(2n)}_k = ϕ_{k−2n} and ϕ^{(2n+1)}_k = ψ_{k−2n}. Now, a biorthogonal filter bank corresponds to the biorthogonal expansion

x = Σ_n ⟨x, ϕ̃^{(2n)}⟩ ϕ^{(2n)} + Σ_n ⟨x, ϕ̃^{(2n+1)}⟩ ϕ^{(2n+1)},

where ϕ̃^{(2n)}_k = ϕ̃_{k−2n} and ϕ̃^{(2n+1)}_k = ψ̃_{k−2n}; here ϕ̃_k = h̃_k and ψ̃_k = g̃_k.
Exercises 3.4
3.6. Verify that the alias cancellation choice (3.10) of the highpass filters
implies that condition (3.9) is satisfied.
3.7. There exists a filter bank that is even simpler than the one based on the
Haar basis – the lazy filter bank. It is orthogonal and given by the lowpass
filter H(z) = z^{-1}. What are the corresponding highpass filters? What is the product filter, and does it satisfy the perfect reconstruction condition?
The signals y (0) and y (1) are then the odd- and even-indexed values of x,
respectively. Verify this! What are the signals x(0) and x(1) ? and is their
sum equal to x?
Factorization
Let us first assume that we want to construct an orthogonal filter bank using
the symmetric Daubechies product filter. Then, since P(z) = H(z)H(z^{-1}), we know that the zeros of P(z) always come in pairs z_k and z_k^{-1}. When we factor P(z) we can, for each zero z_k, let either (z − z_k) or (z − z_k^{-1}) be a factor of H(z). If we always choose the zero that is inside or on the unit circle, |z_k| ≤ 1, then H(z) is called the minimum phase factor of P(z).
Now, suppose we also want the filter H(z) to be symmetric. Then the zeros of H(z) must come together in pairs z_k and z_k^{-1}. But this contradicts the orthogonality condition except for the Haar basis, where both zeros are at z = −1. Thus, orthogonal filter banks cannot have symmetric filters.¹
In a biorthogonal basis, or filter bank, we factor the product filter as P(z) = H(z)H̃∗(z). There are several ways of doing so, and we then obtain several different filter banks for a given product filter. In most cases, we want the filters H(z) and H̃∗(z) to be symmetric, unless we are designing an orthogonal filter bank, that is.
Finally, since we want both H(z) and H̃∗(z) to have real coefficients, we always let the complex conjugate zeros z_k and z̄_k belong together, to either H(z) or H̃∗(z).
Let us again illustrate this with an example.
Example 3.4. For N = 2 the Daubechies product filter was given by

P(z) = (1/16)(−z^3 + 9z + 16 + 9z^{-1} − z^{-3})
     = ((1 + z)/2)^2 ((1 + z^{-1})/2)^2 (−z + 4 − z^{-1}).

¹We have assumed that all filters are FIR. A filter bank with IIR symmetric filters can be orthogonal.
This polynomial has four zeros at z = −1, one at z = 2 − √3, and one at z = 2 + √3. Two possible factorizations of this product filter are:
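Numerically, the minimum phase factorization can be sketched as follows (NumPy; the normalization H(1) = √2, matching Example 3.1's convention, and the variable names are our assumptions). H(z) takes two of the four zeros at z = −1 plus the zero 2 − √3 that lies inside the unit circle.

```python
import numpy as np

# Zeros assigned to H(z): two of the four at z = -1 (they sit on the unit
# circle and are shared equally with H(z^{-1})), plus the minimum-phase
# zero 2 - sqrt(3); the zero 2 + sqrt(3) goes to H(z^{-1}).
roots_H = [-1.0, -1.0, 2.0 - np.sqrt(3.0)]
h = np.poly(roots_H)               # monic cubic with these zeros
h *= np.sqrt(2.0) / h.sum()        # normalize so H(1) = sqrt(2)
print(np.round(h, 4))              # the Daubechies-4 lowpass coefficients

# Recover the product filter P(z) = H(z)H(z^{-1}) and compare with the
# coefficients (1/16)(-1, 0, 9, 16, 9, 0, -1) quoted in Example 3.4.
p = np.convolve(h, h[::-1])
assert np.allclose(p, np.array([-1, 0, 9, 16, 9, 0, -1]) / 16.0)
assert abs(np.dot(h, h) - 1.0) < 1e-10   # unit energy: an orthogonal filter
```

Choosing the zero 2 + √3 instead gives the maximum phase factor, the time-reverse of the same filter.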
3.6 Notes
There are two books that we direct the reader to, where the wavelet theory is started from the study of filter banks. These are Wavelets and Filter Banks by Strang and Nguyen [27], and Wavelets and Subband Coding by Vetterli and Kovačević [30]. These books are appropriate for an engineer or an undergraduate student with a background in signal processing.
Chapter 4
Multiresolution Analysis
Definition 4.1. We define L²(R) as the set of all functions f(t) such that

∫_{−∞}^{∞} |f(t)|² dt < ∞.
1. ‖f‖ ≥ 0 and ‖f‖ = 0 ⇒ f = 0,
2. ‖cf‖ = |c| ‖f‖, for c ∈ C,
3. ‖f + g‖ ≤ ‖f‖ + ‖g‖. (The triangle inequality)
Remark. The space L²(R) endowed with the scalar product is complete: if we have a Cauchy sequence, ‖f_n − f_m‖ → 0 as m, n → ∞, then this sequence converges in L²(R) to a limit f: ‖f_n − f‖ → 0 as n → ∞. A normed vector space with a scalar product, which is also complete, is termed a Hilbert space. Thus, L²(R) is a Hilbert space, as well as the spaces R^n, C^n, and ℓ²(Z).
The space L²(R) contains all physically realizable signals. It is also the natural setting for the continuous-time Fourier transform:

Ff(ω) = f̂(ω) = ∫_{−∞}^{∞} f(t) e^{−iωt} dt.
By Parseval's formula, ‖Ff‖² = 2π‖f‖², or ‖Ff‖ = (2π)^{1/2} ‖f‖. The quantity |f̂(ω)|²/(2π) can then be interpreted as the energy density at frequency ω. Integrating this energy density over all frequencies gives the total energy of the signal, according to Parseval's formula. We finally remark that Parseval's formula is a special case of the seemingly more general Plancherel's formula

∫_{−∞}^{∞} f̂(ω) \overline{ĝ(ω)} dω = 2π ∫_{−∞}^{∞} f(t) \overline{g(t)} dt,
‖f − w‖ ≤ ‖f − v‖, for all v ∈ V.
[Figure: the orthogonal projection. f decomposes as f = P_V f + P_{V^⊥} f, and P_V f is the point in V closest to f.]
Riesz Bases
The notion of a basis in linear spaces extends from finite dimensions and
ℓ2 (Z) to L2 (R). We say that a collection {ϕk }k∈Z of functions is a basis for
a linear subspace V if any function f ∈ V can be written uniquely as

(4.3) f = Σ_k c_k ϕ_k.
We also say that V is spanned by the functions ϕk . The sum (4.3) should
be interpreted as the limit of finite sums when the number of terms goes to
infinity. More precisely, ‖f − s_K‖ → 0 as K → ∞, where s_K is the finite sum¹

s_K = Σ_{k=−K}^{K} c_k ϕ_k.
In other words, the energy of f − sK goes to zero when more and more terms
in the sum are added. A fundamental fact about the scalar product that we
will use throughout the book is:

(4.4) ⟨Σ_k c_k ϕ_k, g⟩ = Σ_k c_k ⟨ϕ_k, g⟩.
Small errors in the signal will then give small errors in the coefficients and vice versa, provided that A^{-1} and B are of moderate size. A perhaps more relevant result involves relative errors:

‖f − f̃‖/‖f‖ ≤ √(B/A) ‖c − c̃‖/‖c‖   and   ‖c − c̃‖/‖c‖ ≤ √(B/A) ‖f − f̃‖/‖f‖.
¹Strictly speaking, we should have used two indices K1 and K2 that independently go to infinity.
Here, ‖c‖ is the ℓ²(Z)-norm. The number √(B/A) has a name: the condition number. It gives an upper bound on how much relative errors can grow when passing between f and its coefficients (c_k). Since we always have A ≤ B, the
condition number must be at least one. The optimal case occurs when the
condition number is 1, A = B = 1. We then have an orthonormal (ON) basis.
For ON-bases we have ⟨ϕ_k, ϕ_l⟩ = δ_{k,l}, where we have used the Kronecker delta symbol

δ_{k,l} = 1 if k = l, and 0 otherwise.
Taking scalar products with ϕ_l in (4.3) gives c_l = ⟨f, ϕ_l⟩ and thus every f ∈ V can be written as

(4.5) f = Σ_k ⟨f, ϕ_k⟩ ϕ_k.

For the orthogonal projection P_V onto V, it is easy to show that, for any f ∈ L²(R), ⟨P_V f, ϕ_k⟩ = ⟨f, ϕ_k⟩. We then have

P_V f = Σ_k ⟨f, ϕ_k⟩ ϕ_k.
In general, there is no unique choice of the dual basis {ϕ̃_k}. But if we require that the linear space spanned by the dual basis equals V, there is just one such basis. This is, for instance, the case when V = L²(R).
Exercises 4.1
4.1. Show that the set of band-limited functions is a subspace of L2 (R). Also
show that it is closed (quite difficult).
4.2. Verify that the projection operator in Example 4.1 is given by (4.2).
Hint: Show that

0 = ⟨f − P_V f, v⟩ = (1/2π) ⟨f̂ − \widehat{P_V f}, v̂⟩,

for each band-limited v.
4.3. Show that if {ϕ_k} is an orthonormal basis for V, and P_V is the orthogonal projection onto V, then, for any f,

⟨P_V f, ϕ_k⟩ = ⟨f, ϕ_k⟩.
4.4. Prove (4.4). Hint: Let f be the sum in (4.3) and let s_K be the finite sum. First show that

⟨s_K, g⟩ = Σ_{k=−K}^{K} c_k ⟨ϕ_k, g⟩.

Then use

|⟨f, g⟩ − ⟨s_K, g⟩| = |⟨f − s_K, g⟩| ≤ ‖f − s_K‖ ‖g‖ → 0 as K → ∞.
[Figure: the piecewise constant approximation f₁(t) (left) and two of the scaling functions, ϕ(2t − 1) and ϕ(2t − 4) (right).]
(4.6) is denoted V₁. The coefficients (s_{1,k}) may be chosen as the mean values of f over the intervals (k/2, (k + 1)/2),

s_{1,k} = 2 ∫_{k/2}^{(k+1)/2} f(t) dt = 2 ∫_{−∞}^{∞} f(t) ϕ(2t − k) dt.
If the coefficients (s_{0,k}) are chosen as mean values over the intervals
[Figure: the coarser approximation f₀(t) (left) and the scaling function ϕ(t − 1) (right).]
Figure 4.4: The hat scaling function and a function in the corresponding V0
space.
The first condition just states that functions in V_{j+1} contain more details than functions in V_j: in a certain sense, we add information when we approximate a function at a finer scale. The second condition says that V_{j+1} approximates functions at twice as fine a scale as V_j, and also gives a connection between the spaces V_j. The fifth condition requires the approximation spaces to be spanned by scaling functions. Let us introduce the dilated, translated, and normalized scaling functions

(4.7) ϕ_{j,k}(t) = 2^{j/2} ϕ(2^j t − k), j, k ∈ Z.
After a scaling by 2^j, it is easy to see that for fixed j, the scaling functions ϕ_{j,k} constitute a Riesz basis for V_j. Thus, every f_j ∈ V_j can be written as

(4.8) f_j(t) = Σ_k s_{j,k} ϕ_{j,k}(t).
The reason for the factors 2^{j/2} is that all scaling functions have equal norms, ‖ϕ_{j,k}‖ = ‖ϕ‖.
The remaining two conditions are of a more technical nature. They are
needed to ensure that the wavelets, which we will introduce soon, give a Riesz
basis for L2 (R). The third basically says that any function can be approx-
imated arbitrarily well with a function fj ∈ Vj , if we just choose the scale
fine enough. This is what is meant by density. Finally, the fourth condition
says, loosely speaking, that the only function that can be approximated at
an arbitrarily coarse scale is the zero function.
(4.10) ϕ(t) = 2 Σ_k h_k ϕ(2t − k),

for some coefficients (h_k). This is the scaling equation. Taking the Fourier transform of the scaling equation gives us (Exercise 4.10)

(4.11) ϕ̂(ω) = H(ω/2) ϕ̂(ω/2),

where

H(ω) = Σ_k h_k e^{−ikω}.
Letting ω = 0 and using ϕ̂(0) = 1, we get H(0) = Σ_k h_k = 1, and we see that the coefficients (h_k) may be interpreted as an averaging filter. In fact, as we will later see, also H(π) = 0 holds, and thus H is a lowpass filter. It can be shown that the scaling function is uniquely defined by this filter together with the normalization (4.9). As a matter of fact, repeating (4.11) and again using ϕ̂(0) = 1 yields (under certain conditions) the infinite product formula

(4.12) ϕ̂(ω) = Π_{j>0} H(ω/2^j).
The properties of the scaling function are reflected in the filter coefficients
(hk ), and scaling functions are usually constructed by designing suitable fil-
ters.
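For the Haar filter the infinite product (4.12) can be verified numerically. The sketch below (NumPy; the truncation level J = 40 is an arbitrary choice of ours) compares the truncated product against the known transform of the box function on [0, 1):

```python
import numpy as np

# Haar filter coefficients h_0 = h_1 = 1/2, so H(0) = 1 as required.
def H(w):
    return 0.5 * (1.0 + np.exp(-1j * w))

# Truncated infinite product (4.12): phi_hat(w) ~ prod_{j=1..J} H(w / 2^j).
def phi_hat(w, J=40):
    out = np.ones_like(w, dtype=complex)
    for j in range(1, J + 1):
        out *= H(w / 2.0**j)
    return out

# The Haar scaling function is 1 on [0, 1), with exact Fourier transform
# phi_hat(w) = (1 - e^{-iw}) / (iw).
w = np.linspace(0.1, 20.0, 200)
exact = (1.0 - np.exp(-1j * w)) / (1j * w)
assert np.allclose(phi_hat(w), exact, atol=1e-8)
```

The factors H(ω/2^j) tend to 1 so quickly that forty of them already match the exact transform to eight decimals.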
Example 4.2. The B-spline of order N is defined by the convolution

S_N(t) = χ(t) ∗ . . . ∗ χ(t), (N factors)

where χ(t) denotes the Haar scaling function. The B-spline is a scaling function (Exercise 4.7) for each N. The translated functions S_N(t − k) give N − 2 times continuously differentiable, piecewise polynomial approximations of degree N − 1. The cases N = 1 and N = 2 correspond to the Haar and the hat scaling functions, respectively. When N grows larger, the scaling functions become more and more regular, but also more and more spread out. We will describe wavelets based on B-spline scaling functions in Chapter 8.
Example 4.3. The sinc function

ϕ(t) = sinc t = sin(πt)/(πt),

is another scaling function (Exercise 4.8). It does not have compact support, and the decay as |t| → ∞ is very slow. Therefore, it is not used in practice. It has interesting theoretical properties though.
It is in a sense dual to the Haar scaling function, since its Fourier transform is given by the box function

ϕ̂(ω) = 1 if −π < ω < π, and 0 otherwise.
It follows that every function in V₀ is band-limited with cut-off frequency π. In fact, the Sampling Theorem (cf. Section 2.7) states that every such band-limited function f can be reconstructed from its sample values f(k) via

f(t) = Σ_k f(k) sinc(t − k).
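The sampling formula is easy to test numerically. In the sketch below (NumPy, whose np.sinc matches the definition sin(πt)/(πt) above) we take f to be a finite combination of shifted sincs, so that its integer samples are known exactly and the sum has only finitely many nonzero terms:

```python
import numpy as np

# A band-limited test function: f(k) = delta_{k,3} + 2 delta_{k,-1} exactly.
def f(t):
    return np.sinc(t - 3) + 2.0 * np.sinc(t + 1)

t = np.linspace(-4.0, 6.0, 101)
ks = np.arange(-10, 11)

# Shannon reconstruction from the integer samples f(k).
recon = sum(f(k) * np.sinc(t - k) for k in ks)
assert np.allclose(recon, f(t), atol=1e-12)
```

For a generic band-limited f the sum converges only slowly (the slow decay of sinc mentioned above), which is one more reason the sinc basis is not used in practice.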
Exercises 4.2
4.5. Show that f(t) ∈ V_j ⇔ f(2^{-j} t) ∈ V₀.
4.6. Verify that {ϕj,k }k∈Z is a Riesz basis for Vj for fixed j, given that
{ϕ0,k }k∈Z is a Riesz basis for V0 .
4.7. Derive the scaling equation for the spline scaling functions. Hint: Work in the Fourier domain and show that ϕ̂(ω) = H(ω/2) ϕ̂(ω/2), where H(ω) is 2π-periodic. Calculate the coefficients (h_k).
4.8. Derive the scaling equation for the sinc scaling function. Hint: Work
as in the previous problem.
4.9. Show that the scaling equation can be written more generally as

ϕ_{j,k} = √2 Σ_l h_l ϕ_{j+1,l+2k}.

4.11. Verify (4.12). Why do we need ϕ̂(0) = 1?
d0 = f1 − f0 .
(4.13) s_{0,k} = ½ (s_{1,2k} + s_{1,2k+1}).
Figure 4.5: The Haar wavelet and a function in the corresponding W0 space
(4.14) w_{0,k} = ½ (s_{1,2k} − s_{1,2k+1}).
[Figure: on each interval (k, k + 1), the two fine-scale averages s_{1,2k} and s_{1,2k+1} are split into the mean s_{0,k} and the details ±w_{0,k}.]
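The splitting (4.13)–(4.14) and its inversion can be sketched in a few lines (Python/NumPy; the sample sequence is arbitrary):

```python
import numpy as np

# Fine-scale Haar coefficients s1 (any sequence of even length).
s1 = np.array([4.0, 2.0, 5.0, 7.0, 1.0, 1.0])

s0 = 0.5 * (s1[0::2] + s1[1::2])   # (4.13): averages
w0 = 0.5 * (s1[0::2] - s1[1::2])   # (4.14): differences

# Inversion: s_{1,2k} = s_{0,k} + w_{0,k} and s_{1,2k+1} = s_{0,k} - w_{0,k}.
rec = np.empty_like(s1)
rec[0::2] = s0 + w0
rec[1::2] = s0 - w0
assert np.array_equal(rec, s1)
print(s0, w0)   # [3. 6. 1.] [ 1. -1.  0.]
```

Nothing is lost: the pair (s₀, w₀) carries exactly the same information as s₁.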
Example 4.5. For the sinc scaling function from Example 4.3, we choose
the wavelet as
ψ̂(ω) = 1 if π < |ω| < 2π, and 0 otherwise.
It is easy to see that ψ(t) = 2 sinc 2t − sinc t (Exercise 4.14). This is the
sinc wavelet. The space W0 will be the set of functions that are band-
limited to the frequency band π < |ω| < 2π. More generally, the space
Wj will contain all functions that are band-limited to the frequency band
2j π < |ω| < 2j+1 π.
Figure 4.7: The hat scaling function and a piecewise linear wavelet spanning
W0 .
(4.16) ψ(t) = 2 Σ_k g_k ϕ(2t − k),

for some coefficients (g_k). This is the wavelet equation. A Fourier transform gives

(4.17) ψ̂(ω) = G(ω/2) ϕ̂(ω/2),

where

G(ω) = Σ_k g_k e^{−ikω}.
Using ϕ̂(0) = 1 and ψ̂(0) = 0, we get that G(0) = Σ_k g_k = 0. Thus the coefficients (g_k) can be interpreted as a difference filter. Later we will see that also G(π) = 1 holds, and G is in fact a highpass filter. The wavelet and all its properties are determined by this filter, given the scaling function.
The detail spaces W_j are defined as the set of functions of the form

(4.18) d_j(t) = Σ_k w_{j,k} ψ_{j,k}(t).
Using the fourth condition in the definition of MRA one can show that
fj0 goes to 0 in L2 when j0 → −∞. The third condition now implies that,
choosing J larger and larger, we can approximate a function f with approx-
imations fJ that become closer and closer to f . Letting J → ∞ therefore
gives us the wavelet decomposition of f:

(4.19) f(t) = Σ_{j,k} w_{j,k} ψ_{j,k}(t).
We have thus indicated how to prove that {ψj,k } is a basis for L2 (R). How-
ever, it still remains to construct the highpass filter G determining the mother
wavelet ψ.
The decomposition Vj+1 = Vj ⊕ Wj above is not unique. There are many
ways to choose the wavelet ψ and the corresponding detail spaces Wj . Each
such choice corresponds to a choice of the highpass filter G. In the next
section, we will describe a special choice, which gives us an orthogonal system
Exercises 4.3
4.12. Verify that the wavelet in Example 4.4 will span the difference between
V1 and V0 .
4.13. Verify that, for j ≠ 0, each function f_{j+1} ∈ V_{j+1} can be written as f_{j+1} = f_j + d_j where d_j ∈ W_j. (For j = 0 the statement is true by definition.)
4.14. Verify that ψ(t) = 2 sinc 2t − sinc t, when ψ is the sinc wavelet in Example 4.5. Hint: Work in the Fourier domain.
Orthogonality Conditions
The first requirement is that the scaling functions ϕ(t − k) constitute an
orthogonal basis for V0 , that is,
∫_{−∞}^{∞} ϕ(t − k) ϕ(t − l) dt = δ_{k,l}.
Using the scaling equation (4.10) we can transform this to a condition on the
coefficients (hk ) (see Exercise 4.15):
(4.20) Σ_l h_l h_{l+2k} = δ_k / 2.
Figure 4.9: Vj ⊥ Wj
For the filter coefficients this means that (see Exercise 4.17)
(4.22) Σ_m h_{m+2k} g_{m+2l} = 0.
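For the Haar filters h = (1/2, 1/2) and g = (1/2, −1/2) (the book's normalization, in which the filters sum to 1 and 0) the conditions (4.20) and (4.22) reduce to two dot products, since for filters of length two only the zero shift is nontrivial; a minimal check:

```python
import numpy as np

h = np.array([0.5, 0.5])    # Haar averaging filter
g = np.array([0.5, -0.5])   # Haar difference filter

# (4.20): sum_l h_l h_{l+2k} = delta_k / 2; here only k = 0 is nontrivial.
assert abs(np.dot(h, h) - 0.5) < 1e-12
# (4.22): sum_m h_{m+2k} g_{m+2l} = 0; here only the zero shift occurs.
assert abs(np.dot(h, g)) < 1e-12
```

Longer filters require checking every even shift, which is conveniently done with `np.convolve(h, g[::-1])` and a look at the even-indexed entries.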
In terms of the filter functions H(ω) and G(ω), (4.20)–(4.22) become
3. ϕ̂(2ω) = H(ω) ϕ̂(ω), for some 2π-periodic function H.

1. Σ_j |ψ̂(2^j ω)|² = 1,

2. Σ_{j≥0} ψ̂(2^j ω) \overline{ψ̂(2^j (ω + 2kπ))} = 0, for all odd integers k,

3. Σ_{j≥1} Σ_k |ψ̂(2^j (ω + 2kπ))|² = 1.
The first two conditions are equivalent to the orthonormal basis property
of ψj,k in L2 (R), and the third then relates to a corresponding scaling func-
tion. Broadly speaking, the first condition means that linear combinations
of ψj,k are dense in L2 (R), and the second then relates to the orthogonality.
Exercises 4.4
4.15. Verify that the condition on the scaling functions ϕ(t − k) to be orthogonal implies (4.20). Hint: First, you may assume l = 0 (why?). Then, show that

δ_k = ∫_{−∞}^{∞} ϕ(t) ϕ(t − k) dt
    = ∫_{−∞}^{∞} (2 Σ_l h_l ϕ(2t − l)) (2 Σ_m h_m ϕ(2t − 2k − m)) dt
    = 4 Σ_{l,m} h_l h_m ∫_{−∞}^{∞} ϕ(2t − l) ϕ(2t − 2k − m) dt
    = 2 Σ_{l,m} h_l h_m ∫_{−∞}^{∞} ϕ(t − l) ϕ(t − 2k − m) dt
    = 2 Σ_l h_l h_{l+2k}.
4.17. Show that the condition on the scaling functions ϕ(t − k) to be orthogonal to the wavelets ψ(t − k) implies (4.22).
f_J(t) = Σ_{j=j_0}^{J−1} Σ_k w_{j,k} ψ_{j,k}(t) + Σ_k s_{j_0,k} ϕ_{j_0,k}(t).
The computation of the coefficients is done with filter banks. For the sake
of simplicity we derive this connection between MRA’s and filter banks in
the orthogonal case. The biorthogonal case is entirely similar, the notation
just becomes a bit more cumbersome.
We assume that we know the scaling coefficients sJ,k = hf, ϕJ,k i of a
function f at a certain finest scale J. In practice, we usually only have sample
values f (2−J k) available, and we have to compute the scaling coefficients
numerically from these sample values. This is known as pre-filtering. It is
common practice to simply replace the scaling coefficients with the sample
values. The effect of doing so, and other aspects of pre-filtering, will be
treated in Chapter 15.
where s_{j,k} = ⟨f, ϕ_{j,k}⟩ and w_{j,k} = ⟨f, ψ_{j,k}⟩. A scalar multiplication on both sides with ϕ_{j,l} together with the orthogonality conditions for scaling functions and wavelets gives us (Exercise 4.18)

s_{j,k} = Σ_l s_{j+1,l} ⟨ϕ_{j+1,l}, ϕ_{j,k}⟩.

Using the scaling equation, we obtain

⟨ϕ_{j+1,l}, ϕ_{j,k}⟩ = √2 Σ_m h_m ⟨ϕ_{j+1,l}, ϕ_{j+1,m+2k}⟩ = √2 h_{l−2k}.

With a similar calculation for the wavelet coefficients, we have derived the formulas

(4.25) s_{j,k} = √2 Σ_l h_{l−2k} s_{j+1,l} and w_{j,k} = √2 Σ_l g_{l−2k} s_{j+1,l}.
Scaling and wavelet coefficients at the coarser scale are thus computed by sending the scaling coefficients at the finer scale through the analysis part of the orthogonal filter bank with lowpass and highpass filters √2 H and √2 G.
Repeating this recursively, starting with the coefficients (sJ,k ), gives the
wavelet coefficients (wj,k ) for j = j0 , . . . , J − 1, and the scaling coefficients
(sj0 ,k ) at the coarsest scale. This recursive scheme is called the Fast Forward
Wavelet Transform. If we start with N scaling coefficients at the finest scale,
the computational effort is roughly 4MN operations, where M is the filter
length. Compare this with the FFT algorithm, where the computational
effort is 2N log N operations.
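A minimal sketch of the Fast Forward Wavelet Transform built directly on (4.25) (Python/NumPy; the periodic boundary handling and the function names are our assumptions, not the book's):

```python
import numpy as np

def analysis_step(s, h, g):
    """One level of (4.25): s_{j,k} = sqrt(2) sum_l h_{l-2k} s_{j+1,l},
    and similarly for w with g. Assumes len(s) is even; the boundary is
    handled by periodic extension."""
    n = len(s)
    s_out = np.zeros(n // 2)
    w_out = np.zeros(n // 2)
    for k in range(n // 2):
        for m, (hm, gm) in enumerate(zip(h, g)):   # l = 2k + m
            s_out[k] += np.sqrt(2) * hm * s[(2 * k + m) % n]
            w_out[k] += np.sqrt(2) * gm * s[(2 * k + m) % n]
    return s_out, w_out

def fwt(s, h, g, levels):
    """Fast Forward Wavelet Transform: repeat the analysis step."""
    details = []
    for _ in range(levels):
        s, w = analysis_step(s, h, g)
        details.append(w)
    return s, details

# Haar filters in the book's normalization (sum h = 1), so sqrt(2) h is ON.
h = np.array([0.5, 0.5]); g = np.array([0.5, -0.5])
sJ = np.array([4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 3.0])
s0, ws = fwt(sJ, h, g, levels=3)

# An orthonormal transform preserves the l2 energy of the coefficients.
assert np.isclose(np.sum(s0**2) + sum(np.sum(w**2) for w in ws), np.sum(sJ**2))
```

For N input coefficients and filter length M, each level costs O(MN/2) operations on half as many samples as the previous one, which gives the roughly 4MN total quoted above.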
[Figure: the Fast Forward Wavelet Transform. s_J is fed through the analysis filters H∗ and G∗ with downsampling, giving s_{J−1} and w_{J−1}; repeating this on s_{J−1} gives s_{J−2} and w_{J−2}, and so on.]
From sj0 and wj0 , we can thus reconstruct sj0 +1 , which together with wj0 +1
gives us sj0 +2 and so on, until we finally arrive at the sJ . This is the Fast
Inverse Wavelet Transform, see Figure 4.12. Note that we use neither the
scaling function nor the wavelet explicitly in the forward or inverse wavelet
transform, only the orthogonal filter bank.
To recover the sample values f (2−J k) from the fine-scale coefficients (sJ,k )
we need to do a post-filtering step. This will be be discussed in Chapter 15.
Exercises 4.5
4.18. Verify all the steps in the derivation of the filter equations (4.25).
[Figure 4.12: the Fast Inverse Wavelet Transform. s_{j_0} and w_{j_0} are upsampled and filtered by H and G; their sum gives s_{j_0+1}, which together with w_{j_0+1} gives s_{j_0+2}, and so on.]
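A minimal sketch of one synthesis step (Python/NumPy; the periodic boundary convention and the names are our assumptions). Splitting a short Haar signal one level and merging it again recovers it exactly:

```python
import numpy as np

def synthesis_step(s, w, h, g):
    """One level of the inverse transform:
    s_{j+1,l} = sqrt(2) sum_k ( h_{l-2k} s_{j,k} + g_{l-2k} w_{j,k} ),
    with periodic extension at the boundary."""
    n = 2 * len(s)
    out = np.zeros(n)
    for k in range(len(s)):
        for m, (hm, gm) in enumerate(zip(h, g)):   # l = 2k + m
            out[(2 * k + m) % n] += np.sqrt(2) * (hm * s[k] + gm * w[k])
    return out

# Haar round trip: analyze one level by hand, then synthesize.
h = np.array([0.5, 0.5]); g = np.array([0.5, -0.5])
s1 = np.array([4.0, 2.0, 5.0, 7.0])
s0 = np.sqrt(2) * (h[0] * s1[0::2] + h[1] * s1[1::2])
w0 = np.sqrt(2) * (g[0] * s1[0::2] + g[1] * s1[1::2])
assert np.allclose(synthesis_step(s0, w0, h, g), s1)
```

As the text notes, neither the scaling function nor the wavelet appears anywhere in the computation — only the filter coefficients.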
For the dual mother wavelet holds the dual wavelet equation

(4.27) ψ̃(t) = 2 Σ_k g̃_k ϕ̃(2t − k).
The two latter conditions mean that V_j ⊥ W̃_j and Ṽ_j ⊥ W_j respectively (see Figure 4.13). After some calculations similar to those in the orthogonal case
Figure 4.13: V_j ⊥ W̃_j and Ṽ_j ⊥ W_j
(4.28)
H̃(ω)H(ω) + H̃(ω + π)H(ω + π) = 1,
G̃(ω)G(ω) + G̃(ω + π)G(ω + π) = 1,
G̃(ω)H(ω) + G̃(ω + π)H(ω + π) = 0,
H̃(ω)G(ω) + H̃(ω + π)G(ω + π) = 0.

(4.29)
H̃(ω)H(ω) + G̃(ω)G(ω) = 1,
H̃(ω)H(ω + π) + G̃(ω)G(ω + π) = 0.
Hence, the four equations in (4.28) can actually be reduced to two. The latter equations are the perfect reconstruction conditions (3.8)–(3.9) for filter banks, transformed to the Fourier domain (Exercise 4.21). It means that √2 H̃, √2 H, √2 G, and √2 G̃ are the low- and highpass filters in a biorthogonal filter bank.
We will now derive a connection between the low- and highpass filters.
Cramer’s rule gives us (see Exercise 4.22)
e G(ω + π) e H(ω + π)
(4.30) H(ω) = and G(ω) = ,
∆(ω) ∆(ω)
where ∆(ω) = det M(ω). In practice we often want finite filters, which corresponds to wavelets and scaling functions having compact support. Then one can show that ∆(ω) has the form ∆(ω) = C e^{−iLω} for some odd integer L and constant C with |C| = 1. Different choices give essentially the same
wavelet: the only thing that differs is an integer translation and the constant
C. A common choice is C = 1 and L = 1 which gives the alternating flip
construction (cf. Equation 3.10):
g_k = (−1)^k h̃_{1−k} and g̃_k = (−1)^k h_{1−k}.
This leaves us with the task of designing proper lowpass filters that satisfy
the first equation in (4.28). It is then easy to verify (Exercise 4.23) that the
remaining equations are satisfied.
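The alternating flip is easy to verify numerically. The sketch below (NumPy) uses the orthogonal Daubechies-4 lowpass filter from Example 3.4, for which h̃ = h; shifting the flipped filter onto the support 0..3 is our convention (an even shift, which does not affect the filter bank conditions):

```python
import numpy as np

# Daubechies-4 lowpass filter, normalized so that sum(h) = sqrt(2).
h = np.poly([-1.0, -1.0, 2.0 - np.sqrt(3.0)])
h *= np.sqrt(2.0) / h.sum()

# Alternating flip g_k = (-1)^k h_{1-k}, moved by an even shift so that
# g lives on the same support 0..3 as h.
g = np.array([(-1)**k * h[len(h) - 1 - k] for k in range(len(h))])

# The cross-correlation of h and g must vanish at all even lags
# (orthogonality between the lowpass and highpass channels).
xc = np.convolve(h, g[::-1])
center = len(h) - 1
assert all(abs(xc[center + 2 * l]) < 1e-10 for l in (-1, 0, 1))
```

The same two-line flip produces the highpass pair in the biorthogonal case as well, starting from two different lowpass filters h and h̃.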
We mentioned above that finite filters correspond to wavelets and scaling functions having compact support. This is based on the Paley-Wiener theorem. Taking this as a fact, and assuming that the lowpass filters have lengths M and M̃, it is straightforward to show that ϕ and ϕ̃ are zero outside intervals of length M − 1 and M̃ − 1 (Exercise 4.24). Further, both ψ and ψ̃ are supported on intervals of length (M + M̃ − 2)/2.
Note that we take inner products with the dual scaling functions ϕ̃_{J,k}. This is a non-orthogonal projection onto V_J, along Ṽ_J^⊥.
The scaling coefficients s_{j,k} = ⟨f, ϕ̃_{j,k}⟩ and wavelet coefficients w_{j,k} = ⟨f, ψ̃_{j,k}⟩ are computed by feeding the scaling coefficients s_{j+1,k} = ⟨f, ϕ̃_{j+1,k}⟩ into the biorthogonal filter bank:

(4.32) s_{j,k} = √2 Σ_l h̃_{l−2k} s_{j+1,l} and w_{j,k} = √2 Σ_l g̃_{l−2k} s_{j+1,l}.
Exercises 4.6
4.20. Derive one of the equations in (4.28), using the same kind of calcu-
lations as in the orthogonal case. Start with deriving the corresponding
identity for the filter coefficients.
4.21. Verify that (4.29) are the perfect reconstruction conditions for filter banks.
4.24. Let the lowpass filters be FIR with filter lengths M and M̃. Assume that ϕ and ϕ̃ are zero outside [0, A] and [0, Ã]. Use the scaling equations to show that A = M − 1 and Ã = M̃ − 1. Then, use the wavelet equations to show that both ψ and ψ̃ are zero outside [0, (M + M̃ − 2)/2].
‖f − P_j f‖ ≤ C 2^{−jα} ‖D^α f‖
Vanishing Moments
The polynomial reproducing property (4.34) is maybe not so interesting in its own right, but rather since it is connected to the dual wavelets having vanishing moments. If t^α ∈ V_j,³ we then have t^α ⊥ W̃_j, since V_j ⊥ W̃_j. This

³With a slight abuse of notation, since t^α does not belong to L²(R)
means that ⟨t^α, ψ̃_{j,k}⟩ = 0, for every wavelet ψ̃_{j,k}. Written out more explicitly, we have

∫ t^α ψ̃_{j,k}(t) dt = 0, for α = 0, . . . , N − 1.

We say that the dual wavelets have N vanishing moments. Having N vanishing moments can equivalently be stated as the Fourier transform having a zero of order N at ω = 0,

D^α \widehat{ψ̃}(0) = 0, for α = 0, . . . , N − 1.
Using the relation \widehat{ψ̃}(2ω) = G̃(ω) \widehat{ϕ̃}(ω) and \widehat{ϕ̃}(0) = 1, we see that G̃(ω) must have a zero of order N at ω = 0. From (4.30) it then follows that H(ω) must be of the form

(4.35) H(ω) = ((e^{−iω} + 1)/2)^N Q(ω),

for some 2π-periodic function Q(ω). The larger N we choose, the sharper transition between pass-band and stop-band we get.
In the same way, we can show that the filter function H̃(ω) must be of the form

(4.36) H̃(ω) = ((e^{−iω} + 1)/2)^Ñ Q̃(ω),
and we get

⟨f, ψ̃_{j,k}⟩ = ∫ f(t) ψ̃_{j,k}(t) dt = ∫ P_{α−1}(t) ψ̃_{j,k}(t) dt + O(2^{−jα}) = O(2^{−jα}),
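In the orthogonal case ψ̃ = ψ, so the vanishing moments can be checked on the highpass filter itself: a filter whose transfer function has a zero of order N at ω = 0 annihilates the discrete monomials k^α for α < N. A minimal check with the Daubechies-4 filter (N = 2; the even shift used when flipping the filter is our convention):

```python
import numpy as np

# Daubechies-4 lowpass (sum = sqrt(2)) and its alternating-flip highpass.
h = np.poly([-1.0, -1.0, 2.0 - np.sqrt(3.0)])
h *= np.sqrt(2.0) / h.sum()
g = np.array([(-1)**k * h[len(h) - 1 - k] for k in range(len(h))])

# G has a zero of order N = 2 at omega = 0, so the filter kills the
# discrete moments of order 0 and 1.
k = np.arange(len(g))
for alpha in range(2):
    assert abs(np.sum(k**alpha * g)) < 1e-10
```

This is the discrete counterpart of the O(2^{−jα}) decay above: smooth parts of a signal produce wavelet coefficients that are nearly zero, which is what makes wavelet-based compression work.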
Exercises 4.7
4.26. Assuming that the function H(ω) has a zero of order N at ω = π, show, using scaling, that D^α ϕ̂(2kπ) = 0 for all integers k ≠ 0 and 0 ≤ α ≤ N − 1. Show further that D^α \widehat{ψ̃}(4kπ) = 0 for all integers k and 0 ≤ α ≤ N − 1.
4.27. Show, using Poisson's summation formula and the previous exercise, that for 0 ≤ α ≤ N − 1

Σ_k k^α ϕ(t − k) = t^α

when the function H(ω) has a zero of order N at ω = π. The scaling function ϕ thus reproduces polynomials of degree at most N − 1.
4.28. Prove, using Poisson's summation formula and scaling, the two identities

Σ_k (−1)^k ψ(t − k) = Σ_l ψ(t/2 − l + 1/4),

Σ_k (−1)^k ϕ(t − k) = Σ_l (−1)^l ψ(t − l − 1/2).
4.8 Notes
The idea of approximating an image at different scales, and storing the difference between these approximations, appeared already in the pyramid algorithm of Burt and Adelson in 1983. At the same time, the theory of wavelets had
started to make progress, and several wavelet bases had been constructed,
among others by the French mathematician Yves Meyer. This made the
French engineer Stephane Mallat realize a connection between wavelets and
filter banks. Together with Meyer he formulated the definition of multires-
olution analyses. This connection led to a breakthrough in wavelet theory,
since it gave both new constructions of wavelet bases and fast algorithms.
The Belgian mathematician Ingrid Daubechies constructed the first family
of wavelets within this new framework in 1988, and many different wavelets
have been constructed since.
The overview article by Jawerth and Sweldens [20] is a good start for
further reading and understanding of MRA and wavelets. It also contains
an extensive reference list. The books Ten Lectures on Wavelets by Ingrid
Daubechies [11] and A First Course on Wavelets by Hernandez & Weiss [16],
give a more mathematically complete description. Among other things, they
contain conditions on low- and highpass filters to generate Riesz bases of
wavelets. A detailed discussion about this can also be found in the book by
Strang & Nguyen [27].
Chapter 5

Wavelets in Higher Dimensions
mean value

(5.1) s_{0,k} = ¼ (s_{1,2k} + s_{1,2k+e_x} + s_{1,2k+e_y} + s_{1,2k+e}),

where e_x = (1, 0), e_y = (0, 1), and e = (1, 1).
[Figure 5.1: the fine-scale coefficients s_{1,k} on a 4 × 4 grid (left) and the coarse-scale coefficients s_{0,k} on the corresponding 2 × 2 grid (right).]
w^H_{0,k} = ¼ (s_{1,2k} + s_{1,2k+e_x} − s_{1,2k+e_y} − s_{1,2k+e}),
w^V_{0,k} = ¼ (s_{1,2k} − s_{1,2k+e_x} + s_{1,2k+e_y} − s_{1,2k+e}),
w^D_{0,k} = ¼ (s_{1,2k} − s_{1,2k+e_x} − s_{1,2k+e_y} + s_{1,2k+e}).
In Figure 5.2, we have sketched the 'polarity' of these differences, and the averaging. The superscripts H, V, D are shorthands for horizontal, vertical,
[Figure 5.2: the polarity patterns of the averaging and of the three differences on the 2 × 2 block s_{1,2k}, s_{1,2k+e_x}, s_{1,2k+e_y}, s_{1,2k+e}.]
The averaging in Equation (5.1) is the low-pass filtering step in the two-dimensional Haar transform. It can be decomposed into two one-dimensional lowpass filtering steps. First, we apply a lowpass filtering with subsampling in the x-direction, which gives us

½ (s_{1,2k} + s_{1,2k+e_x}) and ½ (s_{1,2k+e_y} + s_{1,2k+e}).

Averaging these two, that is, averaging in the y-direction, then gives us s_{0,k}.
Further, the three wavelet coefficient sequences are the result of applying low-
and highpass filters in the x- and y-directions. The whole process is shown in
Figure 5.3, where H and G denote the Haar filters. The superscripts x and y
indicate that filtering and subsampling are done in the x- and y-directions.
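One level of the two-dimensional Haar transform can be sketched directly from the averaging and difference formulas (Python/NumPy; taking the first array axis as the y-direction is our convention):

```python
import numpy as np

# A small test image of fine-scale coefficients s1.
s1 = np.array([[4.0, 2.0, 6.0, 0.0],
               [2.0, 4.0, 2.0, 2.0],
               [1.0, 3.0, 5.0, 5.0],
               [3.0, 1.0, 1.0, 3.0]])

a = s1[0::2, 0::2]   # s_{1,2k}
b = s1[0::2, 1::2]   # s_{1,2k+ex}
c = s1[1::2, 0::2]   # s_{1,2k+ey}
d = s1[1::2, 1::2]   # s_{1,2k+e}

s0 = (a + b + c + d) / 4    # (5.1): averages
wH = (a + b - c - d) / 4    # horizontal details
wV = (a - b + c - d) / 4    # vertical details
wD = (a - b - c + d) / 4    # diagonal details

# Each 2x2 block is recovered from its average plus the three details.
rec = np.empty_like(s1)
rec[0::2, 0::2] = s0 + wH + wV + wD
rec[0::2, 1::2] = s0 + wH - wV - wD
rec[1::2, 0::2] = s0 - wH + wV - wD
rec[1::2, 1::2] = s0 - wH - wV + wD
assert np.array_equal(rec, s1)
```

The slicing into a, b, c, d is exactly the x-then-y subsampling described above, just done in one step.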
[Figure 5.3: the two-dimensional Haar transform as two one-dimensional filtering steps. s₁ is filtered and subsampled by H_x∗ or G_x∗ in the x-direction, and each result by H_y∗ or G_y∗ in the y-direction, giving s₀, w₀^H, w₀^V, and w₀^D.]
H = H_x H_y,
G^H = H_x G_y,
G^V = G_x H_y,
G^D = G_x G_y.
Analysis filters H̃, G̃^H, G̃^V, and G̃^D are defined analogously. For notational convenience, from now on, we let the operators H_x, H_y, etc. include upsampling. For instance, H_x is an upsampling in the x-direction followed by a filtering in the x-direction. Given scaling coefficients s_{j+1} = (s_{j+1,k})_{k∈Z²}, we compute averages and wavelet coefficients
(5.2)
s_j = H̃∗ s_{j+1} = H̃_y∗ H̃_x∗ s_{j+1},
w^H_j = G̃^H∗ s_{j+1} = G̃_y∗ H̃_x∗ s_{j+1},
w^V_j = G̃^V∗ s_{j+1} = H̃_y∗ G̃_x∗ s_{j+1},
w^D_j = G̃^D∗ s_{j+1} = G̃_y∗ G̃_x∗ s_{j+1}.
We thus compute scaling and wavelet coefficients by first applying the analysis filter bank on the rows of s_{j+1} to get H̃_x∗ s_{j+1} and G̃_x∗ s_{j+1}. We then apply the analysis filter bank along the columns of these, see Figure 5.4.

[Figure 5.4: one step of the separable transform. s_{j+1} is split along x into H̃_x∗ s_{j+1} and G̃_x∗ s_{j+1}, which are then split along y into s_j, w^H_j, w^V_j, and w^D_j.]
To reconstruct s_{j+1} from the scaling and wavelet coefficients at the coarser scale, we reverse the whole process in Figure 5.4. Thus, we first apply the synthesis filters in the y-direction to recover H̃_x∗ s_{j+1} and G̃_x∗ s_{j+1}:

H̃_x∗ s_{j+1} = H_y s_j + G_y w^H_j,
G̃_x∗ s_{j+1} = H_y w^V_j + G_y w^D_j.

We then apply the synthesis filters in the x-direction:

s_{j+1} = H_x (H̃_x∗ s_{j+1}) + G_x (G̃_x∗ s_{j+1})
        = H_x H_y s_j + H_x G_y w^H_j + G_x H_y w^V_j + G_x G_y w^D_j
        = H s_j + G^H w^H_j + G^V w^V_j + G^D w^D_j.
[Figure 5.5: the arrangement of the coefficients. s_J is split into s_{J−1} and w^H_{J−1}, w^V_{J−1}, w^D_{J−1}; then s_{J−1} is split into s_{J−2} and w^H_{J−2}, w^V_{J−2}, w^D_{J−2}, and so on.]
The wavelet coefficients are computed analogously using the highpass filters g̃^H_k = h̃_{k_x} g̃_{k_y}, g̃^V_k = g̃_{k_x} h̃_{k_y}, and g̃^D_k = g̃_{k_x} g̃_{k_y}. Further, the reconstruction of s_{j+1} can be seen as applying two-dimensional filters after a two-dimensional upsampling. This upsampling operator inserts zeroes as follows:
            ⋮         ⋮      ⋮        ⋮      ⋮
· · ·   s_{−1,1}    0   s_{0,1}    0   s_{1,1}   · · ·
· · ·      0        0      0       0      0      · · ·
(↑2)² s = · · ·   s_{−1,0}    0   s_{0,0}    0   s_{1,0}   · · ·
· · ·      0        0      0       0      0      · · ·
· · ·   s_{−1,−1}   0   s_{0,−1}   0   s_{1,−1}  · · ·
            ⋮         ⋮      ⋮        ⋮      ⋮
(5.3)
H(ξ, η) = H(ξ)H(η),
G^H(ξ, η) = H(ξ)G(η),
G^V(ξ, η) = G(ξ)H(η),
G^D(ξ, η) = G(ξ)G(η).
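The separability in (5.3) can be checked numerically: a two-dimensional filter built as an outer product of one-dimensional filters has a frequency response that factors. A sketch with the Haar filters (NumPy; treating rows as the y-direction and the helper function names are our conventions):

```python
import numpy as np

h = np.array([0.5, 0.5])    # 1-D lowpass
g = np.array([0.5, -0.5])   # 1-D highpass

def freq1(c, w):
    """1-D transfer function C(w) = sum_k c_k e^{-ikw}."""
    return np.sum(c * np.exp(-1j * w * np.arange(len(c))))

def freq2(c, xi, eta):
    """2-D transfer function C(xi, eta) with rows indexed by k_y."""
    ky, kx = np.indices(c.shape)
    return np.sum(c * np.exp(-1j * (xi * kx + eta * ky)))

# G^H has coefficients h_{kx} g_{ky}: as an array, rows carry g, columns h.
GH = np.outer(g, h)
xi, eta = 0.7, 2.1
assert np.isclose(freq2(GH, xi, eta), freq1(h, xi) * freq1(g, eta))
```

The same factorization holds for H, G^V, and G^D, which is what makes the separable transform as cheap as two passes of the one-dimensional one.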
Ideally, H(ω) equals 1 on [0, π/2] and 0 on [π/2, π], and G(ω) equals 1 on [π/2, π] and 0 on [0, π/2]. Then, H(ξ, η) and G^ν(ξ, η) will decompose the frequency domain as in Figure 5.6. For non-ideal filters, there is of course overlapping between the frequency regions.
[Figure 5.6: the ideal decomposition of the frequency plane: H occupies [0, π/2]², G^H the region ξ < π/2 < η, G^V the region η < π/2 < ξ, and G^D the square [π/2, π]².]
Exercises 5.1
5.1. Verify that the wavelet coefficients in the two-dimensional Haar transform are given by two one-dimensional filtering steps as described in Figure 5.3.
5.2. Work out the construction of separable filters in three dimensions. How
many highpass filters will there be? (7)
5.3. Show that the two-dimensional downsampling operator can be written as (↓2)² = (↓2)_x (↓2)_y, where (↓2)_x and (↓2)_y are the one-dimensional downsampling operators in the x- and y-directions. Also show that (↑2)² = (↑2)_x (↑2)_y.
This is so, because Φ(2x − k_x, 2y − k_y) is one for k_x/2 < x < (k_x + 1)/2, k_y/2 < y < (k_y + 1)/2 and zero otherwise. The coarser approximation f₀ can similarly be written as

f₀(x, y) = Σ_k s_{0,k} Φ(x − k_x, y − k_y).
This is also how wavelets and scaling functions are defined for general separable wavelet bases. We define dilated, translated, and normalized scaling functions by

Φ_{j,k}(x, y) = 2^j Φ(2^j x − k_x, 2^j y − k_y),

and similarly for the wavelets. Note that we need 2^j as the normalizing
factor in two dimensions. We define approximation spaces Vj as the set of
all functions of the form
$$f_j(x,y) = \sum_k s_{j,k}\, \Phi_{j,k}(x,y).$$
Detail spaces WjH , WjV , and WjD are defined analogously. The scaling func-
tion satisfies the scaling equation
$$\Phi(x,y) = 4 \sum_k h_k\, \Phi(2x - k_x,\, 2y - k_y).$$
This implies that Vj ⊂ Vj+1 and Wjν ⊂ Vj+1 . Each fj+1 ∈ Vj+1 can be
decomposed as
$$f_{j+1} = f_j + d^H_j + d^V_j + d^D_j, \tag{5.5}$$
and
$$f_j(x,y) = \sum_k s_{j,k}\, \Phi_{j,k}(x,y) \quad\text{and}\quad d^\nu_j(x,y) = \sum_k w^\nu_{j,k}\, \Psi^\nu_{j,k}(x,y). \tag{5.7}$$
To switch between (5.6) and (5.7) we use the analysis and synthesis filter
banks (5.2) and (5.1).
Finally, we also have dual scaling functions, wavelets, approximation and
detail spaces with the same properties described above. Together with the
primal scaling functions and wavelets they satisfy biorthogonality conditions
92 CHAPTER 5. WAVELETS IN HIGHER DIMENSIONS
Defining the modulation matrix
$$M(\xi,\eta) = \begin{pmatrix}
H(\xi,\eta) & H(\xi+\pi,\eta) & H(\xi,\eta+\pi) & H(\xi+\pi,\eta+\pi)\\
G^H(\xi,\eta) & G^H(\xi+\pi,\eta) & G^H(\xi,\eta+\pi) & G^H(\xi+\pi,\eta+\pi)\\
G^V(\xi,\eta) & G^V(\xi+\pi,\eta) & G^V(\xi,\eta+\pi) & G^V(\xi+\pi,\eta+\pi)\\
G^D(\xi,\eta) & G^D(\xi+\pi,\eta) & G^D(\xi,\eta+\pi) & G^D(\xi+\pi,\eta+\pi)
\end{pmatrix},$$
and $\widetilde M(\xi,\eta)$ similarly, we get
$$M(\xi,\eta)\,\widetilde M(\xi,\eta)^t = I. \tag{5.8}$$
It is easy to verify that the separable filters satisfy these equations. However,
it is also possible to use these equations as the starting point for the con-
struction of two-dimensional, non-separable wavelets. We will not explore
this topic further, instead we take a look at two other, entirely different
constructions of non-separable wavelets in the next section.
Exercises 5.2
5.4. Verify Equation (5.4).
5.5. Show that Φj,k (x, y) = ϕj,kx (x)ϕj,ky (y).
5.6. Show that the scaling function satisfies the scaling equation
$$\Phi(x,y) = 4 \sum_k h_k\, \Phi(2x - k_x,\, 2y - k_y), \quad\text{where } h_k = h_{k_x} h_{k_y}.$$
Figure 5.7: The frequency plane decomposition for the separable wavelet
transform.
5.7. Show that Vj ⊂ Vj+1. Hint: Use the scaling equation to show that the scaling functions Φj,k belong to Vj+1. From there, conclude that each finite sum
$$\sum_{k=-K}^{K} s_{j,k}\, \Phi_{j,k}(x,y)$$
belongs to Vj+1. Finally, since Vj+1 is closed, it follows that all corresponding infinite sums, that is, elements of Vj, belong to Vj+1.
5.8. Show that we can decompose fj+1 as in (5.5). Hint: Show that each
scaling function Φj+1,k can be decomposed in this way.
5.9. Show that the separable filters satisfy the biorthogonality conditions (5.8).
Quincunx Wavelets
For separable wavelets, the dilated, translated, and normalized scaling functions can be written as
$$\Phi_{j,k}(x) = 2^j\, \Phi(2^j x - k), \quad k \in \mathbb{Z}^2,$$
Figure 5.8: Sampling lattice (x’s) and subsampling lattice (o’s) for the Quin-
cunx and the separable case.
and that we need three different wavelets. Dilated and translated wavelets can now be defined by
$$\psi_{j,k}(x) = 2^{j/2}\, \psi(D^j x - k), \quad k \in \mathbb{Z}^2,$$
where D is the Quincunx dilation matrix.
Again, the scaling function and wavelet are completely determined by the
low and highpass filter coefficients (hk ) and (gk ). In the biorthogonal case,
we also have dual scaling functions and wavelets, satisfying dual scaling and wavelet equations with dual filters $(\tilde h_k)$ and $(\tilde g_k)$. Biorthogonality conditions on wavelets and scaling functions can be transformed into conditions on the filters. In the frequency domain they become
$$\begin{aligned}
\widetilde H(\xi_1,\xi_2)H(\xi_1,\xi_2) + \widetilde H(\xi_1+\pi,\xi_2+\pi)H(\xi_1+\pi,\xi_2+\pi) &= 1,\\
\widetilde G(\xi_1,\xi_2)G(\xi_1,\xi_2) + \widetilde G(\xi_1+\pi,\xi_2+\pi)G(\xi_1+\pi,\xi_2+\pi) &= 1,\\
\widetilde G(\xi_1,\xi_2)H(\xi_1,\xi_2) + \widetilde G(\xi_1+\pi,\xi_2+\pi)H(\xi_1+\pi,\xi_2+\pi) &= 0,\\
\widetilde H(\xi_1,\xi_2)G(\xi_1,\xi_2) + \widetilde H(\xi_1+\pi,\xi_2+\pi)G(\xi_1+\pi,\xi_2+\pi) &= 0.
\end{aligned} \tag{5.9}$$
$$s_j = (\downarrow 2)_D\, \widetilde H^* s_{j+1} \quad\text{and}\quad w_j = (\downarrow 2)_D\, \widetilde G^* s_{j+1}.$$
The downsampling operator (↓ 2)D removes all coefficients but those with
indices k ∈ DZ2 . These are sometimes referred to as even indices and all
other indices are called odd. In the Inverse Wavelet Transform we successively
recover sj+1 from sj and wj, using the synthesis filters,
$$s_{j+1} = H (\uparrow 2)_D\, s_j + G (\uparrow 2)_D\, w_j,$$
where the upsampling operator $(\uparrow 2)_D$ interleaves zeros at the odd indices.
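The quincunx sublattice can be sketched as follows (our illustration; the dilation matrix D = [[1, 1], [1, -1]] is one common choice, and we zero out the odd indices rather than re-index onto the sublattice, to keep the picture simple):

```python
import numpy as np

def quincunx_mask(shape):
    """Boolean mask of the 'even' indices k ∈ DZ² for D = [[1,1],[1,-1]]:
    these are exactly the indices with k_x + k_y even."""
    kx, ky = np.indices(shape)
    return (kx + ky) % 2 == 0

def downsample_quincunx(s):
    """(↓2)_D sketched by zeroing the odd indices instead of removing
    them (a simplification for illustration)."""
    out = s.copy()
    out[~quincunx_mask(s.shape)] = 0.0
    return out

s = np.arange(16, dtype=float).reshape(4, 4)
sd = downsample_quincunx(s)
# exactly half of the samples lie on the quincunx sublattice
assert quincunx_mask(s.shape).sum() == s.size // 2
```

Since |det D| = 2, one sample in two survives each downsampling step, just as in the one-dimensional dyadic case.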
Figure 5.9: The ideal frequency plane decomposition for Quincunx wavelets.
This is in contrast with the separable case, where the same rotation divides the frequency channel W^D into W^H and W^V, and the two latter are mixed into W^D. In Figure 5.9, we have plotted the frequency plane decomposition for Quincunx wavelets.
Hexagonal Wavelets
Quincunx wavelets achieve more rotational invariance by including a rotation
in the dilation matrix. Another method is to use a sampling lattice different
from Z2 . One alternative is to use the hexagonal lattice Γ in Figure 5.10.
Figure 5.10: The Hexagonal Lattice (left) and the sampling frequency regions
(right) for rectangular and hexagonal sampling.
lattice, assuming that ϕ is centred around the origin. The dilated, translated, and normalized scaling functions are defined as
5.4 Notes
In the paper [9] by Cohen and Daubechies, several Quincunx wavelet bases
are constructed and their regularity is investigated. An interesting fact is that
for D being a rotation, orthogonal wavelets can at most be continuous, while
102 CHAPTER 6. THE LIFTING SCHEME
[Figure: the lifting step. The input s_{j+1} is filtered by h* and g*; the prediction p s_j is subtracted from the highpass output w_j, giving s_j^n = s_j and w_j^n = w_j − p s_j.]
so $s_j^n$ and $w_j^n$ are the result of applying 'new' filters to $s_{j+1}$, where
$$h^n = h \quad\text{and}\quad g^n = g - h p^*.$$
To recover sj+1 from snj and wjn we simply proceed as in Figure 6.2. This
amounts to applying new (lifted) synthesis filters
$$\tilde h^n = \tilde h + \tilde g\, p \quad\text{and}\quad \tilde g^n = \tilde g,$$
and computing
$$s_{j+1} = \tilde h^n s_j^n + \tilde g^n w_j^n.$$
The connection between the original and lifted filters in the Fourier do-
main can be written out in matrix form
$$\begin{pmatrix} \tilde h^n(\omega) \\ \tilde g^n(\omega) \end{pmatrix} = \begin{pmatrix} 1 & s(\omega) \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \tilde h(\omega) \\ \tilde g(\omega) \end{pmatrix}, \tag{6.1a}$$
$$\begin{pmatrix} h^n(\omega) \\ g^n(\omega) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -s(\omega) & 1 \end{pmatrix} \begin{pmatrix} h(\omega) \\ g(\omega) \end{pmatrix}. \tag{6.1b}$$
[Figure 6.2: the corresponding synthesis step. The prediction p s_j^n is added back to w_j^n to recover w_j, and s_{j+1} is then synthesized from s_j = s_j^n and w_j with the filters h̃ and g̃.]
To motivate the lifting step, let us consider a simple example, where the
initial filters are the lazy filters, that is,
$$s_{j,k} = s_{j+1,2k} \quad\text{and}\quad w_{j,k} = s_{j+1,2k+1}.$$
From a compression point of view, this is not a useful filter pair, since there
are no reasons to expect many wavelet coefficients to be small. This is where
the prediction filter p enters the scene. We try to predict the odd-indexed
scaling coefficients from the even-indexed via linear interpolation:
$$\hat s_{j+1,2k+1} = \frac{1}{2}(s_{j+1,2k} + s_{j+1,2k+2}), \quad\text{or}\quad \hat w_{j,k} = (p\, s_j)_k = \frac{1}{2}(s_{j,k} + s_{j,k+1}).$$
The new wavelet coefficients then become the prediction errors
$$\begin{aligned}
w^n_{j,k} &= w_{j,k} - \hat w_{j,k} \\
&= s_{j+1,2k+1} - \frac{1}{2}(s_{j+1,2k} + s_{j+1,2k+2}) \\
&= -\frac{1}{2}\, s_{j+1,2k} + s_{j+1,2k+1} - \frac{1}{2}\, s_{j+1,2k+2}.
\end{aligned}$$
We see that the new highpass filter is g0n = g2n = −1/2, g1n = 1, all other
gkn = 0.
In regions where the signal is smooth, the prediction can be expected to
be accurate and thus the corresponding wavelet coefficients will be small. A
more detailed analysis shows that the lifting step increases the number of
vanishing moments, from zero in the lazy wavelet transform to two vanishing
moments (linear polynomials will have zero wavelet coefficients).
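The prediction step can be sketched as follows (our illustration; the edge handling, replicating the last even sample, is our own choice and not part of the text):

```python
import numpy as np

def lazy_split(x):
    """Lazy filters: even samples -> scaling coeffs, odd -> wavelet coeffs."""
    return x[::2].astype(float), x[1::2].astype(float)

def predict(s, w):
    """Lifting step: predict each odd sample by linear interpolation of
    its even neighbours and keep only the prediction error."""
    right = np.append(s[1:], s[-1])  # edge: replicate last even sample
    return w - 0.5 * (s + right)

x = np.arange(10, dtype=float)  # samples of a linear signal
s, w = lazy_split(x)
wn = predict(s, w)
# two vanishing moments: a linear signal gives zero interior coefficients
assert np.allclose(wn[:-1], 0.0)
```

Only the last coefficient is nonzero here, and only because of the boundary choice; in the interior, linear polynomials are predicted exactly.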
Dual Lifting
After the lifting step, the wavelet coefficients changed, but the scaling coef-
ficients were left unaltered. They can be updated with a dual lifting step as
$$s^n_j = s_j + u\, w^n_j.$$
[Figure: predict and update. After the prediction step p, the update filter u is applied to w_j^n and the result is added to s_j, giving s_j^n = s_j + u w_j^n.]
This is in fact equivalent to having one dual vanishing moment. We will not motivate this any further; instead we go on and make the Ansatz
$$\begin{aligned}
s^n_{j,k} &= s_{j,k} + A\, w^n_{j,k} + B\, w^n_{j,k-1} \\
&= -\tfrac{B}{2}\, s_{j+1,2k-2} + B\, s_{j+1,2k-1} + \big(1 - \tfrac{A}{2} - \tfrac{B}{2}\big)\, s_{j+1,2k} + A\, s_{j+1,2k+1} - \tfrac{A}{2}\, s_{j+1,2k+2}.
\end{aligned}$$
It is not hard to show that A = B = 1/4 gives the constant mean value property. In this case, the new lowpass synthesis filter becomes $h^n_{-2} = h^n_2 = -1/8$, $h^n_{-1} = h^n_1 = 1/4$, $h^n_0 = 3/4$, all other $h^n_k = 0$. The new filters $(h^n_k)$ and $(g^n_k)$ are the analysis filters associated with the hat scaling function and the wavelet in Figure 4.7.
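Putting the predict and update steps together gives a complete analysis step (our sketch; edges are handled by simple replication/reflection, which is one choice among several):

```python
import numpy as np

def lifted_analysis(x):
    """Lazy split + linear prediction + update with A = B = 1/4,
    as described above."""
    s, w = x[::2].astype(float), x[1::2].astype(float)
    # predict: w_k^n = w_k - (s_k + s_{k+1}) / 2
    right = np.append(s[1:], s[-1])
    wn = w - 0.5 * (s + right)
    # update: s_k^n = s_k + (w_k^n + w_{k-1}^n) / 4
    left = np.insert(wn[:-1], 0, wn[0])
    sn = s + 0.25 * (wn + left)
    return sn, wn

x = np.ones(16)
sn, wn = lifted_analysis(x)
assert np.allclose(wn, 0.0)  # constant signal: no detail
assert np.allclose(sn, 1.0)  # the mean value is preserved
```

The update step is what restores the constant mean value property that the prediction step alone does not give.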
6.2 Factorizations
As we have seen in Chapter 3, to any biorthogonal wavelet basis with finite support there correspond 'polynomials', uniquely defined up to shifts and multiplication by scalars. Such a 'polynomial' may be written
$$h(z) = \sum_{k=0}^{L-1} h_k z^{-k}.$$
(A shift produces $\sum_{k=0}^{L-1} h_k z^{-k-N} = \sum_{k=-N}^{L-1-N} h_{k+N} z^{-k}$.)
As we also noted in Chapter 3, there are three other polynomials h̃, g, g̃,
such that
h(z)h̃(z −1 ) + g(z)g̃(z −1 ) = 2
h(z)h̃(−z −1 ) + g(z)g̃(−z −1 ) = 0
These are the conditions for perfect reconstruction from the analysis. We may
write them in modulation matrix notation as follows (with redundancy):
$$\begin{pmatrix} h(z^{-1}) & g(z^{-1}) \\ h(-z^{-1}) & g(-z^{-1}) \end{pmatrix} \begin{pmatrix} \tilde h(z) & \tilde h(-z) \\ \tilde g(z) & \tilde g(-z) \end{pmatrix} = 2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Splitting h into even- and odd-indexed parts, $h(z) = h_e(z^2) + z^{-1} h_o(z^2)$, and similarly for g, h̃, and g̃, we form the polyphase matrix
$$P(z) = \begin{pmatrix} h_e(z) & g_e(z) \\ h_o(z) & g_o(z) \end{pmatrix}$$
and $\widetilde P(z)$ analogously. The perfect reconstruction conditions then read
$$\widetilde P(z^{-1})^t P(z) = I,$$
where I is the identity matrix. We now shift and scale g and g̃ so that det P(z) ≡ 1.²
Note that the basis is orthogonal precisely when P = P̃ , i.e., h = h̃ and
g = g̃.
Looking at P̃ (z −1 )t P (z) = I, we note that P (i.e., h and g) determines
P̃ (i.e., h̃ and g̃) since the matrix P̃ (z −1 )t is the inverse of the matrix P (z)
and det P (z) = 1.
We will even see that, given h such that he and ho have no common
zeroes (except 0 and ∞), such a P (and thus P̃ ) can be constructed, using
the Euclidean division algorithm on the given he and ho .
The Euclidean division algorithm for integers is now demonstrated in a specific case, by which the general principle becomes obvious.
85 = 2 · 34 + 17
34 = 2 · 17 + 0
using 2 · 34 ≤ 85 < 3 · 34. Now, clearly, 17 divides both 34 and 85, and is the
greatest common divisor. We now proceed with 83 and 34 instead.
83 = 2 · 34 + 15
34 = 2 · 15 + 4
15 = 3 · 4 + 3
4=1·3+1
3=3·1+0
This means that 1 is the greatest common divisor of 83 and 34, and they are
thus relatively prime. In matrix notation this is
$$\begin{pmatrix} 83 \\ 34 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
² This is possible, since det P(z) must have length 1 if the inverse $\widetilde P(z^{-1})^t$ is to contain polynomials only. Scaling and shifts are thus made in unison.
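The integer version of the algorithm, together with the matrix bookkeeping used above, can be sketched as follows (a small illustration; the function name is ours):

```python
def euclid_steps(a, b):
    """Euclidean division algorithm, recording the quotients, so that
    (a, b)^t = Q(q1) Q(q2) ... (gcd, 0)^t with Q(q) = [[q, 1], [1, 0]]."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)
        quotients.append(q)
        a, b = b, r
    return quotients, a  # a is now the gcd

# the two worked examples from the text
qs1, g1 = euclid_steps(85, 34)
assert (qs1, g1) == ([2, 2], 17)

qs2, g2 = euclid_steps(83, 34)
assert (qs2, g2) == ([2, 2, 3, 1, 3], 1)  # 83 and 34 are relatively prime

# reconstruct (83, 34) by multiplying the quotient matrices back
v = (g2, 0)
for q in reversed(qs2):
    v = (q * v[0] + v[1], v[0])
assert v == (83, 34)
```

The reconstruction loop is exactly the matrix product above: each quotient q contributes a factor [[q, 1], [1, 0]].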
Here the algorithm stops when the remainder is 0. We may represent the
steps of the algorithm as
$$\begin{pmatrix} 1 + 4z^{-1} + z^{-2} \\ 1 + z^{-1} \end{pmatrix} = \begin{pmatrix} 1 + z^{-1} & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \tfrac{1}{2}z + \tfrac{1}{2} & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2z^{-1} \\ 0 \end{pmatrix}.$$
Note that the number of steps will be bounded by the length of the division,
here 2.
When the common factor has length 1, the two polynomials he and ho
have no zeroes in common, except possibly 0 and ∞.
Example 6.3. The Haar case may be represented by h(z) = 1 + z −1 with
he (z) = ho (z) ≡ 1. The algorithm may then be represented by
$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$
which is just one step.
We will eventually arrive at the following factorization:
$$\begin{pmatrix} h_e(z) \\ h_o(z) \end{pmatrix} = \prod_{i=1}^{k/2} \begin{pmatrix} 1 & s_i(z) \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ t_i(z) & 1 \end{pmatrix} \begin{pmatrix} C \\ 0 \end{pmatrix} \tag{6.3}$$
if we shift both $h_e$ and $h_o$ with the factor $z^M$ from the algorithm. (This means that $g_e$ and $g_o$ have to be shifted with $z^{-M}$ to preserve det P(z) = 1.)
6.3 Lifting
The factorization (6.3) gives a polyphase matrix P n (z) through
$$P^n(z) := \begin{pmatrix} h_e(z) & g^n_e(z) \\ h_o(z) & g^n_o(z) \end{pmatrix} := \prod_{i=1}^{k/2} \begin{pmatrix} 1 & s_i(z) \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ t_i(z) & 1 \end{pmatrix} \begin{pmatrix} C & 0 \\ 0 & 1/C \end{pmatrix},$$
where the last scaling 1/C is chosen to give det $P^n(z) = 1$. Here the superscript n indicates that $g^n_e$ and $g^n_o$ may not come from the same highpass filter g which we started from in P. All we did was to use the Euclidean algorithm on $h_e$ and $h_o$ without any regard to g.
Moreover, given a polyphase matrix P (z), any P n (z) with the same h(z),
i.e., identical first columns and det P n (z) = det P (z) = 1, is thus related to
P (z) through
$$P^n(z) = P(z) \begin{pmatrix} 1 & s(z) \\ 0 & 1 \end{pmatrix}$$
for some polynomial s(z). In the same way, any P n (z) with the same g(z)
as P (z) and with det P n (z) = 1 satisfies
$$P^n(z) = P(z) \begin{pmatrix} 1 & 0 \\ t(z) & 1 \end{pmatrix}$$
for some polynomial t(z). In these two cases P n is said to be obtained from
P by lifting and dual lifting, respectively.
In this terminology, we can now conclude that any polyphase matrix can
be obtained from the trivial subsampling of even and odd indexed elements
(with the trivial polyphase matrix I, i.e., h(z) = 1 and g(z) = z −1 ) by
successive lifting and dual lifting steps and a scaling.
6.4 Implementations
We will now turn to how the factorization is implemented. The polyphase
matrix P (z)t performs the analysis part of the transform. E.g., with x(z) =
xe (z 2 ) + z −1 xo (z 2 ) as before,
$$P(z)^t \begin{pmatrix} x_e(z) \\ x_o(z) \end{pmatrix} = \begin{pmatrix} h_e(z)x_e(z) + h_o(z)x_o(z) \\ g_e(z)x_e(z) + g_o(z)x_o(z) \end{pmatrix}$$
represents the even numbered entries of the outputs h(z)x(z) and g(z)x(z)
after subsampling.
Specifically, we will now discuss the Haar case with h(z) = 1 + z −1 .
Example 6.4. Using the algorithm on the Haar lowpass filter, h(z) = 1 +
z −1 , we have he (z) = ho (z) = 1, and obtain
$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
This gives
$$P^n(z) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$
If we take the Haar filters $h(z) = 1 + z^{-1}$ and $g(z) = -1/2 + \tfrac{1}{2}z^{-1}$, then we have (identical first columns)
$$P^n(z) = P(z) \begin{pmatrix} 1 & s(z) \\ 0 & 1 \end{pmatrix}.$$
Correspondingly, we get
$$\widetilde P(z^{-1}) = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1/2 & 1 \end{pmatrix}.$$
Denote the signal to be analyzed by $\{x_k\}_k$,³ and let the successive low- and highpass components of it be $\{v^{(j)}_k\}_k$ and $\{u^{(j)}_k\}_k$ after stage j = 1, 2, . . . , respectively. In our example, the analysis becomes
$$\begin{cases} v^{(1)}_k = x_{2k} \\ u^{(1)}_k = x_{2k+1} \end{cases} \qquad \begin{cases} v^{(2)}_k = v^{(1)}_k + u^{(1)}_k \\ u^{(2)}_k = u^{(1)}_k \end{cases} \qquad \begin{cases} v^{(3)}_k = v^{(2)}_k \\ u^{(3)}_k = -\tfrac{1}{2} v^{(2)}_k + u^{(2)}_k \end{cases}$$
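The three stages above can be sketched directly (our illustration):

```python
import numpy as np

def haar_lifted(x):
    """The three analysis stages: lazy split, a dual lifting step
    (v += u), then a lifting step (u -= v/2); together they give the
    unnormalized Haar filters h(z) = 1 + z^{-1}, g(z) = -1/2 + z^{-1}/2."""
    v1, u1 = x[::2].astype(float), x[1::2].astype(float)
    v2, u2 = v1 + u1, u1
    v3, u3 = v2, -0.5 * v2 + u2
    return v3, u3

x = np.array([2.0, 4.0, 6.0, 6.0])
v, u = haar_lifted(x)
assert np.allclose(v, [6.0, 12.0])  # pairwise sums
assert np.allclose(u, [1.0, 0.0])   # half differences
```

Note that each stage operates in place on the pair (v, u), which is what makes lifting attractive for memory-efficient and integer-to-integer implementations.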
Exercises 6.4
6.1. Determine the polyphase matrix P (z) for the lazy filter bank: h(z) =
h̃(z) = 1 and g(z) = g̃(z) = z −1 . (identity)
(1 + z −1 )(1 + 2z −1 ) = 1 + 3z −1 + 2z −2 and 1 + z −1 .
³ This was denoted by the letter s instead of x in Section 6.1.
6.3. Determine two distinct polyphase and dual polyphase matrices sharing
the polynomial h(z) in Example 6.2.
6.4. Consider what happens in Example 6.4 if $h(z) = 1 + z^{-1}$ is scaled to $1/\sqrt{2} + z^{-1}/\sqrt{2}$.
6.5 Notes
This chapter is based on the paper Daubechies & Sweldens [12]. For informa-
tion about the construction of integer-to-integer transforms, see the paper by
Calderbank & Daubechies & Sweldens & Yeo [6]. A practical overview can
be found in an article by Sweldens & Schröder [29]. All three papers contain many further references.
Chapter 7

The Continuous Wavelet Transform
The continuous wavelet transform is the prototype for the wavelet techniques,
and it has its place among ’reproducing kernel’ type theories.
In comparison to the discrete transform, the continuous one allows more
freedom in the choice of the analyzing wavelet. In a way, the discrete wavelet
transform in Chapter 4 is an answer to the question of when the dyadic
sampling of the continuous transform does not entail loss of information.
This chapter may be a little less elementary, but the arguments are quite
straightforward and uncomplicated.
114 CHAPTER 7. THE CONTINUOUS WAVELET TRANSFORM
Example 7.1. Consider the wavelet $\psi(t) = t e^{-t^2}$. Take $f(t) = H(t - t_0) - H(t - t_1)$, where $t_0 < t_1$ and H(t) is the Heaviside unit jump function at t = 0.
7.2. GLOBAL REGULARITY 115
Further,
$$W f(a,b) = \int_{t_0}^{t_1} a^{-1/2}\, \frac{t-b}{a}\, e^{-\left(\frac{t-b}{a}\right)^2} dt = \frac{\sqrt{a}}{2} \left( e^{-\left(\frac{t_0-b}{a}\right)^2} - e^{-\left(\frac{t_1-b}{a}\right)^2} \right).$$
Exercises 7.1
7.1. Work out what happens if the multiplication is done by the complex conjugate of $\widehat\Psi_a$ (a different function) instead of by the complex conjugate of $\widehat\psi_a$ in Equation (7.1).
7.2. Make the modifications to the definition of the continuous wavelet trans-
form and to the reconstruction formula needed if ψ is allowed to be complex-
valued.
7.5. Consider now the function $\psi(t) = t/(1 + t^2)$, and check that it is an admissible wavelet. Let now $f(t) = H(t - t_0) - H(t - t_1)$ as in Example 7.1, compute W f(a, b), and compare with the result in the example.
$(1 + |\cdot|^2)^{s/2} \hat f \in L^2$, with norm
$$\|f\|_{H^s} = \left( \int_{-\infty}^{\infty} \left| (1 + |\omega|^2)^{s/2} \hat f(\omega) \right|^2 d\omega \right)^{1/2}.$$
$$a^{-s}\, F_b W f(a, \beta) = \hat f(\beta)\, a^{-s}\, \widehat{\psi_a}(\beta) = |\beta|^s \hat f(\beta) \cdot |a\beta|^{-s}\, \widehat{\psi_a}(\beta).$$
Squaring the absolute values, integrating, and using the Parseval formula,
we obtain
$$\int_{-\infty}^{\infty}\!\int_0^{\infty} |a^{-s} W f(a,b)|^2\, \frac{da}{a^2}\, db = \int_{-\infty}^{\infty} |F^{-1}(|\cdot|^s F f)(t)|^2\, dt \cdot \int_0^{\infty} |\omega|^{-2s} |\hat\psi(\omega)|^2\, \frac{d\omega}{\omega}$$
if
$$0 < \int_0^{\infty} |\omega|^{-2s} |\hat\psi(\omega)|^2\, \frac{d\omega}{\omega} < \infty.$$
¹ The definition of H^s needs to be made more precise, but we will only note that infinitely differentiable functions f which vanish outside a bounded set are dense in H^s with its standard definition.
7.3. LOCAL REGULARITY 117
$$A\, \|f\|_{H^s}^2 \le \int_{-\infty}^{\infty}\!\int_0^{\infty} |a^{-s} W f(a,b)|^2\, \frac{da}{a^2}\, db \le B\, \|f\|_{H^s}^2.$$
Exercises 7.2
7.6. Show that positive constants C1 and C2 exist for which
C1 kf kH s ≤ kf k + kF −1{| · |s F f }k ≤ C2 kf kH s
Suppose that $|f(t) - f(t_0)| \le C\, |t - t_0|^s$ for t near $t_0$, where 0 < s < 1, say. This is usually called a local Lipschitz condition, and s is a measure of the regularity. The adjective local refers to the condition being tied to the point $t_0$; thus it is not global or uniform.
What does this local regularity condition on the function f imply for the
wavelet transform Wψ f (a, b)?
We note first that the cancellation property $\int \psi(t)\, dt = 0$ makes it possible to insert $f(t_0)$:
$$W f(a,b) = \int_{-\infty}^{\infty} \left( f(t) - f(t_0) \right) a^{-1/2}\, \psi\!\left( \frac{t-b}{a} \right) dt.$$
Thus
$$|W f(a, t_0)| \le C \int_{-\infty}^{\infty} |t - t_0|^s\, a^{-1/2} \left| \psi\!\left( \frac{t - t_0}{a} \right) \right| dt = C a^{\frac{1}{2} + s}$$
and
$$\begin{aligned}
|W f(a,b)| &\le \int_{-\infty}^{\infty} |f(t) - f(t_0)|\, a^{-1/2} \left| \psi\!\left( \frac{t-b}{a} \right) \right| dt + \int_{-\infty}^{\infty} |f(t_0) - f(b)|\, a^{-1/2} \left| \psi\!\left( \frac{t-b}{a} \right) \right| dt \\
&\le C a^{1/2} (a^s + |b - t_0|^s).
\end{aligned}$$
Thus the local Lipschitz regularity condition implies a growth condition on
the wavelet transform.
Conversely, going from a growth condition on the wavelet transform, there
is the following local regularity result. Note that a global condition on f is
also made, and that an extra logarithmic factor appears in the regularity
estimate.
If f is, say, bounded and
it follows that
holds for all t close enough to t0 . We omit the argument leading to this
result.
Exercises 7.3
7.7. Determine the exponent in the Lipschitz condition for the function
f (t) = ts for t > 0 and f (t) = 0 otherwise. Calculate also the transform,
using the Haar wavelet ψ(t) = t/ |t| for 0 < |t| < 1/2 and ψ(t) = 0 elsewhere.
7.4 Notes
Further material may be found in, e.g., the books by Holschneider [18], Ka-
hane & Lemarié [21], and Meyer [23].
The connection between differentiability properties and discrete wavelet
representations is described in Chapter 12.
Part II
Applications
Chapter 8

Wavelet Bases: Examples
So far we have only seen a few examples of wavelet bases. There is in fact a
large number of different wavelet bases, and in a practical application it is not
always an easy task to choose the right one. In this chapter, we will describe
the most frequently used wavelet bases. We will also try to give some general
advice on how to choose the wavelet basis in certain applications.
$$g_k = (-1)^k\, \tilde h_{1-k} \quad\text{and}\quad \tilde g_k = (-1)^k\, h_{1-k}.$$
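This alternating flip can be sketched numerically (our illustration; we use a shifted variant $g_k = (-1)^k h_{N-1-k}$, which agrees with the formula above up to an even shift, and the Daubechies-2 coefficients as the example filter):

```python
import numpy as np

def alternating_flip(h):
    """Shifted alternating flip g_k = (-1)^k h_{N-1-k} (N = len(h), even);
    the even shift does not affect the orthogonality relations."""
    k = np.arange(len(h))
    return ((-1) ** k) * h[::-1]

# Daubechies-2 (D4) orthonormal lowpass filter, normalized so sum h_k^2 = 1
h = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
              3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
g = alternating_flip(h)

assert np.isclose(np.dot(h, g), 0.0)           # orthogonal at shift 0
assert np.isclose(np.dot(h[2:], g[:-2]), 0.0)  # and at even shifts
assert np.isclose(g.sum(), 0.0)                # one vanishing moment
```

The orthogonality at even shifts holds for any even-length filter h, which is exactly why the alternating flip is the standard way to obtain the highpass filter in the orthogonal case.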
124 CHAPTER 8. WAVELET BASES: EXAMPLES
coefficients where the signal is smooth. The vanishing moments thus produce
the compression ability, and are also connected to the smoothness of the syn-
thesis wavelet. We usually want the synthesis wavelets to have some smooth-
ness, which is particularly important in compression applications. This is so
because compression is achieved by, roughly speaking, leaving out terms in
the sum
$$f(t) = \sum_{j,k} w_{j,k}\, \psi_{j,k}(t),$$
Consider the term (1−y)N O(y N )+y N O((1−y)N ). This must be a polynomial
of degree at most 2N − 1. On the other hand, it has zeros of order N at both
y = 0 and y = 1. Therefore, it must be identically 0, and so PN is a solution
to (8.2). It is possible to show that every other solution must be of the form
Symmlets
A way to make orthogonal wavelets less asymmetric is to do the spectral fac-
torization without requiring the minimum-phase property. Instead of always
choosing zeros inside the unit circle, we may choose zeros to make the phase
as nearly linear as possible. The corresponding family of orthogonal wavelets
is usually referred to as least asymmetric wavelets or Symmlets. They are
clearly more symmetric than Daubechies wavelets, as can be seen in Figure
8.3. The price we have to pay is that the Symmlet with N vanishing moments is less regular than the corresponding Daubechies wavelet.
8.2. ORTHOGONAL BASES 127
[Figures: scaling functions ϕ (left) and wavelets ψ (right) for the Daubechies wavelets and the Symmlets; plot data removed.]
Coiflets
Another family of orthogonal wavelets was designed to give also the scaling
functions vanishing moments,
$$\int_{-\infty}^{\infty} t^n \varphi(t)\, dt = 0, \quad n = 1, \ldots, N-1.$$
This can be a useful property since, using a Taylor expansion, $\langle f, \varphi_{J,k} \rangle = 2^{-J/2} f(2^{-J} k) + O(2^{-Jp})$ in regions where f has p continuous derivatives. For
smooth signals, the fine-scale scaling coefficients can thus be well approxi-
mated by the sample values, and pre-filtering may be dispensed with. These
wavelets are referred to as Coiflets. They correspond to a special choice of
the polynomial R in (8.3). Their support width is 3N −1 and the filter length
is 3N. In Figure 8.4, we have plotted the first four Coiflets together with
their scaling functions. We see that the Coiflets are even more symmetric
than the Symmlets. This is, of course, obtained at the price of the increased filter length.
Figure 8.4: The first four Coiflets with scaling functions for 2, 4, 6, and 8
vanishing moments.
132 CHAPTER 8. WAVELET BASES: EXAMPLES
Exercises 8.2
8.1. Show that |Q(ω)|2 is a polynomial in cos ω. Hint: It is a trigonometric
polynomial,
$$|Q(\omega)|^2 = \sum_{k=-K}^{K} c_k e^{ik\omega} = \sum_{k=-K}^{K} c_k \cos k\omega + i \sum_{k=-K}^{K} c_k \sin k\omega.$$
is more easily detected by the human eye, if the synthesis wavelets ψj,k are
asymmetric. Also, symmetry is equivalent to filters having linear phase, cf.
Section 2.4.
In general, we have more flexibility in the biorthogonal case, since we
have two filters to design instead of one. Therefore, a general guideline could
be to always use biorthogonal bases, unless orthogonality is important for
the application at hand.
The filter design is also easier in the biorthogonal case. There are several
different ways to design the filters. One method is to choose an arbitrary synthesis lowpass filter H(z). For the analysis lowpass filter, we have to solve for $\widetilde H(z)$ in
$$H(z)\widetilde H(z^{-1}) + H(-z)\widetilde H(-z^{-1}) = 1. \tag{8.4}$$
It is possible to show that solutions exist if, e.g., H(z) is symmetric and H(z) and H(−z) have no common zeros. These solutions can be found by solving linear systems of equations for the coefficients of $\widetilde H(z)$.
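As a small numerical sanity check (our illustration, not part of the text), the Haar pair $H = \widetilde H = (1 + z^{-1})/2$ is one solution of (8.4):

```python
import numpy as np

def Hz(coeffs, z):
    """Evaluate H(z) = sum_k h_k z^{-k} for a filter indexed from 0."""
    return sum(c * z ** (-k) for k, c in enumerate(coeffs))

# Haar lowpass pair H = H~ = (1 + z^{-1})/2
h = [0.5, 0.5]
ht = [0.5, 0.5]

# check H(z) H~(z^{-1}) + H(-z) H~(-z^{-1}) = 1 on the unit circle
for omega in np.linspace(0.1, 3.0, 7):
    z = np.exp(1j * omega)
    lhs = Hz(h, z) * Hz(ht, 1 / z) + Hz(h, -z) * Hz(ht, -1 / z)
    assert np.isclose(lhs, 1.0)
```

For a longer symmetric H(z), the same identity, expanded in powers of z, yields the linear system for the unknown coefficients of $\widetilde H(z)$.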
Another method is to apply spectral factorization to different product
filters. If we want to design symmetric filters, both Q and $\widetilde Q$ must be symmetric. A detailed analysis shows that $N + \widetilde N$ must be even, and that we can write
$$Q(\omega) = q(\cos\omega) \quad\text{and}\quad \widetilde Q(\omega) = \tilde q(\cos\omega)$$
for some polynomials q and $\tilde q$. This is when Q and $\widetilde Q$ both have odd length.
They can also both have even length, and then a factor $e^{-i\omega/2}$ must be included. In any case, if we define $L = (N + \widetilde N)/2$ and the polynomial P through
$$P\!\left( \sin^2 \frac{\omega}{2} \right) = q(\cos\omega)\, \tilde q(\cos\omega),$$
the equation (8.4) transforms into
$$(1 - y)^L P(y) + y^L P(1 - y) = 2.$$
The solutions to this equation are known from the previous section. After choosing a solution, that is, choosing R in (8.3), Q and $\widetilde Q$ are computed using spectral factorization.
A third method to construct biorthogonal filters is to use lifting, as described in Chapter 6.
Figure 8.5: Biorthogonal spline wavelets with N = 1 and Ñ = 1, 2, and 3.
Figure 8.6: Biorthogonal spline wavelets with N = 2 and Ñ = 2, 4, 6, and 8.
8.3. BIORTHOGONAL BASES 137
Figure 8.7: Biorthogonal spline wavelets with N = 3 and Ñ = 3, 5, 7, and 9.
To make the filters have more similar lengths, we again invoke the spectral factorization method to factorize $P_L(z)$ into Q(z) and $\widetilde Q(z)$. To maintain symmetry, we always put zeros $z_i$ and $z_i^{-1}$ together. Also, $z_i$ and $\overline{z_i}$ must always come together in order to have real-valued filter coefficients. This
limits the number of possible factorizations, but, for fixed N and Ñ, there are still several possibilities. A disadvantage with these wavelets is that the filter coefficients are no longer dyadic rational, or even rational.
We have plotted scaling functions and wavelets for some of the most common filters in Figures 8.8–8.9.
The first filter pair is the biorthogonal filters used in the FBI finger-
print standard. They are constructed using a particular factorization of the
Daubechies-8 product filter. Both the primal and the dual wavelet have 4
vanishing moments. The filters have lengths 7 and 9, and this is the reason
for the notation 9/7 (the first number being the length of the dual lowpass
filter).
The second 6/10 pair is obtained by moving one vanishing moment (a
zero at z = −1) from the primal to the dual wavelet, and also interchanging
some other zeros. The primal scaling function and wavelet become somewhat
smoother. This filter pair is also very good for image compression.
The two last 9/11 pairs are based on two different factorizations of the
Daubechies-10 product filter.
Figure 8.8: The 6/10 and the FBI 9/7 biorthogonal filter pair.
Figure 8.9: Scaling functions and wavelets for two 9/11 biorthogonal filter pairs.
8.4. WAVELETS WITHOUT COMPACT SUPPORT 141
anyway, both for historical reasons and because of their interesting theoretical properties.
Meyer Wavelets
The Meyer wavelets are explicitly described: the Fourier transform of the
wavelet is defined as
$$\hat\psi(\omega) = \begin{cases}
(2\pi)^{-1/2}\, e^{-i\omega/2} \sin\!\left( \frac{\pi}{2}\, \nu\!\left( \frac{3}{2\pi} |\omega| - 1 \right) \right) & \text{if } \frac{2\pi}{3} < |\omega| \le \frac{4\pi}{3}, \\[4pt]
(2\pi)^{-1/2}\, e^{-i\omega/2} \cos\!\left( \frac{\pi}{2}\, \nu\!\left( \frac{3}{4\pi} |\omega| - 1 \right) \right) & \text{if } \frac{4\pi}{3} < |\omega| \le \frac{8\pi}{3}, \\[4pt]
0 & \text{otherwise}.
\end{cases}$$
Battle-Lemarié Wavelets
The construction of the Battle-Lemarié orthogonal family starts from the
B-spline scaling functions. These are not orthogonal to their integer trans-
lates, since they are positive and overlapping. Remember the orthogonality
condition for the scaling function from Section 4.4.
$$\sum_l |\hat\varphi(\omega + 2\pi l)|^2 = 1. \tag{8.5}$$
[Figure 8.10: the Battle-Lemarié scaling function and wavelet corresponding to piecewise linear splines.]
$$A \le \sum_l |\hat\varphi(\omega + 2\pi l)|^2 \le B$$
for all ω, where A and B are some positive constants. This is actually the
condition on {ϕ(t − k)} to be a Riesz basis, transformed to the Fourier
domain. Now it is possible to define the Battle-Lemarié scaling function
$$\hat\varphi^{\#}(\omega) = \frac{\hat\varphi(\omega)}{\sqrt{\sum_l |\hat\varphi(\omega + 2\pi l)|^2}}.$$
This scaling function satisfies (8.5), and thus it generates an orthogonal mul-
tiresolution analysis. The wavelet is defined by the alternating flip construc-
tion.
One can show that this ’orthogonalized’ spline scaling function spans the
same Vj spaces as the original spline scaling function. However, it will no
longer have compact support, and this also holds for the wavelet. They both
have exponential decay, though, and are symmetric. In Figure 8.10, we have plotted the Battle-Lemarié scaling function and wavelet corresponding to piecewise linear splines. Note that they are both piecewise linear, and that the decay is indeed very fast.
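The orthogonalization can be sketched numerically for the linear spline (our illustration; the closed form $(2 + \cos\omega)/3$ for the periodized sum is a known identity for the hat function):

```python
import numpy as np

def hat_ft(w):
    """Fourier transform of the centred hat (linear B-spline):
    (sin(w/2) / (w/2))^2.  Note np.sinc(t) = sin(pi t)/(pi t)."""
    return np.sinc(w / (2 * np.pi)) ** 2

def periodization(w, L=200):
    """sum_l |phi_hat(w + 2 pi l)|^2, truncated at |l| <= L."""
    ls = np.arange(-L, L + 1)
    return np.sum(np.abs(hat_ft(w + 2 * np.pi * ls)) ** 2)

w = 1.3
# for the linear spline the sum has the closed form (2 + cos w)/3
assert np.isclose(periodization(w), (2 + np.cos(w)) / 3, atol=1e-6)

# the Battle-Lemarié orthonormalization in the frequency domain
phi_sharp = hat_ft(w) / np.sqrt(periodization(w))
assert phi_sharp > 0
```

Since the periodized sum is bounded away from 0 and ∞ (the Riesz condition above), the division is well defined for every ω.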
The semiorthogonal spline wavelets of Chui & Wang use B-splines of order
N as scaling functions. The wavelets become spline functions with support
width [0, 2N − 1]. The dual scaling functions and wavelets are also splines
but without compact support. They decay very fast, though. All scaling
functions and wavelets are symmetric, and all filter coefficients are rational.
We also have analytic expressions for all scaling functions and wavelets.
8.5 Notes
One of the first families of wavelets to appear was the Meyer wavelets, which
were constructed by Yves Meyer in 1985. This was before the connection
between wavelets and filter banks was discovered, and thus the notion of
multiresolution analysis was not yet available. Instead, Meyer defined his
wavelets through explicit constructions in the Fourier domain.
Ingrid Daubechies constructed her family of orthogonal wavelets in 1988.
It was the first construction within the framework of MRA and filter banks.
Shortly thereafter followed the construction of Symmlets and Coiflets (which were asked for by Ronald Coifman in 1989, hence the name), and later also various biorthogonal bases.
For more details about the construction of different wavelet bases, we refer to Daubechies' book [11], where tables of filter coefficients can also be found. Filter coefficients are also available in WaveLab.
Chapter 9
Adaptive Bases
both are close to the total energy of the signal. (Cf. the inequality 1.1
in Chapter 1, which puts a limit to simultaneous localization in time and
in frequency.) By scaling, the basis functions ψj,k (t) then have their en-
ergy essentially concentrated to the time interval (2−j k, 2−j (k + 1)) and
the frequency interval (2j π, 2j+1π). We associate ψj,k with the rectangle
(2−j k, 2−j (k + 1)) × (2j π, 2j+1 π) in the time-frequency plane (see Figure 9.1).
These rectangles are sometimes referred to as Heisenberg boxes.
[Figure 9.1: the Heisenberg box of ψ_{j,k}: the rectangle (2^{−j}k, 2^{−j}(k+1)) × (2^j π, 2^{j+1}π) in the time-frequency plane.]
The Heisenberg boxes for the wavelets ψ_{j,k} give a tiling of the time-frequency plane as shown in Figure 9.2. The lowest rectangle corresponds to the scaling function ϕ_{j₀,0} at the coarsest scale. If we have an orthonormal wavelet basis,
Figure 9.3: Time-frequency plane for the basis associated with sampling and
the Fourier basis.
Figure 9.4: Tiling of the time-frequency plane for the windowed Fourier transform with two window sizes.
How to choose a proper window size is the major problem with the windowed
Fourier transform. If the window size is T , each segment will be analyzed
at frequencies, which are integer multiples of 2π/T . Choosing a narrow win-
dow will give us good time resolution, but bad frequency resolution, and vice
versa.
One solution to this problem is to adapt the window size to the frequency: narrow windows at high frequencies and wider windows at low frequencies. This is essentially what the wavelet transform does, even though no explicit window is involved. At high frequencies, the wavelets are well localized in time, and the Heisenberg boxes are narrow and tall. At lower frequencies, the wavelets are more spread out, and we get boxes with large width and small height. This is useful, since many high-frequency phenomena are short-lived, e.g., edges and transients, while lower-frequency components usually have a longer duration in time.
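This behaviour can be seen already in the discrete Haar transform from Chapter 1. The sketch below (plain Python, not from the text; the function names are ours) runs the averaging/differencing steps and shows that a short transient excites only the fine-scale detail coefficients near its time location:

```python
import math

def haar_step(s):
    """One orthonormal Haar analysis step: averages (low-pass) and
    differences (high-pass), each half the length of the input."""
    r2 = math.sqrt(2.0)
    lo = [(s[2*i] + s[2*i+1]) / r2 for i in range(len(s) // 2)]
    hi = [(s[2*i] - s[2*i+1]) / r2 for i in range(len(s) // 2)]
    return lo, hi

def haar_dwt(s, levels):
    """Wavelet transform: recurse on the low-pass branch only,
    matching the dyadic tiling of the time-frequency plane."""
    details = []
    for _ in range(levels):
        s, hi = haar_step(s)
        details.append(hi)
    return s, details

# A signal that is zero except for a short transient at samples 12-13.
x = [0.0] * 16
x[12], x[13] = 1.0, -1.0
approx, details = haar_dwt(x, 2)
# The transient shows up only in the finest-scale details, and only
# near its time location (index 6 of 8):
print(details[0])
```
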
Exercises 9.1
9.1. Sketch for yourself how the decomposition of the time-frequency plane
changes as we go through the filtering steps in the Forward Wavelet Trans-
form, starting at the finest scaling coefficients.
9.2 Wavelet Packets
[Diagram: the filter bank tree. s_J is split into s_{J−1} and w_{J−1}, and the splitting is repeated down to s_{J−3} and w_{J−3}; in the wavelet packet transform both the low-pass and the high-pass outputs are split at every level.]
[Figure 9.6: The ideal frequency bands, between 0 and π, for the wavelet packet spaces. The spaces form a full binary tree: V_0 splits into W_{−1}^{(0)} and W_{−1}^{(1)}; these split into W_{−2}^{(0,0)}, W_{−2}^{(0,1)}, W_{−2}^{(1,0)}, W_{−2}^{(1,1)}; and so on, down to the eight spaces W_{−3}^{(0,0,0)}, . . . , W_{−3}^{(1,1,1)}.]
In Figure 9.7 we show the tiling of the ideal time-frequency plane for two particular wavelet packet bases, together with the corresponding subtrees of the wavelet packet tree. Note that with the first basis (the left figure) we get two high-frequency basis functions with long duration in time.
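In code, the wavelet packet tree differs from the wavelet transform only in that both filter outputs are split again at every level. A small sketch with the Haar filters (an illustration, not the book's implementation):

```python
import math

def haar_step(s):
    # One orthonormal Haar analysis step: low-pass and high-pass halves.
    r2 = math.sqrt(2.0)
    lo = [(s[2*i] + s[2*i+1]) / r2 for i in range(len(s) // 2)]
    hi = [(s[2*i] - s[2*i+1]) / r2 for i in range(len(s) // 2)]
    return lo, hi

def packet_tree(s, levels):
    """Full wavelet packet tree: unlike the wavelet transform, BOTH the
    low-pass and the high-pass outputs are split again at every level.
    Returns a list of levels; level j holds 2**j coefficient blocks."""
    tree = [[s]]
    for _ in range(levels):
        nxt = []
        for block in tree[-1]:
            lo, hi = haar_step(block)
            nxt.extend([lo, hi])
        tree.append(nxt)
    return tree

x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
tree = packet_tree(x, 3)
print([len(level) for level in tree])   # 1, 2, 4, 8 blocks per level
```

Since the Haar filters are orthogonal, the energy of the signal is the same on every level of the tree, which is what makes cost comparisons between levels meaningful later on.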
Figure 9.7: The tiling of the ideal time-frequency plane for two different wavelet packet bases, with the corresponding subtrees of the wavelet packet tree.
Figure 9.8: The ideal decomposition of the frequency plane for separable two-dimensional wavelet packet bases. [Two panels in the (ξ, η)-plane, 0 to π on each axis, showing the spaces W_{−1}^{(e)} and W_{−2}^{(e,f)}.]
Exercises 9.2

9.2. Sketch for yourself, step by step, how the ideal decomposition of the time-frequency plane changes when going from the representation in the V_J space to the wavelet packet representations in Figure 9.7.

9.3 Entropy and Best Basis Selection
The entropy of a coefficient sequence c is defined as

    H(c) = − Σ_k p_k log p_k,

where

    p_k = |c_k|²/‖c‖²   and   0 log 0 := 0.
A well-known fact about the entropy measure, which is outlined in Exercise 9.4, is the following:

Theorem 9.2. If c is a finite sequence, of length K say, then

    0 ≤ H(c) ≤ log K.

Further, the minimum value is attained only when all but one c_k are 0, and the maximum is attained only when all |c_k| equal 1/√K.
Of course, the conclusion also holds for sequences c with at most K non-zero coefficients. The number d(c) = e^{H(c)} is known in information theory as the theoretical dimension of c. It can be proved that the number of coefficients needed to approximate c with error less than ε is proportional to d(c)/ε.
However, we cannot use the entropy measure directly as a cost function, since it is not additive. But if we define the additive cost function

    Λ(c) := − Σ_k |c_k|² log |c_k|²,
we have that

(9.2)    H(c) = 2 log ‖c‖ + (1/‖c‖²) Λ(c).
Thus we see that minimizing Λ is equivalent to minimizing H for coefficient
sequences with fixed norm. Therefore, if we want to use the best basis
algorithm to minimize the entropy, we must ensure that the norm of sJ
equals the norm of [sJ−1 wJ−1 ]. In other words, we must use orthogonal
filters.
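The identity (9.2) and the bounds of Theorem 9.2 are easy to check numerically. A small sketch (plain Python; the helper names are ours):

```python
import math

def entropy(c):
    """H(c) = -sum p_k log p_k with p_k = |c_k|^2 / ||c||^2,
    using the convention 0 log 0 = 0."""
    norm2 = sum(abs(x) ** 2 for x in c)
    h = 0.0
    for x in c:
        p = abs(x) ** 2 / norm2
        if p > 0.0:
            h -= p * math.log(p)
    return h

def cost(c):
    """The additive cost Lambda(c) = -sum |c_k|^2 log |c_k|^2."""
    return -sum(abs(x) ** 2 * math.log(abs(x) ** 2) for x in c if x != 0.0)

c = [3.0, 1.0, -2.0, 0.5]
norm = math.sqrt(sum(x * x for x in c))
# Identity (9.2):  H(c) = 2 log||c|| + Lambda(c)/||c||^2
lhs = entropy(c)
rhs = 2 * math.log(norm) + cost(c) / norm ** 2
print(abs(lhs - rhs))   # ~0, up to rounding

# Theorem 9.2: the extremes are the one-spike and all-equal sequences.
K = 8
print(entropy([1.0] + [0.0] * (K - 1)))   # minimum: 0
print(entropy([1.0] * K), math.log(K))    # maximum: log K
```
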
For coefficient sequences with fixed ℓ2 (Z)-norm, Λ is also minimized when all
coefficients are 0 except one and maximized when all |ck | are equal. To get
relevant comparisons between sJ and [sJ−1 wJ−1 ], we again need the filters
to be orthogonal.
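Given an additive cost function, the best basis is found by the usual bottom-up pruning of the wavelet packet tree: a node is kept whenever it is cheaper than the best decomposition of its two children. A toy sketch with the orthogonal Haar filters and the cost Λ (function names are ours, not the book's code):

```python
import math

def cost(c):
    # Additive entropy-type cost Lambda(c) = -sum |c_k|^2 log |c_k|^2.
    return -sum(x * x * math.log(x * x) for x in c if x != 0.0)

def haar_step(s):
    r2 = math.sqrt(2.0)
    lo = [(s[2*i] + s[2*i+1]) / r2 for i in range(len(s) // 2)]
    hi = [(s[2*i] - s[2*i+1]) / r2 for i in range(len(s) // 2)]
    return lo, hi

def best_basis(s, depth):
    """Bottom-up best-basis search: keep a node if its own cost is at
    most the best total cost of its children, otherwise split it.
    Returns (total cost, list of coefficient blocks)."""
    if depth == 0 or len(s) < 2:
        return cost(s), [s]
    lo, hi = haar_step(s)
    c_lo, b_lo = best_basis(lo, depth - 1)
    c_hi, b_hi = best_basis(hi, depth - 1)
    if cost(s) <= c_lo + c_hi:
        return cost(s), [s]
    return c_lo + c_hi, b_lo + b_hi

# For a constant signal the search splits all the way down: the energy
# is concentrated into the single coefficient 2*sqrt(2).
total, leaves = best_basis([1.0] * 8, 3)
print(len(leaves), total)
```
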
Finally, we mention two application-dependent cost functions. When
using wavelet packets for reducing noise in signals, one would like to choose
the basis that gives the smallest error between the denoised signal and the
real signal. As a cost function one can use an estimate of this prediction
error. One example of this, the SURE cost function, will be discussed in
Section 10.2. In classification applications, cost functions that measure the
capability of separating classes are used. They will be discussed in more
detail in Chapter 14.
Exercises 9.3
9.3. Verify that the entropy measure is not an additive cost function, but
that Λ is. Then verify the identity (9.2).
9.4. Prove Theorem 9.2. Hint: Use that the exponential function is strictly convex, which means that

    exp(Σ_k λ_k x_k) ≤ Σ_k λ_k e^{x_k}

for λ_k ≥ 0 and Σ_k λ_k = 1, where equality occurs only when all x_k are equal. Apply this with λ_k = p_k and x_k = log(1/p_k) to obtain e^{H(c)} ≤ K. The left inequality should be obvious, since 0 ≤ p_k ≤ 1 implies log p_k ≤ 0. The only way H(c) = 0 could happen is if all p_k are either 0 or 1. But since they sum up to 1, this implies that one p_k equals 1 and the remaining are 0.
where the maximum value is attained when all |c_k| are equal, and the minimum is attained when all c_k's but one are 0. Hint: For the right-hand inequality, write

    Σ_{k=1}^{K} |c_k| = Σ_{k=1}^{K} 1 · |c_k|

and use the Cauchy–Schwarz inequality. For the left-hand inequality, you may assume that ‖c‖ = 1 (why?). Show that this implies |c_k| ≤ 1 for each k. Then |c_k|² ≤ |c_k|, and the inequality follows.
These somewhat complicated conditions are needed to ensure that the local
trigonometric basis functions, which we will define soon, give an ON basis of
L2 (R).
The local trigonometric basis functions for the partitioning I_k are constructed by filling the windows w_k(t) with cosine oscillations at frequencies π(n + ½):

    b_{n,k}(t) = √2 w_k(t) cos(π(n + ½)(t − k)).

It can be shown that the functions b_{n,k} constitute an orthonormal basis for L²(R).
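The orthonormality can be checked numerically for a concrete window. The sketch below uses the standard sine-based rising cutoff (one choice satisfying conditions of the type (9.3); the text does not fix a particular window) and approximates the inner products over the support of the bell on I_0 = [0, 1] by the midpoint rule:

```python
import math

def rho(t):
    # Rising cutoff with rho(t)^2 + rho(-t)^2 = 1 on [-1/2, 1/2].
    if t <= -0.5:
        return 0.0
    if t >= 0.5:
        return 1.0
    return math.sin(math.pi * (t + 0.5) / 2.0)

def window(t):
    # Bell over I_0 = [0, 1], supported on [-1/2, 3/2].
    return rho(t) * rho(1.0 - t)

def b(n, t):
    # b_{n,0}(t) = sqrt(2) w_0(t) cos(pi (n + 1/2) t)
    return math.sqrt(2.0) * window(t) * math.cos(math.pi * (n + 0.5) * t)

def inner(f, g, a=-0.5, c=1.5, m=20000):
    # Midpoint-rule approximation of the L2 inner product on [a, c].
    h = (c - a) / m
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h)
                   for i in range(m))

print(inner(lambda t: b(0, t), lambda t: b(0, t)))  # approximately 1
print(inner(lambda t: b(0, t), lambda t: b(1, t)))  # approximately 0
```
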
When computing the coefficients c_{n,k}, after the folding, we can use a fast cosine transform.

Figure 9.11: The windowed function f(t)w(t) (solid line) and its folded version (dashed line).
Adaptive Segmentation

As mentioned earlier, the construction of the basis functions b_{n,k} works for an arbitrary partition of the real line. We seek an optimal partition, using, for instance, one of the cost functions described above. This can be done by adaptively merging intervals, starting from some initial partition. Let us consider merging the intervals I_0 = [0, 1] and I_1 = [1, 2] to Ĩ_0 = [0, 2]. A window function for Ĩ_0 is given by

    w̃_0(t) = √(w_0(t)² + w_1(t)²),

which can be shown to satisfy the conditions (9.3a)–(9.3e), slightly modified. The basis functions associated with Ĩ_0 are defined by

    b̃_{n,0}(t) = (1/√2) w̃_0(t) cos(π(n + ½) t/2).
Exercises 9.4

9.6. Show that the folded function f̃ is given by

    f̃(k + t) = { w_k(k − t)f(k + t) − w_k(k + t)f(k − t),  for −½ ≤ t ≤ 0,
               { w_k(k + t)f(k + t) + w_k(k − t)f(k − t),  for 0 ≤ t ≤ ½.
Also show that f can be recovered from its folded version through

    f(k + t) = { w_k(k + t)f̃(k − t) + w_k(k − t)f̃(k + t),  for −½ ≤ t ≤ 0,
              { w_k(k + t)f̃(k + t) − w_k(k − t)f̃(k − t),  for 0 ≤ t ≤ ½.
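The folding and unfolding formulas of the exercise are easy to verify numerically around a single point k = 0, for a window with w(t)² + w(−t)² = 1 (our choice of window below; any rising cutoff with this property works):

```python
import math

def w(t):
    """A window with w(t)^2 + w(-t)^2 = 1 on [-1/2, 1/2] (rising cutoff)."""
    if t <= -0.5:
        return 0.0
    if t >= 0.5:
        return 1.0
    return math.sin(math.pi * (t + 0.5) / 2.0)

def fold(f, t):
    # Folding around k = 0, following the exercise's formulas.
    if t <= 0.0:
        return w(-t) * f(t) - w(t) * f(-t)
    return w(t) * f(t) + w(-t) * f(-t)

def unfold(ft, t):
    # The inverse operation: recovers f from its folded version ft.
    if t <= 0.0:
        return w(t) * ft(-t) + w(-t) * ft(t)
    return w(t) * ft(t) - w(-t) * ft(-t)

f = lambda t: 1.0 + t + t * t          # any smooth test function
ft = lambda t: fold(f, t)
# Round trip: unfolding the folded function gives f back on (-1/2, 1/2).
errs = [abs(unfold(ft, t) - f(t)) for t in (-0.4, -0.1, 0.2, 0.45)]
print(max(errs))   # essentially 0
```

The determinant of the 2×2 system relating (f(k+t), f(k−t)) to (f̃(k+t), f̃(k−t)) is w(k+t)² + w(k−t)² = 1, which is exactly why the window condition makes the folding invertible.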
9.5 Notes
For further reading about wavelet packets and local trigonometric bases, and
time-frequency decompositions in general, we refer to the paper Lectures on
Wavelet Packet Algorithms [31] by Wickerhauser.
Wavelet packets can also be defined in the non-separable case. Wavelet
packets for hexagonal wavelets give rise to some fascinating frequency plane
decompositions, which can be found in the paper by Cohen and Schlenker
[10].
Chapter 10

Compression and Noise Reduction

10.1 Image Compression

[Diagram: the transform coding pipeline: f → Image transform → Quantization → Entropy coding.]
Image Transform
Most images have spatial correlation, that is, neighbouring pixels tend to have similar grey-scale values. The purpose of the image transform is to exploit this redundancy to make compression possible. Remember the two-dimensional Haar transform, where groups of four pixel values are replaced by their mean value and three wavelet coefficients, or 'differences'. If those four pixel values are similar, the corresponding wavelet coefficients will be essentially 0. The averaging and differencing are repeated recursively on the mean values, to capture large-scale redundancy. For smooth regions of the image, most wavelet coefficients will be almost 0. Fine-scale wavelet coefficients are needed around edges and in areas with rapid variation. A few large-scale coefficients will take care of slow variations in the image. For images without too much variation, a few wavelet coefficients contain the relevant information in the image. For images with texture, such as the fingerprint images, a wavelet packet transform might be more appropriate.
Most of the compression is achieved in the first filtering steps of the
wavelet transform. Therefore, in wavelet image compression, the filter bank
is usually iterated only a few times, say, 4 or 5. A smoother wavelet than the
Haar wavelet is used, since compression with Haar wavelets leads to blocking
artifacts; rectangular patterns appear in the reconstructed image. The choice
of an optimal wavelet basis is an open problem, since there are many aspects
to take into account.
First, we want synthesis scaling functions and wavelets to be smooth.
At the same time, smoothness increases the filter length, and thus also the
support width of scaling functions and wavelets. Too long synthesis filters
will give rise to ringing artifacts around edges. Further, we want all filters
to be symmetric.
Another problem associated with wavelet image compression is border artifacts. The wavelet transform assumes that f(x, y) is defined in the whole plane, and therefore the image needs to be extended outside the borders. Three extensions are used in practice: zero-padding, periodic extension, and symmetric extension. Zero-padding defines the image to be zero outside the borders. After compression, this will have a 'darkening' influence on the image near the border. Periodic extension assumes that the image extends periodically outside the borders. Unless the grey-scale values at the left border match those at the right border etc., periodic extension will induce discontinuities at the borders, which again lead to compression artifacts. Generally, the best method is symmetric extension, which gives a continuous extension at the borders, and no compression artifacts appear. Symmetric extension requires symmetric filters. An alternative is to use so-called boundary corrected wavelets. We will discuss the boundary problem further in Chapter 15.
Another possible image transform is the classical Fourier transform. If the signal is smooth, the Fourier coefficients decay very fast towards high frequencies, and the image can be represented using a fairly small number of low-frequency coefficients. However, the presence of a single edge will cause the Fourier coefficients to decay very slowly, and compression is no longer possible. A way to get around this is to use a windowed Fourier transform instead. This is basically the JPEG algorithm, where the image is divided into blocks of 8 × 8 pixels, and a cosine transform is applied to each block. For blocks containing no edges, high-frequency coefficients will be almost zero.

For high compression ratios, blocking artifacts appear in the JPEG algorithm, that is, the 8 × 8 blocks become visible in the reconstructed image. With properly chosen wavelets, wavelet image compression works better. For moderate compression ratios, e.g. 1:10, the performance of JPEG is comparable to wavelets.
[Figure: A symmetric quantizer with dead zone. Inputs with |x| < d_1 are mapped to 0, inputs with d_1 < |x| < d_2 to ±r_1, and inputs beyond d_2 to ±r_2.]
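A quantizer of this type maps a whole interval of transform values to a single reconstruction level, with a dead zone around zero that sends small coefficients to exactly 0. A minimal sketch (the thresholds d_i and levels r_i are free parameters chosen here for illustration):

```python
def deadzone_quantize(x, decisions, levels):
    """Symmetric quantizer: |x| below decisions[0] maps to 0 (the dead
    zone); |x| between decisions[i-1] and decisions[i] maps to
    sign(x) * levels[i-1].  decisions = [d1, d2, ...], levels = [r1, r2, ...]."""
    mag = abs(x)
    sign = 1.0 if x >= 0 else -1.0
    out = 0.0
    for d, r in zip(decisions, levels):
        if mag > d:
            out = sign * r
        else:
            break
    return out

d = [0.5, 1.5]   # decision thresholds d1, d2
r = [1.0, 2.0]   # reconstruction levels r1, r2
print([deadzone_quantize(x, d, r) for x in (-2.0, -0.7, 0.2, 0.9, 1.8)])
# -> [-2.0, -1.0, 0.0, 1.0, 2.0]
```

The dead zone is what turns the many near-zero wavelet coefficients of a smooth region into exact zeros, which the entropy coder can then represent very cheaply.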
Video Compression

A video signal is a sequence of images f_i(x, y). Each second contains approximately 30 images, so the amount of information is huge. In order to transfer video signals over the internet or telephone lines (video conferencing), massive compression is necessary. The simplest approach to video compression is to compress each image f_i separately. However, this method does not exploit temporal correlation in the video signal: adjacent frames tend to be very similar. One way to do so is to treat the video signal as a 3D signal f(x, y, t) and apply a three-dimensional wavelet transform.
Another method is to compute the difference images
∆fi = fi+1 − fi .
Together with an initial image f0 these difference images contain the informa-
tion necessary to reconstruct the video signal. The difference images contain
the changes between adjacent frames. For parts of the images without move-
ment, the difference images will be zero, and thus we have already achieved
significant compression. Further compression is obtained by exploiting spa-
tial redundancy in the difference images, and applying a two-dimensional
wavelet transform W to all ∆fi and to f0 . The transformed images W ∆fi
and W f0 are quantized and encoded, and then transmitted/stored.
To reconstruct the video signal, inverse wavelet transforms are computed to get back approximations Δf̂_i and f̂_0. From there, we can recover the video signal approximately by

    f̂_{i+1} = f̂_i + Δf̂_i.
The sparse structure of Δf_i can be used to speed up the inverse wavelet transforms. For wavelets with compact support, each scaling and wavelet coefficient only affects a small region of the image. Thus we only need to compute the pixels in Δf_i corresponding to non-zero scaling and wavelet coefficients.
The difference scheme just described is a special case of motion estimation, where we try to predict a frame f_i from M previous frames and apply the wavelet transform to the prediction errors Δf_i = f̂_i − f_i. The predictor P tries to discover motions in the video to make an accurate guess about the next frame. The wavelet transform is well suited for this, since it contains local information about the images.
10.2 Denoising
Suppose that a signal f(t) is sampled on the unit interval [0, 1] at the points t_k = 2^{−J}k, k = 1, . . . , K = 2^J. Denote the exact sample values by f_k = f(t_k). Assume that we only have noisy measurements of f_k, i.e., we have data y_k = f_k + σz_k. Here, z_k is assumed to be Gaussian white noise, i.e., independent normally distributed random variables with mean 0 and variance 1. The parameter σ is the noise level, which is generally unknown and has to be estimated from the data.

We want to recover f from the noisy data. Applying an orthogonal discrete wavelet transform W yields

    W y = W f + σW z,

or, coefficient by coefficient,

    γ_{j,k} = w_{j,k} + σ z̃_{j,k},
wavelet coefficients γj,k that mostly contain noise, and extract the test signal
by keeping large coefficients. Note the similarity with the crude wavelet
image compression algorithm.
Figure 10.6: The HeaviSine test function and its wavelet coefficients, with and without noise.
Coefficients with absolute value less than some threshold T are shrunk to 0, and all other coefficients are left unaltered. The threshold needs to be properly chosen, depending on the noise level σ. There is also a soft thresholding,
    ŝ_{j0,k} = λ_{j0,k},
    ŵ_{j,k} = η_T(γ_{j,k}).
Denoising with soft thresholding minimizes the risk under the constraint that f̂ should, with high probability, be at least as smooth as f. We will not make this more precise. Instead, we just mention that hard thresholding generally leads to smaller mean squared error than soft thresholding, but that the estimator f̂ is not as smooth.
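The two thresholding rules η_T can be sketched as follows (plain Python; the values are chosen so the arithmetic is exact):

```python
def hard_threshold(x, T):
    # Keep coefficients with |x| > T, set all others to 0.
    return x if abs(x) > T else 0.0

def soft_threshold(x, T):
    # Shrink every coefficient toward 0 by T ("wavelet shrinkage").
    if x > T:
        return x - T
    if x < -T:
        return x + T
    return 0.0

coeffs = [5.0, -0.5, 1.5, -4.0, 0.25]
T = 1.0
print([hard_threshold(c, T) for c in coeffs])  # [5.0, 0.0, 1.5, -4.0, 0.0]
print([soft_threshold(c, T) for c in coeffs])  # [4.0, 0.0, 0.5, -3.0, 0.0]
```

Note that soft thresholding is continuous in the coefficient value, which is the source of the extra smoothness of the resulting estimator.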
In a practical application, the threshold has to be fine-tuned to the particular class of signals under consideration. Also, the noise level is in general unknown and has to be estimated from the data. This is done using the wavelet coefficients on the finest scale, since the influence of the signal f is usually smallest there. The noise level is estimated as

    σ̂ = Median(|γ_{J−1,k}|)/0.6745.
The reason for using a median estimate is to reduce the influence of outliers, i.e., noisy wavelet coefficients with a large signal content. The same estimate can be used for noise reduction with coloured noise. In that case, scale-dependent thresholds are chosen as

    T_j = √(2 log K) Median(|γ_{j,k}|)/0.6745.

The wavelet coefficients at each scale are then thresholded according to these thresholds.
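A sketch of the median-based noise estimate. The constant 0.6745 is the median of |Z| for a standard Gaussian Z, which makes the estimate consistent for pure noise while staying robust against a few large signal coefficients:

```python
import random
import statistics

def estimate_noise_level(coeffs):
    """Robust noise-level estimate sigma-hat = Median(|gamma|)/0.6745."""
    return statistics.median(abs(c) for c in coeffs) / 0.6745

rng = random.Random(0)
sigma = 2.0
# Fine-scale coefficients: mostly noise, plus a few large signal outliers.
gamma = [sigma * rng.gauss(0.0, 1.0) for _ in range(4001)] + [50.0, -40.0, 30.0]
print(estimate_noise_level(gamma))   # close to sigma = 2.0
```

A mean-based estimate such as sqrt(mean(gamma^2)) would be pulled far off by the three outliers; the median barely moves.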
10.3 Notes
The overview article by Jawerth et al. [17] is a good starting point for further reading about wavelet image and video compression. The book by Nguyen & Strang [27] contains a more detailed discussion of various aspects of image compression, such as quantization, entropy coding, border extension, and filter design. Comparisons between wavelet image compression and the JPEG compression algorithm can also be found there.
The overview article [13] by Donoho contains a comprehensive description
of wavelet denoising algorithms. It also includes several numerical examples,
both for synthetic and real-world signals.
Chapter 11

Fast Numerical Linear Algebra
In this chapter we will study the numerical solution of linear equations. Typ-
ically, we are given a linear operator T and a function f , and we seek the
solution u of the equation
T u = f.
The linear operator T is either a differential or integral operator. Discretiza-
tion of such an equation leads to a system of linear equations with a large
number of unknowns. The discretization normally starts at some coarse scale,
or grid, which is successively refined giving a sequence of linear equations de-
noted as
Tj uj = fj .
Here T_j is a matrix or, equivalently, an operator on a suitable finite-dimensional space. For differential equations this matrix is sparse and ill-conditioned, and for integral equations the matrix is full and, depending on the operator, sometimes ill-conditioned. Today these equations are solved using various iterative methods. The most efficient methods are multilevel or multigrid methods. These take advantage of the fact that we have a sequence of scales, or operators, and are also quite simple. Recently it has been proposed to use wavelets for the solution of both linear and nonlinear equations. We will describe the so-called non-standard form of an operator in a wavelet basis, and see how it relates to standard multilevel methods.
The kernel is called the logarithmic potential and is more commonly defined
on a curve in the plane.
11.2 Discretization
We will consider the Galerkin method for the discretization of linear operator equations. There exist several other discretization methods, such as the finite difference method for differential equations and the collocation method for integral equations. These can be seen as special cases of the Galerkin method, though, with certain choices of quadrature formulas and function spaces. In any case, wavelet and other multilevel methods work in the same way.

Now, assume that the operator T : V → V, where V is some Hilbert space such as L²(R). The equation T u = f is then equivalent to finding a u ∈ V such that
    u_j = Σ_{k=1}^{N} a_k ϕ_k,
for some numbers (a_k). If we substitute this expression into equation (11.3), and use the fact that (ϕ_k)_{k=1}^{N} is a basis of V_j, equation (11.3) is equivalent to

    Σ_{k=1}^{N} a_k ⟨T ϕ_k, ϕ_n⟩ = ⟨f, ϕ_n⟩,   for n = 1, . . . , N.

This is the linear system

    T_j u_j = f_j,   with   (T_j)_{n,k} = ⟨T ϕ_k, ϕ_n⟩,

and where u_j and f_j are vectors with components a_n and ⟨f, ϕ_n⟩, respectively.
Throughout this chapter we will, with a slight abuse of notation, use the same symbol T_j to denote both a matrix and an operator on V_j. Similarly, we use the notation u_j to denote both a vector with components a_k and the corresponding function u_j = Σ_{k=1}^{N} a_k ϕ_k.
Example 11.1. For the differential equation (11.1) the elements of the matrix T_j are (by partial integration)

    (T_j)_{n,k} = ∫_0^1 ϕ′_k(x) ϕ′_n(x) dx,
Example 11.2. For the integral equation (11.2) the elements of the matrix T_j are given by

    (T_j)_{n,k} = − ∫_0^1 ∫_0^1 ϕ_k(y) ϕ_n(x) log |x − y| dx dy,
There are several natural choices of the finite dimensional spaces Vj for this
equation, and the simplest is to let Vj be the space of piecewise constant
functions on a grid with Nj = 2j intervals. The basis functions spanning
this space are the box functions. In this case the matrix Tj is full because
of the coupling factor log |x − y| in the integrand. The condition number of
the matrix is also large (since the continuous operator T is compact).
    T_J u_J = f_J.

Recall that this linear system was equivalent to the set of equations

    Σ_{k=1}^{N_J} u_{J,k} ⟨T ϕ_{J,k}, ϕ_{J,n}⟩ = ⟨f, ϕ_{J,n}⟩,   for n = 1, . . . , N_J,

where u_J = Σ_{k=1}^{N_J} u_{J,k} ϕ_{J,k}. From the subspace splitting V_J = V_{J−1} ⊕ W_{J−1} we can also write u_J as

    u_J = Σ_{k=1}^{N_{J−1}} u_{J−1,k} ϕ_{J−1,k} + Σ_{k=1}^{N_{J−1}} w_{J−1,k} ψ_{J−1,k},

for n = 1, . . . , N_{J−1}. This follows since the scaling functions (ϕ_{J−1,k})_{k=1}^{N_{J−1}} form a basis of V_{J−1} and the wavelets (ψ_{J−1,k})_{k=1}^{N_{J−1}} form a basis of W_{J−1}. We write
11.4 The Standard Form
We know that for an MRA the scaling functions (ϕ_{j,k})_{k=1}^{N_j} span the V_j-spaces and the wavelets (ψ_{j,k})_{k=1}^{N_j} span the W_j-spaces. By a change of basis, the equation T_J u_J = f_J on V_J can be written as the following block matrix system
    [ T_L        C_{L,L}    ...  C_{L,J−1}   ] [ u_L     ]   [ f_L     ]
    [ B_{L,L}    A_{L,L}    ...  A_{L,J−1}   ] [ w_L     ] = [ d_L     ]
    [   ...        ...      ...    ...       ] [  ...    ]   [  ...    ]
    [ B_{J−1,L}  A_{J−1,L}  ...  A_{J−1,J−1} ] [ w_{J−1} ]   [ d_{J−1} ]
The standard form does not fully utilize the hierarchical structure of a mul-
tiresolution analysis. Therefore we will not consider it any further in this
chapter.
11.5 Compression
So far we have said nothing about the structure of the A_j, B_j, and C_j matrices. Since wavelets have vanishing moments, it turns out that these matrices are sparse. More specifically, for one-dimensional problems they have a rapid off-diagonal decay. Even when the T_j matrices are ill-conditioned, the A_j matrices are well conditioned, making the non-standard form a suitable representation for iterative methods.
Let us now prove that integral operators produce sparse matrices in the non-standard form. Start with an integral operator with kernel K(x, y),

    T u(x) = ∫ K(x, y) u(y) dy.

For the moment we assume that the kernel K(x, y) is smooth away from the diagonal x = y, where it is singular. Typical examples of such kernels are the following
For simplicity we will use the Haar wavelet basis, which has one vanishing moment, that is,

    ∫ ψ_{j,k}(x) dx = 0.

The support of the Haar wavelet ψ_{j,k}, as well as the scaling function ϕ_{j,k}, is the interval I_{j,k} = [2^{−j}k, 2^{−j}(k + 1)]. Now, consider the elements of the matrix B_j (the A_j and C_j matrices are treated similarly),

    (B_j)_{n,k} = ∫_{I_{j,k}} ∫_{I_{j,n}} K(x, y) ψ_{j,n}(x) ϕ_{j,k}(y) dx dy.
Since the Haar wavelet has one vanishing moment, a Taylor expansion of K in the x-variable gives

    ∫_{I_{j,n}} K(x, y) ψ_{j,n}(x) dx = ∫_{I_{j,n}} ∂_x K(ξ, y) x ψ_{j,n}(x) dx,

where C = ∫ x ψ_{j,n}(x) dx and |I_{j,n}| = 2^{−j}. This gives us an estimate for the size of the elements of B_j:

    |(B_j)_{n,k}| ≤ C 2^{−j} ∫_{I_{j,k}} max_{x∈I_{j,n}} |∂_x K(x, y)| dy.
To proceed from here we need to know the off-diagonal decay of the kernel.
For the logarithmic potential we have
Exactly the same estimate will hold for the elements of Cj . For the matrix
Aj we can also make a Taylor expansion of the kernel in the y-variable, giving
an even faster off-diagonal decay of its elements
For other integral operators, similar estimates for the off-diagonal decay will hold. Moreover, increasing the number of vanishing moments of the wavelet increases the rate of decay. As a matter of fact, there is a large class of integral operators, referred to as Calderón–Zygmund operators, for which one can prove a general estimate. A Calderón–Zygmund operator is a bounded integral operator on L²(R), with a kernel satisfying the estimates
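The decay predicted by the estimate for (B_j)_{n,k} can be observed numerically. The sketch below computes a few Haar elements of B_j for the logarithmic potential by brute-force midpoint quadrature (an illustration only, not the algorithm of the text):

```python
import math

def haar_element(j, n, k, kernel, m=40):
    """(B_j)_{n,k} = double integral of K(x,y) psi_{j,n}(x) phi_{j,k}(y),
    approximated by the midpoint rule with m*m points.  psi_{j,n} is
    +2^{j/2} on the left half of I_{j,n} and -2^{j/2} on the right half;
    phi_{j,k} equals 2^{j/2} on I_{j,k}."""
    h = 2.0 ** (-j)
    s = 2.0 ** (j / 2.0)
    dx = h / m
    total = 0.0
    for a in range(m):
        x = h * n + (a + 0.5) * dx
        psi = s if a < m // 2 else -s
        for b in range(m):
            y = h * k + (b + 0.5) * dx
            total += kernel(x, y) * psi * s * dx * dx
    return total

K = lambda x, y: -math.log(abs(x - y))   # the logarithmic potential
# The elements decay as the index distance |n - k| (hence |x - y|) grows:
vals = [abs(haar_element(4, 2, 2 + d, K)) for d in (2, 4, 8)]
print(vals)
```
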
[Figure: The recursive block structure of the non-standard form, with the blocks B_{J−1}, C_{J−1}, A_{J−1} at the finest scale and T_{J−2}, B_{J−2}, C_{J−2}, A_{J−2} nested inside.]
    j     N     κ(T_j)   κ(A_j)
    5     32      57      1.94
    6     64     115      1.97
    7    128     230      1.98
    8    256     460      1.99
Inspired by the multigrid method, we begin by solving the first equation for w_{J−1} approximately, using a simple smoother. This should work well since A_{J−1} is well conditioned. Next, we update the right-hand side of the second equation and solve it for u_{J−1}. Since T_{J−1} is still ill-conditioned, we solve this equation recursively by splitting T_{J−1} one step further. When we have reached a sufficiently coarse-scale operator T_L, we solve the equation for u_L exactly. Finally, we update the right-hand side of the first equation and repeat the above steps a number of times.

Based on this we propose the following recursive algorithm; see Figure 11.2. The number of loops K is a small number, typically less than 5.
function u_j = Solve(u_j^{(0)}, f_j)
    if j = L
        Solve T_j u_j = f_j using Gaussian elimination
    else
        Project u_j^{(0)} onto V_{j−1} and W_{j−1} to get u_{j−1}^{(0)} and w_{j−1}^{(0)}
        Project f_j onto V_{j−1} and W_{j−1} to get f_{j−1} and d_{j−1}
        for k = 1, . . . , K
            u_{j−1}^{(k)} = Solve(u_{j−1}^{(k−1)}, f_{j−1} − C_{j−1} w_{j−1}^{(k−1)})
            w_{j−1}^{(k)} = Iter(w_{j−1}^{(k−1)}, d_{j−1} − B_{j−1} u_{j−1}^{(k)})
        end
        u_j = u_{j−1}^{(K)} + w_{j−1}^{(K)}
    end

Here Iter(w_j^{(0)}, d_j) solves A_j w_j = d_j approximately, using a simple iterative method with initial vector w_j^{(0)}.
11.7 Notes
The non-standard form of an operator was invented by Beylkin, Coifman, and Rokhlin in [4], Fast Wavelet Transforms and Numerical Algorithms I. It can be seen as a generalisation of the Fast Multipole Method (FMM) for computing potential interactions, by Greengard and Rokhlin; see A Fast Algorithm for Particle Simulations [15]. Methods for solving equations in the non-standard form have been developed further, primarily by Beylkin; see for example Wavelets and Fast Numerical Algorithms [3]. For an introduction to multigrid and iterative methods we refer the reader to the book A Multigrid Tutorial [5] by Briggs.
Chapter 12
Functional Analysis
Functional analysis in this chapter will mean the study of global differentiability of functions, expressed in terms of their derivatives being, say, square integrable (or belonging to certain Banach spaces). The corresponding wavelet descriptions will be made in terms of orthogonal MRA wavelets (Chapter 4). Wavelet representations are also well suited to describe local differentiability properties of functions; for this, however, we will only give references in the Notes at the end of the chapter.
We now come to the main result of this chapter. The following theorem has an extension to L^p, 1 < p < ∞, cited below. However, the crucial lemma is essentially the same for p = 2.

Theorem 12.1. Under the same assumptions on the function f, now defined on R, i.e., f ∈ L²(R) with f^{(α)} ∈ L²(R), we will show that for the wavelet coefficients w_{j,k} (D^N ψ ∈ L²)

    ‖f^{(α)}‖₂² ∼ Σ_{j,k} |2^{αj} w_{j,k}|²   (0 ≤ α ≤ N)

holds when ∫ x^α ψ(x) dx = 0 for 0 ≤ α ≤ N. Here ∼ denotes that the quotient of the two expressions is bounded from below and from above by positive constants depending only on the wavelet ψ.
    ‖D^α f‖₂² ∼ ‖f‖₂²
Proof of the theorem: Apply the lemma with g(x) = f(2^{−j}x) to obtain ‖D^α f‖₂² ∼ 2^{2jα} Σ_k |w_{0,k}|² when f(x) = Σ_k w_{0,k} 2^{j/2} ψ(2^j x − k). This proves the theorem. (In general, only a summation over the dilation index, j, remains, and the corresponding subspaces are orthogonal.)
Proof of the lemma: Starting from D^α f(x) = Σ_k w_{0,k} D^α ψ(x − k), note that

    |D^α f(x)|² ≤ Σ_k |w_{0,k}|² |D^α ψ(x − k)| · Σ_k |D^α ψ(x − k)|

by the Cauchy–Schwarz inequality. Integrating this yields

    ‖D^α f‖₂² ≤ sup_x Σ_k |D^α ψ(x − k)| · Σ_k |w_{0,k}|² ∫ |D^α ψ(x − k)| dx ≤ C Σ_k |w_{0,k}|².
and

    ‖f‖₂² = ∫∫ |Σ_k D^α f(y) Ψ(y − k) ψ(x − k)|² dx dy ≤ C ‖D^α f‖₂²,
with their absolute values only. This means that the (L²-)convergence of the wavelet expansion is unconditional, in the sense that it will not depend on, for example, the order of summation or the signs (phases) of the coefficients. This is in contrast to, for example, the basis (e^{inωt})_n for L^p(0, 2π), p ≠ 2, which is not unconditional.
Exercises 12.1
12.1. Formulate and prove the analogue in R of the identity (12.1).

12.2. Show that for f(x) = Σ_k w_{0,k} ψ(x − k), with ψ as in Theorem 12.1,

    ‖D^α f‖_p ∼ ‖f‖_p   (0 ≤ α ≤ N)

holds. (Modify the proof of the lemma, where the Cauchy–Schwarz inequality was applied for p = 2, writing out w_{0,k}.)
    ‖f − f_j‖₂ ≤ C 2^{−jα} ‖D^α f‖₂
12.2 Notes
For further material, we refer to the books by Meyer [23], Kahane & Lemarié
[21], Hernández & Weiss [16]. Wavelets and local regularity are treated in a
recent dissertation by Andersson [2], which also contains many references.
Chapter 13
An Analysis Tool
We will give two examples of how the continuous wavelet transform in Chap-
ter 7 may be used in signal processing, and also indicate an algorithmic short-
cut if the wavelet is associated with a multiresolution analysis; see Chapter
4.
    W_ψ f(a, b) = ∫_{−∞}^{∞} f(t) |a|^{−1/2} ψ((t − b)/a) dt

    ∫_0^∞ |ψ̂(ξ)|² dξ/ξ < ∞
where now ψ̂(0) ≈ 0 only, and ω₀ = 5.336. This choice of ω₀ makes the real part of ψ, Re ψ, have its first maximum value outside the origin equal to half the modulus value there. (In order to make ψ̂(0) = 0, a small correction term may be added, e.g., −(2π)^{1/2} exp(−ω²/2 − ω₀²/2).)
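Numerically, ψ̂(0) is already negligible before the correction. A sketch, assuming the common convention ψ̂(ω) = (2π)^{1/2} exp(−(ω − ω₀)²/2) for the Morlet wavelet (the text's normalization may differ):

```python
import math

w0 = 5.336   # Morlet center frequency from the text

def psi_hat(w):
    # Fourier transform of the (uncorrected) Morlet wavelet: a Gaussian
    # bump centered at w0 (an assumed normalization, for illustration).
    return math.sqrt(2 * math.pi) * math.exp(-(w - w0) ** 2 / 2)

def psi_hat_corrected(w):
    # With the correction term, the transform vanishes at the origin.
    return psi_hat(w) - math.sqrt(2 * math.pi) * math.exp(-w * w / 2 - w0 * w0 / 2)

print(psi_hat(0.0))            # tiny but nonzero
print(psi_hat_corrected(0.0))  # essentially 0
```
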
Figure 13.1: Left: Re ψ(t) (dotted), Im ψ(t) (dashed). Right: ψ̂(ω).
[Figure 13.2: A signal (left, plotted against t on [0, 1]) and its continuous wavelet transform (right, plotted against b with log₂(1/a) on the vertical axis).]
Example 13.2. In Figures 13.3, 13.4, 13.5,² the continuous wavelet transforms of three different signals are shown, representing measurements in the velocity field of a fluid. The wavelet is the complex Morlet wavelet shown in Figure 13.1. The figures are to be viewed rotated a quarter turn clockwise relative to their captions. The signals are shown at the top (sic!) for easy reference. On the horizontal axis is the parameter b and on the vertical a, which may be viewed as time and frequency, respectively (but are certainly not exactly that).

¹The transform was calculated by Matlab, using the Morlet wavelet (Figure 13.2) in Wavelab.
²These three figures have been produced by C-F Stein, using a program written and put at his disposal by M. Holschneider.
13.3 Notes
A recent application of the continuous wavelet transform to the analysis of
data concerning the polar motion, the Chandler wobble, can be found in
Gibert et al. [14].
For further reading, we suggest the books by Holschneider [18], and Ka-
hane & Lemarié [21].
Chapter 14
Feature Extraction
For most signal and image classification problems the dimension of the input
signal is very large. For example, a typical segment of an audio signal has a
few thousand samples, and a common image size is 512 × 512 pixels. This
makes it practically impossible to directly employ a traditional classifier,
such as linear discriminant analysis (LDA), on the input signal. Therefore
one usually divides a classifier into two parts. First, one maps the input signal
into some lower dimensional space containing the most relevant features of
the different classes of possible input signals. Then, these features are fed
into a traditional classifier such as an LDA or an artificial neural net (ANN).
Most literature on classification focuses on the properties of the second step.
We will present a rather recent wavelet-based technique for the first step,
that is, the feature extractor. This technique expands the input signal into
a large time-frequency dictionary consisting of, for instance, wavelet packets
and local trigonometric bases. It then finds, using the best-basis algorithm,
the one basis that best discriminates the different classes of input signals.
We refer to these bases as local discriminant bases.
number of class 1 and 2 signals, respectively. We use the notation x_i^{(y)} to denote that a signal in the training set belongs to class y. Let P(A, y) be a probability distribution on X × Y, where A ⊂ X and y ∈ Y. This means that

    P(A, y) = P(X ∈ A, Y = y) = π_y P(X ∈ A | Y = y),

where X ∈ X and Y ∈ Y are random variables. Here π_y is the prior probability of class y, which is usually estimated as π_y = N_y/N, y = 1, 2. The optimal classifier for this setup is the Bayes classifier. To obtain it we need an estimate of P(A, y). The number of training samples is small compared to the size of the input signals, though; that is, N ≪ n. This makes it impossible to reliably estimate P(A, y) in practice.
Feature extraction resolves this problem of the high dimensionality of the
input signal space. It is essential to extract relevant features
of the signal for classification purposes. In practice, it is known that multi-
variate data in Rn are almost never n-dimensional. Rather, the data exhibit
an intrinsic lower dimensional structure. Hence, the following approach of
splitting the classifier into two functions is often taken:
d = g ◦ f.
Here, f : X → F is a feature extractor mapping the input signal space
into a lower dimensional feature space F ⊂ Rm . The dimension m of the
feature space is typically at least ten times smaller than the dimension n
of the input signal space. The feature extractor is followed by a traditional
classifier g : F → Y, which should work well if the different signal classes are
well separated in the feature space. We will describe an automatic procedure
for constructing the feature extractor given the training data.
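In code, the split d = g ◦ f might look as follows. Everything here is a
hypothetical stand-in (a variance-based feature map for f and a
nearest-centroid rule for g), chosen only to make the two-stage structure
concrete; it is not the LDB construction itself.

```python
import numpy as np

def fit_feature_extractor(X_train, m):
    """A stand-in for f: keep the m coordinates with largest variance
    over the training set (hypothetical; not the LDB feature map)."""
    idx = np.argsort(np.var(X_train, axis=0))[::-1][:m]
    return lambda x: x[..., idx]

def fit_nearest_centroid(F_train, y_train):
    """A stand-in for the traditional classifier g: F -> Y."""
    classes = np.unique(y_train)
    centroids = np.array([F_train[y_train == c].mean(axis=0) for c in classes])
    def g(feat):
        return classes[np.argmin(((centroids - feat) ** 2).sum(axis=1))]
    return g

# toy data: the class difference lives entirely in coordinate 0
rng = np.random.default_rng(0)
n, N = 64, 40
X1 = rng.normal(size=(N, n)); X1[:, 0] += 5.0
X2 = rng.normal(size=(N, n)); X2[:, 0] -= 5.0
X = np.vstack([X1, X2])
y = np.array([1] * N + [2] * N)

f = fit_feature_extractor(X, m=4)    # feature extractor f: R^64 -> R^4
g = fit_nearest_centroid(f(X), y)    # classifier g on the feature space
d = lambda x: g(f(x))                # the combined classifier d = g o f
```

Here m = 4 is far smaller than n = 64, in the spirit of the ten-fold (or more)
dimension reduction mentioned above.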
where ψ_1, ..., ψ_n are the basis vectors of the best basis. Applying the
transpose of this matrix to an input signal x gives us the coordinates of the
signal in the best basis (we regard x and the ψ_i's as column vectors in R^n):

W^t x = (ψ_1^t x, ..., ψ_n^t x)^t = (⟨x, ψ_1⟩, ..., ⟨x, ψ_n⟩)^t.

d = g ◦ (P_m W^t),

W = argmax_{B ∈ D} ∆(B).

Z_i = ⟨X, ψ_i⟩.
Then Z_i is also a random variable, and we sometimes write Z_i^{(y)} to
emphasize that the random variable X is of class y. Now, we are interested in
the probability density function (pdf) of Z_i for each class y, which we
denote by q_i^{(y)}(z). We can estimate these pdfs by expanding the available
signals in the training set into the basis functions of the dictionary. An
estimate q̂_i^{(y)} of q_i^{(y)} can then be computed with a pdf estimation
technique such as averaged shifted histograms (ASH).
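A minimal ASH estimator (our own sketch, not the implementation referred to
above) averages m histograms of bin width h whose origins are shifted by h/m;
equivalently, it weights a histogram with fine bin width h/m by the triangular
kernel (1 − |i|/m):

```python
import numpy as np

def ash(data, lo, hi, h, m):
    """Averaged shifted histogram: average of m histograms with bin width h
    and origins shifted by h/m.  Implemented as a histogram with fine bin
    width h/m, smoothed by the triangular weights (1 - |i|/m)."""
    delta = h / m
    nbins = int(round((hi - lo) / delta))
    counts, edges = np.histogram(data, bins=nbins, range=(lo, hi))
    w = 1.0 - np.abs(np.arange(-(m - 1), m)) / m
    dens = np.convolve(counts, w, mode="same") / (len(data) * h)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, dens

rng = np.random.default_rng(1)
z = rng.normal(size=2000)
centers, dens = ash(z, lo=-5.0, hi=5.0, h=0.5, m=8)  # pdf estimate of N(0, 1)
```

The resulting estimate is nonnegative and integrates to (approximately) one
over the chosen range.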
Once we have estimated the pdfs q̂_i^{(1)} and q̂_i^{(2)} for class 1 and 2, we
need a distance function δ(q̂_i^{(1)}, q̂_i^{(2)}) that measures the ability of
the direction ψ_i to separate the two classes. If the two pdfs are similar, δ
should be close to zero. The best direction is the one for which the two pdfs
look most different from one another; for this direction δ should attain a
maximum positive value. There are several ways to measure the discrepancy
between two pdfs, of which we mention

δ(p, q) = ∫ (√p(z) − √q(z))² dz        (Hellinger distance)

δ(p, q) = ( ∫ (p(z) − q(z))² dz )^{1/2}        (ℓ²-distance)
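On pdfs discretized on a common grid with spacing dz, the two distances can be
computed directly (a sketch under that discretization assumption):

```python
import numpy as np

def hellinger(p, q, dz):
    """Integral of (sqrt p - sqrt q)^2 on a grid with spacing dz;
    0 for identical pdfs, at most 2 for pdfs with disjoint supports."""
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dz)

def l2_distance(p, q, dz):
    """(Integral of (p - q)^2)^(1/2) on the same grid."""
    return float(np.sqrt(np.sum((p - q) ** 2) * dz))

# two Gaussian pdfs on a grid, separated by two standard deviations
z = np.linspace(-5.0, 5.0, 1001)
dz = z[1] - z[0]
p = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
q = np.exp(-0.5 * (z - 2.0) ** 2) / np.sqrt(2 * np.pi)
```

Both functions return 0 when the two pdfs coincide and grow as the pdfs
separate, as required of δ above.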
Further, let {δ_(i)} be the decreasing rearrangement of {δ_i}, that is, the
discriminant powers sorted in decreasing order. The discriminant function for
the basis B is then finally defined as the sum of the k (< n) largest
discriminant powers:

∆(B) = Σ_{i=1}^{k} δ_(i).
1. Expand all the signals in the training set into the time-frequency dic-
tionary D.
2. Estimate the projected pdfs q̂_i^{(y)} for each basis vector ψ_i and class y.
All of these steps take no more than O(n log n) operations (see Chapter
9), making this algorithm computationally efficient.
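For one fixed basis, the ranking of coordinates by discriminant power can be
sketched as follows. This is a sketch only: plain histograms stand in for ASH,
δ is taken to be the Hellinger distance, and the best-basis search over the
whole dictionary D is omitted.

```python
import numpy as np

def hellinger(p, q, dz):
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dz)

def discriminant_powers(Z1, Z2, bins=32, lo=-10.0, hi=10.0):
    """delta_i: Hellinger distance between the estimated class pdfs of the
    i-th basis coordinate.  Z1, Z2 are (N_y, n) coefficient arrays."""
    dz = (hi - lo) / bins
    deltas = np.empty(Z1.shape[1])
    for i in range(Z1.shape[1]):
        q1, _ = np.histogram(Z1[:, i], bins=bins, range=(lo, hi), density=True)
        q2, _ = np.histogram(Z2[:, i], bins=bins, range=(lo, hi), density=True)
        deltas[i] = hellinger(q1, q2, dz)
    return deltas

# toy coefficients: coordinate 0 separates the classes, the rest are noise
rng = np.random.default_rng(2)
Z1 = rng.normal(size=(200, 8)); Z1[:, 0] += 4.0
Z2 = rng.normal(size=(200, 8)); Z2[:, 0] -= 4.0

deltas = discriminant_powers(Z1, Z2)
order = np.argsort(deltas)[::-1]           # decreasing rearrangement delta_(i)
k = 3
Delta_B = float(deltas[order[:k]].sum())   # Delta(B), sum of k largest powers
```

On this toy data the discriminating coordinate is ranked first, and ∆(B) is
dominated by its discriminant power.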
14.5 Notes
Local discriminant bases for classification purposes were introduced by Saito
and Coifman in the paper Local discriminant bases [25]. Improvements to
the LDB method were later described by the same authors in [26]. They have
performed successful experiments on geophysical acoustic waveform classifi-
cation, radar signal classification, and classification of neuron firing patterns
of monkeys.
Chapter 15
Implementation Issues
This chapter is concerned with two issues related to the actual implementation
of the discrete wavelet transform. The first is finite-length signals and how
to extend, or otherwise treat, them. The second is how to process the sample
values of a continuous-time function before the discrete wavelet transform is
implemented in a filter bank. (The discrete wavelet transform was defined in
Chapter 4.)
Extension of Signals
We will describe three extension methods: extension by zeros (zero-padding),
extension by periodicity (wraparound), and extension by reflection (symmet-
ric extension). See Figure 15.1.
Zero-padding simply sets the values outside the signal to zero, extending it
to an infinite sequence.
For signals that are naturally periodic, wraparound is better than
zero-padding. The discrete Fourier transform (DFT) of a vector in R^L is the
convolution of the periodic extension of the vector with the filter
1, W, ..., W^{L−1}, where W = e^{−i2π/L}.
[Figure 15.1: zero-padding, wraparound, and symmetric extension of a signal.]
The two symmetric extension techniques give rise to two versions of the
discrete cosine transform (DCT). The DCT is used in the JPEG image com-
pression standard, for example.
In continuous time, symmetric extension gives a continuous function; this is
its advantage. It does, however, introduce a jump in the first derivative.
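The three extensions are easy to write down; here is a sketch (the half-point
convention for symmetric extension, which repeats the boundary sample, is one
of two possible choices):

```python
import numpy as np

def extend(x, n, mode):
    """Extend a finite signal by n samples at each end.
    mode: 'zero' (zero-padding), 'wrap' (periodic), or 'symm'
    (symmetric, half-point convention: the boundary sample is repeated)."""
    x = np.asarray(x)
    if mode == "zero":
        return np.concatenate([np.zeros(n), x, np.zeros(n)])
    if mode == "wrap":
        return np.concatenate([x[-n:], x, x[:n]])
    if mode == "symm":
        return np.concatenate([x[n - 1::-1], x, x[:-n - 1:-1]])
    raise ValueError(f"unknown mode {mode!r}")
```

For x = (1, 2, 3, 4) and n = 2, wraparound gives 3, 4, 1, 2, 3, 4, 1, 2, while
symmetric extension gives 2, 1, 1, 2, 3, 4, 4, 3.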
Boundary-Corrected Filters
Boundary-corrected filters are quite complicated to construct. In continuous
time this corresponds to defining a wavelet basis for a finite interval [0, L].
Such constructions are still at the research stage. Let us therefore describe
one example of boundary-corrected wavelets.
The example we will consider uses the piecewise linear hat function as the
scaling function. It is supported on two intervals, and the corresponding
wavelet on three intervals; see Figure 15.2. These are the synthesis
functions. The filters are 0.5, 1, 0.5 and 0.1, −0.6, 1, −0.6, 0.1,
respectively. This is an example of a semi-orthogonal wavelet basis. The
approximation space V0 is orthogonal to the detail space W0, but the basis
functions within those spaces are not orthogonal; the hat function is not
orthogonal to its translates. This basis is useful when discretizing certain
differential equations, using the
[Figure 15.2: the scaling function in V0 (left) and the wavelet in W0 (right).]
Galerkin method. Then, one does not need to know the dual scaling function
and wavelet. See Chapter 11 for details on how to solve differential equations
using wavelets.
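The semi-orthogonality claim can be checked numerically: build ϕ and ψ from
the filters above on a fine grid and verify that ψ is orthogonal to ϕ and its
translates, while ϕ is not orthogonal to its own translate. A quadrature
sketch (our own check, not code from the book):

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 4.0, dt)      # fine grid covering supp phi and supp psi

def hat(s):
    """Piecewise linear hat function on [0, 2], peak 1 at s = 1."""
    return np.maximum(0.0, 1.0 - np.abs(s - 1.0))

# synthesis relations with the filters from the text:
# phi(t) = sum_k h_k hat(2t - k),  psi(t) = sum_k g_k hat(2t - k)
h = [0.5, 1.0, 0.5]
g = [0.1, -0.6, 1.0, -0.6, 0.1]
phi = sum(c * hat(2 * t - k) for k, c in enumerate(h))   # reproduces hat(t)
psi = sum(c * hat(2 * t - k) for k, c in enumerate(g))   # supported on [0, 3]

def ip(u, v):
    """Inner product by Riemann sum on the grid."""
    return float(np.sum(u * v) * dt)

# V0 is orthogonal to W0 ...
print(ip(hat(t), psi), ip(hat(t - 1.0), psi))   # both approximately 0
# ... but the hat function is not orthogonal to its translates
print(ip(hat(t), hat(t - 1.0)))                 # approximately 1/6
```

The inner product of the hat with its unit translate is 1/6, which is exactly
why the basis is only semi-orthogonal.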
Finally, in Figure 15.3, we see two boundary-corrected wavelets. Depending on
whether the differential equation has Dirichlet or Neumann conditions at the
boundary, we may want to force the wavelet values at the boundary to 0.
[Figure 15.3: two boundary-corrected wavelets.]
Pre-filtering
Suppose we are given the sample values f (2−J k) of a lowpass filtered signal
f , with fˆ(ω) = 0 for |ω| ≥ 2J π. These sample values will be related to the
approximation of f at the scale 2−J , through the projection
f_J(t) := P_J f = Σ_k ⟨f, ϕ_{J,k}⟩ ϕ_{J,k}(t).
A relation appears if we compute the scaling coefficients s_{J,k} = ⟨f, ϕ_{J,k}⟩
approximately with some numerical integration method, e.g., using a rectangle
approximation:

s_{J,k} = ∫ f(t) ϕ_{J,k}(t) dt
        ≈ 2^{−J} Σ_l f(2^{−J}l) ϕ_{J,k}(2^{−J}l)
        = 2^{−J/2} Σ_l f(2^{−J}l) ϕ(l − k).
Note that the last expression is a filtering of the samples of f, where the
filter coefficients are 2^{−J/2} ϕ(−l). This is called pre-filtering. There
exist other pre-filtering methods: if, for instance, f is band-limited, this
can be taken into account to compute the scaling coefficients s_{J,k} more
accurately. It is common practice to use the sample values directly as the
scaling coefficients, which then introduces an error. This error has its main
influence at the smallest scales, that is, for j < J close to J (see Exercise
15.1).
Post-filtering
The sample values can be reconstructed approximately through a filtering of
the scaling coefficients s_{J,k} with the filter coefficients 2^{J/2} ϕ(k):

f(2^{−J}k) ≈ f_J(2^{−J}k) = Σ_l s_{J,l} ϕ_{J,l}(2^{−J}k) = 2^{J/2} Σ_l s_{J,l} ϕ(k − l).
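Both pre- and post-filtering are then ordinary correlations and convolutions
with the integer samples ϕ(l). The sketch below assumes such a table phi_int
is given; the mode="same" alignment is an implementation choice, and the round
trip is exact only in the trivial case ϕ(0) = 1, ϕ(l) = 0 otherwise (the Haar
scaling function):

```python
import numpy as np

def prefilter(samples, phi_int, J):
    """Pre-filtering: s_{J,k} ~ 2^(-J/2) sum_l f(2^(-J) l) phi(l - k),
    a correlation of the samples with phi_int = [phi(0), phi(1), ...]."""
    phi = np.asarray(phi_int, dtype=float)
    return 2.0 ** (-J / 2) * np.correlate(np.asarray(samples, float),
                                          phi, mode="same")

def postfilter(coeffs, phi_int, J):
    """Post-filtering: f(2^(-J) k) ~ 2^(J/2) sum_l s_{J,l} phi(k - l),
    a convolution of the coefficients with the same table."""
    phi = np.asarray(phi_int, dtype=float)
    return 2.0 ** (J / 2) * np.convolve(np.asarray(coeffs, float),
                                        phi, mode="same")

# Haar case phi(0) = 1: pre- followed by post-filtering returns the samples
samples = np.array([1.0, 2.0, 3.0, 4.0])
s = prefilter(samples, [1.0], J=3)
recovered = postfilter(s, [1.0], J=3)
```

For a general scaling function the round trip is only approximate, which is
precisely the error discussed above and in Exercise 15.1.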
15.2 Exercises
15.1. Suppose that we are given the sample values f(2^{−J}k) of a lowpass
filtered signal f, with f̂(ω) = 0 for |ω| ≥ 2^J π. For ω in the pass-band,
verify that

f̂(ω) = 2^{−J} Σ_k f(2^{−J}k) e^{−i2^{−J}kω},

and that, when the sample values are used directly as the scaling coefficients,

f̂_J(ω) = 2^{−J/2} ϕ̂(2^{−J}ω) Σ_k f(2^{−J}k) e^{−i2^{−J}kω}.

This indicates how a filter might be constructed to compensate for the
influence of the scaling function.
15.3 Notes
The overview article of Jawerth and Sweldens [20] describes how to define
orthogonal wavelets on an interval. The book [27] by Nguyen and Strang
discusses finite length signals and also contains useful references for further
reading.
Bibliography
[3] G. Beylkin, Wavelets and fast numerical algorithms, Lecture Notes for
short course, AMS-93, Proceedings of Symposia in Applied Mathemat-
ics, vol. 47, 1993, pp. 89–117.
[7] C. K. Chui, An Introduction to Wavelets, Academic Press, New York, 1992.
[29] W. Sweldens and P. Schröder, Building your own wavelets at home, Tech.
report, University of South Carolina, Katholieke Universiteit Leuven,
1995.
Index
subsampling lattice, 94
symmetric, 25
synthesis, 38
threshold, 169
time-frequency atom, 145
transfer function, 21
tree
wavelet packet, 149
upsampling, 38, 40
vanishing moment, 79
wavelet, 63
wavelet decomposition, 65
wavelet equation, 64
wavelet packet tree, 149
wavelet packets, 151
web, 14
white noise, 166
window function, 147, 156
z-transform, 20